Neural Networks¶
A neural network is defined through a collection of layers and represents a directed acyclic graph (DAG). Each layer has a name, a layer type, a list of input names, a list of output names, and a collection of parameters specific to the layer type.
The graph structure and connectivity of the neural network is inferred from the input and output names. A neural network starts with the layer whose input name is equal to the value specified in Model.description.input.name, and ends with the layer whose output name is equal to the value specified in Model.description.output.name. Layers must have unique input and output names, and a layer may not have input or output names that refer to layers that are not yet defined.
For Core ML specification version <= 3, all inputs are mapped to static rank-5 tensors, with the axis notation [Sequence, Batch, Channel, Height, Width].
From specification version 4 onwards (iOS >= 13, macOS >= 10.15), more options are available (see the enums "NeuralNetworkMultiArrayShapeMapping" and "NeuralNetworkImageShapeMapping") to map inputs to generic N-dimensional (rank N) tensors, where N >= 1.
Each layer type may have specific constraints on the ranks of its inputs and outputs.
Some layers (such as softmax, reduce, etc.) have parameters described in terms of the notational axes "Channel", "Height", "Width", or "Sequence". These can be re-interpreted in the general N-dimensional setting with the following rule:

- "width" is the same as axis = -1 (i.e. the last axis from the end)
- "height" is the same as axis = -2 (i.e. the second-last axis from the end)
- "channel" is the same as axis = -3 (i.e. the third-last axis from the end)
- "sequence" is the same as axis = -5 (i.e. the fifth-last axis from the end)
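The rule above is mechanical, so a tiny Python helper captures it (a minimal sketch; the function name is ours, not part of the specification):

def notational_axis_to_index(name):
    # Map a notational axis name to its negative axis index, per the rule above.
    mapping = {"width": -1, "height": -2, "channel": -3, "sequence": -5}
    return mapping[name.lower()]

assert notational_axis_to_index("Channel") == -3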
NeuralNetwork¶
A neural network.
message NeuralNetwork {
repeated NeuralNetworkLayer layers = 1;
repeated NeuralNetworkPreprocessing preprocessing = 2;
// use this enum value to determine the input tensor shapes to the neural network, for multiarray inputs
NeuralNetworkMultiArrayShapeMapping arrayInputShapeMapping = 5;
// use this enum value to determine the input tensor shapes to the neural network, for image inputs
NeuralNetworkImageShapeMapping imageInputShapeMapping = 6;
NetworkUpdateParameters updateParams = 10;
}
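For illustration, a hedged Python sketch that assembles a minimal NeuralNetwork message using the protobuf bindings shipped with coremltools (the import path coremltools.proto.NeuralNetwork_pb2 is an assumption about the installed package; the field names follow the messages documented on this page):

from coremltools.proto import NeuralNetwork_pb2

# Build a one-layer network: a ReLU activation from "input" to "output".
nn = NeuralNetwork_pb2.NeuralNetwork()
layer = nn.layers.add()                 # repeated NeuralNetworkLayer layers = 1
layer.name = "relu_1"
layer.input.append("input")             # should match Model.description.input.name
layer.output.append("output")           # should match Model.description.output.name
layer.activation.ReLU.SetInParent()     # select the ReLU member of the activation oneof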
NeuralNetworkImageScaler¶
A neural network preprocessor that performs a scalar multiplication of an image followed by addition of scalar biases to the channels.
- Input: X
- An image in BGR or RGB format with shape [3, H, W], or in grayscale format with shape [1, H, W].
- Output: Y
- An image with format and shape corresponding to the input.
If the input image is in BGR format:
Y[0, :, :] = channelScale * X[0, :, :] + blueBias
Y[1, :, :] = channelScale * X[1, :, :] + greenBias
Y[2, :, :] = channelScale * X[2, :, :] + redBias
If the input image is in RGB format:
Y[0, :, :] = channelScale * X[0, :, :] + redBias
Y[1, :, :] = channelScale * X[1, :, :] + greenBias
Y[2, :, :] = channelScale * X[2, :, :] + blueBias
If the input image is in grayscale format:
Y[0, :, :] = channelScale * X[0, :, :] + grayBias
message NeuralNetworkImageScaler {
float channelScale = 10;
float blueBias = 20;
float greenBias = 21;
float redBias = 22;
float grayBias = 30;
}
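A minimal numpy sketch of what this preprocessor computes for a BGR input (our own illustration of the formulas above, not part of the specification):

import numpy as np

def apply_image_scaler_bgr(x, channelScale, blueBias, greenBias, redBias):
    # x: float array of shape [3, H, W] in BGR channel order.
    y = channelScale * x
    y[0] += blueBias
    y[1] += greenBias
    y[2] += redBias
    return y

x = np.zeros((3, 4, 4), dtype=np.float32)
y = apply_image_scaler_bgr(x, channelScale=1.0 / 255.0,
                           blueBias=-0.5, greenBias=-0.5, redBias=-0.5)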
NeuralNetworkMeanImage¶
A neural network preprocessor that subtracts the provided mean image from the input image. The mean image is subtracted from the input named NeuralNetworkPreprocessing.featureName.
message NeuralNetworkMeanImage {
repeated float meanImage = 1;
}
NeuralNetworkPreprocessing¶
Preprocessing parameters for image inputs.
message NeuralNetworkPreprocessing {
string featureName = 1;
oneof preprocessor {
NeuralNetworkImageScaler scaler = 10;
NeuralNetworkMeanImage meanImage = 11;
}
}
ActivationReLU¶
A rectified linear unit (ReLU) activation function.
This function has the following formula:

f(x) = max(0, x)
message ActivationReLU {
}
ActivationLeakyReLU¶
A leaky rectified linear unit (ReLU) activation function.
This function has the following formula:

f(x) = x if x >= 0; alpha * x otherwise
message ActivationLeakyReLU {
float alpha = 1; //negative slope value for leakyReLU
}
ActivationTanh¶
A hyperbolic tangent activation function.
This function has the following formula:

f(x) = tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
message ActivationTanh {
}
ActivationScaledTanh¶
A scaled hyperbolic tangent activation function.
This function has the following formula:

f(x) = alpha * tanh(beta * x)
message ActivationScaledTanh {
float alpha = 1;
float beta = 2;
}
ActivationSigmoid¶
A sigmoid activation function.
This function has the following formula:

f(x) = 1 / (1 + e^(-x))
message ActivationSigmoid {
}
ActivationLinear¶
A linear activation function.
This function has the following formula:

f(x) = alpha * x + beta
message ActivationLinear {
float alpha = 1;
float beta = 2;
}
ActivationSigmoidHard¶
A hard sigmoid activation function.
This function has the following formula:

f(x) = min(max(alpha * x + beta, 0), 1)
message ActivationSigmoidHard {
float alpha = 1;
float beta = 2;
}
ActivationPReLU¶
A parameterized rectified linear unit (PReLU) activation function. The input must have rank at least 3; axis = -3 is denoted by "C", for channels. The "alpha" parameter can be a vector of length C.
This function has the following formula, applied per channel i:

f(x_i) = x_i if x_i >= 0; alpha_i * x_i otherwise
message ActivationPReLU {
// parameter of length C or 1.
// If length is 1, same value is used for all channels
WeightParams alpha = 1;
}
ActivationELU¶
An exponential linear unit (ELU) activation function.
This function has the following formula:

f(x) = x if x >= 0; alpha * (e^x - 1) otherwise
message ActivationELU {
float alpha = 1;
}
ActivationThresholdedReLU¶
A thresholded rectified linear unit (ReLU) activation function.
This function has the following formula:

f(x) = x if x >= alpha; 0 otherwise
message ActivationThresholdedReLU {
float alpha = 1;
}
ActivationSoftsign¶
A softsign activation function.
This function has the following formula:

f(x) = x / (1 + |x|)
message ActivationSoftsign {
}
ActivationSoftplus¶
A softplus activation function.
This function has the following formula:

f(x) = log(1 + e^x)
message ActivationSoftplus {
}
ActivationParametricSoftplus¶
A parametric softplus activation function. The input must have rank at least 3; axis = -3 is denoted by "C", for channels. The "alpha"/"beta" parameters can be vectors of length C.
This function has the following formula, applied per channel i:

f(x_i) = alpha_i * log(1 + e^(beta_i * x_i))
message ActivationParametricSoftplus {
// If length is 1, same value is used for all channels
WeightParams alpha = 1; //parameter of length C or 1
WeightParams beta = 2; //parameter of length C or 1
}
ActivationParams¶
message ActivationParams {
oneof NonlinearityType {
ActivationLinear linear = 5;
ActivationReLU ReLU = 10;
ActivationLeakyReLU leakyReLU = 15;
ActivationThresholdedReLU thresholdedReLU = 20;
ActivationPReLU PReLU = 25;
ActivationTanh tanh = 30;
ActivationScaledTanh scaledTanh = 31;
ActivationSigmoid sigmoid = 40;
ActivationSigmoidHard sigmoidHard = 41;
ActivationELU ELU = 50;
ActivationSoftsign softsign = 60;
ActivationSoftplus softplus = 70;
ActivationParametricSoftplus parametricSoftplus = 71;
}
}
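As a reference for the formulas above, a few of these activations in numpy (our own sketch, not part of the specification):

import numpy as np

def relu(x):
    return np.maximum(0.0, x)                      # ActivationReLU

def leaky_relu(x, alpha):
    return np.where(x >= 0, x, alpha * x)          # ActivationLeakyReLU

def hard_sigmoid(x, alpha, beta):
    return np.clip(alpha * x + beta, 0.0, 1.0)     # ActivationSigmoidHard

def parametric_softplus(x, alpha, beta):
    # alpha and beta broadcast along the channel axis (axis = -3).
    return alpha * np.log1p(np.exp(beta * x))      # ActivationParametricSoftplus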
Tensor¶
A representation of the rank and shape of an intermediate tensor.
message Tensor {
// Number of dimensions in the tensor shape
uint32 rank = 1;
// actual value of the tensor shape.
// must be of length "rank". Can contain -1s for unknown dimensions.
repeated int64 dimValue = 2;
}
NeuralNetworkLayer¶
A single neural network layer.
message NeuralNetworkLayer {
string name = 1; //descriptive name of the layer
repeated string input = 2;
repeated string output = 3;
repeated Tensor inputTensor = 4; // must be the same length as the "input" field
repeated Tensor outputTensor = 5; // must be the same length as the "output" field
// Must be set to true to mark the layer as updatable.
// If true, the weightParams in the layer's properties must also be set to updatable.
// If false, the value of the isUpdatable parameter within the layer's weights is ignored.
bool isUpdatable = 10;
oneof layer {
// Start at 100 here
ConvolutionLayerParams convolution = 100;
PoolingLayerParams pooling = 120;
ActivationParams activation = 130;
InnerProductLayerParams innerProduct = 140;
EmbeddingLayerParams embedding = 150;
// Normalization related layers
BatchnormLayerParams batchnorm = 160;
MeanVarianceNormalizeLayerParams mvn = 165;
L2NormalizeLayerParams l2normalize = 170;
SoftmaxLayerParams softmax = 175;
LRNLayerParams lrn = 180;
CropLayerParams crop = 190;
PaddingLayerParams padding = 200;
UpsampleLayerParams upsample = 210;
ResizeBilinearLayerParams resizeBilinear = 211;
CropResizeLayerParams cropResize = 212;
UnaryFunctionLayerParams unary = 220;
// Elementwise operations
AddLayerParams add = 230;
MultiplyLayerParams multiply = 231;
AverageLayerParams average = 240;
ScaleLayerParams scale = 245;
BiasLayerParams bias = 250;
MaxLayerParams max = 260;
MinLayerParams min = 261;
DotProductLayerParams dot = 270;
ReduceLayerParams reduce = 280;
LoadConstantLayerParams loadConstant = 290;
// Data reorganization
ReshapeLayerParams reshape = 300;
FlattenLayerParams flatten = 301;
PermuteLayerParams permute = 310;
ConcatLayerParams concat = 320;
SplitLayerParams split = 330;
SequenceRepeatLayerParams sequenceRepeat = 340;
ReorganizeDataLayerParams reorganizeData = 345;
SliceLayerParams slice = 350;
// Recurrent Layers
SimpleRecurrentLayerParams simpleRecurrent = 400;
GRULayerParams gru = 410;
UniDirectionalLSTMLayerParams uniDirectionalLSTM = 420;
BiDirectionalLSTMLayerParams biDirectionalLSTM = 430;
// Custom (user-implemented) Layer
CustomLayerParams custom = 500;
// Following layers are available only after CoreML Specification version >= 4 (iOS >= 13, macOS >= 10.15)
// Control Flow related Layers
CopyLayerParams copy = 600;
BranchLayerParams branch = 605;
LoopLayerParams loop = 615;
LoopBreakLayerParams loopBreak = 620;
LoopContinueLayerParams loopContinue = 625;
RangeStaticLayerParams rangeStatic = 635;
RangeDynamicLayerParams rangeDynamic = 640;
// Elementwise Unary Layers
ClipLayerParams clip = 660;
CeilLayerParams ceil = 665;
FloorLayerParams floor = 670;
SignLayerParams sign = 680;
RoundLayerParams round = 685;
Exp2LayerParams exp2 = 700;
SinLayerParams sin = 710;
CosLayerParams cos = 715;
TanLayerParams tan = 720;
AsinLayerParams asin = 730;
AcosLayerParams acos = 735;
AtanLayerParams atan = 740;
SinhLayerParams sinh = 750;
CoshLayerParams cosh = 755;
TanhLayerParams tanh = 760;
AsinhLayerParams asinh = 770;
AcoshLayerParams acosh = 775;
AtanhLayerParams atanh = 780;
ErfLayerParams erf = 790;
GeluLayerParams gelu = 795;
// Elementwise Binary with Broadcasting Support
EqualLayerParams equal = 815;
NotEqualLayerParams notEqual = 820;
LessThanLayerParams lessThan = 825;
LessEqualLayerParams lessEqual = 827;
GreaterThanLayerParams greaterThan = 830;
GreaterEqualLayerParams greaterEqual = 832;
LogicalOrLayerParams logicalOr = 840;
LogicalXorLayerParams logicalXor = 845;
LogicalNotLayerParams logicalNot = 850;
LogicalAndLayerParams logicalAnd = 855;
ModBroadcastableLayerParams modBroadcastable = 865;
MinBroadcastableLayerParams minBroadcastable = 870;
MaxBroadcastableLayerParams maxBroadcastable = 875;
AddBroadcastableLayerParams addBroadcastable = 880;
PowBroadcastableLayerParams powBroadcastable = 885;
DivideBroadcastableLayerParams divideBroadcastable = 890;
FloorDivBroadcastableLayerParams floorDivBroadcastable = 895;
MultiplyBroadcastableLayerParams multiplyBroadcastable = 900;
SubtractBroadcastableLayerParams subtractBroadcastable = 905;
// Tensor Manipulations
TileLayerParams tile = 920;
StackLayerParams stack = 925;
GatherLayerParams gather = 930;
ScatterLayerParams scatter = 935;
GatherNDLayerParams gatherND = 940;
ScatterNDLayerParams scatterND = 945;
SoftmaxNDLayerParams softmaxND = 950;
GatherAlongAxisLayerParams gatherAlongAxis = 952;
ScatterAlongAxisLayerParams scatterAlongAxis = 954;
ReverseLayerParams reverse = 960;
ReverseSeqLayerParams reverseSeq = 965;
SplitNDLayerParams splitND = 975;
ConcatNDLayerParams concatND = 980;
TransposeLayerParams transpose = 985;
SliceStaticLayerParams sliceStatic = 995;
SliceDynamicLayerParams sliceDynamic = 1000;
SlidingWindowsLayerParams slidingWindows = 1005;
TopKLayerParams topK = 1015;
ArgMinLayerParams argMin = 1020;
ArgMaxLayerParams argMax = 1025;
EmbeddingNDLayerParams embeddingND = 1040;
BatchedMatMulLayerParams batchedMatmul = 1045;
// Tensor Allocation / Reshape sort of operations
GetShapeLayerParams getShape = 1065;
LoadConstantNDLayerParams loadConstantND = 1070;
FillLikeLayerParams fillLike = 1080;
FillStaticLayerParams fillStatic = 1085;
FillDynamicLayerParams fillDynamic = 1090;
BroadcastToLikeLayerParams broadcastToLike = 1100;
BroadcastToStaticLayerParams broadcastToStatic = 1105;
BroadcastToDynamicLayerParams broadcastToDynamic = 1110;
SqueezeLayerParams squeeze = 1120;
ExpandDimsLayerParams expandDims = 1125;
FlattenTo2DLayerParams flattenTo2D = 1130;
ReshapeLikeLayerParams reshapeLike = 1135;
ReshapeStaticLayerParams reshapeStatic = 1140;
ReshapeDynamicLayerParams reshapeDynamic = 1145;
RankPreservingReshapeLayerParams rankPreservingReshape = 1150;
// Random Distributions
RandomNormalLikeLayerParams randomNormalLike = 1170;
RandomNormalStaticLayerParams randomNormalStatic = 1175;
RandomNormalDynamicLayerParams randomNormalDynamic = 1180;
RandomUniformLikeLayerParams randomUniformLike = 1190;
RandomUniformStaticLayerParams randomUniformStatic = 1195;
RandomUniformDynamicLayerParams randomUniformDynamic = 1200;
RandomBernoulliLikeLayerParams randomBernoulliLike = 1210;
RandomBernoulliStaticLayerParams randomBernoulliStatic = 1215;
RandomBernoulliDynamicLayerParams randomBernoulliDynamic = 1220;
CategoricalDistributionLayerParams categoricalDistribution = 1230;
// Reduction related Layers:
ReduceL1LayerParams reduceL1 = 1250;
ReduceL2LayerParams reduceL2 = 1255;
ReduceMaxLayerParams reduceMax = 1260;
ReduceMinLayerParams reduceMin = 1265;
ReduceSumLayerParams reduceSum = 1270;
ReduceProdLayerParams reduceProd = 1275;
ReduceMeanLayerParams reduceMean = 1280;
ReduceLogSumLayerParams reduceLogSum = 1285;
ReduceSumSquareLayerParams reduceSumSquare = 1290;
ReduceLogSumExpLayerParams reduceLogSumExp = 1295;
// Masking / Selection Layers
WhereNonZeroLayerParams whereNonZero = 1313;
MatrixBandPartLayerParams matrixBandPart = 1315;
LowerTriangularLayerParams lowerTriangular = 1320;
UpperTriangularLayerParams upperTriangular = 1325;
WhereBroadcastableLayerParams whereBroadcastable = 1330;
// Normalization Layers
LayerNormalizationLayerParams layerNormalization = 1350;
}
}
BranchLayerParams¶
Branching Layer
A layer that provides the functionality of branching, or an if-else block. Must have 1 input. There are no outputs, as execution is transferred to either the if branch or the else branch based on the value of the input.
The input is the condition predicate and must be a scalar (a tensor of length 1).
message BranchLayerParams {
NeuralNetwork ifBranch = 1;
NeuralNetwork elseBranch = 2;
}
LoopLayerParams¶
Loop Layer
A layer that provides the functionality of a "for" loop or a "while" loop.
There are either no inputs or 1 input. When an input is present, it corresponds to the maximum loop count; in that case the value of the "maxLoopIterations" field is ignored. The input must be a scalar. (In the description below, maxLoopIterations is assumed to be the value of the input, when it is present.)
No outputs are produced. Blobs produced by the condition or the body network are visible in the scope of the overall network.
"conditionNetwork" must produce a tensor with the name specified in the "conditionVar" field.
There are 3 possible cases for determining the termination condition:
Case 1: There is no "conditionNetwork". The layer corresponds to a pure for loop, which is run "maxLoopIterations" number of times. Equivalent pseudo-code:

for loopIterator = 0 : maxLoopIterations
    bodyNetwork()

Case 2: "conditionNetwork" is present, "maxLoopIterations" is 0, and there is no input. The layer corresponds to a while loop. Equivalent pseudo-code:

conditionVar = conditionNetwork()
while conditionVar:
    bodyNetwork()
    conditionVar = conditionNetwork()

Case 3: "conditionNetwork" is provided, and "maxLoopIterations" is positive or there is an input. The layer corresponds to a while loop with a joint condition. Equivalent pseudo-code:

loopIterator = 0
conditionVar = conditionNetwork()
while (conditionVar and loopIterator < maxLoopIterations):
    bodyNetwork()
    loopIterator = loopIterator + 1
    conditionVar = conditionNetwork()
message LoopLayerParams {
uint64 maxLoopIterations = 1;
string conditionVar = 2;
NeuralNetwork conditionNetwork = 3;
NeuralNetwork bodyNetwork = 4;
}
LoopBreakLayerParams¶
Loop break Layer
Terminates the loop that contains this layer. If present, it should always reside in the "bodyNetwork" of the loop layer.
No inputs/outputs.
message LoopBreakLayerParams {
}
LoopContinueLayerParams¶
Loop Continue Layer
Stops the current loop iteration and continues to the next iteration. If present, it should always reside in the "bodyNetwork" of the loop layer.
No inputs/outputs.
message LoopContinueLayerParams {
}
CopyLayerParams¶
Copy Layer
A layer that copies its input tensor to the output tensor. Must have 1 input and 1 output, with distinct names. This is the only layer that is allowed to re-generate an output that is already present in the neural network prior to this layer, in which case it will overwrite the output tensor.
message CopyLayerParams {
}
GreaterThanLayerParams¶
GreaterThan Layer
Either 1 or 2 inputs. Produces 1 output. Performs an elementwise greater-than operation.
Output is 1.0f if the condition is true, otherwise 0.0f.
y = x1 > x2
or
y = x1 > alpha, if only one input is provided
Broadcasting is supported.
message GreaterThanLayerParams {
float alpha = 2;
}
GreaterEqualLayerParams¶
GreaterEqual Layer
Either 1 or 2 inputs. Produces 1 output. Performs an elementwise greater-or-equal operation.
Output is 1.0f if the condition is true, otherwise 0.0f.
y = x1 >= x2
or
y = x1 >= alpha, if only one input is provided
Broadcasting is supported.
message GreaterEqualLayerParams {
float alpha = 2;
}
LessThanLayerParams¶
LessThan Layer
Either 1 or 2 inputs. Produces 1 output. Performs an elementwise less-than operation.
Output is 1.0f if the condition is true, otherwise 0.0f.
y = x1 < x2
or
y = x1 < alpha, if only one input is provided
Broadcasting is supported.
message LessThanLayerParams {
float alpha = 2;
}
LessEqualLayerParams¶
LessEqual Layer
Either 1 or 2 inputs. Produces 1 output. Performs an elementwise less-or-equal operation.
Output is 1.0f if the condition is true, otherwise 0.0f.
y = x1 <= x2
or
y = x1 <= alpha, if only one input is provided
Broadcasting is supported.
message LessEqualLayerParams {
float alpha = 2;
}
EqualLayerParams¶
Equal Layer
Either 1 or 2 inputs. Produces 1 output. Performs an elementwise equality comparison.
Output is 1.0f if the condition is true, otherwise 0.0f.
y = x1 == x2
or
y = x1 == alpha, if only one input is provided
Broadcasting is supported.
message EqualLayerParams {
float alpha = 1;
}
NotEqualLayerParams¶
NotEqual Layer
Either 1 or 2 inputs. Produces 1 output. Performs an elementwise not-equal comparison.
Output is 1.0f if the condition is true, otherwise 0.0f.
y = x1 != x2
or
y = x1 != alpha, if only one input is provided
Broadcasting is supported.
message NotEqualLayerParams {
float alpha = 1;
}
LogicalAndLayerParams¶
LogicalAnd Layer
Must have 2 inputs, produces 1 output. Performs an elementwise logical AND operation.
An input is considered False if equal to 0.0f, otherwise True. Output is 1.0f if the condition is true, otherwise 0.0f.
y = AND(x1, x2)
Broadcasting is supported.
message LogicalAndLayerParams {
}
LogicalOrLayerParams¶
LogicalOr Layer
Must have 2 inputs, produces 1 output. Performs an elementwise logical OR operation.
An input is considered False if equal to 0.0f, otherwise True. Output is 1.0f if the condition is true, otherwise 0.0f.
y = OR(x1, x2)
Broadcasting is supported.
message LogicalOrLayerParams {
}
LogicalXorLayerParams¶
LogicalXor Layer
Must have 2 inputs, produces 1 output. Performs an elementwise logical XOR operation.
An input is considered False if equal to 0.0f, otherwise True. Output is 1.0f if the condition is true, otherwise 0.0f.
y = XOR(x1, x2)
Broadcasting is supported.
message LogicalXorLayerParams {
}
LogicalNotLayerParams¶
LogicalNot Layer
Must have 1 input, produces 1 output. Performs an elementwise logical NOT operation.
The input is considered False if equal to 0.0f, otherwise True. Output is 1.0f if the condition is true, otherwise 0.0f.
y = NOT(x)
message LogicalNotLayerParams {
}
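A numpy sketch of the comparison/logical semantics above, including the single-input alpha form and broadcasting (illustration only; the function names are ours):

import numpy as np

def greater_than(x1, x2=None, alpha=0.0):
    # Two-input form broadcasts x1 against x2; one-input form compares against alpha.
    other = x2 if x2 is not None else alpha
    return (x1 > other).astype(np.float32)          # 1.0f where true, else 0.0f

def logical_and(x1, x2):
    # An input is considered False iff it equals 0.0f.
    return ((x1 != 0) & (x2 != 0)).astype(np.float32)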
BorderAmounts¶
Specifies the amount of spatial border to be either padded or cropped.
For padding:
H_out = borderAmounts[0].startEdgeSize + H_in + borderAmounts[0].endEdgeSize
W_out = borderAmounts[1].startEdgeSize + W_in + borderAmounts[1].endEdgeSize
topPaddingAmount == Height startEdgeSize
bottomPaddingAmount == Height endEdgeSize
leftPaddingAmount == Width startEdgeSize
rightPaddingAmount == Width endEdgeSize
For cropping:
H_out = (-borderAmounts[0].startEdgeSize) + H_in + (-borderAmounts[0].endEdgeSize)
W_out = (-borderAmounts[1].startEdgeSize) + W_in + (-borderAmounts[1].endEdgeSize)
topCropAmount == Height startEdgeSize
bottomCropAmount == Height endEdgeSize
leftCropAmount == Width startEdgeSize
rightCropAmount == Width endEdgeSize
message BorderAmounts {
message EdgeSizes {
uint64 startEdgeSize = 1;
uint64 endEdgeSize = 2;
}
repeated EdgeSizes borderAmounts = 10;
}
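For illustration, the padding and cropping arithmetic above in a short Python sketch (the helper names are ours):

def padded_shape(h_in, w_in, border):
    # border: two (startEdgeSize, endEdgeSize) pairs, ordered [height, width].
    h_out = border[0][0] + h_in + border[0][1]
    w_out = border[1][0] + w_in + border[1][1]
    return h_out, w_out

def cropped_shape(h_in, w_in, border):
    h_out = h_in - border[0][0] - border[0][1]
    w_out = w_in - border[1][0] - border[1][1]
    return h_out, w_out

assert padded_shape(4, 4, [(2, 0), (2, 0)]) == (6, 6)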
BorderAmounts.EdgeSizes¶
message EdgeSizes {
uint64 startEdgeSize = 1;
uint64 endEdgeSize = 2;
}
ValidPadding¶
Specifies the type of padding to be used with Convolution/Deconvolution and Pooling layers.
After padding, the input spatial shape [H_in, W_in] gets modified to the output spatial shape [H_out, W_out].
topPaddingAmount == Height startEdgeSize == borderAmounts[0].startEdgeSize
bottomPaddingAmount == Height endEdgeSize == borderAmounts[0].endEdgeSize
leftPaddingAmount == Width startEdgeSize == borderAmounts[1].startEdgeSize
rightPaddingAmount == Width endEdgeSize == borderAmounts[1].endEdgeSize
With Convolution or Pooling:
H_out = int_division_round_down((H_in + topPaddingAmount + bottomPaddingAmount - KernelSize[0]),stride[0]) + 1
which is the same as:
H_out = int_division_round_up((H_in + topPaddingAmount + bottomPaddingAmount - KernelSize[0] + 1),stride[0])
With Deconvolution:
H_out = (H_in-1) * stride[0] + kernelSize[0] - (topPaddingAmount + bottomPaddingAmount)
The equivalent expressions hold true for W_out as well.
By default, the values of paddingAmounts are set to 0, which results in a "true" valid padding. If non-zero values are provided for paddingAmounts, "valid" convolution/pooling is performed within the spatially expanded input.
message ValidPadding {
BorderAmounts paddingAmounts = 1;
}
SamePadding¶
Specifies the type of padding to be used with Convolution/Deconvolution and pooling layers.
After padding, the input spatial shape [H_in, W_in] gets modified to the output spatial shape [H_out, W_out].
With Convolution or pooling:
H_out = int_division_round_up(H_in,stride[0])
W_out = int_division_round_up(W_in,stride[1])
This is achieved by using the following padding amounts:
totalPaddingHeight = max(0,(H_out-1) * stride[0] + KernelSize[0] - Hin)
totalPaddingWidth = max(0,(W_out-1) * stride[1] + KernelSize[1] - Win)
There are two modes of asymmetry: BOTTOM_RIGHT_HEAVY and TOP_LEFT_HEAVY.
If the mode is BOTTOM_RIGHT_HEAVY:
topPaddingAmount = floor(totalPaddingHeight / 2)
bottomPaddingAmount = totalPaddingHeight - topPaddingAmount
leftPaddingAmount = floor(totalPaddingWidth / 2)
rightPaddingAmount = totalPaddingWidth - leftPaddingAmount
If the mode is TOP_LEFT_HEAVY:
bottomPaddingAmount = floor(totalPaddingHeight / 2)
topPaddingAmount = totalPaddingHeight - bottomPaddingAmount
rightPaddingAmount = floor(totalPaddingWidth / 2)
leftPaddingAmount = totalPaddingWidth - rightPaddingAmount
With Deconvolution:
H_out = H_in * stride[0]
W_out = W_in * stride[1]
message SamePadding {
enum SamePaddingMode {
BOTTOM_RIGHT_HEAVY = 0;
TOP_LEFT_HEAVY = 1;
}
SamePaddingMode asymmetryMode = 1;
}
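A short Python sketch of the same-padding arithmetic above (a minimal illustration; the helper name is ours):

import math

def same_padding_amounts(h_in, stride, kernel, bottom_right_heavy=True):
    # Output size for same padding with convolution/pooling.
    h_out = math.ceil(h_in / stride)
    total = max(0, (h_out - 1) * stride + kernel - h_in)
    small, large = total // 2, total - total // 2
    # BOTTOM_RIGHT_HEAVY puts the extra row/column at the end edge.
    return (small, large) if bottom_right_heavy else (large, small)

print(same_padding_amounts(h_in=10, stride=2, kernel=3))  # (0, 1)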
SamplingMode¶
Specifies how grid points are sampled from an interval.
Without loss of generality, assume the interval to be [0, X-1], from which N points are to be sampled.
Here X may correspond to an input image’s height or width.
All the methods can be expressed in terms of numpy’s linspace function, along with the constraint that grid points have to lie in the interval [0, X-1].
Note: numpy.linspace(start, stop, num=N, endpoint=True) corresponds to sampling N points uniformly from the interval [start, stop], endpoints included. The methods vary in how the start and stop values are computed.
message SamplingMode {
enum Method {
STRICT_ALIGN_ENDPOINTS_MODE = 0;
ALIGN_ENDPOINTS_MODE = 1;
UPSAMPLE_MODE = 2;
ROI_ALIGN_MODE = 3;
}
Method samplingMethod = 1;
}
BoxCoordinatesMode¶
Specifies the convention used to specify four bounding box coordinates for an image of size (Height, Width). The (0,0) coordinate corresponds to the top-left corner of the image.
message BoxCoordinatesMode {
enum Coordinates {
CORNERS_HEIGHT_FIRST = 0;
CORNERS_WIDTH_FIRST = 1;
CENTER_SIZE_HEIGHT_FIRST = 2;
CENTER_SIZE_WIDTH_FIRST = 3;
}
Coordinates boxMode = 1;
}
WeightParams¶
Weights for layer parameters. Weights are stored as repeated floating point numbers using row-major ordering and can represent 1-, 2-, 3-, or 4-dimensional data.
message WeightParams {
repeated float floatValue = 1;
bytes float16Value = 2;
bytes rawValue = 30;
QuantizationParams quantization = 40;
bool isUpdatable = 50;
}
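For illustration, a hedged Python sketch that fills a WeightParams message with a 2x3 float matrix in row-major order (the import path is an assumption about a coremltools installation):

from coremltools.proto import NeuralNetwork_pb2

w = NeuralNetwork_pb2.WeightParams()
matrix = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]
for row in matrix:           # row-major: rows are laid out contiguously
    w.floatValue.extend(row)
assert list(w.floatValue) == [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]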
QuantizationParams¶
Quantization parameters.
message QuantizationParams {
uint64 numberOfBits = 1;
oneof QuantizationType {
LinearQuantizationParams linearQuantization = 101;
LookUpTableQuantizationParams lookupTableQuantization = 102;
}
}
LinearQuantizationParams¶
message LinearQuantizationParams {
repeated float scale = 1;
repeated float bias = 2;
}
LookUpTableQuantizationParams¶
message LookUpTableQuantizationParams {
// (2^numberOfBits) elements.
repeated float floatValue = 1;
}
ConvolutionLayerParams¶
A layer that performs spatial convolution or deconvolution.
y = ConvolutionLayer(x)
Requires 1 input and produces 1 output.
- Input
- A blob with rank greater than or equal to 4. A rank 4 blob represents [Batch, channels, height, width]. For ranks greater than 4, the leading dimensions, starting from 0 to -4 (inclusive), are all treated as batch.
- Output
- Rank is the same as the input. e.g.: for a rank 4 input, the output shape is [B, C_out, H_out, W_out].
If dilationFactor is not 1, the effective kernel size is modified as follows:

KernelSize[0] <-- (kernelSize[0]-1) * dilationFactor[0] + 1
KernelSize[1] <-- (kernelSize[1]-1) * dilationFactor[1] + 1

The type of padding can be valid or same. Output spatial dimensions depend on the type of padding. For details, refer to the descriptions of the messages "ValidPadding" and "SamePadding". Padded values are all zeros.
For deconvolution, ConvolutionPaddingType (valid or same) is ignored when outputShape is set.
message ConvolutionLayerParams {
uint64 outputChannels = 1;
uint64 kernelChannels = 2;
uint64 nGroups = 10;
repeated uint64 kernelSize = 20;
repeated uint64 stride = 30;
repeated uint64 dilationFactor = 40;
oneof ConvolutionPaddingType {
ValidPadding valid = 50;
SamePadding same = 51;
}
bool isDeconvolution = 60;
bool hasBias = 70;
WeightParams weights = 90;
WeightParams bias = 91;
repeated uint64 outputShape = 100;
}
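Combining the dilation rule with the valid-padding formula, here is a small Python helper that computes the output spatial size of a convolution (our own sketch of the formulas documented above):

def conv_output_size(h_in, kernel, stride, dilation, pad_start=0, pad_end=0):
    k_eff = (kernel - 1) * dilation + 1               # effective kernel size
    # "valid" padding formula: integer division rounded down, then + 1.
    return (h_in + pad_start + pad_end - k_eff) // stride + 1

# A 3x3 kernel with dilationFactor 2 acts like a 5x5 kernel.
print(conv_output_size(h_in=32, kernel=3, stride=1, dilation=2))  # 28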
InnerProductLayerParams¶
A layer that performs a matrix-vector or matrix-matrix product. This is equivalent to a fully-connected, or dense layer. The weight parameters correspond to a matrix of dimensions (inputChannels, outputChannels) i.e. (C_in, C_out)
y = InnerProductLayer(x)
Requires 1 input and produces 1 output.
- Input
- The input can have rank 1 to rank 5. For rank > 1, it is reshaped into a matrix as follows:
  rank 1 (x1): the layer corresponds to a matrix-vector product; x1 must be equal to C_in
  rank 2 (x1, x2): x2 must be equal to C_in
  rank 3 (x1, x2, x3) --> (x1 * x2, x3); x3 must be equal to C_in
  rank 4 (x1, x2, x3, x4) --> (x1, x2 * x3 * x4); x2 * x3 * x4 must be equal to C_in
  rank 5 (x1, x2, x3, x4, x5) --> (x1 * x2, x3 * x4 * x5); x3 * x4 * x5 must be equal to C_in
- Output
- The output rank is the same as the input rank:
  rank 1: (C_out)
  rank 2: (x1, C_out)
  rank 3: (x1, x2, C_out)
  rank 4: (x1, C_out, 1, 1)
  rank 5: (x1, x2, C_out, 1, 1)
message InnerProductLayerParams {
uint64 inputChannels = 1;
uint64 outputChannels = 2;
bool hasBias = 10;
WeightParams weights = 20;
WeightParams bias = 21;
}
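A minimal numpy sketch of the computation, with the weight matrix stored as (C_in, C_out) as stated above (illustration only):

import numpy as np

def inner_product(x, W, b=None):
    # x: (..., C_in) after the reshaping rules above; W: (C_in, C_out).
    y = x @ W
    if b is not None:
        y = y + b                                    # b: (C_out,)
    return y

x = np.random.rand(8, 16).astype(np.float32)         # rank 2: (x1, C_in)
W = np.random.rand(16, 4).astype(np.float32)         # (C_in, C_out)
print(inner_product(x, W).shape)                     # (8, 4)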
EmbeddingLayerParams¶
A layer that performs a matrix lookup and optionally adds a bias. The weights matrix is stored with dimensions [outputChannels, inputDim].
y = EmbeddingLayer(x)
Requires 1 input and produces 1 output.
- Input
- Input values must be in the range [0, inputDim - 1]. The input must have rank equal to 4 or 5, such that the last 3 dimensions are all 1.
  rank 4: shape (x1, 1, 1, 1). x1 is effectively the batch/sequence length.
  rank 5: shape (x1, x2, 1, 1, 1). x1 * x2 is effectively the combined batch/sequence length.
- Output
- The output rank is the same as the input rank (see the input description above).
  rank 4: shape (x1, outputChannels, 1, 1)
  rank 5: shape (x1, x2, outputChannels, 1, 1)
message EmbeddingLayerParams {
uint64 inputDim = 1;
uint64 outputChannels = 2;
bool hasBias = 10;
WeightParams weights = 20;
WeightParams bias = 21;
}
EmbeddingNDLayerParams¶
A layer that performs a matrix lookup and optionally adds a bias. The weights matrix is stored with dimensions [embeddingSize, vocabSize].
y = EmbeddingNDLayer(x)
Requires 1 input and produces 1 output.
- Input
- Input values must be in the range [0, vocabSize - 1]. The input must have rank at least 2; the last dimension must always be 1.
  rank 2: shape (x1, 1). x1 is the batch/sequence length.
  rank 3: shape (x1, x2, 1). x1 * x2 is effectively the combined batch/sequence length.
  rank 4: shape (x1, x2, x3, 1). x1 * x2 * x3 is effectively the combined batch/sequence length.
  rank 5: shape (x1, x2, x3, x4, 1). x1 * x2 * x3 * x4 is effectively the combined batch/sequence length.
- Output
- The output rank is the same as the input rank (see the input description above).
  rank 2: shape (x1, embeddingSize)
  rank 3: shape (x1, x2, embeddingSize)
  rank 4: shape (x1, x2, x3, embeddingSize)
  rank 5: shape (x1, x2, x3, x4, embeddingSize)
message EmbeddingNDLayerParams {
uint64 vocabSize = 1;
uint64 embeddingSize = 2;
bool hasBias = 3;
WeightParams weights = 20;
WeightParams bias = 21;
}
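A minimal numpy sketch of the lookup, with the weight matrix stored as [embeddingSize, vocabSize] as documented above (illustration only):

import numpy as np

def embedding_nd(indices, W, b=None):
    # indices: integer array of shape (..., 1); W: (embeddingSize, vocabSize).
    y = W[:, indices[..., 0]]          # gather columns: (embeddingSize, ...)
    y = np.moveaxis(y, 0, -1)          # -> (..., embeddingSize)
    if b is not None:
        y = y + b
    return y

W = np.random.rand(8, 100).astype(np.float32)   # embeddingSize=8, vocabSize=100
ids = np.array([[3], [42], [7]])                # rank 2: (x1, 1)
print(embedding_nd(ids, W).shape)               # (3, 8)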
BatchnormLayerParams¶
A layer that performs batch normalization, which is performed along axis = -3, and repeated along the other axes, if present.
y = BatchnormLayer(x)
Requires 1 input and produces 1 output.
This operation is described by the following formula:

y = gamma * (x - mean) / sqrt(variance + epsilon) + beta

computed along the channel axis (axis = -3).
- Input
- A blob with rank greater than or equal to 3. Example: a rank 4 blob represents [Batch, channels, height, width]. For ranks greater than 3, the leading dimensions, starting from 0 to -4 (inclusive), are all treated as batch.
- Output
- A blob with the same shape as the input.
message BatchnormLayerParams {
uint64 channels = 1;
bool computeMeanVar = 5;
bool instanceNormalization = 6;
float epsilon = 10;
WeightParams gamma = 15;
WeightParams beta = 16;
WeightParams mean = 17;
WeightParams variance = 18;
}
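A numpy sketch of the formula above for a rank 4 input [Batch, C, H, W] (our own reference code):

import numpy as np

def batchnorm(x, gamma, beta, mean, variance, epsilon=1e-5):
    # Per-channel parameters have shape (C,); reshape them to broadcast over H, W.
    s = (1, -1, 1, 1)
    return (gamma.reshape(s) * (x - mean.reshape(s))
            / np.sqrt(variance.reshape(s) + epsilon) + beta.reshape(s))

x = np.random.rand(2, 3, 4, 4).astype(np.float32)
g, b = np.ones(3, np.float32), np.zeros(3, np.float32)
m, v = np.zeros(3, np.float32), np.ones(3, np.float32)
print(batchnorm(x, g, b, m, v).shape)   # (2, 3, 4, 4)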
PoolingLayerParams¶
A spatial pooling layer.
y = PoolingLayer(x)
Requires 1 input and produces 1 output.
- Input
- A blob with rank greater than or equal to 4. A rank 4 blob represents [Batch, channels, height, width]. For ranks greater than 4, the leading dimensions, starting from 0 to -4 (inclusive), are all treated as batch.
- Output
- Rank is the same as the input. e.g.: for a rank 4 input, the output shape is [B, C, H_out, W_out].
Padding options are similar to ConvolutionLayerParams, with the additional option of ValidCompletePadding (includeLastPixel), which ensures that the last application of the kernel always includes the last pixel of the input image, if there is padding:

H_out = int_division_round_up((H_in + 2 * paddingAmounts[0] - kernelSize[0]), Stride[0]) + 1
if (paddingAmounts[0] > 0 or paddingAmounts[1] > 0) {
    if ((H_out - 1) * Stride[0] >= H_in + paddingAmounts[0]) {
        H_out = H_out - 1
    }
}

The equivalent expressions hold true for W_out as well.
Only symmetric padding is supported with this option.
message PoolingLayerParams {
enum PoolingType{
MAX = 0;
AVERAGE = 1;
L2 = 2;
}
PoolingType type = 1;
repeated uint64 kernelSize = 10;
repeated uint64 stride = 20;
message ValidCompletePadding {
repeated uint64 paddingAmounts = 10;
}
oneof PoolingPaddingType {
ValidPadding valid = 30;
SamePadding same = 31;
ValidCompletePadding includeLastPixel = 32;
}
bool avgPoolExcludePadding = 50;
bool globalPooling = 60;
}
PoolingLayerParams.ValidCompletePadding¶
message ValidCompletePadding {
repeated uint64 paddingAmounts = 10;
}
PaddingLayerParams¶
A layer that performs padding along spatial dimensions.
y = PaddingLayer(x)
Requires 1 input and produces 1 output.
- Input
- A blob with rank at least 3, e.g. a blob with shape [C, H_in, W_in]. For ranks greater than 3, the leading dimensions, starting from 0 to -4 (inclusive), are all treated as batch.
- Output
- Same rank as the input, e.g. a blob with shape [C, H_out, W_out].
Output dimensions are calculated as follows:
H_out = H_in + topPaddingAmount + bottomPaddingAmount
W_out = W_in + leftPaddingAmount + rightPaddingAmount
topPaddingAmount == Height startEdgeSize == borderAmounts[0].startEdgeSize
bottomPaddingAmount == Height endEdgeSize == borderAmounts[0].endEdgeSize
leftPaddingAmount == Width startEdgeSize == borderAmounts[1].startEdgeSize
rightPaddingAmount == Width endEdgeSize == borderAmounts[1].endEdgeSize
There are three types of padding:

- PaddingConstant, which fills a constant value at the border.
- PaddingReflection, which reflects the values at the border.
- PaddingReplication, which replicates the values at the border.

Given the following input of shape [1, 3, 4]:

 1  2  3  4
 5  6  7  8
 9 10 11 12

here is the output of applying the padding (top=2, left=2, bottom=0, right=0) with each of the supported types, each of shape [1, 5, 6]:

PaddingConstant (value = 0):

 0  0  0  0  0  0
 0  0  0  0  0  0
 0  0  1  2  3  4
 0  0  5  6  7  8
 0  0  9 10 11 12

PaddingReflection:

11 10  9 10 11 12
 7  6  5  6  7  8
 3  2  1  2  3  4
 7  6  5  6  7  8
11 10  9 10 11 12

PaddingReplication:

 1  1  1  2  3  4
 1  1  1  2  3  4
 1  1  1  2  3  4
 5  5  5  6  7  8
 9  9  9 10 11 12
message PaddingLayerParams {
message PaddingConstant {
float value = 1;
}
message PaddingReflection {
}
message PaddingReplication {
}
oneof PaddingType {
PaddingConstant constant = 1;
PaddingReflection reflection = 2;
PaddingReplication replication = 3;
}
BorderAmounts paddingAmounts = 10;
}
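numpy's pad function covers the same three modes, which makes the example above easy to reproduce (a sketch of the semantics, not of the Core ML implementation):

import numpy as np

x = np.arange(1, 13).reshape(1, 3, 4)          # the [1, 3, 4] input above
pad = ((0, 0), (2, 0), (2, 0))                 # top=2, left=2, bottom=0, right=0
constant  = np.pad(x, pad, mode="constant", constant_values=0)   # PaddingConstant
reflect   = np.pad(x, pad, mode="reflect")                       # PaddingReflection
replicate = np.pad(x, pad, mode="edge")                          # PaddingReplication
print(constant.shape)                          # (1, 5, 6)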
PaddingLayerParams.PaddingConstant¶
Fill a constant value in the padded region.
message PaddingConstant {
float value = 1;
}
PaddingLayerParams.PaddingReflection¶
Reflect the values at the border for padding.
message PaddingReflection {
}
PaddingLayerParams.PaddingReplication¶
Replicate the values at the border for padding.
message PaddingReplication {
}
ConcatLayerParams¶
A layer that concatenates along the axis = -3 or -5. For general concatenation along any axis, see ConcatNDLayer.
y = ConcatLayer(x1,x2,....)
Requires more than 1 input and produces 1 output.
- Input
- All input blobs must have the same rank. If "sequenceConcat" = False, rank must be greater than or equal to 3, and concatenation is along axis = -3. If "sequenceConcat" = True, rank must be greater than or equal to 5, and concatenation is along axis = -5.
- Output
- Same rank as the input.
message ConcatLayerParams {
bool sequenceConcat = 100;
}
LRNLayerParams¶
A layer that performs local response normalization (LRN).
y = LRNLayer(x)
Requires 1 input and produces 1 output.
- Input
- A blob with rank greater than or equal to 3. Example: a rank 4 blob represents [Batch, channels, height, width]. For ranks greater than 3, the leading dimensions, starting from 0 to -4 (inclusive), are all treated as batch.
- Output
- A blob with the same shape as the input.
This layer is described by the following formula:

y_i = x_i / (k + (alpha / localSize) * sum_j x_j^2)^beta

where the summation is done over a (localSize, 1, 1) neighborhood, that is, over a window "across" channels in 1x1 spatial neighborhoods.
message LRNLayerParams {
float alpha = 1;
float beta = 2;
uint64 localSize = 3;
float k = 4;
}
SoftmaxLayerParams¶
Softmax Normalization Layer
A layer that performs softmax normalization. Normalization is applied along axis = -3 or N-3 (where N is the rank of the input). For a softmax layer that can operate along any axis, see SoftmaxNDLayer.
y = SoftmaxLayer(x)
Requires 1 input and produces 1 output.
- Input
- Must be a blob with rank >= 3.
- Output
- A blob with the same shape as the input.
This layer is described by the following formula, applied along the channel axis:

y_i = exp(x_i) / sum_j exp(x_j)
message SoftmaxLayerParams {
}
SplitLayerParams¶
A layer that uniformly splits across axis = -3 to produce a specified number of outputs. For general split operation along any axis, see SplitNDLayer.
(y1,y2,...yN) = SplitLayer(x), where N = nOutputs
Requires 1 input and produces multiple outputs.
- Input
- A blob with rank at least 3, e.g. a blob with shape [C, H, W].
- Output
- nOutputs blobs, each with the same rank as the input. e.g.: for an input of shape [C, H, W], the output shapes will be [C/nOutputs, H, W].
message SplitLayerParams {
uint64 nOutputs = 1;
}
AddLayerParams¶
A layer that performs elementwise addition. This layer has limited broadcasting support. For general broadcasting see AddBroadcastableLayer.
y = AddLayer(x1,x2,...)
Requires 1 or more inputs and produces 1 output.
- Input
- In general, there are no rank constraints. However, only certain sets of shapes are broadcastable. For example: [B, 1, 1, 1], [B, C, 1, 1], [B, 1, H, W], [B, C, H, W]
- Output
- A blob with shape equal to the input blob.
If only one input is provided, scalar addition is performed:

y = x + alpha
message AddLayerParams {
float alpha = 1;
}
MultiplyLayerParams¶
A layer that performs elementwise multiplication. This layer has limited broadcasting support. For general broadcasting see MultiplyBroadcastableLayer.
y = MultiplyLayer(x1,x2,...)
Requires 1 or more inputs and produces 1 output.
- Input
- In general, there are no rank constraints. However, only certain sets of shapes are broadcastable. For example: [B, 1, 1, 1], [B, C, 1, 1], [B, 1, H, W], [B, C, H, W]
- Output
- A blob with shape equal to the first input blob.
If only one input is provided, scalar multiplication is performed:

y = alpha * x
message MultiplyLayerParams {
float alpha = 1;
}
UnaryFunctionLayerParams¶
A layer that applies a unary function.
y = UnaryFunctionLayer(x)
Requires 1 input and produces 1 output.
- Input
- A blob with no rank constraints.
- Output
- A blob with the same shape as the input.
The input is first modified by shifting and scaling:

x <-- scale * x + shift

before the selected operation is applied.
message UnaryFunctionLayerParams {
enum Operation{
SQRT = 0;
RSQRT = 1;
INVERSE = 2;
POWER = 3;
EXP = 4;
LOG = 5;
ABS = 6;
THRESHOLD = 7;
}
Operation type = 1;
float alpha = 2;
float epsilon = 3;
float shift = 4;
float scale = 5;
}
UpsampleLayerParams¶
A layer that scales up spatial dimensions. It supports two modes: nearest neighbour (default) and bilinear.
y = UpsampleLayer(x)
Requires 1 input and produces 1 output.
- Input
- A blob with rank at least 3, e.g. a blob with shape [C, H, W]. For ranks greater than 3, the leading dimensions, starting from 0 to -4 (inclusive), are all treated as batch.
- Output
- Same rank as the input, e.g. a blob with shape [C, scalingFactor[0] * H, scalingFactor[1] * W].
message UpsampleLayerParams {
repeated uint64 scalingFactor = 1;
enum InterpolationMode {
NN = 0;
BILINEAR = 1;
}
InterpolationMode mode = 5;
}
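A numpy sketch of the default nearest-neighbour mode (illustration only):

import numpy as np

def upsample_nn(x, scale_h, scale_w):
    # x: (..., H, W); repeat rows and columns by the scaling factors.
    return x.repeat(scale_h, axis=-2).repeat(scale_w, axis=-1)

x = np.arange(4.0).reshape(1, 2, 2)
print(upsample_nn(x, 2, 2).shape)   # (1, 4, 4)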
ResizeBilinearLayerParams¶
A layer that resizes the input to a pre-specified spatial size using bilinear interpolation.
y = ResizeBilinearLayer(x)
Requires 1 input and produces 1 output.
- Input
- A blob with rank at least 3, e.g. a blob with shape [C, H_in, W_in]. For ranks greater than 3, the leading dimensions, starting from 0 to -4 (inclusive), are all treated as batch.
- Output
- Same rank as the input, e.g. a blob with shape [C, H_out, W_out].
message ResizeBilinearLayerParams {
repeated uint64 targetSize = 1;
SamplingMode mode = 2;
}
CropResizeLayerParams¶
A layer that extracts cropped spatial patches or RoIs (regions of interest) from the input and resizes them to a pre-specified size using bilinear interpolation. Note that RoI Align layer can be implemented with this layer followed by a pooling layer.
y = CropResizeLayer(x)
Requires 2 inputs and produces 1 output.
- Input
- There are two inputs. The first input represents an image feature map. The second input represents the bounding box coordinates for N patches or RoIs (regions of interest).
  The first input is rank 5: [1, Batch, C, H_in, W_in]. The second input is rank 5; its shape can be either [N, 1, 4, 1, 1] or [N, 1, 5, 1, 1].
  N: number of patches/RoIs to be extracted.
  If RoI shape = [N, 1, 4, 1, 1]: axis = -3 corresponds to the four coordinates specifying the bounding box. All N RoIs are extracted from all the batches of the input.
  If RoI shape = [N, 1, 5, 1, 1]: the first element along axis = -3 specifies the input batch id from which to extract the RoI, and must be in the interval [0, Batch - 1]; that is, the n-th RoI is extracted from the RoI[n,0,0,0,0]-th input batch. The last four elements along axis = -3 specify the bounding box coordinates.
- Output
- A blob with rank 5.
  Shape is [N, Batch, C, H_out, W_out] if the input RoI shape is [N, 1, 4, 1, 1].
  Shape is [N, 1, C, H_out, W_out] if the input RoI shape is [N, 1, 5, 1, 1].
message CropResizeLayerParams {
repeated uint64 targetSize = 1;
bool normalizedCoordinates = 2;
SamplingMode mode = 3;
BoxCoordinatesMode boxIndicesMode = 4;
float spatialScale = 5;
}
BiasLayerParams¶
A layer that performs elementwise addition of a bias, which is broadcasted to match the input shape.
y = BiasLayer(x)
Requires 1 input and produces 1 output.
- Input
- A blob with rank at least 3, e.g. a blob with shape [C, H, W]. For ranks greater than 3, the leading dimensions, starting from 0 to -4 (inclusive), are all treated as batch.
- Output
- A blob with the same shape as the input.
message BiasLayerParams {
repeated uint64 shape = 1;
WeightParams bias = 2;
}
ScaleLayerParams¶
A layer that performs elementwise multiplication by a scale factor and optionally adds a bias; both the scale and bias are broadcasted to match the input shape.
y = ScaleLayer(x)
Requires 1 input and produces 1 output.
- Input
- A blob with rank at least 3, e.g. a blob with shape [C, H, W]. For ranks greater than 3, the leading dimensions, starting from 0 to -4 (inclusive), are all treated as batch.
- Output
- A blob with the same shape as the input.
message ScaleLayerParams {
repeated uint64 shapeScale = 1;
WeightParams scale = 2;
bool hasBias = 3;
repeated uint64 shapeBias = 4;
WeightParams bias = 5;
}
LoadConstantLayerParams¶
A layer that loads data as a parameter and provides it as an output. The output is rank 5. For general rank, see LoadConstantNDLayer.
y = LoadConstantLayer()
Takes no input. Produces 1 output.
- Input
- None
- Output
- A blob with rank 5 and shape [1, 1, C, H, W].
message LoadConstantLayerParams {
repeated uint64 shape = 1;
WeightParams data = 2;
}
L2NormalizeLayerParams¶
A layer that performs L2 normalization, i.e. divides by the square root of the sum of squares of all elements of the input.
y = L2NormalizeLayer(x)
Requires 1 input and produces 1 output.
- Input
- A blob with rank greater than or equal to 3. For ranks greater than 3, the leading dimensions, starting from 0 to -4 (inclusive), are all treated as batch.
- Output
- A blob with the same shape as the input.
This layer is described by the following formula:

y_i = x_i / sqrt(max(sum_j x_j^2, epsilon))
message L2NormalizeLayerParams {
float epsilon = 1;
}
FlattenLayerParams¶
A layer that flattens the input.
y = FlattenLayer(x)
Requires 1 input and produces 1 output.
- Input
- A blob with rank greater than or equal to 3, e.g. a rank 4 blob represents [Batch, C, H, W]. For ranks greater than 3, the leading dimensions, starting from 0 to -4 (inclusive), are all treated as batch.
- Output
- Same rank as the input, such that the last two dimensions are both 1. e.g.: for a rank 4 input, the output shape is [Batch, C * H * W, 1, 1].

There are two flatten orders: CHANNEL_FIRST and CHANNEL_LAST. CHANNEL_FIRST does not require data to be rearranged, because row-major ordering is used by internal storage. CHANNEL_LAST requires data to be rearranged.
message FlattenLayerParams {
enum FlattenOrder {
CHANNEL_FIRST = 0;
CHANNEL_LAST = 1;
}
FlattenOrder mode = 1;
}
ReshapeLayerParams¶
A layer that recasts the input into a new shape.
y = ReshapeLayer(x)
Requires 1 input and produces 1 output.
- Input
- A blob with rank 5, e.g. [1, 1, C, H, W] or [Seq, 1, C, H, W].
- Output
- A blob with rank 5, e.g. [1, 1, C_out, H_out, W_out] or [Seq_out, 1, C_out, H_out, W_out].
There are two reshape orders: CHANNEL_FIRST and CHANNEL_LAST.
CHANNEL_FIRST is equivalent to flattening the input to [Seq, 1, C * H * W, 1, 1] in channel-first order and then reshaping it to the target shape; no data rearrangement is required.
CHANNEL_LAST is equivalent to flattening the input to [Seq, 1, H * W * C, 1, 1] in channel-last order, reshaping it to [Seq_out, 1, H_out, W_out, C_out] (it is now in "H_out-major" order), and then permuting it to [C_out, H_out, W_out]; both the flattening and the permuting require the data to be rearranged.
message ReshapeLayerParams {
repeated int64 targetShape = 1;
enum ReshapeOrder {
CHANNEL_FIRST = 0;
CHANNEL_LAST = 1;
}
ReshapeOrder mode = 2;
}
PermuteLayerParams¶
A layer that rearranges the dimensions and data of an input. For generic transpose/permute operation see TransposeLayer.
y = PermuteLayer(x)
Requires 1 input and produces 1 output.
- Input
- Must be a rank 5 blob, e.g. shape [Seq, B, C, H, W].
- Output
- Rank 5 blob. A transposed version of the input, such that the dimension at axis = 1 (or axis = -4) is unchanged.

Examples (assume the input shape is [Seq, B, C, H, W]):

- If axis is set to [0, 3, 1, 2], then the output has shape [Seq, B, W, C, H]
- If axis is set to [3, 1, 2, 0], then the output has shape [W, B, C, H, Seq]
- If axis is set to [0, 3, 2, 1], then the output has shape [Seq, B, W, H, C]
- If axis is not set, or is set to [0, 1, 2, 3], the output is the same as the input.
message PermuteLayerParams {
repeated uint64 axis = 1;
}
ReorganizeDataLayerParams¶
A layer that reorganizes data in the input in specific ways.
y = ReorganizeDataLayer(x)
Requires 1 input and produces 1 output.
- Input
- A blob with rank at least 3, e.g. a blob with shape [C, H, W]. For ranks greater than 3, the leading dimensions, starting from 0 to -4 (inclusive), are all treated as batch.
- Output
- Same rank as the input, e.g. a blob with shape [C_out, H_out, W_out].
- mode == SPACE_TO_DEPTH
- [C_out, H_out, W_out] : [C * blockSize * blockSize, H/blockSize, W/blockSize]. blockSize must divide H and W. Data is moved from the spatial dimensions to the channel dimension. The input is spatially divided into non-overlapping blocks of size blockSize x blockSize, and the data from each block is moved into the channel dimension.
- mode == DEPTH_TO_SPACE
- [C_out, H_out, W_out] : [C/(blockSize * blockSize), H * blockSize, W * blockSize]. The square of blockSize must divide C. The reverse of SPACE_TO_DEPTH: data is moved from the channel dimension to the spatial dimensions.
message ReorganizeDataLayerParams {
enum ReorganizationType {
SPACE_TO_DEPTH = 0;
DEPTH_TO_SPACE = 1;
}
ReorganizationType mode = 1;
uint64 blockSize = 2;
}
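A numpy sketch of SPACE_TO_DEPTH for a [C, H, W] input (our own illustration; the exact ordering of the folded channels is an assumption and may differ from Core ML's):

import numpy as np

def space_to_depth(x, block):
    c, h, w = x.shape
    assert h % block == 0 and w % block == 0
    # Split H and W into (H/block, block) and (W/block, block) ...
    x = x.reshape(c, h // block, block, w // block, block)
    # ... then fold the two block axes into the channel axis.
    x = x.transpose(2, 4, 0, 1, 3)
    return x.reshape(c * block * block, h // block, w // block)

x = np.arange(16.0).reshape(1, 4, 4)
print(space_to_depth(x, 2).shape)   # (4, 2, 2)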
SliceLayerParams¶
A layer that slices the input data along axis = -1 or -2 or -3. For general slice along any axis, please see SliceStaticLayer/SliceDynamicLayer.
y = SliceLayer(x)
Requires 1 input and produces 1 output.
- Input
- A blob that can, in general, have any rank. However, depending on the value of "axis", there may be additional rank constraints.
- Output
- A blob with the same rank as the input.

The sliced section is taken from the interval [startIndex, endIndex), i.e. startIndex is inclusive while endIndex is exclusive. stride must be positive and represents the step size for slicing. Negative indexing is supported for startIndex and endIndex: -1 denotes N-1, -2 denotes N-2, and so on, where N is the length of the dimension to be sliced.
message SliceLayerParams {
int64 startIndex = 1;
int64 endIndex = 2;
uint64 stride = 3;
enum SliceAxis {
CHANNEL_AXIS = 0;
HEIGHT_AXIS = 1;
WIDTH_AXIS = 2;
}
// The following mapping is used for interpreting this parameter:
// CHANNEL_AXIS => axis = -3, input must have rank at least 3.
// HEIGHT_AXIS => axis = -2, input must have rank at least 2.
// WIDTH_AXIS => axis = -1
SliceAxis axis = 4;
}
ReduceLayerParams¶
A layer that reduces the input using a specified operation.
y = ReduceLayer(x)
Requires 1 input and produces 1 output.
- Input
- A blob that can, in general, have any rank. However, depending on the value of "axis", there may be additional rank constraints.
- Output
- A blob with the same rank as the input, which has 1s on the dimensions specified in the parameter "axis".

Values supported for axis are [-1], [-2], [-3], [-2,-1], [-3,-2,-1] and the equivalent positive values (depending on the rank of the input). For mode == 'ArgMax', axis must be [-1], [-2], or [-3].
message ReduceLayerParams {
enum ReduceOperation {
SUM = 0;
AVG = 1;
PROD = 2;
LOGSUM = 3;
SUMSQUARE = 4;
L1 = 5;
L2 = 6;
MAX = 7;
MIN = 8;
ARGMAX = 9;
}
ReduceOperation mode = 1;
float epsilon = 2;
enum ReduceAxis {
CHW = 0;
HW = 1;
C = 2;
H = 3;
W = 4;
}
// The following mapping is used for interpreting this parameter:
// CHW = axis [-3, -2, -1], input must have rank at least 3.
// HW = axis [-2, -1], input must have rank at least 2.
// C = axis [-3]
// H = axis [-2]
// W = axis [-1]
ReduceAxis axis = 3;
}
CropLayerParams¶
A layer that crops the spatial dimensions of an input. If two inputs are provided, the shape of the second input is used as the reference shape.
y = CropLayer(x1) or y = CropLayer(x1,x2)
Requires 1 or 2 inputs and produces 1 output.
- Input
- 1 or 2 tensors, each with rank at least 3; both inputs must have equal rank. Example:
  1 input case: a blob with shape [C, H_in, W_in].
  2 input case: 1st blob with shape [C, H_in, W_in], 2nd blob with shape [C, H_out, W_out].
  For ranks greater than 3, the leading dimensions, starting from 0 to -4 (inclusive), are all treated as batch.
- Output
- Same rank as the inputs, e.g. a blob with shape [C, H_out, W_out].
If one input is used, output is computed as follows:
y = x1[:, topCropAmount:H_in - bottomCropAmount, leftCropAmount:W_in - rightCropAmount]
topCropAmount == Height startEdgeSize == borderAmounts[0].startEdgeSize
bottomCropAmount == Height endEdgeSize == borderAmounts[0].endEdgeSize
leftCropAmount == Width startEdgeSize == borderAmounts[1].startEdgeSize
rightCropAmount == Width endEdgeSize == borderAmounts[1].endEdgeSize
H_out = H_in - topCropAmount - bottomCropAmount
W_out = W_in - leftCropAmount - rightCropAmount
If two inputs are used, output is computed as follows:
y = x1[:, offset[0]:offset[0] + H_out, offset[1]:offset[1] + W_out]
message CropLayerParams {
BorderAmounts cropAmounts = 1;
repeated uint64 offset = 5;
}
AverageLayerParams¶
A layer that computes the elementwise average of the inputs. This layer has limited broadcasting support. For general broadcasting see AddBroadcastableLayer.
y = AverageLayer(x1,x2,...)
Requires multiple inputs and produces 1 output.
- Input
- In general, there are no rank constraints. However, only certain sets of shapes are broadcastable. For example: [B, 1, 1, 1], [B, C, 1, 1], [B, 1, H, W], [B, C, H, W]
- Output
- A blob with the same shape as each input.
message AverageLayerParams {
}
MaxLayerParams¶
A layer that computes the elementwise maximum over the inputs.
y = MaxLayer(x1,x2,...)
Requires multiple inputs and produces 1 output.
- Input
- In general, there are no rank constraints. However, only certain sets of shapes are broadcastable. For example: [B, C, 1, 1], [B, C, H, W]
- Output
- A blob with the same shape as each input.
message MaxLayerParams {
}
MinLayerParams¶
A layer that computes the elementwise minimum over the inputs.
y = MinLayer(x1,x2,...)
Requires multiple inputs and produces 1 output.
- Input
- In general, there are no rank constraints. However, only certain sets of shapes are broadcastable. For example: [B, C, 1, 1], [B, C, H, W]
- Output
- A blob with the same shape as each input.
message MinLayerParams {
}
DotProductLayerParams¶
A layer that computes the dot product of two vectors.
y = DotProductLayer(x1,x2)
Requires 2 inputs and produces 1 output.
- Input
- Two blobs with rank at least 3, such that the last two dimensions must be 1, e.g. blobs with shape [B, C, 1, 1]. For ranks greater than 3, the leading dimensions, starting from 0 to -4 (inclusive), are all treated as batch.
- Output
- Same rank as the input, e.g. for rank 4 inputs, the output shape is [B, 1, 1, 1].
message DotProductLayerParams {
bool cosineSimilarity = 1;
}
MeanVarianceNormalizeLayerParams¶
A layer that performs mean variance normalization, along axis = -3.
y = MeanVarianceNormalizeLayer(x)
Requires 1 input and produces 1 output.
- Input
- A blob with rank greater than or equal to 3. Example: a rank 4 blob represents [Batch, channels, height, width]. For ranks greater than 3, the leading dimensions, starting from 0 to -4 (inclusive), are all treated as batch.
- Output
- A blob with the same shape as the input.
If acrossChannels == true, normalization is performed on the flattened input, i.e. the input is reshaped to (Batch, C), where "Batch" contains all dimensions from 0 to -4 (inclusive), and C contains dimensions -1, -2, -3.
If acrossChannels == false, normalization is performed within a channel, across the spatial dimensions (i.e. the last two dimensions).
message MeanVarianceNormalizeLayerParams {
bool acrossChannels = 1;
bool normalizeVariance = 2;
float epsilon = 3;
}
SequenceRepeatLayerParams¶
A layer that repeats a sequence, or the dimension sitting at axis = -5.
y = SequenceRepeatLayer(x)
Requires 1 input and produces 1 output.
- Input
- A blob with rank at least 5, e.g. shape [Seq, B, C, H, W].
- Output
- A blob with the same rank as the input, e.g.: for input shape [Seq, B, C, H, W], the output shape is [nRepetitions * Seq, B, C, H, W].
message SequenceRepeatLayerParams {
uint64 nRepetitions = 1;
}
SimpleRecurrentLayerParams¶
A simple recurrent layer.
y_t = SimpleRecurrentLayer(x_t, y_{t-1})
- Input
- A blob of rank 5, with shape [Seq, Batch, inputVectorSize, 1, 1]. This represents a sequence of vectors of size inputVectorSize.
- Output
- Same rank as the input. Represents a vector of size outputVectorSize. It is either the final output or a sequence of outputs at all time steps.
- Output shape: [1, Batch, outputVectorSize, 1, 1], if sequenceOutput == false
- Output shape: [Seq, Batch, outputVectorSize, 1, 1], if sequenceOutput == true
This layer is described by the following equation:

y_t = f(clip(W * x_t + R * y_{t-1} + b))

where:

- W is a 2-dimensional weight matrix ([outputVectorSize, inputVectorSize], row-major)
- R is a 2-dimensional recursion matrix ([outputVectorSize, outputVectorSize], row-major)
- b is a 1-dimensional bias vector ([outputVectorSize])
- f() is an activation
- clip() is a function that constrains values to the range [-50.0, 50.0]
message SimpleRecurrentLayerParams {
uint64 inputVectorSize = 1;
uint64 outputVectorSize = 2;
ActivationParams activation = 10;
// If false, output is just the result after the final state update.
// If true, output is a sequence, containing outputs at all time steps.
bool sequenceOutput = 15;
bool hasBiasVector = 20;
WeightParams weightMatrix = 30;
WeightParams recursionMatrix = 31;
WeightParams biasVector = 32;
// If true, then the node processes the input sequence from right to left
bool reverseInput = 100;
}
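A numpy sketch of one time step of this recurrence, following the equation above (our own reference code):

import numpy as np

def simple_recurrent_step(x_t, y_prev, W, R, b, f=np.tanh):
    # W: (outputVectorSize, inputVectorSize); R: (outputVectorSize, outputVectorSize)
    pre = W @ x_t + R @ y_prev + b
    return f(np.clip(pre, -50.0, 50.0))

W, R, b = np.random.rand(4, 3), np.random.rand(4, 4), np.zeros(4)
y = np.zeros(4)
for x_t in np.random.rand(5, 3):    # Seq = 5, inputVectorSize = 3
    y = simple_recurrent_step(x_t, y, W, R, b)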
GRULayerParams¶
Gated-Recurrent Unit (GRU) Layer
y_t = GRULayer(x_t, y_{t-1})
- Input
- A blob of rank 5, with shape [Seq, Batch, inputVectorSize, 1, 1]. This represents a sequence of vectors of size inputVectorSize.
- Output
- Same rank as the input. Represents a vector of size outputVectorSize. It is either the final output or a sequence of outputs at all time steps.
- Output shape: [1, Batch, outputVectorSize, 1, 1], if sequenceOutput == false
- Output shape: [Seq, Batch, outputVectorSize, 1, 1], if sequenceOutput == true
This layer is described by the following equations:

- Update Gate:       z_t = f(clip(W_z * x_t + R_z * y_{t-1} + b_z))
- Reset Gate:        r_t = f(clip(W_r * x_t + R_r * y_{t-1} + b_r))
- Cell Memory State: c_t = y_{t-1} ⊙ r_t
- Output Gate:       o_t = g(clip(W_o * x_t + R_o * c_t + b_o))
- Output:            y_t = (1 - z_t) ⊙ o_t + z_t ⊙ y_{t-1}

where:

- W_z, W_r, W_o are 2-dimensional input weight matrices ([outputVectorSize, inputVectorSize], row-major)
- R_z, R_r, R_o are 2-dimensional recursion matrices ([outputVectorSize, outputVectorSize], row-major)
- b_z, b_r, b_o are 1-dimensional bias vectors ([outputVectorSize])
- f(), g() are activations
- clip() is a function that constrains values to the range [-50.0, 50.0]
- ⊙ denotes the elementwise product of matrices
message GRULayerParams {
uint64 inputVectorSize = 1;
uint64 outputVectorSize = 2;
repeated ActivationParams activations = 10;
bool sequenceOutput = 15;
bool hasBiasVectors = 20;
WeightParams updateGateWeightMatrix = 30;
WeightParams resetGateWeightMatrix = 31;
WeightParams outputGateWeightMatrix = 32;
WeightParams updateGateRecursionMatrix = 50;
WeightParams resetGateRecursionMatrix = 51;
WeightParams outputGateRecursionMatrix = 52;
WeightParams updateGateBiasVector = 70;
WeightParams resetGateBiasVector = 71;
WeightParams outputGateBiasVector = 72;
bool reverseInput = 100;
}
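A minimal NumPy sketch of one GRU time step following the equations above (batch dimension omitted; sigmoid and tanh stand in for the configured f and g):
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_step(x, y_prev, Wz, Wr, Wo, Rz, Rr, Ro, bz, br, bo, f=sigmoid, g=np.tanh):
    clip = lambda v: np.clip(v, -50.0, 50.0)
    z = f(clip(Wz @ x + Rz @ y_prev + bz))  # update gate
    r = f(clip(Wr @ x + Rr @ y_prev + br))  # reset gate
    c = y_prev * r                          # cell memory state
    o = g(clip(Wo @ x + Ro @ c + bo))       # output gate
    return (1.0 - z) * o + z * y_prev       # output y_t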
LSTMParams¶
Long short-term memory (LSTM) parameters.
This is described by the following equations:
- Input Gate: i_t = f(clip(W_i * x_t + R_i * y_{t-1} + p_i ⊙ c_{t-1} + b_i))
- Forget Gate: f_t = f(clip(W_f * x_t + R_f * y_{t-1} + p_f ⊙ c_{t-1} + b_f))
- Block Input: z_t = g(clip(W_z * x_t + R_z * y_{t-1} + b_z))
- Cell Memory State: c_t = c_{t-1} ⊙ f_t + i_t ⊙ z_t
- Output Gate: o_t = f(clip(W_o * x_t + R_o * y_{t-1} + p_o ⊙ c_t + b_o))
- Output: y_t = h(c_t) ⊙ o_t
where:
- W_i, W_f, W_z, W_o are 2-dimensional input weight matrices ([outputVectorSize, inputVectorSize], row-major)
- R_i, R_f, R_z, R_o are 2-dimensional recursion matrices ([outputVectorSize, outputVectorSize], row-major)
- b_i, b_f, b_z, b_o are 1-dimensional bias vectors ([outputVectorSize])
- p_i, p_f, p_o are 1-dimensional peephole vectors ([outputVectorSize])
- f(), g(), h() are activations
- clip() is a function that constrains values between [-50.0, 50.0]
- ⊙ denotes the elementwise product of matrices
message LSTMParams {
bool sequenceOutput = 10;
bool hasBiasVectors = 20;
bool forgetBias = 30;
bool hasPeepholeVectors = 40;
bool coupledInputAndForgetGate = 50;
float cellClipThreshold = 60;
}
LSTMWeightParams¶
Weights for long short-term memory (LSTM) layers
message LSTMWeightParams {
WeightParams inputGateWeightMatrix = 1;
WeightParams forgetGateWeightMatrix = 2;
WeightParams blockInputWeightMatrix = 3;
WeightParams outputGateWeightMatrix = 4;
WeightParams inputGateRecursionMatrix = 20;
WeightParams forgetGateRecursionMatrix = 21;
WeightParams blockInputRecursionMatrix = 22;
WeightParams outputGateRecursionMatrix = 23;
//biases:
WeightParams inputGateBiasVector = 40;
WeightParams forgetGateBiasVector = 41;
WeightParams blockInputBiasVector = 42;
WeightParams outputGateBiasVector = 43;
//peepholes:
WeightParams inputGatePeepholeVector = 60;
WeightParams forgetGatePeepholeVector = 61;
WeightParams outputGatePeepholeVector = 62;
}
UniDirectionalLSTMLayerParams¶
A unidirectional long short-term memory (LSTM) layer.
(y_t, c_t) = UniDirectionalLSTMLayer(x_t, y_{t-1}, c_{t-1})
- Input
  - A blob of rank 5, with shape [Seq, Batch, inputVectorSize, 1, 1].
    This represents a sequence of vectors of size inputVectorSize.
- Output
  - Same rank as the input.
    Represents a vector of size outputVectorSize.
    It is either the final output or a sequence of outputs at all time steps.
  - Output Shape: [1, Batch, outputVectorSize, 1, 1], if sequenceOutput == false
  - Output Shape: [Seq, Batch, outputVectorSize, 1, 1], if sequenceOutput == true
message UniDirectionalLSTMLayerParams {
uint64 inputVectorSize = 1;
uint64 outputVectorSize = 2;
repeated ActivationParams activations = 10;
LSTMParams params = 15;
LSTMWeightParams weightParams = 20;
bool reverseInput = 100;
}
BiDirectionalLSTMLayerParams¶
Bidirectional long short-term memory (LSTM) layer
(y_t, c_t, y_t_reverse, c_t_reverse) = BiDirectionalLSTMLayer(x_t, y_{t-1}, c_{t-1}, y_{t-1}_reverse, c_{t-1}_reverse)
- Input
  - A blob of rank 5, with shape [Seq, Batch, inputVectorSize, 1, 1].
    This represents a sequence of vectors of size inputVectorSize.
- Output
  - Same rank as the input.
    Represents a vector of size 2 * outputVectorSize.
    It is either the final output or a sequence of outputs at all time steps.
  - Output Shape: [1, Batch, 2 * outputVectorSize, 1, 1], if sequenceOutput == false
  - Output Shape: [Seq, Batch, 2 * outputVectorSize, 1, 1], if sequenceOutput == true
The first LSTM operates on the input sequence in the forward direction. The second LSTM operates on the input sequence in the reverse direction.
Example: given the input sequence [x_1, x_2, x_3], where x_i are vectors at time index i:
The forward LSTM output is [yf_1, yf_2, yf_3], where yf_i are vectors of size outputVectorSize:
- yf_1 is the output at the end of sequence {x_1}
- yf_2 is the output at the end of sequence {x_1, x_2}
- yf_3 is the output at the end of sequence {x_1, x_2, x_3}
The backward LSTM output is [yb_1, yb_2, yb_3], where yb_i are vectors of size outputVectorSize:
- yb_1 is the output at the end of sequence {x_3}
- yb_2 is the output at the end of sequence {x_3, x_2}
- yb_3 is the output at the end of sequence {x_3, x_2, x_1}
Output of the bi-dir layer:
- if sequenceOutput = True: {[yf_1, yb_3], [yf_2, yb_2], [yf_3, yb_1]}
- if sequenceOutput = False: {[yf_3, yb_3]}
message BiDirectionalLSTMLayerParams {
uint64 inputVectorSize = 1;
uint64 outputVectorSize = 2;
repeated ActivationParams activationsForwardLSTM = 10;
repeated ActivationParams activationsBackwardLSTM = 11;
LSTMParams params = 15;
repeated LSTMWeightParams weightParams = 20;
}
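A minimal sketch of how the two directions are combined, given the per-direction output sequences (yf from the forward LSTM, yb from the backward LSTM, as in the example above):
import numpy as np

def bidir_combine(yf, yb, sequence_output):
    # yf = [yf_1, ..., yf_T]; yb = [yb_1, ..., yb_T], where yb_1 is
    # computed from the last element of the input sequence.
    if sequence_output:
        # [yf_1, yb_T], [yf_2, yb_{T-1}], ..., [yf_T, yb_1]
        return [np.concatenate([f, b]) for f, b in zip(yf, yb[::-1])]
    # final outputs of both directions: [yf_T, yb_T]
    return [np.concatenate([yf[-1], yb[-1]])]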
CustomLayerParams¶
message CustomLayerParams {
message CustomLayerParamValue {
oneof value {
double doubleValue = 10;
string stringValue = 20;
int32 intValue = 30;
int64 longValue = 40;
bool boolValue = 50;
}
}
string className = 10; // The name of the class (conforming to MLCustomLayer) corresponding to this layer
repeated WeightParams weights = 20; // Any weights -- these are serialized in binary format and memmapped at runtime
map<string, CustomLayerParamValue> parameters = 30; // these may be handled as strings, so this should not be large
string description = 40; // An (optional) description of the layer provided by the model creator. This information is displayed when viewing the model, but does not affect the model's execution on device.
}
CustomLayerParams.CustomLayerParamValue¶
message CustomLayerParamValue {
oneof value {
double doubleValue = 10;
string stringValue = 20;
int32 intValue = 30;
int64 longValue = 40;
bool boolValue = 50;
}
}
CustomLayerParams.ParametersEntry¶
message ParametersEntry {
string key = 1;
CustomLayerParamValue value = 2;
}
TransposeLayerParams¶
A layer that rearranges the dimensions of the input tensor according to the permutation given in axes (whose length must equal the rank of the input).
message TransposeLayerParams{
repeated uint64 axes = 1;
}
BatchedMatMulLayerParams¶
A layer that computes the matrix product of two tensors with NumPy-style broadcasting, where the matrices reside in the last two indices of each tensor.
y = BatchedMatMul(a,b)
Requires 1 or 2 inputs and produces 1 output. The first tensor, “a”, must be provided as an input. The second tensor can either be an input or provided as a weight matrix parameter.
- Input
- a: First N-D tensor
- b: Second N-D tensor (either an N rank input or a matrix, i.e. N=2, provided as a layer parameter)
- Output
- A tensor containing the matrix product of the two tensors.
  When there are two inputs: rank is max(2, rank(a), rank(b)).
  When there is one input: rank is the same as that of the input.
This operation behaves as follows:
- When there are two inputs:
  - If N >= 2 for both tensors, they are treated as batches of matrices residing in the last two indices. All the indices except the last two are broadcast using conventional rules.
  - If the first tensor is 1-D, it is converted to a 2-D tensor by prepending a 1 to its shape, e.g. (D) -> (1, D).
  - If the second tensor is 1-D, it is converted to a 2-D tensor by appending a 1 to its shape, e.g. (D) -> (D, 1).
- When there is one input:
  - The weight matrix corresponds to a matrix of shape (X1, X2). The values of X1 and X2 must be provided as layer parameters.
  - The input "a" is reshaped into a matrix by combining all the leading dimensions, except the last, into a batch dimension, e.g.:
    - if "a" is rank 1, (X1,) -> (1, X1); the output shape will be (X2,)
    - if "a" is rank 2, (B1, X1) -> no reshape needed; the output shape will be (B1, X2)
    - if "a" is rank 3, (B1, B2, X1) -> (B1 * B2, X1); the output shape will be (B1, B2, X2)
    - etc.
message BatchedMatMulLayerParams{
bool transposeA = 1;
bool transposeB = 2;
uint64 weightMatrixFirstDimension = 5;
uint64 weightMatrixSecondDimension = 6;
bool hasBias = 7;
WeightParams weights = 8;
WeightParams bias = 9;
}
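For the two-input case, np.matmul implements essentially the same batching and broadcasting rules; a minimal NumPy sketch that also mirrors the 1-D promotion and transpose flags described above:
import numpy as np

def batched_matmul(a, b, transpose_a=False, transpose_b=False):
    a, b = np.asarray(a), np.asarray(b)
    if a.ndim == 1:
        a = a[None, :]                 # (D) -> (1, D)
    if b.ndim == 1:
        b = b[:, None]                 # (D) -> (D, 1)
    if transpose_a:
        a = np.swapaxes(a, -1, -2)
    if transpose_b:
        b = np.swapaxes(b, -1, -2)
    # batch dims (all but the last two) broadcast; the last two matrix-multiply
    return np.matmul(a, b)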
ConcatNDLayerParams¶
message ConcatNDLayerParams{
int64 axis = 1;
}
SoftmaxNDLayerParams¶
A layer that performs softmax normalization along a specified axis.
y = SoftmaxNDLayer(x)
Requires 1 input and produces 1 output.
- Input
- An N-dimensional tensor.
- Output
- An N-dimensional tensor with the same shape as the input tensor.
message SoftmaxNDLayerParams{
int64 axis = 1;
}
ReverseLayerParams¶
message ReverseLayerParams {
repeated bool reverseDim = 1;
}
ReverseSeqLayerParams¶
message ReverseSeqLayerParams {
int64 batchAxis = 1; // batch axis has to be strictly less than seq_axis
int64 sequenceAxis = 2;
}
LoadConstantNDLayerParams¶
A layer that loads data as a parameter and provides it as an output.
y = LoadConstantNDLayer()
Takes no input. Produces 1 output.
- Input
- None
- Output:
- A blob whose rank is between 1 and 4.
message LoadConstantNDLayerParams {
repeated uint64 shape = 1;
WeightParams data = 2;
}
FillLikeLayerParams¶
A layer that outputs a tensor filled with a scalar value, with the same shape as its input.
y = FillLikeLayer(x)
Requires 1 input and produces 1 output.
- Input
  - A tensor whose shape is used as the output shape (its values are ignored).
- Output
  - A tensor of the input's shape, with every element set to value.
message FillLikeLayerParams{
float value = 1;
}
FillStaticLayerParams¶
A layer that outputs a tensor filled with a scalar value.
y = FillStaticLayer()
Takes no input and produces 1 output.
- Input
  - None; the output shape is given by the targetShape parameter.
- Output
  - A tensor of shape targetShape, with every element set to value.
message FillStaticLayerParams{
float value = 1;
repeated uint64 targetShape = 2;
}
FillDynamicLayerParams¶
A layer that outputs a tensor filled with a scalar value, whose shape is provided as an input.
y = FillDynamicLayer(x)
Requires 1 input and produces 1 output.
- Input
  - A rank-1 tensor specifying the shape of the output.
- Output
  - A tensor of the specified shape, with every element set to value.
message FillDynamicLayerParams{
float value = 1;
}
WhereBroadcastableLayerParams¶
A layer that selects elements from two tensors according to a condition tensor, with NumPy-style broadcasting across all three inputs (analogous to numpy.where(condition, a, b)).
y = WhereBroadcastableLayer(condition, a, b)
Requires 3 inputs and produces 1 output.
message WhereBroadcastableLayerParams{
}
SinLayerParams¶
A layer that computes the elementwise sine function.
y = SinLayer(x)
Requires 1 input and produces 1 output.
- Input
- An N-dimensional tensor
- Output
- An N-dimensional tensor with the same shape as the input tensor.
message SinLayerParams{
}
CosLayerParams¶
A layer that computes the elementwise cosine function.
y = CosLayer(x)
Requires 1 input and produces 1 output.
- Input
- An N-dimensional tensor
- Output
- An N-dimensional tensor with the same shape as the input tensor.
message CosLayerParams{
}
TanLayerParams¶
A layer that computes the elementwise tangent function.
y = TanLayer(x)
Requires 1 input and produces 1 output.
- Input
- An N-dimensional tensor
- Output
- An N-dimensional tensor with the same shape as the input tensor.
message TanLayerParams{
}
AsinLayerParams¶
A layer that computes the elementwise inverse sine (arcsine) function.
y = AsinLayer(x)
Requires 1 input and produces 1 output.
- Input
- An N-dimensional tensor
- Output
- An N-dimensional tensor with the same shape as the input tensor.
message AsinLayerParams{
}
AcosLayerParams¶
A layer that computes the elementwise inverse cosine (arccosine) function.
y = AcosLayer(x)
Requires 1 input and produces 1 output.
- Input
- An N-dimensional tensor
- Output
- An N-dimensional tensor with the same shape as the input tensor.
message AcosLayerParams{
}
AtanLayerParams¶
A layer that computes the elementwise inverse tangent (arctangent) function.
y = AtanLayer(x)
Requires 1 input and produces 1 output.
- Input
- An N-dimensional tensor
- Output
- An N-dimensional tensor with the same shape as the input tensor.
message AtanLayerParams{
}
SinhLayerParams¶
A layer that computes the elementwise hyperbolic sine function.
y = SinhLayer(x)
Requires 1 input and produces 1 output.
- Input
- An N-dimensional tensor
- Output
- An N-dimensional tensor with the same shape as the input tensor.
message SinhLayerParams{
}
CoshLayerParams¶
A layer that computes the elementwise hyperbolic cosine function.
y = CoshLayer(x)
Requires 1 input and produces 1 output.
- Input
- An N-dimensional tensor
- Output
- An N-dimensional tensor with the same shape as the input tensor.
message CoshLayerParams{
}
TanhLayerParams¶
A layer that computes the elementwise hyperbolic tangent function.
y = TanhLayer(x)
Requires 1 input and produces 1 output.
- Input
- An N-dimensional tensor
- Output
- An N-dimensional tensor with the same shape as the input tensor.
message TanhLayerParams{
}
AsinhLayerParams¶
A layer that computes the elementwise inverse hyperbolic sine function.
y = AsinhLayer(x)
Requires 1 input and produces 1 output.
- Input
- An N-dimensional tensor
- Output
- An N-dimensional tensor with the same shape as the input tensor.
message AsinhLayerParams{
}
AcoshLayerParams¶
A layer that computes the elementwise inverse hyperbolic cosine function.
y = AcoshLayer(x)
Requires 1 input and produces 1 output.
- Input
- An N-dimensional tensor
- Output
- An N-dimensional tensor with the same shape as the input tensor.
message AcoshLayerParams{
}
AtanhLayerParams¶
A layer that computes the elementwise inverse hyperbolic tangent function.
y = AtanhLayer(x)
Requires 1 input and produces 1 output.
- Input
- An N-dimensional tensor
- Output
- An N-dimensional tensor with the same shape as the input tensor.
message AtanhLayerParams{
}
PowBroadcastableLayerParams¶
A layer that raises each element of the first tensor to the power of the corresponding element of the second tensor. Supports conventional NumPy-style broadcasting.
y = PowBroadcastableLayer(x1, x2)
Requires 2 inputs and produces 1 output.
- Input
- First N-D tensor
- Second N-D tensor
- Output
- An N-dimensional tensor with the broadcast shape.
message PowBroadcastableLayerParams{
}
Exp2LayerParams¶
A layer that computes the base-2 exponential, 2^x, of all elements in the input tensor.
y = Exp2Layer(x)
Requires 1 input and produces 1 output.
- Input
- An N-dimensional tensor
- Output
- An N-dimensional tensor with the same shape as the input tensor.
message Exp2LayerParams{
}
WhereNonZeroLayerParams¶
message WhereNonZeroLayerParams{
}
MatrixBandPartLayerParams¶
Parameters for the matrix_band_part layer, which keeps only a central band of each matrix:
band(m, n) = (num_lower < 0 || (m - n) <= num_lower) && (num_upper < 0 || (n - m) <= num_upper)
output[i, j, k, ..., m, n] = band(m, n) * input[i, j, k, ..., m, n]
message MatrixBandPartLayerParams{
int64 numLower = 1;
int64 numUpper = 2;
}
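A minimal NumPy sketch of the band(m, n) predicate above, applied to the last two dimensions:
import numpy as np

def matrix_band_part(x, num_lower, num_upper):
    rows, cols = x.shape[-2], x.shape[-1]
    m = np.arange(rows)[:, None]
    n = np.arange(cols)[None, :]
    band = ((num_lower < 0) | ((m - n) <= num_lower)) & \
           ((num_upper < 0) | ((n - m) <= num_upper))
    return x * band  # band broadcasts over any leading batch dimensions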
UpperTriangularLayerParams¶
message UpperTriangularLayerParams{
int64 k = 1; // Diagonal below which to zero elements. k = 0 (the default) is the main diagonal, k < 0 is below it and k > 0 is above
}
LowerTriangularLayerParams¶
message LowerTriangularLayerParams{
int64 k = 1; // Diagonal above which to zero elements. k = 0 (the default) is the main diagonal, k < 0 is below it and k > 0 is above
}
BroadcastToLikeLayerParams¶
A layer that broadcasts a tensor to a new shape.
y = BroadcastToLikeLayer(x)
Requires 2 inputs and produces 1 output.
- Input
  - The tensor to broadcast, followed by a tensor whose shape is used as the broadcast target (its values are ignored).
- Output
  - A tensor broadcast to the target shape.
message BroadcastToLikeLayerParams{
}
BroadcastToStaticLayerParams¶
A layer that broadcasts a tensor to a new shape.
y = BroadcastToStaticLayer(x)
Requires 1 input and produces 1 output.
- Input
  - A tensor.
- Output
  - A tensor broadcast to the shape given by the targetShape parameter.
message BroadcastToStaticLayerParams{
repeated uint64 targetShape = 1;
}
BroadcastToDynamicLayerParams¶
A layer that broadcasts a tensor to a new shape.
y = BroadcastToDynamicLayer(x)
Requires 2 inputs and produces 1 output.
- Input
  - The tensor to broadcast, followed by a rank-1 tensor specifying the target shape.
- Output
  - A tensor broadcast to the target shape.
message BroadcastToDynamicLayerParams{
}
AddBroadcastableLayerParams¶
message AddBroadcastableLayerParams{
}
MaxBroadcastableLayerParams¶
message MaxBroadcastableLayerParams{
}
MinBroadcastableLayerParams¶
message MinBroadcastableLayerParams{
}
ModBroadcastableLayerParams¶
message ModBroadcastableLayerParams{
}
FloorDivBroadcastableLayerParams¶
message FloorDivBroadcastableLayerParams{
}
SubtractBroadcastableLayerParams¶
message SubtractBroadcastableLayerParams{
}
MultiplyBroadcastableLayerParams¶
message MultiplyBroadcastableLayerParams{
}
DivideBroadcastableLayerParams¶
message DivideBroadcastableLayerParams{
}
GatherLayerParams¶
Requires 2 inputs and produces 1 output.
Given two inputs, ‘data’ and ‘indices’, gathers the slices of ‘data’ selected by ‘indices’ and stores them in the output, e.g. output[i] = data[indices[i]], for i in [0, length(indices) - 1] (1-D case, axis = 0). Negative indices and a negative axis are supported.
message GatherLayerParams{
int64 axis = 1;
}
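A small NumPy illustration (np.take follows the same convention, including negative indices):
import numpy as np

data = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
indices = np.array([2, 0, -1])
# output[i] = data[indices[i]] along axis 0
out = np.take(data, indices, axis=0)
# out == [[5., 6.], [1., 2.], [5., 6.]]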
ScatterLayerParams¶
message ScatterLayerParams {
int64 axis = 1;
ScatterMode mode = 2;
}
GatherNDLayerParams¶
message GatherNDLayerParams{
}
ScatterNDLayerParams¶
message ScatterNDLayerParams{
ScatterMode mode = 1;
}
GatherAlongAxisLayerParams¶
message GatherAlongAxisLayerParams{
int64 axis = 1;
}
ScatterAlongAxisLayerParams¶
message ScatterAlongAxisLayerParams{
int64 axis = 1;
ScatterMode mode = 2;
}
RankPreservingReshapeLayerParams¶
Reshape layer that does not alter the rank of the input. Order of the data is left unchanged.
Requires 1 input and produces 1 output.
A targetShape entry of 0 copies the corresponding input dimension, and an entry of -1 is inferred from the remaining dimensions, e.g.:
input shape = (20,10) targetShape = (5,-1) output shape = (5,40)
input shape = (20,10,5) targetShape = (0,2,25) output shape = (20,2,25)
input shape = (10,3,5) targetShape = (25,0,-1) output shape = (25,3,2)
message RankPreservingReshapeLayerParams {
repeated int64 targetShape = 1;
}
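A minimal NumPy sketch of the targetShape conventions (0 copies the input dimension, -1 is inferred):
import numpy as np

def rank_preserving_reshape(x, target_shape):
    # replace each 0 entry with the corresponding input dimension;
    # np.reshape then infers the single -1 entry, if present
    shape = [x.shape[i] if t == 0 else t for i, t in enumerate(target_shape)]
    return x.reshape(shape)

x = np.zeros((10, 3, 5))
assert rank_preserving_reshape(x, (25, 0, -1)).shape == (25, 3, 2)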
RandomNormalLikeLayerParams¶
A tensor with the shape of the input tensor, filled with random normal values.
- Parameters:
  - seed: seed used for the normal distribution.
  - mean: mean of the normal distribution.
  - stdDev: standard deviation of the normal distribution.
- Output:
  - A tensor filled with random normal values.
message RandomNormalLikeLayerParams {
int64 seed = 1;
float mean = 2;
float stdDev = 3;
}
RandomNormalStaticLayerParams¶
A tensor of the shape given by outputShape, filled with random normal values.
- Parameters:
  - seed: seed used for the normal distribution.
  - mean: mean of the normal distribution.
  - stdDev: standard deviation of the normal distribution.
  - outputShape: shape of the output tensor.
- Output:
  - A tensor of shape outputShape, filled with random normal values.
message RandomNormalStaticLayerParams {
int64 seed = 1;
float mean = 2;
float stdDev = 3;
repeated uint64 outputShape = 4;
}
RandomNormalDynamicLayerParams¶
A tensor whose shape is provided as an input, filled with random normal values.
- Parameters:
  - seed: seed used for the normal distribution.
  - mean: mean of the normal distribution.
  - stdDev: standard deviation of the normal distribution.
- Output:
  - A tensor filled with random normal values.
message RandomNormalDynamicLayerParams {
int64 seed = 1;
float mean = 2;
float stdDev = 3;
}
RandomUniformLikeLayerParams¶
A tensor with the shape of the input tensor, filled with random uniform values.
- Parameters:
  - seed: seed used for the uniform distribution.
  - minVal: lower bound on the range of random values for the uniform distribution.
  - maxVal: upper bound on the range of random values for the uniform distribution.
- Output:
  - A tensor filled with random uniform values.
message RandomUniformLikeLayerParams {
int64 seed = 1;
float minVal = 2;
float maxVal = 3;
}
RandomUniformStaticLayerParams¶
A tensor of the shape given by outputShape, filled with random uniform values.
- Parameters:
  - seed: seed used for the uniform distribution.
  - minVal: lower bound on the range of random values for the uniform distribution.
  - maxVal: upper bound on the range of random values for the uniform distribution.
  - outputShape: shape of the output tensor.
- Output:
  - A tensor of shape outputShape, filled with random uniform values.
message RandomUniformStaticLayerParams {
int64 seed = 1;
float minVal = 2;
float maxVal = 3;
repeated uint64 outputShape = 4;
}
RandomUniformDynamicLayerParams¶
A tensor whose shape is provided as an input, filled with random uniform values.
- Parameters:
  - seed: seed used for the uniform distribution.
  - minVal: lower bound on the range of random values for the uniform distribution.
  - maxVal: upper bound on the range of random values for the uniform distribution.
- Output:
  - A tensor filled with random uniform values.
message RandomUniformDynamicLayerParams {
int64 seed = 1;
float minVal = 2;
float maxVal = 3;
}
RandomBernoulliLikeLayerParams¶
A tensor with the shape of the input tensor, filled with samples drawn from a Bernoulli distribution.
- Parameters:
  - seed: seed used for the Bernoulli distribution.
  - prob: probability of a 1 event.
- Output:
  - A tensor of Bernoulli samples.
message RandomBernoulliLikeLayerParams {
int64 seed = 1;
float prob = 2;
}
RandomBernoulliStaticLayerParams¶
A tensor of the shape given by outputShape, filled with samples drawn from a Bernoulli distribution.
- Parameters:
  - seed: seed used for the Bernoulli distribution.
  - prob: probability of a 1 event.
  - outputShape: shape of the output tensor.
- Output:
  - A tensor of shape outputShape, filled with Bernoulli samples.
message RandomBernoulliStaticLayerParams {
int64 seed = 1;
float prob = 2;
repeated uint64 outputShape = 3;
}
RandomBernoulliDynamicLayerParams¶
A tensor whose shape is provided as an input, filled with samples drawn from a Bernoulli distribution.
- Parameters:
  - seed: seed used for the Bernoulli distribution.
  - prob: probability of a 1 event.
- Output:
  - A tensor of Bernoulli samples.
message RandomBernoulliDynamicLayerParams {
int64 seed = 1;
float prob = 2;
}
CategoricalDistributionLayerParams¶
A tensor filled with samples drawn from a categorical distribution.
- Parameters:
  - seed: seed used for the categorical distribution.
  - numSamples: number of samples to draw.
  - isLogits: true if the inputs are logits, false if the inputs are probabilities.
  - eps: default value is 1e-10.
  - temperature: default value is 1.0.
- Output:
  - A tensor of samples drawn from the categorical distribution.
message CategoricalDistributionLayerParams {
int64 seed = 1;
int64 numSamples = 2;
bool isLogits = 3;
float eps = 4;
float temperature = 5;
}
ReduceL1LayerParams¶
message ReduceL1LayerParams {
repeated int64 axes = 1;
bool keepDims = 2;
bool reduceAll = 3;
}
ReduceL2LayerParams¶
message ReduceL2LayerParams {
repeated int64 axes = 1;
bool keepDims = 2;
bool reduceAll = 3;
}
ReduceMaxLayerParams¶
message ReduceMaxLayerParams {
repeated int64 axes = 1;
bool keepDims = 2;
bool reduceAll = 3;
}
ReduceMinLayerParams¶
message ReduceMinLayerParams {
repeated int64 axes = 1;
bool keepDims = 2;
bool reduceAll = 3;
}
ReduceSumLayerParams¶
message ReduceSumLayerParams {
repeated int64 axes = 1;
bool keepDims = 2;
bool reduceAll = 3;
}
ReduceProdLayerParams¶
message ReduceProdLayerParams {
repeated int64 axes = 1;
bool keepDims = 2;
bool reduceAll = 3;
}
ReduceMeanLayerParams¶
message ReduceMeanLayerParams {
repeated int64 axes = 1;
bool keepDims = 2;
bool reduceAll = 3;
}
ReduceLogSumLayerParams¶
message ReduceLogSumLayerParams {
repeated int64 axes = 1;
bool keepDims = 2;
bool reduceAll = 3;
}
ReduceSumSquareLayerParams¶
message ReduceSumSquareLayerParams {
repeated int64 axes = 1;
bool keepDims = 2;
bool reduceAll = 3;
}
ReduceLogSumExpLayerParams¶
message ReduceLogSumExpLayerParams {
repeated int64 axes = 1;
bool keepDims = 2;
bool reduceAll = 3;
}
ExpandDimsLayerParams¶
Increase the rank of the input tensor by adding unit dimensions.
Requires 1 input and produces 1 output.
e.g.:
input shape = (10,5) axes = (0,1) output shape = (1,1,10,5)
input shape = (10,5) axes = (0,2) output shape = (1,10,1,5)
input shape = (10,5) axes = (-2,-1) output shape = (10,5,1,1)
message ExpandDimsLayerParams {
repeated int64 axes = 1;
}
FlattenTo2DLayerParams¶
message FlattenTo2DLayerParams {
int64 axis = 1;
}
ReshapeStaticLayerParams¶
message ReshapeStaticLayerParams {
repeated int64 targetShape = 1;
}
ReshapeLikeLayerParams¶
message ReshapeLikeLayerParams {
}
ReshapeDynamicLayerParams¶
message ReshapeDynamicLayerParams {
}
SqueezeLayerParams¶
Decrease the rank of the input tensor by removing unit dimensions.
Requires 1 input and produces 1 output.
The output rank is the input rank minus the number of dimensions removed, except that a rank-1 input always produces a rank-1 output (see the last example below).
e.g.:
input shape = (1,1,10,5) axes = (0,1) output shape = (10,5)
input shape = (1,10,5,1) axes = (0,3) output shape = (10,5)
input shape = (10,5,1,1) axes = (-2,-1) output shape = (10,5)
input shape = (1,) axes = (0) output shape = (1,)
message SqueezeLayerParams {
repeated int64 axes = 1;
bool squeezeAll = 2; // if true squeeze all dimensions that are 1.
}
TopKLayerParams¶
Return top K (or bottom K) values and the corresponding indices of the input along a given axis.
Requires 1 or 2 inputs and produces 2 outputs. The second input, the value of K, is optional; if there is only one input, the value of K specified in the layer parameter is used.
Both outputs have the same rank as the first input. Second input must correspond to a scalar tensor.
e.g.:
first input’s shape = (45, 34, 10, 5) axis = 1 output shape, for both outputs = (45, K, 10, 5)
message TopKLayerParams {
int64 axis = 1;
uint64 K = 2;
bool useBottomK = 3;
}
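A minimal NumPy sketch of top-K (or bottom-K) values and indices along an axis:
import numpy as np

def top_k(x, k, axis=-1, use_bottom_k=False):
    order = np.argsort(x, axis=axis)            # ascending indices
    if not use_bottom_k:
        order = np.flip(order, axis=axis)       # descending for top-K
    indices = np.take(order, np.arange(k), axis=axis)
    values = np.take_along_axis(x, indices, axis=axis)
    return values, indices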
ArgMaxLayerParams¶
Return the indices of the maximum value along a specified axis in a tensor.
Requires 1 input and produces 1 output.
Output has the same rank as the input if “removeDim” is False (default). Output has rank one less than the input if “removeDim” is True and input rank is more than 1.
e.g.:
input shape = (45, 34, 10, 5) axis = -2 output shape = (45, 1, 10, 5), if removeDim = False (default) output shape = (45, 10, 5), if removeDim = True
input shape = (5,) axis = 0 output shape = (1,), if removeDim = False or True
message ArgMaxLayerParams {
int64 axis = 1;
bool removeDim = 2;
}
ArgMinLayerParams¶
Return the indices of the minimum value along a specified axis in a tensor.
Requires 1 input and produces 1 output.
Output has the same rank as the input if “removeDim” is False (default). Output has rank one less than the input if “removeDim” is True and input rank is more than 1.
e.g.:
input shape = (45, 34, 10, 5) axis = -2 output shape = (45, 1, 10, 5), if removeDim = False (default) output shape = (45, 10, 5), if removeDim = True
input shape = (5,) axis = 0 output shape = (1,), if removeDim = False or True
message ArgMinLayerParams {
int64 axis = 1;
bool removeDim = 2;
}
SplitNDLayerParams¶
Supports unequal splits and negative indexing.
message SplitNDLayerParams{
int64 axis = 1;
uint64 numSplits = 2;
repeated uint64 splitSizes = 3;
}
CeilLayerParams¶
message CeilLayerParams{
}
RoundLayerParams¶
message RoundLayerParams{
}
FloorLayerParams¶
message FloorLayerParams{
}
SignLayerParams¶
message SignLayerParams{
}
ClipLayerParams¶
Clip the input values to lie between minimum and maximum threshold values.
Requires 1 input and produces 1 output. Parameter minVal: the minimum threshold. Parameter maxVal: the maximum threshold.
output = min(max(input, minVal), maxVal)
message ClipLayerParams{
float minVal = 1;
float maxVal = 2;
}
SliceStaticLayerParams¶
Supports negative indexing and negative strides.
message SliceStaticLayerParams{
repeated int64 beginIds = 1;
repeated bool beginMasks = 2;
repeated int64 endIds = 3;
repeated bool endMasks = 4;
repeated int64 strides = 5;
}
SliceDynamicLayerParams¶
Supports negative indexing and negative strides.
message SliceDynamicLayerParams{
repeated bool beginMasks = 2;
repeated int64 endIds = 3;
repeated bool endMasks = 4;
repeated int64 strides = 5;
}
TileLayerParams¶
message TileLayerParams{
repeated uint64 reps = 1;
}
GetShapeLayerParams¶
Input: a tensor. Output: a vector of length R, where R is the rank of the input tensor
message GetShapeLayerParams {
}
ErfLayerParams¶
Error function (also known as Gauss error function)
Requires 1 input and produces 1 output.
message ErfLayerParams {
}
GeluLayerParams¶
Gaussian error linear unit activation, which is:
y = 0.5 * x * (1 + erf(x/sqrt(2)))
Requires 1 input and produces 1 output.
message GeluLayerParams {
enum GeluMode {
EXACT = 0;
TANH_APPROXIMATION = 1;
SIGMOID_APPROXIMATION = 2;
}
GeluMode mode = 1;
}
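A minimal NumPy sketch of the EXACT formula above, together with the widely used tanh approximation (assumed here to correspond to TANH_APPROXIMATION):
import math
import numpy as np

erf = np.vectorize(math.erf)  # elementwise error function

def gelu_exact(x):
    return 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))

def gelu_tanh_approx(x):
    # common approximation: 0.5 x (1 + tanh(sqrt(2/pi) (x + 0.044715 x^3)))
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))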
RangeStaticLayerParams¶
A layer that returns a tensor containing evenly spaced values, similar to the np.arange() function.
Takes no input and produces 1 output; the end, start, and step size are given by the endValue, startValue, and stepSizeValue parameters.
message RangeStaticLayerParams {
float endValue = 1;
float startValue = 2;
float stepSizeValue = 3;
}
RangeDynamicLayerParams¶
Like RangeStatic, but the values are provided as inputs, in order: end, start, stepSize, where each input is a scalar, or rank 1 with shape (1,); the startValue and stepSizeValue parameters are used when the corresponding inputs are absent.
message RangeDynamicLayerParams {
float startValue = 2;
float stepSizeValue = 3;
}
SlidingWindowsLayerParams¶
A layer that returns a tensor containing all windows of size “window_size” separated by “step” along the dimension “axis”.
y = SlidingWindows(x)
Requires 1 input and produces 1 output.
- Input
- An N-dimensional tensor
- Output
- An (N+1)-dimensional tensor.
This operation behaves as follows:
- if axis = 0 and the input is rank 1, (L,), the output shape will be (M, W)
- if axis = 1 and the input is rank 3, (B1, L, C1), the output shape will be (B1, M, W, C1)
- if axis = 2 and the input is rank 5, (B1, B2, L, C1, C2) -> (B1 * B2, L, C1 * C2) -> (B1 * B2, M, W, C1 * C2); the output shape will be (B1, B2, M, W, C1, C2)
- etc.
where:
- L, C, B refer to the input length, feature dimension length, and batch size respectively
- W is the window size
- M is the number of windows/slices, calculated as M = (L - W) / step + 1
message SlidingWindowsLayerParams {
int64 axis = 1;
uint64 windowSize = 2;
uint64 step = 3;
}
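A minimal NumPy sketch for the rank-1, axis = 0 case, with M = (L - W) / step + 1:
import numpy as np

def sliding_windows_1d(x, window_size, step):
    L = x.shape[0]
    M = (L - window_size) // step + 1
    return np.stack([x[i * step : i * step + window_size] for i in range(M)])

# sliding_windows_1d(np.arange(6), window_size=3, step=2)
# -> [[0, 1, 2], [2, 3, 4]]   (shape (M, W) = (2, 3))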
LayerNormalizationLayerParams¶
A layer that performs layer normalization over the trailing dimensions given by normalizedShape, using learned scale (gamma) and bias (beta) parameters of shape normalizedShape.
message LayerNormalizationLayerParams {
repeated int64 normalizedShape = 1;
float eps = 2;
WeightParams gamma = 3;
WeightParams beta = 4;
}
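A minimal NumPy sketch of the normalization over the trailing normalizedShape axes (assuming gamma and beta have shape normalizedShape and eps is added to the variance):
import numpy as np

def layer_norm(x, normalized_shape, gamma, beta, eps=1e-5):
    axes = tuple(range(-len(normalized_shape), 0))  # the trailing axes
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * gamma + beta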
NeuralNetworkClassifier¶
A neural network specialized as a classifier.
message NeuralNetworkClassifier {
repeated NeuralNetworkLayer layers = 1;
repeated NeuralNetworkPreprocessing preprocessing = 2;
// use this enum value to determine the input tensor shapes to the neural network, for multiarray inputs
NeuralNetworkMultiArrayShapeMapping arrayInputShapeMapping = 5;
// use this enum value to determine the input tensor shapes to the neural network, for image inputs
NeuralNetworkImageShapeMapping imageInputShapeMapping = 6;
NetworkUpdateParameters updateParams = 10;
oneof ClassLabels {
StringVector stringClassLabels = 100;
Int64Vector int64ClassLabels = 101;
}
string labelProbabilityLayerName = 200;
}
NeuralNetworkRegressor¶
A neural network specialized as a regressor.
message NeuralNetworkRegressor {
repeated NeuralNetworkLayer layers = 1;
repeated NeuralNetworkPreprocessing preprocessing = 2;
// use this enum value to determine the input tensor shapes to the neural network, for multiarray inputs
NeuralNetworkMultiArrayShapeMapping arrayInputShapeMapping = 5;
// use this enum value to determine the input tensor shapes to the neural network, for image inputs
NeuralNetworkImageShapeMapping imageInputShapeMapping = 6;
NetworkUpdateParameters updateParams = 10;
}
NetworkUpdateParameters¶
message NetworkUpdateParameters {
repeated LossLayer lossLayers = 1;
Optimizer optimizer = 2;
Int64Parameter epochs = 3;
}
LossLayer¶
message LossLayer {
string name = 1;
oneof LossLayerType {
CategoricalCrossEntropyLossLayer categoricalCrossEntropyLossLayer = 10;
MeanSquaredErrorLossLayer meanSquaredErrorLossLayer = 11;
}
}
CategoricalCrossEntropyLossLayer¶
message CategoricalCrossEntropyLossLayer {
string input = 1;
string target = 2;
}
MeanSquaredErrorLossLayer¶
message MeanSquaredErrorLossLayer {
string input = 1;
string target = 2;
}
Optimizer¶
message Optimizer {
oneof OptimizerType {
SGDOptimizer sgdOptimizer = 10;
AdamOptimizer adamOptimizer = 11;
}
}
SGDOptimizer¶
message SGDOptimizer {
DoubleParameter learningRate = 1;
Int64Parameter miniBatchSize = 2;
DoubleParameter momentum = 3;
}
AdamOptimizer¶
message AdamOptimizer {
DoubleParameter learningRate = 1;
Int64Parameter miniBatchSize = 2;
DoubleParameter beta1 = 3;
DoubleParameter beta2 = 4;
DoubleParameter eps = 5;
}
BoxCoordinatesMode.Coordinates¶
enum Coordinates {
CORNERS_HEIGHT_FIRST = 0;
CORNERS_WIDTH_FIRST = 1;
CENTER_SIZE_HEIGHT_FIRST = 2;
CENTER_SIZE_WIDTH_FIRST = 3;
}
FlattenLayerParams.FlattenOrder¶
enum FlattenOrder {
CHANNEL_FIRST = 0;
CHANNEL_LAST = 1;
}
GeluLayerParams.GeluMode¶
enum GeluMode {
EXACT = 0;
TANH_APPROXIMATION = 1;
SIGMOID_APPROXIMATION = 2;
}
NeuralNetworkImageShapeMapping¶
enum NeuralNetworkImageShapeMapping {
RANK5_IMAGE_MAPPING = 0;
RANK4_IMAGE_MAPPING = 1;
}
NeuralNetworkMultiArrayShapeMapping¶
enum NeuralNetworkMultiArrayShapeMapping {
RANK5_ARRAY_MAPPING = 0;
EXACT_ARRAY_MAPPING = 1;
}
PoolingLayerParams.PoolingType¶
enum PoolingType{
MAX = 0;
AVERAGE = 1;
L2 = 2;
}
ReduceLayerParams.ReduceAxis¶
enum ReduceAxis {
CHW = 0;
HW = 1;
C = 2;
H = 3;
W = 4;
}
ReduceLayerParams.ReduceOperation¶
enum ReduceOperation {
SUM = 0;
AVG = 1;
PROD = 2;
LOGSUM = 3;
SUMSQUARE = 4;
L1 = 5;
L2 = 6;
MAX = 7;
MIN = 8;
ARGMAX = 9;
}
ReorganizeDataLayerParams.ReorganizationType¶
enum ReorganizationType {
SPACE_TO_DEPTH = 0;
DEPTH_TO_SPACE = 1;
}
ReshapeLayerParams.ReshapeOrder¶
enum ReshapeOrder {
CHANNEL_FIRST = 0;
CHANNEL_LAST = 1;
}
SamePadding.SamePaddingMode¶
enum SamePaddingMode {
BOTTOM_RIGHT_HEAVY = 0;
TOP_LEFT_HEAVY = 1;
}
SamplingMode.Method¶
enum Method {
STRICT_ALIGN_ENDPOINTS_MODE = 0;
ALIGN_ENDPOINTS_MODE = 1;
UPSAMPLE_MODE = 2;
ROI_ALIGN_MODE = 3;
}
ScatterMode¶
enum ScatterMode {
SCATTER_UPDATE = 0;
SCATTER_ADD = 1;
SCATTER_SUB = 2;
SCATTER_MUL = 3;
SCATTER_DIV = 4;
SCATTER_MAX = 5;
SCATTER_MIN = 6;
}
SliceLayerParams.SliceAxis¶
enum SliceAxis {
CHANNEL_AXIS = 0;
HEIGHT_AXIS = 1;
WIDTH_AXIS = 2;
}
UnaryFunctionLayerParams.Operation¶
A unary operator.
The following functions are supported:
SQRT
RSQRT
INVERSE
POWER
EXP
LOG
ABS
THRESHOLD
enum Operation{
SQRT = 0;
RSQRT = 1;
INVERSE = 2;
POWER = 3;
EXP = 4;
LOG = 5;
ABS = 6;
THRESHOLD = 7;
}
UpsampleLayerParams.InterpolationMode¶
enum InterpolationMode {
NN = 0;
BILINEAR = 1;
}