gluoncv.model_zoo

Gluon Vision Model Zoo

gluoncv.model_zoo.get_model(name, **kwargs)[source]

Returns a pre-defined model by name.

Parameters:
  • name (str) – Name of the model.
  • pretrained (bool) – Whether to load the pretrained weights for model.
  • classes (int) – Number of classes for the output layer.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
Returns:

The model.

Return type:

HybridBlock

CIFAR

gluoncv.model_zoo.get_cifar_resnet(version, num_layers, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

ResNet V1 model from “Deep Residual Learning for Image Recognition” paper. ResNet V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters:
  • version (int) – Version of ResNet. Options are 1, 2.
  • num_layers (int) – Number of layers. Needs to be an integer of the form 6*n+2, e.g. 20, 56, 110, 164.
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
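
The 6*n+2 constraint on num_layers can be checked before a model is requested. The following is a minimal sketch, not part of gluoncv: the helper name blocks_per_stage is hypothetical, and it assumes the CIFAR layout of three stages with n residual blocks each.

```python
def blocks_per_stage(num_layers):
    """Return n such that num_layers == 6 * n + 2, or raise ValueError.

    Hypothetical helper for illustration; not a gluoncv function.
    """
    if num_layers < 8 or (num_layers - 2) % 6 != 0:
        raise ValueError(
            "num_layers must have the form 6*n+2, got %d" % num_layers)
    return (num_layers - 2) // 6


# The depths listed in the parameter description all satisfy the rule.
for depth in (20, 56, 110, 164):
    print(depth, blocks_per_stage(depth))
```

A depth such as 18 is rejected, because 18 - 2 is not divisible by 6.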
gluoncv.model_zoo.cifar_resnet20_v1(**kwargs)[source]

ResNet-20 V1 model for CIFAR10 from “Deep Residual Learning for Image Recognition” paper.

Parameters:
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
gluoncv.model_zoo.cifar_resnet56_v1(**kwargs)[source]

ResNet-56 V1 model for CIFAR10 from “Deep Residual Learning for Image Recognition” paper.

Parameters:
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
gluoncv.model_zoo.cifar_resnet110_v1(**kwargs)[source]

ResNet-110 V1 model for CIFAR10 from “Deep Residual Learning for Image Recognition” paper.

Parameters:
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
gluoncv.model_zoo.cifar_resnet20_v2(**kwargs)[source]

ResNet-20 V2 model for CIFAR10 from “Identity Mappings in Deep Residual Networks” paper.

Parameters:
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
gluoncv.model_zoo.cifar_resnet56_v2(**kwargs)[source]

ResNet-56 V2 model for CIFAR10 from “Identity Mappings in Deep Residual Networks” paper.

Parameters:
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
gluoncv.model_zoo.cifar_resnet110_v2(**kwargs)[source]

ResNet-110 V2 model for CIFAR10 from “Identity Mappings in Deep Residual Networks” paper.

Parameters:
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
gluoncv.model_zoo.get_cifar_wide_resnet(num_layers, width_factor=1, drop_rate=0.0, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

Wide ResNet model from “Wide Residual Networks” paper.

Parameters:
  • num_layers (int) – Number of layers. Needs to be an integer of the form 6*n+4, e.g. 16, 28, 40.
  • width_factor (int) – The width factor to apply to the number of channels from the original resnet.
  • drop_rate (float) – The rate of dropout.
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
gluoncv.model_zoo.cifar_wideresnet16_10(**kwargs)[source]

WideResNet-16-10 model for CIFAR10 from “Wide Residual Networks” paper.

Parameters:
  • drop_rate (float) – The rate of dropout.
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
gluoncv.model_zoo.cifar_wideresnet28_10(**kwargs)[source]

WideResNet-28-10 model for CIFAR10 from “Wide Residual Networks” paper.

Parameters:
  • drop_rate (float) – The rate of dropout.
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
gluoncv.model_zoo.cifar_wideresnet40_8(**kwargs)[source]

WideResNet-40-8 model for CIFAR10 from “Wide Residual Networks” paper.

Parameters:
  • drop_rate (float) – The rate of dropout.
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Object Detection

SSD

class gluoncv.model_zoo.ssd.SSD(network, base_size, features, num_filters, sizes, ratios, steps, classes, use_1x1_transition=True, use_bn=True, reduce_ratio=1.0, min_depth=128, global_pool=False, pretrained=False, iou_thresh=0.5, neg_thresh=0.5, negative_mining_ratio=3, stds=(0.1, 0.1, 0.2, 0.2), nms_thresh=0.45, nms_topk=-1, anchor_alloc_size=128, **kwargs)[source]

Single-shot Object Detection Network: https://arxiv.org/abs/1512.02325.

Parameters:
  • network (string or None) – Name of the base network, if None is used, will instantiate the base network from features directly instead of composing.
  • base_size (int) – Base input size. It is specified so that SSD can support dynamic input shapes.
  • features (list of str or mxnet.gluon.HybridBlock) – Intermediate features to be extracted or a network with multi-output. If network is None, features is expected to be a multi-output network.
  • num_filters (list of int) – Number of channels for the appended layers, ignored if network is None.
  • sizes (iterable of float) – Sizes of anchor boxes. This should be a list of floats in increasing order. The length of sizes must be len(layers) + 1. For example, a two-stage SSD model can have sizes = [30, 60, 90], which converts to [30, 60] and [60, 90] for the two stages, respectively. For more details, please refer to the original paper.
  • ratios (iterable of list) – Aspect ratios of anchors in each output layer. Its length must equal the number of SSD output layers.
  • steps (list of int) – Step size of anchor boxes in each output layer.
  • classes (iterable of str) – Names of all categories.
  • use_1x1_transition (bool) – Whether to use a 1x1 convolution as the transition layer between attached layers. This is effective at reducing model capacity.
  • use_bn (bool) – Whether to use BatchNorm layer after each attached convolutional layer.
  • reduce_ratio (float) – Channel reduce ratio (0, 1) of the transition layer.
  • min_depth (int) – Minimum channels for the transition layers.
  • global_pool (bool) – Whether to attach a global average pooling layer as the last output layer.
  • pretrained (bool) – Whether to load the pretrained weights for model.
  • iou_thresh (float, default is 0.5) – IOU overlap threshold of matching targets, used during training phase.
  • neg_thresh (float, default is 0.5) – Negative mining threshold for un-matched anchors. This prevents highly overlapped anchors from being treated as negative samples.
  • negative_mining_ratio (float, default is 3) – Ratio of negative vs. positive samples.
  • stds (tuple of float, default is (0.1, 0.1, 0.2, 0.2)) – Std values to be divided/multiplied to box encoded values.
  • nms_thresh (float, default is 0.45) – Non-maximum suppression threshold. You can specify < 0 or > 1 to disable NMS.
  • nms_topk (int, default is -1) – Apply NMS to the top k detection results. Use -1 to disable, so that every detection result is used in NMS.
  • anchor_alloc_size (tuple of int, default is (128, 128)) – For advanced users. Define anchor_alloc_size to generate large enough anchor maps, which will later be saved in parameters. During inference, arbitrary input image sizes are supported by cropping the corresponding area of the anchor map. This allows the model to be exported to symbol so that it can run in C++, Scala, etc.
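
The way sizes is split across output layers, as described for the sizes parameter above, can be sketched in a few lines. The helper name pair_anchor_sizes is hypothetical and only illustrates the pairing of consecutive entries; it is not how the library is known to implement it.

```python
def pair_anchor_sizes(sizes):
    """Pair consecutive anchor sizes, one (low, high) pair per output layer.

    Hypothetical helper for illustration; not a gluoncv function.
    """
    if len(sizes) < 2:
        raise ValueError("need at least two sizes to form one pair")
    if any(a >= b for a, b in zip(sizes[:-1], sizes[1:])):
        raise ValueError("sizes must be in increasing order")
    return list(zip(sizes[:-1], sizes[1:]))


# The two-stage example from the parameter description.
print(pair_anchor_sizes([30, 60, 90]))  # [(30, 60), (60, 90)]
```

A list of k + 1 sizes therefore serves k output layers, which is why the length must be one more than the number of layers.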
hybrid_forward(F, x)[source]

Hybrid forward

gluoncv.model_zoo.ssd.get_ssd(name, base_size, features, filters, sizes, ratios, steps, classes, dataset, pretrained=False, pretrained_base=True, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

Get SSD models.

Parameters:
  • name (str or None) – Model name, if None is used, you must specify features to be a HybridBlock.
  • base_size (int) – Base image size for training, this is fixed once training is assigned. A fixed base size still allows you to have variable input size during test.
  • features (iterable of str or HybridBlock) – List of network internal output names, specifying which layers are used for predicting bbox values. If name is None, features must be a HybridBlock which generates multiple outputs for prediction.
  • filters (iterable of float or None) – List of convolution layer channels which is going to be appended to the base network feature extractor. If name is None, this is ignored.
  • sizes (iterable of float) – Sizes of anchor boxes. This should be a list of floats in increasing order. The length of sizes must be len(layers) + 1. For example, a two-stage SSD model can have sizes = [30, 60, 90], which converts to [30, 60] and [60, 90] for the two stages, respectively. For more details, please refer to the original paper.
  • ratios (iterable of list) – Aspect ratios of anchors in each output layer. Its length must equal the number of SSD output layers.
  • steps (list of int) – Step size of anchor boxes in each output layer.
  • classes (iterable of str) – Names of categories.
  • dataset (str) – Name of dataset. This is used to identify the model name, because models trained on different datasets are going to be very different.
  • pretrained (bool, optional, default is False) – Load pretrained weights.
  • pretrained_base (bool, optional, default is True) – Load pretrained base network, the extra layers are randomized. Note that if pretrained is True, this has no effect.
  • ctx (mxnet.Context) – Context such as mx.cpu(), mx.gpu(0).
  • root (str) – Model weights storing path.
Returns:

A SSD detection network.

Return type:

HybridBlock

gluoncv.model_zoo.ssd.ssd_300_vgg16_atrous_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

SSD architecture with VGG16 atrous 300x300 base network.

Parameters:
  • pretrained (bool, optional, default is False) – Load pretrained weights.
  • pretrained_base (bool, optional, default is True) – Load pretrained base network, the extra layers are randomized.
Returns:

A SSD detection network.

Return type:

HybridBlock

gluoncv.model_zoo.ssd.ssd_512_vgg16_atrous_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

SSD architecture with VGG16 atrous 512x512 base network.

Parameters:
  • pretrained (bool, optional, default is False) – Load pretrained weights.
  • pretrained_base (bool, optional, default is True) – Load pretrained base network, the extra layers are randomized.
Returns:

A SSD detection network.

Return type:

HybridBlock

gluoncv.model_zoo.ssd.ssd_512_resnet50_v1_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

SSD architecture with ResNet v1 50 layers.

Parameters:
  • pretrained (bool, optional, default is False) – Load pretrained weights.
  • pretrained_base (bool, optional, default is True) – Load pretrained base network, the extra layers are randomized.
Returns:

A SSD detection network.

Return type:

HybridBlock

gluoncv.model_zoo.ssd.ssd_512_resnet101_v2_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

SSD architecture with ResNet v2 101 layers.

Parameters:
  • pretrained (bool, optional, default is False) – Load pretrained weights.
  • pretrained_base (bool, optional, default is True) – Load pretrained base network, the extra layers are randomized.
Returns:

A SSD detection network.

Return type:

HybridBlock

gluoncv.model_zoo.ssd.ssd_512_resnet152_v2_voc(pretrained=False, pretrained_base=True, **kwargs)[source]

SSD architecture with ResNet v2 152 layers.

Parameters:
  • pretrained (bool, optional, default is False) – Load pretrained weights.
  • pretrained_base (bool, optional, default is True) – Load pretrained base network, the extra layers are randomized.
Returns:

A SSD detection network.

Return type:

HybridBlock

class gluoncv.model_zoo.ssd.VGGAtrousExtractor(layers, filters, extras, batch_norm=False, **kwargs)[source]

VGG Atrous multi-layer feature extractor which produces multiple output feature maps.

Parameters:
  • layers (list of int) – Number of layers for the VGG base network.
  • filters (list of int) – Number of convolution filters for each layer.
  • extras (list of list) – Extra layers configurations.
  • batch_norm (bool) – If True, will use BatchNorm layers.
hybrid_forward(F, x, init_scale)[source]

Overrides to construct symbolic graph for this Block.

Parameters:
  • x (Symbol or NDArray) – The first input tensor.
  • *args (list of Symbol or list of NDArray) – Additional input tensors.
gluoncv.model_zoo.ssd.get_vgg_atrous_extractor(num_layers, im_size, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

Get VGG atrous feature extractor networks.

Parameters:
  • num_layers (int) – VGG types, can be 11,13,16,19.
  • im_size (int) – VGG detection input size, can be 300, 512.
  • pretrained (bool) – Load pretrained weights if True.
  • ctx (mx.Context) – Context such as mx.cpu(), mx.gpu(0).
  • root (str) – Model weights storing path.
Returns:

The returned network.

Return type:

mxnet.gluon.HybridBlock

gluoncv.model_zoo.ssd.vgg16_atrous_300(**kwargs)[source]

Get VGG atrous 16 layer 300 in_size feature extractor networks.

gluoncv.model_zoo.ssd.vgg16_atrous_512(**kwargs)[source]

Get VGG atrous 16 layer 512 in_size feature extractor networks.

Semantic Segmentation

BaseModel

FCN

class gluoncv.model_zoo.FCN(nclass, backbone='resnet50', norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, aux=True, **kwargs)[source]

Fully Convolutional Networks for Semantic Segmentation

Parameters:
  • nclass (int) – Number of categories for the training dataset.
  • backbone (string) – Pre-trained dilated backbone network type (default:’resnet50’; ‘resnet50’, ‘resnet101’ or ‘resnet152’).
  • norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; for Synchronized Cross-GPU BatchNormalization).

Reference:

Long, Jonathan, Evan Shelhamer, and Trevor Darrell. “Fully convolutional networks for semantic segmentation.” CVPR, 2015

Examples

>>> model = FCN(nclass=21, backbone='resnet50')
>>> print(model)
forward(x)[source]

Defines the forward computation. Arguments can be either NDArray or Symbol.

gluoncv.model_zoo.get_fcn(dataset='pascal_voc', backbone='resnet50', pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

FCN model from the paper “Fully Convolutional Networks for Semantic Segmentation”

Parameters:
  • dataset (str, default pascal_voc) – The dataset that the model was pretrained on. Options are pascal_voc, ade20k.
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_fcn(dataset='pascal_voc', backbone='resnet50', pretrained=False)
>>> print(model)
gluoncv.model_zoo.get_fcn_voc_resnet50(**kwargs)[source]

FCN model with base network ResNet-50 pre-trained on Pascal VOC dataset from the paper “Fully Convolutional Networks for Semantic Segmentation”

Parameters:
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_fcn_voc_resnet50(pretrained=True)
>>> print(model)
gluoncv.model_zoo.get_fcn_voc_resnet101(**kwargs)[source]

FCN model with base network ResNet-101 pre-trained on Pascal VOC dataset from the paper “Fully Convolutional Networks for Semantic Segmentation”

Parameters:
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.

Examples

>>> model = get_fcn_voc_resnet101(pretrained=True)
>>> print(model)

Dilated Network

We apply a dilation strategy to pre-trained ResNet models (with stride of 8). Please see gluoncv.model_zoo.SegBaseModel for how to use it.
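
To make the stride-8 idea concrete, the sketch below works out which stages of a backbone would trade their stride for dilation. It is an illustration under stated assumptions, not the library's code: the function name dilation_plan is hypothetical, and the stem stride of 4 and per-stage strides (1, 2, 2, 2) are the usual ResNet layout rather than values taken from this page.

```python
def dilation_plan(stage_strides, target_stride, stem_stride=4):
    """Return (output_stride, [(stride, dilation) per stage]).

    Once the accumulated stride would exceed target_stride, a stage keeps
    stride 1 and multiplies the dilation rate instead.
    Hypothetical helper for illustration; not a gluoncv function.
    """
    stride = stem_stride
    dilation = 1
    plan = []
    for s in stage_strides:
        if stride * s > target_stride:
            # Replace downsampling with dilation to keep the resolution.
            dilation *= s
            plan.append((1, dilation))
        else:
            stride *= s
            plan.append((s, dilation))
    return stride, plan


# Assumed standard ResNet stage strides; target output stride of 8.
print(dilation_plan([1, 2, 2, 2], target_stride=8))
```

Under these assumptions the last two stages end up with stride 1 and dilation rates 2 and 4, so the final feature map is 1/8 of the input resolution instead of 1/32.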

DilatedResNetV0

class gluoncv.model_zoo.dilated.dilatedresnetv0.DilatedResNetV0(block, layers, num_classes=1000, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, **kwargs)[source]

Dilated pre-trained DilatedResNetV0 model, which produces stride-8 feature maps at conv5.

Parameters:
  • block (Block) – Class for the residual block. Options are BasicBlockV1, BottleneckV1.
  • layers (list of int) – Numbers of layers in each block
  • num_classes (int, default 1000) – Number of classification classes.
  • norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; for Synchronized Cross-GPU BatchNormalization).

Reference:

  • He, Kaiming, et al. “Deep residual learning for image recognition.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.

  • Yu, Fisher, and Vladlen Koltun. “Multi-scale context aggregation by dilated convolutions.”
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters:
  • x (Symbol or NDArray) – The first input tensor.
  • *args (list of Symbol or list of NDArray) – Additional input tensors.
gluoncv.model_zoo.dilated.dilatedresnetv0.dilated_resnet18(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a DilatedResNetV0-18 model.

Parameters:
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; for Synchronized Cross-GPU BatchNormalization).
gluoncv.model_zoo.dilated.dilatedresnetv0.dilated_resnet34(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a DilatedResNetV0-34 model.

Parameters:
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; for Synchronized Cross-GPU BatchNormalization).
gluoncv.model_zoo.dilated.dilatedresnetv0.dilated_resnet50(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a DilatedResNetV0-50 model.

Parameters:
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; for Synchronized Cross-GPU BatchNormalization).
gluoncv.model_zoo.dilated.dilatedresnetv0.dilated_resnet101(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a DilatedResNetV0-101 model.

Parameters:
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; for Synchronized Cross-GPU BatchNormalization).
gluoncv.model_zoo.dilated.dilatedresnetv0.dilated_resnet152(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs)[source]

Constructs a DilatedResNetV0-152 model.

Parameters:
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; for Synchronized Cross-GPU BatchNormalization).

DilatedResNetV2

class gluoncv.model_zoo.dilated.dilatedresnetv2.DilatedResNetV2(block, layers, channels, classes=1000, thumbnail=False, norm_layer=<class 'mxnet.gluon.nn.basic_layers.BatchNorm'>, **kwargs)[source]

Dilated_ResNet V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters:
  • block (Block) – Class for the residual block. Options are BasicBlockV1, BottleneckV1.
  • layers (list of int) – Numbers of layers in each block
  • channels (list of int) – Numbers of channels in each block. Length should be one larger than layers list.
  • classes (int, default 1000) – Number of classification classes.
  • thumbnail (bool, default False) – Enable thumbnail.
  • norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; for Synchronized Cross-GPU BatchNormalization).
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters:
  • x (Symbol or NDArray) – The first input tensor.
  • *args (list of Symbol or list of NDArray) – Additional input tensors.
gluoncv.model_zoo.dilated.dilatedresnetv2.get_dilated_resnet(version, num_layers, pretrained=False, ctx=cpu(0), root='~/.mxnet/models', **kwargs)[source]

Dilated_ResNet V1 model from “Deep Residual Learning for Image Recognition” paper. Dilated_ResNet V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters:
  • version (int) – Version of Dilated_ResNet. Options are 1, 2.
  • num_layers (int) – Number of layers. Options are 18, 34, 50, 101, 152.
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
  • norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; for Synchronized Cross-GPU BatchNormalization).
gluoncv.model_zoo.dilated.dilatedresnetv2.dilated_resnet18(**kwargs)[source]

Dilated_ResNet-18 V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters:
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
  • norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; for Synchronized Cross-GPU BatchNormalization).
gluoncv.model_zoo.dilated.dilatedresnetv2.dilated_resnet34(**kwargs)[source]

Dilated_ResNet-34 V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters:
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
  • norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; for Synchronized Cross-GPU BatchNormalization).
gluoncv.model_zoo.dilated.dilatedresnetv2.dilated_resnet50(**kwargs)[source]

Dilated_ResNet-50 V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters:
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
  • norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; for Synchronized Cross-GPU BatchNormalization).
gluoncv.model_zoo.dilated.dilatedresnetv2.dilated_resnet101(**kwargs)[source]

Dilated_ResNet-101 V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters:
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
  • norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; for Synchronized Cross-GPU BatchNormalization).
gluoncv.model_zoo.dilated.dilatedresnetv2.dilated_resnet152(**kwargs)[source]

Dilated_ResNet-152 V2 model from “Identity Mappings in Deep Residual Networks” paper.

Parameters:
  • pretrained (bool, default False) – Whether to load the pretrained weights for model.
  • ctx (Context, default CPU) – The context in which to load the pretrained weights.
  • root (str, default '~/.mxnet/models') – Location for keeping the model parameters.
  • norm_layer (object) – Normalization layer used in backbone network (default: mxnet.gluon.nn.BatchNorm; for Synchronized Cross-GPU BatchNormalization).

Common Components

Bounding Box

class gluoncv.model_zoo.bbox.BBoxCornerToCenter(split=False)[source]

Convert corner boxes to center boxes. Corner boxes are encoded as (xmin, ymin, xmax, ymax). Center boxes are encoded as (center_x, center_y, width, height).

Parameters:
  • split (bool) – Whether to split boxes into individual elements after processing.
Returns:

A BxNx4 NDArray if split is False, or 4 BxNx1 NDArray if split is True.

hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters:
  • x (Symbol or NDArray) – The first input tensor.
  • *args (list of Symbol or list of NDArray) – Additional input tensors.
class gluoncv.model_zoo.bbox.BBoxCenterToCorner(split=False)[source]

Convert center boxes to corner boxes. Corner boxes are encoded as (xmin, ymin, xmax, ymax). Center boxes are encoded as (center_x, center_y, width, height).

Parameters:
  • split (bool) – Whether to split boxes into individual elements after processing.
Returns:

A BxNx4 NDArray if split is False, or 4 BxNx1 NDArray if split is True.

hybrid_forward(F, x)[source]

Hybrid forward
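
The two box encodings are related by simple arithmetic. The sketch below shows the conversion for a single box in plain Python. The function names are hypothetical, and the real blocks operate on batched BxNx4 arrays rather than tuples.

```python
def corner_to_center(box):
    """(xmin, ymin, xmax, ymax) -> (center_x, center_y, width, height)."""
    xmin, ymin, xmax, ymax = box
    width = xmax - xmin
    height = ymax - ymin
    return (xmin + width / 2.0, ymin + height / 2.0, width, height)


def center_to_corner(box):
    """(center_x, center_y, width, height) -> (xmin, ymin, xmax, ymax)."""
    cx, cy, width, height = box
    half_w = width / 2.0
    half_h = height / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


corner = (10, 20, 50, 100)
center = corner_to_center(corner)
print(center)                    # (30.0, 60.0, 40, 80)
print(center_to_corner(center))  # (10.0, 20.0, 50.0, 100.0)
```

Applying one conversion after the other returns the original box, which is a quick sanity check when mixing the two encodings.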

Coders

class gluoncv.model_zoo.coders.NormalizedBoxCenterEncoder(stds=(0.1, 0.1, 0.2, 0.2))[source]

Encode bounding boxes training target with normalized center offsets.

Input bounding boxes are using corner type: x_{min}, y_{min}, x_{max}, y_{max}.

Parameters:
  • stds (array-like of size 4) – Std values used to normalize the encoded values, default is (0.1, 0.1, 0.2, 0.2).
forward(samples, matches, anchors, refs)[source]

Forward

class gluoncv.model_zoo.coders.NormalizedBoxCenterDecoder(stds=(0.1, 0.1, 0.2, 0.2))[source]

Decode bounding boxes training target with normalized center offsets. This decoder must be paired with a NormalizedBoxCenterEncoder using the same stds in order to get properly reconstructed bounding boxes.

Returned bounding boxes are using corner type: x_{min}, y_{min}, x_{max}, y_{max}.

Parameters:
  • stds (array-like of size 4) – Std values used to normalize the encoded values, default is (0.1, 0.1, 0.2, 0.2).
hybrid_forward(F, x, anchors)[source]

Overrides to construct symbolic graph for this Block.

Parameters:
  • x (Symbol or NDArray) – The first input tensor.
  • *args (list of Symbol or list of NDArray) – Additional input tensors.
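
The relationship between the encoder and decoder can be illustrated with the usual SSD-style center-offset formulation. This is a minimal sketch for a single box. The function names are hypothetical and the exact formulas are an assumption about the common formulation, not a copy of the library's implementation.

```python
import math

STDS = (0.1, 0.1, 0.2, 0.2)


def _to_center(box):
    # (xmin, ymin, xmax, ymax) -> (cx, cy, w, h)
    xmin, ymin, xmax, ymax = box
    w, h = xmax - xmin, ymax - ymin
    return xmin + w / 2.0, ymin + h / 2.0, w, h


def encode_box(gt, anchor, stds=STDS):
    """Encode a corner-format ground-truth box relative to a corner-format anchor."""
    gx, gy, gw, gh = _to_center(gt)
    ax, ay, aw, ah = _to_center(anchor)
    return ((gx - ax) / aw / stds[0],
            (gy - ay) / ah / stds[1],
            math.log(gw / aw) / stds[2],
            math.log(gh / ah) / stds[3])


def decode_box(code, anchor, stds=STDS):
    """Invert encode_box; returns a corner-format box."""
    ax, ay, aw, ah = _to_center(anchor)
    cx = code[0] * stds[0] * aw + ax
    cy = code[1] * stds[1] * ah + ay
    w = math.exp(code[2] * stds[2]) * aw
    h = math.exp(code[3] * stds[3]) * ah
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)


anchor = (10.0, 20.0, 50.0, 100.0)
gt = (12.0, 18.0, 58.0, 96.0)
code = encode_box(gt, anchor)
print(decode_box(code, anchor))  # approximately equal to gt
```

Decoding with different stds than were used for encoding would scale the offsets incorrectly, which is why the two classes must share the same values.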
class gluoncv.model_zoo.coders.MultiClassEncoder(ignore_label=-1)[source]

Encode classification training target given matching results.

This encoder will assign training target of matched bounding boxes to ground-truth label + 1 and negative samples with label 0. Ignored samples will be assigned with ignore_label, whose default is -1.

Parameters:
  • ignore_label (float) – Assigned to un-matched samples. They are neither positive nor negative during training, and should be excluded in the loss function. Default is -1.
hybrid_forward(F, samples, matches, refs)[source]

Overrides to construct symbolic graph for this Block.

Parameters:
  • x (Symbol or NDArray) – The first input tensor.
  • *args (list of Symbol or list of NDArray) – Additional input tensors.
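
The label assignment described for MultiClassEncoder can be sketched for a single image as follows. The function name is hypothetical, and the convention that samples holds +1 for positive, -1 for negative, and 0 for ignored anchors is an assumption made for this illustration.

```python
def encode_class_targets(samples, matches, gt_labels, ignore_label=-1):
    """Assign a classification target to every anchor.

    samples:   per-anchor flag, assumed +1 (positive), -1 (negative), 0 (ignored)
    matches:   per-anchor index of the matched ground-truth box
    gt_labels: class id of each ground-truth box (0-based, no background)
    """
    targets = []
    for flag, match in zip(samples, matches):
        if flag > 0:
            # Matched anchors get the ground-truth label shifted by one,
            # because 0 is reserved for background.
            targets.append(gt_labels[match] + 1)
        elif flag < 0:
            targets.append(0)
        else:
            targets.append(ignore_label)
    return targets


print(encode_class_targets(
    samples=[1, -1, 0, 1],
    matches=[0, 0, 1, 1],
    gt_labels=[4, 7],
))  # [5, 0, -1, 8]
```

Anchors carrying ignore_label should then be masked out when the loss is computed.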
class gluoncv.model_zoo.coders.MultiClassDecoder(axis=-1, thresh=0.01)[source]

Decode classification results.

This decoder must work with MultiClassEncoder to reconstruct valid labels. The decoder expects its input to be scores computed from logits, e.g. by Softmax.

Parameters:
  • axis (int) – Axis of class-wise results.
  • thresh (float) – Confidence threshold for the post-softmax scores. Scores less than thresh are marked with 0, corresponding cls_id is marked with invalid class id -1.
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters:
  • x (Symbol or NDArray) – The first input tensor.
  • *args (list of Symbol or list of NDArray) – Additional input tensors.
class gluoncv.model_zoo.coders.MultiPerClassDecoder(num_class, axis=-1, thresh=0.01)[source]

Decode classification results.

This decoder must work with MultiClassEncoder to reconstruct valid labels. The decoder expects its input to be scores computed from logits, e.g. by Softmax. This version differs from gluoncv.model_zoo.coders.MultiClassDecoder as follows:

For each position (anchor box), each foreground class can have its own result, rather than being forced to be the best one. For example, for a 5-class prediction with background (6 classes in total), say scores (0.5, 0.1, 0.2, 0.1, 0.05, 0.05) for (bg, apple, orange, peach, grape, melon), MultiClassDecoder produces only one class id and score, namely (orange-0.2). MultiPerClassDecoder produces 5 results individually: (apple-0.1, orange-0.2, peach-0.1, grape-0.05, melon-0.05).

Parameters:
  • num_class (int) – Number of classes including background.
  • axis (int) – Axis of class-wise results.
  • thresh (float) – Confidence threshold for the post-softmax scores. Scores less than thresh are marked with 0, corresponding cls_id is marked with invalid class id -1.
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters:
  • x (Symbol or NDArray) – The first input tensor.
  • *args (list of Symbol or list of NDArray) – Additional input tensors.
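
The difference between the two decoders can be reproduced on the fruit example above with a plain-Python sketch for a single anchor. The function names are hypothetical, and numbering foreground classes from 0 after dropping the background entry is an assumption made for illustration.

```python
def multi_class_decode(scores, thresh=0.01):
    """Return the single best foreground (class_id, score) for one anchor."""
    foreground = scores[1:]  # drop the background score at index 0
    best = max(range(len(foreground)), key=lambda i: foreground[i])
    if foreground[best] < thresh:
        return (-1, 0.0)
    return (best, foreground[best])


def multi_per_class_decode(scores, thresh=0.01):
    """Return one (class_id, score) result per foreground class."""
    return [(i, s) if s >= thresh else (-1, 0.0)
            for i, s in enumerate(scores[1:])]


names = ["apple", "orange", "peach", "grape", "melon"]
scores = [0.5, 0.1, 0.2, 0.1, 0.05, 0.05]  # (bg, apple, orange, ...)

cls_id, score = multi_class_decode(scores)
print(names[cls_id], score)  # orange 0.2

for cls_id, score in multi_per_class_decode(scores):
    print(names[cls_id], score)
```

Raising thresh above 0.05 in the per-class version would mark grape and melon with class id -1 and score 0, matching the description of the thresh parameter.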

Features

class gluoncv.model_zoo.features.FeatureExtractor(network, outputs, inputs=('data', ), pretrained=False, ctx=cpu(0))[source]

Feature extractor.

Parameters:
  • network (str or HybridBlock or Symbol) – Logic chain: load from gluon.model_zoo.vision if network is string. Convert to Symbol if network is HybridBlock.
  • outputs (str or list of str) – The name of layers to be extracted as features
  • inputs (list of str or list of Symbol) – The inputs of network.
  • pretrained (bool) – Use pretrained parameters as in gluon.model_zoo
  • ctx (Context) – The context, e.g. mxnet.cpu(), mxnet.gpu(0).
class gluoncv.model_zoo.features.FeatureExpander(network, outputs, num_filters, use_1x1_transition=True, use_bn=True, reduce_ratio=1.0, min_depth=128, global_pool=False, pretrained=False, ctx=cpu(0), inputs=('data', ))[source]

Feature extractor with additional layers appended. This is very common in vision networks where extra branches are attached to the backbone network.

Parameters:
  • network (str or HybridBlock or Symbol) – Logic chain: load from gluon.model_zoo.vision if network is string. Convert to Symbol if network is HybridBlock.
  • outputs (str or list of str) – The name of layers to be extracted as features
  • num_filters (list of int) – Number of filters to be appended.
  • use_1x1_transition (bool) – Whether to use 1x1 convolution between attached layers. It is effective at reducing network size.
  • use_bn (bool) – Whether to use BatchNorm between attached layers.
  • reduce_ratio (float) – Channel reduction ratio of the transition layers.
  • min_depth (int) – Minimum channel number of transition layers.
  • global_pool (bool) – Whether to use global pooling as the last layer.
  • pretrained (bool) – Use pretrained parameters as in gluon.model_zoo if True.
  • ctx (Context) – The context, e.g. mxnet.cpu(), mxnet.gpu(0).
  • inputs (list of str) – Name of input variables to the network.
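How reduce_ratio and min_depth might interact can be sketched with plain arithmetic. The snippet below is one plausible reading of the parameter descriptions, not a confirmed account of the library's behaviour: it assumes each 1x1 transition layer gets the appended filter count scaled by reduce_ratio, rounded, and clamped from below by min_depth. The helper name transition_channels is hypothetical.

```python
def transition_channels(num_filters, reduce_ratio=1.0, min_depth=128):
    """Channel count of each 1x1 transition layer (assumed rule, see lead-in)."""
    return [max(min_depth, int(round(f * reduce_ratio))) for f in num_filters]


# Four appended layers; halve the channels in transitions but never go below 128.
channels = transition_channels([512, 512, 256, 256], reduce_ratio=0.5, min_depth=128)
print(channels)
```

Under this reading, min_depth only matters when reduce_ratio would otherwise shrink a transition layer below it.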

Losses

class gluoncv.model_zoo.losses.FocalLoss(axis=-1, alpha=0.25, gamma=2, sparse_label=True, from_logits=False, batch_axis=0, weight=None, num_class=None, eps=1e-12, size_average=True, **kwargs)[source]

Focal Loss for imbalanced classification. Focal loss was described in https://arxiv.org/abs/1708.02002

Parameters:
  • axis (int, default -1) – The axis to sum over when computing softmax and entropy.
  • alpha (float, default 0.25) – The class-balancing weight alpha that scales the loss.
  • gamma (float, default 2) – The focusing parameter gamma. Larger values down-weight well-classified examples more strongly.
  • sparse_label (bool, default True) – Whether label is an integer array instead of probability distribution.
  • from_logits (bool, default False) – Whether input is a log probability (usually from log_softmax) instead of unnormalized numbers.
  • batch_axis (int, default 0) – The axis that represents mini-batch.
  • weight (float or None) – Global scalar weight for loss.
  • num_class (int) – Number of classification categories. It is required if sparse_label is True.
  • eps (float) – Eps to avoid numerical issue.
  • size_average (bool, default True) – If True, will take mean of the output loss on every axis except batch_axis.
  • Inputs
    • pred: the prediction tensor, where the batch_axis dimension ranges over batch size and axis dimension ranges over the number of classes.
    • label: the truth tensor. When sparse_label is True, label’s shape should be pred’s shape with the axis dimension removed. i.e. for pred with shape (1,2,3,4) and axis = 2, label’s shape should be (1,2,4) and values should be integers between 0 and 2. If sparse_label is False, label’s shape must be the same as pred and values should be floats in the range [0, 1].
    • sample_weight: element-wise weighting tensor. Must be broadcastable to the same shape as label. For example, if label has shape (64, 10) and you want to weigh each sample in the batch separately, sample_weight should have shape (64, 1).
  • Outputs
    • loss: loss tensor with shape (batch_size,). Dimensions other than batch_axis are averaged out.
hybrid_forward(F, pred, label, sample_weight=None)[source]

Loss forward
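The roles of alpha, gamma and eps can be seen in the focal loss formula from the cited paper, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The snippet below is a minimal scalar sketch of that formula only; it is not the FocalLoss block itself, and the way eps is applied here is an assumption.

```python
import math


def focal_loss(p_t, alpha=0.25, gamma=2.0, eps=1e-12):
    """Focal loss for one sample, given p_t, the probability of the true class."""
    p_t = min(p_t + eps, 1.0)  # eps guards against log(0); assumed placement
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.9)  # confident and correct: strongly down-weighted
hard = focal_loss(0.1)  # confident and wrong: dominates the total loss
print(easy, hard)
```

Setting gamma to 0 and alpha to 1 reduces the expression to ordinary cross-entropy, which is a convenient sanity check.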

Matchers

class gluoncv.model_zoo.matchers.CompositeMatcher(matchers)[source]

A Matcher that combines multiple strategies.

Parameters: matchers (list of Matcher) – Matcher is a Block/HybridBlock used to match two groups of boxes.
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters:
  • x (Symbol or NDArray) – The first input tensor.
  • *args (list of Symbol or list of NDArray) – Additional input tensors.
class gluoncv.model_zoo.matchers.BipartiteMatcher(threshold=1e-12, is_ascend=False)[source]

A Matcher implementing bipartite matching strategy.

Parameters:
  • threshold (float) – Threshold used to ignore invalid paddings
  • is_ascend (bool) – Whether to sort the matching order in ascending order. Default is False.
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters:
  • x (Symbol or NDArray) – The first input tensor.
  • *args (list of Symbol or list of NDArray) – Additional input tensors.
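A bipartite strategy pairs each row of a score matrix with at most one column, and each column with at most one row. The snippet below is a greedy pure-Python sketch of that idea, not the library's operator: the function name, the use of -1 for unmatched rows, and the exact treatment of threshold under is_ascend are assumptions.

```python
def greedy_bipartite_match(scores, threshold=1e-12, is_ascend=False):
    """Greedily pair rows with columns, best score first.

    Returns one entry per row: the matched column index, or -1 if unmatched.
    """
    pairs = [(s, r, c) for r, row in enumerate(scores) for c, s in enumerate(row)]
    pairs.sort(key=lambda t: t[0], reverse=not is_ascend)
    row_match = [-1] * len(scores)
    used_cols = set()
    for s, r, c in pairs:
        # Skip entries on the wrong side of the threshold (e.g. zero padding).
        valid = s <= threshold if is_ascend else s >= threshold
        if not valid:
            continue
        if row_match[r] == -1 and c not in used_cols:
            row_match[r] = c
            used_cols.add(c)
    return row_match


ious = [
    [0.5, 0.6],
    [0.1, 0.2],
    [0.3, 0.4],
]
matches = greedy_bipartite_match(ious)
print(matches)
```

Because every column can be claimed only once, row 1 stays unmatched here even though its scores are above the threshold.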
class gluoncv.model_zoo.matchers.MaximumMatcher(threshold)[source]

A Matcher implementing maximum matching strategy.

Parameters: threshold (float) – Matching threshold.
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters:
  • x (Symbol or NDArray) – The first input tensor.
  • *args (list of Symbol or list of NDArray) – Additional input tensors.
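A maximum matching strategy can be sketched as a per-row argmax followed by a threshold test. This is a minimal pure-Python illustration under assumptions: the helper name is hypothetical, unmatched rows are marked with -1, and a score equal to the threshold is treated as a match.

```python
def maximum_match(scores, threshold):
    """For each row, pick the best-scoring column, or -1 if it is below threshold."""
    matches = []
    for row in scores:
        best = max(range(len(row)), key=lambda c: row[c])
        matches.append(best if row[best] >= threshold else -1)
    return matches


ious = [
    [0.5, 0.6],
    [0.1, 0.2],
    [0.3, 0.4],
]
matches = maximum_match(ious, threshold=0.5)
print(matches)
```

Each row is decided independently, so several rows may end up matched to the same column.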

Predictors

class gluoncv.model_zoo.predictors.ConvPredictor(num_channel, kernel=(3, 3), pad=(1, 1), stride=(1, 1), activation=None, use_bias=True, **kwargs)[source]

Convolutional predictor. Convolutional predictors are widely used in object detection. They can predict classification scores (1 channel per class) or box coordinates (usually 4 channels per box). The output is of shape (N, num_channel, H, W).

Parameters:
  • num_channel (int) – Number of conv channels.
  • kernel (tuple of (int, int), default (3, 3)) – Conv kernel size as (H, W).
  • pad (tuple of (int, int), default (1, 1)) – Conv padding size as (H, W).
  • stride (tuple of (int, int), default (1, 1)) – Conv stride size as (H, W).
  • activation (str, optional) – Optional activation after conv, e.g. ‘relu’.
  • use_bias (bool) – Use bias in convolution. Bias is not necessary if the convolution is followed by BatchNorm.
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters:
  • x (Symbol or NDArray) – The first input tensor.
  • *args (list of Symbol or list of NDArray) – Additional input tensors.
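The output shape (N, num_channel, H, W) follows from standard convolution arithmetic. The sketch below computes it for the documented kernel, pad and stride parameters, assuming no dilation; the helper name and the example sizes (21 classes, 4 boxes per cell, a 38x38 feature map) are hypothetical.

```python
def conv_output_shape(in_shape, num_channel, kernel=(3, 3), pad=(1, 1), stride=(1, 1)):
    """Output shape of a 2D convolution on an (N, C, H, W) input, no dilation."""
    n, _, h, w = in_shape
    out_h = (h + 2 * pad[0] - kernel[0]) // stride[0] + 1
    out_w = (w + 2 * pad[1] - kernel[1]) // stride[1] + 1
    return (n, num_channel, out_h, out_w)


num_classes, boxes_per_cell = 21, 4  # hypothetical detector settings
feature_map = (8, 512, 38, 38)       # (N, C, H, W)

# One channel per class per box for scores, four channels per box for coordinates.
cls_shape = conv_output_shape(feature_map, num_channel=boxes_per_cell * num_classes)
box_shape = conv_output_shape(feature_map, num_channel=boxes_per_cell * 4)
print(cls_shape, box_shape)
```

With the default 3x3 kernel, padding 1 and stride 1, the spatial size of the feature map is preserved and only the channel count changes.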
class gluoncv.model_zoo.predictors.FCPredictor(num_output, activation=None, use_bias=True, **kwargs)[source]

Fully connected predictor. Fully connected predictor is used to ignore spatial information and will output fixed-sized predictions.

Parameters:
  • num_output (int) – Number of fully connected outputs.
  • activation (str, optional) – Optional activation after the fully connected layer, e.g. ‘relu’.
  • use_bias (bool) – Use bias in the fully connected layer. Bias is not necessary if the layer is followed by BatchNorm.
hybrid_forward(F, x)[source]

Overrides to construct symbolic graph for this Block.

Parameters:
  • x (Symbol or NDArray) – The first input tensor.
  • *args (list of Symbol or list of NDArray) – Additional input tensors.

Samplers

class gluoncv.model_zoo.samplers.NaiveSampler[source]

A naive sampler that takes all existing matching results. There are no ignored samples in this case.

hybrid_forward(F, x)[source]

Hybrid forward

class gluoncv.model_zoo.samplers.OHEMSampler(ratio, min_samples=0, thresh=0.5)[source]

A sampler implementing online hard-negative mining, as described in the paper https://arxiv.org/abs/1604.03540.

Parameters:
  • ratio (float) – Ratio of negative vs. positive samples. Values >= 1.0 are recommended.
  • min_samples (int, default 0) – Minimum number of samples to be selected regardless of positive samples. For example, if there are no positive samples, we sometimes still want some negative samples to be selected.
  • thresh (float, default 0.5) – IOU overlap threshold of selected negative samples. A negative sample's IOU must not exceed this threshold, so that well-matched anchors are not selected as negative samples.
forward(x, logits, ious)[source]

Forward
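The way ratio, min_samples and thresh interact can be sketched in a few lines of pure Python. This is an illustration of the parameter descriptions above, not the sampler's actual forward pass: it takes a precomputed per-anchor loss instead of logits, and the function name, its inputs, and the 1 / -1 / 0 output encoding are all assumptions.

```python
def ohem_sample(is_positive, neg_loss, ious, ratio, min_samples=0, thresh=0.5):
    """Label each anchor 1 (positive), -1 (selected hard negative) or 0 (ignored)."""
    num_pos = sum(is_positive)
    num_neg = max(int(round(ratio * num_pos)), min_samples)
    # Candidates: non-positive anchors whose IOU stays below the threshold.
    candidates = [i for i, pos in enumerate(is_positive)
                  if not pos and ious[i] < thresh]
    # Hardest first: negatives with the largest loss.
    candidates.sort(key=lambda i: neg_loss[i], reverse=True)
    selected = set(candidates[:num_neg])
    return [1 if pos else (-1 if i in selected else 0)
            for i, pos in enumerate(is_positive)]


is_positive = [True, False, False, False, False, False]
neg_loss = [0.0, 2.0, 0.1, 1.5, 3.0, 0.3]
ious = [0.8, 0.1, 0.2, 0.3, 0.6, 0.0]

labels = ohem_sample(is_positive, neg_loss, ious, ratio=3)
print(labels)
```

Anchor 4 has the largest loss but is skipped because its IOU of 0.6 exceeds thresh, so the three negatives chosen are the next hardest ones.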