Vision Model Zoo

The GluonCV Model Zoo, similar to the upstream Gluon Model Zoo, provides pre-defined and pre-trained models to help bootstrap computer vision applications.

Model Zoo API

from gluoncv import model_zoo
# load a ResNet model trained on CIFAR10
cifar_resnet20 = model_zoo.get_model('cifar_resnet20_v1', pretrained=True)
# load a pre-trained ssd model
ssd0 = model_zoo.get_model('ssd_300_vgg16_atrous_voc', pretrained=True)
# load ssd model with pre-trained feature extractors
ssd1 = model_zoo.get_model('ssd_512_vgg16_atrous_voc', pretrained_base=True)
# load ssd model without initialization
ssd2 = model_zoo.get_model('ssd_512_resnet50_v1_voc', pretrained_base=False)

We recommend using gluoncv.model_zoo.get_model() for loading pre-defined models, because it provides name checking and lists the available choices.
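
For example, the name check means a typo fails fast instead of silently building the wrong network. The snippet below is a minimal sketch, assuming that an unknown name raises a ValueError whose message lists the supported models:

from gluoncv import model_zoo
# an unknown name is assumed to raise a ValueError listing the supported models
try:
    net = model_zoo.get_model('cifar_resnet20', pretrained=True)  # typo: missing the '_v1' suffix
except ValueError as e:
    print(e)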

However, you can still load a model by instantiating it directly, for example:

from gluoncv import model_zoo
cifar_resnet20 = model_zoo.cifar_resnet20_v1(pretrained=True)

Hint

Detailed model_zoo APIs are available in the API reference for gluoncv.model_zoo.

Summary of Available Models

GluonCV is still under development; more models will be added later.

Image Classification

The following table lists pre-trained models trained on CIFAR10. For models trained on ImageNet, please refer to upstream Gluon Model Zoo.

Hint

Our pre-trained models reproduce results from “Mix-Up” [4]. Please check the reference paper for further information.

Training commands in the table work with the following scripts:

Model                     | Acc (Vanilla / Mix-Up [4]) | Training Command | Training Log
CIFAR_ResNet20_v1 [1]     | 90.8 / 91.6                | Vanilla / Mix-Up | Vanilla / Mix-Up
CIFAR_ResNet56_v1 [1]     | 92.8 / 93.8                | Vanilla / Mix-Up | Vanilla / Mix-Up
CIFAR_ResNet110_v1 [1]    | 93.4 / 94.7                | Vanilla / Mix-Up | Vanilla / Mix-Up
CIFAR_ResNet20_v2 [2]     | 90.8 / 91.3                | Vanilla / Mix-Up | Vanilla / Mix-Up
CIFAR_ResNet56_v2 [2]     | 93.1 / 94.1                | Vanilla / Mix-Up | Vanilla / Mix-Up
CIFAR_ResNet110_v2 [2]    | 93.7 / 94.6                | Vanilla / Mix-Up | Vanilla / Mix-Up
CIFAR_WideResNet16_10 [3] | 95.1 / 96.1                | Vanilla / Mix-Up | Vanilla / Mix-Up
CIFAR_WideResNet28_10 [3] | 95.6 / 96.6                | Vanilla / Mix-Up | Vanilla / Mix-Up
CIFAR_WideResNet40_8 [3]  | 95.9 / 96.7                | Vanilla / Mix-Up | Vanilla / Mix-Up
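
Any of the models above can be used for prediction directly. The snippet below is a minimal sketch, assuming a local 32x32 RGB image named test.jpg (a placeholder filename) and the commonly used CIFAR10 normalization statistics:

import mxnet as mx
from mxnet import image
from mxnet.gluon.data.vision import transforms
from gluoncv import model_zoo

# load a pre-trained CIFAR10 classifier by name
net = model_zoo.get_model('cifar_resnet110_v1', pretrained=True)

# read a local image; 'test.jpg' is a placeholder filename
img = image.imread('test.jpg')

# convert to a CHW float tensor and normalize (assumed CIFAR10 statistics)
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010]),
])
x = transform(img).expand_dims(axis=0)

# forward pass: one score per CIFAR10 class
pred = net(x)
class_id = int(mx.nd.argmax(pred, axis=1).asscalar())
print('predicted class index:', class_id)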

Object Detection

The following table lists pre-trained models for object detection and their performances.

Hint

Model attributes are coded in their names. For instance, ssd_300_vgg16_atrous_voc consists of four parts (a short loading example follows the list):

  • ssd indicates the algorithm is “Single Shot Multibox Object Detection” [5].
  • 300 is the training image size, which means training images are resized to 300x300 and all anchor boxes are designed to match this shape.
  • vgg16_atrous is the type of base feature extractor network.
  • voc is the training dataset.
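
Given this naming scheme, a detector can be loaded by name and run in a few lines. The snippet below is a minimal sketch, assuming a local image named test.jpg (a placeholder filename) and the load_test preset transform from gluoncv.data.transforms.presets.ssd:

from gluoncv import model_zoo, data

# load a pre-trained SSD detector by name
net = model_zoo.get_model('ssd_512_resnet50_v1_voc', pretrained=True)

# 'test.jpg' is a placeholder filename; load_test resizes the shorter edge to 512
# and returns both the network input and the resized image for visualization
x, img = data.transforms.presets.ssd.load_test('test.jpg', short=512)

# forward pass: class indices, confidence scores and corner-format bounding boxes
class_ids, scores, bounding_boxes = net(x)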

Hint

The training commands work with the following scripts:

Model                        | mAP  | Training Command | Training Log
ssd_300_vgg16_atrous_voc [5] | 77.6 | shell script     | log
ssd_512_vgg16_atrous_voc [5] | 79.2 | shell script     | log
ssd_512_resnet50_v1_voc [5]  | 80.1 | shell script     |

Semantic Segmentation

The following table lists pre-trained models for semantic segmentation and their performance.

Hint

The model names contain the training information. For instance, fcn_resnet50_voc breaks down into three parts (a short loading example follows the list):

  • fcn indicates the algorithm is “Fully Convolutional Network for Semantic Segmentation” [6].
  • resnet50 is the name of the backbone network.
  • voc is the training dataset.
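
Following the same naming scheme, a segmentation model can be loaded and applied to an image. The snippet below is a minimal sketch, assuming a local image named test.jpg (a placeholder filename), the usual ImageNet normalization statistics, and that the first element of the network output is the main per-pixel score map:

import mxnet as mx
from mxnet import image
from mxnet.gluon.data.vision import transforms
from gluoncv import model_zoo

# load a pre-trained FCN by name
model = model_zoo.get_model('fcn_resnet50_voc', pretrained=True)

# 'test.jpg' is a placeholder filename; normalize with the usual ImageNet statistics
img = image.imread('test.jpg')
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
x = transform(img).expand_dims(axis=0)

# the first output is assumed to be the main per-pixel class score map; argmax over
# the class axis gives a predicted mask (its spatial size follows the model's
# configured crop size, so it may need resizing back to the input resolution)
output = model(x)[0]
mask = mx.nd.argmax(output, axis=1).squeeze().asnumpy()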

The training commands work with the script train.py:

Name              | Method  | mIoU | Training Command | Training Log
fcn_resnet50_voc  | FCN [6] | 69.4 | shell script     | log
fcn_resnet101_voc | FCN [6] | 70.9 | shell script     | log
[1] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Deep residual learning for image recognition.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778. 2016.
[2] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Identity mappings in deep residual networks.” In European Conference on Computer Vision, pp. 630-645. Springer, Cham, 2016.
[3] Zagoruyko, Sergey, and Nikos Komodakis. “Wide residual networks.” arXiv preprint arXiv:1605.07146 (2016).
[4] Zhang, Hongyi, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. “mixup: Beyond empirical risk minimization.” arXiv preprint arXiv:1710.09412 (2017).
[5] Liu, Wei, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. “SSD: Single Shot MultiBox Detector.” In European Conference on Computer Vision (ECCV), 2016.
[6] Long, Jonathan, Evan Shelhamer, and Trevor Darrell. “Fully convolutional networks for semantic segmentation.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.