Vision Model Zoo

The GluonCV Model Zoo, similar to the upstream Gluon Model Zoo, provides pre-defined and pre-trained models to help bootstrap computer vision applications.

Model Zoo API

from gluoncv import model_zoo
# load a ResNet model trained on CIFAR10
cifar_resnet20 = model_zoo.get_model('cifar_resnet20_v1', pretrained=True)
# load a pre-trained ssd model
ssd0 = model_zoo.get_model('ssd_300_vgg16_atrous_voc', pretrained=True)
# load ssd model with pre-trained feature extractors
ssd1 = model_zoo.get_model('ssd_512_vgg16_atrous_voc', pretrained_base=True)
# load ssd model without initialization
ssd2 = model_zoo.get_model('ssd_512_resnet50_v1_voc', pretrained_base=False)

We recommend using gluoncv.model_zoo.get_model() for loading pre-defined models, because it provides name checking and lists the available choices.
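
For example, the name check means a typo fails fast instead of silently building the wrong network. The snippet below is a minimal sketch, assuming that an unknown name raises a ValueError whose message lists the supported models:

from gluoncv import model_zoo
# an unknown name is assumed to raise a ValueError listing the supported models
try:
    net = model_zoo.get_model('cifar_resnet20', pretrained=True)  # typo: missing the '_v1' suffix
except ValueError as e:
    print(e)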

However, you can still load a model by instantiating it directly, for example:

from gluoncv import model_zoo
cifar_resnet20 = model_zoo.cifar_resnet20_v1(pretrained=True)

Hint

Detailed model_zoo APIs are available in the API reference for gluoncv.model_zoo.

Summary of Available Models

GluonCV is still under development; more models will be added later.

Image Classification

The following table lists pre-trained models trained on CIFAR10. For models trained on ImageNet, please refer to upstream Gluon Model Zoo.

Hint

Our pre-trained models reproduce results from “Mix-Up” [4]. Please check the reference paper for further information.

Training commands in the table work with the following scripts:

Model                     | Acc (Vanilla / Mix-Up [4]) | Training Command | Training Log
CIFAR_ResNet20_v1 [1]     | 90.8 / 91.6                | Vanilla / Mix-Up | Vanilla / Mix-Up
CIFAR_ResNet56_v1 [1]     | 92.8 / 93.8                | Vanilla / Mix-Up | Vanilla / Mix-Up
CIFAR_ResNet110_v1 [1]    | 93.4 / 94.7                | Vanilla / Mix-Up | Vanilla / Mix-Up
CIFAR_ResNet20_v2 [2]     | 90.8 / 91.3                | Vanilla / Mix-Up | Vanilla / Mix-Up
CIFAR_ResNet56_v2 [2]     | 93.1 / 94.1                | Vanilla / Mix-Up | Vanilla / Mix-Up
CIFAR_ResNet110_v2 [2]    | 93.7 / 94.6                | Vanilla / Mix-Up | Vanilla / Mix-Up
CIFAR_WideResNet16_10 [3] | 95.1 / 96.1                | Vanilla / Mix-Up | Vanilla / Mix-Up
CIFAR_WideResNet28_10 [3] | 95.6 / 96.6                | Vanilla / Mix-Up | Vanilla / Mix-Up
CIFAR_WideResNet40_8 [3]  | 95.9 / 96.7                | Vanilla / Mix-Up | Vanilla / Mix-Up
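
Any of the models above can be used for prediction directly. The snippet below is a minimal sketch, assuming a local 32x32 RGB image named test.jpg (a placeholder filename) and the commonly used CIFAR10 normalization statistics:

import mxnet as mx
from mxnet import image
from mxnet.gluon.data.vision import transforms
from gluoncv import model_zoo

# load a pre-trained CIFAR10 classifier by name
net = model_zoo.get_model('cifar_resnet110_v1', pretrained=True)

# read a local image; 'test.jpg' is a placeholder filename
img = image.imread('test.jpg')

# convert to a CHW float tensor and normalize (assumed CIFAR10 statistics)
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010]),
])
x = transform(img).expand_dims(axis=0)

# forward pass: one score per CIFAR10 class
pred = net(x)
class_id = int(mx.nd.argmax(pred, axis=1).asscalar())
print('predicted class index:', class_id)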

Object Detection

The following table lists pre-trained models for object detection and their performances.

Hint

Model attributes are coded in their names. For instance, ssd_300_vgg16_atrous_voc consists of four parts (a short loading example follows the list):

  • ssd indicates the algorithm is “Single Shot Multibox Object Detection” [5].
  • 300 is the training image size, which means training images are resized to 300x300 and all anchor boxes are designed to match this shape.
  • vgg16_atrous is the type of base feature extractor network.
  • voc is the training dataset.
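
Given this naming scheme, a detector can be loaded by name and run in a few lines. The snippet below is a minimal sketch, assuming a local image named test.jpg (a placeholder filename) and the load_test preset transform from gluoncv.data.transforms.presets.ssd:

from gluoncv import model_zoo, data

# load a pre-trained SSD detector by name
net = model_zoo.get_model('ssd_512_resnet50_v1_voc', pretrained=True)

# 'test.jpg' is a placeholder filename; load_test resizes the shorter edge to 512
# and returns both the network input and the resized image for visualization
x, img = data.transforms.presets.ssd.load_test('test.jpg', short=512)

# forward pass: class indices, confidence scores and corner-format bounding boxes
class_ids, scores, bounding_boxes = net(x)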

Hint

The training commands work with the following scripts:

Model                        | mAP  | Training Command | Training Log
ssd_300_vgg16_atrous_voc [5] | 77.6 | shell script     | log
ssd_512_vgg16_atrous_voc [5] | 79.2 | shell script     | log
ssd_512_resnet50_v1_voc [5]  | 80.1 | shell script     |

Semantic Segmentation

The following table lists pre-trained models for semantic segmentation and their performance.

Hint

The model names contain the training information. For instance, fcn_resnet50_voc breaks down into three parts (a short loading example follows the list):

  • fcn indicates the algorithm is “Fully Convolutional Network for Semantic Segmentation” [6].
  • resnet50 is the name of the backbone network.
  • voc is the training dataset.
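
Following the same naming scheme, a segmentation model can be loaded and applied to an image. The snippet below is a minimal sketch, assuming a local image named test.jpg (a placeholder filename), the usual ImageNet normalization statistics, and that the first element of the network output is the main per-pixel score map:

import mxnet as mx
from mxnet import image
from mxnet.gluon.data.vision import transforms
from gluoncv import model_zoo

# load a pre-trained FCN by name
model = model_zoo.get_model('fcn_resnet50_voc', pretrained=True)

# 'test.jpg' is a placeholder filename; normalize with the usual ImageNet statistics
img = image.imread('test.jpg')
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
x = transform(img).expand_dims(axis=0)

# the first output is assumed to be the main per-pixel class score map; argmax over
# the class axis gives a predicted mask (its spatial size follows the model's
# configured crop size, so it may need resizing back to the input resolution)
output = model(x)[0]
mask = mx.nd.argmax(output, axis=1).squeeze().asnumpy()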

The training commands work with the script train.py:

Name              | Method  | mIoU | Training Command | Training Log
fcn_resnet50_voc  | FCN [6] | 69.4 | shell script     | log
fcn_resnet101_voc | FCN [6] | 70.9 | shell script     | log
[1] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Deep residual learning for image recognition.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778. 2016.
[2] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Identity mappings in deep residual networks.” In European Conference on Computer Vision, pp. 630-645. Springer, Cham, 2016.
[3] Zagoruyko, Sergey, and Nikos Komodakis. “Wide residual networks.” arXiv preprint arXiv:1605.07146 (2016).
[4] Zhang, Hongyi, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. “mixup: Beyond empirical risk minimization.” arXiv preprint arXiv:1710.09412 (2017).
[5] Liu, Wei, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. “SSD: Single Shot MultiBox Detector.” In European Conference on Computer Vision (ECCV), 2016.
[6] Long, Jonathan, Evan Shelhamer, and Trevor Darrell. “Fully convolutional networks for semantic segmentation.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.