Vision Model Zoo¶
GluonCV Model Zoo, similar to the upstream Gluon Model Zoo, provides pre-defined and pre-trained models to help bootstrap computer vision applications.
Model Zoo API¶
from gluoncv import model_zoo
# load a ResNet model trained on CIFAR10
cifar_resnet20 = model_zoo.get_model('cifar_resnet20_v1', pretrained=True)
# load a pre-trained ssd model
ssd0 = model_zoo.get_model('ssd_300_vgg16_atrous_voc', pretrained=True)
# load ssd model with pre-trained feature extractors
ssd1 = model_zoo.get_model('ssd_512_vgg16_atrous_voc', pretrained_base=True)
# load ssd model without initialization
ssd2 = model_zoo.get_model('ssd_512_resnet50_v1_voc', pretrained_base=False)
We recommend using gluoncv.model_zoo.get_model() for loading pre-defined models, because it provides name checking and lists the available choices (see the sketch after the snippet below). However, you can still load a model by instantiating it directly:
from gluoncv import model_zoo
cifar_resnet20 = model_zoo.cifar_resnet20_v1(pretrained=True)
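Name checking means a typo produces an error listing the valid options rather than failing silently. The snippet below is a minimal sketch of that behavior; it assumes get_model raises a ValueError for unknown names and that the get_model_list() helper is available in your GluonCV version.
from gluoncv import model_zoo
# An unknown model name is rejected with an error that lists the valid choices
# ('cifar_resnet20_v3' is a deliberately wrong name used only for illustration).
try:
    model_zoo.get_model('cifar_resnet20_v3', pretrained=True)
except ValueError as e:
    print(e)
# The registered model names can also be inspected directly.
print(model_zoo.get_model_list()[:5])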
Hint
Detailed model_zoo APIs are available in the API reference: gluoncv.model_zoo.
Summary of Available Models¶
GluonCV is still under development; more models will be added later.
Image Classification¶
The following table lists pre-trained models trained on CIFAR10. For models trained on ImageNet, please refer to the upstream Gluon Model Zoo.
Hint
Our pre-trained models reproduce results from “Mix-Up” [4]. Please check the reference paper for further information.
Training commands in the table work with the following scripts:
- For vanilla training: Download train_cifar10.py
- For mix-up training: Download train_mixup_cifar10.py
| Model | Accuracy % (Vanilla / Mix-Up [4]) | Training Command | Training Log |
| --- | --- | --- | --- |
| CIFAR_ResNet20_v1 [1] | 90.8 / 91.6 | Vanilla / Mix-Up | Vanilla / Mix-Up |
| CIFAR_ResNet56_v1 [1] | 92.8 / 93.8 | Vanilla / Mix-Up | Vanilla / Mix-Up |
| CIFAR_ResNet110_v1 [1] | 93.4 / 94.7 | Vanilla / Mix-Up | Vanilla / Mix-Up |
| CIFAR_ResNet20_v2 [2] | 90.8 / 91.3 | Vanilla / Mix-Up | Vanilla / Mix-Up |
| CIFAR_ResNet56_v2 [2] | 93.1 / 94.1 | Vanilla / Mix-Up | Vanilla / Mix-Up |
| CIFAR_ResNet110_v2 [2] | 93.7 / 94.6 | Vanilla / Mix-Up | Vanilla / Mix-Up |
| CIFAR_WideResNet16_10 [3] | 95.1 / 96.1 | Vanilla / Mix-Up | Vanilla / Mix-Up |
| CIFAR_WideResNet28_10 [3] | 95.6 / 96.6 | Vanilla / Mix-Up | Vanilla / Mix-Up |
| CIFAR_WideResNet40_8 [3] | 95.9 / 96.7 | Vanilla / Mix-Up | Vanilla / Mix-Up |
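As a quick sanity check, the sketch below loads one of the CIFAR10 models from the table and runs it on a random 32x32 input; in practice you would feed a properly normalized CIFAR10 image instead of random data.
import mxnet as mx
from mxnet import nd
from gluoncv import model_zoo
# Load a CIFAR10-pretrained classifier from the table above.
net = model_zoo.get_model('cifar_resnet20_v1', pretrained=True)
# CIFAR10 models expect 32x32 RGB inputs; a random batch is used here
# purely to illustrate the input and output shapes.
x = nd.random.uniform(shape=(1, 3, 32, 32))
pred = net(x)  # shape (1, 10): one score per CIFAR10 class
class_id = int(nd.argmax(pred, axis=1).asscalar())
print('predicted class index:', class_id)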
Object Detection¶
The following table lists pre-trained models for object detection and their performance.
Hint
Model attributes are coded in their names. For instance, ssd_300_vgg16_atrous_voc consists of four parts:
- ssd indicates the algorithm is “Single Shot Multibox Object Detection” [5].
- 300 is the training image size: training images are resized to 300x300, and all anchor boxes are designed to match this shape.
- vgg16_atrous is the type of base feature extractor network.
- voc is the training dataset.
| Model | mAP | Training Command | Training Log |
| --- | --- | --- | --- |
| ssd_300_vgg16_atrous_voc [5] | 77.6 | shell script | log |
| ssd_512_vgg16_atrous_voc [5] | 79.2 | shell script | log |
| ssd_512_resnet50_v1_voc [5] | 80.1 | shell script | |
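The detection models follow the same get_model interface. The sketch below is one way to run a pre-trained SSD detector on a single image; it assumes the SSD preset transform data.transforms.presets.ssd.load_test is available in your GluonCV version, and 'street.jpg' is a placeholder for your own image file.
from gluoncv import model_zoo, data
# Load one of the pre-trained detectors from the table above.
net = model_zoo.get_model('ssd_512_resnet50_v1_voc', pretrained=True)
# The preset transform resizes the image and normalizes it the way the
# model expects ('street.jpg' stands in for your own image path).
x, img = data.transforms.presets.ssd.load_test('street.jpg', short=512)
# SSD returns class indices, confidence scores, and corner-format boxes.
class_ids, scores, bboxes = net(x)
print(class_ids.shape, scores.shape, bboxes.shape)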
Semantic Segmentation¶
The following table lists pre-trained models for semantic segmentation and their performance.
Hint
The model names contain the training information. For instance, fcn_resnet50_voc consists of three parts:
- fcn indicates the algorithm is “Fully Convolutional Network for Semantic Segmentation” [6].
- resnet50 is the name of the backbone network.
- voc is the training dataset.
The training commands work with the script: Download train.py
| Name | Method | mIoU | Training Command | Training Log |
| --- | --- | --- | --- | --- |
| fcn_resnet50_voc | FCN [6] | 69.4 | shell script | log |
| fcn_resnet101_voc | FCN [6] | 70.9 | shell script | log |
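Segmentation models are loaded the same way. The following is a minimal inference sketch, assuming the segmentation test_transform preset is available and that the network's forward pass returns its main per-pixel logits as the first element of its output; 'example.jpg' is a placeholder for your own image.
import mxnet as mx
from gluoncv import model_zoo
from gluoncv.data.transforms.presets.segmentation import test_transform
# Load a VOC-pretrained FCN from the table above.
net = model_zoo.get_model('fcn_resnet50_voc', pretrained=True)
# Read and normalize an image ('example.jpg' is a placeholder path).
img = mx.image.imread('example.jpg')
x = test_transform(img, ctx=mx.cpu())
# The first output is assumed to hold the per-pixel class logits; the argmax
# over the channel axis turns them into a class-index mask.
output = net(x)[0]
mask = mx.nd.argmax(output, axis=1).squeeze().asnumpy()
print(mask.shape)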
[1] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Deep residual learning for image recognition.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778. 2016.
[2] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Identity mappings in deep residual networks.” In European Conference on Computer Vision, pp. 630-645. Springer, Cham, 2016.
[3] Zagoruyko, Sergey, and Nikos Komodakis. “Wide residual networks.” arXiv preprint arXiv:1605.07146 (2016).
[4] Zhang, Hongyi, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. “mixup: Beyond empirical risk minimization.” arXiv preprint arXiv:1710.09412 (2017).
[5] Liu, Wei, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. “SSD: Single Shot MultiBox Detector.” In European Conference on Computer Vision (ECCV). 2016.
[6] Long, Jonathan, Evan Shelhamer, and Trevor Darrell. “Fully convolutional networks for semantic segmentation.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.