Vision Model Zoo
GluonCV Model Zoo, similar to the upstream Gluon Model Zoo, provides pre-defined and pre-trained models to help bootstrap computer vision applications.
Model Zoo API
from gluoncv import model_zoo
# load a ResNet model trained on CIFAR10
cifar_resnet20 = model_zoo.get_model('cifar_resnet20_v1', pretrained=True)
# load a pre-trained ssd model
ssd0 = model_zoo.get_model('ssd_300_vgg16_atrous_voc', pretrained=True)
# load ssd model with pre-trained feature extractors
ssd1 = model_zoo.get_model('ssd_512_vgg16_atrous_voc', pretrained_base=True)
# load ssd model without initialization
ssd2 = model_zoo.get_model('ssd_512_resnet50_v1_voc', pretrained_base=False)
We recommend using gluoncv.model_zoo.get_model() for loading pre-defined models, because it provides name checking and lists the available choices.
However, you can still load a model by instantiating it directly:
from gluoncv import model_zoo
cifar_resnet20 = model_zoo.cifar_resnet20_v1(pretrained=True)
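Here get_model() checks the name you pass, so a typo fails fast instead of silently building an unintended network. A minimal sketch of both behaviors, assuming the get_model_list() helper for enumerating registered names (see the API reference below for the exact helpers available):
from gluoncv import model_zoo
# a misspelled name is expected to raise a ValueError describing valid options
try:
    net = model_zoo.get_model('cifar_resnet20', pretrained=True)
except ValueError as err:
    print(err)
# enumerate every registered model name (assumed helper)
print(model_zoo.get_model_list())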
Hint
Detailed model_zoo APIs are available in API reference: gluoncv.model_zoo().
Summary of Available Models
GluonCV is still under development; more models will be added later.
Image Classification
The following table lists pre-trained models trained on CIFAR10. For models trained on ImageNet, please refer to upstream Gluon Model Zoo.
Hint
Our pre-trained models reproduce results from “Mix-Up” [4]. Please check the reference paper for further information.
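Mix-Up trains on convex combinations of image pairs and their one-hot labels instead of on individual examples. The sketch below illustrates the idea only; it is not the exact recipe in train_mixup_cifar10.py, and the helper name and alpha value are chosen for illustration:
import numpy as np
import mxnet as mx

def mixup_batch(data, label, num_classes=10, alpha=1.0):
    # lam ~ Beta(alpha, alpha) controls how strongly two samples are blended
    lam = np.random.beta(alpha, alpha)
    # pair every image with another one drawn by permuting the batch
    index = mx.nd.array(np.random.permutation(data.shape[0]), ctx=data.context)
    # blend images and one-hot labels with the same coefficient
    y = mx.nd.one_hot(label, num_classes)
    mixed_data = lam * data + (1 - lam) * data.take(index)
    mixed_label = lam * y + (1 - lam) * y.take(index)
    return mixed_data, mixed_label
The loss is then computed against the soft mixed_label, for example with gluon.loss.SoftmaxCrossEntropyLoss(sparse_label=False).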
Training commands in the table work with the following scripts:
- For vanilla training: Download train_cifar10.py
- For mix-up training: Download train_mixup_cifar10.py
| Model | Acc (Vanilla / Mix-Up [4]) | Training Command | Training Log |
|---|---|---|---|
| CIFAR_ResNet20_v1 [1] | 90.8 / 91.6 | Vanilla / Mix-Up | Vanilla / Mix-Up |
| CIFAR_ResNet56_v1 [1] | 92.8 / 93.8 | Vanilla / Mix-Up | Vanilla / Mix-Up |
| CIFAR_ResNet110_v1 [1] | 93.4 / 94.7 | Vanilla / Mix-Up | Vanilla / Mix-Up |
| CIFAR_ResNet20_v2 [2] | 90.8 / 91.3 | Vanilla / Mix-Up | Vanilla / Mix-Up |
| CIFAR_ResNet56_v2 [2] | 93.1 / 94.1 | Vanilla / Mix-Up | Vanilla / Mix-Up |
| CIFAR_ResNet110_v2 [2] | 93.7 / 94.6 | Vanilla / Mix-Up | Vanilla / Mix-Up |
| CIFAR_WideResNet16_10 [3] | 95.1 / 96.1 | Vanilla / Mix-Up | Vanilla / Mix-Up |
| CIFAR_WideResNet28_10 [3] | 95.6 / 96.6 | Vanilla / Mix-Up | Vanilla / Mix-Up |
| CIFAR_WideResNet40_8 [3] | 95.9 / 96.7 | Vanilla / Mix-Up | Vanilla / Mix-Up |
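Any of the CIFAR10 models above classifies a 32x32 RGB image with a plain forward pass. A minimal sketch; the normalization constants are the usual CIFAR10 statistics, and the image file name is a placeholder:
import mxnet as mx
from mxnet.gluon.data.vision import transforms
from gluoncv import model_zoo

net = model_zoo.get_model('cifar_resnet110_v1', pretrained=True)

# standard CIFAR10 evaluation transform: tensorize, then normalize
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.4914, 0.4822, 0.4465],
                         [0.2023, 0.1994, 0.2010])])

img = mx.image.imread('airplane_32x32.png')   # placeholder 32x32 image
x = transform(img).expand_dims(axis=0)        # add the batch dimension
pred = net(x)
print('predicted class id:', int(mx.nd.argmax(pred, axis=1).asscalar()))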
Object Detection
The following table lists pre-trained models for object detection and their performance.
Hint
Model attributes are coded in their names.
For instance, ssd_300_vgg16_atrous_voc consists of four parts:
- ssd indicates the algorithm is “Single Shot Multibox Object Detection” [5].
- 300 is the training image size, which means training images are resized to 300x300 and all anchor boxes are designed to match this shape.
- vgg16_atrous is the base feature extractor network.
- voc is the training dataset.
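For example, a pre-trained detector from the table below can be loaded by name and run on a single image. A rough sketch, assuming the gluoncv.data.transforms.presets.ssd.load_test and gluoncv.utils.viz.plot_bbox helpers; the image path is a placeholder:
from gluoncv import model_zoo, data, utils
import matplotlib.pyplot as plt

net = model_zoo.get_model('ssd_512_resnet50_v1_voc', pretrained=True)

# the SSD preset resizes and normalizes the image; x is the network input,
# img keeps the resized image around for plotting
x, img = data.transforms.presets.ssd.load_test('street.jpg', short=512)

# the detector returns class ids, confidence scores and corner boxes
class_ids, scores, bboxes = net(x)

utils.viz.plot_bbox(img, bboxes[0], scores[0], class_ids[0],
                    class_names=net.classes)
plt.show()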
| Model | mAP | Training Command | Training log |
|---|---|---|---|
| ssd_300_vgg16_atrous_voc [5] | 77.6 | shell script | log |
| ssd_512_vgg16_atrous_voc [5] | 79.2 | shell script | log |
| ssd_512_resnet50_v1_voc [5] | 80.1 | shell script | |
Semantic Segmentation
The following table lists pre-trained models for semantic segmentation and their performance.
Hint
The model names contain the training information. For instance, fcn_resnet50_voc:
- fcn indicates the algorithm is “Fully Convolutional Network for Semantic Segmentation” [6].
- resnet50 is the name of the backbone network.
- voc is the training dataset.
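Segmentation models are loaded and applied the same way as the models above. A minimal sketch, assuming the segmentation test_transform preset and that the forward pass returns a tuple of score maps with the main head first; the image path is a placeholder:
import mxnet as mx
from mxnet import image
from gluoncv import model_zoo
from gluoncv.data.transforms.presets.segmentation import test_transform

ctx = mx.cpu()
model = model_zoo.get_model('fcn_resnet50_voc', pretrained=True, ctx=ctx)

# read the image and normalize it into a 1x3xHxW tensor
img = image.imread('street.jpg')
x = test_transform(img, ctx)

# forward pass; taking argmax over the class axis of the main score map
# yields a per-pixel map of VOC class indices
outputs = model(x)
pred = mx.nd.squeeze(mx.nd.argmax(outputs[0], axis=1)).asnumpy()
print(pred.shape)   # (H, W)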
The training commands work with the script: Download train.py
| Name | Method | mIoU | Training Command | Training log |
|---|---|---|---|---|
| fcn_resnet50_voc | FCN [6] | 69.4 | shell script | log |
| fcn_resnet101_voc | FCN [6] | 70.9 | shell script | log |
[1] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Deep residual learning for image recognition.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778. 2016.
[2] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Identity mappings in deep residual networks.” In European Conference on Computer Vision, pp. 630-645. Springer, Cham, 2016.
[3] Zagoruyko, Sergey, and Nikos Komodakis. “Wide residual networks.” arXiv preprint arXiv:1605.07146 (2016).
[4] Zhang, Hongyi, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. “mixup: Beyond empirical risk minimization.” arXiv preprint arXiv:1710.09412 (2017).
[5] Liu, Wei, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. “SSD: Single shot multibox detector.” In European Conference on Computer Vision (ECCV), 2016.
[6] Long, Jonathan, Evan Shelhamer, and Trevor Darrell. “Fully convolutional networks for semantic segmentation.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.