Prepare PASCAL VOC datasets

Pascal VOC is a collection of datasets for object detection. The most common benchmarking setup trains on the combined 2007 trainval and 2012 trainval sets and validates on 2007 test. This tutorial walks you through preparing this dataset for use with GluonVision.

http://host.robots.ox.ac.uk/pascal/VOC/pascal2.png

Prepare the dataset

The easiest way is to run this script:

Download Pascal VOC Prepare Script: pascal_voc.py

which will automatically download and extract the data into ~/.mxnet/datasets/voc.

python pascal_voc.py

Note

You need 8.4 GB disk space to download and extract this dataset. SSD is preferred over HDD because of its better performance.

Note

The total time to prepare the dataset depends on your Internet speed and disk performance. For example, it often takes about 10 minutes on AWS EC2 with EBS.
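To check the disk space requirement from the note above before starting the download, something like the following may help. This is only a sketch: the helper name `has_enough_space` is hypothetical, the 8.4 GB figure is read as GiB to stay on the conservative side, and the target path is the default extraction folder mentioned earlier.

```python
import shutil
from pathlib import Path

# 8.4 GB from the note above, interpreted as GiB (an assumption, to be conservative).
REQUIRED_BYTES = int(8.4 * 1024 ** 3)


def has_enough_space(path, required=REQUIRED_BYTES):
    """Return True if the filesystem holding `path` has at least `required` free bytes.

    Hypothetical helper, not part of pascal_voc.py.
    """
    path = Path(path).expanduser()
    # The target folder may not exist yet, so walk up to the nearest existing parent.
    while not path.exists():
        path = path.parent
    return shutil.disk_usage(path).free >= required


if __name__ == "__main__":
    target = "~/.mxnet/datasets/voc"
    print("Enough free space:", has_enough_space(target))
```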

If you have already downloaded the following required files

Filename	Size	SHA-1
VOCtrainval_06-Nov-2007.tar	439 MB	34ed68851bce2a36e2a223fa52c661d592c66b3c
VOCtest_06-Nov-2007.tar	430 MB	41a8d6e12baa5ab18ee7f8f8029b9e11805b4ef1
VOCtrainval_11-May-2012.tar	1.9 GB	4e443f8a2eca6b1dac8a6c57641b67dd40621a49
benchmark.tgz	1.4 GB	7129e0a480c2d6afb02b517bb18ac54283bfaa35

then you can specify the folder name through --dir to avoid downloading them again.

For example, make sure these files exist in ~/VOCdevkit/downloads, then run

python pascal_voc.py --dir ~/VOCdevkit

to extract them.
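Before reusing previously downloaded archives, it can be worth comparing them against the SHA-1 values listed in the table above. The sketch below uses only the Python standard library; the helper names `sha1_of` and `check_downloads` are hypothetical and not part of pascal_voc.py, and the folder is the example path used above.

```python
import hashlib
from pathlib import Path

# SHA-1 values copied from the table above.
EXPECTED_SHA1 = {
    "VOCtrainval_06-Nov-2007.tar": "34ed68851bce2a36e2a223fa52c661d592c66b3c",
    "VOCtest_06-Nov-2007.tar": "41a8d6e12baa5ab18ee7f8f8029b9e11805b4ef1",
    "VOCtrainval_11-May-2012.tar": "4e443f8a2eca6b1dac8a6c57641b67dd40621a49",
    "benchmark.tgz": "7129e0a480c2d6afb02b517bb18ac54283bfaa35",
}


def sha1_of(path, chunk_size=1 << 20):
    """Return the hex SHA-1 of a file, reading it in chunks to keep memory use low."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_downloads(folder, expected=EXPECTED_SHA1):
    """Map each expected filename to 'ok', 'missing', or 'mismatch'."""
    folder = Path(folder).expanduser()
    status = {}
    for name, sha1 in expected.items():
        path = folder / name
        if not path.is_file():
            status[name] = "missing"
        elif sha1_of(path) == sha1:
            status[name] = "ok"
        else:
            status[name] = "mismatch"
    return status


if __name__ == "__main__":
    for name, state in check_downloads("~/VOCdevkit/downloads").items():
        print(f"{name}: {state}")
```

A file reported as "mismatch" is probably truncated or corrupted and is best downloaded again.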

How to load the dataset

Loading images and labels from Pascal VOC is straightforward:

from gluonvision.data import VOCDetection
train_dataset = VOCDetection(splits=[(2007, 'trainval'), (2012, 'trainval')])
val_dataset = VOCDetection(splits=[(2007, 'test')])
print('Training images:', len(train_dataset))
print('Validation images:', len(val_dataset))

Out:

Training images: 16551
Validation images: 4952

Check the first example

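train_image, train_label = train_dataset[0]
# Columns 0-3 of each label row hold the bounding box
bboxes = train_label[:, :4]
# Column 4 holds the class id; slicing with 4:5 keeps it as a column of shape (N, 1)
cids = train_label[:, 4:5]
print('image size:', train_image.shape)
print('bboxes:', bboxes.shape, 'class ids:', cids.shape)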

from matplotlib import pyplot as plt
from gluonvision.utils import viz
ax = viz.plot_bbox(train_image.asnumpy(), bboxes, scores=None, labels=cids, class_names=train_dataset.classes)
plt.show()

Out:

image size: (375, 500, 3)
bboxes: (5, 4) class ids: (5, 1)
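To make the label layout above more concrete, the following sketch turns label rows into readable (class name, box) pairs without needing the dataset on disk. It assumes the class list follows the standard alphabetical ordering of the 20 VOC categories, which may differ from what train_dataset.classes returns in your version. The helper `describe_objects` is hypothetical, and the example rows are made-up values, not taken from the dataset.

```python
# The 20 Pascal VOC categories in alphabetical order
# (assumed to match train_dataset.classes; check against your installation).
VOC_CLASSES = (
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
)


def describe_objects(label, class_names=VOC_CLASSES):
    """Turn label rows into (class name, box) pairs.

    Each row is expected to hold four box coordinates in columns 0-3
    and the class id in column 4, mirroring the slicing used above.
    """
    objects = []
    for row in label:
        box = tuple(float(v) for v in row[:4])
        class_id = int(row[4])
        objects.append((class_names[class_id], box))
    return objects


# Made-up label rows for illustration only.
example_label = [
    [48.0, 240.0, 195.0, 371.0, 11.0],
    [8.0, 12.0, 352.0, 498.0, 14.0],
]

for name, box in describe_objects(example_label):
    print(name, box)
```

The same function should also accept a NumPy array, for instance the result of train_label.asnumpy(), since it only iterates over rows and indexes into them.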
