gluoncv.data.transforms¶
This module includes various transformations that are critical to vision tasks.
Bounding Box Transforms¶
Bounding box transformation functions.
- gluoncv.data.transforms.bbox.crop(bbox, crop_box=None, allow_outside_center=True)¶
Crop bounding boxes according to slice area.
This method is mainly used with image cropping to ensure bounding boxes fit within the cropped image.
Parameters: - bbox (numpy.ndarray) – Numpy.ndarray with shape (N, 4+), where N is the number of bounding boxes. The second axis represents attributes of the bounding box; specifically, these are \((x_{min}, y_{min}, x_{max}, y_{max})\). Additional attributes beyond the coordinates are allowed and stay intact during bounding box transformations.
- crop_box (tuple) – Tuple of length 4: \((x_{min}, y_{min}, width, height)\).
- allow_outside_center (bool) – If False, remove bounding boxes whose centers are outside the cropping area.
Returns: Cropped bounding boxes with shape (M, 4+), where M <= N.
Return type: numpy.ndarray
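Example (a minimal sketch; the box coordinates and crop window below are illustrative):
>>> import numpy as np
>>> from gluoncv.data.transforms import bbox as tbbox
>>> boxes = np.array([[10., 20., 200., 300.], [50., 60., 400., 500.]])
>>> # crop_box is (x_min, y_min, width, height)
>>> cropped = tbbox.crop(boxes, crop_box=(0, 0, 250, 350))
>>> print(cropped.shape)  # (M, 4) with M <= 2; boxes that fall outside the window may be dropped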
- gluoncv.data.transforms.bbox.flip(bbox, size, flip_x=False, flip_y=False)¶
Flip bounding boxes according to image flipping directions.
Parameters: - bbox (numpy.ndarray) – Numpy.ndarray with shape (N, 4+), where N is the number of bounding boxes. The second axis represents attributes of the bounding box; specifically, these are \((x_{min}, y_{min}, x_{max}, y_{max})\). Additional attributes beyond the coordinates are allowed and stay intact during bounding box transformations.
- size (tuple) – Tuple of length 2: (width, height).
- flip_x (bool) – Whether to flip horizontally.
- flip_y (bool) – Whether to flip vertically.
Returns: Flipped bounding boxes with original shape.
Return type: numpy.ndarray
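Example (a minimal sketch for a horizontally flipped 640x480 image; values are illustrative):
>>> import numpy as np
>>> from gluoncv.data.transforms import bbox as tbbox
>>> boxes = np.array([[10., 20., 200., 300.]])
>>> flipped = tbbox.flip(boxes, size=(640, 480), flip_x=True)
>>> # x coordinates are mirrored about the image width; y coordinates are unchanged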
- gluoncv.data.transforms.bbox.resize(bbox, in_size, out_size)¶
Resize bounding boxes according to image resize operation.
Parameters: - bbox (numpy.ndarray) – Numpy.ndarray with shape (N, 4+), where N is the number of bounding boxes. The second axis represents attributes of the bounding box; specifically, these are \((x_{min}, y_{min}, x_{max}, y_{max})\). Additional attributes beyond the coordinates are allowed and stay intact during bounding box transformations.
- in_size (tuple) – Tuple of length 2: (width, height) for input.
- out_size (tuple) – Tuple of length 2: (width, height) for output.
Returns: Resized bounding boxes with original shape.
Return type: numpy.ndarray
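Example (a minimal sketch; the sizes below are illustrative):
>>> import numpy as np
>>> from gluoncv.data.transforms import bbox as tbbox
>>> boxes = np.array([[10., 20., 200., 300.]])
>>> resized = tbbox.resize(boxes, in_size=(640, 480), out_size=(320, 240))
>>> # x coordinates are scaled by 320/640 = 0.5 and y coordinates by 240/480 = 0.5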
- gluoncv.data.transforms.bbox.translate(bbox, x_offset=0, y_offset=0)¶
Translate bounding boxes by offsets.
Parameters: - bbox (numpy.ndarray) – Numpy.ndarray with shape (N, 4+), where N is the number of bounding boxes. The second axis represents attributes of the bounding box; specifically, these are \((x_{min}, y_{min}, x_{max}, y_{max})\). Additional attributes beyond the coordinates are allowed and stay intact during bounding box transformations.
- x_offset (int or float) – Offset along the x axis.
- y_offset (int or float) – Offset along the y axis.
Returns: Translated bounding boxes with original shape.
Return type: numpy.ndarray
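Example (a minimal sketch; the offsets are illustrative):
>>> import numpy as np
>>> from gluoncv.data.transforms import bbox as tbbox
>>> boxes = np.array([[10., 20., 200., 300.]])
>>> shifted = tbbox.translate(boxes, x_offset=-5, y_offset=8)
>>> # all x coordinates move by -5 and all y coordinates by +8; extra columns stay intact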
Experimental¶
Experimental bounding box transformations.
- gluoncv.data.transforms.experimental.bbox.random_crop_with_constraints(bbox, size, min_scale=0.3, max_scale=1, max_aspect_ratio=2, constraints=None, max_trial=50)¶
Crop an image randomly with bounding box constraints.
This data augmentation is used in training of Single Shot Multibox Detector [1]. More details can be found in the data augmentation section of the original paper.
[1] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. ECCV 2016.
Parameters: - bbox (numpy.ndarray) – Numpy.ndarray with shape (N, 4+), where N is the number of bounding boxes. The second axis represents attributes of the bounding box; specifically, these are \((x_{min}, y_{min}, x_{max}, y_{max})\). Additional attributes beyond the coordinates are allowed and stay intact during bounding box transformations.
- size (tuple) – Tuple of length 2 of image shape as (width, height).
- min_scale (float) – The minimum ratio between a cropped region and the original image. The default value is 0.3.
- max_scale (float) – The maximum ratio between a cropped region and the original image. The default value is 1.
- max_aspect_ratio (float) – The maximum aspect ratio of the cropped region. The default value is 2.
- constraints (iterable of tuples) – An iterable of constraints, each in (min_iou, max_iou) format. Setting min_iou or max_iou to None means no constraint on that bound. If this argument is None, ((0.1, None), (0.3, None), (0.5, None), (0.7, None), (0.9, None), (None, 1)) is used.
- max_trial (int) – Maximum number of trials for each constraint before giving up no matter what.
Returns: - numpy.ndarray – Cropped bounding boxes with shape (M, 4+), where M <= N.
- tuple – Tuple of length 4 as (x_offset, y_offset, new_width, new_height).
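Example (a minimal sketch; the boxes are illustrative, and mx.image.fixed_crop is used here only as one way to apply the returned window to the image):
>>> import numpy as np
>>> import mxnet as mx
>>> from gluoncv.data.transforms.experimental.bbox import random_crop_with_constraints
>>> img = mx.random.uniform(0, 255, (480, 640, 3)).astype('uint8')
>>> boxes = np.array([[10., 20., 200., 300.], [50., 60., 400., 450.]])
>>> new_boxes, crop = random_crop_with_constraints(boxes, size=(640, 480))
>>> x0, y0, new_w, new_h = crop  # (x_offset, y_offset, new_width, new_height)
>>> cropped_img = mx.image.fixed_crop(img, x0, y0, new_w, new_h)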
Image Transforms¶
Extended image transformations that complement mxnet.image.
- gluoncv.data.transforms.image.imresize(src, w, h, interp=1)¶
Resize image with OpenCV.
This is a duplicate of mxnet.image.imresize for namespace consistency.
Parameters: - src (mxnet.nd.NDArray) – The original image with HWC format.
- w (int) – Width of the resized image.
- h (int) – Height of the resized image.
- interp (int) – Interpolation method, same options as mxnet.image.imresize.
Returns: out – The resized image.
Return type: NDArray or list of NDArrays
Examples
>>> import mxnet as mx
>>> from gluoncv import data as gdata
>>> img = mx.random.uniform(0, 255, (300, 300, 3)).astype('uint8')
>>> print(img.shape)
(300, 300, 3)
>>> img = gdata.transforms.image.imresize(img, 200, 200)
>>> print(img.shape)
(200, 200, 3)
- gluoncv.data.transforms.image.random_pca_lighting(src, alphastd, eigval=None, eigvec=None)¶
Apply random PCA lighting noise to the input image.
Parameters: - src (mxnet.nd.NDArray) – Input image with HWC format.
- alphastd (float) – Noise level [0, 1) for image with range [0, 255].
- eigval (list of floats) – Eigen values, defaults to [55.46, 4.794, 1.148].
- eigvec (nested lists of floats) – Eigen vectors with shape (3, 3), defaults to [[-0.5675, 0.7192, 0.4009], [-0.5808, -0.0045, -0.8140], [-0.5836, -0.6948, 0.4203]].
Returns: Augmented image.
Return type: mxnet.nd.NDArray
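Example (a minimal sketch on a random image):
>>> import mxnet as mx
>>> from gluoncv.data.transforms import image as timage
>>> img = mx.random.uniform(0, 255, (300, 300, 3))
>>> out = timage.random_pca_lighting(img, alphastd=0.1)
>>> print(out.shape)  # (300, 300, 3), same HWC shape as the input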
- gluoncv.data.transforms.image.random_expand(src, max_ratio=4, fill=0, keep_ratio=True)¶
Randomly expand the original image with borders; this is identical to placing the original image on a larger canvas.
Parameters: - src (mxnet.nd.NDArray) – The original image with HWC format.
- max_ratio (int or float) – Maximum ratio of the output image size to the input size in both directions (vertical and horizontal).
- fill (int or float or array-like) – The value(s) for the padded borders. If fill is a numerical type, RGB channels are padded with that single value. Otherwise, fill must have the same length as the number of image channels, resulting in per-channel padding values.
- keep_ratio (bool) – If True, keep the output image at the same aspect ratio as the input.
Returns: - mxnet.nd.NDArray – Augmented image.
- tuple – Tuple of (offset_x, offset_y, new_width, new_height)
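Example (a minimal sketch; the returned offsets can later be passed to bbox.translate so the boxes follow the image onto the larger canvas):
>>> import mxnet as mx
>>> from gluoncv.data.transforms import image as timage
>>> img = mx.random.uniform(0, 255, (300, 300, 3)).astype('uint8')
>>> expanded, (x_off, y_off, new_w, new_h) = timage.random_expand(img, max_ratio=2, fill=127)
>>> # the original image sits at (x_off, y_off) inside a canvas of shape (new_h, new_w, 3)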
- gluoncv.data.transforms.image.random_flip(src, px=0, py=0, copy=False)¶
Randomly flip the image horizontally and/or vertically with the given probabilities.
Parameters: - src (mxnet.nd.NDArray) – The original image with HWC format.
- px (float) – Probability of flipping the image horizontally.
- py (float) – Probability of flipping the image vertically.
- copy (bool) – If True, force a copy of the input image.
Returns: - mxnet.nd.NDArray – Augmented image.
- tuple – Tuple of (flip_x, flip_y), records of whether flips are applied.
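Example (a minimal sketch; the returned flags can be replayed on bounding boxes with bbox.flip):
>>> import mxnet as mx
>>> from gluoncv.data.transforms import image as timage
>>> img = mx.random.uniform(0, 255, (300, 300, 3)).astype('uint8')
>>> out, (flip_x, flip_y) = timage.random_flip(img, px=0.5, py=0)
>>> # flip_x records whether the horizontal flip was actually applied, so the same
>>> # decision can be mirrored on boxes via bbox.flip(boxes, (300, 300), flip_x, flip_y)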
- gluoncv.data.transforms.image.resize_contain(src, size, fill=0)¶
Resize the image to fit in the given area while keeping the aspect ratio.
If both the height and the width in size are larger than those of the input image, the image is placed at the center with appropriate padding to match size. Otherwise, the input image is scaled to fit in a canvas whose size is size while preserving the aspect ratio.
Parameters: - src (mxnet.nd.NDArray) – The original image with HWC format.
- size (tuple) – Tuple of length 2 as (width, height).
- fill (int or float or array-like) – The value(s) for the padded borders. If fill is a numerical type, RGB channels are padded with that single value. Otherwise, fill must have the same length as the number of image channels, resulting in per-channel padding values.
Returns: - mxnet.nd.NDArray – Augmented image.
- tuple – Tuple of (offset_x, offset_y, scaled_x, scaled_y)
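Example (a minimal sketch of padding a 300x300 image onto a 512x512 canvas):
>>> import mxnet as mx
>>> from gluoncv.data.transforms import image as timage
>>> img = mx.random.uniform(0, 255, (300, 300, 3)).astype('uint8')
>>> out, (x_off, y_off, scaled_w, scaled_h) = timage.resize_contain(img, (512, 512), fill=0)
>>> print(out.shape)  # (512, 512, 3)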
- gluoncv.data.transforms.image.ten_crop(src, size)¶
Crop 10 regions from an array. This is performed the same way as: http://chainercv.readthedocs.io/en/stable/reference/transforms.html#ten-crop
This method crops 10 regions, all of shape size. These regions consist of 1 center crop, 4 corner crops, and horizontal flips of all of them. The crops are ordered as follows:
* center crop
* top-left crop
* bottom-left crop
* top-right crop
* bottom-right crop
* center crop (flipped horizontally)
* top-left crop (flipped horizontally)
* bottom-left crop (flipped horizontally)
* top-right crop (flipped horizontally)
* bottom-right crop (flipped horizontally)
Parameters: - src (mxnet.nd.NDArray) – Input image.
- size (tuple) – Tuple of length 2, as (width, height) of the cropped areas.
Returns: The cropped images with shape (10, size[1], size[0], C)
Return type: mxnet.nd.NDArray
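Example (a minimal sketch of ten-crop on a 300x300 input):
>>> import mxnet as mx
>>> from gluoncv.data.transforms import image as timage
>>> img = mx.random.uniform(0, 255, (300, 300, 3)).astype('uint8')
>>> crops = timage.ten_crop(img, (224, 224))
>>> print(crops.shape)  # (10, 224, 224, 3): center, four corners, and their horizontal flips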
Experimental¶
Experimental image transformations.
- gluoncv.data.transforms.experimental.image.random_color_distort(src, brightness_delta=32, contrast_low=0.5, contrast_high=1.5, saturation_low=0.5, saturation_high=1.5, hue_delta=18)¶
Randomly distort the image color space. Note that the input image should be in the original range [0, 255].
Parameters: - src (mxnet.nd.NDArray) – Input image as HWC format.
- brightness_delta (int) – Maximum brightness delta. Defaults to 32.
- contrast_low (float) – Lowest contrast. Defaults to 0.5.
- contrast_high (float) – Highest contrast. Defaults to 1.5.
- saturation_low (float) – Lowest saturation. Defaults to 0.5.
- saturation_high (float) – Highest saturation. Defaults to 1.5.
- hue_delta (int) – Maximum hue delta. Defaults to 18.
Returns: Distorted image in HWC format.
Return type: mxnet.nd.NDArray
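Example (a minimal sketch; the input is kept in the required [0, 255] range):
>>> import mxnet as mx
>>> from gluoncv.data.transforms.experimental.image import random_color_distort
>>> img = mx.random.uniform(0, 255, (300, 300, 3))
>>> out = random_color_distort(img)
>>> print(out.shape)  # (300, 300, 3), still HWC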
Preset Transforms¶
We include presets for reproducing the state-of-the-art results described in different papers. This is a complementary section and its APIs are subject to change.
Single Shot Multibox Object Detector¶
Transforms described in https://arxiv.org/abs/1512.02325.
- gluoncv.data.transforms.presets.ssd.load_test(filenames, short, max_size=1024, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))¶
A utility function to load images and transform them to tensors by applying normalization. This function supports a single filename or a list of filenames.
Parameters: - filenames (str or list of str) – Image filename(s) to be loaded.
- short (int) – Resize the shorter side of the image to this value and keep the aspect ratio.
- max_size (int, optional) – Maximum length of the longer side, to limit the input image shape. The aspect ratio is kept intact because our SSD implementation supports arbitrary input sizes.
- mean (iterable of float) – Mean pixel values.
- std (iterable of float) – Standard deviations of pixel values.
Returns: A (1, 3, H, W) mxnet NDArray as input to the network, and a numpy ndarray as the original un-normalized color image for display. If multiple image names are supplied, two lists are returned; you can use zip() to pair them up.
Return type: (mxnet.NDArray, numpy.ndarray) or list of such tuple
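Example (a minimal sketch; 'street.jpg' is a placeholder filename, and the model line assumes a pretrained SSD model from the gluoncv model zoo):
>>> from gluoncv import model_zoo
>>> from gluoncv.data.transforms.presets.ssd import load_test
>>> x, orig_img = load_test('street.jpg', short=512)  # x: (1, 3, H, W); orig_img: HWC numpy image
>>> net = model_zoo.get_model('ssd_512_resnet50_v1_voc', pretrained=True)
>>> class_ids, scores, bboxes = net(x)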
- class gluoncv.data.transforms.presets.ssd.SSDDefaultTrainTransform(width, height, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))¶
Default SSD training transform, which includes tons of image augmentations.
Parameters: