Vision Datasets¶

Cross-Domain Classification¶

ImageList¶

class common.vision.datasets.imagelist.ImageList(root, classes, data_list_file, transform=None, target_transform=None)[source]¶

A generic Dataset class for image classification

Parameters

root (str) – Root directory of dataset
classes (list[str]) – The names of all the classes
data_list_file (str) – File to read the image list from.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

In data_list_file, each line has 2 values in the following format.

source_dir/dog_xxx.png 0
source_dir/cat_123.png 1
target_dir/dog_xxy.png 0
target_dir/cat_nsdf3.png 1

The first value is the relative path of an image, and the second value is the label of the corresponding image. If your data_list_file uses a different format, please override parse_data_file().

classmethod domains()[source]¶: All possible domains in this dataset

property num_classes¶: Number of classes

parse_data_file(file_name)[source]¶

Parse file to data list

Parameters

file_name (str) – The path of data file
return (list) – List of (image path, class_index) tuples
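
The list format described in the note above can be read with a few lines of Python. The following is a minimal sketch, not the library's implementation: the function name parse_image_list is hypothetical, and it assumes the label is the last whitespace-separated field on each line and that relative paths are resolved against root.

```python
import os
from typing import List, Tuple


def parse_image_list(root: str, file_name: str) -> List[Tuple[str, int]]:
    """Read lines of the form '<relative path> <label>' into (path, label) tuples."""
    data_list = []
    with open(file_name, "r") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split on the last run of whitespace so the label is always the final field.
            path, label = line.rsplit(maxsplit=1)
            # Assumption: relative paths are resolved against the dataset root.
            if not os.path.isabs(path):
                path = os.path.join(root, path)
            data_list.append((path, int(label)))
    return data_list
```

A subclass that overrides parse_data_file() for a different file layout would return the same kind of list of (image path, class_index) tuples.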

Office-31¶

class common.vision.datasets.office31.Office31(root, task, download=True, **kwargs)[source]¶

Office31 Dataset.

Parameters

root (str) – Root directory of dataset
task (str) – The task (domain) to create dataset. Choices include 'A': amazon, 'D': dslr and 'W': webcam.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

After downloading, root will contain the following files.

amazon/
    images/
        backpack/
            *.jpg
            ...
dslr/
webcam/
image_list/
    amazon.txt
    dslr.txt
    webcam.txt

classmethod domains()[source]¶: All possible domains in this dataset

property num_classes¶: Number of classes

parse_data_file(file_name)¶

Parse file to data list

Parameters

file_name (str) – The path of data file
return (list) – List of (image path, class_index) tuples

Office-Caltech¶

class common.vision.datasets.officecaltech.OfficeCaltech(root, task, download=False, **kwargs)[source]¶

Office+Caltech Dataset.

Parameters

root (str) – Root directory of dataset
task (str) – The task (domain) to create dataset. Choices include 'A': amazon, 'D': dslr, 'W': webcam and 'C': caltech.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

After downloading, root will contain the following files.

amazon/
    images/
        backpack/
            *.jpg
            ...
dslr/
webcam/
caltech/
image_list/
    amazon.txt
    dslr.txt
    webcam.txt
    caltech.txt

property num_classes¶: Number of classes

Office-Home¶

class common.vision.datasets.officehome.OfficeHome(root, task, download=False, **kwargs)[source]¶

OfficeHome Dataset.

Parameters

root (str) – Root directory of dataset
task (str) – The task (domain) to create dataset. Choices include 'Ar': Art, 'Cl': Clipart, 'Pr': Product and 'Rw': Real_World.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

After downloading, root will contain the following files.

Art/
    Alarm_Clock/*.jpg
    ...
Clipart/
Product/
Real_World/
image_list/
    Art.txt
    Clipart.txt
    Product.txt
    Real_World.txt

classmethod domains()[source]¶: All possible domains in this dataset

property num_classes¶: Number of classes

parse_data_file(file_name)¶

Parse file to data list

Parameters

file_name (str) – The path of data file
return (list) – List of (image path, class_index) tuples

VisDA-2017¶

class common.vision.datasets.visda2017.VisDA2017(root, task, download=False, **kwargs)[source]¶

VisDA-2017 Dataset

Parameters

root (str) – Root directory of dataset
task (str) – The task (domain) to create dataset. Choices include 'Synthetic': synthetic images and 'Real': real-world images.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

After downloading, root will contain the following files.

train/
    aeroplane/
        *.png
        ...
validation/
image_list/
    train.txt
    validation.txt

classmethod domains()[source]¶: All possible domains in this dataset

property num_classes¶: Number of classes

parse_data_file(file_name)¶

Parse file to data list

Parameters

file_name (str) – The path of data file
return (list) – List of (image path, class_index) tuples

DomainNet¶

class common.vision.datasets.domainnet.DomainNet(root, task, split='train', download=False, **kwargs)[source]¶

DomainNet (cleaned version, recommended)

See Moment Matching for Multi-Source Domain Adaptation for details.

Parameters

root (str) – Root directory of dataset
task (str) – The task (domain) to create dataset. Choices include 'c': clipart, 'i': infograph, 'p': painting, 'q': quickdraw, 'r': real and 's': sketch.
split (str, optional) – The dataset split, supports train or test.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

After downloading, root will contain the following files.

clipart/
infograph/
painting/
quickdraw/
real/
sketch/
image_list/
    clipart.txt
    ...

classmethod domains()[source]¶: All possible domains in this dataset

property num_classes¶: Number of classes

parse_data_file(file_name)¶

Parse file to data list

Parameters

file_name (str) – The path of data file
return (list) – List of (image path, class_index) tuples

PACS¶

class common.vision.datasets.pacs.PACS(root, task, split='all', download=True, **kwargs)[source]¶

PACS Dataset.

Parameters

root (str) – Root directory of dataset
task (str) – The task (domain) to create dataset. Choices include 'A': art_painting, 'C': cartoon, 'P': photo and 'S': sketch.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

After downloading, root will contain the following files.

art_painting/
    dog/
        *.jpg
        ...
cartoon/
photo/
sketch/
image_list/
    art_painting.txt
    cartoon.txt
    photo.txt
    sketch.txt

classmethod domains()[source]¶: All possible domains in this dataset

MNIST¶

class common.vision.datasets.digits.MNIST(root, mode='L', split='train', download=True, **kwargs)[source]¶

MNIST Dataset.

Parameters

root (str) – Root directory of dataset where MNIST/processed/training.pt and MNIST/processed/test.pt exist.
mode (str) – The channel mode for image. Choices include "L" and "RGB". Default: "L"
split (str, optional) – The dataset split, supports train or test.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop

USPS¶

class common.vision.datasets.digits.USPS(root, mode='L', split='train', download=True, **kwargs)[source]¶

USPS Dataset. The data format is: [label [index:value ]*256 \n] * num_lines, where label lies in [1, 10]. The value of each pixel lies in [-1, 1]. Here we transform the label into [0, 9] and rescale the pixel values to [0, 255].

Parameters

root (str) – Root directory of dataset to store USPS data files.
mode (str) – The channel mode for image. Choices include "L" and "RGB". Default: "L"
split (str, optional) – The dataset split, supports train or test.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
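
To make the label and pixel conversion described above concrete, here is a minimal sketch of parsing a single line of the raw USPS format. It is not the library's loader: the function name parse_usps_line is hypothetical, and the linear rescaling from [-1, 1] to [0, 255] is an assumption about how the pixel values are mapped.

```python
from typing import List, Tuple


def parse_usps_line(line: str) -> Tuple[int, List[int]]:
    """Parse one 'label index:value ...' line into (label in [0, 9], pixels in [0, 255])."""
    tokens = line.split()
    # Raw labels lie in [1, 10]; shift them to [0, 9].
    label = int(tokens[0]) - 1
    pixels = []
    for token in tokens[1:]:
        _, value = token.split(":")
        # Assumed linear rescaling from [-1, 1] to [0, 255].
        pixels.append(int(round((float(value) + 1.0) / 2.0 * 255.0)))
    return label, pixels
```

A real line carries 256 index:value pairs, which can then be reshaped into a 16 x 16 image.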

SVHN¶

class common.vision.datasets.digits.SVHN(root, mode='L', download=True, **kwargs)[source]¶

SVHN Dataset. Note: The SVHN dataset assigns the label 10 to the digit 0. However, in this Dataset, we assign the label 0 to the digit 0 to be compatible with PyTorch loss functions which expect the class labels to be in the range [0, C-1]

Warning

This class needs scipy to load data from .mat format.

Parameters

root (str) – Root directory of dataset where directory SVHN exists.
mode (str) – The channel mode for image. Choices include "L" and "RGB". Default: "L"
split (str, optional) – The dataset split, supports train or test.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
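
The label convention mentioned above (raw label 10 stands for the digit 0) amounts to a one-line remapping. The helper below is a hypothetical illustration of that remapping on a plain list of labels, not the code this class uses internally.

```python
from typing import Iterable, List


def remap_svhn_labels(labels: Iterable[int]) -> List[int]:
    """Map the raw SVHN label 10 (digit 0) to 0; labels 1-9 are left unchanged."""
    return [0 if label == 10 else label for label in labels]


# Labels now lie in [0, 9], as expected by PyTorch classification losses.
print(remap_svhn_labels([10, 1, 9, 10]))  # prints [0, 1, 9, 0]
```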

Partial Cross-Domain Classification¶

Partial Wrapper¶

common.vision.datasets.partial.partial(dataset_class, partial_classes)[source]¶

Convert a dataset into its partial version.

In other words, samples that do not belong to partial_classes will be discarded. However, partial will not change the label space of dataset_class.

Parameters

dataset_class (class) – Dataset class. Only subclasses of ImageList can be made partial.
partial_classes (sequence[str]) – A sequence of which categories need to be kept in the partial dataset. Each element of partial_classes must belong to the classes list of dataset_class.

Examples:

>>> partial_classes = ['back_pack', 'bike', 'calculator', 'headphones', 'keyboard']
>>> # create a partial dataset class
>>> PartialOffice31 = partial(Office31, partial_classes)
>>> # create an instance of the partial dataset
>>> dataset = PartialOffice31(root="data/office31", task="A")
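
The filtering behaviour described above can be sketched on a plain list of (path, label) samples. This is an illustrative stand-alone function, not the wrapper's actual implementation; the name filter_partial and the toy class list are hypothetical.

```python
from typing import List, Sequence, Tuple


def filter_partial(samples: Sequence[Tuple[str, int]],
                   classes: Sequence[str],
                   partial_classes: Sequence[str]) -> List[Tuple[str, int]]:
    """Keep only samples whose class is in partial_classes; labels are unchanged."""
    unknown = set(partial_classes) - set(classes)
    if unknown:
        raise ValueError(f"Not in the class list: {sorted(unknown)}")
    keep = set(partial_classes)
    return [(path, label) for path, label in samples if classes[label] in keep]


# Toy data: the label space stays {0, 1, 2} even though class 1 is removed.
classes = ["back_pack", "bike", "calculator"]
samples = [("a.jpg", 0), ("b.jpg", 1), ("c.jpg", 2)]
kept = filter_partial(samples, classes, ["back_pack", "calculator"])
```

Because the labels are not renumbered, a classifier trained on the full label space can be evaluated on the partial dataset without any index translation.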

common.vision.datasets.partial.default_partial(dataset_class)[source]¶

Default partial setting used in some papers.

Parameters: dataset_class (class) – Dataset class. Currently, dataset_class must be one of Office31, OfficeHome, VisDA2017, ImageNetCaltech and CaltechImageNet.

Caltech-256->ImageNet-1k¶

class common.vision.datasets.partial.caltech_imagenet.CaltechImageNet(root, task, download=True, **kwargs)[source]¶

Caltech-ImageNet is constructed from Caltech-256 and ImageNet-1K .

They share 84 common classes. Caltech-ImageNet keeps all classes of Caltech-256. The label is based on Caltech-256 (class 0-255). The private classes of ImageNet-1K are discarded.

Parameters

root (str) – Root directory of dataset
task (str) – The task (domain) to create dataset. Choices include 'C': Caltech-256 and 'I': ImageNet-1K validation set.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

You need to place the train and val directories of ImageNet-1K in the root directory manually, since ImageNet-1K is no longer publicly accessible. DALIB will only download Caltech-256 and the image lists automatically. After downloading, root will contain the following files.

train/
    n01440764/
    ...
val/
256_ObjectCategories/
    001.ak47/
    ...
image_list/
    caltech_256_list.txt
    ...

ImageNet-1k->Caltech-256¶

class common.vision.datasets.partial.imagenet_caltech.ImageNetCaltech(root, task, download=True, **kwargs)[source]¶

ImageNet-Caltech is constructed from Caltech-256 and ImageNet-1K .

They share 84 common classes. ImageNet-Caltech keeps all classes of ImageNet-1K. The label is based on ImageNet-1K (class 0-999). The private classes of Caltech-256 are discarded.

Parameters

root (str) – Root directory of dataset
task (str) – The task (domain) to create dataset. Choices include 'C': Caltech-256 and 'I': ImageNet-1K training set.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

You need to place the train and val directories of ImageNet-1K in the root directory manually, since ImageNet-1K is no longer publicly accessible. DALIB will only download Caltech-256 and the image lists automatically. After downloading, root will contain the following files.

train/
    n01440764/
    ...
val/
256_ObjectCategories/
    001.ak47/
    ...
image_list/
    caltech_256_list.txt
    ...

Open Set Cross-Domain Classification¶

Open Set Wrapper¶

common.vision.datasets.openset.open_set(dataset_class, public_classes, private_classes=())[source]¶

Convert a dataset into its open-set version.

In other words, samples that belong to private_classes will be marked as “unknown”.

Be aware that open_set will change the label number of each category.

Parameters

dataset_class (class) – Dataset class. Only subclasses of ImageList can be made open-set.
public_classes (sequence[str]) – A sequence of which categories need to be kept in the open-set dataset. Each element of public_classes must belong to the classes list of dataset_class.
private_classes (sequence[str], optional) – A sequence of which categories need to be marked as “unknown” in the open-set dataset. Each element of private_classes must belong to the classes list of dataset_class. Default: ().

Examples:

>>> public_classes = ['back_pack', 'bike', 'calculator', 'headphones', 'keyboard']
>>> private_classes = ['laptop_computer', 'monitor', 'mouse', 'mug', 'projector']
>>> # create an open-set dataset class which has classes
>>> # 'back_pack', 'bike', 'calculator', 'headphones', 'keyboard' and 'unknown'.
>>> OpenSetOffice31 = open_set(Office31, public_classes, private_classes)
>>> # create an instance of the open-set dataset
>>> dataset = OpenSetOffice31(root="data/office31", task="A")
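
The relabelling that an open-set conversion implies can be sketched as follows. This is a stand-alone illustration under stated assumptions, not the wrapper's implementation: the function name relabel_open_set is hypothetical, public classes are assumed to be renumbered in the order given, "unknown" is assumed to take the last index, and samples belonging to neither list are assumed to be dropped.

```python
from typing import List, Sequence, Tuple


def relabel_open_set(samples: Sequence[Tuple[str, int]],
                     classes: Sequence[str],
                     public_classes: Sequence[str],
                     private_classes: Sequence[str] = ()):
    """Return (relabelled samples, new class list) for an open-set label space."""
    new_classes = list(public_classes) + ["unknown"]
    public_index = {name: i for i, name in enumerate(public_classes)}
    unknown_index = len(new_classes) - 1
    private = set(private_classes)
    relabelled: List[Tuple[str, int]] = []
    for path, label in samples:
        name = classes[label]
        if name in public_index:
            relabelled.append((path, public_index[name]))  # renumbered public class
        elif name in private:
            relabelled.append((path, unknown_index))  # collapsed into "unknown"
        # Assumption: classes in neither list are dropped.
    return relabelled, new_classes
```

Note that the new label of a class depends on its position in public_classes, which is why the label numbers differ from those of the original dataset.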

common.vision.datasets.openset.default_open_set(dataset_class, source)[source]¶

Default open-set setting used in some papers.

Parameters

dataset_class (class) – Dataset class. Currently, dataset_class must be one of Office31, OfficeHome, VisDA2017.
source (bool) – Whether the dataset is used for source domain or not.

Cross-Domain Regression¶

ImageRegression¶

class common.vision.datasets.regression.image_regression.ImageRegression(root, factors, data_list_file, transform=None, target_transform=None)[source]¶

A generic Dataset class for domain adaptation in image regression

Parameters

root (str) – Root directory of dataset
factors (sequence[str]) – Factors selected. Default: (‘scale’, ‘position x’, ‘position y’).
data_list_file (str) – File to read the image list from.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

In data_list_file, each line has 1+len(factors) values in the following format.

source_dir/dog_xxx.png x11, x12, ...
source_dir/cat_123.png x21, x22, ...
target_dir/dog_xxy.png x31, x32, ...
target_dir/cat_nsdf3.png x41, x42, ...

The first value is the relative path of an image, and the remaining values are the ground truth of the corresponding factors. If your data_list_file uses a different format, please override ImageRegression.parse_data_file().

parse_data_file(file_name)[source]¶

Parse file to data list

Parameters: file_name (str) – The path of data file
Returns: List of (image path, (factors)) tuples
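
A parser for the regression list format might look like the sketch below. It is not the library's implementation: the function name parse_regression_list is hypothetical, and since the note does not say whether the values are separated by spaces or by commas, the sketch tolerates both. It also assumes that image paths contain no whitespace.

```python
import os
from typing import List, Tuple


def parse_regression_list(root: str, file_name: str,
                          num_factors: int) -> List[Tuple[str, Tuple[float, ...]]]:
    """Read '<relative path> v1 v2 ...' lines into (path, (v1, v2, ...)) tuples."""
    data_list = []
    with open(file_name, "r") as f:
        for line_number, line in enumerate(f, start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            path = os.path.join(root, fields[0])
            # Tolerate an optional trailing comma after each value.
            values = tuple(float(v.strip(",")) for v in fields[1:])
            if len(values) != num_factors:
                raise ValueError(
                    f"line {line_number}: expected {num_factors} values, got {len(values)}")
            data_list.append((path, values))
    return data_list
```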

DSprites¶

class common.vision.datasets.regression.dsprites.DSprites(root, task, split='train', factors=('scale', 'position x', 'position y'), download=True, target_transform=None, **kwargs)[source]¶

DSprites Dataset.

Parameters

root (str) – Root directory of dataset
task (str) – The task (domain) to create dataset. Choices include 'C': Color, 'N': Noisy and 'S': Scream.
split (str, optional) – The dataset split, supports train, or test.
factors (sequence[str]) – Factors selected. Default: (‘scale’, ‘position x’, ‘position y’).
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

After downloading, root will contain the following files.

color/
    ...
noisy/
scream/
image_list/
    color_train.txt
    noisy_train.txt
    scream_train.txt
    color_test.txt
    noisy_test.txt
    scream_test.txt

MPI3D¶

class common.vision.datasets.regression.mpi3d.MPI3D(root, task, split='train', factors=('horizontal axis', 'vertical axis'), download=True, target_transform=None, **kwargs)[source]¶

MPI3D Dataset.

Parameters

root (str) – Root directory of dataset
task (str) – The task (domain) to create dataset. Choices include 'RL': real, 'RC': realistic and 'T': toy.
split (str, optional) – The dataset split, supports train or test.
factors (sequence[str]) – Factors selected. Default: (‘horizontal axis’, ‘vertical axis’).
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

After downloading, root will contain the following files.

real/
    ...
realistic/
toy/
image_list/
    real_train.txt
    realistic_train.txt
    toy_train.txt
    real_test.txt
    realistic_test.txt
    toy_test.txt

Cross-Domain Segmentation¶

SegmentationList¶

class common.vision.datasets.segmentation.segmentation_list.SegmentationList(root, classes, data_list_file, label_list_file, data_folder, label_folder, id_to_train_id=None, train_id_to_color=None, transforms=None)[source]¶

A generic Dataset class for domain adaptation in image segmentation

Parameters

root (str) – Root directory of dataset
classes (seq[str]) – The names of all the classes
data_list_file (str) – File to read the image list from.
label_list_file (str) – File to read the label list from.
data_folder (str) – Sub-directory of the image.
label_folder (str) – Sub-directory of the label.
mean (seq[float]) – mean BGR value. Normalize and convert to the image if not None. Default: None.
id_to_train_id (dict, optional) – the map between the id on the label and the actual train id.
train_id_to_color (seq, optional) – the map between the train id and the color.
transforms (callable, optional) – A function/transform that takes in (PIL Image, label) pair and returns a transformed version. E.g, Resize.
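
The role of id_to_train_id can be illustrated with a small stand-alone sketch that remaps raw label ids to train ids. It is not the library's code: the function name remap_label_ids is hypothetical, the label map is represented as nested lists rather than an array, and the ignore value 255 for unmapped ids is an assumption.

```python
from typing import Dict, List


def remap_label_ids(label: List[List[int]],
                    id_to_train_id: Dict[int, int],
                    ignore_label: int = 255) -> List[List[int]]:
    """Map each raw id in an H x W label map to its train id."""
    # Ids missing from the mapping fall back to ignore_label (assumed convention).
    return [[id_to_train_id.get(value, ignore_label) for value in row] for row in label]


# Example mapping in the style of Cityscapes: raw ids 7 and 8 become train ids 0 and 1.
example_map = {7: 0, 8: 1}
remapped = remap_label_ids([[7, 8], [3, 7]], example_map)
```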

Note

In data_list_file, each line is the relative path of an image. If your data_list_file uses a different format, please override parse_data_file().

source_dir/dog_xxx.png
target_dir/dog_xxy.png

In label_list_file, each line is the relative path of a label. If your label_list_file uses a different format, please override parse_label_file().

Warning

When mean is not None, please do not provide Normalize and ToTensor in transforms.

collect_image_paths()[source]¶: Return a list of the absolute path of all the images

decode_target(target)[source]¶

Decode label (each value is integer) into the corresponding RGB value.

Parameters: target (numpy.array) – label in shape H x W
Returns: RGB label (PIL Image) in shape H x W x 3
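
Decoding is essentially a palette lookup per pixel. The sketch below shows the idea on nested lists instead of a numpy array and a PIL image, so it only illustrates the mapping and is not the method's implementation. The function name decode_label, the fallback colour for out-of-range ids and the two-entry palette are hypothetical.

```python
from typing import List, Sequence, Tuple

Color = Tuple[int, int, int]


def decode_label(target: List[List[int]],
                 train_id_to_color: Sequence[Color],
                 fallback: Color = (0, 0, 0)) -> List[List[Color]]:
    """Turn an H x W map of train ids into an H x W map of RGB triples."""
    num_colors = len(train_id_to_color)
    return [
        [tuple(train_id_to_color[v]) if 0 <= v < num_colors else fallback for v in row]
        for row in target
    ]


# Toy palette: train id 0 and 1 get two distinct colours; 255 is out of range.
palette = [(128, 64, 128), (244, 35, 232)]
decoded = decode_label([[0, 1], [255, 0]], palette)
```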

property evaluate_classes¶: The name of classes to be evaluated

property ignore_classes¶: The name of classes to be ignored

property num_classes¶: Number of classes

parse_data_file(file_name)[source]¶

Parse file to image list

Parameters: file_name (str) – The path of data file
Returns: List of image path

parse_label_file(file_name)[source]¶

Parse file to label list

Parameters: file_name (str) – The path of data file
Returns: List of label path

translate(transform, target_root, color=False)[source]¶

Translate an image and save it into a specified directory

Parameters

transform (callable) – a transform function that maps (image, label) pair from one domain to another domain
target_root (str) – the root directory to save images and labels

Cityscapes¶

class common.vision.datasets.segmentation.cityscapes.Cityscapes(root, split='train', data_folder='leftImg8bit', label_folder='gtFine', **kwargs)[source]¶

Cityscapes is a real-world semantic segmentation dataset collected in driving scenarios.

Parameters

root (str) – Root directory of dataset
split (str, optional) – The dataset split, supports train, or val.
data_folder (str, optional) – Sub-directory of the image. Default: ‘leftImg8bit’.
label_folder (str, optional) – Sub-directory of the label. Default: ‘gtFine’.
mean (seq[float]) – mean BGR value. Normalize the image if not None. Default: None.
transforms (callable, optional) – A function/transform that takes in (PIL image, label) pair and returns a transformed version. E.g, Resize.

Note

You need to download Cityscapes manually. Ensure that the following files exist in the root directory before using this class.

leftImg8bit/
    train/
    val/
    test/
gtFine/
    train/
    val/
    test/

GTA5¶

class common.vision.datasets.segmentation.gta5.GTA5(root, split='train', data_folder='images', label_folder='labels', **kwargs)[source]¶

GTA5

Parameters

root (str) – Root directory of dataset
split (str, optional) – The dataset split, supports train.
data_folder (str, optional) – Sub-directory of the image. Default: ‘images’.
label_folder (str, optional) – Sub-directory of the label. Default: ‘labels’.
mean (seq[float]) – mean BGR value. Normalize the image if not None. Default: None.
transforms (callable, optional) – A function/transform that takes in (PIL image, label) pair and returns a transformed version. E.g, Resize.

Note

You need to download GTA5 manually. Ensure that the following directories exist in the root directory before using this class.

images/
labels/

Synthia¶

class common.vision.datasets.segmentation.synthia.Synthia(root, split='train', data_folder='RGB', label_folder='synthia_mapped_to_cityscapes', **kwargs)[source]¶

SYNTHIA

Parameters

root (str) – Root directory of dataset
split (str, optional) – The dataset split, supports train.
data_folder (str, optional) – Sub-directory of the image. Default: ‘RGB’.
label_folder (str, optional) – Sub-directory of the label. Default: ‘synthia_mapped_to_cityscapes’.
mean (seq[float]) – mean BGR value. Normalize the image if not None. Default: None.
transforms (callable, optional) – A function/transform that takes in (PIL image, label) pair and returns a transformed version. E.g, Resize.

Note

You need to download Synthia manually. Ensure that the following directories exist in the root directory before using this class.

RGB/
synthia_mapped_to_cityscapes/

Foggy Cityscapes¶

class common.vision.datasets.segmentation.cityscapes.FoggyCityscapes(root, split='train', data_folder='leftImg8bit_foggy', label_folder='gtFine', beta=0.02, **kwargs)[source]¶

Foggy Cityscapes is a real-world semantic segmentation dataset collected in foggy driving scenarios.

Parameters

root (str) – Root directory of dataset
split (str, optional) – The dataset split, supports train, or val.
data_folder (str, optional) – Sub-directory of the image. Default: ‘leftImg8bit_foggy’.
label_folder (str, optional) – Sub-directory of the label. Default: ‘gtFine’.
beta (float, optional) – The parameter controlling the fog level. Choices include 0.005, 0.01 and 0.02. Default: 0.02
mean (seq[float]) – mean BGR value. Normalize the image if not None. Default: None.
transforms (callable, optional) – A function/transform that takes in (PIL image, label) pair and returns a transformed version. E.g, Resize.

Note

You need to download Cityscapes manually. Ensure that the following files exist in the root directory before using this class.

leftImg8bit_foggy/
    train/
    val/
    test/
gtFine/
    train/
    val/
    test/

Cross-Domain Keypoint Detection¶

Dataset Base for Keypoint Detection¶

class common.vision.datasets.keypoint_detection.keypoint_dataset.KeypointDataset(root, num_keypoints, samples, transforms=None, image_size=(256, 256), heatmap_size=(64, 64), sigma=2, keypoints_group=None, colored_skeleton=None)[source]¶

A generic dataset class for image keypoint detection

Parameters

root (str) – Root directory of dataset
num_keypoints (int) – Number of keypoints
samples (list) – list of data
transforms (callable, optional) – A function/transform that takes in a dict (which contains PIL image and its labels) and returns a transformed version. E.g, Resize.
image_size (tuple) – (width, height) of the image. Default: (256, 256)
heatmap_size (tuple) – (width, height) of the heatmap. Default: (64, 64)
sigma (int) – sigma parameter used when generating the heatmap. Default: 2
keypoints_group (dict) – a dict that stores the index of different types of keypoints
colored_skeleton (dict) – a dict that stores the index and color of different skeleton
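
To show how image_size, heatmap_size and sigma relate, here is a minimal pure-Python sketch that renders a Gaussian heatmap for a single keypoint. It is an illustration under assumptions, not the class's implementation: the function name gaussian_heatmap is hypothetical, and the Gaussian is neither truncated nor normalised.

```python
import math
from typing import List, Tuple


def gaussian_heatmap(keypoint: Tuple[float, float],
                     image_size: Tuple[int, int] = (256, 256),
                     heatmap_size: Tuple[int, int] = (64, 64),
                     sigma: float = 2) -> List[List[float]]:
    """Render one keypoint, given in image coordinates, as a heatmap with peak value 1."""
    image_w, image_h = image_size
    map_w, map_h = heatmap_size
    # Rescale the keypoint from image coordinates to heatmap coordinates.
    cx = keypoint[0] * map_w / image_w
    cy = keypoint[1] * map_h / image_h
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2)) for x in range(map_w)]
        for y in range(map_h)
    ]


# A keypoint at (128, 64) in a 256 x 256 image lands at column 32, row 16 of the heatmap.
heatmap = gaussian_heatmap((128, 64))
```

A larger sigma spreads the target over more heatmap cells, which makes the regression target smoother but less precisely localised.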

group_accuracy(accuracies)[source]¶

Group the accuracy of K keypoints into different kinds.

Parameters: accuracies (list) – accuracy of the K keypoints
Returns: accuracy of N=len(keypoints_group) kinds of keypoints
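
Grouping can be sketched as averaging the per-keypoint accuracies within each group. This stand-alone function assumes that keypoints_group maps a group name to a list of keypoint indices and that the group accuracy is the mean of its members; both are assumptions, and the group names in the example are hypothetical.

```python
from typing import Dict, Sequence


def group_accuracy(accuracies: Sequence[float],
                   keypoints_group: Dict[str, Sequence[int]]) -> Dict[str, float]:
    """Average the accuracies of the keypoints that belong to each group."""
    return {
        name: sum(accuracies[index] for index in indices) / len(indices)
        for name, indices in keypoints_group.items()
    }


# Four keypoints split into two hypothetical groups.
grouped = group_accuracy([0.5, 1.0, 0.0, 1.0], {"wrist": [0], "fingers": [1, 2, 3]})
```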

visualize(image, keypoints, filename)[source]¶

Visualize an image with its keypoints, and store the result into a file

Parameters

image (PIL.Image) – the image to visualize
keypoints (torch.Tensor) – keypoints in shape K x 2
filename (str) – the name of file to store

class common.vision.datasets.keypoint_detection.keypoint_dataset.Body16KeypointDataset(root, samples, **kwargs)[source]¶: Dataset with 16 body keypoints.

class common.vision.datasets.keypoint_detection.keypoint_dataset.Hand21KeypointDataset(root, samples, **kwargs)[source]¶: Dataset with 21 hand keypoints.

Rendered Handpose Dataset¶

class common.vision.datasets.keypoint_detection.rendered_hand_pose.RenderedHandPose(root, split='train', task='all', download=True, **kwargs)[source]¶

Rendered Handpose Dataset

Parameters

root (str) – Root directory of dataset
split (str, optional) – The dataset split, supports train, test, or all.
task (str, optional) – Placeholder.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transforms (callable, optional) – A function/transform that takes in a dict (which contains PIL image and its labels) and returns a transformed version. E.g, Resize.
image_size (tuple) – (width, height) of the image. Default: (256, 256)
heatmap_size (tuple) – (width, height) of the heatmap. Default: (64, 64)
sigma (int) – sigma parameter used when generating the heatmap. Default: 2

Note

After downloading, root will contain the following files.

RHD_published_v2/
    training/
    evaluation/

Hand-3d-Studio Dataset¶

class common.vision.datasets.keypoint_detection.hand_3d_studio.Hand3DStudio(root, split='train', task='noobject', download=True, **kwargs)[source]¶

Hand-3d-Studio Dataset

Parameters

root (str) – Root directory of dataset
split (str, optional) – The dataset split, supports train, test, or all.
task (str, optional) – The task to create dataset. Choices include 'noobject': only hands without objects, 'object': only hands interacting with objects, and 'all': all hands. Default: ‘noobject’.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transforms (callable, optional) – A function/transform that takes in a dict (which contains PIL image and its labels) and returns a transformed version. E.g, Resize.
image_size (tuple) – (width, height) of the image. Default: (256, 256)
heatmap_size (tuple) – (width, height) of the heatmap. Default: (64, 64)
sigma (int) – sigma parameter used when generating the heatmap. Default: 2

Note

We found that the original H3D images are high resolution while most of each image is background, thus we crop the images and keep only the area surrounding the hands (1.5x bigger than the hands) to speed up training.
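
One way to obtain such a crop is to take the bounding box of the hand keypoints and enlarge it by a factor of 1.5 around its centre. The sketch below illustrates this idea only. The function name crop_box_around_keypoints is hypothetical, and the exact cropping rule used to produce the released files is not documented here.

```python
from typing import Sequence, Tuple


def crop_box_around_keypoints(keypoints: Sequence[Tuple[float, float]],
                              image_size: Tuple[int, int],
                              scale: float = 1.5) -> Tuple[float, float, float, float]:
    """Return (left, top, right, bottom) of the keypoint bounding box enlarged by scale."""
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    center_x = (min(xs) + max(xs)) / 2
    center_y = (min(ys) + max(ys)) / 2
    half_w = (max(xs) - min(xs)) * scale / 2
    half_h = (max(ys) - min(ys)) * scale / 2
    image_w, image_h = image_size
    # Clamp the enlarged box to the image borders.
    left = max(0.0, center_x - half_w)
    top = max(0.0, center_y - half_h)
    right = min(float(image_w), center_x + half_w)
    bottom = min(float(image_h), center_y + half_h)
    return left, top, right, bottom


box = crop_box_around_keypoints([(100, 100), (200, 300)], image_size=(640, 480))
```

The resulting box can be passed to an image cropping routine, with the keypoint coordinates shifted by (left, top) so that they stay aligned with the cropped image.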

Note

After downloading, root will contain the following files.

H3D_crop/
    annotation.json
    part1/
    part2/
    part3/
    part4/
    part5/

FreiHAND Dataset¶

class common.vision.datasets.keypoint_detection.freihand.FreiHand(root, split='train', task='all', download=True, **kwargs)[source]¶

FreiHand Dataset

Parameters

root (str) – Root directory of dataset
split (str, optional) – The dataset split, supports train, test, or all.
task (str, optional) – The post-processing option to create dataset. Choices include 'gs': green screen recording, 'auto': auto colorization without sample points (automatic color hallucination), 'sample': auto colorization with sample points, 'hom': homogenized, and 'all': all hands. Default: ‘all’.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transforms (callable, optional) – A function/transform that takes in a dict (which contains PIL image and its labels) and returns a transformed version. E.g, Resize.
image_size (tuple) – (width, height) of the image. Default: (256, 256)
heatmap_size (tuple) – (width, height) of the heatmap. Default: (64, 64)
sigma (int) – sigma parameter used when generating the heatmap. Default: 2

Note

In root, there will exist following files after downloading.

*.json
training/
evaluation/

Surreal Dataset¶

class common.vision.datasets.keypoint_detection.surreal.SURREAL(root, split='train', task='all', download=True, **kwargs)[source]¶

Surreal Dataset

Parameters

root (str) – Root directory of dataset
split (str, optional) – The dataset split, supports train, test, or all. Default: train.
task (str, optional) – Placeholder.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transforms (callable, optional) – A function/transform that takes in a dict (containing a PIL image and its labels) and returns a transformed version. E.g., Resize.
image_size (tuple) – (width, height) of the image. Default: (256, 256)
heatmap_size (tuple) – (width, height) of the heatmap. Default: (64, 64)
sigma (int) – sigma parameter used when generating the heatmap. Default: 2

Note

The original Surreal images are high-resolution, but most of each image is background. We therefore crop each image and keep only the area surrounding the subject (1.5x larger than the subject) to speed up training.

Note

In root, the following files will exist after downloading.

train/
test/
val/

LSP Dataset¶

class common.vision.datasets.keypoint_detection.lsp.LSP(root, split='train', task='all', download=True, image_size=(256, 256), transforms=None, **kwargs)[source]¶

Leeds Sports Pose Dataset

Parameters

root (str) – Root directory of dataset
split (str, optional) – Placeholder.
task (str, optional) – Placeholder.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transforms (callable, optional) – Placeholder.
heatmap_size (tuple) – (width, height) of the heatmap. Default: (64, 64)
sigma (int) – sigma parameter used when generating the heatmap. Default: 2

Note

In root, the following files will exist after downloading.

lsp/
    images/
    joints.mat

Note

LSP is used only as a target domain. Because the dataset is small, the whole dataset is used regardless of the split argument. The transform is also fixed.

Human3.6M Dataset¶

class common.vision.datasets.keypoint_detection.human36m.Human36M(root, split='train', task='all', download=True, **kwargs)[source]¶

Human3.6M Dataset

Parameters

root (str) – Root directory of dataset
split (str, optional) – The dataset split, supports train, test, or all. Default: train.
task (str, optional) – Placeholder.
download (bool, optional) – Placeholder.
transforms (callable, optional) – A function/transform that takes in a dict (containing a PIL image and its labels) and returns a transformed version. E.g., Resize.
image_size (tuple) – (width, height) of the image. Default: (256, 256)
heatmap_size (tuple) – (width, height) of the heatmap. Default: (64, 64)
sigma (int) – sigma parameter used when generating the heatmap. Default: 2

Note

You need to download Human36M manually. Ensure that the following files exist in the root directory before using this class.

annotations/
    Human36M_subject11_joint_3d.json
    ...
images/

Note

The original Human3.6M images are high-resolution, but most of each image is background. We therefore crop each image and keep only the area surrounding the subject (1.5x larger than the subject) to speed up training. In root, the following files will exist after cropping.

Human36M_crop/
annotations/
    keypoints2d_11.json
    ...

Cross-Domain ReID¶

Market1501¶

class common.vision.datasets.reid.market1501.Market1501(root, verbose=True)[source]¶

Market1501 dataset from Scalable Person Re-identification: A Benchmark (ICCV 2015).

Dataset statistics:

identities: 1501 (+1 for background)
images: 12936 (train) + 3368 (query) + 15913 (gallery)
cameras: 6

Parameters

root (str) – Root directory of dataset
verbose (bool, optional) – If true, print dataset statistics after loading the dataset. Default: True

translate(transform, target_root)[source]¶

Translate an image and save it into a specified directory

Parameters

transform (callable) – a transform function that maps images from one domain to another domain
target_root (str) – the root directory to save images

DukeMTMC-reID¶

class common.vision.datasets.reid.dukemtmc.DukeMTMC(root, verbose=True)[source]¶

DukeMTMC-reID dataset from Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking (ECCV 2016).

Dataset statistics:

identities: 1404 (train + query)
images: 16522 (train) + 2228 (query) + 17661 (gallery)
cameras: 8

Parameters

root (str) – Root directory of dataset
verbose (bool, optional) – If true, print dataset statistics after loading the dataset. Default: True

translate(transform, target_root)[source]¶

Translate an image and save it into a specified directory

Parameters

transform (callable) – a transform function that maps images from one domain to another domain
target_root (str) – the root directory to save images

MSMT17¶

class common.vision.datasets.reid.msmt17.MSMT17(root, verbose=True)[source]¶

MSMT17 dataset from Person Transfer GAN to Bridge Domain Gap for Person Re-Identification (CVPR 2018).

Dataset statistics:

identities: 4101
images: 32621 (train) + 11659 (query) + 82161 (gallery)
cameras: 15
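
As a quick sanity check on the statistics listed for the three ReID datasets, the split sizes can be summed to get the total number of images in each. The dictionary below only restates numbers from this page; the variable names are illustrative and not part of the library.

```python
# Image counts per split, copied from the dataset statistics on this page.
reid_splits = {
    "Market1501": {"train": 12936, "query": 3368, "gallery": 15913},
    "DukeMTMC-reID": {"train": 16522, "query": 2228, "gallery": 17661},
    "MSMT17": {"train": 32621, "query": 11659, "gallery": 82161},
}

# Total number of images per dataset across all three splits.
totals = {name: sum(splits.values()) for name, splits in reid_splits.items()}
for name, total in totals.items():
    print(f"{name}: {total} images")
```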

Parameters

root (str) – Root directory of dataset
verbose (bool, optional) – If true, print dataset statistics after loading the dataset. Default: True

translate(transform, target_root)[source]¶

Translate an image and save it into a specified directory

Parameters

transform (callable) – a transform function that maps images from one domain to another domain
target_root (str) – the root directory to save images

Natural Object Recognition¶

Stanford Dogs¶

class common.vision.datasets.stanford_dogs.StanfordDogs(root, split, sample_rate=100, download=False, **kwargs)[source]¶

The Stanford Dogs dataset contains 20,580 images of 120 breeds of dogs from around the world. Each category is composed of exactly 100 training examples and around 72 testing examples.

Parameters

root (str) – Root directory of dataset
split (str, optional) – The dataset split; supports train or test.
sample_rate (int) – The sampling rate used to randomly sample training images for each category. Choices include 100, 50, 30, 15. Default: 100.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

In root, the following files will exist after downloading.

train/
test/
image_list/
    train_100.txt
    train_50.txt
    train_30.txt
    train_15.txt
    test.txt
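
The file layout above suggests a direct mapping from split and sample_rate to one image list file. The sketch below illustrates that mapping; the helper image_list_path is hypothetical, and the assumption that the test split ignores sample_rate is inferred from the single test.txt in the listing rather than stated by the library.

```python
import os

# Sampling rates listed in the parameter description.
SAMPLE_RATES = (100, 50, 30, 15)


def image_list_path(root, split, sample_rate=100):
    """Return the image list file that matches a split and sampling rate."""
    if split == "test":
        # Assumption: only one test list exists, so sample_rate is ignored.
        return os.path.join(root, "image_list", "test.txt")
    if split != "train":
        raise ValueError(f"split must be 'train' or 'test', got {split!r}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"sample_rate must be one of {SAMPLE_RATES}, got {sample_rate}")
    return os.path.join(root, "image_list", f"train_{sample_rate}.txt")


print(image_list_path("data/stanford_dogs", "train", sample_rate=30))
```

Rejecting unsupported rates early gives a clearer error than a missing-file failure later on.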

Stanford Cars¶

class common.vision.datasets.stanford_cars.StanfordCars(root, split, sample_rate=100, download=False, **kwargs)[source]¶

The Stanford Cars dataset contains 16,185 images of 196 classes of cars. Each class is split roughly 50-50 between training and testing, giving 8,144 training images and 8,041 testing images.

Parameters

root (str) – Root directory of dataset
split (str, optional) – The dataset split; supports train or test.
sample_rate (int) – The sampling rate used to randomly sample training images for each category. Choices include 100, 50, 30, 15. Default: 100.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

In root, the following files will exist after downloading.

train/
test/
image_list/
    train_100.txt
    train_50.txt
    train_30.txt
    train_15.txt
    test.txt
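
The train_50.txt, train_30.txt and train_15.txt lists are shipped with the dataset, so nothing needs to be sampled at load time. For intuition only, here is a sketch of how a per-class subsample could be produced. It assumes sample_rate is a percentage of each class's training images, which this page does not state explicitly; subsample_per_class is a hypothetical helper, not a library function.

```python
import random
from collections import defaultdict


def subsample_per_class(samples, sample_rate, seed=0):
    """Keep roughly `sample_rate` percent of the (path, label) pairs of every class."""
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append((path, label))
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    kept = []
    for label in sorted(by_class):
        items = by_class[label]
        # Keep at least one image so no class disappears entirely.
        count = max(1, round(len(items) * sample_rate / 100))
        kept.extend(rng.sample(items, count))
    return kept


# Toy data: 3 classes with 20 images each.
samples = [(f"train/class_{c}/img_{i}.jpg", c) for c in range(3) for i in range(20)]
subset = subsample_per_class(samples, sample_rate=30)
print(len(subset))
```

Sampling within each class, rather than over the whole list, keeps the class balance of the full training set.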

CUB-200-2011¶

class common.vision.datasets.cub200.CUB200(root, split, sample_rate=100, download=False, **kwargs)[source]¶

Caltech-UCSD Birds-200-2011 is a dataset for fine-grained visual recognition with 11,788 images in 200 bird species. It is an extended version of the CUB-200 dataset, roughly doubling the number of images.

Parameters

root (str) – Root directory of dataset
split (str, optional) – The dataset split; supports train or test.
sample_rate (int) – The sampling rate used to randomly sample training images for each category. Choices include 100, 50, 30, 15. Default: 100.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

In root, the following files will exist after downloading.

train/
test/
image_list/
    train_100.txt
    train_50.txt
    train_30.txt
    train_15.txt
    test.txt

FGVC Aircraft¶

class common.vision.datasets.aircrafts.Aircraft(root, split, sample_rate=100, download=False, **kwargs)[source]¶

FGVC-Aircraft is a benchmark for the fine-grained visual categorization of aircraft. The dataset contains 10,200 images of aircraft, with 100 images for each of the 102 different aircraft variants.

Parameters

root (str) – Root directory of dataset
split (str, optional) – The dataset split; supports train or test.
sample_rate (int) – The sampling rate used to randomly sample training images for each category. Choices include 100, 50, 30, 15. Default: 100.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

In root, the following files will exist after downloading.

train/
test/
image_list/
    train_100.txt
    train_50.txt
    train_30.txt
    train_15.txt
    test.txt

Oxford-IIIT Pet¶

class common.vision.datasets.oxfordpet.OxfordIIITPet(root, split, sample_rate=100, download=False, **kwargs)[source]¶

The Oxford-IIIT Pet dataset covers 37 pet categories with roughly 200 images for each class.

Parameters

root (str) – Root directory of dataset
split (str, optional) – The dataset split; supports train or test.
sample_rate (int) – The sampling rate used to randomly sample training images for each category. Choices include 100, 50, 30, 15. Default: 100.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

In root, the following files will exist after downloading.

train/
test/
image_list/
    train_100.txt
    train_50.txt
    train_30.txt
    train_15.txt
    test.txt

COCO-70¶

class common.vision.datasets.coco70.COCO70(root, split, sample_rate=100, download=False, **kwargs)[source]¶

The COCO-70 dataset is a large-scale classification dataset (1000 images per class) created from the COCO dataset. It is used to explore the effect of fine-tuning with a large amount of data.

Parameters

root (str) – Root directory of dataset
split (str, optional) – The dataset split; supports train or test.
sample_rate (int) – The sampling rate used to randomly sample training images for each category. Choices include 100, 50, 30, 15. Default: 100.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

In root, the following files will exist after downloading.

train/
test/
image_list/
    train_100.txt
    train_50.txt
    train_30.txt
    train_15.txt
    test.txt

DTD¶

class common.vision.datasets.dtd.DTD(root, split, download=False, **kwargs)[source]¶

The Describable Textures Dataset (DTD) is an evolving collection of textural images in the wild, annotated with a series of human-centric attributes, inspired by the perceptual properties of textures. The task consists of classifying images of textural patterns (47 classes, with 120 training images each). Some of the textures are banded, bubbly, meshed, lined, or porous. The image size ranges between 300x300 and 640x640 pixels.

Parameters

root (str) – Root directory of dataset
split (str, optional) – The dataset split; supports train or test.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

OxfordFlowers102¶

class common.vision.datasets.oxfordflowers.OxfordFlowers102(root, split='train', download=False, **kwargs)[source]¶

Oxford Flowers 102 consists of 102 flower categories commonly occurring in the United Kingdom. Each class consists of between 40 and 258 images. The images have large scale, pose and light variations. In addition, there are categories that have large variations within the category and several very similar categories. The dataset is divided into a training set, a validation set and a test set. The training set and validation set each consist of 10 images per class (totalling 1020 images each). The test set consists of the remaining 6149 images (minimum 20 per class).
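
The split sizes quoted above follow from simple arithmetic, which the short check below reproduces. It uses only the numbers given in the description; the variable names are illustrative.

```python
num_classes = 102
images_per_class = 10  # per class, in both the training and validation sets

train = val = num_classes * images_per_class
test = 6149  # remaining images, as stated in the description
total = train + val + test
print(train, val, test, total)
```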

Parameters

root (str) – Root directory of dataset
split (str, optional) – The dataset split; supports train or test.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Specialized Image Classification¶

PatchCamelyon¶

class common.vision.datasets.patchcamelyon.PatchCamelyon(root, split, download=False, **kwargs)[source]¶

The PatchCamelyon dataset contains 327,680 images of histopathologic scans of lymph node sections. The classification task consists of predicting the presence of metastatic tissue in a given image (i.e., two classes). All images are 96x96 pixels.

Parameters

root (str) – Root directory of dataset
split (str, optional) – The dataset split; supports train or test.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Retinopathy¶

class common.vision.datasets.retinopathy.Retinopathy(root, split, download=False, **kwargs)[source]¶

The Retinopathy dataset consists of image-label pairs with high-resolution retina images and labels that indicate the presence of Diabetic Retinopathy (DR) on a 0-4 scale (No DR, Mild, Moderate, Severe, or Proliferative DR).
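
For readability when inspecting predictions, the integer labels can be mapped back to grade names. The sketch below assumes the grades are numbered in the order they are listed above (0 for No DR through 4 for Proliferative DR); DR_GRADES and grade_name are illustrative names, not part of the library.

```python
# Grade names in label order, following the order given in the description.
DR_GRADES = ("No DR", "Mild", "Moderate", "Severe", "Proliferative DR")


def grade_name(label):
    """Map an integer label in 0-4 to its grade name."""
    if not 0 <= label < len(DR_GRADES):
        raise ValueError(f"label must be in 0-{len(DR_GRADES) - 1}, got {label}")
    return DR_GRADES[label]


print(grade_name(0), "|", grade_name(4))
```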

Note

You need to download the source data manually into root directory.

Parameters

root (str) – Root directory of dataset
split (str, optional) – The dataset split; supports train or test.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

EuroSAT¶

class common.vision.datasets.eurosat.EuroSAT(root, split='train', download=False, **kwargs)[source]¶

The EuroSAT task consists of classifying Sentinel-2 satellite images into 10 different types of land use (Residential, Industrial, River, Highway, etc.). The spatial resolution corresponds to 10 meters per pixel, and the image size is 64x64 pixels.
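
The resolution and image size together determine how much ground each image covers. The short calculation below works this out from the two numbers stated above; the variable names are illustrative.

```python
image_side_px = 64      # image width and height in pixels
meters_per_pixel = 10   # spatial resolution stated in the description

side_m = image_side_px * meters_per_pixel
area_km2 = side_m ** 2 / 1_000_000
print(f"Each image covers {side_m} m x {side_m} m, about {area_km2:.2f} km^2")
```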

Parameters

root (str) – Root directory of dataset
split (str, optional) – The dataset split; supports train or test.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Resisc45¶

class common.vision.datasets.resisc45.Resisc45(root, split='train', download=False, **kwargs)[source]¶

Resisc45 is a scene classification dataset built from remote sensing images. There are 45 classes with 700 images each, including tennis court, ship, island, lake, parking lot, sparse residential, and stadium. The images are RGB, 256x256 pixels.

Note

You need to download the source data manually into root directory.

Parameters

root (str) – Root directory of dataset
split (str, optional) – The dataset split; supports train or test.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

property num_classes¶: Number of classes
