Vision Datasets

Cross-Domain Classification

ImageList

class common.vision.datasets.imagelist.ImageList(root, classes, data_list_file, transform=None, target_transform=None)[source]

A generic Dataset class for image classification

Parameters
  • root (str) – Root directory of dataset

  • classes (list[str]) – The names of all the classes

  • data_list_file (str) – File to read the image list from.

  • transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.

  • target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

In data_list_file, each line has 2 values in the following format.

source_dir/dog_xxx.png 0
source_dir/cat_123.png 1
target_dir/dog_xxy.png 0
target_dir/cat_nsdf3.png 1

The first value is the relative path of an image, and the second value is the label of the corresponding image. If your data_list_file uses a different format, override parse_data_file().
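The list-file format above can be parsed in a few lines. The following is a minimal sketch, not the library's implementation: the helper name parse_image_list and the choice to join each path with root are assumptions made for illustration.

```python
import os


def parse_image_list(lines, root):
    """Turn 'relative/path label' lines into (path under root, class index) tuples."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last run of whitespace so paths containing spaces still work.
        path, label = line.rsplit(maxsplit=1)
        samples.append((os.path.join(root, path), int(label)))
    return samples


lines = ["source_dir/dog_xxx.png 0", "source_dir/cat_123.png 1"]
samples = parse_image_list(lines, root="data")
```

A subclass that reads a different format would replace only this parsing step and keep the same list-of-tuples result.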

classmethod domains()[source]

All possible domains in this dataset

property num_classes

Number of classes

parse_data_file(file_name)[source]

Parse file to data list

Parameters

file_name (str) – The path of data file

Returns

List of (image path, class_index) tuples

Office-31

class common.vision.datasets.office31.Office31(root, task, download=True, **kwargs)[source]

Office31 Dataset.

Parameters
  • root (str) – Root directory of dataset

  • task (str) – The task (domain) to create dataset. Choices include 'A': amazon, 'D': dslr and 'W': webcam.

  • download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.

  • target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

After downloading, root will contain the following files.

amazon/
    images/
        backpack/
            *.jpg
            ...
dslr/
webcam/
image_list/
    amazon.txt
    dslr.txt
    webcam.txt

classmethod domains()[source]

All possible domains in this dataset

property num_classes

Number of classes

parse_data_file(file_name)

Parse file to data list

Parameters

file_name (str) – The path of data file

Returns

List of (image path, class_index) tuples

Office-Caltech

class common.vision.datasets.officecaltech.OfficeCaltech(root, task, download=False, **kwargs)[source]

Office+Caltech Dataset.

Parameters
  • root (str) – Root directory of dataset

  • task (str) – The task (domain) to create dataset. Choices include 'A': amazon, 'D': dslr, 'W': webcam and 'C': caltech.

  • download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.

  • target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

After downloading, root will contain the following files.

amazon/
    images/
        backpack/
            *.jpg
            ...
dslr/
webcam/
caltech/
image_list/
    amazon.txt
    dslr.txt
    webcam.txt
    caltech.txt

property num_classes

Number of classes

Office-Home

class common.vision.datasets.officehome.OfficeHome(root, task, download=False, **kwargs)[source]

OfficeHome Dataset.

Parameters
  • root (str) – Root directory of dataset

  • task (str) – The task (domain) to create dataset. Choices include 'Ar': Art, 'Cl': Clipart, 'Pr': Product and 'Rw': Real_World.

  • download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.

  • target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

After downloading, root will contain the following files.

Art/
    Alarm_Clock/*.jpg
    ...
Clipart/
Product/
Real_World/
image_list/
    Art.txt
    Clipart.txt
    Product.txt
    Real_World.txt

classmethod domains()[source]

All possible domains in this dataset

property num_classes

Number of classes

parse_data_file(file_name)

Parse file to data list

Parameters

file_name (str) – The path of data file

Returns

List of (image path, class_index) tuples

VisDA-2017

class common.vision.datasets.visda2017.VisDA2017(root, task, download=False, **kwargs)[source]

VisDA-2017 Dataset

Parameters
  • root (str) – Root directory of dataset

  • task (str) – The task (domain) to create dataset. Choices include 'Synthetic': synthetic images and 'Real': real-world images.

  • download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop.

  • target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

After downloading, root will contain the following files.

train/
    aeroplane/
        *.png
        ...
validation/
image_list/
    train.txt
    validation.txt

classmethod domains()[source]

All possible domains in this dataset

property num_classes

Number of classes

parse_data_file(file_name)

Parse file to data list

Parameters

file_name (str) – The path of data file

Returns

List of (image path, class_index) tuples

DomainNet

class common.vision.datasets.domainnet.DomainNet(root, task, split='train', download=False, **kwargs)[source]

DomainNet (cleaned version, recommended)

See Moment Matching for Multi-Source Domain Adaptation for details.

Parameters
  • root (str) – Root directory of dataset

  • task (str) – The task (domain) to create dataset. Choices include 'c': clipart, 'i': infograph, 'p': painting, 'q': quickdraw, 'r': real and 's': sketch.

  • split (str, optional) – The dataset split, supports train or test.

  • download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.

  • target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

After downloading, root will contain the following files.

clipart/
infograph/
painting/
quickdraw/
real/
sketch/
image_list/
    clipart.txt
    ...

classmethod domains()[source]

All possible domains in this dataset

property num_classes

Number of classes

parse_data_file(file_name)

Parse file to data list

Parameters

file_name (str) – The path of data file

Returns

List of (image path, class_index) tuples

PACS

class common.vision.datasets.pacs.PACS(root, task, split='all', download=True, **kwargs)[source]

PACS Dataset.

Parameters
  • root (str) – Root directory of dataset

  • task (str) – The task (domain) to create dataset. Choices include 'A': art_painting, 'C': cartoon, 'P': photo and 'S': sketch.

  • download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.

  • target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

After downloading, root will contain the following files.

art_painting/
    dog/
        *.jpg
        ...
cartoon/
photo/
sketch/
image_list/
    art_painting.txt
    cartoon.txt
    photo.txt
    sketch.txt

classmethod domains()[source]

All possible domains in this dataset

MNIST

class common.vision.datasets.digits.MNIST(root, mode='L', split='train', download=True, **kwargs)[source]

MNIST Dataset.

Parameters
  • root (str) – Root directory of dataset where MNIST/processed/training.pt and MNIST/processed/test.pt exist.

  • mode (str) – The channel mode for image. Choices include "L" and "RGB". Default: "L".

  • split (str, optional) – The dataset split, supports train or test.

  • download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop.

USPS

class common.vision.datasets.digits.USPS(root, mode='L', split='train', download=True, **kwargs)[source]

USPS Dataset.

The data format is [label [index:value ]*256 \n] * num_lines, where label lies in [1, 10] and the value of each pixel lies in [-1, 1]. Here we map the labels to [0, 9] and the pixel values to [0, 255].
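The label and pixel conversions described above can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: the helper name parse_usps_line is hypothetical, feature indices are assumed to be 1-based, and pixel values are assumed to be rescaled linearly.

```python
def parse_usps_line(line):
    """Parse one 'label index:value ...' record into (label, pixels)."""
    fields = line.split()
    label = int(float(fields[0])) - 1  # shift labels from [1, 10] to [0, 9]
    pixels = [0] * 256
    for field in fields[1:]:
        index, value = field.split(":")
        # Assumption: indices are 1-based and [-1, 1] maps linearly onto [0, 255].
        pixels[int(index) - 1] = round((float(value) + 1.0) / 2.0 * 255)
    return label, pixels


# A synthetic record: odd indices are -1.0 (black), even indices are 1.0 (white).
record = "10 " + " ".join(f"{i}:{-1.0 if i % 2 else 1.0}" for i in range(1, 257))
label, pixels = parse_usps_line(record)
```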

Parameters
  • root (str) – Root directory of dataset to store USPS data files.

  • mode (str) – The channel mode for image. Choices include "L" and "RGB". Default: "L".

  • split (str, optional) – The dataset split, supports train or test.

  • transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop.

  • download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

SVHN

class common.vision.datasets.digits.SVHN(root, mode='L', download=True, **kwargs)[source]

SVHN Dataset. Note: The SVHN dataset assigns the label 10 to the digit 0. However, in this Dataset, we assign the label 0 to the digit 0 to be compatible with PyTorch loss functions, which expect class labels in the range [0, C-1].
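The relabeling described above amounts to a one-line mapping. The sketch below assumes raw labels are integers in [1, 10]; the helper name remap_svhn_label is hypothetical and is not part of the library.

```python
def remap_svhn_label(raw_label):
    """Map the raw SVHN label 10 (digit 0) to class 0; labels 1-9 are unchanged."""
    return raw_label % 10


raw_labels = [10, 1, 9, 10]
labels = [remap_svhn_label(v) for v in raw_labels]
```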

Warning

This class needs scipy to load data from .mat format.

Parameters
  • root (str) – Root directory of dataset where directory SVHN exists.

  • mode (str) – The channel mode for image. Choices include "L" and "RGB". Default: "RGB".

  • split (str, optional) – The dataset split, supports train or test.

  • transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop.

  • download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

Partial Cross-Domain Classification

Partial Wrapper

common.vision.datasets.partial.partial(dataset_class, partial_classes)[source]

Convert a dataset into its partial version.

In other words, samples that do not belong to partial_classes are discarded. However, partial does not change the label space of dataset_class.
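The filtering behaviour can be illustrated on a plain list of (path, label) samples. This is only a sketch of the idea, assuming samples are stored as tuples; filter_partial is a hypothetical helper and not the wrapper's actual implementation.

```python
def filter_partial(samples, classes, partial_classes):
    """Keep only samples whose class is in partial_classes; labels stay unchanged."""
    unknown = set(partial_classes) - set(classes)
    if unknown:
        raise ValueError(f"not in classes: {sorted(unknown)}")
    keep = {classes.index(name) for name in partial_classes}
    return [(path, label) for path, label in samples if label in keep]


classes = ["back_pack", "bike", "calculator"]
samples = [("a.jpg", 0), ("b.jpg", 1), ("c.jpg", 2)]
partial_samples = filter_partial(samples, classes, ["back_pack", "calculator"])
```

Note that "calculator" keeps its original index 2 even though "bike" was removed, which is what leaving the label space unchanged means.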

Parameters
  • dataset_class (class) – Dataset class. Only subclass of ImageList can be partial.

  • partial_classes (sequence[str]) – A sequence of which categories need to be kept in the partial dataset. Each element of partial_classes must belong to the classes list of dataset_class.

Examples:

>>> partial_classes = ['back_pack', 'bike', 'calculator', 'headphones', 'keyboard']
>>> # create a partial dataset class
>>> PartialOffice31 = partial(Office31, partial_classes)
>>> # create an instance of the partial dataset
>>> dataset = PartialOffice31(root="data/office31", task="A")

common.vision.datasets.partial.default_partial(dataset_class)[source]

The default partial setting used in some papers.

Parameters

dataset_class (class) – Dataset class. Currently, dataset_class must be one of Office31, OfficeHome, VisDA2017, ImageNetCaltech and CaltechImageNet.

Caltech-256->ImageNet-1k

class common.vision.datasets.partial.caltech_imagenet.CaltechImageNet(root, task, download=True, **kwargs)[source]

Caltech-ImageNet is constructed from Caltech-256 and ImageNet-1K.

They share 84 common classes. Caltech-ImageNet keeps all classes of Caltech-256. The labels are based on Caltech-256 (classes 0-255). The private classes of ImageNet-1K are discarded.

Parameters
  • root (str) – Root directory of dataset

  • task (str) – The task (domain) to create dataset. Choices include 'C': Caltech-256, 'I': ImageNet-1K validation set.

  • download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.

  • target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

You need to put the train and val directories of ImageNet-1K in the root directory manually, since ImageNet-1K is no longer publicly accessible. DALIB will only download Caltech-256 and the image lists automatically. After downloading, root will contain the following files.

train/
    n01440764/
    ...
val/
256_ObjectCategories/
    001.ak47/
    ...
image_list/
    caltech_256_list.txt
    ...

ImageNet-1k->Caltech-256

class common.vision.datasets.partial.imagenet_caltech.ImageNetCaltech(root, task, download=True, **kwargs)[source]

ImageNet-Caltech is constructed from Caltech-256 and ImageNet-1K.

They share 84 common classes. ImageNet-Caltech keeps all classes of ImageNet-1K. The labels are based on ImageNet-1K (classes 0-999). The private classes of Caltech-256 are discarded.

Parameters
  • root (str) – Root directory of dataset

  • task (str) – The task (domain) to create dataset. Choices include 'C': Caltech-256, 'I': ImageNet-1K training set.

  • download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.

  • target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

You need to put the train and val directories of ImageNet-1K in the root directory manually, since ImageNet-1K is no longer publicly accessible. DALIB will only download Caltech-256 and the image lists automatically. After downloading, root will contain the following files.

train/
    n01440764/
    ...
val/
256_ObjectCategories/
    001.ak47/
    ...
image_list/
    caltech_256_list.txt
    ...

Open Set Cross-Domain Classification

Open Set Wrapper

common.vision.datasets.openset.open_set(dataset_class, public_classes, private_classes=())[source]

Convert a dataset into its open-set version.

In other words, samples that belong to private_classes will be marked as “unknown”.

Be aware that open_set will change the label number of each category.
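The relabeling can be illustrated on a plain list of (path, label) samples. This is a sketch under stated assumptions, not the wrapper's actual code: to_open_set is a hypothetical helper, public classes are assumed to be renumbered 0..K-1 in the order given, private classes are assumed to share index K, and samples in neither list are assumed to be dropped.

```python
def to_open_set(samples, classes, public_classes, private_classes=()):
    """Relabel samples for an open-set setting.

    Public classes get indices 0..K-1 in the order given; every private
    class is mapped to index K ("unknown"); other samples are dropped.
    """
    unknown_index = len(public_classes)
    new_samples = []
    for path, label in samples:
        name = classes[label]
        if name in public_classes:
            new_samples.append((path, public_classes.index(name)))
        elif name in private_classes:
            new_samples.append((path, unknown_index))
    return new_samples, list(public_classes) + ["unknown"]


classes = ["back_pack", "bike", "laptop_computer", "mug"]
samples = [("a.jpg", 0), ("b.jpg", 1), ("c.jpg", 2), ("d.jpg", 3)]
new_samples, new_classes = to_open_set(
    samples, classes,
    public_classes=["back_pack", "mug"],
    private_classes=["laptop_computer"],
)
```

Here "mug" moves from index 3 to index 1, which shows why the label number of a category can change.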

Parameters
  • dataset_class (class) – Dataset class. Only subclass of ImageList can be open-set.

  • public_classes (sequence[str]) – A sequence of which categories need to be kept in the open-set dataset. Each element of public_classes must belong to the classes list of dataset_class.

  • private_classes (sequence[str], optional) – A sequence of which categories need to be marked as “unknown” in the open-set dataset. Each element of private_classes must belong to the classes list of dataset_class. Default: ().

Examples:

>>> public_classes = ['back_pack', 'bike', 'calculator', 'headphones', 'keyboard']
>>> private_classes = ['laptop_computer', 'monitor', 'mouse', 'mug', 'projector']
>>> # create an open-set dataset class which has classes
>>> # 'back_pack', 'bike', 'calculator', 'headphones', 'keyboard' and 'unknown'.
>>> OpenSetOffice31 = open_set(Office31, public_classes, private_classes)
>>> # create an instance of the open-set dataset
>>> dataset = OpenSetOffice31(root="data/office31", task="A")

common.vision.datasets.openset.default_open_set(dataset_class, source)[source]

The default open-set setting used in some papers.

Parameters
  • dataset_class (class) – Dataset class. Currently, dataset_class must be one of Office31, OfficeHome, VisDA2017,

  • source (bool) – Whether the dataset is used for source domain or not.

Cross-Domain Regression

ImageRegression

class common.vision.datasets.regression.image_regression.ImageRegression(root, factors, data_list_file, transform=None, target_transform=None)[source]

A generic Dataset class for domain adaptation in image regression

Parameters
  • root (str) – Root directory of dataset

  • factors (sequence[str]) – Factors selected. Default: (‘scale’, ‘position x’, ‘position y’).

  • data_list_file (str) – File to read the image list from.

  • transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.

  • target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

In data_list_file, each line has 1+len(factors) values in the following format.

source_dir/dog_xxx.png x11, x12, ...
source_dir/cat_123.png x21, x22, ...
target_dir/dog_xxy.png x31, x32, ...
target_dir/cat_nsdf3.png x41, x42, ...

The first value is the relative path of an image, and the remaining values are the ground truth of the corresponding factors. If your data_list_file uses a different format, override ImageRegression.parse_data_file().

parse_data_file(file_name)[source]

Parse file to data list

Parameters

file_name (str) – The path of data file

Returns

List of (image path, (factors)) tuples

DSprites

class common.vision.datasets.regression.dsprites.DSprites(root, task, split='train', factors=('scale', 'position x', 'position y'), download=True, target_transform=None, **kwargs)[source]

DSprites Dataset.

Parameters
  • root (str) – Root directory of dataset

  • task (str) – The task (domain) to create dataset. Choices include 'C': Color, 'N': Noisy and 'S': Scream.

  • split (str, optional) – The dataset split, supports train, or test.

  • factors (sequence[str]) – Factors selected. Default: (‘scale’, ‘position x’, ‘position y’).

  • download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.

  • target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

After downloading, root will contain the following files.

color/
    ...
noisy/
scream/
image_list/
    color_train.txt
    noisy_train.txt
    scream_train.txt
    color_test.txt
    noisy_test.txt
    scream_test.txt

MPI3D

class common.vision.datasets.regression.mpi3d.MPI3D(root, task, split='train', factors=('horizontal axis', 'vertical axis'), download=True, target_transform=None, **kwargs)[source]

MPI3D Dataset.

Parameters
  • root (str) – Root directory of dataset

  • task (str) – The task (domain) to create dataset. Choices include 'RL': real, 'RC': realistic and 'T': toy.

  • split (str, optional) – The dataset split, supports train, or test.

  • factors (sequence[str]) – Factors selected. Default: (‘horizontal axis’, ‘vertical axis’).

  • download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.

  • target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

After downloading, root will contain the following files.

real/
    ...
realistic/
toy/
image_list/
    real_train.txt
    realistic_train.txt
    toy_train.txt
    real_test.txt
    realistic_test.txt
    toy_test.txt

Cross-Domain Segmentation

SegmentationList

class common.vision.datasets.segmentation.segmentation_list.SegmentationList(root, classes, data_list_file, label_list_file, data_folder, label_folder, id_to_train_id=None, train_id_to_color=None, transforms=None)[source]

A generic Dataset class for domain adaptation in image segmentation

Parameters
  • root (str) – Root directory of dataset

  • classes (seq[str]) – The names of all the classes

  • data_list_file (str) – File to read the image list from.

  • label_list_file (str) – File to read the label list from.

  • data_folder (str) – Sub-directory of the image.

  • label_folder (str) – Sub-directory of the label.

  • mean (seq[float]) – mean BGR value. Normalize and convert to the image if not None. Default: None.

  • id_to_train_id (dict, optional) – the map between the id on the label and the actual train id.

  • train_id_to_color (seq, optional) – the map between the train id and the color.

  • transforms (callable, optional) – A function/transform that takes in (PIL Image, label) pair and returns a transformed version. E.g, Resize.
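The role of id_to_train_id can be illustrated with a small sketch on a label stored as nested lists. The helper name to_train_ids, the example mapping, and the ignore value 255 for unmapped ids are all assumptions made for illustration; the library's own handling may differ.

```python
def to_train_ids(label, id_to_train_id, ignore_label=255):
    """Replace raw label ids with train ids; unmapped ids become ignore_label."""
    return [[id_to_train_id.get(v, ignore_label) for v in row] for row in label]


id_to_train_id = {7: 0, 8: 1}  # hypothetical mapping from raw ids to train ids
label = [[7, 8], [8, 3]]       # a 2 x 2 label map; id 3 has no train id
train_label = to_train_ids(label, id_to_train_id)
```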

Note

In data_list_file, each line is the relative path of an image. If your data_list_file uses a different format, override parse_data_file().

source_dir/dog_xxx.png
target_dir/dog_xxy.png

In label_list_file, each line is the relative path of a label. If your label_list_file uses a different format, override parse_label_file().

Warning

When mean is not None, please do not provide Normalize and ToTensor in transforms.

collect_image_paths()[source]

Return a list of the absolute path of all the images

decode_target(target)[source]

Decode label (each value is integer) into the corresponding RGB value.

Parameters

target (numpy.array) – label in shape H x W

Returns

RGB label (PIL Image) in shape H x W x 3
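The decoding step is a palette lookup per pixel. The sketch below works on nested lists instead of a numpy array and returns RGB tuples instead of a PIL Image, so it only illustrates the mapping; the palette values and the black fallback for out-of-range ids are assumptions.

```python
def decode_target(target, train_id_to_color, ignore_color=(0, 0, 0)):
    """Map an H x W grid of train ids to an H x W grid of RGB triples."""
    n = len(train_id_to_color)
    return [
        [train_id_to_color[v] if 0 <= v < n else ignore_color for v in row]
        for row in target
    ]


palette = [(128, 64, 128), (244, 35, 232)]  # hypothetical colors for train ids 0 and 1
target = [[0, 1], [1, 255]]                 # 255 is outside the palette
rgb = decode_target(target, palette)
```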

property evaluate_classes

The name of classes to be evaluated

property ignore_classes

The name of classes to be ignored

property num_classes

Number of classes

parse_data_file(file_name)[source]

Parse file to image list

Parameters

file_name (str) – The path of data file

Returns

List of image path

parse_label_file(file_name)[source]

Parse file to label list

Parameters

file_name (str) – The path of data file

Returns

List of label path

translate(transform, target_root, color=False)[source]

Translate an image and save it into a specified directory

Parameters
  • transform (callable) – a transform function that maps (image, label) pair from one domain to another domain

  • target_root (str) – the root directory to save images and labels

Cityscapes

class common.vision.datasets.segmentation.cityscapes.Cityscapes(root, split='train', data_folder='leftImg8bit', label_folder='gtFine', **kwargs)[source]

Cityscapes is a real-world semantic segmentation dataset collected in driving scenarios.

Parameters
  • root (str) – Root directory of dataset

  • split (str, optional) – The dataset split, supports train, or val.

  • data_folder (str, optional) – Sub-directory of the image. Default: ‘leftImg8bit’.

  • label_folder (str, optional) – Sub-directory of the label. Default: ‘gtFine’.

  • mean (seq[float]) – mean BGR value. Normalize the image if not None. Default: None.

  • transforms (callable, optional) – A function/transform that takes in (PIL image, label) pair and returns a transformed version. E.g, Resize.

Note

You need to download Cityscapes manually. Ensure that the following files exist in the root directory before using this class.

leftImg8bit/
    train/
    val/
    test/
gtFine/
    train/
    val/
    test/

GTA5

class common.vision.datasets.segmentation.gta5.GTA5(root, split='train', data_folder='images', label_folder='labels', **kwargs)[source]

GTA5

Parameters
  • root (str) – Root directory of dataset

  • split (str, optional) – The dataset split, supports train.

  • data_folder (str, optional) – Sub-directory of the image. Default: ‘images’.

  • label_folder (str, optional) – Sub-directory of the label. Default: ‘labels’.

  • mean (seq[float]) – mean BGR value. Normalize the image if not None. Default: None.

  • transforms (callable, optional) – A function/transform that takes in (PIL image, label) pair and returns a transformed version. E.g, Resize.

Note

You need to download GTA5 manually. Ensure that the following directories exist in the root directory before using this class.

images/
labels/

Synthia

class common.vision.datasets.segmentation.synthia.Synthia(root, split='train', data_folder='RGB', label_folder='synthia_mapped_to_cityscapes', **kwargs)[source]

SYNTHIA

Parameters
  • root (str) – Root directory of dataset

  • split (str, optional) – The dataset split, supports train.

  • data_folder (str, optional) – Sub-directory of the image. Default: ‘RGB’.

  • label_folder (str, optional) – Sub-directory of the label. Default: ‘synthia_mapped_to_cityscapes’.

  • mean (seq[float]) – mean BGR value. Normalize the image if not None. Default: None.

  • transforms (callable, optional) – A function/transform that takes in (PIL image, label) pair and returns a transformed version. E.g, Resize.

Note

You need to download Synthia manually. Ensure that the following directories exist in the root directory before using this class.

RGB/
synthia_mapped_to_cityscapes/

Foggy Cityscapes

class common.vision.datasets.segmentation.cityscapes.FoggyCityscapes(root, split='train', data_folder='leftImg8bit_foggy', label_folder='gtFine', beta=0.02, **kwargs)[source]

Foggy Cityscapes is a real-world semantic segmentation dataset collected in foggy driving scenarios.

Parameters
  • root (str) – Root directory of dataset

  • split (str, optional) – The dataset split, supports train, or val.

  • data_folder (str, optional) – Sub-directory of the image. Default: ‘leftImg8bit_foggy’.

  • label_folder (str, optional) – Sub-directory of the label. Default: ‘gtFine’.

  • beta (float, optional) – The fog parameter. Choices include 0.005, 0.01 and 0.02. Default: 0.02

  • mean (seq[float]) – mean BGR value. Normalize the image if not None. Default: None.

  • transforms (callable, optional) – A function/transform that takes in (PIL image, label) pair and returns a transformed version. E.g, Resize.

Note

You need to download Cityscapes manually. Ensure that the following files exist in the root directory before using this class.

leftImg8bit_foggy/
    train/
    val/
    test/
gtFine/
    train/
    val/
    test/

Cross-Domain Keypoint Detection

Dataset Base for Keypoint Detection

class common.vision.datasets.keypoint_detection.keypoint_dataset.KeypointDataset(root, num_keypoints, samples, transforms=None, image_size=(256, 256), heatmap_size=(64, 64), sigma=2, keypoints_group=None, colored_skeleton=None)[source]

A generic dataset class for image keypoint detection

Parameters
  • root (str) – Root directory of dataset

  • num_keypoints (int) – Number of keypoints

  • samples (list) – list of data

  • transforms (callable, optional) – A function/transform that takes in a dict (which contains PIL image and its labels) and returns a transformed version. E.g, Resize.

  • image_size (tuple) – (width, height) of the image. Default: (256, 256)

  • heatmap_size (tuple) – (width, height) of the heatmap. Default: (64, 64)

  • sigma (int) – sigma parameter used when generating the heatmap. Default: 2

  • keypoints_group (dict) – a dict that stores the index of different types of keypoints

  • colored_skeleton (dict) – a dict that stores the index and color of different skeleton
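To show how image_size, heatmap_size and sigma relate, here is a minimal sketch that renders one keypoint as a Gaussian heatmap. It assumes an unnormalized Gaussian centered on the keypoint after rescaling to heatmap coordinates; the function name gaussian_heatmap is hypothetical and the library's own target generation may differ in detail.

```python
import math


def gaussian_heatmap(keypoint, image_size=(256, 256), heatmap_size=(64, 64), sigma=2):
    """Render one keypoint (x, y in image pixels) as a Gaussian heatmap.

    Returns a list of rows (height x width) whose peak value is 1.0 at the
    heatmap cell nearest to the rescaled keypoint.
    """
    width, height = heatmap_size
    # Rescale from image coordinates to heatmap coordinates.
    cx = round(keypoint[0] * width / image_size[0])
    cy = round(keypoint[1] * height / image_size[1])
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


heatmap = gaussian_heatmap((128, 64))  # peak lands at column 32, row 16
```

A larger sigma spreads each keypoint over more heatmap cells, which makes the regression target softer.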

group_accuracy(accuracies)[source]

Group the accuracy of K keypoints into different kinds.

Parameters

accuracies (list) – accuracy of the K keypoints

Returns

accuracy of N=len(keypoints_group) kinds of keypoints
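Grouping can be pictured as averaging the per-keypoint accuracies inside each group. The sketch below assumes keypoints_group maps a group name to a tuple of keypoint indices and that averaging is the aggregation used; it returns a dict for readability, which may not match the method's actual return type.

```python
def group_accuracy(accuracies, keypoints_group):
    """Average per-keypoint accuracies within each named group of indices."""
    return {
        name: sum(accuracies[i] for i in indices) / len(indices)
        for name, indices in keypoints_group.items()
    }


accuracies = [1.0, 0.5, 0.0, 1.0]
keypoints_group = {"wrist": (0,), "fingers": (1, 2, 3)}  # hypothetical grouping
grouped = group_accuracy(accuracies, keypoints_group)
```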

visualize(image, keypoints, filename)[source]

Visualize an image with its keypoints, and store the result into a file

Parameters
  • image (PIL.Image) –

  • keypoints (torch.Tensor) – keypoints in shape K x 2

  • filename (str) – the name of file to store

class common.vision.datasets.keypoint_detection.keypoint_dataset.Body16KeypointDataset(root, samples, **kwargs)[source]

Dataset with 16 body keypoints.

class common.vision.datasets.keypoint_detection.keypoint_dataset.Hand21KeypointDataset(root, samples, **kwargs)[source]

Dataset with 21 hand keypoints.

Rendered Handpose Dataset

class common.vision.datasets.keypoint_detection.rendered_hand_pose.RenderedHandPose(root, split='train', task='all', download=True, **kwargs)[source]

Rendered Handpose Dataset

Parameters
  • root (str) – Root directory of dataset

  • split (str, optional) – The dataset split, supports train, test, or all.

  • task (str, optional) – Placeholder.

  • download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • transforms (callable, optional) – A function/transform that takes in a dict (which contains PIL image and its labels) and returns a transformed version. E.g, Resize.

  • image_size (tuple) – (width, height) of the image. Default: (256, 256)

  • heatmap_size (tuple) – (width, height) of the heatmap. Default: (64, 64)

  • sigma (int) – sigma parameter used when generating the heatmap. Default: 2

Note

After downloading, root will contain the following files.

RHD_published_v2/
    training/
    evaluation/

Hand-3d-Studio Dataset

class common.vision.datasets.keypoint_detection.hand_3d_studio.Hand3DStudio(root, split='train', task='noobject', download=True, **kwargs)[source]

Hand-3d-Studio Dataset

Parameters
  • root (str) – Root directory of dataset

  • split (str, optional) – The dataset split, supports train, test, or all.

  • task (str, optional) – The task to create dataset. Choices include 'noobject': only hands without objects, 'object': only hands interacting with objects, and 'all': all hands. Default: ‘noobject’.

  • download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • transforms (callable, optional) – A function/transform that takes in a dict (which contains PIL image and its labels) and returns a transformed version. E.g, Resize.

  • image_size (tuple) – (width, height) of the image. Default: (256, 256)

  • heatmap_size (tuple) – (width, height) of the heatmap. Default: (64, 64)

  • sigma (int) – sigma parameter used when generating the heatmap. Default: 2

Note

We found that the original H3D images are high resolution while most of each image is background, so we crop each image and keep only the area surrounding the hands (1.5x bigger than the hands) to speed up training.
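One way to picture this cropping is to enlarge the bounding box of the hand keypoints by a factor of 1.5 around its center. The sketch below is an assumption about how such a box could be computed, not the preprocessing actually used to build the dataset; hand_crop_box is a hypothetical helper.

```python
def hand_crop_box(keypoints, scale=1.5):
    """Return (left, top, right, bottom) of a box `scale` times the keypoints' bounding box."""
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    cx, cy = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
    half_w = (max(xs) - min(xs)) / 2 * scale
    half_h = (max(ys) - min(ys)) / 2 * scale
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


box = hand_crop_box([(100, 200), (140, 260), (120, 220)])
```

In practice the box would also need clamping to the image bounds before cropping.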

Note

After downloading, root will contain the following files.

H3D_crop/
    annotation.json
    part1/
    part2/
    part3/
    part4/
    part5/

FreiHAND Dataset

class common.vision.datasets.keypoint_detection.freihand.FreiHand(root, split='train', task='all', download=True, **kwargs)[source]

FreiHand Dataset

Parameters
  • root (str) – Root directory of dataset

  • split (str, optional) – The dataset split, supports train, test, or all.

  • task (str, optional) – The post-processing option to create dataset. Choices include 'gs': green screen recording, 'auto': auto colorization without sample points: automatic color hallucination, 'sample': auto colorization with sample points, 'hom': homogenized, and 'all': all hands. Default: ‘all’.

  • download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • transforms (callable, optional) – A function/transform that takes in a dict (which contains PIL image and its labels) and returns a transformed version. E.g, Resize.

  • image_size (tuple) – (width, height) of the image. Default: (256, 256)

  • heatmap_size (tuple) – (width, height) of the heatmap. Default: (64, 64)

  • sigma (int) – Sigma parameter used when generating the heatmap. Default: 2

Note

After downloading, root will contain the following files.

*.json
training/
evaluation/

Surreal Dataset

class common.vision.datasets.keypoint_detection.surreal.SURREAL(root, split='train', task='all', download=True, **kwargs)[source]

Surreal Dataset

Parameters
  • root (str) – Root directory of dataset

  • split (str, optional) – The dataset split, supports train, test, or all. Default: train.

  • task (str, optional) – Placeholder.

  • download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • transforms (callable, optional) – A function/transform that takes in a dict (which contains PIL image and its labels) and returns a transformed version. E.g, Resize.

  • image_size (tuple) – (width, height) of the image. Default: (256, 256)

  • heatmap_size (tuple) – (width, height) of the heatmap. Default: (64, 64)

  • sigma (int) – Sigma parameter used when generating the heatmap. Default: 2
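
Because image_size and heatmap_size differ, keypoint coordinates have to be rescaled before a heatmap can be drawn. The sketch below shows that rescaling with the documented defaults; the helper to_heatmap_coords is a hypothetical name, and plain linear scaling is an assumption rather than the library's confirmed behavior.

```python
def to_heatmap_coords(keypoint, image_size=(256, 256), heatmap_size=(64, 64)):
    """Map an (x, y) keypoint from image pixels to heatmap cells."""
    x, y = keypoint
    image_w, image_h = image_size
    heatmap_w, heatmap_h = heatmap_size
    return x * heatmap_w / image_w, y * heatmap_h / image_h


# With the defaults, every heatmap cell covers a 4x4 block of image pixels.
scaled = to_heatmap_coords((128, 64))
print(scaled)  # (32.0, 16.0)
```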

Note

The original Surreal images are high resolution, but most of each image is background. We therefore crop each image, keeping only the area around the person (1.5x the size of the person), to speed up training.

Note

After downloading, root will contain the following files.

train/
test/
val/

LSP Dataset

class common.vision.datasets.keypoint_detection.lsp.LSP(root, split='train', task='all', download=True, image_size=(256, 256), transforms=None, **kwargs)[source]

Leeds Sports Pose Dataset

Parameters
  • root (str) – Root directory of dataset

  • split (str, optional) – Placeholder.

  • task (str, optional) – Placeholder.

  • download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • transforms (callable, optional) – Placeholder.

  • heatmap_size (tuple) – (width, height) of the heatmap. Default: (64, 64)

  • sigma (int) – Sigma parameter used when generating the heatmap. Default: 2

Note

After downloading, root will contain the following files.

lsp/
    images/
    joints.mat

Note

LSP is only used as a target domain. Because the dataset is small, the whole dataset is used regardless of the value of split. The transform is also fixed.

Human3.6M Dataset

class common.vision.datasets.keypoint_detection.human36m.Human36M(root, split='train', task='all', download=True, **kwargs)[source]

Human3.6M Dataset

Parameters
  • root (str) – Root directory of dataset

  • split (str, optional) – The dataset split, supports train, test, or all. Default: train.

  • task (str, optional) – Placeholder.

  • download (bool, optional) – Placeholder.

  • transforms (callable, optional) – A function/transform that takes in a dict (which contains PIL image and its labels) and returns a transformed version. E.g, Resize.

  • image_size (tuple) – (width, height) of the image. Default: (256, 256)

  • heatmap_size (tuple) – (width, height) of the heatmap. Default: (64, 64)

  • sigma (int) – Sigma parameter used when generating the heatmap. Default: 2

Note

You need to download Human36M manually. Ensure that the following files exist in the root directory before using this class.

annotations/
    Human36M_subject11_joint_3d.json
    ...
images/

Note

The original Human3.6M images are high resolution, but most of each image is background. We therefore crop each image, keeping only the area around the person (1.5x the size of the person), to speed up training. After cropping, root will also contain the following files.

Human36M_crop/
annotations/
    keypoints2d_11.json
    ...

Cross-Domain ReID

Market1501

class common.vision.datasets.reid.market1501.Market1501(root, verbose=True)[source]

Market1501 dataset from Scalable Person Re-identification: A Benchmark (ICCV 2015).

Dataset statistics:
  • identities: 1501 (+1 for background)

  • images: 12936 (train) + 3368 (query) + 15913 (gallery)

  • cameras: 6

Parameters
  • root (str) – Root directory of dataset

  • verbose (bool, optional) – If true, print dataset statistics after loading the dataset. Default: True

translate(transform, target_root)[source]

Translate an image and save it into a specified directory

Parameters
  • transform (callable) – a transform function that maps images from one domain to another domain

  • target_root (str) – the root directory to save images
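
As a rough illustration of what translating a dataset into another directory involves, the sketch below applies a transform to every file under a source root and writes the result to the same relative path under a target root. Everything here is an assumption made for the example: translate_files is a hypothetical helper, the transform works on raw bytes instead of images so the example needs no imaging library, and the train/ directory is a placeholder. The documented translate method takes an image-to-image transform and may organize its output differently.

```python
import tempfile
from pathlib import Path


def translate_files(source_root, target_root, transform, pattern="*.jpg"):
    """Apply `transform` to each matching file and mirror the directory layout."""
    source_root, target_root = Path(source_root), Path(target_root)
    written = []
    for src in sorted(source_root.rglob(pattern)):
        # Keep the path relative to the source root under the target root.
        dst = target_root / src.relative_to(source_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        dst.write_bytes(transform(src.read_bytes()))
        written.append(dst)
    return written


# Tiny demonstration on a placeholder file (not a real image).
with tempfile.TemporaryDirectory() as tmp:
    src_root = Path(tmp) / "source"
    (src_root / "train").mkdir(parents=True)
    (src_root / "train" / "a.jpg").write_bytes(b"abc")
    out = translate_files(src_root, Path(tmp) / "translated", lambda b: b[::-1])
    result = out[0].read_bytes()
    relative = out[0].relative_to(Path(tmp) / "translated").as_posix()
```

Mirroring the relative paths keeps identity and camera information that is encoded in file names or folders intact in the translated copy.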

DukeMTMC-reID

class common.vision.datasets.reid.dukemtmc.DukeMTMC(root, verbose=True)[source]

DukeMTMC-reID dataset from Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking (ECCV 2016).

Dataset statistics:
  • identities: 1404 (train + query)

  • images: 16522 (train) + 2228 (query) + 17661 (gallery)

  • cameras: 8

Parameters
  • root (str) – Root directory of dataset

  • verbose (bool, optional) – If true, print dataset statistics after loading the dataset. Default: True

translate(transform, target_root)[source]

Translate an image and save it into a specified directory

Parameters
  • transform (callable) – a transform function that maps images from one domain to another domain

  • target_root (str) – the root directory to save images

MSMT17

class common.vision.datasets.reid.msmt17.MSMT17(root, verbose=True)[source]

MSMT17 dataset from Person Transfer GAN to Bridge Domain Gap for Person Re-Identification (CVPR 2018).

Dataset statistics:
  • identities: 4101

  • images: 32621 (train) + 11659 (query) + 82161 (gallery)

  • cameras: 15

Parameters
  • root (str) – Root directory of dataset

  • verbose (bool, optional) – If true, print dataset statistics after loading the dataset. Default: True

translate(transform, target_root)[source]

Translate an image and save it into a specified directory

Parameters
  • transform (callable) – a transform function that maps images from one domain to another domain

  • target_root (str) – the root directory to save images

Natural Object Recognition

Stanford Dogs

class common.vision.datasets.stanford_dogs.StanfordDogs(root, split, sample_rate=100, download=False, **kwargs)[source]

The Stanford Dogs dataset contains 20,580 images of 120 breeds of dogs from around the world. Each category has exactly 100 training examples and around 72 testing examples.

Parameters
  • root (str) – Root directory of dataset

  • split (str, optional) – The dataset split, supports train or test.

  • sample_rate (int) – The sampling rate used to sample random training images for each category. Choices include 100, 50, 30, 15. Default: 100.

  • download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.

  • target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

After downloading, root will contain the following files.

train/
test/
image_list/
    train_100.txt
    train_50.txt
    train_30.txt
    train_15.txt
    test.txt
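
The train_100.txt, train_50.txt, train_30.txt and train_15.txt lists suggest that sample_rate is a percentage of the training images kept per category. Under that assumption, which is not confirmed here, a per-category subsample could be drawn as in the sketch below. The helper sample_per_category and the fixed seed are illustrative only; the shipped image lists are precomputed and were not necessarily produced this way.

```python
import random
from collections import defaultdict


def sample_per_category(samples, sample_rate, seed=0):
    """Keep `sample_rate` percent of the (path, label) pairs in each category."""
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append((path, label))
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    kept = []
    for label in sorted(by_label):
        items = by_label[label]
        count = len(items) * sample_rate // 100
        kept.extend(rng.sample(items, count))
    return kept


# Two toy categories with 100 training images each.
samples = [(f"train/{label}/{i}.jpg", label)
           for label in range(2) for i in range(100)]
subset = sample_per_category(samples, sample_rate=30)
print(len(subset))  # 60
```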

Stanford Cars

class common.vision.datasets.stanford_cars.StanfordCars(root, split, sample_rate=100, download=False, **kwargs)[source]

The Stanford Cars dataset contains 16,185 images of 196 classes of cars. Each class is split roughly 50-50, giving 8,144 images for training and 8,041 images for testing.

Parameters
  • root (str) – Root directory of dataset

  • split (str, optional) – The dataset split, supports train or test.

  • sample_rate (int) – The sampling rate used to sample random training images for each category. Choices include 100, 50, 30, 15. Default: 100.

  • download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.

  • target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

After downloading, root will contain the following files.

train/
test/
image_list/
    train_100.txt
    train_50.txt
    train_30.txt
    train_15.txt
    test.txt
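
Given this layout, the combination of split and sample_rate plausibly selects one of the files under image_list/. The sketch below spells out that mapping. The helper image_list_path is hypothetical and the mapping is inferred from the file names alone; the class may resolve its list file differently.

```python
def image_list_path(split, sample_rate=100):
    """Return the relative path of the image list for a split (assumed mapping)."""
    if split == "train":
        if sample_rate not in (100, 50, 30, 15):
            raise ValueError(f"unsupported sample_rate: {sample_rate}")
        return f"image_list/train_{sample_rate}.txt"
    if split == "test":
        # The layout shows a single test list, independent of sample_rate.
        return "image_list/test.txt"
    raise ValueError(f"unsupported split: {split}")


train_list = image_list_path("train", sample_rate=50)
test_list = image_list_path("test")
```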

CUB-200-2011

class common.vision.datasets.cub200.CUB200(root, split, sample_rate=100, download=False, **kwargs)[source]

Caltech-UCSD Birds-200-2011 is a dataset for fine-grained visual recognition with 11,788 images in 200 bird species. It is an extended version of the CUB-200 dataset, roughly doubling the number of images.

Parameters
  • root (str) – Root directory of dataset

  • split (str, optional) – The dataset split, supports train or test.

  • sample_rate (int) – The sampling rate used to sample random training images for each category. Choices include 100, 50, 30, 15. Default: 100.

  • download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.

  • target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

After downloading, root will contain the following files.

train/
test/
image_list/
    train_100.txt
    train_50.txt
    train_30.txt
    train_15.txt
    test.txt

FGVC Aircraft

class common.vision.datasets.aircrafts.Aircraft(root, split, sample_rate=100, download=False, **kwargs)[source]

FGVC-Aircraft is a benchmark for the fine-grained visual categorization of aircraft. The dataset contains 10,200 images of aircraft, with 100 images for each of the 102 different aircraft variants.

Parameters
  • root (str) – Root directory of dataset

  • split (str, optional) – The dataset split, supports train or test.

  • sample_rate (int) – The sampling rate used to sample random training images for each category. Choices include 100, 50, 30, 15. Default: 100.

  • download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.

  • target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

After downloading, root will contain the following files.

train/
test/
image_list/
    train_100.txt
    train_50.txt
    train_30.txt
    train_15.txt
    test.txt

Oxford-IIIT Pet

class common.vision.datasets.oxfordpet.OxfordIIITPet(root, split, sample_rate=100, download=False, **kwargs)[source]

The Oxford-IIIT Pet dataset is a 37-category pet dataset with roughly 200 images for each class.

Parameters
  • root (str) – Root directory of dataset

  • split (str, optional) – The dataset split, supports train or test.

  • sample_rate (int) – The sampling rate used to sample random training images for each category. Choices include 100, 50, 30, 15. Default: 100.

  • download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.

  • target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

After downloading, root will contain the following files.

train/
test/
image_list/
    train_100.txt
    train_50.txt
    train_30.txt
    train_15.txt
    test.txt

COCO-70

class common.vision.datasets.coco70.COCO70(root, split, sample_rate=100, download=False, **kwargs)[source]

The COCO-70 dataset is a large-scale classification dataset (1000 images per class) created from the COCO dataset. It is used to explore the effect of fine-tuning with a large amount of data.

Parameters
  • root (str) – Root directory of dataset

  • split (str, optional) – The dataset split, supports train or test.

  • sample_rate (int) – The sampling rate used to sample random training images for each category. Choices include 100, 50, 30, 15. Default: 100.

  • download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.

  • target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Note

After downloading, root will contain the following files.

train/
test/
image_list/
    train_100.txt
    train_50.txt
    train_30.txt
    train_15.txt
    test.txt

DTD

class common.vision.datasets.dtd.DTD(root, split, download=False, **kwargs)[source]

The Describable Textures Dataset (DTD) is an evolving collection of textural images in the wild, annotated with a series of human-centric attributes inspired by the perceptual properties of textures. The task consists of classifying images of textural patterns (47 classes, with 120 training images each). Some of the textures are banded, bubbly, meshed, lined, or porous. The image size ranges between 300x300 and 640x640 pixels.

Parameters
  • root (str) – Root directory of dataset

  • split (str, optional) – The dataset split, supports train or test.

  • download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.

  • target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

OxfordFlowers102

class common.vision.datasets.oxfordflowers.OxfordFlowers102(root, split='train', download=False, **kwargs)[source]

Oxford Flowers 102 consists of 102 flower categories commonly occurring in the United Kingdom. Each class consists of between 40 and 258 images. The images have large scale, pose and light variations. In addition, there are categories that have large variations within the category and several very similar categories. The dataset is divided into a training set, a validation set and a test set. The training set and validation set each consist of 10 images per class (totalling 1020 images each). The test set consists of the remaining 6149 images (minimum 20 per class).
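
The split sizes quoted above can be checked with a few lines of arithmetic. The variable names are illustrative; the numbers come from the description.

```python
num_classes = 102
images_per_class = 10  # per class, in both the training and validation sets

train_images = num_classes * images_per_class
val_images = num_classes * images_per_class
test_images = 6149  # the remaining images, as stated in the description

total_images = train_images + val_images + test_images
print(train_images, val_images, test_images, total_images)
# 1020 1020 6149 8189
```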

Parameters
  • root (str) – Root directory of dataset

  • split (str, optional) – The dataset split, supports train or test.

  • download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.

  • target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Specialized Image Classification

PatchCamelyon

class common.vision.datasets.patchcamelyon.PatchCamelyon(root, split, download=False, **kwargs)[source]

The PatchCamelyon dataset contains 327,680 images of histopathologic scans of lymph node sections. The classification task consists of predicting the presence of metastatic tissue in a given image (i.e., two classes). All images are 96x96 pixels.

Parameters
  • root (str) – Root directory of dataset

  • split (str, optional) – The dataset split, supports train or test.

  • download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.

  • target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Retinopathy

class common.vision.datasets.retinopathy.Retinopathy(root, split, download=False, **kwargs)[source]

The Retinopathy dataset consists of image-label pairs with high-resolution retina images and labels that indicate the presence of Diabetic Retinopathy (DR) on a 0-4 scale (No DR, Mild, Moderate, Severe, or Proliferative DR).
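
For readability when inspecting predictions, the 0-4 scale can be mapped back to grade names. The sketch assumes the labels follow the order in which the grades are listed above (0 for No DR through 4 for Proliferative DR); DR_GRADES and grade_name are hypothetical names, not part of the library.

```python
# Assumed ordering: index 0 is "No DR", index 4 is "Proliferative DR".
DR_GRADES = ["No DR", "Mild", "Moderate", "Severe", "Proliferative DR"]


def grade_name(label):
    """Translate an integer label on the 0-4 scale into its grade name."""
    if not 0 <= label < len(DR_GRADES):
        raise ValueError(f"label must be in 0..{len(DR_GRADES) - 1}, got {label}")
    return DR_GRADES[label]


print(grade_name(0), "|", grade_name(4))  # No DR | Proliferative DR
```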

Note

You need to download the source data manually into root directory.

Parameters
  • root (str) – Root directory of dataset

  • split (str, optional) – The dataset split, supports train or test.

  • transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.

  • target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

EuroSAT

class common.vision.datasets.eurosat.EuroSAT(root, split='train', download=False, **kwargs)[source]

The EuroSAT dataset consists of Sentinel-2 satellite images to be classified into 10 different types of land use (Residential, Industrial, River, Highway, etc.). The spatial resolution is 10 meters per pixel, and the image size is 64x64 pixels.
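
The resolution and image size together determine how much ground each image covers. A quick calculation, using only the figures given above and illustrative variable names:

```python
METERS_PER_PIXEL = 10
IMAGE_SIZE = (64, 64)  # (width, height) in pixels

# Ground footprint of one image, in meters and in square kilometers.
footprint_m = tuple(side * METERS_PER_PIXEL for side in IMAGE_SIZE)
area_km2 = footprint_m[0] * footprint_m[1] / 1_000_000

print(footprint_m)  # (640, 640)
```

Each image therefore covers a 640 m by 640 m patch, roughly 0.41 square kilometers.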

Parameters
  • root (str) – Root directory of dataset

  • split (str, optional) – The dataset split, supports train or test.

  • download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.

  • target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

Resisc45

class common.vision.datasets.resisc45.Resisc45(root, split='train', download=False, **kwargs)[source]

Resisc45 is a scene classification dataset built from remote sensing images. There are 45 classes with 700 images each, including tennis court, ship, island, lake, parking lot, sparse residential, and stadium. The images are RGB, 256x256 pixels.

Note

You need to download the source data manually into root directory.

Parameters
  • root (str) – Root directory of dataset

  • split (str, optional) – The dataset split, supports train or test.

  • transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.

  • target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

property num_classes

Number of classes
