Vision Datasets
Cross-Domain Classification
ImageList
class common.vision.datasets.imagelist.ImageList(root, classes, data_list_file, transform=None, target_transform=None)
A generic Dataset class for image classification.
Parameters
root (str) – Root directory of dataset
classes (list[str]) – The names of all the classes
data_list_file (str) – File to read the image list from.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version, e.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
Note
In data_list_file, each line has 2 values in the following format.
source_dir/dog_xxx.png 0
source_dir/cat_123.png 1
target_dir/dog_xxy.png 0
target_dir/cat_nsdf3.png 1
The first value is the relative path of an image, and the second value is the label of the corresponding image. If your data_list_file has a different format, please override parse_data_file().
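The data list format described above can be read with a few lines of standard Python. This is a minimal sketch, not the library's parse_data_file(); the function name parse_image_list and the choice to join each relative path onto root are assumptions for illustration.

```python
import os
from typing import List, Tuple


def parse_image_list(lines: List[str], root: str) -> List[Tuple[str, int]]:
    """Hypothetical parser for 'relative/path.png label' lines.

    Returns (path joined onto root, integer label) pairs; blank lines are skipped.
    """
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last whitespace only, so paths containing spaces still work.
        path, label = line.rsplit(maxsplit=1)
        samples.append((os.path.join(root, path), int(label)))
    return samples


example = [
    "source_dir/dog_xxx.png 0",
    "source_dir/cat_123.png 1",
    "",
]
print(parse_image_list(example, "data/office31"))
```

Splitting from the right is a deliberate choice here: the label is always the last token, so the parser does not need to assume anything about the characters in the path.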
property
num_classes
¶ Number of classes
Office-31
class common.vision.datasets.office31.Office31(root, task, download=True, **kwargs)
Office31 Dataset.
Parameters
root (str) – Root directory of dataset
task (str) – The task (domain) to create dataset. Choices include 'A': amazon, 'D': dslr and 'W': webcam.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version, e.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
Note
After downloading, the following files will exist in root.
amazon/
    images/
        backpack/
            *.jpg
            ...
dslr/
webcam/
image_list/
    amazon.txt
    dslr.txt
    webcam.txt
property num_classes
Number of classes
Office-Caltech
class common.vision.datasets.officecaltech.OfficeCaltech(root, task, download=False, **kwargs)
Office+Caltech Dataset.
Parameters
root (str) – Root directory of dataset
task (str) – The task (domain) to create dataset. Choices include 'A': amazon, 'D': dslr, 'W': webcam and 'C': caltech.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version, e.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
Note
After downloading, the following files will exist in root.
amazon/
    images/
        backpack/
            *.jpg
            ...
dslr/
webcam/
caltech/
image_list/
    amazon.txt
    dslr.txt
    webcam.txt
    caltech.txt
property num_classes
Number of classes
Office-Home
class common.vision.datasets.officehome.OfficeHome(root, task, download=False, **kwargs)
OfficeHome Dataset.
Parameters
root (str) – Root directory of dataset
task (str) – The task (domain) to create dataset. Choices include 'Ar': Art, 'Cl': Clipart, 'Pr': Product and 'Rw': Real_World.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version, e.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
Note
After downloading, the following files will exist in root.
Art/
    Alarm_Clock/*.jpg
    ...
Clipart/
Product/
Real_World/
image_list/
    Art.txt
    Clipart.txt
    Product.txt
    Real_World.txt
property num_classes
Number of classes
VisDA-2017
class common.vision.datasets.visda2017.VisDA2017(root, task, download=False, **kwargs)
VisDA-2017 Dataset.
Parameters
root (str) – Root directory of dataset
task (str) – The task (domain) to create dataset. Choices include 'Synthetic': synthetic images and 'Real': real-world images.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version, e.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
Note
After downloading, the following files will exist in root.
train/
    aeroplane/
        *.png
        ...
validation/
image_list/
    train.txt
    validation.txt
property num_classes
Number of classes
DomainNet
class common.vision.datasets.domainnet.DomainNet(root, task, split='train', download=False, **kwargs)
DomainNet dataset (cleaned version, recommended).
See Moment Matching for Multi-Source Domain Adaptation for details.
Parameters
root (str) – Root directory of dataset
task (str) – The task (domain) to create dataset. Choices include 'c': clipart, 'i': infograph, 'p': painting, 'q': quickdraw, 'r': real and 's': sketch.
split (str, optional) – The dataset split, supports train or test.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version, e.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
Note
After downloading, the following files will exist in root.
clipart/
infograph/
painting/
quickdraw/
real/
sketch/
image_list/
    clipart.txt
    ...
property num_classes
Number of classes
PACS
class common.vision.datasets.pacs.PACS(root, task, split='all', download=True, **kwargs)
PACS Dataset.
Parameters
root (str) – Root directory of dataset
task (str) – The task (domain) to create dataset. Choices include 'A': art_painting, 'C': cartoon, 'P': photo and 'S': sketch.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version, e.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
Note
After downloading, the following files will exist in root.
art_painting/
    dog/
        *.jpg
        ...
cartoon/
photo/
sketch/
image_list/
    art_painting.txt
    cartoon.txt
    photo.txt
    sketch.txt
MNIST
class common.vision.datasets.digits.MNIST(root, mode='L', split='train', download=True, **kwargs)
MNIST Dataset.
Parameters
root (str) – Root directory of dataset where MNIST/processed/training.pt and MNIST/processed/test.pt exist.
mode (str) – The channel mode for image. Choices include "L" and "RGB". Default: "L".
split (str, optional) – The dataset split, supports train or test.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version, e.g., torchvision.transforms.RandomCrop.
USPS
class common.vision.datasets.digits.USPS(root, mode='L', split='train', download=True, **kwargs)
USPS Dataset.
The data format is: [label [index:value ]*256 \n] * num_lines, where label lies in [1, 10] and the value of each pixel lies in [-1, 1]. This class transforms the label into [0, 9] and scales the pixel values into [0, 255].
Parameters
root (str) – Root directory of dataset to store USPS data files.
mode (str) – The channel mode for image. Choices include "L" and "RGB". Default: "L".
split (str, optional) – The dataset split, supports train or test.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version, e.g., torchvision.transforms.RandomCrop.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
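To make the USPS conversion concrete, here is a minimal sketch that turns one raw line into a (label, pixels) pair. The helper name parse_usps_line, the 1-based LIBSVM-style pixel indices, and the linear rescaling (v + 1) * 127.5 are assumptions; the class's actual rounding may differ.

```python
from typing import List, Tuple


def parse_usps_line(line: str, num_pixels: int = 256) -> Tuple[int, List[int]]:
    """Hypothetical parser for one 'label index:value ...' USPS line."""
    tokens = line.split()
    # Labels are stored in [1, 10]; shift them to [0, 9].
    label = int(float(tokens[0])) - 1
    pixels = [0] * num_pixels
    for token in tokens[1:]:
        index, value = token.split(":")
        v = float(value)
        # Linearly map [-1, 1] to [0, 255]; clamp in case of rounding overshoot.
        # Indices are assumed to be 1-based, as in LIBSVM-style files.
        pixels[int(index) - 1] = max(0, min(255, round((v + 1) * 127.5)))
    return label, pixels


line = "10 " + " ".join(f"{i}:-1" for i in range(1, 256)) + " 256:1"
label, pixels = parse_usps_line(line)
print(label, pixels[0], pixels[-1], len(pixels))  # 9 0 255 256
```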
SVHN
class common.vision.datasets.digits.SVHN(root, mode='L', download=True, **kwargs)
SVHN Dataset. Note: The SVHN dataset assigns the label 10 to the digit 0. However, in this Dataset, we assign the label 0 to the digit 0 to be compatible with PyTorch loss functions, which expect the class labels to be in the range [0, C-1].
Warning
This class needs scipy to load data from .mat format.
Parameters
root (str) – Root directory of dataset where directory SVHN exists.
mode (str) – The channel mode for image. Choices include "L" and "RGB". Default: "L".
split (str, optional) – The dataset split, supports train or test.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version, e.g., torchvision.transforms.RandomCrop.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
Partial Cross-Domain Classification
Partial Wrapper
common.vision.datasets.partial.partial(dataset_class, partial_classes)
Convert a dataset into its partial version.
In other words, samples that do not belong to partial_classes are discarded. However, partial does not change the label space of dataset_class.
Parameters
dataset_class (class) – Dataset class. Only subclasses of ImageList can be made partial.
partial_classes (sequence[str]) – A sequence of categories to keep in the partial dataset. Each element of partial_classes must belong to the classes list of dataset_class.
Examples:
>>> from common.vision.datasets.office31 import Office31
>>> from common.vision.datasets.partial import partial
>>> partial_classes = ['back_pack', 'bike', 'calculator', 'headphones', 'keyboard']
>>> # create a partial dataset class
>>> PartialOffice31 = partial(Office31, partial_classes)
>>> # create an instance of the partial dataset
>>> dataset = PartialOffice31(root="data/office31", task="A")
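The filtering that partial performs can be illustrated on plain (path, label) pairs. This is a simplified sketch of the documented behavior, not the library's implementation; the function name filter_partial and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple


def filter_partial(samples: List[Tuple[str, int]], classes: Sequence[str],
                   partial_classes: Sequence[str]) -> List[Tuple[str, int]]:
    """Keep only samples whose class name is in partial_classes.

    Labels are left untouched, so the label space still has len(classes) entries.
    """
    keep = set(partial_classes)
    return [(path, label) for path, label in samples if classes[label] in keep]


classes = ["back_pack", "bike", "calculator", "mug"]
samples = [("a.jpg", 0), ("b.jpg", 3), ("c.jpg", 2)]
print(filter_partial(samples, classes, ["back_pack", "calculator"]))
# [('a.jpg', 0), ('c.jpg', 2)]
```

Note that "c.jpg" keeps label 2 even though only two classes survive, which is what "does not change the label space" means in practice.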
common.vision.datasets.partial.default_partial(dataset_class)
Default partial setting used in some papers.
Parameters
dataset_class (class) – Dataset class. Currently, dataset_class must be one of Office31, OfficeHome, VisDA2017, ImageNetCaltech and CaltechImageNet.
Caltech-256->ImageNet-1k
class common.vision.datasets.partial.caltech_imagenet.CaltechImageNet(root, task, download=True, **kwargs)
Caltech-ImageNet is constructed from Caltech-256 and ImageNet-1K.
They share 84 common classes. Caltech-ImageNet keeps all classes of Caltech-256. The label is based on Caltech-256 (classes 0-255). The private classes of ImageNet-1K are discarded.
Parameters
root (str) – Root directory of dataset
task (str) – The task (domain) to create dataset. Choices include 'C': Caltech-256 and 'I': ImageNet-1K validation set.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version, e.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
Note
You need to put the train and val directories of ImageNet-1K in the root directory manually, since ImageNet-1K is no longer publicly accessible. DALIB will only download Caltech-256 and the image lists automatically. After downloading, the following files will exist in root.
train/
    n01440764/
    ...
val/
256_ObjectCategories/
    001.ak47/
    ...
image_list/
    caltech_256_list.txt
    ...
ImageNet-1k->Caltech-256
class common.vision.datasets.partial.imagenet_caltech.ImageNetCaltech(root, task, download=True, **kwargs)
ImageNet-Caltech is constructed from Caltech-256 and ImageNet-1K.
They share 84 common classes. ImageNet-Caltech keeps all classes of ImageNet-1K. The label is based on ImageNet-1K (classes 0-999). The private classes of Caltech-256 are discarded.
Parameters
root (str) – Root directory of dataset
task (str) – The task (domain) to create dataset. Choices include 'C': Caltech-256 and 'I': ImageNet-1K training set.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version, e.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
Note
You need to put the train and val directories of ImageNet-1K in the root directory manually, since ImageNet-1K is no longer publicly accessible. DALIB will only download Caltech-256 and the image lists automatically. After downloading, the following files will exist in root.
train/
    n01440764/
    ...
val/
256_ObjectCategories/
    001.ak47/
    ...
image_list/
    caltech_256_list.txt
    ...
Open Set Cross-Domain Classification
Open Set Wrapper
common.vision.datasets.openset.open_set(dataset_class, public_classes, private_classes=())
Convert a dataset into its open-set version.
In other words, samples that belong to private_classes are marked as “unknown”, while samples that belong to neither public_classes nor private_classes are discarded.
Be aware that open_set changes the label index of each category.
Parameters
dataset_class (class) – Dataset class. Only subclasses of ImageList can be converted to open-set.
public_classes (sequence[str]) – A sequence of categories to keep in the open-set dataset. Each element of public_classes must belong to the classes list of dataset_class.
private_classes (sequence[str], optional) – A sequence of categories to mark as “unknown” in the open-set dataset. Each element of private_classes must belong to the classes list of dataset_class. Default: ().
Examples:
>>> from common.vision.datasets.office31 import Office31
>>> from common.vision.datasets.openset import open_set
>>> public_classes = ['back_pack', 'bike', 'calculator', 'headphones', 'keyboard']
>>> private_classes = ['laptop_computer', 'monitor', 'mouse', 'mug', 'projector']
>>> # create an open-set dataset class which has classes
>>> # 'back_pack', 'bike', 'calculator', 'headphones', 'keyboard' and 'unknown'.
>>> OpenSetOffice31 = open_set(Office31, public_classes, private_classes)
>>> # create an instance of the open-set dataset
>>> dataset = OpenSetOffice31(root="data/office31", task="A")
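The relabeling performed by open_set can be sketched on plain (path, label) pairs: public classes get new consecutive labels, every private class collapses into a single trailing "unknown" label, and all other samples are dropped. The helper name relabel_open_set is hypothetical, and placing "unknown" last is an assumption consistent with the class list in the example.

```python
from typing import List, Sequence, Tuple


def relabel_open_set(samples: List[Tuple[str, int]], classes: Sequence[str],
                     public_classes: Sequence[str],
                     private_classes: Sequence[str] = ()) -> Tuple[List[Tuple[str, int]], List[str]]:
    """Map samples into the open-set label space public_classes + ['unknown']."""
    new_classes = list(public_classes) + ["unknown"]
    unknown = len(new_classes) - 1
    private = set(private_classes)
    relabeled = []
    for path, label in samples:
        name = classes[label]
        if name in public_classes:
            relabeled.append((path, new_classes.index(name)))
        elif name in private:
            relabeled.append((path, unknown))
        # Samples in neither list are discarded.
    return relabeled, new_classes


classes = ["back_pack", "bike", "laptop_computer", "mug", "ruler"]
samples = [("a.jpg", 1), ("b.jpg", 2), ("c.jpg", 4), ("d.jpg", 0)]
relabeled, new_classes = relabel_open_set(
    samples, classes, ["back_pack", "bike"], ["laptop_computer", "mug"])
print(relabeled, new_classes)
```

Unlike partial, the label of a kept sample can change here: its new value is its position in public_classes, not in the original classes list.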
common.vision.datasets.openset.default_open_set(dataset_class, source)
Default open-set setting used in some papers.
Parameters
dataset_class (class) – Dataset class. Currently, dataset_class must be one of Office31, OfficeHome and VisDA2017.
source (bool) – Whether the dataset is used for the source domain or not.
Cross-Domain Regression
ImageRegression
class common.vision.datasets.regression.image_regression.ImageRegression(root, factors, data_list_file, transform=None, target_transform=None)
A generic Dataset class for domain adaptation in image regression.
Parameters
root (str) – Root directory of dataset
factors (sequence[str]) – Factors selected, e.g., (‘scale’, ‘position x’, ‘position y’).
data_list_file (str) – File to read the image list from.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version, e.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
Note
In data_list_file, each line has 1+len(factors) values in the following format.
source_dir/dog_xxx.png x11, x12, ...
source_dir/cat_123.png x21, x22, ...
target_dir/dog_xxy.png x31, x32, ...
target_dir/cat_nsdf3.png x41, x42, ...
The first value is the relative path of an image, and the remaining values are the ground truth of the corresponding factors. If your data_list_file has a different format, please override ImageRegression.parse_data_file().
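A line in the format above can be split into an image path and a vector of len(factors) regression targets. The sketch below is illustrative only: the name parse_regression_line and the sample path are hypothetical, and treating commas and whitespace as interchangeable separators is an assumption about the file format.

```python
from typing import List, Sequence, Tuple


def parse_regression_line(line: str, factors: Sequence[str]) -> Tuple[str, List[float]]:
    """Split 'path v1, v2, ...' into the path and one float per factor."""
    # Treat commas as whitespace so both 'v1, v2' and 'v1 v2' are accepted.
    tokens = line.replace(",", " ").split()
    path, values = tokens[0], [float(v) for v in tokens[1:]]
    if len(values) != len(factors):
        raise ValueError(f"expected {len(factors)} values, got {len(values)}")
    return path, values


factors = ("scale", "position x", "position y")
print(parse_regression_line("color/img_0001.png 0.5, 0.25, 0.75", factors))
```

Checking the value count against factors catches a mismatch between the data list and the selected factors at load time rather than during training.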
DSprites
class common.vision.datasets.regression.dsprites.DSprites(root, task, split='train', factors=('scale', 'position x', 'position y'), download=True, target_transform=None, **kwargs)
DSprites Dataset.
Parameters
root (str) – Root directory of dataset
task (str) – The task (domain) to create dataset. Choices include 'C': Color, 'N': Noisy and 'S': Scream.
split (str, optional) – The dataset split, supports train or test.
factors (sequence[str]) – Factors selected. Default: (‘scale’, ‘position x’, ‘position y’).
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version, e.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
Note
After downloading, the following files will exist in root.
color/
    ...
noisy/
scream/
image_list/
    color_train.txt
    noisy_train.txt
    scream_train.txt
    color_test.txt
    noisy_test.txt
    scream_test.txt
MPI3D
class common.vision.datasets.regression.mpi3d.MPI3D(root, task, split='train', factors=('horizontal axis', 'vertical axis'), download=True, target_transform=None, **kwargs)
MPI3D Dataset.
Parameters
root (str) – Root directory of dataset
task (str) – The task (domain) to create dataset, corresponding to the real, realistic and toy domains listed in the note below.
split (str, optional) – The dataset split, supports train or test.
factors (sequence[str]) – Factors selected. Default: (‘horizontal axis’, ‘vertical axis’).
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version, e.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
Note
After downloading, the following files will exist in root.
real/
    ...
realistic/
toy/
image_list/
    real_train.txt
    realistic_train.txt
    toy_train.txt
    real_test.txt
    realistic_test.txt
    toy_test.txt
Cross-Domain Segmentation
SegmentationList
class common.vision.datasets.segmentation.segmentation_list.SegmentationList(root, classes, data_list_file, label_list_file, data_folder, label_folder, id_to_train_id=None, train_id_to_color=None, transforms=None)
A generic Dataset class for domain adaptation in image segmentation.
Parameters
root (str) – Root directory of dataset
classes (seq[str]) – The names of all the classes
data_list_file (str) – File to read the image list from.
label_list_file (str) – File to read the label list from.
data_folder (str) – Sub-directory of the image.
label_folder (str) – Sub-directory of the label.
mean (seq[float]) – mean BGR value. If not None, the image is normalized with it and converted to a tensor. Default: None.
id_to_train_id (dict, optional) – the map between the id on the label and the actual train id.
train_id_to_color (seq, optional) – the map between the train id and the color.
transforms (callable, optional) – A function/transform that takes in a (PIL Image, label) pair and returns a transformed version, e.g., Resize.
Note
In data_list_file, each line is the relative path of an image. If your data_list_file has a different format, please override parse_data_file().
source_dir/dog_xxx.png
target_dir/dog_xxy.png
In label_list_file, each line is the relative path of a label. If your label_list_file has a different format, please override parse_label_file().
Warning
When mean is not None, please do not provide Normalize and ToTensor in transforms.
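The id_to_train_id parameter describes a lookup from raw label ids to training ids. A minimal sketch of applying such a map to a label image is below. The helper name map_to_train_ids, the three ids in the example (Cityscapes raw ids 7, 8 and 11 for road, sidewalk and building, mapped to train ids 0, 1 and 2), and the use of 255 for unmapped pixels are assumptions for illustration.

```python
from typing import Dict, List


def map_to_train_ids(label: List[List[int]], id_to_train_id: Dict[int, int],
                     ignore_index: int = 255) -> List[List[int]]:
    """Replace every raw id with its train id; ids missing from the map become ignore_index."""
    return [[id_to_train_id.get(pixel, ignore_index) for pixel in row] for row in label]


id_to_train_id = {7: 0, 8: 1, 11: 2}  # road, sidewalk, building (Cityscapes raw ids)
raw_label = [[7, 7, 8],
             [11, 0, 8]]
print(map_to_train_ids(raw_label, id_to_train_id))
```

Sending unmapped ids to a single ignore value lets a loss function skip them, which is the usual reason a dataset distinguishes raw ids from train ids.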
decode_target(target)
Decode label (each value is an integer) into the corresponding RGB value.
Parameters
target (numpy.array) – label in shape H x W
Returns
RGB label (PIL Image) in shape H x W x 3
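decode_target amounts to a palette lookup. The sketch below shows the idea on nested lists; the function name decode_with_palette, the two-entry palette, and painting ids outside the palette (such as an ignore value of 255) with a fallback color are assumptions, and the library method returns a PIL Image rather than lists.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def decode_with_palette(target: List[List[int]], train_id_to_color: Sequence[RGB],
                        fallback: RGB = (0, 0, 0)) -> List[List[RGB]]:
    """Turn an H x W map of train ids into an H x W map of RGB tuples."""
    def color(train_id: int) -> RGB:
        # Ids outside the palette (e.g. an ignore value) get the fallback color.
        if 0 <= train_id < len(train_id_to_color):
            return train_id_to_color[train_id]
        return fallback

    return [[color(t) for t in row] for row in target]


palette = [(128, 64, 128), (244, 35, 232)]  # Cityscapes road and sidewalk colors
print(decode_with_palette([[0, 1], [255, 0]], palette))
```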
property evaluate_classes
The names of classes to be evaluated
property ignore_classes
The names of classes to be ignored
property num_classes
Number of classes
parse_data_file(file_name)
Parse file to image list
Parameters
file_name (str) – The path of data file
Returns
List of image paths
Cityscapes
class common.vision.datasets.segmentation.cityscapes.Cityscapes(root, split='train', data_folder='leftImg8bit', label_folder='gtFine', **kwargs)
Cityscapes is a real-world semantic segmentation dataset collected in driving scenarios.
Parameters
root (str) – Root directory of dataset
split (str, optional) – The dataset split, supports train or val.
data_folder (str, optional) – Sub-directory of the image. Default: ‘leftImg8bit’.
label_folder (str, optional) – Sub-directory of the label. Default: ‘gtFine’.
mean (seq[float]) – mean BGR value. Normalize the image if not None. Default: None.
transforms (callable, optional) – A function/transform that takes in a (PIL image, label) pair and returns a transformed version, e.g., Resize.
Note
You need to download Cityscapes manually. Make sure the following directories exist in the root directory before using this class.
leftImg8bit/
    train/
    val/
    test/
gtFine/
    train/
    val/
    test/
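Because Cityscapes must be downloaded manually, it can help to check the expected layout before constructing the dataset. The helper missing_directories is a hypothetical sketch built only from the directory list in the note; it is not part of the library, and the root path is a placeholder.

```python
import os
from typing import List, Sequence

# Sub-directories of root listed in the Cityscapes note, e.g. "leftImg8bit/train".
EXPECTED_DIRS = tuple(
    os.path.join(top, split)
    for top in ("leftImg8bit", "gtFine")
    for split in ("train", "val", "test")
)


def missing_directories(root: str, expected: Sequence[str] = EXPECTED_DIRS) -> List[str]:
    """Return the expected sub-directories that do not exist under root."""
    return [d for d in expected if not os.path.isdir(os.path.join(root, d))]


missing = missing_directories("data/cityscapes")  # hypothetical root path
if missing:
    print("Missing under root:", ", ".join(missing))
```

The same check can be reused for the other manually downloaded datasets in this section by passing a different expected list.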
GTA5
class common.vision.datasets.segmentation.gta5.GTA5(root, split='train', data_folder='images', label_folder='labels', **kwargs)
GTA5 Dataset.
Parameters
root (str) – Root directory of dataset
split (str, optional) – The dataset split, supports only train.
data_folder (str, optional) – Sub-directory of the image. Default: ‘images’.
label_folder (str, optional) – Sub-directory of the label. Default: ‘labels’.
mean (seq[float]) – mean BGR value. Normalize the image if not None. Default: None.
transforms (callable, optional) – A function/transform that takes in a (PIL image, label) pair and returns a transformed version, e.g., Resize.
Note
You need to download GTA5 manually. Make sure the following directories exist in the root directory before using this class.
images/
labels/
Synthia
class common.vision.datasets.segmentation.synthia.Synthia(root, split='train', data_folder='RGB', label_folder='synthia_mapped_to_cityscapes', **kwargs)
Synthia Dataset.
Parameters
root (str) – Root directory of dataset
split (str, optional) – The dataset split, supports only train.
data_folder (str, optional) – Sub-directory of the image. Default: ‘RGB’.
label_folder (str, optional) – Sub-directory of the label. Default: ‘synthia_mapped_to_cityscapes’.
mean (seq[float]) – mean BGR value. Normalize the image if not None. Default: None.
transforms (callable, optional) – A function/transform that takes in a (PIL image, label) pair and returns a transformed version, e.g., Resize.
Note
You need to download Synthia manually. Make sure the following directories exist in the root directory before using this class.
RGB/
synthia_mapped_to_cityscapes/
Foggy Cityscapes
class common.vision.datasets.segmentation.cityscapes.FoggyCityscapes(root, split='train', data_folder='leftImg8bit_foggy', label_folder='gtFine', beta=0.02, **kwargs)
Foggy Cityscapes is a semantic segmentation dataset of foggy driving scenes, created by adding synthetic fog to the real-world images of Cityscapes.
Parameters
root (str) – Root directory of dataset
split (str, optional) – The dataset split, supports train or val.
data_folder (str, optional) – Sub-directory of the image. Default: ‘leftImg8bit_foggy’.
label_folder (str, optional) – Sub-directory of the label. Default: ‘gtFine’.
beta (float, optional) – The fog density parameter (attenuation coefficient). Choices include 0.005, 0.01 and 0.02. Default: 0.02
mean (seq[float]) – mean BGR value. Normalize the image if not None. Default: None.
transforms (callable, optional) – A function/transform that takes in a (PIL image, label) pair and returns a transformed version, e.g., Resize.
Note
You need to download Cityscapes manually. Make sure the following directories exist in the root directory before using this class.
leftImg8bit_foggy/
    train/
    val/
    test/
gtFine/
    train/
    val/
    test/
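The role of beta can be illustrated with the standard optical model of fog used to synthesize Foggy Cityscapes: I(x) = R(x) t(x) + L (1 - t(x)), with transmittance t(x) = exp(-beta * l(x)), where R is the clear image, L the atmospheric light and l(x) the distance to the camera in meters. The released images already contain the fog for each beta, so this per-pixel sketch only illustrates the model; the function name add_fog and the atmospheric light value of 0.9 are assumptions.

```python
import math


def add_fog(clear: float, distance_m: float, beta: float, atmospheric_light: float = 0.9) -> float:
    """Apply the optical fog model to one pixel intensity in [0, 1].

    transmittance t = exp(-beta * distance); foggy = clear * t + light * (1 - t)
    """
    t = math.exp(-beta * distance_m)
    return clear * t + atmospheric_light * (1.0 - t)


for beta in (0.005, 0.01, 0.02):
    # A dark pixel 100 m away fades toward the atmospheric light as beta grows.
    print(beta, round(add_fog(0.1, 100.0, beta), 3))
```

Larger beta values therefore mean denser fog, with distant pixels affected far more than nearby ones.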
Cross-Domain Keypoint Detection
Dataset Base for Keypoint Detection
class common.vision.datasets.keypoint_detection.keypoint_dataset.KeypointDataset(root, num_keypoints, samples, transforms=None, image_size=(256, 256), heatmap_size=(64, 64), sigma=2, keypoints_group=None, colored_skeleton=None)
A generic dataset class for image keypoint detection.
Parameters
root (str) – Root directory of dataset
num_keypoints (int) – Number of keypoints
samples (list) – List of data samples
transforms (callable, optional) – A function/transform that takes in a dict (which contains a PIL image and its labels) and returns a transformed version, e.g., Resize.
image_size (tuple) – (width, height) of the image. Default: (256, 256)
heatmap_size (tuple) – (width, height) of the heatmap. Default: (64, 64)
sigma (int) – sigma parameter used when generating the heatmap. Default: 2
keypoints_group (dict) – A dict that stores the indices of different types of keypoints
colored_skeleton (dict) – A dict that stores the indices and colors of different skeleton parts
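To make image_size, heatmap_size and sigma concrete, the sketch below renders the target heatmap for one keypoint: the keypoint is rescaled from image to heatmap coordinates and an unnormalized 2D Gaussian with standard deviation sigma (in heatmap pixels) is centered on it. This follows the common heatmap-regression recipe; the function name gaussian_heatmap and the exact scaling are assumptions, not the library's code.

```python
import math
from typing import List, Tuple


def gaussian_heatmap(keypoint: Tuple[float, float],
                     image_size: Tuple[int, int] = (256, 256),
                     heatmap_size: Tuple[int, int] = (64, 64),
                     sigma: float = 2.0) -> List[List[float]]:
    """Return a grid of heatmap_size[1] rows by heatmap_size[0] columns for one keypoint.

    keypoint is (x, y) in image pixels; sizes are (width, height) as in the docs.
    """
    hm_w, hm_h = heatmap_size
    # Rescale the keypoint from image coordinates to heatmap coordinates.
    cx = keypoint[0] * hm_w / image_size[0]
    cy = keypoint[1] * hm_h / image_size[1]
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2)) for x in range(hm_w)]
        for y in range(hm_h)
    ]


heatmap = gaussian_heatmap((128.0, 64.0))
peak = max((v, x, y) for y, row in enumerate(heatmap) for x, v in enumerate(row))
print(peak)  # peak value 1.0 at heatmap column 32, row 16
```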
group_accuracy(accuracies)
Group the accuracy of K keypoints into different kinds.
Parameters
accuracies (list) – accuracy of the K keypoints
Returns
accuracy of N=len(keypoints_group) kinds of keypoints
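group_accuracy reduces K per-keypoint accuracies to one number per keypoint group. A minimal standalone sketch, assuming keypoints_group maps a group name to a list of keypoint indices and that each group's accuracy is the plain mean of its members; the name group_mean_accuracy and the 5-keypoint grouping are hypothetical, and the library may order or weight groups differently.

```python
from typing import Dict, List, Sequence


def group_mean_accuracy(accuracies: Sequence[float],
                        keypoints_group: Dict[str, List[int]]) -> Dict[str, float]:
    """Average the accuracies of the keypoints that belong to each group."""
    return {
        name: sum(accuracies[i] for i in indices) / len(indices)
        for name, indices in keypoints_group.items()
    }


# Hypothetical 5-keypoint hand: one wrist keypoint and four fingertip keypoints.
groups = {"wrist": [0], "tip": [1, 2, 3, 4]}
print(group_mean_accuracy([0.9, 0.5, 0.7, 0.6, 0.8], groups))
```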
visualize(image, keypoints, filename)
Visualize an image with its keypoints, and store the result in a file.
Parameters
image (PIL.Image) – The image to visualize
keypoints (torch.Tensor) – keypoints in shape K x 2
filename (str) – the name of the file to store the result
Rendered Handpose Dataset
class common.vision.datasets.keypoint_detection.rendered_hand_pose.RenderedHandPose(root, split='train', task='all', download=True, **kwargs)
Parameters
root (str) – Root directory of dataset
split (str, optional) – The dataset split, supports train, test, or all.
task (str, optional) – Placeholder.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transforms (callable, optional) – A function/transform that takes in a dict (which contains a PIL image and its labels) and returns a transformed version, e.g., Resize.
image_size (tuple) – (width, height) of the image. Default: (256, 256)
heatmap_size (tuple) – (width, height) of the heatmap. Default: (64, 64)
sigma (int) – sigma parameter used when generating the heatmap. Default: 2
Note
After downloading, the following files will exist in root.
RHD_published_v2/
    training/
    evaluation/
Hand-3d-Studio Dataset
class common.vision.datasets.keypoint_detection.hand_3d_studio.Hand3DStudio(root, split='train', task='noobject', download=True, **kwargs)
Parameters
root (str) – Root directory of dataset
split (str, optional) – The dataset split, supports train, test, or all.
task (str, optional) – The task to create dataset. Choices include 'noobject': only hands without objects, 'object': only hands interacting with objects, and 'all': all hands. Default: ‘noobject’.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transforms (callable, optional) – A function/transform that takes in a dict (which contains a PIL image and its labels) and returns a transformed version, e.g., Resize.
image_size (tuple) – (width, height) of the image. Default: (256, 256)
heatmap_size (tuple) – (width, height) of the heatmap. Default: (64, 64)
sigma (int) – sigma parameter used when generating the heatmap. Default: 2
Note
We found that the original H3D images are high-resolution and mostly background, so we crop each image to keep only the area surrounding the hands (1.5x larger than the hands) to speed up training.
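The cropping described above can be sketched as computing the bounding box of the hand keypoints and enlarging it 1.5x around its center. The note does not say whether the crop is square or how image borders are handled; this sketch assumes a square box clipped to the image bounds, and the helper name enlarged_square_box is hypothetical.

```python
from typing import Sequence, Tuple


def enlarged_square_box(keypoints: Sequence[Tuple[float, float]], image_w: int, image_h: int,
                        scale: float = 1.5) -> Tuple[float, float, float, float]:
    """Return (left, top, right, bottom) of a square box scale times the keypoint extent."""
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    cx, cy = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
    # Use the longer side so the whole hand fits, then enlarge it by `scale`.
    half = max(max(xs) - min(xs), max(ys) - min(ys)) * scale / 2
    left, top = max(0.0, cx - half), max(0.0, cy - half)
    right, bottom = min(float(image_w), cx + half), min(float(image_h), cy + half)
    return left, top, right, bottom


print(enlarged_square_box([(100, 200), (140, 260)], image_w=1920, image_h=1080))
# (75.0, 185.0, 165.0, 275.0)
```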
Note
After downloading, the following files will exist in root.
H3D_crop/
    annotation.json
    part1/
    part2/
    part3/
    part4/
    part5/
FreiHAND Dataset
class common.vision.datasets.keypoint_detection.freihand.FreiHand(root, split='train', task='all', download=True, **kwargs)
Parameters
root (str) – Root directory of dataset
split (str, optional) – The dataset split, supports train, test, or all.
task (str, optional) – The post-processing option to create dataset. Choices include 'gs': green screen recording, 'auto': automatic colorization without sample points (automatic color hallucination), 'sample': automatic colorization with sample points, 'hom': homogenized, and 'all': all hands. Default: ‘all’.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transforms (callable, optional) – A function/transform that takes in a dict (which contains a PIL image and its labels) and returns a transformed version, e.g., Resize.
image_size (tuple) – (width, height) of the image. Default: (256, 256)
heatmap_size (tuple) – (width, height) of the heatmap. Default: (64, 64)
sigma (int) – sigma parameter used when generating the heatmap. Default: 2
Note
After downloading, the following files will exist in root.
*.json
training/
evaluation/
Surreal Dataset
class common.vision.datasets.keypoint_detection.surreal.SURREAL(root, split='train', task='all', download=True, **kwargs)
Parameters
root (str) – Root directory of dataset
split (str, optional) – The dataset split, supports train, test, or all. Default: train.
task (str, optional) – Placeholder.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transforms (callable, optional) – A function/transform that takes in a dict (which contains a PIL image and its labels) and returns a transformed version, e.g., Resize.
image_size (tuple) – (width, height) of the image. Default: (256, 256)
heatmap_size (tuple) – (width, height) of the heatmap. Default: (64, 64)
sigma (int) – sigma parameter used when generating the heatmap. Default: 2
Note
We found that the original Surreal images are high-resolution and mostly background, so we crop each image to keep only the area surrounding the person (1.5x larger than the person) to speed up training.
Note
After downloading, the following files will exist in root.
train/
test/
val/
LSP Dataset
class common.vision.datasets.keypoint_detection.lsp.LSP(root, split='train', task='all', download=True, image_size=(256, 256), transforms=None, **kwargs)
Parameters
root (str) – Root directory of dataset
split (str, optional) – Placeholder.
task (str, optional) – Placeholder.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transforms (callable, optional) – Placeholder.
heatmap_size (tuple) – (width, height) of the heatmap. Default: (64, 64)
sigma (int) – sigma parameter used when generating the heatmap. Default: 2
Note
After downloading, the following files will exist in root.
lsp/
    images/
    joints.mat
Note
LSP is only used as the target domain. Due to the small dataset size, the whole dataset is used regardless of split. Also, the transform is fixed.
Human3.6M Dataset
class common.vision.datasets.keypoint_detection.human36m.Human36M(root, split='train', task='all', download=True, **kwargs)
Parameters
root (str) – Root directory of dataset
split (str, optional) – The dataset split, supports
train
,test
, orall
. Default:train
.task (str, optional) – Placeholder.
download (bool, optional) – Placeholder.
transforms (callable, optional) – A function/transform that takes in a dict (which contains PIL image and its labels) and returns a transformed version. E.g,
Resize
.image_size (tuple) – (width, height) of the image. Default: (256, 256)
heatmap_size (tuple) – (width, height) of the heatmap. Default: (64, 64)
sigma (int) – Sigma parameter used to generate the heatmap. Default: 2
Note
Human3.6M must be downloaded manually. Before using this class, make sure the following files exist in the root directory:
annotations/
    Human36M_subject11_joint_3d.json
    ...
images/
Note
The original Human3.6M images are high-resolution, but most of each image is background. We therefore crop each image to keep only the area surrounding the person (1.5x larger than the person) to speed up training. After cropping, root will contain the following files:
Human36M_crop/ annotations/ keypoints2d_11.json ...
Cross-Domain ReID
Market1501
class common.vision.datasets.reid.market1501.Market1501(root, verbose=True)
Market1501 dataset from Scalable Person Re-identification: A Benchmark (ICCV 2015).
- Dataset statistics:
identities: 1501 (+1 for background)
images: 12936 (train) + 3368 (query) + 15913 (gallery)
cameras: 6
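Market-1501 image file names are commonly understood to encode the person ID and the camera. For example, 0002_c1s1_000451_03.jpg is person 2, camera 1, and -1 marks junk images. That convention is not documented on this page, so the parser below is a hypothetical sketch rather than the class's actual loading code. The function name parse_market1501_name is an assumption.

```python
import re
from typing import Tuple

# Assumed naming convention: "<pid>_c<cam>s<seq>_<frame>_<bbox>.jpg".
_NAME_PATTERN = re.compile(r"(-?\d+)_c(\d)")
_NUM_CAMERAS = 6  # from the dataset statistics above


def parse_market1501_name(filename: str) -> Tuple[int, int]:
    """Return (person_id, camera_id) with camera ids shifted to start at 0."""
    match = _NAME_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    pid, cam = int(match.group(1)), int(match.group(2))
    if not 1 <= cam <= _NUM_CAMERAS:
        raise ValueError(f"camera id out of range: {cam}")
    return pid, cam - 1


pid, camid = parse_market1501_name("0002_c1s1_000451_03.jpg")
```

Zero-based camera ids are a convenience for indexing and are not a requirement of the dataset.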
- Parameters
DukeMTMC-reID
class common.vision.datasets.reid.dukemtmc.DukeMTMC(root, verbose=True)
DukeMTMC-reID dataset from Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking (ECCV 2016).
- Dataset statistics:
identities: 1404 (train + query)
images: 16522 (train) + 2228 (query) + 17661 (gallery)
cameras: 8
- Parameters
MSMT17
class common.vision.datasets.reid.msmt17.MSMT17(root, verbose=True)
MSMT17 dataset from Person Transfer GAN to Bridge Domain Gap for Person Re-Identification (CVPR 2018).
- Dataset statistics:
identities: 4101
images: 32621 (train) + 11659 (query) + 82161 (gallery)
cameras: 15
- Parameters
Natural Object Recognition
Stanford Dogs
class common.vision.datasets.stanford_dogs.StanfordDogs(root, split, sample_rate=100, download=False, **kwargs)
The Stanford Dogs dataset contains 20,580 images of 120 dog breeds from around the world. Each category has exactly 100 training examples and around 72 testing examples.
- Parameters
root (str) – Root directory of dataset
split (str) – The dataset split, supports train or test.
sample_rate (int) – The sampling rate, in percent, used to randomly sample training images from each category. Choices include 100, 50, 30, 15. Default: 100.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
Note
After downloading, root will contain the following files:
train/
test/
image_list/
    train_100.txt
    train_50.txt
    train_30.txt
    train_15.txt
    test.txt
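The image_list/train_{100,50,30,15}.txt files appear to match the sample_rate choices. The sketch below shows how a per-category subset could be drawn. It assumes each file uses the "<relative_path> <label>" line format described for ImageList, and that sample_rate is a percentage. The function names are hypothetical. The library presumably reads the pre-generated list files rather than sampling at load time.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, int]


def parse_image_list(lines: List[str]) -> List[Sample]:
    """Parse '<relative_path> <label>' lines, skipping blank ones."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last whitespace so paths containing spaces survive.
        path, label = line.rsplit(maxsplit=1)
        samples.append((path, int(label)))
    return samples


def subsample_per_class(samples: List[Sample], sample_rate: int,
                        seed: int = 0) -> List[Sample]:
    """Randomly keep about sample_rate percent of each class (at least one)."""
    by_class: Dict[int, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_class[sample[1]].append(sample)
    rng = random.Random(seed)  # fixed seed, so the subset is reproducible
    subset: List[Sample] = []
    for label in sorted(by_class):
        items = by_class[label]
        k = max(1, round(len(items) * sample_rate / 100))
        subset.extend(rng.sample(items, k))
    return subset


lines = [f"train/img_{i}.jpg {i % 2}" for i in range(20)]
half = subsample_per_class(parse_image_list(lines), sample_rate=50)
```

Sampling within each class, rather than over the whole list, keeps the class balance of the full training set.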
Stanford Cars
class common.vision.datasets.stanford_cars.StanfordCars(root, split, sample_rate=100, download=False, **kwargs)
The Stanford Cars dataset contains 16,185 images of 196 car classes. The images in each class are split roughly 50-50 between training and testing, giving 8,144 training images and 8,041 testing images.
- Parameters
root (str) – Root directory of dataset
split (str) – The dataset split, supports train or test.
sample_rate (int) – The sampling rate, in percent, used to randomly sample training images from each category. Choices include 100, 50, 30, 15. Default: 100.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
Note
After downloading, root will contain the following files:
train/
test/
image_list/
    train_100.txt
    train_50.txt
    train_30.txt
    train_15.txt
    test.txt
CUB-200-2011
class common.vision.datasets.cub200.CUB200(root, split, sample_rate=100, download=False, **kwargs)
Caltech-UCSD Birds-200-2011 is a dataset for fine-grained visual recognition with 11,788 images of 200 bird species. It is an extended version of the CUB-200 dataset, with roughly twice as many images.
- Parameters
root (str) – Root directory of dataset
split (str) – The dataset split, supports train or test.
sample_rate (int) – The sampling rate, in percent, used to randomly sample training images from each category. Choices include 100, 50, 30, 15. Default: 100.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
Note
After downloading, root will contain the following files:
train/
test/
image_list/
    train_100.txt
    train_50.txt
    train_30.txt
    train_15.txt
    test.txt
FGVC Aircraft
class common.vision.datasets.aircrafts.Aircraft(root, split, sample_rate=100, download=False, **kwargs)
FGVC-Aircraft is a benchmark for fine-grained visual categorization of aircraft. The dataset contains 10,200 aircraft images, with 100 images for each of 102 aircraft variants.
- Parameters
root (str) – Root directory of dataset
split (str) – The dataset split, supports train or test.
sample_rate (int) – The sampling rate, in percent, used to randomly sample training images from each category. Choices include 100, 50, 30, 15. Default: 100.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
Note
After downloading, root will contain the following files:
train/
test/
image_list/
    train_100.txt
    train_50.txt
    train_30.txt
    train_15.txt
    test.txt
Oxford-IIIT Pet
class common.vision.datasets.oxfordpet.OxfordIIITPet(root, split, sample_rate=100, download=False, **kwargs)
The Oxford-IIIT Pet dataset is a 37-category pet dataset with roughly 200 images per class.
- Parameters
root (str) – Root directory of dataset
split (str) – The dataset split, supports train or test.
sample_rate (int) – The sampling rate, in percent, used to randomly sample training images from each category. Choices include 100, 50, 30, 15. Default: 100.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
Note
After downloading, root will contain the following files:
train/
test/
image_list/
    train_100.txt
    train_50.txt
    train_30.txt
    train_15.txt
    test.txt
COCO-70
class common.vision.datasets.coco70.COCO70(root, split, sample_rate=100, download=False, **kwargs)
COCO-70 is a large-scale classification dataset (1,000 images per class) created from the COCO dataset. It is used to explore the effect of fine-tuning with a large amount of data.
- Parameters
root (str) – Root directory of dataset
split (str) – The dataset split, supports train or test.
sample_rate (int) – The sampling rate, in percent, used to randomly sample training images from each category. Choices include 100, 50, 30, 15. Default: 100.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
Note
After downloading, root will contain the following files:
train/
test/
image_list/
    train_100.txt
    train_50.txt
    train_30.txt
    train_15.txt
    test.txt
DTD
class common.vision.datasets.dtd.DTD(root, split, download=False, **kwargs)
The Describable Textures Dataset (DTD) is an evolving collection of textural images in the wild, annotated with a series of human-centric attributes inspired by the perceptual properties of textures. The task consists of classifying images of textural patterns (47 classes, with 120 images each). Example textures include banded, bubbly, meshed, lined, and porous. Image sizes range between 300x300 and 640x640 pixels.
- Parameters
root (str) – Root directory of dataset
split (str) – The dataset split, supports train or test.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
OxfordFlowers102
class common.vision.datasets.oxfordflowers.OxfordFlowers102(root, split='train', download=False, **kwargs)
The Oxford Flowers 102 dataset consists of 102 flower categories commonly occurring in the United Kingdom. Each class has between 40 and 258 images. The images vary widely in scale, pose, and lighting. In addition, some categories show large variation within the category, and several categories are very similar to each other. The dataset is divided into a training set, a validation set, and a test set. The training and validation sets each consist of 10 images per class (1,020 images each). The test set consists of the remaining 6,149 images (at least 20 per class).
- Parameters
root (str) – Root directory of dataset
split (str, optional) – The dataset split, supports train or test.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
Specialized Image Classification
PatchCamelyon
class common.vision.datasets.patchcamelyon.PatchCamelyon(root, split, download=False, **kwargs)
The PatchCamelyon dataset contains 327,680 images of histopathologic scans of lymph node sections. The classification task consists of predicting the presence of metastatic tissue in a given image (i.e., two classes). All images are 96x96 pixels.
- Parameters
root (str) – Root directory of dataset
split (str) – The dataset split, supports train or test.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
Retinopathy
class common.vision.datasets.retinopathy.Retinopathy(root, split, download=False, **kwargs)
The Retinopathy dataset consists of image-label pairs of high-resolution retina images. The labels indicate the presence of Diabetic Retinopathy (DR) on a 0-4 scale (No DR, Mild, Moderate, Severe, or Proliferative DR).
Note
You need to download the source data manually into the root directory.
- Parameters
root (str) – Root directory of dataset
split (str) – The dataset split, supports train or test.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
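The Retinopathy labels on the 0-4 scale correspond, in order, to the five severity levels listed in the description. The lookup below is a minimal sketch. It assumes that label k is the k-th name in that list. The tuple DR_GRADES and the function dr_grade_name are hypothetical and not part of the library.

```python
from typing import Tuple

# Assumed order: label 0 -> "No DR", ..., label 4 -> "Proliferative DR".
DR_GRADES: Tuple[str, ...] = ("No DR", "Mild", "Moderate", "Severe", "Proliferative DR")


def dr_grade_name(label: int) -> str:
    """Return the severity name for a 0-4 Retinopathy label."""
    if not 0 <= label < len(DR_GRADES):
        raise ValueError(f"label must be in [0, {len(DR_GRADES) - 1}], got {label}")
    return DR_GRADES[label]
```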
EuroSAT
class common.vision.datasets.eurosat.EuroSAT(root, split='train', download=False, **kwargs)
The EuroSAT task consists of classifying Sentinel-2 satellite images into 10 different types of land use (Residential, Industrial, River, Highway, etc.). The spatial resolution is 10 meters per pixel, and images are 64x64 pixels.
- Parameters
root (str) – Root directory of dataset
split (str, optional) – The dataset split, supports train or test.
download (bool, optional) – If true, downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
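A quick check of the EuroSAT figures: combining the stated 10 meters per pixel with the 64x64 image size gives the ground area covered by one image.

```latex
64\,\mathrm{px} \times 10\,\mathrm{m/px} = 640\,\mathrm{m},
\qquad
(640\,\mathrm{m})^2 = 409{,}600\,\mathrm{m}^2 \approx 0.41\,\mathrm{km}^2
```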
Resisc45
class common.vision.datasets.resisc45.Resisc45(root, split='train', download=False, **kwargs)
The Resisc45 dataset is a scene classification task on remote sensing images. There are 45 classes with 700 images each, including tennis court, ship, island, lake, parking lot, sparse residential, and stadium. Images are 256x256 RGB.
Note
You need to download the source data manually into the root directory.
- Parameters
root (str) – Root directory of dataset
split (str, optional) – The dataset split, supports train or test.
transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., torchvision.transforms.RandomCrop.
target_transform (callable, optional) – A function/transform that takes in the target and transforms it.
property num_classes
Number of classes
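The num_classes property presumably returns the length of the dataset's class list, since the ImageList constructor takes a classes argument. The stand-in below is a minimal sketch of that pattern. TinyImageList is hypothetical and not part of the library.

```python
from typing import List, Sequence, Tuple


class TinyImageList:
    """Hypothetical minimal dataset holding class names and (path, label) pairs."""

    def __init__(self, classes: Sequence[str], samples: List[Tuple[str, int]]):
        self.classes = list(classes)
        self.samples = samples

    @property
    def num_classes(self) -> int:
        """Number of classes, derived from the class-name list."""
        return len(self.classes)

    def __len__(self) -> int:
        return len(self.samples)


ds = TinyImageList(["tennis_court", "ship", "island"], [("a.jpg", 0), ("b.jpg", 2)])
```

Under this assumption, a Resisc45 instance would report 45.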