NICO: Non-I.I.D. Image dataset with Contexts

News (2022.04.18): An extended version of NICO, named NICO++, has been released at this url (paper: this url). It is much larger in scale and has much richer contexts and categories.

Introduction

The NICO dataset is designed specifically for Non-I.I.D. or OOD (Out-of-Distribution) image classification. It simulates a real-world setting in which the testing distribution may shift arbitrarily from the training distribution, violating the I.I.D. hypothesis that most ML methods rely on. The research directions the dataset supports include, but are not limited to, transfer learning or domain adaptation (when the testing distribution is known) and stable learning or domain generalization (when the testing distribution is unknown).

The basic idea behind the dataset is to label each image with both its main concept (e.g. dog) and the context (e.g. on grass) in which that concept appears. By adjusting the proportions of different contexts in the training and testing data, one can flexibly control the degree of distribution shift and study different kinds of Non-I.I.D. settings.
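As an illustration of adjusting context proportions, the sketch below splits a list of labeled samples so that each (class, context) group sends a chosen fraction of its images to training and the rest to testing. It is a minimal example, not the procedure used in the NICO paper: the function name `split_by_context`, the `(path, label, context)` tuple format, and the ratio values are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_by_context(samples, train_ratio_by_context, default_ratio=0.5, seed=0):
    """Split samples into train/test with a per-(label, context) training fraction.

    samples: list of (path, label, context) tuples (hypothetical format).
    train_ratio_by_context: dict mapping (label, context) to the fraction of
        that group's images assigned to training; other groups use default_ratio.
    """
    rng = random.Random(seed)

    # Group samples by (label, context) so each group can be split separately.
    groups = defaultdict(list)
    for sample in samples:
        _, label, context = sample
        groups[(label, context)].append(sample)

    train, test = [], []
    for key in sorted(groups):  # sorted for a reproducible iteration order
        items = list(groups[key])
        rng.shuffle(items)
        ratio = train_ratio_by_context.get(key, default_ratio)
        cut = round(len(items) * ratio)
        train.extend(items[:cut])
        test.extend(items[cut:])
    return train, test


# Hypothetical usage: "dog on grass" dominates training, "dog in water" dominates testing.
demo = [(f"grass_{i}.jpg", "dog", "on grass") for i in range(10)]
demo += [(f"water_{i}.jpg", "dog", "in water") for i in range(10)]
ratios = {("dog", "on grass"): 0.9, ("dog", "in water"): 0.1}
demo_train, demo_test = split_by_context(demo, ratios)
```

Moving the two ratios closer together (e.g. both 0.5) gives training and testing sets with matching context proportions, while pushing them apart increases the shift between them.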

Samples with contexts in NICO.

Currently there are two superclasses, Animal and Vehicle, with 10 classes for Animal and 9 classes for Vehicle. Each class has 9 or 10 contexts. The average number of images per context ranges from 83 to 215, and the average number of images per class is about 1300 (similar to ImageNet). In total, NICO contains 19 classes, 188 contexts and nearly 25,000 images. The current version is large enough to support training deep convolutional networks (e.g. ResNet-18) from scratch. The dataset is still growing, and its hierarchical structure makes it easy to expand.
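To check statistics like these against a local copy, one could count images per context by walking the directory tree. The sketch below assumes the data is laid out as `root/superclass/class/context/image`; that layout, the function name `count_images`, and the set of image suffixes are assumptions, so adjust them to match the actual download.

```python
from collections import Counter
from pathlib import Path

# Assumed set of image file extensions; extend if the download uses others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(root):
    """Count images per (superclass, class, context).

    Assumes a root/superclass/class/context/image layout (hypothetical);
    files at any other depth are ignored.
    """
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            parts = path.relative_to(root).parts
            if len(parts) == 4:  # superclass / class / context / file name
                counts[parts[:3]] += 1
    return counts


def images_per_class(counts):
    """Aggregate per-context counts into per-(superclass, class) totals."""
    totals = Counter()
    for (superclass, cls, _context), n in counts.items():
        totals[(superclass, cls)] += n
    return totals
```

Restricting `root` to a single superclass directory would not work with this depth check as written; to analyze one superclass, filter the returned counter on its first key element instead.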


Download

The NICO dataset is available at Dropbox or Baidu Cloud (Password: rpds). You can use the whole dataset or only part of it (e.g. one superclass), as needed, and split it in your own way.

Attention

NICO is released to attract more attention to, and promote research on, intrinsic learning mechanisms that may light the way to a third wave of AI featuring strong generalization ability. Although preprocessing tricks such as saliency detection or object segmentation can improve classification performance to some extent, they are discouraged when using NICO as a benchmark.

Citing

More details about the NICO dataset and the various Non-I.I.D. settings created with it can be found in the paper Towards Non-I.I.D. Image Classification: A Dataset and Baselines. Anyone who uses NICO in their experiments should cite this article in their publications and reports, using the citation template below.

@article{he2021towards,
    title={Towards non-iid image classification: A dataset and baselines},
    author={He, Yue and Shen, Zheyan and Cui, Peng},
    journal={Pattern Recognition},
    volume={110},
    pages={107383},
    year={2021},
    publisher={Elsevier}
}

Copyright

Please note that the NICO dataset does not own the copyright of the images. Access to NICO is granted only to researchers and educators who wish to use the images for non-commercial research and/or educational purposes.