There are three challenge tracks. In the image compression track, images must be compressed to 0.075 bpp, 0.15 bpp, and 0.3 bpp (bits per pixel). In the video compression track, short video clips must be compressed to below 1 Mbit/s. Finally, in the perceptual metric track, human preferences on pairs of images will have to be predicted. The image pairs will come from the decoders submitted to the image compression track.
For the image compression track, contestants will be asked to compress the entire dataset to three different bit-rates, namely 0.075 bpp, 0.15 bpp, and 0.3 bpp. The winners of the competition will be chosen based on a human perceptual rating task in which pairs of decoded images (from different codecs) are presented to the user. For guidance, several objective metrics will be shown on the leaderboard but not considered for prizes.
We provide training and validation sets of high quality images collected from Unsplash. The test set will contain images of similar quality from potentially different sources. Training on additional data is allowed. Participants will need to submit a decoder and encoded image files. The test dataset will be released after the validation phase ends.
The challenge data is hosted by the Computer Vision Lab of ETH Zurich and can be downloaded here:
The total size of all compressed validation images must not exceed 857,362 bytes (0.075 bpp), 1,714,724 bytes (0.15 bpp), and 3,429,448 bytes (0.3 bpp), respectively. This year we are further reducing the allowed model size, which must not exceed 250 MB.
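The byte budgets above follow directly from the bpp targets: a budget is the total pixel count times the bits-per-pixel target, divided by 8. The sketch below illustrates the arithmetic; the pixel total used in the example is hypothetical, and the official limits are the byte counts quoted above.

```python
def byte_budget(total_pixels: int, bpp: float) -> int:
    """Maximum compressed size in bytes for a given bits-per-pixel target."""
    return int(total_pixels * bpp / 8)

# Example with a hypothetical validation set totaling 90,000,000 pixels:
for bpp in (0.075, 0.15, 0.3):
    print(bpp, byte_budget(90_000_000, bpp))
```

Note that doubling the bpp target doubles the byte budget, which matches the three limits listed above.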
The video compression track will require entrants to compress 2-second video clips at 720p resolution (60 frames at 30 fps). Instead of splitting the dataset into training and test sets, in this track the entire dataset is released before the test phase.
To discourage overfitting, the model size is added to the compressed dataset size, and the sum cannot exceed the budget implied by the target bit-rate of 1 Mbit/s. That is, participants should try to minimize both the dataset size and the model size. The winner will be determined based on MS-SSIM.
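Under this reading of the rule, the total budget is the bit-rate times the clip duration, summed over all clips, and the decoder model counts against the same budget. A minimal sketch, assuming a 2-second duration per clip and a hypothetical clip count:

```python
def video_budget_bytes(num_clips: int,
                       seconds_per_clip: float = 2.0,
                       rate_bits_per_s: float = 1e6) -> float:
    """Total byte budget: bit-rate x duration, summed over all clips."""
    return num_clips * seconds_per_clip * rate_bits_per_s / 8


def within_budget(dataset_bytes: int, model_bytes: int, num_clips: int) -> bool:
    """Check the rule that dataset size plus model size stays within budget."""
    return dataset_bytes + model_bytes <= video_budget_bytes(num_clips)
```

For example, a single 2-second clip at 1 Mbit/s allows 250,000 bytes, to be shared between the compressed clip and the (amortized) model.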
In the perceptual metric track you will need to design a metric to rank the participants of the image compression task. Given a pair of images (one being the original and the other being a distorted image), your metric will need to generate a score. We can compare methods A and B by generating two scores d(O, A) and d(O, B), where O is the original image. If d(O, A) < d(O, B), the metric prefers method A over method B.
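The comparison rule can be sketched as follows. Mean squared error stands in as a placeholder distance here; a real submission would supply its own (typically learned) metric d.

```python
def mse(o, x):
    """Placeholder distance: mean squared error over flat pixel lists.
    A competition entry would replace this with its own metric."""
    return sum((po - px) ** 2 for po, px in zip(o, x)) / len(o)


def prefers_a(o, a, b, d=mse):
    """True if the metric d prefers method A's reconstruction over B's."""
    return d(o, a) < d(o, b)
```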
To evaluate your metric, we will compare the metric's preferences with the preferences of human raters. In our image compression evaluation, human raters are presented with three images (O, A, B) and asked to pick one of A or B. The winner of the perceptual metric challenge will be the metric that predicts the correct preference the largest number of times.
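The evaluation described above amounts to an accuracy count over rated triplets. A minimal sketch, assuming human choices are recorded as the labels 'A' or 'B':

```python
def metric_accuracy(triplets, human_choices, d):
    """Fraction of (O, A, B) triplets where the metric's preference
    (via distance d) matches the human rater's choice ('A' or 'B')."""
    correct = 0
    for (o, a, b), choice in zip(triplets, human_choices):
        pred = 'A' if d(o, a) < d(o, b) else 'B'
        correct += (pred == choice)
    return correct / len(triplets)
```

In the real evaluation each triplet may be rated by several humans; the aggregation scheme is not specified here, so this sketch assumes one choice per triplet.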
We will not provide training data for this task, but we will provide a small validation set to be released by the end of January. The test data will be made available after the image compression challenge closes.