Image compression research has traditionally been discussed at venues such as ICASSP and ICIP, and at more specialized ones like PCS, DCC, and the ITU/MPEG expert groups. This workshop and challenge will be the first computer-vision event to focus explicitly on this field. Many techniques discussed at computer-vision meetings are relevant to lossy compression. For example, super-resolution and artifact removal can be viewed as special cases of the lossy compression problem in which the encoder is fixed and only the decoder is trained. Inpainting, colorization, optical flow, generative adversarial networks and other probabilistic models have also been used as parts of lossy compression pipelines. Lossy compression is therefore a topic that stands to benefit from the work of a large portion of the CVPR community.
Recent advances in machine learning have led to increased interest in applying neural networks to the problem of compression. At CVPR 2017, for example, one of the oral presentations discussed compression using recurrent convolutional networks. To foster more growth in this area, this workshop will not only encourage further development but also establish baselines, educate, and propose a common benchmark and protocol for evaluation. This is crucial because, without a benchmark that provides a common way to compare methods, it will be very difficult to measure progress.
We propose hosting an image-compression challenge which specifically targets methods which have been traditionally overlooked, with a focus on neural networks (but also welcomes traditional approaches). Such methods typically consist of an encoder subsystem, taking images and producing representations which are more easily compressed than the pixel representation (e.g., it could be a stack of convolutions, producing an integer feature map), which is then followed by an arithmetic coder. The arithmetic coder uses a probabilistic model of integer codes in order to generate a compressed bit stream. The compressed bit stream makes up the file to be stored or transmitted. In order to decompress this bit stream, two additional steps are needed: first, an arithmetic decoder, which has a shared probability model with the encoder. This reconstructs (losslessly) the integers produced by the encoder. The last step consists of another decoder producing a reconstruction of the original image.
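The role of the probability model can be illustrated with a small calculation. The sketch below is not the reference coder mentioned in this announcement; it only computes the ideal code length, the sum of -log2 p(s) over the integer codes, which an arithmetic coder approaches up to a small overhead. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def ideal_code_length_bits(symbols, probabilities):
    """Bits an ideal entropy coder needs: the sum of -log2 p(s) over all symbols."""
    return sum(-math.log2(probabilities[s]) for s in symbols)


def empirical_model(symbols):
    """Probability model estimated from the symbol frequencies themselves."""
    counts = Counter(symbols)
    total = len(symbols)
    return {s: c / total for s, c in counts.items()}


# Toy "integer feature map" as a flat list of quantized values (made-up data).
codes = [0, 0, 0, 0, 1, 1, -1, 2]

# A model that matches the data needs fewer bits than a uniform one.
fitted_bits = ideal_code_length_bits(codes, empirical_model(codes))
uniform_bits = ideal_code_length_bits(codes, {s: 0.25 for s in set(codes)})
```

For this toy input the fitted model needs 14 bits and the uniform model 16 bits, which is why the encoder and the arithmetic decoder must share a good probability model.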
In the computer vision community, many authors will be familiar with a multitude of configurations that can act as either the encoder or the decoder, but probably few are familiar with the implementation of an arithmetic coder/decoder. As part of our challenge, we will therefore release a reference arithmetic coder/decoder so that researchers can focus on the parts of the system in which they are experts.
While having a compression algorithm is an interesting feat by itself, it does not mean much unless the results it produces compare well against other similar algorithms and established baselines on realistic benchmarks. In order to ensure realism, we have collected a set of images which represent a much more realistic view of the types of images which are widely available (unlike the well established benchmarks which rely on the images from the Kodak PhotoCD, having a resolution of 768x512, or Tecnick, which has images of around 1.44 megapixels). We will also provide the performance results from current state-of-the-art compression systems as baselines, like WebP and BPG.
Please check out the discussion forum of the challenge for announcements and discussions related to the challenge.
We will be running two tracks: low-rate compression, to be judged on the quality, and transparent compression, to be judged by the bit rate. For the low-rate compression track, there will be a bitrate threshold that must be met. For the transparent track, there will be several quality thresholds that must be met. In all cases, the submissions will be judged based on the aggregate results across the test set: the test set will be treated as if it were a single ‘target’, instead of (for example) evaluating bpp or PSNR on each image separately.
For the low-rate compression track, the requirement will be that the compression is to less than 0.15 bpp across the full test set. The maximum size of the sum of all files will be released with the test set. In addition, a decoder executable has to be submitted that can run in the provided Docker environment and is capable of decompressing the submitted files. We will impose reasonable limitations for compute and memory of the decoder executable. The submissions in this track that are at or below that bitrate threshold will then be evaluated for best PSNR, best MS-SSIM, and best MOS from human raters.
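As an illustration of treating the test set as a single target, the following sketch computes bits per pixel from the total number of bytes and the total number of pixels. It is an assumption about how such an aggregate can be computed, not the organizers' evaluation code; the function names and the example sizes are made up.

```python
def aggregate_bpp(file_sizes_bytes, image_shapes):
    """Bits per pixel over the whole set: total bits divided by total pixels."""
    total_bits = 8 * sum(file_sizes_bytes)
    total_pixels = sum(height * width for height, width in image_shapes)
    return total_bits / total_pixels


def mean_per_image_bpp(file_sizes_bytes, image_shapes):
    """Average of per-image bpp values, shown only for contrast."""
    rates = [8 * size / (h * w) for size, (h, w) in zip(file_sizes_bytes, image_shapes)]
    return sum(rates) / len(rates)


# Hypothetical set: one small image coded at a high rate, one large image at a low rate.
sizes = [1000, 10000]                 # compressed file sizes in bytes
shapes = [(100, 100), (1000, 1000)]   # (height, width) of each image

set_rate = aggregate_bpp(sizes, shapes)         # dominated by the large image
naive_rate = mean_per_image_bpp(sizes, shapes)  # dominated by the small image
```

With these made-up numbers the aggregate rate is below 0.15 bpp while the per-image mean is well above it, so the two conventions can disagree about whether a submission meets the threshold.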
For the transparent compression track, the requirement will be that the compression quality is at least 40 dB (aggregated) PSNR; at least 0.993 (aggregated) MS-SSIM; and a reasonable quality level using the Butteraugli measure (to be announced later). The submissions in this track that are at or better than these quality thresholds will then be evaluated for lowest total bitrate.
Prizes will be given to the winners of the challenges. This is possible thanks to the sponsors.
We note that the organizers will not participate in the challenge and other teams from Google, Twitter and ETH Zurich are not eligible for any prizes.
We provide the same two training datasets as we did last year: Dataset P (“professional”) and Dataset M (“mobile”). The datasets, which contain around two thousand images, were collected to be representative of images commonly used in the wild. The challenge will allow participants to train neural networks or other methods on any amount of data (it should be possible to train on the data we provide, but we expect participants to have access to additional data, such as ImageNet).
Participants will need to submit a decoder and a file for each validation or test image. The test dataset is going to be released at a later point. To ensure that the decoder is not optimized for the test set, we will require the teams to use one of the decoders submitted in the validation phase of the challenge.
The challenge data is released by the Computer Vision Lab of ETH Zurich, and can be downloaded here:
- Training Dataset P (“professional”) (1.9GB)
- Training Dataset M (“mobile”) (3.8GB)
- Validation Dataset P (“professional”) (129MB)
- Validation Dataset M (“mobile”) (226MB)
The total size of all compressed images should not exceed 4,722,341 bytes for the validation set for the low-rate track. For the transparent track, see above for the PSNR, MS-SSIM and Butteraugli thresholds.
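Before submitting to the low-rate track, it may help to check the byte budget locally. The sketch below sums the sizes of a list of compressed files and compares the total against the validation limit quoted above; the helper names are hypothetical and the file list is whatever your encoder produced.

```python
import os

# Validation-set limit for the low-rate track, as stated in the text above.
VALIDATION_BUDGET_BYTES = 4_722_341


def total_size_bytes(paths):
    """Sum of the on-disk sizes of the given compressed files."""
    return sum(os.path.getsize(path) for path in paths)


def within_budget(paths, budget=VALIDATION_BUDGET_BYTES):
    """True if the files together do not exceed the byte budget."""
    return total_size_bytes(paths) <= budget
```

A call such as `within_budget(my_compressed_files)` returns `False` as soon as the combined size exceeds the limit, regardless of how the bytes are distributed across images.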
We use Docker to allow you to test your decoder in the same environment as is run on our server. To test your decoder in our environment, install Docker and run, for example:
docker run -v $(pwd):$(pwd) -w $(pwd) gcr.io/clic-215616/compression ./decode.py
The Docker environment and an example decoder can be viewed here:
To submit a decoder to our evaluation server, use the following command:
docker run -v "$(pwd)":"$(pwd)" -w "$(pwd)" gcr.io/clic-215616/submit -n <team> -p <password> -e <email> <decoder> <images>
It requires a team name, a password, an email, an executable decoder, and files representing the compressed images. Alternatively, the decoder can be a zip file containing an executable named decode and other supporting files such as model parameters.
The password will be set the first time you submit. You can then use this password to submit updated decoders.
The decoder will be called with no arguments and is expected to reconstruct all validation files (PNG) from the compressed
At the moment, the evaluation server only supports the low-bit rate task. We will provide instructions for the transparent task shortly.
There is a 12GB memory limit to run your decoder, and your decoder should run in a reasonable amount of time on a CPU (hours not days). We aim to provide GPU support later.
All deadlines are 23:59:59 PST.
|December 17th, 2018||Challenge announcement and the training part of the dataset released|
|January 8th, 2019||The validation part of the dataset released, online validation server is made available.|
|March 15th, 2019||The test set is released.|
|March 22nd, 2019||The competition closes and participants are expected to have submitted their solutions along with the compressed versions of the test set.|
|April 8th, 2019||Deadline for paper submission and factsheets.|
|April 15th, 2019||Results are released to the participants.|
|April 22nd, 2019||Paper decision notification|
|April 30th, 2019||Camera ready deadline|
Does my model have to reconstruct images in full resolution or can it be cropped?
The decoder has to produce PNG images where each image has the same resolution as the corresponding image in the validation or test set.
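One way to check this locally, sketched here under the assumption that both the original and the decoded image are available as PNG bytes, is to compare the width and height stored in the IHDR chunk at the start of each file. The helper names are hypothetical; only the PNG header layout (8-byte signature, then the IHDR chunk with big-endian width and height) is taken from the PNG format itself.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data):
    """Return (width, height) read from the IHDR chunk of PNG bytes."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG stream")
    if data[12:16] != b"IHDR":
        raise ValueError("IHDR chunk missing")
    # Width and height are two big-endian 32-bit integers after the chunk type.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def same_resolution(original_bytes, decoded_bytes):
    """True if both PNG streams declare the same width and height."""
    return png_dimensions(original_bytes) == png_dimensions(decoded_bytes)


def fake_png_header(width, height):
    """First 24 bytes of a PNG with the given size (for testing only, not a full file)."""
    return PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", width, height)
```

In practice the bytes would come from reading each file in binary mode, for example `open(path, "rb").read(24)`.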
How is PSNR calculated?
We compute a single MSE value by averaging across all RGB channels of all pixels of the whole dataset, and from that calculate a PSNR value.
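A minimal sketch of this aggregation follows. It assumes 8-bit images (peak value 255, which the text does not state) given as flat sequences of channel values; it is an illustration, not the evaluation server's code.

```python
import math


def aggregate_psnr(originals, reconstructions, max_value=255):
    """PSNR from one MSE pooled over every channel value of every image."""
    squared_error = 0
    count = 0
    for original, reconstruction in zip(originals, reconstructions):
        if len(original) != len(reconstruction):
            raise ValueError("image sizes differ")
        squared_error += sum((a - b) ** 2 for a, b in zip(original, reconstruction))
        count += len(original)
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical data: PSNR is unbounded
    return 10 * math.log10(max_value ** 2 / mse)


# Two tiny made-up "images": the first is reconstructed exactly,
# the second is off by one in every channel value.
originals = [[0] * 6, [0] * 6]
reconstructions = [[0] * 6, [1] * 6]
psnr_db = aggregate_psnr(originals, reconstructions)
```

Here the pooled MSE is 0.5, giving about 51.1 dB. Averaging per-image PSNR values instead would be undefined for this example, because the first image alone has infinite PSNR.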
The evaluation server gives “ERROR: Missing image IMG_20170114_210112.png”. What am I doing wrong?
The error means that the decoder failed and did not produce all required files. This could have many reasons. If the decoder works locally using our Docker environment but fails on the server, a likely explanation is that it uses too much memory.
In which directory should the decoder save images?
The decoder can save images in the current working directory (.) or in any arbitrary subfolder.