Image compression has traditionally been studied in forums such as ICASSP and ICIP, and in highly specialized venues such as PCS, DCC, and the ITU/MPEG expert groups. The CLIC workshop and challenge was the first event at a computer vision conference to focus explicitly on image compression. Many techniques discussed at computer vision meetings are relevant to lossy compression. For example, super-resolution and artifact removal can be viewed as special cases of the lossy compression problem in which the encoder is fixed and only the decoder is trained. Inpainting, colorization, optical flow, generative adversarial networks, and other probabilistic models have also been used as parts of lossy compression pipelines. A large portion of the CVPR community's work could therefore be of interest for lossy compression.
Recent advances in machine learning have led to increased interest in applying neural networks to compression. At CVPR 2017, for example, one of the oral presentations discussed compression using recurrent convolutional networks, and several lossy and lossless compression works have been presented at recent CVPRs. To foster further growth in this area, this workshop not only encourages development but also establishes baselines, educates, and proposes a common benchmark and evaluation protocol. This is crucial: without a benchmark, that is, a common way to compare methods, it is very difficult to measure progress.
We host a lossy image and video compression challenge that specifically targets methods that have traditionally been overlooked, with a focus on neural networks, although traditional approaches are also welcome. Such methods typically consist of an encoder subsystem that takes images or videos and produces representations that are easier to compress than pixel representations (e.g., a stack of convolutions producing an integer feature map), followed by an arithmetic coder. The arithmetic coder uses a probabilistic model of the integer codes to generate a compressed bit stream, which makes up the file to be stored or transmitted. Decompressing this bit stream requires two additional steps. First, an arithmetic decoder, which shares the probability model with the encoder, losslessly reconstructs the integers produced by the encoder. Second, another decoder produces a reconstruction of the original images or videos.
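The encode/decode pipeline described above can be sketched end to end in Python. This is a toy under stated assumptions, not a challenge submission or any particular published codec: the "encoder" and "decoder" are hypothetical fixed transforms (2x average pooling with uniform quantization, and dequantization with 2x upsampling) standing in for trained networks; the shared probability model is an assumed fixed frequency table over the integer codes -4..4; the number of codes is assumed to be sent as side information; and the arithmetic coder uses exact rational arithmetic (`fractions.Fraction`) for readability rather than the finite-precision integer arithmetic practical coders use.

```python
from fractions import Fraction
from math import ceil

R = 4       # hypothetical alphabet: integer codes in [-R, R]
STEP = 0.5  # hypothetical quantization step size

# Shared probability model (assumed fixed): code k has frequency 2**(R - |k|).
# The arithmetic encoder and decoder must use exactly the same table.
SYMBOLS = list(range(-R, R + 1))
FREQ = {k: 2 ** (R - abs(k)) for k in SYMBOLS}
TOTAL = sum(FREQ.values())
CUM = {k: sum(FREQ[j] for j in SYMBOLS if j < k) for k in SYMBOLS}


def analysis_transform(signal):
    """Toy encoder: average adjacent samples, quantize, clip to the alphabet."""
    pooled = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]
    return [max(-R, min(R, round(v / STEP))) for v in pooled]


def arithmetic_encode(codes):
    """Narrow [low, low + width) once per code, then emit a short binary fraction in it."""
    low, width = Fraction(0), Fraction(1)
    for c in codes:
        low += width * Fraction(CUM[c], TOTAL)
        width *= Fraction(FREQ[c], TOTAL)
    k = 0
    while True:  # find the shortest k-bit fraction m / 2**k inside the final interval
        m = ceil(low * 2 ** k)
        if Fraction(m, 2 ** k) < low + width:
            return format(m, "b").zfill(k) if k else ""
        k += 1


def arithmetic_decode(bits, n):
    """Replay the interval narrowing with the shared model to recover n codes losslessly."""
    x = Fraction(int(bits, 2) if bits else 0, 2 ** len(bits))
    low, width = Fraction(0), Fraction(1)
    codes = []
    for _ in range(n):
        target = (x - low) / width * TOTAL  # position of x inside [0, TOTAL)
        c = next(k for k in SYMBOLS if CUM[k] <= target < CUM[k] + FREQ[k])
        codes.append(c)
        low += width * Fraction(CUM[c], TOTAL)
        width *= Fraction(FREQ[c], TOTAL)
    return codes


def synthesis_transform(codes):
    """Toy decoder: dequantize each code and repeat it twice (2x upsampling)."""
    return [c * STEP for c in codes for _ in range(2)]


signal = [0.1, 0.3, -0.8, -1.0, 2.2, 2.0, 0.0, 0.4]
codes = analysis_transform(signal)             # lossy step: produces integer codes
bits = arithmetic_encode(codes)                # compressed bit stream
decoded = arithmetic_decode(bits, len(codes))  # lossless step: recovers the codes
recon = synthesis_transform(decoded)           # reconstruction of the input
print(codes, bits, recon)
```

Only the quantization inside `analysis_transform` discards information; the arithmetic coding stage round-trips exactly. In a learned codec, the transforms would typically be trained networks and the fixed frequency table would be replaced by probabilities predicted by a trained model.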
While having a compression algorithm is an interesting feat by itself, it means little unless its results compare well against similar algorithms and established baselines on realistic benchmarks. To ensure realism, we have collected a set of images that better represents the types of images widely available today, unlike the well-established benchmarks that rely on the Kodak PhotoCD images (768x512 resolution) or Tecnick (around 1.44 megapixels per image). For the P-frame track, we will use an existing dataset, which we will provide preprocessed to make training easier. Likewise, we will use an existing video dataset for evaluation rather than creating new video content for this challenge.
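This section does not name the evaluation metrics. As an assumption for illustration, lossy codecs are commonly compared by rate, in bits per pixel (bpp), and by a distortion measure such as PSNR. The hypothetical helpers below compute both, and show why image resolution matters when comparing against benchmarks: the same file size corresponds to a much lower bit rate on a roughly 1.44-megapixel image (assumed here to be 1200x1200) than on a 768x512 Kodak image.

```python
import math


def bits_per_pixel(num_bytes, width, height):
    """Rate: compressed size in bits divided by the number of pixels."""
    return 8 * num_bytes / (width * height)


def psnr(original, reconstruction, peak=255):
    """Distortion: peak signal-to-noise ratio in dB over flat lists of pixel values."""
    if len(original) != len(reconstruction) or not original:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstruction)) / len(original)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


# The same hypothetical 50,000-byte file on images of two resolutions.
kodak_bpp = bits_per_pixel(50_000, 768, 512)      # 768x512 Kodak image
tecnick_bpp = bits_per_pixel(50_000, 1200, 1200)  # assumed 1200x1200 (~1.44 MP)
print(f"768x512: {kodak_bpp:.3f} bpp, 1.44 MP: {tecnick_bpp:.3f} bpp")
```

Here the 50,000-byte file is about 1.02 bpp at 768x512 but only about 0.28 bpp at 1.44 megapixels, so results are only comparable when they are measured at matched rates on the same images.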