In this thesis, we explore multi-item compression by exploiting semantic redundancies. First, we show that classical compression frameworks are not well suited to multi-item compression: the results are encouraging but insufficient, since the quality of the decoded images drops drastically as the compression ratio increases. We conclude that the compression paradigm must change, and we therefore move distortion evaluation to a higher level: semantics. We then study how to model and represent semantics, and converge on CLIP, a foundation model, to extract and encode this information. We show experimentally that CLIP has interesting properties for semantically representing and manipulating images, and we build a proof-of-concept semantic-based coder: CoCliCo. This result allows us to extend CLIP-based compression to multi-item scenarios. In this proposal, we learn a dictionary of simple semantic elements that encapsulates the semantics of the data collection. We show that this dictionary is itself semantic in nature and can describe images with an even more compact representation. This scheme achieves extremely low bitrates while preserving semantics and maintaining good image quality.
Giuseppe Valenzise, Research Scientist at CNRS, CentraleSupélec, Paris-Saclay, France (Reviewer)
Laurent Amsaleg, Research Director at CNRS, IRISA, Rennes, France (Examiner)
Ewa Kijak, Associate Professor at University of Rennes, IRISA, Rennes, France (Examiner)
Sergio Barbarossa, Professor at La Sapienza University, Rome, Italy (Examiner)
Thomas Maugey, Research Director at INRIA, Rennes, France (Thesis Director)