Semantic compression of large image collections

Defense type

Thesis

Start date

Fri 29/11/2024 - 14:00

End date

Fri 29/11/2024 - 17:00

Room

Petri-Turing

Speaker

Tom BACHARD

Main department

D6 - Signal, Image, Langage

Subject

In this thesis, we explore multi-item compression by exploiting semantic redundancies. First, we show that classical compression frameworks are not well suited to multi-item compression: the results are encouraging but insufficient, since the quality of the decoded images drops drastically as the compression rate increases. We conclude that the compression paradigm has to change, and we therefore move the distortion evaluation to a higher level: semantics. We then study how to model and represent these semantics and settle on CLIP, a foundation model, for extracting and encoding this information. We show experimentally that CLIP has interesting properties for semantically representing and manipulating images, and we build a proof-of-concept semantic-based coder: CoCliCo. This result allows us to extend CLIP-based compression to multi-item scenarios. In this approach, a dictionary of simple semantics that encapsulates the semantics of the data collection is learned. We show that this dictionary is itself semantic in nature and can describe images with an even more compact representation. This scheme achieves extremely low bitrates while preserving semantics and maintaining good image quality.

Jury composition

Federica Battisti, Associate professor at Padova University, Italy (Reviewer)
Giuseppe Valenzise, Research Scientist at CNRS, CentraleSupélec, Paris-Saclay, France (Reviewer)
Laurent Amsaleg, Research director at CNRS, IRISA, Rennes, France (Examiner)
Ewa Kijak, Associate Professor at University of Rennes, IRISA, Rennes, France (Examiner)
Sergio Barbarossa, Professor at La Sapienza University, Rome, Italy (Examiner)
Thomas Maugey, Research Director at INRIA, Rennes, France (Thesis Director)