Team
Team website
Location
IRISA Rennes
Research unit
IRISA - UMR 6074
Thesis subject description
Health data, social networks, electricity consumption... Vast quantities of personal data are collected today by private companies and public organizations. Various legal, monetary, or visibility incentives push data holders to consider sharing anonymized versions of the collected datasets, or machine learning models trained on them. Indeed, sharing data or models at large, e.g., as open data, is expected to bring strong benefits (strengthening, e.g., scientific studies, innovation, and public policies).

Privacy-preserving data publishing techniques are dedicated to the sanitization of personal data before sharing, in order to provide (hopefully strong enough) privacy guarantees. Substantial progress has been made during the last two decades, leading to a wealth of privacy models and algorithms, the differential privacy family being the most prominent today [2, 6]. However, this rich body of work makes it hard for non-expert users to understand clearly the privacy implications of choosing a specific privacy model or algorithm, or of selecting suitable values for its privacy parameters.

In parallel to privacy-preserving data publishing techniques, the study of attacks on sanitized data or models has grown tremendously [7] (see, e.g., membership inference attacks [4, 8]). Such attacks have been used for auditing real-life implementations of differentially private algorithms [5]. However, although privacy auditing can help obtain empirical estimates of the privacy guarantees offered by an algorithm and its privacy parameters, using privacy auditing techniques for risk analysis is non-trivial because of the large number of possible attackers and the high cost of each attack.

The main goal of this PhD thesis is to leverage today's attacks on sanitized data or models in order to allow the data holder to analyze the risks related to one or more publications. This will require formalizing the attackers (e.g., privacy games, formal models dedicated to representing and analyzing security issues [3]), structuring the space of attackers (e.g., generalization/specialization of attackers, implications), and designing algorithms for exploring the resulting space efficiently (e.g., multi-criteria optimization algorithms). In addition to the core tasks of the project, the successful candidate will contribute to the organization of competitions in which the privacy guarantees of sanitization algorithms are challenged [1].
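To make these notions concrete, the following minimal Python sketch (purely illustrative, not part of the thesis subject itself) combines the two ingredients discussed above: a differentially private release of a simple count via the Laplace mechanism, and a toy membership inference attacker whose success rates yield a crude empirical lower bound on the privacy parameter epsilon. All parameter values and the attacker strategy are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def laplace_count(dataset, epsilon):
    # Release the size of the dataset with epsilon-differential privacy
    # (counting queries have sensitivity 1, hence noise scale 1/epsilon).
    return len(dataset) + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Two neighbouring datasets: D contains the target individual, D_prime does not.
D = list(range(100))
D_prime = D[:-1]

epsilon = 1.0
trials = 20_000

# Toy membership inference attacker: it observes one noisy count and guesses
# "target present" whenever the release is closer to |D| than to |D_prime|.
threshold = (len(D) + len(D_prime)) / 2
tpr = sum(laplace_count(D, epsilon) > threshold for _ in range(trials)) / trials
fpr = sum(laplace_count(D_prime, epsilon) > threshold for _ in range(trials)) / trials

# Crude empirical lower bound on epsilon achieved by this single attacker
# (ignoring sampling error): the log of the likelihood ratio of its guesses.
empirical_eps = np.log(tpr / fpr)
print(f"TPR={tpr:.3f}  FPR={fpr:.3f}  empirical epsilon >= {empirical_eps:.2f} (true epsilon = {epsilon})")

A full audit would repeat such experiments over many candidate attackers and account for estimation error, which is precisely where the attacker-space exploration problem addressed by the thesis arises.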
Bibliography
[1] Tristan Allard, Louis Béziaud, and Sébastien Gambs. Snake challenge: Sanitization algorithms under attack. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (CIKM '23), 2023.
[2] Damien Desfontaines and Balázs Pejó. SoK: Differential privacies. Proceedings on Privacy Enhancing Technologies, 2020(2):288–313, 2020.
[3] Barbara Kordy (Fila), Ludovic Piètre-Cambacédès, and Patrick Schweitzer. DAG-based attack and defense modeling: Don't miss the forest for the attack trees. Computer Science Review, 13-14:1–38, 2014.
[4] Hongsheng Hu, Zoran A. Salcic, Lichao Sun, Gillian Dobbie, P. Yu, and Xuyun Zhang. Membership inference attacks on machine learning: A survey. ACM Computing Surveys (CSUR), 54:1–37, 2021.
[5] Fred Lu, Joseph Munoz, Maya Fuchs, Tyler LeBlond, Elliott V. Zaresky-Williams, Edward Raff, Francis Ferraro, and Brian Testa. A general framework for auditing differentially private machine learning. In Advances in Neural Information Processing Systems (NeurIPS '22), 2022.
[6] Ryan McKenna, Gerome Miklau, and Daniel Sheldon. Winning the NIST contest: A scalable and general approach to differentially private synthetic data, 2021.
[7] Ahmed Salem, Giovanni Cherubin, David Evans, Boris Köpf, Andrew Paverd, Anshuman Suri, Shruti Tople, and Santiago Zanella-Béguelin. SoK: Let the privacy games begin! A unified treatment of data inference privacy in machine learning. In Proceedings of the 2023 IEEE Symposium on Security and Privacy (S&P '23), pages 327–345, 2023.
[8] Antonin Voyez, Tristan Allard, Gildas Avoine, Pierre Cauchois, Élisa Fromont, and Matthieu Simonin. Membership inference attacks on aggregated time series with linear programming. In Proceedings of the 19th International Conference on Security and Cryptography (SECRYPT '22), 2022.
List of thesis supervisors
Last name, First name
Tristan ALLARD
Type of supervision
Thesis supervisor
Research unit
IRISA
Department
Team
Last name, First name
Barbara FILA
Type of supervision
2nd co-supervisor (optional)
Research unit
IRISA
Department
Team
Contact(s)
Name
Tristan ALLARD
Email
tristan.allard@irisa.fr
Keywords
privacy-preserving data publishing, differential privacy, membership inference attacks, risk analysis, formal methods