Privacy-preserving data publishing: attacks, countermeasures, and risk analysis

Location
IRISA Rennes
Research unit
IRISA - UMR 6074
Thesis subject description
Health data, social networks, electricity consumption... Vast
quantities of personal data are collected today by private companies
and public organizations. Legal, monetary, and visibility incentives
push data holders to consider sharing anonymized versions of the
collected datasets, or machine learning models trained on them.
Indeed, sharing data or models at large, e.g., as open data, is
expected to bring strong benefits, strengthening, for instance,
scientific studies, innovation, and public policies.

Privacy-preserving data publishing techniques sanitize personal data
before sharing so that the published data enjoys (hopefully strong
enough) privacy guarantees. Substantial progress has been made over
the last two decades, leading to a wealth of privacy models and
algorithms, the differential privacy family being the most prominent
today [2, 6]. However, this rich body of work makes it hard for
non-expert users to clearly understand the privacy implications of
choosing a specific privacy model or algorithm, or of selecting
suitable values for its privacy parameters.
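
To make the role of such parameters concrete, here is a minimal
sketch of the standard Laplace mechanism applied to a counting query
(the count and the epsilon values are purely illustrative): epsilon
bounds how much any single individual can shift the distribution of
the published output, and the noise scale grows as epsilon shrinks.

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release true_value with epsilon-differential privacy via calibrated Laplace noise."""
    scale = sensitivity / epsilon  # noise scale grows as the privacy budget shrinks
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# Counting query: adding or removing one individual changes the count by at
# most 1, so the sensitivity is 1. Smaller epsilon = stronger guarantee,
# noisier answer. All values below are illustrative.
true_count = 1024
for epsilon in (0.1, 1.0, 10.0):
    noisy = laplace_mechanism(true_count, sensitivity=1.0, epsilon=epsilon)
    print(f"epsilon={epsilon:>4}: noisy count = {noisy:.1f}")
```

Choosing among the many variants surveyed in [2], and setting epsilon
itself, is precisely the kind of decision that is difficult for
non-expert users.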

In parallel to privacy-preserving data publishing techniques, the
study of attacks on sanitized data or models has grown tremendously
[7] (see, e.g., membership inference attacks [4, 8]). Such attacks
have been used for auditing real-life implementations of
differentially private algorithms [5]. However, although privacy
auditing can help obtain empirical estimates of the privacy
guarantees offered by an algorithm and its privacy parameters, using
privacy auditing techniques for risk analysis is non-trivial because
of the large number of possible attackers and the high cost of
mounting each attack.
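
As an illustration of the auditing idea, the sketch below
instantiates a single hypothetical threshold attacker against a
Laplace mechanism and turns its empirical success rates into a lower
bound on the effective epsilon. This is a bare-bones Monte Carlo
version; real audits such as [5] use far stronger attackers and
attach confidence intervals to these estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

def audit_epsilon(mechanism, d0, d1, threshold, trials=100_000):
    """Monte Carlo lower bound on epsilon from one threshold attacker.

    The attacker guesses "target present" (input d1) whenever the released
    value exceeds `threshold`. For a true epsilon-DP mechanism,
    log(P[guess | d1] / P[guess | d0]) cannot exceed epsilon, so the
    empirical log-ratio bounds epsilon from below (up to sampling noise).
    """
    fpr = sum(mechanism(d0) > threshold for _ in range(trials)) / trials
    tpr = sum(mechanism(d1) > threshold for _ in range(trials)) / trials
    return float(np.log(max(tpr, 1e-12) / max(fpr, 1e-12)))

# Audit a Laplace mechanism claiming epsilon = 1 on a counting query whose
# neighboring datasets yield counts 100 (target absent) and 101 (present).
mechanism = lambda count: count + rng.laplace(scale=1.0)  # scale = 1/epsilon
print(f"empirical lower bound on epsilon: "
      f"{audit_epsilon(mechanism, 100, 101, threshold=100.5):.2f}")
```

Even this toy audit requires one hundred thousand runs of the
mechanism per attacker, which hints at why covering the full space of
attackers is costly.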

The main goal of this PhD thesis is to leverage today's attacks on
sanitized data or models to let the data holder analyze the risks
related to one or more publications. This will require formalizing
the attackers (e.g., privacy games, formal models dedicated to
representing and analyzing security issues [3]), structuring the
space of attackers (e.g., generalization/specialization of attackers,
implications), and designing algorithms for exploring the resulting
space efficiently (e.g., multi-criteria optimization algorithms; see
the sketch below). In addition to the core tasks of the project, the
successful candidate will contribute to the organization of
competitions in which the privacy guarantees of sanitization
algorithms are challenged [1].
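
As a purely illustrative sketch of what structuring and exploring the
attacker space could look like (all attacker names and attributes
below are hypothetical), attackers can be represented as points
scored along several criteria, with the exploration restricted to the
non-dominated frontier:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Attacker:
    """One point in a hypothetical attacker space."""
    name: str
    background_knowledge: int  # e.g., records the adversary already knows
    cost: float                # e.g., compute budget to mount the attack
    success_rate: float        # e.g., empirical membership-inference accuracy

def pareto_front(attackers: list[Attacker]) -> list[Attacker]:
    """Keep attackers not dominated on (low cost, high success).

    Simple quadratic scan; exploring only this frontier is one way to
    avoid auditing against every conceivable attacker.
    """
    return [
        a for a in attackers
        if not any(
            b.cost <= a.cost and b.success_rate >= a.success_rate and b != a
            for b in attackers
        )
    ]

candidates = [
    Attacker("uninformed", background_knowledge=0, cost=1.0, success_rate=0.52),
    Attacker("partial-knowledge", background_knowledge=100, cost=5.0, success_rate=0.70),
    Attacker("shadow-models", background_knowledge=0, cost=50.0, success_rate=0.68),
]
for a in pareto_front(candidates):  # "shadow-models" is dominated and dropped
    print(f"{a.name}: cost={a.cost}, success_rate={a.success_rate}")
```

Keeping only such a frontier of (cost, success) trade-offs is one way
to tame the combinatorial space of attackers raised above; the thesis
will investigate principled formalizations of this space.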
Bibliography
[1] Tristan Allard, Louis Béziaud, and Sébastien Gambs. Snake
challenge: Sanitization algorithms under attack. In Proceedings of
the 32nd ACM International Conference on Information and Knowledge
Management (CIKM '23), 2023.

[2] Damien Desfontaines and Balázs Pejó. SoK: Differential
privacies. Proceedings on Privacy Enhancing Technologies,
2020(2):288–313, 2020.

[3] Barbara Kordy (Fila), Ludovic Piètre-Cambacédès, and Patrick
Schweitzer. DAG-based attack and defense modeling: Don't miss the
forest for the attack trees. Computer Science Review, 13-14:1–38, 2014.

[4] Hongsheng Hu, Zoran A. Salcic, Lichao Sun, Gillian Dobbie,
Philip S. Yu, and Xuyun Zhang. Membership inference attacks on
machine learning: A survey. ACM Computing Surveys (CSUR), 54:1–37, 2021.

[5] Fred Lu, Joseph Munoz, Maya Fuchs, Tyler LeBlond, Elliott
V. Zaresky-Williams, Edward Raff, Francis Ferraro, and Brian Testa. A
general framework for auditing differentially private machine
learning. In Advances in Neural Information Processing Systems
(NeurIPS ’22), 2022.

[6] Ryan McKenna, Gerome Miklau, and Daniel Sheldon. Winning the NIST
contest: A scalable and general approach to differentially private
synthetic data, 2021.

[7] Ahmed Salem, Giovanni Cherubin, David Evans, Boris Köpf, Andrew
Paverd, Anshuman Suri, Shruti Tople, and Santiago
Zanella-Béguelin. SoK: Let the privacy games begin! A unified
treatment of data inference privacy in machine learning. In
Proceedings of the 2023 IEEE Symposium on Security and Privacy (S&P
'23), pages 327–345, 2023.

[8] Antonin Voyez, Tristan Allard, Gildas Avoine, Pierre Cauchois,
Élisa Fromont, and Matthieu Simonin. Membership inference attacks on
aggregated time series with linear programming. In Proceedings of the
19th International Conference on Security and Cryptography (SECRYPT
'22), 2022.
List of thesis supervisors

Last name, First name
Tristan ALLARD
Supervision type
Thesis supervisor
Research unit
IRISA

Last name, First name
Barbara FILA
Supervision type
2nd co-supervisor (optional)
Research unit
IRISA
Contact(s)
Name
Tristan ALLARD
Email
tristan.allard@irisa.fr
Keywords
privacy-preserving data publishing, differential privacy, membership inference attacks, risk analysis, formal methods