Run-time policy enforcement acceleration via semantic caching

Published on Tue 28/01/2025 - 17:40

Team (or department if the offer is not attached to a team)

SHAMAN

Team website

https://www-shaman.irisa.fr

Location

Lannion

Research unit

IRISA - UMR 6074

Description of the thesis topic

Context

It is widely recognized that some of the data collected, processed and stored is of a personal

nature, making it sensitive and requiring appropriate protection. Organizations, whether

public or private, e.g., in domains like healthcare, finance, logistics, that handle such data are

subject to increasingly stringent regulations. It is then crucial to equip information systems

with mechanisms that control and restrict access to sensitive data to authorized users only.

While experts in cybersecurity are involved in the design of advanced mechanisms to ensure

data protection, an increasing number of individuals and organizations with limited technical

expertise are tasked with managing large volumes of sensitive data. The challenge then lies in

bridging the gap between sophisticated data security practices and the practical, user-friendly

tools that non-specialists can implement and understand.

The strategy followed in this project is to enable non-experts to express their security needs

in natural language. By developing systems that can automatically translate natural language

security requirements into formal, enforceable policies, we could make data security

(particularly specification of security requirements) accessible to a wider range of users. Such

an approach will also provide users with a high degree of transparency and confidence in the

protection of their data, making complex security concepts more intuitive. Furthermore, users

should be able to audit and understand how their policies are being enforced, increasing trust

in data integration and automated data management systems.

With this thesis, we seek to democratize data security by designing an approach and

developing tools that allow users to specify data usage and access control policies in a way

that is understandable and manageable without the need for deep technical expertise.

Objective

When security policies are defined within an information system, ensuring their effective

enforcement becomes paramount. In this context, our objective is to enhance data protection

by detecting and blocking suspicious queries in real time. This will be based on user behavior

patterns such as unusual query sequences, high-volume access within a short timeframe, or

repeated attempts to access sensitive data from a single user. These patterns will be codified

into templates of suspicious behavior, allowing the system to quickly check and intercept

queries that match.
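
As an illustration, the high-volume pattern could be checked with a sliding time window. The sketch below is only a minimal example under our own assumptions: the class name RateTemplate and the parameters max_queries and window_s are hypothetical and are not part of the project.

```python
from collections import defaultdict, deque


class RateTemplate:
    """Hypothetical template: flag a user who issues more than
    max_queries queries within any window of window_s seconds."""

    def __init__(self, max_queries: int, window_s: float) -> None:
        self.max_queries = max_queries
        self.window_s = window_s
        self._history = defaultdict(deque)  # user -> recent query timestamps

    def is_suspicious(self, user: str, timestamp: float) -> bool:
        recent = self._history[user]
        recent.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while recent and timestamp - recent[0] > self.window_s:
            recent.popleft()
        return len(recent) > self.max_queries
```

Other templates (unusual sequences, repeated attempts on sensitive data) would need their own matching logic; only the per-user bookkeeping is likely to be shared.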

A policy cache will store these templates for efficient runtime evaluation. Additionally, for each

user, past queries will be logged and used to assess new queries. If a combination of past and

new queries forms a potentially dangerous transaction, the system will block the new query.

We will explore two approaches for this combination: one attribute-based, which links queries

via shared attributes, and another based on auditing of query history.
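
To make the attribute-based combination more concrete, the sketch below logs, for each user, the attributes returned by past queries and blocks a new query when it can be linked (through a shared attribute) to past queries so that together they cover a forbidden combination. It is a simplified illustration under our own assumptions: QueryAuditor, forbidden and allow are hypothetical names, queries are reduced to sets of attribute names, and only direct (one-hop) links are considered.

```python
from dataclasses import dataclass, field


@dataclass
class QueryAuditor:
    """Hypothetical per-user auditor over sets of attribute names."""

    forbidden: list[frozenset[str]]  # combinations that must not be disclosed jointly
    history: dict[str, list[frozenset[str]]] = field(default_factory=dict)

    def allow(self, user: str, attributes: set[str]) -> bool:
        new = frozenset(attributes)
        known = set(new)
        # A past query is linkable to the new one if they share an attribute.
        for past in self.history.get(user, []):
            if past & new:
                known |= past
        if any(combo <= known for combo in self.forbidden):
            return False  # blocked queries are not added to the history
        self.history.setdefault(user, []).append(new)
        return True


auditor = QueryAuditor(forbidden=[frozenset({"name", "diagnosis"})])
first = auditor.allow("u1", {"patient_id", "name"})        # allowed
second = auditor.allow("u1", {"patient_id", "diagnosis"})  # linkable via patient_id
```

In this example the second query is refused because, joined with the first one on patient_id, it would associate a name with a diagnosis.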

Positioning

Access control (AC) is a vital aspect of safeguarding information systems. Various types of AC

have been proposed in the literature, such as Role-Based (RBAC), Organization-Based (OrBAC),

Attribute-Based (ABAC), etc. These access control systems provide a formal means of

specifying a security policy, usually using logical or constraint languages. Recently, several

research works have addressed the extraction of (ABAC) access control policies from natural

language texts using machine learning and natural language processing (NLP) techniques

[1,2,3]. In a recent work [4], the authors recognize that human involvement is essential to validate

access control policy prediction. An interactive (log-based) approach for adaptive policies has

been proposed in [5], but it does not consider natural language inputs. Furthermore, while AC

blocks direct access, it does not prevent inference attacks. Approaches like [8] handle statistical

inference by limiting access to aggregated data, and [9] addresses semantic inference risks.

The authors of [14] propose an auditing module that focuses on historical query logs in

conjunction with the current query, aiming to identify the inference channel which potentially

leads to violations of access policies.
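
For readers unfamiliar with ABAC, the following minimal sketch shows how rules over subject and resource attributes might be evaluated when both permissions and prohibitions are present. The Rule type, the decide function, the deny-overrides combination and the example attributes are our own illustrative assumptions and do not come from any of the cited works.

```python
from dataclasses import dataclass
from typing import Callable

Attributes = dict[str, str]


@dataclass(frozen=True)
class Rule:
    effect: str  # "permit" or "deny"
    condition: Callable[[Attributes, Attributes], bool]  # (subject, resource) -> bool


def decide(rules: list[Rule], subject: Attributes, resource: Attributes) -> str:
    """Deny-overrides combination; default deny when no rule applies."""
    effects = {r.effect for r in rules if r.condition(subject, resource)}
    if "deny" in effects:
        return "deny"
    return "permit" if "permit" in effects else "deny"


rules = [
    # Permission: doctors may read medical records.
    Rule("permit", lambda s, r: s.get("role") == "doctor" and r.get("type") == "medical_record"),
    # Prohibition: nobody may read records of another department.
    Rule("deny", lambda s, r: s.get("department") != r.get("department")),
]
record = {"type": "medical_record", "department": "cardiology"}
decision = decide(rules, {"role": "doctor", "department": "cardiology"}, record)
```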

Caching is a common acceleration strategy in computer science. Researchers in the database

community have been interested in data caching for decades. Software acceleration in this

field can be found in data warehouses, distributed databases, and web search engines, for example

[10]. Semantic caching [11] is a caching technique that uses semantic information to improve

the efficiency of the cache. In semantic caching, the cache stores not only the data but also

the semantics or meaning of the data, which can help to reduce cache misses and improve

cache hit rates. By understanding the semantics of the data, the cache can be more intelligent

in predicting and pre-fetching the data that is likely to be needed in the future. Semantic

caching has been seen as a solution to consider response time and energy consumption in

mobile cloud computing [12] or to efficiently rely on FPGA [13]. Exploring semantic caching,

which stores data along with its associated meaning, could serve a dual purpose: on the one

hand, it may empower the detection of inference channels, and on the other hand, its

integration could optimize query processing while enhancing security. This makes it necessary

to rethink the different strategies that caches may rely on. To the best of our knowledge,

security and optimization have never been addressed in such a manner before. Finally, despite

recent progress in AI's explainability [10], especially in machine learning, text-based

generation of explanations for access control models handling both permissions and

prohibitions is almost non-existent.
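
To illustrate the semantic caching idea discussed above, the sketch below trims a range query against cached regions, in the spirit of the probe and remainder queries of [11]. It is a deliberately reduced example under our own assumptions: predicates are half-open integer ranges on a single attribute, and the function name trim is hypothetical.

```python
Interval = tuple[int, int]  # half-open range [low, high) on one attribute


def trim(query: Interval, cached: list[Interval]) -> tuple[list[Interval], list[Interval]]:
    """Split a range query into probe parts (covered by cached regions)
    and remainder parts (to be fetched from the server)."""
    low, high = query
    probe, remainder = [], []
    cursor = low  # everything in [low, cursor) has already been classified
    for c_low, c_high in sorted(cached):
        start, end = max(c_low, cursor), min(c_high, high)
        if start >= end:
            continue  # no overlap with the unclassified part of the query
        if start > cursor:
            remainder.append((cursor, start))
        probe.append((start, end))
        cursor = end
    if cursor < high:
        remainder.append((cursor, high))
    return probe, remainder


probe, remainder = trim((5, 25), [(0, 10), (20, 30)])
# probe == [(5, 10), (20, 25)], remainder == [(10, 20)]
```

In a security-aware setting, one could imagine checking only the remainder parts against the policy if cached regions were already validated, but whether this is sound is precisely one of the questions the thesis has to examine.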

Organization

The PhD thesis will be organized as follows to consider run-time policy enforcement:

• Design an approach that exploits the interaction between security rules and data

dependencies to generate additional rules designed to avoid the problem of inferring

sensitive data.

• Develop a module for policy enforcement and run-time monitoring of queries using

semantic cache techniques.

• Perform dynamic policy adjustment by refining templates so as to improve the system’s

ability to block suspicious queries.

• Use available synthetic corpora of security policy requirements, such as IBM-CM, iTrust,

and CyberChair, to test our approach.
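
As a rough illustration of the first item, the sketch below derives additional rules from functional dependencies: if a set of attributes X determines a protected attribute y, then disclosing all of X lets a user infer y, so X is emitted as a combination that should not be disclosed in full. The function name derive_rules, the representation of dependencies as (X, y) pairs and the example attributes are hypothetical, and transitive dependencies are ignored.

```python
def derive_rules(
    protected: set[str],
    dependencies: list[tuple[frozenset[str], str]],
) -> list[frozenset[str]]:
    """For each dependency X -> y whose target y is protected, return X as an
    additional forbidden combination. Dependencies whose left-hand side already
    contains a protected attribute are skipped, since access to that side is
    restricted anyway."""
    return [
        lhs
        for lhs, rhs in dependencies
        if rhs in protected and not (lhs & protected)
    ]


deps = [
    (frozenset({"drug", "dosage"}), "diagnosis"),  # {drug, dosage} -> diagnosis
    (frozenset({"zip"}), "city"),                  # zip -> city (not protected)
]
extra_rules = derive_rules({"diagnosis"}, deps)
# extra_rules == [frozenset({"drug", "dosage"})]
```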

Bibliography

[1] Nobi, M. & Gupta, M. & Praharaj, L. & Abdelsalam, M. & Krishnan, R. & Sandhu, R. (2022).

Machine Learning in Access Control: A Taxonomy and Survey. 10.48550/arXiv.2207.01739.

[2] Xiao, X, Paradkar A, Thummalapenta S, Xie T (2012) Automated extraction of security

policies from natural-language software documents In: Proc. of the ACM SIGSOFT, NY, USA, FSE

’12, 12:1–12:11.

[3] Narouei, M, Khanpour H, Takabi H, Parde N, Nielsen R (2017) Towards a top-down policy

engineering framework for attribute-based access control In: Proc. of SACMAT ’17, ACM, 103–

114.

[4] John Heaps, Ram Krishnan, Yufei Huang, Jianwei Niu, and Ravi Sandhu (2021) Access

Control Policy Generation from User Stories Using Machine Learning. In Proc. of DBSec 2021,

LNCS, Springer, 19–20.

[5] Karimi, L., Abdelhakim, M., & Joshi, J.B. (2021). Adaptive ABAC Policy Learning: A

Reinforcement Learning Approach. ArXiv, abs/2105.08587.

[6] Farkas, C., & Jajodia, S. (2002). The inference problem: a survey. ACM SIGKDD Explorations

Newsletter, 4(2), 6-11.

[7] Bindschaedler, V., Grubbs, P., Cash, D., Ristenpart, T., & Shmatikov, V. (2017). The Tao of

inference in privacy-protected databases. Cryptology ePrint Archive.

[8] Stanley RM Oliveira and Osmar R Zaiane. “Privacy preserving clustering by data

transformation”. In: Journal of Information and Data Management 1.1 (2010), pp. 37–37.

[9] Sabrina De Capitani di Vimercati et al. “Confidentiality protection in large databases”. In: A

Comprehensive Guide Through the Italian Database Research Over the Last 25 Years. Springer,

2018, pp. 457–472.

[10] Carlos Barrios, Mohan Kumar: Service Caching and Computation Reuse Strategies at the

Edge: A Survey. ACM Comput. Surv. 56(2): 43:1-43:38 (2024)

[11] Shaul Dar, Michael J. Franklin, Björn Þór Jónsson, Divesh Srivastava, Michael Tan: Semantic

Data Caching and Replacement. VLDB 1996: 330-341

[12] Mikael Perrin, Jonathan Mullen, Florian Helff, Le Gruenwald, Laurent d'Orazio: Time-,

Energy-, and Monetary Cost-Aware Cache Design for a Mobile-Cloud Database System.

Big-O(Q)/DMAH@VLDB 2015: 71-85

[13] Van Long Nguyen Huu, Laurent d'Orazio, Emmanuel Casseau, Julien Lallet: MASCARA-FPGA

cooperation model: Query Trimming through accelerators. SSDBM 2021: 203-208

[14] Agoun, J., Terras, J., Hacid, M. S., & Hariri, S. (2023, December). Empowering Data

Federation Security in Polystore Systems. In 2023 20th ACS/IEEE International Conference on

Computer Systems and Applications (AICCSA) (pp. 1-8). IEEE.

List of thesis supervisors

D'Orazio Laurent

Type of supervision

Thesis director

Research unit

IRISA

Department

D7 - Data and Knowledge Management

Team

SHAMAN

Contact(s)

Name

D'Orazio Laurent

laurent.dorazio@irisa.fr

Keywords

Security by design, access control, caching