Nowadays, large quantities of graph data can be found in many fields, encoding information about their respective domains. Such data can reveal useful knowledge to the user that analyzes it. However, the size and complexity of real-life datasets hinders their usage by human analysts. To help the users, pattern mining approaches extract frequent local structures, called patterns, from the data, so that they can focus on inferring knowledge from them, instead of analyzing the whole data at once. A well-known problem in pattern mining is the so-called problem of pattern explosion. Even on small datasets, the set of patterns that are extracted by classic pattern mining approaches can be very large in size, and contain many redundancies. In this thesis we propose three approaches that use the Minimum Description Length principle in order to generate and select small, human-sized sets of descriptive graph patterns from graph data. For that, we instantiate the MDL principle in a graph pattern mining context and we propose MDL measures to evaluate sets of graph patterns. We also introduce the notion of ports, allowing to describe the data as a composition of pattern occurrences with no loss of information. We evaluate all our contributions on real-life graph datasets from different domains, including the semantic web.
Jilles Vreeken, Professor, Université de Saarland
Matthijs van Leeuwen, Associate Professor, Université de Leiden
Nathalie Pernelle, Professor, Université Sorbonne Paris Nord
Arnaud Soulet, Associate Professor, Université de Tours
Alexandre Termier, Professor, Université Rennes 1
Peggy Cellier, Associate Professor, Insa Rennes
Sébastien Ferré, Professor, Université Rennes 1