Professional journalism is of utmost importance nowadays. It is one of the main features distinguishing democracies from dictatorships, and a mirror that society sorely needs in order to look at itself and understand how it functions. This understanding, in turn, is necessary for making informed decisions, such as political choices.
With the world turning increasingly digital, journalists need to analyze very large amounts of data while having no control over its structure, organization, or format. Since 2013, my team has been working to understand data journalism and computational fact-checking use cases, and to identify and develop tools adapted to this challenging setting. I will describe our SourcesSay project (2020-2024), in which extremely heterogeneous data sources are integrated as graphs, on top of which journalistic applications can be supported through flexible graph queries. I will explain the data source integration module, the role played by Information Extraction and Entity Disambiguation, and novel techniques to explore and simplify these graphs.
This work is joint with Angelos Anadiotis, Oana Balalau, Helena Galhardas, Tayeb Merabti, Emmanuel Pietriga, and many other colleagues.
Project Web site: https://sourcessay.inria.fr