Defense type
PhD thesis
Start date
End date
Location
IRISA Rennes
Room
Métivier
Speaker
Cyrielle MALLART (Linkmedia)
Main department
Subject
Abstract:
This thesis presents several contributions to relation extraction and classification in articles from Ouest-France, the largest newspaper in France. This use case raises several challenges related to the available data, including a lack of annotated corpora and imbalanced data. This work therefore discusses two possible ways to apply high-performing state-of-the-art methods to this scenario, while questioning the relevance of state-of-the-art models in this setting.
The first approach detects irrelevant entity pairs and filters them out before they reach a classification model, so that classification quality improves through better-quality samples to predict. The second approach is active learning: samples are fed to the model incrementally, and at each iteration the samples are selected so as to maximize the prediction performance of the relation classification model.
Both approaches improve the performance of simple relation classification models, whereas the complexity of state-of-the-art models proves incompatible with the type and amount of data currently available at Ouest-France.
Additionally, we briefly explore several options for unsupervised relation extraction, which cannot be adapted to our task, and for self-supervised representation of relations, which shows results encouraging enough to be explored further in future work.
Jury composition
Frédéric BÉCHET : LIS UMR 7020 CNRS / AMU / UTLN
Guillaume GRAVIER : UMR IRISA/CNRS
Pierre-François MARTEAU : ENSIBS
Pascale SEBILLOT : UMR IRISA/Insa
Xavier TANNIER : LIMICS, UMRS 1142
Anne VILNAT : LIMSI