Defense type
Thesis
Starting date
End date
Location
IRISA Rennes
Room
Métivier
Speaker
Cyrielle MALLART (Linkmedia)
Main department
Theme
Abstract:
This thesis presents several contributions on relation extraction and classification in articles from Ouest-France, the largest newspaper in France. This use case raises several challenges around the available data, including a lack of annotated corpora and unbalanced data. The work therefore discusses two ways of applying high-performing state-of-the-art methods to this scenario, while questioning the relevance of state-of-the-art models here.
The first approach detects irrelevant entity pairs and filters them out before the classification model, improving classification quality by improving the quality of the samples to predict. The second approach is active learning, in which samples are fed to the model incrementally, selecting at each iteration the samples that maximize the prediction performance of the relation classification model.
Both approaches improve the performance of simple relation classification models, whereas the complexity of state-of-the-art models proves incompatible with the type and amount of data currently available at Ouest-France.
Additionally, we briefly explore several options for unsupervised relation extraction, which proves not adaptable to our task, and for self-supervised representation of relations, which shows results encouraging enough to be explored in the future.
Composition of the jury
Frédéric BÉCHET : LIS UMR 7020 CNRS / AMU / UTLN
Guillaume GRAVIER : UMR IRISA/CNRS
Pierre-François MARTEAU : ENSIBS
Pascale SEBILLOT : UMR IRISA/Insa
Xavier TANNIER : LIMICS, UMRS 1142
Anne VILNAT : LIMSI