Research topics
Broadly speaking, my research focuses on the analysis of multimedia
documents, with a constant concern for proposing (stochastic) models
that combine all available sources of knowledge. This general
philosophy translates into two main areas:
- Spoken document
analysis: detecting and tracking audio events in videos; speaker
segmentation and tracking;
speech recognition; topic segmentation; spoken document indexing. I am
currently
interested in the following topics:
- combining ASR and NLP for robust spoken document
analysis
- integrating knowledge (e.g.
phonetic landmarks) in
HMM-based ASR
- motif/word discovery in audio streams
- Multimedia stream
modeling: joint models of multimedia streams for video analysis.
The aim of this research is to devise models that can integrate the
audio, visual and possibly textual information and represent their
relations (temporal synchronisation model, correlations, etc.) for the
analysis and structuring of videos and for audiovisual ASR. Current
activities include:
- learning the dependencies in Bayesian networks for
event detection in videos
- multimodal topic segmentation
- speech-driven structuring of TV streams
Recent participation in projects (my contribution in parentheses)
I am currently involved in the following projects:
- OpenSEM: an EIT ICT Labs open portal for semantic access to videos
(video and spoken content analysis, navigation portal, program
committee of MediaEval 2011)
- Rev-TV: using virtual reality for television program edition (speech
recognition, lip sync)
- Attelage de Systèmes Hétérogènes (ASH): harnessing heterogeneous
speech recognition systems for collaborative speech recognition (speech
recognition, knowledge integration)
- Évaluations en Traitement Automatique de la
Parole (ETAPE): evaluation campaign on TV stream transcription for the
French language (on behalf of the AFCP)
- Quaero:
multimodal search engines (audio event
detection, multimodal integration, video structure
analysis)
Over the last few years, I have participated in the following projects:
- Rapsodis: improving speech recognition with syntax
and semantics
- Demi-Ton:
multimedia stream structuring (multimedia
integration, video structuring, speech transcription, transcribed text
analysis)
- Pelops: Soccer video analysis and repurposing (sound
class detection, word spotting)
- ESTER:
French spoken document rich transcription evaluation campaign (campaign
organization; BN rich transcription system development)
- Domus Videum: video abstracting and navigation (sound
class detection, multimedia integration)
Participation in the activities of the MUSCLE European Network of
Excellence.
Ph. D. students
Ongoing Ph. D. theses I am supervising:
- Ludivine Kuznik. Browsing news archives (in collaboration with
INA - funding pending)
- Cédric Penet. Multimodal content based analysis for video
on demand (in collaboration with Technicolor)
- Stefan Ziegler. Landmark driven speech recognition
- Julien Fayolle. Information retrieval in TV streams
- Camille Guinaudeau. Speech-based video structuring
Past Ph. D. students:
- Armando Muscariello. Variability tolerant discovery of arbitrary
repeating patterns in audio data with template matching. Ph. D. thesis,
Université de Rennes 1, January 2011.
- Gwénolé Lecorvé. Unsupervised
topic adaptation for robust speech recognition. Ph. D. thesis,
Université de Rennes 1, November 2010 (in French).
- Siwar Baghdadi. Sparse events detection in videos
with Bayesian networks. Ph. D. thesis, Université de Rennes 1,
February 2010 (in French).
- Wen Xuan Teng. Rapid speaker
adaptation using a variable subspace of reference models. Ph.
D. thesis, Université de Rennes 1, December 2008.
- Stéphane Huet. Morpho-syntactic knowledge and topic
adaptation to improve speech recognition. Ph. D. thesis, Université
de Rennes 1, December 2007 (in French).
- Manolis Delakis. Multimodal
tennis video structure analysis with segment models. Ph. D.
thesis, Université de Rennes 1, October 2006.
- Ewa Kijak. Multimodal
sport video structuring with stochastic models. Ph. D.
thesis,
Université de Rennes 1, 2003 (in French).
Other Ph. D. theses in which I have been or am involved (without
supervising in any way):
- Romain Tavenard. Indexation de séquences de descripteurs
pour exploiter audio et vidéo.
- Xavier Naturel. Automatic structuring of TV streams.
Ph. D. thesis, Université de Rennes 1, 2007 (in French).
- Mathieu Ben. Robust approaches for automatic speaker verification
using normalization and hierarchical adaptation. Ph. D. thesis,
Université de Rennes 1, 2004 (in French).
Software development
I am actively participating in the development of the following free
software toolkits:
- SPro,
a speech signal processing toolkit
- AudioSeg,
generic tools for audio segmentation
- Sirocco,
a large vocabulary decoder for speech recognition
These toolkits form the basis (with a little help from HTK) of the
IRENE broadcast news indexing platform, originally developed for the
French ESTER rich transcription evaluation campaign in collaboration
with François Yvon. Also check out my free ESTER resources page.
Within the ASR/NLP working group that I co-lead, we have developed
several pieces of code related to spoken document analysis. Among
them, worth mentioning are:
- IRISA News Topic Segmenter: wrapper to
topic-segmenter for the segmentation of broadcast news
- kiwi: keyword extraction from transcripts
- fishnet: fish texts on the Internet related to a
topic characterized by a few keywords (as given by kiwi)
- match-maker: corpus based acquisition of semantic
relations (and a bunch of relations from a large newspaper corpus)
These toolkits are not freely distributed open-source software, but
we are nevertheless willing to share them. Feel free to contact me
should you be interested in any of them.
Selected recent publications
- Gwénolé Lecorvé,
Guillaume Gravier, and Pascale Sébillot. Automatically finding
semantically consistent N-grams to add new words in LVCSR systems. In
IEEE Intl. Conf. on Acoustics, Speech and Signal Processing
(ICASSP), 2011.
- Camille Guinaudeau,
Guillaume Gravier, and Pascale Sébillot. Improving
ASR-based topic segmentation of TV programs with confidence measures
and semantic relations.
In Proc. Annual Conf. of the Intl. Speech Communication Association
(Interspeech),
2010.
- Stéphane Huet, Guillaume Gravier, and Pascale Sébillot. Improvement
of automatic speech recognition systems with morpho-syntax applied to
French. Computer Speech and Language, (24):663-684, 2010.
- Armando
Muscariello, Guillaume
Gravier, and Frédéric Bimbot. Audio
keyword
extraction by unsupervised word discovery. In Conf.
of the Intl. Speech Communication Association (Interspeech), pages
2843-2846, 2009.
- Sylvain Galliano,
Guillaume Gravier, and Laura Chaubard. The ESTER
2 evaluation campaign for the rich transcription of French radio
broadcasts.
In Proc. Annual Intl. Speech Communication Association
Conference (Interspeech),
pages 2583-2586,
2009.
- Manolis
Delakis, Guillaume
Gravier, and Patrick Gros. Audiovisual Integration with
Segment Models for Tennis Video Parsing. Computer Vision
and Image Understanding, 111(2):142-154, August 2008.
- Siwar
Baghdadi, Guillaume
Gravier, Claire-Hélène Demarty, and Patrick Gros. Structure
learning in
Bayesian network based video indexing. In IEEE Intl.
Conf. on Multimedia and Exhibition, pages 667-680, 2008.
- Wen Xuan
Teng, Guillaume Gravier,
Frédéric Bimbot, and Frédéric Soufflet. Speaker
adaptation by
variable reference model subspace and application to large vocabulary
speech recognition. In IEEE Intl. Conf. on Acoustics,
Speech and Signal Processing, pages 4381-4384, April 2009.
- Gwénolé Lecorvé, Guillaume Gravier, and Pascale Sébillot. An
unsupervised Web-based topic language model adaptation method. In IEEE
Intl. Conf. on Acoustics, Speech and Signal Processing, pages
5081-5084, April 2008.
Check out my complete
list of publications.
Short bio
I obtained a master's degree in Applied Mathematics at the Institut
National des Sciences Appliquées (INSA Rouen) in 1995 and worked on
speech synthesis at ELAN Informatique from 1996 to 1997. I received a
Ph. D. in Signal and Image Processing (Toward speech modeling with
Markov random fields) at the École Nationale Supérieure des
Télécommunications (ENST Paris) in 2000. After a one-year
post-doctoral stay at Irisa, I worked in the Audio Visual Speech
Technology group at the IBM T. J. Watson Research Center from 2001 to
2002. Since 2002, I have been a research fellow at the Centre National
de la Recherche Scientifique (CNRS), working at the Institut de
Recherche en Informatique et Systèmes Aléatoires (IRISA). I received
the Habilitation à Diriger des Recherches (HDR) in Computer Science
from the Université de Rennes 1 in 2009.
Guillaume
Gravier, Irisa, Campus de Beaulieu, 35042 Rennes Cedex, France.
Tel : +33 2 99 84 72 39 / Fax : +33 2 99 84 71 71
firstname.secondname@irisa.fr