François COSTE
Research scientist (CR1), Symbiose team
Address: Symbiose, IRISA,
Campus de Beaulieu,
35042 Rennes Cedex,
France
Phone: (33|0) 2 99-84-74-91
Secretary: (33|0) 2 99-84-73-34
Fax: (33|0) 2 99-84-71-71
News
Learning grammars and application to linguistic modelling of biological sequences
I gave a tutorial on this subject at the tutorial day organized for the 10th anniversary of ICGI (ICGI'10). Here are the slides (6.3 MB) and the related bibliography.
Software
- Protomata Learner infers automata that model families of protein sequences. You can use it through a web interface on the GenOuest bioinformatics platform server. Here are some slides (4.4 MB) from its presentation at Gen2Bio 2008.
We are working on a new version, which will be available soon: stay tuned!
Grammatical Inference Benchmarks and Competitions
- I am building a grammatical inference benchmarks repository (GIB): don't hesitate to contact me with your own data sets, especially real-world ones!
- I maintain the Gowachin server, a continuation of the Abbadingo One DFA learning competition, which can generate parameterized problems. I also co-organized Omphalos, the competition on learning context-free languages; it is now over, but its data sets are still available.
If you are interested in grammatical inference competitions, have a look at the 2010 competitions: Zulu and Stamina.
PhD Students
- Gaelle Garet, Discovery of enzymatic functions in the framework of formal languages (with Jacques Nicolas).
Former PhD Students:
- Matthias Gallé, Searching for Compact Hierarchical Structures in DNA by means of the Smallest Grammar Problem, January 2011
- Goulven Kerbellec, Apprentissage d'automates modélisant des familles de séquences protéiques (Learning automata that model families of protein sequences), June 2008
- Marie Lahaye, Apprentissage de signatures topologiques de protéines (Learning topological signatures of proteins). Marie left us far too soon, but we have not forgotten her...
- Aurélien Leroux, Inférence grammaticale sur des alphabets ordonnés (Grammatical inference on ordered alphabets), June 2005 (main supervisor: Jacques Nicolas).
- Daniel Fredouille, Inférence d'automates finis non déterministes par gestion de l'ambiguïté, en vue d'applications en bioinformatique (Inference of non-deterministic finite automata by managing ambiguity, with a view to applications in bioinformatics), October 2003
Projects
I am currently involved in the following projects:
- ANR LepidOLF: Microgenomics of the pheromone-sensitive sensillum of a lepidopteran: a novel approach to understanding olfactory mechanisms and their modulation
- ANR Pelican: Competing for light in the ocean: An integrative genomic approach of the ecology, diversity and evolution of cyanobacterial pigment types in the marine environment
- MINCyT (formerly SECyT) - INRIA collaboration with the "Grupo de Procesamiento de Lenguaje Natural" (Natural Language Processing Group) of Gabriel Infante-Lopez: Linguistic modelling of genomic sequences by grammar learning
Previous projects:
- ANR Proteus: Fold recognition and inverse folding: towards large-scale prediction of protein structures
- ANR Modulome: Deciphering and modelling the structural organization of genomes
Teaching
Selected publications (a more exhaustive list is available here)
- Searching for Smallest Grammars on Large Sequences and Application to DNA,
Rafael Carrascosa, François Coste, Matthias Gallé, Gabriel Infante-Lopez,
Journal of Discrete Algorithms, in press, available online, 2011
- The Smallest Grammar Problem as Constituents Choice and Minimal Grammar Parsing,
Rafael Carrascosa, François Coste, Matthias Gallé, Gabriel Infante-Lopez,
Algorithms, 4 (2011) 262-284
extended and more formal version of the paper "Choosing Word Occurrences for the Smallest Grammar Problem" presented at LATA 2010
- Modelling Biological Sequences by Grammatical Inference,
François Coste,
ICGI 2010 Tutorial Day
paper, slides
- In place update of suffix array while recoding words,
Matthias Gallé, Pierre Peterlongo and François Coste
International Journal of Foundations of Computer Science, vol. 20, no. 6, 2009, pp. 1025-1045
abstract, paper
extended version of paper presented at PSC 2008 (abstract, paper, slides)
supplementary material (code, data sets, experiments)
- Learning Automata on Protein Sequences, François Coste and Goulven Kerbellec, JOBIM 2006.
abstract, paper, slides (.pdf)
- A Similar Fragments Merging Approach to Learn Automata on Proteins, François Coste and Goulven Kerbellec, ECML 2005.
abstract, paper, extended version, data sets
Recent slides presenting this work and more at a grammatical inference workshop: slides, 4 per page for printing
- Progressing the State-of-the-Art in Grammatical Inference by Competition, Brad Starkie, François Coste and Menno van Zaanen, AI Communications, vol. 18, no. 2, 2005, pp. 93-115.
abstract, paper, slides (.ppt) presented at ICGI 2004
- Introducing Domain and Typing Bias in Automata Inference, François Coste, Daniel Fredouille, Christopher Kermorvant and Colin de la Higuera. ICGI 2004.
abstract, paper, slides (.ppt, 2.2MB)
- Mutually compatible and incompatible merges for the search of the smallest consistent DFA, John Abela, François Coste and Sandro Spina. ICGI 2004.
abstract, paper, slides (.ppt)
- What is the Search Space for the Inference of Non Deterministic, Unambiguous and Deterministic Automata? François Coste, Daniel Fredouille, Tech. Report RR-4907, 2003
- Efficient ambiguity detection in C-NFA, a step toward inference of non deterministic automata, François Coste, Daniel Fredouille, ICGI 2000, Grammatical Inference: Algorithms and Applications, Lisbon, pp. 25-38, September 2000
paper (.ps.gz), benchmark (.tar.gz).
Classification ambiguity!
Ph.D. Thesis
Apprentissage d'automates classifieurs en inférence grammaticale (Learning classifier automata in grammatical inference), IRISA/Université de Rennes 1, 27 January 2000.
Advisor: Jacques Nicolas.
abstract (English and French), thesis (.ps.gz, .pdf, errata), slides (.ps.gz, .pdf).