|
Goto: Research, Publications, Teaching, Software
I succesfully defended my thesis in Februrary 2011, and I am doing now a postdoc at XRCE with the MLDAT team. This page will probably become increasingly out-of-date.
Research Interests
Grammatical Inference & DNA sequences
I look what context-free grammars can tell us about DNA sequences. Specifically, I infer straight-line grammars to represent such sequences and see what they reveal on the underlying structure.
In our latest work we focus on the Smallest Grammar Problem, and we split the problem of searching for such grammars into two. This permits us to define a search space, and we define algorithms that traverse it. We improve state-of-the-art approximation algorithms for the Smallest Grammar Problem: our experiments reveal that not only the grammars we found are smaller, but that the structures they unveil are dramatically different.
Text Algorithms
We propose an algorithm to update a suffix array while replacing some of the occurrences of a word in the indexed text. This permits to accelerate SLG algorithms. More details and a some experiments can be found here.
More recently, I focused on a more specific class of repeats, called largest-maximal repeats and their relationship with a similar class for rigid patterns. Contact me if you want to know more about this...
Data Compression
Straight-line grammars have been used for compressing sequences in a framework called Grammar Based Codes. I extended this concept beyond context-freeness and have some promising results on compression of DNA sequences.
Publications
This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.
2011 |
|
R. Carrascosa, F. Coste, M. Gallé, G. Infante-Lopez
The Smallest Grammar Problem as Constituents Choice and Minimal Grammar Parsing. [pdf]
|
2011 |
|
R. Carrascosa, F. Coste, M. Gallé, G. Infante-Lopez
Searching for Smallest Grammars on Large Sequences and Application to DNA Accepted for publication at Journal of Discrete Algorithms [pdf]
|
2009 |
|
M. Gallé, P. Peterlongo, F. Coste
In-Place Update of Suffix Array While Recoding Words. [pdf] [bib] [html]
International Journal of Foundations of Computer Science 20-6. pages 1025-1045.
|
2010
|
|
M. Gallé
A New Tree Edit Distance for Structural Comparison of Sequences. [pdf] [bib]
Dagstuhl Seminar Proceedings: "Structure Discovery in Biology: Motifs, Networks & Phylogenies"
|
2010 |
|
R. Carrascosa, F. Coste, M. Gallé and G. Infante-Lopez
Choosing Word Occurrences for the Smallest Grammar Problem. [pdf] [slides] [bib]
In Proceedings of the 4th International Conference on Language and Automata Theory and Applications. pages 154-165.
|
2008 |
|
M. Gallé, P. Peterlongo, F. Coste
In-Place Update of Suffix Array While Recoding Words preliminary version. [pdf] [slides] [bib]
Prague Stringology Club 2008
|
2011 |
|
M. Gallé
Searching for Compact Hierarchical Structures in DNA by means of the Smallest Grammar Problem [contact me for the pdf] [slides]
PhD thesis. U. Rennes 1
|
2007 |
|
M. Gallé
Efficient search of similar instances [pdf (spanish)] [slides (spanish)]
Masther thesis. Universidad Nacional de Córdoba
|
See also the final report of internship of:
2010
M Perrin
|
Compression de sèquences d'A.D.N. à base de grammaires minimales [pdf (fr] [slides (fr)] |
ENS, 3rd year |
Teaching
Software
If you want some really efficient software to calculate exact repetas, maximal, largest maximal or irreducible rigid patterns (besides any program used in one of my publications, contact me.
|
|