a.fr
Address
Symbiose - Room A 109
INRIA/Irisa - Campus de Beaulieu
35042 RENNES Cedex - France
Tel: +33 2 99 84 73 12
Fax: +33 2 99 84 71 71

Current Position
Senior Research Scientist at INRIA
Research interests
- Bioinformatics
  - Algorithmics on sequences
  - Syntactical Analysis of Biological Sequences
  - Pattern Discovery
- Machine Learning
  - Grammatical Inference
  - Version spaces, Decision trees, Inductive Logic Programming
- Logic Programming
  - Prolog
  - Answer Set Programming
My background is in Computer Science and Machine Learning.
I started my research work in 1984 in a team interested in Knowledge Representation (KR), working mainly on various logical frameworks. My PhD thesis focused on the issue of generalization within the framework of Version spaces, with a representation language that was a decidable subset of first-order predicate logic (the so-called Bernays-Schönfinkel class). The aim was to deductively produce a formula that was a consequence of a given set of formulae (positive and negative instances of a concept to be learned). Generalization was achieved using a set of elementary operators and a dedicated theorem prover. I continued this work in the field of Artificial Intelligence for several years.
I discovered issues of knowledge representation and classification in biology through a decisive encounter with J. Lebbe and R. Vignes in 1989, and molecular biology and bioinformatics through a summer school in Paris in 1998, thanks to people like A. Danchin, A. Henault and J.-L. Risler. This was quite a revelation, and since then I have been trying to share and pass on this enthusiasm. Bioinformatics is not only an opportunity to meet people from many scientific fields and to be introduced to the richness of the various mechanisms of life: it is also a source of challenging problems in computer science.
Helping with modelling is a key role of the bioinformatician. My basic line of research follows the idea that, unlike many chemical or physical processes, biological mechanisms are largely governed by a logic of discrete behaviours. This follows from the compact, hierarchical architecture of cells and from the importance of relations between components, both characteristic of living organisms. In such a context, I am convinced that symbolic techniques must play a major part in the study of life, whether in combinatorial data analysis, in machine learning or in automated reasoning. I am mainly interested in macromolecular sequences and in explicit models relating sequences to structures or functions. I try to develop the point of view of language theory in the analysis of sequences, with the double aim of formalizing meaningful classes and of giving biologists access to the power of expressive languages.
In particular, I am in charge of the research axis "Analysis of sequences with formal languages" in Symbiose. I am interested in syntactical modeling of either nucleic or protein sequences. This axis is made up of two parts.
The first part studies the formal and practical consequences of considering protein sequences in Grammatical Inference. The aim is to learn relevant characteristic models from sets of sequences that are known to belong to a target family or, on the contrary, known not to belong to this family. I have supervised several theses on this topic, including difficult questions like "how to infer non-deterministic automata, since they seem better suited than deterministic ones to expressing biological models?" (D. Fredouille), "how to take into account a partially ordered structure on the subsets of the alphabet during inference, each subset reflecting some physico-chemical property of amino acids?" (A. Leroux) or "how to learn non-regular patterns such as the contextual structures met in disulfide bonds in proteins?" (I. Jacquemin).
The second part considers that building the model is the biologist's task; the challenge is then to offer him or her a language of maximal expressivity while still allowing whole-genome analysis (billions of letters). Our approach is to compile data into efficient data structures like suffix arrays and to develop parsers on top of an abstract machine running on this data structure. Concerning expressivity, we are conducting research on a language of logical string variables, which allows a string and its transformations to be handled in an abstract way. We have already validated this framework on several biological problems: the discovery of dog olfactory receptors, of human beta-defensins and of transposons in A. thaliana.
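The suffix-array idea mentioned above can be illustrated with a minimal Python sketch. It is not the Logol engine or its abstract machine: it only shows how, once the suffixes of a sequence are sorted, all exact occurrences of a motif lie in one contiguous block of the array and can be located by binary search. The function names `build_suffix_array` and `find_occurrences` are hypothetical, the construction is deliberately naive (genome-scale tools use linear or near-linear construction algorithms), and only exact matching is handled.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text`, in lexicographic order.

    Naive construction (sorting suffix copies): fine for toy inputs only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return the sorted start positions where `pattern` occurs exactly in `text`."""
    m = len(pattern)

    def prefix(k: int) -> str:
        # First m characters of the k-th smallest suffix.
        return text[sa[k]:sa[k] + m]

    # Lower bound: first suffix whose m-prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[start:lo] begins with pattern.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    genome = "ACGTACGT"
    sa = build_suffix_array(genome)
    print(find_occurrences(genome, sa, "ACG"))  # [0, 4]
```

The costly sort is done once; each later query then needs only O(m log n) character comparisons for a pattern of length m in a text of length n, which is what makes indexing attractive when many patterns are searched against the same genome.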
Selected Publications
Research in computational biology in Europe
Quick overview of the research axes of Symbiose
Linguistic Analysis of biological sequences
Logol, the successor of Stan, is now available on the Genouest web site and is, to date, the best available tool with this level of expressivity for parsing whole genomes.
We are also interested in the lexical level and in the detection of genomic repeats (modules) within structures such as transposable elements or CRISPRs.
An exploratory tool for repeats mining in genomes and its application to the detection of genomic transfers between viruses and bacteria.
- Domain organization within repeated DNA sequences: application to the study of a family of transposable elements, S. Tempel, M. Giraud, D. Lavenier, I.-C. Lerman, A.-S. Valin, I. Couée, A. El Amrani and J. Nicolas, Bioinformatics, 2006, vol. 22, no. 16, pp. 1948-1954.
- Model-based Identification of Helitrons Results in a New Classification of Their Families in Arabidopsis thaliana, S. Tempel, J. Nicolas, A. El Amrani and I. Couée, Gene, 403(1-2):1299-1305, 2007.
Segmentation of a family of genomic sequences into meaningful domains applied to the analysis of mobile genetic elements
Habilitation Document (HDR, in French)
Papers submitted
- Local and Maximal Repeats, J. Nicolas, C. Rousseau, A. Siegel, P. Peterlongo, F. Coste, P. Durand, S. Tempel, A.-S. Valin and F. Mahé.
Pattern discovery in biological sequences
A review of the state of the art
- Disulfide bonds prediction using inductive logic programming, I. Jacquemin and J. Nicolas, Workshop on Constraint Based Methods for Bioinformatics (WCB), Sitges, Spain, 2005, pp. 56-65.
- Cooperative metaheuristics for exploring proteomic data, R. Gras, D. Hernandez, P. Hernandez, N. Zangger, Y. Mescam, J. Frey, O. Martin, J. Nicolas and R. Appel, Artificial Intelligence Review, 20(1):95-120, 2004.
- Genome wide distribution and potential regulatory functions of AtATE, a novel miniature inverted-repeat transposable element that is present in the promoter region of one of the Arginine Decarboxylase genes in Arabidopsis thaliana, A. Elamrani, L. Marie, A. Aïnouche, J. Nicolas and I. Couée, Molecular Genetics and Genomics, 267, 2001, pp. 459-471.
- A symbolic-numeric approach to find patterns in genomes: application to the translation initiation sites of E. coli, C. Delamarche, P. Guerdoux-Jamet, R. Gras and J. Nicolas, Biochimie, 81, Elsevier, 1999.
Machine learning applied to
Gene Discovery
- The dog and rat olfactory receptor repertoires, P. Quignon, M. Giraud, M. Rimbault, P. Lavigne, S. Tacher, E. Morin, E. Retout, A.-S. Valin, K. Lindblad-Toh, J. Nicolas and F. Galibert, Genome Biology, 6(10):R83, 2005.
More than 1000 olfactory receptor genes were discovered in a non-assembled version (36 M sequences) of the dog genome.
- TrackProt: Looking for new Human beta-defensins in whole genomes, with a syntactical approach, J. Nicolas, F. Bourgeon, Y. Bastide, G. Ranchy, C. Alland, F. Aubry, Y. Mescam, B. Jegou and C. Pineau.
More than 30 new human beta-defensins (antimicrobial peptides) have been discovered and validated.
Metabolomics
Theorem proving
Grammatical Inference
A study on grammatical inference in the framework of logic programming
- How considering incompatible state mergings may reduce the DFA induction search tree, F. Coste and J. Nicolas, Fourth International Colloquium on Grammatical Inference (ICGI'98), Ames, Iowa, USA, 1998.
- Inference of finite automata: reducing the search space with an ordering of pairs of states, F. Coste and J. Nicolas, 10th European Conference on Machine Learning (ECML'98), Chemnitz, Germany, 1998.
- Regular Inference as a graph coloring problem, F. Coste and J. Nicolas, ICML'97 Grammatical Inference Workshop, Nashville TN, USA, 1997.
Clustering
- Sequence classification of water channels and related proteins in view of functional predictions, B. Tallur, J. Nicolas, A. Froger, D. Thomas and C. Delamarche, Theoretical Chemistry Accounts, 1998.
- A method for classifying unaligned biological sequences. B. Tallur and J. Nicolas, in IFCS-96: Data Science, Classification and Related Methods, Springer Verlag, Tokyo, 1997.
- Twelve numerical, symbolic and hybrid supervised classification methods, O. Gascuel, B. Bouchon-Meunier, G. Caraux, P. Gallinari, A. Guénoche, Y. Guermeur, Y. Lechevallier, C. Marsala, L. Miclet, J. Nicolas, R. Nock, M. Ramdani, M. Sebag, B. Tallur, G. Venturini and P. Vitte, Int. J. of Pattern Recognition and Artificial Intelligence, 12(5), 1998, pp. 517-572.
See all publications.
Former PhD students
Recent Projects
Basic Lab.: Biocellular Assistant on a Silicium Intelligent Chip
- This is a very preliminary project aiming to integrate bioinformatics into labs-on-chips. The challenge is to control an experimental micro- or nano-scale device with automatic reasoning capacities. See the superb project of R. King et al. on the Robot Scientist Adam. We are interested in trying similar approaches: probably more to come here next year.
Teaching
Cursus
Position
- Since 2002 Team Leader of Inria Project Symbiose (Bioinformatics, 27 people)
- Since 2002 Head of Bioinformatics for Ouest Genopole (a consortium of more than 50 public laboratories, mostly biology labs, for large-scale analysis in genomics and post-genomics).
- 1998-2001 Team Leader of Inria Project Aïda (Artificial Intelligence, Machine Learning and Diagnosis, 34 people)
- 1988-1997 Member of Inria Project Repco (Knowledge representation, Team Leader Philippe Besnard)
Panels
- Member of the Scientific and Research Council of Ouest Genopole since January 2002
- Member of the Scientific and Research Council of the MIA department of INRA since October 2002
- Member of the Scientific and Research Council of "Animal Bioinformatics" INRA since January 2006
- Member of the program committee of JOBIM and ICGI
Education
- 1987 : PhD thesis in Computer Science, University of Rennes
Links