All publications sorted by year |
2022 |
We study some learnability problems in the family of Categorial Dependency Grammars (CDG), a class of categorial grammars defining dependency structures. CDG is a formal system, where types are attached to words, combining the classical categorial grammars' elimination rules with valency pairing rules defining non-projective (discontinuous) dependencies; very importantly, the elimination rules are naturally extended to the so-called "iterated dependencies" expressed by a specific type constructor and related elimination rules. This paper first reviews key negative results: even rigid (one type per word) CDG can be learned neither from function/argument structures nor from dependency structures themselves. Such negative results prove the impossibility of defining a learning algorithm for these grammar classes. Nevertheless, we show that CDG satisfying reasonable and linguistically valid conditions on iterated dependencies are incrementally learnable in the limit from dependency structures. We provide algorithms and also discuss these aspects for recent variants of the formalism that allow the inference of CDG from linguistic treebanks.
The Georgian language has a complex verbal system, both agglutinative and inflectional, with many irregularities. The inflected forms of a verb can differ greatly from one another. A good knowledge of Georgian grammar is needed to get back to the infinitive (the most frequent access lemma in dictionaries). Dictionary access is therefore very difficult for beginners. Moreover, there is no consensus among lexicographers of Georgian on which lemmas should represent a verb in dictionaries, which makes access even more complex. We propose Kartu-Verbs, a base of inflected Georgian verb forms accessible through a logical information system. This demonstration shows how, starting from any inflected form, one can find the relevant lemma to access any dictionary. Kartu-Verbs can thus be used as a front-end to Georgian dictionaries.
This paper advocates for the integration of environmental aspects in computing curricula, with a focus on higher education. We created knowledge-based curriculum specifications in order to help teachers who wish to add foundational knowledge on the impacts of computing. This document lists topics and references that can be integrated into curricula. We have implemented it in several higher education institutions. This paper reports on our experience and feedback. We also discuss recommendations to overcome obstacles that, from our experience, are often faced when modifying computing curricula to integrate environmental challenges.
2021 |
The specification of technical systems is a complex and error-prone task. From a methodological point of view, the expected characteristics must be rigorously specified. In practice, specifications group together the desired properties in the form of a list of rules to be checked, called requirements. The challenge of this book is to build an analysis process applicable to textual documents written in a natural language such as English. The targeted implementation is an end-to-end automated processing chain integrating faculties of interpretation and reasoning on specification data. More precisely, we propose to study and experiment with how to link natural language statements to formal models that can be exploited in a theoretical framework. First, the principle of semantic transduction is put forward to extract and formalize natural language statements. In a second step, the algebraic properties of specification models are studied to define a theory for verifying the consistency of requirements.
A growing part of Big Data is made of knowledge graphs. Major knowledge graphs such as Wikidata, DBpedia or the Google Knowledge Graph count millions of entities and billions of semantic links. A major challenge is to enable their exploration and querying by end-users. The SPARQL query language is powerful but provides no support for exploration by end-users. Question answering is user-friendly but is limited in expressivity and reliability. Navigation in concept lattices supports exploration but is limited in expressivity and scalability. In this paper, we introduce a new exploration and querying paradigm, Abstract Conceptual Navigation (ACN), that merges querying and navigation in order to reconcile expressivity, usability, and scalability. ACN is founded on Formal Concept Analysis (FCA) by defining the navigation space as a concept lattice. We then instantiate the ACN paradigm to knowledge graphs (Graph-ACN) by relying on Graph-FCA, an extension of FCA to knowledge graphs. We continue by detailing how Graph-ACN can be efficiently implemented on top of SPARQL endpoints, and how its expressivity can be increased in a modular way. Finally, we present a concrete implementation available online, Sparklis, and a few application cases on large knowledge graphs.
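As a point of reference for what such a guided interface sits on top of, the sketch below (illustrative only; the endpoint and query are our own choices, not taken from the paper) shows programmatic access to a public SPARQL endpoint with the SPARQLWrapper Python library. Sparklis-style tools build and evaluate this kind of query step by step on behalf of the user.

    from SPARQLWrapper import SPARQLWrapper, JSON

    # Query a public endpoint directly, as Graph-ACN does behind the scenes.
    endpoint = SPARQLWrapper("https://dbpedia.org/sparql")
    endpoint.setQuery("""
        PREFIX dbo: <http://dbpedia.org/ontology/>
        SELECT ?film ?director WHERE {
            ?film a dbo:Film ;
                  dbo:director ?director .
        } LIMIT 10
    """)
    endpoint.setReturnFormat(JSON)
    results = endpoint.query().convert()
    for row in results["results"]["bindings"]:
        print(row["film"]["value"], "--", row["director"]["value"])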
The open nature of Knowledge Graphs (KG) often implies that they are incomplete. Knowledge graph completion (a.k.a. link prediction) consists in inferring new relationships between the entities of a KG based on existing relationships. Most existing approaches rely on the learning of latent feature vectors for the encoding of entities and relations. In general, however, latent features cannot be easily interpreted. Rule-based approaches offer interpretability but a distinct ruleset must be learned for each relation. In both latent- and rule-based approaches, the training phase has to be run again when the KG is updated. We propose a new approach that does not need a training phase, and that can provide interpretable explanations for each inference. It relies on the computation of Concepts of Nearest Neighbours (C-NN) to identify clusters of similar entities based on common graph patterns. Different rules are then derived from those graph patterns, and combined to predict new relationships. We evaluate our approach on standard benchmarks for link prediction, where it achieves competitive performance compared to existing approaches.
During the last decade, the need for reliable and massive Knowledge Graphs (KG) increased. KGs can be created in several ways: manually with forms, or automatically with Information Extraction (IE), a natural language processing task for extracting knowledge from text. Relation Extraction is the part of IE that focuses on identifying relations between named entities in texts, which amounts to finding new edges in a KG. Most recent approaches rely on deep learning, achieving state-of-the-art performance. However, this performance is still too low to fully automate the construction of reliable KGs, and human interaction remains necessary. This is made difficult by the statistical nature of deep learning methods, which makes their predictions hardly interpretable. In this paper, we present a new symbolic and interpretable approach for Relation Extraction in texts. It is based on a modeling of the lexical and syntactic structure of text as a knowledge graph, and it exploits \textit{Concepts of Neighbours}, a method based on Graph-FCA for computing similarities in knowledge graphs. An evaluation has been performed on a subset of TACRED (a relation extraction benchmark), showing promising results.
Graph pattern mining algorithms ease graph data analysis by extracting recurring structures. However, classic pattern mining approaches tend to extract too many patterns for human analysis. Recently, the GraphMDL algorithm has been proposed, which reduces the generated pattern set by using the \emph{Minimum Description Length} (MDL) principle to select a small descriptive subset of patterns. The main drawback of this approach is that it needs to first generate all possible patterns and then sieve through their complete set. In this paper we propose GraphMDL+, an approach based on the same description length definitions as GraphMDL but which tightly interleaves pattern generation and pattern selection (instead of generating all frequent patterns beforehand), and outputs a descriptive set of patterns at any time. Experiments show that our approach takes less time to attain equivalent results to GraphMDL and can attain results that GraphMDL could not attain in feasible time. Our approach also allows for more freedom in the pattern and data shapes, since it is not tied to an external approach. |
Interpreting data with many attributes is a difficult issue. A simple 2D display, projecting two attributes onto two dimensions, is relatively easy to interpret but provides limited help to see multidimensional correlations. We propose a tool, ComVisMD, which displays, from a dataset, five dimensions in compact 2D maps. A map contains cells; each one represents an object from the dataset. In addition to the usual horizontal and vertical projections and the use of colors, we offer holes and shapes. In order to compact the display, we partition objects according to two dimensions, grouping values of each dimension into up to seven categories. In this paper, we present two case studies covering two different domains, a cricket player dataset and a heart disease dataset. The cricket dataset has 15 attributes and 2170 objects. We show how, using ComVisMD, correlations between variables can be found in an intuitive way. The heart disease dataset has 14 attributes and 297 objects. Blokh and Stambler, in the June 2015 issue of ``Aging and Disease,'' state that individual attributes show little correlation with heart disease. Yet in combination the correlation improves dramatically. We show how ComVisMD helps visualize those multidimensional correlations between four attributes and heart disease diagnosis. |
The results of a SPARQL query are generally presented as a table with one row per result, and one column per projected variable. This is an immediate consequence of the formal definition of SPARQL results as a sequence of mappings from variables to RDF terms. However, because of the flat structure of tables, some of the RDF graph structure is lost. This often leads to duplicates in the contents of the table, and to difficulties in reading and interpreting the results. We propose to use nested tables to improve the presentation of SPARQL results. A nested table is a table where cells may contain embedded tables instead of RDF terms, and so on recursively. We introduce an automated procedure that lifts flat tables into nested tables, based on an analysis of the query. We have implemented the procedure on top of Sparklis, a guided query builder in natural language, in order to further improve the readability of its UI. It can also be implemented on any SPARQL querying interface as it only depends on the query and its flat results. We illustrate our proposal in the domain of pharmacovigilance, and evaluate it on complex queries over Wikidata.
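To make the lifting idea concrete, here is a minimal sketch (our own illustration, not the paper's query-driven procedure): flat result rows are grouped on a key variable, so that the remaining columns become an embedded sub-table and duplicates of the key disappear.

    from collections import defaultdict

    def nest(rows, key_vars):
        """Group flat SPARQL result rows (dicts var -> term) on key_vars;
        the remaining columns of each group form an embedded sub-table."""
        groups = defaultdict(list)
        for row in rows:
            key = tuple(row[v] for v in key_vars)
            groups[key].append({v: t for v, t in row.items() if v not in key_vars})
        return [dict(zip(key_vars, key), nested=subrows)
                for key, subrows in groups.items()]

    flat = [{"drug": "D1", "effect": "nausea"},
            {"drug": "D1", "effect": "headache"},
            {"drug": "D2", "effect": "rash"}]
    print(nest(flat, ["drug"]))
    # [{'drug': 'D1', 'nested': [{'effect': 'nausea'}, {'effect': 'headache'}]},
    #  {'drug': 'D2', 'nested': [{'effect': 'rash'}]}]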
As more and more data are available as RDF graphs, the availability of tools for data analytics beyond semantic search becomes a key issue of the Semantic Web. Previous work requires the modelling of data cubes on top of RDF graphs. We propose an approach that directly answers analytical queries on unmodified (vanilla) RDF graphs by exploiting the computation features of SPARQL 1.1. We rely on the N<A>F design pattern to design a query builder that completely hides SPARQL behind a verbalization in natural language, and that gives intermediate results and suggestions at each step. Our evaluations show that our approach covers a large range of use cases, scales well on large datasets, and is easier to use than writing SPARQL queries.
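For illustration, the sketch below (toy data and vocabulary are assumptions of ours) shows the kind of SPARQL 1.1 analytical query, combining grouping and aggregation, that can be evaluated directly on a plain RDF graph, here with the rdflib Python library.

    from rdflib import Graph

    g = Graph()
    g.parse(data="""
        @prefix ex: <http://example.org/> .
        ex:order1 ex:customer ex:alice ; ex:amount 10 .
        ex:order2 ex:customer ex:alice ; ex:amount 25 .
        ex:order3 ex:customer ex:bob   ; ex:amount 7 .
    """, format="turtle")

    # Aggregation on the vanilla graph, without any data cube modelling.
    query = """
        PREFIX ex: <http://example.org/>
        SELECT ?customer (SUM(?amount) AS ?total) (COUNT(?order) AS ?nbOrders)
        WHERE { ?order ex:customer ?customer ; ex:amount ?amount . }
        GROUP BY ?customer
        ORDER BY DESC(?total)
    """
    for customer, total, nb_orders in g.query(query):
        print(customer, total, nb_orders)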
2020 |
In this chapter, we introduce Formal Concept Analysis (FCA) and some of its extensions. FCA is a formalism based on lattice theory aimed at data analysis and knowledge processing. FCA allows the design of so-called concept lattices from binary and complex data. These concept lattices provide a realistic basis for knowledge engineering and the design of knowledge-based systems. Indeed, FCA is closely related to knowledge discovery in databases, knowledge representation and reasoning. Accordingly, FCA supports a wide range of complex and intelligent tasks among which classification, information retrieval, recommendation, network analysis, software engineering and data management. Finally, FCA is used in many applications demonstrating its growing importance in data and knowledge sciences. |
Since the birth of the Semantic Web, numerous knowledge bases have appeared. The applications that exploit them rely on the quality of their data through time. In this regard, one of the main dimensions of data quality is conformance to the expected usage of the vocabulary. However, vocabulary usage (i.e., how classes and properties are actually populated) can vary from one base to another. Moreover, through time, such usage can evolve within a base and diverge from previous practices. Methods have been proposed to follow the evolution of a knowledge base by observing the changes of its intensional schema (or ontology); however, they do not capture the evolution of its actual data, which can vary greatly in practice. In this paper, we propose a data-driven approach to assess the global evolution of vocabulary usage in large RDF graphs. Our proposal relies on two structural measures defined at different granularities (dataset vs update), which are based on pattern mining techniques. We have performed thorough experiments which show that our approach is scalable, and can capture the structural evolution through time of both synthetic (LUBM) and real knowledge bases (different snapshots and updates of DBpedia).
Knowledge graphs offer a versatile knowledge representation, and have been studied under different forms, such as conceptual graphs or RDF graphs in the Semantic Web. A challenge is to discover conceptual structures in those graphs, in the same way as Formal Concept Analysis (FCA) discovers conceptual structures in tables. FCA has been successful for analysing, mining, learning, and exploring tabular data, and our aim is to help transpose those results to graph-based data. Several previous FCA approaches have already addressed relational data, hence graphs, but with various limits. We propose Graph-FCA as an extension of FCA where a dataset is a hypergraph instead of a binary table. We show that it can be formalized simply by replacing objects by tuples of objects. This leads to the notion of "n-ary concept", whose extent is an n-ary relation of objects, and whose intent is a "projected graph pattern". In this paper, we formally reconstruct the fundamental results of FCA for knowledge graphs. We describe in detail the representation of hypergraphs, and the operations on them, as they are much more complex than the sets of attributes that they extend. We also propose an algorithm based on a notion of "pattern basis" to generate and display n-ary concepts in a more efficient and more compact way. We explore a few use cases, in order to study the feasibility and usefulness of Graph-FCA. We consider two use cases: workflow patterns in cooking recipes and linguistic structures from parse trees. In addition, we report on experiments about quantitative aspects of the approach.
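As background (the standard textbook definitions, not specific to this paper), the classical FCA derivation operators that Graph-FCA generalizes from single objects to tuples of objects are the following. For a formal context $(G, M, I)$ with objects $G$, attributes $M$, and incidence $I \subseteq G \times M$:

    $A' = \{ m \in M \mid \forall g \in A:\ (g,m) \in I \}$ for $A \subseteq G$,
    $B' = \{ g \in G \mid \forall m \in B:\ (g,m) \in I \}$ for $B \subseteq M$,
    and $(A, B)$ is a formal concept iff $A' = B$ and $B' = A$.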
Several pattern mining algorithms have been proposed to identify recurring structures in graphs. The main drawback of these approaches is that they generally produce too many patterns for human analysis to be possible. Recently, pattern mining methods have addressed this issue on transactional, sequential and relational data by using the Minimum Description Length (MDL) principle. In this paper, we propose an MDL-based approach for selecting a representative subset of patterns from labeled undirected graphs. A key notion of our approach is the introduction of ports to encode connections between pattern occurrences without any loss of information. Our experiments show that the number of patterns is drastically reduced and that the selected patterns can have complex shapes.
Pattern mining algorithms make it possible to extract structures from data and highlight interesting and useful knowledge. However, those approaches can only be truly helpful if users can actually understand their outputs. Thus, visualization techniques play a key role in pattern mining, bridging the gap between the algorithms and the users. In this demo paper we propose GraphMDL Visualizer, a tool for the interactive visualization of the graph patterns extracted with GraphMDL, a graph mining approach based on the MDL principle. GraphMDL Visualizer is structured according to the behavior and needs of users when they analyze GraphMDL results. The tool has different views, ranging from more general (distribution of pattern characteristics) to more specific (visualization of specific patterns). It is also highly interactive, allowing users to customize the different views, and to navigate between them, through simple mouse clicks. GraphMDL Visualizer is freely available online.
Many graph pattern mining algorithms have been designed to identify recurring structures in graphs. The main drawback of these approaches is that they often extract too many patterns for human analysis. Recently, pattern mining methods using the Minimum Description Length (MDL) principle have been proposed to select a characteristic subset of patterns from transactional, sequential and relational data. In this paper, we propose an MDL-based approach for selecting a characteristic subset of patterns on labeled graphs. A key notion in this paper is the introduction of ports to encode connections between pattern occurrences without any loss of information. Experiments show that the number of patterns is drastically reduced. The selected patterns have complex shapes and are representative of the data. |
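As a reminder of the generic two-part MDL criterion that such approaches instantiate (the specific description lengths used here are defined in the paper), the selected pattern set is

    $M^{*} = \arg\min_{M \in \mathcal{M}} \big( L(M) + L(D \mid M) \big)$,

where $L(M)$ is the length of the encoding of the patterns and $L(D \mid M)$ is the length of the data encoded with those patterns.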
The Georgian language has a complex verbal system, both agglutinative and inflectional, with many irregularities. Inflected forms of a given verb can differ greatly from one another and it is still a controversial issue to determine which lemmas should represent a verb in dictionaries. Verb tables help people to track lemmas starting from inflected forms but these tables are tedious and error-prone to browse. We propose Kartu-Verbs, a Semantic Web base of inflected Georgian verb forms. For a given verb, all its inflected forms are present. Knowledge can easily be traversed in all directions: from Georgian to French and English; from an inflected form to a masdar (a verbal noun, the form that comes closest to an infinitive), and conversely from a masdar to any inflected form; from component(s) to forms and from a form to its components. Users can easily retrieve the lemmas that are relevant to access their preferred dictionaries. Kartu-Verbs can be seen as a front-end to any Georgian dictionary, thus bypassing the lemmatization issues. |
Tables are a common form of query results, notably in SPARQL. However, due to the flat structure of tables, all structure from the RDF graph is lost, and this can lead to duplications in the table contents, and to difficulties in interpreting the results. We propose an extension of SPARQL 1.1 aggregations to get nested results, i.e. tables where cells may contain embedded tables instead of RDF terms, and so on recursively.
As more and more data are available as RDF graphs, the availability of tools for analytical queries beyond semantic search becomes a key issue of the Semantic Web. Previous work requires the modelling of data cubes on top of RDF graphs. We propose an approach that directly answers analytical queries on unmodified RDF graphs by exploiting the computation features of SPARQL 1.1 (aggregations, expressions). We rely on the N<A>F design pattern to design a query builder user interface that is user-friendly, by completely hiding SPARQL behind a verbalization in natural language, and responsive, by giving intermediate results and suggestions at each step. Our evaluations show that our approach covers a large range of use cases, and scales well on large datasets.
Concepts of neighbours define a symbolic form of similarity between the entities of a knowledge graph. Starting from an entity, each concept of neighbours is a cluster of neighbouring entities sharing a common graph pattern centered on that entity. In this demo paper, we recall the definitions of concepts of neighbours and present an extension of the Jena library whose API allows the computation of concepts of neighbours for a Jena RDF(S) model. We also present a graphical interface that lets a user perform these computations in a simple and interactive way.
The Knomana knowledge base brings together knowledge from the scientific literature on the use of plants with pesticidal or antibiotic effects on animals, plants, and human beings, in order to propose protection solutions using local plants. In this literature, the elements of the 3-tuple (protected organism, protecting plant, pest) are named using the binomial nomenclature consisting of the genus name followed by the species name. In some instances, authors use the abbreviation "sp." in the singular, or "spp." in the plural, as species name, to indicate that the species is indeterminate while the genus is known. To suggest protection solutions, hypotheses must be made on this indeterminacy, by relating the sp./spp. entries to the other species of the same genus, and conversely. This paper discusses the classification, by three extensions of Formal Concept Analysis, of ternary data containing some indeterminate values.
2019 |
In this chapter, learning is viewed as a symbolic issue in an unsupervised setting, from raw or from structured data, for some variants of Lambek grammars and of categorial dependency grammars. For these frameworks, the authors present different type connectives and structures, some limitations, and some algorithms. On the experimental side, categorial grammar has potential as a particular case of Logical Information Systems.
The Georgian language has a complex verbal system, both agglutinative and inflectional, with many exceptions. It is still a controversial issue to determine which lemmas should represent a verb in dictionaries. Verb tables help neophytes to track lemmas starting from inflected forms, but in paper form they are tedious and error-prone to browse. We propose Kartu-Verbs, a Semantic Web base of inflected Georgian verb forms. For a given verb, all inflected forms are present. Knowledge can easily be traversed in all directions: from Georgian to French and English; from an inflected form to a masdar, and conversely from a masdar to any inflected form; from component(s) to forms and from a form to its components. Users can easily retrieve the lemmas that are relevant to access their preferred dictionary. Kartu-Verbs can be seen as a front-end to any Georgian dictionary, thus bypassing the lemmatization issues. This report illustrates in detail how to use Kartu-Verbs and gives some indications about how the base is built. Our base, in its current state, is already a successful proof of concept. It has proven helpful for learning about Georgian verbs.
2018 |
Pharmacovigilance is in charge of studying the adverse effects of pharmaceutical products. In this field, pharmacovigilance specialists experience several difficulties when searching and exploring their patient data despite the existence of standardized terminologies (MedDRA). In this paper, we present our approach to enhance the way pharmacovigilance specialists perform search and exploration on their data. First, we have developed a knowledge graph that relies on the OntoADR ontology to semantically enrich the MedDRA terminology with SNOMED CT concepts, and that includes anonymized patient data from FAERS. Second, we have chosen and applied a semantic search tool, Sparklis, according to the user requirements that we have identified in pharmacovigilance. |
Pharmacovigilance is in charge of studying the adverse effects of pharmaceutical products. In this field, pharmacovigilance specialists experience several difficulties when searching and exploring their patient data despite the existence of standardized terminologies (MedDRA). In this paper, we present our approach to enhance the way pharmacovigilance specialists perform search and exploration on their data. First, we have developed a knowledge graph that relies on the OntoADR ontology to semantically enrich the MedDRA terminology with SNOMED CT concepts, and that includes anonymized patient data from FAERS. Second, we have chosen and extended a semantic search tool, Sparklis, according to the user requirements that we have identified in pharmacovigilance. We report the results of a usability evaluation that has been performed by human factors specialists to check the benefits of our proposal. |
Database information is multidimensional and often displayed in tabular format (row/column display). A choropleth map is a thematic map in which areas are colored according to a variable of interest; such maps are used mostly for compact graphical representation of geographical information. We propose a system, ComVisMD, inspired by choropleth maps, to visualize multidimensional data, taking sets of four dimensions and projecting them onto a compact 2D display. The first dimension uses the attribute of main interest to color areas according to a 5-color scale. The next two dimensions define the displayed areas as square cells and give the horizontal and vertical axes. The fourth dimension is displayed in the form of varying-size holes in the cells. We illustrate our approach on cricket players data and show how ComVisMD's compact visualization can help analyze data and find correlations, as well as explain exceptions, by way of intuitive color observation, cell shapes, per-cell information, dynamic scaling, classification and clustering.
Query relaxation has been studied as a way to find approximate answers when user queries are too specific or do not align well with the data schema. We are here interested in the application of query relaxation to similarity search of RDF nodes based on their description. However, this is challenging because existing approaches have a complexity that grows in a combinatorial way with the size of the query and the number of relaxation steps. We introduce two algorithms, answers partitioning and lazy join, that together significantly improve the efficiency of query relaxation. Our experiments show that our approach scales much better with the size of queries and the number of relaxation steps, to the point where it becomes possible to relax large node descriptions in order to find similar nodes. Moreover, the relaxed descriptions provide explanations for their semantic similarity. |
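To fix intuitions about why naive relaxation blows up (this toy sketch is ours and is not the answers-partitioning or lazy-join algorithm of the paper): if one elementary relaxation step removes a triple pattern from the query, then k steps naively enumerate all subsets of that size, a number that grows combinatorially with the query size.

    from itertools import combinations

    def relaxations(patterns, steps):
        """All queries obtained by dropping exactly `steps` triple patterns."""
        keep = len(patterns) - steps
        return [set(q) for q in combinations(sorted(patterns), keep)]

    query = {("?x", "type", "Person"),
             ("?x", "worksAt", "IRISA"),
             ("?x", "interest", "SemanticWeb")}
    for relaxed in relaxations(query, 1):
        print(relaxed)   # three relaxed queries, one per dropped pattern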
Controlled natural languages (CNL) combine the readability of natural languages with the accuracy of formal languages. They have been used to help users express facts, rules or queries. While generally easy to read, CNLs remain difficult to write because of the constrained syntax. A common solution is a grammar-based auto-completion mechanism to suggest the next possible words in a sentence. However, this solution has two limitations: (a) partial sentences may have no semantics, which prevents giving intermediate results or feedback, and (b) the suggestion is often limited to adding words at the end of the sentence. We propose a more responsive and flexible CNL authoring approach, by designing it as a sequence of sentence transformations. Responsiveness is obtained by having a complete, and hence interpretable, sentence at all times. Flexibility is obtained by allowing insertion and deletion on any part of the sentence. Technically, this is realized by working directly on the abstract syntax, rather than on the concrete syntax, and by using Huet's zippers to manage the focus on a query part, the equivalent of the text cursor of a word processor.
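The sketch below is a minimal list zipper in the spirit of Huet's zippers (our own illustration on a flat word sequence; the paper applies zippers to abstract syntax trees): the structure keeps a focus inside a sequence so that focus moves and local edits, such as insertion and deletion, stay cheap and always yield a complete sentence.

    class ListZipper:
        """A sequence split around a focus: `left` holds the elements before
        the focus (nearest first), `right` starts at the focus element."""
        def __init__(self, left, right):
            self.left, self.right = left, right

        def move_right(self):
            head, *rest = self.right
            return ListZipper([head] + self.left, rest)

        def insert(self, item):
            return ListZipper(self.left, [item] + self.right)

        def delete(self):
            return ListZipper(self.left, self.right[1:])

        def to_list(self):
            return list(reversed(self.left)) + self.right

    z = ListZipper([], ["every", "film", "has", "a", "director"])
    z2 = z.move_right().insert("French")   # edit in the middle of the sentence
    print(z2.to_list())                    # ['every', 'French', 'film', 'has', 'a', 'director']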
Sparklis is a SPARQL query builder that can connect to any endpoint, and that interacts with users in natural language only. Users are guided in the building of their queries so that they do not have to know the schema, and so that empty results are almost completely avoided. This demo paper presents a number of recent extensions to Sparklis. Most notably, it now supports analytical queries, Wikidata statement qualifiers, and the display of results on a map or as a slideshow. |
Relational Concept Analysis (RCA) has been introduced in order to allow concept analysis on multi-relational data. It significantly widens the field of application of Formal Concept Analysis (FCA), and it produces richer concept intents that are similar to concept definitions in Description Logics (DL). However, reading and interpreting RCA concept lattices is notoriously difficult. Nica {\em et al.} have proposed to represent RCA intents by cpo-patterns in the special case of sequence structures. We propose an equivalent representation of a family of RCA concept lattices in the form of a hierarchy of concept graphs. Each concept belongs to one concept graph, and each concept graph exhibits the relationships between several concepts. A concept graph is generally transversal to several lattices, and therefore highlights the relationships between different types of objects. We show the benefits of our approach on several use cases from the RCA literature.
The quantity of event logs available is increasing rapidly, be they produced by industrial processes, computing systems, or life tracking, for instance. It is thus important to design effective ways to uncover the information they contain. Because event logs often record repetitive phenomena, mining periodic patterns is especially relevant when considering such data. Indeed, capturing such regularities is instrumental in providing condensed representations of the event sequences. We present an approach for mining periodic patterns from event logs while relying on a Minimum Description Length (MDL) criterion to evaluate candidate patterns. Our goal is to extract a set of patterns that suitably characterises the periodic structure present in the data. We evaluate the interest of our approach on several real-world event log datasets.
This paper focuses on the construction of formal representations of natural language texts. The mapping from a natural language to a logical representation is realized with a grammatical formalism, linking the syntactic analysis of the text to a semantic representation. We target the behavioral aspect of specifications for cyber-physical systems, i.e., any type of system in which software components interact closely with a physical environment. The challenge is to provide assistance to the designer, so that "system" specifications expressed in natural language can be simulated and verified by automatic or assisted methods. This paper presents some existing contributions that could enable progress on this issue.
As more and more data are available as RDF graphs, the availability of tools for data analytics beyond semantic search becomes a key issue of the Semantic Web. Previous work has focused on adapting OLAP-like approaches and question answering by modelling RDF data cubes on top of RDF graphs. We propose a more direct -- and more expressive -- approach by guiding users in the incremental building of SPARQL 1.1 queries that combine several computation features (aggregations, expressions, bindings and filters), and by evaluating those queries on unmodified (vanilla) RDF graphs. We rely on the N<A>F design pattern to hide SPARQL behind a natural language interface, and to provide results and suggestions at every step. We have implemented our approach on top of Sparklis, and we report on three experiments to assess its expressivity, usability, and scalability.
2017 |
Sparklis is a Semantic Web tool that helps users explore and query SPARQL endpoints by guiding them in the interactive building of questions and answers, from simple ones to complex ones. It combines the fine-grained guidance of faceted search, most of the expressivity of SPARQL, and the readability of (controlled) natural languages. No knowledge of the vocabulary or schema is required from users. Many SPARQL features are covered: multidimensional queries, union, negation, optional, filters, aggregations, ordering. Queries are verbalized in either English or French, so that no knowledge of SPARQL is ever necessary. All of this is implemented in a portable Web application, Sparklis, and has been evaluated on many endpoints and questions. No endpoint-specific configuration is necessary as the data schema is discovered on the fly by the tool. The tool has been online since April 2014; thousands of queries have been formed by hundreds of users over more than a hundred endpoints.
We describe the theoretical principles that underlie the design of a software tool which could be used by judges for writing judgements and for making decisions about litigations. The tool is based on Binary Decision Diagrams (BDD), which are graphical representations of truth-valued functions associated with propositional formulas. Given a specific litigation, the tool asks questions to the judge; each question is represented by a propositional atom. The answers, true or false, make it possible to evaluate the truth value of the formula which encodes the overall recommendation of the software about the litigation. Our approach combines some sort of `theoretical' or `legal' reasoning dealing with the core of the litigation itself together with some sort of `procedural' reasoning dealing with the protocol that has to be followed by the judge during the trial: some questions or groups of questions must necessarily be examined, and sometimes in a specific order. That is why we consider extensions of BDDs called Multi-BDDs. They are BDDs with multiple entries corresponding to the different specific issues that must necessarily be addressed by the judge during the trial. We illustrate our ideas on a case study dealing with French trade union elections, an example that has been used throughout a project with the French Cour de cassation. We end the article by sketching the architecture of a prototype software that has been developed during this project.
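As a purely hypothetical sketch of the questioning mechanism (it is not the software developed in the project, and the atoms are invented): a decision diagram can be walked down by asking the judge one yes/no question per node until a terminal recommendation is reached.

    class Node:
        """An internal decision node: a propositional atom to ask about,
        and the sub-diagrams to follow for a true or false answer."""
        def __init__(self, question, if_true, if_false):
            self.question, self.if_true, self.if_false = question, if_true, if_false

    def evaluate(node, answers):
        """answers: dict mapping atoms to the judge's boolean answers."""
        while isinstance(node, Node):
            node = node.if_true if answers[node.question] else node.if_false
        return node   # a terminal recommendation

    diagram = Node("quorum reached",
                   Node("ballot secrecy respected", "election valid", "election void"),
                   "election void")
    print(evaluate(diagram, {"quorum reached": True,
                             "ballot secrecy respected": False}))   # election void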
This study concerns information systems in law and proposes a new prototype. We discuss the information systems currently used by legal researchers and point out their limitations. We present the principles of a new prototype for a better system. This work is accompanied by a first concrete realization, a system with semantic facets, the result of our processing chain applied to a set of decisions of the Constitutional Council.
Database information is in the form of tables, that is, a two-dimensional row/column display. For large databases, tabular data is difficult to read at a glance, especially for aggregated results. Online Analytical Processing (OLAP) proposes mechanisms to display data in aggregated forms. A choropleth map is a thematic map in which areas are colored in proportion to the measurement of a statistical variable being displayed, such as population density; such maps are used mostly for compact graphical representation of geographical information. We propose a system that adapts choropleth maps and the OLAP cube to visualize tabular data in a compact way. Our proposal displays multidimensional data like an OLAP cube (a multidimensional data set, also called a hypercube): records of a database table are colored according to two attributes $a$ (first dimension) and $b$ (second dimension), varying-size circles are mapped to attribute $c$ (third dimension), and numbers are mapped to attribute $d$ (fourth dimension). We illustrate our approach on cricket players data, namely on two linked tables 'Country' and 'Player', which have 16 rows and 9 columns, and 251 rows and 17 columns, respectively. The visualization presented by the system reduces the size of the table by a factor of about 4, allowing users to grasp more information at a time than the bare table display.
We introduce the notion of {\em concepts of neighbours} as an alternative to numerical distances for identifying the objects that are most similar to a query object, as in the nearest-neighbours method. Each concept of neighbours is composed of an intension, which symbolically describes what two objects have in common, and an extension, which comprises the objects lying between the two. We define these concepts of neighbours for complex data, namely knowledge graphs, where nodes play the role of objects. We describe an {\em anytime} algorithm to cope with the high complexity of the task, and we present first experiments on an RDF graph of more than 120,000 triples.
We propose to exploit the data using methods of logical concept analysis [Ferré and Ridoux (2003)] through the LIS team tools: Camelis http://www.irisa.fr/LIS/ferre/camelis/ or Sparklis http://www.irisa.fr/LIS/ferre/sparklis/. Experiments with the context management system and the resulting context will make it possible to improve the approach and the construction of the semantic facets to be put forward.
In the retail context, there is an increasing need for understanding individual customer behavior in order to personalize marketing actions. We propose the novel concept of customer signature, that identifies a set of important products that the customer refills regularly. Both the set of products and the refilling time periods give new insights on the customer behavior. Our approach is inspired by methods from the domain of sequence segmentation, thus benefiting from efficient exact and approximate algorithms. Experiments on a real massive retail dataset show the interest of the signatures for understanding individual customers. |
Logical Concept Analysis [1] (LCA) extends the Formal Concept Analysis [4] (FCA) approach. Recently, this approach has been applied to terminology: a workflow has been developed to go from XML data to a logical information context. Through experiments on specific resources, facet designs have been tuned to facilitate search and control over the data. We will consider several usages of such contexts and illustrate the benefits of the approach.
2016 |
Heterogeneity of data and data formats in bioinformatics entails mismatches between the inputs and outputs of different services, making it difficult to compose them into workflows. To reduce those mismatches, bioinformatics platforms propose ad hoc converters, called shims. When shims are written by hand, they are time-consuming to develop, and cannot anticipate all needs. When shims are automatically generated, they miss transformations, for example data composition from multiple parts, or parallel conversion of list elements. This article proposes to systematically detect convertibility from output types to input types. Convertibility detection relies on a rule system based on abstract types, close to XML Schema. Types allow data to be abstracted while precisely accounting for their composite structure. Detection is accompanied by an automatic generation of converters between input and output XML data. We show the applicability of our approach by abstracting concrete bioinformatics types (e.g., complex biosequences) for a number of bioinformatics services (e.g., blast). We illustrate how our automatically generated converters help to resolve data mismatches when composing workflows. We conducted an experiment on bioinformatics services and datatypes, using an implementation of our approach, as well as a survey with domain experts. The detected convertibilities and produced converters were validated as relevant from a biological point of view. Furthermore, the automatically produced graph of potentially compatible services exhibited a connectivity higher than with the ad hoc approaches. Indeed, the experts discovered unknown possible connections.
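A toy sketch of rule-based convertibility (type names and rules are our own assumptions, not the article's rule system): abstract types are atomic names or ("list", element_type) pairs, and a handful of rules decide whether a service output can feed a service input.

    # Hypothetical specialization relation between atomic types.
    SPECIALIZES = {("dna_sequence", "sequence"), ("protein_sequence", "sequence")}

    def convertible(out_type, in_type):
        if out_type == in_type:
            return True                                  # identity
        if (out_type, in_type) in SPECIALIZES:
            return True                                  # specialization to generalization
        if isinstance(out_type, tuple) and out_type[0] == "list":
            return convertible(out_type[1], in_type)     # element-wise conversion of a list
        if isinstance(in_type, tuple) and in_type[0] == "list":
            return convertible(out_type, in_type[1])     # wrap a single value into a list
        return False

    print(convertible(("list", "dna_sequence"), "sequence"))   # True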
Bonus distribution in enterprises or course allocation at universities are examples of sensitive multi-unit assignment problems, where a set of resources is to be allocated among a set of agents having multi-unit demands. Automatic processes exist, based on quantitative information, for example bids or preference rankings, or even on lotteries. In sensitive cases, however, decisions are taken by persons also using qualitative information. At present, no multi-unit assignment system supports both quantitative and qualitative information. In this paper, we propose \muaplis, an interactive process for multi-assignment problems where, in addition to bids and preferences, agents can give arguments to motivate their choices. Bids are used to automatically make pre-assignments; qualitative arguments and preferences help decision makers break ties in a well-founded way. A group decision support system, based on Logical Information Systems, allows decision makers to handle bids, arguments and preferences in a unified interface. We say that a process is {\em p-equitable} for a property $p$ if all agents satisfying $p$ are treated equally. We formally demonstrate that \muaplis{} is p-equitable for a number of properties on bids, arguments and preferences. It is also Pareto-efficient and Gale-Shapley-stable with respect to bids. A successful course allocation case study is reported. It spans two university years. The decision makers were confident about the process and the resulting assignment. Furthermore, the students, even the ones who did not get all their wishes, found the process to be equitable.
Expressions, such as mathematical formulae, logical axioms, or structured queries, account for a large part of human knowledge. It is therefore desirable to allow for their representation and querying with Semantic Web technologies. We propose an RDF design pattern that fulfills three objectives. The first objective is the structural representation of expressions in standard RDF, so that expressive structural search is made possible. We propose simple Turtle and SPARQL abbreviations for the concise notation of such RDF expressions. The second objective is the automated generation of expression labels that are close to usual notations. The third objective is the compatibility with existing practice and legacy data in the Semantic Web (e.g., SPIN, OWL/RDF). We show the benefits for RDF tools to support this design pattern with the extension of SEWELIS, a tool for guided exploration and edition, and its application to mathematical search. |
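For illustration only (the ex:operator/ex:arg1/ex:arg2 vocabulary is a hypothetical choice of ours, not the design pattern of the paper): once an expression such as (1 + 2) * 3 is represented structurally in RDF, structural search becomes an ordinary SPARQL query, as in this rdflib sketch.

    from rdflib import Graph

    g = Graph()
    g.parse(data="""
        @prefix ex: <http://example.org/> .
        ex:e1 ex:operator "*" ; ex:arg1 ex:e2 ; ex:arg2 3 .
        ex:e2 ex:operator "+" ; ex:arg1 1 ; ex:arg2 2 .
    """, format="turtle")

    # Structural search: find every sub-expression whose operator is "+".
    rows = g.query("""
        PREFIX ex: <http://example.org/>
        SELECT ?e WHERE { ?e ex:operator "+" . }
    """)
    print([str(e) for (e,) in rows])   # ['http://example.org/e2']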
The Semantic Web is founded on a number of Formal Languages (FL) whose benefits are precision, lack of ambiguity, and the ability to automate reasoning tasks such as inference or query answering. This however poses the challenge of mediation between machines and users, because the latter generally prefer Natural Languages (NL) for accessing and authoring knowledge. In this paper, we introduce the N<A>F design pattern, based on Abstract Syntax Trees (AST), Huet's zippers and Montague grammars, to zip together a natural language and a formal language. Unlike question answering, translation does not go from NL to FL but, as the symbol N<A>F suggests, from ASTs (A) of an intermediate language to both NL (N) and FL (F). ASTs are built interactively and incrementally through a user-machine dialog where the user only sees NL, and the machine only sees FL.
This work focuses on the statistical questions introduced by the QALD-6 challenge. With the growing amount of semantic data, including numerical data, the need for RDF analytics beyond semantic search becomes a key issue of the Semantic Web. We have extended SPARKLIS from semantic search to RDF analytics by covering the computation features of SPARQL (expressions, aggregations and groupings). We could therefore participate in the new task on statistical questions, and we report the achieved performance of SPARKLIS. Compared to other participants, SPARKLIS does not translate spontaneous questions from users, but instead guides users in the construction of a question. Guidance is based on the actual RDF data so as to ensure that built questions are well-formed, non-ambiguous, and inhabited with answers. We show that SPARKLIS enables superior results for both an expert user (94% correct) and a beginner user (76% correct).
We propose a novel approach to ontology authoring that is centered on semantics rather than on syntax. Instead of writing axioms formalizing a domain, the expert is invited to explore the possible worlds of her ontology, and to eliminate those that do not conform to her knowledge. Each elimination generates an axiom that is automatically derived from the explored situation. We have implemented the approach in prototype PEW (Possible World Explorer), and conducted a user study comparing it to Protégé. The results show that more axioms are produced with PEW, without making more errors. More importantly, the produced ontologies are more complete, and hence more deductively powerful, because more negative constraints are expressed. |
With the rise of the Semantic Web, more and more relational data are made available in the form of knowledge graphs (e.g., RDF, conceptual graphs). A challenge is to discover conceptual structures in those graphs, in the same way as Formal Concept Analysis (FCA) discovers conceptual structures in tables. Graph-FCA has been introduced in previous work as an extension of FCA for such knowledge graphs. In this paper, algorithmic aspects and use cases are explored in order to study the feasibility and usefulness of G-FCA. We consider two use cases. The first one extracts linguistic structures from parse trees, comparing two graph models. The second one extracts workflow patterns from cooking recipes, highlighting the benefits of n-ary relationships and concepts.
Knowledge acquisition is a central issue of the Semantic Web. Knowledge cannot always be automatically extracted from existing data, thus contributors have to make efforts to create new data. In this paper, we propose FORMULIS, a dynamic form-based interface designed to make RDF data authoring easier. FORMULIS guides contributors through the creation of RDF data by suggesting fields and values according to the previously filled fields and the previously created resources. |
2015 |
In scientific domains, particularly in bioinformatics, elementary services are composed as workflows to perform complex data analysis experiments. Because of the heterogeneity of resources, service composition is a difficult task. When composing workflows, users lack assistance for finding and interconnecting compatible services. Existing solutions use special services, defined manually, to handle the conversion of data formats between the inputs and outputs of services in workflows. This is painful for an end user. Handling service incompatibilities with manual converters is time-consuming and cumbersome. Automated solutions exist to ease workflow composition, but they are generally limited in guiding users and in adapting data between services. The first contribution of this thesis proposes to systematically detect the convertibility of service outputs to inputs. Convertibility detection relies on a rule system based on an abstraction of the input and output types of services. Type abstraction makes it possible to take into account the nature and the composition of input and output data. The rules allow decomposition and composition as well as specialization and generalization of types. They also make it possible to generate data converters to be used between services in workflows. The second contribution proposes an interactive approach that guides users in composing workflows by providing suggestions of compatible services and connections based on the convertibility of the input and output types of services. The approach is based on the model of Logical Information Systems (LIS), which enables guided and safe querying and navigation over data represented with a uniform logic. With our approach, workflow composition is safe and complete with respect to desired properties. The results and the experiments, carried out on bioinformatics services and data types, show the relevance of our approaches. Our approaches offer suitable mechanisms for handling service incompatibilities in workflows, taking into account the composite structure of input and output data. They also make it possible to guide users, step by step, in defining well-formed workflows through relevant suggestions.
This paper proposes an interactive approach that guides users in the step-by-step composition of services by providing safe suggestions based on type convertibility. Users specify the points of the workflow (called the focus) they want to complete, and our approach suggests services and connections whose data types are compatible with the focus. We prove the safeness (every step produces a well-formed workflow) and the completeness (every well-formed workflow can be built) of our approach.
Knowledge graphs offer a versatile knowledge representation, and have been studied under different forms, such as conceptual graphs or Datalog databases. With the rise of the Semantic Web, more and more data are available as knowledge graphs. FCA has been successful for analyzing, mining, learning, and exploring tabular data, and our aim is to help transpose those results to graph-based data. Previous FCA approaches have already addressed relational data, hence graphs, but with various limits. We propose G-FCA as an extension of FCA where the formal context is a knowledge graph based on n-ary relationships. The main contribution is the introduction of ``n-ary concepts'', i.e. concepts whose extents are n-ary relations of objects. Their intents, ``projected graph patterns'', mix relationships of different arities, objects, and variables. In this paper, we lay out first theoretical results, in particular the existence of a concept lattice for each concept arity, and the role of relational projections to connect those different lattices.
Ontology design is often an obstacle to the adoption of knowledge engineering and Semantic Web techniques. One reason is of course the use of formalisms and the logical concepts associated with them. Another reason, which seems deeper to us, is the gap between syntax and semantics, that is, between the surface form of the ontology (axioms) and what it makes necessary/possible/impossible (models). This gap leads to discrepancies between the designer's intent and the resulting model, which show up as unexpected inferences or even inconsistencies. We propose a new ontology design approach based on the interactive exploration and elimination of ``possible worlds'' (models). It reduces the syntax/semantics gap by preventing, by construction, the production of inconsistencies, and by permanently showing the designer what can be inferred or not. A prototype, PEW (Possible World Explorer), makes it possible to experiment with this approach and to compare it with other ontology editors.
This article presents an automated construction of a logical information context from a terminological resource available in XML; we apply this to the FranceTerme resource and to the Camelis tool, and we discuss how the resulting context can be used with such a tool dedicated to logical contexts. The purpose of this development, and of the choices related to this experiment, is twofold: to facilitate the use of a rich linguistic resource available as open data in XML, and to test and envision a systematic transformation of such XML resources into logical contexts. A logical view of a context allows information to be explored in a flexible way, without writing explicit queries; it may also provide insights into the quality of the data. Such a context can be enriched with other information (of diverse natures), and it can also be linked with other applications (according to arguments supplied by the context).
We present TermLis, a logical information context constructed from terminological resources available in XML (FranceTerme), for flexible use with a logical context system (CAMELIS). A logical view of a context allows information to be explored in a flexible way, without writing explicit queries; it may also provide insights into the quality of the data. Such a context can be enriched with other information (of diverse natures), and it can also be linked with other applications (according to arguments supplied by the context). We show how to use TermLis and we illustrate, through this concrete realization from FranceTerme data, the advantages of such an approach with terminological data.
We present a new project, Akenou-Breizh, that aims (1) to put in place a platform allowing the study of the influences of a heritage language, such as Breton, on a usage language, such as French, and (2) to make available, to all interested persons, tools well integrated into the "semantic and multilingual web" that propose proactive access to various kinds of knowledge concerning Breton, as well as direct visualisation of infrasentential correspondences in aligned bilingual presentations. We plan not only to use the numerous freely available resources, in particular those of the OPLB and of the APERTIUM project, but also to create new ones, such as good-quality bilingual aligned corpora, thereby using the "collaborative web", and to build on the dedicated lingwarium.org web site linguistic modules improving on or extending those that exist, for example a morphological analyzer-generator. We also describe an experiment set up starting from a reduced lexicon for Breton, which shows how it is possible to enrich a classical dictionary by linking it to a lattice of topics and to a context management system (here CAMELIS), in such a way that one can query it (along semantic facets) and compare different resources.
2014 |
In many domains where information access plays a central role, there is a gap between expert users, who can ask complex questions through formal query languages (e.g., SQL), and lay users, who either are dependent on expert users or must restrict themselves to simpler questions (e.g., keyword search). Because of the formal nature of those languages, there seems to be an unescapable trade-off between expressivity and usability in information systems. The objective of this thesis is to present a number of results and perspectives that show that the expressivity of formal languages can be reconciled with the usability of widespread information systems (e.g., browsing, Faceted Search (FS)). The final aim of this work is to empower people with the capability to produce, explore, and analyze their data in a powerful way. We have proposed a number of theories and implementations to better reconcile expressivity and usability, and applied them to a number of contexts going from file systems to the Semantic Web. In this thesis, we introduce a unifying framework inspired by Formal Concept Analysis (FCA) to factor out the main ideas of all those results: Abstract Conceptual Navigation (ACN). The principle of ACN is to guide users by letting them {\em navigate} in a conceptual space where places are {\em concepts} connected by navigation links. Concepts are characterized by a formal query, and are made of two parts: an {\em extension} and an {\em intension}. The extension is made of the query results, while the intension is made of the query itself and an index of query increments over the results. Finally, navigation links are formally defined as query transformations. The conceptual space is not static but is induced by concrete data, and evolves with it. ACN therefore combines the {\em expressivity} of formal query languages with the {\em guidance} of conceptual navigation. The {\em readability} of queries is improved by verbalizing them to (or parsing them from) a Controlled Natural Language (CNL). Readability and guidance together support usability by speaking the user's language, and by providing systematic assistance.
Orphanet is an organisation whose objective is, in particular, to gather collections of articles dealing with rare diseases. However, the acquisition of new knowledge in this domain is currently carried out manually, so obtaining new information about rare diseases is a time-consuming process. Making it possible to obtain this information automatically is therefore an important challenge. In this context, we propose to address the question of extracting relations between genes and rare diseases using data mining approaches, more specifically constrained sequential pattern mining. Our experiments show the interest of our approach for extracting relations between genes and rare diseases from abstracts of PubMed articles.
Reasoning on multiple criteria is a key issue in group decision to take into account the multidimensional nature of real-world decision-making problems. In order to reduce the induced information overload, criteria are in general aggregated in multicriteria decision analysis, in many cases by a simple discriminant function in the form of a weighted sum. This requires the preferences of decision makers to be elicited a priori and completely, which can be quite arbitrary. In everyday life, to reduce information overload, people often use a heuristic called ``Take-the-best'': they take criteria in a predefined order, and the first criterion which discriminates the alternatives at stake is used to make the decision. Although useful, the heuristic can be biased. This article proposes the Logical Multicriteria Sort process to support multicriteria sorting within islands of agreement. It therefore does not require a complete and consistent a priori set of preferences, but rather supports groups in quickly identifying the criteria for which an agreement exists. The process can be seen as a generalization of Take-the-best. It also considers one criterion at a time, but once a criterion has been found discriminating it is recorded, the process is iterated, and relevant criteria are logically combined. Hence, the biases of Take-the-best are reduced. The process is supported by a GDSS, based on Logical Information Systems, which gives instantaneous feedback on each small decision and keeps track of all the decisions taken so far. The process is incremental, and each step involves a low information load. It guarantees some fairness because all considered alternatives are systematically analyzed along the selected criteria. A successful case study is reported.
The Semantic Web (SW) is now made of billions of triples, which are available as Linked Open Data (LOD) or as RDF stores. The SPARQL query language provides a very expressive way to search and explore this wealth of semantic data. However, user-friendly interfaces are needed to bridge the gap between end-users and SW formalisms. Navigation-based interfaces and natural language interfaces require no or little training, but they cover a small fragment of SPARQL's expressivity. We propose SQUALL, a query and update language that provides the full expressiveness of SPARQL~1.1 through a flexible controlled natural language (e.g., solution modifiers through superlatives, relational algebra through coordinations, filters through comparatives). A comprehensive and modular definition is given as a Montague grammar, and an evaluation of naturalness is done on the QALD challenge. SQUALL is conceived as a component of natural language interfaces, to be combined with lexicons, guided input, and contextual disambiguation. It is available as a Web service that translates SQUALL sentences to SPARQL, and submits them to SPARQL endpoints (e.g., DBpedia), therefore ensuring SW compliance, and leveraging the efficiency of SPARQL engines. |
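As a purely illustrative companion (not SQUALL's actual output), the following Python sketch sends to the DBpedia endpoint the kind of SPARQL 1.1 query, with a solution modifier, that a controlled-natural-language front end could produce for a superlative question; the query text, the dbo: properties, and the use of the SPARQLWrapper library are assumptions.

    # Illustrative only: a SPARQL 1.1 query with a solution modifier
    # (ORDER BY / LIMIT), of the kind a CNL front end might generate,
    # submitted to a public endpoint via the SPARQLWrapper library.
    from SPARQLWrapper import SPARQLWrapper, JSON

    endpoint = SPARQLWrapper("https://dbpedia.org/sparql")
    endpoint.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?country ?pop WHERE {
      ?country a dbo:Country ;
               dbo:populationTotal ?pop .
    }
    ORDER BY DESC(?pop)
    LIMIT 5
    """)
    endpoint.setReturnFormat(JSON)

    results = endpoint.query().convert()
    for row in results["results"]["bindings"]:
        print(row["country"]["value"], row["pop"]["value"])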
Heterogeneity of data and data formats in bioinformatics often entails a mismatch between inputs and outputs of different services, making it difficult to compose them into workflows. To reduce those mismatches, bioinformatics platforms propose ad hoc converters written by hand. This article proposes to systematically detect convertibility from output types to input types. Convertibility detection relies on abstract types, close to XML Schema, that abstract data while precisely accounting for its composite structure. Detection is accompanied by an automatic generation of converters between input and output XML data. Our experiment on bioinformatics services and datatypes, performed with an implementation of our approach, shows that the detected convertibilities and produced converters are relevant from a biological point of view. Furthermore, they automatically produce a graph of potentially compatible services with a higher connectivity than the ad hoc approaches.
Nowadays, large sets of data describing trajectories of mobile objects are made available by the generalization of geolocation sensors. Relevant information, for instance the routes most used by children to go to school or the streets most extensively used in the morning by workers, can be extracted from this wealth of available data, allowing, for example, the urban space to be reconsidered. A trajectory is represented by a set of points (x, y, t) where x and y are the geographic coordinates of a mobile object and t is a date. These data are difficult to explore and interpret in their raw form, i.e. as points (x, y, t), because they are noisy, irregularly sampled, and too low level. A first step to make them usable is to resample the data, smooth it, and then segment it into higher-level segments (e.g. "stops" and "moves") that give a better grip for interpretation than the raw coordinates. In this paper, we propose a method for the segmentation of these trajectories into acceleration/deceleration segments, based on the computation of exponential moving averages (EMA). We have conducted experiments where the exponential moving average proves to be an efficient smoothing function, and the difference between two EMAs of different weights proves effective at discovering significant accelerating/decelerating segments.
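A minimal sketch of the smoothing step described above, assuming illustrative weights and threshold (not the paper's values): two exponential moving averages of the speed signal with different weights are compared, and their difference is thresholded to label accelerating/decelerating segments.

    # Sketch of EMA-based smoothing of a trajectory's speed signal; the
    # weights (alphas) and threshold are illustrative assumptions.
    def ema(values, alpha):
        """Exponential moving average: s[i] = alpha*x[i] + (1-alpha)*s[i-1]."""
        out = []
        s = values[0]
        for x in values:
            s = alpha * x + (1 - alpha) * s
            out.append(s)
        return out

    def label_segments(speeds, fast_alpha=0.5, slow_alpha=0.1, threshold=0.2):
        """Compare a fast and a slow EMA of the speed: when the fast EMA is
        clearly above the slow one the object is accelerating, when clearly
        below it is decelerating, otherwise the speed is roughly steady."""
        fast, slow = ema(speeds, fast_alpha), ema(speeds, slow_alpha)
        labels = []
        for f, s in zip(fast, slow):
            if f - s > threshold:
                labels.append("accelerate")
            elif s - f > threshold:
                labels.append("decelerate")
            else:
                labels.append("steady")
        return labels

    speeds = [1.0, 1.1, 1.3, 2.0, 2.8, 3.0, 2.9, 2.0, 1.2, 1.0]
    print(label_segments(speeds))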
In this paper, we propose a process for small to medium scale multi-assignment problems. In addition to bids, agents can give motivations to explain their choices in order to help decision makers break ties in a founded way. A group decision support system, based on Logical Information Systems, allows decision makers to easily handle both bids and motivations. Furthermore, it guarantees that all the agents are treated equally. A successful case study about a small course assignment problem at a technical university is reported.
Linked data is increasingly available through SPARQL endpoints, but exploration and question answering by regular Web users largely remain an open challenge. Users have to choose between the expressivity of formal languages such as SPARQL, and the usability of tools based on navigation and visualization. In a previous work, we have proposed Query-based Faceted Search (QFS) as a way to reconcile the expressivity of formal languages and the usability of faceted search. In this paper, we further reconcile QFS with scalability and portability by building QFS over SPARQL endpoints. We also improve expressivity and readability. Many SPARQL features are now covered: multidimensional queries, union, negation, optional, filters, aggregations, ordering. Queries are now verbalized in English, so that no knowledge of SPARQL is ever necessary. All of this is implemented in a portable Web application, Sparklis, and has been evaluated on many endpoints and questions. |
SPARKLIS is a Semantic Web tool that helps users explore SPARQL endpoints by guiding them in the interactive building of questions and answers, from simple ones to complex ones. It combines the fine-grained guidance of faceted search, most of the expressivity of SPARQL, and the readability of (controlled) natural languages. No endpoint-specific configuration is necessary, and no knowledge of SPARQL and the data schema is required from users. This demonstration paper is a companion to the research paper~\cite{Fer2014iswc}. |
The purpose of this article is to show that the associative Lambek calculus extended with basic proper axioms can be simulated by the usual associative Lambek calculus, with the same number of types per word in a grammar. An analogous result was shown earlier for pregroup grammars (2007). We consider the Lambek calculus with product, as well as the product-free version.
This paper studies mappings between CCG and pregroup grammars, to allow a transfer of linguistic resources from one formalism to the other. We focus on mappings that preserve the binary structures, and we also discuss some possible alternatives in the underlying formalisms, with some experiments.
In this article, we describe our participation in the 2014 edition of DEFT. We address the task of associating session names with the papers of a conference. To do so, we propose an original, symbolic and unsupervised knowledge discovery approach. The approach combines sequential data mining and graph mining methods. Sequence mining extracts frequent patterns in order to build descriptions of the papers and the sessions. These descriptions are then represented as a graph. A graph mining technique applied to this graph yields collections of homogeneous subgraphs, corresponding to collections of papers and session names.
2013 |
The purpose of this article is to show that the associative Lambek calculus extended with basic proper axioms can be simulated by the usual associative Lambek calculus, with the same number of types per word in a grammar. An analogous result was shown earlier for pregroup grammars (2007). We consider the Lambek calculus with product, as well as the product-free version.
This paper reports work in progress on using Logical Information Systems to help facilitators build on experience when preparing meetings. Features of meetings similar to the one under construction are automatically suggested, without the facilitator having to ask for suggestions. Suggestions take into account the whole information about all the meetings already recorded in the system as well as facilitation knowledge, such as thinkLets. Usual techniques and processes that facilitators like to use are naturally suggested. An unusual technique is suggested, for example, if the facilitator enters a keyword that is a feature of that technique. Although a lot remains to be done, the proposed approach already shows contributions suggesting that it is worth investigating further. The main one is that it builds on the facilitator's actual practice. Other important features are flexibility and adaptability.
The Semantic Web is now made of billions of triples, which are available as Linked Open Data (LOD) or as RDF stores. The most common approach to access RDF datasets is through SPARQL, an expressive query language. However, SPARQL is difficult to learn for most users because it exhibits low-level notions of relational algebra such as union, filters, or grouping. We present SQUALL, a high-level language for querying and updating an RDF dataset. It has a strong compliance with RDF, covers all features of SPARQL 1.1, and has a controlled natural language syntax that completely abstracts from low-level notions. SQUALL is available as two web services: one for translating a SQUALL sentence to a SPARQL query or update, and another for directly querying a SPARQL endpoint such as DBpedia. |
Because the Web of Documents is composed of structured pages that are not meaningful to machines, search in the Web of Documents is generally processed by keywords. However, because the Web of Data provides structured information, search in the Web of Data can be more precise. SPARQL is the standard query language for querying this structured information. SPARQL is expressive and its syntax is similar to SQL. However, casual users cannot write SPARQL queries. Sewelis is a search system for the Web of Data that offers progressive exploration of data and is more user-friendly than SPARQL. Sewelis guides the search with a query built incrementally: users only have to select query elements in order to complete the query. However, Sewelis does not scale to large datasets such as DBpedia, which is composed of about 2 billion triples. In this report, we introduce Scalewelis, a search system for the Web of Data that is similar to Sewelis but scalable. Moreover, Scalewelis is independent of the data because it connects to SPARQL endpoints. We took part in a challenge on DBpedia with Scalewelis and were able to answer 70 questions out of 99 with acceptable response times.
This paper reports on the participation of the system {\sc squall2sparql} in the QALD-3 question answering challenge for DBpedia. {\sc squall2sparql} is a translator from SQUALL, a controlled natural language for English, to SPARQL 1.1, a standard expressive query and update language for linked open data. It covers nearly all features of SPARQL 1.1, and is directly applicable to any SPARQL endpoint. |
This paper overviews the participation of Scalewelis in the QALD-3 open challenge. Scalewelis is a Faceted Search system. Faceted Search systems refine the result set at each navigation step. In Scalewelis, refinements are syntactic operations that modify the user query. Scalewelis uses the Semantic Web standards (URI, RDF, SPARQL) and connects to SPARQL endpoints. |
2012 |
Faceted search and querying are two well-known paradigms to search the Semantic Web. Querying languages, such as SPARQL, offer expressive means for searching RDF datasets, but they are difficult to use. Query assistants help users to write well-formed queries, but they do not prevent empty results. Faceted search supports exploratory search, i.e., guided navigation that returns rich feedback to users and prevents them from falling into dead-ends (empty results). However, faceted search systems do not offer the same expressiveness as query languages. We introduce {\em Query-based Faceted Search} (QFS), the combination of an expressive query language and faceted search, to reconcile the two paradigms. We formalize the navigation of faceted search as a navigation graph, where navigation places are queries, and navigation links are query transformations. We prove that this navigation graph is {\em safe} (no dead-end), and {\em complete} (every query that is not a dead-end can be reached by navigation). In this paper, the LISQL query language generalizes existing semantic faceted search systems, and covers most features of SPARQL. A prototype, Sewelis, has been implemented, and a usability evaluation demonstrated that QFS retains the ease-of-use of faceted search, and enables users to build complex queries with little training.
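The safety property can be illustrated with a small sketch over a toy conjunctive query language standing in for LISQL (illustrative only, not the Sewelis implementation): navigation places are queries, navigation links add facet values, and only links leading to non-empty results are proposed.

    # Sketch of "safe" query-based faceted navigation over a toy dataset:
    # places are queries, links are query transformations, and only links
    # leading to non-empty results are proposed (no dead-ends).
    ITEMS = {
        "film1": {"genre:drama", "year:2010", "country:FR"},
        "film2": {"genre:drama", "year:2011", "country:US"},
        "film3": {"genre:comedy", "year:2010", "country:FR"},
    }

    def results(query):
        """Extension of a query = items having every selected facet value."""
        return {i for i, facets in ITEMS.items() if query <= facets}

    def safe_links(query):
        """Facet values that can be added without reaching an empty result."""
        res = results(query)
        candidates = set().union(*(ITEMS[i] for i in res)) - query
        return {f for f in candidates if results(query | {f})}

    query = frozenset()
    print(results(query))              # all items
    print(sorted(safe_links(query)))   # every proposed refinement is non-empty
    query = query | {"genre:drama"}
    print(results(query))              # {'film1', 'film2'}
    print(sorted(safe_links(query)))   # 'year:2010' still proposed, 'genre:comedy' not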
Information overload is a key issue in group decision. A heuristic called ``take-the-best'' has been shown useful to face multicriteria decisions while reducing information overload: when making decisions, people often take criteria in a predefined order, and the first criterion which discriminates the alternatives at stake is used to make the decision. In order to rationalize group work, Briggs and de Vreede have proposed collaboration design patterns, called thinkLets. This article presents the LogicalMulticriteriaSort thinkLet, which can be seen as a generalization of the take-the-best heuristic. It also considers criteria one at a time, but once a criterion has been found discriminating it is kept in a record, and the process is iterated. The thinkLet is supported by a GDSS, based on Logical Information Systems, which gives instantaneous feedback on each micro decision and keeps track of all the decisions taken so far. The LogicalMulticriteriaSort thinkLet guarantees more fairness and speed than the ChauffeurSort thinkLet. It also avoids the need to give artificial values and weights to the criteria, as opposed to the Multicriteria thinkLet. A successful test case is reported.
Mathematical expressions account for an important part of human knowledge. We propose a representation of them in RDF so that they can be integrated with other knowledge in the Semantic Web. We then extend the description and query language LISQL in order to reconcile unambiguous representations, expressive queries, and natural, concise notations. For example, the query \texttt{int(...?X $\hat{~}$ 2...,?X)} finds the integrals in $x$ whose body contains the subexpression $x^2$. All of this makes it possible to use Sewelis, a logical information system for the Semantic Web, for the representation and guided exploration of mathematical expressions. This guidance relieves users from having to master the LISQL syntax and the vocabulary, while guaranteeing well-formed expressions and non-empty results to their queries.
Formal languages play a central role in the Semantic Web. An important aspect regarding their design is syntax as it plays a crucial role in the wide acceptance of the Semantic Web approach. The main advantage of controlled natural languages (CNL) is to reconcile the high-level and natural syntax of natural languages, and the precision and lack of ambiguity of formal languages. In the context of the Semantic Web and Linked Open Data, CNL could not only allow more people to contribute by abstracting from the low-level details, but also make experienced people more productive, and make the produced documents easier to share and maintain. We introduce SQUALL, a controlled natural language for querying and updating RDF graphs. It has a strong adequacy with RDF, an expressiveness close to SPARQL 1.1, and a CNL syntax that completely abstracts from low-level notions such as bindings and relational algebra. We formally define the syntax and semantics of SQUALL as a Montague grammar, and its translation to SPARQL. It features disjunction, negation, quantifiers, built-in predicates, aggregations with grouping, and n-ary relations through reification. |
A number of information systems offer a limited exploration in that users can only navigate from one object to another object, e.g. navigating from folder to folder in file systems, or from page to page on the Web. An advantage of conceptual information systems is to provide navigation from concept to concept, and therefore from set of objects to set of objects. The main contribution of this paper is to push the exploration capability one step further, by providing navigation from set of concepts to set of concepts. Those sets of concepts are structured along a number of dimensions, thus forming a cube of concepts. We describe a number of representations of concepts, such as sets of objects, multisets of values, and aggregated values. We apply our approach to multi-valued contexts, which stand at an intermediate position between many-valued contexts and logical contexts. We explain how users can navigate from one cube of concepts to another. We show that this navigation includes and extends both conceptual navigation and OLAP operations on cubes. |
With the persistent deployment of ontological specifications in practice and the increasing size of the deployed ontologies, methodologies for ontology engineering are becoming more and more important. In particular, the specification of negative constraints is often neglected by the human expert, whereas they are crucial for increasing an ontology's deductive potential. We propose a novel, arguably cognitively advantageous methodology for identifying and adding missing negative constraints to an existing ontology. To this end, a domain expert navigates through the space of satisfiable class expressions with the aim of finding absurd ones, which then can be forbidden by adding a respective constraint to the ontology. We give the formal foundations of our approach, provide an implementation, called Possible World Explorer (PEW), and illustrate its usability by describing prototypical navigation paths using the example of the well-known pizza ontology.
When a user creates a new object in the Semantic Web, existing tools exploit neither the existing objects and their properties, nor the already known properties of the new object. We propose UTILIS, a method to help create new objects. UTILIS searches for objects similar to the new object by applying relaxation rules to its description. The properties of these similar objects serve as suggestions to complete the description of the new object. A user study conducted with master's students shows that the UTILIS suggestions were indeed used. The users found the suggestions relevant: in most cases, they could find the searched element in the first three sets of suggestions. Moreover, they appreciated them, since the majority would like to have them in a Semantic Web data editor.
With existing tools, when creating a new object in the Semantic Web, users benefit neither from existing objects and their properties, nor from the already known properties of the new object. We propose UTILIS, an interactive process to help users add new objects. While creating a new object, relaxation rules are applied to its current description to find similar objects, whose properties serve as suggestions to expand the description. A user study conducted on a group of master students shows that students, even the ones disconcerted by the unconventional interface, used UTILIS suggestions. In most cases, they could find the searched element in the first three sets of properties of similar objects. Moreover, with UTILIS users did not create any duplicate whereas with the other tool used in the study more than half of them did.
UTILIS (Updating Through Interaction in Logical Information Systems), introduced in a research paper at EKAW'12, is an interactive process to help users create new objects in an RDF graph. While creating a new object, relaxation rules are applied to its current description to find similar objects, whose properties serve as suggestions to expand the description. UTILIS is implemented in Sewelis, a system that reconciles the expressiveness of querying languages (e.g., SPARQL) and the benefits of exploratory search found in faceted search. The same interaction principles are used for both exploration and creation of semantic data. We illustrate the UTILIS approach by applying Sewelis to the semantic annotation of comic panels, reusing the dataset that was used for a user evaluation.
In this article, we present a study on the use of data mining methods for stylistic analysis, from a linguistic point of view, by considering emerging sequential patterns. We first show that mining sequential patterns of words under a gap constraint yields new linguistic patterns that are relevant compared with patterns built from n-grams. We then study the use of sequential patterns of itemsets to produce more general linguistic patterns. We validate our approach from a quantitative and a linguistic point of view by conducting experiments on three French corpora corresponding to different text genres (poetry, correspondence and novels, respectively). Considering poetic texts more specifically, we show that the data mining techniques employed make it possible to identify characteristic linguistic patterns.
In this article, we propose an approach to explore large texts by highlighting coherent sub-parts. This exploration method relies on a graph representation of the text, using Hoey's linguistic model to select and pair the sentences in the graph. Our contribution concerns the use of constrained graph mining techniques to extract relevant sub-parts of the text (that is, collections of homogeneous sentence subnetworks). We have conducted experiments on two large English texts to show the interest of the proposed approach.
In this paper, we study the use of data mining techniques for stylistic analysis, from a linguistic point of view, by considering emerging sequential patterns. First, we show that mining sequential patterns of words with gap constraints gives new relevant linguistic patterns with respect to patterns built on n-grams. Then, we investigate how sequential patterns of itemsets can provide more generic linguistic patterns. We validate our approach from a linguistic point of view by conducting experiments on three corpora of various types of French texts (Poetry, Letters, and Fiction). Focusing more particularly on poetic texts, we show that characteristic linguistic patterns can be identified using data mining techniques. We also discuss how to improve our proposed approach so that it can be used more efficiently for linguistic analyses.
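For readers unfamiliar with gap constraints, the following sketch shows the occurrence test that underlies such mining (an illustrative helper with assumed names and gap value, not the paper's mining algorithm): the pattern's words must appear in order, with at most max_gap other words between consecutive pattern words.

    # Occurrence test for a sequential pattern of words under a gap
    # constraint (illustrative helper, not the paper's mining algorithm).
    def occurs_with_gap(pattern, sentence, max_gap):
        def search(p_idx, start):
            if p_idx == len(pattern):
                return True
            # The next pattern word must occur within max_gap words of `start`.
            for pos in range(start, min(start + max_gap + 1, len(sentence))):
                if sentence[pos] == pattern[p_idx]:
                    if search(p_idx + 1, pos + 1):
                        return True
            return False
        # The first pattern word may occur anywhere in the sentence.
        return any(sentence[i] == pattern[0] and search(1, i + 1)
                   for i in range(len(sentence)))

    sentence = "le vent souffle sur la plaine".split()
    print(occurs_with_gap(["vent", "sur"], sentence, max_gap=1))     # True  (gap = 1)
    print(occurs_with_gap(["vent", "plaine"], sentence, max_gap=1))  # False (gap = 3)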
We have explored, both theoretically and practically, different perspectives on how categorial grammars can be considered as Logical Information Systems (LIS), where objects are organized and queried by logical properties. LIS have also been considered for the development of pregroup grammars. We propose to illustrate these points with the CAMELIS tool, an implementation of Logical Information Systems (LIS) developed at IRISA Rennes. CAMELIS may give another view on linguistic data, and provide easy support for browsing, updating, creating, maintaining, and testing such data.
2011 |
Since the beginning of data processing, companies have realized the importance of information management solutions. The gathered data are a powerful asset to study trends and make choices for the future. Business Intelligence (the synthesis of information to assist decision-making) appeared in the mid-90s with OLAP (On-Line Analytical Processing, a set of tools for the exploration, analysis and display of multidimensional data) and S-OLAP (Spatial OLAP, OLAP with spatial support). An OLAP user who is not a computer science specialist does not need to know a language to handle multidimensional data, create graphics, etc. However, we consider that the OLAP data model is too rigid, because of its fixed multidimensional structure and because each cell must have a single aggregate value. This observation is the starting point of this thesis. We propose a new paradigm of information system, able to analyze and explore multidimensional and multivalued data. To model this paradigm, we use Logical Information Systems (LIS), a kind of information system that shares common features with OLAP, especially regarding data mining aspects. Our paradigm is defined by a flexible data model, easy navigation and a modular representation. We conclude the thesis with applications of this paradigm to several topics, including the exploration of geographic data.
The notion of k-valued categorial grammars, where a word is associated with at most k types, is often used in the field of lexicalized grammars as a fruitful constraint for obtaining several properties like the existence of learning algorithms. This principle is relevant only when the classes of k-valued grammars correspond to a real hierarchy of languages. Such a property was shown earlier for classical categorial grammars. This paper establishes the relevance of this notion when categorial grammars are enriched with iterated types.
We study learnability of Categorial Dependency Grammars (CDG), a family of categorial grammars expressing all kinds of projective, discontinuous and repeatable dependencies. For these grammars, it is known that they are not learnable from dependency structures. We propose two different ways of modelling the repeatable dependencies through iterated types and the two corresponding families of CDG which cannot distinguish between the dependencies repeatable at least K times and those repeatable any number of times. For both we show that they are incrementally learnable in the limit from dependency structures. |
We have proposed an interactive fault localization method based on two data mining techniques, formal concept analysis and association rules. A lattice formalizes the partial ordering and the dependencies between the sets of program elements (e.g., lines) that are most likely to lead to program execution failures. The paper provides an algorithm to traverse that lattice starting from the most suspect places. The main contribution is that the algorithm is able to deal with any number of faults within a single execution of a test suite. In addition, a stopping criterion independent of the number of faults is provided. |
Data mining techniques are used in order to discover emerging knowledge (patterns) in databases. The problem of such techniques is that there are, in general, too many resulting patterns for a user to explore them all by hand. Some methods try to reduce the number of patterns without a priori pruning. The number of patterns remains, nevertheless, high. Other approaches, based on a total ranking, propose to show to the user the top-k patterns with respect to a measure. Those methods do not take into account the user's knowledge and the dependencies that exist between patterns. In this paper, we propose a new way for the user to explore extracted patterns. The method is based on navigation in a partial order over the set of all patterns in the Logical Concept Analysis framework. It accommodates several kinds of patterns and the dependencies between patterns are taken into account thanks to partial orders. It allows the user to use his/her background knowledge to navigate through the partial order, without a priori pruning. We illustrate how our method can be applied on two different tasks (software engineering and natural language processing) and two different kinds of patterns (association rules and sequential patterns). |
Logical Information Systems (LIS) are based on Logical Concept Analysis, an extension of Formal Concept Analysis. This paper describes an application of LIS to support group decision. A case study was conducted with a research team. The objective was to decide on a set of potential conferences to which to send submissions. People individually used Abilis, a LIS web server, to preselect a set of conferences. Starting from 1041 calls for papers, the individual participants preselected 63 conferences. They then met and collectively used Abilis to select a shared set of 42 target conferences. The team could then sketch a publication plan. The case study provides evidence that LIS cover at least three of the collaboration patterns identified by Kolfschoten, de Vreede and Briggs. Abilis helped the team to build a more complete and relevant set of information (Generate/Gathering pattern); to build a shared understanding of the relevant information (Clarify/Building Shared Understanding); and to quickly reduce the number of target conferences (Reduce/Filtering pattern).
Faceted search and querying are two well-known paradigms to search the Semantic Web. Querying languages, such as SPARQL, offer expressive means for searching RDF datasets, but they are difficult to use. Query assistants help users to write well-formed queries, but they do not prevent empty results. Faceted search supports exploratory search, i.e., guided navigation that returns rich feedback to users and prevents them from falling into dead-ends (empty results). However, faceted search systems do not offer the same expressiveness as query languages. We introduce Query-based Faceted Search (QFS), the combination of an expressive query language and faceted search, to reconcile the two paradigms. In this paper, the LISQL query language generalizes existing semantic faceted search systems, and covers most features of SPARQL. A prototype, Sewelis (aka. Camelis 2), has been implemented, and a usability evaluation demonstrated that QFS retains the ease-of-use of faceted search, and enables users to build complex queries with little training.
Faceted search and querying are the two main paradigms to search the Semantic Web. Querying languages, such as SPARQL, offer expressive means for searching knowledge bases, but they are difficult to use. Query assistants help users to write well-formed queries, but they do not prevent empty results. Faceted search supports exploratory search, i.e., guided navigation that returns rich feedback to users and prevents them from falling into dead-ends (empty results). However, faceted search systems do not offer the same expressiveness as query languages. We introduce semantic faceted search, the combination of an expressive query language and faceted search to reconcile the two paradigms. The query language is basically SPARQL, but with a syntax that better fits in a faceted search interface. A prototype, Camelis 2, has been implemented, and a usability evaluation demonstrated that semantic faceted search retains the ease-of-use of faceted search, and enables users to build complex queries with little training.
Updating existing knowledge bases is crucial to take into account new information that is regularly discovered. However, current Semantic Web data are rarely updated by users, who are not sufficiently assisted when adding and updating objects. We propose an approach to help the user add new objects incrementally and dynamically. Our approach is based on Logical Information Systems for user interaction. To guide the user, the system searches for objects having properties in common with the description of the object being created. The properties of these objects that are not yet in the description serve as suggestions to complete it.
Updating existing knowledge bases is crucial to take into account the new information that is regularly discovered. However, this is quite tedious, and in practice Semantic Web data are rarely updated by users. This paper presents UTILIS, an approach to help users create and update objects in RDF(S) bases. While creating a new object, $o$, UTILIS searches for similar objects, found by applying relaxation rules to the description of $o$, taken as a query. The resulting objects and their properties serve as suggestions to expand the description of $o$.
In this paper, we present an automatic process based on the text reduction approach introduced by Hoey. Applying this kind of approach to large texts by hand is difficult, so we propose an automatic process for treating them. We have conducted experiments on different kinds of texts (narrative, expositive) to show the benefits of the approach.
Languages play a central role in the Semantic Web. An important aspect regarding their design is syntax, as it plays a crucial role in the wide acceptance of the Semantic Web approach. As for programming languages, an evolution can be observed from low-level to high-level designs. High-level languages not only allow more people to contribute by abstracting from the details, but also make experienced people more productive and make the produced documents easier to share and maintain. We introduce SQUALL, a high-level language for querying and updating semantic data. It has a strong adequacy with RDF, an expressiveness very similar to SPARQL 1.1, and a controlled natural language syntax that completely abstracts from low-level notions such as bindings and relational algebra. We first give an informal presentation of SQUALL through examples, comparing it with SPARQL. We then formally define the syntax and semantics of SQUALL as a Montague grammar, and its translation to SPARQL.
Faceted search and querying are the two main paradigms to search the Semantic Web. Querying languages, such as SPARQL, offer expressive means for searching knowledge bases, but they are difficult to use. Query assistants help users to write well-formed queries, but they do not prevent empty results. Faceted search supports exploratory search, i.e., guided navigation that returns rich feedback to users and prevents them from making navigation steps that lead to empty results (dead-ends). However, faceted search systems do not offer the same expressiveness as query languages. We introduce {\em semantic faceted search}, the combination of an expressive query language and faceted search to reconcile the two paradigms. The query language, LISQL, is basically SPARQL, but with a syntax that extends Turtle with disjunction and negation, and that better fits in a faceted search interface. We formalize the navigation of faceted search as a navigation graph, where nodes are queries, and navigation links are query transformations. We prove that this navigation graph is {\em safe} (no dead-end), and {\em complete} (every query that is not a dead-end can be reached by navigation). That formalization is itself a contribution to faceted search. A prototype, Camelis~2, has been implemented, and a usability evaluation with graduate students demonstrated that semantic faceted search retains the ease-of-use of faceted search, and enables most users to build complex queries with little training.
Query-based Faceted Search (QFS), introduced in a research paper at ISWC'11, reconciles the expressiveness of querying languages (e.g., SPARQL), and the benefits of exploratory search found in faceted search. Because of the interactive nature of QFS, which is difficult to fully render in a research paper, we feel it is important to complement it with a demonstration of our QFS prototype, Sewelis (aka. Camelis 2). An important addition to the research paper is the extension of QFS to the guided edition of RDF bases, where suggestions are based on existing data. This paper motivates our approach, shortly presents Sewelis, and announces the program of the demonstration. Screencasts of the demonstration, as well as material (program and data) to reproduce it, are available at {\tt http://www.irisa.fr/LIS/softwares/sewelis}.
2010 |
The concept of pregroup was introduced by Lambek for natural language analysis, with a close link to non-commutative linear logic. We reformulate the pregroup calculus so as to extend it by composition with other logics and calculi. The cut elimination property and the decidability property of the sequent calculus proposed in the article are shown. Properties of composed calculi are also discussed. |
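As background on the calculus being reformulated (standard pregroup facts, not the article's composed systems), each type $a$ has a left adjoint $a^l$ and a right adjoint $a^r$ subject to the contractions \[ a^{l}\,a \;\le\; 1, \qquad a\,a^{r} \;\le\; 1, \] and a sentence is accepted when the product of its word types reduces to the sentence type $s$; for instance, with the usual assignment $\mathit{John}: n$, $\mathit{sees}: n^{r}\,s\,n^{l}$, $\mathit{Mary}: n$, one has \[ n\,(n^{r}\,s\,n^{l})\,n \;\le\; (n\,n^{r})\,s\,(n^{l}\,n) \;\le\; s. \]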
In the database field, tools that compute association rules and functional dependencies traditionally display their results as a list of rules, which is difficult to read. We propose here to project a database relation onto an OLAP data cube in order to display these rules in a more structured and more intuitive way. Moreover, we show that OLAP navigation links can help the user navigate through the produced rules.
Discovering dependencies in data is a well-known problem in database theory. The most common rules are Functional Dependencies (FDs), Conditional Functional Dependencies (CFDs) and Association Rules (ARs). Many tools can display those rules as lists, but those lists are often too long for inspection by users. We propose a new way to display and navigate through those rules. Display is based on On-Line Analytical Processing (OLAP), presenting a set of rules as a cube, where dimensions correspond to the premises of rules. Cubes reflect the hierarchy that exists between FDs, CFDs and ARs. Navigation is based on a lattice, where nodes are OLAP views and edges are OLAP navigation links, and guides users from cube to cube. We present an illustrative example with the help of our prototype.
We study the learnability problem in the family of Categorial Dependency Grammars (CDG), a class of categorial grammars defining unlimited dependency structures. CDG satisfying a reasonable condition on the iterated dependencies are shown to be incrementally learnable in the limit.
Concept lattices have been successfully used for information retrieval and browsing. They offer the advantage of combining querying and navigation in a consistent way. Conceptual navigation is more flexible than hierarchical navigation, and easier to use than plain querying. It has already been applied to formal, logical, and relational contexts, but its application to the semantic web is a challenge because of inference mechanisms and expressive query languages such as SPARQL. The contribution of this paper is to extend conceptual navigation to the browsing of RDF graphs, where concepts are accessed through SPARQL-like queries. This extended conceptual navigation is proved consistent w.r.t. the context (i.e., never leads to an empty result set), and complete w.r.t. the conjunctive fragment of the query language (i.e., every query can be reached by navigation only). Our query language has an expressivity similar to SPARQL, and has a more natural syntax close to description logics. |
We explore different perspectives on how categorial grammars can be considered as Logical Information Systems (LIS), both theoretically and practically. Categorial grammars already have close connections with logic. We discuss the advantages of integrating both approaches. We consider more generally different ways of connecting computational linguistic data and LIS as an application of Formal Concept Analysis.
2009 |
Since the arrival of digital cameras, many people are faced with the challenge of organizing and browsing the overwhelming flood of photos their life produces. The same is true for all sorts of documents, e.g.~emails and audio files. Existing systems either let users fill query boxes without any assistance, or drive them through rigid navigation structures (e.g., hierarchies); or they do not let users put annotations on their documents, even though annotations would support the organization and retrieval of any document along customized criteria. We present a tool, {\sc Camelis}, that offers users an organization that is dynamically computed from documents and their annotations. {\sc Camelis} is designed along the lines of Logical Information Systems (LIS), which are founded on logical concept analysis. Hence, (1) an expressive language can be used to describe photos and query the collection, (2) manual and automatic annotations can be smoothly integrated, and (3) expressive querying and flexible navigation can be mixed in the same search and in any order. This presentation is illustrated on a real collection of more than 5,000 photos.
Managing and supervising security in large networks has become a challenging task, as new threats and flaws are discovered on a daily basis. This requires an in-depth and up-to-date knowledge of the context in which security-related events occur. Several tools have been proposed to support security operators in this task, each of which focuses on some specific aspects of the monitoring. Many alarm fusion and correlation approaches have also been investigated. However, most of these approaches suffer from two major drawbacks. First, they only take advantage of the information found in alerts, which is not sufficient to achieve the goals of alert correlation, that is to say, to reduce the overall amount of alerts while enhancing their semantics. Second, these techniques have been designed on an ad hoc basis and lack a shared data model that would allow them to reason about events in a cooperative way. In this paper, we propose a federative data model for security systems to query and assert knowledge about security incidents and the context in which they occur. This model constitutes a consistent and formal ground to represent the information that is required to reason about complementary evidence, in order to confirm or invalidate alerts raised by intrusion detection systems.
Pregroup grammars are a mathematical formalism in the spirit of categorial grammars. They are close to logical formalisms like the Lambek calculus but have a polynomial parsing algorithm. The paper presents a parser based on pregroup grammars that uses a tabular approach based on majority partial composition.
Pregroup grammars are a formalism in the spirit of categorial grammars and the Lambek calculus. In contrast to the latter, their parsing is polynomial. We present in this article a toolbox that contains a parser and a set of programs for the construction and use of grammars, in particular for French. |
Many decisions are taken in committee, for example to allocate resources. The decision criteria are difficult to express, and the overall situation is generally too complex for the participants to grasp fully. In this article, we describe a decision process in which concept analysis is used to face these problems. Thanks to concept analysis, fair-minded people have the possibility to be fair towards the candidates and to be consistent in their judgements throughout the meeting.
Formal concept analysis is recognized as a good paradigm for browsing data sets. Besides browsing, update and complex data are other important aspects of information systems. To have an efficient implementation of concept-based information systems is difficult because of the diversity of complex data and the computation of conceptual structures, but essential for the scalability to real-world applications. We propose to decompose contexts into simpler and specialized components: logical context functors. We demonstrate this allows for scalable implementations, updatable ontologies, and richer navigation structures, while retaining genericity. |
Exploratory search is about browsing and understanding an information base. It requires a more interactive process than retrieval search, where the user sends a query and the system returns answers. Logical Information Systems (LIS) support exploratory search by guiding users in the construction of queries, and by giving summaries of query results. The contribution of this paper is to adapt and extend LIS to the Semantic Web. We define a summarization and navigation data structure that provides rich views over data, and guides users from view to view. This guiding is proved consistent (i.e., never leads to an empty result set), and complete (i.e., every query can be reached by navigation only). The query language covers a large fragment of SPARQL, and is more concise by using a syntax close to description logics. Our approach could be implemented on top of existing tools for storing, reasoning and querying. |
One often hears complaints about software applications that are ill-suited to their users. We believe that computer science students must be made aware of how to take users into account in the design and development of software. This is not a matter of programming techniques, but it is nevertheless delicate and complex enough to justify a course. In the computer science department of INSA Rennes, students carry out a participatory-design analysis of a consumer product. It is group and collaborative work: all groups work on the same product but with different users. The course is also its own illustration: student feedback is systematically collected and the course is constantly evolving.
2008 |
We discuss the relationship between pregroups and dependency grammars. Conventional pregroup grammars do not formally account for optionality and iteration. We introduce Gentzen-style rules to take care of two new operations, and an equivalent rewriting system. The extended pregroup calculus enjoys several properties shared with traditional dependency grammars, yet does not significantly increase the polynomial complexity of syntactic analysis for pregroup grammars.
Today, the thematic layer is still the prevailing structure in geomatics for handling geographical information. However, the layer model is rigid: it implies partitioning geographical data into predefined categories and using the same description schema for all elements of a layer. Recently, Logical Information Systems (LIS) introduced a new paradigm for information management and retrieval. Using LIS, we propose a more flexible organisation of vectorial geographical data at a finer level, since it is centred on the geographical feature. LIS do not rely on a hierarchical organisation of information, and enable querying and navigation to be tightly combined. In this article, we present the use of LIS to handle geographical data. In particular, we detail a data model for geographical features and the corresponding querying and navigation model. These models have been implemented in the GEOLIS prototype, which has been used to conduct experiments on real data.
Formal Concept Analysis (FCA) is a natural framework to learn from examples. Indeed, learning from examples results in sets of frequent concepts whose extent contains mostly these examples. In terms of association rules, the above learning strategy can be seen as searching the premises of rules where the consequence is fixed. In its most classical setting, FCA considers attributes as a non-ordered set. When attributes of the context are partially ordered to form a taxonomy, Conceptual Scaling allows the taxonomy to be taken into account by producing a context completed with all attributes deduced from the taxonomy. The drawback, however, is that concept intents contain redundant information. In this article, we propose a parameterized algorithm to learn rules in the presence of a taxonomy. It works on a non-completed context. The taxonomy is taken into account during the computation so as to remove all redundancies from intents. By simply changing one of its operations, this parameterized algorithm can compute various kinds of concept-based rules. We present instantiations of the parameterized algorithm to learn rules as well as to compute the set of frequent concepts.
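For readers unfamiliar with the FCA background assumed here, this toy Python sketch (illustrative only, not the paper's taxonomy-aware parameterized algorithm) shows the two derivation operators and the closure that yields a formal concept from a set of example objects.

    # Plain FCA background on a toy formal context: the two derivation
    # operators and the closure that yields a formal concept.
    CONTEXT = {                      # object -> attributes it has
        "sparrow": {"bird", "flies"},
        "penguin": {"bird", "swims"},
        "trout":   {"fish", "swims"},
    }

    def intent(objects):
        """Attributes shared by all given objects."""
        if not objects:
            return set.union(*CONTEXT.values())
        return set.intersection(*(CONTEXT[o] for o in objects))

    def extent(attributes):
        """Objects having all given attributes."""
        return {o for o, attrs in CONTEXT.items() if attributes <= attrs}

    # Closing a set of examples gives a formal concept (extent, intent).
    examples = {"sparrow", "penguin"}
    concept_intent = intent(examples)            # {'bird'}
    concept_extent = extent(concept_intent)      # {'sparrow', 'penguin'}
    print(concept_extent, concept_intent)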
The semantic web aims at enabling the web to understand and answer the requests from people and machines. It relies on several standards for representing and reasoning about web contents. Among them, the Web Ontology Language (OWL) is used to define ontologies, i.e. knowledge bases, and is formalized with description logics. In this paper, we demonstrate how dynamic taxonomies and their benefits can be transposed to browse OWL~DL ontologies. We only assume the ontology has an assertional part, i.e. defines objects and not only concepts. The existence of relations between objects in OWL leads us to define new navigation modes for crossing these relations. A prototype, Odalisque, has been developed on top of well-known tools for the semantic web. |
Pregroup grammars are a context-free grammar formalism which may be used to describe the syntax of natural languages. However, this formalism is not able to easily define types corresponding to optional or iterated arguments, like an optional complement of a verb or a sequence of its adverbial modifiers. This paper introduces two constructions that make up for this deficiency.
Because of the expansion of geo-positioning tools and the democratization of geographical information, the amount of geo-localized data available around the world keeps increasing. Thus, the ability to efficiently retrieve information according to its geographical facet is an important issue. In addition to individual properties such as position and shape, spatial relations between objects are an important criterion for selecting and reaching objects of interest: e.g., given a set of touristic points, selecting those having a nearby hotel, or reaching the nearby hotels. In this paper, we propose Logical Concept Analysis (LCA) and its handling of relations for representing and reasoning on various kinds of spatial relations, e.g., Euclidean distance and topological relations. Furthermore, we present an original way of navigating in geo-localized data, and compare the benefits of our approach with traditional Geographical Information Systems (GIS).
Recent work in fault localization crosschecks correct and failing execution traces. The implicit underlying technique is to search for association rules which indicate that executing a particular source line will cause the whole execution to fail. This technique, however, has limitations. In this article, we first propose to consider more expressive association rules where several lines imply failure. We then propose to use Formal Concept Analysis (FCA) to analyze the resulting numerous rules in order to improve the readability of the information contained in the rules. The main contribution of this article is to show that applying two data mining techniques, association rules and FCA, produces better results than existing fault localization techniques.
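A sketch of the association-rule ingredient (illustrative data and helper names only; the paper further organizes such rules with FCA): each execution trace is the set of executed lines plus a pass/fail verdict, and we compute the support and confidence of rules stating that executing a given set of lines implies failure.

    # Association rules "executing these lines => failure" over labelled
    # execution traces (illustrative helper, not the paper's algorithm).
    from itertools import combinations

    TRACES = [
        ({1, 2, 3, 5}, "fail"),
        ({1, 2, 4, 5}, "fail"),
        ({1, 3, 4},    "pass"),
        ({1, 2, 4},    "pass"),
    ]

    def rule_stats(premise):
        """Support and confidence of: premise lines executed => failure."""
        covering = [verdict for lines, verdict in TRACES if premise <= lines]
        if not covering:
            return 0.0, 0.0
        support = len(covering) / len(TRACES)
        confidence = covering.count("fail") / len(covering)
        return support, confidence

    # Rules whose premise is a pair of lines and which always imply failure.
    all_lines = set().union(*(lines for lines, _ in TRACES))
    for premise in map(set, combinations(sorted(all_lines), 2)):
        support, confidence = rule_stats(premise)
        if confidence == 1.0 and support > 0:
            print(sorted(premise), "=> fail", f"(support={support:.2f})")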
In academia, many decisions are taken in committee, for example to hire people or to allocate resources. Well-intentioned people often leave such meetings quite frustrated. Indeed, it is intrinsically hard to make multi-criteria decisions: selection criteria are hard to express and the global picture is too large for participants to embrace it fully. In this article, we describe a recruiting process where logical concept analysis and formal concept analysis are used to address the above problems. We do not pretend to totally eliminate the arbitrary side of the decision. We claim, however, that, thanks to concept analysis, well-intentioned people have the possibility to 1) be fair with the candidates, 2) make a decision adapted to the circumstances, 3) smoothly express the rationales of decisions, 4) be consistent in their judgements during the whole meeting, 5) vote (or be arbitrary) only when all possibilities for consensus have been exhausted, and 6) make sure that the result, in general a total order, is consistent with the partial orders resulting from the multiple criteria.
Dynamic taxonomies and faceted search are increasingly used to organize and browse document collections. The main function of dynamic taxonomies is to start with the full collection, and zoom-in to a small enough subset of items for direct inspection. In this paper, we present other navigation modes than zoom-in for less directed and more exploratory browsing of a document collection. The presented navigation modes are zoom-out, shift, pivot, and querying by examples. These modes correspond to query transformations, and make use of boolean operators. Therefore, the current focus is always clearly specified by a query. |
2007 |
This paper investigates the learnability from positive examples, in the sense of Gold, of Pregroup Grammars. In a first part, Pregroup Grammars are presented and a new parsing strategy is proposed. Then, theoretical learnability and non-learnability results for subclasses of Pregroup Grammars are proved. In the last two parts, we focus on learning Pregroup Grammars from a special kind of input called feature-tagged examples. A learning algorithm based on the parsing strategy presented in the first part is given. Its validity is proved and its properties are exemplified.
In this paper, we present our recent results on the acquisition of the syntax of natural languages, from the point of view of the theory of grammatical inference. Given a class of possible grammars, the objective is to identify, from a set of positive examples, a grammar in the class which produces the examples. The Gold model formalises the learning process and gives stringent criteria for its success: when does there exist an algorithm producing a target grammar? What kind of structure should the examples have (strings of words, strings of tagged words, trees)? From a theoretical point of view, our results establish the learnability or the unlearnability of various classes of categorial grammars. From a practical perspective, these results enable the extraction of syntactic information from real data. Finally, we discuss the interest of this approach for modelling child language acquisition and for automated induction of grammars from corpora.
Geographical data are mainly structured in layers of information. However, this model of organisation is not convenient for navigation inside a dataset, and so limits geographical data exploration to querying. We think information retrieval could be made easier in GIS by the introduction of navigation based on geographical object properties. For this purpose, we propose a prototype, GEOLIS, which tightly combines querying and navigation in the search process for geographical data. GEOLIS relies on Logical Information Systems (LIS), which are based on Formal Concept Analysis (FCA) and logics. In this paper, we detail the data organisation and the navigation process in GEOLIS. We also present the results of an experiment conducted on a real dataset.
Pregroup grammars are a context-free grammar formalism introduced as a simplification of Lambek calculus. This formalism is interesting for several reasons: the syntactical properties of words are specified by a set of types, as in other type-based grammar formalisms; as a logical model, compositionality is easy; a polytime parsing algorithm exists. However, this formalism is not completely lexicalized, because each pregroup grammar is based on the free pregroup built from a set of primitive types together with a partial order, and this order is not lexical information. In fact, only the pregroup grammars based on primitive types whose order is equality can be seen as fully lexicalized. We show here how to transform, using a morphism on types, a particular pregroup grammar into another pregroup grammar that uses equality as the order on primitive types. This transformation is at most quadratic in size (linear for a fixed set of primitive types); it preserves the parse structures of sentences and the number of types assigned to a word.
Formal Concept Analysis (FCA) is a natural framework for learning from positive and negative examples. Indeed, learning from examples results in sets of frequent concepts whose extent contains only these examples. In terms of association rules, the above learning strategy can be seen as searching the premises of exact rules where the consequence is fixed. In its most classical setting, FCA considers attributes as a non-ordered set. When the attributes of the context are ordered, Conceptual Scaling allows the related taxonomy to be taken into account by producing a context completed with all attributes deduced from the taxonomy. The drawback, however, is that concept intents contain redundant information. In this article, we propose a parameterized generalization of a previously proposed algorithm, in order to learn rules in the presence of a taxonomy. The taxonomy is taken into account during the computation so as to remove all redundancies from intents. By simply changing one component, the parameterized algorithm can compute various kinds of concept-based rules. We present instantiations of the parameterized algorithm for learning positive and negative rules.
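To make the learning task concrete, here is a naive, brute-force sketch of the search for premises of exact rules with a fixed consequence. It ignores the taxonomy and does not reproduce the paper's parameterized algorithm; the context, the size bound, and the function names are invented for illustration.

```python
# A naive, brute-force sketch of the learning task described above:
# find premises (attribute sets) of exact rules whose fixed consequence
# is a target attribute.  The taxonomy is ignored, and the paper's
# parameterized algorithm is not reproduced.

from itertools import combinations
from typing import Dict, FrozenSet, List, Set

def extent(context: Dict[str, Set[str]], attrs: FrozenSet[str]) -> Set[str]:
    return {o for o, d in context.items() if attrs <= d}

def exact_rule_premises(context: Dict[str, Set[str]],
                        target: str, max_size: int = 2) -> List[FrozenSet[str]]:
    """Minimal attribute sets P (|P| <= max_size) such that extent(P) is
    non-empty and included in extent({target})."""
    attributes = set().union(*context.values()) - {target}
    positives = extent(context, frozenset({target}))
    found: List[FrozenSet[str]] = []
    for size in range(1, max_size + 1):
        for combo in combinations(sorted(attributes), size):
            premise = frozenset(combo)
            if any(p < premise for p in found):
                continue                      # keep only minimal premises
            ext = extent(context, premise)
            if ext and ext <= positives:
                found.append(premise)
    return found

if __name__ == "__main__":
    ctx = {"o1": {"a", "b", "target"},
           "o2": {"a", "c", "target"},
           "o3": {"b", "c"}}
    print(exact_rule_premises(ctx, "target"))   # [frozenset({'a'})]
```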
Since the arrival of digital cameras, many people are faced with the challenge of organizing and retrieving the overwhelming flow of photos their life produces. Most people put no metadata on their photos, and we believe this is because existing tools make very limited use of them. We present a tool, Camelis, that offers users an organization of photos that is dynamically computed from the metadata, making the effort to produce it worthwhile. Camelis is designed along the lines of Logical Information Systems (LIS), which are founded on logical concept analysis. Hence, (1) an expressive language can be used to describe photos and query the collection, (2) manual and automatic metadata can be smoothly integrated, and (3) expressive querying and flexible navigation can be mixed in the same search and in any order. This presentation is illustrated by experience with a real collection of more than 5,000 photos.
Strings are an important part of most real application multi-valued contexts. Their conceptual treatment requires the definition of substring scales, i.e., sets of relevant substrings, so as to form informative concepts. However, these scales are either defined by hand, or derived in a context-unaware manner (e.g., all words occurring in string values). We present an efficient algorithm based on suffix trees that produces complete and concise substring scales. Completeness ensures that every possible concept is formed, as when considering the scale of all substrings. Conciseness ensures that the number of scale attributes (substrings) is less than the cumulated size of all string values. This algorithm is integrated in Camelis, and illustrated on the set of all ICCS paper titles.
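For reference, here is a brute-force toy illustrating what a complete substring scale must achieve: keeping only one representative substring per distinct extent already forms the same concepts as the scale of all substrings. This is not the suffix-tree algorithm of the paper (it is quadratic in the string lengths), and the example data are invented.

```python
# A naive illustration of a complete substring scale: keep one
# representative substring per distinct extent (set of objects
# containing it).  This brute-force version is *not* the suffix-tree
# algorithm of the paper, only a toy reference point.

from typing import Dict, FrozenSet, Set

def all_substrings(s: str) -> Set[str]:
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}

def substring_scale(values: Dict[str, str]) -> Dict[str, FrozenSet[str]]:
    """Map one representative substring (the longest) to each distinct extent."""
    candidates = set().union(*(all_substrings(v) for v in values.values()))
    by_extent: Dict[FrozenSet[str], str] = {}
    for sub in candidates:
        ext = frozenset(o for o, v in values.items() if sub in v)
        if len(sub) > len(by_extent.get(ext, "")):
            by_extent[ext] = sub              # keep one representative per extent
    return {sub: ext for ext, sub in by_extent.items()}

if __name__ == "__main__":
    titles = {"p1": "concept analysis", "p2": "logical analysis"}
    scale = substring_scale(titles)
    print(len(scale), "scale attributes for",
          sum(map(len, titles.values())), "characters of string values")
```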
Dynamic taxonomies have been proposed as a solution for combining querying and navigation, offering both expressivity and interactivity. Navigation is based on the filtering of a multidimensional taxonomy w.r.t. query answers, which helps users to focus their search. We show that properties that are commonly used only in queries can be integrated in taxonomies, and hence in navigation, by the use of so-called logics. Hand-designed taxonomies and concrete domains (e.g., dates, strings) can be combined so as to form complex taxonomies. For instance, valued attributes can be handled, and different roles between documents and locations can be distinguished. Logical Information Systems (LIS) are characterized by the combination of querying and navigation, and the systematic use of logics. |
The concept of pregroup was introduced by Lambek for natural language analysis, with a close link to non-commutative linear logic. We reformulate the pregroup calculus so as to extend it by composition with other logics and calculi. The cut elimination property and the decidability property of the sequent calculus proposed in the article are shown. Properties of composed calculi are also discussed.
2006 |
This paper is concerned with learning categorial grammars from positive examples in the model of Gold. Functor-argument structures (written FA) are the usual syntactic decompositions of sentences into sub-components, distinguishing the functional parts from the argument parts, as defined for classical categorial grammars, also known as AB-grammars. In the case of non-associative type-logical grammars, we propose a similar notion that we call generalized functor-argument structures, and we show that these structures capture the essence of the non-associative Lambek (NL) calculus without product. We show that (i) rigid and k-valued non-associative Lambek (NL without product) grammars are learnable from generalized functor-argument structured sentences. We also define subclasses of k-valued grammars in terms of arity. We first show that (ii) for each k and each bound on arity, the class of FA-arity bounded k-valued NL languages of FA structures is finite, and (iii) that FA-arity bounded k-valued NL grammars are learnable both from strings and from FA structures as a corollary. Result (i) is obtained from (ii); this learnability result (i) is interesting and surprising when compared to other results: in fact, we also show that (iv) this class has infinite elasticity. Moreover, these classes are very close to classes like rigid associative Lambek grammars learned from natural deduction structured sentences (which are different and much richer than FA or generalized FA structures), or k-valued non-associative Lambek grammars, which are unlearnable from strings or even from bracketed strings. Thus, the class of k-valued non-associative Lambek grammars learned from generalized functor-argument sentences lies at the frontier between learnable and unlearnable classes of languages.
Software reuse requires that programmers be able to locate reusable components in software repositories. We propose to apply Logical Information Systems, a general information retrieval framework able to combine arbitrary indexing schemes, to querying software repositories. As an illustration, the indexing of the methods of a package is studied, and three indexing schemes are presented in this framework: a formal one, a semi-formal one, and an informal one. The formal scheme captures object orientation by combining type isomorphism axioms and inheritance relations. The semi-formal scheme captures naming conventions, and the informal one captures keywords in comments. The theory of the formal scheme and details of the experiments are presented.
Thematic issue on AFADL'04.
Today, the thematic layer is still the prevailing structure in geomatics for handling geographical information. However, the layer model is rigid: it implies partitioning geographical data into predefined categories and using the same description schema for all elements of a layer. Recently, Logical Information Systems (LIS) introduced a new paradigm for information management and retrieval. Using LIS, we propose a more flexible organisation of vectorial geographical data at a finer level, since it is centered on the geographical feature. LIS do not rely on a hierarchical organisation of information, and they enable querying and navigation to be tightly combined in the same search. In this article, we present work in progress on the use of the LIS model to handle geographical data. In particular, we detail a data model for geographical features and the corresponding querying and navigation model. These models have been implemented in the GEOLIS prototype, which has been used to conduct experiments with real data.
Formal Concept Analysis (FCA) considers attributes as a non-ordered set. This is appropriate when the data set is not structured. When an attribute taxonomy exists, existing techniques produce a completed context with all attributes deduced from the taxonomy. Usual algorithms can then be applied on the completed context for finding frequent concepts, but the results systematically contain redundant information. This article describes an algorithm which allows the frequent concepts of a formal context with taxonomy to be computed. It works on a non-completed context and uses the taxonomy information when needed. The results avoid the redundancy problem with equivalent performance. |
The main contribution of this document is an approach that places the operator at the heart of the analysis of weakly structured alarm logs, allowing them to use what they know, even if their knowledge is partial, without overwhelming them with information. Structured temporal patterns are extracted by aggregating generalized alarms and by correlation based on the dates of the alarms and on the similarity of attributes other than the date. The approach is applied to the alarms produced by a VPN (Virtual Private Network) concentrator. A case study shows how 5,000 alarms can be grouped into 50 patterns.
Logic Functors form a framework for specifying new logics, and deriving automatically theorem provers and consistency/completeness diagnoses. Atomic functors are logics for manipulating symbols and concrete domains, while other functors are logic transformers that may add connectives or recursive structures, or may alter the semantics of a logic. The semantic structure of the framework is model-theoretic, as opposed to the verifunctional style often used in classical logic. This comes close to the semantics of description logics, and we show indeed that the logic ${\cal ALC}$ can be rebuilt using logic functors. This offers the immediate advantage that variants of ${\cal ALC}$ can be explored and implemented almost for free. This report comes with extensive appendices describing in detail a toolbox of logic functors (definitions, algorithms, theorems, and proofs).
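A small sketch of the logic-functor idea follows: a logic is packaged as an object exposing (at least) an entailment test, and functors build new logics from old ones. The functor names (Atoms, Interval, Prod) and the reduced interface are invented for illustration; they are not the toolbox described in the report.

```python
# A small sketch of logic functors: atomic functors build base logics,
# and logic transformers compose them.  A logic is reduced here to its
# entailment relation; the functor names are invented for illustration.

from dataclasses import dataclass
from typing import Any, Callable, Tuple

@dataclass
class Logic:
    """A logic reduced to its entailment relation for this sketch."""
    entails: Callable[[Any, Any], bool]

def Atoms() -> Logic:
    # Atomic functor: a flat logic of atoms (entailment is equality).
    return Logic(lambda a, b: a == b)

def Interval() -> Logic:
    # Atomic functor: integer intervals ordered by inclusion.
    def entails(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
        return b[0] <= a[0] and a[1] <= b[1]      # a lies inside b
    return Logic(entails)

def Prod(l1: Logic, l2: Logic) -> Logic:
    # Logic transformer: product of two logics, component-wise entailment.
    return Logic(lambda a, b: l1.entails(a[0], b[0]) and l2.entails(a[1], b[1]))

if __name__ == "__main__":
    # A composed logic describing (author, year-range) pairs.
    desc_logic = Prod(Atoms(), Interval())
    print(desc_logic.entails(("ridoux", (1999, 1999)),
                             ("ridoux", (1990, 2000))))   # True
```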
2005 |
Many application domains make use of specific data structures such as sequences and graphs to represent knowledge. These data structures are ill-fitted to the standard representations used in machine learning and data-mining algorithms: propositional representations are not expressive enough, and first-order ones are not efficient enough. In order to efficiently represent and reason on these data structures, and the complex patterns that are related to them, we use domain-specific logics. We show that these logics can be built by the composition of logical components that model elementary data structures. The standard strategies of top-down and bottom-up search are ill-suited to some of these logics, and lack flexibility. We therefore introduce a dichotomic search strategy, analogous to a dichotomic search in an ordered array. We prove that this provides more flexibility in the search, while retaining completeness and non-redundancy. We present a novel algorithm for learning using domain-specific logics and dichotomic search, and analyse its complexity. We also describe two applications which illustrate the search for motifs in sequences, where these motifs have arbitrary length and length-constrained gaps. In the first application, sequences represent the trains of the East-West challenge; in the second, they represent the secondary structure of Yeast proteins for the discrimination of their biological functions.
In this paper, we define Dependency Structure Grammars (DSG), rewriting-rule grammars that generate sentences together with their dependency structures; they are more expressive than CF-grammars and not equivalent to mildly context-sensitive grammars. We show that DSG are weakly equivalent to Categorial Dependency Grammars (CDG) recently introduced in [6,3]. In particular, these dependency grammars naturally express long-distance dependencies and enjoy good mathematical properties.
This paper is concerned with the inference of categorial grammars, a context-free grammar formalism in the field of computational linguistics. A recent result has shown that whereas they are not learnable from strings in the model of Gold, rigid and k-valued non-associative Lambek grammars are still learnable from generalized functor-argument structured sentences. We focus here on the algorithmic part of this result and provide an algorithm that can be seen as an extension of Buszkowski, Penn and Kanazawa's contributions for classical categorial grammars.
The notion of k-valued categorial grammars, where a word is associated to at most k types, is often used in the field of lexicalized grammars as a fruitful constraint for obtaining several properties like the existence of learning algorithms. This principle is relevant only when the classes of k-valued grammars correspond to a real hierarchy of languages. This paper establishes the relevance of this notion for two related grammatical systems. In the first part, the classes of k-valued non-associative Lambek (NL) grammars without product are proved to define a strict hierarchy of languages. The second part introduces the notion of generalized functor-argument structure for the non-associative Lambek calculus without product but allowing empty antecedents ($NL_{\emptyset}$), and establishes that the classes of k-valued $NL_{\emptyset}$ grammars without product also form a strict hierarchy of languages.
The current trend in debugging and testing is to cross-check information collected during several executions. Jones et al., for example, propose to use the instruction coverage of passing and failing runs in order to visualize suspicious statements. This seems promising but lacks a formal justification. In this paper, we show that the method of Jones et al. can be re-interpreted as a data mining procedure. More particularly, they define an indicator which characterizes association rules between data. With this formal framework we are able to explain intrinsic limitations of the above indicator. |
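To make the discussion concrete, here is a small sketch of the kind of coverage-based suspiciousness indicator popularised by Jones et al.: the ratio of failing to passing coverage per statement. The run data and the line identifiers are invented, and the sketch does not reproduce the paper's association-rule reformulation of the indicator.

```python
# A minimal sketch of a coverage-based suspiciousness indicator in the
# style of Jones et al.: statements covered mostly by failing runs get a
# score close to 1.  The runs and line identifiers are invented.

from typing import Dict, List, Set

def suspiciousness(runs: List[Set[int]], verdicts: List[bool]) -> Dict[int, float]:
    """runs[i] = set of covered lines in run i, verdicts[i] = True if run i passed."""
    total_pass = sum(verdicts) or 1
    total_fail = (len(verdicts) - sum(verdicts)) or 1
    lines = set().union(*runs)
    scores = {}
    for line in lines:
        passed = sum(1 for cov, ok in zip(runs, verdicts) if ok and line in cov)
        failed = sum(1 for cov, ok in zip(runs, verdicts) if not ok and line in cov)
        fail_rate, pass_rate = failed / total_fail, passed / total_pass
        scores[line] = fail_rate / (fail_rate + pass_rate) if fail_rate + pass_rate else 0.0
    return scores

if __name__ == "__main__":
    coverage = [{1, 2, 3}, {1, 3}, {1, 2, 4}]
    verdicts = [True, True, False]            # third run fails
    print(suspiciousness(coverage, verdicts))  # line 4 gets the highest score
```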
A program invariant is a property that holds for every execution of the program. Recent work suggests inferring likely invariants via dynamic analysis. A likely invariant is a property that holds for some executions but is not guaranteed to hold for all executions. In this paper, we present work in progress addressing the challenging problem of automatically verifying that likely invariants are actual invariants. We propose a constraint-based reasoning approach that is able, unlike other approaches, to both prove and disprove likely invariants. In the latter case, our approach provides counter-examples. We illustrate the approach on a motivating example where automatically generated likely invariants are verified.
A logical view of formal concept analysis considers attributes of a formal context as unary predicates. In a first part, we propose an augmented definition that handles binary relations between objects. A Galois connection is defined on augmented contexts. It represents concept inheritance as usual, but also relations between concepts. As usual, labeling operators are also defined. In particular, concepts and relations are visible and labeled in a single structure. In a second part, we show how relations can be used for navigating in an augmented concept lattice. This part augments the theory of Logical Information Systems. An implementation is sketched, and first experimental results are presented.
Tracers provide users with useful information about program executions. In this paper we propose a ``tracer driver'': from a single tracer, it provides a powerful front-end for multiple dynamic analysis tools while limiting the overhead of trace generation. The tracer driver can be used both synchronously and asynchronously. The relevant execution events are specified by flexible event patterns, and a large variety of trace data can be given either systematically or ``on demand''. The proposed tracer driver has been designed and experimented with in the context of constraint logic programming, within GNU-Prolog. Its principles are, however, independent of the traced programming language. Experimental measures show that the flexibility and power of the described architecture come with reasonable performance.
Programs with constraints are hard to debug. In this paper, we describe a general architecture to help develop new debugging tools for constraint programming. The possible tools are fed by a single general-purpose tracer. A tracer driver is used to adapt the actual content of the trace to the needs of each tool. This enables the tools and the tracer to communicate in a client-server scheme. Each tool describes the execution data it needs by means of event patterns. The tracer driver scrutinizes the execution according to these event patterns and sends only the data that are relevant to the connected tools. Experimental measures show that this approach leads to good performance in the context of constraint logic programming, where a large variety of tools exists and the trace is potentially huge.
The Parts-of-file File System (PofFS) allows read-write accesses to different views of a given file or set of files, in order to help the user separate and manipulate different concerns. The set of files is considered as a mount point from which views can be selected as read-write files via directories. Paths are formulas mentioning properties of a desired view. Each directory contains a file (the view) which contains the parts of the mounted files that satisfy the properties. This service is offered generically at the file system level, and a plug-in interface permits file formats or application-specific details to be handled by user-defined operators. Special plug-ins called transducers can be defined for automatically attaching properties to parts of files. Performance is encouraging: files of 100,000 lines are handled efficiently.
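As a toy illustration of the "view of a file selected by a formula" idea, here is a minimal Python sketch: a path is read as a conjunction of properties, a transducer assigns properties to lines, and the view keeps the lines that satisfy the formula. The property names, the transducer, and the formula syntax are invented; this is not the PofFS implementation.

```python
# A toy sketch of selecting a view of a file by a formula of properties.
# The transducer (assigning properties to lines) and the conjunctive
# formula syntax are invented for illustration.

from typing import Callable, List, Set

def view(lines: List[str],
         transducer: Callable[[str], Set[str]],
         formula: Set[str]) -> List[str]:
    """Keep the lines whose extracted properties include every property
    mentioned in the formula (a simple conjunctive path)."""
    return [line for line in lines if formula <= transducer(line)]

def toy_transducer(line: str) -> Set[str]:
    props = set()
    if line.lstrip().startswith("#"):
        props.add("comment")
    if "TODO" in line:
        props.add("todo")
    return props

if __name__ == "__main__":
    source = ["# TODO fix parser", "x = 1", "# header", "y = 2  # TODO"]
    # The path /comment/todo would select the commented TODO lines.
    print(view(source, toy_transducer, {"comment", "todo"}))
```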
We present Logical Information Systems (LIS). A LIS can be viewed as a schema-less database whose objects are described by logical formulas. Objects are automatically organized according to their logical description, and logical formulas can be used for representing both queries and navigation links. The key feature of a LIS is that it answers a query with a set of navigation links expressed in the same logic as the query. As navigation links are dynamically computed from any query, and can be used as query increments, it follows that querying and navigation steps can be combined in any order. We then present LISFS, a file-system implementation of a LIS, where objects are files or parts of files. This has the benefit of making LIS features immediately available to existing applications. This implementation can easily be extended and specialized through a plug-in mechanism. Finally, we present some applications in the field of personal databases (e.g., music, images, emails), and demonstrate that building specialized interfaces for visualizing databases can be done easily through LISFS navigation.
2004 |
Logical information systems (LIS) use logic in a uniform way to describe their contents, to query them, to navigate through them, to analyze them, and to maintain them. They can be given an abstract specification that does not depend on the choice of a particular logic, and concrete instances can be obtained by instantiating this specification with a particular logic. In fact, a logic plays in a LIS the role of a schema in databases. We present the principles of LIS, the constraints they impose on the expression of logics, and hints for their effective implementation.
BLID (Bio-Logical Intelligent Database) is a bioinformatic system designed to help biologists extract new knowledge from raw genome data by providing high-level facilities for both data browsing and analysis. We describe BLID's novel data browsing system which is based on the idea of Logical Information Systems. This enables combined querying and navigation of data in BLID (extracted from public bioinformatic repositories). The browsing language is a logic especially designed for bioinformatics. It currently includes sequence motifs, taxonomies, and macromolecule structures, and it is designed to be easily extensible, as it is composed of reusable components. Navigation is tightly combined with this logic, and assists users in browsing a genome through a form of human-computer dialog.
2003 |
We present the new paradigm of logic file systems, its implementation, and first experimental results. It offers, in an integrated way, navigation and classification, the possibility of expressive queries, ease of use, and possible heterogeneity of data. This paradigm is object-centered. It associates logical descriptions to objects, and logical relations between descriptions serve as a basis for navigation and querying. We compare logic file systems with the hierarchical, boolean, and data-base paradigms. We briefly present the rôle of logic in logic file systems, and in more detail the implementation issues of a particular logic file system that uses a simple logic.
2002 |
The two main paradigms of information retrieval, navigation and querying, are often kept separate. Hierarchical systems offer a fixed navigation structure that does not suit all uses, which they compensate for with search tools. These tools, based on querying, are more flexible but harder for non-experts to use, and they make it difficult to control the volume of answers. It therefore appears necessary to tightly combine navigation and querying. To achieve this combination, we rely on Concept Analysis (CA), which makes it possible to build automatically, from a description of the objects, a navigation structure called a "concept lattice", in which concepts play the role of both directory and query. Since in CA descriptions are limited to sets of attributes, we have generalized CA to replace them with formulas of an arbitrary logic. This seems important to us in order to handle diverse applications. Logical Information Systems (LIS) are thus defined by the combination of navigation and querying, the use of logic (descriptions, queries, and navigation links), and genericity. On this basis, we have developed several mechanisms to facilitate the expression and discovery of knowledge. The knowledge of a domain can be expressed by a terminology. A human-computer dialogue, based on the concept lattice, makes it possible to retrieve objects (navigation) and to discover regularities among objects (knowledge discovery). A learning mechanism provides assistance for the classification of objects. Finally, a prototype has been developed to experiment with these mechanisms. It is generic in the sense that it does not depend on the logic employed. These logics can be assembled from a set of logic components that we have built.
A formal context associates with objects a description that combines automatically extracted (intrinsic) properties and manually assigned (extrinsic) ones. The extrinsic properties are expressed by users according to intentions that are often subjective and changing, and they determine the classification and retrieval of objects. So, we find it important to assist users in this task through the automatic suggestion of extrinsic properties to be assigned, and even the discovery of rules to automate these assignments. The principle is to learn from the descriptions of existing objects the extrinsic description of a new object. Because of the changing nature of users' intentions, the assistance given in the incremental building of a logical context must be interactive. We present formal principles, and an application to the classification of email messages.
Formal Concept Analysis (FCA) is interested in the formation of concept lattices from binary relations between objects and attributes, a.k.a. contexts. Many algorithms have been proposed to generate the set of all concepts, and also the edges of the lattice between these concepts. We develop the principle and the code of a new algorithm that combines two existing ones, Godin's and Bordat's algorithms. Then, we show by both a theoretical and a practical study that it is the most efficient algorithm for sparse contexts, which are usually found in real applications.
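For readers unfamiliar with the objects being computed, here is a naive enumeration of the formal concepts of a small binary context, via the closure of every attribute subset. This exponential baseline is only meant to make the concepts concrete; it is not the combined Godin/Bordat algorithm of the paper, and the example context is invented.

```python
# A naive enumeration of the formal concepts of a binary context:
# close every attribute subset into an (extent, intent) pair.  This is
# an exponential baseline for illustration, not the paper's algorithm.

from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

Context = Dict[str, Set[str]]

def extent(ctx: Context, attrs: FrozenSet[str]) -> FrozenSet[str]:
    return frozenset(o for o, d in ctx.items() if attrs <= d)

def intent(ctx: Context, objs: FrozenSet[str]) -> FrozenSet[str]:
    all_attrs = set().union(*ctx.values()) if ctx else set()
    return frozenset(a for a in all_attrs if all(a in ctx[o] for o in objs))

def concepts(ctx: Context) -> Set[Tuple[FrozenSet[str], FrozenSet[str]]]:
    attrs = sorted(set().union(*ctx.values()))
    found = set()
    for size in range(len(attrs) + 1):
        for combo in combinations(attrs, size):
            e = extent(ctx, frozenset(combo))
            found.add((e, intent(ctx, e)))     # (extent, closed intent)
    return found

if __name__ == "__main__":
    ctx = {"o1": {"a", "b"}, "o2": {"b", "c"}, "o3": {"a", "b", "c"}}
    for e, i in sorted(concepts(ctx), key=lambda c: (len(c[0]), sorted(c[0]))):
        print(set(e) or "{}", set(i) or "{}")
```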
Logical Information Systems (LIS) use logic in a uniform way to describe their contents, to query them, to navigate through them, to analyze them, and to maintain them. They can be given an abstract specification that does not depend on the choice of a particular logic, and concrete instances can be obtained by instantiating this specification with a particular logic. In fact, a logic plays in a LIS the role of a schema in databases. We present the principles of logical information systems, the constraints they impose on the expression of logics, and hints for their effective implementation.
Logic-based applications often use customized logics which are composed of several logics. These customized logics are also often embedded as a black box in an application. So, implementing them requires the specification of a well-defined interface with common operations such as a parser, a printer, and a theorem prover. In order to be able to compose these logics, one must also define composition laws and prove their properties. We present the principles of logic functors and their compositions for constructing logics that are ad hoc, but sound. An important issue is how the operations of different sublogics inter-operate. We propose a formalization of the logic functors, their semantics, implementations, proof-theoretic properties, and their composition.
2001 |
We present a generalization of the logic All I Know, presented as an extension of standard modal logics. We study how this logic can be used to represent complete and incomplete knowledge in Logical Information Systems. In these information systems, a knowledge base is a collection of objects (e.g., files, bibliographical items) described in the same logic as the one used for expressing queries. We show that the usual All I Know (transitive and euclidean accessibility relation) is convenient for representing complete knowledge, but not incomplete knowledge. For the latter, we use serial All I Know (serial accessibility relation).
Logic-based applications often use customized logics which are composed of several logics. These customized logics are also often embedded as a black-box in an application. Their implementation requires the specification of a well-defined interface with common operations such as a parser, a printer, and a theorem prover. In order to be able to compose these logics, one must also define composition laws, and prove their properties. We present the principles of logic functors and their compositions for constructing customized logics. An important issue is how the operations of different sublogics inter-operate. We propose a formalization of the logic functors, their semantics, implementations, and their composition. |
Logical Concept Analysis is Formal Concept Analysis where logical formulas replace sets of attributes. We define a Logical Information System that combines navigation and querying for searching for objects. Places and queries are unified as formal concepts represented by logical formulas. Answers can be both extensional (objects belonging to a concept) and intensional (formulas refining a concept). Thus, all facets of navigation are formalized in terms of Logical Concept Analysis. We show that the definition of being a refinement of some concept is a specific case of Knowledge Discovery in a formal context. It can be generalized to recover more classical KD operations like machine learning through the computation of necessary or sufficient properties (modulo some confidence), or data mining through association rules.
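A small sketch of the extensional/intensional answers described above: the extensional answer is the extent of the query, and the intensional answer is the set of properties that strictly refine it (the navigation links). The context and the simplification of queries to attribute sets are invented; the logical setting of the paper is richer.

```python
# A small sketch of extensional and intensional answers: the extent of
# the query, plus the properties that strictly refine it (navigation
# links).  The context and attribute-only queries are an invented
# simplification of the logical setting.

from typing import Dict, List, Set, Tuple

def answer(ctx: Dict[str, Set[str]], query: Set[str]) -> Tuple[Set[str], List[str]]:
    ext = {o for o, d in ctx.items() if query <= d}
    links = []
    for attr in sorted(set().union(*ctx.values()) - query):
        refined = {o for o in ext if attr in ctx[o]}
        if refined and refined < ext:          # strict, non-empty refinement
            links.append(attr)
    return ext, links

if __name__ == "__main__":
    ctx = {"mail1": {"from:alice", "topic:fca"},
           "mail2": {"from:bob", "topic:fca"},
           "mail3": {"from:alice", "topic:lis"}}
    extension, increments = answer(ctx, {"from:alice"})
    print(extension)    # {'mail1', 'mail3'}
    print(increments)   # ['topic:fca', 'topic:lis'] -- possible next steps
```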
2000 |
Tracing by automatic program source instrumentation has major advantages over compiled code instrumentation: it is more portable, it benefits from many compiler optimizations, it produces traces in terms of the original program, and it can be tailored to specific debugging needs. The usual argument in favor of compiled code instrumentation is its supposed efficiency. We have compared the performance of two operational low-level Prolog tracers with source instrumentation. We have executed classical Prolog benchmark programs, collecting trace information without displaying it. On average, collecting trace information by program instrumentation is about as fast as using a low-level tracer in one case, and only twice slower in the other. This is a minor penalty to pay, compared to the advantages of the approach. To our knowledge, this is the first time that a quantitative comparison of both approaches is made for any programming language. |
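As a loose Python analogy of the comparison above (not the Prolog experiment itself), trace events can be collected either with a low-level tracing hook provided by the runtime or by instrumenting the source. The benchmark function, the event counter, and the rebinding trick are invented for illustration.

```python
# A loose analogy of the two trace-collection strategies compared above:
# a low-level tracing hook versus source-level instrumentation.  Both
# collect call events without displaying them.

import sys
import time
from functools import wraps

events = 0

# Strategy 1: a low-level hook, analogous to a built-in tracer.
def trace_hook(frame, event, arg):
    global events
    if event == "call":
        events += 1
    return None          # no line-level tracing

# Strategy 2: source-level instrumentation of the traced procedure.
def instrumented(fn):
    @wraps(fn)
    def wrapper(*args, **kwargs):
        global events
        events += 1
        return fn(*args, **kwargs)
    return wrapper

def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def timed(label, thunk):
    global events
    events = 0
    start = time.perf_counter()
    thunk()
    print(f"{label}: {events} events in {time.perf_counter() - start:.3f}s")

if __name__ == "__main__":
    def with_hook():
        sys.settrace(trace_hook)
        try:
            fib(20)
        finally:
            sys.settrace(None)

    timed("low-level hook", with_hook)
    fib = instrumented(fib)   # source rewrite: recursive calls now go through the wrapper
    timed("instrumentation", lambda: fib(20))
```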
We present the design of a file system whose organization is based on Concept Analysis ``à la Wille-Ganter''. The aim is to combine querying and navigation facilities in one formalism. The file system is intended to offer a standard interface, but the interpretation of common notions like directories is new. The contents of a file system are interpreted as a Formal Context, directories as Formal Concepts, and the sub-directory relation as Formal Concept inclusion. We present an organization that allows for an efficient implementation of such a Conceptual File System.
We propose a generalization of Formal Concept Analysis (FCA) in which sets of attributes are replaced by expressions of an almost arbitrary logic. We prove that all of FCA can be reconstructed on this basis. We show that, from any logic used in place of sets of attributes, a contextualized logic can be derived that takes the formal context into account and is isomorphic to the concept lattice. We then justify the generalization of FCA with respect to existing extensions and in the perspective of its application to information systems.
1999 |
Opium is a system for analysing and debugging Prolog programs. Its kernel comprises an execution tracer and a programming language with a set of primitives for trace and source analysis. In this chapter we show the power of Opium for supporting abstract views of Prolog executions. Abstract views give high-level points of view about executions. They filter out irrelevant details; they restructure the remaining information; and they compact it so that the amount of information given at each step has a reasonable size. The examples of abstract views given in the following are a goal execution profile, some data abstractions, an instantiation profile, a failure analysis, a loop analysis, and a kind of explanation for an expert system written in Prolog. |
Traces of program executions are a helpful source of information for program debugging. They, however, give a picture of program executions at such a low level that users often have difficulty interpreting the information. Opium, our extendable trace analyzer, is connected to a ``standard'' Prolog tracer. Opium is programmable and extendable. It provides a trace query language and abstract views of executions. Users can therefore examine program executions at the levels of abstraction which suit them. Opium has shown its capabilities to build abstract tracers and automated debugging facilities. This article describes the trace query mechanism in depth, from the model to its implementation. Characteristic examples are detailed. Extensions written so far on top of the trace query mechanism are listed. Two recent extensions are presented: the abstract tracers for the LO (Linear Objects) and the CHR (Constraint Handling Rules) languages. These two extensions were specified and implemented within a few days. They show how to use Opium for real applications.
We present Coca, an automated debugger for C, where the breakpoint mechanism is based on events related to language constructs. Events have semantics whereas source lines used by most debuggers do not have any. A trace is a sequence of events. It can be seen as an ordered relation in a database. Users can specify precisely which events they want to see by specifying values for event attributes. At each event, visible variables can be queried. The trace query language is Prolog with a handful of primitives. The trace query mechanism searches through the execution traces using both control flow and data whereas debuggers usually search according to either control flow or data. As opposed to fully ``relational'' debuggers which use plain database querying mechanisms, Coca trace querying mechanism does not require any storage. The analysis is done on the fly, synchronously with the traced execution. Coca is therefore more powerful than ``source line'' debuggers and more efficient than relational debuggers. |
Also RR-3489 |
Monitoring requires gathering data about executions. The monitoring functionalities currently available are built on top of ad hoc instrumentations. Most of them are implemented at low level; in any case they require an in-depth knowledge of the system to instrument. The best people to implement these instrumentations are generally the implementors of the compiler. They, however, cannot decide which data to gather. Indeed, hundreds of variants can be useful and only end-users know what they want. In this article, we propose a primitive which enables users to easily specify what to monitor. It is built on top of the tracer of the Mercury compiler. We illustrate how to use this primitive on two different kinds of monitoring. Firstly, we implement monitors that collect various kinds of statistics; each of them is well known, the novelty being that users can get exactly the variants they need. Secondly, we define two notions of test coverage for logic programs and show how to measure coverage rates with our primitive. To our knowledge, no definition of test coverage exists for logic programming so far. Each example is only a few lines of Mercury. Measurements show that the performance of the primitive on the above examples is acceptable for an execution of several million trace events. Our primitive, although simple, lays the foundation for a generic and powerful monitoring environment.
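A small sketch of the kind of user-definable monitoring primitive discussed above: a single fold over a stream of trace events, where the user only supplies the accumulator and its update function. The event format (a port/predicate pair) and the two monitors are invented simplifications of a real tracer's events, not the Mercury primitive itself.

```python
# A sketch of a user-definable monitoring primitive: one fold over the
# trace events, parameterized by the user's accumulator and update.
# The (port, predicate) event format is an invented simplification.

from typing import Callable, Iterable, Tuple, TypeVar

Event = Tuple[str, str]          # (port, predicate), e.g. ("call", "append/3")
Acc = TypeVar("Acc")

def collect(events: Iterable[Event], initial: Acc,
            update: Callable[[Acc, Event], Acc]) -> Acc:
    acc = initial
    for event in events:
        acc = update(acc, event)
    return acc

if __name__ == "__main__":
    trace = [("call", "append/3"), ("exit", "append/3"),
             ("call", "member/2"), ("fail", "member/2")]

    # Monitor 1: count events per port (a simple statistic).
    ports = collect(trace, {}, lambda acc, e: {**acc, e[0]: acc.get(e[0], 0) + 1})
    print(ports)                          # {'call': 2, 'exit': 1, 'fail': 1}

    # Monitor 2: call coverage, i.e. the set of predicates actually called.
    covered = collect(trace, set(), lambda acc, e: acc | {e[1]} if e[0] == "call" else acc)
    print(covered)                        # {'append/3', 'member/2'}
```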
In this paper we show that a tracer with a trace analyser can be used to achieve more than debugging. We first illustrate how to compute coverage ratios for test cases. We also give four examples of monitoring the behavior of programs. Thus, instead of building ad hoc instrumentations, which is currently the case for such tools, one can use a uniform environment which allows a synergy between the tools to take place. As a matter of fact, while studying the test coverage measurement we enriched the trace information, to the benefit of the other tools. Moreover, ad hoc instrumentations require an in-depth knowledge of the system to instrument, either at low level or by source-to-source transformation. Even if it is not technically difficult, it always requires a significant programming effort. By contrast, with our approach, the instrumentation is generic. It is done once and for all, and the specific analyses can be relatively simple. The examples of this article each consist of fewer than a dozen Prolog lines.
In this article, we give a formal specification of Byrd's box model and we show how this specification can be extended to specify richer trace models. We also show how these specifications can be executed by a direct translation into lambda-Prolog, leading to a Prolog interpreter that produces execution traces. This interpreter can be used both to experiment with various trace models and to validate the different event specifications. Hence we have a formal framework to specify and prototype trace models.
Existing explanation systems for deductive databases show forests of proof trees. Although proof trees are often useful, they are only one possible interesting representation. We argue that an explanation system for deductive databases must be able to generate explanations at several levels of abstraction. One possible and well known technique to achieve this flexibility is to instrument meta-interpreters. It is, however, not often used because of its inefficiency. On the other hand, deductive databases often generate intermediate information stored in the physical database. This information can be considered as a low-level trace giving a faithful picture of what has happened at the relational level. The deductive reasoning is lost but can be very easily recovered by a meta-interpreter. In this article we describe a technique to generate explanations by integrating a relational trace and an instrumented meta-interpreter. The expensive aspects of meta-interpretation are reduced by the use of the trace which avoids many costly calculations. The flexibility of meta-interpretation is preserved, as illustrated by the generation of three different kinds of explanations: a box-oriented trace, a multi-SLD-AL tree and abstract AND trees. This technique enables powerful explanation systems to be implemented with very few modifications of the deductive database mechanism itself. |
In this paper, we explore the testing-verification relationship with the objective of mechanizing the generation of test data. We consider program classes defined as recursive program schemes, and we show that complete and finite test data sets can be associated with such classes, that is to say that these test data sets allow us to distinguish any two different functions in these schemes. This technique is applied to the verification of simple properties of programs.
We propose a generalization of Formal Concept Analysis (FCA) in which sets of attributes are replaced by expressions of an almost arbitrary logic. We prove that all of FCA can be reconstructed on this basis. We show that, from any logic used in place of sets of attributes, a contextualized logic can be derived that takes the formal context into account and is isomorphic to the concept lattice. We then compare this generalization of FCA with the extensions that have already been proposed. Finally, we present our prospects for application to information systems.
Keywords: FCA, logic, information systems.
Deductive databases manage large quantities of data and, in general, do so in a set-oriented way. The existing explanation systems for deductive databases do not take these constraints into account. We propose a tracing technique which consists of integrating a "relational" trace and an instrumented meta-interpreter using substitution sets. The relational trace efficiently gives precise information about data extraction from the relational database. The meta-interpreter manages substitution sets and gives explanations about the deduction. The expensive aspects of meta-interpretation are reduced by the use of the trace, which avoids many calculations. The flexibility of meta-interpretation is preserved. It allows traces with different profiles to be easily produced.
This document gathers the user manual and the reference manual of Opium-M, an analyser of execution traces of Mercury programs. Opium-M is an adaptation to Mercury of Opium, a trace analyser for Prolog. Mercury is a new logic programming language. Its type, mode and determinism declarations enable code to be generated that is at the same time more efficient and more reliable than with current logic programming languages. The deterministic parts of Mercury programs are as efficient as their C counterparts. Moreover, numerous mistakes are detected at compilation time. However, our industrial partner's experience shows that the fewer the remaining mistakes, the harder they are to diagnose. A high-level debugging tool was thus necessary. Program execution traces given by traditional debuggers provide programmers with useful pieces of information. However, using them requires analysing huge amounts of information by hand. Opium-M is connected to the traditional tracer of Mercury; it allows execution trace analyses to be automated. It provides a relational trace query language based on Prolog which enables users to specify precisely what they want to see in the trace. Opium-M then automatically filters out information irrelevant to the users.
1998 |
Software development usually involves a collection of properties, programs and data as input or output documents. Putting these three kinds of documents at the vertices of a triangle, one sees that all three sides of the triangle have been exploited in formal methods, and that they have often been used in both directions. However, richer combinations have seldom been envisaged, and formal methods often amount to a strict orientation of the figure by imposing functional dependencies (e.g., inferring test cases from specifications). Moreover, undecidability problems arise when properties are expressed in full predicate logic (or similar formalisms) or programs are written in Turing-equivalent programming languages. We advocate that (1) formal methods should provide more flexible ways to exploit the developer's knowledge and offer a variety of possibilities to construct programs, properties and test data, and (2) it is worth restricting the power of logic formalisms and programming languages for the benefit of mechanization. We go one step in this direction, and present a formal method for generating test cases that combines techniques from abstract interpretation (program -> property) and testing (program + property -> test data), and takes inspiration from automated learning (test generation via a testing bias). The crucial property of the test suites generated this way is that they are robust with respect to a test objective formalized as a property. In other words, if a program passes the test suite, then it is guaranteed to satisfy the property. As this process leads to decision problems in very restricted formalisms, it can be fully mechanized.
In January 1994, to replace a highly unpopular denotational semantics course, I undertook to set up a course on the B method at the INSA of Rennes (Institut National des Sciences Appliquées), at a Bac+4 level. I had almost no previous knowledge of formal methods. I had, however, programmed much in Prolog and felt the need for a strong programming discipline, supported if possible by methods and tools. The experience is, in my opinion, successful. The students learn much during the course, find interesting placements where their competence is appreciated, and each edition of the course teaches me something. In the article, I first list the reasons for starting the experience. I then discuss the pedagogical objectives of the course. The content of the course is given and an assessment is made.
Tracing by automatic program source instrumentation has major advantages over compiled code instrumentation: it is more portable, it benefits from many compiler optimizations, it produces traces in terms of the original program, and it can be tailored to specific debugging needs. The usual argument in favor of compiled code instrumentation is its supposed efficiency. We have compared the performance of two operational low-level Prolog tracers with source instrumentation. We have executed classical Prolog benchmark programs, collecting trace information without displaying it. On average, collecting trace information by program instrumentation is about as fast as using a low-level tracer in one case, and only twice slower in the other. This is a minor penalty to pay, compared to the advantages of the approach. To our knowledge, this is the first time that a quantitative comparison of both approaches is made for any programming language. |
The development of deductive databases requires tools, in particular for debugging. Deductive databases manage large quantities of data and, in general, do so in a set-oriented way. Existing explanation systems for deductive databases do not take these constraints into account. We propose a tracing technique that consists of integrating a ``relational'' trace with an instrumented meta-interpreter using substitution sets. The relational trace efficiently gives precise information about data extraction from the relational database. The set-oriented meta-interpreter manages substitution sets and gives explanations about the deduction. The expensive aspects of meta-interpretation are reduced by the use of the trace, which avoids many computations. The flexibility of meta-interpretation is preserved. It makes it easy to produce traces with different profiles.
We present Coca, an automated debugger for C, where the breakpoint mechanism is based on events related to language constructs. Events have semantics whereas source lines used by most debuggers do not have any. A trace is a sequence of events. It can be seen as an ordered relation in a database. Users can specify precisely which events they want to see by specifying values for event attributes. At each event, visible variables can be queried. The trace query language is Prolog with a handful of primitives. The trace query mechanism searches through the execution traces using both control flow and data whereas debuggers usually search according to either control flow or data. As opposed to fully ``relational'' debuggers which use plain database querying mechanisms, Coca trace querying mechanism does not require any storage. The analysis is done on the fly, synchronously with the traced execution. Coca is therefore more powerful than ``source line'' debuggers and more efficient than relational debuggers. |
1997 |
The power of deductive systems in general is that programs express what should be done and not how it should be done. Nevertheless, deductive systems need debugging and explanation facilities. Indeed, their operational semantics is less abstract than the declarative semantics of the programs. If users have to understand all the low-level details of the operational semantics, much of the benefit of using a deductive system is lost. Existing explanation systems for deductive databases produce proof trees to be shown to users. Although useful, proof trees give a fragmented view of query evaluations, and users face a forest of proof trees that is most of the time large. We propose a new data structure, called the DDB tree, which merges the information of a proof tree forest into one concise tree. A DDB tree gives a global picture of a query evaluation in a dramatically reduced structure with no loss of information. DDB trees can be shown to users or can be analyzed further by an explanation system.
1996 |
Tracing by automatic program source instrumentation has major advantages over compiled code instrumentation: it is cheaper to develop and more portable, it benefits from most compiler optimizations, it produces traces in terms of the original program, and it can be tailored to specific debugging needs. The usual argument in favor of compiled code instrumentation is its supposed efficiency. Tolmach and Appel [1] designed and implemented a tracer for Standard ML based on automatic program source instrumentation. The resulting code runs only 3 times slower than optimized code. They conjectured that a low-level tracer would run at about the same speed. However, they had no reasonable low-level tracer at hand to actually compare their results with. We have performed such a comparison in the context of Prolog, using the ECRC ECLiPSe environment. The built-in low-level tracer of ECLiPSe is, at present, one of the most interesting tracers for Prolog. We have compared it with an instrumentation based on O'Keefe's "advice" utility [2], made compatible with the ECLiPSe tracer. We traced "standard" Prolog benchmark programs [3] with both tracing techniques and measured the resulting CPU times. On average the performances of both implementations are equivalent: tracing Prolog programs by program instrumentation is no slower than using a low-level tracer. To our knowledge, this is the first time that a quantitative comparison of both approaches is made. Another contribution is that our source instrumentation is more complete than O'Keefe's advice package. In particular, it deals with built-in predicates, and allows predicates to be skipped/unskipped. [1] A. Tolmach and A.W. Appel. A debugger for Standard ML. Journal of Functional Programming, 5(2):155-200, April 1995. [2] The "advice" utility is part of the DEC10 Prolog library, available by anonymous ftp from the AIAI of the University of Edinburgh (aiai.edinburgh.ac.uk). [3] P. Van Roy and A. M. Despain. High-performance logic programming with the Aquarius Prolog compiler. Computer, 25(1):54-68, January 1992.
Typed Prolog and LambdaProlog are logic programming languages with a strict typing discipline which is based on simple types with variables. These variables are interpreted as denoting generic polymorphism. Experiments show that this discipline does not properly handle common logic programming practices used in Prolog. For instance, the usual transformation for computing the Clark completion of a Prolog program does not work well with some typed programs. We observe that the head-condition is at the heart of these problems, and conclude that it should be enforced. We propose a second-order scheme which is compatible with usual practices. In this scheme, type variables denote parametric polymorphism. It allows types and terms to be quantified, type and term parameters to be passed to goals and terms, and type guards to be expressed for selecting goals. We give its syntax and deduction rules, and propose a solution to keep the concrete notation of programs close to the usual one.
An abstract representation for grammar rules that permits an easy implementation of several attributed grammar transformations is presented. It clearly separates the actions that contribute to evaluating attribute values from the circulation of these values, and it makes it easy to combine the representations of several rules in order to build the representation of new rules. This abstract form applies well to such transforms as elimination of left-recursion, elimination of empty derivation, unfolding and factorization. Finally, the technique is applied to DCGs and a LambdaProlog implementation of the abstract form and of the transforms is described. |
Slicing is a program analysis technique originally developed by Weiser for imperative languages. Weiser showed that slicing is a natural tool for debugging, but it has numerous other applications (program integration, program optimization, etc.). In this article we describe a backward slicing algorithm for Prolog which produces executable slices. The proposed algorithm is applicable at least to pure Prolog extended by some simple built-in predicates that handle the explicit unification =/2 and arithmetic. To our knowledge, this algorithm is the first one to be proposed for Prolog. Because of the indeterminism and lack of explicit control flow of Prolog, existing algorithms cannot be trivially adapted. The two main contributions of this paper are a general definition of slicing adapted to Prolog and a slicing algorithm that produces executable programs.
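For readers unfamiliar with slicing, here is a toy backward slice in the Weiser style mentioned above, restricted to straight-line imperative assignments (no control flow). It illustrates the general idea only; it is not the Prolog algorithm of the paper, and the program representation is invented.

```python
# A toy backward slice over straight-line assignments: keep the
# statements on which the slicing criterion's variable transitively
# depends.  This illustrates Weiser-style slicing for imperative code,
# not the Prolog algorithm of the paper.

from typing import List, Set, Tuple

Stmt = Tuple[str, Set[str]]      # (defined variable, used variables)

def backward_slice(program: List[Stmt], criterion: str) -> List[int]:
    """Indices of the statements relevant to the final value of `criterion`."""
    needed = {criterion}
    kept: List[int] = []
    for i in range(len(program) - 1, -1, -1):
        defined, used = program[i]
        if defined in needed:
            kept.append(i)
            needed.discard(defined)
            needed |= used
    return sorted(kept)

if __name__ == "__main__":
    #  0: a = input()   1: b = a + 1   2: c = 7   3: d = b * 2
    program = [("a", set()), ("b", {"a"}), ("c", set()), ("d", {"b"})]
    print(backward_slice(program, "d"))   # [0, 1, 3] -- statement 2 is sliced away
```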
Slicing is a program analysis technique originally developed by Weiser for imperative languages. Weiser showed that slicing is a natural debugging tool, but it also has numerous other applications (program integration, optimization, etc.). In this article, we propose a definition of slicing for Prolog and an algorithm. The algorithm is applicable at least to pure Prolog extended with a few built-in predicates (=/2 and arithmetic). To our knowledge, this algorithm is the first to be proposed for Prolog. The specific features of Prolog (indeterminism and lack of explicit control flow) do not allow the existing algorithms for imperative languages to be trivially adapted.
1995 |
LambdaProlog is a logic programming language in which clauses and terms are more general than in Prolog. One may wonder whether all these extensions are simultaneously needed, and what the useful subsets of LambdaProlog are, at least for pedagogical purposes. We answer this question by exhibiting necessity links that lead from the introduction of the new term domain to the extension of the formula language. A handy heuristic for programming by induction on types can be derived from these links; it is a convenient guide for using LambdaProlog.
Computational Linguistics and Logic Programming have strong connections, but the former uses concepts that are absent from the most familiar implementations of the latter. We advocate that a Logic Programming language need not feature the Computational Linguistics concepts exactly; it must only provide a logical way of dealing with them. We focus on the manipulation of higher-order terms and the logical handling of context, and we show that the advanced features of Prolog II and LambdaProlog are useful for dealing with these concepts. Higher-order terms are native in LambdaProlog, and Prolog II's infinite trees provide a handy data structure for manipulating them. The formula language of LambdaProlog can be transposed into the Logic Grammar realm to allow for a logical handling of context.
Traces of program executions tell how programs behave in given cases. They are a helpful source of information for automated debugging. Opium is an automated trace analyser for Prolog programs. It is programmable and extendable. It provides a trace query language and abstract views of executions as a basis for automated debugging. Opium has shown its capabilities to build abstract tracers and automated debugging facilities. This paper lists the extensions written so far, and describes two recent extensions: the abstract tracers for the LO (Linear Objects) language and for the CHR (Constraint Handling Rules) language. |
We study under which conditions the domain of lambda-terms and the equality theory of the lambda-calculus form the basis of a usable constraint logic programming language (CLP). The conditions are that the equality theory must contain axiom η, and that the formula language must depart from Horn clauses and accept universal quantifications and implications in goals. In short, CLP-lambda must be close to LambdaProlog.
1994 |
Programming environments are essential for the acceptance of programming languages. This survey emphasizes that program analysis, both static and dynamic, is the central issue of programming environments. Because their clean semantics makes powerful analysis possible, logic programming languages have an indisputable asset in the long term. This survey is focused on logic program analysis and debugging. The large number of references provided shows that the field, though perhaps scattered, is active. A unifying framework is given which separates environment tools into extraction, analysis, and visualization. It facilitates the analysis of existing tools and should give some guidelines for developing new ones. Achievements in logic programming are listed; some techniques developed for other languages are pointed out, and some trends for further research are drawn. Among the main achievements are algorithmic debugging, tracing for sequential Prolog, and abstract interpretation. The main missing techniques are slicing, test case generation, and program mutation. The perspectives we see are integration, evaluation and, above all, automated static and dynamic analysis. |
1993 |
Continuations are well known in functional programming, where they have been used to transform and compile programs. Some languages provide explicit manipulation of the continuation: the user can catch and modify the current continuation. Continuations have also been used in the logic programming context to give a denotational semantics for Prolog, to generate Prolog compilers, and to transform Prolog programs. In this paper, we propose to introduce new built-ins in a logic programming language to enable the user to explicitly replace the continuations. These built-ins give the user a new kind of control over the execution. We choose LambdaProlog because its higher-order syntax and implications in goals are necessary for the definition and use of these built-ins. In order to define the built-ins, we extend to LambdaProlog the continuation-based Prolog semantics. We then show that an exception mechanism can easily be implemented using these new built-ins. The proposed semantics is also used to prove the equivalence of goals that change the continuations. |
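The built-ins themselves are not named in this abstract, so the following plain-Prolog sketch only illustrates the underlying idea with explicitly passed continuation goals; it is a hypothetical analogue, not the paper's LambdaProlog built-ins.

    % Continuation-passing style in plain Prolog: each predicate receives its
    % success continuation K as a goal and calls it instead of simply succeeding.
    app_k([], L, L, K) :- call(K).
    app_k([X|Xs], L, [X|Zs], K) :- app_k(Xs, L, Zs, K).

    % Replacing the continuation changes the control flow: an exception-like
    % mechanism can carry a handler continuation H and invoke it on error.
    div_k(_, 0, _, _K, H) :- !, call(H).
    div_k(X, Y, Z, K, _H) :- Z is X // Y, call(K).

For instance, ?- div_k(6, 0, Z, write(Z), write(division_by_zero)) prints division_by_zero instead of failing, because the handler continuation replaces the normal one. |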
This article proposes a structuring view of the area of automated debugging. Nineteen automated debugging systems are analyzed. Thirteen existing automated debugging techniques are briefly evaluated from a pragmatic point of view. The three underlying strategies are identified, namely verification with respect to a specification, checking with respect to language knowledge, and filtering with respect to a symptom. The verification strategy compares the actual program with some formal specification of the intended program. The checking strategy looks for suspect places which do not comply with some explicit knowledge of the programming language. The filtering strategy assumes as correct the parts of the code which cannot be responsible for the error symptom. Assertion evaluation and algorithmic debugging are the most promising verification techniques. Some intrinsic limitations of the checking strategy make it only a complementary, though helpful, debugging support. The slicing technique should be included in any debugger. |
Most Prolog systems provide a logic grammar formalism called DCG (Definite Clause Grammar), whose usefulness is well established. We present two new logic grammar formalisms, called DCG' and lambda-HHG (higher-order Hereditary Harrop Grammars), intended to be used in LambdaProlog systems. The relations between DCG, DCG', and lambda-HHG on the one hand, and between Prolog and LambdaProlog on the other, can be summarized as follows. (1) Prolog, DCG, and the translation of DCG into Prolog are classical. (2) Miller proposes the evolution from Prolog to LambdaProlog, and Pereira, Pareschi and Miller show the interest of using LambdaProlog for natural language processing. (3) We propose a strongly typed variant of DCG (called DCG') so that it can be translated into LambdaProlog within a LambdaProlog system. It is a first step towards a more elaborate formalism. (4) lambda-HHG is to DCG what LambdaProlog is to Prolog. This formalism combines the advantages of being grammatical and of hiding the house-keeping operations of a parser (like DCG), and of having higher-order terms as attributes and offering a logical approach to the representation of contexts (like LambdaProlog). |
A logic grammar formalism called DCG (Definite Clause Grammars), which has proved to be useful, is part of most Prolog implementations. We develop two new logic grammar formalisms called DCG' and lambda-HHG (higher-order Hereditary Harrop Grammars) that can be used in LambdaProlog implementations. The relations between DCG, DCG', and lambda-HHG, and between Prolog and LambdaProlog, can be summarized as follows. (1) The language Prolog, the DCG formalism, and the translation of DCG into Prolog by Prolog are classical. (2) The evolution from Prolog to LambdaProlog is due to Miller, and the advantage of using LambdaProlog for natural language analysis is shown by Pereira, and by Pareschi and Miller. (3) We propose a strongly typed variant of DCG (called DCG') for its translation into LambdaProlog by LambdaProlog. It is a first stage towards a more elaborate formalism. (4) A formalism that is to DCG what LambdaProlog is to Prolog is still missing, and so is the way to translate it into LambdaProlog. Such a formalism combines the advantages of being grammatical and hiding the house-keeping operations (like DCG), and of having higher-order terms as attributes and providing a logical approach to context (like LambdaProlog). lambda-HHG is such a formalism. |
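As a reminder of the classical starting point mentioned in (1), here is a tiny DCG fragment together with one common rendering of its standard translation into Prolog (two extra arguments threading the input as a difference list); the DCG' and lambda-HHG translations themselves are not reproduced here.

    % A classical DCG fragment:
    greeting --> [hello], name.
    name     --> [world].

    % One common rendering of the standard Prolog expansion:
    greeting(S0, S) :- S0 = [hello|S1], name(S1, S).
    name(S0, S)     :- S0 = [world|S].

The query ?- greeting([hello, world], []) then succeeds, the empty list meaning that the whole input has been consumed. |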
1992 |
The dissertation describes the innovative features of Opium, a high-level debugging environment for Prolog, designed and implemented at ECRC between 1985 and 1991. Debugging is a costly process, and automating it would significantly reduce the cost of software production and maintenance. However, it is unrealistic to aim at fully automating the task. In particular, programmers have to understand rapidly changing situations by examining large amounts of data. In the current state of the art it is beyond the capabilities of computers to take the place of the programmer's understanding. Nevertheless, computers can significantly help programmers to select the data to be analysed. The data used by program analysis in general is often restricted to the source code of the analysed programs. However, there is a complementary source of information, namely traces of program executions. An execution trace contains less general information than the program source, but it tells how the program behaves in a particular case. Furthermore, there are intrinsically dynamic aspects in a program which are best analysed at execution time, for example uses of read/write. These remarks suggested building the automated debugging functionalities of Opium on top of an existing tracer, extending it into a general trace and source analyser. The most important features of Opium, not to be found in other debuggers, are as follows. (1) It provides a trace query language, which is a solution to the ever-growing command sets of other tracers: with two primitives plus Prolog, users can already specify more precise trace queries than with the hard-coded commands of other tracers. (2) Opium is programmable and extendable; it is thus an environment where debugging strategies can easily be programmed and integrated, and some strategies are already implemented. (3) Abstract views of executions are proposed as a basis for automated debugging; they help users understand the behaviours of programs by browsing through executions at a higher level than single steppers. Opium is fully implemented. More than 20 academic sites have recently requested the Opium prototype, and some are actually implementing new abstract views. |
Logic programming languages are becoming more complex with the introduction of new features such as constraints or terms with an equality theory. With this increase in complexity, they require more and more sophisticated memory management. This survey gives an insight into the memory management problems in sequential logic programming language implementations; it also describes the presently known solutions. It is meant to be understood by non-specialists in logic programming with a good knowledge of memory management in general. We first describe a "usefulness logic" for run-time objects; usefulness logic defines non-garbage objects. Next, memory management systems are presented, from the most trivial original run-time system with no real concern for memory problems, to elaborate run-time systems whose memory management closely follows the usefulness logic. Finally, the choice of a garbage collection technique is discussed in relation to logic programming specificities. |
LambdaProlog is a logic programming language accepting a more general clause form than standard Prolog (namely hereditary Harrop formulas instead of Horn formulas) and using simply typed lambda-terms as a term domain instead of first order terms. Despite these extensions, it is still amenable to goal-directed proofs and can still be given procedural semantics. However, the execution of LambdaProlog programs requires several departures from the standard resolution scheme. First, the augmented clause form causes the program (a set of clauses) and the signature (a set of constants) to be changeable, but in a very disciplined way. Second, the new term domain has a semi-decidable and infinitary unification theory, and it introduces the need for a beta-reduction operation at run-time. MALI is an abstract memory that is suitable for storing the search-state of depth-first search processes. Its main feature is its efficient memory management. We have used an original LambdaProlog-to-C translation: predicates are transformed into functions operating on several continuations. The compilation scheme is sometimes an adaptation of the standard Prolog scheme, but at other times it has to handle new features such as types, beta-reduction and delayed unification. Two keywords of this implementation are "sharing" and "folding" of representations. Sharing amounts to recognising that some representation already exists and reusing it. Folding amounts to recognising that two different representations represent the same thing and replacing one by the other. We assume a basic knowledge of Prolog and LambdaProlog. |
We present a general trace query language which is a solution to the ever-growing command sets of other tracers. It provides all the required generality while being very simple and efficient. We model a program execution as a trace, which is a stream of events. Execution events have a uniform representation and can be analysed by Prolog programs. With this approach, and thanks to the expressive power of Prolog, two high-level primitives plus Prolog are enough to provide a general trace query language. With a few optimizations this language can work on large executions without any loss of performance compared to traditional tracers. This paper describes the trace query mechanism from its high-level specification down to some implementation details. The proposed model of trace query depends only on the sequentiality of the execution, and the principles behind the design of the optimizations do not depend on the traced language. |
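The two primitives are not named in this abstract, so the query below uses an invented next_event/1 primitive purely for illustration; it is a hypothetical sketch of the style of trace query described here, not Opium's actual API.

    % Hypothetical primitive: next_event(E) is assumed to enumerate, on
    % backtracking, the remaining trace events as terms event(Chrono, Port, Depth, Goal).
    failing_calls_of(Pred) :-
        next_event(event(_, fail, _, Goal)),
        functor(Goal, Pred, _),
        write(Goal), nl,
        fail.
    failing_calls_of(_).

Because trace events are ordinary Prolog terms, such a query is just a Prolog program, which is what makes the approach both general and programmable. |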
Automated debugging and expert system explanations share the aim of helping people understand executions. We have designed an extendable trace analyser for the purpose of automated debugging, which we propose to use to prototype expert system explanations. Two examples illustrate how simple it is to implement abstract tracing of executions and how easy it is to experiment with it. |
The result of a Prolog execution can simply be "no" when the programmer is expecting something else. This symptom is typical of Prolog, and it especially calls for the help of an execution tracer to get clues about what the problem might be. We present a solution which helps programmers understand how unexpected failures have occurred. We first propose a hierarchy of failing goals. We argue that there is one kind of leaf failure which is worth tracking in the first place. We then give the algorithm for our leaf failure tracking and two examples illustrating its use. |
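As a toy illustration of the symptom and of what a leaf failure looks like (an example of ours, not one taken from the paper):

    % Buggy program: the base case is wrong, so every query answers "no".
    len(nil, 0).                          % intended:  len([], 0).
    len([_|Xs], N) :- len(Xs, M), N is M + 1.

    % ?- len([a,b], N).  fails: the recursion bottoms out in the goal len([], M),
    % which matches no clause.

That innermost failing goal, len([], M), is the kind of leaf failure that the tracking described above reports first. |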
The data used by program analysis in general is often restricted to the source code of the analysed programs. However, there is a complementary source of information, namely traces of program executions. Usual tracers, which extract this trace information, do not allow for general trace analysis. Opium, our debugger for Prolog, sets up a framework where program sources and traces of program executions can be jointly analysed. As the debugging process is heuristic and not all the debugging strategies have been identified so far, Opium is programmable. In particular, its trace query language gives more flexibility and more power than the hard-coded command sets of usual tracers. This trace query language is based on Prolog. Opium is therefore both a helpful tool for Prolog and a nice application of Prolog. The most innovative extensions of Opium compute abstract views of Prolog executions to help users understand the behaviours of programs; in particular, they help them understand how error symptoms have been produced. This article briefly recalls some general information about Opium; a debugging session is then commented on in detail. |
We present a compiled implementation of LambdaProlog that uses the abstract memory MALI for representing the execution state. LambdaProlog is a logic programming language allowing a more general clause form than Standard Prolog's (namely hereditary Harrop formulas instead of Horn formulas) and using simply typed lambda-terms as a term domain instead of first-order terms. The augmented clause form causes the program (a set of clauses) and the signature (a set of constants) to be changeable in a very disciplined way. The new term domain has a semi-decidable and infinitary unification theory, and it introduces the need for a beta-reduction operation at run-time. MALI is an abstract memory that is suitable for storing the search-state of depth-first search processes. Its main feature is its efficient memory management. We have used an original LambdaProlog-to-C translation in which predicates are transformed into functions operating on continuations that handle failure and success in unifications, and changes in signatures and programs. Two keywords of this implementation are "sharing" and "folding" of representations. Sharing amounts to recognising that some representation already exists and reusing it. Folding amounts to recognising that two different representations represent the same thing and replacing one by the other. |
1991 |
Opium is an extensible debugging environment for Prolog providing high-level debugging facilities for programmers and debugging experts. In the design of debuggers there are two tasks which are often mixed: the extraction and the analysis of debugging information. The aim of the extraction task is to collect the whole of the debugging information, so that users do not miss any important information about their program. The aim of the analysis task, on the other hand, is to restrict in an accurate way the amount of debugging information shown to the user, so that the latter has to examine only the relevant parts. This task clearly depends on the debugging situation and, in our view, there is no general restriction that can be made a priori. However, the two tasks are usually mixed and hard-coded; the result is that not enough relevant information and too much useless information are displayed. In Opium the two tasks are clearly separated. The extraction module collects the whole of the debugging information (execution trace and program source), which is then available for the analysis module. The presentation concentrates on the analysis module, discussing the main aspects of Opium: programmability, high-level debugging, extensibility mechanisms, meta-debugging, and support for end-users and debugging experts. |
We propose a new implementation of logic programming with higher-order terms. In order to illustrate the properties of our implementation, we apply the coding of lists as functions in the context of logic programming. As a side effect, we show that higher-order unification is a good tool for manipulating these function-lists. It appears that the efficiency of the program thus obtained relies critically upon the implementation of higher-order operations (unification and reduction). In particular, we show that a good choice of data structures and reduction strategy yields a linear naïve reverse. |
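The paper's coding relies on LambdaProlog's higher-order terms; as a rough first-order analogue only (not the paper's technique), plain Prolog difference lists already show how constant-time concatenation makes the naive-reverse pattern linear.

    % Difference lists: a list is represented as a pair Front-Hole.
    dl_append(A-B, B-C, A-C).            % concatenation without traversal

    % Naive reverse written with dl_append runs in linear time:
    nrev_dl([], L-L).
    nrev_dl([X|Xs], R) :- nrev_dl(Xs, R0), dl_append(R0, [X|T]-T, R).

The query ?- nrev_dl([1,2,3], R-[]) gives R = [3,2,1]; the higher-order coding of lists as functions plays an analogous role in LambdaProlog, with unification and reduction doing the composition. |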
Opium is a system for analysing and debugging Prolog programs. Its kernel comprises an execution tracer and a programming language with a full set of primitives for trace and source analysis. In this paper we show the power of Opium for supporting abstract views of Prolog executions. Abstract views give high-level points of view about the execution. They filter out irrelevant details; they restructure the remaining information; and they compact it so that the amount of information given at each step has a reasonable size. The examples of abstract views given in the following are a goal execution profile, some data abstractions, an instantiation profile, a failure analysis and a kind of explanation for an expert system written in Prolog. |
1988 |
This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
This document was translated from BibTeX by bibtex2html