First Data and Knowledge Quality Workshop

 

In conjunction with EGC 2005

18th January 2005, Paris, France

Scope

 

Poor-quality data stored in database and data warehouse systems is widespread in governmental, commercial and industrial environments. Knowledge discovery and decision-making based on poor-quality data have significant and direct consequences for companies and practitioners. Data and knowledge quality is therefore becoming a topic of growing interest in the academic and industrial communities.

 

Many data analysis applications, such as data mining or text mining, require various forms of data preparation involving several data processing techniques, because the input to data mining algorithms is assumed to conform to well-behaved data distributions and to contain no missing, inconsistent or incorrect values. This leaves a large gap between the available data and the available machinery to process it. Ultimately, the results of data analysis are usually evaluated by specialists (experts, analysts, etc.). The cost of this task is often very high, and one way to reduce it is to help the specialists by giving them relevant decision criteria, such as quality or interestingness measures of the results. These measures of knowledge quality have to be designed to combine two dimensions: the objective dimension related to data quality, and the subjective dimension related to the specialist’s focus of interest.

 

The DKQ (Data and Knowledge Quality) workshop, held in conjunction with EGC'2005 in Paris, deals specifically with data quality issues and related knowledge discovery techniques. It intends to address methods and techniques for massive data analysis, methodologies, new algorithmic approaches, and approaches to developing data quality metrics, in order to understand and explore data, to find data glitches, and to ensure the quality of both the data and the knowledge discovered from it.

 

We invite the submission of original research contributions, industrial papers and case studies relating to all aspects of data quality and knowledge quality defined broadly, from the data preparation process to the data analysis.

The workshop lasts one day and is dedicated to technical presentations on the topics of interest listed below.

 

Topics

 

Particular topics of interest for the workshop include but are not limited to:

  • Data Quality Metrics, Quality Metrics for data mining results, Human-centred quality metrics
  • Rule interestingness
  • Detection of Contradictory Data, Outliers, Duplicates, Inconsistencies, Noise
  • Mining for Patterns of Non-Quality or Poor-Quality Data
  • Validation of Data Mining Models
  • Automatic Record Matching
  • Object Identification
  • Data Transformations, Data Reconciliation, Data Consolidation
  • Error Correction
  • Data Cleaning Techniques
  • Intelligent Data Preparation

 

These topics apply to all kinds of data (XML, transactional data, numerical or categorical data, multimedia data) and to different application contexts (bioinformatics, marketing, CRM, e-business, etc.).

 

Attendance

The expected audience for this workshop includes researchers, students and practitioners from the database, knowledge discovery and statistics communities who have an interest in data quality in database and data warehouse systems, data preparation, the discovery and cleaning of inconsistent or contradictory data, and extraction, transformation and loading (ETL) systems.

In addition, we also target professional users who deal with data quality problems.

Attendance is not limited to paper authors. We strongly encourage interested researchers from related areas to attend the workshop. The workshop should be of interest to researchers and practitioners conducting research or building applications that involve various forms of data analysis and rich data and knowledge representations, in particular those from academic data mining, commercial data mining, relational data mining and association rules, and text mining. We expect that the workshop topics will attract the attention of regular EGC attendees who are interested in data mining, and will also encourage attendance by participants interested in database techniques for data cleaning and preparation.

Workshop Program

 

9h-9h30: Welcome Fabrice Guillet and Laure Berti-Equille

 

9h30-10h30: Session 1 – Data Quality in Databases 

-          Verónika Peralta, Mokrane Bouzeghoub (PRISM, Versailles St-Quentin), Data Freshness Evaluation in Different Application Scenarios

-          Laure Berti-Equille (IRISA, Rennes), Nettoyage de données XML : combien ça coûte ?

 

10h30-10h45: Coffee break

 

10h45-12h15: Session 2 – Quality of Discovered Association Rules

           

-          Régis Gras, Raphaël Couturier, Fabrice Guillet, Filippo Spagnolo (Ecole Polytechnique de Nantes, IUT de Belfort, Université de Palerme), Extraction de règles en incertain par la méthode implicative

-          Julien Blanchard, Fabrice Guillet, Henri Briand, Régis Gras (Ecole Polytechnique de Nantes), IPEE : Indice Probabiliste d’Ecart à l’Equilibre pour l’évaluation de la qualité des règles

-          Cyril Nortet, Ansaf Salleb, Teddy Turmeaux, Christel Vrain (LIFO Orléans, IRISA Rennes), Le rôle de l’utilisateur dans un processus d’extraction de règles d’association

 

12h15-14h: Lunch break

 

14h-15h: Session 3 – Quality and Classification

-          Gilbert Ritschard (Université de Genève), Arbre BIC optimal et taux d’erreur

-          Jérôme David, Fabrice Guillet, Vincent Philippé, Henri Briand, Régis Gras (Ecole Polytechnique de Nantes, PerformanSE SA), Validation d'une expertise textuelle par une méthode de classification basée sur l'intensité d'impliqué

 

15h-16h: Session 4 – Evaluation Platforms for Knowledge Quality

-          Xuan-Hiep Huynh, Fabrice Guillet, Henri Briand, (Ecole Polytechnique de Nantes), ARQAT: une plateforme d'analyse exploratoire pour la qualité des règles d'association

-          Benoît Vaillant, Patrick Meyer, Elie Prudhomme, Stéphane Lallich, Philippe Lenca (ENST Bretagne, Université du Luxembourg, ERIC - Université de Lyon 2), Mesurer l’intérêt des règles d’association

 

16h-16h15: Coffee break (tbc)

 

16h15-17h45: Session 5 – Operational Approaches

-          Mireille Cosquer, Béatrice Le Vu, Alain Livartowski (Institut Curie), Mise en place d’un plan d’Assurance et Contrôle Qualité du Dossier Patient

-          Gilles Amat, Brigitte Laboisse (sociétés AID, BDQS), B.D.Q.S. Une gestion opérationnelle de la qualité de données

-          David Graveleau (DGA/CTSN), SILURE, mise en oeuvre d'un meta-modèle associant traçabilité et qualité des données pour la constitution d'une base de référence multi-sources en veille technologique

 

 

17h45-18h30: Round Table and Closing Session

 

Workshop Organization

 

Laure Berti-Équille, IRISA-CNRS Rennes, France

 

Proposed Program Committee

 

Fabrice Guillet, IRIN, Université de Nantes, France (Program Chair)

Ansaf Salleb, IRISA-CNRS Rennes, France

Jérôme Azé, LRI, Université de Paris-Sud, France

Mokrane Bouzeghoub, PRISM, Université de Versailles, France

Henri Briand, IRIN, Université de Nantes, France

Béatrice Duval, Université d’Angers, France

Johann-Christoph Freytag, Humboldt-Universität zu Berlin, Germany

Helena Galhardas, INESC, Lisboa, Portugal

Régis Gras, IRIN, Université de Nantes, France

Yves Kodratoff, LRI, Université de Paris-Sud, France

Pascale Kuntz, IRIN, Université de Nantes, France

Stéphane Lallich,  ERIC, Université de Lyon 2, France

Ludovic Lebart, ENST-CNRS, Paris, France

Philippe Lenca,  ENSTbr, Brest, France

Amedeo Napoli, LORIA, Nancy, France

Gilbert Ritschard, Université de Genève, Switzerland

Monica Scannapieco, Università di Roma “La Sapienza”, Italy

Dan A. Simovici, University of Massachusetts, Boston, U.S.

Einoshin Suzuki, Yokohama National University, Japan

Djamel Zighed, ERIC, Université de Lyon 2, France
 

Venue

The DKQ workshop will take place at the following address: 45, rue des Saints Pères, 75006 Paris, in the buildings of the “UFR de mathématiques et d'informatique”.

 See the second map.

 

Paper Formatting Instructions

The guidelines for formatting papers for submission are as follows.

Papers should be in English or in French, no longer than 7 pages (or 3,000 words) inclusive of all references and figures, formatted for A4 paper, following the EGC proceedings format. Proceedings templates are available in Word and LaTeX. All papers must be submitted in either PDF (preferred) or PostScript. Please ensure that any special fonts used are included in the submitted documents. All papers must be original and must not have been published elsewhere.

Please submit the PDF or PS version of your manuscript to both of the workshop co-chairs:

 Laure Berti-Equille and Fabrice Guillet.

 

Important Dates

Submission Deadline:

December 20, 2004

Acceptance Notification:

December 27, 2004

Camera-ready Copies:

January 5, 2005

Workshop date:

January 18, 2005