Hello,

You are cordially invited to the Ph.D. defense of Afshin MOIN, which will take place on July 9, 2012, in room Métivier, starting at 2:30pm.

The jury is composed of:

– Pierre Fraigniaud, Director of LIAFA (Reviewer)

– Marie-Christine Rousset, Professor at the Laboratoire d'Informatique de Grenoble (Reviewer)

– Eric Fleury, Professor at ENS de Lyon

– Arnaud Guyader, Associate Professor at Université de Rennes 2

– Anne-Marie Kermarrec, Research Director at Inria Rennes – Bretagne Atlantique

Title: Recommendation and Visualization Techniques for Large Scale Data

Abstract

We have witnessed the rapid development of information technology during the last decade. On one side, the processing and storage capacity of digital devices has increased constantly thanks to advances in manufacturing methods. On the other side, networking technology has made interaction between these powerful devices possible. As a natural consequence of this progress, the volume of data generated in different applications has grown at an unprecedented rate, and it is becoming increasingly hard for Internet users to find items and content matching their needs. We are therefore confronted with new challenges in efficiently processing and representing the huge mass of data at our disposal. This thesis is centered on two axes: recommending relevant content and visualizing it properly. The role of recommender systems is to help users in the decision-making process, so that they can find items with relevant content and satisfactory quality among the vast set of alternatives available on the Web. In turn, an adequate representation of the processed data is central both to increasing its utility for the end user and to designing efficient analysis tools. In this presentation, the prevalent approaches to recommender systems and the principal techniques for visualizing data as graphs are discussed. Furthermore, it is shown how some of the techniques used in recommender systems can be adapted to take visualization requirements into account.
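The prevalent approaches alluded to above include collaborative filtering by matrix factorization. As a rough illustration of the flavor of such techniques (not the algorithm studied in the thesis), here is a minimal Python sketch; the toy ratings matrix and the hyperparameters are assumptions made for this example.

```python
import numpy as np

def factorize(R, k=2, steps=2000, lr=0.01, reg=0.02):
    """Learn user/item factors P, Q with R ~= P @ Q.T by stochastic
    gradient descent; zeros in R denote unknown ratings and are skipped."""
    n_users, n_items = R.shape
    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.1, size=(n_users, k))
    Q = rng.normal(scale=0.1, size=(n_items, k))
    observed = [(u, i) for u in range(n_users)
                for i in range(n_items) if R[u, i] > 0]
    for _ in range(steps):
        for u, i in observed:
            err = R[u, i] - P[u] @ Q[i]     # prediction error on one rating
            pu = P[u].copy()
            P[u] += lr * (err * Q[i] - reg * pu)
            Q[i] += lr * (err * pu - reg * Q[i])
    return P, Q

# Toy 4-user x 5-item ratings matrix (0 = unknown).
R = np.array([[5, 3, 0, 1, 4],
              [4, 0, 0, 1, 3],
              [1, 1, 0, 5, 4],
              [0, 1, 5, 4, 0]], dtype=float)
P, Q = factorize(R)
print(np.round(P @ Q.T, 1))  # predictions, including the unknown cells
```

The learned factors fill in the unknown cells, which is exactly the recommendation step: the highest predicted unseen items are the ones suggested to each user.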

Opportunistic Mobile Social Networks at Work

Opportunistic networks exploit human mobility and the resulting device-to-device ad hoc contacts to disseminate content in a "store-carry-forward" fashion. In opportunistic networks, disconnections and highly variable delays caused by human mobility are the norm. Another major challenge in opportunistic communications arises from the small form factor of mobile devices, which imposes resource limitations compared to static computing systems. Lastly, human mobility and social interactions have a large impact on the structure and performance of opportunistic networks; understanding these phenomena is therefore crucial for the design of efficient algorithms and applications.

In this work, we take an experimental approach to better understand opportunistic mobile social networks. We design and implement MobiClique, a communication middleware for opportunistic mobile social networking. MobiClique takes advantage of user mobility and social relationships to forward messages in an opportunistic manner. We perform a large-scale MobiClique experiment with 80 people, in which we collect social network information (i.e., their Facebook profiles) together with ad hoc contact and communication traces. We use the collected data, along with three other data sets, to analyse epidemic content dissemination in opportunistic networks in detail. Most related work has focused on the pairwise contact history among users in conference or campus environments. We claim that, given the density of these networks, this approach leads to a biased understanding of the content dissemination process. We design a methodology to break the contact traces down into "temporal communities", i.e., groups of people who meet periodically during an experiment. We show that these communities correlate with people's social communities. As in previous work, we observe that efficient content dissemination is mostly due to high contact rate users. However, we show that high contact rate users who are more frequently involved in temporal communities contribute less to the dissemination process, leading us to conjecture that social communities tend to limit the efficiency of content dissemination in opportunistic mobile social networks.
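For readers unfamiliar with this style of analysis, a minimal sketch of epidemic dissemination over a contact trace follows. The toy trace and node names are assumptions made for the example; the actual study used the MobiClique traces and three other data sets, with contact durations and richer semantics.

```python
from collections import defaultdict

def epidemic_spread(contacts, source):
    """Flood a message over a trace of (time, a, b) pairwise contacts.

    Returns {node: time at which it first received the message}."""
    infected = {source: 0}
    for t, a, b in sorted(contacts):   # replay contacts in time order
        if a in infected and b not in infected:
            infected[b] = t
        elif b in infected and a not in infected:
            infected[a] = t
    return infected

# Toy trace of instantaneous contacts: (time, node, node).
trace = [(1, "u1", "u2"), (2, "u2", "u3"), (3, "u1", "u4"),
         (5, "u3", "u5"), (8, "u4", "u5"), (9, "u5", "u6")]

print(epidemic_spread(trace, "u1"))   # first-reception time per node

contact_rate = defaultdict(int)       # per-node contact counts, to spot
for _, a, b in trace:                 # the "high contact rate" users
    contact_rate[a] += 1
    contact_rate[b] += 1
print(max(contact_rate, key=contact_rate.get))
```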

Remote Core Locking: Migrating Critical-Section Execution to Improve the Performance of Multithreaded Applications

The scalability of multithreaded applications on current multicore systems is hampered by the performance of lock algorithms, due to the costs of access contention and cache misses. In this paper, we propose a new lock algorithm, Remote Core Locking (RCL), that aims to improve the performance of critical sections in legacy applications on multicore architectures. The idea of RCL is to replace lock acquisitions by optimized remote procedure calls to a dedicated server core. RCL limits the performance collapse observed with other lock algorithms when many threads try to acquire a lock concurrently, and it removes the need to transfer lock-protected shared data to the core acquiring the lock, because such data can typically remain in the server core's cache.
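The sketch below mimics only the control flow of this idea, under the assumption that a Python queue and a dedicated thread can stand in for RCL's request slots and server core (the real system is implemented in C with optimized, cache-aware remote procedure calls): client threads never take a lock, they hand their critical sections to a server that executes all of them.

```python
import threading, queue

class RemoteCoreLock:
    """Delegate critical sections to one dedicated server thread."""

    def __init__(self):
        self.requests = queue.Queue()
        threading.Thread(target=self._serve, daemon=True).start()

    def _serve(self):
        # Plays the role of the dedicated server core: it executes every
        # critical section, so the protected data stays local to it.
        while True:
            critical_section, done = self.requests.get()
            critical_section()
            done.set()

    def execute(self, critical_section):
        # Replaces lock()/unlock(): post the request and wait for completion
        # (the real RCL client spins on a per-core request slot instead).
        done = threading.Event()
        self.requests.put((critical_section, done))
        done.wait()

counter = 0
rcl = RemoteCoreLock()

def increment():          # the critical section
    global counter
    counter += 1

threads = [threading.Thread(
               target=lambda: [rcl.execute(increment) for _ in range(1000)])
           for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # 4000: all increments were serialized on the server thread
```

Because every critical section runs on the same thread, lock-protected data never migrates between client cores, which is the cache-locality argument made above.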

We have developed a profiler that identifies the locks that are bottlenecks in multithreaded applications and that can thus benefit from RCL, as well as a reengineering tool that transforms POSIX locks into RCL locks. We have evaluated our approach on 18 applications: Memcached, Berkeley DB, the 9 applications of the SPLASH-2 benchmark suite, and the 7 applications of the Phoenix2 benchmark suite. 10 of these applications, including Memcached and Berkeley DB, are unable to scale because of locks and benefit from RCL. Using RCL locks, we obtain performance improvements of up to 2.6 times over POSIX locks on Memcached, and up to 14 times on Berkeley DB.

Computability and progress conditions of shared objects in the presence of failures

Thesis advisor:
– Michel Raynal, Professor, Université de Rennes 1

Reviewers:

– Dominique Méry, Professor, LORIA
– Franck Petit, Professor, LIP6

Examiners:

– Carole Delporte, Professor, Paris Diderot
– Petr Kuznetsov, Researcher, TU Berlin
– Rachid Guerraoui, Professor, EPFL Lausanne
– Panagiota Fatourou, Assistant Professor, University of Crete


In a distributed system, different processes communicate and synchronize in order to solve a global computation. The difficulty comes from the fact that a process does not know the inputs of the other processes. We consider here asynchronous systems: no assumption is made on the relative speeds of the processes. Moreover, to model failures, we consider that processes may crash, i.e., stop their execution at an arbitrary point of their code.

In the theoretical study of distributed systems, problems have to be considered according to two aspects: safety and progress. Safety defines which output values are correct. Progress defines under which conditions a process is required to terminate its operation, regardless of the value it outputs.

This thesis is about the links between computability and progress conditions of shared objects. We start by introducing and studying the notion of asymmetric progress conditions: progress conditions that do not necessarily impose the same requirements on different processes. We then study the possibility of supplying processes with abstractions in a given model. The issue of the equivalence of system models is then raised, in particular when processes have access to strong objects. Finally, the thesis studies colored tasks. It presents a renaming algorithm that terminates when contention drops below a given threshold, and it introduces a new class of colored tasks that unifies, in a single framework, several problems previously considered independent.
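As a pointer for readers new to renaming, the sketch below shows a splitter, the textbook building block of many renaming algorithms (due to Moir and Anderson); it is meant to illustrate the kind of shared object involved, not the thesis's own algorithm, and its guarantees assume atomic reads and writes of the shared variables.

```python
# Among the processes that enter a splitter concurrently, at most one
# obtains "stop", not all obtain "right", and not all obtain "down".
# A grid of splitters then assigns distinct names, and does so quickly
# when contention is low. This sequential demo only exercises the logic.
class Splitter:
    def __init__(self):
        self.X = None      # last process seen at the door
        self.Y = False     # door flag, closed once a process gets through

    def visit(self, pid):
        self.X = pid
        if self.Y:
            return "right"     # someone already got through: move right
        self.Y = True
        if self.X == pid:
            return "stop"      # no one overtook us: capture this splitter
        return "down"          # we were overtaken: move down

s = Splitter()
print(s.visit("p1"))  # "stop": a process running alone captures the splitter
print(s.visit("p2"))  # "right": the door closed behind p1
```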

Title: Applications and Network in Data Centers: Friends or Foes?

Abstract: Since the early days of networks, a basic principle has been that applications treat the network as a black box. An application injects a packet with a destination address and the network delivers the packet. This principle has served us well, and has enabled the Internet to scale to billions of devices using networks owned by competing companies and running applications developed by different parties. However, this approach might not be optimal for large-scale Internet data centers, such as those run by Amazon, Google, Microsoft and Facebook, in which all the components are controlled by a single entity.

In this talk, I will describe two examples where a richer interaction between applications and the network is beneficial for both. First, I will briefly overview our recent research on providing applications with predictable performance in multi-tenant data centers. Then, I will describe in more detail CamCube, a recent project in collaboration with Microsoft Research, in which we have been looking at a different approach to building data centers, borrowing ideas from the fields of high-performance parallel computing, distributed systems, and networking. We use a direct-connect topology, similar to those used in HPC, and a novel networking stack that supports key-based routing. By providing applications with finer-grained control over network resources, CamCube makes it possible to increase performance while reducing development complexity and cluster costs. I will describe and motivate its peculiar design choices and then discuss a number of services that we implemented on CamCube, including a MapReduce service that provides significantly higher performance than existing solutions running on traditional clusters.
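As a rough illustration of what key-based routing over a direct-connect topology can look like, here is a Python sketch over a small 3D torus; the hash-to-coordinate mapping and the greedy per-dimension next hop are assumptions made for this example, not CamCube's actual implementation.

```python
import hashlib

SIDE = 4  # a 4 x 4 x 4 torus of 64 servers

def key_to_coord(key):
    """Hash a key onto (x, y, z) so each server owns part of the keyspace."""
    h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return (h % SIDE, (h // SIDE) % SIDE, (h // SIDE ** 2) % SIDE)

def next_hop(cur, dst):
    """Step along the first dimension that differs, wrapping the short way."""
    hop = list(cur)
    for d in range(3):
        if cur[d] != dst[d]:
            fwd = (dst[d] - cur[d]) % SIDE  # hops if we go "up" in dimension d
            hop[d] = (cur[d] + (1 if fwd <= SIDE - fwd else -1)) % SIDE
            return tuple(hop)
    return cur  # already at the destination

def route(src, key):
    dst, path = key_to_coord(key), [src]
    while path[-1] != dst:
        path.append(next_hop(path[-1], dst))
    return path

print(route((0, 0, 0), "some-object-key"))  # hop-by-hop path to the key's owner
```

Exposing the coordinate space to applications, as in this sketch, is what lets services such as MapReduce place data and computation on specific servers rather than treating the network as a black box.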

Bio: Paolo Costa holds an Imperial College fellowship at the Department of Computing of Imperial College London. Before joining Imperial, he spent two and a half years in the Systems and Networking Group of the Microsoft Research lab in Cambridge. Prior to that, he was a postdoctoral researcher in the Computer Systems group at Vrije Universiteit Amsterdam. Paolo holds M.Sc. and Ph.D. degrees in Computer Engineering from the Politecnico di Milano, received in 2002 and 2006, respectively.

Where: INRIA Rennes – Salle Sardaigne, Bâtiment 12 F, Campus de Beaulieu

When: Wednesday, February 8, 2012, from 2pm to 3pm

Organized by: ASAP Team – INRIA Rennes

Please confirm your presence by sending an email to: davide.frey@inria.fr
