Direction des Relations Internationales (DRI) / International Relations Department
EQUIPE ASSOCIEE / ASSOCIATE TEAM: DataCloud@work
Sélection / Selection: 2010
Equipe-Projet INRIA / INRIA Project-Team: KerData
Organisme étranger partenaire / Partner Institution: Politehnica University of Bucharest (PUB)
Centre de recherche INRIA / INRIA Research Center: Rennes - Bretagne Atlantique
Thème INRIA / INRIA Theme: Réseaux, systèmes et services, calcul distribué / Networks, systems and services, distributed computing
Pays / Country: Romania
Coordinateur français / French Coordinator
- Nom, prénom / Family name, Given name: ANTONIU Gabriel
- Grade, statut / Position: Chargé de recherche / Research Scientist
- Organisme d'appartenance / Home Institution: INRIA, Centre Rennes - Bretagne Atlantique, Equipe KerData
- Adresse postale / Postal address: Campus de Beaulieu, 35042 Rennes cedex
- Téléphone / Telephone: +33 2 99 84 72 44
- Télécopie / Fax: +33 2 99 84 71 71
- Courriel / Email: gabriel.antoniu@inria.fr

Coordinateur étranger / Partner Coordinator
- Nom, prénom / Family name, Given name: CRISTEA Valentin
- Grade, statut / Position: Professeur / Professor
- Organisme d'appartenance / Home Institution: National Center for Information Technology (NCIT), Politehnica University of Bucharest (PUB)
- Adresse postale / Postal address: 313, Splaiul Independentei, 0600042, Bucuresti, Romania
- URL / Website: http://csite.cs.pub.ro/index.php/en/component/comprofiler/?task=userProfile&user=73/
- Téléphone / Telephone: +40 214 029 332
- Télécopie / Fax: +40 214 029 333
- Courriel / Email: valentin.cristea@cs.pub.ro

Autre partenaire français / Other French Partner
- Nom, prénom / Family name, Given name: MORIN Christine
- Grade, statut / Position: Directrice de recherche / Senior Research Scientist
- Organisme d'appartenance / Home Institution: INRIA, Centre Rennes - Bretagne Atlantique, Equipe-projet PARIS
- Adresse postale / Postal address: Campus de Beaulieu, 35042 Rennes cedex
- URL / Website: http://www.irisa.fr/paris/web/component/option,com_uhp/task,view/Itemid,110/id,40/
- Téléphone / Telephone: +33 2 99 84 72 90
- Télécopie / Fax: +33 2 99 84 71 71
- Courriel / Email: christine.morin@inria.fr
La proposition en bref / The proposal in brief
Titre de la thématique de collaboration (en français et en anglais) / Title of the collaboration theme (in French and in English): Stockage Autonome pour les Services sur Clouds / Autonomic Storage for Cloud Services
Descriptif (environ 10 lignes) / Description (approximately 10 lines): While the cloud computing paradigm is progressively being adopted by companies wishing to deliver large-scale distributed services, such as Amazon, IBM, Google or Yahoo!, other research efforts in the area of large-scale distributed computing are exploring the concept of a grid operating system. Both kinds of systems aim at providing seamless access to a powerful distributed processing infrastructure, while hiding as much as possible the management of the underlying physical resources. In both contexts, data management is a key issue, as it significantly impacts the quality of service delivered by such distributed infrastructures. In this project, we aim at investigating ways to provide advanced, autonomic storage mechanisms for cloud services. More specifically, the goal is to explore how to build an efficient, secure and reliable storage service for data-intensive distributed applications running in cloud environments by enabling autonomic behavior. In addition, we will leverage the grid operating system approach as a cloud technology (e.g., by relying on its OS-level support for virtual organizations). For validation purposes, experimental prototypes will be implemented based on the BlobSeer data-sharing platform (designed by the KerData Team), on the MonALISA monitoring framework (drawing on the expertise of the PUB Team), and on the XtreemOS grid operating system (designed under the leadership of the PARIS Team). The work will also include interactions with the Nimbus team from Argonne National Lab, led by Kate Keahey: experiments will be carried out using the Nimbus cloud software. The validation phase will include intensive, large-scale experiments on the ALADDIN-Grid'5000 grid testbed.
The emerging cloud computing model [1,2,3] is attracting serious interest from both industry and academia in the area of large-scale distributed computing. It provides a new paradigm for managing computing resources: instead of buying and managing hardware, users rent virtual machines and storage space. Various cloud software stacks have been proposed by leading companies such as Google, Amazon or Yahoo!. They aim at providing fully configurable virtual machines or virtual storage (IaaS: Infrastructure-as-a-Service [4,5,6]), higher-level services including programming environments such as Map-Reduce [7] (PaaS: Platform-as-a-Service [8,9]), or community-specific applications (SaaS: Software-as-a-Service [10,11]). On the academic side, one of the most visible projects in this area is Nimbus [5,12], from the Argonne National Lab (USA), which aims at providing a reference implementation of an IaaS. In parallel, other research efforts have focused on the concept of a grid operating system: a distributed operating system for large-scale, wide-area, dynamic infrastructures spanning multiple administrative domains. XtreemOS [13,14] is such a grid operating system, and it provides native support for virtual organizations. Since both the cloud approach and the grid operating system approach deal with resource management on large-scale distributed infrastructures, how these two approaches are positioned with respect to each other is currently being investigated within the PARIS Project-Team (http://www.irisa.fr/paris/web/) at INRIA Rennes - Bretagne Atlantique; a preliminary discussion is available in [15].
In the context of emerging cloud infrastructures as well as in that of grid operating systems, some of the most critical open issues relate to data management. The KerData research team (http://www.irisa.fr/kerdata/) of INRIA Rennes - Bretagne Atlantique has recently been created with the goal of exploring ways to address the main challenges raised by data storage and management on cloud infrastructures. The team is designing and implementing BlobSeer [16,17], a generic data-sharing platform which aims at providing support for storing massive data with fine-grained access control under heavy concurrency on large-scale distributed infrastructures. In addition, it will support versioning and decentralized metadata management. Allowing users to store and process data on externalized, virtual resources from the cloud requires simultaneously investigating important aspects related to security, efficiency and quality of service. To this end, it becomes necessary to create mechanisms able to provide feedback about the state of the storage system and of the underlying physical infrastructure. The monitored information can then be fed back into the storage system and used by self-managing engines to enable autonomic behavior, with goals such as self-configuration, self-optimization or self-healing. As a first step towards this goal, the KerData Team has started working with the Distributed Systems and Grids team from NCIT (PUB, Romania) on the design of preliminary introspection mechanisms for BlobSeer. This work relies on MonALISA [18,19], a general-purpose monitoring framework whose main contributors belong to the PUB Team. This preliminary work is detailed in [20].
In this project, we aim at investigating several open issues related to autonomic storage in the context of cloud services. The goal is to explore how to build an efficient, secure and reliable storage IaaS for data-intensive distributed applications running in cloud environments by enabling autonomic behavior, while leveraging the advantages of the grid operating system approach (such as OS-level support for virtual organizations). For validation purposes, experimental prototypes will be implemented based on the BlobSeer data-sharing platform (designed by the KerData Team), on the XtreemOS grid operating system (designed under the leadership of the PARIS Team) and on the MonALISA monitoring framework (drawing on the expertise of the PUB Team). This work will also involve the Nimbus team from Argonne National Lab, led by Kate Keahey; experiments will be carried out with the Nimbus cloud software infrastructure. The validation phase will include intensive, large-scale experiments on the Grid'5000 [21,22] grid testbed. We have divided the work into three main areas (each of which corresponds to one of the three years of the project), as described below.
Scenario: Infrastructure as a Service (IaaS) is the delivery of computer infrastructure (typically a platform virtualization environment) as a service. The client typically runs a distributed application using virtual machines (VMs) rented from a service provider. The client applications are executed by the service provider as a set of virtual machines in a secure environment that enforces several restrictions, according to a pre-established contract. In such a context, access to local storage space on the physical machine where the application is running (owned by the service provider) is typically denied. Instead, clients are provided with a specialized storage service that they can access directly through a specific API (e.g., Amazon S3 [23]).
Role of BlobSeer: In this context, the BlobSeer storage system will enable the IaaS provider to offer advanced data-sharing facilities to collaborating clients running within distinct VMs on the IaaS. BlobSeer's API will be made directly available as a distributed file system (e.g., within a given virtual organization). BlobSeer exposes a multiversioning interface which can be used in two ways: (1) to enable application data checkpointing (as part of checkpointing the application itself) and (2) to expose multiversioning directly at application level through a specific access API. In the second phase, we will also enable the IaaS provider to let client applications share application data through a standard POSIX file system API. File system calls will be transparently mapped to specific, secured data accesses to the internal storage service that implements data sharing among the multiple VMs forming a given distributed application.
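The multiversioning idea can be illustrated with a minimal sketch. The `VersionedBlob` class below is hypothetical and is not BlobSeer's actual API; it assumes a simple page-based, in-memory blob in which every write produces a new immutable version that shares unchanged pages with its predecessor, so that any earlier version stays readable and can serve as a data checkpoint.

```python
from typing import Dict, List


class VersionedBlob:
    """Hypothetical in-memory multiversioned blob (illustration only, not BlobSeer's API)."""

    def __init__(self, page_size: int = 4) -> None:
        self.page_size = page_size
        # versions[v] maps page index -> immutable page contents; version 0 is empty
        self.versions: List[Dict[int, bytes]] = [{}]

    def latest(self) -> int:
        return len(self.versions) - 1

    def write(self, offset: int, data: bytes) -> int:
        """Apply a write on top of the latest version and return the new version number."""
        pages = dict(self.versions[-1])  # shallow copy: untouched pages are shared, not copied
        pos, end = offset, offset + len(data)
        while pos < end:
            idx, off = divmod(pos, self.page_size)
            n = min(self.page_size - off, end - pos)
            page = bytearray(pages.get(idx, bytes(self.page_size)))
            page[off:off + n] = data[pos - offset:pos - offset + n]
            pages[idx] = bytes(page)  # new page object; older versions keep the old one
            pos += n
        self.versions.append(pages)
        return self.latest()

    def read(self, version: int, offset: int, size: int) -> bytes:
        """Read from any published version; unwritten bytes read as zeros."""
        pages = self.versions[version]
        out = bytearray()
        pos, end = offset, offset + size
        while pos < end:
            idx, off = divmod(pos, self.page_size)
            n = min(self.page_size - off, end - pos)
            out += pages.get(idx, bytes(self.page_size))[off:off + n]
            pos += n
        return bytes(out)


blob = VersionedBlob(page_size=4)
v1 = blob.write(0, b"hello world!")
v2 = blob.write(6, b"cloud")
print(blob.read(v1, 0, 12))  # b'hello world!'  (v1 acts as a checkpoint)
print(blob.read(v2, 0, 12))  # b'hello cloud!'
```

Sharing unchanged pages between versions keeps the cost of a new version proportional to the amount of data written rather than to the size of the blob, which is what makes frequent checkpoints affordable in this model.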
Role of MonALISA: First, the MonALISA monitoring framework includes automated management functions performed by higher-level, agent-based services. We will use these facilities to define a self-adaptive, autonomic behavior for BlobSeer through optimized, dynamic control of large-scale data transfers on dedicated circuits, data-transfer scheduling, distributed data scheduling, and automated management and performance prediction of remote storage services (e.g., BlobSeer's Data Providers). Second, MonALISA will serve to introduce client monitoring, in order to ensure that the contract established with the provider is being respected. Regarding security in this context, the storage service has to be aware of the different types of clients and of their access rights. Based on configurable policies implemented with MonALISA, BlobSeer will support different access patterns and enforce adaptive security rules. Moreover, the MonALISA monitoring framework can be used to monitor client activity and to detect malicious behavior. When such events occur, MonALISA will alert the administrators or automatically apply pre-defined policies (e.g., blacklisting users and banning their access for specific periods of time). Finally, the same mechanism can be used to build a consistent reputation system that can further be used by the IaaS provider when scheduling storage resources to users.
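A pre-defined policy of the kind mentioned above (blacklisting a client for a fixed period) could look like the following sketch. Everything here is an assumption for illustration: the `BlacklistPolicy` name, the sliding-window rule and the thresholds are hypothetical and do not correspond to any MonALISA interface; in practice, the suspicious events and their timestamps would come from the monitoring layer.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Deque, Dict


@dataclass
class BlacklistPolicy:
    """Hypothetical policy: ban a client that triggers more than max_events
    suspicious events within `window` seconds, for `ban_seconds` seconds."""

    max_events: int = 3
    window: float = 60.0
    ban_seconds: float = 600.0
    events: Dict[str, Deque[float]] = field(default_factory=lambda: defaultdict(deque))
    banned_until: Dict[str, float] = field(default_factory=dict)

    def record_suspicious(self, client: str, now: float) -> None:
        q = self.events[client]
        q.append(now)
        # Forget events that have slid out of the observation window
        while q and q[0] <= now - self.window:
            q.popleft()
        if len(q) > self.max_events:
            self.banned_until[client] = now + self.ban_seconds
            q.clear()

    def is_allowed(self, client: str, now: float) -> bool:
        # Clients that were never banned have no entry and are always allowed
        return now >= self.banned_until.get(client, float("-inf"))


policy = BlacklistPolicy(max_events=3, window=60.0, ban_seconds=600.0)
for t in (0.0, 10.0, 20.0, 30.0):  # four suspicious events within 30 s
    policy.record_suspicious("client-a", now=t)
print(policy.is_allowed("client-a", now=100.0))  # False: banned until t = 630
print(policy.is_allowed("client-a", now=700.0))  # True: ban expired
print(policy.is_allowed("client-b", now=100.0))  # True: no history
```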
Role of XtreemOS: Here, XtreemOS will be used as an internal cloud technology: the IaaS is XtreemOS and the secure environment where the application is running is the virtual machine itself. In its current version, XtreemOS internally relies on XtreemFS [24,25] for distributed data sharing. Our goal is to explore the possibility of using BlobSeer as an advanced, version-enabled, concurrency-optimized storage back-end, used by operating systems running inside VMs.
Scenario: We consider an Infrastructure as a Service (IaaS) provider which exposes a specialized storage service to the client applications that run on the rented virtual machines, as presented in the scenario above. The storage service is itself a large-scale distributed application, which has to handle massive data and heavy concurrent accesses efficiently. It therefore needs to run on a large number of physical storage nodes, which can be rented from one or more other, second-level IaaS providers. Each second-level IaaS provider has its own pricing policy, charging clients for the number of hours they use the resources, for the amount of stored data or for the amount of network traffic generated, while offering different QoS levels. In this context, it is important to design, for our first-level IaaS, a scheduling policy that is cost-effective (in terms of money spent) and decides how to provision storage space from the various second-level IaaS providers.
Role of BlobSeer: We will design a cost-based scheduler for BlobSeer's storage manager, in order to select one or more IaaS providers that will supply the external, virtualized storage resources needed by BlobSeer. The goal is to minimize the costs of storing and transferring the data while preserving the agreed QoS levels. Clients transparently benefit from minimized storage costs, while BlobSeer seamlessly handles the dynamic migration of its virtualized storage hosts from one IaaS provider to another.
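As a rough illustration of cost-based selection, the sketch below picks the cheapest providers that satisfy a QoS threshold. It is a simple greedy rule, not the scheduler to be designed in the project; the `Provider` fields, the linear pricing model (storage per GB-month plus outgoing transfer per GB), the bandwidth threshold, the example prices and the assumption that each replica stores a full copy of the data are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Provider:
    name: str
    storage_price: float   # hypothetical: currency units per GB-month
    transfer_price: float  # hypothetical: currency units per GB transferred out
    bandwidth_mbps: float  # QoS figure, e.g. as reported by monitoring


def monthly_cost(p: Provider, stored_gb: float, transferred_gb: float) -> float:
    """Linear cost model for one replica holding a full copy of the data."""
    return p.storage_price * stored_gb + p.transfer_price * transferred_gb


def select_providers(providers: List[Provider], stored_gb: float, transferred_gb: float,
                     min_bandwidth: float, replicas: int) -> List[Provider]:
    """Greedy choice: keep providers meeting the QoS threshold, then take the cheapest."""
    eligible = [p for p in providers if p.bandwidth_mbps >= min_bandwidth]
    if len(eligible) < replicas:
        raise ValueError("not enough providers meet the QoS requirement")
    eligible.sort(key=lambda p: monthly_cost(p, stored_gb, transferred_gb))
    return eligible[:replicas]


providers = [
    Provider("A", 0.15, 0.10, 200.0),
    Provider("B", 0.10, 0.17, 150.0),
    Provider("C", 0.08, 0.12, 50.0),   # cheapest storage, but below the QoS threshold
    Provider("D", 0.12, 0.09, 300.0),
]
chosen = select_providers(providers, stored_gb=100, transferred_gb=50,
                          min_bandwidth=100.0, replicas=2)
print([p.name for p in chosen])  # ['D', 'B']
```

A real scheduler would also have to weigh the cost of migrating data between providers and prices that change over time; the greedy rule only shows how monitored QoS figures and pricing information combine into a provisioning decision.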
Role of MonALISA: In this case, the MonALISA framework is an essential building block for an efficient, cost-aware BlobSeer provider manager. Its contribution is twofold: MonALISA will monitor both the data providers, for QoS evaluation, and the corresponding IaaS sites, for pricing information. We rely on MonALISA's ability to collect and store data from a large number of nodes in near real time and to retrieve it quickly on demand. MonALISA provides an abstract data API, enabling the user to define which information is relevant and must be collected, i.e., the data needed for selecting the best IaaS providers both in terms of cost and of node capabilities (network latency, memory size, storage space). As monitoring numerous nodes or services yields a large volume of raw data that must be stored and interpreted (millions of published parameters with high update rates), we will rely on MonALISA's advanced mechanisms, such as dynamically loadable filters, to select and aggregate the relevant information.
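The filtering and aggregation step can be sketched as follows. The `aggregate` function, the `(node, parameter, timestamp, value)` sample format and the parameter names are hypothetical and do not reproduce MonALISA's filter API; the sketch only shows how dropping irrelevant parameters and averaging each remaining series over a time window shrinks a raw monitoring stream before it reaches the provider manager.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical sample format: (node, parameter, timestamp in seconds, value)
Sample = Tuple[str, str, float, float]


def aggregate(samples: Iterable[Sample], relevant: Set[str],
              window_start: float, window_end: float) -> Dict[Tuple[str, str], float]:
    """Keep relevant parameters inside [window_start, window_end) and reduce
    each (node, parameter) series to its mean value."""
    series: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for node, param, ts, value in samples:
        if param in relevant and window_start <= ts < window_end:
            series[(node, param)].append(value)
    return {key: mean(values) for key, values in series.items()}


raw = [
    ("dp1", "latency_ms", 0.0, 12.0),
    ("dp1", "latency_ms", 1.0, 14.0),
    ("dp1", "cpu_temp", 1.0, 55.0),    # not relevant: dropped
    ("dp2", "latency_ms", 2.0, 30.0),
    ("dp2", "free_gb", 2.5, 120.0),
    ("dp2", "latency_ms", 9.0, 90.0),  # outside the window: dropped
]
summary = aggregate(raw, {"latency_ms", "free_gb"}, 0.0, 5.0)
print(summary[("dp1", "latency_ms")])  # 13.0
```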
Role of XtreemOS: The future direction for XtreemOS is to explore how its technology can help federate clouds. In this context, XtreemOS aims to offer a unified cloud image while being deployed on top of several cloud infrastructures. An important aspect here is data sharing at the virtual organization level, a task currently fulfilled by XtreemFS. However, as XtreemFS was designed in the context of grid computing, it does not include cost-effective resource management for the case where resources are provisioned from clouds. We will investigate how BlobSeer can fill this role by implementing cost-effective storage on top of multiple external IaaS providers.