Title:

Publish/Subscribe for Large-Scale Social Interaction: Design, Analysis and Resource Provisioning

Abstract:

Publish/subscribe (pub/sub) is a popular communication paradigm in the design of large-scale distributed systems. We are witnessing increasingly widespread use of pub/sub for a wide array of applications in both industry and academia, and yet there is a lack of detailed studies of large-scale real-world pub/sub systems. In our work we present an overview of the pub/sub system used to drive social interaction at Spotify. We then present a detailed analysis of traces from a real deployment of Spotify pub/sub. Inspired by the peer-assisted solution used by Spotify to stream music, we explore a similar solution for disseminating Spotify pub/sub messages to users. The task of distributing the workload among user peers and datacenter servers raises a fundamental problem: how can we select a subset of the pub/sub workload to be served by datacenter servers so as to maximise the satisfaction of users under resource constraints?

In our recent work, we provide, to the best of our knowledge, the first formal treatment of the above problem by introducing two metrics that capture subscriber satisfaction in the presence of limited resources. This allows us to formulate the problem as two new flavors of the maximum coverage optimization problem. Unfortunately, both variants of the problem prove to be NP-hard. By subsequently providing formal approximation bounds and heuristics, we show, however, that efficient approximations can be attained. We validate our approach using real-world traces from Spotify and show that our solutions can be executed periodically in real time in order to adapt to workload variations.
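To give a rough feel for the kind of selection problem involved, the following Python sketch applies the textbook greedy heuristic for budgeted maximum coverage to a toy workload. It is not the algorithm or the satisfaction metrics from this work: the function name, the input layout (each topic with a positive serving cost and a set of subscriber ids), and the simplification that a subscriber counts as satisfied once any one of their topics is served from the datacenter are all assumptions made purely for illustration.

```python
def greedy_budgeted_coverage(topics, budget):
    """Pick topics to serve from the datacenter under a resource budget.

    topics: dict mapping a topic name to (cost, subscribers), where cost is a
            positive number and subscribers is a set of subscriber ids.
            (Hypothetical layout, assumed for this sketch.)
    budget: total resources available at the datacenter.

    Returns (chosen topic names in pick order, set of covered subscribers).
    """
    chosen = []
    covered = set()
    remaining = budget
    candidates = dict(topics)  # copy so the caller's dict is not modified

    while candidates:
        best, best_ratio = None, 0.0
        for name, (cost, subs) in candidates.items():
            if cost > remaining:
                continue  # does not fit in the remaining budget
            # Marginal gain: subscribers not yet covered, per unit of cost.
            ratio = len(subs - covered) / cost
            if ratio > best_ratio:
                best, best_ratio = name, ratio
        if best is None:
            break  # nothing affordable adds any new subscriber
        cost, subs = candidates.pop(best)
        chosen.append(best)
        covered |= subs
        remaining -= cost

    return chosen, covered


# Toy workload: topic -> (cost, subscribers). Values are made up.
toy_topics = {
    "a": (2, {1, 2, 3, 4}),
    "b": (1, {1, 2}),
    "c": (1, {5}),
    "d": (3, {6, 7, 8}),
}
chosen, covered = greedy_budgeted_coverage(toy_topics, budget=3)
print(chosen, sorted(covered))
```

Because each greedy pass is a simple scan over the remaining topics, a heuristic of this shape is cheap enough to re-run periodically as the workload changes; in this simplified picture, the topics left unselected would be the ones handed to the peer-assisted path.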

Further, we try to answer the following three fundamental questions: given a pub/sub workload, (1) what is the minimum amount of resources needed to satisfy all the subscribers, (2) what is a cost-effective way to allocate resources for the given workload, and (3) what is the cost of hosting it on a public Infrastructure-as-a-Service (IaaS) provider such as Amazon EC2?