DuyHai Doan is a Cassandra technical advocate. He splits his time between giving technical presentations and meetups on Cassandra, coding on open source projects to support the community, and helping companies that use Cassandra make their projects successful. Previously, he worked as a freelance Java/Cassandra consultant.
There is a lot of buzz about Big Data and the new disruptive technologies that will supposedly change our lives. But did you know that most of them rely on well-known distributed algorithms that have existed for decades?
During this session, we'll dig into two popular algorithms that are widely used in Big Data technologies but still quite new to most developers.
Exactly counting distinct elements in a distributed world is a hard task and requires storage proportional to the size of the data set being counted. The HyperLogLog algorithm gives an accurate estimate of the cardinality (with a tunable error rate) while requiring only a tiny amount of storage.
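To make the idea concrete, here is a minimal HyperLogLog sketch in Python. It is an illustration under stated assumptions, not the implementation used by any particular product: the class name, the use of SHA-1 as the hash function, and the default precision `p = 12` are choices made for this example. With `m = 2^p` registers, the typical relative error is about 1.04 / sqrt(m), so `p` is the knob that trades storage for accuracy.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog counter (hypothetical class, for explanation only)."""

    def __init__(self, p=12):
        self.p = p                      # precision: number of bits used to pick a register
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant; this closed form is valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # Assumption: the first 64 bits of SHA-1 serve as the hash value.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                    # first p bits select the register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits (1-based).
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Sketches built on different nodes combine by taking the register-wise maximum.
        if self.m != other.m:
            raise ValueError("cannot merge sketches with different precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self):
        # Harmonic mean of 2^register, scaled by alpha * m^2.
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting.
        if estimate <= 2.5 * self.m and zeros > 0:
            estimate = self.m * math.log(self.m / zeros)
        return estimate
```

Each register holds a small integer, so the whole sketch stays at a few kilobytes no matter how many items are added, and the `merge` method is what makes the structure convenient in a distributed setting: each node counts locally and only the registers are exchanged.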
For distributed systems with a master/slave architecture, the major challenge has always been making master election safe and reliable. Paxos, a distributed consensus algorithm, provides an elegant and mathematically proven solution to this challenge.
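The two phases of single-decree Paxos can be sketched in a few lines of Python. This is a simplified, in-memory illustration, not a production protocol: the `Acceptor` class, the `propose` function and the node names are hypothetical, and networking, failures, retries and durable storage are all left out. It only shows why a value, once chosen by a majority, cannot be replaced by a later proposer.

```python
class Acceptor:
    """Illustrative Paxos acceptor holding its state in memory."""

    def __init__(self):
        self.promised = -1           # highest ballot this acceptor has promised
        self.accepted_ballot = -1    # ballot of the value accepted so far, if any
        self.accepted_value = None

    def prepare(self, ballot):
        # Phase 1: promise to ignore lower ballots and report any accepted value.
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted_ballot, self.accepted_value
        return False, None, None

    def accept(self, ballot, value):
        # Phase 2: accept unless a higher ballot has been promised in the meantime.
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted_ballot = ballot
            self.accepted_value = value
            return True
        return False


def propose(acceptors, ballot, value):
    """Try to get a value chosen; return the chosen value, or None on failure."""
    quorum = len(acceptors) // 2 + 1

    # Phase 1: collect promises from the acceptors.
    replies = [a.prepare(ballot) for a in acceptors]
    granted = [(b, v) for ok, b, v in replies if ok]
    if len(granted) < quorum:
        return None

    # If any acceptor already accepted a value, adopt the one with the highest ballot.
    prev_ballot, prev_value = max(granted, key=lambda bv: bv[0])
    if prev_ballot >= 0:
        value = prev_value

    # Phase 2: ask the acceptors to accept the (possibly adopted) value.
    acks = sum(1 for a in acceptors if a.accept(ballot, value))
    return value if acks >= quorum else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, ballot=1, value="node-A")    # node-A is elected
second = propose(acceptors, ballot=2, value="node-B")   # must adopt node-A
stale = propose(acceptors, ballot=1, value="node-C")    # rejected: ballot too old
```

In this run, the second proposer asks for "node-B" but ends up confirming "node-A", because phase 1 forces it to adopt the value a majority has already accepted. That property is what makes an election safe even when several candidates compete.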
For this workshop, we'll use Cassandra and Spark to create an online music service à la Spotify. You'll learn how to use Spark to process and aggregate raw data coming from Cassandra. All processed data will be saved back into Cassandra so that it can serve a live web application: your own Spotify service.