LWB-7208 Distributed algorithms for Big Data | Devoxx

Distributed algorithms for Big Data


Cloud & BigData

Room 4A - Metrosoft

Wednesday at 15:10 - 16:10

There is a lot of buzz about Big Data and the new disruptive technologies that will supposedly change our lives. But did you know that most of them rely on well-known distributed algorithms that have existed for decades?

During this session, we'll dig into two popular algorithms that are widely used in Big Data technologies but still fairly new to most developers.

Exact counting in a distributed world is a hard task and requires storage proportional to the size of the data set being counted. The HyperLogLog algorithm gives an accurate estimate of the cardinality, with a tunable error rate, and requires only a tiny amount of storage.
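
To make the idea concrete, here is a minimal HyperLogLog sketch in Python. It is an illustration under stated assumptions, not the implementation used by any particular product: the class name, the SHA-1 based 64-bit hash, and the default precision p=12 (4096 registers, a standard error of roughly 1.04/sqrt(4096), about 1.6%) are all choices made for this example. Each register only needs to hold a small integer, which is where the tiny storage footprint comes from.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog with 2**p registers (names are hypothetical)."""

    def __init__(self, p=12):
        self.p = p                      # precision: number of index bits
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant, valid for m >= 128
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def _hash64(self, item):
        # Assumption: the first 8 bytes of SHA-1 serve as a 64-bit hash
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item):
        x = self._hash64(item)
        width = 64 - self.p
        idx = x >> width                    # first p bits pick a register
        rest = x & ((1 << width) - 1)       # remaining bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits
        rank = width - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Sketches built on different nodes combine with a register-wise max
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self):
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros > 0:
            # Small-range correction: fall back to linear counting
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)
```

The merge step is what makes the structure attractive in a distributed setting: each node can build its own sketch locally, and combining them gives the same registers as a single sketch fed with all the data.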

For distributed systems with a master/slave architecture, the major challenge has always been making master election safe and reliable. Paxos, a distributed consensus algorithm, provides an elegant and mathematically proven solution to this challenge.
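
As a rough illustration of how Paxos can settle a master election, the following Python sketch runs single-decree Paxos against in-memory acceptors. It is a simplification under several assumptions: there is no network, no failures and no persistence, and the names `Acceptor` and `propose` as well as the node identifiers are hypothetical. It only shows the two phases (prepare/promise, then accept) and the rule that a proposer must adopt any value already accepted.

```python
class Acceptor:
    """In-memory acceptor (hypothetical, no persistence or networking)."""

    def __init__(self):
        self.promised = -1           # highest ballot promised so far
        self.accepted_ballot = -1    # ballot of the accepted value, if any
        self.accepted_value = None

    def prepare(self, ballot):
        # Phase 1: promise to ignore lower ballots, report any accepted value
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted_ballot, self.accepted_value
        return False, None, None

    def accept(self, ballot, value):
        # Phase 2: accept unless a higher ballot has been promised
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted_ballot = ballot
            self.accepted_value = value
            return True
        return False


def propose(acceptors, ballot, value):
    """Try to get a value chosen; return the chosen value, or None on failure."""
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(ballot) for a in acceptors]
    granted = [(b, v) for ok, b, v in replies if ok]
    if len(granted) < majority:
        return None
    # If any acceptor already accepted a value, adopt the most recent one
    prev_ballot, prev_value = max(granted, key=lambda bv: bv[0])
    if prev_ballot >= 0:
        value = prev_value
    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks >= majority else None


acceptors = [Acceptor() for _ in range(3)]
print(propose(acceptors, 1, "node-A"))  # node-A
print(propose(acceptors, 2, "node-B"))  # node-A (earlier choice is preserved)
print(propose(acceptors, 1, "node-C"))  # None (stale ballot is rejected)
```

The second call is the interesting one: a competing candidate with a higher ballot cannot overwrite the elected master and instead learns that "node-A" was already chosen.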


DuyHai Doan is a Cassandra technical advocate. He splits his time between technical presentations and meetups on Cassandra, coding on open source projects to support the community, and helping companies that use Cassandra make their projects successful. Previously, he worked as a freelance Java/Cassandra consultant.