Distributed algorithms for Big Data
Conference
Cloud & BigData | |
Room 4A - Metrosoft |
Wednesday at 15:10 - 16:10 |
There is a lot of buzz about Big Data and the new disruptive technologies that will supposedly change our lives. But did you know that most of them rely on well-known distributed algorithms that have existed for decades? During this session, we'll dig into two popular algorithms that are widely used in Big Data technologies but still quite new to developers. Exact counting in a distributed world is a hard task and requires storage proportional to the size of the data set being counted. The HyperLogLog algorithm gives an accurate estimate of the cardinality (with a tunable error rate) and requires only a tiny amount of storage. And for distributed systems with a master/slave architecture, the major challenge has always been making master election safe and reliable. Paxos, a distributed consensus algorithm, provides an elegant and mathematically proven solution to this challenge. |
DuyHai DOAN |
---|
DuyHai Doan is a Cassandra technical advocate. He splits his time between giving technical presentations and meetups on Cassandra, coding on open source projects to support the community, and helping companies that use Cassandra make their projects successful. Previously, he worked as a freelance Java/Cassandra consultant. |