Notes From Yaron Minsky Talk: A Whirlwind Tour of Consistency

Here are my notes I took during Yaron Minsky’s talk about Distributed Systems.  They largely mirror his slides.  As usual, if WordPress has screwed up the formatting, visit:

Yaron Minsky – A Whirlwind Tour of Consistency:

Distributed Systems are really important when building large systems
They can be really confusing – complicated ideas – not well separated out

A story of Two Systems:
SKS (he wrote)
Public Key Encryption (OpenPGP Database)
Replicated, robust, queryable store
Every node should know all of the keys
PKS was precursor to SKS

“Distributed Coordination Service”
looks like an in-memory file-system.  Used for:
Configuration Management

API looks like a File System
Tree like Data Structure
Create a Node
Use ls

In-Memory System – lots of little bits of Data

Used for lots of things
Solid place from which to share their data

Run a lot of applications from a Master Database without having multiple instances of a Database

Configuration Management:
Stores all configurations

The middle point between 2 different services that rely on each other
Publishes the Mapping from Semantic Locations to Physical Locations
What I want to Where I get it

What’s the Service & how is it implemented?
Rendezvous is some way of finding the implementation

SKS Implementation
Replication driven by random pairwise reconciliation
Every node know some subset of its neighbors & then they talk
Like spreading Mono
Gossip/Epidemic Distribution
Cute algorithm to discover differences
Merge by set union + key merge

Commutative, Associative Merge Operators ~= CRDT
Commutative = Order Doesn’t Matter
Idempotent – Merge it once & never have to do it again??

Has a Hash of key & all of associated metadata
Merge on the keys themselves.
Generates a new key with a new hash

CRDT = Commutative Replicative Data Type
Most things are not CRDTs
CRDTs have to be Monotonic
Adding is progress, must be moving in that direction
If it included Deletion, this would be conflicting

Deletion could be a special type of Merge

Very easy to run in a loosely organized community

Zookeeper Implementation
ZAB: Zookeeper Atomic Broadcast
Single leader handles all writes (hopefully)
State change messages delivered to all hosts in same order, even with multiple leaders
Progress requires majority
Leader failure => election

Atomic means that either everybody gets it, or nobody gets it
Message logs are Critical
Known as State Machine Replication
ZAB ~= Paxos & Raft

The point is that it’s consistent replication (sequence-oriented) anywhere in the system

One Node is elected the leader.
System is changed by going to Leader.
Leader Broadcasts to others in the same Order
This even occurs when there are Multiple Leaders

Everything is seen by Everyone – No Matter What

A Leader only needs to get its message to a Majority to make sure things are distributed properly
(Majority must share players with its members?)

Trying to simulate a Single Computer on top of the Network

eventual consistency
Highly Available

Strong Consistency
General Purpose
Pretty Available

Why Impossibility Results?
So you know what’s impossible :p
Under the following set of assumptions: The following thing is not possible

“Proving impossibility results causes us to take a very analytical approach to understanding the area.  It causes us to state carefully…”
Quote From: A Hundred Impossibility Proofs for Distributed Systems

Impossibility #1:
Setup (Hypothetical Circumstances):
Collection of Hosts
Asynchronous Network (no timing info)
One node may fail

All hosts agree on a single bit (Must be either 0 or 1)
Not trivial (Can’t just write a program that only outputs 1)

No non-trivial deterministic algorithms that are both *safe* and *live*
Safe – Never does the wrong thing
Live – Eventually Finishes
Is possible if you allow Randomization

Safe under particular set of assumptions
Provided things are good with your network, you make eventual progress

Two-Phase Commit
Go to everyone to get commit
Once everyone agrees, then the commit is done

If not everyone agrees, you may have to switch leaders – the system can get caught up

Impossibility #2 – CAP Theorem:
Three desirable properties for your distributed systems:
(strong) consistency

pick any two!!

CAP Theorem (revised):
Consistency in the face of partition (CP)
Availability in the face of Partition (AP)
Service continues to be usable
Can look at this on a Micro or Macro Level

Pick One

Consistency is a Safety Property
Availability is a Live Property

Pick Two: AP
Examples: SKS, DNS, Riak, web-caches, git

Allow local deviations
Be optimistic

Availability on Different Time Scales is like Latency

Note: Git *is* a *Distributed* Version Control System

Pick Two: CP

Examples: Zookeeper, RDBMS,

[missed the rest of slide]

Pick Three, sort of:
Maintain C and A when not partitioned
Give up some of each during a partition
Disable some critical operations
Allow some inconsistency, and repair later

Recommended Reading:
CAP Theorem, 12 years later:
Call me maybe
Call me maybe on consistency:

Q & A:

How to learn Jepsen Jargon:
See Recommended Reading

Zookeeper invented their own protocol.  Hard to get right initially – why did they do this:
Programmers usually like to come up with new protocol – it’s fun
Usually with the point of optimizations & customizations
Paper: The Part Time Parliament – Described the original Paxos Algorithm
Paxos Made Simple & Paxos Made Moderately Complex – two papers written on this
Whole industry trying to explain Paxos
Other people put out a Raft Paper
Raft is an easier to understand Paxos
People who use Paxos always have to put in extra optimizations

Ron Minskey – Practical Set Reconciliation

Jane Street – focuses on Consistency?
That’s correct – we’re not concerned with being clever
When things fail – it’s often tolerable to roll back a bit & use a lot of the State
State that you care about tends to be kept by other players

NASDAQ is totally built on State Exchange Reconciliation
When it fails, A Human has to come in & restore it from a backup
That single Leader processors 3 million transactions per second
Actually runs on Java – so Java *can* go fast
Could be done in C or NoCamal(?)

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s