CAP and Eventual Consistency


Source: https://medium.com/@marton.waszlavik/demystifying-cap-theorem-eventual-consistency-and-exactly-once-delivery-guarantee-ed20cf7cc877

CAP:

  1. Consistency ( C )
  2. Availability ( A )
  3. Partition Tolerance ( P ): certain sequence of communication events can be lost between the parties. Eg. one server-node restart in a cluster.

Choose between C or A. P is a fact. So a system can be:

  1. CP
  2. AP

Discussion Points

  1. Most systems can be configured for CP / AP behavior. It is wrong to say that MySQL is a CP system.
  2. Same system may treat one request as CP and other as AP. Eg. Money transfer -> CP; product like request -> AP.
  3. Consistency can be introduced to AP by way of Eventual Consistency. This is NOT C, which denotes Strong Consistency. All AP systems are not eventually consistent–it is a design choice.
  4. Split-brain: A AP system that is not eventually consistent.

Implementations and perf

Scenario: client sends request to server-cluster. Request times out.

cap.png

Two General's problem:

  1. The request may have got lost.
  2. The request may have been processed by the server, but the response from the server may have been lost.

System design choices:

  1. Fire-and-forget client. The message may or may not be received by the server. Aka: at most once delivery guarantee.
  2. The client retries. Aka: at least once delivery guarantee. There are two possibilities:
    1. The server receives the message once.
    2. The server receives the message multiple times.

(2) is complex to design: re-transmissions, ordering of messages, temporary message buffers etc. needs to be considered.

Money transfer usecase: We need exactly once delivery guarantee. Design:

Client sends unique identifier (uid) per request. The server cluster checks the uid to verify if the request has been processed already, before processing. Scenario: server A receives the message, when server B is down; server B comes up, and server A goes down; if message is sent to server A, and retried to B, there must not be any duplicates. Meaning: uid check needs to be done on the whole cluster, not on per server basis.
Posted on Tue May 01 23:01:50 EDT 2018 by Subhash Chandran
cap systems architecture