A State Machine Datastore in the Wild
Richard Crowley
Betable operations
Richard Crowley
Betable operations
Gambling-as-a-Service
We do the math and the money
We’re licensed and regulated so game developers don’t have to be
Single-player
Single-event
Against the house
Easy to model as a REST API
Multi-player
Multi-event
Still against the house
Not RESTful
Most events are asynchronous from a player’s perspective
www.ietf.org/rfc/rfc6455.txt
Not completely busted anymore!
All Cassandra, all the time
N=3
R=2
W=2
(quorum reads and writes)
Naturally, we tried this first
Sell 1000 bingo cards “instantly”
Don’t sell more than 1000
Don’t sell more than 4 to any player
Don't sell two identical cards
Don’t sell any after the first ball is drawn
Doubly-linked list
Optimistic concurrency control
Two column families: items
and pointers
(with LongType
columns)
Death by roundtrip
Write new tail to items
and include the previous tail’s row key
Write to the previous tail’s row in pointers
with a timestamp and the new tail’s row key
Read the previous tail’s row in pointers
to see if our write won the append race
Consistent hashing of entities to processes
Locked read-modify-write over the network
Also death by roundtrip
At this point, why even use Cassandra?
Lacks expressiveness
(read: Turing-completeness)
SELECT FOR UPDATE
Also also death by roundtrip
Enforce complex constraints on data
Efficient in terms of network I/O
Scale horizontally
1000 bingo card strawman
Nothing more permanent
than a temporary solution
Small scale makes a custom
solution plausible
Arbitrary size and structure
Unit of atomicity
Unit of distribution
Examples: RouletteTable
, RouletteRound
, but not RouletteBet
type Table struct { ID string GameID string RoundID string Players []Player NextTransition time.Time // ... }
Mechanism of all reads and writes
Analogous to stored procedures
Apply to (at most) one entity
Examples: CreateRouletteTable
, CreateRouletteRound
, CreateRouletteBet
, and ResolveRouletteRound
type CreateTable struct { TableID string // ... } func (ct *CreateTable) Prepare(pc PrepareContext) error { return nil } func (ct *CreateTable) Apply(ac ApplyContext) (*Table, error) { t := &Table{ ID: ct.TableID, // ... } if err := ac.Store(t.ID, t); nil != err { return nil, err } return t, nil }
Dispatch other transitions
Generate data
Mutate the transition
PrepareContext
has Dispatch
, Load
, and more but not Store
, LoadExcl
,
or StoreExcl
func (ct *CreateTable) Prepare(pc PrepareContext) error { r, err := pc.Dispatch(&CreateRound{ // ... }) if nil != err { return err } ct.RoundID = r.ID return nil }
Cross-entity operations
No transactional guarantees
Efficient for humans and networks
Transition written to write-ahead log
Enforce domain-specific constraints
Update a stored entity
ApplyContext
has Load
, Store
, LoadExcl
, StoreExcl
, and more
GetEntity
reads an entity given a type and a key
GetIndex
reads a secondary index given its name and a secondary key
UpdateEntity
could theoretically update fields in any entity but we haven’t needed it
Atomicity: entity writes are as atomic as you want
Consistency: weak guarantees since transitions contain arbitrary code
Isolation: within an entity when requested
Durability: fsync(2)
the write-ahead log
Consistency: transitions always applied by their coordinator
Availability: no second-choice coordinators
Partition tolerance: unreachable coordinators don’t make reachable coordinators unavailable
Atomicity of entity-spanning operations
Locality of any two entities
See Pat Helland’s Life beyond Distributed Transactions: an Apostate’s Opinion in CIDR 2007
Transitions declare a distribution key
Distribution key conventionally identifies
an entity
Consistent hash function maps transitions to a coordinator process
Command-line flags easily configured
by Puppet
Declare peers on your consistent hash ring
Declare masters from which to replicate
Master-slave
Master-master
Next N-1
peers
One listening socket per process
Clients connect to any process
Peers proxy transitions to their coordinator
Clients can embed the consistent hashing algorithm as an optimization
Asynchronous
Loosely inspired by MySQL
Replicate transitions, which are small, not entities, which may be very large
Prepare
not called when replicating
Slave sends a BeginReplication
transition with an Index
and Offset
Master streams transitions from the initial Index
and Offset
forward
Index
and Offset
logged with replicated transitions and read from the log to resume
type Table struct { ID string `index:"-"` // Primary key GameID string `index:"GameID_TableID"` // Secondary key // ... }
Declare a primary key for the entity
Declare fields to be indexed
Automatically updated during
Store
and StoreExcl
Eventually consistent via replication
Not atomic with entity storage
Objects and blobs
ObjectStore
and BlobStore
interfaces
Load
, Store
, LoadExcl
, and StoreExcl
BSONObjectStore
serializes and stores objects in a BlobStore
The key is prefixed with the object’s type
FileBlobStore
stores blobs
on the filesystem
Store
and StoreExcl
are atomic
via rename(2)
LoadExcl
and StoreExcl
use flock(2)
+---------+----------+----------+---------+ | | | | | | version | flags | length | type | | (uint8) | (uint16) | (uint16) | (UTF-8) | | | | | | +---------+----------+----------+---------+ +----------+----------------+ | | +------------+ | | length | | transition | | | (uint32) | | (BSON) | | | | +------------+ | +----------+----------------+
Type prefix facilitates deserialization
Naive and expensive
It’s easy to create protocol buffer-like versioning and extensibility by being careful with BSON
go-bson
nicely mirrors encoding/json
go-bson
actually feels more naturally Go than goprotobuf
Always google it as “golang”
Brevity
Static type system
Compiles to x86, ARM, etc.
CSP
Thompson, Pike, Cox, Griesemer
Brad Fitz
Entities and transitions are declared
in the datastore
Awkward development workflow
Can test each side of distribution/replication as units
Start 3 services in one Go process and sleep a bit to test together
Jump through hoops to test network errors
Used to allow Dispatch
in Apply
methods
Prevent double-applying in replication
Non-deterministic replication
Eliot Moss’s Nested Transactions:
An Approach to Reliable
Distributed Computing
We’re backing more and more services with this storage engine every day
It’s less of a brain teaser each time
Replication improvements
Next-generation secondary indexes
Declaring entities and transitions outside the datastore
High-availability coordinators