Blueflood

Multi-tenanted time series datastore

http://blueflood.io

https://intelligence.rackspace.com



Lakshmi Kannan
https://github.com/lakshmi-kannan

Credits

  • Built by engineers at Rackspace.
  • Thanks: James Burkhart, Shane Duan, Gary Dusbabek, Chinmay Gupte, Dominic Lobue and others.
  • If you have product questions or comments, speak with James Colgan and Mark Everett.

What?

A giant distributed calculator that loves numbers.

What?

  • A time series datastore built on top of Cassandra.
  • Provides HTTP APIs to ingest and query data.
  • Supports numeric, string and boolean time series data.
  • Blueflood is open source. Hack away!

Why?

Need...

  • a time series datastore for graphs.
  • a multi-tenanted solution.
  • a datastore that scales horizontally as the number of tenants and metrics grows.

How?

So what?

  • The Rackspace Cloud control panel now shows graphs.
  • We are able to ingest billions of data points per day.

Twitter reactions

The positive ones are boring.

  • "Interesting rrd-like system but at cloud scale. How does it compare to #opentsdb or #kairosdb ?"
  • "We did build something similar to this... ...but we push tens of billions of points a day through it, and counting."
  • "Automatic Rollups are the new MRTG/RRDTool... many efforts to produce data that might never be read."

Why Cassandra?

  • High write throughput (60,000 points/sec peak on a single box).
  • Reasonable read performance (depends on the queries).
  • Cassandra's data model supports a time series datastore easily.
  • Cassandra's native TTL support.
  • A Cassandra committer on the team; DevOps experience in operations.
  • Lessons learned from CloudKick.

OpenTSDB? Kairos DB? Cyanite? Graphite? InfluxDB?

OpenTSDB is the closest competitor, but it runs on HBase where Blueflood uses Cassandra.

Blueflood primary components

  • Ingest module - handles incoming writes.
  • Rollup module - computes aggregations/summarizations.
  • Query module - handles user queries.

Ingest module

  • HTTP POST with JSON body.
  • Production currently ingests via Scribe and Thrift.
  • Custom ingestion adapters can be written.

Metric structure

  • name - ord1-maas-prod-dcass0.bf.rollup_timer
  • value - 35.6789
  • ttl (in seconds) (optional) - 172800
  • unit (optional) - 'seconds'

Example: Publish numeric metrics
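A minimal sketch of building an ingest payload from the metric structure above (name, value, ttl, unit). The field names, endpoint path, and port shown here are illustrative assumptions, not confirmed details of Blueflood's API:

```python
import json
import time

def build_ingest_payload(name, value, ttl_seconds=172800, unit="seconds"):
    """Build a JSON-serializable ingest payload -- a sketch, assuming the
    HTTP ingest endpoint accepts a JSON array of metric objects."""
    return [{
        "metricName": name,            # e.g. ord1-maas-prod-dcass0.bf.rollup_timer
        "metricValue": value,
        "ttlInSeconds": ttl_seconds,   # optional TTL, honored natively by Cassandra
        "unit": unit,                  # optional
        "collectionTime": int(time.time() * 1000),  # epoch milliseconds
    }]

payload = build_ingest_payload("ord1-maas-prod-dcass0.bf.rollup_timer", 35.6789)
body = json.dumps(payload)
# This body would be sent as an HTTP POST, e.g. to
# http://<ingest-host>:<port>/v2.0/<tenantId>/ingest  (path illustrative).
```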

Rollup module

  • Fixed granularities - 5 min, 20 min, 60 min, 4 hr, 1 day.
  • Restrictive rollup types.
  • Basic rollups - mean, min, max, std. dev.
  • Experimental statsd support for counters, timers, gauges, and sets.
  • Experimental histogram support.
  • No rollups for strings and boolean data.
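The basic rollups above (mean, min, max, std. dev) amount to a simple aggregation over the raw points in a slot. This Python sketch is illustrative only, not Blueflood's actual Java implementation:

```python
import math

def basic_rollup(points):
    """Compute a basic rollup (count, mean, min, max, population std. dev)
    over a list of raw numeric data points for one time slot."""
    n = len(points)
    mean = sum(points) / n
    variance = sum((p - mean) ** 2 for p in points) / n
    return {
        "count": n,
        "mean": mean,
        "min": min(points),
        "max": max(points),
        "stddev": math.sqrt(variance),
    }

r = basic_rollup([1.0, 2.0, 3.0, 4.0])
```

Coarser granularities (20 min, 60 min, ...) can then be built by re-rolling the finer rollups rather than re-reading every raw point.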

Query module

  • HTTP APIs, JSON response.
  • Batched reads of metric data are possible.
  • A time series is identified by its metric name.
  • We support "Get by points" and "Get by resolution" calls.
  • No fancy queries yet.
  • Custom output adapters can be written.

Example: Retrieve numeric metrics
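As a sketch of the two query styles above ("get by points" vs. "get by resolution"), here is how such request URLs might be built. The host, port, path, and parameter names are assumptions for illustration, not Blueflood's documented API:

```python
from urllib.parse import urlencode

# Illustrative base URL; host, port, and path are assumptions.
BASE = "http://localhost:20000/v2.0/{tenant}/views/{metric}"

def query_by_points(tenant, metric, start_ms, end_ms, points):
    """'Get by points': ask the server to return roughly N data points,
    letting it pick an appropriate rollup granularity."""
    qs = urlencode({"from": start_ms, "to": end_ms, "points": points})
    return BASE.format(tenant=tenant, metric=metric) + "?" + qs

def query_by_resolution(tenant, metric, start_ms, end_ms, resolution):
    """'Get by resolution': request a specific granularity explicitly
    (resolution names here are illustrative)."""
    qs = urlencode({"from": start_ms, "to": end_ms, "resolution": resolution})
    return BASE.format(tenant=tenant, metric=metric) + "?" + qs

url = query_by_points("123456", "bf.rollup_timer", 0, 86400000, 200)
```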

Blueflood optional components

  • Elasticsearch indexer and discovery. (Experimental)
  • Cloud files exporter for rollups. (Experimental)
  • Apache Kafka exporter for rollups. (Experimental)

10,000 ft view of Blueflood architecture

  • Metrics are hashed into shards (128 total).
  • Each BF node owns a set of shards.
  • Each BF worker has a peer; ZooKeeper handles coordination.
  • Time is bucketed into slots (wrapping modulo 14 days).
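The shard/slot scheme above can be sketched as follows. The hash function and granularity used here are illustrative stand-ins, not Blueflood's actual implementation:

```python
import zlib

NUM_SHARDS = 128
SLOT_WINDOW_DAYS = 14

def shard_of(metric_name, num_shards=NUM_SHARDS):
    """Map a metric name to one of 128 shards. Any stable hash that spreads
    metrics evenly captures the idea; CRC32 is just a stand-in."""
    return zlib.crc32(metric_name.encode()) % num_shards

def slot_of(timestamp_ms, granularity_ms=5 * 60 * 1000):
    """Map a timestamp to a time slot. Slot numbers wrap every 14 days,
    so each slot is reused (and re-rolled) on a 14-day cycle."""
    slots_per_window = (SLOT_WINDOW_DAYS * 24 * 60 * 60 * 1000) // granularity_ms
    return (timestamp_ms // granularity_ms) % slots_per_window
```

A node that owns a shard is responsible for rolling up every slot of every metric hashed to that shard; its ZooKeeper-coordinated peer can take over if it fails.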

Cassandra cluster

  • 32 nodes across two data centers.
  • Replication factor of 3.
  • All read and write operations happen at ConsistencyLevel.ONE.
  • Astyanax client library with thrift.

Blueflood deployment

  • Blueflood node can run in any permutation of ingest, rollup and query modes.
  • Blueflood nodes run on the same boxes as the dcass Cassandra nodes, though this is not required.
  • Blueflood Chef recipes will eventually be open sourced.

Operations

  • Blueflood is heavily instrumented. All metrics are now reported to Graphite.
  • Rackspace monitoring agent plugins to capture KPIs.
  • Command line tools to dump metrics, roll up data, etc.

Cool story, give me data?

  • We ingest 1 million individual data points a minute, peaking at 3 million/min.
  • We roll more than 1 million individual metrics.
  • We have hit a peak of 3 million Cassandra operations a minute.
  • Read queries are more like 500 a minute.

Team logistics

  • Small team with one remote developer.
  • Primary communication happens on IRC.
  • Light on process, but strong on ownership.
  • Every merge to master must be deployed.
  • Instrumentation is paramount. Operational focus is vital.
  • Product and project decisions are made from the ground up.

Upcoming features

  • Graphite integration.
  • Tags based metrics retrieval.
  • Richer queries.
  • Aggregation functions.

Technical lessons learnt

  • Most major operational issues so far are due to Cassandra.
  • Split metrics into different column families for isolation.
  • Leveled compaction is bad; use size-tiered compaction for time series data.
  • Live upgrade of Cassandra cluster is not easy.
  • Cassandra rpc type 'sync' works better for us than 'hsha'.
  • Migrations are hard. Think through data model carefully.
  • Upgrade Cassandra on every opportunity.
  • Distributed systems are still hard in 2014. Changes are not easy to make.

Meta lessons learnt

  • Y U JAVA? Java is not everyone's cup of tea.
  • Blueflood requires better packaging. Docker!!!
  • 40,000 lines of code is not fun. Open source early.
  • Documentation! We have documentation days now.
  • People ask good questions on email/IRC. Capture them.

How can I participate?

  • Open sourced in summer of 2013. Apache 2 license.
  • Most discussions happen on IRC. #blueflood on freenode.
  • The blueflood-discuss Google Group hosts technical discussions.

Questions?