Blueflood

Multi-tenanted time series datastore

http://blueflood.io

https://intelligence.rackspace.com



Lakshmi Kannan
https://github.com/lakshmi-kannan

Credits

  • Built by engineers at Rackspace.
  • Thanks: James Burkhart, Shane Duan, Gary Dusbabek, Chinmay Gupte, Dominic Lobue and others.
  • If you have product questions or comments, speak with James Colgan and Mark Everett.

What?

A giant distributed calculator that loves numbers.

What?

  • A time series datastore built on top of Cassandra.
  • Provides HTTP APIs to ingest and query data.
  • Supports numeric, string and boolean time series data.
  • Blueflood is open source. Hack away!

Why?

Need...

  • a time series datastore for graphs.
  • a multi-tenanted solution.
  • a datastore that scales horizontally as the number of tenants and metrics grows.

How?

So what?

  • The Rackspace Cloud control panel now shows graphs.
  • We are able to ingest billions of data points per day.

Twitter reactions

The positive ones are boring.

  • "Interesting rrd-like system but at cloud scale. How does it compare to #opentsdb or #kairosdb ?"
  • "We did build something similar to this... ...but we push tens of billions of points a day through it, and counting."
  • "Automatic Rollups are the new MRTG/RRDTool... many efforts to produce data that might never be read."

Why Cassandra?

  • High write throughput (60,000 points/sec peak on a single box).
  • Reasonable read performance (depends on the queries).
  • Cassandra's data model supports a time series datastore easily.
  • Cassandra's native TTL support.
  • A Cassandra committer on the team; DevOps experience in operations.
  • Lessons learned from CloudKick.

OpenTSDB? Kairos DB? Cyanite? Graphite? InfluxDB?

OpenTSDB is the closest competitor, but it runs on HBase where Blueflood uses Cassandra.

Blueflood primary components

  • Ingest module - handles incoming writes.
  • Rollup module - computes aggregations/summarizations.
  • Query module - handles user queries.

Ingest module

  • HTTP POST with JSON body.
  • Production currently ingests via Scribe and Thrift.
  • Custom ingestion adapters can be written.

Metric structure

  • name - ord1-maas-prod-dcass0.bf.rollup_timer
  • value - 35.6789
  • ttl (in seconds) (optional) - 172800
  • unit (optional) - 'seconds'

Example: Publish numeric metrics
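A minimal sketch of building an ingest payload from the metric structure above (name, value, ttl, unit). The field names, endpoint path, and port shown here are illustrative assumptions, not confirmed details of Blueflood's API:

```python
import json
import time

def build_ingest_payload(name, value, ttl_seconds=172800, unit="seconds"):
    """Build a JSON-serializable ingest payload -- a sketch, assuming the
    HTTP ingest endpoint accepts a JSON array of metric objects."""
    return [{
        "metricName": name,            # e.g. ord1-maas-prod-dcass0.bf.rollup_timer
        "metricValue": value,
        "ttlInSeconds": ttl_seconds,   # optional TTL, honored natively by Cassandra
        "unit": unit,                  # optional
        "collectionTime": int(time.time() * 1000),  # epoch milliseconds
    }]

payload = build_ingest_payload("ord1-maas-prod-dcass0.bf.rollup_timer", 35.6789)
body = json.dumps(payload)
# This body would be sent as an HTTP POST, e.g. to
# http://<ingest-host>:<port>/v2.0/<tenantId>/ingest  (path illustrative).
```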

Rollup module

  • Fixed granularities - 5 min, 20 min, 60 min, 4 hr, 1 day.
  • Restrictive rollup types.
  • Basic rollups - mean, min, max, std. dev.
  • Experimental statsd support for counters, timers, gauges, and sets.
  • Experimental histogram support.
  • No rollups for strings and boolean data.
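The basic rollups above (mean, min, max, std. dev) amount to a simple aggregation over the raw points in a slot. This Python sketch is illustrative only, not Blueflood's actual Java implementation:

```python
import math

def basic_rollup(points):
    """Compute a basic rollup (count, mean, min, max, population std. dev)
    over a list of raw numeric data points for one time slot."""
    n = len(points)
    mean = sum(points) / n
    variance = sum((p - mean) ** 2 for p in points) / n
    return {
        "count": n,
        "mean": mean,
        "min": min(points),
        "max": max(points),
        "stddev": math.sqrt(variance),
    }

r = basic_rollup([1.0, 2.0, 3.0, 4.0])
```

Coarser granularities (20 min, 60 min, ...) can then be built by re-rolling the finer rollups rather than re-reading every raw point.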

Query module

  • HTTP APIs, JSON response.
  • Batched reads of metric data are possible.
  • A time series is identified by its metric name.
  • We support "Get by points" and "Get by resolution" calls.
  • No fancy queries yet.
  • Custom output adapters can be written.

Example: Retrieve numeric metrics
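As a sketch of the two query styles above ("get by points" vs. "get by resolution"), here is how such request URLs might be built. The host, port, path, and parameter names are assumptions for illustration, not Blueflood's documented API:

```python
from urllib.parse import urlencode

# Illustrative base URL; host, port, and path are assumptions.
BASE = "http://localhost:20000/v2.0/{tenant}/views/{metric}"

def query_by_points(tenant, metric, start_ms, end_ms, points):
    """'Get by points': ask the server to return roughly N data points,
    letting it pick an appropriate rollup granularity."""
    qs = urlencode({"from": start_ms, "to": end_ms, "points": points})
    return BASE.format(tenant=tenant, metric=metric) + "?" + qs

def query_by_resolution(tenant, metric, start_ms, end_ms, resolution):
    """'Get by resolution': request a specific granularity explicitly
    (resolution names here are illustrative)."""
    qs = urlencode({"from": start_ms, "to": end_ms, "resolution": resolution})
    return BASE.format(tenant=tenant, metric=metric) + "?" + qs

url = query_by_points("123456", "bf.rollup_timer", 0, 86400000, 200)
```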

Blueflood optional components

  • Elasticsearch indexer and discovery. (Experimental)
  • Cloud files exporter for rollups. (Experimental)
  • Apache Kafka exporter for rollups. (Experimental)

10,000 ft view of Blueflood architecture

  • Metrics are hashed into shards (128 total).
  • Each BF node owns a set of shards.
  • Each BF worker has a peer; ZooKeeper handles coordination.
  • Time is bucketed into slots (wrapping modulo 14 days).
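The shard/slot scheme above can be sketched as follows. The hash function and granularity used here are illustrative stand-ins, not Blueflood's actual implementation:

```python
import zlib

NUM_SHARDS = 128
SLOT_WINDOW_DAYS = 14

def shard_of(metric_name, num_shards=NUM_SHARDS):
    """Map a metric name to one of 128 shards. Any stable hash that spreads
    metrics evenly captures the idea; CRC32 is just a stand-in."""
    return zlib.crc32(metric_name.encode()) % num_shards

def slot_of(timestamp_ms, granularity_ms=5 * 60 * 1000):
    """Map a timestamp to a time slot. Slot numbers wrap every 14 days,
    so each slot is reused (and re-rolled) on a 14-day cycle."""
    slots_per_window = (SLOT_WINDOW_DAYS * 24 * 60 * 60 * 1000) // granularity_ms
    return (timestamp_ms // granularity_ms) % slots_per_window
```

A node that owns a shard is responsible for rolling up every slot of every metric hashed to that shard; its ZooKeeper-coordinated peer can take over if it fails.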

Cassandra cluster

  • 32 nodes across two data centers.
  • Replication factor of 3.
  • All read and write operations happen at ConsistencyLevel.ONE.
  • Astyanax client library with thrift.

Blueflood deployment

  • Blueflood node can run in any permutation of ingest, rollup and query modes.
  • Blueflood nodes run on the same boxes as the dcass Cassandra nodes, though this is not required.
  • Blueflood Chef recipes will eventually be open sourced.

Operations

  • Blueflood is heavily instrumented. All metrics are now reported to Graphite.
  • Rackspace monitoring agent plugins to capture KPIs.
  • Command line tools to dump metrics, roll up data, etc.

Cool story, give me data?

  • We ingest 1 million individual data points a minute, peaking at 3 million/min.
  • We roll more than 1 million individual metrics.
  • We have hit a peak of 3 million Cassandra operations a minute.
  • Read queries are more like 500 a minute.

Team logistics

  • Small team with one remote developer.
  • Primary communication happens on IRC.
  • Light on process, but strong on ownership.
  • Every merge to master must be deployed.
  • Instrumentation is paramount. Operational focus is vital.
  • Product and project decisions are made from the ground up.

Upcoming features

  • Graphite integration.
  • Tags based metrics retrieval.
  • Richer queries.
  • Aggregation functions.

Technical lessons learnt

  • Most major operational issues so far are due to Cassandra.
  • Split metrics into different column families for isolation.
  • Leveled compaction is bad; use size-tiered compaction for time series data.
  • Live upgrade of Cassandra cluster is not easy.
  • Cassandra rpc type 'sync' works better for us than 'hsha'.
  • Migrations are hard. Think through data model carefully.
  • Upgrade Cassandra on every opportunity.
  • Distributed systems are still hard in 2014. Changes are not easy to make.

Meta lessons learnt

  • Y U JAVA? Java is not everyone's cup of tea.
  • Blueflood requires better packaging. Docker!!!
  • 40,000 lines of code is not fun. Open source early.
  • Documentation! We have documentation days now.
  • People ask good questions on email/IRC. Capture them.

How can I participate?

  • Open sourced in summer of 2013. Apache 2 license.
  • Most discussions happen on IRC. #blueflood on freenode.
  • The blueflood-discuss Google Group hosts technical discussions.

Questions?