Tumblr Engineering (Posts tagged databases)

Juggling Databases Between Datacenters

    Recently we went through an exercise where we moved all of our database masters between data centers. We planned to do this online with minimal user impact. An operation like this involves a variety of considerations, such as cache consistency and other pieces of shared state in stores like HBase, but this post focuses primarily on MySQL.

    During this move we had a number of constraints. As mentioned above, the move had to happen online, while serving production traffic, with minimal user impact. In aggregate we service hundreds of thousands of database queries per second. We also needed to encrypt all data transferred between data centers. MySQL replication supports encryption, but connections to the servers themselves present several challenges. Specifically, the handshake to establish a connection across a WAN can hurt latency if there is significant connection churn. Servicing read queries across a backhaul link also adds latency, which is never desirable.

    We decided to tackle these issues in several ways. We leveraged a number of existing features of our applications and infrastructure, and developed new automation to fill gaps in functionality. Our configuration and our applications in various runtimes were able to support a read/write split (which may seem obvious to some, but isn’t always easy to accomplish in every scenario). We used the read/write split, along with encrypted replication, to provide a local read replica. Some runtimes can set up a persistent encrypted connection to a remote master, which serviced read requests in those cases, as the per-connection latency was amortized over a large number of queries. For runtimes with a high connection churn rate, such as PHP, we used a MySQL proxy, ProxySQL, which provided persistent, encrypted connections and met our performance requirements. We built automation to deploy proxies for numerous database pools, servicing thousands of requests per second, per pool.
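
As a rough illustration of the read/write split, here is a minimal Python sketch of how a client might choose between a local read replica and a local proxy that fronts the master. This is not Tumblr's code: the `Endpoint` type, the host names, the port numbers, and the statement-prefix heuristic are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    host: str
    port: int


# Hypothetical endpoints, invented for this sketch.
LOCAL_REPLICA = Endpoint("replica.local.example", 3306)
# Local proxy assumed to hold persistent, encrypted connections to the master.
LOCAL_PROXY = Endpoint("127.0.0.1", 6033)

# Simplistic assumption: only these statement types count as reads.
READ_KEYWORDS = ("select", "show")


def is_read(sql: str) -> bool:
    """Classify a statement by its first keyword."""
    words = sql.split()
    if not words:
        raise ValueError("empty statement")
    return words[0].lower() in READ_KEYWORDS


def route(sql: str, in_transaction: bool = False) -> Endpoint:
    """Send reads to the local replica and everything else to the proxy.

    Statements inside a transaction always go to the master side so that
    they see their own writes.
    """
    if is_read(sql) and not in_transaction:
        return LOCAL_REPLICA
    return LOCAL_PROXY
```

A real router would need to handle more cases (for example `SELECT ... FOR UPDATE` or replication lag), which this sketch deliberately ignores.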

    When performing the cutover, our workflow was as follows. In each data center, there was a config which pointed to a local read replica, a remote master, and a local proxy with the master (remote or local) as a backend. When moving masters between data centers, our database automation, Jetpants (new release coming soon!), reparented all replicas, and our automation updated the proxy backend to point to the new master. This resulted in seconds of read-only state per database pool and minimal user impact.
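
The cutover workflow above can be sketched as a small in-memory simulation. This is only a model of the sequence, not the Jetpants or ProxySQL API: the `Pool` class, the host names, and the exact ordering of steps (including when the read-only window opens and closes) are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Pool:
    master: str
    replicas: Dict[str, str]  # replica host -> host it replicates from
    proxy_backend: str        # host the local proxy forwards writes to
    read_only: bool = False
    log: List[str] = field(default_factory=list)


def cutover(pool: Pool, new_master: str) -> None:
    """Promote an existing replica to master (hypothetical sequence)."""
    if new_master not in pool.replicas:
        raise ValueError(f"{new_master} is not a replica of this pool")
    old_master = pool.master

    # 1. Stop accepting writes so the replicas can catch up.
    pool.read_only = True
    pool.log.append("read-only on")

    # 2. Reparent: every other replica, plus the old master, now
    #    replicates from the new master.
    del pool.replicas[new_master]
    for host in pool.replicas:
        pool.replicas[host] = new_master
    pool.replicas[old_master] = new_master
    pool.master = new_master
    pool.log.append(f"reparented replicas to {new_master}")

    # 3. Point the proxy backend at the new master.
    pool.proxy_backend = new_master
    pool.log.append(f"proxy backend -> {new_master}")

    # 4. Resume writes.
    pool.read_only = False
    pool.log.append("read-only off")


pool = Pool(
    master="db1.dc-a.example",
    replicas={"db2.dc-b.example": "db1.dc-a.example",
              "db3.dc-a.example": "db1.dc-a.example"},
    proxy_backend="db1.dc-a.example",
)
cutover(pool, "db2.dc-b.example")
```

Because applications only ever talk to the proxy for writes, the application-side config does not change during the move; only the proxy backend does.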

More coming soon!

databases mysql proxysql jetpants datacenters

Tumblr Engineering @ Percona Live MySQL Conference

We’re pleased to announce that Tumblr’s Database Engineering team will be attending the Percona Live MySQL Conference next week in Santa Clara, CA!

We’ll be giving a talk on our open source automation software, Jetpants, which has helped us scale to over 175 billion distinct rows of relational data to date. We’re also looking forward to attending a number of amazing sessions from our friends at Percona, Facebook, Oracle, Palomino, Etsy, and more.

If you haven’t registered yet, use code SpeakMySQL to save 15%. Hope to see you there!

MySQL databases

Slides from our talk at Percona Live NYC are now available. The presentation covers the design and implementation of Jetpants, Tumblr’s open source toolkit for interacting with hundreds of MySQL database servers.

Interested in working with relational databases at scale? Tumblr is currently hiring database engineers in NYC!

MySQL jetpants databases

Jetpants: a toolkit for huge MySQL topologies

Tumblr is one of the largest users of MySQL on the web. At present, our data set consists of over 60 billion relational rows, adding up to 21 terabytes of unique relational data. Managing over 200 dedicated database servers can be a bit of a handful, so naturally we engineered some creative solutions to help automate our common processes.

Today, we’re happy to announce the open source release of Jetpants, Tumblr’s in-house toolchain for managing huge MySQL database topologies. Jetpants offers a command suite for easily cloning replicas, rebalancing shards, and performing master promotions. It’s also a full Ruby library for use in developing custom billion-row migration scripts, automating database manipulations, and copying huge files quickly to multiple remote destinations.

Dynamically resizable range-based sharding allows you to scale MySQL horizontally in a robust manner, without any need for a central lookup service or massive pre-allocation of tiny shards. Jetpants supports this range-based model by providing a fast way to split shards that are approaching capacity or I/O limitations. On our hardware, we can split a 750GB, billion-row pool in half in under six hours.
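
To make the range-based model concrete, here is a minimal Python sketch of a shard map that resolves an ID to a shard by range and splits a shard in two. It is an illustration only and does not reflect how Jetpants is implemented (Jetpants is a Ruby toolkit). The `Shard` and `ShardMap` names, the pool names, and the choice to split at the numeric midpoint of the ID range are all assumptions.

```python
import bisect
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Shard:
    min_id: int
    max_id: int  # inclusive upper bound
    pool: str    # name of the database pool holding this range


class ShardMap:
    def __init__(self, shards: List[Shard]) -> None:
        self.shards = sorted(shards, key=lambda s: s.min_id)

    def find(self, key: int) -> Shard:
        """Locate the shard whose range contains key; no lookup service."""
        starts = [s.min_id for s in self.shards]
        i = bisect.bisect_right(starts, key) - 1
        if i < 0 or key > self.shards[i].max_id:
            raise KeyError(key)
        return self.shards[i]

    def split(self, shard: Shard, new_pools: Tuple[str, str]) -> Tuple[Shard, Shard]:
        """Replace one shard with two covering the same range."""
        if shard.min_id >= shard.max_id:
            raise ValueError("range too small to split")
        mid = (shard.min_id + shard.max_id) // 2
        lower = Shard(shard.min_id, mid, new_pools[0])
        upper = Shard(mid + 1, shard.max_id, new_pools[1])
        i = self.shards.index(shard)
        self.shards[i:i + 1] = [lower, upper]
        return lower, upper


shard_map = ShardMap([Shard(1, 1000, "pool-a"), Shard(1001, 2000, "pool-b")])
lower, upper = shard_map.split(shard_map.find(1500), ("pool-c", "pool-d"))
```

Only the shard being split changes; every other range stays where it is, which is what allows shards to be resized on demand rather than pre-allocated.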

Jetpants can be obtained via GitHub or RubyGems.

Interested in this type of work? We’re hiring!

MySQL databases sharding