Disclaimer: This post is intentionally aimed at the engineering crowd. It’s a little geeky—and even esoteric among geeks—but it also showcases the deep thinking that goes into creating the best experience possible for our customers. Read on for insights into the backend of our product, as we geek out for a bit.


Simple Switch, Big Fix

At Sprout Social, we recently changed a server-side TCP configuration parameter on our Internet-facing endpoints. I thought it'd be interesting to share the decrease in download times we've seen, given how little effort the change took: some research, analysis and testing, followed by adjusting a single setting.

First, some context. We changed the TCP initial congestion window parameter from the default of four segments to 10. This lets more of our data reach a user in the first round trips of a connection, which matters most on a first visit to our site. Later requests on an established connection, say for JavaScript assets, don't benefit, since the setting only affects the initial TCP "ramp." I won't go into the nuts and bolts of why, as the resources linked below explain the details well. I'm just here to say the change was cheap and the impact has been nice.

For the initial DOM download of our public-facing sites, over high-speed Internet connections, we've seen average latency drop by more than 100 ms in Europe and by about 40 ms in North America; we don't have useful metrics from elsewhere on the globe. The expected improvement scales with distance from our US origin servers and with the baseline latency of the connection, so the gains are likely greater on other continents and on mobile networks.
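To see why a single setting moves the needle, it helps to count round trips. Below is a back-of-the-envelope model, not our production tooling, that assumes idealized slow start: a 1460-byte maximum segment size, no packet loss, no receive-window limits, and a congestion window that doubles every round trip.

```python
# Back-of-the-envelope model of TCP slow start (a simplification):
# counts the round trips needed to deliver a payload for a given
# initial congestion window. Assumes 1460-byte MSS, no loss, and
# cwnd doubling each round trip.

MSS = 1460  # bytes per segment, typical on Ethernet paths

def round_trips(payload_bytes: int, initcwnd: int) -> int:
    """Round trips to send payload_bytes under idealized slow start."""
    segments_left = -(-payload_bytes // MSS)  # ceiling division
    cwnd = initcwnd
    trips = 0
    while segments_left > 0:
        segments_left -= cwnd
        cwnd *= 2  # slow start doubles the window each round trip
        trips += 1
    return trips

for size_kb in (10, 30, 60):
    payload = size_kb * 1024
    print(f"{size_kb:>3} KB: initcwnd=4 -> {round_trips(payload, 4)} RTTs, "
          f"initcwnd=10 -> {round_trips(payload, 10)} RTTs")
```

For a roughly 60 KB response, the model saves a full round trip with an initial window of 10 versus four; at transatlantic round-trip times in the neighborhood of 100 ms, that squares with the European numbers above.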

For a visual, here's the response time of this Insights blog, measured from locations in Europe, over the past month. The dip around April 6 is the result of this one parameter change.

[Chart: Insights blog response time from Europe, showing the drop around April 6]

For the record, we're pretty late to the party. This change has been advocated for years: Google Research argued for it back in 2010, the Linux kernel's default changed to 10 in 2011, an IETF RFC (RFC 6928) was published in April 2013, and shortly thereafter an O'Reilly book discussed it.

Why are we late when Linux has had this default for years? Because TCP can terminate on network devices before ever reaching the web servers behind them, even when those servers run recent Linux kernels (ours do). In our case, we've been conservative about upgrading the boundary devices where our Internet ingress TCP terminates. Moreover, we serve a fair number of assets from AWS CloudFront, which has had this setting in place for some time. So we're only partly late!

That's it. If your TCP initial congestion window isn't already 10, change it to see this benefit.
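For concreteness, here's a hedged sketch of how the change is typically applied on a Linux host; the initial congestion window is a per-route attribute that iproute2 sets via the `initcwnd` keyword. This is illustrative only, not our rollout tooling: it assumes a single default route, root privileges, and that re-applying the printed route spec is safe in your environment. Test before touching production.

```python
# Illustrative sketch (not our rollout tooling): raise the initial
# congestion window to 10 on the default route using iproute2.
# Assumes one default route and root privileges.
import subprocess

# Read the current default route, e.g. "default via 192.0.2.1 dev eth0".
route = subprocess.run(
    ["ip", "route", "show", "default"],
    capture_output=True, text=True, check=True,
).stdout.strip().splitlines()[0]

# Re-apply the same route spec with the initcwnd attribute appended.
subprocess.run(
    ["ip", "route", "change", *route.split(), "initcwnd", "10"],
    check=True,
)
```

Afterward, `ip route show default` should list `initcwnd 10`. Note the attribute doesn't persist across reboots unless your network configuration re-applies it, and if TCP terminates on a load balancer or other boundary device in front of your servers, the setting has to change there instead.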