Measuring Code

You can’t know what your code is doing unless you measure it. And, if you don’t know what your code is doing, you won’t know if you’re letting your users down. That’s why you need to instrument.

The value of gathering metrics has been beaten to death, and a deep dive into the importance of instrumenting your code is outside the scope of this post. If you’re interested, we highly recommend you check out Coda Hale’s talk on Metrics. Hale’s YouTube video tackles the how and why of instrumenting production code and is a good introduction to the library upon which Sprout Social’s instrumentation tooling relies heavily.

For those who haven’t seen it, this quote by Hale serves as a good summary: “If it affects business value, somebody should get woken up.”

Still, instrumenting methods is hard. As developers, we need foresight about what we want to measure before the instrumentation pays off. We need to predict which kinds of deficiencies might affect our applications before they break our applications. We need to capture those deficiencies and surface metrics in a consistent manner. Ultimately, we need to avoid inventing a hundred ways to capture data and events.

There’s no cure-all solution, and only developers understand the specifics of how our code creates value for our users. Only we can really know which parts of our app are critical.

That said, Sprout has found some things we consistently want to measure for almost every key method in our Java codebase. The things we care about most are error rates, performance, frequency, and bandwidth. At Sprout, we use Metrics, along with a small set of tools we wrote on top of it, to measure these things.

That library is called Instrumentor. It minimizes the amount of time developers have to spend writing instrumentation code while maximizing the value developers and operations get from real-time metrics and sophisticated application-level health checks.

What You Can Report Using Instrumentor

For the sake of example, let’s assume we’re instrumenting the method below:

package com.mycompany;

class Example {
    public void sayHello() {
        System.out.println("Hello, World");
    }
}

If we were to instrument the method com.mycompany.Example#sayHello, an Instrumentor would report a number of metrics related to timing, call rate, error rates, and a count of calls in flight.

In Flight Count

We’ll get a count of the current method calls in flight.

  • com.mycompany.Example.sayHello.inFlight.count — a count of method calls currently being executed

Call Rate

We’ll get several stats for how often the sayHello method is being called.

  • com.mycompany.Example.sayHello.count — a lifetime count of calls to the method
  • com.mycompany.Example.sayHello.mean_rate — mean rate of calls
  • com.mycompany.Example.sayHello.m1_rate — rate of calls over the last 1 minute
  • com.mycompany.Example.sayHello.m5_rate — rate of calls over the last 5 minutes
  • com.mycompany.Example.sayHello.m15_rate — rate of calls over the last 15 minutes
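
These figures are the standard readings off a Dropwizard Metrics Meter, so if you’d rather read them in code than through a reporter, you can go straight to the registry. A minimal sketch, assuming you already have the underlying MetricRegistry in hand (see Getting Started below) and that the call meter is registered under the base name listed above:

import com.codahale.metrics.Meter;
import com.codahale.metrics.MetricRegistry;

// Assumption for illustration: the call meter lives under the base name shown above.
MetricRegistry registry = instrumentor.getMetricRegistry();
Meter calls = registry.meter("com.mycompany.Example.sayHello");

System.out.println("lifetime count: " + calls.getCount());
System.out.println("mean rate:      " + calls.getMeanRate() + " calls/sec");
System.out.println("1-minute rate:  " + calls.getOneMinuteRate() + " calls/sec");
System.out.println("15-minute rate: " + calls.getFifteenMinuteRate() + " calls/sec");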

Timing Statistics

We’ll get a Histogram of the timing of sayHello. This is a statistical representation of the distribution, i.e., “how long did it take to call this method?”

  • com.mycompany.Example.sayHello.max — maximum time it took to call the method
  • com.mycompany.Example.sayHello.mean — mean time to call the method
  • com.mycompany.Example.sayHello.min — minimum time it took to call the method
  • com.mycompany.Example.sayHello.p50 — time it took for 50% of calls to complete
  • com.mycompany.Example.sayHello.p75 — time it took for 75% of calls to complete
  • com.mycompany.Example.sayHello.p95 — time it took for 95% of calls to complete
  • com.mycompany.Example.sayHello.p99 — time it took for 99% of calls to complete
  • com.mycompany.Example.sayHello.p999 — time it took for 99.9% of calls to complete
  • com.mycompany.Example.sayHello.stddev — standard deviation of method call times
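
The percentile values come from a Dropwizard Snapshot, so they’re easy to read in code as well. Which metric type Instrumentor registers internally for the timing data (a Timer or a plain Histogram) is an implementation detail, so the sketch below simply scans the registry for anything under the base name that supports sampling rather than assuming a specific type:

import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Sampling;
import com.codahale.metrics.Snapshot;

// Print a couple of percentiles for whatever sampling metric backs sayHello's timings.
MetricRegistry registry = instrumentor.getMetricRegistry();
registry.getMetrics().forEach((name, metric) -> {
    if (name.startsWith("com.mycompany.Example.sayHello") && metric instanceof Sampling) {
        Snapshot snapshot = ((Sampling) metric).getSnapshot();
        System.out.printf("%s p95=%.2f p99=%.2f%n",
                name, snapshot.get95thPercentile(), snapshot.get99thPercentile());
    }
});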

Error Rate

We’ll get stats to track the rate at which your method throws exceptions.

  • com.mycompany.Example.sayHello.errors.count — a lifetime count of errors
  • com.mycompany.Example.sayHello.errors.mean_rate — mean rate of errors
  • com.mycompany.Example.sayHello.errors.m1_rate — rate of errors over the last 1 minute
  • com.mycompany.Example.sayHello.errors.m5_rate — rate of errors over the last 5 minutes
  • com.mycompany.Example.sayHello.errors.m15_rate — rate of errors over the last 15 minutes

Percent Error Rate

We’ll get stats to track the percentage of method calls that throw exceptions.

  • com.mycompany.Example.sayHello.errors.total_pct — a lifetime percent of errors
  • com.mycompany.Example.sayHello.errors.mean_pct — mean percent of errors
  • com.mycompany.Example.sayHello.errors.m1_pct — percent of errors over the last 1 minute
  • com.mycompany.Example.sayHello.errors.m5_pct — percent of errors over the last 5 minutes
  • com.mycompany.Example.sayHello.errors.m15_pct — percent of errors over the last 15 minutes
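
These percentages aren’t part of stock Dropwizard Metrics; conceptually, each one is just the error rate divided by the call rate over the same window. A back-of-the-envelope version of that calculation with made-up numbers (the gauge implementation inside Instrumentor may differ):

// If sayHello is handling 120 calls/sec and throwing 6 errors/sec over the last minute,
// the one-minute error percentage works out to 0.05, i.e. 5%.
double m1Rate = 120.0;      // calls per second over the last minute
double m1ErrorRate = 6.0;   // errors per second over the last minute
double m1Pct = (m1Rate == 0) ? 0.0 : m1ErrorRate / m1Rate;  // 0.05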

Health Check

An Instrumentor will also (optionally) create a Health Check that monitors the percentage of errors and notifies you when the error rate exceeds a threshold you specify.

Assuming the specified threshold is 0.1 (i.e., ten percent) and there were five errors in the last 100 calls, the health check would report as “Healthy.”

If there were, however, 20 errors in the last 100 calls, the Health Check would report as “Unhealthy,” with a message of “value=0.2&threshold=0.1”.

Health Checks give us a simple means of setting off alarms when something goes haywire. It’s easy to make them drive tools like PagerDuty, Nagios, or Zabbix. With minimal effort and a bit of configuration, the appropriate developers can get alerts on their preferred devices when checks report problems.
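
As a sketch of what that glue can look like, the loop below runs every registered health check and logs the unhealthy ones; an alerting agent can tail that log, or you can swap the println for a call to whatever paging API you use. It assumes you have a HealthCheckRegistry in hand (the Getting Started section below shows how Instrumentor exposes one):

import com.codahale.metrics.health.HealthCheck;
import com.codahale.metrics.health.HealthCheckRegistry;
import java.util.Map;

// Run all registered health checks and report the failures.
HealthCheckRegistry healthChecks = instrumentor.getHealthCheckRegistry();
for (Map.Entry<String, HealthCheck.Result> check : healthChecks.runHealthChecks().entrySet()) {
    if (!check.getValue().isHealthy()) {
        System.err.println("UNHEALTHY: " + check.getKey() + " -> " + check.getValue().getMessage());
    }
}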

Getting Started

You can include the latest version of Instrumentor using Maven:

<dependency>
    <groupId>com.sproutsocial</groupId>
    <artifactId>instrumentor-core</artifactId>
    <version>0.8.0</version>
</dependency>

Then, to instrument the sayHello method, create an Instrumentor:

Instrumentor instrumentor = new Instrumentor();
Example example = new Example();
instrumentor.run(example::sayHello, "sayHello");

You can inspect the results by looking at the underlying MetricRegistry:

MetricRegistry registry = instrumentor.getMetricRegistry();

// check total calls
long callCount = registry.meter("sayHello").getCount();
System.out.println("sayHello called " + callCount + " times");

// check error count
long exceptionCount = registry.meter("sayHello.errors").getCount();
System.out.println("sayHello threw " + exceptionCount + " exceptions.");

With an error threshold, you'll be able to check if a method is healthy.

double errorThreshold = 0.1;

instrumentor.run(example::sayHello, "sayHello", errorThreshold);

HealthCheckRegistry healthChecks = instrumentor.getHealthCheckRegistry();
HealthCheck.Result result = healthChecks.runHealthCheck("sayHello");
System.out.println("sayHello is " + (result.isHealthy() ? "healthy" : "unhealthy"));

By default, new Instrumentor() will create new instances of MetricRegistry and HealthCheckRegistry. If you’d like, you can specify your own already-instantiated instances of one or both via the Builder:

Instrumentor instrumentor = Instrumentor.builder()
    .metricRegistry(myMetricRegistry)
    .healthCheckRegistry(myHealthCheckRegistry)
    .build();
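
Sharing a registry this way is most useful when a reporter is attached to it, so the numbers actually leave the process. A minimal sketch using Dropwizard’s ConsoleReporter (any other reporter, such as Graphite, follows the same pattern):

import com.codahale.metrics.ConsoleReporter;
import java.util.concurrent.TimeUnit;

// Dump every metric in the shared registry to stdout once a minute.
ConsoleReporter reporter = ConsoleReporter.forRegistry(myMetricRegistry)
    .convertRatesTo(TimeUnit.SECONDS)
    .convertDurationsTo(TimeUnit.MILLISECONDS)
    .build();
reporter.start(1, TimeUnit.MINUTES);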

Instrumenting with Guice AOP

If you’re using Guice and want to save some code, you can pull in the instrumentor-aop library and instrument your code with annotations.

You can include it using Maven:

<dependency>
    <groupId>com.sproutsocial</groupId>
    <artifactId>instrumentor-aop</artifactId>
    <version>0.8.0</version>
</dependency>

Then, to instrument the sayHello method, just add an @Instrumented annotation to the method:

class Example {
    @Instrumented
    public void sayHello() {
        System.out.println("Hello, World");
    }
}

And add the InstrumentedAnnotations module to your injector:

Injector injector = Guice.createInjector(new InstrumentedAnnotations());
Example example = injector.getInstance(Example.class);
example.sayHello();

You can inspect the results by looking at the underlying MetricRegistry, which will be bound by InstrumentedAnnotations:

MetricRegistry registry = injector.getInstance(MetricRegistry.class);

long callCount = registry.meter("com.mycompany.Example.sayHello").count();
System.out.println("sayHello called " + callCount + " times");

Note that in this case the name of the method was automatically inferred. You can override it using the name parameter:

class Example {
    @Instrumented(name = "mySweetName")
    public void sayHello() {
        System.out.println("Hello, World");
    }
}

Then, instead of “com.mycompany.Example.sayHello”, the base name for all metrics will just be “mySweetName”.

Injector injector = Guice.createInjector(new InstrumentedAnnotations());
Example example = injector.getInstance(Example.class);
example.sayHello();

MetricRegistry registry = injector.getInstance(MetricRegistry.class);

long callCount = registry.meter("mySweetName").count();
System.out.println("sayHello called " + callCount + " times");

With an error threshold, you’ll be able to check if a method is healthy.

class Example {
    @Instrumented(errorThreshold = 0.1d)
    public void sayHello() {
        System.out.println("Hello, World");
    }
}

Injector injector = Guice.createInjector(new InstrumentedAnnotations());
Example example = injector.getInstance(Example.class);
example.sayHello();

HealthCheckRegistry healthChecks = injector.getInstance(HealthCheckRegistry.class);
HealthCheck.Result result = healthChecks.runHealthCheck("com.mycompany.Example.sayHello");
System.out.println("sayHello is " + (result.isHealthy() ? "healthy" : "unhealthy"));

By default, new InstrumentedAnnotations() will create new instances of MetricRegistry and HealthCheckRegistry. If you’d like, you can specify your own already-instantiated instances of one or both via the Builder:

Module aopModule = InstrumentedAnnotations.builder()
    .metricRegistry(myMetricRegistry)
    .healthCheckRegistry(myHealthCheckRegistry)
    .build();

Injector injector = Guice.createInjector(aopModule);
Example example = injector.getInstance(Example.class);

What Does Instrumentor Provide?

Instrumentor encourages you to gather the metrics you need and helps you write cleaner, safer code. Developers now have no excuse not to measure their application’s behavior.

A standard, singular interface for collecting metrics means less code, less repetition, and more consistency. Furthermore, Instrumentor implements several metrics that would otherwise require a fair amount of boilerplate. Moving that boilerplate into a library makes writing application code that much easier (nothing is more irritating than finding a bug in one-off metrics-gathering or logging code).

Finally, Instrumentor makes it easy to enforce a consistent metrics naming scheme, which makes organizing and aggregating data easier. Whether it’s an increase in network latency, a spike in database connection errors, an increase in pending items in a queue, or a sudden drop in usage of an API endpoint, there are always parts of your app that need monitoring. With Instrumentor, protecting your users just got a little bit easier.

Credit and Contributing

Instrumentor was originally written by Dexter Horthy. It’s maintained by David Huber. Instrumentor is MIT licensed, and pull requests are welcome. If you find a problem or want to work on a feature, feel free to file an issue.