How you can shave a few more precious milliseconds off end user latency by parallelizing your Go server handlers.

Have you ever written a Go server handler that could be parallelized, but you left it as a linear sequence of operations? Parallelizing your Go server handlers can provide a huge improvement in end user latency, but it’s easy to pass up the opportunity because it seems like managing goroutines, correctly propagating errors and values, and cleaning up afterwards isn't worth the extra code complexity. If this sounds familiar, let me introduce you to your new friend, errgroup.WithContext() found in golang.org/x/sync/errgroup.

Serial Processing

Suppose we have a web app that produces a list of friends for the logged-in user. In our system, each user has a list of friend ids, each referring to another user, and we can get a friend id iterator from the Friend service. To get the actual details (like name, email, etc.), we look up each friend’s id with the Profile service. (Once upon a time, this might have been done in SQL with a table join, but in the era of microservices, let’s assume the Friend and Profile services are two different things.) Here’s a simple implementation of the server logic.

We iterate over a returned list of friend ids (collecting them in memory), look up each friend’s profile by id, and put the result in a response object. It’s simple and straightforward (and probably fine for many use cases). But the problem is that it’s completely serialized—we don’t start looking up profiles until we’ve collected all the friend ids, and we look up each profile one after another. The more friends you have, the longer this takes!

Parallel Processing

We can speed this up immensely by parallelizing all the remote calls. Our handler code is not doing much actual, real CPU work—it's just waiting on network responses from other services. So let’s rewrite this with goroutines and channels:

This executes much faster! Using a channel, we pass each friend id along to multiple mapper routines as soon as we get it. Each mapper looks up a single profile at a time, so we can look up as many profiles at a time as we have mappers. The mappers submit each friend profile to the final reducer. Everything happens with maximum concurrency. (Note that in this trivial example, we could have just used a sync.Mutex to update the response object directly from the mappers; but for the purposes of illustration let’s have a separate reducer routine.)

Using errgroup

As you may have surmised from the code comments, we have a problem—no error propagation. Any of these individual network calls might fail, time out, or otherwise produce an error. In the real world, returning partial results is often better than failing completely, but for our example, let’s assume that if any part of the process fails, we’d like to immediately exit the entire operation and return the error for whatever piece failed.

This is where errgroup.WithContext() becomes our best friend. What is an error group? As the doc states:

A Group is a collection of goroutines working on subtasks that are part of the same overall task.

And:

WithContext returns a new Group and an associated Context derived from ctx. The derived Context is canceled the first time a function passed to Go returns a non-nil error or the first time Wait returns, whichever occurs first.

How does it work?

  1. Create a new Group and associated context:
    g, ctx := errgroup.WithContext(ctx)
  2. Start all the group worker functions:
    g.Go(func() error {...})
  3. Wait on the result:
    err := g.Wait()

An errgroup.Group unifies error propagation and context cancelation. It performs several critical functions for us:

  1. The calling routine Wait()s for subtasks to complete; if any subtask returns an error, Wait() returns that error back to the caller.
  2. If any subtask returns an error, the group Context is canceled, which early-terminates all subtasks.
  3. If the parent Context (say, an http request context) is canceled, the group Context is also canceled. This helps avoid unnecessary work. For example, if the user navigates away from the page and cancels the http request, we can stop immediately.

So here’s what this all looks like: a collection of subtasks whose lifetimes and fates are bound together, producing a single success/failure result.

This is pretty much the same amount of code as naked goroutines, but now we get error propagation. We didn’t have to add a bunch of extra code, more channels, or additional synchronization ourselves.

Using select

There’s one final piece of the puzzle: how do we know that all of our goroutines will exit in a timely manner? Failing to exit goroutines is a serious problem. Left unchecked, leaked goroutines pin memory and process resources, eventually slowing down and then crashing the process.

We can rely on well-written libraries that accept a Context to respect cancelation. A call like GetUserProfile(ctx, id) should halt if the supplied context is canceled, because the underlying remote calls (such as those in Go’s http client library—when correctly using request.WithContext) should check for context cancelation and exit quickly with a context.Canceled or context.DeadlineExceeded error. But there’s one particular interaction that we have to pay close attention to and manage ourselves: channel reads and writes.

In Go, channel operations are potentially blocking. Writing into a full channel blocks forever until some reader makes space, and reading from an empty channel blocks forever until the writer adds a new value or closes the channel. Channels are a big potential source of deadlock.

How does this impact our code? In the example above, we take care to ensure that writers always close the channels they control. We even set them up as defer functions (generally a best practice!) so that even if a writer function panics, it will still close the channel as it exits. As long as the producer loop eventually exits, it will close the friendIds channel; and as long as all the mappers eventually exit, they will close the friends channel. So if we can ensure the writers all exit in a timely manner, we can be confident that the reads won’t deadlock. This is nice, because it allows us to use Go’s for-loop-over-channel construct.

But what about the writes? Here we have a problem. Readers can detect “end of stream” when a writer closes the channel, but there’s no such signaling mechanism in the other direction. Writers have no way to know “no one is listening anymore.” In our code, if the reducer loop exits with an error and stops pulling data from the friends channel, the workers can deadlock trying to write into a full channel. If the mappers error exit and stop pulling data from the friendIds channel, the producer loop can deadlock. How can we avoid this pitfall?

The answer lies in Go’s select statement. Instead of bare channel writes, we’ll do this instead:

In Go, a select statement blocks until any one of the possible channel operations can be performed. So this statement blocks until either we can write id into the friendIds channel or the ctx.Done() channel becomes readable. (The latter happens when the context is canceled: the ctx.Done() channel closes, making it readable.)

If any subtask returns an error, the shared group context is canceled, ctx.Done() becomes readable, and any writers who were blocked trying to write into full channels exit immediately. Similarly, if the parent context is canceled (the request terminates early), the same thing will happen. And since writers close our data channels on exit, all the reads will unblock as well. The final (hopefully correct) version of our handler looks like this:

Final Thoughts

So there you have it: errgroup.WithContext() provides a clean way to run subtasks in parallel, cleanly manage their life cycle, and unify error propagation. And there’s one more great thing I haven’t mentioned: error Groups compose (nest) perfectly well. Any of your subtasks can have its own error group to further divide your task processing. That means you don’t have to break all your layers of abstraction—you can just use a new group exactly where you need it for a distinct operation, and the various layers of your code will interoperate seamlessly as long as everyone respects Context and returns error values correctly.

If you’d like to play around with the various versions of the code yourself, here’s the full runnable version on the Go Playground and as a gist. Try adding random errors, or reducing the context timeout in main(), and see what happens.

Here’s to shaving a few more milliseconds off end user latency!


References and Further Reading

  1. More on Go channels and the select statement
  2. More on Context
