I’ve been watching the 2017 Gophercon videos from here. There are many good talks that I would recommend watching from that list.

One that I really wanted to try out on my own was the profiling presentation that Peter Bourgon did. I assume for the sake of simplicity, he left some details out. I’ve been trying to figure them out on my own.

I was inspired to try to profile my own Golang web apps using the method he presented in his talk. So in this post I’ll show you a simplified example of how to profile your own web applications in more detail.

The web app

To see a practical example of profiling and to keep it simple, we need to create a web app that when called on a particular route, performs some sort of calculation and returns a result in it’s payload.

If we were to mimic real life scenarios, we would be here all day. So just to keep it simple, we will create a single route that calculates the 25th Fibonacci number when it’s called and returns it in the response body.

I’ve written two versions of the Fibonacci function before hand, one that performs recursion (exponential time – very CPU intensive) and one that calculates the number using a vector (linear time – not very CPU intensive).

In order to use the profiling tool, you need to import the net/http/pprof  package and register some routes. In the presentation I mentioned earlier, the speaker mentioned that we could leave this imported even in production environments since it does not affect performance.

As you can see, this is a really simple application, save it to a file called  main.go and build it like this:  go build -o myserver main.go .

Now you can run your binary:  ./myserver and to check if it’s working we’ll send a request to it:

The CPU profile

Now, while the server is running, you will need to run two commands in parallel. You first need to start the profiling tool which will record data for 30 seconds after it is run AND as it is running, run the Apache Benchmark tool to send a few requests it’s way.

So you will need to run the profiling tool like this:

go tool pprof -seconds 30 myserver http://localhost:8080/debug/pprof/profile

While that’s running, run the benchmark:

ab -k -c 8 -n 100000 "http://127.0.0.1:8080/"

This will create a total of 100,000 keep-alive requests to your server, using 8 cores. To better understand what this benchmark command does, it is explained in the link provided.

After 30 seconds, the profiling tool will show a prompt for commands that will look something like this:

Here you can run commands to show you how much of CPU time each function took and other useful information. For example, if I run top5  it will list the top 5 functions that are hogging the CPU:

As you can see the exponentialFibonacci function really hogs the CPU.

Please note: These values might differ on your machine. For reference I’m using a MacBook Pro (Retina, 13-inch, Early 2015), 2.7 GHz Intel Core i5 Processor and 8 GB 1867 MHz DDR3 of Memory.

If we wanted to see a graph of this profile we need to run the web command like this:

This will open up your default browser and display an image of the profile. Here’s a crop of the image that concerns our Fibonacci function:

golang profiling

So during that profile, 60.97% of the time, the exponentialFibonacci was running on the CPU.

Optimizations

Now, we know from the theory that O(n) < O(2^n). Let’s see if this holds up in practice, it we were to replace the exponentialFibonacci  call with linearFibonacci  inside the handler  function.

Now we run the profile again. You can immediately see that it took less time because the benchmark actually finishes really fast this time.

If we run  top5 now, the linearFibonacci  function doesn’t even make the cut.  Even if you try to do  top100 you will not find it because the compiler inlined that particular code.

So we need to rebuild the application with the compiler flags that disable inlining like this:

go build -gcflags -l -o myserver main.go

 

Now even with this flag enabled I had a hard time finding the function in the top. I went ahead and increased the hard-coded value for the n-th Fibonacci number to 10,000. So I’m looking for the 10,000th Fibonacci number, this number doesn’t even fit inside the integer datatype in Golang. It will overflow several times before coming to a stop. I also increased the benchmark to 1,000,000 requests.

Now if I run top5 I get:

Or in graphical format:

golang profiling

As you can see, it barely even makes a dent.

So for this test, calculating the 25th Fibonacci number recursively takes 60% of the CPU while calculating the 10,000th Fibonacci number linearly takes 4% of the CPU (without inlining).

Another useful command for pprof to see how much CPU time a function takes is the list  command. Or, if you’re like me, to find out if a function is actually called.

For our linearFibonacci  function it looks like this:

A better comparison

A better way to compare the two methods, and the theory to practice, is this:

  1. Knowing that the exponentialFibonacci method is O(2^n), it would take approximately 2^25 = 33554432 instructions to calculate the 25th Fibonacci number.
  2. Linearly, calculating the 33554432th Fibonacci number should take roughly the same time as calculating the 25th number exponentially.

So following the methodology above we do this:

  1. Build the application using the exponentialFibonacci(25) call.
  2. Start the application.
  3. Start the Apache Benchmark for 1,000,000 requests.
  4. Start the CPU profile for 30s seconds.

We get this:

Now for the second part:

  1. Build the application using the linearFibonacci(33554432) call.
  2. Start the application.
  3. Start the Apache Benchmark for 1,000,000 requests.
  4. Start the CPU profile for 30s seconds.

We get this:

As you can see, the flat percentages, which is how much of the time was spent in the routine itself, is roughly the same. 61.38% vs 57.45%, it’s about 4% difference between them.

Profiling memory

Using the same process, you can run the following command to profile memory:

go tool pprof -alloc_objects myserver http://localhost:8080/debug/pprof/heap

If you run a top command you should see something like this:

Conclusion

Now that you’ve seen the basics on how to profile your Golang web apps, you can start diving into heavier stuff like this. Take some time and run a profile on your own Golang web apps.

Also, you should see the Gophercon talk I mentioned at the start of this post, it’s quite good.