Profiling web applications in Golang

I've been watching the GopherCon 2017 videos from here. There are many good talks on that list that I would recommend watching.

One that I really wanted to try out on my own was the profiling presentation that Peter Bourgon did. I assume for the sake of simplicity, he left some details out. I've been trying to figure them out on my own.

I was inspired to try to profile my own Golang web apps using the method he presented in his talk. So in this post I'll show you a simplified example of how to profile your own web applications in more detail.

The web app

To see a practical example of profiling while keeping things simple, we need to create a web app that, when called on a particular route, performs some sort of calculation and returns the result in its payload.

If we were to mimic real life scenarios, we would be here all day. So just to keep it simple, we will create a single route that calculates the 25th Fibonacci number when it's called and returns it in the response body.

I've written two versions of the Fibonacci function beforehand: one that uses recursion (exponential time, very CPU intensive) and one that calculates the number using a vector (linear time, not very CPU intensive).
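As a rough illustration, the two versions might look something like the sketch below. The names exponentialFibonacci and linearFibonacci are the ones used later in this post; the base cases F(0) = 0 and F(1) = 1, and the use of a slice as the "vector", are assumptions on my part.

```go
package main

import "fmt"

// exponentialFibonacci computes the n-th Fibonacci number with naive
// recursion. Every call above the base cases makes two more calls, so the
// running time grows exponentially with n.
func exponentialFibonacci(n int) int {
	if n < 2 {
		return n // assumed base cases: F(0) = 0, F(1) = 1
	}
	return exponentialFibonacci(n-1) + exponentialFibonacci(n-2)
}

// linearFibonacci computes the same number in a single pass, keeping every
// intermediate value in a slice (the "vector").
func linearFibonacci(n int) int {
	if n < 2 {
		return n
	}
	values := make([]int, n+1)
	values[1] = 1
	for i := 2; i <= n; i++ {
		values[i] = values[i-1] + values[i-2]
	}
	return values[n]
}

func main() {
	fmt.Println(exponentialFibonacci(25), linearFibonacci(25)) // prints "75025 75025"
}
```

Both functions return the same value for small n; only the amount of work needed to get there differs.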

In order to use the profiling tool, you need to import the [crayon-5f0123f72c0ca379433295-i/] package and register some routes. In the presentation I mentioned earlier, the speaker said that we could leave this imported even in production environments since it does not affect performance.
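Here is a minimal sketch of such an app, assuming the package in question is the standard library's net/http/pprof, which registers its handlers under /debug/pprof/ on the default mux when imported for its side effects. The /fibonacci route and port 8080 are placeholders I picked for illustration.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	_ "net/http/pprof" // side effect: registers /debug/pprof/* on http.DefaultServeMux
)

// exponentialFibonacci is the slow, recursive version
// (assumed base cases: F(0) = 0, F(1) = 1).
func exponentialFibonacci(n int) int {
	if n < 2 {
		return n
	}
	return exponentialFibonacci(n-1) + exponentialFibonacci(n-2)
}

// fibonacciHandler calculates the 25th Fibonacci number on every request
// and writes it to the response body.
func fibonacciHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintf(w, "%d\n", exponentialFibonacci(25))
}

func main() {
	// Hypothetical route name; any path works.
	http.HandleFunc("/fibonacci", fibonacciHandler)

	// Passing nil uses http.DefaultServeMux, which now serves both our
	// route and the pprof routes. The port is an assumption.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```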
As you can see, this is a really simple application. Save it to a file called [crayon-5f0123f72c0d9759996358-i/] and build it like this: [crayon-5f0123f72c0dd538958484-i/].

Now you can run your binary: [crayon-5f0123f72c0de151712954-i/] and to check if it's working we'll send a request to it:

The CPU profile

Now, while the server is running, you will need to run two commands in parallel. First start the profiling tool, which records data for 30 seconds after it is started, and while it is recording, run the Apache Benchmark tool to send a few requests its way.

So you will need to run the profiling tool like this:


While that's running, run the benchmark:


This will send a total of 100,000 keep-alive requests to your server, using 8 cores. The link provided explains in more detail what this benchmark command does.

After 30 seconds, the profiling tool will show a prompt for commands that will look something like this:
Here you can run commands that show how much CPU time each function took, along with other useful information. For example, if I run [crayon-5f0123f72c104364890461-i/] it will list the top 5 functions that are hogging the CPU:
As you can see, the [crayon-5f0123f72c107962330811-i/] function really hogs the CPU.

Please note: These values might differ on your machine. For reference, I'm using a MacBook Pro (Retina, 13-inch, Early 2015) with a 2.7 GHz Intel Core i5 processor and 8 GB of 1867 MHz DDR3 memory.

If we want to see a graph of this profile, we need to run the web command like this:
This will open up your default browser and display an image of the profile. Here's a crop of the image that concerns our Fibonacci function:

[Image: golang profiling]

So during that profile, 60.97% of the time, the [crayon-5f0123f72c10b137300048-i/] was running on the CPU.


Now, we know from the theory that O(n) < O(2^n). Let's see if this holds up in practice if we replace the [crayon-5f0123f72c10d987710352-i/] call with [crayon-5f0123f72c10e456075198-i/] inside the [crayon-5f0123f72c10f756419744-i/] function.

Now we run the profile again. You can immediately see that it took less time because the benchmark actually finishes really fast this time.

If we run [crayon-5f0123f72c111781766269-i/] now, the [crayon-5f0123f72c112512142089-i/]  function doesn't even make the cut.  Even if you try to do [crayon-5f0123f72c114262253169-i/] you will not find it because the compiler inlined that particular code.

So we need to rebuild the application with the compiler flags that disable inlining like this:



Even with this flag enabled, I had a hard time finding the function in the top output. I went ahead and increased the hard-coded value for the n-th Fibonacci number to 10,000. The 10,000th Fibonacci number doesn't even fit inside the integer datatype in Golang; the value will overflow several times before the calculation comes to a stop. I also increased the benchmark to 1,000,000 requests.
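To make the overflow concrete, here is a small sketch (not part of the app; the function name is made up) that finds the first Fibonacci index whose value no longer fits in a signed 64-bit integer. It assumes F(0) = 0 and F(1) = 1 and relies on the fact that signed integer addition in Go wraps around on overflow instead of panicking.

```go
package main

import "fmt"

// firstOverflowingIndex returns the smallest n for which F(n) does not fit
// in an int64. Fibonacci numbers never decrease, so a sum that comes out
// smaller than the previous term means the addition wrapped around.
func firstOverflowingIndex() int {
	var a, b int64 = 0, 1 // F(0), F(1)
	for n := 2; ; n++ {
		next := a + b // F(n), possibly wrapped
		if next < b {
			return n
		}
		a, b = b, next
	}
}

func main() {
	fmt.Println(firstOverflowingIndex()) // prints 93
}
```

So anything past the 92nd number already gives a wrapped result with 64-bit integers, which is fine for a CPU benchmark but not for the actual value.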

Now if I run [crayon-5f0123f72c117707948531-i/] I get:
Or in graphical format:

[Image: golang profiling]

As you can see, it barely even makes a dent.

So for this test, calculating the 25th Fibonacci number recursively takes 60% of the CPU while calculating the 10,000th Fibonacci number linearly takes 4% of the CPU (without inlining).

Another useful pprof command for seeing how much CPU time a function takes or, if you're like me, for finding out whether a function is actually called, is the [crayon-5f0123f72c11a699234011-i/] command.

For our [crayon-5f0123f72c12c710559356-i/]  function it looks like this:

A better comparison

A better way to compare the two methods, and to compare theory with practice, is this:

  1. Knowing that the exponentialFibonacci method is O(2^n), it would take approximately 2^25 = 33554432 instructions to calculate the 25th Fibonacci number.
  2. Linearly, calculating the 33554432th Fibonacci number should take roughly the same time as calculating the 25th number exponentially.

So following the methodology above we do this:

  1. Build the application using the exponentialFibonacci(25) call.
  2. Start the application.
  3. Start the Apache Benchmark for 1,000,000 requests.
  4. Start the CPU profile for 30 seconds.

We get this:
Now for the second part:

  1. Build the application using the linearFibonacci(33554432) call.
  2. Start the application.
  3. Start the Apache Benchmark for 1,000,000 requests.
  4. Start the CPU profile for 30 seconds.

We get this:
As you can see, the flat percentages, which show how much of the time was spent in the routine itself, are roughly the same: 61.38% vs 57.45%, a difference of about 4 percentage points.
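One caveat on the 2^25 estimate: it is an upper bound rather than an exact count. A quick sketch, assuming the recursive version stops at n < 2 (countCalls is a hypothetical helper, not part of the app), counts how many times the recursion is actually entered:

```go
package main

import "fmt"

// countCalls returns how many times a naive recursive Fibonacci function is
// entered while computing F(n), assuming it returns immediately for n < 2.
func countCalls(n int) int {
	if n < 2 {
		return 1 // the base case is still one call
	}
	return 1 + countCalls(n-1) + countCalls(n-2)
}

func main() {
	fmt.Println(1 << 25)        // prints 33554432, the 2^25 upper bound
	fmt.Println(countCalls(25)) // prints 242785
}
```

Under that assumption the real number of calls is well below 2^25, so the comparison above is best read as a rough order-of-magnitude exercise rather than an exact match of work done.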

Profiling memory

Using the same process, you can run the following command to profile memory:
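For context, when net/http/pprof is imported, the heap profile is served at /debug/pprof/heap on the default mux. The sketch below is a self-contained way to check that the endpoint responds; it uses a throwaway httptest server instead of the app from this post, and fetchHeapProfile is a name I made up.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	_ "net/http/pprof" // side effect: registers /debug/pprof/* on http.DefaultServeMux
)

// fetchHeapProfile starts a temporary server backed by the default mux,
// downloads the heap profile and returns the HTTP status code and the
// size of the profile in bytes.
func fetchHeapProfile() (int, int, error) {
	srv := httptest.NewServer(http.DefaultServeMux)
	defer srv.Close()

	resp, err := http.Get(srv.URL + "/debug/pprof/heap")
	if err != nil {
		return 0, 0, err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return resp.StatusCode, 0, err
	}
	return resp.StatusCode, len(body), nil
}

func main() {
	status, size, err := fetchHeapProfile()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("status=%d, profile size=%d bytes\n", status, size)
}
```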


If you run a top command you should see something like this:


Now that you've seen the basics on how to profile your Golang web apps, you can start diving into heavier stuff like this. Take some time and run a profile on your own Golang web apps.

Also, you should watch the GopherCon talk I mentioned at the start of this post; it's quite good.

OOP in Golang vs C++

Before I started to learn Go, every online opinion I read about the language complained about the lack of generics, how OOP was dumbed down, and so on.

It made me put off learning it for quite some time, more than I would like to admit. Coming from a C++ background, OOP, generics and meta-programming were my daily bread.

It wasn't until I had to actually learn Go that I saw what it offered me in terms of OOP, and it was just enough. As such, I wanted to put together a side-by-side comparison of typical C++ code that deals with classes and its corresponding implementation in Go that does more or less the same thing.

This is by no means an exhaustive list of examples, but I thought it might prove useful for someone trying to figure out Go.

To run all Go examples, copy them to a file and run [crayon-5f0123f72cce2977489072-i/] .

To run all C++  examples, copy them to a file and run [crayon-5f0123f72ccea969907737-i/] .

Class declaration

In C++:
Go equivalent:
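A minimal sketch of how a class-like type is usually written in Go. The Rectangle type and its methods are made up for illustration: a struct holds the fields, a plain function plays the role of the constructor, and methods are attached through receivers.

```go
package main

import "fmt"

// Rectangle plays the role of a class. Lowercase field names are not
// exported, which is how Go hides them from other packages.
type Rectangle struct {
	width, height float64
}

// NewRectangle acts as a constructor; Go has no special syntax for one.
func NewRectangle(width, height float64) *Rectangle {
	return &Rectangle{width: width, height: height}
}

// Area uses a value receiver because it only reads the fields.
func (r Rectangle) Area() float64 {
	return r.width * r.height
}

// Scale uses a pointer receiver so that it can modify the rectangle.
func (r *Rectangle) Scale(factor float64) {
	r.width *= factor
	r.height *= factor
}

func main() {
	r := NewRectangle(2, 3)
	r.Scale(2)
	fmt.Println(r.Area()) // prints 24
}
```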

Inheritance (sort of)

In C++:
Go equivalent:
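Go has no class inheritance, but struct embedding gets close. In this sketch (the Animal and Dog types are made up for illustration), Dog embeds Animal, so the fields and methods of Animal are promoted and can be used directly on a Dog.

```go
package main

import "fmt"

// Animal is the "base" type.
type Animal struct {
	Name string
}

// Describe is defined on Animal and is promoted to any type embedding it.
func (a Animal) Describe() string {
	return a.Name + " is an animal"
}

// Dog embeds Animal instead of inheriting from it.
type Dog struct {
	Animal
	Breed string
}

// Speak exists only on Dog.
func (d Dog) Speak() string {
	return d.Name + " says woof"
}

func main() {
	d := Dog{Animal: Animal{Name: "Rex"}, Breed: "Beagle"}
	fmt.Println(d.Describe()) // prints "Rex is an animal"
	fmt.Println(d.Speak())    // prints "Rex says woof"
}
```

Note that this is composition: a Dog is not an Animal as far as the type system is concerned, so a function that takes an Animal will not accept a Dog.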


In C++:
Go equivalent:
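For polymorphism, Go relies on interfaces, which are satisfied implicitly: any type that has the required methods can be used where the interface is expected. The Shape, Square and Rect types below are made up for illustration.

```go
package main

import "fmt"

// Shape is roughly what an abstract base class with one virtual method
// would be in C++.
type Shape interface {
	Area() float64
}

// Square satisfies Shape simply by having an Area method.
type Square struct {
	Side float64
}

func (s Square) Area() float64 {
	return s.Side * s.Side
}

// Rect satisfies Shape as well.
type Rect struct {
	Width, Height float64
}

func (r Rect) Area() float64 {
	return r.Width * r.Height
}

// totalArea works with any mix of types that implement Shape.
func totalArea(shapes []Shape) float64 {
	total := 0.0
	for _, s := range shapes {
		total += s.Area()
	}
	return total
}

func main() {
	shapes := []Shape{Square{Side: 2}, Rect{Width: 3, Height: 4}}
	fmt.Println(totalArea(shapes)) // prints 16
}
```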


There are basic equivalences between traditional OOP languages like C++ and the syntax and functionality that Golang provides.

In its own simple way, Golang provides ways to implement encapsulation, inheritance and polymorphism. In my humble opinion, these mechanisms are enough for most object-oriented projects.

Emulating a Redis Failover with Docker

Reading the Redis documentation can be a bit confusing without hands-on experience. You could in theory create multiple processes of the Redis Server on your machine and configure each of them separately, but what if you could do it in a few commands, and emulate the network they’re connected to as well?

I’ve been looking into this and there are a few examples out there on the Web. The best one I could find was this one:

So, starting from that example, I’ve tried to do the next best thing, which is to create a single docker-compose.yml file for all of it. This removes the need to build each image: you just run docker-compose up and scale as needed.

Here’s what I got:
Basically, after saving this into a docker-compose.yml file and running docker-compose up in that folder you’ll get this:

You can now scale as needed. For example, by running:
You’ll end up with:

To initiate a failover, you’ll need to take the master out of the picture. You can do that with:
You can now observe the communication between the sentinels and slaves. After the down-after-milliseconds and failover timeout pass, one of the slaves will be selected for promotion.

After the sentinels agree on the selection, the slave will become the new master.

You can now unpause the old master by doing this:
The old master will now become a slave of the new master and perform a sync.

That’s about it. As an exercise you could try setting up a cluster starting from this and observe failovers there.

Go debugging

Debugging Golang apps in Docker with Visual Studio Code


We’ve recently had some problems with a Go application that was running inside a Docker container in a very big Docker Compose setup.

After getting fed up with writing console prints and rebuilding the Docker image for that container and spinning up all the containers to debug things, we started investigating how we could speed up our debugging process.

Enter Visual Studio Code and its wonderful Go extension which supports Delve.

Now if you read through the pages linked above you will find out how to install and set up all these things. It’s pretty straightforward. The Docker part, however, is not. As such, I will show you a basic Go application which mimics what we had to deal with and how to set up debugging for it.

The application

The following is the main.go of our app. It will connect to a Redis server, set and get a value.

As you can see, it relies on the Redigo package, so make sure you get it and place it in your vendor folder.

To make sure you have everything set up the right way, go ahead and build it locally by running :
If you run the application built this way, it will fail of course, because you need to connect to Redis. I’ve set the hostname for the server to redis which will point to an IP on the docker-machine when we docker-compose up.

The Dockerfile

Now we have to build the image for this application.

When this image is built, it will basically copy the application code, set up the environment and build the Go application. The application’s entrypoint will be the main executable that gets built. We also install the Delve command line tool but we won’t use it if we run a container from this image directly (i.e. docker run).

Note the GOPATH variable and the path to which we copy our code. This path is very important for Delve and our debug configuration.

The Docker Compose file

Now that we have the Dockerfile to build the image, we have to define the docker-compose.yml file. Here, however, we will override the entrypoint for the container to launch Delve. The code that we copied will also be replaced with a volume that points to the code on the host machine, and we will remove some security constraints that prevent Delve from forking the process.

Essentially, for the context I mentioned above we try not to touch the base image for the application since it might get accidentally pushed to the Docker Hub with debugging parameters. So in order to avoid that we have our Docker Compose process override the image with what we need to go about debugging.

Here’s the docker-compose.yml file :

It's here that we introduce the Redis server dependency we have.  Note that for the myapp container we’ve exposed the ports that the Delve command line tool listens to.

So to see that everything is working, you can now run :
This will build the image and start up the redis and myapp containers.

You should see the following output coming from the myapp container:
Which means that the Delve command line tool compiled our Go code into a debug executable, started it, and it’s listening for remote connections to the debugger on port 2345.

Now we just have to set up our launch.json config in the .vscode folder of our project.

The launch configuration

Here’s what our launch.json should look like:
You might have to change the host IP to what your docker-machine ip output is.

Now all we have to do is set up a few breakpoints and start the debugger using the Remote Docker configuration.

Our docker compose terminal should print something like this from the myapp container :
You can Next and Continue, look at the callstack, see the locals, view contents of specific variables, etc.

Final thoughts

I hope this proves to be as useful to you as it did for us. The tools mentioned in this post really save us a heap of trouble.

We really have to thank the open source community that brought us these tools. They are the real heroes.

Happy debugging!

Building a Face Detection Web API in Node.js


As a follow-up to my previous article on how to use your webcam for face detection with OpenCV, I’d like to show you how you can create your own web API for that.

There are a few Node.js modules out there that do just that. A few of them even provide bindings for OpenCV so you can use it directly from JavaScript.

The catch is that most of these modules either rely directly on binaries or they need to be built for your machine from a makefile or a Visual Studio project, etc. That’s why some of them work on Windows for example, but not on Mac, or vice-versa.

The objective of this article is to show you the steps needed to create such a module for yourself so that you can customize it for your machine specifically. What we’re going to do is create a native Node.js add-on and a web server that will use that add-on to detect faces and show them to you.


I’ve built this on a MacBook Pro running OS X El Capitan Version 10.11.1.

Since we’re going to use OpenCV, you’ll need to set it up on your machine. I’ve described how to do this in this article.

Next, we’ll need Node.js which you can get from here. This will also install NPM (the package manager for node) which we need to install some extra node modules.

The next thing we need is node-gyp which you can install using npm. But before you do that make sure you have all the dependencies required which are described here. For Mac they are python 2.7, xcode, gcc and make. So basically if you followed the OpenCV installation guide you should be good on everything except python which you should install. After that you can install node-gyp like this :
Node-gyp is used to generate the appropriate files needed to build a native node.js add-on.

That’s pretty much it. Next up, we’ll generate a simple native add-on.

Setting up

First, we need to create a folder for the node project, I’m doing this in my home directory :
Now we need a folder to hold the native module and navigate to it :
Node-gyp uses a file which specifies the target module name, source files, includes, libraries and other cflags to use when building the module. We need to create that file and call it binding.gyp. Its contents should look like this :
Node-gyp still has some hiccups on Mac OS X and will use only either cc or c++ by default when building (instead of gcc/g++ or whatever you have configured).

Now we use node-gyp to generate the project files :

The native module

As specified in the binding.gyp file, we now need to create the source file of the native module i.e. src/face-detect.cpp.

Here is the source code for that :

Basically what this code does is register a method to our module. The method gets the first parameter as a buffer, decodes it to an OpenCV Mat image, detects the faces within the image using the classifier (which should be placed next to the binary), and returns a JSON string containing the coordinates of the faces found in the image.

Now that we have all the pieces in place for the native module, we can build it using :
If everything goes well, in the folder ./build/Release you should find a file called face-detect.node. This file represents our native module and we should now be able to require it in our JavaScript files. Also, next to this file, we need to copy the lbpcascade_frontalface.xml from the OpenCV source folder under /data/lbpcascades/.

The Server

Now we have to create the server.js file for the node server. We should load the native add-on for face detection, create a server that will listen to PUT requests and call the native add-on on the contents of these requests. The code for that should look like this :
To start the server just run :

Test it out

Save an image containing human faces as image.jpg. Then, using curl from the command line send the image via a PUT request to the node server like this :
Depending on the image you send, you should see something like this :


Sometimes Node.js libraries might not meet your application needs or they might not fit your machine resulting in errors during npm install. When that happens, you can write your own custom native Node.js add-on to address those needs and hopefully, this article showed you that it’s possible.

As an exercise you can try changing this application to return an image with rectangles surrounding the detected faces. If you’re having trouble returning a new buffer from inside the native add-on, try returning the image as a Data URI string.

Face Detector using OpenCV and C++

Build a Face Detector on OS X Using OpenCV and C++

Building and using C++ libraries can be a daunting task, even more so for big libraries like OpenCV. This article should get you started with a minimal build of OpenCV and a sample application written in C++.

This application will get images from the webcam, draw rectangles around the faces in the images and show them to you on screen.


I've built this on a MacBook Pro running OS X El Capitan Version 10.11.1.

We'll be using the GNU C++ compiler (g++) from the command line. Note that you should still have Xcode installed (I have Xcode 7.1 installed).

Here's what you need to do :

  1. Get "OpenCV for Linux/Mac" from the OpenCV Downloads Page (I got version 3.0).
  2. Extract the contents of the zip file from step 1 to a folder of your choosing (I chose ~/opencv-3.0.0).
  3. Get a binary distribution of Cmake from the Cmake Downloads Page (I got cmake-3.4.0-Darwin-x86_64.dmg).
  4. Install Cmake.


Building OpenCV

OpenCV uses CMake files to describe how the project needs to be built. CMake can transform these files into actual project settings (e.g. an Xcode project, Unix makefiles, a Visual Studio project, etc.) depending on the generator you choose.

First open CMake, and a small window will pop up that lets you choose your build options based on the CMakeLists.txt files in the opencv source directory. Click on the Browse Source... button and choose the path to the opencv source folder (the folder you extracted the zip file to in step 2). Then click on the Browse Build... button and choose a path to a build folder; I'm going to create a new folder called build in the previously mentioned source folder.

If at any point you are prompted to choose a generator, pick Unix Makefiles. If the paths you chose were correct, after you click the Configure button, you should be looking at something like this :


For a somewhat minimal OpenCV build, make sure you only have the following options enabled :

  • BUILD_opencv_apps
  • BUILD_opencv_calib3d
  • BUILD_opencv_core
  • BUILD_opencv_features2d
  • BUILD_opencv_flann
  • BUILD_opencv_hal
  • BUILD_opencv_highgui
  • BUILD_opencv_imgcodecs
  • BUILD_opencv_imgproc
  • BUILD_opencv_ml
  • BUILD_opencv_objdetect
  • BUILD_opencv_photo
  • BUILD_opencv_python2
  • BUILD_opencv_shape
  • BUILD_opencv_stitching
  • BUILD_opencv_superres
  • BUILD_opencv_ts
  • BUILD_opencv_video
  • BUILD_opencv_videoio
  • BUILD_opencv_videostab
  • WITH_1394
  • WITH_V4L

You should disable the options that are not in the list, especially the BUILD_SHARED_LIBS one. Don't touch the options that are text fields unless you know what you're doing.

Most of these options you don't need for this particular exercise, but it will save you time by not having to rebuild OpenCV should you decide to try something else.

Once you have selected the settings above, click Generate. Now you can navigate to the build folder, I'll do so with cd ~/opencv-3.0.0/build/ and run make to build OpenCV.

Installing OpenCV

If everything goes well, after the build finishes, run make install to add the OpenCV includes to the /usr/local/include folder and the libraries to the /usr/local/lib and /usr/local/share/OpenCV/3rdparty/lib folders.

After that's done, you should be able to build your own C++ applications that link against OpenCV.

The Face Detector Application

Now let's try to build our first application with OpenCV.

Here's the code and comments that explain how to do just that :

I saved this as main.cpp. To build it I used the following command :
Hopefully, no errors should occur.


Before running the application, you have to copy the lbpcascade_frontalface.xml next to the main file. You can find this file in the OpenCV source folder under /data/lbpcascades/. You can also find some other cascades to detect eyes, cat faces, etc.

Now just run ./main and enjoy!