pprof -tagroot: Custom Labels in Go Flamegraphs

Break down your flamegraph with custom labels: Hugo site generation profile broken down by site section, HTML template, and HTML page.

This post describes a new pprof feature I just landed.

pprof -tagroot lets you break down pprof software profiles by  domain-specific key-value pairs (Labels, aka Tags) attached to each stack sample.

e.g. on a pprof profile with  http_route and user_id labels, this command:

$ pprof -tagroot http_route,user_id input_profile.pprof

inserts synthetic stack frames of the label value at the root of the profile stack.

pprof's flamegraphs visualise these new stack frames, letting you see which HTTP routes and user IDs are causing the most usage.

The first specified label (here: http_route) inserts new frames at the new root stack frame with the values matching that key.

pprof -tagroot will land in go tool pprof with Go 1.18, expected Feb 2022. To use it today, install upstream pprof:

$ go install github.com/google/pprof@latest

Background

pprof is a software profiler data format & analysis toolkit, which ships with the Go programming language, but is also widely used also analyse profiles of many other languages.

Example: Profiling Hugo Static Site Generator

Previously, in Attributing Latency in Go with pprof Tags, I investigated why the Hugo static site generator was slow to build my site, using custom pprof Labels to drill-down by template name, generated page name, and the site section.

Hugo compiles each page of the site, and some pages are slow. It's hard to tell just from the stack trace what part of the site is being built, so I added custom pprof Labels to each page's generation, with:

labels := pprof.Labels(
    "target_filename", targetPath,
    "section", p.Section(),
    "template", t)

pprof.Do(ctx, labels, func(ctx context.Context) {
    [... compile the page ...]
})

Record a profile of building the website to hugo-slow.pprof:

$ cd graphviz.gitlab.io/
$ hugo --profile-cpu hugo-slow.pprof

Then open the profile in the browser:

$ go tool pprof -http :8080 hugo-slow.pprof

Then click 'View', 'Flame Graph'.

This Flamegraph below (open fullscreen) only shows stack traces about texttemplate processing, with no information about which file/template is being processed.

Only showing code stack traces, without showing the input files the code is acting on, makes it hard to relate the profile to the input HTML templates, which are what I can control to make the site faster.

We can add information about which page is being compiled into the Flamegraph by generating new synthetic stack frames from the custom Labels, breaking down by site section, then template name, then target_filename:

$ pprof -http :8080 \
  -tagroot section,template,target_filename \
  -filefunctions \
  hugo-slow.pprof

Gives this Flamegraph (open fullscreen):

Label keys are encoded as filenames of the generated stack frames. So we can show the label keys in the flamegraph using -filefunctions.

This flamegraph shows:

  • On middle left hand side, second row: 64% of time spent in the docs section
  • A small 3% of time spent in the Gallery section (far left)
  • Each section is broken down by which template used (e.g. middle, row 3: 42% in attrs/single.html)
  • The right-hand side of the graph is the parts of Hugo that aren't directly generating an individual page, and they show <unknown> for their synthetic label stack frames. This keeps the actual code stack frames all at the same level.

And you can click the new stack frames to drill down to identify where time was spent compiling individual sections, templates, and individual pages.

More Use Cases

You could break down your profile by:

  • HTTP route: e.g. /api/v2/profile.
  • UserID / Tenant ID, e.g. on a multitenant system where you suspect one user may be hogging all the resources.
  • Process ID / Process Name. When looking at a whole-OS profile.
  • Thread ID / Thread Name / Threadpool name. I use -tagroot threadpool a lot when analysing multithreaded Android applications, to zoom in to the important threads: UI thread and RenderThread, and to discount work happening on a low-priority background thread.
  • Data Center / Availability Zone / Replica ID, if you are combining profiles together from many processes.
  • You could break down memory profiles by the allocation size label.
  • Types of input the code is acting on

Just be careful, your profiles will get very large if you add high-cardinality labels.

With combinations of the above, you could e.g. drill down an entire globally replicated cloud server with:

pprof -tagroot datacenter,job,replica,process_name,threadpool,thread_name,http_route

You can also pivot by selecting labels in a different order. You could analyse the above in any order, for example, -tagroot http_route,threadpool to see a global profile broken down by HTTP route & threadpool.

Also new: pprof -tagleaf

For symmetry, we also added -tagleaf, which is the same as -tagroot, except it inserts new stack frames for the labels at the stack leaves instead of the stack root.

If you specify multiple labels -tagleaf http_route,user_id then the last item user_id becomes the new leaf stack frame, with http_route immediately before.

While tagroot is useful for going from a label to the stacks inside it, tagleaf is useful if you want to drill down from code to the label. Perhaps you have a hotspot, and you aren't sure which thread is causing the hotspot. -tagleaf thread lets you drill down from from the code to the thread responsible.

More pprof Label Resources

Credits

Thanks to:

Mark Hansen

Mark Hansen

Sydney, Australia