[RFC] Add script for plotting string benchmark results

Message ID VE1PR08MB47837A9E030729BC36C83452846C0@VE1PR08MB4783.eurprd08.prod.outlook.com
State New
Headers show
  • [RFC] Add script for plotting string benchmark results
Related show

Commit Message

Krzysztof Koch Oct. 18, 2019, 2:47 p.m.
Add a script for visualizing the JSON output generated by existing
glibc string microbenchmarks.

plot_strings.py is capable of plotting both absolute and relative
(--relative flag) timings. For both types of graphs, there is an
option to explicitly specify the ifuncs to plot, from the ones listed
in the benchmark results file. This is can be achieved using the
--ifuncs parameter.

For absolute plots, it is possible to show the exact timing values
(--values flag), but the default behavior is to hide the figures.

When plotting relative timing differences between ifuncs, the first
ifunc listed in the input JSON file is the baseline, unless the
baseline implementation is explicitly chosen with the --baseline

For ease of reading, the script marks the statistically insignificant
differences on the graphs. The default is +-5% but this value can
be controlled with the --threshold parameter.

Finally, if a window manager is available, one can enable interactive
figure display which allows navigation through the graph. To enable
this feature use the --display flag and make sure you are using a
GUI-based matplotlib backend.

To accommodate for the heterogeneity in string benchmark results
files, one can control i.e x-axis scale, or the key to access the
varied parameter value in the JSON tree, with options like
--logarithmic or --param. This ensures that the script works with
all files which pass JSON schema validation. The schema to validate
against can be chosen with the --schema parameter.

plot_strings.py traverses the JSON tree until a 'results' array
is found and generates a separate figure for each such array.
The figure is then saved to a file in one of the available formats
(controlled with the --extension parameter).

As the tree is traversed, the recursive function tracks the metadata
about the test being run, so that each figure has a unique and
meaningful title and filename. Given the current state of the
benchmarks, this data is limited to the top-level 'bench-variant'.

While plot_strings.py works with existing benchmarks, provisions
have been made to allow adding more structure and metadata to these
benchmarks. Currently, many benchmarks produce multiple timing values
for the same value of the varied parameter (typically 'length').
Mutiple data points for the same parameter usually mean that some other
parameter was varied as well, for example, if memmove's src and dst
buffers overlap or not (see bench-memmove-walk.c and

Unfortunately, this information is not exposed by the benchmark, so
plot_strings.py has to resort to computing the geometric mean of
these multiple values. In the process, useful information about the
benchmark configuration is lost. Also, averaging the timings for
different alignments can hide useful characterstics of the benchmarked

The script has been tested on all existing string microbenchmarks
which produce results in JSON format. plot_string.py was tested
on both Windows and Ubuntu 16.04.2 LTS. It runs on both python 2
and 3 (2.7.12 and 3.5.12 tested).

Useful commands:
1. Plot absolute timings for all ifuncs in bench-strlen.out:
$ ./plot_strings.py bench-strlen.out

2. Display help:
$ ./plot_strings.py -h

3. Plot absolute timings for selected ifuncs. Save the generated figures
in pdf format to the 'results' directory. Use logarithmic x-axis scale
and show grid lines as well as exact timing numbers:
$ ./plot_strings.py bench.out -o results/ -g -v -l -e pdf \
-i __memset_avx512_unaligned_erms __memset_avx512_unaligned

4. Plot relative timings for all ifuncs in bench.out with __generic_memset
as baseline. Display a percentage difference threshold of +-10%:
$ ./plot_strings.py bench.out -r -b __generic_memset -t 10

1. I would like to propose relaxing the benchout_strings.schema.json
to allow specifying either a 'results' array with 'timings' (as before)
or a 'variants' array right after the array of 'ifuncs'. See below

 "timing_type": "hp_timing",
 "functions": {
  "memcpy": {
   "bench-variant": "default",
   "ifuncs": ["generic_memcpy", "__memcpy_thunderx"],
   "variants": [
     "name": "powers of 2",
     "variants": [
       "name": "both aligned",
       "results": [
         "length": 1,
         "align1": 0,
         "align2": 0,
         "timings": [x, y]
         "length": 2,
         "align1": 0,
         "align2": 0,
         "timings": [x, y]
         "length": 65536,
         "align1": 0,
         "align2": 0,
         "timings": [x, y]
       "name": "dst misaligned",
       "results": [
         "length": 1,
         "align1": 0,
         "align2": 0,
         "timings": [x, y]
         "length": 2,
         "align1": 0,
         "align2": 1,
         "timings": [x, y]

'variants' array consists of objects such that each object has a 'name'
attribute to describe the configuration of a particular test in the
benchmark. This can be a description, for example, of how the parameter
was varied or what was the buffer alignment tested. The 'name' attribute
is then followed by another 'variants' array or a 'results' array.

The nesting of variants allows arbitrary grouping of benchmark timings,
while allowing giving description of these groups. Using recusion, one
can proceduraly create titles and filenames for the figures being

1. Is the idea of adding variants array to benchout_strings.schema.json
a good one? The goal is not to break existing scripts but to start the
process of tidying up the string benchmarks and adding metadata to them.

2. Would people find this script useful? Are there any features missing?
Have I made some wrong assumptions?

3. I am aware that compare_strings.py is already capable of plotting
string benchmark results, however, the readability of the genereated
graphs is questionable. One problem with the script is that it labels
each individual data point with the specific parameter values because
multiple parameters are varied simultaneously. This approach only works
for few data points.

If we add more structure to the string benchmark output, then
it will become possible to produce much more timings without overwhelming
the reader. For example, bench-memcpy.c increments the 'length' by 1 to
generate some of the 'timings', but it only goes from 0 to 31, which I
believe is mostly motivated by the fear of generating too much data.
 benchtests/scripts/plot_strings.py | 263 +++++++++++++++++++++++++++++++++++++
 1 file changed, 263 insertions(+)
 create mode 100755 benchtests/scripts/plot_strings.py



Szabolcs Nagy Oct. 18, 2019, 3:13 p.m. | #1
On 18/10/2019 15:47, Krzysztof Koch wrote:
> +import matplotlib as mpl

> +mpl.use("Agg")      # Remove this line to use (default) GUI backend

> +

> +import argparse

> +from collections import defaultdict

> +import json

> +import matplotlib.pyplot as plt

i think it should be possible to move this import into
main after arg parsing, then set a global plt.

then you can have a --backend command line option that
changes mpl.use("Agg") to something else.

> +import numpy as np

> +import os

> +


diff --git a/benchtests/scripts/plot_strings.py b/benchtests/scripts/plot_strings.py
new file mode 100755
index 0000000..34d39b9
--- /dev/null
+++ b/benchtests/scripts/plot_strings.py
@@ -0,0 +1,263 @@ 
+# Plot GNU C Library string microbenchmark output.
+# Copyright (C) 2019 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# Lesser General Public License for more details.
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# <https://www.gnu.org/licenses/>.
+"""Plot string microbenchmark results.
+Given a benchmark results file in json format and a benchmark schema file,
+plot either absolute or relative benchmark timings.
+Separate figure is generated and saved to a file for each 'results' array
+found in the input file.
+import matplotlib as mpl
+mpl.use("Agg")      # Remove this line to use (default) GUI backend
+import argparse
+from collections import defaultdict
+import json
+import matplotlib.pyplot as plt
+import numpy as np
+import os
+    import jsonschema as validator
+except ImportError:
+    print("Could not find jsonschema module.")
+    raise
+# Use pre-selected markers for plotting lines to improve readability
+markers = [".", "s", "x", "o", "^", "|", "*", "1", "+", "D"]
+# Benchmark variants for which the x-axis scale should be logarithmic
+log_variants = {"powers of 2"}
+def gmean(numbers, axis=None):
+    """Compute geometric mean.
+    Args:
+        numbers: numbers to compute geometric mean of
+    Return:
+        geometric mean of numbers
+    """
+    a = np.array(numbers, dtype=np.complex)
+    means = a.prod(axis) ** (1.0 / len(a))
+    return np.real(means)
+def relativeDifference(x, x_reference):
+    """Compute relative difference.
+    Args:
+        x: values
+        x_reference: reference (baseline) values
+    Return:
+        relative difference between x and x_reference (in %)
+    """
+    abs_diff = np.subtract(x, x_reference)
+    return np.divide(np.multiply(abs_diff, 100.0), x_reference)
+def plotRecursive(json_iter, routine, ifuncs, bench_variant, title, outpath,
+                  x_scale):
+    """Plot benchmark timings.
+    Args:
+        json_iter: reference to json object
+        routine: benchmarked string routine
+        ifuncs: list of ifuncs tested
+        bench_variant: top-level benchmark variant name
+        title: generated figure's title
+        outpath: output file path (without extension)
+        x_scale: x-axis scale
+    """
+    # 'variants' array found, continue recursive search for 'results' array
+    if "variants" in json_iter:
+        for variant in json_iter["variants"]:
+            new_title = "%s%s, " % (title, variant["name"])
+            new_outpath = "%s_%s" % (outpath, variant["name"].replace(" ", "_"))
+            new_x_scale = "log" if variant["name"] in log_variants else x_scale
+            plotRecursive(variant, routine, ifuncs, bench_variant, new_title,
+                          new_outpath, new_x_scale)
+        return
+    # BASE CASE:
+    # 'results' array found, collect and plot timings
+    domain = []
+    timings = defaultdict(list)
+    for result in json_iter["results"]:
+        domain.append(result[args.param])
+        timings[result[args.param]].append(result["timings"])
+    domain = np.unique(np.array(domain))
+    averages = []
+    # Use geometric mean if there are multple timings for each parameter
+    # value.
+    for parameter in domain:
+        averages.append(gmean(timings[parameter], axis=0))
+    averages = np.array(averages).transpose()
+    # Choose ifuncs to plot
+    if args.ifuncs:
+        plotted_ifuncs = [x.replace("__", "") for x in args.ifuncs]
+    else:
+        plotted_ifuncs = ifuncs
+    plotted_indices = [ifuncs.index(x) for x in plotted_ifuncs]
+    # Plot relative timings
+    if args.relative:
+        # Choose the baseline ifunc
+        if args.baseline:
+            baseline = args.baseline.replace("__", "")
+        else:
+            baseline = ifuncs[0]
+        baseline_index = ifuncs.index(baseline)
+        # Compare the timings against baseline
+        codomain = relativeDifference(averages[plotted_indices,:],
+                                      averages[baseline_index])
+        # Start plotting relative timings
+        plt.figure()
+        plt.axhspan(-args.threshold, args.threshold, color="lightgray",
+                    alpha=0.3)
+        plt.axhline(0, color="k", linestyle="--", linewidth=0.4)
+        plt.ylabel("Relative timing (in %)")
+        title = "Relative timing comparison against %s\nfor %s benchmark, %s" \
+                % (baseline, bench_variant, title)
+        outpath = os.path.join(args.outdir, "%s_%s%s" % \
+                  (baseline, bench_variant, outpath))
+    # Plot absolute timings
+    else:
+        codomain = averages[plotted_indices,:]
+        plt.figure()
+        if not args.values:
+            plt.axes().yaxis.set_major_formatter(plt.NullFormatter())
+        plt.ylabel("Timing")
+        title = "%s %s benchmark results\n%s" % (routine, bench_variant, title)
+        outpath = os.path.join(args.outdir, "%s_%s%s" \
+                               % (routine, bench_variant, outpath))
+    # Finish generating Figure
+    plt.xlabel(args.param)
+    plt.xscale(x_scale)
+    plt.title(title)
+    if args.grid:
+        plt.grid(color="k", linestyle=":", linewidth=0.5, alpha=0.5)
+    for i in range(len(plotted_ifuncs)):
+        plt.plot(domain, codomain[i], marker=markers[i % len(markers)], label=plotted_ifuncs[i])
+    plt.legend(loc="best")
+    plt.savefig("%s_%s.%s" % (outpath, x_scale, args.extension),
+                format=args.extension)
+    if args.display:
+        plt.show()
+    plt.close()
+def main(args):
+    """Program Entry Point.
+    Args:
+      args: command line arguments (excluding program name)
+    """
+    schema = None
+    with open(args.schema, "r") as f:
+        schema = json.load(f)
+    for filename in args.bench:
+        bench = None
+        with open(filename, "r") as f:
+            bench = json.load(f)
+        validator.validate(bench, schema)
+        for function in bench["functions"]:
+            bench_variant = bench["functions"][function]["bench-variant"]
+            x_scale = "log" if args.logarithmic else "linear"
+            ifuncs = bench["functions"][function]["ifuncs"]
+            ifuncs_trimmed = [x.replace("__", "") for x in ifuncs]
+            plotRecursive(bench["functions"][function], function,
+                          ifuncs_trimmed, bench_variant, "", "", x_scale)
+""" main() """
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description=
+            "Plot string microbenchmark timings",
+            formatter_class=argparse.ArgumentDefaultsHelpFormatter)
+    # Required parameter
+    parser.add_argument("bench", nargs="+",
+                        help="benchmark results file(s) in json format")
+    # Optional parameters
+    parser.add_argument("-b", "--baseline", type=str,
+                        help="baseline ifunc for relative plot")
+    parser.add_argument("-d", "--display", action="store_true",
+                        help="display figures (requires window manager)")
+    parser.add_argument("-e", "--extension", type=str, default="png",
+                        choices=["png", "pdf", "svg"],
+                        help="output file(s) extension")
+    parser.add_argument("-g", "--grid", action="store_true", help="show grid")
+    parser.add_argument("-i", "--ifuncs", nargs="+", help="ifuncs to plot")
+    parser.add_argument("-l", "--logarithmic", action="store_true",
+                        help="use logarithmic x-axis scale")
+    parser.add_argument("-o", "--outdir", type=str, default=os.getcwd(),
+                        help="output directory")
+    parser.add_argument("-p", "--param", type=str, default="length",
+                        help="varied parameter name")
+    parser.add_argument("-r", "--relative", action="store_true",
+                        help="plot relative timing differences")
+    parser.add_argument("-s", "--schema", type=str,
+                        default=os.path.join(os.path.dirname(
+                        os.path.realpath(__file__)),
+                        "benchout_strings.schema.json"),
+                        help="schema file to validate the result file.")
+    parser.add_argument("-t", "--threshold", type=int, default=5,
+                        help="diference threshold to mark in graph (in %%)")
+    parser.add_argument("-v", "--values", action="store_true",
+                        help="show actual timing values")
+    args = parser.parse_args()
+    if (not args.relative and args.baseline) or (args.relative and args.values):
+        raise ValueError("Conflicting command line arguments")
+    main(args)
\ No newline at end of file