Martin Robillard · Blog

Summarizing Qualitative Evidence with Spark-Histograms

28 November 2014 by Martin P. Robillard based on work with Annie Ying

When reporting on data analyzed using a qualitative research approach, a major challenge is to properly indicate the nature and amount of the evidence that supports a given observation.

Explicit links to the raw data are essential for transparency and to allow readers to assess the strength of the evidence for the observations reported. Unfortunately, continual references to participants and units of analysis can quickly overwhelm the text. Consider this example from an article in Empirical Software Engineering, where observations about the use of code examples were supported by statements from six developers, one team lead, and free-form survey responses:

This is stretching the limit: had the observations been reported by a couple more participants, we would have had an entire line used up just for links to the evidence.

Summarizing the data quantitatively (e.g., "99% of participants said they did not find meetings useful") is not a safe alternative: such a figure is easily misread as a sample statistic, an interpretation that is not valid when the data come from the theoretically derived samples commonly used in grounded theory studies. So where is the sweet spot? With every qualitative study my collaborators and I write up, we agonize about how many links we can include without destroying the readability of the paper.

In a recent study conducted by Annie Ying as part of her Ph.D. work, our qualitative data set consisted of 156 code summaries produced by 16 participants, hence the potential for serious clutter.

The solution we employed was to summarize this two-dimensional evidence using tiny histograms, which we embedded directly in the text of the paper.

Here, a histogram presents the distribution of observations of a given code summarization practice for a participant (each bar) over the ten code fragments (the vertical axis). This technique is inspired by sparklines, and shows both the relative and absolute strength of the evidence for a practice. Intuitively, the area covered by the bars is the total amount of evidence. Furthermore, patterns in the distribution of the evidence can easily be spotted: e.g., an occasional practice common across participants (wide shallow shape), or a strongly personal practice (thin, tall shape).
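To make the encoding concrete, here is a minimal Python sketch that draws one bar per participant using Unicode block characters. It is only a plain-text approximation for illustration, not the TikZ macro used in the paper. The function name spark_histogram and the example counts are made up.

```python
# Plain-text approximation of a spark-histogram (illustrative only;
# the paper's figures are drawn with a TikZ macro, not with this code).

BLOCKS = " ▁▂▃▄▅▆▇█"  # 9 levels: empty, then 1/8 to 8/8 of a character cell


def spark_histogram(counts, max_height=10):
    """Return one block character per participant.

    counts[i] is the number of code fragments in which the practice was
    observed for participant i; max_height is the number of fragments
    (ten in the study described above).
    """
    if max_height <= 0:
        raise ValueError("max_height must be positive")
    top = len(BLOCKS) - 1
    bars = []
    for count in counts:
        if not 0 <= count <= max_height:
            raise ValueError(f"count {count} outside 0..{max_height}")
        # Integer ceiling, so that any non-zero count stays visible.
        level = (count * top + max_height - 1) // max_height
        bars.append(BLOCKS[level])
    return "".join(bars)


def total_evidence(counts):
    """The 'area under the bars': the total number of observations."""
    return sum(counts)


if __name__ == "__main__":
    # Hypothetical data: an occasional practice shared by many participants,
    # and a strongly personal practice concentrated in one participant.
    wide_shallow = [2, 1, 2, 1, 2, 1, 2, 1]
    thin_tall = [0, 0, 0, 10, 0, 0, 0, 0]
    print(spark_histogram(wide_shallow), total_evidence(wide_shallow))
    print(spark_histogram(thin_tall), total_evidence(thin_tall))
```

The two hypothetical inputs correspond to the wide, shallow shape and the thin, tall shape described above.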

To generate the histograms with LaTeX, you need to load the TikZ package and include icon.tex (macro courtesy of Annie Ying). In the text body, you generate a histogram with the command \picture{<histogram data file>}. The data file contains one bar per line, in the format x y, where x is a coordinate on the x-axis (the participant axis in our paper) and y is the height of the vertical bar (in our paper, the number of code fragments where the practice was observed).
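If your coded observations live in a script or spreadsheet export, the data file can be generated rather than typed by hand. The following Python sketch shows one way to do so, under several assumptions that are not specified above: that x starts at 1, that participants with no observations still get a line with height 0, and that each observation is a (participant, fragment) pair. The function names, participant labels, and the file name practice.dat are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def histogram_lines(observations, participants):
    """Build the 'x y' lines for one practice.

    observations: iterable of (participant, fragment) pairs, one per
        fragment in which the practice was observed for that participant.
    participants: participant labels in the order they should appear
        on the x-axis.
    """
    known = set(participants)
    fragments_by_participant = defaultdict(set)
    for participant, fragment in observations:
        if participant not in known:
            raise ValueError(f"unknown participant: {participant!r}")
        # A set, so a fragment coded twice is only counted once.
        fragments_by_participant[participant].add(fragment)
    # Assumption: x is 1-based and zero-height bars are written explicitly.
    return [
        f"{x} {len(fragments_by_participant[p])}"
        for x, p in enumerate(participants, start=1)
    ]


def write_histogram_data(observations, participants, path):
    """Write the histogram data file, one 'x y' pair per line."""
    lines = histogram_lines(observations, participants)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical coded observations for a single practice.
    observations = [("P1", "F1"), ("P1", "F2"), ("P3", "F4")]
    write_histogram_data(observations, ["P1", "P2", "P3"], "practice.dat")
```

With these assumptions, the resulting file would then be referenced from the paper as \picture{practice.dat}. Check the conventions expected by icon.tex before relying on the 1-based x values or the explicit zero-height lines.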

I encourage you to explore this option for your next qualitative paper!