There are so many, many ways to graphically convey scientific data. But depending on how this information is presented, it can be perceived differently by different people — if not completely inaccurately. Here are 10 simple rules to help you convey your data more effectively.
Top image: Called "Security Blanket," it's a digitally printed image on a cotton-fabric quilt with layers of color-coded passwords. The artist, Lorrie Faith Cranor from Carnegie Mellon University, formed a research group to find ways of improving password policies by analyzing stolen passwords. The artwork reveals the extent to which people choose identical — and weak — passwords.
These tips were compiled by Nicolas Rougier, Michael Droettboom, and Philip Bourne. The key thing to keep in mind, they argue, is to create effective graphical interfaces between people and the data being presented.
"[We] do not pretend to explain everything about this interface," they write in their new PLOS Computational Biology article. "Instead we aim to provide a basic set of rules to improve figure design and to explain some of the common pitfalls."
It's important to identify — as early as possible in the design process — the audience and the message the graphic is supposed to convey. As the authors write:
[If] you intend to publish a figure in a scientific journal, you should make sure your figure is correct and conveys all the relevant information to a broader audience. Student audiences require special care since the goal for that situation is to explain a concept. In that case, you may have to add extra information to make sure the concept is fully understood.
Visualizations for the general public must be simple — even approximated — to convey only the most salient part of the research.
A remake of a figure originally published in the New York Times. It's effective because it exploits the fact that the number of new cases is always greater than the corresponding number of deaths to mix the two values. It also takes advantage of left-to-right reading direction. This is great for a general-audience publication, but unacceptable for a scientific publication, which would require actual numerical values.
It's crucial that you clearly identify the role of the graphic, that is, the message it is meant to convey. Once this has been done, the message will be a strong guide for the design of the figure.
This projection from the retina surface (left) to the collicular surface (a brainstem structure, at right) is based on a model in which a logarithmic mapping function is used. The checkerboard pattern is artificially introduced to demonstrate the extreme magnification of the foveal region, which is the main message of the figure.
"Only after identifying the message will it be worth the time to develop your figure, just as you would take the time to craft your words and sentences when writing an article only after deciding on the main points of the text," write the authors. "If your figure is able to convey a striking message at first glance, chances are increased that your article will draw more attention from the community."
Where is your visualization going to appear? A poster? A computer monitor? A projection screen? Or a simple sheet of paper? This matters, because each of these media implies a different physical size for the figure. More importantly, each of them implies a different way of viewing and interacting with the visualization.
For example, during an oral presentation, a figure will be displayed for a limited time. Thus, the viewer must quickly understand what is displayed and what it represents while still listening to your explanation. In such a situation, the figure must be kept simple and the message must be visually salient in order to grab attention.
Here's an effective infographic on "Wearable Power" produced by scientists at Drexel University. [Credit: Kristy Jost, Babak Anasori, Majid Beidaghi, Genevieve Dion, and Yuri Gogotsi; Drexel University.]
Figures projected on video screens, which are often seen from a distance, should have thicker lines and bigger fonts. Colors should have strong contrast, and vertical text should be avoided. For journals, the situation is markedly different: feel free to use lots of detail, along with complementary explanations in the caption.
An excellent example of a video visualization, the "Dynamic Earth" by NASA.
Whether describing an experimental setup, introducing a new model, or presenting new results, you cannot explain everything within the figure itself—a figure should be accompanied by a caption. The caption explains how to read the figure and provides additional precision for what cannot be graphically represented. This can be thought of as the explanation you would give during an oral presentation, or in front of a poster, but with the difference that you must think in advance about the questions people would ask.
Plotting software arrives with a set of default settings. They are not necessarily your friends. Parameters typically include size, font, colors, styles, ticks, markers, and others.
Since these settings are to be used for virtually any type of plot, they are not fine-tuned for a specific type of plot. In other words, they are good enough for any plot but they are best for none. All plots require at least some manual tuning of the different settings to better express the message, be it for making a precise plot more salient to a broad audience, or to choose the best colormap for the nature of the data.
At left, the sine and cosine functions as rendered by matplotlib using default settings. It can be visually improved by tweaking the various available settings, as shown on the right.
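As a rough illustration of this kind of manual tuning, the sketch below plots sine and cosine with a few defaults overridden. It is not the authors' script: the function name `tuned_sine_cosine` and the specific choices (figure size, line width, tick positions, hidden frame lines) are assumptions made for illustration.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt


def tuned_sine_cosine():
    """Plot sine and cosine with several default settings overridden."""
    n = 256
    xs = [-math.pi + 2 * math.pi * i / (n - 1) for i in range(n)]

    # Wider-than-default figure, thicker lines, explicit colors.
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(xs, [math.cos(x) for x in xs],
            color="tab:blue", linewidth=2.5, label="cosine")
    ax.plot(xs, [math.sin(x) for x in xs],
            color="tab:red", linewidth=2.5, label="sine")

    # Ticks at meaningful values instead of the automatically chosen ones.
    ax.set_xticks([-math.pi, -math.pi / 2, 0, math.pi / 2, math.pi])
    ax.set_xticklabels([r"$-\pi$", r"$-\pi/2$", "0", r"$+\pi/2$", r"$+\pi$"])
    ax.set_yticks([-1, 0, 1])

    # Hide the top and right frame lines, which carry no information here.
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)

    ax.legend(loc="upper left", frameon=False)
    return fig, ax


fig, ax = tuned_sine_cosine()
```

Calling `fig.savefig("figure.png", dpi=150)` afterwards would write the result to disk; the file name and resolution are, again, only placeholders.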
As noted by Edward Tufte, color can be your greatest ally, or your worst enemy if not used properly. If you decide to use color in your visualizations, you need to consider which colors to use and where to use them.
These figures show the same signal, whose frequency increases to the right and intensity increases towards the bottom. Three different colormaps are used here, but the rainbow colormap and the seismic colormap are ineffective because they obscure details in the high-frequency domain (the bottom-right part), which remain visible with the purples colormap.
For example, to highlight some element of a figure, you can use color for this element while keeping other elements gray or black. This provides an enhancing effect. However, if you have no such need, you need to ask yourself, "Is there any reason this plot is blue and not black?" If you don't know the answer, just keep it black.
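One way to apply this "one colored element, everything else gray" idea is to build the color list before plotting. The helper below is a minimal sketch; the name `highlight_colors`, the category labels, and the particular color choices are hypothetical.

```python
def highlight_colors(labels, focus, highlight="tab:red", muted="0.6"):
    """Return one color per label: `highlight` for `focus`, gray for the rest.

    "0.6" is a grayscale level written as a string, a color format
    that matplotlib accepts.
    """
    return [highlight if label == focus else muted for label in labels]


labels = ["control", "drug A", "drug B", "drug C"]  # hypothetical categories
colors = highlight_colors(labels, focus="drug B")
print(colors)
```

The resulting list can be passed to a plotting call that accepts per-element colors, for example the `color` argument of matplotlib's `Axes.bar`.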
Keep in mind that you're trying to convey the data as objectively as possible. This is science after all. The figure is tied to the data — but if you loosen this tie, you may unintentionally project a different message than intended. As noted by the authors:
[A] number of implicit choices made by the library or software you're using that are meant to be accurate in most situations may also mislead the viewer under certain circumstances. If your software automatically re-scales values, you might obtain an objective representation of the data (because title, labels, and ticks indicate clearly what is actually displayed) that is nonetheless visually misleading; you have inadvertently misled your readers into visually believing something that does not exist in your data. You can also make explicit choices that are wrong by design, such as using pie charts or 3-D charts to compare quantities. These two kinds of plots are known to induce an incorrect perception of quantities and it requires some expertise to use them properly.
Also, make sure to use the simplest type of plots that can convey your message. Only use labels, ticks, title, and the full range of values when relevant. And don't hesitate to ask colleagues about their interpretation of your figures.
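The effect of automatic re-scaling can be made concrete with a little arithmetic. On a bar chart, the drawn height of a bar is proportional to its value minus the value at which the axis starts. The sketch below uses two made-up measurements (98 and 100) and a hypothetical helper, `drawn_ratio`, to compare what the reader sees with an axis starting at 0 and with one starting at 97.

```python
def drawn_ratio(a, b, baseline=0.0):
    """Ratio of the drawn heights of two bars when the value axis starts at `baseline`."""
    if min(a, b) <= baseline:
        raise ValueError("both values must lie above the baseline")
    return (b - baseline) / (a - baseline)


a, b = 98.0, 100.0  # hypothetical measurements

true_ratio = drawn_ratio(a, b, baseline=0.0)       # axis starts at zero
rescaled_ratio = drawn_ratio(a, b, baseline=97.0)  # axis re-scaled to start at 97

print(f"axis from 0:  second bar looks {true_ratio:.2f}x as tall")
print(f"axis from 97: second bar looks {rescaled_ratio:.2f}x as tall")
```

With the re-scaled axis, a difference of about 2% is drawn as a bar three times as tall, even though the tick labels remain technically correct.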
On the left part of the figure, a series of four values are represented: 30, 20, 15, 10. On the upper left part, the disc area is used to represent the value, while in the bottom part the disc radius is used. The results are visually very different. In the latter case (red circles), the last value (10) appears very small compared to the first one (30), while the ratio between the two values is only 3:1.
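The arithmetic behind this caption can be checked directly. The sketch below takes the four values from the figure and computes how much larger the first disc looks than the last under each encoding; the variable names are illustrative only.

```python
import math

values = [30, 20, 15, 10]

# Encoding 1: disc AREA is proportional to the value,
# so the radius must be sqrt(value / pi).
radii = [math.sqrt(v / math.pi) for v in values]
areas_from_area_encoding = [math.pi * r ** 2 for r in radii]

# Encoding 2: disc RADIUS is proportional to the value,
# so the area grows with the square of the value.
areas_from_radius_encoding = [math.pi * v ** 2 for v in values]

ratio_area_encoding = areas_from_area_encoding[0] / areas_from_area_encoding[-1]
ratio_radius_encoding = areas_from_radius_encoding[0] / areas_from_radius_encoding[-1]

print(f"area encoding:   first disc covers {ratio_area_encoding:.1f}x the last")
print(f"radius encoding: first disc covers {ratio_radius_encoding:.1f}x the last")
```

When the radius carries the value, a 3:1 ratio in the data becomes a 9:1 ratio in ink. This is worth keeping in mind with plotting libraries too: in matplotlib, for instance, the `s` argument of `scatter` is the marker area in points squared, not a radius.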
Chartjunk refers to "all the unnecessary or confusing visual elements found in a figure that do not improve the message (in the best case) or add confusion (in the worst case)." Stuff like too many colors, too many labels, gratuitously colored backgrounds, useless grid lines, and so on.
Noisy and useless.
Graphics programs tend to prioritize aesthetics over content. But even if many of those graphics are beautiful, most of them do not fit within the scientific framework.
An xkcd filter on matplotlib was used to create this figure. "[The] message is particularly clear even if the aesthetic of the figure is questionable," write the authors.
"Remember, in science, message and readability of the figure is the most important aspect while beauty is only an option."
Depending on the type of visualization you're trying to create, you can be fairly certain that a dedicated tool will help you accomplish what you're trying to achieve. Here's a small list of open-source tools:
Matplotlib is a Python plotting library, primarily for 2-D plotting, but with some 3-D support, which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. It comes with a huge gallery of examples that cover virtually all scientific domains (http://matplotlib.org/gallery.html).
R is a language and environment for statistical computing and graphics. R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, etc.) and graphical techniques, and is highly extensible.
Inkscape is a professional vector graphics editor. It allows you to design complex figures and can be used, for example, to improve a script-generated figure or to read a PDF file in order to extract figures and transform them any way you like.
TikZ and PGF are TeX packages for creating graphics programmatically. TikZ is built on top of PGF and allows you to create sophisticated graphics in a rather intuitive and easy manner, as shown by the TikZ gallery (http://www.texample.net/tikz/examples/…).
GIMP is the GNU Image Manipulation Program. It is an application for such tasks as photo retouching, image composition, and image authoring. If you need to quickly retouch an image or add some legends or labels, GIMP is the perfect tool.
ImageMagick is a software suite to create, edit, compose, or convert bitmap images from the command line. It can be used to quickly convert an image into another format, and the huge script gallery (http://www.fmwconcepts.com/imagemagick/in…) by Fred Weinhaus will provide virtually any effect you might want to achieve.
Cytoscape is a software platform for visualizing complex networks and integrating these with any type of attribute data. If your data or results are very complex, Cytoscape may help you alleviate this complexity.
Circos was originally designed for visualizing genomic data but can create figures from data in any field. Circos is useful if you have data that describes relationships or multilayered annotations of one or more scales.
All charts via Rougier et al./PLOS except where indicated.
Read the entire article at PLOS Computational Biology: "Ten Simple Rules for Better Figures".
Follow me on Twitter: @dvorsky