abstracting data: signal from noise

In keeping with the previous post about the need to incorporate creative thinking into science, here’s a novel approach that made the NYT headlines this morning.

Instance of a DataSet - Month 7, Floor 6, Pan

Brooklyn based artist Daniel Kohn is working with geneticists on conceptual tools to analyze their large data sets. Using intuitive, perceptual learning combined with his artistic approach to data reduction, Kohn is helping these scientists understand new ways to find signal amongst the noise. Starting with a collaboration in 2003 with research groups in Boston that led to a residency with the Broad Institute for Genetic research, at MIT and Harvard, Kohn’s interest in science expanded to include different mediums: painting, drawing, and computer modeling [1]. Since his pioneering collaboration with Broad, Kohn and several other artists have participated in their artist-in-residence program. More information about the program can be found on their webpage.

Shown in the image above, his series “Instance of a Dataset” culminated with a unique mural for the Broad Institute and an ongoing collaboration between artists and scientists. More images from this installation may be found in his gallery online[2]. Kohn’s work speaks to a type of perceptual thinking and visual learning that we all utilize in our daily lives, with the difference that he is putting these tools into experimental approaches to real world datasets. Call it what you will, creative, intuitive, perceptual, or visual learning, these methods are all part of a new approach to thinking about complex data in novel ways.

Here’s another example of sorting out the signal from noise in a simple dataset from my first thesis[3].

Prochlorococcus pop4 (blue)

Prochlorococcus pop4 (blue)

This is a flow cytometric histogram or density plot showing distinctly different populations of marine cyanobacteria from a station sampled off the coast of South America in 2008. The frozen vial of seawater was analyzed by running a small volume through a flow cytometer and the output is literally a cloud of dots like this. Each dot signifies a particle of a particular size with unique fluorescence properties. The goal is to quantify this mess and distinguish between the background noise and populations of interest. This is accomplished easily with out of the box image analysis software and careful knowledge of the properties inherent in the type of data you’re working with – but there’s still an intuitive nature to this analysis. It can be subjective and open ended when you are hand selecting groups of dots and making artificial cut-offs. There are no steadfast rules to this type of data analysis, and you must be the kind of scientist that can work with imperfect data.

I recently finished a 3-year fellowship working on a unique time-series dataset to extract patterns. Most of my work involved the application and understanding of statistical models. There was a lot of time in front of messy data. There were a lot of visual tools and head scratching. There were things like this simple heat-map that took too much time to construct, a lot more data points that you can imagine – but resulted in a visually interesting approach to think about a dataset[3].

Annual abundance of selected plankton groups from a Long-term data series. Plot by H.A.Wright using the statistical software R.

Heatmap plot showing annual abundance of selected plankton groups from a Long-term data series. Plot by H.A.Wright using the statistical software R.

Using this type of approach leaves the data open to interpretation in a sometimes fuzzy manner, but from the vast types of data and rapidly evolving software, there are new and beautiful ways to think about your science. I’ll feature innovative artists and scientists from time to time on this webpage. Feel free to comment or ask further questions about my previous work.


  1. Kohn, Daniel, http://kohnworkshop.com/TextPage-GR-Broad.php
  2. Kohn, Daniel, Online Flickr gallery Commissions Broad Institute 2013, “Instance of a Dataset”  url: https://flic.kr/s/aHsjy5jxB8
  3. Wright, H.A., Biogeographical analysis of picoplankton populations across the Patagonian Shelf Break during austral summer, MS Thesis, 2010.
  4. Wright, H.A. MPhil thesis: Long-term variability of plankton phenology in a coastal, Mediterranean time series (LTER-MC), 2013.

Tags: , , , , , ,

About hawright

marine ecologist

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: