Brooklyn-based artist Daniel Kohn is working with geneticists on conceptual tools to analyze their large data sets. Using intuitive, perceptual learning combined with his artistic approach to data reduction, Kohn is helping these scientists find new ways to pick signal out of the noise. His involvement began with a 2003 collaboration with research groups in Boston, which led to a residency at the Broad Institute of MIT and Harvard. Since then his science-related work has expanded across several media: painting, drawing, and computer modeling. Since his pioneering collaboration with the Broad, Kohn and several other artists have participated in the institute’s artist-in-residence program. More information about the program can be found on the institute’s webpage.
Shown in the image above, his series “Instance of a Dataset” culminated in a unique mural for the Broad Institute and an ongoing collaboration between artists and scientists. More images from this installation may be found in his gallery online. Kohn’s work speaks to a type of perceptual thinking and visual learning that we all use in our daily lives, with the difference that he is applying these tools to experimental approaches to real-world datasets. Call it what you will (creative, intuitive, perceptual, or visual learning), these methods are all part of a new approach to thinking about complex data.
Here’s another example of sorting out the signal from noise in a simple dataset from my first thesis.
This is a flow cytometric density plot showing distinctly different populations of marine cyanobacteria from a station sampled off the coast of South America in 2008. The frozen vial of seawater was analyzed by running a small volume through a flow cytometer, and the output is literally a cloud of dots like this. Each dot represents a particle of a particular size with its own fluorescence properties. The goal is to quantify this mess and distinguish between the background noise and the populations of interest. This is accomplished easily with out-of-the-box analysis software and careful knowledge of the properties inherent in the type of data you’re working with, but there’s still an intuitive side to this analysis. It can be subjective and open-ended when you are hand-selecting groups of dots and making artificial cut-offs. There are no hard-and-fast rules for this type of data analysis, and you must be the kind of scientist who can work with imperfect data.
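The hand-drawn cut-offs described above can be written down as explicit gates, so that the subjective step is at least repeatable. The sketch below is a minimal illustration in Python, assuming each event is reduced to two hypothetical measurements (a scatter value as a proxy for size, and a fluorescence value). The gate names, boundaries, and event values are invented for the example and are not the ones used in the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    """A rectangular gate: an artificial cut-off drawn around a cloud of dots."""
    name: str
    scatter_range: tuple  # (low, high), hypothetical units
    fluor_range: tuple    # (low, high), hypothetical units

    def contains(self, scatter, fluor):
        return (self.scatter_range[0] <= scatter < self.scatter_range[1]
                and self.fluor_range[0] <= fluor < self.fluor_range[1])


def count_events(events, gates):
    """Assign each (scatter, fluor) event to the first gate that contains it."""
    counts = {gate.name: 0 for gate in gates}
    counts["ungated"] = 0  # everything outside the gates is treated as noise
    for scatter, fluor in events:
        for gate in gates:
            if gate.contains(scatter, fluor):
                counts[gate.name] += 1
                break
        else:
            counts["ungated"] += 1
    return counts


# Made-up events and gate boundaries, for illustration only.
events = [(1.0, 2.0), (1.2, 2.5), (5.0, 8.0), (0.1, 0.1), (9.9, 0.2)]
gates = [
    Gate("population_A", (0.5, 2.0), (1.5, 3.0)),
    Gate("population_B", (4.0, 6.0), (7.0, 9.0)),
]
counts = count_events(events, gates)
print(counts)
```

Moving a gate boundary changes the counts, which is exactly where the subjectivity comes in; writing the boundaries down at least makes the choice visible and repeatable.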
I recently finished a 3-year fellowship working on a unique time-series dataset to extract patterns. Most of my work involved the application and understanding of statistical models. There was a lot of time in front of messy data. There were a lot of visual tools and head scratching. There were things like this simple heat-map, which took too much time to construct and holds a lot more data points than you can imagine, but resulted in a visually interesting way to think about a dataset.
This type of approach leaves the data open to interpretation in a sometimes fuzzy manner, but with so many types of data and such rapidly evolving software, there are new and beautiful ways to think about your science. I’ll feature innovative artists and scientists from time to time on this webpage. Feel free to comment or ask further questions about my previous work.
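Much of the effort in a heat-map like this goes into arranging the time series as a grid before anything is drawn. As a rough sketch, assuming the observations come as hypothetical (year, month, value) records, the following Python averages them into a year-by-month grid and leaves empty cells as None. The record values are made up and this is not the code used for the figure.

```python
from collections import defaultdict


def monthly_grid(records):
    """Average (year, month, value) records into a year-by-month grid."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for year, month, value in records:
        sums[(year, month)] += value
        counts[(year, month)] += 1

    years = sorted({year for year, _ in sums})
    grid = []
    for year in years:
        row = []
        for month in range(1, 13):
            key = (year, month)
            # Months with no observations stay empty rather than becoming zero.
            row.append(sums[key] / counts[key] if key in counts else None)
        grid.append(row)
    return years, grid


# Made-up records, for illustration only.
records = [(2001, 1, 2.0), (2001, 1, 4.0), (2001, 3, 1.0), (2002, 2, 5.0)]
years, grid = monthly_grid(records)
for year, row in zip(years, grid):
    print(year, row)
```

Each row is one year and each column one month, so the grid can be handed to whatever plotting tool you prefer once the empty cells are converted to that tool's missing-value marker.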
- Kohn, Daniel, http://kohnworkshop.com/TextPage-GR-Broad.php
- Kohn, Daniel, Online Flickr gallery Commissions Broad Institute 2013, “Instance of a Dataset” url: https://flic.kr/s/aHsjy5jxB8
- Wright, H.A., Biogeographical analysis of picoplankton populations across the Patagonian Shelf Break during austral summer, MS Thesis, 2010.
- Wright, H.A. MPhil thesis: Long-term variability of plankton phenology in a coastal, Mediterranean time series (LTER-MC), 2013.
In today’s BBC Nature news:
After a cold spell, British scientists are concerned about the late arrival of rare butterfly species.
Although my current research is focused primarily on marine plankton phenology, dramatic examples of year-to-year changes in terrestrial biology are worth noting. Recorded observations of flowering events, leaf-out, ice-out and annual migratory patterns make up the study of phenology across many different ecosystems. Shifts in phenological timing due to climatic conditions are difficult to track unless long-term records of both climate and species occurrences are kept.
Compared with previous years’ observations, this year’s insects appeared up to a month later. What role do rare species play in this complex ecosystem interplay of phenological timing and response to environmental conditions?
I cringe at the thought of calling myself an “armchair oceanographer” because in my mind it equates to less and less time spent in the field and more and more time spent in the confines of an office. At some point in our careers, there comes a time when we face the music. A colleague and well-seasoned field oceanographer has told me there isn’t anything wrong with doing armchair science. For those of you not familiar with this terminology, it means that rather than being a field-based ecologist with manipulative experiments or extensive survey plots to count, I spend my days in front of large data sets and experiment with a vast array of working hypotheses to test associations and relationships within my data.
This branch of computational science or biology is now being termed “data science”. A recent NYT article discussed the future job prospects of this field, and from what I’ve seen it seems data scientists are going to be a hot commodity in the marketplace. I should be happy, right? The immediate answer to this question is both yes and no.
An interesting “infographic” from Wikibon.org presents the application of data science to various fields from social networking to time series. While I agree that data science is the new black, I would urge caution about how it is applied. The mechanics of how data science is carried out are fascinating and involve everything from cloud computing to hacking, programming and high-level statistics. This part is the “science” in my opinion. The other side of this equation is the why, or experimental approach if you will. Why do data science? What does it tell us? Do you really want to data mine every Facebook profile and related tweet to find out what the next generation is thinking, buying and saying? The philosophical side of me thinks this new branch of science should be grounded in a guiding approach that governs not only the accumulation of data but also whether it’s truly worth our data-mining efforts.
Regardless of the ethics, data science is the new sexy. Now, let’s test the significance of this statement… off to the reality of science!
The timing of species occurrence in the environment is termed phenology. Just as we can estimate the arrival and departure of seasons by the migratory patterns of birds or the appearance of buds and flowers on trees, a similar pattern is present in the ocean. Because these organisms are so small, the recurrent appearance and disappearance of phytoplankton and the corresponding zooplankton populations goes largely unnoticed unless it is a bloom of significance, such as one caused by a toxic red-tide organism, for example Alexandrium fundyense. These microscopic organisms have a dramatic impact on food availability to organisms at higher trophic levels and regulate carbon and nutrient cycling in the global oceans. In short, the frequency and timing of blooms is an important aspect of global ocean health. When these regular cycles of production shift rapidly, scientists look for clues such as fluctuations in the regular water properties. When phenology shifts over longer time periods, scientists look to larger shifts in climate patterns as a possible mechanism.
A recent finding by researchers at the Scripps Institution of Oceanography (SIO) indicates that for more than a decade, spring phytoplankton blooms in the Arctic have been occurring earlier each year. Using satellite ocean color data, which typically provide an estimate of surface chlorophyll-a levels, the researchers found the blooms were not just earlier but shifted northwards towards the pole. The regions where blooms occur also correlate with areas of decreasing ice and earlier spring melting. Why is this of concern to scientists? The zooplankton population that relies upon the phytoplankton community as a food source may not be able to adjust to the altered timing of the spring bloom event. If this is true, it results in what is commonly called a “trophic mis-match”, which is exactly what it sounds like. At the bottom of the food web (the lowest trophic level), if phytoplankton blooms occur earlier, the corresponding zooplankton population may miss the peak in food availability, leaving a gap between production and consumption. Shorter blooms of phytoplankton that are missing the corresponding zooplankton population may result in a greater carbon flux and a lowering of available oxygen levels in the water column.
The recently published paper in Global Change Biology illustrates the difficulties of applying global satellite data to address long-term trends. For example, I commonly think of using SeaWiFS data sets, but in this case the data sets did not have adequate temporal coverage and also include error introduced by cloud cover. Therefore, when considering long-term changes, it is valuable to have corresponding in-situ data to verify patterns that might otherwise not be representative.
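To make the idea of "earlier each year" concrete, here is a minimal Python sketch of one way to put a number on it. It assumes, purely for illustration, that bloom timing is taken as the day of year with the highest chlorophyll value, and it fits a least-squares line to those peak days across years. The function names and chlorophyll values are made up, and this is not a reproduction of the method in the paper.

```python
def peak_day(series):
    """Return the day of year with the highest value in [(day, chl), ...]."""
    return max(series, key=lambda pair: pair[1])[0]


def trend_per_year(years, days):
    """Least-squares slope of peak day against year, in days per year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(days) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, days))
    den = sum((x - mean_x) ** 2 for x in years)
    return num / den


# Made-up (day of year, chlorophyll) samples, for illustration only.
chl_by_year = {
    2000: [(160, 0.8), (180, 2.4), (200, 1.1)],
    2001: [(155, 0.9), (175, 2.6), (195, 1.0)],
    2002: [(150, 1.2), (170, 2.2), (190, 0.7)],
}
years = sorted(chl_by_year)
days = [peak_day(chl_by_year[year]) for year in years]
slope = trend_per_year(years, days)
print(days, slope)
```

A negative slope means the peak is arriving earlier over time. With real satellite data, gaps from cloud cover would have to be handled before a peak day could be trusted.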
“Alexandrium fundyense Balech”. Encyclopedia of Life, available from http://www.eol.org/data_objects/475511. Accessed March 17, 2011.
University of California – San Diego (2011, March 3). Arctic blooms occurring earlier: Phytoplankton peak arising 50 days early, with unknown impacts on marine food chain and carbon cycling. ScienceDaily. Retrieved March 17, 2011, from http://www.sciencedaily.com/releases/2011/03/110302171320.htm
KAHRU, M., BROTAS, V., MANZANO-SARABIA, M., & MITCHELL, B. (2011). Are phytoplankton blooms occurring earlier in the Arctic? Global Change Biology, 17 (4), 1733-1739 DOI: 10.1111/j.1365-2486.2010.02312.x