You would be forgiven for not initially recognizing some of the high-level similarities between the practice of research in sciences such as physics and research in ornithology. One basic similarity is that we are all constrained in what we can measure. Quantum physics has its uncertainty principle that describes limits on what can be measured. Ornithologists are at times limited in what they can measure by the very things that they are trying to observe: birds will sometimes actively avoid detection. Additionally, we all have to deal with imperfect measuring devices and the need to create calibrations for these devices. And we all need to do “big science” to find answers to some of our questions. In the case of ornithology, various groups are building sensor networks that span countries if not entire continents. It’s just that ornithologists call their sensors “bird watchers.”
One of these ornithological sensor networks, the Great Backyard Bird Count (GBBC), was prototyped roughly sixteen years ago across the United States and Canada. It served as a test platform for engaging the general public in reporting bird observations over a large geographical area, and for the web systems needed to ingest and manage the information provided. The GBBC still happens each year, engaging tens of thousands of people over a single long weekend in February. Some of its participants, however, keep counts and report on the birds that they see year-round, and from the GBBC a global bird-recording project emerged, named eBird. eBird collects a lot of data: thousands of lists of birds each day, at a rate fast enough that you can watch the data coming into the database.
So, what do we do with all of those data? That’s where mathematics comes into the picture in a big way. As I already wrote, we know that the lists of birds that people report are not perfect records of the birds that were present. Some subset of the birds, and even entire species, almost certainly went undetected. We need to account for these uncertainties in what observers detect in order to get an accurate picture of where birds are living, and where they’re traveling. Sometimes we have enough background information to be able to write down the statistical equations that describe the processes that affect the detection of birds and the decisions that birds make about where to live. There are other times, however, when we do not know enough to be able to write down an accurate statistical model in advance, but instead we need to discover the appropriate model as part of our analyses of the information. In these instances, our analyses fall into the realm of data mining and machine learning.
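The effect of imperfect detection described above can be illustrated with a small calculation. This is a minimal sketch of a textbook-style occupancy correction, not the statistical model used in the analyses discussed here. It assumes every site is visited the same number of times and that the per-visit detection probability is known. The function names and parameters are hypothetical.

```python
def prob_detected_at_least_once(occupancy, detect_per_visit, visits):
    """Probability that a species is recorded at a site at least once.

    Assumption (illustrative only): the species occupies the site with
    probability `occupancy`, and each of `visits` independent visits
    detects it with probability `detect_per_visit` when it is present.
    """
    return occupancy * (1.0 - (1.0 - detect_per_visit) ** visits)


def corrected_occupancy(naive_fraction, detect_per_visit, visits):
    """Rescale the raw fraction of sites with a detection.

    Dividing by the chance of detecting a present species at least once
    recovers the underlying occupancy under the assumptions above.
    """
    p_at_least_once = 1.0 - (1.0 - detect_per_visit) ** visits
    return naive_fraction / p_at_least_once


# Hypothetical numbers: 60% of sites occupied, 50% detection per visit, 2 visits.
naive = prob_detected_at_least_once(0.6, 0.5, 2)
estimate = corrected_occupancy(naive, 0.5, 2)
```

With these made-up numbers, the raw checklists would show the species at fewer sites than it actually occupies, and the correction restores the original figure. In practice the detection probability is itself unknown and has to be estimated from the data.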
Using a novel machine-learning method, we are able to describe the distributions of bird species across the United States, accurately showing where species are found throughout the entire year. The map below is an example of the results. It shows the distribution of the Orchard Oriole, a bird species that winters in Central and South America. In spring, most of these orioles fly across the Gulf of Mexico to reach the United States, where the migrants divide into two distinct populations: one living in the eastern United States and a second in the Great Plains states. Then in fall, both populations take a more westerly route back to their wintering grounds, along the east coast of Mexico. Being able to accurately describe the seasonally changing distribution of these orioles and other species means that our machine-learning analyses were able to use information on characteristics of the environment, such as habitat, to identify the preferred habitats of birds as well as how these preferences change over the course of a year. So, not only do these analyses tell us where birds are living, they also provide insights into why birds choose to live where they do.
Knowing where birds live isn’t an end in itself. Being able to create an accurate map of a species’ distribution means that we understand something about that species’ habitat requirements. Additionally, knowledge of birds’ distributions, especially fine-grained descriptions of distributions, can have very practical applications. One such application was determining the extent to which different parties are responsible for the conservation and management of different bird species. This effort, jointly undertaken by a number of governmental and non-governmental agencies, took the year-round, continent-wide range maps and superimposed them on information about land ownership throughout the United States. The result was the first assessment of the extent to which many bird species were living on publicly or privately owned lands; within the public lands, the agencies most responsible for management were also identified. The product, the State of the Birds report for 2011, provided the first quantitative assessment of management responsibilities for a large number of species across their U.S. ranges.
The computational work underlying the State of the Birds report is a final point of similarity between ornithology and other big-data sciences: all of the model building is well beyond the capacity of a desktop computer. Climatology, astronomy, and biomedical research all readily come to mind as areas of research that make heavy use of high-performance computer systems (or supercomputers), in which a larger task is broken into many smaller pieces that are each handled by one of a large number of individual processing units. The building of hundreds of year-round, continent-wide bird distribution models lends itself to this same divide-and-conquer process, because each continent-wide distribution model is built from hundreds of sub-models that each describe environmental associations in a smaller region and a narrow slice of time.
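The divide-and-conquer idea can be sketched in a few lines. This is an illustrative toy, not the method used for the State of the Birds analyses. It assumes observations are simple tuples of (latitude, longitude, day of year, detected), partitions them into non-overlapping blocks of space and time, and uses a per-block detection rate as a stand-in for a real sub-model. All names, block sizes, and data are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def block_key(obs, cell_deg=10.0, window_days=30):
    """Assign an observation to a (lat cell, lon cell, time window) block."""
    lat, lon, day, _detected = obs
    return (int(lat // cell_deg), int(lon // cell_deg), int(day // window_days))


def fit_sub_model(block_obs):
    """Stand-in sub-model: fraction of checklists in the block reporting the species."""
    detections = sum(1 for (_, _, _, detected) in block_obs if detected)
    return detections / len(block_obs)


def fit_all_blocks(observations, max_workers=4):
    """Split observations into blocks and fit each block independently."""
    blocks = defaultdict(list)
    for obs in observations:
        blocks[block_key(obs)].append(obs)
    keys = list(blocks)
    # Each block is independent, so the fits can be handed to separate workers.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        rates = list(pool.map(fit_sub_model, (blocks[k] for k in keys)))
    return dict(zip(keys, rates))


# Hypothetical checklists: (latitude, longitude, day of year, species detected?)
observations = [
    (42.0, -76.0, 150, True),
    (43.0, -75.0, 155, False),
    (30.0, -95.0, 100, True),
]
sub_models = fit_all_blocks(observations)
```

Because no block depends on any other, the same structure carries over to a cluster, where each processing unit fits its own share of the sub-models. A thread pool is used here only to keep the sketch short and portable.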
The collection of citizen-science data in the Great Backyard Bird Count and eBird is only the start of a long process of gaining insights from these raw data. Extracting information from the data has required collaboration among ornithologists, statisticians, and computer scientists working at the interface between biology and mathematics. As an ornithologist by training, I have found it an interesting and exciting journey to travel.
Dr. Wesley Hochachka is a senior research associate at Cornell University, and the assistant director of the Bird Population Studies program at the Cornell Lab of Ornithology.