The two new techniques developed in this project, Collective Graphical Models (CGMs) and Semi-Parametric Latent Process Models (SLPMs), will greatly expand the range of problems that can be addressed with probabilistic graphical models, a state-of-the-art modeling framework. Over the past 20 years, probabilistic graphical models have transformed machine learning and statistical modeling by providing a framework for reasoning about complex probabilistic systems in a computationally tractable way. However, applying these techniques to the problem of bird migration raises two major challenges.
First, in fields such as ecology and the social sciences, one often has access only to aggregate data (eBird, acoustic, and radar data are all population-level data sets) but wishes to model individual behavior (i.e., the migration routes of individual birds and how they respond to different environmental conditions). There is a natural way to formulate this problem as a graphical model: include a variable for every member of the population, and reason about how the individuals combine to produce the aggregate data. The problem with this approach is that it yields an enormous, computationally intractable model. Collective Graphical Models make the same connection between individual models and aggregate data, but they avoid reasoning about individuals by working directly in the space of sufficient statistics of the individual model (for example, counts of how many individuals occupy each state or make each transition). They are broadly applicable to settings such as US Census data and other social science data, where privacy concerns dictate that only aggregate data can be released but one still wishes to reason about individual behavior. Research questions surrounding CGMs include the development of efficient learning and inference algorithms, as well as understanding the limitations of learning from aggregate data compared with individual-level data (e.g., from tracking individual birds).
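To illustrate the sufficient-statistics idea behind CGMs, here is a minimal Python/NumPy sketch, assuming a simple time-homogeneous Markov chain over three hypothetical discrete locations with made-up parameters `pi0` and `P`; it is not the project's implementation. It shows that the log-likelihood of many individual trajectories can be computed from aggregate counts alone (initial-location counts and per-step transition counts), and that the transition matrix can be estimated from those same counts. In an actual CGM the counts are themselves hidden and must be inferred from noisy aggregate observations; this sketch only demonstrates the reduction from individuals to counts.

```python
import numpy as np

# Hypothetical individual model: a Markov chain over 3 discrete locations.
# All entries are strictly positive so that log(P) is finite everywhere.
pi0 = np.array([0.5, 0.3, 0.2])              # initial location distribution
P = np.array([[0.7, 0.2, 0.1],               # P[i, j] = Pr(next = j | current = i)
              [0.1, 0.8, 0.1],
              [0.2, 0.3, 0.5]])


def simulate_trajectories(pi0, P, n_birds, n_steps, rng):
    """Sample one location trajectory per (hypothetical) bird."""
    n_loc = len(pi0)
    traj = np.empty((n_birds, n_steps), dtype=int)
    traj[:, 0] = rng.choice(n_loc, size=n_birds, p=pi0)
    for t in range(1, n_steps):
        for m in range(n_birds):
            traj[m, t] = rng.choice(n_loc, p=P[traj[m, t - 1]])
    return traj


def individual_loglik(traj, pi0, P):
    """Log-likelihood computed bird by bird from the full trajectories."""
    ll = np.log(pi0[traj[:, 0]]).sum()
    ll += np.log(P[traj[:, :-1], traj[:, 1:]]).sum()
    return ll


def sufficient_statistics(traj, n_loc):
    """Aggregate counts: initial node counts and per-step edge (transition) counts."""
    n_birds, n_steps = traj.shape
    node0 = np.bincount(traj[:, 0], minlength=n_loc)
    edge = np.zeros((n_steps - 1, n_loc, n_loc), dtype=int)
    for t in range(n_steps - 1):
        # edge[t, i, j] = number of birds moving from location i to j at step t
        np.add.at(edge[t], (traj[:, t], traj[:, t + 1]), 1)
    return node0, edge


def collective_loglik(node0, edge, pi0, P):
    """Same log-likelihood, computed only from the aggregate counts."""
    return node0 @ np.log(pi0) + (edge * np.log(P)).sum()


def mle_transitions(edge):
    """Maximum-likelihood transition matrix estimated from counts alone."""
    counts = edge.sum(axis=0)
    return counts / counts.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
traj = simulate_trajectories(pi0, P, n_birds=200, n_steps=5, rng=rng)
node0, edge = sufficient_statistics(traj, n_loc=3)
ll_ind = individual_loglik(traj, pi0, P)
ll_col = collective_loglik(node0, edge, pi0, P)
P_hat = mle_transitions(edge)
print(ll_ind, ll_col)   # identical up to floating-point rounding
print(P_hat)
```

The sketch evaluates the likelihood of one particular set of labeled trajectories. The distribution over the counts themselves adds a combinatorial factor (the number of trajectory sets consistent with the counts) that does not depend on `pi0` or `P`, so it does not affect parameter estimation. This is why the count representation loses nothing for learning, while its size no longer grows with the number of individuals.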
Another major challenge arises when applying graphical models to scientific data. Bird migration is an example of a latent process: eBird, acoustic, and radar observations record only the occurrence of birds at particular locations and times; they do not directly measure migration itself. While sophisticated techniques exist for learning latent variable models, these typically place little emphasis on the mechanisms of the underlying process, as long as the model fits the empirical data well. In scientific pursuits such as this one, however, the explicit goal is to infer mechanisms from evidence. Our research on Semi-Parametric Latent Process Models (SLPMs) aims to incorporate flexible machine learning methods and algorithms into latent process models for large-scale scientific data, while developing new techniques to ensure that the learned models remain faithful to the underlying mechanisms.
Jed Irvine at Oregon State University created an online radar annotation tool that lets us manually scan through radar images and tag them with relevant characteristics and attributes. For example, when reviewing radar imagery, we want to know whether an image contains primarily biological targets (e.g., birds, bats, insects), contamination from other aerial particulate matter (“aerial plankton,” e.g., smoke, pollen), or contamination from meteorological phenomena. Such “non-biological” targets can cause problems when constructing our early model products.
Since 2010, Dan Sheldon, together with David Winkler of Cornell and colleagues at Oregon State and Tulane University, has been developing methods to understand the migration and roosting behavior of the Tree Swallow using WSR-88D weather radar data. Each night during fall migration and winter, Tree Swallows gather in huge roosts, which sometimes contain upwards of one million birds. At sunrise, the birds leave en masse and disperse in all directions to forage, creating an expanding cloud that appears as a distinctive “roost ring” on weather radar. Dan and colleagues are developing machine learning and computer vision techniques to detect these roosts automatically, so that we can study patterns of Tree Swallow distribution and migration across the entire US using the network of WSR-88D radars.