In this post, Associate Editor Yolanda Wiersma discusses a paper by Eelke Folmer and colleagues that she recently handled: ‘Consensus forecasting of intertidal seagrass habitat in the Wadden Sea’.

“All models are wrong, but some are useful”

The above quote, from the British statistician George E.P. Box, has become something of an aphorism in modelling. Graduate students who discover the quote immediately take comfort from it, especially when their models don’t work as well as they had hoped. Although most modellers will agree that “models are wrong”, there is debate within the modelling community about whether some models are more useful than others.

Eelgrass (Zostera marina). Image by Ronald C. Phillips, PhD. Used under Creative Commons Attribution-Share Alike 3.0 Unported license. Photo obtained via Wikimedia Commons.

Debates often hinge on decisions modellers make about what kind of ecological model to apply, which tools to use, and which data to take advantage of (see Mike Austin’s 2002 paper in Ecological Modelling for an excellent overview of modelling). All modellers must make decisions about ecological, data and statistical models, and the challenges inherent in these decisions are well illustrated in the recent paper by Folmer et al., “Consensus forecasting of intertidal seagrass habitat in the Wadden Sea”. Folmer et al. develop habitat distribution models to identify potential habitat for two species of seagrass (Zostera noltii and Z. marina) in the Wadden Sea. Seagrasses provide important habitat for many species and also contribute to nutrient cycling in shallow waters (Fig. 1). In the Wadden Sea, seagrass beds have been negatively affected by eutrophication, and seagrass distributions are limited compared with historical levels (Fig. 2). Conservation and restoration efforts in the Netherlands were not initially successful, although seagrass extents have recently increased. By identifying potentially suitable habitat, Folmer et al. hope their work can inform future conservation, restoration and monitoring efforts.

Map showing the extent of the Wadden Sea in blue. Countries shown include The Netherlands (bottom left), Germany and Denmark (top right). Map used under GNU Free Documentation License and obtained via Wikimedia Commons. Map creator: Aotearoa

Habitat distribution modelling is a common practice in conservation for predicting where a species or habitat of interest may be found. This is because direct surveys are costly, whereas data and the computing power needed to build models are relatively inexpensive. Although modelling is appealing because it can make predictions across time and space in ways that direct surveys cannot, it is a complex process (Fig. 3). Modellers must make decisions about a range of inputs, and they need knowledge of the ecology of the system (the ecological model), the data used (the data model) and the technical dimensions of the tools applied to actually build the model (the statistical model). Folmer et al.’s paper illustrates these steps well and suggests ways in which models can be applied to a real-world conservation challenge, while also acknowledging the limits of their models.
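To make the three components concrete, here is a minimal Python sketch of the workflow, assuming a toy grid of six cells. It is not Folmer et al.’s method: the predictor names (`emersion_time`, `salinity`), the layer values and the survey records are all hypothetical, and a plain logistic regression fitted by gradient descent stands in for whichever statistical tool a modeller might choose.

```python
import math

# Ecological model: knowledge of the species decides which predictors to use.
# Both predictor names are hypothetical placeholders.
PREDICTORS = ["emersion_time", "salinity"]

# Data model: predictor values for each grid cell, as might be extracted from
# GIS layers. All values are invented and pre-scaled to the range 0-1.
layers = {
    "emersion_time": {"c1": 0.2, "c2": 0.3, "c3": 0.8, "c4": 0.9, "c5": 0.25, "c6": 0.85},
    "salinity": {"c1": 0.8, "c2": 0.7, "c3": 0.4, "c4": 0.3, "c5": 0.75, "c6": 0.35},
}

# Hypothetical survey records for some cells: 1 = seagrass present, 0 = absent.
survey = {"c1": 1, "c2": 1, "c3": 0, "c4": 0}


def predictor_row(cell):
    """Collect the chosen predictor values for one grid cell."""
    return [layers[name][cell] for name in PREDICTORS]


def predict(w, x):
    """Logistic model: probability of suitable habitat given predictors x."""
    z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def fit(rows, labels, lr=0.5, epochs=2000):
    """Statistical model: fit logistic regression by batch gradient descent."""
    w = [0.0] * (len(rows[0]) + 1)  # intercept followed by one weight per predictor
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = predict(w, x) - y  # gradient of the log-loss w.r.t. the linear score
            grad[0] += err
            for j, xi in enumerate(x):
                grad[j + 1] += err * xi
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w


cells = sorted(survey)
weights = fit([predictor_row(c) for c in cells], [survey[c] for c in cells])

# Output: predicted habitat probability for the two unsurveyed cells.
for cell in ("c5", "c6"):
    print(cell, round(predict(weights, predictor_row(cell)), 3))
```

Swapping `fit` for a different learner changes only the statistical model; the ecological choices in `PREDICTORS` and the data in `layers` stay the same, which is one reason to keep the three components separate.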

Ecological model

Modellers should know something about the ecology of their species, including its life history, tolerance thresholds, resource needs, reproductive strategies and, in the case of animals, its behaviour. This stage is sometimes overlooked when modellers are highly quantitative researchers with a rich data set. Nonetheless, knowing something about the ecology of the organism being modelled will generally lead to a better model. Folmer et al. describe a conceptual model for their species which is akin to an ecological model: it describes how environmental factors influence seagrass distribution. They explain how the physiology of their species is adapted to intertidal zones, where the plants must tolerate both full submersion and exposure as well as a wide range of salinity levels. The benthic environment that the modelled species inhabit is highly dynamic, which makes it challenging to develop a data model for predicting occurrence.

Data model

The data model is the suite of independent variables (“predictor variables”) used to predict where a species occurs. The list of variables is informed by the ecological model. Sometimes this information is not well known or, as in the case of Folmer et al.’s work, it is scattered across a variety of sources, including historical maps and surveys, published papers and grey literature. For a species and system that spans three countries (The Netherlands, Germany and Denmark), this requires compiling information from many sources. Once the list of predictor variables is developed, modellers must assess whether data are available for each of them, usually in the form of GIS layers. For some environmental variables, such as elevation or bathymetry, static layers are readily available. In dynamic environments such as the seagrass beds modelled by Folmer et al., however, the predictors must themselves be modelled. Folmer et al. draw on related research to model hydrodynamics, including currents, tides and freshwater runoff. This is a common feature of species distribution models in dynamic environments, but it carries a cost: the resulting habitat models inherit the uncertainty of the modelled environmental variables.
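The cost of modelled predictors can be illustrated with Monte Carlo error propagation. In this hedged Python sketch, `habitat_probability` is a hypothetical, already-fitted habitat model with invented coefficients, and the current velocity for one grid cell is assumed to come from a hydrodynamic model with normally distributed error. Drawing many plausible velocities and pushing each through the habitat model shows how the spread of the habitat prediction grows with the uncertainty of the predictor.

```python
import math
import random
import statistics


def habitat_probability(current_velocity, salinity):
    """Hypothetical fitted habitat model: suitability falls as currents get
    stronger. The coefficients are invented for illustration."""
    z = 2.0 - 4.0 * current_velocity + 1.5 * salinity
    return 1.0 / (1.0 + math.exp(-z))


def propagate(velocity_mean, velocity_sd, salinity, n_draws=5000, seed=1):
    """Monte Carlo propagation: draw plausible values of the modelled predictor
    and return the mean and standard deviation of the resulting predictions."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_draws):
        # Velocity cannot be negative, so negative draws are truncated at zero.
        v = max(0.0, rng.gauss(velocity_mean, velocity_sd))
        preds.append(habitat_probability(v, salinity))
    return statistics.mean(preds), statistics.stdev(preds)


# Same hydrodynamic estimate for one grid cell, with low vs. high uncertainty.
for sd in (0.05, 0.3):
    mean, spread = propagate(velocity_mean=0.5, velocity_sd=sd, salinity=0.8)
    print(f"velocity sd={sd}: habitat probability {mean:.2f} +/- {spread:.2f}")
```

The habitat model is identical in both runs; only the uncertainty of the modelled predictor changes, and the range of plausible habitat probabilities widens accordingly.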

Statistical model

The statistical model refers to the statistical tool(s) applied to generate the model. Heated debates in the modelling literature hinge on whether explanatory statistical models (such as GLMs) should be preferred over “black box” machine-learning tools. Machine-learning methods have generally been shown to predict better, but they are harder to interpret, which limits researchers’ ability to deduce what ecological mechanisms may be driving a system. However, the ecological model matters here too; as Folmer et al. point out, complex responses with non-linear relationships and unknown interactions may not be amenable to explanatory models. The purpose of a model should also be considered. Folmer et al. acknowledge this debate but chose a machine-learning approach because they wanted to predict where future surveys and restoration efforts could best be targeted. A recent development in modelling is the use of ensemble modelling and “transferability-consensus approaches”. The latter is the approach used in this paper, in which the authors combined a suite of machine-learning methods to develop the best model for predicting intertidal seagrass habitat across the entire Wadden Sea.
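The consensus idea can be sketched in a few lines of Python. This is an illustrative sketch, not the transferability-consensus procedure of Folmer et al.: the three model outputs and the `transfer_skill` scores are invented, and weighting each model by its skill on an independent region is an assumed scheme chosen only to show how transferability might enter a consensus. The sketch averages predicted habitat probabilities across models and also reports the fraction of models that classify each cell as suitable.

```python
# Predicted probability of seagrass habitat for four grid cells from three
# hypothetical machine-learning models. The numbers are invented.
predictions = {
    "model_a": [0.91, 0.40, 0.15, 0.70],
    "model_b": [0.85, 0.55, 0.10, 0.40],
    "model_c": [0.78, 0.35, 0.25, 0.65],
}

# Hypothetical skill of each model when transferred to an independent region
# (e.g. a performance score on data from a region not used for fitting).
transfer_skill = {"model_a": 0.80, "model_b": 0.60, "model_c": 0.75}


def consensus(preds, skill=None):
    """Average predictions across models, optionally weighting each model by
    its (assumed) transferability skill."""
    names = list(preds)
    weights = [skill[n] if skill else 1.0 for n in names]
    total = sum(weights)
    n_cells = len(preds[names[0]])
    return [
        sum(w * preds[n][i] for w, n in zip(weights, names)) / total
        for i in range(n_cells)
    ]


def agreement(preds, threshold=0.5):
    """Fraction of models that classify each cell as suitable habitat."""
    names = list(preds)
    n_cells = len(preds[names[0]])
    return [
        sum(preds[n][i] >= threshold for n in names) / len(names)
        for i in range(n_cells)
    ]


print([round(p, 2) for p in consensus(predictions)])
print([round(p, 2) for p in consensus(predictions, transfer_skill)])
print([round(a, 2) for a in agreement(predictions)])
```

Reporting agreement alongside the averaged probability flags cells where the models disagree, which may help when deciding where new surveys would be most informative.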

A schematic of the modelling process (based on ideas in Austin 2002). Rectangles indicate inputs, trapezoids indicate processes and the circle indicates the output.

So, are models useful?

Should scientists use models if they are wrong? Folmer et al., and many others, would agree that yes, models have a place. But as the authors conclude in their paper, “… it is crucial that scientists and managers collaborate to avoid misunderstanding and to develop insights and models that are apt for the problems at hand.” Thus, for a given problem, some models will be more useful than others, and modellers should be transparent about the decisions they made in developing their approach.