Thresholds for adding degraded tropical forest to the conservation estate

All data manipulation, data analysis and construction of figures were conducted in the R v.4.02 computing environment⁴⁶, using the packages ape (v.5.0)⁴⁷, betareg (v.3.1-4)⁴⁸, dplyr (v.1.1.4)⁴⁹, lme4 (v.1.1-35.1)⁵⁰, lmtest (v.0.9-40)⁵¹, lubridate (v.1.9.3)⁵², MASS (v.7.3-60.0.1)⁵³, openxlsx (v.4.2.5.2)⁵⁴, paletteer (v.1.6.0)⁵⁵, pastecs (v.1.4.2)⁵⁶, png (v.0.1-8)⁵⁷, raster (v.3.6-26)⁵⁸, reshape2 (v.1.4.4)⁵⁹, rgdal (v.1.6-7)⁶⁰, rgeos (v.0.6-4)⁶¹, safedata (v.1.1.3)⁶², scales (v.1.3.0)⁶³, sf (v.1.0-15)⁶⁴, spgwr (v.0.6-36)⁶⁵, stringr (v.1.5.1)⁶⁶ and strucchange (v.1.5-3)⁶⁷.

Taxa records and functional groups

We summarized taxon responses from 8,130 combinations of surveys and taxa. We compiled biodiversity data from 55 published data sources^{68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122} (Supplementary Table 1), from which we extracted presence–absence data following the methods of ref. ¹²³. Previous analyses of multi-taxa biodiversity data have demonstrated that comparisons of presence–absence data among taxa are more robust than analyses of abundance data^23,124. Moreover, abundance data were not available for all taxa, meaning that presence–absence data are the highest-level data that allowed us to use exactly the same analysis method for all taxa. Data sources that sampled multiple years were split into separate, annual surveys, allowing us to more accurately align biodiversity observations with forest degradation measurements taken at different time points, and to account for year-to-year variation in taxon-specific responses to the same ecological gradient¹²³. Data sources that included multiple sampling methods were also split into separate, method-specific surveys¹²³. This process resulted in a total of 127 surveys being used for analysis.

Not all taxa in all surveys were identified to species or morphospecies level. We retained data on taxa identified to higher taxonomic levels because these could often be confidently placed into valid functional groups for analysis. Our data encompassed 4,691 taxa distributed widely across the terrestrial tree of life (Extended Data Fig. 1), of which 1,777 were identified to species and a further 2,288 to morphospecies. We restricted our statistical analyses to 1,681 taxa that had ≥5 occurrences (Extended Data Fig. 1), of which more than half (n = 946) were observed in more than 1 survey (Extended Data Fig. 2). Sensitivity analyses on these same data have demonstrated that a cutoff of five occurrences is appropriate to generate consistently reliable results¹²³.

Taxonomic and functional groups

We aggregated taxa into high-level taxonomic and functional groups to examine group-specific trends. First, we categorized taxa into ten taxonomic groups for separate analysis (plants, arachnids, non-ant insects, ants, other invertebrates, mammals, birds, reptiles, amphibians and fish). Second, we compiled information on directly recorded morphological, functional and physiological traits for as many taxa as we could, which we used to allocate taxa into 126 functional groups (Supplementary Table 2). In doing so, we relied heavily on previously published surveys^{68,69,125,126,127,128,129,130}, literature reviews and expert knowledge.

We included International Union for Conservation of Nature Red List status¹³¹, which we collapsed into two categories: threatened (Critically Endangered, Endangered or Vulnerable) or not threatened (Least Concern, Lower Risk or Near Threatened). For plant taxa, we obtained data on wood density and photosynthesis rates⁶⁸, and used data on leaf area, leaf dry matter content and specific leaf area to estimate the strength of their association with each of three life history strategies: competitor, stress-tolerator or ruderal¹³². All plant traits were continuous, which we categorized into two groupings for analysis (low and high according to whether trait values were below or above the median respectively).

For animal taxa, we compiled data on body mass for mammals¹²⁵, birds¹³³, fish¹²⁶ and beetles¹²⁷ from previously published surveys, estimated amphibian body mass from snout–vent length measurements¹³⁴, and estimated ant body mass using a combination of morphometric data⁵⁹ and published scaling relationships⁶⁶. Body mass was categorized into three groupings (low, medium and high) separately for each taxonomic group. Grouping boundaries were set by log₁₀-transforming body mass and dividing taxa into three equal quantiles.

Animal taxa were assigned categories for physiology (endotherm or ectotherm), development (direct or indirect), sociality (solitary, pair, social or eusocial) and movement mode (winged, legged or legless). We used published records and expert knowledge to record non-mutually exclusive categories of forest strata use, classified as the strata where that taxon forages for food (subterranean, ground-dwelling, understorey, canopy-dwelling or aquatic), trophic level (saprophage, producer, herbivore, carnivore, parasite, parasitoid) and 21 diet categories (soil feeder, coprophage, necrophage, detritivore, saprophage, algivore, mycophage, rhizophage, folivore, florivore, nectarivore, palynivore, frugivore, granivore, xylophage, phloeophage, bacteriophage, invertivore, vertivore, piscivore, haematophage). For each of these last three functional traits, we counted the number of categories associated with each taxon, and categorized taxa as having either low or high generalism according to whether they fell above or below the median value for that trait.

Quantifying forest degradation

We followed the protocols described previously¹²³ to develop a quantitative metric of forest degradation. In brief, data were collected at the Stability of Altered Forest Ecosystems Project²⁶ study site in Sabah, Malaysia. Taxa were sampled at sites that varied in the extent of historical disturbance from unlogged, old-growth forest through to salvage-logged forest and into deforested sites converted into oil palm plantations. We based our degradation metric on above-ground carbon density (ACD; Mg ha⁻¹) derived from airborne LiDAR data^135,136. ACD values varied between 1 Mg ha⁻¹ in cleared areas to a maximum of 273 Mg ha⁻¹ in unlogged forest. For ease of interpretation, we converted ACD into a metric representing the percentage reduction in biomass relative to unlogged forest. We set the value of unlogged forest (0% biomass removal) to be the median biomass density observed in unlogged forest (230 Mg ha⁻¹). We chose to report values as a percentage as opposed to megagrams per hectare as it is more easily transferable to other tropical forests where the maximum ACD may vary¹³⁷. Forest degradation was quantified at two time points that approximately bracketed a salvage logging operation in the project area—November 2014¹³⁵ and April 2016¹³⁶—and taxa were analysed using the forest degradation values that were most closely matched in time to the date of the survey in which the taxon was observed.

The occurrence of a taxon at a given site is almost certainly a response to habitat conditions in a wider radius surrounding that site, so we calculated a spatial average to use as our predictor variable in analyses³³. We selected all pixels within a radius of 250, 500, 1,000, 2,000 and 4,000 m, respectively, around each sample site. Pixels within the buffer area were averaged, with pixels weighted using a Gaussian distance weighting to ensure that those located close to the sample site carried more weight than those located further away. The Gaussian distance weighting (W_g) was given by the equation:

$${W}_{{\rm{g}}}={{\rm{e}}}^{{\left(-\left(\frac{d}{h}\right)\right)}^{2}}$$

in which d represents distance from the central sample site and h gives the bandwidth that was calculated as the maximum buffer distance divided by 100 (ref. ¹³⁸).

Quantifying and summarizing responses

We focus our analyses on the response patterns of individual taxa or functional groups and not aggregated metrics such as species richness¹³⁹ or coarse, vote-counting comparisons of the number of positively versus negatively affected taxa¹⁴⁰. We take this approach because turnover in the identity of taxa and functions are more sensitive measures of changes in biodiversity and ecosystem function. We focus instead on the locations of significant changes in response along the forest degradation gradient as opposed to the signs of those changes.

We test for two conservation-relevant patterns of change in the responses of individual taxa to forest degradation (Extended Data Fig. 3)—change points: the point at which forest degradation first exerts a discernible impact on the occurrence pattern of a taxon or functional group¹⁴¹; and maximum rate points: the point along the degradation gradient where the rate of change in occurrence is most rapid.

We use the aggregation of change points across taxa and functional groups along the forest degradation gradient to identify thresholds for prioritizing proactive conservation, whereas the aggregation of maximum rate points indicates locations where relatively small changes in habitat quality can have the largest impact on the system. If the pattern by which biodiversity recovers from logging is the reverse of the pattern by which it is affected by logging—that is, if there is no hysteresis³⁷—then maximum rate points represent thresholds at which reactive conservation actions, such as forest restoration, are likely to be most effective. This is because conservation actions that add small amounts of biomass to the forest are expected to result in the largest collective change in the occurrence patterns of the affected taxa.

Occurrence models

We standardized all taxon observations to presence–absence data. To generate equivalent data for functional groups, within each survey we aggregated the presences of all taxa that belonged to a particular functional group. For each taxon and functional group, we then determined which survey(s) contained relevant data and combined all observations into a single data frame for analysis. Only taxa or functional groups that had ≥5 occurrence records were analysed, and this is the threshold value that results in repeated single-year surveys having the most consistent ecological results¹²³. All individual taxa and functional groups were analysed independently of each other.

All models tested for an effect of percentage forest degradation on the probability of occurrence. Forest degradation was calculated at each of the five buffer sizes, and we selected the most appropriate spatial scale using the Akaike information criterion⁵³. Statistical significance of the best model was determined with a log-likelihood ratio test comparing the best model to a null model. We tested for a main, linear effect of forest degradation alone. This was because visual inspection and diagnostic plots of exploratory analyses containing a polynomial term failed to identify clear cases of taxa that had peaks in occurrence at intermediate levels of biomass removal.

If a given taxon or functional group was present in more than one survey, we first used a binomial generalized linear mixed model (GLMM) including a random intercept term for survey identity. If GLMMs failed to converge, or if the taxon or functional group was present in only a single survey, we used binomial generalized linear models (GLMs). We were able to fit GLMMs to 798 out of the 946 taxa that were observed in multiple studies (84% of fitted models) and 72 functional groups (59%). The main reason by which GLMMs failed to converge was because taxa or functional groups observed in multiple datasets were not necessarily observed equally in all datasets, and low numbers of observations in one or more surveys can limit the ability of a GLMM to estimate survey-specific random effects.

We opted not to use modelling methods that directly control for detectability, as such models routinely failed to converge in preliminary analyses. This problem is often encountered for analyses of tropical biodiversity in which many species are rare and have low detection probabilities¹⁴². We note, however, that detectability models of species occupancy patterns along ecological gradients do not differ greatly from models that ignore detection probability¹⁴², so we do not expect our choice of approach will notably influence our key results.

Maximum rate points

We used the first derivatives of fitted models to find the point along the forest degradation gradient where the predicted rate of change in occurrence is most rapid¹⁴¹, which we termed the ‘maximum rate point’ (Extended Data Fig. 3c). This point was numerically estimated by identifying the point at which the predicted occurrence pattern from the binomial GLM had the highest absolute slope (as represented by the root of the second derivative), and corresponds to the point along the habitat degradation gradient where the probability of occurrence is 50%. We used absolute slope as occurrence patterns may either increase or decrease along the forest degradation gradient, resulting in positive or negative slopes, respectively.

Change points

We used the second derivatives of fitted models to find change points of the fitted binomial models (Extended Data Fig. 3c,d), which represent the point along the forest degradation gradient where the rate of change in occurrence is itself changing the fastest¹⁴¹. As with the maximum rate points, these were numerically estimated by identifying the point at which the first derivative of the binomial GLM had the highest absolute slope (as represented by the root of the third derivative).

Binomial GLMs with significant slopes have a change point on either side of the maximum rate point, and we focused our analyses on the point at the higher value of forest quality (lowest amount of biomass reduction in Extended Data Fig. 3). These represent the change points at which taxa first begin to respond to reductions in forest quality. Change points are undefined for models with no significant slope.

Taxonomic bias in results

Although the taxa we examined were diverse and are widely distributed across the tree of life (Extended Data Fig. 1), they are not evenly distributed across the tree of life. If the different taxa exhibit consistent variation in the pattern of their responses, this taxonomic bias might affect our overall conclusions. To test for this, we modelled both maximum rate points and change points as a function of taxonomic group, and used log-likelihood ratio tests to compare both models against a null model. There was no significant effect in either case (change points: ${\chi }_{9}^{2}=2.79$, P = 0.97; maximum rate points: ${\chi }_{9}^{2}=9.78$, P = 0.37), indicating that taxonomic bias in our dataset is unlikely to influence the interpretation of our results.

Temporal bias in results

Environmental conditions might influence the outcome of ecological studies¹⁴³. If the surveys we analyse here are unequally distributed through time, and taxon responses to habitat degradation are time-dependent, then temporal autocorrelation might influence our conclusions. In a separate analysis of the same data used in this study, we have quantified this effect and demonstrated that it is not a concern¹²³. We examined whether taxon-specific occurrence patterns across the habitat degradation gradient varied among surveys and years. We found that although occurrence patterns do vary among surveys, there was no consistent signal of survey year on those patterns. Specifically, the number of years between two surveys had no significant impact on the probability of two surveys reporting statistically indistinguishable response patterns.

Long-term shifts in the composition of forest communities might mean that the biodiversity patterns we associate with primary forest in our data are themselves depauperate relative to historical patterns¹⁴⁴. Similarly, the complex logging history of our study site with repeated, but unequally distributed, rounds of logging means that many sites have been through multiple stages of degradation separated by partial recovery^145,146. Our data are not sufficient to quantify historical patterns of occupancy or the impact of time lags on trajectories of occupancy, so we are unable to directly test for these effects. Nonetheless, long-term declines and local extinction of megafauna such as the Sumatran rhino Dicerorhinus sumatrensis harrissoni¹⁴⁷ make it likely that a shifting baseline is a valid concern at our study site. However, we have no way of knowing whether the rates of biodiversity change from the processes that might generate baseline shifts will be the same or different in primary and logged forest. Consequently, we can only emphasize that our analyses are based on a space-for-time substitution, which makes the implicit assumption that the effects of habitat degradation we quantify are additional to, and do not interact with, any other processes contributing to long-term biodiversity change.

Identifying thresholds

We fitted density curves to model the distribution of taxa and functional group change points along the forest degradation gradient. Density curves were fitted using the kernel density estimation function with default settings in the ‘stats’ package⁴⁶. Estimates were extracted, and we used breakpoint regression on the fitted density distributions to identify the number and location of thresholds in aggregated biodiversity and functional group responses to forest degradation.

Thresholds differ from the analysis of individual change points in that they are based on the aggregation of all change points. Whereas change point analysis identified locations where the occurrence pattern of an individual taxon changes, the thresholds identified here represent locations where there is a change in the accumulated responses of the 1,681 taxa or 126 functional groups. Two classes of thresholds are possible: breakpoints signalling either an increase or decrease in the rate of accumulation of affected taxa or functional groups. The former are acceleration points that signify locations at which the situation becomes worse, in that the rate at which the number of affected taxa or functional groups begins to increase (or the rate of decline begins to slow down) as forest degradation increases.

We repeated this approach using the distribution of maximum rate points for both taxa and functional groups. In all cases, the breakpoint regression identified an optimal model containing two acceleration breakpoints. We set the threshold for proactive and reactive conservation to be based on the first and last acceleration points, respectively. For each type of conservation, there were four proactive and four reactive thresholds estimated; one each for taxa change points, taxa maximum rate points, functional group change points and functional group maximum rate points. To obtain an aggregate threshold for proactive and reactive conservation, we used the mean of these four values.

We used bootstrapping to estimate a 95% confidence interval around these means by resampling the fitted models 100 times and estimating the 2.5 and 97.5% quantiles around the threshold estimates.

Vulnerability of taxonomic and functional categories to forest degradation

We combined two metrics to estimate the relative vulnerability of taxonomic and functional groups to forest degradation: probability of impact, defined as the proportion of taxa within that group that exhibited a change point; and severity of impact, defined as the mean location of change points among taxa within that group. Specifically, probability of impact (PI) is calculated as:

$${\rm{PI}}=\frac{{\sum }_{t=1}^{N}{I}_{t}}{N}$$

in which N represents the number of taxa within that taxonomic category, I_t is a binary outcome representing whether taxon t is significantly affected by forest degradation, calculated as:

$${I}_{t}=\left\{\begin{array}{c}1,{\rm{if}}\,{p}_{t} < 0.05\\ 0,{\rm{if}}\,{p}_{t}\ge 0.05\end{array}\right.$$

and ${p}_{t}$ is the P value from the analysis of taxon t’s occurrence pattern in response to forest degradation. Taxonomic categories with large numbers of affected taxa have high probability of impact values. Correlation analyses demonstrated that there was no impact of sample size (the number of taxa per group) on probability of impact for either taxonomic groups (r = −0.21, d.f. = 8, P = 0.56) or functional groups (r = −0.11, d.f. = 45, P = 0.45).

Severity of impact (SI) is calculated as:

$${\rm{SI}}=1-\frac{{\sum }_{t=1}^{N}{{\rm{CP}}}_{t}}{N\times 100}$$

in which CP_t is the change point of taxon t’s response pattern to forest degradation (Extended Data Fig. 3c), and scales such that categories containing many taxa that tend to be affected after the removal of small amounts of biomass have high severity of impact values. The change point for taxa that are not affected by forest degradation $({p}_{t} > 0.05)$ is undefined, but excluding them from the severity of impact calculation would skew severity estimates: categories with large numbers of unaffected taxa would retain the severity value calculated from the small number of affected taxa. We therefore assigned unaffected taxa a change point of 100 before calculating severity. This value indicates that the taxon is not affected until 100% of biomass has been removed, and represents the least-sensitive, real-world change point value.

Both probability of impact and severity of impact are bounded at zero and one, and we combined them into a single metric of vulnerability (V) calculated as

$$V={\rm{PI}}\times {\rm{SI}}$$

which is also bounded at zero and one. Taxonomic categories containing a high proportion of taxa that are affected by low amounts of biodiversity loss have high vulnerability values. By contrast, categories in which a low proportion of taxa are affected, and the taxa that are affected experience change points only after the removal of large amounts of forest biomass, have the lowest vulnerability values.

To summarize functional vulnerability, we categorized functional groups into ten higher-level categories: Red List status, habitat strata, physiology, development, sociality, movement, diet, trophic, body mass and plant traits (for all plant-specific functional groups). Within each category, we treated the individual functional groups as replicates, allowing us to calculate the probability of impact, severity of impact and vulnerability of broadly categorized functional responses.

Inclusion and ethics

All data used were collected in Malaysia. Non-Malaysian researchers conducting field work collaborated with local researchers throughout the research process. All local collaborators were invited to co-author this publication, as were all Malaysian research students involved in data collection.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Source link