Skip to main content

Hidden diversity: DNA metabarcoding reveals hyper-diverse benthic invertebrate communities



Freshwater ecosystems, such as streams, are facing increasing pressures from agricultural land use and recent literature stresses the importance of robust biomonitoring to detect trends in insect decline globally. Aquatic insects and other macroinvertebrates are often used as indicators of ecological condition in freshwater biomonitoring programs; however, these diverse groups can present challenges to morphological identification and coarse-level taxonomic resolution can mask patterns in community composition. Here, we incorporate molecular identification (DNA metabarcoding) into a stream biomonitoring sampling design to explore the diversity and variability of aquatic macroinvertebrate communities at small spatial scales. While individual stream reaches can be very heterogenous, most community ecology studies focus on larger, landscape-level patterns of community composition. A high degree of community variability at the local scale has important implications for both biomonitoring and ecological research, and the incorporation of DNA metabarcoding into local biodiversity assessments will inform future sampling protocols.


We sampled twenty streams in southern Ontario, Canada, for aquatic macroinvertebrates across multiple time points and assessed local community variability by comparing field replicates taken ten meters apart within the same stream. Using bulk-tissue DNA metabarcoding, we revealed that aquatic macroinvertebrate communities are highly diverse at small spatial scales with unprecedented levels of local taxonomic turnover. We detected over 1600 Operational Taxonomic Units (OTUs) from 149 families, and a single insect family, the Chironomidae, contained over one third of the total number of OTUs detected in our study. Benthic communities were largely comprised of rare taxa detected only once per stream despite multiple biological replicates (24–94% rare taxa per site). In addition to numerous rare taxa, our species pool estimates indicated that there was a large proportion of taxa that remained undetected by our sampling regime (14–94% per site). Our sites were located across a gradient of agricultural activity, and while we predicted that increased land use would homogenize benthic communities, this was not supported as within-stream dissimilarity was unrelated to land use. Within-stream dissimilarity estimates were consistently high for all levels of taxonomic resolution (invertebrate families, invertebrate OTUs, chironomid OTUs), indicating stream communities are very dissimilar at small spatial scales.

Peer Review reports


As the effects of climate change become more severe and we enter a sixth mass extinction event [1], it is now more than ever critical to conserve vulnerable habitats and slow the rate of biodiversity loss. While trends in vertebrate species have been easier to document and traditionally received more attention [e.g., 24], the importance of insects and threats towards them have garnered a broader interest in the past decade with the publication of alarming trends in insect decline. For example, Hallman et al. [5] estimated that there has been a 75% decline in flying insect biomass in Germany since the late 1980s, and Sánchez-Bayo and Wyckhuys [6] have predicted that 40% of insect species will be extinct in the next few decades. While there has been debate whether the decline will be as severe as Sánchez-Bayo and Wyckhuys [6] predicted [e.g., 7], it nevertheless remains clear that there is a consistent pattern of insect decline across a broad range of taxonomic groups and habitats in response to climate change and land use [8,9,10]. Freshwater habitats, such as streams, are particularly threatened by anthropogenic land use and climate change, despite their irreplaceable ecosystem services [11, 12]. The biodiversity of freshwater systems is declining at a faster rate than marine or terrestrial habitats [13], stressing how essential stream biomonitoring and conservation projects are to preserve these ecosystems. Biomonitoring assessments often incorporate aquatic invertebrates, which are key bioindicator taxa and sensitive to habitat disturbances [14]. However, these groups present challenges to traditional morphological identification. For example, many immature larval stages cannot be reliably identified to species-level due to the lack of diagnostic characters which can result in identification errors [15, 16]. Due to these constraints, environmental assessments using morphological identifications often use coarse taxonomic resolution (e.g., family-level) as a surrogate for species-level identification, which could potentially mask species-level turnover within a family or not prove sensitive enough to detect impairment [17]. The limitations in time and financial resources, large volumes of samples, and either coarse taxonomic resolution or narrow taxonomic focus can be impediments to monitor stream systems exposed to complex physical and chemical stressors.

The above challenges, combined with the need for species detection in an ongoing biodiversity crisis, has prompted research programs which suggest that molecular tools can provide a promising future for freshwater biodiversity assessments [18,19,20]. Over the past decade, there have been major advancements in molecular identification tools (e.g., high-throughput sequencing or metabarcoding; [21, 22]), which have the potential to be incorporated into biomonitoring programs and ecological research. Metabarcoding has huge potential for environmental assessments as high-throughput sequencing (HTS) platforms can efficiently sequence and identify entire samples [22, 23]. In previous studies comparing morphological identification and DNA metabarcoding of invertebrate communities, molecular methods have proven either equally or more effective than traditional approaches at investigating patterns of biodiversity [23,24,25,26]. Very speciose taxa can especially benefit from metabarcoding applications, such as the family Chironomidae (non-biting midges, a ubiquitous group of flies with a freshwater larval stage), which occupy most freshwater habitats and are often grouped at family-level resolution in assessments [27].

While metabarcoding provides an avenue for efficient and cost-effective macroinvertebrate identification for biomonitoring programs, the variability of stream habitats can make it challenging to determine whether sampling efforts have been sufficient. Local micro-habitat level conditions, such as riparian vegetation, sediment type, organic matter, and flow regime (e.g., riffle versus pool), are important factors in structuring benthic communities [28, 29]. Since stream microhabitats can vary on small spatial scales, this creates a very patchy system in terms of physical stream attributes and resources, and thus affects community composition of aquatic invertebrates [30]. This spatial extent knowledge gap, combined with the potentially unknown taxonomic diversity of benthic invertebrates, can be uniquely answered by sampling methods based on metabarcoding principles to inform biomonitoring protocols and efficiently record biodiversity in stream systems.

Small-scale variations in stream habitats can be caused by both natural and anthropogenic processes, although loss of microhabitats (e.g., habitat homogeneity) is often associated with adjacent agricultural land use, which results in channelization and reduction of riparian vegetation [31]. Heterogeneous habitats tend to support higher species richness [32, 33], and it is perhaps unsurprising that increasing agricultural land use in the surrounding catchment area can cause taxonomic and functional homogenization in aquatic macroinvertebrates, resulting in more similar communities of ‘tolerant’ taxa [34, 35]. However, the relationship between beta diversity (a measurement of community dissimilarity) and both land use and habitat heterogeneity in stream macroinvertebrates is unclear and often case dependent. In Finland, Heino et al. [36] determined that heterogeneity was not a significant predicator of invertebrate beta diversity in streams using a mix of species and genus-level identifications. Additionally, research using predominately genus-level identifications by Petsch et al. [37] concluded that land use did not cause homogenization of stream invertebrates in boreal (Finland) or subtropical (Brazil) regions. However, contrasting patterns have been detected in New Zealand, where habitat heterogeneity was a strong driver of beta diversity in stream invertebrate communities [38] and in North America (Maryland, USA) where Maloney et al. [39] detected a negative relationship between beta diversity and increased pasture and crop cover. The above studies of stream invertebrate beta diversity patterns are performed at large spatial scales (e.g., comparing beta diversity between major watersheds or geographic regions), and there are few stream studies that explore spatial resolution at small scales (e.g., microhabitat level, but see [36, 40]).

In this study, we used bulk tissue DNA metabarcoding to determine how variable benthic invertebrate communities are at small spatial scales across three time points in a single year. We determined the importance of taxonomic resolution in revealing biodiversity patterns by performing all analyses at both family-level and OTU-level resolution. As chironomid OTUs generally comprise high levels of diversity in freshwater samples, we also repeated all analyses using only OTUs from this family. We assessed whether overall taxonomic richness is linked to land use and calculated within-stream dissimilarity to determine if small-scale changes in community composition are influenced by agricultural activity. We hypothesized that agricultural landscapes homogenize stream communities and cause more uniform benthic communities due the loss of habitat complexity, and therefore predicted that within-stream dissimilarity will decrease (e.g., more homogenous communities) as the percentage of agricultural land use in the catchment increases. We also explored how stream biodiversity estimates change between different levels of taxonomic resolution by calculating rarefaction curves and estimating the total regional species pool, calculating the sampling coverage at each stream site and determining the percentage of a local community made up by rare taxa in order to inform future sampling efforts.


Site selection and stream sampling

We collected benthic macroinvertebrates from twenty streams in southern Ontario across three time periods (May, July, and September 2019; Fig. 1). We selected streams on a continuum of surrounding land use, and sites were located either on Conservation Authority property or privately owned land (farm sites), and additionally were required to be wadable and wet for the entire study period. We used the Ontario Flow Assessment Tool (OFAT) [41] to determine stream watershed boundaries in ArcGIS v. 10.6.1 [42] and the Ontario Land Cover Compilation v. 2.0 [43] to determine the percentage of agriculture land use (cropping) surrounding each stream site.

For field collection of aquatic macroinvertebrates, we collected four biological replicates within each stream (i.e., four bulk samples per stream) by selecting four transects that were approximately 10–20 m apart and positioned downstream to upstream to avoid contamination from sampling-related disturbance. We placed transects to include multiple microhabitats if present (e.g., riffles and pools, different substrates) and collected benthic macroinvertebrates and associated habitat information based on the Ontario Benthic Biomonitoring Network (OBBN) [44] and the Ontario Stream Assessment Protocol (OSAP) [45]. Each sample consisted of a 3-minute travelling kick-and-sweep using a 500 μm D-net across the width of the stream. We then transferred the bulk sample to a 500 μm mesh sieve for rinsing and the removal of large debris, before storing in a sample container and preserving in 95% ethanol on site. We kept the invertebrate samples in a chilled cooler until transfer to the lab on the same day, where they were stored in a 4 °C fridge until further processing. All sampling equipment (e.g., nets, sieves, forceps, waders) was cleaned with a 10% bleach solution and rinsed with de-ionized water (DI) between sites. In total, we collected four biological replicates per stream and 80 samples each month, for a total of 240 bulk samples.

Fig. 1
figure 1

A map of stream sampling sites in conservation areas (blue circles) and privately owned farms (orange squares) in southern Ontario, Canada, along the north shore of Lake Erie. The map was created using QGIS [46]. The black outlines demonstrate quaternary watershed boundaries where we sampled. The inset map shows the province of Ontario in white with our study region outlined in a black rectangle

Sample sorting and DNA extraction

Bulk macroinvertebrate samples were rinsed with DI water over a sterilized 500 μm sieve and sorted under a dissection microscope. Benthic macroinvertebrates were removed from sample debris and placed in a sterile 20 mL tube containing 95% ethanol and ten 4 mm diameter steel beads for later homogenization. As many bulk samples were very large, we used a subsampling approach based on equal effort by stopping sorting after 4 h had elapsed. The unsorted portion of sample was placed on a white grid and scanned for 2 min for any rare taxa that had not been encountered during the initial subsampling. After sorting had been completed, excess ethanol was removed by a pipette and samples were air dried for 1 week while covered with a kimwipe. The dry biomass of each sample was recorded, and then samples were homogenized within their sampling tubes using an IKA Tube Mill (IKA, Staufen, Germany) at 4000 rpm for 15 min. Smaller samples were ground in a 2 mL sterile tube with two steel beads using a TissueLyser II (Qiagen, Hilden, Germany) at 30 Hz for 1 min. We subsampled 20 mg (± 1 mg) of ground tissue into a sterile 2 mL tube and used a DNeasy Blood & Tissue Kit (Qiagen, Hilden, Germany) following manufactures’ guidelines for DNA extraction, follow by quantification using a Qubit 3.0 Fluorometer (ThermoFisher Scientific, MA, USA). Several samples contained less than 20 mg of tissue, and the entire sample was used for DNA extraction in place of sub-sampling.

PCR amplification and library preparation

We used a two-step PCR library preparation protocol to first amplify our target region followed by a second indexing reaction [see 47]. For the first step, we used the Qiagen multiplex PCR kit (Qiagen, Hilden, Germany) as our master mix and selected a primer pair (BF2 + BR2; [48]) that has been successful at amplifying a broad range of invertebrate taxa, including aquatic invertebrates collected from our study region [49, 50]. We selected the mitochondrial cytochrome c oxidase subunit I (CO1) gene as our marker and the BF2 + BR2 primer set targeted a 421 bp region to amplify in our initial PCR reaction. Each reaction consisted of 2.5 µL DNA extract, 12.5 µL of 2x Qiagen master mix, 9 µL of molecular water, and 0.5 µL of each primer (BF2 + BR2, reaction concentration of 0.2 µM) for a total reaction volume of 25 µL. Our thermocycling profile followed Qiagen’s manufacturer’s protocol: a 95 °C initial denaturation for fifteen minutes, followed by 25 cycles of 94 °C for 30 s, 50 °C for 90 s, 72 °C for 60 s, and a final extension at 72 °C for ten minutes and visualized using precast 2.0% agarose e-gels (E-Gel 96 SYBR Safe DNA stain; ThermoFisher Scientific, MA, USA). Each sampling period (e.g., month) consisting of 80 samples were prepared in their own plate, in addition to 6 PCR negative controls, 1 sequencing negative control, and 1 extraction negative control. We included 8 PCR technical replicates per plate to ensure PCR reproducibility and explicitly selected samples of both lower and higher invertebrate abundance. This resulted in three 96-well PCR plates consisting of 240 samples, 24 technical replicates, and 24 negative controls. The resulting PCR products were purified using NucleoMag NGS clean up and size select magnetic beads (Macherey-Nagel, USA) with an 0.8x ratio of beads to PCR product as per Milián-García et al. [51].

A second PCR reaction was prepared using indexing primers to tag samples for library preparation. Here, we prepared a 50 µL PCR reaction using 5 µL of our purified PCR product, 25 µL of 2x Qiagen master mix, 10 µL of molecular water, and 5 µL each of forward and reverse indexing primers (initial concentration 10 µM) based on Illumina’s standard indexing protocol. The thermocycling profile included an initial denaturation at 95 °C for fifteen minutes, 8 cycles of 95 °C for 30 s, 55 °C for 30 s, 72 °C for 30 s, and a final extension of 72 °C for five minutes. After e-gel visualization to confirm amplification, we again purified the PCR products using NucleoMag beads (0.6x ratio, [51]). After a final visualization, we submitted the prepared libraries to the Advanced Analysis Center at the University of Guelph. Each plate was normalized, pooled, and sequenced separately on the Illumina MiSeq platform for a total of three separate runs.

In some cases, samples did not perform well in sequencing and were subsequently filtered out of the dataset based on low sequence read (36 samples filtered out with fewer than 80k sequences). Most of the failed samples came from the same streams and had lower-than-average DNA concentration, and we re-ran these samples following the same protocol as above, but instead increased the template volume to 5 µL in the initial amplification PCR and added an additional 10 cycles to the thermocycling program for the first PCR (CO1 amplification) and submitted a fourth plate for sequencing as above to replace failed samples.

Bioinformatics pipeline

We used the bioinformatics platform JAMP v. 0.67 ( to process the raw sequence data. The protocol is listed in detailed in Persaud et al. [50], but in brief this involved paired-end merging of de-multiplexed reads using USEARCH v. 11.0.6668 [52] followed by trimming primer sequences from reads using cutadapt v. 1.15 [53]. We assessed sequence size by filtering out any that were more than ten base pairs longer or shorter than our target (421 bp) and filtered out low-quality sequence with expected errors ≥ 1. We used USEARCH v. 11.0.6668 [52] to cluster quality sequences from all runs into Operational Taxonomic Units (OTUs) using a 97% similarity threshold, and OTUs with less than 0.01% abundance across all samples were filtered out [24, 54]. We matched our OTUs to the Barcode of Life Data System reference sequence library (BOLD) [55] using the Python program BOLDigger [56]. Raw sequences are available on NCBI’s Sequence Read Archive (BioProject ID: PRJNA783201) and our final OTU table with sequence reads per sample and associated taxonomic metadata are available as supplementary information.

Data quality control

All of our statistical analyses and figures were performed using R version 4.0.3 [57], and all plots were created using the package ggplot2 v.3.3.3 [58]. We used the R package metabaR v. 1.0.0 [59] to assess the quality of our metabarcoding data, including confirming sequencing depth was appropriate and checking for contamination on sequences present in our negative controls (see Additional file 1: Appendix 1 for further details). A sequence was identified as a contaminate if it had a relative abundance that was highest in a negative control (as no other DNA should be present in negative controls, contaminants should be preferentially amplified). A sample would be flagged as failed if more than 10% of its total sequence reads corresponded to an OTU identified as a contaminant. Based on the small number of reads in sequencing controls, we tested multiple filtering thresholds to lower the influence of tag jumps and prevent false positives. The abundance of an OTU in sample was changed to zero if the relative abundance of that OTU was less than 0.001% of the total abundance of that OTU in the entire dataset. We then filtered out samples with low sequence reads (less than 1 SD below average sequence read; 87,344) and assessed the quality of technical replicates for reproducibility based on Bray-Curtis distances within and between samples (e.g., contrasting the dissimilarities in OTU composition). A sample was flagged as a failed if the distance within a sample (e.g., between technical replicates) was greater than the threshold of the intersection value for within and between sample distances. We filtered out any non-target taxa and retained only arthropods, annelids and molluscs which were the three must abundant phyla in terms of both total sequence reads and OTU counts. We did not include nematodes in any further analysis as we did not obtain many sequences or OTUs matching to this group, likely due to a combination of the small size of some species and primer bias. Finally, we filtered out poor-quality taxonomic matches (< 90% match to reference database). After cleaning the data using metabaR, we calculated the total number of sequences and OTUs that were removed from the dataset during this process. See Additional file 1: Appendix 1 for additional details and figures for the above protocol.

Statistical analysis

To assess the influence of taxonomic resolution on ecological patterns, we analyzed the data first at family-level and OTU-level resolution. We additionally analyzed the chironomid OTUs on their own as this family made up a large portion of the diversity within our dataset. All data were analyzed at three time points in our data set (May, July, September). To determine if taxon richness varied between stream type (e.g., located in a conservation area or on private property) or between sampling months, we used the ‘lmer’ function in the R package lmerTest v. 3.1-3 [60] to perform a mixed effects model to test the significance of site type and sampling month (and the interaction between them) on taxa richness for each identification level. We incorporated stream site as a crossed random effect to account for all streams being repeatedly sampled each month. We used lmerTest to generate p values for the fixed effects in the above and all subsequent models by using Satterthwaite’s method to approximate degrees of freedom for all F-tests (see [60]).

To assess how within-stream dissimilarity is influenced by surrounding land use, we calculated the Raup-Crick index between the four biological replicates (transects) within a stream using the ‘raupcrick’ function with 999 simulations in the R package vegan v. 2.5-7 [61] and then took the mean of all pairwise comparisons as the dissimilarity value for that site. Using Raup-Crick as a dissimilarity index is ideal for metabarcoding data as it treats the number of sequence reads per OTU as binary (e.g., presence/absence) as opposed to an abundance value, and it is based on occurrence probabilities in proportion to frequency of a species occurrence and thus should be robust to the influence of rare taxa and large differences in richness between sites. We calculated a mixed effects model for each identification level to test if within site dissimilarity (i.e., the mean of all within-site pairwise comparisons) was correlated with the percentage of agricultural land use or sampling month, and if there was an interaction between these two fixed effects.

We used the R package iNEXT v. 2.0.20 [62] to generate rarefaction and extrapolation curves to estimate the regional diversity in each sampling period (month) for macroinvertebrate families, macroinvertebrate OTUs and chironomid OTUs. To estimate how many undetected taxa remained at each stream site (e.g., locally), we calculated the number of expected total taxa at each based on Chao’s equation using the ‘specpool’ function in the R package vegan [61] and plotted this as the percentage of coverage achieved by dividing the observed number of taxa over the expected number of taxa. Finally, to assess how many rare taxa make up a local community, we calculated the percentage of taxa which only occurred in one (of four) biological replicates at each stream site. We calculated two separate mixed effects models to determine if either the percentage of sampling coverage or the percentage of rare taxa were significantly different between either sampling month or level of identification (i.e., families, all OTUs, chironomid OTUs). All raw data and associated R code are provided as supplementary information (see Additional files 23, 4).


Summary and taxonomic richness

We received 74,771,159 sequence reads from the four MiSeq runs (average of 18,893,999 sequences per run), and after post-bioinformatic processing our samples contained a total of 51, 334, 969 reads (average per sample 161,430 ± 69,286 standard deviation) and 2276 OTUs (average 72 per sample ± 33 SD), while all negative controls combined had only 1954 reads (average per control 57 ± 16 SD) and 241 OTUs (average per control 16 ± 13 SD). Notably, our OTU table originally contained 5597 OTUs which were filtered out after clustering for not meeting the 0.01% abundance threshold (these OTUs corresponded to only 33,840 sequences in total). Three OTU sequences were flagged as contaminants through metabaR [59], two of which were unidentified algae and one matched to a species of maple tree (Acer sp.), indicating that the most likely cause of the small amount of sequences in the negative controls were tag jumps during sequencing. No samples were flagged as contaminated, and all technical replicates passed our reproducibility criteria. Three samples were removed from the dataset for sequencing depth lower than one standard deviation of the mean (< 86k sequences). The most common reason for an OTU to be removed from the final dataset was a match of less than 90% to the sequence reference database (BOLD). Post metabaR [59], our cleaned dataset contained a total of 41,978,040 high quality reads (177,123 per sample ± 46,881 SD) and 1681 macroinvertebrate OTUs (65 per sample ± 29 SD). We created two more data frames from the cleaned data, one with OTUs scaled back to family-level resolution (145 families total; average of 13 per sample ± 5 SD) and one containing only OTUs from the family Chironomidae (586 OTUs; average of 23 per sample ± 17 SD), as chironomids contained a third of the total diversity within the invertebrate dataset.

There was no significant influence of site type (CA versus farm; F1,18 = 0.39, p = 0.54) or sampling month (F2, 36 = 2.62, p = 0.09) on average OTU richness (Fig. 2). Likewise, there was no effect of either site type or sampling month on average family richness (type: F1,18 = 0.22, p = 0.75; month: F2,36 = 1.98, p = 0.15; Fig. 2). While site type had no effect on average chironomid OTU richness (F1,18 = 0.77, p = 0.72), there was slight evidence that sampling month was important (though not statistical significant; F2,36 = 3.12, p = 0.056; Fig. 2). For all above models, there was no significant interaction between site type and sampling month on taxonomic richness.

Fig. 2
figure 2

The average taxonomic richness per site in each category (conservation area - CA), n = 7, or farm, n = 13) over the sampling period (May, July, Sept, total n = 237). There was no significant effect of site type or sampling month on taxonomic richness for any group, though sampling month approach significance for chironomid OTUs (F1,36 = 3.12, p = 0.056). Note the different Y-axis scales in the 3 figures

Site dissimilarity and agricultural land use

Mean within-site Raup-Crick dissimilarity was not significantly correlated with the percentage of agricultural land or month for family-level identification (agriculture: F1,18 = 1.28, p = 0.27; month: F2,36 = 0.36, p = 0.70), OTU-level identification (agriculture: F1,18 = 1.74, p = 0.21; month: F2,36 = 0.51, p = 0.60), or for chironomid OTUs (agriculture: F1,18 = 1.26, p = 0.27; month: F1,36 = 0.09, p = 0.91). Dissimilarity was generally high for all sites (i.e., most values within range 0.5–1.0; Fig. 3).

Fig. 3
figure 3

The mean Raup-Crick dissimilarity of all pairwise comparisons within a site is plotted against the percentage of agricultural land use surrounding the stream site. There was no significant relationship between dissimilarity and agriculture or sampling month for any identification level

Rare and undetected taxa

We collected a total of 80 samples for each sampling period, but the species rarefaction curves did not level off (Fig. 4). Extrapolations estimated that at 150 sampling units, the number of new taxa would begin to level off for family-level identification and for chironomid OTUs. Based on Chao’s equation, we determined that each sampling month collected approximately 69% of total invertebrate families present, 62% of all invertebrate OTUs present, and 60% of chironomid OTUs present, and at the stream level this value ranged dramatically from 32–94% for families, 28–84% for all invertebrate OTUs, and 14–90% for chironomid OTUs, indicating that many taxa can be missed locally (Fig. 5). We observed that there were significant effects of both identification level (F2,152 = 7.5, p < 0.001) and sampling month (F2,152 = 5.26, p < 0.01). OTU-level resolution (for all invertebrates and just chironomids) had more undetected taxa than family-level resolution and September had the highest coverage for all groups (particularly families).

In addition to undetected taxa, we also calculated the proportion of rare taxa at each stream to determine how variable streams are at small spatial scales. Streams had on average 59% (range 40–87%) of invertebrate OTUs that were only detected in one of four biological replicates, indicating a high degree of turnover within a single site (Fig. 6). Likewise, there was an average of 46% of unique taxa per stream (range 24–71%) for invertebrate families and 61% (range 31–94%) for chironomid OTUs (Fig. 6). There was a significant difference in the number of rare taxa between different identification levels (F2,156 = 37.6, p < 0.001) and slight evidence that sampling month had an effect (though not statistically significant; F2,156 = 2.91, p = 0.057). There were less rare taxa in the family-level dataset compared to the OTU and chironomid OTU datasets.

Fig. 4
figure 4

Rarefaction curves for the number of taxa collected each sampling month where the solid line represents interpolated richness from our samples and the dashed line is an extrapolation based on expected number of taxa with continued sampling. Symbols indicate the point where our sampling stopped (n = 80). Note difference in y-axis scales

Fig. 5
figure 5

Estimated percentage of taxonomic coverage achieved at each stream by dividing the observed number of taxa over the extrapolated number of taxa expected based on Chao’s equation for each sampling month. Grey lines connect the same stream over time. The amount of ‘undetected’ taxa was significantly different between identification levels (F2,152 = 7.5, p < 0.001) and sampling months (F2,152 = 5.26, p < 0.01)

Fig. 6
figure 6

There were four biological replicates at each of the twenty streams to assess small-scale variation in community composition. Bars represent the percentage of the total number of taxa occurring at a stream that were only collected in one of four biological replicates (transects). Grey lines connect the same stream over time. The amount of ‘undetected’ taxa was significantly different between identification levels (F2,156 = 37.6, p < 0.001). Sampling month had no significant effect on the percentage of rare taxa (F2,156 = 2.91, p = 0.057)


Rare and missing taxa

Our approach of sampling at small spatial scales within streams revealed incredibly diverse benthic macroinvertebrate communities which varied considerably over short distances. While we collected a total of 80 samples for each sampling month, this was not sufficient to detect biodiversity at the OTU-level at either the local scale (within a stream; alpha diversity) or the regional scale (total species pool; gamma diversity). Even at family-level taxonomic resolution, we consistently underestimated total number of taxa present both locally and regionally, and this increased dramatically for OTUs. Our extrapolations suggest that after 150 samples, the accumulation curve would not be complete. This not only indicates a vast level of diversity masked at the family level, but also suggests that our sampling was not sufficient to capture to complete local community. Doubling our sampling effort (i.e., eight local transects per stream) would be unlikely to represent adequately OTU-level diversity as invertebrate OTUs continued to increase past our extrapolation threshold. While few aquatic metabarcoding papers explore biodiversity at the site level or compare observed versus expected number of taxa, morphological studies using family-level resolution have found similar results as our study. In shallow streams in Brazil, Ligeiro et al. [63] detected 53 invertebrate families and determined that 81% of them were rare (in this case based on a threshold of less than 1% abundance, which differs from our definition). However, Ligeiro et al. [63] concluded that, since sampling a single riffle in a stream collected approximately 75% of invertebrate families present, intensive sampling is not efficient and that the priority should be broad spatial coverage. In contrast, we observed that a single sample within a stream in our study is highly dissimilar in OTU composition from a second point only ten meters away and can represent anywhere between 27 and 85% of the total OTUs detected. Based on our results, we conclude that one sampling point is not sufficient and a more thorough sampling protocol is necessary for a more accurate representation of local diversity. While sampling intensity is an important consideration, a caveat of course is that improvements in sequencing analyses (e.g., increasing read depth) could improve the detection of rare taxa.

Our results more closely resemble terrestrial arthropod metabarcoding studies using malaise traps to explore community composition. In southern Ontario, Steinke et al. [54] surprisingly discovered that ten Malaise traps (tent-like insect collection traps) placed in a row had remarkably high differences in community composition. This field design of trap placement is comparable to our four kicknet transects within a stream, where we similarly found very high within-stream dissimilarity, numerous rare taxa, and large estimates of undetected taxa. For example, Steinke et al. [54] estimated that their OTU pool contained approximately 60% of the total diversity in the region, whereas our estimates were 69% for all invertebrate OTUs (and 78% for chironomid OTUs only). These patterns not only resemble our study in terms of missing taxa, but also display strikingly similar patterns in the number of rare taxa or those occurring only once within a site. Steinke et al. [54] detected close to 3000 OTUs, and almost half of these only occurred once in the same location. At the site level, the percentage of rare taxa in our study (e.g., only occurring at one of four transects in a stream) ranged from 40 to 90% and averaged approximately 60%.

This pattern of arthropod rarity persists across geographic ranges, as a tropical DNA barcoding study using Malaise traps [64] observed very high beta diversity amongst traps, which did not decrease with increased local sampling, indicating that the regional species pool had not been adequately sampled. However, repeated yearly sampling of the same localities decreased beta diversity and allowed D’Souza and Hebert [64] to determine which taxa were present yearly (even if rare) and which were ‘transient’ taxa. D’Souza and Hebert’s [64] study clearly demonstrated the need for both spatially and temporally robust datasets to estimate accurately taxon richness and beta diversity both within a site and over time. In our dataset, we see a large proportion of rare taxa; however, repeated yearly sampling would allow us better power to determine which taxa are rare and which are transient. This is an ongoing challenge of attempting to characterize local communities in streams with no discrete boundaries and of how to delineate local sites in a continuous system. The spatial extent and sampling grain of a study can significantly alter conclusions drawn regarding community composition [65, 66]. It is possible that sampling small spatial scales like in our study can result in an overinflation of local richness due to mass effects swamping out local environmental signal [36], such as taxa being carried to the site due to water flow but not actually being able to establish there (e.g., a “sink” habitat for that species) and thus result in these transient taxa skewing dissimilarity estimates. Alternatively, sampling too large an area can miss changes in community composition in response to environmental signals due to dispersal limitation. Here, we demonstrated a vast amount of both diversity and variability in aquatic macroinvertebrates at both the local and regional scale, underscoring the importance of developing consistent, long-term monitoring programs to assess more accurately patterns in insect declines and thus better inform protection measures.

Taxonomic richness and land use

While both family and OTU-level richness varied between streams, this was not significantly linked to seasonality or site location (i.e., whether the stream was located on private land or on conservation authority property). This is perhaps unsurprising as local richness (or alpha diversity) can be influenced by a number of habitat parameters in aquatic systems, such as microhabitat availability in streams [28] or hydroperiod in wetlands [67], or can be unrelated to any habitat parameters [63]. While aquatic macroinvertebrates have been successfully used as indicators for decades [14, 68], it is likely that a binary category (e.g., farm or conservation area) was not an effective tool to classify stream quality, especially in a heavily impacted landscape such as southern Ontario. Streams vary in physical habitat parameters, such as the riparian buffer width and the slope of the bank, and these metrics may provide a local buffer from adjacent or upstream landscape-scale agricultural practices [31]. The importance of local habitat conditions has been shown also in previous work in southern Ontario streams by Yates and Bates [69], who determined that aquatic macroinvertebrates (at family-level) were more associated with human activity directly adjacent to the stream such as channel alteration (e.g., decreased sinuosity) and buffer width. In contrast, more mobile fish communities were more responsive to landscape-scale parameters over local conditions in the same system [69]. Fish also have a more evident threshold response in community composition to agricultural impairment, whereas macroinvertebrates (albeit at family-level resolution) respond more gradually to such changes [70] However, there is also the potential that largely developed landscapes, such as southern Ontario, have historically excluded sensitive taxa through centuries of agriculture, and invertebrate communities are already homogenous due to lack of true reference-condition streams [71].

While it is possible these invertebrate communities are homogenous at the family level as suggested by Krynak and Yates [71], and our total family count was generally in concordance with other stream studies in this region using family resolution through morphological identification [69,70,71], the total OTU richness in our study indicates that vast levels of turnover within a family are possible. At over 1600 OTUs, our metabarcoding richness appears to be much higher than stream metabarcoding studies in various geographic regions [26, 72,73,74] and more closely resembles richness counts in terrestrial [54, 75] and soil [76] metabarcoding papers. It is likely that bioinformatic decisions in clustering and matching OTUs have a large influence on the taxonomic diversity in a dataset [77]. While our choice of clustering threshold for OTU delineation (97%) is comparable to other studies, it is a more conservative choice in terms of richness than would be achieved at 98% clustering or by using exact or amplicon sequence variants (ESV/ASV) [78] and thus probably not responsible for the high level of diversity found in our dataset. Our OTU count may also be considered a conservative estimate of diversity due to the potential for taxonomic blind spots arising from the use of a single marker (COI) and primer set [79]. We additionally filtered out very low-read OTUs post-clustering (less than 0.01% abundance) which would have reduced the diversity in our dataset. This filtering threshold is commonly used in other invertebrate metabarcoding studies [24, 54], however it decreased the total OTUs in our dataset by half. However, while these OTUs made up 50% of the total OTUs, they only corresponded to 33,840 sequence reads (0.06% of total sequences) and it is important to define a threshold to eliminate OTUs based upon erroneous sequences. We also selected a threshold of 90% sequence similarity to the reference database (BOLD) in order to be included in the final dataset. While many studies use a conservative 98% threshold [26, 54], others have selected a 85% similarity threshold [74]. In our dataset, it is likely that 98% is too strict a cut-off due to the sparsity of reference sequences for understudied taxa such as chironomids. A match of 90% similarity is generally accepted as a ‘family-level’ match in multiple metabarcoding studies [26, 74] and thus aligned well with our approach of comparing OTU and family-level diversity. In addition to bioinformatic sources of bias when estimating total diversity, it is important to note that all stages of a protocol can ultimately alter the number of taxa detected. For example, we incorporated sub-sampling approaches for both removing benthic macroinvertebrates from the bulk sample and taking a representative sample of the subsequent homogenized tissue for DNA extraction. Smaller samples would therefore be more thoroughly sampled than large, dense samples. Laboratory components such as PCR biases can also affect sample diversity as primer biases can result in some taxa being preferentially amplified [48]. The number of PCR cycles will also affect the detection of rare taxa. We encountered a subset of samples which initially resulted in poor sequencing results and subsequently required a higher number of PCR cycles. While this allowed us to include those samples in this manuscript, it does introduce a source of variability when comparing taxa composition.

In our system, chironomids had the highest OTU count of any family. This is consistent with previous aquatic metabarcoding work, notably Beerman et al. [24], who detected nearly 200 chironomid OTUs in a stream mesocosm experiment. Of these total chironomid OTUs, 85% did not have a binomial name assigned from BOLD, which is nearly identical to the value we detected (84.8% without a binomial name in the reference database for high similarity matches). Even with unnamed OTUs, Beerman et al. [24] detected unique responses to environmental stressors (even amongst OTUs which matched to the same binomial name). Reference databases are not complete for many groups, especially very species-rich groups or difficult-to-identify taxa such as aquatic insect larvae, and improvement in species coverages must be made in order to improve the quality of metabarcoding datasets. It is also important to consider that the use of 97% as a clustering threshold can also result in the “splitting” of OTUs (e.g., a single species with large intraspecific divergence of CO1 being split into two OTUs) and thus artificially inflating the OTU count. For example, some chironomid species complexes can have as high as a 10% divergence [80], which would result in multiple OTU clusters when using a 97% threshold. Ultimately, there are numerous bioinformatic decisions which can either increase or decrease the number of OTUs in a metabarcoding study. The vast diversity of the chironomid family in particular merits future study to determine the extent of haplotype diversity which may interfere with establishing discrete OTUs using a set clustering threshold [24, 80].

Dissimilarity and habitat heterogeneity

Mean Raup-Crick dissimilarity values were consistently high for all levels of taxonomic assignment and sampling season, indicating that our biological replicates were quite unique from each other even when collected at the same stream. While a categorical land use variable did not prove informative in distinguishing macroinvertebrate communities from farm or conservation area streams, we expected the percentage of land used for cropping in the catchment would be a more accurate indication of stream condition. We expected an increased level of agriculture would result in more homogeneous invertebrate communities within a stream (i.e., lower beta diversity) due to the reduction in habitat complexity [34, 35, 38, 39]. We detected no significant relationship between land use and beta diversity [36, 37], and dissimilarity values were quite variable.

Interestingly, we observed steeper declines in dissimilarity values in May along a gradient of agricultural land use for all groups compared to other sampling months, though this trend was not statistically significant. While it is possible this is a spurious correlation, Zizka et al. [81] observed seasonal community changes for aquatic macroinvertebrate OTUs in a study of urban streams. In reference condition (“near natural”) streams, Zizka et al. [81] discovered that the community composition of invertebrates stayed consistent across seasonal sampling periods, whereas communities in more highly impacted streams differed across seasonal sampling periods. This contrasting pattern in seasonal change may be in response to the fact that the inflow of stressors can change over time in impacted areas, resulting in a community-level response that differs over time [81]. It is possible here that changes in water quality or other habitat parameters during the spring resulted in more homogenous communities compared to later in the sampling season. Of course, this may also be due to the emergence of different insect taxa in streams over a season. Our results here indicate that the percentage of agriculture in the landscape is not sufficient here to detect a complicated influence of land use; more detailed parameters are necessary, or our sampling was not sufficient to detect any homogenization, or perhaps the variation in land use in our study was not significant enough to detect a gradient response in community composition.


Metabarcoding revealed a huge amount of diversity in southern Ontario streams, with many rare or undetected taxa present within each stream. This study highlights the importance of developing robust monitoring plans given the challenges associated with obtaining adequate community composition data for aquatic macroinvertebrate taxa. Field sampling design is an extremely important consideration in monitoring studies, and by using molecular identification we reveal that one sampling point in a stream in not sufficient to detect local diversity. While we detected no discernable response in taxonomic richness or within-stream dissimilarity to agricultural land use, it is possible our interpretation of agriculture land use was too narrow. Given the importance of local parameters to aquatic invertebrate communities, future metabarcoding studies should consider more precise estimates of land use, including both water chemistry and physical parameters, such as stream sinuosity, riparian buffer width, and bank slope to reflect more accurately site condition. In general, our conclusions regarding OTU diversity and rarity were similar to terrestrial metabarcoding work and indicate that aquatic biomonitoring programs can benefit from molecular identification to more accurately reflect trends in insect biodiversity.

Availability of data and materials

Raw sequencing data is available at NCBI Sequence Read Archive (Accession: PRJNA783201;



Amplicon sequence variant


Barcode of life data system


Exact sequence variant


Operational taxonomic unit


  1. Ceballos G, Ehrlich PR, Barnosky AD, et al. Accelerated modern human-induced species losses: entering the sixth mass extinction. Sci Adv. 2015;1:9–13.

    Article  Google Scholar 

  2. Ceballos G, Ehrlich PR, Dirzo R. Biological annihilation via the ongoing sixth mass extinction signaled by vertebrate population losses and declines. Proc Natl Acad Sci U S A. 2017;114:E6089–96.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  3. Ceballos G, Ehrlich PR, Raven PH. Vertebrates on the brink as indicators of biological annihilation and the sixth mass extinction. Proc Natl Acad Sci U S A. 2020;117:13596–602.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  4. Rosenberg KV, Dokter AM, Blancher PJ, et al. Decline of the north american avifauna. Science. 2019;366:120–4.

    Article  CAS  PubMed  Google Scholar 

  5. Hallmann CA, Sorg M, Jongejans E, et al. More than 75% decline over 27 years in total flying insect biomass in protected areas. PLoS ONE. 2017.

    Article  PubMed  PubMed Central  Google Scholar 

  6. Sánchez-Bayo F, Wyckhuys KAG. Worldwide decline of the entomofauna: a review of its drivers. Biol Conserv. 2019;232:8–27.

    Article  Google Scholar 

  7. Thomas CD, Jones TH, Hartley SE. “Insectageddon”: a call for more robust data and rigorous analyses. Glob Chang Biol. 2019;25:1891–2.

    Article  PubMed  Google Scholar 

  8. Raven PH, Wagner DL. Agricultural intensification and climate change are rapidly decreasing insect biodiversity. Proc Natl Acad Sci U S A. 2021;118:1–6.

    Article  Google Scholar 

  9. Wagner DL, Grames EM, Forister ML, et al. Insect decline in the Anthropocene: death by a thousand cuts. Proc Natl Acad Sci U S A. 2021;118:1–10.

    Article  Google Scholar 

  10. Fourcade Y, Åström S, Öckinger E. Climate and land-cover change alter bumblebee species richness and community composition in subalpine areas. Biodivers Conserv. 2019;28:639–53.

    Article  Google Scholar 

  11. Albert JS, Destouni G, Duke-Sylvester SM, et al. Scientists’ warning to humanity on the freshwater biodiversity crisis. Ambio. 2021;50:85–94.

    Article  PubMed  Google Scholar 

  12. Dudgeon D. Multiple threats imperil freshwater biodiversity in the Anthropocene. Curr Biol. 2019;29:R960–7.

    Article  CAS  PubMed  Google Scholar 

  13. Reid AJ, Carlson AK, Creed IF, et al. Emerging threats and persistent conservation challenges for freshwater biodiversity. Biol Rev. 2019;94:849–73.

    Article  PubMed  Google Scholar 

  14. Rosenberg DM, Resh VH. Freshwater biomonitoring and benthic invertebrates. New York: Chapman & Hall; 1993.

    Google Scholar 

  15. Haase P, Pauls SU, Schindehu K, Sundermann A. First audit of macroinvertebrate samples from an EU Water Framework Directive monitoring program: human error greatly lowers precision of assessment results. J North Am Benthol Soc. 2010;29:1279–91.

    Article  Google Scholar 

  16. Haase P, Murray-Bligh J, Lohse S, et al. Assessing the impact of errors in sorting and identifying macroinvertebrate samples. Hydrobiologia. 2006;566:505–21.

    Article  Google Scholar 

  17. Gleason JE, Rooney RC. Aquatic macroinvertebrates are poor indicators of agricultural activity in northern prairie pothole wetlands. Ecol Indic. 2017;81:333–9.

    Article  Google Scholar 

  18. Baird DJ, Hajibabaei M. Biomonitoring 2.0: a new paradigm in ecosystem assessment made possible by next-generation DNA sequencing. Mol Ecol. 2012;21:2039–44.

    Article  PubMed  Google Scholar 

  19. Pauls SU, Alp M, Bálint M, et al. Integrating molecular tools into freshwater ecology: developments and opportunities. Freshw Biol. 2014;59:1559–76.

    Article  Google Scholar 

  20. Blackman R, Mächler E, Altermatt F, et al. Advancing the use of molecular methods for routine freshwater macroinvertebrate biomonitoring—the need for calibration experiments. Metabarcoding Metagenom. 2019;3:e34735.

    Article  Google Scholar 

  21. Cristescu ME. From barcoding single individuals to metabarcoding biological communities: towards an integrative approach to the study of global biodiversity. Trends Ecol Evol. 2014;29:566–71.

    Article  PubMed  Google Scholar 

  22. Taberlet P, Coissac E, Pompanon F, et al. Towards next-generation biodiversity assessment using DNA metabarcoding. Mol Ecol. 2012;21:2045–50.

    Article  CAS  PubMed  Google Scholar 

  23. Elbrecht V, Vamos EE, Meissner K, et al. Assessing strengths and weaknesses of DNA metabarcoding-based macroinvertebrate identification for routine stream monitoring. Methods Ecol Evol. 2017.

    Article  Google Scholar 

  24. Beermann AJ, Zizka VMA, Elbrecht V, et al. DNA metabarcoding reveals the complex and hidden responses of chironomids to multiple stressors. Environ Sci Eur. 2018;30:26.

    Article  Google Scholar 

  25. Beermann AJ, Elbrecht V, Karnatz S, et al. Multiple-stressor effects on stream macroinvertebrate communities: a mesocosm experiment manipulating salinity, fine sediment and flow velocity. Sci Total Environ. 2018;610–611:961–71.

    Article  CAS  PubMed  Google Scholar 

  26. Emilson CE, Thompson DG, Venier LA, et al. DNA metabarcoding and morphological macroinvertebrate metrics reveal the same changes in boreal watersheds across an environmental gradient. Sci Rep. 2017;7:1–12.

    Article  CAS  Google Scholar 

  27. Nicacio G, Juen L. Chironomids as indicators in freshwater ecosystems: an assessment of the literature. Insect Conserv Divers. 2015;8:393–403.

    Article  Google Scholar 

  28. Beisel JN, Usseglio-Polatera P, Thomas S, Moreteau JC. Stream community structure in relation to spatial variation: the influence of mesohabitat characteristics. Hydrobiologia. 1998;389:73–88.

    Article  Google Scholar 

  29. Dobson M. Microhabitat as a determinant of diversity: stream invertebrates colonizing leaf packs. Freshw Biol. 1994;32:565–72.

    Article  Google Scholar 

  30. Heino J, Louhi P, Muotka T. Identifying the scales of variability in stream macroinvertebrate abundance, functional composition and assemblage structure. Freshw Biol. 2004;49:1230–9.

    Article  Google Scholar 

  31. Allan JD. Landscapes and riverscapes: the influence of land use on stream ecosystems. Annu Rev Ecol Evol Syst. 2004;35:257–84.

    Article  Google Scholar 

  32. Ortega JCG, Thomaz SM, Bini LM. Experiments reveal that environmental heterogeneity increases species richness, but they are rarely designed to detect the underlying mechanisms. Oecologia. 2018;188:11–22.

    Article  PubMed  Google Scholar 

  33. Stein A, Gerstner K, Kreft H. Environmental heterogeneity as a universal driver of species richness across taxa, biomes and spatial scales. Ecol Lett. 2014;17:866–80.

    Article  PubMed  Google Scholar 

  34. Siqueira T, Lacerda CGLT, Saito VS. How does landscape modification induce biological homogenization in tropical stream metacommunities? Biotropica. 2015;47:509–16.

    Article  Google Scholar 

  35. Martins RT, Couceiro SRM, Melo AS, et al. Effects of urbanization on stream benthic invertebrate communities in Central Amazon. Ecol Indic. 2017;73:480–91.

    Article  CAS  Google Scholar 

  36. Heino J, Grönroos M, Ilmonen J, et al. Environmental heterogeneity and β diversity of stream macroinvertebrate communities at intermediate spatial scales. Freshw Sci. 2013;32:142–54.

    Article  Google Scholar 

  37. Petsch DK, Saito VS, Landeiro VL, et al. Beta diversity of stream insects differs between boreal and subtropical regions, but land use does not generally cause biotic homogenization. Freshw Sci. 2021;40:53–64.

    Article  Google Scholar 

  38. Astorga A, Death R, Death F, et al. Habitat heterogeneity drives the geographical distribution of beta diversity: the case of New Zealand stream invertebrates. Ecol Evol. 2014;4:2693–702.

    Article  PubMed  PubMed Central  Google Scholar 

  39. Maloney KO, Munguia P, Mitchell RM. Anthropogenic disturbance and landscape patterns affect diversity patterns of aquatic benthic macroinvertebrates. J North Am Benthol Soc. 2011;30:284–95.

    Article  Google Scholar 

  40. Costa SS, Melo AS. Beta diversity in stream macroinvertebrate assemblages: among-site and among-microhabitat components. Hydrobiologia. 2008;598:131–8.

    Article  Google Scholar 

  41. Government of Ontario. User guide for Ontario Flow Assessment Tool. Ministry of Northern Development, Mines, Natural Resources and Forestry; 2020. p. 53.

  42. Esri. ArcGIS. Esri Inc.; 2020.

  43. Ontario Ministry of Natural Resources and Forestry. Ontario Land Cover Compilation Data Specifications Version 2.0; 2016.

  44. Jones FC, Somers KM, Craig B, Reynoldson TB. Ontario benthos biomonitoring network: protocol manual. Dorset, ON: Ontario Ministry of Environment; 2007. p. 109.

    Google Scholar 

  45. Stanfield L, editor. Ontario Stream Assessment Protocol. Version 8.0. Fisheries Policy Section. Peterborough: Ontario Ministry of Natural Resources; 2017. p. 376.

    Google Scholar 

  46. QGISDevelopmentTeam. (2020) QGIS Geographic Information System. Open Source Geospatial Foundation Project.

  47. Bohmann K, Elbrecht V, Carøe C, et al. Strategies for sample labelling and library preparation in DNA metabarcoding studies. Mol Ecol Resour. 2022;22:1231–46.

    Article  CAS  PubMed  Google Scholar 

  48. Elbrecht V, Leese F. Validation and development of COI metabarcoding primers for freshwater macroinvertebrate bioassessment. Front Environ Sci. 2017;5:1–11.

    Article  Google Scholar 

  49. Gleason JE, Elbrecht V, Braukmann TWA, et al. Assessment of stream macroinvertebrate communities with eDNA is not congruent with tissue-based metabarcoding. Mol Ecol. 2021;30:3239–51.

    Article  CAS  PubMed  Google Scholar 

  50. Persaud SF, Cottenie K, Gleason JE. Ethanol eDNA reveals unique community composition of aquatic macroinvertebrates compared to bulk tissue metabarcoding in a biomonitoring sampling scheme. Diversity. 2021;13:1–15.

    Article  CAS  Google Scholar 

  51. Milián-García Y, Young R, Madden M, et al. Optimization and validation of a cost-effective protocol for biosurveillance of invasive alien species. Ecol Evol. 2021;11:1999–2014.

    Article  PubMed  PubMed Central  Google Scholar 

  52. Edgar RC. Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 2010;26:2460–1.

    Article  CAS  PubMed  Google Scholar 

  53. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. Embnet J. 2011;17:10–2.

    Article  Google Scholar 

  54. Steinke D, Braukmann TW, Manerus L, et al. Effects of Malaise trap spacing on species richness and composition of terrestrial arthropod bulk samples. Metabarcoding Metagenom. 2021;5:43–50.

    Article  Google Scholar 

  55. Ratnasingham S, Hebert PDN. BOLD: the Barcode of Life Data System ( Mol Ecol Notes. 2007;7:355–64.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  56. Buchner D, Leese F. BOLDigger - a Python package to identify and organise sequences with the Barcode of Life Data systems. Metabarcodi Metagenom. 2020;4:19–21.

    Article  Google Scholar 

  57. R Core Team. R: a language and environment for statistical computing, version 4.03.Vienna, Austria. R Foundation for Statistical computing; 2020.

  58. Wickham H. ggplot2: elegant graphics for data analysis. Volume ISBN 978–3–319–24277–4. Springer-Verlag New York; 2020.

  59. Zinger L, Lionnet C, Benoiston AS, et al. metabaR: an r package for the evaluation and improvement of DNA metabarcoding data quality. Methods Ecol Evol. 2021;2021:1–7.

    Article  Google Scholar 

  60. Kuznetsova A, Brockhoff PB, Christensen RHB. lmerTest Package: tests in linear mixed effects models. J Stat Softw. 2017;82:1–26.

    Article  Google Scholar 

  61. Oksanen J, Blanchet FG, Legendre P et al. (2017) vegan: Community Ecology Package. R package version 2.4-2.

  62. Hsieh T, Ma K, Chao A. (2020) iNEXT: Interpolation and extrapolation for species diversity. R package version 2.0.20,

  63. Ligeiro R, Melo AS, Callisto M. Spatial scale and the diversity of macroinvertebrates in a neotropical catchment. Freshw Biol. 2010;55:424–35.

    Article  Google Scholar 

  64. D’Souza ML, Hebert PDN. Stable baselines of temporal turnover underlie high beta diversity in tropical arthropod communities. Mol Ecol. 2018;27:2447–60.

    Article  PubMed  Google Scholar 

  65. Viana DS, Chase JM. Spatial scale modulates the inference of metacommunity assembly processes. Ecology. 2019;100:e02576.

    Article  PubMed  Google Scholar 

  66. Cottenie K. Integrating environmental and spatial processes in ecological community dynamics. Ecol Lett. 2005;8:1175–82.

    Article  PubMed  Google Scholar 

  67. Daniel J, Gleason JE, Cottenie K, Rooney RC. Stochastic and deterministic processes drive wetland community assembly across a gradient of environmental filtering. Oikos. 2019.

    Article  Google Scholar 

  68. Buss DF, Carlisle DM, Chon T-S, et al. Stream biomonitoring using macroinvertebrates around the globe: a comparison of large-scale programs. Environ Monit Assess. 2015.

    Article  PubMed  Google Scholar 

  69. Yates AG, Bailey RC. Covarying patterns of macroinvertebrate and fish assemblages along natural and human activity gradients: implications for bioassessment. Hydrobiologia. 2010;637:87–100.

    Article  Google Scholar 

  70. Yates AG, Bailey RC. Effects of taxonomic group, spatial scale and descriptor on the relationship between human activity and stream biota. Ecol Indic. 2011;11:759–71.

    Article  Google Scholar 

  71. Krynak EM, Yates AG. Benthic invertebrate taxonomic and trait associations with land use in an intensively managed watershed: implications for indicator identification. Ecol Indic. 2018;93:1050–9.

    Article  Google Scholar 

  72. Carew ME, Kellar CR, Pettigrove VJ, Hoffmann AA. Can high-throughput sequencing detect macroinvertebrate diversity for routine monitoring of an urban river? Ecol Indic. 2018;85:440–50.

    Article  CAS  Google Scholar 

  73. Serrana JM, Miyake Y, Gamboa M, Watanabe K. Comparison of DNA metabarcoding and morphological identification for stream macroinvertebrate biodiversity assessment and monitoring. Ecol Indic. 2019;101:963–72.

    Article  Google Scholar 

  74. Kuntke F, de Jonge N, Hesselsøe M, Lund Nielsen J. Stream water quality assessment by metabarcoding of invertebrates. Ecol Indic. 2020;111:105982.

    Article  CAS  Google Scholar 

  75. Steinke D, DeWaard S, Sones J, et al. Message in a bottle—metabarcoding enables biodiversity comparisons across ecoregions. GigaScience. 2022;11:giac040.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  76. Young M, Hebert PDN. Unearthing soil arthropod diversity through DNA metabarcoding. PeerJ. 2022;10:e12845.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  77. Clare EL, Chain FJJ, Littlefair JE, et al. The effects of parameter choice on defining molecular operational taxonomic units and resulting ecological analyses of metabarcoding data. Genome. 2016;59:981–90.

    Article  PubMed  Google Scholar 

  78. Brandt MI, Günther B, Arnaud-Haond S. Bioinformatic pipelines combining denoising and clustering tools allow for more comprehensive prokaryotic and eukaryotic metabarcoding. Mol Ecol Resour. 2021.

    Article  PubMed  Google Scholar 

  79. Morey KC, Bartley TJ, Hanner RH. Validating environmental DNA metabarcoding for marine fishes in diverse ecosystems using a public aquarium. Environ DNA. 2020.

    Article  Google Scholar 

  80. Lin XL, Stur E, Ekrem T. DNA barcodes and morphology reveal unrecognized species in Chironomidae (Diptera). Insect Syst Evol. 2017;49:329–98.

    Article  Google Scholar 

  81. Zizka VMA, Geiger MF, Leese F. DNA metabarcoding of stream invertebrates reveals spatio-temporal variation but consistent status class assessments in a natural and urban river. Ecol Indic. 2020;115:106383.

    Article  Google Scholar 

Download references


We would like to acknowledge that our stream sampling sites are located on the traditional territory of the Mississaugas of the Credit First Nations, the Anishnabek, Haudenosaunee (Iroquis), and Ojibway/Chippewa peoples. We would like to thank Marie Gutgesell, Emily Champagne, Christine Dulal-Whiteway, and Dr. Kevin McCann for their assistance locating study sites and Evan Versteeg for assistance calculating land use in ArcGIS. We would also like to thank Ian Thompson for his valuable assistance collecting and sorting benthic macroinvertebrate samples, Dr. Yoamel Milián-García for helpful advice and feedback throughout lab work, and Dr. Sarah (Sally) Adamowicz for helpful suggestions regarding our bioinformatics pipeline. Thank you to staff at the following conservation authorities: Catfish Creek CA, Kettle Creek CA, Long Point CA, Nature Conservancy Canada and Thames Talbot Land Trust. We thank two anonymous reviewers for their helpful suggestions to improve this manuscript.


This research was funded by a Canada First Research Fund – Food From Thought grant to KC and RH, an NSERC Discovery Grant to KC and an NSERC CGS-D scholarship to JEG. The funding body played no role in the design of the study and collection, analysis, interpretation of data, or in writing the manuscript.

Author information

Authors and Affiliations



JEG, RHH, and KC conceived the research ideas; JEG designed the study, performed the field and laboratory work, performed the bioinformatics and data analysis, and wrote the first draft of the manuscript; JEG, RHH, and KC edited the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Jennifer Erin Gleason.

Ethics declarations

Ethics approval and consent to participate

Use of lower-order invertebrates (e.g., insects, worms, snails) does not need ethics approval. Samples were collected from privately owned farms with permission from the landowner, or on conservation area property with permits from the following Conservation Authorities in Ontario, Canada: Catfish Creek CA, Kettle Creek CA, Long Point CA, Nature Conservancy Canada and Thames Talbot Land Trust.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Appendix 1.

A word document elaborating on data cleaning protocols post-bioinformatic processing using metabaR.

Additional file 2: Appendix 2.

The R code used to clean the data following the protocol in Appendix 1.

Additional file 3: Appendix 3.

Raw data post JAMP processing, to be used in Appendix 2. Composed of 4 csv files.

Additional file 4: Appendix 4.

The R code used to analysis the data and generate figures. This code is meant to run using the output from Appendix 2.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and Permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Gleason, J.E., Hanner, R.H. & Cottenie, K. Hidden diversity: DNA metabarcoding reveals hyper-diverse benthic invertebrate communities. BMC Ecol Evo 23, 19 (2023).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI:


  • Aquatic invertebrates
  • Biodiversity
  • Biomonitoring
  • Metabarcoding
  • Stream ecology