DNA-based identification of aquatic invertebrates – useful in the South African context?

HOW TO CITE: Venter HJ, Bezuidenhout CC. DNA-based identification of aquatic invertebrates – useful in the South African context? S Afr J Sci. 2016;112(5/6), Art. #2015-0444, 4 pages. http://dx.doi.org/10.17159/sajs.2016/20150444

The concept of using specific regions of DNA to identify organisms – processes such as DNA barcoding – is not new to South African biologists. The African Centre for DNA Barcoding reports that 12 548 plant species and 1493 animal species had been barcoded in South Africa by July 2013, while the Barcode of Life Database (BOLD) contains 62 926 records for South Africa, 11 392 of which had species names (representing 4541 species). In light of this, it is surprising that aquatic macroinvertebrates of South Africa have not received much attention as potential barcoding projects thus far – barcoding of aquatic species has tended to focus on invasive species and fishes. Perusal of the BOLD records for South Africa indicates a noticeable absence of aquatic macroinvertebrates, including families used for biomonitoring strategies such as the South African Scoring System. Meanwhile, the approach of collecting specimens and isolating their DNA individually in order to identify them (as in the case of DNA barcoding), has been shifting towards making use of the DNA which organisms naturally shed into their environments (eDNA). Coupling environmental and bulk sample DNA with high-throughput sequencing technology has given rise to metabarcoding, which has the potential to characterise the whole community of organisms present in an environment. Harnessing barcoding and metabarcoding approaches with environmental DNA (eDNA) potentially offers a non-invasive means of measuring the biodiversity in an environment and has great potential for biomonitoring. Aquatic ecosystems are well suited to these approaches – but could they be useful in a South African context?

The rise of DNA-based identification for biomonitoring
Several authors have pointed out that conservation of natural resources and ecosystems hinges on the provision of data regarding the presence and distribution of species within an environment [1][2][3] – data which usually are supplied through biomonitoring initiatives. This principle is at the core of monitoring programmes, including the South African Scoring System or SASS (one of the programmes that forms part of the current National Aquatic Ecosystem Health Monitoring Programme: River Health Programme [4]), which analyses macroinvertebrate communities as a measure of stream ecosystem health. SASS, like similar indices, uses morphology to make identifications, which are then assigned weights and values that are used to gauge the relative well-being of the system [4].
The limitations of morphology-based indices for the identification of macroinvertebrates are discussed at length in other publications [5,6], so will only be mentioned briefly here. DNA-based identification methods provide an alternative to morphological identifications, and have been useful in addressing several of these problems. For example, in morphology-based identification of macroinvertebrates it often is difficult to differentiate between cryptic (morphologically indistinguishable) species, several of which have come to light after DNA-based approaches were used [7,8]. Larval stages of aquatic insects are often extremely difficult to identify morphologically, even for experts. In addition, morphological identification keys for aquatic macroinvertebrates tend to focus on adult stages, compounding the difficulties of juvenile identification. Furthermore, linking of life stages (especially juveniles) and female and phenotypically variant individuals to identified representative/voucher specimens is incomplete for many macroinvertebrate species. DNA-based methods have proven useful in resolving such issues, because they rely on genetic loci which are applicable regardless of sex, life stage or appearance [6,9,10].
Morphological identification is time-consuming and requires a great deal of taxonomic knowledge and skill to be successful. However, once the initial effort has been made to identify a species using DNA, the expertise needed to identify subsequent specimens of the same species is drastically reduced [5]. Another advantage of using DNA-based approaches is the potential for identification of specimens to species level, in contrast to many morphological indices which often stop at family level. Identification to species level can bring a sensitivity and depth of knowledge to biomonitoring that coarser identifications cannot [3].
The standardisation of DNA-based identification methods gave rise to the Consortium for the Barcode of Life. This initiative was set up with the goal of promoting the use of specific regions of DNA (in the COI gene for animals, rbcL and matK for plants) to determine the sequence of those regions which is particular to each species, for use in identification. The Barcode of Life Database (BOLD) is a centralised database in which such barcode sequences, as well as specimen collection and species distribution details, are available to any interested person. The database contains over 3.7 million entries from all over the world – including 62 926 from South Africa, 11 392 of which have species names (representing 4541 species) [11].
However, despite the optimism and success stories associated with this approach, there are a number of limitations attached to using the COI region as a marker, not least of which is that this method relies on a DNA sequence which is about 650 base pairs long. This effectively eliminates the possibility of using this approach on a large body of damaged DNA fragments as they are too short – including many museum voucher specimens (potentially a source of reference material [12]), as well as most environmental DNA (eDNA – see Box 1), which is typically damaged. And so the method has continued to evolve, with the advent of the 'minibarcode' [13], as well as several suggestions for alternative DNA regions to be used.
http://www.sajs.co.za May/June 2016

Box 1: Definition of terms

eDNA (environmental DNA)
Although some authors define eDNA simply as DNA obtained directly from environmental samples (soil, sediment, water, etc.) as opposed to individual specimens, a more specific definition would include the corollary that there should be no obvious signs of biological source material [2].

Metabarcoding
The use of DNA markers (metabarcodes) to perform analyses on total DNA isolated from a bulk/environmental sample, in order to characterise the assemblage of organisms present in that environment [14,15].

BINs (barcode index numbers)
Operational taxonomic units of genetically identical taxa indexed under a common identifier [16].

Being able to work with eDNA precipitated from samples from aquatic ecosystems opens up a range of monitoring and research opportunities [17,18]. Thus other primers and markers have been developed for eDNA, which are often species-specific (in contrast to the broad range of COI), and which have been applied to the detection of indicator species as well as rare, invasive or pathogenic species [17]. Mächler et al. [18] demonstrated the potential of using eDNA and specific primers to detect macroinvertebrate species in both river and lake systems. Using standard polymerase chain reaction or PCR (which is cheaper than next generation sequencing), they were able to detect both indicator and non-native species using their own primer design. Specially designed primers and probes used in conjunction with qPCR were successfully used to survey the population of European weather loach in Denmark. During this study, this near-extinct fish was detected at sites where its presence had been observed recently, as well as one location where it had not been observed since 1995. In addition to successful detection, the authors report that this approach is less costly, both economically and in terms of effort (person-hours) [19].
DNA metabarcoding is an approach which uses bulk DNA collections (such as faeces [20], sediment meiofaunal communities [21] and eDNA [2]) coupled with next generation sequencing to obtain an overview of the organisms which are present in an environment as a whole. Instead of targeting one or a few species, this approach aims to give a more holistic view of ecosystem composition [4,22]. The reads which result from such high-throughput sequencing are clustered into operational taxonomic units (OTUs). Depending on the genetic loci used, OTUs can be matched to sequences in databases such as BOLD (if COI was used) or GenBank, in order to identify the organisms whose DNA was in the sample [14,22,23]. In other cases, in which primer sets other than COI are used or when no reference sequences are available for the OTUs produced, taxonomic assignation is not possible. However, that does not mean the data are not useful. Molecular taxonomic units (MOTUs) refer to representative sequence clusters which have been grouped together using particular algorithms. MOTU data can be utilised in lieu of 'true' species data, by comparing MOTU profiles of different environments or time periods [24]. Although the metabarcoding approach (see Box 1) still is being refined, it potentially allows monitoring of community-level responses to change, including responses to remediation strategies and climate change [14,15].

eDNA in aquatic ecosystems
The fact that DNA has a relatively short turnover time in aquatic systems means that aqueous eDNA is likely to represent a 'real-time' view of species present within a relatively small window of time [25]. Strickler et al. [26] investigated the effects of temperature, pH and UVB radiation on eDNA in water and found that it degraded faster in warmer water, with a neutral pH and a moderate UVB level. Because these conditions are also amenable to microbial growth, the authors speculated that eDNA breakdown was at least partially facilitated by microbial action.
During investigations into DNA persistence in both laboratory and field conditions (ponds), species could be detected using eDNA for 25 days and 21 days after removal of the organisms, respectively [27]. However, the fact that DNA may be concentrated and survive much longer in sediments may be a complicating factor [25]. DNA dispersal in flowing streams and rivers is also a concern, as it may give false positive results downstream where the organism in question is not found. DNA dispersal was investigated by Laramie et al. [28], who traced eDNA of Chinook salmon (Oncorhynchus tshawytscha), and Deiner et al. [29], who studied eDNA of a daphnid (Daphnia longispina) and swollen river mussel (Unio tumidus). Both studies found that eDNA signals tend to decrease as distance from the source increases [28,29]. In Deiner et al.'s study, DNA from lake-dwelling invertebrates was detected 12 km downstream from the lake inhabited by the target organisms. The authors suggest that when using eDNA to estimate biodiversity in such ecosystems, sample sites should be 5–10 km apart, and follow the stream hierarchy [29].
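
As a rough illustration of how MOTU profiles from two sampling sites along a stream might be compared when no taxonomic assignment is possible, the sketch below computes a Jaccard dissimilarity between two sets of MOTU labels. The choice of Jaccard dissimilarity, the site names and the MOTU labels are all assumptions made for illustration; the studies cited here do not prescribe a particular index, and real analyses would typically also consider read abundances.

```python
def jaccard_dissimilarity(profile_a: set, profile_b: set) -> float:
    """Return 1 - |A ∩ B| / |A ∪ B| for two sets of MOTU labels.

    0.0 means the two sites share every MOTU; 1.0 means they share none.
    """
    union = profile_a | profile_b
    if not union:
        # Two empty profiles are treated as identical (an assumption).
        return 0.0
    shared = profile_a & profile_b
    return 1.0 - len(shared) / len(union)


# Hypothetical presence/absence profiles for two sites on the same river.
upstream_site = {"MOTU_1", "MOTU_2", "MOTU_3", "MOTU_4"}
downstream_site = {"MOTU_3", "MOTU_4", "MOTU_5"}

dissimilarity = jaccard_dissimilarity(upstream_site, downstream_site)
print(f"Jaccard dissimilarity between sites: {dissimilarity:.2f}")
```

Because the comparison relies only on which MOTUs are present, it can be repeated across sites or sampling dates without any reference library, which is the sense in which MOTU data stand in for species data.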

Other challenges
Just as a person cannot be identified by their fingerprints unless a record of their fingerprints exists as a reference, if the barcode databases do not contain a matching record of barcoded specimens of a species with which to compare a query sequence, DNA cannot be used to make species level identifications (although specimens may still be placed within families or genera). We encountered this problem when attempts were made to use macroinvertebrate COI sequences to provide further resolution to morphological identifications. Only weak matches could be found with sequences in the GenBank database. These matches often did not agree with the morphological identifications, or corresponded to species found in countries in the northern hemisphere or in Australia. In addition, as mentioned earlier, BOLD Systems Database records for South Africa indicate a noticeable dearth of aquatic macroinvertebrates (with greater focus on fish and invasive species [30]). For example, there are zero entries referring to families such as Baetidae and Ephemeridae, and only 43 records for the order Ephemeroptera. Similarly, Plecoptera had only 13 entries, while Odonata had 66 and Trichoptera 138. When compared to the 3621 records for Coleoptera and 3150 for Hemiptera [16,31], it is clear that there is considerable room for improvement for aquatic organisms.
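
The dependence on reference records can be sketched in a few lines of code. The example below is a minimal illustration, not the matching procedure used by BOLD or GenBank: it assumes sequences are already aligned to the same length, uses a simple percent-identity score, and applies a hypothetical 97% cut-off for a species-level match. The taxon names and sequences are invented for illustration.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of matching positions between two pre-aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


def best_match(query: str, library: dict, species_threshold: float = 97.0):
    """Return (best reference name, identity, verdict) for a query sequence.

    species_threshold is an assumed cut-off, not a value from the text.
    """
    best_name, best_score = None, 0.0
    for name, reference in library.items():
        score = percent_identity(query, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= species_threshold:
        return best_name, best_score, "species-level match"
    return best_name, best_score, "weak match only"


# Hypothetical reference library of two short, made-up barcode fragments.
reference_library = {
    "Taxon A": "ACGTACGTACGTACGTACGT",
    "Taxon B": "ACGTTCGTACGAACGTACCT",
}

# A query identical to a reference, and one with no close reference.
print(best_match("ACGTACGTACGTACGTACGT", reference_library))
print(best_match("TTTTACGTACGTACGTAAAA", reference_library))
```

The second query still returns its nearest reference, but only as a weak match, which mirrors the situation described above: without a closely matching record, a sequence cannot support a species-level identification.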
This challenge can be overcome by building up sequence libraries. A possible starting point may be natural history museums. The addition of sequence data to curated specimen records could be invaluable. The Fresh Water Invertebrates collection of the Albany Museum was reported to contain 67 000 specimens in 2009 [32]. Although it would be a boon if this collection could act as a starting point for aquatic invertebrate barcoding initiatives, the storage conditions and age of museum specimens tend to lead to DNA degradation, and have been known to impede barcoding efforts [12]. Thus, although some success has been achieved using such specimens, it is not possible to escape collection and identification of fresh specimens entirely.
Because better quality DNA may be obtained from fresh specimens, or those which have been stored correctly (in 95% ethanol and then at -10 °C) [12], this practice should be encouraged among those who sample and collect aquatic invertebrates. Although they may not be familiar with the techniques necessary to isolate and process the DNA for barcoding, if the experts who are able to identify aquatic invertebrates do so and then store the specimens correctly, the molecular work can be done at a later stage. Alternatively, specimens can be barcoded first and clustered into barcode index numbers or BINs [16] (see Box 1) according to barcode similarity. Representative specimens from such BINs can be selected for morphological identification and description, especially if potential cryptic species come to light. Building up a library of aquatic invertebrate DNA may thus lead to interdisciplinary cooperation and collaboration. Establishing an identifying sequence for a species is not the limit of information which can be gained from DNA. Additionally, if other DNA regions need to be selected in future applications (such as metabarcoding and genome skimming), then the DNA which has already been isolated can be used to characterise the organism from a different perspective [33,34].
A challenge which is less of a problem during traditional barcoding – in which organisms are identified one specimen at a time – but is an obstacle for bulk samples, is PCR bias. During the initial enrichment steps, PCR bias can create a number of problems for metabarcoding and eDNA analyses. For example, COI primers used in DNA barcoding have been found not to operate with the same efficiency for all organisms' DNA. In a mixed sample, bias towards certain organisms may cause their presence to be overstated, while others are underrepresented or missed entirely [35]. To overcome this bias, Taberlet et al. [33] suggest that metabarcodes and primers be tailored to the needs of each project. In order to do this, they propose that the DNA and barcodes collected and placed in curated collections during barcoding efforts could be used to develop this technique further. Although there are advantages to designing metabarcodes from within the COI barcoding region – such as access to the vast amounts of already identified sequences – these advantages must be weighed against potential biases [14,22,36].
Competence in bioinformatics and molecular biology techniques, particularly those involving high-throughput sequencing, will have to be developed and encouraged in order to take full advantage of the huge data sets which such techniques generate. So too will expertise in traditional taxonomy and morphological identification: far from rendering such skills obsolete, as some fear [37], these initiatives cannot be accomplished without robust morphological identifications. Utilising the three approaches of taxonomy, barcoding and metabarcoding in tandem will allow researchers to link nearly three centuries of taxonomic research to modern data sets and community structures.

Conclusion
By understanding the biodiversity of South Africa better, we may be better able to protect it. By learning more about biota and their interaction with the environment, predictions can be made regarding how ecosystems will respond to change, and what can be done to preserve them. The recent report by Dallas and Rivers-Moore [1] both highlights the possible changes which may be wrought by climate change, and calls for more proactive monitoring. Clearly, barcoding and metabarcoding could be advantageous for biologists working with aquatic macroinvertebrates and aquatic ecosystem monitoring, particularly for those who do not have a background in taxonomy. However, in order to harness the usefulness of these techniques, an effort has to be made to collect the necessary data. For this reason, we advocate the establishment of regional collections which link identified aquatic species with their DNA sequences, which can be used to develop primer sets and standard methods for the use of eDNA in biomonitoring. Furthermore, we recommend that the establishment of collections be done in conjunction with a SASS approach, so that DNA-based approaches can be made relatable to previous work.