Estimating the Completeness of Preserved Collections in Representing Global Biodiversity

Research output: Chapter in Book/Report/Conference proceeding; C3: Conference Abstract; peer-review

There are an estimated 8.7 million eukaryotic species globally, and knowledge of those organisms is organised around their scientific names and the specimens we hold of those species (Sweetlove 2011, Mora et al. 2011). Likewise, there are between 1.2 and 2.1 billion (10^9) specimens held in biodiversity collections globally (Ariño 2010). These collections constitute an infrastructure and a scientific tool with which to understand, catalogue and study biodiversity. Yet we find it hard to answer the simple question: how many species are in a collection? The answer is not trivial, because collections are not completely inventoried, do not use the same taxonomy, and hold a vast volume of data (Samy et al. 2013, Ariño 2010). We have developed a method that takes a list of collections and estimates the species richness contained within them. This will give us deeper insight into the scientific value of the world's biodiversity collections.

Dealing with non-homogeneous, non-random and incomplete sampling of sites is a common issue in many ecological studies (Magurran and McGill 2011, Colwell et al. 2012, Gotelli and Colwell 2001). Using techniques and toolboxes such as iNEXT (Chao et al. 2014b) and vegan (Oksanen et al. 2020), we can estimate species richness under these conditions. In the case of collections, we consider not only the digitized and published proportion of preserved collections, but also extrapolate to the specimens that have not yet made their way to the Global Biodiversity Information Facility (GBIF).
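
To illustrate what estimating species richness from incomplete sampling can look like, the following is a minimal sketch of the classical Chao1 nonparametric estimator. It is an illustration only and is not presented as the method developed in this work; the function name `chao1` and the example counts are hypothetical. The estimator uses the number of species seen exactly once (f1) and exactly twice (f2) to estimate how many species remain unseen.

```python
from collections import Counter
from typing import Iterable


def chao1(abundances: Iterable[int]) -> float:
    """Estimate total species richness from per-species specimen counts.

    Uses the classical Chao1 form S_obs + f1^2 / (2 * f2) when doubletons
    exist, and the bias-corrected form S_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
    when there are none (which avoids dividing by zero).
    """
    # Keep only species that were actually observed.
    counts = [n for n in abundances if n > 0]
    s_obs = len(counts)

    # Frequency of frequencies: how many species have exactly k specimens.
    freq = Counter(counts)
    f1 = freq[1]  # singletons
    f2 = freq[2]  # doubletons

    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical collection: eight species, three of them known from one specimen.
example_counts = [10, 5, 1, 1, 1, 2, 2, 3]
print(chao1(example_counts))  # 8 observed + 3*3 / (2*2) = 10.25
```

Many singletons relative to doubletons push the estimate well above the observed count, which is the intuition behind extrapolating from the digitized part of a collection to the whole.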

Nevertheless, calculations on such large datasets require innovative Big Data analytic tools. GBIF contains 1.8 billion observations, which amount to 120 GB of compressed data. These data can be interrogated in the cloud or locally using tools such as Galaxy, which has made it possible to process large numbers of records in a single batch. We can now evaluate the biodiversity within collections, divide the results by taxon and geographical region, and compare them with one another.
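
As a rough sketch of how per-species specimen counts might be tallied from a large occurrence export without loading it into memory, the example below streams a tab-separated file one row at a time. It does not represent the Galaxy workflow mentioned above. The column names `species` and `basisOfRecord` and the value `PRESERVED_SPECIMEN` are assumptions about the export format, and the function name `specimens_per_species` is hypothetical.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def specimens_per_species(handle: TextIO) -> Counter:
    """Count preserved-specimen records per species from a TSV stream.

    Assumes (hypothetically) a header row containing 'species' and
    'basisOfRecord' columns. Rows are read one at a time, so memory use
    grows with the number of distinct species, not the number of records.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    counts: Counter = Counter()
    for row in reader:
        # Skip observations and other non-specimen records.
        if row.get("basisOfRecord") != "PRESERVED_SPECIMEN":
            continue
        # Skip records not identified to species level.
        species = (row.get("species") or "").strip()
        if species:
            counts[species] += 1
    return counts


# Tiny in-memory stand-in for a real export file.
sample = (
    "species\tbasisOfRecord\n"
    "Quercus robur\tPRESERVED_SPECIMEN\n"
    "Quercus robur\tHUMAN_OBSERVATION\n"
    "Quercus robur\tPRESERVED_SPECIMEN\n"
    "\tPRESERVED_SPECIMEN\n"
    "Fagus sylvatica\tPRESERVED_SPECIMEN\n"
)
print(specimens_per_species(io.StringIO(sample)))
```

For a real file, the same function could be given an open file handle; the resulting counts are the kind of per-species abundance vector a richness estimator takes as input.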

Ultimately, this work will allow individual collections and consortia to evaluate their coverage of biodiversity and help them better target their collecting strategies.

Original language: English
Title of host publication: Biodiversity Information Science and Standards
Publisher: Pensoft Publishers
Publication date: 7-Sept-2021
Publication status: Published - 7-Sept-2021