Connecting molecular sequences to their voucher specimens

Research output: Other contribution

When sequencing molecules from an organism it is standard practice to create voucher specimens. This ensures that the results are repeatable and that the identification of the organism can be verified. It also means that the sequence data can be linked to a whole host of other data related to the specimen, including traits, other sequences, environmental data, and geography. It is therefore critical that explicit, preferably machine readable, links exist between voucher specimens and sequence. However, such links do not exist in the databases of the International Nucleotide Sequence Database Collaboration (INSDC). If it were possible to create permanent bidirectional links between specimens and sequence it would not only make data more findable, but would also open new avenues for research. In the Biohackathon we built a semi-automated workflow to take specimen data from the Meise Herbarium and search for references to those specimens in the European Nucleotide Archive (ENA). We achieved this by matching data elements of the specimen and sequence together and by adding a “human-in-the-loop” process whereby possible matches could be confirmed. Although we found that it was possible to discover and match sequences to their vouchers in our collection, we encountered many problems of data standardization, missing data and errors. These problems make the process unreliable and unsuitable to rediscover all the possible links that exist. Ultimately, improved standards and training would remove the need for retrospective relinking of specimens with their sequence. Therefore, we make some tentative recommendations for how this could be achieved in the future.
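The matching step described above can be illustrated with a minimal sketch. It is not the workflow built at the Biohackathon: the `Record` fields, the normalization rule, the scoring, and the review threshold are all assumptions chosen for illustration, and the example values are made up. It compares a few data elements of a specimen record and a sequence record, then queues plausible pairs for human confirmation.

```python
import re
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical minimal record; real specimen and ENA records differ."""
    identifier: str
    scientific_name: str
    collector: str
    collector_number: str


def normalize(text: str) -> str:
    """Lowercase the text and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def match_score(specimen: Record, sequence: Record) -> float:
    """Return the fraction of compared fields that agree after normalization."""
    fields = ("scientific_name", "collector", "collector_number")
    hits = 0
    for name in fields:
        a = normalize(getattr(specimen, name))
        b = normalize(getattr(sequence, name))
        # Empty values never count as agreement (missing data is common).
        if a and a == b:
            hits += 1
    return hits / len(fields)


def triage(score: float, review_threshold: float = 0.5) -> str:
    """Queue plausible matches for a human to confirm; discard the rest."""
    return "review" if score >= review_threshold else "discard"


# Made-up example values, not real catalogue or accession data.
specimen = Record("SPEC-1", "Coffea arabica L.", "De Wilde, J.", "1234")
sequence = Record("SEQ-1", "coffea arabica l", "J. De Wilde", "1234")

score = match_score(specimen, sequence)
decision = triage(score)
```

In this example the collector names fail to agree only because the name parts are written in a different order, which is the kind of standardization problem the abstract reports; a human reviewer would still see the pair because the other two fields match.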
Original language: English
Status: Published - 1 Mar 2021


  • B110-bioinformatics


