Questions about DNA-derived data

Migrated from Slack

  1. How can I document genome size estimates from:
  • flow cytometry (from pleopod and/or gill tissue of snap frozen organisms)
  • Feulgen imaging (from pleopod or tail tissue from both snap frozen samples and ethanol preserved samples)
  2. Which fields should I use to publish ddRAD sequencing data? Do we have an example dataset like this please?

@Saara Hi Yi-Ming! I have to admit these are questions that I haven’t come across before. At the moment the use of the extension is described for metabarcoding and qPCR only, so I don’t have direct answers. I would also like to hear what type of information is most important for this data. Are these questions from a data provider? Would they be interested in, and available for, looking into this together?

Possibly we can look into what other standards (e.g. GSC) provide for this data, and see if we can integrate those fields into the DNA-derived data extension.

@ymgan Thank you so much @Saara! You asked a good question. Frankly, I don’t know what type of information is most important for this data, nor which of it is important for OBIS. These questions are from me, because our data provider is not familiar with Darwin Core. It is my first time getting this type of information. The data is about specimens (known taxa) with COI sequences (for phylogeny and haplotype networks), and a subset of them with ddRAD sequencing to assess population structure and connectivity of different populations of certain cryptic species around Antarctica. Sure, I can connect them with you. Thanks for the offer, I hope we can meet and find a solution together.

@sformel For the genome size estimates, I would default to EMoF using C-value (or genome size) as the measurementType. Then I would encourage publication to Genome Size DBs like:
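The EMoF suggestion above could be sketched roughly as follows. This is only an illustration in Python: the column names are standard eMoF terms, but the occurrenceID, the values, the choice of picograms as the unit, and the use of measurementRemarks for tissue and preservation are all assumptions made for the example, not an agreed OBIS convention.

```python
import csv
import io

# Standard eMoF column names; which ones to fill for genome size is an assumption.
EMOF_FIELDS = [
    "occurrenceID",
    "measurementType",
    "measurementValue",
    "measurementUnit",
    "measurementMethod",
    "measurementRemarks",
]


def genome_size_row(occurrence_id, c_value_pg, method, tissue, preservation):
    """Build one eMoF row for a genome size estimate (hypothetical helper)."""
    return {
        "occurrenceID": occurrence_id,
        "measurementType": "C-value",
        "measurementValue": f"{c_value_pg:.2f}",
        "measurementUnit": "pg",  # assumed unit (picograms)
        "measurementMethod": method,
        "measurementRemarks": f"tissue: {tissue}; preservation: {preservation}",
    }


def to_tsv(rows):
    """Serialize eMoF rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=EMOF_FIELDS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Invented example: one specimen measured with both methods.
rows = [
    genome_size_row("urn:example:specimen-001", 3.4,
                    "flow cytometry", "pleopod", "snap frozen"),
    genome_size_row("urn:example:specimen-001", 3.6,
                    "Feulgen imaging", "tail", "ethanol preserved"),
]

emof_tsv = to_tsv(rows)
print(emof_tsv)
```

Keeping one row per estimate means the method, tissue, and preservation travel with each value, so estimates from flow cytometry and Feulgen imaging on the same organism stay distinguishable.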

For ddRAD, the most important thing is that the data be cross-linked with INSDC archives of the assembled seqs. It also sounds like organismID, or other organism-specific metadata, will be important for downstream users to reconstruct phylogenies and networks.

IMO, the conversation about what processed information to share for ddRAD is probably very similar to the one for MAGs. The conversation about what metadata should be shared in biodiversity platforms is still evolving (i.e. we kicked that can down the road): Publishing DNA-derived data through biodiversity data platforms

Would you mind also raising these in the GBIF MDT discourse? It’s cool that you have examples to work with; that will help us figure it out: For matters relating to the Metabarcoding Data Toolkit or the datatype in general - Data Publishing - GBIF community forum

Questions that came up in our discussion with the data provider:

  1. How to link two different genetic data types that are derived from the same organism (COI data and ddRAD data). Can we link two rows of the DNA-derived data to the same occurrenceID, or do we need separate occurrences?

  2. Suggestion to add a high-level category (data type) field to the DNA-derived data extension (i.e. ASV/OTU/metagenome/ddRAD/mitogenome/long-read etc…). This would help with data access later.
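To make the first option in question 1 concrete, here is a small Python sketch of what "two rows of the DNA-derived data linked to the same occurrenceID" would look like structurally. It does not settle which option is preferred or how the platforms would handle it. The IDs and field values are invented, `target_gene` and `seq_meth` are used only as illustrative extension columns, and the sequencing methods shown are assumptions.

```python
from collections import defaultdict

# One occurrence core record (hypothetical ID).
occurrences = [
    {"occurrenceID": "urn:example:specimen-001",
     "basisOfRecord": "PreservedSpecimen"},
]

# Two extension rows that point at the same core record.
# Values are invented; the ddRAD row has no single target gene.
dna_rows = [
    {"occurrenceID": "urn:example:specimen-001",
     "target_gene": "COI", "seq_meth": "Sanger (assumed)"},
    {"occurrenceID": "urn:example:specimen-001",
     "target_gene": "", "seq_meth": "Illumina, ddRAD library (assumed)"},
]


def group_by_occurrence(rows):
    """Collect extension rows under the occurrenceID they reference."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["occurrenceID"]].append(row)
    return dict(grouped)


linked = group_by_occurrence(dna_rows)

# Extension rows whose occurrenceID has no matching core record.
core_ids = {occ["occurrenceID"] for occ in occurrences}
orphans = [oid for oid in linked if oid not in core_ids]

for occ in occurrences:
    n = len(linked.get(occ["occurrenceID"], []))
    print(f"{occ['occurrenceID']}: {n} DNA-derived rows")
```

The alternative in the question, separate occurrences per data type, would instead give each row its own occurrenceID and rely on something like a shared organismID to tie them back to the same animal.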
