Get access to the paper's data

The raw data from most papers (certainly those curated by the iReceptor Team) are stored in NCBI's Sequence Read Archive (SRA) or its European counterpart at the EBI, the European Nucleotide Archive (ENA). These are the two repositories where researchers most commonly deposit the sequence data behind a publication. Researchers typically state where their data are deposited in a "Data Access" or similar section of the paper, giving the accession number of the data set (e.g. SRP042205 or ERP002120). If you have the accession number, you can usually reach the data directly with a URL of the following form: https://www.ncbi.nlm.nih.gov/sra/SRP042205
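
As a minimal sketch of the URL pattern above, the following Python helper turns an accession number into a landing-page URL. The NCBI pattern is the one quoted in the text; the ENA browser pattern, the function name accession_url and the accession regular expression are assumptions added for illustration, and accessions from other archives (for example DDBJ) are deliberately rejected rather than guessed at.

```python
import re

# The NCBI pattern is quoted in the text above; the ENA browser pattern is an
# assumption added for illustration.
NCBI_SRA_URL = "https://www.ncbi.nlm.nih.gov/sra/{accession}"
ENA_BROWSER_URL = "https://www.ebi.ac.uk/ena/browser/view/{accession}"

# Study (P), sample (S), experiment (X) or run (R) accessions,
# e.g. SRP042205 or ERP002120.
_ACCESSION_RE = re.compile(r"^(SR|ER)[PSXR]\d+$")


def accession_url(accession: str) -> str:
    """Return a landing-page URL for an SRA (SR*) or ENA (ER*) accession."""
    acc = accession.strip().upper()
    match = _ACCESSION_RE.match(acc)
    if match is None:
        raise ValueError(f"not a recognised SRA/ENA accession: {accession!r}")
    template = NCBI_SRA_URL if match.group(1) == "SR" else ENA_BROWSER_URL
    return template.format(accession=acc)


if __name__ == "__main__":
    for acc in ("SRP042205", "ERP002120"):
        print(acc, accession_url(acc))
```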

Once you have found the sequence data in a repository, you will need to download each of the sequence files for the study in question. In addition, you will need to map each downloaded file to metadata from the publication, using the appropriate NCBI BioProject and BioSample pages.
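
One way to get a list of the files to download, together with the sample each file belongs to, is a tabular file report from the repository. The sketch below assumes the ENA Portal API "filereport" endpoint and the field names run_accession, sample_accession and fastq_ftp; treat these as assumptions and check them against the ENA documentation before relying on them. The function names filereport_url and parse_filereport are hypothetical. Parsing is kept separate from the network request so the report can also be saved from a browser and parsed offline.

```python
import csv
import io
from urllib.parse import urlencode

# Assumed endpoint of the ENA Portal API file report.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(study_accession,
                   fields=("run_accession", "sample_accession", "fastq_ftp")):
    """Build the URL of a tab-separated run/file report for one study."""
    query = urlencode({
        "accession": study_accession,
        "result": "read_run",
        "fields": ",".join(fields),
        "format": "tsv",
    })
    return f"{ENA_FILEREPORT}?{query}"


def parse_filereport(tsv_text):
    """Turn the report text into one dict per run.

    Several files for one run (e.g. paired-end reads) are assumed to be
    separated by ';' in the fastq_ftp column.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        files = [f for f in (row.get("fastq_ftp") or "").split(";") if f]
        rows.append({
            "run": row["run_accession"],
            "sample": row["sample_accession"],
            "files": files,
        })
    return rows


if __name__ == "__main__":
    print(filereport_url("ERP002120"))
```

The text of the report could then be fetched with urllib.request.urlopen(filereport_url(...)) and passed to parse_filereport; runs that come back with an empty file list are worth checking by hand on the repository page.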

Using the paper's NCBI/EBI sequence data repository:

  • Match the metadata in the NCBI/EBI repository to metadata in the publication
  • Match the samples in the paper to BioSample and SRA IDs
  • If a correct mapping from the publication to the files in the NCBI/EBI repository cannot be determined, contact the authors to resolve the issues
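
The matching steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function map_samples, the record keys "label", "biosample" and "run", and the rule of comparing labels after lower-casing and dropping punctuation are all hypothetical, and real studies often need manual inspection. The useful part is the two "unmatched" lists, which are the cases to raise with the authors.

```python
from collections import defaultdict


def normalize(label):
    """Lower-case a sample label and keep only letters and digits."""
    return "".join(ch for ch in label.lower() if ch.isalnum())


def map_samples(paper_labels, repo_records):
    """Match sample labels from the paper to repository records.

    paper_labels: sample names as written in the publication.
    repo_records: dicts with the (hypothetical) keys "label", "biosample"
                  and "run", one per sequencing run in the repository.
    Returns (mapping, unmatched_paper, unmatched_repo).
    """
    by_label = defaultdict(list)
    for rec in repo_records:
        by_label[normalize(rec["label"])].append(rec)

    mapping, unmatched_paper = {}, []
    for label in paper_labels:
        hits = by_label.get(normalize(label), [])
        if hits:
            # One paper sample may correspond to several runs.
            mapping[label] = [(r["biosample"], r["run"]) for r in hits]
        else:
            unmatched_paper.append(label)

    matched = {normalize(label) for label in mapping}
    unmatched_repo = [r["run"] for r in repo_records
                      if normalize(r["label"]) not in matched]
    return mapping, unmatched_paper, unmatched_repo
```

An empty unmatched_paper and unmatched_repo means every sample in the paper has at least one run and every run has been claimed by a sample; anything left in either list needs a manual check or a question to the authors.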

Once these mappings are defined, store them in the appropriate MiAIRR fields of the Metadata CSV file.