One of the challenges in sharing and reusing data is that different vocabularies are used to organize, store, and disseminate information. When information is shared across repositories, it is critical that the vocabularies used to represent it mean the same thing. This allows the information to be consumed and processed, both manually and programmatically, with a correct understanding of its semantics.
This challenge is addressed by using controlled vocabularies in a standardized manner. Standards list the controlled vocabularies used to represent information in a specific field of knowledge. As the name suggests, the vocabulary is controlled: the terms used to represent information are standardized. In addition, each term in a controlled vocabulary is defined by a specific description of the information it represents.
Once the information in the data is organized and specified using controlled vocabularies, the data has higher fidelity. Shared data becomes more useful in practice and supports a more precise and accurate understanding of scientific knowledge, reducing the errors that arise from incorrect interpretation.
Controlled vocabulary used by iReceptor:
iReceptor uses the Adaptive Immune Receptor Repertoire (AIRR) community standards for both the study metadata and the annotated receptor sequence features. Data fields that cannot currently be represented by the standards are represented by iReceptor’s own fields, indicated by the “_ir” tag. The standards are developed by scientists with expertise in the AIRR field. Their work is published in the references listed below and can be followed here: AIRR community standards and MiAIRR standard.
The metadata of a study are captured and represented using the minimal standard fields determined by the AIRR community, available here: AIRR minimal standard data elements. iReceptor uses the “AIRR Formats WG field name” in the data elements table to represent the study metadata.
The controlled vocabulary for data representing annotated receptor sequence features is taken from the work of the AIRR data representation working group. The data fields of the different annotation tools (IMGT, MiXCR) are mapped to the standardized AIRR rearrangement controlled vocabulary.
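The mapping idea described above can be sketched in a few lines of Python. This is a minimal illustration, not iReceptor's actual mapping: the tool column names and their pairing with AIRR rearrangement field names are assumptions chosen for the example, and the helper `to_airr` is hypothetical. Values are copied unchanged; a real pipeline would likely also normalize them.

```python
# Illustrative sketch only. The tool column names below and their pairing
# with AIRR rearrangement fields are assumptions made for this example;
# they are not taken from iReceptor's real mapping.
AIRR_FIELD_MAP = {
    "mixcr": {
        "allVHitsWithScore": "v_call",
        "allDHitsWithScore": "d_call",
        "allJHitsWithScore": "j_call",
        "nSeqCDR3": "junction",
        "aaSeqCDR3": "junction_aa",
    },
    "imgt": {
        "V-GENE and allele": "v_call",
        "D-GENE and allele": "d_call",
        "J-GENE and allele": "j_call",
        "JUNCTION": "junction",
        "AA JUNCTION": "junction_aa",
    },
}


def to_airr(record, tool):
    """Rename tool-specific field names to AIRR field names (hypothetical helper).

    Returns two dicts: fields that have an AIRR name in the map, and
    fields that do not (left under their original tool-specific names).
    """
    field_map = AIRR_FIELD_MAP[tool]
    mapped, unmapped = {}, {}
    for name, value in record.items():
        if name in field_map:
            mapped[field_map[name]] = value
        else:
            unmapped[name] = value
    return mapped, unmapped


# Example: one row of (made-up) MiXCR output.
mixcr_row = {
    "allVHitsWithScore": "IGHV1-2*02(1200)",
    "aaSeqCDR3": "CARDYW",
    "cloneCount": "12",
}
mapped, unmapped = to_airr(mixcr_row, "mixcr")
print(mapped)    # fields now named with the shared vocabulary
print(unmapped)  # fields with no AIRR equivalent in this sketch
```

Keeping the per-tool maps in one table means that records from different annotation tools end up under the same field names, which is what allows them to be queried together. Fields without a standard name are returned separately here; how such fields are tagged in practice is left out of the sketch.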
In future work, iReceptor will take advantage of several well-developed ontologies to fill in the categorical values for the study data elements.
Breden F et al. Reproducibility and Reuse of Adaptive Immune Receptor Repertoire Data. Front Immunol 8:1418 (2017)
Rubelt F et al. AIRR Community Recommendations for Sharing Immune Repertoire Sequencing Data. Nat Immunol 18:1274 (2017)
Vander Heiden JA et al. AIRR Community Standardized Representations for Annotated Immune Repertoires. Front Immunol (2018)