The AIRR Data Commons and the MiAIRR Standard

One of the goals of both the AIRR Community and the iReceptor Project is to make it easy for researchers to find, share, compare, and reuse AIRR-seq data (antibody/B-cell and T-cell receptor repertoires). Two main initiatives developed by the AIRR Community support these goals. First, the MiAIRR Standard is a set of standards and protocols for curating and sharing these large and complex repertoire data sets. Second, the AIRR Data Commons (ADC) is a distributed system of AIRR-seq data repositories that follow these standards and therefore share a common data model, a common query language, and common interoperability formats for storing, querying, and downloading AIRR-seq data. The iReceptor Project provides a science gateway that makes that sharing a reality for our users. Our implementation consists of three fundamental components:

a database software stack for storing AIRR-seq data (the iReceptor Turnkey Repository),
a web-based API for querying such an AIRR-seq repository (the iReceptor Web API), and
the iReceptor Scientific Gateway, a scientific web portal which can query one or more distributed AIRR-seq repositories to find, explore, and analyze the data in those repositories.

Rather than create a single, large repository, our goal is to federate many large, distributed repositories. The two driving reasons for the distributed repository approach are:

It is not practical to expect a single central repository to scale to the size that will be required by the continually growing number of AIRR-seq studies and the amount of AIRR-seq data that those studies generate.
Due to data privacy and ethics requirements, data stewards need to control and manage their own data and will be reluctant to upload their data to an external repository.

Rather than a small number of large repositories, we envision many (tens or hundreds of) institutional AIRR-seq data repositories, each managed and controlled by a local data steward, connected together in what we call the "AIRR Data Commons".

AIRR Data Commons Repositories accessible through the iReceptor Gateway

iReceptor COVID-19 (iReceptor Turnkey)

In collaboration with the AIRR Community, and in response to the AIRR Community's call for sharing COVID-19 related AIRR-seq data, the iReceptor Project has deployed a set of iReceptor Turnkey Repositories, and the project team is curating public COVID-19 data on behalf of the Community. These repositories are queryable using the AIRR Data Commons Web API and are searchable using the iReceptor Gateway.
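As a rough illustration of what querying such a repository through the ADC API can look like, the sketch below builds a JSON request body using the ADC API filter structure (`op`, `content`, `field`, `value`). The repository base URL, the study identifier, and the helper function names are hypothetical placeholders. The `/airr/v1/repertoire` path follows the ADC API convention, but it should be checked against the repository you intend to query.

```python
import json

# Hypothetical placeholder; substitute the base URL of a real ADC repository.
REPOSITORY_URL = "https://repository.example.org"


def build_repertoire_query(field, value, op="="):
    """Return an ADC API style query body containing a single filter."""
    return {"filters": {"op": op, "content": {"field": field, "value": value}}}


def repertoire_endpoint(base_url):
    """Return the repertoire query URL under the ADC API path convention."""
    return base_url.rstrip("/") + "/airr/v1/repertoire"


# "study.study_id" is a MiAIRR field; the value is a made-up placeholder.
query = build_repertoire_query("study.study_id", "EXAMPLE-STUDY-ID")
body = json.dumps(query)

print(repertoire_endpoint(REPOSITORY_URL))
print(body)
```

The resulting body would typically be sent as an HTTP POST with a JSON content type. No request is issued here, so the sketch runs without network access.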

If you have or know of any public COVID-19 AIRR-seq data, please contact us at support@ireceptor.org. If you are interested in tracking the research on COVID-19, and in particular those studies that are producing AIRR-seq data, please follow along on the b-t.cr Wiki on COVID-19 public data. This area is evolving fast, so please let us know if there are any papers or data sets available.

For more information on the iReceptor COVID-19 repositories, please visit the COVID-19 page. Details on the provenance of the data stored in the AIRR COVID-19 repositories can be found in the Repository Provenance page.

The iReceptor Public Archive

The iReceptor Project runs a set of repositories that store AIRR-seq data and act as nodes in the AIRR Data Commons network of repositories. These repositories are federated at the level of the iReceptor Scientific Gateway, so they appear as a single repository even though, behind the scenes, they exist as a cluster of repositories. This is one of the key features of the iReceptor Architecture and is what allows us to scale repositories to store a very large number of AIRR-seq data sets. Collectively, we call these repositories the iReceptor Public Archive or IPA. To access the repositories, please visit the iReceptor Scientific Gateway.
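The federation idea can be sketched in a few lines: one query is sent to every repository, and the results are merged so that the caller sees a single result set. This is only an illustrative sketch, not the Gateway's actual implementation. The function `federated_query` and the stand-in repositories `repo_a` and `repo_b` are hypothetical, and the records are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def federated_query(repositories, query):
    """Send one query to every repository and merge the results.

    `repositories` maps a repository name to a callable that takes the query
    and returns a list of repertoire records (dicts). In a real gateway the
    callable would issue an HTTP request; here it is left abstract.
    """
    names = list(repositories)
    merged = []
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        # pool.map preserves the order of `names`, so output is deterministic.
        responses = pool.map(lambda name: repositories[name](query), names)
        for name, records in zip(names, responses):
            for record in records:
                # Tag each record with its source so provenance is not lost.
                merged.append({**record, "repository": name})
    return merged


# Two in-memory stand-ins for remote repositories (made-up data).
def repo_a(query):
    return [{"repertoire_id": "a-1"}, {"repertoire_id": "a-2"}]


def repo_b(query):
    return [{"repertoire_id": "b-1"}]


results = federated_query({"repo_a": repo_a, "repo_b": repo_b}, {"filters": None})
print(len(results))  # 3
```

Tagging each record with its source repository keeps provenance visible after the merge. A production system would also need timeouts and handling for repositories that fail to respond, which this sketch omits.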

On June 4, 2020, the iReceptor Platform transitioned to iReceptor v3.0. This release uses v1.3 of the AIRR Standards and v1.0 of the AIRR Data Commons API (ADC API). For more details on this transition, in particular its impact on data provenance, please see the iReceptor v3.0 Provenance page.

Details on the provenance of the data stored in the iReceptor Public Archive repositories can be found in the Repository Provenance page.

VDJServer Community Data Portal

The first integration of remote AIRR-seq repositories through the iReceptor Gateway was between the VDJServer Community Data Portal and the iReceptor Public Archive. This reflected a strong collaboration nurtured through AIRR Community Meetings and the AIRR Community Common Repository Working Group. VDJServer was established with funding from the National Institute of Allergy and Infectious Diseases (#1R01A1097403). VDJServer is a partner in the iReceptor+ Consortium (https://www.ireceptor-plus.com) with current support from the European Union’s Horizon 2020 Research and Innovation Program (#825821). High-Performance compute and data resources are provided by The Texas Advanced Computing Center (TACC) at The University of Texas at Austin.

Details on the provenance of the data stored in the VDJServer ADC Repository, as searchable from the iReceptor Gateway, can be found in the Repository Provenance page.

VDJBase AIRR-Seq repository (iReceptor Turnkey)

In February of 2021 the fourth AIRR Data Commons repository was added. Through a collaboration within the iReceptor Plus Project, a community portal hosted by the Yaari Lab and iReceptor teams was created to host AIRR-seq data as part of the AIRR Data Commons.

Details on the provenance of the data stored in the VDJBase ADC Repository, as searchable from the iReceptor Gateway, can be found in the Repository Provenance page.

sciReptor AIRR-Seq repository (iReceptor Turnkey)

In September of 2021 the fifth AIRR Data Commons repository was added. A collaboration through the iReceptor Plus Project and DKFZ in Germany led to the installation of an iReceptor Turnkey repository at DKFZ as part of the AIRR Data Commons.

Details on the provenance of the data stored in the sciReptor ADC Repository, as searchable from the iReceptor Gateway, can be found in the Repository Provenance page.

NICD AIRR-Seq repository (iReceptor Turnkey)

In February of 2022 the sixth AIRR Data Commons repository was added. A collaboration between the Scheepers group at NICD, South Africa, and the iReceptor team led to the installation of an iReceptor Turnkey repository at the NICD in South Africa as part of the AIRR Data Commons.

Details on the provenance of the data stored in the NICD ADC Repository, as searchable from the iReceptor Gateway, can be found in the Repository Provenance page.

University of Muenster AIRR-Seq repository (iReceptor Turnkey)

In March of 2022 the seventh AIRR Data Commons repository was added. A collaboration between the Schwab group at the University of Muenster, Germany, and the iReceptor team led to the installation of an iReceptor Turnkey repository at the University of Muenster as part of the AIRR Data Commons.

Details on the provenance of the data stored in the Schwab Lab ADC Repository, as searchable from the iReceptor Gateway, can be found in the Repository Provenance page.

Roche/King's College London repository (iReceptor Turnkey)

In June of 2022 the eighth AIRR Data Commons repository was added. A collaboration between Roche, King's College London, and the iReceptor team led to the installation of an iReceptor Turnkey repository hosted by iReceptor as part of the AIRR Data Commons.

Details on the provenance of the data stored in the Roche ADC Repository, as searchable from the iReceptor Gateway, can be found in the Repository Provenance page.

Type 1 Diabetes (T1D) repository (iReceptor Turnkey)

In January of 2023 the ninth AIRR Data Commons repository was added. A collaboration between the University of Colorado, the University of Florida, the Sugar Science group, and the iReceptor team led to the installation of an iReceptor Turnkey repository hosted by iReceptor as part of the AIRR Data Commons.

Details on the provenance of the data stored in the T1D ADC Repository, as searchable from the iReceptor Gateway, can be found in the Repository Provenance page.

The iReceptor Turnkey Repository Software

The iReceptor Turnkey Repository software is our effort to make installing and managing a database node in the AIRR Data Commons easy (or at least as easy as possible) for an individual researcher, a research lab, an institution, or a company. The iReceptor Turnkey Repository consists of a database software stack (based on MongoDB), a web-service-based API that allows external users to query the repository, and a set of services that help users curate data in the repository. For the technically oriented, this is all done using Docker containers, making it easy to install and orchestrate the overall solution.

Our goal is to make it easy for a research group to download and install the software, curate their data into their repository, and then integrate that repository into the AIRR Data Commons network of repositories. By using the iReceptor Turnkey Repository, your research lab will have access to both a local repository for your data as well as the ability to share that data by integrating your repository node into the AIRR Data Commons. In this manner, it would then be simple to use the iReceptor Scientific Gateway to perform queries across all of the data in the AIRR Data Commons, including the data in your own repository.

It is important to point out that the data curated in the Turnkey Repository is driven largely by the recommendations of the AIRR Community, which include the AIRR Minimal Standard for AIRR-seq study metadata (MiAIRR) and the AIRR Data Representation Working Group's specification for AIRR rearrangement data. It is this data, from the AIRR Data Commons, that the iReceptor Scientific Gateway allows users to interactively explore and query.
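To give a feel for the rearrangement data mentioned above, the sketch below reads a tiny tab-separated table that uses a handful of AIRR rearrangement field names (`sequence_id`, `v_call`, `j_call`, `junction_aa`, `productive`). The rows are made up for illustration and the helper `read_rearrangements` is hypothetical. The sketch assumes that boolean fields are encoded as `T` and `F`, and it covers only a small subset of the fields in the full specification.

```python
import csv
import io

# Made-up rows using a handful of AIRR rearrangement field names.
EXAMPLE_TSV = (
    "sequence_id\tv_call\tj_call\tjunction_aa\tproductive\n"
    "seq1\tIGHV1-69*01\tIGHJ4*02\tCARDYW\tT\n"
    "seq2\tIGHV3-23*01\tIGHJ6*02\tCAKGGW\tF\n"
)


def read_rearrangements(handle):
    """Read tab-separated rearrangement rows into a list of dicts."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        # Convert the assumed T/F encoding into a Python bool.
        row["productive"] = row["productive"] == "T"
        rows.append(row)
    return rows


rearrangements = read_rearrangements(io.StringIO(EXAMPLE_TSV))
productive_ids = [r["sequence_id"] for r in rearrangements if r["productive"]]
print(productive_ids)  # ['seq1']
```

For real data, the AIRR Community's `airr` Python package provides readers and validators for this format and is likely a better starting point than hand-written parsing.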

If you are interested in downloading and installing the iReceptor Turnkey Repository, please visit the iReceptor Turnkey GitHub repository or email support@ireceptor.org.