The AIRR Data Commons and the MiAIRR Standard

One of the goals of both the AIRR Community and the iReceptor Project is to make it easy for researchers to find, share, compare, and reuse AIRR-seq data (antibody/B-cell and T-cell receptor repertoires). Two main initiatives developed by the AIRR Community support these goals. First, the MiAIRR Standard is a set of standards and protocols for curating and sharing these large and complex repertoire data sets. Second, the AIRR Data Commons is a distributed system of AIRR-seq data repositories that follow these standards and therefore share a common data model, a common query language, and common interoperability formats for storing, querying, and downloading AIRR-seq data. The iReceptor Project provides a science gateway that makes that sharing a reality for our users. Our implementation consists of three fundamental components:

  1. a database software stack for storing AIRR-seq data (the iReceptor Turnkey Repository),
  2. a web-based API for querying such an AIRR-seq repository (the iReceptor Web API), and
  3. the iReceptor Scientific Gateway, a scientific web portal which can query one or more distributed AIRR-seq repositories to find, explore, and analyze the data in those repositories.

Rather than create a single, large repository, our goal is to federate many large, distributed repositories. The two driving reasons for the distributed repository approach are:

  1. It is not practical to expect a single central repository to scale to the size that will be required by the continually growing number of AIRR-seq studies and the amount of AIRR-seq data that those studies generate.
  2. Due to data privacy and ethics requirements, data stewards need to control and manage their own data and will be reluctant to upload their data to an external repository.

Rather than a small number of large repositories, we envision many (tens or hundreds of) institutional AIRR-seq data repositories, each managed and controlled by a local data steward, connected together in what we call the "AIRR Data Commons".

AIRR Data Commons Repositories accessible through the iReceptor Gateway

AIRR COVID-19

In collaboration with the AIRR Community, and in response to the AIRR Community's call for sharing COVID-19 related AIRR-seq data, the iReceptor Project has prepared an iReceptor Turnkey Repository, and the project team is curating public COVID-19 data on behalf of the Community. Currently, two iReceptor COVID-19 repositories are available online at covid19-1.ireceptor.org and covid19-2.ireceptor.org. These repositories are queryable using the AIRR Data Commons Web API and are searchable using the iReceptor Gateway.
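
As a rough illustration of what querying one of these repositories through the API might look like, the Python sketch below builds a repertoire query. The /airr/v1 base path, the repertoire endpoint, and the filters/op/content JSON layout reflect our reading of the ADC API specification and should be checked against it. The helper names (build_filter, build_request, run_query) and the example search on study.study_title are illustrative assumptions, not part of the iReceptor software.

```python
import json
import urllib.request

# Assumption: ADC API repositories expose their endpoints under this base path.
ADC_BASE_PATH = "/airr/v1"


def build_filter(field, value, op="="):
    """Return an ADC API style query body with a single filter clause."""
    return {"filters": {"op": op, "content": {"field": field, "value": value}}}


def build_request(host, endpoint, query):
    """Prepare (but do not send) a JSON POST request for one repository."""
    url = f"https://{host}{ADC_BASE_PATH}/{endpoint}"
    body = json.dumps(query).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def run_query(host, endpoint, query, timeout=60):
    """Send the query and return the decoded JSON response."""
    request = build_request(host, endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, run_query("covid19-1.ireceptor.org", "repertoire", build_filter("study.study_title", "COVID", op="contains")) would ask the first COVID-19 repository for repertoires whose study title contains "COVID", assuming the repository is reachable and accepts that filter.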

If you have or know of any public COVID-19 AIRR-seq data, please contact us at support@ireceptor.org. If you are interested in tracking the research on COVID-19, and in particular those studies that are producing AIRR-seq data, please follow along on the b-t.cr Wiki on COVID-19 public data. This area is evolving fast, so please let us know if there are any papers or data sets available.

For more information on the iReceptor COVID-19 repositories, please visit the COVID-19 page.

The iReceptor Public Archive

The iReceptor Project runs a set of repositories that store AIRR-seq data and act as nodes in the AIRR Data Commons network of repositories. These repositories are federated at the level of the iReceptor Scientific Gateway so that they appear as a single repository, although behind the scenes they exist as a cluster of repositories. This is one of the key features of the iReceptor Architecture and is what allows us to scale repositories to store a very large number of AIRR-seq data sets. Collectively, we call these repositories the iReceptor Public Archive or IPA. To access the repositories, please visit the iReceptor Scientific Gateway.
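
The federation idea can be sketched in a few lines of Python: send the same query to every repository, tolerate repositories that fail, and merge the answers into one list. This is a minimal sketch and not the Gateway's actual implementation. The function name federated_query, the injected fetch callable, and the "repository" tag added to each result are all hypothetical; the "Repertoire" response key follows our reading of the ADC API response format.

```python
from concurrent.futures import ThreadPoolExecutor


def federated_query(hosts, query, fetch):
    """Send one query to many repositories and merge the repertoires.

    `fetch(host, query)` is a caller-supplied function (hypothetical here)
    that returns the decoded JSON response of a single repository.
    Returns (merged_repertoires, errors_by_host).
    """

    def ask(host):
        # A failing repository must not break the whole federated search.
        try:
            response = fetch(host, query)
            return host, response.get("Repertoire", []), None
        except Exception as exc:
            return host, [], str(exc)

    merged, errors = [], {}
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        # pool.map yields results in the same order as `hosts`.
        for host, repertoires, error in pool.map(ask, hosts):
            if error is not None:
                errors[host] = error
            for repertoire in repertoires:
                # Record which repository each repertoire came from.
                merged.append({"repository": host, **repertoire})
    return merged, errors
```

Keeping the source repository on every merged record preserves provenance, so a user can see a single result list while still knowing which data steward holds each data set.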

On June 4, 2020, the iReceptor Platform transitioned to iReceptor v3.0. This release adopts v1.3 of the AIRR Standards and v1.0 of the AIRR Data Commons API (ADC API). For more details on this transition, in particular its impact on data provenance, please see the iReceptor v3.0 Provenance page.

Details on the provenance of the data stored in the iReceptor Public Archive repositories can be found on the Repository Provenance page.

VDJServer

The first integration of remote AIRR-seq repositories through the iReceptor Gateway was between VDJServer and the iReceptor Public Archive. This reflected a strong collaboration nurtured through AIRR Community Meetings and the AIRR Community Common Repository Working Group. VDJServer is funded by a National Institute of Allergy and Infectious Diseases research grant (#1R01A1097403). The project is led by The University of Texas Southwestern (UTSW) Medical Center in collaboration with the J. Craig Venter Institute and Yale University. The Texas Advanced Computing Center (TACC) at The University of Texas at Austin leads the cyberinfrastructure implementation, including the high-performance computing (HPC) systems, storage, and software solutions.

The iReceptor Turnkey Repository Software

The iReceptor Turnkey Repository software is our effort to make installing and managing a database node in the AIRR Data Commons easy (or at least as easy as possible) for an individual researcher, a research lab, an institution, or a company. The iReceptor Turnkey Repository consists of a database software stack (based on MongoDB), a web-service-based API that allows external users to query that repository, and a set of services that help users curate data in the repository. For the technically oriented, all of this is packaged as Docker containers, which makes the overall solution easy to install and orchestrate.

Our goal is to make it easy for a research group to download and install the software, curate their data into their repository, and then integrate that repository into the AIRR Data Commons network of repositories. By using the iReceptor Turnkey Repository, your research lab will have both a local repository for your data and the ability to share that data by integrating your repository node into the AIRR Data Commons. It is then simple to use the iReceptor Scientific Gateway to perform queries across all of the data in the AIRR Data Commons, including the data in your own repository.

It is important to point out that the data curated in the Turnkey Repository is driven largely by the recommendations of the AIRR Community, which include the AIRR Minimal Standard for AIRR-seq study metadata (MiAIRR) and the AIRR Data Representation Working Group's specification for AIRR rearrangement data. It is this data, from the AIRR Data Commons, that the iReceptor Scientific Gateway allows users to interactively explore and query.
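
To give a feel for the rearrangement data mentioned above, the following Python sketch reads a tab-separated rearrangement file using only the standard library. It is a simplified illustration under stated assumptions: the column names shown are a small subset of the fields defined in the AIRR rearrangement specification, the "T"/"F" encoding of the productive flag follows our reading of that specification, and the names read_rearrangements and EXPECTED_COLUMNS are hypothetical.

```python
import csv
import io

# Assumption: a small illustrative subset of AIRR rearrangement columns.
EXPECTED_COLUMNS = ("sequence_id", "productive", "v_call", "j_call", "junction_aa")

# Assumption: booleans are written as "T" / "F" in the TSV file.
_BOOLEANS = {"T": True, "F": False}


def _convert(row):
    """Return a copy of the row with `productive` as True, False, or None."""
    converted = dict(row)
    converted["productive"] = _BOOLEANS.get(row["productive"])
    return converted


def read_rearrangements(handle):
    """Check the header, then lazily yield one dict per rearrangement."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError("missing columns: " + ", ".join(missing))
    return (_convert(row) for row in reader)
```

Because the rows are produced lazily, a caller can stream through a large file, for example keeping only rows where productive is True, without loading the whole repertoire into memory.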

If you are interested in downloading and installing the iReceptor Turnkey Repository, please visit the iReceptor Turnkey GitHub repository or email support@ireceptor.org.