One of the goals of both the AIRR Community and the iReceptor Project is to make it easy for researchers to find, share, compare, and reuse AIRR-seq data. The AIRR Community is setting standards around this (e.g. the MiAIRR Standard), and the iReceptor Project is implementing those standards to help make that sharing a reality. Our implementation consists of three fundamental components:
- a database software stack for storing AIRR-seq data (the iReceptor Turnkey Repository),
- a web-based API for querying such an AIRR-seq repository (the iReceptor Web API), and
- the iReceptor Scientific Gateway, a scientific web portal which can query one or more distributed AIRR-seq repositories to find, explore, and analyze the data in those repositories.
Rather than create a single, large repository, our goal is to federate many large, distributed repositories. The two driving reasons for the distributed repository approach are:
- It is not practical to expect a single central repository to scale to the size that will be required by the continually growing number of AIRR-seq studies and the amount of AIRR-seq data that those studies generate.
- Due to data privacy and ethics requirements, data stewards need to control and manage their own data and will be reluctant to upload it to an external repository.
Rather than a small number of large repositories, we envision many (tens or hundreds of) institutional AIRR-seq data repositories, each managed and controlled by a local data steward, connected together in what we call the "AIRR Data Commons".
The iReceptor Turnkey Repository is our effort to make installing and managing a database node in the AIRR Data Commons easy (or at least as easy as possible) for an individual researcher, a research lab, an institution, or a company. It consists of a database software stack (based on MongoDB), a web-service-based API that allows external users to query the repository, and a set of services that help users curate data in the repository. If you are technically oriented: all of this is packaged as Docker containers, which makes the overall solution easy to install and orchestrate.
Our goal is to make it easy for a research group to download and install the software, curate their data into their repository, and then integrate that repository into the AIRR Data Commons network of repositories. By using the iReceptor Turnkey Repository, your research lab will have both a local repository for its data and the ability to share that data by integrating the repository node into the AIRR Data Commons. It is then simple to use the iReceptor Scientific Gateway to perform queries across all of the data in the AIRR Data Commons, including the data in your own repository.
The data curated in the Turnkey Repository is driven largely by the recommendations of the AIRR Community. These include the AIRR Minimal Standard for AIRR-seq study metadata (MiAIRR) and the AIRR Data Representation Working Group's specification for AIRR rearrangement data. It is this data from the AIRR Data Commons that the iReceptor Scientific Gateway allows users to interactively explore and query.
If you are interested in downloading and installing the iReceptor Turnkey Repository, please visit the iReceptor Turnkey GitHub repository or email firstname.lastname@example.org.
The iReceptor Project runs a set of repositories that store AIRR-seq data and act as nodes in the AIRR Data Commons network of repositories. These repositories are federated at the level of the iReceptor Scientific Gateway, so they appear to the user as a single repository even though, behind the scenes, they exist as a cluster of separate repositories. This is one of the key features of the iReceptor architecture, and it is what allows us to scale to a very large number of AIRR-seq data sets. Collectively, we call these repositories the iReceptor Public Archive (IPA). To access the repositories, please visit the iReceptor Scientific Gateway.
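The federation idea described above can be illustrated with a minimal sketch. This is not the iReceptor Scientific Gateway's actual implementation or the iReceptor Web API: the `Repository` class, the `federated_query` function, and the record fields used here are hypothetical, and each repository is modelled as an in-memory list rather than a remote web service. The sketch only shows the general pattern of sending the same query to several independent repositories and presenting the merged answer as if it came from one.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Repository:
    """Hypothetical stand-in for one repository node (in-memory, no network)."""
    name: str
    records: List[Dict[str, str]]

    def query(self, field: str, value: str) -> List[Dict[str, str]]:
        # Return every record whose `field` equals `value`.
        return [rec for rec in self.records if rec.get(field) == value]


def federated_query(repositories: List[Repository],
                    field: str, value: str) -> List[Dict[str, str]]:
    """Send one query to every repository and merge the results.

    Each returned record is tagged with the name of the repository it
    came from, so the caller sees a single result list but can still
    trace where each record is stored.
    """
    def ask(repo: Repository) -> List[Dict[str, str]]:
        return [dict(rec, repository=repo.name)
                for rec in repo.query(field, value)]

    # Query the repositories concurrently; map() keeps the input order.
    with ThreadPoolExecutor(max_workers=max(1, len(repositories))) as pool:
        per_repo = list(pool.map(ask, repositories))

    # Flatten the per-repository batches into one list.
    return [rec for batch in per_repo for rec in batch]


if __name__ == "__main__":
    # Illustrative data only; not taken from any real repository.
    repos = [
        Repository("repo_a", [{"study_id": "S1", "locus": "IGH"},
                              {"study_id": "S2", "locus": "TRB"}]),
        Repository("repo_b", [{"study_id": "S3", "locus": "IGH"}]),
    ]
    for hit in federated_query(repos, "locus", "IGH"):
        print(hit)
```

In a real deployment each `query` call would presumably be an HTTP request to a repository's web API, and the caller would also need to handle timeouts and unreachable repositories; both are omitted here to keep the sketch short.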
Details on the provenance of the data stored in each of the repositories can be found in the Repository Provenance pages linked below: