The Analysis Jobs that the iReceptor Gateway runs on your behalf are extremely complex, and long run times can therefore be quite common. Remember, analysis jobs are batch, offline jobs, targeted at performing complex calculations on very large data sets. You should not expect them to be interactive in nature. Log out, go have a coffee, the jobs will still be there when you come back - don't worry, they will be done soon!
Here are some reasons why your jobs might be taking a long time...
Data sets of interest can be large!
Analysis Jobs can be run on any data that is discovered in the AIRR Data Commons. The first step in an Analysis Job is for the iReceptor Gateway to federate that data into a single data analysis unit. This can take a substantial amount of time, in particular if the data set consists of tens or hundreds of millions of Rearrangements, Clones, or Cells and/or is being federated from many repositories. Federating and downloading such data sets can take many hours. If your job says it is "Federating Data", then the iReceptor Gateway is actively downloading data from the AIRR Data Commons repositories - please be patient!
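To give a concrete sense of what the federation step involves, here is a minimal sketch of querying two AIRR Data Commons repositories directly and merging the results. The repository URLs and the repertoire_id value are placeholders, and the code illustrates the general ADC API query pattern rather than the Gateway's internal implementation.

```python
import requests

# Placeholder repository base URLs -- any ADC-compliant repository could be used here.
REPOSITORIES = [
    "https://repository-one.example.org/airr/v1",
    "https://repository-two.example.org/airr/v1",
]

# A simple ADC API query: fetch rearrangements for one repertoire, 1000 at a time.
query = {
    "filters": {
        "op": "=",
        "content": {"field": "repertoire_id", "value": "EXAMPLE-REPERTOIRE-ID"},
    },
    "size": 1000,
    "from": 0,
}

federated = []
for base_url in REPOSITORIES:
    # Each repository is queried independently; the results are then merged
    # ("federated") into a single data set for analysis.
    response = requests.post(f"{base_url}/rearrangement", json=query, timeout=300)
    response.raise_for_status()
    federated.extend(response.json().get("Rearrangement", []))

print(f"Federated {len(federated)} rearrangements from {len(REPOSITORIES)} repositories")
```

For a real data set this loop would page through many millions of records from each repository, which is why the "Federating Data" step can take hours.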
Many people may be downloading data!
In order to maintain performance, the iReceptor Gateway only allows two active downloads at a time: one "small" download and one "large" download can happen concurrently. If many users are trying to federate data sets for download or other Analysis Jobs, then your download may sit in a "Waiting" state before it can start federating data.
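The exact scheduling logic belongs to the Gateway, but the general idea can be illustrated with a small sketch: one slot for "small" downloads and one for "large" downloads, with any additional requests waiting for a free slot. The job names and sleep time below are purely illustrative.

```python
import threading
import time

# Illustrative only: one slot for "small" downloads and one for "large" ones,
# mirroring the one-small-plus-one-large concurrency policy described above.
small_slot = threading.Semaphore(1)
large_slot = threading.Semaphore(1)

def download(job_name, size_class):
    slot = small_slot if size_class == "small" else large_slot
    with slot:  # jobs wait here ("Waiting") until a slot of their class is free
        print(f"{job_name} ({size_class}): federating data...")
        time.sleep(1)  # stand-in for the actual download
        print(f"{job_name} ({size_class}): done")

jobs = [
    threading.Thread(target=download, args=(f"job-{i}", cls))
    for i, cls in enumerate(["small", "large", "small", "large"])
]
for t in jobs:
    t.start()
for t in jobs:
    t.join()
```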
Data transfers to compute resources can take time!
Staging your federated data from the iReceptor Gateway to the compute resource being used for your analysis can also take time. However, since this is typically a single large file transfer, this step is usually much faster than extracting and federating the data in the "Federating Data" step. If your job is in a "Staging Inputs" state, this is what it is doing!
Analysis compute resources can be busy!
Although the iReceptor Platform makes use of a large compute provider (the Digital Research Alliance of Canada), we do have a fixed resource allocation with the Alliance. In addition, because this is a shared, national compute resource, the overall resource itself can be very busy. The iReceptor Gateway uses the Alliance's batch queueing system (called Slurm) to queue jobs for execution. Once your data and your analysis application have been staged to the compute resource, your job is "Queued". Depending on how busy the Alliance platform is, as well as how many jobs have recently been submitted by iReceptor Gateway users, your job may run almost immediately or it may sit in the queue for some time (possibly hours). Again, please be patient. Note that you can have many jobs in a "Queued" state at the same time, and you may also have many jobs in the "Running" state at the same time (jobs run in parallel).
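For readers familiar with Slurm, the "Queued" and "Running" states map onto ordinary Slurm job states (PENDING, RUNNING, and so on). The sketch below shows how such a state could be looked up with the standard squeue command; the Gateway performs this kind of check for you, and the job ID shown is hypothetical.

```python
import subprocess

def slurm_job_state(job_id: str) -> str:
    """Return the Slurm state (e.g. PENDING, RUNNING) for a given job ID."""
    # `squeue -j <id> -h -o %T` prints just the state of that job, no header.
    result = subprocess.run(
        ["squeue", "-j", job_id, "-h", "-o", "%T"],
        capture_output=True, text=True,
    )
    state = result.stdout.strip()
    return state if state else "not in the queue (finished or unknown job ID)"

print(slurm_job_state("1234567"))  # hypothetical Slurm job ID
```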
Analysis Jobs can take a long time to run!
Depending on how much data your job is analyzing, combined with the computational complexity of the job, an individual job may run for many hours. If your job is in a "Running" state, then the job is actively executing on a remote node of the Alliance compute resource. We currently have a 96 hour limit for any single job, but we expect most, if not all, jobs to take a fraction of that time.
Analysis output can be quite large!
The output of each Analysis Job is staged back to the iReceptor Gateway once the job is done, where you as a user can browse the output of your job. Just like the input data, the output from analysis jobs can be quite large, and therefore it can take some time for the analysis output to be archived and moved back to the iReceptor Gateway. If your job is in an "Archiving" state, the job is actively copying the analysis output back to the iReceptor Gateway for you to assess, analyze, and download.
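As a rough illustration of what the "Archiving" step involves, the sketch below bundles an output directory into a single compressed archive before it would be transferred. The directory and archive names are placeholders, not paths used by the Gateway.

```python
import os
import tarfile

# Illustrative only: bundle an analysis output directory into a single
# compressed archive, which is the kind of work the "Archiving" step does
# before results are copied back to the Gateway. The paths are placeholders.
output_dir = "analysis_output"            # hypothetical job output directory
archive_path = "analysis_output.tar.gz"   # hypothetical archive name

with tarfile.open(archive_path, "w:gz") as archive:
    archive.add(output_dir, arcname=os.path.basename(output_dir))

print(f"Wrote {archive_path} ({os.path.getsize(archive_path)} bytes)")
```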