Interacting with the cloud cluster

To run jobs remotely, configure the preference file with the remote servers.

Run labdata run --help for help and labdata run list to list the available analyses.

Example of the compute section in the user_preferences.json file:

"compute": {
    "remotes":{"aws": {
        "scheduler":"slurm",
        "user":"ec2-user",
            "permission_key":"<PERMISSION FILE>.pem",
            "address":"<AWS PARALLEL CLUSTER ADDRESS>",
            "analysis_options": { "spks":{"queue":"gpu"},
                                  "detect":{"queue":"cpu-large"},
                                  "populate":{"queue":"cpu"}},
        "pre_cmds": ["export LABDATA_DELETE_FILES_AFTER_POPULATE=1",
                     "export LABDATA_OVERRIDE_DATAPATH=/scratch",
                     "export APPTAINER_BIND=/scratch,/shared"]
        },
        "hodgkin": {
        "scheduler":"slurm",
        "user":"joao",
            "permission_key":"hodgkin-couto.pem",
            "address":"hodgkin",
            "analysis_options": { "spks":{"queue":"normal"},
                                  "detect":{"queue":"normal"},
                                  "populate":{"queue":"normal"}},
        "pre_cmds": []
        }
    },
    "containers":{"local":"/Users/<USER>/labdata/containers",
                  "storage":"analysis"},
    "analysis": {"detect":"labdata.<PLUGIN>.<BASECOMPUTE>"},
    "default_target":"aws"
}

If you set default_target to aws, jobs will be submitted to the aws remote by default.

To monitor the jobs running on the queue, use labdata2 run queue. The command also accepts -t TARGET to check which jobs are running on a remote target; only SLURM schedulers are supported at the moment.

Automate data ingestion

Upload jobs do not delete data from the server, so it is possible to submit jobs on that server to populate tables and perform analyses.

To automate populating a user table, a repeated compute job can be added to the server crontab (Linux only). These automated jobs are specific to each user. To add an automated job:

  1. Open the user crontab: crontab -e
  2. Add the line */10 * * * * labdata2 run populate -t local --force-submit -- PLUGIN.TABLENAME -r completed_today -i labdata to the bottom of the file, then exit (if using the vim text editor, type :wq). The line runs every 10 minutes and calls the populate method for files that were uploaded less than 24 h ago.
  3. Don't forget to add a newline at the end of the crontab file.
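The steps above amount to a single crontab entry; a commented version is shown below (PLUGIN.TABLENAME is a placeholder that must be replaced with an actual table name):

```
# min   hour dom mon dow  command
# Every 10 minutes, populate PLUGIN.TABLENAME for files completed today.
*/10 * * * * labdata2 run populate -t local --force-submit -- PLUGIN.TABLENAME -r completed_today -i labdata
```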

Creating a compute task for a plugin

Compute tasks are useful for operations that are compute intensive, require raw data, or use specific resources (like preprocessing).

Any plugin can have one or multiple compute tasks.
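The analysis entry in the preferences above maps a task name to a class path (labdata.<PLUGIN>.<BASECOMPUTE>), which suggests a compute task is a Python class. The sketch below is a hypothetical illustration only; the class name, attributes, and method signatures are assumptions, not the actual labdata API:

```python
# Hypothetical sketch of a plugin compute task; the real labdata
# base class and method signatures may differ.
class DetectCompute:
    # Name used on the command line, e.g. `labdata run detect`.
    name = "detect"
    # Default queue; can be overridden per remote via "analysis_options".
    queue = "cpu-large"

    def __init__(self, session_keys):
        # Keys identifying the sessions this job should process.
        self.session_keys = session_keys

    def compute(self):
        """Run the compute-intensive step on raw data staged locally
        (e.g. under LABDATA_OVERRIDE_DATAPATH)."""
        results = []
        for key in self.session_keys:
            # Placeholder for the actual detection analysis.
            results.append({"session": key, "status": "done"})
        return results
```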

Debugging AWSParallelCluster jobs

To request an interactive job (note that there are very few circumstances where you need this): srun -p gpu-large --exclusive --pty bash

To check why a node is not starting or is marked down: tail /var/log/slurmctld.log

To return a down node to the idle state: sudo /opt/slurm/bin/scontrol update nodename=cpu-dy-m6id-2xlarge-4 state=idle
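Beyond the commands above, a few standard SLURM commands help when diagnosing a stuck partition. These are generic SLURM, not labdata-specific; the node name is an example from the cluster above:

```shell
# List partitions and node states (down/drained nodes show up here).
sinfo
# Show queued and running jobs, with reasons for pending ones.
squeue -l
# Inspect a single node, including its State and Reason fields.
scontrol show node cpu-dy-m6id-2xlarge-4
```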

Building containers for compute tasks

To build a container:

``labdata build-container <container-path-to.sdef>``
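A minimal Apptainer definition file to pass to the command above; the base image and installed packages here are placeholders for whatever the compute task actually needs:

```
Bootstrap: docker
From: python:3.11-slim

%post
    # Install the dependencies the compute task needs (placeholder).
    pip install numpy

%runscript
    exec python "$@"
```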

Exporting data between projects

Exporting data between projects should be done with the help of an administrator. In some cases, one may want to share a subset of a project with collaborators; the easiest way to do this is to create a separate project and migrate only the required data.

Opening a jupyter lab instance on the AWSParallelCluster instance

To launch a job: srun -p <partition> --exclusive --pty bash

To start the jupyter notebook, open a shell inside a container: apptainer shell <path to an apptainer container>

Create a tunnel through the headnode: ssh -L 8898:<name of the compute node>:8888 -i "cluster.pem" ec2-user@<address to the cluster head node> -N

Then open a browser and point it to localhost:8898/lab?token=...
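End to end, the steps above look like the following. The port numbers match the tunnel above; the jupyter lab launch flags are the usual jupyter defaults, and launching it from inside the container shell is an assumption about how the container is set up:

```shell
# 1. On the head node: get an interactive shell on a compute node.
srun -p <partition> --exclusive --pty bash

# 2. On the compute node: open the container and start jupyter inside it.
apptainer shell <path to an apptainer container>
jupyter lab --no-browser --ip=0.0.0.0 --port=8888

# 3. On your own machine: tunnel to the compute node through the head node.
ssh -L 8898:<name of the compute node>:8888 -i "cluster.pem" \
    ec2-user@<address to the cluster head node> -N
```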