Installation

Installation on a local computer

To use labdata, you need to install a set of Python scripts on your local computer and configure a preference file.

To install labdata on a local computer, install it from pip with pip install labdata, or with pip install labdata[dashboard] to include the GUI functionality (in zsh, quote the argument: pip install "labdata[dashboard]").

After installation, run labdata --help from a terminal. This creates a preference file at <USER_HOME>/labdata/user_preferences.json.

Edit this .json file to add the database and S3 bucket details. Ask the database administrator for the database address.
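Because the preference file is plain JSON, it can also be edited from Python. The helper below is a minimal sketch, assuming the file sits at <USER_HOME>/labdata/user_preferences.json (where labdata --help creates it). The functions are hypothetical helpers, not part of labdata, and they merge nested sections so that existing entries are kept.

```python
import json
from pathlib import Path
from typing import Optional


def preferences_path(home: Optional[Path] = None) -> Path:
    """Return <USER_HOME>/labdata/user_preferences.json, where labdata --help creates the file."""
    home = Path.home() if home is None else Path(home)
    return home / "labdata" / "user_preferences.json"


def deep_update(target: dict, updates: dict) -> dict:
    """Recursively merge `updates` into `target`, keeping keys that `updates` does not mention."""
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            deep_update(target[key], value)
        else:
            target[key] = value
    return target


def update_preferences(updates: dict, path: Optional[Path] = None) -> dict:
    """Load the preference file, merge `updates` into it, and write it back."""
    path = preferences_path() if path is None else Path(path)
    prefs = json.loads(path.read_text())
    deep_update(prefs, updates)
    path.write_text(json.dumps(prefs, indent=4))
    return prefs
```

For example, update_preferences({"database": {"host": "<DATABASE_ADDRESS>"}}) would set a single field; the "database" and "host" key names here are hypothetical placeholders, so check the generated file for the real ones.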

Advanced installation (for administrators)

Upload server

To upload data from experimental computers, we recommend using an upload server.

Cloud configuration (AWS)

Generate and copy the container files. The cluster runs tasks in Apptainer containers, which are defined as .sdef files in the containers folder. To build a container, run labdata2 build-container <CONTAINER SDEF FILE> --upload; the --upload option also copies the container to the analysis data storage.
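When several containers need to be rebuilt, the build command can be run once per definition file in the containers folder. A minimal sketch, assuming the labdata2 executable is on the PATH and accepts the arguments shown above; build_commands and build_all_containers are hypothetical helpers, and with dry_run=True the commands are only printed.

```python
import shlex
import subprocess
from pathlib import Path


def build_commands(containers_dir, upload=True):
    """Return one `labdata2 build-container` command per .sdef file, sorted by file name."""
    commands = []
    for sdef in sorted(Path(containers_dir).glob("*.sdef")):
        cmd = ["labdata2", "build-container", str(sdef)]
        if upload:
            cmd.append("--upload")  # also copies the container to the analysis storage
        commands.append(cmd)
    return commands


def build_all_containers(containers_dir, upload=True, dry_run=True):
    """Print each build command and, unless dry_run, run it, stopping at the first failure."""
    commands = build_commands(containers_dir, upload=upload)
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands
```

Running build_all_containers("containers") first as a dry run is a cheap way to check which definition files would be built before passing dry_run=False.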

Containers make it simpler to deploy across computers and help improve reproducibility. Note that the reproducibility argument only holds if you keep track of the different container versions, which the labdata structure does not currently enforce.
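Since labdata does not enforce container versioning, one lightweight option is to record a checksum of every built image so that results can later be matched to the exact container that produced them. A minimal sketch, assuming the built images are .sif files in a local folder; the manifest name container_versions.json and the helper functions are hypothetical, and this is not a labdata feature.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256sum(path, chunk_size=1 << 20):
    """Hash a (possibly large) file in chunks so it never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_container_versions(image_dir, manifest_name="container_versions.json"):
    """Append the SHA-256 of each .sif image to a JSON manifest kept next to the images."""
    image_dir = Path(image_dir)
    manifest_path = image_dir / manifest_name
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    timestamp = datetime.now(timezone.utc).isoformat()
    for image in sorted(image_dir.glob("*.sif")):
        history = manifest.setdefault(image.name, [])
        checksum = sha256sum(image)
        # Only add an entry when the image content actually changed.
        if not history or history[-1]["sha256"] != checksum:
            history.append({"sha256": checksum, "recorded": timestamp})
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest
```

Running this after every build keeps a history per image; storing the checksum alongside analysis results would then tie each result to a specific container version.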

Each "BaseCompute" class has a "container" that will be called. Updating the container files at a target (downloading them from the analysis storage) would use the command labdata container -t <TARGET> --update, which is NOT IMPLEMENTED yet.

Installing a cluster on AWS:

These instructions are for the cluster ADMIN. The cluster is set up with AWS ParallelCluster. The easiest approach is probably to configure it from AWS CloudShell in a browser (icon in the lower-left corner of the screen).

Create an EC2 key pair for the cluster; it is used to communicate with the cluster. This can be done from the EC2 console.
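The key pair can also be created programmatically. A minimal sketch, assuming boto3 is installed and AWS credentials are configured; the EC2 client is passed in so the function can be tested without AWS, and create_key_pair here is a hypothetical wrapper around the boto3 call of the same name. OpenSSH refuses private keys that other users can read, so the file is created with owner-only permissions.

```python
import os
from pathlib import Path


def create_key_pair(ec2_client, key_name, dest_dir):
    """Create an EC2 key pair and save its private key as <dest_dir>/<key_name>.pem."""
    response = ec2_client.create_key_pair(KeyName=key_name)
    pem_path = Path(dest_dir) / f"{key_name}.pem"
    # Create the file with owner-only permissions before writing the secret into it;
    # O_EXCL refuses to overwrite an existing key file.
    fd = os.open(pem_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(response["KeyMaterial"])
    os.chmod(pem_path, 0o400)  # read-only for the owner, as ssh expects
    return pem_path
```

Usage, with the key name used elsewhere in this guide: create_key_pair(boto3.client("ec2"), "gpu-cluster", Path.home() / "labdata").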

First, install pcluster: python3 -m pip install aws-parallelcluster --upgrade --user, or, to install a specific version, python3 -m pip install "aws-parallelcluster==3.13.2" --upgrade --user.

Edit the cluster-config.yaml and on_start.sh files and copy them to the containers folder of the analysis bucket. Skeletons of these files are in the aws folder of the source code.
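Uploading the edited files can be scripted with boto3. A minimal sketch, assuming the containers folder is the containers/ prefix of the analysis bucket (as in the aws s3 cp download command) and that AWS credentials are configured; upload_cluster_files is a hypothetical helper, and the S3 client is passed in so the code can be exercised without AWS. Both files are checked before anything is uploaded, so a missing file does not leave the bucket half-updated.

```python
from pathlib import Path

CLUSTER_FILES = ("cluster-config.yaml", "on_start.sh")


def upload_cluster_files(s3_client, bucket, source_dir, prefix="containers"):
    """Upload the cluster configuration files to s3://<bucket>/<prefix>/ and return their keys."""
    paths = [Path(source_dir) / name for name in CLUSTER_FILES]
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"missing: {', '.join(missing)}")
    keys = []
    for path in paths:
        key = f"{prefix}/{path.name}"
        s3_client.upload_file(str(path), bucket, key)
        keys.append(key)
    return keys
```

Usage: upload_cluster_files(boto3.client("s3"), "<ANALYSIS_BUCKET_NAME>", "<FOLDER WITH THE EDITED FILES>").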

To check installed clusters: pcluster list-clusters

Download the cluster-config.yaml file to CloudShell: aws s3 cp s3://<ANALYSIS_BUCKET_NAME>/containers/cluster-config.yaml . You can also just drag and drop the file into that folder (given the correct permissions).

Update the cluster: pcluster update-cluster -n labdata-cluster-spot -c cluster-config.yaml

To check the cluster definition, for a known cluster: pcluster describe-cluster -n labdata-cluster-spot

To delete: pcluster delete-cluster -n labdata-cluster-spot

To create: pcluster create-cluster -n labdata-cluster-spot -c cluster-config.yaml

To manage many nodes, the head node may need an instance type with more memory; we use a t3.small for 35 nodes. You can also create the cluster with a small capacity and increase it later (so that create-cluster does not complain).

After creating the cluster, use describe-cluster to check progress. If something goes wrong and you need to monitor the head node while the cluster is being built, create it with pcluster create-cluster -n labdata-cluster-spot -c cluster-config.yaml --rollback-on-failure false. You can then go to EC2 > Instances and connect to the instance with the "Connect" button (use SSH; you need the permission key set in the config file).
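Instead of re-running describe-cluster by hand, a small script can poll it until the cluster leaves its in-progress state. A minimal sketch, assuming pcluster (version 3) prints a JSON document with a clusterStatus field whose in-progress values end in _IN_PROGRESS; describe_cluster and wait_for_cluster are hypothetical helpers, and run and sleep are injectable so the polling logic can be tested without AWS.

```python
import json
import subprocess
import time


def describe_cluster(name, run=subprocess.run):
    """Run `pcluster describe-cluster -n <name>` and return the parsed JSON output."""
    result = run(["pcluster", "describe-cluster", "-n", name],
                 capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def wait_for_cluster(name, interval=60, timeout=3600, run=subprocess.run, sleep=time.sleep):
    """Poll until clusterStatus no longer ends in _IN_PROGRESS and return the final status."""
    waited = 0
    while True:
        status = describe_cluster(name, run=run)["clusterStatus"]
        print(f"{name}: {status}")
        if not status.endswith("_IN_PROGRESS"):
            return status
        if waited >= timeout:
            raise TimeoutError(f"{name} still {status} after {waited} s")
        sleep(interval)
        waited += interval
```

For example, wait_for_cluster("labdata-cluster-spot") returns the final status string; anything other than a *_COMPLETE status is a cue to inspect the head node as described above.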

After creating the cluster:

Find the address of the head node (from the EC2 Instances page).
Copy the permission key to the labdata folder that holds your preference file (e.g. $HOME/labdata) so you can communicate with the head node.
Add plugins to the head node. To connect, you can define an alias such as alias ssh-ec2='ssh -i "$HOME/labdata/gpu-cluster.pem" ec2-user@ec2-35-161-9-244.us-west-2.compute.amazonaws.com'
Edit the head node preference file (vim /shared/labdata/user_preferences.json) and add a database user and password with write and update permissions on the schemas, but no delete permission. Make sure the user has permissions on the required schemas (both lab and user). For example, GRANT REFERENCES, SELECT, INSERT, UPDATE ON *.* TO 'ec2-user'@'%'; grants this user select, insert, update, and references permissions on all schemas.
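The head node address can also be read from describe-cluster instead of the EC2 instances page. A minimal sketch, assuming the pcluster (version 3) JSON output contains a headNode entry with a publicIpAddress field; head_node_address and ssh_command are hypothetical helpers that build the same kind of command as the ssh-ec2 alias, using the key stored in $HOME/labdata.

```python
import json
import shlex
from pathlib import Path


def head_node_address(describe_output):
    """Return the head node public IP from `pcluster describe-cluster` output (str or dict)."""
    info = json.loads(describe_output) if isinstance(describe_output, str) else describe_output
    address = info.get("headNode", {}).get("publicIpAddress")
    if not address:
        raise ValueError("no public head node address; is the cluster running?")
    return address


def ssh_command(address, key_name="gpu-cluster.pem", user="ec2-user", key_dir=None):
    """Build an ssh command line equivalent to the ssh-ec2 alias."""
    key_dir = Path.home() / "labdata" if key_dir is None else Path(key_dir)
    return shlex.join(["ssh", "-i", str(key_dir / key_name), f"{user}@{address}"])
```

The resulting string can be pasted into a terminal or used as the body of the alias, so the alias does not have to be edited by hand each time the cluster is recreated with a new address.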

Configuration on the local computer

To submit jobs from a local computer, edit the compute section of the preference file. Example configuration:

    "compute": {
        "remotes": {
            "aws": {
                "scheduler": "slurm",
                "user": "ec2-user",
                "permission_key": "gpu-cluster.pem",
                "address": "<CLUSTER_NODE_ADDRESS>.<AWS_REGION>.compute.amazonaws.com",
                "analysis_options": {"spks": {"queue": "gpu"},
                                     "detect": {"queue": "cpu-large"},
                                     "caiman": {"queue": "cpu-large",
                                                "ncpus": 8},
                                     "populate": {"queue": "cpu"}},
                "pre_cmds": ["export LABDATA_DELETE_FILES_AFTER_POPULATE=1",
                             "export LABDATA_OVERRIDE_DATAPATH=/scratch",
                             "export APPTAINER_BIND=/scratch,/shared"]
            }
        },
        "containers": {"local": "/Users/<USER_NAME>/labdata/containers",
                       "storage": "analysis"},
        "default_target": "aws"
    }

analysis_options lists, for each analysis, the queue and other scheduler options used to control how the job is launched. pre_cmds are commands added to the job to configure labdata on the server; on AWS, they set the data path and the container binds.
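To make the role of these two fields concrete, the sketch below shows one plausible way a job script could be assembled for a SLURM remote: the analysis_options entries become #SBATCH lines and the pre_cmds run before the analysis command. This is an illustration, not labdata's actual submission code; build_job_script is hypothetical, and mapping queue to --partition and ncpus to --cpus-per-task is an assumption (both are real sbatch options).

```python
def build_job_script(remote, analysis, command):
    """Assemble a SLURM batch script from a remote's analysis_options and pre_cmds."""
    # Assumed mapping from labdata option names to sbatch flags.
    flag_names = {"queue": "--partition", "ncpus": "--cpus-per-task"}
    options = remote.get("analysis_options", {}).get(analysis, {})
    lines = ["#!/bin/bash"]
    for key, value in options.items():
        flag = flag_names.get(key)
        if flag is None:
            raise KeyError(f"no sbatch flag known for option {key!r}")
        lines.append(f"#SBATCH {flag}={value}")
    lines.extend(remote.get("pre_cmds", []))  # environment setup runs before the analysis
    lines.append(command)
    return "\n".join(lines) + "\n"
```

With the example configuration, build_job_script(prefs["compute"]["remotes"]["aws"], "caiman", "<ANALYSIS COMMAND>") would request the cpu-large partition with 8 CPUs and export the three variables before running the command; analyses without an entry in analysis_options get only the pre_cmds.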

Database administration

To manage the database and user permissions, use a MySQL client.

To create a shortcut for connecting with the MySQL client, add the following alias to ~/.bashrc (or ~/.zshrc on macOS): alias mysql-dataserver='mysql -u admin -p -h <DATABASE_ADDRESS>'

Without the alias, connect from the terminal with mysql -h <DATABASE_ADDRESS> -u admin -p. The following commands are run from the MySQL client unless otherwise stated.

Adding users to the database

To add a user, first create the user and then grant the user permissions on specific database schemas.

To create a user, run CREATE USER 'USERNAME'@'%' IDENTIFIED BY 'PASSWORD'; where USERNAME is the desired user name and PASSWORD is the desired password. Check that the user was added with SELECT user, host FROM mysql.user;

To add permissions on the main lab schemas (those whose names start with lab_), run GRANT REFERENCES, SELECT, INSERT, UPDATE ON `lab\_%`.* TO 'USERNAME'@'%'; Note that this also grants UPDATE permission. If you don't want users to be able to change entries, grant only REFERENCES, SELECT, and INSERT.

To allow the user to create their own schemas, run GRANT ALL PRIVILEGES ON `user\_USERNAME\_%`.* TO 'USERNAME'@'%'; The names of these schemas must start with user_USERNAME_.
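When adding several users, the statements above can be generated by a script and pasted into the MySQL client. A minimal sketch; new_user_statements is a hypothetical helper, allow_update=False drops UPDATE as described above, and underscores in the user name are escaped because _ is a wildcard in schema name patterns of GRANT statements.

```python
def _sql_string(value):
    """Quote a value as a MySQL string literal (escapes backslashes and single quotes)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"


def new_user_statements(username, password, allow_update=True):
    """Return the CREATE USER and GRANT statements for a new labdata database user."""
    account = f"{_sql_string(username)}@'%'"
    lab_privileges = ("REFERENCES, SELECT, INSERT, UPDATE" if allow_update
                      else "REFERENCES, SELECT, INSERT")
    # Escape `_` so it matches a literal underscore in the schema name pattern.
    user_pattern = "user\\_" + username.replace("_", "\\_") + "\\_%"
    return [
        f"CREATE USER {account} IDENTIFIED BY {_sql_string(password)};",
        f"GRANT {lab_privileges} ON `lab\\_%`.* TO {account};",
        f"GRANT ALL PRIVILEGES ON `{user_pattern}`.* TO {account};",
    ]
```

For example, print("\n".join(new_user_statements("USERNAME", "PASSWORD"))) prints the three statements in the order they should be run.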

Note: users can change their password with ALTER USER 'USERNAME'@'%' IDENTIFIED BY 'PASSWORD';

Managing projects


To add a user to a project, call add_user_to_project from Python:

    from labdata.export import add_user_to_project

    add_user_to_project(project, user, password=None, add_to_table=True)