The SLURM Job Maker

HPC Centers are akin to beautiful, containerized aquariums. You know, these things?

/assets/img/posts/hpc/ecosphere.jpg

They exist primarily to serve their users, and over time, each institution has built up the ideal ecosystem for them. What this means, however, is that we aren’t very good at working on things together, or even moving life across environments. If you take a Stanford Sherlock fish and let him loose in a Texas Stampede fish tank, he’s probably not going to thrive, at least not before understanding how the new environment works. In terms of running code, technology like containers can help a bit, but containers don’t dig into the earthy host environment where they are being run. The user still needs to know how to use his or her cluster. The user still needs some familiarity with job management, and a command line, generally. The pie in the sky would be a web-based job manager that uses containers. I aspire to make that. But for now, let’s start with more low hanging fruit.

How can we better work together?

Collaboration across Centers

We probably won’t have the same ecosystems, but there is a good chance we are sharing some of the same tools. Given a long term vision of (someday) being able to seamlessly work across centers, or even between a local resource and a cloud resource, a good first step is figuring out how centers can better work together. I think the low hanging fruit is to build small tools that multiple centers can easily deploy for their users. This was my first goal with the Job Maker.

Job Maker

I have talked to many scientists who don’t mind writing or compiling code, or even debugging, but they get pains in their stomach when they think about getting their jobs running on a cluster. As a regular user of a cluster during graduate school, it never even occurred to me how many additional things I could specify in a job file (did you know that nodes have features?). In my mind, this isn’t about how good or bad the cluster or SLURM documentation is (it’s rather good, actually). It comes down to the fact that we (as users) don’t have nice, modern tools for helping us out. It should be the case that I am exposed to learning and can generate my script via an interface that isn’t “just read the entire SLURM documentation page.” Ugh.

Inspiration

One of my colleagues shared a link to the NERSC script generator, exactly such an interface. It was simple, clean, and beautifully showed me the link between “the thing I want to do” and “this is the file that will do it.” I used this as inspiration to go one step further, and create a tool that would generate the entire interface, for one or more specific SLURM clusters, just by way of each cluster’s configuration file. My goals were as follows:

Generation

I shouldn’t need any additional dependencies or permissions

If I had permission to access the slurmdb (I believe this is the database where everything lives), then of course I could hook into that and find out just about anything. I could also use commands like scontrol, but again, this relies on that software dependency, and on having permission to run it. The most reasonable solution seemed to be to work with the slurm.conf file, which is normally located at /etc/slurm/slurm.conf, and since most nodes (not just root users) need to read it, it doesn’t have some wonky “only for super controllers of the HPC universe” permission setting.
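
To make that concrete, here is a minimal sketch (not the actual slurm2json.py parser) of what reading slurm.conf boils down to: each line is a set of KEY=VALUE pairs, like PartitionName=normal Default=YES MaxTime=2-00:00:00. A real parser also needs to handle quoted values, line continuations, and node name expressions, but the basic idea is this:

import re

def parse_conf_line(line):
    """Turn one slurm.conf line of KEY=VALUE pairs into a dict.
    A rough sketch only; quoted values and continuations are ignored."""
    line = line.split('#', 1)[0].strip()  # drop comments and surrounding whitespace
    return dict(re.findall(r'(\w+)=(\S+)', line))

with open('slurm.conf') as conf:
    for line in conf:
        entry = parse_conf_line(line)
        if 'PartitionName' in entry:
            print(entry['PartitionName'], entry)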

Generation should work with one command

I have the attention span of a goldfish. If I can’t clone a repo, quickly run something, and get an output, I’m not usually interested. I also don’t want to spend the time reading long, complicated documentation and setup. Thus, to use the job generator, you just need to run it in the same directory as a slurm.conf:

python slurm2json.py

I’m not telling you very much with that command, so let’s look at the details. The full run would really look more like this:

git clone https://www.github.com/researchapps/job-maker
cd job-maker/helpers
cp /etc/slurm/slurm.conf $PWD
python slurm2json.py

and this would use the default slurm.conf as the input file, and write the default output data file, machines.json. I would then copy machines.json into the data folder, dump the whole thing into some place with a web server (or GitHub Pages!), and it would deploy for my cluster. The Python script itself works with both Python 2 and 3, and doesn’t require any additional or special modules beyond json, sys, os, re, and argparse, all in the standard library. If this ever changes and I need more robust dependencies, then I would provide the entire thing as Docker or Singularity containers.
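
As a quick sanity check after generation, something along these lines (a hypothetical snippet, assuming the machines.json structure described further below) will tell you which clusters and partitions made it into the file:

import json

# Load the generated data file and summarize what it contains.
with open('machines.json') as handle:
    machines = json.load(handle)

for cluster, data in machines.items():
    partitions = sorted(data.get('partitions', {}))
    print('%s: %s partitions (%s)' % (cluster, len(partitions), ', '.join(partitions)))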

The tool should be able to modify and customize data structures

More realistically, the user (who in this case is some HPC admin) wants more control over the generation, and likely has multiple clusters to generate an interface for. And maybe each cluster has different partitions to include or disclude. Now we can look at the full command usage:

python slurm2json.py --help
usage: slurm2json.py [-h] [--input INPUT] [--update]
                     [--disclude-part DISCLUDE_PART] [--print] [--force]
                     [--outfile OUTFILE]

convert slurm.conf to machines.json

optional arguments:
  -h, --help            show this help message and exit
  --input INPUT         path to one or more slurm config files, separated by
                        commas. Default is one slurm.conf in present working
                        directory.
  --update              Update an already existing machines.json (or other)
  --disclude-part DISCLUDE_PART
                        Partitions to disclude, separated by commas
  --print               print to screen instead of saving to machines.json
  --force               Force overwrite of the output file, if it exists.
  --outfile OUTFILE     output json file. Default is machines.json

Notice that we can specify the --input and --outfile arguments, and we can even choose to disclude partitions, or update an already existing file. So to generate a data file for three clusters at Stanford, I first copied all of our slurm.conf files to some place:

$ ls slurm*.conf
slurm.conf  slurm-corn.conf  slurm-xstream.conf

and then I could generate a machines.json for all of them combined:

python slurm2json.py --input slurm.conf,slurm-corn.conf,slurm-xstream.conf

but I don’t want to do that, because one of the clusters needs to have partitions removed. Instead, I first generate the data file for two of the clusters:

$ python slurm2json.py --input slurm.conf,slurm-corn.conf

Parsing slurm.conf,slurm-corn.conf, please wait!
All partitions will be included.
Adding cluster sherlock
Adding cluster farmshare2
Compiling clusters sherlock,farmshare2

and then I add the third, specifying to remove a partition:

$ python slurm2json.py --input slurm-xstream.conf --update --disclude-part pascal
Parsing slurm-xstream.conf, please wait!
Found machines.json to update.
pascal will not be included.
Adding cluster xstream
Compiling clusters xstream,sherlock,farmshare2

I won’t go into more detail about the commands, but if you want them, check out the GitHub repo README.

The Data Structure

The cool thing about basing the web interface on a data structure is that the data structure could be a general thing that gets built into other tools or applications. The data structure is parsed from the slurm.conf file, and as with any parsing task, its output should be checked when used with a new cluster, in case of a convention that was not expected. This worked for three (different) clusters at Stanford, but I can’t say confidently that it covers all cases. You can see what this beastly thing looks like, or keep reading for a gist. It generally takes the format outlined below, and a small Python sketch of the same layout follows the outline:

- cluster1
   - partitions
      - partition1
      - ...
      - partitionN
   - nodes
      - node1
      - ...
      - nodeN
   - features
      - partition1
      - ...
      - partitionN
   - defaults
      - nodes
      - partitions
- ...
- clusterN
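
In Python terms, the same layout looks roughly like this (the empty values are placeholders; the real entries are shown in the sections below):

machines = {
    "cluster1": {
        "partitions": {"partition1": {}},            # settings per partition
        "nodes": {"node1": {}},                       # settings per node
        "features": {"partition1": []},               # features indexed by partition
        "defaults": {"nodes": [], "partitions": []},  # default node(s) and partition(s)
    },
    # ... through clusterN
}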

Partitions

A partition is a group of nodes with a particular QOS (quality of service), along with defaults for time, maximum memory, and memory per CPU. For example, slurm2json.py will return this for my graduate school lab’s partition:

{ "russpold"
     {"AllowQos": "normal,russpold,russpold_interactive,system",
      "DefMemPerCPU": "8000",
      "DefaultTime": "2:0:0",
      "MaxMemPerCPU": "8192",
      "PartitionName": "russpold",
      "maxNodes": 16}
}

Nodes

A node, of course, is a node. It is usually associated with one or more partitions, which means different groups are allowed to use it. slurm2json.py might parse a node that looks like this:

{"sh-18-25": {"Feature": ""CPU_HSW,E5-2640v3,2.60GHz,NOACCL,NOACCL"",
  "RealMemory": "64000",
  "Weight": "100071",
  "partitions": ["normal",
                 "russpold",
                 "owners"]
}

By the way, today is russpold’s birthday! Happy birthday russpold!

Features

Features are attributes for a node that (I think) we are allowed to define. Actually, SLURM understands a feature as a “constraint,” specified with -C or --constraint (see here). For this application, I decided to index features by partition, so when the user selects a partition, we can look up the features available for it. Here are the features available for the russpold partition:

 "russpold": ["CPU_IVY", "E5-2650v2", "2.60GHz", "NOACCL", "NOACCL"],

I could only guess as to what most of these mean. I would bet most users wouldn’t know either, let alone know that they are available. This is an indication that we need to do a better job.
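
For context, the lookup the interface does when a partition is selected is essentially this, written here in Python rather than the Vue.js the interface actually uses, with illustrative cluster and partition names:

import json

with open('machines.json') as handle:
    machines = json.load(handle)

# When the user picks a cluster and a partition, show the features (constraints)
# available on that partition's nodes.
cluster = 'sherlock'      # illustrative; your cluster name will differ
partition = 'russpold'
features = machines[cluster]['features'].get(partition, [])
print('Constraints you could pass with -C/--constraint:', ', '.join(sorted(set(features))))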

Defaults

I noticed that some partitions and nodes carry a “Default” indicator as a variable, so I parse a lookup of defaults for each general category, partitions and nodes. In the case of my test file, I found that normal was the default partition:

{"nodes": [], "partitions": ["normal"]}

And I would then select this partition for the user if he/she did not select one. I (@vsoch) came up with the organization primarily to be able to look things up in the web interface as needed. There are definitely other ways to go about it.
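
The fallback itself is trivial; roughly (again in Python rather than the interface’s actual Vue.js):

# Defaults parsed for the test file above: normal is the default partition.
defaults = {"nodes": [], "partitions": ["normal"]}

def choose_partition(user_choice=None):
    """Use the user's selection if given, otherwise fall back to the cluster default."""
    if user_choice:
        return user_choice
    return defaults["partitions"][0] if defaults["partitions"] else None

print(choose_partition())       # -> normal
print(choose_partition("gpu"))  # -> gpu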

Those are the basic commands. If you need further functionality, or better yet, want to contribute, please open an issue.

HPC Centers - Let’s work Together!

A call to other centers! I bet you have a SLURM cluster lying around somewhere. And I can certainly attest that this little web application can be so much better. I picked up a new framework (Vue.js) and annoyed my colleagues with questions like “How many leading zeros get parsed from name expressions?” for two days straight, and the result was the Job Maker. I’m looking at it right now, and the following things come to mind:

MPI

It would be nice if we could also parse MPI configuration. We don’t do a ton of MPI, so I didn’t add it. Does someone want to take a stab at a pull request (PR) to add this feature?

Interface Sections

Clearer sectioning of the input fields is warranted. I should know what is required vs. optional, and perhaps there could be helper links pointing me to learn more about a field, if I’m interested.

Integration across centers?

We can easily move JSON data structures across the web (this is the basis of RESTful APIs, of course), so would it really be so hard to have a more robust “Job Maker,” one that might let me select from more clusters, possibly across centers? The current data structure, a single machines.json, doesn’t make sense for large clusters, or for combining across centers. For example, imagine if the machines.json looked like this:

{"stanford":
   {
	"sherlock": 'https://www.github.com/srcc/clusters/master/sherlock.json',
	"farmshare2": 'https://www.github.com/akkornel/clusters/master/farmshare2.json',
	"xstream": 'https://www.github.com/sthiell/clusters/master/xstream.json'
   },
  "berkeley":
   {
	"unicorn": 'https://www.github.com/berkeley-hpc/clusters/master/unicorn.json',  
	"acorn": 'https://www.github.com/berkeley-hpc/clusters/master/acorn.json'  
   }
 }

It would then let the user select a center first, followed by a cluster, and the interface would (akin to how it does now) load the required data upon the user’s selection. Could you imagine a portal with a collection of shared tools like this, perhaps updated programmatically from each cluster, and serving data to users programmatically as well?
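
A client for that imagined index could be a few lines; here is a sketch (the index and URL are entirely hypothetical, and the endpoints would need to serve raw JSON):

import json
from urllib.request import urlopen

# Hypothetical center -> cluster -> URL index, in the spirit of the example above.
index = {
    "stanford": {
        "sherlock": "https://example.org/clusters/sherlock.json",
    }
}

def load_cluster(center, cluster):
    """Fetch a single cluster's data file on demand, when the user selects it."""
    with urlopen(index[center][cluster]) as response:
        return json.loads(response.read().decode('utf-8'))

# machines = load_cluster("stanford", "sherlock")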

We need better collaboration between our groups, and dinky tools like this are a reasonable start! Imagine what we could do for documentation and general help resources for our users. Please ping me, or post an issue on a board somewhere, if you want to chat!


Other questions?

Thanks for reading! If you have other questions, or want help for your project, please don’t hesitate to reach out.
