This is the seventh blog post in a series of articles about using the CCEB cluster. An overview of the series is available here. This post focuses on advanced topics related to interactive sessions.
Though this post centers on advanced content related to interactive sessions, you’ll still be requesting appropriate system parameters using the LSF bsub command. We focus here on the more advanced bsub options used to request more cores for parallelization, enforce memory constraints on your session, and request specific machines. Just like a basic interactive session, once a session is open you can run your code on the loaded host interactively, which allows you to work on your code dynamically. This resource on the cluster is best used when you’re writing code for the first time, debugging code, or running code that uses a small amount of memory and finishes quickly.
In this section, I will cover some of the bsub options I find most useful. For a rigorous, full list of bsub options and their usage, see the IBM documentation here.
Recall that the most basic bsub for opening an interactive session is as follows:
bsub -Is -q cceb_interactive "bash"
This command opens an interactive session with 1 core by default.
Some common additions include:
-n # - to request multiple cores.
bsub -Is -q cceb_interactive -n 8 "bash"
This command opens an interactive session and requests 8 cores.
-R "span[hosts=1]" - to request that the cores you obtain all be on the same execution host.
bsub -Is -q cceb_interactive -n 8 -R "span[hosts=1]" "bash"
This command opens an interactive session and requests 8 cores, all on the same execution host. The -n option alone will also open a session with 8 cores but does not guarantee that all 8 cores are on the same machine. For example, 4 cores could be on “silver01” and 4 on “silver02”.
-R "rusage[mem=####]" - to request the amount of memory required for your job to run. Your job will not start until this amount of memory is available, and once running you will retain/reserve that memory. Note, #### is in megabytes.
Warning: Only request the amount of memory you think your job requires, or else other users will not have it available to them! Remember, sharing is caring! In a future blog post I’ll share some tips on quality controlling and detecting the amount of memory your jobs are using so you can request an educated amount.
bsub -Is -q cceb_interactive -n 8 -R "span[hosts=1] rusage[mem=5000]" "bash"
bsub -Is -q cceb_interactive -n 8 -R "rusage[mem=5000]" "bash"
-m "machinename" - to request a specific execution host by name.
bsub -Is -q cceb_interactive -m "silver01" -n 8 -R "span[hosts=1] rusage[mem=5000]" "bash"
bsub -Is -q cceb_interactive -m "silver01" -n 8 -R "rusage[mem=5000]" "bash"
-M #### - kills the job if it exceeds the #### megabytes of memory you allotted.
bsub -Is -q cceb_interactive -m "silver01" -n 8 -R "span[hosts=1] rusage[mem=5000]" -M 10000 "bash"
bsub -Is -q cceb_interactive -m "silver01" -n 8 -R "rusage[mem=5000]" -M 10000 "bash"
You don’t necessarily need to use all of these together. I commonly use the last set of code and fill in the options. Order doesn’t always matter, but for a few options (like -m "machinename") it does. If the bsub errors, first try re-arranging the arguments more logically or looking up the order of options in the manual.
Options like -M #### are good cluster etiquette: they keep your job from taking all the memory on a host and crashing or slowing the entire host. You should always try to use either -M #### or "rusage[mem=####]" so that you don’t accidentally run out of memory and crash the system.
Again, interactive sessions should be avoided unless writing or quality controlling your code. Full jobs should always be sent through the normal queue.
The grid is a Unix-based machine, so the easiest way to parallelize is with parallel::mclapply() in R. This is a nice overview of HPC computing in R; the information does not exactly align with what I show here but provides nice descriptions.
On the cluster in R, the classic parallel::detectCores() does not work, and other package variants that check the system for how many cores are available also do not work. These functions report the number of cores available on your execution host, not the number you requested in your bsub. Instead, if you want to automatically detect cores based on the number you requested in your bsub, use Sys.getenv('LSB_DJOB_NUMPROC'). This will return, as a character vector, the number of cores your session currently holds.
We will run example code that utilizes parallel computing with a parallel::mclapply statement on 8 cores. First, ssh onto the cluster, request 8 cores with a bsub, load R, and open R. The code below is the bsub you should run after ssh’ing onto the cluster, followed by the commands that load and open R.
bsub -Is -q cceb_interactive -n 8 -R "span[hosts=1] rusage[mem=5000]" -M 5000 "bash"
module load R/3.5.0
R
We can compare the output returned from running parallel::detectCores() and Sys.getenv('LSB_DJOB_NUMPROC') using the code below.
library(parallel)
parallel::detectCores()
Sys.getenv('LSB_DJOB_NUMPROC')
The screenshot below shows the output from running these two commands. parallel::detectCores() incorrectly reports 40 cores; this is the number of cores available on the machine, but since you only requested 8 with the bsub command, you only have access to 8. Sys.getenv('LSB_DJOB_NUMPROC') does properly report access to 8 cores. A character vector is returned from this function call, so it is often useful to convert it with as.numeric() before use.
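As a quick sketch of that conversion (on the cluster LSF sets LSB_DJOB_NUMPROC for you; here it is set manually only so the snippet runs outside a session):

```r
# On the cluster, LSF sets this automatically; we set it here only for illustration.
Sys.setenv(LSB_DJOB_NUMPROC = "8")

# Sys.getenv() returns a character value, so convert it before doing arithmetic
# or passing it to mc.cores.
n_cores <- as.numeric(Sys.getenv("LSB_DJOB_NUMPROC"))
n_cores  # 8, as a number rather than the character "8"
```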
Below I provide a function, example, that takes a random sample of the iris dataset provided by R and then fits Sepal.Length using a linear model. The function returns the fitted model.
Rationale behind some of this code: results are saved with saveRDS() because it is really fast, and the message() function lets you know how your code is progressing and gives insight into where bugs may be if the code errors.
To run this code on your own machine, set wd to the working directory where you would like to save your results.
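The original code block for the example function did not survive here; based on the description above, a minimal sketch might look like the following. The sample size (75), the predictors used, and the file-naming scheme are my illustrative assumptions, not the author’s exact code.

```r
# Sketch of the example function described above: take a random sample of iris,
# fit Sepal.Length with a linear model, save the fit to disk, and return it.
# Sample size, predictors, and file names here are illustrative assumptions.
example <- function(iter, wd) {
  message("Starting iteration ", iter)
  samp <- iris[sample(seq_len(nrow(iris)), 75), ]            # random subset of iris
  fit  <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = samp)
  saveRDS(fit, file = file.path(wd, paste0("fit_", iter, ".rds")))  # fast on-disk save
  message("Finished iteration ", iter)
  fit                                                         # return the fitted model
}
```

On your own machine you could try it with, e.g., example(1, wd = tempdir()).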
We can run this code in parallel over 100 iterations using parallel::mclapply(). Each iteration will take a unique random sample from the full iris dataset, fit the linear model, save the model, and also return the model locally. Please set the same seed if you would like your results to exactly match what is presented here. Running this code should print the messages produced by the example function for the 100 iterations.
set.seed(23)
# Run the example function using 100 iterations on 8 cores
# as.numeric converts the system report of 8 cores
results = parallel::mclapply(1:100, example,
                             wd = '/project/taki3/amv/cluster/',
                             mc.cores = as.numeric(Sys.getenv('LSB_DJOB_NUMPROC')))
You MUST ALWAYS specify mc.cores! Do not rely on the default value on the cluster.
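One defensive pattern (my suggestion, not from the original post) is to fall back to a single core when the LSF variable is unset, so the same script also runs off the cluster without erroring:

```r
# Fall back to 1 core if LSB_DJOB_NUMPROC is unset (e.g., when run off the cluster).
n_cores <- as.numeric(Sys.getenv("LSB_DJOB_NUMPROC", unset = "1"))

# A toy mclapply call just to show the pattern in use.
results <- parallel::mclapply(1:4, function(i) i^2, mc.cores = n_cores)
```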
length(results)
results[[1]]
results[[2]]
list.files('/project/taki3/amv/cluster/')
Notice, the result returned in the first element of the results list is not the same as the second. This indicates each iteration properly took a random sample from the iris dataset. If the model in each element of the list were the same, it would indicate that you did not properly randomize across iterations in parallel and there is a bug in your code.
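A quick programmatic sanity check along those lines is to extract a coefficient from each fit and confirm they are not all identical. The sketch below builds a small stand-in for the results list (mirroring the workflow at a smaller scale), since the exact models are assumptions here:

```r
# Build a small stand-in for `results`: each element fits a model on a fresh
# random sample of iris, mirroring the workflow above at a smaller scale.
set.seed(23)
results <- lapply(1:5, function(i) {
  samp <- iris[sample(seq_len(nrow(iris)), 75), ]
  lm(Sepal.Length ~ Petal.Length, data = samp)
})

# Extract the slope from each fit; if the sampling worked, they should differ.
slopes <- sapply(results, function(fit) coef(fit)[["Petal.Length"]])
length(unique(slopes)) > 1  # TRUE when iterations really did resample
```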
Now exit out of R and your interactive session. This should keep you logged onto the cluster but put you back on the submission host. Now, let’s use a bsub command with an insufficient memory allocation for the job.
bsub -Is -q cceb_interactive -n 8 -R "span[hosts=1] rusage[mem=5000]" -M 30 "bash"
module load R/3.5.0
R
Here we request a session using -M 30 rather than -M 5000. Recall that the -M #### option will kill the interactive session if your memory use goes beyond the amount requested, and 30 megabytes is a very small amount of memory. If you re-define the example function in this session and then run the parallel::mclapply statement, the code will freeze and your session will terminate, since the job uses more than 30 megabytes. The -M #### option is a really great way to quality control your job and ensure you do not accidentally use all the memory on a machine.
There are a number of options available when opening an interactive session. Depending on what you want and need for your job, you may only specify a small subset, or you may need the more sophisticated options available for quality control. Interactive sessions are best for writing code, debugging and quality control, and running short, fast sets of code. For longer, more intensive jobs a normal submission will be more appropriate.