How to use RStudio Server in HPC environment

Last modified by Pekka Hintsanen on 2024/03/26 10:23

UNDER DEVELOPMENT
This feature basically works, but the environment cannot be tuned; for example, modules are not available.

Using RStudio Server in the HPC environment is a little tricky. This how-to guide explains how to do it properly.

The basic idea is to reserve an interactive session on the HPC cluster and start RStudio Server in that session. Then you connect your browser to the compute node where your interactive session is running. The HPC compute nodes are not visible to the outside network, so the computer running the browser needs to be connected to the university network, or you need to tunnel your network traffic into the university network.

Please note that the examples here might not work out of the box. You might need to edit the host addresses and the port numbers, because the cluster will not always give you the same node, and something else might already be using the port specified in the example.

How to run RStudio Server on a compute node

First, log in to your favorite login node, e.g. turso.cs.helsinki.fi. If you are working in a sensitive data computing environment, you need to log in to your team's login node. Then request an interactive session. You might want to reserve the resources you need and choose the cluster where you want to run your job. For instructions, see section 4.2, Interactive Sessions, of the HPC Environment User Guide.

Here is an example for HPC, where you want to compute in the Vakka federation (NOTE: the working directory is /wrk-vakka/users/$USER):

<USER>@<LOGINNODE>:~$ srun --interactive -M ukko --pty bash
srun: job 70875278 queued and waiting for resources
srun: job 70875278 has been allocated resources
<USER>@<COMPUTENODE>:~$

Here is an example for the sensitive data environment (Arkku):

<USER>@<LOGINNODE>:~$ srun --interactive -p <YOUR_PARTITION> --pty bash
srun: job 70875278 queued and waiting for resources
srun: job 70875278 has been allocated resources
<USER>@<COMPUTENODE>:~$
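
The requests above use the default resources. As a rough sketch, the helper below prints an srun command that also asks for CPU cores, memory, and a time limit. The options -c, --mem, and -t are standard Slurm options; the function name interactive_cmd and the default values are assumptions made for this example, not site recommendations.

```shell
# Hypothetical helper: print (not run) an interactive srun request.
# Usage: interactive_cmd <cluster> [cpus] [memory] [time limit]
interactive_cmd() {
  printf 'srun --interactive -M %s -c %s --mem=%s -t %s --pty bash\n' \
    "$1" "${2:-2}" "${3:-8G}" "${4:-04:00:00}"
}

interactive_cmd ukko 4 16G 02:00:00
# prints: srun --interactive -M ukko -c 4 --mem=16G -t 02:00:00 --pty bash
```

Check the printed command before running it, and adjust the values to what your job really needs.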

Starting RStudio Server on a compute node

RStudio Server is started on the compute node. Check the variables in the commands below (TMPDIR, PASSWORD, and the image path) and modify them accordingly. Note the password; you will use it to log in to RStudio.
Update 26.3.2024: The newest RStudio version is 4.3.2 (/appl/opt/commercial/rstudio-server/rstudio-server-4.3.2-R-4.sif).



TMPDIR=~/rstudio-tmp # your choice
mkdir -p $TMPDIR/tmp/rstudio-server
uuidgen > $TMPDIR/tmp/rstudio-server/secure-cookie-key
chmod 600 $TMPDIR/tmp/rstudio-server/secure-cookie-key
mkdir -p $TMPDIR/var/{lib,run}
PASSWORD='yourpasswordtologintoGUI' singularity exec -B $TMPDIR/var/lib:/var/lib/rstudio-server -B $TMPDIR/var/run:/var/run/rstudio-server -B $TMPDIR/tmp:/tmp /appl/opt/rstudio/<VERSION>/rstudio_<VERSION>.sif rserver --auth-none=0  --auth-pam-helper-path=pam-helper --www-address=0.0.0.0 --www-port 8787  --server-user $USER
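
Something else might already be listening on port 8787 on the node you were given. As a rough sketch, the helper below reads the output of ss -ltn and prints the first port at or above a starting port that is not in use. It assumes the ss tool is available on the compute node; the function name first_free_port is made up for this example.

```shell
# Hypothetical helper: read `ss -ltn` output on standard input and print
# the first port at or above <start> that no local socket is listening on.
# Usage: ss -ltn | first_free_port <start>
first_free_port() {
  awk -v start="$1" '
    # Column 4 is "address:port"; the port is the part after the last colon.
    NR > 1 { n = split($4, part, ":"); used[part[n]] = 1 }
    END    { p = start; while (p in used) p++; print p }
  '
}
```

On the compute node you could then run PORT=$(ss -ltn | first_free_port 8787) and pass --www-port "$PORT" to rserver. Use the same port as <port at compute node> when you create the tunnel.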

NOTE! The newest version also has some additional R versions installed. You can list them, for example, with this command:
singularity exec /appl/opt/commercial/rstudio-server/rstudio-server-4.3.2-R-4.sif ls /opt/R

If you want to use a specific R version, add the --rsession-which-r argument to the end of the start command:
PASSWORD='XXX' singularity exec -B $TMPDIR/var/lib:/var/lib/rstudio-server -B $TMPDIR/var/run:/var/run/rstudio-server -B $TMPDIR/tmp:/tmp /appl/opt/commercial/rstudio-server/rstudio-server-4.3.2-R-4.sif rserver --auth-none=0  --auth-pam-helper-path=pam-helper --www-address=0.0.0.0 --www-port 8787  --server-user $USER --rsession-which-r /opt/R/4.1.3/bin/R

Accessing RStudio Server

This part is more difficult. We need to create a network tunnel into the university network to get this working. SSH has a feature called port forwarding that we can use for this, through any interactive Linux host that the University of Helsinki provides. In this example we use melkki.cs.helsinki.fi.

The ssh syntax looks like this:

ssh -L <port at localhost>:<compute node dns name>:<port at compute node> <host dns name which has access to the university network>

The -L switch means that all data sent to local port <port at localhost> on the computer where you run the ssh command is forwarded to <compute node dns name>:<port at compute node>, and the replies come back the same way. The tunnel goes through <host dns name which has access to the university network>.

ssh -L 8787:<COMPUTENODE>.local.cs.helsinki.fi:8787 melkki.cs.helsinki.fi

Running that command creates a tunnel from port 8787 on <COMPUTENODE>.local.cs.helsinki.fi to port 8787 on our local computer through an ssh session connected to melkki.cs.helsinki.fi.

TIP! To get the exact compute node name, run hostname on the compute node:

$USER@<COMPUTENODE>:~$ hostname
<COMPUTENODE>.local.cs.helsinki.fi
$USER@<COMPUTENODE>:~$ 
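
To avoid typing the node name by hand, the sketch below prints a ready-made tunnel command when it is run on the compute node. The function name tunnel_hint is hypothetical. The -N option is a standard ssh option that tells ssh not to run a remote command, which is enough for port forwarding; it is an addition to the syntax shown above, so leave it out if you also want a shell on the gateway host.

```shell
# Hypothetical helper: print the ssh command to run on your own computer.
# Usage: tunnel_hint [port] [compute node name] [gateway host]
# Without a node name, the output of `hostname` on this machine is used.
tunnel_hint() {
  printf 'ssh -N -L %s:%s:%s %s\n' \
    "${1:-8787}" "${2:-$(hostname)}" "${1:-8787}" "${3:-melkki.cs.helsinki.fi}"
}
```

Run tunnel_hint 8787 in the interactive session, then copy the printed line to a terminal on the computer where your browser runs.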

Now we can start a web browser and type the address "localhost:8787" into the address field. This opens the RStudio Server session in the browser.

Installing / loading packages

In this environment, a proxy setup is always needed when you download and install R packages/libraries or need other online resources in your session. Run these commands in your RStudio Server session immediately after you have successfully accessed the server:

Sys.setenv(http_proxy = "www-cache.cs.helsinki.fi:3128")
Sys.setenv(https_proxy = "www-cache.cs.helsinki.fi:3128")
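
If you do not want to type the proxy settings in every session, one possible alternative is to store them in ~/.Renviron, which R reads at startup. The sketch below is an untested suggestion for this environment: it assumes your home directory is visible inside the container (Singularity's default behavior), and the function name add_proxy is made up for this example.

```shell
# Hypothetical helper: append the proxy settings to an Renviron file,
# skipping any variable that already has a line in the file.
# Usage: add_proxy [file]   (default file: ~/.Renviron)
add_proxy() {
  file=${1:-$HOME/.Renviron}
  touch "$file"
  for var in http_proxy https_proxy; do
    grep -q "^${var}=" "$file" ||
      printf '%s=%s\n' "$var" 'www-cache.cs.helsinki.fi:3128' >> "$file"
  done
}
```

Run add_proxy once on the cluster before starting the server. Running it again does not add duplicate lines.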