GPU-accelerated VDI Cubblies

Last modified by Xwiki VePa on 2024/02/14 09:29


Contact info

For any inquiries regarding VDI Cubblies, please contact helpdesk@helsinki.fi.


General remarks:

  • Users' home directories are mounted over NFS. The quota for an NFS home directory is 20 GB.
  • There is a three-hour idle timer: it starts when the remote session is disconnected, and if you do not reconnect within the next three hours, you are forcibly logged off and the machine is returned to the pool of available machines.
  • There is also a ten-hour total session timer: the remote session is automatically disconnected ten hours after the last connection. Let us know if these limits cause problems for you.
  • The TSclient link on the desktop shows the local files that are shared to the virtual desktop. You can select the folders to share in the Horizon Client settings.
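
To make the two timers concrete, here is a minimal sketch of the arithmetic. The function names are hypothetical, and it assumes that the idle timer also starts when the ten-hour limit disconnects the session, which the remarks above imply but do not state outright.

```python
from datetime import datetime, timedelta

# Limits taken from the general remarks above.
IDLE_LIMIT = timedelta(hours=3)      # disconnected session -> forced logoff
SESSION_LIMIT = timedelta(hours=10)  # last connection -> automatic disconnect


def auto_disconnect_time(connected_at):
    """Time at which the session is disconnected automatically."""
    return connected_at + SESSION_LIMIT


def forced_logoff_time(disconnected_at):
    """Time at which the user is logged off if no reconnection happens."""
    return disconnected_at + IDLE_LIMIT


if __name__ == "__main__":
    connected = datetime(2024, 2, 14, 8, 0)
    cut = auto_disconnect_time(connected)
    print("Automatic disconnect:", cut)
    # Assumption: the idle timer starts at the automatic disconnect as well.
    print("Forced logoff without reconnect:", forced_logoff_time(cut))
```

Under that assumption, a session that is never reconnected is logged off at most thirteen hours after the last connection.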

Check the specs with nvidia-smi. VMwareBlastServer is the remote connection agent. The screen output is encoded and sent as an H.264 stream to the client endpoint.

nvidia-smi
psmaatta@vdi-cubbli-g02:~$ nvidia-smi
Sun Jun 10 10:35:36 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.57                 Driver Version: 390.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID P100-4Q        On   | 00000000:02:01.0  On |                  N/A |
| N/A   N/A    P0    N/A /  N/A |    780MiB /  4096MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                              
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1163      G   /usr/lib/xorg/Xorg                           136MiB |
|    0     11339    C+G   ...ent/VMwareBlastServer/VMwareBlastServer   345MiB |
+-----------------------------------------------------------------------------+
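
If you want the same figures in a script, nvidia-smi can also print selected fields as CSV. The sketch below is one possible way to read them from Python; the helper names are hypothetical, and it assumes that the GPU reports plain integers for the memory fields, as in the table above.

```python
import subprocess

# Fields requested from nvidia-smi; memory values are reported in MiB.
QUERY_FIELDS = "name,memory.used,memory.total"


def parse_gpu_line(line):
    """Parse one CSV line such as 'GRID P100-4Q, 780, 4096'."""
    # Split from the right so that a comma in the GPU name would not break parsing.
    name, used, total = [field.strip() for field in line.rsplit(",", 2)]
    return {"name": name, "used_mib": int(used), "total_mib": int(total)}


def query_gpus():
    """Return one dict per GPU, or an empty list if nvidia-smi cannot be run."""
    try:
        output = subprocess.check_output(
            ["nvidia-smi",
             "--query-gpu=" + QUERY_FIELDS,
             "--format=csv,noheader,nounits"],
            universal_newlines=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return []
    return [parse_gpu_line(line) for line in output.splitlines() if line.strip()]


if __name__ == "__main__":
    for gpu in query_gpus():
        free = gpu["total_mib"] - gpu["used_mib"]
        print("{}: {} MiB used, {} MiB free".format(gpu["name"], gpu["used_mib"], free))
```

Note that part of the 4096 MiB is already taken by Xorg and VMwareBlastServer, so less than the full amount is available to your own jobs.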



These machines should run TensorFlow on the GPU without any known issues. Here's a sample. CUDA 9.1 is currently supported; CUDA 9.2 is not yet supported.

Test run
Python 2.7.12 (default, Dec  4 2017, 14:50:18)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> x1 = tf.constant([1,2,3,4])
>>> x2 = tf.constant([5,6,7,8])
>>> result = tf.multiply(x1, x2)
>>> sess = tf.Session()
2018-06-09 23:54:57.726930: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-06-09 23:54:57.854369: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-06-09 23:54:57.855128: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GRID P100-4Q major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:02:01.0
totalMemory: 4.00GiB freeMemory: 2.71GiB
2018-06-09 23:54:57.855153: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-06-09 23:55:00.512760: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-06-09 23:55:00.512820: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0
2018-06-09 23:55:00.512832: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N
2018-06-09 23:55:00.513004: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2422 MB memory) -> physical GPU (device: 0, name: GRID P100-4Q, pci bus id: 0000:02:01.0, compute capability: 6.0)
>>> print(sess.run(result))
[ 5 12 21 32]
>>> quit()

The CUDA samples can be installed, built, and run as follows.

Play with CUDA samples
luser@vdi-cubbli-g01:~$ /usr/local/cuda-9.1/bin/cuda-install-samples-9.1.sh ~/CUDA-samples
luser@vdi-cubbli-g01:~$ cd CUDA-samples/NVIDIA_CUDA-9.1_Samples/2_Graphics/Mandelbrot/
luser@vdi-cubbli-g01:~/CUDA-samples/NVIDIA_CUDA-9.1_Samples/2_Graphics/Mandelbrot$ make
  
/usr/local/cuda-9.1/bin/nvcc -ccbin g++ -I../../common/inc  -m64    -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_70,code=compute_70 -o Mandelbrot.o -c Mandelbrot.cpp
..clipclip..
luser@vdi-cubbli-g01:~/CUDA-samples/NVIDIA_CUDA-9.1_Samples/2_Graphics/Mandelbrot$ ../../bin/x86_64/linux/release/Mandelbrot
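
The nvcc command above builds the sample for several GPU architectures. The TensorFlow log earlier reports compute capability 6.0 for the GRID P100-4Q, which corresponds to the arch=compute_60,code=sm_60 pair in that command. As a small illustration of the naming scheme, a hypothetical helper (not part of the CUDA toolkit) could build the flag like this:

```python
def gencode_flag(major, minor):
    """Build the nvcc -gencode flag for a given compute capability.

    Hypothetical helper for illustration only; it mirrors the flag
    pattern visible in the make output above.
    """
    cc = "{}{}".format(major, minor)
    return "-gencode arch=compute_{0},code=sm_{0}".format(cc)


if __name__ == "__main__":
    # Compute capability 6.0, as reported for the GRID P100-4Q.
    print(gencode_flag(6, 0))
```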


Mandelbrot Set.png