...
Background work
Encourage volunteers to change their work practices
How to handle data: sensitive vs. non-sensitive
We will not target people who do not wish to adapt their workflows to increasing complexity.
- We'll focus on users who are willing to adapt, e.g. for personal reasons (personal development).
Current situation
- Storage:
- sensitive: NetApp, Umpio, local disks?
- non-sensitive: NetApp, Allas, local disks, kappa
- Applications and OS
- Applications
- Windows: ?? → wild wild west
- Linux: packages and local installations
- cPouta and ePouta users: Is it possible to move all applications under Modules
- ePouta, cPouta heterogeneous OS available → Standardize OS → General module repository
- Windows/Linux users: Is there a way to introduce VDI-Linux/HPC as an alternative solution for analysis?
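If the OS is standardized and a general module repository is introduced, day-to-day use could look like the sketch below. This assumes Environment Modules or Lmod on the standardized image; the `MODULEPATH` entry and module name are placeholders, not existing CSC paths.

```shell
# Hypothetical shared module tree, visible to both cPouta and ePouta VMs.
# The path and the module name below are assumptions for illustration only.
export MODULEPATH=/shared/modules/all:$MODULEPATH  # shared tree mounted on every VM
module avail                # list applications published in the repository
module load samtools/1.19   # hypothetical module replacing a per-VM local install
```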
Pitfalls / Challenges
- We cannot replace the current data handling workflow (pipeline) with something else
- We can change things under the hood:
- how data is copied, moved, or analyzed
- automation of some critical 'hand-made' processes
- where data is analyzed (HPC, VDI, VDI-GPU)
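As an illustration of automating one of the 'hand-made' steps, the sketch below copies a dataset and verifies it with checksums, turning a manual copy-and-check step into a repeatable script. The helper name `copy_and_verify` and the directory layout are our assumptions, not part of any existing pipeline; `sha256sum` assumes GNU coreutils.

```shell
# Copy every file from $1 to $2 and verify each copy with a SHA-256 checksum.
# Returns non-zero on the first copy failure or checksum mismatch.
copy_and_verify() {
    src="$1"; dst="$2"
    mkdir -p "$dst" || return 1
    for f in "$src"/*; do
        cp "$f" "$dst/" || return 1
        a=$(sha256sum "$f" | awk '{print $1}')
        b=$(sha256sum "$dst/$(basename "$f")" | awk '{print $1}')
        [ "$a" = "$b" ] || { echo "checksum mismatch: $f" >&2; return 1; }
    done
}
```

A wrapper like this could be dropped in front of any copy step in the current pipeline without changing the pipeline itself, which matches the constraint above.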
- People do not want any changes!
- People do not want to learn anything new unless it's necessary → prove to them the benefits of the new analysis approach
- Current culture is afraid of limitations:
- HPC batch job queue for analysis
- Storage quota
- etc.
- Can we give users tools that improve the analysis and research?
Actions
- Pilot case
- Jessica Lucenius case (Eläintalli animal facility, Neuroscience Center)
- Current issues:
- They don't have storage where the team can run the analysis from multiple locations
- space needed: 100 TB (will LSDCC solve this in the future?)
- Question to solve: how I/O-intensive is the analysis?
- Solution:
- Create an IDM group with access to the kappa-wrk Lustre storage, which can be mounted on both Windows and Linux
- Question: Do they need backups of the raw data, and where will they be stored?
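Mounting kappa-wrk on both OSes could look like the sketch below. The server name, export path, and protocol (NFS vs. SMB re-export of the Lustre area) are all assumptions that storage admins would need to confirm.

```shell
# Hedged sketch: mounting the kappa-wrk work area from both platforms.
# Hostname and share names are placeholders, not real CSC endpoints.

# Linux, assuming an NFS re-export of the Lustre area:
sudo mount -t nfs kappa-export.example.org:/kappa-wrk /mnt/kappa-wrk

# Windows (cmd.exe), assuming an SMB re-export of the same area:
#   net use K: \\kappa-export.example.org\kappa-wrk
```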
- Next steps
- Figure out the current pipeline and the data flow
- Figure out where we can fork data to an unchanging target location (CEPH/Allas?)
- Map the current pipeline and identify its bottlenecks and manual steps
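Forking raw data to an unchanging target in Allas could be sketched with rclone, assuming a remote named `allas` has already been configured for the project (the bucket and source paths below are placeholders):

```shell
# Fork raw data to Allas object storage and verify the copy.
# The 'allas' remote and bucket name are assumptions for illustration.
rclone copy --checksum /scratch/pipeline/raw allas:project-raw-archive/run-001
rclone check /scratch/pipeline/raw allas:project-raw-archive/run-001  # verify the fork
```

The same fork could run as a scheduled job at the point in the pipeline identified above, so the working copy on kappa-wrk stays untouched.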
- First need is