This year, the cluster will undergo a major hardware and software upgrade, described below.
Important notice
We remind you that we do not perform backups of your data under any circumstances. Users are encouraged to transfer their data regularly from the cluster to their own storage resources, and to delete from the cluster any data no longer needed for calculations. We decline all responsibility for files stored on the cluster.
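For example, a folder can be copied from the cluster to your own machine with rsync (a minimal sketch; login-hpc is the login node mentioned below, while your_login and both paths are placeholders to adapt):

    # Run on your own machine: copy results from the cluster to local storage.
    # -a preserves permissions and timestamps, -v is verbose, -z compresses in transit.
    rsync -avz your_login@login-hpc:/workspace/your_login/results/ /path/to/local/backup/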
Software update complete
The software update was performed in March.
The OS has been upgraded from CentOS 7.9 to Rocky Linux 9.5 on all machines.
Slurm has been upgraded from version 18.08.8 to 23.11.10.
The drivers and firmware have been updated.
After the software update, please note that:
- As the drivers have been updated, code compiled by users may need to be recompiled against the new modules.
- The software modules have been updated as well. The most important modules are still available (GNU, OpenMPI and Intel compilers, miniconda, Python, …), but their versions, and therefore their names, have changed. Before you submit any job, make sure to adjust the names of the modules you load. Some modules can only be loaded after their dependencies: for example, to load HDF5 you must first load gnu14/14.2.0, then hdf5/1.14.5 (see the example job script after this list). Click here if you need help on how to use the modules.
- The NFS (/home) now has a storage quota of 1 TB per user. The NFS should not be used for computations (frequent reads and writes on files), as this degrades performance for the whole cluster. Please use your /workspace folder instead; /workspace still has no storage limit. Users currently exceeding the /home quota will be contacted in a separate e-mail.
- Load limits have been implemented on login-hpc to prevent users from running code directly on this machine and slowing it down.
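To illustrate the points above, here is a minimal example of a Slurm batch script (the module names gnu14/14.2.0 and hdf5/1.14.5 come from the example above; the program name and the per-user /workspace path are placeholders to adapt):

    #!/bin/bash
    #SBATCH --job-name=hdf5_example
    #SBATCH --ntasks=1
    #SBATCH --time=00:10:00

    # Load the dependency first, then the module that requires it.
    module purge
    module load gnu14/14.2.0
    module load hdf5/1.14.5

    # Run from /workspace, not from /home: the NFS is quota-limited
    # and not meant for frequent reads/writes.
    cd /workspace/$USER
    ./my_hdf5_program        # placeholder for your own executable

Submit the script from login-hpc with "sbatch myscript.sh" instead of running the program directly on the login node.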
Remaining operations schedule
This summer, the cluster will move from its current location at Inria Sophia Antipolis to the new machine room on the Valrose campus.
- July 4, 2025: Cluster shutdown. No access to your data; no calculations possible.
- July 7-18, 2025: Moving the existing cluster, adding the new hardware and reinstalling everything. Total unavailability of the cluster.
- July 21 to August 1, 2025: Configuration of the new hardware. Disruptions in access to the existing hardware are also expected.
- August 1, 2025: Expected commissioning of the entire cluster.
New hardware in summer 2025
In July 2025, the following hardware will be added to the existing cluster:
35 PowerEdge R6625 machines for CPU computing, each equipped with:
- 2 AMD EPYC 9124 processors, 3 GHz, 16 cores each (= 32 cores per machine)
- 384 GB RAM
3 PowerEdge R760XA machines for GPU computing, each equipped with:
- 4 NVIDIA H100 NVL cards (PCIe, 350-400 W, 94 GB of memory each)
- 2 Intel Xeon Gold 5420+ processors, 2 GHz, 28 cores each (= 56 cores per machine)
- 512 GB RAM
The memory of the existing SMP machine will be increased to 1.5 TB.
The BeeGFS storage will be doubled to 480 TB.
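Once the new GPU machines are in service, a job requesting one of the H100 NVL cards through Slurm might look like the sketch below (the partition name is an assumption; check the site documentation for the actual partition and GRES names):

    #!/bin/bash
    #SBATCH --job-name=gpu_example
    #SBATCH --partition=gpu      # assumed partition name for the R760XA nodes
    #SBATCH --gres=gpu:1         # request one of the four H100 NVL cards
    #SBATCH --cpus-per-task=14   # a quarter of a 56-core machine
    #SBATCH --time=01:00:00

    nvidia-smi                   # show the GPU allocated to the job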