2022 Supercomputer upgrade


Picasso facility

At the end of 2021, the infrastructure of Picasso was modified, adding space for additional racks and improving the air conditioning, power facilities and UPS. That allowed the upgrade of almost all the nodes, an upgrade that has moved Picasso to a new level, with more than 30,000 compute cores and a pool of machine-learning-optimized machines with 32 dedicated A100 GPUs.

Total available resources:

 - 30,616 compute cores
 - 156 TB of RAM
 - 32 x A100 GPUs with 13,824 Tensor Cores, 110,592 FP64 CUDA cores and 1,280 GB of VRAM
 - 2 PetaFLOPS of FP64 compute + 20 AI PetaFLOPS in the GPUs
 - 960 TB of shared GPFS storage via Infiniband HDR100/200
 - 6 PB of object storage

 

Below you will find all the available machines and their respective resources:

 

Cluster Lenovo ThinkSystem SR645 

  - 2 x AMD EPYC 7H12 processors/node x 64 cores = 128 cores/node
  - 160 nodes x 128 cores = 20,480 cores
  - 2.60 GHz per core (base). Up to 3.3 GHz
  - 160 nodes x 512 GB RAM = 80 TB total RAM
  - Infiniband HDR network at 100 Gbit/s
  - Local scratch of 950 GB

 

Cluster Nvidia DGX A100

 - 2 x AMD EPYC 7742 processors/node x 64 cores = 128 cores/node
 - 4 nodes x 128 cores = 512 cores
 - 2.25 GHz per core (base). Up to 3.4 GHz
 - 4 nodes x 1024 GB RAM = 4 TB total RAM
 - 4 nodes x 8 Infiniband HDR links at 200 Gbit/s (only inside the DGX cluster)
 - Infiniband HDR network at 200 Gbit/s (whole cluster)
 - 4 nodes x 8 Nvidia A100 GPUs = 32 Nvidia A100 GPUs
 - 32 Nvidia A100 GPUs x 3456 FP64 CUDA cores = 110,592 CUDA cores on GPUs.
 - 32 Nvidia A100 GPUs x 432 Tensor Cores for AI = 13,824 Tensor Cores on GPUs.
 - 32 Nvidia A100 GPUs x 40 GB VRAM per GPU = 1,280 GB VRAM on GPUs.
 - Local scratch of 950 GB

 

Cluster Lenovo ThinkSystem SD530

- 2 x Intel Xeon Gold 6230R processors/node x 26 cores = 52 cores/node
- 126 nodes x 52 cores = 6552 cores
- 2.10 GHz per core. Up to 4 GHz
- 126 nodes x 192 GB RAM = ~24 TB total RAM
- Infiniband HDR network at 100 Gbit/s
- Local scratch of 950 GB
 

Cluster Atos Bull 

 - 2 x AMD EPYC 7H12 processors/node x 64 cores = 128 cores/node
 - 24 nodes x 128 cores = 3072 cores
 - 2.60 GHz per core. Up to 3.3 GHz
 - 24 nodes x 2 TB RAM = 48 TB total RAM
 - Infiniband HDR network at 200 Gbit/s
 - Local scratch of 3.5 TB
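
As a sanity check, the headline totals at the top of this section can be reproduced from the per-cluster figures above. A minimal Python sketch, using only the node counts and per-node values listed in these lists:

    # Per-cluster figures as listed above: (nodes, cores per node, RAM in GB per node)
    clusters = {
        "Lenovo ThinkSystem SR645": (160, 128, 512),
        "Nvidia DGX A100":          (4,   128, 1024),
        "Lenovo ThinkSystem SD530": (126, 52,  192),
        "Atos Bull":                (24,  128, 2048),
    }

    total_cores = sum(nodes * cores for nodes, cores, _ in clusters.values())
    total_ram_tb = sum(nodes * ram_gb for nodes, _, ram_gb in clusters.values()) / 1024

    print(total_cores)           # 30616 compute cores
    print(round(total_ram_tb))   # 156 TB of RAM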
 

 

ESX virtualization cluster

All our critical virtual desktops and servers are hosted on a VMware vSAN cluster with High Availability and vMotion. This cluster consists of: 

 - 5 x Lenovo servers at 2.66 GHz with 512 GB of RAM each
 - 40 TB of shared vSAN storage.
 - 25 Gbit/s internal network.
 - 10 Gbit/s external network.

 

Shared storage

All our clusters use a shared GPFS filesystem (960 TB via Infiniband HDR), complemented by 6 PB of object storage for long-term archiving of project data.
 
 

2018 Supercomputer upgrade



In mid-2018, the shared storage system was upgraded, adding a new high-performance filesystem with 555 TB and a long-term object store with a raw capacity of 6 PB. This upgrade increases the performance of the filesystem thanks to a very fast 55 TB SSD layer: hot data is kept on the faster storage layer and is moved down to a second level of 500 TB of NL-SAS disks when it is no longer in use. The storage uses the Infiniband network between all nodes of the system.
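
Conceptually, the tiering works as in the sketch below: data that has not been accessed for a while is migrated from the SSD layer to the NL-SAS layer. This is only an illustrative Python sketch with assumed paths and a hypothetical 30-day threshold; the real system performs this migration internally through GPFS policies, not with a script like this.

    import os
    import shutil
    import time

    # Hypothetical mount points and threshold, for illustration only;
    # the actual pool layout and policy values are not documented here.
    SSD_POOL = "/gpfs/ssd"
    NLSAS_POOL = "/gpfs/nlsas"
    MAX_IDLE_DAYS = 30

    now = time.time()
    for root, _, files in os.walk(SSD_POOL):
        for name in files:
            path = os.path.join(root, name)
            idle_days = (now - os.stat(path).st_atime) / 86400
            if idle_days > MAX_IDLE_DAYS:
                # Recreate the relative layout on the capacity tier and move the cold file
                rel = os.path.relpath(path, SSD_POOL)
                dest = os.path.join(NLSAS_POOL, rel)
                os.makedirs(os.path.dirname(dest), exist_ok=True)
                shutil.move(path, dest)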

 

Total available resources:

 - 4016 compute cores
 - 74 TFLOPS in compute nodes + 33 TFLOPS in GPUs
 - 555 TB of shared GPFS storage + 6 PB of object storage

 

Below you will find all the available machines and their respective resources:

 

Cluster HP Intel E5-2670 


- 48 HP-SL230G8 x 2 processors x 160 GFLOPS = 16 TFLOPS
- 32 x M2075 GPUs x 1030 GFLOPS = 33 TFLOPS
- 48 x 2 E5-2670 processors x 8 cores = 768 cores
- 2.60 GHz per core
- 48 x 64 GB RAM = 3 TB total RAM
- Infiniband FDR network at 54.54 Gbit/s
- Shared scratch on the Lustre filesystem

 

Shared memory machines with 2 TB of RAM each.

- 7 DL980G7 x 8 processors x 96 GFLOPS = 5 TFLOPS
- 7 x 8 E7-4870 processors x 10 cores = 560 cores
- 2.40 GHz per core
- 7 x 2 TB RAM = 14 TB total RAM
- 7 x 2 TB RAID5 = 14 TB local scratch
- Infiniband QDR network at 32 Gbit/s

 
 

Cluster IBM Intel E5-2670



 - 168 IBM dx360 M4 x 2 processors x 160 GFLOPS = 53 TFLOPS
 - 168 x 2 Intel E5-2670 processors x 8 cores = 2688 cores
 - 2.60 GHz per core
 - 168 x 32 GB RAM = 5.4 TB total RAM
 - Infiniband FDR network at 40 Gbit/s
 - Shared scratch on the Lustre filesystem

 

 

ESX virtualization cluster

All our critical virtual desktops and servers are hosted on a VMware ESX Enterprise cluster with High Availability and vMotion. This cluster consists of: 

 - 2 x HP-DL380G5 at 2.66 GHz with 32 GB of RAM each

For those virtual machines and user-grade desktops that do not need High Availability support, there is a set of machines with the ESXi hypervisor installed. These servers are:

 - 3 x HP-DL380G5 at 2.66 GHz with 16 GB of RAM each
 - 4 x HP-DL385G7 at 2.13 GHz with 124 GB of RAM each

 

Shared storage

The Lustre shared storage was replaced by a GPFS filesystem with 555 TB of shared storage via Infiniband. In addition, 6 PB of object storage is used for archiving long-term project data.

2017 Supercomputer upgrade



In mid-2017, the cluster of 41 nodes with AMD Opteron 6176 processors was replaced with 168 IBM dx360 M4 compute nodes. This upgrade increases the number of available compute cores to a total of 4016. All nodes now share the same Infiniband network and a common architecture, and are unified behind a single Slurm queue system that delivers jobs to the most adequate equipment based on the requested resources.
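
As a rough illustration of what "delivers jobs to the most adequate equipment based on the requested resources" means, the routing amounts to matching each job request against the node classes described below. The Python sketch that follows is a toy model only, with made-up partition names; it is not Slurm's actual scheduling logic.

    def pick_partition(mem_gb, gpus=0):
        """Toy routing of a job request to one of the node classes (illustrative only)."""
        if gpus > 0:
            return "gpu"      # e.g. the HP SL230G8 nodes with M2075 GPUs
        if mem_gb > 64:
            return "bigmem"   # e.g. the DL980G7 shared-memory nodes with 2 TB of RAM
        return "thin"         # e.g. the IBM dx360 M4 nodes with 32 GB of RAM

    print(pick_partition(mem_gb=16))           # thin
    print(pick_partition(mem_gb=512))          # bigmem
    print(pick_partition(mem_gb=16, gpus=2))   # gpu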

 

Total available resources:

 - 4016 compute cores
 - 74 TFLOPS in compute nodes + 33 TFLOPS in GPUs
 - 750 TB of shared storage with Lustre
 - Infiniband QDR/FDR interconnection network

 

 

Below you will find all the available machines and their respective resources:

 

Cluster HP Intel E5-2670 


- 48 HP-SL230G8 x 2 processors x 160 GFLOPS = 16 TFLOPS
- 32 x M2075 GPUs x 1030 GFLOPS = 33 TFLOPS
- 48 x 2 E5-2670 processors x 8 cores = 768 cores
- 2.60 GHz per core
- 48 x 64 GB RAM = 3 TB total RAM
- Infiniband FDR network at 54.54 Gbit/s
- Shared scratch on the Lustre filesystem

 

Shared memory machines with 2 TB of RAM each.

- 7 DL980G7 x 8 processors x 96 GFLOPS = 5 TFLOPS
- 7 x 8 E7-4870 processors x 10 cores = 560 cores
- 2.40 GHz per core
- 7 x 2 TB RAM = 14 TB total RAM
- 7 x 2 TB RAID5 = 14 TB local scratch
- Infiniband QDR network at 32 Gbit/s

 
 

Cluster IBM Intel E5-2670



 - 168 IBM dx360 M4 x 2 processors x 160 GFLOPS = 53 TFLOPS
 - 168 x 2 Intel E5-2670 processors x 8 cores = 2688 cores
 - 2.60 GHz per core
 - 168 x 32 GB RAM = 5.4 TB total RAM
 - Infiniband FDR network at 40 Gbit/s
 - Shared scratch on the Lustre filesystem

 

 

ESX virtualization cluster

All our critical virtual desktops and servers are hosted on a VMware ESX Enterprise cluster with High Availability and vMotion. This cluster consists of: 

 - 2 x HP-DL380G5 at 2.66 GHz with 32 GB of RAM each

For those virtual machines and user-grade desktops that do not need High Availability support, there is a set of machines with the ESXi hypervisor installed. These servers are:

 - 3 x HP-DL380G5 at 2.66 GHz with 16 GB of RAM each
 - 4 x HP-DL385G7 at 2.13 GHz with 124 GB of RAM each

 

Shared storage

All our clusters use shared storage with Lustre FS, supported by a DDN storage rack with five disk enclosures and two redundant SFA10000 controllers.

It is currently deployed with 750 TB of storage; communication is done via Infiniband or the regular network, depending on which is available to each node.
 
 

2013 Supercomputer upgrade


At the beginning of 2013, all supercomputing resources were upgraded. They now share a common architecture and are unified behind a single Slurm queue system that delivers jobs to the most adequate equipment based on the requested resources.

Below you will find the newly available machines and their respective resources:

 

Cluster Intel E5-2670 


- 48 HP-SL230G8 x 2 processors x 160 GFLOPS = 16 TFLOPS
- 32 x M2075 GPUs x 1030 GFLOPS = 33 TFLOPS
- 48 x 2 E5-2670 processors x 8 cores = 768 cores
- 2.60 GHz per core
- 48 x 64 GB RAM = 3 TB total RAM
- Infiniband FDR network at 54.54 Gbit/s
- Shared scratch on the Lustre filesystem

 

Shared memory machines with 2 TB of RAM each.

- 7 DL980G7 x 8 processors x 96 GFLOPS = 5 TFLOPS
- 7 x 8 E7-4870 processors x 10 cores = 560 cores
- 2.40 GHz per core
- 7 x 2 TB RAM = 14 TB total RAM
- 7 x 2 TB RAID5 = 14 TB local scratch
- Infiniband QDR network at 32 Gbit/s
 
 
 

Cluster AMD Opteron 6176



 - 41 HP-DL165G7 x 2 processors x 110 GFLOPS = 9 TFLOPS
 - 41 x 2 Opteron 6176 processors x 12 cores = 984 cores
 - 2.30 GHz per core
 - 41 x 96 GB RAM = 4 TB total RAM
 - Gigabit Ethernet network.
 - 41 x 6 TB RAID5 = 246 TB total scratch

 

 

ESX virtualization cluster

All our critical virtual desktops and servers are hosted on a VMware ESX Enterprise cluster with High Availability and vMotion. This cluster consists of: 

 - 2 x HP-DL380G5 at 2.66 GHz with 32 GB of RAM each

For those virtual machines and user-grade desktops that do not need High Availability support, there is a set of machines with the ESXi hypervisor installed. These servers are:

 - 3 x HP-DL380G5 at 2.66 GHz with 16 GB of RAM each
 - 4 x HP-DL385G7 at 2.13 GHz with 124 GB of RAM each

 

Shared storage

All our clusters use shared storage with Lustre FS, supported by a DDN storage rack with five disk enclosures and two redundant SFA10000 controllers.

It is currently deployed with 750 TB of storage; communication is done via Infiniband or the regular network, depending on which is available to each node.
 
