-- MichalHusejko - 05 Aug 2014

The 60-node cluster (current system)

  • 10Gb Ethernet NIC: (Intel) NetEffect NE020 10Gb Accelerated Ethernet Adapter (iWARP RNIC) (rev 05)
  • 10Gb Ethernet Switch: HP
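
As a quick sanity check (an addition, not from the original notes): the standard OFED utilities shipped with the OFED install can confirm that the iWARP RNIC is visible on a node:

# List the RDMA devices the OFED stack can see; the NE020 should appear
# as an iWARP device, with port details printed by ibv_devinfo
ibv_devices
ibv_devinfo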

Head node

| Chassis | Cores | Memory | GPU/Co-processor | Internal I/O | Public Network | OS Version | Software installed | Machine name |
| E4 SandyBridge | 2*6 cores = 12 cores | 64 GB @1333 MHz | NA | Intel NetEffect iWARP | 1 Gb | SLC 6.5 | NFS server; 3.7 TB of storage (RAID5) exported through NFS to all compute nodes, available under /data | lxbrf45c01 |

Compute nodes

In total there are 58 compute nodes.

| Chassis | Cores | Memory | GPU/Co-processor | Internal I/O | Public Network | OS Version | Software installed | Machine name |
| E4 SandyBridge | 2*6 cores = 12 cores | 64 GB @1333 MHz | NA | iWARP 10Gb | 1 Gb | SLC 6.4 | OFED 3.5-2 | lxbrf45c0[2-4] |
| E4 SandyBridge | 2*6 cores = 12 cores | 64 GB @1333 MHz | NA | iWARP 10Gb | 1 Gb | SLC 6.4 | OFED 3.5-2 | lxbrf47c0[2-8] |
| E4 SandyBridge | 2*6 cores = 12 cores | 64 GB @1333 MHz | NA | iWARP 10Gb | 1 Gb | SLC 6.4 | OFED 3.5-2 | lxbrf49c0[1-8] |
| E4 SandyBridge | 2*6 cores = 12 cores | 64 GB @1333 MHz | NA | iWARP 10Gb | 1 Gb | SLC 6.4 | OFED 3.5-2 | lxbrf51c0[1-4] |
| E4 SandyBridge | 2*6 cores = 12 cores | 64 GB @1333 MHz | NA | iWARP 10Gb | 1 Gb | SLC 6.4 | OFED 3.5-2 | lxbrf53c0[1-8] |
| E4 SandyBridge | 2*6 cores = 12 cores | 64 GB @1333 MHz | NA | iWARP 10Gb | 1 Gb | SLC 6.4 | OFED 3.5-2 | lxbrf55c0[1-8] |
| E4 SandyBridge | 2*6 cores = 12 cores | 64 GB @1333 MHz | NA | iWARP 10Gb | 1 Gb | SLC 6.4 | OFED 3.5-2 | lxbrf57c0[1-8] |
| E4 SandyBridge | 2*6 cores = 12 cores | 64 GB @1333 MHz | NA | iWARP 10Gb | 1 Gb | SLC 6.4 | OFED 3.5-2 | lxbrf59c0[1-4] |
| E4 SandyBridge | 2*6 cores = 12 cores | 64 GB @1333 MHz | NA | iWARP 10Gb | 1 Gb | SLC 6.4 | OFED 3.5-2 | lxbrf61c0[1-8] |
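
A hedged sketch for verifying that a chassis group still matches this table, using pdsh as elsewhere on this page (iw_nes is the stock NetEffect NE020 kernel driver; treat the exact strings as assumptions):

# Check the OS release and that the NetEffect iWARP driver is loaded on one group
pdsh -w lxbrf53c0[1-8] 'cat /etc/redhat-release; lsmod | grep iw_nes'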

MPI setup - MVAPICH2

# Run MVAPICH2 over the iWARP RNICs, with RDMA CM for connection setup
export MV2_USE_IWARP_MODE=1
export MV2_USE_RDMA_CM=1

# Put the MVAPICH2 1.9 install on the binary and library search paths
export PATH=/usr/mpi/gcc/mvapich2-1.9/bin/:$PATH
export LD_LIBRARY_PATH=/usr/mpi/gcc/mvapich2-1.9/lib64:$LD_LIBRARY_PATH

export MPI_HOME=/usr/mpi/gcc/mvapich2-1.9
export MPI_INCLUDE=/usr/mpi/gcc/mvapich2-1.9/include
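
With this environment in place, a minimal two-rank sanity run can be launched before the benchmarks below; this is a sketch, where hello.c stands for any small MPI program and is not part of this page:

# Compile with the MVAPICH2 wrapper and start one rank on each of two nodes
mpicc -o hello hello.c
mpirun_rsh -n 2 lxbrf45c02 lxbrf45c03 ./hello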

Simple MPI tests

-bash-4.1$ mpichversion
MVAPICH2 Version:       1.9
MVAPICH2 Release date:  Mon May  6 12:25:08 EDT 2013
MVAPICH2 Device:        ch3:mrail
MVAPICH2 configure:     --prefix=/usr/mpi/gcc/mvapich2-1.9/ --with-device=ch3:mrail --with-rdma=gen2 --enable-shared --enable-rdma-cm --libdir=/usr/mpi/gcc/mvapich2-1.9/lib64
MVAPICH2 CC:    gcc    -DNDEBUG -DNVALGRIND -O2
MVAPICH2 CXX:   c++   -DNDEBUG -DNVALGRIND -O2
MVAPICH2 F77:   gfortran -L/lib -L/lib   -O2
MVAPICH2 FC:    gfortran   -O2



Point-to-point latency and bandwidth between two nodes, from the OSU micro-benchmarks 4.3 (MV2_CPU_MAPPING=3:3 pins the rank on each node to core 3):

-bash-4.1$ mpirun_rsh -n 2 lxbrf45c02 lxbrf45c04 MV2_CPU_MAPPING=3:3 /afs/cern.ch/work/m/michalh/public/systems/qcd40/mpi/osu-micro-benchmarks-4.3/mpi/pt2pt/osu_latency

-bash-4.1$ mpirun_rsh -n 2 lxbrf45c02 lxbrf45c04 MV2_CPU_MAPPING=3:3 /afs/cern.ch/work/m/michalh/public/systems/qcd40/mpi/osu-micro-benchmarks-4.3/mpi/pt2pt/osu_bw



To check for leftover MPI processes across the node set:

pdsh -w lxbrf47c0[2-8],lxbrf49c0[1-3,5-8],lxbrf51c0[1-4],lxbrf53c0[1-8],lxbrf55c0[1-8],lxbrf45c0[2-4] 'pgrep mpi'
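
If stray ranks turn up, the same node set can be cleaned with pkill; a sketch, assuming the process names contain "mpi" (adjust the pattern to the actual binary names):

# Kill leftover MPI processes on the same node set
# (the 'mpi' pattern is an assumption; match it to the real binary name)
pdsh -w lxbrf47c0[2-8],lxbrf49c0[1-3,5-8],lxbrf51c0[1-4],lxbrf53c0[1-8],lxbrf55c0[1-8],lxbrf45c0[2-4] 'pkill mpi'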

Hostfile

The current 50 healthy compute nodes (a usage sketch follows the list):


lxbrf45c02
lxbrf45c03
lxbrf45c04
lxbrf47c04
lxbrf47c05
lxbrf47c06
lxbrf47c07
lxbrf49c01
lxbrf49c02
lxbrf49c04
lxbrf49c05
lxbrf49c06
lxbrf49c07
lxbrf49c08
lxbrf51c01
lxbrf51c02
lxbrf51c03
lxbrf51c04
lxbrf53c01
lxbrf53c02
lxbrf53c03
lxbrf53c04
lxbrf53c05
lxbrf53c06
lxbrf53c07
lxbrf53c08
lxbrf55c01
lxbrf55c02
lxbrf55c03
lxbrf55c04
lxbrf55c05
lxbrf55c06
lxbrf55c08
lxbrf59c01
lxbrf59c02
lxbrf59c04
lxbrf57c01
lxbrf57c02
lxbrf57c03
lxbrf57c04
lxbrf57c05
lxbrf57c06
lxbrf57c07
lxbrf57c08
lxbrf61c01
lxbrf61c03
lxbrf61c05
lxbrf61c06
lxbrf61c07
lxbrf61c08
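
A sketch of launching one rank per healthy node with this hostfile; the file name ./hosts and the binary ./a.out are placeholders, not part of the original setup:

# 50 ranks, one per node, each pinned to core 3, node list read from the hostfile
mpirun_rsh -n 50 -hostfile ./hosts MV2_CPU_MAPPING=3 ./a.out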


Nodes with hardware failures:


lxbrf47c02 - serious problem with dirac/time1, dirac/time2 checks
lxbrf47c03 - serious problem with dirac/time1, dirac/time2 checks
lxbrf47c08 - minor problem with dirac/time1, dirac/time2 checks
lxbrf49c03 - minor problem with dirac/time1, dirac/time2 checks
lxbrf55c07 - moderate problem with dirac/time1, dirac/time2 checks
lxbrf59c03 - minor problem with dirac/time1, dirac/time2 checks

lxbrf61c02 - HDD failure
lxbrf61c04 - HDD failure

Administration node

lxbrf47c01
