NP04 Offline

Monitoring Data Flow

Meeting at CERN

https://indico.cern.ch/event/724210/

These are the monitoring questions we would like to cover in our discussion. Based on our discussion with Geoff, it would be useful to do the following:

  • Add monitoring of the filling rate and quota of all buffers, and send alarms
  • Add monitoring of the FTS-light and FTS services
  • Monitoring of the network (protodune switch): need to see if we could get this information from CERN ES
  • Add monitoring of EOS usage and quota, r/w rates, and alarms
  • Pull the info related to protodune from the EOS report.log in MONIT ES
  • Create a dashboard layout for shifters (red/green/yellow buttons with the ability to drill down)
  • Add monitoring of data quality batch jobs, with an alert on the creation date of the output plots
  • What alarms should we generate? Where should they go: logbook? Slack?
  • What is the timeline for this?
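
As a starting point for the buffer monitoring and the red/green/yellow shifter status listed above, here is a minimal sketch. The thresholds (0.70 and 0.90) and the function names are illustrative assumptions, not values agreed in the discussion.

```python
def fill_status(used_bytes, quota_bytes, warn=0.70, crit=0.90):
    """Map a buffer's fill fraction to a dashboard colour.

    warn and crit are hypothetical thresholds, not agreed values.
    """
    if quota_bytes <= 0:
        raise ValueError("quota_bytes must be positive")
    fraction = used_bytes / quota_bytes
    if fraction >= crit:
        return "red"
    if fraction >= warn:
        return "yellow"
    return "green"


def fill_rate(used_before, used_after, seconds):
    """Average filling rate in bytes per second between two samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (used_after - used_before) / seconds
```

An alarm could then be raised whenever the status changes to yellow or red; where that alarm goes (logbook, Slack) is still an open question.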

Data Challenge 2

https://wiki.dunescience.org/wiki/ProtoDUNE_Dual_Phase_and_Single_Phase_Joint_Data_Challenge#Detailed_day-to-day_run_plan

  • Monitoring on np04-srv-001 and 002
    • iotop
    • top
    • iftop -b -i bond0
    • ganglia???
  • CERN network monitoring - https://cs-capc-pc.cern.ch:8182
    • 40GigabitEthernet1/2/6 - 40 Gb Link to EOS
    • np04-srv-001 bonded link (20 Gb)
      • 10GigabitEthernet1/1/17 - 10 Gb link
      • 10GigabitEthernet1/1/18 - 10 Gb link
    • np04-srv-002 bonded link (20 Gb)
      • 10GigabitEthernet1/1/19 - 10 Gb link
      • 10GigabitEthernet1/1/20 - 10 Gb link
  • np04-srv-001
[root@np04-srv-001 ~]# uname -r
3.10.0-693.17.1.el7.x86_64

[root@np04-srv-001 ~]# ./np04-check-disk-config.sh 
/sys/block/md0/md/group_thread_cnt = 0
/sys/block/md1/md/group_thread_cnt = 0
/sys/block/md2/md/group_thread_cnt = 0
/sys/block/md3/md/group_thread_cnt = 0
/proc/sys/vm/dirty_background_ratio = 10
/proc/sys/vm/dirty_ratio = 20
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw 20480   512  4096          0  60010408181760   /dev/md0
rw 20480   512  4096          0  60010408181760   /dev/md1
rw 20480   512  4096          0  60010408181760   /dev/md2
rw 20480   512  4096          0  60010408181760   /dev/md3
/proc/sys/dev/raid/speed_limit_min = 1000
/proc/sys/dev/raid/speed_limit_max = 200000
  • np04-srv-002
[np04data@np04-srv-002 test]$ uname -r
4.4.126-1.el7.elrepo.x86_64

[root@np04-srv-002 ~]# ./np04-check-disk-config.sh 
/sys/block/md0/md/group_thread_cnt = 4
/sys/block/md1/md/group_thread_cnt = 4
/sys/block/md2/md/group_thread_cnt = 4
/sys/block/md3/md/group_thread_cnt = 4
/proc/sys/vm/dirty_background_ratio = 1
/proc/sys/vm/dirty_ratio = 2
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw 131072   512  4096          0  60010408181760   /dev/md0
rw 131072   512  4096          0  60010408181760   /dev/md1
rw 131072   512  4096          0  60010408181760   /dev/md2
rw 131072   512  4096          0  60010408181760   /dev/md3
/proc/sys/dev/raid/speed_limit_min = 50000
/proc/sys/dev/raid/speed_limit_max = 500000
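
The two servers are tuned differently, and the differences are easier to see side by side. The following is a minimal sketch that parses the "path = value" lines of the np04-check-disk-config.sh output and reports the paths whose values differ. The function names are hypothetical, and the sample text is an excerpt (md0 only) of the listings above.

```python
def parse_settings(text):
    """Collect 'path = value' lines into a dict, skipping other output."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep and key.startswith("/"):
            settings[key.strip()] = value.strip()
    return settings


def diff_settings(a, b):
    """Return {path: (value_a, value_b)} for paths whose values differ."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b))
            if a.get(k) != b.get(k)}


# Excerpts of the output shown above (md0 only).
SRV_001 = """\
/sys/block/md0/md/group_thread_cnt = 0
/proc/sys/vm/dirty_background_ratio = 10
/proc/sys/vm/dirty_ratio = 20
/proc/sys/dev/raid/speed_limit_min = 1000
/proc/sys/dev/raid/speed_limit_max = 200000
"""
SRV_002 = """\
/sys/block/md0/md/group_thread_cnt = 4
/proc/sys/vm/dirty_background_ratio = 1
/proc/sys/vm/dirty_ratio = 2
/proc/sys/dev/raid/speed_limit_min = 50000
/proc/sys/dev/raid/speed_limit_max = 500000
"""

if __name__ == "__main__":
    diff = diff_settings(parse_settings(SRV_001), parse_settings(SRV_002))
    for path, (v1, v2) in diff.items():
        print(f"{path}: srv-001={v1} srv-002={v2}")
```

The blockdev table rows contain no " = " and are skipped by the parser, so the readahead difference (20480 vs 131072) has to be compared separately.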

CERN Openstack Virtual Machines

SLC 6

dune-vm-build-02

[root@dune-vm-build-02 yum]# yum install https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest.noarch.rpm
sudo yum install cvmfs cvmfs-config-default


[root@dune-vm-build-02 yum]# yum repolist
Loaded plugins: changelog, kernel-module, protectbase, security, tsflags,
              : versionlock
156 packages excluded due to repository protections
repo id      repo name                                                status
cernvm       CernVM packages                                                60+1
epel         UNSUPPORTED: Extra Packages for Enterprise Linux add-ons 12,216+156
slc6-extras  Scientific Linux CERN 6 (SLC6) add-on packages, no forma        854
slc6-os      Scientific Linux CERN 6 (SLC6) base system packages           6,963
slc6-updates Scientific Linux CERN 6 (SLC6) bugfix and security updat     27,107
repolist: 47,200

[root@dune-vm-build-02 ~]# cvmfs_config setup

Create /etc/cvmfs/default.local:

[root@dune-vm-build-02 cvmfs]# cat default.local 
CVMFS_REPOSITORIES=dune.opensciencegrid.org,fermilab.opensciencegrid.org
CVMFS_HTTP_PROXY="http://ca-proxy.cern.ch:3128"
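
To check such a file from a script, for example to confirm that the expected repositories are listed, a small parser is enough. This is a sketch that assumes simple KEY=value lines like the two above; the function names are hypothetical and it does not handle every shell construct that a CVMFS configuration file may contain.

```python
import shlex


def parse_cvmfs_local(text):
    """Parse simple shell-style KEY=value lines into a dict."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        parts = shlex.split(raw)  # strips surrounding quotes
        config[key.strip()] = parts[0] if parts else ""
    return config


def repositories(config):
    """Split the comma-separated CVMFS_REPOSITORIES value into a list."""
    return [r for r in config.get("CVMFS_REPOSITORIES", "").split(",") if r]


# The file content shown above.
DEFAULT_LOCAL = """\
CVMFS_REPOSITORIES=dune.opensciencegrid.org,fermilab.opensciencegrid.org
CVMFS_HTTP_PROXY="http://ca-proxy.cern.ch:3128"
"""

if __name__ == "__main__":
    cfg = parse_cvmfs_local(DEFAULT_LOCAL)
    print(repositories(cfg))
    print(cfg["CVMFS_HTTP_PROXY"])
```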

CentOS 7

dune-vm-build-01

root/config.txt
Geoff Savage
03-Aug-2017

Some notes on configuring a CC7 VM for running protoDUNE software.

[root@dune-vm-build-01 ~]# locmap --list
[Available Modules]

Module name : kerberos[enabled]
Module name : sendmail[enabled]
Module name : cernbox[disabled]
Module name : ntp[enabled]
Module name : gpg[enabled]
Module name : cvmfs[disabled]
Module name : ssh[enabled]
Module name : lpadmin[enabled]
Module name : nscd[disabled]
Module name : eosclient[disabled]
Module name : sudo[enabled]
Module name : afs[enabled]

[root@dune-vm-build-01 ~]# locmap --disable lpadmin
[INFO] Disabling lpadmin module..

[root@dune-vm-build-01 ~]# locmap --enable cernbox
[INFO] Enabling cernbox module.

[root@dune-vm-build-01 ~]# locmap --enable cvmfs
[INFO] Enabling cvmfs module.

[root@dune-vm-build-01 ~]# locmap --enable eosclient
[INFO] Enabling eosclient module.

[root@dune-vm-build-01 ~]# locmap --list
[Available Modules]

Module name : kerberos[enabled]
Module name : sendmail[enabled]
Module name : cernbox[enabled]
Module name : ntp[enabled]
Module name : gpg[enabled]
Module name : cvmfs[enabled]
Module name : ssh[enabled]
Module name : lpadmin[disabled]
Module name : nscd[disabled]
Module name : eosclient[enabled]
Module name : sudo[enabled]
Module name : afs[enabled]
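
Comparing the two listings by eye is error-prone. The sketch below parses "Module name : name[state]" lines and reports which modules changed state. The function names are hypothetical, and the sample text is an excerpt of the two listings above.

```python
import re

MODULE_LINE = re.compile(r"^Module name : (\w+)\[(enabled|disabled)\]$")


def parse_modules(text):
    """Return {module: True if enabled, False if disabled}."""
    modules = {}
    for line in text.splitlines():
        match = MODULE_LINE.match(line.strip())
        if match:
            modules[match.group(1)] = match.group(2) == "enabled"
    return modules


def changed_modules(before, after):
    """Return {module: new_state} for modules whose state changed."""
    return {name: after[name]
            for name in sorted(after)
            if before.get(name) != after[name]}


# Excerpts of the two `locmap --list` outputs shown above.
BEFORE = """\
Module name : kerberos[enabled]
Module name : cernbox[disabled]
Module name : cvmfs[disabled]
Module name : lpadmin[enabled]
Module name : eosclient[disabled]
"""
AFTER = """\
Module name : kerberos[enabled]
Module name : cernbox[enabled]
Module name : cvmfs[enabled]
Module name : lpadmin[disabled]
Module name : eosclient[enabled]
"""

if __name__ == "__main__":
    print(changed_modules(parse_modules(BEFORE), parse_modules(AFTER)))
```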

[root@dune-vm-build-01 ~]# locmap --configure all
[INFO] Configuring all enabled modules.

[INFO] Account dostefan already exists.

[INFO] Account dsavage already exists.

[INFO] Account espinal already exists.

[INFO] Account hennessy already exists.

[INFO] Account mpotekhi already exists.

[INFO] Account mxp already exists.

[INFO] Account pordes already exists.

[INFO] Account rosulej already exists.

[INFO] User(s) ['espinal'] found with public ssh key.

Notice: Compiled catalog for dune-vm-build-01.cern.ch in environment production in 1.44 seconds
Notice: /Stage[main]/Eosclient/Package[jemalloc]/ensure: created
Notice: /Stage[main]/Afs::Pam/Exec[enable krb5]/returns: executed successfully
Notice: /Stage[main]/Eosclient/Package[lsof]/ensure: created
Notice: /Stage[main]/Eosclient::Fuse::Interactive/File[/etc/profile.d/eos-select.sh]/ensure: defined content as '{md5}df9d30b18bfa938a6c5026f969c11e1b'
Notice: /Stage[main]/Eosclient::Fuse/File[/usr/local/sbin/eos-cleanup.sh]/ensure: defined content as '{md5}28d816a3b3e11c1ddc6bc1045fac7188'
Notice: /Stage[main]/Eosclient::Fuse/Cron[eos-cleanup]/ensure: created
Notice: /Stage[main]/Cernbox::Install/Yumrepo[cernbox]/ensure: created
Notice: /Stage[main]/Cernbox::Install/Package[cernbox-client]/ensure: created
Notice: /Stage[main]/Eosclient/Package[eos-client]/ensure: created
Notice: /Stage[main]/Afs::Config/Exec[Grab CellServDB context from http://afs.web.cern.ch/afs/CellServDB]/returns: executed successfully
Notice: /Stage[main]/Cvmfs::Install/Package[cvmfs]/ensure: created
Notice: /Stage[main]/Cvmfs::Install/File[/etc/cvmfs/cvmfsfacts.yaml]/ensure: defined content as '{md5}26e3278e2a9df93b20d7e1db0488c020'
Notice: /Stage[main]/Cvmfs::Config/File[/etc/fuse.conf]/content: content changed '{md5}464a1a320cb1cf48fcbc83e190fd4c31' to '{md5}5ba907162a0f2dd533489cfeab344e58'
Notice: /Stage[main]/Cvmfs::Config/Augeas[cvmfs_automaster]/returns: executed successfully
Notice: /Stage[main]/Cvmfs/Cvmfs::Domain[cern.ch]/File[/etc/cvmfs/domain.d/cern.ch.local]/ensure: defined content as '{md5}966b23b617d08b59f592f7c3b6505468'
Notice: /Stage[main]/Cvmfs::Config/File[/etc/cvmfs/domain.d/README.PUPPET]/ensure: defined content as '{md5}0c1d1b9d346f6cdbb5e20f934ac49aee'
Notice: /Stage[main]/Eosclient::Fuse::Interactive/File[/etc/profile.d/eos-select.csh]/ensure: defined content as '{md5}ae1b8f7a526782d607efdb5fa6715923'
Notice: /Stage[main]/Eosclient::Fuse/Package[eos-fuse-core]/ensure: created
Notice: /Stage[main]/Eosclient::Fuse/Augeas[eosfuse_logrotate]/returns: executed successfully
Notice: /Stage[main]/Eosclient::Fuse/File[/etc/sysconfig/eos]/ensure: defined content as '{md5}d7008f16d5929cd8770faba6f57fa5a0'
Notice: /Stage[main]/Eosclient::Fuse/File[/eos]/ensure: created
Notice: /Stage[main]/Eosclient::Fuse/Eosclient::Mount[ams]/File[/etc/sysconfig/eos.ams]/ensure: defined content as '{md5}569b139de2f0940fdb4025ca93042178'
Notice: /Stage[main]/Eosclient::Fuse/Eosclient::Mount[experiment]/File[/var/log/eos/fuse/fuse.experiment.log]/ensure: created
Notice: /Stage[main]/Eosclient::Fuse/Eosclient::Mount[project]/File[/etc/sysconfig/eos.project]/ensure: defined content as '{md5}3ac93ac5c5942778621227b6da77bf22'
Notice: /Stage[main]/Eosclient::Fuse/Eosclient::Mount[workspace]/File[/etc/sysconfig/eos.workspace]/ensure: defined content as '{md5}81d529291994997f303e8f9521793d76'
Notice: /Stage[main]/Eosclient::Fuse/Eosclient::Mount[atlas]/File[/var/log/eos/fuse/fuse.atlas.log]/ensure: created
Notice: /Stage[main]/Eosclient::Fuse/Eosclient::Mount[user]/File[/var/log/eos/fuse/fuse.user.log]/ensure: created
Notice: /Stage[main]/Eosclient::Fuse/Eosclient::Mount[theory]/File[/var/log/eos/fuse/fuse.theory.log]/ensure: created
Notice: /Stage[main]/Eosclient::Fuse/Eosclient::Mount[project]/File[/var/log/eos/fuse/fuse.project.log]/ensure: created
Notice: /Stage[main]/Eosclient::Fuse/Eosclient::Mount[lhcb]/File[/etc/sysconfig/eos.lhcb]/ensure: defined content as '{md5}7d157012c6429002f5ff20c3e659975c'
Notice: /Stage[main]/Eosclient::Fuse/Eosclient::Mount[ams]/File[/var/log/eos/fuse/fuse.ams.log]/ensure: created
Notice: /Stage[main]/Eosclient::Fuse/Eosclient::Mount[user]/File[/etc/sysconfig/eos.user]/ensure: defined content as '{md5}d2892958adc5e61051095fa3cd51404c'
Notice: /Stage[main]/Eosclient::Fuse/Eosclient::Mount[cms]/File[/var/log/eos/fuse/fuse.cms.log]/ensure: created
Notice: /Stage[main]/Eosclient::Fuse/Eosclient::Mount[cms]/File[/etc/sysconfig/eos.cms]/ensure: defined content as '{md5}e5ef155e40446d31aac6bddba8b4175d'
Notice: /Stage[main]/Eosclient::Fuse/Eosclient::Mount[atlas]/File[/etc/sysconfig/eos.atlas]/ensure: defined content as '{md5}8c730671fae0d96184d72f686db9d772'
Notice: /Stage[main]/Eosclient::Fuse/Eosclient::Mount[theory]/File[/etc/sysconfig/eos.theory]/ensure: defined content as '{md5}c2b84cc382f7929a0f9b70c13140712c'
Notice: /Stage[main]/Eosclient::Fuse/Eosclient::Mount[lhcb]/File[/var/log/eos/fuse/fuse.lhcb.log]/ensure: created
Notice: /Stage[main]/Eosclient::Fuse/Eosclient::Mount[workspace]/File[/var/log/eos/fuse/fuse.workspace.log]/ensure: created
Notice: /Stage[main]/Eosclient::Fuse/Eosclient::Mount[experiment]/File[/etc/sysconfig/eos.experiment]/ensure: defined content as '{md5}5501b88dcf1d60305af13a13e7098a1b'
Notice: /Stage[main]/Eosclient::Fuse::Interactive/Exec[EOSFUSE at SSH login]/returns: executed successfully
Notice: /Stage[main]/Eosclient::Autofs/Ini_setting[set autofs browse mode yes]/value: value changed '[redacted sensitive information]' to '[redacted sensitive information]'
Notice: /Stage[main]/Eosclient::Autofs/File[/etc/auto.eos]/ensure: defined content as '{md5}b6ac5b7578da7513cd8f8c48781f4e4f'
Notice: /Stage[main]/Eosclient::Autofs/Augeas[eos_automaster]/returns: executed successfully
Notice: /Stage[main]/Cvmfs::Config/Concat[/etc/cvmfs/default.local]/File[/etc/cvmfs/default.local]/ensure: defined content as '{md5}4bb037b24d4f343388abf3d75192a2b1'
Notice: /Stage[main]/Cvmfs::Service/Service[autofs]/ensure: ensure changed 'stopped' to 'running'
Notice: /Stage[main]/Cvmfs::Service/Exec[Reloading cvmfs]: Triggered 'refresh' from 1 events
Notice: Finished catalog run in 81.18 seconds

[root@dune-vm-build-01 ~]# 


[root@dune-vm-build-01 ~]# cat .k5login 
# This file is managed by puppet and will be overwritten
# every time puppet runs against the system.
dostefan@CERN.CH
dsavage@CERN.CH
espinal@CERN.CH
hennessy@CERN.CH
mpotekhi@CERN.CH
mxp@CERN.CH
pordes@CERN.CH
rosulej@CERN.CH


[root@dune-vm-build-01 sudoers.d]# cat 000-sudo-users 
#Sudo privileges to the responsible user
dostefan       ALL=(ALL)       ALL
dsavage       ALL=(ALL)       ALL
espinal       ALL=(ALL)       ALL
hennessy       ALL=(ALL)       ALL
mpotekhi       ALL=(ALL)       ALL
mxp       ALL=(ALL)       ALL
pordes       ALL=(ALL)       ALL
rosulej       ALL=(ALL)       ALL


[root@dune-vm-build-01 sudoers.d]# cat cern-config-users 
## This file is controlled by the cern-config-users script, do no edit!
dostefan       ALL=(ALL)       ALL
dsavage        ALL=(ALL)       ALL
espinal        ALL=(ALL)       ALL
hennessy       ALL=(ALL)       ALL
mpotekhi       ALL=(ALL)       ALL
pordes         ALL=(ALL)       ALL
rosulej        ALL=(ALL)       ALL
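
The two sudoers files do not list the same accounts. A minimal sketch for spotting the difference, assuming one "user ALL=(ALL) ALL" entry per line as in the listings above (the function name is hypothetical):

```python
def sudo_users(text):
    """Return the set of user names from simple sudoers entries."""
    users = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            users.add(line.split()[0])
    return users


# Contents of the two files shown above.
SUDO_USERS_FILE = """\
#Sudo privileges to the responsible user
dostefan       ALL=(ALL)       ALL
dsavage       ALL=(ALL)       ALL
espinal       ALL=(ALL)       ALL
hennessy       ALL=(ALL)       ALL
mpotekhi       ALL=(ALL)       ALL
mxp       ALL=(ALL)       ALL
pordes       ALL=(ALL)       ALL
rosulej       ALL=(ALL)       ALL
"""
CERN_CONFIG_USERS_FILE = """\
## This file is controlled by the cern-config-users script, do no edit!
dostefan       ALL=(ALL)       ALL
dsavage        ALL=(ALL)       ALL
espinal        ALL=(ALL)       ALL
hennessy       ALL=(ALL)       ALL
mpotekhi       ALL=(ALL)       ALL
pordes         ALL=(ALL)       ALL
rosulej        ALL=(ALL)       ALL
"""

if __name__ == "__main__":
    only_first = sudo_users(SUDO_USERS_FILE) - sudo_users(CERN_CONFIG_USERS_FILE)
    print(sorted(only_first))
```

For the listings above, the only account present in 000-sudo-users but missing from cern-config-users is mxp.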


From landb.
Responsible for the device:
DUNE-COMP-VM E-GROUP EP URD  
DUNE-COMP-VM@CERN.CH   /  Tlf: 7XXXX  
Main User of the device:
DUNE-COMP-VM E-GROUP EP URD 
DUNE-COMP-VM@CERN.CH   /  Tlf: 7XXXX  


From lxplus
First, set the project correctly in .openrc:
export OS_PROJECT_NAME=DUNE
openstack server show dune-vm-build-01
[dsavage@lxplus073 ~]$ openstack server set --property landb-mainuser="dune-comp-users" dune-vm-build-01
ERROR: openstack No server with a name or ID of 'dune-vm-build-01' exists.
Maybe OpenStack registrations take time to be registered everywhere.
  



[root@dune-vm-build-01 /]# cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
systemd-bus-proxy:x:999:997:systemd Bus Proxy:/:/sbin/nologin
systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
polkitd:x:998:996:User for polkitd:/:/sbin/nologin
rpc:x:32:32:Rpcbind Daemon:/var/lib/rpcbind:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
tss:x:59:59:Account used by the trousers package to sandbox the tcsd daemon:/dev/null:/sbin/nologin
ntp:x:38:38::/etc/ntp:/sbin/nologin
puppet:x:52:52:Puppet:/var/lib/puppet:/sbin/nologin
rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/sbin/nologin
nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
chrony:x:997:995::/var/lib/chrony:/sbin/nologin
colord:x:996:993:User for colord:/var/lib/colord:/sbin/nologin
dostefan:x:51643:2835:Dorota Stefan,9999 0-000,,:/afs/cern.ch/user/d/dostefan:/bin/bash
dsavage:x:98358:2841:Geoff Savage,892 2-B15,,:/afs/cern.ch/user/d/dsavage:/bin/bash
espinal:x:3808:1307:Xavier Espinal Curull,31 1-014,+41227663533,:/afs/cern.ch/user/e/espinal:/bin/bash
hennessy:x:21340:2841:Karol Hennessy,22 1-011,+41227674834,:/afs/cern.ch/user/h/hennessy:/bin/bash
mpotekhi:x:28506:1307:Maxim Potekhin,892 2-D23,+41227676509,:/afs/cern.ch/user/m/mpotekhi:/bin/zsh
pordes:x:8044:2835:Ruth Pordes,892 2-B20,,:/afs/cern.ch/user/p/pordes:/bin/tcsh
rosulej:x:71307:2841:Robert Sulej,594 R-026,+41227677556,:/afs/cern.ch/user/r/rosulej:/bin/bash
mxp:x:51644:51644::/home/mxp:/bin/bash
saslauth:x:995:76:Saslauthd user:/run/saslauthd:/sbin/nologin
mailnull:x:47:47::/var/spool/mqueue:/sbin/nologin
smmsp:x:51:51::/var/spool/mqueue:/sbin/nologin
cvmfs:x:994:992:CernVM-FS service account:/var/lib/cvmfs:/sbin/nologin

-- DavidGeoffreySavage - 2017-10-26

Topic revision: r7 - 2018-07-20 - DavidGeoffreySavage
 