Date | Service | Summary | Description | C5 Worthy |
---|---|---|---|---|
21/02/2012 | AI | Attended AI Meeting | ||
21/02/2012 | ITTF | Preparing VoIP session for 30 March | Discussed with Rodrigo Sierra. Title may have to be changed. |
|
20/02/2012 | Schedule MARS | Agreed with Massimo and Manuel to have it this week. Asked Massimo for a little time to prepare. | ||
20/02/2012 | CPCD | Proposed to the list of presidents of the CPCD (Commission paritaire consultative de Discipline). |
Discussed with Manuel. Would like to discuss with Helge. |
|
20/02/2012 | LFC | bdii of prod-lfc-shared-central down | Looked into it with Manuel and Ulrich. Trick described in https://svnweb.cern.ch/trac/gservices/ticket/17 |
|
20/02/2012 | LCG | Attended LCG Ops meeting. | Replacing Alex, replacing Philippe. | |
20/02/2012 | CASTOR | Looked into permissions for DB access. | Requested by Xavier in the framework of the investigation of checkreplicas not working due to DB schema changes in CASTOR 2.1.12 | |
20/02/2012 | Back from holiday last week. | Processing mail backlog. | ||
21/02/2012 | ITTF | Announcement of ITTF 24 Feb Agile Infrastructure: Monitoring |
Verified that the IT Auditorium is as bad as usual. Mail to it-dep |
|
21/02/2012 | batch | Updated https://twiki.cern.ch/twiki/bin/view/Batch/QueuePlanning | to include the things done for the ATLAS CAT resource rationalisation as requested by Manuel. |
|
21/02/2012 | Access Request renewal for the 513 computer room | https://edh.cern.ch/Document/General/ACRQ/4878391 |
||
21/02/2012 | batch | Answered INC106437 | problem running OpenFOAM with multiple machines on -R "select[(eng)]" on AFS | |
21/02/2012 | Started preparing my MARS | |||
22/02/2012 | batch | Discussed with Ulrich cleanup of processes and/or possible campaign of node draining + restart | ||
22/02/2012 | Working on my MARS | To happen Friday afternoon: 14:30 with Manuel and around 17:00 (after Arne) with Massimo. |
||
23/02/2012 | Attended IT-PES-PS section meeting | |||
23/02/2012 | Still worked on my MARS text. | |||
23/02/2012 | batch | Request from sysadmins for the retirement of the machines which are in encra6501 enclosure (rack 65) |
Followed up with Ulrich and Eric. Answered that we are waiting to verify draining procedures with Ricardo and Gavin. | |
23/02/2012 | AI | Attended AI Sprint Meeting | Node importance, service criticality, HW homogeneity and KVM. | |
24/02/2012 | ITTF | ITTF session on Agile Infrastructure: Monitoring |
by Markus Schulz and Pedro Andrade | |
24/02/2012 | MARS interview with Manuel | Mainly objectives | ||
27/02/2012 | ITTF | Contacted Tim by mail for SNOW Usage Experience ITTF session | As discussed on Friday. He appointed Bruno Lenski. |
|
27/02/2012 | ITTF | Contacted Alistair Bland by mail for ITTF session on BE/CO computing activities. | As discussed on Friday. | |
27/02/2012 | CPCD | Confirmed to Sigrid and Wisla that OK for the list of presidents of the CPCD (Commission paritaire consultative de Discipline). |
||
27/02/2012 | MARS interview with Massimo | Mainly results of last year. | ||
27/02/2012 | CASTOR | Made Massimo owner of castor-monitoring and castor-monitoring-admins | On request of Jan Iven. | |
27/02/2012 | CASTOR | Prompted by PHP version question for compass02 and compass22 that are not from SWREP, pointed to documentation for vobox fileservers | https://savannah.cern.ch/task/?16223 https://savannah.cern.ch/task/?23045 https://savannah.cern.ch/task/?23222 https://savannah.cern.ch/task/?19559 |
|
27/02/2012 | batch | Closed again INC106437 after answer from Nils. | Although this is not a batch issue, it is a hassle. I should have given the ticket to Nils or to the AFS guys from the beginning. | |
27/02/2012 | CASTOR | Fixed useraccess configuration of vona4801 and vona4802 that was affected by the problem described at the end of https://twiki.cern.ch/twiki/bin/view/ELFms/ELFmsZuulSLC5 | Also told Emmanuele Leonardi that I am not dealing with these boxes anymore, so he should contact Massimo. | |
28/02/2012 | ITTF | Contacted Vito Baggiolini by mail for ITTF session on computing in BE/CO. Mentioned this to Nils who is running a coordination meeting with them. | Fixed a date and created event in Indico in https://indico.cern.ch/conferenceDisplay.py?confId=180286 |
|
28/02/2012 | ITTF | Proposed the date of 20 April for the session on Experiences Using Service-Now in IT. | Created event in Indico in https://indico.cern.ch/conferenceDisplay.py?confId=180141 |
|
28/02/2012 | Attended PES group meeting. | With presentation from Alberto Pace on Storage Strategy. | ||
28/02/2012 | LFC batch | Attended PES-GT meeting. | Agreed to revive LFC pps node. Main information is that EMI WN is not compatible with lcg libs so should be avoided for SLC5. If we want ARGUS we should go for a newer LCG release. |
|
28/02/2012 | Message Brokers | Answered question from Lionel about not being able to (re)install. | You have to change the driver for the network interface from 'synthetic' to 'emulated' for PXE to work. 'emulated' is required for installation with PXE but 'synthetic' is faster and better for normal operation. You can change it with something like /afs/cern.ch/user/v/vmmaster/bin/vmtool nic lxxxx --type=emulated Please note that this implies a reboot of the VM. More info in https://twiki.cern.ch/twiki/bin/view/PESgroup/VirtualMachineCreation |
|
28/02/2012 | Message Brokers | Lionel still complained that there is no way to check the network drivers and that he has no access to manage the VM. | I am to check with Alex for the display question. Suggested checking with Manuel for the access question. | |
29/02/2012 | Still had to work on my MARS text with input from Massimo. | Fed the text back to Massimo. | ||
29/02/2012 | ITTF | Discussed with Wayne the possible use of the IT Auditorium for the coming ITTF sessions. He thought that we should not count on it. | Proposed to discuss with Tony opening the corridor doors in 513 1-024 to mitigate the risk. | |
01/03/2012 | LFC | Worked on bdii startup as I thought that its failure could affect yaim. | Found that it is selinux (configured in enforcing mode) that prevents the startup by not allowing slapd to bind to port 2170 | |
01/03/2012 | LFC | Investigating https://ggus.eu/ws/ticket_info.php?ticket=78770 |
Trying to find out why we seem to publish wrong data for the LFC in the BDII. | |
01/03/2012 | LFC | Looked into https://ggus.eu/ws/ticket_info.php?ticket=77026 |
||
01/03/2012 | Message Brokers | As suggested by Helge called a meeting about the Message Brokers to signal that I will take over their operation. | Invited Pedro as monitoring person and later also Lionel and Massimo. | |
01/03/2012 | Discussed with Tony on using the IT Auditorium for the coming sessions of the ITTF. | Tony suggested to contact Frederic to ask for his advice. | ||
02/03/2012 | CASTOR | Discussed with Massimo why a user with group def-cg does not get a CASTOR home directory. | Basically you need a "resource" group that you can map to an organisational unit and def-cg is not. | |
01/03/2012 | Got answer to RQF0072607 : Error. NoAccess Opened because Massimo cannot access my MARS. |
Sent Massimo a printout of the current MARS contents. | ||
02/03/2012 | AI | After discussion in the morning meeting pointed Tomas to the LDAP query for batch egroup | The thingy that recursively queries LDAP for group membership and fills a sticky cache is in https://svnweb.cern.ch/cern/wsvn/batchinter/trunk/batch/CERN-CC-LSF/scripts/loadPwentCache.py This we run in a daily cron table. The thingy that uses the sticky cache is in https://svnweb.cern.ch/cern/wsvn/batchinter/trunk/batch/CERN-CC-LSF/scripts/egroup |
|
02/03/2012 | LFC | As discussed with Manuel yesterday, submitted request RQF0073296 for a VM to test and debug LFC deployment. | LFCTEST. | |
02/03/2012 | LFC | Pursued the problem of wrong info in bdii for CERN LFCs | I found that the yaim function /opt/glite/yaim/functions/config_gip_lfc provided by glite-yaim-lfc-4.1.1-1 was failing because the INSTALL_ROOT variable was not passed, so it was not generating/updating the info provider in /opt/glite/etc/gip/provider/glite-lfc-provider. On the other hand I found that some nodes already had /opt/glite/etc/gip/provider/glite-lfc-provider generated by other means (by hand?) with different params. I proposed to patch the yaim function myself and generate/update the info provider in all lfc production nodes. |
|
05/03/2012 | ITTF | Discussed with Mats the session on Service Management. | ||
05/03/2012 | On rota this week. | |||
05/03/2012 | LFC | Pursuing the problem of wrong info in bdii for CERN LFCs, investigated errors when running yaim in production LFC nodes. | I realized that the voms errors, no matter how bad they look, do not seem to be fatal for yaim. The only fatal error was actually the one due to config_gip_only not defined anymore but referenced from /opt/glite/yaim/node-info.d/glite-lfc_oracle. So the voms errors not being fatal helps explain why nobody cared to clean the gridgroups entries for lcgadmin in prod/components/yaim_usersconf/defaults.tpl. |
|
05/03/2012 | Message Brokers | Meeting about the Message Brokers signaling that I start working on them. | Notes from Pedro in https://twiki.cern.ch/twiki/pub/AgileInfrastructure/AgileInfraDocsMinutes/AI_monitoring_messaging_5th_March_2012.txt | |
05/03/2012 | CASTOR | Followed up on alarm from TSM admin that a backup tape for VOHARP01 has been lost. | Started an incremental backup. Also pointed out that from now on Massimo will take care of supporting VOHARP01. |
|
06/03/2012 | Doing tickets for the support rota | INC110495: myproxy registration request for crab3dev.cern.ch; INC110324: lfc_noread lfc_nowrite on lfcshared01; INC110471: lfcatlas01 lfc_noread; INC109285: No access to lxbsp0501 and lxbsp0502; INC110557: LSF js on lxbsu2014.cern.ch: LFS js: no AFS token; INC110845: GGUS-Ticket-ID: #79939 Ticket "please add new authorized renewer to myproxy configuration" |
||
06/03/2012 | ITTF | Mail discussion with Tony, Wayne and Frederic on what to do with ITTF sessions when the IT Auditorium is off. | Ignacio, Unfortunately it is a little more complicated. The back door is closed as opening it causes problems for people evacuating from the other rooms along that corridor, all the more so as there is no window in the door so you might open it into somebody. Although somewhere bigger but less convenient is clearly the best (PS Auditorium?), I would not be against using 024 with a clear mention of the evacuation issue at the beginning of each meeting; an (unannounced) evacuation exercise was explicitly organised during a post-C5 in the past to see how people would evacuate this room and things were calm. Thinking further, we could have the rear doors unlocked and have someone explicitly identified as being responsible to open the door carefully and manage evacuation via that route. Cheers, Tony -----Original Message----- From: Ignacio Reguero Sent: 06 March 2012 09:24 To: Wayne Salter Cc: Ignacio Reguero; Frederic Hemmer; Tony Cass Subject: Re: IT Auditorium Hi Wayne, Frederic and Tony, So the IT Auditorium is actually off and I plan to move all coming ITTF sessions from the IT Auditorium to 513 1-024. Could we do anything to mitigate the safety risk when having attendance higher than the nominal capacity of the room (open the back door)? Should we consider other (bigger but less convenient) venues on the site? Thanks & cheers ...Ignacio... On Mon, 5 Mar 2012, Wayne Salter wrote: > Hi Ignacio, > > The official situation (which we learnt from GS on Friday afternoon) is > that there is no heating, cooling or air renewal in the amphitheatre > until July at the earliest. Hence, it is not possible to use this room > for large meetings. It is debatable whether small meeting could be > held under these circumstances but in any case other rooms for smaller meetings exist. |
|
07/03/2012 | LFC | Pursuing the LFC bdii problem. | Answered GGUS tickets 78770 and 77026 corresponding to INC111102: BDII and INC099445: CERN LFC nodes are badly published in the BDII | |
06/03/2012 | LCG Ops meeting | Followed up report from CMS of flakiness in SLS status display for lxbatch. | Monitoring problem found and solved by Steve and Gavin. Due to overload of lxplus | |
07/03/2012 | Doing tickets for the support rota | INC111042: Cannot kill hanging jobs; INC110874: lfcatlas03 lfc_nowrite lfc_noread; INC110487: no space on execution hosts |
||
07/03/2012 | ITTF | Announced session on Storage Strategy for next Friday. | Prompted by Rainer, checked that bld 30 auditorium not available. Also checking with the Vidyo people. |
|
08/03/2012 | Doing tickets for the support rota | INC111319: Problem with lxbatch; follow-on to INC109285: No access to lxbsp0501 and lxbsp0502 |
||
08/03/2012 | Pointed out that BitTorrent for ALICE is a known legitimate case. Followed up SPAM from Computer Security people. |
LXBSU1504: Policy violation detected | ||
08/03/2012 | Ticket Review + short e-group/ldap resolution discussion with Alex | Described in detail what we did for the LSF egroup. | ||
08/03/2012 | Attended section meeting. | https://indico.cern.ch/getFile.py/access?resId=minutes&materialId=minutes&confId=180534 |
||
08/03/2012 | batch | Attended meeting to discuss batch urgent stuff. | ||
09/03/2012 | Again, pointed out that BitTorrent for ALICE is a known legitimate case. Followed up SPAM from Computer Security people. |
LXBSU1345: Policy violation detected | ||
09/03/2012 | ITTF | IT Technical Forum about “Storage Strategy” | Announcements. Discussion with Tim Smith on Vidyo setup. |
|
09/03/2012 | Doing tickets for the support rota | Follow on INC111291: nfsnobody user and group id's | ||
12/03/2012 | Doing VM creation tickets for the support rota of the week before | RQF0073296: Test machine for LFC deployment; RQF0073305: VMs for Jira cluster; RQF0076469: VM for tendering document project. Had to deal with errors due to inconsistent configuration for useraccess in boinc cluster + zillions of "[ERROR] cannot release lock file: /var/lock/quattor/ncm-ncd". Also had to deal with PrepareInstall failing with SINDES error in jira cluster due to missing files. |
||
13/03/2012 | batch | Created usertest/batch namespace as well as 'stages/usertest/batch' template. | To provide a standard CDB area for batch tests. | |
13/03/2012 | Message Brokers | Three mails trying to find out how to report the problem in hwcollect seen by Lionel on the Message Brokers. | ||
13/03/2012 | batch | Ulrich kindly created an updated version of the script to get the RPM list for the glite release that uses a new URL. The script is in /afs/cern.ch/group/c3/tools/bin/CDB_create-glite-templates.new. |
He ran the script to generate prod/cluster/lxbatch/glite_3_2_5-1_glite-glexec_wn.tpl and prod/cluster/lxbatch/glite_3_2_12-1_glite-wn.tpl i.e. for the versions recommended in https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions |
|
14/03/2012 | batch | I made usertest/batch/cluster/lxbatch/glite-GLEXEC-wn-x86_64.tpl pointing to the baseline versions and modified profiles/profile_vm64slc5test.tpl as a "normal" batch worker in 'usertest/batch' |
It committed OK, however SPMA gave 27 dependency problems. |
|
14/03/2012 | ITTF | Sent reminder for Stefan Lueders session. | I would like to invite you to the IT Technical Forum about “Squaring the Circle: Reflections on Identities, Authentication & Authorization at CERN” | |
14/03/2012 | batch | Worked on BI-550: Upgrade batch to gLite-WN 3.2.5-1, until it was agreed to leave the worker node at 3.2.1 for now. |
Worked until the clash of glite and EMI RPMs in SWREP was confirmed. You can confirm the problem by doing something like the following and comparing the outputs. # rpm -q --requires -p http://glitesoft.cern.ch/EGEE/gLite/R3.2/glite-WN/sl5/x86_64/RPMS.updates/lcgdm-libs-1.8.2-3sec.sl5.x86_64.rpm # rpm -q --requires -p http://swrep/swrep/x86_64_slc5/lcgdm-libs-1.8.2-3sec.sl5.x86_64.rpm # rpm -q --requires -p http://swrep/swrep/x86_64_slc5/CGSI_gSOAP_2.7-1.3.4-2.sl5.x86_64.rpm # rpm -q --requires -p http://glitesoft.cern.ch/EGEE/gLite/R3.2/glite-WN/sl5/x86_64/RPMS.updates/CGSI_gSOAP_2.7-1.3.4-2.sl5.x86_64.rpm |
|
15/03/2012 | ITTF | Discussion with Tim Smith on Vidyo setup for ITTF. | Also installed Vidyo Android client. | |
15/03/2012 | batch | For BI-549: usertest/batch Quattor area for small-scale tests, across multiple host types | moved to usertest/batch 1 VM: vm64slc5test and 2 real machines: lxbst1338(vi_10_21) and lxbsu1523(e4_10_20). | |
15/03/2012 | batch | (BI-544) Upgrade batch to SLC5.8 once its available | Tried setting ELFMS_OSDATE in usertest/batch to the current latest, 20120309. Found errors in the commit that Steve sorted out. |
|
15/03/2012 | batch | Followed up on the 'Cannot kill hanging jobs' thread triggered by INC111042. |
Let me add something: in this case the jobs were really stuck, so kill -9 in the batch node would not do anything to them. In this case, according to the man page: If the job cannot be killed, use bkill -r to remove the job from the LSF system without waiting for the job to terminate, and free the resources of the job. So we would leave the stuck processes behind, which is not what we want. The trick to kill the processes, pointed out by Ricardo, is to try to attach to them with strace. I think that somehow the signals from strace (SIGTRAP and maybe SIGSTOP) allow the SIGKILL to be processed. If the problem recurs, we would rather need to understand better how the processes managed to get into this stuck state and how to release them. |
|
15/03/2012 | I realized that removing it-dep-fio-smod-alarm is NOT OK because it is included by all the egroups so that SMSs are sent to the people there which includes both CASTOR and Batch people. I mean things like atlas-operator-alarm@cern.ch. |
This works because the it-dep-fio-smod-alarm members are in the format expected by the SMS gateway. |
||
16/03/2012 | ITTF | IT Technical Forum session titled “Squaring the Circle: Reflections on Identities, Authentication & Authorization at CERN”. The speaker is Stefan Lueders. |
Announcement, Vidyo setup with Tim, etc. | |
16/03/2012 | Message Brokers | Opened INC114036 : /usr/bin/hwcollect crashes often in SLC6 | hwcollect crashes reported by Lionel. | |
19/03/2012 | Took day off. | Kid sick. | ||
19/03/2012 | batch | Answered a couple of support questions in the saga of Job 225473233: |
||
19/03/2012 | batch | Worked on (BI-544) Upgrade batch to SLC5.8 once its available | I tried to set ELFMS_OSDATE in usertest/batch to the current latest, 20120316. The commit worked but SPMA failed with "depcheck: package nfs-utils-lib 1.0.8-7.9.el5 needs nfs-utils >= 1.0.9-45". This is because nfs-utils and nfs-utils-lib are included by prod/os/x86_64_slc5/rpms/20120316/base.tpl but nfs-utils is deleted in prod/cluster/lxbatch/config.tpl. Either both nfs-utils and nfs-utils-lib should be deleted, or neither. |
|
20/03/2012 | batch | Prompted by Gavin, checked that LSF masters OK if we remove NFS RPMs from the batch workers. | I checked and the autofs RPM does not depend on the NFS ones, however you do need the NFS stuff to do the mounts when you use automount with NFS as we do on the masters: [root@lxmaster20 ~]# mount |grep nfs sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw) caenas01:/vol/LSF/batch/work on /usr/local/lsf/nfs/work type nfs (rw,fstype,nfs3,hard,addr=137.138.144.196) In any case we seem OK for the masters as I see that they do not include prod/cluster/lxbatch/config.tpl so they get the NFS rpms from the os rpms base template (prod/os/x86_64_slc5/rpms/20111111/base.tpl). |
|
20/03/2012 | batch | (BI-544) Upgrade batch to SLC5.8 once its available: test in usertest/castor | First applied the (NFS free) update to the machines in usertest/batch. |
|
20/03/2012 | batch | (BI-544) Upgrade batch to SLC5.8 once its available: announcement | Submitted announcement to IT Service Status Board in https://itssb.web.cern.ch/service-change/upgrade-system-rpms-lxbatch-nodes/20-03-2012 |
|
20/03/2012 | batch | (BI-544) Upgrade batch to SLC5.8 once its available: changed config of preprod nodes | <cdbop@cdbserv.cern.ch: ~/cdbfiles> commit [INFO] '/preprod/cluster/lxbatch/osdateversion': will be updated [INFO] '/prod/cluster/lxbatch/config': will be updated please confirm [yes]: Last comment: Remove WN update stuff Press [Enter] to confirm the last comment or enter a new one. Comment: Update ELFMS_OSDATE to "20120316" in lxbatch preprod [INFO] please wait... [INFO] commit OK |
|
20/03/2012 | batch | (BI-544) Upgrade batch to SLC5.8 once its available: deployed with spma after testing on a couple of nodes. | Did nc-client --cluster lxbatch --stage preprod --tag spma The spma went OK. |
|
20/03/2012 | batch | Discussed system level being deployed with Linux Supporters | > > We are deploying ELFMS_OSDATE "20120316" to preprod for batch. > Does this correspond to SLC5.8 ? Yes, it should: integrated release was prepared 15th of March, so 16th snapshot is at least as up to date as it (in reality it is little bit more up to date since packages for integrated 5.8 were gathered ~ 10th of March: but this shall not matter) Cheers Jarek |
|
20/03/2012 | ITTF | IT Technical Forum about “New Computing Centre” next Friday |
Sent announcement. | |
20/03/2012 | Message Brokers | Answered question from Lionel about sysctl -p giving the following error error: "net.bridge.bridge-nf-call-ip6tables" is an unknown key error: "net.bridge.bridge-nf-call-iptables" is an unknown key error: "net.bridge.bridge-nf-call-arptables" is an unknown key in the SLC6 machines in mb/agileinf. |
The problem is described in https://bugzilla.redhat.com/show_bug.cgi?id=512206 and https://bugzilla.redhat.com/show_bug.cgi?id=639821 To summarize: - These entries in /etc/sysctl.conf prevent bridged traffic getting pushed through the host's iptables rules. - The keys only become valid after "bridge.ko" is "insmoded". (Bridges seem to be instantiated by libvirt on demand). - The official solution is to use the '-e' option of sysctl to ignore errors about unknown keys. - If the machine does not need to bridge we could get rid of the entries in /etc/sysctl.conf. As a matter of fact /etc/sysctl.conf comes initially from initscripts-9.03.27-1.el6_2.1.x86_64. It is controlled by the sysctl NCM component, however the component just applies the keys defined in CDB to the current /etc/sysctl.conf, which is taken as a template. |
|
21/03/2012 | LFC | Solved INC113177: Problem on the machine lfclhcbrw02 | While the BDII was running fine (I fixed it not long ago), the lemon-agent was not working properly: [root@lfclhcbrw02 ~]# lemon-host-check --show-all [INFO] lemon-host-check version 1.3.6 started by root at Wed Mar 21 11:24:43 2012 on lfclhcbrw02.cern.ch [ERROR] The following required sensors are not running: [ERROR] gridlfc The gridlfc lemon sensor was crashing due to a wrong /etc/ld.so.conf that was preventing the LFC client commands from working. Fixed it. |
|
21/03/2012 | Message Brokers | Answered Lionel questions about CERN-CC-hardwaretools and abrt problems | Pointed out that the abrt problems in SLC6 are because dumps from unsigned RPMs are rejected unless you set 'OpenGPGCheck = no' in /etc/abrt/abrt.conf. We could set it in the mb/agileinf boxes and see how it goes. More details in https://bugzilla.redhat.com/show_bug.cgi?id=699152 |
|
21/03/2012 | batch | Contributed to (BI-562) Disable netlog on batch for 5.8 upgrade | I saw that 'services/netlog/config' is included for non SLC6 nodes from both prod/cluster/lxplus/config.tpl and prod/cluster/lxbatch/config.tpl so what Steve put into preprod/services/netlog/config.tpl should be OK for lxbatch as well. We have to note that for this to take effect we have to do an ncm-ncd run or at least run 'ncm_wrapper.sh filecopy'. As we did not do it yesterday for the lxbatch preprod nodes we should eventually do it. |
|
21/03/2012 | CASTOR | Chat with Luca from the CASTOR/EOS team about diskPoolDump features and tricks. | ||
22/03/2012 | Message Brokers | Read the instructions for Lionel's machinery to generate the config through RPMs. | In /afs/cern.ch/project/tom/mbcg/README | |
22/03/2012 | Attended IT-PES-PS section meeting. |
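
For the `sysctl -p` "unknown key" errors noted in the 20/03/2012 Message Brokers entry, here is a minimal sketch of how one might list the keys of a sysctl.conf that have no counterpart under /proc/sys (as happens for the net.bridge keys before bridge.ko is loaded). It assumes the usual mapping of the dots in a key to directory separators under /proc/sys. The function name `unknown_sysctl_keys` and the `proc_sys_root` parameter are made up for illustration and are not part of any existing tool.

```python
import os
import tempfile


def unknown_sysctl_keys(conf_text, proc_sys_root="/proc/sys"):
    """Return the keys in sysctl.conf-style text that have no entry under proc_sys_root."""
    missing = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments ('#' or ';' at the start of the line).
        if not line or line[0] in "#;":
            continue
        key = line.split("=", 1)[0].strip()
        # Assumption: a key such as net.bridge.x lives at <root>/net/bridge/x.
        path = os.path.join(proc_sys_root, *key.split("."))
        if not os.path.exists(path):
            missing.append(key)
    return missing


if __name__ == "__main__":
    # Demonstration against a fake /proc/sys tree, so it runs anywhere.
    with tempfile.TemporaryDirectory() as fake_root:
        os.makedirs(os.path.join(fake_root, "net", "ipv4"))
        open(os.path.join(fake_root, "net", "ipv4", "ip_forward"), "w").close()
        sample = (
            "# sample entries\n"
            "net.ipv4.ip_forward = 0\n"
            "net.bridge.bridge-nf-call-iptables = 0\n"
        )
        print(unknown_sysctl_keys(sample, fake_root))
```

Run against the real /etc/sysctl.conf with the default root, this should point at the same entries that `sysctl -e -p` skips without complaining.
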
Service | Description | Impact/Risk | ||
---|---|---|---|---|
ANY | O | ANY | End of the year... |
`# getent passwd |awk -F\: '{if ($3 < 1000 && $3 > 101) print $0}' |sort -t ':' -k 3 > /tmp/passwd.sort` on a batch box. I guess that for RedHat based tools, we are concerned by the ones <=500. They are 216.
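
The getent/awk filter above can be sketched in Python as follows. This is only an illustration: the function name `accounts_in_uid_range` is hypothetical, it takes passwd-format text (for example the output of `getent passwd`) instead of calling getent itself, and it sorts numerically by UID, whereas `sort -t ':' -k 3` without `-n` compares the fields as text.

```python
def accounts_in_uid_range(passwd_text, low=101, high=1000):
    """Return passwd lines with low < UID < high, sorted numerically by UID."""
    selected = []
    for line in passwd_text.splitlines():
        fields = line.split(":")
        # A passwd entry has the UID in the third field; skip malformed lines.
        if len(fields) < 3 or not fields[2].isdigit():
            continue
        uid = int(fields[2])
        if low < uid < high:
            selected.append((uid, line))
    selected.sort(key=lambda pair: pair[0])
    return [line for _, line in selected]


if __name__ == "__main__":
    # Invented sample data, not taken from a real batch box.
    sample = (
        "root:x:0:0:root:/root:/bin/bash\n"
        "svcb:x:250:100::/home/svcb:/bin/sh\n"
        "svca:x:102:100::/home/svca:/bin/sh\n"
    )
    for entry in accounts_in_uid_range(sample):
        print(entry)
```

Counting the accounts with UID <= 500 would then be `len(accounts_in_uid_range(text, 101, 501))`.
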
`/afs/cern.ch/user/l/lsfadmin/scripts/CloudFactory` errors in lxadm.

`tcpdump -vvv -X -w /tmp/tcpdump.ldap.out host xldap.cern.ch` when doing `getent group zp` in SLC5 and SLC6 and disabling nscd caching. This was to display that SLC6 is actually using the range protocol.

`lfc_getpath` method. I guess that this should eventually be reported to the LFC developers. On the other hand, googling around I did find the example called `lfc-getreplica-data-1.2.py` that uses `lfc_getpath`. I have tested that it works OK in my environment by doing `python ./lfc-getreplica-data-1.2.py srm-public.cern.ch --lfn`.
INC101988 'dpm package on some tier-0 nodes causing problems' we realized that the problems are due to clashes of the dpm packages required by the `emi-wn-1.0.0-0.sl5`. In particular the `dpm(-1.8.2-3sec.sl5)` RPM contains all the rf* commands clashing with (actually overwriting) the ones in `castor-rfio-client(-2.1.9-3)`. This seems to be what breaks CMS jobs. So Ulrich has produced `dummydpm-0.0.1-1` to fulfill the dependencies of `emi-wn-1.0.0-0.sl5`. I have reported the problem to linux.support, I have also fixed the template prod/cluster/lxbatch/emi_1_0_0-wn.tpl to use it instead of the dpm RPMs and I have deployed this to the lxbatch nodes with --stage preprod by doing SPMA, followed by `rpm -e castor-rfio-client-2.1.9-3` followed by another SPMA. This is required to have the right rf* commands from the `castor-rfio-client-2.1.9-3` rpm. I also answered the ticket and committed dummydpm in the batchinter SVN rep.
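
The clash described above (two RPMs shipping the same rf* commands) is the kind of thing one can check for by comparing package file lists. A minimal sketch, assuming the file lists have already been obtained (for example with `rpm -ql` on each package). The function name `find_file_clashes` and the paths in the example are hypothetical and not taken from the actual RPMs.

```python
from collections import defaultdict


def find_file_clashes(package_files):
    """Map each path shipped by more than one package to the sorted list of its owners.

    package_files: dict of package name -> iterable of file paths.
    """
    owners = defaultdict(set)
    for package, paths in package_files.items():
        for path in paths:
            owners[path].add(package)
    return {path: sorted(pkgs) for path, pkgs in owners.items() if len(pkgs) > 1}


if __name__ == "__main__":
    # Hypothetical file lists, only to show the shape of the input.
    example = {
        "dpm": ["/usr/bin/rfcp", "/usr/bin/rfdir", "/usr/bin/dpm-ping"],
        "castor-rfio-client": ["/usr/bin/rfcp", "/usr/bin/rfdir"],
    }
    for path, pkgs in sorted(find_file_clashes(example).items()):
        print(path, "is shipped by", ", ".join(pkgs))
```
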
[root@lxadm10 ~]# tcpdump -vvv -X -s 0 -w /tmp/tcpdump.ldap.s0.out host 137.138.240.49 or 137.138.142.25 or 137.138.144.149 or 137.138.145.178 or 137.138.145.182 or 137.138.240.48
Quantum in http://robhirschfeld.com/2012/02/08/quantum-network-virtualization-in-the-openstack-essex-release-2/

“Squaring the Circle: Reflections on Identities, Authentication & Authorization at CERN” and prepared Indico Event https://indico.cern.ch/conferenceDisplay.py?confId=177625

`/etc/ldap.conf` to comment out `#tls_checkpeer true` and replace `ssl start_tls` with `ssl no`. As a matter of fact the SLC6 configuration has TLS encryption disabled. It should use `ldaps:` instead of `ldap:` in the URI.
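
The /etc/ldap.conf change described above amounts to two line substitutions. A minimal sketch, assuming the file contains exactly the directives `tls_checkpeer true` and `ssl start_tls` quoted above. The function name `disable_start_tls` is hypothetical and the sample input is invented. It works on text only and leaves reading and writing the file to the caller.

```python
import re


def disable_start_tls(conf_text):
    """Comment out 'tls_checkpeer true' and replace 'ssl start_tls' with 'ssl no'."""
    out = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        if re.fullmatch(r"tls_checkpeer\s+true", stripped):
            # Keep the directive visible but inactive.
            out.append("#" + stripped)
        elif re.fullmatch(r"ssl\s+start_tls", stripped):
            out.append("ssl no")
        else:
            # Everything else, including existing comments, is left untouched.
            out.append(line)
    return "".join(item + "\n" for item in out)


if __name__ == "__main__":
    # Invented sample; a real ldap.conf has many more directives.
    sample = "uri ldap://xldap.cern.ch\nssl start_tls\ntls_checkpeer true\n"
    print(disable_start_tls(sample), end="")
```

Since this turns TLS off, it matches the remark above that the cleaner fix is an `ldaps:` URI.
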
`fileserver` VO boxes.
`/usr/bin/lsf_cron_auto_reconfig -C batch -V 7.0 -t -p 300` in lxmaster20.

`bugroup` is consistent with `/usr/lsf/etc/egroup` after the deployment.

`DiggiChristmas` limit that is set to 0 slots with `blimits -w -n DiggiChristmas` after the reconfiguration and informed Alessandro from ATLAS.

`sms set maintenance other 'DB upgrade' lfclhcbro01 lfclhcbro02 lfclhcbro03 lfclhcbrw01 lfclhcbrw02 lfclhcbrw03; wassh -l root lfclhcbro01,lfclhcbro02,lfclhcbro03,lfclhcbrw01,lfclhcbrw02,lfclhcbrw03 service lfcdaemon stop`. Verified that LFC logs OK. Realized that one node (lfclhcbrw02) has been out of production since the end of November. Reported it to the lfc-operations list.

`best_hosts = 3` to `best_hosts = 2`, restarted the LB server and notified Serguei.

`prod/site/cern_cc/rpms/addons/slc6/xrootd-clients.tpl`) while it is put in EPEL by the CERN xroot developers (Lukasz Janyst).