INSPIRE on the Agile Infrastructure
INSPIRE currently runs on VMs (hosted on http://openstack.cern.ch/) and on hardware machines. All machines are controlled via Puppet, whose web interface is hosted at http://judy.cern.ch/.
How to create a new VM
- Download the OpenStack RC file from https://openstack.cern.ch/dashboard/project/access_and_security/api_access/openrc/ (once you have selected the GS Inspire project in OpenStack) and upload it to your AFS home.
- $ ssh aiadm
- aiadm $ eval $(ai-rc "GS Inspire") # or "GS Inspire critical power" or --same-project-as inspireXX
- aiadm $ nova flavor-list
+----+------------+-----------+------+-----------+------+-------+-------------+-----------+
| ID | Name | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public |
+----+------------+-----------+------+-----------+------+-------+-------------+-----------+
| 1 | m1.tiny | 512 | 0 | 0 | | 1 | 1.0 | True |
| 2 | m1.small | 2048 | 20 | 0 | | 1 | 1.0 | True |
| 20 | hep2.1 | 2048 | 70 | 20 | | 1 | 1.0 | False |
| 21 | hep2.2 | 4096 | 70 | 40 | | 2 | 1.0 | False |
| 22 | hep2.4 | 8192 | 70 | 80 | | 4 | 1.0 | False |
| 23 | hep2.8 | 16000 | 70 | 160 | | 8 | 1.0 | False |
| 3 | m1.medium | 4096 | 40 | 0 | | 2 | 1.0 | True |
| 4 | m1.large | 8192 | 80 | 0 | | 4 | 1.0 | True |
| 50 | win.small | 2048 | 60 | 0 | | 1 | 1.0 | True |
| 51 | win.medium | 4096 | 80 | 0 | | 2 | 1.0 | True |
| 52 | win.large | 8192 | 120 | 0 | | 4 | 1.0 | True |
+----+------------+-----------+------+-----------+------+-------+-------------+-----------+
aiadm $ nova image-list
+--------------------------------------+-------------------------------------------+--------+--------+
| ID | Name | Status | Server |
+--------------------------------------+-------------------------------------------+--------+--------+
| 76aacd30-5a23-4df4-91d4-b4a96a9b7638 | SLC5 CERN Server - i386 [130920] | ACTIVE | |
| e3496dfa-11a7-496c-a634-107d3d10b22a | SLC5 CERN Server - i386 [2014-01-30] | ACTIVE | |
| 4e1c1875-3b9f-48fc-b43b-7233f450800b | SLC5 CERN Server - i386 [2014-08-05] | ACTIVE | |
| 8c234ca2-ec89-4e7a-9733-9a228c401571 | SLC5 CERN Server - x86_64 [130920] | ACTIVE | |
| 8ba9f996-4399-4dbb-93ee-98821d74f7a1 | SLC5 CERN Server - x86_64 [2014-01-30] | ACTIVE | |
| 63cc5d34-b892-4801-81bf-56c66ff38000 | SLC5 CERN Server - x86_64 [2014-08-05] | ACTIVE | |
| cd233204-96d2-41a4-ab2f-09d3b1954404 | SLC5 Server - i386 [130624] | ACTIVE | |
| a27962b7-e44e-4363-970b-fd4f8ec1eec5 | SLC5 Server - i386 [130920] | ACTIVE | |
| d1285114-9c39-467f-8d6b-487b10fbaf90 | SLC5 Server - i386 [2014-01-30] | ACTIVE | |
| e32bed58-b2b2-4a6d-b9ba-7e9db2e3e5a6 | SLC5 Server - i386 [2014-08-05] | ACTIVE | |
| 690be388-2e8e-4498-9c1f-7c4eac862260 | SLC5 Server - x86_64 [130624] | ACTIVE | |
| 41992b34-19e9-4ea9-ad30-177233795732 | SLC5 Server - x86_64 [130920] | ACTIVE | |
| 0d2c81c6-488d-42e6-8d30-8bcc5cdffa58 | SLC5 Server - x86_64 [2014-01-30] | ACTIVE | |
| ccb6749f-f740-4432-85d9-65e7857ed7c7 | SLC5 Server - x86_64 [2014-08-05] | ACTIVE | |
| 764434ef-47a9-4345-befb-2b0479a346c5 | SLC6 CERN Server - i386 [130920] | ACTIVE | |
| 4d9a71b8-92e4-446e-9939-21f3a7e99211 | SLC6 CERN Server - i686 [2014-01-30] | ACTIVE | |
| 5b957b5b-b220-426b-b217-eb50d9f472ad | SLC6 CERN Server - i686 [2014-08-05] | ACTIVE | |
| 2171bb6e-6404-44e9-8cbd-8c6f6bacce1c | SLC6 CERN Server - x86_64 [130920] | ACTIVE | |
| 98686db8-834d-4cf5-bfe3-4bc09513682a | SLC6 CERN Server - x86_64 [2014-01-30] | ACTIVE | |
| 13ec4721-3f9b-4480-a29e-ccfd897120d7 | SLC6 CERN Server - x86_64 [2014-08-05] | ACTIVE | |
| 49e166bb-68e1-4969-b26a-64023e87ef28 | SLC6 Server - i386 [130624] | ACTIVE | |
| eac5a399-d1c5-43a4-928f-3bbbba7f7cf7 | SLC6 Server - i386 [130920] | ACTIVE | |
| ab2fd0fa-ae7b-4a29-a9fa-57c5c5baf6da | SLC6 Server - i686 [2014-01-30] | ACTIVE | |
| e05c34f7-afcc-4c69-985e-6d9c75011723 | SLC6 Server - i686 [2014-08-05] | ACTIVE | |
| b8018173-fdfc-442c-9337-612fc702652a | SLC6 Server - x86_64 [130624] | ACTIVE | |
| 78deafa9-93a7-41d9-9afb-8c62e29e4259 | SLC6 Server - x86_64 [130920] | ACTIVE | |
| 321b8583-967f-4f56-913e-2a10e058ff37 | SLC6 Server - x86_64 [2014-01-30] | ACTIVE | |
| d1cb4dce-7a03-4342-a6c1-9677ecb8770d | SLC6 Server - x86_64 [2014-08-05] | ACTIVE | |
| 5514d635-22f8-4cc8-8550-4d831920a6d4 | Ubuntu 13.10 cloud image | ACTIVE | |
| 4717a8fa-6980-4b33-b27d-1526db467749 | Windows 7 - x64 [130924] | ACTIVE | |
| b51918ba-8bf7-421e-a1a6-cee78928cbc9 | Windows 7 - x64 [131213] | ACTIVE | |
| e9d9fe68-a977-470a-bb25-e18f7e7222ca | Windows 7 - x64 [2014-06-23] | ACTIVE | |
| dac59475-e195-4e5a-b962-f599d45c893f | Windows 8.1 - x64 [2014-04-17] (Pilot) | ACTIVE | |
| 091a87b6-5882-42cf-9de3-d049281b51e8 | Windows Server 2008 R2 - x64 [130904] | ACTIVE | |
| 6be8397d-264f-4804-a7a9-e83488f6ee9a | Windows Server 2008 R2 - x64 [140116] | ACTIVE | |
| 569370f9-c915-4a74-822b-46be7e3330c3 | Windows Server 2008 R2 - x64 [2014-04-17] | ACTIVE | |
| ea4179a9-cc5f-40ce-b700-92e1fee13a44 | Windows Server 2012 R2 - x64 [2014-01-29] | ACTIVE | |
| a5758c5d-8487-4835-a47b-535cd5a0d815 | Windows Server 2012 R2 - x64 [2014-04-17] | ACTIVE | |
+--------------------------------------+-------------------------------------------+--------+--------+
aiadm $ ai-bs-vm -i "SLC6 Server - x86_64 [2014-08-05]" --nova-flavor=hep2.8 --foreman-environment=inspire_devel -g "inspire/wn" --landb-responsible=inspire-admin --landb-mainuser=inspire-admin inspirevm123
Note that hep2.8 is currently the largest available flavor. The machine name is the last parameter.
Note: do not use SLC6 CERN Server images, as they conflict with Puppet.
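The creation command above can be wrapped in a small helper that rejects the SLC6 CERN Server images before anything is created. This is only a minimal sketch: the function names image_is_allowed and print_vm_command are hypothetical, and the helper prints the ai-bs-vm command instead of running it. All ai-bs-vm options are copied from the example above.

```shell
#!/bin/bash
# Sketch only: the function names and the dry-run behaviour are assumptions,
# not part of the AI tooling. The ai-bs-vm options come from the example above.

# Return 0 if the image may be used, 1 for SLC6 CERN Server images
# (which conflict with Puppet according to the note above).
image_is_allowed() {
    case "$1" in
        "SLC6 CERN Server"*) return 1 ;;
        *) return 0 ;;
    esac
}

# Print (do not run) the ai-bs-vm command for the given image, flavor and name.
print_vm_command() {
    local image="$1" flavor="$2" name="$3"
    if ! image_is_allowed "$image"; then
        echo "refusing image '$image': it conflicts with Puppet" >&2
        return 1
    fi
    printf 'ai-bs-vm -i "%s" --nova-flavor=%s --foreman-environment=inspire_devel -g "inspire/wn" --landb-responsible=inspire-admin --landb-mainuser=inspire-admin %s\n' \
        "$image" "$flavor" "$name"
}
```

For instance, print_vm_command "SLC6 Server - x86_64 [2014-08-05]" hep2.8 inspirevm123 prints the command shown above, which can then be reviewed and pasted on aiadm.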
How to properly upgrade a machine
Machines are normally updated automatically thanks to distro-sync:
http://information-technology.web.cern.ch/book/cern-configuration-management-system-user-guide/faq/upgrade-hostgroup-latest-existing-os
Sometimes, however, the system fails to update packages because yum is unable to upgrade the current kernel properly:
https://cern.service-now.com/service-portal/article.do?n=KB0001959&s=yum%20kernel
You can recognize this by the exception.YUM_error entry in the login message:
* ********************************************************************
* Welcome to p05153026581150.cern.ch, SLC, 6.5
* Archive of news is available in /etc/motd-archive
* Reminder: You have agreed to comply with the CERN computing rules
* http://cern.ch/ComputingRules
* Puppet environment: production
* Puppet hostgroup: inspire/wn
* Node alarmed with LAS? true
* Please set a host or hostgroup parameter 'comment' to describe your host or hostgroup.
* * WARNING, p05153026581150.cern.ch has lemon exceptions:
* exception.Operating_System
* exception.YUM_error
* ********************************************************************
The easiest fix is to remove all the old kernels, e.g.:
[p05153026485494] /afs/cern.ch/user/s/skaplun > rpm -qa | grep kernel
kernel-2.6.32-431.el6.x86_64
kernel-module-openafs-2.6.32-431.el6-1.6.5-cern1.2.slc6.x86_64
kernel-debug-2.6.32-358.23.2.el6.x86_64
yum-kernel-module-1-5.slc6.cern.noarch
kernel-2.6.32-431.1.2.el6.x86_64
kernel-module-openafs-2.6.32-431.1.2.el6-1.6.5-cern1.2.slc6.x86_64
kernel-debug-2.6.32-431.1.2.el6.x86_64
libreport-plugin-kerneloops-2.0.9-19.el6.x86_64
kernel-firmware-2.6.32-431.1.2.el6.noarch
abrt-addon-kerneloops-2.0.8-21.slc6.x86_64
dracut-kernel-004-336.el6_5.2.noarch
kernel-module-openafs-2.6.32-358.18.1.el6-1.6.5-cern1.2.slc6.x86_64
kernel-module-openafs-2.6.32-358.23.2.el6-1.6.5-cern1.2.slc6.x86_64
kernel-headers-2.6.32-431.1.2.el6.x86_64
kernel-2.6.32-358.18.1.el6.x86_64
[p05153026485494] /afs/cern.ch/user/s/skaplun > rpm -e kernel-2.6.32-358.18.1.el6.x86_64 kernel-module-openafs-2.6.32-358.23.2.el6-1.6.5-cern1.2.slc6.x86_64 kernel-module-openafs-2.6.32-358.18.1.el6-1.6.5-cern1.2.slc6.x86_64 kernel-debug-2.6.32-358.23.2.el6.x86_64
You can then proceed with:
sudo yum update
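To find removal candidates without reading the rpm -qa output by eye, a filter along these lines may help. It is a sketch under assumptions: the function name list_old_kernels is hypothetical, and it treats every kernel, kernel-debug and kernel-module-openafs package that does not match the running kernel version as "old". That rule is stricter than the manual example above, so review the list before passing anything to rpm -e.

```shell
#!/bin/bash
# Sketch only: list_old_kernels is a hypothetical helper, not an existing tool.
# It reads package names (one per line, as printed by "rpm -qa") on stdin and
# prints kernel-related packages that do not belong to the given kernel release.
list_old_kernels() {
    local running="$1"
    # Drop the trailing architecture, e.g. ".x86_64", so that the version also
    # matches kernel-module-openafs package names.
    local version="${running%.*}"
    grep -E '^kernel(-debug|-module-openafs)?-[0-9]' | grep -v -F -- "$version"
}

# Usage on a real machine (prints candidates only, removes nothing):
#   rpm -qa | list_old_kernels "$(uname -r)"
```

The helper deliberately stops at printing names, so the actual rpm -e step stays a manual decision.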
Resize disk of newly created machines
Follow the instructions at:
http://information-technology.web.cern.ch/book/cern-cloud-infrastructure-user-guide/administering-vms/resizing-disks
How to properly reboot a machine
If a machine needs to be rebooted, follow these steps:
- disable alarms via roger:
[aiadm045] /afs/cern.ch/user/s/skaplun > roger update p05153026485494 --all_alarms false
- log into the machine as root (e.g. from LXPlus) so that your user is not blocking AFS
- if the machine is running bibsched, run
bibsched halt
to halt bibsched cleanly, after which the running tasks should end.
- if the machine is a WN, disable it in haproxy with:
fab disable:inspire05
from your inspire-script folder.
- if the machine is running solr... too bad! (Note: inspire05 has /etc/rc.d/init.d/solr which takes care of shutdown and restart)
- if the machine is running redis... need to verify
- if the machine is running the MySQL master: amend all WNs to switch to read-only mode and make the master node become the slave. Disable slave replication on the slave. Disable bibsched.
- if the machine is running a MySQL slave: disable slave replication and amend all WNs to not point to the slave.
- shutdown -r +5
- check via the Foreman console that the machine rebooted properly
- restart affected services, in particular solr, using one of the following:
preferred method: via the init script
$ sudo /etc/rc.d/init.d/solr start|stop
or
$ sudo /sbin/service solr start|stop
alternatively, directly via the startup script in the install directory
$ screen -DR
screen $ cd /opt/cds-invenio/lib/apache-solr-3.1.0/example
screen $ sudo java -jar start.jar &
screen $ exit
Note: inspire05 has /etc/rc.d/init.d/solr which takes care of shutdown and restart:
$ chkconfig --list solr
solr 0:off 1:off 2:off 3:on 4:off 5:on 6:off
- reattach WNs to haproxy via:
fab enable:inspire05
- re-enable alarms:
aiadm $ roger update p05153026485494 --all_alarms true
- restart bibsched if necessary.
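For a plain worker node, the command part of the steps above can be collected into a dry-run checklist. This is a sketch, not an existing script: the function name print_reboot_plan is hypothetical, and it only prints the commands taken from the steps above. It does not cover bibsched, solr, redis or MySQL, which still need the manual handling described in the list.

```shell
#!/bin/bash
# Sketch only: print_reboot_plan is a hypothetical helper. It prints the
# commands from the steps above in order and executes nothing.
#   $1: machine name as known to roger (e.g. p05153026485494)
#   $2: node name as known to the fab scripts (e.g. inspire05)
print_reboot_plan() {
    local host="$1" node="$2"
    # roger runs on aiadm, fab runs from the inspire-script folder,
    # shutdown runs as root on the machine itself.
    cat <<EOF
roger update $host --all_alarms false
fab disable:$node
shutdown -r +5
fab enable:$node
roger update $host --all_alarms true
EOF
}
```

Calling print_reboot_plan p05153026485494 inspire05 gives a five-line checklist that can be followed, and ticked off, one command at a time.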
--
KaplunSamuele - 08 Apr 2014