CAE commercial software compatibility and requirements

A lengthy survey was carried out to check the requirements of microelectronics CAE tools in terms of compatibility with current operating systems, file systems and authentication. Among the tools in use, those of the major software vendors were inspected, including Cadence, Mentor Graphics and Synopsys. The following conclusions arose:
  • Supported Linux OS platforms are RHEL 6 x86_64 for newer releases of software and RHEL 5 i686/x86_64 for 2-3 year old ones.
  • Supported file systems are only those native to the Linux platform. This implies that NFS (the industry standard) is the only supported network file system. According to Cadence, the main vendor of software tools used at PH-ESE-ME, installing their software on other configurations is not recommended and leads to problems. The Cadence Client Technology Solutions group comments on infrastructure storage systems: "(...)We have seen customers face problems when trying to use a Google File System (GFS) or an Andrew File System (AFS). Generally we recommend using a dedicated Network File System (NFS) device.(...)". ClioSoft, which delivers project data management solutions, has strongly discouraged using anything other than NFS (email from Client Support): "ClioSoft SOS clients do need file locking and AFS does not support it. SOS would probably startup fine but it work not reliably in that environment. It will probably have issues with managing the data and status correctly. I would strongly advice you against it. Disabling Or Ignoring Locks on files will compromise the DM operations and data. That is why we dont have setting to support it. For Network File Systems - We support only NFS (v3 and v4). And Would not really recommend any other file system".
  • Commonly supported workload managers/grid solutions are SGE and LSF.
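The file locking requirement quoted above can be probed on a candidate mount point with a short script. The following is a minimal sketch, not a vendor-supplied test: the function name locking_enforced is hypothetical, and it only checks advisory POSIX locks between two processes on the same host. A positive result therefore does not prove that locks are honoured between different client machines, which is what a data management server ultimately needs.

```python
import fcntl
import os
import subprocess
import sys
import tempfile

# Code run in a second process: try to take the same exclusive lock.
CHILD = """
import fcntl, sys
with open(sys.argv[1], "r+b") as fh:
    try:
        fcntl.lockf(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        sys.exit(0)  # lock refused: the first lock is being enforced
    sys.exit(1)      # lock granted twice: locking is not enforced
"""


def locking_enforced(directory):
    """Return True if a second process is refused an exclusive lock
    on a scratch file created in `directory` (same-host check only)."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "r+b") as fh:
            try:
                fcntl.lockf(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except OSError:
                return False  # the file system rejected the lock request
            result = subprocess.run([sys.executable, "-c", CHILD, path])
            return result.returncode == 0
    finally:
        os.unlink(path)
```

Calling locking_enforced() on a directory of the file system under test reports whether the second lock attempt was refused. A cross-host variant would run the child part on another client of the same share.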

Pilot MIC project on AFS file system

A pilot project for hosting microelectronics CAE software tools, projects and user data was established in April under the following address: /afs/ . Content is protected by AFS ACL rights, group membership and the file hierarchy structure.
  • only members of groups owned by micmgr can access the file structure below, depending on project/PDK/software needs
  • and only from computers inside the CERN network
The following folder tree was created:
  • cad - to hold CAE software installations and Process Design Kit (PDK) files
  • prj - to hold user and group project data and ClioSoft based secondary cache server data
  • sim - to hold user simulation/temporary files
  • various - to hold all other files. Currently holds a portion of the Europractice installation files repository.
The essential software packages in use on the current cluster, 34 in total, were re-installed in the MIC pilot project space, reflecting versions from 2011, 2012 and the latest from 2013. At this stage of the project, the shell environment configurations needed to run the installed software with the required PDKs were prepared manually. Three virtual machines running SLC5 x86_64 (lnxmictestXX) were created and configured with additional OS packages as test benches. The ClioSoft data management server (micdmservice) was extended with a secondary cache daemon for tests of ClioSoft products over the AFS file system. A perpetual Kerberos/AFS ticket had to be supplied for that daemon.

Pilot MIC project - some conclusions

  • The use of Cadence Virtuoso and EDI products obliges all machines to keep several TCP ports open to enable IPC between programs running on different hosts. Three special daemons must run permanently on every machine to facilitate this IPC: clsbd, oaFSLockD and cdsNameServer. These daemons run under a special service user login (micdmsrv), for which a perpetual Kerberos/AFS ticket had to be supplied.
  • No solution has been found so far to make the Design Environment Configuration Management (DECM) tool by Cadence VCAD (the fundamental tool for creating and maintaining design project configurations) comply with AFS ACL based access, nor with a limited number of design project and category unix groups. According to Cadence, the tool was designed to work with the standard unix permission scheme, following the rule of a separate unix group and special user for every design project and category.
  • Several tests were done with the ClioSoft SOS suite over the AFS file system, trying out 3 types of project data management. Unfortunately, after some time of trials, corruption of SOS metadata was observed on the cache server.
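The requirement that several TCP ports stay open between hosts can be verified from any client with a small reachability check. The sketch below is illustrative only: the helper names are hypothetical, and the port numbers actually used by the Cadence daemons are not stated on this page, so they must be supplied by whoever runs the check.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


def unreachable_ports(host, ports, timeout=2.0):
    """Return the subset of `ports` that could not be reached on `host`."""
    return [p for p in ports if not port_open(host, p, timeout)]


# Example call (host name and port numbers are placeholders, not real values):
#   unreachable_ports("some-mic-host.example.org", [7000, 7001])
```

An empty list means that every listed port accepted a connection. A non-empty list points at the ports to review in the firewall settings or at daemons that are not running.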

MIC @ IT - proposal

Following the guidelines from CAE software vendors and keeping in mind the results of the tests done over the past months, PH-ESE-ME proposes the following requirements for installations on IT infrastructure:
  • several machines of class >4 cores, >32 GB RAM, SLC6 x86_64, to serve as:
    • a few NFS servers for: user project data (>4 TB), CAE software (>1 TB), local cluster user home directories (>2 TB), simulation/temporary files (>5 TB)
    • a server for local user/group/authentication and automounter maps (LDAP)
  • several machines of class 4, 8 or 16 cores, SLC5 x86_64, with progressively larger RAM up to 256 GB, to handle LSF based batch processing of Design Rule Check (DRC), Layout vs Schematic (LVS) and other CPU/RAM intensive design flow programs.

The cluster definitions should be built according to current IT standards, which involve Puppet based configurations for all server, batch and ~45 desktop client machines located in building 14. The following tasks are foreseen (provisional list):

  • deployment of authentication LDAP server
  • deployment of NFS v4 servers
  • deployment of TSM backup on all servers that need it
  • transfer of >62 special users and unix groups to LDAP/(KDC?) on new authentication server (human users should be authenticated by central IT account service)
  • redefinition of automounter maps
  • review of system administration scripts (add/remove users, special users, special groups)
  • deployment of calculation nodes with LSF queue definitions
  • reconfiguration of desktop client nodes to use the new infrastructure at IT (NIS->LDAP transition, changes in firewall settings etc.)
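As an illustration of the account transfer task, the sketch below converts one passwd-format line, as kept in NIS maps, into an LDIF entry using the RFC 2307 posixAccount attributes. It is a minimal sketch under stated assumptions: the function name, the ou=People container, the base DN and the numeric IDs in the example are all hypothetical, and password or Kerberos attributes are left out because the LDAP/KDC question is still open.

```python
def passwd_to_ldif(line, base_dn):
    """Convert one passwd-format line into an LDIF posixAccount entry.

    The 'ou=People' container and `base_dn` are assumptions; adapt them
    to the directory layout that is finally chosen.
    """
    name, _password, uid, gid, gecos, home, shell = line.strip().split(":")
    entry = [
        f"dn: uid={name},ou=People,{base_dn}",
        "objectClass: account",
        "objectClass: posixAccount",
        f"uid: {name}",
        f"cn: {gecos or name}",  # cn is mandatory; fall back to the login name
        f"uidNumber: {uid}",
        f"gidNumber: {gid}",
        f"homeDirectory: {home}",
        f"loginShell: {shell}",
    ]
    return "\n".join(entry) + "\n"


# Example input: the service login name comes from this page,
# the IDs, home directory and base DN are placeholders.
example = "micdmsrv:x:12345:2000:MIC DM service:/home/micdmsrv:/bin/bash"
ldif = passwd_to_ldif(example, "dc=example,dc=org")
```

The same pattern would apply to unix groups with the posixGroup object class. Existing migration scripts may be preferable for the real transfer; the point here is only to show which fields of the current maps end up in which directory attributes.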

Comments on the proposed configuration and tasks are welcome. The roadmap of changes has to be discussed between PH-ESE-ME and IT-PES, as many other IT groups may need to be involved as the tasks progress. The deadline for installation is 31 October 2013.

On the current MIC cluster hosted in building 14, priority is given to reliability of service, since more than 30 designers depend on it every day. Current downtime is <12h per year, caused by accidents, although the cluster was not designed as an HA architecture. The same or lower yearly downtime is expected with the new infrastructure based on IT installations.
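For discussion with IT, the downtime figure can be turned into an availability percentage. The short calculation below assumes a 365-day year and treats the quoted 12 hours as an upper bound, so the resulting availability is a lower bound; the function name is only illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, leap years ignored


def availability(downtime_hours, period_hours=HOURS_PER_YEAR):
    """Fraction of the period during which the service was available."""
    return 1.0 - downtime_hours / period_hours


print(f"{availability(12):.3%}")  # prints 99.863%
```

A yearly downtime below 12 hours therefore corresponds to an availability above roughly 99.86%, which is the level the new infrastructure is expected to match or exceed.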

Topic revision: r4 - 2013-07-15
