Difference: InstallSquid2 (1 vs. 57)

Revision 57 2016-11-29 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a frontier-squid2 cache server

Line: 6 to 6
  The frontier-squid2 software package is a patched version of the standard squid http proxy cache software, pre-configured for use by the Frontier distributed database caching system. This installation is recommended for use by Frontier in the LHC CMS & ATLAS projects, and also works well with the CernVM FileSystem. Many people use it for other applications as well; if you have any questions or comments about general use of this package contact frontier-talk@cern.ch.
Deleted:
<
<
If you have any problems with the software or installation, or would like to suggest an improvement to the documentation, please submit a support request to the Frontier Application Development JIRA.

For rapid response to configuration questions, send e-mail to cms-frontier-support@cern.ch or atlas-frontier-support@cern.ch. Most questions can also be answered on the user's mailing list frontier-talk@cern.ch.

 After completing a squid installation and configuration, CMS users should follow these further instructions for CMS squids. All WLCG users should register their squids with the WLCG.

Here is what is on this page:

Added:
>
>

Support

If you have any problems with the software or installation, or would like to suggest an improvement to the documentation, please submit a support request to the Frontier Application Development JIRA.

For rapid response to configuration questions, send e-mail to wlcg-squidmon-support@cern.ch. Most questions can also be answered on the user's mailing list frontier-talk@cern.ch.

 

Why use frontier-squid2 instead of regular squid?

The most important feature of frontier-squid2 is that it correctly supports the HTTP standard headers Last-Modified and If-Modified-Since better than other distributions of squid. The Frontier distributed database caching system, which is used by the LHC projects ATLAS and CMS, depends on proper working of this feature, so that is the main reason why that project maintains this squid distribution. Older versions of squid2 (including the one distributed with Red Hat EL 5) do not correctly support this feature, as documented in the infamous squid bug #7. Also, the frontier-squid2 package contains a couple of related patches that are not in any standard squid distribution. Details are in the beginning paragraph of the MyOwnSquid twiki page. Although this package expressly supports If-Modified-Since, it also works well with applications that do not require If-Modified-Since, including CVMFS. The collapsed_forwarding feature, which is important for the most common grid applications that use squid, is also missing from most versions of squid but is included in the frontier-squid2 package.

Line: 172 to 173
  The functionality and performance of your squid should be monitored from CERN using SNMP. The monitoring site is http://wlcg-squid-monitor.cern.ch/.
Changed:
<
<
To enable this, your site should open incoming firewall(s) to allow UDP requests to port 3401 from 128.142.0.0/16, 188.184.128.0/17, and 188.185.128.0/17. If you run multiple squid processes, each one will need to be separately monitored. They listen on increasing port numbers, the first one on port 3401, the second on 3402, etc. When that is ready, register the squid with WLCG to start the monitoring.
>
>
To enable this, your site should open incoming firewall(s) to allow UDP requests to port 3401 from 128.142.0.0/16, 188.184.128.0/17, and 188.185.128.0/17. When that is ready, register the squid with WLCG to start the monitoring. If you run multiple squid processes, each one will need to be separately monitored. They listen on increasing port numbers, the first one on port 3401, the second on 3402, etc. In order to monitor the extra ports, an exception has to be configured on the wlcg-squid-monitor.cern.ch machine, so please contact the squid support team to have that done.
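If your site manages the host firewall with iptables, a minimal sketch of that firewall opening might look like the following (this is an assumption about your setup; adapt it to your own firewall tooling, or open the port on a border firewall instead):

    # iptables -A INPUT -p udp --dport 3401 -s 128.142.0.0/16 -j ACCEPT
    # iptables -A INPUT -p udp --dport 3401 -s 188.184.128.0/17 -j ACCEPT
    # iptables -A INPUT -p udp --dport 3401 -s 188.185.128.0/17 -j ACCEPT

If you run multiple squid processes, repeat the rules for ports 3402, 3403, and so on, and make the rules persistent with whatever mechanism your distribution provides.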
  When running both frontier-squid2 and frontier-squid on the same computer, one of them will need to change the monitoring port, for example with the following in /etc/squid2/customize.sh:
    setoption("snmp_port", "4401")

Revision 56 2016-11-08 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a frontier-squid2 cache server

Line: 63 to 63
 

Software

Changed:
<
<
The instructions below are for the frontier-squid2 rpm version >= 2.7STABLE9-23.1 on a Scientific Linux version 5, 6 or 7 based system. The rpm is based on the frontier-squid2 source tarball; there isn't documentation for installing it, but the tarball is available and the instructions are very similar to the instructions for installing directly from the frontier-squid tarball. Please see the rpm Release Notes for details on what has changed in recent versions. If, for some reason, you prefer to use a non frontier-squid or frontier-squid2 distribution of squid, see MyOwnSquid.
>
>
The instructions below are for the frontier-squid2 rpm version >= 2.7STABLE9-23.1 on a Red Hat Enterprise Linux (RHEL) version 5, 6 or 7 based system. The rpm is based on the frontier-squid2 source tarball; there isn't documentation for installing it, but the tarball is available and the instructions are very similar to the instructions for installing directly from the frontier-squid tarball. Please see the rpm Release Notes for details on what has changed in recent versions. If, for some reason, you prefer to use a non frontier-squid or frontier-squid2 distribution of squid, see MyOwnSquid.
 

Puppet

Revision 55 2016-11-08 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Changed:
<
<

Installing a Frontier squid cache server

>
>

Installing a frontier-squid2 cache server

 
Changed:
<
<
The frontier-squid software package is a patched version of the standard squid http proxy cache software, pre-configured for use by the Frontier distributed database caching system. This installation is recommended for use by Frontier in the LHC CMS & ATLAS projects, and also works well with the CernVM FileSystem. Many people also use it for other applications as well; if you have any questions or comments about general use of this package contact frontier-talk@cern.ch.
>
>
NOTE: these are instructions for installing a frontier-squid2 package containing the former version of squid used for many years by the WLCG. Instructions to install the current version based on squid-3 are on the InstallSquid page. The frontier-squid2 package can run on the same computer as frontier-squid, as long as it is configured to use different ports. All of the paths in this package are similar to the paths in the frontier-squid package except they all have a '2' suffix; for example, /etc/squid2, /var/log/squid2, and /usr/sbin/squid2.
 
Changed:
<
<
Note to Open Science Grid users: this same package is also available from the Open Science Grid so it will probably be more convenient to you to follow the OSG frontier-squid installation instructions.

Note to users of EGI's UMD repository: the same package is also available in UMD so it might be easier for you to get it from there.

>
>
The frontier-squid2 software package is a patched version of the standard squid http proxy cache software, pre-configured for use by the Frontier distributed database caching system. This installation is recommended for use by Frontier in the LHC CMS & ATLAS projects, and also works well with the CernVM FileSystem. Many people also use it for other applications as well; if you have any questions or comments about general use of this package contact frontier-talk@cern.ch.
  If you have any problems with the software or installation, or would like to suggest an improvement to the documentation, please submit a support request to the Frontier Application Development JIRA.
Line: 18 to 16
 
Changed:
<
<

Why use frontier-squid instead of regular squid?

>
>

Why use frontier-squid2 instead of regular squid?

 
Changed:
<
<
The most important feature of frontier-squid is that it correctly supports the HTTP standard headers Last-Modified and If-Modified-Since better than other distributions of squid. The Frontier distributed database caching system, which is used by the LHC projects ATLAS and CMS, depends on proper working of this feature, so that is the main reason why that project maintains this squid distribution. Older versions of squid2 (including the one distributed with Red Hat EL 5) and all versions of squid3 (including the one on Red Hat EL 6) prior to squid3.5 (which is now in pre-release) do not correctly support this feature, as documented in the infamous squid bug #7. Also, the frontier-squid package contains a couple of related patches that are not in any standard squid distribution. Details are in the beginning paragraph of the MyOwnSquid twiki page. Although this package expressly supports If-Modified-Since, it also works well with applications that do not require If-Modified-Since including CVMFS. The collapsed_forwarding feature is also missing from most versions of squid, and it is important for the most common grid applications that use squid and is included in the frontier-squid package.
>
>
The most important feature of frontier-squid2 is that it correctly supports the HTTP standard headers Last-Modified and If-Modified-Since better than other distributions of squid. The Frontier distributed database caching system, which is used by the LHC projects ATLAS and CMS, depends on proper working of this feature, so that is the main reason why that project maintains this squid distribution. Older versions of squid2 (including the one distributed with Red Hat EL 5) do not correctly support this feature, as documented in the infamous squid bug #7. Also, the frontier-squid2 package contains a couple of related patches that are not in any standard squid distribution. Details are in the beginning paragraph of the MyOwnSquid twiki page. Although this package expressly supports If-Modified-Since, it also works well with applications that do not require If-Modified-Since including CVMFS. The collapsed_forwarding feature is also missing from most versions of squid, and it is important for the most common grid applications that use squid and is included in the frontier-squid2 package.
In addition, the package has several other features:
  1. A configuration file generator, so configuration customizations can be preserved across package upgrades even when the complicated standard configuration file changes.
Changed:
<
<
  2. The ability to easily run multiple squid processes listening on the same port, in order to support more networking throughput than can be handled by a single CPU core (squid2 is single-threaded).
>
>
  2. The ability to easily run multiple squid processes listening on the same port, in order to support more networking throughput than can be handled by a single CPU core (squid2 is single-threaded and has no concept of multiple workers like squid3).
 
  3. Automatic cleanup of the old cache files in the background when starting squid, to avoid problems with cache corruption.
  4. Default access control lists to permit remote performance monitoring from shared WLCG squid monitoring servers at CERN.
  5. The default log format is more human readable and includes contents of client-identifying headers.
Line: 49 to 47
  3) What network specs?
Changed:
<
<
The latencies will be lower to the worker nodes if you have a large bandwidth. The network is almost always the bottleneck for this system, so at least a gigabit each is highly recommended. If you have many job slots, 2 bonded gigabit network connections is even better, and squid on one core of a modern CPU can pretty much keep up with 2 gigabits. Squid is single-threaded so if you're able to supply more than 2 gigabits, multiple squid processes on the same machine need to be used to serve the full throughput. This is supported in the frontier-squid package (instructions below) but each squid needs its own memory and disk space.
>
>
Latencies to the worker nodes will be lower if you have large bandwidth. The network is almost always the bottleneck for this system, so at least a gigabit each is highly recommended. If you have many job slots, 2 bonded gigabit network connections are even better, and squid on one core of a modern CPU can pretty much keep up with 2 gigabits. Squid is single-threaded, so if you're able to supply more than 2 gigabits, multiple squid processes on the same machine need to be used to serve the full throughput. This is supported in the frontier-squid2 package (instructions below), but each squid needs its own memory and disk space.
  4) How many squids do I need?
Line: 65 to 63
 

Software

Changed:
<
<
The instructions below are for the frontier-squid rpm version >= 2.7STABLE9-23.1 on a Scientific Linux version 5 or 6 based system. The rpm is based on the frontier-squid source tarball, and there are also instructions for installing directly from the frontier-squid tarball available. Please see the tarball Release Notes and rpm Release Notes for details on what has changed in recent versions. If, for some reason, you prefer to use a non frontier-squid distribution of squid, see MyOwnSquid.
>
>
The instructions below are for the frontier-squid2 rpm version >= 2.7STABLE9-23.1 on a Scientific Linux version 5, 6 or 7 based system. The rpm is based on the frontier-squid2 source tarball; there isn't documentation for installing it, but the tarball is available and the instructions are very similar to the instructions for installing directly from the frontier-squid tarball. Please see the rpm Release Notes for details on what has changed in recent versions. If, for some reason, you prefer to use a non frontier-squid or frontier-squid2 distribution of squid, see MyOwnSquid.
 

Puppet

Changed:
<
<
A puppet module for configuring frontier-squid is available on puppet-forge which understands a lot of the following instructions. If you're using puppet, check there first.
>
>
A puppet module for configuring frontier-squid is available on puppet-forge; it implements many of the instructions below. If you're using puppet, check there first. Note that the puppet module is for frontier-squid, so you would have to adapt it for frontier-squid2.
 

Preparation

Changed:
<
<
By default the frontier-squid rpm installs files with a "squid" user id and group. If they do not exist, the rpm will create them. If your system has its own means of creating logins you should create the login and group before installing the rpm. If you want the squid process to use a different user id (historically it has been "dbfrontier"), then for example before installing the rpm create the file /etc/squid/squidconf with the following contents:
>
>
By default the frontier-squid2 rpm installs files with a "squid" user id and group. If they do not exist, the rpm will create them. If your system has its own means of creating logins, you should create the login and group before installing the rpm. If you want the squid process to use a different user id (historically it has been "dbfrontier"), then before installing the rpm create the file /etc/squid2/squidconf with, for example, the following contents:
 
    export FRONTIER_USER=dbfrontier
    export FRONTIER_GROUP=dbfrontier
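If your site creates accounts itself, a minimal sketch of creating the account beforehand (assuming the "dbfrontier" name from the example above and the standard groupadd/useradd tools) is:

    # groupadd dbfrontier
    # useradd -g dbfrontier -s /sbin/nologin dbfrontier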
Line: 91 to 89
 

Next, install the package with the following command:

Changed:
<
<
    # yum install frontier-squid

>
>
    # yum install frontier-squid2

 

Set it up to start at boot time with this command:

Changed:
<
<
    # chkconfig frontier-squid on

>
>
    # chkconfig frontier-squid2 on

 

Configuration

Changed:
<
<
Custom configuration is done in /etc/squid/customize.sh. That script invokes functions that edit a supplied default squid.conf source file to generate the final squid.conf that squid sees when it runs. Comments in the default installation of customize.sh give more details on what can be done with it. Whenever /etc/init.d/frontier-squid runs it generates a new squid.conf if customize.sh has been modified.
>
>
Custom configuration is done in /etc/squid2/customize.sh. That script invokes functions that edit a supplied default squid.conf source file to generate the final squid.conf that squid sees when it runs. Comments in the default installation of customize.sh give more details on what can be done with it. Whenever /etc/init.d/frontier-squid2 runs it generates a new squid.conf if customize.sh has been modified.
  It is very important for security that squid not be allowed to proxy requests from everywhere to everywhere. The default customize.sh allows incoming connections only from standard private network addresses and allows outgoing connections to anywhere. If the machines that will be using squid are not on a private network, change customize.sh to include the network/maskbits for your network. For example:
    setoption("acl NET_LOCAL src", "131.154.0.0/16")

Line: 115 to 113
 

Now that the configuration is set up, start squid with this command:

Changed:
<
<
    # service frontier-squid start

>
>
    # service frontier-squid2 start

 

To have a change to customize.sh take effect while squid is running, run the following command:

Changed:
<
<
    # service frontier-squid reload

>
>
    # service frontier-squid2 reload

 

Moving disk cache and logs to a non-standard location

Changed:
<
<
Often the filesystems containing the default locations for the disk cache ( /var/cache/squid) and logs ( /var/log/squid) isn't large enough and there's more space available in another filesystem. To move them to a new location, simply change the directories into symbolic links to the new locations while the service is stopped. Make sure the new directories are created and writable by the user id that squid is running under. For example if /data is a separate filesystem:
    # service frontier-squid stop
    # mv /var/log/squid /data/squid_logs
    # ln -s /data/squid_logs /var/log/squid
    # rm -rf /var/cache/squid/*
    # mv /var/cache/squid /data/squid_cache
    # ln -s /data/squid_cache /var/cache/squid
    # service frontier-squid start

>
>
Often the filesystem containing the default locations for the disk cache ( /var/cache/squid2) and logs ( /var/log/squid2) isn't large enough and there's more space available in another filesystem. To move them to a new location, simply change the directories into symbolic links to the new locations while the service is stopped. Make sure the new directories are created and writable by the user id that squid is running under. For example, if /data is a separate filesystem:
    # service frontier-squid2 stop
    # mv /var/log/squid2 /data/squid_logs2
    # ln -s /data/squid_logs2 /var/log/squid2
    # rm -rf /var/cache/squid2/*
    # mv /var/cache/squid2 /data/squid_cache2
    # ln -s /data/squid_cache2 /var/cache/squid2
    # service frontier-squid2 start

 
Changed:
<
<
Alternatively, instead of creating symbolic links you can set the cache_log and coredump_dir options, the second parameter of the cache_dir option, and the first parameter of the access_log option in /etc/squid/customize.sh. For example:
    setoption("cache_log", "/data/squid_logs/cache.log")
    setoption("coredump_dir", "/data/squid_cache")
    setoptionparameter("cache_dir", 2, "/data/squid_cache")
    setoptionparameter("access_log", 1, "daemon:/data/squid_logs/access.log")

>
>
Alternatively, instead of creating symbolic links you can set the cache_log and coredump_dir options, the second parameter of the cache_dir option, and the first parameter of the access_log option in /etc/squid2/customize.sh. For example:
    setoption("cache_log", "/data/squid_logs2/cache.log")
    setoption("coredump_dir", "/data/squid_cache2")
    setoptionparameter("cache_dir", 2, "/data/squid_cache2")
    setoptionparameter("access_log", 1, "daemon:/data/squid_logs2/access.log")

 

It's recommended to use the "daemon:" prefix on the access_log path because that causes squid to use a separate process for writing to logs, so the main process doesn't have to wait for the disk. It is on by default for those who don't set the access_log path.

Changing the size of log files retained

Changed:
<
<
The access.log is rotated each night, and also if it is over a given size (default 5 GB) when it checks each hour. You can change that value by exporting the environment variable SQUID_MAX_ACCESS_LOG in /etc/sysconfig/frontier-squid to a different number of bytes. You can also append M for megabytes or G for gigabytes. For example for 20 gigabytes each you can use:
>
>
The access.log is rotated each night, and also whenever an hourly check finds it over a given size (default 5 GB). You can change that value by exporting the environment variable SQUID_MAX_ACCESS_LOG in /etc/sysconfig/frontier-squid2 to a different number of bytes. You can also append M for megabytes or G for gigabytes. For example, for 20 gigabytes each you can use:
 
    export SQUID_MAX_ACCESS_LOG=20G
Changed:
<
<
By default, frontier-squid compresses log files when they are rotated, and saves up to 9 access.log.N.gz files where N goes from 1 to 9. In order to estimate disk usage, note that the rotated files are typically compressed to a bit under 15% of their original size, and that the uncompressed size can go a bit above $SQUID_MAX_ACCESS_LOG because the cron job only checks four times per hour. For example, for SQUID_MAX_ACCESS_LOG=20G the maximum size will be a bit above 20GB plus 9 times 3GB, so allow 50GB to be safe.
>
>
By default, frontier-squid2 compresses log files when they are rotated, and saves up to 9 access.log.N.gz files where N goes from 1 to 9. In order to estimate disk usage, note that the rotated files are typically compressed to a bit under 15% of their original size, and that the uncompressed size can go a bit above $SQUID_MAX_ACCESS_LOG because the cron job only checks four times per hour. For example, for SQUID_MAX_ACCESS_LOG=20G the maximum size will be a bit above 20GB plus 9 times 3GB, so allow 50GB to be safe.
  If frontier-awstats is installed (typically only on central servers), an additional uncompressed copy is also saved in access.log.0.
Changed:
<
<
An alternative to setting the maximum size of each log file, you can leave each log file at the default size and change the number of log files retained, for example for 50 files (about 6GB total space) set the following in /etc/squid/customize.sh:
>
>
As an alternative to setting the maximum size of each log file, you can leave each log file at the default size and change the number of log files retained; for example, for 50 files (about 6GB total space) set the following in /etc/squid2/customize.sh:
 
    setoption("logfile_rotate", "50")
Changed:
<
<
It is highly recommended to keep at least 3 days worth of logs, so that problems that happen on a weekend can be investigated during working hours. If you really do not have enough disk space for logs, the log can be disabled with the following in /etc/squid/customize.sh:
>
>
It is highly recommended to keep at least 3 days' worth of logs, so that problems that happen on a weekend can be investigated during working hours. If you really do not have enough disk space for logs, the log can be disabled with the following in /etc/squid2/customize.sh:
 
    setoption("access_log", "none")
Changed:
<
<
Then after doing service frontier-squid reload (or service frontier-squid start if squid was stopped) remember to remove all the old access.log* files.
>
>
Then after doing service frontier-squid2 reload (or service frontier-squid2 start if squid was stopped) remember to remove all the old access.log* files.
 
Changed:
<
<
On the other hand, the compression of large rotated logs can take a considerably long time to process, so if you have plenty of disk space and don't want to have the additional disk I/O and cpu resources taken during rotation, you can disable rotate compression by putting the following in /etc/sysconfig/frontier-squid:
>
>
On the other hand, the compression of large rotated logs can take a considerable amount of time, so if you have plenty of disk space and don't want the additional disk I/O and CPU load during rotation, you can disable rotate compression by putting the following in /etc/sysconfig/frontier-squid2:
 
    export SQUID_COMPRESS_LOGS=false
That uses the old method of telling squid to do the rotation, which keeps access.log.N where N goes from 0 to 9, for a total of 11 files including access.log. When compression is turned off, the default SQUID_MAX_ACCESS_LOG is reduced from 5GB to 1GB, so override that to set your desired size. When converting between compressed and uncompressed format, all the files of the old format are automatically deleted the first time the logs are rotated.
Line: 176 to 174
  To enable this, your site should open incoming firewall(s) to allow UDP requests to port 3401 from 128.142.0.0/16, 188.184.128.0/17, and 188.185.128.0/17. If you run multiple squid processes, each one will need to be separately monitored. They listen on increasing port numbers, the first one on port 3401, the second on 3402, etc. When that is ready, register the squid with WLCG to start the monitoring.
Added:
>
>
When running both frontier-squid2 and frontier-squid on the same computer, one of them will need to change the monitoring port, for example with the following in /etc/squid2/customize.sh:
    setoption("snmp_port", "4401")
 Note: some sites are tempted to not allow requests from the whole range of IP addresses listed above, but we do not recommend that because the monitoring IP addresses can and will change without warning. Opening the whole CERN range of addresses has been cleared by security experts on the OSG and CMS security teams, because the information that can be collected is not sensitive information. If your site security experts still won't allow it, the next best thing you can do is to allow the aliases wlcgsquidmon1.cern.ch and wlcgsquidmon2.cern.ch. Most firewalls do not automatically refresh DNS entries, so you will also have to be willing to do that manually whenever the values of the aliases change.
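Before asking for the firewall change, you can check locally whether the SNMP module is answering. A quick sketch, assuming net-snmp's snmpwalk is installed, that the generated squid.conf keeps the default "public" SNMP community, and that the access lists allow queries from localhost (1.3.6.1.4.1.3495 is the squid MIB):

    $ snmpwalk -v2c -c public localhost:3401 1.3.6.1.4.1.3495.1.1

If squid is running and SNMP is enabled, this should print a few counters rather than timing out.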

Testing the installation

Line: 229 to 231
 
    $ export http_proxy=http://yoursquid.your.domain:3128
Changed:
<
<
and perform the fnget.py test twice again. It should pass through your squid, and cache the response. To confirm that it worked, look at the squid access log (in /var/log/squid/access.log if you haven't moved it). The following is an excerpt:
>
>
and run the fnget.py test twice more. It should pass through your squid, and the response should be cached. To confirm that it worked, look at the squid access log (in /var/log/squid2/access.log if you haven't moved it). The following is an excerpt:
 
    128.220.233.179 - - [22/Jan/2013:08:33:17 +0000] "GET http://cmsfrontier.cern.ch:8000/FrontierProd/Frontier?type=frontier_request:1:DEFAULT&encoding=BLOBzip&p1=eNorTs1JTS5RMFRIK8rPVUgpTcwBAD0rBmw_ HTTP/1.0" 200 810 TCP_MISS:DIRECT 461 "fnget.py 1.5" "-" "Python-urllib/2.6"
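To see whether the repeated request was served from the cache, look at the result code near the end of the last couple of access.log entries; the second request should normally show a hit code such as TCP_HIT or TCP_MEM_HIT instead of TCP_MISS. A quick check, assuming the default log location:

    # tail -2 /var/log/squid2/access.log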
Line: 240 to 242
 

Log file contents

Changed:
<
<
Error messages are written to cache.log (in /var/log/squid if you haven't moved it) and are generally either self-explanatory or an explanation can be found with google.
>
>
Error messages are written to cache.log (in /var/log/squid2 if you haven't moved it) and are generally either self-explanatory or an explanation can be found with Google.
 
Changed:
<
<
Logs of every access are written to access.log (also in /var/log/squid if you haven't moved it) and the default frontier-squid format contains these fields:
>
>
Logs of every access are written to access.log (also in /var/log/squid2 if you haven't moved it) and the default frontier-squid2 format contains these fields:
 
  1. Source IP address
  2. User name from ident if any (usually just a dash)
  3. User name from SSL if any (usually just a dash)
Line: 265 to 267
  takes care of this problem.
Changed:
<
<
  • If squid has difficulty creating cache directories on RHEL 6, like for example:
    # service frontier-squid start
    
    
>
>
  • If squid has difficulty creating cache directories on RHEL 6 or RHEL 7, for example:
    # service frontier-squid2 start
    
    
 
Changed:
<
<
Generating /etc/squid/squid.conf
>
>
Generating /etc/squid2/squid.conf
  Initializing Cache... 2014/02/21 14:43:53| Creating Swap Directories
Changed:
<
<
FATAL: Failed to make swap directory /var/cache/squid/00: (13) Permission denied
>
>
FATAL: Failed to make swap directory /var/cache/squid2/00: (13) Permission denied
  ... Starting 1 Frontier Squid... Frontier Squid start failed!!!
Then, if SELinux is enabled and you want to leave it on, try the following command:

Changed:
<
<
# restorecon -R /var/cache
>
>
# restorecon -R /var/cache/squid2
 
Changed:
<
<
And start frontier-squid again.
>
>
And start frontier-squid2 again.
 

Inability to reach full network throughput

Changed:
<
<
If you have a CPU that can't quite keep up with full network throughput, we have found that up to an extra 15% throughput can be achieved by binding the single-threaded squid process to a single core, to maximize use of the per-core on-chip caches. This is not enabled by default, but you can enable it by putting the following in /etc/sysconfig/frontier-squid:
>
>
If you have a CPU that can't quite keep up with full network throughput, we have found that up to an extra 15% throughput can be achieved by binding the single-threaded squid process to a single core, to maximize use of the per-core on-chip caches. This is not enabled by default, but you can enable it by putting the following in /etc/sysconfig/frontier-squid2:
 
    export SETSQUIDAFFINITY=true
Line: 297 to 299
 
  1. Make sure there's a "daemon:" prefix on the access_log if you have changed its value.
  2. Reduce the max log size before compression and increase the number of log files retained, to decrease the length of time of each log compression.
  3. Disable compression if you have the space.
Changed:
<
<
  4. As root run ionice -c1 -p PID for the pid listed in squid.pid (default /var/run/squid/squid.pid) for each squid process run. This raises their I/O priority above ordinary filesystem operations.
>
>
  4. As root run ionice -c1 -p PID for the pid listed in squid.pid (default /var/run/squid2/squid.pid) for each squid process run (see the sketch after this list). This raises their I/O priority above ordinary filesystem operations.
 
  5. Disable the access log completely.
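A minimal sketch of the ionice step above, assuming a single squid process and the default pid file location:

    # ionice -c1 -p "$(cat /var/run/squid2/squid.pid)"

With multiple squids, repeat for the pid of each process (their pid files are under the corresponding /var/run/squid2 subdirectories). The setting applies only to the running process, so it has to be redone after a restart.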

Running out of file descriptors

Changed:
<
<
By default, frontier-squid makes sure that there are at least 4096 file descriptors available for squid, which is usually enough. However, under some situations where there are very many clients it might not be enough. When this happens, a message like this shows up in cache.log:
>
>
By default, frontier-squid2 makes sure that there are at least 4096 file descriptors available for squid, which is usually enough. However, in some situations with very many clients it might not be enough. When this happens, a message like this shows up in cache.log:
 
    WARNING! Your cache is running out of filedescriptors

There are two ways to increase the limit:

Changed:
<
<
  1. Add a line such as ulimit -n 16384 in /etc/sysconfig/frontier-squid.
>
>
  1. Add a line such as ulimit -n 16384 in /etc/sysconfig/frontier-squid2.
 
  2. Set the nofile parameter in /etc/security/limits.conf or a file in /etc/security/limits.d. For example use a line like this to apply to all accounts:
    * - nofile 16384
    
    or replace the '*' with the squid user name if you prefer.
Line: 349 to 351
 If you have either a particularly slow machine or a high amount of bandwidth available, you may not be able to get full network throughput out of a single squid process. For example, our measurements with a 10 gigabit interface on a 2010-era machine with 8 cores at 2.27Ghz showed that 3 squids were required for full throughput.

Multiple squids can be enabled very simply by doing these steps (a concrete sketch follows the list):

Changed:
<
<
  • Stop frontier-squid and remove the old cache and logs
>
>
  • Stop frontier-squid2 and remove the old cache and logs
 
  • Create subdirectories under your cache directory called 'squid0', 'squid1', up to 'squidN-1' for N squids, making sure they are writable by the user id that your squid runs under
Changed:
<
<
  • Start frontier-squid again. This will automatically detect the extra subdirectories and start that number of squid processes. It will create corresponding log subdirectories and /var/run/squid subdirectories, and generate a separate squid configuration file for each process in /etc/squid/.squid-N.conf. It will also assign each squid process to a particular core as described above.
>
>
  • Start frontier-squid2 again. This will automatically detect the extra subdirectories and start that number of squid processes. It will create corresponding log subdirectories and /var/run/squid2 subdirectories, and generate a separate squid configuration file for each process in /etc/squid2/.squid-N.conf. It will also assign each squid process to a particular core as described above.
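For example, for three squids the steps above could look like the following sketch, assuming the default cache and log locations and the default "squid" user:

    # service frontier-squid2 stop
    # rm -rf /var/cache/squid2/* /var/log/squid2/*
    # mkdir /var/cache/squid2/squid0 /var/cache/squid2/squid1 /var/cache/squid2/squid2
    # chown squid:squid /var/cache/squid2/squid0 /var/cache/squid2/squid1 /var/cache/squid2/squid2
    # service frontier-squid2 start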
 When running multiple squids, all of the memory & disk usage is multiplied by the number of squids. For example, if you choose a cache_dir size of 100GB, running 3 squids will require 300GB for cache space. All the squids listen on the same port and take turns handling requests. Only squid0 will contact the upstream servers; the others forward requests to squid0 (this can be changed, see the next section).
Changed:
<
<
If you want to revert to a single squid, reverse the above process including cleaning up the corresponding log directories, /var/run/squid subdirectories, and the generated configuration files.
>
>
If you want to revert to a single squid, reverse the above process including cleaning up the corresponding log directories, /var/run/squid2 subdirectories, and the generated configuration files.
 

Running independent squids on the same machine

Changed:
<
<
By default multiple squids are configured so that only one of them will read from upstream servers, and others read from that squid. To disable that feature and instead have each separately read from the upstream server, you can put the following in /etc/sysconfig/frontier-squid:
>
>
By default multiple squids are configured so that only one of them will read from upstream servers, and others read from that squid. To disable that feature and instead have each separately read from the upstream server, you can put the following in /etc/sysconfig/frontier-squid2:
 
    export SQUID_MULTI_PEERING=false
Line: 366 to 368
  Note that there is currently no mechanism to have a different administrator-controlled configuration for each of the independent squids.
Deleted:
<
<

The frontier-squid2 rpm

In addition to the frontier-squid rpm, there is also a frontier-squid2-2.7 rpm. This is identical to the corresponding frontier-squid-2.7 rpm except that all the squid directories and files in shared directories have a "2" suffix on them, for example there's a /etc/squid2, /var/cache/squid2, /var/log/squid2, and /etc/init.d/frontier-squid2. This rpm may be installed on the same machine as the frontier-squid rpm, but one or both must change their http_port and snmp_port options to avoid clashing with the other. Just do yum install frontier-squid2 to install, and add the "2" suffix in all the configuration instructions on this page.

 

Having squid listen on a privileged port

This package runs squid strictly as an unprivileged user, so it is unable to open a privileged TCP port less than 1024. The recommended way to handle that is to have squid listen on an unprivileged port and use iptables to forward a privileged port to the unprivileged port. For example, to forward port 80 to port 8000, use this:

    # iptables -t nat -A PREROUTING -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 8000
Changed:
<
<
You can change the port that squid listens on with this in /etc/squid/customize.sh:
>
>
You can change the port that squid listens on with this in /etc/squid2/customize.sh:
 
    setoption("http_port","8000")
Line: 400 to 398
  It should report "offline_mode is now ON" which will prevent cached items from expiring. Then as long as everything was preloaded and the laptop doesn't reboot (because starting squid normally clears the cache) you should be able to re-use the cached data. You can switch back to normal mode with the same command or by stopping and starting squid.
Changed:
<
<
To prevent clearing the cache on start, put the following in /etc/sysconfig/frontier-squid:
>
>
To prevent clearing the cache on start, put the following in /etc/sysconfig/frontier-squid2:
 
    export SQUID_CLEAN_CACHE_ON_START=false

If you do that before the first time you start squid (or if you ever want to clear the cache by hand), run this to initialize the cache:

Changed:
<
<
    # service frontier-squid cleancache

>
>
    # service frontier-squid2 cleancache

 

Revision 54 2016-06-22 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier squid cache server

Line: 300 to 300
 
  1. As root run ionice -c1 -p PID for the pid listed in squid.pid (default /var/run/squid/squid.pid) for each squid process run. This raises their I/O priority above ordinary filesystem operations.
  2. Disable the access log completely.
Added:
>
>

Running out of file descriptors

By default, frontier-squid makes sure that there are at least 4096 file descriptors available for squid, which is usually enough. However, under some situations where there are very many clients it might not be enough. When this happens, a message like this shows up in cache.log:

    WARNING! Your cache is running out of filedescriptors

There are two ways to increase the limit:

  1. Add a line such as ulimit -n 16384 in /etc/sysconfig/frontier-squid.
  2. Set the nofile parameter in /etc/security/limits.conf or a file in /etc/security/limits.d. For example use a line like this to apply to all accounts:
    * - nofile 16384
    
    or replace the '*' with the squid user name if you prefer.
 

Alternate configurations

Restricting the destination

Revision 53 2015-09-16 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier squid cache server

Line: 6 to 6
  Note to Open Science Grid users: this same package is also available from the Open Science Grid so it will probably be more convenient to you to follow the OSG frontier-squid installation instructions.
Added:
>
>
Note to users of EGI's UMD repository: the same package is also available in UMD so it might be easier for you to get it from there.
 If you have any problems with the software or installation, or would like to suggest an improvement to the documentation, please submit a support request to the Frontier Application Development JIRA.

For rapid response to configuration questions, send e-mail to cms-frontier-support@cern.ch or atlas-frontier-support@cern.ch. Most questions can also be answered on the user's mailing list frontier-talk@cern.ch.

Revision 52 2015-07-10 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier squid cache server

Line: 170 to 170
 

Enabling monitoring

Changed:
<
<
The functionality and performance of your squid should be monitored from CERN using SNMP.
>
>
The functionality and performance of your squid should be monitored from CERN using SNMP. The monitoring site is http://wlcg-squid-monitor.cern.ch/.
  To enable this, your site should open incoming firewall(s) to allow UDP requests to port 3401 from 128.142.0.0/16, 188.184.128.0/17, and 188.185.128.0/17. If you run multiple squid processes, each one will need to be separately monitored. They listen on increasing port numbers, the first one on port 3401, the second on 3402, etc. When that is ready, register the squid with WLCG to start the monitoring.
Changed:
<
<
The monitoring site is http://wlcg-squid-monitor.cern.ch/.
>
>
Note: some sites are tempted to not allow requests from the whole range of IP addresses listed above, but we do not recommend that because the monitoring IP addresses can and will change without warning. Opening the whole CERN range of addresses has been cleared by security experts on the OSG and CMS security teams, because the information that can be collected is not sensitive information. If your site security experts still won't allow it, the next best thing you can do is to allow the aliases wlcgsquidmon1.cern.ch and wlcgsquidmon2.cern.ch. Most firewalls do not automatically refresh DNS entries, so you will also have to be willing to do that manually whenever the values of the aliases change.
 

Testing the installation

Revision 51 2015-05-22 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier squid cache server

Line: 341 to 341
  If you want to revert to a single squid, reverse the above process including cleaning up the corresponding log directories, /var/run/squid subdirectories, and the generated configuration files.
Deleted:
<
<
NOTE: this feature is not supported for reverse proxies because it uses the squid.conf http_port option.
 

Running independent squids on the same machine

By default multiple squids are configured so that only one of them will read from upstream servers, and others read from that squid. To disable that feature and instead have each separately read from the upstream server, you can put the following in /etc/sysconfig/frontier-squid:

    export SQUID_MULTI_PEERING=false
Changed:
<
<
They still all share the same basic configuration, however they can be used independently by accessing http_port+1, http_port+2, etc. For example if the default http_port is not changed, they all listen on port 3128, but then they each individually listen on port 3129, 3130, etc., so traffic flows can be separated by directly using those ports.
>
>
They still all share the same basic configuration; however, they can be used independently by accessing http_port-1, http_port-2, etc. For example, if the default http_port is not changed, they all listen on port 3128, but then they each individually listen on port 3127, 3126, etc., so traffic flows can be separated by directly using those ports. A common trick is to set the http_port to 3129 and then not advertise that port (and perhaps block it in iptables), so one of the squids can be accessed on the usual port 3128.

Note that there is currently no mechanism to have a different administrator-controlled configuration for each of the independent squids.

 

The frontier-squid2 rpm

Revision 50 2015-05-19 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier squid cache server

Line: 63 to 63
 

Software

Changed:
<
<
The instructions below are for the frontier-squid rpm version >= 2.7STABLE9-19.1 on a Scientific Linux version 5 or 6 based system. The rpm is based on the frontier-squid source tarball, and there are also instructions for installing directly from the frontier-squid tarball available. Please see the tarball Release Notes and rpm Release Notes for details on what has changed in recent versions. If, for some reason, you prefer to use a non frontier-squid distribution of squid, see MyOwnSquid.
>
>
The instructions below are for the frontier-squid rpm version >= 2.7STABLE9-23.1 on a Scientific Linux version 5 or 6 based system. The rpm is based on the frontier-squid source tarball, and there are also instructions for installing directly from the frontier-squid tarball available. Please see the tarball Release Notes and rpm Release Notes for details on what has changed in recent versions. If, for some reason, you prefer to use a non frontier-squid distribution of squid, see MyOwnSquid.
 

Puppet

Revision 49 2015-04-30 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier squid cache server

Line: 336 to 336
 Multiple squids can be enabled very simply by doing these steps:
  • Stop frontier-squid and remove the old cache and logs
  • Create subdirectories under your cache directory called 'squid0', 'squid1', up to 'squidN-1' for N squids, making sure they are writable by the user id that your squid runs under
Deleted:
<
<
  • Make sure the per-user hard limit on file descriptors ("nofile" in /etc/security/limits.conf, or ulimit -Hn in /etc/sysconfig/frontier-squid) is at least 4096 times the number of squid processes, preferably 8192 times the number of squid processes. The file descriptors will be divided between the number of squid processes to keep them from interfering with each other, and each squid will be set to no more than the soft limit (ulimit -n).
 
  • Start frontier-squid again. This will automatically detect the extra subdirectories and start that number of squid processes. It will create corresponding log subdirectories and /var/run/squid subdirectories, and generate a separate squid configuration file for each process in /etc/squid/.squid-N.conf. It will also assign each squid process to a particular core as described above.
When running multiple squids, all of the memory & disk usage is multiplied by the number of squids. For example, if you choose a cache_dir size of 100GB, running 3 squids will require 300GB for cache space. All the squids listen on the same port and take turns handling requests. Only squid0 will contact the upstream servers; the others forward requests to squid0 (this can be changed, see the next section).

Revision 48 2015-04-23 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier squid cache server

Line: 136 to 136
 
    setoption("cache_log", "/data/squid_logs/cache.log")
    setoption("coredump_dir", "/data/squid_cache")
    setoptionparameter("cache_dir", 2, "/data/squid_cache")

Changed:
<
<
setoptionparameter("access_log", 1, "/data/squid_logs/access.log")
>
>
setoptionparameter("access_log", 1, "daemon:/data/squid_logs/access.log")
 
Added:
>
>
It's recommended to use the "daemon:" prefix on the access_log path because that causes squid to use a separate process for writing to logs, so the main process doesn't have to wait for the disk. It is on by default for those who don't set the access_log path.
 

Changing the size of log files retained

The access.log is rotated each night, and also if it is over a given size (default 5 GB) when it checks each hour. You can change that value by exporting the environment variable SQUID_MAX_ACCESS_LOG in /etc/sysconfig/frontier-squid to a different number of bytes. You can also append M for megabytes or G for gigabytes. For example for 20 gigabytes each you can use:

Line: 164 to 166
  That uses the old method of telling squid to do the rotation, which keeps access.log.N where N goes from 0 to 9, for a total of 11 files including access.log. When compression is turned off, the default SQUID_MAX_ACCESS_LOG is reduced from 5GB to 1GB, so override that to set your desired size. When converting between compressed and uncompressed format, all the files of the old format are automatically deleted the first time the logs are rotated.
Added:
>
>
See also the section Log compression interfering with squid operation below.
 

Enabling monitoring

The functionality and performance of your squid should be monitored from CERN using SNMP.

Line: 283 to 287
 
    export SETSQUIDAFFINITY=true
Changed:
<
<
If that little boost isn't enough, try running multiple squid processes on the same machine.
>
>
If that little boost isn't enough, try running multiple squid processes on the same machine. That also enables the SETSQUIDAFFINITY option.

Log compression interfering with squid operation

Log compression has been observed on at least one machine to interfere with squid operation. That was an old 10-gbit machine with slow disks, high traffic, and 3 squid processes. These are some possible mitigations. Details of how to do many of these things are in the Changing the size of log files retained section above.

  1. Make sure there's a "daemon:" prefix on the access_log if you have changed its value.
  2. Reduce the max log size before compression and increase the number of log files retained, to decrease the length of time of each log compression.
  3. Disable compression if you have the space.
  4. As root run ionice -c1 -p PID for the pid listed in squid.pid (default /var/run/squid/squid.pid) for each squid process run. This raises their I/O priority above ordinary filesystem operations.
  5. Disable the access log completely.
 

Alternate configurations

Line: 325 to 338
 
  • Create subdirectories under your cache directory called 'squid0', 'squid1', up to 'squidN-1' for N squids, making sure they are writable by the user id that your squid runs under
  • Make sure the per-user hard limit on file descriptors ("nofile" in /etc/security/limits.conf, or ulimit -Hn in /etc/sysconfig/frontier-squid) is at least 4096 times the number of squid processes, preferably 8192 times the number of squid processes. The file descriptors will be divided between the number of squid processes to keep them from interfering with each other, and each squid will be set to no more than the soft limit (ulimit -n).
  • Start frontier-squid again. This will automatically detect the extra subdirectories and start that number of squid processes. It will create corresponding log subdirectories and /var/run/squid subdirectories, and generate a separate squid configuration file for each process in /etc/squid/.squid-N.conf. It will also assign each squid process to a particular core as described above.
Changed:
<
<
When running multiple squids, all of the memory & disk usage is multiplied by the number of squids. For example, if you choose a cache_dir size of 100GB, running 3 squids will require 300GB for cache space. All the squids listen on the same port and take turns handling requests. Only squid0 will contact the upstream servers; the others forward requests to squid0.
>
>
When running multiple squids, all of the memory & disk usage is multiplied by the number of squids. For example, if you choose a cache_dir size of 100GB, running 3 squids will require 300GB for cache space. All the squids listen on the same port and take turns handling requests. Only squid0 will contact the upstream servers; the others forward requests to squid0 (this can be changed, see the next section).
  If you want to revert to a single squid, reverse the above process including cleaning up the corresponding log directories, /var/run/squid subdirectories, and the generated configuration files.

NOTE: this feature is not supported for reverse proxies because it uses the squid.conf http_port option.

Added:
>
>

Running independent squids on the same machine

By default multiple squids are configured so that only one of them will read from upstream servers, and others read from that squid. To disable that feature and instead have each separately read from the upstream server, you can put the following in /etc/sysconfig/frontier-squid:

    export SQUID_MULTI_PEERING=false

They still all share the same basic configuration, however they can be used independently by accessing http_port+1, http_port+2, etc. For example if the default http_port is not changed, they all listen on port 3128, but then they each individually listen on port 3129, 3130, etc., so traffic flows can be separated by directly using those ports.

 

The frontier-squid2 rpm

In addition to the frontier-squid rpm, there is also a frontier-squid2-2.7 rpm. This is identical to the corresponding frontier-squid-2.7 rpm except that all the squid directories and files in shared directories have a "2" suffix on them, for example there's a /etc/squid2, /var/cache/squid2, /var/log/squid2, and /etc/init.d/frontier-squid2. This rpm may be installed on the same machine as the frontier-squid rpm, but one or both must change their http_port and snmp_port options to avoid clashing with the other. Just do yum install frontier-squid2 to install, and add the "2" suffix in all the configuration instructions on this page.

Revision 47 2014-12-04 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier squid cache server

The frontier-squid software package is a patched version of the standard squid http proxy cache software, pre-configured for use by the Frontier distributed database caching system. This installation is recommended for use by Frontier in the LHC CMS & ATLAS projects, and also works well with the CernVM FileSystem. Many people also use it for other applications as well; if you have any questions or comments about general use of this package contact frontier-talk@cern.ch.

Changed:
<
<
Note to Open Science Grid users: this same package is also available from the Open Science Grid so it may be more convenient to you to follow the OSG frontier-squid installation instructions.
>
>
Note to Open Science Grid users: this same package is also available from the Open Science Grid so it will probably be more convenient for you to follow the OSG frontier-squid installation instructions.
  If you have any problems with the software or installation, or would like to suggest an improvement to the documentation, please submit a support request to the Frontier Application Development JIRA.

For rapid response to configuration questions, send e-mail to cms-frontier-support@cern.ch or atlas-frontier-support@cern.ch. Most questions can also be answered on the user's mailing list frontier-talk@cern.ch.

Changed:
<
<
After completing a squid installation and configuration, CMS users should follow these further instructions for CMS squids and ATLAS users should submit a ticket to the ATLAS database operations JIRA with your site and squid machine name asking to set up further configuration and monitoring.
>
>
After completing a squid installation and configuration, CMS users should follow these further instructions for CMS squids. All WLCG users should register their squids with the WLCG.
  Here is what is on this page:
Line: 57 to 57
  There are many ways to configure multiple squids: round-robin DNS, load-balancing networking hardware, LVS, etc. The simplest thing to do is just set up two or more squid machines independently and let Frontier handle it by making a small addition to the frontier client configuration to have the client do the load balancing (described for CMS in the section on multiple squid servers). If there are many thousands of job slots, hardware-based load balancers can be easily overloaded, so DNS-based or client-based load balancing will probably be called for.
Added:
>
>
6) Can I put squid behind a NAT?

Possibly, but if so it should not be behind the same NAT shared by the worker nodes; otherwise, if the squid fails, it becomes very difficult to tell on the upstream servers whether the traffic is coming from a badly performing squid or from direct connections by the worker nodes. It is much better for the squid to be on a machine with its own public IP address.

 

Software

The instructions below are for the frontier-squid rpm version >= 2.7STABLE9-19.1 on a Scientific Linux version 5 or 6 based system. The rpm is based on the frontier-squid source tarball, and there are also instructions for installing directly from the frontier-squid tarball available. Please see the tarball Release Notes and rpm Release Notes for details on what has changed in recent versions. If, for some reason, you prefer to use a non frontier-squid distribution of squid, see MyOwnSquid.

Line: 164 to 168
  The functionality and performance of your squid should be monitored from CERN using SNMP.
Changed:
<
<
To enable this, your site should open incoming firewall(s) to allow UDP requests to port 3401 from 128.142.0.0/16, 188.184.128.0/17, and 188.185.128.0/17. If you run multiple squid processes, each one will need to be separately monitored. They listen on increasing port numbers, the first one on port 3401, the second on 3402, etc.
>
>
To enable this, your site should open incoming firewall(s) to allow UDP requests to port 3401 from 128.142.0.0/16, 188.184.128.0/17, and 188.185.128.0/17. If you run multiple squid processes, each one will need to be separately monitored. They listen on increasing port numbers, the first one on port 3401, the second on 3402, etc. When that is ready, register the squid with WLCG to start the monitoring.
  The monitoring site is http://wlcg-squid-monitor.cern.ch/.

Revision 46 2014-12-04 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier squid cache server

Line: 319 to 319
 Multiple squids can be enabled very simply by doing these steps:
  • Stop frontier-squid and remove the old cache and logs
  • Create subdirectories under your cache directory called 'squid0', 'squid1', up to 'squidN-1' for N squids, making sure they are writable by the user id that your squid runs under
Changed:
<
<
  • Make sure the per-user hard limit on file descriptors ("nofile" in /etc/security/limits.conf, or ulimit -Hn) is at least 4096 times the number of squid processes, preferably 8192 times the number of squid processes. The file descriptors will be divided between the number of squid processes to keep them from interfering with each other, and each squid will be set to no more than the soft limit (ulimit -n).
>
>
  • Make sure the per-user hard limit on file descriptors ("nofile" in /etc/security/limits.conf, or ulimit -Hn in /etc/sysconfig/frontier-squid) is at least 4096 times the number of squid processes, preferably 8192 times the number of squid processes. The file descriptors will be divided between the number of squid processes to keep them from interfering with each other, and each squid will be set to no more than the soft limit (ulimit -n).
 
  • Start frontier-squid again. This will automatically detect the extra subdirectories and start that number of squid processes. It will create corresponding log subdirectories and /var/run/squid subdirectories, and generate a separate squid configuration file for each process in /etc/squid/.squid-N.conf. It will also assign each squid process to a particular core as described above.
When running multiple squids, all of the memory & disk usage is multiplied by the number of squids. For example, if you choose a cache_dir size of 100GB, running 3 squids will require 300GB for cache space. All the squids listen on the same port and take turns handling requests. Only squid0 will contact the upstream servers; the others forward requests to squid0.
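As an example, here is a sketch of the steps above for two squid processes, assuming the default cache and log locations and the "squid" user id (adjust the paths, user, and number of processes for your site):

    # service frontier-squid stop
    # rm -rf /var/cache/squid/* /var/log/squid/*
    # mkdir /var/cache/squid/squid0 /var/cache/squid/squid1
    # chown squid:squid /var/cache/squid/squid0 /var/cache/squid/squid1
    # ulimit -Hn     # hard limit on file descriptors; want at least 2 x 4096, preferably 2 x 8192
    # service frontier-squid start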

Revision 452014-11-07 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Changed:
<
<

Installing a Frontier local squid cache server

>
>

Installing a Frontier squid cache server

  The frontier-squid software package is a patched version of the standard squid http proxy cache software, pre-configured for use by the Frontier distributed database caching system. This installation is recommended for use by Frontier in the LHC CMS & ATLAS projects, and also works well with the CernVM FileSystem. Many people also use it for other applications as well; if you have any questions or comments about general use of this package contact frontier-talk@cern.ch.
Line: 319 to 319
 Multiple squids can be enabled very simply by doing these steps:
  • Stop frontier-squid and remove the old cache and logs
  • Create subdirectories under your cache directory called 'squid0', 'squid1', up to 'squidN-1' for N squids, making sure they are writable by the user id that your squid runs under
Added:
>
>
  • Make sure the per-user hard limit on file descriptors ("nofile" in /etc/security/limits.conf, or ulimit -Hn) is at least 4096 times the number of squid processes, preferably 8192 times the number of squid processes. The file descriptors will be divided between the number of squid processes to keep them from interfering with each other, and each squid will be set to no more than the soft limit (ulimit -n).
 
  • Start frontier-squid again. This will automatically detect the extra subdirectories and start that number of squid processes. It will create corresponding log subdirectories and /var/run/squid subdirectories, and generate a separate squid configuration file for each process in /etc/squid/.squid-N.conf. It will also assign each squid process to a particular core as described above.
When running multiple squids, all of the memory & disk usage is multiplied by the number of squids. For example, if you choose a cache_dir size of 100GB, running 3 squids will require 300GB for cache space. All the squids listen on the same port and take turns handling requests. Only squid0 will contact the upstream servers; the others forward requests to squid0.
Line: 326 to 327
  NOTE: this feature is not supported for reverse proxies because it uses the squid.conf http_port option.
Added:
>
>

The frontier-squid2 rpm

In addition to the frontier-squid rpm, there is also a frontier-squid2-2.7 rpm. This is identical to the corresponding frontier-squid-2.7 rpm except that all the squid directories and files in shared directories have a "2" suffix on them; for example, there are /etc/squid2, /var/cache/squid2, /var/log/squid2, and /etc/init.d/frontier-squid2. This rpm may be installed on the same machine as the frontier-squid rpm, but one or both must change their http_port and snmp_port options to avoid clashing with the other. Just do yum install frontier-squid2 to install, and add the "2" suffix in all the configuration instructions on this page.
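For example, a minimal sketch of installing the second package alongside the first and moving it to different ports (the port numbers are arbitrary examples) is:

    # yum install frontier-squid2

and then in /etc/squid2/customize.sh:

    setoption("http_port", "3129")
    setoption("snmp_port", "3402")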

 

Having squid listen on a privileged port

This package runs squid strictly as an unprivileged user, so it is unable to open a privileged TCP port less than 1024. The recommended way to handle that is to have squid listen on an unprivileged port and use iptables to forward a privileged port to the unprivileged port. For example, to forward port 80 to port 8000, use this:
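    # iptables -t nat -A PREROUTING -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 8000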

Revision 442014-11-05 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 16 to 16
 
Added:
>
>

Why use frontier-squid instead of regular squid?

The most important feature of frontier-squid is that it correctly supports the HTTP standard headers Last-Modified and If-Modified-Since better than other distributions of squid. The Frontier distributed database caching system, which is used by the LHC projects ATLAS and CMS, depends on proper working of this feature, so that is the main reason why that project maintains this squid distribution. Older versions of squid2 (including the one distributed with Red Hat EL 5) and all versions of squid3 (including the one on Red Hat EL 6) prior to squid3.5 (which is now in pre-release) do not correctly support this feature, as documented in the infamous squid bug #7. Also, the frontier-squid package contains a couple of related patches that are not in any standard squid distribution. Details are in the beginning paragraph of the MyOwnSquid twiki page. Although this package expressly supports If-Modified-Since, it also works well with applications that do not require If-Modified-Since including CVMFS. The collapsed_forwarding feature is also missing from most versions of squid, and it is important for the most common grid applications that use squid and is included in the frontier-squid package.

In addition, the package has several additional features:

  1. A configuration file generator, so configuration customizations can be preserved across package upgrades even when the complicated standard configuration file changes.
  2. The ability to easily run multiple squid processes listening on the same port, in order to support more networking throughput than can be handled by a single CPU core (squid2 is single-threaded).
  3. Automatic cleanup of the old cache files in the background when starting squid, to avoid problems with cache corruption.
  4. Default access control lists to permit remote performance monitoring from shared WLCG squid monitoring servers at CERN.
  5. The default log format is more human readable and includes contents of client-identifying headers.
  6. Access logs are rotated throughout the day if they reach a configured size, to avoid filling up disks of heavily used squids. The logs are also compressed by default.
  7. It chooses default options found to be important by years of operational experience on the WLCG.
 

Hardware

The first step is to decide what hardware you want to run the squid cache server on. These are some FAQs.

Revision 432014-09-16 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 26 to 26
  2) What hardware specs (CPU, memory, disk cache)?
Changed:
<
<
For most purposes 2 cores at 2GHZ, 2GB memory, and 100 GB for the disk cache should be adequate. This excludes the space needed for log files which is determined by how heavily the system is used and what the clean up schedule is. The default in the rpm always rotates the logs every day and removes the oldest log after 10 rotates, and four times an hour it will also rotate if the access log is bigger than 5GB. By default logs are compressed after rotate and typically reduce space by more than 85%, so allowing 12GB for logs should be sufficient. On heavily used systems the default will most likely keep logs for too short of a time, however, so it's better to change the default (instructions below) and allow at least 25GB for logs.
>
>
For most purposes 2 cores at 2GHZ, 2GB memory, and 100 GB for the disk cache should be adequate. This excludes the space needed for log files which is determined by how heavily the system is used and what the clean up schedule is. The default in the rpm always rotates the logs every day and removes the oldest log after 10 rotates, and four times an hour it will also rotate if the access log is bigger than 5GB. By default logs are compressed after rotate and typically are reduced to less than 15% of their original size, so allowing 12GB for logs should be sufficient. On heavily used systems the default will most likely keep logs for too short of a time, however, so it's better to change the default (instructions below) and allow at least 25GB for logs.
  From what we have seen, the most critical resource is the memory. If the machine serves other purposes, make sure the other tasks don't use up all the memory. Squid runs as a single thread, so if that is the only use of the machine, having more than 2 cores is a waste (unless you are running multiple squid processes). You should also avoid network filesystems such as AFS and NFS for the disk cache.
Line: 128 to 128
 
    export SQUID_MAX_ACCESS_LOG=20G
Changed:
<
<
By default, frontier-squid compresses log files when they are rotated, and saves up to 9 access.log.N.gz files where N goes from 1 to 9. In order to estimate disk usage, note that the compression ratio is typically better than 85%, and the size can go a bit above the $SQUID_MAX_ACCESS_LOG size because the cron job only checks four times per hour. For example, for SQUID_MAX_ACCESS_LOG=20G the maximum size will be a bit above 20GB plus 9 times 3GB, so allow 50GB to be safe.
>
>
By default, frontier-squid compresses log files when they are rotated, and saves up to 9 access.log.N.gz files where N goes from 1 to 9. In order to estimate disk usage, note that the rotated files are typically compressed to a bit under 15% of their original size, and that the uncompressed size can go a bit above $SQUID_MAX_ACCESS_LOG because the cron job only checks four times per hour. For example, for SQUID_MAX_ACCESS_LOG=20G the maximum size will be a bit above 20GB plus 9 times 3GB, so allow 50GB to be safe.
  If frontier-awstats is installed (typically only on central servers), an additional uncompressed copy is also saved in access.log.0.

Revision 422014-09-16 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 26 to 26
  2) What hardware specs (CPU, memory, disk cache)?
Changed:
<
<
For most purposes 2 cores at 2GHZ, 2GB memory, and 100 GB for the disk cache should be adequate. This excludes the space needed for log files which is determined by how heavily the system is used and what the clean up schedule is. The default in the rpm always rotates the logs every day and removes the oldest log after 10 rotates, and four times an hour it will also rotate if the access log is bigger than 5GB. By default logs are compressed after rotate and typically compress about 90% so allowing 15GB for logs should be sufficient. On heavily used systems the default will most likely keep logs for too short of a time, however, so it's better to change the default (instructions below) and allow at least 25GB for logs.
>
>
For most purposes 2 cores at 2GHZ, 2GB memory, and 100 GB for the disk cache should be adequate. This excludes the space needed for log files which is determined by how heavily the system is used and what the clean up schedule is. The default in the rpm always rotates the logs every day and removes the oldest log after 10 rotates, and four times an hour it will also rotate if the access log is bigger than 5GB. By default logs are compressed after rotate and typically reduce space by more than 85%, so allowing 12GB for logs should be sufficient. On heavily used systems the default will most likely keep logs for too short of a time, however, so it's better to change the default (instructions below) and allow at least 25GB for logs.
  From what we have seen, the most critical resource is the memory. If the machine serves other purposes, make sure the other tasks don't use up all the memory. Squid runs as a single thread, so if that is the only use of the machine, having more than 2 cores is a waste (unless you are running multiple squid processes). You should also avoid network filesystems such as AFS and NFS for the disk cache.
Line: 128 to 128
 
    export SQUID_MAX_ACCESS_LOG=20G
Changed:
<
<
By default, frontier-squid compresses log files when they are rotated, and saves up to 9 access.log.N.gz files where N goes from 1 to 9. In order to estimate disk usage, note that the compression ratio is typically close to 90%, and the size can go a bit above the $SQUID_MAX_ACCESS_LOG size because the cron job only checks four times per hour. For example, for SQUID_MAX_ACCESS_LOG=20G the maximum size will be a bit above 20GB plus 9 times 2GB, so allow 45GB to 50GB to be safe.
>
>
By default, frontier-squid compresses log files when they are rotated, and saves up to 9 access.log.N.gz files where N goes from 1 to 9. In order to estimate disk usage, note that the compression ratio is typically better than 85%, and the size can go a bit above the $SQUID_MAX_ACCESS_LOG size because the cron job only checks four times per hour. For example, for SQUID_MAX_ACCESS_LOG=20G the maximum size will be a bit above 20GB plus 9 times 3GB, so allow 50GB to be safe.
  If frontier-awstats is installed (typically only on central servers), an additional uncompressed copy is also saved in access.log.0.

Revision 412014-09-15 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 26 to 26
  2) What hardware specs (CPU, memory, disk cache)?
Changed:
<
<
For most purposes 2 cores at 2GHZ, 2GB memory, and 100 GB for the disk cache should be adequate. This excludes the space needed for log files which is determined by how heavily the system is used and what the clean up schedule is. The default in the rpm always rotates the logs every day and removes the oldest log after 10 rotates, and four times an hour it will also rotate if the log is bigger than 1 GB. On heavily used systems the default will most likely keep logs for too short of a time, however (less than a day), so it's better to change the default (instructions below) and allow at least 25GB for logs.
>
>
For most purposes 2 cores at 2GHZ, 2GB memory, and 100 GB for the disk cache should be adequate. This excludes the space needed for log files which is determined by how heavily the system is used and what the clean up schedule is. The default in the rpm always rotates the logs every day and removes the oldest log after 10 rotates, and four times an hour it will also rotate if the access log is bigger than 5GB. By default logs are compressed after rotate and typically compress about 90% so allowing 15GB for logs should be sufficient. On heavily used systems the default will most likely keep logs for too short of a time, however, so it's better to change the default (instructions below) and allow at least 25GB for logs.
  From what we have seen, the most critical resource is the memory. If the machine serves other purposes, make sure the other tasks don't use up all the memory. Squid runs as a single thread, so if that is the only use of the machine, having more than 2 cores is a waste (unless you are running multiple squid processes). You should also avoid network filesystems such as AFS and NFS for the disk cache.
Line: 46 to 46
 

Software

Changed:
<
<
The instructions below are for the frontier-squid rpm version >= 2.7STABLE9-18.1 on a Scientific Linux version 5 or 6 based system. The rpm is based on the frontier-squid source tarball, and there are also instructions for installing directly from the frontier-squid tarball available. Please see the tarball Release Notes and rpm Release Notes for details on what has changed in recent versions. If, for some reason, you prefer to use a non frontier-squid distribution of squid, see MyOwnSquid.
>
>
The instructions below are for the frontier-squid rpm version >= 2.7STABLE9-19.1 on a Scientific Linux version 5 or 6 based system. The rpm is based on the frontier-squid source tarball, and there are also instructions for installing directly from the frontier-squid tarball available. Please see the tarball Release Notes and rpm Release Notes for details on what has changed in recent versions. If, for some reason, you prefer to use a non frontier-squid distribution of squid, see MyOwnSquid.
 

Puppet

Line: 124 to 124
 

Changing the size of log files retained

Changed:
<
<
The access.log is rotated each night, and also if it is over a given size (default 1 GB) when it checks each hour. You can change that value by exporting the environment variable SQUID_MAX_ACCESS_LOG in /etc/sysconfig/frontier-squid to a different number of bytes. You can also append M for megabytes or G for gigabytes. For example for 20 gigabytes each you can use:
>
>
The access.log is rotated each night, and also if it is over a given size (default 5 GB) when it checks each hour. You can change that value by exporting the environment variable SQUID_MAX_ACCESS_LOG in /etc/sysconfig/frontier-squid to a different number of bytes. You can also append M for megabytes or G for gigabytes. For example for 20 gigabytes each you can use:
 
    export SQUID_MAX_ACCESS_LOG=20G
Line: 145 to 145
 On the other hand, compressing large rotated logs can take a considerable amount of time, so if you have plenty of disk space and don't want the additional disk I/O and CPU load during rotation, you can disable rotate compression by putting the following in /etc/sysconfig/frontier-squid:
    export SQUID_COMPRESS_LOGS=false
Changed:
<
<
That uses the old method of telling squid to do the rotation, which keeps access.log.N where N goes from 0 to 9, for a total of 11 files including access.log.
>
>
That uses the old method of telling squid to do the rotation, which keeps access.log.N where N goes from 0 to 9, for a total of 11 files including access.log. When compression is turned off, the default SQUID_MAX_ACCESS_LOG is reduced from 5GB to 1GB, so override that to set your desired size. When converting between compressed and uncompressed format, all the files of the old format are automatically deleted the first time the logs are rotated.
 

Enabling monitoring

Revision 402014-09-15 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 26 to 26
  2) What hardware specs (CPU, memory, disk cache)?
Changed:
<
<
For most purposes 2 cores at 2GHZ, 2GB memory, and 100 GB for the disk cache should be adequate. This excludes the space needed for log files which is determined by how heavily the system is used and what the clean up schedule is. The default in the rpm always rotates the logs every day and removes after 10 rotates, and once an hour it will also rotate if the log is bigger than 1 GB. On heavily used systems the default might keep logs for too short of a time, however (less than a day), so it's better to change the default and allow at least 50GB for logs. From what we have seen, the most critical resource is the memory. If the machine serves other purposes, make sure the other tasks don't use up all the memory. Squid runs as a single thread, so if that is the only use of the machine, having more than 2 cores is a waste. You should also avoid network filesystems such as AFS and NFS for the disk cache.
>
>
For most purposes 2 cores at 2GHZ, 2GB memory, and 100 GB for the disk cache should be adequate. This excludes the space needed for log files which is determined by how heavily the system is used and what the clean up schedule is. The default in the rpm always rotates the logs every day and removes the oldest log after 10 rotates, and four times an hour it will also rotate if the log is bigger than 1 GB. On heavily used systems the default will most likely keep logs for too short of a time, however (less than a day), so it's better to change the default (instructions below) and allow at least 25GB for logs.
 
Changed:
<
<
Here is a description of squid memory usage: If you have a decent amount of spare memory, the kernel will use that as page cache, so it's a good chance that frequenty-requested items will, in fact, be served from RAM (via the page cache) even if it's not squid's RAM. There's also a design bottleneck in squid that limits cpu efficiency of large cache_mem objects, so resist the urge to give squid all your available memory. Let cache_mem handle your small objects and the kernel handle the larger ones.
>
>
From what we have seen, the most critical resource is the memory. If the machine serves other purposes, make sure the other tasks don't use up all the memory. Squid runs as a single thread, so if that is the only use of the machine, having more than 2 cores is a waste (unless you are running multiple squid processes). You should also avoid network filesystems such as AFS and NFS for the disk cache.

Here is a description of squid memory usage: If you have a decent amount of spare memory, the kernel will use that as a disk cache, so there is a good chance that frequently-requested items will, in fact, be served from RAM (via the disk cache) even if it's not squid's RAM. There's also a design bottleneck in squid that limits cpu efficiency of large cache_mem objects, so resist the urge to give squid all your available memory. Let cache_mem handle your small objects and the kernel handle the larger ones.

  3) What network specs?
Line: 44 to 46
 

Software

Changed:
<
<
The instructions below are for the latest version of the frontier-squid rpm on a Scientific Linux version 5 or 6 based system. The rpm is based on the frontier-squid source tarball, and there are also instructions for installing directly from the frontier-squid tarball available. Please see the tarball Release Notes and rpm Release Notes for details on what has changed in recent versions. If, for some reason, you prefer to use a non frontier-squid distribution of squid, see MyOwnSquid.
>
>
The instructions below are for the frontier-squid rpm version >= 2.7STABLE9-18.1 on a Scientific Linux version 5 or 6 based system. The rpm is based on the frontier-squid source tarball, and there are also instructions for installing directly from the frontier-squid tarball available. Please see the tarball Release Notes and rpm Release Notes for details on what has changed in recent versions. If, for some reason, you prefer to use a non frontier-squid distribution of squid, see MyOwnSquid.
 

Puppet

Line: 62 to 64
 

Installation

First, if you have not installed any frontier rpm before, execute the following command as the root user:

Changed:
<
<
    # rpm -Uvh http://frontier.cern.ch/dist/rpms/RPMS/noarch/frontier-release-1.0-1.noarch.rpm

>
>
    # rpm -Uvh http://frontier.cern.ch/dist/rpms/RPMS/noarch/frontier-release-1.1-1.noarch.rpm

 

If it warns about creating /etc/yum.repos.d/cern-frontier.repo.rpmnew, then move that file into place:

Line: 87 to 89
  The script allows specifying many subnets - just separate them by a blank. If you would like to limit the outgoing connections please see the section below on restricting the destination.
Changed:
<
<
If you want to, you can change the cache_mem option to set the size squid reserves for caching small objects in memory, but don't make it more than 1/8th of your hardware memory. The default 128 MB should be fine, leaving a lot of memory for disk buffering by the OS, because squid performs better for large objects in disk cache buffers than in its own internal memory cache.
>
>
If you want to, you can change the cache_mem option to set the size squid reserves for caching small objects in memory, but don't make it more than 1/8th of your hardware memory. The default 128 MB should be fine, leaving a lot of memory for disk caching by the OS, because squid performs better for large objects in disk cache buffers than in its own internal memory cache.
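For example, to raise it to 256 MB (an arbitrary value for illustration; keep it under 1/8th of the machine's memory as advised above), put this in /etc/squid/customize.sh:

    setoption("cache_mem", "256 MB")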
  Change the size of the cache_dir (the third parameter) to your desired size in MB. The default is only 10 GB which is rather stingy. For example, for 100 GB set it to this:
    setoptionparameter("cache_dir", 3, "100000")

Line: 122 to 124
 

Changing the size of log files retained

Changed:
<
<
The access.log is rotated each night, and also if it is over a given size (default 1 GB) when it checks each hour. You can change that value by exporting the environment variable LARGE_ACCESS_LOG in /etc/sysconfig/frontier-squid to a different number of bytes. For example for 10GB each you can use:
    export LARGE_ACCESS_LOG=10000000000

>
>
The access.log is rotated each night, and also if it is over a given size (default 1 GB) when it checks each hour. You can change that value by exporting the environment variable SQUID_MAX_ACCESS_LOG in /etc/sysconfig/frontier-squid to a different number of bytes. You can also append M for megabytes or G for gigabytes. For example for 20 gigabytes each you can use:
    export SQUID_MAX_ACCESS_LOG=20G

 
Changed:
<
<
In order to estimate disk usage, note that up to 11 access.log files are kept at a time, and the size can go a bit above the $LARGE_ACCESS_LOG size because the cron job only checks once per hour.
>
>
By default, frontier-squid compresses log files when they are rotated, and saves up to 9 access.log.N.gz files where N goes from 1 to 9. In order to estimate disk usage, note that the compression ratio is typically close to 90%, and the size can go a bit above the $SQUID_MAX_ACCESS_LOG size because the cron job only checks four times per hour. For example, for SQUID_MAX_ACCESS_LOG=20G the maximum size will be a bit above 20GB plus 9 times 2GB, so allow 45GB to 50GB to be safe.

If frontier-awstats is installed (typically only on central servers), an additional uncompressed copy is also saved in access.log.0.

 
Changed:
<
<
An alternative to setting the maximum size of each log file, you can leave each log file at the default size and change the number of log files retained, for example for 51 files (about 50GB total space) set the following in /etc/squid/customize.sh:
>
>
As an alternative to setting the maximum size of each log file, you can leave each log file at the default size and change the number of log files retained; for example, for 50 files (about 6GB total space) set the following in /etc/squid/customize.sh:
 
    setoption("logfile_rotate", "50")
Line: 138 to 142
  Then after doing service frontier-squid reload (or service frontier-squid start if squid was stopped) remember to remove all the old access.log* files.
Added:
>
>
On the other hand, compressing large rotated logs can take a considerable amount of time, so if you have plenty of disk space and don't want the additional disk I/O and CPU load during rotation, you can disable rotate compression by putting the following in /etc/sysconfig/frontier-squid:
    export SQUID_COMPRESS_LOGS=false
That uses the old method of telling squid to do the rotation, which keeps access.log.N where N goes from 0 to 9, for a total of 11 files including access.log.
 

Enabling monitoring

The functionality and performance of your squid should be monitored from CERN using SNMP.

Changed:
<
<
To enable this, your site should open incoming firewall(s) to allow UDP requests to port 3401 from 128.142.0.0/16 and 188.185.0.0/17. If you run multiple squid processes, each one will need to be separately monitored. They listen on increasing port numbers, the first one on port 3401, the second on 3402, etc.
>
>
To enable this, your site should open incoming firewall(s) to allow UDP requests to port 3401 from 128.142.0.0/16, 188.184.128.0/17, and 188.185.128.0/17. If you run multiple squid processes, each one will need to be separately monitored. They listen on increasing port numbers, the first one on port 3401, the second on 3402, etc.
 
Changed:
<
<
The monitoring site is at http://wlcg-squid-monitor.cern.ch/.
>
>
The monitoring site is http://wlcg-squid-monitor.cern.ch/.
 

Testing the installation

Line: 220 to 229
 
  1. Reply size including http headers
  2. Squid request status (e.g. TCP_MISS) and heirarchy status (e.g. DEFAULT_PARENT) separated by a colon
  3. Response time in milliseconds
Changed:
<
<
  1. The contents of the X-Frontier-Id header or a dash if none, surrounded by double quotes
>
>
  1. The contents of the X-Frontier-Id header or a dash if none, then a space, then the contents of the cvmfs-info header, or a dash if none, all surrounded by double quotes (no client sends both so entries will always either start with "- " or end with " -")
 
  1. The contents of the Referer header or a dash if none, surrounded by double quotes
  2. The contents of the User-Agent header or a dash if none, surrounded by double quotes
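For illustration only, an access.log line in this format might look like the following; every value here is invented purely to show the field order and is not taken from a real log:

    10.0.1.17 - - [29/Nov/2016:10:15:32 +0000] "GET http://cmsfrontier.cern.ch:8000/FrontierProd/Frontier HTTP/1.0" 200 4325 TCP_MISS:DIRECT 55 "- -" "-" "fnget.py"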
Line: 263 to 272
 

Restricting the destination

Changed:
<
<
The default behavior is to allow the squid to be used for any destination. If you want to restrict the squid to be used only for CMS Conditions Data, then you simply have to add two lines to customize.sh that enable a couple of lines in squid.conf which are already there commented out:
>
>
The default behavior is to allow the squid to be used for any destination. There are some pre-defined access controls commented out for the most common destinations on the WLCG. They are
  1. CMS_FRONTIER - CMS Frontier conditions data servers
  2. ATLAS_FRONTIER - ATLAS Frontier conditions data servers
  3. MAJOR_CVMFS - the major WLCG CVMFS stratum 1 servers
In addition, there are two commented out lines using a general RESTRICT_DEST access control which you can use to set a regular expression that restricts connections to any set of hosts of your choice.
 
Changed:
<
<
    uncomment("acl RESTRICT_DEST")
    uncomment("http_access deny !RESTRICT_DEST")

>
>
To use one of the pre-defined access controls, use two lines like this (for example with CMS_FRONTIER):
    uncomment("acl CMS_FRONTIER")
    insertline("^# http_access deny !RESTRICT_DEST", "http_access deny !CMS_FRONTIER")

To use a combination of two of the pre-defined acls, use "http_access allow" followed by "http_access deny !", for example:

    uncomment("acl CMS_FRONTIER")
    uncomment("acl MAJOR_CVMFS")
    insertline("^# http_access deny !RESTRICT_DEST", "http_access allow CMS_FRONTIER")
    insertline("^# http_access deny !RESTRICT_DEST", "http_access deny !MAJOR_CVMFS")

 
Changed:
<
<
If for some reason you want to have a different destination or destinations you can use a regular expression, for example:
>
>
If for some reason you want to have a different destination or destinations you can instead use a regular expression with the RESTRICT_DEST lines, for example:
 
    setoptionparameter("acl RESTRICT_DEST", 3, "^(((cms|atlas).*frontier.*)\\.cern\\.ch)|frontier.*\\.racf\\.bnl\\.gov$")
    uncomment("http_access deny !RESTRICT_DEST")
Changed:
<
<
Once you have restricted the destination, it isn't so important anymore to restrict the source. If you want to leave it unrestricted you can change the NET_LOCAL acl to 0.0.0.0/0:
>
>
Once you have restricted the destination, it isn't so important anymore to restrict the source. If you want to leave it unrestricted you can change the NET_LOCAL acl to 0.0.0.0/0 (unless you want to restrict both):
 
    setoption("acl NET_LOCAL src", "0.0.0.0/0")

Revision 392014-09-01 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 144 to 144
  To enable this, your site should open incoming firewall(s) to allow UDP requests to port 3401 from 128.142.0.0/16 and 188.185.0.0/17. If you run multiple squid processes, each one will need to be separately monitored. They listen on increasing port numbers, the first one on port 3401, the second on 3402, etc.
Changed:
<
<
The main monitoring site is at http://wlcg-squid-monitor.cern.ch/. The legacy monitoring service, running within the Frontier monitoring machines, is at http://frontier.cern.ch/squidstats/.
>
>
The monitoring site is at http://wlcg-squid-monitor.cern.ch/.
 

Testing the installation

Revision 382014-08-22 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 326 to 326
 
    export SQUID_CLEAN_CACHE_ON_START=false
Added:
>
>
If you do that before the first time you start squid (or if you ever want to clear the cache by hand), run this to initialize the cache:
    # service frontier-squid cleancache
 Responsible: DaveDykstra

Revision 372014-06-12 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 6 to 6
  Note to Open Science Grid users: this same package is also available from the Open Science Grid so it may be more convenient to you to follow the OSG frontier-squid installation instructions.
Changed:
<
<
If you have any problems with the software or installation, or would like to suggest an improvement to the documentation, please submit a support request to the Frontier development Savannah project.
>
>
If you have any problems with the software or installation, or would like to suggest an improvement to the documentation, please submit a support request to the Frontier Application Development JIRA.
  For rapid response to configuration questions, send e-mail to cms-frontier-support@cern.ch or atlas-frontier-support@cern.ch. Most questions can also be answered on the user's mailing list frontier-talk@cern.ch.
Changed:
<
<
After completing a squid installation and configuration, CMS users should follow these further instructions for CMS squids and ATLAS users should submit a ticket to the ATLAS database operations Savannah project with your site and squid machine name asking to set up further configuration and monitoring.
>
>
After completing a squid installation and configuration, CMS users should follow these further instructions for CMS squids and ATLAS users should submit a ticket to the ATLAS database operations JIRA with your site and squid machine name asking to set up further configuration and monitoring.
  Here is what is on this page:

Revision 362014-06-06 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 46 to 46
  The instructions below are for the latest version of the frontier-squid rpm on a Scientific Linux version 5 or 6 based system. The rpm is based on the frontier-squid source tarball, and there are also instructions for installing directly from the frontier-squid tarball available. Please see the tarball Release Notes and rpm Release Notes for details on what has changed in recent versions. If, for some reason, you prefer to use a non frontier-squid distribution of squid, see MyOwnSquid.
Added:
>
>

Puppet

A puppet module for configuring frontier-squid is available on puppet-forge which understands a lot of the following instructions. If you're using puppet, check there first.

 

Preparation

By default the frontier-squid rpm installs files with a "squid" user id and group. If they do not exist, the rpm will create them. If your system has its own means of creating logins you should create the login and group before installing the rpm. If you want the squid process to use a different user id (historically it has been "dbfrontier"), then for example before installing the rpm create the file /etc/squid/squidconf with the following contents:

Revision 352014-02-24 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 240 to 240
  Starting 1 Frontier Squid... Frontier Squid start failed!!!
Changed:
<
<
Try the following command:

>
>
 Then, if SELinux is enabled and you want to leave it on, try the following command:

 # restorecon -R /var/cache
And start frontier-squid again.

Revision 342014-02-24 - LuisLinares

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 224 to 224
 

SELinux

Changed:
<
<
SELinux on RHEL 5 does not give the proper context to the default SNMP port (3401) (as of selinux-policy-2.4.6-106.el5) . The command (as root)
>
>
  • SELinux on RHEL 5 does not give the proper context to the default SNMP port (3401) (as of selinux-policy-2.4.6-106.el5). The command (as root):
    # semanage port -a -t http_cache_port_t -p udp 3401
    
    takes care of this problem.
 
Changed:
<
<
semanage port -a -t http_cache_port_t -p udp 3401
>
>
  • If squid has difficulty creating cache directories on RHEL 6, like for example:
    # service frontier-squid start
    
    
 
Changed:
<
<
takes care of this problem.
>
>
Generating /etc/squid/squid.conf
Initializing Cache...
2014/02/21 14:43:53| Creating Swap Directories
FATAL: Failed to make swap directory /var/cache/squid/00: (13) Permission denied
...
Starting 1 Frontier Squid...    Frontier Squid start failed!!!
Try the following command:
# restorecon -R /var/cache
And start frontier-squid again.
 
Changed:
<
<
If squid has difficulty creating cache directories on RHEL 6, our recommendation is to disable SELinux.
>
>
 

Inability to reach full network throughput

Revision 332013-11-26 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 230 to 230
  takes care of this problem.
Added:
>
>
If squid has difficulty creating cache directories on RHEL 6, our recommendation is to disable SELinux.
 

Inability to reach full network throughput

If you have a CPU that can't quite keep up with full network throughput, we have found that up to an extra 15% throughput can be achieved by binding the single-threaded squid process to a single core, to maximize use of the per-core on-chip caches. This is not enabled by default, but you can enable it by putting the following in /etc/sysconfig/frontier-squid:

Line: 274 to 276
 

Having squid listen on a privileged port

This package runs squid strictly as an unprivileged user, so it is unable to open a privileged TCP port less than 1024. The recommended way to handle that is to have squid listen on an unprivileged port and use iptables to forward a privileged port to the unprivileged port. For example, to forward port 80 to port 8000, use this:

Changed:
<
<
    # iptables -A PREROUTING -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 8000

>
>
    # iptables -t nat -A PREROUTING -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 8000

 

You can change the port that squid listens on with this in /etc/squid/customize.sh:

Revision 322013-05-22 - LuisLinares

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 140 to 140
  To enable this, your site should open incoming firewall(s) to allow UDP requests to port 3401 from 128.142.0.0/16 and 188.185.0.0/17. If you run multiple squid processes, each one will need to be separately monitored. They listen on increasing port numbers, the first one on port 3401, the second on 3402, etc.
Changed:
<
<
The main monitoring site is at http://frontier.cern.ch/squidstats/.
>
>
The main monitoring site is at http://wlcg-squid-monitor.cern.ch/. The legacy monitoring service, running within the Frontier monitoring machines, is at http://frontier.cern.ch/squidstats/.
 

Testing the installation

Revision 312013-05-20 - BrijKishorJashal

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 44 to 44
 

Software

Changed:
<
<
The instructions below are for the latest version of the frontier-squid rpm on a Scientific Linux version 5 or 6 based system. The rpm is based on the frontier-squid source tarball, and there are also instructions for installing directly from the frontier-squid tarball available ( Recommended for Scientific Linux 6 based system ) . Please see the tarball Release Notes and rpm Release Notes for details on what has changed in recent versions. If, for some reason, you prefer to use a non frontier-squid distribution of squid, see MyOwnSquid.
>
>
The instructions below are for the latest version of the frontier-squid rpm on a Scientific Linux version 5 or 6 based system. The rpm is based on the frontier-squid source tarball, and there are also instructions for installing directly from the frontier-squid tarball available. Please see the tarball Release Notes and rpm Release Notes for details on what has changed in recent versions. If, for some reason, you prefer to use a non frontier-squid distribution of squid, see MyOwnSquid.
 

Preparation

Revision 302013-05-20 - BrijKishorJashal

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 6 to 6
  Note to Open Science Grid users: this same package is also available from the Open Science Grid so it may be more convenient to you to follow the OSG frontier-squid installation instructions.
Changed:
<
<
If you have any problems with the software or installation, or would like to suggest an improvement to the documentation, please submit a support request to the Frontier development Savannah project.
>
>
If you have any problems with the software or installation, or would like to suggest an improvement to the documentation, please submit a support request to the Frontier development Savannah project.
  For rapid response to configuration questions, send e-mail to cms-frontier-support@cern.ch or atlas-frontier-support@cern.ch. Most questions can also be answered on the user's mailing list frontier-talk@cern.ch.
Line: 44 to 44
 

Software

Changed:
<
<
The instructions below are for the latest version of the frontier-squid rpm on a Scientific Linux version 5 or 6 based system. The rpm is based on the frontier-squid source tarball, and there are also instructions for installing directly from the frontier-squid tarball available. Please see the tarball Release Notes and rpm Release Notes for details on what has changed in recent versions. If, for some reason, you prefer to use a non frontier-squid distribution of squid, see MyOwnSquid.
>
>
The instructions below are for the latest version of the frontier-squid rpm on a Scientific Linux version 5 or 6 based system. The rpm is based on the frontier-squid source tarball, and there are also instructions for installing directly from the frontier-squid tarball available ( Recommended for Scientific Linux 6 based system ) . Please see the tarball Release Notes and rpm Release Notes for details on what has changed in recent versions. If, for some reason, you prefer to use a non frontier-squid distribution of squid, see MyOwnSquid.
 

Preparation

Added:
>
>
 By default the frontier-squid rpm installs files with a "squid" user id and group. If they do not exist, the rpm will create them. If your system has its own means of creating logins you should create the login and group before installing the rpm. If you want the squid process to use a different user id (historically it has been "dbfrontier"), then for example before installing the rpm create the file /etc/squid/squidconf with the following contents:
Changed:
<
<
    export FRONTIER_USER=dbfrontier

>
>
    export FRONTIER_USER=dbfrontier

  export FRONTIER_GROUP=dbfrontier
Added:
>
>
 where you can fill in whichever user and group id you choose.

Installation

Line: 57 to 58
 

Installation

First, if you have not installed any frontier rpm before, execute the following command as the root user:

Changed:
<
<
    # rpm -Uvh http://frontier.cern.ch/dist/rpms/RPMS/noarch/frontier-release-1.0-1.noarch.rpm

>
>
    # rpm -Uvh http://frontier.cern.ch/dist/rpms/RPMS/noarch/frontier-release-1.0-1.noarch.rpm

 
Added:
>
>
 If it warns about creating /etc/yum.repos.d/cern-frontier.repo.rpmnew, then move that file into place:
Changed:
<
<
    # mv /etc/yum.repos.d/cern-frontier.repo.rpmnew /etc/yum.repos.d/cern-frontier.repo

>
>
    # mv /etc/yum.repos.d/cern-frontier.repo.rpmnew /etc/yum.repos.d/cern-frontier.repo

 

Next, install the package with the following command:

Changed:
<
<
    # yum install frontier-squid

>
>
    # yum install frontier-squid

 

Set it up to start at boot time with this command:

Changed:
<
<
    # chkconfig frontier-squid on

>
>
    # chkconfig frontier-squid on

 

Configuration

Line: 80 to 78
 Custom configuration is done in /etc/squid/customize.sh. That script invokes functions that edit a supplied default squid.conf source file to generate the final squid.conf that squid sees when it runs. Comments in the default installation of customize.sh give more details on what can be done with it. Whenever /etc/init.d/frontier-squid runs it generates a new squid.conf if customize.sh has been modified.

It is very important for security that squid not be allowed to proxy requests from everywhere to everywhere. The default customize.sh allows incoming connections only from standard private network addresses and allows outgoing connections to anywhere. If the machines that will be using squid are not on a private network, change customize.sh to include the network/maskbits for your network. For example:

Changed:
<
<
    setoption("acl NET_LOCAL src", "131.154.0.0/16")

>
>
    setoption("acl NET_LOCAL src", "131.154.0.0/16")

 

The script allows specifying many subnets - just separate them by a blank. If you would like to limit the outgoing connections please see the section below on restricting the destination.

Line: 89 to 86
 If you want to, you can change the cache_mem option to set the size squid reserves for caching small objects in memory, but don't make it more than 1/8th of your hardware memory. The default 128 MB should be fine, leaving a lot of memory for disk buffering by the OS, because squid performs better for large objects in disk cache buffers than in its own internal memory cache.

Change the size of the cache_dir (the third parameter) to your desired size in MB. The default is only 10 GB which is rather stingy. For example, for 100 GB set it to this:

Changed:
<
<
    setoptionparameter("cache_dir", 3, "100000")

>
>
    setoptionparameter("cache_dir", 3, "100000")

 

Now that the configuration is set up, start squid with this command:

Changed:
<
<
    # service frontier-squid start

>
>
    # service frontier-squid start

 

 To have a change to customize.sh take effect while squid is running, run the following command:

Changed:
<
<
    # service frontier-squid reload

>
>
    # service frontier-squid reload

 

Moving disk cache and logs to a non-standard location

Often the filesystems containing the default locations for the disk cache (/var/cache/squid) and logs (/var/log/squid) isn't large enough and there's more space available in another filesystem. To move them to a new location, simply change the directories into symbolic links to the new locations while the service is stopped. Make sure the new directories are created and writable by the user id that squid is running under. For example if /data is a separate filesystem:

Changed:
<
<
    # service frontier-squid stop

>
>
    # service frontier-squid stop

  # mv /var/log/squid /data/squid_logs
  # ln -s /data/squid_logs /var/log/squid
  # rm -rf /var/cache/squid/*
Line: 117 to 110
 

Alternatively, instead of creating symbolic links you can set the cache_log and coredump_dir options, the second parameter of the cache_dir option, and the first parameter of the access_log option in /etc/squid/customize.sh. For example:

Changed:
<
<
    setoption("cache_log", "/data/squid_logs/cache.log")

>
>
    setoption("cache_log", "/data/squid_logs/cache.log")

  setoption("coredump_dir", "/data/squid_cache") setoptionparameter("cache_dir", 2, "/data/squid_cache") setoptionparameter("access_log", 1, "/data/squid_logs/access.log")
Line: 127 to 119
 

Changing the size of log files retained

The access.log is rotated each night, and also if it is over a given size (default 1 GB) when it checks each hour. You can change that value by exporting the environment variable LARGE_ACCESS_LOG in /etc/sysconfig/frontier-squid to a different number of bytes. For example for 10GB each you can use:

Changed:
<
<
    export LARGE_ACCESS_LOG=10000000000

>
>
    export LARGE_ACCESS_LOG=10000000000

 
Added:
>
>
 In order to estimate disk usage, note that up to 11 access.log files are kept at a time, and the size can go a bit above the $LARGE_ACCESS_LOG size because the cron job only checks once per hour.

As an alternative to setting the maximum size of each log file, you can leave each log file at the default size and change the number of log files retained; for example, for 51 files (about 50GB total space) set the following in /etc/squid/customize.sh:

Line: 133 to 125
 In order to estimate disk usage, note that up to 11 access.log files are kept at a time, and the size can go a bit above the $LARGE_ACCESS_LOG size because the cron job only checks once per hour.

As an alternative to setting the maximum size of each log file, you can leave each log file at the default size and change the number of log files retained; for example, for 51 files (about 50GB total space) set the following in /etc/squid/customize.sh:

Changed:
<
<
    setoption("logfile_rotate", "50")

>
>
    setoption("logfile_rotate", "50")

 

It is highly recommended to keep at least 3 days worth of logs, so that problems that happen on a weekend can be investigated during working hours. If you really do not have enough disk space for logs, the log can be disabled with the following in /etc/squid/customize.sh:

Changed:
<
<
    setoption("access_log", "none")

>
>
    setoption("access_log", "none")

 
Added:
>
>
 Then after doing service frontier-squid reload (or service frontier-squid start if squid was stopped) remember to remove all the old access.log* files.

Enabling monitoring

Line: 157 to 148
  Test access to a Frontier server at CERN with the following commands:
Changed:
<
<
    $ chmod +x fnget.py #(only first time)

>
>
    $ chmod +x fnget.py #(only first time)

  $ ./fnget.py --url=http://cmsfrontier.cern.ch:8000/FrontierProd/Frontier --sql="select 1 from dual"
Line: 200 to 190
  Now to test your squid, replace yoursquid.your.domain in the following command with the name of your squid machine
Changed:
<
<
    $ export http_proxy=http://yoursquid.your.domain:3128

>
>
    $ export http_proxy=http://yoursquid.your.domain:3128

 

and perform the fnget.py test twice again. It should pass through your squid, and cache the response. To confirm that it worked, look at the squid access log (in /var/log/squid/access.log if you haven't moved it). The following is an excerpt:

Line: 218 to 207
 Error messages are written to cache.log (in /var/log/squid if you haven't moved it) and are generally either self-explanatory or an explanation can be found with google.

Logs of every access are written to access.log (also in /var/log/squid if you haven't moved it) and the default frontier-squid format contains these fields:

Changed:
<
<
  1. Source IP address
  2. User name from ident if any (usually just a dash)
  3. User name from SSL if any (usually just a dash)
  4. Date/timestamp query finished in local time, and +0000, surrounded by square brackets
  5. The request method, URL, and protocol version, all surrounded by double quotes
  6. The http status (result) code
  7. Reply size including http headers
  8. Squid request status (e.g. TCP_MISS) and heirarchy status (e.g. DEFAULT_PARENT) separated by a colon
  9. Response time in milliseconds
  10. The contents of the X-Frontier-Id header or a dash if none, surrounded by double quotes
  11. The contents of the Referer header or a dash if none, surrounded by double quotes
  12. The contents of the User-Agent header or a dash if none, surrounded by double quotes
>
>
  1. Source IP address
  2. User name from ident if any (usually just a dash)
  3. User name from SSL if any (usually just a dash)
  4. Date/timestamp query finished in local time, and +0000, surrounded by square brackets
  5. The request method, URL, and protocol version, all surrounded by double quotes
  6. The http status (result) code
  7. Reply size including http headers
  8. Squid request status (e.g. TCP_MISS) and hierarchy status (e.g. DEFAULT_PARENT) separated by a colon
  9. Response time in milliseconds
  10. The contents of the X-Frontier-Id header or a dash if none, surrounded by double quotes
  11. The contents of the Referer header or a dash if none, surrounded by double quotes
  12. The contents of the User-Agent header or a dash if none, surrounded by double quotes
 

Common issues

Line: 244 to 233
 

Inability to reach full network throughput

If you have a CPU that can't quite keep up with full network throughput, we have found that up to an extra 15% throughput can be achieved by binding the single-threaded squid process to a single core, to maximize use of the per-core on-chip caches. This is not enabled by default, but you can enable it by putting the following in /etc/sysconfig/frontier-squid:

Changed:
<
<
    export SETSQUIDAFFINITY=true

>
>
    export SETSQUIDAFFINITY=true

 

If that little boost isn't enough, try running multiple squid processes on the same machine.

Line: 256 to 244
  The default behavior is to allow the squid to be used for any destination. If you want to restrict the squid to be used only for CMS Conditions Data, then you simply have to add two lines to customize.sh that enable a couple of lines in squid.conf which are already there commented out:
Changed:
<
<
    uncomment("acl RESTRICT_DEST")

>
>
    uncomment("acl RESTRICT_DEST")

  uncomment("http_access deny RESTRICT_DEST")

If for some reason you want to have a different destination or destinations you can use a regular expression, for example:

Changed:
<
<
    setoptionparameter("acl RESTRICT_DEST", 3, "^(((cms|atlas).*frontier.*)\\.cern\\.ch)|frontier.*\\.racf\\.bnl\\.gov$")

>
>
    setoptionparameter("acl RESTRICT_DEST", 3, "^(((cms|atlas).*frontier.*)\\.cern\\.ch)|frontier.*\\.racf\\.bnl\\.gov$")

  uncomment("http_access deny RESTRICT_DEST")

Once you have restricted the destination, it isn't so important anymore to restrict the source. If you want to leave it unrestricted you can change the NET_LOCAL acl to 0.0.0.0/0:

Changed:
<
<
    setoption("acl NET_LOCAL src", "0.0.0.0/0")

>
>
    setoption("acl NET_LOCAL src", "0.0.0.0/0")

 

Running multiple squid processes on the same machine

Line: 290 to 274
 

Having squid listen on a privileged port

This package runs squid strictly as an unprivileged user, so it is unable to open a privileged TCP port less than 1024. The recommended way to handle that is to have squid listen on an unprivileged port and use iptables to forward a privileged port to the unprivileged port. For example, to forward port 80 to port 8000, use this:

Changed:
<
<
    # iptables -A PREROUTING -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 8000

>
>
    # iptables -A PREROUTING -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 8000

 
Added:
>
>
 You can change the port that squid listens on with this in /etc/squid/customize.sh:
Changed:
<
<
    setoption("http_port","8000")

>
>
    setoption("http_port","8000")

 

Personal squid on a desktop/laptop

Line: 309 to 292
  If you want to be able to run a laptop disconnected from the network, add the following to customize.sh:
Changed:
<
<
      setoption("cachemgr_passwd", "none offline_toggle")

>
>
      setoption("cachemgr_passwd", "none offline_toggle")

 

Then, load up the cache by running your user job once while the network is attached, and run the following command once:

Changed:
<
<
      squidclient mgr:offline_toggle

>
>
      squidclient mgr:offline_toggle

 
Added:
>
>
 It should report "offline_mode is now ON" which will prevent cached items from expiring. Then as long as everything was preloaded and the laptop doesn't reboot (because starting squid normally clears the cache) you should be able to re-use the cached data. You can switch back to normal mode with the same command or by stopping and starting squid.

To prevent clearing the cache on start, put the following in /etc/sysconfig/frontier-squid:

Line: 320 to 302
 It should report "offline_mode is now ON" which will prevent cached items from expiring. Then as long as everything was preloaded and the laptop doesn't reboot (because starting squid normally clears the cache) you should be able to re-use the cached data. You can switch back to normal mode with the same command or by stopping and starting squid.

To prevent clearing the cache on start, put the following in /etc/sysconfig/frontier-squid:

Changed:
<
<
    export SQUID_CLEAN_CACHE_ON_START=false

>
>
    export SQUID_CLEAN_CACHE_ON_START=false

 
Deleted:
<
<
 Responsible: DaveDykstra

Revision 292013-05-09 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 317 to 317
 
      squidclient mgr:offline_toggle
Changed:
<
<
It should report "offline_mode is now ON" which will prevent cached items from expiring. Then as long as everything was preloaded and the laptop doesn't reboot (because starting squid clears the cache) you should be able to re-use the cached data. You can switch back to normal mode with the same command or by stopping and starting squid.
>
>
It should report "offline_mode is now ON" which will prevent cached items from expiring. Then as long as everything was preloaded and the laptop doesn't reboot (because starting squid normally clears the cache) you should be able to re-use the cached data. You can switch back to normal mode with the same command or by stopping and starting squid.

To prevent clearing the cache on start, put the following in /etc/sysconfig/frontier-squid:

    export SQUID_CLEAN_CACHE_ON_START=false
 

Responsible: DaveDykstra

Revision 282013-05-09 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 47 to 47
 The instructions below are for the latest version of the frontier-squid rpm on a Scientific Linux version 5 or 6 based system. The rpm is based on the frontier-squid source tarball, and there are also instructions for installing directly from the frontier-squid tarball available. Please see the tarball Release Notes and rpm Release Notes for details on what has changed in recent versions. If, for some reason, you prefer to use a non frontier-squid distribution of squid, see MyOwnSquid.

Preparation

Changed:
<
<
By default the frontier-squid rpm installs files with a "squid" user id and group. If they do not exist, the rpm will create them. If your system has its own means of creating logins you should create the login and group before installing the rpm. If you want the squid process to use a different user id (historically it has been "dbfrontier"), then before installing the rpm create the file /etc/squid/squidconf with the following contents:
>
>
By default the frontier-squid rpm installs files with a "squid" user id and group. If they do not exist, the rpm will create them. If your system has its own means of creating logins you should create the login and group before installing the rpm. If you want the squid process to use a different user id (historically it has been "dbfrontier"), then for example before installing the rpm create the file /etc/squid/squidconf with the following contents:
 
    export FRONTIER_USER=dbfrontier

Changed:
<
<
export FRONTIER_GROUP=users
>
>
export FRONTIER_GROUP=dbfrontier
  where you can fill in whichever user and group id you choose.
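If you do pre-create the login and group, the exact method is up to your site; a minimal sketch, using the historical "dbfrontier" name from the example above (the nologin shell is just an illustrative choice):

    # groupadd dbfrontier
    # useradd -g dbfrontier -s /sbin/nologin -c "frontier-squid service account" dbfrontier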

Revision 272013-04-12 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Changed:
<
<
The frontier-squid software package is a patched version of the standard squid http proxy cache software, pre-configured for use by the Frontier distributed database caching system. This installation is recommended for use by Frontier in the LHC CMS & ATLAS projects, and also works well with the CernVM FileSystem.
>
>
The frontier-squid software package is a patched version of the standard squid http proxy cache software, pre-configured for use by the Frontier distributed database caching system. This installation is recommended for use by Frontier in the LHC CMS & ATLAS projects, and also works well with the CernVM FileSystem. Many people also use it for other applications as well; if you have any questions or comments about general use of this package contact frontier-talk@cern.ch.
  Note to Open Science Grid users: this same package is also available from the Open Science Grid so it may be more convenient to you to follow the OSG frontier-squid installation instructions.

If you have any problems with the software or installation, or would like to suggest an improvement to the documentation, please submit a support request to the Frontier development Savannah project.

Changed:
<
<
For rapid response to configuration questions, send e-mail to: cms-frontier-support@cern.ch or atlas-frontier-support@cern.ch.
>
>
For rapid response to configuration questions, send e-mail to cms-frontier-support@cern.ch or atlas-frontier-support@cern.ch. Most questions can also be answered on the user's mailing list frontier-talk@cern.ch.
  After completing a squid installation and configuration, CMS users should follow these further instructions for CMS squids and ATLAS users should submit a ticket to the ATLAS database operations Savannah project with your site and squid machine name asking to set up further configuration and monitoring.

Revision 262013-02-12 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Changed:
<
<
The frontier-squid software is a patched version of the standard squid http proxy cache software, pre-configured for use by the Frontier distributed database caching system. This installation is recommended for use by Frontier in the LHC CMS & ATLAS projects, and also works well with the CernVM FileSystem.
>
>
The frontier-squid software package is a patched version of the standard squid http proxy cache software, pre-configured for use by the Frontier distributed database caching system. This installation is recommended for use by Frontier in the LHC CMS & ATLAS projects, and also works well with the CernVM FileSystem.

Note to Open Science Grid users: this same package is also available from the Open Science Grid so it may be more convenient to you to follow the OSG frontier-squid installation instructions.

  If you have any problems with the software or installation, or would like to suggest an improvement to the documentation, please submit a support request to the Frontier development Savannah project.

Revision 252013-02-08 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 56 to 56
  First, if you have not installed any frontier rpm before, execute the following command as the root user:

Changed:
<
<
# rpm -Uvh --replacefiles http://frontier.cern.ch/dist/rpms/RPMS/noarch/frontier-release-1.0-1.noarch.rpm
>
>
# rpm -Uvh http://frontier.cern.ch/dist/rpms/RPMS/noarch/frontier-release-1.0-1.noarch.rpm
If it warns about creating /etc/yum.repos.d/cern-frontier.repo.rpmnew, then move that file into place:
    # mv /etc/yum.repos.d/cern-frontier.repo.rpmnew /etc/yum.repos.d/cern-frontier.repo

 
Deleted:
<
<
(If you are not upgrading a previous installation done without the frontier-release package you may leave out --replacefiles).
  Next, install the package with the following command:

Revision 242013-02-07 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 100 to 100
 

Moving disk cache and logs to a non-standard location

Changed:
<
<
Often the filesystems containing the default locations for the disk cache (/var/cache/squid) and logs (/var/log/squid) isn't large enough and there's more space available in another filesystem. To move them to a new location, set the cache_log and coredump_dir options, the second parameter of the cache_dir option, and the first parameter of the access_log option. For example:
>
>
Often the filesystems containing the default locations for the disk cache (/var/cache/squid) and logs (/var/log/squid) isn't large enough and there's more space available in another filesystem. To move them to a new location, simply change the directories into symbolic links to the new locations while the service is stopped. Make sure the new directories are created and writable by the user id that squid is running under. For example if /data is a separate filesystem:
 

Changed:
<
<
setoptionparameter("cache_dir", 2, "/data/squid_cache") setoptionparameter("access_log", 1, "/data/squid_logs/access.log") setoption("cache_log", "/data/squid_logs/cache.log") setoption("coredump_dir", "/data/squid_cache")
>
>
# service frontier-squid stop
# mv /var/log/squid /data/squid_logs
# ln -s /data/squid_logs /var/log/squid
# rm -rf /var/cache/squid/*
# mv /var/cache/squid /data/squid_cache
# ln -s /data/squid_cache /var/cache/squid
# service frontier-squid start
 
Changed:
<
<
Pre-create the new cache and logs directory and make them writable by the user id that squid is running under.

Don't forget to reload the configuration

>
>
Alternatively, instead of creating symbolic links you can set the cache_log and coredump_dir options, the second parameter of the cache_dir option, and the first parameter of the access_log option in /etc/squid/customize.sh. For example:
 

Changed:
<
<
# service frontier-squid reload
and then remove everything from the old directories
    # rm -rf /var/squid/cache /var/log/squid

>
>
setoption("cache_log", "/data/squid_logs/cache.log") setoption("coredump_dir", "/data/squid_cache") setoptionparameter("cache_dir", 2, "/data/squid_cache") setoptionparameter("access_log", 1, "/data/squid_logs/access.log")
 

Changing the size of log files retained

Revision 232013-02-07 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 116 to 116
  and then remove everything from the old directories

Changed:
<
<
# rm -rf /var/squid/cache/* /var/log/squid/*
>
>
# rm -rf /var/squid/cache /var/log/squid
 
Deleted:
<
<
(If you remove the directories themselves then rpm -V frontier-squid will report them missing).
 

Changing the size of log files retained

Revision 222013-02-05 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 100 to 100
 

Moving disk cache and logs to a non-standard location

Changed:
<
<
Often the filesystems containing the default locations for the disk cache (/var/cache/squid) and logs (/var/log/squid) isn't large enough and there's more space available in another filesystem. To move them to a new location, set the cache_log, pid_filename, and coredump_dir options and the second parameter of the cache_dir option and the first parameter of the access_log option. For example:
>
>
Often the filesystems containing the default locations for the disk cache (/var/cache/squid) and logs (/var/log/squid) isn't large enough and there's more space available in another filesystem. To move them to a new location, set the cache_log and coredump_dir options, the second parameter of the cache_dir option, and the first parameter of the access_log option. For example:
 
    setoptionparameter("cache_dir", 2, "/data/squid_cache")
    setoptionparameter("access_log", 1, "/data/squid_logs/access.log")
    setoption("cache_log", "/data/squid_logs/cache.log")

Deleted:
<
<
setoption("pid_filename", "/data/squid_logs/squid.pid")
  setoption("coredump_dir", "/data/squid_cache")

Pre-create the new cache and logs directory and make them writable by the user id that squid is running under.

Changed:
<
<
Because the location of the pid_filename changes, and the stop command uses that value, it is best to first stop squid with
>
>
Don't forget to reload the configuration
 

Changed:
<
<
# service frontier-squid stop
then edit customize.sh, and then start it again with
    # service frontier-squid start

>
>
# service frontier-squid reload
 
Changed:
<
<
Don't forget to remove the old directories
>
>
and then remove everything from the old directories
 

Changed:
<
<
# rm -rf /var/squid/cache /var/log/squid
>
>
# rm -rf /var/squid/cache/* /var/log/squid/*
 
Added:
>
>
(If you remove the directories themselves then rpm -V frontier-squid will report them missing).
 

Changing the size of log files retained

Revision 212013-02-04 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 56 to 56
  First, if you have not installed any frontier rpm before, execute the following command as the root user:

Changed:
<
<
# rpm -Uvh --replacefiles http://frontier.cern.ch/dist/rpms/debug/RPMS/noarch/frontier-release-1.0-1.noarch.rpm
>
>
# rpm -Uvh --replacefiles http://frontier.cern.ch/dist/rpms/RPMS/noarch/frontier-release-1.0-1.noarch.rpm
  (If you are not upgrading a previous installation done without the frontier-release package you may leave out --replacefiles).

Revision 202013-02-04 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 67 to 67
  Set it up to start at boot time with this command:

Changed:
<
<
# chkconfig --add frontier-squid
>
>
# chkconfig frontier-squid on
 

Configuration

Revision 192013-01-31 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 286 to 286
  If you want to revert to a single squid, reverse the above process including cleaning up the corresponding log directories, /var/run/squid subdirectories, and the generated configuration files.
Added:
>
>
NOTE: this feature is not supported for reverse proxies because it uses the squid.conf http_port option.

Having squid listen on a privileged port

This package runs squid strictly as an unprivileged user, so it is unable to open a privileged TCP port less than 1024. The recommended way to handle that is to have squid listen on an unprivileged port and use iptables to forward a privileged port to the unprivileged port. For example, to forward port 80 to port 8000, use this:

    # iptables -A PREROUTING -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 8000
You can change the port that squid listens on with this in /etc/squid/customize.sh:
    setoption("http_port","8000")
 

Personal squid on a desktop/laptop

If you want to install a Frontier squid on your personal desktop or laptop, just follow the same instructions as under Software above, except:

Revision 182013-01-28 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 132 to 131
 
    export LARGE_ACCESS_LOG=10000000000
Deleted:
<
<
 In order to estimate disk usage, note that up to 11 access.log files are kept at a time, and the size can go a bit above the $LARGE_ACCESS_LOG size because the cron job only checks once per hour.
Added:
>
>
As an alternative to setting the maximum size of each log file, you can leave each log file at the default size and change the number of log files retained; for example, for 51 files (about 50GB total space) set the following in /etc/squid/customize.sh:
    setoption("logfile_rotate", "50")
 It is highly recommended to keep at least 3 days worth of logs, so that problems that happen on a weekend can be investigated during working hours. If you really do not have enough disk space for logs, the log can be disabled with the following in /etc/squid/customize.sh:
    setoption("access_log", "none")

Revision 172013-01-25 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 65 to 65
  # yum install frontier-squid
Added:
>
>
Set it up to start at boot time with this command:
    # chkconfig --add frontier-squid
 

Configuration

Custom configuration is done in /etc/squid/customize.sh. That script invokes functions that edit a supplied default squid.conf source file to generate the final squid.conf that squid sees when it runs. Comments in the default installation of customize.sh give more details on what can be done with it. Whenever /etc/init.d/frontier-squid runs it generates a new squid.conf if customize.sh has been modified.
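For illustration, the lines you add to customize.sh are simply calls to the edit functions described on this page; a sketch of a typical set, where the source network and cache sizes are placeholders that each site should adjust:

    setoption("acl NET_LOCAL src", "131.154.0.0/16")
    setoption("cache_mem", "128 MB")
    setoptionparameter("cache_dir", 3, "100000")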

Line: 83 to 88
  setoptionparameter("cache_dir", 3, "100000")
Changed:
<
<
To have a change to customize.sh take effect while squid is running, run the following command as root:
>
>
Now that the configuration is set up, start squid with this command:
    # service frontier-squid start

To have a change to customize.sh take effect while squid is running, run the following command:

 
    # service frontier-squid reload

Revision 162013-01-23 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 192 to 192
  $ export http_proxy=http://yoursquid.your.domain:3128
Changed:
<
<
and perform the fnget.py test twice again. It should pass through your squid, and cache the response. To confirm that it worked, look at the squid access log (in /var/log/squid/access.log if you haven't moved it). The following is excerpted from an access.log file:
>
>
and perform the fnget.py test twice again. It should pass through your squid, and cache the response. To confirm that it worked, look at the squid access log (in /var/log/squid/access.log if you haven't moved it). The following is an excerpt:
 
    128.220.233.179 - - [22/Jan/2013:08:33:17 +0000] "GET http://cmsfrontier.cern.ch:8000/FrontierProd/Frontier?type=frontier_request:1:DEFAULT&encoding=BLOBzip&p1=eNorTs1JTS5RMFRIK8rPVUgpTcwBAD0rBmw_ HTTP/1.0" 200 810 TCP_MISS:DIRECT 461 "fnget.py 1.5" "-" "Python-urllib/2.6"
Line: 201 to 201
  Notice the second entry has a "TCP_MEM_HIT", that means the object was cached in the memory. Any subsequent requests for this object will come from the squid cache until the cached item expires.
Added:
>
>

Log file contents

Error messages are written to cache.log (in /var/log/squid if you haven't moved it) and are generally either self-explanatory or an explanation can be found with google.

Logs of every access are written to access.log (also in /var/log/squid if you haven't moved it) and the default frontier-squid format contains these fields:

  1. Source IP address
  2. User name from ident if any (usually just a dash)
  3. User name from SSL if any (usually just a dash)
  4. Date/timestamp query finished in local time, and +0000, surrounded by square brackets
  5. The request method, URL, and protocol version, all surrounded by double quotes
  6. The http status (result) code
  7. Reply size including http headers
  8. Squid request status (e.g. TCP_MISS) and hierarchy status (e.g. DEFAULT_PARENT) separated by a colon
  9. Response time in milliseconds
  10. The contents of the X-Frontier-Id header or a dash if none, surrounded by double quotes
  11. The contents of the Referer header or a dash if none, surrounded by double quotes
  12. The contents of the User-Agent header or a dash if none, surrounded by double quotes
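Because the columns up to the request status are positionally fixed when split on whitespace (provided the URL contains no embedded spaces, which is normally the case), simple command-line tools can summarize the log. For example, a tally of the squid request status codes, assuming the default log location:

    $ awk '{print $11}' /var/log/squid/access.log | sort | uniq -c | sort -rn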
 

Common issues

Revision 152013-01-22 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 192 to 192
  $ export http_proxy=http://yoursquid.your.domain:3128
Changed:
<
<
and perform the wget test again. It should pass through your squid, and cache the response. To see if it worked, look at the squid access log. The following is excerpted from an access.log file:
>
>
and perform the fnget.py test twice again. It should pass through your squid, and cache the response. To confirm that it worked, look at the squid access log (in /var/log/squid/access.log if you haven't moved it). The following is excerpted from an access.log file:
 
Changed:
<
<
128.220.233.179 - - [12/May/2009:08:33:17 +0000] "GET http://cmsfrontier.cern.ch:8000/FrontierProd/Frontier?type=frontier_request:1:DEFAULT&encoding=BLOBzip&p1=eNorTs1JTS5RMFRIK8rPVUgpTcwBAD0rBmw= HTTP/1.0" 200 810 TCP_MISS:DIRECT 461 "fnget.py 1.5" 128.220.233.179 - - [12/May/2009:08:33:19 +0000] "GET http://cmsfrontier.cern.ch:8000/FrontierProd/Frontier?type=frontier_request:1:DEFAULT&encoding=BLOBzip&p1=eNorTs1JTS5RMFRIK8rPVUgpTcwBAD0rBmw= HTTP/1.0" 200 809 TCP_MEM_HIT:NONE - "fnget.py 1.5"
>
>
128.220.233.179 - - [22/Jan/2013:08:33:17 +0000] "GET http://cmsfrontier.cern.ch:8000/FrontierProd/Frontier?type=frontier_request:1:DEFAULT&encoding=BLOBzip&p1=eNorTs1JTS5RMFRIK8rPVUgpTcwBAD0rBmw_ HTTP/1.0" 200 810 TCP_MISS:DIRECT 461 "fnget.py 1.5" "-" "Python-urllib/2.6" 128.220.233.179 - - [22/Jan/2013:08:33:19 +0000] "GET http://cmsfrontier.cern.ch:8000/FrontierProd/Frontier?type=frontier_request:1:DEFAULT&encoding=BLOBzip&p1=eNorTs1JTS5RMFRIK8rPVUgpTcwBAD0rBmw_ HTTP/1.0" 200 809 TCP_MEM_HIT:NONE 0 "fnget.py 1.5" "-" "Python-urllib/2.6"
 

Notice the second entry has a "TCP_MEM_HIT", that means the object was cached in the memory. Any subsequent requests for this object will come from the squid cache until the cached item expires.

Line: 250 to 250
 Multiple squids can be enabled very simply by doing these steps:
  • Stop frontier-squid and remove the old cache and logs
  • Create subdirectories under your cache directory called 'squid0', 'squid1', up to 'squidN-1' for N squids, making sure they are writable by the user id that your squid runs under
Changed:
<
<
  • Start frontier-squid again. This will automatically detect the extra subdirectories, create the corresponding log directories, and start that number of squid processes. It also assigns each squid process to a particular core as described above. It generates a separate squid configuration file for each process in /etc/squid/.squid-N.conf.
>
>
  • Start frontier-squid again. This will automatically detect the extra subdirectories and start that number of squid processes. It will create corresponding log subdirectories and /var/run/squid subdirectories, and generate a separate squid configuration file for each process in /etc/squid/.squid-N.conf. It will also assign each squid process to a particular core as described above.
  When running multiple squids, all of the memory & disk usage is multiplied by the number of squids. For example, if you choose a cache_dir size of 100GB, running 3 squids will require 300GB for cache space. All the squids listen on the same port and take turns handling requests. Only squid0 will contact the upstream servers; the others forward requests to squid0.
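As a concrete sketch of those steps for three squids, assuming the default /var/cache/squid and /var/log/squid locations and the default "squid" user id:

    # service frontier-squid stop
    # rm -rf /var/cache/squid/* /var/log/squid/*
    # mkdir /var/cache/squid/squid0 /var/cache/squid/squid1 /var/cache/squid/squid2
    # chown squid:squid /var/cache/squid/squid0 /var/cache/squid/squid1 /var/cache/squid/squid2
    # service frontier-squid start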
Changed:
<
<
If you want to revert to a single squid, reverse the above process including cleaning up the corresponding log directories and the generated configuration files.
>
>
If you want to revert to a single squid, reverse the above process including cleaning up the corresponding log directories, /var/run/squid subdirectories, and the generated configuration files.
 

Personal squid on a desktop/laptop

Revision 142013-01-22 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 247 to 247
  If you have either a particularly slow machine or a high amount of bandwidth available, you may not be able to get full network throughput out of a single squid process. For example, our measurements with a 10 gigabit interface on a 2010-era machine with 8 cores at 2.27Ghz showed that 3 squids were required for full throughput.
Changed:
<
<
Multiple squids can be enabled very simply by this process:
>
>
Multiple squids can be enabled very simply by doing these steps:
 
  • Stop frontier-squid and remove the old cache and logs
  • Create subdirectories under your cache directory called 'squid0', 'squid1', up to 'squidN-1' for N squids, making sure they are writable by the user id that your squid runs under
Changed:
<
<
  • Start frontier-squid again. This will automatically detect the extra subdirectories, create the corresponding log directories, and start that number of squid processes. It also assigns each squid process to a particular core as described above.
>
>
  • Start frontier-squid again. This will automatically detect the extra subdirectories, create the corresponding log directories, and start that number of squid processes. It also assigns each squid process to a particular core as described above. It generates a separate squid configuration file for each process in /etc/squid/.squid-N.conf.
  When running multiple squids, all of the memory & disk usage is multiplied by the number of squids. For example, if you choose a cache_dir size of 100GB, running 3 squids will require 300GB for cache space. All the squids listen on the same port and take turns handling requests. Only squid0 will contact the upstream servers; the others forward requests to squid0.
Added:
>
>
If you want to revert to a single squid, reverse the above process including cleaning up the corresponding log directories and the generated configuration files.
 

Personal squid on a desktop/laptop

If you want to install a Frontier squid on your personal desktop or laptop, just follow the same instructions as under Software above, except:

Revision 132013-01-02 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 135 to 135
  The functionality and performance of your squid should be monitored from CERN using SNMP.
Changed:
<
<
To enable this, your site should open incoming firewall(s) to allow UDP requests to port 3401 from 128.142.202.212/32 (or preferably cmsdbsfrontier.cern.ch if you can use a DNS name). If you run multiple squid processes, each one will need to be separately monitored. They listen on increasing port numbers, the first one on port 3401, the second on 3402, etc.
>
>
To enable this, your site should open incoming firewall(s) to allow UDP requests to port 3401 from 128.142.0.0/16 and 188.185.0.0/17. If you run multiple squid processes, each one will need to be separately monitored. They listen on increasing port numbers, the first one on port 3401, the second on 3402, etc.
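How the firewall is opened is site-specific; as one sketch, on a host firewall managed directly with iptables you could insert rules like the following (substitute the address ranges currently recommended for the monitoring servers):

    # iptables -I INPUT -p udp --dport 3401 -s 128.142.0.0/16 -j ACCEPT
    # iptables -I INPUT -p udp --dport 3401 -s 188.185.0.0/17 -j ACCEPT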
  The main monitoring site is at http://frontier.cern.ch/squidstats/.

Revision 122012-12-26 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 65 to 65
  # yum install frontier-squid
Deleted:
<
<
The first time you use yum to install any frontier package it will prompt you to import the GPG key; answer 'y'.
 

Configuration

Custom configuration is done in /etc/squid/customize.sh. That script invokes functions that edit a supplied default squid.conf source file to generate the final squid.conf that squid sees when it runs. Comments in the default installation of customize.sh give more details on what can be done with it. Whenever /etc/init.d/frontier-squid runs it generates a new squid.conf if customize.sh has been modified.

Revision 112012-12-26 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 56 to 56
  First, if you have not installed any frontier rpm before, execute the following command as the root user:

Changed:
<
<
# wget -O /etc/yum.repos.d/cern-frontier.repo http://frontier.cern.ch/dist/rpms/cern-frontier.repo
>
>
# rpm -Uvh --replacefiles http://frontier.cern.ch/dist/rpms/debug/RPMS/noarch/frontier-release-1.0-1.noarch.rpm
 
Added:
>
>
(If you are not upgrading a previous installation done without the frontier-release package you may leave out --replacefiles).
 
Changed:
<
<
Install the package with the following command:
>
>
Next, install the package with the following command:
 
    # yum install frontier-squid

Revision 102012-12-13 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

The frontier-squid software is a patched version of the standard squid http proxy cache software, pre-configured for use by the Frontier distributed database caching system. This installation is recommended for use by Frontier in the LHC CMS & ATLAS projects, and also works well with the CernVM FileSystem.

Changed:
<
<
If you have any problems with the software or installation, or would like to suggest an improvement to the documentation, please submit a support request to the Savannah frontierdev project.
>
>
If you have any problems with the software or installation, or would like to suggest an improvement to the documentation, please submit a support request to the Frontier development Savannah project.
  For rapid response to configuration questions, send e-mail to: cms-frontier-support@cern.ch or atlas-frontier-support@cern.ch.
Changed:
<
<
After completing a squid installation and configuration, CMS users should follow these further instructions for CMS squids and ATLAS users should send an email to atlas-frontier-support@cern.ch with your site and squid machine name asking to set up further configuration and monitoring.
>
>
After completing a squid installation and configuration, CMS users should follow these further instructions for CMS squids and ATLAS users should submit a ticket to the ATLAS database operations Savannah project with your site and squid machine name asking to set up further configuration and monitoring.
  Here is what is on this page:

Revision 92012-11-02 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 30 to 30
  3) What network specs?
Changed:
<
<
The latencies will be lower to the worker nodes if you have a large bandwidth. The network is almost always the bottleneck for this system, so at least a gigabit each is highly recommended. If you have many job slots, 2 bonded gigabit network connections is even better, and squid on one core of a modern CPU can pretty much keep up with 2 gigabits. Squid is single-threaded so if you're able to supply more than 2 gigabits, multiple squid processes on the same machine need to be used to serve the full throughput. This is supported in the frontier-squid package (instructions below) but each squid needs its own memory and disk space.
>
>
The latencies will be lower to the worker nodes if you have a large bandwidth. The network is almost always the bottleneck for this system, so at least a gigabit each is highly recommended. If you have many job slots, 2 bonded gigabit network connections is even better, and squid on one core of a modern CPU can pretty much keep up with 2 gigabits. Squid is single-threaded so if you're able to supply more than 2 gigabits, multiple squid processes on the same machine need to be used to serve the full throughput. This is supported in the frontier-squid package (instructions below) but each squid needs its own memory and disk space.
  4) How many squids do I need?

Revision 82012-11-02 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Installing a Frontier local squid cache server

Line: 68 to 68
 

Configuration

Changed:
<
<
Custom configuration is done in /etc/squid/customize.sh. That script invokes functions that edit a supplied default squid.conf configuration to create squid.conf. Comments in the default installation of customize.sh give more details on what can be done with it. The edits are applied to squid.conf whenever /etc/init.d/frontier-squid is run.
>
>
Custom configuration is done in /etc/squid/customize.sh. That script invokes functions that edit a supplied default squid.conf source file to generate the final squid.conf that squid sees when it runs. Comments in the default installation of customize.sh give more details on what can be done with it. Whenever /etc/init.d/frontier-squid runs it generates a new squid.conf if customize.sh has been modified.
 
Changed:
<
<
It is very important for security that squid not be allowed to proxy requests from everywhere to everywhere. The default customize.sh allows incoming connections only from the standard private network addresses and allow outgoing connections to anywhere. If the machines that will be using squid are not on a private network, change customize.sh to include the network/maskbits for your network. For example:
>
>
It is very important for security that squid not be allowed to proxy requests from everywhere to everywhere. The default customize.sh allows incoming connections only from standard private network addresses and allows outgoing connections to anywhere. If the machines that will be using squid are not on a private network, change customize.sh to include the network/maskbits for your network. For example:
 
    setoption("acl NET_LOCAL src", "131.154.0.0/16")
Changed:
<
<
The script allows specifying many subnets - just separate them by a blank. If you would like a more restrictive policy or for other options please see the section below on restricting the destination.
>
>
The script allows specifying many subnets - just separate them by a blank. If you would like to limit the outgoing connections please see the section below on restricting the destination.
  If you want to, you can change the cache_mem option to set the size squid reserves for caching small objects in memory, but don't make it more than 1/8th of your hardware memory. The default 128 MB should be fine, leaving a lot of memory for disk buffering by the OS, because squid performs better for large objects in disk cache buffers than in its own internal memory cache.
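For example, raising it slightly on a machine with plenty of memory is a single customize.sh line; the value here is only an illustration and should stay within the one-eighth guideline:

    setoption("cache_mem", "256 MB")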

Revision 72012-11-01 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Changed:
<
<

Installing a Frontier Local Squid Cache Server

>
>

Installing a Frontier local squid cache server

  The frontier-squid software is a patched version of the standard squid http proxy cache software, pre-configured for use by the Frontier distributed database caching system. This installation is recommended for use by Frontier in the LHC CMS & ATLAS projects, and also works well with the CernVM FileSystem.
Line: 8 to 8
  For rapid response to configuration questions, send e-mail to: cms-frontier-support@cern.ch or atlas-frontier-support@cern.ch.
Added:
>
>
After completing a squid installation and configuration, CMS users should follow these further instructions for CMS squids and ATLAS users should send an email to atlas-frontier-support@cern.ch with your site and squid machine name asking to set up further configuration and monitoring.
 Here is what is on this page:

Line: 36 to 38
  5) How should squids be load-balanced?
Changed:
<
<
There are many ways to configure multiple squids: round-robin DNS, load-balancing networking hardware, LVS, etc. The simplest thing to do is just set up two or more squid machines independently and let Frontier handle it by making a small addition to the frontier client configuration (see below under Multiple Squid Servers). If there are many thousands of job slots, hardware-based load balancers can be easily overloaded, so DNS-based or client-based load balancing will probably be called for.
>
>
There are many ways to configure multiple squids: round-robin DNS, load-balancing networking hardware, LVS, etc. The simplest thing to do is just set up two or more squid machines independently and let Frontier handle it by making a small addition to the frontier client configuration to have the client do the load balancing (described for CMS in the section on multiple squid servers). If there are many thousands of job slots, hardware-based load balancers can be easily overloaded, so DNS-based or client-based load balancing will probably be called for.
 

Software

Line: 73 to 75
  setoption("acl NET_LOCAL src", "131.154.0.0/16")
Changed:
<
<
The script allows specifying many subnets - just separate them by a blank. If you would like a more restrictive policy or for other options please see the section below on restricting the destination.
>
>
The script allows specifying many subnets - just separate them by a blank. If you would like a more restrictive policy or for other options please see the section below on restricting the destination.
  If you want to, you can change the cache_mem option to set the size squid reserves for caching small objects in memory, but don't make it more than 1/8th of your hardware memory. The default 128 MB should be fine, leaving a lot of memory for disk buffering by the OS, because squid performs better for large objects in disk cache buffers than in its own internal memory cache.
Line: 134 to 136
  The functionality and performance of your squid should be monitored from CERN using SNMP.
Changed:
<
<
To enable this, your site should open incoming firewall(s) to allow UDP requests to port 3401 from 128.142.202.212/32 (or preferably cmsdbsfrontier.cern.ch if you can use a DNS name). If you run multiple squid processes, each one will need to be separately monitored. They listen on increasing port numbers, the first one on port 3401, the second on 3402, etc.
>
>
To enable this, your site should open incoming firewall(s) to allow UDP requests to port 3401 from 128.142.202.212/32 (or preferably cmsdbsfrontier.cern.ch if you can use a DNS name). If you run multiple squid processes, each one will need to be separately monitored. They listen on increasing port numbers, the first one on port 3401, the second on 3402, etc.
  The main monitoring site is at http://frontier.cern.ch/squidstats/.
Line: 218 to 220
  export SETSQUIDAFFINITY=true
Changed:
<
<
If that little boost isn't enough, try running multiple squid processes on the same machine.
>
>
If that little boost isn't enough, try running multiple squid processes on the same machine.
 

Alternate configurations

Line: 249 to 251
 Multiple squids can be enabled very simply by this process:
  • Stop frontier-squid and remove the old cache and logs
  • Create subdirectories under your cache directory called 'squid0', 'squid1', up to 'squidN-1' for N squids, making sure they are writable by the user id that your squid runs under
Changed:
<
<
  • Start frontier-squid again. This will automatically detect the extra subdirectories, create the corresponding log directories, and start that number of squid processes. It also assigns each squid process to a particular core as described above.
>
>
  • Start frontier-squid again. This will automatically detect the extra subdirectories, create the corresponding log directories, and start that number of squid processes. It also assigns each squid process to a particular core as described above.
  When running multiple squids, all of the memory & disk usage is multiplied by the number of squids. For example, if you choose a cache_dir size of 100GB, running 3 squids will require 300GB for cache space. All the squids listen on the same port and take turns handling requests. Only squid0 will contact the upstream servers; the others forward requests to squid0.
Changed:
<
<

Personal Squid on a Desktop/Laptop

>
>

Personal squid on a desktop/laptop

 
Changed:
<
<
If you want to install a Frontier Squid on your personal desktop or laptop, just follow the same instructions as under Software above, except:
>
>
If you want to install a Frontier squid on your personal desktop or laptop, just follow the same instructions as under Software above, except:
 
  • For the NET_LOCAL acl, use "127.0.0.1/32"
  • For the cache_dir size you can leave it at the default 10000 or even perhaps cut it down to 5000 if you want to.

Revision 62012-10-30 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Deleted:
<
<
THIS PAGE IS UNDER CONSTRUCTION
 

Installing a Frontier Local Squid Cache Server

The frontier-squid software is a patched version of the standard squid http proxy cache software, pre-configured for use by the Frontier distributed database caching system. This installation is recommended for use by Frontier in the LHC CMS & ATLAS projects, and also works well with the CernVM FileSystem.

Changed:
<
<
If you have any problems with the software or installation, submit a support request to the Savannah frontierdev project.
>
>
If you have any problems with the software or installation, or would like to suggest an improvement to the documentation, please submit a support request to the Savannah frontierdev project.
  For rapid response to configuration questions, send e-mail to: cms-frontier-support@cern.ch or atlas-frontier-support@cern.ch.
Added:
>
>
Here is what is on this page:
 

Hardware

Line: 40 to 40
 

Software

Changed:
<
<
The instructions below are for the latest version of the frontier-squid rpm on a Scientific Linux version 5 or 6 based system. The rpm is based on the frontier-squid source tarball, and there are also instructions for installing from the frontier-squid tarball available. Please see the tarball Release Notes and rpm Release Notes for details on what has changed in recent versions. If, for some reason, you prefer to use a non frontier-squid distribution of squid, see MyOwnSquid.
>
>
The instructions below are for the latest version of the frontier-squid rpm on a Scientific Linux version 5 or 6 based system. The rpm is based on the frontier-squid source tarball, and there are also instructions for installing directly from the frontier-squid tarball available. Please see the tarball Release Notes and rpm Release Notes for details on what has changed in recent versions. If, for some reason, you prefer to use a non frontier-squid distribution of squid, see MyOwnSquid.
 

Preparation

By default the frontier-squid rpm installs files with a "squid" user id and group. If they do not exist, the rpm will create them. If your system has its own means of creating logins you should create the login and group before installing the rpm. If you want the squid process to use a different user id (historically it has been "dbfrontier"), then before installing the rpm create the file /etc/squid/squidconf with the following contents:
Line: 66 to 66
 

Configuration

Changed:
<
<
Custom configuration is done in /etc/squid/customize.sh. It performs edits on a supplied default squid.conf configuration. Comments in the default installation give more details on what can be done with it. The edits are applied to squid.conf whenever /etc/init.d/frontier-squid is run.
>
>
Custom configuration is done in /etc/squid/customize.sh. That script invokes functions that edit a supplied default squid.conf configuration to create squid.conf. Comments in the default installation of customize.sh give more details on what can be done with it. The edits are applied to squid.conf whenever /etc/init.d/frontier-squid is run.
 
Changed:
<
<
It is very important for security that squid not be allowed to proxy requests from anywhere to anywhere. The default customize.sh allows incoming connections from the standard private network addresses and outgoing connections to anywhere. If the machines that will be using squid are not on a private network, change customize.sh to include the network/maskbits for your network. For example:
>
>
It is very important for security that squid not be allowed to proxy requests from everywhere to everywhere. The default customize.sh allows incoming connections only from the standard private network addresses and allow outgoing connections to anywhere. If the machines that will be using squid are not on a private network, change customize.sh to include the network/maskbits for your network. For example:
 
    setoption("acl NET_LOCAL src", "131.154.0.0/16")
Changed:
<
<
The script allows specifying many subnets - just separate them by a blank. If you would like a more restrictive policy or other options please see the section below on other ACL options.
>
>
The script allows specifying many subnets - just separate them by a blank. If you would like a more restrictive policy or for other options please see the section below on restricting the destination.
  If you want to, you can change the cache_mem option to set the size squid reserves for caching small objects in memory, but don't make it more than 1/8th of your hardware memory. The default 128 MB should be fine, leaving a lot of memory for disk buffering by the OS, because squid performs better for large objects in disk cache buffers than in its own internal memory cache.
Line: 134 to 134
  The functionality and performance of your squid should be monitored from CERN using SNMP.
Changed:
<
<
To enable this, your site should open incoming firewall(s) to allow UDP requests to port 3401 from 128.142.202.212/32 (or preferably cmsdbsfrontier.cern.ch if you can use a DNS name).
>
>
To enable this, your site should open incoming firewall(s) to allow UDP requests to port 3401 from 128.142.202.212/32 (or preferably cmsdbsfrontier.cern.ch if you can use a DNS name). If you run multiple squid processes, each one will need to be separately monitored. They listen on increasing port numbers, the first one on port 3401, the second on 3402, etc.
  The main monitoring site is at http://frontier.cern.ch/squidstats/.
Changed:
<
<
UP TO HERE

Testing Your Installation

>
>

Testing the installation

  Download the following python script fnget.py (Do a right-click on the link and save the file as fnget.py )
Changed:
<
<
Test access to the Frontier server at CERN with the following commands:
>
>
Test access to a Frontier server at CERN with the following commands:
 
    $ chmod +x fnget.py #(only first time)
    $ ./fnget.py --url=http://cmsfrontier.cern.ch:8000/FrontierProd/Frontier --sql="select 1 from dual"
Changed:
<
<
This should be the response:
>
>
The response should be similar to this:
 
Using Frontier URL:  http://cmsfrontier.cern.ch:8000/FrontierProd/Frontier
Query:  select 1 from dual
Changed:
<
<
Decode results: 1 Refresh cache: 0
>
>
Decode results: True
Refresh cache: False
  Frontier Request: http://cmsfrontier.cern.ch:8000/FrontierProd/Frontier?type=frontier_request:1:DEFAULT&encoding=BLOBzip&p1=eNorTs1JTS5RMFRIK8rPVUgpTcwBAD0rBmw_
Changed:
<
<
Query started: 05/12/09 13:46:50 EDT WARNING: no timeout available in python older than 2.4 Query ended: 05/12/09 13:46:50 EDT Query time: 0.64064002037 [seconds]
>
>
Query started: 10/30/12 20:04:09 CET
Query ended: 10/30/12 20:04:09 CET
Query time: 0.0179278850555 [seconds]
  Query result:
Changed:
<
<
>
>
  eJxjY2BgYDRkA5JsfqG+Tq5B7GxgEXYAGs0CVA==
Line: 188 to 185
  This will return whatever you type in the select statement, for example change 1 to 'hello'. The "dual" table is a special debugging feature of Oracle that just returns what you send it.
Changed:
<
<
Now test your squid,
>
>
Now to test your squid, replace yoursquid.your.domain in the following command with the name of your squid machine
 

Changed:
<
<
$ export http_proxy=http://your.squid.url:3128
>
>
$ export http_proxy=http://yoursquid.your.domain:3128
 
Changed:
<
<
and perform the test again. It should pass through your squid, and cache the response. To see if it worked, look at the squid access log (the following is excerpted form the access.log file):
>
>
and perform the wget test again. It should pass through your squid, and cache the response. To see if it worked, look at the squid access log. The following is excerpted from an access.log file:
 
    128.220.233.179 - - [12/May/2009:08:33:17 +0000] "GET http://cmsfrontier.cern.ch:8000/FrontierProd/Frontier?type=frontier_request:1:DEFAULT&encoding=BLOBzip&p1=eNorTs1JTS5RMFRIK8rPVUgpTcwBAD0rBmw= HTTP/1.0" 200 810 TCP_MISS:DIRECT 461 "fnget.py 1.5"
    128.220.233.179 - - [12/May/2009:08:33:19 +0000] "GET http://cmsfrontier.cern.ch:8000/FrontierProd/Frontier?type=frontier_request:1:DEFAULT&encoding=BLOBzip&p1=eNorTs1JTS5RMFRIK8rPVUgpTcwBAD0rBmw= HTTP/1.0" 200 809 TCP_MEM_HIT:NONE - "fnget.py 1.5"
Changed:
<
<
Notice the second entry has a "TCP_MEM_HIT", that means the object was cached in the memory. Any subsequent requests for this object will come from the squid cache.
>
>
Notice the second entry has a "TCP_MEM_HIT", that means the object was cached in the memory. Any subsequent requests for this object will come from the squid cache until the cached item expires.
 
Changed:
<
<

Issues

>
>

Common issues

 

SELinux

Line: 214 to 211
  takes care of this problem.
Added:
>
>

Inability to reach full network throughput

If you have a CPU that can't quite keep up with full network throughput, we have found that up to an extra 15% throughput can be achieved by binding the single-threaded squid process to a single core, to maximize use of the per-core on-chip caches. This is not enabled by default, but you can enable it by putting the following in /etc/sysconfig/frontier-squid:

    export SETSQUIDAFFINITY=true

If that little boost isn't enough, try running multiple squid processes on the same machine.

 

Alternate configurations

Line: 221 to 226
  The default behavior is to allow the squid to be used for any destination. If you want to restrict the squid to be used only for CMS Conditions Data, then you simply have to add two lines to customize.sh that enable a couple of lines in squid.conf which are already there commented out:
Changed:
<
<
>
>

  uncomment("acl RESTRICT_DEST") uncomment("http_access deny RESTRICT_DEST")
Changed:
<
<
>
>
  If for some reason you want to have a different destination or destinations you can use a regular expression, for example:
Changed:
<
<
 setoptionparameter("acl RESTRICT_DEST", 3, "^(cmsfrontier.*|cernvm.*)\\.cern\\.ch$")
>
>
    setoptionparameter("acl RESTRICT_DEST", 3, "^(((cms|atlas).*frontier.*)\\.cern\\.ch)|frontier.*\\.racf\\.bnl\\.gov$")

  uncomment("http_access deny RESTRICT_DEST")
Changed:
<
<

Personal Squid on a Desktop/Laptop

>
>
 
Changed:
<
<
If you want to install a Frontier Squid on your personal Desktop, just follow the same instructions as under "Software" above, except:
>
>
Once you have restricted the destination, it isn't so important anymore to restrict the source. If you want to leave it unrestricted you can change the NET_LOCAL acl to 0.0.0.0/0:
  setoption("acl NET_LOCAL src", "0.0.0.0/0")
 
Changed:
<
<
You don't need a dbfrontier account. Your own account will work or any account but not root.
You may ignore any instructions about monitoring or registration.
>
>

Running multiple squid processes on the same machine

 
Changed:
<
<
During the ./configure step:
>
>
If you have either a particularly slow machine or a high amount of bandwidth available, you may not be able to get full network throughput out of a single squid process. For example, our measurements with a 10 gigabit interface on a 2010-era machine with 8 cores at 2.27Ghz showed that 3 squids were required for full throughput.
 
Changed:
<
<
When it asks for the network, use 127.0.0.1/32
When it asks for memory, use something modest like 128
When it asks for disk space, also use something modest like 5000.
>
>
Multiple squids can be enabled very simply by this process:
  • Stop frontier-squid and remove the old cache and logs
  • Create subdirectories under your cache directory called 'squid0', 'squid1', up to 'squidN-1' for N squids, making sure they are writable by the user id that your squid runs under
  • Start frontier-squid again. This will automatically detect the extra subdirectories, create the corresponding log directories, and start that number of squid processes. It also assigns each squid process to a particular core as described above.
 
Changed:
<
<
In your site-local-config.xml add the local proxy
    <frontier-connect>
       <proxy url="http://localhost:3128"/>
Remember, after you do the installation, you have to do a manual start of the squid.
>
>
When running multiple squids, all of the memory & disk usage is multiplied by the number of squids. For example, if you choose a cache_dir size of 100GB, running 3 squids will require 300GB for cache space. All the squids listen on the same port and take turns handling requests. Only squid0 will contact the upstream servers; the others forward requests to squid0.
 
Changed:
<
<
The only thing that has to be done as root is the automatic start on boot.
>
>

Personal Squid on a Desktop/Laptop

 
Changed:
<
<

Laptop

>
>
If you want to install a Frontier Squid on your personal desktop or laptop, just follow the same instructions as under Software above, except:
 
Changed:
<
<
For a laptop, just follow the same instructions as for a Desktop, plus a few extra things since a laptop is turned on and off frequently, and might want to run without a network connection.
>
>
  • For the NET_LOCAL acl, use "127.0.0.1/32"
  • For the cache_dir size you can leave it at the default 10000 or even perhaps cut it down to 5000 if you want to.
 
Changed:
<
<
In the file .../frontier-cache/utils/bin/fn-local-squid.sh find the lines:
start()
{
 if cleancache; then
  start_squid
 fi
and comment out the call to cleancache
start()
{
# if cleancache; then
  start_squid
# fi
>
>

Laptop disconnected network operation

 
Changed:
<
<
Then add this line to customize.sh
 setoption("offline_mode", "on")
Of course, any changes to the underlying database while this is set won't automatically be noticed, even if you are connected to the network. To manually refresh the cache while squid is stopped use the cleancache command as described above. Caution: Sometimes the squid disk cache can get corrupted, such as by not shutting the squid down cleanly. If that happens and the squid won't start, you'll need to clean the cache but you won't be able to run without reconnecting to the network to reload the cache.
>
>
If you want to be able to run a laptop disconnected from the network, add the following to customize.sh:
 
Changed:
<
<
As an alternative to offline_mode on you could instead use
>
>

  setoption("cachemgr_passwd", "none offline_toggle")
Changed:
<
<
>
>
 
Changed:
<
<
Now you can switch back and forth between offline (offline_mode on) and online (offline_mode off) just by doing:
install_dir/frontier-cache/squid/bin/squidclient mgr:offline_toggle
where install_dir is wherever you put it.
>
>
Then, load up the cache by running your user job once while the network is attached, and run the following command once:
    squidclient mgr:offline_toggle
It should report "offline_mode is now ON" which will prevent cached items from expiring. Then as long as everything was preloaded and the laptop doesn't reboot (because starting squid clears the cache) you should be able to re-use the cached data. You can switch back to normal mode with the same command or by stopping and starting squid.
 
Deleted:
<
<
Another possiblity is to comment out the offline_mode on line that you added above to customize.sh, and only uncomment it (and tell squid to reload) when you want to run without network connection or use offline_toggle.
  Responsible: DaveDykstra

Revision 52012-10-27 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
THIS PAGE IS UNDER CONSTRUCTION
Line: 111 to 111
  Don't forget to remove the old directories

Changed:
<
<
# rm -rf /var/squid/cache /var/log/squid=
>
>
# rm -rf /var/squid/cache /var/log/squid
 
Changed:
<
<

Changing the size of log files retained

>
>

Changing the size of log files retained

 
Changed:
<
<
UP TO HERE

The hourly.sh script will rotate the logs if access.log goes over a given size, default 1GB. You can change that value by setting the environment variable LARGE_ACCESS_LOG to a different number of bytes. For example for 10GB you can use:

8 * * * * LARGE_ACCESS_LOG=10000000000 /install_dir/frontier-cache/utils/cron/hourly.sh >/dev/null 2>&1

In order to estimate disk usage, note that up to 11 access.log files are kept at a time, and the size can go a bit above the $LARGE_ACCESS_LOG size because the cron job only checks once per hour. If disk space for the logs is a concern see the section on the Access Log Growth Issue below.

As Root, Set Up Start at Boot Time

(This is the only step to be done as root.)

Then as root:

copy install_dir/frontier-cache/utils/init.d/frontier-squid.sh into /etc/init.d

Then after the copy, root should do:

>
>
The access.log is rotated each night, and also if it is over a given size (default 1 GB) when it checks each hour. You can change that value by exporting the environment variable LARGE_ACCESS_LOG in /etc/sysconfig/frontier-squid to a different number of bytes. For example for 10GB each you can use:
 
Changed:
<
<
/sbin/chkconfig --add frontier-squid.sh

Set Up CMS Working Environment

Here is the information about how to access CMS conditions data(@T0) access by means of frontier.

For site frontier configuration, computing site responsibles should pick up the xml fragment for calib-data from this file:

site-local-config.xml_sample

In the frontier-connect section include a line like:

<proxy url="http://localcmsproxy1:3128"/> 
with localcmsproxy1 set to the correct local proxy value
>
>
    export LARGE_ACCESS_LOG=10000000000
 
Changed:
<
<
Note: the default working port of the squid is 3128/tcp
>
>
In order to estimate disk usage, note that up to 11 access.log files are kept at a time, and the size can go a bit above the $LARGE_ACCESS_LOG size because the cron job only checks once per hour.
 
Changed:
<
<
Insert this xml fragment into the existing site-local-config.xml and commit into computing CVS: CMSSW/COMP/SITECONF/${SITE}/JobConfig
>
>
It is highly recommended to keep at least 3 days' worth of logs, so that problems that happen on a weekend can be investigated during working hours. If you really do not have enough disk space for logs, the log can be disabled with the following in /etc/squid/customize.sh:
    setoption("access_log", "none")
Then after doing service frontier-squid reload (or service frontier-squid start if squid was stopped) remember to remove all the old access.log* files.
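A minimal sketch of that sequence, assuming the default log location /var/log/squid:
    # service frontier-squid reload
    # rm -f /var/log/squid/access.log*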
 
Changed:
<
<
User's guide (with much more detail) about site-local-config.xml is in:
>
>

Enabling monitoring

 
Changed:
<
<
https://twiki.cern.ch/twiki/bin/view/CMS/SWIntTrivial#SiteLocalConfig
Here is a nice example of a <calib-data> section in a site-local-config.xml
- <calib-data> 
 - <frontier-connect>  
    <proxy url="http://io.hep.kbfi.ee:3128" />  
    <server url="http://cmsfrontier.cern.ch:8000/FrontierInt" />   
    <server url="http://cmsfrontier1.cern.ch:8000/FrontierInt" />  
    <server url="http://cmsfrontier2.cern.ch:8000/FrontierInt" />  
    <server url="http://cmsfrontier3.cern.ch:8000/FrontierInt" />  
   </frontier-connect>  
  </calib-data>  
>
>
The functionality and performance of your squid should be monitored from CERN using SNMP.
 
Changed:
<
<

Multiple Squid Servers

>
>
To enable this, your site should open incoming firewall(s) to allow UDP requests to port 3401 from 128.142.202.212/32 (or preferably cmsdbsfrontier.cern.ch if you can use a DNS name).
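As an example, a site that manages its firewall directly with iptables (an assumption; use whatever firewall mechanism your site already has) could allow the monitoring requests with something like:
    # iptables -A INPUT -s 128.142.202.212/32 -p udp --dport 3401 -j ACCEPT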
 
Changed:
<
<
If you have more than one squid server, and you want Frontier to do the load balancing, set up each squid independently and simply add a proxy line for each squid and one extra loadbalance line to site-local-config.xml: (You should only do this if all the squids are in the same location.)
>
>
The main monitoring site is at http://frontier.cern.ch/squidstats/.
 
Changed:
<
<
<load balance="proxies"/>
<proxy url="http://localcmsproxy1:3128"/> 
<proxy url="http://localcmsproxy2:3128"/> 
<proxy url="http://localcmsproxy3:3128"/> 
>
>
UP TO HERE
 

Testing Your Installation

Line: 186 to 146
  Test access to the Frontier server at CERN with the following commands:
Changed:
<
<
chmod +x fnget.py #(only first time)

./fnget.py --url=http://cmsfrontier.cern.ch:8000/Frontier/Frontier --sql="select 1 from dual"

>
>
    $ chmod +x fnget.py #(only first time)
    $ ./fnget.py --url=http://cmsfrontier.cern.ch:8000/FrontierProd/Frontier --sql="select 1 from dual"
  This should be the response:
Changed:
<
<
Using Frontier URL: http://cmsfrontier.cern.ch:8000/Frontier/Frontier
>
>
Using Frontier URL: http://cmsfrontier.cern.ch:8000/FrontierProd/Frontier
 Query: select 1 from dual Decode results: 1 Refresh cache: 0

Frontier Request:

Changed:
<
<
http://cmsfrontier.cern.ch:8000/Frontier/Frontier?type=frontier_request:1:DEFAUL T&encoding=BLOBzip&p1=eNorTs1JTS5RMFRIK8rPVUgpTcwBAD0rBmw_
>
>
http://cmsfrontier.cern.ch:8000/FrontierProd/Frontier?type=frontier_request:1:DEFAULT&encoding=BLOBzip&p1=eNorTs1JTS5RMFRIK8rPVUgpTcwBAD0rBmw_
  Query started: 05/12/09 13:46:50 EDT WARNING: no timeout available in python older than 2.4
Line: 213 to 173
  eJxjY2BgYDRkA5JsfqG+Tq5B7GxgEXYAGs0CVA==
Changed:
<
<
<quality error="0" md5="5544fd3e96013e694f13d2e13b44ee3c" records="1" full_si ze="25"/>
>
>
 
Line: 231 to 190
  Now test your squid,
Changed:
<
<
export http_proxy=http://your.squid.url:3128
>
>
    $ export http_proxy=http://your.squid.url:3128
 
Changed:
<
<
and perform the test again. It should pass through your squid, and cache the response. To see if it worked, look at the squid access log (following excerpted form the access.log file usually in squid/var/logs :
>
>
and perform the test again. It should pass through your squid, and cache the response. To see if it worked, look at the squid access log (the following is excerpted from the access.log file):
 
Changed:
<
<
128.220.233.179 - - [12/May/2009:08:33:17 +0000] "GET http://cmsfrontier.cern.ch :8000/Frontier/Frontier?type=frontier_request:1:DEFAULT&encoding=BLOBzip&p1=eNor Ts1JTS5RMFRIK8rPVUgpTcwBAD0rBmw= HTTP/1.0" 200 810 TCP_MISS:DIRECT 461 "fnget.py 1.5" 128.220.233.179 - - [12/May/2009:08:33:19 +0000] "GET http://cmsfrontier.cern.ch :8000/Frontier/Frontier?type=frontier_request:1:DEFAULT&encoding=BLOBzip&p1=eNor Ts1JTS5RMFRIK8rPVUgpTcwBAD0rBmw= HTTP/1.0" 200 809 TCP_MEM_HIT:NONE - "fnget.py 1.5"
>
>
128.220.233.179 - - [12/May/2009:08:33:17 +0000] "GET http://cmsfrontier.cern.ch:8000/FrontierProd/Frontier?type=frontier_request:1:DEFAULT&encoding=BLOBzip&p1=eNorTs1JTS5RMFRIK8rPVUgpTcwBAD0rBmw= HTTP/1.0" 200 810 TCP_MISS:DIRECT 461 "fnget.py 1.5" 128.220.233.179 - - [12/May/2009:08:33:19 +0000] "GET http://cmsfrontier.cern.ch:8000/FrontierProd/Frontier?type=frontier_request:1:DEFAULT&encoding=BLOBzip&p1=eNorTs1JTS5RMFRIK8rPVUgpTcwBAD0rBmw= HTTP/1.0" 200 809 TCP_MEM_HIT:NONE - "fnget.py 1.5"
 

Notice the second entry has a "TCP_MEM_HIT", that means the object was cached in the memory. Any subsequent requests for this object will come from the squid cache.

Deleted:
<
<
Another possibility for testing your squid is to run the SAM squid test on a worker node by hand.

Register Your Server

To register, please submit as a bug report to

http://savannah.cern.ch/bugs/?func=additem&group=frontier

with the following information:

 
Changed:
<
<
  • Site - Site name
  • Tier - Tier level
  • location - Institution
  • CE - Node we submit grid test jobs to
  • Contact - Contact person’s name
  • email - Contacts email
  • ip/mask - CE nodes addresses that as seen on the WAN
  • Squid Node - Name of the squid node for monitoring
  • Software - Which tarball or RPM was used for the installation

Tier-3 sites should also register.

Monitoring

The functionality of your squid should be monitored from CERN and Fermilab using SNMP.

To enable this, your site should open port 3401/udp to requests from: 128.142.202.212/255.255.255.255 (or preferably cmsdbsfrontier.cern.ch if you can use a DNS name) and 131.225.240.232/255.255.255.255. The former is the main site, and the latter is a backup site at Fermilab.

The main monitoring site is at http://frontier.cern.ch/squidstats/.

>
>

Issues

 

SELinux

Line: 285 to 214
  takes care of this problem.
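For reference, the command in question (run as root) is:
    # semanage port -a -t http_cache_port_t -p udp 3401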
Deleted:
<
<

Some Useful Commands

install_dir/frontier-cache/utils/bin/fn-local-squid.sh with any parameter or no parameter will recreate squid.conf after changing customize.sh

install_dir/frontier-cache/squid/sbin/squid -k parse will just read squid.conf to see if it makes sense

install_dir/frontier-cache/utils/bin/fn-local-squid.sh reload sends a HUP signal and has squid reread squid.conf

install_dir/frontier-cache/utils/bin/fn-local-squid.sh status checks if squid is running

install_dir/frontier-cache/utils/bin/fn-local-squid.sh restart stops squid and starts squid without clearing the cache

install_dir/frontier-cache/utils/bin/fn-local-squid.sh cleancache deletes and recreates the cache, like a start does, but without starting squid

install_dir/frontier-cache/squid/bin/squidclient mgr:info outputs operational information about your squid

Access Log Growth Issue

With many active clients, it is still possible for the squid access.log to grow to an unmanageable size. The squid will crash if it runs out of available disk space. There are a couple of ways to avoid this problem:

1) Make sure that you have the hourly.sh cron job enabled as described in the Set Up Cron Job section above to rotate the log when it grows over a size you choose.

 
Changed:
<
<
2) The other possibility is to disable writing to access.log by putting the following in install_dir/frontier-cache/squid/etc/customize.sh:
>
>

Alternate configurations

 
Changed:
<
<
setoption("access_log", "none")

and then do

install_dir/frontier-cache/utils/bin/fn-local-squid.sh reload to update squid.conf and load it if the squid is already running, otherwise just use start instead of reload.

The squid installation script has the access log turned on by default. It is recommended that a new installation be installed with it on, the functioning of the squid verified by reading the access log, then if disk space is limited, turn the access log off when the squid is in production. Even if you do turn the access log off, you should still run the daily.sh script once per day to rotate the other logs.

Filedescriptors

At some installations with a very large number of worker nodes it may be possible to see error messages about running out of filedescriptors in your cache.log. It is easy to avoid this problem:

1) First, make sure your squid version is at least squid-2.7.X

2) As root, add the following line to /etc/security/limits.conf

* - nofile 16384

3) Reboot the machine.

You can check your file descriptor limit and usage by doing:

install_dir/frontier-cache/squid/bin/squidclient mgr:info

Other ACL options

>
>

Restricting the destination

  The default behavior is to allow the squid to be used for any destination. If you want to restrict the squid to be used only for CMS Conditions Data, then you simply have to add two lines to customize.sh that enable a couple of lines in squid.conf which are already there commented out:
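The two lines in question are the uncomment calls for the RESTRICT_DEST acl:
    uncomment("acl RESTRICT_DEST")
    uncomment("http_access deny RESTRICT_DEST")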
Line: 351 to 234
 
Changed:
<
<
If you modify customize.sh while the squid is running, remember to do a

install_dir/frontier-cache/utils/bin/fn-local-squid.sh reload

so that the changes get used.

Personal Squid on a Desktop/Laptop

>
>

Personal Squid on a Desktop/Laptop

  If you want to install a Frontier Squid on your personal Desktop, just follow the same instructions as under "Software" above, except:
Line: 375 to 252
  The only thing that has to be done as root is the automatic start on boot.
Changed:
<
<

Laptop

>
>

Laptop

  For a laptop, just follow the same instructions as for a Desktop, plus a few extra things since a laptop is turned on and off frequently, and might want to run without a network connection.

Revision 42012-10-26 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
THIS PAGE IS UNDER CONSTRUCTION
Line: 66 to 66
 

Configuration

Changed:
<
<
Custom configuration is done in /etc/squid/customize.sh. It performs edits on a supplied default squid.conf configuration.
>
>
Custom configuration is done in /etc/squid/customize.sh. It performs edits on a supplied default squid.conf configuration. Comments in the default installation give more details on what can be done with it. The edits are applied to squid.conf whenever /etc/init.d/frontier-squid is run.
  It is very important for security that squid not be allowed to proxy requests from anywhere to anywhere. The default customize.sh allows incoming connections from the standard private network addresses and outgoing connections to anywhere. If the machines that will be using squid are not on a private network, change customize.sh to include the network/maskbits for your network. For example:
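(The sample below uses the same example subnet as elsewhere on this page; substitute your own network/maskbits.)
    setoption("acl NET_LOCAL src", "131.154.0.0/16")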

Line: 75 to 75
  The script allows specifying many subnets - just separate them by a blank. If you would like a more restrictive policy or other options please see the section below on other ACL options.
Changed:
<
<
UP TO HERE

On prompt enter the amount of cache memory (in MB) the squid should use. This should be at most 1/8 of your hardware memory. Probably 128 MB would be fine, leaving a lot of memory for disk buffering by the OS because squid performs better for large objects out of the disk cache than the memory cache.

On prompt enter the amount of disk space (in MB) the squid should use for a cache. One suggestion is to set this size at 70% of the available space in your disk partition to allow room for the executables, log files, etc. It should be at least 20000.

You can double check your responses to the prompts by reading Makefile.conf.inc and edit them there before running make if you wish.

Then do:

make

make install

>
>
If you want to, you can change the cache_mem option to set the size squid reserves for caching small objects in memory, but don't make it more than 1/8th of your hardware memory. The default 128 MB should be fine, leaving a lot of memory for disk buffering by the OS, because squid performs better for large objects in disk cache buffers than in its own internal memory cache.
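If you do decide to change it, a sketch using the same setoption helper (the value string follows the usual squid.conf cache_mem syntax of a size plus units; the 256 MB figure is just an illustration):
    setoption("cache_mem", "256 MB")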
 
Changed:
<
<
After that you should examine /install_dir/frontier-cache/squid/etc/customize.sh and make any changes or other customizations you want to. For details on the editing functions available see /install_dir/frontier-cache/squid/etc/customhelps.awk.

Manual Control of the Server

To do a manual start/stop of the server (as user dbfrontier):

/install_dir/frontier-cache/utils/bin/fn-local-squid.sh start

You can also stop it if you need to:

/install_dir/frontier-cache/utils/bin/fn-local-squid.sh stop

Remember to start your server after you have installed it.

>
>
Change the size of the cache_dir (the third parameter) to your desired size in MB. The default is only 10 GB which is rather stingy. For example, for 100 GB set it to this:
    setoptionparameter("cache_dir", 3, "100000")
 
Changed:
<
<

Setup for different install directories with each release

>
>
To have a change to customize.sh take effect while squid is running, run the following command as root:
    # service frontier-squid reload
 
Changed:
<
<
If you choose to use a different install directory for each release, do the following extra things:
>
>

Moving disk cache and logs to a non-standard location

 
Changed:
<
<
  • Create a symbolic link at a place you will re-use for each new installation, and use that for the cron job described in the next section and in /etc/init.d/frontier-squid.sh described in the following two sections, so those don't need to be reinstalled for every release.
  • Either remember to clean out the old installation's disk cache (in /install_dir/frontier-cache/squid/var/cache) and logs (in /install_dir/frontier-cache/squid/var/logs) each time or (better) edit customize.sh to set the cache_log, pid_filename, and coredump_dir options and the second parameter of the cache_dir option and the first parameter of the access_log option to use common directories that you re-use for each new installation. This has an added advantage of not requiring a lot of disk space where you install the software but rather where you choose to put the cache and logs. For example:
>
>
Often the filesystems containing the default locations for the disk cache (/var/cache/squid) and logs (/var/log/squid) aren't large enough, and there's more space available in another filesystem. To move them to a new location, set the cache_log, pid_filename, and coredump_dir options and the second parameter of the cache_dir option and the first parameter of the access_log option. For example:

  setoptionparameter("cache_dir", 2, "/data/squid_cache") setoptionparameter("access_log", 1, "/data/squid_logs/access.log") setoption("cache_log", "/data/squid_logs/cache.log") setoption("pid_filename", "/data/squid_logs/squid.pid") setoption("coredump_dir", "/data/squid_cache")
Changed:
<
<
>
>
 
Changed:
<
<

Set Up Cron Job

>
>
Pre-create the new cache and log directories and make them writable by the user id that squid is running under.
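A minimal sketch, assuming the example paths above and the default "squid" user and group created by the rpm:
    # mkdir -p /data/squid_cache /data/squid_logs
    # chown -R squid:squid /data/squid_cache /data/squid_logs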
 
Changed:
<
<
As user dbfrontier, set up cron jobs to rotate the logs, with crontab entries like this:
>
>
Because the location of the pid_filename changes, and the stop command uses that value, it is best to first stop squid with
    # service frontier-squid stop
then edit customize.sh, and then start it again with
    # service frontier-squid start
 
Changed:
<
<
7 7 * * * /install_dir/frontier-cache/utils/cron/daily.sh >/dev/null 2>&1
8 * * * * /install_dir/frontier-cache/utils/cron/hourly.sh >/dev/null 2>&1
>
>
Don't forget to remove the old directories
    # rm -rf /var/squid/cache /var/log/squid=
 
Changed:
<
<
You could get the above crontab by doing (with the appropriate value of install_dir)
>
>

Changing the size of log files retained

 
Changed:
<
<
crontab /install_dir/frontier-cache/utils/cron/crontab.dat
>
>
UP TO HERE
 
Changed:
<
<
You can change the hour and minute as you like, but leave hourly.sh to be one minute after daily.sh, and avoid multiples of 5 for the minute because it can interfere with the monitoring probes which happen every 5 minutes. The hourly.sh script will rotate the logs if access.log goes over a given size, default 1GB. You can change that value by setting the environment variable LARGE_ACCESS_LOG to a different number of bytes. For example for 10GB you can use:
>
>
The hourly.sh script will rotate the logs if access.log goes over a given size, default 1GB. You can change that value by setting the environment variable LARGE_ACCESS_LOG to a different number of bytes. For example for 10GB you can use:
  8 * * * * LARGE_ACCESS_LOG=10000000000 /install_dir/frontier-cache/utils/cron/hourly.sh >/dev/null 2>&1

Revision 32012-10-26 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
THIS PAGE IS UNDER CONSTRUCTION
Line: 6 to 6
  The frontier-squid software is a patched version of the standard squid http proxy cache software, pre-configured for use by the Frontier distributed database caching system. This installation is recommended for use by Frontier in the LHC CMS & ATLAS projects, and also works well with the CernVM FileSystem.
Deleted:
<
<
The instructions below are for the latest version of the frontier-squid rpm on a Scientific Linux version 5 or 6 based system. The rpm is based on the frontier-squid source tarball, and there are also instructions for installing from the frontier-squid tarball available. Please see the tarball Release Notes and rpm Release Notes for details on what has changed in recent versions. If, for some reason, you prefer to use a non frontier-squid distribution of squid, see MyOwnSquid.
 If you have any problems with the software or installation, submit a support request to the Savannah frontierdev project.

For rapid response to configuration questions, send e-mail to: cms-frontier-support@cern.ch or atlas-frontier-support@cern.ch.

Line: 41 to 39
 There are many ways to configure multiple squids: round-robin DNS, load-balancing networking hardware, LVS, etc. The simplest thing to do is just set up two or more squid machines independently and let Frontier handle it by making a small addition to the frontier client configuration (see below under Multiple Squid Servers). If there are many thousands of job slots, hardware-based load balancers can be easily overloaded, so DNS-based or client-based load balancing will probably be called for.

Software

Added:
>
>
The instructions below are for the latest version of the frontier-squid rpm on a Scientific Linux version 5 or 6 based system. The rpm is based on the frontier-squid source tarball, and there are also instructions for installing from the frontier-squid tarball available. Please see the tarball Release Notes and rpm Release Notes for details on what has changed in recent versions. If, for some reason, you prefer to use a non frontier-squid distribution of squid, see MyOwnSquid.
 

Preparation

By default the frontier-squid rpm installs files with a "squid" user id and group. If they do not exist, the rpm will create them. If your system has its own means of creating logins you should create the login and group before installing the rpm. If you want the squid process to use a different user id (historically it has been "dbfrontier"), then before installing the rpm create the file /etc/squid/squidconf with the following contents:
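For example, to keep the historical dbfrontier user:
    export FRONTIER_USER=dbfrontier
    export FRONTIER_GROUP=users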

Line: 51 to 52
 

Installation

Changed:
<
<
First, if you do not installed any frontier rpm before, execute the following command as the root user:
>
>
First, if you have not installed any frontier rpm before, execute the following command as the root user:
 
    # wget -O /etc/yum.repos.d/cern-frontier.repo http://frontier.cern.ch/dist/rpms/cern-frontier.repo
Added:
>
>
Install the package with the following command:
    # yum install frontier-squid

The first time you use yum to install any frontier package it will prompt you to import the GPG key; answer 'y'.

Configuration

 
Changed:
<
<
If the directory you are installing into does not yet contain customize.sh, you will also be prompted for the old installation path. If customize.sh is found in the old installation path, it will be copied into the source directory and the configure step will be finished. If you like, you can avoid the first two questions by passing "--prefix=/install_dir" and "--oldprefix=/oldinstall_dir" parameters to ./configure. If you have not previously installed a release that supports customize.sh, you will be asked a few additional questions about basic configuration parameters.
>
>
Custom configuration is done in /etc/squid/customize.sh. It performs edits on a supplied default squid.conf configuration.
 
Changed:
<
<
On prompt enter network/netmask which is allowed to access the Squid.
>
>
It is very important for security that squid not be allowed to proxy requests from anywhere to anywhere. The default customize.sh allows incoming connections from the standard private network addresses and outgoing connections to anywhere. If the machines that will be using squid are not on a private network, change customize.sh to include the network/maskbits for your network. For example:
    setoption("acl NET_LOCAL src", "131.154.0.0/16")
 
Changed:
<
<
Examples: 131.154.184.0/255.255.255.0 or 131.154.0.0/255.255.0.0
>
>
The script allows specifying many subnets - just separate them by a blank. If you would like a more restrictive policy or other options please see the section below on other ACL options.
 
Changed:
<
<
The script does allow to specify many subnets - just separate them by a blank. If you just hit enter, the standard private network addresses 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16 will be allowed. NOTE: The default behavior is to allow any IP address you specify here to use the squid to cache objects coming from any destination address. If you would like a more restrictive policy or other options please see the section below on other ACL options.
>
>
UP TO HERE
  On prompt enter the amount of cache memory (in MB) the squid should use. This should be at most 1/8 of your hardware memory. Probably 128 MB would be fine, leaving a lot of memory for disk buffering by the OS because squid performs better for large objects out of the disk cache than the memory cache.
Line: 349 to 361
  uncomment("http_access deny RESTRICT_DEST")
Deleted:
<
<
Another possible configuration is to allow worker nodes at other sites to use your squid, although we discourage that because many worker nodes can use large amounts of bandwidth over the wide area network. If you still want to do it, it can be done by adding extra lines to your customize.sh. The order of these lines is important, so they need to be "anchored" to others, for example like this:
 insertline("acl NET_LOCAL", "acl T2FOO src x.x.x.x/x.x.x.x")
 insertline("acl NET_LOCAL", "acl T2BAR src x.x.x.x/x.x.x.x")
 insertline("http_access allow NET_LOCAL", "http_access allow T2FOO")
 insertline("http_access allow NET_LOCAL", "http_access allow T2BAR")

In addition, you have to make sure there are holes in any site or machine firewalls that allow these other worker nodes access to port 3128 on your squid.

The default configuration permits incoming accesses from any standard private network address 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16. To eliminate that default behavior add this line:

 commentout("allow localnet")

Finally, the remote sites must make an appropriate addition to their site-local-config.xml.

  If you modify customize.sh while the squid is running, remember to do a
Line: 426 to 422
  Another possibility is to comment out the offline_mode on line that you added above to customize.sh, and only uncomment it (and tell squid to reload) when you want to run without a network connection, or use offline_toggle.
Changed:
<
<
%RESPONSIBLE% DaveDykstra
>
>
Responsible: DaveDykstra

Revision 22012-10-26 - DaveDykstra

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Changed:
<
<

>
>
THIS PAGE IS UNDER CONSTRUCTION
 
Changed:
<
<

Installing a Frontier Local Squid Cache Server

>
>

Installing a Frontier Local Squid Cache Server

 
Changed:
<
<
The instructions below are for the latest version of the frontier-squid rpm on a Scientific Linux version 5 or 6 based system. The rpm is based on the frontier-squid source tarball, and there are also InstallSquidTarball available. Please see the tarball Release Notes and rpm Release Notes for details on what has changed in recent versions.

If you have any problems with the software or installation, submit a bug report to http://savannah.cern.ch/bugs/?func=additem&group=frontierdev.

For rapid response to configuration questions, send e-mail to: cms-frontier-support@cern.ch or atlas-frontier-support@cern.ch.

>
>
The frontier-squid software is a patched version of the standard squid http proxy cache software, pre-configured for use by the Frontier distributed database caching system. This installation is recommended for use by Frontier in the LHC CMS & ATLAS projects, and also works well with the CernVM FileSystem.
 
Added:
>
>
The instructions below are for the latest version of the frontier-squid rpm on a Scientific Linux version 5 or 6 based system. The rpm is based on the frontier-squid source tarball, and there are also instructions for installing from the frontier-squid tarball available. Please see the tarball Release Notes and rpm Release Notes for details on what has changed in recent versions. If, for some reason, you prefer to use a non frontier-squid distribution of squid, see MyOwnSquid.
 
Changed:
<
<

Hardware

>
>
If you have any problems with the software or installation, submit a support request to the Savannah frontierdev project.
 
Changed:
<
<
The first step is to decide what hardware you want to run the squid cache server on. These are some FAQ's

1) Do I need to dedicate a node to squid and only squid?

This is up to you. It is a strongly recommended. It depends on how many jobs try to access the squid simultaneously and what else the machine is used for (see question 2). Large sites may need more than one squid (see question 4). The node needs to have network access to the internet, and be visible to the worker nodes. Virtual machines can help isolate other uses of a physical machine, but it doesn't isolate disk and especially network usage.

2) What hardware specs (CPU, memory, disk cache)?

For most purposes 2-core 2GHZ, 2GB, 100 GB should be adequate. This excludes the space needed for log files which is determined by how heavily the system is used and what the clean up schedule is. The default in the rpm rotates the logs every day and removes after 10 rotates, and once an hour it will also rotate if the log is bigger than 1 GB. From what we have seen, the most critical resource is the memory. If the machine serves other purposes, make sure the other tasks don't use up all the memory. Squid runs as a single thread, so if that is the only use of the machine, having more than 2 cores is a waste. You should also avoid network filesystems such as AFS and NFS for the disk cache.

Here is a description of squid memory usage: If you have a decent amount of spare memory, the kernel will use that as page cache, so it's a good chance that frequenty-requested items will, in fact, be served from RAM (via the page cache) even if it's not squid's RAM. There's also a design bottleneck in squid that limits cpu efficiency of large cache_mem objects, so resist the urge to give squid all your available memory. Let cache_mem handle your small objects and the kernel handle the larger ones.

3) What network specs?

The latencies will be lower to the worker nodes if you have a large bandwidth. The network is almost always the bottleneck for this system, so at least a gigabit each is highly recommended. If you have many job slots, 2 bonded gigabit network connections is even better, and squid on one core of a modern CPU can pretty much keep up with 2 gigabits. Squid is single-threaded so if you're able to supply more than 2 gigabits, multiple squid processes on the same machine need to be used to serve the full throughput. This is supported in the frontier-squid package (instructions below) but each squid needs its own memory and disk space.

4) How many squids do I need?

Sites with over 500 job slots should have at least 2 squids for reliability. We currently estimate that sites should have one gigabit on a squid per 1000 grid job slots. A lot depends on how quickly jobs start; an empty batch queue that suddenly fills up will need more squids. The number of job slots that can be safely handled per gigabit increases as the number of slots increase because the chances that they all start at once tends to go down.

5) How should squids be load-balanced?

There are many ways to configure multiple squids: round-robin DNS, load-balancing networking hardware, LVS, etc. The simplest thing to do is just set up two or more squid machines independently and let Frontier handle it by making a small addition to the frontier client configuration (see below under Multiple Squid Servers). If there are many thousands of job slots, hardware-based load balancers can be easily overloaded, so DNS-based or client-based load balancing will probably be called for.

---++ Software
---+++ Download and Install Software
This site distributes the Frontier version of squid as a tarball. Someone else repackages it as an RPM for Scientific Linux. The documentation for those is being improved but for now the best documentation is the package README file plus the general installation instructions in the frontier rpms README file. Note that many of the same considerations apply to the rpm as the tarball distribution.

(If, for some reason, you prefer to use a different version of squid, see MyOwnSquid)

The second step is to create an account with username dbfrontier on your hardware. (If for some reason, you can't use the name dbfrontier, any name will actually work.) Then as user dbfrontier (NOT root) download a tarball into this account. The current one is:

http://frontier.cern.ch/dist/frontier-squid-2.7.STABLE9-9.tar.gz

Unpack the tarball:

*tar -xvzf frontier-squid-2.7.STABLE9-9.tar.gz*

*cd frontier-squid-2.7.STABLE9-9*

*./configure*

On prompt enter the directory name where the Squid will be installed. This directory holds the working software, cache, and logs so there should be at least 100 GB available (unless you relocate the cache and logs as described below). This directory is called "/install_dir" below. It should be on a local disk of the computer you are using and not NFS or AFS mounted. You should also avoid RAID, particularly RAID5. Note that the directory you enter should be an absolute (fully qualified) directory name and not a relative one. You may either re-use a previous install directory or create a new one for each release. Creating a new directory for each release makes it easier to back out to a previous release and ensures a clean installation, but it requires a little extra work to set up (described below).

If the directory you are installing into does not yet contain customize.sh, you will also be prompted for the old installation path. If customize.sh is found in the old installation path, it will be copied into the source directory and the configure step will be finished. If you like, you can avoid the first two questions by passing "--prefix=/install_dir" and "--oldprefix=/oldinstall_dir" parameters to ./configure. If you have not previously installed a release that supports customize.sh, you will be asked a few additional questions about basic configuration parameters.

On prompt enter network/netmask which is allowed to access the Squid.

Examples: 131.154.184.0/255.255.255.0 or 131.154.0.0/255.255.0.0

The script does allow to specify many subnets - just separate them by a blank. If you just hit enter, the standard private network addresses 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16 will be allowed. NOTE: The default behavior is to allow any IP address you specify here to use the squid to cache objects coming from any destination address. If you would like a more restrictive policy or other options please see the section below on other ACL options.

On prompt enter the amount of cache memory (in MB) the squid should use. This should be at most 1/8 of your hardware memory. Probably 128 MB would be fine, leaving a lot of memory for disk buffering by the OS because squid performs better for large objects out of the disk cache than the memory cache.

On prompt enter the amount of disk space (in MB) the squid should use for a cache. One suggestion is to set this size at 70% of the available space in your disk partition to allow room for the executables, log files, etc. It should be at least 20000.

You can double check your responses to the prompts by reading Makefile.conf.inc and edit them there before running make if you wish.

Then do:

*make*

*make install*

After that you should examine /install_dir/frontier-cache/squid/etc/customize.sh and make any changes or other customizations you want to. For details on the editing functions available see /install_dir/frontier-cache/squid/etc/customhelps.awk.

---+++ Manual Control of the Server

To do a manual start/stop of the server (as user dbfrontier):

*/install_dir/frontier-cache/utils/bin/fn-local-squid.sh start*

You can also stop it if you need to:

*/install_dir/frontier-cache/utils/bin/fn-local-squid.sh stop*

Remember to start your server after you have installed it.

---+++ Setup for different install directories with each release

If you choose to use a different install directory for each release, do the following extra things:

* Create a symbolic link at a place you will re-use for each new installation, and use that for the cron job described in the next section and in /etc/init.d/frontier-squid.sh described in the following two sections, so those don't need to be reinstalled for every release.
* Either remember to clean out the old installation's disk cache (in /install_dir/frontier-cache/squid/var/cache) and logs (in /install_dir/frontier-cache/squid/var/logs) each time or (better) edit customize.sh to set the cache_log, pid_filename, and coredump_dir options and the second parameter of the cache_dir option and the first parameter of the access_log option to use common directories that you re-use for each new installation. This has an added advantage of not requiring a lot of disk space where you install the software but rather where you choose to put the cache and logs. For example:
<verbatim>
setoptionparameter("cache_dir", 2, "/data/squid_cache")
setoptionparameter("access_log", 1, "/data/squid_logs/access.log")
setoption("cache_log", "/data/squid_logs/cache.log")
setoption("pid_filename", "/data/squid_logs/squid.pid")
setoption("coredump_dir", "/data/squid_cache")
</verbatim>

---+++ Set Up Cron Job

As user dbfrontier, set up cron jobs to rotate the logs, with crontab entries like this:

*7 7 * * * /install_dir/frontier-cache/utils/cron/daily.sh >/dev/null 2>&1*
<br>
*8 * * * * /install_dir/frontier-cache/utils/cron/hourly.sh >/dev/null 2>&1*

You could get the above crontab by doing (with the appropriate value of install_dir)

*crontab /install_dir/frontier-cache/utils/cron/crontab.dat*

You can change the hour and minute as you like, but leave hourly.sh to be one minute after daily.sh, and avoid multiples of 5 for the minute because it can interfere with the monitoring probes which happen every 5 minutes. The hourly.sh script will rotate the logs if access.log goes over a given size, default 1GB. You can change that value by setting the environment variable LARGE_ACCESS_LOG to a different number of bytes. For example for 10GB you can use:

*8 * * * * LARGE_ACCESS_LOG=10000000000 /install_dir/frontier-cache/utils/cron/hourly.sh >/dev/null 2>&1*

In order to estimate disk usage, note that up to 11 access.log files are kept at a time, and the size can go a bit above the $LARGE_ACCESS_LOG size because the cron job only checks once per hour. If disk space for the logs is a concern see the section on the Access Log Growth Issue below.

---+++ As Root, Set Up Start at Boot Time

(This is the only step to be done as root.)

Then as root:

copy install_dir/frontier-cache/utils/init.d/frontier-squid.sh into /etc/init.d*

Then after the copy, root should do:

*/sbin/chkconfig --add frontier-squid.sh*

---+++ Set Up CMS Working Environment

Here is the information about how to access CMS conditions data(@T0) access by means of frontier.

For site frontier configuration, computing site responsibles should pick up the xml fragment for calib-data from this file:

*site-local-config.xml_sample*

In the frontier-connect section include a line like:
<verbatim>
<proxy url="http://localcmsproxy1:3128"/>
</verbatim>with localcmsproxy1 set to the correct local proxy value

Note: the default working port of the squid is 3128/tcp

-insert this xml fragment into the exsiting site-local-config.xml and commit into computing CVS: *CMSSW/COMP/SITECONF/${SITE}/JobConfig*

User's guide (with much more detail) about site-local-config.xml is in:

https://twiki.cern.ch/twiki/bin/view/CMS/SWIntTrivial#SiteLocalConfig
<verbatim>
Here is a nice example of a <calib-data> section in a site-local-config.xml
- <calib-data>
- <frontier-connect>
<proxy url="http://io.hep.kbfi.ee:3128" />
<server url="http://cmsfrontier.cern.ch:8000/FrontierInt" />
<server url="http://cmsfrontier1.cern.ch:8000/FrontierInt" />
<server url="http://cmsfrontier2.cern.ch:8000/FrontierInt" />
<server url="http://cmsfrontier3.cern.ch:8000/FrontierInt" />
</frontier-connect>
</calib-data>
</verbatim>

---++++ Multiple Squid Servers

If you have more than one squid server, and you want Frontier to do the load balancing, set up each squid independently and simply add a proxy line for each squid and one extra loadbalance line to site-local-config.xml: (You should only do this if all the squids are in the same location.)

<verbatim>
<load balance="proxies"/>
<proxy url="http://localcmsproxy1:3128"/>
<proxy url="http://localcmsproxy2:3128"/>
<proxy url="http://localcmsproxy3:3128"/>
</verbatim>

---++ Testing Your Installation

Download the following python script fnget.py (Do a right-click on the link and save the file as fnget.py )

Test access to the Frontier server at CERN with the following commands:

*chmod +x fnget.py
#(only first time)

*./fnget.py --url=http://cmsfrontier.cern.ch:8000/Frontier/Frontier --sql="select 1 from dual"*

This should be the response:
<verbatim>
Using Frontier URL: http://cmsfrontier.cern.ch:8000/Frontier/Frontier
Query: select 1 from dual
Decode results: 1
Refresh cache: 0

Frontier Request:
http://cmsfrontier.cern.ch:8000/Frontier/Frontier?type=frontier_request:1:DEFAUL
T&encoding=BLOBzip&p1=eNorTs1JTS5RMFRIK8rPVUgpTcwBAD0rBmw_

Query started: 05/12/09 13:46:50 EDT
*WARNING:* no timeout available in python older than 2.4
Query ended: 05/12/09 13:46:50 EDT
Query time: 0.64064002037 [seconds]

Query result:
<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE frontier SYSTEM "http://frontier.fnal.gov/frontier.dtd">
<frontier version="3.17" xmlversion="1.0">
<transaction payloads="1">
<payload type="frontier_request" version="1" encoding="BLOBzip">
<data>eJxjY2BgYDRkA5JsfqG+Tq5B7GxgEXYAGs0CVA==</data>
<quality error="0" md5="5544fd3e96013e694f13d2e13b44ee3c" records="1" full_si
ze="25"/>
</payload>
</transaction>
</frontier>


Fields:
1 NUMBER

Records:
1
</verbatim>

This will return whatever you type in the select statement, for example change 1 to 'hello'. The "dual" table is a special debugging feature of Oracle that just returns what you send it.

Now test your squid,

*export http_proxy=http://your.squid.url:3128*

and perform the test again. It should pass through your squid, and cache the response. To see if it worked, look at the squid access log (following excerpted form the access.log file usually in squid/var/logs :

<verbatim>
128.220.233.179 - - [12/May/2009:08:33:17 +0000] "GET http://cmsfrontier.cern.ch
:8000/Frontier/Frontier?type=frontier_request:1:DEFAULT&encoding=BLOBzip&p1=eNor
Ts1JTS5RMFRIK8rPVUgpTcwBAD0rBmw= HTTP/1.0" 200 810 TCP_MISS:DIRECT 461 "fnget.py
1.5"
128.220.233.179 - - [12/May/2009:08:33:19 +0000] "GET http://cmsfrontier.cern.ch
:8000/Frontier/Frontier?type=frontier_request:1:DEFAULT&encoding=BLOBzip&p1=eNor
Ts1JTS5RMFRIK8rPVUgpTcwBAD0rBmw= HTTP/1.0" 200 809 TCP_MEM_HIT:NONE - "fnget.py
1.5"
</verbatim>

Notice the second entry has a "TCP_MEM_HIT", that means the object was cached in the memory. Any subsequent requests for this object will come from the squid cache.

Another possibility for testing your squid is to run the SAM squid test on a worker node by hand.

---++ Register Your Server
To register, please submit as a bug report to

http://savannah.cern.ch/bugs/?func=additem&group=frontier

with the following information:

* Site - Site name
* Tier - Tier level
* location - Institution
* CE - Node we submit grid test jobs to
* Contact - Contact person's name
* email - Contacts email
* ip/mask - CE nodes addresses that as seen on the WAN
* Squid Node - Name of the squid node for monitoring
* Software - Which tarball or RPM was used for the installation

Tier-3 sites should also register.

---++ Monitoring

The functionality of your squid should be monitored from CERN and Fermilab using SNMP.

To enable this, your site should open port 3401/udp to requests from: 128.142.202.212/255.255.255.255 (or preferably cmsdbsfrontier.cern.ch if you can use a DNS name) and 131.225.240.232/255.255.255.255. The former is the main site, and the latter is a backup site at Fermilab.

The main monitoring site is at http://frontier.cern.ch/squidstats/.

---+++ SELinux

SELinux on RHEL 5 does not give the proper context to the default SNMP port (3401) (as of selinux-policy-2.4.6-106.el5) . The command (as root)

*semanage port -a -t http_cache_port_t -p udp 3401*

takes care of this problem.

---++ Some Useful Commands

*install_dir/frontier-cache/utils/bin/fn-local-squid.sh* with any parameter or no parameter will recreate squid.conf after changing customize.sh

*install_dir/frontier-cache/squid/sbin/squid -k parse* will just read squid.conf to see if it makes sense

*install_dir/frontier-cache/utils/bin/fn-local-squid.sh reload* sends a HUP signal and has squid reread squid.conf

*install_dir/frontier-cache/utils/bin/fn-local-squid.sh status* checks if squid is running

*install_dir/frontier-cache/utils/bin/fn-local-squid.sh restart* stops squid and starts squid without clearing the cache

*install_dir/frontier-cache/utils/bin/fn-local-squid.sh cleancache* deletes and recreates the cache, like a start does, but without starting squid

*install_dir/frontier-cache/squid/bin/squidclient mgr:info* outputs operational information about your squid

---++ Access Log Growth Issue

With many active clients, it is still possible for the squid access.log to grow to unmanageable size. The squid will crash if it runs out of available diskspace. There are a couple ways to avoid this problem:

1) Make sure that you have the hourly.sh cron job enabled as described in the Set Up Cron Job section above to rotate the log when it grows over a size you choose.

2) The other possibility is to disable writing to access.log by putting the following in install_dir/frontier-cache/squid/etc/customize.sh:

*setoption("access_log", "none")*

and then do

*install_dir/frontier-cache/utils/bin/fn-local-squid.sh reload* to update squid.conf and load it if the squid is already running, otherwise just use start instead of reload.

The squid installation script has the access log turned on by default. It is recommended that a new installation be installed with it on, the functioning of the squid verified by reading the access log, then if disk space is limited, turn the access log off when the squid is in production. Even if you do turn the access log off, you should still run the daily.sh script once per day to rotate the other logs.

---++ Filedescriptors

At some installations with a very large number of worker nodes it may be possible to see error messages about running out of filedescriptors in your cache.log. It is easy to avoid this problem:

1) First, make sure your squid version is at least squid-2.7.X

2) As root, add the following line to /etc/security/limits.conf
<verbatim>
* - nofile 16384
</verbatim>

3) Reboot the machine.

You can check your file descriptor limit and usage by doing:

*install_dir/frontier-cache/squid/bin/squidclient mgr:info*

---++ Other ACL options

The default behavior is to allow the squid to be used for any destination. If you want to restrict the squid to be used only for CMS Conditions Data, then you simply have to add two lines to customize.sh that enable a couple of lines in squid.conf which are already there commented out:

<verbatim>
uncomment("acl RESTRICT_DEST")
uncomment("http_access deny RESTRICT_DEST")
</verbatim>

If for some reason you want to have a different destination or destinations you can use a regular expression, for example:

<verbatim>
setoptionparameter("acl RESTRICT_DEST", 3, "^(cmsfrontier.*|cernvm.*)\\.cern\\.ch$")
uncomment("http_access deny RESTRICT_DEST")
</verbatim>

Another possible configuration is to allow worker nodes at other sites to use your squid, although we discourage that because many worker nodes can use large amounts of bandwidth over the wide area network. If you still want to do it, it can be done by adding extra lines to your customize.sh. The order of these lines is important, so they need to be "anchored" to others, for example like this:
<verbatim>
insertline("acl NET_LOCAL", "acl T2FOO src x.x.x.x/x.x.x.x")
insertline("acl NET_LOCAL", "acl T2BAR src x.x.x.x/x.x.x.x")
insertline("http_access allow NET_LOCAL", "http_access allow T2FOO")
insertline("http_access allow NET_LOCAL", "http_access allow T2BAR")
</verbatim>

In addition, you have to make sure there are holes in any site or machine firewalls that allow these other worker nodes access to port 3128 on your squid.

The default configuration permits incoming accesses from any standard private network address 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16. To eliminate that default behavior add this line:
<verbatim>
commentout("allow localnet")
</verbatim>

Finally, the remote sites must make an appropriate addition to their site-local-config.xml.

If you modify customize.sh while the squid is running, remember to do a

*install_dir/frontier-cache/utils/bin/fn-local-squid.sh reload*

so that the changes get used.

---++ Personal Squid on a Desktop/Laptop

If you want to install a Frontier Squid on your personal Desktop, just follow the same instructions as under "Software" above, except:

You don't need a dbfrontier account. Your own account will work or any account but not root.<br />You may ignore any instructions about monitoring or registration.

During the ./configure step:

When it asks for the network, use 127.0.0.1/32<br />When it asks for memory, use something modest like 128<br />When it asks for disk space, also use something modest like 5000.

In your site-local-config.xml add the local proxy
<verbatim>
<frontier-connect>
<proxy url="http://localhost:3128"/>
</verbatim>Remember, after you do the installation, you have to do a manual start of the squid.

The only thing that has to be done as root is the automatic start on boot.

---+++ Laptop

For a laptop, just follow the same instructions as for a Desktop, plus a few extra things since a laptop is turned on and off frequently, and might want to run without a network connection.

In the file .../frontier-cache/utils/bin/fn-local-squid.sh find the lines:
<verbatim>
start()
{
if cleancache; then
start_squid
fi
</verbatim>and comment out the call to cleancache <verbatim>
start()
{
# if cleancache; then
start_squid
# fi
</verbatim>

Then add this line to customize.sh
<verbatim>
setoption("offline_mode", "on")
</verbatim>Of course, any changes to the underlying database while this is set won't automatically be noticed, even if you are connected to the network. To manually refresh the cache while squid is stopped use the cleancache command as described above. Caution: Sometimes the squid disk cache can get corrupted, such as by not shutting the squid down cleanly. If that happens and the squid won't start, you'll need to clean the cache but you won't be able to run without reconnecting to the network to reload the cache.

As an alternative to offline_mode on you could instead use
<verbatim>
setoption("cachemgr_passwd", "none offline_toggle")
</verbatim>

Now you can switch back and forth between offline (offline_mode on) and online (offline_mode off) just by doing:
<verbatim>
install_dir/frontier-cache/squid/bin/squidclient mgr:offline_toggle
</verbatim>where install_dir is wherever you put it.

Another possibility is to comment out the offline_mode on line that you added above to customize.sh, and only uncomment it (and tell squid to reload) when you want to run without a network connection, or use offline_toggle.

As always, if you have any problems, please e-mail: =cms-frontier-support@cern.ch=

%RESPONSIBLE% DaveDykstra
>
>
For rapid response to configuration questions, send e-mail to: cms-frontier-support@cern.ch or atlas-frontier-support@cern.ch.

Hardware

The first step is to decide what hardware you want to run the squid cache server on. These are some FAQs.

1) Do I need to dedicate a node to squid and only squid?

This is up to you. It is strongly recommended. It depends on how many jobs try to access the squid simultaneously and what else the machine is used for (see question 2). Large sites may need more than one squid (see question 4). The node needs to have network access to the internet, and be visible to the worker nodes. Virtual machines can help isolate other uses of a physical machine, but they don't isolate disk and especially network usage.

2) What hardware specs (CPU, memory, disk cache)?

For most purposes 2 cores at 2 GHz, 2 GB memory, and 100 GB for the disk cache should be adequate. This excludes the space needed for log files, which is determined by how heavily the system is used and what the cleanup schedule is. The default in the rpm always rotates the logs every day and removes them after 10 rotations, and once an hour it will also rotate if the log is bigger than 1 GB. On heavily used systems the default might keep logs for too short a time (less than a day), however, so it's better to change the default and allow at least 50 GB for logs. From what we have seen, the most critical resource is the memory. If the machine serves other purposes, make sure the other tasks don't use up all the memory. Squid runs as a single thread, so if that is the only use of the machine, having more than 2 cores is a waste. You should also avoid network filesystems such as AFS and NFS for the disk cache.

Here is a description of squid memory usage: If you have a decent amount of spare memory, the kernel will use that as page cache, so there's a good chance that frequently-requested items will, in fact, be served from RAM (via the page cache) even if it's not squid's RAM. There's also a design bottleneck in squid that limits CPU efficiency of large cache_mem objects, so resist the urge to give squid all your available memory. Let cache_mem handle your small objects and the kernel handle the larger ones.

3) What network specs?

The latencies will be lower to the worker nodes if you have a large bandwidth. The network is almost always the bottleneck for this system, so at least a gigabit each is highly recommended. If you have many job slots, 2 bonded gigabit network connections is even better, and squid on one core of a modern CPU can pretty much keep up with 2 gigabits. Squid is single-threaded so if you're able to supply more than 2 gigabits, multiple squid processes on the same machine need to be used to serve the full throughput. This is supported in the frontier-squid package (instructions below) but each squid needs its own memory and disk space.

4) How many squids do I need?

Sites with over 500 job slots should have at least 2 squids for reliability. We currently estimate that sites should have one gigabit on a squid per 1000 grid job slots. A lot depends on how quickly jobs start; an empty batch queue that suddenly fills up will need more squids. The number of job slots that can be safely handled per gigabit increases as the number of slots increase because the chances that they all start at once tends to go down.

5) How should squids be load-balanced?

There are many ways to configure multiple squids: round-robin DNS, load-balancing networking hardware, LVS, etc. The simplest thing to do is just set up two or more squid machines independently and let Frontier handle it by making a small addition to the frontier client configuration (see below under Multiple Squid Servers). If there are many thousands of job slots, hardware-based load balancers can be easily overloaded, so DNS-based or client-based load balancing will probably be called for.

Software

Preparation

By default the frontier-squid rpm installs files with a "squid" user id and group. If they do not exist, the rpm will create them. If your system has its own means of creating logins you should create the login and group before installing the rpm. If you want the squid process to use a different user id (historically it has been "dbfrontier"), then before installing the rpm create the file /etc/squid/squidconf with the following contents:
    export FRONTIER_USER=dbfrontier
    export FRONTIER_GROUP=users
where you can fill in whichever user and group id you choose.

Installation

First, if you do not installed any frontier rpm before, execute the following command as the root user:

    # wget -O /etc/yum.repos.d/cern-frontier.repo http://frontier.cern.ch/dist/rpms/cern-frontier.repo

If the directory you are installing into does not yet contain customize.sh, you will also be prompted for the old installation path. If customize.sh is found in the old installation path, it will be copied into the source directory and the configure step will be finished. If you like, you can avoid the first two questions by passing "--prefix=/install_dir" and "--oldprefix=/oldinstall_dir" parameters to ./configure. If you have not previously installed a release that supports customize.sh, you will be asked a few additional questions about basic configuration parameters.

On prompt enter network/netmask which is allowed to access the Squid.

Examples: 131.154.184.0/255.255.255.0 or 131.154.0.0/255.255.0.0

The script does allow to specify many subnets - just separate them by a blank. If you just hit enter, the standard private network addresses 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16 will be allowed. NOTE: The default behavior is to allow any IP address you specify here to use the squid to cache objects coming from any destination address. If you would like a more restrictive policy or other options please see the section below on other ACL options.

On prompt enter the amount of cache memory (in MB) the squid should use. This should be at most 1/8 of your hardware memory. Probably 128 MB would be fine, leaving a lot of memory for disk buffering by the OS because squid performs better for large objects out of the disk cache than the memory cache.

On prompt enter the amount of disk space (in MB) the squid should use for a cache. One suggestion is to set this size at 70% of the available space in your disk partition to allow room for the executables, log files, etc. It should be at least 20000.

You can double check your responses to the prompts by reading Makefile.conf.inc and edit them there before running make if you wish.

Then do:

make

make install

After that you should examine /install_dir/frontier-cache/squid/etc/customize.sh and make any changes or other customizations you want to. For details on the editing functions available see /install_dir/frontier-cache/squid/etc/customhelps.awk.

Manual Control of the Server

To do a manual start/stop of the server (as user dbfrontier):

/install_dir/frontier-cache/utils/bin/fn-local-squid.sh start

You can also stop it if you need to:

/install_dir/frontier-cache/utils/bin/fn-local-squid.sh stop

Remember to start your server after you have installed it.

Setup for different install directories with each release

If you choose to use a different install directory for each release, do the following extra things:

  • Create a symbolic link at a place you will re-use for each new installation, and use that for the cron job described in the next section and in /etc/init.d/frontier-squid.sh described in the following two sections, so those don't need to be reinstalled for every release.
  • Either remember to clean out the old installation's disk cache (in /install_dir/frontier-cache/squid/var/cache) and logs (in /install_dir/frontier-cache/squid/var/logs) each time, or (better) edit customize.sh to set the cache_log, pid_filename, and coredump_dir options, the second parameter of the cache_dir option, and the first parameter of the access_log option to use common directories that you re-use for each new installation. This has the added advantage of not requiring a lot of disk space where you install the software, but rather where you choose to put the cache and logs. For example:
      setoptionparameter("cache_dir", 2, "/data/squid_cache")
      setoptionparameter("access_log", 1, "/data/squid_logs/access.log")
      setoption("cache_log", "/data/squid_logs/cache.log")
      setoption("pid_filename", "/data/squid_logs/squid.pid")
      setoption("coredump_dir", "/data/squid_cache")

Set Up Cron Job

As user dbfrontier, set up cron jobs to rotate the logs, with crontab entries like this:

7 7 * * * /install_dir/frontier-cache/utils/cron/daily.sh >/dev/null 2>&1
8 * * * * /install_dir/frontier-cache/utils/cron/hourly.sh >/dev/null 2>&1

You could get the above crontab by doing (with the appropriate value of install_dir)

crontab /install_dir/frontier-cache/utils/cron/crontab.dat

You can change the hour and minute as you like, but keep hourly.sh one minute after daily.sh, and avoid multiples of 5 for the minute because that can interfere with the monitoring probes, which happen every 5 minutes. The hourly.sh script will rotate the logs if access.log goes over a given size, 1GB by default. You can change that value by setting the environment variable LARGE_ACCESS_LOG to a different number of bytes. For example, for 10GB you can use:

8 * * * * LARGE_ACCESS_LOG=10000000000 /install_dir/frontier-cache/utils/cron/hourly.sh >/dev/null 2>&1

In order to estimate disk usage, note that up to 11 access.log files are kept at a time, and the size can go a bit above the $LARGE_ACCESS_LOG size because the cron job only checks once per hour. If disk space for the logs is a concern see the section on the Access Log Growth Issue below.

As Root, Set Up Start at Boot Time

(This is the only step to be done as root.)

As root, copy install_dir/frontier-cache/utils/init.d/frontier-squid.sh into /etc/init.d.

Then, after the copy, run:

/sbin/chkconfig --add frontier-squid.sh
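As a concrete sketch of those two root steps (using the /install_dir placeholder from this page):

cp /install_dir/frontier-cache/utils/init.d/frontier-squid.sh /etc/init.d/
/sbin/chkconfig --add frontier-squid.sh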

Set Up CMS Working Environment

Here is information about how to access CMS conditions data (@T0) by means of Frontier.

For the site frontier configuration, computing site responsibles should pick up the xml fragment for calib-data from this file:

site-local-config.xml_sample

In the frontier-connect section include a line like:

<proxy url="http://localcmsproxy1:3128"/> 
with localcmsproxy1 set to the correct local proxy value

Note: the default working port of the squid is 3128/tcp

Insert this xml fragment into the existing site-local-config.xml and commit it into the computing CVS: CMSSW/COMP/SITECONF/${SITE}/JobConfig

User's guide (with much more detail) about site-local-config.xml is in:

https://twiki.cern.ch/twiki/bin/view/CMS/SWIntTrivial#SiteLocalConfig

Here is a nice example of a <calib-data> section in a site-local-config.xml:

  <calib-data>
   <frontier-connect>
     <proxy url="http://io.hep.kbfi.ee:3128" />
     <server url="http://cmsfrontier.cern.ch:8000/FrontierInt" />
     <server url="http://cmsfrontier1.cern.ch:8000/FrontierInt" />
     <server url="http://cmsfrontier2.cern.ch:8000/FrontierInt" />
     <server url="http://cmsfrontier3.cern.ch:8000/FrontierInt" />
   </frontier-connect>
  </calib-data>

Multiple Squid Servers

If you have more than one squid server and you want Frontier to do the load balancing, set up each squid independently and simply add a proxy line for each squid plus one extra load balance line to site-local-config.xml. (You should only do this if all the squids are in the same location.)

<load balance="proxies"/>
<proxy url="http://localcmsproxy1:3128"/> 
<proxy url="http://localcmsproxy2:3128"/> 
<proxy url="http://localcmsproxy3:3128"/> 

Testing Your Installation

Download the following python script, fnget.py (right-click on the link and save the file as fnget.py).

Test access to the Frontier server at CERN with the following commands:

chmod +x fnget.py #(only first time)

./fnget.py --url=http://cmsfrontier.cern.ch:8000/Frontier/Frontier --sql="select 1 from dual"

This should be the response:

Using Frontier URL:  http://cmsfrontier.cern.ch:8000/Frontier/Frontier
Query:  select 1 from dual
Decode results:  1
Refresh cache:  0

Frontier Request:
http://cmsfrontier.cern.ch:8000/Frontier/Frontier?type=frontier_request:1:DEFAUL
T&encoding=BLOBzip&p1=eNorTs1JTS5RMFRIK8rPVUgpTcwBAD0rBmw_

Query started:  05/12/09 13:46:50 EDT
*WARNING:* no timeout available in python older than 2.4
Query ended:  05/12/09 13:46:50 EDT
Query time: 0.64064002037 [seconds]

Query result:
<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE frontier SYSTEM "http://frontier.fnal.gov/frontier.dtd">
<frontier version="3.17" xmlversion="1.0">
 <transaction payloads="1">
  <payload type="frontier_request" version="1" encoding="BLOBzip">
   <data>eJxjY2BgYDRkA5JsfqG+Tq5B7GxgEXYAGs0CVA==</data>
   <quality error="0" md5="5544fd3e96013e694f13d2e13b44ee3c" records="1" full_si
ze="25"/>
  </payload>
 </transaction>
</frontier>


Fields:
     1     NUMBER

Records:
     1

This will return whatever you type in the select statement; for example, change 1 to 'hello'. The "dual" table is a special debugging feature of Oracle that just returns what you send it.
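For example, to see that echo behavior against the same server:

./fnget.py --url=http://cmsfrontier.cern.ch:8000/Frontier/Frontier --sql="select 'hello' from dual"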

Now test your squid,

export http_proxy=http://your.squid.url:3128

and perform the test again. It should pass through your squid and cache the response. To see if it worked, look at the squid access log (the following is excerpted from the access.log file, usually in squid/var/logs):

128.220.233.179 - - [12/May/2009:08:33:17 +0000] "GET http://cmsfrontier.cern.ch
:8000/Frontier/Frontier?type=frontier_request:1:DEFAULT&encoding=BLOBzip&p1=eNor
Ts1JTS5RMFRIK8rPVUgpTcwBAD0rBmw= HTTP/1.0" 200 810 TCP_MISS:DIRECT 461 "fnget.py
 1.5"
128.220.233.179 - - [12/May/2009:08:33:19 +0000] "GET http://cmsfrontier.cern.ch
:8000/Frontier/Frontier?type=frontier_request:1:DEFAULT&encoding=BLOBzip&p1=eNor
Ts1JTS5RMFRIK8rPVUgpTcwBAD0rBmw= HTTP/1.0" 200 809 TCP_MEM_HIT:NONE - "fnget.py
1.5"

Notice that the second entry has "TCP_MEM_HIT"; that means the object was cached in memory. Any subsequent requests for this object will come from the squid cache.

Another possibility for testing your squid is to run the SAM squid test on a worker node by hand.

Register Your Server

To register, please submit a bug report to

http://savannah.cern.ch/bugs/?func=additem&group=frontier

with the following information:

  • Site - Site name
  • Tier - Tier level
  • location - Institution
  • CE - Node we submit grid test jobs to
  • Contact - Contact person's name
  • email - Contact's email address
  • ip/mask - CE node addresses as seen on the WAN
  • Squid Node - Name of the squid node for monitoring
  • Software - Which tarball or RPM was used for the installation

Tier-3 sites should also register.

Monitoring

The functionality of your squid should be monitored from CERN and Fermilab using SNMP.

To enable this, your site should open port 3401/udp to requests from: 128.142.202.212/255.255.255.255 (or preferably cmsdbsfrontier.cern.ch if you can use a DNS name) and 131.225.240.232/255.255.255.255. The former is the main site, and the latter is a backup site at Fermilab.
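If your site manages its firewall with iptables directly, a hedged sketch of the corresponding rules (adjust to your local firewall tools and policies) would be:

iptables -A INPUT -p udp --dport 3401 -s 128.142.202.212 -j ACCEPT
iptables -A INPUT -p udp --dport 3401 -s 131.225.240.232 -j ACCEPT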

The main monitoring site is at http://frontier.cern.ch/squidstats/.

SELinux

SELinux on RHEL 5 does not give the proper context to the default SNMP port (3401) (as of selinux-policy-2.4.6-106.el5). The following command (as root)

semanage port -a -t http_cache_port_t -p udp 3401

takes care of this problem.
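To verify that the label was added, you can list the SELinux port types, for example:

semanage port -l | grep http_cache_port_t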

Some Useful Commands

install_dir/frontier-cache/utils/bin/fn-local-squid.sh with any parameter or no parameter will recreate squid.conf after changing customize.sh

install_dir/frontier-cache/squid/sbin/squid -k parse will just read squid.conf to see if it makes sense

install_dir/frontier-cache/utils/bin/fn-local-squid.sh reload sends a HUP signal and has squid reread squid.conf

install_dir/frontier-cache/utils/bin/fn-local-squid.sh status checks if squid is running

install_dir/frontier-cache/utils/bin/fn-local-squid.sh restart stops squid and starts squid without clearing the cache

install_dir/frontier-cache/utils/bin/fn-local-squid.sh cleancache deletes and recreates the cache, like a start does, but without starting squid

install_dir/frontier-cache/squid/bin/squidclient mgr:info outputs operational information about your squid

Access Log Growth Issue

With many active clients it is still possible for the squid access.log to grow to an unmanageable size. The squid will crash if it runs out of available disk space. There are a couple of ways to avoid this problem:

1) Make sure that you have the hourly.sh cron job enabled as described in the Set Up Cron Job section above to rotate the log when it grows over a size you choose.

2) The other possibility is to disable writing to access.log by putting the following in install_dir/frontier-cache/squid/etc/customize.sh:

setoption("access_log", "none")

and then do

install_dir/frontier-cache/utils/bin/fn-local-squid.sh reload to update squid.conf and load it if the squid is already running; otherwise just use start instead of reload.

The squid installation script has the access log turned on by default. It is recommended to install a new squid with the access log on, verify that the squid is functioning by reading the access log, and then, if disk space is limited, turn the access log off once the squid is in production. Even if you do turn the access log off, you should still run the daily.sh script once per day to rotate the other logs.

Filedescriptors

At installations with a very large number of worker nodes it is possible to see error messages about running out of filedescriptors in your cache.log. It is easy to avoid this problem:

1) First, make sure your squid version is at least squid-2.7.X

2) As root, add the following line to /etc/security/limits.conf

* - nofile 16384

3) Reboot the machine.

You can check your file descriptor limit and usage by doing:

install_dir/frontier-cache/squid/bin/squidclient mgr:info
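For example, to pull just the file descriptor lines out of that output:

install_dir/frontier-cache/squid/bin/squidclient mgr:info | grep -i 'file desc'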

Other ACL options

The default behavior is to allow the squid to be used for any destination. If you want to restrict the squid to be used only for CMS Conditions Data, then you simply have to add two lines to customize.sh that enable a couple of lines in squid.conf which are already there commented out:

 uncomment("acl RESTRICT_DEST")
 uncomment("http_access deny !RESTRICT_DEST")

If for some reason you want to have a different destination or destinations you can use a regular expression, for example:

 setoptionparameter("acl RESTRICT_DEST", 3, "^(cmsfrontier.*|cernvm.*)\\.cern\\.ch$")
 uncomment("http_access deny !RESTRICT_DEST")

Another possible configuration is to allow worker nodes at other sites to use your squid, although we discourage that because many worker nodes can use large amounts of bandwidth over the wide area network. If you still want to do it, it can be done by adding extra lines to your customize.sh. The order of these lines is important, so they need to be "anchored" to others, for example like this:

 insertline("acl NET_LOCAL", "acl T2FOO src x.x.x.x/x.x.x.x")
 insertline("acl NET_LOCAL", "acl T2BAR src x.x.x.x/x.x.x.x")
 insertline("http_access allow NET_LOCAL", "http_access allow T2FOO")
 insertline("http_access allow NET_LOCAL", "http_access allow T2BAR")

In addition, you have to make sure there are holes in any site or machine firewalls so that these other worker nodes can reach port 3128 on your squid.
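If you use iptables, a hedged sketch of such a hole (192.0.2.0/24 is a hypothetical placeholder for the remote site's worker node subnet) is:

iptables -A INPUT -p tcp --dport 3128 -s 192.0.2.0/24 -j ACCEPT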

The default configuration permits incoming accesses from any standard private network address 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16. To eliminate that default behavior add this line:

 commentout("allow localnet")

Finally, the remote sites must make an appropriate addition to their site-local-config.xml.

If you modify customize.sh while the squid is running, remember to do a

install_dir/frontier-cache/utils/bin/fn-local-squid.sh reload

so that the changes get used.

Personal Squid on a Desktop/Laptop

If you want to install a Frontier Squid on your personal Desktop, just follow the same instructions as under "Software" above, except:

You don't need a dbfrontier account; your own account, or any account other than root, will work.
You may ignore any instructions about monitoring or registration.

During the ./configure step:

When it asks for the network, use 127.0.0.1/32
When it asks for memory, use something modest like 128
When it asks for disk space, also use something modest like 5000.

In your site-local-config.xml add the local proxy

    <frontier-connect>
       <proxy url="http://localhost:3128"/>
Remember, after you do the installation, you have to do a manual start of the squid.

The only thing that has to be done as root is the automatic start on boot.

Laptop

For a laptop, just follow the same instructions as for a Desktop, plus a few extra things since a laptop is turned on and off frequently, and might want to run without a network connection.

In the file .../frontier-cache/utils/bin/fn-local-squid.sh find the lines:

start()
{
 if cleancache; then
  start_squid
 fi
and comment out the call to cleancache
start()
{
# if cleancache; then
  start_squid
# fi

Then add this line to customize.sh

 setoption("offline_mode", "on")
Of course, any changes to the underlying database while this is set won't automatically be noticed, even if you are connected to the network. To manually refresh the cache while squid is stopped use the cleancache command as described above. Caution: Sometimes the squid disk cache can get corrupted, such as by not shutting the squid down cleanly. If that happens and the squid won't start, you'll need to clean the cache but you won't be able to run without reconnecting to the network to reload the cache.

As an alternative to offline_mode on you could instead use

 setoption("cachemgr_passwd", "none offline_toggle")

Now you can switch back and forth between offline (offline_mode on) and online (offline_mode off) just by doing:

install_dir/frontier-cache/squid/bin/squidclient mgr:offline_toggle
where install_dir is wherever you put it.

Another possibility is to comment out the offline_mode on line that you added above to customize.sh, and only uncomment it (and tell squid to reload) when you want to run without a network connection, or use offline_toggle.

%RESPONSIBLE% DaveDykstra
