<!-- * Set ALLOWTOPICCHANGE = Main.DepITGMDMGroup -->
---+ File Transfer Service support area

The purpose of this page is to keep track of the problems and support requests posted to GGUS. This page is relevant to the gLite FTS 1.4.1 and FTS 1.5 releases, and most of its entries also apply to the FTS 2.0 release.

Todo: split these out onto separate pages, at least for FTS 2.0.

---++ Summary

---++++ Configuration
   * [[#YaimConfig][YAIM Configuration explained]]

---++++ File Transfer Service
   * [[#MyDNChanged][My DN changed. Could you please grant me the same privileges I had before?]]
   * [[#NotAuthorizedToSubmit][I tried to submit a job and it said: submit: You are not authorised to submit jobs to this service]]
   * [[#MonoDirectionalChannel][I submitted a job from site X to Y but it didn't work. The channel Y-X exists and has a share for my VO!]]
   * [[#WhichSurlFormat][Which format should I use for the SURLs?]]
   * [[#InvalidEndpoint][I've tried to submit a job but I get back an error saying: SOAP-ENV:Server.userException - org.xml.sax.SAXException]]
   * [[#NoMatch][I've tried to submit a job but I get back an error saying: No match]]
   * [[#NotAuthorizedToGetChannel][I was able to list the channels but I cannot get the channel details]]
   * [[#NotDedicatedChannel][How do I set up a non-dedicated Channel?]]
   * [[#NoChannelFound15][After upgrading to !FTS 1.5 I got "No Channel found or VO not authorized ..." error]]
   * [[#ShortLivedproxies][My jobs fail if I have a short time left on the proxy in !MyProxy]]

---++++ File Transfer Agent
   * [[#AlwaysSubmitted][Job always in Submitted state]]
   * [[#AlwaysPending][Job always in Pending state]]
   * [[#SecurityError][All my transfers fail with a SECURITY_ERROR]]
   * [[#WhichMyProxy][Which !MyProxy Server is used?]]
   * [[#MyProxyBindError]["Error in bind()" from MyProxy server]]
   * [[#CannotGetAgentDN][I've noticed a warning "Cannot Get Agent DN" in the agent log files]]
   * [[#SrmCopyMalformedUrl][My srmcopy transfers fail with a dCache MalformedUrl exception]]
   * [[#DCacheSrmCopyUrl][I've upgraded to 1.4.1 but srmcopy doesn't seem to work]]
   * [[#PingNull][I've upgraded to 1.4.1 but the transfer failed with Error in srm__ping: NULL]]
   * [[#NoSiteFoundForHost][The transfer failed with the error: No site found for host ...]]
   * [[#EndOfFileReached][The transfer failed with the error: an end-of-file was reached]]
   * [[#WhichServiceTypes][Which Service Types are used?]]
   * [[#LastHope][I've tried everything, and it still doesn't seem to work]]

---++++ Channel Administration
   * [[#NFilesForVO][How do I set the number of files transferred per VO instead of per channel?]]

---++ Configuration

#YaimConfig
---+++ YAIM Configuration explained

Starting from FTS version 1.5, the configuration has moved from the gLite python configuration script to !YAIM (you can find an example in =/opt/glite/yaim/example/site-info.def=). For the !YAIM details, please refer to the related documentation. The relevant parts for us are the !FTS and the !FTA ones. See [[LCG.FtsYaimValues15][FtsYaimValues15]]

---++ File Transfer Service

#MyDNChanged
---++++ My DN changed. Could you please grant me the same privileges I had before?

When a user's !DN changes, for example because of the change of the !CERN !CA, all his/her privileges on the !FTS !Server should be updated.
If the old certificate is still valid, the user can perform this operation on his/her own, without the help of the !FTS administrator. To do that, the user has to execute the following steps with a valid proxy generated from the old certificate:
   * Invoke =glite-transfer-getroles= to retrieve the list of privileges
   * For each channel he/she has the management privileges on, execute
<verbatim>
glite-transfer-channel-addmanager CHANNEL_NAME NEW_DN
</verbatim>
   * For each !VO he/she has the management privileges on, execute
<verbatim>
glite-transfer-addvomanager VO_NAME NEW_DN
</verbatim>

In case the old user's certificate has expired, the !FTS administrator has to list all managers of all the channels (=glite-transfer-channel-listmanagers=) and !VOs (=glite-transfer-listvomanagers=) and then execute =glite-transfer-channel-addmanager= and =glite-transfer-addvomanager= as above.

The user can then check that the privileges are correct by executing =glite-transfer-getroles= with a proxy generated from the new certificate.

In case the user is also an !FTS administrator, the file =/opt/glite/etc/glite-transfer-admin-mapfile= should be manually modified on every node where the !FTS-WS is installed and a new entry corresponding to the new !DN should be added.
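
The grant steps above can be prepared in bulk once the channel and !VO names have been read off the =glite-transfer-getroles= output. The following is a minimal sketch in Python, not an official tool: the helper name =build_grant_commands= and the sample values in the usage note are hypothetical, and only the two command names come from this page. It only builds the command lines, so that they can be reviewed before being run with a proxy from the old certificate.

```python
import shlex


def build_grant_commands(channels, vos, new_dn):
    """Return the command lines that grant new_dn the listed manager roles.

    channels and vos are assumed to have been copied by hand from the
    output of glite-transfer-getroles; that output is not parsed here.
    """
    commands = []
    for channel in channels:
        # One addmanager call per channel the old DN manages
        commands.append(shlex.join(
            ["glite-transfer-channel-addmanager", channel, new_dn]))
    for vo in vos:
        # One addvomanager call per VO the old DN manages
        commands.append(shlex.join(
            ["glite-transfer-addvomanager", vo, new_dn]))
    return commands
```

For example, =build_grant_commands(["CERN-RAL"], ["dteam"], "/DC=ch/DC=cern/CN=Jane Doe")= returns two lines, with the DN quoted for the shell because it contains a space. The channel, !VO and DN in this call are made-up placeholders.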
When the old certificate expires or is no longer needed, the user should remove the privileges granted to the old !DN by executing the following commands with a proxy generated from the new certificate:
   * Invoke =glite-transfer-getroles= to retrieve the list of privileges
   * For each channel he/she has the management privileges on, execute
<verbatim>
glite-transfer-channel-removemanager CHANNEL_NAME OLD_DN
</verbatim>
   * For each !VO he/she has the management privileges on, execute
<verbatim>
glite-transfer-removevomanager VO_NAME OLD_DN
</verbatim>

#NotAuthorizedToSubmit
---++++ Symptom: I tried to submit a job and it said: =submit: You are not authorised to submit jobs to this service=

The user is not authorised to submit jobs to the FTS service. To authorize him/her, you have to add his/her DN to the =submit-mapfile= on the FTS server. Have a look at the =Mapfile= section of [[LCG.FtsServerInstall13][FtsServerInstall]] and at [[LCG.FtsServerSubmitMapfile][FtsServerSubmitMapfile]].

However, due to a bug in the FTS ([[http://savannah.cern.ch/bugs/?func=detailitem&item_id=10362][#10362]]), if the user has a proxy that has been delegated twice or more (i.e. the DN ends with =/CN=proxy/CN=proxy=), a parsing error will cause an authorization denial. This bug has been solved in FTS version 1.4 and in the latest !QuickFix for 1.3.

If the user is still not authorized to submit requests, check that his/her DN is not in the =veto-mapfile=.

#MonoDirectionalChannel
---++++ Symptom: I submitted a job from site X to Y but it didn't work. The channel Y-X exists and has a share for my VO!

From version 1.3 onwards the channel definitions are mono-directional. You have to create another channel in the opposite direction (=glite-transfer-channel-add=), set the share for the VO interested in using the channel (=glite-transfer-channel-setvoshare=) and install a !Channel !Agent that will manage it.

#WhichSurlFormat
---++++ Which format should I use for the SURLs?
Starting from gLite 1.4.1, the FTA implements the enhancement request [[http://savannah.cern.ch/bugs/?func=detailitem&item_id=8364][#8364]], which allows a user to specify any format he/she prefers: before transferring a file or registering it into the catalog, the agent converts each SURL to either a fully qualified format
<verbatim>
srm://<host>:<port>/srm/managerv1?SFN=<file_path>
</verbatim>
or a compact one
<verbatim>
srm://<host>/<file_path>
</verbatim>
depending on the configuration. By default it uses the compact format. To change this behaviour, set the related !ChannelAgent configuration parameter =ACTIONS_SURLNORMALIZATION= (=transfer-agent-channel-actions.SurlNormalization=) to one of the following values:
   * =compact= all the SURLs will be converted to the format:
<verbatim>
srm://<host>/<file_path>
</verbatim>
   * =compact-with-port= all the SURLs will be converted to the format:
<verbatim>
srm://<host>:<port>/<file_path>
</verbatim>
   * =fully-qualified= all the SURLs will be converted to the format:
<verbatim>
srm://<host>:<port>/srm/managerv1?SFN=<file_path>
</verbatim>
   * =disabled= no SURL conversion will be performed

If you're using a previous version, for interoperability reasons we suggest using fully qualified SURLs, i.e. in the format
<verbatim>
srm://<srm_host>:<srm_port>/srm/managerv1/?SFN=<file_path>
</verbatim>
If you know the type of the SRM that will be involved in the transfer, you can also specify one of the supported compact formats. For !Castor, for example, you can use
<verbatim>
srm://<castorsrm>:8443/srm/managerv1?SFN=<file_path>
srm://<castorsrm>:8443//srm/managerv1?SFN=<file_path>
srm://<castorsrm>:8443/?SFN=<file_path>
srm://<castorsrm>:8443/<file_path>
srm://<castorsrm>/<file_path>
</verbatim>
In case the transfer is processed by a channel configured to use =srmcopy=, the fully qualified format may not work.
Please have a look [[#SrmCopyMalformedUrl][here]] for a workaround.

#InvalidEndpoint
---++++ Symptom: I've tried to submit a job but I get back an error saying: SOAP-ENV:Server.userException - org.xml.sax.SAXException

Usually this issue is related to an endpoint pointing to the wrong server (typically =ChannelManagement= instead of =FileTransfer=): when you observe an error similar to
<verbatim>
submit: SOAP fault: SOAP-ENV:Server.userException - org.xml.sax.SAXException: Deserializing parameter 'job': could not find deserializer for type {http://transfer.data.glite.org}TransferJob
</verbatim>
please ask the user to look at the command he/she just ran and to check that the specified endpoint is correct; all the CLI commands that start with =glite-transfer-channel-*= require a =ChannelManagement= interface, while the ones that start with =glite-transfer-*= require the =FileTransfer= interface. To check whether the endpoint is correct, the user can also re-run the command with the =-v= option and check whether the line =Using Endpoint= ends with =FileTransfer= or =ChannelManagement=.

#NoMatch
---++++ Symptom: I've tried to submit a job but I get back an error saying: No match

When users submit a transfer job, they usually specify some SURLs that may contain a question mark (=?=).
In some shells this character has to be escaped by simply quoting it (='?'=): for example, if the SURLs are
<verbatim>
srm://castorgridsc.cern.ch:8443/srm/managerv1?SFN=/castor/cern.ch/grid/dteam/src_file
srm://castorgridsc.cern.ch:8443/srm/managerv1?SFN=/castor/cern.ch/grid/dteam/dst_file
</verbatim>
please make sure you run =glite-transfer-submit= in this way
<verbatim>
glite-transfer-submit \
    srm://castorgridsc.cern.ch:8443/srm/managerv1'?'SFN=/castor/cern.ch/grid/dteam/src_file \
    srm://castorgridsc.cern.ch:8443/srm/managerv1'?'SFN=/castor/cern.ch/grid/dteam/dst_file
</verbatim>

#NotAuthorizedToGetChannel
---++++ Symptom: I was able to list the channels but I cannot get the channel details

Listing channels is open to any user as long as he/she is not in the veto mapfile; you only get the channel names from this call. However, getting the details of a channel (source, destination, bandwidth, etc.) is restricted. For this you need to be:
   * an admin
   * manager of the channel being queried
   * manager of any VO on the given FTS

You can check your roles on a given FTS by running =glite-transfer-getroles=. Information on channel and VO managers can be managed by a service admin or other managers by using the appropriate client tools. Information on service admins is stored inside the admin-mapfile.

#NotDedicatedChannel
---++++ How do I set up a non-dedicated Channel?

Non-dedicated channels (a.k.a. "catch-all" channels) are a special channel configuration that allows matching any site as source or destination, and are therefore not coupled with the underlying network. Using "catch-all" channels limits the number of channels you need to manage, but also limits the degree of control you have over what is coming into your site (although it still provides the other advantages like queueing, policy enforcement and error recovery).
The usage of these channels is mainly recommended at Tier1 sites for providing full connectivity to all other sites, where the suggested channel definition is:
   * Dedicated channels from any other Tier1 to the T1
   * Non-dedicated channels to each of the related Tier2
   * A non-dedicated channel to the T1

You can set up a non-dedicated channel that will manage all the transfers from any site to your site by issuing a =glite-transfer-channel-add= and using =*= as the source site name, like:

=glite-transfer-channel-add -f NUM_OF_FILES -S CHANNEL_STATE [...] CHANNEL_NAME "*" YOUR_SITE=

Of course, you then have to issue a =glite-transfer-channel-setvoshare= for each !VO that should be authorized to use the channel and then configure a !ChannelAgent for that channel.

Please note that if a !VO is not authorized to use a channel between site =A= and =B= but has privileges on a =*-B= channel, transfer requests for that !VO from site =A= to =B= are denied, since the non-dedicated channel is evaluated _after_ all the dedicated ones.

In addition, please also note that the default !ChannelAgent configuration for that channel requires that all the SRMs involved in the managed transfers be listed in the information system.
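
The evaluation order described above (dedicated channels first, catch-all channels only afterwards) can be illustrated with a short sketch in Python. This is not the FTS implementation: the function name =find_channel=, the dictionary layout and the order in which the different catch-all patterns are tried are all assumptions made for the example.

```python
def find_channel(channels, source, dest, vo):
    """Pick the channel for a source -> dest transfer, or None if denied.

    channels maps (source_site, dest_site) to the set of VOs with a share;
    "*" stands for the catch-all side of a non-dedicated channel.
    """
    # Dedicated channels are evaluated first. If one exists for this pair,
    # the decision is taken here and catch-all channels are not consulted.
    dedicated = channels.get((source, dest))
    if dedicated is not None:
        return (source, dest) if vo in dedicated else None
    # Assumed order among catch-all channels; this page does not specify it.
    for key in (("*", dest), (source, "*"), ("*", "*")):
        vos = channels.get(key)
        if vos is not None and vo in vos:
            return key
    return None
```

With a dedicated =A-B= channel that only has a share for one !VO and a =*-B= channel open to a second one, a request from =A= to =B= by the second !VO returns =None= (denied), while the same !VO transferring from any other site to =B= is matched by the =*-B= channel.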
In case a !VO needs to relax this constraint, for example in order to transfer files to/from !Classic !SEs not included in the information system, the following parameters should be added to the !VOAgent configuration:
   * =ACTIONS_ENABLEUNKNOWNSOURCE= (=transfer-agent-vo-actions.EnableUnknownSource=) should be set to =true= if !SEs not known to the !InfoSys should be allowed as valid sources (these would be matched by the =*-Site= catch-all channels)
   * =ACTIONS_ENABLEUNKNOWNDEST= (=transfer-agent-vo-actions.EnableUnknownDest=) should be set to =true= if !SEs not known to the !InfoSys should be allowed as valid destinations (these would be matched by the =Site-*= catch-all channels)

In case a !VO needs these parameters, it would be better to turn off the [[#WhichSurlFormat][SURL Normalization]], or at least set it to =fully-qualified=, for all the !ChannelAgents associated to non-dedicated channels, since it would be impossible to resolve the correct endpoint for SRMs not listed in the !InformationSystem. It is also worth recommending that users use fully-qualified SURLs for transfers that should be processed through these channels.

*Use of the =*-*= 'catch everything' channel is not recommended for production grids*.

#NoChannelFound15
---++++ Symptom: After upgrading to !FTS 1.5 I got "No Channel found or VO not authorized ..." error

Running the !FTS service we encountered many inconsistencies in the way the information was published in the !BDII, especially related to the case used to publish the site name. This is not a problem when the !BDII is used directly, since it is case insensitive, but it creates some interoperability issues when used via !ServiceDiscovery (which is case sensitive). We therefore decided to apply a convention, within the FTS boundaries, in order to have all the site names uppercase in the channel definitions.
Starting from version 1.5, the !FTS !WebService forces the case when you create a new channel, but when upgrading from previous versions, this convention may conflict with already defined channels. In order to fix this, we have provided an admin pack that allows changing the channel definitions. The instructions on how to use these tools are available [[LCG.FtsAdminToolsPackageLoading15][here]]. Therefore, if you hit this problem, download the =glite-data-transfer-scripts= !RPM and follow the instructions referenced above in order to replace all the site names that contain lowercase letters in all the channel definitions (you may need the support of your !DBA).

*Note: If this RPM is not yet available in the repository, please contact [[mailto:fts-support@cern.ch fts-support]].*

#ShortLivedproxies
---++++ Symptom: My jobs fail if I have a short time left on the proxy in !MyProxy

Make sure you have a fresh proxy in !MyProxy that will last at least the length of all your jobs (assume a queue length of 2 days from your last submission).

---++ File Transfer Agent

#AlwaysSubmitted
---++++ Symptom: Job always in Submitted state

The first action that is executed on a transfer request is the !Allocation, performed by the !VO agent associated with the VO of the submitter. This action checks the source and destination !SURLs of the job request, finds the sites of the involved SEs using !ServiceDiscovery and then looks up the registered channels for a match. When this operation succeeds, the job is moved to !Pending and the =channel_name= property is filled with the name of the found channel.
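
The allocation step just described can be sketched as follows. This is an illustration only, not the agent's code: the function name =allocate=, the two lookup dictionaries (standing in for !ServiceDiscovery and for the registered channels) and the wording of the second failure reason are assumptions, and a failed match is simply reported as =Failed= here.

```python
from urllib.parse import urlsplit


def allocate(source_surl, dest_surl, site_by_host, channel_by_sites):
    """Return ("Pending", channel_name) or ("Failed", reason).

    site_by_host stands in for the ServiceDiscovery lookup (host -> site);
    channel_by_sites maps (source_site, dest_site) to a channel name.
    """
    sites = []
    for surl in (source_surl, dest_surl):
        # The site is resolved from the host name found in the SURL
        host = urlsplit(surl).hostname
        site = site_by_host.get(host)
        if site is None:
            return ("Failed", "No site found for host %s" % host)
        sites.append(site)
    # The lookup is exact, so site names must match character for character
    channel = channel_by_sites.get((sites[0], sites[1]))
    if channel is None:
        return ("Failed", "No channel found")
    return ("Pending", channel)
```

Because both lookups are exact matches, a site published with a different spelling or case than the one used in the channel definition is enough to make the allocation fail.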
Due to a bug in FTA 1.3 and 1.4 ([[http://savannah.cern.ch/bugs/?func=detailitem&item_id=10076][#10076]]) a job stays in !Submitted state instead of going to !Failed in one of the following cases:
   * The channel doesn't exist but the source and destination SE are registered in !ServiceDiscovery or the !VO is configured to accept unknown sources and destinations
   * The VO of the user who submitted the job has no valid share on the channel
   * The channel is in !Stopped, !Drain or !Halted state (actually, when the channel status is !Halted, a job should go to !Pending and not to !Failed)

Usually this problem is due to a configuration error. The first thing to do is to retrieve the status of the channel that should be involved in the transfer

=glite-transfer-channel-list CHANNEL_NAME=

and check the channel state, that the !VO has a share and that the names of the source and destination sites match the ones retrieved using !ServiceDiscovery: in case the file plugin is used, look at the =site= element of the SRM services reported in the =services.xml= file
<verbatim>
<service name='CERNSC3-SRM'>
  <parameters>
    <endpoint>httpg://castorgridsc.cern.ch:8443/srm/managerv1</endpoint>
    <type>SRM</type>
    <version>1.1.0</version>
    <site>CERN-SC</site>
    <param name='SEMountPoint'>/castor/cern.ch/grid/dteam/storage</param>
  </parameters>
</service>
</verbatim>
and compare them with the values returned by =glite-transfer-channel-list=.

In case this doesn't fix the problem, check that a !VO agent is configured and running for that !VO. Run

=glite-transfer-status --verbose JOB_ID=

and check that the value of the =VOName= property is correct; if it is not, there is a problem with the !FTS =glite-data-transfer-submit-mapfile=: edit that file manually or regenerate it following the procedures reported in [[LCG.FtsServerSubmitMapfile][FtsServerSubmitMapfile]], cancel the job, wait until the file is reloaded by the !FTS and ask the user to resubmit the request.
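
When the file plugin is used, the site-name comparison described earlier in this entry can be partly automated. The sketch below, in Python, extracts the =site= of every SRM service from =services.xml= content so that it can be compared with the output of =glite-transfer-channel-list=. The function name =srm_sites= is hypothetical, and the layout of the file beyond the =service= element shown above (for example the name of the enclosing root element) is an assumption.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def srm_sites(services_xml):
    """Map each SRM endpoint host to the site published for it."""
    root = ET.fromstring(services_xml)
    sites = {}
    # iter() also yields the root itself, so a lone <service> element works
    for service in root.iter("service"):
        params = service.find("parameters")
        if params is None or params.findtext("type") != "SRM":
            continue
        host = urlsplit(params.findtext("endpoint", "")).hostname
        sites[host] = params.findtext("site")
    return sites
```

For the =CERNSC3-SRM= entry shown above, the result maps =castorgridsc.cern.ch= to =CERN-SC=. The names should be compared with the channel definition exactly as published, including the case.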
In case the !VO is set correctly, check on the agents node that an agent is configured:
   * if you're using gLite 1.3, please have a look at =/opt/glite/etc/config/glite-data-transfer-agents-oracle.cfg.xml= and see if there is an instance for the VO:
<verbatim>
<instance name="YOUR_VO-fts">
  <parameters>
    <transfer-vo-agent.Name value="YOUR_VO"/>
    <!-- Other parameters -->
    <!-- ... -->
  </parameters>
</instance>
</verbatim>
   * if you're using gLite 1.4, open the file =/opt/glite/etc/config/glite-file-transfer-agents-oracle.cfg.xml= and look for an instance:
<verbatim>
<instance name="YOUR_VO" service="transfer-vo-agent-fts"/>
</verbatim>

If the instance is missing, or the naming convention is not correct, edit the appropriate file and rerun the configuration script. If the instance is there, check whether it's running, using the command

=/opt/glite/etc/init.d/glite-data-transfer-agents --instance glite-transfer-vo-agent-YOUR_VO status=

or

=service transfer-agents --instance glite-transfer-vo-agent-YOUR_VO status= (was =service glite-data-transfer-agents ...= before 1.5)

If the job is still !Submitted, follow the procedure reported [[#LastHope][here]].

#AlwaysPending
---++++ Symptom: Job always in Pending state

After a transfer request is allocated to a channel, its status is moved to !Pending. The !ChannelAgent will then process this request based on its internal inter-VO scheduling.
In case the job state remains !Pending forever, you have to check the following things:
   * The related !ChannelAgent daemon should be running
   * The !Channel state should be set to !Active
   * The VO should have a share on the channel that is greater than 0

In order to check if the agent is running, use the command

=/opt/glite/etc/init.d/glite-data-transfer-agents --instance glite-transfer-channel-agent-TYPE-CHANNEL_NAME status=

or

=service transfer-agents --instance glite-transfer-channel-agent-TYPE-CHANNEL_NAME status= (was =service glite-data-transfer-agents ...= before 1.5)

You can check the !Channel state and VO share using the command:

=glite-transfer-channel-list CHANNEL_NAME=

In case the jobs are still !Pending and the !FTS version is less than =2.0=, you may need to check whether there are !FTS transfer processes still alive. In fact, it may happen that, due to network problems, some of these processes don't complete correctly or die unexpectedly, leaving the related log files in =/var/tmp/glite-url-copy-edguser= and wasting transfer slots. If that is the case, you have to stop the related channel agents, kill the "zombie" processes and clean up the transfer log files for the involved channels. Once you restart the channel agents, they will detect the abnormal termination of the transfers and the VO agents will reschedule them according to the configured retry policy.

If the job is still !Pending, follow the procedure reported [[#LastHope][here]].

#SecurityError
---++++ Symptom: All my transfers fail with a SECURITY_ERROR

This issue is usually due to a problem in the interaction between an !FTA and the !MyProxy server.
This mainly happens in the following cases:
   * The user mistyped the !MyProxy passphrase when submitting the job
   * The user has an invalid or expired certificate in !MyProxy
   * The agent is not an authorized retriever for !MyProxy
   * There is an authentication problem (expired certificate or CRL)

In the first two cases, all the transfers of this user fail while the ones of other users succeed, whereas in the other cases all the transfers fail, independently of the user.

Usually, you can detect the type of the error by having a look at the agent log file in =/opt/log/glite/glite-transfer-channel-agent-TYPE-CHANNEL_NAME.log= or =/opt/log/glite/glite-transfer-vo-agent-VO_NAME.log=
   * If the problem is due to a wrong passphrase, you'll see
<verbatim>
2005-08-26 07:25:52,281 ERROR transfer-agent-myproxy - Failed to get the proxy from the MyProxyServer. Reason is: Reason is Error in bind() ERROR from server: invalid pass phrase
</verbatim>
Then ask the user to resubmit his/her job, possibly using the =-p= option of =glite-transfer-submit=. In case the problem persists, maybe the user forgot the passphrase, so ask him/her to store the credential in !MyProxy again using =myproxy-init -s MYPROXY_SERVER -d=
   * In case the agent is not an authorized retriever, you'll see an entry similar to
<verbatim>
2005-08-26 07:25:52,281 ERROR transfer-agent-myproxy - Failed to get the proxy from the MyProxyServer. Reason is: ERROR from server: "<anonymous>" not authorized by server's authorized_retriever policy
</verbatim>
If that is the case, you have to contact the !MyProxy server administrator and ask him/her to add the DN of the certificate of the account used to run the agent. If it still doesn't work, please also check that the agent is running with a valid certificate, following what is described [[#CannotGetAgentDN][here]]
   * In case the entry is similar to
<verbatim>
2005-08-26 07:25:52,281 ERROR transfer-agent-myproxy - Failed to get the proxy from the MyProxyServer.
Reason is: Error authenticating: GSS Major Status: Authentication Failed GSS Minor Status Error Chain: (null)
</verbatim>
This problem is usually due to an expired certificate or to an expired certificate revocation list (CRL). Please check the validity of the certificates and update the CRL on both the agent and !MyProxy nodes
   * In case you see errors like:
<verbatim>
2005-08-26 07:25:52,281 ERROR transfer-agent-myproxy - Failed to get the proxy from the MyProxyServer. Reason is: Reason is Error in bind()
</verbatim>
without any other details, please check that the environment variables MYPROXY_TCP_PORT_RANGE and GLOBUS_TCP_PORT_RANGE are unset for the account used to run the agents.
   * In the other cases, ask the user to store his/her certificate in !MyProxy again, running the command =myproxy-init -s MYPROXY_SERVER -d=

Please note that the =-d= option is required in order to associate the credentials with the DN of the user instead of the account name.

If you need to know which !MyProxy server is used, have a look [[#WhichMyProxy][here]].

#WhichMyProxy
---++++ Which !MyProxy Server is used?

When an agent has to perform an operation on behalf of the user, it retrieves the user's delegated credentials from the configured !MyProxy server, caches them in the local file system and then impersonates the user by setting the environment variable X509_USER_PROXY.
The operations where this is required are:
   * Retrieving service endpoints and information from !ServiceDiscovery
   * Performing the transfer
   * Contacting the catalog to retrieve the list of replicas and to register the new ones when the transfer is finished (only in case of the FPS VO Agent)

The endpoint of the !MyProxy server is usually retrieved using !ServiceDiscovery, so in case of the file plugin, you need to have an entry in =/opt/glite/etc/services.xml= like
<verbatim>
<service name='MyProxy'>
  <parameters>
    <endpoint>myproxy://myproxy.cern.ch</endpoint>
    <type>MyProxy</type>
    <version>1.14</version>
  </parameters>
</service>
</verbatim>
You can query the !InfoSys using the command

=glite-sd-query -t !MyProxy=

In order to resolve which !MyProxy server should be used, the !FileTransferAgent looks into the associated services of the !FileTransferService that received the user's request (available from gLite 1.3 QF23) or, if none is found, takes the first !MyProxy server returned by the !InformationSystem; you can also force the use of a specific instance by setting the agent configuration property =MYPROXY_SERVER= (=transfer-agent-myproxy.Server=). In case this property is not set and there is no !MyProxy entry registered in the !InfoSys, the environment variable $MYPROXY_SERVER is used. Starting from gLite 1.3 QF23, the user is also allowed to specify the !MyProxy server he/she wants to use by providing the option =-m myproxy_hostname= in the =glite-transfer-submit= command line.

#MyProxyBindError
---++++ Error: 'Failed to get proxy certificate from myproxy-fts.cern.ch . Reason is Error in bind()'

When using !MyProxy servers, you should ensure that the outgoing port range is set correctly in the agent servers' environments. This is normally done via the =/etc/profile.d/= grid scripts. See mail from Maarten:
<verbatim>
Hi Jason,
please check if all the agents have this in their environment:

MYPROXY_TCP_PORT_RANGE=20000,25000

Note the comma.
The bind() error usually comes from the Myproxy client code defaulting to using the GLOBUS_TCP_PORT_RANGE, defined as follows:

GLOBUS_TCP_PORT_RANGE=20000 25000

Note the space: the Myproxy client does not handle that properly, leading to occasional bind() errors...
</verbatim>

#CannotGetAgentDN
---++++ Symptom: I've noticed a warning "Cannot Get Agent DN" in the agent log files

You can see this entry when the agent doesn't run with a valid certificate. When an !FTA starts, it logs the DN of the certificate the agent will use. This certificate is used to perform the following actions:
   * Retrieve the user's delegated credentials from !MyProxy using the passphrase provided by the user. This happens on both the !Channel and the VO Agents
   * Perform the transfer

If the agent doesn't have a valid certificate, it's likely that these operations will fail. In order to fix this problem, first check that the user running the agents has a valid certificate: usually the certificate and key are installed in =$HOME/.globus/usercert.pem= and =$HOME/.globus/userkey.pem= and should be owned by the user. In case the certificate is installed in a different place, the environment variables X509_USER_CERT and X509_USER_KEY should be set accordingly. You should also check that the certificate has not expired, by running:

=openssl x509 -text -in ~/.globus/usercert.pem=

or

=openssl x509 -text -in $X509_USER_CERT=

In case the certificate is valid but the agent always reports the warning, check whether there is an expired proxy certificate in =/tmp/x509up_uUSER_ID= (where =USER_ID= is the user id of the account used to run the agent) and delete it.

#SrmCopyMalformedUrl
---++++ Symptom: My srmcopy transfers fail with a dCache !MalformedUrl exception

You may notice this error when a user is transferring files to a dCache SE using a channel configured to perform =srmcopy= transfers. This is due to a bug in dCache version <= 1.6.5 in parsing the URL.
You have to ask the user to resubmit his/her requests using the following conventions:
   * In case the destination SE is dCache, and the source is !Castor or DPM
      * !Source SURL can be
<verbatim>
srm://<castorsrm>:<port>//srm/managerv1?SFN=<path>
srm://<castorsrm>:<port>/?SFN=<path>
srm://<castorsrm>/<path>
</verbatim>
      * !Destination SURL should be
<verbatim>
srm://<dcachesrm>:<port>/srm/managerv1?SFN=<path>
srm://<dcachesrm>/<path>
</verbatim>
   * In case the source SE is dCache and the destination one is !Castor or DPM
      * !Source SURL should be
<verbatim>
srm://<dcachesrm>:<port>/srm/managerv1?SFN=<path>
srm://<dcachesrm>/<path>
</verbatim>
      * !Destination SURL can be
<verbatim>
srm://<castorsrm>:<port>/srm/managerv1?SFN=<path>
srm://<castorsrm>:<port>//srm/managerv1?SFN=<path>
srm://<castorsrm>:<port>/?SFN=<path>
srm://<castorsrm>:<port>/<path>
srm://<castorsrm>/<path>
</verbatim>
   * In case both the source and destination SE are dCache
      * !Source SURL should be
<verbatim>
srm://<dcachesrm>:<port>//srm/managerv1?SFN=<path>
srm://<dcachesrm>/<path>
</verbatim>
      * !Destination SURL should be
<verbatim>
srm://<dcachesrm>:<port>/srm/managerv1?SFN=<path>
srm://<dcachesrm>/<path>
</verbatim>

This problem is fixed in dCache v 1.6.6; however, this new version doesn't seem to accept the compact SURL format
<verbatim>
srm://<srmhost>/<path>
</verbatim>
If the destination SE is dCache and its version is 1.6.6, we therefore suggest using for both source and destination SURLs either:
<verbatim>
srm://<srmhost>:<port>/<path>
</verbatim>
or the fully qualified one:
<verbatim>
srm://<srmhost>:<port>/srm/managerv1?SFN=<path>
</verbatim>

#DCacheSrmCopyUrl
---++++ Symptom: I've upgraded to 1.4.1 but srmcopy doesn't seem to work

Starting from version 1.3QF23, the !FileTransferAgent normalizes the SURLs before executing all the SRM get, put and copy requests, and the default normalization is to convert them into the compact format
<verbatim>
srm://<srmhost>/<path>
</verbatim>
As illustrated [[#SrmCopyMalformedUrl][here]], we observed a problem with dCache srmcopy in version 1.6.6 not working with this format: after ~30 minutes the error returned is
<verbatim>
number of retries exceeded:org.dcache.srm.scheduler.NonFatalJobFailure: java.io.IOException: both from and to url are not local srm
</verbatim>
In order to work around this problem, you have to change the !FileTransferAgent normalization configuration to use a different format, by setting the !ChannelAgent configuration property =ACTIONS_SURLNORMALIZATION= (=transfer-agent-channel-actions.SurlNormalization=) to either =compact-with-port= for converting to the format
<verbatim>
srm://<srmhost>:<port>/<path>
</verbatim>
or =fully-qualified= for the format
<verbatim>
srm://<srmhost>:<port>/srm/managerv1?SFN=<path>
</verbatim>
Please note that this is not a bug in FTS, but a problem in dCache; you might have observed it after upgrading to 1.4.1 because this version of FTS was released more or less at the same time as dCache 1.6.6.

#PingNull
---++++ I've upgraded to 1.4.1 but the transfer failed with Error in srm__ping: NULL

Starting from version 1.4.1, FTS retrieves the SRM endpoint from the information system, instead of parsing the SURL and, in case one of the compact formats is used, using the default port (8443) and service path (srm/managerv1). In case your transfers start failing after the upgrade with an error:
<verbatim>
Cannot Contact SRM Service.
Error in srm__ping: NULL
</verbatim>
the entry in the information system is probably not correct. A common error that has been observed is that the SRM endpoint is stored as
<verbatim>
srm://<srmhost>:<port>/srm/managerv1
</verbatim>
instead of
<verbatim>
httpg://<srmhost>:<port>/srm/managerv1
</verbatim>
You can also look in the transfer log files (located in =/var/tmp/glite-transfer-url-copy-UID/CHANNEL_NAMEfailed= on the related !ChannelAgent box) and check the endpoint that is used for the SRM calls.

#NoSiteFoundForHost
---++++ The transfer failed with the error: No site found for host ...

During the allocation phase the !VOAgent needs to resolve which sites will be involved in the transfer. To do that, the agent looks up in the information system the site names of the source and destination SRMs, querying by the hostname retrieved from the provided SURLs. In case the user gets an error like:
<verbatim>
Failed to Get Channel Name: No site found for host ...
</verbatim>
you have to check the following things:
   * The entry concerning the SRM services is listed in the information system
   * The SD library plugins are defined and configured properly (environment variables, files, etc.)
   * If the file-based plugin is chosen, the =/opt/glite/etc/services.xml= file is properly formatted

To detect errors, it's useful to run the command:
<verbatim>
su - ACCOUNT_USED_TO_RUN_THE_VOAGENT -c '/opt/glite/bin/glite-sd-query -t SRM --host SRM_HOSTNAME'
</verbatim>
and check the result (this command executes the same query as the agent). If the problem still persists, it may be worth having a look at the /proc filesystem to see if
<verbatim>
/proc/VOAGENT_PROCESS_ID/environ
</verbatim>
contains the correct values for the =GLITE_LOCATION= and =GLITE_SD_*= environment variables.
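The =/proc/VOAGENT_PROCESS_ID/environ= check described above can be sketched in Python. This is only a minimal illustration, not part of FTS: the function names (=parse_environ=, =glite_settings=, =read_agent_environ=) are hypothetical, and it assumes a Linux host where the environ file is a sequence of NUL-separated =KEY=VALUE= entries readable by the account running the script.

```python
from typing import Dict


def parse_environ(raw: bytes) -> Dict[str, str]:
    """Parse the NUL-separated KEY=VALUE entries of a /proc/<pid>/environ file."""
    env = {}
    for entry in raw.split(b"\0"):
        key, sep, value = entry.partition(b"=")
        if sep:  # skip empty or malformed entries that have no '='
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def glite_settings(env: Dict[str, str]) -> Dict[str, str]:
    """Keep only GLITE_LOCATION and the GLITE_SD_* variables."""
    return {
        name: value
        for name, value in sorted(env.items())
        if name == "GLITE_LOCATION" or name.startswith("GLITE_SD_")
    }


def read_agent_environ(pid: int) -> Dict[str, str]:
    """Read the environment of a running process (hypothetical helper)."""
    with open("/proc/%d/environ" % pid, "rb") as handle:
        return parse_environ(handle.read())
```

Calling =glite_settings(read_agent_environ(VOAGENT_PROCESS_ID))= as root, or as the account used to run the !VOAgent, would list only the variables mentioned above, which makes a missing or wrong value easier to spot.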
In case the !StorageElement should not be listed in the information system, you may want to have a look [[#NotDedicatedChannel][here]].

#EndOfFileReached
---++++ The transfer failed with the error: an end-of-file was reached

This error is returned by the globus gridftp library to the !ChannelAgent. We don't have many details, but experience suggests that this error happens when the destination !SE is full and there's no more space available on disk. In this sense, the =end-of-file was reached= could be interpreted as a =write= command that returned 0 bytes written. If the number of errors of this kind increases, set the channel status to =Inactive= and then contact the administrator at the destination site in order to verify the status of the !SE.

#WhichServiceTypes
---++++ Which Service Types are used?

The File Transfer Agent needs to interact with external services in order to accomplish its tasks and uses the gLite !ServiceDiscovery API to discover their properties. The services involved are:
   * !MyProxy: used to retrieve the clients' delegated credentials
   * !SRM & !GridFtp: the site information is used to allocate a transfer job to a channel
   * !FileCatalog: used by the vo-agent in FPS mode to retrieve the source replicas to be used for a transfer and to register the new replicas when the transfer is finished

To discover that information, the File Transfer Agent uses the service types listed in [[http://infnforge.cnaf.infn.it/glueinfomodel/index.php/V12/ServiceType][Glue Service Types]]. As reported in bug [[http://savannah.cern.ch/bugs/?func=detailitem&item_id=12961][#12961]], however, the service type for a !GridFtp server is set to =GridFTP= instead of =gsiftp=, and a backward compatible fix is foreseen for a future release. As a temporary workaround you could follow the comments reported on the bug.
#LastHope
---++++ I've tried everything, and it still doesn't seem to work

In case your problem is listed on this page but none of the proposed solutions seems to work, you can generate verbose log files and send them to [[mailto:fts-support@cern.ch fts-support]]. To generate these files, please follow this procedure:

For each agent involved (the VO one, responsible for allocating a transfer to a channel and retrying failed transfers; and the Channel one, responsible for transferring the files and monitoring the status), edit the files =glite-transfer-vo-agent-VO_NAME.log-properties= (in case of !VO !FTA) and/or =glite-transfer-channel-agent-TYPE-CHANNEL_NAME.log-properties= (in case of !Channel !FTA) in =/opt/glite/etc/glite-data-transfer-agents.d/= and:
   * replace the line =log4j.rootCategory=INFO, file= with =log4j.rootCategory=DEBUG, file=
   * replace the line =log4j.appender.file.fileName=/var/log/glite/glite-transfer-channel-agent-TYPE-CHANNEL_NAME.log= or =log4j.appender.file.fileName=/var/log/glite/glite-transfer-vo-agent-VO_NAME.log= with =log4j.appender.file.fileName=/var/log/glite/glite-transfer-channel-agent-TYPE-CHANNEL_NAME.debug.log= or =log4j.appender.file.fileName=/var/log/glite/glite-transfer-vo-agent-VO_NAME.debug.log= respectively

Restart the agents and let them run for ~1 minute; then stop the agents, restore the original values in the modified files, start the agents again and mail the =/var/log/glite/*.debug.log= files to [[mailto:fts-support@cern.ch fts-support]].

---++ Channel Administration

#NFilesForVO
---++++ How do I set the number of files transferred per VO instead of per channel?

In the FTS Channel Agent there are three parameters you can act on to tune the inter-VO scheduling: the channel VO share, the number of files that the channel can process concurrently and the =AGENT_VOSHARETYPE= (=transfer-channel-agent.VOShareType=) configuration property.
The purpose of this configuration parameter is to define how the VO share is interpreted for a channel; you can add it to the instance that corresponds to the related channel agent in the configuration file. The allowed values are:
   * *normalized*: the share is the value of the channel =voshare= property for the given VO, normalized to the sum of all the shares for all the VOs in the same channel. This option could be used when channel administrators want to guarantee slots for certain VOs, in order to implement some sort of !QoS, at the cost of possibly penalizing the total throughput (transfer slots would be reserved for a VO even if that VO has no job to process)
   * *absolute*: the share is the value of the channel =voshare= property expressed as a percentage. No normalization is performed, which means that the sum of all the shares on the same channel can exceed 100%. This option could be used when channel administrators want to balance the share between the VOs, preventing a single VO from fully allocating a channel while minimizing the risk of allocating slots to VOs that don't have any job to process. This option implies some tuning of the VO share values based on experience, but it offers a compromise between throughput and !QoS.
   * *normalized-on-active*: the share is the value of the channel =voshare= property for the given VO, normalized to the sum of all the shares for all the VOs in the same channel that have at least one job that can be processed by the Channel Agent (job state should be Active, Pending or Canceling).
     This option is the default and should be used when the channel administrators want to optimize the throughput of the channel (the channel can be fully allocated even by one VO), but with a lower !QoS

As an example, supposing you have a channel that can process 30 files concurrently and is shared by 3 VOs, you could have:

| || *Normalized* | *Absolute* | *Normalized-on-active** |
| VO | Share | Max Files | Max Files | Max Files |
|VO_1 | 50 | 15 | 15 | 0 |
|VO_2 | 30 | 9 | 9 | 18 |
|VO_3 | 20 | 6 | 6 | 12 |

(* supposing VO_1 has no job to submit)

As you can notice, when the sum of the VO shares is 100, there's no difference between the "normalized" and "absolute" setups. But if this constraint is not respected, you can have:

| || *Normalized* | *Absolute* | *Normalized-on-active** |
| VO | Share | Max Files | Max Files | Max Files |
|VO_1 | 70 | 14 | 21 | 0 |
|VO_2 | 50 | 10 | 15 | 19 |
|VO_3 | 30 | 6 | 9 | 11 |

(* supposing VO_1 has no job to submit)

Please note that the value in the "Max Files" column corresponds to the maximum number of files a VO is authorized to submit at the same time. In any case the constraint imposed by the "files" channel property is always respected.

If you want to start with two VOs, each able to perform up to 15 transfers concurrently: set the =AGENT_VOSHARETYPE= (=transfer-channel-agent.VOShareType=) to _normalized_ (or _absolute_), with each VO share set to 50 and the channel files set to 30. You'll then allow up to 30 parallel transfers on the channel, but each VO will not be able to submit more than 15 at the same time. If you later have to support other VOs, you'll need to adjust these percentages.

-----
Last edit: %SEARCH{".*" nosearch="on" regex="on" scope="title" nototal="no" topic="DMFtsSupport" format="$wikiusername on $date"}%

Maintainer: Main.PaoloBadino
-----
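The arithmetic behind the "Max Files" columns in the Channel Administration tables above can be sketched in Python. This is an illustration only, not the Channel Agent's actual code: the function name =max_files_per_vo= and its parameters are hypothetical, and rounding to the nearest integer is an assumption chosen because it reproduces the figures in the tables.

```python
import math


def max_files_per_vo(files, shares, share_type, active=None):
    """Return the per-VO file limit for a channel (illustrative sketch).

    files      -- value of the channel "files" property
    shares     -- dict mapping VO name to its voshare value
    share_type -- "normalized", "absolute" or "normalized-on-active"
    active     -- set of VOs with processable jobs (default: all VOs)
    """
    if active is None:
        active = set(shares)
    if share_type == "normalized":
        total = sum(shares.values())
    elif share_type == "normalized-on-active":
        total = sum(s for vo, s in shares.items() if vo in active)
    elif share_type == "absolute":
        total = 100  # the share is read directly as a percentage
    else:
        raise ValueError("unknown share type: %r" % share_type)

    limits = {}
    for vo, share in shares.items():
        inactive = share_type == "normalized-on-active" and vo not in active
        if inactive or total == 0:
            limits[vo] = 0
        else:
            # Assumption: round half up, never exceeding the channel "files".
            limits[vo] = min(files, math.floor(share * files / total + 0.5))
    return limits
```

With =files= set to 30 and shares of 70, 50 and 30, the sketch gives 14/10/6 for _normalized_, 21/15/9 for _absolute_, and 0/19/11 for _normalized-on-active_ when VO_1 has no job to process, matching the second table.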
Topic revision: r32 - 2008-01-21 - LaurenceField
Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.