WP1 WMS release 2: status and open issues
Massimo Sgaravatto, INFN Padova
Status
• Latest released WP1 RPMs: 2.1.5
  • Deployed in the EDG dev testbed
  • Under test by LCG
• New procedure (agreed months ago but never applied until now) used when we have to release RPMs, in order to avoid releasing broken RPMs
  • “Test” branch forked from “head”
  • Tests done relying on this branch
  • Open question: apply fixes on the “Test” branch and then merge, or apply them to both the “Test” and “head” branches?
What was fixed since Heidelberg
• Problem with resubmission: resubmission was attempted even when a job aborted because its proxy had expired (bug #1643)
• /tmp is no longer used, otherwise “old” WMS files get deleted by tmpwatch (bug #1918)
  • Actually tmpwatch also affects /var/tmp
• Restart of daemons (bug #1105)
• edg-wl-ns restart didn’t work (bug #1798)
• Documentation: API doc provided (on the WP1 web site)
• Deadlock problem with filelist (bug #2054)
• Remove grid-proxy-init/edg-voms-proxy-init from UI commands (bug #2195)
• Warning message if a “non supported” JDL attribute is specified (bug #446)
• Performance problems when edg-job-status/get-logging-info is called for multiple jobs (bug #2196)
• …
Open issues and missing functionalities
• In Heidelberg we decided to address various issues by the end of the project
• There are also some new ones
• We should decide which ones can really be addressed by the end of the project (and where: head/dagman), taking into account the other priorities
Open issues/missing functionalities
• Jobs stay in the “done” status after OSB retrieval (bug #2229)
  • Failures logging the “Cleared” event
• Many job submissions fail because “Register” logging fails
  • Or at least it is reported that the logging failed
• Filelist problem (bug #2220)
  • Looks like the problem was not really fixed
• Segfaults (?) in NS, WM, JC, LM
• Memory leaks in NS (bug #2104)
  • Memory leaks in underlying code?
• Logging by WM, JC and LM fails when there are SSL problems using the user proxy (not only when it has expired)
  • Shall we use the host proxy when this happens? (bug #2016)
• Problem with resubmission: CEs already “used” are not considered anymore (bug #1103)
Open issues/missing functionalities
• Registration of WMS services in RGMA and status scripts (bug #1324)
• BrokerInfo: replacement for old getSelectedFile needed (bug #1848)
• Dynamic quota management in NS
• edg-job-list-match and edg-job-submit can hang (bug #1362)
  • Approach: allow at least CTRL-C
• Do not abort a job immediately in case of problems (RLS or II down), but retry for a while (bug #1812)
  • Matchmaking should be retried until a certain TimeLimit=Min(TimeLimitJDL, TimeLimitConf)
• Clearer error messages when no resources are found with edg-job-list-match (bug #1997)
  • As already done with edg-job-submit
• Documentation: Gangmatching note missing
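The retry policy proposed for bug #1812 (retry matchmaking until TimeLimit=Min(TimeLimitJDL, TimeLimitConf) instead of aborting at once) could be sketched roughly as follows. This is only an illustration, not WMS code: the names `effective_time_limit`, `match_with_retry`, `find_resources` and `retry_interval` are hypothetical, a service outage is modelled as a `ConnectionError`, and falling back to the configured limit when the JDL sets none is an assumption.

```python
import time
from typing import Callable, List, Optional


def effective_time_limit(time_limit_jdl: Optional[float],
                         time_limit_conf: float) -> float:
    """TimeLimit = Min(TimeLimitJDL, TimeLimitConf).

    Assumption: if the JDL gives no limit, the configured one applies.
    """
    if time_limit_jdl is None:
        return time_limit_conf
    return min(time_limit_jdl, time_limit_conf)


def match_with_retry(find_resources: Callable[[], List[str]],
                     time_limit_jdl: Optional[float],
                     time_limit_conf: float,
                     retry_interval: float = 30.0,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Retry matchmaking until resources are found or the time limit expires.

    find_resources is a hypothetical callable standing in for one
    matchmaking attempt; it raises ConnectionError when a service it
    depends on (e.g. RLS or II) is down.
    Returns the matching resources, or an empty list on timeout.
    """
    deadline = clock() + effective_time_limit(time_limit_jdl, time_limit_conf)
    while True:
        try:
            resources = find_resources()
            if resources:
                return resources
        except ConnectionError:
            pass  # service temporarily down: retry instead of aborting
        remaining = deadline - clock()
        if remaining <= 0:
            return []  # time limit reached: only now give up
        # Never sleep past the deadline.
        sleep(min(retry_interval, remaining))
```

The clock and sleep functions are passed in so that the policy can be exercised without real waiting; in normal use the defaults apply and only the two limits and the attempt callable need to be supplied.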
Open issues/missing functionalities
• Exploit LB ACLs setting-query via command line tools
• Call a WP3 monitor script from Job Wrapper
  • Has this been discussed with the tech. coordinator?
• Exploit LB extended querying capabilities
  • UI commands for these queries
  • Possibility to define user tag in JDL to exploit extended querying capabilities
• Use of FTSH in JobWrapper
  • In DAGMan branch?
• GRIS queries
  • Issue raised at Heidelberg by applications and also in the Iteam ML
• Matchmaking with InputData when “file” is used as protocol
  • Issue raised at Heidelberg by applications and also in the Iteam ML