Metronome 2.2.4
Warning
This release of Metronome is known to be broken when used with Condor 6.9.2. If you want to use Condor 6.9.2, download release 2.2.3 (or earlier) or 2.2.5 (or later).
Download
Release Date: 05/04/2007
MD5 checksum: fc8e23f58b288d391b3c8116007939e3
Release Notes
New Run Directory Hierarchy
Due to the large size of some Metronome installations, continuing with a single directory for all runs proved to be infeasible. At the installation here at UW-Madison, our submit nodes were bogged down because a single run directory contained 20,000+ subdirectories at a single level. Therefore, starting in Metronome 2.2.4, the framework will break up run directories into the following levels:
/path/to/nmi/rundir/<4 digit year>/<2 digit month>///
Example:
/nmi/run/2007/04/pavlo/pavlo_nmi-s002.cs.wisc.edu_1175765722_27363/
See this report for more information and a discussion about the change. The Metronome toolkit and web interface have been retrofitted to be backwards compatible with the old run directory format. One can use the new nmi_migrate_run utility to transition run directories to the new hierarchy.
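As a rough illustration of the layout, the sketch below rebuilds the example path from a run identifier. It is not the framework's code. It assumes, based only on the example above, that the identifier has the form user, submit host, Unix timestamp and a trailing number joined by underscores, that the year and month are taken from that timestamp in UTC, and that user names contain no underscores. The function name nested_run_dir is hypothetical.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def nested_run_dir(rundir_root: str, gid: str) -> PurePosixPath:
    """Build a nested run directory path for a run identifier.

    Assumption: gid looks like <user>_<submit host>_<unix timestamp>_<number>,
    as in the example from the release notes.
    """
    # Split from the right so the host part is left untouched
    head, timestamp, _number = gid.rsplit("_", 2)
    # Assumption: the user name is everything before the first underscore
    user = head.split("_", 1)[0]
    # Assumption: year and month come from the embedded timestamp, in UTC
    started = datetime.fromtimestamp(int(timestamp), tz=timezone.utc)
    return (
        PurePosixPath(rundir_root)
        / f"{started.year:04d}"
        / f"{started.month:02d}"
        / user
        / gid
    )


path = nested_run_dir("/nmi/run", "pavlo_nmi-s002.cs.wisc.edu_1175765722_27363")
print(path)  # /nmi/run/2007/04/pavlo/pavlo_nmi-s002.cs.wisc.edu_1175765722_27363
```

Grouping by year and month keeps each directory level small, which is the point of the change: no single directory accumulates every run.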
Run Notes & Comments
The web status pages can now optionally provide visitors with the ability to add notes and comments for runs. If you are upgrading from an existing installation, you must execute the following SQL command to add the new column to the database.
ALTER TABLE Run ADD COLUMN notes varchar(255) NOT NULL DEFAULT '';
In order for this feature to work, the DB_READER_USER account in the database must be granted UPDATE permission on the notes column of the Run table. Use the following command to update your database privileges table (changing DB_READER_USER and DB_READER_PASS to match your existing account).
GRANT UPDATE (notes) ON nmi_history.Run TO 'DB_READER_USER'@'%.example.com' IDENTIFIED BY 'DB_READER_PASS';
Lastly, you must also set RUN_ALLOW_USER_NOTES to true in the web interface’s configuration file (etc/config.inc).
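To see what the ALTER TABLE statement above does to rows that already exist, the following sketch runs it against an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL here, the one-column Run table is a hypothetical minimal schema, and the column-level GRANT has no SQLite equivalent, so it is not shown.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL nmi_history database
conn = sqlite3.connect(":memory:")

# Hypothetical minimal Run table; the real table has more columns
conn.execute("CREATE TABLE Run (runid INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO Run (runid) VALUES (1)")

# The upgrade statement from the release notes
conn.execute("ALTER TABLE Run ADD COLUMN notes varchar(255) NOT NULL DEFAULT ''")

# Rows that existed before the upgrade pick up the empty-string default
existing_notes = conn.execute(
    "SELECT notes FROM Run WHERE runid = 1"
).fetchone()[0]

# A note is then an ordinary UPDATE that touches only the notes column,
# which is why the reader account needs UPDATE on that column alone
conn.execute(
    "UPDATE Run SET notes = ? WHERE runid = ?",
    ("rerun after disk failure", 1),
)
updated_notes = conn.execute(
    "SELECT notes FROM Run WHERE runid = 1"
).fetchone()[0]
```

Because the new column is NOT NULL with a default of the empty string, existing runs need no backfill after the upgrade.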
New Features
- Added a new transaction-safe nmi_migrate_run utility for moving runs from one submit node to another. This tool can also be used to move existing runs from the old directory structure to the new nested format (see above).
- The web status pages can now optionally provide visitors with the ability to add notes and comments for runs (see above).
- The web status pages now more helpfully display “Interrupted” for temporarily interrupted tasks and “Removed” for externally-removed Condor jobs, instead of the previous raw -1003 and -1002 values in the task result column.
- nmi_submitnow produces more succinct and useful output unless --verbose is specified (feature 472)
- The framework now keeps better track of Condor jobs submitted for runs. There is a new nmi_runid2condor utility that will return a list of Condor job ids for a particular run. There is also a --history option that will pull Condor job ids from the installation’s history log file.
- By default, Metronome will now try to fetch an input three times before giving up. This can be changed on a per-submit-file basis with the option fetch_retry_count, or on a machine-wide basis by the site administrator in nmi.conf with the option FETCH_RETRY_COUNT.
- The web interface now features a better navigation menu on the left-hand side bar and a search bar at the top of every page.
- The default homepage for the web interface now includes a brief summary of the Metronome installation.
- The web interface is now certified to be compatible with PHP 5.
- The Condor userlog (run.log) is now viewable from the web interface.
- The nmi_putfile and nmi_getfile scripts were added to assist with communication between nodes of a parallel job. Documentation on these scripts may be found here.
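The input fetch retry behaviour mentioned in the list above can be pictured with a small sketch. This is not Metronome's implementation: the function fetch_with_retries, its fetch callable and the choice of OSError as the failure signal are all hypothetical. Only the default of three and the idea of a configurable count come from the notes, and reading "three times" as three attempts in total is an assumption.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_retries(fetch: Callable[[], T], retry_count: int = 3) -> T:
    """Call fetch() up to retry_count times, re-raising the last error.

    retry_count plays the role of fetch_retry_count / FETCH_RETRY_COUNT.
    """
    if retry_count < 1:
        raise ValueError("retry_count must be at least 1")
    last_error: Optional[OSError] = None
    for _attempt in range(retry_count):
        try:
            return fetch()
        except OSError as err:
            # Remember the failure and try again if attempts remain
            last_error = err
    # Every attempt failed; surface the most recent error
    assert last_error is not None
    raise last_error


# Usage: a fetch that fails twice and succeeds on the third attempt
calls = {"n": 0}


def flaky_fetch() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("temporary failure")
    return "payload"


result = fetch_with_retries(flaky_fetch)
```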
Bugs Fixed
- The nmi_resource_advertiser no longer reconfigures the local Condor daemons every time it is executed, but now only does so when the routing table contents have changed.
- Fixed nmi_rm to allow runs to be removed when their result code is null, and to correctly remove dependent runs when the --remove-consumers flag is used (bugs 501 and 868)
- The hostnames of all nodes of parallel jobs are now correctly published in the platform_job task’s Condor job classad.
- Improved error messages in nmi_runid2gid and nmi_gid2runid when a runid/gid cannot be found (bug 921).
- Previously, if any of a user’s platform-specific workspaces contained files whose relative pathnames exceeded 255 characters, the platform_job task would fail (return code 1) while extracting them on vendor unix platforms, due to an incompatibility between GNU tar and the vendors’ tar implementations. Now, if such platforms advertise a nmi_gnutar attribute in their Condor machine classad, Metronome will use it instead. (Note: due to a Condor bug, this does not currently work for parallel tasks.)
- A single SITE_LOGO can be defined for the web interface without a numerical suffix.
- The “View File” feature of the web interface no longer relies on the web server to handle content-type for stdout/stderr files.
- Fixed nmi_rm not handling multiple runids as input.
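For the tar pathname issue in the list above, a user who wants to know whether a workspace would have been affected could scan it for long relative pathnames. The sketch below is a hypothetical helper, not part of the Metronome toolkit. The 255-character threshold is taken from the bug description, and the function name long_relative_paths is made up for illustration.

```python
import os
from typing import List


def long_relative_paths(workspace: str, limit: int = 255) -> List[str]:
    """Return files under workspace whose relative pathname exceeds limit characters."""
    too_long = []
    for dirpath, _dirnames, filenames in os.walk(workspace):
        for name in filenames:
            # Measure the path relative to the workspace root, as the archive would store it
            rel = os.path.relpath(os.path.join(dirpath, name), workspace)
            if len(rel) > limit:
                too_long.append(rel)
    return sorted(too_long)
```

An empty result means no file in the workspace crosses the threshold described in the bug report.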
Known Bugs
- No known critical bugs (all others, as of the current release)
Requirements
- Metronome Submit/Archive Host
  - Condor >= 6.8.0 or Condor >= 6.9.0
  - Perl >= 5.005 (including DBI and DBD::mysql modules)
  - Apache >= 2.0
  - PHP >= 4.2.3 (i.e., with Session & MySQL support)
- Metronome DB Host
  - MySQL 4.1.20
- Condor Central Manager Host
  - Condor >= 6.8.0 or Condor >= 6.9.0
- Build/Test Execution Hosts
  - Condor >= 6.8.0 or Condor >= 6.9.0
  - Perl >= 5.005
Special Feature Requirements
- For Cross-Facility Job Migration: Condor >= 6.8.0
- For Parallel jobs: Condor >= 6.9.0 on central manager, submit and execute hosts.
