Metronome 2.2.4

Warning

This release of Metronome is known to be broken when used with Condor 6.9.2. If you want to use Condor 6.9.2, download release 2.2.3 (or earlier) or 2.2.5 (or later).

Download

nmi-2.2.4.tar.gz

Release Date: 05/04/2007
MD5 checksum: fc8e23f58b288d391b3c8116007939e3

Release Notes

New Run Directory Hierarchy

Due to the large size of some Metronome installations, continuing with a single directory for all runs proved infeasible. At the installation here at UW-Madison, our submit nodes were bogged down because a single run directory contained 20,000+ subdirectories at a single level. Therefore, starting in Metronome 2.2.4, the framework will break up run directories into the following levels:

/path/to/nmi/rundir/<4 digit year>/<2 digit month>///

Example:

/nmi/run/2007/04/pavlo/pavlo_nmi-s002.cs.wisc.edu_1175765722_27363/

See this report for more information and a discussion about the change. The Metronome toolkit and web interface have been retrofitted to be backwards compatible with the old run directory format. One can use the new nmi_migrate_run utility to transition run directories to the new hierarchy.
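
As a rough illustration of the layout described above, the following Python sketch rebuilds the example path from a run's global id. It is not Metronome code: the function name run_dir_for_gid is hypothetical, and it assumes that a gid has the form user_host_epoch_pid and that the year and month levels are taken from that epoch timestamp read as UTC. Neither assumption is confirmed by these notes.

```python
from datetime import datetime, timezone
import posixpath


def run_dir_for_gid(rundir_root: str, gid: str) -> str:
    """Build the nested run directory path for a run's global id (gid).

    Assumption: the gid looks like <user>_<host>_<epoch seconds>_<pid>, and
    the year/month levels come from the epoch timestamp interpreted as UTC.
    """
    # Assumption: the user name is everything before the first underscore.
    user, _, rest = gid.partition("_")
    # The last two underscore-separated fields are taken to be epoch and pid.
    _host, epoch, _pid = rest.rsplit("_", 2)
    stamp = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return posixpath.join(
        rundir_root, f"{stamp.year:04d}", f"{stamp.month:02d}", user, gid
    )


print(run_dir_for_gid("/nmi/run", "pavlo_nmi-s002.cs.wisc.edu_1175765722_27363"))
```

Under those assumptions the call reproduces the example directory shown above (without the trailing slash).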

Run Notes & Comments

The web status pages can now optionally provide visitors with the ability to add notes and comments for runs. If you are upgrading from an existing installation, you must execute the following SQL command to add the new column to the database.

ALTER TABLE Run ADD COLUMN notes varchar(255) NOT NULL DEFAULT '';

In order for this feature to work, the DB_READER_USER account in the database must be granted update permission on the notes column of the Run table. Use the following command to update your database privileges table (changing DB_READER_USER and DB_READER_PASS to match your existing account).

GRANT UPDATE (notes) ON nmi_history.Run \
TO 'DB_READER_USER'@'%.example.com' IDENTIFIED BY 'DB_READER_PASS';

Lastly, you must also set RUN_ALLOW_USER_NOTES to true in the web interface’s configuration file (etc/config.inc).

New Features

  • Added a new transaction-safe nmi_migrate_run utility for moving runs from one submit node to another. This tool can also be used to move existing runs from the old directory structure to the new nested format (see above).
  • The web status pages can now optionally provide visitors with the ability to add notes and comments for runs (see above).
  • The web status pages now more helpfully display “Interrupted” for temporarily interrupted tasks and “Removed” for externally-removed Condor jobs, instead of the previous raw -1003 and -1002 values in the task result column.
  • nmi_submit now produces more succinct and useful output unless --verbose is specified (feature 472).
  • The framework now keeps better track of Condor jobs submitted for runs. There is a new nmi_runid2condor utility that will return a list of Condor job ids for a particular run. There is also a --history option that will pull Condor job ids from the installation’s history log file.
  • By default, Metronome will now try to fetch an input three times before giving up. This can be changed on a per-submit-file basis with the option fetch_retry_count, or on a machine-wide basis by the site administrator in nmi.conf with the option FETCH_RETRY_COUNT.
  • The web interface now features a better navigation menu on the left-hand side bar and a search bar at the top of every page.
  • The default homepage for the web interface now includes a brief summary of the Metronome installation.
  • The web interface is now certified to be compatible with PHP 5.
  • The Condor userlog (run.log) is now viewable from the web interface.
  • The nmi_putfile and nmi_getfile scripts were added to assist with communication between nodes of a parallel job. Documentation on these scripts may be found here
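
The fetch retry behavior listed above can be pictured with a small Python sketch. This is not how Metronome implements it: the helper names effective_retry_count and fetch_with_retries are hypothetical, and the precedence shown (a submit-file fetch_retry_count overriding the site-wide FETCH_RETRY_COUNT) is an assumption, since these notes do not say which setting wins.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

DEFAULT_FETCH_RETRY_COUNT = 3  # default number of tries stated in these notes


def effective_retry_count(submit_value: Optional[int],
                          site_value: Optional[int]) -> int:
    """Pick the retry count to use.

    Assumption: a per-submit-file value beats the site-wide value, which in
    turn beats the built-in default.
    """
    if submit_value is not None:
        return submit_value
    if site_value is not None:
        return site_value
    return DEFAULT_FETCH_RETRY_COUNT


def fetch_with_retries(fetch: Callable[[], T], attempts: int) -> T:
    """Call fetch() up to `attempts` times and re-raise the last failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return fetch()
        except Exception as err:  # any failure counts as a failed try
            last_error = err
    assert last_error is not None
    raise last_error
```

With no overrides, fetch_with_retries(fetch, effective_retry_count(None, None)) makes at most three tries before giving up.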

Bugs Fixed

  • The nmi_resource_advertiser no longer reconfigures the local Condor daemons every time it is executed, but now only does so when the routing table contents have changed.
  • Fixed nmi_rm to allow runs to be removed when their result code is null, and to correctly remove dependent runs when the --remove-consumers flag is used (bugs 501 and 868).
  • The hostnames of all nodes of parallel jobs are now correctly published in the platform_job task’s Condor job classad.
  • Improved error messages in nmi_runid2gid and nmi_gid2runid when a runid/gid cannot be found (bug 921).
  • Previously, if any of a user’s platform-specific workspaces contained files whose relative pathnames exceeded 255 characters, the platform_job task would fail (return code 1) while extracting them on vendor Unix platforms, due to an incompatibility between GNU tar and the vendors’ tar implementations. Now, if such platforms advertise an nmi_gnutar attribute in their Condor machine classad, Metronome will use it instead. (Note: due to a Condor bug, this does not currently work for parallel tasks.)
  • A single SITE_LOGO can be defined for the web interface without a numerical suffix.
  • The “View File” feature of the web interface no longer relies on the web server to handle content-type for stdout/stderr files.
  • Fixed nmi_rm not handling multiple runids as input.
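
For sites whose execute hosts do not advertise nmi_gnutar, it may help to check a workspace for the long pathnames mentioned above before submitting. The Python sketch below is a hypothetical helper, not part of Metronome; it assumes the 255-character limit quoted above and counts characters in the path relative to the workspace root.

```python
import os
from typing import List

MAX_RELATIVE_PATH = 255  # limit quoted in these release notes


def overlong_paths(workspace: str, limit: int = MAX_RELATIVE_PATH) -> List[str]:
    """Return files under `workspace` whose relative path exceeds `limit`."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(workspace):
        for name in filenames:
            # Measure the path relative to the workspace root, not the
            # absolute path on the local disk.
            rel = os.path.relpath(os.path.join(dirpath, name), workspace)
            if len(rel) > limit:
                found.append(rel)
    return sorted(found)
```

An empty result means no file in the workspace crosses the assumed limit.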

Known Bugs

  • No known critical bugs (all others, as of the current release)

Requirements

  • Metronome Submit/Archive Host
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005 (including DBI and DBD::mysql modules)
    • Apache >= 2.0
    • PHP >= 4.2.3 (i.e., with Session & MySQL support)

  • Metronome DB Host
    • MySQL 4.1.20

  • Condor Central Manager Host
    • Condor >= 6.8.0 or Condor >= 6.9.0

  • Build/Test Execution Hosts
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005

Special Feature Requirements

  • For Parallel jobs: Condor >= 6.9.0 on central manager, submit and execute hosts.