Information for Metronome lab administrators

This section describes how to download, install, configure, and maintain a Metronome lab at your own site.

Mailing Lists

In the near future we will create more specialized mailing lists for announcements and discussion relevant to Metronome lab administrators. For the time being, however, the following lists are recommended:

Metronome Releases

Subsequent to Metronome 2.2.8, all releases with even minor versions will be stable releases, and all releases with odd minor versions will be development releases.

The first stable series will be 2.4.x, and the first development series will be 2.5.x.
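As a quick illustration of the numbering rule, the sketch below classifies a version string by the parity of its minor number. The function name release_series is hypothetical, and the sketch deliberately does not guess a status for 2.2.8 and earlier, since the convention only applies to later releases.

```python
def release_series(version: str) -> str:
    """Classify a Metronome version as 'stable' or 'development'.

    Follows the rule stated above: after 2.2.8, even minor versions are
    stable and odd minor versions are development releases.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if (major, minor, patch) <= (2, 2, 8):
        # The even/odd convention only applies to releases after 2.2.8.
        return "predates convention"
    return "stable" if minor % 2 == 0 else "development"


for v in ("2.2.8", "2.4.0", "2.5.1"):
    print(v, release_series(v))
```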

Metronome 2.2.2

Download

nmi-2.2.2.tar.gz

Release Date: Feb 22, 2007.
MD5 checksum: d8957d492270892dafd3dd5f3cbcafcb
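One way to check a downloaded tarball against the published checksum is sketched below using Python's standard hashlib module. The helper names md5_of_file and matches_published_md5 are hypothetical and not part of Metronome; the sketch assumes the tarball is in the current directory.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large tarballs need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path: str, published: str) -> bool:
    """Compare a file's MD5 digest with a published checksum string."""
    return md5_of_file(path) == published.strip().lower()
```

For this release, the call would look like matches_published_md5("nmi-2.2.2.tar.gz", "d8957d492270892dafd3dd5f3cbcafcb"). The same helper applies to the checksums listed for the later releases.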

Release Notes

NOTE: Be sure to read the NMI 2.2.0 Release Notes to understand the changes made since NMI 2.1.8. Most notable are the configuration parameter changes. The NMI 2.2.1 Release Notes summarize the changes from 2.2.0.

This release fixes a major bug recently discovered at our production facility.

Metronome 2.2.2 is marked as a STABLE release; all users of NMI 2.2.x are encouraged to upgrade to this latest release.

New Features

  • Added support for running bash scripts on Windows.

Bugs Fixed

  • Metronome installations can now run jobs on non-x86 (i.e., x86_64) Windows platforms.
  • The nmi_putattr command now correctly handles values containing quote characters.

Known Bugs

  • None

Requirements

  • NMI Submit/Archive Host
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005 (including DBI and DBD::mysql modules)
    • Apache >= 2.0
    • PHP >= 4.2.3 (i.e., with Session & MySQL support)

  • NMI DB Host
    • MySQL 4.1.20

  • Condor Central Manager Host
    • Condor >= 6.8.0 or Condor >= 6.9.0

  • Build/Test Execution Hosts
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005

Special Feature Requirements

  • For Parallel jobs: Condor >= 6.9.0 on central manager, submit and execute hosts.

Metronome 2.2.3

Download

nmi-2.2.3.tar.gz

Release Date: 03/29/2007
MD5 checksum: cc25e5f463f8e9228c05404796e70721

Release Notes

New Features

  • Added nmi_list_prereqs, a command to list prereqs (currently) available in the pool.
  • Improved functionality of nmi_rm. You can now remove all users’ jobs when running it as root. More information can be found here.

Bugs Fixed

  • The nmi_putattr and nmi_getattr commands now correctly store and retrieve any valid string value, regardless of its contents, and correctly reject strings containing invalid characters.
  • Fixed the sort by duration on both the results list and the run details pages in the web interface. This was reported in bug #788 and on Savannah.
  • Fixed the -w and -ww options of nmi_condor_status to use the database configuration information from nmi.conf. This allows these options to function properly in more environments.

Known Bugs

  • None

Requirements

  • NMI Submit/Archive Host
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005 (including DBI and DBD::mysql modules)
    • Apache >= 2.0
    • PHP >= 4.2.3 (i.e., with Session & MySQL support)

  • NMI DB Host
    • MySQL 4.1.20

  • Condor Central Manager Host
    • Condor >= 6.8.0 or Condor >= 6.9.0

  • Build/Test Execution Hosts
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005

Special Feature Requirements

  • For Parallel jobs: Condor >= 6.9.0 on central manager, submit and execute hosts.

Metronome 2.2.4

Warning

This release of Metronome is known to be broken when used with Condor 6.9.2. If you want to use Condor 6.9.2, download release 2.2.3 (or earlier) or 2.2.5 (or later).

Download

nmi-2.2.4.tar.gz

Release Date: 05/04/2007
MD5 checksum: fc8e23f58b288d391b3c8116007939e3

Release Notes

New Run Directory Hierarchy

Due to the large size of some Metronome installations, continuing with a single directory for all runs proved infeasible. At the installation here at UW-Madison, our submit nodes were bogged down because a single run directory contained 20,000+ subdirectories at a single level. Therefore, starting in Metronome 2.2.4, the framework splits run directories into the following levels:

/path/to/nmi/rundir/<4 digit year>/<2 digit month>///

Example:

/nmi/run/2007/04/pavlo/pavlo_nmi-s002.cs.wisc.edu_1175765722_27363/

See this report for more information and a discussion about the change. The Metronome toolkit and web interface have been retrofitted to be backwards compatible with the old run directory format. One can use the new nmi_migrate_run utility to transition run directories to the new hierarchy.
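The layout can be sketched as a small path-building function. This is only an illustration inferred from the example path above: it assumes the levels below year and month are the submitting user and the run's own directory name, and that year and month come from the run's submission time in UTC. The function name and parameters are hypothetical.

```python
import posixpath
from datetime import datetime, timezone


def run_directory_2_2_4(rundir_root: str, user: str, run_name: str, submitted_at: int) -> str:
    """Build a 2.2.4-style run directory path: root/YYYY/MM/user/run_name/."""
    # Assumption: year and month are taken from the submission time in UTC.
    when = datetime.fromtimestamp(submitted_at, tz=timezone.utc)
    return posixpath.join(
        rundir_root, f"{when.year:04d}", f"{when.month:02d}", user, run_name, ""
    )


print(run_directory_2_2_4(
    "/nmi/run", "pavlo", "pavlo_nmi-s002.cs.wisc.edu_1175765722_27363", 1175765722
))
```

Bucketing by year and month keeps the number of entries at any single directory level bounded by the activity of one month rather than by the full history of the installation.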

Run Notes & Comments

The web status pages can now optionally provide visitors with the ability to add notes and comments for runs. If you are upgrading from an existing installation, you must execute the following SQL command to add the new column to the database.

ALTER TABLE Run ADD COLUMN notes varchar(255) NOT NULL DEFAULT '';

In order for this feature to work, the DB_READER_USER account in the database must be granted update permissions to the notes in the Run table. Use the following command to update your database privileges table (changing DB_READER_USER and DB_READER_PASS to match your existing account).

GRANT UPDATE (notes) ON nmi_history.Run
TO 'DB_READER_USER'@'%.example.com' IDENTIFIED BY 'DB_READER_PASS';

Lastly, you must also set RUN_ALLOW_USER_NOTES to true in the web interface’s configuration file (etc/config.inc).
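To see what the ALTER TABLE statement above does to rows that already exist, the sketch below runs it against an in-memory SQLite database. SQLite is used here only as a convenient stand-in for MySQL, and the two-column Run table is a hypothetical simplification of the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, heavily simplified stand-in for the real Run table.
conn.execute("CREATE TABLE Run (runid INTEGER PRIMARY KEY, owner TEXT)")
conn.execute("INSERT INTO Run (runid, owner) VALUES (1, 'pavlo')")

# Same statement as in the upgrade step above.
conn.execute("ALTER TABLE Run ADD COLUMN notes varchar(255) NOT NULL DEFAULT ''")

# Existing rows pick up the empty-string default.
existing_notes = conn.execute("SELECT notes FROM Run WHERE runid = 1").fetchone()[0]

# The column can then be updated, which is what the web interface needs to do.
conn.execute("UPDATE Run SET notes = ? WHERE runid = ?", ("rebuilt after fix", 1))
updated_notes = conn.execute("SELECT notes FROM Run WHERE runid = 1").fetchone()[0]
print(repr(existing_notes), repr(updated_notes))
```

Because the column is NOT NULL with an empty-string default, runs recorded before the upgrade simply show no note until someone adds one.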

New Features

  • Added a new transaction-safe nmi_migrate_run utility for moving runs from one submit node to another. This tool can also be used to move existing runs from the old directory structure to the new nested format (see above).
  • The web status pages can now optionally provide visitors with the ability to add notes and comments for runs (see above).
  • The web status pages now more helpfully display “Interrupted” for temporarily interrupted tasks and “Removed” for externally-removed Condor jobs, instead of the previous raw -1003 and -1002 values in the task result column.
  • nmi_submit now produces more succinct and useful output unless --verbose is specified (feature 472)
  • The framework now keeps better track of Condor jobs submitted for runs. There is a new nmi_runid2condor utility that will return a list of Condor job ids for a particular run. There is also a --history option that will pull Condor job ids from the installation’s history log file.
  • By default, Metronome will now try to fetch an input three times before giving up. This can be changed on a per-submit-file basis with the option fetch_retry_count, or on a machine-wide basis by the site administrator in nmi.conf with the option FETCH_RETRY_COUNT.
  • The web interface now features a better navigation menu on the left-hand side bar and a search bar at the top of every page.
  • The default homepage for the web interface now includes a brief summary of the Metronome installation.
  • The web interface is now certified to be compatible with PHP 5.
  • The Condor userlog (run.log) is now viewable from the web interface.
  • The nmi_putfile and nmi_getfile scripts were added to assist with communication between nodes of a parallel job. Documentation on these scripts may be found here.
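The fetch retry behavior listed above can be pictured with a small sketch. It is not Metronome's implementation: the function name fetch_with_retries is hypothetical, and it assumes the retry count means the total number of attempts, matching the "three times before giving up" wording.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retries(fetch: Callable[[], T], retry_count: int = 3) -> T:
    """Call `fetch` up to `retry_count` times, re-raising the last failure."""
    if retry_count < 1:
        raise ValueError("retry_count must be at least 1")
    last_error = None
    for attempt in range(1, retry_count + 1):
        try:
            return fetch()
        except Exception as error:  # a real tool would catch narrower errors
            last_error = error
            print(f"fetch attempt {attempt} of {retry_count} failed: {error}")
    raise last_error
```

A per-submit-file value such as fetch_retry_count would correspond to passing a different retry_count here, with the site-wide FETCH_RETRY_COUNT acting as the default.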

Bugs Fixed

  • The nmi_resource_advertiser no longer reconfigures the local Condor daemons every time it is executed, but now only does so when the routing table contents have changed.
  • Fixed nmi_rm to allow runs to be removed when their result code is null, and to correctly remove dependent runs when the --remove-consumers flag is used (bugs 501 and 868)
  • The hostnames of all nodes of parallel jobs are now correctly published in the platform_job task’s Condor job classad.
  • Improved error messages in nmi_runid2gid and nmi_gid2runid when a runid/gid cannot be found (bug 921).
  • Previously, if any of a user’s platform-specific workspaces contained files whose relative pathnames exceeded 255 characters, the platform_job task would fail (return code 1) while extracting them on vendor Unix platforms, due to an incompatibility between GNU tar and the vendors’ tar implementations. Now, if such platforms advertise an nmi_gnutar attribute in their Condor machine classad, Metronome will use it instead. (Note: due to a Condor bug, this does not currently work for parallel tasks.)
  • A single SITE_LOGO can be defined for the web interface without a numerical suffix.
  • The “View File” feature of the web interface no longer relies on the web server to handle content-type for stdout/stderr files.
  • Fixed nmi_rm not handling multiple runids as input.
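For sites that want to know whether a workspace is affected by the 255-character pathname issue described in the list above, a check along the following lines may help. It is a standalone sketch, not a Metronome tool; the function name and the idea of scanning a workspace directory by hand are assumptions.

```python
import os
from typing import List


def overlong_relative_paths(workspace: str, limit: int = 255) -> List[str]:
    """Return relative file paths under `workspace` longer than `limit` characters."""
    too_long = []
    for dirpath, _dirnames, filenames in os.walk(workspace):
        for name in filenames:
            relative = os.path.relpath(os.path.join(dirpath, name), workspace)
            if len(relative) > limit:
                too_long.append(relative)
    return sorted(too_long)
```

An empty result suggests the workspace should extract cleanly even on platforms that do not advertise nmi_gnutar.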

Known Bugs

  • No known critical bugs (all others, as of the current release)

Requirements

  • Metronome Submit/Archive Host
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005 (including DBI and DBD::mysql modules)
    • Apache >= 2.0
    • PHP >= 4.2.3 (i.e., with Session & MySQL support)

  • Metronome DB Host
    • MySQL 4.1.20

  • Condor Central Manager Host
    • Condor >= 6.8.0 or Condor >= 6.9.0

  • Build/Test Execution Hosts
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005

Special Feature Requirements

  • For Parallel jobs: Condor >= 6.9.0 on central manager, submit and execute hosts.

Metronome 2.2.5

Download

Release Date: 2007-05-15

metronome-2.2.5.tar.gz
MD5 checksum: 3f1eb4f04b6ea283d27e77e1ecaf88f5

Metronome-2.2.5-0.noarch.rpm
MD5 checksum: 150b3de16f959ae89ac6da3ee8a1cdce

Release Notes

None

New Features

  • Cleaner output from nmi_list_prereqs, which now ignores the ‘mtime’ entries.
  • Will add ‘<prereq>/lib64’ as well as ‘<prereq>/lib’ to LD_LIBRARY_PATH.

Bugs Fixed

  • Prereqs with numbers in the name (e.g., ‘m4’, ‘bzip2’) are now listed by nmi_list_prereqs.
  • Prereqs whose version strings include non-dot separators are now accurately reported.
  • Fixed the “Download Results” table on the Run Details sidebar in the web interface. Also fixed the “Archive” button on the runs overview page.
  • Fixed incompatibility with Condor 6.9.2.
  • Metronome can now fetch artifacts from earlier runs on other submit nodes.

Known Bugs

  • No known critical bugs (all others, as of the current release)

Requirements

  • Metronome Submit/Archive Host
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005 (including DBI and DBD::mysql modules)
    • Apache >= 2.0
    • PHP >= 4.2.3 (i.e., with Session & MySQL support)

  • Metronome DB Host
    • MySQL 4.1.20

  • Condor Central Manager Host
    • Condor >= 6.8.0 or Condor >= 6.9.0

  • Build/Test Execution Hosts
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005

Special Feature Requirements

  • For Parallel jobs: Condor >= 6.9.0 on central manager, submit and execute hosts.

Metronome 2.2.6

Download

Release Date: 2007-05-31

metronome-2.2.6.tar.gz

MD5 checksum: 31d7105c172ecd04367f1dd6e4336bfc

Metronome-2.2.6-0.noarch.rpm

MD5 checksum: 470c4ea2c6bbd1e3510cfc4fc81fc6e2

Release Notes

None

New Features

  • Added nmi_condor_q script which displays a list of runids waiting for each platform.

Bugs Fixed

  • fetch.pl now uses RUN_DIR_URL to find the location of remote files to download, instead of defaulting to “/rundir”.
  • If the DB update routine for a run (aka the “monitor” job) suffers a fatal error and exits prematurely, it will now be automatically retried three times. This should help reduce the frequency of stale/incorrect info in the DB for completed build and test runs.
  • nmi_getattr and nmi_putattr now work for rpm installs of Condor.
  • Parallel job subtasks are named correctly on the web results page.

Known Bugs

  • No known critical bugs (all others, as of the current release)

Requirements

  • Metronome Submit/Archive Host
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005 (including DBI and DBD::mysql modules)
    • Apache >= 2.0
    • PHP >= 4.2.3 (i.e., with Session & MySQL support)

  • Metronome DB Host
    • MySQL 4.1.20

  • Condor Central Manager Host
    • Condor >= 6.8.0 or Condor >= 6.9.0

  • Build/Test Execution Hosts
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005

Special Feature Requirements

  • For Parallel jobs: Condor >= 6.9.0 on central manager, submit and execute hosts.

Metronome 2.2.7

Download

Release Date: 06/10/2007

nmi-2.2.7.tar.gz (MD5 checksum: 57f8bc43797cf41f26800e67cd134169)
Metronome-2.2.7-0.noarch.rpm (MD5 checksum: 2a9dd9d8f346b5ebc9c353039e9491ee)

Release Notes

In Metronome 2.2.4, the run archive directories began being named using the 4 digit year as the “root”. This caused problems with permissions and directory ownership. Therefore, starting in Metronome 2.2.7, run archive directories will be named using the owner’s username as the “root”, followed by subdirectories like so:

/path/to/nmi/rundir//<4 digit year>/<2 digit month>//

Example:

/nmi/run/pavlo/2007/06/pavlo_nmi-s002.cs.wisc.edu_1175765722_27363/

See this report for more information and a discussion about this change. Metronome’s directory naming is backwards compatible, and existing run archives named using the old directory names will be recognized without any problems. However, administrators may use the new nmi_migrate_run utility to transition run directories to the new hierarchy format if they wish.
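The relationship between the 2.2.4 layout and the 2.2.7 layout can be sketched as a pure path rewrite. The function below only computes the new relative path; it does not move anything, and actual migrations should go through nmi_migrate_run as noted above. The function name is hypothetical, and the four-level input shape (year, month, user, run directory) is inferred from the examples in these release notes.

```python
import posixpath


def to_2_2_7_layout(relative_run_path: str) -> str:
    """Rewrite 'YYYY/MM/user/run_name' as 'user/YYYY/MM/run_name'."""
    parts = relative_run_path.strip("/").split("/")
    if len(parts) != 4:
        raise ValueError(f"expected YYYY/MM/user/run_name, got {relative_run_path!r}")
    year, month, user, run_name = parts
    if not (len(year) == 4 and year.isdigit() and len(month) == 2 and month.isdigit()):
        raise ValueError(f"path does not start with YYYY/MM: {relative_run_path!r}")
    return posixpath.join(user, year, month, run_name)


print(to_2_2_7_layout("2007/04/pavlo/pavlo_nmi-s002.cs.wisc.edu_1175765722_27363"))
```

Putting the username at the root means each top-level directory has a single owner, which is the permissions problem this release set out to address.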

New Features

  • New run archive directory hierarchy. See above.
  • The run completion email now includes the location of the run archive. More information can be found here.

Bugs Fixed

  • The web status pages now correctly use the filepath information of a run in the database.
  • nmi_rm now uses the Condor job’s ProcID when removing jobs from the queue. This fixes compatibility with parallel jobs. More information can be found here.
  • Improved error message handling in the web status pages.

Known Bugs

  • Metronome does not properly handle the “--nmiconf” option in certain tools. See this bug report.

Requirements / Dependencies

  • Metronome Submit/Archive Host
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005 (including DBI and DBD::mysql modules)
    • Apache >= 2.0
    • PHP >= 4.2.3 (i.e., with Session & MySQL support)

  • Metronome DB Host
    • MySQL 4.1.20

  • Condor Central Manager Host
    • Condor >= 6.8.0 or Condor >= 6.9.0

  • Build/Test Execution Hosts
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005

Special Feature Requirements

  • For Parallel jobs: Condor >= 6.9.0 on central manager, submit and execute hosts.

Metronome 2.2.8

Download

Release Date: 2007-08-08

nmi-2.2.8.tar.gz

MD5 checksum: 1e9060cb5b141e1ed228d6d5fe27dda3

Metronome-2.2.8-0.noarch.rpm

MD5 checksum: 71a59e60fd90fefe05e0aef96c9658f1

Release Notes

To support the web status pages’ ability to retain user preferences across multiple submit nodes, the web page database user must now be able to write to the ‘sessions’ table. The schema file, which now also describes the ‘sessions’ table, has been renamed from schema.sql to schema.mysql; please see it for details. We regret that sites with only one submit node can not at present readily disable this feature.

New Features

  • The NMI_CONF environment variable is now respected.
  • Added support for platform types, so that users who already have a platform naming scheme can retain it. See the PLATFORM_TYPE configuration variable, and the platform_type submit file variable.
  • You can now advertise ‘nmi_condor_release_dir’ for machines which don’t have Condor (in particular, ‘chirp’) in their default PATH. This allows parallel jobs to invoke nmi_[get|put]attr on these platforms.
  • If the configuration variable use_condor_job_leases is true, Metronome now sets a two-hour Condor job lease for all platform_jobs; this means that if a submit host goes down or is disconnected from an execute host for less than two hours, running jobs will no longer be interrupted and have to restart. Note that this will not function with Condor versions earlier than 6.9.3.
  • The web status pages can now retain user preferences across multiple submit nodes in a Metronome pool.
  • The web status pages now allow users to set their local timezone.
  • Searching for a run in the web status pages based on runid or gid in the new top-right search box will now automatically jump to the task view for that run.
  • The web status pages now display “condor submission failure” instead of “-1001” for that error code.
  • The user may now suppress the pop-up window used to display standard output and error.

Bugs Fixed

  • Metronome by default now correctly looks for nmi.conf under the Metronome install path specified at install time.
  • Metronome no longer mangles submit files containing ‘queue’ when trying to insert the GID.
  • Metronome once again respects the advertised attribute ‘nmi_gnutar’.
  • nmi_condor_q now respects the PATH_NMI configuration variable.
  • E-mail notification of run completion works again.
  • The monitor job (which sends updates to the Metronome DB) no longer keeps a persistent DB connection open; it also now retries after any failed DB updates. This should increase the maximum number of jobs in the system for a given database connection limit.
  • The monitor now properly drains the event queue when it detects job completion. This should eliminate inconsistencies between the database and the on-disk state of the job.

Known Bugs

  • nmi_condor_q fails to properly display jobs migrated to other Metronome sites

Requirements

  • Metronome Submit/Archive Host
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005 (including DBI and DBD::mysql modules)
    • Apache >= 2.0
    • PHP >= 4.2.3 (i.e., with Session & MySQL support)

  • Metronome DB Host
    • MySQL 4.1.20

  • Condor Central Manager Host
    • Condor >= 6.8.0 or Condor >= 6.9.0

  • Build/Test Execution Hosts
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005

Special Feature Requirements

  • If use_condor_job_leases is enabled in nmi.conf: Condor >= 6.9.4

Metronome 2.4.0

This is a stable release of Metronome. It contains only new bug fixes or new platform support.

Download

Release Date: 2007-08-30

nmi-2.4.0.tar.gz

MD5 checksum: 1c2fb3ea8fed3626760ad644fb98ae15

Metronome-2.4.0-0.noarch.rpm

MD5 checksum: 42775e9de4e14c26378077ff0e11c4bf

Release Notes

None.

New Features

  • Metronome now supports the Sony PlayStation 3 platform (requires Condor 6.9.4+)

Bugs Fixed

  • The web status pages’ pool statistics sidebar now correctly reports the total number of Condor CPU Slots in the pool, instead of incorrectly reporting the number of hosts.
  • The web status pages once again report prereq information. Additionally, many small bugs in the sorting of various related tables have been eliminated.
  • Pinned runs whose pins have expired no longer display the pinned icon in the runs overview page.
  • nmi_rm no longer throws a spurious Perl warning.
  • nmi_condor_q handles remote (migrated) jobs better.

Known Bugs

  • Automatic email notification of run completion is broken in 2.4.0; it will be fixed in 2.4.1, but to fix it by hand in the meantime, simply edit line 15 of notify.pl. The broken line reads:

use lib $ENV{'NMI_LIB'} || "/usr/local/nmi-2.2.7/lib";

Just change the path at the end to be your installation’s actual NMI lib/ directory, and email notification will once again work.

Requirements

  • Metronome Submit/Archive Host
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005 (including DBI and DBD::mysql modules)
    • Apache >= 2.0
    • PHP >= 4.2.3 (i.e., with Session & MySQL support)

  • Metronome DB Host
    • MySQL 4.1.20

  • Condor Central Manager Host
    • Condor >= 6.8.0 or Condor >= 6.9.0

  • Build/Test Execution Hosts
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005

Special Feature Requirements

  • For Parallel jobs: Condor >= 6.9.0 on central manager, submit and execute hosts.

Metronome 2.4.1

This is a stable release of Metronome. It contains only new bug fixes or new platform support.

Download

Release Date: 2007-09-24

nmi-2.4.1.tar.gz

MD5 checksum: 74c68ec1e2bf28974eaaa739efb38f00

Metronome-2.4.1-0.noarch.rpm

MD5 checksum: 9141a68c0735fe02523f785042425e3c

Release Notes

None.

New Features

None.

Bugs Fixed

  • Thanks to TeraGrid (JP Navarro and Charles Bacon) for the fix to a bug where the wrong package would be selected using SoftEnv if it were a substring of another package.
  • The monitor once again tolerates database connection failures on job start-up.
  • Fixed a bug preventing notification e-mails from being sent.
  • Protected against a cross-site scripting attack. We do not believe this vulnerability can compromise the integrity of the Metronome database, or the server(s) and back-end(s) providing Metronome services, such as Condor. (In particular, if the database permissions are set as we suggested in the installation instructions, we believe that only session preferences and the user-supplied notes associated with a run could be corrupted.) Of course, if your browser is vulnerable to the contents of arbitrary web pages, and you follow a link which exploits this Metronome bug, your browser will be compromised. For that reason, we strongly suggest you upgrade older versions of Metronome or patch the vulnerability. For information on the latter, please contact us directly.

Known Bugs

Requirements

  • Metronome Submit/Archive Host
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005 (including DBI and DBD::mysql modules)
    • Apache >= 2.0
    • PHP >= 4.2.3 (i.e., with Session & MySQL support)

  • Metronome DB Host
    • MySQL 4.1.20

  • Condor Central Manager Host
    • Condor >= 6.8.0 or Condor >= 6.9.0

  • Build/Test Execution Hosts
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005

Special Feature Requirements

  • For Parallel jobs: Condor >= 6.9.0 on central manager, submit and execute hosts.

Metronome 2.4.2

This is a stable release of Metronome. It contains only new bug fixes or new platform support.

Download

Release Date: 2007-10-01

nmi-2.4.2.tar.gz

MD5 checksum: 4050fd0dc9f3d9d8f68118a467216476

Metronome-2.4.2-0.noarch.rpm

MD5 checksum: 68bacd3a637e6eae7acbbe2b1f8c3f4d

Release Notes

None.

New Features

None.

Bugs Fixed

  • Fixed typo preventing results from being downloaded via the platform-specific quick links in the web status pages’ sidebar.
  • Removed a superfluous join from many database queries generated by the results pages in the web interface. This yields significant speed improvements for sites with thousands of historical runs.
  • Fixed problem introduced with database connection timeouts that would manifest as ‘255’ errors during long submit-host operations like source code fetches.

Known Bugs

Requirements

  • Metronome Submit/Archive Host
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005 (including DBI and DBD::mysql modules)
    • Apache >= 2.0
    • PHP >= 4.2.3 (i.e., with Session & MySQL support)

  • Metronome DB Host
    • MySQL 4.1.20

  • Condor Central Manager Host
    • Condor >= 6.8.0 or Condor >= 6.9.0

  • Build/Test Execution Hosts
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005

Special Feature Requirements

  • For Parallel jobs: Condor >= 6.9.0 on central manager, submit and execute hosts.

Metronome 2.4.3

This is a stable release of Metronome. It contains only new bug fixes or new platform support.

Download

Release Date: 2007-10-18

nmi-2.4.3.tar.gz

MD5 checksum: 5295d8bd7a033b018fb4c0466b245a43

Metronome-2.4.3-0.noarch.rpm

MD5 checksum: 25b6b04428475c97e00a8395f9c0a537

Release Notes

Because of some security enhancements, Metronome users upgrading from 2.4.2 and earlier may need to adjust the configuration of their webserver to make sure that the run directory, as set in RUN_DIR in nmi.conf and the RUN_DIR_URL (also set in nmi.conf), are both accessible from the web. We expect to be able to remove this requirement (and return to requiring only that RUN_DIR be accessible via the web under RUN_DIR_URL) in our next release.

New Features

None.

Bugs Fixed

  • nmi_submit now respects the notify_fail_only flag in submit files.
  • Fixed a race condition in the input stage which could cause CVS checkouts to be done against the wrong repository if a CVS tree was part of another input.
  • When executing root-enabled jobs, Metronome no longer allows root-owned files to be left behind in the platform_job’s output directory, which previously caused Condor to fail (with “permission denied” errors) when transferring output back to the submit host. Now, all files in the Metronome workspace are chown’ed to the execution user after the job completes, so that Condor can successfully transfer them back.
  • The web interface no longer displays pinned runs whose pins have expired when the ‘pinned runs only’ checkbox is selected.
  • Closed a cross-site scripting vulnerability in the web interface.
  • Closed an arbitrary file access vulnerability in the web interface. This required the removal of the pop-up window previously used by default to display the standard output and error files of a task.

Known Bugs

Requirements

  • Metronome Submit/Archive Host
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005 (including DBI and DBD::mysql modules)
    • Apache >= 2.0
    • PHP >= 4.2.3 (i.e., with Session & MySQL support)

  • Metronome DB Host
    • MySQL 4.1.20

  • Condor Central Manager Host
    • Condor >= 6.8.0 or Condor >= 6.9.0

  • Build/Test Execution Hosts
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005

Special Feature Requirements

  • For Parallel jobs: Condor >= 6.9.0 on central manager, submit and execute hosts.

Metronome 2.4.4

This is a stable release of Metronome. It contains only new bug fixes or new platform support.

Download

Release Date: 2008-04-16

nmi-2.4.4.tar.gz

MD5 checksum: 1f8c12217d9bbea3b2f8684446794c57

Metronome-2.4.4-0.noarch.rpm

MD5 checksum: 152706b2b0dd603b4aa97822b6e6f5d7

New Features

None.

Bugs Fixed

  • Metronome now correctly sets prereq library paths on all platforms for remote tasks (bug 1335)
  • Removed obsolete option to disable pop-ups. (Pop-ups no longer exist.)
  • nmi_rm now runs properly (no longer confuses DAGMan and non-DAGMan jobs).
  • Corrected a typo in nmi_putattr and nmi_getattr which prevented them from running.
  • Suppress Perl warning during processing of certain configuration file attribute values.
  • nmi_migrate_run no longer holds a database transaction open during migration. This allows more concurrent operations and eliminates a class of errors where the database connection would time out and the migration would have to be retried.

Known Bugs

Requirements

  • Metronome Submit/Archive Host
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005 (including DBI and DBD::mysql modules)
    • Apache >= 2.0
    • PHP >= 4.2.3 (i.e., with Session & MySQL support)

  • Metronome DB Host
    • MySQL 4.1.20

  • Condor Central Manager Host
    • Condor >= 6.8.0 or Condor >= 6.9.0

  • Build/Test Execution Hosts
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005

Special Feature Requirements

  • For Parallel jobs: Condor >= 6.9.0 on central manager, submit and execute hosts.

Metronome 2.5.0

This is a development release of Metronome. It contains new features, and may be unstable.

Download

Release Date: 2008-04-16

nmi-2.5.0.tar.gz

MD5 checksum: ae1672b027af1d56377894e199d17446

Metronome-2.5.0-0.noarch.rpm not created due to technical difficulties

MD5 checksum: TBD

Release Notes

The RPM packaging of this release has been delayed. Please contact us if this becomes a problem.

Backwards-incompatible syntax change: as a result of adding the ability to support multiple platform namespaces, we had to change the syntax of the platforms command in input specification files for the nmi input method. Instead of using a single colon to separate the source and destination platforms, users must now separate the platforms with two colons. This does not affect Metronome 2.5.0’s ability to use runs from earlier Metronome releases.
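When updating existing input specification files for this change, a rewrite along the following lines may be useful. It is a sketch only: the function name is hypothetical, the platform names in the usage example are made up, and it assumes a platform mapping is a plain source:destination string. The full grammar of the platforms command is not reproduced here, so check the result by hand.

```python
import re

# Matches a colon that is not adjacent to another colon.
_SINGLE_COLON = re.compile(r"(?<!:):(?!:)")


def upgrade_platform_separator(mapping: str) -> str:
    """Rewrite 'source:destination' as 'source::destination'."""
    return _SINGLE_COLON.sub("::", mapping)


# Hypothetical platform names, for illustration only.
print(upgrade_platform_separator("x86_rhas_3:x86_fc_5"))
```

Mappings that already use the two-colon form are left unchanged, so the rewrite can be applied more than once without harm.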

You cannot run parallel jobs with Condor 6.9.5 and this release. Condor versions 6.9.4 and earlier, and 7.0.0 and later, do not have the problematic bug. Condor versions before 6.9.5 do not have the improved parallel job exit policies, which can dramatically simplify parallel testing, so we recommend using Condor 7.0.0 or above.
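The version constraints in the paragraph above can be summarized as a small check. The function names are hypothetical. The sketch combines this note with the general requirement of Condor >= 6.9.0 for parallel jobs, and it conservatively treats versions between 6.9.5 and 7.0.0 as not known to be good, since the note does not discuss them.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '6.9.5' into (6, 9, 5) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def known_good_for_parallel_jobs(condor_version: str) -> bool:
    """True for 6.9.0 through 6.9.4, and for 7.0.0 and later."""
    version = parse_version(condor_version)
    return (6, 9, 0) <= version <= (6, 9, 4) or version >= (7, 0, 0)


def recommended_for_parallel_jobs(condor_version: str) -> bool:
    """True for 7.0.0 and later, which have the improved exit policies."""
    return parse_version(condor_version) >= (7, 0, 0)


for v in ("6.9.4", "6.9.5", "7.0.0"):
    print(v, known_good_for_parallel_jobs(v), recommended_for_parallel_jobs(v))
```

Comparing tuples of integers avoids the pitfalls of comparing version strings lexically, where "6.10.0" would sort before "6.9.0".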

Some of Metronome 2.5.0’s new features require new tables in the database. Support for ‘git’ requires a new table, and this table has been added to the schema files. Support for nmi_resubmit_run remains more experimental, and its table is defined in a new file in the distribution, “database/Metronome-2.5.0”, which also includes a table for use with nmi_update_machine_table. This schema has only been tested against MySQL (although if you are using Metronome with Postgres, please let us know).

New Features

  • Renamed “Hosts” to “CPU Slots” in the pool statistics sidebar, to reflect reality.
  • The Run Details web status pages now display the path to a run’s archive directory on the archive host (feature request 1176).
  • Added remote_task_is_null flag to submit files to support local-only jobs.
  • Added ability to handle multiple platform namespaces in a single submit file. See the new platforms and prereqs_ documentation.
  • Added option for use with Condor 6.9.5 and later which throttles potentially IO-intensive jobs on the submit node. See the documentation for a discussion on this feature.
  • Rewrote nmi_migrate_run to better handle large run directories.
  • Added support for individually specifying remote_*_timeouts, as well as remote_default_timeout to replace the functionality of the 2.4.x remote_task_timeout. See <taskname>_timeout.
  • Large run workspaces are now more efficiently packaged in preparation for transfer to remote machines (feature requests 1327 and 1328).
  • Added wgetrc option to use a separate wgetrc file for each input.
  • Added ability to recreate a run entirely from the database. See documentation for nmi_resubmit_run.

Bugs Fixed

  • This release contains all bug fixes from the Metronome 2.4.3 stable release.

Known Bugs

Requirements

  • Metronome Submit/Archive Host
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005 (including DBI and DBD::mysql modules)
    • Apache >= 2.0
    • PHP >= 4.2.3 (i.e., with Session & MySQL support)

  • Metronome DB Host
    • MySQL 4.1.20

  • Condor Central Manager Host
    • Condor >= 6.8.0 or Condor >= 6.9.0

  • Build/Test Execution Hosts
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005

Special Feature Requirements

  • For Parallel jobs: Condor >= 6.9.0 on central manager, submit and execute hosts.

Metronome 2.5.1

This is a development release of Metronome. It contains new features, and may be unstable.

Download

Release Date: 2008-07-30

nmi-2.5.1.tar.gz

MD5 checksum: 614f408ab159658f3a1784b64504895e

Metronome-2.5.1-0.noarch.rpm

MD5 checksum: TBD

Release Notes

We accidentally introduced a dependency on PHP 5 in this version. Version 2.5.2 has been corrected to remove this dependency.

New Features

  • Metronome 2.5.1 supports the use of GIDs in the input_runids command for the nmi input method.
  • You may now add ‘h’ or ‘H’ to time-outs to specify a duration in hours.
  • New timeout: “platform_job_timeout” (can set default PLATFORM_JOB_TIMEOUT in nmi.conf) limits the length of the whole of a platform job.
  • nmi_submit now checks for and fails on duplicate attributes.
  • nmi_submit now also checks for valid time-out specifications.
  • Metronome now sets the $_NMI_PREREQ_*_ROOT environment variables with partial version strings. (We urge you to use this feature with restraint: use the most specific version you readily can.) To explain by way of example, for java-1.5.0_08, Metronome would set $_NMI_PREREQ_java_1_5_0_08_ROOT, $_NMI_PREREQ_java_1_5_0_ROOT, $_NMI_PREREQ_java_1_5_ROOT, and so on, down to $_NMI_PREREQ_java_ROOT.
  • The parser has been updated to allow submit files with remote_task_is_null set to not define any platforms.
  • The ‘nmi’ method supports two additional options, block_until_exists and timeout. If you define block_until_exists, then Metronome will block until the named run or runs finish, or until timeout passes.
  • If a run is submitted and fails to match, Metronome will sanity-check the existence of individual prereqs on that platform and warn the user if one of those checks should fail.
  • Web interface searching improvements. Searching for an existing user, project, or component from the simple search box will now return much faster, with the term selected in the appropriate drop-down for further search refinement (if necessary). Quick searches which fail will be converted into slow searches, and the user given an opportunity to perform such a search. In both of these cases, the user will be told what the system just did. The additional search options have been reorganized to separate run-specific terms from terms used to gain an overview.
  • Added a new command-line tool, nmi_rundir. It converts GIDs or runids given on the command line to hostname:full-path pairs.
  • New “well-known” filename: if you create a file “notify.nmi” in the working directory of the post_all script (userdir/common), it will be appended to the notification e-mail (if any) sent by Metronome.
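The partial-version $_NMI_PREREQ_*_ROOT naming described in the list above can be illustrated with a short Python sketch. It is not taken from the Metronome sources. The function name prereq_root_vars is hypothetical, and the sketch assumes that the version follows the last hyphen in the prereq name and that its components are separated by '.' or '_', as in the java-1.5.0_08 example.

```python
import re


def prereq_root_vars(prereq):
    """Return candidate _NMI_PREREQ_*_ROOT names, most specific first.

    Assumption: the prereq is written as <name>-<version>, and version
    components are separated by '.' or '_'.
    """
    name, sep, version = prereq.rpartition("-")
    if not sep:
        # No hyphen at all: treat the whole string as an unversioned name.
        name, version = prereq, ""
    # Environment variable names may only contain word characters.
    name = re.sub(r"\W", "_", name)
    parts = [p for p in re.split(r"[._]", version) if p]
    names = []
    # Drop one trailing version component at a time.
    for count in range(len(parts), -1, -1):
        names.append("_NMI_PREREQ_" + "_".join([name] + parts[:count]) + "_ROOT")
    return names


for variable in prereq_root_vars("java-1.5.0_08"):
    print(variable)
```

A build script that only needs "some Java 1.5" could read the _NMI_PREREQ_java_1_5_ROOT entry, but as noted above, the most specific name is usually the safer choice.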

Bugs Fixed

Includes the following bug fixes from the stable series (due to be released in Metronome 2.4.5):

  • If the user fails to specify a platform in an output specification file, nmi_submit will fail with an error message about the problem. This prevents mysterious failures at the end of the run.
  • nmi_list_prereq no longer silently fails when passed a prereq substring as an argument.
  • Corrected the URL logo in the example web interface configuration to point to the National Science Foundation.
  • Corrected a typo in the web interface to reduce cruft in the server logs.

(This release also has a functioning ‘reset view’ button, and will properly return ‘uses run id’ results; but these are not, strictly, bug fixes, as they’re a part of the improved web searches feature above.)

Known Bugs

  • Accidentally introduced a requirement for the PHP 5 functions file_[get|put]_contents(). Fixed in release 2.5.2.

Requirements

  • Metronome Submit/Archive Host
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005 (including DBI and DBD::mysql modules)
    • Apache >= 2.0
    • PHP >= 5 (Bug, see above.)

  • Metronome DB Host
    • MySQL 4.1.20

  • Condor Central Manager Host
    • Condor >= 6.8.0 or Condor >= 6.9.0

  • Build/Test Execution Hosts
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005

Special Feature Requirements

  • For Parallel jobs: Condor >= 6.9.0 on central manager, submit and execute hosts.

Metronome 2.5.2

This is a development release of Metronome, and may be unstable.

Download

Release Date: 2008-08-12

nmi-2.5.2b.tar.gz

MD5 checksum: b2ef88bb0b34c2e411ab68a4f55e7aef

nmi-2.5.2.tar.gz

MD5 checksum: 781f3a3a139a54a97b61b45ffbf2984e
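A downloaded tarball can be compared against the MD5 checksum published above before it is installed. The following Python sketch shows one way to do so. The function names are illustrative only and are not part of Metronome. An MD5 match guards against a corrupted download, not against deliberate tampering.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, published):
    """Return True if the file's MD5 equals the published checksum."""
    return md5_of_file(path) == published.strip().lower()
```

For example, matches_published_md5("nmi-2.5.2.tar.gz", "781f3a3a139a54a97b61b45ffbf2984e") should return True for an intact copy of the 2.5.2 tarball, assuming the file is in the current directory.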

Release Notes

Bugs fixed in 2.5.2b are marked by ‘[b]’ below.

The 2.5.2 release is identical to the version 2.5.1 release, except that it removes an inadvertently added dependency on PHP 5.

New Features

None.

Bugs Fixed

  • [b] Corrected bug which caused remote_*_timeouts to be increased by a factor of sixty.
  • Added code to implement file_[get|put]_contents if not supplied by PHP.

Known Bugs

None.

Older NMI Releases

NMI Release 2.0.1

Release Date: ???

Download

nmi-2.0.1.tar.gz

Release Notes

New Features

  • added stats script (nmi_usage_stats.pl)
  • added the disk cleaner (disk_cleaner.pl)
  • support for USER macro in the nmi_submit file
  • expose USER environment variable to the remote jobs

Bug Fixes

NMI Release 2.1.3

Release Date: 07/10/2006

Download

nmi-2.1.3.tar.gz

MD5 checksum:
9c41e87e2cf64be831471ba094356d91 nmi-2.1.3.tar.gz

Release Notes

New Features

  • support for FreeBSD platform.
  • Condor-C functionality.
  • run_as_root support (ETICS).

Bug Fixes

  • Fixed implementation of max_match_wait, which was accidentally enabled only for Condor-C jobs in the last release.
  • remote_task attribute is now required.
  • the monitor bug (monitor hanging after Condor job finishes) fixed “for real”.
  • return correct status on failure for fetch steps.

NMI Release 2.1.4

Release Date: ???
MD5 checksum: f000491cc79a9cf449c69916c54a9e8b nmi-2.1.4.tar.gz

Download

nmi-2.1.4.tar.gz

Release Notes

New Features

  • This release includes beta support for parallel testing, enabling multiple, co-scheduled machines to communicate with one another. This can be useful for client/server, scalability, or cross-platform testing.

Bug Fixes

  • none

NMI Release 2.1.6

Download

nmi-2.1.6.tar.gz

Release Notes

New Features

  • Two new tools, nmi_getattr and nmi_putattr, are available for use by user-defined remote_* scripts. These new tools require Condor 6.9.0 or greater.
  • The nmi input method now allows optional source:destination platform mapping, so results from one platform can be used as input to another. This can be useful for cross-platform binary compatibility testing.
  • In the cvs input method, multiple _cvs_source_path_ prefixed commands are now deprecated, and have been replaced with a new, simpler _cvs_subdir_ list.
  • Added a new utility tool nmi_testsforrun. Given an NMI run identifier, it will return all the NMI tests for the binaries produced by the build.
  • Added support for new always_run_post_all submit option, which if set to true, allows the post_all to execute in the face of a failed platform_job.

Bug Fixes

  • The nmi_usage_stats.pl script now reports the number of platform_job tasks per user rather than the number of submitted runs, as the former is a better measure of usage. It also prints out a total of all columns at the bottom of the report.
  • The nmi input specification file parser now treats the “all” keyword of the platforms command case-insensitively.
  • The nmi input specification file parser now ignores (rather than rejects) empty elements in a comma-separated list. For example, foo, , bar is now treated as foo, bar instead of as an error.
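The two parser behaviors described in the last two items above (a case-insensitive “all” keyword, and empty list elements being ignored) can be sketched in Python as follows. The real parser is part of the NMI framework and may differ in detail. The function names here are hypothetical.

```python
def split_list(value):
    """Split a comma-separated list, dropping empty elements."""
    return [item.strip() for item in value.split(",") if item.strip()]


def is_all_platforms(value):
    """Return True if the platforms value is the 'all' keyword in any case."""
    return value.strip().lower() == "all"


print(split_list("foo, , bar"))
print(is_all_platforms("ALL"))
```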

NMI Release 2.1.7

Download

nmi-2.1.7.tar.gz

Release Date: 10/02/2006
MD5 checksum: f23dbe5aeb1b036e787efbd51eb34280 nmi-2.1.7.tar.gz

Release Notes

This release requires a change to the database schema in order to record the hostnames on which parallel tasks execute. The upgrade can easily be made to an existing installation by issuing the following command in MySQL:

ALTER TABLE Task ADD COLUMN node_id smallint(3) unsigned default null AFTER runid;

New Features

  • None

Bugs Fixed

  • The build & test system would always record in the DB that the parallel node id of non-parallel tasks was zero. This caused problems with the web portal.
  • The cleaner now automatically fixes, in the DB, unfinished tasks that belong to finished runs, and adds a corresponding note to the email notification.

Known Bugs

  • The web interface is missing some ancillary files that failed to be included in this release. This will be corrected in the next release. As a workaround, change the following line in lib/init.inc from:
require_once(LIB_PATH.'formUtil.inc');

to

// require_once(LIB_PATH.'formUtil.inc');

Requirements

  • For Parallel tests: Condor 6.9.0 or later.

NMI Release 2.1.8

Download

nmi-2.1.8.tar.gz

Release Date: 11/30/2006
MD5 checksum: cbd3f03ca5cbf6600483d51e607ad204 nmi-2.1.8.tar.gz

Release Notes

This release requires a new field be added to the database schema. To modify an existing NMI installation, execute the following command from the MySQL console:

ALTER TABLE Run ADD COLUMN identity tinytext DEFAULT NULL AFTER user;

New Features

  • Preliminary support for automatic cross-site job migration is now a part of the NMI framework. This feature is still in development and certain aspects of the mechanism will most likely change in the future. Thus, it may not yet be appropriate for production sites.
  • Improved support for the use of SoftEnv to provide NMI prereqs.
    • The NMI prereq mechanism now fully supports the SoftEnv software management system. Users do not need to make any change to their submission files to take advantage of this feature. Information for administrators about how to configure the NMI framework to use SoftEnv can be found here.
    • This release includes a new Hawkeye module for advertising prereqs managed by SoftEnv.
  • Input specification files now support “http” as a synonym for the existing “url” method.
  • The disk cleaner now supports a --quiet option to suppress stdout when no errors occur. This is useful when calling it from cron, to prevent it from generating unnecessary emails.
  • Added an optional new identity attribute in the NMI submission file. An arbitrary string value can be specified to identify the NMI submission’s owner, if distinct from the local system user who actually submitted it. The user-defined identity string will be stored in the DB for the run, but does not affect the user the job actually executes as on the computing resources.
  • Administrators can now control which attribute is displayed in the user column of the NMI web status pages. For example, the new identity attribute (see above) could be used instead of the default user attribute. This may be useful for front-end systems to the NMI framework where a single daemon user submits all jobs.
  • Added a new web interface configuration option VERSION_FOOTER that allows the page rendering time and NMI framework version number to be displayed discreetly at the bottom of every page. The default is set to true for all new installations.
  • The log file monitor has been optimized to only open up a single database connection per instance.

Bugs Fixed

  • A major bug preventing fetch.pl from properly reporting nmi input fetch failures has been fixed.
  • Certain files for the web interface were not included and would cause errors. This has been fixed.
  • Fixed a bug where certain macros in the submit file were being stripped improperly.
  • Added protection measures in nmi_run_status for input values. Savannah Bug #17282
  • Fixed a bug for the results/runDetails page of the web framework that would create a SQL query that locks up large databases.
  • Fixed problem in Makefile.PL where the NMI_LIB path is not set correctly for installation directories that end with /nmi/lib.
  • Fixed a bug where a job using the remote site execution parameters would complete successfully, but then fail unexpectedly in the postscript procedures. This only affected sites using Condor 6.8.
  • Fixed the web interface for viewing files when the localhost name is different from the NMI_MAIN parameter in the NMI configuration file. See this bug report for more information.

Requirements

  • For Parallel tests: Condor 6.9.0 or later.

NMI Release 2.2.0

Download

nmi-2.2.0.tar.gz

Release Date: 12/21/2006
MD5 checksum: 2d3c52b0f16e58378099d0afdf4caa49 nmi-2.2.0.tar.gz

Release Notes

We recommend that all sites’ NMI web status pages be configured to use a read-only user for database access. To add a new limited-privilege database user, execute the following command (substituting the DB_READER_USER, DB_READER_PASS, and HOSTNAME variables with your own values).

GRANT SELECT,CREATE TEMPORARY TABLES ON nmi_history.* \
TO 'DB_READER_USER'@'HOSTNAME' IDENTIFIED BY 'DB_READER_PASS'; 
GRANT SELECT,CREATE TEMPORARY TABLES ON nmi_history.* \
TO 'DB_READER_USER'@'localhost' IDENTIFIED BY 'DB_READER_PASS';

Several NMI configuration file variable names have been changed. Specifically:

The old names are deprecated but will continue to work, so no immediate change to existing config files is necessary. More information about these new configuration variables can be found here.

New Features

  • The DB update script (monitor.pl) now supports exponential backoff when polling task logs for updates. This new feature is enabled by default. Please see documentation on the POLLING_BACKOFF configuration variable to adjust or disable it.
  • The full path of your NMI installation is now stored in the DB for each run, in addition to the NMI version used to submit it.
  • In the remote execution workspace of each platform job, the USERDIR now contains a .nmi_failed_tasks file listing the names of any remote tasks which have failed. This can be examined by a remote_post script to control what output to return in results.tar.gz.
  • The NMI resource advertiser now supports broadcasting resource information to remote collectors running on a different host and port than the remote submit node.
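The exponential backoff polling mentioned in the first item above can be pictured with the following Python sketch. It shows the general technique only, not monitor.pl itself. The initial delay, growth factor, and cap are made-up values, and how POLLING_BACKOFF maps onto them is not covered here (see its documentation). The wait doubles after every poll that finds nothing new, up to a cap, and resets as soon as a poll finds an update.

```python
def next_delay(current, found_update, initial=5.0, factor=2.0, maximum=300.0):
    """Return the wait (in seconds) before the next poll of a task log."""
    if found_update:
        # New data arrived: go back to polling frequently.
        return initial
    # Nothing new: wait longer next time, but never beyond the cap.
    return min(current * factor, maximum)


def simulate(poll_results, initial=5.0, factor=2.0, maximum=300.0):
    """Return the delay chosen after each poll in poll_results."""
    delay = initial
    delays = []
    for found_update in poll_results:
        delay = next_delay(delay, found_update, initial, factor, maximum)
        delays.append(delay)
    return delays


# Three quiet polls, one poll that finds an update, then another quiet poll.
print(simulate([False, False, False, True, False]))
```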

Bugs Fixed

  • Fixed the rows dropdown in the web status pages so that it correctly saves the user’s selection.
  • The DB update script (monitor.pl) can once again detect when log files are truncated and reset (e.g., when a Condor job restarts). See this bug report for more information.
  • Improved logfile format and messages for the DB update script (monitor.pl).
  • Fixed the ability for the NMI resource advertiser to execute Condor binaries when the condor_config file is not stored in a standard location.
  • The default database name has been changed from history to nmi_history.

Known Bugs

  • We have seen an intermittent problem where the platform_job fails to get written to the database. Neither the cause nor the conditions that trigger it are clear yet.

Requirements

  • NMI Submit/Archive Host
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005 (including DBI and DBD::mysql modules)
    • Apache >= 2.0
    • PHP >= 4.2.3 (i.e., with Session & MySQL support)

  • NMI DB Host
    • MySQL 4.1.20

  • Condor Central Manager Host
    • Condor >= 6.8.0 or Condor >= 6.9.0

  • Build/Test Execution Hosts
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005

Special Feature Requirements

  • For Parallel jobs: Condor >= 6.9.0 on central manager, submit and execute hosts.

NMI Release 2.2.1

Download

nmi-2.2.1.tar.gz

Release Date: 01/26/2007
MD5 checksum: ff63ee14c79d58de384fa8f3d9542c58 nmi-2.2.1.tar.gz

Release Notes

NOTE: Be sure to read the NMI 2.2.0 Release Notes to understand the changes made since NMI 2.1.8. Most notable are the configuration parameter changes.

This release fixes two major bugs discovered in the last month at our production facility. These bugs were somewhat related and long standing; they became more prevalent due to the addition of the exponential backoff polling in the database logfile monitor.

NMI 2.2.1 is marked as a STABLE release; all users of NMI 2.2.0 are strongly encouraged to upgrade to this latest release. The NMI team will soon be supporting concurrent stable and development releases. More information will be posted in the future.

New Features

  • Add support for Ubuntu in the nmi_platform Hawkeye module.

Bugs Fixed

  • Fixed the log file checksumming feature of the database update script. This prevented log files from being re-read properly when platform_jobs were evicted from a resource (either due to the machine going down or the job being put on hold). This could cause the updated status of remote tasks which re-ran after an eviction not to appear in the database. A symptom of this problem was tasks that successfully completed but still had a -9 status in the database. More information can be found in this bug report.
  • Fixed a race condition between the DB update script and the platform_job_prescript script. This would prevent the platform_job task information from being stored in the database. This bug did not have an adverse effect on the ability of jobs to run, but produced missing status information and SQL errors in the DB update script’s error file.
  • When a platform_job is evicted from a resource, any remote tasks that were running at the time are now marked with a special -1003 result code instead of -9 (SIGKILL).
  • Corrected spelling mistake in the URL_PREFIX parameter of the email notification script. This caused emails to have incomplete URLs to build/test information.

Known Bugs

  • None

Requirements

  • NMI Submit/Archive Host
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005 (including DBI and DBD::mysql modules)
    • Apache >= 2.0
    • PHP >= 4.2.3 (i.e., with Session & MySQL support)

  • NMI DB Host
    • MySQL 4.1.20

  • Condor Central Manager Host
    • Condor >= 6.8.0 or Condor >= 6.9.0

  • Build/Test Execution Hosts
    • Condor >= 6.8.0 or Condor >= 6.9.0
    • Perl >= 5.005

Special Feature Requirements

  • For Parallel jobs: Condor >= 6.9.0 on central manager, submit and execute hosts.

Reference Manual

Introduction

The NMI framework software works on top of an existing Condor pool. You must identify hosts to run the following NMI facility services:

  • Condor Central Manager host
  • One or more combined Submit/Archive/Web Server hosts (currently these must be co-located)
  • One DB host
  • 1 to n Execute hosts

Any/all of these services may be co-located on the same host if desired. It’s recommended to have at least 1 execute host to start.

Metronome Installation

Preparing System Environment

If you are using Linux for your submit node, you must install the mysql-dev and Perl DBI modules.

perl -MCPAN -e "install DBI"
perl -MCPAN -e "install DBD::mysql"

Preparing the Database

Install the version of MySQL listed in the release notes. You will need to create a database with the name defined by DB_NAME (the default name is nmi_history) and install the default schema.

mysqladmin create nmi_history
mysql nmi_history < nmi-X.Y.Z/framework/database/schema.mysql

Now as a privileged user, create the following accounts. The first account shown below is the DB_WRITER_USER, and it needs to be able to insert and update records in the database. The second account, DB_READER_USER, is used by the web interface and needs only read access to the database, except to update the “notes” field (which you can turn off), and to write to the sessions table (which, at least for now, you can’t).

Replace ‘%.example.com’ with the appropriate domain or host. Only hosts you specify for the DB_WRITER_USER will be able to use the command-line tools, and only hosts you specify for the DB_READER_USER will be able to run the web interface.

NOTE: Be sure to execute FLUSH PRIVILEGES; to make sure these accounts are added appropriately. You may also need to create an additional ‘localhost’ record for each account if the database is running on the same host as the submit node.

# DB_WRITER ACCOUNT
GRANT SELECT,INSERT,UPDATE,DELETE ON nmi_history.* \
TO 'DB_WRITER_USER'@'%.example.com' IDENTIFIED BY 'DB_WRITER_PASS';

# DB_READER ACCOUNT
GRANT SELECT,CREATE TEMPORARY TABLES ON nmi_history.* \
TO 'DB_READER_USER'@'%.example.com' IDENTIFIED BY 'DB_READER_PASS';
GRANT UPDATE (notes) ON nmi_history.Run \
TO 'DB_READER_USER'@'%.example.com' IDENTIFIED BY 'DB_READER_PASS';
GRANT SELECT,INSERT,UPDATE,DELETE ON nmi_history.sessions \
TO 'DB_READER_USER'@'%.example.com' IDENTIFIED BY 'DB_READER_PASS';

Installing NMI Framework

Install the NMI software under your chosen prefix:

perl Makefile.PL prefix=<prefix>
make install

If you anticipate installing multiple versions of the framework, you may wish to set the prefix to a location such as /nmi-x.y.z, then create symbolic links to the installation directories:

mkdir <prefix>/nmi
cd <prefix>/nmi
ln -s <prefix>/nmi-X.Y.Z/share
ln -s <prefix>/nmi-X.Y.Z/bin
ln -s <prefix>/nmi-X.Y.Z/lib

Framework Configuration

Copy nmi-X.Y.Z/framework/nmi.conf.sample to <prefix>/etc/nmi.conf and edit as required. Please make sure that all non-trivial configuration parameters are customized for your local site (see Site Configuration Parameters for more information).

mkdir <prefix>/etc
cp nmi-X.Y.Z/framework/nmi.conf.sample <prefix>/etc/nmi.conf
edit <prefix>/etc/nmi.conf

If you intend to install future framework versions, you may want to place your nmi.conf file in a general location such as /nmi/etc and create a symlink to it from <prefix>/etc instead:

mkdir -p /nmi/etc
cp <prefix>/etc/nmi.conf /nmi/etc/
cd <prefix>/etc
ln -s /nmi/etc/nmi.conf nmi.conf

Hawkeye modules setup

The framework relies on Condor Hawkeye technology to ensure that jobs get matched to machines with the right platform. Put the following lines in your Condor config file on ALL of your EXECUTE hosts:

# EDIT_ME: Set MODULES in the next line to the directory path where you
# keep your Hawkeye modules. For example, it could be
# /home/condor/hawkeye_modules.
MODULES =
STARTD_CRON_NAME = NMIPOOL
# Uncomment the following line if NMIPOOL_JOBS has not been defined yet.
# NMIPOOL_JOBS =
# JOB: Report the list of software installed on the system.
NMIPOOL_JOBS = $(NMIPOOL_JOBS) prereq:has_:$(MODULES)/prereq:10m:kill
# EDIT_ME: Set NMIPOOL_PREREQ_PREREQDIR in the next line to the path of
# the directory containing individual prereq installations. For example,
# it could be /prereq.
NMIPOOL_PREREQ_PREREQDIR =
# JOB: Report the nmi_platform.
NMIPOOL_JOBS = $(NMIPOOL_JOBS) nmi_platform::$(MODULES)/nmi_platform:720m:kill

Now take the contents of the framework ‘hawkeye_modules’ directory that comes with this distribution and, on ALL of your EXECUTE hosts, copy the files to the directory you set as MODULES in the Condor configuration above.

Check that the module returns sensible values when run directly on your build machines. For example:

./nmi_platform
nmi_platform = "ppc_macos_10.3"

Now restart Condor on the execute hosts and verify that they report their NMI platform correctly to Condor Collector. You should be able to see something like:

condor_status -l | grep nmi_platform | head -5
nmi_platform = "ppc_aix_5.2"
nmi_platform = "hppa_hpux_B.10.20"
nmi_platform = "irix_6.5"
nmi_platform = "alpha_rh_7.2"
nmi_platform = "ia64_rhas_4"

Similarly, to test the prereq module, install some prereqs in your prereq directory (the one set as NMIPOOL_PREREQ_PREREQDIR). For example, suppose you installed python-2.2.3 from source using the --prefix=/prereq/python-2.2.3 option to configure. Then you should be able to see something like:

./prereq | grep python-2.2.3
python_2_2_3 = "/prereq/python-2.2.3"

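Similarly, once Condor on that machine has picked up the module, its ClassAd should advertise the prereq with the has_ prefix. When you query the collector (optionally passing the machine’s actual hostname to condor_status to limit the output), you should be able to see something like: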

condor_status -l | grep python
has_python_2_2_3 = "/prereq/python-2.2.3"

Advanced Hawkeye Usage

You can use Hawkeye to help match jobs to machines on the basis of attributes other than platform. We wrote a Hawkeye module, publish_dir.pl (see below), to simplify this task. It reads files from a specified directory, ignoring lines beginning with ‘#’ (or whitespace followed by ‘#’) and lines which don’t match the ‘attribute = value’ form. For each remaining line, it makes ‘attribute’, with the value ‘value’, available for matching jobs to the machine.

For example,

# ETICS includes compiler information in the platform string.
etics_platform = Darwin821_powerpc_gcc400

if you set the PLATFORM_TYPES configuration file variable to ‘etics’, users could specify ‘Darwin821_powerpc_gcc400’ as the platform for their jobs.
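The line-filtering rules described above for publish_dir.pl can be sketched in Python. This is an illustration of the rules as stated, not a port of the Perl module. The regular expression and the function names are assumptions, and the real module may treat edge cases (such as quoting or duplicate attributes) differently.

```python
import re
from pathlib import Path

# An identifier, '=', then a non-empty value; surrounding whitespace is ignored.
ATTRIBUTE_LINE = re.compile(r"^\s*([A-Za-z_]\w*)\s*=\s*(\S.*?)\s*$")


def parse_attribute_lines(lines):
    """Collect attribute/value pairs, skipping comments and malformed lines."""
    attributes = {}
    for line in lines:
        if line.lstrip().startswith("#"):
            continue  # comment, possibly indented
        match = ATTRIBUTE_LINE.match(line)
        if match:
            attributes[match.group(1)] = match.group(2)
    return attributes


def read_attribute_dir(directory):
    """Parse every regular file in a directory; later files win on conflicts."""
    attributes = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            attributes.update(parse_attribute_lines(path.read_text().splitlines()))
    return attributes
```

Applied to the two-line ETICS example above, parse_attribute_lines would skip the comment line and return a single pair, etics_platform with the value Darwin821_powerpc_gcc400.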

How to create these files depends on your particular situation and is left to the reader, although it will be convenient in many cases to use Hawkeye. See the Condor manual.

Running publish_dir.pl

For older versions of Condor (pre-6.9.x), the following should work:

NMIPOOL_JOBS = $(NMIPOOL_JOBS) publish_dir::$(MODULES)/publish_dir.pl:1h:kill
NMIPOOL_PUBLISH_DIR_PATH = /prereq/.hawkeye

This Condor configuration snippet applies to later versions of Condor:

# Report attribute = value pairs from the named directory.
NMIPOOL_JOBLIST = $(NMIPOOL_JOBLIST) publish_dir
NMIPOOL_PUBLISH_DIR_PREFIX =
NMIPOOL_PUBLISH_DIR_EXECUTABLE = $(MODULES)/publish_dir.pl
NMIPOOL_PUBLISH_DIR_PATH = /prereq/.hawkeye
NMIPOOL_PUBLISH_DIR_PERIOD = 1h
NMIPOOL_PUBLISH_DIR_KILL = True

In both cases, publish_dir.pl updates the attribute-value pairs available for matching, as specified by the files in /prereq/.hawkeye, once an hour.

Configuring a Metronome Pool for Parallel Testing

Overview

This feature of Metronome builds on the Condor Parallel Universe and provides for running jobs on multiple machines simultaneously. Condor's Chirp mechanism makes communication between the machines possible.

Version Requirements

Condor

  • 7.1.0 - latest developer series release, adds parallel job exit policies
  • 7.0.1 - latest stable series release, adds parallel job exit policies
  • 6.9.5 - broken for parallel jobs, do not use!
  • 6.9.0 - adds global scratchpad feature
  • 6.7.20 - fixes Node 0 killing other nodes bug
  • 6.7.19 - fixes Scheduler attribute
  • 6.7.18 - first Condor release with Parallel universe support for DAGMan

Metronome

  • 2.5.0 - latest developer series release, recommended.
  • 2.4.4 - latest stable series release
  • 2.4.3 - nmi_putattr and nmi_getattr bugs, do not use!
  • 2.2.3 - uses global scratchpad feature
  • 2.x.x - includes native support for Parallel universe jobs

Pool Setup

Each Metronome pool where you would like to run parallel jobs requires 1 DedicatedScheduler machine. The DedicatedScheduler is usually a submit node, as the Parallel Universe is based on the condor_schedd daemon. Each execute machine must then know about the submitter. There are several ways to configure your pool to run parallel jobs.

  1. Add 1 P-slot to each machine in your pool. For this example, assume that you already have 2 multi-use slots set up on each execute machine and you are adding slot 3 to run only parallel jobs.
    • Replace the LOCAL_CONFIG_FILE line in your condor_config file:
          #LOCAL_CONFIG_FILE      = $(LOCAL_DIR)/$(HOSTNAME).local
          PARALLEL = $(LOCAL_DIR)/condor_config.parallel
          LOCAL = $(LOCAL_DIR)/$(HOSTNAME).local
      
          REQUIRE_LOCAL_CONFIG_FILE = True
          LOCAL_CONFIG_FILE = $(PARALLEL), $(LOCAL)
      
    • Create a config file called condor_config.parallel and put it on all machines where you would like to run parallel jobs. Add the following lines, replacing submit-host.your.org with the name of the machine you will be submitting from:
      ######################################################################
      # MPI Settings
      ######################################################################
      
      ##  If you want to "lie" to Condor about how many CPUs your machine
      ##  has, you can use this setting to override Condor's automatic
      ##  computation.  If you modify this, you must restart the startd for
      ##  the change to take effect (a simple condor_reconfig will not do).
      ##  Please read the section on "condor_startd Configuration File
      ##  Macros" in the Condor Administrators Manual for a further
      ##  discussion of this setting.  Its use is not recommended.  This
      ##  must be an integer ("N" isn't a valid setting, that's just used to
      ##  represent the default).
      NUM_CPUS = 3
      
      ##  The number of evenly-divided virtual machines you want Condor to
      ##  report to your pool (if less than the total number of CPUs).  This
      ##  setting is only considered if the "type" settings described above
      ##  are not in use.  By default, all CPUs are reported.  This setting
      ##  must be an integer ("N" isn't a valid setting, that's just used to
      ##  represent the default).
      NUM_VIRTUAL_MACHINES = $(NUM_CPUS)
      
      ##  What is the name of the dedicated scheduler for this resource?
      DedicatedScheduler = "DedicatedScheduler@submit-host.your.org"
      
      ##  Path to the special version of rsh that's required to spawn MPI
      ##  jobs under Condor.  WARNING: This is not a replacement for rsh,
      ##  and does NOT work for interactive use.  Do not use it directly!
      MPI_CONDOR_RSH_PATH = $(LIBEXEC)
      
      ##  This setting puts the DedicatedScheduler attribute, defined above,     
      ##  into your machine's classad.  This way, the dedicated scheduler
      ##  (and you) can identify which machines are configured as dedicated
      ##  resources.
      SLOT1_STARTD_EXPRS = $(STARTD_EXPRS)
      SLOT2_STARTD_EXPRS = $(STARTD_EXPRS)
      SLOT3_STARTD_EXPRS = DedicatedScheduler
      
      ## required so the start expr won't eval to false and prevent jobs from ever running.
      IsOwner = False
      
      ## Be cautious that you don't override this START expression in other condor_config.* files.
      START = ( (VirtualMachineID == 1) || \
                (VirtualMachineID == 2) || \
                (VirtualMachineID == 3) && $(SLOT3_TYPE) )
      
      
      ## slot3 runs parallel / MPI jobs
      SLOT3_TYPE = (Scheduler =?= $(DedicatedScheduler))
      
      
  2. Allow all of the slots to run parallel jobs as well as other jobs.

    TBD.

Configuring jobs to require certain hosts, or vice versa

Matchmaking is bilateral — that is, both jobs and hosts can express their own requirements of one another, and can advertise their own attributes for one another’s reference.

A platform_job’s Requirements expression specifies its requirements of a host to run on. In Metronome, you can easily add a new constraint to the platform_job’s Requirements expression via the append_requirements command in the run specification file, like so:

append_requirements = (host_attribute =?= "bar")

This tells Condor to make sure that the job only runs on a host whose host_attribute equals “bar”.

Likewise, a host’s START expression specifies its requirements of a job. To add a new constraint to the host’s START expression, you should edit the host’s condor_config (or condor_config.local) file, like so:

START = ( $(START) && job_attribute =?= "foo" )

This tells Condor to make sure that the host only receives jobs whose job_attribute equals “foo”.

To advertise a new job attribute (so you can reference it in a host’s START expression), just add it via the ++ command in the run specification file, like so:

++job_attribute = "foo"

To advertise a new host attribute (so you can reference it in a job’s Requirements expression), just add it to the host’s condor_config (or condor_config.local) file, like so:

host_attribute = "bar"
STARTD_EXPRS = $(STARTD_EXPRS) host_attribute

Additional Notes

What is =?= and How Is It Different From ==?

The short answer: Use =?=, not ==.

The long answer: In the boolean logic of Condor classads, expressions can evaluate to True, False, or Undefined. If an expression references an attribute which isn’t defined, the value of that expression becomes Undefined. Therefore, if you say:

START = (job_color == "Red")

...and job_color is not defined in the job ad, then START will evaluate to Undefined rather than false.

To avoid this, Condor classads provide a =?= variant of the equality operator which will evaluate to False if one half is Undefined, rather than evaluating to Undefined. So if you say:

START = (job_color =?= "Red")

...and job_color is not defined in the job ad, then START will evaluate to False rather than Undefined.

Likewise for =!= and !=.
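The difference between the two operators can be modelled in a few lines of Python. This is a simplified model covering only the Undefined handling described above. It is not Condor's ClassAd evaluator, and it ignores any other differences between the operators. The names UNDEFINED, eq, and is_identical are invented for the sketch.

```python
class _Undefined:
    """Sentinel standing in for the ClassAd value Undefined."""

    def __repr__(self):
        return "Undefined"


UNDEFINED = _Undefined()


def eq(left, right):
    """Model of ==: the result is Undefined if either operand is Undefined."""
    if left is UNDEFINED or right is UNDEFINED:
        return UNDEFINED
    return left == right


def is_identical(left, right):
    """Model of =?=: the result is always True or False, never Undefined."""
    if left is UNDEFINED or right is UNDEFINED:
        return left is right
    return type(left) is type(right) and left == right


# A job ad that does not define job_color at all.
job_ad = {}
job_color = job_ad.get("job_color", UNDEFINED)

print(eq(job_color, "Red"))            # START written with ==
print(is_identical(job_color, "Red"))  # START written with =?=
```

With job_color missing from the job ad, the first expression yields Undefined and the second yields False, which is the distinction drawn above.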

Debugging

Although it’s not needed for correctness, for debugging you might also want to add the following to the host’s condor_config:

STARTD_JOB_ATTRS = $(STARTD_JOB_ATTRS) job_attribute

This will tell the host to publish the job_attribute attribute of any currently running job in its own host classad, so you can see it. This can make it easier to confirm that the job that is currently running has the attribute you expect, without having to look up the jobid and examine its classad separately using condor_q -l. The only thing to be careful about is any attribute names which are present in both the host and job classads, because only one can be published in the machine classad. This is one good reason to name job attributes and host attributes with job_ or host_ prefixes.

Cross-Site Job Migration

This feature in Metronome allows build and test submissions to automatically migrate to different pools based on resource availability. If a local user submits a job that requests a particular platform that does not exist in the local pool, it can automatically be routed to a different site that does have a computing resource with that platform.

The following information and pages are a guide for system administrators and pool managers to allow their local Metronome installation to establish routes with other sites.

Overview

There are two major components of the job migration feature:

  1. Broadcasting local resource information to other sites
  2. Creating a routing table for local jobs to remote sites

Both components are handled by the nmi_resource_advertiser tool. It transmits information about your local Metronome resources to remote collectors, such as the number of available machines in the pool and which types of platforms are available. Other sites with which you have a pair-wise agreement will in turn broadcast their pool information back to your local collector.

With this information, the resource advertiser can construct a table of routes to the remote sites on which local jobs can execute to get the resources they need.

To enable this feature, there are several steps and configuration changes that you will need to make to your local Metronome installation. Each pool that you wish to route jobs to will also need to make the same changes.

Limitations

  • Routes are currently bi-directional. There is no way to have jobs migrate from Site A to Site B but not have jobs migrate from Site B to Site A.
  • There is no way to prevent your job from executing on remote sites in the current version. All jobs are eligible for migration after remaining idle for a certain period of time if there are resources that match their job.
  • Jobs are only matched based on the platform that they requested. The system is unable to make matches based on prerequisite software requirements.
  • There is currently no way to specify which platforms are allowed to be shared with remote sites.

NMI Configuration

NMI Configuration File

There are two parameters that need to be added to the end of your nmi.conf file on each of your pool’s submit nodes:

## ---------------------------------------------
## Job Routing
## ---------------------------------------------
ROUTING_TABLE = /path/to/condor/local/dir/condor_config.routing_table
REMOTE_SITES =  /path/to/nmi/etc/remote_sites

The routing table is populated by the resource advertiser with information about how to migrate jobs to remote sites (see this page for more information). It is important that the file named by ROUTING_TABLE be writable by the nmi_resource_advertiser script and readable by the Condor daemons. The REMOTE_SITES file should be placed in the same directory as the submit node’s nmi.conf file.

Remote Sites List

The REMOTE_SITES file is a list of hostnames that your local pool is allowed to route jobs to. Each line in the file should contain the hostname of a remote submit node and, optionally, the hostname of the collector to which resource information for that submit node should be broadcast. If no collector host is provided, the resource advertiser will send all information to the collector listening on the submit node at the standard port.

Please note that the entries in this file must match exactly with the hostname on each machine. The resource advertiser uses this list to look for remote resource information in the local collector and also to send the local resource information to each of these sites.

Sample list:

# Comments are allowed
#     
remote.site1.com
remote.site2.com   collector.site2.com
remote.site3.com   collector.site3.com:1234
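
As a rough illustration of the format, the sketch below parses such a list in Python. It is not the resource advertiser's own parser. The function name parse_remote_sites is hypothetical, the sketch assumes that "the standard port" means the collector port 9618 mentioned elsewhere on this page, and it assumes that a # anywhere on a line starts a comment.

```python
DEFAULT_COLLECTOR_PORT = 9618  # assumption: the "standard port" of the collector

def parse_remote_sites(text):
    """Return a list of (submit_host, collector_host, collector_port) tuples."""
    sites = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        submit_host = fields[0]
        # With no collector column, fall back to the submit node itself.
        collector = fields[1] if len(fields) > 1 else submit_host
        host, _, port = collector.partition(":")
        sites.append((submit_host, host, int(port) if port else DEFAULT_COLLECTOR_PORT))
    return sites

sample = """\
# Comments are allowed
#
remote.site1.com
remote.site2.com   collector.site2.com
remote.site3.com   collector.site3.com:1234
"""
for entry in parse_remote_sites(sample):
    print(entry)
```

Each tuple names the submit node and the collector host and port that would receive the broadcast for it.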

Submit Node Configuration

Cron Job

The nmi_resource_advertiser program is used to broadcast information about the local NMI resources to sites listed in the REMOTE_SITES file and write the routes to remote sites to the ROUTING_TABLE used by the job router.

You will need to add the following command to the crontab of the user that Condor runs as (on most systems this is usually condor or daemon). The command will execute every five minutes. This is just a default interval that should suit most configurations. It should be less than the ClassAd lifetime of information stored by the Collector (CLASSAD_LIFETIME).

*/5 * * * * /path/to/bin/nmi_resource_advertiser --broadcast --routing-table --nmiconf=/path/to/nmi/etc/nmi.conf

Firewall Configuration

If all of your submit and worker nodes are located either on the public network or behind the same firewall, no configuration changes are needed for Metronome. If, however, worker nodes are installed at remote sites that have firewalls, then you will need to configure both Condor and the firewalls to allow the appropriate network traffic. Note that the following changes must be made in your site’s Condor configuration file and not in nmi.conf.

Please refer to the Networking Section of the Condor Manual for the most up-to-date information about what ports are needed by Condor.

Setting Condor’s Listening Port Range

Condor can be configured to open sockets within a range of port numbers. Please refer to the information below on the number of ports that need to be opened for your installation. The following example will cause Condor to open ports only between 20000 and 25000:

LOWPORT = 20000
HIGHPORT = 25000

The range of ports assigned may be restricted based on incoming (listening) and outgoing (connect) ports with the configuration variables IN_HIGHPORT, IN_LOWPORT, OUT_HIGHPORT, and OUT_LOWPORT.

Local Inclusion of Remote Resources

The most important port to open is 9618; this is used as the listening port of the Condor Collector for receiving resource information from worker nodes.

There are several configuration possibilities for including remote resources. For example, you may just allow users to add their resources to your Metronome site’s pool, or you may allow them to submit jobs from the outside into your pool.

The worker nodes that are outside of the

Remote Worker Nodes Connecting to Local Collector

Description needed here…

Number of ports needed:

5 + (5 * number of virtual machines advertised by that machine)

Remote Submit Nodes Connecting to Local Resources

Description needed here…

Number of ports needed:

5 + (5 * MAX_JOBS_RUNNING)

Local Central Manager Connecting to Remote Resources

Description needed here…

Number of ports needed:

5 + NEGOTIATOR_SOCKET_CACHE_SIZE 
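
To check whether a LOWPORT/HIGHPORT range is wide enough, the three formulas above can be evaluated with a short Python sketch. The function names are hypothetical, the input values are made up for illustration, and the sketch assumes the range includes both LOWPORT and HIGHPORT.

```python
def ports_for_worker(virtual_machines):
    """Remote worker node connecting to the local collector."""
    return 5 + 5 * virtual_machines

def ports_for_submit_node(max_jobs_running):
    """Remote submit node connecting to local resources."""
    return 5 + 5 * max_jobs_running

def ports_for_central_manager(negotiator_socket_cache_size):
    """Local central manager connecting to remote resources."""
    return 5 + negotiator_socket_cache_size

def fits_in_range(needed, lowport, highport):
    """Assumption: LOWPORT and HIGHPORT are both usable, so the range is inclusive."""
    return needed <= highport - lowport + 1

# Illustrative values only; substitute the settings from your own pool.
print(ports_for_worker(2))                                       # 15
print(ports_for_submit_node(200))                                # 1005
print(ports_for_central_manager(16))                             # 21
print(fits_in_range(ports_for_submit_node(200), 20000, 25000))   # True
```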

Job Migration

The most important port to open is 9618; this is used as the listening port of the Condor Collector for receiving resource information from worker nodes.

Metronome Configuration Parameters

The Metronome configuration on each submit/archive node is determined by the settings in the nmi.conf configuration file. This page is a list of all the available parameters.

Any text after a hash character (#) is considered a comment. The syntax for each line in the configuration file is:

# Comments are ignored
PARAMETER = value    # inline comment
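
The syntax rules above can be sketched as a small Python reader. This is an illustration, not the framework's own parser: parse_nmi_conf is a hypothetical name, and the sketch assumes that a # always starts a comment, so a value containing that character would be cut short.

```python
def parse_nmi_conf(text):
    """Parse 'NAME = value  # comment' lines into a dict of strings."""
    params = {}
    for raw in text.splitlines():
        # Assumption: '#' always starts a comment, so values cannot contain it.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a 'NAME = value' line; this sketch skips it
        params[name.strip()] = value.strip()
    return params

sample = """\
# Comments are ignored
ADMIN_EMAIL = nmi-admin@example.com
DB_PORT = 1234
POLLING_INTERVAL = 1 # seconds
"""
print(parse_nmi_conf(sample))
```

All values are returned as strings; a caller would convert numeric parameters such as DB_PORT itself.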

NOTE
Starting in Metronome 2.2.0, the following parameters are deprecated. The default configuration file now uses UPPERCASE naming, although the lower case versions will still be supported. Use the appropriate substitution when updating your configuration file:
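
The replacements named in the individual NOTE entries further down this page can be collected into one lookup table. The Python sketch below does that and rewrites a single configuration line. The upgrade_line helper is hypothetical, and the table contains only the pairs stated on this page, so any other parameter is left unchanged.

```python
# Old name -> new name, as stated in the NOTE entries on this page.
DEPRECATED = {
    "condor_base": "PATH_CONDOR",
    "database": "DB_NAME",
    "globus": "PATH_GLOBUS",
    "mysqlhost": "DB_HOST",
    "mysqlport": "DB_PORT",
    "nmiprefix": "PATH_NMI",
    "password": "DB_PASS",
    "rundir": "RUN_DIR",
    "username": "DB_USER",
}

def upgrade_line(line):
    """Rewrite one 'name = value' line if its name is deprecated."""
    name, sep, rest = line.partition("=")
    new_name = DEPRECATED.get(name.strip())
    if not sep or new_name is None:
        return line  # comment, blank line, or a name with no stated replacement
    return f"{new_name} ={rest}"

print(upgrade_line("mysqlhost = database.example.com"))
print(upgrade_line("url_prefix = nmi"))
```

Review the result by hand before replacing a production nmi.conf.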

ADMIN_EMAIL

This is the email address to which the NMI framework sends debug and error information. For example, if a component of the framework crashes while executing a job, the debug information will be sent to ADMIN_EMAIL.

ADMIN_EMAIL = nmi-admin@example.com

condor_base

Defines the path to the root directory of your submit node’s Condor installation.

condor_base = /path/to/condor

NOTE: As of NMI 2.2.0, this parameter has been deprecated. Use PATH_CONDOR instead.

CONDOR_CONFIG

This parameter defines the location of the submit node’s main Condor configuration file. It is needed so that the framework can access Condor utilities for submitting, managing, and removing builds and tests.

CONDOR_CONFIG = /path/to/condor/etc/condor_config

database

To be written

database = history

NOTE: As of NMI 2.2.0, this parameter has been deprecated. Use DB_NAME instead.

DB_HOST

This parameter is used to define the host name of the database server used by the NMI framework. It can be a fully-qualified host name or simply “localhost”. If your database server listens on a non-standard port, please use the DB_PORT parameter to define the connection port.

Prior to release 2.2.0, this parameter was named mysqlhost.

DB_HOST = database.example.com

DB_NAME

Defines the database the framework will use to read and write build and test information. Be sure that the DB_WRITER_USER and DB_READER_USER have appropriate access to this database.

Prior to release 2.2.0, this parameter was named database.

DB_NAME = nmi_history

DB_PORT

If your database server listens on a non-standard port, you must provide this port number in your NMI configuration file so that the core framework code and the default web interface can gain access. The value should be an integer without a preceding colon.

DB_PORT = 1234

DB_READER_PASS

This parameter defines the password used by DB_READER_USER to access the framework database. Please refer to the framework installation instructions on how to create users with minimal privileges.

Note that blank passwords are now allowed.

DB_READER_PASS = some_pass

DB_READER_USER

Along with DB_READER_PASS, this parameter defines the user name of a database user that has read-only access to the database information. This account is used by the NMI framework web interface and other utilities that only require read-only access to the database. Please refer to the framework installation instructions on how to create users with minimal privileges.

DB_READER_USER = some_user

DB_TYPE

This parameter is used to define what database system is used at your site. It allows the framework and web interface to use the proper database connector utility to access the build and test information.

NOTE: The only supported database at this time is MySQL. Please contact the NMI developers team if you would like to deploy the framework using a different database server.

DB_TYPE = mysql

DB_WRITER_PASS

To be written

DB_WRITER_PASS = some_password

DB_WRITER_USER

Along with DB_WRITER_PASS, this parameter defines the user name of a user that has access to the NMI database. Please refer to the framework installation instructions on how to create users with increased privileges.

It is advised that you do not use the same account for DB_WRITER_USER as DB_READER_USER.

DB_WRITER_USER = some_user

DEFAULT_INPUT_HOST

Fully-qualified domain name of the central repository where the build and test artifacts and data are stored. For most installations, this value should be the same as THIS_HOST.

DEFAULT_INPUT_HOST = site.example.com

disk_thresh

The disk cleaner runs when free disk space (in MB) falls below this value.

The default value is 400000 MB (about 390 GB).

disk_thresh = 400000
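
The comparison can be sketched in Python as follows. This is not the disk cleaner's own code: needs_cleaning is a hypothetical name, the sketch assumes the threshold counts binary megabytes (which matches the 390 GB figure above), and it checks the root filesystem only as a placeholder for whichever filesystem the cleaner actually monitors.

```python
import shutil

MB = 1024 * 1024  # assumption: the threshold counts binary megabytes

def needs_cleaning(free_bytes, disk_thresh_mb=400000):
    """True when free space is below the disk_thresh value (given in MB)."""
    return free_bytes < disk_thresh_mb * MB

# "/" is a placeholder path; use the filesystem that holds your run directory.
free = shutil.disk_usage("/").free
print(needs_cleaning(free))
```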

FETCH_RETRY_COUNT

This parameter became available in Metronome 2.2.4.


By default, Metronome will try to fetch an input three times before giving up. This parameter allows you to change that to another integer for the inputs in the same submit file.

The machine default may be changed in nmi.conf using the parameter FETCH_RETRY_COUNT.

globus

To be written.

NOTE: As of NMI 2.2.0, this parameter has been deprecated. Use PATH_GLOBUS instead.

main_webserver

Web server for the main entry point to the web view of the builds. This is only necessary if there is more than one submit node in your pool.

main_webserver = main-webserver.example.com

MAX_MATCH_WAIT

Specifies the default maximum number of seconds that a run may remain in the queue without ever being successfully matched with a resource, before it will be automatically removed. Users may redefine this value on a per-run basis via the max_match_wait attribute in their run specification file.

Defaults to six days if left undefined.

MONITOR_BACKUP_LOGS

When set to true, the monitor will make backup copies of monitor.out and monitor.err in its run directory. Without this flag, Condor will overwrite the files’ contents when the monitor is restarted. This can happen if the jobs are put on hold or Condor is restarted at the submission point. The default is set to false. There are no negative implications to not having this set to true — it is mostly used for debugging.

MONITOR_BACKUP_LOGS = 1

mysqlhost

To be written.

mysqlhost = database.example.com

NOTE: As of NMI 2.2.0, this parameter has been deprecated. Use DB_HOST instead.

mysqlport

If your database server listens on a non-standard port, you must provide this port number in your NMI configuration file so that the core framework code and the default web interface can gain access. The value should be an integer without a preceding colon.

mysqlport = 1234

NOTE: As of NMI 2.2.0, this parameter has been deprecated. Use DB_PORT instead.

nmiprefix

To be written

nmiprefix = /path/to/nmi

NOTE: As of NMI 2.2.0, this parameter has been deprecated. Use PATH_NMI instead.

password

To be written

password = some_password

NOTE: As of NMI 2.2.0, this parameter has been deprecated. Use DB_PASS instead.

path

To be written.

path = rundir

PATH_CONDOR

Defines the path to the root directory of the submit node’s Condor installation. This directory should contain the bin, sbin, and lib subdirectories for Condor. It does not need a trailing slash.

PATH_CONDOR = /path/to/condor

PATH_GLOBUS

Path to the local Globus installation on the submit node.

PATH_GLOBUS = /path/to/globus

PATH_NMI

To be written

PATH_NMI = /path/to/nmi

PLATFORM_JOB_TIMEOUT

This configuration option became available in Metronome 2.5.1.


Sets an absolute upper limit on the length of time a platform job may run (including Metronome-internal stages). The timeout is parsed in the same way as the remote_*_timeout values.

PLATFORM_TYPE

First available in Metronome 2.2.8.


PLATFORM_TYPE over-rides the hard-coded default of “nmi” for the platform type, and is in turn over-ridden by the configuration file option platform_type.


This parameter is for use in Metronome installations which use job migration. By default, Metronome uses the ‘nmi’ platform-naming scheme, which the developers found appropriate for their work. Other installations may want to name their platforms differently. Prior to Metronome 2.2.8, this would require changing the value of ‘nmi_platform’ as reported by the Hawkeye script we supply. With platform types, you can advertise more than one platform name for each machine, for instance, ‘nmi_platform’ and ‘etics_platform’. If your users prefer ETICS-style naming, you can set PLATFORM_TYPE to ‘etics’, and they won’t have to set platform_types in all of their submit files to use their preferred names.

POLLING_BACKOFF

When set to true, this parameter will cause the logfile monitor to back off exponentially when polling a submission’s files. During a polling cycle, if no new information was added to the logfiles, the monitor will sleep twice as long as it did in the previous cycle. The maximum sleep time can be controlled with the POLLING_MAX variable.

The default is to disable exponential backoff.

POLLING_BACKOFF = 1

POLLING_INTERVAL

This variable allows you to control how many seconds the logfile monitor will sleep after reading the contents of a submission’s logfiles. If set to zero, the framework will continuously poll the logfiles. See this page for more information on how to enable the exponential backoff feature when polling files.

The default value for this parameter is 1 second.

POLLING_INTERVAL = 1 # seconds

POLLING_MAX

Sets a limit on how long the monitor is allowed to sleep when using the exponential backoff option. A low value will cause the logfile monitors to read the files more often to determine whether new information has been posted from the execution nodes; a high value may cause the framework to update more slowly when a change occurs.

The default is 128 seconds.

POLLING_MAX = 128
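
How POLLING_INTERVAL, POLLING_BACKOFF, and POLLING_MAX interact can be sketched in Python. This models only the doubling and the cap described above. The function polling_sleeps is a hypothetical name, and the sketch covers consecutive polling cycles in which no new information arrives; what the monitor does once new information appears is not modeled.

```python
def polling_sleeps(interval, polling_max, idle_cycles, backoff=True):
    """Sleep times (seconds) for consecutive cycles with no new log output."""
    sleeps = []
    sleep = interval
    for _ in range(idle_cycles):
        sleeps.append(sleep)
        if backoff:
            # Double the previous sleep, but never exceed POLLING_MAX.
            sleep = min(sleep * 2, polling_max)
    return sleeps

# Defaults from this page: POLLING_INTERVAL = 1, POLLING_MAX = 128.
print(polling_sleeps(1, 128, 10))                 # [1, 2, 4, 8, 16, 32, 64, 128, 128, 128]
print(polling_sleeps(1, 128, 4, backoff=False))   # [1, 1, 1, 1]
```

With the default values, the sleep time reaches the cap after seven idle cycles.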

protocol

This parameter defines the default protocol for fetching builds submitted by and stored on this machine.

Possible options include:

  • http
  • ftp
  • scp

protocol = http

REMOTE_SITES

To be written

ROUTING_TABLE

To be written

rundir

To be written

rundir = /path/to/nmi/rundir

NOTE: As of NMI 2.2.0, this parameter has been deprecated. Use RUN_DIR instead.

RUN_DIR

To be written

RUN_DIR = /path/to/nmi/rundir

RUN_DIR_URL

To be written…

RUN_DIR_URL = 

THIS_HOST

Fully-qualified domain name of the submit host.

THIS_HOST = submit-node.example.com

url_prefix

This parameter defines the relative path where the web pages are located.

For example, if the framework webpages are installed in /data/www/htdocs/nmi, where /data/www/htdocs/ is the root directory of your webserver, url_prefix should be set to ‘nmi’.

url_prefix = nmi

username

To be written

username = some_user

NOTE: As of NMI 2.2.0, this parameter has been deprecated. Use DB_USER instead.

use_category_throttling

This parameter became available in Metronome 2.5.0.



This parameter may only be used with Condor 6.9.5 or later. If set to true, Metronome will use a new Condor mechanism (“category throttling”) to control the number of tasks running on the submit node at the same time for any given run. This will reduce the load on the submit node from long-running and/or io-intensive platform_pre or platform_post jobs in multiple-platform runs.

Administrators may also wish to limit the load on their submit nodes by limiting how many jobs run there at once. Condor calls these jobs “scheduler universe”, and it should be possible to use load measurements in the scheduler universe’s start expression. Doing so, however, is outside the scope of this manual.

Administrators may also wish to throttle submit host load on a per-user basis (so that heavy users don’t starve others), but Condor does not yet support this.

See the Condor manual for more information.

use_condor_job_leases

Added in Metronome 2.2.8.


If this parameter is set to true, Metronome will set job leases for its platform_jobs, which makes them tolerant of service interruptions on the submit hosts. Because Metronome requires streaming output, this option cannot be used prior to Condor 6.9.4.

WEBSERVER

To be written

WEBSERVER = server.example.com

Metronome Usage Accounting

Metronome Usage Report

The Metronome usage report measures the heterogeneity and frequency of builds and tests submitted by NMI Lab users, and also summarizes them by project. It is generated by the nmi_usage_stats.pl script.

The report measures the number of platform_job tasks, rather than all tasks or the number of runs, because the platform_job count offers the best measure of usage. A report of runs would over-report users who submit multi-platform builds as multiple, separate runs vs. users who submit them together in a single run, even though both users performed identical operations. Likewise, a report of all tasks would over-report users who break their builds into many more granular tasks vs. users who perform identical builds monolithically in a single task.

The number of platform_job tasks, in contrast, measures the number of builds or tests a user submitted to individual platforms, regardless of whether those builds or tests were submitted as a single multi-platform run or multiple single-platform runs, and regardless of whether the builds or tests were defined as a single task, or were broken up into many smaller tasks to be individually recorded in the DB.

NOTE: this report does not measure the actual resource consumption of the user. It reports equally a five-minute or ten-hour test. For a measure of resource consumption (as opposed to heterogeneity and frequency of builds and tests), see the Condor usage report.
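
The reasoning behind counting platform_job tasks can be illustrated with a small Python sketch. The records below are hypothetical and are not taken from the Metronome database schema or from nmi_usage_stats.pl: one user submits three single-platform runs, and another submits one three-platform run that also records a platform_pre task.

```python
from collections import Counter

# Hypothetical task records: (user, run_id, task_name).
tasks = [
    ("alice", 1, "platform_job"),
    ("alice", 2, "platform_job"),
    ("alice", 3, "platform_job"),
    ("bob", 4, "platform_pre"),
    ("bob", 4, "platform_job"),
    ("bob", 4, "platform_job"),
    ("bob", 4, "platform_job"),
]

# Count of platform_job tasks per user (the measure the report uses).
platform_jobs = Counter(user for user, _, name in tasks if name == "platform_job")
# Count of distinct runs per user.
runs = Counter(user for user, _ in {(user, run) for user, run, _ in tasks})
# Count of all tasks per user.
all_tasks = Counter(user for user, _, _ in tasks)

print(dict(platform_jobs))
print(dict(runs))
print(dict(all_tasks))
```

Both users built on three platforms, and only the platform_job count reports them equally (three each). Counting runs gives three against one, and counting all tasks gives three against four.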

Metronome Web Interface

To be written…

Customize Site Branding

This page provides information on how to customize the web interface for your organization. All the changes shown below should be made in the web interface configuration file (etc/config.inc).

Site Title

To change the text used in the web browser title, as well as on the sidebar column, modify the SITE_TITLE configuration parameter.

   // -------------------------------------------------------
   // SITE TITLE
   // This is how the site will be branded
   // -------------------------------------------------------
   define('SITE_TITLE',    'New Site Title');

Sidebar Logos

By default, the web interface displays the NSF and NMI logos in the sidebar column. This can easily be changed to replace these logos or to add logos for your organization. There are three base parameters that control these logos: SITE_LOGO, SITE_LOGO_LBL, and SITE_LOGO_URL. You can remove the logos from the page by removing these parameters from etc/config.inc.

It is easy to change not only the logo image, but also its url and label on your site. For example, the logo image is defined with the SITE_LOGO parameter, and the corresponding alternative text is defined with SITE_LOGO_LBL. If you want the logo to be a link to some site, you may also define SITE_LOGO_URL as an address that the image will link to.

   define('SITE_LOGO',     'http://www.example.com/images/logo.gif');
   define('SITE_LOGO_URL', 'http://www.example.com');
   define('SITE_LOGO_LBL', 'Example Site');

To display multiple logos, add a unique numerical suffix, starting at one, to the end of each set of parameters. As shown in the example below, to display two logos one needs to define SITE_LOGO1 and SITE_LOGO2, along with the corresponding label and url parameters with the same numerical suffix.

   define('SITE_LOGO1',     'http://www.example1.com/images/logo1.gif');
   define('SITE_LOGO_URL1', 'http://www.example1.com');
   define('SITE_LOGO_LBL1', 'Example Site #1');

   define('SITE_LOGO2',     'http://www.example2.com/images/logo2.gif');
   define('SITE_LOGO_URL2', 'http://www.example2.com');
   define('SITE_LOGO_LBL2', 'Example Site #2');

Installing Web Interface

1) Expose the contents of this directory (‘web’) so it appears under Apache’s DocumentRoot. You can copy or move it under the DocumentRoot or just create a symlink, for example:

# ln -s /web /nmiweb

The path in front of /web should be the top level of where you unpacked the NMI framework tarball, and the path in front of /nmiweb should be Apache’s DocumentRoot.

2) Copy index.php.sample to index.php. Please make sure that it has the proper permissions.

% cd /web
% cp index.php.sample index.php
% chmod 0644 index.php

You will need to edit index.php to add the path of your current directory (the one containing index.php) and the path to your NMI configuration file. Be sure to include the trailing slash in the BASE_PATH. For example,

define('BASE_PATH', '/path/to/this/file/');
define('NMI_CONF', '/path/to/nmi.conf');

would become:

define('BASE_PATH', '/home/pavlo/public/html/www/');
define('NMI_CONF', '/nmi/etc/nmi.conf');

3) Copy the sample website configuration file to make a new ‘etc/config.inc’ and tweak the settings for your site.

% cd etc/
% cp config.inc.sample config.inc
% chmod 0664 config.inc

4) Make the interface aware of where to look for .out and .err files for the builds. Let’s say your nmi.conf file has the setting:

rundir = /path/to/builds

and let’s say your Apache DocumentRoot is /var/www/html. Then choose some value for the relative URL path to .out/.err files (say, “foo”). Configure it by setting

path = foo

in your nmi.conf, then create a symlink to expose the files:

# ln -s /path/to/builds /var/www/html/foo

5) Point your browser to the address of index.php. Check your Apache error/access logs if the page does not load correctly. You can also turn on debugging output by switching which error_reporting line is commented out in index.php. For example,

//error_reporting(E_ALL);
error_reporting(E_ERROR | E_PARSE);

would become:

error_reporting(E_ALL);
//error_reporting(E_ERROR | E_PARS