Return Codes
Run Result Codes
The result code stored in the DB for a Metronome run is the number of failed tasks in that run. If the run had no failed tasks, Metronome stores a result code of 0. If the run is in a special state, the result code is null or negative, as follows:
| Run Result Code | Key |
|---|---|
| null | running |
| >0 | completed and failed |
| 0 | completed successfully |
| -1 | removed |
| -1015 | Incomplete. This is a run-terminal condition, set by the monitor when the run finishes but does not complete. (Specifically, when computing the run's result code, if any component task is incomplete, the whole run is incomplete.) |
NOTE: the nmi_run_status command-line tool interprets the value stored in the Metronome DB along with other information, and returns its own set of status codes for each run.
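The run result code table can be turned into a small lookup helper. The following is a minimal sketch, not part of Metronome itself: the function name `describe_run_result` and the constant names are hypothetical, and it assumes the DB value has already been read into a Python `int` or `None` (for a null column).

```python
from typing import Optional

# Special run result codes, taken from the table above.
# The constant names are illustrative, not Metronome identifiers.
RUN_REMOVED = -1
RUN_INCOMPLETE = -1015


def describe_run_result(code: Optional[int]) -> str:
    """Map a stored run result code to a human-readable label."""
    if code is None:
        # A null result code means the run has not finished yet.
        return "running"
    if code == 0:
        return "completed successfully"
    if code > 0:
        # A positive code is the number of failed tasks in the run.
        return f"completed and failed ({code} failed task(s))"
    if code == RUN_REMOVED:
        return "removed"
    if code == RUN_INCOMPLETE:
        return "incomplete"
    # Any other negative value is not covered by the table.
    return f"undocumented code {code}"
```

For example, `describe_run_result(3)` reports a completed run with three failed tasks. This helper only mirrors the stored value; it does not reproduce the additional interpretation performed by nmi_run_status.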
Task Result Codes
The result code reported by Metronome (and stored in the DB) for each Metronome task is the Unix return code of the user-specified task script. If the script had no return code because it was killed by a signal, the Metronome result code will be the negative integer corresponding to the signal number. For example, -9 means SIGKILL.
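The convention of reporting a signal as a negative number can be observed outside Metronome as well. The sketch below is only an illustration, assuming a POSIX system: it uses Python's standard `subprocess` module, which happens to follow the same convention (a `returncode` of -N means the child was terminated by signal N). The helper `run_and_report` and the two sample commands are hypothetical and are not how Metronome launches tasks.

```python
import subprocess
import sys


def run_and_report(argv):
    """Run a command and return its exit code, or -N if killed by signal N."""
    proc = subprocess.run(argv)
    return proc.returncode


# A child process that exits normally with return code 3.
EXIT_3 = [sys.executable, "-c", "import sys; sys.exit(3)"]

# A child process that sends SIGKILL to itself (POSIX only).
KILL_SELF = [
    sys.executable,
    "-c",
    "import os, signal; os.kill(os.getpid(), signal.SIGKILL)",
]

if __name__ == "__main__":
    print(run_and_report(EXIT_3))     # normal exit code
    print(run_and_report(KILL_SELF))  # negative signal number
```

On Linux, where SIGKILL is signal 9, the second call yields -9, matching the example in the text.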
For some Metronome or Condor-level failures (e.g., a job removed from the queue before the task script even runs), the Metronome result code will be a special negative value beyond the range of signals (e.g., -1001).
These special negative Metronome result codes are documented here. The web portal does not yet translate all of them from numbers into human-readable terms.
| Task Return Code | Key | Description |
|---|---|---|
| 0 to 255 | Normal Exit | Return code of user-specified task executable |
| -1 to -31 | Killed by Signal | Negative integer corresponding to the Unix signal number |
| -32 | Execution failure | The user-specified task executable could not be executed _or_ timed out |
| -1001 | Submission failure | DAGMan error code for a job submission failure |
| -1002 | Removed | The task's Condor job was manually removed from the queue "out from under" Metronome |
| -1003 | Interrupted | Tasks are given this result code if they were interrupted while executing and no return code was available. (E.g., if the Metronome wrapper overseeing the task was killed by an external entity, or if the remote resource crashed.) This result should be temporary and non-fatal: once the task is automatically re-executed, its new status will replace this one. |
| -1015 | Incomplete | This means that the task neither succeeded nor failed. However, unlike Interrupted, this is a run-terminal condition, generated by the run monitor for a task if the run's DAG terminates before that task has a result. |
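Since the web portal does not yet translate every code, a script that reads task result codes may want its own translation. The following is a minimal sketch based only on the table above; `describe_task_result` and `SPECIAL_TASK_CODES` are hypothetical names, and signal names are looked up with Python's standard `signal` module, so they reflect the machine running the script rather than the machine that ran the task.

```python
import signal

# Special task result codes from the table above.
SPECIAL_TASK_CODES = {
    -32: "execution failure",
    -1001: "submission failure",
    -1002: "removed",
    -1003: "interrupted",
    -1015: "incomplete",
}


def describe_task_result(code: int) -> str:
    """Translate a task result code into a human-readable label."""
    if code in SPECIAL_TASK_CODES:
        return SPECIAL_TASK_CODES[code]
    if 0 <= code <= 255:
        # Ordinary return code of the user-specified task executable.
        return f"normal exit (return code {code})"
    if -31 <= code <= -1:
        signum = -code
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # The local platform does not define this signal number.
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    # Anything else is not covered by the table.
    return f"undocumented code {code}"
```

On a typical Linux system, `describe_task_result(-9)` returns `"killed by signal 9 (SIGKILL)"`. The special codes are checked first so that -32 and the -1000 range are never mistaken for signals.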
Meta-Tasks
Special meta-tasks (e.g., platform_job, or remote_task when sub-tasks have been declared) are given return codes corresponding to their total number of failed sub-tasks. If all of the tasks within the meta-task succeeded (i.e., returned 0), Metronome will assign the meta-task a return code of 0.
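The meta-task rule amounts to counting failures. The sketch below illustrates it under one loudly stated assumption: any sub-task whose return code is not 0 counts as failed. The text only says that success means returning 0, so how Metronome treats the special negative codes (for instance, the temporary Interrupted code) when counting is not confirmed here. The function name `meta_task_return_code` is hypothetical.

```python
from typing import Iterable


def meta_task_return_code(sub_task_codes: Iterable[int]) -> int:
    """Return the number of failed sub-tasks for a meta-task.

    Assumption: a sub-task counts as failed whenever its return
    code is anything other than 0.
    """
    return sum(1 for code in sub_task_codes if code != 0)
```

For example, `meta_task_return_code([0, 1, -9])` gives 2, and a meta-task whose sub-tasks all returned 0 gets 0.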
