Running TRINITY
Running a simulation
A TRINITY run is fully specified by one plain-text parameter file and launched with a single command from the repository root:
python run.py param/simple_cluster.param
The path may be absolute or relative to the repository root.
run.py scans the file and dispatches automatically: if the file
contains list ([...]) or tuple(...) syntax it runs a parameter
sweep across a parallel worker pool, otherwise it runs a single
simulation. There is no separate command or flag for sweeps. On an HPC
cluster you can instead generate a SLURM job array with --emit-jobs
(see Running on a cluster (SLURM) below).
Output is written to the directory named by the path2output
parameter; the default sentinel def_dir resolves to
outputs/<model_name>/ under the current working directory. See
Outputs below for the file layout.
Parameter-file formats
A parameter file lists one keyword value entry per line (see
Parameter Specifications for the full keyword reference). The value
syntax alone decides whether the file is a single run or a sweep:
# Plain key/value — fixed across every run
dens_profile densPL
nISM 0.1
path2output outputs/demo
# tuple(...) — only these explicit (mCloud, sfe) pairs are run
tuple(mCloud, sfe) [1e5, 0.01] [1e7, 0.10]
# [list] — swept Cartesian-style across each tuple pair
nCore [1e3, 1e4]
The file above mixes all three value forms. How many simulations a file generates depends only on which forms it uses:
| Value syntax | Mode | Runs generated |
|---|---|---|
| no list or tuple | single | 1 |
| [list] | Cartesian | every combination of the listed values |
| tuple(...) | tuple | only the listed points, no expansion |
| tuple and list together | hybrid | tuple points × list combinations |
The hybrid example therefore runs 2 tuple pairs × 2 nCore values =
4 simulations. Single-purpose worked examples ship as
param/sweep_example.param (Cartesian),
param/sweep_tuple_example.param (tuple), and
param/sweep_hybrid_example.param (hybrid).
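The counting rule above can be checked with a short sketch. This is not TRINITY's own parser: `expand` and its two regular expressions are hypothetical helpers that only understand the three value forms shown in the example file, and they assume at most one tuple(...) line per file.

```python
import itertools
import re

# Hypothetical helpers for illustration only; not part of TRINITY.
TUPLE_RE = re.compile(r"^tuple\(([^)]*)\)\s+(.*)$")
LIST_RE = re.compile(r"\[([^\]]*)\]")

EXAMPLE = """
dens_profile densPL
nISM 0.1
path2output outputs/demo
tuple(mCloud, sfe) [1e5, 0.01] [1e7, 0.10]
nCore [1e3, 1e4]
"""


def expand(param_text):
    """Return one {key: value-string} dict per simulation the text describes."""
    fixed, lists, tuple_points = {}, {}, [{}]
    for raw in param_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        match = TUPLE_RE.match(line)
        if match:
            # tuple(a, b) [..] [..]: each bracket group is one explicit point.
            keys = [k.strip() for k in match.group(1).split(",")]
            tuple_points = [
                dict(zip(keys, (v.strip() for v in group.split(","))))
                for group in LIST_RE.findall(match.group(2))
            ]
            continue
        key, value = line.split(None, 1)
        list_match = LIST_RE.fullmatch(value.strip())
        if list_match:
            lists[key] = [v.strip() for v in list_match.group(1).split(",")]
        else:
            fixed[key] = value.strip()
    list_keys = sorted(lists)
    runs = []
    # Hybrid rule: every tuple point is crossed with every list combination.
    for point in tuple_points:
        for combo in itertools.product(*(lists[k] for k in list_keys)):
            runs.append({**fixed, **point, **dict(zip(list_keys, combo))})
    return runs


runs = expand(EXAMPLE)
print(len(runs))  # 2 tuple pairs x 2 nCore values -> 4
```

A file with no list or tuple syntax falls out of the same loop as a single run, because the product over zero lists yields exactly one empty combination.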
Command-line flags
All flags are optional. Most take effect only in sweep mode; for a
single run, --dry-run prints the resolved file and exits without
running, while --workers and --yes are ignored:
| Flag | Description |
|---|---|
| --dry-run | Preview all combinations (with any GMC warnings) without running. |
| --workers K | Parallel workers for the in-process sweep pool, or the array concurrency cap with --emit-jobs. |
| --yes | Skip the interactive confirmation prompt. |
|  | DEBUG-level logs and the full base-parameter list. |
| --emit-jobs DIR | Generate a SLURM job-array bundle in DIR. |
| --collect-report DIR | Aggregate a finished --emit-jobs array in DIR into sweep_report.txt / .json. |
Before launching, run.py runs a GMC-parameter plausibility check
(cloud mass vs. core/ISM density, cloud radius, …) on every
combination; invalid ones are listed up front so you can abort rather
than waste compute. Press Ctrl+C — or send SIGTERM, e.g. from
SLURM scancel — to cancel cleanly: in-flight workers are stopped
and a report of completed / failed / cancelled runs is written to the
output directory.
Running on a cluster (SLURM)
On a laptop or a single multi-core node, a sweep runs across an
in-process worker pool sized by --workers. To scale across nodes on
an HPC cluster (e.g. bwForCluster Helix / bwUniCluster), generate a
SLURM job array instead — one array task per combination, so the
scheduler packs them across nodes and restarts failures independently:
python run.py param/sweep_example.param --emit-jobs jobs/
# edit jobs/submit_sweep.sbatch: --account, --partition, --time, --mem
sbatch jobs/submit_sweep.sbatch
python run.py --collect-report jobs/ # after the array finishes
Running the in-process pool on a login node is discouraged; run.py
prints a warning when SLURM is detected without an active job.
--emit-jobs DIR writes a self-contained, submittable bundle:
jobs/
├── params/<run_name>.param # one per combination, absolute path2output
├── runs.tsv # param_path <TAB> output_dir; line N = array task N
├── manifest.json # index: names, params, output dirs
├── submit_sweep.sbatch # #SBATCH --array=1-N[%K]; one sim per task
└── logs/ # %A_%a.out per task
Each array task runs python run.py <combo>.param with one CPU and
math-library threads pinned to one (OMP_NUM_THREADS=1 …,
MPLBACKEND=Agg); parallelism comes from running many tasks, not from
threading one. Passing --workers K at emit time caps concurrency as
--array=1-N%K.
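The "line N = array task N" convention of runs.tsv can be illustrated as follows. This is a sketch, not the code in submit_sweep.sbatch: `task_entry` and `current_task_id` are hypothetical names, and the sketch assumes runs.tsv has no header line and exactly two tab-separated columns. SLURM_ARRAY_TASK_ID is the standard variable SLURM exports inside each array task.

```python
import os


def task_entry(runs_tsv_text, task_id):
    """Return (param_path, output_dir) for a 1-based SLURM array task id."""
    lines = runs_tsv_text.splitlines()
    if not 1 <= task_id <= len(lines):
        raise IndexError(f"task {task_id} is outside 1..{len(lines)}")
    # Assumption: two tab-separated columns, as described in the bundle layout.
    param_path, output_dir = lines[task_id - 1].split("\t")
    return param_path, output_dir


def current_task_id():
    """Read the array index that SLURM sets for the running task."""
    return int(os.environ["SLURM_ARRAY_TASK_ID"])
```

Inside a job, `task_entry(open("jobs/runs.tsv").read(), current_task_id())` would then give the parameter file to pass to run.py.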
When the array finishes, --collect-report DIR reads each task’s
.exit_code / .duration sentinels and writes the same
sweep_report.txt / .json as a local sweep, then prints a ready-to-run
sbatch --array=<failed ids> jobs/submit_sweep.sbatch command that reruns
only the failures.
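To show how a list of failed tasks maps onto an --array argument, here is a minimal sketch. It is not the logic of --collect-report itself: `failed_ids` and `array_spec` are hypothetical helpers, the sketch assumes each .exit_code sentinel holds an integer exit status with non-zero meaning failure, and the exact form TRINITY prints (plain list or ranges) is not specified here. Comma-separated ids and `a-b` ranges are both accepted by sbatch.

```python
def failed_ids(exit_codes):
    """exit_codes maps task id -> integer exit status (assumed sentinel content)."""
    return [task for task, code in exit_codes.items() if code != 0]


def array_spec(task_ids):
    """Compress task ids into sbatch --array syntax, e.g. [3, 7, 8, 9] -> '3,7-9'."""
    parts = []
    start = prev = None
    for task in sorted(set(task_ids)):
        if start is None:
            start = prev = task
        elif task == prev + 1:
            prev = task  # extend the current consecutive run
        else:
            parts.append(str(start) if start == prev else f"{start}-{prev}")
            start = prev = task
    if start is not None:
        parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


codes = {1: 0, 2: 1, 3: 0, 4: 137, 5: 2}
print(f"sbatch --array={array_spec(failed_ids(codes))} jobs/submit_sweep.sbatch")
```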
Outputs land in the same path2output/<run_name>/ layout as a local
sweep (see Outputs below). Bundled inputs (SPS, cooling tables,
lib/default/) resolve relative to the package, so the clone location
does not matter; only path2output follows the launch directory — set
it to an absolute path on a work/scratch filesystem for cluster runs.
Outputs
File layout
A single run writes these files into path2output:
path2output/
├── dictionary.jsonl # simulation state, one JSON object per snapshot
├── metadata.json # run constants + termination + final-state blocks
└── trinity.log # log file (written when log_file = True)
A sweep writes those same files into one subdirectory per combination,
adds a fully-resolved .param sidecar to each, and writes two
top-level reports:
outputs/my_sweep/
├── 1e5_sfe001_n1e3/
│ ├── 1e5_sfe001_n1e3.param # full resolved params for this run
│ ├── dictionary.jsonl
│ ├── metadata.json
│ └── trinity.log
├── 1e5_sfe001_n1e4/
│ └── ...
├── sweep_report.txt # human-readable sweep summary
└── sweep_report.json # machine-readable sweep summary
Auto-generated run names
Each sweep combination is named automatically:
{mCloud}_sfe{sfe*100:03d}_n{nCore}[_density-profile][_PHII][_other-swept-keys]
The optional suffixes appear only when the relevant parameter is set
explicitly in the sweep file (not when left at its default.param
value):
- _PL{alpha} for dens_profile = densPL (e.g. _PL0, _PL-2), or _BE{Omega} for densBE (e.g. _BE14).
- _yesPHII / _noPHII when include_PHII is set; useful when sweeping the flag to compare runs with and without HII pressure.

Any other swept parameter without a curated slot above gets a generic
_{key}{value} suffix so distinct combinations never collapse onto the same folder. snake_case keys become camelCase and decimal points in floats become p (minus signs are kept, as in _PL-2). Examples: sweeping ZCloud = [0.5, 1.0] yields _ZCloud0p5 / _ZCloud1p0; coll_counter = [True, False] yields _collCounterTrue / _collCounterFalse. Multiple generic suffixes are emitted in sorted-key order for stability.

Folder-name safety rules applied to generic values:

- Hard-rejected with an immediate ValueError (no sweep runs): values containing /, \, .., or any control character. This means filepath-typed parameters cannot be swept; set them once in your base param file. The check protects against silently nesting directories or escaping the sweep root.
- Sanitised to -: any character outside [A-Za-z0-9.+-] (spaces, brackets, shell wildcards, unicode, =, :…). The sweep still runs but with a safe folder name.
- Length-capped at 200 characters for the full run name; the sweep aborts with a clear error if you cross it (reserve room for sibling filenames within the 255-byte filesystem cap).
For example, 1e7_sfe010_n1e4_noPHII is mCloud=1e7, sfe=0.10,
nCore=1e4 with include_PHII = False.
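The generic-suffix rules can be written down as a short sketch. This is an illustration of the documented behaviour, not TRINITY's implementation: `generic_suffix` and `generic_suffixes` are hypothetical names, floats are assumed to be rendered with `repr()`, sorting is assumed to use the original snake_case key, and the 200-character cap on the full run name is left out.

```python
import re
import unicodedata


def generic_suffix(key, value):
    """Build one _{key}{value} suffix following the rules described above."""
    # Assumption: floats are rendered with repr(); the real formatting may differ.
    text = repr(value) if isinstance(value, float) else str(value)
    has_control = any(unicodedata.category(ch) == "Cc" for ch in text)
    # Hard rejection: path separators, parent-directory markers, control chars.
    if "/" in text or "\\" in text or ".." in text or has_control:
        raise ValueError(f"{key}={value!r} cannot be used in a folder name")
    if isinstance(value, float):
        text = text.replace(".", "p")  # 0.5 -> 0p5; minus signs are kept
    # Anything outside the safe character set becomes '-'.
    text = re.sub(r"[^A-Za-z0-9.+-]", "-", text)
    head, *rest = key.split("_")
    camel = head + "".join(part[:1].upper() + part[1:] for part in rest)
    return f"_{camel}{text}"


def generic_suffixes(swept):
    """Concatenate suffixes in sorted-key order so names are stable."""
    return "".join(generic_suffix(key, swept[key]) for key in sorted(swept))


print(generic_suffix("ZCloud", 0.5))          # _ZCloud0p5
print(generic_suffix("coll_counter", True))   # _collCounterTrue
```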
The folder name is only a unique human-readable handle: every sweep run
also writes its full resolved parameter set to a per-run .param
file (plus the sweep-wide sweep_report.json), so a run with no
suffix for some key still has that key recorded — it just took the
default.param value. Master plot scripts that compare across sweeps
should read parameters from those sidecars rather than parse them out of
the folder name.
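A plot script that follows this advice might read the sidecars along these lines. The sketch assumes the per-run .param sidecar uses the same "keyword value" line syntax and # comments as an input parameter file; `read_param` and `differing_keys` are hypothetical helpers, and values are kept as strings rather than converted.

```python
def read_param(text):
    """Parse 'keyword value' lines into a dict of strings, skipping # comments."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, *rest = line.split(None, 1)
        params[key] = rest[0].strip() if rest else ""
    return params


def differing_keys(runs):
    """runs maps run name -> params dict; return the keys whose values vary."""
    all_keys = set().union(*(params.keys() for params in runs.values()))
    return sorted(
        key for key in all_keys
        if len({params.get(key) for params in runs.values()}) > 1
    )


runs = {
    "1e5_sfe001_n1e3": read_param("mCloud 1e5\nsfe 0.01\nnCore 1e3\n"),
    "1e5_sfe001_n1e4": read_param("mCloud 1e5\nsfe 0.01\nnCore 1e4\n"),
}
print(differing_keys(runs))  # ['nCore']
```

Because every key is present in every sidecar, a parameter that never appears in a folder name can still be compared across runs.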
Output data model
dictionary.jsonl
Each simulation writes its full state to dictionary.jsonl as a
stream of newline-delimited JSON objects, one per snapshot. Writes are
append-only and crash-safe (the run flushes buffered snapshots on a
clean exit, Ctrl+C, or SIGTERM), so the file remains readable
after a crash — the last line may be partial but every prior line is a
complete snapshot. Snapshots are saved before each ODE step, so all
values in one snapshot share a single t_now.
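The "last line may be partial" guarantee suggests how a tolerant reader behaves. The following is a minimal sketch for illustration only; `read_snapshots` is a hypothetical function, and the reader API remains the supported way to load a run.

```python
import json


def read_snapshots(text):
    """Parse newline-delimited JSON, tolerating a truncated final line."""
    lines = [line for line in text.splitlines() if line.strip()]
    snapshots = []
    for index, line in enumerate(lines):
        try:
            snapshots.append(json.loads(line))
        except json.JSONDecodeError:
            if index == len(lines) - 1:
                break  # partial last line left by a crash: ignore it
            raise  # damage anywhere else is unexpected, so surface it
    return snapshots


# Two complete snapshots followed by a line cut off mid-write.
crashed = '{"t_now": 0.0}\n{"t_now": 0.1}\n{"t_now": 0.2, '
print(len(read_snapshots(crashed)))  # 2
```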
Snapshot keys group into a handful of categories: administrative, cloud
setup, dynamical state, feedback (SPS), pressures, forces, and bubble /
shell profiles.
A handful of long 1-D profile arrays are downsampled before
serialisation to keep snapshots manageable. Each simplified array is
paired with its own abscissa (*_r_arr) and, where the values span
many decades, stored in \(\log_{10}\) space (log_*). The
target point budget is set by simplify_npoints (see
Parameter Specifications). To recover a profile, linearly interpolate the
stored values against their paired abscissa; for log_* arrays, raise 10 to
the interpolated value to return to physical units.
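Recovering a value from a simplified profile can be sketched in plain Python. The function name and arguments are hypothetical, the abscissa is assumed to be strictly increasing, and interpolation is done on the stored values, with the conversion out of \(\log_{10}\) space applied afterwards.

```python
from bisect import bisect_right


def recover_profile(r_arr, values, r_query, log_space=False):
    """Linearly interpolate values(r) at r_query; undo log10 storage if asked."""
    if not r_arr[0] <= r_query <= r_arr[-1]:
        raise ValueError("r_query lies outside the stored abscissa")
    # Index of the right-hand bracket point, clamped to a valid segment.
    i = max(1, min(bisect_right(r_arr, r_query), len(r_arr) - 1))
    r0, r1 = r_arr[i - 1], r_arr[i]
    weight = (r_query - r0) / (r1 - r0)
    y = values[i - 1] + weight * (values[i] - values[i - 1])
    return 10.0 ** y if log_space else y


r = [1.0, 2.0, 4.0]
log_values = [0.0, 1.0, 2.0]  # i.e. 1, 10, 100 in physical units
print(recover_profile(r, log_values, 2.0, log_space=True))  # 10.0
```

With NumPy available, `numpy.interp` performs the same linear step on whole arrays.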
The recommended way to read the file is the reader API in
Output Reader API, which hides the JSONL layout, the per-key
unit metadata, and the legacy .json format behind a small set of
classes.
metadata.json
Run-constant parameters and end-of-run summaries live in a sibling
metadata.json (current schema version 4) rather than being repeated
in every snapshot. The reader rehydrates the run constants into each
snapshot on load, so consumers never have to read this file directly —
but it is small and human-inspectable. Its top-level blocks are:
| Block | Contents |
|---|---|
| run constants | Every input parameter / set-once derived value that does not change after phase 0. |
| termination |  |
| final state | Every scalar / bool / string on the state dict at run end, in internal units (pc, Myr, pc⁻³). Long arrays are excluded; their final values are the last line of dictionary.jsonl. |
|  | Last-two-snapshot diff, a NaN/Inf inventory, and physics sanity checks, for post-mortem debugging. |
All writes go through an atomic helper, so an interrupted write can never leave a corrupt file.
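For readers unfamiliar with the pattern, an atomic write is commonly built from a temporary file plus a rename. The sketch below shows that general technique with the standard library; it is not TRINITY's helper, and `atomic_write_json` is a hypothetical name.

```python
import json
import os
import tempfile


def atomic_write_json(path, payload):
    """Write payload to path so readers see the old file or the new one, never half."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # atomic swap into place
    except BaseException:
        # Leave no stray temporary file behind if anything went wrong.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If the process dies before `os.replace`, the previous file is untouched; if it dies after, the new file is already complete.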
show_run
For a quick human-readable view of a finished run — run context, termination reason, and final state — without writing any plotting code:
python -m trinity._output.show_run path2output/
It reads metadata.json and pretty-prints a curated subset; pass
--json for the full dump or --quiet for a scriptable exit code
(useful in batch loops over a sweep tree).
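A batch loop over a sweep tree might look like the sketch below. `find_runs` and `show_run_quiet` are hypothetical wrappers around the command shown above; the sketch assumes a run directory is any directory holding a metadata.json, and it makes no claim about what each exit code means beyond returning it.

```python
import subprocess
import sys
from pathlib import Path


def find_runs(sweep_root):
    """Return every directory under sweep_root that contains a metadata.json."""
    return sorted(path.parent for path in Path(sweep_root).rglob("metadata.json"))


def show_run_quiet(run_dir):
    """Invoke show_run with --quiet on one run directory and return its exit code."""
    cmd = [sys.executable, "-m", "trinity._output.show_run", str(run_dir), "--quiet"]
    return subprocess.run(cmd, check=False).returncode
```

A script could then collect `{run.name: show_run_quiet(run) for run in find_runs("outputs/my_sweep")}` and inspect the runs whose code differs from the rest.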
Logging
The parameter reference in Parameter Specifications lists the four
logging parameters (log_level, log_console, log_file,
log_colors) and their defaults. This section covers the conceptual
ladder of log levels.
Each level includes itself and all more severe levels:
CRITICAL > ERROR > WARNING > INFO > DEBUG. Setting
log_level = INFO emits INFO, WARNING, ERROR, and
CRITICAL messages.
| Level | Typical messages | When to use |
|---|---|---|
| DEBUG | Variable values, loop iterations, intermediate calculations, function entry/exit. | Development; debugging specific issues (default). |
| INFO | Phase transitions, major events (bubble burst, cloud edge reached), initialisation and completion markers. | Normal simulation runs. |
| WARNING | Values clamped to limits, fallback behaviour, unusual but non-critical conditions. | Production runs where only potential problems matter. |
| ERROR | Calculation failures, recoverable errors. | Silent runs where only actual errors matter. |
| CRITICAL | Unrecoverable failures, fatal errors. | When only simulation-stopping errors should print. |
With log_level = INFO, console output looks like:
2026-01-08 15:30:00 | INFO | trinity.main | === TRINITY Simulation Starting ===
2026-01-08 15:30:00 | INFO | trinity.main | Model: test_simulation
2026-01-08 15:30:01 | INFO | trinity.sps.read_sps | SPS data processed
2026-01-08 15:30:03 | INFO | trinity.phase1_energy | Entering energy-driven phase
2026-01-08 15:30:45 | INFO | trinity.main | === TRINITY Simulation Complete ===
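The level ladder and the pipe-delimited line format can be reproduced with Python's standard logging module. This is a sketch under the assumption that TRINITY's loggers behave like ordinary stdlib loggers; `make_logger` and the format string are illustrative, not copied from the code base.

```python
import io
import logging

# Assumed format, chosen to match the sample console output above.
FORMAT = "%(asctime)s | %(levelname)s | %(name)s | %(message)s"


def make_logger(name, level, stream):
    """Return a logger that writes records at or above level to stream."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False  # keep the demo independent of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(FORMAT, datefmt="%Y-%m-%d %H:%M:%S"))
    logger.handlers = [handler]
    return logger


buffer = io.StringIO()
log = make_logger("trinity.main", logging.INFO, buffer)
log.debug("intermediate value")        # below INFO: dropped
log.info("Model: test_simulation")     # emitted
log.warning("value clamped to limit")  # more severe than INFO: emitted
print(buffer.getvalue(), end="")
```

Raising the threshold to `logging.WARNING` would drop the INFO line as well, which is the behaviour the table describes for production runs.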
Troubleshooting
Most parameter errors are typos against the schema; the authoritative
list of valid keywords and defaults is the ParamSpec registry
(trinity/_input/registry.py), from which trinity/_input/default.param
is generated and mirrored in Parameter Specifications. For issues and
feature requests, see https://github.com/JiaWeiTeh/trinity/issues.