.. highlight:: rest .. _sec-running: Running TRINITY =============== Running a simulation -------------------- A TRINITY run is fully specified by one plain-text parameter file, and there is a single command — from the repository root:: python run.py param/simple_cluster.param The path may be absolute or relative to the repository root. ``run.py`` scans the file and dispatches automatically: if the file contains list (``[...]``) or ``tuple(...)`` syntax it runs a parameter sweep across a parallel worker pool, otherwise it runs a single simulation. There is no separate command or flag for sweeps. On an HPC cluster you can instead generate a SLURM job array with ``--emit-jobs`` (see *Running on a cluster (SLURM)* below). Output is written to the directory named by the ``path2output`` parameter; the default sentinel ``def_dir`` resolves to ``outputs//`` under the current working directory. See *Outputs* below for the file layout. Parameter-file formats ---------------------- A parameter file lists one ``keyword value`` entry per line (see :ref:`sec-parameters` for the full keyword reference). The *value* syntax alone decides whether the file is a single run or a sweep: .. code-block:: text :caption: param/sweep_hybrid_example.param # Plain key/value — fixed across every run dens_profile densPL nISM 0.1 path2output outputs/demo # tuple(...) — only these explicit (mCloud, sfe) pairs are run tuple(mCloud, sfe) [1e5, 0.01] [1e7, 0.10] # [list] — swept Cartesian-style across each tuple pair nCore [1e3, 1e4] The file above mixes all three value forms. How many simulations a file generates depends only on which forms it uses: .. list-table:: :widths: 32 16 52 :header-rows: 1 * - Value syntax - Mode - Runs generated * - no ``[ ]``, no ``tuple()`` - single - 1 * - ``key [a, b, c]`` - Cartesian - every combination (e.g. ``mCloud`` × ``sfe`` = 3 × 2 = 6) * - ``tuple(x, y) [..] [..]`` - tuple - only the listed points, no expansion * - tuple **and** list together - hybrid - tuple points × list combinations The hybrid example therefore runs 2 tuple pairs × 2 ``nCore`` values = 4 simulations. Single-purpose worked examples ship as ``param/sweep_example.param`` (Cartesian), ``param/sweep_tuple_example.param`` (tuple), and ``param/sweep_hybrid_example.param`` (hybrid). Command-line flags ------------------ All flags are optional. Most take effect only in sweep mode; for a single run, ``--dry-run`` prints the resolved file and exits without running, while ``--workers`` and ``--yes`` are ignored: .. list-table:: :widths: 25 75 :header-rows: 1 * - Flag - Description * - ``--dry-run``, ``-n`` - Preview all combinations (with any GMC warnings) without running. * - ``--workers N``, ``-w`` - Parallel workers for the in-process sweep pool — or the array concurrency cap with ``--emit-jobs``. Default inside a SLURM job: the full allocation (``SLURM_CPUS_PER_TASK``, else ``SLURM_CPUS_ON_NODE`` / CPU affinity); on a laptop: ``max(1, CPU count // 2 - 1)``. Must be ``>= 1``; refused if it exceeds the cores available to this process. * - ``--yes``, ``-y`` - Skip the interactive confirmation prompt. * - ``--verbose``, ``-v`` - DEBUG-level logs and the full base-parameter list. * - ``--emit-jobs DIR`` - Generate a SLURM job-array bundle in ``DIR`` (one task per combination) instead of running locally; requires a sweep file. Mutually exclusive with ``--collect-report`` (see below). * - ``--collect-report DIR`` - Aggregate a finished ``--emit-jobs`` bundle into ``sweep_report.txt`` / ``.json``; needs no parameter file. Before launching, ``run.py`` runs a GMC-parameter plausibility check (cloud mass vs. core/ISM density, cloud radius, …) on every combination; invalid ones are listed up front so you can abort rather than waste compute. Press ``Ctrl+C`` — or send ``SIGTERM``, e.g. from SLURM ``scancel`` — to cancel cleanly: in-flight workers are stopped and a report of completed / failed / cancelled runs is written to the output directory. Running on a cluster (SLURM) ---------------------------- On a laptop or a single multi-core node, a sweep runs across an in-process worker pool sized by ``--workers``. To scale across nodes on an HPC cluster (e.g. bwForCluster Helix / bwUniCluster), generate a SLURM **job array** instead — one array task per combination, so the scheduler packs them across nodes and restarts failures independently:: python run.py param/sweep_example.param --emit-jobs jobs/ # edit jobs/submit_sweep.sbatch: --account, --partition, --time, --mem sbatch jobs/submit_sweep.sbatch python run.py --collect-report jobs/ # after the array finishes Running the in-process pool on a *login* node is discouraged; ``run.py`` prints a warning when SLURM is detected without an active job. ``--emit-jobs DIR`` writes a self-contained, submittable bundle: .. code-block:: text jobs/ ├── params/.param # one per combination, absolute path2output ├── runs.tsv # param_path output_dir; line N = array task N ├── manifest.json # index: names, params, output dirs ├── submit_sweep.sbatch # #SBATCH --array=1-N[%K]; one sim per task └── logs/ # %A_%a.out per task Each array task runs ``python run.py .param`` with one CPU and math-library threads pinned to one (``OMP_NUM_THREADS=1`` …, ``MPLBACKEND=Agg``); parallelism comes from running many tasks, not from threading one. Passing ``--workers K`` at emit time caps concurrency as ``--array=1-N%K``. When the array finishes, ``--collect-report DIR`` reads each task's ``.exit_code`` / ``.duration`` sentinels and writes the same ``sweep_report.txt`` / ``.json`` as a local sweep, then prints a ready ``sbatch --array= jobs/submit_sweep.sbatch`` to rerun only the failures. Outputs land in the same ``path2output//`` layout as a local sweep (see *Outputs* below). Bundled inputs (SPS, cooling tables, ``lib/default/``) resolve relative to the package, so the clone location does not matter; only ``path2output`` follows the launch directory — set it to an absolute path on a work/scratch filesystem for cluster runs. Outputs ------- File layout ^^^^^^^^^^^ A single run writes these files into ``path2output``: .. code-block:: text path2output/ ├── dictionary.jsonl # simulation state, one JSON object per snapshot ├── metadata.json # run constants + termination + final-state blocks └── trinity.log # log file (written when log_file = True) A sweep writes those same files into one subdirectory per combination, adds a fully-resolved ``.param`` sidecar to each, and writes two top-level reports: .. code-block:: text outputs/my_sweep/ ├── 1e5_sfe001_n1e3/ │ ├── 1e5_sfe001_n1e3.param # full resolved params for this run │ ├── dictionary.jsonl │ ├── metadata.json │ └── trinity.log ├── 1e5_sfe001_n1e4/ │ └── ... ├── sweep_report.txt # human-readable sweep summary └── sweep_report.json # machine-readable sweep summary Auto-generated run names ^^^^^^^^^^^^^^^^^^^^^^^^ Each sweep combination is named automatically:: {mCloud}_sfe{sfe*100:03d}_n{nCore}[_density-profile][_PHII][_other-swept-keys] The optional suffixes appear only when the relevant parameter is set explicitly in the sweep file (not when left at its ``default.param`` value): - ``_PL{alpha}`` for ``dens_profile = densPL`` (e.g. ``_PL0``, ``_PL-2``), or ``_BE{Omega}`` for ``densBE`` (e.g. ``_BE14``). - ``_yesPHII`` / ``_noPHII`` when ``include_PHII`` is set — useful when sweeping the flag to compare runs with and without HII pressure. - Any *other* swept parameter without a curated slot above gets a generic ``_{key}{value}`` suffix so distinct combinations never collapse onto the same folder. snake_case keys become camelCase and decimal points in floats become ``p`` (minus signs are kept, as in ``_PL-2``). Examples: sweeping ``ZCloud = [0.5, 1.0]`` yields ``_ZCloud0p5`` / ``_ZCloud1p0``; ``coll_counter = [True, False]`` yields ``_collCounterTrue`` / ``_collCounterFalse``. Multiple generic suffixes are emitted in sorted-key order for stability. Folder-name safety rules applied to generic values: - **Hard-rejected** with an immediate ``ValueError`` (no sweep runs): values containing ``/``, ``\``, ``..``, or any control character. This means **filepath-typed parameters cannot be swept** — set them once in your base param file. The check protects against silently nesting directories or escaping the sweep root. - **Sanitised** to ``-``: any character outside ``[A-Za-z0-9.+-]`` (spaces, brackets, shell wildcards, unicode, ``=``, ``:`` …). The sweep still runs but with a safe folder name. - **Length-capped** at 200 characters for the full run name; the sweep aborts with a clear error if you cross it (reserve room for sibling filenames within the 255-byte filesystem cap). For example, ``1e7_sfe010_n1e4_noPHII`` is ``mCloud=1e7, sfe=0.10, nCore=1e4`` with ``include_PHII = False``. The folder name is only a unique human-readable handle: every sweep run also writes its full resolved parameter set to a per-run ``.param`` file (plus the sweep-wide ``sweep_report.json``), so a run with *no* suffix for some key still has that key recorded — it just took the ``default.param`` value. Master plot scripts that compare across sweeps should read parameters from those sidecars rather than parse them out of the folder name. Output data model ----------------- dictionary.jsonl ^^^^^^^^^^^^^^^^ Each simulation writes its full state to ``dictionary.jsonl`` as a stream of newline-delimited JSON objects, one per snapshot. Writes are append-only and crash-safe (the run flushes buffered snapshots on a clean exit, ``Ctrl+C``, or ``SIGTERM``), so the file remains readable after a crash — the last line may be partial but every prior line is a complete snapshot. Snapshots are saved *before* each ODE step, so all values in one snapshot share a single ``t_now``. Snapshot keys group into a handful of categories: .. list-table:: :widths: 30 70 :header-rows: 1 * - Category - Example keys * - Administrative - ``model_name``, ``current_phase``, ``SimulationEndReason`` * - Cloud setup - ``mCloud``, ``mCluster``, ``rCloud``, ``nEdge`` * - Dynamical state - ``t_now``, ``R2``, ``v2``, ``Eb``, ``T0``, ``R1``, ``Pb`` * - Feedback (SPS) - ``Lmech_W``, ``Lmech_SN``, ``Qi``, ``Lbol``, ``pdot_total`` * - Pressures - ``P_drive``, ``P_HII``, ``P_ram`` * - Forces - ``F_grav``, ``F_ram``, ``F_ram_wind``, ``F_ram_SN``, ``F_HII``, ``F_rad`` * - Bubble / shell profiles - ``log_bubble_T_arr`` + ``bubble_T_arr_r_arr``, ``bubble_v_arr`` + ``bubble_v_arr_r_arr``, ``log_shell_n_arr`` + ``shell_r_arr`` A handful of long 1-D profile arrays are downsampled before serialisation to keep snapshots manageable. Each simplified array is paired with its own abscissa (``*_r_arr``) and, where the values span many decades, stored in :math:`\log_{10}` space (``log_*``). The target point budget is set by ``simplify_npoints`` (see :ref:`sec-parameters`). To recover a profile, linearly interpolate the paired abscissa against the (possibly log-space) values. The recommended way to read the file is the reader API in :ref:`sec-trinity-reader`, which hides the JSONL layout, the per-key unit metadata, and the legacy ``.json`` format behind a small set of classes. metadata.json ^^^^^^^^^^^^^ Run-constant parameters and end-of-run summaries live in a sibling ``metadata.json`` (current schema version 4) rather than being repeated in every snapshot. The reader rehydrates the run constants into each snapshot on load, so consumers never have to read this file directly — but it is small and human-inspectable. Its top-level blocks are: .. list-table:: :widths: 24 76 :header-rows: 1 * - Block - Contents * - run constants - Every input parameter / set-once derived value that does not change after phase 0 (``mCloud``, ``sfe``, ``dens_profile``, physical constants, …). Membership is derived from the ParamSpec registry (see :ref:`sec-parameters`), so it always matches the schema. * - ``termination`` - ``{exit_code, outcome, detail, timestamp, model_name}`` — how the run ended. * - ``final_state`` - Every scalar / bool / string on the state dict at run end, in internal units (pc, Myr, pc⁻³). Long arrays are excluded — their final values are the last line of ``dictionary.jsonl``. * - ``termination_debug`` - Last-two-snapshot diff, a NaN/Inf inventory, and physics sanity checks, for post-mortem debugging. All writes go through an atomic helper, so an interrupted write can never leave a corrupt file. show_run ^^^^^^^^ For a quick human-readable view of a finished run — run context, termination reason, and final state — without writing any plotting code:: python -m trinity._output.show_run path2output/ It reads ``metadata.json`` and pretty-prints a curated subset; pass ``--json`` for the full dump or ``--quiet`` for a scriptable exit code (useful in batch loops over a sweep tree). Logging ------- The parameter reference in :ref:`sec-parameters` lists the four logging parameters (``log_level``, ``log_console``, ``log_file``, ``log_colors``) and their defaults. This section covers the conceptual ladder of log levels. Each level includes itself and all more severe levels: ``CRITICAL > ERROR > WARNING > INFO > DEBUG``. Setting ``log_level = INFO`` emits ``INFO``, ``WARNING``, ``ERROR``, and ``CRITICAL`` messages. .. list-table:: :widths: 15 45 40 :header-rows: 1 * - Level - Typical messages - When to use * - ``DEBUG`` - Variable values, loop iterations, intermediate calculations, function entry/exit. - Development; debugging specific issues (default). * - ``INFO`` - Phase transitions, major events (bubble burst, cloud edge reached), initialisation and completion markers. - Normal simulation runs. * - ``WARNING`` - Values clamped to limits, fallback behaviour, unusual but non-critical conditions. - Production runs where only potential problems matter. * - ``ERROR`` - Calculation failures, recoverable errors. - Silent runs where only actual errors matter. * - ``CRITICAL`` - Unrecoverable failures, fatal errors. - When only simulation-stopping errors should print. With ``log_level = INFO``, console output looks like: .. code-block:: text 2026-01-08 15:30:00 | INFO | trinity.main | === TRINITY Simulation Starting === 2026-01-08 15:30:00 | INFO | trinity.main | Model: test_simulation 2026-01-08 15:30:01 | INFO | trinity.sps.read_sps | SPS data processed 2026-01-08 15:30:03 | INFO | trinity.phase1_energy | Entering energy-driven phase 2026-01-08 15:30:45 | INFO | trinity.main | === TRINITY Simulation Complete === Troubleshooting --------------- Most parameter errors are typos against the schema; the authoritative list of valid keywords and defaults is the ParamSpec registry (``trinity/_input/registry.py``), from which ``trinity/_input/default.param`` is generated and mirrored in :ref:`sec-parameters`. For issues and feature requests, see https://github.com/JiaWeiTeh/trinity/issues.