Workflow System
===============

Overview
--------

The workflow system automates the complete pipeline for multisite λ-dynamics (MSLD)
simulations with contextual bandit (CB) training:

1. **Combination Generation**: Create all valid site/substituent combinations
2. **Splitting**: Divide combinations into training/validation/test sets
3. **Training**: Train CB policy on graph structures
4. **Simulation**: Run MD simulations with optimized bias coefficients
5. **Compression**: Archive simulation outputs for storage

Running the Workflow
--------------------

Basic Usage
~~~~~~~~~~~

The workflow is driven by a YAML configuration file:

.. code-block:: bash

   python -m mllf.cli.workflow --config examples/workflow_14benz.yaml

Or using the convenience wrapper:

.. code-block:: bash

   cd examples
   python run_workflow_deepset.py workflow_14benz.yaml
   python run_workflow_deepset.py my_config.yaml  # Use custom config

Configuration Format
~~~~~~~~~~~~~~~~~~~~

A workflow config is a YAML file specifying which operations to run and their parameters.
Key sections include:

* **system**: Environment type (solvent, gas, protein)
* **create_combos**: Generate combinations from fragment files
* **split**: Divide combinations into train/val/test sets
* **pretrain**: Optional pretraining from existing simulations
* **curriculum**: Progressive training stages (see :ref:`Curriculum Learning`)
* **training**: Model architecture and hyperparameters
* **reward**: Reward function weights and thresholds
* **output**: Checkpointing and output organization
* **archive**: Automatic compression of completed runs

See the :ref:`Complete Configuration Example` for a full annotated YAML file.

Combination Generation
----------------------

Principles
~~~~~~~~~~

Combinations are generated from site/substituent fragment files:

* **Input files**: ``site{N}_sub{M}_{label}.{rtf,pdb}`` files in the input directory
* **Sites**: Identified by the site number (N)
* **Substituents**: Identified by the sub number (M) within each site

.. warning::
   **Minimum Substituents Required**: Each site must have at least 2 substituents. MSLD simulations 
   will not run correctly with only a single substituent at a site. If any site has only 1 substituent, 
   combination generation will fail with an error. To resolve this, either add more substituents to the site or 
   add the site information to your core structure files (e.g., ``core.pdb`` and ``core.rtf`` if using 
   msld-py-prep).
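
As a rough illustration of the naming convention and the minimum-substituent rule,
the sketch below groups fragment file names by site and rejects any site with fewer
than two substituents. The helper names ``group_fragments`` and
``check_min_substituents`` and the regular expression are assumptions made for this
example; they are not part of the ``mllf`` API.

.. code-block:: python

   import re
   from collections import defaultdict

   # Assumed pattern for site{N}_sub{M}_{label}.{rtf,pdb}
   FRAGMENT_RE = re.compile(r"^site(\d+)_sub(\d+)_(\w+)\.(rtf|pdb)$")


   def group_fragments(filenames):
       """Return {site number: sorted substituent numbers} for matching files."""
       sites = defaultdict(set)
       for name in filenames:
           match = FRAGMENT_RE.match(name)
           if match:  # core.rtf and other support files are skipped
               sites[int(match.group(1))].add(int(match.group(2)))
       return {site: sorted(subs) for site, subs in sorted(sites.items())}


   def check_min_substituents(sites, minimum=2):
       """Raise ValueError if any site has fewer than `minimum` substituents."""
       short = [site for site, subs in sites.items() if len(subs) < minimum]
       if short:
           raise ValueError(f"Sites with fewer than {minimum} substituents: {short}")


   files = ["site1_sub1_pres.rtf", "site1_sub2_pres.rtf", "core.rtf"]
   sites = group_fragments(files)
   check_min_substituents(sites)  # passes: site 1 has two substituents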

The generator creates two types of combinations:

1. **Within-site combinations**: Multiple substituents from a single site
2. **Cross-site combinations**: Substituents from multiple sites simultaneously

.. note::
   **Combination Size Limit**: By default, each combination is limited to at most 10 substituents 
   per site (``max_subs_per_site=10``). This prevents combinatorial explosion while still allowing 
   all substituents to participate across different combinations. For example, with 50 substituents 
   at a site, the generator will create combinations like ``[1,2,...,10]``, ``[1,2,...,9,11]``, etc., 
   but not ``[1,2,...,11]``. This limit can be increased via the ``--max-subs`` command-line option 
   or the ``max_subs_per_site`` parameter in the API.
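
The effect of the size cap can be sketched with ``itertools``. This is a simplified
illustration, not the generator's actual code: the function names are hypothetical,
the minimum subset size of 2 is assumed from the warning above, and the cross-site
scheme shown (one subset from every site) is only one possible choice.

.. code-block:: python

   from itertools import combinations, product


   def within_site_subsets(subs, max_subs_per_site=10):
       """All subsets of one site's substituents with 2..max_subs_per_site members."""
       upper = min(len(subs), max_subs_per_site)
       return [c for size in range(2, upper + 1) for c in combinations(subs, size)]


   def cross_site_combos(sites, max_subs_per_site=10):
       """Pair one capped subset from every site (one simple cross-site scheme)."""
       per_site = [
           [(site, subset) for subset in within_site_subsets(subs, max_subs_per_site)]
           for site, subs in sorted(sites.items())
       ]
       return list(product(*per_site))


   sites = {1: [1, 2, 3], 2: [1, 2]}
   print(within_site_subsets(sites[1]))      # pairs plus the single triplet
   print(within_site_subsets(sites[1], 2))   # a cap of 2 keeps pairs only
   print(len(cross_site_combos(sites)))

Exhaustive enumeration like this grows quickly with the number of substituents, which
is the reason for the cap.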

Lazy Directory Creation
^^^^^^^^^^^^^^^^^^^^^^^

For systems with large combination spaces (e.g., 14,211 total combinations),
creating all directories upfront is inefficient—most will never be used in
training. The workflow implements **lazy (on-demand) directory creation**:

**Metadata Generation**: During the combination generation phase, the system:

1. Lists all possible combinations without creating directories
2. Saves metadata to ``combo_metadata.json`` with:
   
   - Combination name (e.g., ``comb_0001_site2_1__site2_2``)
   - Path where directory will be created
   - Sites and substituents included
   - Counter for ordering

3. Writes manifest files listing all possible combinations

**On-Demand Creation**: Directories are created only when needed:

* During training/validation splits, combinations are selected but their directories are not created
* When a combination is accessed for training, the workflow:
  
  1. Checks if the directory exists
  2. If not, loads metadata from ``combo_metadata.json``
  3. Creates the directory with all required files
  4. Continues with training
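
The on-demand steps above could look roughly like the following. The function name
``ensure_combo_dir`` is hypothetical, and the sketch assumes that
``combo_metadata.json`` is a JSON object keyed by combination name; the real schema
may differ.

.. code-block:: python

   import json
   from pathlib import Path


   def ensure_combo_dir(out_dir, combo_name):
       """Create the combination directory on first access and return its path."""
       out_dir = Path(out_dir)
       combo_dir = out_dir / combo_name
       if combo_dir.is_dir():
           return combo_dir  # already created by an earlier access

       # Assumed layout: a JSON object keyed by combination name
       metadata = json.loads((out_dir / "combo_metadata.json").read_text())
       if combo_name not in metadata:
           raise KeyError(f"No metadata entry for {combo_name}")

       # The real workflow would also copy and renumber the fragment files here
       (combo_dir / "prep").mkdir(parents=True)
       return combo_dir

Calling the function a second time for the same combination returns the existing
directory without touching the metadata file.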

**Benefits**:

* **Disk space efficiency**: Only about 2% of possible combinations are created (e.g., 
  142 training + 142 validation out of 14,211 total)
* **Faster initialization**: Split generation completes in seconds instead of hours
* **Filesystem efficiency**: Avoid creating thousands of unused directories
* **Scalability**: Handle massive combination spaces (100K+ combinations)

Directory Structure
~~~~~~~~~~~~~~~~~~~

Each combination directory (created on-demand) has a standardized structure:

.. code-block:: text

   generated_combos/
   ├── combo_metadata.json                  # Metadata for all combinations
   ├── manifest.txt                         # List of all combination names
   ├── train_manifest.txt                   # Training combination names
   ├── val_manifest.txt                     # Validation combination names
   ├── test_manifest.txt                    # Test combination names
   ├── comb_0001_site2_1__site2_2/          # Created on-demand
   │   ├── info.py                          # System configuration
   │   ├── mapping.json                     # File renumbering mapping
   │   ├── msld_flat.py                     # Simulation script (copied)
   │   └── prep/
   │       ├── site2_sub1_pres.rtf
   │       ├── site2_sub1_frag.pdb
   │       ├── site2_sub2_pres.rtf
   │       ├── site2_sub2_frag.pdb
   │       ├── core.rtf
   │       ├── core.prm
   │       └── other_support_files...
   ├── comb_0262_site1_1__site1_2__site2_1__site2_2/  # Cross-site
   │   ├── info.py
   │   ├── mapping.json
   │   ├── msld_flat.py
   │   └── prep/
   │       ├── site1_sub1_pres.rtf          # Preserves site numbering
   │       ├── site1_sub2_pres.rtf
   │       ├── site2_sub1_pres.rtf          # Site 2 keeps site2_ prefix
   │       ├── site2_sub2_pres.rtf
   │       └── ...
   └── ...

**File Naming Convention**: The renaming preserves site identity:

* Files maintain their site number (``site1_*``, ``site2_*``, etc.)
* Substituents are renumbered sequentially within each site
* Original site/sub mapping is preserved in ``mapping.json``

Combination Metadata Files
~~~~~~~~~~~~~~~~~~~~~~~~~~

Each combination directory contains standardized metadata files:

**info.py**: System configuration loaded by simulation scripts

.. code-block:: python

   import numpy as np
   import os
   
   info = {}
   info['name'] = 'comb_0262_site1_1__site1_2__site2_1__site2_2'
   info['nsubs'] = [2, 2]              # Substituents per site [site1, site2]
   info['nblocks'] = np.sum(info['nsubs'])  # Total substituents (4)
   info['ncentral'] = 0                # Central replica for replica exchange
   info['nreps'] = 1                   # Number of replicas
   info['nnodes'] = 1                  # MPI nodes
   info['enginepath'] = os.environ.get('CHARMMEXEC', '')
   info['temp'] = 298.15               # Temperature in Kelvin

**mapping.json**: File renumbering information

.. code-block:: json

   [
     {
       "original": "/path/to/site1_sub2_pres.rtf",
       "new_name": "site1_sub1_pres.rtf",
       "original_site": 1,
       "original_sub": 2,
       "new_site": 1,
       "new_sub": 1
     },
     {
       "original": "/path/to/site2_sub5_pres.rtf",
       "new_name": "site2_sub1_pres.rtf",
       "original_site": 2,
       "original_sub": 5,
       "new_site": 2,
       "new_sub": 1
     }
   ]

This tracks how original fragment files were renumbered during combination
creation, enabling traceability back to source files.
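
A minimal sketch of how such a mapping could be produced is shown below. The function
``build_mapping`` is hypothetical, it records bare file names rather than full source
paths, and it assumes substituents are renumbered in sorted order within each site.

.. code-block:: python

   def build_mapping(selected, suffix="pres.rtf"):
       """Renumber (site, sub) pairs sequentially within each site."""
       counters = {}
       mapping = []
       for site, sub in sorted(selected):
           counters[site] = counters.get(site, 0) + 1
           mapping.append({
               "original": f"site{site}_sub{sub}_{suffix}",
               "new_name": f"site{site}_sub{counters[site]}_{suffix}",
               "original_site": site,
               "original_sub": sub,
               "new_site": site,              # site numbering is preserved
               "new_sub": counters[site],     # sub numbering restarts at 1
           })
       return mapping


   for entry in build_mapping([(1, 2), (2, 5)]):
       print(entry["original"], "->", entry["new_name"])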

Manifest Files
~~~~~~~~~~~~~~

Manifest files list combination names (one per line):

.. code-block:: text

   comb_0001_site2_1__site2_2
   comb_0002_site2_1__site2_3
   comb_0003_site2_1__site2_4
   comb_0075_site1_5__site1_1__site1_2
   comb_0262_site1_1__site1_2__site2_1__site2_2
   ...

Manifest files enable reproducible splits and batch operations. The full paths
are constructed by prepending the ``out_dir`` from the configuration:
``{out_dir}/{combo_name}``.
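
For example, a manifest could be turned into full paths like this. ``read_manifest``
is a hypothetical helper written for this illustration.

.. code-block:: python

   from pathlib import Path


   def read_manifest(manifest_path, out_dir):
       """Return one full path per non-empty manifest line."""
       lines = Path(manifest_path).read_text().splitlines()
       return [Path(out_dir) / line.strip() for line in lines if line.strip()]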

Graph Construction
------------------

During training, molecular graphs are constructed from combination directories to provide 
input for the policy network. Graphs are built from RTF topology files with DeepSet 
embeddings as node features, representing each substituent's 3D structure and chemistry 
as learned 64-dimensional vectors.

For complete details on graph construction, node features, edge expansion, and the 
RGCN/policy architecture, see :doc:`cb_setup`.

Training Pipeline
-----------------

System Configuration
~~~~~~~~~~~~~~~~~~~~

The ``system`` section specifies environment-level parameters that affect how molecular
structures are processed during training:

.. code-block:: yaml

   system:
     solvent_state: solv  # Environment type

**Solvent State**:

Specifies the simulation environment, which determines the atoms included as context 
during atomic environment vector (AEV) computation for DeepSet embeddings:

* ``solv`` or ``solvent``: Includes core structure and nearby substituents from other sites (within 5.1 Å)
* ``gas`` or ``vacuum``: Includes core structure and nearby substituents (without solvent effects)
* ``protein``: Includes core structure, nearby substituents, AND nearby protein atoms (within 5.1 Å)

The environment type determines what molecular context the DeepSet encoder "sees" when 
computing atomic environment vectors. For protein systems, including nearby protein atoms 
in the AEV computation naturally encodes protein-specific interactions into the learned 
embeddings. See :doc:`deepset_pretraining` for technical details on context-aware AEV computation.

The solvent state is also preserved in ``graph_info.json`` for metadata tracking.

**Auto-Detection** (legacy):

Previously, the system attempted to auto-detect solvent state from directory names
(e.g., ``14benz_solv`` → ``solv``). This is now deprecated in favor of explicit 
configuration for clarity and reliability.

Reward Function
~~~~~~~~~~~~~~~

**Pretraining**

Before training begins, the policy can be pretrained using behavior cloning 
(supervised learning with MSE loss) to imitate successful bias coefficients from 
completed simulations. For complete details on pretraining loss, data organization, 
and transfer learning strategies, see :doc:`cb_pretraining`.

**Training Reward**

During training, the policy is optimized using REINFORCE with rewards computed from simulation trajectories.
The reward function prevents degenerate solutions (e.g., convergence to single-substituent states)
through multiple components:

.. math::

   R_{\text{total}} = coverage\_factor \times (R_P + R_T + R_{\text{entropy}}) + R_{\text{penalties}}

where :math:`coverage\_factor = \left(\frac{N_{\text{visited}}}{N_{\text{subs}}}\right)^2` is a smooth
quadratic multiplier that scales all positive reward components by coverage. At 100% coverage it is 1.0;
at 50% it is 0.25; at 0% it is 0.0. This replaces the earlier hard completeness gate
(which clipped all positive reward to −0.01 when any substituent was unvisited) with a
smooth gradient signal that rewards partial progress.

This eliminates :math:`R_U` (the explicit uniformity term) and the adaptive
coverage penalty :math:`P_{\text{cov}}` — both are now subsumed by
:math:`coverage\_factor`.
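
The composition of the total reward can be written out directly from the formula
above. The function names are illustrative only, and the component values passed in
are placeholders.

.. code-block:: python

   def coverage_factor(n_visited, n_subs):
       """Quadratic coverage multiplier (N_visited / N_subs) ** 2."""
       if n_subs <= 0:
           raise ValueError("n_subs must be positive")
       return (n_visited / n_subs) ** 2


   def total_reward(r_p, r_t, r_entropy, r_penalties, n_visited, n_subs):
       """Scale the positive components by coverage, then add the penalties."""
       positive = r_p + r_t + r_entropy
       return coverage_factor(n_visited, n_subs) * positive + r_penalties


   print(coverage_factor(4, 4), coverage_factor(2, 4), coverage_factor(0, 4))

Because penalties are added after the scaling, a run with poor coverage keeps its
full penalty while its positive reward shrinks.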

**Population Balance Reward** :math:`R_P`:

Rewards the population accumulated on visited substituents, scaled by how reliable the run's transition statistics are:

.. math::

   R_P = w_P \cdot \frac{\sum_{k \in \text{visited}} p_k}{P_{\text{baseline}}} \cdot C_F

where:

* :math:`w_P` is the population weight (default: 0.5)
* :math:`p_k` is the population count for visited substituent :math:`k`
* :math:`P_{\text{baseline}}` is the normalization constant (default: 500.0)
* :math:`C_F = \min(1.0, T_{\min} / (2 \times N_{\text{req}}))` is the confidence factor
* :math:`T_{\min}` is the minimum transitions across all sites
* :math:`N_{\text{req}}` is the minimum required transitions per site (default: 10)

The confidence factor scales population rewards based on data reliability, reducing
false rewards from low-transition runs with unreliable population distributions.
Within-visited uniformity is now captured entirely by :math:`R_{\text{entropy}}`
(see below) rather than by the balance factor :math:`e^{-CV}` which has been removed.
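
As a worked example of the formula, the following sketch evaluates :math:`R_P` with
the default weights. ``population_reward`` is an illustrative name, not the project's
function.

.. code-block:: python

   def population_reward(visited_populations, t_min, w_p=0.5,
                         p_baseline=500.0, n_req=10):
       """R_P = w_P * sum(p_k) / P_baseline * C_F."""
       confidence = min(1.0, t_min / (2 * n_req))  # C_F saturates at 2 * N_req
       return w_p * sum(visited_populations) / p_baseline * confidence


   # Same populations, but fewer transitions halve the confidence factor
   print(population_reward([250, 250], t_min=20))
   print(population_reward([250, 250], t_min=10))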

**Transition Reward** :math:`R_T`:

Rewards frequent transitions between substituents, with a bonus for high transition counts:

.. math::

   R_T = \begin{cases}
   1.5 \cdot w_T \cdot \frac{\sum_{s=1}^{N_{\text{sites}}} T_s}{T_{\text{baseline}}} & \text{if } T_{\min} \geq N_{\text{req}} \text{ and } \bar{T} > 2 N_{\text{req}} \\
   w_T \cdot \frac{\sum_{s=1}^{N_{\text{sites}}} T_s}{T_{\text{baseline}}} & \text{if } T_{\min} \geq N_{\text{req}} \text{ and } \bar{T} \leq 2 N_{\text{req}} \\
   0 & \text{otherwise (at least one site below threshold)}
   \end{cases}

where:

* :math:`w_T` is the transition weight (default: 0.75)
* :math:`T_s` is the transition count for site :math:`s`
* :math:`T_{\min}` is the minimum transition count across all sites
* :math:`\bar{T}` is the average transition count per site
* :math:`N_{\text{req}}` is ``min_transitions_per_site`` (default: 10)
* :math:`T_{\text{baseline}}` is the normalization constant (default: 50.0)
* The 1.5× bonus applies when average transitions per site exceeds 20 (2× the default minimum)
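
A sketch of the transition reward follows. It assumes that the 1.5× bonus applies
only when no site is below the threshold, which is one reading of the cases above;
``transition_reward`` is an illustrative name.

.. code-block:: python

   def transition_reward(transitions_per_site, w_t=0.75,
                         t_baseline=50.0, n_req=10):
       """R_T with a 1.5x bonus when the per-site average exceeds 2 * N_req."""
       if min(transitions_per_site) < n_req:
           return 0.0  # at least one site is below the threshold
       base = w_t * sum(transitions_per_site) / t_baseline
       average = sum(transitions_per_site) / len(transitions_per_site)
       return 1.5 * base if average > 2 * n_req else base


   print(transition_reward([10, 10]))   # threshold met, no bonus
   print(transition_reward([10, 40]))   # average of 25 earns the bonus
   print(transition_reward([5, 100]))   # one site below threshold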

**Entropy Bonus** :math:`R_{\text{entropy}}`:

Rewards uniform population distributions using normalized Shannon entropy:

.. math::

   R_{\text{entropy}} = \alpha_{\text{entropy}} \cdot \frac{H(\mathbf{p})}{H_{\max}}

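where :math:`\alpha_{\text{entropy}}` is the entropy bonus coefficient (``entropy_bonus``, default: 8.0),
:math:`H(\mathbf{p}) = -\sum_k \frac{p_k}{P_{\text{total}}} \log \frac{p_k}{P_{\text{total}}}` 
is the Shannon entropy of the population fractions with :math:`P_{\text{total}} = \sum_k p_k`,
and :math:`H_{\max} = \log(N_{\text{subs}})` is the maximum possible entropy.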
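
The normalized entropy can be computed as in the sketch below. The default of 8.0 is
taken from the ``entropy_bonus`` hyperparameter on the assumption that it corresponds
to :math:`\alpha_{\text{entropy}}`; unvisited substituents contribute zero to the sum
by the usual :math:`0 \log 0 = 0` convention.

.. code-block:: python

   import math


   def entropy_reward(populations, alpha_entropy=8.0):
       """alpha * H(p) / log(N_subs), with H the Shannon entropy of p / sum(p)."""
       total = sum(populations)
       n_subs = len(populations)
       if total <= 0 or n_subs < 2:
           return 0.0  # entropy is undefined or trivially zero
       entropy = -sum((p / total) * math.log(p / total)
                      for p in populations if p > 0)
       return alpha_entropy * entropy / math.log(n_subs)


   print(entropy_reward([100, 100, 100, 100]))  # uniform: the full bonus
   print(entropy_reward([400, 0, 0, 0]))        # single state: no bonus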

**Tiered Transition Penalties** :math:`R_{\text{penalties}}`:

The penalty system uses three tiers based on the worst-performing site, with 
multi-site awareness to fairly handle systems with multiple λ-sites:

**Base Penalty** (determined by :math:`T_{\min}`, the minimum transitions across all sites):

.. math::

   P_{\text{base}} = \begin{cases}
   40.0 & \text{if } T_{\min} = 0 \quad \text{(Tier 1: "Death Floor")} \\
   32.0 & \text{if } T_{\min} = 1 \\
   24.0 & \text{if } T_{\min} = 2 \\
   2.0 + 2.0(N_{\text{req}} - T_{\min}) & \text{if } 3 \leq T_{\min} < N_{\text{req}} \quad \text{(Tier 2: "Climbing Ramp")} \\
   0.0 & \text{if } T_{\min} \geq N_{\text{req}} \quad \text{(Tier 3: "Success Zone")}
   \end{cases}

**Multi-Site Degradation** (incremental penalty for multiple failing sites):

.. math::

   P_{\text{trans}} = \begin{cases}
   P_{\text{base}} + 4.0(n_{\text{bad}} - 1) & \text{if } n_{\text{bad}} > 1 \\
   P_{\text{base}} & \text{if } n_{\text{bad}} = 1 \\
   0 & \text{if } n_{\text{bad}} = 0
   \end{cases}

where :math:`n_{\text{bad}} = |\{s : T_s < N_{\text{req}}\}|` counts sites below threshold.


**Concentration Penalty** (per-site check for single-substituent dominance):

.. math::

   P_{\text{conc}} = \sum_{s=1}^{N_{\text{sites}}} \mathbb{1}\left[\frac{\max_k p_{s,k}}{\sum_k p_{s,k}} > 0.8\right] \cdot \gamma \cdot 5.0 \cdot \left(\frac{\max_k p_{s,k}}{\sum_k p_{s,k}} - 0.8\right)

Total penalties are summed and clamped: :math:`R_{\text{penalties}} = -\min(60.0, P_{\text{trans}} + P_{\text{conc}})`
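
The penalty terms can be combined as in the following sketch, which transcribes the
formulas above with the default hyperparameters. All function names are illustrative.

.. code-block:: python

   def base_penalty(t_min, n_req=10):
       """Tiered penalty determined by the worst site's transition count."""
       if t_min >= n_req:
           return 0.0                               # Tier 3: success zone
       if t_min <= 2:
           return {0: 40.0, 1: 32.0, 2: 24.0}[t_min]  # Tier 1 and the steps above it
       return 2.0 + 2.0 * (n_req - t_min)           # Tier 2: climbing ramp


   def transition_penalty(transitions_per_site, n_req=10):
       """Base penalty plus 4.0 for every failing site beyond the first."""
       n_bad = sum(1 for t in transitions_per_site if t < n_req)
       if n_bad == 0:
           return 0.0
       return base_penalty(min(transitions_per_site), n_req) + 4.0 * (n_bad - 1)


   def concentration_penalty(populations_per_site, gamma=4.0, threshold=0.8):
       """Penalize sites where one substituent holds more than `threshold`."""
       penalty = 0.0
       for populations in populations_per_site:
           total = sum(populations)
           if total <= 0:
               continue  # no population data for this site
           fraction = max(populations) / total
           if fraction > threshold:
               penalty += gamma * 5.0 * (fraction - threshold)
       return penalty


   def penalty_reward(transitions_per_site, populations_per_site):
       """R_penalties: the negated sum of penalties, clamped at 60."""
       total = (transition_penalty(transitions_per_site)
                + concentration_penalty(populations_per_site))
       return -min(60.0, total)


   print(penalty_reward([0, 5], [[90, 10], [50, 50]]))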

**Default Hyperparameters**:

.. code-block:: yaml

   reward:
     w_P: 0.5                                # Population weight
     w_T: 0.75                               # Transition weight
     w_U: 0.3                                # Accepted for API compatibility; coverage handled by coverage_factor
     gamma: 4.0                              # Base penalty coefficient
     P_baseline: 500.0                       # Population normalization
     T_baseline: 50.0                        # Transition normalization
     min_transitions_per_site: 10            # Tier 3 threshold
     min_coverage_ratio: 0.5                 # Accepted for API compatibility; coverage handled by coverage_factor
     entropy_bonus: 8.0                      # Entropy bonus coefficient
     concentration_penalty_threshold: 0.8    # Single-substituent dominance threshold

**Policy Gradient Training**:

The policy is optimized using an **Actor-Critic** architecture where the policy network 
(actor) predicts bias coefficients and a value network (critic) provides state-dependent 
baselines for variance reduction. This approach prevents catastrophic forgetting of 
pretrained weights and enables more stable learning.

For architectural details on the RGCN encoder, policy network, and value network, see :doc:`cb_setup`.

Simulation Execution
--------------------

Launching Simulations
~~~~~~~~~~~~~~~~~~~~~

Simulations are launched via subprocess, running CHARMM with bias coefficients 
written to ``variables.py`` from the policy's sampled actions. The simulator 
outputs transition counts and population distributions for reward computation.

Output Parsing
~~~~~~~~~~~~~~

After simulation completes, the framework parses ``output.txt`` from the output 
directory to extract:

* Total transitions per site :math:`T_s` for each λ-site
* Per-substituent populations :math:`p_{s,k}` at each site
* Coverage ratio (fraction of substituents visited)
* Per-site concentration (maximum population fraction at each site)

These metrics feed directly into the reward function components described 
in the Reward Function section above.
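
Once per-site populations have been parsed, the coverage ratio and per-site
concentration follow directly. The sketch below assumes a substituent counts as
visited when its population is greater than zero, and it does not attempt to parse
``output.txt`` itself, whose layout is not described here.

.. code-block:: python

   def summarize_populations(populations_per_site):
       """Return (coverage ratio, list of per-site maximum population fractions)."""
       n_subs = sum(len(pops) for pops in populations_per_site)
       n_visited = sum(1 for pops in populations_per_site for p in pops if p > 0)
       coverage = n_visited / n_subs if n_subs else 0.0
       concentration = [
           max(pops) / sum(pops) if sum(pops) > 0 else 0.0
           for pops in populations_per_site
       ]
       return coverage, concentration


   coverage, concentration = summarize_populations([[90, 10], [50, 0]])
   print(coverage, concentration)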

.. _Curriculum Learning:

Curriculum Learning
-------------------


**Curriculum learning** progressively trains the policy on increasingly complex
combinations, similar to how students learn from simple to complex problems.
Instead of training on all possible combinations at once, the policy masters
simpler tasks before advancing to harder ones.

Why Curriculum Learning for MSLD
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

MSLD bias coefficient optimization has a natural difficulty hierarchy:

**Easy**: Single-site pairs (2 substituents, 1 site)

* Simplest edge interactions to learn
* Clear cause-and-effect relationships
* Provides foundation for pairwise biases

**Medium**: Single-site triplets (3 substituents, 1 site)

* Introduces crowding/density effects
* More complex interaction patterns
* Tests generalization from pairs

**Hard**: Multi-site combinations (2+ sites with multiple substituents each)

* Cross-site interaction effects
* Exponentially larger search space
* Requires composition of learned patterns

Training directly on hard combinations often fails because:

* Reward signals are noisy and unclear
* Policy has no foundation to build upon
* Pretrained weights get overwhelmed by complex gradients

Curriculum learning solves this by building skills incrementally.

Configuration
~~~~~~~~~~~~~

Enable curriculum learning in your workflow YAML:

.. code-block:: yaml

   curriculum:
     enabled: true
     max_train_combos_per_stage: 100  # Optional: limit combinations per stage
     
     stages:
       # Stage 1: Pairs at single sites
       - name: pairs_single_site_easy
         min_subs: 2
         max_subs: 2
         min_sites: 1
         max_sites: 1
         epochs: 50
         
       # Stage 2: Triplets at single sites
       - name: triplets_single_site
         min_subs: 3
         max_subs: 3
         min_sites: 1
         max_sites: 1
         epochs: 50
         
       # Stage 3: Cross-site combinations
       - name: pairs_two_sites
         min_subs: 4  # 2 per site
         max_subs: 4
         min_sites: 2
         max_sites: 2
         epochs: 50
     
     # Progression criteria
     progression:
       type: epoch  # Advance after completing stage epochs

Stage Configuration
~~~~~~~~~~~~~~~~~~~

Each stage specifies:

**Combination Filters**:

* ``min_subs``, ``max_subs``: Total substituents in combination
* ``min_sites``, ``max_sites``: Number of sites represented

**Training Duration**:

* ``epochs``: Number of training epochs for this stage

**Optional Settings**:

* ``max_train_combos``: Stage-specific limit on training combinations (overrides global setting)
* ``reward_override``: Modify reward weights for this stage (e.g., emphasize transitions early)


Combination Selection
~~~~~~~~~~~~~~~~~~~~~

**Filtering Process**:

For each stage, the workflow:

1. Filters all training combinations by stage criteria (min/max subs/sites)
2. If filtered count exceeds ``max_train_combos_per_stage``, randomly selects subset
3. Uses reproducible random selection (seeded by ``split.seed + stage_index``)

**Important**: Random selection is uniform across all matching combinations.
If a stage allows both pairs (2 subs) and triplets (3 subs) via ``min_subs: 2,
max_subs: 3``, the 100 selected combinations will be a random mix with no
preference for either size.

**Reproducibility**: Same seed produces same combination selection across runs.
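
The filtering and seeded selection can be sketched as follows. The helper names are
hypothetical, and the sketch assumes that the number of substituents and sites can be
read from the combination name (for example ``comb_0001_site2_1__site2_2``).

.. code-block:: python

   import random
   import re


   def combo_shape(name):
       """Return (number of substituents, number of sites) encoded in a name."""
       pairs = re.findall(r"site(\d+)_(\d+)", name)
       return len(pairs), len({site for site, _ in pairs})


   def select_stage_combos(names, stage, seed, stage_index, limit=None):
       """Filter by stage criteria, then draw a reproducible random subset."""
       matching = []
       for name in names:
           n_subs, n_sites = combo_shape(name)
           subs_ok = stage["min_subs"] <= n_subs <= stage["max_subs"]
           sites_ok = (stage.get("min_sites", 1) <= n_sites
                       <= stage.get("max_sites", float("inf")))
           if subs_ok and sites_ok:
               matching.append(name)
       if limit is not None and len(matching) > limit:
           rng = random.Random(seed + stage_index)  # per-stage seed
           matching = rng.sample(matching, limit)
       return matching

Using a dedicated ``random.Random`` instance keeps the selection independent of any
other random number use in the process.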

Progression Criteria
~~~~~~~~~~~~~~~~~~~~

Stages advance based on progression criteria:

**Epoch-based** (default):

.. code-block:: yaml

   progression:
     type: epoch

Advances after completing the specified number of epochs for the current stage.

**Reward-based** (experimental):

.. code-block:: yaml

   progression:
     type: reward
     reward_threshold: 10.0  # Minimum average reward to advance

Advances only if the average reward over the last 5 epochs exceeds the threshold.

**Combined**:

.. code-block:: yaml

   progression:
     type: both
     reward_threshold: 10.0

Must complete all epochs AND meet reward threshold.
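
The three progression modes can be summarized in a small decision function. This is a
sketch under the descriptions above; ``should_advance`` and its arguments are
hypothetical names, and the 5-epoch window is taken from the reward-based mode.

.. code-block:: python

   def should_advance(progression, epochs_done, stage_epochs, rewards, window=5):
       """Decide whether the curriculum moves on to the next stage."""
       kind = progression.get("type", "epoch")
       epochs_ok = epochs_done >= stage_epochs
       recent = rewards[-window:]
       threshold = progression.get("reward_threshold", float("inf"))
       reward_ok = bool(recent) and sum(recent) / len(recent) > threshold
       if kind == "epoch":
           return epochs_ok
       if kind == "reward":
           return reward_ok
       if kind == "both":
           return epochs_ok and reward_ok
       raise ValueError(f"Unknown progression type: {kind}")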

Training Flow Example
~~~~~~~~~~~~~~~~~~~~~

.. code-block:: text

   === Training with Curriculum ===
   
   Stage 1: pairs_single_site_easy (epochs 1-50)
   ├── Filtered: 41 combinations (2 subs, 1 site)
   ├── Training on all 41 combinations
   └── Epoch 50 completes → Advance to Stage 2
   
   Stage 2: triplets_single_site (epochs 51-100)
   ├── Filtered: 186 combinations (3 subs, 1 site)
   ├── Limited to 100 random combinations
   └── Epoch 100 completes → Advance to Stage 3
   
   Stage 3: pairs_two_sites (epochs 101-150)
   ├── Filtered: 1,681 combinations (4 subs, 2 sites)
   ├── Limited to 100 random combinations
   └── Epoch 150 completes → Training complete

**Training Output**:

.. code-block:: text

   === Starting Stage 1/3: pairs_single_site_easy ===
   Filtered to 41 training combinations for this stage
   
   --- Epoch 1/150 - Stage 1/3: pairs_single_site_easy (epoch 1/50) ---
   Epoch 1 Stats:
     Loss: 12.3456
     Value Loss: 45.6789
     Avg Reward: -28.5432
   
   [... epochs 2-50 ...]
   
   ============================================================
   === Advancing to Stage 2/3: triplets_single_site ===
   ============================================================
   Filtered to 186 training combinations for this stage
   Limiting to 100 random training combos (from 186 available)
   
   --- Epoch 51/150 - Stage 2/3: triplets_single_site (epoch 1/50) ---

Stage-Specific Reward Tuning
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Advanced users can override reward parameters per stage:

.. code-block:: yaml

   stages:
     - name: pairs_single_site_easy
       min_subs: 2
       max_subs: 2
       min_sites: 1
       max_sites: 1
       epochs: 50
       reward_override:
         w_T: 0.9              # Emphasize transitions early
         min_transitions_per_site: 5  # Lower threshold for easier combinations

This allows fine-tuning the reward function to match stage difficulty.


Checkpointing and Resume
-------------------------

Long-running training jobs (e.g., 50 epochs) can be interrupted by SLURM time
limits, system maintenance, or manual cancellation. The workflow implements
two-level checkpointing to enable automatic resume without losing progress.

Configuration
~~~~~~~~~~~~~

Enable checkpointing in your workflow YAML:

.. code-block:: yaml

   output:
     base_dir: /path/to/training_output
     save_checkpoints: true    # Enable checkpoint saving
     checkpoint_freq: 5         # Save every N epochs

Training-Level Checkpoints
~~~~~~~~~~~~~~~~~~~~~~~~~~

**Location**: ``{base_dir}/checkpoint_epoch_XXX.pt``

Saved every ``checkpoint_freq`` epochs, containing:

* ``epoch``: Completed epoch number
* ``encoder_state``: Full RGCN encoder state dict
* ``policy_state``: Full edge policy state dict  
* ``optimizer_state``: Optimizer state (momentum, learning rates, etc.)
* ``stats``: Training statistics (loss, average reward)

Automatic Resume
~~~~~~~~~~~~~~~~

When training restarts, the workflow:

1. Scans for ``checkpoint_epoch_*.pt`` files
2. Loads the latest checkpoint (highest epoch number)
3. Restores model and optimizer state
4. Continues from the next epoch

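The second checkpoint level is per combination. Within each epoch, for each combination the workflow:

1. Checks for ``epoch_results.pt`` in the combination's directory
2. If found: loads the cached reward, actions, and log-probabilities, and skips the simulation
3. If not found: runs the simulation, computes the reward, and saves the results to ``epoch_results.pt``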
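
The scan for the latest training-level checkpoint might look like the sketch below.
``latest_checkpoint`` is a hypothetical helper; it compares epoch numbers as integers
so that the result does not depend on zero-padding, and it leaves the actual loading
of the file to the caller.

.. code-block:: python

   import re
   from pathlib import Path


   def latest_checkpoint(base_dir):
       """Return (epoch, path) for the highest-numbered checkpoint, or None."""
       best = None
       for path in Path(base_dir).glob("checkpoint_epoch_*.pt"):
           match = re.fullmatch(r"checkpoint_epoch_(\d+)\.pt", path.name)
           if match is None:
               continue  # ignore files that do not end in a number
           epoch = int(match.group(1))
           if best is None or epoch > best[0]:
               best = (epoch, path)
       return best

Training would then continue from ``epoch + 1`` when a checkpoint is found, or from
the first epoch when the function returns ``None``.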

Archiving Combinations
-----------------------

Combination directories can be automatically archived to save disk space using
two strategies: **per-stage archiving** (during curriculum training) or
**post-training archiving** (after all training completes). Each combination
directory is compressed into a ``.tar.gz`` file, optionally removing the original.

Configuration
~~~~~~~~~~~~~

Enable archiving in your workflow YAML:

.. code-block:: yaml

   archive:
     enabled: true               # Enable archiving
     per_stage: true             # Archive after each curriculum stage (or false for post-training)
     pattern: 'comb_*'           # Glob pattern for directories to archive (post-training only)
     remove_after: false         # Remove originals after successful archiving
     archive_dir: /path/to/archives  # Where to store .tar.gz files

Per-Stage Archiving (Curriculum Training)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Best for**: Long curriculum training runs where disk space is limited.

When ``per_stage: true``, the workflow archives combinations at the end of each
curriculum stage **in the background** while the next stage's simulations begin.
This provides:

* **Immediate space recovery**: Free up disk as soon as each stage completes
* **No training delays**: Archiving runs concurrently with next stage setup
* **Stage-specific organization**: Each stage gets its own archive directory

**Behavior**:

1. After a curriculum stage completes (e.g., after epoch 50 of stage 1)
2. Archive job launches in background (bash script with tar commands)
3. Next stage begins immediately (simulations submit while archiving runs)
4. After training completes, workflow waits for any remaining archive jobs

**Configuration Example**:

.. code-block:: yaml

   curriculum:
     enabled: true
     stages:
       - name: pairs_single_site_easy
         min_subs: 2
         max_subs: 2
         epochs: 50
       - name: pairs_single_site_full
         min_subs: 2
         max_subs: 2
         epochs: 50
   
   archive:
     enabled: true
     per_stage: true              # Archive after each stage
     remove_after: false
     archive_dir: /path/to/archives

**Timeline** (shown for a three-stage curriculum):

.. code-block:: text

   Epoch 1-50 (Stage 1) → Stage 1 completes → Archive job starts in background
                                             ↓
   Epoch 51 begins (Stage 2) ← Simulations submit while Stage 1 archives
   
   Epoch 51-100 (Stage 2) → Stage 2 completes → Archive job starts in background
                                               ↓
   Epoch 101 begins (Stage 3) ← Stage 2 continues archiving in background

Post-Training Archiving
~~~~~~~~~~~~~~~~~~~~~~~~

**Best for**: Non-curriculum training or when you want to keep all data until
the end.

When ``per_stage: false`` (or not specified), the workflow archives combinations
once after all training completes.

**Behavior**:

1. After training completes successfully, all directories matching ``pattern``
   are compressed into individual ``.tar.gz`` archives
2. Archives are moved to ``archive_dir`` (if different from source)
3. Original directories are removed if ``remove_after`` is ``true``
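
The post-training behavior can be approximated with the standard library alone. The
sketch below is not the project's ``archive_combo_dirs``; the function name and
signature are assumptions chosen to mirror the ``pattern``, ``archive_dir``, and
``remove_after`` settings.

.. code-block:: python

   import shutil
   import tarfile
   from pathlib import Path


   def archive_matching_dirs(out_dir, archive_dir, pattern="comb_*",
                             remove_after=False):
       """Compress each matching directory into its own .tar.gz archive."""
       out_dir, archive_dir = Path(out_dir), Path(archive_dir)
       archive_dir.mkdir(parents=True, exist_ok=True)
       archives = []
       for combo_dir in sorted(out_dir.glob(pattern)):
           if not combo_dir.is_dir():
               continue  # skip plain files that happen to match the pattern
           target = archive_dir / f"{combo_dir.name}.tar.gz"
           with tarfile.open(target, "w:gz") as tar:
               tar.add(combo_dir, arcname=combo_dir.name)
           if remove_after:
               shutil.rmtree(combo_dir)  # only after the archive was written
           archives.append(target)
       return archives

Removing the original only after the ``with`` block exits means a failed write raises
before any data is deleted.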


**Configuration Example**:

.. code-block:: yaml

   archive:
     enabled: true
     per_stage: false             # Archive once at the end (default)
     pattern: 'comb_*'            # Directories to archive
     remove_after: false
     archive_dir: /path/to/archives


Manual Archiving
~~~~~~~~~~~~~~~~

You can also archive combinations manually:

.. code-block:: python

   from mllf.file_handling.generate_combinations import archive_combo_dirs
   from pathlib import Path
   
   # Archive all comb_* directories
   archived = archive_combo_dirs(
       out_dir=Path('generated_combos'),
       pattern='comb_*',
       remove=False  # Keep originals
   )
   
   print(f"Created {len(archived)} archive files")


Complete Workflow Example
--------------------------

Full Pipeline Script
~~~~~~~~~~~~~~~~~~~~

The main training workflow is implemented in ``examples/run_workflow_deepset.py``:

.. code-block:: bash

   cd examples
   python run_workflow_deepset.py workflow_14benz.yaml

This executes:

1. Combination generation (if ``create_combos`` specified)
2. Train/val/test split based on ``split`` configuration
3. Model initialization (RGCN encoder + edge policy)
4. Checkpoint detection and resume (if checkpoints exist)
5. Training loop with SLURM job submission
6. Checkpoint saving at ``checkpoint_freq`` intervals
7. Archiving combinations (if ``archive.enabled`` is true)

.. _Complete Configuration Example:

Configuration File
~~~~~~~~~~~~~~~~~~

A complete workflow configuration (``workflow_14benz.yaml``) includes:

.. code-block:: yaml

   # System environment
   system:
     solvent_state: solv
   
   # Generate combinations
   create_combos:
     input_dir: /path/to/14benz
     out_dir: /path/to/generated_combos
     include_patterns: [msld_flat.py]
   
   # Data splitting
   split:
     train_frac: 0.9
     val_frac: 0.1
     seed: 42
   
   # Pretraining (optional but recommended)
   pretrain:
     model_path: models/pretrained_policy.pt
   
   # Curriculum learning
   curriculum:
     enabled: true
     max_train_combos_per_stage: 100
     stages:
       - name: pairs_single_site
         min_subs: 2
         max_subs: 2
         epochs: 50
       - name: triplets_single_site
         min_subs: 3
         max_subs: 3
         epochs: 50
     progression:
       type: epoch
   
   # Model architecture
   training:
     encoder:
       hidden_dims: [64, 64]
       out_dim: 32
     policy:
       mlp_hidden: 64
     value_network:
       hidden_dims: [64, 32]
       lr: 0.001
     optimizer:
       lr: 0.0001
   
   # Simulation settings
   run_sims: true
   max_concurrent_jobs: 60
   timeout: 1200
   
   # Reward function
   reward:
     w_P: 0.5
     w_T: 0.75
     w_U: 0.3
     gamma: 4.0
     lambda_entropy: 0.5
   
   # Checkpointing
   output:
     base_dir: /path/to/training_output
     save_checkpoints: true
     checkpoint_freq: 5
   
   # Per-stage archiving
   archive:
     enabled: true
     per_stage: true
     archive_dir: /path/to/archives


See Also
~~~~~~~~

* :doc:`file_handling` - File format documentation and parsers
* :doc:`cb_setup` - CB infrastructure and policy architecture
* :doc:`deepset_pretraining` - DeepSet pretraining for node embeddings
* :doc:`cb_pretraining` - Behavior cloning from expert coefficients
* :doc:`examples` - Example workflows and usage patterns
* :doc:`api` - API reference for workflow modules