Workflow System
Overview
The workflow system automates the complete pipeline for multisite λ-dynamics simulations with contextual bandit training:
Combination Generation: Create all valid site/substituent combinations
Splitting: Divide combinations into training/validation/test sets
Training: Train CB policy on graph structures
Simulation: Run MD simulations with optimized bias coefficients
Compression: Archive simulation outputs for storage
Running the Workflow
Basic Usage
The workflow is driven by a YAML configuration file:
python -m mllf.cli.workflow --config examples/workflow_14benz.yaml
Or using the convenience wrapper:
cd examples
python run_workflow_deepset.py workflow_14benz.yaml
python run_workflow_deepset.py my_config.yaml # Use custom config
Configuration Format
A workflow config is a YAML file specifying which operations to run and their parameters. Key sections include:
system: Environment type (solvent, gas, protein)
create_combos: Generate combinations from fragment files
split: Divide combinations into train/val/test sets
pretrain: Optional pretraining from existing simulations
curriculum: Progressive training stages (see Curriculum Learning)
training: Model architecture and hyperparameters
reward: Reward function weights and thresholds
output: Checkpointing and output organization
archive: Automatic compression of completed runs
See the Configuration File for a full annotated YAML file.
Combination Generation
Principles
Combinations are generated from site/substituent fragment files:
Input files:
site{N}_sub{M}_{label}.{rtf,pdb} files in the input directory
Sites: Identified by the site number (N)
Substituents: Identified by the sub number (M) within each site
Warning
Minimum Substituents Required: Each site must have at least 2 substituents. MSLD simulations
will not run correctly with only a single substituent at a site. If any site has only 1 substituent,
combination generation will fail with an error. To resolve this, either add more substituents to the site or
add the site information to your core structure files (e.g., core.pdb and core.rtf if using
msld-py-prep).
The generator creates two types of combinations:
Within-site combinations: Multiple substituents from a single site
Cross-site combinations: Substituents from multiple sites simultaneously
Note
Combination Size Limit: By default, each combination is limited to at most 10 substituents
per site (max_subs_per_site=10). This prevents combinatorial explosion while still allowing
all substituents to participate across different combinations. For example, with 50 substituents
at a site, the generator will create combinations like [1,2,...,10], [1,2,...,9,11], etc.,
but not [1,2,...,11]. This limit can be increased via the --max-subs command-line option
or the max_subs_per_site parameter in the API.
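As a rough illustration of the size limit, the sketch below enumerates within-site combinations with itertools. The function name within_site_combinations and the min_subs argument are hypothetical and not part of the mllf API; the actual generator may order or sample combinations differently.

```python
from itertools import combinations


def within_site_combinations(subs, max_subs_per_site=10, min_subs=2):
    """Yield substituent tuples of size min_subs..max_subs_per_site.

    Hypothetical helper: min_subs=2 mirrors the two-substituent minimum
    described in the warning above.
    """
    upper = min(max_subs_per_site, len(subs))
    for size in range(min_subs, upper + 1):
        yield from combinations(sorted(subs), size)


# Four substituents, capped at three per combination.
combos = list(within_site_combinations([1, 2, 3, 4], max_subs_per_site=3))
print(len(combos))
```

With the cap at three, the four-substituent tuple (1, 2, 3, 4) is never produced, yet every substituent still appears in several smaller combinations.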
Lazy Directory Creation
For systems with large combination spaces (e.g., 14,211 total combinations), creating all directories upfront is inefficient—most will never be used in training. The workflow implements lazy (on-demand) directory creation:
Metadata Generation: During the combination generation phase, the system:
Lists all possible combinations without creating directories
Saves metadata to combo_metadata.json with:
Combination name (e.g., comb_0001_site2_1__site2_2)
Path where directory will be created
Sites and substituents included
Counter for ordering
Writes manifest files listing all possible combinations
On-Demand Creation: Directories are created only when needed:
During training/validation splits, combinations are selected but not created
When a combination is accessed for training, the workflow:
Checks if the directory exists
If not, loads metadata from combo_metadata.json
Creates the directory with all required files
Continues with training
Benefits:
Disk space efficiency: Only create about 2% of possible combinations (e.g., 142 training + 142 validation = 284 out of 14,211 total)
Faster initialization: Split generation completes in seconds instead of hours
Filesystem efficiency: Avoid creating thousands of unused directories
Scalability: Handle massive combination spaces (100K+ combinations)
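The on-demand creation steps can be sketched as follows. This is a minimal sketch under stated assumptions: ensure_combo_dir is a hypothetical name, and combo_metadata.json is assumed to be a JSON object keyed by combination name, which may not match the real schema. Copying info.py, mapping.json, and the prep files is left as a comment.

```python
import json
from pathlib import Path


def ensure_combo_dir(out_dir, name):
    """Return the combination directory, creating it on first access."""
    combo_dir = Path(out_dir) / name
    if combo_dir.exists():
        return combo_dir  # already created by an earlier access

    # Assumption: metadata is a dict keyed by combination name.
    metadata = json.loads((Path(out_dir) / "combo_metadata.json").read_text())
    if name not in metadata:
        raise KeyError(f"unknown combination: {name}")

    (combo_dir / "prep").mkdir(parents=True)
    # The real workflow would populate info.py, mapping.json, and prep/ here.
    return combo_dir
```

Checking for the directory before touching the metadata file keeps repeated accesses cheap, which matters when the same combination is visited every epoch.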
Directory Structure
Each combination directory (created on-demand) has a standardized structure:
generated_combos/
├── combo_metadata.json # Metadata for all combinations
├── manifest.txt # List of all combination names
├── train_manifest.txt # Training combination names
├── val_manifest.txt # Validation combination names
├── test_manifest.txt # Test combination names
├── comb_0001_site2_1__site2_2/ # Created on-demand
│ ├── info.py # System configuration
│ ├── mapping.json # File renumbering mapping
│ ├── msld_flat.py # Simulation script (copied)
│ └── prep/
│ ├── site2_sub1_pres.rtf
│ ├── site2_sub1_frag.pdb
│ ├── site2_sub2_pres.rtf
│ ├── site2_sub2_frag.pdb
│ ├── core.rtf
│ ├── core.prm
│ └── other_support_files...
├── comb_0262_site1_1__site1_2__site2_1__site2_2/ # Cross-site
│ ├── info.py
│ ├── mapping.json
│ ├── msld_flat.py
│ └── prep/
│ ├── site1_sub1_pres.rtf # Preserves site numbering
│ ├── site1_sub2_pres.rtf
│ ├── site2_sub1_pres.rtf # Site 2 keeps site2_ prefix
│ ├── site2_sub2_pres.rtf
│ └── ...
└── ...
File Naming Convention: The renaming preserves site identity:
Files maintain their site number (site1_*, site2_*, etc.)
Substituents are renumbered sequentially within each site
Original site/sub mapping is preserved in mapping.json
Combination Metadata Files
Each combination directory contains standardized metadata files:
info.py: System configuration loaded by simulation scripts
import numpy as np
import os
info = {}
info['name'] = 'comb_0262_site1_1__site1_2__site2_1__site2_2'
info['nsubs'] = [2, 2] # Substituents per site [site1, site2]
info['nblocks'] = np.sum(info['nsubs']) # Total substituents (4)
info['ncentral'] = 0 # Central replica for replica exchange
info['nreps'] = 1 # Number of replicas
info['nnodes'] = 1 # MPI nodes
info['enginepath'] = os.environ.get('CHARMMEXEC', '')
info['temp'] = 298.15 # Temperature in Kelvin
mapping.json: File renumbering information
[
{
"original": "/path/to/site1_sub2_pres.rtf",
"new_name": "site1_sub1_pres.rtf",
"original_site": 1,
"original_sub": 2,
"new_site": 1,
"new_sub": 1
},
{
"original": "/path/to/site2_sub5_pres.rtf",
"new_name": "site2_sub1_pres.rtf",
"original_site": 2,
"original_sub": 5,
"new_site": 2,
"new_sub": 1
}
]
This tracks how original fragment files were renumbered during combination creation, enabling traceability back to source files.
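To show how the mapping supports traceability, the sketch below builds a lookup from renumbered (site, sub) pairs back to the original ones. It uses only the fields shown in the example above; the helper name original_lookup is hypothetical.

```python
import json


def original_lookup(mapping_records):
    """Map (new_site, new_sub) -> (original_site, original_sub)."""
    return {
        (rec["new_site"], rec["new_sub"]): (rec["original_site"], rec["original_sub"])
        for rec in mapping_records
    }


# Same records as the mapping.json example, inlined for a self-contained run.
records = json.loads("""
[
  {"original": "/path/to/site1_sub2_pres.rtf", "new_name": "site1_sub1_pres.rtf",
   "original_site": 1, "original_sub": 2, "new_site": 1, "new_sub": 1},
  {"original": "/path/to/site2_sub5_pres.rtf", "new_name": "site2_sub1_pres.rtf",
   "original_site": 2, "original_sub": 5, "new_site": 2, "new_sub": 1}
]
""")
lookup = original_lookup(records)
print(lookup[(2, 1)])  # original (site, sub) for the renumbered site2_sub1
```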
Manifest Files
Manifest files list combination names (one per line):
comb_0001_site2_1__site2_2
comb_0002_site2_1__site2_3
comb_0003_site2_1__site2_4
comb_0075_site1_5__site1_1__site1_2
comb_0262_site1_1__site1_2__site2_1__site2_2
...
Manifest files enable reproducible splits and batch operations. The full paths
are constructed by prepending the out_dir from the configuration:
{out_dir}/{combo_name}.
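A minimal sketch of consuming a manifest is shown below. Both helper names are hypothetical, and the name parser assumes that every site{N}_{M} token in a combination name encodes a site number and a substituent number, as the example names suggest.

```python
import re
from pathlib import Path

# Assumed token layout inside combination names: site<N>_<M>
TOKEN = re.compile(r"site(\d+)_(\d+)")


def combo_paths(manifest_text, out_dir):
    """Return {out_dir}/{combo_name} for each non-empty manifest line."""
    names = [line.strip() for line in manifest_text.splitlines() if line.strip()]
    return [Path(out_dir) / name for name in names]


def parse_combo_name(name):
    """Return {site: [subs]} for a name such as comb_0001_site2_1__site2_2."""
    sites = {}
    for site, sub in TOKEN.findall(name):
        sites.setdefault(int(site), []).append(int(sub))
    return sites


manifest = "comb_0001_site2_1__site2_2\ncomb_0262_site1_1__site1_2__site2_1__site2_2\n"
paths = combo_paths(manifest, "generated_combos")
parsed = parse_combo_name("comb_0262_site1_1__site1_2__site2_1__site2_2")
```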
Graph Construction
During training, molecular graphs are constructed from combination directories to provide input for the policy network. Graphs are built from RTF topology files with DeepSet embeddings as node features, representing each substituent’s 3D structure and chemistry as learned 64-dimensional vectors.
For complete details on graph construction, node features, edge expansion, and the RGCN/policy architecture, see Contextual Bandit Setup.
Training Pipeline
System Configuration
The system section specifies environment-level parameters that affect how molecular
structures are processed during training:
system:
solvent_state: solv # Environment type
Solvent State:
Specifies the simulation environment to determine which atoms are included as context during AEV computation for DeepSet embeddings:
solv or solvent: Includes core structure and nearby substituents from other sites (within 5.1 Å)
gas or vacuum: Includes core structure and nearby substituents (without solvent effects)
protein: Includes core structure, nearby substituents, AND nearby protein atoms (within 5.1 Å)
The environment type determines what molecular context the DeepSet encoder “sees” when computing atomic environment vectors. For protein systems, including nearby protein atoms in the AEV computation naturally encodes protein-specific interactions into the learned embeddings. See AtomBondGNN Pretraining for technical details on context-aware AEV computation.
The solvent state is also preserved in graph_info.json for metadata tracking.
Auto-Detection (legacy):
Previously, the system attempted to auto-detect solvent state from directory names
(e.g., 14benz_solv → solv). This is now deprecated in favor of explicit
configuration for clarity and reliability.
Reward Function
Pretraining
Before training begins, the policy can be pretrained using behavior cloning (supervised learning with MSE loss) to imitate successful bias coefficients from completed simulations. For complete details on pretraining loss, data organization, and transfer learning strategies, see CB Behavior Cloning.
Training Reward
During training, the policy is optimized using REINFORCE with rewards computed from simulation trajectories. The reward function prevents degenerate solutions (e.g., convergence to single-substituent states) through multiple components:
where \(coverage\_factor = \left(\frac{N_{\text{visited}}}{N_{\text{subs}}}\right)^2\) is a smooth quadratic multiplier that scales all positive reward components by coverage. At 100% coverage it is 1.0; at 50% it is 0.25; at 0% it is 0.0. This replaces the earlier hard completeness gate (which clipped all positive reward to −0.01 when any substituent was unvisited) with a smooth gradient signal that rewards partial progress.
This eliminates \(R_U\) (the explicit uniformity term) and the adaptive coverage penalty \(P_{\text{cov}}\) — both are now subsumed by \(coverage\_factor\).
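The coverage multiplier is simple enough to check by hand. The sketch below implements the quadratic form given above; the function name and argument names are illustrative and are not taken from the mllf source.

```python
def coverage_factor(n_visited, n_subs):
    """Quadratic coverage multiplier: (N_visited / N_subs) ** 2."""
    if n_subs <= 0:
        raise ValueError("n_subs must be positive")
    return (n_visited / n_subs) ** 2


full = coverage_factor(4, 4)   # every substituent visited
half = coverage_factor(2, 4)   # half of the substituents visited
none = coverage_factor(0, 4)   # nothing visited
```

Because the multiplier grows smoothly with each newly visited substituent, a run that moves from 50% to 75% coverage sees its positive reward scale from 0.25 to about 0.56 instead of staying gated at a constant value.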
Population Balance Reward \(R_P\):
Encourages equal sampling across all substituents with balanced populations:
where:
\(w_P\) is the population weight (default: 0.5)
\(p_k\) is the population count for visited substituent \(k\)
\(P_{\text{baseline}}\) is the normalization constant (default: 500.0)
\(C_F = \min(1.0, T_{\min} / (2 \times N_{\text{req}}))\) is the confidence factor
\(T_{\min}\) is the minimum transitions across all sites
\(N_{\text{req}}\) is the minimum required transitions per site (default: 10)
The confidence factor scales population rewards based on data reliability, reducing false rewards from low-transition runs with unreliable population distributions. Within-visited uniformity is now captured entirely by \(R_{\text{entropy}}\) (see below) rather than by the balance factor \(e^{-CV}\) which has been removed.
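As a small worked example of the confidence factor, the sketch below evaluates \(C_F\) from per-site transition counts using the definition above. The function name is hypothetical; n_req defaults to the documented value of 10.

```python
def confidence_factor(transitions_per_site, n_req=10):
    """C_F = min(1.0, T_min / (2 * N_req)), with T_min the worst site."""
    t_min = min(transitions_per_site)
    return min(1.0, t_min / (2 * n_req))


low = confidence_factor([10, 30])    # worst site sits exactly at the minimum
high = confidence_factor([25, 40])   # worst site exceeds twice the minimum
```

With the default threshold, a run needs at least 20 transitions at its worst site before population rewards count at full weight.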
Transition Reward \(R_T\):
Rewards frequent transitions between substituents, with bonus for high transition counts:
where:
\(w_T\) is the transition weight (default: 0.75)
\(T_s\) is the transition count for site \(s\)
\(T_{\text{baseline}}\) is the normalization constant (default: 50.0)
The 1.5× bonus applies when average transitions per site exceeds 20 (2× the default minimum)
Entropy Bonus \(R_{\text{entropy}}\):
Rewards uniform population distributions using normalized Shannon entropy:
where \(H(\mathbf{p}) = -\sum_k \frac{p_k}{P_{\text{total}}} \log \frac{p_k}{P_{\text{total}}}\) is Shannon entropy and \(H_{\max} = \log(N_{\text{subs}})\) is maximum possible entropy.
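The normalized entropy ratio \(H(\mathbf{p}) / H_{\max}\) can be sketched as below. This only computes the ratio from the definitions above; how the ratio is weighted inside \(R_{\text{entropy}}\) is not reproduced here. The function name is illustrative, and zero populations are skipped using the usual convention that 0 log 0 = 0.

```python
import math


def normalized_entropy(populations):
    """Return H(p) / log(N_subs), a value between 0 and 1."""
    total = sum(populations)
    n_subs = len(populations)
    if total <= 0 or n_subs < 2:
        return 0.0
    # Unvisited substituents (p == 0) contribute nothing to H but still
    # count toward N_subs, so they lower the normalized value.
    h = -sum((p / total) * math.log(p / total) for p in populations if p > 0)
    return h / math.log(n_subs)


uniform = normalized_entropy([100, 100, 100, 100])
collapsed = normalized_entropy([400, 0, 0, 0])
```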
Tiered Transition Penalties \(R_{\text{penalties}}\):
The penalty system uses three tiers based on the worst-performing site, with multi-site awareness to fairly handle systems with multiple λ-sites:
Base Penalty (determined by \(T_{\min}\), the minimum transitions across all sites):
Multi-Site Degradation (incremental penalty for multiple failing sites):
where \(n_{\text{bad}} = |\{s : T_s < N_{\text{req}}\}|\) counts sites below threshold.
Concentration Penalty (per-site check for single-substituent dominance):
Total penalties are summed and clamped: \(R_{\text{penalties}} = -\min(60.0, P_{\text{trans}} + P_{\text{conc}})\)
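The two fully specified pieces of the penalty system, the failing-site count \(n_{\text{bad}}\) and the final clamp, can be sketched as follows. The tier formulas for \(P_{\text{trans}}\) and \(P_{\text{conc}}\) are not reproduced; they are treated here as already-computed inputs, and both function names are hypothetical.

```python
def count_failing_sites(transitions_per_site, n_req=10):
    """n_bad: number of sites whose transition count is below N_req."""
    return sum(1 for t in transitions_per_site if t < n_req)


def clamp_penalties(p_trans, p_conc, cap=60.0):
    """R_penalties = -min(cap, P_trans + P_conc)."""
    return -min(cap, p_trans + p_conc)


n_bad = count_failing_sites([3, 12, 9])
capped = clamp_penalties(50.0, 25.0)
uncapped = clamp_penalties(10.0, 5.0)
```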
Default Hyperparameters:
reward:
w_P: 0.5 # Population weight
w_T: 0.75 # Transition weight
w_U: 0.3 # Accepted for API compatibility; coverage handled by coverage_factor
gamma: 4.0 # Base penalty coefficient
P_baseline: 500.0 # Population normalization
T_baseline: 50.0 # Transition normalization
min_transitions_per_site: 10 # Tier 3 threshold
min_coverage_ratio: 0.5 # Accepted for API compatibility; coverage handled by coverage_factor
entropy_bonus: 8.0 # Entropy bonus coefficient
concentration_penalty_threshold: 0.8 # Single-substituent dominance threshold
Policy Gradient Training:
The policy is optimized using an Actor-Critic architecture where the policy network (actor) predicts bias coefficients and a value network (critic) provides state-dependent baselines for variance reduction. This approach prevents catastrophic forgetting of pretrained weights and enables more stable learning.
For architectural details on the RGCN encoder, policy network, and value network, see Contextual Bandit Setup.
Simulation Execution
Launching Simulations
Simulations are launched via subprocess, running CHARMM with bias coefficients
written to variables.py from the policy’s sampled actions. The simulator
outputs transition counts and population distributions for reward computation.
Output Parsing
After simulation completes, the framework parses output.txt from the output
directory to extract:
Total transitions per site \(T_s\) for each λ-site
Per-substituent populations \(p_{s,k}\) at each site
Coverage ratio (fraction of substituents visited)
Per-site concentration (maximum population fraction at each site)
These metrics feed directly into the reward function components described in the Reward Function section above.
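Once populations have been extracted, the derived metrics follow directly. The sketch below assumes the populations are already parsed into a dictionary of per-site lists; the layout of output.txt itself is not shown in this document, so no parser is attempted, and the function name is hypothetical.

```python
def site_metrics(populations_by_site):
    """Return (coverage_ratio, {site: max population fraction}).

    populations_by_site: {site: [p_k, ...]} (assumed in-memory layout).
    """
    all_pops = [p for pops in populations_by_site.values() for p in pops]
    visited = sum(1 for p in all_pops if p > 0)
    coverage = visited / len(all_pops) if all_pops else 0.0

    concentration = {}
    for site, pops in populations_by_site.items():
        total = sum(pops)
        concentration[site] = max(pops) / total if total > 0 else 0.0
    return coverage, concentration


coverage, concentration = site_metrics({1: [80, 20], 2: [50, 0]})
```

In this example site 2 never leaves its first substituent, so its concentration is 1.0, above the default concentration_penalty_threshold of 0.8.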
Curriculum Learning
Curriculum learning progressively trains the policy on increasingly complex combinations, similar to how students learn from simple to complex problems. Instead of training on all possible combinations at once, the policy masters simpler tasks before advancing to harder ones.
Why Curriculum Learning for MSLD
MSLD bias coefficient optimization has a natural difficulty hierarchy:
Easy: Single-site pairs (2 substituents, 1 site)
Simplest edge interactions to learn
Clear cause-and-effect relationships
Provides foundation for pairwise biases
Medium: Single-site triplets (3 substituents, 1 site)
Introduces crowding/density effects
More complex interaction patterns
Tests generalization from pairs
Hard: Multi-site combinations (2+ sites with multiple substituents each)
Cross-site interaction effects
Exponentially larger search space
Requires composition of learned patterns
Training directly on hard combinations often fails because:
Reward signals are noisy and unclear
Policy has no foundation to build upon
Pretrained weights get overwhelmed by complex gradients
Curriculum learning solves this by building skills incrementally.
Configuration
Enable curriculum learning in your workflow YAML:
curriculum:
enabled: true
max_train_combos_per_stage: 100 # Optional: limit combinations per stage
stages:
# Stage 1: Pairs at single sites
- name: pairs_single_site_easy
min_subs: 2
max_subs: 2
min_sites: 1
max_sites: 1
epochs: 50
# Stage 2: Triplets at single sites
- name: triplets_single_site
min_subs: 3
max_subs: 3
min_sites: 1
max_sites: 1
epochs: 50
# Stage 3: Cross-site combinations
- name: pairs_two_sites
min_subs: 4 # 2 per site
max_subs: 4
min_sites: 2
max_sites: 2
epochs: 50
# Progression criteria
progression:
type: epoch # Advance after completing stage epochs
Stage Configuration
Each stage specifies:
Combination Filters:
min_subs, max_subs: Total substituents in combination
min_sites, max_sites: Number of sites represented
Training Duration:
epochs: Number of training epochs for this stage
Optional Settings:
max_train_combos: Stage-specific limit on training combinations (overrides global setting)
reward_override: Modify reward weights for this stage (e.g., emphasize transitions early)
Combination Selection
Filtering Process:
For each stage, the workflow:
Filters all training combinations by stage criteria (min/max subs/sites)
If filtered count exceeds max_train_combos_per_stage, randomly selects a subset
Uses reproducible random selection (seeded by split.seed + stage_index)
Important: Random selection is uniform across all matching combinations.
If a stage allows both pairs (2 subs) and triplets (3 subs) via min_subs: 2,
max_subs: 3, the 100 selected combinations will be a random mix with no
preference for either size.
Reproducibility: Same seed produces same combination selection across runs.
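The filtering and seeded selection described above can be sketched as follows. This is an assumption-laden illustration: the record layout (name, n_subs, n_sites) and the function name are hypothetical, and the real workflow may use a different random number generator, so the specific combinations chosen will not match.

```python
import random


def select_stage_combos(combos, stage, stage_index, split_seed, max_combos=None):
    """Filter combinations by stage criteria, then sample uniformly if needed."""
    matching = [
        c for c in combos
        if stage["min_subs"] <= c["n_subs"] <= stage["max_subs"]
        and stage["min_sites"] <= c["n_sites"] <= stage["max_sites"]
    ]
    if max_combos is not None and len(matching) > max_combos:
        # Seed depends on the stage so each stage draws an independent subset.
        rng = random.Random(split_seed + stage_index)
        matching = rng.sample(matching, max_combos)
    return matching


combos = [{"name": f"comb_{i:04d}", "n_subs": 2 + i % 3, "n_sites": 1 + i % 2}
          for i in range(60)]
stage = {"min_subs": 2, "max_subs": 3, "min_sites": 1, "max_sites": 1}
first = select_stage_combos(combos, stage, stage_index=0, split_seed=42, max_combos=5)
second = select_stage_combos(combos, stage, stage_index=0, split_seed=42, max_combos=5)
```

Calling the function twice with the same seed and stage index returns the same subset, which is the property the workflow relies on for reproducible runs.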
Progression Criteria
Stages advance based on progression criteria:
Epoch-based (default):
progression:
type: epoch
Advances after completing the specified number of epochs for current stage.
Reward-based (experimental):
progression:
type: reward
reward_threshold: 10.0 # Minimum average reward to advance
Advances only if average reward over last 5 epochs exceeds threshold.
Combined:
progression:
type: both
reward_threshold: 10.0
Must complete all epochs AND meet reward threshold.
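The three progression modes reduce to a small decision function. The sketch below is illustrative only: should_advance and its arguments are hypothetical names, and the five-epoch window follows the description of the reward-based mode above.

```python
def should_advance(kind, epochs_done, stage_epochs, rewards,
                   reward_threshold=None, window=5):
    """Decide whether the current curriculum stage is finished."""
    epochs_ok = epochs_done >= stage_epochs

    recent = rewards[-window:]
    reward_ok = (
        bool(recent)
        and reward_threshold is not None
        and sum(recent) / len(recent) > reward_threshold
    )

    if kind == "epoch":
        return epochs_ok
    if kind == "reward":
        return reward_ok
    if kind == "both":
        return epochs_ok and reward_ok
    raise ValueError(f"unknown progression type: {kind}")


by_epoch = should_advance("epoch", 50, 50, rewards=[])
by_reward = should_advance("reward", 10, 50, rewards=[12.0] * 5, reward_threshold=10.0)
by_both = should_advance("both", 10, 50, rewards=[12.0] * 5, reward_threshold=10.0)
```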
Training Flow Example
=== Training with Curriculum ===
Stage 1: pairs_single_site_easy (epochs 1-50)
├── Filtered: 41 combinations (2 subs, 1 site)
├── Training on all 41 combinations
└── Epoch 50 completes → Advance to Stage 2
Stage 2: triplets_single_site (epochs 51-100)
├── Filtered: 186 combinations (3 subs, 1 site)
├── Limited to 100 random combinations
└── Epoch 100 completes → Advance to Stage 3
Stage 3: pairs_two_sites (epochs 101-150)
├── Filtered: 1,681 combinations (4 subs, 2 sites)
├── Limited to 100 random combinations
└── Epoch 150 completes → Training complete
Training Output:
=== Starting Stage 1/3: pairs_single_site_easy ===
Filtered to 41 training combinations for this stage
--- Epoch 1/150 - Stage 1/3: pairs_single_site_easy (epoch 1/50) ---
Epoch 1 Stats:
Loss: 12.3456
Value Loss: 45.6789
Avg Reward: -28.5432
[... epochs 2-50 ...]
============================================================
=== Advancing to Stage 2/3: triplets_single_site ===
============================================================
Filtered to 186 training combinations for this stage
Limiting to 100 random training combos (from 186 available)
--- Epoch 51/150 - Stage 2/3: triplets_single_site (epoch 1/50) ---
Stage-Specific Reward Tuning
Advanced users can override reward parameters per stage:
stages:
- name: pairs_single_site_easy
min_subs: 2
max_subs: 2
min_sites: 1
max_sites: 1
epochs: 50
reward_override:
w_T: 0.9 # Emphasize transitions early
min_transitions_per_site: 5 # Lower threshold for easier combinations
This allows fine-tuning the reward function to match stage difficulty.
Checkpointing and Resume
Long-running training jobs (e.g., 50 epochs) can be interrupted by SLURM time limits, system maintenance, or manual cancellation. The workflow implements two-level checkpointing to enable automatic resume without losing progress.
Configuration
Enable checkpointing in your workflow YAML:
output:
base_dir: /path/to/training_output
save_checkpoints: true # Enable checkpoint saving
checkpoint_freq: 5 # Save every N epochs
Training-Level Checkpoints
Location: {base_dir}/checkpoint_epoch_XXX.pt
Saved every checkpoint_freq epochs, containing:
epoch: Completed epoch number
encoder_state: Full RGCN encoder state dict
policy_state: Full edge policy state dict
optimizer_state: Optimizer state (momentum, learning rates, etc.)
stats: Training statistics (loss, average reward)
Automatic Resume
When training restarts, the workflow:
Scans for checkpoint_epoch_*.pt files
Loads the latest checkpoint (highest epoch number)
Restores model and optimizer state
Continues from the next epoch
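The second checkpoint level operates per combination, so a resumed epoch does not rerun simulations that already finished. For each combination in each epoch, the workflow:
Checks for epoch_results.pt in the combination’s directory
If found: loads cached reward/actions/logp, skips simulation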
If not found: runs simulation, computes reward, saves checkpoint
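Finding the checkpoint to resume from can be sketched with the standard library alone. The helper name latest_checkpoint is hypothetical, and loading the model and optimizer state from the file is deliberately left out; only the file discovery step is shown.

```python
import re
from pathlib import Path


def latest_checkpoint(base_dir):
    """Return (epoch, path) for the highest-numbered checkpoint, or None."""
    best = None
    for path in Path(base_dir).glob("checkpoint_epoch_*.pt"):
        match = re.fullmatch(r"checkpoint_epoch_(\d+)\.pt", path.name)
        if match is None:
            continue  # ignore files that only resemble a checkpoint name
        epoch = int(match.group(1))
        if best is None or epoch > best[0]:
            best = (epoch, path)
    return best
```

Comparing the parsed integer instead of the file name avoids ordering mistakes when epoch numbers have different widths, such as checkpoint_epoch_95.pt and checkpoint_epoch_100.pt. Training would then continue from the returned epoch plus one.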
Archiving Combinations
Combination directories can be automatically archived to save disk space using
two strategies: per-stage archiving (during curriculum training) or
post-training archiving (after all training completes). Each combination
directory is compressed into a .tar.gz file, optionally removing the original.
Configuration
Enable archiving in your workflow YAML:
archive:
enabled: true # Enable archiving
per_stage: true # Archive after each curriculum stage (or false for post-training)
pattern: 'comb_*' # Glob pattern for directories to archive (post-training only)
remove_after: false # Remove originals after successful archiving
archive_dir: /path/to/archives # Where to store .tar.gz files
Per-Stage Archiving (Curriculum Training)
Best for: Long curriculum training runs where disk space is limited.
When per_stage: true, the workflow archives combinations at the end of each
curriculum stage in the background while the next stage’s simulations begin.
This provides:
Immediate space recovery: Free up disk as soon as each stage completes
No training delays: Archiving runs concurrently with next stage setup
Stage-specific organization: Each stage gets its own archive directory
Behavior:
After a curriculum stage completes (e.g., after epoch 50 of stage 1)
Archive job launches in background (bash script with tar commands)
Next stage begins immediately (simulations submit while archiving runs)
After training completes, workflow waits for any remaining archive jobs
Configuration Example:
curriculum:
enabled: true
stages:
- name: pairs_single_site_easy
min_subs: 2
max_subs: 2
epochs: 50
- name: pairs_single_site_full
min_subs: 2
max_subs: 2
epochs: 50
archive:
enabled: true
per_stage: true # Archive after each stage
remove_after: false
archive_dir: /path/to/archives
Timeline:
Epoch 1-50 (Stage 1) → Stage 1 completes → Archive job starts in background
↓
Epoch 51 begins (Stage 2) ← Simulations submit while Stage 1 archives
Epoch 51-100 (Stage 2) → Stage 2 completes → Archive job starts in background
↓
Epoch 101 begins (Stage 3) ← Stage 2 continues archiving in background
Post-Training Archiving
Best for: Non-curriculum training or when you want to keep all data until the end.
When per_stage: false (or not specified), the workflow archives combinations
once after all training completes.
Behavior:
After training completes successfully, all directories matching pattern are compressed into individual .tar.gz archives
Archives are moved to archive_dir (if different from source)
Original directories are removed if remove_after is true
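The post-training behavior can be approximated with the standard tarfile module. This sketch is not the library's archive_combo_dirs function; archive_dirs is a hypothetical stand-in that writes each archive directly into archive_dir instead of moving it afterwards.

```python
import shutil
import tarfile
from pathlib import Path


def archive_dirs(source_dir, archive_dir, pattern="comb_*", remove_after=False):
    """Compress each matching directory into its own .tar.gz archive."""
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    archives = []
    for combo_dir in sorted(Path(source_dir).glob(pattern)):
        if not combo_dir.is_dir():
            continue  # skip plain files, including earlier .tar.gz output
        target = archive_dir / f"{combo_dir.name}.tar.gz"
        with tarfile.open(target, "w:gz") as tar:
            tar.add(combo_dir, arcname=combo_dir.name)
        if remove_after:
            # Only reached after the archive was written and closed.
            shutil.rmtree(combo_dir)
        archives.append(target)
    return archives
```

Removing the original only after the with block exits means a failure while writing the archive leaves the source directory untouched.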
Configuration Example:
archive:
enabled: true
per_stage: false # Archive once at the end (default)
pattern: 'comb_*' # Directories to archive
remove_after: false
archive_dir: /path/to/archives
Manual Archiving
You can also archive combinations manually:
from mllf.file_handling.generate_combinations import archive_combo_dirs
from pathlib import Path
# Archive all comb_* directories
archived = archive_combo_dirs(
out_dir=Path('generated_combos'),
pattern='comb_*',
remove=False # Keep originals
)
print(f"Created {len(archived)} archive files")
Complete Workflow Example
Full Pipeline Script
The main training workflow is implemented in examples/run_workflow_deepset.py:
cd examples
python run_workflow_deepset.py workflow_14benz.yaml
This executes:
Combination generation (if create_combos specified)
Train/val/test split based on split configuration
Model initialization (RGCN encoder + edge policy)
Checkpoint detection and resume (if checkpoints exist)
Training loop with SLURM job submission
Checkpoint saving at checkpoint_freq intervals
Archiving combinations (if archive.enabled is true)
Configuration File
A complete workflow configuration (workflow_14benz.yaml) includes:
# System environment
system:
solvent_state: solv
# Generate combinations
create_combos:
input_dir: /path/to/14benz
out_dir: /path/to/generated_combos
include_patterns: [msld_flat.py]
# Data splitting
split:
train_frac: 0.9
val_frac: 0.1
seed: 42
# Pretraining (optional but recommended)
pretrain:
model_path: models/pretrained_policy.pt
# Curriculum learning
curriculum:
enabled: true
max_train_combos_per_stage: 100
stages:
- name: pairs_single_site
min_subs: 2
max_subs: 2
epochs: 50
- name: triplets_single_site
min_subs: 3
max_subs: 3
epochs: 50
progression:
type: epoch
# Model architecture
training:
encoder:
hidden_dims: [64, 64]
out_dim: 32
policy:
mlp_hidden: 64
value_network:
hidden_dims: [64, 32]
lr: 0.001
optimizer:
lr: 0.0001
# Simulation settings
run_sims: true
max_concurrent_jobs: 60
timeout: 1200
# Reward function
reward:
w_P: 0.5
w_T: 0.75
w_U: 0.3
gamma: 4.0
lambda_entropy: 0.5
# Checkpointing
output:
base_dir: /path/to/training_output
save_checkpoints: true
checkpoint_freq: 5
# Per-stage archiving
archive:
enabled: true
per_stage: true
archive_dir: /path/to/archives
See Also
File Handling - File format documentation and parsers
Contextual Bandit Setup - CB infrastructure and policy architecture
AtomBondGNN Pretraining - DeepSet pretraining for node embeddings
CB Behavior Cloning - Behavior cloning from expert coefficients
Running Examples - Example workflows and usage patterns
mllf API - API reference for workflow modules