mllf.file_handling.generate_combinations

Generate all combinations of site/sub files into separate directories.

This utility scans an input directory for files matching the pattern site{site}_sub{sub}_{label}.{ext} (e.g., site1_sub2_pres.rtf, site1_sub2_frag.pdb) and creates output subdirectories, one per combination. For each combination, it copies the relevant files, renaming them so sub-indices start at 1 within the new directory.
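The scan-and-rename step can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the regex, the helper names parse_site_sub and renamed_files, and the rule that the first selected substituent becomes sub1 are all assumptions made for this example.

```python
import re

# Assumed regex for site{site}_sub{sub}_{label}.{ext}; the real module may differ.
SITE_SUB_RE = re.compile(r"^site(\d+)_sub(\d+)_([^.]+)\.(\w+)$")


def parse_site_sub(filename):
    """Return (site, sub, label, ext), or None if the name does not match."""
    m = SITE_SUB_RE.match(filename)
    if m is None:
        return None
    site, sub, label, ext = m.groups()
    return int(site), int(sub), label, ext


def renamed_files(filenames, site, subs):
    """Map original names to new names so the chosen subs become 1..n.

    Hypothetical helper: new indices follow the order given in `subs`.
    """
    new_index = {old: new for new, old in enumerate(subs, start=1)}
    mapping = {}
    for name in filenames:
        parsed = parse_site_sub(name)
        if parsed is None:
            continue  # not a site/sub file, e.g. a force-field file
        f_site, f_sub, label, ext = parsed
        if f_site == site and f_sub in new_index:
            mapping[name] = f"site{f_site}_sub{new_index[f_sub]}_{label}.{ext}"
    return mapping


files = ["site1_sub1_pres.rtf", "site1_sub2_frag.pdb",
         "site1_sub3_pres.rtf", "README.txt"]
print(renamed_files(files, site=1, subs=[3, 1]))
```

Files that do not match the pattern are skipped rather than renamed, which is consistent with the unchanged prep files shown in the example tree.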

Each generated combination directory contains:

  • prep/: Copy of input prep directory with renamed RTF/PDB files

  • msld_flat.py: Simulation script (if included via --include pattern)

  • mapping.json: Records original file paths and new names

  • info.py: Configuration dict with nsubs, nblocks, temp, etc.

  • run.sh: Executable SLURM submission script for running simulations

Example

input_dir/
    site1_sub1_pres.rtf
    site1_sub1_frag.pdb
    site1_sub2_pres.rtf
    site1_sub2_frag.pdb
    site1_sub3_pres.rtf
    site1_sub3_frag.pdb

Running:

python -m mllf.file_handling.generate_combinations input_dir --out combos_out

Will produce directories like:

combos_out/comb_0001_site1_subs_1_2/
├── prep/
│   ├── site1_sub1_pres.rtf (renamed if necessary, see mapping.json)
│   ├── site1_sub1_frag.pdb
│   ├── site1_sub2_pres.rtf (renamed if necessary, see mapping.json)
│   ├── site1_sub2_frag.pdb
│   ├── top_all36_msld.rtf (unchanged from input prep/)
│   ├── par_all36_msld.prm (unchanged from input prep/)
│   └── … (other prep files)
├── msld_flat.py (if included via --include)
├── mapping.json
├── info.py
└── run.sh

Combination Generation Logic:

  • Generates both within-site and cross-site combinations

  • Within-site: Each substituent can be the “anchor” with others as tail

    • Anchor is always first, tail is sorted

    • Example: anchor=1 generates [1,2], [1,3], [1,2,3], etc.

    • Example: anchor=2 generates [2,1], [2,3], [2,1,3], etc.

    • Minimum 2 substituents per combination

  • Cross-site: Cartesian product of within-site selections across sites

    • Each site contributes >= 2 substituents

    • Example: site1 has 75 selections, site2 has 186 selections

    • Generates 75 × 186 = 13,950 cross-site combinations

  • Total combinations grow multiplicatively with the number of sites, since each cross-site count is the product of the per-site selection counts
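The logic above can be sketched with itertools. This is an illustrative reading of the rules, not the code of all_site_sub_combinations; the function names and the min_size parameter are hypothetical. Under this reading a site with n substituents yields n × (2^(n-1) − 1) selections, so the 75 and 186 in the example would correspond to sites with 5 and 6 substituents.

```python
from itertools import combinations, product


def within_site_selections(subs, min_size=2):
    """Ordered selections for one site: anchor first, then a sorted tail."""
    selections = []
    for anchor in subs:
        others = sorted(s for s in subs if s != anchor)
        # The tail needs at least min_size - 1 members so that the whole
        # selection has at least min_size substituents.
        for tail_len in range(min_size - 1, len(others) + 1):
            for tail in combinations(others, tail_len):
                selections.append([anchor, *tail])
    return selections


def cross_site_combinations(site_subs):
    """Cartesian product of per-site selections.

    site_subs maps a site number to the list of its substituent indices.
    Each result maps every site to one of its within-site selections.
    """
    sites = sorted(site_subs)
    per_site = [within_site_selections(site_subs[s]) for s in sites]
    return [dict(zip(sites, choice)) for choice in product(*per_site)]


site1 = within_site_selections([1, 2, 3, 4, 5])
site2 = within_site_selections([1, 2, 3, 4, 5, 6])
cross = cross_site_combinations({1: [1, 2, 3, 4, 5], 2: [1, 2, 3, 4, 5, 6]})
print(len(site1), len(site2), len(cross))
```

Because the cross-site list is materialized in full, a generator may be preferable when many sites are involved.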

Additional Features:

  • RTF PRES tokens are automatically renumbered to match new indices

  • Include patterns allow copying extra files (e.g., prep/, msld_flat.py)

  • Archive mode creates .tar.gz files for storage
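Archive mode can be approximated with the standard tarfile module. The sketch below is an assumption about how such a step might look, not the body of archive_combo_dirs: the function name archive_dirs, the default pattern "comb_*", and the remove default are hypothetical and only mirror the documented parameter names.

```python
import shutil
import tarfile
from pathlib import Path


def archive_dirs(out_dir, pattern="comb_*", remove=False):
    """Pack each matching subdirectory of out_dir into <name>.tar.gz."""
    archives = []
    for combo in sorted(Path(out_dir).glob(pattern)):
        if not combo.is_dir():
            continue  # skip archives left over from a previous run
        target = combo.parent / (combo.name + ".tar.gz")
        with tarfile.open(target, "w:gz") as tar:
            # arcname keeps paths inside the archive relative to the combo dir
            tar.add(str(combo), arcname=combo.name)
        if remove:
            shutil.rmtree(combo)  # optionally delete the original directory
        archives.append(target)
    return archives
```

A call such as archive_dirs("combos_out", remove=True) would leave only the .tar.gz files in combos_out.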

Functions

all_site_sub_combinations(found[, ...])

Generate all within-site and cross-site ordered combinations.

archive_combo_dirs(out_dir[, pattern, remove])

Archive combination directories as .tar.gz files.

augment_core_with_excluded_sub1(...)

Augment core.rtf and core.pdb with atoms from an excluded site's sub1.

create_combination_dirs(input_dir, out_dir)

Create combination directories with renamed files and support files.

create_single_combination_dir(input_dir, ...)

Create a single combination directory with renamed files and support files.

find_site_sub_files(input_dir)

Scan input_dir and prep subdirectory for site/sub files.

list_possible_combinations(input_dir, out_dir)

List all possible combinations without creating directories.

main()

make_combo_dir_name(counter, sites, subs[, ...])

Generate a directory name for a combination.

renumber_pres_tokens(content, old_site, ...)

Renumber PRES tokens in RTF file content.