mllf.file_handling.read_pdb

PDB file parser

Provides helpers to parse PDB files and extract atomic coordinates and element symbols. Handles non-standard PDB formats where atom names encode element symbols (e.g., C001, H002, CL01).
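As an illustration of the atom-name convention described above, the following is a minimal sketch of how an element symbol could be recovered from names such as C001, H002, or CL01. It is not the module's implementation: the helper name `element_from_atom_name` and the `TWO_LETTER_ELEMENTS` set are hypothetical and chosen only for this example.

```python
import re

# Hypothetical set of two-letter symbols to recognise; not taken from mllf.
TWO_LETTER_ELEMENTS = {"CL", "BR"}


def element_from_atom_name(atom_name: str) -> str:
    """Guess an element symbol from an atom name such as 'C001' or 'CL01'."""
    letters = re.match(r"[A-Za-z]+", atom_name.strip())
    if letters is None:
        raise ValueError(f"no leading letters in atom name {atom_name!r}")
    prefix = letters.group(0).upper()
    # Prefer a known two-letter symbol, otherwise fall back to the first letter.
    if prefix[:2] in TWO_LETTER_ELEMENTS:
        return prefix[0] + prefix[1].lower()
    return prefix[0]
```

Checking the two-letter set first keeps CL01 from being read as carbon, at the cost of having to list every two-letter element the input files may contain.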

Functions:
- parse_pdb_file(path) -> tuple: (coordinates, elements)
- extract_site_number(pdb_path) -> int: extract site number from filename
- find_duplicate_atoms(coords1, coords2, tolerance) -> list: indices of duplicates in coords2
- remove_duplicate_atoms(coords, elements, core_coords) -> tuple: filtered (coords, elements)
- combine_pdb_files(pdb_files) -> tuple: (coordinates, elements, atom_counts)
- calculate_min_distance(coords1, coords2) -> float: minimum distance between two structures
- find_nearby_pdbs(target_pdb, candidate_pdbs, cutoff) -> list: PDBs within cutoff distance
- find_reference_subs_from_other_sites(target_pdb, prep_dir, cutoff) -> list: reference subs from other sites
- parse_pdb_dir(directory, pattern='*.pdb') -> dict mapping filename -> parsed data

Spatial Filtering for Multi-Site Systems:

The calculate_min_distance() and find_nearby_pdbs() functions automatically detect which PDB files should be included in the AEV computation, based on spatial proximity.
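The proximity test can be sketched in pure Python as follows. This is an assumption-laden illustration, not the module's code: `min_distance` and `nearby_structures` are hypothetical names, the candidates are passed as already-parsed coordinates rather than file paths, and treating a distance equal to the cutoff as "nearby" is a choice made for this example.

```python
from math import dist


def min_distance(coords1, coords2):
    """Smallest pairwise distance between two lists of (x, y, z) points."""
    return min(dist(a, b) for a in coords1 for b in coords2)


def nearby_structures(target_coords, candidates, cutoff=5.1):
    """Names of candidate structures with any atom within cutoff of the target.

    candidates maps a name to a list of (x, y, z) coordinates (hypothetical layout).
    """
    return [
        name
        for name, coords in candidates.items()
        if min_distance(target_coords, coords) <= cutoff
    ]
```

The brute-force loop is quadratic in the number of atoms, which is usually acceptable for small substituent fragments; larger structures would benefit from a vectorised or tree-based search.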

For multi-site systems, use find_reference_subs_from_other_sites() to get only the reference substituent (site#_sub1) from other sites, excluding the current site.
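The selection rule described here (keep only sub1 files belonging to other sites) can be sketched as below. The filename pattern `site<N>_sub<M>.pdb` is inferred from the `site#_sub1` naming mentioned above and is an assumption, as are the helper names; the sketch also omits the distance cutoff that the real function applies.

```python
import re
from pathlib import Path

# Assumed filename convention: site<N>_sub<M>.pdb (inferred, not confirmed).
SITE_PATTERN = re.compile(r"site(\d+)_sub(\d+)")


def site_and_sub(pdb_path):
    """Return (site number, substituent number) parsed from a PDB filename."""
    match = SITE_PATTERN.search(Path(pdb_path).stem)
    if match is None:
        raise ValueError(f"cannot parse site/sub from {pdb_path!r}")
    return int(match.group(1)), int(match.group(2))


def reference_subs_from_other_sites(target_pdb, candidate_pdbs):
    """Keep only sub1 files whose site differs from the target's site."""
    target_site, _ = site_and_sub(target_pdb)
    kept = []
    for path in candidate_pdbs:
        site, sub = site_and_sub(path)
        if site != target_site and sub == 1:
            kept.append(path)
    return kept
```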

Duplicate atom detection (find_duplicate_atoms/remove_duplicate_atoms) prevents double counting when reference substituents share atoms with the core structure.
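A tolerance-based duplicate check might look like the sketch below. It only illustrates the idea: the function names are hypothetical stand-ins for find_duplicate_atoms and remove_duplicate_atoms, and the default tolerance of 0.01 Å is an assumption, not the module's documented default.

```python
from math import dist


def duplicate_indices(coords1, coords2, tolerance=0.01):
    """Indices in coords2 lying within tolerance of any point in coords1."""
    return [
        j
        for j, b in enumerate(coords2)
        if any(dist(a, b) <= tolerance for a in coords1)
    ]


def drop_duplicates(coords, elements, core_coords, tolerance=0.01):
    """Return (coords, elements) with atoms that duplicate core atoms removed."""
    duplicates = set(duplicate_indices(core_coords, coords, tolerance))
    kept = [
        (c, e)
        for i, (c, e) in enumerate(zip(coords, elements))
        if i not in duplicates
    ]
    return [c for c, _ in kept], [e for _, e in kept]
```

Filtering coordinates and elements together keeps the two lists aligned, so each remaining atom still has its matching element symbol.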

For ANI-2x, set cutoff=5.1 Å to match the radial AEV cutoff, or cutoff=3.5 Å to match the angular AEV cutoff.

Functions

calculate_min_distance(coords1, coords2)

Calculate minimum distance between two sets of coordinates.

combine_pdb_files(pdb_files)

Combine multiple PDB files into coordinate and element lists.

extract_site_number(pdb_path)

Extract site number from PDB filename.

find_duplicate_atoms(coords1, coords2[, ...])

Find indices in coords2 that duplicate atoms in coords1.

find_nearby_pdbs(target_pdb, candidate_pdbs)

Find PDB files with atoms within cutoff distance of target PDB.

find_reference_subs_from_other_sites(...[, ...])

Find reference substituents (sub1) from other sites within cutoff distance.

parse_pdb_dir(directory[, pattern])

Parse all PDB files in a directory and return a mapping keyed by filename.

parse_pdb_file(pdb_path[, rtf_data])

Parse PDB file to extract coordinates and elements.

remove_duplicate_atoms(coords, elements, ...)

Remove atoms from coords/elements that duplicate atoms in core_coords.