mllf.file_handling.read_pdb
PDB file parser
Provides helpers to parse PDB files and extract atomic coordinates and element symbols. Handles non-standard PDB formats where atom names encode element symbols (e.g., C001, H002, CL01).
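The atom-name fallback can be sketched as follows. This is a minimal illustration, not the module's parse_pdb_file. The helper names element_from_atom_name and parse_pdb_lines are hypothetical. The sketch reads the standard fixed PDB columns and assumes that the element column (77-78), when filled, takes precedence over the atom name. Two-letter names are inherently ambiguous (CA01 could be calcium or an alpha carbon), so the rule "leading letters are the element" is an assumption.

```python
import re
from typing import Iterable, List, Tuple

Coord = Tuple[float, float, float]

# Hypothetical rule: the leading one or two letters of the atom name are the element.
_LEADING_LETTERS = re.compile(r"\s*([A-Za-z]{1,2})")


def element_from_atom_name(atom_name: str) -> str:
    """Infer an element symbol from names such as 'C001', 'H002' or 'CL01'.

    Assumption: the name starts with the element symbol (any case) followed by
    a numeric index. 'CA01' is ambiguous and is read here as calcium.
    """
    match = _LEADING_LETTERS.match(atom_name)
    if match is None:
        raise ValueError(f"cannot infer element from atom name {atom_name!r}")
    letters = match.group(1)
    return letters[0].upper() + letters[1:].lower()


def parse_pdb_lines(lines: Iterable[str]) -> Tuple[List[Coord], List[str]]:
    """Collect coordinates and element symbols from ATOM/HETATM records."""
    coords: List[Coord] = []
    elements: List[str] = []
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Fixed PDB columns (1-based): x 31-38, y 39-46, z 47-54.
        coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
        # Prefer the element column (77-78) if present; otherwise fall back
        # to the atom name in columns 13-16.
        symbol = line[76:78].strip()
        elements.append(symbol.capitalize() if symbol else element_from_atom_name(line[12:16]))
    return coords, elements
```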
Functions:
- parse_pdb_file(path) -> tuple: (coordinates, elements)
- extract_site_number(pdb_path) -> int: extract site number from filename
- find_duplicate_atoms(coords1, coords2, tolerance) -> list: indices of duplicates in coords2
- remove_duplicate_atoms(coords, elements, core_coords) -> tuple: filtered (coords, elements)
- combine_pdb_files(pdb_files) -> tuple: (coordinates, elements, atom_counts)
- calculate_min_distance(coords1, coords2) -> float: minimum distance between two structures
- find_nearby_pdbs(target_pdb, candidate_pdbs, cutoff) -> list: PDBs within cutoff distance
- find_reference_subs_from_other_sites(target_pdb, prep_dir, cutoff) -> list: reference subs from other sites
- parse_pdb_dir(directory, pattern='*.pdb') -> dict: mapping filename -> parsed data
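One plausible reading of the atom_counts value returned by combine_pdb_files is the number of atoms contributed by each file, which lets a flat per-atom list be split back per file. The sketch below assumes that interpretation. combine_parsed and split_by_counts are hypothetical names, and the sketch works on already-parsed (coords, elements) pairs rather than file paths.

```python
from typing import Iterable, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def combine_parsed(
    structures: Iterable[Tuple[Sequence[Coord], Sequence[str]]],
) -> Tuple[List[Coord], List[str], List[int]]:
    """Concatenate per-file (coords, elements) pairs and record each file's atom count."""
    all_coords: List[Coord] = []
    all_elements: List[str] = []
    atom_counts: List[int] = []
    for coords, elements in structures:
        if len(coords) != len(elements):
            raise ValueError("each structure needs one element symbol per coordinate")
        all_coords.extend(coords)
        all_elements.extend(elements)
        atom_counts.append(len(coords))  # assumed meaning of atom_counts
    return all_coords, all_elements, atom_counts


def split_by_counts(values: Sequence, atom_counts: Sequence[int]) -> List[list]:
    """Undo the concatenation: slice a flat per-atom list back into per-file chunks."""
    chunks, start = [], 0
    for n in atom_counts:
        chunks.append(list(values[start:start + n]))
        start += n
    return chunks
```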
Spatial Filtering for Multi-Site Systems: The calculate_min_distance() and find_nearby_pdbs() functions enable automatic detection of which PDB files should be included in the atomic environment vector (AEV) computation, based on spatial proximity.
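A brute-force sketch of the proximity test, assuming coordinates are plain (x, y, z) tuples in Å. find_nearby_structures is a hypothetical in-memory stand-in for find_nearby_pdbs: it takes parsed coordinates keyed by name instead of file paths. The inclusive `<=` comparison at the cutoff is also an assumption.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def calculate_min_distance(coords1: Sequence[Coord], coords2: Sequence[Coord]) -> float:
    """Smallest Euclidean distance between any atom of coords1 and any atom of coords2."""
    if not coords1 or not coords2:
        return math.inf  # one side has no atoms, so nothing can be within a cutoff
    # All-pairs scan: O(len(coords1) * len(coords2)).
    return min(math.dist(a, b) for a in coords1 for b in coords2)


def find_nearby_structures(
    target: Sequence[Coord], candidates: Dict[str, Sequence[Coord]], cutoff: float
) -> List[str]:
    """Names of candidates with at least one atom within `cutoff` of any target atom."""
    return [
        name
        for name, coords in candidates.items()
        if calculate_min_distance(target, coords) <= cutoff
    ]
```

For large systems, a spatial index such as scipy.spatial.cKDTree avoids the all-pairs loop; the brute-force version is kept here for clarity.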
For multi-site systems, use find_reference_subs_from_other_sites() to get only the reference substituent (site#_sub1) from other sites, excluding the current site.
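The name-based part of this selection might look like the following. It assumes files are named site<N>_sub<M>.pdb, inferred from the site#_sub1 convention above; the module's exact pattern may differ. parse_site_sub and reference_subs_from_other_sites are hypothetical helpers. The distance cutoff that find_reference_subs_from_other_sites also applies is left out here and could be added with a minimum-distance check.

```python
import re
from pathlib import Path
from typing import Iterable, List, Tuple, Union

PathLike = Union[str, Path]

# Assumed naming convention: "site<N>_sub<M>" somewhere in the file stem.
_SITE_SUB = re.compile(r"site(\d+)_sub(\d+)", re.IGNORECASE)


def parse_site_sub(pdb_path: PathLike) -> Tuple[int, int]:
    """Return (site, sub) parsed from a file name such as 'site2_sub1.pdb'."""
    match = _SITE_SUB.search(Path(pdb_path).stem)
    if match is None:
        raise ValueError(f"no site#_sub# tag in {pdb_path!s}")
    return int(match.group(1)), int(match.group(2))


def reference_subs_from_other_sites(
    target_pdb: PathLike, candidate_pdbs: Iterable[PathLike]
) -> List[PathLike]:
    """Keep only sub1 files whose site differs from the target's site."""
    target_site, _ = parse_site_sub(target_pdb)
    selected = []
    for path in candidate_pdbs:
        try:
            site, sub = parse_site_sub(path)
        except ValueError:
            continue  # e.g. a core file that carries no site/sub tag
        if sub == 1 and site != target_site:
            selected.append(path)
    return selected
```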
Duplicate atom detection (find_duplicate_atoms/remove_duplicate_atoms) prevents double counting when reference substituents share atoms with the core structure.
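A sketch of tolerance-based duplicate removal. It follows the listed signatures except for an added tolerance keyword on remove_duplicate_atoms. The default of 1e-3 (in coordinate units, presumably Å) is an assumption rather than the module's value, and the code is an illustration, not the module's implementation.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def find_duplicate_atoms(
    coords1: Sequence[Coord], coords2: Sequence[Coord], tolerance: float = 1e-3
) -> List[int]:
    """Indices j such that coords2[j] lies within `tolerance` of some atom in coords1."""
    return [
        j
        for j, b in enumerate(coords2)
        if any(math.dist(a, b) <= tolerance for a in coords1)
    ]


def remove_duplicate_atoms(
    coords: Sequence[Coord],
    elements: Sequence[str],
    core_coords: Sequence[Coord],
    tolerance: float = 1e-3,  # assumed default, same units as the coordinates
) -> Tuple[List[Coord], List[str]]:
    """Drop substituent atoms that coincide with core atoms, keeping coords/elements aligned."""
    duplicates = set(find_duplicate_atoms(core_coords, coords, tolerance))
    kept = [i for i in range(len(coords)) if i not in duplicates]
    return [coords[i] for i in kept], [elements[i] for i in kept]
```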
For ANI-2x, use cutoff=5.1 Å (radial) or 3.5 Å (angular) to match the AEV function cutoffs.
Functions
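Function | Description
calculate_min_distance(coords1, coords2) | Calculate minimum distance between two sets of coordinates.
combine_pdb_files(pdb_files) | Combine multiple PDB files into coordinate and element lists.
extract_site_number(pdb_path) | Extract site number from PDB filename.
find_duplicate_atoms(coords1, coords2, tolerance) | Find indices in coords2 that duplicate atoms in coords1.
find_nearby_pdbs(target_pdb, candidate_pdbs, cutoff) | Find PDB files with atoms within cutoff distance of target PDB.
find_reference_subs_from_other_sites(target_pdb, prep_dir, cutoff) | Find reference substituents (sub1) from other sites within cutoff distance.
parse_pdb_dir(directory, pattern='*.pdb') | Parse all PDB files in a directory and return a mapping keyed by filename.
parse_pdb_file(path) | Parse PDB file to extract coordinates and elements.
remove_duplicate_atoms(coords, elements, core_coords) | Remove atoms from coords/elements that duplicate atoms in core_coords.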
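parse_pdb_dir returns a mapping keyed by filename, and the shape of that mapping can be sketched with pathlib. This hypothetical parse_dir takes the per-file parser as an argument so the example stays self-contained; in the module, parse_pdb_file presumably fills that role. Sorting the paths makes the key order deterministic.

```python
from pathlib import Path
from typing import Callable, Dict, TypeVar, Union

T = TypeVar("T")


def parse_dir(
    directory: Union[str, Path],
    parser: Callable[[Path], T],  # hypothetical hook; stands in for a per-file parser
    pattern: str = "*.pdb",
) -> Dict[str, T]:
    """Apply `parser` to every file matching `pattern`, keyed by bare file name."""
    return {
        path.name: parser(path)
        for path in sorted(Path(directory).glob(pattern))
        if path.is_file()
    }
```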