Combining Contact Timeseries from Multiple Repeats
This feature combines the contact timeseries from multiple repeat runs into a single contact file, so that posteriors are calculated from the pooled data rather than from each run separately.
Use Cases
Analyze data from multiple repeat simulations together
Pool binding events from multiple trajectories for better statistics
Calculate combined residence time distributions and confidence intervals
Usage
Command Line Interface
Generate a contact file for each repeat run individually, then combine them:
```bash
# Run contact analysis for each repeat, renaming the output so the next run does not overwrite it
python -m basicrta.contacts --top sys.pdb --traj run1.xtc --sel1 "protein" --sel2 "resname CHOL" --cutoff 7.0
mv contacts_7.0.pkl contacts_run1_7.0.pkl
python -m basicrta.contacts --top sys.pdb --traj run2.xtc --sel1 "protein" --sel2 "resname CHOL" --cutoff 7.0
mv contacts_7.0.pkl contacts_run2_7.0.pkl
python -m basicrta.contacts --top sys.pdb --traj run3.xtc --sel1 "protein" --sel2 "resname CHOL" --cutoff 7.0
mv contacts_7.0.pkl contacts_run3_7.0.pkl

# Combine the contact files
python -m basicrta.combine --contacts contacts_run1_7.0.pkl contacts_run2_7.0.pkl contacts_run3_7.0.pkl --output combined_contacts_7.0.pkl

# Run the Gibbs sampler on the combined data
python -m basicrta.gibbs --contacts combined_contacts_7.0.pkl --nproc 5
```
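With more than a handful of repeats, typing these commands by hand becomes error-prone. The sketch below builds the same command sequence for any list of trajectories. `repeat_commands` is a hypothetical helper, not part of basicrta. It assumes each `basicrta.contacts` run writes `contacts_<cutoff>.pkl` to the working directory, as in the commands above. It only builds and prints the commands, so you can inspect them before running them.

```python
import shlex

# Hypothetical helper (not part of basicrta): build the per-repeat contact,
# rename, combine and Gibbs commands shown above. Nothing is executed here.
def repeat_commands(trajs, top="sys.pdb", sel1="protein", sel2="resname CHOL",
                    cutoff=7.0, nproc=5):
    commands = []
    contact_files = []
    for i, traj in enumerate(trajs, start=1):
        commands.append(["python", "-m", "basicrta.contacts", "--top", top,
                         "--traj", traj, "--sel1", sel1, "--sel2", sel2,
                         "--cutoff", str(cutoff)])
        # Assumes each run writes contacts_<cutoff>.pkl; rename it per run.
        renamed = f"contacts_run{i}_{cutoff}.pkl"
        commands.append(["mv", f"contacts_{cutoff}.pkl", renamed])
        contact_files.append(renamed)
    combined = f"combined_contacts_{cutoff}.pkl"
    commands.append(["python", "-m", "basicrta.combine", "--contacts",
                     *contact_files, "--output", combined])
    commands.append(["python", "-m", "basicrta.gibbs", "--contacts", combined,
                     "--nproc", str(nproc)])
    return commands

# Print shell-quoted commands for inspection.
for cmd in repeat_commands(["run1.xtc", "run2.xtc", "run3.xtc"]):
    print(shlex.join(cmd))
```

Once checked, each command list can be passed to `subprocess.run(cmd, check=True)`.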
Python API
```python
from basicrta.contacts import CombineContacts

# Combine contact files
combiner = CombineContacts(
    contact_files=['contacts_run1_7.0.pkl', 'contacts_run2_7.0.pkl', 'contacts_run3_7.0.pkl'],
    output_name='combined_contacts_7.0.pkl'
)
output_file = combiner.run()
print(f"Combined contacts saved to: {output_file}")
```
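When the repeat files are collected with a glob, a plain lexicographic sort puts `contacts_run10_7.0.pkl` before `contacts_run2_7.0.pkl`. Suppose the trajectory index assigned to each contact follows the order of `contact_files`. The error messages refer to files by position, but this ordering is an assumption. In that case, a natural sort keeps the indices aligned with the run numbers. `natural_key` and `collect_contact_files` are hypothetical helpers.

```python
import re
from pathlib import Path

def natural_key(path):
    """Sort key that compares digit runs numerically, so run2 < run10."""
    # re.split with a capturing group alternates text and digit tokens,
    # so tokens at the same position always have the same type.
    return [int(tok) if tok.isdigit() else tok
            for tok in re.split(r"(\d+)", Path(path).name)]

def collect_contact_files(directory=".", pattern="contacts_run*_7.0.pkl"):
    """Return matching contact files in run order (hypothetical helper)."""
    return [str(p) for p in sorted(Path(directory).glob(pattern), key=natural_key)]
```

The returned list can then be passed as `contact_files=` to `CombineContacts`.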
Features
Compatibility Validation
The combiner automatically validates that contact files are compatible:
Same cutoff distance: All files must use the same cutoff
Same atom groups: Protein and ligand selections must match
Timestep check: a warning (not an error) is issued if the runs use different timesteps
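The checks listed above can be sketched as follows. This illustrates the behavior and is not the combiner's actual code. It assumes each file's metadata has already been loaded into a dict with hypothetical keys `cutoff`, `ag1`, `ag2` (the two selections) and `ts` (the timestep).

```python
import warnings

def check_compatible(metadata_list):
    """Sketch of the compatibility checks; keys 'cutoff', 'ag1', 'ag2', 'ts' are assumed."""
    if not metadata_list:
        raise ValueError("No contact files given")
    ref = metadata_list[0]
    for i, meta in enumerate(metadata_list[1:], start=1):
        # A differing cutoff is a hard error.
        if meta["cutoff"] != ref["cutoff"]:
            raise ValueError(
                f"Incompatible cutoffs: file 0 has {ref['cutoff']}, "
                f"file {i} has {meta['cutoff']}"
            )
        # Both atom-group selections must match the first file.
        for key in ("ag1", "ag2"):
            if meta[key] != ref[key]:
                raise ValueError(f"Incompatible atom groups: file 0 and file {i} differ in {key}")
    # Differing timesteps only trigger a warning; combining proceeds.
    timesteps = {meta["ts"] for meta in metadata_list}
    if len(timesteps) > 1:
        warnings.warn(f"Different timesteps across runs: {sorted(timesteps)}")
```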
Metadata Preservation
Combined files preserve and extend metadata:
Original trajectory information for each source file
Number of trajectories combined
Source file tracking, to support kinetic clustering in the future
Trajectory Source Tracking
Each contact in the combined file includes trajectory source information:
Original contact data columns preserved
An additional last column holding the index of the source trajectory, for kinetic clustering support
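Conceptually, the combined data is the per-run contact arrays stacked row-wise, with one extra column recording which run each contact came from. Below is a minimal NumPy sketch. It assumes each run's contacts are a 2-D array with the same columns, although the real on-disk layout may differ. `pool_contacts` is a hypothetical name.

```python
import numpy as np

def pool_contacts(arrays):
    """Stack per-run contact arrays and append the source-trajectory index
    as a final column (sketch; the real file layout may differ)."""
    tagged = []
    for traj_index, contacts in enumerate(arrays):
        contacts = np.asarray(contacts, dtype=float)
        # One index value per contact row, shaped as a column.
        source = np.full((contacts.shape[0], 1), traj_index, dtype=float)
        tagged.append(np.hstack([contacts, source]))
    return np.vstack(tagged)
```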
Limitations
Kinetic Clustering
Kinetic clustering is not yet supported for combined contact data. The code will:
Issue warnings when loading combined contact files
Raise a clear error if clustering is attempted
Suggest alternatives for kinetic clustering analysis
For kinetic clustering, either analyze each trajectory separately or extend the clustering algorithm to use the trajectory source information stored in the combined file.
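If only the combined file is available, the trajectory source column is enough to recover per-trajectory data for separate analysis. The sketch assumes the combined contacts are a 2-D NumPy array whose last column holds the trajectory index. `split_by_trajectory` is a hypothetical helper.

```python
import numpy as np

def split_by_trajectory(pooled):
    """Split pooled contacts into {trajectory index: contacts without the index column}."""
    traj_ids = pooled[:, -1].astype(int)
    # Boolean masks select the rows belonging to each source trajectory.
    return {int(i): pooled[traj_ids == i, :-1] for i in np.unique(traj_ids)}
```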
Different Trajectory Properties
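Different timesteps: The combiner warns about differing timestep values but proceeds. Contact durations are only resolved to within one frame, so runs with a coarser timestep resolve short-lived events less finely. Pooling them may therefore affect residence time estimates for fast events.
Different particle counts: Because contacts are computed for each run separately and combined afterwards, runs with different numbers of particles are handled correctly. Concatenating the trajectories themselves would not work, since that requires every trajectory to have the same number of particles.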
Error Handling
The combiner checks its inputs and stops with a clear error message when files are missing or incompatible:
```bash
# Missing files
python -m basicrta.combine --contacts file1.pkl missing_file.pkl
# ERROR: Contact file not found: missing_file.pkl

# Incompatible cutoffs
python -m basicrta.combine --contacts contacts_7.0.pkl contacts_8.0.pkl
# ERROR: Incompatible cutoffs: file 0 has 7.0, file 1 has 8.0

# Skip validation (use with caution)
python -m basicrta.combine --contacts file1.pkl file2.pkl --no-validate
```
Output Format
Combined contact files:
Use the same format as individual contact files, plus one extra column
Include extended metadata with source tracking
Store the trajectory source index of each contact in that extra (last) column
Work directly with the existing Gibbs sampler workflow
The Gibbs sampler will process combined files normally but issue warnings about kinetic clustering limitations.
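To inspect a combined file by hand, the sketch below loads it and counts contacts per source trajectory. It assumes the pickle holds a 2-D NumPy array (or array-like) whose last column is the trajectory source index. `load_combined` is a hypothetical helper. Only unpickle files you trust, since `pickle.load` can execute arbitrary code.

```python
import pickle
import numpy as np

def load_combined(path):
    """Load a combined contact file and count contacts per source trajectory.

    Assumes a pickled 2-D array whose last column is the trajectory index.
    """
    with open(path, "rb") as fh:
        contacts = np.asarray(pickle.load(fh))
    traj_ids, counts = np.unique(contacts[:, -1].astype(int), return_counts=True)
    return contacts, dict(zip(traj_ids.tolist(), counts.tolist()))
```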