Tutorial

Contact Analysis

The basicrta workflow starts with collecting contacts between two atom groups sel1 and sel2 based on a single cutoff using contacts.py.

python -m basicrta.contacts --top top.pdb --traj traj.xtc --sel1 "protein"
--sel2 "resname CHOL" --cutoff 7.0 --nproc 4

This will create two contact maps: contacts_max10.0.pkl and contacts_7.0.pkl. The first is used to map all contacts defined with a 10 Å cutoff, upon which the cutoff is imposed and a new contact map is created for each desired cutoff value (for storage reasons this may change in the future). The protein residues in the topology should be correctly numbered, as these will be the names of directories where results are stored.

Gibbs Sampler

Next the contact map is used to collect the contacts between a specified residue of sel1 and each residue of the sel2 group. The contact durations (residence times) are then used as input data for the Gibbs sampler. A specific residue of sel1 can be used to run a Gibbs sampler for only that residue

python -m basicrta.gibbs --contacts contacts_7.0.pkl --nproc 5 --resid 313

or if resid is left out, a Gibbs sampler will be executed for all sel1 residues in the contact map.

python -m basicrta.gibbs --contacts contacts_7.0.pkl --nproc 5

Clustering

Next the samples obtained from the Gibbs sampler are processed and clustered.

python -m basicrta.cluster --niter 110000 --nproc 3 --cutoff 7.0 --prot b2ar

The prot argument is used to create rectangles in the \(\tau\) vs resid plot that correspond to the TM segments of the protein (figures 7-10). Your protein can be added to basicrta/basicrta/data/tm_dict.txt in the same format as the existing proteins.

Optional parameters --gskip and --burnin can be used to control the Gibbs sampler processing. The --gskip parameter specifies the skip interval for decorrelated samples (default: 100), while --burnin sets the number of initial samples to discard during equilibration (default: 10000). The default values are taken from the research paper.

python -m basicrta.cluster --niter 110000 --nproc 3 --cutoff 7.0 --prot b2ar \
--gskip 100 --burnin 10000

basicrta.cluster will process the Gibbs samplers, compute \(\tau\) for each residue, plot \(\tau\) vs resid, and write the data to tausout.npy (see next section). If a structure is passed to the script, the b-factors of the residues will be populated with the appropriate \(\tau\).

Rate/Tau Estimates

Estimates for \(\tau\) are obtained by using the write_data() method of the ProcessProtein class contained in cluster.py. Data is saved as a .npy file and contains [protein residue, tau, CI lower bound, CI upper bound] for each residue of sel1 analyzed (given dataset size is sufficiently large (~50 points)). Writing out all model parameters to a .npy file is currently not implemented, but will be possible in later versions.

Kinetic Mapping

The kinetically mapped trajectory and weighted densities can be created using kinetics.py.

python -m basicrta.kinetics --gibbs basicrta_7.0/W313/gibbs_110000.pkl
--contacts contacts_7.0.pkl --wdensity

To create only the mapped trajectory, leave out the wdensity flag.

python -m basicrta.kinetics --gibbs basicrta_7.0/W313/gibbs_110000.pkl
--contacts contacts_7.0.pkl

This can also be done using the top_n most likely frames belonging to each component of the exponential mixture model.

python -m basicrta.kinetics --gibbs basicrta_7.0/W313/gibbs_110000.pkl
--contacts contacts_7.0.pkl --top_n 500

Weighted densities can be computed over the top_n frames, over the whole trajectory, or by using the step argument in combination with top_n or the whole trajectory.

python -m basicrta.kinetics --gibbs basicrta_7.0/W313/gibbs_110000.pkl
--contacts contacts_7.0.pkl --step 100

Supplemental Scripts

Slurm scripts for submitting Gibbs sampler jobs to distributed systems are located in the basicrta/scripts directory. Note that some of these were rewritten and may contain slight errors, testing still needs to be done (but can be used with some slight modifications).