gibbs

Perform Gibbs samplers and process data.

This module provides the ParallelGibbs class, which parallelizes the creation of Gibbs samplers for each residue in the contact map. This module also provides the Gibbs class, which allows for the loading and processing of the gibbs sampler data, as well as plotting and saving processed results.

class gibbs.Gibbs(times=None, residue=None, loc=0, ncomp=15, niter=110000, cutoff=None, g=100, burnin=10000, gskip=1)[source]

Gibbs sampler to estimate parameters of an exponential mixture for a set of data. Results are stored in gibbs.results, which uses MDAnalysis.analysis.base.Results(). If ‘results=None’ the gibbs sampler has not been executed, which requires calling run().

Parameters:

times (array, optional) – Set of residence times to analyze
residue (str) – Residue name associated with the set of residence times
loc (int) – Used for progress bar in parallel applications
ncomp (int) – Number of exponential components to use in the mixture model
niter (int) – Number of iterations to run the Gibbs sampler
cutoff (float) – Cutoff value used in contact analysis, used to determine directory to load/save results. Allows for multiple cutoffs to be tested in directory containing contacts.
g (int) – Gibbs skip parameter for decorrelated samples; only save every g samples from full Gibbs sampler chain; default from https://pubs.acs.org/doi/10.1021/acs.jctc.4c01522 (NOTE: this value is called gskip in cluster.py)
burnin (int) – Burn-in parameter, drop first burnin samples as equilibration; default from https://pubs.acs.org/doi/10.1021/acs.jctc.4c01522
gskip (int) – Process data from the subsampled chain (ever g samples) at a coarser skip interval of gskip samples. Thus, in total, samples are taken at g * gskip steps from the full chain. (This is useful for sensitivity analysis where we run the chain with a small g value and save many samples and then use gskip to process samples at increasingly larger intervals without having to re-run the chain.) The default value of 1 means that the samples are processed at every g samples from the full chain.

Example

>>> from basicrta.gibbs import Gibbs
>>> from basicrta.tests.datafiles import times
>>> g = Gibbs(times=times, residue='W313', cutoff=7.0)
>>> g.run()
>>> g.process_gibbs()
>>> g.estimate_tau()
[1, 2, 3]

To load a Gibbs sampler that has already been executed use the load() method

>>> g = Gibbs().load('results.pkl')

The Gibbs sampler can be executed using the run() method without processing the resulting data. Once the process_gibbs() method is called, the Gibbs.results.processed_results attribute will be populated.

cluster(method='GaussianMixture', **kwargs)[source]

Cluster the processed results using the methods available in sklearn.mixture

Parameters:: method (str) – Mixture method to use

estimate_tau()[source]

Estimate the posterior maximum and confidence interval (CI) for the \(tau\) distribution of the slowest process. NOTE: In the future this will return an array containing \(tau\) and CI for all clusters.

Returns:: An array containing the posterior maximum and bounds of the 95% confidence interval in the format [LB, max, UB].
Return type:: list

static load(file)[source]

Load an instance of Gibbs.

Parameters:: file (str) – Path to instance of Gibbs

plot_gibbs(scale=1.5, sparse=1, save=False)[source]

plot_hist(scale=1, save=False, component=None, bins=15)[source]

plot_surv(scale=1, remove_noise=False, save=False, xlim=None, ylim=(1e-06, 5), xmajor=None, xminor=None, xscale='linear', yscale='log')[source]

Plot the survival function with the exponential mixture components where parameters are determined from the clustering results.

Parameters:

scale (float) – Modify the size of the figure by this factor
remove_noise (bool) – Whether to remove noise clusters
save (bool) – Whether to save the figure
xlim (tuple) – X-axis limits
ylim (tuple) – Y-axis limits
xmajor (int) – X-axis major tick
xminor (int) – X-axis minor tick

plot_tau_hist(scale=1, save=False)[source]

Plot histogram of tau values. The figure aspect ratio is 4:3, and can be made larger/smaller using the scale argument.

Parameters:

scale (float) – Increase plot size by this factor
save (bool) – Save plot to file

process_gibbs(show=False)[source]: Process the samples collected from the Gibbs sampler. process_gibbs() can be called multiple times to check the robustness of the results.

result_plot(remove_noise=False, **kwargs)[source]

Generate the combined result plot with option to change kwargs without re-clustering.

Parameters:: remove_noise (bool) – Option to remove noise clusters

run()[source]: Execute the Gibbs sampler and save the raw data to the instance of Gibbs.

save()[source]: Save current state of the Gibbs instance.

class gibbs.ParallelGibbs(contacts, nproc=1, ncomp=15, niter=110000)[source]

A module to take a contact map and run Gibbs samplers for each residue.

Parameters:

contacts (str) – Contact pickle file (contacts-{cutoff}.pkl).
nproc (int) – Number of processes to use in running Gibbs samplers.
ncomp (int) – Number of mixture components to use in the Gibbs sampler.
niter (int) – Number of iterations of the Gibbs sampler to perform.

run(run_resids=None, g=100)[source]

The run() method executes the Gibbs samplers for all residues of sel1 present in the contact map, or a list of resids can be provided.

Parameters:: run_resids (int or list, optional) – Resid(s) for which to run a Gibbs sampler.

gibbs.get_parser()[source]

gibbs.main()[source]

gibbs.run_residue(residue, time, proc, ncomp, niter, cutoff, g, from_combined=False)[source]

Run Gibbs sampler for a single residue.

Parameters:

residue (str) – Residue name
time (array-like) – Residence times data
proc (int) – Process number for progress bar positioning
ncomp (int) – Number of mixture components
niter (int) – Number of iterations
cutoff (float) – Cutoff value used in contact analysis
g (int) – Gibbs skip parameter
from_combined (bool) – Whether data comes from combined contacts