gibbs

Perform Gibbs samplers and process data.

This module provides the ParallelGibbs class, which parallelizes the creation of Gibbs samplers for each residue in the contact map. This module also provides the Gibbs class, which allows for the loading and processing of the gibbs sampler data, as well as plotting and saving processed results.

class gibbs.Gibbs(times=None, residue=None, loc=0, ncomp=15, niter=110000, cutoff=None, g=100, burnin=10000, gskip=1)[source]

Gibbs sampler to estimate parameters of an exponential mixture for a set of data. Results are stored in gibbs.results, which uses MDAnalysis.analysis.base.Results(). If ‘results=None’ the gibbs sampler has not been executed, which requires calling run().

Parameters:
  • times (array, optional) – Set of residence times to analyze

  • residue (str) – Residue name associated with the set of residence times

  • loc (int) – Used for progress bar in parallel applications

  • ncomp (int) – Number of exponential components to use in the mixture model

  • niter (int) – Number of iterations to run the Gibbs sampler

  • cutoff (float) – Cutoff value used in contact analysis, used to determine directory to load/save results. Allows for multiple cutoffs to be tested in directory containing contacts.

  • g (int) – Gibbs skip parameter for decorrelated samples; only save every g samples from full Gibbs sampler chain; default from https://pubs.acs.org/doi/10.1021/acs.jctc.4c01522 (NOTE: this value is called gskip in cluster.py)

  • burnin (int) – Burn-in parameter, drop first burnin samples as equilibration; default from https://pubs.acs.org/doi/10.1021/acs.jctc.4c01522

  • gskip (int) – Process data from the subsampled chain (ever g samples) at a coarser skip interval of gskip samples. Thus, in total, samples are taken at g * gskip steps from the full chain. (This is useful for sensitivity analysis where we run the chain with a small g value and save many samples and then use gskip to process samples at increasingly larger intervals without having to re-run the chain.) The default value of 1 means that the samples are processed at every g samples from the full chain.

Example

>>> from basicrta.gibbs import Gibbs
>>> from basicrta.tests.datafiles import times
>>> g = Gibbs(times=times, residue='W313', cutoff=7.0)
>>> g.run()
>>> g.process_gibbs()
>>> g.estimate_tau()
[1, 2, 3]

To load a Gibbs sampler that has already been executed use the load() method

>>> g = Gibbs().load('results.pkl')

The Gibbs sampler can be executed using the run() method without processing the resulting data. Once the process_gibbs() method is called, the Gibbs.results.processed_results attribute will be populated.

cluster(method='GaussianMixture', **kwargs)[source]

Cluster the processed results using the methods available in sklearn.mixture

Parameters:

method (str) – Mixture method to use

estimate_tau()[source]

Estimate the posterior maximum and confidence interval (CI) for the \(tau\) distribution of the slowest process. NOTE: In the future this will return an array containing \(tau\) and CI for all clusters.

Returns:

An array containing the posterior maximum and bounds of the 95% confidence interval in the format [LB, max, UB].

Return type:

list

static load(file)[source]

Load an instance of Gibbs.

Parameters:

file (str) – Path to instance of Gibbs

plot_gibbs(scale=1.5, sparse=1, save=False)[source]
plot_hist(scale=1, save=False, component=None, bins=15)[source]
plot_surv(scale=1, remove_noise=False, save=False, xlim=None, ylim=(1e-06, 5), xmajor=None, xminor=None, xscale='linear', yscale='log')[source]

Plot the survival function with the exponential mixture components where parameters are determined from the clustering results.

Parameters:
  • scale (float) – Modify the size of the figure by this factor

  • remove_noise (bool) – Whether to remove noise clusters

  • save (bool) – Whether to save the figure

  • xlim (tuple) – X-axis limits

  • ylim (tuple) – Y-axis limits

  • xmajor (int) – X-axis major tick

  • xminor (int) – X-axis minor tick

plot_tau_hist(scale=1, save=False)[source]

Plot histogram of tau values. The figure aspect ratio is 4:3, and can be made larger/smaller using the scale argument.

Parameters:
  • scale (float) – Increase plot size by this factor

  • save (bool) – Save plot to file

process_gibbs(show=False)[source]

Process the samples collected from the Gibbs sampler. process_gibbs() can be called multiple times to check the robustness of the results.

result_plot(remove_noise=False, **kwargs)[source]

Generate the combined result plot with option to change kwargs without re-clustering.

Parameters:

remove_noise (bool) – Option to remove noise clusters

run()[source]

Execute the Gibbs sampler and save the raw data to the instance of Gibbs.

save()[source]

Save current state of the Gibbs instance.

class gibbs.ParallelGibbs(contacts, nproc=1, ncomp=15, niter=110000)[source]

A module to take a contact map and run Gibbs samplers for each residue.

Parameters:
  • contacts (str) – Contact pickle file (contacts-{cutoff}.pkl).

  • nproc (int) – Number of processes to use in running Gibbs samplers.

  • ncomp (int) – Number of mixture components to use in the Gibbs sampler.

  • niter (int) – Number of iterations of the Gibbs sampler to perform.

run(run_resids=None, g=100)[source]

The run() method executes the Gibbs samplers for all residues of sel1 present in the contact map, or a list of resids can be provided.

Parameters:

run_resids (int or list, optional) – Resid(s) for which to run a Gibbs sampler.

gibbs.get_parser()[source]
gibbs.main()[source]
gibbs.run_residue(residue, time, proc, ncomp, niter, cutoff, g, from_combined=False)[source]

Run Gibbs sampler for a single residue.

Parameters:
  • residue (str) – Residue name

  • time (array-like) – Residence times data

  • proc (int) – Process number for progress bar positioning

  • ncomp (int) – Number of mixture components

  • niter (int) – Number of iterations

  • cutoff (float) – Cutoff value used in contact analysis

  • g (int) – Gibbs skip parameter

  • from_combined (bool) – Whether data comes from combined contacts