gibbs
Perform Gibbs samplers and process data.
This module provides the ParallelGibbs class, which parallelizes the creation of Gibbs samplers for each residue in the contact map. This module also provides the Gibbs class, which allows for the loading and processing of the gibbs sampler data, as well as plotting and saving processed results.
- class gibbs.Gibbs(times=None, residue=None, loc=0, ncomp=15, niter=110000, cutoff=None, g=100, burnin=10000, gskip=1)[source]
Gibbs sampler to estimate parameters of an exponential mixture for a set of data. Results are stored in
gibbs.results, which usesMDAnalysis.analysis.base.Results(). If ‘results=None’ the gibbs sampler has not been executed, which requires callingrun().- Parameters:
times (array, optional) – Set of residence times to analyze
residue (str) – Residue name associated with the set of residence times
loc (int) – Used for progress bar in parallel applications
ncomp (int) – Number of exponential components to use in the mixture model
niter (int) – Number of iterations to run the Gibbs sampler
cutoff (float) – Cutoff value used in contact analysis, used to determine directory to load/save results. Allows for multiple cutoffs to be tested in directory containing contacts.
g (int) – Gibbs skip parameter for decorrelated samples; only save every g samples from full Gibbs sampler chain; default from https://pubs.acs.org/doi/10.1021/acs.jctc.4c01522 (NOTE: this value is called gskip in cluster.py)
burnin (int) – Burn-in parameter, drop first burnin samples as equilibration; default from https://pubs.acs.org/doi/10.1021/acs.jctc.4c01522
gskip (int) – Process data from the subsampled chain (ever g samples) at a coarser skip interval of gskip samples. Thus, in total, samples are taken at
g * gskipsteps from the full chain. (This is useful for sensitivity analysis where we run the chain with a small g value and save many samples and then use gskip to process samples at increasingly larger intervals without having to re-run the chain.) The default value of 1 means that the samples are processed at every g samples from the full chain.
Example
>>> from basicrta.gibbs import Gibbs >>> from basicrta.tests.datafiles import times >>> g = Gibbs(times=times, residue='W313', cutoff=7.0) >>> g.run() >>> g.process_gibbs() >>> g.estimate_tau() [1, 2, 3]
To load a Gibbs sampler that has already been executed use the
load()method>>> g = Gibbs().load('results.pkl')
The Gibbs sampler can be executed using the
run()method without processing the resulting data. Once theprocess_gibbs()method is called, theGibbs.results.processed_resultsattribute will be populated.- cluster(method='GaussianMixture', **kwargs)[source]
Cluster the processed results using the methods available in
sklearn.mixture- Parameters:
method (str) – Mixture method to use
- estimate_tau()[source]
Estimate the posterior maximum and confidence interval (CI) for the \(tau\) distribution of the slowest process. NOTE: In the future this will return an array containing \(tau\) and CI for all clusters.
- Returns:
An array containing the posterior maximum and bounds of the 95% confidence interval in the format [LB, max, UB].
- Return type:
- plot_surv(scale=1, remove_noise=False, save=False, xlim=None, ylim=(1e-06, 5), xmajor=None, xminor=None, xscale='linear', yscale='log')[source]
Plot the survival function with the exponential mixture components where parameters are determined from the clustering results.
- plot_tau_hist(scale=1, save=False)[source]
Plot histogram of tau values. The figure aspect ratio is 4:3, and can be made larger/smaller using the scale argument.
- process_gibbs(show=False)[source]
Process the samples collected from the Gibbs sampler.
process_gibbs()can be called multiple times to check the robustness of the results.
- class gibbs.ParallelGibbs(contacts, nproc=1, ncomp=15, niter=110000)[source]
A module to take a contact map and run Gibbs samplers for each residue.
- Parameters:
- gibbs.run_residue(residue, time, proc, ncomp, niter, cutoff, g, from_combined=False)[source]
Run Gibbs sampler for a single residue.
- Parameters:
residue (str) – Residue name
time (array-like) – Residence times data
proc (int) – Process number for progress bar positioning
ncomp (int) – Number of mixture components
niter (int) – Number of iterations
cutoff (float) – Cutoff value used in contact analysis
g (int) – Gibbs skip parameter
from_combined (bool) – Whether data comes from combined contacts