HErmes.fitting package

HErmes.fitting.fit module

Provide routines for fitting charge histograms.

HErmes.fitting.fit.fit_model(charges, model, startparams=None, rej_outliers=False, nbins=200, silent=False, parameter_text=(('$\\mu_{{SPE}}$& {:4.2e}\\\\', 5), ), use_minuit=False, normalize=True, **kwargs)

Standardized fitting routine.

Parameters:

charges (np.ndarray) – charges obtained in a measurement (not histogrammed)
model (pyosci.fit.Model) – a model to fit to the data
startparams (tuple) – initial parameters for the model, or None to use a first guess
rej_outliers (bool) – remove extreme outliers from the data
nbins (int) – number of bins
silent (bool) – silence output
parameter_text (tuple) – will be passed to model.plot_result
use_minuit (bool) – use minuit for the chi2 minimization, starting from startparams
normalize (bool) – normalize the data before fitting

Returns:	tuple
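
The steps implied by these parameters (histogram the raw charges, optionally drop outliers, normalize, then minimize chi2) can be sketched with NumPy and scipy.optimize.curve_fit. The helper fit_charges_sketch below is hypothetical and is not the HErmes implementation: it takes a plain model function instead of a Model instance, and its outlier cut (2 standard deviations around the mean) is an assumption.

```python
import numpy as np
from scipy.optimize import curve_fit


def fit_charges_sketch(charges, func, startparams, rej_outliers=False,
                       nbins=200, normalize=True):
    """Hypothetical stand-in for fit_model: histogram, then least-squares fit."""
    charges = np.asarray(charges, dtype=float)
    if rej_outliers:
        # ASSUMPTION: keep entries within 2 standard deviations of the mean
        keep = np.abs(charges - charges.mean()) < 2 * charges.std()
        charges = charges[keep]
    # density=True turns the bin counts into a unit-area distribution
    counts, edges = np.histogram(charges, bins=nbins, density=normalize)
    centers = 0.5 * (edges[:-1] + edges[1:])
    popt, pcov = curve_fit(func, centers, counts, p0=startparams)
    return popt, pcov


def _gauss(x, mu, sigma):
    """Unit-area Gaussian used as the example model."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


rng = np.random.default_rng(42)
popt, _ = fit_charges_sketch(rng.normal(5.0, 1.0, 100_000), _gauss, (4.0, 2.0))
```

Because curve_fit minimizes unweighted squared residuals unless sigma= is passed, all bins are treated as equally precise in this sketch.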

HErmes.fitting.fit.reject_outliers(data, m=2)

A simple way to remove extreme outliers from data.

Parameters:

data (np.ndarray) – data with outliers
m (int) – number of standard deviations beyond which data points are discarded

Returns:	np.ndarray
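
A cut on the distance from the mean, measured in standard deviations, is one common way to implement this. The sketch below assumes that rule; the actual routine may use a different center (e.g. the median).

```python
import numpy as np


def reject_outliers_sketch(data, m=2):
    """Keep only entries closer than m standard deviations to the mean (assumed rule)."""
    data = np.asarray(data, dtype=float)
    return data[np.abs(data - np.mean(data)) < m * np.std(data)]


# the single 50.0 entry lies more than 2 standard deviations from the mean
cleaned = reject_outliers_sketch([1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 50.0])
```

Note that a single extreme value also inflates the standard deviation, so with very few data points a 2-sigma cut may fail to remove it.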

HErmes.fitting.functions module

Provide mathematical functions which can be used to create models. The functions always have to be of the form f(x, *parameters), where x are the input values and the parameters are the quantities to be fitted.

HErmes.fitting.functions.calculate_chi_square(data, model_data)

Very simple estimator for goodness-of-fit. Use with care. Non-normalized bin counts are required.

Parameters:

data (np.ndarray) – observed data (bin counts)
model_data (np.ndarray) – model predictions for each bin

Returns:	np.ndarray
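
Assuming the common Pearson form, each bin contributes (observed - expected)**2 / expected, with the model prediction as the expected count. Returning the per-bin terms matches the documented np.ndarray return type, but whether this function returns the terms or their sum is an assumption here.

```python
import numpy as np


def chi_square_terms(data, model_data):
    """Per-bin Pearson chi2 contributions (sketch; expects raw bin counts)."""
    data = np.asarray(data, dtype=float)
    model_data = np.asarray(model_data, dtype=float)
    return (data - model_data) ** 2 / model_data


observed = np.array([12, 20, 31, 18, 9])
expected = np.array([10, 22, 30, 20, 8])
terms = chi_square_terms(observed, expected)
total = terms.sum()  # 0.4 + 4/22 + 1/30 + 0.2 + 0.125
```

Raw counts matter because the denominator stands in for the Poisson variance, which equals the expected count only before any normalization is applied.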

HErmes.fitting.functions.calculate_reduced_chi_square(data, model_data, sigma)

Very simple estimator for goodness-of-fit. Use with care.

Parameters:

data (np.ndarray) – observed data
model_data (np.ndarray) – model predictions
sigma (np.ndarray) – associated errors
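
A common definition divides the chi2, computed with the supplied per-point errors, by the number of degrees of freedom (data points minus fit parameters). The signature above takes no parameter count, so the n_params argument below is a hypothetical addition for illustration.

```python
import numpy as np


def reduced_chi_square_sketch(data, model_data, sigma, n_params=0):
    """chi2 / ndof with per-point errors sigma; n_params is a hypothetical argument."""
    data, model_data, sigma = (np.asarray(a, dtype=float)
                               for a in (data, model_data, sigma))
    chi2 = np.sum(((data - model_data) / sigma) ** 2)
    ndof = data.size - n_params  # degrees of freedom
    return chi2 / ndof


# pulls of -1, 1, -2, 2 give chi2 = 10; with 4 points and 2 parameters, ndof = 2
r = reduced_chi_square_sketch([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8], [0.1] * 4,
                              n_params=2)
```

Values near 1 indicate that the scatter of the data around the model is consistent with the quoted errors.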

HErmes.fitting.functions.calculate_sigma_from_amp(amp)

Get the sigma of a Gaussian from its peak value. The Gaussian is normalized.

Parameters:	amp (float) – peak value of the Gaussian
Returns:	float
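
For a normalized Gaussian the peak value is amp = 1 / (sigma * sqrt(2*pi)), so the relation can be inverted directly to sigma = 1 / (amp * sqrt(2*pi)). A sketch of that inversion:

```python
import numpy as np


def sigma_from_amp(amp):
    """Invert the peak height of a unit-area Gaussian: amp = 1/(sigma*sqrt(2*pi))."""
    return 1.0 / (amp * np.sqrt(2.0 * np.pi))


# round trip: the peak height of a sigma = 2.5 Gaussian maps back to 2.5
recovered = sigma_from_amp(1.0 / (2.5 * np.sqrt(2.0 * np.pi)))
```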

HErmes.fitting.functions.exponential(x, lmbda)

An exponential model, e.g. for a decay with coefficient lmbda.

Parameters:

x (float) – input
lmbda (float) – the coefficient in the exponent

Returns:	np.ndarray

HErmes.fitting.functions.fwhm_gauss(x, mu, fwhm, amp)

A Gaussian typically used for fits of radiation energy spectra, where resolutions/linewidths are usually given as full width at half maximum (FWHM).

Parameters:

x (float) – input
mu (float) – peak position
fwhm (float) – full width at half maximum
amp (float) – amplitude

Returns:	function value
Return type:	float
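
The conversion rests on the identity FWHM = 2 * sqrt(2 * ln 2) * sigma (about 2.355 sigma), which holds for every Gaussian. The sketch below assumes amp is the peak height; if the library treats it as the area instead, the prefactor changes.

```python
import numpy as np

# sigma = FWHM / (2 * sqrt(2 * ln 2)) for any Gaussian
FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))


def fwhm_gauss_sketch(x, mu, fwhm, amp):
    """Gaussian parametrized by its FWHM; amp is ASSUMED to be the peak height."""
    sigma = fwhm * FWHM_TO_SIGMA
    return amp * np.exp(-0.5 * ((np.asarray(x, dtype=float) - mu) / sigma) ** 2)


peak = fwhm_gauss_sketch(661.7, 661.7, 40.0, 100.0)   # equals amp
half = fwhm_gauss_sketch(681.7, 661.7, 40.0, 100.0)   # mu + fwhm/2 gives amp/2
```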

HErmes.fitting.functions.gauss(x, mu, sigma)

Returns a normalized Gaussian.

Parameters:

x (np.ndarray) – x values
mu (float) – Gaussian mean
sigma (float) – Gaussian sigma
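
A normalized Gaussian is the density exp(-(x - mu)**2 / (2 sigma**2)) / (sigma sqrt(2 pi)). The sketch below follows the f(x, *parameters) convention of this module and checks numerically that the area is 1.

```python
import numpy as np


def gauss_sketch(x, mu, sigma):
    """Unit-area Gaussian density evaluated at x."""
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


# Riemann sum over a grid much wider than sigma: the area comes out as ~1
xs = np.linspace(-10.0, 10.0, 20001)
area = np.sum(gauss_sketch(xs, 0.5, 1.3)) * (xs[1] - xs[0])
```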

HErmes.fitting.functions.n_gauss(x, mu, sigma, n)

Returns a normalized Gaussian in the case n == 1. If n > 1, the Gaussian mean is shifted by n and its width is enlarged by a factor of n. The envelope of a sequence of these Gaussians will be an exponential.

Parameters:

x (np.ndarray) – x values
mu (float) – Gaussian mean
sigma (float) – Gaussian sigma
n (int) – > 0, linear coefficient
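
One common reading of this description, used for the multi-photoelectron peaks of a charge spectrum, is the n-fold convolution of the single peak: mean n * mu and variance n * sigma**2, so the width grows as sqrt(n) * sigma. That interpretation is an assumption; the text above may instead mean a width scaled linearly with n.

```python
import numpy as np


def n_gauss_sketch(x, mu, sigma, n):
    """n-th peak: mean n*mu, variance n*sigma**2 (ASSUMED convolution scaling)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    x = np.asarray(x, dtype=float)
    mean = n * mu
    width = np.sqrt(n) * sigma
    return np.exp(-0.5 * ((x - mean) / width) ** 2) / (width * np.sqrt(2.0 * np.pi))


grid = np.linspace(0.0, 10.0, 10001)
peak3 = grid[np.argmax(n_gauss_sketch(grid, 1.5, 0.3, 3))]  # third peak at 3 * 1.5
```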

HErmes.fitting.functions.pandel_factory(c_ice)

Create a Pandel function with the defined parameters. The Pandel function is very specific: it is a parametrisation of the delay time distribution of photons from a source s measured at a receiver r after traversing a certain large (compared to the size of source or receiver) distance in a homogeneous scattering medium such as ice or water. The version here has a number of fixed parameters optimized for IceCube. This function will generate a Pandel function with a single free parameter, which is the distance between source and receiver.

Parameters:	c_ice (float) – group velocity in ice in m/ns
Returns:	callable (float, float) -> float
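
The Pandel function as commonly written in AMANDA/IceCube reconstruction is p(t) = 1/N(d) * tau**(-d/lambda) * t**(d/lambda - 1) / Gamma(d/lambda) * exp(-t * (1/tau + c_ice/lambda_a) - d/lambda_a), with N(d) = exp(-d/lambda_a) * (1 + tau * c_ice/lambda_a)**(-d/lambda). Dividing out N(d) leaves a Gamma density in t, which the sketch evaluates in log space. The factory below is hypothetical: the constants (tau = 557 ns, lambda = 33.3 m, lambda_a = 98 m) are values quoted in the AMANDA literature, not ones confirmed for HErmes, and the argument order (time, distance) of the returned callable is an assumption.

```python
import math

import numpy as np


def pandel_factory_sketch(c_ice, tau=557.0, lambda_s=33.3, lambda_a=98.0):
    """Return pandel(t, distance), a normalized delay-time pdf.

    tau [ns], lambda_s and lambda_a [m] are ASSUMED constants from the
    AMANDA literature; HErmes may fix different values.
    """
    rate = 1.0 / tau + c_ice / lambda_a  # inverse time scale in 1/ns

    def pandel(t, distance):
        # After dividing by N(d), the Pandel function is a Gamma pdf in t
        # with shape distance / lambda_s and the rate computed above.
        t = np.asarray(t, dtype=float)
        shape = distance / lambda_s
        log_p = (shape * np.log(rate) + (shape - 1.0) * np.log(t)
                 - rate * t - math.lgamma(shape))
        return np.exp(log_p)

    return pandel


pandel = pandel_factory_sketch(c_ice=0.22)  # assumed group velocity in m/ns
ts = np.linspace(0.05, 20000.0, 400001)  # delay times in ns
area = np.sum(pandel(ts, 50.0)) * (ts[1] - ts[0])  # close to 1
```

Returning a closure keeps the fixed medium parameters out of the fit, so only the distance remains as a free parameter.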

HErmes.fitting.functions.poisson(x, lmbda)

Poisson probability

Parameters:

x (int) – measured number of occurrences
lmbda (int) – expected number of occurrences

Returns:	np.ndarray
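
The Poisson probability is lmbda**x * exp(-lmbda) / x!. Evaluating it through the log-gamma function avoids overflowing x! for large counts. A sketch, assuming lmbda > 0:

```python
import numpy as np
from scipy.special import gammaln


def poisson_sketch(x, lmbda):
    """P(x | lmbda) = lmbda**x * exp(-lmbda) / x!, computed in log space (lmbda > 0)."""
    x = np.asarray(x, dtype=float)
    return np.exp(x * np.log(lmbda) - lmbda - gammaln(x + 1.0))


p0 = poisson_sketch(0, 2.0)                         # exp(-2)
total = np.sum(poisson_sketch(np.arange(60), 2.0))  # probabilities sum to ~1
```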

HErmes.fitting.functions.williams_correction()

The so-called Williams correction can help to correct a chi2 value in the case of bins with low statistics (< 5 entries).

HErmes.fitting.model module

Provide a simple, easy-to-use model for fitting data, especially distributions. The model can have “components”, which can be defined and fitted individually.

class HErmes.fitting.model.Model(func, startparams=None, limits=((-inf, inf), ), errors=(10.0, ), func_norm=1)

Bases: object

Describe data with a prediction. The Model class allows setting a function for data prediction and fitting it to the data by means of a chi2 fit. It is possible to use a collection of functions to describe a complex model, e.g. a Gaussian plus an exponential tail. The individual models can be fitted independently, which results in sum_i n_i fit parameters for i models with n_i parameters each. Alternatively, they can be coupled and share parameters, which results in sum_i n_i - n_ij fit parameters, where n_ij is the number of parameters saved by sharing (a parameter shared by m components is fitted once instead of m times).

add_data(data, data_errs=None, bins=200, create_distribution=False, normalize=False, density=True, xs=None, subtract=None)

Add some data to the model in preparation for the fit. There are two modes:

1) The data needs to be histogrammed. In that case, set ‘bins’ appropriately and set ‘create_distribution’.
2) The data does NOT need to be histogrammed. In that case, bins has no meaning. For a meaningful calculation of chi2, the errors of the data points need to be passed to data_errs.

Parameters:	data (np.array) – input data

Keyword Args

data_errs (np.array) : errors of the data for the chi2 calculation (only used when not histogramming)
bins (int/np.array) : number of bins or bin array to be passed to the histogramming routine
create_distribution (bool) : the data requires the creation of a histogram first before fitting
subtract (callable) : ?
normalize (bool) : normalize the data before adding
density (bool) : if normalized, assume the data is a pdf. If False, use the bin count for normalization.

Returns:	None
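
For the histogramming mode, a plausible preparation step is to bin the raw data, use the bin centers as x values, and attach Poisson errors sqrt(N) to the counts before any normalization. The helper histogram_with_errors below is hypothetical; it only mirrors the bins, normalize and density options described above, and its normalization rules are assumptions.

```python
import numpy as np


def histogram_with_errors(data, bins=200, normalize=False, density=True):
    """Hypothetical helper: bin raw data and attach sqrt(N) errors."""
    counts, edges = np.histogram(np.asarray(data, dtype=float), bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    errs = np.sqrt(counts)  # Poisson errors of the raw bin counts
    counts = counts.astype(float)
    if normalize:
        if density:
            # ASSUMED pdf normalization: unit area under the histogram
            scale = counts.sum() * np.diff(edges)
        else:
            # ASSUMED bin-count normalization: divide by the total count
            scale = counts.sum()
        counts = counts / scale
        errs = errs / scale  # errors scale with the counts
    return centers, counts, errs


rng = np.random.default_rng(1)
sample = rng.exponential(2.0, size=5000)
x, y, yerr = histogram_with_errors(sample, bins=50, normalize=True)
```

Computing the errors before normalizing keeps them tied to the raw counts, which is what the Poisson assumption requires.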

add_first_guess(func)

Use func to estimate better start parameters for the initialization of the fit.

Parameters:	func (callable) – The function func must have the same number of parameters as there are start parameters.
Returns:	None

clear()

Reset the model. This basically deletes all components and resets the start parameters.

Returns:	None

components

construct_error_function(startparams, errors, limits, errordef)

Construct the error function together with the necessary parameters for minuit.

Parameters:

startparams (tuple) – A set of start parameters, one per function parameter. A good choice of start parameters helps the fit a lot.
limits (tuple) – individual min/max limits for each parameter, one (min, max) tuple per parameter
errors (tuple) – one value per parameter, giving a 1 sigma error estimate
errordef (float) – The errordef should be 1 for a least-squares fit (which is what this is constructed for) or 0.5 in case of a likelihood fit

Returns:

tuple (callable, dict)

couple_all_models()

“Lock” the model after all components have been added. This will determine a set of start parameters. After this, no more models can be coupled or added.

Returns:	None

couple_models(coupling_variable)

Couple the models by a variable, which means the variable is not used independently in each model component, but fitted only once. E.g. if there are 2 models with parameters p1, p2, k each and they are coupled by k, the parameters p11, p21, p12, p22, and k will be fitted instead of p11, p21, k1, p12, p22, k2.

Parameters:	coupling_variable – index of the variable in startparams, i.e. its position in the respective tuple.
Returns:	None

distribution

eval_first_guess(data)

Assign a new set of start parameters obtained by calling the first guess method.

Parameters:	data (np.ndarray) – input data, used to evaluate the first guess method.
Returns:	None

extract_parameters()

Get the variable names and coupling references for the individual model components

Returns:	tuple

fit_to_data(silent=False, use_minuit=True, errors=None, limits=None, errordef=1, debug_minuit=False, **kwargs)

Apply this model to data. This will perform the fit with the help of either minuit or scipy.optimize. The data has to be added to the model beforehand with add_data.

Parameters:

silent (bool) – silence output
use_minuit (bool) – use minuit for fitting
errors (list) – errors for minuit, see the minuit manual
limits (list of tuples) – limits for minuit, see the minuit manual
errordef (float) – typically 1 for a chi2 fit and 0.5 for an llh fit. This class is currently set up for a least-squares fit, so this should not be changed.
debug_minuit (bool) – if True, attach the iminuit instance to the model so that it can be inspected later on. Will raise an error if use_minuit is set to False at the same time.
**kwargs – will be passed on to scipy.optimize.curve_fit

Returns:

None

get_minuit_instance()

If a previous fit has been done with debug_minuit enabled, the iminuit instance can now be accessed.

n_free_params

The number of free parameters of this model. In a least-squares fit, the free parameters are the number of data points minus the number of fit parameters.

Returns:	int

plot_result(ymin=1000, xmax=8, ylabel='normed bincount', xlabel='Q [C]', fig=None, log=True, figure_factory=None, axes_range='auto', model_alpha=0.3, add_parameter_text=(('$\\mu_{{SPE}}$& {:4.2e}\\\\', 0), ), histostyle='scatter', datacolor='k', modelcolor='r')

Show the fit result, together with the fitted data.

Parameters:

ymin (float) – limit the yrange to ymin
xmax (float) – limit the xrange to xmax
model_alpha (float) – 0 <= x <= 1 the alpha value of the lineplot for the model
ylabel (str) – label for the y axis
xlabel (str) – label for the x axis
log (bool) – plot in log scale
figure_factory (fnc) – used to generate the figure
axes_range (str) – the “field of view” to show
fig (pylab.figure) – A figure instance
add_parameter_text (tuple) – Display a parameter in the table on the plot ((text, parameter_number), (text, parameter_number),…)
datacolor (str) – color for the data points
modelcolor (str) – color for the model prediction

Returns:

pylab.figure

set_distribution(distr)

Add a distribution to the model. The distribution should contain the data we want to model.

Parameters:	distr (dashi.histogram) – the histogram holding the data to be modeled
Returns:	None

HErmes.fitting.model.concat_functions(fncs)

Inspect functions and construct a new one which returns the summed result: concat_functions(A(x, apars), B(x, bpars)) -> C(x, apars, bpars), where C(x, apars, bpars) returns A(x, apars) + B(x, bpars).

Parameters:	fncs (list) – the callables to concatenate
Returns:	tuple (callable, list(pars))
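
The inspection step can be sketched with inspect.signature: count each component's parameters after the leading x, then slice the flat parameter tuple accordingly. The function concat_functions_sketch is hypothetical; in particular, it does not rename parameters that appear in more than one component, which a real implementation would need to handle.

```python
import inspect

import numpy as np


def concat_functions_sketch(fncs):
    """Return (joint, names) with joint(x, *pars) = sum_i f_i(x, pars_i)."""
    # number of fit parameters of each component (its first argument is x)
    n_pars = [len(inspect.signature(f).parameters) - 1 for f in fncs]
    names = [name for f in fncs
             for name in list(inspect.signature(f).parameters)[1:]]

    def joint(x, *pars):
        if len(pars) != sum(n_pars):
            raise TypeError(f"expected {sum(n_pars)} parameters, got {len(pars)}")
        total, start = 0.0, 0
        for f, n in zip(fncs, n_pars):
            total = total + f(x, *pars[start:start + n])
            start += n
        return total

    return joint, names


def line(x, a, b):
    return a * x + b


def expo(x, lmbda):
    return np.exp(-lmbda * x)


joint, names = concat_functions_sketch([line, expo])
value = joint(np.array([0.0, 1.0]), 2.0, 1.0, 0.5)  # a=2, b=1, lmbda=0.5
```

Because joint only accepts *pars, tools that infer the parameter count from the signature (such as scipy.optimize.curve_fit) need explicit start values.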

HErmes.fitting.model.construct_efunc(x, data, jointfunc, joint_pars)

Construct a least-squares error function. This function will then be minimized, e.g. with the help of minuit.

Parameters:

x (np.ndarray) – the x values the fit should be evaluated on
data (np.ndarray) – the y values of the data we want to describe
jointfunc (callable) – the full data model with all components
joint_pars (tuple) – the model parameters

Returns:	callable
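
In its simplest form, such an error function closes over x and data and returns the sum of squared residuals for a given parameter set. The sketch below is unweighted and takes the parameters as *pars; the real function also receives joint_pars, presumably to give the error function named arguments for minuit, which this sketch omits.

```python
import numpy as np


def construct_efunc_sketch(x, data, jointfunc):
    """Unweighted least-squares error function of the model parameters."""
    x = np.asarray(x, dtype=float)
    data = np.asarray(data, dtype=float)

    def efunc(*pars):
        residuals = data - jointfunc(x, *pars)
        return float(np.sum(residuals ** 2))

    return efunc


xs = np.linspace(0.0, 1.0, 11)
efunc = construct_efunc_sketch(xs, 2.0 * xs + 1.0, lambda x, a, b: a * x + b)
at_truth = efunc(2.0, 1.0)  # 0 at the true parameters
off_by_one = efunc(2.0, 0.0)  # 11 points, each residual 1
```

Any scalar minimizer can then be applied, e.g. scipy.optimize.minimize(lambda p: efunc(*p), x0).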

HErmes.fitting.model.copy_func(f)

Based on http://stackoverflow.com/a/6528148/190597 (Glenn Maynard)

Basically, recreate the function f independently.

Parameters:	f (callable) – the function to be cloned
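
The technique from the linked answer builds a new function object from the original's code, globals, defaults and closure, then copies metadata with functools.update_wrapper. The sketch below follows that recipe and additionally takes a shallow copy of the keyword-only defaults, so that editing them on the clone cannot touch the original.

```python
import copy
import functools
import types


def copy_func_sketch(f):
    """Return an independent copy of the plain Python function f."""
    g = types.FunctionType(f.__code__, f.__globals__, name=f.__name__,
                           argdefs=f.__defaults__, closure=f.__closure__)
    g = functools.update_wrapper(g, f)  # copies __doc__, __name__, __dict__, ...
    g.__kwdefaults__ = copy.copy(f.__kwdefaults__)
    return g


def scaled(x, k=2):
    return k * x


clone = copy_func_sketch(scaled)
clone.__defaults__ = (3,)  # changes only the copy
```

This is useful when a component function must be modified, e.g. given new defaults, without side effects on the function the user passed in.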

HErmes.fitting.model.create_minuit_pardict(fn, startparams, errors, limits, errordef)

Construct a dictionary for minuit fitting. This dictionary contains information for the minuit fitter, such as start parameters or limits.

Parameters:

fn (callable) – The function for which
startparams (tuple) – a list of start parameters, one per parameter
errors (list) – ?
limits (list(tuple)) – a list of (min, max) tuples, one for each parameter, can be None
errordef (float) – The errordef should be 1 for a least-squares fit (which is what this is constructed for) or 0.5 in case of a likelihood fit

Returns:	dict
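
iminuit releases before 2.0 accepted start values, step sizes and limits as keyword arguments named x, error_x and limit_x, plus errordef. Assuming that convention, and assuming the parameter names can be read from fn's signature, such a dictionary might be assembled as follows; the exact keys HErmes produces are not documented here.

```python
import inspect


def create_minuit_pardict_sketch(fn, startparams, errors, limits, errordef):
    """Build keyword arguments in the ASSUMED iminuit 1.x style."""
    names = list(inspect.signature(fn).parameters)
    if len(names) != len(startparams):
        raise ValueError("need exactly one start parameter per function parameter")
    pardict = {"errordef": errordef}
    for i, name in enumerate(names):
        pardict[name] = startparams[i]
        if errors is not None:
            pardict["error_" + name] = errors[i]
        if limits is not None:
            pardict["limit_" + name] = limits[i]
    return pardict


def chi2(mu, sigma):
    return (mu - 1.0) ** 2 + (sigma - 2.0) ** 2


pardict = create_minuit_pardict_sketch(chi2, (0.0, 1.0), (0.1, 0.1),
                                       ((-5, 5), (0, None)), 1)
```

iminuit 2.x dropped these keywords in favour of attributes such as Minuit.errors, Minuit.limits and Minuit.errordef, so a dictionary like this only applies to the older interface.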

Module contents

Provide an easy-to-use, intuitive way of fitting models with different components to data. The focus is less on statistically sophisticated fitting than on an explorative approach to data investigation. This might help answer questions of the form “How compatible is this data with a Gaussian + Exponential?”. Out of the box, this module provides tools targeted at least-squares fits; however, in principle this could be extended to likelihood fits.

Currently, the minimized error function is generated automatically, and only for the least-squares case; this might be expanded in the future.