HErmes.fitting package

HErmes.fitting.fit module

Provide routines for fitting charge histograms

HErmes.fitting.fit.fit_model(charges, model, startparams=None, rej_outliers=False, nbins=200, silent=False, parameter_text=(('$\\mu_{{SPE}}$& {:4.2e}\\\\', 5), ), use_minuit=False, normalize=True, **kwargs)[source]

Standardized fitting routine.

Parameters:
  • charges (np.ndarray) – Charges obtained in a measurement (no histogram)
  • model (pyosci.fit.Model) – A model to fit to the data
  • startparams (tuple) – initial parameters to model, or None for first guess
Keyword Arguments:
 
  • rej_outliers (bool) – Remove extreme outliers from data
  • nbins (int) – Number of bins
  • parameter_text (tuple) – will be passed to model.plot_result
  • use_minuit (bool) – use minuit to minimize startparams for best chi2
  • normalize (bool) – normalize data before fitting
  • silent (bool) – silence output
Returns:

tuple

HErmes.fitting.fit.reject_outliers(data, m=2)[source]

A simple way to remove extreme outliers from data

Parameters:
  • data (np.ndarray) – data with outliers
  • m (int) – number of standard deviations beyond which data points are discarded
Returns:

np.ndarray
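A minimal sketch of such a cut, assuming a point is kept when it lies within m standard deviations of the mean. The name reject_outliers_sketch is hypothetical, plain lists stand in for np.ndarray, and the criterion actually used by HErmes may differ.

```python
import statistics


def reject_outliers_sketch(data, m=2):
    """Keep values within m standard deviations of the mean (assumed criterion)."""
    mean = statistics.mean(data)
    std = statistics.pstdev(data)  # population standard deviation
    return [x for x in data if abs(x - mean) < m * std]


charges = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0, 50.0]
cleaned = reject_outliers_sketch(charges, m=2)  # drops the value 50.0
```

Because the outlier itself inflates the standard deviation, a single pass with a small m can leave moderate outliers in place.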

HErmes.fitting.functions module

Provide mathematical functions which can be used to create models. The functions always have to be of the form f(x, *parameters), where the parameters will be fitted and x are the input values.

HErmes.fitting.functions.calculate_chi_square(data, model_data)[source]

Very simple estimator for goodness-of-fit. Use with care. Non-normalized bin counts are required.

Parameters:
  • data (np.ndarray) – observed data (bincounts)
  • model_data (np.ndarray) – model predictions for each bin
Returns:

np.ndarray
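As an illustration, the sketch below computes Pearson's chi-square on raw bin counts. That this is the estimator behind calculate_chi_square is an assumption, and chi_square_sketch is a hypothetical name.

```python
def chi_square_sketch(data, model_data):
    """Pearson chi-square: sum over bins of (observed - expected)^2 / expected."""
    return sum((obs - exp) ** 2 / exp for obs, exp in zip(data, model_data))


observed = [10, 20, 30]  # raw (non-normalized) bin counts
expected = [12, 18, 30]  # model prediction for each bin
chi2 = chi_square_sketch(observed, expected)
```

The division by the expected count treats each bin as Poisson distributed, which is why normalized histograms would give a meaningless value here.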

HErmes.fitting.functions.calculate_reduced_chi_square(data, model_data, sigma)[source]

Very simple estimator for goodness-of-fit. Use with care.

Parameters:
  • data (np.ndarray) – observed data
  • model_data (np.ndarray) – model predictions
  • sigma (np.ndarray) – associated errors

Returns:
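A reduced chi-square is commonly defined as the error-weighted sum of squared residuals divided by the number of degrees of freedom. The sketch below follows that textbook definition; the n_params argument is hypothetical (the documented signature has no such argument), so how HErmes counts degrees of freedom is not shown here.

```python
def reduced_chi_square_sketch(data, model_data, sigma, n_params=0):
    """Weighted chi-square divided by (number of points - number of fit parameters)."""
    chi2 = sum(((d - m) / s) ** 2 for d, m, s in zip(data, model_data, sigma))
    ndof = len(data) - n_params  # assumed definition of the degrees of freedom
    return chi2 / ndof


data = [1.0, 2.0, 3.0, 4.0]
model_data = [1.5, 2.0, 2.5, 4.0]
sigma = [0.5, 0.5, 0.5, 0.5]
red_chi2 = reduced_chi_square_sketch(data, model_data, sigma, n_params=2)
```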

HErmes.fitting.functions.calculate_sigma_from_amp(amp)[source]

Get the sigma of the Gaussian from its peak value. The Gaussian is assumed to be normed.

Parameters:amp (float) –
Returns:float
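For a normed Gaussian the peak value is 1 / (sigma * sqrt(2 * pi)), so sigma follows by inverting that relation. A short sketch of this arithmetic, with hypothetical function names:

```python
import math


def gauss_sketch(x, mu, sigma):
    """Gaussian normalized to unit area."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def sigma_from_amp_sketch(amp):
    """Invert amp = 1 / (sigma * sqrt(2 * pi)) for a normed Gaussian."""
    return 1.0 / (amp * math.sqrt(2 * math.pi))


peak = gauss_sketch(0.0, 0.0, 2.0)  # peak height for sigma = 2
recovered_sigma = sigma_from_amp_sketch(peak)
```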
HErmes.fitting.functions.exponential(x, lmbda)[source]

An exponential model, e.g. for a decay with coefficient lmbda.

Parameters:
  • x (float) – input
  • lmbda (float) – The exponent of the exponential
Returns:

np.ndarray

HErmes.fitting.functions.fwhm_gauss(x, mu, fwhm, amp)[source]

A gaussian typically used for energy spectra fits of radiation, where resolutions/linewidths are typically given as full width at half maximum (fwhm).

Parameters:
  • x (float) – input
  • mu (float) – peak position
  • fwhm (float) – full width half maximum
  • amp (float) – amplitude
Returns:

function value

Return type:

float
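The full width at half maximum and the standard deviation of a Gaussian are related by fwhm = 2 * sqrt(2 * ln 2) * sigma. The sketch below uses that relation and assumes amp is the peak height; both the name fwhm_gauss_sketch and that reading of amp are assumptions.

```python
import math


def fwhm_gauss_sketch(x, mu, fwhm, amp):
    """Gaussian parametrized by its full width at half maximum."""
    sigma = fwhm / (2 * math.sqrt(2 * math.log(2)))
    return amp * math.exp(-0.5 * ((x - mu) / sigma) ** 2)


mu, fwhm, amp = 5.0, 2.0, 10.0
at_peak = fwhm_gauss_sketch(mu, mu, fwhm, amp)
at_half_width = fwhm_gauss_sketch(mu + fwhm / 2, mu, fwhm, amp)
```

Evaluating the function half a width away from the peak gives half the peak value, which is a quick way to check the parametrization.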

HErmes.fitting.functions.gauss(x, mu, sigma)[source]

Returns a normed gaussian.

Parameters:
  • x (np.ndarray) – x values
  • mu (float) – Gauss mu
  • sigma (float) – Gauss sigma

Returns:

HErmes.fitting.functions.n_gauss(x, mu, sigma, n)[source]

Returns a normed gaussian in the case of n == 1. If n > 1, the gaussian mean is shifted by n and its width is enlarged by a factor of n. The envelope of a sequence of these gaussians will be an exponential.

Parameters:
  • x (np.ndarray) – x values
  • mu (float) – Gauss mu
  • sigma (float) – Gauss sigma
  • n (int) – > 0, linear coefficient

Returns:

HErmes.fitting.functions.pandel_factory(c_ice)[source]

Create a pandel function with the defined parameters. The pandel function is very specific: it is a parametrisation of the delay time distribution of photons from a source s measured at a receiver r after traversing a certain large (compared to the size of source or receiver) distance in a homogeneous scattering medium such as ice or water. The version here has a number of fixed parameters optimized for IceCube. This function will generate a Pandel function with a single free parameter, which is the distance between source and receiver.

Parameters:c_ice (float) – group velocity in ice in m/ns
Returns:callable (float, float) -> float
HErmes.fitting.functions.poisson(x, lmbda)[source]

Poisson probability

Parameters:
  • x (int) – measured number of occurrences
  • lmbda (int) – expected number of occurrences
Returns:

np.ndarray
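The Poisson probability of observing x occurrences when lmbda are expected is lmbda**x * exp(-lmbda) / x!. A scalar sketch of that formula, with the hypothetical name poisson_sketch:

```python
import math


def poisson_sketch(x, lmbda):
    """Poisson probability mass function for a non-negative integer x."""
    return lmbda ** x * math.exp(-lmbda) / math.factorial(x)


p_zero = poisson_sketch(0, 2.0)  # probability of seeing nothing when 2 are expected
total = sum(poisson_sketch(k, 3.0) for k in range(60))  # should be close to 1
```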

HErmes.fitting.functions.williams_correction()[source]

The so-called Williams correction can help to correct a chi2 value in case of bins with low statistics (< 5 entries)

HErmes.fitting.model module

Provide a simple, easy to use model for fitting data and especially distributions. The model is capable of having “components”, which can be defined and fitted individually.

class HErmes.fitting.model.Model(func, startparams=None, limits=((-inf, inf), ), errors=(10.0, ), func_norm=1)[source]

Bases: object

Describe data with a prediction. The Model class allows setting a function for data prediction and fitting it to the data by means of a chi2 fit. It is possible to use a collection of functions to describe a complex model, e.g. Gaussian + some exponential tail. The individual models can be fitted independently, which results in sum_i n_i degrees of freedom for i models with n_i parameters each, or alternatively they can be coupled and share parameters, which results in sum_i n_i - n_ij degrees of freedom, where n_ij is the number of shared parameters.

add_data(data, data_errs=None, bins=200, create_distribution=False, normalize=False, density=True, xs=None, subtract=None)[source]

Add some data to the model, in preparation for the fit. There are two modes:

  1. Data needs to be histogrammed. In that case, make sure to set ‘bins’ appropriately and set ‘create_distribution’.
  2. Data does NOT need to be histogrammed. In that case, ‘bins’ has no meaning. For a meaningful calculation of chi2, the errors of the data points need to be given to data_errs.
Parameters:data (np.array) – input data
Keyword Arguments:
 
  • data_errs (np.array) – errors of the data for chi2 calculation (only used when not histogramming)
  • bins (int/np.array) – number of bins or bin array to be passed to the histogramming routine
  • create_distribution (bool) – data requires the creation of a histogram first before fitting
  • subtract (callable) – ?
  • normalize (bool) – normalize the data before adding
  • density (bool) – if normalized, assume the data is a pdf. If False, use bincount for normalization.
Returns:None
add_first_guess(func)[source]

Use func to estimate better startparameters for the initialization of the fit.

Parameters:func (callable) – The function func has to have the same number of parameters as there are startparameters.
Returns:None
clear()[source]

Reset the model. This basically deletes all components and resets the startparameters.

Returns:None
components
construct_error_function(startparams, errors, limits, errordef)[source]

Construct the error function together with the necessary parameters for minuit.

Parameters:
  • startparams (tuple) – A set of startparameters. 1 start parameter per function parameter. A good choice of start parameters helps the fit a lot.
  • limits (tuple) – individual (min, max) limits, one tuple per parameter
  • errors (tuple) – One value per parameter, giving a 1-sigma error estimate
  • errordef (float) – The errordef should be 1 for a least-squares fit (which is what all of this is constructed for) or 0.5 in case of a likelihood fit
Returns:

tuple (callable, dict)

couple_all_models()[source]

“Lock” the model after all components have been added. This will determine a set of startparameters. After this, no other models can be coupled/added any more.

Returns:None
couple_models(coupling_variable)[source]

Couple the models by a variable, which means the variable is not used independently in all model components, but is fitted only once. E.g. if there are 2 models with parameters p1, p2, k each and they are coupled by k, parameters p11, p21, p12, p22, and k will be fitted instead of p11, p21, k1, p12, p22, k2.

Parameters:coupling_variable – index of the variable in startparams. This must be the index into the respective tuple.
Returns:None
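The bookkeeping behind coupling can be sketched as follows for two components with parameters p1, p2, k each. The helper coupled_parameter_names is hypothetical and selects the coupled variable by name for readability, whereas couple_models expects an index; it only illustrates how the list of fitted parameters shrinks.

```python
def coupled_parameter_names(n_models, names, coupled):
    """List the fitted parameters when one variable is shared by all components."""
    fitted = []
    for i in range(1, n_models + 1):
        for name in names:
            if name != coupled:
                fitted.append(f"{name}{i}")  # independent copy per component
    fitted.append(coupled)  # the shared variable is fitted only once
    return fitted


fitted = coupled_parameter_names(2, ("p1", "p2", "k"), "k")
n_uncoupled = 2 * 3  # every component keeps its own p1, p2, k
```

Coupling by k therefore reduces the number of fitted parameters from six to five in this example.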
distribution
eval_first_guess(data)[source]

Assign a new set of start parameters obtained by calling the first guess method.

Parameters:data (np.ndarray) – input data, used to evaluate the first guess method.
Returns:None
extract_parameters()[source]

Get the variable names and coupling references for the individual model components

Returns:tuple
fit_to_data(silent=False, use_minuit=True, errors=None, limits=None, errordef=1, debug_minuit=False, **kwargs)[source]

Apply this model to data. This will perform the fit with the help of either minuit or scipy.optimize.

Parameters:
  • data (np.ndarray) – the data, unbinned
  • silent (bool) – silence output
  • use_minuit (bool) – use minuit for fitting
  • errors (list) – errors for minuit, see minuit manual
  • limits (list of tuples) – limits for minuit, see minuit manual
  • errordef (float) – typically 1 for a chi2 fit and 0.5 for an llh fit. This class is currently set up as a least-squares fit, so this should not be changed
  • debug_minuit (int) – if True, attach the iminuit instance to the model so that it can be inspected later on. Will raise an error if use_minuit is set to False at the same time
  • **kwargs – will be passed on to scipy.optimize.curve_fit
Returns:

None

get_minuit_instance()[source]

If a previous fit has been done with debug_minuit set, the minuit instance can now be accessed.

n_free_params

The number of free parameters of this model. The number of free parameters in a least-squares fit is the number of data points minus the number of fit parameters.

Returns:int
plot_result(ymin=1000, xmax=8, ylabel='normed bincount', xlabel='Q [C]', fig=None, log=True, figure_factory=None, axes_range='auto', model_alpha=0.3, add_parameter_text=(('$\\mu_{{SPE}}$& {:4.2e}\\\\', 0), ), histostyle='scatter', datacolor='k', modelcolor='r')[source]

Show the fit result, together with the fitted data.

Parameters:
  • ymin (float) – limit the yrange to ymin
  • xmax (float) – limit the xrange to xmax
  • model_alpha (float) – 0 <= x <= 1 the alpha value of the lineplot for the model
  • ylabel (str) – label for yaxis
  • log (bool) – plot in log scale
  • figure_factory (fnc) – Use to generate the figure
  • axes_range (str) – the “field of view” to show
  • fig (pylab.figure) – A figure instance
  • add_parameter_text (tuple) – Display a parameter in the table on the plot ((text, parameter_number), (text, parameter_number),…)
  • datacolor (str) – color for the data points
  • modelcolor (str) – color for the model prediction
Returns:

pylab.figure

set_distribution(distr)[source]

Add a distribution to the model. The distribution shall contain the data we want to model.

Parameters:distr (dashi.histogram) –
Returns:None
HErmes.fitting.model.concat_functions(fncs)[source]

Inspect functions and construct a new one which returns the added result: concat_functions(A(x, apars), B(x, bpars)) -> C(x, apars, bpars), where C(x, apars, bpars) returns A(x, apars) + B(x, bpars).

Parameters:fncs (list) – The callables to concat
Returns:tuple (callable, list(pars))
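One way to build such a summed function is to read each callable's signature, count its parameters after x, and slice a flat parameter tuple accordingly. This is a sketch under assumptions: concat_functions_sketch and the suffixing of parameter names with the component index are hypothetical, and the actual implementation may construct the joint signature differently.

```python
import inspect


def concat_functions_sketch(fncs):
    """Return (joint, names), where joint(x, *pars) sums all functions in fncs."""
    counts, joint_pars = [], []
    for i, f in enumerate(fncs):
        names = list(inspect.signature(f).parameters)[1:]  # drop the leading x
        counts.append(len(names))
        joint_pars.extend(f"{name}{i}" for name in names)  # keep names unique

    def joint(x, *pars):
        if len(pars) != sum(counts):
            raise TypeError(f"expected {sum(counts)} parameters, got {len(pars)}")
        total, start = 0.0, 0
        for f, n in zip(fncs, counts):
            total += f(x, *pars[start:start + n])  # hand each function its slice
            start += n
        return total

    return joint, joint_pars


def line(x, a, b):
    return a * x + b


def const(x, c):
    return c


joint, joint_pars = concat_functions_sketch([line, const])
value = joint(2.0, 3.0, 1.0, 5.0)  # line(2, 3, 1) + const(2, 5)
```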
HErmes.fitting.model.construct_efunc(x, data, jointfunc, joint_pars)[source]

Construct a least-squares error function. This function will then be minimized, e.g. with the help of minuit.

Parameters:
  • x (np.ndarray) – The x-values the fit should be evaluated on
  • data (np.ndarray) – The y-values of the data we want to describe
  • jointfunc (callable) – The full data model with all components
  • joint_pars (tuple) – The model parameters
Returns:

callable
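A least-squares error function of this kind can be sketched as a closure over the data that takes only the fit parameters and returns the sum of squared residuals. The sketch assumes unweighted residuals and omits the joint_pars argument of the documented signature; construct_efunc_sketch is a hypothetical name.

```python
def construct_efunc_sketch(x, data, jointfunc):
    """Build efunc(*pars) returning the sum of squared residuals."""
    def efunc(*pars):
        return sum((d - jointfunc(xi, *pars)) ** 2 for xi, d in zip(x, data))
    return efunc


def line(x, slope, offset):
    return slope * x + offset


efunc = construct_efunc_sketch([0.0, 1.0, 2.0], [1.0, 3.0, 5.0], line)
at_minimum = efunc(2.0, 1.0)   # the data lie exactly on 2 * x + 1
off_minimum = efunc(1.0, 1.0)  # residuals 0, 1, 2
```

A minimizer then only has to vary the arguments of efunc; the data never appear in its signature.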

HErmes.fitting.model.copy_func(f)[source]

Based on http://stackoverflow.com/a/6528148/190597 (Glenn Maynard)

Basically recreate the function f independently.

Parameters:f (callable) – the function f will be cloned
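The recipe in the linked answer rebuilds a function object from its code, globals, defaults and closure. A sketch along those lines, using only the standard library; copy_func_sketch is a hypothetical name and the details may differ from the HErmes implementation.

```python
import functools
import types


def copy_func_sketch(f):
    """Create a new function object that shares f's code but is independent of f."""
    g = types.FunctionType(
        f.__code__,
        f.__globals__,
        name=f.__name__,
        argdefs=f.__defaults__,
        closure=f.__closure__,
    )
    g = functools.update_wrapper(g, f)  # copy name, docstring and __dict__
    g.__kwdefaults__ = f.__kwdefaults__  # keyword-only defaults
    return g


def scaled(x, factor=2):
    """Multiply x by factor."""
    return x * factor


clone = copy_func_sketch(scaled)
clone.__defaults__ = (10,)  # changing the clone leaves the original untouched
```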
HErmes.fitting.model.create_minuit_pardict(fn, startparams, errors, limits, errordef)[source]

Construct a dictionary for minuit fitting. This dictionary contains information for the minuit fitter like startparams or limits.

Parameters:
  • fn (callable) – The function for which
  • startparams (tuple) – A list of startparameters, one per parameter
  • errors (list) –

    ?

  • limits (list(tuple)) – A list of (min, max) tuples for each parameter, can be None
  • errordef (float) – The errordef should be 1 for a least-squares fit (which is what all of this is constructed for) or 0.5 in case of a likelihood fit
Returns:

dict
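A sketch of what such a dictionary could look like, assuming the keyword convention of older iminuit releases (before 2.0), where start values are passed by parameter name and step sizes and limits use error_<name> and limit_<name> keys. It also assumes that fn takes only the fit parameters; create_minuit_pardict_sketch is a hypothetical name.

```python
import inspect


def create_minuit_pardict_sketch(fn, startparams, errors, limits, errordef):
    """Map parameter names of fn to start values, step sizes and limits."""
    names = list(inspect.signature(fn).parameters)
    pardict = {"errordef": errordef}
    for i, name in enumerate(names):
        pardict[name] = startparams[i]
        pardict["error_" + name] = errors[i]
        if limits is not None:  # limits are optional
            pardict["limit_" + name] = limits[i]
    return pardict


def efunc(mu, sigma):
    """Placeholder error function; only its signature is inspected."""
    return (mu - 1.0) ** 2 + (sigma - 2.0) ** 2


pardict = create_minuit_pardict_sketch(
    efunc, (0.0, 1.0), (0.1, 0.1), ((-5.0, 5.0), (0.0, 10.0)), errordef=1
)
```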

Module contents

Provide an easy-to-use, intuitive way of fitting models with different components to data. The focus is less on statistically sophisticated fitting and more on an explorative approach to data investigation. This might help answer questions of the form “How compatible is this data with a Gaussian + Exponential?”. Out of the box, this module provides tools targeted at a least-squares fit; in principle, however, this could be extended to likelihood fits.

Currently the error function to be minimized is generated automatically, and only for the least-squares case. This might be expanded in the future.