HErmes - highly efficient rapid multipurpose event selection toolset¶
What is an event selection?
In the context of high energy physics, event selection means enhancing the signal-to-noise ratio by applying filter criteria to the data. Since the signal consists of individual “events” (like a collision of particles in a collider), selecting only those events which appear “signal-like” according to certain criteria is one of the basic tasks of a typical analysis in high energy physics. Typically, the number of these kinds of events is very small compared to the number of background events (which are not “interesting” to the respective analyzer).
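In its simplest form, such a filter criterion is just a boolean mask on an array of event quantities. A minimal sketch with toy numbers (not HErmes code):

```python
import numpy as np

# Toy data: reconstructed energies (in TeV) for a handful of events.
energies = np.array([0.5, 2.0, 150.0, 80.0, 310.0, 12.0])

# A "cut": keep only events with energy above 100 TeV.
selected = energies[energies > 100.0]

print(selected)  # [150. 310.]
```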
How can this package help with the task?
Selecting events is easy; what is more complicated is the bookkeeping. To illustrate this, we have to go a bit more into detail:
First, let's start with some definitions:
- A variable describes a quantity which can indicate signalness, e.g. energy.
- A cut describes a quality criterion, which is a condition imposed on a variable, e.g. “All events with energies larger than 100 TeV”.
- A data category reflects the fact that in many cases there is more than one type of data of interest, which have to be studied simultaneously. For example this can be:
- Real data, and a simulation of the signal and background
- Different types of signal and background simulations for different kinds of hypothesis
- Different types of data, e.g. different years of experimental data which need to be compared.
and so on…
- A dataset means, in this context, a compilation of categories.
With these definitions, it is now possible to talk about bookkeeping: it is simply the necessity to ensure that every cut is applied in the same way to each category of a dataset. This software intends to make this task as painless as possible.
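The bookkeeping problem can be sketched in a few lines of plain Python. This is a hypothetical illustration, not the HErmes API; the category names and the `apply_cut` helper are made up for this example:

```python
import numpy as np

# A "dataset" as a dict of categories, each holding variable arrays.
dataset = {
    "data":       {"energy": np.array([50.0, 120.0, 300.0])},
    "signal_sim": {"energy": np.array([90.0, 150.0, 400.0])},
    "bg_sim":     {"energy": np.array([10.0, 20.0, 110.0])},
}

def apply_cut(dataset, variable, condition):
    """Apply the same cut to every category, so no category is forgotten."""
    return {name: {variable: values[variable][condition(values[variable])]}
            for name, values in dataset.items()}

# One cut definition, applied uniformly to all categories.
cut = apply_cut(dataset, "energy", lambda e: e > 100.0)
print({k: len(v["energy"]) for k, v in cut.items()})
```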
Another problem: fragmented data sources.
Often, the data does not reach the analyzer in a consistent form: there might be several data files per category, or different names for the same variable. This software fixes some of these issues.
Why not just use ROOT?
ROOT is certainly the most popular framework used in particle physics. The package described here does not intend to reimplement all the statistical and physics-oriented features of ROOT. The HErmes toolset allows for a quick inspection of a dataset and a pre-analysis focused on questions like: “How well does my simulation agree with data?” or “What signal rate can I expect from a certain dataset?”. If such questions need to be answered quickly, this package might be helpful. For elaborate analyses, other software (like ROOT) might be a better choice.
The HErmes package is especially optimized to make the step from a bunch of files to a distribution after the application of some cuts as painless as possible.
HErmes documentation contents¶
HErmes package¶
Subpackages¶
HErmes.analysis package¶
Submodules¶
HErmes.analysis.calculus module¶
Common calculations
-
HErmes.analysis.calculus.
opening_angle
(reco_zen, reco_azi, true_zen, true_azi)[source]¶ Calculate the opening angle between two vectors described by azimuth and zenith in some coordinate system. Can be useful for estimation of the angular uncertainty of some reconstruction. Zenith and azimuth in radians.
Parameters: Returns: Opening angle in degrees
Return type:
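The underlying computation is the standard spherical opening-angle formula. A minimal sketch of how such a function could look (not the HErmes implementation):

```python
import numpy as np

def opening_angle(reco_zen, reco_azi, true_zen, true_azi):
    """Opening angle between two directions given as (zenith, azimuth)
    in radians; returns degrees, matching the documented interface."""
    cospsi = (np.sin(reco_zen) * np.sin(true_zen) * np.cos(reco_azi - true_azi)
              + np.cos(reco_zen) * np.cos(true_zen))
    # Clip to guard against floating point values slightly outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cospsi, -1.0, 1.0)))
```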
HErmes.analysis.fluxes module¶
Models for particle fluxes. These are just examples; for specific cosmic ray models have a look at e.g. https://github.com/afedynitch/CRFluxModels.git
-
class
HErmes.analysis.fluxes.
PowerLawFlux
(emin, emax, phi0, gamma)[source]¶ Bases:
object
A flux dependent only on the energy of a particle, following a power law. Defined on an energy interval [emin, emax] with fluence phi0 and spectral index gamma.
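A power-law flux of this kind can be written as phi0 * E**(-gamma) inside the interval and zero outside. A minimal sketch under that assumption (the exact normalization convention of the HErmes class may differ):

```python
def power_law_flux(energy, emin, emax, phi0, gamma):
    """Power-law flux phi0 * E**(-gamma), defined on [emin, emax]
    and zero outside that interval (illustrative convention)."""
    if not (emin <= energy <= emax):
        return 0.0
    return phi0 * energy ** (-gamma)
```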
HErmes.analysis.tasks module¶
Multi-step operations which might ultimately be performed on variables in a dataset
-
HErmes.analysis.tasks.
construct_slices
(name, bins)[source]¶ Prepare a set of cuts for the variable with name “name” in the dataset. This will just create the bins; the result then has to be handed over to the HErmes.cut.Cut class for further application to a dataset.
Parameters: - name (str) – The name of the variable in the dataset
- bins (array) – bin centers of the slices
Returns: tuple (list of strings, list of cuttuples)
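The idea of turning bin centers into slice cuts can be sketched as follows. This is an illustrative re-implementation; the exact return format of the HErmes function may differ:

```python
import numpy as np

def construct_slices(name, bins):
    """Turn bin centers into (label, (low, high)) interval cuts on
    the variable `name` (illustrative sketch, not the HErmes code)."""
    bins = np.asarray(bins, dtype=float)
    inner = (bins[:-1] + bins[1:]) / 2.0            # midpoints between centers
    edges = np.concatenate(([2 * bins[0] - inner[0]], inner,
                            [2 * bins[-1] - inner[-1]]))
    labels = [f"{name}_slice{i}" for i in range(len(bins))]
    cuts = list(zip(edges[:-1], edges[1:]))          # one (low, high) per slice
    return labels, cuts
```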
Module contents¶
Some snippets/functions which might help with recurring analysis tools and common tasks
HErmes.fitting package¶
HErmes.fitting.fit module¶
Provide routines for fitting charge histograms
-
HErmes.fitting.fit.
fit_model
(charges, model, startparams=None, rej_outliers=False, nbins=200, silent=False, parameter_text=(('$\\mu_{{SPE}}$& {:4.2e}\\\\', 5), ), use_minuit=False, normalize=True, **kwargs)[source]¶ Standardized fitting routine.
Parameters: - charges (np.ndarray) – Charges obtained in a measurement (no histogram)
- model (pyosci.fit.Model) – A model to fit to the data
- startparams (tuple) – initial parameters to model, or None for first guess
Keyword Arguments: Returns: tuple
HErmes.fitting.functions module¶
Provide mathematical functions which can be used to create models. The functions always have to be of the form f(x, *parameters), where the parameters will be fitted and x are the input values.
-
HErmes.fitting.functions.
calculate_chi_square
(data, model_data)[source]¶ Very simple estimator for goodness-of-fit. Use with care. Non-normalized bin counts are required.
Parameters: - data (np.ndarray) – observed data (bincounts)
- model_data (np.ndarray) – model predictions for each bin
Returns: np.ndarray
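One common convention for such a chi-square on unnormalized bin counts assumes Poisson errors, i.e. sigma_i^2 ≈ n_i (the observed counts). A minimal sketch under that assumption (the HErmes implementation may differ in detail):

```python
import numpy as np

def calculate_chi_square(data, model_data):
    """Chi-square for binned counts with Poisson errors sigma_i^2 = n_i.
    Empty bins are skipped to avoid division by zero."""
    data = np.asarray(data, dtype=float)
    model_data = np.asarray(model_data, dtype=float)
    mask = data > 0
    return np.sum((data[mask] - model_data[mask]) ** 2 / data[mask])
```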
-
HErmes.fitting.functions.
calculate_reduced_chi_square
(data, model_data, sigma)[source]¶ Very simple estimator for goodness-of-fit. Use with care.
Parameters: - data (np.ndarray) – observed data
- model_data (np.ndarray) – model predictions
- sigma (np.ndarray) – associated errors
Returns:
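With explicit per-point errors, the chi-square and its reduction by the number of points follow directly. Strictly, “reduced” divides by n minus the number of fit parameters; since that number is not available here, this sketch divides by n as an approximation:

```python
import numpy as np

def calculate_reduced_chi_square(data, model_data, sigma):
    """Chi-square per data point, given per-point errors sigma."""
    data, model_data, sigma = map(np.asarray, (data, model_data, sigma))
    chi2 = np.sum(((data - model_data) / sigma) ** 2)
    return chi2 / len(data)
```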
-
HErmes.fitting.functions.
calculate_sigma_from_amp
(amp)[source]¶ Get the sigma of the gauss from its peak value. The gauss is normed.
Parameters: amp (float) – Returns: float
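For a normalized gaussian, the peak value is 1/(sigma * sqrt(2 * pi)), so sigma follows by inversion:

```python
import numpy as np

def calculate_sigma_from_amp(amp):
    """Invert the peak value of a normalized gaussian:
    amp = 1 / (sigma * sqrt(2*pi))  =>  sigma = 1 / (amp * sqrt(2*pi))."""
    return 1.0 / (amp * np.sqrt(2.0 * np.pi))
```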
-
HErmes.fitting.functions.
exponential
(x, lmbda)[source]¶ An exponential model, e.g. for a decay with coefficient lmbda.
Parameters: Returns: np.ndarray
-
HErmes.fitting.functions.
fwhm_gauss
(x, mu, fwhm, amp)[source]¶ A gaussian typically used for energy spectrum fits of radiation, where resolutions/linewidths are typically given as full width at half maximum (fwhm)
Parameters: Returns: function value
Return type:
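The conversion between FWHM and standard deviation is sigma = fwhm / (2 * sqrt(2 * ln 2)). A minimal sketch of such a gaussian (the HErmes normalization may differ):

```python
import numpy as np

def fwhm_gauss(x, mu, fwhm, amp):
    """Gaussian parametrized by its full width at half maximum,
    so the fwhm can be read directly from the fit parameters."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return amp * np.exp(-0.5 * ((x - mu) / sigma) ** 2)
```

By construction, the function drops to half of `amp` exactly at `mu ± fwhm/2`.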
-
HErmes.fitting.functions.
gauss
(x, mu, sigma)[source]¶ Returns a normed gaussian.
Parameters: Returns:
-
HErmes.fitting.functions.
n_gauss
(x, mu, sigma, n)[source]¶ Returns a normed gaussian in the case of n == 1. If n > 1, the gaussian mean is shifted by n and its width is enlarged by a factor of n. The envelope of a sequence of these gaussians will be an exponential.
Parameters: Returns:
-
HErmes.fitting.functions.
pandel_factory
(c_ice)[source]¶ Create a pandel function with the defined parameters. The pandel function is very specific: it is a parametrisation of the delay-time distribution of photons from a source s, measured at a receiver r, after traversing a certain large (compared to the size of source or receiver) distance in a homogeneous scattering medium such as ice or water. The version here has a number of fixed parameters optimized for IceCube. This function will generate a Pandel function with a single free parameter, which is the distance between source and receiver.
Parameters: c_ice (float) – group velocity in ice in m/ns Returns: callable (float, float) -> float
HErmes.fitting.model module¶
Provide a simple, easy to use model for fitting data and especially distributions. The model is capable of having “components”, which can be defined and fitted individually.
-
class
HErmes.fitting.model.
Model
(func, startparams=None, limits=((-inf, inf), ), errors=(10.0, ), func_norm=1)[source]¶ Bases:
object
Describe data with a prediction. The Model class allows to set a function for data prediction and fit it to the data by means of a chi2 fit. It is possible to use a collection of functions to describe a complex model, e.g. a Gaussian plus some exponential tail. The individual models can be fitted independently, which results in sum_i n_i degrees of freedom for i models with n_i parameters each; alternatively they can be coupled and share parameters, which results in sum_i n_i - n_ij degrees of freedom, where n_ij is the number of shared parameters.
-
add_data
(data, data_errs=None, bins=200, create_distribution=False, normalize=False, density=True, xs=None, subtract=None)[source]¶ Add some data to the model, in preparation for the fit. There are two modes: 1) The data needs to be histogrammed first; then make sure to set ‘bins’ appropriately and set ‘create_distribution’. 2) The data does NOT need to be histogrammed; in that case, bins has no meaning. For a meaningful calculation of chi2, the errors of the data points need to be given via data_errs.
Parameters: data (np.array) – input data
Keyword Arguments: - data_errs (np.array) – errors of the data for the chi2 calculation (only used when not histogramming)
- bins (int/np.array) – number of bins or bin array to be passed to the histogramming routine
- create_distribution (bool) – data requires the creation of a histogram first before fitting
- subtract (callable) – ?
- normalize (bool) – normalize the data before adding
- density (bool) – if normalized, assume the data is a pdf; if False, use bincount for normalization
Returns: None
-
add_first_guess
(func)[source]¶ Use func to estimate better startparameters for the initialization of the fit.
Parameters: func (callable) – The function func has to have the same number of parameters as there are startparameters. Returns: None
-
clear
()[source]¶ Reset the model. This basically deletes all components and resets the startparameters.
Returns: None
-
components
¶
-
construct_error_function
(startparams, errors, limits, errordef)[source]¶ Construct the error function together with the necessary parameters for minuit.
Parameters: - startparams (tuple) – A set of startparameters. 1 start parameter per function parameter. A good choice of start parameters helps the fit a lot.
- limits (tuple) – individual limit min/max for each parameter 1 tuple (min/max) per parameter
- errors (tuple) – One value per parameter, giving a 1-sigma error estimate
- errordef (float) – The errordef should be 1 for a least-squares fit (which is what this is constructed for) or 0.5 in case of a likelihood fit
Returns: tuple (callable, dict)
-
couple_all_models
()[source]¶ “Lock” the model after all components have been added. This will determine a set of startparameters. After this, no other models can be coupled/added any more.
Returns: None
-
couple_models
(coupling_variable)[source]¶ Couple the models via a variable, which means the variable is not used independently in all model components but fitted only once. E.g. if there are two models with parameters p1, p2, k each and they are coupled by k, then the parameters p11, p21, p12, p22, and a single k will be fitted instead of p11, p21, k1, p12, p22, k2.
Parameters: coupling_variable – index of the variable in startparams. This must be the index into the respective tuple. Returns: None
-
distribution
¶
-
eval_first_guess
(data)[source]¶ Assign a new set of start parameters obtained by calling the first guess method
Parameters: data (np.ndarray) – input data, used to evaluate the first guess method. Returns: None
-
extract_parameters
()[source]¶ Get the variable names and coupling references for the individual model components
Returns: tuple
-
fit_to_data
(silent=False, use_minuit=True, errors=None, limits=None, errordef=1, debug_minuit=False, **kwargs)[source]¶ Apply this model to data. This will perform the fit with the help of either minuit or scipy.optimize.
Parameters: - data (np.ndarray) – the data, unbinned
- silent (bool) – silence output
- use_minuit (bool) – use minuit for fitting
- errors (list) – errors for minuit, see minuit manual
- limits (list of tuples) – limits for minuit, see minuit manual
- errordef (float) – typically 1 for a chi2 fit and 0.5 for a llh fit; this class is currently set up as a least-squares fit, so this should not be changed
- debug_minuit (int) – if True, attach the iminuit instance to the model so that it can be inspected later on. Will raise an error if use_minuit is set to False at the same time
- **kwargs – will be passed on to scipy.optimize.curvefit
Returns: None
-
get_minuit_instance
()[source]¶ If a previous fit has been done with debug_minuit enabled, the iminuit instance can now be accessed.
-
n_free_params
¶ The number of free parameters of this model. In a least-squares fit, the number of degrees of freedom is the number of data points minus the number of fit parameters.
Returns: int
-
plot_result
(ymin=1000, xmax=8, ylabel='normed bincount', xlabel='Q [C]', fig=None, log=True, figure_factory=None, axes_range='auto', model_alpha=0.3, add_parameter_text=(('$\\mu_{{SPE}}$& {:4.2e}\\\\', 0), ), histostyle='scatter', datacolor='k', modelcolor='r')[source]¶ Show the fit result, together with the fitted data.
Parameters: - ymin (float) – limit the yrange to ymin
- xmax (float) – limit the xrange to xmax
- model_alpha (float) – 0 <= x <= 1 the alpha value of the lineplot for the model
- ylabel (str) – label for yaxis
- log (bool) – plot in log scale
- figure_factory (fnc) – Use to generate the figure
- axes_range (str) – the “field of view” to show
- fig (pylab.figure) – A figure instance
- add_parameter_text (tuple) – Display a parameter in the table on the plot ((text, parameter_number), (text, parameter_number),…)
- datacolor (str) – color for the data points
- modelcolor (str) – color for the model prediction
Returns: pylab.figure
-
-
HErmes.fitting.model.
concat_functions
(fncs)[source]¶ Inspect functions and construct a new one which returns the added result: concat_functions(A(x, apars), B(x, bpars)) -> C(x, apars, bpars), where C(x, apars, bpars) returns A(x, apars) + B(x, bpars)
Parameters: fncs (list) – The callables to concat Returns: tuple (callable, list(pars))
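The inspection-based concatenation can be sketched with `inspect.signature`: count each function's parameters (minus x), then split the flat parameter list accordingly. This is an illustrative re-implementation, not the HErmes source:

```python
import inspect

def concat_functions(fncs):
    """Build C(x, *all_pars) returning the sum of each f(x, *its_pars),
    splitting the flat parameter list by each function's arity."""
    counts = [len(inspect.signature(f).parameters) - 1 for f in fncs]  # minus x
    allpars = [p for f in fncs
               for p in list(inspect.signature(f).parameters)[1:]]

    def joint(x, *pars):
        out, i = 0.0, 0
        for f, n in zip(fncs, counts):
            out = out + f(x, *pars[i:i + n])   # each f consumes its own slice
            i += n
        return out
    return joint, allpars
```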
-
HErmes.fitting.model.
construct_efunc
(x, data, jointfunc, joint_pars)[source]¶ Construct a least-squares error function. This function will then be minimized, e.g. with the help of minuit.
Parameters: - x (np.ndarray) – The x-values the fit should be evaluated on
- data (np.ndarray) – The y-values of the data we want to describe
- jointfunc (callable) – The full data model with all components
- joint_pars (tuple) – The model parameters
Returns: callable
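Such an error function is essentially a closure over the data that sums squared residuals; the minimizer then only varies the parameters. A minimal sketch (the HErmes version handles the parameter bookkeeping differently):

```python
import numpy as np

def construct_efunc(x, data, jointfunc, joint_pars):
    """Least-squares error function closed over x and data;
    a minimizer such as minuit varies *params to minimize it."""
    def efunc(*params):
        residuals = data - jointfunc(x, *params)
        return np.sum(residuals ** 2)
    return efunc
```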
-
HErmes.fitting.model.
copy_func
(f)[source]¶ Based on http://stackoverflow.com/a/6528148/190597 (Glenn Maynard)
Basically recreate the function f independently.
Parameters: f (callable) – the function f will be cloned
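Following the cited Stack Overflow approach, a function can be cloned by rebuilding it from its code object:

```python
import functools
import types

def copy_func(f):
    """Recreate f as an independent function object
    (after http://stackoverflow.com/a/6528148/190597)."""
    g = types.FunctionType(f.__code__, f.__globals__, name=f.__name__,
                           argdefs=f.__defaults__, closure=f.__closure__)
    g = functools.update_wrapper(g, f)   # copy __doc__, __dict__, etc.
    g.__kwdefaults__ = f.__kwdefaults__
    return g
```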
-
HErmes.fitting.model.
create_minuit_pardict
(fn, startparams, errors, limits, errordef)[source]¶ Construct a dictionary for minuit fitting. This dictionary contains information for the minuit fitter like startparams or limits.
Parameters: - fn (callable) – The function for which the parameter dictionary is constructed
- startparams (tuple) – A list of startparameters, one per parameter
- errors (list) – ?
- limits (list(tuple)) – A list of (min, max) tuples for each parameter, can be None
- errordef (float) – The errordef should be 1 for a least square fit (for what this all is constructed for) or 0.5 in case of a likelihood fit
Returns: dict
Module contents¶
Provide an easy-to-use, intuitive way of fitting models with different components to data. The focus is less on statistically sophisticated fitting than on an explorative approach to data investigation. This might help answer questions of the form: “How compatible is this data with a Gaussian + exponential?”. Out of the box, this module provides tools targeted at least-squares fits; however, in principle this could be extended to likelihood fits.
Currently the minimized error function is generated automatically, and only for the least-squares case; this might be expanded in the future.
HErmes.icecube_goodies package¶
Submodules¶
HErmes.icecube_goodies.conversions module¶
Unit conversions and such
-
HErmes.icecube_goodies.conversions.
ConvertPrimaryFromPDG
(pid)[source]¶ Convert a PDG-encoded primary id in an i3 file back to the pre-PDG values
-
HErmes.icecube_goodies.conversions.
ConvertPrimaryToPDG
(pid)[source]¶ Convert a primary id in an i3 file to the new values given by the PDG
-
HErmes.icecube_goodies.conversions.
IsPDGEncoded
(pid, neutrino=False)[source]¶ Check if the particle already has a PDG-compatible pid
Parameters: pid (int) – Particle id Keyword Arguments: neutrino (bool) – as the nue code coincides with H in PDG, set True if you already know that the particle might be a neutrino Returns (bool): True if PDG compatible
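Since the PDGCode and ParticleType namespaces below expose the same attribute names (e.g. “MuMinus”), conversion between the two encodings can be sketched as a lookup through shared names. The classes here are tiny stand-ins with values excerpted from this module, not the full namespaces, and `to_pdg` is a hypothetical helper:

```python
class ParticleType:          # icecube-internal codes (excerpt)
    MuMinus = 6
    PPlus = 14

class PDGCode:               # PDG codes (excerpt)
    MuMinus = 13
    PPlus = 2212

def to_pdg(pid):
    """Map an icecube-internal pid to its PDG code via the shared name."""
    i3 = {v: k for k, v in vars(ParticleType).items()
          if not k.startswith("_")}
    return getattr(PDGCode, i3[pid])

print(to_pdg(6))   # 13 (MuMinus)
```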
-
class
HErmes.icecube_goodies.conversions.
PDGCode
[source]¶ Bases:
object
Namespace for PDG conform particle type codes
-
Al26Nucleus
= 1000130260¶
-
Al27Nucleus
= 1000130270¶
-
Ar36Nucleus
= 1000180360¶
-
Ar37Nucleus
= 1000180370¶
-
Ar38Nucleus
= 1000180380¶
-
Ar39Nucleus
= 1000180390¶
-
Ar40Nucleus
= 1000180400¶
-
Ar41Nucleus
= 1000180410¶
-
Ar42Nucleus
= 1000180420¶
-
B10Nucleus
= 1000050100¶
-
B11Nucleus
= 1000050110¶
-
Be9Nucleus
= 1000040090¶
-
C12Nucleus
= 1000060120¶
-
C13Nucleus
= 1000060130¶
-
Ca40Nucleus
= 1000200400¶
-
Ca41Nucleus
= 1000200410¶
-
Ca42Nucleus
= 1000200420¶
-
Ca43Nucleus
= 1000200430¶
-
Ca44Nucleus
= 1000200440¶
-
Ca45Nucleus
= 1000200450¶
-
Ca46Nucleus
= 1000200460¶
-
Ca47Nucleus
= 1000200470¶
-
Ca48Nucleus
= 1000200480¶
-
Cl35Nucleus
= 1000170350¶
-
Cl36Nucleus
= 1000170360¶
-
Cl37Nucleus
= 1000170370¶
-
Cr50Nucleus
= 1000240500¶
-
Cr51Nucleus
= 1000240510¶
-
Cr52Nucleus
= 1000240520¶
-
Cr53Nucleus
= 1000240530¶
-
Cr54Nucleus
= 1000240540¶
-
D0
= 421¶
-
D0Bar
= -421¶
-
DMinus
= -411¶
-
DPlus
= 411¶
-
DsMinusBar
= -431¶
-
DsPlus
= 431¶
-
EMinus
= 11¶
-
EPlus
= -11¶
-
Eta
= 221¶
-
F19Nucleus
= 1000090190¶
-
Fe54Nucleus
= 1000260540¶
-
Fe55Nucleus
= 1000260550¶
-
Fe56Nucleus
= 1000260560¶
-
Fe57Nucleus
= 1000260570¶
-
Fe58Nucleus
= 1000260580¶
-
Gamma
= 22¶
-
He3Nucleus
= 1000020030¶
-
He4Nucleus
= 1000020040¶
-
K0_Long
= 130¶
-
K0_Short
= 310¶
-
K39Nucleus
= 1000190390¶
-
K40Nucleus
= 1000190400¶
-
K41Nucleus
= 1000190410¶
-
KMinus
= -321¶
-
KPlus
= 321¶
-
Lambda
= 3122¶
-
LambdaBar
= -3122¶
-
LambdacPlus
= 4122¶
-
Li6Nucleus
= 1000030060¶
-
Li7Nucleus
= 1000030070¶
-
Mg24Nucleus
= 1000120240¶
-
Mg25Nucleus
= 1000120250¶
-
Mg26Nucleus
= 1000120260¶
-
Mn52Nucleus
= 1000250520¶
-
Mn53Nucleus
= 1000250530¶
-
Mn54Nucleus
= 1000250540¶
-
Mn55Nucleus
= 1000250550¶
-
MuMinus
= 13¶
-
MuPlus
= -13¶
-
N14Nucleus
= 1000070140¶
-
N15Nucleus
= 1000070150¶
-
Na23Nucleus
= 1000110230¶
-
Ne20Nucleus
= 1000100200¶
-
Ne21Nucleus
= 1000100210¶
-
Ne22Nucleus
= 1000100220¶
-
Neutron
= 2112¶
-
NeutronBar
= -2112¶
-
NuE
= 12¶
-
NuEBar
= -12¶
-
NuMu
= 14¶
-
NuMuBar
= -14¶
-
NuTau
= 16¶
-
NuTauBar
= -16¶
-
O16Nucleus
= 1000080160¶
-
O17Nucleus
= 1000080170¶
-
O18Nucleus
= 1000080180¶
-
OmegaMinus
= 3334¶
-
OmegaPlusBar
= -3334¶
-
P31Nucleus
= 1000150310¶
-
P32Nucleus
= 1000150320¶
-
P33Nucleus
= 1000150330¶
-
PMinus
= -2212¶
-
PPlus
= 2212¶
-
Pi0
= 111¶
-
PiMinus
= -211¶
-
PiPlus
= 211¶
-
S32Nucleus
= 1000160320¶
-
S33Nucleus
= 1000160330¶
-
S34Nucleus
= 1000160340¶
-
S35Nucleus
= 1000160350¶
-
S36Nucleus
= 1000160360¶
-
Sc44Nucleus
= 1000210440¶
-
Sc45Nucleus
= 1000210450¶
-
Sc46Nucleus
= 1000210460¶
-
Sc47Nucleus
= 1000210470¶
-
Sc48Nucleus
= 1000210480¶
-
Si28Nucleus
= 1000140280¶
-
Si29Nucleus
= 1000140290¶
-
Si30Nucleus
= 1000140300¶
-
Si31Nucleus
= 1000140310¶
-
Si32Nucleus
= 1000140320¶
-
Sigma0
= 3212¶
-
Sigma0Bar
= -3212¶
-
SigmaMinus
= 3112¶
-
SigmaMinusBar
= -3222¶
-
SigmaPlus
= 3222¶
-
SigmaPlusBar
= -3112¶
-
TauMinus
= 15¶
-
TauPlus
= -15¶
-
Ti44Nucleus
= 1000220440¶
-
Ti45Nucleus
= 1000220450¶
-
Ti46Nucleus
= 1000220460¶
-
Ti47Nucleus
= 1000220470¶
-
Ti48Nucleus
= 1000220480¶
-
Ti49Nucleus
= 1000220490¶
-
Ti50Nucleus
= 1000220500¶
-
V48Nucleus
= 1000230480¶
-
V49Nucleus
= 1000230490¶
-
V50Nucleus
= 1000230500¶
-
V51Nucleus
= 1000230510¶
-
WMinus
= -24¶
-
WPlus
= 24¶
-
Xi0
= 3322¶
-
Xi0Bar
= -3322¶
-
XiMinus
= 3312¶
-
XiPlusBar
= -3312¶
-
Z0
= 23¶
-
unknown
= 0¶
-
-
class
HErmes.icecube_goodies.conversions.
ParticleType
[source]¶ Bases:
object
Namespace for icecube particle type codes
-
Al26Nucleus
= 2613¶
-
Al27Nucleus
= 2713¶
-
Ar36Nucleus
= 3618¶
-
Ar37Nucleus
= 3718¶
-
Ar38Nucleus
= 3818¶
-
Ar39Nucleus
= 3918¶
-
Ar40Nucleus
= 4018¶
-
Ar41Nucleus
= 4118¶
-
Ar42Nucleus
= 4118¶
-
B11Nucleus
= 1105¶
-
Be9Nucleus
= 904¶
-
C12Nucleus
= 1206¶
-
Ca40Nucleus
= 4020¶
-
Cl35Nucleus
= 3517¶
-
Cr52Nucleus
= 5224¶
-
EMinus
= 3¶
-
EPlus
= 2¶
-
F19Nucleus
= 1909¶
-
Fe56Nucleus
= 5626¶
-
Gamma
= 1¶
-
He4Nucleus
= 402¶
-
K0_Long
= 10¶
-
K0_Short
= 16¶
-
K39Nucleus
= 3919¶
-
KMinus
= 12¶
-
KPlus
= 11¶
-
Li7Nucleus
= 703¶
-
Mg24Nucleus
= 2412¶
-
Mn55Nucleus
= 5525¶
-
MuMinus
= 6¶
-
MuPlus
= 5¶
-
N14Nucleus
= 1407¶
-
Na23Nucleus
= 2311¶
-
Ne20Nucleus
= 2010¶
-
Neutron
= 13¶
-
NuE
= 66¶
-
NuEBar
= 67¶
-
NuMu
= 68¶
-
NuMuBar
= 69¶
-
NuTau
= 133¶
-
NuTauBar
= 134¶
-
O16Nucleus
= 1608¶
-
P31Nucleus
= 3115¶
-
PMinus
= 15¶
-
PPlus
= 14¶
-
Pi0
= 7¶
-
PiMinus
= 9¶
-
PiPlus
= 8¶
-
S32Nucleus
= 3216¶
-
Sc45Nucleus
= 4521¶
-
Si28Nucleus
= 2814¶
-
TauMinus
= 132¶
-
TauPlus
= 131¶
-
Ti48Nucleus
= 4822¶
-
V51Nucleus
= 5123¶
-
unknown
= 0¶
-
HErmes.icecube_goodies.fluxes module¶
Flux models for atmospheric neutrino and muon fluxes as well as power law fluxes
-
HErmes.icecube_goodies.fluxes.
AtmoWrap
(*args, **kwargs)[source]¶ Allows currying atmospheric flux functions for the class interface
Parameters: - *args – passed through to AtmosphericNuFlux
- **kwargs – passed through to AtmosphericNuFlux
Returns: AtmosphericNuFlux with applied arguments
-
class
HErmes.icecube_goodies.fluxes.
ICMuFluxes
[source]¶ Bases:
object
-
GaisserH3a
= None¶
-
GaisserH4a
= None¶
-
Hoerandel
= None¶
-
Hoerandel5
= None¶
-
-
class
HErmes.icecube_goodies.fluxes.
MuFluxes
[source]¶ Bases:
object
Namespace for atmospheric muon fluxes
-
GaisserH3a
= None¶
-
GaisserH4a
= None¶
-
Hoerandel
= None¶
-
Hoerandel5
= None¶
-
-
class
HErmes.icecube_goodies.fluxes.
NuFluxes
[source]¶ Bases:
object
Namespace for neutrino fluxes
-
static
BARTOL
(x)¶
-
static
BERSSH3a
(x)¶
-
static
BERSSH4a
(x)¶
-
static
E2
(mc_p_energy, mc_p_type, mc_p_zenith, fluxconst=1e-08, gamma=-2)¶
-
static
ERS
(x)¶
-
static
ERSH3a
(x)¶
-
static
ERSH4a
(x)¶
-
static
Honda2006
(x)¶
-
static
Honda2006H3a
(x)¶
-
static
Honda2006H4a
(x)¶
-
HErmes.icecube_goodies.fluxes.
PowerLawFlux
(fluxconst=1e-08, gamma=2)[source]¶ A simple powerlaw flux
Parameters: Returns (func): the flux function
-
HErmes.icecube_goodies.fluxes.
PowerWrap
(*args, **kwargs)[source]¶ Allows currying PowerLawFlux for class interface
Parameters: - *args – applied to PowerLawFlux
- **kwargs – applied to PowerLawFlux
Returns: PowerLawFlux with applied arguments
-
HErmes.icecube_goodies.fluxes.
generated_corsika_flux
(ebinc, datasets)[source]¶ Calculate the livetime of a number of given corsika datasets using the weighting module. The calculation here means a comparison of the number of produced events per energy bin with the expected event yield from fluxes in nature. If necessary, call home to the simprod db. Works for 5C datasets.
Parameters: - ebinc (np.array) – Energy bins (centers)
- datasets (list) – A list of dictionaries with properties of the datasets, or dataset numbers. If only numbers are given, the simprod db will be queried. Format of a dataset dict: example_datasets = {42: {"nevents": 1, "nfiles": 1, "emin": 1, "emax": 1, "normalization": [10., 5., 3., 2., 1.], "gamma": [-2.]*5, "LowerCutoffType": 'EnergyPerNucleon', "UpperCutoffType": 'EnergyPerParticle', "height": 1600, "radius": 800}}
Returns: tuple (generated protons, generated irons)
HErmes.icecube_goodies.helpers module¶
Goodies for icecube
HErmes.icecube_goodies.weighting module¶
An interface to icecube’s weighting schmagoigl
-
HErmes.icecube_goodies.weighting.
GetGenerator
(datasets)[source]¶ datasets must be a dict of dataset_id : number_of_files
Parameters: datasets (dict) – Query the database for these datasets. dict dataset_id -> number of files Returns (icecube.weighting…): Generation probability object
-
HErmes.icecube_goodies.weighting.
GetModelWeight
(model, datasets, mc_datasets=None, mc_p_en=None, mc_p_ty=None, mc_p_ze=None, mc_p_we=1.0, mc_p_ts=1.0, mc_p_gw=1.0, **model_kwargs)[source]¶ Compute weights using a predefined model
Parameters: - model (func) – Used to calculate the target flux
- datasets (dict) – Get the generation pdf for these datasets from the db dict needs to be dataset_id -> nfiles
Keyword Arguments: - mc_p_en (array-like) – primary energy
- mc_p_ty (array-like) – primary particle type
- mc_p_ze (array-like) – primary particle cos(zenith)
- mc_p_we (array-like) – weight for mc primary, e.g. some interaction probability
Returns (array-like): Weights
-
class
HErmes.icecube_goodies.weighting.
Weight
(generator, flux)[source]¶ Bases:
object
Provides the weights for weighted MC simulation. Uses the pdf from simulation and the desired flux
-
HErmes.icecube_goodies.weighting.
constant_weights
(size, scale=1.0)[source]¶ Calculate a constant weight for all the entries, e.g. unity
Parameters: size (int) – The size of the returned array Keyword Arguments: scale (float) – The returned weight is 1/scale Returns: np.ndarray
-
HErmes.icecube_goodies.weighting.
get_weight_from_weightmap
(model, datasets, mc_datasets=None, mc_p_en=None, mc_p_ty=None, mc_p_ze=None, mc_p_we=1.0, mc_p_ts=1.0, mc_p_gw=1.0, **model_kwargs)[source]¶ Get weights for weighted datasets (the generation spectrum is already the target flux)
Parameters: - model (func) – Not used, only for compatibility
- datasets (dict) – used to provide nfiles
Keyword Arguments: - mc_p_en (array-like) – primary energy
- mc_p_ty (array-like) – primary particle type
- mc_p_ze (array-like) – primary particle cos(zenith)
- mc_p_we (array-like) – weight for mc primary, e.g. some interaction probability
- mc_p_gw (array-like) – generation weight
- mc_p_ts (array-like) – mc timescale
- mc_datasets (array-like) – an array which has per-event dataset information
Returns (array-like): Weights
Module contents¶
HErmes.plotting package¶
HErmes.visual.canvases module¶
Provides canvases for multi axes plots
-
class
HErmes.visual.canvases.
YStackedCanvas
(subplot_yheights=(0.2, 0.2, 0.5), padding=(0.15, 0.05, 0.0, 0.1), space_between_plots=0, figsize='auto', figure_factory=None)[source]¶ Bases:
object
A canvas for plotting multiple axes on top of each other in the y-direction; basically creates a multi-panel plot.
-
eliminate_lower_yticks
()[source]¶ Eliminate the lowest y-tick on each axes; the bottom axes keeps its lowest y-tick. This might be useful since, for stacked plots, the lowest y-tick typically overlaps the uppermost y-tick of the axis below.
-
global_legend
(*args, **kwargs)[source]¶ A combined legend for all axes
Parameters: *args – will be passed to pylab.legend Keyword Arguments: **kwargs – will be passed to pylab.legend
-
limit_xrange
(xmin=None, xmax=None)[source]¶ Walk through all axes and set xlims
Keyword Arguments: Returns: None
-
limit_yrange
(ymin=None, ymax=None)[source]¶ Walk through all axes and adjust ymin and ymax
Keyword Arguments: - ymin (float) – min ymin value which will be applied to all axes
- ymax (float) – max ymax value which will be applied to all axes
-
save
(path, name, formats=('pdf', 'png'), **kwargs)[source]¶ Calls pylab.savefig for all endings
Parameters: Keyword Arguments: **kwargs – will be passed to pylab.savefig
Returns: The full path to the the saved file
Return type:
-
HErmes.visual.plotting module¶
Define some plotting functions.
-
class
HErmes.visual.plotting.
VariableDistributionPlot
(cuts=None, color_palette='dark', bins=None, xlabel=None)[source]¶ Bases:
object
A plot which shows the distribution of a certain variable. Cuts can be indicated with lines and arrows. This class defines (and somewhat enforces) a certain style.
-
add_cumul
(name)[source]¶ Add a cumulative distribution to the plot
Parameters: name (str) – the name of the category
-
add_cuts
(cut)[source]¶ Add a cut to the plot, which can be indicated by an arrow
Parameters: cuts (HErmes.selection.cuts.Cut) – Returns: None
-
add_data
(variable_data, name, bins=None, weights=None, label='')[source]¶ Histogram the added data and store internally
Parameters: - name (string) – the name of a category
- variable_data (array) – the actual data
Keyword Arguments:
-
add_legend
(**kwargs)[source]¶ Add a legend to the plot. If no kwargs are passed, use some reasonable default.
Keyword Arguments: **kwargs – will be passed to pylab.legend
-
add_ratio
(nominator, denominator, total_ratio=None, total_ratio_errors=None, log=False, label='data/$\\Sigma$ bg')[source]¶ Add a ratio plot to the canvas
Parameters: Keyword Arguments:
-
add_variable
(category, variable_name, external_weights=None, transform=None)[source]¶ Convenience interface if data is sorted in categories already
Parameters: - category (HErmes.variables.category.Category) – Get variable from this category
- variable_name (string) – The name of the variable
Keyword Arguments: - external_weights (np.ndarray) – Supply an array for weighting. This will OVERRIDE ANY INTERNAL WEIGHTING MECHANISM and use the supplied weights.
- transform (callable) – Apply transformation to data
-
indicate_cut
(ax, arrow=True)[source]¶ If cuts are given, indicate them by lines
Parameters: ax (pylab.axes) – axes to draw on
-
static
optimal_plotrange_histo
(histograms)[source]¶ Get the most suitable x and y limits for a bunch of histograms
Parameters: histograms (list(d.factory.hist1d)) – The histograms in question Returns: xmin, xmax, ymin, ymax Return type: tuple (float, float, float, float)
-
plot
(axes_locator=((0, 'c', 0.2), (1, 'r', 0.2), (2, 'h', 0.5)), combined_distro=True, combined_ratio=True, combined_cumul=True, normalized=True, style='classic', log=True, legendwidth=1.5, ylabel='rate/bin [1/s]', figure_factory=None, zoomin=False, adjust_ticks=<function VariableDistributionPlot.<lambda>>)[source]¶ Create the plot
Keyword Arguments: - axes_locator (tuple) –
A specialized tuple defining where the axes should be located in the plot. The tuple has the form ((PLOTA), (PLOTB), …) where PLOTA is itself a tuple of the form (int, str, float) describing (plot number, plot type, height of the axes in the figure). The plot type can be either “c” (cumulative), “r” (ratio) or “h” (histogram). - combined_distro –
- combined_ratio –
- combined_cumul –
- log (bool) –
- style (str) – Apply a simple style to the plot. Options are “modern” or “classic”
- normalized (bool) –
- figure_factory (fcn) – Must return a matplotlib figure, use for custom formatting
- zoomin (bool) – If True, select the yrange in a way that the interesting part of the histogram is shown. Caution is needed, since this might lead to an overinterpretation of fluctuations.
- adjust_ticks (fcn) – A function, applied on a matplotlib axes which will set the proper axis ticks
Returns:
-
-
HErmes.visual.plotting.
create_arrow
(ax, x_0, y_0, dx, dy, length, width=0.1, shape='right', fc='k', ec='k', alpha=1.0, log=False)[source]¶ Create an arrow object for plots. This is typically a large arrow which can be used to indicate a region in the plot that is excluded by a cut.
Parameters: - ax (matplotlib.axes._subplots.AxesSubplot) – The axes where the arrow will be attached to
- x_0 (float) – x-origin of the arrow
- y_0 (float) – y-origin of the arrow
- dx (float) – x length of the arrow
- dy (float) – y length of the arrow
- length (float) – additional scaling parameter to scale the length of the arrow
Keyword Arguments: Returns: matplotlib.axes._subplots.AxesSubplot
-
HErmes.visual.plotting.
gaussian_fwhm_fit
(data, startparams=(0, 0.2, 1), fitrange=((None, None), (None, None), (None, None)), fig=None, bins=80, xlabel='$\\theta_{{rec}} - \\theta_{{true}}$')[source]¶ A plot with a gaussian fitted to data. A histogram of the data will be created and a gaussian will be fitted, with 68 and 95 percentiles indicated in the plot. The gaussian will be in a form such that the fwhm can be read directly from it. The “width” parameter of the gaussian is NOT the standard deviation but the FWHM!
Parameters: data (array-like) – input data with a (preferably) gaussian distribution
Keyword Arguments: - startparams (tuple) – a set of startparams of the gaussian fit. It is a 3 parameter fit with mu, fwhm and amplitude
- fitrange (tuple) – if desired, the fit can be restrained. One tuple of (min, max) per parameter
- fig (matplotlib.Figure) – pre-created figure to draw the plot in
- bins (array-like or int) – bins for the underlying histogram
- xlabel (str) – label for the x-axis
-
HErmes.visual.plotting.
gaussian_model_fit
(data, startparams=(0, 0.2), fitrange=((None, None), (None, None)), fig=None, norm=True, bins=80, xlabel='$\\theta_{{rec}} - \\theta_{{true}}$')[source]¶ A plot with a gaussian fitted to data. A histogram of the data will be created and a gaussian will be fitted, with 68 and 95 percentiles indicated in the plot.
Parameters: data (array-like) – input data with a (preferably) gaussian distribution
Keyword Arguments: - startparams (tuple) – a set of startparams of the gaussian fit. If only mu/sigma are given, then the plot will be normalized
- fig (matplotlib.Figure) – pre-created figure to draw the plot in
- bins (array-like or int) – bins for the underlying histogram
- fitrange (tuple(min, max)) – min-max range for the gaussian fit
- xlabel (str) – label for the x-axis
-
HErmes.visual.plotting.
line_plot
(quantities, bins=None, xlabel='', add_ratio=None, ratiolabel='', colors=None, figure_factory=None)[source]¶ Parameters: quantities –
Keyword Arguments: Returns:
-
HErmes.visual.plotting.
meshgrid
(xs, ys)[source]¶ Create x and y data for matplotlib pcolormesh and similar plotting functions.
Parameters: - xs (np.ndarray) – 1d x bins
- ys (np.ndarray) – 1d y bins
Returns: 2d X and 2d Y matrices as well as a placeholder for the Z array
Return type: tuple (np.ndarray, np.ndarray, np.ndarray)
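For illustration, a minimal sketch of what such a helper might do (the name meshgrid_sketch and the numpy-based re-implementation are assumptions, not the library's actual code):

```python
import numpy as np

def meshgrid_sketch(xs, ys):
    # build 2d X/Y coordinate matrices plus an all-zero placeholder Z,
    # ready to be filled with bin contents and handed to pcolormesh
    X, Y = np.meshgrid(xs, ys)
    Z = np.zeros(X.shape)
    return X, Y, Z

X, Y, Z = meshgrid_sketch(np.linspace(0., 1., 5), np.linspace(0., 2., 4))
```

Note that for n x bins and m y bins the resulting matrices have shape (m, n), matching matplotlib's pcolormesh convention.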
Module contents¶
A set of
HErmes.selection package¶
Submodules¶
HErmes.selection.categories module¶
Categories of data, like “signal” or “background” etc
-
class
HErmes.selection.categories.
AbstractBaseCategory
(name)[source]¶ Bases:
object
Stands for a specific type of data, e.g. detector data in a specific configuration, simulated data etc.
-
add_cut
(cut)[source]¶ Add a cut without applying it yet
Parameters: cut (pyevsel.variables.cut.Cut) – Append this cut to the internal cutlist
-
add_livetime_weighted
(other, self_livetime=None, other_livetime=None)[source]¶ Combine two datasets, weighted by livetime. Simulated data in general does not know about the detector livetime; in this case the livetimes for the two datasets can be given
Parameters: other (pyevsel.categories.Category) – Add this dataset
Keyword Arguments:
-
add_plotoptions
(options)[source]¶ Add options on how to plot this category. If available, they will be used.
Parameters: options (dict) – For the names which are currently supported, please see the example file
-
add_variable
(variable)[source]¶ Add a variable to this category
Parameters: variable (pyevsel.variables.variables.Variable) – A Variable instance
-
apply_cuts
(inplace=False)[source]¶ Apply the added cuts.
Keyword Arguments: inplace (bool) – If True, cut the internal variable buffer (cannot be undone unless the variable is reloaded)
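The add_cut/apply_cuts bookkeeping can be illustrated with a small stand-alone sketch (CutSketch and CategorySketch are hypothetical stand-ins, not HErmes classes):

```python
import numpy as np

class CutSketch:
    """Hypothetical stand-in for a cut: a named condition on a variable."""
    def __init__(self, varname, condition):
        self.varname = varname
        self.condition = condition  # func: array -> boolean mask

class CategorySketch:
    """Hypothetical stand-in for a category with add_cut/apply_cuts."""
    def __init__(self, vardict):
        self.vardict = dict(vardict)  # internal variable buffer
        self.cuts = []
        nevents = len(next(iter(self.vardict.values())))
        self.cutmask = np.ones(nevents, dtype=bool)

    def add_cut(self, cut):
        self.cuts.append(cut)  # stored, not applied yet

    def apply_cuts(self, inplace=False):
        for cut in self.cuts:
            self.cutmask &= cut.condition(self.vardict[cut.varname])
        if inplace:  # irreversible: shrink the internal buffer
            self.vardict = {k: v[self.cutmask] for k, v in self.vardict.items()}
            self.cutmask = np.ones(int(self.cutmask.sum()), dtype=bool)

    def get(self, varname, uncut=False):
        data = self.vardict[varname]
        return data if uncut else data[self.cutmask]

# cut on the variable "energy": keep events above 100 (units arbitrary)
cat = CategorySketch({"energy": np.array([50.0, 150.0, 300.0])})
cat.add_cut(CutSketch("energy", lambda e: e > 100.0))
cat.apply_cuts()
```

With inplace=False the uncut values stay available via get(..., uncut=True); with inplace=True they are gone until the variable is reloaded.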
-
delete_variable
(varname)[source]¶ Remove a variable entirely from the category
Parameters: varname (str) – The name of the variable as stored in self.variable dict Returns: None
-
distribution
(varname, bins=None, color=None, alpha=0.5, fig=None, xlabel=None, norm=False, filled=None, legend=True, style='line', log=False, transform=None, extra_weights=None, figure_factory=None, return_histo=False)[source]¶ Plot the distribution of a variable in the category
Parameters: varname (str) – The name of the variable in the category
Keyword Arguments: - bins (int/np.ndarray) – Bins for the distribution
- color (str/int) – A color identifier, either number 0-5 or matplotlib compatible
- alpha (float) – 0-1 alpha value for histogram
- fig (matplotlib.figure.Figure) – Canvas for plotting, if None an empty one will be created
- xlabel (str) – xlabel for the plot. If None, default is used
- norm (str) – “n” or “density” - make normed histogram
- style (str) – Either “line” or “scatter”
- filled (bool) – Draw filled histogram
- legend (bool) – if available, plot a legend
- transform (callable) – Apply transformation to the data before plotting
- log (bool) – Plot yaxis in log scale
- extra_weights (numpy.ndarray) – Use this for weighting. Will overwrite any other weights in the dataset
- figure_factory (func) – Must return a single matplotlib.Figure, NOTE: figure_factory has priority over fig keyword
- return_histo (bool) – Return the histogram instead of the figure. WARNING: changes return type!
Returns: matplotlib.figure.Figure or dashi.histogram.hist1d
-
distribution2d
(varnames, bins=None, figure_factory=None, fig=None, norm=False, log=True, cmap=<matplotlib colormap>, interpolation='gaussian', cblabel='events', weights=None, transform=(None, None), despine=False, alpha=0.95, return_histo=False)[source]¶ Draw a 2d distribution of 2 variables in the same category. :param varnames: The names of the variables in the category :type varnames: tuple(str, str)
Keyword Arguments: - bins (tuple(int/np.ndarray)) – Bins for the distribution
- cmap – A colormap
- alpha (float) – 0-1 alpha value for histogram
- fig (matplotlib.figure.Figure) – Canvas for plotting, if None an empty one will be created
- xlabel (str) – xlabel for the plot. If None, default is used
- norm (str) – “n” or “density” - make normed histogram
- style (str) – Either “line” or “scatter”
- transform (tuple) – Two functions which shall transform sample 1 and 2 respectively before plotting
- log (bool) – Plot yaxis in log scale
- figure_factory (func) – Must return a single matplotlib.Figure, NOTE: figure_factory has priority over fig keyword
- return_histo (bool) – Return the histogram instead of the figure. WARNING: changes return type!
Returns: matplotlib.figure.Figure or dashi.histogram.hist1d
-
explore_files
()[source]¶ Get a sneak preview of what variables are available for readout
Returns: list
-
get
(varkey, uncut=False)[source]¶ Retrieve the data of a variable
Parameters: varkey (str) – The name of the variable Keyword Arguments: uncut (bool) – if True, return the values without cuts applied
-
get_files
(*args, **kwargs)[source]¶ Load files for this category using HErmes.utils.files.harvest_files
Parameters: *args (list of strings) – Path to possible files
Keyword Arguments: - (dict(dataset_id (datasets) – nfiles)): i given, load only files from dataset dataset_id set nfiles parameter to amount of L2 files the loaded files will represent
- force (bool) – forcibly reload filelist (pre-readout vars will be lost)
- append (bool) – keep the already acquired files and only append the new ones
- all other kwargs will be passed on to utils.files.harvest_files
-
harvested
¶
-
integrated_rate
¶ Calculate the total eventrate of this category (requires weights)
Returns (tuple): rate and quadratic error
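A minimal sketch of such a rate calculation (the function name and the assumption that weights are in units of 1/s are illustrative, not the library's actual code):

```python
import numpy as np

def integrated_rate_sketch(weights):
    # with per-event weights in 1/s, the total rate is the weight sum
    # and its statistical error adds in quadrature
    w = np.asarray(weights, dtype=float)
    return w.sum(), np.sqrt((w ** 2).sum())

rate, err = integrated_rate_sketch([0.1, 0.2, 0.2])
```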
-
load_vardefs
(module)[source]¶ Load the variable definitions from a module
Parameters: module (python module) – Needs to contain variable definitions
-
raw_count
¶ Gives the number of events which are actually there (unweighted event count)
Returns: int
-
read_variables
(names=None, max_cpu_cores=6, dtype=<class 'numpy.float64'>)[source]¶ Harvest the variables in self.vardict
Keyword Arguments:
-
variablenames
¶
-
weights
¶
-
weightvarname
= None¶
-
-
class
HErmes.selection.categories.
CombinedCategory
(name, categories)[source]¶ Bases:
object
Create a combined category out of several others. This is mainly useful for plotting. FIXME: should this inherit from category as well? The difference compared to the dataset is that this is flat.
-
add_plotoptions
(options)[source]¶ Add options on how to plot this category. If available, they will be used.
Parameters: options (dict) – For the names which are currently supported, please see the example file
-
integrated_rate
¶ Calculate the total eventrate of this category (requires weights)
Returns (tuple): rate and quadratic error
-
vardict
¶
-
weights
¶
-
-
class
HErmes.selection.categories.
Data
(name)[source]¶ Bases:
HErmes.selection.categories.AbstractBaseCategory
An interface to real-time event data. Simplified weighting only.
-
calculate_weights
(model=None, model_args=None)[source]¶ Calculate weights as a rate, that is the number of events per livetime
Keyword Args: for compatibility…
-
estimate_livetime
(force=False)[source]¶ Calculate the livetime from run start/stop times, accounting for gaps
Keyword Arguments: force (bool) – override existing livetime
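The idea of a gap-aware livetime estimate can be sketched as follows (estimate_livetime_sketch is a hypothetical re-implementation under the assumption that run start/stop times are given in seconds):

```python
import numpy as np

def estimate_livetime_sketch(run_starts, run_stops):
    # sum the per-run durations, so gaps between runs
    # do not count towards the livetime
    starts = np.asarray(run_starts, dtype=float)
    stops = np.asarray(run_stops, dtype=float)
    return float(np.sum(stops - starts))

# two runs of 100 s and 50 s with a 10 s gap in between
livetime = estimate_livetime_sketch([0.0, 110.0], [100.0, 160.0])
```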
-
livetime
¶
-
set_livetime
(livetime)[source]¶ Override the private _livetime member
Parameters: livetime – The time needed for data-taking Returns: None
-
set_run_start_stop
(runstart_var=<Variable: None>, runstop_var=<Variable: None>)[source]¶ Let the category know which parameters describe the start and stop of a run
Keyword Arguments: - runstart_var (pyevsel.variables.variables.Variable/str) – beginning of a run
- runstop_var (pyevsel.variables.variables.Variable/str) – end of a run
-
-
class
HErmes.selection.categories.
ReweightedSimulation
(name, mother)[source]¶ Bases:
HErmes.selection.categories.Simulation
A proxy for a simulation dataset, used when only the weighting differs
-
add_livetime_weighted
(other)[source]¶ Combine two datasets, weighted by livetime. Simulated data in general does not know about the detector livetime; in this case the livetimes for the two datasets can be given
Parameters: other (pyevsel.categories.Category) – Add this dataset
Keyword Arguments:
-
datasets
¶
-
files
¶
-
get
(varname, uncut=False)[source]¶ Retrieve the data of a variable
Parameters: varname (str) – The name of the variable Keyword Arguments: uncut (bool) – if True, return the values without cuts applied
-
harvested
¶
-
mother
¶
-
raw_count
¶ Gives the number of events which are actually there (unweighted event count)
Returns: int
-
read_mc_primary
(energy_var='mc_p_en', type_var='mc_p_ty', zenith_var='mc_p_ze', weight_var='mc_p_we')[source]¶ Trigger the readout of MC primary information. Rename variables to magic keywords if necessary.
Keyword Arguments:
-
read_variables
(names=None, max_cpu_cores=6, dtype=<class 'numpy.float64'>)[source]¶ Harvest the variables in self.vardict
Keyword Arguments:
-
setter
(other)¶
-
vardict
¶
-
-
class
HErmes.selection.categories.
Simulation
(name, weightvarname=None)[source]¶ Bases:
HErmes.selection.categories.AbstractBaseCategory
An interface to variables from simulated data. Allows weighting of the events.
-
calculate_weights
(model=None, model_args=None)[source]¶ Walk the variables of this category and identify the weighting variables and calculate them.
Usage example: calculate_weights(model=lambda x: np.power(x, -2.), model_args=[“primary_energy”])
Keyword Arguments: - model (func) – The target flux to weight to, if None, generated flux is used for weighting
- model_args (list) – The variables the model should be applied to from the variable dict
Returns: np.ndarray
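The reweighting idea can be sketched in a few lines (calculate_weights_sketch and the assumed generated flux are hypothetical; the real method walks the category's weighting variables instead):

```python
import numpy as np

def calculate_weights_sketch(generated_flux, model, primary_energy):
    # events generated with flux `generated_flux` are reweighted
    # to the target `model` flux, e.g. an E^-2 spectrum
    return model(primary_energy) / generated_flux(primary_energy)

energies = np.array([1e3, 1e4, 1e5])
# assume the events were generated with an E^-1 spectrum
weights = calculate_weights_sketch(lambda e: e ** -1.0,
                                   lambda e: np.power(e, -2.0),
                                   energies)
```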
-
livetime
¶
-
mc_p_readout
¶
-
-
HErmes.selection.categories.
cut_with_nans
(data, cutmask)[source]¶ Cut the individual fields of a 2d array and keep the shape by filling up with nans
Parameters: - data (np.ndarray) – The array to cut
- cutmask (np.ndarray) – Cut with this boolean array
Returns: data with applied cuts
Return type: np.ndarray
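A minimal numpy sketch of the described behavior (cut_with_nans_sketch is a hypothetical re-implementation, shown here cutting whole rows of a 2d array):

```python
import numpy as np

def cut_with_nans_sketch(data, cutmask):
    # keep the shape of the 2d array: instead of dropping cut-away
    # entries, overwrite them with NaN
    out = np.asarray(data, dtype=float).copy()
    out[~cutmask] = np.nan
    return out

arr = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
mask = np.array([True, False, True])
result = cut_with_nans_sketch(arr, mask)
```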
HErmes.selection.cut module¶
Remove the part of the data which fails certain criteria.
HErmes.selection.dataset module¶
Datasets group categories together. Method calls on datasets invoke the individual methods on the individual categories. Cuts applied to datasets will act on each individual category.
-
class
HErmes.selection.dataset.
Dataset
(*args, **kwargs)[source]¶ Bases:
object
Holds different categories, relays calls to each of them.
-
add_category
(category)[source]¶ Add another category to the dataset
Parameters: category (HErmes.selection.categories.Category) – add this category
-
add_cut
(cut)[source]¶ Add a cut without applying it yet
Parameters: cut (HErmes.selection.variables.cut.Cut) – Append this cut to the internal cutlist
-
add_variable
(variable)[source]¶ Add a variable to this category
Parameters: variable (HErmes.selection.variables.variables.Variable) – A Variable instance
-
calc_ratio
(nominator=None, denominator=None)[source]¶ Calculate a ratio of the given categories
Parameters: Returns: tuple
-
calculate_weights
(model=None, model_args=None)[source]¶ Calculate the weights for all categories
Keyword Arguments: - model (dict/func) – Either a dict catname -> func or a single func. If it is a single function, it will be applied to all categories
- model_args (dict/list) – variable names as arguments for the function
-
categorynames
¶
-
combined_categorynames
¶
-
delete_variable
(varname)[source]¶ Delete a variable entirely from the dataset
Parameters: varname (str) – the name of the variable Returns: None
-
distribution
(name, ratio=([], []), cumulative=True, log=False, transform=None, disable_weights=False, color_palette='dark', normalized=False, styles={}, style='classic', ylabel='rate/bin [1/s]', axis_properties=None, ratiolabel='data/$\\Sigma$ bg', bins=None, external_weights=None, savepath=None, figure_factory=None, zoomin=False, adjust_ticks=<function Dataset.<lambda>>)[source]¶ One shot short-cut for one of the most used plots in eventselections.
Parameters: name (string) – The name of the variable to plot
Keyword Arguments: - path (str) – The path under which the plot will be saved.
- ratio (list) – A ratio plot of these categories will be created
- color_palette (str) – A predefined color palette (from seaborn or HErmes.plotting.colors)
- normalized (bool) – Normalize the histogram by number of events
- transform (callable) – Apply this transformation before plotting
- disable_weights (bool) – Disable all weighting to avoid problems with uneven sized arrays
- styles (dict) – plot styling options
- ylabel (str) – general label for y-axis
- ratiolabel (str) – different label for the ratio part of the plot
- bins (np.ndarray) – binning, if None binning will be deduced from the variable definition
- figure_factory (func) – factory function which return a matplotlib.Figure
- style (string) – TODO “modern” || “classic” || “modern-cumul” || “classic-cumul”
- savepath (string) – Save the canvas at given path. None means it will not be saved.
- external_weights (dict) – supply external weights - this will OVERRIDE ANY INTERNALLY CALCULATED WEIGHTS and use the supplied weights instead. Must be in the form { “categoryname” : weights}
- axis_properties (dict) –
Manually define a plot layout with up to three axes. For example, it can look like this:
{
 "top":    {"type": "h",   # histogram
            "height": 0.4, # height in percent
            "index": 2},   # used internally
 "center": {"type": "r",   # ratio plot
            "height": 0.2,
            "index": 1},
 "bottom": {"type": "c",   # cumulative histogram
            "height": 0.2,
            "index": 0}
}
- zoomin (bool) – If True, select the yrange in a way that the interesting part of the histogram is shown. Caution is needed, since this might lead to an overinterpretation of fluctuations.
- adjust_ticks (fcn) – A function, applied on a matplotlib axes which will set the proper axis ticks
Returns: HErmes.selection.variables.VariableDistributionPlot
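A small sketch of how such an axis_properties dict could be checked and ordered (validate_axis_properties is a hypothetical helper, not part of the library):

```python
def validate_axis_properties(layout):
    # axis types must be one of h (histogram), r (ratio), c (cumulative),
    # and the stacked heights should not exceed the full figure
    valid_types = {"h", "r", "c"}
    for name, axis in layout.items():
        if axis["type"] not in valid_types:
            raise ValueError(f"unknown plot type {axis['type']} for {name}")
    if sum(axis["height"] for axis in layout.values()) > 1.0:
        raise ValueError("axes heights exceed figure height")
    # return the axes sorted by their internal index, bottom first
    return sorted(layout, key=lambda n: layout[n]["index"])

order = validate_axis_properties({
    "top":    {"type": "h", "height": 0.4, "index": 2},
    "center": {"type": "r", "height": 0.2, "index": 1},
    "bottom": {"type": "c", "height": 0.2, "index": 0},
})
```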
-
files
¶
-
get_category
(categoryname)[source]¶ Get a reference to a category.
Parameters: category – A name which has to be associated to a category Returns: HErmes.selection.categories.Category
-
get_sparsest_category
(omit_empty_cat=True)[source]¶ Find out which category of the dataset has the least statistical power
Keyword Arguments: omit_empty_cat (bool) – if a category has no entries at all, omit Returns: category name Return type: str
-
get_variable
(varname)[source]¶ Get a pandas dataframe for all categories
Parameters: varname (str) – A name of a variable Returns: A 2d dataframe category -> variable Return type: pandas.DataFrame
-
integrated_rate
¶ Integrated rate for each category
Returns: rate with error Return type: pandas.Panel
-
load_vardefs
(vardefs)[source]¶ Load the variable definitions from a module
Parameters: vardefs (python module/dict) – A module needs to contain variable definitions. It can also be a dictionary of categoryname->module
-
read_variables
(names=None, max_cpu_cores=6, dtype=<class 'numpy.float64'>)[source]¶ Read out the variable for all categories
Keyword Arguments: Returns: None
-
set_default_plotstyles
(styledict)[source]¶ Define a standard for each category how it should appear in plots
Parameters: styledict (dict) –
-
set_livetime
(livetime)[source]¶ Define a livetime for this dataset.
Parameters: livetime (float) – Time interval the data was taken in. (Used for rate calculation) Returns: None
-
set_weightfunction
(weightfunction=<function Dataset.<lambda>>)[source]¶ Defines a function which is used for weighting
Parameters: weightfunction (func or dict) – if a func is provided, it is set for all categories; if needed, provide a dict cat.name -> func for individual settings Returns: None
-
sum_rate
(categories=None)[source]¶ Sum up the integrated rates for categories
Parameters: categories – categories considered background Returns: rate with error Return type: tuple
-
tinytable
(signal=None, background=None, layout='v', format='html', order_by=<function Dataset.<lambda>>, livetime=1.0)[source]¶ Use dashi.tinytable.TinyTable to render a nice html representation of a rate table
Parameters: Returns: formatted table in desired markup
Return type:
-
variablenames
¶
-
weights
¶ Get the weights for all categories in this dataset
-
HErmes.selection.magic_keywords module¶
All magic keywords shall summon here
HErmes.selection.variables module¶
Container classes for variables
-
class
HErmes.selection.variables.
AbstractBaseVariable
[source]¶ Bases:
object
Read out tagged numerical data from files
-
ROLES
¶ alias of
VariableRole
-
bins
¶
-
calculate_fd_bins
(cutmask=None)[source]¶ Calculate a reasonable binning
Keyword Arguments: cutmask (numpy.ndarray) – a boolean mask to cut on, in case cuts have been applied to the category this data is part of Returns: Freedman Diaconis bins Return type: numpy.ndarray
-
data
¶
-
harvest
(*files)[source]¶ Hook to the harvest method. Don’t use in case of multiprocessing! Parameters: *files – walk through these files and read them out
-
harvested
¶
-
ndim
¶
-
-
class
HErmes.selection.variables.
CompoundVariable
(name, variables=None, label='', bins=None, operation=<function CompoundVariable.<lambda>>, role=<VariableRole.SCALAR: 10>, dtype=<class 'numpy.float64'>)[source]¶ Bases:
HErmes.selection.variables.AbstractBaseVariable
Calculate a variable from other variables. This kind of variable will not read any file.
-
class
HErmes.selection.variables.
Variable
(name, definitions=None, bins=None, label='', transform=None, role=<VariableRole.SCALAR: 10>, nevents=None, reduce_dimension=None)[source]¶ Bases:
HErmes.selection.variables.AbstractBaseVariable
A hook to a single variable read out from a file
-
class
HErmes.selection.variables.
VariableList
(name, variables=None, label='', bins=None, role=<VariableRole.SCALAR: 10>)[source]¶ Bases:
HErmes.selection.variables.AbstractBaseVariable
A list of variables. Cannot be read out from files.
-
data
¶
-
-
class
HErmes.selection.variables.
VariableRole
[source]¶ Bases:
enum.Enum
Define roles for variables. Some variables used in a special context (like weights) are easily recognizable by this flag.
-
ARRAY
= 20¶
-
ENDTIME
= 70¶
-
EVENTID
= 50¶
-
FLUXWEIGHT
= 80¶
-
GENERATORWEIGHT
= 30¶
-
PARAMETER
= 90¶
-
RUNID
= 40¶
-
SCALAR
= 10¶
-
STARTIME
= 60¶
-
UNKNOWN
= 0¶
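The role values listed above can be mirrored in a plain enum, which also shows how special variables (here the weight-like roles) become easy to pick out (VariableRoleSketch is a stand-in, not the HErmes class):

```python
from enum import Enum

class VariableRoleSketch(Enum):
    """Mirror of the documented roles and their values."""
    UNKNOWN = 0
    SCALAR = 10
    ARRAY = 20
    GENERATORWEIGHT = 30
    RUNID = 40
    EVENTID = 50
    STARTIME = 60
    ENDTIME = 70
    FLUXWEIGHT = 80
    PARAMETER = 90

# roles make weight-like variables easy to recognize in a variable dict
weightlike = [r for r in VariableRoleSketch if "WEIGHT" in r.name]
```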
-
-
HErmes.selection.variables.
extract_from_root
(filename, definitions, nevents=None, dtype=<class 'numpy.float64'>, transform=None, reduce_dimension=None)[source]¶ Use the uproot system to get information from rootfiles. Supports a basic tree-like structure of primitive datatypes.
Parameters: Keyword Arguments: - nevents (int) – number of events to read out
- reduce_dimension (int) – If data is vector-type, reduce it by taking the n-th element
- dtype (np.dtype) – A numpy datatype, default double (np.float64) - use smaller dtypes to save memory
- transform (func) – A function which directly transforms the readout data
-
HErmes.selection.variables.
freedman_diaconis_bins
(data, leftedge, rightedge, minbins=20, maxbins=70, fallbackbins=70)[source]¶ Get a number of bins for a histogram following Freedman/Diaconis
Parameters: Returns: number of bins, minbins < bins < maxbins
Return type: nbins (int)
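A sketch of the Freedman/Diaconis rule with the described clamping (fd_bins_sketch is a hypothetical re-implementation; the exact fallback behavior of the library is an assumption):

```python
import numpy as np

def fd_bins_sketch(data, leftedge, rightedge,
                   minbins=20, maxbins=70, fallbackbins=70):
    data = np.asarray(data, dtype=float)
    if len(data) == 0:
        return fallbackbins
    q75, q25 = np.percentile(data, [75, 25])
    iqr = q75 - q25
    if iqr <= 0:
        return fallbackbins
    # Freedman/Diaconis: bin width = 2 * IQR / n^(1/3)
    width = 2.0 * iqr / len(data) ** (1.0 / 3.0)
    nbins = int((rightedge - leftedge) / width)
    # clamp so that minbins < bins < maxbins as documented
    return int(min(max(nbins, minbins), maxbins))

rng = np.random.default_rng(42)
nbins = fd_bins_sketch(rng.normal(size=1000), -3.0, 3.0)
```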
-
HErmes.selection.variables.
harvest
(filenames, definitions, **kwargs)[source]¶ Read variables from files into memory. Will be used by HErmes.selection.variables.Variable.harvest. This will be run multi-threaded; keep in mind that the arguments have to be picklable, as well as everything which is read out. Lambda functions are NOT picklable.
Parameters: - filenames (list) – the files to extract the variables from. currently supported: hdf
- definitions (list) – where to find the data in the files. They usually have some tree-like structure, so this is a list of leaf-value pairs. If there is more than one, all of them will be tried (as it might be that a different naming scheme was used in some files). Example: [(“hello_reoncstruction”, “x”), (“hello_reoncstruction”, “y”)]
Keyword Arguments: - transformation (func) – After the data is read out from the files, transformation will be applied, e.g. the log to the energy.
- fill_empty (bool) – Fill empty fields with zeros
- nevents (int) – ROOT only - read out only nevents from the files
- reduce_dimension (str) – ROOT only - multidimensional data can be reduced by only using the index given by reduce_dimension. E.g. in case of a TVector3, if we want to have only x, that would be 0; y -> 1 and z -> 2.
- dtype (np.dtype) – datatype to cast to (default np.float64, but can be used to reduce the memory footprint).
Returns: pd.Series or pd.DataFrame
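The "try each definition" behavior can be sketched against a dict that stands in for an hdf file (harvest_sketch, fake_file and the node/leaf names are all hypothetical):

```python
import pandas as pd

def harvest_sketch(filelike, definitions):
    # each definition is a (node, leaf) pair; the first one that
    # matches wins, which covers files with different naming schemes
    for node, leaf in definitions:
        try:
            return pd.Series(filelike[node][leaf])
        except KeyError:
            continue
    raise KeyError(f"none of {definitions} found")

fake_file = {"reco_v2": {"x": [1.0, 2.0, 3.0]}}
series = harvest_sketch(fake_file, [("reco_v1", "x"), ("reco_v2", "x")])
```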
Module contents¶
Provides containers for in-memory variables. These containers are called “categories”, and they represent a set of variables for a certain type of data. Categories can be further grouped into “Datasets”. Variables can be read out from files and stored in memory in the form of numpy arrays or pandas DataSeries/DataFrames. Selection criteria can be applied simultaneously (and reversibly) to all categories in a dataset with the “Cut” class.
HErmes.selection provides the following submodules:
- categories : Container classes for variables.
- dataset : Grouping categories together.
- cut : Apply selection criteria on variables in a category.
- variables : Variable definition. Harvest variables from files.
- magic_keywords : A bunch of fixed names for automatic weight calculation.
-
HErmes.selection.
load_dataset
(config, variables=None, max_cpu_cores=6, only_nfiles=None, dtype=<class 'numpy.float64'>)[source]¶ Read a json configuration file and load a dataset populated with variables from the files given in the configuration file.
Parameters: config (str/dict) – json style config file or dict
Keyword Arguments: - variables (list) – list of strings of variable names to read out
- max_cpu_cores (int) – maximum number of cpu cores to use for the variable readout
- only_nfiles (int) – read out only ‘only_nfiles’ files
- dtype (np.dtype) – cast to the given datatype. By default it will always be double (np.float64); however, often it is advisable to downcast to a less precise type to save memory.
Returns: HErmes.selection.dataset.Dataset
HErmes.utils package¶
Submodules¶
HErmes.utils.files module¶
Locate files on the filesystem and group them together
-
HErmes.utils.files.
DS_ID
(filename)¶
-
HErmes.utils.files.
ENDING
(filename)¶
-
HErmes.utils.files.
EXP_RUN_ID
(filename)¶
-
HErmes.utils.files.
GCD
(filename)¶
-
HErmes.utils.files.
SIM_RUN_ID
(filename)¶
-
HErmes.utils.files.
check_hdf_integrity
(infiles, checkfor=None)[source]¶ Checks if hdf files can be opened and returns a tuple integer_files, corrupt_files
Parameters: infiles (list) – Keyword Arguments: checkfor (str) –
-
HErmes.utils.files.
group_names_by_regex
(names, regex=<function <lambda>>, firstpattern=<function <lambda>>, estimate_first=<function <lambda>>)[source]¶ Generate lists of files which all share the same name pattern, grouped by regex
Parameters: names (list) – a list of file names
Keyword Arguments: - regex (func) – a regex to group by
- firstpattern (func) – the leading element of each list
- estimate_first (func) – if there are several elements which match firstpattern, estimate which is the first
Returns: names grouped by regex with the first pattern as leading element
Return type:
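The grouping idea can be sketched with the standard re module (group_by_regex_sketch and the run-number pattern are hypothetical examples, not the library's defaults):

```python
import re
from collections import defaultdict

def group_by_regex_sketch(names, regex=re.compile(r"Run(\d+)")):
    # files whose names share the same regex match (here a run number)
    # end up in the same list
    groups = defaultdict(list)
    for name in names:
        match = regex.search(name)
        key = match.group(1) if match else None
        groups[key].append(name)
    return dict(groups)

grouped = group_by_regex_sketch(
    ["Run001_part0.hdf", "Run001_part1.hdf", "Run002_part0.hdf"])
```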
-
HErmes.utils.files.
harvest_files
(path, ending='.bz2', sanitizer=<function <lambda>>, use_ls=False, prefix='dcap://')[source]¶ Get all the files with a specific ending from a certain path
Parameters: path (str) – a path on the filesystem to look for files
Keyword Arguments: Returns: All files in path which match ending and are filtered by sanitizer
Return type:
-
HErmes.utils.files.
strip_all_endings
(filename)[source]¶ Split a filename at the first dot and declare everything which comes after it and consists of 3 or 4 characters (including the dot) as “ending”
Parameters: filename (str) – a filename which shall be split Returns: file basename + ending Return type: list
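A sketch of the described splitting rule (strip_all_endings_sketch is a hypothetical re-implementation; the handling of names whose suffixes are not 3-4 characters is an assumption):

```python
def strip_all_endings_sketch(filename):
    # split at the first dot; accept the split only if every dot-separated
    # piece after it is an "ending" of 3 or 4 characters (dot included)
    base, dot, rest = filename.partition(".")
    pieces = rest.split(".") if rest else []
    if pieces and all(3 <= len(p) + 1 <= 4 for p in pieces):
        return [base, "." + rest]
    return [filename, ""]

parts = strip_all_endings_sketch("run_0042.hdf.bz2")
```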
Module contents¶
Miscellaneous tools
Module contents¶
A package for filtering datasets as common in high energy physics. Read data from hdf or root files, classify the data into different categories and provide an easy interface to access the variables stored in the files.
The HErmes modules provides the following submodules:
- selection : Start from a .json configuration file to create a full-fledged dataset which acts as a container for data in different categories.
- utils : Aggregator for files and logging.
- fitting : Fit models to variable distributions with iminuit.
- visual : Data visualization.
- icecube_goodies : Weighting for icecube datasets.
- analysis : convenient functions for data analysis and working with distributions.