HErmes.selection package¶
Submodules¶
HErmes.selection.categories module¶
Categories of data, like “signal” or “background”, etc.
-
class
HErmes.selection.categories.
AbstractBaseCategory
(name)[source]¶ Bases:
object
Stands for a specific type of data, e.g. detector data in a specific configuration, simulated data etc.
-
add_cut
(cut)[source]¶ Add a cut without applying it yet
Parameters: cut (pyevsel.variables.cut.Cut) – Append this cut to the internal cutlist
-
add_livetime_weighted
(other, self_livetime=None, other_livetime=None)[source]¶ Combine two datasets livetime weighted. If it is simulated data, then in general it does not know about the detector livetime. In this case the livetimes for the two datasets can be given
Parameters: other (pyevsel.categories.Category) – Add this dataset
Keyword Arguments:
-
add_plotoptions
(options)[source]¶ Add options on how to plot this category. If available, they will be used.
Parameters: options (dict) – For the names which are currently supported, please see the example file
-
add_variable
(variable)[source]¶ Add a variable to this category
Parameters: variable (pyevsel.variables.variables.Variable) – A Variable instance
-
apply_cuts
(inplace=False)[source]¶ Apply the added cuts.
Keyword Arguments: inplace (bool) – If True, cut the internal variable buffer (cannot be undone unless the variable is reloaded)
-
delete_variable
(varname)[source]¶ Remove a variable entirely from the category
Parameters: varname (str) – The name of the variable as stored in self.variable dict Returns: None
-
distribution
(varname, bins=None, color=None, alpha=0.5, fig=None, xlabel=None, norm=False, filled=None, legend=True, style='line', log=False, transform=None, extra_weights=None, figure_factory=None, return_histo=False)[source]¶ Plot the distribution of variable in the category
Parameters: varname (str) – The name of the variable in the category
Keyword Arguments: - bins (int/np.ndarray) – Bins for the distribution
- color (str/int) – A color identifier, either number 0-5 or matplotlib compatible
- alpha (float) – 0-1 alpha value for histogram
- fig (matplotlib.figure.Figure) – Canvas for plotting, if None an empty one will be created
- xlabel (str) – xlabel for the plot. If None, default is used
- norm (str) – “n” or “density” - make normed histogram
- style (str) – Either “line” or “scatter”
- filled (bool) – Draw filled histogram
- legend (bool) – if available, plot a legend
- transform (callable) – Apply transformation to the data before plotting
- log (bool) – Plot yaxis in log scale
- extra_weights (numpy.ndarray) – Use this for weighting. Will overwrite any other weights in the dataset
- figure_factory (func) – Must return a single matplotlib.Figure, NOTE: figure_factory has priority over fig keyword
- return_histo (bool) – Return the histogram instead of the figure. WARNING: changes return type!
Returns: matplotlib.figure.Figure or dashi.histogram.hist1d
-
distribution2d
(varnames, bins=None, figure_factory=None, fig=None, norm=False, log=True, cmap=<Mock name='mock.get_cmap()' id='140553631437208'>, interpolation='gaussian', cblabel='events', weights=None, despine=False, return_histo=False)[source]¶ Draw a 2d distribution of 2 variables in the same category.
Parameters: varnames (tuple(str, str)) – The names of the variables in the category
Keyword Arguments: - bins (tuple(int/np.ndarray)) – Bins for the distribution
- cmap – A colormap
- alpha (//) – 0-1 alpha value for histogram
- fig (matplotlib.figure.Figure) – Canvas for plotting, if None an empty one will be created
- xlabel (//) – xlabel for the plot. If None, default is used
- norm (str) – “n” or “density” - make normed histogram
- style (//) – Either “line” or “scatter”
- transform (callable) – Apply transformation to the data before plotting
- log (bool) – Plot yaxis in log scale
- figure_factory (func) – Must return a single matplotlib.Figure, NOTE: figure_factory has priority over fig keyword
- return_histo (bool) – Return the histogram instead of the figure. WARNING: changes return type!
Returns: matplotlib.figure.Figure or dashi.histogram.hist1d
-
explore_files
()[source]¶ Get a sneak preview of what variables are available for readout
Returns: list
-
get
(varkey, uncut=False)[source]¶ Retrieve the data of a variable
Parameters: varkey (str) – The name of the variable Keyword Arguments: uncut (bool) – never return cut values
-
get_files
(*args, **kwargs)[source]¶ Load files for this category. Uses HErmes.utils.files.harvest_files
Parameters: *args (list of strings) – Path to possible files
Keyword Arguments: - datasets (dict(dataset_id: nfiles)) – if given, load only files from dataset dataset_id and set the nfiles parameter to the amount of L2 files the loaded files will represent
- force (bool) – forcibly reload filelist (pre-readout vars will be lost)
- append (bool) – keep the already acquired files and only append the new ones
- all other kwargs will be passed to utils.files.harvest_files
-
harvested
¶
-
integrated_rate
¶ Calculate the total eventrate of this category (requires weights)
Returns (tuple): rate and quadratic error
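As a rough illustration of what "rate and quadratic error" can mean for weighted events, the sketch below sums the per-event weights and takes the square root of the summed squared weights. This convention is an assumption, not a copy of the HErmes implementation, and `integrated_rate_sketch` is a hypothetical helper.

```python
import math


def integrated_rate_sketch(weights):
    """Return (rate, error) for a sequence of per-event weights.

    Assumption: the rate is the sum of the weights and the "quadratic
    error" is sqrt(sum of squared weights), a common convention for
    weighted event counts.
    """
    rate = math.fsum(weights)
    error = math.sqrt(math.fsum(w * w for w in weights))
    return rate, error


rate, error = integrated_rate_sketch([0.5, 0.5, 1.0, 2.0])
```

With unit weights this reduces to the familiar sqrt(N) counting error.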
-
load_vardefs
(module)[source]¶ Load the variable definitions from a module
Parameters: module (python module) – Needs to contain variable definitions
-
raw_count
¶ Gives a number of “how many events are actually there”
Returns: int
-
read_variables
(names=None, max_cpu_cores=6)[source]¶ Harvest the variables in self.vardict
Keyword Arguments:
-
variablenames
¶
-
weights
¶
-
weightvarname
= None¶
-
-
class
HErmes.selection.categories.
CombinedCategory
(name, categories)[source]¶ Bases:
object
Create a combined category out of several others. This is mainly useful for plotting. FIXME: should this inherit from category as well? The difference compared to the dataset is that this is flat.
-
add_plotoptions
(options)[source]¶ Add options on how to plot this category. If available, they will be used.
Parameters: options (dict) – For the names which are currently supported, please see the example file
-
integrated_rate
¶ Calculate the total eventrate of this category (requires weights)
Returns (tuple): rate and quadratic error
-
vardict
¶
-
weights
¶
-
-
class
HErmes.selection.categories.
Data
(name)[source]¶ Bases:
HErmes.selection.categories.AbstractBaseCategory
An interface to real time event data. Simplified weighting only.
-
calculate_weights
(model=None, model_args=None)[source]¶ Calculate weights as rate, that is number of events per livetime
Keyword Args: for compatibility…
-
estimate_livetime
(force=False)[source]¶ Calculate the livetime from run start/stop times, account for gaps
Keyword Arguments: force (bool) – override existing livetime
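One way to picture a livetime estimate from run start/stop times is sketched below. It assumes that every event carries the start and stop time of its run, so each run interval must be counted only once; `estimate_livetime_sketch` is a hypothetical helper, not the library's code.

```python
def estimate_livetime_sketch(run_starts, run_stops):
    """Sum the duration of each distinct (start, stop) run interval.

    Assumption: each event stores the start and stop time of its run, so
    the same interval appears once per event and is de-duplicated here.
    Gaps between runs are excluded because only the run intervals
    themselves are summed.
    """
    intervals = set(zip(run_starts, run_stops))
    return sum(stop - start for start, stop in intervals)


# Three events from a run lasting 0-100 s, two from a run lasting 150-200 s.
starts = [0.0, 0.0, 0.0, 150.0, 150.0]
stops = [100.0, 100.0, 100.0, 200.0, 200.0]
livetime = estimate_livetime_sketch(starts, stops)
```

The 50 s gap between the two runs does not contribute to the result.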
-
livetime
¶
-
set_livetime
(livetime)[source]¶ Override the private _livetime member
Parameters: livetime – The time needed for data-taking Returns: None
-
set_run_start_stop
(runstart_var=<Variable: None>, runstop_var=<Variable: None>)[source]¶ Let the simulation category know which are the parameters describing the primary
Keyword Arguments: - runstart_var (pyevsel.variables.variables.Variable/str) – beginning of a run
- runstop_var (pyevsel.variables.variables.Variable/str) – end of a run
-
-
class
HErmes.selection.categories.
ReweightedSimulation
(name, mother)[source]¶ Bases:
HErmes.selection.categories.Simulation
A proxy for simulation dataset, when only the weighting differs
-
add_livetime_weighted
(other)[source]¶ Combine two datasets livetime weighted. If it is simulated data, then in general it does not know about the detector livetime. In this case the livetimes for the two datasets can be given
Parameters: other (pyevsel.categories.Category) – Add this dataset
Keyword Arguments:
-
datasets
¶
-
files
¶
-
get
(varname, uncut=False)[source]¶ Retrieve the data of a variable
Parameters: varname (str) – The name of the variable Keyword Arguments: uncut (bool) – never return cut values
-
harvested
¶
-
mother
¶
-
raw_count
¶ Gives a number of “how many events are actually there”
Returns: int
-
read_mc_primary
(energy_var='mc_p_en', type_var='mc_p_ty', zenith_var='mc_p_ze', weight_var='mc_p_we')[source]¶ Trigger the readout of MC Primary information. Rename variables to magic keywords if necessary.
Keyword Arguments:
-
read_variables
(names=None, max_cpu_cores=6)[source]¶ Harvest the variables in self.vardict
Keyword Arguments:
-
setter
(other)¶
-
vardict
¶
-
-
class
HErmes.selection.categories.
Simulation
(name, weightvarname=None)[source]¶ Bases:
HErmes.selection.categories.AbstractBaseCategory
An interface to variables from simulated data. Allows weighting of the events.
-
calculate_weights
(model=None, model_args=None)[source]¶ Walk the variables of this category and identify the weighting variables and calculate them.
Usage example: calculate_weights(model=lambda x: np.power(x, -2.), model_args=["primary_energy"])
Keyword Arguments: - model (func) – The target flux to weight to, if None, generated flux is used for weighting
- model_args (list) – The variables the model should be applied to from the variable dict
Returns: np.ndarray
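The interplay of model and model_args can be pictured with the following stand-alone sketch. It only shows how a flux model might be evaluated on the variables named in model_args. The full weight calculation (for example the normalisation to the generated flux) is not shown, and `power_law` and the `variables` dict are hypothetical stand-ins.

```python
def power_law(energy, index=-2.0):
    """Toy target flux, proportional to energy**index."""
    return energy ** index


# Stand-in for the variable dict of a category (hypothetical data).
variables = {"primary_energy": [1.0, 10.0, 100.0]}
model_args = ["primary_energy"]

# Evaluate the model event by event on the variables named in model_args.
columns = [variables[name] for name in model_args]
flux = [power_law(*event) for event in zip(*columns)]
```

A named function is used here instead of a lambda so that the model stays picklable if it is ever passed to worker processes.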
-
livetime
¶
-
mc_p_readout
¶
-
-
HErmes.selection.categories.
cut_with_nans
(data, cutmask)[source]¶ Cut the individual fields of a 2d array and keep the shape by filling up with nans
Parameters: - data (np.ndarray) – The array to cut
- cutmask (np.ndarray) – Cut with this boolean array
Returns: data with applied cuts
Return type: np.ndarray
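The idea of cutting while keeping the shape can be illustrated without numpy. The sketch below uses nested lists and assumes that the mask has the same shape as the data and is applied element by element; `cut_with_nans_sketch` is a hypothetical helper, not the function documented above.

```python
import math


def cut_with_nans_sketch(data, cutmask):
    """Replace entries where cutmask is False by NaN, keeping the shape.

    Assumption: data and cutmask are equally shaped nested lists and the
    mask is applied element by element.
    """
    return [
        [value if keep else math.nan for value, keep in zip(row, mask_row)]
        for row, mask_row in zip(data, cutmask)
    ]


data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
mask = [[True, False, True], [False, True, True]]
cut = cut_with_nans_sketch(data, mask)
```

Entries that fail the cut become NaN, so rows keep their length and downstream code can still index by event.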
HErmes.selection.cut module¶
Remove the part of the data which falls below a certain criterion.
HErmes.selection.dataset module¶
Datasets group categories together. Method calls on datasets invoke the individual methods on the individual categories. Cuts applied to datasets will act on each individual category.
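The relay behaviour described above can be sketched with two toy classes. `ToyCategory` and `ToyDataset` are hypothetical stand-ins that only mimic the delegation pattern; they are not the HErmes classes.

```python
class ToyCategory:
    """Hypothetical stand-in for a category; it only tracks added cuts."""

    def __init__(self, name):
        self.name = name
        self.cuts = []

    def add_cut(self, cut):
        self.cuts.append(cut)


class ToyDataset:
    """Hypothetical stand-in for a dataset that relays calls."""

    def __init__(self, *categories):
        self.categories = list(categories)

    def add_cut(self, cut):
        # Forward the call to every category in the dataset.
        for category in self.categories:
            category.add_cut(cut)


signal = ToyCategory("signal")
background = ToyCategory("background")
dataset = ToyDataset(signal, background)
dataset.add_cut("energy > 100")
```

After the call, both categories hold the same cut, which is the sense in which a cut on a dataset acts on each individual category.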
-
class
HErmes.selection.dataset.
Dataset
(*args, **kwargs)[source]¶ Bases:
object
Holds different categories, relays calls to each of them.
-
add_category
(category)[source]¶ Add another category to the dataset
Parameters: category (HErmes.selection.categories.Category) – add this category
-
add_cut
(cut)[source]¶ Add a cut without applying it yet
Parameters: cut (HErmes.selection.variables.cut.Cut) – Append this cut to the internal cutlist
-
add_variable
(variable)[source]¶ Add a variable to this category
Parameters: variable (HErmes.selection.variables.variables.Variable) – A Variable instance
-
calc_ratio
(nominator=None, denominator=None)[source]¶ Calculate a ratio of the given categories
Parameters: Returns: tuple
-
calculate_weights
(model=None, model_args=None)[source]¶ Calculate the weights for all categories
Keyword Arguments: - model (dict/func) – Either a dict catname -> func or a single func. If it is a single func, it will be applied to all categories
- model_args (dict/list) – variable names as arguments for the function
-
categorynames
¶
-
combined_categorynames
¶
-
delete_variable
(varname)[source]¶ Delete a variable entirely from the dataset
Parameters: varname (str) – the name of the variable Returns: None
-
distribution
(name, ratio=([], []), cumulative=True, log=False, transform=None, color_palette='dark', normalized=False, styles={}, style='classic', ylabel='rate/bin [1/s]', axis_properties=None, ratiolabel='data/$\\Sigma$ bg', bins=None, external_weights=None, figure_factory=None)[source]¶ One shot short-cut for one of the most used plots in eventselections
Parameters: name (string) – The name of the variable to plot
Keyword Arguments: - path (str) – The path under which the plot will be saved.
- ratio (list) – A ratio plot of these categories will be created
- color_palette (str) – A predefined color palette (from seaborn or HErmes.plotting.colors)
- normalized (bool) – Normalize the histogram by number of events
- transform (callable) – Apply this transformation before plotting
- styles (dict) – plot styling options
- ylabel (str) – general label for y-axis
- ratiolabel (str) – different label for the ratio part of the plot
- bins (np.ndarray) – binning, if None binning will be deduced from the variable definition
- figure_factory (func) – factory function which return a matplotlib.Figure
- style (string) – TODO “modern” || “classic” || “modern-cumul” || “classic-cumul”
- external_weights (dict) – supply external weights - this will OVERRIDE ANY INTERNALLY CALCULATED WEIGHTS and use the supplied weights instead. Must be in the form {"categoryname": weights}
- axis_properties (dict) – Manually define a plot layout with up to three axes. For example, it can look like this:
    {
        "top": {"type": "h",       # histogram
                "height": 0.4,     # height in percent
                "index": 2},       # used internally
        "center": {"type": "r",    # ratio plot
                   "height": 0.2,
                   "index": 1},
        "bottom": {"type": "c",    # cumulative histogram
                   "height": 0.2,
                   "index": 0}
    }
Returns: HErmes.selection.variables.VariableDistributionPlot
-
files
¶
-
get_category
(categoryname)[source]¶ Get a reference to a category.
Parameters: categoryname – A name which has to be associated to a category Returns: HErmes.selection.categories.Category
-
get_sparsest_category
(omit_empty_cat=True)[source]¶ Find out which category of the dataset has the least statistical power
Keyword Arguments: omit_empty_cat (bool) – if a category has no entries at all, omit Returns: category name Return type: str
-
get_variable
(varname)[source]¶ Get a pandas dataframe for all categories
Parameters: varname (str) – A name of a variable Returns: A 2d dataframe category -> variable Return type: pandas.DataFrame
-
integrated_rate
¶ Integrated rate for each category
Returns: rate with error Return type: pandas.Panel
-
load_vardefs
(vardefs)[source]¶ Load the variable definitions from a module
Parameters: vardefs (python module/dict) – A module needs to contain variable definitions. It can also be a dictionary of categoryname->module
-
read_variables
(names=None, max_cpu_cores=6)[source]¶ Read out the variable for all categories
Keyword Arguments: Returns: None
-
set_default_plotstyles
(styledict)[source]¶ Define a standard for each category how it should appear in plots
Parameters: styledict (dict) –
-
set_livetime
(livetime)[source]¶ Define a livetime for this dataset.
Parameters: livetime (float) – Time interval the data was taken in. (Used for rate calculation) Returns: None
-
set_weightfunction
(weightfunction=<function Dataset.<lambda>>)[source]¶ Defines a function which is used for weighting
Parameters: weightfunction (func or dict) – if a func is provided, set it for all categories; if needed, provide a dict cat.name -> func for individual settings Returns: None
-
sum_rate
(categories=None)[source]¶ Sum up the integrated rates for categories
Parameters: categories – categories considered background Returns: rate with error Return type: tuple
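Summing rates with errors can be sketched as follows. The sketch assumes that the categories are statistically independent, so that the errors add in quadrature; `sum_rate_sketch` is a hypothetical helper and may differ from what the library does internally.

```python
import math


def sum_rate_sketch(rates_with_errors):
    """Sum (rate, error) pairs; errors are combined in quadrature.

    Assumption: the categories are statistically independent.
    """
    total = math.fsum(rate for rate, _ in rates_with_errors)
    error = math.sqrt(math.fsum(err * err for _, err in rates_with_errors))
    return total, error


total, error = sum_rate_sketch([(3.0, 0.3), (4.0, 0.4)])
```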
-
tinytable
(signal=None, background=None, layout='v', format='html', order_by=<function Dataset.<lambda>>, livetime=1.0)[source]¶ Use dashi.tinytable.TinyTable to render a nice html representation of a rate table
Parameters: Returns: formatted table in desired markup
Return type:
-
variablenames
¶
-
weights
¶ Get the weights for all categories in this dataset
-
HErmes.selection.magic_keywords module¶
All magic keywords shall summon here
HErmes.selection.variables module¶
Container classes for variables
-
class
HErmes.selection.variables.
AbstractBaseVariable
[source]¶ Bases:
object
Read out tagged numerical data from files
-
ROLES
¶ alias of
VariableRole
-
bins
¶
-
calculate_fd_bins
()[source]¶ Calculate a reasonable binning
Returns: Freedman Diaconis bins Return type: numpy.ndarray
-
data
¶
-
harvest
(*files)[source]¶ Hook to the harvest method. Don’t use in case of multiprocessing!
Parameters: *files – walk through these files and read them out
-
harvested
¶
-
ndim
¶
-
-
class
HErmes.selection.variables.
CompoundVariable
(name, variables=None, label='', bins=None, operation=<function CompoundVariable.<lambda>>, role=<VariableRole.SCALAR: 10>)[source]¶ Bases:
HErmes.selection.variables.AbstractBaseVariable
Calculate a variable from other variables. This kind of variable will not read any file.
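The idea of deriving one variable from others can be sketched in a few lines. `compound_sketch` and `ratio` are hypothetical helpers which only illustrate applying an operation event by event to already loaded data; they do not reproduce the CompoundVariable class.

```python
def compound_sketch(operation, *variable_data):
    """Combine the data of several variables event by event.

    Hypothetical helper: `operation` takes one value per input variable.
    """
    return [operation(*values) for values in zip(*variable_data)]


def ratio(charge, length):
    """Example operation: charge per unit length."""
    return charge / length


charge = [10.0, 30.0, 8.0]
length = [2.0, 5.0, 4.0]
charge_per_length = compound_sketch(ratio, charge, length)
```

No file is touched here, which mirrors the statement that this kind of variable is computed from other variables only.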
-
class
HErmes.selection.variables.
Variable
(name, definitions=None, bins=None, label='', transform=<function Variable.<lambda>>, role=<VariableRole.SCALAR: 10>, nevents=None, reduce_dimension=None)[source]¶ Bases:
HErmes.selection.variables.AbstractBaseVariable
A hook to a single variable read out from a file
-
class
HErmes.selection.variables.
VariableList
(name, variables=None, label='', bins=None, role=<VariableRole.SCALAR: 10>)[source]¶ Bases:
HErmes.selection.variables.AbstractBaseVariable
A list of variables. Cannot be read out from files.
-
data
¶
-
-
class
HErmes.selection.variables.
VariableRole
[source]¶ Bases:
enum.Enum
Define roles for variables. Some variables used in a special context (like weights) are easily recognizable by this flag.
-
ARRAY
= 20¶
-
ENDTIME
= 70¶
-
EVENTID
= 50¶
-
GENERATORWEIGHT
= 30¶
-
RUNID
= 40¶
-
SCALAR
= 10¶
-
STARTIME
= 60¶
-
UNKNOWN
= 0¶
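How such role flags can be used to pick out special variables is sketched below. `VariableRoleSketch` is a stand-in that mirrors the member values listed above, and the `roles` mapping is hypothetical example data.

```python
import enum


class VariableRoleSketch(enum.Enum):
    """Stand-in mirroring the documented member values."""

    UNKNOWN = 0
    SCALAR = 10
    ARRAY = 20
    GENERATORWEIGHT = 30
    RUNID = 40
    EVENTID = 50
    STARTIME = 60
    ENDTIME = 70


# Hypothetical mapping of variable names to their roles.
roles = {
    "energy": VariableRoleSketch.SCALAR,
    "pulses": VariableRoleSketch.ARRAY,
    "gen_weight": VariableRoleSketch.GENERATORWEIGHT,
}

# Select all variables that are flagged as generator weights.
weight_names = [
    name for name, role in roles.items()
    if role is VariableRoleSketch.GENERATORWEIGHT
]
```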
-
-
HErmes.selection.variables.
extract_from_root
(filename, definitions, nevents=None, reduce_dimension=None)[source]¶ Use the uproot system to get information from rootfiles. Supports a basic tree-like structure of primitive datatypes.
Parameters: Keyword Arguments:
-
HErmes.selection.variables.
freedman_diaconis_bins
(data, leftedge, rightedge, minbins=20, maxbins=70, fallbackbins=70)[source]¶ Get a number of bins for a histogram following Freedman/Diaconis
Parameters: Returns: number of bins, minbins < bins < maxbins
Return type: nbins (int)
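The Freedman-Diaconis rule sets the bin width to 2 * IQR * n**(-1/3), where IQR is the interquartile range and n the number of data points. The sketch below applies the rule using only the standard library. Clamping the result to [minbins, maxbins] and returning fallbackbins when no width can be computed are assumptions about the behaviour, and `fd_bins_sketch` is a hypothetical helper.

```python
import math
import statistics


def fd_bins_sketch(data, leftedge, rightedge,
                   minbins=20, maxbins=70, fallbackbins=70):
    """Number of histogram bins following the Freedman-Diaconis rule.

    Assumptions: the result is clamped to [minbins, maxbins], and
    fallbackbins is returned when the bin width cannot be computed
    (fewer than two data points or zero interquartile range).
    """
    if len(data) < 2:
        return fallbackbins
    q1, _, q3 = statistics.quantiles(data, n=4)
    width = 2.0 * (q3 - q1) * len(data) ** (-1.0 / 3.0)
    if width <= 0:
        return fallbackbins
    nbins = math.ceil((rightedge - leftedge) / width)
    return max(minbins, min(maxbins, nbins))


nbins = fd_bins_sketch(list(range(10000)), 0, 9999)
```

Small samples tend to give very few bins under this rule, which is where the lower limit matters.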
-
HErmes.selection.variables.
harvest
(filenames, definitions, **kwargs)[source]¶ Read variables from files into memory. Will be used by HErmes.selection.variables.Variable.harvest. This will be run multi-threaded. Keep that in mind: arguments have to be picklable, and everything which is read out must be picklable as well. Lambda functions are NOT picklable
Parameters: - filenames (list) – the files to extract the variables from. currently supported: hdf
- definitions (list) – where to find the data in the files. They usually have some tree-like structure, so this is a list of leaf-value pairs. If there is more than one, all of them will be tried (as it might be that in some files a different naming scheme was used). Example: [("hello_reconstruction", "x"), ("hello_reconstruction", "y")]
Keyword Arguments: - transformation (func) – After the data is read out from the files, transformation will be applied, e.g. the log to the energy.
- fill_empty (bool) – Fill empty fields with zeros
- nevents (int) – ROOT only - read out only nevents from the files
- reduce_dimension (str) – ROOT only - multidimensional data can be reduced by only using the index given by reduce_dimension. E.g. in case of a TVector3, if we want to have only x, that would be 0, y -> 1 and z -> 2.
- precision (int) – Precision in bit. FIXME: Not implemented yet!
Returns: pd.Series or pd.DataFrame
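The picklability requirement can be checked up front with the standard library. The sketch below uses a hypothetical helper `is_picklable` to show that a named, importable function such as math.log10 can be pickled, while a lambda cannot.

```python
import math
import pickle


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


# A named, importable function can be handed to worker processes ...
named_ok = is_picklable(math.log10)
# ... while a lambda cannot be pickled by the standard pickle module.
lambda_ok = is_picklable(lambda x: x)
```

A transformation like the log of the energy is therefore best passed as a named function rather than as a lambda.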
Module contents¶
Provides containers for in-memory variables. These containers are called “categories”, and they represent a set of variables for a certain type of data. Categories can be further grouped into “Datasets”. Variables can be read out from files and stored in memory in the form of numpy arrays or pandas Series/DataFrames. Selection criteria can be applied simultaneously (and reversibly) to all categories in a dataset with the “Cut” class.
HErmes.selection provides the following submodules:
- categories : Container classes for variables.
- dataset : Grouping categories together.
- cut : Apply selection criteria on variables in a category.
- variables : Variable definition. Harvest variables from files.
- magic_keywords : A bunch of fixed names for automatic weight calculation.
-
HErmes.selection.
load_dataset
(config, variables=None, max_cpu_cores=6)[source]¶ Read a json configuration file and load a dataset populated with variables from the files given in the configuration file.
Parameters: config (str/dict) – json style config file or dict
Keyword Arguments: Returns: HErmes.selection.dataset.Dataset