Contexts

The context is a class from strax and is used everywhere in straxen.

Below, all of the context's functions are shown, including the mini-analyses.

Contexts documentation

Auto-generated documentation of all the context functions, including the mini-analyses.

class strax.context.Context(storage=None, config=None, register=None, register_all=None, **kwargs)[source]

Bases: object

Context for strax analysis.

A context holds info on HOW to process data, such as which plugins provide what data types, where to store which results, and configuration options for the plugins.

You start all strax processing through a context.

accumulate(run_id: str, targets: Tuple[str] | List[str], fields=None, function=None, store_first_for_others=True, function_takes_fields=False, **kwargs)[source]

Return a dictionary with the sum of the result of get_array.

Parameters:
  • function

    Apply this function to the array before summing the results. Will be called as function(array), where array is a chunk of the get_array result. Should return either:

    • A scalar or 1d array -> accumulated result saved under ‘result’

    • A record array or dict -> fields accumulated individually

    • None -> nothing accumulated

    If not provided, the identity function is used.

    NB: Additionally and independently, if there are any functions registered under context_config[‘apply_data_function’], these are applied first, directly after loading the data.

  • fields – Fields of the function output to accumulate. If not provided, all output fields will be accumulated.

  • store_first_for_others – if True (default), for fields included in the data but not in fields, store the first value seen in the data (if any value is seen).

  • function_takes_fields – If True, function will be called as function(data, fields) instead of function(data).

All other options are as for get_iter.

Return dictionary:

Dictionary with the accumulated result; see function and store_first_for_others arguments. Four fields are always added:

start: start time of the first processed chunk
end: end time of the last processed chunk
n_chunks: number of chunks in run
n_rows: number of data entries in run
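The bookkeeping that accumulate performs can be sketched with plain numpy. This is a minimal stand-in, not straxen itself: the chunks and field names (time, endtime, area) are illustrative only.

```python
import numpy as np

# Toy stand-in for the chunks that get_iter would yield for one run;
# the field names here are illustrative, not real straxen data types.
dtype = [("time", "i8"), ("endtime", "i8"), ("area", "f8")]
chunks = [
    np.array([(0, 10, 1.0), (10, 20, 2.0)], dtype=dtype),
    np.array([(20, 30, 3.0)], dtype=dtype),
]

result = {
    "start": int(chunks[0]["time"][0]),     # start of the first processed chunk
    "end": int(chunks[-1]["endtime"][-1]),  # end of the last processed chunk
    "n_chunks": len(chunks),
    "n_rows": sum(len(c) for c in chunks),
    # With the identity function, a numeric field is summed chunk by chunk:
    "area": sum(float(c["area"].sum()) for c in chunks),
}
```

The real method streams chunks via get_iter, so the full array never needs to fit in memory.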

classmethod add_method(f)[source]

Add f as a new Context method.

apply_cmt_version(cmt_global_version: str) None

Sets all the relevant correction variables.

Parameters:

cmt_global_version – A specific CMT global version, or ‘latest’ to get the newest one

apply_xedocs_configs(db='straxen_db', **kwargs) None
available_for_run(run_id: str, include_targets: None | list | tuple | str = None, exclude_targets: None | list | tuple | str = None, pattern_type: str = 'fnmatch') DataFrame

For a given single run, check for each target whether it is stored. Targets that are never stored anyway are excluded.

Parameters:
  • run_id – requested run

  • include_targets – targets to include, e.g. raw_records, raw_records* or *_nv. If multiple targets (e.g. a list) are provided, the target should match any of the arguments!

  • exclude_targets – targets to exclude, e.g. raw_records, raw_records* or *_nv. If multiple targets (e.g. a list) are provided, the target should match none of the arguments!

  • pattern_type – either ‘fnmatch’ (Unix filename pattern matching) or ‘re’ (Regular expression operations).

Returns:

Table of available data per target
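The two pattern_type modes map onto the standard library, which can be illustrated directly. The target names below are just examples:

```python
import fnmatch
import re

# Hypothetical target names, for illustration only.
targets = ["raw_records", "raw_records_nv", "peak_basics", "event_info"]

# pattern_type='fnmatch': Unix filename pattern matching.
included = [t for t in targets if fnmatch.fnmatch(t, "raw_records*")]

# pattern_type='re': regular expression matching (anchored at the start).
kept = [t for t in targets if not re.match(r".*_nv", t)]
```

Note that fnmatch patterns are implicitly anchored to the whole name, while re.match only anchors at the beginning of the string.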

compare_metadata(data1, data2, return_results=False)[source]

Compare the metadata between two strax data.

Parameters:
  • data1, data2 – either a list (tuple) of run_id + target pair, a path to a metadata JSON file, or a dictionary of the metadata to compare.

  • return_results – bool, if True, returns a dictionary with the metadata and lineages found for the inputs and does not perform the comparison.

example usage:

context.compare_metadata(("053877", "peak_basics"), "./my_path_to/JSONfile.json")

first_metadata = context.get_metadata(run_id, "events")
context.compare_metadata(("053877", "peak_basics"), first_metadata)

context.compare_metadata(("053877", "records"), ("053899", "records"))

results_dict = context.compare_metadata(
    ("053877", "peak_basics"), ("053877", "events_info"),
    return_results=True)

config: dict
context_config: dict
copy_to_frontend(run_id: str, target: str, target_frontend_id: int | None = None, target_compressor: str | None = None, rechunk: bool = False, rechunk_to_mb: int = 200)[source]

Copy data from one frontend to another.

Parameters:
  • run_id – run_id

  • target – target datakind

  • target_frontend_id – index of the frontend that the data should go to in context.storage. If no index is specified, try all.

  • target_compressor – if specified, recompress with this compressor.

  • rechunk – allow re-chunking for saving

  • rechunk_to_mb – rechunk to specified target size. Only works if rechunk is True.

daq_plot(run_id: str, **kwargs)

Plot peaks and records, with records sorted by “link” or “ADC ID” (other items are also possible as long as they are in the channel map). This is a straxen mini-analysis. The method takes run_id as its only positional argument, and additional arguments through keywords only.

The function requires the data types: . Unless you specify this through data_kind = array keyword arguments, this data will be loaded automatically.

The function takes the same selection arguments as context.get_array:

Parameters:
  • selection – Query string, sequence of strings, or simple function to apply. The function must take a single argument which represents the structured numpy array of the loaded data.

  • selection_str – Same as selection (deprecated)

  • keep_columns – Array field/dataframe column names to keep. Useful to reduce amount of data in memory. (You can only specify either keep or drop column.)

  • drop_columns – Array field/dataframe column names to drop. Useful to reduce amount of data in memory. (You can only specify either keep or drop column.)

  • time_range – (start, stop) range to load, in ns since the epoch

  • seconds_range – (start, stop) range of seconds since the start of the run to load.

  • time_within – row of strax data (e.g. event) to use as time range

  • time_selection – Kind of time selection to apply: - fully_contained: (default) select things fully contained in the range - touching: select things that (partially) overlap with the range - skip: Do not select a time range, even if other arguments say so

  • _chunk_number – For internal use: return data from one chunk.

  • progress_bar – Display a progress bar if metadata exists.

  • multi_run_progress_bar – Display a progress bar for loading multiple runs
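The three time_selection modes can be illustrated with plain interval logic. The (time, endtime) rows and the query range below are toy values in ns:

```python
# Toy (time, endtime) rows and a query range, all in ns:
rows = [(0, 5), (4, 12), (10, 15)]
start, stop = 3, 13

# fully_contained (the default): rows entirely inside the range.
fully_contained = [r for r in rows if r[0] >= start and r[1] <= stop]

# touching: rows that (partially) overlap with the range.
touching = [r for r in rows if r[0] < stop and r[1] > start]

# skip would apply no time selection at all.
```

fully_contained is stricter than touching: a row overlapping only one edge of the range is kept by touching but dropped by fully_contained.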

data_info(data_name: str) DataFrame[source]

Return pandas DataFrame describing fields in data_name.

define_run(name: str, data: ndarray | DataFrame | dict | list | tuple, from_run: str | None = None)

Function for defining new superruns from a list of run_ids.

Note:

The function also allows creating a superrun from data (numpy.arrays/pandas.DataFrames). However, this is currently not supported on the data loading side.

Parameters:
  • name – Name/run_id of the superrun. Superrun names must start with an underscore.

  • data – Data from which the superrun should be created. Can be either one of the following: a tuple/list of run_ids or a numpy.array/pandas.DataFrame containing some data.

  • from_run – List of run_ids which were used to create the numpy.array/pandas.DataFrame passed in data.

dependency_tree(target='event_info', dump_plot=True, to_dir='./', format='svg')
deregister_plugins_with_missing_dependencies()[source]

Deregister plugins in case a data_type the plugin depends on is not provided by any other plugin.

estimate_run_start_and_end(run_id, targets=None)[source]

Return run start and end time in ns since epoch.

This fetches from run metadata, and if this fails, it estimates it using data metadata from the targets or the underlying data-types (if it is stored).

event_display(run_id: str, **kwargs)
Make a waveform-display of a given event. Requires events, peaks and peaklets (optionally: records). NB: time selection should return only one event!

Parameters:
  • context – strax.Context provided by the minianalysis wrapper

  • run_id – run-id of the event

  • events – events, provided by the minianalysis wrapper

  • to_pe – gains, provided by the minianalysis wrapper

  • records_matrix – False (no record matrix), True, or “raw” (show raw-record matrix)

  • s2_fuzz – extra time around main S2 [ns]

  • s1_fuzz – extra time around main S1 [ns]

  • max_peaks – max peaks for plotting in the wf plot

  • xenon1t – True: is 1T, False: is nT

  • display_peak_info – tuple, items that will be extracted from event and displayed in the event info panel see above for format

  • display_event_info – tuple, items that will be extracted from event and displayed in the peak info panel see above for format

  • s1_hp_kwargs – dict, optional kwargs for S1 hitpatterns

  • s2_hp_kwargs – dict, optional kwargs for S2 hitpatterns

  • event_time_limit – overrides x-axis limits of the event plot

  • plot_all_positions – if True, plot best-fit positions from all posrec algorithms

Returns:

axes used for plotting: ax_s1, ax_s2, ax_s1_hp_t, ax_s1_hp_b, ax_event_info, ax_peak_info, ax_s2_hp_t, ax_s2_hp_b, ax_ev, ax_rec, where those panels (axes) are:

  • ax_s1, main S1 peak

  • ax_s2, main S2 peak

  • ax_s1_hp_t, S1 top hit pattern

  • ax_s1_hp_b, S1 bottom hit pattern

  • ax_s2_hp_t, S2 top hit pattern

  • ax_s2_hp_b, S2 bottom hit pattern

  • ax_event_info, text info on the event

  • ax_peak_info, text info on the main S1 and S2

  • ax_ev, waveform of the entire event

  • ax_rec, (raw) record matrix (if any, otherwise None)

This is a straxen mini-analysis. The method takes run_id as its only positional argument, and additional arguments through keywords only.

The function requires the data types: event_info. Unless you specify this through data_kind = array keyword arguments, this data will be loaded automatically.

The function takes the same selection arguments as context.get_array:

Parameters:
  • selection – Query string, sequence of strings, or simple function to apply. The function must take a single argument which represents the structured numpy array of the loaded data.

  • selection_str – Same as selection (deprecated)

  • keep_columns – Array field/dataframe column names to keep. Useful to reduce amount of data in memory. (You can only specify either keep or drop column.)

  • drop_columns – Array field/dataframe column names to drop. Useful to reduce amount of data in memory. (You can only specify either keep or drop column.)

  • time_range – (start, stop) range to load, in ns since the epoch

  • seconds_range – (start, stop) range of seconds since the start of the run to load.

  • time_within – row of strax data (e.g. event) to use as time range

  • time_selection – Kind of time selection to apply: - fully_contained: (default) select things fully contained in the range - touching: select things that (partially) overlap with the range - skip: Do not select a time range, even if other arguments say so

  • _chunk_number – For internal use: return data from one chunk.

  • progress_bar – Display a progress bar if metadata exists.

  • multi_run_progress_bar – Display a progress bar for loading multiple runs

event_display_interactive(run_id: str, **kwargs)

Interactive event display for XENONnT. Plots detailed main/alt S1/S2, bottom and top PMT hit pattern as well as all other peaks in a given event.

Parameters:
  • bottom_pmt_array – If true plots bottom PMT array hit-pattern.

  • only_main_peaks – If true plots only main peaks into detail plots as well as PMT arrays.

  • only_peak_detail_in_wf – Only plots main/alt S1/S2 into waveform. Only plot main peaks if only_main_peaks is true.

  • plot_all_pmts – Bool if True, colors switched off PMTs instead of showing them in gray, useful for graphs shown in talks.

  • plot_record_matrix – If true, the record matrix is plotted below the waveform.

  • plot_records_threshold – Threshold at which zoom level to display record matrix as polygons. Larger values may lead to longer render times since more polygons are shown.

  • xenon1t – Flag to use event display with 1T data.

  • colors – Colors to be used for peaks. The order follows the peak types: 0 = Unknown, 1 = S1, 2 = S2. Can be any colors accepted by bokeh.

  • yscale – Defines the scale for main/alt S1 == 0, main/alt S2 == 1, waveform plot == 2. Please note that the log scale can lead to funny glyph renders for small values.

  • log – If true, a log color scale is used for the hit pattern plots.

example:

from IPython.core.display import display, HTML
display(HTML("<style>.container { width:80% !important; }</style>"))
import bokeh.plotting as bklt
fig = st.event_display_interactive(
    run_id,
    time_range=(event['time'], event['endtime'])
)
bklt.show(fig)
Raises:

Raises an error if the user queries a time range which contains more than a single event.

Returns:

bokeh.plotting.figure instance.

This is a straxen mini-analysis. The method takes run_id as its only positional argument, and additional arguments through keywords only.

The function requires the data types: event_basics, peaks, peak_basics, peak_positions. Unless you specify this through data_kind = array keyword arguments, this data will be loaded automatically.

The function takes the same selection arguments as context.get_array:

Parameters:
  • selection – Query string, sequence of strings, or simple function to apply. The function must take a single argument which represents the structured numpy array of the loaded data.

  • selection_str – Same as selection (deprecated)

  • keep_columns – Array field/dataframe column names to keep. Useful to reduce amount of data in memory. (You can only specify either keep or drop column.)

  • drop_columns – Array field/dataframe column names to drop. Useful to reduce amount of data in memory. (You can only specify either keep or drop column.)

  • time_range – (start, stop) range to load, in ns since the epoch

  • seconds_range – (start, stop) range of seconds since the start of the run to load.

  • time_within – row of strax data (e.g. event) to use as time range

  • time_selection – Kind of time selection to apply: - fully_contained: (default) select things fully contained in the range - touching: select things that (partially) overlap with the range - skip: Do not select a time range, even if other arguments say so

  • _chunk_number – For internal use: return data from one chunk.

  • progress_bar – Display a progress bar if metadata exists.

  • multi_run_progress_bar – Display a progress bar for loading multiple runs

event_display_simple(run_id: str, **kwargs)

Straxen mini-analysis for which someone was too lazy to write a proper docstring. This is a straxen mini-analysis. The method takes run_id as its only positional argument, and additional arguments through keywords only.

The function requires the data types: event_info. Unless you specify this through data_kind = array keyword arguments, this data will be loaded automatically.

The function takes the same selection arguments as context.get_array:

Parameters:
  • selection – Query string, sequence of strings, or simple function to apply. The function must take a single argument which represents the structured numpy array of the loaded data.

  • selection_str – Same as selection (deprecated)

  • keep_columns – Array field/dataframe column names to keep. Useful to reduce amount of data in memory. (You can only specify either keep or drop column.)

  • drop_columns – Array field/dataframe column names to drop. Useful to reduce amount of data in memory. (You can only specify either keep or drop column.)

  • time_range – (start, stop) range to load, in ns since the epoch

  • seconds_range – (start, stop) range of seconds since the start of the run to load.

  • time_within – row of strax data (e.g. event) to use as time range

  • time_selection – Kind of time selection to apply: - fully_contained: (default) select things fully contained in the range - touching: select things that (partially) overlap with the range - skip: Do not select a time range, even if other arguments say so

  • _chunk_number – For internal use: return data from one chunk.

  • progress_bar – Display a progress bar if metadata exists.

  • multi_run_progress_bar – Display a progress bar for loading multiple runs

event_scatter(run_id: str, **kwargs)

Plot a (cS1, cS2) event scatter plot.

Parameters:
  • show_single – Show events with only S1s or only S2s just besides the axes.

  • s – Scatter size

  • color_dim – Dimension to use for the color. Must be in event_info.

  • color_range – Minimum and maximum color value to show.

  • figsize – (w, h) figure size to use, or leave None to not make a new matplotlib figure.

This is a straxen mini-analysis. The method takes run_id as its only positional argument, and additional arguments through keywords only.

The function requires the data types: event_info. Unless you specify this through data_kind = array keyword arguments, this data will be loaded automatically.

The function takes the same selection arguments as context.get_array:

Parameters:
  • selection – Query string, sequence of strings, or simple function to apply. The function must take a single argument which represents the structured numpy array of the loaded data.

  • selection_str – Same as selection (deprecated)

  • keep_columns – Array field/dataframe column names to keep. Useful to reduce amount of data in memory. (You can only specify either keep or drop column.)

  • drop_columns – Array field/dataframe column names to drop. Useful to reduce amount of data in memory. (You can only specify either keep or drop column.)

  • time_range – (start, stop) range to load, in ns since the epoch

  • seconds_range – (start, stop) range of seconds since the start of the run to load.

  • time_within – row of strax data (e.g. event) to use as time range

  • time_selection – Kind of time selection to apply: - fully_contained: (default) select things fully contained in the range - touching: select things that (partially) overlap with the range - skip: Do not select a time range, even if other arguments say so

  • _chunk_number – For internal use: return data from one chunk.

  • progress_bar – Display a progress bar if metadata exists.

  • multi_run_progress_bar – Display a progress bar for loading multiple runs

extract_latest_comment()

Extract the latest comment in the runs-database. This just adds info to st.runs.

Example:

st.extract_latest_comment()
st.select_runs(available=('raw_records'))

get_array(run_id: str | tuple | list, targets, save=(), max_workers=None, **kwargs) ndarray[source]

Compute target for run_id and return as numpy array.

Parameters:
  • run_id – run id to get

  • targets – list/tuple of strings of data type names to get

  • save – extra data types you would like to save to cache, if they occur in intermediate computations. Many plugins save automatically anyway.

  • max_workers – Number of worker threads/processes to spawn. In practice more CPUs may be used due to strax’s multithreading.

  • allow_multiple – Allow multiple targets to be computed simultaneously without merging the results of the target. This can be used when mass producing plugins that are not of the same datakind. Don’t try to use this in get_array or get_df because the data is not returned.

  • add_run_id_field – Boolean whether to add a run_id field in case of multi-runs.

  • run_id_as_bytes – Boolean, if True uses a byte string instead of a unicode string for the run_id added to a multi-run array. This can save a lot of memory when loading many runs.

  • selection – Query string, sequence of strings, or simple function to apply. The function must take a single argument which represents the structured numpy array of the loaded data.

  • selection_str – Same as selection (deprecated)

  • keep_columns – Array field/dataframe column names to keep. Useful to reduce amount of data in memory. (You can only specify either keep or drop column.)

  • drop_columns – Array field/dataframe column names to drop. Useful to reduce amount of data in memory. (You can only specify either keep or drop column.)

  • time_range – (start, stop) range to load, in ns since the epoch

  • seconds_range – (start, stop) range of seconds since the start of the run to load.

  • time_within – row of strax data (e.g. event) to use as time range

  • time_selection – Kind of time selection to apply: - fully_contained: (default) select things fully contained in the range - touching: select things that (partially) overlap with the range - skip: Do not select a time range, even if other arguments say so

  • _chunk_number – For internal use: return data from one chunk.

  • progress_bar – Display a progress bar if metadata exists.

  • multi_run_progress_bar – Display a progress bar for loading multiple runs
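A function-based selection can be sketched on a plain structured array standing in for the loaded data. The fields (event_number, s1_area) and the cut are illustrative, not real straxen names:

```python
import numpy as np

# Toy structured array standing in for loaded data; fields are illustrative.
data = np.array(
    [(1, 50.0), (2, 150.0), (3, 250.0)],
    dtype=[("event_number", "i8"), ("s1_area", "f8")],
)

# A function selection must take the structured array and return a boolean mask:
select = lambda d: d["s1_area"] > 100
selected = data[select(data)]
```

A query string such as 's1_area > 100' would express the same cut in pandas-style syntax.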

get_components(run_id: str, targets=(), save=(), time_range=None, chunk_number=None) ProcessorComponents[source]

Return components for setting up a processor.

Parameters:
  • run_id – run id to get

  • targets – list/tuple of strings of data type names to get

  • save – extra data types you would like to save to cache, if they occur in intermediate computations. Many plugins save automatically anyway.

  • max_workers – Number of worker threads/processes to spawn. In practice more CPUs may be used due to strax’s multithreading.

  • allow_multiple – Allow multiple targets to be computed simultaneously without merging the results of the target. This can be used when mass producing plugins that are not of the same datakind. Don’t try to use this in get_array or get_df because the data is not returned.

  • add_run_id_field – Boolean whether to add a run_id field in case of multi-runs.

  • run_id_as_bytes – Boolean, if True uses a byte string instead of a unicode string for the run_id added to a multi-run array. This can save a lot of memory when loading many runs.

  • selection – Query string, sequence of strings, or simple function to apply. The function must take a single argument which represents the structured numpy array of the loaded data.

  • selection_str – Same as selection (deprecated)

  • keep_columns – Array field/dataframe column names to keep. Useful to reduce amount of data in memory. (You can only specify either keep or drop column.)

  • drop_columns – Array field/dataframe column names to drop. Useful to reduce amount of data in memory. (You can only specify either keep or drop column.)

  • time_range – (start, stop) range to load, in ns since the epoch

  • seconds_range – (start, stop) range of seconds since the start of the run to load.

  • time_within – row of strax data (e.g. event) to use as time range

  • time_selection – Kind of time selection to apply: - fully_contained: (default) select things fully contained in the range - touching: select things that (partially) overlap with the range - skip: Do not select a time range, even if other arguments say so

  • _chunk_number – For internal use: return data from one chunk.

  • progress_bar – Display a progress bar if metadata exists.

  • multi_run_progress_bar – Display a progress bar for loading multiple runs

get_df(run_id: str | tuple | list, targets, save=(), max_workers=None, **kwargs) DataFrame[source]

Compute target for run_id and return as pandas DataFrame.

Parameters:
  • run_id – run id to get

  • targets – list/tuple of strings of data type names to get

  • save – extra data types you would like to save to cache, if they occur in intermediate computations. Many plugins save automatically anyway.

  • max_workers – Number of worker threads/processes to spawn. In practice more CPUs may be used due to strax’s multithreading.

  • allow_multiple – Allow multiple targets to be computed simultaneously without merging the results of the target. This can be used when mass producing plugins that are not of the same datakind. Don’t try to use this in get_array or get_df because the data is not returned.

  • add_run_id_field – Boolean whether to add a run_id field in case of multi-runs.

  • run_id_as_bytes – Boolean, if True uses a byte string instead of a unicode string for the run_id added to a multi-run array. This can save a lot of memory when loading many runs.

  • selection – Query string, sequence of strings, or simple function to apply. The function must take a single argument which represents the structured numpy array of the loaded data.

  • selection_str – Same as selection (deprecated)

  • keep_columns – Array field/dataframe column names to keep. Useful to reduce amount of data in memory. (You can only specify either keep or drop column.)

  • drop_columns – Array field/dataframe column names to drop. Useful to reduce amount of data in memory. (You can only specify either keep or drop column.)

  • time_range – (start, stop) range to load, in ns since the epoch

  • seconds_range – (start, stop) range of seconds since the start of the run to load.

  • time_within – row of strax data (e.g. event) to use as time range

  • time_selection – Kind of time selection to apply: - fully_contained: (default) select things fully contained in the range - touching: select things that (partially) overlap with the range - skip: Do not select a time range, even if other arguments say so

  • _chunk_number – For internal use: return data from one chunk.

  • progress_bar – Display a progress bar if metadata exists.

  • multi_run_progress_bar – Display a progress bar for loading multiple runs
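get_df returns the same data as get_array, but as a pandas DataFrame; the conversion amounts to one column per structured-array field. The array and field names below are a toy sketch:

```python
import numpy as np
import pandas as pd

# Toy structured array as get_array would return; fields are illustrative.
arr = np.array(
    [(0, 10, 1.5), (10, 20, 2.5)],
    dtype=[("time", "i8"), ("endtime", "i8"), ("area", "f8")],
)

# One DataFrame column per structured-array field:
df = pd.DataFrame.from_records(arr)
```

Use get_array when you want to stay in numpy (e.g. for fields pandas handles poorly); use get_df for pandas-style analysis.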

get_iter(run_id: str, targets: Tuple[str] | List[str], save=(), max_workers=None, time_range=None, seconds_range=None, time_within=None, time_selection='fully_contained', selection=None, selection_str=None, keep_columns=None, drop_columns=None, allow_multiple=False, progress_bar=True, _chunk_number=None, **kwargs) Iterator[Chunk][source]

Compute target for run_id and iterate over results.

Do NOT interrupt the iterator (i.e. break): it will keep running stuff in background threads…

Parameters:
  • run_id – run id to get

  • targets – list/tuple of strings of data type names to get

  • save – extra data types you would like to save to cache, if they occur in intermediate computations. Many plugins save automatically anyway.

  • max_workers – Number of worker threads/processes to spawn. In practice more CPUs may be used due to strax’s multithreading.

  • allow_multiple – Allow multiple targets to be computed simultaneously without merging the results of the target. This can be used when mass producing plugins that are not of the same datakind. Don’t try to use this in get_array or get_df because the data is not returned.

  • add_run_id_field – Boolean whether to add a run_id field in case of multi-runs.

  • run_id_as_bytes – Boolean, if True uses a byte string instead of a unicode string for the run_id added to a multi-run array. This can save a lot of memory when loading many runs.

  • selection – Query string, sequence of strings, or simple function to apply. The function must take a single argument which represents the structured numpy array of the loaded data.

  • selection_str – Same as selection (deprecated)

  • keep_columns – Array field/dataframe column names to keep. Useful to reduce amount of data in memory. (You can only specify either keep or drop column.)

  • drop_columns – Array field/dataframe column names to drop. Useful to reduce amount of data in memory. (You can only specify either keep or drop column.)

  • time_range – (start, stop) range to load, in ns since the epoch

  • seconds_range – (start, stop) range of seconds since the start of the run to load.

  • time_within – row of strax data (e.g. event) to use as time range

  • time_selection – Kind of time selection to apply: - fully_contained: (default) select things fully contained in the range - touching: select things that (partially) overlap with the range - skip: Do not select a time range, even if other arguments say so

  • _chunk_number – For internal use: return data from one chunk.

  • progress_bar – Display a progress bar if metadata exists.

  • multi_run_progress_bar – Display a progress bar for loading multiple runs
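The safe consumption pattern (run the iterator to completion, never break out of it) can be sketched with a plain generator standing in for get_iter; fake_get_iter and its chunks are stand-ins, not straxen API:

```python
import numpy as np

def fake_get_iter():
    # Stand-in for context.get_iter(run_id, targets): yields data chunk by chunk.
    for lo in (0, 10, 20):
        yield np.arange(lo, lo + 10)

# Consume the iterator to completion instead of breaking out of it;
# the real get_iter keeps background threads running when interrupted.
n_rows = 0
for chunk in fake_get_iter():
    n_rows += len(chunk)
```

This chunk-wise loop is what accumulate, get_array and get_df build on internally.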

get_meta(run_id, target) dict[source]

Return metadata for target for run_id, or raise DataNotAvailable if data is not yet available.

Parameters:
  • run_id – run id to get

  • target – data type to get

get_metadata(run_id, target) dict

Return metadata for target for run_id, or raise DataNotAvailable if data is not yet available.

Parameters:
  • run_id – run id to get

  • target – data type to get

get_save_when(target: str) SaveWhen | int[source]

For a given plugin, get the save_when attribute, either a dict or a number.

get_single_plugin(run_id, data_name)[source]

Return a single fully initialized plugin that produces data_name for run_id.

For use in custom processing.

get_source(run_id: str, target: str, check_forbidden: bool = True) set | None[source]

For a given run_id and target, get the stored bases from which we can start processing; if no base is available, return None.

Parameters:
  • run_id – run_id

  • target – target

  • check_forbidden – Check that we are not requesting to create a plugin that the context forbids.

Returns:

set of plugin names that are needed to start processing from and are needed in order to build this target.

get_source_sf(run_id, target, should_exist=False)[source]

Get the source storage frontends for a given run_id and target.

Parameters:
  • run_id, target – the run_id and target to get the source storage frontends for

  • should_exist – Raise a ValueError if we cannot find one (e.g. we already checked the data is stored)

Returns:

list of strax.StorageFrontend (when should_exist is False)

get_zarr(run_ids, targets, storage='./strax_temp_data', progress_bar=False, overwrite=True, **kwargs)[source]

Get persistent arrays using zarr. This is useful when loading large amounts of data that cannot fit in memory; zarr is very compatible with dask. Targets are loaded into separate arrays and runs are merged. The data is added to any existing data in the storage location.

Parameters:
  • run_ids – (Iterable) Run ids you wish to load.

  • targets – (Iterable) targets to load.

  • storage – (str, optional) fsspec path to store array. Defaults to ‘./strax_temp_data’.

  • overwrite – (boolean, optional) whether to overwrite existing arrays for targets at given path.

Return zarr.Group:

zarr group containing the persistent arrays available at the storage location after loading the requested data. The runs loaded into a given array can be seen in the array’s .attrs[‘RUNS’] field.

hvdisp_plot_peak_waveforms(run_id: str, **kwargs)

Plot the sum waveforms of peaks.

Parameters:
  • width – Plot width in pixels

  • show_largest – Maximum number of peaks to show

  • time_dim – Holoviews time dimension; will create a new one if not provided.

This is a straxen mini-analysis. The method takes run_id as its only positional argument, and additional arguments through keywords only.

The function requires the data types: peaks, peak_basics. Unless you specify this through data_kind = array keyword arguments, this data will be loaded automatically.

The function takes the same selection arguments as context.get_array:

Parameters:
  • selection – Query string, sequence of strings, or simple function to apply. The function must take a single argument which represents the structured numpy array of the loaded data.

  • selection_str – Same as selection (deprecated)

  • keep_columns – Array field/dataframe column names to keep. Useful to reduce amount of data in memory. (You can only specify either keep or drop column.)

  • drop_columns – Array field/dataframe column names to drop. Useful to reduce amount of data in memory. (You can only specify either keep or drop column.)

  • time_range – (start, stop) range to load, in ns since the epoch

  • seconds_range – (start, stop) range of seconds since the start of the run to load.

  • time_within – row of strax data (e.g. event) to use as time range

  • time_selection – Kind of time selection to apply: - fully_contained: (default) select things fully contained in the range - touching: select things that (partially) overlap with the range - skip: Do not select a time range, even if other arguments say so

  • _chunk_number – For internal use: return data from one chunk.

  • progress_bar – Display a progress bar if metedata exists.

  • multi_run_progress_bar – Display a progress bar for loading multiple runs
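The three time_selection modes listed above can be illustrated with plain Python intervals (in ns). This mimics the documented semantics only; it is not strax's internal code.

```python
# Sketch of the documented time_selection semantics on (time, endtime) rows.
def select(rows, time_range, time_selection="fully_contained"):
    start, stop = time_range
    if time_selection == "skip":
        return rows  # no time selection at all
    if time_selection == "fully_contained":
        return [r for r in rows if start <= r["time"] and r["endtime"] <= stop]
    if time_selection == "touching":
        return [r for r in rows if r["time"] < stop and r["endtime"] > start]
    raise ValueError(f"Unknown time_selection {time_selection!r}")

rows = [
    {"time": 0, "endtime": 5},    # fully inside [0, 10)
    {"time": 8, "endtime": 12},   # overlaps the right edge
    {"time": 20, "endtime": 25},  # outside
]
contained = select(rows, (0, 10))             # only the first row
touching = select(rows, (0, 10), "touching")  # first two rows
everything = select(rows, (0, 10), "skip")    # all three rows
```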

hvdisp_plot_pmt_pattern(run_id: str, **kwargs)

Plot a PMT array, with colors showing the intensity of light observed in the time range.

Parameters:
  • array – ‘top’ or ‘bottom’, array to show.

This is a straxen mini-analysis. The method takes run_id as its only positional argument, and additional arguments through keywords only.

The function requires the data types: records. Unless you specify this through data_kind = array keyword arguments, this data will be loaded automatically.

The function takes the same selection arguments as context.get_array:

Parameters:
  • selection – Query string, sequence of strings, or simple function to apply. The function must take a single argument which represents the structured numpy array of the loaded data.

  • selection_str – Same as selection (deprecated)

  • keep_columns – Array field/dataframe column names to keep. Useful to reduce the amount of data in memory. (You can only specify either keep_columns or drop_columns.)

  • drop_columns – Array field/dataframe column names to drop. Useful to reduce the amount of data in memory. (You can only specify either keep_columns or drop_columns.)

  • time_range – (start, stop) range to load, in ns since the epoch

  • seconds_range – (start, stop) range of seconds since the start of the run to load

  • time_within – row of strax data (e.g. an event) to use as the time range

  • time_selection – Kind of time selection to apply:
    - fully_contained: (default) select things fully contained in the range
    - touching: select things that (partially) overlap with the range
    - skip: do not select a time range, even if other arguments say so

  • _chunk_number – For internal use: return data from one chunk.

  • progress_bar – Display a progress bar if metadata exists.

  • multi_run_progress_bar – Display a progress bar for loading multiple runs.
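The keep_columns / drop_columns behaviour described above (mutually exclusive, both used to trim memory) can be sketched on a plain dict of columns; real strax operates on structured numpy arrays.

```python
# Sketch of the documented keep_columns / drop_columns semantics.
def apply_columns(data, keep_columns=None, drop_columns=None):
    if keep_columns and drop_columns:
        raise ValueError("Specify either keep_columns or drop_columns, not both")
    if keep_columns:
        return {k: v for k, v in data.items() if k in keep_columns}
    if drop_columns:
        return {k: v for k, v in data.items() if k not in drop_columns}
    return data

data = {"time": [0, 1], "area": [10.0, 20.0], "channel": [3, 7]}
small = apply_columns(data, keep_columns=["time", "area"])
```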

hvdisp_plot_records_2d(run_id: str, **kwargs)

Plot records in a dynamic 2D histogram of (time, pmt)

Parameters:
  • width – Plot width in pixels

  • time_stream – Holoviews RangeX stream to use. If provided, we assume records have already been converted to points (which hopefully is what the stream is derived from)

  • tools – Tools to be used in the interactive plot. Only works with bokeh as plot library.

  • plot_library – Library to be used for the plotting; defaults to bokeh.

  • width – Width of the record matrix in pixels.

  • hooks – Hooks to adjust plot settings.

Returns:

datashader object, records holoview points, RangeX time stream of records.

This is a straxen mini-analysis. The method takes run_id as its only positional argument, and additional arguments through keywords only.

The function requires the data types: records. Unless you specify this through data_kind = array keyword arguments, this data will be loaded automatically.

The function takes the same selection arguments as context.get_array:

Parameters:
  • selection – Query string, sequence of strings, or simple function to apply. The function must take a single argument which represents the structured numpy array of the loaded data.

  • selection_str – Same as selection (deprecated)

  • keep_columns – Array field/dataframe column names to keep. Useful to reduce the amount of data in memory. (You can only specify either keep_columns or drop_columns.)

  • drop_columns – Array field/dataframe column names to drop. Useful to reduce the amount of data in memory. (You can only specify either keep_columns or drop_columns.)

  • time_range – (start, stop) range to load, in ns since the epoch

  • seconds_range – (start, stop) range of seconds since the start of the run to load

  • time_within – row of strax data (e.g. an event) to use as the time range

  • time_selection – Kind of time selection to apply:
    - fully_contained: (default) select things fully contained in the range
    - touching: select things that (partially) overlap with the range
    - skip: do not select a time range, even if other arguments say so

  • _chunk_number – For internal use: return data from one chunk.

  • progress_bar – Display a progress bar if metadata exists.

  • multi_run_progress_bar – Display a progress bar for loading multiple runs.

is_stored(run_id, target, detailed=False, **kwargs)[source]

Return whether data type target has been saved for run_id through any of the registered storage frontends.

Note that even if False is returned, the data type may still be made with a trivial computation.
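Conceptually, is_stored asks each registered storage frontend in turn whether it holds the (run, target) combination. The sketch below illustrates that idea with stand-in frontend objects; it is not strax's StorageFrontend API.

```python
# Conceptual sketch of is_stored over multiple storage frontends.
class DictFrontend:
    def __init__(self, contents):
        self.contents = contents  # set of (run_id, target) pairs

    def has(self, run_id, target):
        return (run_id, target) in self.contents

def is_stored(frontends, run_id, target):
    # True as soon as any registered frontend holds the data
    return any(f.has(run_id, target) for f in frontends)

frontends = [
    DictFrontend({("012882", "raw_records")}),
    DictFrontend({("012882", "peak_basics")}),
]
stored = is_stored(frontends, "012882", "peak_basics")   # True
missing = is_stored(frontends, "012882", "event_info")   # False
```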

key_for(run_id, target)[source]

Get the DataKey for a given run and a given target plugin. The DataKey is inferred from the plugin lineage. The lineage can come either from the _fixed_plugin_cache or computed on the fly.

Parameters:
  • run_id – run id to get

  • target – data type to get

Returns:

strax.DataKey of the target
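The DataKey idea (identify data by run, data type, and the plugin lineage, so differently-configured processing never collides) can be sketched as below. The hashing scheme shown is illustrative only, not strax's actual one.

```python
# Sketch: a key built from run id, data type, and a hash of the lineage dict.
import hashlib
import json

def data_key(run_id, data_type, lineage):
    lineage_hash = hashlib.sha256(
        json.dumps(lineage, sort_keys=True).encode()
    ).hexdigest()[:12]
    return f"{run_id}-{data_type}-{lineage_hash}"

# Hypothetical lineage entries: (plugin class, version, config options)
key = data_key("012882", "peak_basics",
               {"peak_basics": ("PeakBasics", "1.0.0", {"some_option": 5})})
other = data_key("012882", "peak_basics",
                 {"peak_basics": ("PeakBasics", "1.0.0", {"some_option": 6})})
# A changed option changes the lineage hash, hence the key.
```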

keys_for_runs(target: str, run_ids: ndarray | list | tuple | str) List[DataKey]

Get the data-keys for a multitude of runs. If use_per_run_defaults is False, which it preferably is (#246), getting many keys should be fast, as we only compute the lineage once.

Parameters:
  • run_ids – Runs to get datakeys for

  • target – datatype requested

Returns:

list of datakeys of the target for the given runs.
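The speedup described above comes from the lineage being identical for every run when per-run defaults are off, so it is computed once and reused. A minimal pure-Python illustration:

```python
# Sketch: compute the lineage a single time, reuse it for every run's key.
def keys_for_runs(run_ids, target, compute_lineage):
    lineage = compute_lineage(target)  # computed once, not per run
    return [f"{run_id}-{target}-{lineage}" for run_id in run_ids]

calls = []
def compute_lineage(target):
    calls.append(target)  # record how often the expensive step runs
    return "abc123"

keys = keys_for_runs(["r0", "r1", "r2"], "event_info", compute_lineage)
```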

lineage(run_id, data_type)[source]

Return lineage dictionary for data_type and run_id, based on the options in this context.

list_available(target, runs=None, **kwargs) list

Return sorted list of run_id’s for which target is available.

Parameters:
  • target – Data type to check

  • runs – Runs to check. If None, check all runs.
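list_available amounts to collecting the sorted run ids for which the target is stored. The sketch below reuses the stand-in storage idea rather than strax internals.

```python
# Sketch of list_available: sorted run ids with the target stored.
def list_available(stored_pairs, target, runs):
    return sorted(r for r in runs if (r, target) in stored_pairs)

stored_pairs = {("002", "peak_basics"), ("001", "peak_basics"), ("003", "events")}
avail = list_available(stored_pairs, "peak_basics", ["001", "002", "003"])
```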

load_corrected_positions(run_id: str, **kwargs)

Returns the corrected position for each position algorithm available, without the need to reprocess event_basics, as the needed information is already stored in event_basics.

Parameters:
  • alt_s1 – False by default, if True it uses alternative S1 as main one

  • alt_s2 – False by default, if True it uses alternative S2 as main one

  • cmt_version – CMT version to use (it can be a list of same length as posrec_algos, if different versions are required for different posrec algorithms, default ‘local_ONLINE’)

  • posrec_algos – list of position reconstruction algorithms to use (default [‘mlp’, ‘gcn’, ‘cnn’])

This is a straxen mini-analysis. The method takes run_id as its only positional argument, and additional arguments through keywords only.

The function requires the data types: event_basics. Unless you specify this through data_kind = array keyword arguments, this data will be loaded automatically.

The function takes the same selection arguments as context.get_array:

Parameters:
  • selection – Query string, sequence of strings, or simple function to apply. The function must take a single argument which represents the structured numpy array of the loaded data.

  • selection_str – Same as selection (deprecated)

  • keep_columns – Array field/dataframe column names to keep. Useful to reduce the amount of data in memory. (You can only specify either keep_columns or drop_columns.)

  • drop_columns – Array field/dataframe column names to drop. Useful to reduce the amount of data in memory. (You can only specify either keep_columns or drop_columns.)

  • time_range – (start, stop) range to load, in ns since the epoch

  • seconds_range – (start, stop) range of seconds since the start of the run to load

  • time_within – row of strax data (e.g. an event) to use as the time range

  • time_selection – Kind of time selection to apply:
    - fully_contained: (default) select things fully contained in the range
    - touching: select things that (partially) overlap with the range
    - skip: do not select a time range, even if other arguments say so

  • _chunk_number – For internal use: return data from one chunk.

  • progress_bar – Display a progress bar if metadata exists.

  • multi_run_progress_bar – Display a progress bar for loading multiple runs.
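Given the posrec_algos default above, per-algorithm position fields could be assembled as below. The field-name pattern is an assumption for illustration; check event_basics for the real field names.

```python
# Sketch: build per-algorithm position field names (pattern is hypothetical).
def position_fields(posrec_algos=("mlp", "gcn", "cnn")):
    return [f"s2_{ax}_{algo}" for algo in posrec_algos for ax in ("x", "y")]

fields = position_fields()
```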

make(run_id: str | tuple | list, targets, save=(), max_workers=None, _skip_if_built=True, **kwargs) None[source]

Compute target for run_id. Returns nothing (None).

Parameters:
  • run_id – run id to get

  • targets – list/tuple of strings of data type names to get

  • save – extra data types you would like to save to cache, if they occur in intermediate computations. Many plugins save automatically anyway.

  • max_workers – Number of worker threads/processes to spawn. In practice more CPUs may be used due to strax’s multithreading.

  • allow_multiple – Allow multiple targets to be computed simultaneously without merging the results of the target. This can be used when mass producing plugins that are not of the same datakind. Don’t try to use this in get_array or get_df because the data is not returned.

  • add_run_id_field – Boolean: whether to add a run_id field in case of multi-runs.

  • run_id_as_bytes – Boolean: if True, use a byte string instead of a unicode string for the run_id added to a multi-run array. This can save a lot of memory when loading many runs.

  • selection – Query string, sequence of strings, or simple function to apply. The function must take a single argument which represents the structured numpy array of the loaded data.

  • selection_str – Same as selection (deprecated)

  • keep_columns – Array field/dataframe column names to keep. Useful to reduce the amount of data in memory. (You can only specify either keep_columns or drop_columns.)

  • drop_columns – Array field/dataframe column names to drop. Useful to reduce the amount of data in memory. (You can only specify either keep_columns or drop_columns.)

  • time_range – (start, stop) range to load, in ns since the epoch

  • seconds_range – (start, stop) range of seconds since the start of the run to load

  • time_within – row of strax data (e.g. an event) to use as the time range

  • time_selection – Kind of time selection to apply:
    - fully_contained: (default) select things fully contained in the range
    - touching: select things that (partially) overlap with the range
    - skip: do not select a time range, even if other arguments say so

  • _chunk_number – For internal use: return data from one chunk.

  • progress_bar – Display a progress bar if metadata exists.

  • multi_run_progress_bar – Display a progress bar for loading multiple runs.
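The contract of make, computing and storing the targets while returning None, can be sketched with stand-in storage; real strax runs the plugin graph and saves via its storage frontends.

```python
# Sketch of the make() contract: compute, store, return nothing.
def make(storage, run_id, targets, compute):
    for target in targets:
        if (run_id, target) not in storage:
            storage[(run_id, target)] = compute(run_id, target)
    return None  # make never returns data; use get_array/get_df for that

storage = {}
result = make(storage, "012882", ["peak_basics"], lambda r, t: [1, 2, 3])
```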

new_context(storage=(), config=None, register=None, register_all=None, replace=False, **kwargs)[source]

Return a new context with new settings added to those in this context.

Parameters:
  • replace – If True, replaces settings rather than adding them. See Context.__init__ for documentation on other parameters.
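The effect of the replace flag can be illustrated on plain config dicts: by default new settings are merged into the existing ones, while replace=True supersedes them entirely. The option names below are made up for illustration; this is not strax's Context code.

```python
# Sketch of new_context's replace flag on a plain config dict.
def new_config(old, new, replace=False):
    if replace:
        return dict(new)      # new settings supersede the old ones entirely
    merged = dict(old)
    merged.update(new)        # default: new settings are added/overlaid
    return merged

old = {"some_option": 2, "other_option": True}   # hypothetical option names
added = new_config(old, {"some_option": 3})
replaced = new_config(old, {"some_option": 3}, replace=True)
```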

plot_energy_spectrum(run_id: str, **kwargs)

Plot an energy spectrum histogram, with 1 sigma Poisson confidence intervals around it.

Parameters:
  • exposure_kg_sec – Exposure in kg * sec

  • unit – Unit to plot spectrum in. Can be either:
    - events (events per bin)
    - kg_day_kev (events per kg day keV)
    - tonne_day_kev (events per tonne day keV)
    - tonne_year_kev (events per tonne year keV)
    Defaults to kg_day_kev if exposure_kg_sec is provided, otherwise events.

  • min_energy – Minimum energy of the histogram

  • max_energy – Maximum energy of the histogram

  • geomspace – If True, will use a logarithmic energy binning. Otherwise will use a linear scale.

  • n_bins – Number of energy bins to use

  • color – Color to plot in

  • label – Label for the line

  • error_alpha – Alpha value for the statistical error band

  • errors – Type of errors to draw, passed to ‘errors’ argument of Hist1d.plot.

This is a straxen mini-analysis. The method takes run_id as its only positional argument, and additional arguments through keywords only.

The function requires the data types: event_info. Unless you specify this through data_kind = array keyword arguments, this data will be loaded automatically.

The function takes the same selection arguments as context.get_array:

Parameters:
  • selection – Query string, sequence of strings, or simple function to apply. The function must take a single argument which represents the structured numpy array of the loaded data.

  • selection_str – Same as selection (deprecated)

  • keep_columns – Array field/dataframe column names to keep. Useful to reduce the amount of data in memory. (You can only specify either keep_columns or drop_columns.)

  • drop_columns – Array field/dataframe column names to drop. Useful to reduce the amount of data in memory. (You can only specify either keep_columns or drop_columns.)

  • time_range – (start, stop) range to load, in ns since the epoch

  • seconds_range – (start, stop) range of seconds since the start of the run to load

  • time_within – row of strax data (e.g. an event) to use as the time range

  • time_selection – Kind of time selection to apply:
    - fully_contained: (default) select things fully contained in the range
    - touching: select things that (partially) overlap with the range
    - skip: do not select a time range, even if other arguments say so

  • _chunk_number – For internal use: return data from one chunk.

  • progress_bar – Display a progress bar if metadata exists.

  • multi_run_progress_bar – Display a progress bar for loading multiple runs.
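The unit normalization described above, converting raw counts per bin into events per (kg day keV) given exposure_kg_sec, can be worked through directly. The arithmetic follows the documented units; it is not the plotting code itself.

```python
# Sketch: normalize counts/bin to events per (kg day keV).
SECONDS_PER_DAY = 86_400

def to_kg_day_kev(counts, bin_width_kev, exposure_kg_sec):
    # exposure_kg_sec carries kg * s; convert the time part to days
    exposure_kg_day = exposure_kg_sec / SECONDS_PER_DAY
    return [c / (exposure_kg_day * bin_width_kev) for c in counts]

# 100 kg * 86400 s = 100 kg-days of exposure, 2 keV wide bins:
rate = to_kg_day_kev([50, 20], bin_width_kev=2.0, exposure_kg_sec=100 * 86_400)
```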

plot_hit_pattern(run_id: str, **kwargs)

Straxen mini-analysis for which someone was too lazy to write a proper docstring. This is a straxen mini-analysis. The method takes run_id as its only positional argument, and additional arguments through keywords only.

The function requires the data types: peaks, peak_basics. Unless you specify this through data_kind = array keyword arguments, this data will be loaded automatically.

The function takes the same selection arguments as context.get_array:

Parameters:
  • selection – Query string, sequence of strings, or simple function to apply. The function must take a single argument which represents the structured numpy array of the loaded data.

  • selection_str – Same as selection (deprecated)

  • keep_columns – Array field/dataframe column names to keep. Useful to reduce the amount of data in memory. (You can only specify either keep_columns or drop_columns.)

  • drop_columns – Array field/dataframe column names to drop. Useful to reduce the amount of data in memory. (You can only specify either keep_columns or drop_columns.)

  • time_range – (start, stop) range to load, in ns since the epoch

  • seconds_range – (start, stop) range of seconds since the start of the run to load

  • time_within – row of strax data (e.g. an event) to use as the time range

  • time_selection – Kind of time selection to apply:
    - fully_contained: (default) select things fully contained in the range
    - touching: select things that (partially) overlap with the range
    - skip: do not select a time range, even if other arguments say so

  • _chunk_number – For internal use: return data from one chunk.

  • progress_bar – Display a progress bar if metadata exists.

  • multi_run_progress_bar – Display a progress bar for loading multiple runs.

plot_nveto_event_display(run_id: str, **kwargs)

Straxen mini-analysis for which someone was too lazy to write a proper docstring. This is a straxen mini-analysis. The method takes run_id as its only positional argument, and additional arguments through keywords only.

The function requires the data types: hitlets_nv, events_nv, event_positions_nv. Unless you specify this through data_kind = array keyword arguments, this data will be loaded automatically.

The function takes the same selection arguments as context.get_array:

Parameters:
  • selection – Query string, sequence of strings, or simple function to apply. The function must take a single argument which represents the structured numpy array of the loaded data.

  • selection_str – Same as selection (deprecated)

  • keep_columns – Array field/dataframe column names to keep. Useful to reduce the amount of data in memory. (You can only specify either keep_columns or drop_columns.)

  • drop_columns – Array field/dataframe column names to drop. Useful to reduce the amount of data in memory. (You can only specify either keep_columns or drop_columns.)

  • time_range – (start, stop) range to load, in ns since the epoch

  • seconds_range – (start, stop) range of seconds since the start of the run to load

  • time_within – row of strax data (e.g. an event) to use as the time range

  • time_selection – Kind of time selection to apply:
    - fully_contained: (default) select things fully contained in the range
    - touching: select things that (partially) overlap with the range
    - skip: do not select a time range, even if other arguments say so

  • _chunk_number – For internal use: return data from one chunk.

  • progress_bar – Display a progress bar if metadata exists.

  • multi_run_progress_bar – Display a progress bar for loading multiple runs.

plot_peak_classification(run_id: str, **kwargs)

Make an (area, rise_time) scatter plot of peaks.

Parameters:
  • s – Size of dot for each peak

This is a straxen mini-analysis. The method takes run_id as its only positional argument, and additional arguments through keywords only.

The function requires the data types: peak_basics. Unless you specify this through data_kind = array keyword arguments, this data will be loaded automatically.

The function takes the same selection arguments as context.get_array:

Parameters:
  • selection – Query string, sequence of strings, or simple function to apply. The function must take a single argument which represents the structured numpy array of the loaded data.

  • selection_str – Same as selection (deprecated)

  • keep_columns – Array field/dataframe column names to keep. Useful to reduce the amount of data in memory. (You can only specify either keep_columns or drop_columns.)

  • drop_columns – Array field/dataframe column names to drop. Useful to reduce the amount of data in memory. (You can only specify either keep_columns or drop_columns.)

  • time_range – (start, stop) range to load, in ns since the epoch

  • seconds_range – (start, stop) range of seconds since the start of the run to load

  • time_within – row of strax data (e.g. an event) to use as the time range

  • time_selection – Kind of time selection to apply:
    - fully_contained: (default) select things fully contained in the range
    - touching: select things that (partially) overlap with the range
    - skip: do not select a time range, even if other arguments say so

  • _chunk_number – For internal use: return data from one chunk.

  • progress_bar – Display a progress bar if metadata exists.

  • multi_run_progress_bar – Display a progress bar for loading multiple runs.

plot_peaks(run_id: str, **kwargs)

Straxen mini-analysis for which someone was too lazy to write a proper docstring. This is a straxen mini-analysis. The method takes run_id as its only positional argument, and additional arguments through keywords only.

The function requires the data types: peaks, peak_basics. Unless you specify this through data_kind = array keyword arguments, this data will be loaded automatically.

The function takes the same selection arguments as context.get_array:

Parameters:
  • selection – Query string, sequence of strings, or simple function to apply. The function must take a single argument which represents the structured numpy array of the loaded data.

  • selection_str – Same as selection (deprecated)

  • keep_columns – Array field/dataframe column names to keep. Useful to reduce the amount of data in memory. (You can only specify either keep_columns or drop_columns.)

  • drop_columns – Array field/dataframe column names to drop. Useful to reduce the amount of data in memory. (You can only specify either keep_columns or drop_columns.)

  • time_range – (start, stop) range to load, in ns since the epoch

  • seconds_range – (start, stop) range of seconds since the start of the run to load

  • time_within – row of strax data (e.g. an event) to use as the time range

  • time_selection – Kind of time selection to apply:
    - fully_contained: (default) select things fully contained in the range
    - touching: select things that (partially) overlap with the range
    - skip: do not select a time range, even if other arguments say so

  • _chunk_number – For internal use: return data from one chunk.

  • progress_bar – Display a progress bar if metadata exists.

  • multi_run_progress_bar – Display a progress bar for loading multiple runs.

plot_peaks_aft_histogram(run_id: str, **kwargs)

Plot side-by-side (area, width) histograms of the peak rate and mean area fraction top.

Parameters:
  • pe_bins – Array of bin edges for the peak area dimension [PE]

  • rt_bins – array of bin edges for the rise time dimension [ns]

  • extra_labels – List of (area, risetime, text, color) extra labels to put on the plot

  • rate_range – Range of rates to show [peaks/(bin*s)]

  • aft_range – Range of mean S1 area fraction top / bin to show

  • figsize – Figure size to use

This is a straxen mini-analysis. The method takes run_id as its only positional argument, and additional arguments through keywords only.

The function requires the data types: peak_basics. Unless you specify this through data_kind = array keyword arguments, this data will be loaded automatically.

The function takes the same selection arguments as context.get_array:

Parameters:
  • selection – Query string, sequence of strings, or simple function to apply. The function must take a single argument which represents the structured numpy array of the loaded data.

  • selection_str – Same as selection (deprecated)

  • keep_columns – Array field/dataframe column names to keep. Useful to reduce the amount of data in memory. (You can only specify either keep_columns or drop_columns.)

  • drop_columns – Array field/dataframe column names to drop. Useful to reduce the amount of data in memory. (You can only specify either keep_columns or drop_columns.)

  • time_range – (start, stop) range to load, in ns since the epoch

  • seconds_range – (start, stop) range of seconds since the start of the run to load

  • time_within – row of strax data (e.g. an event) to use as the time range

  • time_selection – Kind of time selection to apply:
    - fully_contained: (default) select things fully contained in the range
    - touching: select things that (partially) overlap with the range
    - skip: do not select a time range, even if other arguments say so

  • _chunk_number – For internal use: return data from one chunk.

  • progress_bar – Display a progress bar if metadata exists.

  • multi_run_progress_bar – Display a progress bar for loading multiple runs.

plot_pulses_mv(run_id: str, **kwargs)

Mini-analysis to plot pulses for the specified list of records. You have to provide a run-id for which pulses should be plotted. You can use the same arguments as for get_array to select a specific time range or data (see also below).

In addition you can provide the following arguments:

Parameters:
  • plot_hits – If True plot hit boundaries including the left and right extension as orange shaded regions.

  • plot_median – If true plots pulses sample median as dotted line.

  • max_plots – Limits the number of figures. If you would like to plot more pulses you should put the plots in a PDF.

  • store_pdf – If true figures are put to a PDF instead of plotting them to your notebook. The file name is automatically generated including the time range and run_id.

  • path – Relative path where the PDF should be stored. By default it is the directory of the notebook.

This is a straxen mini-analysis. The method takes run_id as its only positional argument, and additional arguments through keywords only.

The function requires the data types: raw_records_mv. Unless you specify this through data_kind = array keyword arguments, this data will be loaded automatically.

The function takes the same selection arguments as context.get_array:

Parameters:
  • selection – Query string, sequence of strings, or simple function to apply. The function must take a single argument which represents the structured numpy array of the loaded data.

  • selection_str – Same as selection (deprecated)

  • keep_columns – Array field/dataframe column names to keep. Useful to reduce the amount of data in memory. (You can only specify either keep_columns or drop_columns.)

  • drop_columns – Array field/dataframe column names to drop. Useful to reduce the amount of data in memory. (You can only specify either keep_columns or drop_columns.)

  • time_range – (start, stop) range to load, in ns since the epoch

  • seconds_range – (start, stop) range of seconds since the start of the run to load

  • time_within – row of strax data (e.g. an event) to use as the time range

  • time_selection – Kind of time selection to apply:
    - fully_contained: (default) select things fully contained in the range
    - touching: select things that (partially) overlap with the range
    - skip: do not select a time range, even if other arguments say so

  • _chunk_number – For internal use: return data from one chunk.

  • progress_bar – Display a progress bar if metadata exists.

  • multi_run_progress_bar – Display a progress bar for loading multiple runs.
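The store_pdf option described above auto-generates a file name from the run_id and time range. A sketch of how such a name could be built follows; the exact pattern is an assumption for illustration, not straxen's real naming scheme.

```python
# Sketch: build a PDF file name from run_id and time range (pattern is
# hypothetical, chosen only to illustrate the documented behaviour).
def pdf_name(run_id, time_range):
    start, stop = time_range
    return f"pulses_{run_id}_{start}_{stop}.pdf"

name = pdf_name("012882", (1_000_000, 2_000_000))
```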

plot_pulses_nv(run_id: str, **kwargs)

Mini-analysis to plot pulses for the specified list of records. You have to provide a run-id for which pulses should be plotted. You can use the same arguments as for get_array to select a specific time range or data (see also below).

In addition you can provide the following arguments:

Parameters:
  • plot_hits – If True plot hit boundaries including the left and right extension as orange shaded regions.

  • plot_median – If true plots pulses sample median as dotted line.

  • max_plots – Limits the number of figures. If you would like to plot more pulses you should put the plots in a PDF.

  • store_pdf – If true figures are put to a PDF instead of plotting them to your notebook. The file name is automatically generated including the time range and run_id.

  • path – Relative path where the PDF should be stored. By default it is the directory of the notebook.

This is a straxen mini-analysis. The method takes run_id as its only positional argument, and additional arguments through keywords only.

The function requires the data types: raw_records_nv. Unless you specify this through data_kind = array keyword arguments, this data will be loaded automatically.

The function takes the same selection arguments as context.get_array:

Parameters:
  • selection – Query string, sequence of strings, or simple function to apply. The function must take a single argument which represents the structured numpy array of the loaded data.

  • selection_str – Same as selection (deprecated)

  • keep_columns – Array field/dataframe column names to keep. Useful to reduce the amount of data in memory. (You can only specify either keep_columns or drop_columns.)

  • drop_columns – Array field/dataframe column names to drop. Useful to reduce the amount of data in memory. (You can only specify either keep_columns or drop_columns.)

  • time_range – (start, stop) range to load, in ns since the epoch

  • seconds_range – (start, stop) range of seconds since the start of the run to load

  • time_within – row of strax data (e.g. an event) to use as the time range

  • time_selection – Kind of time selection to apply:
    - fully_contained: (default) select things fully contained in the range
    - touching: select things that (partially) overlap with the range
    - skip: do not select a time range, even if other arguments say so

  • _chunk_number – For internal use: return data from one chunk.

  • progress_bar – Display a progress bar if metadata exists.

  • multi_run_progress_bar – Display a progress bar for loading multiple runs.

plot_pulses_tpc(run_id: str, **kwargs)

Mini-analysis to plot pulses for the specified list of records. You have to provide a run-id for which pulses should be plotted. You can use the same arguments as for get_array to select a specific time range or data (see also below).

In addition you can provide the following arguments:

Parameters:
  • plot_hits – If True plot hit boundaries including the left and right extension as orange shaded regions.

  • plot_median – If true plots pulses sample median as dotted line.

  • max_plots – Limits the number of figures. If you would like to plot more pulses you should put the plots in a PDF.

  • store_pdf – If true figures are put to a PDF instead of plotting them to your notebook. The file name is automatically generated including the time range and run_id.

  • path – Relative path where the PDF should be stored. By default it is the directory of the notebook.

This is a straxen mini-analysis. The method takes run_id as its only positional argument, and additional arguments through keywords only.

The function requires the data types: raw_records. Unless you specify this through data_kind = array keyword arguments, this data will be loaded automatically.

The function takes the same selection arguments as context.get_array:

Parameters:
  • selection – Query string, sequence of strings, or simple function to apply. The function must take a single argument which represents the structured numpy array of the loaded data.

  • selection_str – Same as selection (deprecated)

  • keep_columns – Array field/dataframe column names to keep. Useful to reduce the amount of data in memory. (You can only specify either keep_columns or drop_columns.)

  • drop_columns – Array field/dataframe column names to drop. Useful to reduce the amount of data in memory. (You can only specify either keep_columns or drop_columns.)

  • time_range – (start, stop) range to load, in ns since the epoch

  • seconds_range – (start, stop) range of seconds since the start of the run to load

  • time_within – row of strax data (e.g. an event) to use as the time range

  • time_selection – Kind of time selection to apply:
    - fully_contained: (default) select things fully contained in the range
    - touching: select things that (partially) overlap with the range
    - skip: do not select a time range, even if other arguments say so

  • _chunk_number – For internal use: return data from one chunk.

  • progress_bar – Display a progress bar if metadata exists.

  • multi_run_progress_bar – Display a progress bar for loading multiple runs.

plot_records_matrix(run_id: str, **kwargs)

Straxen mini-analysis for which someone was too lazy to write a proper docstring. This is a straxen mini-analysis. The method takes run_id as its only positional argument, and additional arguments through keywords only.

The function requires the data types: . Unless you specify this through data_kind = array keyword arguments, this data will be loaded automatically.

The function takes the same selection arguments as context.get_array:

Parameters:
  • selection – Query string, sequence of strings, or simple function to apply. The function must take a single argument which represents the structured numpy array of the loaded data.

  • selection_str – Same as selection (deprecated)

  • keep_columns – Array field/dataframe column names to keep. Useful to reduce the amount of data in memory. (You can only specify either keep_columns or drop_columns.)

  • drop_columns – Array field/dataframe column names to drop. Useful to reduce the amount of data in memory. (You can only specify either keep_columns or drop_columns.)

  • time_range – (start, stop) range to load, in ns since the epoch

  • seconds_range – (start, stop) range of seconds since the start of the run to load

  • time_within – row of strax data (e.g. an event) to use as the time range

  • time_selection – Kind of time selection to apply:
    - fully_contained: (default) select things fully contained in the range
    - touching: select things that (partially) overlap with the range
    - skip: do not select a time range, even if other arguments say so

  • _chunk_number – For internal use: return data from one chunk.

  • progress_bar – Display a progress bar if metadata exists.

  • multi_run_progress_bar – Display a progress bar for loading multiple runs.

plot_waveform(run_id: str, **kwargs)

Plot the sum waveform and optionally per-PMT waveforms.

Parameters:
  • deep – If True, show per-PMT waveform matrix under sum waveform. If ‘raw’, use raw_records instead of records to do so.

  • show_largest – Show only the largest show_largest peaks.

  • figsize – Matplotlib figure size for the plot.

  • cbar_loc – location of the intensity color bar. Set to None to omit it altogether.

  • lower_panel_height – Height of the lower panel in terms of the height of the upper panel.

This is a straxen mini-analysis. The method takes run_id as its only positional argument, and additional arguments through keywords only.

The function requires the data types: . Unless you specify this through data_kind = array keyword arguments, this data will be loaded automatically.

The function takes the same selection arguments as context.get_array:

Parameters:
  • selection – Query string, sequence of strings, or simple function to apply. The function must take a single argument which represents the structured numpy array of the loaded data.

  • selection_str – Same as selection (deprecated)

  • keep_columns – Array field/dataframe column names to keep. Useful to reduce amount of data in memory. (You can specify either keep_columns or drop_columns, not both.)

  • drop_columns – Array field/dataframe column names to drop. Useful to reduce amount of data in memory. (You can specify either keep_columns or drop_columns, not both.)

  • time_range – (start, stop) range to load, in ns since the epoch

  • seconds_range – (start, stop) range of seconds since the start of the run to load.

  • time_within – row of strax data (e.g. event) to use as time range

  • time_selection – Kind of time selection to apply:
    - fully_contained: (default) select things fully contained in the range
    - touching: select things that (partially) overlap with the range
    - skip: Do not select a time range, even if other arguments say so

  • _chunk_number – For internal use: return data from one chunk.

  • progress_bar – Display a progress bar if metadata exists.

  • multi_run_progress_bar – Display a progress bar when loading multiple runs.

provided_dtypes(runid='0')[source]

Summarize dtype information provided by this context.

Returns:

dictionary of provided dtypes with their corresponding lineage hash, save_when, version

purge_unused_configs()[source]

Purge unused configs from the context.

raw_records_matrix(run_id: str, **kwargs)

Straxen mini-analysis for which someone was too lazy to write a proper docstring. This is a straxen mini-analysis. The method takes run_id as its only positional argument, and additional arguments through keywords only.

The function requires the data types: raw_records. Unless you specify this through data_kind = array keyword arguments, this data will be loaded automatically.

The function takes the same selection arguments as context.get_array:

Parameters:
  • selection – Query string, sequence of strings, or simple function to apply. The function must take a single argument which represents the structured numpy array of the loaded data.

  • selection_str – Same as selection (deprecated)

  • keep_columns – Array field/dataframe column names to keep. Useful to reduce amount of data in memory. (You can specify either keep_columns or drop_columns, not both.)

  • drop_columns – Array field/dataframe column names to drop. Useful to reduce amount of data in memory. (You can specify either keep_columns or drop_columns, not both.)

  • time_range – (start, stop) range to load, in ns since the epoch

  • seconds_range – (start, stop) range of seconds since the start of the run to load.

  • time_within – row of strax data (e.g. event) to use as time range

  • time_selection – Kind of time selection to apply:
    - fully_contained: (default) select things fully contained in the range
    - touching: select things that (partially) overlap with the range
    - skip: Do not select a time range, even if other arguments say so

  • _chunk_number – For internal use: return data from one chunk.

  • progress_bar – Display a progress bar if metadata exists.

  • multi_run_progress_bar – Display a progress bar when loading multiple runs.

records_matrix(run_id: str, **kwargs)

Return (wv_matrix, times, pmts)

  • wv_matrix: (n_samples, n_pmt) array with per-PMT waveform intensity in PE/ns

  • times: time labels in seconds (corr. to rows)

  • pmts: PMT numbers (corr. to columns)

Both times and pmts have one extra element.
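The extra element presumably lets times and pmts serve as bin edges for plt.pcolormesh, which expects n+1 edges to bound n cells (assumed rationale; the shapes below are illustrative):

```python
import numpy as np

# n cells need n + 1 edges for pcolormesh-style plotting
wv_matrix = np.zeros((100, 10))      # (n_samples, n_pmt)
times = np.linspace(0.0, 1.0, 101)   # n_samples + 1 time edges
pmts = np.arange(11)                 # n_pmt + 1 PMT edges

assert times.size == wv_matrix.shape[0] + 1
assert pmts.size == wv_matrix.shape[1] + 1
```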

Parameters:
  • max_samples – Maximum number of time samples. If window and dt conspire to exceed this, waveforms will be downsampled.

  • ignore_max_sample_warning – If True, suppress warning when this happens.

Example:

wvm, ts, ys = st.records_matrix(run_id, seconds_range=(1., 1.00001))
plt.pcolormesh(ts, ys, wvm.T, norm=matplotlib.colors.LogNorm())
plt.colorbar(label='Intensity [PE / ns]')

This is a straxen mini-analysis. The method takes run_id as its only positional argument, and additional arguments through keywords only.

The function requires the data types: records. Unless you specify this through data_kind = array keyword arguments, this data will be loaded automatically.

The function takes the same selection arguments as context.get_array:

Parameters:
  • selection – Query string, sequence of strings, or simple function to apply. The function must take a single argument which represents the structured numpy array of the loaded data.

  • selection_str – Same as selection (deprecated)

  • keep_columns – Array field/dataframe column names to keep. Useful to reduce amount of data in memory. (You can specify either keep_columns or drop_columns, not both.)

  • drop_columns – Array field/dataframe column names to drop. Useful to reduce amount of data in memory. (You can specify either keep_columns or drop_columns, not both.)

  • time_range – (start, stop) range to load, in ns since the epoch

  • seconds_range – (start, stop) range of seconds since the start of the run to load.

  • time_within – row of strax data (e.g. event) to use as time range

  • time_selection – Kind of time selection to apply:
    - fully_contained: (default) select things fully contained in the range
    - touching: select things that (partially) overlap with the range
    - skip: Do not select a time range, even if other arguments say so

  • _chunk_number – For internal use: return data from one chunk.

  • progress_bar – Display a progress bar if metadata exists.

  • multi_run_progress_bar – Display a progress bar when loading multiple runs.

register(plugin_class)[source]

Register plugin_class as provider for data types in provides.

Parameters:

plugin_class – class inheriting from strax.Plugin. You can also pass a sequence of plugins to register, but then you must omit the provides argument. If a plugin class omits the .provides attribute, we will construct one from its class name (CamelCase -> snake_case).

Returns:

plugin_class (so this can be used as a decorator)
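The CamelCase -> snake_case conversion mentioned above can be sketched as a small regex (a minimal version; strax's own helper may handle more edge cases):

```python
import re

def camel_to_snake(name):
    # Insert an underscore before each interior capital, then lowercase
    return re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()

print(camel_to_snake('PeakBasics'))  # -> peak_basics
```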

register_all(module)[source]

Register all plugins defined in module.

Can pass a list/tuple of modules to register all in each.

run_defaults(run_id)[source]

Get configuration defaults from the run metadata (if these exist)

This will only call the rundb once for each run while the context is in existence; further calls to this will return a cached value.

run_metadata(run_id, projection=None) dict[source]

Return run-level metadata for run_id, or raise DataNotAvailable if this is not available.

Parameters:
  • run_id – run id to get

  • projection – Selection of fields to get, following MongoDB syntax. May not be supported by frontend.
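A MongoDB-style inclusion projection keeps only the requested fields; a toy version over a plain dict (the run-doc fields here are illustrative, not the real schema):

```python
# Hypothetical run document
run_doc = {'name': '009000', 'start': 0, 'end': 3600, 'tags': ['blinded']}

def project(doc, projection):
    # Keep only the fields marked truthy in the projection
    return {k: v for k, v in doc.items() if projection.get(k)}

print(project(run_doc, {'start': 1, 'end': 1}))  # -> {'start': 0, 'end': 3600}
```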

runs: DataFrame | None = None
scan_runs(check_available=(), if_check_available='raise', store_fields=()) DataFrame

Update and return self.runs with runs currently available in all storage frontends.

Parameters:
  • check_available – Check whether these data types are available. Availability of xxx is stored as a boolean in the xxx_available column.

  • if_check_available – ‘raise’ (default) or ‘skip’, whether to do the check

  • store_fields – Additional fields from run doc to include as rows in the dataframe. The context options scan_availability and store_run_fields list data types and run fields, respectively, that will always be scanned.

search_field(pattern: str, include_code_usage: bool = True, return_matches: bool = False)[source]

Find and print which plugin(s) provides a field that matches pattern (fnmatch).

Parameters:
  • pattern – pattern to match, e.g. ‘time’ or ‘tim*’

  • include_code_usage – Also include the code occurrences of the fields that match the pattern.

  • return_matches – If set, return a dictionary with the matching fields and the occurrences in code.

Returns:

When return_matches is set, a dictionary with the matching fields and their occurrences in code. Otherwise, nothing is returned and the results are printed.

search_field_usage(search_string: str, plugin: Plugin | List[Plugin] | None = None) List[str][source]

Find and return which plugin(s) use a given field.

Parameters:
  • search_string – the exact field name to search for

  • plugin – plugin where to look for a field

Returns:

list of code occurrences in the form of PLUGIN.FUNCTION

select_runs(run_mode=None, run_id=None, include_tags=None, exclude_tags=None, available=(), pattern_type='fnmatch', ignore_underscore=True, force_reload=False)

Return pandas.DataFrame with basic info from runs that match selection criteria.

Parameters:
  • run_mode – Pattern to match run modes (reader.ini.name)

  • run_id – Pattern to match a run_id or run_ids

  • available – str or tuple of strs of data types for which data must be available according to the runs DB.

  • include_tags – String or list of strings of patterns for required tags

  • exclude_tags – String / list of strings of patterns for forbidden tags. Exclusion criteria have higher priority than inclusion criteria.

  • pattern_type – Type of pattern matching to use. Defaults to ‘fnmatch’, which means you can use unix shell-style wildcards (?, *). The alternative is ‘re’, which means you can use full python regular expressions.

  • ignore_underscore – Ignore the underscore at the start of tags (indicating some degree of officialness or automation).

  • force_reload – Force reloading of runs from storage. Otherwise, runs are cached after the first time they are loaded in self.runs.

Examples:
  • run_selection(include_tags='blinded')

    select all datasets with a blinded or _blinded tag.

  • run_selection(include_tags='*blinded')

    … with blinded or _blinded, unblinded, blablinded, etc.

  • run_selection(include_tags=['blinded', 'unblinded'])

    … with blinded OR unblinded, but not blablinded.

  • run_selection(include_tags='blinded', exclude_tags=['bad', 'messy'])

    … select blinded datasets that aren't bad or messy
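The default fnmatch matching and the ignore_underscore behavior behind these examples can be sketched like this (the tag list and helper are illustrative, not the select_runs internals):

```python
import fnmatch

tags = ['blinded', '_blinded', 'unblinded', 'blablinded']

def matches(tag, pattern, ignore_underscore=True):
    # ignore_underscore strips a leading underscore before matching
    if ignore_underscore and tag.startswith('_'):
        tag = tag[1:]
    return fnmatch.fnmatch(tag, pattern)

print([t for t in tags if matches(t, 'blinded')])   # -> ['blinded', '_blinded']
print([t for t in tags if matches(t, '*blinded')])  # -> all four tags
```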

set_config(config=None, mode='update')[source]

Set new configuration options.

Parameters:
  • config – dict of new options

  • mode – can be either:
    - update: Add to or override current options in context
    - setdefault: Add to current options, but do not override
    - replace: Erase config, then set only these options
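The three modes behave like the following dict operations (a plain-dict sketch of the documented semantics; the option names are illustrative):

```python
current = {'max_messages': 4, 'timeout': 300}
new = {'timeout': 600, 'allow_lazy': True}

updated = {**current, **new}     # update: new options override current ones
defaulted = {**new, **current}   # setdefault: current options win
replaced = dict(new)             # replace: start from the new options only

print(updated['timeout'], defaulted['timeout'])  # -> 600 300
```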

set_context_config(context_config=None, mode='update')[source]

Set new context configuration options.

Parameters:
  • context_config – dict of new context configuration options

  • mode – can be either:
    - update: Add to or override current options in context
    - setdefault: Add to current options, but do not override
    - replace: Erase config, then set only these options

show_config(data_type=None, pattern='*', run_id='99999999999999999999')[source]

Return configuration options that affect data_type.

Parameters:
  • data_type – Data type name

  • pattern – Show only options that match (fnmatch) pattern

  • run_id – Run id to use for run-dependent config options. If omitted, will show defaults active for new runs.

size_mb(run_id, target)[source]

Return megabytes of memory required to hold data.

storage: List[StorageFrontend]
storage_graph(run_id, target, graph=None, not_stored=None, dump_plot=True, to_dir='./', format='svg')

Plot the dependency graph indicating the storage of the plugins.

Parameters:
  • target – str of the target plugin to check

  • graph – graphviz.graphs.Digraph instance

  • not_stored – set of plugins which are not stored

  • dump_plot – bool, if True, save the plot to the to_dir

  • to_dir – str, directory to save the plot

  • format – str, format of the plot

Returns:

all plugins that will be calculated when running self.make(run_id, target)

The colors used in the graph represent the following storage states:
  - grey: strax.SaveWhen.NEVER
  - red: strax.SaveWhen.EXPLICIT
  - orange: strax.SaveWhen.TARGET
  - yellow: strax.SaveWhen.ALWAYS
  - green: target is stored

stored_dependencies(run_id: str, target: str | list | tuple, check_forbidden: bool = True, _targets_stored: dict | None = None) dict | None[source]

For a given run_id and target(s) get a dictionary of all the datatypes that are required to build the requested target.

Parameters:
  • run_id – run_id

  • target – target or a list of targets

  • check_forbidden – Check that we are not requesting to make a plugin that is forbidden by the context to be created.

Returns:

dictionary of data types (keys) required for building the requested target(s) and if they are stored (values)

Raises:

strax.DataNotAvailable – if there is at least one data type that is not stored and has no dependency or if it cannot be created
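The recursive walk this performs can be sketched on a toy dependency graph (the graph, stored set, and helper are illustrative, not the strax implementation):

```python
# Toy dependency graph: datatype -> datatypes it depends on
deps = {'events': ['peaks'], 'peaks': ['records'], 'records': []}
stored = {'records'}  # pretend only records are on disk

def walk(target, out):
    # Record whether each required datatype is stored, then recurse
    out[target] = target in stored
    for dep in deps[target]:
        walk(dep, out)
    return out

print(walk('events', {}))  # -> {'events': False, 'peaks': False, 'records': True}
```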

takes_config = immutabledict({'storage_converter': <strax.config.Option object>, 'fuzzy_for': <strax.config.Option object>, 'fuzzy_for_options': <strax.config.Option object>, 'allow_incomplete': <strax.config.Option object>, 'allow_rechunk': <strax.config.Option object>, 'allow_multiprocess': <strax.config.Option object>, 'allow_shm': <strax.config.Option object>, 'allow_lazy': <strax.config.Option object>, 'forbid_creation_of': <strax.config.Option object>, 'store_run_fields': <strax.config.Option object>, 'check_available': <strax.config.Option object>, 'max_messages': <strax.config.Option object>, 'timeout': <strax.config.Option object>, 'saver_timeout': <strax.config.Option object>, 'use_per_run_defaults': <strax.config.Option object>, 'free_options': <strax.config.Option object>, 'apply_data_function': <strax.config.Option object>, 'write_superruns': <strax.config.Option object>})
to_absolute_time_range(run_id, targets=None, time_range=None, seconds_range=None, time_within=None, full_range=None)[source]

Return (start, stop) time in ns since unix epoch corresponding to time range.

Parameters:
  • run_id – run id to get

  • time_range – (start, stop) time in ns since unix epoch. Will be returned without modification

  • targets – data types. Used only if run metadata is unavailable, so run start time has to be estimated from data.

  • seconds_range – (start, stop) seconds since start of run

  • time_within – row of strax data (e.g. event)

  • full_range – If True returns full time_range of the run.
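Converting a seconds_range to an absolute (start, stop) requires only the run start time; a minimal sketch (the run start value is made up):

```python
# Hypothetical run start, in ns since the unix epoch
run_start_ns = 1_500_000_000_000_000_000
seconds_range = (1.0, 2.5)

# Shift each offset into absolute ns since the epoch
time_range = tuple(run_start_ns + int(s * 1e9) for s in seconds_range)
print(time_range)
```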

waveform_display(run_id: str, **kwargs)

Plot a waveform overview display.

Parameters:

width – Plot width in pixels.

This is a straxen mini-analysis. The method takes run_id as its only positional argument, and additional arguments through keywords only.

The function requires the data types: records, peaks, peak_basics. Unless you specify this through data_kind = array keyword arguments, this data will be loaded automatically.

The function takes the same selection arguments as context.get_array:

Parameters:
  • selection – Query string, sequence of strings, or simple function to apply. The function must take a single argument which represents the structured numpy array of the loaded data.

  • selection_str – Same as selection (deprecated)

  • keep_columns – Array field/dataframe column names to keep. Useful to reduce amount of data in memory. (You can specify either keep_columns or drop_columns, not both.)

  • drop_columns – Array field/dataframe column names to drop. Useful to reduce amount of data in memory. (You can specify either keep_columns or drop_columns, not both.)

  • time_range – (start, stop) range to load, in ns since the epoch

  • seconds_range – (start, stop) range of seconds since the start of the run to load.

  • time_within – row of strax data (e.g. event) to use as time range

  • time_selection – Kind of time selection to apply:
    - fully_contained: (default) select things fully contained in the range
    - touching: select things that (partially) overlap with the range
    - skip: Do not select a time range, even if other arguments say so

  • _chunk_number – For internal use: return data from one chunk.

  • progress_bar – Display a progress bar if metadata exists.

  • multi_run_progress_bar – Display a progress bar when loading multiple runs.

exception strax.context.OutsideException[source]

Bases: Exception