orcasong.core

Module Contents

Classes

BaseProcessor

Preprocess km3net/antares events for neural networks.

FileBinner

For making binned images and mc_infos, which can be used for conv nets.

FileGraph

Turn km3 events to graph data.

class orcasong.core.BaseProcessor(extractor=None, det_file=None, correct_mc_time=True, center_time=True, calib_hits=True, calib_mchits=True, add_t0=False, correct_timeslew=True, center_hits_to=None, event_skipper=None, chunksize=None, keep_event_info=False, overwrite=True, sort_y=True, y_to_float64=True)

Preprocess km3net/antares events for neural networks.

This serves as a base class, which handles things like reading events, calibrating, generating labels and saving the output.

Parameters
extractor : function, optional

Function that extracts desired info from a blob, which is then stored as the “y” datafield in the .h5 file. The function takes the km3pipe blob as an input, and returns a dict mapping str to floats. Examples can be found in orcasong.extractors.

det_file : str, optional

Path to a .detx detector geometry file, which can be used to calibrate the hits.

correct_mc_time : bool

Convert MC hit times to JTE times. Only done if mc_hits and mc_tracks are present in the blob.

center_time : bool

Subtract time of first triggered hit from all hit times. Will also be done for McHits if they are in the blob [default: True].

calib_hits : bool

Apply calibration to hits if det file is given. Default: True.

calib_mchits : bool

Apply calibration to mchits if det file is given and mchits are found in the blob. Default: True.

correct_timeslew : bool

If true (default), the time slewing of hits depending on their tot will be corrected during calibration. Only done if det file is given and calib_hits is True.

center_hits_to : tuple, optional

Translate the xyz positions of the hits (and mchits), as if the detector was centered at the given position. E.g., if it is (0, 0, None), the hits and mchits will be centered at x = 0, y = 0, and z will be left untouched. Can only be used when a detx file is given.

add_t0 : bool

If true, add t0 to the time of hits and mchits. If using a det_file, this will already have been done automatically [default: False].

event_skipper : func, optional

Function that takes the blob as an input, and returns a bool. If the bool is true, the blob will be skipped. This is placed after the binning and mc_info extractor.

chunksize : int, optional

Chunksize (along axis_0) used for saving the output to a .h5 file [default: None, i.e. auto chunking].

keep_event_info : bool

If True, will keep the “event_info” table [default: False].

overwrite : bool

If True, overwrite the output file if it exists already. If False, throw an error instead.

sort_y : bool

Sort the columns in the y dataset alphabetically.

y_to_float64 : bool

Convert everything in the y dataset to float64 [default: True]. Hint: Not all other dtypes can store NaN!
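The extractor and event_skipper parameters both expect plain functions that take a blob. The following is a minimal sketch of what such functions could look like. The names my_extractor and my_event_skipper, the field names "energy" and "dir_z", the hit threshold, and the dict-based stand-in for a km3pipe blob are all assumptions made for illustration; they are not taken from orcasong.extractors.

```python
def my_extractor(blob):
    """Hypothetical extractor: takes a blob, returns a dict mapping str to float."""
    # Assumption: the blob has an "mc_tracks" entry with "energy" and "dir_z"
    # fields, and the first track is the one of interest.
    tracks = blob["mc_tracks"]
    return {
        "energy": float(tracks["energy"][0]),
        "dir_z": float(tracks["dir_z"][0]),
    }


def my_event_skipper(blob):
    """Hypothetical skipper: returns True if the blob should be skipped."""
    # Skip events with fewer than 5 hits (threshold chosen for illustration).
    return len(blob["Hits"]["time"]) < 5


# Stand-in for a km3pipe blob, built from plain dicts and lists.
fake_blob = {
    "mc_tracks": {"energy": [12.5, 3.0], "dir_z": [-0.8, 0.1]},
    "Hits": {"time": [10.0, 20.0, 35.0]},
}

labels = my_extractor(fake_blob)
skip = my_event_skipper(fake_blob)
```

Functions of this shape would be passed to a processor as extractor=my_extractor and event_skipper=my_event_skipper.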

Attributes
n_statusbar : int or None

Print a statusbar every n blobs.

n_memory_observer : int or None

Print memory usage every n blobs.

complib : str

Compression library used for saving the output to a .h5 file. All PyTables compression filters are available, e.g. ‘zlib’, ‘lzf’, ‘blosc’, …

complevel : int

Compression level for the compression filter that is used for saving the output to a .h5 file.

flush_frequency : int

Number of events after which the accumulated output is flushed to the hard disk. A larger value makes orcasong run faster, but also increases the RAM usage.

seed : int, optional

Makes all random (numpy) actions reproducible. Set at the start of each pipeline.

run(self, infile, outfile=None)

Process the events from the infile, and save them to the outfile.

Parameters
infile : str

Path to the input file.

outfile : str, optional

Path to the output file (will be created). If None is given, the name is generated automatically and the file is saved in the current working directory.

run_multi(self, infiles, outfolder)

Process multiple files into their own output files each. The output file names will be generated automatically.

Parameters
infiles : List

The paths to the input files, as str.

outfolder : str

The output folder to place them in.

build_pipe(self, infile, outfile, timeit=True)

Initialize and connect the modules from the different stages.

get_cmpts_pre(self, infile)

Modules that read and calibrate the events.

abstract get_cmpts_main(self)

Produce and store the samples as ‘samples’ in the blob.

get_cmpts_post(self, outfile)

Modules that postprocess and save the events.

finish_file(self, f, summary)

Work with the output file after the pipe has finished.

Parameters
f : h5py.File

The opened output file.

summary : km3pipe.Blob

The output from pipe.drain().

class orcasong.core.FileBinner(bin_edges_list, add_bin_stats=True, hit_weights=None, chunksize=32, **kwargs)

For making binned images and mc_infos, which can be used for conv nets.

Can also add statistics of the binning to the h5 files, which can be plotted to show the distribution of hits among the bins and how many hits were cut off.

Parameters
bin_edges_list : List

List with the names of the fields to bin, and the respective bin edges, including the left- and right-most bin edge. Example: For 10 bins in the z direction, and 100 bins in time:

bin_edges_list = [
    ["pos_z", np.linspace(0, 10, 11)],
    ["time", np.linspace(-50, 550, 101)],
]

Some examples can be found in orcasong.bin_edges.

add_bin_stats : bool

Add statistics of the binning to the output file. They can be plotted with util/bin_stats_plot.py [default: True].

hit_weights : str, optional

Use blob["Hits"][hit_weights] as weights for the samples in the histogram.

kwargs

Options of the BaseProcessor.
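To illustrate how a bin_edges_list and the optional hit_weights relate to the binned image, here is a rough sketch built on numpy.histogramdd. It is not FileBinner's actual implementation; the helper bin_hits, the dict standing in for blob["Hits"], the hit values, and the "tot" weight field are assumptions made for the example.

```python
import numpy as np

bin_edges_list = [
    ["pos_z", np.linspace(0, 10, 11)],     # 10 bins in z
    ["time", np.linspace(-50, 550, 101)],  # 100 bins in time
]

# Stand-in for blob["Hits"]: one array per field (values are made up).
hits = {
    "pos_z": np.array([0.5, 0.7, 9.5, 12.0]),  # last hit lies outside the z range
    "time": np.array([0.0, 1.0, 500.0, 100.0]),
    "tot": np.array([20.0, 30.0, 25.0, 40.0]),  # hypothetical weight field
}


def bin_hits(hits, bin_edges_list, hit_weights=None):
    """Hypothetical helper: histogram the hits along the listed fields."""
    names = [name for name, _ in bin_edges_list]
    edges = [bin_edges for _, bin_edges in bin_edges_list]
    # One row per hit, one column per binned field.
    sample = np.stack([hits[name] for name in names], axis=-1)
    weights = hits[hit_weights] if hit_weights is not None else None
    image, _ = np.histogramdd(sample, bins=edges, weights=weights)
    return image


image = bin_hits(hits, bin_edges_list)
weighted = bin_hits(hits, bin_edges_list, hit_weights="tot")
```

Hits that fall outside the outermost bin edges are dropped by histogramdd, which is why the fourth hit does not appear in either image.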

get_cmpts_main(self)

Generate nD images.

finish_file(self, f, summary)

Work with the output file after the pipe has finished.

Parameters
f : h5py.File

The opened output file.

summary : km3pipe.Blob

The output from pipe.drain().

run_multi(self, infiles, outfolder, save_plot=False)

Bin multiple files into their own output files each. The output file names will be generated automatically.

Parameters
infiles : List

The paths to the input files, as str.

outfolder : str

The output folder to place them in.

save_plot : bool

Save the binning histograms as a pdf. Only possible if add_bin_stats is True.

get_names_and_shape(self)

Get names and shape of the resulting x data, e.g. (pos_z, time), (18, 50).
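The relation between the bin edges and the returned names and shape can be sketched as follows: each field contributes one axis, and n bin edges give n - 1 bins. The function names_and_shape and the bin edges below are hypothetical and only mirror the description above, not the method's actual code.

```python
import numpy as np


def names_and_shape(bin_edges_list):
    """Hypothetical helper: derive field names and image shape from the bin edges."""
    names = tuple(name for name, _ in bin_edges_list)
    # n edges enclose n - 1 bins.
    shape = tuple(len(bin_edges) - 1 for _, bin_edges in bin_edges_list)
    return names, shape


bin_edges_list = [
    ["pos_z", np.linspace(0, 10, 19)],     # 19 edges, 18 bins
    ["time", np.linspace(-50, 550, 51)],   # 51 edges, 50 bins
]
names, shape = names_and_shape(bin_edges_list)
```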

class orcasong.core.FileGraph(max_n_hits=None, time_window=None, hit_infos=None, only_triggered_hits=False, fixed_length=False, **kwargs)

Turn km3 events to graph data.

The resulting file will have a dataset "x" of shape (total n_hits, len(hit_infos)). The column names of the last axis (i.e. hit_infos) are saved as attributes of the dataset (f["x"].attrs).

Parameters
hit_infos : tuple, optional

Which entries in the ‘/Hits’ Table will be kept. E.g. pos_x, time, … Often, only dir_x/y/z, pos_x/y/z and time are required. Default: Keep all entries.

time_window : tuple, optional

Two ints (start, end). Hits outside of this time window will be cut away (based on ‘Hits/time’). Default: Keep all hits.

only_triggered_hits : bool

If true, use only triggered hits. Otherwise, use all hits (default).

max_n_hits : int

Maximum number of hits that get saved per event. If an event has more, some will get cut randomly! Default: Keep all hits.

fixed_length : bool

Legacy option. If False (default), save hits of events with variable length as 2d arrays using km3pipe’s indices. If True, pad hits of each event with 0s to a fixed length, so that they can be stored as 3d arrays like images. max_n_hits needs to be given in that case, and a column will be added called ‘is_valid’, which is 0 if the entry is padded, and 1 otherwise. This is inefficient and will cut off hits, so it should not be used.

kwargs

Options of the BaseProcessor.
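The hit selection options described above (time_window, max_n_hits and the legacy fixed_length padding) can be pictured with the following sketch for a single event. It is an illustration under stated assumptions, not FileGraph's actual code: the helper select_hits is hypothetical, column 0 of the array is assumed to hold the hit time, and the time window is assumed to include both of its boundaries.

```python
import numpy as np


def select_hits(hits, time_window=None, max_n_hits=None, fixed_length=False, rng=None):
    """Hypothetical helper. hits: 2d array (n_hits, n_features), column 0 = time."""
    rng = np.random.default_rng() if rng is None else rng
    if time_window is not None:
        start, end = time_window
        # Assumption: both boundaries of the window are included.
        keep = (hits[:, 0] >= start) & (hits[:, 0] <= end)
        hits = hits[keep]
    if max_n_hits is not None and len(hits) > max_n_hits:
        # Cut randomly, then sort the chosen indices to keep the original order.
        chosen = np.sort(rng.choice(len(hits), size=max_n_hits, replace=False))
        hits = hits[chosen]
    if fixed_length:
        if max_n_hits is None:
            raise ValueError("max_n_hits is required when fixed_length is True")
        # Pad with zeros and append an is_valid column (1 = real hit, 0 = padding).
        padded = np.zeros((max_n_hits, hits.shape[1] + 1))
        padded[: len(hits), :-1] = hits
        padded[: len(hits), -1] = 1.0
        hits = padded
    return hits


# Five made-up hits with columns (time, some other feature).
event = np.array([[-5.0, 1.0], [10.0, 2.0], [20.0, 3.0], [30.0, 4.0], [999.0, 5.0]])

cut = select_hits(event, time_window=(0, 100))
limited = select_hits(event, time_window=(0, 100), max_n_hits=2,
                      rng=np.random.default_rng(0))
padded = select_hits(event, time_window=(0, 15), max_n_hits=3, fixed_length=True)
```

In the padded case only one hit survives the time window, so the remaining two rows are zeros with is_valid set to 0.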

get_cmpts_main(self)

Produce and store the samples as ‘samples’ in the blob.

finish_file(self, f, summary)

Work with the output file after the pipe has finished.

Parameters
f : h5py.File

The opened output file.

summary : km3pipe.Blob

The output from pipe.drain().