orcasong.core
Module Contents
Classes
BaseProcessor | Preprocess km3net/antares events for neural networks.
FileBinner | For making binned images and mc_infos, which can be used for conv nets.
FileGraph | Turn km3 events to graph data.
- class orcasong.core.BaseProcessor(extractor=None, det_file=None, correct_mc_time=True, center_time=True, calib_hits=True, calib_mchits=True, add_t0=False, correct_timeslew=True, center_hits_to=None, event_skipper=None, chunksize=None, keep_event_info=False, overwrite=True, sort_y=True, y_to_float64=True)
Preprocess km3net/antares events for neural networks.
This serves as a baseclass, which handles things like reading events, calibrating, generating labels and saving the output.
- Parameters
- extractor : function, optional
Function that extracts desired info from a blob, which is then stored as the “y” datafield in the .h5 file. The function takes the km3pipe blob as an input, and returns a dict mapping str to floats. Examples can be found in orcasong.extractors.
- det_file : str, optional
Path to a .detx detector geometry file, which can be used to calibrate the hits.
- correct_mc_time : bool
Convert MC hit times to JTE times. Will only be done if mc_hits and mc_tracks are there.
- center_time : bool
Subtract the time of the first triggered hit from all hit times. Will also be done for McHits if they are in the blob [default: True].
- calib_hits : bool
Apply calibration to hits if a det file is given [default: True].
- calib_mchits : bool
Apply calibration to mchits if a det file is given and mchits are found in the blob [default: True].
- correct_timeslew : bool
If True (default), the time slewing of hits depending on their time over threshold (tot) will be corrected during calibration. Only done if a det file is given and calib_hits is True.
- center_hits_to : tuple, optional
Translate the xyz positions of the hits (and mchits), as if the detector was centered at the given position. E.g., if it is (0, 0, None), the hits and mchits will be centered at xy = (0, 0), and z will be left untouched. Can only be used when a detx file is given.
- add_t0 : bool
If True, add t0 to the time of hits and mchits. If using a det_file, this will already have been done automatically [default: False].
- event_skipper : func, optional
Function that takes the blob as an input, and returns a bool. If the bool is True, the blob will be skipped. This is placed after the binning and mc_info extractor.
- chunksize : int, optional
Chunksize (along axis_0) used for saving the output to a .h5 file [default: None, i.e. auto chunking].
- keep_event_info : bool
If True, will keep the “event_info” table [default: False].
- overwrite : bool
If True, overwrite the output file if it exists already. If False, raise an error instead.
- sort_y : bool
Sort the columns in the y dataset alphabetically.
- y_to_float64 : bool
Convert everything in the y dataset to float64 [default: True]. Hint: Not all other dtypes can store nan!
- Attributes
- n_statusbar : int or None
Print a statusbar every n blobs.
- n_memory_observer : int or None
Print memory usage every n blobs.
- complib : str
Compression library used for saving the output to a .h5 file. All PyTables compression filters are available, e.g. ‘zlib’, ‘lzf’, ‘blosc’, …
- complevel : int
Compression level for the compression filter that is used for saving the output to a .h5 file.
- flush_frequency : int
After how many events the accumulated output should be flushed to the hard disk. A larger value makes orcasong run faster, but it increases the RAM usage as well.
- seed : int, optional
Makes all random (numpy) actions reproducible. Set at the start of each pipeline.
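The extractor and event_skipper parameters are both plain functions of the blob. The following is a minimal sketch of what such functions could look like, not code taken from orcasong.extractors. It assumes the blob can be indexed like a dict, and the “EventInfo” entry with its “event_id” and “energy” fields is a hypothetical layout chosen for illustration; real blobs may use different names.

```python
def my_extractor(blob):
    """Return a dict mapping str to float, as the extractor contract requires."""
    info = blob["EventInfo"]  # hypothetical field layout, for illustration only
    return {
        "event_id": float(info["event_id"]),
        "energy": float(info["energy"]),
    }


def skip_low_energy(blob, min_energy=10.0):
    """Return True if the blob should be skipped (hypothetical energy cut)."""
    return blob["EventInfo"]["energy"] < min_energy


# A stand-in for a km3pipe blob, so that the sketch runs without any input file.
fake_blob = {"EventInfo": {"event_id": 3, "energy": 4.5}}
labels = my_extractor(fake_blob)
skipped = skip_low_energy(fake_blob)
```

Functions like these would be passed through the extractor and event_skipper keyword arguments listed in the class signature.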
- run(self, infile, outfile=None)
Process the events from the infile, and save them to the outfile.
- Parameters
- infile : str
Path to the input file.
- outfile : str, optional
Path to the output file (will be created). If none is given, the name is generated automatically and the file is saved in the cwd.
- run_multi(self, infiles, outfolder)
Process multiple files into their own output files each. The output file names will be generated automatically.
- Parameters
- infiles : List
Paths to the input files, as str.
- outfolder : str
The output folder to place them in.
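The center_time and center_hits_to options of BaseProcessor both amount to simple shifts. The sketch below illustrates the arithmetic on plain Python lists and is not the orcasong implementation. It assumes that the “first triggered hit” is the triggered hit with the smallest time, and it takes the detector center as an explicit argument (detector_center is a hypothetical name), since how that center is derived from the detx file is not described here.

```python
def center_times(times, triggered):
    """Subtract the time of the earliest triggered hit from all hit times."""
    t_first = min(t for t, trig in zip(times, triggered) if trig)
    return [t - t_first for t in times]


def translate_hits(positions, detector_center, target):
    """Shift xyz positions so that detector_center lands on target.

    Axes where target is None are left untouched.
    """
    shifts = [
        0.0 if tgt is None else tgt - center
        for center, tgt in zip(detector_center, target)
    ]
    return [tuple(p + s for p, s in zip(pos, shifts)) for pos in positions]


times = [105.0, 100.0, 130.0]
triggered = [False, True, True]
centered = center_times(times, triggered)

positions = [(10.0, 20.0, 50.0), (14.0, 24.0, 150.0)]
moved = translate_hits(
    positions, detector_center=(12.0, 22.0, 100.0), target=(0, 0, None)
)
```

With target set to (0, 0, None), the x and y coordinates are shifted so that the assumed center sits at the origin, while z keeps its original values.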
- class orcasong.core.FileBinner(bin_edges_list, add_bin_stats=True, hit_weights=None, chunksize=32, **kwargs)
For making binned images and mc_infos, which can be used for conv nets.
Can also add statistics of the binning to the h5 files, which can be plotted to show the distribution of hits among the bins and how many hits were cut off.
- Parameters
- bin_edges_list : List
List with the names of the fields to bin, and the respective bin edges, including the left- and right-most bin edge. Example: For 10 bins in the z direction, and 100 bins in time:
bin_edges_list = [
    ["pos_z", np.linspace(0, 10, 11)],
    ["time", np.linspace(-50, 550, 101)],
]
Some examples can be found in orcasong.bin_edges.
- add_bin_stats : bool
Add statistics of the binning to the output file. They can be plotted with util/bin_stats_plot.py [default: True].
- hit_weights : str, optional
Use blob["Hits"][hit_weights] as weights for samples in histogram.
- kwargs
Options of the BaseProcessor.
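To make the meaning of bin_edges_list concrete, here is a small pure-Python sketch of binning hits into a 2D image and counting the hits that fall outside the edges. It is an illustration under stated assumptions, not the FileBinner code: the helper names (linspace, bin_index, bin_hits) are hypothetical, hits are represented as plain dicts, and the right-most edge is treated as inclusive, following the convention of numpy.histogram.

```python
from bisect import bisect_right


def linspace(start, stop, num):
    """Pure-Python stand-in for np.linspace(start, stop, num)."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def bin_index(value, edges):
    """Return the bin index of value, or None if it lies outside the edges."""
    if value < edges[0] or value > edges[-1]:
        return None
    # A value equal to the right-most edge is counted in the last bin.
    return min(bisect_right(edges, value) - 1, len(edges) - 2)


def bin_hits(hits, bin_edges_list):
    """Histogram hits over two fields; also count the hits that were cut off."""
    (name_a, edges_a), (name_b, edges_b) = bin_edges_list
    image = [[0] * (len(edges_b) - 1) for _ in range(len(edges_a) - 1)]
    n_cut = 0
    for hit in hits:
        i = bin_index(hit[name_a], edges_a)
        j = bin_index(hit[name_b], edges_b)
        if i is None or j is None:
            n_cut += 1
        else:
            image[i][j] += 1
    return image, n_cut


bin_edges_list = [
    ["pos_z", linspace(0, 10, 11)],      # 10 bins in z
    ["time", linspace(-50, 550, 101)],   # 100 bins in time
]
hits = [
    {"pos_z": 2.5, "time": 0.0},
    {"pos_z": 10.0, "time": 550.0},   # lies exactly on the right-most edges
    {"pos_z": 12.0, "time": 100.0},   # outside the z range, so it is cut off
]
image, n_cut = bin_hits(hits, bin_edges_list)
```

Eleven edges give ten bins, which is why the edge arrays are always one element longer than the image along that axis.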
- finish_file(self, f, summary)
Work with the output file after the pipe has finished.
- Parameters
- f : h5py.File
The opened output file.
- summary : km3pipe.Blob
The output from pipe.drain().
- run_multi(self, infiles, outfolder, save_plot=False)
Bin multiple files into their own output files each. The output file names will be generated automatically.
- Parameters
- infiles : List
Paths to the input files, as str.
- outfolder : str
The output folder to place them in.
- save_plot : bool
Save the binning hists as a pdf. Only possible if add_bin_stats is True.
- class orcasong.core.FileGraph(max_n_hits=None, time_window=None, hit_infos=None, only_triggered_hits=False, fixed_length=False, **kwargs)
Turn km3 events to graph data.
The resulting file will have a dataset “x” of shape (total n_hits, len(hit_infos)). The column names of the last axis (i.e. hit_infos) are saved as attributes of the dataset (f["x"].attrs).
- Parameters
- hit_infos : tuple, optional
Which entries in the ‘/Hits’ Table will be kept. E.g. pos_x, time, … Often, only dir_x/y/z, pos_x/y/z and time are required. Default: Keep all entries.
- time_window : tuple, optional
Two ints (start, end). Hits outside of this time window will be cut away (based on ‘Hits/time’). Default: Keep all hits.
- only_triggered_hits : bool
If True, use only triggered hits. Otherwise, use all hits (default).
- max_n_hits : int
Maximum number of hits that gets saved per event. If an event has more, some will get cut randomly! Default: Keep all hits.
- fixed_length : bool
Legacy option. If False (default), save hits of events with variable length as 2d arrays using km3pipe’s indices. If True, pad hits of each event with zeros to a fixed length, so that they can be stored as 3d arrays like images. max_n_hits needs to be given in that case, and a column called ‘is_valid’ will be added, which is 0 if the entry is padded, and 1 otherwise. This is inefficient and will cut off hits, so it should not be used.
- kwargs
Options of the BaseProcessor.
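The interplay of time_window, max_n_hits and the legacy fixed_length padding can be sketched in a few lines of plain Python. This is an illustration only, not the FileGraph implementation: select_hits and pad_to_fixed_length are hypothetical helpers, hits are represented as dicts, the time window is assumed to include both of its boundaries, and the random cut uses Python's random module rather than numpy.

```python
import random


def select_hits(hits, time_window=None, max_n_hits=None, seed=None):
    """Apply the time window cut, then randomly cut down to max_n_hits."""
    if time_window is not None:
        start, end = time_window
        hits = [h for h in hits if start <= h["time"] <= end]
    if max_n_hits is not None and len(hits) > max_n_hits:
        rng = random.Random(seed)
        # Draw the indices to keep, and preserve the original hit order.
        keep = sorted(rng.sample(range(len(hits)), max_n_hits))
        hits = [hits[i] for i in keep]
    return hits


def pad_to_fixed_length(hits, hit_infos, max_n_hits):
    """Build rows of length len(hit_infos) + 1; the last column is is_valid."""
    rows = [[float(h[k]) for k in hit_infos] + [1.0] for h in hits[:max_n_hits]]
    n_pad = max_n_hits - len(rows)
    return rows + [[0.0] * (len(hit_infos) + 1) for _ in range(n_pad)]


hit_infos = ("pos_z", "time")
max_n_hits = 2
event_a = [
    {"pos_z": 1.0, "time": 10.0},
    {"pos_z": 2.0, "time": 20.0},
    {"pos_z": 3.0, "time": 700.0},  # outside the time window
    {"pos_z": 4.0, "time": 30.0},
]
event_b = [{"pos_z": 5.0, "time": 15.0}]

selected = [
    select_hits(ev, time_window=(0, 500), max_n_hits=max_n_hits, seed=42)
    for ev in (event_a, event_b)
]
# Shape (n_events, max_n_hits, len(hit_infos) + 1), like a stack of images.
fixed = [pad_to_fixed_length(sel, hit_infos, max_n_hits) for sel in selected]
```

The sketch shows both drawbacks named above: event_a loses a hit to the random cut, and event_b carries a padded row that holds no information.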