orcanet.in_out

Utility code regarding user input.

Module Contents

Classes

IOHandler

Access info indirectly contained in the cfg object.

Functions

get_subfolder(main_folder[, name, create])

Get the path to one or all subfolders of the main folder.

get_inputs(model)

Get names and keras layers of the inputs of the model, as a dict.

split_name_of_predfile(file)

Get epoch, fileno, val fileno from the name of a predfile.

h5_get_number_of_rows(h5_filepath[, datasets])

Get the total number of rows of a .h5 file.

use_local_tmpdir(files)

Copies given files to the local temp folder.

orcanet.in_out.get_subfolder(main_folder, name=None, create=False)

Get the path to one or all subfolders of the main folder.

Parameters
main_folder : str

The main folder.

name : str or None

The name of the subfolder.

create : bool

Whether the subfolder should be created if it does not exist.

Returns
subfolder : str or tuple

The path of the subfolder. If name is None, all subfolders will be returned as a tuple.
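The lookup can be sketched as follows. This is a minimal illustration, not OrcaNet's code: the subfolder names in SUBFOLDER_NAMES are hypothetical, and unknown names are assumed to raise a ValueError.

```python
import os

# Hypothetical subfolder names, chosen for illustration only; the real
# OrcaNet layout may use different names.
SUBFOLDER_NAMES = ("train_log", "saved_models", "plots", "predictions")


def get_subfolder(main_folder, name=None, create=False):
    """Return the path of one subfolder, or a tuple of all subfolder paths."""
    if name is None:
        # No name given: resolve every known subfolder.
        return tuple(get_subfolder(main_folder, n, create) for n in SUBFOLDER_NAMES)
    if name not in SUBFOLDER_NAMES:
        raise ValueError(f"Unknown subfolder name: {name!r}")
    path = os.path.join(main_folder, name)
    if create:
        # exist_ok=True makes repeated calls harmless.
        os.makedirs(path, exist_ok=True)
    return path
```

With create=False the paths are only computed, so callers can ask for a location before anything has been written to disk.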

orcanet.in_out.get_inputs(model)

Get names and keras layers of the inputs of the model, as a dict.

class orcanet.in_out.IOHandler(cfg)

Access info indirectly contained in the cfg object.

get_latest_epoch()

Return the highest epoch/fileno pair of any saved model.

Returns
latest_epoch : tuple or None

The highest epoch, file_no pair. None if the folder is empty or does not exist yet.

get_all_epochs()

Get a sorted list of the epoch/fileno pairs of all saved models.

Returns
epochs : List

The (epoch, fileno) tuples. List is empty if none can be found.
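Both get_all_epochs and get_latest_epoch come down to parsing saved-model file names. A minimal sketch follows. It assumes the models are named model_epoch_XX_file_YY.h5 (an assumption: get_model_path's docstring spells it models_epochXX_file_YY.h5). It also takes a plain list of file names instead of reading the model folder, so these helper signatures are hypothetical.

```python
import re

# Assumed file name pattern of a saved model.
MODEL_PATTERN = re.compile(r"^model_epoch_(\d+)_file_(\d+)\.h5$")


def get_all_epochs(filenames):
    """Return a sorted list of (epoch, fileno) tuples found in the file names."""
    epochs = []
    for fname in filenames:
        match = MODEL_PATTERN.match(fname)
        if match:
            # Convert to int so that epoch 10 sorts after epoch 9.
            epochs.append((int(match.group(1)), int(match.group(2))))
    # Tuples sort by epoch first, then by file number.
    return sorted(epochs)


def get_latest_epoch(filenames):
    """Return the highest (epoch, fileno) pair, or None if there is none."""
    epochs = get_all_epochs(filenames)
    return epochs[-1] if epochs else None
```

Sorting integer tuples rather than the raw names avoids the alphabetical trap where "model_epoch_10" would come before "model_epoch_2".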

get_next_epoch(epoch)

Return the next epoch / fileno tuple.

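The result depends on how many training files there are: after the last file of an epoch, the next tuple is the first file of the following epoch.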

Parameters
epoch : tuple or None

Current epoch and file number.

Returns
next_epoch : tuple

Next epoch and file number.
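The stepping rule can be sketched as follows. This is a minimal sketch under two assumptions: None means no model has been trained yet, so training starts at (1, 1), and the number of training files is passed in explicitly. The IOHandler method would look it up from the configuration instead.

```python
def get_next_epoch(epoch, n_train_files):
    """Return the (epoch, fileno) tuple that follows `epoch`."""
    if epoch is None:
        # Assumption: nothing trained yet, so start at the first file of epoch 1.
        return (1, 1)
    ep, fileno = epoch
    if fileno >= n_train_files:
        # Last training file reached: move on to the next epoch.
        return (ep + 1, 1)
    return (ep, fileno + 1)
```

For example, with three training files, (1, 3) is followed by (2, 1).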

get_previous_epoch(epoch)

Return the previous epoch / fileno tuple.
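The inverse step can be sketched in the same way. The explicit n_train_files argument is hypothetical. How the real method treats (1, 1) is not documented on this page, so raising a ValueError there is an assumption of this sketch.

```python
def get_previous_epoch(epoch, n_train_files):
    """Return the (epoch, fileno) tuple that precedes `epoch`."""
    ep, fileno = epoch
    if fileno > 1:
        return (ep, fileno - 1)
    if ep > 1:
        # First file of an epoch: step back to the last file of the previous epoch.
        return (ep - 1, n_train_files)
    # Assumption: there is nothing before the very first file of training.
    raise ValueError("(1, 1) has no previous epoch")
```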

get_subfolder(name=None, create=False)

Get the path to one or all subfolders of the main folder.

Parameters
name : str or None

The name of the subfolder.

create : bool

Whether the subfolder should be created if it does not exist.

Returns
subfolder : str or tuple

The path of the subfolder. If name is None, all subfolders will be returned as a tuple.

get_model_path(epoch, fileno, local=False)

Get the path to a model (which might not exist yet).

Parameters
epoch : int

The epoch of the model.

fileno : int

The file number of the model.

local : bool

If True, return only the path relative to the output_folder, i.e. models/models_epochXX_file_YY.h5.

Returns
model_path : str

The path to the model.

get_latest_prediction_file_no(epoch, fileno)

Return the file number of the latest val file that has been predicted so far.

Parameters
epoch : int

Epoch of the model that made the predictions.

fileno : int

File number of the model that made the predictions.

Returns
latest_val_file_no : int or None

File number of the prediction file with the highest val index. This number starts from 1, i.e. it is the number that appears in the file name. None if there is no prediction file yet.

get_pred_path(epoch, fileno, pred_file_no)

Get the path of a prediction file. All integer arguments start from 1.

Parameters
epoch : int

Epoch of an already trained network.

fileno : int

File number (training step) of an already trained network.

pred_file_no : int

Val file number of the prediction file in the prediction folder.

Returns
pred_filepath : str

The path.

get_pred_files_list(epoch=None, fileno=None)

Return a sorted list of all prediction .h5 files in the prediction folder. Inference files are not included.

Parameters
epoch : int, optional

Specific model epoch to look up prediction files for.

fileno : int, optional

Specific model file number to look up prediction files for.

Returns
pred_files_list : List

List with the full filepaths of all prediction results files.
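The filtering and sorting can be sketched as follows. It assumes prediction files follow the name format given for split_name_of_predfile (model_epoch_XX_file_YY_on_USERLIST_val_file_ZZ.h5) and that files not matching it, assumed to include inference outputs, are skipped. The explicit pred_folder and filenames arguments are hypothetical.

```python
import os
import re

# Assumed prediction file name pattern.
PRED_PATTERN = re.compile(r"^model_epoch_(\d+)_file_(\d+)_on_.+_val_file_(\d+)\.h5$")


def get_pred_files_list(pred_folder, filenames, epoch=None, fileno=None):
    """Return sorted full paths of prediction files, optionally filtered by model."""
    selected = []
    for fname in filenames:
        match = PRED_PATTERN.match(fname)
        if match is None:
            # Not a prediction file in the assumed format: skip it.
            continue
        f_epoch, f_fileno, val_no = (int(g) for g in match.groups())
        if epoch is not None and f_epoch != epoch:
            continue
        if fileno is not None and f_fileno != fileno:
            continue
        selected.append(((f_epoch, f_fileno, val_no), os.path.join(pred_folder, fname)))
    # Sort numerically by (epoch, fileno, val file no) rather than alphabetically.
    return [path for _, path in sorted(selected)]
```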

get_local_files(which)

Get the training or validation file paths for each list input set.

Returns the paths to the copies of the files in the local tmpdir. The copies are created the first time this is called.

Parameters
which : str

Either “train”, “val”, or “inference”.

Returns
dict

A dict containing the paths to the training or validation files on which the model will be trained. Example of the format for two input sets with two files each:

{
    "input_A": ('path/to/set_A_file_1.h5', 'path/to/set_A_file_2.h5'),
    "input_B": ('path/to/set_B_file_1.h5', 'path/to/set_B_file_2.h5'),
}

get_n_bins()

Get the number of bins from the training files.

Only the first file of each input is looked up; the others should be identical.

Returns
n_bins : dict

Toml-list input names as keys, list of the bins as values.

get_file_sizes(which)

Get the number of samples in each training or validation input file.

Parameters
which : str

Either train or val.

Returns
file_sizes : List

Its length is equal to the number of files in each input set.

Raises
ValueError

If corresponding files of the different inputs do not contain the same number of samples.
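The consistency check can be sketched as follows. In this sketch, the per-file sample counts of every input are passed in as a dict. In IOHandler they would be read from the .h5 files, so this argument format is an assumption.

```python
def get_file_sizes(sizes_per_input):
    """Return the per-file sample counts shared by all inputs.

    `sizes_per_input` maps each input name to the list of sample counts of
    its files (assumed format).
    """
    file_sizes = None
    for input_name, sizes in sizes_per_input.items():
        if file_sizes is None:
            file_sizes = list(sizes)
        elif list(sizes) != file_sizes:
            # The n-th files of all inputs must contain the same number of samples.
            raise ValueError(
                f"Input {input_name!r} has file sizes {list(sizes)}, expected {file_sizes}"
            )
    return file_sizes
```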

get_no_of_files(which)

Return the number of training or validation files.

Only looks up the number of files of one (arbitrary) list input, since equal length is checked when the files are read in.

Parameters
which : str

Either train or val.

Returns
no_of_files : int

The number of files.

yield_files(which)

Yield training or validation filepaths for every input.

They will be yielded in the same order as they are given in the toml file.

Parameters
which : str

Either train or val.

Yields
files_dict : dict

Keys: The name of every toml list input. Values: One of the filepaths.

get_file(which, file_no)

Get a dict with the n-th file of every input.
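Pairing the n-th files of all inputs, as yield_files and get_file do, can be sketched with zip. In this sketch, the file dict (in the format shown for get_local_files) is passed in directly, and file_no is assumed to start at 1. Both are assumptions rather than documented behavior.

```python
def yield_files(files):
    """Yield one {input_name: filepath} dict per file index."""
    names = list(files)
    # zip pairs the n-th file of every input; dict order follows insertion
    # order, i.e. the order in which the inputs were given.
    for paths in zip(*(files[name] for name in names)):
        yield dict(zip(names, paths))


def get_file(files, file_no):
    """Return the dict with the file_no-th file of every input (1-based, assumed)."""
    for i, files_dict in enumerate(yield_files(files), start=1):
        if i == file_no:
            return files_dict
    raise IndexError(f"No file number {file_no}")
```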

check_connections(model)

Check if the names and shapes of the samples and labels in the given input files work with the model.

Also takes into account the possibly present sample or label modifiers.

Parameters
model : ks.Model

A keras model.

Raises
ValueError

If they do not work together.
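The input half of this check can be sketched as a comparison of two name-to-shape dicts. Only inputs are covered here, not labels. The sketch assumes both dicts give shapes without the batch dimension, as get_input_shapes does, and that None in a model shape marks a variable-size dimension. The function signature is hypothetical.

```python
def check_connections(data_shapes, model_shapes):
    """Raise ValueError unless every model input has data of a compatible shape."""
    missing = sorted(set(model_shapes) - set(data_shapes))
    if missing:
        raise ValueError(f"No data found for model inputs: {missing}")
    for name, model_shape in model_shapes.items():
        data_shape = tuple(data_shapes[name])
        # Same rank, and every fixed model dimension must match the data.
        compatible = len(data_shape) == len(model_shape) and all(
            m is None or m == d for m, d in zip(model_shape, data_shape)
        )
        if not compatible:
            raise ValueError(
                f"Shape mismatch for input {name!r}: data {data_shape}, "
                f"model {tuple(model_shape)}"
            )
```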

get_batch()

For testing purposes, return a batch of x_values and y_values.

This will always be the first batchsize samples and y_values from the first file, before any modifiers have been applied.

Returns
info_blob : dict

X- and y-values from the files. Has the following entries:

x_values : dict

Keys: Names of the input datasets from the list toml file. Values: ndarray, a batch of samples.

y_values : ndarray

From the y_values datagroup of the input files.

get_input_shapes()

Get the input names and shapes of the data after the modifier has been applied.

Returns
input_shapes : dict

Keys: Name of the inputs of the model. Values: Their shape without the batchsize.

print_log(lines, logging=True)

Print and also log to the full log file.

get_epoch_float(epoch, fileno)

Make a float value out of epoch/fileno.
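This page does not say how the float is formed. One plausible convention is sketched below as an assumption, not OrcaNet's documented formula: each file is weighted by its number of samples, and the value marks the end of training on the given file. Under that convention, the last file of epoch 1 maps to 1.0. The file_sizes argument is hypothetical.

```python
def get_epoch_float(epoch, fileno, file_sizes):
    """Map (epoch, fileno) to a float, weighting files by their sample counts."""
    total = sum(file_sizes)
    # Samples seen in the current epoch after finishing file `fileno`.
    done_in_epoch = sum(file_sizes[:fileno])
    return (epoch - 1) + done_in_epoch / total
```

With two files of 100 and 300 samples, (2, 1) becomes 1.25, since a quarter of the second epoch's samples have been seen.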

get_learning_rate(epoch)

Get the learning rate for a given epoch and file number.

The user learning rate (cfg.learning_rate) can be None, a float, a tuple, or a function.

Parameters
epoch : tuple

Epoch and file number. Both start at 1, i.e. the start of the training is (1, 1), the next file is (1, 2), … This is also in the filename of the saved models.

Returns
lr : float

The learning rate that will be used for the given epoch/fileno.
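The four accepted types can be illustrated with a sketch. The meanings used here are assumptions, not taken from this page:

- A float is a constant rate.
- A tuple is (initial_lr, decay), with the rate reduced by the fraction decay after every trained file.
- A function is called as f(epoch, fileno).
- None returns None, i.e. the optimizer's rate is left alone.

The helper signature, including n_train_files, is hypothetical.

```python
def get_learning_rate(user_lr, epoch, n_train_files):
    """Resolve a user learning rate setting for (epoch, fileno), both starting at 1."""
    ep, fileno = epoch
    if user_lr is None:
        # Assumption: None means "keep the optimizer's current learning rate".
        return None
    if callable(user_lr):
        return float(user_lr(ep, fileno))
    if isinstance(user_lr, tuple):
        lr0, decay = user_lr
        # Number of files already trained on before this one.
        n_steps = (ep - 1) * n_train_files + (fileno - 1)
        return lr0 * (1 - decay) ** n_steps
    return float(user_lr)
```

For example, (1.0, 0.5) with two training files gives 1.0 at (1, 1) and 0.25 at (2, 1).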

orcanet.in_out.split_name_of_predfile(file)

Get epoch, fileno, val fileno from the name of a predfile.

Parameters
file : str

Like this: model_epoch_XX_file_YY_on_USERLIST_val_file_ZZ.h5

Returns
epoch, file_no, val_file_no : tuple(int)

As integers.
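Parsing the documented name format can be sketched with a regular expression. This is a minimal sketch, not necessarily how OrcaNet does it. The greedy `.+` lets USERLIST contain underscores. Accepting full paths and raising ValueError on non-matching names are assumptions.

```python
import os
import re


def split_name_of_predfile(file):
    """Return (epoch, file_no, val_file_no) parsed from a prediction file name."""
    # Accept a full path as well as a bare file name.
    name = os.path.basename(file)
    match = re.match(r"^model_epoch_(\d+)_file_(\d+)_on_.+_val_file_(\d+)\.h5$", name)
    if match is None:
        raise ValueError(f"Not a prediction file name: {name}")
    return tuple(int(g) for g in match.groups())
```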

orcanet.in_out.h5_get_number_of_rows(h5_filepath, datasets=None)

Get the total number of rows of a .h5 file.

Multiple dataset names can be given as a list to check if they all have the same number of rows (axis 0).

Parameters
h5_filepath : str

Filepath of the .h5 file.

datasets : list

Optional. The names of the datasets in the file to check.

Returns
number_of_rows : int

Number of rows of the first dataset in the .h5 file.

Raises
AssertionError

If the given datasets do not have the same number of rows.

orcanet.in_out.use_local_tmpdir(files)

Copies given files to the local temp folder.

Parameters
files : dict

Dict containing the file paths.

Returns
files_ssd : dict

Dict with updated SSD/scratch filepaths.
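The copy step can be sketched with the standard library. The tmpdir parameter is a hypothetical addition. The default comes from tempfile.gettempdir(), which already honors the TMPDIR environment variable. OrcaNet's actual choice of scratch location may differ.

```python
import os
import shutil
import tempfile


def use_local_tmpdir(files, tmpdir=None):
    """Copy every file in `files` to a local temp folder and return the new paths."""
    tmpdir = tmpdir or tempfile.gettempdir()
    files_ssd = {}
    for input_name, paths in files.items():
        new_paths = []
        for path in paths:
            # Keep the original file name; files sharing a basename would collide.
            target = os.path.join(tmpdir, os.path.basename(path))
            # copy2 also preserves metadata such as modification times.
            shutil.copy2(path, target)
            new_paths.append(target)
        files_ssd[input_name] = tuple(new_paths)
    return files_ssd
```

The returned dict has the same keys and tuple layout as the input, so it can replace the original file dict directly.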