`orcanet.h5_generator`

Module Contents

Classes

Hdf5BatchGenerator

Base object for fitting to a sequence of data, such as a dataset.

Functions

`get_h5_generator`(orga, files_dict[, f_size, ...])	Initialize the hdf5_batch_generator_base with the parameters in orga.cfg.
`make_dataset`(gen)

class orcanet.h5_generator.Hdf5BatchGenerator(files_dict, batchsize=64, key_x_values='x', key_y_values='y', sample_modifier=None, label_modifier=None, fixed_batchsize=False, y_field_names=None, phase='training', xs_mean=None, f_size=None, keras_mode=True, shuffle=False, class_weights=None)

Base object for fitting to a sequence of data, such as a dataset.

Every `Sequence` must implement the `__getitem__` and `__len__` methods. If you want to modify your dataset between epochs, you may implement `on_epoch_end`. The method `__getitem__` should return a complete batch.

Notes:

`Sequence` objects are a safer way to do multiprocessing. This structure guarantees that the network will only train once on each sample per epoch, which is not the case with generators.

Examples:

```python
import math

import numpy as np
from skimage.io import imread
from skimage.transform import resize
from tensorflow.keras.utils import Sequence

# Here, `x_set` is a list of paths to the images
# and `y_set` are the associated classes.

class CIFAR10Sequence(Sequence):

    def __init__(self, x_set, y_set, batch_size):
        super().__init__()
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch; the last batch may be smaller.
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        # Slice out the file paths and labels that belong to batch `idx`.
        batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]

        # Load and resize every image, then stack them into one array.
        return np.array([
            resize(imread(file_name), (200, 200))
            for file_name in batch_x]), np.array(batch_y)
```

pad_to_size(info_blob)[source]: Pad the batch to have a fixed batchsize.

open()[source]: Open all files and prepare for read out.

close()[source]: Close all files again.

get_x_values(start_index)

Read one batch of samples from the files and zero-center it.

Parameters

start_index (int): The start index in the h5 files at which the batch will be read. The end index will be the start index + the batch size.

Returns

x_values (dict): One batch of data for each input file.
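The read-out described above amounts to slicing `[start_index, start_index + batchsize)` from each input file and, if a mean is available, subtracting it. A minimal sketch with plain Python lists standing in for the h5 datasets; `read_x_batch` and `files` are hypothetical names, `xs_mean` is assumed to be a dict of per-input mean samples (as suggested by the `xs_mean` argument of the class), and element-wise mean subtraction is assumed to be what "zero-center" means here.

```python
def read_x_batch(files, start_index, batchsize, xs_mean=None):
    """Read one batch per input and optionally zero-center it (hypothetical sketch).

    files: dict mapping input name -> sequence of samples (each a list of floats),
        standing in for the x dataset of each h5 file.
    xs_mean: optional dict mapping input name -> mean sample of the same length.
    """
    x_values = {}
    for name, data in files.items():
        # End index is start_index + batchsize (exclusive); slicing past the
        # end of the data simply yields a shorter final batch.
        batch = [list(sample) for sample in data[start_index:start_index + batchsize]]
        if xs_mean is not None:
            mean = xs_mean[name]
            # Subtract the mean element-wise from every sample.
            batch = [[v - m for v, m in zip(sample, mean)] for sample in batch]
        x_values[name] = batch
    return x_values
```

For instance, with `files = {"input_a": [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]}` and `xs_mean = {"input_a": [3.0, 4.0]}`, calling `read_x_batch(files, 1, 2, xs_mean)` returns `{"input_a": [[0.0, 0.0], [2.0, 2.0]]}`.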

get_y_values(start_index)

Get the y_values for the network. Since the y_values are assumed to be the same for all files, the ones from the first file are used. TODO: add a check.

Parameters

start_index (int): The start index in the h5 files at which the batch will be read. The end index will be the start index + the batch size.

Returns

y_values (ndarray): The y_values, read directly from the files.
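The TODO above asks for a check that all files carry the same y_values. A minimal sketch of such a check, assuming each file's y_values can be compared as a sequence of records; `read_y_batch` is hypothetical, and per its docstring the real method does not perform this check but simply uses the first file. "First" here means the first key of the dict, relying on Python's insertion-ordered dicts.

```python
def read_y_batch(y_per_file, start_index, batchsize, check=True):
    """Return the y_values of the first file for one batch (hypothetical sketch).

    y_per_file: dict mapping input name -> sequence of y records.
    check: if True, raise if any other file disagrees on this batch.
    """
    names = list(y_per_file)
    first = y_per_file[names[0]][start_index:start_index + batchsize]
    if check:
        for name in names[1:]:
            other = y_per_file[name][start_index:start_index + batchsize]
            if other != first:
                raise ValueError(
                    f"y_values of input {name!r} differ from those of {names[0]!r}"
                )
    return first
```

Checking per batch keeps the cost proportional to what is read anyway; a one-off comparison of whole datasets at `open()` time would be the alternative if the files are small.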

print_timestats(print_func=None)[source]: Print stats about how long it took to read batches.

get_file_meta()[source]: Meta information about the files. Only read out once.

orcanet.h5_generator.get_h5_generator(orga, files_dict, f_size=None, zero_center=False, keras_mode=True, shuffle=False, use_def_label=True, phase='training')

Initialize the hdf5_batch_generator_base with the parameters in orga.cfg.

Parameters

orga (orcanet.core.Organizer): Contains all the configurable options in the OrcaNet scripts.
files_dict (dict): Paths of the files to train on. Keys: the name of every input (from the toml list file; can be multiple). Values: the filepath of a single h5 file to read samples from.
f_size (int or None): Specifies the number of samples to be read from the .h5 file. If None, the whole .h5 file will be used.
zero_center (bool): Whether to use zero centering. Requires orga.zero_center_folder to be set.
keras_mode (bool): Specifies if mc-infos (y_values) should be yielded as well. The mc-infos are used for evaluation after training and testing is finished.
shuffle (bool): Randomize the order in which batches are read from the file. Significantly reduces read-out speed.
use_def_label (bool): If True and no label modifier is given by the user, use the default label modifier instead of none.
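A minimal sketch of how `f_size`, the batch size, and `shuffle` could interact when choosing which batches to read. `batch_bounds` is hypothetical; it assumes batches are contiguous and start at multiples of the batch size, that `f_size=None` means "all samples", and that `shuffle` permutes whole batches rather than individual samples (consistent with the description of shuffling as reordering batches). The real generator may also treat the last partial batch differently, e.g. depending on `fixed_batchsize`.

```python
import random


def batch_bounds(n_samples, batchsize, f_size=None, shuffle=False, seed=None):
    """(start, stop) index pairs of the batches to read (hypothetical helper)."""
    # f_size=None: use the whole file; otherwise only the first f_size samples.
    n_used = n_samples if f_size is None else min(f_size, n_samples)
    # Contiguous batches; the last one is clipped to n_used and may be shorter.
    bounds = [(start, min(start + batchsize, n_used))
              for start in range(0, n_used, batchsize)]
    if shuffle:
        # Reorder whole batches; every sample is still read exactly once.
        random.Random(seed).shuffle(bounds)
    return bounds
```

For example, `batch_bounds(10, 4)` gives `[(0, 4), (4, 8), (8, 10)]` and `batch_bounds(10, 4, f_size=6)` gives `[(0, 4), (4, 6)]`. Shuffling batches instead of samples keeps each read contiguous, but jumping between distant regions of the file still costs read speed.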

Yields

xs (dict): Data for the model to train on.
  Keys (str): The name(s) of the input layer(s) of the model.
  Values (ndarray): A batch of samples for the corresponding input.

ys (dict or None): Labels for the model to train on.
  Keys (str): The name(s) of the output layer(s) of the model.
  Values (ndarray): A batch of labels for the corresponding output.
  Will be None if there are no labels in the file.

y_values (ndarray, optional): Y values from the file. Only yielded if yield_mc_info is True.
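To make the yielded structure concrete, a minimal sketch of a toy generator that yields `(xs, ys)` pairs of dicts keyed by layer name, using plain lists in place of ndarrays. `toy_generator` and the layer names `"input_1"` / `"output_1"` are hypothetical; the sketch reads no h5 file and omits the optional `y_values`.

```python
def toy_generator(samples, labels, batchsize,
                  input_name="input_1", output_name="output_1"):
    """Yield (xs, ys) batches shaped like the documented output (hypothetical)."""
    for start in range(0, len(samples), batchsize):
        stop = start + batchsize
        # One entry per input layer; here there is a single input.
        xs = {input_name: samples[start:stop]}
        # ys is None when there are no labels, mirroring the documented behavior.
        ys = None if labels is None else {output_name: labels[start:stop]}
        yield xs, ys
```

Iterating `toy_generator([1, 2, 3, 4, 5], [0, 1, 0, 1, 1], 2)` yields three batches, the last of which is `({"input_1": [5]}, {"output_1": [1]})`.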

orcanet.h5_generator.make_dataset(gen)
