orcanet.h5_generator

Module Contents

Classes

Hdf5BatchGenerator

Base object for fitting to a sequence of data, such as a dataset.

Functions

get_h5_generator(orga, files_dict[, f_size, ...])

Initialize the hdf5_batch_generator_base with the parameters in orga.cfg.

make_dataset(gen)

class orcanet.h5_generator.Hdf5BatchGenerator(files_dict, batchsize=64, key_x_values='x', key_y_values='y', sample_modifier=None, label_modifier=None, fixed_batchsize=False, y_field_names=None, phase='training', xs_mean=None, f_size=None, keras_mode=True, shuffle=False, class_weights=None)

Base object for fitting to a sequence of data, such as a dataset.

Every Sequence must implement the __getitem__ and the __len__ methods. If you want to modify your dataset between epochs you may implement on_epoch_end. The method __getitem__ should return a complete batch.

Notes:

Sequences are a safer way to do multiprocessing. This structure guarantees that the network will only train once on each sample per epoch, which is not the case with generators.

Examples:

```python
from skimage.io import imread
from skimage.transform import resize
from tensorflow.keras.utils import Sequence
import numpy as np
import math

# Here, `x_set` is a list of paths to the images
# and `y_set` are the associated classes.

class CIFAR10Sequence(Sequence):

    def __init__(self, x_set, y_set, batch_size):
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch; the last batch may be smaller.
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]

        return np.array([
            resize(imread(file_name), (200, 200))
            for file_name in batch_x]), np.array(batch_y)
```

pad_to_size(info_blob)

Pad the batch to have a fixed batchsize.
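A minimal sketch of what padding a short final batch to a fixed batchsize can look like, assuming zero-padding along the first (sample) axis and a dict of arrays keyed by input name. The helper name `pad_batch` and the zero fill value are assumptions for illustration, not necessarily what `pad_to_size` does internally.

```python
import numpy as np


def pad_batch(xs, batchsize):
    """Zero-pad every array in ``xs`` along axis 0 up to ``batchsize`` samples.

    Hypothetical helper: returns the padded dict and the original batch size,
    so that the padded samples can be discarded again later.
    """
    org_batchsize = next(iter(xs.values())).shape[0]
    if org_batchsize > batchsize:
        raise ValueError("Batch is larger than the requested batchsize")
    padded = {}
    for key, arr in xs.items():
        # Only pad the sample axis; all other axes keep their size.
        pad_width = [(0, batchsize - org_batchsize)] + [(0, 0)] * (arr.ndim - 1)
        padded[key] = np.pad(arr, pad_width, mode="constant", constant_values=0)
    return padded, org_batchsize
```

Returning the original batch size alongside the padded data makes it possible to drop the dummy samples from the predictions afterwards.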

open()

Open all files and prepare for read out.

close()

Close all files again.
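Because every `open()` should be paired with a `close()`, a small context manager can make sure the files are closed even if reading a batch fails. This is a sketch under the assumption that the files are not already open; the helper name `opened` is hypothetical and only relies on an object having `open()` and `close()` methods.

```python
from contextlib import contextmanager


@contextmanager
def opened(generator):
    """Call ``generator.open()`` on entry and ``generator.close()`` on exit.

    Hypothetical helper: works with any object that has open() and close()
    methods, e.g. an Hdf5BatchGenerator instance.
    """
    generator.open()
    try:
        yield generator
    finally:
        # Close the files even if reading a batch raised an exception.
        generator.close()
```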

get_x_values(start_index)

Read one batch of samples from the files and zero-center it.

Parameters
start_index : int

The start index in the h5 files at which the batch will be read. The end index will be the start index + the batch size.

Returns
x_values : dict

One batch of data for each input file.
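The read-out described above boils down to slicing `[start_index : start_index + batchsize]` from each input and subtracting a mean. The sketch below assumes `xs_mean` is a dict keyed by input name and uses plain dicts of NumPy arrays as stand-ins for open h5py files; the helper `read_x_batch` is hypothetical, while `key_x_values` and `xs_mean` mirror the class parameters.

```python
import numpy as np


def read_x_batch(files, start_index, batchsize, key_x_values="x", xs_mean=None):
    """Read one batch of samples per input and optionally zero-center it.

    ``files`` maps input names to file-like mappings (e.g. open h5py.File
    objects); here plain dicts of NumPy arrays stand in for them.
    """
    x_values = {}
    for input_key, f in files.items():
        # Slicing only reads samples start_index .. start_index + batchsize - 1.
        batch = np.asarray(f[key_x_values][start_index:start_index + batchsize])
        if xs_mean is not None:
            # Zero-center by subtracting the per-input mean (assumed layout).
            batch = batch - xs_mean[input_key]
        x_values[input_key] = batch
    return x_values
```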

get_y_values(start_index)

Get the y_values for the network. Since the y_values are expected to be the same for all files, those from the first file are used. TODO: add a check for this.

Parameters
start_index : int

The start index in the h5 files at which the batch will be read. The end index will be the start index + the batch size.

Returns
y_values : ndarray

The y_values, read directly from the first file.
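One way the missing consistency check could look is sketched below: the batch of y_values is taken from the first file, and every other file is compared against it. The helper `read_y_batch` and its `check` flag are hypothetical; dicts of NumPy structured arrays stand in for the h5 files.

```python
import numpy as np


def read_y_batch(files, start_index, batchsize, key_y_values="y", check=True):
    """Return the y_values of one batch, taken from the first input file.

    Hypothetical helper. With ``check=True`` it also verifies that every
    other file holds identical y_values for this batch.
    """
    sl = slice(start_index, start_index + batchsize)
    file_list = list(files.values())
    y_values = np.asarray(file_list[0][key_y_values][sl])
    if check:
        for f in file_list[1:]:
            if not np.array_equal(np.asarray(f[key_y_values][sl]), y_values):
                raise ValueError("y_values differ between input files")
    return y_values
```

The check reads the labels from every file, so it costs extra I/O per batch; that trade-off is one reason to make it optional.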

print_timestats(print_func=None)

Print stats about how long it took to read batches.
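A minimal sketch of collecting such timing statistics: each batch read is wrapped in a timer, and the summary goes through a configurable `print_func` that defaults to `print`. The class `BatchTimer` and the exact statistics it prints are assumptions for illustration.

```python
import time

import numpy as np


class BatchTimer:
    """Record how long each batch read takes and print summary statistics.

    Hypothetical helper; names and printed statistics are assumptions.
    """

    def __init__(self):
        self.durations = []

    def timed_read(self, read_func, *args, **kwargs):
        # Time a single call, e.g. reading one batch.
        start = time.perf_counter()
        result = read_func(*args, **kwargs)
        self.durations.append(time.perf_counter() - start)
        return result

    def print_timestats(self, print_func=None):
        if print_func is None:
            print_func = print
        if not self.durations:
            print_func("No batches read yet.")
            return
        d = np.array(self.durations)
        print_func("Batches read: {}".format(len(d)))
        print_func("Mean time per batch: {:.4f} s (std {:.4f} s)".format(d.mean(), d.std()))
```

Passing a different `print_func`, such as a logger method, redirects the output without changing the timing logic.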

get_file_meta()

Meta information about the files. Only read out once.
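The "only read out once" behaviour corresponds to a lazily cached attribute. This sketch assumes the meta information can be produced by a single function call; `LazyMeta` and `read_meta` are hypothetical stand-ins for whatever actually inspects the files.

```python
class LazyMeta:
    """Compute file meta information on first access and cache it.

    Hypothetical sketch; ``read_meta`` stands in for the code that actually
    inspects the files.
    """

    def __init__(self, read_meta):
        self._read_meta = read_meta
        self._file_meta = None

    def get_file_meta(self):
        if self._file_meta is None:
            # First call: actually read the files.
            self._file_meta = self._read_meta()
        # Later calls: return the cached result without touching the files.
        return self._file_meta
```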

orcanet.h5_generator.get_h5_generator(orga, files_dict, f_size=None, zero_center=False, keras_mode=True, shuffle=False, use_def_label=True, phase='training')

Initialize the hdf5_batch_generator_base with the parameters in orga.cfg.

Parameters
orga : orcanet.core.Organizer

Contains all the configurable options in the OrcaNet scripts.

files_dict : dict

Paths of the files to train on. Keys: the name of every input (from the toml list file; there can be multiple). Values: the filepath of a single h5py file to read samples from.

f_size : int or None

Specifies the number of samples to be read from the .h5 file. If None, the whole .h5 file will be used.

zero_center : bool

Whether to use zero centering. Requires orga.zero_center_folder to be set.

keras_mode : bool

Specifies if mc-infos (y_values) should be yielded as well. The mc-infos are used for evaluation after training and testing are finished.

shuffle : bool

Randomize the order in which batches are read from the file. This significantly reduces the readout speed.

use_def_label : bool

If True and the user gives no label modifier, the default label modifier is used instead of none.
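The parameters above can be illustrated together: `files_dict` maps each input name to one h5 file, `f_size` limits how many samples are read, and `shuffle` randomizes the order of whole batches. This is a sketch under stated assumptions: the input names and paths are made up, a short last batch is assumed to be kept (so the batch count is rounded up), and `batch_start_indices` is a hypothetical helper.

```python
import math
import random

# Hypothetical files_dict: one entry per model input, each pointing to a
# single h5 file (input names and paths are made up for illustration).
files_dict = {
    "input_A": "data/run_1_input_A.h5",
    "input_B": "data/run_1_input_B.h5",
}


def batch_start_indices(f_size, batchsize, shuffle=False, seed=None):
    """Start index of every batch when reading ``f_size`` samples.

    Assumes a short last batch is kept, and that shuffling permutes whole
    batches rather than single samples, so each batch stays a contiguous
    slice of the file.
    """
    n_batches = math.ceil(f_size / batchsize)
    starts = [i * batchsize for i in range(n_batches)]
    if shuffle:
        random.Random(seed).shuffle(starts)
    return starts
```

For example, `batch_start_indices(130, 64)` yields three batches starting at 0, 64 and 128, the last one holding only 2 samples.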

Yields
xs : dict
Data for the model to train on.

Keys : str
The name(s) of the input layer(s) of the model.
Values : ndarray
A batch of samples for the corresponding input.

ys : dict or None
Labels for the model to train on.

Keys : str
The name(s) of the output layer(s) of the model.
Values : ndarray
A batch of labels for the corresponding output.

Will be None if there are no labels in the file.

y_values : ndarray, optional

Y values from the file. Only yielded if yield_mc_info is True.
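To make the structure of `xs` and `ys` concrete, the sketch below builds one batch by hand: `xs` is keyed by input layer name, and `ys` is produced by a label modifier that maps fields of the structured y_values array to output layer names. It assumes the modifier receives the y_values array directly; `example_label_modifier`, the field names and the layer names are all hypothetical.

```python
import numpy as np


def example_label_modifier(y_values):
    """Hypothetical label modifier: map fields of the structured y_values
    array to the output layer names of a model.

    Output names ("energy_out", "dir_out") and field names are assumptions.
    """
    return {
        "energy_out": y_values["energy"],
        "dir_out": np.stack([y_values["dir_x"], y_values["dir_y"]], axis=-1),
    }


# One batch of 4 samples as it might be read from a file.
y_values = np.zeros(4, dtype=[("energy", "f4"), ("dir_x", "f4"), ("dir_y", "f4")])
xs = {"input_A": np.zeros((4, 11, 13, 18))}  # keyed by input layer name
ys = example_label_modifier(y_values)        # keyed by output layer name
```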

orcanet.h5_generator.make_dataset(gen)