orcanet.h5_generator
Module Contents
Classes
Hdf5BatchGenerator | Base object for fitting to a sequence of data, such as a dataset.
Functions
get_h5_generator | Initialize the hdf5_batch_generator_base with the parameters in orga.cfg.
- class orcanet.h5_generator.Hdf5BatchGenerator(files_dict, batchsize=64, key_x_values='x', key_y_values='y', sample_modifier=None, label_modifier=None, fixed_batchsize=False, y_field_names=None, phase='training', xs_mean=None, f_size=None, keras_mode=True, shuffle=False, class_weights=None)
Base object for fitting to a sequence of data, such as a dataset.
Every Sequence must implement the __getitem__ and the __len__ methods. If you want to modify your dataset between epochs you may implement on_epoch_end. The method __getitem__ should return a complete batch.
Notes:
Sequences are a safer way to do multiprocessing. This structure guarantees that the network will train only once
on each sample per epoch, which is not the case with generators.
Examples:
```python
import math

import numpy as np
from skimage.io import imread
from skimage.transform import resize
from tensorflow.keras.utils import Sequence

# Here, x_set is a list of paths to the images
# and y_set are the associated classes.


class CIFAR10Sequence(Sequence):

    def __init__(self, x_set, y_set, batch_size):
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch; the last batch may be smaller.
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        # Slice out the idx-th batch of file names and labels.
        batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]

        # Load and resize every image of the batch.
        return np.array([
            resize(imread(file_name), (200, 200))
            for file_name in batch_x]), np.array(batch_y)
```
- get_x_values(start_index)
Read one batch of samples from the files and zero-center them.
- Parameters
- start_index : int
The start index in the h5 files at which the batch will be read. The end index will be the start index + the batch size.
- Returns
- x_values : dict
One batch of data for each input file.
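The index arithmetic and zero centering described for get_x_values can be sketched as follows. This is a minimal illustration and not the OrcaNet implementation: the function read_batch, its means argument and the input name "xyz" are hypothetical, and plain Python lists stand in for the h5 datasets so the sketch runs without any dependencies.

```python
def read_batch(inputs, start_index, batchsize, means=None):
    """Return one batch per input, optionally zero-centered.

    inputs: dict mapping an input name to a sequence of samples
            (a stand-in for the x dataset of one h5 file).
    means:  optional dict mapping an input name to a per-feature mean
            (hypothetical; used here for the zero centering).
    """
    # The end index is the start index plus the batch size.
    end_index = start_index + batchsize
    batch = {}
    for name, samples in inputs.items():
        chunk = samples[start_index:end_index]
        if means is not None:
            mean = means[name]
            # Subtract the mean feature-wise from every sample.
            chunk = [[v - m for v, m in zip(sample, mean)] for sample in chunk]
        batch[name] = chunk
    return batch


inputs = {"xyz": [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]}
means = {"xyz": [4.0, 5.0]}
x_values = read_batch(inputs, start_index=1, batchsize=2, means=means)
print(x_values)
```

Slicing past the end of the data simply returns a shorter batch in this sketch; whether the real generator pads or drops such a batch is not covered here.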
- get_y_values(start_index)
Get the y_values for the neural network. The y_values are assumed to be the same for all files, so the ones from the first file are used. TODO: add a check.
- Parameters
- start_index : int
The start index in the h5 files at which the batch will be read. The end index will be the start index + the batch size.
- Returns
- y_values : ndarray
The y_values, right from the files.
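A consistency check of the kind the TODO mentions might look like the sketch below. It is an assumption about what such a check could do, not code from OrcaNet: the helper y_values_agree and the file names are hypothetical, and lists of tuples stand in for the structured arrays read from the files. For real ndarrays, numpy.array_equal would be the natural comparison.

```python
def y_values_agree(y_by_file):
    """Return True if all files hold the same y_values as the first file."""
    batches = list(y_by_file.values())
    first = batches[0]
    # Compare every further file against the first one.
    return all(batch == first for batch in batches[1:])


same = y_values_agree({
    "file_a": [(1, 0.5), (2, 0.7)],
    "file_b": [(1, 0.5), (2, 0.7)],
})
different = y_values_agree({
    "file_a": [(1, 0.5)],
    "file_b": [(9, 0.5)],
})
print(same, different)
```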
- orcanet.h5_generator.get_h5_generator(orga, files_dict, f_size=None, zero_center=False, keras_mode=True, shuffle=False, use_def_label=True, phase='training')
Initialize the hdf5_batch_generator_base with the parameters in orga.cfg.
- Parameters
- orga : orcanet.core.Organizer
Contains all the configurable options in the OrcaNet scripts.
- files_dict : dict
Paths of the files to train on. Keys: the name of every input (from the toml list file, can be multiple). Values: the filepath of a single .h5 file to read samples from.
- f_size : int or None
Specifies the number of samples to be read from the .h5 file. If None, the whole .h5 file will be used.
- zero_center : bool
Whether to use zero centering. Requires orga.zero_center_folder to be set.
- keras_mode : bool
Specifies if mc-infos (y_values) should be yielded as well. The mc-infos are used for evaluation after training and testing is finished.
- shuffle : bool
Randomize the order in which batches are read from the file. Significantly reduces readout speed.
- use_def_label : bool
If True and no label modifier is given by the user, use the default label modifier instead of none.
- Yields
- xs : dict
Data for the model to train on.
Keys : str, the name(s) of the input layer(s) of the model. Values : ndarray, a batch of samples for the corresponding input.
- ys : dict or None
Labels for the model to train on.
Keys : str, the name(s) of the output layer(s) of the model. Values : ndarray, a batch of labels for the corresponding output.
Will be None if there are no labels in the file.
- y_values : ndarray, optional
Y values from the file. Only yielded if yield_mc_info is True.
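To make the yielded structure concrete, the sketch below uses a stand-in generator that mimics only the documented shape of the output: a dict of inputs, a dict of labels and, optionally, the y_values. Everything in it is hypothetical (the function toy_generator, the with_y_values flag and the names "input_a", "output_a" and "energy"), and it makes a single pass over in-memory lists rather than reading .h5 files.

```python
def toy_generator(x_by_input, y_by_output, y_values, batchsize,
                  with_y_values=False):
    """Yield (xs, ys) or (xs, ys, y_values) batches from in-memory lists."""
    n_samples = len(y_values)
    for start in range(0, n_samples, batchsize):
        end = start + batchsize
        # One entry per input layer / output layer, keyed by its name.
        xs = {name: data[start:end] for name, data in x_by_input.items()}
        ys = {name: data[start:end] for name, data in y_by_output.items()}
        if with_y_values:
            yield xs, ys, y_values[start:end]
        else:
            yield xs, ys


x_by_input = {"input_a": [[0.1], [0.2], [0.3]]}
y_by_output = {"output_a": [0, 1, 0]}
y_values = [{"energy": 1.0}, {"energy": 2.0}, {"energy": 3.0}]

batches = list(toy_generator(x_by_input, y_by_output, y_values, batchsize=2))
for xs, ys in batches:
    print(xs, ys)
```

A consumer that needs the y_values for evaluation would unpack three items per batch instead of two.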