:py:mod:`orcanet.h5_generator`
==============================

.. py:module:: orcanet.h5_generator


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   orcanet.h5_generator.Hdf5BatchGenerator


Functions
~~~~~~~~~

.. autoapisummary::

   orcanet.h5_generator.get_h5_generator
   orcanet.h5_generator.make_dataset


.. py:class:: Hdf5BatchGenerator(files_dict, batchsize=64, key_x_values='x', key_y_values='y', sample_modifier=None, label_modifier=None, fixed_batchsize=False, y_field_names=None, phase='training', xs_mean=None, f_size=None, keras_mode=True, shuffle=False, class_weights=None)


   Base object for fitting to a sequence of data, such as a dataset.

   Every `Sequence` must implement the `__getitem__` and the `__len__` methods.
   If you want to modify your dataset between epochs you may implement
   `on_epoch_end`.
   The method `__getitem__` should return a complete batch.

   Notes:

   A `Sequence` is a safer way to do multiprocessing. This structure guarantees
   that the network will only train once on each sample per epoch, which is
   not the case with generators.

   Examples:

   ```python
   from skimage.io import imread
   from skimage.transform import resize
   from tensorflow.keras.utils import Sequence
   import numpy as np
   import math

   # Here, `x_set` is a list of paths to the images
   # and `y_set` holds the associated classes.

   class CIFAR10Sequence(Sequence):

       def __init__(self, x_set, y_set, batch_size):
           self.x, self.y = x_set, y_set
           self.batch_size = batch_size

       def __len__(self):
           # Number of batches per epoch; the last one may be smaller.
           return math.ceil(len(self.x) / self.batch_size)

       def __getitem__(self, idx):
           # Slice out the file names and labels of batch number `idx`.
           start = idx * self.batch_size
           stop = start + self.batch_size
           batch_x = self.x[start:stop]
           batch_y = self.y[start:stop]

           # Load and resize every image of the batch.
           return np.array([
               resize(imread(file_name), (200, 200))
               for file_name in batch_x]), np.array(batch_y)
   ```


   ..
       !! processed by numpydoc !!
   .. py:method:: pad_to_size(info_blob)

      
      Pad the batch to have a fixed batchsize.
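
      A minimal sketch of what padding to a fixed batchsize can look like,
      assuming zero padding along the first axis. The helper ``pad_batch``
      and its arguments are hypothetical and not part of OrcaNet; the actual
      method operates on ``info_blob`` and may pad differently.

      ```python
      import numpy as np


      def pad_batch(batch, batchsize):
          """Zero-pad `batch` along axis 0 up to `batchsize` samples."""
          n_missing = batchsize - len(batch)
          if n_missing <= 0:
              # Batch is already full, nothing to pad.
              return batch
          # Pad only at the end of the first (sample) axis.
          pad_width = [(0, n_missing)] + [(0, 0)] * (batch.ndim - 1)
          return np.pad(batch, pad_width, mode="constant")


      last_batch = np.ones((3, 2))        # incomplete batch of 3 samples
      padded = pad_batch(last_batch, 5)   # padded up to 5 samples
      ```

      Padding the last, incomplete batch keeps the batch shape constant,
      which is what ``fixed_batchsize=True`` asks for.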


      ..
          !! processed by numpydoc !!

   .. py:method:: open()

      
      Open all files and prepare for read out.


      ..
          !! processed by numpydoc !!

   .. py:method:: close()

      
      Close all files again.


      ..
          !! processed by numpydoc !!

   .. py:method:: get_x_values(start_index)

      
      Read one batch of samples from the files and zero center.


      :Parameters:

          **start_index** : int
              The start index in the h5 files at which the batch will be read.
              The end index will be the start index + the batch size.

      :Returns:

          **x_values** : dict
              One batch of data for each input file.
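
      The slicing described above can be sketched as follows, with in-memory
      arrays standing in for the opened h5 datasets. The function
      ``read_x_batch`` and its arguments are hypothetical; subtracting a
      per-input mean is one assumed reading of "zero center".

      ```python
      import numpy as np


      def read_x_batch(datasets, start_index, batchsize, xs_mean=None):
          """Slice one batch per input and optionally subtract its mean."""
          x_values = {}
          for name, data in datasets.items():
              # The end index is the start index plus the batch size.
              batch = np.asarray(data[start_index:start_index + batchsize])
              if xs_mean is not None:
                  batch = batch - xs_mean[name]
              x_values[name] = batch
          return x_values


      datasets = {"input_a": np.arange(20.0).reshape(10, 2)}
      xs_mean = {"input_a": np.array([1.0, 2.0])}
      x_values = read_x_batch(datasets, start_index=4, batchsize=3, xs_mean=xs_mean)
      ```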


      ..
          !! processed by numpydoc !!

   .. py:method:: get_y_values(start_index)

      
      Get the y_values for the network. The y_values are expected to be
      identical across all files, so the ones from the first file are used.
      This assumption is currently not checked (TODO).


      :Parameters:

          **start_index** : int
              The start index in the h5 files at which the batch will be read.
              The end index will be the start index + the batch size.

      :Returns:

          **y_values** : ndarray
              The y_values, right from the files.
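
      A consistency check across files could look roughly like this. It is a
      sketch only: ``check_same_y_values`` is a hypothetical helper that is
      not part of OrcaNet, and the arrays stand in for the y_values read from
      each file.

      ```python
      import numpy as np


      def check_same_y_values(y_values_per_file):
          """Raise if the y_values differ between the input files."""
          names = list(y_values_per_file)
          reference = y_values_per_file[names[0]]
          for name in names[1:]:
              if not np.array_equal(reference, y_values_per_file[name]):
                  raise ValueError(
                      f"y_values of {name!r} differ from those of {names[0]!r}")
          # All files agree, so the first set of y_values can be used.
          return reference


      y_a = np.array([1.0, 2.0, 3.0])
      y_values = check_same_y_values({"input_a": y_a, "input_b": y_a.copy()})
      ```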


      ..
          !! processed by numpydoc !!

   .. py:method:: print_timestats(print_func=None)

      
      Print stats about how long it took to read batches.


      ..
          !! processed by numpydoc !!

   .. py:method:: get_file_meta()

      
      Meta information about the files. Only read out once.


      ..
          !! processed by numpydoc !!


.. py:function:: get_h5_generator(orga, files_dict, f_size=None, zero_center=False, keras_mode=True, shuffle=False, use_def_label=True, phase='training')

   
   Initialize the hdf5_batch_generator_base with the parameters in orga.cfg.


   :Parameters:

       **orga** : orcanet.core.Organizer
           Contains all the configurable options in the OrcaNet scripts.

       **files_dict** : dict
           Paths of the files to train on.
           Keys: The name of every input (from the toml list file, can be multiple).
           Values: The filepath of a single h5py file to read samples from.

       **f_size** : int or None
           Specifies the number of samples to be read from the .h5 file.
           If None, the whole .h5 file will be used.

       **zero_center** : bool
           Whether to use zero centering.
           Requires orga.zero_center_folder to be set.

       **keras_mode** : bool
           Specifies if mc-infos (y_values) should be yielded as well. The
           mc-infos are used for evaluation after training and testing is finished.

       **shuffle** : bool
           Randomize the order in which batches are read from the file.
           Significantly reduces read out speed.

       **use_def_label** : bool
           If True and no label modifier is given by user, use the default
           label modifier instead of none.


   :Yields:

       **xs** : dict
           Data for the model to train on.
               Keys : str  The name(s) of the input layer(s) of the model.
               Values : ndarray    A batch of samples for the corresponding input.

       **ys** : dict or None
           Labels for the model to train on.
               Keys : str  The name(s) of the output layer(s) of the model.
               Values : ndarray    A batch of labels for the corresponding output.
           Will be None if there are no labels in the file.

       **y_values** : ndarray, optional
           Y values from the file. Only yielded if yield_mc_info is True.
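
   As a rough illustration of how ``f_size``, the batchsize and ``shuffle``
   interact, the sketch below computes the start indices of the batches to
   read. ``batch_start_indices`` is a hypothetical helper, not part of
   OrcaNet, and the real generator may order its reads differently.

   ```python
   import random


   def batch_start_indices(f_size, batchsize, shuffle=False, seed=None):
       """Return the start index of every batch in a file of f_size samples."""
       starts = list(range(0, f_size, batchsize))
       if shuffle:
           # Randomize the order of whole batches, not of single samples.
           random.Random(seed).shuffle(starts)
       return starts


   starts = batch_start_indices(f_size=10, batchsize=4)
   # The last batch starts at index 8 and holds only 2 samples.
   shuffled = batch_start_indices(f_size=10, batchsize=4, shuffle=True, seed=0)
   ```

   Shuffling whole batches keeps each read contiguous within the file, but
   the reads themselves jump around, which fits the note that shuffling
   slows down read out.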


   ..
       !! processed by numpydoc !!

.. py:function:: make_dataset(gen)