:py:mod:`orcanet.core`
======================

.. py:module:: orcanet.core

.. autoapi-nested-parse::

   Core scripts for the OrcaNet package.

   ..
       !! processed by numpydoc !!


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   orcanet.core.Organizer
   orcanet.core.Configuration


.. py:class:: Organizer(output_folder, list_file=None, config_file=None, tf_log_level=None, discover_tomls=True)

   
   Core class for working with networks in OrcaNet.


   :Attributes:

       **cfg** : orcanet.core.Configuration
           Contains all configurable options.

       **io** : orcanet.in_out.IOHandler
           Utility functions for accessing the info in cfg.

       **history** : orcanet.in_out.HistoryHandler
           For reading and plotting data from the log files created
           during training.


   ..
       !! processed by numpydoc !!

   .. py:method:: train_and_validate(model=None, epochs=None, to_epoch=None)

      
      Train a model and validate according to schedule.

      The various settings of this process can be controlled with the
      attributes of orga.cfg.
      The model will be trained on the given data, saved and validated.
      Logfiles of the training are saved in the output folder.
      Plots showing the training and validation history, as well as
      the weights and activations of the network are generated in
      the plots subfolder after every validation.
      The training can be resumed by executing this function again.

      :Parameters:

          **model** : ks.models.Model or str, optional
              Compiled keras model to use for training. Required for the first
              epoch (the start of training).
              Can also be the path to a saved keras model, which will be loaded.
              If model is None, the most recent saved model will be
              loaded automatically to continue the training.

          **epochs** : int, optional
              How many epochs should be trained by running this function.
              None for infinite. This includes the current epoch in case it
              is not finished yet, i.e. 1 means complete the epoch if there
              are files left, otherwise do the next epoch.

          **to_epoch** : int, optional
              Train up to and including this epoch. Cannot be used together with
              epochs.

      :Returns:

          **model** : ks.models.Model
              The trained keras model.


      ..
          !! processed by numpydoc !!

   .. py:method:: train(model=None)

      
      Trains a model on the next file.

      The progress of the training is also logged and plotted.

      :Parameters:

          **model** : ks.models.Model or str, optional
              Compiled keras model to use for training. Required for the first
              epoch (the start of training).
              Can also be the path to a saved keras model, which will be loaded.
              If model is None, the most recent saved model will be
              loaded automatically to continue the training.

      :Returns:

          **history** : dict
              The history of the training on this file. A record of training
              loss values and metrics values.


      ..
          !! processed by numpydoc !!

   .. py:method:: validate()

      
      Validate the most recent saved model on all validation files.

      Will also log the progress, as well as update the summary plot and
      plot weights and activations of the model.


      :Returns:

          **history** : dict
              The history of the validation on all files. A record of validation
              loss values and metrics values.


      ..
          !! processed by numpydoc !!

   .. py:method:: predict(epoch=None, fileno=None, samples=None)

      
      Make a prediction if it does not exist yet, and return its filepath.

      Load the model with the lowest validation loss, let it predict on
      all samples of the validation set
      in the toml list, and save this prediction together with all the
      y_values as h5 file(s) in the predictions subfolder.

      :Parameters:

          **epoch** : int, optional
              Epoch of a model to load. Default: lowest val loss.

          **fileno** : int, optional
              File number of a model to load. Default: lowest val loss.

          **samples** : int, optional
              Don't use the full validation files, but just the given number
              of samples.

      :Returns:

          **pred_filename** : list
              List of the paths to all the prediction files.


      ..
          !! processed by numpydoc !!

   .. py:method:: inference(epoch=None, fileno=None, as_generator=False)

      
      Make an inference and return the filepaths.

      Load the model with the lowest validation loss, let
      it predict on all samples of all inference files
      in the toml list, and save these predictions as h5 files in the
      predictions subfolder. y values will only be added if they are in
      the input file, so this can be used on un-labeled data as well.

      :Parameters:

          **epoch** : int, optional
              Epoch of a model to load. Default: lowest val loss.

          **fileno** : int, optional
              File number of a model to load. Default: lowest val loss.

          **as_generator** : bool
              If true, return a generator, which yields the output filename
              after the inference of each file.
              If false (default), do all files back to back.

      :Returns:

          **filenames** : list
              List of the paths to all created output files.


      ..
          !! processed by numpydoc !!

   .. py:method:: inference_on_file(input_file, output_file=None, saved_model=None, epoch=None, fileno=None)

      
      Save the model prediction for each sample of the given input file.

      Useful for sharing a saved model, since the usual training folder
      structure is not necessarily required.

      :Parameters:

          **input_file** : str or dict
              Path to a DL file on which the inference should be done.
              Can also be a dict mapping input names to files.

          **output_file** : str, optional
              Save output to an h5 file with this name. Default: auto-generate
              name and save in the same directory as the input file.

          **saved_model** : str, optional
              Optional path to a saved model, which will be used instead
              of loading the one with the given epoch/fileno.

          **epoch** : int, optional
              Epoch of a model to load from the directory. Only relevant if
              saved_model is None. Default: lowest val loss.

          **fileno** : int, optional
              File number of a model to load from the directory. Only relevant
              if saved_model is None. Default: lowest val loss.

      :Returns:

          str
              Name of the output file.


      ..
          !! processed by numpydoc !!

   .. py:method:: cleanup_models()

      
      Delete all models except for the most recent one (to continue
      training), and the ones with the highest and lowest loss/metrics.


      ..
          !! processed by numpydoc !!

   .. py:method:: get_xs_mean(logging=False)

      
      Set and return the zero center image for each list input.

      Requires the cfg.zero_center_folder to be set. If no existing
      image for the given input files is found in the folder, it will
      be calculated and saved by averaging over all samples in the
      train dataset.

      :Parameters:

          **logging** : bool
              If true, the execution of this function will be logged into the
              full summary in the output folder if called for the first time.

      :Returns:

          dict
              Dict of numpy arrays that contains the mean_image of the x dataset
              (1 array per list input).
              Example format:
              { "input_A" : ndarray, "input_B" : ndarray }


      ..
          !! processed by numpydoc !!

   .. py:method:: load_saved_model(epoch, fileno, logging=False)

      
      Load a saved model.


      :Parameters:

          **epoch** : int
              Epoch of the saved model. If both this and fileno are -1,
              load the most recent model.

          **fileno** : int
              Fileno of the saved model.

          **logging** : bool
              If True, will log this function call into the log.txt file.

      :Returns:

          **model** : keras model
              ..


      ..
          !! processed by numpydoc !!

   .. py:method:: val_is_due(epoch=None)

      
      True if validation is due on given epoch according to schedule.
      Does not check if it has been done already.


      ..
          !! processed by numpydoc !!

   .. py:method:: get_strategy()

      
      Get the strategy for distributed training.


      ..
          !! processed by numpydoc !!
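
The averaging described for ``get_xs_mean`` can be illustrated with a small, self-contained sketch. This is not OrcaNet's implementation: the helper names ``mean_image`` and ``zero_center`` and the nested-list sample format are assumptions made for illustration (real inputs would be arrays read from h5 files). The sketch accumulates a per-pixel sum batch by batch and divides by the total number of samples, so the full dataset never has to be held in memory at once.

```python
def mean_image(batches):
    """Return the per-pixel mean over all samples in all batches.

    Hypothetical helper: each batch is a list of samples, and each
    sample is a flat list of pixel values of identical length.
    """
    total = None      # running per-pixel sum
    n_samples = 0     # number of samples seen so far
    for batch in batches:
        for sample in batch:
            if total is None:
                total = [0.0] * len(sample)
            for i, value in enumerate(sample):
                total[i] += value
            n_samples += 1
    if n_samples == 0:
        raise ValueError("cannot average over an empty dataset")
    return [s / n_samples for s in total]


def zero_center(sample, mean):
    """Subtract the mean image from one sample."""
    return [value - m for value, m in zip(sample, mean)]


# Two batches with two samples of three pixels each.
batches = [
    [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]],
    [[2.0, 2.0, 2.0], [2.0, 6.0, 2.0]],
]
mean = mean_image(batches)
centered = zero_center(batches[0][0], mean)
```

Here ``mean`` plays the role of the zero center image for one input, and ``centered`` is the first sample after subtracting it.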

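The validation schedule that ``train_and_validate`` follows is set by the ``validate_interval`` option of ``Configuration``: validate after every ``validate_interval`` training files, and always at the end of an epoch. A minimal sketch of that rule, assuming file numbers start at 1, is given below. The function ``validation_due`` is a hypothetical helper written for this example and is not the code behind ``val_is_due``.

```python
def validation_due(fileno, n_files, validate_interval=None):
    """Return True if a validation would follow training file `fileno`.

    Hypothetical helper. `fileno` counts from 1; `n_files` is the
    number of training files per epoch.
    """
    if fileno == n_files:
        # The end of an epoch always triggers a validation.
        return True
    if validate_interval is None:
        # No interval set: validate only at the end of an epoch.
        return False
    return fileno % validate_interval == 0


# Which of 10 training files are followed by a validation?
due_with_interval = [f for f in range(1, 11) if validation_due(f, 10, 3)]
due_without_interval = [f for f in range(1, 11) if validation_due(f, 10)]
```

With an interval of 3 and 10 files per epoch, validations follow files 3, 6, 9 and 10; without an interval, only file 10.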

.. py:class:: Configuration(output_folder, list_file=None, config_file=None, **kwargs)


   Contains all the configurable options in the OrcaNet scripts.

   All of these public attributes (the ones without a
   leading underscore) can be changed either directly or with a
   .toml config file via the method update_config().

   :Parameters:

       **output_folder** : str
           Name of the folder of this model in which everything will be saved,
           e.g., the summary.txt log file is located in here.

       **list_file** : str or None
           Path to a toml list file with paths to all the h5 files that should
           be used for training and validation.

       **config_file** : str or None
           Path to a toml config file with attributes that are used instead of
           the default ones.

       **kwargs**
           Overwrites the values given in the config file.


   :Attributes:

       **batchsize** : int
           Batchsize that will be used for the training, validation and inference of
           the network.
           During training and validation, the last batch in each file will be
           skipped if it has fewer samples than the batchsize.

       **callback_train** : keras callback or list or None
           Callback or list of callbacks to use during training.

       **class_weight** : dict or None
           Optional dictionary mapping class indices (integers) to a weight
           (float) value, used for weighting the loss function (during
           training only). This can be useful to tell the model to
           "pay more attention" to samples from an under-represented class.

       **cleanup_models** : bool
           If true, will only keep the best (in terms of val loss) and the most
           recent from all saved models in order to save disk space.

       **custom_objects** : dict, optional
           Optional dictionary mapping names (strings) to custom classes or
           functions to be considered by keras during deserialization of models.

       **dataset_modifier** : function or None
           For orga.predict: Function that determines which datasets get created
           in the resulting h5 file. Default: save as array, i.e. every output layer
           will get one dataset each for both the label and the prediction,
           and one dataset containing the y_values from the validation files.

       **fixed_batchsize** : bool
           The last batch in the file might be smaller than the batchsize.
           Usually, this is no problem, but set to True to skip this batch
           [default: False].

       **key_x_values** : str
           The name of the datagroup in the h5 input files which contains
           the samples for the network.

       **key_y_values** : str
           The name of the datagroup in the h5 input files which contains
           the info for the labels.

       **label_modifier** : function or None
           Operation to be performed on batches of y_values read from the input
           files before they are fed into the model as labels. If None is given,
           all y_values with the same name as the output layers will be passed
           to the model as a dict, with the keys being the dtype names.

       **learning_rate** : float, tuple, function, str (optional)
           The learning rate for the training.
           If None is given, don't change the learning rate at all.
           If it is a float: The learning rate is held constant at this value.
           If it is a tuple of two floats: The first float gives the learning rate
           in epoch 1 file 1, and the second float gives the decrease of the
           learning rate per file (e.g. 0.1 for 10% decrease per file).
           If it is a function: Takes as an input the epoch and the
           file number (in this order), and returns the learning rate.
           Both epoch and fileno start at 1, i.e. 1, 1 is the start of the
           training.
           If it is a str: Path to a csv file inside the main folder, containing
           3 columns with the epoch, fileno, and the value the lr will be set
           to when reaching this epoch/fileno.

       **max_queue_size** : int
           max_queue_size option of the keras training and evaluation generator
           methods. How many batches get preloaded from the generator.

       **multi_gpu** : bool
           Use all available GPUs (distributed training if there is more than one).

       **n_events** : None or int
           For testing purposes. If only part of the .h5 file should be used
           for training, set the number of samples here.

       **sample_modifier** : function or None
           Operation to be performed on batches of x_values read from the input
           files before they are fed into the model as samples.

       **shuffle_train** : bool
           If true, the order in which batches are read out from the files during
           training is randomized each time they are read out.

       **train_logger_display** : int
           How many batches should be averaged for one line in the training log files.

       **train_logger_flush** : int
           After how many lines the training log file should be flushed (updated on
           the disk). -1 for flush at the end of the file only.

       **use_scratch_ssd** : bool
           Declares if the input files should be copied to a local temp dir,
           i.e. the path defined in the 'TMPDIR' environment variable.

       **validate_interval** : int or None
           Validate the model after this many training files have been trained on
           in an epoch. There will always be a validation at the end of an epoch.
           None to validate only at the end of an epoch.
           Example: validate_interval=3 --> Validate after file 3, 6, 9, ...

       **verbose_train** : int
           verbose option of keras.model.fit_generator.
           0 = silent, 1 = progress bar, 2 = one line per epoch.

       **verbose_val** : int
           verbose option of evaluate_generator.
           0 = silent, 1 = progress bar.

       **y_field_names** : tuple or list or str, optional
           During train and val, read out only these fields from the y dataset.
           This speeds up reading, especially if there are many fields.

       **zero_center_folder** : None or str
           Path to a folder in which zero centering images are stored.
           If this path is set, zero centering images for the given dataset will
           either be calculated and saved automatically at the start of the
           training, or loaded if they have been saved before.


   ..
       !! processed by numpydoc !!

   .. py:method:: import_list_file(list_file)

      
      Import the filepaths of the h5 files from a toml list file.


      :Parameters:

          **list_file** : str
              Path to the toml list file.


      ..
          !! processed by numpydoc !!

   .. py:method:: update_config(config_file)

      
      Update the default cfg parameters with values from a toml config file.


      :Parameters:

          **config_file** : str
              Path to a toml config file.


      ..
          !! processed by numpydoc !!

   .. py:method:: get_list_file()

      
      Returns the path to the list file that was used to set the training
      and validation files. None if no list file has been used.


      ..
          !! processed by numpydoc !!

   .. py:method:: get_files(which)

      
      Get the training, validation or inference file paths for each list input set.


      :Parameters:

          **which** : str
              Either "train", "val" or "inference".

      :Returns:

          dict
              A dict containing the paths to the training, validation or
              inference files, as selected by the which argument. Example of
              the format for two input sets with two files each::

                  {
                   "input_A" : ('path/to/set_A_file_1.h5', 'path/to/set_A_file_2.h5'),
                   "input_B" : ('path/to/set_B_file_1.h5', 'path/to/set_B_file_2.h5'),
                  }


      ..
          !! processed by numpydoc !!

   .. py:method:: get_custom_objects()

      
      Get user custom objects + orcanet internal ones.


      ..
          !! processed by numpydoc !!
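
The ``learning_rate`` attribute of ``Configuration`` accepts a function that takes the epoch and the file number (in this order, both starting at 1) and returns the learning rate. The sketch below builds such a function. The factory ``make_lr_schedule`` and its parameters are hypothetical names introduced for this example. It also assumes that the decrease compounds, i.e. the rate is multiplied by ``1 - decrease_per_file`` after every file. This is one plausible reading of "10% decrease per file" and is not confirmed to be what OrcaNet does for the tuple form.

```python
def make_lr_schedule(initial_lr, decrease_per_file, n_files_per_epoch):
    """Build a learning rate function of (epoch, fileno), both starting at 1.

    Hypothetical helper. Assumes the decrease compounds: after each
    file, the rate is multiplied by (1 - decrease_per_file).
    """
    def learning_rate(epoch, fileno):
        # Number of files trained on before the current one.
        files_done = (epoch - 1) * n_files_per_epoch + (fileno - 1)
        return initial_lr * (1 - decrease_per_file) ** files_done

    return learning_rate


lr = make_lr_schedule(initial_lr=0.002, decrease_per_file=0.1,
                      n_files_per_epoch=3)
lr_start = lr(1, 1)        # epoch 1, file 1: start of training
lr_second_file = lr(1, 2)  # one file has been trained on
lr_epoch_two = lr(2, 1)    # three files have been trained on
```

Going by the attribute description above, a function like ``lr`` could then be assigned to the ``learning_rate`` attribute of the configuration.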