Example on toy data

This page shows how to use orcanet from python by applying it to some toy data. The full scripts can be found in examples/full_example.

To use orcanet, you need data in the form of multi-dimensional images in an h5 file, as well as a compiled keras model.

The data

Let's generate some dummy data: vectors of length 3, filled with random numbers. The label of each vector is the sum of its entries.

import h5py
import numpy as np


def make_dummy_data(path):
    """
    Save a train and a val h5 file with random numbers as samples,
    and their sum as labels.
    """
    def get_dummy_data(samples):
        xs = np.random.rand(samples, 3)

        dtypes = [('sum', '<f8'), ]
        labels = np.sum(xs, axis=-1)
        ys = labels.ravel().view(dtype=dtypes)

        return xs, ys

    def save_dummy_data(path, samples):
        xs, ys = get_dummy_data(samples)

        with h5py.File(path, 'w') as h5f:
            h5f.create_dataset('x', data=xs, dtype='<f8')
            h5f.create_dataset('y', data=ys, dtype=ys.dtype)

    save_dummy_data(path + "example_train.h5", 40000)
    save_dummy_data(path + "example_val.h5", 5000)
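The structured label array may look unusual if you have not worked with numpy record dtypes before. A minimal standalone sketch of the same `view` trick (not part of the example scripts) shows how the plain float labels become a named field:

```python
import numpy as np

xs = np.random.rand(4, 3)          # 4 samples, 3 random numbers each
labels = np.sum(xs, axis=-1)       # plain float array of shape (4,)

# View the flat float array as a structured array with one named field;
# this is the label format stored in the 'y' dataset above.
dtypes = [('sum', '<f8'), ]
ys = labels.ravel().view(dtype=dtypes)

print(ys.dtype.names)                       # ('sum',)
print(np.allclose(ys['sum'], xs.sum(axis=-1)))  # True
```

The `view` does not copy the data; it only reinterprets the same memory with the named dtype, so the field `ys['sum']` is exactly the computed labels.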

The generated train and val files are given to orcanet with this toml list file:

[random_numbers]

train_files = [
    "output/example_train.h5",
]

validation_files = [
    "output/example_val.h5",
]

Note that we defined the name of this dataset to be “random_numbers”.

The model

This is a small compiled keras model with just one hidden dense layer. Note that the input layer has the same name that we gave the dataset ("random_numbers"), and the output layer has the same name as the label field ("sum").

from keras.layers import Input, Dense
from keras.models import Model


def make_dummy_model():
    """
    Build and compile a small dummy model.
    """
    input_shape = (3,)

    inp = Input(input_shape, name="random_numbers")
    x = Dense(10)(inp)
    outp = Dense(1, name="sum")(x)

    model = Model(inp, outp)
    model.compile("sgd", loss="mae")

    return model
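Since neither Dense layer specifies an activation, both are linear, so the whole model is just two matrix multiplications with biases. A minimal numpy sketch of that forward pass (the names `W1`, `b1`, `forward` etc. are hypothetical, and the weights here are random rather than trained):

```python
import numpy as np

rng = np.random.default_rng(0)

# Shapes matching the model above: 3 inputs -> 10 hidden -> 1 output.
W1, b1 = rng.normal(size=(3, 10)), np.zeros(10)
W2, b2 = rng.normal(size=(10, 1)), np.zeros(1)

def forward(x):
    """Forward pass of Dense(10) -> Dense(1), both with linear activation."""
    hidden = x @ W1 + b1
    return hidden @ W2 + b2

batch = rng.random((5, 3))      # a batch of 5 samples
print(forward(batch).shape)     # (5, 1)
```

Training with the "mae" loss then adjusts the weights and biases so that the single output approximates the sum of the three inputs.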

Training and results

After creating the data and compiling the model, both are handed to the Organizer object. Training, validation, and predicting can then each be done in a single line.

In total, the generation of the model, the data, and conducting the training is done like this:

import os

from orcanet.core import Organizer


def use_orcanet():
    temp_folder = "output/"
    os.mkdir(temp_folder)

    make_dummy_data(temp_folder)
    list_file = "example_list.toml"

    organizer = Organizer(temp_folder + "sum_model", list_file)
    organizer.cfg.train_logger_display = 10

    model = make_dummy_model()
    organizer.train_and_validate(model, epochs=3)

    organizer.predict()