To find an optimized ANN architecture for building a metamodel, we first split the training dataset into training and validation sets (here, holding out 15% of the samples for validation) using the train_test_split function from Scikit-learn's model_selection module; the test dataset is not used in the ANN optimization process.
from sklearn.model_selection import train_test_split
# Hold out 15% of the training data for validation; the input/target arrays
# X_train, Y_train, X_validation, Y_validation used below correspond to these splits.
Data_train, Data_validation = train_test_split(Data_train, test_size=0.15)
We train our ANNs using the Keras 2.3.1 library with TensorFlow 2.2.0 as the backend. Our sample neural network for the problem presented here (480 training data points, two inputs, and one output) consists of a hidden layer with 4 nodes next to the input layer, followed by a combination of 1 to 5 intermediate hidden layers with 4, 8, 16, or 32 nodes in every hidden layer. We apply the 'tanh' activation function, which works well as a function approximator, in all layers of our ANNs. We test the Adam optimizer with learning rates from 5×10^-4 to 10^-3 using "log" sampling (which samples values uniformly in log space, so each order of magnitude within the range is equally likely) in the Keras HyperParameters container.
import tensorflow as tf

def build_model(hp):
    model = tf.keras.Sequential()
    # First hidden layer: 4 nodes, fed by the two inputs.
    model.add(tf.keras.layers.Dense(4, input_dim=2, kernel_initializer='normal', activation='tanh'))
    # 1 to 5 intermediate hidden layers, each with 4, 8, 16, or 32 nodes.
    for i in range(hp.Int('hidden_layers_count', 1, 5, default=2)):
        model.add(tf.keras.layers.Dense(hp.Choice('hidden_size', values=[4, 8, 16, 32], default=16),
                                        kernel_initializer='normal', activation='tanh'))
    # Single linear output node.
    model.add(tf.keras.layers.Dense(1, kernel_initializer='normal'))
    model.compile(
        optimizer=tf.keras.optimizers.Adam(
            hp.Float('learning_rate', min_value=5e-4, max_value=1e-3, sampling='log', default=5e-4)),
        loss='mean_squared_error', metrics=['mean_absolute_error'])
    return model
We use the Hyperband algorithm, a bandit-based approach, to optimize the hyperparameters of our model.
import kerastuner as kt

hp = kt.HyperParameters()
model = build_model(hp)  # check that the model builds with the default hyperparameters
tuner = kt.Hyperband(
    build_model,
    objective='val_loss',
    factor=2,
    max_epochs=1000,
    hyperband_iterations=10,
    directory=dir_opt,      # user-defined directory for the tuning results
    project_name=dir_prj)   # user-defined project name
After testing different combinations through the Hyperband search below, we arrive at the optimized structure summarized in Table 1.
tuner.search(X_train, Y_train,
             validation_data=(X_validation, Y_validation),
             callbacks=[tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=100)],
             verbose=0)
best_model = tuner.get_best_models(1)[0]
best_hyperparameters = tuner.get_best_hyperparameters(1)[0]
best_model.summary()
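To carry the result forward, one can print the selected hyperparameter values and rebuild a fresh, untrained copy of the network; this is a minimal sketch, not part of the original listing, and the variable name model matches the training code further below.
print(best_hyperparameters.values)         # dict of the selected hyperparameter values
model = build_model(best_hyperparameters)  # fresh model with the optimized structure, trained below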
Table 1. The structure of the optimized ANN for metamodeling.
ANN class: multilayer perceptron (MLP), feedforward
Layers: input: 4 nodes, 2 inputs; 2 hidden layers: 16 nodes each; output layer: 1 node, 1 output
Total parameters: 381
Activation function: 'tanh', tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
Optimizer: Adam
Learning rate: 0.0005
Batch size: 2^(n+1), n = 1, 2, … (varied in stages; see below)
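As a quick sanity check (simple arithmetic, not stated in the original), the parameter count in Table 1 follows from the layer sizes: (2×4 + 4) + (4×16 + 16) + (16×16 + 16) + (16×1 + 1) = 12 + 80 + 272 + 17 = 381 weights and biases.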
Next, we run a limited number of brute-force tunings on the optimized model using more epochs. At this stage, we apply a multi-stage training procedure with batch sizes of 2^(n+1) applied sequentially (n = 1, 2, …, where the maximum value of n is such that 2^(n+1) does not exceed the length of the dataset). We pick the best model among the models produced after each of these stages.
import numpy as np
import tensorflow_docs as tfdocs
import tensorflow_docs.modeling  # provides the EpochDots progress callback

data_count = len(Y_train)
epochs = 10000
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=500)
loss_val = np.inf  # best validation loss seen so far
n = 1              # the first stage uses a batch size of 2**(n+1) = 4
# 'model' is the freshly built network with the optimized structure from Table 1.
for i in range(1 - n + int(np.log2(data_count))):
    batch_size = 2**(i + 1 + n)      # 4, 8, 16, ...: the batch size doubles at every stage
    if batch_size > data_count:
        batch_size = data_count      # cap at full-batch training
    history = model.fit(x=X_train, y=Y_train, batch_size=batch_size, epochs=epochs, verbose=0,
                        validation_data=(X_validation, Y_validation), shuffle=True,
                        callbacks=[early_stop, tfdocs.modeling.EpochDots()])
    result_v = model.evaluate(X_validation, Y_validation, verbose=0)
    if result_v[0] < loss_val:       # keep only the best model across the stages
        loss_val = result_v[0]
        model.save(model_path)                # model_path: user-defined file path
        history_recorder(history, loss_path)  # user-defined helper that logs the training history
Sequentially changing the batch size enables us to find a well-trained ANN without having to pick a single proper value for the batch size, or for the learning rate, which is adapted by the Adam optimizer during training. Nonetheless, we test a limited number of learning rates between 0.0001 and 0.001 and perform the multi-stage training separately for each, as sketched below. No dropout layer is required for our sample problem, and a learning rate of 0.0005 provides the lowest validation loss. One may validate this approach on another benchmark dataset (for example, the one presented here) to compare its efficiency against other training procedures. The mean absolute error for our sample problem is plotted in Fig. 1.
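A minimal sketch of that learning-rate sweep follows; the grid values are illustrative assumptions (the text only fixes the range 0.0001 to 0.001), and each candidate would be pushed through the multi-stage batch-size loop shown above.
for lr in [1e-4, 2.5e-4, 5e-4, 7.5e-4, 1e-3]:     # illustrative grid within the quoted range
    candidate = build_model(best_hyperparameters)  # optimized architecture from Table 1
    candidate.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                      loss='mean_squared_error', metrics=['mean_absolute_error'])
    # ...repeat the multi-stage batch-size loop above with 'candidate', then keep the
    # learning rate that yields the lowest validation loss (0.0005 for this problem).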

The code for the ANN Structure Design (using Hyperband and Bayesian optimization) is available here (or run in Google Colab).
The code for the variable-batch training is available here (or run in Google Colab).