Multi-layer Perceptron in Scikit-learn

Scikit-learn offers two estimators for working with MLPs: MLPClassifier for classification problems and MLPRegressor for regression problems. Both share the same parameters and the same interface. The difference lies in the output layer and the loss: MLPRegressor uses the identity (linear) activation at the output layer and minimizes the squared error, whereas MLPClassifier adapts the output activation to the classification task and minimizes the log-loss (cross-entropy). This separation follows scikit-learn’s general convention of providing a separate estimator for each type of task.

MLPClassifier

MLPClassifier is scikit-learn’s implementation of a Multi-Layer Perceptron for classification tasks. It follows the same interface as the rest of scikit-learn’s estimators (fit, predict, score), which facilitates integrating it into other programs and comparing it with other classifiers we reviewed in block 1. Internally, it trains the network using the backpropagation algorithm and allows configuring its architecture, activation function, and optimization algorithm.

Link to documentation

The way to construct such an object is as follows:

from sklearn.neural_network import MLPClassifier

model = MLPClassifier(
    hidden_layer_sizes=(100,),
    activation='relu',
    solver='adam',
    batch_size='auto',
    learning_rate='constant',
    learning_rate_init=0.001,
    shuffle=True,
    random_state=None
)

Parameters

Let’s see the most relevant parameters:

hidden_layer_sizes — default: (100,): Defines the number of hidden layers and the number of neurons in each layer. Each element of the tuple corresponds to a hidden layer.

hidden_layer_sizes=(100,)        # 1 hidden layer with 100 neurons
hidden_layer_sizes=(100, 50)     # 2 hidden layers: 100 and 50 neurons
hidden_layer_sizes=(128, 64, 32) # 3 hidden layers
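
As a quick check of how the tuple maps to the network, the following sketch fits a small classifier and inspects the shapes of its weight matrices through the coefs_ attribute. The toy data, the names X_demo, y_demo and demo_model, and the low max_iter are illustrative assumptions chosen only to keep the run short.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# The short training run below is not expected to converge; silence the warning
warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Illustrative toy data: 200 samples, 10 features, 3 classes
X_demo, y_demo = make_classification(
    n_samples=200, n_features=10, n_informative=6, n_classes=3, random_state=0
)

demo_model = MLPClassifier(hidden_layer_sizes=(100, 50), max_iter=50, random_state=0)
demo_model.fit(X_demo, y_demo)

# One weight matrix per pair of consecutive layers:
# input(10) -> hidden(100) -> hidden(50) -> output(3)
weight_shapes = [w.shape for w in demo_model.coefs_]
print(weight_shapes)         # [(10, 100), (100, 50), (50, 3)]
print(demo_model.n_layers_)  # 4 (input + 2 hidden + output)
```

The input and output sizes are inferred from the data passed to fit; only the hidden layers are set through hidden_layer_sizes.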

activation — default: 'relu'

Activation function of the hidden layers. Note that the output layer activation is chosen automatically from the target: softmax for multi-class problems and the logistic sigmoid for binary problems.

Value	Function
`'relu'`	\(f(x) = \max(0, x)\) (recommended default)
`'tanh'`	\(f(x) = \tanh(x)\)
`'logistic'`	\(f(x) = \sigma(x)\) (Sigmoid)
`'identity'`	\(f(x) = x\) (no activation)
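
The automatic choice of output activation can be checked on a fitted model through its out_activation_ attribute. The sketch below assumes two small synthetic datasets, one binary and one with three classes; the sizes and the short training run are illustrative only.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

warnings.filterwarnings("ignore", category=ConvergenceWarning)

output_activations = {}
for n_classes in (2, 3):
    # Illustrative toy data with the requested number of classes
    X_demo, y_demo = make_classification(
        n_samples=200, n_features=10, n_informative=6,
        n_classes=n_classes, random_state=0,
    )
    clf = MLPClassifier(hidden_layer_sizes=(20,), activation="relu",
                        max_iter=50, random_state=0)
    clf.fit(X_demo, y_demo)
    output_activations[n_classes] = clf.out_activation_
    # In both cases the predicted class probabilities of a sample sum to 1
    assert np.allclose(clf.predict_proba(X_demo).sum(axis=1), 1.0)

print(output_activations)  # {2: 'logistic', 3: 'softmax'}
```

The activation parameter was 'relu' in both runs, which shows that it only affects the hidden layers.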

solver — default: 'adam'

Optimization algorithm to update the weights.

Value	Description
`'adam'`	Adaptive, efficient, recommended for most cases.
`'sgd'`	Stochastic gradient descent, requires more manual tuning.
`'lbfgs'`	Based on quasi-Newton, suitable for small datasets.

batch_size — default: 'auto'

Number of samples per weight update. With 'auto', scikit-learn uses min(200, n_samples).

batch_size='auto'   # min(200, number of observations)
batch_size=32       # small batch → more noise, can generalize better
batch_size=256      # large batch → more stable and faster training

Note: batch_size only applies when solver='sgd' or solver='adam'.
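
The effect of batch_size on the number of weight updates per epoch is simple arithmetic. The helper updates_per_epoch below is hypothetical (it is not part of scikit-learn) and assumes that the last, possibly smaller, batch also triggers an update.

```python
import math


def updates_per_epoch(n_samples, batch_size="auto"):
    """Hypothetical helper: number of mini-batches in one pass over the data."""
    if batch_size == "auto":
        batch_size = min(200, n_samples)
    # A batch cannot be larger than the dataset itself
    batch_size = min(batch_size, n_samples)
    return math.ceil(n_samples / batch_size)


n_train = 800  # e.g. 80% of 1000 observations
for bs in ("auto", 32, 256):
    print(bs, updates_per_epoch(n_train, bs))
# auto 4
# 32 25
# 256 4
```

With 800 training samples, a batch size of 32 yields roughly six times more updates per epoch than the default, which is one reason small batches make each epoch slower but noisier.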

learning_rate_init — default: 0.001

Initial value of the learning rate. Controls the magnitude of the weight update steps. Only used when solver='sgd' or solver='adam'.

learning_rate_init=0.001   # default value, good starting point
learning_rate_init=0.01    # faster learning, risk of not converging
learning_rate_init=0.0001  # slow but stable learning
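
One way to compare these values in practice is to look at the loss_curve_ attribute, which stores the training loss after each epoch for the 'sgd' and 'adam' solvers. The sketch below assumes a small synthetic dataset and a deliberately short run of 30 epochs; the actual numbers depend on the data.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Illustrative toy data, standardized
X_demo, y_demo = make_classification(
    n_samples=500, n_features=10, n_informative=6, n_classes=3, random_state=0
)
X_demo = StandardScaler().fit_transform(X_demo)

loss_curves = {}
for lr in (0.0001, 0.001, 0.01):
    clf = MLPClassifier(hidden_layer_sizes=(32,), solver="adam",
                        learning_rate_init=lr, max_iter=30, random_state=0)
    clf.fit(X_demo, y_demo)
    loss_curves[lr] = clf.loss_curve_  # one training-loss value per epoch
    print(f"lr={lr}: loss after {len(clf.loss_curve_)} epochs = {clf.loss_curve_[-1]:.3f}")
```

Plotting the three curves (for instance with matplotlib) makes the trade-off between speed and stability easier to see.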

shuffle — default: True

If True, the training samples are shuffled at each epoch. This is recommended so that the order of the data does not influence learning. Only used when solver='sgd' or solver='adam'.

random_state — default: None

Seed of the random number generator used for weight initialization and batch sampling. Setting it to a fixed integer makes the results reproducible across runs.

random_state=None   # non-reproducible results
random_state=42     # reproducible results
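
The effect of the seed can be verified by training the same model several times. In this sketch the helper fit_once and the toy data are illustrative assumptions; two runs with the same seed should end with the same weights, while a different seed leads to different ones.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Illustrative toy data
X_demo, y_demo = make_classification(
    n_samples=200, n_features=10, n_informative=6, n_classes=3, random_state=0
)


def fit_once(seed):
    """Hypothetical helper: train a small MLP with the given random_state."""
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=30, random_state=seed)
    return clf.fit(X_demo, y_demo)


run_a, run_b, run_c = fit_once(42), fit_once(42), fit_once(7)

same_seed_equal = all(np.allclose(wa, wb) for wa, wb in zip(run_a.coefs_, run_b.coefs_))
diff_seed_equal = all(np.allclose(wa, wc) for wa, wc in zip(run_a.coefs_, run_c.coefs_))
print(same_seed_equal)  # True
print(diff_seed_equal)  # False
```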

A Working Example

The following working example uses a toy dataset of 1000 observations with 10 features, drawn from 3 different classes.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report
random_state = 42

Synthetic data generation and preparation

# Dataset generation
X, y = make_classification(
    n_samples=1000,
    n_features=10,
    n_classes=3,
    n_informative=6,
    random_state=random_state
)

# Split into train/test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=random_state
)

# Standardization (zero mean, unit variance); the scaler is fit on the training set only
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

Training

# Training
model = MLPClassifier(
    hidden_layer_sizes=(64, 32), # How do we know these are the right layers?
    activation='relu',
    solver='adam',
    learning_rate_init=0.001,
    max_iter=1000,
    random_state=random_state
)
model.fit(X_train, y_train)

# Prediction and evaluation
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
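
The comment in the code above asks how we know these are the right layers. In general we do not know in advance; a common approach is to compare several candidate architectures with cross-validation. The following is a minimal sketch using GridSearchCV on a Pipeline, so that the scaler is refit inside each fold. The candidate list, cv=3 and max_iter=300 are arbitrary choices made to keep the run short, not recommendations.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Same kind of toy data as in the example above (regenerated to be self-contained)
X_raw, y_raw = make_classification(
    n_samples=1000, n_features=10, n_classes=3, n_informative=6, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(X_raw, y_raw, test_size=0.2, random_state=42)

# Scaling lives inside the pipeline, so each CV fold fits its own scaler
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("mlp", MLPClassifier(max_iter=300, random_state=42)),
])

# Candidate architectures (illustrative)
candidates = [(32,), (64, 32), (128, 64, 32)]
search = GridSearchCV(pipeline, {"mlp__hidden_layer_sizes": candidates}, cv=3)
search.fit(X_tr, y_tr)

best_layers = search.best_params_["mlp__hidden_layer_sizes"]
test_accuracy = search.score(X_te, y_te)
print("Best architecture:", best_layers)
print(f"Test accuracy: {test_accuracy:.3f}")
```

The same grid can include other parameters, such as mlp__activation or mlp__learning_rate_init, at the cost of a longer search.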

MLPRegressor

The MLPRegressor class implements a multi-layer perceptron (MLP) with no activation function at the output layer, which is equivalent to using the identity function as the output activation. The output is therefore a set of continuous values, and the network is trained by minimizing the squared error. As explained earlier, this class has the same parameters as the MLPClassifier class.

Link to documentation

The way to construct such an object is as follows:

from sklearn.neural_network import MLPRegressor

model = MLPRegressor(
    hidden_layer_sizes=(100,),
    activation='relu',
    solver='adam',
    batch_size='auto',
    learning_rate='constant',
    learning_rate_init=0.001,
    shuffle=True,
    random_state=None
)
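
A short regression example, analogous to the classification one, might look as follows. It is a sketch under assumptions: the data come from make_regression, and the target is standardized with training statistics, which is an optional step that often helps the optimizer but is not required by MLPRegressor.

```python
import warnings

from sklearn.datasets import make_regression
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Illustrative toy regression data
X_reg, y_reg = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_reg, y_reg, test_size=0.2, random_state=42)

# Standardize the features using training statistics only
x_scaler = StandardScaler()
X_tr_s = x_scaler.fit_transform(X_tr)
X_te_s = x_scaler.transform(X_te)

# Optional: standardize the target as well, again with training statistics
y_mean, y_std = y_tr.mean(), y_tr.std()

reg = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=42)
reg.fit(X_tr_s, (y_tr - y_mean) / y_std)

# Map the predictions back to the original target scale
y_pred_reg = reg.predict(X_te_s) * y_std + y_mean

print(reg.out_activation_)  # identity
print(f"R^2 on the test set: {r2_score(y_te, y_pred_reg):.3f}")
```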

Saving and Loading a Model with scikit-learn

Once a model has been trained, it can be saved to disk so that it can be reused later without retraining. scikit-learn models are Python objects and can be serialised using the joblib library, which is the recommended approach for scikit-learn estimators.

Saving the model

import joblib

# best_model is assumed to be an already fitted estimator (for example, a trained MLPRegressor)
joblib.dump(best_model, 'mlp_regressor.pkl')
print("Model saved as mlp_regressor.pkl")

Loading the model

To load the model, the same architecture does not need to be defined beforehand — joblib restores the complete object including all learned parameters:

loaded_model = joblib.load('mlp_regressor.pkl')
print("Model loaded successfully")

Making predictions without retraining

y_pred = loaded_model.predict(X_test)

Note: it is also good practice to save the scaler alongside the model, since the test data must be transformed with the same scaler used during training:
joblib.dump(scaler, 'scaler.pkl')
scaler = joblib.load('scaler.pkl')
X_test_scaled = scaler.transform(X_test)
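
An alternative to saving two separate files is to wrap the scaler and the model in a Pipeline and save that single object. The sketch below assumes toy data and writes to a temporary directory; the file name mlp_pipeline.pkl is hypothetical.

```python
import os
import tempfile
import warnings

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Illustrative toy data (left unscaled: the pipeline takes care of scaling)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=10, n_informative=6, n_classes=3, random_state=0
)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("mlp", MLPClassifier(hidden_layer_sizes=(16,), max_iter=50, random_state=0)),
])
pipe.fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "mlp_pipeline.pkl")  # hypothetical file name
    joblib.dump(pipe, path)       # scaler and model saved together
    restored = joblib.load(path)  # both restored in one call

# The restored pipeline scales and predicts exactly like the original
predictions_match = np.array_equal(pipe.predict(X_demo), restored.predict(X_demo))
print(predictions_match)  # True
```

With this design, new data can be passed to predict in raw form, and there is no risk of pairing a model with the wrong scaler.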

Exercises

The Ocean Quality dataset used in these exercises is a synthetically generated ocean water quality dataset designed for machine learning experiments related to environmental monitoring, water quality prediction, and classification tasks. It simulates realistic oceanic conditions with continuous and categorical features, incorporating probabilistic variation and random noise to prevent model overfitting.

The dataset consists of 100,000 records and 8 features, including physicochemical parameters commonly used in ocean quality assessment.

Tasks:

Given the Ocean Quality dataset, train an MLP to predict the value of the Quality variable.
Find the best MLP to predict Salinity. Exclude the Quality feature from the dataset for this exercise.