{
 "cells": [
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "<a target=\"_blank\" href=\"https://colab.research.google.com/github/bmalcover/AppOC/blob/main/docs/notebooks/02_Xarxes/02_MLP_Scikit.ipynb\">\n",
    "  <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
    "</a>"
   ],
   "id": "1623535b01e4154a"
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "# Multi-layer Perceptron a Scikit\n",
    "\n",
    "Scikit-learn offers two objects to work with MLPs: MLPClassifier for classification problems and MLPRegressor for regression problems. Both share the same parameters and the same interface, with the only difference being that MLPRegressor uses a linear activation function at the output layer and optimizes a regression loss function (MSE by default), whereas MLPClassifier automatically adapts the output to the classification task. This separation follows scikit-learn's general convention of having differentiated objects for each type of task.\n",
    "\n",
    "\n",
    "## MLPClassifier\n",
    "\n",
    "MLPClassifier is scikit-learn's implementation of a Multi-Layer Perceptron for classification tasks. It follows the same interface as the rest of scikit-learn's estimators (`fit`, `predict`, `score`), which facilitates integrating it into other programs and comparing it with other classifiers we reviewed in block 1. Internally, it trains the network using the _backpropagation_ algorithm and allows configuring its architecture, activation function, and optimization algorithm.\n",
    "\n",
    "[Link to documentation](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html)\n"
   ],
   "id": "cf82e689fb7e1801"
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "The way to construct such an object is as follows:\n",
    "```python\n",
    "from sklearn.neural_network import MLPClassifier\n",
    "\n",
    "model = MLPClassifier(\n",
    "    hidden_layer_sizes=(100,),\n",
    "    activation='relu',\n",
    "    solver='adam',\n",
    "    batch_size='auto',\n",
    "    learning_rate='constant',\n",
    "    learning_rate_init=0.001,\n",
    "    shuffle=True,\n",
    "    random_state=None\n",
    ")\n",
    "```\n",
    "\n",
    "### Parameters\n",
    "\n",
    "Let's see the most relevant parameters:\n",
    "\n",
    "\n",
    "- `hidden_layer_sizes` — default: `(100,)`: Defines the number of hidden layers and the number of neurons in each layer. Each element of the tuple corresponds to a hidden layer.\n",
    "\n",
    "```python\n",
    "hidden_layer_sizes=(100,)        # 1 hidden layer with 100 neurons\n",
    "hidden_layer_sizes=(100, 50)     # 2 hidden layers: 100 and 50 neurons\n",
    "hidden_layer_sizes=(128, 64, 32) # 3 hidden layers\n",
    "```\n",
    "\n",
    "- `activation` — default: `'relu'`\n",
    "\n",
    "Activation function of the hidden layers. It is important to note that the output layer uses Softmax (multi-class) or Sigmoid (binary) automatically based on the _groundtruth_.\n",
    "\n",
    "| Value | Function |\n",
    "|-------|--------|\n",
    "| `'relu'` | $f(x) = \\max(0, x)$ — recomanada per defecte |\n",
    "| `'tanh'` | $f(x) = \\tanh(x)$ |\n",
    "| `'logistic'` | $f(x) = \\sigma(x)$ (Sigmoid) |\n",
    "| `'identity'` | $f(x) = x$ (no activation) |\n",
    "\n",
    "- `solver` — default: `'adam'`\n",
    "\n",
    "Optimization algorithm to update the weights.\n",
    "\n",
    "| Valor | Descripció                                                  |\n",
    "|-------|-------------------------------------------------------------|\n",
    "| `'adam'` | Adaptive, efficient, recommended for most cases.   |\n",
    "| `'sgd'` | Stochastic gradient descent, requires more manual tuning. |\n",
    "| `'lbfgs'` | Based on quasi-Newton, suitable for small datasets.      |\n",
    "\n",
    "- `batch_size` — default: `'auto'`\n",
    "\n",
    "Number of samples per weight update. With `'auto'`, scikit-learn uses `min(200, n_samples)`.\n",
    "\n",
    "```python\n",
    "batch_size='auto'   # min(200, number of observations)\n",
    "batch_size=32       # small batch → more noise, can generalize better\n",
    "batch_size=256      # large batch → more stable and faster training\n",
    "```\n",
    "\n",
    "> **Note:** `batch_size` only applies when `solver='sgd'` or `solver='adam'`.\n",
    "\n",
    "- `learning_rate_init` — default: `0.001`\n",
    "\n",
    "Initial value of the learning rate. Controls the magnitude of the weight update steps.\n",
    "\n",
    "```python\n",
    "learning_rate_init=0.001   # default value, good starting point\n",
    "learning_rate_init=0.01    # faster learning, risk of not converging\n",
    "learning_rate_init=0.0001  # slow but stable learning\n",
    "```\n",
    "\n",
    "- `shuffle` — default: `True`\n",
    "\n",
    "If this parameter is set to `True`, it shuffles the training samples before each epoch. Recommended to prevent the order of the data from influencing learning.\n",
    "\n",
    "- `random_state` — default: `None`\n",
    "\n",
    "Is the seed of the random number generator. Setting it guarantees reproducibility of results.\n",
    "\n",
    "```python\n",
    "random_state=None   # non-reproducible results\n",
    "random_state=42     # reproducible results\n",
    "```\n"
   ],
   "id": "975b055237df84ab"
  },
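  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "To see how `hidden_layer_sizes` and the automatic output layer translate into an actual network, the following minimal sketch fits a small classifier on synthetic data and inspects the fitted attributes `coefs_`, `n_layers_` and `out_activation_`. The dataset sizes and the variable names (`X_demo`, `clf_multi`, `clf_binary`) are assumptions chosen only for illustration.\n",
    "\n",
    "```python\n",
    "import warnings\n",
    "\n",
    "from sklearn.datasets import make_classification\n",
    "from sklearn.exceptions import ConvergenceWarning\n",
    "from sklearn.neural_network import MLPClassifier\n",
    "\n",
    "# A few iterations are enough to inspect shapes, so convergence warnings are silenced\n",
    "warnings.filterwarnings('ignore', category=ConvergenceWarning)\n",
    "\n",
    "# Illustrative data: 10 features, 3 classes\n",
    "X_demo, y_demo = make_classification(\n",
    "    n_samples=300, n_features=10, n_informative=6, n_classes=3, random_state=0\n",
    ")\n",
    "\n",
    "clf_multi = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=50, random_state=0)\n",
    "clf_multi.fit(X_demo, y_demo)\n",
    "\n",
    "# One weight matrix per pair of consecutive layers: input -> 64 -> 32 -> output\n",
    "print([w.shape for w in clf_multi.coefs_])  # [(10, 64), (64, 32), (32, 3)]\n",
    "print(clf_multi.n_layers_)                  # 4 (input + 2 hidden + output)\n",
    "print(clf_multi.out_activation_)            # softmax\n",
    "\n",
    "# With only two classes, a single logistic output unit is used\n",
    "y_binary = (y_demo == 0).astype(int)\n",
    "clf_binary = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=50, random_state=0)\n",
    "clf_binary.fit(X_demo, y_binary)\n",
    "print(clf_binary.coefs_[-1].shape)          # (32, 1)\n",
    "print(clf_binary.out_activation_)           # logistic\n",
    "```\n",
    "\n",
    "The shapes show that the input and output sizes are never passed explicitly: they are inferred from `X` and `y` when `fit` is called."
   ],
   "id": "3f9a1c7e5b2d4a68"
  },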
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "### A Working Example\n",
    "\n",
    "Next we'll see a working example with a toy dataset where we have 1000 observations with 10 features from 3 different classes."
   ],
   "id": "9266915f6448b5d3"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": [
    "from sklearn.datasets import make_classification\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "from sklearn.neural_network import MLPClassifier\n",
    "from sklearn.metrics import classification_report\n",
    "random_state = 42"
   ],
   "id": "78f755f2c6b4248d"
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "Fake data generation and preparation",
   "id": "2ddbc7286cd368ad"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": [
    "# Dataset generation\n",
    "X, y = make_classification(\n",
    "    n_samples=1000,\n",
    "    n_features=10,\n",
    "    n_classes=3,\n",
    "    n_informative=6,\n",
    "    random_state=random_state\n",
    ")\n",
    "\n",
    "# Split into train/test\n",
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "    X, y, test_size=0.2, random_state=random_state\n",
    ")\n",
    "\n",
    "# Normalization\n",
    "scaler = StandardScaler()\n",
    "X_train = scaler.fit_transform(X_train)\n",
    "X_test = scaler.transform(X_test)"
   ],
   "id": "b64d9ed445918a80"
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "Training",
   "id": "7ffda1d103736fb6"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": [
    "# Training\n",
    "model = MLPClassifier(\n",
    "    hidden_layer_sizes=(64, 32), # How do we know these are the right layers?\n",
    "    activation='relu',\n",
    "    solver='adam',\n",
    "    learning_rate_init=0.001,\n",
    "    max_iter=1000,\n",
    "    random_state=random_state\n",
    ")\n",
    "model.fit(X_train, y_train)\n",
    "\n",
    "# Prediction and evaluation\n",
    "y_pred = model.predict(X_test)\n",
    "print(classification_report(y_test, y_pred))\n"
   ],
   "id": "ec5f10a57832b161"
  },
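  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "The comment in the training cell asks how we know that `(64, 32)` are the right layers. One common way to approach this is to compare several candidate architectures with cross-validation. The sketch below uses `GridSearchCV` for that purpose; the candidate sizes, `cv=3` and the names `pipeline`, `param_grid` and `search` are assumptions for illustration, not a recommended configuration.\n",
    "\n",
    "```python\n",
    "from sklearn.datasets import make_classification\n",
    "from sklearn.model_selection import GridSearchCV, train_test_split\n",
    "from sklearn.neural_network import MLPClassifier\n",
    "from sklearn.pipeline import make_pipeline\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "\n",
    "# Same kind of toy data as in the working example\n",
    "X, y = make_classification(\n",
    "    n_samples=1000, n_features=10, n_classes=3, n_informative=6, random_state=42\n",
    ")\n",
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "    X, y, test_size=0.2, random_state=42\n",
    ")\n",
    "\n",
    "# Scaling lives inside the pipeline so it is re-fitted on each training fold\n",
    "pipeline = make_pipeline(\n",
    "    StandardScaler(),\n",
    "    MLPClassifier(max_iter=1000, random_state=42),\n",
    ")\n",
    "\n",
    "# Candidate architectures (illustrative choices)\n",
    "param_grid = {\n",
    "    'mlpclassifier__hidden_layer_sizes': [(32,), (64, 32), (128, 64, 32)],\n",
    "}\n",
    "\n",
    "search = GridSearchCV(pipeline, param_grid, cv=3, scoring='accuracy')\n",
    "search.fit(X_train, y_train)\n",
    "\n",
    "print(search.best_params_)\n",
    "print(f'Mean CV accuracy: {search.best_score_:.3f}')\n",
    "print(f'Test accuracy: {search.best_estimator_.score(X_test, y_test):.3f}')\n",
    "```\n",
    "\n",
    "The test set is only used once, at the end, so the architecture is selected without looking at the data used for the final evaluation."
   ],
   "id": "8d2e6b4f1a9c7035"
  },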
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "## MLPRegressor\n",
    "\n",
    "The MLPRegressor class implements a multi-layer perceptron (MLP) with no activation function at the output layer, which is equivalent to using the identity function as activation. Therefore, it uses squared error as the loss function, and the output is a set of continuous values. As explained earlier, this class has the same parameters as the MLPClassifier class.\n",
    "\n",
    "[Link to documentation](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html)\n"
   ],
   "id": "c474b63f0ee5b721"
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "The way to construct such an object is as follows:\n",
    "```python\n",
    "from sklearn.neural_network import MLPRegressor\n",
    "\n",
    "model = MLPRegressor(\n",
    "    hidden_layer_sizes=(100,),\n",
    "    activation='relu',\n",
    "    solver='adam',\n",
    "    batch_size='auto',\n",
    "    learning_rate='constant',\n",
    "    learning_rate_init=0.001,\n",
    "    shuffle=True,\n",
    "    random_state=None\n",
    ")\n",
    "```"
   ],
   "id": "36258dc4ca42232c"
  },
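  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "A regression workflow mirrors the classification example. The following is a minimal sketch on synthetic data generated with `make_regression`; the dataset sizes, the architecture `(64, 32)` and the names `X_reg`, `y_reg`, `reg_scaler` and `reg` are assumptions chosen for illustration.\n",
    "\n",
    "```python\n",
    "from sklearn.datasets import make_regression\n",
    "from sklearn.metrics import mean_squared_error, r2_score\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.neural_network import MLPRegressor\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "\n",
    "# Synthetic regression data (illustrative sizes)\n",
    "X_reg, y_reg = make_regression(\n",
    "    n_samples=1000, n_features=10, noise=10.0, random_state=42\n",
    ")\n",
    "X_reg_train, X_reg_test, y_reg_train, y_reg_test = train_test_split(\n",
    "    X_reg, y_reg, test_size=0.2, random_state=42\n",
    ")\n",
    "\n",
    "# Standardize the features, fitting the scaler on the training set only\n",
    "reg_scaler = StandardScaler()\n",
    "X_reg_train = reg_scaler.fit_transform(X_reg_train)\n",
    "X_reg_test = reg_scaler.transform(X_reg_test)\n",
    "\n",
    "reg = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=42)\n",
    "reg.fit(X_reg_train, y_reg_train)\n",
    "\n",
    "# Evaluate with regression metrics instead of a classification report\n",
    "y_reg_pred = reg.predict(X_reg_test)\n",
    "print(reg.out_activation_)  # identity\n",
    "print(f'MSE: {mean_squared_error(y_reg_test, y_reg_pred):.2f}')\n",
    "print(f'R^2: {r2_score(y_reg_test, y_reg_pred):.3f}')\n",
    "```\n",
    "\n",
    "Calling `reg.score(X_reg_test, y_reg_test)` returns the same $R^2$ value, since that is the default score of scikit-learn regressors."
   ],
   "id": "5c7a9e1b3d6f2084"
  },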
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "## Saving and Loading a Model with scikit-learn\n",
    "\n",
    "Once a model has been trained, it can be saved to disk so that it can be reused later without retraining. scikit-learn models are Python objects and can be serialised using the `joblib` library, which is the recommended approach for scikit-learn estimators.\n",
    "\n",
    "### Saving the model\n",
    "\n",
    "```python\n",
    "import joblib\n",
    "\n",
    "joblib.dump(best_model, 'mlp_regressor.pkl')\n",
    "print(\"Model saved as mlp_regressor.pkl\")\n",
    "```\n",
    "\n",
    "### Loading the model\n",
    "\n",
    "To load the model, the same architecture does not need to be defined beforehand — `joblib` restores the complete object including all learned parameters:\n",
    "\n",
    "```python\n",
    "loaded_model = joblib.load('mlp_regressor.pkl')\n",
    "print(\"Model loaded successfully\")\n",
    "```\n",
    "\n",
    "### Making predictions without retraining\n",
    "\n",
    "```python\n",
    "y_pred = loaded_model.predict(X_test)\n",
    "```\n",
    "\n",
 **Note:** it is also good">
    "> **Note:** it is also good practice to save the scaler alongside the model, since any data passed to the model must be transformed with the same scaler used during training:\n",
    "> ```python\n",
    "> joblib.dump(scaler, 'scaler.pkl')\n",
    "> scaler = joblib.load('scaler.pkl')\n",
    "> X_test_scaled = scaler.transform(X_test)\n",
    "> ```"
   ],
   "id": "41fb41db9dfb0c12"
  },
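  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "An alternative to saving the scaler and the model as two separate files is to bundle them in a scikit-learn `Pipeline` and save that single object. The sketch below illustrates the idea on synthetic data; the file name `mlp_pipeline.pkl`, the temporary directory and the names `pipe` and `loaded_pipe` are assumptions for illustration.\n",
    "\n",
    "```python\n",
    "import os\n",
    "import tempfile\n",
    "\n",
    "import joblib\n",
    "import numpy as np\n",
    "from sklearn.datasets import make_regression\n",
    "from sklearn.neural_network import MLPRegressor\n",
    "from sklearn.pipeline import make_pipeline\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "\n",
    "X_demo, y_demo = make_regression(\n",
    "    n_samples=200, n_features=5, noise=5.0, random_state=0\n",
    ")\n",
    "\n",
    "# Scaler and network bundled in one object\n",
    "pipe = make_pipeline(\n",
    "    StandardScaler(),\n",
    "    MLPRegressor(hidden_layer_sizes=(16,), max_iter=500, random_state=0),\n",
    ")\n",
    "pipe.fit(X_demo, y_demo)\n",
    "\n",
    "# Save to a temporary directory and load it back\n",
    "with tempfile.TemporaryDirectory() as tmp_dir:\n",
    "    path = os.path.join(tmp_dir, 'mlp_pipeline.pkl')\n",
    "    joblib.dump(pipe, path)\n",
    "    loaded_pipe = joblib.load(path)\n",
    "\n",
    "# The loaded pipeline scales and predicts exactly like the original\n",
    "print(np.allclose(pipe.predict(X_demo), loaded_pipe.predict(X_demo)))  # True\n",
    "```\n",
    "\n",
    "Files written by `joblib` are pickle-based, so they should only be loaded from trusted sources, ideally with the same scikit-learn version that was used to save them."
   ],
   "id": "e4b8d0f2a6c19357"
  },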
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "## Exercises\n",
    "\n",
    "- [Ocean Quality info](https://www.kaggle.com/datasets/vinothkannaece/ocean-quality-dataset)\n",
    "- [Download dataset](https://github.com/bmalcover/AppOC/blob/main/docs/_static/02/ocean.zip)\n",
    "\n",
    "This dataset is a synthetically generated ocean water quality dataset designed for machine learning experiments related to environmental monitoring, water quality prediction, and classification tasks. It simulates realistic oceanic conditions with continuous and categorical features, incorporating probabilistic variation and random noise to prevent model overfitting.\n",
    "\n",
    "The dataset consists of 100,000 records and 8 features, including physicochemical parameters commonly used in ocean quality assessment\n",
    "\n",
    "Tasks:\n",
    "1. Given the *Ocean Quality* dataset, train an MLP to predict the value of variable *Quality*.\n",
    "2. Find the best MLP to predict the Salinity. Exclude the `Quality` feature from the dataset for this exercise."
   ],
   "id": "55cc2e4f8bbc106"
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}