{
 "cells": [
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "<a target=\"_blank\" href=\"https://colab.research.google.com/github/bmalcover/AppOC/blob/main/docs/notebooks/02_Xarxes/03_MLP_Pytorch.ipynb\">\n",
    "  <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
    "</a>"
   ],
   "id": "66fd4819c0a54889"
  },
  {
   "metadata": {
    "collapsed": true
   },
   "cell_type": "markdown",
   "source": [
    "# Multi-layer Perceptron a PyTorch\n",
    "\n",
    "## Introduction to PyTorch\n",
    "\n",
    "PyTorch is a deep learning library developed by Meta that allows building and training neural networks in a flexible and efficient way. Unlike scikit-learn, which completely abstracts the training process, PyTorch gives complete control over each step of the training loop, making it especially suitable for complex architectures like RNNs, CNNs, or Transformers. The fundamental difference with scikit-learn is that in PyTorch we ourselves write the training process explicitly: what scikit-learn does with `model.fit()` automatically, in PyTorch we must implement it step by step.\n",
    "\n",
    "In this section we'll learn to build our own MLP to discover how the [PyTorch](https://pytorch.org/) library works.\n",
    "\n",
    "## Building an MLP\n",
    "\n",
    "\n",
    "Using the `nn.Sequential` structure is the simplest way to build a network in PyTorch. It allows defining the architecture as an ordered sequence of layers, similar to how we specify `hidden_layer_sizes` in scikit-learn.\n",
    "\n",
    "In the following example we see how we define an MLP for the problem we saw in the previous section. In this case we must define both the number of elements in the input layer and the output layer. In this case we don't differentiate between classification and regression problems, the definition of the network's output and the activation and loss functions we use will be what define its functionality.\n",
    "\n",
    "```python\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "\n",
    "model = nn.Sequential(\n",
    "    nn.Linear(10, 64),   # input layer → first hidden layer\n",
    "    nn.ReLU(),\n",
    "    nn.Linear(64, 32),   # first hidden layer → second hidden layer\n",
    "    nn.ReLU(),\n",
    "    nn.Linear(32, 3)     # second hidden layer → output layer (3 classes)\n",
    ")\n",
    "```\n",
    "\n",
    "Each `nn.Linear(in, out)` defines a fully connected layer with the corresponding weights $W$ and biases $b$. Activation functions are added explicitly between layers, unlike scikit-learn where they are specified with the `activation` parameter.\n",
    "\n",
    "\n",
    "\n",
    "## Complete Example\n",
    "\n",
    "\n",
    "Below we explain how the code works to train and evaluate an MLP, we'll solve the same problem as in the previous section so we can observe the similarities and differences between the two libraries.\n",
    "\n",
    "**Data generation and preparation**\n"
   ],
   "id": "3c35a33a59af5ff5"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": [
    "import torch\n",
    "import torch.nn as nn\n",
    "from sklearn.datasets import make_classification\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "\n",
    "# Data generation and splitting (same as with scikit-learn)\n",
    "X, y = make_classification(\n",
    "    n_samples=1000,\n",
    "    n_features=10,\n",
    "    n_classes=3,\n",
    "    n_informative=6,\n",
    "    random_state=42\n",
    ")\n",
    "\n",
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "    X, y, test_size=0.2, random_state=42\n",
    ")\n",
    "\n",
    "scaler = StandardScaler()\n",
    "X_train = scaler.fit_transform(X_train)\n",
    "X_test  = scaler.transform(X_test)\n",
    "\n",
    "# Conversion of NumPy arrays to PyTorch tensors: This step is mandatory\n",
    "X_train = torch.tensor(X_train, dtype=torch.float32)\n",
    "X_test  = torch.tensor(X_test,  dtype=torch.float32)\n",
    "y_train = torch.tensor(y_train, dtype=torch.long)\n",
    "y_test  = torch.tensor(y_test,  dtype=torch.long)\n"
   ],
   "id": "c6a6b3f56c9c0ec7"
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "Next, we define the model, the loss function and the optimizer.",
   "id": "4aadfff4dd56eeb6"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": [
    "# Model definition. Here we have to define the layers and the activation functions between them.\n",
    "model = nn.Sequential(\n",
    "    nn.Linear(10, 64),\n",
    "    nn.ReLU(),\n",
    "    nn.Linear(64, 32),\n",
    "    nn.ReLU(),\n",
    "    nn.Linear(32, 3)\n",
    ")\n",
    "\n",
    "# Definition of loss function and optimizer\n",
    "criterion = nn.CrossEntropyLoss()\n",
    "optimizer = torch.optim.Adam(model.parameters(), lr=0.001) # The learning rate is defined here\n",
    "\n",
    "\n"
   ],
   "id": "da1c9ee8c06125ce"
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "**Training loop**\n",
    "\n",
    "The training loop follows the 4 elemental steps that we explained in the introduction: Forward pass, Loss calculation, Backward and Weight update.\n",
    "\n"
   ],
   "id": "2f4bf7a0973fb610"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": [
    "epochs = 200\n",
    "for epoch in range(epochs):\n",
    "    model.train()\n",
    "\n",
    "    # 1. Forward pass\n",
    "    y_pred = model(X_train)\n",
    "\n",
    "    # 2. Loss calculation\n",
    "    loss = criterion(y_pred, y_train)\n",
    "\n",
    "    # 3. Backward pass\n",
    "    optimizer.zero_grad()\n",
    "    loss.backward()\n",
    "\n",
    "    # 4. Weight update\n",
    "    optimizer.step()\n",
    "    # Display partial results\n",
    "    if (epoch + 1) % 50 == 0:\n",
    "        print(f\"Epoch {epoch+1}/{epochs} - Loss: {loss.item():.4f}\")\n"
   ],
   "id": "5f6bf429cf093b6"
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "**Model evaluation**\n",
    "\n",
    "Finally, we use the test set to evaluate the model."
   ],
   "id": "fa275d5dae6cb709"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": [
    "model.eval()\n",
    "with torch.no_grad():\n",
    "    y_pred_test = model(X_test)\n",
    "    predicted   = torch.argmax(y_pred_test, dim=1)\n",
    "    accuracy    = (predicted == y_test).float().mean()\n",
    "    print(f\"\\nAccuracy: {accuracy:.4f}\")"
   ],
   "id": "745c5fba6a24161c"
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "Aspects to keep in mind of this code that make it very different from training with Scikit:\n",
    "\n",
    "1. `optimizer.zero_grad()` is necessary because PyTorch accumulates gradients by default. This means that if we don't reset them, the gradients from the previous iteration are added to those of the current iteration, leading to incorrect weight updates.\n",
    "2. `model.eval()` and `torch.no_grad()` disable gradient calculation during evaluation, saving memory.\n",
    "3. `nn.CrossEntropyLoss` already incorporates the Softmax function internally, which is why the output layer has no activation function.\n"
   ],
   "id": "a25c5e218e1e23d2"
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "## Tensors\n",
    "As you may have noticed, in the code we've transformed NumPy `ndarray` arrays to the `tensor` data type. A tensor is the basic data structure of PyTorch, equivalent to NumPy arrays but with the additional ability to run on GPU and automatically calculate gradients. A scalar, a vector, a matrix, or any n-dimensional array is represented as tensors.\n",
    "\n",
    "We can create tensors manually, although it won't be a common thing:\n",
    "```python\n",
    "import torch\n",
    "\n",
    "# From a list\n",
    "t = torch.tensor([1.0, 2.0, 3.0])\n",
    "\n",
    "# Special tensors\n",
    "torch.zeros(3, 4)      # 3x4 matrix of zeros\n",
    "torch.ones(3, 4)       # 3x4 matrix of ones\n",
    "torch.rand(3, 4)       # 3x4 matrix of random values between 0 and 1\n",
    "torch.randn(3, 4)      # 3x4 matrix of random values with normal distribution\n",
    "```\n",
    "\n",
    "### Data Types (dtype)\n",
    "The data type is important because PyTorch is strict: operations between tensors of different types generate errors.\n",
    "\n",
    "\n",
    "| `dtype` | Description | Typical use |\n",
    "|---------|------------|----------|\n",
    "| `torch.float32` | 32-bit decimal | Features, network weights |\n",
    "| `torch.float64` | 64-bit decimal | High precision (rare) |\n",
    "| `torch.long` | 64-bit integer | Classification labels |\n",
    "| `torch.bool` | Boolean | Masks |\n",
    "\n",
    "```python\n",
    "t = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float32)\n",
    "print(t.dtype)   # torch.float32\n",
    "```\n",
    "\n",
    "### Conversion between NumPy and PyTorch\n",
    "\n",
    "Converting a data structure between these two libraries is straightforward and simple, you just need to know one operation:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# NumPy → PyTorch\n",
    "array = np.array([1.0, 2.0, 3.0])\n",
    "t = torch.from_numpy(array)\n",
    "\n",
    "# PyTorch → NumPy\n",
    "array = t.numpy()\n",
    "```\n",
    "\n",
 **Note:** `torch.from_numpy`">
    "> **Note:** `torch.from_numpy` shares memory with the original NumPy array, so modifying one modifies the other (the same holds for `t.numpy()` on a CPU tensor). If you want an independent copy, use `torch.tensor(array)`.\n",
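 **Note:** `torch.from_numpy`">
    "\n",
    "The difference described in the note can be observed directly. In this small sketch the names `shared` and `copied` are only illustrative:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import torch\n",
    "\n",
    "array = np.array([1.0, 2.0, 3.0])\n",
    "shared = torch.from_numpy(array)  # shares memory with `array`\n",
    "copied = torch.tensor(array)      # independent copy\n",
    "\n",
    "array[0] = 100.0                  # modify the NumPy array in place\n",
    "\n",
    "print(shared[0].item())  # 100.0: the change is visible through the shared tensor\n",
    "print(copied[0].item())  # 1.0: the copy is unaffected\n",
    "\n",
    "# Both tensors keep NumPy's default float64, so a cast such as .float()\n",
    "# is usually needed before passing the data to a float32 model\n",
    "print(shared.dtype)      # torch.float64\n",
    "```\n",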
    "\n",
    "### Basic Operations\n",
    "\n",
    "Some basic operations that can be useful, specifically those that provide us with information:\n",
    "\n",
    "```python\n",
    "a = torch.tensor([1.0, 2.0, 3.0])\n",
    "b = torch.tensor([4.0, 5.0, 6.0])\n",
    "\n",
    "a + b                    # element-wise addition\n",
    "a * b                    # element-wise product\n",
    "torch.matmul(a, b)       # dot product (or matrix product)\n",
    "\n",
    "# Tensor information\n",
    "a.shape                  # dimensions\n",
    "a.dtype                  # data type\n",
    "a.device                 # cpu or cuda\n",
    "```"
   ],
   "id": "9c86effb3790a15"
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "## Graphics Card\n",
    "GPUs are designed to perform many mathematical operations in parallel, making them much more efficient than CPUs for training neural networks. The main operations of a neural network (matrix multiplications and gradient calculation) especially benefit from this parallelization. CUDA is NVIDIA's platform that allows taking advantage of the GPU from PyTorch.\n",
    "To find out if we have a GPU available:\n",
    "\n",
    "```python\n",
    "device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
    "print(f\"Device: {device}\")\n",
    "```\n",
    "To use the GPU, you need to move both the model and the data to the same device. It's important to note that the model and data must always be on the same device. Mixing CPU and GPU tensors generates an error. The changes to the previous code are minimal:\n",
    "\n",
    "```python\n",
    "# Move model to GPU\n",
    "model = model.to(device)\n",
    "\n",
    "# Move data to GPU\n",
    "X_train = X_train.to(device)\n",
    "X_test  = X_test.to(device)\n",
    "y_train = y_train.to(device)\n",
    "y_test  = y_test.to(device)\n",
    "```\n",
    "The rest of the code (training loop, evaluation) doesn't need any changes, as PyTorch automatically manages operations on the corresponding device.\n",
    "\n",
    "### Back to the cpu\n",
    "\n",
    "Once the model has been trained on the GPU, it is sometimes necessary to move the results back to the CPU — for example, to convert predictions to NumPy arrays or to use scikit-learn metrics, which do not support CUDA tensors. This can be done using the `.cpu()` method:\n",
    "\n",
    "```python\n",
    "predictions = model(X_test).cpu().detach().numpy()\n",
    "```\n",
    "\n",
    "The `.detach()` call is needed to remove the tensor from the computation graph before converting it to NumPy."
   ],
   "id": "487132efa0acda3b"
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "## Saving and Loading Models\n",
    "\n",
    "Once the model has been trained, it can be saved to disk so that it can be reused later without the need to retrain it. PyTorch allows saving the model weights at any point, including during training, for example, saving the best model found so far based on validation loss. Later, the saved weights can be loaded into a model with the same architecture to make predictions on new samples, which is especially useful in production environments or when sharing models with other researchers.\n",
    "\n",
    "Once training is complete we can use the following command to save the model weights:\n",
    "\n",
    "```python\n",
    "torch.save(model.state_dict(), 'NameOfTheModel') # Typically model.pth\n",
    "```\n",
    "\n",
    "At any time, we can load the model. To load the weights, you must first instantiate the model with the same architecture and then load the weights:\n",
    "\n",
    "```python\n",
    "# Instantiate model with same architecture\n",
    "model = nn.Sequential(\n",
    "    nn.Linear(10, 64),\n",
    "    nn.ReLU(),\n",
    "    nn.Linear(64, 32),\n",
    "    nn.ReLU(),\n",
    "    nn.Linear(32, 3)\n",
    ")\n",
    "\n",
    "# Load the weights\n",
    "model.load_state_dict(torch.load('NameOfTheModel')) # Typically model.pth\n",
    "model.eval()\n",
    "```\n",
    "\n",
    "Why do we use `state_dict`? A `state_dict` is simply a Python dictionary that maps each layer of the model with its weights and biases. Saving only the weights (not the entire model) is more robust against code changes and PyTorch versions."
   ],
   "id": "e53dcc6f76ace7d7"
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "## Exercises\n",
    "\n",
    "In these exercises we'll use the same dataset as in the previous block: [Ocean Quality Dataset](https://github.com/bmalcover/AppOC/blob/main/docs/_static/02/ocean.zip)\n",
    "\n",
    "1. Train an MLP with the same configuration as Scikit but now using PyTorch for the clasification task.\n",
    "2. Repeat the training with different values of the learning rate and visualize how the loss evolution changes. Remember that this parameter makes sense with values close to 0. **Extra**: Make the plot using the `matplotlib` library.\n",
    "3. Save the best model. In another cell or Python `script` load it and make predictions without retraining."
   ],
   "id": "da8b4afa896ec3e9"
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}