{
 "cells": [
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "<a target=\"_blank\" href=\"https://colab.research.google.com/github/bmalcover/AppOC/blob/main/docs/notebooks/03_Series/04_Projecte.ipynb\">\n",
    "  <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
    "</a>"
   ],
   "id": "2b13139ab7b63801"
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "# Project: Sea Surface Temperature Prediction\n",
    "\n",
    "## Description\n",
    "\n",
    "In this project we will use a **TAO/TRITON** buoy dataset to predict sea surface temperature (`s.s.temp.`) in the equatorial Pacific Ocean. Anomalies in sea surface temperature in this region are the main indicator of **El Niño** and **La Niña** events, which have a global impact on climate.\n",
    "\n",
    "Starting from the preprocessed DataFrame `df2`, you must build and compare three recurrent architectures: **RNN**, **LSTM**, and **GRU**, and evaluate their ability to capture the temporal dependencies of the series.\n",
    "\n",
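    "[Download dataset](https://github.com/bmalcover/AppOC/blob/main/docs/_static/03/el%2Bnino.zip). The code below reads the files `tao-all2.col` and `tao-all2.dat.gz` from the working directory, so extract them there before running it.\n",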
    "\n",
    "After the preprocessing steps below, `df2` contains the following columns:\n",
    "\n",
    "| Column | Description |\n",
    "|--------|-------------|\n",
    "| `month` | Month of the observation |\n",
    "| `day` | Day of the observation |\n",
    "| `latitude` | Latitude of the buoy |\n",
    "| `longitude` | Longitude of the buoy |\n",
    "| `zon.winds` | Zonal wind speed (east-west) |\n",
    "| `mer.winds` | Meridional wind speed (north-south) |\n",
    "| `air temp.` | Air temperature (°C) |\n",
    "| `s.s.temp.` | Sea surface temperature (°C); **target variable** |\n",
    "\n",
    "\n",
    "## Delivery\n",
    "\n",
    "The project must be submitted as a Jupyter Notebook (.ipynb) file containing all the code, results, and markdown cells explaining each step and the decisions made. Additionally, a PDF export of the notebook must be included so that the results and visualisations are easily accessible without needing to run the code. Both files must be submitted together."
   ],
   "id": "46b90c41f04c5e53"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "from sklearn.preprocessing import MinMaxScaler\n",
    "from sklearn.metrics import mean_absolute_error\n"
   ],
   "id": "d7bcf2b89a6bda7a",
   "outputs": [],
   "execution_count": null
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "## Data loading\n",
    "\n",
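    "Load the data into a DataFrame called `df2`: the column names are read from `tao-all2.col` and the observations from the compressed file `tao-all2.dat.gz`."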
   ],
   "id": "4cfbef1aa8ffbfe7"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "source": [
    "# Read the column names, one per line, from the .col file (blank lines are skipped)\n",
    "with open('tao-all2.col') as file:\n",
    "    names2 = [line.strip() for line in file if line.strip()]\n",
    "\n",
    "print(names2)"
   ],
   "id": "934772aebb2230dc",
   "outputs": [],
   "execution_count": null
  },
  {
   "metadata": {},
   "cell_type": "code",
   "source": [
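    "# Whitespace-separated values: '.' marks a missing value and text after '%' is ignored\n",
    "df2 = pd.read_csv(\n",
    "    'tao-all2.dat.gz',\n",
    "    compression='gzip',\n",
    "    sep=r'\\s+',\n",
    "    na_values='.',\n",
    "    comment='%',\n",
    "    header=None,\n",
    ")\n",
    "# Use the names read from tao-all2.col as column labels\n",
    "df2.columns = names2"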
   ],
   "id": "3ec2192407b69125",
   "outputs": [],
   "execution_count": null
  },
  {
   "metadata": {},
   "cell_type": "markdown",
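   "source": "Drop the features that are not used in this project (`obs`, `humidity`, `date` and `year`). Removing these columns before calling `dropna` also prevents rows from being discarded only because one of them is empty:",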
   "id": "350b36197b03348f"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "source": "df2 = df2.drop(columns=[\"obs\", \"humidity\", \"date\", \"year\"])",
   "id": "c065df3d0853081",
   "outputs": [],
   "execution_count": null
  },
  {
   "metadata": {},
   "cell_type": "code",
   "source": [
    "# Remove every row that contains at least one missing value\n",
    "original_rows = df2.shape[0]\n",
    "df2 = df2.dropna()\n",
    "print(f\"Rows after dropna: {df2.shape[0]}\")\n",
    "print(f\"Deleted rows: {original_rows - df2.shape[0]}\")"
   ],
   "id": "aa5c2daf2fb034a9",
   "outputs": [],
   "execution_count": null
  },
  {
   "metadata": {},
   "cell_type": "code",
   "source": "df2.columns",
   "id": "20e3c789f1b4e5f",
   "outputs": [],
   "execution_count": null
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "## Definition of train and test sets",
   "id": "2a920ca130d5afaf"
  },
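  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "Recurrent networks take fixed-length sequences as input. The cell below is a minimal sketch of one possible preparation under explicit assumptions: the rows of `df2` are taken to be in chronological order, and the fact that the observations come from several buoys is ignored. It splits the `s.s.temp.` series chronologically (the 80/20 ratio is a hypothetical choice), fits the `MinMaxScaler` on the training part only so that no test information leaks into the scaling, and builds sliding windows with a hypothetical helper, `make_windows`, and a hypothetical window length, `WINDOW`. Each window of past values is paired with the value that follows it, which becomes the prediction target."
   ],
   "id": "sketch-windows-md"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "source": [
    "import numpy as np\n",
    "from sklearn.preprocessing import MinMaxScaler\n",
    "\n",
    "\n",
    "def make_windows(values, window):\n",
    "    # Hypothetical helper: turn a 1-D array into inputs of shape\n",
    "    # (samples, window, 1) and next-step targets of shape (samples,)\n",
    "    X, y = [], []\n",
    "    for i in range(len(values) - window):\n",
    "        X.append(values[i:i + window])\n",
    "        y.append(values[i + window])\n",
    "    X = np.array(X, dtype=np.float32)[..., np.newaxis]  # add a feature axis\n",
    "    y = np.array(y, dtype=np.float32)\n",
    "    return X, y\n",
    "\n",
    "\n",
    "# Assumption: the rows of df2 are in chronological order; observations\n",
    "# from different buoys are not separated in this sketch\n",
    "series = df2['s.s.temp.'].to_numpy(dtype=np.float32).reshape(-1, 1)\n",
    "\n",
    "# Chronological split: the first 80% (hypothetical ratio) for training\n",
    "split = int(len(series) * 0.8)\n",
    "train_raw, test_raw = series[:split], series[split:]\n",
    "\n",
    "# Fit the scaler on the training part only to avoid leaking test information\n",
    "scaler = MinMaxScaler()\n",
    "train_scaled = scaler.fit_transform(train_raw).ravel()\n",
    "test_scaled = scaler.transform(test_raw).ravel()\n",
    "\n",
    "WINDOW = 30  # hypothetical window length (number of past observations)\n",
    "X_train, y_train = make_windows(train_scaled, WINDOW)\n",
    "X_test, y_test = make_windows(test_scaled, WINDOW)\n",
    "print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)"
   ],
   "id": "sketch-windows-code",
   "outputs": [],
   "execution_count": null
  },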
  {
   "metadata": {},
   "cell_type": "code",
   "source": "\n",
   "id": "dc76d43d69935ded",
   "outputs": [],
   "execution_count": null
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "## Models",
   "id": "e8d783f8869f0ff2"
  },
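  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "The three architectures requested in the description can share one wrapper class, because `nn.RNN`, `nn.LSTM` and `nn.GRU` accept the same core constructor arguments and all return the per-step outputs as the first element of their result. The cell below is a minimal sketch under stated assumptions: the class name `RecurrentRegressor`, the default hidden size and the decision to predict from the output at the last time step are illustrative choices, not requirements of the project. It is only checked on random tensors of shape `(batch, time, features)`; changing the `cell` argument is then the only modification needed to compare the three models."
   ],
   "id": "sketch-models-md"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "source": [
    "import torch\n",
    "import torch.nn as nn\n",
    "\n",
    "\n",
    "class RecurrentRegressor(nn.Module):\n",
    "    # Hypothetical many-to-one model: a recurrent layer plus a linear head\n",
    "    CELLS = {'rnn': nn.RNN, 'lstm': nn.LSTM, 'gru': nn.GRU}\n",
    "\n",
    "    def __init__(self, cell='lstm', input_size=1, hidden_size=32, num_layers=1):\n",
    "        super().__init__()\n",
    "        # nn.RNN, nn.LSTM and nn.GRU accept the same core arguments\n",
    "        self.rnn = self.CELLS[cell](\n",
    "            input_size=input_size,\n",
    "            hidden_size=hidden_size,\n",
    "            num_layers=num_layers,\n",
    "            batch_first=True,  # inputs are (batch, time, features)\n",
    "        )\n",
    "        self.head = nn.Linear(hidden_size, 1)\n",
    "\n",
    "    def forward(self, x):\n",
    "        # out has shape (batch, time, hidden_size); the final hidden state\n",
    "        # (and, for the LSTM, the cell state) is discarded\n",
    "        out, _ = self.rnn(x)\n",
    "        # Predict the next value from the output at the last time step\n",
    "        return self.head(out[:, -1, :]).squeeze(-1)\n",
    "\n",
    "\n",
    "# Shape check on random data: 8 sequences of 30 steps with 1 feature\n",
    "x = torch.randn(8, 30, 1)\n",
    "for cell in ['rnn', 'lstm', 'gru']:\n",
    "    model = RecurrentRegressor(cell)\n",
    "    print(cell, tuple(model(x).shape))"
   ],
   "id": "sketch-models-code",
   "outputs": [],
   "execution_count": null
  },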
  {
   "metadata": {},
   "cell_type": "code",
   "source": "",
   "id": "d0ba7a5f4a4d8378",
   "outputs": [],
   "execution_count": null
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "## Training",
   "id": "d17343583df103d5"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "source": "",
   "id": "f5908973c0ceae75",
   "outputs": [],
   "execution_count": null
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "## Evaluation",
   "id": "11cb5808a86a3400"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "source": "",
   "id": "579a674a5740cbb",
   "outputs": [],
   "execution_count": null
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}