Project: Sea Surface Temperature Prediction

Description

In this project we will use a TAO/TRITON buoy dataset to predict sea surface temperature (s.s.temp.) in the equatorial Pacific Ocean. Anomalies in sea surface temperature in this region are the main indicator of El Niño and La Niña events, which have a global impact on climate.

Starting from the preprocessed DataFrame df2, you will build and compare three recurrent architectures (RNN, LSTM, and GRU) and evaluate their ability to capture the temporal dependencies of the series.

Download dataset

The dataset contains the following columns:

Column	Description
`month`	Month of the observation
`day`	Day of the observation
`latitude`	Latitude of the buoy
`longitude`	Longitude of the buoy
`zon.winds`	Zonal wind speed (east-west)
`mer.winds`	Meridional wind speed (north-south)
`air temp.`	Air temperature (°C)
`s.s.temp.`	Sea surface temperature (°C), the target variable

Delivery

The project must be submitted as a Jupyter Notebook (.ipynb) file containing all the code, results, and markdown cells explaining each step and the decisions made. Additionally, a PDF export of the notebook must be included so that the results and visualisations are easily accessible without needing to run the code. Both files must be submitted together.

[ ]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_absolute_error

Data loading

Load the data into a DataFrame called df2:

[ ]:

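# Read the column names (one per line) from the accompanying .col file,
# stripping surrounding whitespace and skipping blank lines.
with open('tao-all2.col') as file:
    names2 = [line.strip() for line in file if line.strip()]

print(names2)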

[ ]:

df2 = pd.read_csv(
    'tao-all2.dat.gz',
    compression='gzip',
    sep=r'\s+',
    na_values='.',
    comment='%',
    header=None,
)
df2.columns = names2

Deleting unused features:

[ ]:

df2 = df2.drop(columns=["obs", "humidity", "date", "year"])

[ ]:

# Drop rows with missing values and report how many were removed.
original_rows = df2.shape[0]
df2 = df2.dropna()
print(f"Rows after dropna: {df2.shape[0]}")
print(f"Deleted rows: {original_rows - df2.shape[0]}")

[ ]:

df2.columns

Definition of train and test sets
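
One possible way to approach this step is sketched below. It is not the required solution: the synthetic array stands in for `df2.to_numpy()`, and the 80/20 split, the window length `SEQ_LEN`, the target column index and the helper `make_windows` are all assumptions made for illustration. The sketch splits the series chronologically, computes min-max statistics on the training part only, and builds sliding windows that predict the next value of the target.

```python
import numpy as np

def make_windows(values, target_idx, seq_len):
    """Turn a (T, F) array into (N, seq_len, F) inputs and (N,) next-step targets."""
    X, y = [], []
    for start in range(len(values) - seq_len):
        X.append(values[start:start + seq_len])          # seq_len consecutive rows
        y.append(values[start + seq_len, target_idx])    # target at the following row
    return np.asarray(X, dtype=np.float32), np.asarray(y, dtype=np.float32)

# Synthetic stand-in for df2.to_numpy(): 100 time steps, 3 features (assumption).
rng = np.random.default_rng(0)
values = rng.normal(size=(100, 3))

# Chronological split without shuffling: the last 20% is held out (assumed ratio).
split = int(len(values) * 0.8)
train_raw, test_raw = values[:split], values[split:]

# Min-max statistics come from the training part only, to avoid leaking test data.
lo, hi = train_raw.min(axis=0), train_raw.max(axis=0)
train_scaled = (train_raw - lo) / (hi - lo)
test_scaled = (test_raw - lo) / (hi - lo)

SEQ_LEN = 10     # hypothetical window length
TARGET_IDX = 2   # hypothetical position of the target column

X_train, y_train = make_windows(train_scaled, TARGET_IDX, SEQ_LEN)
X_test, y_test = make_windows(test_scaled, TARGET_IDX, SEQ_LEN)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
```

The manual scaling here matches what `MinMaxScaler` does with its default range when it is fitted on the training rows and then applied to the test rows. Because the dataset has `latitude` and `longitude` columns, consecutive rows may come from different buoys; in that case the windows would probably need to be built per buoy so that no window mixes locations.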

[ ]:

Models
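
Since the three architectures share the same interface in PyTorch, one option is a single wrapper class that swaps the recurrent layer. The sketch below is only a starting point: the class name `RecurrentRegressor`, the hidden size, and `n_features=8` (all eight remaining columns, including past values of the target) are assumptions, not requirements of the project.

```python
import torch
import torch.nn as nn

class RecurrentRegressor(nn.Module):
    """Recurrent encoder (RNN, LSTM or GRU) followed by a linear head."""

    CELLS = {"rnn": nn.RNN, "lstm": nn.LSTM, "gru": nn.GRU}

    def __init__(self, cell, n_features, hidden_size=32, num_layers=1):
        super().__init__()
        self.rnn = self.CELLS[cell](
            input_size=n_features,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,  # inputs are (batch, seq_len, features)
        )
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # All three layers return (output, state); only the output is used here.
        out, _ = self.rnn(x)                       # (batch, seq_len, hidden_size)
        return self.head(out[:, -1]).squeeze(-1)   # one prediction per sequence

# Hypothetical sizes: 8 input features, windows of 10 steps, batch of 4.
models = {name: RecurrentRegressor(name, n_features=8) for name in ("rnn", "lstm", "gru")}
x = torch.randn(4, 10, 8)
for name, model in models.items():
    n_params = sum(p.numel() for p in model.parameters())
    print(name, tuple(model(x).shape), n_params)
```

Keeping the hidden size and number of layers identical across the three models makes the comparison fairer, although the LSTM and GRU still have more parameters than the plain RNN because of their gates.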

[ ]:

Training
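
A minimal training loop might look like the following sketch. The choice of MSE loss, the Adam optimiser, the learning rate, the number of epochs and the batch size are assumptions; `TinyGRU` and the synthetic tensors exist only to make the example runnable on its own and would be replaced by your own models and windows.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train(model, loader, epochs=5, lr=1e-2):
    """Fit `model` with MSE loss and Adam; return the mean training loss per epoch."""
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    history = []
    for _ in range(epochs):
        model.train()
        total = 0.0
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = criterion(model(xb), yb)
            loss.backward()
            optimizer.step()
            total += loss.item() * len(xb)   # weight by batch size
        history.append(total / len(loader.dataset))
    return history

class TinyGRU(nn.Module):
    """Small stand-in model used only for this demonstration."""
    def __init__(self, n_features, hidden_size=16):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.gru(x)
        return self.head(out[:, -1]).squeeze(-1)

torch.manual_seed(0)
X = torch.randn(256, 10, 3)     # synthetic windows: (samples, seq_len, features)
y = 0.5 * X[:, -1, 0]           # synthetic target that depends on the last step
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = TinyGRU(n_features=3)
history = train(model, loader, epochs=20)
print(f"first epoch loss: {history[0]:.4f}, last epoch loss: {history[-1]:.4f}")
```

Calling the same `train` function with identical settings for the RNN, LSTM and GRU keeps the comparison consistent, and the returned `history` lists can be plotted together with matplotlib.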

[ ]:

Evaluation
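
If the target was scaled before training, the predictions can be mapped back to °C before computing the error, so that the MAE is directly interpretable. The sketch below illustrates the idea with made-up numbers: the scaling bounds `t_lo` and `t_hi`, the dictionary keys and every prediction value are placeholders, not results from any model.

```python
import numpy as np

def inverse_minmax(scaled, lo, hi):
    """Undo min-max scaling for a single column."""
    return np.asarray(scaled) * (hi - lo) + lo

def mae(y_true, y_pred):
    """Mean absolute error, equivalent to sklearn's mean_absolute_error for 1-D inputs."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

# Hypothetical training-set minimum and maximum of the target, in °C.
t_lo, t_hi = 20.0, 30.0

# Made-up scaled targets and predictions, for illustration only.
y_true_scaled = np.array([0.50, 0.60, 0.70, 0.80])
preds_scaled = {
    "model_a": np.array([0.55, 0.60, 0.65, 0.80]),
    "model_b": np.array([0.50, 0.70, 0.70, 0.90]),
}

y_true = inverse_minmax(y_true_scaled, t_lo, t_hi)
scores = {
    name: mae(y_true, inverse_minmax(pred, t_lo, t_hi))
    for name, pred in preds_scaled.items()
}
for name, score in scores.items():
    print(f"{name}: MAE = {score:.2f} °C")
```

In the notebook, `mean_absolute_error` from scikit-learn can replace the `mae` helper, and plotting predicted against observed temperatures for each architecture helps show where the models differ.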

[ ]: