{
"cells": [
{
"cell_type": "markdown",
"id": "740c03d3",
"metadata": {},
"source": [
"\n",
" \n",
""
]
},
{
"cell_type": "markdown",
"id": "6c5c8cc3",
"metadata": {},
"source": [
"# Tokenization\n",
"\n",
"Tokenization is the process of splitting text into units that a language model can process. These units are called tokens, and they are not the same as words.\n",
"\n",
"**Suggested duration:** 30 minutes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"