{ "cells": [ { "cell_type": "markdown", "id": "ab8ae3fe", "metadata": {}, "source": [ "# Prompt Engineering\n", "\n", "This notebook focuses on how to design prompts that produce useful, reproducible, and well-formatted outputs. The central idea is not to find magic words but to reduce ambiguity and give the model a clear target.\n", "\n", "**Suggested duration:** 2.5 hours" ] }, { "cell_type": "markdown", "id": "b29c9187", "metadata": {}, "source": [ "
\n", "Learning goals\n", "\n", "By the end of this notebook, you should be able to:\n", "\n", "- write a structured prompt for a real task\n", "- explain why prompt quality changes model behavior\n", "- control output format with explicit instructions\n", "- distinguish zero-shot, few-shot, and reasoning-oriented prompting\n", "- adapt prompts for smaller or local models\n", "- test prompting techniques with Hugging Face models\n", "- evaluate prompt quality using realistic case studies\n", "
" ] }, { "cell_type": "markdown", "id": "35596c59", "metadata": {}, "source": [ "
\n", "Table of Contents\n", "\n", "1. [Why prompt structure matters](#why-prompt-structure-matters)\n", "2. [Working with external model APIs](#working-with-external-model-apis)\n", "3. [A practical prompt template](#a-practical-prompt-template)\n", "4. [Controlling output format](#controlling-output-format)\n", "5. [Local models](#local-models)\n", "6. [Zero-shot and few-shot](#zero-shot-and-few-shot)\n", "7. [Reasoning strategies](#reasoning-strategies)\n", "8. [From NetCDF data to a textual alert](#netcdf-to-textual-alert)\n", "
" ] }, { "cell_type": "markdown", "id": "5ccf0aba", "metadata": {}, "source": [ "## Why prompt structure matters \n", "\n", "Prompt engineering is the practice of expressing a task so that the model can solve it with fewer wrong assumptions. A vague prompt leaves many decisions open: audience, scope, tone, format, uncertainty, and missing information.\n", "\n", "Consider the difference between these two instructions:\n", "\n", "> Summarize this paper.\n", "\n", "and\n", "\n", "> Summarize the paper for professionals in data science in 5 bullet points. Include the research question, data source, method, main result, and one limitation. Use only the abstract provided below.\n", "\n", "The second prompt is stronger because it defines:\n", "\n", "- the **task**: summarize\n", "- the **audience**: professionals in data science\n", "- the **scope**: use only the abstract\n", "- the **format**: 5 bullet points\n", "- the **quality bar**: include one limitation\n", "\n", "A useful way to think about prompting is that we are designing the model's working conditions. Better working conditions usually produce more stable outputs." 
] }, { "cell_type": "code", "execution_count": null, "id": "ba823521", "metadata": {}, "outputs": [], "source": [ "from textwrap import dedent, fill # utilities to format displayed text\n", "from transformers import pipeline # transformers library contains models and pipelines\n", "# A pipeline is a function that takes a prompt and returns a model's output\n", "# according to the task we want to perform\n", "\n", "weak_prompt = \"Summarize this abstract.\"\n", "\n", "strong_prompt = dedent(\n", " \"\"\"\n", " You are helping with scientific literature review.\n", " Task: summarize the abstract.\n", " Audience: Professionals in data science.\n", " Constraints:\n", " - use only the information in the abstract\n", " - do not invent results\n", " - mention exactly one limitation\n", " Output format:\n", " - 5 bullet points\n", " \"\"\"\n", ").strip()\n", "\n", "# paper: https://www.int-res.com/journals/meps/articles/meps15103\n", "# Adjacent string literals are concatenated, so each piece ends with a space.\n", "abstract = (\n", " \"Connectivity between patchy marine habitats through larval dispersal is crucial for the persistence of local populations. \"\n", " \"Studies of various marine species suggest broad-scale gene flow across the tropical Indo-West Pacific (IWP), \"\n", " \"presumably facilitated by larval dispersal via stepping-stone habitats. \"\n", " \"However, the generational timescales and geographic paths involved in such dispersal remain unclear, owing to limited biophysical modelling studies. \"\n", " \"Here, we quantified connectivity among patchy habitats of the mangrove whelk Terebralia palustris across the IWP using habitat suitability modelling, \"\n", " \"larval dispersal modelling, and mitochondrial DNA-based population genetic analysis. Our modelling revealed a single larval dispersal network connecting all potential habitats across the IWP. \"\n", " \"At least 14 generations were required for dispersal via stepping-stone habitats to connect the outer edges of the IWP. 
\"\n", " \"The Maldives and Seychelles served as key stepping stones for dispersal, linking the western Indian Ocean and the western Pacific Ocean through monsoon-driven ocean currents. \"\n", " \"Major haplotypes were shared across 9 regions of the IWP, providing genetic support for a single larval dispersal network. \"\n", " \"Our findings provide fundamental insights into ecological networks formed by stepping-stone dispersal across the IWP, which maintain broad-scale connectivity of T. palustris and potentially other coastal species.\"\n", ")\n", "\n", "print(\"Weak prompt:\\n\")\n", "print(weak_prompt)\n", "print(\"\\n\" + \"=\" * 70 + \"\\n\")\n", "print(\"Strong prompt:\\n\")\n", "print(strong_prompt)\n", "print(\"\\n\" + \"=\" * 70 + \"\\n\")\n", "print(\"Example input abstract:\\n\")\n", "print(abstract)\n", "\n", "# Suggested model for trying this section:\n", "generator = pipeline(\"text-generation\", model=\"google/flan-t5-small\")\n", "weak_output = generator(f\"{weak_prompt}\\n\\nAbstract: {abstract}\", max_new_tokens=120)[0][\"generated_text\"]\n", "strong_output = generator(f\"{strong_prompt}\\n\\nAbstract: {abstract}\", max_new_tokens=120)[0][\"generated_text\"]\n" ] }, { "cell_type": "code", "execution_count": null, "id": "8006859d", "metadata": {}, "outputs": [], "source": [ "\n", "print(\"\\nModel outputs:\\n\")\n", "print(\"Weak prompt output:\\n\") \n", "#print(fill(weak_output, width=100)) #ALERT the variable contains the prompt & the response! 
(type string)\n", "print(fill(weak_output.split(\"Abstract:\")[1], width=100))\n", "\n", "print(\"\\n\" + \"-\" * 100 + \"\\n\")\n", "print(\"Strong prompt output:\\n\")\n", "print(fill(strong_output.split(\"Abstract:\")[1], width=100))\n", "\n", "# Compare the size of the two output strings (characters and words)\n", "print(\"\\nDifference between the two outputs (as a table):\\n\")\n", "print(\"| Prompt | Length | Words | Chars |\")\n", "print(\"|--------|--------|-------|-------|\")\n", "print(f\"| Weak | {len(weak_output)} | {len(weak_output.split())} | {len(weak_output)} |\")\n", "print(f\"| Strong | {len(strong_output)} | {len(strong_output.split())} | {len(strong_output)} |\")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "c362d977", "metadata": {}, "outputs": [], "source": [ "# Where Hugging Face models are stored (macOS & Linux) and how much disk space the model uses\n", "!MODEL_DIR=\"${HF_HOME:-$HOME/.cache/huggingface}/hub/models--google--flan-t5-small\"; \\\n", "if [ -d \"$MODEL_DIR\" ]; then \\\n", " du -sh \"$MODEL_DIR\"; \\\n", "fi\n" ] }, { "cell_type": "markdown", "id": "d76459b9", "metadata": {}, "source": [ "
\n", "Activity\n", "

A) Try the same prompts (weak and strong) in ChatGPT and compare the outputs.

\n", "

B) Try different models:
\n", "- `google/flan-t5-base`: stronger than the small version, still manageable for demonstrations.
\n", "- `distilgpt2`: useful to show the limits of a non-instruction-tuned model.
\n", "

\n", "
" ] }, { "cell_type": "markdown", "id": "84e11b2c", "metadata": {}, "source": [ "## Working with External Model APIs \n", "\n", "When we use an external model API, our notebook does not run the model locally. Instead, it sends a request to a remote service and receives a generated response.\n", "\n", "### Core concepts\n", "\n", "- **API (Application Programming Interface)**: a structured way for one program to ask another program for a result.\n", "- **Endpoint**: the URL where requests are sent.\n", "- **API key**: a private token used for authentication. It should never be hardcoded in notebooks or pushed to Git.\n", "- **Model name**: the identifier of the model we want to use (for example, `meta/llama-3.1-8b-instruct`).\n", "- **Parameters**: options such as `temperature`, `max_tokens`, and `top_p` that control response style and length.\n", "- **Response object**: the JSON payload returned by the API, containing generated text and metadata.\n", "\n", "### What \"OpenAI-compatible\" means\n", "\n", "Many providers implement an API that follows the same request/response shape as OpenAI's Chat Completions API. This is often called an **OpenAI-compatible standard**.\n", "\n", "In practice, this means you can usually keep the same client code and only change:\n", "\n", "1. `base_url` (the provider endpoint)\n", "2. `api_key` (your provider token)\n", "3. 
`model` (a model available on that provider)\n", "\n" ] }, { "cell_type": "markdown", "id": "cf484f58", "metadata": {}, "source": [ "Let's work with external API providers." ] }, { "cell_type": "markdown", "id": "7024e939", "metadata": {}, "source": [ "### Some (Free) API Providers for Prototyping\n", "\n", "#### Groq\n", "- **Strength**: high-performance inference with very low latency.\n", "- **Typical free limits**: around `14,400 requests/day` and `~30 requests/min` (varies by model/account).\n", "- **Common model families**: Llama 3.1, Mixtral, Gemma.\n", "\n", "#### Google AI Studio (Gemini API)\n", "- **Strength**: easy access to Gemini models for quick experiments.\n", "- **Typical free limits**: around `1,500 requests/day` and `15 requests/min`.\n", "- **Note**: stronger Pro-tier models (for example, Gemini 2.5 Pro) are typically paid.\n", "\n", "#### OpenRouter\n", "- **Strength**: one API interface across many providers/models.\n", "- **Free access**: selected models are marked as free; limits vary by model.\n", "- **Use case**: excellent for fast prototyping and cross-model comparisons.\n", "\n", "#### GitHub Models\n", "- **Strength**: simple model playground/prototyping flow in the GitHub ecosystem.\n", "- **Free access**: free prototyping for many models.\n", "- **Typical constraint**: around `8K` input tokens per request (model-dependent).\n", "\n", "#### Mistral AI (La Plateforme)\n", "- **Strength**: direct access to Mistral-hosted models.\n", "- **Typical free tier**: around `1 request/second` plus monthly token allowances.\n", "\n", "#### Cohere\n", "- **Strength**: clean API and strong enterprise-style NLP tooling.\n", "- **Typical free limits**: around `1,000 requests/month` and `~20 requests/min` (for example, Command R+).\n", "- **Note**: free keys are generally intended for non-commercial use.\n", "\n", "#### Hugging Face Inference API\n", "- **Strength**: access to thousands of open-source models.\n", "- **Free access**: available, but 
effective rate limits can vary with demand/server load.\n", "\n", "> Always verify the official pricing and rate-limit pages before relying on these numbers." ] }, { "cell_type": "markdown", "id": "eec56c0c", "metadata": {}, "source": [ "- Groq\n", "  - High-performance inference\n", "  - ~14,400 requests/day (e.g., Llama 3 8B)\n", "  - ~30 requests/minute limit\n", "  - Models: Llama 3.1, Mixtral, Gemma\n", "- Google AI Studio (Gemini API)\n", "  - Free tier: ~1,500 requests/day, 15 requests/minute\n", "  - Pro models (e.g., Gemini 2.5 Pro) typically paid\n", "- OpenRouter\n", "  - Access to multiple “free” models (DeepSeek R1, Llama 3, Mistral)\n", "  - Limits vary per model\n", "  - Suitable for prototyping\n", "- GitHub Models\n", "  - Free prototyping for 45+ models\n", "  - ~8K input tokens per request\n", "  - Models from Microsoft (Phi), OpenAI (GPT-4o), Meta (Llama)\n", "- Mistral AI (La Plateforme)\n", "  - Free tier with ~1 request/second\n", "  - Generous monthly token limits\n", "- Cohere\n", "  - Free API key for non-commercial use\n", "  - ~1,000 requests/month\n", "  - ~20 requests/minute (e.g., Command R+)\n", "- Hugging Face (Inference API)\n", "  - Free inference on thousands of open-source models\n", "  - Rate limits depend on server load" ] }, { "cell_type": "markdown", "id": "0c4bafcf", "metadata": {}, "source": [ "[OpenRouter](https://openrouter.ai/) is a unified API gateway for language models from multiple providers. 
With one API format, users can switch models quickly and compare prompt behavior across model families.\n", "\n", "Examples of commonly used free-tier model IDs on OpenRouter include:\n", "\n", "- `meta-llama/llama-3.1-8b-instruct:free`: good for structured prompting and lightweight instruction-following tests.\n", "- `mistralai/mistral-7b-instruct:free`: useful for concise prompt/response comparisons with a smaller instruct model.\n", "- `google/gemma-2-9b-it:free`: useful for checking how another model family follows format constraints.\n" ] }, { "cell_type": "markdown", "id": "31156ca8", "metadata": {}, "source": [ "The next two cells load an API key for the current session and then send the weak prompt to a model hosted on OpenRouter." ] }, { "cell_type": "code", "execution_count": null, "id": "42d1350a", "metadata": {}, "outputs": [], "source": [ "# Load the API key without hardcoding it in the notebook.\n", "# Keep the key private: never print it or push it to Git.\n", "\n", "# getpass prompts for the key without echoing it.\n", "# In VS Code, the input box appears at the top of the window.\n", "\n", "from getpass import getpass\n", "import os\n", "\n", "api_key_input = getpass(\"Enter your API key: \").strip()\n", "if not api_key_input:\n", " raise ValueError(\"No API key provided.\")\n", "\n", "# Store it in the current notebook session so other cells can read it.\n", "os.environ[\"API_KEY_COURSE\"] = api_key_input\n", "print(\"API key loaded for this session.\")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "995da8c4", "metadata": {}, "outputs": [], "source": [ "# Example: calling an NVIDIA model on OpenRouter through the OpenAI-compatible API\n", "import os\n", "from openai import OpenAI\n", "\n", "api_key = os.getenv(\"API_KEY_COURSE\")\n", "if not api_key:\n", " raise ValueError(\"Set API_KEY_COURSE by running the previous cell.\")\n", "\n", "client = OpenAI(\n", " base_url=\"https://openrouter.ai/api/v1\",\n", " api_key=api_key,\n", ")\n", "\n", "messages = [\n", " {\n", " \"role\": \"user\",\n", " \"content\": (\n", " weak_prompt +\n", " \"\\n\\nAbstract: \" + abstract\n", " ),\n", " },\n", "]\n", "\n", "response = 
client.chat.completions.create(\n", " model=\"nvidia/nemotron-3-super-120b-a12b:free\",\n", " messages=messages,\n", " temperature=0.3,\n", " max_tokens=180,\n", ")\n", "\n", "print(fill(response.choices[0].message.content, width=100))" ] }, { "cell_type": "code", "execution_count": null, "id": "4ef5efe7", "metadata": {}, "outputs": [], "source": [ "# Text returned by the API call in the previous cell\n", "api_text = response.choices[0].message.content\n", "\n", "print(\"\\nDifference between the outputs:\\n\")\n", "print(\"| Prompt | Length | Words | Chars |\")\n", "print(\"|--------|--------|-------|-------|\")\n", "print(f\"| Weak | {len(weak_output)} | {len(weak_output.split())} | {len(weak_output)} |\")\n", "print(f\"| Strong | {len(strong_output)} | {len(strong_output.split())} | {len(strong_output)} |\")\n", "print(f\"| API | {len(api_text)} | {len(api_text.split())} | {len(api_text)} |\")" ] }, { "cell_type": "code", "execution_count": null, "id": "1b9e0509", "metadata": {}, "outputs": [], "source": [ "response  # What kind of object is this? What does its `choices` attribute contain?" ] }, { "cell_type": "markdown", "id": "c686334b", "metadata": {}, "source": [ "## A practical prompt template \n", "\n", "A strong prompt often contains five building blocks:\n", "\n", "1. **Task**: What should the model do?\n", "2. **Context**: What information should it use?\n", "3. **Constraints**: What must it avoid or prioritize?\n", "4. **Output format**: How should the answer be structured?\n", "5. **Quality criteria**: What makes a good answer?\n", "\n", "A reusable template is:\n", "\n", "```text\n", "You are helping with [domain/task].\n", "Goal: [what to produce].\n", "Context: [relevant background or source text].\n", "Constraints: [rules, exclusions, definitions].\n", "Output format: [table, bullets, JSON, abstract, etc.].\n", "Quality criteria: [what makes the output good].\n", "```\n", "\n", "This template works well because it reduces hidden choices. 
The model does not need to guess whether you want a formal answer, a table, a short summary, or a speculative answer." ] }, { "cell_type": "code", "execution_count": null, "id": "3ba38b68", "metadata": {}, "outputs": [], "source": [ "paper_prompt = dedent(\n", " \"\"\"\n", " You are helping with scientific literature review.\n", " Goal: summarize the paper for an MSc student.\n", " Context: use only the abstract provided below.\n", " Constraints:\n", " - do not invent results that are not explicitly stated\n", " - if information is missing, write 'unknown'\n", " Output format: a Markdown table with columns [Question, Answer].\n", " Quality criteria: concise, accurate, readable, and include one limitation.\n", "\n", " Abstract:\n", " [PASTE ABSTRACT HERE]\n", " \"\"\"\n", ").strip()\n", "\n", "print(paper_prompt)\n", "\n", "# Try with (FLAN-T5 is an encoder-decoder model, so this assumes the\n", "# text2text-generation task available in transformers 4.x):\n", "# generator = pipeline(\"text2text-generation\", model=\"google/flan-t5-base\")\n", "# result = generator(paper_prompt.replace(\"[PASTE ABSTRACT HERE]\", abstract), max_new_tokens=180)\n", "# print(result[0][\"generated_text\"])" ] }, { "cell_type": "markdown", "id": "fde4a82f", "metadata": {}, "source": [ "## Controlling output format \n", "\n", "Format control matters whenever the model output will be reused by a person, a spreadsheet, or another program.\n", "\n", "Common output targets include:\n", "\n", "- **bullet lists** for short summaries\n", "- **Markdown tables** for comparisons\n", "- **JSON** for pipelines and applications\n", "- **paper-style prose** for academic writing\n", "\n", "The more reusable the output must be, the more explicit the format instructions should be.\n", "\n", "Good format instructions often include:\n", "\n", "- the exact structure to return\n", "- field names or column names\n", "- a rule for missing values\n", "- a limit on length\n", "- a reminder not to invent unsupported content\n", "\n", "Example instruction:\n", "\n", "> Return valid JSON with keys: title, method, dataset, main_result, limitation. 
If a field is missing, use `unknown`.\n", "\n", "That last sentence is important because it gives the model a safe behavior when evidence is incomplete." ] }, { "cell_type": "code", "execution_count": null, "id": "ae4f78a2", "metadata": {}, "outputs": [], "source": [ "json_prompt = dedent(\n", " \"\"\"\n", " Extract information from the abstract below.\n", " Return valid JSON with keys:\n", " - title\n", " - method\n", " - dataset\n", " - main_result\n", " - limitation\n", " If a field is missing, use 'unknown'.\n", "\n", " Abstract:\n", " We evaluate a transformer-based classifier for plankton image recognition\n", " using 120,000 labeled images from Mediterranean coastal stations.\n", " The model improves macro-F1 by 8% over a CNN baseline, but performance drops\n", " for rare taxa and low-light images.\n", " \"\"\"\n", ").strip()\n", "\n", "table_prompt = dedent(\n", " \"\"\"\n", " Summarize the abstract below as a Markdown table with two columns:\n", " [Item, Value].\n", " Include rows for task, data, model, result, and limitation.\n", "\n", " Abstract:\n", " We evaluate a transformer-based classifier for plankton image recognition\n", " using 120,000 labeled images from Mediterranean coastal stations.\n", " The model improves macro-F1 by 8% over a CNN baseline, but performance drops\n", " for rare taxa and low-light images.\n", " \"\"\"\n", ").strip()\n", "\n", "# print(\"JSON-oriented prompt:\\n\")\n", "# print(json_prompt)\n", "# print(\"\\n\" + \"=\" * 70 + \"\\n\")\n", "# print(\"Table-oriented prompt:\\n\")\n", "# print(table_prompt)\n", "\n", "# Send the JSON prompt to OpenRouter through the OpenAI-compatible client\n", "\n", "import os\n", "from openai import OpenAI\n", "\n", "api_key = os.getenv(\"API_KEY_COURSE\")\n", "if not api_key:\n", " raise ValueError(\"Set API_KEY_COURSE by running the API key cell first.\")\n", "\n", "client = OpenAI(\n", " base_url=\"https://openrouter.ai/api/v1\",\n", " api_key=api_key,\n", ")\n", "\n", "messages = [\n", " {\n", " \"role\": \"user\",\n", " \"content\": json_prompt,\n", " },\n", 
"]\n", "\n", "response = client.chat.completions.create(\n", " model=\"nvidia/nemotron-3-super-120b-a12b:free\",\n", " messages=messages,\n", ")\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "id": "f8c4121f", "metadata": {}, "outputs": [], "source": [ "print(fill(response.choices[0].message.content, width=100))" ] }, { "cell_type": "markdown", "id": "bbca73fd", "metadata": {}, "source": [ "
\n", "Activity\n", "

Send `table_prompt` to a different free model and compare the result with the JSON output above.

\n", "
" ] }, { "cell_type": "markdown", "id": "9c036838", "metadata": {}, "source": [ "## Using prompts with local models (Ollama, llama.cpp, ... )\n", "\n", "Local models are attractive for privacy, cost control, and offline use, but they often require more careful prompting than larger hosted systems.\n", "\n", "Why? Smaller or locally deployed models usually:\n", "\n", "- have weaker instruction-following behavior\n", "- are more sensitive to ambiguous prompts\n", "- may struggle with long context or complex formatting\n", "- benefit more from explicit examples\n" ] }, { "cell_type": "code", "execution_count": null, "id": "27c85344", "metadata": {}, "outputs": [], "source": [ "import requests\n", "from openai import OpenAI\n", "\n", "# Local Ollama server (start with: ollama serve)\n", "OLLAMA_BASE_URL = \"http://localhost:11434\"\n", "\n", "# 1) Discover locally installed models\n", "try:\n", " tags_response = requests.get(f\"{OLLAMA_BASE_URL}/api/tags\", timeout=10)\n", " tags_response.raise_for_status()\n", " local_models = [m[\"name\"] for m in tags_response.json().get(\"models\", [])]\n", "except Exception as exc:\n", " raise RuntimeError(\n", " \"Could not connect to local Ollama at http://localhost:11434. \"\n", " \"Run `ollama serve` and ensure Ollama is installed.\"\n", " ) from exc\n", "\n", "print(\"Local Ollama models:\\n\")\n", "for model_name in local_models:\n", " print(f\"- {model_name}\")\n", "\n", "if not local_models:\n", " raise ValueError(\"No local models found. Pull one first, e.g. 
`ollama pull llama3.2:3b`.\")\n", "\n", "# 2) Choose a model\n", "# Example: the author's machine lists two local Ollama models:\n", "#  - 0: gemma4:e4b\n", "#  - 1: deepseek-coder-v2:16b\n", "# IMPORTANT: the list, and therefore index 0, differs on each machine.\n", "\n", "selected_model = local_models[0] # Let's see what happens with the first model\n", "print(f\"\\nSelected model: {selected_model}\")\n", "\n", "# 3) Prompt fallback (in case json_prompt was not executed earlier)\n", "prompt_text = globals().get(\n", " \"json_prompt\",\n", " \"Classify this abstract as one of: observation, experiment, review. Return only one label.\",\n", ")\n", "\n", "# 4) Call Ollama through its OpenAI-compatible endpoint\n", "client = OpenAI(base_url=f\"{OLLAMA_BASE_URL}/v1\", api_key=\"ollama\")\n", "response = client.chat.completions.create(\n", " model=selected_model,\n", " messages=[\n", " {\"role\": \"system\", \"content\": \"You are a concise scientific assistant.\"},\n", " {\"role\": \"user\", \"content\": prompt_text},\n", " ],\n", " temperature=0.2,\n", " max_tokens=120,\n", ")\n", "\n", "# 5) Debug-friendly print\n", "choice = response.choices[0]\n", "content = choice.message.content\n", "\n", "print(\"\\nModel response:\\n\")\n", "if content and content.strip():\n", " print(content)\n", "else:\n", " print(\"(empty content returned)\") # the model returned no text\n", " print(f\"finish_reason: {choice.finish_reason}\")\n", " print(\"raw choice:\", choice)" ] }, { "cell_type": "markdown", "id": "1912c93b", "metadata": {}, "source": [ "Notes about the previous results:\n", "- With model 1 (`deepseek-coder-v2:16b`), what happens to the `title` field?" 
] }, { "cell_type": "markdown", "id": "b65f2253", "metadata": {}, "source": [ "### Why can local Ollama inference feel slow?\n", "\n", "Local inference can be slower than hosted APIs because everything runs on your own machine, which has less compute and memory than cloud inference clusters.\n", "\n", "Common reasons:\n", "- **Model size vs hardware**: larger models (for example, 16B+) require much more VRAM/RAM and compute per token.\n", "- **CPU fallback**: if the model does not fully fit in GPU memory, part of the workload can run on CPU, which is much slower.\n", "- **First-run overhead**: loading weights and warming up kernels adds startup latency.\n", "- **Token-by-token generation**: generation is sequential; longer outputs (`max_tokens`) take more time.\n", "- **Resource contention**: browser, IDE, and notebook processes compete for local CPU/GPU/RAM.\n", "\n", "Practical speed tips:\n", "- Use smaller models for class demos (for example, 3B-8B).\n", "- Keep prompts concise and lower `max_tokens`.\n", "- Close heavy background apps.\n", "- Prefer quantized models when available.\n", "- Run requests one at a time during exercises." ] }, { "cell_type": "markdown", "id": "d075c6e5", "metadata": {}, "source": [ "## Zero-shot and few-shot \n", "\n", "### Zero-shot\n", "\n", "In zero-shot prompting, we describe the task without giving examples. This works well when the task is familiar, the labels are clear, and the model already knows the pattern.\n", "\n", "Example:\n", "\n", "```text\n", "Classify each abstract as observation, experiment, or review.\n", "```\n", "\n", "### Few-shot\n", "\n", "In few-shot prompting, we provide small examples that demonstrate the intended behavior. Few-shot prompting is useful when:\n", "\n", "- labels are subtle\n", "- the desired style matters\n", "- there are hidden conventions\n", "- the model tends to confuse nearby categories\n", "\n", "Few-shot prompting teaches by demonstration instead of only by instruction." 
] }, { "cell_type": "code", "execution_count": null, "id": "ae1d4cfe", "metadata": {}, "outputs": [], "source": [ "zero_shot_prompt = dedent(\n", " \"\"\"\n", " Classify the following abstract as one of: observation, experiment, review.\n", " Return only the label.\n", "\n", " Abstract:\n", " We tested nutrient uptake in mesocosms under different temperature conditions.\n", " \"\"\"\n", ").strip()\n", "\n", "few_shot_prompt = dedent(\n", " \"\"\"\n", " Classify each abstract as one of: observation, experiment, review.\n", " Return only the label.\n", "\n", " Example 1:\n", " Abstract: We measured salinity and chlorophyll trends from coastal buoys over ten years.\n", " Label: observation\n", "\n", " Example 2:\n", " Abstract: We manipulated nutrient concentration in mesocosms and compared growth rates.\n", " Label: experiment\n", "\n", " Example 3:\n", " Abstract: This paper synthesizes recent studies on marine heatwaves.\n", " Label: review\n", "\n", " Now classify:\n", " Abstract: We tested nutrient uptake in mesocosms under different temperature conditions.\n", " Label:\n", " \"\"\"\n", ").strip()\n", "\n", "print(\"Zero-shot prompt:\\n\")\n", "print(zero_shot_prompt)\n", "print(\"\\n\" + \"=\" * 70 + \"\\n\")\n", "print(\"Few-shot prompt:\\n\")\n", "print(few_shot_prompt)\n", "\n", "from openai import OpenAI\n", "\n", "# Reuses OLLAMA_BASE_URL from the local Ollama cell above.\n", "client = OpenAI(base_url=f\"{OLLAMA_BASE_URL}/v1\", api_key=\"ollama\")\n", "model = \"deepseek-coder-v2:16b\" # change this to a model listed in local_models on your machine\n", "\n", "for label, prompt in [(\"zero-shot\", zero_shot_prompt), (\"few-shot\", few_shot_prompt)]:\n", " r = client.chat.completions.create(\n", " model=model,\n", " messages=[{\"role\": \"user\", \"content\": prompt}],\n", " temperature=0.2,\n", " max_tokens=40,\n", " )\n", " print(f\"\\n{label}:\\n{r.choices[0].message.content}\")\n", "# print(generator(few_shot_prompt, max_new_tokens=40)[0][\"generated_text\"])" ] }, { "cell_type": "markdown", "id": "82adca26", "metadata": {}, "source": [ "## Reasoning 
strategies \n", "\n", "### Chain-of-Thought\n", "\n", "Chain-of-Thought prompting encourages the model to decompose a problem into intermediate steps. This can help when the task benefits from explicit decomposition, such as multi-step classification, planning, or error analysis.\n", "\n", "In real applications, a safer pattern is often to ask for:\n", "\n", "- a short reasoning summary\n", "- a checklist of criteria\n", "- intermediate outputs\n", "- a final answer in a fixed format\n", "\n", "instead of asking for a long unrestricted reasoning trace.\n", "\n", "### Tree-of-Thought\n", "\n", "Tree-of-Thought generalizes this idea by exploring several candidate paths before selecting one. It is useful for open-ended tasks such as planning, hypothesis generation, or comparing alternative strategies.\n", "\n", "It is more expensive and slower than simple prompting, so it should be used only when the task really needs multiple alternatives." ] }, { "cell_type": "code", "execution_count": null, "id": "58311858", "metadata": {}, "outputs": [], "source": [ "chain_of_thought_prompt = dedent(\n", " \"\"\"\n", " Read the abstract and decide whether it is observation, experiment, or review.\n", " First list the cues that support your decision.\n", " Then return the final label on a final line as:\n", " Final label: