Learning LLMs & AI Agents
A Curated Field Guide · Edition v1.0.0 · March 3, 2026 · Living Document
Preface
This document is a version-controlled reference manual and curriculum architecture for understanding, building, and deploying Large Language Models (LLMs) and Agentic architectures. It is designed to be maintainable, verifiable, and structurally durable.
This is Edition 1.0.0. As the field accelerates, individual resources will deprecate rapidly. This guide relies on a stable ID system (e.g., VID-001, PAP-003) so sources can be hot-swapped in the Master Source Catalog without breaking the underlying curriculum.
Who this is for: Engineers, technical product managers, and researchers transitioning into AI engineering. It assumes fundamental programming knowledge but no prior deep learning expertise.
How to Use This Guide
Start with a goal. Find your row in the matrix below, then follow the path. Resource IDs (e.g., VID-001) correspond to entries in Appendix A.
| Goal | Start Here | Then | Advanced | Band |
|---|---|---|---|---|
| Understand LLMs | VID-001 BOK-001 | VID-002 COU-012 | BOK-002 BOK-003 | Beginner |
| Master Prompting | REP-003 PAP-004 | PAP-005 | PAP-006 | Beginner |
| Build first agent | GUI-001 GUI-005 | VID-005 REP-002 | COU-001 VID-007 | Intermediate |
| Master RAG | PAP-007 COU-003 | COU-004 | COU-006 | Intermediate |
| Multi-agent systems | PAP-002 VID-003 | COU-011 COU-014 | REP-001 GUI-002 | Advanced |
| Evaluate & Deploy | COU-009 VID-004 | COU-008 REP-010 | GUI-003 REP-008 | Advanced |
Learning Map
- Mental Models – What LLMs are
- Prompting as Interface Design
- Tool Use & Structured Outputs
- RAG Systems
- Agents & Agentic Architectures
- Multi-Agent Systems & Memory
- Evaluation & Reliability
- LLMOps & Deployment
- Security & Prompt Injection
- Capstone Builds
Starter Glossary
Agent – An AI system that uses an LLM as its reasoning engine to determine which actions to take and in what order, often interacting with external tools.
Large Language Model (LLM) – A deep learning model trained on vast text to predict the next token, enabling it to generate human-like text and perform reasoning tasks.
Retrieval-Augmented Generation (RAG) – Grounds an LLM's responses in external knowledge retrieved at runtime.
Model Context Protocol (MCP) – An open standard enabling secure two-way connections between data sources and AI models.
Embeddings – Numerical representations of text in a high-dimensional vector space, capturing semantic meaning.
→ Full glossary in Appendix B
Foundations
🟢 Beginner · 1.1 – The Mental Model of Large Language Models
Learning Objectives
- Understand the fundamental mechanism of next-token prediction
- Differentiate between base models and instruction-tuned models
- Recognize hardware and data requirements for training
A Large Language Model is a neural network, typically based on the Transformer architecture, optimized to predict the most probable subsequent token given a sequence of preceding tokens.
Core Ideas: LLMs are not knowledge databases β they are statistical reasoning engines. Their primary capability is pattern recognition and generation. The shift from base models (which simply continue text) to chat models (which follow instructions) is achieved through Fine-Tuning and Reinforcement Learning from Human Feedback (RLHF).
Exercise: Access a base model and an instruction-tuned model via API or open-source weights. Feed both the same prompt: "The capital of France is". Observe how the base model continues the sentence, while the tuned model answers the implied question.
Pitfall: Treating the LLM as a factual lookup engine. Detection cue: You are surprised when the model fabricates a plausible-sounding but incorrect URL or historical date.
- I can explain next-token prediction to a non-technical peer
- I understand the difference between pre-training and fine-tuning
- I know why hallucinations are a feature of the architecture, not a bug
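The next-token mechanism above can be sketched with a toy vocabulary. The words and logit values below are invented for illustration; real models score tens of thousands of tokens, but the softmax-then-pick step is the same.

```python
import math

# Toy illustration of next-token prediction: a model maps a context to
# logits (raw scores) over its vocabulary, and softmax turns those logits
# into a probability distribution. Vocabulary and logits are made up.
VOCAB = ["Paris", "London", "Berlin", "a", "the"]

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_next_token(logits):
    """Pick the highest-probability token (temperature-0 decoding)."""
    probs = softmax(logits)
    best = probs.index(max(probs))
    return VOCAB[best], probs[best]

# Hypothetical logits for the context "The capital of France is":
token, prob = greedy_next_token([4.1, 1.2, 0.9, 0.3, 0.2])
print(token, round(prob, 3))
```

Sampling from the distribution instead of always taking the maximum is what makes generation non-deterministic at higher temperatures.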
Sources
Prompting & Interaction Design
🟢 Beginner · 2.1 – Structuring Context and Reasoning
Learning Objectives
- Master zero-shot and few-shot prompting techniques
- Implement Chain-of-Thought reasoning to improve output reliability
- Design prompts as deterministic interfaces
Prompt Engineering is the systematic process of designing, structuring, and optimizing inputs to an LLM to elicit accurate, structured, and predictable outputs.
Core Ideas: Models perform better when given space to "think." Techniques like Chain-of-Thought (CoT) force the model to output intermediate reasoning steps, vastly reducing logic errors.
Exercise: Write a prompt asking an LLM to solve a logic puzzle. First, ask for just the answer. Second, append "Think step-by-step before answering." Compare reliability.
Pitfall: Writing polite, conversational requests instead of structured commands. Detection cue: Your prompts include "Could you please..." instead of structured <instruction> tags.
- I separate instructions from data using delimiters (XML tags)
- I provide few-shot examples for complex formatting tasks
- I utilize Chain-of-Thought for tasks requiring logic or math
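The checklist above can be folded into one template function. This is a minimal sketch: the tag names (`<instruction>`, `<example>`, `<data>`, `<reasoning>`, `<answer>`) are illustrative conventions, not a required format, and the resulting string would be sent to whatever model API you use.

```python
# Build a prompt that separates instructions from data with XML-style
# delimiters, includes few-shot examples, and requests Chain-of-Thought.
def build_prompt(task: str, data: str, examples: list[tuple[str, str]]) -> str:
    shots = "\n".join(
        f"<example>\n<input>{inp}</input>\n<output>{out}</output>\n</example>"
        for inp, out in examples
    )
    return (
        "<instruction>\n"
        f"{task}\n"
        "Think step-by-step inside <reasoning> tags, then give the final\n"
        "answer inside <answer> tags.\n"
        "</instruction>\n"
        f"{shots}\n"
        f"<data>\n{data}\n</data>"
    )

prompt = build_prompt(
    task="Classify the sentiment of the review as positive or negative.",
    data="The battery died after two days.",
    examples=[("Loved it!", "positive")],
)
print(prompt)
```

Because the template is deterministic, the same inputs always yield the same prompt, which is what makes prompts versionable and testable like any other interface.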
Sources
Retrieval-Augmented Generation
🟡 Intermediate · 3.1 – Grounding Models in External Data
Learning Objectives
- Understand the architecture of a standard RAG pipeline
- Generate embeddings and store them in a vector database
- Execute semantic search to retrieve context for an LLM
RAG is a framework that retrieves relevant facts from an external knowledge base to ground large language models on the most accurate, up-to-date information before generating an answer.
Core Ideas: Models are frozen in time. RAG solves this by passing relevant documents into the context window at runtime. The process: Chunk → Embed → Index → Retrieve → Generate.
Exercise: Take a long PDF. Chunk it into 500-token segments, generate embeddings for each, then write a script to find the most similar chunk to a user query using cosine similarity.
Pitfall: Using overly large chunk sizes. Detection cue: The retrieved context contains the answer but the LLM ignores it – the "Lost in the Middle" phenomenon.
- I can explain the difference between lexical and semantic search
- I understand how chunking strategies impact retrieval quality
- I know how to calculate cosine similarity between two vectors
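The Chunk → Embed → Retrieve steps can be sketched end-to-end with a toy bag-of-words "embedding". Treat `embed` as a placeholder: a real pipeline would call an embedding model and get dense vectors, but the cosine-similarity retrieval logic is the same.

```python
import math
from collections import Counter

def chunk(text: str, size: int = 50) -> list[str]:
    """Split text into chunks of roughly `size` words (a stand-in for tokens)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    """Toy embedding: a word-count vector. Real systems use dense model vectors."""
    return Counter(text.lower().replace(".", " ").replace(",", " ").split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity = dot(a, b) / (|a| * |b|)."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str]) -> str:
    """Return the chunk most semantically similar to the query."""
    q = embed(query)
    return max(chunks, key=lambda c: cosine(q, embed(c)))

docs = chunk("Paris is the capital of France. " * 3
             + "Rust prevents data races. " * 3, size=6)
print(retrieve("what is the capital of France", docs))
```

Swapping `embed` for a model-backed function and `max(...)` for a vector-database query turns this sketch into the standard RAG retrieval stage.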
Sources
Agents & Tool Orchestration
🟡 Intermediate · 4.1 – The Agentic Paradigm
Learning Objectives
- Define what makes a system "agentic"
- Implement the ReAct (Reason + Act) pattern
- Provide an LLM with external tools: APIs, calculators, code interpreters
An Agent is a system where an LLM is given an objective, a set of tools, and a loop allowing it to independently reason, execute actions, observe results, and iterate until the objective is met.
Core Ideas: The ReAct pattern is foundational β the model Reasons about what to do, chooses an Action (tool call), Observes the output, and loops. MCP standardizes how agents connect to external data.
Exercise: Build a basic ReAct loop in Python without an agent framework. Define one tool (e.g., a current-weather function). Write a while loop that prompts the LLM, parses its output for a tool call, executes the function, and feeds the result back.
Pitfall: Overloading the agent with too many tools. Detection cue: The model enters an infinite loop, continuously calling the wrong tool or hallucinating tool arguments.
- I understand the ReAct framework
- I can write a system prompt defining available tools and their schemas
- I understand the principles behind the Model Context Protocol (MCP)
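The exercise above can be sketched with the LLM replaced by a scripted stub (`fake_llm`) so the control flow runs offline. The JSON tool-call convention here is an assumption for illustration, not a fixed standard; a real implementation would swap the stub for an API call and parse the provider's tool-call format.

```python
import json

def get_weather(city: str) -> str:
    """The single tool exposed to the agent (canned data for this sketch)."""
    return {"Paris": "18C, cloudy"}.get(city, "unknown")

TOOLS = {"get_weather": get_weather}

def fake_llm(transcript: str) -> str:
    """Stub: requests the tool first, then answers once an observation exists."""
    if "Observation:" not in transcript:
        return json.dumps({"thought": "I need the weather.",
                           "action": "get_weather", "args": {"city": "Paris"}})
    return json.dumps({"thought": "I have the data.",
                       "final_answer": "It is 18C and cloudy in Paris."})

def run_agent(goal: str, max_steps: int = 5) -> str:
    transcript = f"Goal: {goal}"
    for _ in range(max_steps):
        step = json.loads(fake_llm(transcript))   # Reason
        if "final_answer" in step:                # objective met: stop
            return step["final_answer"]
        result = TOOLS[step["action"]](**step["args"])  # Act
        transcript += f"\nObservation: {result}"        # Observe, then loop
    return "Stopped: step limit reached."         # guard against infinite loops

print(run_agent("What's the weather in Paris?"))
```

Note the `max_steps` guard: it is the simplest defense against the runaway-loop pitfall described above.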
Sources
Multi-Agent Systems & Memory
🔴 Advanced · 5.1 – Collaborative AI Architecture
Learning Objectives
- Design workflows requiring multiple specialized agents
- Implement memory systems: short-term context vs. long-term vector storage
- Structure agent-to-agent communication
Multi-Agent Systems (MAS) orchestrate multiple distinct AI agents, each with specific roles, system prompts, and tools, collaborating to solve tasks too complex for a single agent.
Exercise: Design an architecture diagram for an automated software development team. Map out roles (PM, Coder, Reviewer), the specific tools each role needs, and the flow of information between them.
Pitfall: Lack of termination criteria. Detection cue: Agents get stuck in a conversational loop, endlessly agreeing or passing the same data back without progressing the task.
- I can identify when a task requires multi-agent vs. single-agent
- I understand how to implement an orchestrator/router pattern
- I can manage conversational state without exceeding context window limits
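The orchestrator/router pattern from the checklist can be sketched as follows. The "agents" are plain functions standing in for LLM-backed workers, and the keyword routing is a deliberately naive placeholder for an LLM-based router; role names and the handoff limit are illustrative assumptions.

```python
def coder_agent(task: str) -> str:
    """Specialist stub: in a real system, an LLM with a coding system prompt."""
    return f"[coder] drafted code for: {task}"

def reviewer_agent(task: str) -> str:
    """Specialist stub: an LLM with a code-review system prompt."""
    return f"[reviewer] reviewed: {task}"

# The router's dispatch table: which specialist handles which kind of work.
ROUTES = {"implement": coder_agent, "review": reviewer_agent}

def orchestrate(task: str, max_handoffs: int = 3) -> list[str]:
    """Route a task through specialized agents; the hard handoff limit acts
    as a termination criterion so agents cannot loop forever."""
    log = []
    for verb, agent in ROUTES.items():
        if verb in task.lower():
            log.append(agent(task))
    if not log:  # no specialist matched: fail loudly instead of looping
        log.append("[orchestrator] no matching specialist; escalate to human")
    return log[:max_handoffs]

for line in orchestrate("implement and review the login endpoint"):
    print(line)
```

The design choice worth noting is that routing, termination, and escalation live in the orchestrator, not in any individual agent, which keeps each specialist simple.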
Sources
Evaluation & LLMOps
🔴 Advanced · 6.1 – Reliability and Productionization
Learning Objectives
- Implement "LLM-as-a-Judge" evaluation metrics
- Design logging and tracing systems for agent workflows
- Manage prompt versioning and regression testing
LLMOps comprises the operational capabilities, infrastructure, and practices required to manage the lifecycle of LLM applications, from prompt engineering to production deployment and monitoring.
Exercise: Create an evaluation dataset of 10 queries and 10 ideal responses. Write a script that passes your system's outputs to an "evaluator model," prompting it to score 1–5 based on accuracy and conciseness.
Pitfall: Deploying straight from a playground to production. Detection cue: You have no visibility into the actual prompts your system generates on behalf of users, or the latency of external tool calls.
- I have baseline metrics for my task (retrieval precision, generation accuracy)
- I have implemented tracing to monitor token usage and cost
- I have automated evaluation pipelines before pushing prompt changes
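The evaluation exercise above can be skeletoned like this. `judge` is a deterministic stub for demonstration; a real LLM-as-a-Judge would prompt an evaluator model with a rubric and parse its 1–5 score from the response. The dataset rows are invented examples.

```python
import json

# Tiny evaluation dataset: query, ideal answer, and the system's output.
DATASET = [
    {"query": "capital of France?", "ideal": "Paris", "output": "Paris."},
    {"query": "2 + 2?", "ideal": "4", "output": "five"},
]

def judge(query: str, ideal: str, output: str) -> int:
    """Stub judge: containment match scores 5, otherwise 1. In practice this
    function would call an evaluator LLM with a scoring rubric."""
    return 5 if ideal.lower() in output.lower() else 1

def evaluate(dataset) -> dict:
    """Run every row through the judge and aggregate into a report."""
    scores = [judge(**row) for row in dataset]
    return {"mean_score": sum(scores) / len(scores), "n": len(scores)}

report = evaluate(DATASET)
print(json.dumps(report))
```

Wiring `evaluate` into CI so it runs before every prompt change is the smallest useful regression gate: a drop in `mean_score` blocks the change.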
Sources
Appendix A β Master Source Catalog
Single source of truth. All additions, deprecations, and updates happen here first.
Short links (lnkd.in) require manual verification. Mark any unresolved link ⚠️ Verify in the Status column after each patch update. See Appendix D for protocol.
| ID | Title | Type | Author | Difficulty | Time | Status | URL |
|---|---|---|---|---|---|---|---|
| VID-001 | LLM Introduction | Video | Unknown | Beginner | 1h | Core | 🔗 |
| VID-002 | LLMs from Scratch | Video | Unknown | Advanced | 2h | Core | 🔗 |
| VID-003 | Agentic AI Overview (Stanford) | Video | Stanford | Intermediate | 1h | Core | 🔗 |
| VID-004 | Building and Evaluating Agents | Video | Unknown | Advanced | 1h | Core | 🔗 |
| VID-005 | Building Effective Agents | Video | Unknown | Intermediate | 1h | Core | 🔗 |
| VID-006 | Building Agents with MCP | Video | Unknown | Intermediate | 1h | Core | 🔗 |
| VID-007 | Building an Agent from Scratch | Video | Unknown | Intermediate | 1h | Core | 🔗 |
| VID-008 | Philo Agents (Playlist) | Video | Unknown | Intermediate | 2h | Optional | 🔗 |
| REP-001 | GenAI Agents | Repo | Nirdiamant | Intermediate | – | Core | 🔗 |
| REP-002 | AI Agents for Beginners | Repo | Microsoft | Beginner | – | Core | 🔗 |
| REP-003 | Prompt Engineering Guide | Repo | Unknown | Beginner | 4h | Core | 🔗 ⚠️ |
| GUI-001 | Google's Agent Whitepaper | Guide | Google | Intermediate | 1h | Core | 🔗 ⚠️ |
| GUI-002 | Google's Agent Companion | Guide | Google | Intermediate | 1h | Core | 🔗 ⚠️ |
| GUI-003 | Building Effective Agents | Guide | Anthropic | Intermediate | 1h | Core | 🔗 ⚠️ |
| GUI-004 | Claude Code Best Agentic Practices | Guide | Anthropic | Intermediate | 1h | Core | 🔗 ⚠️ |
| GUI-005 | OpenAI's Practical Guide to Building Agents | Guide | OpenAI | Intermediate | 1h | Core | 🔗 ⚠️ |
| BOK-001 | Understanding Deep Learning | Book | Unknown | Intermediate | 20h+ | Core | 🔗 |
| BOK-002 | Building an LLM from Scratch | Book | Unknown | Advanced | 15h+ | Core | 🔗 ⚠️ |
| BOK-003 | The LLM Engineering Handbook | Book | Unknown | Advanced | 15h+ | Core | 🔗 ⚠️ |
| BOK-004 | AI Agents: The Definitive Guide | Book | Nicole Koenigstein | Intermediate | 10h+ | Core | 🔗 ⚠️ |
| BOK-005 | Building Applications with AI Agents | Book | Michael Albada | Intermediate | 10h+ | Optional | 🔗 ⚠️ |
| BOK-006 | AI Agents with MCP | Book | Kyle Stratis | Intermediate | 10h+ | Optional | 🔗 ⚠️ |
| BOK-007 | AI Engineering | Book | O'Reilly | Advanced | 15h+ | Core | 🔗 |
| PAP-001 | ReAct: Synergizing Reasoning and Acting | Paper | Unknown | Advanced | 2h | Core | 🔗 ⚠️ |
| PAP-002 | Generative Agents | Paper | Unknown | Advanced | 2h | Core | 🔗 ⚠️ |
| PAP-003 | Toolformer | Paper | Unknown | Advanced | 2h | Core | 🔗 ⚠️ |
| PAP-004 | Chain-of-Thought Prompting | Paper | Unknown | Intermediate | 1h | Core | 🔗 ⚠️ |
| PAP-005 | Tree of Thoughts | Paper | Unknown | Advanced | 2h | Core | 🔗 ⚠️ |
| PAP-006 | Reflexion | Paper | Unknown | Advanced | 2h | Core | 🔗 ⚠️ |
| PAP-007 | RAG Survey | Paper | Unknown | Intermediate | 2h | Core | 🔗 ⚠️ |
| COU-001 | HuggingFace Agent Course | Course | HuggingFace | Intermediate | 8h | Core | 🔗 ⚠️ |
| COU-002 | MCP with Anthropic | Course | Anthropic | Intermediate | 4h | Core | 🔗 ⚠️ |
| COU-003 | Building Vector DBs with Pinecone | Course | Pinecone | Intermediate | 5h | Core | 🔗 ⚠️ |
| COU-004 | Vector DBs from Embeddings to Apps | Course | Unknown | Intermediate | 5h | Core | 🔗 ⚠️ |
| COU-005 | Agent Memory | Course | Unknown | Intermediate | 3h | Core | 🔗 ⚠️ |
| COU-006 | Building and Evaluating RAG Apps | Course | Unknown | Advanced | 5h | Core | 🔗 ⚠️ |
| COU-007 | Building Browser Agents | Course | Unknown | Advanced | 4h | Optional | 🔗 ⚠️ |
| COU-008 | LLMOps | Course | Unknown | Advanced | 6h | Core | 🔗 ⚠️ |
| COU-009 | Evaluating AI Agents | Course | Unknown | Advanced | 4h | Core | 🔗 ⚠️ |
| COU-010 | Computer Use with Anthropic | Course | Anthropic | Advanced | 4h | Optional | 🔗 ⚠️ |
| COU-011 | Multi-Agent Use | Course | Unknown | Advanced | 4h | Core | 🔗 ⚠️ |
| COU-012 | Improving LLM Accuracy | Course | Unknown | Intermediate | 4h | Core | 🔗 ⚠️ |
| COU-013 | Agent Design Patterns | Course | Unknown | Advanced | 5h | Core | 🔗 ⚠️ |
| COU-014 | Multi Agent Systems | Course | Unknown | Advanced | 4h | Core | 🔗 ⚠️ |
| NEW-001 | Gradient Ascent | Newsletter | Unknown | – | – | Watchlist | 🔗 ⚠️ |
| NEW-002 | DecodingML by Paul | Newsletter | Paul | – | – | Watchlist | 🔗 ⚠️ |
| NEW-003 | Deep (Learning) Focus by Cameron | Newsletter | Cameron | – | – | Watchlist | 🔗 ⚠️ |
| NEW-004 | NeoSage by Shivani | Newsletter | Shivani | – | – | Watchlist | 🔗 |
| NEW-005 | Jam with AI | Newsletter | Shirin & Shantanu | – | – | Watchlist | 🔗 ⚠️ |
| NEW-006 | Data Hustle by Sai | Newsletter | Sai | – | – | Watchlist | 🔗 ⚠️ |
⚠️ = Short link pending full URL verification. Resolve in next patch update.
Appendix B β Full Glossary
Agent – An AI system that uses an LLM to dynamically determine a sequence of actions, often utilizing external tools to achieve a goal.
Chain-of-Thought (CoT) – A prompting technique that instructs the model to generate intermediate reasoning steps before arriving at a final answer, significantly improving logic performance.
Context Window – The maximum number of tokens an LLM can process in a single prompt-response interaction.
Embeddings – High-dimensional vector representations of text. Text with similar semantic meaning will have vectors located close together in space.
Fine-Tuning – Taking a pre-trained base model and training it further on a smaller, specific dataset to specialize its behavior or improve instruction adherence.
Hallucination – When an LLM generates a response that sounds plausible but is factually incorrect or unsupported by its training data or context.
Large Language Model (LLM) – A massive neural network trained on vast text corpora to predict next-token probabilities.
LLMOps – The practices and tools used to deploy, manage, evaluate, and scale LLM applications in production reliably.
Model Context Protocol (MCP) – An open standard protocol designed to securely connect AI models with external data sources and tools.
Prompt Engineering – The iterative process of structuring text input to effectively communicate with and guide the outputs of generative models.
Prompt Injection – A security vulnerability where malicious input overrides system prompt instructions, causing the model to execute unintended behaviors.
Retrieval-Augmented Generation (RAG) – Grounding an LLM on external knowledge retrieved dynamically from a database to prevent hallucinations and access real-time data.
ReAct – A foundational agent architecture combining "Reasoning" (thinking about what to do) and "Acting" (using a tool), iteratively.
Reflexion – A framework where an agent evaluates its own past actions and outcomes, generating verbal reinforcement to improve future attempts.
Tool Calling (Function Calling) – The ability of an LLM to output structured data specifying a function name and arguments, allowing the system to trigger external software.
Tree of Thoughts (ToT) – An advanced reasoning technique extending CoT by allowing the model to explore multiple reasoning paths concurrently, evaluating and pruning them.
Vector Database – A specialized database optimized for storing, managing, and performing similarity searches on embedding vectors.
Appendix C β Changelog
v1.0.0 – March 3, 2026: Initial release. Defined 6 core learning chapters. Established Tag Taxonomy. Imported 50+ resources into catalog. Short links pending full metadata resolution.
Template for future entries:
vX.X.X – [Date]
- [Added/Removed/Updated] [ID] – [Reason]
- [Structural changes, if any]
Appendix D β Editor's Protocol
Adding a Resource
- Ensure the resource fits exactly one primary Tag from the taxonomy
- Generate a sequential stable ID (e.g., if last repo is REP-012, new is REP-013)
- Validate the URL resolves and note paywall status
- Create the Per-Resource Entry Block and assign to the correct Chapter
- Add a row to Appendix A (Master Source Catalog)
- Log the addition in Appendix C with a patch version bump (v1.0.x)
Deprecating a Resource
- Do not delete from Appendix A – change Status to Deprecated
- Add a note pointing to the replacement ID (e.g., "Replaced by PAP-008")
- Remove the Entry Block from the active Chapter body
- Log in Appendix C
Versioning Principles
| Version Bump | Trigger |
|---|---|
| v2.0.0 | Major structural curriculum shift |
| v1.1.0 | Adding or removing Core resources |
| v1.0.1 | Fixing links, typos, metadata |
Appendix E β Style Guide & Tag Taxonomy
Tag Taxonomy
Use only these exact strings:
Callout Markers
| Symbol | Meaning | Color |
|---|---|---|
| 📘 | Definition | Blue |
| 🧪 | Exercise | Green |
| ⚠️ | Pitfall | Amber |
| ✅ | Checklist | Purple |