Learning LLMs & AI Agents
A Curated Field Guide · Edition v1.0.0 · March 3, 2026 · Living Document
Preface
This document is a version-controlled reference manual and curriculum architecture for understanding, building, and deploying Large Language Models (LLMs) and Agentic architectures. It is designed to be maintainable, verifiable, and structurally durable.
This is Edition 1.0.0. As the field accelerates, individual resources will deprecate rapidly. This guide relies on a stable ID system (e.g., VID-001, PAP-003) so sources can be hot-swapped in the Master Source Catalog without breaking the underlying curriculum.
Who this is for: Engineers, technical product managers, and researchers transitioning into AI engineering. It assumes fundamental programming knowledge but no prior deep learning expertise.
How to Use This Guide
Start with a goal. Find your row in the matrix below, then follow the path. Resource IDs (e.g., VID-001) correspond to entries in Appendix A.
| Goal | Start Here | Then | Advanced | Band |
|---|---|---|---|---|
| Understand LLMs | VID-001 BOK-001 | VID-002 COU-012 | BOK-002 BOK-003 | Beginner |
| Master Prompting | REP-003 PAP-004 | PAP-005 | PAP-006 | Beginner |
| Build first agent | GUI-001 GUI-005 | VID-005 REP-002 | COU-001 VID-007 | Intermediate |
| Master RAG | PAP-007 COU-003 | COU-004 | COU-006 | Intermediate |
| Multi-agent systems | PAP-002 VID-003 | COU-011 COU-014 | REP-001 GUI-002 | Advanced |
| Evaluate & Deploy | COU-009 VID-004 | COU-008 REP-010 | GUI-003 REP-008 | Advanced |
Learning Map
- Mental Models – What LLMs are
- Prompting as Interface Design
- Tool Use & Structured Outputs
- RAG Systems
- Agents & Agentic Architectures
- Multi-Agent Systems & Memory
- Evaluation & Reliability
- LLMOps & Deployment
- Security & Prompt Injection
- Capstone Builds
Starter Glossary
Agent – An AI system that uses an LLM as its reasoning engine to determine which actions to take and in what order, often interacting with external tools.
Large Language Model (LLM) – A deep learning model trained on vast text to predict the next token, enabling it to generate human-like text and perform reasoning tasks.
Retrieval-Augmented Generation (RAG) – Grounds an LLM's responses in external knowledge retrieved at runtime.
Model Context Protocol (MCP) – An open standard enabling secure two-way connections between data sources and AI models.
Embeddings – Numerical representations of text in a high-dimensional vector space, capturing semantic meaning.
→ Full glossary in Appendix B
Foundations
🟢 Beginner · 1.1 – The Mental Model of Large Language Models
Learning Objectives
- Understand the fundamental mechanism of next-token prediction
- Differentiate between base models and instruction-tuned models
- Recognize hardware and data requirements for training
A Large Language Model is a neural network, typically based on the Transformer architecture, optimized to predict the most probable subsequent token given a sequence of preceding tokens.
Core Ideas: LLMs are not knowledge databases β they are statistical reasoning engines. Their primary capability is pattern recognition and generation. The shift from base models (which simply continue text) to chat models (which follow instructions) is achieved through Fine-Tuning and Reinforcement Learning from Human Feedback (RLHF).
Exercise: Access a base model and an instruction-tuned model via API or open-source weights. Feed both the same prompt: "The capital of France is". Observe how the base model continues the sentence, while the tuned model answers the implied question.
Pitfall: Treating the LLM as a factual lookup engine. Detection cue: You are surprised when the model fabricates a plausible-sounding but incorrect URL or historical date.
- I can explain next-token prediction to a non-technical peer
- I understand the difference between pre-training and fine-tuning
- I know why hallucinations are a feature of the architecture, not a bug
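The next-token mechanism above can be sketched with a toy vocabulary. The words and logit values below are invented for illustration; real models score tens of thousands of tokens, but the softmax-then-pick step is the same.

```python
import math

# Toy illustration of next-token prediction: a model maps a context to
# logits (raw scores) over its vocabulary, and softmax turns those logits
# into a probability distribution. Vocabulary and logits are made up.
VOCAB = ["Paris", "London", "Berlin", "a", "the"]

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_next_token(logits):
    """Pick the highest-probability token (temperature-0 decoding)."""
    probs = softmax(logits)
    best = probs.index(max(probs))
    return VOCAB[best], probs[best]

# Hypothetical logits for the context "The capital of France is":
token, prob = greedy_next_token([4.1, 1.2, 0.9, 0.3, 0.2])
print(token, round(prob, 3))
```

Sampling from the distribution instead of always taking the maximum is what makes generation non-deterministic at higher temperatures.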
Sources
Prompting & Interaction Design
🟢 Beginner · 2.1 – Structuring Context and Reasoning
Learning Objectives
- Master zero-shot and few-shot prompting techniques
- Implement Chain-of-Thought reasoning to improve output reliability
- Design prompts as deterministic interfaces
Prompt Engineering is the systematic process of designing, structuring, and optimizing inputs to an LLM to elicit accurate, structured, and predictable outputs.
Core Ideas: Models perform better when given space to "think." Techniques like Chain-of-Thought (CoT) force the model to output intermediate reasoning steps, vastly reducing logic errors.
Exercise: Write a prompt asking an LLM to solve a logic puzzle. First, ask for just the answer. Second, append "Think step-by-step before answering." Compare reliability.
Pitfall: Writing polite, conversational requests instead of structured commands. Detection cue: Your prompts include "Could you please..." instead of structured <instruction> tags.
- I separate instructions from data using delimiters (XML tags)
- I provide few-shot examples for complex formatting tasks
- I utilize Chain-of-Thought for tasks requiring logic or math
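The checklist above can be folded into one template function. This is a minimal sketch: the tag names (`<instruction>`, `<example>`, `<data>`, `<reasoning>`, `<answer>`) are illustrative conventions, not a required format, and the resulting string would be sent to whatever model API you use.

```python
# Build a prompt that separates instructions from data with XML-style
# delimiters, includes few-shot examples, and requests Chain-of-Thought.
def build_prompt(task: str, data: str, examples: list[tuple[str, str]]) -> str:
    shots = "\n".join(
        f"<example>\n<input>{inp}</input>\n<output>{out}</output>\n</example>"
        for inp, out in examples
    )
    return (
        "<instruction>\n"
        f"{task}\n"
        "Think step-by-step inside <reasoning> tags, then give the final\n"
        "answer inside <answer> tags.\n"
        "</instruction>\n"
        f"{shots}\n"
        f"<data>\n{data}\n</data>"
    )

prompt = build_prompt(
    task="Classify the sentiment of the review as positive or negative.",
    data="The battery died after two days.",
    examples=[("Loved it!", "positive")],
)
print(prompt)
```

Because the template is deterministic, the same inputs always yield the same prompt, which is what makes prompts versionable and testable like any other interface.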
Sources
Retrieval-Augmented Generation
🟡 Intermediate · 3.1 – Grounding Models in External Data
Learning Objectives
- Understand the architecture of a standard RAG pipeline
- Generate embeddings and store them in a vector database
- Execute semantic search to retrieve context for an LLM
RAG is a framework that retrieves relevant facts from an external knowledge base to ground large language models on the most accurate, up-to-date information before generating an answer.
Core Ideas: Models are frozen in time. RAG solves this by passing relevant documents into the context window at runtime. The process: Chunk → Embed → Index → Retrieve → Generate.
Exercise: Take a long PDF. Chunk it into 500-token segments, generate embeddings for each, then write a script to find the most similar chunk to a user query using cosine similarity.
Pitfall: Using overly large chunk sizes. Detection cue: The retrieved context contains the answer but the LLM ignores it – the "Lost in the Middle" phenomenon.
- I can explain the difference between lexical and semantic search
- I understand how chunking strategies impact retrieval quality
- I know how to calculate cosine similarity between two vectors
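The Chunk → Embed → Retrieve steps can be sketched end-to-end with a toy bag-of-words "embedding". Treat `embed` as a placeholder: a real pipeline would call an embedding model and get dense vectors, but the cosine-similarity retrieval logic is the same.

```python
import math
from collections import Counter

def chunk(text: str, size: int = 50) -> list[str]:
    """Split text into chunks of roughly `size` words (a stand-in for tokens)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    """Toy embedding: a word-count vector. Real systems use dense model vectors."""
    return Counter(text.lower().replace(".", " ").replace(",", " ").split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity = dot(a, b) / (|a| * |b|)."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str]) -> str:
    """Return the chunk most semantically similar to the query."""
    q = embed(query)
    return max(chunks, key=lambda c: cosine(q, embed(c)))

docs = chunk("Paris is the capital of France. " * 3
             + "Rust prevents data races. " * 3, size=6)
print(retrieve("what is the capital of France", docs))
```

Swapping `embed` for a model-backed function and `max(...)` for a vector-database query turns this sketch into the standard RAG retrieval stage.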
Sources
Agents & Tool Orchestration
🟡 Intermediate · 4.1 – The Agentic Paradigm
Learning Objectives
- Define what makes a system "agentic"
- Implement the ReAct (Reason + Act) pattern
- Provide an LLM with external tools: APIs, calculators, code interpreters
An Agent is a system where an LLM is given an objective, a set of tools, and a loop allowing it to independently reason, execute actions, observe results, and iterate until the objective is met.
Core Ideas: The ReAct pattern is foundational β the model Reasons about what to do, chooses an Action (tool call), Observes the output, and loops. MCP standardizes how agents connect to external data.
Exercise: Build a basic ReAct loop in Python without an agent framework. Define one tool (e.g., a current-weather function). Write a while loop that prompts the LLM, parses its output for a tool call, executes the function, and feeds the result back.
Pitfall: Overloading the agent with too many tools. Detection cue: The model enters an infinite loop, continuously calling the wrong tool or hallucinating tool arguments.
- I understand the ReAct framework
- I can write a system prompt defining available tools and their schemas
- I understand the principles behind the Model Context Protocol (MCP)
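The exercise above can be sketched with the LLM replaced by a scripted stub (`fake_llm`) so the control flow runs offline. The JSON tool-call convention here is an assumption for illustration, not a fixed standard; a real implementation would swap the stub for an API call and parse the provider's tool-call format.

```python
import json

def get_weather(city: str) -> str:
    """The single tool exposed to the agent (canned data for this sketch)."""
    return {"Paris": "18C, cloudy"}.get(city, "unknown")

TOOLS = {"get_weather": get_weather}

def fake_llm(transcript: str) -> str:
    """Stub: requests the tool first, then answers once an observation exists."""
    if "Observation:" not in transcript:
        return json.dumps({"thought": "I need the weather.",
                           "action": "get_weather", "args": {"city": "Paris"}})
    return json.dumps({"thought": "I have the data.",
                       "final_answer": "It is 18C and cloudy in Paris."})

def run_agent(goal: str, max_steps: int = 5) -> str:
    transcript = f"Goal: {goal}"
    for _ in range(max_steps):
        step = json.loads(fake_llm(transcript))   # Reason
        if "final_answer" in step:                # objective met: stop
            return step["final_answer"]
        result = TOOLS[step["action"]](**step["args"])  # Act
        transcript += f"\nObservation: {result}"        # Observe, then loop
    return "Stopped: step limit reached."         # guard against infinite loops

print(run_agent("What's the weather in Paris?"))
```

Note the `max_steps` guard: it is the simplest defense against the runaway-loop pitfall described above.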
Sources
Multi-Agent Systems & Memory
🔴 Advanced · 5.1 – Collaborative AI Architecture
Learning Objectives
- Design workflows requiring multiple specialized agents
- Implement memory systems: short-term context vs. long-term vector storage
- Structure agent-to-agent communication
Multi-Agent Systems (MAS) orchestrate multiple distinct AI agents, each with specific roles, system prompts, and tools, collaborating to solve tasks too complex for a single agent.
Exercise: Design an architecture diagram for an automated software development team. Map out roles (PM, Coder, Reviewer), the specific tools each role needs, and the flow of information between them.
Pitfall: Lack of termination criteria. Detection cue: Agents get stuck in a conversational loop, endlessly agreeing or passing the same data back without progressing the task.
- I can identify when a task requires multi-agent vs. single-agent
- I understand how to implement an orchestrator/router pattern
- I can manage conversational state without exceeding context window limits
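The orchestrator/router pattern from the checklist can be sketched as follows. The "agents" are plain functions standing in for LLM-backed workers, and the keyword routing is a deliberately naive placeholder for an LLM-based router; role names and the handoff limit are illustrative assumptions.

```python
def coder_agent(task: str) -> str:
    """Specialist stub: in a real system, an LLM with a coding system prompt."""
    return f"[coder] drafted code for: {task}"

def reviewer_agent(task: str) -> str:
    """Specialist stub: an LLM with a code-review system prompt."""
    return f"[reviewer] reviewed: {task}"

# The router's dispatch table: which specialist handles which kind of work.
ROUTES = {"implement": coder_agent, "review": reviewer_agent}

def orchestrate(task: str, max_handoffs: int = 3) -> list[str]:
    """Route a task through specialized agents; the hard handoff limit acts
    as a termination criterion so agents cannot loop forever."""
    log = []
    for verb, agent in ROUTES.items():
        if verb in task.lower():
            log.append(agent(task))
    if not log:  # no specialist matched: fail loudly instead of looping
        log.append("[orchestrator] no matching specialist; escalate to human")
    return log[:max_handoffs]

for line in orchestrate("implement and review the login endpoint"):
    print(line)
```

The design choice worth noting is that routing, termination, and escalation live in the orchestrator, not in any individual agent, which keeps each specialist simple.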
Sources
Evaluation & LLMOps
🔴 Advanced · 6.1 – Reliability and Productionization
Learning Objectives
- Implement "LLM-as-a-Judge" evaluation metrics
- Design logging and tracing systems for agent workflows
- Manage prompt versioning and regression testing
LLMOps comprises the operational capabilities, infrastructure, and practices required to manage the lifecycle of LLM applications, from prompt engineering to production deployment and monitoring.
Exercise: Create an evaluation dataset of 10 queries and 10 ideal responses. Write a script that passes your system's outputs to an "evaluator model," prompting it to score 1–5 based on accuracy and conciseness.
Pitfall: Deploying straight from a playground to production. Detection cue: You have no visibility into the actual prompts your system generates on behalf of users, or the latency of external tool calls.
- I have baseline metrics for my task (retrieval precision, generation accuracy)
- I have implemented tracing to monitor token usage and cost
- I have automated evaluation pipelines before pushing prompt changes
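The evaluation exercise above can be skeletoned like this. `judge` is a deterministic stub for demonstration; a real LLM-as-a-Judge would prompt an evaluator model with a rubric and parse its 1–5 score from the response. The dataset rows are invented examples.

```python
import json

# Tiny evaluation dataset: query, ideal answer, and the system's output.
DATASET = [
    {"query": "capital of France?", "ideal": "Paris", "output": "Paris."},
    {"query": "2 + 2?", "ideal": "4", "output": "five"},
]

def judge(query: str, ideal: str, output: str) -> int:
    """Stub judge: containment match scores 5, otherwise 1. In practice this
    function would call an evaluator LLM with a scoring rubric."""
    return 5 if ideal.lower() in output.lower() else 1

def evaluate(dataset) -> dict:
    """Run every row through the judge and aggregate into a report."""
    scores = [judge(**row) for row in dataset]
    return {"mean_score": sum(scores) / len(scores), "n": len(scores)}

report = evaluate(DATASET)
print(json.dumps(report))
```

Wiring `evaluate` into CI so it runs before every prompt change is the smallest useful regression gate: a drop in `mean_score` blocks the change.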
Sources
Appendix A β Master Source Catalog
Single source of truth. All additions, deprecations, and updates happen here first.
Short links (lnkd.in) require manual verification. Mark any unresolved link ⚠️ Verify in the Status column after each patch update. See Appendix D for protocol.
| ID | Title | Type | Author | Difficulty | Time | Status | URL |
|---|---|---|---|---|---|---|---|
| VID-001 | LLM Introduction | Video | Unknown | Beginner | 1h | Core | 🔗 |
| VID-002 | LLMs from Scratch | Video | Unknown | Advanced | 2h | Core | 🔗 |
| VID-003 | Agentic AI Overview (Stanford) | Video | Stanford | Intermediate | 1h | Core | 🔗 |
| VID-004 | Building and Evaluating Agents | Video | Unknown | Advanced | 1h | Core | 🔗 |
| VID-005 | Building Effective Agents | Video | Unknown | Intermediate | 1h | Core | 🔗 |
| VID-006 | Building Agents with MCP | Video | Unknown | Intermediate | 1h | Core | 🔗 |
| VID-007 | Building an Agent from Scratch | Video | Unknown | Intermediate | 1h | Core | 🔗 |
| VID-008 | Philo Agents (Playlist) | Video | Unknown | Intermediate | 2h | Optional | 🔗 |
| REP-001 | GenAI Agents | Repo | Nirdiamant | Intermediate | – | Core | 🔗 |
| REP-002 | AI Agents for Beginners | Repo | Microsoft | Beginner | – | Core | 🔗 |
| REP-003 | Prompt Engineering Guide | Repo | Unknown | Beginner | 4h | Core | 🔗 ⚠️ |
| GUI-001 | Google's Agent Whitepaper | Guide | Google | Intermediate | 1h | Core | 🔗 ⚠️ |
| GUI-002 | Google's Agent Companion | Guide | Google | Intermediate | 1h | Core | 🔗 ⚠️ |
| GUI-003 | Building Effective Agents | Guide | Anthropic | Intermediate | 1h | Core | 🔗 ⚠️ |
| GUI-004 | Claude Code Best Agentic Practices | Guide | Anthropic | Intermediate | 1h | Core | 🔗 ⚠️ |
| GUI-005 | OpenAI's Practical Guide to Building Agents | Guide | OpenAI | Intermediate | 1h | Core | 🔗 ⚠️ |
| BOK-001 | Understanding Deep Learning | Book | Unknown | Intermediate | 20h+ | Core | 🔗 |
| BOK-002 | Building an LLM from Scratch | Book | Unknown | Advanced | 15h+ | Core | 🔗 ⚠️ |
| BOK-003 | The LLM Engineering Handbook | Book | Unknown | Advanced | 15h+ | Core | 🔗 ⚠️ |
| BOK-004 | AI Agents: The Definitive Guide | Book | Nicole Koenigstein | Intermediate | 10h+ | Core | 🔗 ⚠️ |
| BOK-005 | Building Applications with AI Agents | Book | Michael Albada | Intermediate | 10h+ | Optional | 🔗 ⚠️ |
| BOK-006 | AI Agents with MCP | Book | Kyle Stratis | Intermediate | 10h+ | Optional | 🔗 ⚠️ |
| BOK-007 | AI Engineering | Book | O'Reilly | Advanced | 15h+ | Core | 🔗 |
| PAP-001 | ReAct: Synergizing Reasoning and Acting | Paper | Unknown | Advanced | 2h | Core | 🔗 ⚠️ |
| PAP-002 | Generative Agents | Paper | Unknown | Advanced | 2h | Core | 🔗 ⚠️ |
| PAP-003 | Toolformer | Paper | Unknown | Advanced | 2h | Core | 🔗 ⚠️ |
| PAP-004 | Chain-of-Thought Prompting | Paper | Unknown | Intermediate | 1h | Core | 🔗 ⚠️ |
| PAP-005 | Tree of Thoughts | Paper | Unknown | Advanced | 2h | Core | 🔗 ⚠️ |
| PAP-006 | Reflexion | Paper | Unknown | Advanced | 2h | Core | 🔗 ⚠️ |
| PAP-007 | RAG Survey | Paper | Unknown | Intermediate | 2h | Core | 🔗 ⚠️ |
| COU-001 | HuggingFace Agent Course | Course | HuggingFace | Intermediate | 8h | Core | 🔗 ⚠️ |
| COU-002 | MCP with Anthropic | Course | Anthropic | Intermediate | 4h | Core | 🔗 ⚠️ |
| COU-003 | Building Vector DBs with Pinecone | Course | Pinecone | Intermediate | 5h | Core | 🔗 ⚠️ |
| COU-004 | Vector DBs from Embeddings to Apps | Course | Unknown | Intermediate | 5h | Core | 🔗 ⚠️ |
| COU-005 | Agent Memory | Course | Unknown | Intermediate | 3h | Core | 🔗 ⚠️ |
| COU-006 | Building and Evaluating RAG Apps | Course | Unknown | Advanced | 5h | Core | 🔗 ⚠️ |
| COU-007 | Building Browser Agents | Course | Unknown | Advanced | 4h | Optional | 🔗 ⚠️ |
| COU-008 | LLMOps | Course | Unknown | Advanced | 6h | Core | 🔗 ⚠️ |
| COU-009 | Evaluating AI Agents | Course | Unknown | Advanced | 4h | Core | 🔗 ⚠️ |
| COU-010 | Computer Use with Anthropic | Course | Anthropic | Advanced | 4h | Optional | 🔗 ⚠️ |
| COU-011 | Multi-Agent Use | Course | Unknown | Advanced | 4h | Core | 🔗 ⚠️ |
| COU-012 | Improving LLM Accuracy | Course | Unknown | Intermediate | 4h | Core | 🔗 ⚠️ |
| COU-013 | Agent Design Patterns | Course | Unknown | Advanced | 5h | Core | 🔗 ⚠️ |
| COU-014 | Multi Agent Systems | Course | Unknown | Advanced | 4h | Core | 🔗 ⚠️ |
| NEW-001 | Gradient Ascent | Newsletter | Unknown | – | – | Watchlist | 🔗 ⚠️ |
| NEW-002 | DecodingML by Paul | Newsletter | Paul | – | – | Watchlist | 🔗 ⚠️ |
| NEW-003 | Deep (Learning) Focus by Cameron | Newsletter | Cameron | – | – | Watchlist | 🔗 ⚠️ |
| NEW-004 | NeoSage by Shivani | Newsletter | Shivani | – | – | Watchlist | 🔗 |
| NEW-005 | Jam with AI | Newsletter | Shirin & Shantanu | – | – | Watchlist | 🔗 ⚠️ |
| NEW-006 | Data Hustle by Sai | Newsletter | Sai | – | – | Watchlist | 🔗 ⚠️ |
⚠️ = Short link pending full URL verification. Resolve in next patch update.
Appendix B β Full Glossary
Agent – An AI system that uses an LLM to dynamically determine a sequence of actions, often utilizing external tools to achieve a goal.
Chain-of-Thought (CoT) – A prompting technique that instructs the model to generate intermediate reasoning steps before arriving at a final answer, significantly improving logic performance.
Context Window – The maximum number of tokens an LLM can process in a single prompt-response interaction.
Embeddings – High-dimensional vector representations of text. Text with similar semantic meaning will have vectors located close together in space.
Fine-Tuning – Taking a pre-trained base model and training it further on a smaller, specific dataset to specialize its behavior or improve instruction adherence.
Hallucination – When an LLM generates a response that sounds plausible but is factually incorrect or unsupported by its training data or context.
Large Language Model (LLM) – A massive neural network trained on vast text corpora to predict next-token probabilities.
LLMOps – The practices and tools used to deploy, manage, evaluate, and scale LLM applications in production reliably.
Model Context Protocol (MCP) – An open standard protocol designed to securely connect AI models with external data sources and tools.
Prompt Engineering – The iterative process of structuring text input to effectively communicate with and guide the outputs of generative models.
Prompt Injection – A security vulnerability where malicious input overrides system prompt instructions, causing the model to execute unintended behaviors.
Retrieval-Augmented Generation (RAG) – Grounding an LLM on external knowledge retrieved dynamically from a database to prevent hallucinations and access real-time data.
ReAct – A foundational agent architecture combining "Reasoning" (thinking about what to do) and "Acting" (using a tool), iteratively.
Reflexion – A framework where an agent evaluates its own past actions and outcomes, generating verbal reinforcement to improve future attempts.
Tool Calling (Function Calling) – The ability of an LLM to output structured data specifying a function name and arguments, allowing the system to trigger external software.
Tree of Thoughts (ToT) – An advanced reasoning technique extending CoT by allowing the model to explore multiple reasoning paths concurrently, evaluating and pruning them.
Vector Database – A specialized database optimized for storing, managing, and performing similarity searches on embedding vectors.
Appendix C β Changelog
v1.0.0 – March 3, 2026: Initial release. Defined 6 core learning chapters. Established Tag Taxonomy. Imported 50+ resources into catalog. Short links pending full metadata resolution.
Template for future entries:
vX.X.X – [Date]
- [Added/Removed/Updated] [ID] – [Reason]
- [Structural changes, if any]
Appendix D β Editor's Protocol
Adding a Resource
- Ensure the resource fits exactly one primary Tag from the taxonomy
- Generate a sequential stable ID (e.g., if last repo is REP-012, new is REP-013)
- Validate the URL resolves and note paywall status
- Create the Per-Resource Entry Block and assign to the correct Chapter
- Add a row to Appendix A (Master Source Catalog)
- Log the addition in Appendix C with a patch version bump (v1.0.x)
Deprecating a Resource
- Do not delete from Appendix A – change Status to Deprecated
- Add a note pointing to the replacement ID (e.g., "Replaced by PAP-008")
- Remove the Entry Block from the active Chapter body
- Log in Appendix C
Versioning Principles
| Version Bump | Trigger |
|---|---|
| v2.0.0 | Major structural curriculum shift |
| v1.1.0 | Adding or removing Core resources |
| v1.0.1 | Fixing links, typos, metadata |
Appendix E β Style Guide & Tag Taxonomy
Tag Taxonomy
Use only these exact strings:
Callout Markers
| Symbol | Meaning | Color |
|---|---|---|
| 📘 | Definition | Blue |
| 🧪 | Exercise | Green |
| ⚠️ | Pitfall | Amber |
| ✅ | Checklist | Purple |