Summary of the Core AI Concepts I Will Be Covering

To give the posts some structure, the list below sets out the breadth of key AI topics I plan to cover. The aim is to develop a high-level understanding of the key concepts first and then build depth. That way, as detail is added, there is still a strong foundation in place and a clear sense of where each topic sits in the bigger picture.

For the core topics, my mental model treats the Large Language Model (LLM) as the brain and agentic AI as the rest of the body, which ultimately enables action.
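To make the brain and body split concrete, here is a minimal sketch in Python. It is only an illustration of the mental model, not a real agent: the `brain` function is a hypothetical stand-in for an LLM call (it is hard-coded rather than calling any model), and the `add` tool, the `TOOLS` registry and `run_agent` are names I have made up for this example.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for an LLM call: the "brain" only decides what to do next.
# A real system would send the goal and observations to a model instead.
def brain(goal: str, observations: List[str]) -> dict:
    if not observations:
        # Nothing observed yet, so ask the body to use a tool.
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    # Once there is an observation, report it as the final answer.
    return {"tool": "finish", "args": {"answer": observations[-1]}}

# The "body": tools that actually carry out actions.
def add(a: int, b: int) -> str:
    return str(a + b)

TOOLS: Dict[str, Callable[..., str]] = {"add": add}

def run_agent(goal: str, max_steps: int = 5) -> str:
    """Loop: the brain picks an action, the body executes it, the result is fed back."""
    observations: List[str] = []
    for _ in range(max_steps):
        action = brain(goal, observations)
        if action["tool"] == "finish":
            return action["args"]["answer"]
        result = TOOLS[action["tool"]](**action["args"])
        observations.append(result)
    raise RuntimeError("agent did not finish within max_steps")

print(run_agent("What is 2 + 3?"))
```

The point of the sketch is the separation: the brain never touches the outside world, and the body never decides what to do. The agentic topics later in the list are about making that loop reliable.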

What does this mean? The LLM posts will focus on making sure the brain is used well, is understood and does not degrade over time. I have captured a set of potential sub-topics that may come up during the blog where relevant. The agentic posts aim to take the latest considerations on board, as this is the technology that I will be deploying.

So the LLM topics are:

LLM Foundations
- LLM APIs
- Open Source LLMs
- Prompt Engineering
- Structured Output
Vector Stores
- Embedding Models
- Vector Databases
- Chunking Strategies
- Semantic Search
RAG
- Ingesting docs
- Retrieval methods
- Context handling
- Prompt Templates
- Orchestration Framework
Advanced RAG
- Query transformation
- Reranking & Filtration
- LLM as a judge
- HyDE
- Corrective RAG
- RAGAS Evaluation
- Agentic RAG
- Graph-RAG
- Self-RAG
Fine Tuning
- Data Preparation
- PEFT Methods
- Training Config
- Alignment
- Training Tools
Inference Optimisation
- Quantisation
- Serving Engine
- Optimisation Techniques
Deployment
- MLOps/LLMOps
- Infrastructure
- Local inference and cloud platforms
Observability
- Tracing and logging
- Monitoring metrics
- Evals
Agents (high-level intro)
- Why
- Landscape
Production & Security
- Guardrails
- Cost optimisation
- Reliability

Agentic AI

Agent Foundations

What is an agent vs. a chain vs. a workflow
The ReAct pattern (Reason + Act)
Tool use and function calling
Memory types (short-term, long-term, episodic)

Tool & Environment Integration

API tool design and schemas
Code execution sandboxes
Browser/web interaction
File system and database access

Planning & Reasoning

Chain-of-thought and tree-of-thought
Task decomposition strategies
Self-reflection and critique loops
Goal tracking and replanning

Orchestration Frameworks

LangGraph
CrewAI
AutoGen
OpenAI Agents SDK
Anthropic's tool use patterns

Multi-Agent Systems

Agent roles and delegation
Communication protocols between agents
Supervisor vs. peer architectures
Shared state and handoffs

Memory & State Management

Conversation memory strategies
Vector store integration for long-term recall
Context window management
Checkpointing and resumability

Human-in-the-Loop

Approval gates and breakpoints
Escalation patterns
Confidence thresholds for autonomous action
User feedback loops

Retrieval-Augmented Agents

Agents that decide when to retrieve
Dynamic tool selection
Combining RAG with planning
Agentic RAG vs. static RAG

Evaluation & Testing

Trajectory evaluation (did the agent take good steps?)
End-to-end task success metrics
Benchmarks (SWE-bench, GAIA, WebArena)
Regression testing for agent behaviour

Safety & Guardrails

Action sandboxing and permissions
Prompt injection defence
Output validation before execution
Rate limiting and cost controls

Deployment & Observability

Tracing agent execution (LangSmith, Arize, etc.)
Latency and cost monitoring per step
Error recovery and fallback strategies
Logging decisions and tool calls

Production Patterns

Async and parallel agent execution
Caching strategies for repeated tasks
Graceful degradation when tools fail
Versioning agent configurations
