Abstract
By the end of this workshop, you'll leave with working code, a production-ready mindset for AI-powered Go applications, and hands-on experience across the full stack: retrieval, action, performance, and security.
Part 1: Retrieval-Augmented Generation (RAG) in Go
• Understanding RAG Concepts – Improve responses by dynamically retrieving relevant context rather than relying solely on static training data.
• Ingesting and Processing Documents – Build pipelines to index and retrieve documents from client systems.
• Interacting with AI-Compatible APIs – Learn how Go applications can connect to local inference engines, OpenAI-compatible servers, or cloud AI services.
• Optimizing Performance & Latency – Implement caching, batching, and parallel processing to enhance efficiency.
• Using Vector Databases – Store and search embeddings with tools such as Chroma, Pinecone, Weaviate, Milvus, or pgvector in PostgreSQL.
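The retrieval core shared by all of the topics above can be sketched in plain Go with no external dependencies. This is a minimal illustration, not the workshop's actual code: the three-dimensional embeddings are hard-coded toy values standing in for vectors a real embedding model or API would produce, and the `Document` and `topK` names are made up for this sketch.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Document pairs a text chunk with its embedding vector.
type Document struct {
	Text      string
	Embedding []float64
}

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// topK returns the k documents most similar to the query embedding.
func topK(docs []Document, query []float64, k int) []Document {
	sorted := append([]Document(nil), docs...)
	sort.Slice(sorted, func(i, j int) bool {
		return cosine(sorted[i].Embedding, query) > cosine(sorted[j].Embedding, query)
	})
	if k > len(sorted) {
		k = len(sorted)
	}
	return sorted[:k]
}

func main() {
	docs := []Document{
		{"Goroutines are lightweight threads", []float64{0.9, 0.1, 0.0}},
		{"PostgreSQL supports pgvector", []float64{0.1, 0.9, 0.2}},
		{"Channels synchronize goroutines", []float64{0.8, 0.2, 0.1}},
	}
	// Pretend this is the embedding of "how do goroutines work?".
	query := []float64{0.85, 0.15, 0.05}
	for _, d := range topK(docs, query, 2) {
		fmt.Println(d.Text)
	}
}
```

A vector database replaces the brute-force `topK` scan with an approximate nearest-neighbor index, but the interface — embed the query, fetch the closest chunks, stuff them into the prompt — stays the same shape.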
Part 2: Tool Calling & Function Execution in Go
• How AI Uses Tool Calling – Enable external system control by allowing AI to invoke predefined functions in Go.
• Building Function Calls with OpenAI-Compatible Systems – Define structured function inputs and outputs for AI-driven interactions.
• Connecting to External APIs & Databases – Trigger real-world actions, query databases, and automate workflows.
• Handling Responses & Errors – Ensure safe and reliable execution of AI-invoked functions.
• Introduction to the Model Context Protocol (MCP) – Understand how MCP standardizes the way models discover and invoke tools. Build a simple MCP server in Go that exposes tools to any MCP-compatible client, showing how it compares to direct function calling.
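The dispatch side of tool calling can be sketched as a registry of Go functions keyed by name, assuming the model returns an OpenAI-style function call (a name plus a JSON argument string). The `get_weather` tool, its parameters, and the stubbed result below are all hypothetical, chosen only to show the pattern:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ToolCall mirrors the shape of an OpenAI-style function call: a
// function name plus its arguments as a raw JSON string.
type ToolCall struct {
	Name      string `json:"name"`
	Arguments string `json:"arguments"`
}

// Each registered tool unmarshals its own arguments and returns a
// string result that would be sent back to the model.
var tools = map[string]func(args json.RawMessage) (string, error){
	"get_weather": func(args json.RawMessage) (string, error) {
		var p struct {
			City string `json:"city"`
		}
		if err := json.Unmarshal(args, &p); err != nil {
			return "", err
		}
		// Stubbed result; a real tool would call a weather API here.
		return fmt.Sprintf("22C and sunny in %s", p.City), nil
	},
}

// dispatch looks up the named tool and executes it, returning an
// error for unknown tools so malformed model output fails safely.
func dispatch(tc ToolCall) (string, error) {
	fn, ok := tools[tc.Name]
	if !ok {
		return "", fmt.Errorf("unknown tool %q", tc.Name)
	}
	return fn(json.RawMessage(tc.Arguments))
}

func main() {
	// Pretend the model returned this tool call in its response.
	tc := ToolCall{Name: "get_weather", Arguments: `{"city":"Berlin"}`}
	result, err := dispatch(tc)
	if err != nil {
		panic(err)
	}
	fmt.Println(result) // 22C and sunny in Berlin
}
```

An MCP server wraps this same idea behind a standard protocol: instead of a private registry, tools advertise their names and JSON schemas so any MCP-compatible client can discover and invoke them.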
Part 3: Advanced Optimizations
• Speculative Decoding – Use a smaller draft model alongside a larger verification model to get near-large-model quality at small-model speeds. Applicable to both local inference libraries and serving engines that support it natively.
• Automatic Prefix Caching & KV Cache Reuse – Structure multi-turn conversations so shared prefixes (system prompts, conversation history) are cached and reused across requests, avoiding redundant computation. Manage message arrays carefully to keep prefixes stable across turns.
• Semantic Caching – Embed user queries and check vector similarity against cached query-response pairs, returning cached answers for semantically equivalent questions without running inference. Implementable in Go with any embedding model or API.
• Adaptive Retrieval – Use a lightweight classifier or a small local model to decide whether RAG context is needed at all, avoiding irrelevant context injection that can degrade response quality.
• Cascading Model Routing – Route queries to different models based on complexity: a fast small model for simple questions, escalating to a larger model only when confidence is low, implemented as Go middleware.
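Of the optimizations above, semantic caching is the most self-contained to sketch. In this illustrative version, the `embed` function is a toy bag-of-words hasher so the example runs standalone — in practice you would swap in a real embedding model or API — and the `SemanticCache` type and its 0.9 threshold are assumptions of this sketch, not a library:

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// cacheEntry stores a query embedding alongside the answer it produced.
type cacheEntry struct {
	embedding []float64
	answer    string
}

// SemanticCache returns a cached answer when a new query embeds close
// enough to a previously answered one.
type SemanticCache struct {
	threshold float64
	entries   []cacheEntry
}

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Lookup returns the nearest cached answer if its similarity to the
// query embedding clears the threshold.
func (c *SemanticCache) Lookup(emb []float64) (string, bool) {
	best, bestSim := "", -1.0
	for _, e := range c.entries {
		if s := cosine(e.embedding, emb); s > bestSim {
			best, bestSim = e.answer, s
		}
	}
	if bestSim >= c.threshold {
		return best, true
	}
	return "", false
}

// Store records a query embedding and its answer for future reuse.
func (c *SemanticCache) Store(emb []float64, answer string) {
	c.entries = append(c.entries, cacheEntry{emb, answer})
}

// embed is a placeholder: a hashed bag-of-words vector so the sketch
// runs without any model. Replace with real embeddings in practice.
func embed(text string) []float64 {
	clean := strings.Map(func(r rune) rune {
		if (r >= 'a' && r <= 'z') || r == ' ' {
			return r
		}
		return -1
	}, strings.ToLower(text))
	v := make([]float64, 16)
	for _, w := range strings.Fields(clean) {
		h := 0
		for _, r := range w {
			h = h*31 + int(r)
		}
		v[((h%16)+16)%16]++
	}
	return v
}

func main() {
	cache := &SemanticCache{threshold: 0.9}
	cache.Store(embed("what is a goroutine"), "A goroutine is a lightweight thread managed by the Go runtime.")

	if ans, ok := cache.Lookup(embed("What is a goroutine?")); ok {
		fmt.Println("cache hit:", ans) // same words, so the toy embedding matches
	}
	if _, ok := cache.Lookup(embed("how do channels work")); !ok {
		fmt.Println("cache miss: run inference")
	}
}
```

With a real embedding model the lookup generalizes to paraphrases ("explain goroutines to me") rather than just reworded queries, which is where the inference savings come from.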
Part 4: Securing LLM-Powered Go Applications
• Prompt Injection Defenses – Understand direct and indirect prompt injection attacks, and implement role separation, input sanitization, and detection strategies. Demonstrate how injected instructions in user input or retrieved documents can hijack model behavior.
• Securing Tool Calls – Apply least-privilege principles to exposed functions, prevent command injection from model output, and enforce authorization checks before execution.
• RAG Pipeline Security – Guard against data poisoning and indirect injection via ingested documents. Show how a malicious document in the vector DB can manipulate retrieval results and model responses, and defend with access controls, relevance thresholds, and content isolation.
• Output Sanitization & Exfiltration Prevention – Sanitize model-generated content before rendering in web UIs to prevent XSS. Defend against data exfiltration where the model encodes sensitive retrieved data into tool call arguments targeting attacker-controlled endpoints, using domain allowlists and egress filtering.
• Chain-of-Call Escalation – Show how a model can chain multiple tool calls in a single turn to escalate privileges. Implement call budgets, supervision layers, and human-in-the-loop checkpoints in Go.
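Two of the defenses above — call budgets and egress domain allowlists — can be combined into a single guard that every AI-requested tool call must pass before execution. The `Guard` type and its policies below are illustrative assumptions for this sketch, not a standard API:

```go
package main

import (
	"fmt"
	"net/url"
)

// Guard enforces simple safety policies before an AI-requested tool
// call runs: a per-turn call budget (limits chain-of-call escalation)
// and an egress host allowlist (blocks exfiltration to attacker
// domains encoded into tool arguments).
type Guard struct {
	budget       int // remaining tool calls allowed this turn
	allowedHosts map[string]bool
}

// Check returns an error if the call would exceed the budget or reach
// a host outside the allowlist; on success it consumes one budget unit.
func (g *Guard) Check(toolName, target string) error {
	if g.budget <= 0 {
		return fmt.Errorf("call budget exhausted; refusing %s", toolName)
	}
	u, err := url.Parse(target)
	if err != nil {
		return fmt.Errorf("unparseable target %q: %w", target, err)
	}
	if !g.allowedHosts[u.Hostname()] {
		return fmt.Errorf("host %q not on allowlist", u.Hostname())
	}
	g.budget--
	return nil
}

func main() {
	g := &Guard{
		budget:       2,
		allowedHosts: map[string]bool{"api.internal.example": true},
	}

	// Legitimate call: allowed, consumes one unit of budget.
	if err := g.Check("http_get", "https://api.internal.example/orders"); err == nil {
		fmt.Println("allowed: internal API call")
	}

	// Injected exfiltration attempt: the attacker host is rejected
	// before any request leaves the process.
	if err := g.Check("http_get", "https://evil.example/collect?data=secret"); err != nil {
		fmt.Println("blocked:", err)
	}
}
```

Because the guard sits in Go code rather than in the prompt, a hijacked model cannot talk its way past it; human-in-the-loop checkpoints slot in at the same choke point.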
Prerequisites
• Direct access to hardware capable of running the workshop will be provided during class. All code will be OpenAI-compatible, so you can run it against any environment, from Kronk to AWS Bedrock or Vertex AI and anything in between.
• You are expected to have been coding in Go for several months.
• Have a functioning Go environment with Go 1.26 or later installed.