Ultimate Local AI

Running inference against LLMs on your own local hardware means no API costs, no data leaving your machine, and no vendor lock-in. Doing this from Go has traditionally been painful.

In this workshop, Bill will introduce Kronk, a Go SDK that lets you embed local model inference directly into your Go applications with full GPU acceleration — no CGO required. Whether you are building chat, vision, audio, embedding, or tool calling applications, Kronk gives you the same power as a model server without needing one.
To prove it, Bill built a model server using the Kronk SDK, complete with caching, batch processing, and agent support. You'll see live demos from writing your first chat app to driving a coding agent with a local model.

Place
GoLab Workshops I
Length
360 min
When
November 1st, 2026
09:30

Description

This is a full-day workshop combining lecture and hands-on coding, where you'll go from zero to running open-source models directly inside your Go applications on your own machine: no cloud APIs, no external servers, no data leaving your machine. Along the way, you'll learn the internals of the Kronk SDK, covering model architectures, KV caching, batch processing, tokenization and decoding, prompt caching, token sampling, and more.

With that solid foundation, you'll build real applications:

  • A Retrieval-Augmented Generation (RAG) pipeline that grounds model responses in your own documents using embeddings and vector search.
  • A natural language to SQL system where the model generates database queries from plain English, with grammar constraints ensuring the output is always valid, executable SQL.

By the end of the day, you won't just understand how AI model inference works — you'll have built applications that load models, cache intelligently, retrieve context, and generate code, all running locally on your own hardware.

What a Student Is Expected to Learn

By the end of this workshop, you'll leave with working code, a deep understanding of model inference, and hands-on experience across the full stack: model configuration, performance tuning, intelligent caching, retrieval-augmented generation, and structured code generation.

Workshop setup

Hardware Requirements

Don't worry if you don't have the required hardware. The instructor will provide everything you need to follow along and run the examples.

  • Mac with an Apple Silicon chip (M1 or later) and at least 16 GB RAM.
  • Any Linux/Windows laptop with a dedicated GPU with at least 8 GB VRAM (not system RAM); 16 GB preferred.
  • Access to a cloud-based instance with a dedicated GPU with at least 8 GB VRAM; 16 GB preferred.

Prerequisites

  • Several months of experience coding in Go.
  • A working Go environment running on the device you will be bringing to class.

Recommended Preparation

  • Please clone the main repo (https://github.com/ardanlabs/kronk) for the class.
  • Please read the notes in the makefile for installing all the tooling and testing the code before class.
  • Please email the instructor, Bill Kennedy, for assistance.

Outline

  • Why Local Inference? (Privacy, latency, cost, no vendor lock-in, offline)

    • What is Kronk? (Go SDK + optional Model Server)
    • Architecture: SDK-first design, non-CGO via yzma
    • Show the layered architecture diagram
  • Hello World — Question example (simplest SDK usage)

    • Walk through the code, show it running
  • Architecture and Configuration

    • Navigating Hugging Face and Model Types
    • VRAM Calculations
    • Batch Engine Architecture
    • Caching System Semantics
    • Sampling parameters
  • Tool Calling with a Local Model

    • Use the chat example's get_weather function
    • Show a local model deciding to call tools
  • Vision App

    • What projectors are and why vision models need them
    • Memory overhead: model + projector + KV cache
  • Kronk Model Server (KMS)

    • Catalog system — kronk catalog pull, verified models
    • Show the BUI and all the tools/apps
    • Chat App with a coding model
    • Batch processing — concurrent requests with n_seq_max slots
    • Quick flash of observability: Prometheus metrics / Statsviz
  • AI Agent Integration

    • Cline driving real coding work through KMS
    • MCP service with Brave Search — local model doing web searches
    • Mention compatibility: Claude Code, OpenWebUI, any OpenAI client
  • RAG Application

    • Take the Go Notebook and show how the model can use it to provide specific answers to questions.
  • SQL Application

    • Create a relational database with data and query it using natural language.

GoLab is a conference made by Develer.
Develer is a company based in Campi Bisenzio, near Florence. Our motto is: "Technology to give life to your products". We produce hardware and software to create exceptional products, improve industrial processes, and support people's well-being.
At Develer we have a passion for new technologies, and we offer our clients effective solutions that are also efficient, simple, and safe for end users. We also believe in a friendly and welcoming environment where anybody can contribute. This passion and this vision are what drove us to organize our conference, "made by developers for developers".

