zerfoo

package module
v1.1.0
Published: Mar 14, 2026 License: Apache-2.0 Imports: 10 Imported by: 0

README

Zerfoo


Embed LLMs directly in Go applications. No Python. No sidecar. No HTTP calls to localhost.

model, _ := inference.Load("gemma-3-1b-q4")
reply, _ := model.Generate(ctx, "What is the capital of France?")
fmt.Println(reply) // Paris is the capital of France.

That's it. One binary, one import, one function call.

Install

go get github.com/zerfoo/zerfoo@latest

Gemma 3 1B Q4 is ~700 MB and runs on any laptop with 2 GB free RAM. No GPU required.

Examples

Chat bot in 10 lines
model, _ := inference.Load("gemma-3-1b-q4")
defer model.Close()

reply, _ := model.Chat(ctx, []inference.Message{
    {Role: "user", Content: "Write a haiku about Go"},
})
fmt.Println(reply.Content)
Streaming tokens to a terminal
model, _ := inference.Load("gemma-3-1b-q4")
defer model.Close()

model.GenerateStream(ctx, "Explain quantum computing", func(token string, done bool) error {
    fmt.Print(token)
    return nil
})
Summarize text inside a CLI tool
func summarize(ctx context.Context, text string) string {
    model, _ := inference.Load("gemma-3-1b-q4") // for repeated calls, load once and reuse
    defer model.Close()

    summary, _ := model.Generate(ctx,
        "Summarize this in one sentence:\n\n"+text,
        inference.WithMaxTokens(64),
        inference.WithTemperature(0.3),
    )
    return summary
}
Add an AI endpoint to an existing HTTP server
mux := http.NewServeMux()

// Your existing routes
mux.HandleFunc("GET /health", healthHandler)

// Add LLM-powered endpoint
model, _ := inference.Load("gemma-3-1b-q4")
mux.HandleFunc("POST /ask", func(w http.ResponseWriter, r *http.Request) {
    var req struct{ Question string }
    json.NewDecoder(r.Body).Decode(&req)

    answer, _ := model.Generate(r.Context(), req.Question,
        inference.WithMaxTokens(256),
    )
    json.NewEncoder(w).Encode(map[string]string{"answer": answer})
})

http.ListenAndServe(":8080", mux)
Drop-in OpenAI-compatible server
model, _ := inference.Load("gemma-3-1b-q4")
server := serve.NewServer(model)
http.ListenAndServe(":8080", server.Handler())

Works with any OpenAI client library — just point it at localhost:8080.
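For example, once the server is up, any OpenAI-style client can POST a standard chat-completions request. (Assumption: the serve package registers the usual /v1/chat/completions route; check its package docs for the exact paths.)

```json
{
  "model": "gemma-3-1b-q4",
  "messages": [
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "max_tokens": 64
}
```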

Classify text with structured output
model, _ := inference.Load("gemma-3-1b-q4")
defer model.Close()

prompt := `Classify this support ticket as "billing", "technical", or "general".
Reply with only the category name.

Ticket: I can't log in to my account after changing my password.

Category:`

category, _ := model.Generate(ctx, prompt,
    inference.WithMaxTokens(4),
    inference.WithTemperature(0),
)
fmt.Println(strings.TrimSpace(category)) // technical
Generate code
model, _ := inference.Load("gemma-3-2b-q4") // Larger model for code tasks
defer model.Close()

code, _ := model.Generate(ctx,
    "Write a Go function that reverses a string. Return only the code.",
    inference.WithMaxTokens(128),
    inference.WithTemperature(0.2),
)
fmt.Println(code)
Process a batch of inputs
model, _ := inference.Load("gemma-3-1b-q4")
defer model.Close()

questions := []string{
    "What is 2+2?",
    "Name the largest ocean.",
    "Who wrote Hamlet?",
}

for _, q := range questions {
    answer, _ := model.Generate(ctx, q, inference.WithMaxTokens(32))
    fmt.Printf("Q: %s\nA: %s\n\n", q, answer)
}
Models

Model          Size     RAM   Best for
gemma-3-1b-q4  ~700 MB  2 GB  Laptops, CI, edge devices, quick tasks
gemma-3-2b-q4  ~1.5 GB  4 GB  Code generation, longer reasoning
llama-3-8b-q4  ~4.5 GB  8 GB  Complex tasks, higher quality output

All models run on CPU out of the box. Add inference.WithDevice("cuda") for GPU acceleration.

Supported Architectures

Gemma 3, LLaMA 3, Mistral, Qwen 2.5, DeepSeek, Phi-4 — in GGUF or ZMF format, with F32 or Q4_0 quantization.

CLI

go install github.com/zerfoo/zerfoo/cmd/zerfoo@latest

zerfoo run gemma-3-1b-q4             # Interactive chat
zerfoo serve gemma-3-1b-q4           # OpenAI-compatible API on :8080
zerfoo predict -model gemma-3-1b-q4  # Batch inference from CSV/JSON

Performance

Metric                     Value
Gemma 3 2B Q4 CPU (ARM64)  3.60 tok/s
CUDA Q4 GEMM (GB10)        2,383 GFLOPS
Q4 model compression       3.7x smaller than F32
PagedAttention savings     46% less memory

How It Works

Zerfoo is a full ML framework written in Go — tensors, computation graphs, automatic differentiation, SIMD kernels, and CUDA support. The inference package wraps all of that into the simple API shown above.

Under the hood, inference.Load does:

  1. Downloads the model (or loads from cache)
  2. Memory-maps the weights (zero-copy, no heap allocation)
  3. Builds a static computation graph with optimized fused kernels
  4. Returns a *Model ready for generation
inference.Load("gemma-3-1b-q4")
    │
    ├── model/gguf    → Parse GGUF file, load Q4 weights
    ├── graph/        → Build computation DAG, fold transposes
    ├── compute/      → CPU engine with NEON/AVX2 SIMD, fused RMSNorm/RoPE
    └── generate/     → Autoregressive decode with PagedKV cache

Building with CUDA

# CPU only (default, no Cgo)
go build ./cmd/zerfoo

# With GPU support
CGO_CFLAGS='-I/usr/local/cuda/include' \
CGO_LDFLAGS='-L/usr/local/cuda/lib64' \
go build -tags cuda ./cmd/zerfoo

Project Structure

inference/       Load models and generate text (start here)
serve/           OpenAI-compatible HTTP server
compute/         Engine interface (34 ops), CPU and CUDA backends
graph/           Computation DAG with automatic differentiation
layers/          40+ layer types (attention, normalization, activations)
tensor/          N-dimensional arrays with Q4/Q8 quantized storage
generate/        Token sampling, speculative decoding, PagedKV cache
model/           ZMF and GGUF model format loaders
training/        SGD, Adam, AdamW optimizers and training loops
internal/cuda/   CUDA kernels (Q4 GEMM, Flash Attention)
internal/xblas/  NEON/AVX2 SIMD matrix multiply
distributed/     gRPC-based distributed training

Contributing

See docs/design.md for architecture decisions.

go test ./... -race -timeout 120s
golangci-lint run ./...

License

Apache 2.0

Documentation

Overview

Package zerfoo provides the core building blocks for creating and training neural networks. It offers a prelude of commonly used types to simplify development and enhance readability of model construction code.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func BuildFromZMF added in v0.3.0

func BuildFromZMF[T tensor.Numeric](
	engine compute.Engine[T],
	ops numeric.Arithmetic[T],
	m *zmf.Model,
	opts ...model.BuildOption,
) (*graph.Graph[T], error)

BuildFromZMF builds a graph from a ZMF model.

func NewAdamW

func NewAdamW[T tensor.Numeric](learningRate, beta1, beta2, epsilon, weightDecay T) *optimizer.AdamW[T]

NewAdamW creates a new AdamW optimizer.

func NewCPUEngine

func NewCPUEngine[T tensor.Numeric]() compute.Engine[T]

NewCPUEngine creates a new CPU engine for the given numeric type.

func NewDefaultTrainer

func NewDefaultTrainer[T tensor.Numeric](
	g *graph.Graph[T],
	lossNode graph.Node[T],
	opt optimizer.Optimizer[T],
	strategy training.GradientStrategy[T],
) *training.DefaultTrainer[T]

NewDefaultTrainer creates a new default trainer.

func NewFloat32Ops

func NewFloat32Ops() numeric.Arithmetic[float32]

NewFloat32Ops returns the float32 arithmetic operations.

func NewGraph

func NewGraph[T tensor.Numeric](engine compute.Engine[T]) *graph.Builder[T]

NewGraph creates a new computation graph.

func NewMSE

func NewMSE[T tensor.Numeric](engine compute.Engine[T]) *loss.MSE[T]

NewMSE creates a new Mean Squared Error loss function.

func NewRMSNorm

func NewRMSNorm[T tensor.Numeric](
	name string,
	engine compute.Engine[T],
	ops numeric.Arithmetic[T],
	modelDim int,
	options ...normalization.RMSNormOption[T],
) (*normalization.RMSNorm[T], error)

NewRMSNorm is a factory function for creating RMSNorm layers.

func NewTensor

func NewTensor[T tensor.Numeric](shape []int, data []T) (*tensor.TensorNumeric[T], error)

NewTensor creates a new tensor with the given shape and data.

func RegisterLayer

func RegisterLayer[T tensor.Numeric](opType string, builder model.LayerBuilder[T])

RegisterLayer registers a new layer builder.

func UnregisterLayer

func UnregisterLayer(opType string)

UnregisterLayer unregisters a layer builder.

Types

type Batch

type Batch[T tensor.Numeric] struct {
	Inputs  map[graph.Node[T]]*tensor.TensorNumeric[T]
	Targets *tensor.TensorNumeric[T]
}

Batch represents a training batch.

type Engine

type Engine[T tensor.Numeric] interface {
	compute.Engine[T]
}

Engine represents a computation engine (e.g., CPU).

type Graph

type Graph[T tensor.Numeric] struct {
	*graph.Graph[T]
}

Graph represents a computation graph.

type LayerBuilder

type LayerBuilder[T tensor.Numeric] func(
	engine compute.Engine[T],
	ops numeric.Arithmetic[T],
	name string,
	params map[string]*graph.Parameter[T],
	attributes map[string]interface{},
) (graph.Node[T], error)

LayerBuilder is a function that builds a layer.

type Node

type Node[T tensor.Numeric] interface {
	graph.Node[T]
}

Node represents a node in the computation graph.

type Numeric

type Numeric tensor.Numeric

Numeric represents a numeric type constraint.

type Parameter

type Parameter[T tensor.Numeric] struct {
	*graph.Parameter[T]
}

Parameter represents a trainable parameter in the model.

type Tensor

type Tensor[T tensor.Numeric] struct {
	*tensor.TensorNumeric[T]
}

Tensor represents a multi-dimensional array.

type ZMFModel added in v0.3.0

type ZMFModel = zmf.Model

ZMFModel is an alias for a ZMF model.

Directories

Path                Synopsis
cmd
    bench-compare   Command bench-compare compares two NDJSON benchmark result files and outputs a markdown regression report.
    bench_tps       bench_tps measures tokens-per-second for a local ZMF model.
    cli             Package cli provides a generic command-line interface framework for Zerfoo.
    coverage-gate   Command coverage-gate reads a Go coverage profile and fails if any testable package drops below the configured coverage threshold.
    debug-infer
    zerfoo
    zerfoo-predict
    zerfoo-tokenize
compute             Package compute implements tensor computation engines and operations.
config              Package config provides file-based configuration loading with validation and environment variable overrides.
device              Package device provides device abstraction and memory allocation interfaces.
distributed         Package distributed provides distributed training strategies and coordination mechanisms for multi-node machine learning workloads in the Zerfoo framework.
    coordinator     Package coordinator provides a distributed training coordinator.
    pb
graph               Package graph provides a computational graph abstraction.
health              Package health provides HTTP health check endpoints for Kubernetes-style liveness and readiness probes.
inference           Package inference provides a high-level API for loading models and generating text with minimal boilerplate.
internal
    clblast         Package clblast provides Go wrappers for the CLBlast BLAS library.
    codegen         Package codegen generates CUDA megakernel source code from a compiled ExecutionPlan instruction tape.
    cublas          Package cublas provides low-level purego bindings for the cuBLAS library.
    cuda            Package cuda provides low-level bindings for the CUDA runtime API using dlopen/dlsym (no CGo).
    cuda/kernels    Package kernels provides Go wrappers for custom CUDA kernels.
    cudnn           Package cudnn provides purego bindings for the NVIDIA cuDNN library.
    gpuapi          Package gpuapi defines internal interfaces for GPU runtime operations.
    hip             Package hip provides low-level bindings for the AMD HIP runtime API using purego dlopen.
    hip/kernels     Package kernels provides Go wrappers for custom HIP kernels via purego dlopen.
    miopen          Package miopen provides low-level bindings for the AMD MIOpen library using purego dlopen.
    nccl            Package nccl provides CGo bindings for the NVIDIA Collective Communications Library (NCCL).
    opencl          Package opencl provides Go wrappers for the OpenCL 2.0 runtime API.
    opencl/kernels  Package kernels provides OpenCL kernel source and dispatch for elementwise operations.
    rocblas         Package rocblas provides low-level bindings for the AMD rocBLAS library using purego dlopen.
    tensorrt        Package tensorrt provides bindings for the NVIDIA TensorRT inference library via purego (dlopen/dlsym, no CGo).
    workerpool      Package workerpool provides a persistent pool of goroutines that process submitted tasks.
layers
    activations     Package activations provides activation function layers.
    attention       Package attention provides attention mechanisms for neural networks.
    components      Package components provides reusable components for neural network layers.
    core            Package core provides core neural network layer implementations.
    embeddings      Package embeddings provides neural network embedding layers for the Zerfoo ML framework.
    gather          Package gather provides the Gather layer for the Zerfoo ML framework.
    hrm             Package hrm implements the Hierarchical Reasoning Model.
    normalization   Package normalization provides various normalization layers for neural networks.
    reducesum       Package reducesum provides the ReduceSum layer for the Zerfoo ML framework.
    registry        Package registry provides a central registration point for all layer builders.
    regularization  Package regularization provides regularization layers for neural networks.
    sequence        Package sequence provides sequence modeling layers such as State Space Models.
    transformer     Package transformer provides transformer building blocks such as the Transformer `Block` used in encoder/decoder stacks.
    transpose       Package transpose provides the Transpose layer for the Zerfoo ML framework.
log                 Package log provides a structured, leveled logging abstraction.
    runtime         Package runtime provides a backend-agnostic metrics collection abstraction for runtime observability.
model               Package model provides adapter implementations for bridging existing and new model interfaces.
    gguf            Package gguf implements a pure-Go parser for the GGUF v3 model format used by llama.cpp.
    hrm             Package hrm provides experimental Hierarchical Reasoning Model types.
numeric             Package numeric provides precision types, arithmetic operations, and generic constraints for the Zerfoo ML framework.
pkg
    tokenizer       Package tokenizer provides text tokenization for ML model inference.
serve               Package serve provides an OpenAI-compatible HTTP API server for model inference.
shutdown            Package shutdown provides orderly shutdown coordination using context cancellation and cleanup callbacks.
tensor              Package tensor provides a multi-dimensional array (tensor) implementation.
testing
    testutils       Package testutils provides testing utilities and mock implementations for the Zerfoo ML framework.
tests
training            Package training provides adapter implementations for bridging existing and new interfaces.
    loss            Package loss provides various loss functions for neural networks.
    optimizer       Package optimizer provides various optimization algorithms for neural networks.
types               Package types contains shared, fundamental types for the Zerfoo framework.
