zerfoo

package module
v1.1.0
Published: Mar 14, 2026 License: Apache-2.0 Imports: 10 Imported by: 0

README

Zerfoo


Embed LLMs directly in Go applications. No Python. No sidecar. No HTTP calls to localhost.

model, _ := inference.Load("gemma-3-1b-q4")
reply, _ := model.Generate(ctx, "What is the capital of France?")
fmt.Println(reply) // Paris is the capital of France.

That's it. One binary, one import, one function call.

Install

go get github.com/zerfoo/zerfoo@latest

Gemma 3 1B Q4 is ~700 MB and runs on any laptop with 2 GB free RAM. No GPU required.

Examples

Chat bot in 10 lines
model, _ := inference.Load("gemma-3-1b-q4")
defer model.Close()

reply, _ := model.Chat(ctx, []inference.Message{
    {Role: "user", Content: "Write a haiku about Go"},
})
fmt.Println(reply.Content)
Streaming tokens to a terminal
model, _ := inference.Load("gemma-3-1b-q4")
defer model.Close()

model.GenerateStream(ctx, "Explain quantum computing", func(token string, done bool) error {
    fmt.Print(token)
    return nil
})
Summarize text inside a CLI tool
func summarize(ctx context.Context, text string) string {
    model, _ := inference.Load("gemma-3-1b-q4") // for repeated calls, load once and reuse
    defer model.Close()

    summary, _ := model.Generate(ctx,
        "Summarize this in one sentence:\n\n"+text,
        inference.WithMaxTokens(64),
        inference.WithTemperature(0.3),
    )
    return summary
}
Add an AI endpoint to an existing HTTP server
mux := http.NewServeMux()

// Your existing routes
mux.HandleFunc("GET /health", healthHandler)

// Add LLM-powered endpoint
model, _ := inference.Load("gemma-3-1b-q4")
mux.HandleFunc("POST /ask", func(w http.ResponseWriter, r *http.Request) {
    var req struct{ Question string }
    json.NewDecoder(r.Body).Decode(&req)

    answer, _ := model.Generate(r.Context(), req.Question,
        inference.WithMaxTokens(256),
    )
    json.NewEncoder(w).Encode(map[string]string{"answer": answer})
})

http.ListenAndServe(":8080", mux)
Drop-in OpenAI-compatible server
model, _ := inference.Load("gemma-3-1b-q4")
server := serve.NewServer(model)
http.ListenAndServe(":8080", server.Handler())

Works with any OpenAI client library — just point it at localhost:8080.
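For example, once the server is up, any OpenAI-style client can POST a standard chat-completions request. (Assumption: the serve package registers the usual /v1/chat/completions route; check its package docs for the exact paths.)

```json
{
  "model": "gemma-3-1b-q4",
  "messages": [
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "max_tokens": 64
}
```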

Classify text with structured output
model, _ := inference.Load("gemma-3-1b-q4")
defer model.Close()

prompt := `Classify this support ticket as "billing", "technical", or "general".
Reply with only the category name.

Ticket: I can't log in to my account after changing my password.

Category:`

category, _ := model.Generate(ctx, prompt,
    inference.WithMaxTokens(4),
    inference.WithTemperature(0),
)
fmt.Println(strings.TrimSpace(category)) // technical
Generate code
model, _ := inference.Load("gemma-3-2b-q4") // Larger model for code tasks
defer model.Close()

code, _ := model.Generate(ctx,
    "Write a Go function that reverses a string. Return only the code.",
    inference.WithMaxTokens(128),
    inference.WithTemperature(0.2),
)
fmt.Println(code)
Process a batch of inputs
model, _ := inference.Load("gemma-3-1b-q4")
defer model.Close()

questions := []string{
    "What is 2+2?",
    "Name the largest ocean.",
    "Who wrote Hamlet?",
}

for _, q := range questions {
    answer, _ := model.Generate(ctx, q, inference.WithMaxTokens(32))
    fmt.Printf("Q: %s\nA: %s\n\n", q, answer)
}
Models

Model          Size     RAM   Best for
gemma-3-1b-q4  ~700 MB  2 GB  Laptops, CI, edge devices, quick tasks
gemma-3-2b-q4  ~1.5 GB  4 GB  Code generation, longer reasoning
llama-3-8b-q4  ~4.5 GB  8 GB  Complex tasks, higher quality output

All models run on CPU out of the box. Add inference.WithDevice("cuda") for GPU acceleration.

Supported Architectures

Gemma 3, LLaMA 3, Mistral, Qwen 2.5, DeepSeek, Phi-4 — in GGUF or ZMF format, with F32 or Q4_0 quantization.

CLI

go install github.com/zerfoo/zerfoo/cmd/zerfoo@latest

zerfoo run gemma-3-1b-q4             # Interactive chat
zerfoo serve gemma-3-1b-q4           # OpenAI-compatible API on :8080
zerfoo predict -model gemma-3-1b-q4  # Batch inference from CSV/JSON

Performance

Metric                     Value
Gemma 3 2B Q4 CPU (ARM64)  3.60 tok/s
CUDA Q4 GEMM (GB10)        2,383 GFLOPS
Q4 model compression       3.7x smaller than F32
PagedAttention savings     46% less memory

How It Works

Zerfoo is a full ML framework written in Go — tensors, computation graphs, automatic differentiation, SIMD kernels, and CUDA support. The inference package wraps all of that into the simple API shown above.

Under the hood, inference.Load does:

  1. Downloads the model (or loads from cache)
  2. Memory-maps the weights (zero-copy, no heap allocation)
  3. Builds a static computation graph with optimized fused kernels
  4. Returns a *Model ready for generation
inference.Load("gemma-3-1b-q4")
    │
    ├── model/gguf    → Parse GGUF file, load Q4 weights
    ├── graph/        → Build computation DAG, fold transposes
    ├── compute/      → CPU engine with NEON/AVX2 SIMD, fused RMSNorm/RoPE
    └── generate/     → Autoregressive decode with PagedKV cache

Building with CUDA

# CPU only (default, no Cgo)
go build ./cmd/zerfoo

# With GPU support
CGO_CFLAGS='-I/usr/local/cuda/include' \
CGO_LDFLAGS='-L/usr/local/cuda/lib64' \
go build -tags cuda ./cmd/zerfoo

Project Structure

inference/       Load models and generate text (start here)
serve/           OpenAI-compatible HTTP server
compute/         Engine interface (34 ops), CPU and CUDA backends
graph/           Computation DAG with automatic differentiation
layers/          40+ layer types (attention, normalization, activations)
tensor/          N-dimensional arrays with Q4/Q8 quantized storage
generate/        Token sampling, speculative decoding, PagedKV cache
model/           ZMF and GGUF model format loaders
training/        SGD, Adam, AdamW optimizers and training loops
internal/cuda/   CUDA kernels (Q4 GEMM, Flash Attention)
internal/xblas/  NEON/AVX2 SIMD matrix multiply
distributed/     gRPC-based distributed training

Contributing

See docs/design.md for architecture decisions.

go test ./... -race -timeout 120s
golangci-lint run ./...

License

Apache 2.0

Documentation

Overview

Package zerfoo provides the core building blocks for creating and training neural networks. It offers a prelude of commonly used types to simplify development and enhance readability of model construction code.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func BuildFromZMF added in v0.3.0

func BuildFromZMF[T tensor.Numeric](
	engine compute.Engine[T],
	ops numeric.Arithmetic[T],
	m *zmf.Model,
	opts ...model.BuildOption,
) (*graph.Graph[T], error)

BuildFromZMF builds a graph from a ZMF model.

func NewAdamW

func NewAdamW[T tensor.Numeric](learningRate, beta1, beta2, epsilon, weightDecay T) *optimizer.AdamW[T]

NewAdamW creates a new AdamW optimizer.

func NewCPUEngine

func NewCPUEngine[T tensor.Numeric]() compute.Engine[T]

NewCPUEngine creates a new CPU engine for the given numeric type.

func NewDefaultTrainer

func NewDefaultTrainer[T tensor.Numeric](
	g *graph.Graph[T],
	lossNode graph.Node[T],
	opt optimizer.Optimizer[T],
	strategy training.GradientStrategy[T],
) *training.DefaultTrainer[T]

NewDefaultTrainer creates a new default trainer.

func NewFloat32Ops

func NewFloat32Ops() numeric.Arithmetic[float32]

NewFloat32Ops returns the float32 arithmetic operations.

func NewGraph

func NewGraph[T tensor.Numeric](engine compute.Engine[T]) *graph.Builder[T]

NewGraph creates a new computation graph.

func NewMSE

func NewMSE[T tensor.Numeric](engine compute.Engine[T]) *loss.MSE[T]

NewMSE creates a new Mean Squared Error loss function.

func NewRMSNorm

func NewRMSNorm[T tensor.Numeric](
	name string,
	engine compute.Engine[T],
	ops numeric.Arithmetic[T],
	modelDim int,
	options ...normalization.RMSNormOption[T],
) (*normalization.RMSNorm[T], error)

NewRMSNorm is a factory function for creating RMSNorm layers.

func NewTensor

func NewTensor[T tensor.Numeric](shape []int, data []T) (*tensor.TensorNumeric[T], error)

NewTensor creates a new tensor with the given shape and data.

func RegisterLayer

func RegisterLayer[T tensor.Numeric](opType string, builder model.LayerBuilder[T])

RegisterLayer registers a new layer builder.

func UnregisterLayer

func UnregisterLayer(opType string)

UnregisterLayer unregisters a layer builder.

Types

type Batch

type Batch[T tensor.Numeric] struct {
	Inputs  map[graph.Node[T]]*tensor.TensorNumeric[T]
	Targets *tensor.TensorNumeric[T]
}

Batch represents a training batch.

type Engine

type Engine[T tensor.Numeric] interface {
	compute.Engine[T]
}

Engine represents a computation engine (e.g., CPU).

type Graph

type Graph[T tensor.Numeric] struct {
	*graph.Graph[T]
}

Graph represents a computation graph.

type LayerBuilder

type LayerBuilder[T tensor.Numeric] func(
	engine compute.Engine[T],
	ops numeric.Arithmetic[T],
	name string,
	params map[string]*graph.Parameter[T],
	attributes map[string]interface{},
) (graph.Node[T], error)

LayerBuilder is a function that builds a layer.

type Node

type Node[T tensor.Numeric] interface {
	graph.Node[T]
}

Node represents a node in the computation graph.

type Numeric

type Numeric tensor.Numeric

Numeric represents a numeric type constraint.

type Parameter

type Parameter[T tensor.Numeric] struct {
	*graph.Parameter[T]
}

Parameter represents a trainable parameter in the model.

type Tensor

type Tensor[T tensor.Numeric] struct {
	*tensor.TensorNumeric[T]
}

Tensor represents a multi-dimensional array.

type ZMFModel added in v0.3.0

type ZMFModel = zmf.Model

ZMFModel is an alias for a ZMF model.

Directories

Path                Synopsis
cmd
    bench-compare   Command bench-compare compares two NDJSON benchmark result files and outputs a markdown regression report.
    bench_tps       bench_tps measures tokens-per-second for a local ZMF model.
    cli             Package cli provides a generic command-line interface framework for Zerfoo.
    coverage-gate   Command coverage-gate reads a Go coverage profile and fails if any testable package drops below the configured coverage threshold.
    debug-infer
    zerfoo
    zerfoo-predict
    zerfoo-tokenize
compute             Package compute implements tensor computation engines and operations.
config              Package config provides file-based configuration loading with validation and environment variable overrides.
device              Package device provides device abstraction and memory allocation interfaces.
distributed         Package distributed provides distributed training strategies and coordination mechanisms for multi-node machine learning workloads in the Zerfoo framework.
    coordinator     Package coordinator provides a distributed training coordinator.
    pb
graph               Package graph provides a computational graph abstraction.
health              Package health provides HTTP health check endpoints for Kubernetes-style liveness and readiness probes.
inference           Package inference provides a high-level API for loading models and generating text with minimal boilerplate.
internal
    clblast         Package clblast provides Go wrappers for the CLBlast BLAS library.
    codegen         Package codegen generates CUDA megakernel source code from a compiled ExecutionPlan instruction tape.
    cublas          Package cublas provides low-level purego bindings for the cuBLAS library.
    cuda            Package cuda provides low-level bindings for the CUDA runtime API using dlopen/dlsym (no CGo).
    cuda/kernels    Package kernels provides Go wrappers for custom CUDA kernels.
    cudnn           Package cudnn provides purego bindings for the NVIDIA cuDNN library.
    gpuapi          Package gpuapi defines internal interfaces for GPU runtime operations.
    hip             Package hip provides low-level bindings for the AMD HIP runtime API using purego dlopen.
    hip/kernels     Package kernels provides Go wrappers for custom HIP kernels via purego dlopen.
    miopen          Package miopen provides low-level bindings for the AMD MIOpen library using purego dlopen.
    nccl            Package nccl provides CGo bindings for the NVIDIA Collective Communications Library (NCCL).
    opencl          Package opencl provides Go wrappers for the OpenCL 2.0 runtime API.
    opencl/kernels  Package kernels provides OpenCL kernel source and dispatch for elementwise operations.
    rocblas         Package rocblas provides low-level bindings for the AMD rocBLAS library using purego dlopen.
    tensorrt        Package tensorrt provides bindings for the NVIDIA TensorRT inference library via purego (dlopen/dlsym, no CGo).
    workerpool      Package workerpool provides a persistent pool of goroutines that process submitted tasks.
layers
    activations     Package activations provides activation function layers.
    attention       Package attention provides attention mechanisms for neural networks.
    components      Package components provides reusable components for neural network layers.
    core            Package core provides core neural network layer implementations.
    embeddings      Package embeddings provides neural network embedding layers for the Zerfoo ML framework.
    gather          Package gather provides the Gather layer for the Zerfoo ML framework.
    hrm             Package hrm implements the Hierarchical Reasoning Model.
    normalization   Package normalization provides various normalization layers for neural networks.
    reducesum       Package reducesum provides the ReduceSum layer for the Zerfoo ML framework.
    registry        Package registry provides a central registration point for all layer builders.
    regularization  Package regularization provides regularization layers for neural networks.
    sequence        Package sequence provides sequence modeling layers such as State Space Models.
    transformer     Package transformer provides transformer building blocks such as the Transformer `Block` used in encoder/decoder stacks.
    transpose       Package transpose provides the Transpose layer for the Zerfoo ML framework.
log                 Package log provides a structured, leveled logging abstraction.
    runtime         Package runtime provides a backend-agnostic metrics collection abstraction for runtime observability.
model               Package model provides adapter implementations for bridging existing and new model interfaces.
    gguf            Package gguf implements a pure-Go parser for the GGUF v3 model format used by llama.cpp.
    hrm             Package hrm provides experimental Hierarchical Reasoning Model types.
numeric             Package numeric provides precision types, arithmetic operations, and generic constraints for the Zerfoo ML framework.
pkg
    tokenizer       Package tokenizer provides text tokenization for ML model inference.
serve               Package serve provides an OpenAI-compatible HTTP API server for model inference.
shutdown            Package shutdown provides orderly shutdown coordination using context cancellation and cleanup callbacks.
tensor              Package tensor provides a multi-dimensional array (tensor) implementation.
testing
    testutils       Package testutils provides testing utilities and mock implementations for the Zerfoo ML framework.
tests
training            Package training provides adapter implementations for bridging existing and new interfaces.
    loss            Package loss provides various loss functions for neural networks.
    optimizer       Package optimizer provides various optimization algorithms for neural networks.
types               Package types contains shared, fundamental types for the Zerfoo framework.
