
Understanding Model Serialization

This guide explains how Loom saves and loads neural network models—what actually gets stored, how the formats work, and how to use serialization for different deployment scenarios.


What Gets Saved?

When you save a model, you're capturing:

  1. Architecture: The structure of the network (grid size, layer types, configuration)
  2. Weights: The learned parameters (millions of floating point numbers)
  3. Metadata: Version info, model ID, creation time

Text
Saved Model File
┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│  Architecture                                                   │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ Grid: 2×3, LayersPerCell: 2                               │  │
│  │                                                           │  │
│  │ Layer[0,0,0]: Dense, 1024→512, ReLU                       │  │
│  │ Layer[0,0,1]: Dense, 512→256, ReLU                        │  │
│  │ Layer[0,1,0]: Attention, heads=8, dim=256                 │  │
│  │ ...                                                       │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  Weights (encoded)                                              │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ Format: Base64                                            │  │
│  │ Data: "eyJ0eXBlIjoiZmxvYXQzMi1hcnJheSIsImxl..."           │  │
│  │       (millions of numbers compressed to text)            │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  Metadata                                                       │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ ID: "my-classifier-v1"                                    │  │
│  │ Type: "modelhost/bundle"                                  │  │
│  │ Version: 1                                                │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

The Bundle Format

Loom uses a "bundle" format that can contain multiple models. This is useful for:

  • Encoder-decoder pairs
  • Ensemble models
  • Different versions of the same model

Json
{
  "type": "modelhost/bundle",
  "version": 1,
  "models": [
    {
      "id": "encoder",
      "cfg": { ... architecture ... },
      "weights": { ... encoded weights ... }
    },
    {
      "id": "decoder",
      "cfg": { ... architecture ... },
      "weights": { ... encoded weights ... }
    }
  ]
}

Even single models use this format (just with one entry in the models array). This keeps the format consistent.
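
If you need to inspect a bundle programmatically, its JSON shape maps naturally onto Go types. Here is a minimal sketch, assuming only the field names visible in the JSON above (Loom's actual internal types may differ):

Go
import "encoding/json"

// Illustrative types mirroring the bundle JSON above; these are not
// Loom's internal definitions.
type ModelEntry struct {
    ID      string          `json:"id"`
    Cfg     json.RawMessage `json:"cfg"`     // architecture configuration
    Weights json.RawMessage `json:"weights"` // encoded weight payload
}

type Bundle struct {
    Type    string       `json:"type"`    // always "modelhost/bundle"
    Version int          `json:"version"` // format version (currently 1)
    Models  []ModelEntry `json:"models"`  // one entry per model
}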


How Weights Are Encoded

The challenging part is encoding millions of floating-point numbers efficiently. Here's what happens:

Step 1: Binary Conversion

Float32 values are converted to their binary representation:

Text
Float32: 3.14159...

In memory (IEEE 754):
    Sign:     0 (positive)
    Exponent: 10000000 (128, meaning 2^1)
    Mantissa: 10010010000111111011011

Binary bytes: [0x40, 0x49, 0x0F, 0xDB]

Step 2: Base64 Encoding

Binary data is converted to text using Base64, which uses only safe ASCII characters:

Text
Binary bytes:  [0x40, 0x49, 0x0F, 0xDB, ...]
                         ↓
Base64 string: "QEkP2w=="   (roughly 33% larger than binary)
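
Both steps can be reproduced with the Go standard library alone. A small self-contained sketch that mirrors the diagrams above (this is not Loom's internal encoder):

Go
package main

import (
    "encoding/base64"
    "encoding/binary"
    "fmt"
    "math"
)

func main() {
    // Step 1: float32 -> IEEE 754 bits -> bytes (big-endian here,
    // matching the [0x40, 0x49, 0x0F, 0xDB] example above).
    bits := math.Float32bits(float32(math.Pi)) // 0x40490FDB
    buf := make([]byte, 4)
    binary.BigEndian.PutUint32(buf, bits)

    // Step 2: bytes -> Base64 text, safe to embed in JSON.
    fmt.Printf("bytes:  % X\n", buf)                               // 40 49 0F DB
    fmt.Println("base64:", base64.StdEncoding.EncodeToString(buf)) // QEkP2w==
}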

Why Base64?

JSON can't directly contain binary data, since arbitrary bytes may include null bytes or invalid UTF-8. Base64 ensures the weights are safe to embed in JSON:

Text
Direct binary in JSON: BROKEN
    {"weights": "AB\x00CD\xFF..."}
              ↑              ↑
         Null byte       Invalid UTF-8
         breaks JSON     breaks JSON

Base64 in JSON: SAFE
    {"weights": "QEkP2w/rABC..."}
                 ↑
    Only letters, numbers, +, /, =

File-Based Serialization

Saving a Single Model

Go
err := network.SaveModel("model.json", "my-classifier")

What happens:

  1. Network traverses all layers
  2. For each layer: records type, size, activation
  3. For each weight matrix: converts to bytes, then Base64
  4. Writes JSON to file

Text
Network in memory                    model.json on disk
┌─────────────────┐                 ┌─────────────────────┐
│ Grid: 2×2       │────────────────▶│ {                   │
│ Layers: [...]   │   SaveModel()   │   "type": "...",    │
│ Weights: [...]  │                 │   "models": [...]   │
└─────────────────┘                 │ }                   │
                                    └─────────────────────┘

Loading a Model

Go
network, err := nn.LoadModel("model.json", "my-classifier")

What happens:

  1. Reads JSON from file
  2. Parses architecture configuration
  3. Creates empty network with correct structure
  4. Decodes Base64 weights back to floats
  5. Populates network with weights

Text
model.json                          Network in memory
┌─────────────────────┐            ┌─────────────────┐
│ {                   │            │ Grid: 2×2       │
│   "models": [{      │──────────▶ │ Layers: [Dense, │
│     "cfg": {...},   │ LoadModel()│          Softmax]│
│     "weights": "..."│            │ Weights: [1.2,  │
│   }]                │            │          -0.5,  │
│ }                   │            │          ...]   │
└─────────────────────┘            └─────────────────┘

JSON Configuration (Advanced)

Loom allows you to define complex architectures directly in JSON using nn.BuildNetworkFromJSON. This is particularly useful for recursive and parallel structures.

KMeans Layer Fields

Text
Field                Type     Description
─────────────────────────────────────────────────────────────────
type                 string   Must be "kmeans"
num_clusters         int      Number of centroids (K)
kmeans_output_mode   string   "probabilities", "features", or "reconstruction"
attached_layer       object   Recursive: a full LayerDefinition for the internal sub-network

Example:

Json
{
  "type": "kmeans",
  "num_clusters": 8,
  "attached_layer": {
    "type": "dense", "input_size": 16, "output_size": 16, "activation": "tanh"
  }
}

Parallel Layer (Gated MoE) Fields

Text
Field                Type     Description
─────────────────────────────────────────────────────────────────
type                 string   Must be "parallel"
branches             array    List of LayerDefinition objects, one per expert
combine_mode         string   "concat", "add", "avg", "filter", or "grid_scatter"
filter_gate          object   LayerDefinition for the gate network (when mode is "filter")
filter_softmax       string   Gate normalization: "standard", "sparsemax", etc.
filter_temperature   float    Softness/sharpness of routing (default: 1.0)

Example:

Json
{
  "type": "parallel",
  "combine_mode": "filter",
  "filter_softmax": "standard",
  "filter_gate": { "type": "dense", "input_size": 16, "output_size": 2 },
  "branches": [
    { "type": "dense", "output_size": 8 },
    { "type": "kmeans", "num_clusters": 4 }
  ]
}
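
To instantiate one of these definitions at runtime, pass the JSON to nn.BuildNetworkFromJSON. A hedged sketch follows; the signature is assumed here (JSON string in, network and error out), so check the API reference for the authoritative form:

Go
def := `{
  "type": "parallel",
  "combine_mode": "filter",
  "filter_softmax": "standard",
  "filter_gate": { "type": "dense", "input_size": 16, "output_size": 2 },
  "branches": [
    { "type": "dense", "output_size": 8 },
    { "type": "kmeans", "num_clusters": 4 }
  ]
}`

// Assumed signature; consult the API reference.
network, err := nn.BuildNetworkFromJSON(def)
if err != nil {
    return err
}
_ = network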

String-Based Serialization

Sometimes you don't have a file system—for example, in WebAssembly or when sending models over a network.

Saving to String

Go
jsonString, err := network.SaveModelToString("my-classifier")
// jsonString is now a complete JSON representation

The output is exactly the same as file-based, but returned as a string instead of written to disk.

Loading from String

Go
network, err := nn.LoadModelFromString(jsonString, "my-classifier")
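
A common pattern is a full round trip: serialize on one side of a boundary (server, worker, browser), move the string, and restore it on the other side. Using only the two calls shown above:

Go
// Round trip: model -> string -> model, no file system involved.
jsonString, err := network.SaveModelToString("my-classifier")
if err != nil {
    return err
}

// ...send jsonString over HTTP, store it in a database row, etc...

restored, err := nn.LoadModelFromString(jsonString, "my-classifier")
if err != nil {
    return err
}
_ = restored // ready for inference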

Use Cases

Text
File-based:                         String-based:
┌─────────────┐                     ┌─────────────┐
│ Desktop app │                     │ Browser     │
│ Server      │                     │ WebAssembly │
│ CLI tools   │                     │ REST API    │
│ Notebooks   │                     │ Database    │
└─────────────┘                     │ Serverless  │
                                    │ Mobile      │
                                    └─────────────┘

File-based works when you have                String-based works when
disk access and want persistence.             you need to move models
                                              around without files.

Precision Options

Loom supports all 13 SafeTensors dtypes, ranging from double precision down to 4-bit quantization:

Text
Precision        Size       Range                 Use case
─────────────────────────────────────────────────────────────────────
float64 (F64)    8 bytes    ±10^308               High-precision research
float32 (F32)    4 bytes    ±10^38                Standard training
float16 (F16)    2 bytes    ±65504                Standard inference, GPU
bfloat16 (BF16)  2 bytes    ±10^38                Modern LLM inference
float4 (F4)      0.5 bytes  [0.25, 3.0]           High-compression (8x vs F32)
int64 (I64)      8 bytes    ±9 quintillion        Large integer networks
int32 (I32)      4 bytes    ±2 billion            Standard integer networks
int16 (I16)      2 bytes    ±32767                Quantized models
int8 (I8)        1 byte     ±127                  Edge devices, mobile
uint64 (U64)     8 bytes    0 to 18 quintillion   Unsigned offsets
uint32 (U32)     4 bytes    0 to 4 billion        Unsigned indices
uint16 (U16)     2 bytes    0 to 65535            Unsigned textures/images
uint8 (U8)       1 byte     0 to 255              Standard image data

FP4 (4-bit Float) Support

Loom implements the E2M1 format (1 sign bit, 2 exponent bits, 1 mantissa bit) for extreme model compression. Using a "shift-and-scale" quantization strategy, it maintains >99% quality while using only 0.5 bytes per parameter.

How It Works

Text
Original weights (float32):     [0.523, -0.127, 0.891, ...]
                                   ↓
                              Quantize to int8
                                   ↓
Quantized (int8):              [67, -16, 114, ...]
                                + scale factor: 0.00785
                                + zero point: 0
                                   ↓
                              Encode to file
                                   ↓
On disk: compact int8 representation (4× smaller!)
                                   ↓
                              Decode on load
                                   ↓
Restored float32:              [0.526, -0.126, 0.895, ...]
                                (small precision loss)
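
The arithmetic above is ordinary symmetric int8 quantization: one scale factor per tensor, zero point 0. A minimal sketch of the idea (illustrative only, not Loom's internal quantizer):

Go
import "math"

// quantizeInt8 maps float32 weights onto int8 using a single scale
// factor. Dequantization is float32(q[i]) * scale, which recovers the
// original values up to a small rounding loss, as shown above.
func quantizeInt8(weights []float32) ([]int8, float32) {
    var maxAbs float64
    for _, w := range weights {
        if a := math.Abs(float64(w)); a > maxAbs {
            maxAbs = a
        }
    }
    if maxAbs == 0 {
        maxAbs = 1 // avoid division by zero for an all-zero tensor
    }
    scale := maxAbs / 127 // the largest-magnitude weight maps to ±127
    q := make([]int8, len(weights))
    for i, w := range weights {
        q[i] = int8(math.Round(float64(w) / scale))
    }
    return q, float32(scale)
}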

Size Comparison

Text
1 million weights:

float64:    8 MB
float32:    4 MB  ← Standard
float16:    2 MB  ← GPU inference
int8:       1 MB  ← Edge/mobile

For a 7B parameter model:
    float32: 28 GB
    int8:    7 GB   ← Actually deployable on consumer hardware!

Loading External Models: SafeTensors

HuggingFace models are often stored in "SafeTensors" format. Loom can load these directly.

What is SafeTensors?

SafeTensors is a simple format for storing tensors:

Text
SafeTensors File Structure:

┌───────────────────────────────────────────────────────┐
│ Header (JSON)                                         │
│ ┌───────────────────────────────────────────────────┐ │
│ │ {                                                 │ │
│ │   "model.embed_tokens.weight": {                  │ │
│ │     "dtype": "F16",                               │ │
│ │     "shape": [151552, 576],                       │ │
│ │     "data_offsets": [0, 174635008]                │ │
│ │   },                                              │ │
│ │   "model.layers.0.self_attn.q_proj.weight": {     │ │
│ │     "dtype": "F16",                               │ │
│ │     "shape": [576, 576],                          │ │
│ │     "data_offsets": [174635008, 174967424]        │ │
│ │   },                                              │ │
│ │   ...                                             │ │
│ │ }                                                 │ │
│ └───────────────────────────────────────────────────┘ │
├───────────────────────────────────────────────────────┤
│ Binary Data                                           │
│ ┌───────────────────────────────────────────────────┐ │
│ │ [raw bytes for embed_tokens.weight]               │ │
│ │ [raw bytes for q_proj.weight]                     │ │
│ │ [raw bytes for k_proj.weight]                     │ │
│ │ ...                                               │ │
│ └───────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────┘
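
Reading the header yourself takes only a few lines. Per the published safetensors spec, a file starts with an 8-byte little-endian length, followed by that many bytes of JSON, followed by the raw tensor data. A minimal sketch (not Loom's loader):

Go
import (
    "encoding/binary"
    "encoding/json"
    "errors"
    "os"
)

// readSafetensorsHeader returns the raw header entries keyed by tensor
// name. The binary data section begins at byte 8+headerLen, which is
// what each entry's data_offsets are relative to.
func readSafetensorsHeader(path string) (map[string]json.RawMessage, error) {
    data, err := os.ReadFile(path)
    if err != nil {
        return nil, err
    }
    if len(data) < 8 {
        return nil, errors.New("file too short for a safetensors header")
    }
    headerLen := int(binary.LittleEndian.Uint64(data[:8]))
    if headerLen < 0 || len(data) < 8+headerLen {
        return nil, errors.New("truncated safetensors header")
    }
    var header map[string]json.RawMessage
    if err := json.Unmarshal(data[8:8+headerLen], &header); err != nil {
        return nil, err
    }
    return header, nil
}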

Loading & Saving Network Weights

While LoadSafeTensors handles raw tensors, the Network object provides high-level methods to save and load model weights directly in the SafeTensors format.

Go
// 1. Save network weights; supported precisions include
//    "F32", "F64", "F16", "BF16", "F4", "I8", etc.
err := network.SaveWeightsToSafetensors("model.safetensors")

// 2. Load weights back into an existing network architecture
err = network.LoadWeightsFromSafetensors("model.safetensors")

These methods are the preferred way to persist models in Loom: they take care of byte conversion and layer mapping automatically.

Data Type Handling

SafeTensors may store weights in float16 or bfloat16. Loom automatically converts:

Text
File contains:     float16 (2 bytes each)
                        ↓
                   Auto-convert
                        ↓
In memory:         float32 (4 bytes each)

You don't need to worry about the conversion!
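
As an illustration of why the conversion is cheap: bfloat16 keeps float32's sign and exponent bits and truncates the mantissa, so widening it back is a single bit shift (float16 takes slightly more work). A one-liner showing what the auto-conversion does for BF16:

Go
import "math"

// bf16ToF32 widens a bfloat16 value (passed as raw bits) to float32.
// BF16 is the top 16 bits of a float32, so this is exact.
func bf16ToF32(bits uint16) float32 {
    return math.Float32frombits(uint32(bits) << 16)
}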

Generic Model Loading

What if you have an unknown model format? Loom can auto-detect:

Go
network, detected, err := nn.LoadGenericFromBytes(weightsData, configData)

The Detection Process

Text
Input: mystery safetensors file
              │
              ▼
┌─────────────────────────────────────────────┐
│ 1. Parse safetensors header                 │
│    • Extract tensor names and shapes        │
└─────────────────────┬───────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────┐
│ 2. Analyze tensor names                     │
│    • "model.layers.0.self_attn.q_proj"     │
│      → Looks like attention                 │
│    • "model.embed_tokens"                   │
│      → Looks like embedding                 │
│    • "model.layers.0.mlp.gate_proj"        │
│      → Looks like SwiGLU                    │
└─────────────────────┬───────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────┐
│ 3. Build network architecture               │
│    • Create layers matching detected types  │
│    • Wire them together appropriately       │
│    • Load weights into correct layers       │
└─────────────────────────────────────────────┘
                      │
                      ▼
           Ready-to-use Network!
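
Step 2 boils down to pattern-matching on tensor names. A toy version of that heuristic, using the same name patterns shown in the diagram (hypothetical helper; the real detector is more thorough):

Go
import "strings"

// classifyTensor is a toy name-based heuristic, not Loom's detector.
func classifyTensor(name string) string {
    switch {
    case strings.Contains(name, "self_attn"):
        return "Attention"
    case strings.Contains(name, "embed_tokens"):
        return "Embedding"
    case strings.Contains(name, "mlp.gate_proj"):
        return "SwiGLU"
    default:
        return "Unknown"
    }
}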

The detected Return Value

The function returns detected tensor info for inspection:

Go
for _, t := range detected {
    fmt.Printf("%s: %v (%s)\n", t.Name, t.Shape, t.Type)
}

// Output:
// model.embed_tokens.weight: [151552, 576] (Embedding)
// model.layers.0.self_attn.q_proj.weight: [576, 576] (Attention)
// model.layers.0.self_attn.k_proj.weight: [192, 576] (Attention)
// ...

Transformer-Specific Loading

For Llama-style transformers, there's a specialized loader:

Go
network, err := nn.LoadTransformerFromSafetensors("./models/llama-7b/")

What It Understands

Text
Llama Architecture Pattern:

model.embed_tokens.weight           → Embedding layer
model.norm.weight                   → Final RMSNorm
lm_head.weight                      → Output projection

For each of N layers:
    model.layers.{i}.input_layernorm.weight    → Pre-attention norm
    model.layers.{i}.self_attn.q_proj.weight   → Query projection
    model.layers.{i}.self_attn.k_proj.weight   → Key projection
    model.layers.{i}.self_attn.v_proj.weight   → Value projection
    model.layers.{i}.self_attn.o_proj.weight   → Output projection
    model.layers.{i}.post_attention_layernorm.weight  → Pre-MLP norm
    model.layers.{i}.mlp.gate_proj.weight      → SwiGLU gate
    model.layers.{i}.mlp.up_proj.weight        → SwiGLU up
    model.layers.{i}.mlp.down_proj.weight      → SwiGLU down

Supported Models

  • Llama, Llama 2, Llama 3
  • Mistral
  • Qwen2.5
  • TinyLlama
  • SmolLM
  • Any model using the Llama architecture

Cross-Platform Deployment

One of Loom's strengths is that saved models work across all platforms:

Text
Model saved in Go
        │
        ▼
   model.json
        │
        ├──────────────────────────────────────────┐
        │                    │                     │
        ▼                    ▼                     ▼
┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│     Go      │      │   Browser   │      │   Python    │
│   Native    │      │    WASM     │      │   welvet    │
└─────────────┘      └─────────────┘      └─────────────┘
        │                    │                     │
        ▼                    ▼                     ▼
┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│    C/C++    │      │  TypeScript │      │     C#      │
│   via CABI  │      │   bindings  │      │   Welvet    │
└─────────────┘      └─────────────┘      └─────────────┘

Loading in Different Languages

Go (native):

Go
network, _ := nn.LoadModel("model.json", "my_model")
output, _ := network.ForwardCPU(input)

JavaScript (WASM):

Javascript
const json = await fetch('model.json').then(r => r.text());
const network = loom.LoadNetworkFromString(json, "my_model");
const output = network.ForwardCPU(inputArray);

Python (welvet):

Python
with open("model.json") as f:
    json_str = f.read()
network = welvet.load_model_from_string(json_str, "my_model")
output = network.forward_cpu(input_array)

C (CABI):

C
char* json = read_file("model.json");
Network* net = LoomLoadModel(json, "my_model");
float* output = LoomForward(net, input, input_len);

Practical Tips

Versioning Models

Include version in the model ID:

Go
network.SaveModel("checkpoints/model_v2.1.0.json", "classifier_v2.1.0")

Checkpointing During Training

Save periodically to recover from crashes:

Go
for epoch := 0; epoch < 1000; epoch++ {
    // ... training ...

    if epoch%100 == 0 {
        err := network.SaveModel(
            fmt.Sprintf("checkpoints/epoch_%04d.json", epoch),
            "training_checkpoint",
        )
        if err != nil {
            log.Printf("checkpoint save failed: %v", err)
        }
    }
}

Validating Loaded Models

After loading, verify the model works:

Go
loaded, err := nn.LoadModel("model.json", "my_model")
if err != nil {
    return err
}

// Test with known input
testInput := make([]float32, 1024)
output, err := loaded.ForwardCPU(testInput)
if err != nil {
    return err
}

// Check output is reasonable (non-empty, not NaN)
if len(output) == 0 || math.IsNaN(float64(output[0])) {
    return errors.New("model produces invalid output")
}

Summary

Serialization captures the complete state of a trained model:

  • Architecture: Layer types, sizes, configurations
  • Weights: Millions of learned parameters
  • Format: JSON with Base64-encoded binary weights

Key operations:

  • SaveModel / LoadModel - File-based
  • SaveModelToString / LoadModelFromString - String-based
  • LoadSafeTensors - HuggingFace format
  • LoadGenericFromBytes - Auto-detect format
  • LoadTransformerFromSafetensors - Llama-style models

The same model file works across Go, WASM, Python, C#, and C.