LOOM Neural Engine
Deterministic, CPU-first neural runtime with WebGPU acceleration. One JSON model format across Go, TypeScript/JavaScript, Python, C#, C-ABI, and WASM. Identical outputs across every platform.
What's New in v0.0.8
The "NeuroSymbolic" update brings recursive clustering, transformer components, and native MoE routing.
New KMeansLayer enables differentiable clustering and symbolic reasoning within the neural graph.
Native support for Multi-Head Attention (MHA), AdaNorm, GELU, and advanced Softmax variants.
Native Mixture-of-Experts routing using Grid Softmax and Grid Scatter modes.
Geometric weight interpolation combined with backpropagation for robust adaptation.
Full support for saving and loading complex, nested architectures using the Safetensors format.
Comprehensive benchmarking for int8, uint16, float64, and more.
GPU Acceleration Update: As of v0.0.8, WebGPU acceleration is enabled for standard Forward/Backward. Step-based execution, Neural Tweening, and K-Means currently run on CPU only.
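For orientation, here is a rough sketch of how the v0.0.8 layers could sit in the same layer-config list used in the quickstart examples below; the type strings and extra fields ("mha", "kmeans", Heads, Clusters) are illustrative assumptions, not the confirmed schema.

// Hypothetical sketch only: the "mha"/"kmeans" type strings and the Heads/Clusters
// fields are assumptions about the config schema, not confirmed names.
layers := []loom.LayerConfig{
    {Type: "dense", Width: 16, Height: 64, Activation: "gelu"},
    {Type: "mha", Width: 64, Heads: 4},       // Multi-Head Attention block
    {Type: "kmeans", Width: 64, Clusters: 8}, // differentiable clustering layer
    {Type: "dense", Width: 64, Height: 2, Activation: "softmax"},
}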
Beyond Traditional Networks
Loom introduces revolutionary concepts that break free from sequential bottlenecks, enabling parallel processing and spatial organization.
The Bottleneck
Traditional neural networks think sequentially, like cars at a red light: every layer must lock, process, and wait. If one layer stalls, the entire network grinds to a halt.
The Roundabout (Stepping Mode)
Loom replaces the red light with a roundabout. In Stepping Mode, data flow is continuous—early layers ingest new data while deeper layers think about previous context. Training and inference happen simultaneously.
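As a conceptual sketch of what that loop could look like, assuming a step-style API: the SetInput, StepForward, StepBackward, and Output method names below are placeholders chosen for illustration, not the library's confirmed interface.

// Conceptual sketch of Stepping Mode; method names are placeholders, not the confirmed API.
for sample := range inputStream {        // inputStream: a channel of feature vectors
    net.SetInput(sample)                 // early layers pick up the newest sample
    net.StepForward()                    // every layer advances one tick in parallel
    net.StepBackward()                   // gradients propagate without draining the pipeline
    if out, ready := net.Output(); ready {
        fmt.Println("prediction:", out)  // results emerge a few ticks after their input
    }
}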
The 3D Grid (Zig-Zag)
Traditional models are flat stacks of pancakes. Loom models live in a 3D grid. Signals don't just go up—they travel through space, zig-zagging through rows and columns for spatial organization.
Infinite Connections (Starburst)
With Parallel Linking, any part of the brain can talk to any other part instantly. A central hub broadcasts simultaneously to multiple corners—skipping the line entirely.
What Makes LOOM Different
Key points from the LOOM capability report: deterministic, cross-language, CPU-first.
Bit-for-bit parity across Go, Python, TypeScript/JS, C#, C, and WASM. CPU-first with optional WebGPU acceleration.
Same function names across languages: create, forward, train, save/load, evaluate. One JSON model format.
Dense, Conv2D, Multi-Head Attention, RNN/LSTM, LayerNorm, Residual, RMSNorm, SwiGLU, Softmax (10 variants, MoE).
Fine-grained forward/backward with manual gradient application for online/real-time learning scenarios (see the sketch below).
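As a rough illustration of that split, a real-time loop could look like the following; aside from ForwardCPU (shown in the Go quickstart), the BackwardCPU and ApplyGradients names are assumptions, not the confirmed API.

// Hypothetical online-learning loop: BackwardCPU and ApplyGradients are assumed names;
// only ForwardCPU appears in the quickstart example below.
for sample := range liveStream {
    pred := loom.ForwardCPU(net, sample.Input)          // run inference on the latest sample
    grads := loom.BackwardCPU(net, pred, sample.Target) // compute gradients against the target
    loom.ApplyGradients(net, grads, 0.001)              // apply them immediately, no batch barrier
}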
Code Examples
// TypeScript / JavaScript via @openfluke/welvet
import { createNetworkFromJSON, forward } from "@openfluke/welvet";

const model = {
  layers: [
    { type: "dense", width: 4, height: 8, activation: "relu" },
    { type: "dense", width: 8, height: 2, activation: "softmax" },
  ],
};

const net = createNetworkFromJSON(model);
const output = forward(net, [[0.1, 0.2, 0.3, 0.4]]);
console.log("output", output);
# Python via welvet
from welvet import create_network_from_json, forward

config = {
    "layers": [
        {"type": "dense", "width": 4, "height": 8, "activation": "relu"},
        {"type": "dense", "width": 8, "height": 2, "activation": "softmax"},
    ]
}

net = create_network_from_json(config)
print(forward(net, [[0.1, 0.5, 0.3, 0.7]]))
// Go via github.com/openfluke/loom
package main

import (
    "fmt"

    "github.com/openfluke/loom"
)

func main() {
    layers := []loom.LayerConfig{
        {Type: "dense", Width: 4, Height: 8, Activation: "relu"},
        {Type: "dense", Width: 8, Height: 2, Activation: "softmax"},
    }
    net, _ := loom.BuildNetworkFromJSON(layers) // error ignored for brevity
    out := loom.ForwardCPU(net, [][]float32{{0.1, 0.2, 0.3, 0.4}})
    fmt.Println(out)
}
Unique Advantages
What sets Loom apart from traditional runtimes—built for embedding, designed for portability.
Compiles into a single binary with zero external dependencies. No Python runtime, no C++ bridges—just deploy and run.
First-class C ABI and WebAssembly support. Train and infer in browsers, Python, C#, Rust, and Node.js with identical behavior.
"Neural Tweening" combines geometric gap-closing with backprop-guided momentum. Features Link Budget telemetry and Explosion Detection for self-healing training.
LayerParallel system supports arbitrary branching with Concat, Add, Average, Grid Scatter, and Softmax-Gated MoE. Native Inception, ResNeXt, and Siamese architectures (a rough sketch follows this list).
Generic tensor backend supports int8, uint16, and float32 natively. Quantization-aware training without post-processing wrappers.
Pure Go BPE implementation compatible with HuggingFace tokenizer.json files. No Rust or C++ dependencies required.
7 LR schedulers, 3 optimizers (SGD/AdamW/RMSprop), 10 softmax variants for MoE routing, and 5 activation functions with proper derivatives.
Runtime reflection via GetMethodsJSON(), ExtractNetworkBlueprint() for visualizing structure, and complete evaluation suite with deviation metrics.
Known Limitations: No central Model Zoo (relies on external checkpoints), WebGPU acceleration is beta/experimental, and coverage of advanced operators (3D Conv, Deformable Attention, FFTs) is limited compared to SciPy/JAX.
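As a rough illustration of the structural approach to branching (as opposed to hand-wiring experts), a gated parallel block might be declared like the sketch below; the "parallel" type, the Branches field, and the Combine values are assumptions about the config schema rather than confirmed names.

// Hypothetical sketch of a Softmax-Gated MoE block built structurally.
// "parallel", Branches, and Combine are assumed names, not the confirmed schema.
expert := []loom.LayerConfig{
    {Type: "dense", Width: 64, Height: 64, Activation: "gelu"},
}
moe := loom.LayerConfig{
    Type:     "parallel",
    Branches: [][]loom.LayerConfig{expert, expert, expert, expert}, // four experts
    Combine:  "softmax_gate", // other merge modes: concat, add, average, grid_scatter
}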
The AI Landscape
See how Loom compares to major industry engines and the Go ecosystem.
| Feature | Loom (Go) | PyTorch | TensorFlow | GoMLX | Spago | TF.js | Candle |
|---|---|---|---|---|---|---|---|
| Core | |||||||
| Runtime Dependency | None (Binary) | Heavy (Pip) | Binary | CGo/XLA | None | Browser | None |
| Auto-Differentiation | ⚠️ Hybrid | ✅ Full | ✅ Full | ✅ Full | ⚠️ Manual | ✅ Full | ✅ Full |
| Loading & Format | |||||||
| Safetensors | ✅ Native | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ |
| Structure Inference | ✅ Auto-Detect | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Training | |||||||
| Neural Tweening | ✅ Hybrid Engine | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| LR Schedulers | ✅ 7 Types | ✅ | ✅ | ✅ | ⚠️ Basic | ✅ | ✅ |
| Layers | |||||||
| Parallel / MoE | ✅ Structural | ⚠️ Manual | ⚠️ Manual | ❌ | ❌ | ❌ | ❌ |
| SwiGLU | ✅ Native | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ |
| Pure Go Tokenizer | ✅ | Rust/C++ | C++ | ❌ | ❌ | ❌ | ✅ |
| Platform | |||||||
| WASM Training | ✅ Full | ❌ | ❌ | ❌ | ❌ | ⚠️ Slow | ✅ |
| Cross-Lang C-ABI | ✅ Universal | ❌ | ❌ | ❌ | ❌ | ❌ | ⚠️ |
| Advanced | |||||||
| Step-Based Forward | ✅ Unique | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Dynamic Arch Gen | ✅ Built-in | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Network Grafting | ✅ Unique | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Feature | Loom | GoMLX | Gorgonia | Spago | Go-Deep | Gonum |
|---|---|---|---|---|---|---|
| Foundation | ||||||
| Implementation | Pure Go | CGo (XLA) | Pure Go + CGo | Pure Go | Pure Go | Pure Go |
| Autograd | ⚠️ Hybrid | ✅ Full | ✅ Symbolic | ✅ Dynamic | ✅ Backprop | ❌ |
| Model Loading | ||||||
| Safetensors | ✅ Native | ✅ | ❌ | ❌ | ❌ | ❌ |
| Architectures | ||||||
| Transformer (MHA) | ✅ Explicit | ✅ | ⚠️ Hard | ✅ (BERT) | ❌ | ❌ |
| RNN / LSTM | ✅ Full Gate | ✅ | ⚠️ Basic | ✅ BiLSTM | ❌ | ❌ |
| SwiGLU | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Parallel / MoE | ✅ Structural | ⚠️ Manual | ❌ | ❌ | ❌ | ❌ |
| Training | ||||||
| Hybrid Tweening | ✅ Unique | ❌ | ❌ | ❌ | ❌ | ❌ |
| Softmax Variants | ✅ 10 Types | ⚠️ Standard | ⚠️ Standard | ⚠️ Standard | ⚠️ Standard | ❌ |
| Advanced | ||||||
| RoPE (GQA) | ✅ GQA Support | ✅ | ❌ | ❌ | ❌ | ❌ |
| Network Grafting | ✅ Unique | ❌ | ❌ | ❌ | ❌ | ❌ |
| Step-Based Forward | ✅ Unique | ❌ | ❌ | ❌ | ❌ | ❌ |
| Dynamic Arch Gen | ✅ Unique | ❌ | ❌ | ❌ | ❌ | ❌ |
| Platform | ||||||
| C-ABI (Polyglot) | ✅ Universal | ❌ | ❌ | ❌ | ❌ | ❌ |
| WASM Training | ✅ Full | ❌ (XLA) | ❌ | ❌ | ❌ | ❌ |
| Ecosystem | ||||||
| Maintenance | 🔥 Active | 🔥 Active | ⚠️ Slow | ⏸️ Paused | ⚠️ Slow | 🔥 Active |
Native Numerical Type Support
Train and infer on any numerical type without wrappers or post-processing. Most runtimes require QAT (Quantization-Aware Training)—a multi-step process where you train in float32, then simulate lower precision during fine-tuning, then convert to int8 for deployment. Loom skips this entirely: define your types upfront and train natively on int8, uint16, or any supported type from the start.
| Numerical Type | Loom | GoMLX | Gorgonia | Spago | PyTorch |
|---|---|---|---|---|---|
| Float32 (Standard) | ✅ | ✅ | ✅ | ✅ (Float64) | ✅ |
| Float64 (High Precision) | ✅ Native | ✅ | ✅ | ✅ | ✅ |
| Float16 / BF16 | ✅ | ✅ (XLA) | ❌ | ❌ | ✅ |
| Int8 Training | ✅ Native | ❌ | ❌ | ❌ | ⚠️ QAT Wrapper |
| Int8 Inference | ✅ | ❌ | ❌ | ❌ | ✅ (Quant) |
| Int16, Int32, Int64 | ✅ Native | ✅ (XLA) | ⚠️ Tensor | ❌ | ❌ Tensor Only |
| Uint8, Uint16, Uint32 | ✅ Native | ✅ (XLA) | ⚠️ Tensor | ❌ | ✅ Uint8 Only |
Complete Type System
Unlike runtimes that treat integers primarily as storage formats for quantization, Loom's generics allow native training and inference on exotic types like uint16 (common in medical imaging), int32, or float64 (scientific simulations) across every layer type without changes to model code.
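A minimal sketch of what that looks like in practice, assuming a generic constructor: NewTypedNetwork and TrainCPU below are placeholder names chosen for illustration; the point is that the element type is fixed when the network is built rather than simulated by a quantization wrapper.

// Hypothetical: the element type is chosen up front and used end to end.
// NewTypedNetwork and TrainCPU are assumed names, not the confirmed API.
net8 := loom.NewTypedNetwork[int8](layers)    // int8 weights and activations from step one
net16 := loom.NewTypedNetwork[uint16](layers) // e.g. 16-bit medical-imaging inputs
loom.TrainCPU(net8, inputsI8, targetsI8)      // no float32 shadow copy, no post-hoc conversion
loom.TrainCPU(net16, inputsU16, targetsU16)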
When to Choose Each Engine
The right tool depends on your use case. Here's a quick decision guide.
Choose PyTorch for research, SOTA models, or complex dynamic architectures requiring the Python ecosystem.
Choose TensorFlow for robust mobile/edge deployment with a mature toolchain and optimized inference.
Choose GoMLX for high-performance training in Go when you can tolerate CGo and XLA C++ dependencies.
Choose Core ML for iOS/macOS exclusive apps leveraging Apple's Neural Engine and native integration.
Choose Loom for pure Go-native embedding (cloud/CLI/server), zero-dependency single binaries, Neural Tweening experimentation, Step-Based Forward for real-time inference, or Dynamic Architecture Generation for automated model exploration.