LOOM Neural Engine
Deterministic, CPU-first neural runtime with WebGPU acceleration. One JSON model format across Go, TypeScript/JavaScript, Python, C#, C-ABI, and WASM. Identical outputs across every platform.
What's New in v0.0.8
The "NeuroSymbolic" update brings recursive clustering, transformer components, and native MoE routing.
New KMeansLayer enables differentiable clustering and symbolic reasoning within the neural graph.
Native support for Multi-Head Attention (MHA), AdaNorm, GELU, and advanced Softmax variants.
Native Mixture-of-Experts routing using Grid Softmax and Grid Scatter modes.
Geometric weight interpolation combined with backpropagation for robust adaptation.
Full support for saving and loading complex, nested architectures using the Safetensors format.
Comprehensive benchmarking for int8, uint16, float64, and more.
GPU Acceleration Update: As of v0.0.8, WebGPU acceleration is enabled for standard Forward/Backward. Step-based execution, Neural Tweening, and K-Means currently run on CPU only.
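For orientation, here is a rough sketch of how the v0.0.8 layers could sit in the same layer-config list used in the quickstart examples below; the type strings and extra fields ("mha", "kmeans", Heads, Clusters) are illustrative assumptions, not the confirmed schema.

// Hypothetical sketch only: the "mha"/"kmeans" type strings and the Heads/Clusters
// fields are assumptions about the config schema, not confirmed names.
layers := []loom.LayerConfig{
    {Type: "dense", Width: 16, Height: 64, Activation: "gelu"},
    {Type: "mha", Width: 64, Heads: 4},       // Multi-Head Attention block
    {Type: "kmeans", Width: 64, Clusters: 8}, // differentiable clustering layer
    {Type: "dense", Width: 64, Height: 2, Activation: "softmax"},
}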
Beyond Traditional Networks
Loom introduces revolutionary concepts that break free from sequential bottlenecks, enabling parallel processing and spatial organization.
The Bottleneck
Traditional neural networks think sequentially, like cars at a red light: every layer must lock, process, and wait. If one layer stalls, the entire network grinds to a halt.
The Roundabout (Stepping Mode)
Loom replaces the red light with a roundabout. In Stepping Mode, data flow is continuous—early layers ingest new data while deeper layers think about previous context. Training and inference happen simultaneously.
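As a conceptual sketch of what that loop could look like, assuming a step-style API: the SetInput, StepForward, StepBackward, and Output method names below are placeholders chosen for illustration, not the library's confirmed interface.

// Conceptual sketch of Stepping Mode; method names are placeholders, not the confirmed API.
for sample := range inputStream {        // inputStream: a channel of feature vectors
    net.SetInput(sample)                 // early layers pick up the newest sample
    net.StepForward()                    // every layer advances one tick in parallel
    net.StepBackward()                   // gradients propagate without draining the pipeline
    if out, ready := net.Output(); ready {
        fmt.Println("prediction:", out)  // results emerge a few ticks after their input
    }
}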
The 3D Grid (Zig-Zag)
Traditional models are flat stacks of pancakes. Loom models live in a 3D grid. Signals don't just go up—they travel through space, zig-zagging through rows and columns for spatial organization.
Infinite Connections (Starburst)
With Parallel Linking, any part of the brain can talk to any other part instantly. A central hub broadcasts simultaneously to multiple corners—skipping the line entirely.
What Makes LOOM Different
Key points from the LOOM capability report: deterministic, cross-language, CPU-first.
Bit-for-bit parity across Go, Python, TypeScript/JS, C#, C, and WASM. CPU-first with optional WebGPU acceleration.
Same function names across languages: create, forward, train, save/load, evaluate. One JSON model format.
Dense, Conv2D, Multi-Head Attention, RNN/LSTM, LayerNorm, Residual, RMSNorm, SwiGLU, Softmax (10 variants, MoE).
Fine-grained forward/backward with manual gradient application for online/real-time learning scenarios (see the sketch below).
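As a rough illustration of that split, a real-time loop could look like the following; aside from ForwardCPU (shown in the Go quickstart), the BackwardCPU and ApplyGradients names are assumptions, not the confirmed API.

// Hypothetical online-learning loop: BackwardCPU and ApplyGradients are assumed names;
// only ForwardCPU appears in the quickstart example below.
for sample := range liveStream {
    pred := loom.ForwardCPU(net, sample.Input)          // run inference on the latest sample
    grads := loom.BackwardCPU(net, pred, sample.Target) // compute gradients against the target
    loom.ApplyGradients(net, grads, 0.001)              // apply them immediately, no batch barrier
}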
Code Examples
// TypeScript / JavaScript via @openfluke/welvet
import { createNetworkFromJSON, forward } from "@openfluke/welvet";

const model = {
  layers: [
    { type: "dense", width: 4, height: 8, activation: "relu" },
    { type: "dense", width: 8, height: 2, activation: "softmax" },
  ],
};

const net = createNetworkFromJSON(model);
const output = forward(net, [[0.1, 0.2, 0.3, 0.4]]);
console.log("output", output);
# Python via welvet
from welvet import create_network_from_json, forward

config = {
    "layers": [
        {"type": "dense", "width": 4, "height": 8, "activation": "relu"},
        {"type": "dense", "width": 8, "height": 2, "activation": "softmax"},
    ]
}

net = create_network_from_json(config)
print(forward(net, [[0.1, 0.5, 0.3, 0.7]]))
// Go via github.com/openfluke/loom
package main

import (
    "fmt"

    "github.com/openfluke/loom"
)

func main() {
    layers := []loom.LayerConfig{
        {Type: "dense", Width: 4, Height: 8, Activation: "relu"},
        {Type: "dense", Width: 8, Height: 2, Activation: "softmax"},
    }
    net, _ := loom.BuildNetworkFromJSON(layers) // error ignored for brevity
    out := loom.ForwardCPU(net, [][]float32{{0.1, 0.2, 0.3, 0.4}})
    fmt.Println(out)
}
Unique Advantages
What sets Loom apart from traditional runtimes—built for embedding, designed for portability.
Compiles into a single binary with zero external dependencies. No Python runtime, no C++ bridges—just deploy and run.
First-class C ABI and WebAssembly support. Train and infer in browsers, Python, C#, Rust, and Node.js with identical behavior.
"Neural Tweening" combines geometric gap-closing with backprop-guided momentum. Features Link Budget telemetry and Explosion Detection for self-healing training.
LayerParallel system supports arbitrary branching with Concat, Add, Average, Grid Scatter, and Softmax-Gated MoE. Native Inception, ResNeXt, and Siamese architectures (a rough sketch follows this list).
Generic tensor backend supports int8, uint16, and float32 natively. Quantization-aware training without post-processing wrappers.
Pure Go BPE implementation compatible with HuggingFace tokenizer.json files. No Rust or C++ dependencies required.
7 LR schedulers, 3 optimizers (SGD/AdamW/RMSprop), 10 softmax variants for MoE routing, and 5 activation functions with proper derivatives.
Runtime reflection via GetMethodsJSON(), ExtractNetworkBlueprint() for visualizing structure, and complete evaluation suite with deviation metrics.
Known Limitations: No central Model Zoo (relies on external checkpoints), WebGPU acceleration is beta/experimental, and coverage of advanced operators (3D Conv, Deformable Attention, FFTs) is limited compared to SciPy/JAX.
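As a rough illustration of the structural approach to branching (as opposed to hand-wiring experts), a gated parallel block might be declared like the sketch below; the "parallel" type, the Branches field, and the Combine values are assumptions about the config schema rather than confirmed names.

// Hypothetical sketch of a Softmax-Gated MoE block built structurally.
// "parallel", Branches, and Combine are assumed names, not the confirmed schema.
expert := []loom.LayerConfig{
    {Type: "dense", Width: 64, Height: 64, Activation: "gelu"},
}
moe := loom.LayerConfig{
    Type:     "parallel",
    Branches: [][]loom.LayerConfig{expert, expert, expert, expert}, // four experts
    Combine:  "softmax_gate", // other merge modes: concat, add, average, grid_scatter
}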
The AI Landscape
See how Loom compares to major industry engines and the Go ecosystem.
| Feature | Loom (Go) | PyTorch | TensorFlow | GoMLX | Spago | TF.js | Candle |
|---|---|---|---|---|---|---|---|
| Core | |||||||
| Runtime Dependency | None (Binary) | Heavy (Pip) | Binary | CGo/XLA | None | Browser | None |
| Auto-Differentiation | ⚠️ Hybrid | ✅ Full | ✅ Full | ✅ Full | ⚠️ Manual | ✅ Full | ✅ Full |
| Loading & Format | |||||||
| Safetensors | ✅ Native | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ |
| Structure Inference | ✅ Auto-Detect | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Training | |||||||
| Neural Tweening | ✅ Hybrid Engine | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| LR Schedulers | ✅ 7 Types | ✅ | ✅ | ✅ | ⚠️ Basic | ✅ | ✅ |
| Layers | |||||||
| Parallel / MoE | ✅ Structural | ⚠️ Manual | ⚠️ Manual | ❌ | ❌ | ❌ | ❌ |
| SwiGLU | ✅ Native | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ |
| Pure Go Tokenizer | ✅ | Rust/C++ | C++ | ❌ | ❌ | ❌ | ✅ |
| Platform | |||||||
| WASM Training | ✅ Full | ❌ | ❌ | ❌ | ❌ | ⚠️ Slow | ✅ |
| Cross-Lang C-ABI | ✅ Universal | ❌ | ❌ | ❌ | ❌ | ❌ | ⚠️ |
| Advanced | |||||||
| Step-Based Forward | ✅ Unique | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Dynamic Arch Gen | ✅ Built-in | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Network Grafting | ✅ Unique | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Feature | Loom | GoMLX | Gorgonia | Spago | Go-Deep | Gonum |
|---|---|---|---|---|---|---|
| Foundation | ||||||
| Implementation | Pure Go | CGo (XLA) | Pure Go + CGo | Pure Go | Pure Go | Pure Go |
| Autograd | ⚠️ Hybrid | ✅ Full | ✅ Symbolic | ✅ Dynamic | ✅ Backprop | ❌ |
| Model Loading | ||||||
| Safetensors | ✅ Native | ✅ | ❌ | ❌ | ❌ | ❌ |
| Architectures | ||||||
| Transformer (MHA) | ✅ Explicit | ✅ | ⚠️ Hard | ✅ (BERT) | ❌ | ❌ |
| RNN / LSTM | ✅ Full Gate | ✅ | ⚠️ Basic | ✅ BiLSTM | ❌ | ❌ |
| SwiGLU | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Parallel / MoE | ✅ Structural | ⚠️ Manual | ❌ | ❌ | ❌ | ❌ |
| Training | ||||||
| Hybrid Tweening | ✅ Unique | ❌ | ❌ | ❌ | ❌ | ❌ |
| Softmax Variants | ✅ 10 Types | ⚠️ Standard | ⚠️ Standard | ⚠️ Standard | ⚠️ Standard | ❌ |
| Advanced | ||||||
| RoPE (GQA) | ✅ GQA Support | ✅ | ❌ | ❌ | ❌ | ❌ |
| Network Grafting | ✅ Unique | ❌ | ❌ | ❌ | ❌ | ❌ |
| Step-Based Forward | ✅ Unique | ❌ | ❌ | ❌ | ❌ | ❌ |
| Dynamic Arch Gen | ✅ Unique | ❌ | ❌ | ❌ | ❌ | ❌ |
| Platform | ||||||
| C-ABI (Polyglot) | ✅ Universal | ❌ | ❌ | ❌ | ❌ | ❌ |
| WASM Training | ✅ Full | ❌ (XLA) | ❌ | ❌ | ❌ | ❌ |
| Ecosystem | ||||||
| Maintenance | 🔥 Active | 🔥 Active | ⚠️ Slow | ⏸️ Paused | ⚠️ Slow | 🔥 Active |
Native Numerical Type Support
Train and infer on any numerical type without wrappers or post-processing. Most runtimes require QAT (Quantization-Aware Training)—a multi-step process where you train in float32, then simulate lower precision during fine-tuning, then convert to int8 for deployment. Loom skips this entirely: define your types upfront and train natively on int8, uint16, or any supported type from the start.
| Numerical Type | Loom | GoMLX | Gorgonia | Spago | PyTorch |
|---|---|---|---|---|---|
| Float32 (Standard) | ✅ | ✅ | ✅ | ✅ (Float64) | ✅ |
| Float64 (High Precision) | ✅ Native | ✅ | ✅ | ✅ | ✅ |
| Float16 / BF16 | ✅ | ✅ (XLA) | ❌ | ❌ | ✅ |
| Int8 Training | ✅ Native | ❌ | ❌ | ❌ | ⚠️ QAT Wrapper |
| Int8 Inference | ✅ | ❌ | ❌ | ❌ | ✅ (Quant) |
| Int16, Int32, Int64 | ✅ Native | ✅ (XLA) | ⚠️ Tensor | ❌ | ❌ Tensor Only |
| Uint8, Uint16, Uint32 | ✅ Native | ✅ (XLA) | ⚠️ Tensor | ❌ | ✅ Uint8 Only |
Complete Type System
Unlike runtimes that treat integers primarily as storage formats for quantization, Loom's generics allow native training and inference on exotic types like uint16 (common in medical imaging), int32, or float64 (scientific simulations) across every layer type without changes to model code.
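A minimal sketch of what that looks like in practice, assuming a generic constructor: NewTypedNetwork and TrainCPU below are placeholder names chosen for illustration; the point is that the element type is fixed when the network is built rather than simulated by a quantization wrapper.

// Hypothetical: the element type is chosen up front and used end to end.
// NewTypedNetwork and TrainCPU are assumed names, not the confirmed API.
net8 := loom.NewTypedNetwork[int8](layers)    // int8 weights and activations from step one
net16 := loom.NewTypedNetwork[uint16](layers) // e.g. 16-bit medical-imaging inputs
loom.TrainCPU(net8, inputsI8, targetsI8)      // no float32 shadow copy, no post-hoc conversion
loom.TrainCPU(net16, inputsU16, targetsU16)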
When to Choose Each Engine
The right tool depends on your use case. Here's a quick decision guide.
Choose PyTorch for research, SOTA models, or complex dynamic architectures requiring the Python ecosystem.
Choose TensorFlow for robust mobile/edge deployment with a mature toolchain and optimized inference.
Choose GoMLX for high-performance training in Go when you can tolerate CGo and XLA C++ dependencies.
Choose Core ML for iOS/macOS exclusive apps leveraging Apple's Neural Engine and native integration.
Choose Loom for pure Go-native embedding (cloud/CLI/server), zero-dependency single binaries, Neural Tweening experimentation, Step-Based Forward for real-time inference, or Dynamic Architecture Generation for automated model exploration.