
LOOM Documentation

Learn how to build and train GPU-accelerated neural networks in Go, deploy to browsers with WebAssembly, and integrate with Python, TypeScript, C#, and C/C++.

What is LOOM?

LOOM (Layered Omni-architecture Openfluke Machine) is a high-performance GPU-accelerated neural network framework written in Go. It features WebGPU compute shaders for parallel execution and WebAssembly export for browser deployment.

Breakthrough: Native Mixture of Experts
LOOM's Softmax layer includes native Mixture of Experts (MoE) via Grid Softmax, the same routing architecture used in GPT-4, Switch Transformer, and Mixtral. Grid Softmax has been validated as mathematically equivalent to conventional MoE gating, with 97.1% loss reduction and perfect gradient matching.
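To make the MoE idea concrete, here is a minimal, self-contained sketch of softmax-based expert gating: scores for each expert pass through a softmax, and the gate weights mix the experts' outputs. This illustrates the routing concept only; it does not use LOOM's actual API.

```go
package main

import (
	"fmt"
	"math"
)

// softmax converts raw gating scores into a probability distribution,
// subtracting the max for numerical stability.
func softmax(logits []float64) []float64 {
	max := logits[0]
	for _, v := range logits {
		if v > max {
			max = v
		}
	}
	out := make([]float64, len(logits))
	var sum float64
	for i, v := range logits {
		out[i] = math.Exp(v - max)
		sum += out[i]
	}
	for i := range out {
		out[i] /= sum
	}
	return out
}

func main() {
	// Gating scores for 4 experts; the softmax output weights each
	// expert's contribution, the core mechanism behind MoE routing.
	gates := softmax([]float64{2.0, 0.5, 0.1, -1.0})
	expertOutputs := []float64{1.0, 2.0, 3.0, 4.0} // per-expert scalar outputs
	var mixed float64
	for i, g := range gates {
		mixed += g * expertOutputs[i]
	}
	fmt.Printf("gates=%v\nmixed=%.3f\n", gates, mixed)
}
```

Top-k routing (as in Switch Transformer) would keep only the largest gate values and renormalize; the soft mixture above is the simplest fully differentiable variant.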
Unique: Neural Tweening
LOOM features Neural Tweening, a bidirectional training algorithm that optimizes from both ends of the network simultaneously. Every layer receives direct gradient information (no vanishing gradients), with automatic explosion detection and recovery.

Key Features

GPU Acceleration

Native GPU acceleration using WebGPU compute shaders (WGSL). Intelligent routing between CPU and GPU execution with support for Dense, Conv2D, and Multi-Head Attention layers.

WebAssembly Support

Compile to WASM for client-side inference in browsers: a 5.4 MB binary with zero dependencies, 24+ discoverable functions, full training support, and automatic method exposure via reflection.

5 Layer Types

Dense (fully-connected), Conv2D (convolutional), Multi-Head Attention (transformers), RNN (recurrent), and LSTM (long short-term memory) with full CPU execution and GPU acceleration where supported.
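The simplest of these, the Dense layer, computes y = Wx + b. The sketch below is a self-contained illustration of that forward pass, not LOOM's implementation:

```go
package main

import "fmt"

// denseForward computes y = W·x + b for a fully-connected layer.
// weights is row-major: one row of inDim values per output neuron.
func denseForward(weights [][]float64, bias, x []float64) []float64 {
	out := make([]float64, len(bias))
	for i, row := range weights {
		sum := bias[i]
		for j, w := range row {
			sum += w * x[j]
		}
		out[i] = sum
	}
	return out
}

func main() {
	w := [][]float64{{1, 0}, {0, 1}, {1, 1}} // 3 outputs, 2 inputs
	b := []float64{0.5, -0.5, 0}
	fmt.Println(denseForward(w, b, []float64{2, 3})) // [2.5 2.5 5]
}
```

Conv2D, attention, RNN, and LSTM layers build on the same multiply-accumulate core, which is why all five map naturally onto GPU compute shaders.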

Multi-Language Bindings

Use LOOM from Go (native), Python (welvet via C-ABI), JavaScript/TypeScript (WASM), C#/.NET (P/Invoke), and C/C++/Rust (via C Foreign Function Interface).

How It Works

LOOM provides a flexible neural network framework that works across platforms:

Workflow
Design → Train → Deploy → Monitor

1. Create network architecture with 5 layer types
2. Train models with GPU acceleration
3. Deploy to browser (WASM), desktop, or mobile
4. Monitor performance with built-in evaluation metrics
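The train step (step 2) can be pictured with a toy gradient-descent loop. This self-contained sketch fits y = 2x with a single weight and is only a stand-in for LOOM's actual training system:

```go
package main

import "fmt"

// trainLinear runs plain gradient descent on mean squared error to fit
// y = 2x with one weight, returning the learned weight. A toy stand-in
// for a full training loop with loss tracking.
func trainLinear(epochs int) float64 {
	xs := []float64{1, 2, 3, 4}
	ys := []float64{2, 4, 6, 8}
	w, lr := 0.0, 0.01
	for e := 0; e < epochs; e++ {
		var grad float64
		for i, x := range xs {
			grad += 2 * (w*x - ys[i]) * x // d/dw of (w·x − y)²
		}
		w -= lr * grad / float64(len(xs))
	}
	return w
}

func main() {
	fmt.Printf("learned w ≈ %.3f (target 2.0)\n", trainLinear(200))
}
```

A real loop adds batching, validation, and the gradient clipping and metrics described under Architecture below, but the update rule is the same shape.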

Architecture

  • Grid-Based Networks: Organize layers in a 2D grid (rows × columns × layers per cell) supporting 100+ layers in a single network.
  • WebGPU Compute: WGSL shaders for GPU-accelerated forward and backward propagation on Dense, Conv2D, and Multi-Head Attention layers.
  • Softmax Layer: First-class layer with 10 variants including Grid Softmax (native MoE), Hierarchical, Temperature, Gumbel, and more.
  • Training System: Built-in training loop with gradient clipping, loss tracking, validation integration, and comprehensive metrics.
  • Model Serialization: Save/load models as JSON with base64-encoded weights, compatible across all platforms and languages.
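The serialization bullet above can be sketched with Go's standard library: weights are packed into bytes, base64-encoded, and embedded in JSON. The field names here are illustrative, not LOOM's actual schema:

```go
package main

import (
	"encoding/base64"
	"encoding/binary"
	"encoding/json"
	"fmt"
	"math"
)

// SavedLayer shows the idea of base64-encoded weights inside JSON;
// the schema is hypothetical, not LOOM's real format.
type SavedLayer struct {
	Type    string `json:"type"`
	Weights string `json:"weights"` // base64-encoded little-endian float64s
}

func encodeWeights(ws []float64) string {
	buf := make([]byte, 8*len(ws))
	for i, w := range ws {
		binary.LittleEndian.PutUint64(buf[i*8:], math.Float64bits(w))
	}
	return base64.StdEncoding.EncodeToString(buf)
}

func decodeWeights(s string) ([]float64, error) {
	buf, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return nil, err
	}
	ws := make([]float64, len(buf)/8)
	for i := range ws {
		ws[i] = math.Float64frombits(binary.LittleEndian.Uint64(buf[i*8:]))
	}
	return ws, nil
}

func main() {
	layer := SavedLayer{Type: "dense", Weights: encodeWeights([]float64{0.1, -0.2, 0.3})}
	data, _ := json.Marshal(layer)
	fmt.Println(string(data))

	var loaded SavedLayer
	json.Unmarshal(data, &loaded)
	ws, _ := decodeWeights(loaded.Weights)
	fmt.Println(ws) // [0.1 -0.2 0.3]
}
```

Because JSON and base64 are universal, any of the bound languages can parse such a file, which is what makes the format portable across platforms.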
Validation & Proof
All 5 layer types have been empirically validated through end-to-end training, achieving 98.6% loss reduction and perfect classification. Results are cross-platform verified across Go, WebAssembly, TypeScript, Python, and C#.

Installation

Go (Native)

```bash
go get github.com/openfluke/loom/nn
```

Python (welvet)

```bash
pip install welvet
```

JavaScript/TypeScript (WASM)

```bash
npm install @openfluke/welvet
```

C# / .NET

```bash
dotnet add package Welvet
```

C/C++/Rust (C-ABI)

```bash
git clone https://github.com/openfluke/loom
cd loom/cabi
./build.sh
```

GPU Requirements
WebGPU acceleration requires a compatible GPU and browser/runtime. CPU fallback is automatic. See the system requirements for details.

Next Steps

Now that you understand the basics, explore the rest of the documentation.

Need Help?

If you can't find what you're looking for in the documentation, other resources are available.

Contributing
LOOM is open source under Apache 2.0! We welcome contributions of all kinds. Check out the contributing section to get started.