Kein Filesystem. Kein Problem.
No Filesystem. No Problem.
Erasure-coded parallel object storage written directly to raw block devices — NVMe, SAS SSD, and yes, even HDD. No filesystem. No kernel modules. No POSIX. No distributed lock manager. Native clients talk directly and in parallel to storage nodes. Coordinators handle auth, placement, resolve, and commit. S3 is proxied only because S3 leaves no honest alternative, and that proxy runs on storage nodes, not on a fake "not really in the data path" middle tier. Purpose-built for AI/ML workloads at GPU-cluster scale. Runs on bare metal, KVM virtual machines, and Kubernetes.
S3, parallel filesystems like Lustre and BeeGFS, Ceph, MinIO, reinvented pNFS, you name it: most major file and object storage systems in production today are, at their core, either POSIX storage with a distributed veneer or turn-of-the-millennium object storage layered on top of a POSIX filesystem. They inherit the same foundational assumptions from the 1970s: inodes, journals, POSIX lock semantics, directory trees, page cache coherency. Those abstractions were built for interactive multi-user Unix workstations, not for a world where a single training run reads 1 PiB across 2,500 GPU nodes.
Designed for POSIX consistency that nobody in the AI/ML world asked for — but everybody pays the performance tax on. Your GPUs sit idle while the storage system argues with itself about file locks.
Kernel-level filesystem clients are a maintenance nightmare at scale. One bad module update can take down hundreds of compute nodes simultaneously. Your storage vendor's release cycle now dictates your kernel upgrades.
"Why is my training run slow?" should not require deploying a separate monitoring stack, attaching kernel probes across every node, and correlating logs from six different systems to find a single bottleneck.
Your AI pipeline doesn't need atomic rename, POSIX advisory locks, or extended attributes. But your storage system implements all of it — and charges you in latency, complexity, and ops burden on every single I/O operation.
Storage shouldn't dictate your workflow. With libkeinfs, an AI researcher can prototype a training pipeline on a MacBook against a single-node KeInFS instance in a Docker container, then deploy that exact same code, unchanged, against a 2,500-GPU-node cluster with a 3 TiB/s KeInFS backend. Same API. Same client library. Same data format. Zero refactoring.
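A minimal sketch of what that workflow could look like from Python. The keinfs module, its Client class, and every name below are illustrative assumptions, not the published libkeinfs API; the point is that only the coordinator endpoint changes between laptop and cluster.

```python
import os
import keinfs  # hypothetical Python binding for libkeinfs

# The only difference between laptop and cluster is where the coordinators live.
client = keinfs.Client(
    coordinators=os.environ.get("KEINFS_COORDINATORS", "localhost:9000"),
    tenant="research",
)

# Same calls against a single Docker-hosted node or a 2,500-node cluster; the
# client library resolves placement and talks to storage nodes directly.
with client.open("datasets/imagenet/shard-0001.tar", "rb") as obj:
    header = obj.read(4096)

client.put("checkpoints/run-42/step-1000.pt", data=b"...")  # checkpoint bytes
```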
KeInFS is not a general-purpose filesystem that someone added an S3 gateway to. It is an object storage system designed from the ground up for the access patterns of modern machine learning: massive sequential reads, large checkpoint writes, high-throughput parallel ingest, and low-latency model serving.
Erasure-coded data written directly to NVMe via io_uring with O_DIRECT. No filesystem layer, no journal, no inode table. A policy-driven extent allocator handles large objects, while packed containers keep small-object overhead under control.
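To make the "no filesystem layer" point concrete, here is the smallest possible illustration of writing a block straight to a raw device with O_DIRECT from Python. It deliberately skips io_uring, erasure coding, and the extent allocator, and the device path is only an example; do not run it against a device that holds data.

```python
import mmap
import os

BLOCK = 4096  # O_DIRECT needs buffer, offset, and length aligned to the sector size

# An anonymous mmap gives a page-aligned buffer, satisfying O_DIRECT alignment rules.
buf = mmap.mmap(-1, BLOCK)
buf.write(b"KEINFS-EXTENT-HEADER".ljust(BLOCK, b"\0"))

# WARNING: this writes raw bytes to the device and will destroy whatever is there.
fd = os.open("/dev/nvme0n1", os.O_WRONLY | os.O_DIRECT)  # raw device, no filesystem
try:
    os.pwrite(fd, buf, BLOCK * 128)  # place the block at byte offset 512 KiB
finally:
    os.close(fd)
```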
The wire protocol, KeInFS/2, runs HTTP/2 over TLS 1.3. Coordinators handle auth, resolve, and commit. Smart clients using libkeinfs read and write directly to storage nodes in parallel, with direct pull or direct push selected by client policy. A FUSE client built on libkeinfs supports both reads and writes on that same direct path.
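Roughly, a native read looks like the sketch below: one control-plane round trip to a coordinator, then parallel pulls straight from storage nodes. The URL paths, JSON fields, and the use of plain requests over HTTP/1.1 are simplifications and assumptions; the actual protocol runs HTTP/2 over TLS 1.3.

```python
from concurrent.futures import ThreadPoolExecutor
import requests

COORD = "https://coordinator.example:9443"
TOKEN = {"Authorization": "Bearer <token>"}

# 1. Resolve: which storage nodes hold which chunks of the object?
plan = requests.get(f"{COORD}/v2/resolve/datasets/shard-0001", headers=TOKEN).json()

# 2. Direct pull: fetch every chunk from its storage node in parallel.
def pull(chunk):
    url = f"https://{chunk['node']}/v2/chunks/{chunk['id']}"
    return chunk["offset"], requests.get(url, headers=TOKEN).content

with ThreadPoolExecutor(max_workers=16) as pool:
    parts = dict(pool.map(pull, plan["chunks"]))

data = b"".join(parts[offset] for offset in sorted(parts))
```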
The entire AI/ML ecosystem — PyTorch, Hugging Face, DVC, MLflow, Spark — speaks S3. KeInFS provides S3 compatibility through storage-node proxy ingress, so adoption is zero-friction without pretending S3 is the high-performance path. Same data, same metadata, different honesty level.
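In practice, pointing existing tooling at KeInFS means changing one URL. A boto3 example against a placeholder endpoint and bucket:

```python
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://keinfs-s3.example:9000",  # S3 proxy ingress on the storage nodes
    aws_access_key_id="...",
    aws_secret_access_key="...",
)

s3.upload_file("model.safetensors", "checkpoints", "run-42/model.safetensors")
print(s3.list_objects_v2(Bucket="checkpoints", Prefix="run-42/")["KeyCount"])
```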
Every KeInFS/2 request carries a signed attribution context: tenant, team, project, job, training rank. Metrics per-job, per-operation, in real time. "Why is my training run slow?" — one command, one answer.
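One plausible shape for that context, sketched in Python. The field names, header names, and HMAC signing scheme are assumptions for illustration, not the KeInFS/2 wire format.

```python
import hashlib
import hmac
import json

attribution = {
    "tenant": "research",
    "team": "vision",
    "project": "foundation-v3",
    "job": "pretrain-2024-11-02",
    "rank": 137,  # training rank issuing this I/O
}

payload = json.dumps(attribution, sort_keys=True).encode()
signature = hmac.new(b"<job-scoped-key>", payload, hashlib.sha256).hexdigest()

headers = {
    "KeInFS-Attribution": payload.decode(),
    "KeInFS-Attribution-Signature": signature,
}
# Storage-side metrics can then be grouped by these labels, so "why is my
# training run slow?" reduces to filtering on job or rank.
```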
Reed-Solomon erasure coding via Intel ISA-L with AVX2/AVX-512 SIMD acceleration. Encode throughput exceeds network bandwidth on modern hardware. Profiles from "kamikaze" to "fortress" — you choose your redundancy vs. capacity tradeoff.
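The tradeoff is simple arithmetic: with k data chunks and m parity chunks per stripe, any m simultaneous chunk losses are survivable and the raw-to-usable overhead is (k + m) / k. The profile names below echo the ones above, but the widths are made-up examples, not shipped defaults.

```python
profiles = {
    "kamikaze": (10, 1),  # minimal redundancy, maximal usable capacity
    "balanced": (8, 3),
    "fortress": (6, 6),   # survives six simultaneous chunk losses
}

for name, (k, m) in profiles.items():
    overhead = (k + m) / k      # raw capacity consumed per usable byte
    efficiency = k / (k + m)    # usable fraction of raw capacity
    print(f"{name:9s} k={k:2d} m={m:2d}  tolerates {m} losses  "
          f"overhead {overhead:.2f}x  usable {efficiency:.0%}")
```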
Drives fail. Nodes fail. KeInFS detects failures in seconds, reconstructs lost chunks from parity across surviving nodes, and restores full protection — automatically, without operator intervention, without impacting running workloads.
Native KeInFS and S3 serve different needs. The native path uses coordinators for control plane only and moves bytes directly between clients and storage nodes. S3 is the only proxy path, and that proxy runs on storage nodes behind ordinary load balancing.
Every design decision in KeInFS optimizes for the access patterns of AI/ML workloads. When accelerator vendors recommend 1.4 GB/s+ sustained read bandwidth per GPU, your storage system needs to deliver — not negotiate POSIX locks.
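The sizing math is straightforward; the node and GPU counts below are illustrative, so plug in your own cluster shape.

```python
per_gpu = 1.4        # GB/s sustained read recommended per GPU
gpus_per_node = 8
nodes = 256

per_node = per_gpu * gpus_per_node  # 11.2 GB/s per compute node
cluster = per_node * nodes          # ~2,867 GB/s across the cluster
print(f"per node: {per_node:.1f} GB/s, cluster: {cluster / 1000:.2f} TB/s")
```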
Full architecture documentation, design specifications, deployment guides, and API reference are available to project contributors. Authenticate with GitHub to access the complete technical documentation.
Sign in with GitHub. Requires membership in the storagebit GitHub organization; access opens in stages as the project matures.