Kein Filesystem. Kein Problem.
No Filesystem. No Problem.
Erasure-coded parallel object storage written directly to raw block devices — NVMe, SAS SSD, and yes, even HDD. No filesystem. No kernel modules. No POSIX. No distributed lock manager. Native clients talk directly and in parallel to storage nodes. Coordinators handle auth, placement, resolve, and commit. S3 is proxied only because S3 leaves no honest alternative, and that proxy runs on storage nodes, not on a fake "not really in the data path" middle tier. Purpose-built for AI/ML workloads at GPU-cluster scale. Runs on bare metal, KVM virtual machines, and Kubernetes.
S3, parallel filesystems like Lustre and BeeGFS, Ceph, MinIO, reinvented pNFS, you name it: most major file and object storage systems in production today are, at their core, either POSIX storage with a distributed veneer or turn-of-the-millennium object storage layered on top of a POSIX filesystem. They inherit the same foundational assumptions from the 1970s: inodes, journals, POSIX lock semantics, directory trees, page cache coherency. Those abstractions were built for interactive multi-user Unix workstations, not for a world where a single training run reads 1 PiB across 2,500 GPU nodes.
Designed for POSIX consistency that nobody in the AI/ML world asked for — but everybody pays the performance tax on. Your GPUs sit idle while the storage system argues with itself about file locks.
Kernel-level filesystem clients are a maintenance nightmare at scale. One bad module update can take down hundreds of compute nodes simultaneously. Your storage vendor's release cycle now dictates your kernel upgrades.
"Why is my training run slow?" should not require deploying a separate monitoring stack, attaching kernel probes across every node, and correlating logs from six different systems to find a single bottleneck.
Your AI pipeline doesn't need atomic rename, POSIX advisory locks, or extended attributes. But your storage system implements all of it — and charges you in latency, complexity, and ops burden on every single I/O operation.
Storage shouldn't dictate your workflow. With libkeinfs, an AI researcher can prototype a training pipeline on a MacBook against a single-node KeInFS instance in a Docker container, then deploy that exact same code, unchanged, against a 2,500-GPU-node cluster with a 3 TiB/s KeInFS backend. Same API. Same client library. Same data format. Zero refactoring.
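A minimal sketch of what that workflow could look like from Python. The keinfs module, its Client class, and every name below are illustrative assumptions, not the published libkeinfs API; the point is that only the coordinator endpoint changes between laptop and cluster.

```python
import os
import keinfs  # hypothetical Python binding for libkeinfs

# The only difference between laptop and cluster is where the coordinators live.
client = keinfs.Client(
    coordinators=os.environ.get("KEINFS_COORDINATORS", "localhost:9000"),
    tenant="research",
)

# Same calls against a single Docker-hosted node or a 2,500-node cluster; the
# client library resolves placement and talks to storage nodes directly.
with client.open("datasets/imagenet/shard-0001.tar", "rb") as obj:
    header = obj.read(4096)

client.put("checkpoints/run-42/step-1000.pt", data=b"...")  # checkpoint bytes
```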
KeInFS is not a general-purpose filesystem that someone added an S3 gateway to. It is an object storage system designed from the ground up for the access patterns of modern machine learning: massive sequential reads, large checkpoint writes, high-throughput parallel ingest, and low-latency model serving.
Erasure-coded data written directly to NVMe via io_uring with O_DIRECT. No filesystem layer, no journal, no inode table. A policy-driven extent allocator handles large objects, while packed containers keep small-object overhead under control.
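To make the "no filesystem layer" point concrete, here is the smallest possible illustration of writing a block straight to a raw device with O_DIRECT from Python. It deliberately skips io_uring, erasure coding, and the extent allocator, and the device path is only an example; do not run it against a device that holds data.

```python
import mmap
import os

BLOCK = 4096  # O_DIRECT needs buffer, offset, and length aligned to the sector size

# An anonymous mmap gives a page-aligned buffer, satisfying O_DIRECT alignment rules.
buf = mmap.mmap(-1, BLOCK)
buf.write(b"KEINFS-EXTENT-HEADER".ljust(BLOCK, b"\0"))

# WARNING: this writes raw bytes to the device and will destroy whatever is there.
fd = os.open("/dev/nvme0n1", os.O_WRONLY | os.O_DIRECT)  # raw device, no filesystem
try:
    os.pwrite(fd, buf, BLOCK * 128)  # place the block at byte offset 512 KiB
finally:
    os.close(fd)
```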
The wire protocol, KeInFS/2, runs HTTP/2 over TLS 1.3. Coordinators handle auth, resolve, and commit. Smart clients using libkeinfs read and write directly to storage nodes in parallel, with direct pull or direct push selected by client policy. A FUSE client built on libkeinfs supports both reads and writes on that same direct path.
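Roughly, a native read looks like the sketch below: one control-plane round trip to a coordinator, then parallel pulls straight from storage nodes. The URL paths, JSON fields, and the use of plain requests over HTTP/1.1 are simplifications and assumptions; the actual protocol runs HTTP/2 over TLS 1.3.

```python
from concurrent.futures import ThreadPoolExecutor
import requests

COORD = "https://coordinator.example:9443"
TOKEN = {"Authorization": "Bearer <token>"}

# 1. Resolve: which storage nodes hold which chunks of the object?
plan = requests.get(f"{COORD}/v2/resolve/datasets/shard-0001", headers=TOKEN).json()

# 2. Direct pull: fetch every chunk from its storage node in parallel.
def pull(chunk):
    url = f"https://{chunk['node']}/v2/chunks/{chunk['id']}"
    return chunk["offset"], requests.get(url, headers=TOKEN).content

with ThreadPoolExecutor(max_workers=16) as pool:
    parts = dict(pool.map(pull, plan["chunks"]))

data = b"".join(parts[offset] for offset in sorted(parts))
```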
The entire AI/ML ecosystem — PyTorch, Hugging Face, DVC, MLflow, Spark — speaks S3. KeInFS provides S3 compatibility through storage-node proxy ingress, so adoption is zero-friction without pretending S3 is the high-performance path. Same data, same metadata, different honesty level.
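In practice, pointing existing tooling at KeInFS means changing one URL. A boto3 example against a placeholder endpoint and bucket:

```python
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://keinfs-s3.example:9000",  # S3 proxy ingress on the storage nodes
    aws_access_key_id="...",
    aws_secret_access_key="...",
)

s3.upload_file("model.safetensors", "checkpoints", "run-42/model.safetensors")
print(s3.list_objects_v2(Bucket="checkpoints", Prefix="run-42/")["KeyCount"])
```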
Every KeInFS/2 request carries a signed attribution context: tenant, team, project, job, training rank. Metrics per-job, per-operation, in real time. "Why is my training run slow?" — one command, one answer.
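One plausible shape for that context, sketched in Python. The field names, header names, and HMAC signing scheme are assumptions for illustration, not the KeInFS/2 wire format.

```python
import hashlib
import hmac
import json

attribution = {
    "tenant": "research",
    "team": "vision",
    "project": "foundation-v3",
    "job": "pretrain-2024-11-02",
    "rank": 137,  # training rank issuing this I/O
}

payload = json.dumps(attribution, sort_keys=True).encode()
signature = hmac.new(b"<job-scoped-key>", payload, hashlib.sha256).hexdigest()

headers = {
    "KeInFS-Attribution": payload.decode(),
    "KeInFS-Attribution-Signature": signature,
}
# Storage-side metrics can then be grouped by these labels, so "why is my
# training run slow?" reduces to filtering on job or rank.
```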
Reed-Solomon erasure coding via Intel ISA-L with AVX2/AVX-512 SIMD acceleration. Encode throughput exceeds network bandwidth on modern hardware. Profiles from "kamikaze" to "fortress" — you choose your redundancy vs. capacity tradeoff.
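The tradeoff is simple arithmetic: with k data chunks and m parity chunks per stripe, any m simultaneous chunk losses are survivable and the raw-to-usable overhead is (k + m) / k. The profile names below echo the ones above, but the widths are made-up examples, not shipped defaults.

```python
profiles = {
    "kamikaze": (10, 1),  # minimal redundancy, maximal usable capacity
    "balanced": (8, 3),
    "fortress": (6, 6),   # survives six simultaneous chunk losses
}

for name, (k, m) in profiles.items():
    overhead = (k + m) / k      # raw capacity consumed per usable byte
    efficiency = k / (k + m)    # usable fraction of raw capacity
    print(f"{name:9s} k={k:2d} m={m:2d}  tolerates {m} losses  "
          f"overhead {overhead:.2f}x  usable {efficiency:.0%}")
```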
Drives fail. Nodes fail. KeInFS detects failures in seconds, reconstructs lost chunks from parity across surviving nodes, and restores full protection — automatically, without operator intervention, without impacting running workloads.
Native KeInFS and S3 serve different needs. The native path uses coordinators for control plane only and moves bytes directly between clients and storage nodes. S3 is the only proxy path, and that proxy runs on storage nodes behind ordinary load balancing.
Every design decision in KeInFS optimizes for the access patterns of AI/ML workloads. When accelerator vendors recommend 1.4 GB/s+ sustained read bandwidth per GPU, your storage system needs to deliver — not negotiate POSIX locks.
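The sizing math is straightforward; the node and GPU counts below are illustrative, so plug in your own cluster shape.

```python
per_gpu = 1.4        # GB/s sustained read recommended per GPU
gpus_per_node = 8
nodes = 256

per_node = per_gpu * gpus_per_node  # 11.2 GB/s per compute node
cluster = per_node * nodes          # ~2,867 GB/s across the cluster
print(f"per node: {per_node:.1f} GB/s, cluster: {cluster / 1000:.2f} TB/s")
```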
Full architecture documentation, design specifications, deployment guides, and API reference are available to project contributors. Authenticate with GitHub to access the complete technical documentation.
Sign in with GitHub. Requires membership in the storagebit GitHub organization; access opens in stages as the project matures.