What started with a single NVIDIA RTX 3090 just two years ago has officially evolved into an open-source hardware milestone.
Independent hardware engineer Max Zanoga (@zanoga) has built a 32-GPU local AI cluster from scratch. It has 768 GB of VRAM in total (24 GB per GPU) spread across four independent server nodes. The setup is described as the world's first documented personal AI datacenter in this exact configuration, and it bridges the gap between commercial-grade computing and off-grid "solarpunk" sustainability.
Here is exactly how he did it, the technical hurdles he overcame, and why this sets a new benchmark for personal AI infrastructure.
The Evolution: From Gamer Rigs to Server Nodes
Like many great homelab projects, Max's journey was a gradual escalation:
The Spark: A single RTX 3090 purchased for local AI experimentation.
The VRAM Wall: Realizing he needed more memory, he quickly expanded to two, then four cards.
The PCIe Bottleneck: At six cards, consumer motherboards ran out of PCIe lanes. To accommodate the expansion, Max upgraded to an ASRock Rack ROMED8-2T, a single-socket AMD EPYC server motherboard with seven PCIe 4.0 x16 slots.
The Cluster Dream: Once the first 6-GPU node was complete, Max set his sights on linking multiple independent servers into a unified cluster.
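The lane shortage behind the motherboard upgrade is easy to quantify. The sketch below makes a few illustrative assumptions: roughly 24 usable CPU lanes on a desktop platform, 128 on a single-socket AMD EPYC platform, and an even split of lanes across GPUs. It ignores physical slot counts and any lanes reserved for NVMe drives or NICs.

```python
# Back-of-the-envelope PCIe lane budget for a multi-GPU node.
# Lane counts are illustrative ASSUMPTIONS: ~24 CPU lanes on a typical
# desktop platform versus 128 on a single-socket AMD EPYC platform.

STANDARD_WIDTHS = (16, 8, 4, 2, 1)  # PCIe link widths, widest first


def max_link_width(num_gpus: int, cpu_lanes: int) -> int:
    """Widest standard link every GPU can get when CPU lanes are split evenly.

    Returns 0 if there are not enough lanes for even x1 per GPU.
    """
    for width in STANDARD_WIDTHS:
        if num_gpus * width <= cpu_lanes:
            return width
    return 0


DESKTOP_LANES = 24   # assumption: typical consumer CPU
EPYC_LANES = 128     # single-socket AMD EPYC (SP3) platform

for gpus in (2, 4, 6, 8):
    print(f"{gpus} GPUs: desktop x{max_link_width(gpus, DESKTOP_LANES)}, "
          f"EPYC x{max_link_width(gpus, EPYC_LANES)}")
```

Under these assumptions, six GPUs on a desktop platform get at best an x4 link each. On paper, the EPYC platform keeps six (or even eight) GPUs at x16.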
Breaking the Network Bottleneck with InfiniBand
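Connecting multiple AI servers is an engineering challenge in its own right, because the network between nodes sits on the critical path of every computation. Standard Ethernet, driven through the operating system's TCP/IP stack, adds significant latency to each exchange. Max originally experimented with llama.cpp's RPC backend over Ethernet, but the latency proved too high for practical distributed inference.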
To solve this, he looked at how hyperscale datacenters operate and turned to InfiniBand networking:
The Switch: A Mellanox SB7800 InfiniBand switch sourced from eBay.
The NICs: Mellanox ConnectX-6 network interface cards installed in each server.
The Heat Challenge: ConnectX-6 cards run extremely hot, because their passive heatsinks are designed for the forced airflow of a server chassis. Max designed and 3D-printed custom fan shrouds to keep the network cards from overheating.
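A toy model helps explain why latency, more than raw bandwidth, decides whether a multi-node setup is usable. Suppose a model's layers are sharded across nodes with tensor parallelism. Every layer then needs synchronous exchanges: two all-reduces per layer in the common Megatron-style scheme, each simplified here to a single network round. All of the following are hypothetical values chosen for illustration, not measurements of Max's cluster: 60 layers, a 14,336-byte payload (hidden size 7168 in 16-bit precision), a 50 µs one-way latency at 10 Gb/s for TCP over Ethernet, and 2 µs at 100 Gb/s for InfiniBand.

```python
# Toy model of per-token network overhead when each generated token needs a
# fixed number of sequential exchanges between nodes. Every number below is a
# HYPOTHETICAL assumption chosen for illustration, not a measurement.


def per_token_network_us(sync_steps: int, payload_bytes: int,
                         latency_us: float, bandwidth_gbps: float) -> float:
    """Microseconds of network time per generated token.

    Each step pays a fixed one-way latency plus serialization time.
    1 Gb/s moves 1e3 bits per microsecond.
    """
    serialization_us = payload_bytes * 8 / (bandwidth_gbps * 1e3)
    return sync_steps * (latency_us + serialization_us)


LAYERS = 60                 # hypothetical model depth
SYNC_STEPS = 2 * LAYERS     # two all-reduces per layer (Megatron-style TP)
PAYLOAD = 7168 * 2          # hypothetical hidden size x 2 bytes (FP16/BF16)

tcp_ethernet = per_token_network_us(SYNC_STEPS, PAYLOAD, latency_us=50, bandwidth_gbps=10)
infiniband = per_token_network_us(SYNC_STEPS, PAYLOAD, latency_us=2, bandwidth_gbps=100)

print(f"TCP over Ethernet: {tcp_ethernet / 1000:.2f} ms/token")
print(f"InfiniBand:        {infiniband / 1000:.2f} ms/token")
```

With these assumptions, network time is about 7.4 ms per token over TCP/Ethernet versus about 0.4 ms over InfiniBand. In both cases the fixed per-step latency accounts for most of the total.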
Overcoming Hardware & Power Failures
Scaling the cluster to 18 GPUs, and eventually 32, introduced major hardware instability that standard consumer components couldn't handle.
1. PCIe Gen 4 Risers vs. MCIO
Standard PCIe riser cables 20 cm to 60 cm long caused constant system crashes and data errors. These are typical symptoms of degraded signal integrity at PCIe 4.0's 16 GT/s rate. Max replaced the risers with industrial MCIO adapters from ADT-Link. He also rewired the SATA power inputs, hand-soldering them directly to the GPU power connectors for rock-solid stability.
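The source doesn't say why the SATA power inputs were a weak point. One plausible reason, assuming those inputs supply slot power on the riser adapters, is current capacity. A SATA power connector is commonly rated at 1.5 A on each of its three 12 V pins, while a PCIe x16 graphics slot may draw up to 5.5 A on 12 V. The motivation is an assumption; the ratings are the commonly cited spec values.

```python
# Compare a SATA power connector's 12 V budget with what a PCIe x16 slot may
# draw on 12 V. ASSUMPTION: the SATA inputs fed slot power on the adapters.

SATA_12V_PINS = 3            # SATA power connector: three 12 V pins
SATA_AMPS_PER_PIN = 1.5      # commonly cited per-pin current rating
PCIE_SLOT_12V_AMPS = 5.5     # PCIe CEM 12 V limit for a x16 graphics slot


def watts(volts: float, amps: float) -> float:
    """Power in watts for a given voltage and current."""
    return volts * amps


sata_limit_w = watts(12, SATA_12V_PINS * SATA_AMPS_PER_PIN)   # 54 W
slot_demand_w = watts(12, PCIE_SLOT_12V_AMPS)                 # 66 W

print(f"SATA 12 V budget: {sata_limit_w:.0f} W; slot may draw up to {slot_demand_w:.0f} W")
if sata_limit_w < slot_demand_w:
    print("A single SATA connector can be pushed past its rating at full slot load")
```

Feeding slot power from the GPU's own power connectors sidesteps that margin. Whether it explains Max's specific failures is not stated in the source.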
2. Random Power Supply Resets
Max initially used modified Dell D2400E-S0 server power supplies. However, the units would randomly drop power to individual GPUs regardless of computing load. He eventually replaced them with Super Flower Leadex 2000W power supplies, achieving stable and predictable operation.
☀️ The Solarpunk Dream: 100% Off-Grid Compute
At full load, the 32-GPU cluster draws approximately 10 kW of continuous power. To avoid massive electricity costs and dependence on the grid, Max engineered a fully independent solar-plus-storage system:
Solar Arrays: A primary 20-panel array (~12 kW peak), supplemented by 2.5 kW and 4 kW secondary arrays.
Inverter: A Deye SUN-16K-SG01LP1-EU 16 kW hybrid inverter ties the solar arrays, batteries, and cluster load together.
Battery Storage: Two Seplos Mason LiFePO₄ battery banks (51.2 V), providing 32.15 kWh of energy storage.
Cooling: Instead of liquid cooling, the cluster relies on air cooling powered by 96 chassis fans, assisted by local air conditioning. A dedicated HVAC system is planned for the future.
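The storage and power figures above can be cross-checked with simple arithmetic. The following sketch uses the quoted numbers. The 314 Ah capacity per bank is not stated in the source; it is an assumption inferred by working backwards from 32.15 kWh ÷ 2 ÷ 51.2 V. The runtime figure also ignores inverter losses, depth-of-discharge limits, and any concurrent solar input.

```python
# Cross-check of the off-grid power figures quoted in the article.
# BANK_AH is an ASSUMPTION inferred from the 32.15 kWh total; runtime ignores
# inverter efficiency, depth-of-discharge limits, and concurrent solar input.

LOAD_KW = 10.0                    # approximate full-load draw of the cluster
INVERTER_KW = 16.0                # inverter's rated power
PV_ARRAYS_KWP = (12.0, 2.5, 4.0)  # primary array plus two secondary arrays
BANKS = 2
BANK_VOLTS = 51.2                 # nominal LiFePO4 bank voltage
BANK_AH = 314                     # assumed capacity per bank

storage_kwh = BANKS * BANK_VOLTS * BANK_AH / 1000
runtime_h = storage_kwh / LOAD_KW
pv_peak_kwp = sum(PV_ARRAYS_KWP)
inverter_headroom_kw = INVERTER_KW - LOAD_KW

print(f"Battery storage: {storage_kwh:.2f} kWh")
print(f"Battery-only runtime at full load: {runtime_h:.1f} h")
print(f"Combined PV peak: {pv_peak_kwp} kWp")
print(f"Inverter headroom over the cluster load: {inverter_headroom_kw} kW")
```

Under these assumptions, the batteries alone carry the full 10 kW load for a little over three hours. The 16 kW inverter leaves about 6 kW of headroom for other loads.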
The Software Stack & Final Cost
Getting the hardware running was only half the challenge. Max spent days debugging a software stack of vLLM and Ray running inside Docker containers. He tested more than 100 different configurations before the cluster produced its first multi-node output.
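Which of those configurations Max settled on isn't stated, but vLLM's distributed-serving guidance gives a common starting point. It uses tensor parallelism across the GPUs inside a node and pipeline parallelism across nodes, with Ray as the executor. The sketch below assumes an even split of 8 GPUs per node (32 ÷ 4) and uses a placeholder model name. The helper functions are hypothetical; only the `vllm serve` flags they emit are real vLLM options.

```python
# Hypothetical parallelism layout for 32 GPUs in four nodes, following the
# common vLLM guidance of tensor parallelism inside a node and pipeline
# parallelism across nodes. The model name is a placeholder.

from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    tensor_parallel: int    # GPUs that shard each layer (kept within one node)
    pipeline_parallel: int  # pipeline stages (one per node)


def layout_for(nodes: int, gpus_per_node: int) -> ParallelLayout:
    """Tensor-parallel within each node, one pipeline stage per node."""
    return ParallelLayout(tensor_parallel=gpus_per_node, pipeline_parallel=nodes)


def vllm_serve_args(model: str, layout: ParallelLayout) -> list[str]:
    """Build a `vllm serve` command line for a Ray-backed multi-node run."""
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(layout.tensor_parallel),
        "--pipeline-parallel-size", str(layout.pipeline_parallel),
        "--distributed-executor-backend", "ray",
    ]


layout = layout_for(nodes=4, gpus_per_node=8)   # 32 GPUs total
print(" ".join(vllm_serve_args("org/placeholder-model", layout)))
```

Keeping tensor parallelism inside a node confines its frequent all-reduces to local PCIe links. Only the less chatty pipeline hand-offs cross the InfiniBand fabric. Before the command is launched, a Ray cluster must already span the nodes: run `ray start --head` on one node and `ray start --address=<head-ip>:6379` on the others.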
The Financials
Max estimates that the GPUs and core server hardware cost approximately $30,000 USD, thanks in part to purchasing much of the hardware before the global GPU memory price surge. This estimate does not include the solar arrays, inverter, battery storage, or cooling infrastructure.
The Purpose
This isn't a museum piece; it's a production AI platform. Max currently uses the cluster to run state-of-the-art open-weight models such as Kimi K2.6 locally, with complete data privacy.
The Bottom Line
When processing billions of tokens each month, renting cloud GPUs or paying recurring API fees can quickly become prohibitively expensive. Max Zanoga has demonstrated that, with enough engineering skill and determination, a private individual can own their data, generate their own power, and operate an enterprise-grade AI datacenter from home.
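The cost argument comes down to a break-even calculation. This is a hypothetical sketch: the monthly token volume and the blended API price per million tokens are placeholders to be replaced with real quotes, not figures from the source. Electricity, cooling, maintenance, and depreciation are ignored, and the solar build is excluded just as it is from the $30,000 estimate.

```python
# HYPOTHETICAL break-even estimate: hardware outlay vs. metered API spend.
# The API price and monthly token volume are placeholders, not quotes;
# electricity, cooling, maintenance, and depreciation are ignored.


def months_to_break_even(hardware_usd: float, tokens_per_month: float,
                         api_usd_per_million_tokens: float) -> float:
    """Months of avoided API spend needed to equal the hardware cost."""
    monthly_api_usd = tokens_per_month / 1e6 * api_usd_per_million_tokens
    if monthly_api_usd <= 0:
        raise ValueError("monthly API spend must be positive")
    return hardware_usd / monthly_api_usd


HARDWARE_USD = 30_000          # GPUs and core servers, per the estimate above
TOKENS_PER_MONTH = 2e9         # hypothetical: two billion tokens per month
PRICE_PER_MILLION = 2.00       # hypothetical blended API price in USD

months = months_to_break_even(HARDWARE_USD, TOKENS_PER_MONTH, PRICE_PER_MILLION)
print(f"Break-even after about {months:.1f} months")
```

The result scales inversely with both token volume and API price, so heavy, sustained usage is what makes owned hardware pay off.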
