Unsloth Studio launches beta; no-code UI enables local model training at 2x speed with 70% less VRAM
Overview
Unsloth has launched Unsloth Studio, a beta-stage open-source web UI designed to democratize model training and inference. The platform provides a no-code, unified local interface for training, running, and exporting open models across Windows, Linux, macOS, and WSL.
Key Features
Local Model Inference & Chat
- Run GGUF and safetensors models locally on any OS
- Support for multi-GPU inference powered by llama.cpp and Hugging Face
- Model Arena feature allows side-by-side comparison of two models (e.g., base vs. fine-tuned)
- Self-healing tool calling, web search, and code execution capabilities
- Auto inference parameter tuning and custom chat template editing
No-Code Training
- Train 500+ models including text, vision, TTS/audio, and embedding models
- Training is 2x faster with 70% less VRAM and no accuracy loss
- Supports latest models like Qwen 3.5 and NVIDIA Nemotron 3
- Upload datasets directly from PDF, CSV, JSON, DOCX, or TXT files
- Automatic multi-GPU training orchestration
- Optimized kernels for LoRA, FP8, full fine-tuning (FFT), and pretraining (PT) configurations
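Since the UI accepts tabular uploads such as CSV and JSON, the conversion it performs can be illustrated with a small sketch: turning rows of prompt/response pairs into the JSONL "messages" chat format commonly used for fine-tuning. The column names and record shape here are assumptions for illustration, not Unsloth Studio's documented schema.

```python
import csv
import io
import json

def csv_to_chat_jsonl(csv_text: str) -> str:
    """Convert rows with 'prompt' and 'response' columns (hypothetical
    column names) into JSONL chat records, one JSON object per line."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {
            "messages": [
                {"role": "user", "content": row["prompt"]},
                {"role": "assistant", "content": row["response"]},
            ]
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)

sample = "prompt,response\nWhat is LoRA?,A parameter-efficient fine-tuning method.\n"
print(csv_to_chat_jsonl(sample))
```

Each output line is an independent JSON object, which is why JSONL suits streaming large training sets row by row.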
Data Processing
- Data Recipes feature transforms unstructured documents into usable synthetic datasets via graph-node workflows
- Powered by NVIDIA DataDesigner for automatic format conversion
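A graph-node workflow of this kind can be sketched as an ordered chain of transform functions, each taking and returning a list of records. The node names, record shapes, and the placeholder "synthesis" step below are assumptions for illustration; in a real recipe the synthesis node would call an LLM rather than a template.

```python
import json

def split_into_chunks(records, max_words=40):
    """Node 1: split each document's text into fixed-size word chunks."""
    out = []
    for rec in records:
        words = rec["text"].split()
        for i in range(0, len(words), max_words):
            out.append({"text": " ".join(words[i:i + max_words])})
    return out

def to_qa_pairs(records):
    """Node 2: placeholder synthesis step; a real recipe would call an
    LLM here to generate a question grounded in each chunk."""
    return [
        {"question": f"Summarize passage {i + 1}.", "context": rec["text"]}
        for i, rec in enumerate(records)
    ]

def run_recipe(records, nodes):
    """Run records through each node in order, like a recipe graph."""
    for node in nodes:
        records = node(records)
    return records

docs = [{"text": "Unsloth Studio converts raw documents into training data. " * 5}]
dataset = run_recipe(docs, [split_into_chunks, to_qa_pairs])
print(json.dumps(dataset[0]))
```

Keeping each node a pure list-to-list function is what makes such pipelines composable and easy to reorder in a visual editor.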
Observability & Control
- Real-time tracking of training loss, gradient norms, and GPU utilization
- Training progress viewable remotely on other devices
- Full training history preservation for revisiting runs and experimentation
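The two core metrics such a dashboard tracks are easy to state precisely: the global gradient norm is the L2 norm over all parameter gradients, and the displayed loss curve is typically a smoothed (exponential moving average) version of the raw per-step loss. The sketch below shows both with plain floats; Unsloth's actual implementation is not documented here.

```python
import math

def global_grad_norm(grads):
    """L2 norm across all parameter gradients, given as lists of floats:
    sqrt of the sum of every squared component."""
    return math.sqrt(sum(g * g for tensor in grads for g in tensor))

class LossTracker:
    """Exponential moving average of the training loss, as a dashboard
    might smooth the raw per-step values."""
    def __init__(self, beta=0.9):
        self.beta = beta
        self.avg = None

    def update(self, loss):
        self.avg = loss if self.avg is None else self.beta * self.avg + (1 - self.beta) * loss
        return self.avg

print(global_grad_norm([[3.0, 4.0], [0.0]]))  # 5.0 (sqrt of 9 + 16 + 0)

tracker = LossTracker()
for step_loss in (2.0, 1.5, 1.2):
    smoothed = tracker.update(step_loss)  # ends near 1.875
```

The global norm is also what gradient clipping compares against, so plotting it makes exploding gradients visible immediately.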
Export & Privacy
- Export fine-tuned models to safetensors or GGUF formats compatible with llama.cpp, vLLM, Ollama, and LM Studio
- 100% offline local operation with token-based authentication (password and JWT flows)
- Data remains under user control at all times
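The JWT flow mentioned above can be sketched with the standard library alone: an HS256 token is two base64url-encoded JSON segments (header and payload) plus an HMAC-SHA256 signature over them. This is a generic illustration of the scheme, not Unsloth Studio's actual authentication code; the secret and claims are made up.

```python
import base64
import hashlib
import hmac
import json

def _b64url(data: bytes) -> str:
    # JWT uses unpadded base64url encoding for all three segments.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(payload: dict, secret: bytes) -> str:
    """Build an HS256 JWT: header.payload.signature."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url(json.dumps(payload).encode())
    signing_input = f"{header}.{body}".encode()
    sig = _b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{body}.{sig}"

def verify_jwt(token: str, secret: bytes) -> bool:
    """Recompute the HMAC and compare in constant time."""
    header, body, sig = token.split(".")
    signing_input = f"{header}.{body}".encode()
    expected = _b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return hmac.compare_digest(sig, expected)

token = sign_jwt({"sub": "local-user"}, b"demo-secret")
print(verify_jwt(token, b"demo-secret"))   # True
print(verify_jwt(token, b"wrong-secret"))  # False
```

Because the signature is keyed, only a server holding the secret can mint valid tokens, which is what lets a local UI authenticate remote dashboard viewers without any external service.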
Platform Availability
- Linux & Windows/WSL: Full training and inference support on NVIDIA GPUs (RTX 30/40/50, Blackwell, DGX)
- macOS: Currently chat/inference only; MLX-based training is coming soon
- CPU: Chat inference works without GPU; training requires NVIDIA hardware
The product is currently in beta; the team is actively working to reduce install times by shipping precompiled llama.cpp binaries.