BEAVR is an open-source Bimanual, multi-Embodiment, Accessible, Virtual Reality (VR) teleoperation system for Robots, designed to unify real-time control, data recording, and policy learning across heterogeneous robotic platforms. BEAVR enables real-time, dexterous teleoperation using commodity VR hardware, supports modular integration with robots ranging from 7-DoF manipulators to full-body humanoids, and records synchronized multi-modal demonstrations directly in the LeRobot dataset schema.
Our system features a zero-copy streaming architecture achieving ≤35 ms latency, an asynchronous “think–act” control loop for scalable inference, and a flexible network API optimized for real-time, multi-robot operation. We benchmark BEAVR across diverse manipulation tasks and demonstrate its compatibility with leading visuomotor policies such as ACT, Diffusion Policy, and SmolVLA.
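To make the “think–act” split concrete, here is a minimal sketch of the pattern: a background thread runs (slow) policy inference and publishes action chunks, while a fixed-rate control loop replays the freshest chunk. This is an illustration of the general technique, not the BEAVR implementation; `policy`, `get_obs`, and `send_action` are hypothetical placeholders.

```python
import threading
import time
from queue import Empty, Full, Queue

def think(policy, get_obs, chunk_queue, stop):
    """'Think' thread: run slow policy inference, publish action chunks."""
    while not stop.is_set():
        obs = get_obs()                      # latest synchronized observation
        chunk = policy.predict(obs)          # short horizon of future actions
        try:
            chunk_queue.put_nowait(chunk)    # never block inference
        except Full:
            try:
                chunk_queue.get_nowait()     # drop the stale chunk...
            except Empty:
                pass
            chunk_queue.put_nowait(chunk)    # ...and publish the fresh one

def act(send_action, chunk_queue, stop, hz=30):
    """'Act' loop: replay the freshest chunk at a fixed control rate."""
    period, chunk, i = 1.0 / hz, None, 0
    while not stop.is_set():
        t0 = time.perf_counter()
        try:
            chunk, i = chunk_queue.get_nowait(), 0   # newer chunk preempts
        except Empty:
            pass
        if chunk is not None and i < len(chunk):
            send_action(chunk[i])
            i += 1
        time.sleep(max(0.0, period - (time.perf_counter() - t0)))

# Usage: inference runs in the background; control stays at a steady 30 Hz.
# stop = threading.Event()
# q = Queue(maxsize=1)
# threading.Thread(target=think, args=(policy, get_obs, q, stop)).start()
# act(send_action, q, stop, hz=30)
```

A `maxsize=1` queue gives latest-value semantics: the act loop never falls behind inference, it simply keeps executing the most recent chunk.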
Code, datasets, and the VR app are publicly available at https://github.com/ARCLab-MIT/BEAVR-Bot; datasets are also released on Hugging Face.
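Because demonstrations are recorded directly in the LeRobot schema, they can be loaded with the `lerobot` library. A minimal sketch follows; the repo id is a placeholder (not a confirmed release name), and the import path varies across `lerobot` versions.

```python
# Minimal sketch: load a demonstration dataset recorded in the LeRobot schema.
# The repo id below is a hypothetical placeholder; the import path is for
# older lerobot versions (newer ones use lerobot.datasets.lerobot_dataset).
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

dataset = LeRobotDataset("arclab-mit/beavr-demos")  # placeholder repo id
print(dataset.num_episodes, "episodes,", len(dataset), "frames")

frame = dataset[0]   # dict of synchronized camera images, states, actions
print(sorted(frame.keys()))
```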
*Figure: End-to-end pipeline from VR input to robot control.*
All experiments were conducted on an Alienware x16 R2 laptop equipped with an Intel Core Ultra 9 185H CPU and an NVIDIA GeForce RTX 4080 Max-Q GPU (12 GB VRAM). The system runs Ubuntu 24.04.2 LTS with CUDA 12.8 and NVIDIA driver version 570.169. We use a fixed-base 7-DoF XArm7 manipulator with a 16-DoF LEAP hand attached to the end effector, operating in a tabletop workspace. Two statically mounted RGB cameras (front-facing and overhead) capture synchronized streams at 480×640 resolution and 30 FPS.
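For reference, a quick sanity check that a camera actually delivers 480×640 frames at ~30 FPS can be done with OpenCV; the device index below is a machine-specific assumption.

```python
import time
import cv2

# Sanity check: confirm a camera delivers 480x640 (H x W) frames at ~30 FPS.
# Device index 0 is a machine-specific assumption.
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
cap.set(cv2.CAP_PROP_FPS, 30)

frames, t0 = 0, time.perf_counter()
while frames < 90:                      # ~3 s worth of frames at 30 FPS
    ok, frame = cap.read()
    if not ok:
        break
    frames += 1
elapsed = time.perf_counter() - t0
print(f"{frames / elapsed:.1f} FPS, shape: {frame.shape if frames else None}")
cap.release()
```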
We conduct three separate experiments:

1. Direct VR teleoperation across six manipulation tasks.
2. Evaluation of visuomotor policies (ACT, Diffusion Policy, SmolVLA) trained on BEAVR demonstrations.
3. A head-to-head comparison with OpenTeach on shared tasks.
BEAVR uses consumer VR and commodity hardware to enable affordable teleoperation without proprietary dependencies. The reference setup pairs a Meta Quest 3 (hand tracking and VR streaming) with standard compute and networked robot controllers, in line with the system's accessibility goals; see the paper for details on latency and system design.
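As an illustration of the latest-value streaming pattern suited to networked, real-time teleoperation, here is a minimal ZeroMQ subscriber with conflation, so the control loop only ever sees the freshest hand pose. This is an assumption about the transport, not a statement of BEAVR's actual network API; the endpoint and message format are placeholders.

```python
import json
import zmq

# Latest-value pose subscriber. ZeroMQ PUB/SUB with CONFLATE is one common
# low-latency pattern; BEAVR's actual network API may differ.
ctx = zmq.Context.instance()
sub = ctx.socket(zmq.SUB)
sub.setsockopt(zmq.CONFLATE, 1)           # keep only the newest message
sub.setsockopt_string(zmq.SUBSCRIBE, "")  # no topic filter
sub.connect("tcp://192.168.1.50:5555")    # placeholder headset/relay address

while True:
    pose = json.loads(sub.recv())         # e.g. {"pos": [...], "quat": [...]}
    # ...feed the freshest hand pose into retargeting / the control loop...
```

Conflation drops stale messages at the socket, so a slow consumer never processes a backlog of outdated poses.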
Teleoperation performance across six manipulation tasks (10 trials each):

Task | Success rate | Avg. time (s) |
---|---|---|
Tape task | 6 / 10 (60%) | 17.61 |
Lantern task | 8 / 10 (80%) | 21.61 |
Stack blocks | 7 / 10 (70%) | 25.89 |
Flip cube | 10 / 10 (100%) | 16.50 |
Pour | 8 / 10 (80%) | 84.67 |
Pick and place | 10 / 10 (100%) | 11.90 |
Learned policies trained on BEAVR demonstrations vs. a human operator (10 trials each):

Policy | Success rate | Avg. time (s) |
---|---|---|
ACT | 10 / 10 (100%) | 9.16 |
Diffusion | 8 / 10 (80%) | 23.88 |
SmolVLA | 7 / 10 (70%) | 33.84 |
Human operator | 10 / 10 (100%) | 12.08 |
Comparison with OpenTeach, success rate on shared tasks:

Task | OpenTeach | BEAVR |
---|---|---|
Flip cube | 1.0 | 1.0 |
Pour | 0.8 | 0.8 |
Pick & Place | 0.8 | 1.0 |
Comparison with OpenTeach, average completion time on shared tasks:

Task | OpenTeach (s) | BEAVR (s) |
---|---|---|
Flip cube | 2.85 | 13.32 |
Pour | 14.83 | 28.92 |
Pick & Place | 11.88 | 9.72 |
Control-loop frequency and jitter benchmark:

Component | Target Hz | Achieved Hz | Jitter (ms) |
---|---|---|---|
XArm7 (single) | 30 | 29.93 | 0.90 |
LEAP (single) | 30 | 29.69 | 0.19 |
XArm7 (bimanual) | 30 | 29.93 | 0.89 |
LEAP (bimanual) | 30 | 29.61 | 0.21 |
XArm7 (high freq) | 90 | 99.18 | 0.75 |
LEAP (high freq) | 90 | 97.22 | 0.13 |
All metrics are taken from the paper (arXiv:2508.09606).
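The achieved-frequency and jitter numbers can be reproduced in spirit with a simple timing harness. In this sketch, jitter is taken as the standard deviation of inter-tick intervals in milliseconds, which is an assumption about the metric's exact definition in the paper.

```python
import statistics
import time

def measure_rate(step, target_hz, n_ticks=300):
    """Run `step` at a fixed target rate; report achieved Hz and jitter.

    Jitter here is the standard deviation of inter-tick intervals (ms),
    an assumed definition of the metric.
    """
    period = 1.0 / target_hz
    ticks = []
    next_t = time.perf_counter()
    for _ in range(n_ticks):
        step()                              # e.g. send one joint command
        ticks.append(time.perf_counter())
        next_t += period
        time.sleep(max(0.0, next_t - time.perf_counter()))
    gaps = [b - a for a, b in zip(ticks, ticks[1:])]
    return 1.0 / statistics.mean(gaps), statistics.stdev(gaps) * 1e3

hz, jitter_ms = measure_rate(lambda: None, target_hz=30)
print(f"achieved {hz:.2f} Hz, jitter {jitter_ms:.2f} ms")
```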
If you use BEAVR in your work, please cite:

```bibtex
@misc{posadasnava2025beavr,
  title         = {BEAVR: Bimanual, multi-Embodiment, Accessible, Virtual Reality Teleoperation System for Robots},
  author        = {Alejandro Posadas-Nava and Alejandro Carrasco and Richard Linares},
  year          = {2025},
  eprint        = {2508.09606},
  archivePrefix = {arXiv},
  primaryClass  = {cs.RO},
  note          = {Accepted for presentation at ICCR 2025, Kyoto},
  url           = {https://arxiv.org/abs/2508.09606}
}
```