BrightDrive / Brightskies (2024)

3D Annotation Tool Enhancement for Autonomous Driving (CVAT Extension)

Redesigned the 3D annotation workflow for autonomous driving datasets by building a geometry-aware annotation system on top of CVAT, reducing manual effort and significantly accelerating labeling.

Instead of requiring annotators to manipulate 3D structures directly, the system introduces a 2D-driven interaction paradigm that reconstructs 3D annotations automatically.

Designed and implemented core 3D annotation features, combining point cloud geometry, learning-based orientation estimation, and user-centric interaction design to simplify complex 3D tasks.

Core Stack

CVAT, TypeScript, BEV, Point Cloud Processing, CVAT Extension

My Role

Designed and implemented 3D annotation features for point cloud data within CVAT.

Developed algorithms to simplify 3D bounding box creation from 2D inputs.

Integrated deep learning models for orientation and yaw prediction.

Built tools for 3D semantic segmentation acceleration.

Led integration, QA testing, and feature validation across the codebase.

Key Challenges

Working inside a large TypeScript web codebase without prior web development experience.

Designing intuitive tools for annotators handling complex 3D tasks.

Integrating new features cleanly into an existing production system.

Maintaining annotation quality while significantly reducing labeling time.

Key Results

Reduced 3D annotation time by up to 4x.

Enabled 2D-driven 3D bounding box creation via bird's-eye-view (BEV) interaction.

Automated conversion from bounding boxes to semantic segmentation.

Delivered production-ready features integrated into CVAT.

Problem Setting

3D annotation for autonomous driving is a time-consuming and error-prone process, especially when working with point cloud data. Annotators typically need to manipulate boxes across multiple axes, which slows the workflow and increases cognitive load. The goal of this project was to redesign critical parts of the annotation process to reduce complexity and accelerate labeling without sacrificing quality.

System Approach

The system shifts annotation from direct 3D manipulation to a simplified 2D interaction space, while leveraging geometric reasoning and learning-based models to reconstruct accurate 3D structure automatically.

2D to 3D Bounding Box Generation

A key contribution was enabling 3D bounding box generation from simple 2D BEV interactions. Annotators draw a 2D box in bird's-eye view, which is projected into 3D space. The system then applies RANSAC-based ground filtering, removes outliers via clustering, and estimates object orientation using a learning-based model. The result is an automatically generated oriented 3D bounding box that significantly reduces manual effort and annotation complexity.

[Demos: BEV to 3D, Yaw Prediction, OBB Output]
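The pipeline above can be sketched roughly as follows. This is an illustrative stand-in, not the production TypeScript code: the RANSAC ground fit and BEV lifting mirror the described steps, while the PCA-based yaw estimate substitutes for the learned orientation model, and the clustering-based outlier removal is omitted for brevity.

```python
import math
import random

def fit_ground_plane(points, iters=200, tol=0.15, seed=0):
    """RANSAC plane fit: return (point, unit_normal) of the plane
    supported by the most inliers -- in driving scenes, the ground."""
    rng = random.Random(seed)
    best_p, best_n, best_count = None, None, -1
    for _ in range(iters):
        p1, p2, p3 = rng.sample(points, 3)
        u = tuple(b - a for a, b in zip(p1, p2))
        v = tuple(b - a for a, b in zip(p1, p3))
        n = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
        norm = math.sqrt(n[0] ** 2 + n[1] ** 2 + n[2] ** 2)
        if norm < 1e-9:
            continue  # degenerate (near-collinear) sample
        n = (n[0] / norm, n[1] / norm, n[2] / norm)
        count = sum(
            1 for p in points
            if abs(sum(nc * (pc - qc) for nc, pc, qc in zip(n, p, p1))) < tol
        )
        if count > best_count:
            best_p, best_n, best_count = p1, n, count
    return best_p, best_n

def estimate_yaw_bev(points):
    """Principal-axis orientation of the BEV (x, y) footprint --
    a classical stand-in for the learned yaw model."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return 0.5 * math.atan2(2 * sxy, sxx - syy)

def lift_bev_box(points, xmin, xmax, ymin, ymax, ground_margin=0.2):
    """Lift a 2D BEV rectangle to 3D: keep in-rectangle points that sit
    off the ground plane, then take their z-extent as the box height."""
    p0, n = fit_ground_plane(points)
    obj = [
        p for p in points
        if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax
        and abs(sum(nc * (pc - qc) for nc, pc, qc in zip(n, p, p0))) > ground_margin
    ]
    zmin = min(p[2] for p in obj)
    zmax = max(p[2] for p in obj)
    return (xmin, xmax), (ymin, ymax), (zmin, zmax), estimate_yaw_bev(obj)
```

The key design choice is that the annotator only ever supplies the 2D rectangle; the vertical extent and orientation are recovered from the points themselves, which is what removes the multi-axis manipulation from the workflow.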

Bounding Box to Semantic Segmentation

To further accelerate labeling, the system converts 3D bounding boxes into semantic segmentation masks by extracting and refining point cloud regions. This enables annotators to switch annotation types without re-labeling from scratch, reducing duplicated effort across tasks.

[Demo: Box to Segmentation]
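The box-to-mask conversion hinges on a point-in-oriented-box test: every point whose offset from the box center, rotated into the box frame, falls within the half-extents gets the box's label. A minimal sketch (function names are illustrative, not CVAT's API):

```python
import math

def points_in_obb(points, center, size, yaw):
    """Mark the points that fall inside an oriented 3D box.
    center = (cx, cy, cz), size = (length, width, height),
    yaw = rotation about the z axis in radians."""
    cx, cy, cz = center
    hl, hw, hh = size[0] / 2, size[1] / 2, size[2] / 2
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    mask = []
    for x, y, z in points:
        dx, dy, dz = x - cx, y - cy, z - cz
        # rotate the offset by -yaw to express it in the box frame
        bx = cos_y * dx + sin_y * dy
        by = -sin_y * dx + cos_y * dy
        mask.append(abs(bx) <= hl and abs(by) <= hw and abs(dz) <= hh)
    return mask
```

Running this per annotated box yields an initial segmentation mask that annotators only need to refine, rather than paint from scratch.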

Manual 3D Semantic Segmentation Tool

CVAT lacked adequate support for manual 3D semantic segmentation, so a custom interface was developed for direct interaction with point cloud data. The tool improved precision, usability, and labeling speed for annotators working on complex 3D scenes.

[Demos: Manual Segmentation, Point Cloud Editing]
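One common interaction for such a tool is a circular brush over the BEV projection: clicking selects every point whose (x, y) footprint falls under the cursor, and the selection is painted with the active class. The sketch below assumes this brush-style interaction; the function names are hypothetical, not the actual tool's API.

```python
def brush_select(points, cx, cy, radius):
    """Return indices of points whose BEV projection lies inside the
    circular brush centered at the cursor (cx, cy)."""
    r2 = radius * radius
    return [i for i, (x, y, _z) in enumerate(points)
            if (x - cx) ** 2 + (y - cy) ** 2 <= r2]

def paint_labels(labels, points, cx, cy, radius, class_id):
    """Assign class_id to all brushed points; returns a new label list."""
    out = list(labels)
    for i in brush_select(points, cx, cy, radius):
        out[i] = class_id
    return out
```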

Integration and Engineering Challenges

A major challenge was integrating these features into CVAT's large TypeScript-based codebase despite having no prior web development background. The work required understanding the architecture, building features that fit existing workflows, and then handling integration, QA testing, pull requests, and merge conflict resolution to make the extension production-ready.

Final Outcome

The system reduced 3D annotation time by up to 4x while improving usability for annotators working with complex point cloud data. It introduced a simpler interaction paradigm, eliminated redundant labeling steps, and was integrated into a production annotation pipeline used by the team.

Why This Matters

3D annotation is one of the main bottlenecks in autonomous driving pipelines. By reducing annotation complexity and time, this system directly improved dataset scalability and the speed at which downstream models could be developed and iterated.