Shaurya // Lab

SignVision + Air-Pen

An AI-powered accessibility platform combining sign language recognition, speech generation, translation, and touchless air-writing.

[Figure: Main System Interface]

01. The Problem

Communication barriers remain a major daily challenge for people with hearing or speech impairments. Most sign language communication systems are expensive, hardware-dependent, limited to small gesture sets, or not interactive enough for real-world use. Traditional systems also struggle with real-time responsiveness, accurate gesture recognition under varying lighting and background conditions, natural sentence formation, support for multilingual users, and text entry without a physical input device. Very few systems combine sign language recognition, speech generation, translation, and gesture-based writing in one platform, which leaves a gap between accessibility technology and practical human communication.

The core insight behind SignVision + Air-Pen is that hand gestures can serve two purposes at once: communication through sign language and writing through motion in the air. Rather than building only a static sign detector, the project was designed as a complete interactive communication system. The idea evolved into a system that detects hand signs in real time using computer vision, converts gestures into meaningful text, improves usability through phrase prediction and transcript stabilization, speaks the generated text aloud, translates between Hindi and English, and lets users “write in air” naturally using finger motion. The project focuses not only on AI accuracy but also on making the interaction feel smooth, human, and accessible.

Approach and Solution: The project was developed as a hybrid system that combines AI models with rule-based vision logic.
A) Real-Time Sign Language Detection: Webcam input is processed with MediaPipe Hands. Rule-based detection applies geometric heuristics (finger positions, distances, palm orientation, finger open/closed states), and its output is fused with a CNN-based TensorFlow/Keras A–Z sign model using confidence thresholds.
B) Stabilization Engine: Uses multiple consecutive frames for stability and applies spacing rules when no hand is detected.
C) Smart Transcript System: A live transcript with phrase suggestions, automatic phrase replacement, basic autocorrect logic, and TTS output.
D) Translation Layer: English ↔ Hindi translation with proper Unicode and Devanagari rendering.
E) Air-Pen Module: The finger path is tracked and drawn as virtual ink on a canvas; a pinch gesture commits the input; OCR converts the handwriting to text via Tesseract OCR, with EasyOCR as a fallback.
F) Web-Based Architecture (ongoing): A deployable web app with a FastAPI backend, WebSocket streaming, and Next.js frontend overlays.
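The fusion and stabilization steps in A) and B) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the names `fuse_predictions` and `TranscriptStabilizer`, the 0.80 confidence threshold, and the frame counts are all hypothetical, and the rule "prefer the CNN when it is confident, otherwise fall back to the rule-based label" is one plausible reading of "fused using confidence thresholds".

```python
from typing import Optional

# All three values are assumptions chosen for illustration only.
CNN_CONFIDENCE_THRESHOLD = 0.80   # minimum CNN confidence to trust its label
STABLE_FRAMES = 8                 # consecutive identical frames before committing
SPACE_AFTER_EMPTY_FRAMES = 15     # hand-free frames before inserting a space


def fuse_predictions(
    rule_label: Optional[str],
    cnn_label: Optional[str],
    cnn_confidence: float,
    threshold: float = CNN_CONFIDENCE_THRESHOLD,
) -> Optional[str]:
    """Use the CNN label when it is confident; otherwise use the rule-based one."""
    if cnn_label is not None and cnn_confidence >= threshold:
        return cnn_label
    return rule_label


class TranscriptStabilizer:
    """Commit a label once it has been seen on N consecutive frames."""

    def __init__(
        self,
        stable_frames: int = STABLE_FRAMES,
        space_after: int = SPACE_AFTER_EMPTY_FRAMES,
    ) -> None:
        self.stable_frames = stable_frames
        self.space_after = space_after
        self.transcript = ""
        self._candidate: Optional[str] = None
        self._run = 0      # length of the current run of identical labels
        self._empty = 0    # consecutive frames with no hand detected

    def update(self, label: Optional[str]) -> str:
        """Feed one frame's fused label (None = no hand) and return the transcript."""
        if label is None:
            self._candidate, self._run = None, 0
            self._empty += 1
            # Spacing rule: insert a single space after a long enough pause.
            if (
                self._empty == self.space_after
                and self.transcript
                and not self.transcript.endswith(" ")
            ):
                self.transcript += " "
            return self.transcript

        self._empty = 0
        if label == self._candidate:
            self._run += 1
        else:
            self._candidate, self._run = label, 1
        # Commit exactly once, when the run first reaches the threshold,
        # so holding a sign longer does not repeat the letter.
        if self._run == self.stable_frames:
            self.transcript += label
        return self.transcript
```

In a capture loop, each frame's rule-based and CNN outputs would pass through `fuse_predictions`, and the result would be handed to `TranscriptStabilizer.update`. Because a label is committed only when its run first reaches the threshold, flickering predictions never reach the transcript, and a repeated letter requires releasing and re-forming the sign.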
[Figure: Web Accessibility Architecture]
AI / ML: TensorFlow / Keras, CNN-based image classification
Computer Vision: OpenCV, MediaPipe Hands
OCR: Tesseract OCR, EasyOCR
Translation & Speech: deep-translator, pyttsx3
Frontend / Backend: Next.js, React, FastAPI, WebSockets

Architecture: The system is being turned into a browser-based accessibility platform with real-time webcam streaming and overlays. The architecture pairs a FastAPI backend, which communicates over WebSockets, with a Next.js frontend that renders the transcript with low latency.
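To make the backend-to-frontend exchange more concrete, here is a small sketch of what a transcript update sent over the WebSocket might look like. The message shape is an assumption: `TranscriptUpdate`, its fields, and the helper names are hypothetical and are not taken from the repository. The sketch uses only the standard library; in a FastAPI handler the encoded string would typically be passed to the WebSocket's `send_text` method.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class TranscriptUpdate:
    """Hypothetical per-frame message from the backend to the frontend overlay."""

    transcript: str                 # stabilized transcript so far
    label: Optional[str]            # sign detected on this frame, if any
    confidence: float               # confidence of the detected label
    suggestions: List[str] = field(default_factory=list)  # phrase suggestions


def encode_update(update: TranscriptUpdate) -> str:
    """Serialize to JSON, keeping Devanagari text readable rather than escaped."""
    return json.dumps(asdict(update), ensure_ascii=False)


def decode_update(raw: str) -> TranscriptUpdate:
    """Parse a JSON message back into a TranscriptUpdate."""
    return TranscriptUpdate(**json.loads(raw))
```

Keeping each update small and self-describing lets the frontend redraw the overlay from a single message without tracking earlier ones, which suits low-latency rendering.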

Learnings: This project provided hands-on experience with real-time computer vision, AI model integration, human-computer interaction, accessibility-focused product design, gesture recognition systems, and OCR pipelines for robust touchless writing.

Next Steps: The next step is a deployable web application, with Android support planned for later. Work also continues on improving OCR reliability and smoothing gesture-to-transcript stabilization for production-grade accessibility.

View the repository

Implementation details are available on GitHub.
