AR Visual Aid for the Visually Impaired – IT & Computer Engineering Guide
1. Project Overview
The AR Visual Aid for the Visually Impaired is a wearable or mobile assistive application designed to help users with low or no vision by enhancing environmental awareness through computer vision, auditory feedback, and spatial mapping. The system detects objects, reads signs and text, and provides navigational support through audio and haptic signals.
2. System Architecture Overview
- Camera Module: Captures real-time surroundings.
- Object Detection Engine: Identifies and classifies objects.
- Scene Interpretation: Describes layout and hazards.
- Audio Feedback System: Converts visual info into spoken or spatial cues.
- Optional Haptics: Feedback via vibrations.
- Backend Services: Optional cloud-based enhancement and learning.
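A minimal Python sketch of how these modules might be wired into a per-frame loop is shown below; the class and method names are illustrative assumptions, not a fixed API.

```python
# Illustrative wiring of the modules above; names are assumptions, not a fixed API.
from dataclasses import dataclass

@dataclass
class Detection:
    label: str          # e.g. "person", "vehicle"
    distance_m: float   # estimated distance from depth data
    bearing_deg: float  # angle relative to the user's heading

class ARVisualAidPipeline:
    def __init__(self, camera, detector, interpreter, audio_out, haptics=None):
        self.camera = camera            # Camera Module
        self.detector = detector        # Object Detection Engine
        self.interpreter = interpreter  # Scene Interpretation
        self.audio_out = audio_out      # Audio Feedback System
        self.haptics = haptics          # Optional Haptics

    def tick(self):
        """Process one camera frame end to end."""
        frame = self.camera.capture()
        detections = self.detector.detect(frame)
        self.audio_out.speak(self.interpreter.describe(detections))
        if self.haptics is not None:
            for d in detections:
                if d.distance_m < 1.5:          # close obstacle -> tactile alert
                    self.haptics.pulse(direction=d.bearing_deg)
```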
3. Hardware Components
| Component | Specifications | Description |
| --- | --- | --- |
| Smart Glasses / Mobile Device | AR-enabled with camera & audio output | Captures surroundings and delivers output |
| Camera | HD RGB / depth sensing (LiDAR or ToF) | Captures a 3D view of the environment |
| Bone Conduction Headphones | Stereo, Bluetooth/USB-C | Provides audio cues without blocking the ears |
| Vibration Motors (Optional) | Mini linear actuators | Used in wearables to provide tactile signals |
4. Software Components
4.1 Development Tools
- Development Platform: Android/iOS or Unity with AR Foundation
- Computer Vision: OpenCV, TensorFlow Lite, YOLOv5
- Text-to-Speech: Android TTS, Google TTS, Amazon Polly
- Edge AI: Models optimized with ONNX/TFLite for low-power devices
4.2 Programming Languages
- Python (AI models), Java/Kotlin (Android), Swift (iOS), C# (Unity)
4.3 Libraries and SDKs
- ARCore/ARKit for spatial awareness
- OpenCV for image processing
- MediaPipe for hand/face detection
- TTS APIs for voice feedback
- TensorFlow Lite or CoreML for local inference
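As a rough illustration of the local-inference path, the sketch below runs one frame through a TensorFlow Lite detector; the model file name, input preprocessing, and output layout are assumptions that depend on the exported model.

```python
# Minimal on-device inference sketch with TensorFlow Lite.
# "detector.tflite" is a placeholder for whichever quantized model is deployed.
import numpy as np
from tflite_runtime.interpreter import Interpreter  # or tf.lite.Interpreter

interpreter = Interpreter(model_path="detector.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

def run_inference(frame_rgb):
    """frame_rgb: HxWx3 array already resized and typed to match the model input."""
    interpreter.set_tensor(input_details[0]["index"], np.expand_dims(frame_rgb, axis=0))
    interpreter.invoke()
    # For a typical SSD/YOLO export the first output tensor holds boxes/scores;
    # the exact layout depends on how the model was converted.
    return interpreter.get_tensor(output_details[0]["index"])
```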
5. Core Functional Modules
- Object Detection: Real-time identification of people, vehicles, and obstacles
- Text Recognition: OCR to read signs, boards, documents
- Scene Description: AI-generated spoken summaries (e.g., 'a person crossing the street')
- Navigation Aid: Detects clear paths, stairs, doorways
- Gesture Controls: Use hand gestures for interaction (optional)
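A small desktop-prototype sketch of the Text Recognition module follows, using pytesseract for OCR and pyttsx3 for speech as stand-ins for the mobile OCR/TTS services listed in section 4; both libraries are prototyping assumptions, not the deployed stack.

```python
# Prototype of the Text Recognition module: OCR a frame and speak the result.
import cv2
import pytesseract
import pyttsx3

def read_text_aloud(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)   # OCR works better on grayscale
    text = pytesseract.image_to_string(gray).strip()
    if text:
        engine = pyttsx3.init()
        engine.say(text)
        engine.runAndWait()
    return text
```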
6. Audio and Haptic Feedback System
- Audio Modes: Mono (for blind users), stereo spatial (for low-vision users)
- Feedback Content: Distance alerts, object names, scene summaries
- Haptic Feedback: Directional vibration cues for alerts or navigation
- Alert Prioritization: Urgent threats override background descriptions
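One way to realize the alert-prioritization rule is a small priority queue in which urgent threats are spoken first and queued background descriptions are discarded; the priority levels below are illustrative.

```python
# Sketch of alert prioritization: urgent threats override background descriptions.
import heapq

URGENT, NAVIGATION, DESCRIPTION = 0, 1, 2   # lower value = higher priority

class FeedbackQueue:
    def __init__(self):
        self._heap = []
        self._count = 0                      # tie-breaker keeps FIFO order per priority

    def push(self, priority, message):
        heapq.heappush(self._heap, (priority, self._count, message))
        self._count += 1

    def next_message(self):
        if not self._heap:
            return None
        priority, _, message = heapq.heappop(self._heap)
        if priority == URGENT:
            # Drop queued scene descriptions so the threat is not delayed by chatter.
            self._heap = [e for e in self._heap if e[0] != DESCRIPTION]
            heapq.heapify(self._heap)
        return message
```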
7. User Interface Design (Minimalist)
- Audio-First UI: Voice prompts and gesture/voice commands
- Simplified Controls: Few buttons or automatic operation
- Configuration Options: Adjust TTS speed, languages, detection preferences
- Emergency Mode: SOS feature with voice activation
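The configuration options could be captured in a simple settings object such as the sketch below; the field names and defaults are assumptions rather than a fixed schema.

```python
# Illustrative user settings for the minimalist UI; fields are assumptions.
from dataclasses import dataclass, field

@dataclass
class UserSettings:
    tts_rate: float = 1.0                    # speech-speed multiplier
    language: str = "en-US"
    detect_classes: list = field(default_factory=lambda: ["person", "vehicle", "stairs"])
    haptics_enabled: bool = True
    emergency_contact: str = ""              # used by the voice-activated SOS mode
```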
8. AI Model Optimization and Testing
- Lightweight Models: TFLite, quantized YOLO, or MobileNet for speed
- Dataset: Urban/street datasets + accessibility-specific datasets
- Real-world Testing: Simulate crowded, low-light, and noisy environments
- Performance Metrics: FPS, latency, detection accuracy, battery use
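A common route to a lightweight model is post-training quantization with the TFLite converter, sketched below; the saved-model path and output file name are placeholders.

```python
# Post-training quantization sketch; paths are placeholders.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("exported_detector/")
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # enables weight quantization

tflite_model = converter.convert()
with open("detector_quant.tflite", "wb") as f:
    f.write(tflite_model)
```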
9. Deployment and Maintenance
- Device Support: Mid-range Android/iOS phones and AR glasses
- Updates: AI model retraining and OTA delivery
- Offline Mode: Works without internet for core functions
- Cloud Mode (Optional): Improves recognition via remote processing
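The offline/cloud split could follow an offline-first pattern like the sketch below, where recognize_local and recognize_cloud are hypothetical stand-ins for the on-device model and an optional remote service.

```python
# Offline-first recognition with optional cloud refinement (hypothetical hooks).
def recognize_local(frame):
    return {"labels": ["person"], "source": "edge"}     # stub: on-device model

def recognize_cloud(frame):
    raise ConnectionError("no network")                 # stub: remote service

def recognize(frame, online: bool):
    result = recognize_local(frame)          # core functions always work offline
    if online:
        try:
            result = recognize_cloud(frame)  # optional accuracy boost when connected
        except ConnectionError:
            pass                             # keep the local result on failure
    return result
```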
10. Security and Privacy
- Data Handling: Minimal/no storage of personal video/audio
- Edge Processing: AI runs locally for privacy
- Anonymization: Blurs faces if storing data (optional for research)
- User Control: Toggle data collection/sharing features
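The optional anonymization step could be prototyped with OpenCV's bundled Haar face cascade, as in the sketch below; the detection parameters and blur kernel size are illustrative.

```python
# Blur detected faces before any frame is stored (optional research mode).
import cv2

_face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def blur_faces(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in _face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5):
        roi = frame_bgr[y:y+h, x:x+w]
        frame_bgr[y:y+h, x:x+w] = cv2.GaussianBlur(roi, (51, 51), 0)
    return frame_bgr
```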
11. Future Enhancements
- Voice-controlled assistant integration
- Smart object detection (e.g., familiar faces or objects)
- Braille output or tactile screens
- Real-time translation of signs/text
- Integration with GPS for outdoor navigation