Industrial environments often require workers to interact closely with robotic machinery, creating safety risks during operation and maintenance. This project addresses the need for a safer and more responsive control interface by developing EchoSafe, a low-power edge-AI speech keyword detection system. Operators control machinery with voice commands such as “go” and “stop,” reducing the need for manual interaction near hazardous equipment. Because it runs entirely offline, the system does not depend on a network connection and keeps audio on the device, which improves reliability, security, and privacy relative to cloud-based voice systems.
The system captures audio using a microphone connected to a Raspberry Pi, which performs real-time audio preprocessing and extracts Mel-Frequency Cepstral Coefficients (MFCCs) from the speech signal. These features are transmitted to a custom FPGA accelerator that runs a quantized convolutional neural network (CNN) designed to recognize the keywords “go” and “stop.” The accelerator uses a custom RISC-V-based vector processor optimized for low-power edge AI inference. When the accelerator detects a keyword, it returns the result to the Raspberry Pi, which triggers the corresponding system response.
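The MFCC extraction step described above can be illustrated with a minimal NumPy sketch. It is not the project's actual preprocessing code: the sample rate (16 kHz), frame length (25 ms), hop (10 ms), FFT size, number of mel filters, and number of coefficients are all assumed values chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_mels=40, n_mfcc=13):
    """Return an (n_frames, n_mfcc) array. All defaults are assumed values."""
    # Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # Mel-band energies, then log compression (small offset avoids log(0)).
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    log_mel = np.log(mel_energy + 1e-10)
    # Unnormalized DCT-II across mel bands, keeping the first n_mfcc coefficients.
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    basis = np.cos(np.pi / n_mels * (n + 0.5) * k)
    return log_mel @ basis.T

# Example: one second of a 440 Hz tone standing in for microphone audio.
tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
features = mfcc(tone)
```

With these assumed settings, one second of audio yields a small two-dimensional feature array (frames by coefficients), which is the kind of compact input a CNN keyword classifier can consume.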
The project produced a working prototype of an offline speech keyword detection system for industrial safety applications. The system integrates a Raspberry Pi, MEMS microphone, and FPGA accelerator to process speech commands and control a motor in real time. A custom CNN model was trained, quantized to INT8, and deployed on the hardware accelerator, achieving approximately 96.7% accuracy on the test dataset. The final prototype demonstrates that low-power edge hardware can perform reliable speech inference without requiring cloud connectivity.
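To make the INT8 quantization step concrete, the sketch below shows symmetric per-tensor quantization of a weight tensor in NumPy. The source does not state which quantization scheme or toolchain the project used, so the scheme, the function names, and the random example weights are assumptions for illustration only.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor quantization: float weights -> (int8 values, scale)."""
    max_abs = float(np.max(np.abs(weights)))
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return q.astype(np.float32) * scale

# Hypothetical convolution kernel (8 filters of 3x3), not the project's weights.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(8, 3, 3)).astype(np.float32)

q, scale = quantize_int8(w)
max_error = float(np.max(np.abs(dequantize(q, scale) - w)))
```

Under this scheme each weight is stored in one byte instead of four, and the rounding error per weight is bounded by half the scale factor, which is why a well-trained model can often keep most of its accuracy after quantization.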
