AAIELA is an open-source project on GitHub that lets users edit images with spoken commands. It combines AI models for computer vision, speech-to-text, language understanding, and text-to-image inpainting.
Utilizes Detectron2 for segmentation.
Leverages Faster Whisper for audio transcription.
Employs language models like GPT-4 for language understanding.
Incorporates Stable Diffusion for image inpainting.
Aims to bridge spoken language and visual transformation.
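
The way the four components listed above might fit together can be sketched as a small pipeline. This is a minimal illustration, not AAIELA's actual code. The stage order (transcribe, interpret, segment, inpaint) is an assumption, and the names `EditRequest`, `AudioImageEditor`, and the `fake_*` functions are hypothetical. Each stage is an injected callable, so no real Detectron2, Faster Whisper, GPT-4, or Stable Diffusion API is invoked.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class EditRequest:
    """Structured form of a spoken command (hypothetical schema)."""
    target: str  # object to change, e.g. "sky"
    prompt: str  # what to paint in its place


@dataclass
class AudioImageEditor:
    """Chains four injected stages; none of them is a real library API."""
    transcribe: Callable[[str], str]         # audio path -> command text
    interpret: Callable[[str], EditRequest]  # command text -> structured request
    segment: Callable[[str, str], Any]       # image path, target -> mask
    inpaint: Callable[[str, Any, str], str]  # image path, mask, prompt -> result

    def edit(self, image_path: str, audio_path: str) -> str:
        # Assumed order: speech -> text -> intent -> mask -> inpainted image.
        command = self.transcribe(audio_path)
        request = self.interpret(command)
        mask = self.segment(image_path, request.target)
        return self.inpaint(image_path, mask, request.prompt)


# Stand-in stages so the sketch runs without any model installed.
def fake_transcribe(audio_path: str) -> str:
    return "replace the sky with a sunset"


def fake_interpret(command: str) -> EditRequest:
    # A real system would query a language model; this handles one phrasing only.
    head, _, prompt = command.partition(" with ")
    target = head.replace("replace the ", "", 1)
    return EditRequest(target=target, prompt=prompt)


def fake_segment(image_path: str, target: str) -> str:
    return f"mask({target})"


def fake_inpaint(image_path: str, mask: Any, prompt: str) -> str:
    return f"{image_path} [{mask} -> {prompt}]"


if __name__ == "__main__":
    editor = AudioImageEditor(
        fake_transcribe, fake_interpret, fake_segment, fake_inpaint
    )
    print(editor.edit("photo.png", "command.wav"))
```

Passing the stages in as callables keeps the sketch independent of any one model: a stand-in can be replaced by a wrapper around the corresponding library without changing `edit`.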