| Day | Task | Start Date | Completion Date | Reference Material and Learning Notes |
|---|---|---|---|---|
| 2 | Project Setup<br>- Create the FastAPI project structure (/app, /models, /services, /utils, /routers, /schemas, /ai-models)<br>- Set up a virtual environment (Python 3.11+)<br>- Install FastAPI, Uvicorn, and Pydantic<br>- Install a local MongoDB instance (Docker)<br>- Create the database ai_service_db (configured in docker-compose.yml)<br>- Create the collections Bills and Voices (using MongoEngine models)<br>- Set up the MongoDB connection with MongoEngine<br>- Test the connection (lifecycle management in database.py)<br>Technology Research<br>- For Bill: try various OCR models to find the one that best handles Vietnamese text extraction<br>- For Voice: search for Vietnamese speech-to-text models | 03/11/2025 | 03/11/2025 | Sprint 01 - Day 01 |
| 3 | Core API Structure and Endpoints<br>- Build the shared API infrastructure<br>- Set up the Voice and Bill APIs<br>- Create the basic API endpoint structure<br>- Set up Pydantic models for requests and responses<br>- Add CORS middleware<br>- Add error-handling middleware<br>- Create a health check endpoint (GET /health)<br>- Set up logging (structlog) | 04/11/2025 | 04/11/2025 | Sprint 01 - Day 02 |
| 4 | Voice Processing Pipeline<br>- Install a Vietnamese STT library<br>- Audio preprocessing<br>- Voice-to-text integration<br>- Implement STT functionality<br>- Test the voice model<br>OCR Preparation for Bill Extraction<br>- Install the OCR library selected on Day 1<br>- Research image preprocessing techniques<br>- Create a sample preprocessing pipeline<br>- Test with bill images | 05/11/2025 | 05/11/2025 | Sprint 01 - Day 03 |
| 5 | Text Processing for Vietnamese Language<br>- Test model accuracy and select an appropriate model (the team uses PhoWhisper from VinAI)<br>- Verify that the model accepts voice input and returns text<br>- Adjust the endpoint to return the categories defined by the team<br>- Test processing multiple transactions simultaneously and check whether the model can handle them<br>- Set up detection of the transaction time from speech<br>- Create the transaction object<br>OCR Preprocessing<br>- Preprocess each bill image the user submits, enhancing image quality before feeding it to the OCR model<br>- Collect Vietnamese bills (electricity bills, shopping mall bills, restaurant bills, convenience store bills, coffee shop bills, …) | 06/11/2025 | 06/11/2025 | Sprint 01 - Day 04 |
| 6 | Integration and Testing<br>- Handle background tasks for Voice and Bill<br>- Set up event publishing<br>Speech to Text<br>- End-to-end testing of Voice<br>- Run the pipeline as Recording -> Processing -> Return Endpoint; check and improve voice processing accuracy<br>Bill Detection<br>- Deploy and test OCR models: Tesseract, EasyOCR,… | 07/11/2025 | 07/11/2025 | Sprint 01 - Day 05 |

ai_service_db database and collections for Bills and Voices

Summary: All Week 9 objectives were completed, including FastAPI-MongoDB infrastructure setup, PhoWhisper model integration for Vietnamese STT, OCR implementation for bill processing, and an event publishing system with RabbitMQ. NLP functions for transaction information extraction were implemented and successfully tested.
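
The Day 5 tasks "detection of the transaction time from speech" and "create the transaction object" could look roughly like the sketch below. It is only an illustration using the Python standard library and is not the team's implementation. The `Transaction` fields, the `parse_transaction` helper, the keyword tables, and the category names are all hypothetical. The input is assumed to be text already transcribed by the STT model, containing a single transaction.

```python
import re
import unicodedata
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


def _norm(text: str) -> str:
    """Lowercase and NFC-normalize so Vietnamese diacritics compare reliably."""
    return unicodedata.normalize("NFC", text.lower())


# Hypothetical lookup tables, for illustration only (not the team's real lists).
RELATIVE_DAYS = {_norm(k): v for k, v in {"hôm nay": 0, "hôm qua": 1, "hôm kia": 2}.items()}
UNIT_MULTIPLIERS = {
    _norm(k): v
    for k, v in {"nghìn": 1_000, "ngàn": 1_000, "k": 1_000, "triệu": 1_000_000}.items()
}
CATEGORY_KEYWORDS = {
    "food": [_norm(w) for w in ("ăn", "phở", "cà phê")],
    "utilities": [_norm(w) for w in ("tiền điện", "tiền nước")],
}

# A number (comma as decimal separator) followed by a money unit, e.g. "50 nghìn".
AMOUNT_RE = re.compile(
    r"(\d+(?:,\d+)?)\s*(" + "|".join(re.escape(u) for u in UNIT_MULTIPLIERS) + r")\b"
)


@dataclass
class Transaction:
    amount: int          # amount in VND
    category: str        # hypothetical category label
    occurred_on: date    # date resolved from a relative phrase, else "today"
    description: str     # original transcribed text


def parse_transaction(text: str, today: Optional[date] = None) -> Optional[Transaction]:
    """Build a Transaction from transcribed text, or return None if no amount is found."""
    today = today or date.today()
    normalized = _norm(text)

    match = AMOUNT_RE.search(normalized)
    if match is None:
        return None
    number = float(match.group(1).replace(",", "."))
    amount = round(number * UNIT_MULTIPLIERS[match.group(2)])

    # Naive substring matching; a real pipeline would need proper tokenization.
    offset = next((d for phrase, d in RELATIVE_DAYS.items() if phrase in normalized), 0)
    category = next(
        (name for name, words in CATEGORY_KEYWORDS.items()
         if any(w in normalized for w in words)),
        "other",
    )
    return Transaction(amount, category, today - timedelta(days=offset), text.strip())
```

Under these assumptions, calling `parse_transaction("Hôm qua ăn phở 50 nghìn", today=date(2025, 11, 6))` yields an amount of 50000, the category `"food"`, and the date 2025-11-05. Text without a recognizable amount returns `None`, so the endpoint can decide how to report it.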