Week 9 Worklog

Week 9 Objectives:

  • FastAPI project with MongoDB
  • Vietnamese STT library selected and integrated
  • Vietnamese OCR library selected and tested
  • NLP extraction: amount, category, date, jar detection (REQ-027)
  • Detect multiple transactions (REQ-027)
  • Publish events to RabbitMQ

Tasks to be carried out this week:

Day | Task | Start Date | Completion Date | Reference Material and Learning Notes
Day 2
Project Setup
- Create FastAPI project structure (/app, /models, /services, /utils, /routers, /schemas, /ai-models)
- Set up virtual environment (Python 3.11+)
- Install FastAPI, Uvicorn, Pydantic
- Install local MongoDB (Docker)
- Create database: ai_service_db (configured in docker-compose.yml)
- Collections: Bills, Voices (using MongoEngine models)
- Set up MongoDB connection with MongoEngine
- Test connection (lifecycle management in database.py)
Technology Research
- For Bill: Evaluate several OCR models to find the one that best handles Vietnamese text extraction
- For Voice: Survey Speech-to-Text models for Vietnamese
Start Date: 03/11/2025 | Completion Date: 03/11/2025 | Reference: Sprint 01 - Day 01

Day 3
Core API Structure and Endpoints
- Shared API infrastructure
- Set up Voice and Bill APIs
- Create basic API endpoint structure
- Set up Pydantic models for request/response
- Add CORS middleware
- Add error handling middleware
- Create health check endpoint (GET /health)
- Set up logging (structlog)
Start Date: 04/11/2025 | Completion Date: 04/11/2025 | Reference: Sprint 01 - Day 02

Day 4
Voice Processing Pipeline
- Install Vietnamese STT library
- Audio preprocessing
- Voice-to-Text integration
- Implement STT functionality
- Test Voice model

OCR Preparation for Bill Extraction
- Install the OCR library selected on Day 1
- Research image preprocessing techniques
- Create sample preprocessing pipeline
- Test with bill images
Start Date: 05/11/2025 | Completion Date: 05/11/2025 | Reference: Sprint 01 - Day 03

Day 5
Text Processing for Vietnamese Language
- Test model accuracy and select an appropriate model; the team chose PhoWhisper from VinAI
- Verify that the model accepts voice input and returns text
- Adjust the endpoint to return the categories defined by the team
- Test processing multiple transactions at once and check whether the model can handle them
- Set up detection of transaction time from speech
- Create transaction object

OCR Preprocessing
- Process bill images submitted by the user, enhancing each image as much as possible before feeding it to the OCR model
- Collect Vietnamese bills (electricity bills, shopping mall bills, restaurant bills, convenience store bills, coffee shop bills, …)
Start Date: 06/11/2025 | Completion Date: 06/11/2025 | Reference: Sprint 01 - Day 04

Day 6
Integration and Testing
- Handle background tasks for Voice and Bill
- Set up event publishing
Speech to Text
- End-to-end testing of Voice
- Run the pipeline as Recording -> Processing -> Return Endpoint; check and improve Voice processing accuracy
Bill Detection
- Deploy and test OCR models: Tesseract, EasyOCR, …
Start Date: 07/11/2025 | Completion Date: 07/11/2025 | Reference: Sprint 01 - Day 05

Week 9 Achievements:

1. Project Setup and Infrastructure

  • Completed FastAPI project structure with standard directories (/app, /models, /services, /utils, /routers, /schemas, /ai-models)
  • Set up a Python 3.11+ virtual environment
  • Installed and configured local MongoDB using Docker
  • Created ai_service_db database and collections for Bills and Voices
  • Set up the MongoDB connection with MongoEngine, including lifecycle management

2. API Structure and Middleware

  • Set up API endpoints for Voice and Bill processing
  • Created Pydantic models for request/response validation
  • Configured CORS middleware and error handling middleware
  • Implemented health check endpoint (GET /health)
  • Integrated structlog for logging

3. Voice Processing (Speech-to-Text)

  • Researched and selected PhoWhisper from VinAI for Vietnamese STT
  • Implemented audio preprocessing and Voice-to-Text integration
  • Tested model accuracy and voice detection capability
  • Configured the endpoint to return the categories defined by the team
  • Tested processing of multiple transactions at once
  • Set up detection of transaction time from speech
  • Created transaction objects from voice data
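
As an illustration of the last two items, here is a minimal sketch of how a relative time phrase in a transcript could be resolved to a date and attached to a transaction object. The phrase table, the `Transaction` fields, and the function name `detect_date` are assumptions made for this example and are not taken from the team's code.

```python
import unicodedata
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical phrase table: Vietnamese relative-day expressions -> day offset.
RELATIVE_DAYS = [
    ("hôm kia", -2),  # the day before yesterday
    ("hôm qua", -1),  # yesterday
    ("hôm nay", 0),   # today
]


@dataclass
class Transaction:
    amount: int        # amount in VND
    category: str
    occurred_on: date


def detect_date(transcript: str, today: date) -> date:
    """Return the date implied by the first matching table entry, else `today`."""
    # Normalize so composed and decomposed Vietnamese diacritics compare equal.
    text = unicodedata.normalize("NFC", transcript).lower()
    for phrase, offset in RELATIVE_DAYS:
        if phrase in text:
            return today + timedelta(days=offset)
    return today


today = date(2025, 11, 6)
tx = Transaction(
    amount=50_000,
    category="food",
    occurred_on=detect_date("Hôm qua ăn phở 50k", today),
)
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the sketch deterministic and easy to test.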

4. Bill Processing (OCR)

  • Researched and selected appropriate OCR libraries (Tesseract, EasyOCR)
  • Implemented image preprocessing to optimize quality before OCR
  • Collected diverse Vietnamese bill dataset (electricity, supermarket, restaurants, convenience stores, coffee shops)
  • Tested OCR models with real bill images
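
The worklog does not list the exact preprocessing steps, so the following is only a sketch of one step commonly applied before OCR: binarization with Otsu's threshold. It is written in pure Python on a nested list of grayscale values to show the idea; the function names are illustrative, and a real pipeline would more likely rely on an image library.

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Return the 0-255 threshold that maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of their intensities
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break     # every pixel is already in the background class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image: list[list[int]]) -> list[list[int]]:
    """Map each grayscale pixel to 0 or 255 using Otsu's threshold."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[255 if p > t else 0 for p in row] for row in image]
```

Binarization separates dark text from a lighter background, which tends to help on bills photographed under uneven lighting. Whether it improves results for a given OCR model has to be checked against the collected bill images.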

5. NLP Extraction and Data Processing

  • Implemented information extraction: amount, category, date (REQ-027)
  • Implemented jar detection
  • Implemented detection of multiple transactions in a single request
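
A rough sketch of how amount extraction and multiple-transaction detection could work on a transcript, assuming a simple rule-based approach (the worklog does not state whether the team used rules or a model). The unit table, the separators, and the function names are hypothetical, and decimal amounts such as "1,5 triệu" are deliberately left out. Category and jar detection are not shown.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical unit table: multiplier for each spoken or written unit.
UNITS = {"k": 1_000, "nghìn": 1_000, "ngàn": 1_000, "triệu": 1_000_000, "đồng": 1, "đ": 1}

# A number (optionally with "." thousand separators) followed by an optional unit.
AMOUNT_RE = re.compile(
    r"(\d{1,3}(?:\.\d{3})+|\d+)(?:\s*(triệu|nghìn|ngàn|k|đồng|đ)(?!\w))?",
    re.IGNORECASE,
)


def parse_amount(segment: str) -> Optional[int]:
    """Return the first amount in VND found in the segment, or None."""
    m = AMOUNT_RE.search(segment)
    if m is None:
        return None
    number = int(m.group(1).replace(".", ""))
    unit = (m.group(2) or "đồng").lower()
    return number * UNITS[unit]


def extract_transactions(text: str) -> list[dict]:
    """Split the text into segments and keep those that contain an amount."""
    text = unicodedata.normalize("NFC", text)
    # Assumed separators: punctuation followed by a space, or the word "và" (and).
    segments = re.split(r"[,;.]\s+|\s+và\s+", text)
    results = []
    for seg in segments:
        amount = parse_amount(seg)
        if amount is not None:
            results.append({"text": seg.strip(), "amount": amount})
    return results


transactions = extract_transactions("Ăn phở 50k, cà phê 30 nghìn và đổ xăng 120.000 đồng")
```

Here `transactions` holds one entry per detected purchase, which is the shape a downstream step would need in order to create one transaction object per entry.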

6. RabbitMQ Integration

  • Set up event publishing to RabbitMQ
  • Handled background tasks for Voice and Bill processing
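
The event publishing step might look roughly like the sketch below. It assumes a pika-style channel object, although the worklog does not name the RabbitMQ client library, and the envelope fields, exchange name, and routing key are all hypothetical.

```python
import json
import uuid
from datetime import datetime, timezone


def build_event(event_type: str, payload: dict) -> bytes:
    """Wrap a payload in a small JSON envelope and encode it as UTF-8 bytes."""
    envelope = {
        "event_id": str(uuid.uuid4()),
        "event_type": event_type,
        "occurred_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }
    # ensure_ascii=False keeps Vietnamese text readable in the message body.
    return json.dumps(envelope, ensure_ascii=False).encode("utf-8")


def publish_event(channel, exchange: str, routing_key: str,
                  event_type: str, payload: dict) -> None:
    """Publish through any object exposing a pika-style `basic_publish` method."""
    channel.basic_publish(
        exchange=exchange,
        routing_key=routing_key,
        body=build_event(event_type, payload),
    )
```

With pika, `channel` would typically come from `pika.BlockingConnection(pika.ConnectionParameters("localhost")).channel()`, and a call could look like `publish_event(channel, "ai.events", "voice.processed", "voice.processed", {"amount": 50000})`. Taking the channel as a parameter keeps the function testable without a running broker.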

7. Testing and Quality Assurance

  • End-to-end testing for Voice processing pipeline (Recording → Processing → Return Endpoint)
  • Evaluated and improved Voice processing accuracy
  • Tested OCR models with real dataset
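
The worklog does not say which metric was used to evaluate voice accuracy. A common choice for speech-to-text is word error rate (WER), so the sketch below shows how it could be computed from a reference transcript and the model output, under that assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] = edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

A lower value is better: 0.0 means the transcript matches the reference word for word, and values can exceed 1.0 when the hypothesis contains many extra words.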

Summary: All Week 9 objectives were completed: the FastAPI and MongoDB infrastructure was set up, the PhoWhisper model was integrated for Vietnamese STT, OCR was implemented for bill processing, and event publishing to RabbitMQ was set up. The NLP functions for extracting transaction information were implemented and tested.