| Day | Task | Start Date | Completion Date | Reference Material |
|---|---|---|---|---|
| 2 | - Participate in Cloud Mastery Series #2<br>OCR Enhancement:<br>- Detect and extract line items in proper JSON format<br>- Extract multiple items in invoice<br>- Count components in invoice<br>- Test with various bill types<br>Voice Model Quality Improvement:<br>- Handle Vietnamese numbers (e.g., “hai mươi hai” to 22)<br>- Recognize compound phrases<br>- Fix spelling errors and handle noise<br>- Improve JSON response format<br>- Testing | 10/11/2025 | 10/11/2025 | Sprint 02 - Day 06 |
| 3 | Check OCR model confidence score:<br>- Calculate field-level confidence<br>- Calculate confidence score for each extracted field: Amount/Total confidence, Date confidence, Text clarity, Clear line separation, Price alignment, Quantity detection<br>- Overall confidence algorithm<br>- Calculate low-confidence threshold<br>- Test with over 30 bills<br>Check Voice confidence scoring:<br>- Test multi-layer confidence scoring<br>- Test overall confidence algorithm<br>- Test Voice low-confidence thresholds<br>- Performance check (over 50 samples, under 4 seconds)<br>Comprehensive testing of Voice and Bill systems | 11/11/2025 | 11/11/2025 | Sprint 02 - Day 07 |
| 4 | Smart category detection and context recognition in Voice:<br>- Advanced category detection, such as merchant name analysis and category-based keyword extraction<br>- Determine context<br>- Testing<br>Extract new invoice types:<br>- Test with supermarket invoices<br>- Test with restaurant invoices<br>- Test with coffee shop and beverage store invoices<br>- Extraction rules for each specific type<br>- Improve Bill endpoint JSON response<br>Backend integration and testing:<br>- Integrate transaction functions<br>- Error handling<br>- Return correct JSON values | 12/11/2025 | 12/11/2025 | Sprint 02 - Day 08 |
| 5 | Error Handling, Optimization & Monitoring<br>Voice processing optimization:<br>- Optimize Voice model performance<br>Error Handling & Resilience<br>Handle Corrupted Audio:<br>- Validate audio format (WAV, MP3, M4A)<br>- Check audio duration (minimum 0.5 seconds, maximum 30 seconds)<br>- Detect corrupted/incomplete files<br>- Return clear error: “Invalid audio format or corrupted file”<br>Handle STT Timeout:<br>- Set timeout for STT processing (maximum 30 seconds)<br>- If timeout, return partial result with warning<br>- Log timeout events for monitoring<br>Retry Logic:<br>- Retry STT on temporary errors (maximum 3 retries)<br>- Exponential backoff: 1 second, 2 seconds, 4 seconds<br>- Return error after maximum retries<br>Exception Cases:<br>- Empty/silent audio → Error: “No speech detected”<br>- Non-Vietnamese speech → Low confidence warning<br>- Multiple speakers → Warning + best effort extraction<br>- Very long audio (>30 seconds) → Error: “Audio too long”<br>Graceful Degradation:<br>- If category detection fails → Return “Uncategorized”<br>- If quantity extraction fails → Return null value with warning<br>- If date not detected → Use current date with warning<br>- Always return partial results when possible | 13/11/2025 | 13/11/2025 | Sprint 02 - Day 09 |
| 6 | Comprehensive testing and documentation:<br>- Prepare data for testing both Voice and Bill models<br>- Test Voice API<br>- Test Bill OCR API<br>Voice test cases:<br>- Test with background noise<br>- Test with fast or slow speech<br>- Test with invalid input, such as nonsense speech<br>- Test multiple transactions in one recording<br>- Test ambiguous input<br>Bill processing test cases:<br>- Test images rotated in various directions<br>- Test image quality<br>- Test inputs that are not invoice images<br>- Test with highly complex invoices<br>- Test with invoices containing many items | 14/11/2025 | 14/11/2025 | Sprint 02 - Day 10 |
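
The Vietnamese number handling listed for Day 2 (for example “hai mươi hai” to 22) can be illustrated with a small word-to-integer converter. This is only a sketch under assumptions: the function name `vi_words_to_int`, the vocabulary table, and the restriction to values below 1000 are hypothetical and are not taken from the project's actual Voice model code.

```python
import unicodedata


def _nfc(text: str) -> str:
    # Normalise so composed and decomposed diacritics compare equal.
    return unicodedata.normalize("NFC", text)


# Hypothetical vocabulary: digits plus the variants used after tens.
UNITS = {_nfc(k): v for k, v in {
    "không": 0, "một": 1, "mốt": 1, "hai": 2, "ba": 3, "bốn": 4, "tư": 4,
    "năm": 5, "lăm": 5, "sáu": 6, "bảy": 7, "tám": 8, "chín": 9,
}.items()}
TEN = _nfc("mười")         # standalone ten, as in "mười lăm" (15)
TENS = _nfc("mươi")        # tens multiplier, as in "hai mươi" (20)
HUNDRED = _nfc("trăm")     # hundreds multiplier
FILLERS = {_nfc("linh"), _nfc("lẻ")}  # "zero tens" filler words


def vi_words_to_int(text: str) -> int:
    """Convert a Vietnamese number phrase below 1000 to an integer."""
    total = 0    # value accumulated so far
    current = 0  # digit waiting for a multiplier (or the final unit)
    for word in _nfc(text.lower()).split():
        if word in UNITS:
            current = UNITS[word]
        elif word == TEN:
            total += 10
            current = 0
        elif word == TENS:
            total += current * 10
            current = 0
        elif word == HUNDRED:
            total += current * 100
            current = 0
        elif word in FILLERS:
            continue
        else:
            raise ValueError(f"unrecognized number word: {word!r}")
    return total + current
```

Raising on unknown words, rather than skipping them, keeps noisy transcripts from silently producing a wrong amount; a production version would likely also need thousands and millions (“nghìn”, “triệu”).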
OCR Confidence:
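
One way to combine the field-level scores from Day 3 into an overall score is a weighted mean, with a threshold for flagging weak fields. The sketch below is illustrative only: the field names, the weights in `FIELD_WEIGHTS`, and the `LOW_CONFIDENCE_THRESHOLD` value are assumptions, since the plan names the fields but not the actual algorithm or cut-off.

```python
from typing import Dict, List

# Hypothetical weights: the plan lists the fields but not their weighting.
FIELD_WEIGHTS: Dict[str, float] = {"total": 0.4, "date": 0.2, "line_items": 0.4}
LOW_CONFIDENCE_THRESHOLD = 0.6  # assumed cut-off, not from the source


def overall_confidence(field_scores: Dict[str, float]) -> float:
    """Weighted mean of the known fields; unknown fields are ignored."""
    weighted_sum = 0.0
    weight_total = 0.0
    for name, score in field_scores.items():
        weight = FIELD_WEIGHTS.get(name, 0.0)
        weighted_sum += weight * score
        weight_total += weight
    return weighted_sum / weight_total if weight_total else 0.0


def low_confidence_fields(
    field_scores: Dict[str, float],
    threshold: float = LOW_CONFIDENCE_THRESHOLD,
) -> List[str]:
    """Names of fields scoring below the threshold, sorted for stable output."""
    return sorted(name for name, score in field_scores.items() if score < threshold)
```

Dividing by the sum of the weights actually present means a bill with a missing field is scored on the fields that were extracted instead of being penalised twice.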
Voice Confidence:
Audio Error Handling:
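
The retry policy planned for Day 5 (at most 3 retries with 1, 2, and 4 second backoff, then an error) can be sketched as follows. `TransientSTTError` and `call_with_retry` are hypothetical names introduced for illustration; the real STT client and its exception types are not described in this report.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientSTTError(Exception):
    """Hypothetical marker for temporary speech-to-text failures."""


def call_with_retry(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on transient errors with exponential backoff."""
    attempt = 0
    while True:
        try:
            return fn()
        except TransientSTTError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s with the defaults
            attempt += 1
```

The `sleep` parameter exists only so tests can inject a fake and avoid real waiting; with the defaults the function makes one initial call plus up to three retries.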
Exception Handling:
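
The audio checks and exception cases from Day 5 map naturally onto a single validation step that returns the documented error messages. In this sketch the function `validate_audio`, its parameters, and the idea that a decoder reports `None` for an unreadable file are assumptions; treating clips shorter than 0.5 seconds as incomplete files is also an assumption, because the plan gives the minimum duration but not the message for that case.

```python
from typing import Optional

ALLOWED_FORMATS = {"wav", "mp3", "m4a"}
MIN_SECONDS = 0.5
MAX_SECONDS = 30.0


def validate_audio(
    filename: str,
    duration_seconds: Optional[float],
    has_speech: bool,
) -> Optional[str]:
    """Return an error message, or None when the audio is acceptable.

    duration_seconds is None when the (hypothetical) decoder could not
    read the file; has_speech would come from a voice-activity check.
    """
    extension = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if extension not in ALLOWED_FORMATS or duration_seconds is None:
        return "Invalid audio format or corrupted file"
    if duration_seconds > MAX_SECONDS:
        return "Audio too long"
    if duration_seconds < MIN_SECONDS:
        # Assumption: too-short clips are treated as incomplete files.
        return "Invalid audio format or corrupted file"
    if not has_speech:
        return "No speech detected"
    return None
```

For example, `validate_audio("note.mp3", 31.0, True)` yields the “Audio too long” error, while a 3-second WAV file containing speech passes.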
Graceful Degradation:
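
The fallback rules from Day 5 can be expressed as a post-processing step that fills gaps and collects warnings while keeping whatever was extracted. The function name `apply_fallbacks`, the dictionary keys, and the warning texts are hypothetical; only the fallback behaviour itself (“Uncategorized”, null quantity, current date) comes from the plan.

```python
from datetime import date
from typing import Any, Dict, List, Optional, Tuple


def apply_fallbacks(
    extracted: Dict[str, Any],
    today: Optional[date] = None,
) -> Tuple[Dict[str, Any], List[str]]:
    """Fill missing fields with documented defaults and collect warnings."""
    today = today or date.today()
    result = dict(extracted)  # keep every field that was extracted
    warnings: List[str] = []

    if not result.get("category"):
        result["category"] = "Uncategorized"

    if result.get("quantity") is None:
        result["quantity"] = None  # explicit null in the JSON response
        warnings.append("quantity could not be extracted")

    if not result.get("date"):
        result["date"] = today.isoformat()
        warnings.append("date not detected; using current date")

    return result, warnings
```

Returning the warnings next to the partial result lets the client decide whether to ask the user for confirmation instead of failing the whole request.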
Voice Testing:
Bill OCR Testing:
Summary: Week 10 focused on improving the quality and reliability of both the Voice and OCR models. The confidence scoring system was deployed, together with comprehensive error handling and multi-dimensional testing across a range of edge cases. The system is now ready for backend integration and for handling complex real-world scenarios.