Week 10 Worklog

Week 10 Objectives:

Improve and optimize OCR model for various types of invoices
Enhance Voice processing quality with Vietnamese numbers and compound phrases
Deploy Confidence Scoring system for both Voice and Bill
Smart category detection and context recognition
Comprehensive error handling and performance optimization
Multi-dimensional testing with edge cases

Tasks to be carried out this week:

Day	Task	Start Date	Completion Date	Reference Material
2	Participate in Cloud Mastery Series #2. OCR Enhancement: detect and extract line items in proper JSON format; extract multiple items per invoice; count components in invoice; test with various bill types. Voice Model Quality Improvement: handle Vietnamese numbers (e.g., “hai mươi hai” to 22); recognize compound phrases; fix spelling errors and handle noise; improve JSON response format; testing.	10/11/2025	10/11/2025	Sprint 02 - Day 06
3	Check OCR model confidence score: calculate field-level confidence for each extracted field (amount/total, date, text clarity, clear line separation, price alignment, quantity detection); overall confidence algorithm; calculate low-confidence threshold; test with over 30 bills. Check Voice confidence scoring: test multi-layer confidence scoring; test overall confidence algorithm; test Voice low-confidence thresholds; performance check (over 50 samples, <4 seconds). Comprehensive testing of Voice and Bill systems.	11/11/2025	11/11/2025	Sprint 02 - Day 07
4	Smart category detection and context recognition in Voice: advanced category detection (merchant name analysis, category-based keyword extraction); determine context; testing. Extract new invoice types: test with supermarket, restaurant, coffee shop, and beverage store invoices; extraction rules for each specific type; improve Bill endpoint JSON response. Backend integration and testing: integrate transaction functions; error handling; return correct JSON values.	12/11/2025	12/11/2025	Sprint 02 - Day 08
5	Error Handling, Optimization & Monitoring. Voice processing optimization: optimize Voice model performance. Error Handling & Resilience. Handle corrupted audio: validate audio format (WAV, MP3, M4A); check audio duration (minimum 0.5 seconds, maximum 30 seconds); detect corrupted/incomplete files; return clear error “Invalid audio format or corrupted file”. Handle STT timeout: set timeout for STT processing (maximum 30 seconds); on timeout, return partial result with warning; log timeout events for monitoring. Retry logic: retry STT on temporary errors (maximum 3 retries); exponential backoff of 1, 2, and 4 seconds; return error after maximum retries. Exception cases: empty/silent audio → error “No speech detected”; non-Vietnamese speech → low-confidence warning; multiple speakers → warning + best-effort extraction; very long audio (>30 seconds) → error “Audio too long”. Graceful degradation: if category detection fails → return “Uncategorized”; if quantity extraction fails → return null value with warning; if date not detected → use current date with warning; always return partial results when possible.	13/11/2025	13/11/2025	Sprint 02 - Day 09
6	Comprehensive testing and documentation: prepare data for testing both Voice and Bill models; test Voice API; test Bill OCR API. Voice test cases: background noise; fast or slow speech; invalid input such as nonsense speech; multiple transactions in one recording; ambiguous input. Bill processing test cases: images rotated in various directions; image quality; inputs that are not invoice images; highly complex invoices; invoices containing many items.	14/11/2025	14/11/2025	Sprint 02 - Day 10

Week 10 Achievements:

1. OCR Model Improvements

Detected and extracted line items in proper JSON format
Extracted multiple items from a single invoice
Automatically counted the components in each invoice
Successfully tested with various invoice types: supermarket, restaurant, coffee shop, beverage stores
Built extraction rules for specific invoice types
Improved JSON response format for Bill endpoint
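
As an illustration of the line-item extraction output described above, here is a minimal Python sketch. The field names (merchant, item_count, items, total) and the LineItem type are assumptions made for this example; the actual Bill endpoint schema is not given in this worklog.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class LineItem:
    """One extracted invoice row (hypothetical shape)."""
    name: str
    quantity: int
    unit_price: int  # whole VND


def build_bill_response(merchant: str, items: List[LineItem]) -> str:
    """Serialize extracted line items, their count, and the computed total."""
    total = sum(item.quantity * item.unit_price for item in items)
    payload = {
        "merchant": merchant,
        "item_count": len(items),  # "count components in invoice"
        "items": [asdict(item) for item in items],
        "total": total,
    }
    # ensure_ascii=False keeps Vietnamese item names readable in the JSON.
    return json.dumps(payload, ensure_ascii=False)


example = build_bill_response(
    "Demo Cafe",
    [LineItem("Cà phê sữa", 2, 25000), LineItem("Bánh mì", 1, 20000)],
)
```

Computing the total from the items makes it possible to cross-check it against the total printed on the invoice, which could feed into the Amount/Total confidence.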

2. Voice Processing Quality Enhancement

Handled Vietnamese number words (e.g., converting “hai mươi hai” to 22)
Recognized Vietnamese compound phrases
Fixed spelling errors and handled audio noise
Improved JSON response format
Optimized Voice processing performance
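
The number conversion above can be sketched as a small token parser. This is a simplified illustration, not the project's implementation: the function name and word tables are assumptions, and it only covers whole numbers built from units, tens, hundreds, "nghìn"/"ngàn", and "triệu".

```python
import unicodedata

# Digit words, including the variants used after tens ("mốt", "tư", "lăm").
DIGITS = {
    "không": 0, "một": 1, "mốt": 1, "hai": 2, "ba": 3, "bốn": 4, "tư": 4,
    "năm": 5, "lăm": 5, "sáu": 6, "bảy": 7, "tám": 8, "chín": 9,
}
SCALES = {"nghìn": 1_000, "ngàn": 1_000, "triệu": 1_000_000}
FILLERS = {"lẻ", "linh"}  # mark a zero tens digit, e.g. "một trăm lẻ năm"


def vietnamese_words_to_int(phrase: str) -> int:
    """Convert a spelled-out Vietnamese number to an int."""
    tokens = unicodedata.normalize("NFC", phrase).lower().split()
    if not tokens:
        raise ValueError("empty phrase")
    total = 0    # value of completed thousand/million groups
    group = 0    # value of the current group (below 1000)
    pending = 0  # last digit seen, not yet placed
    for token in tokens:
        if token in DIGITS:
            pending = DIGITS[token]
        elif token == "mười":        # the word for exactly ten
            group += 10
            pending = 0
        elif token == "mươi":        # tens multiplier: "hai mươi" = 20
            group += pending * 10
            pending = 0
        elif token == "trăm":        # hundreds multiplier
            group += pending * 100
            pending = 0
        elif token in SCALES:
            total += (group + pending) * SCALES[token]
            group = pending = 0
        elif token in FILLERS:
            continue
        else:
            raise ValueError(f"unrecognized number word: {token}")
    return total + group + pending
```

For example, vietnamese_words_to_int("hai mươi hai") returns 22. Colloquial shortenings and decimals are outside the scope of this sketch.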

3. Confidence Scoring System

OCR Confidence:

Calculate field-level confidence
Confidence for Amount/Total
Confidence for Date
Evaluate text clarity and line separation
Detect price alignment and quantity
Build overall confidence algorithm
Set Low Confidence Threshold
Tested on over 30 real invoices

Voice Confidence:

Deploy multi-layer confidence scoring
Overall confidence algorithm for Voice
Set Low Confidence Thresholds
Performance check on over 50 samples (processing time <4 seconds)
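
One common way to combine field-level scores into an overall confidence is a weighted average, sketched below. The weights, the 0.7 threshold, and the field names are placeholder assumptions for illustration; the worklog does not state the values or the formula actually used.

```python
from typing import Dict, List

# Assumed threshold; the real low-confidence threshold is not given here.
LOW_CONFIDENCE_THRESHOLD = 0.7


def overall_confidence(scores: Dict[str, float],
                       weights: Dict[str, float]) -> float:
    """Weighted average of per-field confidence scores in [0, 1]."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        return 0.0
    weighted = sum(score * weights[name] for name, score in scores.items())
    return weighted / total_weight


def low_confidence_fields(scores: Dict[str, float],
                          threshold: float = LOW_CONFIDENCE_THRESHOLD
                          ) -> List[str]:
    """Names of fields whose score falls below the threshold."""
    return sorted(name for name, score in scores.items() if score < threshold)


# Hypothetical scores for one invoice.
field_scores = {"amount": 0.9, "date": 0.6, "text_clarity": 0.8}
field_weights = {"amount": 0.5, "date": 0.25, "text_clarity": 0.25}
overall = overall_confidence(field_scores, field_weights)
flagged = low_confidence_fields(field_scores)
```

Reporting the flagged fields alongside the overall score lets a client ask the user to confirm only the uncertain values.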

4. Smart Category Detection

Merchant/seller name analysis
Keyword extraction based on categories
Transaction context determination
Advanced Category Detection
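
Keyword-based category detection can be sketched as a lookup over per-category keyword lists. The categories and keywords below are made-up examples; the project's actual keyword tables and merchant analysis are not described in this worklog.

```python
import unicodedata

# Hypothetical keyword table, for illustration only.
CATEGORY_KEYWORDS = {
    "Food & Drink": ["cà phê", "coffee", "trà sữa", "nhà hàng", "restaurant"],
    "Groceries": ["siêu thị", "supermarket", "mart"],
}


def detect_category(text: str) -> str:
    """Return the category with the most keyword hits in the text."""
    normalized = unicodedata.normalize("NFC", text).lower()
    best_category, best_hits = "Uncategorized", 0
    for category, keywords in CATEGORY_KEYWORDS.items():
        hits = sum(1 for keyword in keywords if keyword in normalized)
        if hits > best_hits:
            best_category, best_hits = category, hits
    return best_category
```

When no keyword matches, the function falls back to "Uncategorized", in line with the graceful-degradation rule listed in this report.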

5. Error Handling & Resilience

Audio Error Handling:

Validate audio format (WAV, MP3, M4A)
Check duration (min: 0.5s, max: 30s)
Detect corrupted/incomplete files
Handle STT timeout (max: 30s)
Retry logic with exponential backoff (3 times, 1s-2s-4s)
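
The retry policy above (up to 3 retries with delays of 1s, 2s, and 4s) can be sketched as follows. TransientSTTError and call_with_retry are hypothetical names; the real STT client and its error types are not specified here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientSTTError(Exception):
    """Hypothetical marker for temporary speech-to-text failures."""


def call_with_retry(func: Callable[[], T],
                    max_retries: int = 3,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func, retrying on transient errors with exponential backoff.

    Delays are base_delay * 2**attempt, i.e. 1s, 2s, 4s by default.
    After max_retries failed retries the last error is re-raised.
    """
    attempt = 0
    while True:
        try:
            return func()
        except TransientSTTError:
            if attempt == max_retries:
                raise
            sleep(base_delay * 2 ** attempt)
            attempt += 1
```

Passing sleep as a parameter is a design choice that lets tests record the delays instead of actually waiting.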

Exception Handling:

Empty/silent audio → Error: “No speech detected”
Non-Vietnamese speech → Low confidence warning
Multiple speakers → Warning + best effort extraction
Audio too long (>30s) → Error: “Audio too long”
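
The format and duration checks can be sketched as a pre-flight validation step. This example only inspects the file extension and a duration supplied by the caller; detecting truly corrupted files would require decoding the audio, which is omitted. The function name, the exception type, and the "Audio too short" message are assumptions, while the other two messages are taken from this worklog.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".wav", ".mp3", ".m4a"}
MIN_DURATION_SECONDS = 0.5
MAX_DURATION_SECONDS = 30.0


class AudioValidationError(ValueError):
    """Raised when an upload fails the basic pre-flight checks."""


def validate_audio(filename: str, duration_seconds: float) -> None:
    """Raise AudioValidationError if the format or duration is unacceptable."""
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        raise AudioValidationError("Invalid audio format or corrupted file")
    if duration_seconds > MAX_DURATION_SECONDS:
        raise AudioValidationError("Audio too long")
    if duration_seconds < MIN_DURATION_SECONDS:
        # Assumed message; the worklog does not specify one for this case.
        raise AudioValidationError("Audio too short")
```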

Graceful Degradation:

Category not detected → Return “Uncategorized”
Quantity not extracted → Return null + warning
Date not detected → Use current date + warning
Always return partial results when possible
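
The fallback rules above can be sketched as a post-processing step applied to the extraction result. The dictionary keys (category, quantity, date, warnings) and the warning texts are assumptions for this example.

```python
from datetime import date
from typing import Any, Dict, Optional


def apply_fallbacks(extracted: Dict[str, Any],
                    today: Optional[date] = None) -> Dict[str, Any]:
    """Fill in missing fields with defaults and collect warnings."""
    result = dict(extracted)  # leave the caller's dict untouched
    warnings = list(result.get("warnings", []))

    if not result.get("category"):
        result["category"] = "Uncategorized"

    if result.get("quantity") is None:
        result["quantity"] = None
        warnings.append("Quantity could not be extracted")

    if not result.get("date"):
        result["date"] = (today or date.today()).isoformat()
        warnings.append("Date not detected; using current date")

    result["warnings"] = warnings
    return result
```

Fields that were extracted successfully pass through unchanged, so the caller always receives a partial result rather than an error.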

6. Comprehensive Testing

Voice Testing:

Test with background noise
Test speech speed (fast/slow)
Test invalid input (nonsense speech)
Test multiple transactions in one recording
Test ambiguous input

Bill OCR Testing:

Test images rotated in various directions
Test diverse image quality
Test input not in invoice format
Test highly complex invoices
Test invoices with many items

7. Backend Integration

Integrate transaction functions with Backend
Comprehensive error handling
Return standardized JSON values

Summary: Week 10 focused on enhancing the quality and reliability of both the Voice and OCR models. The confidence scoring system and comprehensive error handling were deployed, and both models were tested against a wide range of edge cases. The system is ready for Backend integration and for handling complex real-world scenarios.