Week 10 Worklog

Week 10 Objectives:

  • Improve and optimize the OCR model for various types of invoices
  • Enhance Voice processing quality for Vietnamese numbers and compound phrases
  • Deploy a Confidence Scoring system for both Voice and Bill
  • Implement smart category detection and context recognition
  • Add comprehensive error handling and optimize performance
  • Run multi-dimensional tests covering edge cases

Tasks to be carried out this week:

Day | Task | Start Date | Completion Date | Reference Material

Day 2:
- Participate in Cloud Mastery Series #2

OCR Enhancement
- Detect and extract line items in proper JSON format
- Extract multiple items in invoice
- Count components in invoice
- Test with various bill types

Voice Model Quality Improvement
- Handle Vietnamese numbers (e.g., “hai mươi hai” to 22)
- Recognize compound phrases
- Fix spelling errors and handle noise
- Improve JSON response format
- Testing
Start Date: 10/11/2025 | Completion Date: 10/11/2025 | Reference Material: Sprint 02 - Day 06

Day 3:
Check OCR model confidence score
- Calculate field-level confidence for each extracted field:
  - Amount/Total confidence
  - Date confidence
  - Text clarity
  - Clear line separation
  - Price alignment
  - Quantity detection
- Overall confidence algorithm
- Calculate Low Confidence Threshold
- Test with over 30 bills

Check Voice Confidence Scoring
- Test multi-layer confidence scoring
- Test overall confidence algorithm
- Test Voice Low Confidence Thresholds
- Performance check (over 50 samples, <4 seconds)

Comprehensive testing of Voice and Bill systems
Start Date: 11/11/2025 | Completion Date: 11/11/2025 | Reference Material: Sprint 02 - Day 07

Day 4:
Smart category detection and context recognition in Voice
- Advanced category detection, such as merchant name analysis and category-based keyword extraction
- Determine context
- Testing

Extract new invoice types
- Test with supermarket invoices
- Test with restaurant invoices
- Test with coffee shop and beverage store invoices
- Extraction rules for each specific type
- Improve Bill endpoint JSON response

Backend integration and testing
- Integrate transaction functions
- Error handling
- Return correct JSON values
Start Date: 12/11/2025 | Completion Date: 12/11/2025 | Reference Material: Sprint 02 - Day 08

Day 5:
Error Handling, Optimization & Monitoring
Voice processing optimization
- Optimize Voice model performance
Error Handling & Resilience
- Handle Corrupted Audio:
  - Validate audio format (WAV, MP3, M4A)
  - Check audio duration (minimum 0.5 seconds, maximum 30 seconds)
  - Detect corrupted/incomplete files
  - Return clear error: “Invalid audio format or corrupted file”
- Handle STT Timeout:
  - Set timeout for STT processing (maximum 30 seconds)
  - If timeout, return partial result with warning
  - Log timeout events for monitoring

Retry Logic:
- Retry STT on temporary errors (maximum 3 retries)
- Exponential backoff: 1 second, 2 seconds, 4 seconds
- Return error after maximum retries

Exception Cases:
- Empty/silent audio → Error: “No speech detected”
- Non-Vietnamese speech → Low confidence warning
- Multiple speakers → Warning + best effort extraction
- Very long audio (>30 seconds) → Error: “Audio too long”

Graceful Degradation:
- If category detection fails → Return “Uncategorized”
- If quantity extraction fails → Return null value with warning
- If date not detected → Use current date with warning
- Always return partial results when possible
Start Date: 13/11/2025 | Completion Date: 13/11/2025 | Reference Material: Sprint 02 - Day 09

Day 6:
Comprehensive testing and documentation
- Prepare data for testing both Voice and Bill models
- Test Voice API
- Test Bill OCR API

Voice test cases
- Test with background noise
- Test with fast or slow speech
- Test with invalid input, such as nonsense speech
- Test multiple transactions in one recording
- Test ambiguous input

Bill processing test cases
- Test images rotated in various directions
- Test image quality
- Test inputs that are not invoice images
- Test with highly complex invoices
- Test with invoices containing many items
Start Date: 14/11/2025 | Completion Date: 14/11/2025 | Reference Material: Sprint 02 - Day 10

Week 10 Achievements:

1. OCR Model Improvements

  • Detected and extracted line items in proper JSON format
  • Extracted multiple items from a single invoice
  • Automatically counted the components in each invoice
  • Successfully tested with various invoice types: supermarket, restaurant, coffee shop, beverage stores
  • Built extraction rules for specific invoice types
  • Improved JSON response format for Bill endpoint
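
To illustrate the line-item output described above, here is a minimal sketch of how extracted items might be serialized. The field names (name, quantity, unit_price, line_total, item_count, total) and the LineItem type are assumptions for illustration only, not the actual Bill endpoint schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class LineItem:
    """One extracted invoice row (hypothetical shape)."""
    name: str
    quantity: int
    unit_price: int  # assumed to be an integer amount in VND


def build_bill_payload(items):
    """Return a JSON-serializable dict with the items, their count, and the total."""
    rows = [
        {**asdict(item), "line_total": item.quantity * item.unit_price}
        for item in items
    ]
    return {
        "items": rows,
        "item_count": len(rows),
        "total": sum(row["line_total"] for row in rows),
    }


if __name__ == "__main__":
    sample = [LineItem("Cà phê sữa", 2, 25000), LineItem("Bánh mì", 1, 20000)]
    # ensure_ascii=False keeps Vietnamese characters readable in the output.
    print(json.dumps(build_bill_payload(sample), ensure_ascii=False, indent=2))
```

Keeping item_count and total as derived values means they cannot drift from the item list itself.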

2. Voice Processing Quality Enhancement

  • Handled Vietnamese numbers (converting “hai mươi hai” to 22)
  • Recognized Vietnamese compound phrases
  • Fixed spelling errors and handled audio noise
  • Improved JSON response format
  • Optimized Voice processing performance
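
The number handling above can be sketched as a small word-to-integer parser. This is a simplified illustration, not the project's actual implementation: the function name parse_vietnamese_number and the word tables are assumptions, and it only covers values up to the millions.

```python
import unicodedata


def _nfc(text):
    """Normalize to NFC so composed and decomposed diacritics compare equal."""
    return unicodedata.normalize("NFC", text)


# Digit words, including the variants used after a tens word (mốt, tư, lăm).
DIGITS = {_nfc(k): v for k, v in {
    "không": 0, "một": 1, "mốt": 1, "hai": 2, "ba": 3, "bốn": 4, "tư": 4,
    "năm": 5, "lăm": 5, "sáu": 6, "bảy": 7, "tám": 8, "chín": 9,
}.items()}
SCALES = {_nfc(k): v for k, v in {
    "nghìn": 1_000, "ngàn": 1_000, "triệu": 1_000_000,
}.items()}
FILLERS = {_nfc("linh"), _nfc("lẻ")}  # "zero tens" markers, as in "một trăm linh năm"
TEN, TENS, HUNDRED = _nfc("mười"), _nfc("mươi"), _nfc("trăm")


def parse_vietnamese_number(text):
    """Convert a spelled-out Vietnamese number to an int, or raise ValueError."""
    words = _nfc(text).lower().split()
    if not words:
        raise ValueError("empty input")
    total = 0       # value of completed thousand/million groups
    group = 0       # value of the current group below 1000
    pending = None  # last digit word not yet placed
    for word in words:
        if word in DIGITS:
            if pending is not None:
                raise ValueError(f"unexpected digit word: {word!r}")
            pending = DIGITS[word]
        elif word == TEN:
            if pending is not None:
                raise ValueError(f"unexpected word: {word!r}")
            group += 10
        elif word in (TENS, HUNDRED):
            if pending is None:
                raise ValueError(f"missing digit before {word!r}")
            group += pending * (10 if word == TENS else 100)
            pending = None
        elif word in FILLERS:
            continue
        elif word in SCALES:
            group += pending or 0
            pending = None
            total += group * SCALES[word]
            group = 0
        else:
            raise ValueError(f"unrecognized word: {word!r}")
    return total + group + (pending or 0)
```

With this sketch, parse_vietnamese_number("hai mươi hai") returns 22. Raising ValueError on unknown words lets the caller fall back to a low-confidence result instead of guessing.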

3. Confidence Scoring System

OCR Confidence:

  • Calculated field-level confidence
  • Scored confidence for Amount/Total
  • Scored confidence for Date
  • Evaluated text clarity and line separation
  • Checked price alignment and quantity detection
  • Built the overall confidence algorithm
  • Set the Low Confidence Threshold
  • Tested on over 30 real invoices
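
One way the field-level scores above could be combined is a weighted average with a cutoff for flagging weak fields. The weights, the 0.7 threshold, and the function names below are illustrative assumptions; the worklog does not record the values actually used.

```python
# Hypothetical weights per field; they sum to 1.0 but are renormalized anyway
# so that invoices with missing fields still get a comparable score.
FIELD_WEIGHTS = {
    "amount": 0.30,
    "date": 0.20,
    "text_clarity": 0.15,
    "line_separation": 0.10,
    "price_alignment": 0.10,
    "quantity": 0.15,
}
LOW_CONFIDENCE_THRESHOLD = 0.7  # assumed cutoff, not taken from the source


def overall_confidence(scores):
    """Weighted average of the field scores (each in [0, 1]) that are present."""
    used = {field: w for field, w in FIELD_WEIGHTS.items() if field in scores}
    if not used:
        return 0.0
    weight_sum = sum(used.values())
    return sum(scores[field] * w for field, w in used.items()) / weight_sum


def low_confidence_fields(scores, threshold=LOW_CONFIDENCE_THRESHOLD):
    """Return the sorted names of fields scoring below the threshold."""
    return sorted(field for field, score in scores.items() if score < threshold)
```

Reporting the weak fields alongside the overall score lets a client ask the user to confirm only the doubtful values.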

Voice Confidence:

  • Deploy multi-layer confidence scoring
  • Overall confidence algorithm for Voice
  • Set Low Confidence Thresholds
  • Performance check on over 50 samples (processing time <4 seconds)
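
The performance check above could be run with a small timing harness like the following. It assumes the 4-second limit applies to each sample individually, which the worklog does not state explicitly; process stands in for the real Voice pipeline call and is hypothetical.

```python
import time


def measure_latencies(process, samples):
    """Run process(sample) for each sample and return the elapsed seconds."""
    latencies = []
    for sample in samples:
        start = time.perf_counter()
        process(sample)
        latencies.append(time.perf_counter() - start)
    return latencies


def within_budget(latencies, budget_seconds=4.0):
    """True when at least one sample was measured and all finished in time."""
    return bool(latencies) and max(latencies) < budget_seconds
```

Checking the maximum rather than the mean keeps one slow recording from being hidden by many fast ones.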

4. Smart Category Detection

  • Merchant/seller name analysis
  • Keyword extraction based on categories
  • Transaction context determination
  • Advanced Category Detection
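
A minimal sketch of keyword-based category detection over the merchant name and transcript follows. The category names and keyword lists are invented for illustration; only the “Uncategorized” fallback comes from this worklog.

```python
import unicodedata

# Hypothetical categories and keywords (lowercase); not the project's real lists.
CATEGORY_KEYWORDS = {
    "Food & Drink": ["cà phê", "coffee", "phở", "nhà hàng", "trà sữa"],
    "Groceries": ["siêu thị", "supermarket"],
    "Transport": ["grab", "taxi", "xăng"],
}


def _normalize(text):
    """Lowercase and NFC-normalize so Vietnamese diacritics compare reliably."""
    return unicodedata.normalize("NFC", text).lower()


def detect_category(text, merchant=""):
    """Pick the category with the most keyword hits, else "Uncategorized"."""
    haystack = _normalize(f"{merchant} {text}")
    best_category, best_hits = "Uncategorized", 0
    for category, keywords in CATEGORY_KEYWORDS.items():
        hits = sum(1 for kw in keywords if _normalize(kw) in haystack)
        if hits > best_hits:
            best_category, best_hits = category, hits
    return best_category
```

This uses plain substring matching, so short keywords can match inside longer words; a real implementation would likely need word boundaries and context rules.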

5. Error Handling & Resilience

Audio Error Handling:

  • Validate audio format (WAV, MP3, M4A)
  • Check duration (min: 0.5s, max: 30s)
  • Detect corrupted/incomplete files
  • Handle STT timeout (max: 30s)
  • Retry logic with exponential backoff (3 times, 1s-2s-4s)
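
The retry policy above (up to 3 retries, waiting 1s, 2s, then 4s) can be sketched as follows. TransientSTTError and the transcribe callable are hypothetical stand-ins for the real STT client, and the 30-second STT timeout is not modeled here.

```python
import time


class TransientSTTError(Exception):
    """Hypothetical error type for temporary STT failures worth retrying."""


def transcribe_with_retry(transcribe, audio, max_retries=3, base_delay=1.0,
                          sleep=time.sleep):
    """Call transcribe(audio), retrying transient errors with exponential backoff.

    Delays double each time: base_delay, 2 * base_delay, 4 * base_delay, ...
    After max_retries failed retries the last error is re-raised.
    """
    for attempt in range(max_retries + 1):
        try:
            return transcribe(audio)
        except TransientSTTError:
            if attempt == max_retries:
                raise
            sleep(base_delay * 2 ** attempt)
```

Passing sleep as a parameter is a design choice that lets tests record the delays instead of actually waiting.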

Exception Handling:

  • Empty/silent audio → Error: “No speech detected”
  • Non-Vietnamese speech → Low confidence warning
  • Multiple speakers → Warning + best effort extraction
  • Audio too long (>30s) → Error: “Audio too long”
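
The audio checks and error messages listed above might be wired together as in this sketch. It assumes the duration and a speech-detected flag are supplied by an upstream step, and that a too-short clip is reported with the generic invalid-file message; both are assumptions, as are the names validate_audio and AudioValidationError.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".wav", ".mp3", ".m4a"}
MIN_DURATION_SECONDS = 0.5
MAX_DURATION_SECONDS = 30.0


class AudioValidationError(ValueError):
    """Raised with a user-facing message when the audio cannot be processed."""


def validate_audio(filename, duration_seconds, has_speech):
    """Raise AudioValidationError if the audio fails any check, else return None."""
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        raise AudioValidationError("Invalid audio format or corrupted file")
    if duration_seconds > MAX_DURATION_SECONDS:
        raise AudioValidationError("Audio too long")
    if duration_seconds < MIN_DURATION_SECONDS:
        # Assumption: very short clips are treated like incomplete files.
        raise AudioValidationError("Invalid audio format or corrupted file")
    if not has_speech:
        raise AudioValidationError("No speech detected")
```

Checking the extension is only a first filter; detecting genuinely corrupted files would require decoding the audio, which is outside this sketch.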

Graceful Degradation:

  • Category not detected → Return “Uncategorized”
  • Quantity not extracted → Return null + warning
  • Date not detected → Use current date + warning
  • Always return partial results when possible
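
The fallback rules above can be expressed as a post-processing step that fills defaults and collects warnings. The dictionary keys, the warning wording, and the function name apply_fallbacks are assumptions for illustration.

```python
from datetime import date


def apply_fallbacks(extracted, today=None):
    """Return a copy of the extraction with defaults filled in and warnings listed."""
    result = dict(extracted)
    warnings = []
    if not result.get("category"):
        result["category"] = "Uncategorized"
    if result.get("quantity") is None:
        result["quantity"] = None  # keep the key present so clients see an explicit null
        warnings.append("quantity could not be extracted")
    if not result.get("date"):
        result["date"] = (today or date.today()).isoformat()
        warnings.append("date not detected; using current date")
    result["warnings"] = warnings
    return result
```

Returning the partial result together with its warnings, rather than failing outright, matches the "always return partial results when possible" rule.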

6. Comprehensive Testing

Voice Testing:

  • Test with background noise
  • Test speech speed (fast/slow)
  • Test invalid input (nonsense speech)
  • Test multiple transactions in one recording
  • Test ambiguous input

Bill OCR Testing:

  • Test images rotated in various directions
  • Test diverse image quality
  • Test input not in invoice format
  • Test highly complex invoices
  • Test invoices with many items

7. Backend Integration

  • Integrate transaction functions with Backend
  • Comprehensive error handling
  • Return standardized JSON values

Summary: Week 10 focused on improving the quality and reliability of both the Voice and OCR models. A confidence scoring system and comprehensive error handling were deployed, and multi-dimensional testing covered a range of edge cases. The system is ready for Backend integration and for handling complex real-world scenarios.