RIVF2021 MC-OCR Competition

Mobile-Captured Image Document Recognition for Vietnamese Receipts

Latest news:

- Nov. 26, 2020: website online!

- Dec. 08, 2020: codalab online!

- Oct. 09, 2022: updated legacy version!

This competition tackles a problem in receipt analysis: text recognition of Vietnamese receipts. The task is to extract the required fields from Vietnamese receipts captured by mobile devices.

Text recognition of mobile-captured receipts: Call For Participation

Mobile-captured receipt OCR (MC-OCR) is the process of recognizing text from structured and semi-structured receipts, and invoices in general, captured by mobile devices. This process plays a critical role in streamlining document-intensive processes and office automation in many financial, accounting, and taxation areas. However, MC-OCR faces major challenges due to the complexity of mobile-captured images. First, receipts might be crumpled or the content might be blurred. Second, unlike scanned images, photos taken with mobile devices vary widely in quality because of the lighting conditions and the dynamic environment (e.g., indoor, outdoor, complex background) in which the receipts were captured. These factors lower the quality of the recognized information. To address them, this challenge targets two tasks: (1) evaluating the quality of the captured receipt, and (2) recognizing the required fields of the receipt. The challenge is hence a multi-modal analysis task that can take advantage of two fields, computer vision and natural language processing, which are two of the main interests of the RIVF community.

Task 1: Receipt Image Quality Evaluation (IQA)

Receipt image quality is measured as the ratio of text lines that human annotators labeled “clear”. The score ranges from 0 to 1, where 1 means the highest quality and 0 means the lowest quality.
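As a rough illustration of this definition, the sketch below computes the score from a list of per-line labels. It assumes the ratio is taken over all annotated text lines in the receipt. The function name `receipt_quality` and the "blurry" label are made up for the example; only the “clear” label comes from the task description.

```python
def receipt_quality(line_labels):
    """Return the fraction of text lines whose label is "clear".

    Assumption: the denominator is the total number of annotated
    text lines in the receipt.
    """
    if not line_labels:
        raise ValueError("a receipt needs at least one annotated text line")
    clear = sum(1 for label in line_labels if label == "clear")
    return clear / len(line_labels)


# Hypothetical annotation: three of four lines are readable.
labels = ["clear", "clear", "blurry", "clear"]
print(receipt_quality(labels))  # 0.75
```

A receipt whose lines are all labeled “clear” scores 1, and one with no “clear” lines scores 0, matching the stated range.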

Task 2: Key Information Extraction (KIE)

A receipt image is associated with at most 4 fields annotated by human annotators. Because receipt formats differ, the number of text lines may vary, and some receipts do not contain all fields. For instance, SELLER_ADDRESS might be missing because it does not appear on the receipt or simply because the line is not readable.
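One way to picture such an annotation is a record with up to four optional fields, sketched below. Only SELLER_ADDRESS is named on this page. The other three field names, the class `ReceiptFields`, and the sample values are assumptions made for illustration and do not describe the official data format.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ReceiptFields:
    # SELLER_ADDRESS is named in the task description; the remaining
    # three names are placeholders assumed for this sketch.
    seller: Optional[str] = None
    seller_address: Optional[str] = None
    timestamp: Optional[str] = None
    total_cost: Optional[str] = None

    def present(self):
        """Return only the fields that were annotated, keyed in upper case."""
        return {
            f.name.upper(): getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not None
        }


# Hypothetical receipt where the address line was unreadable.
receipt = ReceiptFields(seller="ABC Mart", total_cost="125,000")
print(receipt.present())  # {'SELLER': 'ABC Mart', 'TOTAL_COST': '125,000'}
```

Leaving a missing field as `None` instead of an empty string keeps "not on the receipt or not readable" distinct from "present but blank".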

To participate, please join the RIVF2021 MC-OCR Challenge via Codalab:

Important dates :

  • 06 Dec 2020: Registration of competition

  • 05 Dec 2020: Warm-up data and evaluation tools will be available

  • 20 Dec 2020: Public phase; training data and testing data will be available (old date 10 Dec 2020)

  • 18 Jan 2021: Team merging and pre-trained models registration deadlines

  • 20 Jan 2021: Registration deadline (old date 03 Jan 2021)

  • 20 Jan 2021: Private test phase (old date 03 Jan 2021)

  • 23 Jan 2021: Final deadline (old date 05 Jan 2021)

  • 25 Jan 2021: Semi-final list notification (old date 08 Jan 2021)

  • 19 Feb 2021 (GMT+7): Technical paper and code submission for semi-finalists (old date 15 Feb 2021)

  • 20 Feb 2021 (GMT+7): Technical paper submission deadline for RIVF2021 (old date 20 Jan 2021)

  • 20 Mar 2021 (GMT+7): Review result

  • 24 Apr 2021 (GMT+7): Camera-ready paper submission and author registration

  • June 3-5, 2021 (GMT+7): Conference days and award ceremonies (RIVF2021)

Contacts :