RIVF2021 MC-OCR Competition
Mobile-Captured Image Document Recognition for Vietnamese Receipts
- Nov. 26, 2020: website online!
- Dec. 08, 2020: codalab online!
- Oct. 09, 2022: updated legacy version!
In this competition, we tackle the problem of receipt recognition: recognizing the text of Vietnamese receipts. The competition task is to extract required fields from Vietnamese receipts captured by mobile devices.
Mobile-captured receipt OCR (MC-OCR) is the process of recognizing text from structured and semi-structured receipts, and invoices in general, captured by mobile devices. This process plays a critical role in streamlining document-intensive processes and office automation in many financial, accounting, and taxation areas. However, MC-OCR faces major challenges due to the complexity of mobile-captured images. First, receipts might be crumpled or their content might be blurred. Second, unlike scanned images, the quality of photos taken with mobile devices varies widely with the lighting conditions and the dynamic environment (e.g., indoor, outdoor, complex background) in which the receipts were captured. These factors lead to poor quality of the recognized information. To address them, this challenge targets two tasks: (1) evaluating the quality of the captured receipt, and (2) recognizing the required fields of the receipt. It is hence a multi-modal analysis task that can take advantage of both computer vision and natural language processing, two of the main interests of the RIVF community.
Task 1: Receipt Image Quality Evaluation (IQA)
Receipt image quality is measured as the ratio of text lines labeled “clear” by human annotators. The score ranges from 0 to 1, where 1 means the highest quality and 0 means the lowest quality.
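In other words, the quality score is q = (number of lines labeled “clear”) / (total number of annotated lines). The sketch below is only an illustration of that ratio, not the official evaluation tool: it assumes each text line carries a single label string, treats any label other than "clear" (e.g. the hypothetical "blurred") as not clear, and uses a hypothetical function name.

```python
from typing import Sequence


def receipt_quality(line_labels: Sequence[str]) -> float:
    """Illustrative IQA score: fraction of text lines labeled "clear".

    Assumption: one label string per annotated line; every label other
    than "clear" counts as not clear. The result lies in [0, 1].
    """
    if not line_labels:
        raise ValueError("receipt has no annotated text lines")
    clear = sum(1 for label in line_labels if label == "clear")
    return clear / len(line_labels)


# 3 of 4 lines are clear, so the score is 0.75
print(receipt_quality(["clear", "clear", "blurred", "clear"]))
```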
Task 2: Key Information Extraction (KIE)
Each receipt image is annotated by human annotators with at most 4 fields. Because receipt formats differ, the number of annotated text lines varies, and some receipts do not contain all fields. For instance, the SELLER_ADDRESS field might be missing because it does not exist on the receipt, or simply because the line is not readable.
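To make the "up to 4 fields, some possibly missing" structure concrete, here is a minimal sketch of grouping per-line annotations by field. Only SELLER_ADDRESS is named above; the other three field names, the (field, text) input format, and the function name are hypothetical placeholders, not the official annotation schema.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical field set: only SELLER_ADDRESS comes from the task description.
FIELDS = ("SELLER", "SELLER_ADDRESS", "TIMESTAMP", "TOTAL_COST")


def group_fields(lines: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (field, text) line annotations by field.

    A field may span several text lines. A field that is absent from the
    receipt (or unreadable, hence not annotated) has no key in the result.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for field, text in lines:
        if field not in FIELDS:
            raise ValueError(f"unknown field: {field}")
        grouped[field].append(text)
    return dict(grouped)


receipt = [("SELLER", "Shop A"), ("TIMESTAMP", "12/01/2021"), ("TOTAL_COST", "120,000")]
print(sorted(group_fields(receipt)))  # SELLER_ADDRESS is simply missing
```

Returning only the fields that were actually annotated, rather than padding with empty strings, keeps "missing" distinguishable from "present but empty".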
To participate, please join the RIVF2021 MC-OCR Challenge via Codalab:
First and foremost, please register to participate at THIS LINK. You will be added to a Google Group for further notifications.
Afterwards, please join the challenge via Codalab: https://competitions.codalab.org/competitions/27798 (no longer accepting new registrations).
New challenge page: https://aihub.vn/competitions/1 (legacy version of the Codalab competition above).
Important dates:
06 Dec 2020: Competition registration
05 Dec 2020: Warm-up data and evaluation tools will be available
20 Dec 2020: Public phase: training data and testing data will be available (old date: 10 Dec 2020)
18 Jan 2021: Team merging and pre-trained model registration deadlines
20 Jan 2021: Registration deadline (old date: 03 Jan 2021)
20 Jan 2021: Private test phase (old date: 03 Jan 2021)
23 Jan 2021: Final deadline (old date: 05 Jan 2021)
25 Jan 2021: Semi-final list notification (old date: 08 Jan 2021)
19 Feb 2021 (GMT+7): Technical paper and code submission for semi-finalists (old date: 15 Feb 2021)
20 Feb 2021 (GMT+7): Technical paper submission deadline for RIVF2021 (old date: 20 Jan 2021)
20 Mar 2021 (GMT+7): Review results
24 Apr 2021 (GMT+7): Camera-ready paper submission and author registration
3-5 Jun 2021 (GMT+7): Conference days and award ceremonies (RIVF2021)
Xuan-Son VU - sonvx at cs.umu.se
Quang-Anh BUI - anh at purchease.fr
Nhu-Van NGUYEN - vincent.nguyen at insa-lyon.fr
Hai Thi Tuyet Nguyen - hai.nguyen at univ-lr.fr
Thanh Vu - thanh.v.vu at oracle.com