The organizers asked volunteer data contributors to photograph their own or their friends' readable receipts to create the dataset. This produced a dataset of more than 2,000 receipt images from more than 50 contributors. For each receipt image, a human annotator is assigned to transcribe every text line. A text line is considered clear if the annotator can extract its text easily, and the number of clear text lines is then used to produce the quality score of the receipt image.
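The exact scoring formula is not given here, so the sketch below assumes the simplest interpretation: the quality score is the fraction of text lines the annotator marked as clear.

```python
# Hypothetical sketch of the quality score described above.
# Assumption (not stated in the source): score = clear lines / total lines.
def quality_score(num_clear_lines: int, num_total_lines: int) -> float:
    """Return the fraction of clear (easily readable) text lines."""
    if num_total_lines == 0:
        return 0.0
    return num_clear_lines / num_total_lines


if __name__ == "__main__":
    print(quality_score(8, 10))  # 0.8
```

A receipt with 8 clear lines out of 10 would thus score 0.8 under this assumption; the organizers may apply additional normalization.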
(The following information is for the warm-up dataset; the other datasets share the same structure.) Dataset structure:
Folder "./warmup_images" contains the raw receipt images.
File "warmup_train.csv" contains the annotations described above.
File "warmup_test.csv" contains the list of test receipts and the predicted information (as described above). Before submission, rename this file to "results.csv" and zip it.
File label_dict.json contains the predefined labels and their IDs.
README.md contains brief information about the data and the data license.
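The file layout above can be loaded with standard-library tools. The sketch below follows the stated paths and submission steps; the CSV column names used in the test are illustrative assumptions, since the source does not list them.

```python
import csv
import json
import zipfile
from pathlib import Path

# Paths follow the dataset layout described above.
IMAGE_DIR = Path("./warmup_images")
TRAIN_CSV = Path("warmup_train.csv")
TEST_CSV = Path("warmup_test.csv")
LABEL_DICT = Path("label_dict.json")


def load_annotations(csv_path: Path) -> list[dict]:
    """Read annotation rows from a CSV file (column names are assumed)."""
    with csv_path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def load_label_dict(json_path: Path) -> dict:
    """Load the predefined label-to-ID mapping from label_dict.json."""
    with json_path.open(encoding="utf-8") as f:
        return json.load(f)


def package_submission(test_csv: Path, out_zip: str = "results.zip") -> None:
    """Copy the predictions file to results.csv and zip it, as required."""
    results = test_csv.with_name("results.csv")
    results.write_bytes(test_csv.read_bytes())
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(results, arcname="results.csv")
```

Calling `package_submission(TEST_CSV)` after filling in the predicted columns produces the "results.zip" archive containing "results.csv" for upload.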