The Food Recall Incidents dataset consists of 7,546 short texts (5 to 360 characters each), which are the titles of food recall announcements (hereafter referred to as title), crawled from 24 public food safety authority websites by Agroknow. The texts are written in 6 languages, with English (6,644) and German (888) being the most common, followed by French (8), Greek (4), Italian (1) and Danish (1). Most of the texts were written after 2010, and they describe recalls of specific food products due to specific hazards. Experts manually classified each text into four groups of classes describing hazards and products at two levels of granularity.
The columns hazard-title and product-title contain character spans derived from the feature importances of a Logistic Regression (LR) classifier. These spans mark the parts of the title that are important for hazard and product classification. Because many classes have very low support, the fine-grained hazard and product classification tasks may require further pre-processing (e.g. label clustering or filtering), depending on the application. The dataset also includes metadata, such as the release date of the text (year, month, day), the language of the text (language), and the country of issue (country).
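The label filtering mentioned above could look roughly like the following minimal sketch. It assumes the records have already been loaded as a list of dictionaries. The column name `hazard`, the helper `filter_rare_labels`, the threshold, and the toy rows are all hypothetical and chosen for illustration only; they are not taken from the dataset itself.

```python
from collections import Counter


def filter_rare_labels(rows, label_key, min_support):
    """Keep only rows whose label occurs at least `min_support` times."""
    # Count how often each label value appears across all rows.
    counts = Counter(row[label_key] for row in rows)
    # Drop rows whose label falls below the support threshold.
    return [row for row in rows if counts[row[label_key]] >= min_support]


# Toy rows standing in for real records; titles and labels are made up.
rows = [
    {"title": "Brand A cheese recalled due to listeria", "hazard": "listeria"},
    {"title": "Brand B salami recalled: listeria detected", "hazard": "listeria"},
    {"title": "Brand C cookies recalled for undeclared milk", "hazard": "milk"},
]

# "hazard" is an assumed column name; min_support=2 is an arbitrary choice.
kept = filter_rare_labels(rows, label_key="hazard", min_support=2)
print(len(kept))
```

Filtering discards the rare classes entirely. Label clustering, the other option named above, would instead merge them into broader groups, and which approach is appropriate depends on the application.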