Building a Dataset for Possessions Identification in Text
Just as industrialization matured from mass production to customization and personalization, so has the Web migrated from generic content to public disclosures of one's most intimately held thoughts, opinions and beliefs. This relatively new type of data can represent finer and more narrowly defined demographic slices. Whereas researchers have so far focused primarily on leveraging personalized content to identify latent information such as the gender, nationality, location, or age of the author, this study seeks to establish a structured way of extracting possessions, or items that people own or are entitled to, as a way to ultimately provide insights into people's behaviors and characteristics. To promote more research in this area, we are releasing a set of 798 possessions extracted from the blog genre, where possessions are marked at different confidence levels, as well as a detailed set of guidelines to help in future annotation studies.
LREC 2016