This paper addresses the problem of extracting acronyms and their definitions from large documents in a setting where high recall is required and user feedback is available. We propose a three-step approach to the problem. First, acronym candidates are extracted using a weak regular expression. This step yields a list of acronyms with high recall but low precision. Second, definitions are constructed for every acronym candidate from its surrounding text. Last, a classifier is used to select genuine acronym-definition pairs. In this last step we use a relevance feedback mechanism to tune the classifier model for each particular document. This achieves reasonable precision without losing recall. As opposed to existing approaches, which are either created to be generic and domain independent or tuned to one particular domain, our method adapts to the input document. We evaluate the proposed approach using three datasets from different domains. The experiments prove the validity of the presented ideas.
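To make the first two steps of the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: the candidate pattern (any 2 to 10 character token with at least two uppercase letters), the rule of taking as many preceding words as the acronym has letters, and the function names `extract_candidates`, `extract_pairs`, and `initials_match` are all assumptions made for illustration. The third step, the classifier tuned by relevance feedback, is not shown; `initials_match` is only one hypothetical feature such a classifier might use.

```python
import re

# Assumed "weak" candidate pattern: any token of 2-10 word characters.
# Tokens are kept only if they contain at least two uppercase letters.
TOKEN = re.compile(r"\b\w{2,10}\b")


def extract_candidates(text):
    """Step 1 (sketch): return (token, start_offset) for each acronym candidate."""
    return [
        (m.group(), m.start())
        for m in TOKEN.finditer(text)
        if sum(ch.isupper() for ch in m.group()) >= 2
    ]


def extract_pairs(text):
    """Step 2 (sketch): build a definition from the words preceding each candidate.

    Hypothetical heuristic: take as many preceding words as the
    candidate has characters. Many of the resulting pairs are wrong,
    which is why a filtering step is needed afterwards.
    """
    pairs = []
    for acronym, start in extract_candidates(text):
        preceding = re.findall(r"\w+", text[:start])
        definition = " ".join(preceding[-len(acronym):])
        pairs.append((acronym, definition))
    return pairs


def initials_match(acronym, definition):
    """Illustrative feature: do the word initials spell the acronym?"""
    initials = "".join(word[0] for word in definition.split())
    return initials.upper() == acronym.upper()


if __name__ == "__main__":
    sample = "We use the Hidden Markov Model (HMM) for tagging. NASA funded it."
    for acronym, definition in extract_pairs(sample):
        print(acronym, "|", definition, "|", initials_match(acronym, definition))
```

On the sample sentence the sketch proposes a plausible pair for HMM and a spurious one for NASA, which is the high-recall, low-precision behaviour the abstract attributes to the early steps.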
Thursday, May 03, 2012
Adding semantic meaning to text can only help our users. High-recall extraction of acronym-definition pairs with relevance feedback by Anna Yarygina and Natalia Vassilieva has been published by HP Laboratories as HPL-2012-46.
at 9:47 AM