Named Entity Recognition on Low Resource Languages for Emergent Incidents

The DARPA Low Resource Languages for Emergent Incidents (LORELEI) program aims to develop language technology that supports rapid and effective response to emerging incidents in languages with very limited resources. To that end, DARPA sponsors an annual evaluation in which teams have less than 24 hours to produce language information (tags, Wikipedia links, etc.) for two surprise low-resource languages.

As part of the Cognitive Computation Group, I helped build and improve a pipeline for processing the surprise languages Sinhalese and Kinyarwanda. Specifically, I focused on developing an algorithm for smarter training-document selection, writing code to match entities to knowledge bases more accurately, and helping guide native informants as they manually tagged training data.

System Description: