Named entity recognition is a subtask of information extraction that seeks to locate atomic elements or facts in text and classify them into predefined categories such as the names of persons, organizations, and locations, expressions of time, quantities, monetary values, and percentages. The User Trainable Named Entity Extractor is a project on named entity recognition, which falls within Natural Language Processing, a branch of Artificial Intelligence. The goal of the project is to develop a user-trainable tool that can extract named entities from unstructured text. Because different domains have different sets of entities, the user specifies rules as regular expressions that describe the set of strings to be matched. Pattern matching then searches each sentence of the given text for the required entity. If a match is found, it is highlighted; otherwise, the document is left unchanged.
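The rule-based matching described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the rule names and patterns in `RULES`, the naive sentence splitter and the `[[TYPE: text]]` highlight markup are all hypothetical. Combining the rules into one alternation of named groups means each span is highlighted at most once, and text without any match is returned exactly as it was.

```python
import re

# Hypothetical user-defined rules: entity type -> regular expression.
# Labels must be valid Python identifiers (they become named groups), and
# patterns should use only non-capturing groups (?:...) so that
# Match.lastgroup reports the entity type.
RULES = {
    "PERCENT": r"\b\d+(?:\.\d+)?%",
    "MONEY": r"\$\d+(?:,\d{3})*(?:\.\d{2})?",
    "DATE": r"\b\d{1,2} (?:January|February|March|April|May|June|July"
            r"|August|September|October|November|December) \d{4}\b",
}

def build_matcher(rules):
    # One alternation of named groups: the first rule that matches at a
    # position wins, so no span is wrapped twice.
    return re.compile("|".join(f"(?P<{label}>{pattern})" for label, pattern in rules.items()))

def highlight(text, rules):
    """Search each sentence for entities and wrap every match in [[TYPE: ...]]."""
    matcher = build_matcher(rules)
    # Naive sentence split that keeps the separating whitespace (capturing
    # group), so joining the pieces reproduces the original text exactly.
    pieces = re.split(r"(?<=[.!?])(\s+)", text)
    for i in range(0, len(pieces), 2):  # even indices are sentences
        pieces[i] = matcher.sub(lambda m: f"[[{m.lastgroup}: {m.group(0)}]]", pieces[i])
    return "".join(pieces)

if __name__ == "__main__":
    print(highlight("Revenue rose 12% on 3 March 2021. Profit was $1,250.00.", RULES))
```

Training the extractor for a new domain then amounts to adding an entry such as `RULES["PRODUCT"] = r"\bWidget \d+\b"` (also a hypothetical rule).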
The two major areas of the User Trainable Fact Extraction (UTFE) system are information retrieval (IR) and information extraction (IE). The objective of the IR system is to retrieve, from a large collection, the documents that match the search query fed into the system. Once IR has fetched the documents, IE performs a detailed analysis of the filtered documents: it locates the exact position of the information in each document and extracts it.
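The division of labour between IR and IE can be illustrated with a small two-step pipeline. This is a hedged sketch, not the UTFE code: the function names `retrieve` and `extract`, the "every query term must occur" retrieval rule and the regex-based extraction are assumptions chosen for brevity. IR narrows the collection to matching documents; IE then reports the exact character offsets of each piece of information in those documents.

```python
import re

def retrieve(collection, query):
    """IR step (sketch): return ids of documents that contain every query term."""
    terms = set(query.lower().split())
    hits = []
    for doc_id, text in collection.items():
        words = set(re.findall(r"\w+", text.lower()))
        if terms <= words:  # all query terms occur in the document
            hits.append(doc_id)
    return hits

def extract(collection, doc_ids, pattern):
    """IE step (sketch): return (doc_id, start, end, text) for every match."""
    regex = re.compile(pattern)
    facts = []
    for doc_id in doc_ids:
        for m in regex.finditer(collection[doc_id]):
            # start/end are character offsets: the exact position in the document.
            facts.append((doc_id, m.start(), m.end(), m.group(0)))
    return facts

if __name__ == "__main__":
    docs = {
        "d1": "Acme Corp was founded in 1999.",
        "d2": "The weather was mild in 2003.",
    }
    print(extract(docs, retrieve(docs, "acme founded"), r"\b\d{4}\b"))
```

Only the documents returned by `retrieve` are analysed by `extract`, which mirrors IE working on the filtered documents rather than on the whole collection.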
Users interact with Wouter’s Annotating Tool (WAT) to create annotations and to read the annotations found by the UTFE system. WAT also directs the IE process and handles communication with the database, because it is the only component in the UTFE system that interfaces with the other components. WAT is therefore the main component, controlling the entire UTFE system. Its only drawback is that multiple users cannot access the database at the same time.
The database stores annotation, document, fact, user, and ontology data, which makes UTFE capable of dealing with multiple users. Annotations and the ontology are stored in the same way as WAT annotations and ontologies.
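A relational layout for these five kinds of data might look like the following SQLite sketch. Every table and column name here is a hypothetical assumption, not the actual UTFE schema. The point it illustrates is that once each annotation row records the user who created it, several users can keep separate annotation sets over the same documents and ontology.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not the real UTFE layout.
SCHEMA = """
CREATE TABLE users     (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE documents (id INTEGER PRIMARY KEY, text TEXT NOT NULL);
CREATE TABLE concepts  (id INTEGER PRIMARY KEY, concept TEXT UNIQUE NOT NULL);
CREATE TABLE annotations (
    id           INTEGER PRIMARY KEY,
    user_id      INTEGER NOT NULL REFERENCES users(id),
    document_id  INTEGER NOT NULL REFERENCES documents(id),
    concept_id   INTEGER NOT NULL REFERENCES concepts(id),
    start_offset INTEGER NOT NULL,
    end_offset   INTEGER NOT NULL
);
CREATE TABLE facts (
    id          INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES documents(id),
    concept_id  INTEGER NOT NULL REFERENCES concepts(id),
    value       TEXT NOT NULL
);
"""

def connect():
    """Open an in-memory database and create the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def annotations_for(conn, user_name):
    """Return (document_id, concept, start, end) for one user's annotations."""
    rows = conn.execute(
        """SELECT d.id, c.concept, a.start_offset, a.end_offset
           FROM annotations a
           JOIN users u ON u.id = a.user_id
           JOIN documents d ON d.id = a.document_id
           JOIN concepts c ON c.id = a.concept_id
           WHERE u.name = ?
           ORDER BY a.id""",
        (user_name,),
    )
    return rows.fetchall()
```

Filtering by user in the query, rather than keeping one file per user, is what lets the same store serve many users.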
Xenon performs the IR task in the UTFE system: it finds documents similar to those annotated by the user. The UTFE system guides Xenon and feeds it with enough information. In the current setup, however, it is difficult to check whether Xenon returns the right documents. Because the system offers no way to give feedback to Xenon, the user cannot tell the system when it returns incorrect documents, and without such feedback it is impossible to evaluate the performance of Xenon.
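To make the missing feedback loop concrete, the sketch below ranks candidate documents by bag-of-words cosine similarity to the annotated documents and then scores the result with the user's relevance judgments. This is a swapped-in, generic technique for illustration only; it is not how Xenon works, and the function names and the choice of precision as the measure are assumptions. It shows why feedback matters: `precision` cannot be computed without the set of documents the user judged relevant.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts over lower-cased word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def similar_documents(annotated, candidates, k):
    """Return the ids of the k candidates most similar to any annotated document."""
    seeds = [vectorize(t) for t in annotated]
    if not seeds:
        return []
    scored = []
    for doc_id, text in candidates.items():
        v = vectorize(text)
        scored.append((max(cosine(v, s) for s in seeds), doc_id))
    # Highest similarity first; ties broken by id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]

def precision(returned, judged_relevant):
    """Fraction of returned documents that the user marked as relevant."""
    if not returned:
        return 0.0
    return sum(1 for d in returned if d in judged_relevant) / len(returned)
```

For example, if the system returns two documents and the user's feedback marks only one of them as relevant, `precision` evaluates to 0.5; without that feedback there is nothing to compare the returned list against.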