User Trainable Fact Extractor For Unstructured Text


Named entity recognition is a subtask of information extraction that seeks to locate and classify atomic elements, or facts, in text into predefined categories such as the names of persons, organizations and locations, expressions of time, quantities, monetary values and percentages. The User Trainable Named Entity Extractor is a project on named entity recognition, which belongs to Natural Language Processing, a branch of Artificial Intelligence. The goal of the project is to develop a user-trainable tool that can extract named entities from unstructured text. Because different domains have different sets of entities, the user specifies rules as regular expressions that describe the set of strings to be matched. Pattern matching then searches the sentences of the given text for the required entities. If a match is found, it is highlighted; otherwise the document is left unchanged.
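The rule-and-highlight idea described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual implementation: the `RULES` table, the category names, the patterns and the `[[CATEGORY: text]]` highlight format are all assumptions made for the example.

```python
import re

# Hypothetical user-defined rules: entity category -> regular expression.
# In a user-trainable tool these would be supplied per domain by the user.
RULES = {
    "MONEY": r"\$\d+(?:,\d{3})*(?:\.\d+)?",
    "PERCENT": r"\d+(?:\.\d+)?%",
    "DATE": r"\b\d{1,2}/\d{1,2}/\d{4}\b",
}

def find_entities(text, rules):
    """Return (start, end, category, matched_text) tuples sorted by position."""
    matches = []
    for category, pattern in rules.items():
        for m in re.finditer(pattern, text):
            matches.append((m.start(), m.end(), category, m.group()))
    return sorted(matches)

def highlight(text, rules):
    """Wrap each match in [[CATEGORY: ...]]; text without matches is returned unchanged."""
    out = []
    cursor = 0
    for start, end, category, value in find_entities(text, rules):
        if start < cursor:  # skip a match that overlaps an earlier one
            continue
        out.append(text[cursor:start])
        out.append(f"[[{category}: {value}]]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)

sample = "Sales rose 12.5% to $1,200 on 3/4/2011."
print(highlight(sample, RULES))
# prints: Sales rose [[PERCENT: 12.5%]] to [[MONEY: $1,200]] on [[DATE: 3/4/2011]].
```

Keeping the rules in a plain dictionary means a user could add or replace patterns for a new domain without touching the matching code.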

The two major areas of a User Trainable Fact Extraction system are information retrieval (IR) and information extraction (IE). The objective of the IR system is to collect, from a large collection, the documents that match the search query fed into the system. Once IR has fetched the documents, IE performs a detailed analysis of the filtered documents: it locates the exact position of the information in each document and extracts it.
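The division of labour between IR and IE can be illustrated with a small two-stage sketch. It assumes a very simple retrieval criterion (a document must contain every query term) and a regular expression as the extraction pattern. The function names `retrieve` and `extract` and the sample documents are hypothetical and are not taken from the system described here.

```python
import re

def retrieve(documents, query):
    """IR step: keep the ids of documents that contain every query term (case-insensitive)."""
    terms = query.lower().split()
    hits = []
    for doc_id, text in documents.items():
        words = set(re.findall(r"\w+", text.lower()))
        if all(term in words for term in terms):
            hits.append(doc_id)
    return hits

def extract(documents, doc_ids, pattern):
    """IE step: return (doc_id, start, end, text) for each pattern match in the retrieved documents."""
    facts = []
    for doc_id in doc_ids:
        for m in re.finditer(pattern, documents[doc_id]):
            facts.append((doc_id, m.start(), m.end(), m.group()))
    return facts

# Made-up sample collection, for illustration only.
docs = {
    "d1": "The company reported revenue of $500 in 2010.",
    "d2": "The weather was mild this spring.",
    "d3": "Revenue fell to $320 after the company restructured.",
}

selected = retrieve(docs, "company revenue")   # IR narrows the collection
facts = extract(docs, selected, r"\$\d+")      # IE locates the exact positions
for doc_id, start, end, value in facts:
    print(doc_id, start, end, value)
```

Only the documents that pass the retrieval step are scanned in detail, which mirrors the idea that IE works on the filtered set rather than on the whole collection.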

System Design

User Trainable Fact Extraction

WAT

The user interacts with Wouter’s Annotating Tool (WAT) to create annotations and to read the annotations found by the User Trainable Fact Extraction (UTFE) system. WAT also directs the IE process and the communication with the database, because it is the only component in the UTFE system that has an interface with the other components. WAT is therefore the main component and controls the entire UTFE system. Its only drawback is that multiple users cannot access the database at the same time.

Database

The database stores the annotation, document, fact, user and ontology data. This makes UTFE capable of dealing with multiple users. Annotations and the ontology are stored in the same manner as WAT annotations and ontologies.

Xenon

Xenon performs the IR task in the UTFE system: it finds documents similar to those annotated by the user. The UTFE system guides Xenon and feeds it with enough information to do so. In the current setup it is difficult to check whether Xenon returns the right documents. Because the system offers no way to give feedback to Xenon, the user cannot tell the system that it is returning incorrect documents, and without such feedback it is impossible to evaluate Xenon's performance.

References

http://hmi.ewi.utwente.nl/verslagen/afstudeer/JoosseWouterFinalThesis.pdf
