The aim of the project is to develop software that converts a text story into a corresponding image story. The software can represent a short or even a long story with images, one for each sentence in the story. Each sentence of the given story is analyzed for its meaning, and matching images are retrieved from the web using the Google search engine. The sentences are replaced by images that convey the meaning of the text, so that the user obtains an image storyline as output for the text storyline given as input. To find which words in each sentence carry its meaning, we use the NLTK module available in Python. First, the given story is split into word tokens using the tokenizer available in Python.
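The flow described above (split the story into sentences, tokenize each sentence, keep the words that carry meaning, and search for an image) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's implementation: a plain regular expression stands in for the NLTK tokenizer so the sketch runs without NLTK or its data, the tiny `STOPWORDS` set and all function names are hypothetical, and the `tbm=isch` query parameter is an assumed way of requesting Google image results.

```python
import re
from urllib.parse import urlencode

# Hypothetical, deliberately small stopword list used only for this sketch.
STOPWORDS = {
    "a", "an", "the", "is", "are", "was", "were", "in", "on", "at",
    "to", "of", "and", "it", "its", "by", "with", "for", "his", "her",
}


def split_sentences(story):
    """Split a story wherever '.', '!' or '?' is followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", story.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase the sentence and return its word tokens (regex stand-in for NLTK)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower())


def keywords(sentence):
    """Keep only the tokens that are not in the stopword list."""
    return [tok for tok in tokenize(sentence) if tok not in STOPWORDS]


def image_search_url(sentence):
    """Build an image search URL from the sentence keywords (assumed URL format)."""
    query = " ".join(keywords(sentence))
    return "https://www.google.com/search?" + urlencode({"tbm": "isch", "q": query})


if __name__ == "__main__":
    story = "The fox ran to the river. It was thirsty!"
    for sentence in split_sentences(story):
        print(sentence, keywords(sentence), image_search_url(sentence))
```

In the project itself, the stopword filter would presumably be replaced by NLTK-based analysis, for example keeping only the words that a part-of-speech tagger marks as nouns and verbs.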
NLTK is an open source project. Its main modules are:
- Token - Contains classes used to process individual elements of text, such as words and sentences.
- Probability - Contains classes to process probabilistic information.
- Tree - Classes for representing and processing hierarchical information over text.
- CFG - Classes for representing and processing context free grammars.
- FSA - Classes for finite state automata.
- Tagger - Used for tagging each word with a part of speech, sense, etc.
- Parser - Building trees over text (includes chart, chunk and probabilistic parsers).
- Classifier - Classify text into categories (includes feature, feature selection, maxent, naivebayes).
- Draw - Visualize NLP structures and processes.
- Corpus - Access (tagged) corpus data