Some of my works and projects in data science and signal processing
This is an implementation of some Natural Language Processing algorithms whose goal is to answer questions about information read from a corpus of documents. It was implemented in Python using nltk.
It was made as an activity for Harvard’s course CS50’s Introduction to Artificial Intelligence with Python.
The bot performs two tasks: document retrieval and passage retrieval. This version, 1.0, uses simple strategies and metrics for both. I will be working to add some features soon, such as looking for synonyms of query words or lemmatizing to handle different forms of the same word.
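To make the document retrieval step concrete, here is a minimal sketch of one simple metric that fits the description above: ranking files by the summed TF-IDF of the query words. This is an assumption for illustration, not necessarily the exact metric used in the repo. The regex tokenizer is a stand-in for nltk tokenization, and the function names and the toy corpus are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens (simple stand-in for nltk tokenization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def compute_idfs(documents):
    """Map each word to ln(number of documents / documents containing the word)."""
    total = len(documents)
    doc_freq = Counter()
    for words in documents.values():
        doc_freq.update(set(words))  # count each word once per document
    return {word: math.log(total / count) for word, count in doc_freq.items()}


def top_files(query, documents, idfs, n=1):
    """Return the n document names with the highest summed TF-IDF over the query words."""
    scores = {}
    for name, words in documents.items():
        counts = Counter(words)
        scores[name] = sum(counts[w] * idfs.get(w, 0.0) for w in query)
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical toy corpus, only for illustration
corpus = {
    "beatles.txt": tokenize("Let It Be was released in 1970 by the Beatles."),
    "stones.txt": tokenize("The Rolling Stones released Sticky Fingers in 1971."),
    "queen.txt": tokenize("Queen released A Night at the Opera in 1975."),
}
idfs = compute_idfs(corpus)
query = set(tokenize("when was let it be released?"))
best = top_files(query, corpus, idfs, n=1)
print(best)
```

Words that appear in every document (here "released") get an IDF of zero, so they do not influence the ranking, while rarer query words carry most of the weight.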
Using The Beatles’ article on Wikipedia as the corpus, let’s try some queries:
$ python questions.py beatles/
Query: when was let it be released?
On 8 May 1970, Let It Be was released.
$ python questions.py beatles/
Query: when does ringo starr join the band?
Already contemplating Best's dismissal,[41] the Beatles replaced him in mid-August with Ringo Starr, who left Rory Storm and the Hurricanes to join them.
$ python questions.py beatles/
Query: when was the first live performance in US?
They gave their first live US television performance two days later on The Ed Sullivan Show, watched by approximately 73 million viewers in over 23 million households,[92] or 34 percent of the American population.
This version needs fairly direct questions, because it answers by choosing the most related sentence in the text, and the most related sentence is not always the best answer. Even so, this approach could be useful for customer service if the corpus is suitably prepared!
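The "most related sentence" idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the repo's actual code: sentences are scored by the summed IDF of the query words they contain, with ties broken by query term density (the share of the sentence's words that appear in the query). The tokenizer, the function names and the three shortened sentences are hypothetical stand-ins.

```python
import math
import re


def tokenize(text):
    """Lowercase and split into word tokens (simple stand-in for nltk tokenization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sentence_idfs(sentences):
    """Map each word to ln(number of sentences / sentences containing the word)."""
    total = len(sentences)
    vocab = {w for words in sentences.values() for w in words}
    return {
        w: math.log(total / sum(w in words for words in sentences.values()))
        for w in vocab
    }


def top_sentences(query, sentences, idfs, n=1):
    """Rank sentences by summed IDF of matched query words, then by query term density."""
    def score(item):
        _, words = item
        matched = query & set(words)
        idf_sum = sum(idfs[w] for w in matched)
        density = sum(w in query for w in words) / len(words) if words else 0.0
        return (idf_sum, density)

    ranked = sorted(sentences.items(), key=score, reverse=True)
    return [sentence for sentence, _ in ranked[:n]]


# Shortened example sentences, only for illustration
raw = [
    "On 8 May 1970, Let It Be was released.",
    "The Beatles replaced him in mid-August with Ringo Starr.",
    "They gave their first live US television performance on The Ed Sullivan Show.",
]
sentences = {s: tokenize(s) for s in raw}
idfs = sentence_idfs(sentences)

answer_1 = top_sentences(set(tokenize("when was let it be released?")), sentences, idfs)
answer_2 = top_sentences(set(tokenize("when does ringo starr join the band?")), sentences, idfs)
print(answer_1)
print(answer_2)
```

Because the score only measures word overlap, a sentence that shares many rare words with the question wins even when it does not actually answer it, which is why direct questions work best.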
Look in my repo to download the files and run the bot.
You need to have the following in the same folder:
Then, run:
$ python questions.py [foldername/]
It works!