Project 4: Next Word Predictor
In this project, my goal was to familiarize myself with natural language processing and use it to build a “next word” predictive model like the ones on our smartphone keyboards. For that, I used a text dataset provided by SwiftKey.
I divided this project into three parts: the App (hosted on ShinyApps), the project pitch (hosted on RPubs), and the GitHub repository that contains the App’s code.
The project pitch contains a step-by-step description of how I built the predictor. Rather than machine learning algorithms, I chose to use document frequency matrices and classification to implement Katz’s back-off model with Good-Turing smoothing.
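To give an idea of how back-off and Good-Turing smoothing fit together, here is a minimal sketch in Python. It is not the App's code: it is limited to bigrams, it uses a simplified Good-Turing discount (without the renormalization term of Katz's original formula), and the function names and the toy corpus are made up for illustration.

```python
from collections import Counter

def good_turing_discounts(ngram_counts, k=5):
    """Discount ratio d_r = r*/r for r = 1..k, with r* = (r+1) * N_{r+1} / N_r.

    N_r is the number of distinct n-grams seen exactly r times.
    Simplification (assumption): when the ratio is undefined or not
    strictly between 0 and 1, the count is left undiscounted.
    """
    n = Counter(ngram_counts.values())  # count-of-counts N_r
    discounts = {}
    for r in range(1, k + 1):
        d = 1.0
        if n[r] > 0 and n[r + 1] > 0:
            ratio = (r + 1) * n[r + 1] / (r * n[r])
            if 0 < ratio < 1:
                d = ratio
        discounts[r] = d
    return discounts

def katz_bigram_prob(word, prev, unigrams, bigrams, discounts):
    """P(word | prev): discounted bigram estimate, else back off to unigrams."""
    total = sum(unigrams.values())
    # Words observed right after `prev`, with their bigram counts.
    seen = {w2: c for (w1, w2), c in bigrams.items() if w1 == prev}
    if not seen:
        # Unknown context: fall back to the plain unigram estimate.
        return unigrams[word] / total
    context_total = sum(seen.values())
    if word in seen:
        c = seen[word]
        return discounts.get(c, 1.0) * c / context_total
    # Probability mass freed by discounting the seen bigrams.
    left_over = max(0.0, 1.0 - sum(
        discounts.get(c, 1.0) * c / context_total for c in seen.values()))
    # Unigram mass of the words never seen after `prev`.
    unseen_mass = sum(c for w, c in unigrams.items() if w not in seen) / total
    if unseen_mass == 0:
        return 0.0
    alpha = left_over / unseen_mass  # back-off weight
    return alpha * unigrams[word] / total

def predict_next(prev, unigrams, bigrams, discounts, top=3):
    """Rank every vocabulary word by its back-off probability after `prev`."""
    ranked = sorted(
        unigrams,
        key=lambda w: -katz_bigram_prob(w, prev, unigrams, bigrams, discounts))
    return ranked[:top]

# Toy corpus, for illustration only.
tokens = "the cat sat on the mat the cat ate the fish".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
discounts = good_turing_discounts(bigrams)
print(predict_next("the", unigrams, bigrams, discounts))
```

On this toy corpus, “cat” ranks first after “the”. The probability mass taken away from the bigrams seen only once is what gets redistributed to the words never seen after “the”.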
As you can probably tell, this project is half mathematical theory and half programming.
In the end, I built an App that responds well to long inputs but shows its weakness on complex sentences and short inputs. This is because I chose to remove all stopwords (such as “I”, “my”, “yours”, “and”) and focus only on the words that carry most of the message. I made that choice for performance reasons: this project pushed my computer to the absolute limit of its RAM.
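The effect of stopword removal on short inputs can be shown with a small Python sketch. It is only an illustration, not the App's code: the stopword list here contains just the four examples mentioned above (a real list is much longer), and the tokenizer is a simple regular expression chosen for the example.

```python
import re

# Hypothetical, deliberately tiny stopword list: only the examples
# quoted in the text. A real list would be much longer.
STOPWORDS = {"i", "my", "yours", "and"}

def content_words(text):
    """Lowercase the text, split it into words, and drop the stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

print(content_words("My dog and I went hiking"))  # ['dog', 'went', 'hiking']
print(content_words("and I"))                     # []
```

A long input still leaves several content words to predict from, whereas a short input made only of stopwords leaves nothing at all, so the predictor has no context to work with.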