The scikit-learn Python library provides built-in classes for one-hot encoding your categorical features so that you can use them in machine learning models.
Are you trying to implement a machine learning algorithm to classify documents? Do you need to determine the intent of a sentence for a chatbot? Either way, you are probably asking yourself the same question: how do I convert text into a form that my machine learning algorithm can use? In this post we will go over a simple model for converting sentences into vectors, called the Bag of Words model. We will implement the algorithm in Python from scratch, and then use scikit-learn's built-in functions to vectorize sentences.
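To give a feel for the idea, here is a minimal from-scratch sketch of Bag of Words. It is not the post's own implementation: the function names `build_vocabulary` and `vectorize` are made up for illustration, and it assumes the simplest possible tokenization (lowercase, split on whitespace, no punctuation handling).

```python
from collections import Counter


def build_vocabulary(sentences):
    """Map every distinct lowercase token to a column index (alphabetical order)."""
    tokens = {word for sentence in sentences for word in sentence.lower().split()}
    return {word: index for index, word in enumerate(sorted(tokens))}


def vectorize(sentence, vocabulary):
    """Return one count per vocabulary word; words outside the vocabulary are ignored."""
    counts = Counter(sentence.lower().split())
    # Dicts keep insertion order, so iterating follows the column indices.
    return [counts.get(word, 0) for word in vocabulary]


sentences = ["the cat sat", "the dog sat on the mat"]
vocabulary = build_vocabulary(sentences)
vectors = [vectorize(sentence, vocabulary) for sentence in sentences]

print(list(vocabulary))  # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(vectors)           # [[1, 0, 0, 0, 1, 1], [0, 1, 1, 1, 1, 2]]
```

Each sentence becomes a fixed-length vector of word counts, so word order is thrown away; that is where the "bag" in the name comes from.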
If you have been using machine learning, you will realize sooner rather than later that machine learning algorithms require numerical inputs. Unluckily for us, our features come in various forms: some are continuous, while others are categorical, in either numeric or text format. Machine learning algorithms cannot work with variables in text form, so we must perform certain preprocessing steps to get our data into the right format. How do we deal with these categorical variables? Worry no more! In this blog post I will explain how to handle them using a technique known as one-hot encoding.
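As a quick illustration of what one-hot encoding does, here is a minimal plain-Python sketch. The helper names `fit_categories` and `one_hot` and the `colors` data are hypothetical, chosen only for this example; in practice you would likely reach for a library class rather than hand-rolling it.

```python
def fit_categories(values):
    """Collect the distinct categories in a stable (sorted) order."""
    return sorted(set(values))


def one_hot(value, categories):
    """Return a vector with a 1 in the column of the matching category, 0 elsewhere."""
    return [1 if value == category else 0 for category in categories]


colors = ["red", "green", "blue", "green"]
categories = fit_categories(colors)
encoded = [one_hot(color, categories) for color in colors]

print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Each category gets its own column, so no artificial ordering is implied between "red", "green", and "blue", which is what would happen if we simply replaced them with 0, 1, and 2.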