Bag of Words Algorithm in Python Introduction

Are you trying to implement a machine learning algorithm to classify documents? Need to determine the intent of a sentence to use in a chatbot? Either way, you are probably asking yourself the same question: how do I convert text into a form that my machine learning algorithm can use? In this post we will go over the Bag of Words model, a simple-to-use model for converting sentences into vectors. We will implement the algorithm in Python from scratch, and then we will use scikit-learn's built-in functions to vectorize sentences.
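
As a rough idea of what the from-scratch version might look like, here is a minimal sketch. The helper names (`tokenize`, `build_vocabulary`, `vectorize`), the regex-based tokenizer, and the two sample sentences are assumptions made for illustration; they are not taken from the full post.

```python
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def build_vocabulary(sentences):
    """Return a sorted list of the unique tokens across all sentences."""
    return sorted({token for s in sentences for token in tokenize(s)})


def vectorize(sentence, vocabulary):
    """Count how often each vocabulary word occurs in the sentence."""
    counts = Counter(tokenize(sentence))
    return [counts[word] for word in vocabulary]


# Hypothetical sample data, used only to show the shape of the output.
sentences = ["The cat sat on the mat", "The dog sat"]
vocabulary = build_vocabulary(sentences)
vectors = [vectorize(s, vocabulary) for s in sentences]

print(vocabulary)  # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(vectors)     # [[1, 0, 1, 1, 1, 2], [0, 1, 0, 0, 1, 1]]
```

Each sentence becomes a vector with one slot per vocabulary word, so word order is discarded and only the counts survive, which is where the "bag" in the name comes from.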

Chi Square Feature Selection in Python

Feature selection, also known as variable selection or attribute selection, is an important part of building machine learning models. As the saying goes: garbage in, garbage out. Training your algorithms with irrelevant features will hurt the performance of your model. Choosing the right features, or engineering new ones, is often what separates the best-performing models from the rest.

Keras Multi-Class Classification Introduction

Building neural networks is a complex endeavor, with many parameters to tweak before arriving at the final version of a model. On top of this, the two most widely used numerical platforms for deep learning and neural network models, TensorFlow and Theano, are too complex to allow for rapid prototyping. The Keras deep learning library for Python helps bridge the gap between prototyping speed and the power of these advanced numerical platforms. Keras is a high-level API for building neural networks that runs on top of TensorFlow, Theano or CNTK. It allows for rapid prototyping, supports both recurrent and convolutional neural networks, and runs on either your CPU or GPU.

Python One Hot Encoding with Pandas Made Simple

If you have been using machine learning, you will realize sooner rather than later that machine learning algorithms require numerical inputs. Unfortunately, our features come in various forms: some are continuous, others are categorical in numeric or text format. Because machine learning algorithms cannot work with variables in text form, we must perform certain preprocessing steps to get our data into the right format. How do we deal with these categorical variables? Worry no more! In this blog post I will explain how to handle them using a technique known as one hot encoding.
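
To give a flavor of the technique, here is a small sketch using pandas' `get_dummies`. The DataFrame, its column names, and the choice of `dtype=int` are made up for illustration and may differ from what the full post uses.

```python
import pandas as pd

# Hypothetical data: one categorical text column and one numeric column.
df = pd.DataFrame(
    {
        "color": ["red", "green", "blue", "green"],
        "size": [1, 2, 3, 2],
    }
)

# Replace the "color" column with one 0/1 indicator column per category.
encoded = pd.get_dummies(df, columns=["color"], dtype=int)

print(encoded)
```

Each category in `color` gets its own indicator column (`color_blue`, `color_green`, `color_red`), and exactly one of them is set to 1 in every row.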

Fitting Probability Distributions with Python Part 1

Probability distributions are a powerful tool for modeling random processes. They are widely used in statistics, simulations, engineering and various other settings, and I have had to use them in various projects to model randomness correctly. There are many probability distributions to choose from, ranging from the well-known normal distribution to others such as the logistic and Weibull distributions. The problem I have repeatedly faced is finding an easy-to-use tool that quickly fits the best distribution to my data and then uses that best-fit distribution to generate random numbers. Once again Python shows its flexibility for data science with SciPy, one of the main Python packages for mathematics, science and engineering. We will be using the SciPy package to tackle this task.
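
The workflow described above could be sketched roughly as follows. The synthetic data, the three candidate distributions, and the use of the Kolmogorov-Smirnov statistic to pick the "best" fit are all assumptions made for this example; the full post may use a different selection criterion.

```python
import numpy as np
from scipy import stats

# Synthetic sample standing in for real data (assumption for illustration).
rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=2000)

# Candidate distributions to compare (an arbitrary short list).
candidates = {
    "norm": stats.norm,
    "logistic": stats.logistic,
    "weibull_min": stats.weibull_min,
}

results = {}
for name, dist in candidates.items():
    params = dist.fit(data)  # maximum likelihood estimate of the parameters
    ks_stat, _ = stats.kstest(data, name, args=params)  # smaller means closer
    results[name] = (ks_stat, params)

# Pick the candidate with the smallest KS statistic (assumed criterion).
best_name = min(results, key=lambda n: results[n][0])
best_params = results[best_name][1]

# Draw new random numbers from the selected, fitted distribution.
samples = candidates[best_name].rvs(*best_params, size=5, random_state=rng)

print(best_name, best_params)
print(samples)
```

Ranking by the KS statistic is only one option. Comparing log-likelihoods or an information criterion would be an equally reasonable way to choose among the fitted candidates.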