<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Aniruddh Chandratre]]></title><description><![CDATA[Rethinking Technology]]></description><link>https://achandratre.azurewebsites.net/</link><generator>Ghost 0.9</generator><lastBuildDate>Fri, 08 May 2026 11:12:51 GMT</lastBuildDate><atom:link href="https://achandratre.azurewebsites.net/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[The touch of Machine Learning]]></title><description><![CDATA[A simple post to get you started with Machine Learning! Know more about what it is and how you can implement it!]]></description><link>https://achandratre.azurewebsites.net/the-touch-of-machine-learning/</link><guid isPermaLink="false">a23a28df-9f44-4713-a3aa-3c8822ca5413</guid><category><![CDATA[machine learning]]></category><dc:creator><![CDATA[Aniruddh Chandratre]]></dc:creator><pubDate>Sun, 02 Jul 2017 14:39:51 GMT</pubDate><media:content url="http://achandratre.azurewebsites.net/content/images/2017/07/volkan-olmez-73767.jpg" medium="image"/><content:encoded><![CDATA[<img src="http://achandratre.azurewebsites.net/content/images/2017/07/volkan-olmez-73767.jpg" alt="The touch of Machine Learning"><p>Machine Learning has been a topic of interest for a long time. But even today, many of us consider Artificial Intelligence to be an idea too good to be true. In reality, we have reached a time where Artificial Intelligence is real, and is capable of doing tasks in ways humans can't possibly imagine. Computers have come a long way since they were first created. </p>

<p>Most of the tools we have created to date are <strong>Passive</strong>, that is, they can only do what we tell them to do, and nothing more. They perform limited tasks in limited ways and have no capability of discovering other ways. There are three stages for a computing tool. </p>

<p><img src="http://i.imgur.com/0OmSqOq.png" alt="The touch of Machine Learning"></p>

<ul>
<li><p><strong>Passive</strong> : The tools which cannot think for themselves. </p></li>
<li><p><strong>Generative</strong> : Generative tools are quite intelligent. Most of the tools which use Machine Learning are generative in nature, that is, they can come up with their own ways to solve a certain problem. Consider an example of a generative tool: Airbus recently used Machine Learning to design a new type of partition wall for their planes. The new structure was much stronger than what humans had designed, and was 50% lighter. Currently, there's a completely autonomous bridge-construction project underway in Amsterdam. <a href="http://www.machinedesign.com/3d-printing/constructing-3d-printed-bridge-amsterdam">Read more about it here</a>. Basically, they are using generative tools so that the computer can design a bridge by itself and 3D print it in stainless steel. </p></li>
<li><p><strong>Intuitive</strong> : This is the ultimate goal of Artificial Intelligence. This is where the computer uses its intuition to decide the further steps. Recently, Google DeepMind created a Neural Network which defeated the world's best player at Go, which is considered the most strategically deep game to date. During the match, at some points, even the engineers who designed AlphaGo couldn't understand why AlphaGo made a certain move!</p></li>
</ul>

<h2 id="whatismachinelearning">What is Machine Learning?</h2>

<p>In simple words, Machine learning is the idea that there are generic algorithms that can tell you something interesting about a set of data without you having to write any custom code specific to the problem. Instead of writing code, you feed data to the generic algorithm and it builds its own logic based on the data.</p>

<p>For example, one kind of algorithm is a classification algorithm. It can put data into different groups. The same classification algorithm used to recognize handwritten numbers could also be used to classify emails into spam and not-spam without changing a line of code. It’s the same algorithm but it’s fed different training data so it comes up with different classification logic.</p>

<p><img src="http://i.imgur.com/SD5acH8.png" alt="The touch of Machine Learning"></p>
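To make the "same algorithm, different data" idea concrete, here is a minimal sketch using scikit-learn (the same library used later in this post). All the datasets and feature names below are made up for illustration:

```python
from sklearn.tree import DecisionTreeClassifier

# Toy "spam" data: [links_in_email, capitalized_words] -> 1 = spam, 0 = not spam
emails = [[10, 8], [12, 11], [0, 1], [1, 0]]
email_labels = [1, 1, 0, 0]

spam_clf = DecisionTreeClassifier(random_state=0)
spam_clf.fit(emails, email_labels)
print(spam_clf.predict([[9, 10]]))  # -> [1] (classified as spam)

# The very same algorithm, fed different training data,
# builds completely different classification logic:
digits = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]  # made-up "pixel" features
digit_labels = [7, 4, 1]
digit_clf = DecisionTreeClassifier(random_state=0)
digit_clf.fit(digits, digit_labels)
print(digit_clf.predict([[0, 1, 0]]))  # -> [4]
```

Not a line of the algorithm changed between the two problems; only the training data did.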

<h2 id="stylesofmachinelearning">Styles of Machine Learning</h2>

<p>There are three styles of Machine Learning. But, supervised and unsupervised learning styles are the most common and popular styles. </p>

<p><img src="http://i.imgur.com/02OHvWI.png" alt="The touch of Machine Learning"></p>

<ul>
<li><p><strong>Supervised</strong> : We have a properly labelled dataset and receive feedback on every cycle, so even when a prediction is wrong, we get feedback. This style is the easiest to implement because we have a properly labelled dataset.  </p></li>
<li><p><strong>Unsupervised</strong> : Here, we have an unordered dataset, which may not be labelled. In this style of learning, there is no feedback at all, that is, we get no signal on any prediction, correct or wrong. This is a bit more difficult to implement than supervised learning. </p></li>
<li><p><strong>Reinforcement</strong> : Here, the dataset can be of any type. The main difference in this style of learning is the feedback: it arrives as a reward signal, so behaviour is only reinforced when it leads to success. For example, for a bot which plays chess, we would prefer reinforcement learning, so that the bot reinforces the moves that lead it to win the game!</p></li>
</ul>
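To see what reward-only feedback looks like in practice, here is a tiny made-up sketch in plain Python (the moves, probabilities, and learning rate are all invented for illustration): an agent tries two moves, receives only a win/lose reward, and gradually comes to prefer the move that wins more often.

```python
import random

random.seed(0)

# Two possible "moves"; move 1 secretly wins 80% of the time, move 0 only 20%.
win_probability = [0.2, 0.8]
value = [0.0, 0.0]  # the agent's running estimate of each move's worth

for _ in range(500):
    move = random.randrange(2)                   # explore at random
    reward = 1.0 if random.random() < win_probability[move] else 0.0
    value[move] += 0.1 * (reward - value[move])  # learn from the reward alone

best_move = max(range(2), key=lambda m: value[m])
print(best_move, [round(v, 2) for v in value])
```

No labels are ever shown to the agent; the occasional reward is enough for its value estimates to separate, which is exactly the distinction drawn in the list above.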

<p>To sum it up, </p>

<p><img src="http://i.imgur.com/MqY0JoZ.png" alt="The touch of Machine Learning"></p>

<h2 id="letscodeitout">Let's code it out!</h2>

<p>So now we know the styles of machine learning, but we don't know how to implement one. Let's take a prediction problem: how would you write a program which estimates the weight of an animal's body just by knowing the weight of its brain? </p>

<p>In the traditional programming approach, we would basically write a program with lots of if-elses which compare the weight of the brain with certain parameters and come up with a result. In this case, there would be an endless number of if-elses because the number of animal species is far too high, and an enormous amount of data would have to be manually fed into the system for it to work. </p>

<p>Our traditional approach would look something like this : </p>

<pre><code># Pseudocode: one branch per species, with hand-tuned thresholds
def calculate_weight_of_body(brain_weight, animal_name):
    if animal_name == animal_one:
        if brain_weight &lt; certain_value:
            return certain_body_weight
    elif animal_name == animal_two:
        if brain_weight &lt; some_other_value:
            return some_other_body_weight
    # ... and so on, for every single species.
</code></pre>

<p>So we follow the <strong>Machine Learning approach</strong> to solve this problem. Let's consider that we have a dataset which looks like this:</p>

<table style="width:100%">  
  <tr>
    <th>Weight of Brain</th>
    <th>Weight of Body</th> 
  </tr>
  <tr>
    <td>3.385</td>
    <td>44.50</td> 
  </tr>
  <tr>
    <td>0.480</td>
    <td>15.50</td> 
  </tr>
  <tr>
    <td>1.04</td>
    <td>5.50</td> 
  </tr>
</table>

<p>.. and so on. Also, keep in mind that since our data is labelled, we are following the supervised learning style.</p>

<blockquote>
  <p>(Actual dataset included in the code repository. Check the code section for links).</p>
</blockquote>

<p>So now let's get into the code. This time, we will use the scikit-learn library for performing <strong>Linear Regression</strong>. Don't worry if you don't understand what it is right now, because I'll be explaining more about it in the upcoming posts! We have three main dependencies :  </p>

<ul>
<li>Pandas : We'll be using this library for quickly loading our data from the dataset file. </li>
<li>Scikit Learn : For performing Linear Regression</li>
<li>Matplotlib : For visualizing our predictions</li>
</ul>

<p>So let's get into it:</p>

<pre><code># Import all the dependencies
import pandas as pd
from sklearn import linear_model
import matplotlib.pyplot as plot

# Read data
data = pd.read_fwf('brain_body.txt')
x_values = data[['Brain']]
y_values = data[['Body']]

# Train model using Linear Regression
body_prediction = linear_model.LinearRegression()
body_prediction.fit(x_values, y_values)

# Visualize results
plot.scatter(x_values, y_values)
plot.plot(x_values, body_prediction.predict(x_values))
plot.show()
</code></pre>

<p>And there you are! You've just taken your first step into Machine Learning. But you might be curious: how does it really work?</p>

<h2 id="howthisworks">How does this work?</h2>

<p>We use Linear Regression to trace a simple line which is the best fit for all our data points. Consider the x-axis to be the weight of the brain and the y-axis to be the weight of the body. Now, we plot all the points from our dataset on a Cartesian plane.</p>

<p>Linear Regression helps us find the relation between the points by tracing a line with the best fit for our data. The equation of our line is </p>

<p><img src="http://i.imgur.com/edmLMhB.png" alt="The touch of Machine Learning"></p>

<p>where b is the y-intercept and m is the slope of the line. Using this line, we have an established relation between the brain weight and the body weight! </p>
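Under the hood, scikit-learn finds m and b by least squares. As a sketch of what that means, here is the same fit done directly with NumPy on the three table rows above (a tiny sample, so the fitted line is only illustrative):

```python
import numpy as np

# The three (brain weight, body weight) rows from the table above
brain = np.array([3.385, 0.480, 1.04])
body = np.array([44.50, 15.50, 5.50])

# Least-squares fit of body = m * brain + b
# (np.polyfit returns the highest-degree coefficient first, so m then b)
m, b = np.polyfit(brain, body, 1)

# The fitted line can now estimate a body weight for any brain weight
predicted = m * brain + b
print(m, b)
```

This is the same y = mx + b from the equation above: least squares simply picks the m and b that minimize the squared vertical distance between the line and the data points.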

<p>The graph of our points and our traced line would look like this:</p>

<p><img src="http://i.imgur.com/e9HuQZH.png" alt="The touch of Machine Learning"></p>

<h2 id="isthismagic">Is this magic?</h2>

<p>No. This is not magic! Once you start seeing how easily machine learning techniques can be applied to problems that seem really hard (like handwriting recognition), you start to get the feeling that you could use machine learning to solve any problem and get an answer as long as you have enough data. </p>

<h2 id="summingitup">Summing it up</h2>

<p>To sum it up, we now know three basic things : </p>

<ul>
<li><p><strong>Machine Learning</strong> : Letting the computer figure out the steps on its own, based on the data we provide it with. </p></li>
<li><p><strong>Three different styles</strong> : Supervised, Unsupervised and Reinforcement</p></li>
<li><p><strong>Linear Regression</strong> : It allows us to model the relation between independent and dependent variables via a line of best fit. </p></li>
</ul>

<h2 id="code">Code</h2>

<p>The code for this post has been uploaded on my Github Profile!</p>

<p>Find the source code <a href="https://github.com/C-Aniruddh/linear_regression">here</a> : <a href="https://github.com/C-Aniruddh/linear_regression">https://github.com/C-Aniruddh/linear_regression</a></p>]]></content:encoded></item><item><title><![CDATA[Implementing a sequence to sequence model in TensorFlow]]></title><description><![CDATA[Learn how to implement Sequence to sequence(seq2seq) model in TensorFlow. seq2seq : A general-purpose encoder-decoder framework for Tensorflow.]]></description><link>https://achandratre.azurewebsites.net/implementing-a-sequence-to-sequence-model-in-tensorflow/</link><guid isPermaLink="false">cf9814dc-005c-4f4c-ae3f-f2851e17c40c</guid><category><![CDATA[machine learning]]></category><category><![CDATA[aniruddh chandratre]]></category><category><![CDATA[tensorflow]]></category><category><![CDATA[seq2seq]]></category><category><![CDATA[chatbot]]></category><category><![CDATA[api.ai]]></category><category><![CDATA[AIML]]></category><category><![CDATA[gupshup]]></category><category><![CDATA[google]]></category><dc:creator><![CDATA[Aniruddh Chandratre]]></dc:creator><pubDate>Sat, 01 Jul 2017 15:08:40 GMT</pubDate><media:content url="http://achandratre.azurewebsites.net/content/images/2017/07/sketch_rnn.png" medium="image"/><content:encoded><![CDATA[<img src="http://achandratre.azurewebsites.net/content/images/2017/07/sketch_rnn.png" alt="Implementing a sequence to sequence model in TensorFlow"><p>There are a number of ways to implement chatbots. Organisations like API.ai, Gupshup, Recast, etc. provide an easy-to-use API for intelligent bots. All these bots use what is known as a <strong>rule-based model</strong>. With a rule-based model, a developer defines a set of rules which the bot follows while responding to queries. The most popular way of implementing the rule-based model is <strong>AIML</strong> (Artificial Intelligence Markup Language). 
AIML is an XML-based language which allows developers to create patterns and templates. Whenever the bot recognizes a pattern, it follows the corresponding set of rules. </p>

<p>Though it is easy enough for everyone to use AIML and make their own chatbots, it is extremely difficult to implement <strong>all the possible</strong> use cases the bot might encounter. And thus, the bot would eventually fail to respond when tested with an unknown pattern.</p>

<p>But then how do we create better chatbots? What about a bot which can learn from actual human interactions? This is where machine learning comes into play. </p>

<p>In traditional programming, if we have a problem statement, then to solve it, we go step by step. For example, if you are programming an app which can distinguish between different types of chairs, you would probably write code like :</p>

<pre><code>def number_of_legs():
    do_something()
def type_of_material():
    do_something()
...
</code></pre>

<p>But in Machine Learning, we define the problem statement and tell our program: here are a bunch of images of different types of chairs; learn which one is which type on your own. So, in machine learning, we do not define the steps to solve a particular problem, but tell the computer to learn all the steps by itself. Sometimes we might not even know what some of those steps are. Let's leave it at that for now. I'll be coming up with a proper series of posts to get started with machine learning!</p>

<h2 id="typesofmodels">Types of models</h2>

<p>Let us get back to chatbots. There are different types of models which can be implemented in a Machine Learning based bot. These types of models can be mainly classified into two types:</p>

<ul>
<li>Retrieval-based models</li>
<li>Generative models</li>
</ul>

<p>Retrieval-based models choose a response from a collection of predefined responses based on the query. They do not generate any new sentences, hence we don't need to worry about grammar.</p>

<p>Generative models, on the other hand, are quite intelligent. They generate a response word by word based on the query. Due to this, the responses generated are prone to grammatical errors. </p>
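To get a feel for the difference, here is a deliberately naive retrieval-based bot in plain Python (the stored queries and responses are made up): it never generates text, it only picks the canned response whose stored query overlaps most with the input.

```python
# A minimal retrieval-based "bot": all responses are fixed in advance.
canned = {
    "hi how are you": "Hello! I am fine.",
    "what is your name": "I am a demo bot.",
}

def tokenize(sentence):
    # lowercase words with surrounding punctuation stripped
    return {word.strip("?!.,").lower() for word in sentence.split()}

def respond(query):
    # pick the stored query sharing the most words with the input
    best = max(canned, key=lambda q: len(tokenize(q) & tokenize(query)))
    return canned[best]

print(respond("How are you?"))  # -> Hello! I am fine.
```

Since the output is always one of the canned sentences, its grammar is guaranteed; a generative model instead assembles the reply word by word, which is far more flexible but can produce malformed sentences.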

<h2 id="sequencetosequenceseq2seq">Sequence to Sequence (seq2seq)</h2>

<p>The Sequence to Sequence model is based on the idea of learning phrase representations with an RNN Encoder-Decoder. Basically, it consists of two RNNs (Recurrent Neural Networks): an Encoder and a Decoder. The encoder takes a sequence (sentence) as input and processes one symbol (word) at a time. </p>

<p>It converts a sequence of symbols into a fixed-size vector that encodes only the important information in the sequence, discarding the unnecessary information. This is made possible by the use of LSTMs (Long Short-Term Memory networks). The LSTM has the capability to alter the passing information by using gates. A simple LSTM can be visualized as follows : </p>

<p><img src="http://i.imgur.com/6TvllrH.png" alt="Implementing a sequence to sequence model in TensorFlow">
<img src="http://i.imgur.com/zT8NqOx.jpg" alt="Implementing a sequence to sequence model in TensorFlow"></p>

<p>In simple terms, the LSTM can use the Forget Gate to drop the unnecessary information, allowing only the necessary information to flow forward. You can head to an <a href="http://colah.github.io/posts/2015-08-Understanding-LSTMs/">article</a> written by Christopher Olah which properly explains the working of an LSTM <a href="http://colah.github.io/posts/2015-08-Understanding-LSTMs/">here</a>.</p>

<p>A seq2seq model can be represented as follows : </p>

<p><img src="http://i.imgur.com/8Cq7840.png" alt="Implementing a sequence to sequence model in TensorFlow"></p>

<p>Here, every block you see is an LSTM. Each hidden state influences the next hidden state, and the final hidden state can be seen as the summary of the sequence. This state is called the context or thought vector, as it represents the intention of the sequence. From the context, the decoder generates another sequence, one symbol (word) at a time. At each time step, the decoder is influenced by the context and the previously generated symbols.</p>
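The data flow just described can be sketched in a few lines of NumPy. The weights below are random stand-ins (a real seq2seq model would use trained LSTM cells, not a plain tanh update), but the shape of the computation is the same: the encoder folds the input into one context vector, and the decoder unrolls that context one symbol at a time.

```python
import numpy as np

rng = np.random.default_rng(0)
H, V = 4, 6  # hidden-state size, vocabulary size (both made up)

# Hypothetical, untrained weights standing in for the LSTM internals
W_enc = rng.normal(size=(H, H + V)) * 0.1
W_dec = rng.normal(size=(H, H + V)) * 0.1
W_out = rng.normal(size=(V, H)) * 0.1

def one_hot(i):
    v = np.zeros(V)
    v[i] = 1.0
    return v

def encode(tokens):
    h = np.zeros(H)                # hidden state
    for t in tokens:               # one symbol at a time
        h = np.tanh(W_enc @ np.concatenate([h, one_hot(t)]))
    return h                       # final state = context / thought vector

def decode(context, steps):
    h, prev, out = context, 0, []  # symbol 0 plays the role of GO
    for _ in range(steps):
        h = np.tanh(W_dec @ np.concatenate([h, one_hot(prev)]))
        prev = int(np.argmax(W_out @ h))  # pick the next symbol
        out.append(prev)
    return out

print(decode(encode([1, 2, 3]), steps=3))
```

With random weights the output symbols are meaningless; training adjusts the weights so that the decoder's word-by-word choices form a sensible response to the encoded query.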

<h2 id="tokenizingasentenceandpadding">Tokenizing a sentence and padding</h2>

<p>Before the dataset is completely usable, we have to pad it. Padding is the process of converting variable-length sentences to a fixed length. There are four special symbols we need to understand before we dive deeper. </p>

<ul>
<li><strong>EOS</strong> : End of sentence</li>
<li><strong>PAD</strong> : Filler</li>
<li><strong>GO</strong> : Start decoding</li>
<li><strong>UNK</strong> : Unknown; word not in vocabulary</li>
</ul>

<p>To understand padding, consider a small conversation.</p>

<pre><code> &gt; Hi! How are you?
 = Hello. I am fine.
</code></pre>

<p>Here, if we consider a fixed length of 10, each sentence would be padded to 10 elements. Notice that the query is also reversed: feeding the input sequence backwards is a common seq2seq trick. The above conversation would look like : </p>

<pre><code> &gt; ["PAD", "PAD", "PAD", "PAD", "?", "you", "are", "How", "!", "Hi"] 
 = ["GO", "Hello", ".", "I", "am", "fine", ".", "PAD", "PAD", "PAD"]
</code></pre>
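The padding step itself is straightforward to sketch (the helper names below are my own, not from any library):

```python
PAD, GO, EOS, UNK = "PAD", "GO", "EOS", "UNK"  # the four special symbols

def pad_query(tokens, length):
    # queries are reversed, then left-padded with PAD up to the fixed length
    tokens = tokens[:length]
    return [PAD] * (length - len(tokens)) + list(reversed(tokens))

def pad_response(tokens, length):
    # responses start with GO and are right-padded with PAD
    tokens = [GO] + tokens[:length - 1]
    return tokens + [PAD] * (length - len(tokens))

print(pad_query(["Hi", "!", "How", "are", "you", "?"], 10))
print(pad_response(["Hello", ".", "I", "am", "fine", "."], 10))
```

Running this reproduces the two padded sequences shown above.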

<h2 id="bucketing">Bucketing</h2>

<p>Padding works well when all the sentences in a dataset have almost the same number of words. But consider the above conversation when the largest sentence in our dataset is 100 words long: with a fixed padding of 100, there would be 94 "PAD"s in the query. This would cause an information loss. </p>

<p>Bucketing solves this problem by putting sentences into buckets of different sizes. Consider this list of buckets : [ (5,10), (10,15), (20,25), (40,50) ]. If the length of a query is 4 and the length of its response is 4, we put the pair in the bucket (5,10): the query will be padded to length 5 and the response will be padded to length 10. While running the model (training or predicting), we use a different model for each bucket, compatible with the lengths of query and response. All these models share the same parameters and hence function in exactly the same way.</p>
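Choosing a bucket is then just a matter of finding the smallest bucket that fits both the query and the response, as in this small sketch:

```python
# The bucket list from the text: (max query length, max response length)
buckets = [(5, 10), (10, 15), (20, 25), (40, 50)]

def pick_bucket(query_len, response_len):
    # return the first (smallest) bucket that fits both sides
    for q_max, r_max in buckets:
        if query_len <= q_max and response_len <= r_max:
            return (q_max, r_max)
    raise ValueError("sentence too long for any bucket")

print(pick_bucket(4, 4))    # -> (5, 10)
print(pick_bucket(7, 12))   # -> (10, 15)
```

A query/response pair of lengths 4 and 4 lands in (5, 10), so it carries at most a handful of PADs instead of the 94 from the fixed-length scenario above.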

<h2 id="wordembedding">Word Embedding</h2>

<p>Word Embedding is a technique for the dense representation of words in a low-dimensional vector space. Each word can be seen as a point in this space, represented by a fixed-length vector. A word embedding would look like the following:</p>

<p><img src="http://i.imgur.com/7rU7QMk.png" alt="Implementing a sequence to sequence model in TensorFlow"></p>

<p>The word embedding in our project, with a vocabulary of 20,000 words in 3-dimensional space, would look like:</p>

<p><img src="http://i.imgur.com/4h8wOWU.png" alt="Implementing a sequence to sequence model in TensorFlow"></p>
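Mechanically, an embedding is just a lookup table from word index to vector. Here is a minimal sketch with a tiny made-up vocabulary and untrained random vectors (so the geometry is meaningless until training moves similar words close together):

```python
import numpy as np

vocab = {"hi": 0, "hello": 1, "car": 2}  # made-up 3-word vocabulary
rng = np.random.default_rng(0)

# One row per word; each word becomes a point in 3-dimensional space
embedding = rng.normal(size=(len(vocab), 3))

def vector(word):
    return embedding[vocab[word]]

print(vector("hello"))  # a fixed-length, 3-element vector
```

In the real project the table has 20,000 rows, and its values are learned jointly with the rest of the model rather than drawn at random.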

<h2 id="codeexplaination">Code Explanation</h2>

<p>The code hosted for <a href="https://github.com/C-Aniruddh/DeepConversation">DeepConversations</a> will be explained in the next post. Follow the blog for updates!</p>

<h2 id="forkthesource">Fork the Source</h2>

<p>A seq2seq model implementation code has been open sourced and can be found on my GitHub profile. Fork the source <a href="https://github.com/C-Aniruddh/DeepConversation">here</a>.</p>]]></content:encoded></item></channel></rss>