Markov Decision Processes 03: optimal policy variation

This post is going to follow up on what was discussed here, and I will show you how changing the value of the discount factor may affect what the optimal policy is. For that, consider the MDP defined below:

This MDP is inspired by the example I have been using; I added an extra state and stripped the MDP of the majority of its transitions so we can focus on what is essential here.

From the MDP above there is a clear contender for the title of optimal policy:

$$\begin{align} &\pi(H) = E\\ &\pi(T) = D\\ &\pi(TU) = D\end{align}$$

Assume that when we enter the MDP there is a $50\%$ chance we start Hungry and a $50\%$ chance we start Thirsty. If we use the policy $\pi$, then the expected reward is

$$E[R | \pi] = 0.5E[R_T | \pi] + 0.5E[R_H | \pi]$$

where $E[R_s | \pi]$ is the expected reward we get by following policy $\pi$ starting from state $s$. From my previous post we know that $E[R_T | \pi] = E[R_H | \pi] = \frac{1}{1-\gamma}$ and hence

$$E[R | \pi] = \frac{1}{1-\gamma}.$$
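To make this concrete, here is a minimal Python sketch of that computation. It assumes, as in the previous post, that following $\pi$ earns a reward of $1$ on every step, so the return from either starting state is the geometric series $1 + \gamma + \gamma^2 + \cdots$; the state names and the 50/50 start distribution come from the text above, and the function names are just illustrative.

```py
def discounted_return(gamma: float, horizon: int = 10_000) -> float:
    """Discounted return of a trajectory that earns reward 1 on every step.

    This is the truncated geometric series 1 + gamma + gamma**2 + ...,
    which for gamma < 1 converges to 1 / (1 - gamma).
    """
    return sum(gamma ** t for t in range(horizon))


def expected_reward(gamma: float) -> float:
    """E[R | pi] = 0.5 * E[R_T | pi] + 0.5 * E[R_H | pi].

    Under the candidate policy pi, both starting states (Hungry and
    Thirsty) yield reward 1 per step, so both terms equal 1 / (1 - gamma)
    and so does their average.
    """
    start_distribution = {"Hungry": 0.5, "Thirsty": 0.5}
    return sum(p * discounted_return(gamma) for p in start_distribution.values())


if __name__ == "__main__":
    for gamma in (0.5, 0.9, 0.99):
        print(gamma, expected_reward(gamma), 1 / (1 - gamma))
```

Running it prints, for each value of $\gamma$, the simulated expectation next to the closed-form $\frac{1}{1-\gamma}$, and the two agree.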