Recently I’ve been interested in machine learning, and I’ve made some sample implementations to better understand the subject. In particular, I’ve implemented a simplified version of the K-Means Clustering algorithm, and then I decided to apply it to a more practical task.
I’ve used K-Means Clustering to classify text by the language it’s written in. In brief, you provide several texts to the algorithm, decide how many groups you want them partitioned into, and run it. It partitions the texts based on the relative frequencies of the letters in them, trying to recognize the structure of the different languages. It’s not perfect, but it works, and there’s a live version to try here.
The larger the text, the more accurate the algorithm will be; with short texts the result will be quite random.
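My implementation runs in the browser, but the feature extraction step can be sketched in a few lines of Python (the function name and details here are my own, just for illustration): each text becomes a 26-dimensional vector of relative letter frequencies, and it’s these vectors that K-Means clusters.

```python
from collections import Counter
import string

def letter_frequencies(text):
    """Return the relative frequency of each of the 26 Latin letters.

    The resulting 26-dimensional vector is the feature vector the
    clustering works on: texts in the same language tend to have
    similar letter distributions.
    """
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    if not letters:
        return [0.0] * 26  # no letters at all: return the zero vector
    counts = Counter(letters)
    total = len(letters)
    return [counts.get(c, 0) / total for c in string.ascii_lowercase]
```

This also shows why short texts behave randomly: with only a handful of letters, the frequency vector is dominated by noise rather than by the language’s real distribution.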
Moving to the second lesson of this tutorial, I’ve learnt about the K-Means Clustering algorithm. Basically, we give the algorithm some points in space and it partitions them into K different sets. The algorithm is really simple; I suggest you read the tutorial for further information.
The only thing I want to explain here is that, given a dataset (in our case a set of points), you should already know how many sets to partition it into. Otherwise, the algorithm may converge to an inaccurate solution. To see this, try my little implementation (the link is below): make 6 groups of close points, then run the algorithm with a K different from 6. You’ll understand why it’s important to have an accurate guess for the K value.
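For reference, here’s a rough Python sketch of the simplified version I implemented (mine is a browser demo; names and details here are my own). It alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its cluster, until nothing changes.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def mean(cluster):
    """Mean (centroid) of a non-empty list of 2-D points."""
    n = len(cluster)
    return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)

def k_means(points, k, iterations=100):
    """Partition a list of 2-D points into k clusters."""
    # Start from k distinct points chosen at random as initial centroids.
    centroids = random.sample(points, k)
    for _ in range(iterations):
        # Assignment step: each point joins the nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (empty clusters keep their old centroid).
        new_centroids = [mean(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # converged: assignments are stable
            break
        centroids = new_centroids
    return clusters, centroids
```

Note that the random initialization is exactly where the wrong-K problem shows up: with a K that doesn’t match the data, the algorithm still converges, just to a partition that doesn’t mean much.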
I modified my recent K Nearest Neighbour implementation to use this algorithm.
Here’s the link.
Anyway, what I’ve understood so far is that an important part of machine learning consists of classifiers. Classifiers are algorithms meant to “recognize” (classify) objects by some of their features. You feed them a known (already classified) dataset and hope they’ll be able to classify any new object you throw at them, by recognizing its features and comparing them with the known features in the dataset. I won’t go technical on this (I can’t yet); I suggest you read the linked tutorial if you’re really interested.
There are plenty of classifiers, but one of the simplest is the K Nearest Neighbour classifier. It works by representing the n features (which must be numeric) on the axes of an n-dimensional space. When you give it an element to classify, it finds the K nearest known elements (from the given dataset) and determines which class occurs most often among these K elements. The element is then assumed to belong to that class, and is thus classified.
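The whole procedure fits in a few lines. This is a minimal Python sketch of the idea, not my actual implementation (which is a browser demo); the names are my own. Features are tuples of numbers, and the dataset is a list of (features, label) pairs.

```python
from collections import Counter

def knn_classify(dataset, point, k):
    """Classify `point` with the K Nearest Neighbour rule.

    `dataset` is a list of (features, label) pairs; `point` is a tuple
    of the same dimension. Returns the label that occurs most often
    among the k known elements closest to `point`.
    """
    # Sort the known elements by squared Euclidean distance to the point.
    neighbours = sorted(
        dataset,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], point)),
    )
    # Majority vote among the labels of the k nearest elements.
    labels = [label for _, label in neighbours[:k]]
    return Counter(labels).most_common(1)[0][0]
```

An odd K is usually chosen for two-class problems, so the majority vote can’t tie.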
Notice that this implementation uses every element on the canvas as the dataset for the current step, including those generated randomly and classified in previous steps. This is of course not very clever for this kind of classifier, but it makes the result less predictable, and fits my purposes better (I have no purposes).
Here’s the link, and here’s a screenshot:
Recently I started trying to create (or recreate) some website templates just for fun, making some modern layout examples. This is my first (and simplest) try. I may update it from time to time.
It’s built using jQuery for animation and some styling, and uses open fonts found on the Google Fonts page.
Here is the example page.