Channel: My Blog by Philippe Adjiman » padjiman
Browsing all 10 articles

Flexible Collaborative Filtering In JAVA With Mahout Taste

I recently had to quickly build a prototype of a recommendation engine for a promising start-up company. I wanted to first test state-of-the-art collaborative filtering algorithms before building a...

Hadoop Tutorial Series, Issue #1: Setting Up Your MapReduce Learning Playground

Update: Instructions updated for Hadoop 0.20.2. This is the first post in a series of short Hadoop tutorials that progressively introduce core Hadoop functionalities. You might be interested in that...

Hadoop Tutorial Series, Issue #2: Getting Started With (Customized) Partitioning

In Issue #1 of this series, we set up the “learning playground” (based on the Cloudera Virtual Machine) for hands-on learning with Hadoop. In this issue, we’ll use our...

How To Build A Relevant Real Time Search Engine Prototype In Few Hundreds...

At the end of the post you’ll find the code along with a small command-line Java program to play with, but let me first describe the specifications of the real-time search engine prototype that I’m...

Hadoop Tutorial Series, Issue #3: Counters In Action

Note: This post has been updated with code that works with Hadoop 0.20.1. In this third issue of the Hadoop tutorial series, we’ll talk about a very simple but very useful Hadoop feature: counters. Even...

Hadoop Tutorial Series, Issue #4: To Use Or Not To Use A Combiner

Welcome to the fourth issue of the Hadoop Tutorial Series. Combiners are another important Hadoop feature that every Hadoop developer should be aware of. The primary goal of combiners is to...

What Are The 10 Most Cited Websites On Twitter When Tweeting About Hot Trends?

I recently wrote a post on how to build a relevant real-time search engine prototype in a few hundred lines of code. Using a tailored ranking algorithm based on link popularity on Twitter, I showed that...

A Generic Method For Sorting (Google Collections) Multiset Per Entry Count

I regularly use the excellent Google Collections library (now final and part of the more general Guava libraries). One of the data structures I use the most is probably the multiset (a.k.a. bag)....

How To Easily Build And Observe TF-IDF Weight Vectors With Lucene And Mahout

You have a collection of text documents, and you want to build their TF-IDF weight vectors, probably before doing some clustering on the collection or other related tasks. You would like to be able for...

A Data Science Exploration From the Titanic in R

Illustration of the (very hyped) random forest learning method. This year Kaggle offered a knowledge competition called “Titanic: Machine Learning from Disaster” exposing...
