Discovr: a flickr experiment gone wrong

I need help with this. I had a dream… Well, not so much a dream as an “It’d be cool to…”

I thought it’d be nice to discover new photos on flickr by starting from my favorite photos, finding the people who also favorited them, and then looking at those people’s own favorites. Still with me?

The code is actually quite simple (about 500 lines, check it on github: discovr), but it’s terribly slow. Some possible reasons:

  • Way too much data. I’ve found people with more than 18000 favorites, and there are photos with more than 2k fans. Even after limiting each user to their last 50 favorites, the numbers are still scary. Starting from my personal favorites (366), I discovered 1268 users and 52632 photos.
  • Too complicated for an API. This kind of feature wouldn’t be so hard to implement with direct access to the flickr database, but having to make so many requests adds a lot of time to the process.
  • Inefficient library. I had to modify the flickr ruby library just to make it work, and it’s still quite inefficient in some cases. Want to know the URL of a picture when you already know its id? That takes 4 (completely unnecessary) API calls.
  • My code is bad. OK, I know it’s ugly to start by blaming everyone else. I know my code is not very good, as it’s a quick prototype. Still, I’m not sure that improving my code or the libraries would help enough, given the network/API bottleneck.
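To get a feel for how much of the slowness could come from request count alone, here is a rough back-of-the-envelope sketch in Ruby. It uses the figures quoted above (366 favorites, 1268 users, 52632 photos, 4 calls per URL). The 0.25 second round trip is purely an assumption, not a measurement, and counting one request per favorite and one per user ignores paging, so treat the result as a lower bound.

```ruby
# Figures quoted in the post.
MY_FAVORITES      = 366
DISCOVERED_USERS  = 1268
DISCOVERED_PHOTOS = 52_632
CALLS_PER_URL     = 4 # API calls the library makes to build one picture URL

# Assumption, not a measurement: average round trip per API call, in seconds.
ASSUMED_LATENCY = 0.25

# One request per favorite (its fans) plus one per discovered user (their
# favorites). Optionally add the URL lookups for every discovered photo.
def estimated_requests(resolve_urls)
  requests = MY_FAVORITES + DISCOVERED_USERS
  requests += DISCOVERED_PHOTOS * CALLS_PER_URL if resolve_urls
  requests
end

[false, true].each do |resolve_urls|
  requests = estimated_requests(resolve_urls)
  hours = requests * ASSUMED_LATENCY / 3600.0
  puts format("resolve_urls=%-5s requests=%6d  ~%.1f hours", resolve_urls, requests, hours)
end
```

Under these assumptions, walking the favorites graph costs 1634 requests, while resolving a URL for every discovered photo adds another 210528. That suggests the URL lookups, not the graph walk, would dominate if they were done for every photo.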

The simplified algorithm goes like this.

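  # method from class User
  def similar_pictures
    similar = {}

    favorites.each do |favorite|
      favorite.favorited_by.each do |user|
        # Assumes user.favorites yields (picture id, data) pairs,
        # as the k/v names in the original suggest.
        user.favorites.each do |k, v|
          similar[k] ||= {:weight => 0, :picture => v[:picture]}
          similar[k][:weight] += 1
        end
      end
    end

    # Heaviest first, keeping only pictures that were counted more than once.
    similar.values.sort {|a,b| b[:weight] <=> a[:weight]}.select {|v| v[:weight] > 1}
  end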
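One possible way to cut down on requests is to remember each user’s favorites the first time they are fetched, since the same user can show up as a fan of several of my favorites. The sketch below is not part of discovr: `FetchCache`, `weigh_pictures` and the `FAKE_API` hash standing in for the remote calls are all hypothetical names made up for illustration.

```ruby
# Hypothetical stand-in for the remote API: user id => favorite picture ids.
FAKE_API = {
  "alice" => ["p1", "p2", "p3"],
  "bob"   => ["p2", "p3"],
}

# Memoizes a slow lookup so each key is fetched at most once.
class FetchCache
  attr_reader :misses

  def initialize(&fetcher)
    @fetcher = fetcher
    @store = {}
    @misses = 0 # number of times the real fetcher had to run
  end

  def fetch(key)
    unless @store.key?(key)
      @misses += 1
      @store[key] = @fetcher.call(key)
    end
    @store[key]
  end
end

# Same weighting idea as similar_pictures: every (favorite, fan) pair adds
# one point to each of that fan's favorites.
def weigh_pictures(fans_per_favorite, cache)
  weights = Hash.new(0)
  fans_per_favorite.each do |fans|
    fans.each do |user_id|
      cache.fetch(user_id).each { |picture_id| weights[picture_id] += 1 }
    end
  end
  weights
end

cache = FetchCache.new { |user_id| FAKE_API.fetch(user_id) }
# Two of my favorites: alice is a fan of both, bob only of the second.
fans_per_favorite = [["alice"], ["alice", "bob"]]
weights = weigh_pictures(fans_per_favorite, cache)
puts weights.inspect
puts "lookups that reached the API: #{cache.misses}"
```

The weights come out the same as without the cache, but alice’s favorites are requested only once even though she is visited twice. Whether this helps much in practice depends on how often fans overlap, which I have not measured.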

So I’ve created a github repository and uploaded the code: discovr at github. Feel free to clone, test and improve it.

MySQL Conference 2009, I need an idea

I was sad to miss the MySQL conference this year, since I had so much fun last year in Santa Clara. I can’t miss it next year.

As a MySQL partner, and after almost 2 years doing MySQL training, I’m sure I have interesting things to share at the conference, but I’m not sure which ones.

I will be thinking about this over the next few weeks, but I’d appreciate some help. What topics are you interested in?