Discovr: a flickr experiment gone wrong

I need help with this. I had a dream… Well, not so much a dream as an “It’d be cool to…”

I thought it’d be nice to discover new photos on flickr starting from your favorite photos: look at the people who also favorited those photos, and then at the other photos those people favorited. Still with me?

The code is actually quite simple (about 500 lines, check it on github: discovr), but it’s terribly slow. Some possible reasons:

  • Way too much data. I’ve found people with more than 18000 favorites, and there are photos with more than 2k fans. Even after limiting each user to their last 50 favorites, the numbers are still scary. Starting from my own favorites (366), I discovered 1268 users and 52632 photos.
  • Too complicated for an API. This is the kind of feature that wouldn’t be so hard to implement if you had direct access to the flickr database, but having to make so many requests adds a lot of time to the process.
  • Inefficient library. I had to make some modifications to the flickr ruby library just to make it work, but it’s still quite inefficient in some cases. Want to know the url of a picture (knowing the picture id)? 4 (completely unnecessary) API calls.
  • My code is bad. OK, I know it’s ugly to start by blaming everyone else. I know my code is not very good, as it’s a quick prototype. Still, I’m not sure that making my code/libraries better would be enough of an improvement given the network/api bottleneck.
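To get a feel for the API bottleneck, here is a rough back-of-the-envelope count in Ruby using the figures quoted above. It is only a sketch: it assumes one call per favorite to list its fans, one call per discovered user to list their favorites (pagination ignored), the worst case where every discovered photo needs its url resolved, and a made-up average latency of 0.3 seconds per call.

```ruby
# Rough request count for one run, using the numbers from the list above.
# Assumptions (not measured): one call per favorite to list its fans, one
# call per discovered user to list favorites (no pagination), and every
# discovered photo needing its url resolved.
my_favorites      = 366
discovered_users  = 1_268
discovered_photos = 52_632
calls_per_url     = 4    # library behaviour described above
seconds_per_call  = 0.3  # hypothetical average round trip

fan_calls      = my_favorites
favorite_calls = discovered_users
url_calls      = discovered_photos * calls_per_url

total_calls = fan_calls + favorite_calls + url_calls
hours       = (total_calls * seconds_per_call / 3600.0).round(1)

puts "total calls: #{total_calls}"
puts "hours at #{seconds_per_call}s per call: #{hours}"
```

Under these assumptions the url lookups dominate everything else, which suggests resolving urls only for the pictures that actually get shown.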

The simplified algorithm goes like this.

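  # method from class User
  def similar_pictures
    similar = {}

    favorites.each do |favorite|
      favorite.favorited_by.each do |user|
        # simplification: user.favorites is treated here as a hash of
        # picture id => data, so k is the picture id
        user.favorites.each do |k, v|
          similar[k] ||= {:weight => 0, :picture => v[:picture]}
          similar[k][:weight] += 1
        end
      end
    end

    # heaviest first, keeping only pictures reached more than once
    similar.values.sort {|a,b| b[:weight] <=> a[:weight]}.select {|v| v[:weight] > 1}
  end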
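To try the idea without touching the API, here is a self-contained toy version. It is a sketch, not the code from the repository: `Picture`, `User#favorite` and the sample data are hypothetical stand-ins, and unlike the method above it skips the user’s own favorites so that only new pictures come back.

```ruby
# Toy stand-in for a flickr picture (hypothetical, no API calls).
Picture = Struct.new(:id, :fans) do
  def favorited_by
    fans
  end
end

# Toy stand-in for the User class.
class User
  attr_reader :name, :favorites

  def initialize(name)
    @name = name
    @favorites = []
  end

  # Records the favorite on both sides of the relationship.
  def favorite(picture)
    @favorites << picture
    picture.fans << self
  end

  # Returns [picture id, weight] pairs, heaviest first, weight > 1 only.
  def similar_pictures
    similar = Hash.new(0)
    own_ids = favorites.map(&:id)

    favorites.each do |favorite|
      favorite.favorited_by.each do |user|
        next if user.equal?(self)
        user.favorites.each do |picture|
          next if own_ids.include?(picture.id) # skip what I already like
          similar[picture.id] += 1
        end
      end
    end

    similar.select { |_id, weight| weight > 1 }
           .sort_by { |_id, weight| -weight }
  end
end

p1, p2, p3, p4 = (1..4).map { |id| Picture.new(id, []) }
me, ann, bob, carl = %w[me ann bob carl].map { |name| User.new(name) }

[p1, p2].each     { |pic| me.favorite(pic) }
[p1, p2, p3].each { |pic| ann.favorite(pic) }
[p1, p3, p4].each { |pic| bob.favorite(pic) }
[p2, p4].each     { |pic| carl.favorite(pic) }

result = me.similar_pictures
p result
```

The three nested loops are where the cost comes from: every iteration of the two outer loops is an API request in the real thing.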

So I’ve created a github repository and uploaded the code: discovr at github. Feel free to clone, test and improve.


5 thoughts on “Discovr: a flickr experiment gone wrong”

  1. Some ideas (Too lazy for a patch :P):
    1) taking very few favourites / users at random
    2) saving metadata in a temporary cache: the first request will always be slow, but the next ones will be fast.
    3) Hey, you could download Flickr… Not so crazy, after all google downloaded the internet.

  2. 1) could be nice. Switching from *most* favorited to *latest* favorited could be faster. I wonder if the results would be worth it

    2) Already keeps caches. But they won’t help most users, unless…

    3) …it keeps running for a while and mirrors flickr :) But that would take a huge database, and many users willing to wait 30 minutes to see some pics :(

  3. One way of limiting the number of users (although not sure it would actually make it faster) would be to go the last.fm way.

    Last.fm makes a list of your “neighbors”, that is, the people who like the same kind of music you do. First find people who added not just one of your favorite pictures as a favorite, but several. Make a list of the people who have the most in common with you, keep 10 to 20 of them, and find the images that most of them added as favorites. You will get more relevance this way imo.
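The neighbor idea above could be sketched roughly like this. It is only an illustration working on plain in-memory data: `recommend`, its parameters and the `favorites_by_user` hash (user name => array of favorite picture ids) are hypothetical names, and fetching that data from flickr is left out entirely.

```ruby
require "set"

# favorites_by_user: hypothetical hash of user name => array of picture ids.
# neighbor_count and min_shared are illustrative knobs, not tuned values.
def recommend(me, favorites_by_user, neighbor_count: 10, min_shared: 2)
  mine = favorites_by_user.fetch(me).to_set

  # Rank the other users by how many favorites they share with me.
  overlaps = favorites_by_user
    .reject { |user, _favs| user == me }
    .map    { |user, favs| [user, (mine & favs.to_set).size] }
    .select { |_user, shared| shared >= min_shared }

  neighbors = overlaps.sort_by { |_user, shared| -shared }
                      .first(neighbor_count)
                      .map(&:first)

  # Count how many neighbors favorited each picture I don't already have.
  votes = Hash.new(0)
  neighbors.each do |user|
    favorites_by_user[user].uniq.each do |pic|
      votes[pic] += 1 unless mine.include?(pic)
    end
  end

  # Most-voted pictures first, ties broken by picture id.
  votes.sort_by { |pic, count| [-count, pic] }.map(&:first)
end

sample = {
  "me"   => [1, 2, 3],
  "ann"  => [1, 2, 3, 10, 11],
  "bob"  => [1, 2, 10],
  "carl" => [3, 12]
}

recommended = recommend("me", sample)
p recommended
```

In the sample, carl shares only one favorite and is dropped, so his picture 12 never shows up. Capping the neighbor list is what would bound the number of requests in a real run.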
