I need help with this. I had a dream… Well, not so much a dream as an “It’d be cool to…”
I thought it’d be nice to discover new photos on flickr by starting from your favorite photos, finding the people who also favorited them, and then looking at the other favorites of those people. Still with me?
It’s actually quite simple code (about 500 lines, check it on github: discovr), but it’s terribly slow. Some possible reasons:
- Way too much data. I’ve found people with more than 18000 favorites, and there are photos with more than 2k fans. After limiting to the last 50 favorites, the numbers are still creepy. Following from my personal favorites (366), I discovered 1268 users and 52632 photos.
- Too complicated for an API. This is the kind of feature that wouldn’t be so hard to implement if you had access to the flickr database directly, but having to make so many requests adds a lot of time to the process.
- Inefficient library. I had to make some modifications to the flickr ruby library just to get it working, but it’s still quite inefficient in some cases. Want to know the url of a picture (knowing the picture id)? That takes 4 (completely unnecessary) API calls.
- My code is bad. OK, I know it’s ugly to start blaming everyone else. I know my code is not very good, as it’s a quick prototype. Still, I’m not sure if making my code/libraries better would be enough of an improvement given the network/api bottleneck.
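To get a feel for the network bottleneck, here is a rough back-of-the-envelope sketch in Ruby. The counts are the ones reported above. Everything else is an assumption for illustration: one request per lookup (no pagination, no extra calls from the library) and a made-up round-trip time of 0.5 seconds.

```ruby
# Numbers reported in the list above.
MY_FAVORITES     = 366
DISCOVERED_USERS = 1268

# Assumption, not a measurement: average round trip per API request.
SECONDS_PER_REQUEST = 0.5

# Lower bound on requests, assuming one "who favorited this photo?" call
# per favorite and one "list this user's favorites" call per discovered user.
def estimated_requests(favorites, users)
  favorites + users
end

# Total time if the requests are made one after another.
def estimated_seconds(requests, seconds_per_request)
  requests * seconds_per_request
end

requests = estimated_requests(MY_FAVORITES, DISCOVERED_USERS)
seconds  = estimated_seconds(requests, SECONDS_PER_REQUEST)
puts "#{requests} requests, ~#{seconds.round} s"
```

Under these assumptions that is 1634 sequential requests, or roughly 14 minutes, before counting any extra calls the library makes per picture.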
The simplified algorithm goes like this.
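# method from class User
def similar_pictures
  similar = {}
  favorites.each do |favorite|
    # everyone else who favorited this picture
    favorite.favorited_by.each do |user|
      # k is assumed to be the picture id, v the favorite record
      user.favorites.each do |k, v|
        similar[k] ||= {:weight => 0, :picture => v[:picture]}
        similar[k][:weight] += 1
      end
    end
  end
  # heaviest first, keeping only pictures seen more than once
  similar.values.sort {|a,b| b[:weight] <=> a[:weight]}.select {|v| v[:weight] > 1}
end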
So I’ve created a github repository and uploaded the code: discovr at github. Feel free to clone, test and improve it.















