Danbooru

Image search

Posted under General

I've made a somewhat experimental image search of all images on Danbooru available here:

http://haruhidoujins.yi.org/db-search.php

Unlike the search engines built into Danbooru, this lets you search with an image file to find out whether that image or similar ones have been posted here. It's useful, for example, for finding the original when all you have is a defaced and artefacted aeriesdies version, without having to guess all the tags that might be on the picture.

Since I don't have a copy of the Danbooru database, I have to do an MD5 search for the post. This will come up empty if the image is marked explicit and you don't have a privileged account, in case you're wondering why the search engine finds a pic but Danbooru claims not to have it.

At the moment it'll only find images that are at least a few weeks old since that was when I got the thumbnail dump. If the search is found useful I'll add a regular update for the latest pics so those can be found too.


Hi, piespy. We've got a small community for image sorting/searching, and we've done several projects using the same sort of thing.
We'd be glad to have you join us, to discuss and compare searching systems.
It's on freenode.net, channel #tma

ADVERTISER VOICE, OFF.

This is pretty awesome. Any chance a developer of danbooru could incorporate it into the trunk? This could be a boon for moderators -- for example make it available (or even autorun it) from the upload form to avoid duplicates...

0xCCBA696 said:
This is pretty awesome. Any chance a developer of danbooru could incorporate it into the trunk? This could be a boon for moderators -- for example make it available (or even autorun it) from the upload form to avoid duplicates...

I already use it if I feel like the image is possibly on the server already (that occurs only rarely). Adding it to the trunk would make sense as it's a great tool, but it also works independently just fine.

The system already supports downloading images by URL, using http://haruhidoujins.yi.org/db-search.php?url=<URL> so it can be used to check any picture with just a link.
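As a rough sketch of how a script could build that link: the endpoint and the `url` parameter are the ones given above, while the percent-encoding is an assumption on my part (it's the usual way to pass a URL inside a query string, but I haven't checked what the server expects). The function name is made up for illustration.

```python
from urllib.parse import urlencode

# Endpoint quoted in the post above.
SEARCH_ENDPOINT = "http://haruhidoujins.yi.org/db-search.php"


def build_search_url(image_url: str) -> str:
    """Return a search link for the picture at image_url.

    Assumption: the server accepts a percent-encoded `url` query
    parameter, as most PHP scripts do.
    """
    return f"{SEARCH_ENDPOINT}?{urlencode({'url': image_url})}"


if __name__ == "__main__":
    print(build_search_url("http://example.com/pics/sample image.jpg"))
```

Encoding matters mostly when the picture's own URL contains `?`, `&` or spaces, which would otherwise be read as part of the search query.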

I built it using a heavily tweaked version of imgSeek (for faster searching and greatly reduced memory usage) that needs to run as a separate database server. The imgSeek source is GPL, though, so it shouldn't be a problem to package it as a separate server alongside Danbooru.

Currently it needs about 100 MB of memory for the Danbooru image set, and takes ~0.25 seconds to find the 16 best matches on my 2.2 GHz Opteron (plus another 0.5+ seconds to find the corresponding MD5s, which could be eliminated if I wasn't lazy). Both memory and CPU usage would scale linearly with the number of images but there's probably still some room for further optimizations.
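To put the linear scaling into numbers, here is a back-of-the-envelope sketch. The 100 MB and 0.25 s baselines are the figures above; the growth factor is whatever you want to plug in, and the function name is just illustrative. It ignores the MD5 lookup time and any further optimizations.

```python
# Baseline figures from the post (current Danbooru image set, 2.2 GHz Opteron).
BASE_MEMORY_MB = 100.0
BASE_SEARCH_SECONDS = 0.25


def estimate_resources(growth_factor: float) -> tuple:
    """Estimate (memory in MB, search time in s) if the image set grows
    by growth_factor, assuming purely linear scaling of both."""
    return (BASE_MEMORY_MB * growth_factor, BASE_SEARCH_SECONDS * growth_factor)


if __name__ == "__main__":
    for factor in (1.0, 2.0, 5.0):
        memory_mb, seconds = estimate_resources(factor)
        print(f"{factor:.0f}x images: ~{memory_mb:.0f} MB, ~{seconds:.2f} s per search")
```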

I don't know if it'd be appropriate to autorun it on image submission, since it IS rather CPU intensive, and it could only run after the image is already uploaded anyway (except for URL submissions). It might be useful for the moderation queue perhaps, but I don't know how that works. Maybe it could be added as a link for each pic to open a new window with the search results.

Tighter integration with Danbooru would have the advantage that the search results could show tags, dimensions, and other information already. Though for the purposes of the moderation queue or whatever I could just add an interface that returns the MD5 of the search results in a parsable form (JSON/XML/whatever).
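Something like the following is what I have in mind for the consuming side. The JSON layout here (a `matches` list with `md5` and `score` fields) is purely hypothetical, since no such interface exists yet, and the hashes are dummy values; treat it as a sketch of the idea rather than a spec.

```python
import json
from typing import List

# Hypothetical response format; nothing like this is implemented yet.
SAMPLE_RESPONSE = """
{"matches": [
  {"md5": "0123456789abcdef0123456789abcdef", "score": 0.97},
  {"md5": "fedcba9876543210fedcba9876543210", "score": 0.41}
]}
"""


def matching_md5s(response_text: str, min_score: float = 0.9) -> List[str]:
    """Return the MD5s of all matches scoring at least min_score.

    The `score` field and the 0.9 default are assumptions made for
    this example, not values the search engine is known to produce.
    """
    data = json.loads(response_text)
    return [
        match["md5"]
        for match in data.get("matches", [])
        if match.get("score", 0.0) >= min_score
    ]


if __name__ == "__main__":
    print(matching_md5s(SAMPLE_RESPONSE))
```

A moderation tool could then look up each returned MD5 in Danbooru's own database and show tags and dimensions next to the candidate duplicates.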

I guess it's up to the admins/mods to decide if this would be useful or not.

piespy said:
Maybe it could be added as a link for each pic to open a new window with the search results.

That's what I do now so that would be useful, but it really isn't a problem, I just put your site in a separate tab and go at it. It only came up for two images at most so far, so I don't think it would be worth the effort, to be honest.

surasshu said:
That's what I do now so that would be useful, but it really isn't a problem, I just put your site in a separate tab and go at it. It only came up for two images at most so far, so I don't think it would be worth the effort, to be honest.

It should probably be easy to whip up a greasemonkey script to do what you're doing automatically, but I still think this script would be nice to have in the trunk - especially since it could auto-update its thumbnail cache, as well as link directly to posts rather than to searches for their MD5 hashes, or even display other info about the file, like piespy said.

Shuugo said:
Adding it to your bookmarks should do too

yeah, but that wouldn't alert others to its existence :P this thread has already sunk into oblivion once, until I bumped it, haha :) I still hope this thread might catch a developer's eye...

I'm not sure if the server has enough spare cycles to run something like this automatically. ~50% idle, around 1.20 load per core. I don't want to burden piespy's server by interfacing with it directly.

If it's a feature for the mod queue, I doubt the load would be enough to cause any trouble, if it's even noticeable. I certainly wouldn't mind testing it if you like.

Currently the search is being used about 10-20 times per hour... even with a link on the upload page I doubt the server would break a sweat.

And even a "find duplicates/similar pics" link on each post page would be manageable with some caching, I think. Though I'm not sure that would generally be useful.
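By "some caching" I mean roughly this: remember the result list per post for a while, so repeated clicks on the same post don't hit the search server again. This is only a sketch; the class name, the one-hour lifetime and the `search` callback are all placeholders, not how anything is actually implemented.

```python
import time
from typing import Callable, Dict, List, Tuple


class SimilarityCache:
    """Keep search results per post MD5 for ttl_seconds (hypothetical helper)."""

    def __init__(
        self,
        search: Callable[[str], List[str]],
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._search = search      # the expensive similarity lookup
        self._ttl = ttl_seconds    # how long a cached result stays valid
        self._clock = clock        # injectable so it can be faked in tests
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def get(self, md5: str) -> List[str]:
        now = self._clock()
        entry = self._entries.get(md5)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]        # still fresh: skip the search
        result = self._search(md5)
        self._entries[md5] = (now, result)
        return result
```

Entries are never evicted here apart from being overwritten after they expire, so a real version would want a size limit as well.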

I've tweaked it a bit more and now the search is practically instant (as long as the database isn't paged out on the server, which it might be if there haven't been any searches in a while).

It now takes less than 0.05 seconds to find a pic in Danbooru's database. It should be fine to use the search for any reason whatsoever. So feel free to do so :)
