Danbooru

Image search

Posted under General

Shuugo said:
Is it possible for konachan too? Since wallpapers are often resized to fit other screens, it would be a really handy tool. If you don't want to add that to your server, I can try running it locally.

Certainly. Not a problem. Do you want it as a separate search DB or as part of the Danbooru search? Doesn't make much difference to me either way. And if there are any hidden posts on konachan I suppose I'd need a privileged account there, or whatever you need to see them.

piespy said:
Certainly. Not a problem. Do you want it as a separate search DB or as part of the Danbooru search? Doesn't make much difference to me either way. And if there are any hidden posts on konachan I suppose I'd need a privileged account there, or whatever you need to see them.

I'd prefer a separate DB. If you register, I'll give you a privileged account. Only about 200 posts are hidden, but it's better to have the full set.

Thanks

piespy said:
I'd be happy to make a search for moe.imouto.org available on my system too if you like, since I've already got it all set up. It could be a separate image DB or merged with the Danbooru DB so that searches will find images from either site.

You can also run the image query server yourself, though at the moment it's a rather labor-intensive process to set up since it needs a bunch of scripts and must be compiled from source. Well, talk to me on IRC (PM piespy@rizon) or something if you want to run it yourself. For a small image set like moe.imouto.org it would take hardly any system resources (a couple MB of memory and disk space at most).

Though I'd definitely be happy to add it to my system; that'd be less work altogether. Even if it's a separate DB, I could add an option to search both Danbooru and moe.imouto.org if people just want to find a pic and don't care who has it. And you can still use the XML query if you want to make your own interface for it.

It'd be great to have it in the same database as danbooru since sometimes we have trouble IDing artists or where the image comes from. You'll probably need to register since I hid loli images from unregistered users.

dovac said:
It'd be great to have it in the same database as danbooru since sometimes we have trouble IDing artists or where the image comes from.

I made a separate database now for konachan. It'd actually be a little easier to make one for moe.imouto too now instead of combining it. And anyway I plan to add a multi-search feature that combines search results from all databases in my system.

If you want it combined just to find your images on Danbooru, you could instead add a "Find on Danbooru" link that goes to db-search.php with your thumb (like the hidden "Similar images" link does now). Once I have the moe.imouto thumbs, it won't even need to fetch them from your site to do that. If you'd still like it combined anyway, I can do that, no problem; but then you couldn't search just for images on moe.imouto, since you'd also get Danbooru results. Anyway, let me know how you'd like it done.

dovac said:
You'll probably need to register since I hid loli images from unregistered users.

I already seem to have a privileged account there. Not sure how that happened... apparently I signed up in December but I don't know how it got privileged access. Oh well, it accepted my username and password so I guess it's my account anyway.

piespy said:
I already seem to have a privileged account there. Not sure how that happened... apparently I signed up in December but I don't know how it got privileged access.

There are no real user levels on moe.imouto, everyone is privileged by default (unless you're a mod or admin).

piespy said:
I made a separate database now for konachan. It'd actually be a little easier to make one for moe.imouto too now instead of combining it. And anyway I plan to add a multi-search feature that combines search results from all databases in my system.

If you want it combined just to find your images on Danbooru, you could instead add a "Find on Danbooru" link that goes to db-search.php with your thumb (like the hidden "Similar images" link does now). Once I have the moe.imouto thumbs, it won't even need to fetch them from your site to do that. If you'd still like it combined anyway, I can do that, no problem; but then you couldn't search just for images on moe.imouto, since you'd also get Danbooru results. Anyway, let me know how you'd like it done.

Whatever is easier for you. I'll just add another link to search on Danbooru, like you suggested.

piespy said:
Due to popular demand (well, one person asked for it), I've added a rudimentary XML interface to the search, e.g. to find posts similar to a Danbooru image use it like this:
http://haruhidoujins.yi.org/db-search.xml?url=http://danbooru.donmai.us/data/preview/be8c6eb6b9760f4ab835b042f4069296.jpg.

The "take you to the wrong image" text is coming through on db-search.xml searches. (Not a big deal since it ends up being valid XML anyway, though.)
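The db-search.xml query in piespy's example is a plain GET request with the image location in the url parameter, so it can also be built programmatically. A minimal Python sketch, assuming the endpoint from that example: percent-escaping the nested URL is a precaution not mentioned in the thread (the example passes it raw), and it assumes the server decodes the parameter like any standard form field.

```python
from urllib.parse import urlencode

# Endpoint copied from the example query in this thread.
SEARCH_ENDPOINT = "http://haruhidoujins.yi.org/db-search.xml"


def build_search_url(image_url: str) -> str:
    """Return a db-search.xml query URL for the given image URL.

    urlencode() percent-escapes the nested URL (":" -> %3A, "/" -> %2F),
    which is safer than pasting it in raw.
    """
    return SEARCH_ENDPOINT + "?" + urlencode({"url": image_url})


preview = "http://danbooru.donmai.us/data/preview/be8c6eb6b9760f4ab835b042f4069296.jpg"
print(build_search_url(preview))
```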

Is the database of the db-search system available to the public? I'd like to get a copy of the database (feature/color samples) so I can check whether my posts are dupes. I find it annoying to see my posts being parented or flagged as dupes or for JPEG artifacts.

BTW, is this system based on imgseek's isk-daemon or just the client part?

lemontree said:
Is the database of the db-search system available to the public? I'd like to get a copy of the database (feature/color samples) so I can check whether my posts are dupes. I find it annoying to see my posts being parented or flagged as dupes or for JPEG artifacts.

The Danbooru similarity database is 134 MB, and my server has rather limited bandwidth, so I'd prefer not to make it publicly available. Feel free to use the image search as much as you like; it doesn't actually use much bandwidth or CPU.

Though it won't help regarding the jpeg artefacts...

lemontree said:
BTW, is this system based on imgseek's isk-daemon or just the client part?

It was based on the imgdb code I got from their public svn. I think that's what isk-daemon uses, though I could never get that to work. However, I wrote my own client code for it, which acts as a TCP/IP database server queried by PHP. By now practically none of the original code remains: I've converted it to real C++ and had to optimize much of it (better choice of containers and algorithms, and much improved cache coherence while searching) to make a DB as large as Danbooru's feasible. Originally it used over 2 GB of memory for the DB and took several seconds to conduct a search; now it's 33 MB and takes less than 80 milliseconds.

piespy said:
It was based on the imgdb code I got from their public svn.

To tell you the truth, I think only haar.c needs to be kept (since I'm not good at mathematics), and the rest could be rewritten. I've always wondered whether using a real DBMS would be better than a native database, so that the feature/color samples could be stored together with the post/image information.

piespy said:
Though it won't help regarding the jpeg artefacts...

Of course, such content-based image retrieval systems will help. By finding the original, you know for certain that an image has JPEG artifacts. It's always a pain to come across a great image and find that it's full of JPEG artifacts. But now I can simply check whether there are similar images around here, and hopefully the original is among them.

lemontree said:
To tell you the truth, I think only haar.c needs to be kept (since I'm not good at mathematics), and the rest could be rewritten. I've always wondered whether using a real DBMS would be better than a native database, so that the feature/color samples could be stored together with the post/image information.

I don't know, you'd have to twist most database systems pretty hard to be efficient at doing a similarity search based on Haar coefficients. Basically, it does "add $weight[$coeff_type] to each image that has $coeff, then return top 16" for every one of the top 40 coefficients in each channel. I think either way you'll need a specialized database format. A flat list of images that have any given coefficient is quite effective for now. So long as you pay a bit of attention to cache locality and don't actually implement a list of integers using std::list<int> (lol).
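The scoring loop described above is essentially an inverted index: for each coefficient, a flat list of the images that have it, plus one score accumulator per image. The Python sketch below only illustrates that idea under stated assumptions. Coefficients are keyed as hypothetical (channel, index, sign) tuples, index 0 is assumed to be the average (DC) term and skipped, and the weight function is supplied by the caller rather than taken from imgdb's actual weight table.

```python
import heapq
from collections import defaultdict
from typing import Callable

# Hypothetical coefficient key: (channel, coefficient index, sign).
Coeff = tuple[int, int, int]


def top_coeffs(channels: list[list[float]], n: int = 40) -> list[Coeff]:
    """Keep the n largest-magnitude coefficients of each channel.

    Index 0 is assumed to be the average (DC) term and is skipped.
    """
    keys: list[Coeff] = []
    for ch, values in enumerate(channels):
        ranked = sorted(range(1, len(values)), key=lambda i: abs(values[i]), reverse=True)[:n]
        keys.extend((ch, i, 1 if values[i] > 0 else -1) for i in ranked)
    return keys


class CoeffIndex:
    """Inverted index: coefficient key -> flat list of image ids that have it."""

    def __init__(self) -> None:
        self.postings: defaultdict[Coeff, list[int]] = defaultdict(list)

    def add_image(self, image_id: int, keys: list[Coeff]) -> None:
        for key in keys:
            self.postings[key].append(image_id)

    def query(self, keys: list[Coeff], weight_of: Callable[[Coeff], float],
              top: int = 16) -> list[tuple[int, float]]:
        """Add weight_of(key) to every image sharing each key; return the top scores."""
        scores: defaultdict[int, float] = defaultdict(float)
        for key in keys:
            w = weight_of(key)
            # .get() avoids creating empty posting lists for unseen keys.
            for image_id in self.postings.get(key, ()):
                scores[image_id] += w
        return heapq.nlargest(top, scores.items(), key=lambda item: item[1])
```

Keeping each posting list flat and contiguous is the point of the std::list<int> remark: in C++ the equivalent would be a std::vector<int>, so the inner loop walks memory sequentially instead of chasing list nodes.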

Anyway, if you're interested in the code, contact me on IRC. Even though it's GPL'd, I'd like to contact the original authors first before making it publicly available.

Oh, and I forgot. The XML interface now supports POST too, so you can query it using files you upload as well, not just URLs. You'll have to construct the POST request yourself though, in whatever tool you use to do XML queries.
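For the POST variant, the multipart request body often has to be assembled by hand. A minimal sketch using only Python's standard library, assuming the same db-search.xml endpoint as the GET example. The form field name "file" is a guess (the thread doesn't name it), so check the HTML search form for the name the server actually expects.

```python
import urllib.request
import uuid

SEARCH_ENDPOINT = "http://haruhidoujins.yi.org/db-search.xml"


def build_upload_request(image_bytes: bytes, filename: str,
                         field: str = "file") -> urllib.request.Request:
    """Build, but do not send, a multipart/form-data POST carrying one image.

    The default field name "file" is an assumption, not taken from the thread.
    """
    boundary = uuid.uuid4().hex  # random, so it won't occur inside the image data in practice
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        SEARCH_ENDPOINT,
        data=head + image_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

To actually run the search, pass the request to urllib.request.urlopen() and read the XML from the response.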

Can you put the image extension in the XML results, so thumbnail links can be generated without having to query separately? With that, I think everything in the HTML output could be regenerated from the XML.

I'd suggest accepting the same full service names in the service parameter that the XML output uses, so clients don't need their own mapping list.

The extension of the original image? I don't know that; I only know the thumbnail (which is always .jpg). But if you want a link to my copy of the thumbnail, I can add that, of course. Linking to the original might fail, since at least moe.imouto.org does referrer checks, IIRC.
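Because thumbnails are always .jpg, a Danbooru thumbnail link could be built without knowing the original's extension, provided the XML results carry each post's MD5. That is an assumption: the thread doesn't say what the XML contains. The path pattern below is copied from the preview URL in the db-search.xml example, and preview_url is a hypothetical helper.

```python
import re

# Path pattern copied from the Danbooru preview URL in the db-search.xml example.
DANBOORU_PREVIEW = "http://danbooru.donmai.us/data/preview/{md5}.jpg"


def preview_url(md5: str) -> str:
    """Hypothetical helper: build a Danbooru thumbnail link from a post's MD5.

    Thumbnails are always .jpg, so no extension lookup is needed.
    """
    if not re.fullmatch(r"[0-9a-f]{32}", md5):  # lowercase hex, as in the example URL
        raise ValueError(f"not a lowercase MD5 hex digest: {md5!r}")
    return DANBOORU_PREVIEW.format(md5=md5)
```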

As for the second, done (e.g. &service[]=konachan.com)
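With full service names accepted, several databases can be selected in one query by repeating the service[] key, which PHP collects into an array. A Python sketch, assuming the db-search.xml endpoint: konachan.com is the name from the example above, and any other names are assumptions to check against the service names the XML output reports. urlencode escapes the brackets as %5B%5D, which PHP decodes before parsing the key.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "http://haruhidoujins.yi.org/db-search.xml"


def build_multi_service_url(image_url: str, services: list[str]) -> str:
    """Query several databases at once by repeating the service[] key."""
    params = [("url", image_url)] + [("service[]", name) for name in services]
    # A list of pairs keeps duplicate keys; urlencode turns "[]" into "%5B%5D".
    return SEARCH_ENDPOINT + "?" + urlencode(params)
```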
