Danbooru

Wild card searches broken?

Posted under General

Maybe this has been going on for a while and I never noticed, but I do these often enough that I'd think it's a recent development.

Per the third post in forum #11676, wildcard searches don't seem to be working correctly. They bring up only a small fraction of the actual results. *_sakura doesn't bring up a single kinomoto_sakura image. It brings up only 7 pages of results even though there are 39 pages of her alone.

Just wondering if this is something I've been oblivious to for a while, or if it's a recent change. And if the latter, was it accidental or a necessary decision for bandwidth purposes or some such?

Updated by 葉月

It looks like wildcard searches in the post index are capped at a hard limit of 20 matching tags, presumably for performance reasons (for those interested, see http://trac.donmai.us/browser/danbooru/trunk/app/models/tag/parse_methods.rb#L113). kinomoto_sakura wasn't included because it isn't one of the first 20 tags matching *_sakura.
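To make the effect of that cap concrete, here is a minimal Python sketch. It is not Danbooru's actual Ruby code: the function name `expand_wildcard`, the constant `WILDCARD_TAG_LIMIT`, the alphabetical ordering, and the tag list are all assumptions made up for illustration. It expands a wildcard against a list of tags and keeps only the first 20 matches, so a tag that sorts later never makes it into the post search.

```python
from fnmatch import fnmatchcase

# Assumed to mirror the 20-tag cap described in the post above.
WILDCARD_TAG_LIMIT = 20


def expand_wildcard(pattern, all_tags, limit=WILDCARD_TAG_LIMIT):
    """Return the first `limit` tags, in alphabetical order, that match `pattern`."""
    matches = [tag for tag in sorted(all_tags) if fnmatchcase(tag, pattern)]
    return matches[:limit]


# Hypothetical tag list: 25 made-up tags that sort before kinomoto_sakura.
tags = [f"a{i:02d}_sakura" for i in range(25)] + ["haruno_sakura", "kinomoto_sakura"]

expanded = expand_wildcard("*_sakura", tags)
print(len(expanded))                   # number of tags actually searched
print("kinomoto_sakura" in expanded)   # whether the tag survived the cut-off
```

With this made-up data the expansion stops after the 20th match, so kinomoto_sakura is silently dropped even though it matches the pattern.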

There doesn't seem to be any limit on wildcard searches in the tag index (http://danbooru.donmai.us/tag?name=*_sakura&type=&order=name&commit=Search seems to return all results). I don't know why there's a limit when searching posts and not when searching tags. If it's for performance reasons, maybe albert could be persuaded to increase the limit for privileged users, similar to how privileged users can search six tags at once instead of two.

albert said: Oh. I can increase the limit if you want. The issue is when some wiseass tries to search for *, which would basically time out the database.

Well, disabling a search for * would be a good idea (have it return a "you're an asshole" page or something =P), but yeah I do think the basic wildcard search needs to find many more possibilities than it does now. One of the best uses of a wildcard search is to ensure something isn't already uploaded or that a tag doesn't already exist (I do this often) or if you only remember part of a show/character/artist's name (I do this VERY often). If the results aren't complete/comprehensive it really cuts back the usefulness of it.
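As a rough idea of what rejecting a bare * could look like, here is a small Python sketch. The helper `is_too_broad` and the `min_literal_chars` threshold are hypothetical, not anything that exists in Danbooru. It simply counts how many non-wildcard characters the pattern contains.

```python
def is_too_broad(pattern, min_literal_chars=1):
    """Return True if the pattern has too few literal (non-*) characters.

    With the default threshold of 1, only patterns made up entirely of
    asterisks (such as "*" or "**") are rejected.
    """
    literal = pattern.replace("*", "")
    return len(literal) < min_literal_chars


for candidate in ["*", "**", "z*", "*_sakura"]:
    verdict = "rejected" if is_too_broad(candidate) else "allowed"
    print(candidate, verdict)
```

Raising the threshold would also block very short patterns like z*, at the cost of rejecting some legitimate searches, so where to draw the line is a policy choice.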

But I understand if realities of bandwidth and processing power and such force some sort of limit on that. If so, is there any way to, I guess, prioritize artist, character and copyright tag types so they get pulled in first?

Maybe, but the more common it is the more likely I am to know the tag already. These wild card searches are great for checking on lesser known series (especially ones without a popular anime or manga) as well. I'm not really sure what a good trade-off would be...

Could you just up the limit to 100 or so, or is that too high? I think that would be sufficient for any search like "*_firstname" or "lastname_*". I can't imagine any legitimate searches that would need a higher limit.

Also, there should be a warning message displayed when you hit the tag limit.
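One simple way to detect that the limit was hit is to collect one match more than the limit and check whether that extra match exists. The Python sketch below shows the idea. The function name `expand_with_warning`, the message wording, and the tag data are assumptions for illustration only.

```python
from fnmatch import fnmatchcase


def expand_with_warning(pattern, all_tags, limit=20):
    """Expand a wildcard and report whether the result was truncated.

    Returns (tags, warning), where warning is None if nothing was cut off.
    """
    matches = []
    for tag in sorted(all_tags):
        if fnmatchcase(tag, pattern):
            matches.append(tag)
            if len(matches) > limit:
                break  # one extra match is enough to know we overflowed

    warning = None
    if len(matches) > limit:
        warning = (
            f"Wildcard '{pattern}' matched more than {limit} tags; "
            f"only the first {limit} were searched."
        )
    return matches[:limit], warning


# Hypothetical tags: 30 matches for "*_sakura", which exceeds the limit.
many_tags = [f"name{i:02d}_sakura" for i in range(30)]
found, message = expand_with_warning("*_sakura", many_tags)
print(len(found))
print(message)
```

Stopping at the first extra match keeps the check cheap, since the full set of matches never has to be counted.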

jxh2154: If you just want to know which tags exist you can search the tag index instead of the post index. It doesn't have a limit on wildcards.

evazion said: jxh2154: If you just want to know which tags exist you can search the tag index instead of the post index. It doesn't have a limit on wildcards.

True, useful in some cases but it's nice to see the image results right away too. I could load up each tag and all it would take is time... I guess the bigger issue is not knowing what I'm missing, so a warning when the tag limit was reached would definitely be good.

evazion said:
If you just want to know which tags exist you can search the tag index instead of the post index. It doesn't have a limit on wildcards.

The "tag index" also divides the results into pages. Hmm... Could something like that be used here?

Again, wildcard searches are mostly useful for less popular things, so that you can make sure you're getting the right idea, or the right character name, etc. Returning them by popularity goes against that very goal.

Then I have no idea what other option there would be other than to have them displayed by highest popularity. Displaying them by least used would make no sense; it'd just return 18 tags with 1 to 3 posts on them. Right now, it returns by alphabetical order, which works for semi-specific searches but is useless in generic ones (http://danbooru.donmai.us/post?tags=*yuri&commit=Search)

If you are looking to identify a character by a picture you can ignore that list anyway and just look through the posts. If you're looking to find a character by name go to http://danbooru.donmai.us/tag?name=*yuri&type=4&order=count&commit=Search or even http://danbooru.donmai.us/tag?name=*yuri&type=&order=count&commit=Search just to be sure.

Ideally the limit would be made high enough that you would never hit it on a reasonable search, but low enough that overly general searches won't time out the database. If that can be done then it won't matter in what order the results are returned, since no one should actually be hitting the limit.

I went ahead and created a trac ticket for this. See http://trac.donmai.us/ticket/437.

So do we know how it chooses what results to display? It will always pick the same 20 tags, but these tags seem to be random. A search for z* returns 20 results from "z-" to "zu" alphabetically, despite the fact that there are nearly 300 different tags in between them, many having fewer or more posts than the ones displayed.

If a strict limit is needed, it should probably go in order of popularity, i.e., the top 25 instead of the first 25.
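Here is a short Python sketch of the "top N instead of first N" idea, assuming each tag has a known post count. The function name `expand_by_popularity`, the tags, and the counts are all made up for illustration and do not come from the real database.

```python
from fnmatch import fnmatchcase


def expand_by_popularity(pattern, tag_counts, limit=20):
    """Return up to `limit` matching tags, most-posted first.

    `tag_counts` maps tag name -> post count. Ties are broken alphabetically.
    """
    matching = [
        (tag, count)
        for tag, count in tag_counts.items()
        if fnmatchcase(tag, pattern)
    ]
    matching.sort(key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in matching[:limit]]


# Hypothetical post counts.
counts = {"z-ton": 3, "zabuza": 40, "zange": 12, "zero": 500, "zoids": 90, "yuri": 1000}

print(expand_by_popularity("z*", counts, limit=3))
print(sorted(t for t in counts if fnmatchcase(t, "z*"))[:3])  # alphabetical cut-off, for comparison
```

With these counts the popularity ordering keeps the three most-used z* tags, while the alphabetical cut-off keeps whichever three happen to sort first.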

It'd be preferable to drop the limit altogether of course, but bad things happen when the DB times out, so perhaps it's for the best.
