Danbooru

Pixiv/Seiga/Tinami/Nijie Top Images Script

These are a few scripts I've made for doing danbooru-filtered searches of several different sites.
They also let you order by various scores that the normal sites don't allow.
Each post also has links to IQDB, as well as an exists/meh link to manually filter it.
The "meh" filter is simply for images that aren't danbooru-quality, tutorials, etc.
There is also a simple tag search which uses the tags of said site.

Pixiv:
This can be found @ http://pvdb.codeanimu.net/. (You will need a cookie to view thumbnails, which can be found here.)
It can be ordered by id/ratecount/totalrate/views/bookmarks/commentcount. There is also an averagerate, which is (totalrate / ratecount); this is limited to images with a ratecount of 300+ though.
This is 1/2 complete (1.1M/2.2M images gone through).

Seiga:
This can be found @ http://nndb.codeanimu.net/.
It can be ordered by views/comments/clips.
This is filtered entirely via IQDB, so it still contains many unfiltered images. So far, 46K images have been filtered this way.
This is 1/3 complete (8.17M atm), with the majority of the images filtered. 1.1M/4.3M artists downloaded.

Tinami: (Added 22/08)
This can be found @ http://tndb.codeanimu.net/.
It can be ordered by views/comments/clips.
This is also filtered entirely via IQDB. Only 10K images have been filtered though.
This is complete as of 2012-07-15.

This script is still rather experimental, so expect possible bugs :<
TLDR: It's a thumbnail page that lets you view the top images on pixiv/seiga/tinami (that have been filtered with danbooru/iqdb).

Known issues:

* Searching tags that have a lot of images, e.g. "東方" (Touhou), might take some time..

Updated by Yoposoc

Why would you need two links to IQDB? Shouldn't a full-size link to IQDB be reserved for manga images?

I can almost understand the argument for tall comics but since those are going to use the danbooru thumbnail it's still going to return crap.

edit: I see that it filters manga images entirely so there's really no reason to search anything but the thumb aside from burning piespy's bandwidth.

Log said:
Why would you need two links to IQDB? Shouldn't a full-size link to IQDB be reserved for manga images?

I can almost understand the argument for tall comics but since those are going to use the danbooru thumbnail it's still going to return crap.

One is a thumbnail link, another is full/normal sized link.
I mainly have both since I'm assuming the full-size link would be more accurate. I added a thumbnail link since it works most of the time and also puts less strain on IQDB.

It shouldn't be any more accurate, both sites use 150x150 thumbnails and IQDB only catalogs the thumbnails so the results should be more or less the same aside from transparent background images which pixiv generates as a white background and danbooru as a black so they should never match up.

Log said:
It shouldn't be any more accurate, both sites use 150x150 thumbnails and IQDB only catalogs the thumbnails so the results should be more or less the same aside from transparent background images which pixiv generates as a white background and danbooru as a black so they should never match up.

Oh, thought IQDB had versions of both.
Will remove the highres links for JPG images then, but will keep them for PNG due to the black/white backgrounds.

Given the stated limitations with this approach, it's worth noting that the Endless Pixiv Pages script has the option of hiding thumbs below a given number of favorites on tag searches, runs Danbooru source searches against all images (manga and non-manga) for a good indication of if they're already on here, and adds IQDB links (using small thumbs for non-manga) for good measure. The first two are disabled by default.

RaisingK said:
Given the stated limitations with this approach.

Just wondering what you mean by "limitations", unless you mean by it not having the entire DB?

runs Danbooru source searches against all images (manga and non-manga)

Small suggestion for this, since I have more or less a copy of the DB post database, might be worth having it use that for a "first check", if it doesn't exist then it would check Danbooru? (Since the DB I have isn't always up to date)
Would lessen a bit of the strain on danbooru's servers :3
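
A minimal sketch of that "first check" idea, under stated assumptions: `local_sources` stands in for the partial local copy of the post database, and `remote_check` is a hypothetical callable that would query Danbooru itself. Neither name comes from the actual scripts.

```python
from typing import Callable, Set


def exists_on_danbooru(
    source_url: str,
    local_sources: Set[str],
    remote_check: Callable[[str], bool],
) -> bool:
    """Check the local copy first and only fall back to Danbooru on a miss."""
    # A local hit is trusted: anything in the local copy was on Danbooru
    # when the copy was made.
    if source_url in local_sources:
        return True
    # A local miss is not trusted, because the copy can be out of date,
    # so the remote site is asked in that case only.
    return remote_check(source_url)
```

Only misses reach the remote check, so the saving depends on how complete the local copy is.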

Script Update:
Implemented somewhat of a voting system to say if the image exists/meh.
Edit: Basically once the image gets enough votes, it will be marked as such, and hidden from the list.
Sadly I will have to do this manually for now :<
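
Roughly, the voting idea could look like the sketch below. The threshold of 3 votes is an assumption (the post does not say how many votes are "enough"), and `VoteTally` is a made-up name, not the script's own code.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional

VOTE_THRESHOLD = 3  # assumed value; the thread does not state the real one


class VoteTally:
    """Count exists/meh votes per image and hide images that reach the threshold."""

    def __init__(self, threshold: int = VOTE_THRESHOLD) -> None:
        self.threshold = threshold
        self.votes: Dict[int, Counter] = {}

    def vote(self, image_id: int, kind: str) -> Optional[str]:
        """Record one vote and return the image's mark, if it has one now."""
        if kind not in ("exists", "meh"):
            raise ValueError(f"unknown vote kind: {kind!r}")
        self.votes.setdefault(image_id, Counter())[kind] += 1
        return self.status(image_id)

    def status(self, image_id: int) -> Optional[str]:
        """Return 'exists' or 'meh' once enough votes are in, else None."""
        counts = self.votes.get(image_id, Counter())
        for kind in ("exists", "meh"):
            if counts[kind] >= self.threshold:
                return kind
        return None

    def visible(self, image_ids: Iterable[int]) -> List[int]:
        """Keep only the images that have not been marked yet."""
        return [i for i in image_ids if self.status(i) is None]
```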

DakuTree said:
Just wondering what you mean by "limitations", unless you mean by it not having the entire DB?

To pull out of your first post:

  • At the moment it is REQUIRED that you go to this before loading the above link.
  • It filters using my own DB
  • At the moment it filters out any manga images
  • At the moment there is no tag searching, will "try" and add this in the future..
  • Also restricted it to the top 1000 images mainly due to any higher being somewhat a strain on my DB..
  • This script is rather experimental, so expect possible bugs :<
  • The pixiv DB it's using is also still not complete mainly due to the insane size of it.

DakuTree said:
Small suggestion for this, since I have more or less a copy of the DB post database, might be worth having it use that for a "first check", if it doesn't exist then it would check Danbooru?

Meaning no offense, it's incomplete, possibly-buggy, maintained by a single independent user, requires me to figure out said third-party list, complicates EPP, and requires many more queries on the users' end. EPP already cuts down on the queries needed by searching broadly (/img/name/*) first and also caches non-manga results.

(Since it's a relatively new feature and now disabled by default, I'm also guessing that most users don't edit the source code to turn it back on and trigger the queries anyway.)

Good luck with your effort, but I'm not going to make use of it, and I don't want to derail your thread any more talking about EPP...

I probably won't use this yet, lack of an ability to search / filter by tags kills it for me, but I applaud your effort.

There used to be a greasemonkey script that allowed you to sort a Pixiv query by bookmark score, though it ceased functioning long ago when Pixiv changed its format. It is a feature I would very much make use of if it fit with everything else I do there. The other score metrics are also a great idea, especially since they typically aren't available via the search query pages.

Downloading all of Pixiv to make this work feels a little crazy, and I hope they don't block you for it, though it does provide you with a very rich database to work with. Maybe down the road once this is more fleshed out, you could provide an API to allow script-writers to make use of your efforts.

Anyway keep up the good work, it seems like you have some good ideas.

DakuTree said:
The pixiv DB it's using is also still not complete mainly due to the insane size of it.

this work seems promising. but i'll just ask if you somehow discovered how to pull even private images from pixiv into your own db? probably, the direct URLs. please let us know, thanks.

Shinjidude said:
I probably won't use this yet, lack of an ability to search / filter by tags kills it for me, but I applaud your effort.

I'll see about getting this added when I'm a bit more awake. The problem is that, with the way I've saved the tags, a single query would take quite a while unless I change the DB around a bit. Basically I could implement it at the moment, but it would kill or slow down my server.

Maybe down the road once this is more fleshed out, you could provide an API to allow script-writers to make use of your efforts.

Pixiv actually already has an API although it is really terrible to work with. An absolute pain to even use, and it's restricted by page limit (Which is why I'm downloading by artist).
I have already created a somewhat usable API on GitHub.

ghostrigger said:
this work seems promising. but i'll just ask if you somehow discovered how to pull even private images from pixiv into your own db? probably, the direct URLs. please let us know, thanks.

I'm pulling via http://iphone.pxv.jp/iphone/member_illust.php?id=$artistid&p=$page .

Not too sure if it pulls private images, if you have an image ID of a private image with an artist ID under 55K~ I could check :>

RaisingK said:

post #304524 and post #305919, for different reasons (privacy level, MyPicks).

Neither seem to exist, which makes me wonder if the API actually provides any 18+ content.

Edit: *facepalm* Apparently you can see 18+ content if you're using the &PHPSESSID=$sessid . This means I will need to go through the past 50K~ artists again to double check if anything was missed :<
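
For anyone curious, the per-artist paging could be sketched as below. The URL and the `PHPSESSID` parameter are the ones quoted in this thread; everything else is an assumption: `fetch_rows` is a hypothetical callable that downloads and parses one page, pages are assumed to start at 1, and an empty page is assumed to mean the artist has no more images.

```python
from typing import Callable, Iterator, List, Optional

# Endpoint as quoted in the thread.
BASE_URL = "http://iphone.pxv.jp/iphone/member_illust.php"


def artist_page_url(artist_id: int, page: int, session_id: Optional[str] = None) -> str:
    """Build the listing URL for one page of an artist's images."""
    url = f"{BASE_URL}?id={artist_id}&p={page}"
    if session_id is not None:
        # Per the edit above, 18+ images only show up when a session ID is sent.
        url += f"&PHPSESSID={session_id}"
    return url


def iter_artist_rows(
    artist_id: int,
    fetch_rows: Callable[[str], List[str]],
    session_id: Optional[str] = None,
    max_pages: int = 1000,
) -> Iterator[str]:
    """Yield rows page by page until an empty page is returned."""
    # Assumption: pages are numbered from 1 and an empty page marks the end.
    for page in range(1, max_pages + 1):
        rows = fetch_rows(artist_page_url(artist_id, page, session_id))
        if not rows:
            break
        yield from rows
```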

Edit2: I knew it was a smart idea to add a rating system. Someone has already been voting "exists" on a fair number of images that don't exist on danbooru. Don't do this, or I WILL ipban you from using the script.
The only time it should be marked "exists" is when it is actually on danbooru, or a close enough version is on danbooru (Which is why the IQDB link is there).
The same goes for the "meh" vote, which should only be used for art that doesn't really belong on danbooru, or isn't that great quality.

Tag searching has been implemented :3
At the moment it only works with single tags, shall try and make it work with multiple too..

Like pixiv, it's case-sensitive (so searching some English tags, e.g. "Fate" and "fate", would bring different results).

The only issue with this is that there were a few tags that conflicted with my download script (mainly anything containing ' or "), so if you are searching a tag containing those you may have to remove them.
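
In other words, a single-tag lookup behaves something like this sketch. It assumes the stored tags had their ' and " characters dropped at download time, which is a guess based on the note above; the function names and the in-memory `index` are made up for illustration.

```python
from typing import Dict, List, Set


def strip_quotes(tag: str) -> str:
    """Drop the two characters that conflicted with the download script."""
    return tag.replace("'", "").replace('"', "")


def search_single_tag(index: Dict[int, Set[str]], query: str) -> List[int]:
    """Return the IDs of images carrying exactly this tag (case-sensitive)."""
    wanted = strip_quotes(query)
    # Plain string equality, so "Fate" and "fate" are different tags.
    return sorted(image_id for image_id, tags in index.items() if wanted in tags)
```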

Edit: Forgot to note that only 340K~ of the 1.1M are tagged with support for searching. It appears that it takes a while to convert them :<

Nice! That one feature makes your system very useful as a way to quickly find good quality images in what you are looking for.

I almost never use more than one tag on Pixiv anyway, though later incorporating full tag search would be a nice feature.

Anyway, the system is still rather rough around the edges, but it looks like you've got the start of something quite useful and makes up for a huge annoyance in Pixiv's interface.

At the moment it is REQUIRED that you go to this before loading the above link.

how about keep the thumbnails on your server? (or if your pixiv DB already have them, even better)

the exists/meh/IQDB buttons are too close to each other, might misclick

and good luck not getting ipblocked by pixiv(iirc in the TOS there's a "don't use bot script")

Shinjidude said:
I almost never use more than one tag on Pixiv anyway, though later incorporating full tag search would be a nice feature.

I'm somewhat the same. If pixiv had better tagging, it would be slightly more useful. It can still be handy for looking up specific things though (e.g. Touhou + Yuri).

Anonymity said:
how about keep the thumbnails on your server? (or if your pixiv DB already have them, even better)

the exists/meh/IQDB buttons are too close to each other, might misclick

and good luck not getting ipblocked by pixiv(iirc in the TOS there's a "don't use bot script")

I'd rather not keep the thumbnails, since I don't want to hammer pixiv's servers more than required. Even if each thumb is only 20KB, it's still 22M thumbnails (which is around 419GB?)
It would lessen the strain on their servers overall, but it would increase the chance of getting IPblocked. My hosting service doesn't really like R18 content on their servers either, and even if I hosted it from my home server, it would be SLOW.
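
The storage estimate above can be checked with a few lines. This is only a back-of-the-envelope sketch: it assumes every thumbnail is exactly 20 KB, and the function name is made up for illustration.

```python
from typing import Tuple


def thumbnail_storage(count: int, kb_each: float) -> Tuple[float, float]:
    """Return (binary GiB, decimal GB) needed to store `count` thumbnails."""
    total_kb = count * kb_each
    gib = total_kb / (1024 * 1024)  # treating 1 GB as 1024 * 1024 KB
    gb = total_kb / (1000 * 1000)   # treating 1 GB as 1000 * 1000 KB
    return gib, gb


gib, gb = thumbnail_storage(22_000_000, 20)
print(f"{gib:.1f} GiB / {gb:.0f} GB")  # 419.6 GiB / 440 GB
```

So the 419GB figure matches the 1024-based reading; in decimal units it is 440 GB.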

There is always the chance of getting banned, but I'm not exactly running my download script 24/7. Every few days I run the script through 10K~ artists, which is about 10K-20K API calls depending on how many pages of images each artist has. It's an insane amount, but since there isn't any downloading of images, it shouldn't be too bad.
Fairly sure this is the related thing in the TOS:

TOS Article 13.14:
Putting burden on the server over the normal extend or damaging the service operation or network system.

This little feature should make things a bit better: I managed to talk to the creator of IQDB to see if I would be able to send all thumbs through IQDB automatically. Got an OK for this (limiting the script to one API call every 3-5 seconds though).
This should start filtering out everything that exists on danbooru (Even if the source is different). There is always the risk of it filtering stuff that doesn't exist, so I'll make this optional :3
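
A throttled loop along those lines might look like the sketch below. `query_iqdb` is a hypothetical callable standing in for the real IQDB lookup, and picking a random pause between 3 and 5 seconds is just one reading of the limit mentioned above.

```python
import random
import time
from typing import Callable, Dict, Iterable


def check_all(
    thumb_urls: Iterable[str],
    query_iqdb: Callable[[str], bool],
    sleep: Callable[[float], None] = time.sleep,
    min_delay: float = 3.0,
    max_delay: float = 5.0,
) -> Dict[str, bool]:
    """Run every thumbnail through the lookup, pausing between calls."""
    results: Dict[str, bool] = {}
    for n, url in enumerate(thumb_urls):
        if n > 0:
            # Wait 3-5 seconds before every call except the first one.
            sleep(random.uniform(min_delay, max_delay))
        results[url] = query_iqdb(url)
    return results
```

Passing `sleep` in as a parameter keeps the loop testable without actually waiting.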

Edit: Entire tag database for images I have are now converted.

Another small update for this.
Implemented a simple "average rating", which is basically (totalrate / ratecount). This should make it slightly easier to find good images.
At the moment it only shows average ratings with a ratecount above 500, since a lot of images below that will instantly have an average of 10 (considering there are 160K~ 10rate images with only 1 rating).
Also implemented a way to see the database non-filtered, if you're curious to see the top images on pixiv :3
Don't think I mentioned above, but should also be able to filter by rating now too (Safe/R18/Guro etc.)
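
The average rating described above boils down to something like this sketch. The cutoff of 500 is the one mentioned in this post; treating "above 500" as strictly greater is an assumption, as is the function name.

```python
from typing import Optional

MIN_RATECOUNT = 500  # cutoff mentioned in the post


def average_rate(
    totalrate: int, ratecount: int, min_ratecount: int = MIN_RATECOUNT
) -> Optional[float]:
    """Return totalrate / ratecount, or None when too few ratings exist."""
    # Assumption: "above 500" is read as strictly greater than the cutoff.
    if ratecount <= min_ratecount:
        return None
    return totalrate / ratecount
```

With the cutoff in place, an image with a single 10 rating gets no average at all instead of a perfect score.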

Might also be worth mentioning that at least 625K~ images have been filtered, which is over half of the Danbooru DB. That is a fair amount considering that only 380K/4.3M artists have been downloaded.

Since there isn't much need in creating another thread for this...

Went ahead and created a version of this for Nico Nico Seiga.
It's rather basic compared to the pixiv script, but it works just as well.
There isn't any rating filter at the moment, due to the API being confusing as hell to figure out (There isn't any documentation on it either.)
It can be ordered by id/submitted/views/comments/clips.
All danbooru filtering of images is done via IQDB as well as by matching the resolution of the image (so if it finds a match above 90% similarity, it then checks whether the resolution is the same).
Simple source filtering can't be done due to how highres URLs work.
At the moment it only has around 30K~ images due to the way I'm scraping the API. In total Nico Nico has at most 1.5M images, possibly much less. I've gone through 200K IDs, grabbed all their artist images and only gotten 30K..
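
The IQDB-plus-resolution check described above can be sketched roughly as follows. The `Match` record is a made-up stand-in for whatever IQDB actually returns, and the 90% cutoff is the one from this post.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Match:
    """One IQDB result (hypothetical shape, for illustration only)."""
    similarity: float  # percent, 0-100
    width: int
    height: int


def already_on_danbooru(
    size: Tuple[int, int],
    matches: Iterable[Match],
    min_similarity: float = 90.0,
) -> bool:
    """True if any match is above the cutoff and has the same resolution."""
    return any(
        m.similarity > min_similarity and (m.width, m.height) == size
        for m in matches
    )
```

Requiring the same resolution on top of the similarity score lowers the risk of filtering an image that only looks like an existing post.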

Might also be worth mentioning that multiple tag searches have been implemented in both scripts.
