Danbooru

Pixiv uploaders and unhelpful source URLs

Posted under General

jxh2154 said:
I also found in the course of this that someone made a yazzz_(pixiv) tag (I really don't like the pixiv qualifier, much less the pixiv# qualifier but that's an aside I guess)

It should be used only when the artist has no webpage (where you can always find his "real" PN/HN) and his nickname on pixiv is ambiguous and there is already an _(artist) one, but people keep adding the _(pixiv) part without even checking if the tag is already "taken".

There are about 200 pages of badly sourced posts, so I've made a Python script to fix them automatically.

What it does:

  • Loads a chunk of improperly sourced danbooru posts
  • Stores the bad sources in a dictionary (indexed by the post id)
  • Logs into Pixiv
  • For each entry in the bad source dictionary:

-> Fetches the Pixiv view page
-> Parses the page to find the image url
-> Checks that it matches the pattern http://img??.pixiv.net/img/*/*
---> If it matches, it stores the good source in a dictionary indexed by the danbooru post id
---> If it doesn't match, it's an error page and we don't want to change anything

  • For each entry in the good source dictionary:

-> Updates the danbooru post with the good source
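
For anyone who wants to see the checking step in isolation, here is a minimal sketch of it in modern Python 3 (the original script targets Python 2.5, so this is not its actual code). The names is_direct_pixiv_image, collect_good_sources and find_image_url are hypothetical; find_image_url stands in for the "fetch and parse the Pixiv view page" steps and is assumed to return None on an error page.

```python
from fnmatch import fnmatchcase

# Pattern quoted in the description above: "?" matches one character,
# "*" matches any run of characters.
PIXIV_IMAGE_PATTERN = "http://img??.pixiv.net/img/*/*"


def is_direct_pixiv_image(url):
    """Return True if url looks like a direct Pixiv image link."""
    return url is not None and fnmatchcase(url, PIXIV_IMAGE_PATTERN)


def collect_good_sources(bad_sources, find_image_url):
    """Build the good source dictionary from the bad source dictionary.

    bad_sources maps danbooru post id -> Pixiv view page URL.
    find_image_url is a hypothetical callable that loads the view page
    and returns the image URL it finds, or None if there is none.
    """
    good_sources = {}
    errors = []
    for post_id, page_url in bad_sources.items():
        image_url = find_image_url(page_url)
        if is_direct_pixiv_image(image_url):
            good_sources[post_id] = image_url
        else:
            # Error page: leave the post untouched and just report it.
            errors.append(post_id)
    return good_sources, errors
```

Only posts that end up in good_sources would be updated; everything in errors is left exactly as it was.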

It has the following options:

--batch : batch mode, does not ask for confirmation
--count=COUNT : number of posts to retrieve and fix (default 100)
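
A rough sketch of how these two options could be declared, assuming Python 3 and the standard argparse module (a Python 2.5 script would more likely have used optparse, and the real script may parse its options differently). build_parser is a hypothetical name.

```python
import argparse


def build_parser():
    """Declare the two options listed above."""
    parser = argparse.ArgumentParser(
        description="Fix badly sourced Pixiv posts on danbooru")
    parser.add_argument(
        "--batch", action="store_true",
        help="batch mode, does not ask for confirmation")
    parser.add_argument(
        "--count", type=int, default=100, metavar="COUNT",
        help="number of posts to retrieve and fix (default 100)")
    return parser
```

Calling build_parser().parse_args(["--batch", "--count=50"]) gives an object with batch set to True and count set to 50; with no arguments, batch is False and count is 100.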

I wrote it for Python 2.5. It depends on Beautiful Soup (http://crummy.com/software/BeautifulSoup) for HTML parsing.

It's a bit slow because it has to load each Pixiv page and each danbooru update page individually, but you can leave it in batch mode safely.

http://python.pastebin.com/f22d23e09

It's total spaghetti programming, but I think it works just fine. If there are any bugs, tell me.

  • The danbooru/Pixiv username and password are hardcoded, so you have to edit the source to make it work.

Same as Mysterio said. When I was first told to include pixiv source, I did it the wrong way, and it was only after someone told me to use the link for the true image (I've never accidentally used the preview image) that I did it right.

aldeayeah said: There are about 200 pages of badly sourced posts, so I've made a python script to fix them automatically.

Wow... I knew it would be bad but hadn't gotten around to doing the relevant source: search to find out how bad. As for your script, it's not something I'd be able to evaluate or run but maybe albert can comment.

Mysterio006 said: I think it may be helpful to make a brief mention of this on the upload page, because this is certainly one of the biggest, if not #1, method of mis-sourcing images.

I think an even better strategy is to:
1) Outright reject the upload, telling them how to get the correct URL instead, unless...
2) albert can somehow (using a form of aldeayeah's script maybe?) set danbooru to recognize a bad pixiv source url is being added, and automatically input the correct source.

This should only involve changing what shows up in the source field, rather than any re-uploading. The image is fine (note: not talking about people who upload the preview images here, just those who don't use the direct image url), the source is just off.
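
To illustrate what "recognizing a bad pixiv source url" could look like, here is a small sketch in Python 3. It is not danbooru's code and pixiv_illust_id is a hypothetical helper; it only assumes that the bad sources are view-page links of the form member_illust.php?...&illust_id=N, as in the examples in this thread.

```python
from urllib.parse import urlparse, parse_qs


def pixiv_illust_id(source):
    """Return the illust_id if source is a Pixiv view-page URL, else None."""
    parts = urlparse(source)
    if parts.hostname not in ("www.pixiv.net", "pixiv.net"):
        return None
    if parts.path != "/member_illust.php":
        return None
    ids = parse_qs(parts.query).get("illust_id")
    return ids[0] if ids else None
```

A source for which this returns an id is a view page rather than a direct image link, so the upload form could either reject it with an explanation or try to look up the direct image URL.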

All this debate about source links makes me think there should be two source links on an image like there are on the artist wikis, a useful one with a link to the place on the site where the image is hosted (a blog entry or pixiv image page) and another one for the minority who like 404 errors and use Find Artist rather than asking "Does anyone know the artist?" on the comments.
Though if you ask me, I think the latter should be invisible to users since they have no use for mass add nor need a (possibly broken) direct link to the image they're already looking at.

memegui said: All this debate about source links makes me think there should be two source links on an image like there are on the artist wikis

This is irrelevant to Pixiv, since that redirect is automated.

You're also getting back into your bad habit of strawman arguments with that quip about people who "like 404 screens".

Sorry, jxh, I just think that it's ridiculous for a source link to be just a tool rather than something helpful to the users.

If I'm at post #349561, I should be able to reach http://www.pixiv.net/member_illust.php?mode=medium&illust_id=633229 through the source link. Why would I need to go to http://img05.pixiv.net/img/ugg/633229.png ?
That to me is an unhelpful source URL.

EDIT: Just noticed what you said, how it changes automatically for pixiv, but my point still stands. Another example then: post #146714 has a link to http://www.bupo.jp/ishikei/cgroom6a/k_mk01a.jpg which is (surprise surprise) a dead link; if it had linked to http://www.bupo.jp/ishikei/cgroom6b/top55.htm instead, this problem wouldn't have happened. And Ishikei's is one of the forgiving ones: you can just cut the rest of the link to obtain the URL of his site http://www.bupo.jp/ishikei/ but if the source link directed you to, for example, http://blog-imgs-24.fc2.com/p/o/n/ponpondao/tsukasa.jpg, dead or not, there'd be no way to find out that artist's site was at http://ponpondao.blog10.fc2.com/ without going all the way back to the wiki article of that artist, much less that the blog post for it is at http://ponpondao.blog10.fc2.com/blog-entry-66.html . I think that last link should be the one in the Source field. Is that really so crazy?


It may be the fact that the people who use the "find artist" feature are in the minority, but that's unfortunate. It's that minority that uploads the bulk of the well-tagged posts, and rescues the pixiv posts that are tagged with just "panties" or the like. Without proper image URL's and the "find artist" system, that task would be much much harder.

The system exists so that we don't have to rely on asking "who's the artist?" in the comments. I don't see why you would think that that's the better option.

You admit that you can get to an artist's site via the artist's page on the wiki, which is the sensible place to put the URL to the author's site. Why would you want to add redundant links to each image and wipe out the image URL's that would normally point you more directly to the image itself? Is one extra click that much a bother?

memegui said: without going all the way back to the wiki article of that artist

You make it sound so much harder than it really is. What we need to figure out is, "What reason is there to go to the blog entry?" The answer is important. To find the artist's site? Artist wiki. Notes in a specific blog entry? I'll leave the link in a comment. And if it's blog format, someone can also just go to the site and scroll down. It's not perfect, but it works.

When you only have one link it has to be the direct link to the image. You might not think all that background stuff is important but it is. It also makes source: searches more efficient. I do those often. Artist wiki + direct link is the best we've got, when there's only one source link.

***Note*** I am not, and never have been, *against* a better way to direct people to individual blog entries. I don't think it's as crucial as you do, but despite what I've said above, I agree with you. I've thought about this exact issue many times.

Expanding functionality is good, but the way we've been doing it is hardly bad. That's what I'm defending.

So moving on: yes, we could (theoretically) add a second source link where one could link to the blog entry, 2chan thread [if they didn't disappear so fast], image gallery, etc.

I think this would be a good thing, but I don't know that people would do it right. It's hard enough getting them to add one link correctly as it is. It would need to be implemented in a way that didn't confuse people and lead to them flipping the links or leaving them in the wrong boxes.

------
As an aside, I often do google searches with the inurl: prefix to find sites that aren't easy to get back to, like those stupid blog sites. A google for inurl:ponpondao brought up the blog immediately. I'm not proposing this as "the answer", just as a workaround for now when you run into this situation. It's unexpectedly reliable.

jxh2154 said:
Wow... I knew it would be bad but hadn't gotten around to doing the relevant source: search to find out how bad. As for your script, it's not something I'd be able to evaluate or run but maybe albert can comment.

Just search "source:*member_illust*" and you'll see the remaining ones.

The script works; yesterday I used it to re-source about 3000 posts. It can be run by anyone with a privileged account level or higher. To be safe, I wrote it so that it would only change the source after checking it's a good source.

If someone wonders about the "Pixiv errors" that the script reports, those correspond to posts that don't point to a valid Pixiv page (usually because they were deleted in Pixiv).

aldeayeah said: The script works; yesterday I used it to re-source about 3000 posts. It can be run by anyone with a privileged account level or higher.

Oh, so that's why I found a lot less than 200 pages when I checked a couple hours ago. Awesome, thanks very much for that.

jxh2154 said:
I'm not sure what you mean? Danbooru has a redirect scripted in so that if you upload the actual image (*.jpg, *.png etc) itself, the 'source' link will automatically direct you to the artist's page with the tags and all that.

That way, we get both ease of access to image (with the redirect) and ability to use Find Artist and reliably mass edit (with the precise link).
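
As a rough illustration of that redirect (a sketch only, not danbooru's actual implementation), the display link can be derived from the stored direct image URL, assuming the numeric part of the file name is the Pixiv illust_id as in the example URLs in this thread. display_link is a hypothetical name.

```python
import re

# Direct image links as discussed in this thread, e.g.
# http://img05.pixiv.net/img/<artist>/<illust_id>.png
DIRECT_IMAGE = re.compile(r"^http://img\d+\.pixiv\.net/img/[^/]+/(\d+)")

VIEW_PAGE = "http://www.pixiv.net/member_illust.php?mode=medium&illust_id="


def display_link(source):
    """Return the page a visitor should land on for a stored source.

    Direct Pixiv image URLs are turned into the matching view page;
    any other source is returned unchanged.
    """
    match = DIRECT_IMAGE.match(source)
    if match is None:
        return source
    return VIEW_PAGE + match.group(1)
```

The stored source stays the precise image URL (so Find Artist and source: searches keep working), while the clickable link goes to the view page.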

I didn't know it did that. I'll direct link to the image when I link to pixiv from now on.

Mode medium works for me - it takes me to the direct source of the image plus the artist gallery which is win.
Jxh's second option doesn't work for me - I just get 404 which fails.

*I'm all for a second source link.*

It's a little annoying how people are complaining about the "correct" way to source pixiv links [at least they add the source!] when I find NOT adding the pixiv source at all worse! And a lot of people tend to do it. Or they add artist source but no artist tag. If not everyone, I myself have done so and am willing to correct it.

We need to fix the following:
1) Adding Pixiv artist source
2) Adding Pixiv image source
3) Then fixing up the correct way to place the image source of a Pixiv link/and artist.

Xephina said: Mode medium works for me - it takes me to the direct source of the image plus the artist gallery which is win. Jxh's second option doesn't work for me - I just get 404 which fails.

You should probably read the rest of the thread before deciding this. Your concern here isn't relevant because a workaround is automated in danbooru. Please read the explanations of how this is handled and why it's done this way.

It's getting a bit frustrating how people keep repeating the same objections about direct pixiv image links when those concerns were all rendered moot, what, months ago?

It's a little annoying how people are complaining about the "correct" way to source pixiv links [at least they add the source!] when I find NOT adding the pixiv source at all worse!

Strawman. While no source is certainly a problem, that has no bearing on the issue of which link to put in the source. That's like saying we shouldn't complain about violent assault because murder is worse. We can, should, and do tackle both.

We need to fix the following:
1) Adding Pixiv artist source
2) Adding Pixiv image source
3) Then fixing up the correct way to place the image source of a Pixiv link/and artist.

We already have answers for these. The discussion is over so far as Pixiv goes.

The only remaining discussion is if we add a second source link, which would be useful for other sites where we don't have automatic workarounds coded straight into danbooru. Which should really go in a separate thread.
