Danbooru

On the topic of third-party dupes and users that upload incorrectly

Posted under General

I hate having to explain this repeatedly in private, so I'm putting it up in a thread because this requires concrete resolution and I won't shut up about it until that happens.

The policy on how we resolve duplicate uploads is outdated and offers next to no reasoning why it is as such. The goal of this thread discussion is to help elaborate on this policy so we can reduce the amount of confusion that revolves around this concept and policy among uploaders, gardeners, and approvers.

So let's explain this a bit. The current duplicate policy (per use of the tag which is also pretty faulty) states this:

  • Do not flag duplicates for deletion if your only reason is that the image is a duplicate.

However, you will find that there are users that, by way of plain ignorance or plainly by seeking an easy way to become promoted or 'look like a good uploader' in the gallery, upload what we would call 'duplicates'. The most concerning part is when it happens to be a third-party edit or a discrete image sample (such as a pixiv sample) seemingly made legitimate by re-uploading the image to a mirror site so that it looks difficult to dispute (obscure sources like imgur, discord, or fc2).

This is a multi-faceted problem exacerbated by the fact that we basically have zero current policy or code of operations by which images are gardened, vetted, and approved. Approvers are simply told to use their own instincts and approve what they see they like regardless of that image's actual validity or integrity. Uploaders are guided towards the way to do things correctly by way of the howto pages, but nothing locks them down from purposefully uploading incorrectly without any consequence given the chance they are allowed to get away with it, aside from more experienced and well-meaning users catching a glimpse of that behavior and deducing it firsthand.

There is a current pattern though, and this is what it looks like right now:

  • User unknowingly uploads an image known to be a sample from a source site (such as a pixiv sample). This is fixed and remedied through scripts and Moderator+ level gardeners in topic #14156. This is expected behavior.
  • User knowingly uploads a sample when an original already exists. Their post is moved to deletion. This is expected behavior.

However, if you throw in the fact that a user is allowed to upload any image from their computer and does not necessarily have to provide a site source, this gets invariably more complicated since we can't verify such above cases ourselves.

Let's take post #3003247 for example. I moved this post to immediate deletion under the assumption and guise that it was a sample (and subsequently lost my approval privileges over it, go figure). This isn't totally wrong, but it ignores the base reason why we would be interested in removing posts like these.

For one, if you check, the image does not actually come from where the uploader claims it does. It already fails that first check of validity by being an md5 mismatch from the source provided. Dig deeper and you actually find it's a lossy-lossy image, and the metadata present in the original image is missing, suggesting it is indeed a resaved 'edit'.
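To make that first check concrete, here is a minimal sketch of an md5 comparison between an upload and the file at its claimed source. The function names and the byte strings are hypothetical stand-ins; in practice you would read both files from disk, and this is not how the site itself does it as far as I know.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of raw file bytes."""
    return hashlib.md5(data).hexdigest()


def matches_source(uploaded: bytes, source_file: bytes) -> bool:
    """True only if the upload is byte-for-byte identical to the source file."""
    return md5_hex(uploaded) == md5_hex(source_file)


# Hypothetical stand-ins for real file contents read with open(path, "rb").
original = b"\xff\xd8 example image bytes with metadata"
resaved = b"\xff\xd8 example image bytes"  # e.g. metadata stripped by an editor

print(matches_source(original, original))  # identical bytes match
print(matches_source(resaved, original))   # any change at all breaks the match
```

A mismatch alone only tells you the file is not the one hosted at the source; it does not say what was changed or why.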

Every time you resave a file in a lossy format through your editor, you introduce more lossy artifacts regardless of whether you actually changed anything. You can observe this by way of an actual image diff, or a visual layer comparison in Photoshop, GIMP, Krita, whatever allows you to compare it personally. On that scale it is quite small and practically insignificant to the naked eye, but it is a change regardless.
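As a rough illustration of what an image diff reports, here is a small sketch that compares two already-decoded images pixel by pixel. It assumes grayscale values held in plain Python lists, and the tiny 2x3 "images" are made up for the example; a real comparison would first decode both files with an image library.

```python
from typing import List, Tuple

Pixels = List[List[int]]  # rows of 0-255 grayscale values


def pixel_diff(a: Pixels, b: Pixels) -> Tuple[int, int]:
    """Return (number of differing pixels, largest absolute difference)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    diffs = [abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    changed = sum(1 for d in diffs if d != 0)
    return changed, max(diffs, default=0)


# Hypothetical 2x3 images: the resave is off by a small amount in two pixels.
original = [[10, 200, 35], [90, 91, 255]]
resaved = [[10, 198, 35], [91, 91, 255]]

print(pixel_diff(original, resaved))
```

Small maximum differences spread over many pixels are the sort of result you would expect from a resave, though the numbers here are invented and prove nothing on their own.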

This third-party saved image was uploaded after the original was already upped from the source site, and it was even mistakenly marked as a revision by a gardener, despite the fact that the artist almost never revises their work on pixiv. Here's the crux then: How do we handle third-party edits like this?

Regarding that specific image itself it's hard to deduce, really, what the original uploader did or was thinking when they did it even when they were asked for comment. Such a user was already pointed to resources on how to upload correctly and yet they still happen to mess up somehow and some way. But it happened, and we have no way of handling such things under current policy, though a number of approvers with the knowledge to confirm an image's validity have taken things into their own hands, myself included.

Before, it would be: Did the user understand their commitment to a mistake? Then perhaps it's best to point them to topic #14156 and hope that someone amends their mistake. Since we have no support for uploading from sites like E-Hentai, Weibo or bcy, sometimes this mistake honestly happens (see topic #14119). So it'd be appropriate to request a replacement, even if it isn't from a site that gains automated sample replacement support through user-run scripts (primarily pixiv, again see topic #14119).

Yet when a user uploads an image they unknowingly or even knowingly made an edit to, does that warrant replacement or forgiveness? Especially in an 'innocent' case like this where the image was double-saved in their editor and then re-uploaded here. After all, here in booru there is no real concept of truly deleting an image aside from expunging it, which we almost never do aside from completely inappropriate content.

To put this into an example, I'll copy this from another message I recently sent.

We've done it in the past before and that's exactly what I complained about regarding replacing images from Twitter -> Pixiv or third-party Discord -> Pixiv. But to put it in simpler terms: Say nonamethanks uploads an image correctly from an artist's Twitter and I upload an image with 99.9% visual identity... the catch being that I uploaded it from "Mike's Anime Gallery" on pinterest. Let's say Mike here has sourced the image from Pixiv or Tumblr where the image is untainted by resampling artifacts. But pinterest resamples images. And then I claim I forgot where I found it because I 'thought' it was the same image from Pixiv and the link is already lost, or better yet I stay silent and feign total ignorance.

This is my opinion and my reasoning. Replacing third-party edits isn't helpful, nor is it even warranted because we cannot claim for sure whether we are replacing an image that we'd like to reference in the future. It attributes credit to the wrong user as they are not the original uploader of such an image when it is found, and it is a poor admittance of due credit. It has been done before in the past, where images uploaded from Facebook or Discord have been replaced with their pixiv/seiga/artist twitter variants, but I raised hell about this enough to make the policy change and was partly responsible for why approvers aren't allowed to replace images in-place anymore.

Just to wrap things up here: We say we don't delete duplicates, but we delete tons of 'duplicate' content that's rejected in help:third-party edit: lossy-lossless, lossless-lossy, or even the currently nonexistent lossy-lossy (which would go for a turbulent many images in the gallery if it were ever applied). Artists can apply any of these conditions themselves... but it's a rare and in-between case, and oftentimes it's a third party that applies it.

These categories are where the concern over deleting duplicates arises, since it's hard to tell and sometimes can only really be verified with tools you provide yourself.

If you need to read more on topical appropriateness for where a duplicate would be completely fine to upload, you can check out my writeup at https://hackmd.io/s/SJXiK1L9- or in topic #14426. Of course, sometimes I'm not totally right but I'm open to editing and fixing it with suggestions.


I'm a little skeptical that 'users resaving images in order to get "good" uploads' is actually a thing, or at least much of one at all. But if sniping is a thing, I guess maybe that can be too?

Mikaeri said:

Replacing third-party edits isn't helpful, nor is it even warranted because we cannot claim for sure whether we are replacing an image that we'd like to reference in the future. It attributes credit to the wrong user as they are not the original uploader of such an image when it is found, and it is a poor admittance of due credit. It has been done before in the past, where images uploaded from Facebook or Discord have been replaced with their pixiv/seiga/artist twitter variants, but I raised hell about this enough to make the policy change and was partly responsible for why approvers aren't allowed to replace images in-place anymore.

I'd like to check the consensus on this point in particular, since unsourced->sourced replacements periodically come up in the replacement thread, while Mikaeri's opening post for the replacement thread states:

You also can't claim an image is a sample that should be replaced if it has no source provided and doesn't match an md5 anywhere on a given site (whether that be Twitter, Tumblr, Pixiv/seiga), but you can flag it for deletion once a superior post has been uploaded, following how we treat unfairly modified posts (waifu2x, unsourced third-party conversions, etc).

I really don't care if the "wrong" user gets credit for the original image, so long as the original image is up.

And a source like "Image board" or Imgur is as good as unsourced, to me.

IMO, all unsourced duplicates or duplicates from lame sources (Imgur, Discord, Facebook, etc) should be deleted if we have a copy from a primary source (artist’s Pixiv, Twitter, etc), unless they’re of clearly superior quality. We do have an increasing amount of paid rewards that can’t be sourced but are of clearly superior quality and must be coming directly from the artist. waifu2x upscales and the likes are usually not too hard to spot.

Why so relentless? Such duplicates mean that the uploader either couldn’t be bothered to look for the proper version and went with whatever was at hand or intentionally wanted to upload a duplicate. I kind of doubt that intentional duplicates are that much of a problem (yet?) compared to duplicates out of laziness, but even if it’s “only” laziness, such behavior shouldn’t be rewarded at all. It’s not that hard to search on iqdb.org and then click the conveniently provided SauceNAO and TinEye links.

Intentional duplicates to get “good uploads” is an aspect that I hadn’t considered yet, but to me the bigger problem is pollution of the image database. I find it really annoying to see multiple copies of the same image with no immediate indication which one’s the best, only to find out that some of them are just lame duplicates, possibly upscaled or with more artifacts.

If no primary source is available for whatever reason, okay, can’t do anything about it. But if one is available, the lame duplicate should be deleted.

Regarding replacing lame duplicates with properly sourced ones, I don’t have a strong opinion on that, but I also believe that the credit should go to the user who did bother to look for the properly sourced version.

RaisingK said:

I'm a little skeptical that 'users resaving images in order to get "good" uploads' is actually a thing, or at least much of one at all. But if sniping is a thing, I guess maybe that can be too?

While it might not be much of a "thing" yet (there are plenty of things keeping away potential uploaders on the site), the possibility is simply there, and since promoters can't make the most thorough analyses of a user's uploading performance, an upload count is one of the easiest ways to gauge it.

I'd like to check the consensus on this point in particular, since unsourced->sourced replacements periodically come up in the replacement thread, while Mikaeri's opening post for the replacement thread states:

I really don't care if the "wrong" user gets credit for the original image, so long as the original image is up.

And a source like "Image board" or Imgur is as good as unsourced, to me.

I've said that statement because I believe images that aren't samples but are different (lossier through resaves or conversions, and/or modified in some other respect) should be replaced. I did mistakenly apply image sample and replaceme to a number of posts, but I've since amended that and hard-replaced them myself in new posts.

Since I'm not a Moderator (nor am I interested in even trying to become one), I typically leave that decision up to those users that can. But even Moderator+ users get it wrong. One admin actually did an inappropriate replacement without letting anyone know (post #2931411), and it caused a point of strife for another user. So as to how much some of those users can be trusted to do a decent job... Meh.

Regarding the "wrong" user getting credit. Well, other people care. If you attempted to upload something and it succeeds, but it's totally incorrect how you uploaded it and someone else does it correctly shortly after, then you messed up. As for imgur sources, well it's pretty much an easy way to 'mask' a source. Some are legitimate through way of linking on an artist's exclusive plurk/discord/reddit/facebook/etc, others are just because they simply found it online and didn't bother checking.

kittey said:

IMO, all unsourced duplicates or duplicates from lame sources (Imgur, Discord, Facebook, etc) should be deleted if we have a copy from a primary source (artist’s Pixiv, Twitter, etc), unless they’re of clearly superior quality. We do have an increasing amount of paid rewards that can’t be sourced but are of clearly superior quality and must be coming directly from the artist. waifu2x upscales and the likes are usually not too hard to spot.

Why so relentless? Such duplicates mean that the uploader either couldn’t be bothered to look for the proper version and went with whatever was at hand or intentionally wanted to upload a duplicate. I kind of doubt that intentional duplicates are that much of a problem (yet?) compared to duplicates out of laziness, but even if it’s “only” laziness, such behavior shouldn’t be rewarded at all. It’s not that hard to search on iqdb.org and then click the conveniently provided SauceNAO and TinEye links.

Intentional duplicates to get “good uploads” is an aspect that I hadn’t considered yet, but to me the bigger problem is pollution of the image database. I find it really annoying to see multiple copies of the same image with no immediate indication which one’s the best, only to find out that some of them are just lame duplicates, possibly upscaled or with more artifacts.

If no primary source is available for whatever reason, okay, can’t do anything about it. But if one is available, the lame duplicate should be deleted.

Pretty much agree with the above. Lame duplicates, for a while now, have either been flagged and deleted, or simply deleted straight.

Regarding replacing lame duplicates with properly sourced ones, I don’t have a strong opinion on that, but I also believe that the credit should go to the user who did bother to look for the properly sourced version.

Yup. See above.

I still think I lost my privileges over something a number of approvers do normally, which is frankly kind of annoying because some of the staff may have a personal vendetta against me. It's annoying to get caught up in drama, but when users find any reason to put blame on you, even for something you typically do and assume to be just fine... It's no wonder that appropriate replacement is as misunderstood as it is right now. Better to clear up the smoke around it.


Is this topic now about third-party edits or is it about image samples? Because image samples are getting mentioned quite often but if it's detected by one bot then it's safe to instant delete them. I guess image samples aren't much of an issue then.
But I say that it's fine to approve samples, but the uploader should be contacted on how to avoid uploading them. I guess there also is an automated message by @RaisingK if someone uploads a sample from Pixiv. Don't know if it covers other pages as well, but it might be worth including Twitter, NicoSeiga, Tumblr if not already done. That means while approving that stuff is fine, it's not ok to upload samples that get replaced then. It is also not ok to frequently upload without a source or from Imgur etc. These users will get called out eventually by some Approvers. The mod queue does help with detecting such posts. Right now, I don't really notice any users doing the shoddy stuff mentioned in this topic, except for one.
Approving samples should be done as I stated on post #2997109. There isn't that much confusion actually but it should be said how to handle that stuff somehow.

My personal experience with unsourced stuff and third-party edits is that they do happen but it's extremely rare, and in my opinion this issue about resaving files on the computer only happens with two users. One of those two users also has unrestricted uploads but got shot down by @chinatsu eventually. The other user uploaded the post mentioned in Mikaeri's first post.

If these are the only situations where that happened then this issue isn't related to how we handle third-party edits and image samples. Image samples don't even have anything to do with the mentioned post, so I get confused why we even mentioned them in this topic to begin with.

Actually, all we need to know about this issue is already listed in help:third-party edit. Actually, while it is stated that lossy-lossless is listed under generally rejected I would still always flag the image normally and wouldn't move it to instant deletion. The only thing I would instant delete is stuff that is detected as an image sample by @RaisingK 's or @BrokenEagle98 's bot.

The post is then still under observation by over 20 people and doesn't get wiped away. I noticed some errors in the past when doing this, and they also happened to me, so I've grown extra cautious when deleting an image instantly. Then it sits three more days in the queue but then we can say for sure that everything was alright.

As said in the topic -- Third-party edits/resaves/dupes. Or what kittey refers to as "lame duplicates". Duplicates that simply exist just as "duplicates" despite the fact that one image is obviously the original and the other isn't, given both are provided digitally.

We have a well-established paradigm for how we handle actual site samples, when we find an image that matches the checksum of a sample image on a primary source site. We either replace them when the opportunity arises, or we delete them if no opportunity provides itself. This is what happened with pixiv samples for a long time, and what happened en masse when _raw was discovered on tumblr and tumblr samples were mass-replaced. We just don't do it automatically for a good many other sites, which is why topic #14156 exists, and why I've included it as a 'checklist feature' in topic #14119.
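The replace-or-delete paradigm described above can be sketched as a lookup against known sample checksums. Everything here is an assumption for illustration: the `known_samples` index, the example.com URLs, and the `triage` function are hypothetical, and the existing scripts may work quite differently.

```python
import hashlib
from typing import Dict, Optional

# Hypothetical index: md5 of a known site sample -> URL of the full-size
# original, or None when no original can be obtained.
KnownSamples = Dict[str, Optional[str]]


def triage(upload: bytes, known_samples: KnownSamples) -> str:
    """Return 'replace', 'delete', or 'not_a_known_sample' for an upload."""
    digest = hashlib.md5(upload).hexdigest()
    if digest not in known_samples:
        # No checksum match: this sketch cannot say anything about the file.
        return "not_a_known_sample"
    # A matched sample is replaced when an original exists, otherwise deleted.
    return "replace" if known_samples[digest] else "delete"


# Made-up sample files and index entries.
sample_with_original = b"sample bytes A"
sample_without_original = b"sample bytes B"
known_samples: KnownSamples = {
    hashlib.md5(sample_with_original).hexdigest(): "https://example.com/original_a.png",
    hashlib.md5(sample_without_original).hexdigest(): None,
}

print(triage(sample_with_original, known_samples))
print(triage(sample_without_original, known_samples))
print(triage(b"resaved third-party edit", known_samples))
```

Note the last case: a resaved or edited file never matches a sample checksum, which is exactly why third-party edits fall outside this paradigm.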

Some lossy-lossless images are okay. It depends on the availability of the original image or if the concerning post is actually a stitched image (we have no image stitch tag). The tag is also used to describe png images with jpeg artifacts, which is another point of concern.

In any case I'll be more careful about moving posts to instant deletion, but it is not something I expect to be silently punished for. I still want better elaboration on it, especially for the edge cases where say, a user uploads fanbox rewards but a handful of them are actually samples that we can't verify without subscribing to the artist ourselves. This was done for a number of sayori's posts.

Some lossy-lossless images are okay.

Indeed they are. A normal flag ensures that it gets checked properly.
An instant deletion with a wrong deletion reason on top is even worse since a wrong consensus is created that way. One just assumes it was a sample when it was in reality a lossy-lossless, lossy-lossy or lossless-lossy conversion. You also say that they are sometimes ok.

Waiting three more days doesn't hurt anyone. Or does it?

I don't comment much these days, but I'd treat most third party edits the way we already treat nude edits - flag and possibly delete, as the original mission is to be a repository of quality images. Any edits, even if well-intentioned, not by the original creator may be removed.
