Danbooru

Image Sample Cleanup Project

Posted under General

Hmmm... well I do leave comments on every image with sample or MD5 mismatch, so there's at least that.

I could share my scripts, but they're not exactly publication-friendly (meaning the code is messy). On the plus side, they're modular, though the Tumblr and DeviantArt modules are by far the most complicated pieces of code.

However, RaisingK already does a lot with Pixiv. Even though I've taken over several different site sources, combined they still don't add up to the uploads from Pixiv.

I can see the need for some collaboration, though, since it's not good practice to rely on only a few pieces.

Mikaeri said:

Hmm, maybe. I would still replace a number of the md5 mismatch width:1280 or height:1920 "revisions" outright though. Luckily it's a lot less than all the samples from before.

@Mikaeri

I can run hash and visual comparisons on:

width:1280 source:http*://*tumblr.com/
height:1920 source:http*://*tumblr.com/
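A hash comparison like the one offered above boils down to hashing the file the source currently serves and comparing it with the post's md5. The sketch below is hypothetical (the function names are made up, and it is not the script mentioned in this thread); it also assumes, based only on the two searches above, that 1280 px width or 1920 px height are the telltale Tumblr sample sizes.

```python
import hashlib

# Hypothetical helpers; not the actual scripts referenced in this thread.


def md5_hex(data: bytes) -> str:
    """Lowercase hex MD5 digest of a file's bytes."""
    return hashlib.md5(data).hexdigest()


def compare_to_source(post_md5: str, source_bytes: bytes) -> str:
    """Compare a post's stored MD5 with the file its source URL currently serves.

    A mismatch is only a signal: the post may be a sample, a revision,
    or a re-encode, so it still needs a visual check.
    """
    if md5_hex(source_bytes) == post_md5.lower():
        return "match"
    return "md5_mismatch"


def looks_like_tumblr_sample(width: int, height: int) -> bool:
    """Assumed sample caps, taken from the two searches (width:1280 / height:1920)."""
    return width == 1280 or height == 1920
```

The hash check only says the files differ; deciding whether the post is actually an inferior sample still needs the visual comparison.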

There are still a lot of posts without sample / replaceme tags like:

post #2730096 -> http://data.tumblr.com/30ba679756aed2eb5a807af9374ce83c/tumblr_oq9cdsG8X71qf69plo1_raw.jpg
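The sample-to-raw relationship in the link above can be sketched as a URL rewrite. This assumes a 2017-era sample URL of the form `<host>.tumblr.com/<32-hex directory>/tumblr_<id>_<size>.<ext>`, and that swapping the host for `data.tumblr.com` and the size for `raw` gives the raw file, as in the post #2730096 link. The function name `tumblr_raw_url` and the `_1280` sample URL in the usage note are hypothetical, and the sketch does not check whether Tumblr still serves raw files this way.

```python
import re
from typing import Optional

# Assumed layout: <host>.tumblr.com/<32-hex dir>/tumblr_<id>_<size>.<ext>
_TUMBLR_MEDIA = re.compile(
    r"^https?://[^/]*\.tumblr\.com/"
    r"(?P<dir>[0-9a-f]{32})/"
    r"(?P<name>tumblr_\w+?)_(?P<size>\d+)\.(?P<ext>\w+)$"
)


def tumblr_raw_url(sample_url: str) -> Optional[str]:
    """Rewrite a sized Tumblr media URL to its _raw form, or return None if it doesn't fit the pattern."""
    m = _TUMBLR_MEDIA.match(sample_url)
    if m is None:
        return None
    # Keep the directory hash and file stem; replace host and size suffix.
    return f"http://data.tumblr.com/{m['dir']}/{m['name']}_raw.{m['ext']}"
```

For example, `tumblr_raw_url("http://68.media.tumblr.com/30ba679756aed2eb5a807af9374ce83c/tumblr_oq9cdsG8X71qf69plo1_1280.jpg")` returns the data.tumblr.com `_raw.jpg` link shown above.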

By the way, I should mention that the curfew on Tumblr sample replacement is lifted, now that the latest patch added support for grabbing the best available image. Any Tumblr samples uploaded past yesterday (06/24/2017) are fair game to be hard replaced by another user, whether by accident, coincidence, or intent. "Hard replaced" means that a completely new upload is made to replace the sample.

If you upload a Tumblr sample and someone else uploads the full shortly afterward (say, anywhere from a few hours to two days later), then your post is the one that gets deleted, not theirs (post #2766953). This applies regardless of where the full was sourced: a full from pixiv/seiga/nijie with the same md5 as its Tumblr full is more than sufficient to replace a Tumblr sample. Tagging your own upload replaceme instead only works if an approver notices your plea in time, and it's not exactly our duty to always cover your mistakes when you shouldn't be making them in the first place. Always read the howtos (howto:tumblr) for the sites you upload from, when they're available.
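The deletion rule above can be written as a small predicate. This is a hypothetical encoding, not anything Danbooru actually runs: the function name is made up, it assumes "past yesterday" means strictly after 06/24/2017, and it leaves out the "few hours to two days" window because the post gives that only as an example.

```python
from datetime import date, datetime

# Cutoff from the post; "past" is assumed to mean strictly after this date.
TUMBLR_CUTOFF = date(2017, 6, 24)


def sample_should_be_deleted(sample_uploaded: datetime,
                             full_uploaded: datetime,
                             full_md5: str,
                             tumblr_full_md5: str) -> bool:
    """Hypothetical rule: a Tumblr sample loses to a later full upload.

    The full may come from any site (pixiv, Seiga, Nijie, ...) as long as
    its MD5 matches the Tumblr full-size file.
    """
    if sample_uploaded.date() <= TUMBLR_CUTOFF:
        return False  # sample predates the cutoff, so this rule does not apply
    if full_uploaded <= sample_uploaded:
        return False  # the full has to be the later upload
    return full_md5.lower() == tumblr_full_md5.lower()
```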

One last thing, we still need to get through tons upon tons of Twitter md5 mismatches that may have the wrong source to begin with:

If anyone is willing to help out, it would be much appreciated. Check the tag history and the images to verify that they're indeed inferior uploads from the source, and not just the result of a user replacing the post's source (even if bad id'd) with an inferior one (such as Twitter). That is to say, watch out for stuff like this.
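One part of that check, spotting posts whose source was later swapped to Twitter, can be sketched as below. It assumes a hypothetical `source_history` list of the post's source values ordered oldest first (the first entry being the source at upload), and treats `twitter.com` and `twimg.com` hosts as Twitter; the images still need to be compared by eye.

```python
from typing import List
from urllib.parse import urlparse

# Assumed set of hosts that count as "Twitter" for this check.
TWITTER_HOSTS = ("twitter.com", "twimg.com")


def is_twitter(url: str) -> bool:
    host = urlparse(url).hostname or ""
    return any(host == h or host.endswith("." + h) for h in TWITTER_HOSTS)


def mismatch_from_source_edit(source_history: List[str]) -> bool:
    """True if the source was changed after upload from a non-Twitter URL to a Twitter one.

    Such an md5 mismatch likely comes from the source edit, not from an
    inferior upload.
    """
    if len(source_history) < 2:
        return False  # source never changed
    original, current = source_history[0], source_history[-1]
    return is_twitter(current) and not is_twitter(original)
```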


@reiyasona Do you think we can do script replacements for these searches?

I'd say it's worth checking manually for replacements, but given that the vast majority of these matches are practically identical (perhaps the result of antiquated Tumblr sampling methods that weren't archived, and thus aren't detected as samples by BE98's script), I think it's easier to just do a batch replacement. The revised parts are negligible at best.

EDIT: It seems some of these have smaller-resolution revisions (e.g. post #2723625). Probably want to skip those.
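A batch replacement that skips smaller-resolution revisions like the one above could filter candidates by dimensions first. The `Candidate` record and `batch_replaceable` function are hypothetical; the sketch keeps a post only when the source file is at least as large in both dimensions and not the same size.

```python
from typing import Iterable, List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical record pairing a post's dimensions with its source file's."""
    post_id: int
    post_width: int
    post_height: int
    source_width: int
    source_height: int


def batch_replaceable(candidates: Iterable[Candidate]) -> List[int]:
    """Return post IDs whose source is strictly larger; skip smaller or equal-size revisions."""
    keep = []
    for c in candidates:
        at_least_as_large = (c.source_width >= c.post_width
                             and c.source_height >= c.post_height)
        same_size = (c.source_width, c.source_height) == (c.post_width, c.post_height)
        if at_least_as_large and not same_size:
            keep.append(c.post_id)
    return keep
```

Posts that are filtered out still need a manual look rather than a script replacement.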


Mikaeri said:

And also, can we get special highlighting in the mod queue for images tagged image sample and md5 mismatch? @evazion @BrokenEagle98 There are a few images that have recently been approved by @Nitrogen09 and @Qpax, for example, that should have been reupped instead. I think it would be better this way.

I agree that these posts should be highlighted, but right now they have the same color as posts with a score of -3 or lower, or posts tagged bad anatomy.
In my opinion, this shouldn't be the case, since the red coloring is misleading. Just because a post is tagged md5_mismatch doesn't mean it shouldn't be approved. Sometimes, for example, the revised version is worse (bad_revision).
So I think the color in the queue shouldn't be red, but maybe yellow or blue.
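The proposed coloring could be expressed as a priority rule, where the existing red highlight wins and image sample / md5 mismatch alone gets a softer color. This is a hypothetical sketch of the proposal, not Danbooru's actual queue code; the tag names and the choice of yellow over blue are assumptions.

```python
from typing import Optional, Set

WARNING_TAGS = {"bad_anatomy"}                   # keeps the existing red highlight
REVIEW_TAGS = {"image_sample", "md5_mismatch"}   # proposed softer highlight


def queue_highlight(tags: Set[str], score: int) -> Optional[str]:
    """Return the highlight color for a queued post, or None for no highlight."""
    if score <= -3 or tags & WARNING_TAGS:
        return "red"
    if tags & REVIEW_TAGS:
        return "yellow"  # worth a closer look, but not a reason to reject
    return None
```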

How should we tag posts sourced from sites such as ArtStation, which previously served larger files (the ones we're using) but no longer do? Obviously bad revision shouldn't be used, since nothing is per se "revised." By comparison, if pixiv suddenly resized all its images to lowres, that wouldn't make every post a bad "revision."

So I propose something like downscaled source for these pictures. The wording is a bit yucky, so go ahead and suggest another if you like. To reiterate, this is for a post sourced from a platform that has since degraded its image quality, so the original file is no longer available.

I'll ping @Mikaeri @BrokenEagle98 @evazion @reiyasona @RaisingK

@chinatsu

Relevant: topic #14804

@chinatsu

No need to ping Mikaeri. He has already stopped doing that for this site.
