Danbooru

Rethinking flagging

Posted under General

albert said:

Not at all. If you think a particular approver is bad, point me to them and I'm more than happy to demote them. I've done it before. I myself haven't promoted anyone recently precisely because of reactions like the one you're giving here. And you're not wrong.

But pointing to one or two bad uploads isn't sufficient. Everyone approves bad stuff once in a while, us included. You need to prove that a large fraction of their approvals are bad. I'd say at least 10%. Stuff like a high percentage of negative scores is good evidence, or even a low median score.

Well, since you asked @albert, I'm going to name names. I want to preface this with the fact that I have no ill feelings towards any other user of this site, simply because after all it's just anime pictures - I'm just trying to provide objective data to answer your question.

Let's take a look at the approver report:
If we sort the columns by average unique downvotes / number of approvals, and skip people with under 10 approvals (it doesn't make sense to use this math for someone with 3 approvals), you'll notice that Akaineko (lottery approver) has a value of 33%, roughly twice that of the second-highest, Hat Vangart, another lottery approver. Hat Vangart has four times as many approvals as SBE and RaisingK, respectively third and fourth, but the same value (16%). The nearest other users are all under 11%. I picked ten random reports where these users appeared, to make sure there wasn't any one-time bias. I seriously doubt it would change if we took all of them and plotted the trend over time; chances are these two users would stay outliers. Furthermore, consider that member users can flag but cannot downvote, so only a fraction of those who would care to do so can actually downvote posts.
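
For anyone who wants to reproduce this, here's a minimal sketch of the calculation, assuming the approver report has been exported to a CSV with hypothetical columns approver, approvals and unique_downvotes (the actual report layout may differ):

```python
import csv

MIN_APPROVALS = 10  # skip approvers with too few approvals for the ratio to be meaningful

rows = []
with open("approver_report.csv", newline="") as f:  # hypothetical export of the report
    for row in csv.DictReader(f):
        approvals = int(row["approvals"])
        if approvals < MIN_APPROVALS:
            continue
        ratio = int(row["unique_downvotes"]) / approvals
        rows.append((row["approver"], approvals, ratio))

# sort by unique downvotes per approval, worst first
rows.sort(key=lambda r: r[2], reverse=True)

for name, approvals, ratio in rows[:10]:
    print(f"{name:<20} {approvals:5d} approvals  {ratio:6.1%} unique downvotes per approval")
```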

The average percentage of unique downvotes per approval is 5%. Akaineko has more than six times that. Edit: It should be pointed out, however, that he has relatively few approvals recently, so the numbers are not set in stone. Still, anyone else (including RaisingK and SBE) with his number of approvals is nowhere near his numbers, so while he does not fit in the overall average, having only 10% of the average number of approvals, he's also an outlier in the distribution of people with low approval counts. (Also, I hope I didn't make any mistakes in calculating these.)

You say this, but look at the flagger distribution. Just four people flagging more than 10 posts. There is no signal there. These four people could easily be victims of confirmation bias or other behavioral fallacies.

Members can only flag one post per day. That's enough to throw a wrench into anyone's attempt at flagging enough posts to show up in that report. Also, you can see my flags, and you'll notice I mostly flag samples, corrupted images, or exceptionally bad posts like post #615298. Ion does the same - from what I see every day in the mod queue he mostly flags single pages of comics and very low quality scribbles from a decade ago. I'd also be interested in knowing how many users only flag once, or which users have the largest number of distinct flaggers, for statistics purposes, but that information is not publicly available.

I honestly don't see the point in flagging these uploads from 2012 when the average user is never going to go back far enough to come across them. So the fact that we wouldn't be flagging these isn't a huge loss. And by your own admission, if poor quality flags aren't common, then the real-world impact of not permitting them would be minimal.

The rationale for flagging those posts is to keep Danbooru's active gallery curated. And, I want to add, to make sure new users know what is and isn't acceptable. I've seen plenty of newcomers try to 1up horrible but active old posts with pixiv versions they found, only to see theirs deleted and end up complaining in the forums about it. I don't think it's bad that posts the majority of approvers consider bad by today's standards get deleted (which is what flaggers have been doing so far). Deleted posts aren't expunged, so we're not losing anything.

You're right. Bad anatomy shouldn't be a valid flag reason either.

But exceptionally bad anatomy is often associated with poor quality. "Bad anatomy" is just the short way of saying that a post was drawn by an amateur and the flaws of the drawing are so glaring that they're evident to anyone looking at it. That's why, when the flaws are less evident, people write longer flag reasons instead. I don't really agree with most descriptive flags so I won't argue in favor of them - I've approved posts that were redeemed by the overall quality of the image in the past. But I don't think giving people the chance to send a post back into the moderation queue is a bad idea - if the post is good enough it'll be reapproved anyway, as shown in the statistics I posted earlier. About a quarter of all flags in the past year have been denied, not counting the ones that went through because they were samples (again, about a tenth of these flags) and such, so it's not like there's a rush of posts being lost to the aether. And it's a rarity for any post to be flagged more than once.

Edit: made some edits because I noticed a mistake with Akaineko's numbers. The overall argument does not change.


And finally, this picture shows the data trend for the latest report. The vertical axis is the percentage of unique downvotes per total approvals, and the horizontal axis is just the row number in LibreOffice (I usually use MATLAB to plot, so apologies for the half-assed graph rendering). I couldn't figure out how to show data labels in LibreOffice, so from right to left the seven wild data points that do not fit the distribution represent:

Log
Akaineko
Hat Vangart
ShadowbladeEdge
RaisingK
Fujishiro
CaptainLoony

Log has 3 approvals, so they don't fit in the distribution. RaisingK, SBE and Akaineko are the only ones at around ~20 posts, but Akaineko has a disproportionately bad ratio compared to the other two. The remaining three users are all lottery approvers as well. I don't think Fujishiro or CaptainLoony are bad approvers, and I don't think they should be demoted - even though they deviate from the norm, I personally don't consider their quality of approvals subpar overall - but you cannot deny that this data shows they're outliers rather than a good representation of what the average Danbooru user capable of voting on posts wants.
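
For reference, this is roughly how the same plot can be produced with matplotlib, continuing from the rows list in the earlier sketch; the outlier cutoff is arbitrary and only there to attach the data labels I couldn't get out of LibreOffice:

```python
import matplotlib.pyplot as plt

# `rows` is the (name, approvals, ratio) list from the earlier sketch, worst ratio first
ratios = [ratio for _, _, ratio in rows]

fig, ax = plt.subplots()
ax.scatter(range(len(ratios)), ratios, s=12)
ax.set_xlabel("approver (row number in the report)")
ax.set_ylabel("unique downvotes / total approvals")

# label anything above twice the mean ratio as an outlier (arbitrary cutoff, purely for readability)
mean_ratio = sum(ratios) / len(ratios)
for i, (name, _, ratio) in enumerate(rows):
    if ratio > 2 * mean_ratio:
        ax.annotate(name, (i, ratio), fontsize=8)

plt.show()
```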

Final edit: I went back and quickly plotted the reports going back to 2018-04-01 to see if there was any correlation over time. Akaineko's percentages went down during early April and July, but most of the time he was high on the chart, with some weeks showing decisive spikes, presumably because of particularly bad approvals. As a final note, these numbers should be taken with a grain of salt because of the small number of approvals, and because I only went into detail on the latest report due to lack of time, but imo they do a good job of showing outliers. In other months, most of the outliers that were not lottery approvers were users who only showed up for that month, or people with a handful of approvals on the same level as Log (like ultima, with 7 approvals in some of the reports I checked), and so don't fit into the data.

It would be interesting to correlate approvals and flags, but that would take way more time using the API, and of course there's the concern of past flag abuses. It should be pointed out that in the case of Akaineko, there are 111 posts under flagger:any user:Akaineko age:<1y, which doesn't count multiple flags on his posts that were reapproved by other lottery approvers. Compare his low number of approvals with someone with high output such as PhoenixG (approver:PhoenixG flagger:any age:<1y: 51), or Qpax (approver:Qpax flagger:any age:<1y: 30). Even I, the user with the highest number of approvals in the past year (19379), have fewer than him (92), and of those only about half ended up being deleted (half of which in turn were samples mods decided to 1up themselves rather than replace), compared to Akaineko's 99 deleted (81%). The only reason this number isn't higher is that two notable flaggers (ceres and provence) were banned, and nobody else cared enough to take up the questionable mantle (whether out of fear of suffering the same fate or out of indifference is debatable - though I can count at least four other builders off the top of my head who refuse to flag for fear of retaliation from a certain mod), but I wonder how many of those other still-active approvals that were never flagged would actually survive a single flag.
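
If anyone wants to check these counts themselves, here's a rough sketch of how to pull them from the API. It assumes the /counts/posts.json endpoint and an account able to run searches with more than two tags (login and api_key below are placeholders):

```python
import requests

BASE = "https://danbooru.donmai.us"
AUTH = {"login": "your_username", "api_key": "your_api_key"}  # placeholders

def count_posts(tags):
    """Number of posts matching a tag search, same syntax as the site's search box."""
    r = requests.get(f"{BASE}/counts/posts.json", params={"tags": tags, **AUTH})
    r.raise_for_status()
    return r.json()["counts"]["posts"]

for approver in ("Akaineko", "Hat_Vangart", "PhoenixG", "Qpax"):
    approved = count_posts(f"approver:{approver} age:<1y")
    flagged = count_posts(f"approver:{approver} flagger:any age:<1y")
    deleted = count_posts(f"approver:{approver} flagger:any status:deleted age:<1y")
    print(f"{approver}: {approved} approvals, {flagged} flagged, {deleted} of those deleted")
```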


I realize now I typed way more than I meant to at the start, and again I apologize if I sound too confrontational or accusatory (and for the triple post). I am ESL and text can only carry so much meaning.

To say that flags and feedbacks are abused is weird when you consider how much feedback was spammed in the past, even by members of the staff (see the now infamous red wave of bans/kicks by 葉月 that one can observe going back a couple hundred pages or so). And I'd wager most approvers have no issue with the current volume of flags - a rate of 3-4 a day is more than manageable, and anyone can save posts from the queue at any time they want, or undelete them. If anything, it becomes bothersome when you see a post getting "four approvers believe this post has poor quality", but then a lottery approver comes along and picks it up, until it's flagged again three days later, to the minute.

Everything I've written here, especially the last paragraph, is not just my personal opinion, but something I've seen talked about by several approvers and builders (and even some mods and, unfortunately, an admin) over the past year on private channels and servers (most notably IRC and Discord, which is where several builders/approvers end up venting their frustration, given the lack of other outlets, but also DMails). These same users, however, do not want to stick their necks out and say what they think, and so choose to express it in less permanent and less public places rather than risk getting banned from contributing or even receiving revenge feedbacks, which is precisely what happened in the past. But I guess everything has its limit, which is what I think caused the heavy pushback you've seen against both your proposals.

I'm not happy I have to write this, but this stuff needed to be said, that's all.


nonamethanks said:

I realize now I typed way more than I meant to at the start, and again I apologize if I sound too confrontational or accusatory (and for the triple post). I am ESL and text can only carry so much meaning.

To say that flags and feedbacks are abused is weird when you consider how much feedback was spammed in the past, even by members of the staff (see the now infamous red wave of bans/kicks by 葉月 that one can observe going back a couple hundred pages or so). And I'd wager most approvers have no issue with the current volume of flags - a rate of 3-4 a day is more than manageable, and anyone can save posts from the queue at any time they want, or undelete them. If anything, it becomes bothersome when you see a post getting "four approvers believe this post has poor quality", but then a lottery approver comes along and picks it up, until it's flagged again three days later, to the minute.

Everything I've written here, especially the last paragraph, is not my personal opinion, but something I've seen talked about by several approvers and builders (and even some mods and, unfortunately, an admin) over the past year on private channels and servers (most notably IRC and Discord, which is where several builders/approvers end up venting their frustration, given the lack of other outlets, but also DMails). These same users, however, do not want to stick their necks out and say what they think, and so choose to express it in less permanent and less public places rather than risk getting banned from contributing or even receiving revenge feedbacks, which is precisely what happened in the past. But I guess everything has its limit, which is what I think caused the heavy pushback you've seen against both your proposals.

I'm not happy I have to write this, but this stuff needed to be said, that's all.

+1

[nonamethanks math]

I don't understand all this nor the cryptic user_report chart.

Please explain in simple words where my approvals went wrong and what I could've done to improve them. And why wasn't I warned before receiving a negative rating?

Thank you for listening.

I wanted to add one final observation about what you said earlier:

I myself haven't promoted anyone recently precisely because of reactions like the one you're giving here.

I can assure you nobody would find you at fault for promoting more people to approvers or unrestricted uploaders, or even just builders. Issues only arise when these people are picked at random, especially if their profiles don't show any substantial proof that they're "fit for the job", so to speak.

In fact there's been a shortage of promoters recently, since Wypatroszony became inactive and EB only promotes one or two users every couple of months. See for example recent unrestricted uploaders (only 11 since the start of 2018) and promotions to builder (14 promoted to builder since the start of 2018) (ctrl+F builder, I can't find a better way to do it). If anything, we need more people to be encouraged to upload. I'm not sure there's any solution builders can come up with, besides pestering the few interested members of the staff with lists of names every month or so, given that the last mod promotions, save for RaisingK (who hasn't promoted anyone so far), were a couple of years ago, and most current mods don't seem interested in actively promoting due to not being as active as they used to be.

albert said:

Regarding unlimited uploaders:

Flagging is just a bandaid solution. I don't think a dozen deletions a week is a real deterrent to heavy uploaders. If you perceive one of the unlimited uploaders to be bad, then you should message me and we can discuss whether it makes sense to revoke the privilege. It's usually the case that while their average quality is low, and you can always point to a few egregiously bad examples, their uploads are a net positive for the site. But that's an assumption I'm making. It should be handled on a case by case basis.

That's exactly the point. I'm not calling for any users to lose their approval privileges (although there are others who would). But as you say yourself there are a few egregiously bad examples, and there needs to be a way to ensure these can get deleted, even if the uploader (or approver) for whatever reason doesn't see the problem with the images in question. Flagging provides the means for this, and indeed does deal with things on a case by case basis, allowing their positive contributions to remain unaffected.

The aim isn't to deter heavy uploaders - it's to ensure that these egregiously bad examples get deleted from the site.

If you were to somehow propose an alternative method of removing them that was as effective and efficient as flagging, giving every user the chance to highlight bad images and giving approvers the opportunity to say "actually, no, this image is fine", then my opposition would be weakened. But there is no such proposal, and it seems to me that this would be, to all intents and purposes, exactly the same as flagging, and would trigger the same issues for those users who mistakenly see a flag as a personal affront to the uploader.


Flags and appeals are a perennial source of drama on Danbooru.

The way I see it, the drama that surrounds flags and appeals is a problem stemming from the users.
The tools are not at fault if people take things personally to the point of openly antagonizing users across the site.

I think a large part of it is disputes that are subjective in nature with regards to art quality.

Poorly drawn is not a binary decision. There's a gradient. My opinion is that if you think an upload is ugly, you should either ignore it or downvote it.

This was my stance as a normal user since I signed up for the site: Enjoy what catches my eye and move on if I don't see anything I fancy.
However, as kittey said, I mainly use Danbooru because I do not have to scroll through several pages on other sites to find worthwhile images (that, and the excellent tagging and source consolidation). I assume those are the primary reasons why the userbase browses Danbooru and not elsewhere.

After spending a few months as an approver and being given the responsibility of maintaining the gallery, I've realized that this manual curation, with multiple checkpoints on both entry and exit, is what's responsible for Danbooru's high quality galleries.

The approval queue, the flagging system and the appeal system are all important parts of this curation. Given the subjective nature of art quality, I think having several checks and counterchecks is a good setup for striking that delicate balance.

I suggest we remove poor quality as a valid reason for flagging.

Since flagging is a crucial part of the curation process, I don't think it is a good idea to neuter the flagging system's ability to clean up poorly drawn images.

Especially not without a replacement process to take its place. Even then, I can't think of a better system than what we have now.

With regards to the drama, changing which flag reasons are valid will not solve it.
The actual problem is users who go out of their way to attack other users over arguments about art quality.

This is a problem that can only be handled by moderators.
And as far as I know, the people who do this have been warned and banned for going beyond disagreement to outright hostility.

------------------

Thoughts on Unique Downvotes determining unfit approvers

Are we only going to look at unique downvotes for this?

Akaineko and Hat Vangart getting demoted over one performance indicator strikes me as unfair.
Unique downvotes are only one metric of approval quality in terms of post scores, after all. Shouldn't more than one aspect be considered before coming to a decision?

@albert You mentioned percentage of negative scores and median scores:

But pointing to one or two bad uploads isn't sufficient. Everyone approves bad stuff once in a while, us included. You need to prove that a large fraction of their approvals are bad. I'd say at least 10%. Stuff like a high percentage of negative scores is good evidence, or even a low median score.

I analyzed how Akaineko and Hat Vangart did with regards to these two metrics.

Negatively scored approvals:

The percentage of negatively scored posts is below 2% for Akaineko and below 4% for Hat Vangart.

For Akaineko

Out of 3227 approvals within the last year:
53 posts were at score 0 approver:Akaineko score:0 age:<1y - that's only 1.6% of posts
2 posts were negatively scored approver:Akaineko score:<0 age:<1y - that's about 0.06%

For Hat Vangart

Out of 671 approvals within the last year:
24 posts were at score 0 approver:Hat_Vangart score:0 age:<1y - that's 3.57% of posts
6 posts were negatively scored approver:Hat_Vangart score:<0 age:<1y - only 0.89%

So in terms of negative scores, neither user is coming close to 10% negative scores across their approvals within the last year.

Median scores:

Neither user deviates more than 2 points from the weekly average, nor do they ever come close to the lowest median score of the week.

Median Score comparisons by week

Week       | Average median score (all users) | Max median score (all users) | Min median score (all users) | Akaineko median | Hat Vangart median
2018-04-01 | 7.45 | 21 | 2 | 7 | N/A
2018-04-08 | 7.25 | 17 | 2 | 8 | 5
2018-04-15 | 7.23 | 16 | 2 | 8 | N/A
2018-04-22 | 7.53 | 17 | 2 | 6 | N/A
2018-04-29 | 7.18 | 17 | 1 | 7 | N/A
2018-05-06 | 7.33 | 17 | 1 | 8 | N/A
2018-05-13 | 6.88 | 15 | 1 | 7 | N/A
2018-05-20 | 6.43 | 15 | 1 | 8 | 7
2018-05-27 | 6.14 | 15 | 1 | 6 | 6
2018-06-03 | 6.88 | 15 | 1 | 9 | 7
2018-06-10 | 7.25 | 22 | 1 | 9 | 7
2018-06-17 | 7.17 | 17 | 1 | 6 | N/A
2018-06-24 | 6.85 | 18 | 1 | 8 | N/A
2018-07-01 | 6.72 | 17 | 1 | 7 | N/A
2018-07-08 | 7.11 | 16 | 2 | 7 | N/A
2018-07-15 | 7.35 | 23 | 2 | 7 | 9
2018-07-22 | 7.64 | 26 | 1 | 5 | 7
2018-07-29 | 7.34 | 26 | 1 | 4 | 5
2018-08-05 | 7.44 | 24 | 1 | 5 | 6
2018-08-12 | 7.54 | 25 | 1 | 7 | 5
2018-08-19 | 6.88 | 17 | 1 | 7 | 7

As far as I can tell, both users stayed close to the average baseline. They are certainly never outliers.

Unique downvotes:

I think the timing was rather unfortunate for both users.
They had a particularly bad week. In fact, this week saw their highest downvote percentages since April.

Unique downvote comparisons by week

Week       | Average downvote % (all users) | Max downvote % (all users) | Min downvote % (all users) | Akaineko | Hat Vangart
2018-04-01 | 5.16% | 30.77% | 0.00% | 13.79% | N/A
2018-04-08 | 4.59% | 20.00% | 0.00% | 9.28% | 8.00%
2018-04-15 | 5.30% | 20.00% | 0.00% | 12.77% | N/A
2018-04-22 | 5.15% | 22.22% | 0.00% | 22.22% | N/A
2018-04-29 | 5.74% | 20.00% | 0.89% | 9.52% | N/A
2018-05-06 | 5.60% | 21.62% | 0.00% | 9.52% | N/A
2018-05-13 | 5.58% | 27.59% | 0.00% | 9.38% | N/A
2018-05-20 | 5.55% | 60.00% | 0.00% | 8.11% | 9.09%
2018-05-27 | 3.68% | 15.38% | 0.00% | 15.38% | 6.94%
2018-06-03 | 4.49% | 23.08% | 0.00% | 16.22% | 8.57%
2018-06-10 | 4.04% | 17.65% | 0.00% | 17.65% | 9.84%
2018-06-17 | 4.27% | 15.38% | 0.00% | 14.63% | N/A
2018-06-24 | 3.64% | 14.00% | 0.00% | 2.94% | N/A
2018-07-01 | 3.25% | 12.50% | 0.00% | 3.57% | N/A
2018-07-08 | 3.96% | 15.25% | 0.00% | 2.08% | N/A
2018-07-15 | 4.32% | 27.27% | 0.00% | 3.70% | 6.45%
2018-07-22 | 4.46% | 28.57% | 0.00% | 7.59% | 5.56%
2018-07-29 | 5.05% | 21.43% | 0.00% | 11.63% | 8.57%
2018-08-05 | 5.73% | 21.74% | 0.00% | 21.74% | 7.32%
2018-08-12 | 6.42% | 25.00% | 0.00% | 17.86% | 15.58%
2018-08-19 | 5.83% | 33.33% | 0.00% | 33.33% | 16.22%

NNT is correct that these two users are generally higher than the weekly average, and I think he's got it right by keeping the comparisons within the context of approvers with similar approval numbers.

As NNT pointed out, these users have relatively few approvals and regularly take breaks (weeks for Akaineko, months for Hat Vangart).
Given that reports operate on the last 30 days, if an approval gets downvoted it takes four reports for it to fall off. This has a particularly high impact on low-volume approvers.
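
To make the effect concrete, here's a toy calculation with made-up approval counts, showing how much a single downvoted approval weighs on a low-volume approver across those four reports:

```python
# Toy numbers: one downvoted approval, seen through the ~4 consecutive weekly
# reports that cover it (reports span a rolling 30-day window, per the above).
REPORTS_AFFECTED = 4

for approvals_in_window in (10, 40, 200):
    ratio = 1 / approvals_in_window
    print(f"{approvals_in_window:3d} approvals in the window -> a single downvoted post reads as "
          f"{ratio:.1%} for roughly {REPORTS_AFFECTED} consecutive reports")
```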

I decided to take a closer look at what kind of images they've been approving:

  • Hat Vangart had a pretty good start on July 15, staying mostly within 2% of the average. But somewhere in the previous week approver:Hat_Vangart date:>2018-08-05...<2018-08-12 he approved a certain Miku post that earned him at least 5 downvotes. This set a new high record by a fairly large margin, and he's been in a bad spot from August 12th onward.

Right now, both of these users are seeing the highest downvote/approval ratios of their careers, almost double their previous highs.
Then this thread comes along, and it just so happens that downvotes are being analyzed.

I think it'd be fair to take additional performance indicators into account as well, such as deletion ratios, scores, etc., before writing them off.
And since nobody has talked to these users, given them feedback or dmailed them (to my knowledge), they never got the chance to step up their game. Their demotion is quite sudden.

Finally, one last thing to consider about using votes as a measure:
I've been told many times that scores and favourites are not a good indicator of what images belong on Danbooru.
The number of highly scored and heavily favourited images that are deleted speaks to that: score:>10 status:deleted
The opposite is true as well: score:<0 status:active

So why are post votes now being used to screen approvers here?
Even unrestricted uploaders aren't judged by the number of votes their uploads garner, nor is vote count a requirement for promotion to unrestricted or approver status.

Unless that is changing too?

nonamethanks said:

The rationale of flagging those posts is to keep danbooru's active gallery curated. And, I want to add, to make sure new users know what's acceptable or not. I've seen plenty of newcomers trying to 1up horrible but active old posts with pixiv versions they found, only to see theirs deleted and end up complaining in the forums about it. I don't think having posts that the majority of approvers consider bad by today's standards be deleted (which is what flaggers so far have been doing) is bad. Deleted posts aren't expunged, so we're not losing anything.

To add on to that, I'd also mention the relatively frequent use of "there's worse things on the site" as a justification to upload and/or defend barely passable posts.

Squishy said:

Thoughts on Unique Downvotes determining unfit approvers

Are we only going to look at unique downvotes for this?

Akaineko and Hat Vangart getting demoted over one performance indicator strikes me as unfair.
Unique downvotes are only one metric of approval quality in terms of post scores, after all. Shouldn't more than one aspect be considered before coming to a decision?

@albert You mentioned percentage of negative scores and median scores:

I analyzed how Akaineko and Hat Vangart did with regards to these two metrics.

Negatively scored approvals:

The percentage of negatively scored posts is below 2% for Akaineko and below 4% for Hat Vangart.

For Akaineko

Out of 3227 approvals within the last year:
53 posts were at score 0 approver:Akaineko score:0 age:<1y - that's only 1.6% of posts
2 posts were negatively scored approver:Akaineko score:<0 age:<1y - that's about 0.06%

For Hat Vangart

Out of 671 approvals within the last year:
24 posts were at score 0 approver:Hat_Vangart score:0 age:<1y - that's 3.57% of posts
6 posts were negatively scored approver:Hat_Vangart score:<0 age:<1y - only 0.89%

So in terms of negative scores, neither user is coming close to 10% negative scores across their approvals within the last year.

Median scores:

Neither user deviates more than 2 points from the weekly average, nor do they ever come close to the lowest median score of the week.

Median Score comparisons by week

Week       | Average median score (all users) | Max median score (all users) | Min median score (all users) | Akaineko median | Hat Vangart median
2018-04-01 | 7.45 | 21 | 2 | 7 | N/A
2018-04-08 | 7.25 | 17 | 2 | 8 | 5
2018-04-15 | 7.23 | 16 | 2 | 8 | N/A
2018-04-22 | 7.53 | 17 | 2 | 6 | N/A
2018-04-29 | 7.18 | 17 | 1 | 7 | N/A
2018-05-06 | 7.33 | 17 | 1 | 8 | N/A
2018-05-13 | 6.88 | 15 | 1 | 7 | N/A
2018-05-20 | 6.43 | 15 | 1 | 8 | 7
2018-05-27 | 6.14 | 15 | 1 | 6 | 6
2018-06-03 | 6.88 | 15 | 1 | 9 | 7
2018-06-10 | 7.25 | 22 | 1 | 9 | 7
2018-06-17 | 7.17 | 17 | 1 | 6 | N/A
2018-06-24 | 6.85 | 18 | 1 | 8 | N/A
2018-07-01 | 6.72 | 17 | 1 | 7 | N/A
2018-07-08 | 7.11 | 16 | 2 | 7 | N/A
2018-07-15 | 7.35 | 23 | 2 | 7 | 9
2018-07-22 | 7.64 | 26 | 1 | 5 | 7
2018-07-29 | 7.34 | 26 | 1 | 4 | 5
2018-08-05 | 7.44 | 24 | 1 | 5 | 6
2018-08-12 | 7.54 | 25 | 1 | 7 | 5
2018-08-19 | 6.88 | 17 | 1 | 7 | 7

As far as I can tell, both users stayed close to the average baseline. They are certainly never outliers.

Unique downvotes:

I think the timing was rather unfortunate for both users.
They had a particularly bad week. In fact, this week saw their highest downvote percentages since April.

Unique downvote comparisons by week

Week       | Average downvote % (all users) | Max downvote % (all users) | Min downvote % (all users) | Akaineko | Hat Vangart
2018-04-01 | 5.16% | 30.77% | 0.00% | 13.79% | N/A
2018-04-08 | 4.59% | 20.00% | 0.00% | 9.28% | 8.00%
2018-04-15 | 5.30% | 20.00% | 0.00% | 12.77% | N/A
2018-04-22 | 5.15% | 22.22% | 0.00% | 22.22% | N/A
2018-04-29 | 5.74% | 20.00% | 0.89% | 9.52% | N/A
2018-05-06 | 5.60% | 21.62% | 0.00% | 9.52% | N/A
2018-05-13 | 5.58% | 27.59% | 0.00% | 9.38% | N/A
2018-05-20 | 5.55% | 60.00% | 0.00% | 8.11% | 9.09%
2018-05-27 | 3.68% | 15.38% | 0.00% | 15.38% | 6.94%
2018-06-03 | 4.49% | 23.08% | 0.00% | 16.22% | 8.57%
2018-06-10 | 4.04% | 17.65% | 0.00% | 17.65% | 9.84%
2018-06-17 | 4.27% | 15.38% | 0.00% | 14.63% | N/A
2018-06-24 | 3.64% | 14.00% | 0.00% | 2.94% | N/A
2018-07-01 | 3.25% | 12.50% | 0.00% | 3.57% | N/A
2018-07-08 | 3.96% | 15.25% | 0.00% | 2.08% | N/A
2018-07-15 | 4.32% | 27.27% | 0.00% | 3.70% | 6.45%
2018-07-22 | 4.46% | 28.57% | 0.00% | 7.59% | 5.56%
2018-07-29 | 5.05% | 21.43% | 0.00% | 11.63% | 8.57%
2018-08-05 | 5.73% | 21.74% | 0.00% | 21.74% | 7.32%
2018-08-12 | 6.42% | 25.00% | 0.00% | 17.86% | 15.58%
2018-08-19 | 5.83% | 33.33% | 0.00% | 33.33% | 16.22%

NNT is correct that these two users are generally higher than the weekly average, and I think he's got it right by keeping the comparisons within the context of approvers with similar approval numbers.

As NNT pointed out, these users have relatively few approvals and regularly take breaks (weeks for Akaineko, months for Hat Vangart).
Given that reports operate on the last 30 days, if an approval gets downvoted it takes four reports for it to fall off. This has a particularly high impact on low-volume approvers.

I decided to take a closer look at what kind of images they've been approving:

  • Hat Vangart had a pretty good start on July 15, staying mostly within 2% of the average. But somewhere in the previous week approver:Hat_Vangart date:>2018-08-05...<2018-08-12 he approved a certain Miku post that earned him at least 5 downvotes. This set a new high record by a fairly large margin, and he's been in a bad spot from August 12th onward.

Right now, both of these users are seeing the highest downvote/approval ratios of their careers, almost double their previous highs.
Then this thread comes along, and it just so happens that downvotes are being analyzed.

I think it'd be fair to take additional performance indicators into account as well, such as deletion ratios, scores, etc., before writing them off.
And since nobody has talked to these users, given them feedback or dmailed them (to my knowledge), they never got the chance to step up their game. Their demotion is quite sudden.

Finally, one last thing to consider about using votes as a measure:
I've been told many times that scores and favourites are not a good indicator of what images belong on Danbooru.
The number of highly scored and heavily favourited images that are deleted speaks to that: score:>10 status:deleted
The opposite is true as well: score:<0 status:active

So why are post votes now being used to screen approvers here?
Even unrestricted uploaders aren't judged by the number of votes their uploads garner, nor is vote count a requirement for promotion to unrestricted or approver status.

Unless that is changing too?

I'm sorry, but this is not a good analysis of the publicly available data. First of all, you can do a weighted median downvote percentage, and you'll notice that in nearly all of the reports, excluding the two months I mentioned earlier, Akaineko in particular is way above any possible linear fit. In short, he's the biggest outlier among all approvers in terms of how well received his approvals are by gold+ users. In fact, I'll go as far as to say that the only two months where his downvotes are below the average are also months where the average downvote ratio dropped to half, for example 2018-06-01, when the average downvote per approval was 1.14, down from the observed ~1.60-2.10 of other timeframes.
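
To be concrete about what I mean by a weighted median, here's a quick sketch with made-up numbers - the generic textbook definition, weighted by approval counts, not necessarily the exact formula Reportbooru uses:

```python
def weighted_median(values, weights):
    """Median of `values` where each value counts `weights[i]` times."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    cumulative = 0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    return pairs[-1][0]

# Made-up downvote percentages weighted by each approver's approval count:
# high-volume approvers dominate the baseline, so a low-volume outlier
# sitting at 33% ends up far above the weighted median.
percentages = [2.0, 4.5, 5.0, 16.0, 33.0]
approval_counts = [900, 400, 250, 80, 20]
print(weighted_median(percentages, approval_counts))  # -> 2.0
```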

I used unique downvotes because

  • they're unique users, so there cannot be an argument to be made for mass downvoting
  • people downvote much more rarely than they do upvote

Trying to plot data by average upvote or score makes no sense, especially in a site where most of the highest scored posts are fetish-centered. Do male_focus and scenery posts have lesser quality than tits&ass? The data would suggest that yes, big tits have a much higher value than both those other two categories combined, and in general bikini pictures are much better than anything else on the site. That's not really a valid line of reasoning. An even stronger argument can be made if you consider comic uploads. They're extremely popular outside of danbooru, and linked all over the place, but people who read them typically have no account to vote with. And yet one of the reasons danbooru is so beloved, along with the high quality of the active gallery, is its dedicated translators. There's tons of series that were partially popularized because of danbooru's translations (just look at pop team epic or all the yuri stuff), and yet never make it past a score of 10 despite having thousands of views.

Hence the only other way to determine how well received posts are is to look at how much pushback there is against certain approvers. And unique downvotes and the ratio of flags per post are the only publicly available data that stand up to scrutiny, given that the ratio for those two users didn't even improve after ceres was banned for flag abuse.

In particular, for the second argument: people upvote nearly anything that tickles their interest, but they only downvote if something is so hideous they can't bring themselves to ignore it - and the same can be said for flags. That's why downvotes are typically outnumbered by upvotes, and why they make a good indicator that something's gone wrong: they're so rare that if someone is consistently an outlier in them, and consistently receives more of them than the low average, there's something very wrong going on.

I think it'd be fair to take into account additional performance indicators as well, such as deletion ratios, scores, etc before considering to write them off.

This was already brought up: Akaineko has an 89% deletion ratio for reflagged approvals.

Hat Vangart is in a slightly better position, but only because he did not approve enough in the previous months to end up in the ranking, and because he has far fewer approvals. In the months where he did approve, he inevitably ended up being an outlier. Notice how he immediately jumped to fifth as soon as he surfaced in the last batch of reports where his name is featured, and stayed there well above the limit while everyone else who was not a lottery approver, or someone with four times fewer approvals, cycled. There's no argument to be made for single approvals ruining the score, because this is a six-week timeframe.

Even if you consider some random months, he always stays at the top, with a ratio several times worse than any other manually selected approver.

In fact his promotion was put into question the very first month he was promoted (along with Akaineko) (forum #131033, though this is more about Akaineko than Hat Vangart).

If you don't trust me you can plot the data yourself, representing each approver as a series in a 2D graph, and you'll see that while hand-picked approvers have ups and downs around a mostly average score, random approvers end up staying distinctly separated from the rest. It could be further improved with a proper weighted median, but that would take quite some time.

In short, Albert asked for numerical proof that these approvers were indeed a bad choice, and the numbers show some pretty damning evidence. I understand that this kind of talk is not exactly accessible to people who have little practice with statistical analysis (god knows how much I hate it, but you gotta do what you gotta do), but there's no other way to explain it than "downvotes and reflags are the best available way to determine how good an approver you are, and by all measures your numbers look very bad". This is even more true when their percentages are much worse than those of people who have approved ten to twenty times as much.


nonamethanks said:

I'm sorry, but this is not a good analysis of the publicly available data. First of all, you can do a weighted median downvote percentage, and you'll notice that in nearly all of the reports, excluding the two months I mentioned earlier, Akaineko in particular is way above any possible linear fit. In short, he's the biggest outlier among all approvers in terms of how well received his approvals are by gold+ users. In fact, I'll go as far as to say that the only two months where his downvotes are below the average are also months where the average downvote ratio dropped to half, for example 2018-06-01, when the average downvote per approval was 1.14, down from the observed ~1.60-2.10 of other timeframes.

I'm not disputing that they were consistent outliers when it came to unique downvotes on their approvals.
The use of a weighted median to demonstrate their deviations wasn't mentioned in your initial analysis, nor do I think it's necessary, since you already made a very clear case.

What I was pointing out was that the latest report (the one that received the most in-depth analysis) also coincidentally contained these users' worst downvote-to-approval ratios in their careers as approvers. Did this record-breaking magnitude have a more alarming effect on the immediate decision to demote them?

Or perhaps the fact that they were constant outliers by a large margin was reason enough, and the massive spike in unique downvotes in the latest report was just icing on the cake.

I used unique downvotes because

  • they're unique users, so there cannot be an argument to be made for mass downvoting
  • people downvote much more rarely than they do upvote

Trying to plot data by average upvote or score makes no sense, especially in a site where most of the highest scored posts are fetish-centered. Do male_focus and scenery posts have lesser quality than tits&ass? The data would suggest that yes, big tits have a much higher value than both those other two categories combined, and in general bikini pictures are much better than anything else on the site. That's not really a valid line of reasoning. An even stronger argument can be made if you consider comic uploads. They're extremely popular outside of danbooru, and linked all over the place, but people who read them typically have no account to vote with. And yet one of the reasons danbooru is so beloved, along with the high quality of the active gallery, is its dedicated translators. There's tons of series that were partially popularized because of danbooru's translations (just look at pop team epic or all the yuri stuff), and yet never make it past a score of 10 despite having thousands of views.

Hence the only other way to determine how well received posts are is to look at how much pushback there is against certain approvers. And unique downvotes and the ratio of flags per post are the only publicly available data that stand up to scrutiny, given that the ratio for those two users didn't even improve after ceres was banned for flag abuse.

In particular, for the second argument: people upvote nearly anything that tickles their interest, but they only downvote if something is so hideous they can't bring themselves to ignore it - and the same can be said for flags. That's why downvotes are typically outnumbered by upvotes, and why they make a good indicator that something's gone wrong: they're so rare that if someone is consistently an outlier in them, and consistently receives more of them than the low average, there's something very wrong going on.

You make a very convincing case here, I can see how the gravitas behind a unique downvote far outweighs that of an upvote.

I am one of those people who read the translated images and comics on Danbooru on a frequent basis (the amount of positive feedback I give out to translators is testament to this), and I've always wondered why comics with an enthusiastic fanbase often have meh scores, or why a skillfully designed scenic piece with landscapes and architecture consistently loses out to explicit smut.

I guess upvotes are mostly from the lowest common denominator and aren't worth much?

In short, Albert asked for numerical proof that these approvers were indeed a bad choice, and the numbers show some pretty damning evidence. I understand that this kind of talk is not exactly accessible to people who have little practice with statistical analysis (god knows how much I hate it, but you gotta do what you gotta do), but there's no other way to explain it than "downvotes and reflags are the best available way to determine how good an approver you are, and by all measures your numbers look very bad". This is even more true when their percentages are much worse than those of people who have approved ten to twenty times as much.

My main concern was that the scope of factors seemed too focused on what these two approvers did poorly at (downvotes, reflags), while not considering some of the aspects they did better at (negative scores, upvote medians, etc.).
I understand now why unique downvotes were the point of contention here.

  • However, I still think the demotion was too abrupt.

As far as I can tell, this was the first time people were demoted based on data from Reportbooru. Unique downvotes can only be found in those reports, and I don't know how many people make it a habit to track their performance there. There needs to be some kind of procedure to warn people to step up their game and pay attention to their numbers in the user reports - probably a DMail or a feedback citing the report and where they're doing poorly.

Otherwise, users who could have improved if given the chance will be unfairly blindsided, like Akaineko and Hat Vangart were. At least in my opinion, it was unfair.

I'll stick my neck out and mention that Vangart didn't even get that many flags during his tenure either, so he had little to go on to think he was doing badly.
I think he had good reason to ask why he wasn't warned and for advice on how he could have improved.

Speaking of which, I'll pay more attention to these numbers and see what I can do better as an approver.

Thanks for your explanation and patience with me.

albert said:

I appreciate everyone's sincerity and passion. I am listening and reading and I am keeping an open mind about things.

"sincerity"

How mighty hypocritical of someone who only listens to raw numbers and doesn't even bother to take the actual person into account.

If ANYONE had told me: "Dude, your numbers are not so great. Try to step up your game within a month, or we'll have to take your approval rights.", I would've been cool with it.

I've been a member of this site for over 10 years and I believe I've come to know a tiny little bit about quality. I might've been a bit more lenient with my approvals, but I always try to be fair and base them on how the artist generally fares, whether the post is part of a pool, etc. I never think about "Hmm, how many upvotes would I get for this image?"

Case in point:
approver:Hat_Vangart score:0 age:<1y

Out of 23 images,

  • 8 images are part of a cute DBZ sketch series.
  • 3 are each part of a well-liked pool.
  • And the rest were simply too unfortunate to be popular.

As for the THREE negative scores:
approver:Hat_Vangart score:<0 age:<1y

And finally:

I had no idea you folks are being THIS strict about it.

That said, if all you care about are strict numbers, like scientists handling sensitive data or, worse, marketing strategists, then I no longer want to be a part of your so-called quality control.

Squishy:

Otherwise, users who could have improved if they were given a chance will be unfairly blindsided, like Akaineko and Hat Vangart were. At least in my opinion I think it was unfair.

Thanks, at least someone is being reasonable here.


Hat_Vangart said:

A random selection of posts you approved with several downvotes, from your last 200 posts (fetched via the api):

post #3217919
post #3215196
post #3215193
post #3200740 (this one, I'll admit, I've downvoted myself earlier before entering this thread, so you might discard it if you want)
post #3199519
post #3193703
post #3193702
post #3145751
post #3136131

Just because the score is zero or above doesn't mean a post is well received - sometimes it's just an unusually large number of downvotes balanced by an average number of upvotes. In fact, there are 2.2M posts with a score above 2, which shows well enough that a positive score means nothing in the face of fetishes.
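
For reference, this is roughly how a selection like the one above can be pulled via the API. The post JSON only exposes aggregate up_score/down_score totals (not the per-user unique downvotes from the report), so the down_score cutoff here is just a rough proxy:

```python
import requests

BASE = "https://danbooru.donmai.us"

def recent_approvals(approver, limit=200):
    """Fetch the most recent posts approved by the given user."""
    r = requests.get(f"{BASE}/posts.json",
                     params={"tags": f"approver:{approver}", "limit": limit})
    r.raise_for_status()
    return r.json()

posts = recent_approvals("Hat_Vangart")
for p in posts:
    # down_score is assumed to be a (negative) total of downvotes on the post
    if p.get("down_score", 0) <= -3:
        print(f"post #{p['id']}  score {p['score']}  down_score {p['down_score']}")
```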

I don't know how you could have possibly missed the whole ordeal of people questioning the quality of random approvers' approvals, given that it started as soon as the promotions were handed out (over a year ago) and involved several feedback exchanges and forum topics - though your partial inactivity might have been the cause. The whole issue was that you were given the privilege (almost) at random. I'm not sure if it was a matter of matching favorites with albert or something similar, but there was no on-site merit associated with it. You can choose to discard what I wrote in previous posts if you want, but it doesn't change the fact that your approvals were very subpar compared to the rest of the approval team.

That said, I don't want to take part in a name-calling contest, so I'll just leave it at that.

nonamethanks said:

A random selection of posts you approved with several downvotes, from your last 200 posts (fetched via the api):

How many downvotes did each of those get?
How can I check the unique downvotes on my approvals?
How come nobody told me that downvotes could put my approval privileges at risk?

I don't know how you could have possibly missed the whole ordeal of people questioning the quality of random approvers' approvals, given that it started as soon as the promotions were handed out (over a year ago)

I don't frequent the forums, so I'm afraid that I completely missed it.

Oh well.

I'm fairly sure that the users will be much happier with the new and improved quality of the site. Now that you've removed the weakest link (based purely on a small number of downvotes), all the flagging drama is taken care of. Good job.

I don't think I've used flags all that much, only about once or twice a month on average, and mostly for off-topic posts and rule-breaking content. If flagging purely for poor quality went away, I probably wouldn't be affected much. It would still be nice to have the option of doing so, however, for those times when an approver or unrestricted uploader makes a mistake and a bad image makes it into the gallery. I'm not talking about images with borderline proportions or an easily-overlooked anatomy error (this is what error and related tags are for), I'm talking about glaringly obvious defects like visible compression artifacts or generally amateurish/sloppy artwork. Flags are, for now, our best means of dealing with posts that the moderation queue failed to keep out.

Also, eliminating quality flags as a means of preventing drama seems a bit... naive, I guess. All you're doing is sweeping problems under the rug; the bad attitudes and hostility will remain, and the people who would use flags for harassment and personal grudges will just find some other tactic to pursue their agendas, like creating forum topics to call out other users. If some tiny minority of users are abusing the flag system, the proper solution is to deal with them on an individual basis, not to deprive everyone else of a valuable tool.

albert said:

Regarding unlimited uploaders:

Flagging is just a bandaid solution. I don't think a dozen deletions a week is a real deterrent to heavy uploaders. If you perceive one of the unlimited uploaders to be bad, then you should message me and we can discuss whether it makes sense to revoke the privilege. It's usually the case that while their average quality is low, and you can always point to a few egregiously bad examples, their uploads are a net positive for the site. But that's an assumption I'm making. It should be handled on a case by case basis.

Do you really see deletion as a mere deterrent? To me, it's a matter of good housekeeping. New users and prospective uploaders who browse the gallery and find things like this or this might be left with the impression that posting such images is condoned, or at least tolerated. It brings down the overall quality of the gallery. Cleaning up such posts sends the message that "high-quality anime-style art and doujinshi" isn't just a meaningless phrase, and that we're serious about holding all users to the same standard.

Squishy said:

Since flagging is a crucial part of the curation process, I don't think it is a good idea to neuter the flagging system's ability to clean up poorly drawn images.

Especially not without a replacement process to take its place. Even then, I can't think of a better system than what we have now.

Back before it got purged, you could have blacklisted poorly drawn in order to filter some of the worst examples of art quality from your search results. I wouldn't call this a "better" system by any means, mind you. The tag was extremely subjective and there were no checks on its use, unlike the system of flags and appeals. But if you wanted a "replacement process" for flags, it's the closest thing we've ever had.

One of the stated reasons for getting rid of this tag, in fact, was that anything deserving of the tag could just be flagged instead. In the event that poor quality ceases to be a legitimate reason for flagging, maybe this tag will make a resurgence? How ironic.

So anyway, I was digging into the forum archives and my old messages, and now I understand why everybody thought the lottery approver thing was a bad idea.

Test janitor mail I received from albert:

Janitors on Danbooru are responsible for helping maintain a high level of quality on the site. They approve uploads from other users and help with other moderation efforts. You would be expected at a minimum to approve a few posts a week. If you are interested, please respond to this message.

And the approval message:

DanbooruBot said:

You have been selected as a test janitor. You can now approve pending posts and have access to the moderation interface. You should reacquaint yourself with the howto:upload guide to make sure you understand the site rules.

Over the next several weeks your approvals will be monitored. If the majority of them are not quality uploads you will fail the trial period and lose your approval privileges. You will also receive a negative user record indicating you previously attempted and failed a test janitor trial.

There is a minimum quota of 1 approval a month to indicate that you are being active. Remember, the goal isn't to approve as much as possible. It's to filter out borderline-quality art.

This is ALL I received to get me started.

Here's what should've been added:

Here's an approval role. We can *always* take it away on short notice for any reason, even after the unspecified test period of several weeks is over.

Read the rules. ALL of them. We're not going to point you to any specific rules set for approvers, it's your responsibility to figure everything out yourself on this massively complex system and the incredibly impersonal and unwelcoming forums.

A degree in art and anatomy is also mandatory for any good approver.

If you don't understand anything, ask someone.
Don't know anybody? Don't know *what* to ask?
Your problem. We're not your friend.
