Danbooru

Cross-site tag definition sharing

Posted under General

I've been very interested in Danbooru-like collaborative folksonomic tagging for a while now, but one thing that irritated me is that tags only exist for each community.

This causes issues when trying to share tagging data between sites: cap may mean a screen capture on one site, but a type of hat on another.

Aliases, implications and documentation help solve this issue, but they are part of each site's internal infrastructure, so they only help if all sites use the same setup and definitions.

I think there should be an agreed-upon standard that permits tag definitions (including their implications, aliases and documentation) to be shared amongst sites.
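To make the idea a bit more concrete, here is a minimal sketch of what a shareable tag definition could look like. It is not an existing standard or Danbooru's actual schema: the TagDefinition class, its field names, and the export/import helpers are all hypothetical, and the example alias is made up for illustration.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class TagDefinition:
    """One shareable tag record. All field names here are hypothetical."""
    name: str                                          # canonical tag name
    aliases: list = field(default_factory=list)        # names that redirect to `name`
    implications: list = field(default_factory=list)   # tags added whenever `name` is applied
    documentation: str = ""                            # wiki-style description


def export_definitions(defs):
    """Serialize definitions to JSON so another site could import them."""
    return json.dumps([asdict(d) for d in defs], ensure_ascii=False, sort_keys=True)


def import_definitions(payload):
    """Rebuild TagDefinition objects from an exported JSON payload."""
    return [TagDefinition(**item) for item in json.loads(payload)]


# Made-up example data: "cap" defined as a hat rather than a screen capture.
cap = TagDefinition(
    name="cap",
    aliases=["caps"],
    implications=["hat"],
    documentation="A type of hat (not a screen capture).",
)
round_trip = import_definitions(export_definitions([cap]))
```

A site importing this would know that its own cap tag and the sender's cap tag mean the same thing only if both agree on the definition, which is the part that needs the shared standard.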

The benefits of this are that smaller danbooru-likes get access to a richer and more complete library of tags, and that shared tags become easier to share and sync between sites (no translation/abstraction between definitions).

The problem with this is that it increases the complexity of attaching tags, as izumi_konata is a lot easier to remember and type than http://example.com/izumi_konata.
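One possible way to soften the typing problem, sketched below, is a default namespace: users keep typing short names and the site expands them to full identifiers behind the scenes. The function names and the DEFAULT_NAMESPACE constant are assumptions for illustration; the URL is just the placeholder from this post.

```python
DEFAULT_NAMESPACE = "http://example.com/"  # placeholder namespace, not a real service


def expand_tag(tag, namespace=DEFAULT_NAMESPACE):
    """Turn a short tag into a full identifier; leave full URLs untouched."""
    if tag.startswith(("http://", "https://")):
        return tag
    return namespace + tag


def shorten_tag(identifier, namespace=DEFAULT_NAMESPACE):
    """Show the short form for tags that live in the site's default namespace."""
    if identifier.startswith(namespace):
        return identifier[len(namespace):]
    return identifier


full = expand_tag("izumi_konata")
short = shorten_tag(full)
```

Tags from another site's namespace would still have to be written out (or given a prefix), so this only hides the complexity for the common case.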

I'm interested in what people think of this, and what benefits and problems they see with it. Any suggestions?

Updated by rantuyetmai

There are a variety of reasons this will never happen, including but not limited to: each site believing it is correct, different sites having different needs for qualifiers, and, most importantly, the lack of cooperation between sites.

There is some cooperation with 3dbooru with common staff. I attempt to unify tags when I can and it makes sense but sometimes danbooru insists on using a tag I find ridiculous or I need a qualifier when danbooru doesn't and vice-versa. These two sites are probably the closest to cooperating and even that's pretty minimal.

I believe the lack of cooperation is primarily due to the difficulty in cooperating. It seems the only way to 'cooperate' is just scrape the site wholesale, which resembles mirroring rather than cooperation (as well as being one-way).

Like the different tag names between sites, there is little to no incentive to mimic the style of another, as you have to create it by hand anyway.

Kitsu~ said:
I think there should be an agreed-upon standard that permits tag definitions (including their implications, aliases and documentation) to be shared amongst sites.

For the record, what you're talking about is basically an ontology, and you have a lovely career as an information scientist waiting for you if you can bring yourself to care about them.

That said, having worked a little with ontologies in real life, I can tell you they basically boil down to:

  • getting everyone together
  • getting them all to agree
  • getting them to do what they agreed on after they leave.

Unfortunately, all three of these steps are utterly and completely impossible under any circumstances. On the Internet? They are double impossible. Possibly even triple.

Kitsu~ said:
I've been very interested in Danbooru-like collaborative folksonomic tagging for a while now, but one thing that irritated me is that tags only exist for each community.

I'm interested in what people think of this, and what benefits and problems they see with it, any suggestions?

As glasnost points out, you aren't the first to bash your head against this wall, and I guarantee you won't be the last. We've been having issues with this semantic metadata stuff since the mid-'90s when Tim BL started writing about it; the problem is as old as ZUN's art is bad.

This is why I find Danbooru, Danbooru in isolation, and Danbooru alone to be a fascinating subject. All the formalised ontologies in the world can't compete with real, useful results generated on the fly.

Finding a way to relate separate collaborative efforts with disparate philosophies about weight, value, and style is an open problem, as far as I'm concerned. NEPOMUK ontologies are a decent example of a start in the right direction, but they can only go so far, as they lack the necessary sentience and social maladjustment required to hold grudges and differences of opinion that can be reasoned about but not reconciled.

I actually think this would be an awesome idea, and I've thought about how I would attempt to put a dent in it in the past, but it's a huge problem, and extremely complicated to say the least. Also, I'm pretty sure the best you could do is create a descriptive site or system that attempts to join mostly synonymous concepts across sites. I think Danbooru's policy would be hard enough to modify to begin with; trying to change Gelbooru's or Pixiv's would be impossible.

I've made my thoughts on tag ontologies clear in the past. They'd be a great addition to the site, but also a lot of work to put together, especially since we have no less than 118k tags and growing.

Shinjidude said:
I've made my thoughts on tag ontologies clear in the past. They'd be a great addition to the site, but also a lot of work to put together, especially since we have no less than 118k tags and growing.

I agree. But don't sell it short: We have proven that a mostly-flat ontology is useful, usable, and extensible to three-quarters of a million (and growing) posts.

Well, in the larger sense, Danbo has scaled so far because it is mostly based on a naive flat ontology: it's an accessible and conceptually simple model that makes for a low barrier to entry. Disturbingly, that's still too complex for large segments of the population. The underlying issue, then, is still in how we expose both read- and write-access to semantic systems for users.

Shinjidude said:
I've made my thoughts on tag ontologies clear in the past. They'd be a great addition to the site, but also a lot of work to put together, especially since we have no less than 118k tags and growing.

And that'd be an ontology that covers what danbooru does. Doing the same for gelbooru or pixiv would be impossible, because they have no bloody tagging standards. Pixiv is especially atrocious in this regard, with its ocean of single-use, idiotic crap such as 画像が表示されない・・・だと・・・? (roughly, "the image won't display... you say...?"). As far as tagging goes, these sites are a collection of retards, running around frothing at the mouth and peeing randomly on whatever they happen to lay their eyes upon.

I've discussed that previously (in other places than danbooru), and for us, being an isolated island is a Good Thing™. Trying to join tags with anybody else is not only pointless and not worth it, it'd be actively harmful if you attempted it with cesspools like gelbooru or (God forbid) pixiv. We're an isolated island simply because nobody else comes even close to the amount of care we put in the tagging and other forms of quality control; trying to "extend" that to some other site would be like mixing wine and sewage. You don't get more wine out of it, you can only get more sewage and waste good wine.

Basically, unless it's a sister site (like oreno or 3dbooru), that additionally tries very hard to stay in sync, the whole ontology idea would backfire big time.

My thoughts were along two lines: one being a dictionary for mapping the tags that do make sense across sites, and the other being a useful ontology within our own set of tags.

Using Pixiv or Gelbooru to fully automatically tag posts on Danbooru would be an awful idea.

I'm not exactly sure what an ontology internal to our tags would be or do.

As for mapping to any subsets of cross-site tags, I still think it's extraordinarily pointless and useless, because the only way you could apply the results safely would be from the better source (danbooru) to the worse one. I can only see the use as a suggestion tool for some very, very limited sets of tags, namely copyrights / characters. But then trying to maintain a mapping to anything on pixiv would be a nightmare -- have fun keeping up with things like やおよいおり (which means takatsuki_yayoi minase_iori), or the multitude of horribly ambiguous variations of names, diminutives and nicknames they use for characters.

An ontology internal to our tags would basically supplant things like implications, aliases, and combo tags, and would formally encode things like tag groups. It would allow tags to be structured and searched in a hierarchical nature.
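As a rough illustration of what that could mean in practice, here is a small sketch that resolves aliases and then follows implications transitively, so that a tag pulls in all of its ancestors. This is not how Danbooru actually implements it; the normalize function and the alias/implication data are made up for the example.

```python
def normalize(tags, aliases, implications):
    """Resolve aliases, then add every tag implied (transitively) by the result."""
    resolved = set()
    pending = [aliases.get(t, t) for t in tags]
    while pending:
        tag = pending.pop()
        if tag in resolved:
            continue  # already handled; this also guards against implication cycles
        resolved.add(tag)
        # Queue the parents of this tag, resolving aliases on the way.
        pending.extend(aliases.get(t, t) for t in implications.get(tag, ()))
    return resolved


# Hypothetical data: one alias and a two-level hierarchy.
example_aliases = {"bb_cap": "baseball_cap"}
example_implications = {"baseball_cap": ["cap"], "cap": ["hat"]}

expanded = normalize(["bb_cap"], example_aliases, example_implications)
```

With the parents expanded at tagging time, a search for hat would also match posts tagged only with baseball_cap, which is one simple reading of searching in a hierarchical nature.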

Although I agree mapping things across sites would be more useful to browsers of other sites, I think there is some utility to us in going from Pixiv to Danbooru. The limited Japanese → English mappings available in the Pixiv Translation Plus Greasemonkey script already show this to some degree.

If the understood tags were translated into our tagset rather than English they would provide useful suggestions for people importing images here. It could in no way be perfect, and as you note, there is going to be noise and cruft no matter what, but I don't see how it would be "extraordinarily useless".
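A minimal sketch of that kind of suggestion tool, assuming a hand-maintained dictionary from Pixiv tags to Danbooru tags. The suggest_tags function and the PIXIV_TO_DANBOORU name are hypothetical; the single mapping entry is the やおよいおり example mentioned earlier in the thread.

```python
# Hand-maintained mapping; one foreign tag may map to several local tags.
PIXIV_TO_DANBOORU = {
    "やおよいおり": ["takatsuki_yayoi", "minase_iori"],
}


def suggest_tags(foreign_tags, mapping):
    """Return (suggestions, unmapped): local tag suggestions plus the noise we skipped."""
    suggestions = []
    unmapped = []
    for tag in foreign_tags:
        targets = mapping.get(tag)
        if targets is None:
            unmapped.append(tag)  # unknown or single-use tags are ignored, not guessed at
            continue
        for local_tag in targets:
            if local_tag not in suggestions:
                suggestions.append(local_tag)
    return suggestions, unmapped


suggested, skipped = suggest_tags(
    ["やおよいおり", "画像が表示されない・・・だと・・・?"], PIXIV_TO_DANBOORU
)
```

The output is only a list of suggestions for a human uploader to accept or reject; nothing here tags a post automatically.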

Ok, here's the userscript with Log's additions hardcoded in. If you use the PTP Menu, the new set of tags shows up in its own list a bit off kilter, but I don't feel like playing with the CSS to get it to line up exactly right. Also note that this might get overwritten the next time the script is officially updated.

http://pastebin.com/kaWqVhB3

Anyway I guess this is a bit off on a tangent, but this sort of thing might be of use to anyone that decides to try building a dictionary/ontology.

Shinjidude said:
It could in no way be perfect, and as you note, there is going to be noise and cruft no matter what, but I don't see how it would be "extraordinarily useless".

FTR, that was the only part of it that wouldn't :)

Shinjidude said:
An ontology internal to our tags would basically supplant things like implications, aliases, and combo tags, and would formally encode things like tag groups. It would allow tags to be structured and searched in a hierarchical nature.

You know, those tags, implications, aliases, and even the wiki articles in a bizarre improvisation... those are an ontology. The cruft and the requirement of human intervention are the nature of the beast. We could build better linkage. We could encode these things differently. We could expand it a bit and allow it to be a little less flat. But the best that can be achieved is the strongest weak ontology.

Again, don't sell it short because of the clunky implementation: this is really something, here.
