Danbooru


Topic: Feature Support for Major Source Sites

Posted under General

Mikaeri

I think it's useful to have a topic where we can see and discuss the current progress of feature support for the major sites we upload from, so I've compiled the table below:

Legend:
◯ - supported
✕ - unsupported
― - unsupportable (to the best of my knowledge), or unnecessary
? - not sure

- | Pixiv | Twitter | Nicoseiga | Nijie | Tinami | Tumblr | Pawoo | DeviantArt | ArtStation | Drawcrowd | BanCiYuan | Medibang | E-Hentai | Yandere/Konachan
Works with link/bookmarklet (HTML, single)◯*
Works with bookmarklet (batch)―**―**◯*―**―**―**
Works with link/bookmarklet (direct image URL)◯*
Corrects sample sizes✕***✕***✕***
Supports direct image URL as source
Bookmarklet fixes source link (to HTML with proper referrer)
Translated tags
Commentary fetch
image sample/md5 mismatch/bad id scripts run regularly
Replacement scripts run regularly (topic #14063, replaceme)✕****
Search metatag available (e.g. pixiv:62729688)
Howto available
Userscript support????????
API documented?
Indexed by SauceNAO◯*****
Indexed by ascii2d◯*****

* Does not yet work with custom domains.
** These sites use a single entry for each image.
*** /large/ on Drawcrowd, /w650 on BanCiYuan, /440_330/ on Medibang
**** Not possible to replace from Amazon S3 yet.
***** Very partially indexed.
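The sample markers in footnote *** lend themselves to a quick mechanical check. Below is a minimal Python sketch, assuming the marker alone is enough to flag a URL as a sample for that site; the names `SAMPLE_MARKERS` and `is_sample` are hypothetical, and the example URLs in the test are made up.

```python
# Path fragments that, per footnote *** above, indicate a sample rather than
# the original image. Hypothetical table; real URLs may use other markers too.
SAMPLE_MARKERS = {
    "Drawcrowd": "/large/",
    "BanCiYuan": "/w650",
    "Medibang": "/440_330/",
}


def is_sample(site: str, url: str) -> bool:
    """Return True if url contains the known sample marker for site.

    Sites without a known marker always return False.
    """
    marker = SAMPLE_MARKERS.get(site)
    return marker is not None and marker in url
```

The marker is looked up per site rather than matched against every URL, because the same fragment can mean something else elsewhere: on Weibo, for instance, /large/ is the full-size version.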

If anyone has more ideas (additional sites and whatnot) or ways to add support, feel free to contribute. This doesn't cover every available site; I've left some out because they're either very rarely uploaded from (e.g. Weibo, FC2) and/or defunct (e.g. Twitpic).

Wix support might be added, and I'm in favor of it. A number of artists have been using Wix to host images of their artwork (post #2743138, post #2744357, etc.), although it's still rare. Some notes:


Updated by Mikaeri

  • ID: 132211
  • Type-kun

    I don't think E-Hentai is unsupportable; we can always parse the HTML if it comes to that, but it requires a lot of investigation, like any source that isn't a Pixiv lookalike. E-Hentai samples are a common problem, by the way. I've seen it happen multiple times, because people assume the viewer shows the original and don't bother to look at the bottom of the page. See also issue #2413.
    Support for sadpanda (ExHentai) will come along automatically if we manage this, but it requires an E-Hentai account with access, which might also be a problem unless someone has a spare one to sacrifice or knows exactly how to get that access quickly.

  • ID: 132242
  • Mikaeri

    I get the feeling E-hentai support will be wonky though, given how strict they are with auto-downloaders and other things like that. If danbooru is limited to one account for access, it might trigger those same blocking schemes. Maybe I should put question marks there instead.

    @Grahf might know more... I hope.

  • ID: 132243
  • Mikaeri

    Some notes about Nijie:

    • Bookmarklet works from view.php but not from view_popup.php
    • Unsure if the bookmarklet will fix the source link from view.php when opening image in a new tab, but it definitely doesn't from view_popup.php
    • Batch bookmarklet does not currently work with Nijie.

    Notes on Medibang:

    • Sample size has /440_330/ in the URL -- there may be others. Sample link for post #2728375 looks like the following:

    EDIT: Nijie support has been added, so points are moot now.

    Updated by Mikaeri

  • ID: 132244
  • Grahf

    Well, there are a few things I know that would be relevant for EH.

    I believe all images are downscaled past a certain size unless you've made a minimum donation or ground out the site currency for a particular "perk", as it were.

    When I uploaded post #2740441, for example, I could just go into the gallery, open the image, view it, and run the bookmarklet from there, and it uploaded properly. I did have to change the source afterwards, though, from the gibberish image link to the gallery page itself.

    At the same time, I might be able to get away with that because I have those perks. Anyone just browsing the site without them might not be so lucky.

    Truth be told, 99% of the time when I want to upload something from the galleries to Danbooru, I download it and upload the images manually, because there's less inherent risk in doing so.

    That being said, if there are any questions or you need more detailed information, I can try to help. I might have to take anything really technical to the site admin, though, and I'm not sure they'd be willing to answer. They might, but they also might not.

  • ID: 132245
  • BrokenEagle98

    Mikaeri said:

    I get the feeling E-hentai support will be wonky though, given how strict they are with auto-downloaders and other things like that. If danbooru is limited to one account for access, it might trigger those same blocking schemes. Maybe I should put question marks there instead.

    Just for reference, the last time I tried my hand at Ehentai, it IP banned me for a day after only 6 or so GETs spaced at least 2 minutes apart. To call their system trigger-happy would be severely downplaying it... :/

    However, with the above I was only doing a GET to the HTML page. Although I haven't tested it yet, I'm wondering if Ehentai requires a full browser simulation to avoid the ban detection, i.e. downloading everything from the page and making all of the appropriate HTTP calls that a browser would. To lend credence to the above point, I once did a test where I loaded at least 10 tabs in under 30 seconds all to different images in different galleries, and nothing was triggered.

  • ID: 132249
  • Mikaeri

    BrokenEagle98 said:

    Just for reference, the last time I tried my hand at Ehentai, it IP banned me for a day after only 6 or so GETs spaced at least 2 minutes apart. To call their system trigger-happy would be severely downplaying it... :/

    However, with the above I was only doing a GET to the HTML page. Although I haven't tested it yet, I'm wondering if Ehentai requires a full browser simulation to avoid the ban detection, i.e. downloading everything from the page and making all of the appropriate HTTP calls that a browser would. To lend credence to the above point, I once did a test where I loaded at least 10 tabs in under 30 seconds all to different images in different galleries, and nothing was triggered.

    That could be it. Although perhaps this discussion would be better held in private -- can @Grahf maybe get in touch with the admin about that? Because although this is for honest purposes, y'know how it goes -- people may snoop in on it.

  • ID: 132256
  • Grahf

    Mikaeri said:

    That could be it. Although perhaps this discussion would be better held in private -- can @Grahf maybe get in touch with the admin about that? Because although this is for honest purposes, y'know how it goes -- people may snoop in on it.

    I sent a PM to Tenboro asking if there's anything he'd be comfortable sharing. It might be a while before I receive a response though. I'll keep you guys posted.

    EDIT: He replied back saying "Some botlike behavior has rather aggressive ban triggers, but it depends of course on what they are trying to do. The API, for example, is exempt but still rate-limited to some extent."

    I think it might be better if I had more targeted questions to ask. So if anyone wants to post some here or send them to me, I can ask on everyone's behalf. I'll wait a day or two, then send off any questions I get.

    Updated by Grahf

  • ID: 132293
  • BrokenEagle98

    AFAIK, the E-Hentai API is pretty much useless for scanning for image samples or MD5 mismatches.

    However, if there were a way to get the file size and MD5 hash for the full-size image and all of the sample images, that would be useful. Since that's not available, I had to resort to scraping the HTML.
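    Once the bytes are in hand, the MD5/size comparison itself is simple; the hard part, as described above, is fetching them from E-Hentai without tripping the ban triggers. A minimal sketch using Python's standard `hashlib`, assuming you already have the post's MD5 and file size plus the candidate source file's bytes. The function names are hypothetical, and treating "bigger source file" as a sign that the post is a sample is a heuristic, not a rule.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest (32 lowercase hex characters)."""
    return hashlib.md5(data).hexdigest()


def compare_to_source(post_md5: str, post_size: int, source: bytes) -> str:
    """Classify a candidate source file against an existing post.

    "identical"     -> the hashes match
    "larger source" -> hashes differ and the source file is bigger
                       (heuristic hint that the post may be a sample)
    "different"     -> hashes differ and the source is not bigger
    """
    if md5_hex(source) == post_md5.lower():
        return "identical"
    if len(source) > post_size:
        return "larger source"
    return "different"
```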

  • ID: 132411
  • Mikaeri

    I've seen uploads from a few rarer sites, so I'm curious whether anyone still wants howtos for more obscure but active sites that don't see much use. Here are some candidates along with extra notes:

    • Wix -- I already explained how to get non-sample versions in the OP, but uploading from Wix sites seems largely incidental, since there are so many different ways a website can link to an artist's work (blog, homepage, etc).
    • Weibo -- This is a pretty easy one, actually. Just replace /bmiddle/, /mw690/, or whatever part of that URL reads after sinaimg.cn with /large/. post #2790775 is a recent example.
    • Cloudinary -- Information here. tl;dr: strip everything between /upload/ and the file's public ID (which includes folders, if there are any). You may want to keep the version segment (/v1/, /v2/, etc.), or experiment with it if you think the image has been changed.
    • Naver -- Naver is a real bitch to get full sizes from. If you have Chrome DevTools, open up your console, go to Sources. Click on the image to open up the full size image viewer if it's available. Then look for the images under the domain blogfiles.naver.net. Most of the time this will also be hidden under the mainFrame element.
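    The Weibo and Cloudinary tricks above are plain string substitutions, so they can be sketched as regular expressions. This is a minimal Python version under two assumptions: the Weibo size name is always the first path segment after the sinaimg.cn host, and the Cloudinary URL carries a version segment (v followed by digits); URLs without one are returned unchanged rather than guessed at. The function names and example URLs are made up for illustration.

```python
import re

# Weibo: the first path segment after *.sinaimg.cn is the size
# (bmiddle, mw690, mw1024, ...). Swap it for "large".
_WEIBO_SIZE = re.compile(r"^(https?://[^/]*sinaimg\.cn/)[^/]+/")

# Cloudinary: drop every transformation segment between /upload/ and the
# version segment (v followed by digits), keeping the version itself.
_CLOUDINARY_TRANSFORMS = re.compile(r"(/upload/)(?:[^/]+/)*?(v\d+/)")


def weibo_large(url: str) -> str:
    return _WEIBO_SIZE.sub(r"\1large/", url, count=1)


def cloudinary_original(url: str) -> str:
    return _CLOUDINARY_TRANSFORMS.sub(r"\1\2", url, count=1)


print(weibo_large("https://wx1.sinaimg.cn/mw690/0000abcd.jpg"))
# → https://wx1.sinaimg.cn/large/0000abcd.jpg
print(cloudinary_original(
    "https://res.cloudinary.com/demo/image/upload/w_400,c_fill/v12345/folder/art.png"
))
# → https://res.cloudinary.com/demo/image/upload/v12345/folder/art.png
```

    The Cloudinary pattern keeps the version segment, per the note above. Because the lazy `*?` stops at the first segment shaped like v<digits>, a folder name that happens to look like a version could still confuse it.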

    Updated by Mikaeri

  • ID: 133845
  • CodeKyuubi

    Mikaeri said:

    I've actually never seen /bmiddle/ before, it's always /mw690/ or /mw1024/, and on a single occasion I saw /woriginal/ which is /large/ except with horrifying compression artifacts, for whatever reason.

    For naver, you should always be able to find it under the mainframe element, but only if you expand the image to its full size at least once, if I'm remembering correctly. Life would be so much easier if they didn't try to disable right-clicking.

  • ID: 133846
  • Mikaeri

    CodeKyuubi said:

    I've actually never seen /bmiddle/ before, it's always /mw690/ or /mw1024/, and on a single occasion I saw /woriginal/ which is /large/ except with horrifying compression artifacts, for whatever reason.

    For naver, you should always be able to find it under the mainframe element, but only if you expand the image to its full size at least once, if I'm remembering correctly. Life would be so much easier if they didn't try to disable right-clicking.

    Yarr, that's why I mention the "Click on the image to open up the full size image viewer if it's available." part. Sometimes you can, sometimes you can't. Depends on the blog, but if you can't then it's usually already loaded to begin with.

    Were there other sites, though? I recall there was one site like Cloudinary (cloud storage), but I can't remember off the top of my head what it was. The URL structure is extremely similar.

  • ID: 133847
  • Mikaeri

    Yeah, I remember now. The site was Jimdo. Images hosted on a Jimdo site are located at https://image.jimcdn.com/

    Example:

    I couldn't find any other documentation on this.

    What else to remark... right, so I'll have some free time after I finish an upload batch I have planned, and some new candidates to add to forum #133845 have popped up.

    In addition to stuff I have shelved to write:

    There are these:

    Mihuashi is the most recent high interest discovery. Most images from Mihuashi match these two searches:

    If anyone wants a howto written for any of these sites sooner rather than later, I'd love to know, so that I put my efforts someplace they won't be wasted.

    Updated by Mikaeri

  • ID: 133996
  • Mikaeri

    http://horne.red/

    This site seems to be owned by the same company as Nijie, and thus shares practically the same interface. The only difference is the type of content it focuses on, which seems to be aimed at fujoshi.

    @evazion Do you think we'd be able to add support for the site given our support for Nijie? I don't know if anyone has ever uploaded from it yet (I haven't checked), but some links do exist in a few artist entries.

  • ID: 135047