Danbooru

HTML scrubber escaping linebreaks causes problems

Posted under Bugs & Features

Since linebreaks don't normally display in HTML, the HTML scrubber for notes escapes them by inserting <br> tags. Apparently it does this without even parsing the HTML first, which means that notes with fancy markup like those for the titles for the Life of Maid comics (pool #678) can't be broken up for better readability without

<div style="font-size: 350%;
            color: #82C2FC;
            font-style: italic;
            font-weight: bold;
            white-space: nowrap;
            text-shadow: -1px   -1px   0     #080E65,
                          1px   -1px   0     #080E65,
                         -1px    1px   0     #080E65,
                          1px    1px   0     #080E65,
                         -2px   -2px   0     #080E65,
                          2px   -2px   0     #080E65,
                         -2px    2px   0     #080E65,
                          2px    2px   0     #080E65,
                         -2.5px -2.5px 0.5px #080E65,
                         -2.5px  2.5px 0.5px #080E65,
                          2.5px -2.5px 0.5px #080E65,
                          2.5px  2.5px 0.5px #080E65;
            margin: -4px;
            padding: 4px /* tee hee */
            ">
    Life of Maid #27
</div>

turning into <div style="font-size: 350%;<br> color: #82C2FC;<br> font-style: italic;<br> font-weight: bold;<br> white-space: nowrap;<br> text-shadow: -1px -1px 0 #080E65,<br> 1px -1px 0 #080E65,<br> -1px 1px 0 #080E65,<br> 1px 1px 0 #080E65,<br> -2px -2px 0 #080E65,<br> 2px -2px 0 #080E65,<br> -2px 2px 0 #080E65,<br> 2px 2px 0 #080E65,<br> -2.5px -2.5px 0.5px #080E65,<br> -2.5px 2.5px 0.5px #080E65,<br> 2.5px -2.5px 0.5px #080E65,<br> 2.5px 2.5px 0.5px #080E65;<br> margin: -4px;<br> padding: 4px /* tee hee */<br> "><br> Life of Maid #27<br> </div> in the note body and breaking completely.

Is there any reason that the desired linebreak-escaping behavior can't be accomplished with the white-space: pre-line CSS property? This would have the additional advantage that translators could suppress the behavior by overriding the attribute in an inline style.

Updated

I feel the same way, but doing so without a plan would currently screw up a lot of notes that rely upon that mechanic.

My thoughts on this would be to have some kind of marker in the note, similar to what is currently being used for embedded notes (About:Embedded Notes). The presence of this marker in the note text would turn off the \n -> <br> conversion for the entire note body. This would preserve the functionality of old notes, while allowing new notes to use the new capability.

Experimenting a bit more, I see that my original post was somewhat mistaken. The scrubber does parse the HTML first. The substitution doesn't happen inside tags, only inside attributes. The following code behaves correctly:

<div
style="font-style: italic"
>Hello world</div
>

The following, syntactically identical code does not:

<div style="font-style:
italic">Hello world</div>

This is just a bug.

BrokenEagle98 said:

I feel the same way, but doing so without a plan would currently screw up a lot of notes that rely upon that mechanic.

The 'white-space: pre-line' property makes it so that \n and <br> are treated exactly the same, except inside tags. This behavior appears to be exactly the same as the scrubber's \n -> <br> substitution, except without this bug and at a native CSS level. This would mean that the only notes that would change would be ones that are broken by the bug in the place.

Updated

Lefkadios said:

The 'white-space: pre-line' property makes it so that \n and <br> are treated exactly the same, except inside tags. This behavior appears to be exactly the same as the scrubber's \n -> <br> substitution, except without this bug and a native CSS level. This would mean that the only notes that would change would be ones that are broken by the bug in the place.

Many notes I write, including those I've seen from others, sets the white-space property to nowrap, because otherwise the note box will wrap the box in weird place. Those that do this usually rely on the \n -> <br> mechanic, and so those notes would be broken if the only thing that gets changed is to set the white-space value to pre-line. Bottom-line, that text transformation still needs to take place for those that rely on it.

BrokenEagle98 said:

Many notes I write, including those I've seen from others, sets the white-space property to nowrap, because otherwise the note box will wrap the box in weird place. Those that do this usually rely on the \n -> <br> mechanic, and so those notes would be broken if the only thing that gets changed is to set the white-space value to pre-line. Bottom-line, that text transformation still needs to take place for those that rely on it.

What about pre? Only difference between pre and pre-line plus the transformation is that pre preserves other kinds of whitespace as well. If we advise the use of pre and apply the new rules only to new posts, there's no problem.

Lefkadios said:

What about pre? Only difference between pre and pre-line plus the transformation is that pre preserves other kinds of whitespace as well. If we advise the use of pre and apply the new rules only to new posts, there's no problem.

pre makes everything nowrap.

https://developer.mozilla.org/en-US/docs/Web/CSS/white-space#values

That's great for when you want nowrap (occasionally), but terrible when you don't (most of the time).

The only way to make this scheme workable is having Javascript detect if the white-space of each element in a note is nowrap or not, and set either pre or pre-line accordingly.

It's really hard to make any changes at all to the note CSS without breaking old notes. I'm not confident we can make a change like this without breaking notes.

It was a mistake for us to allow white-space: nowrap to begin with. It allows translators to create notes that are unreadable on mobile.

evazion said:

It was a mistake for us to allow white-space: nowrap to begin with. It allows translators to create notes that are unreadable on mobile.

Since it's only an issue on mobile devices, just create a @media responsive CSS rule which overrides the element white-space property.

For myself, I don't view the availability of the white-space property as a mistake, since it allows me to make notes that are better looking. Many times the popup note box wraps the text in the worst places and the end result ends up looking pretty bad and amateurish. Additionally, there are times I want to replicate the presentation of the original text, which often has no word wraps itself.

BrokenEagle98 said:

Since it's only an issue on mobile devices, just create a @media responsive CSS rule which overrides the element white-space property.

For myself, I don't view the availability of the white-space property as a mistake, since it allows me to make notes that are better looking. Many times the popup note box wraps the text in the worst places and the end result ends up looking pretty bad and amateurish. Additionally, there are times I want to replicate the presentation of the original text, which often has no word wraps itself.

I agree with this. Notes left to wrap on their own lead to pretty clunky and unreadable things sometimes.

BrokenEagle98 said:

pre makes everything nowrap.

https://developer.mozilla.org/en-US/docs/Web/CSS/white-space#values

That's great for when you want nowrap (occasionally), but terrible when you don't (most of the time).

That's my point. I'm saying that new notes for which the translator doesn't want the words to be wrapped and but does want linebreaks to be preserved can just use pre instead of nowrap. The only way in which the behavior of pre differs from that of hard linebreak conversion plus nowrap is that pre preserves all whitespace whereas the linebreak conversion preserves only linebreaks.

evazion said:

It's really hard to make any changes at all to the note CSS without breaking old notes. I'm not confident we can make a change like this without breaking notes.

What if the behavior of the note were to depend on when the note was last edited, with the editing dialogue showing a warning if the notes contain a style that doesn't normally preserve linebreaks?

Updated

Lefkadios said:

What if the behavior of the note were to depend on when the note was last edited, with the editing dialogue showing a warning if the notes contain a style that doesn't normally preserve linebreaks?

Your idea is to have the behavior differ by when a note was last edited? A problem with that is that we wouldn't be able to properly set the default white-space values unless we were to add a CSS class to the note box which would turn that rule on or off, all the while still having to convert carriage returns to <br> for the old-style of notes.

Let me war game this out a bit so that it's clear what we're talking about:

Scheme #1

1. Add default white-space value for note elements
div.note-body *,
div.note-box-inner-border * {
    white-space: pre-line;
}
2. Remove conversion of \n -> <br>
Result:

Everything performs as expected, with the exception of notes which have the white-space value of nowrap, which unless the translator was using actual <br> elements, will now instead have no line breaks whatsoever and could possibly extend outside the screen.

Scheme #2

1. Add CSS class of "preline-note" for note elements if they were edited past a certain date
2. Add default white-space value for note elements
div.note-body.preline-note *,
div.note-box-inner-border.preline-note * {
    white-space: pre-line;
}
3. Remove conversion of \n -> <br> for elements if they were edited past a certain date
Result:

Everything performs as expected. However, there are now 2 separate changes that need to be added and maintained in the Javascript code. Additionally, the Ruby code for the server will have to be modified in order to render the timestamps of the notes into the note bodies, so that the Javascript has that information when it processes the notes.

Scheme #3

The same as scheme #1 with one or more additions.

A. All notes regardless of age are edited to change nowrap to pre

This could be done either by one or more users, or the server itself.

B. The server changes all notes when it renders them

This would only be done if the white-space value was nowrap

Result:

Everything performs as expected. If #A is done, then it would only be a one-time operation and would require no changes in the server code. If #B is done, then it future-proofs any notes that users write in the future in case they do use the nowrap value.

Final thoughts

From all of the above, scheme #3 using option #B seems like the best route. The server is already scrubbing/sanitizing notes as it renders them, so this would just be one additional transformation on top of all of that.

BrokenEagle98 said:

Scheme #3

The same as scheme #1 with one or more additions.

A. All notes regardless of age are edited to change nowrap to pre

This could be done either by one or more users, or the server itself.

B. The server changes all notes when it renders them

This would only be done if the white-space value was nowrap

Result:

Everything performs as expected. If #A is done, then it would only be a one-time operation and would require no changes in the server code. If #B is done, then it future-proofs any notes that users write in the future in case they do use the nowrap value.

The only potential issue with scheme 3 is that pre preserves all whitespace, not just linebreaks. So if the text inside a nowrap element has, for example, double spaces after periods, both those spaces will now visibly render.

Updated

Lefkadios said:

The only potential issue with option 3A is that pre preserves all whitespace, not just linebreaks. So if the text inside a nowrap element has, for example, double spaces after periods, both those spaces will now visibly render.

Not sure if it was clear, but 3B is pretty much the same as 3A, only it's the server that changes the values from nowrap to pre as part of the note sanitation process.

As to your point about pre preserving all white space as opposed to nowrap which does not, the server could also collapse all white-space inside areas where the nowrap would have taken effect, thus still preserving the original look.

Updated

Lefkadios said:

Oh. One problem with 3B is that some notes use spaces in conjunction with pre to manually adjust the positioning of text.

As mentioned in that scheme, it only affects notes which have the white-space value of nowrap, and not others with values like pre.

1