Forum: Ruby on Rails HTML Entity Reference in tags

Eric Anderson (Guest)
on 2007-06-13 18:20
(Received via mailing list)
If I have the following markup:

<% form_tag do %>
  <%=text_field_tag 'html', params[:html]%>
  <%=submit_tag 'Update'%>
<% end %>

and enter "&copy;" and submit, the form comes back up with the actual
copyright symbol in the text field instead of the characters "&copy;"
that were submitted.

This is because text_field_tag[1] relies on tag_options[2], which calls
escape_once[3], which is designed to escape the input but leave alone
anything that already looks like an escaped entity.
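
To make that behavior concrete, here is a minimal sketch in plain Ruby. The names escape_once_sketch and ESCAPES are hypothetical, written only for illustration, and are not the Rails source. The regex is an assumption about what "escape once" means: leave an & alone when it already begins something shaped like an entity reference. CGI.escapeHTML from the standard library stands in for a plain, unconditional escape.

```ruby
require "cgi"

# Hypothetical stand-in for the escape-once behavior; not the Rails source.
ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

def escape_once_sketch(text)
  # Always escape <, > and ", but skip an & that already starts
  # something shaped like &name; or &#123;
  text.gsub(/[<>"]|&(?!(?:[a-zA-Z]+|#\d+);)/) { |ch| ESCAPES[ch] }
end

typed = "&copy; 2007 <Acme>"

once = escape_once_sketch(typed) # "&copy;" is left untouched
full = CGI.escapeHTML(typed)     # every & is escaped, including the one in "&copy;"

puts once
puts full
```

With the escape-once style, the attribute value still contains "&copy;", which the browser decodes to the copyright symbol. With the unconditional escape, the value contains "&amp;copy;", which the browser decodes back to the characters the user typed.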

I am guessing Rails is trying to be helpful in the case where content
gets double-encoded, but what about when I want to keep the entity
reference the user typed in?

A use case might be a WYSIWYG field that automatically inserts the
entity references for the user (such as TinyMCE[4]). The WYSIWYG editor
inserts &copy; but when the same content is redisplayed it will have the
literal copyright symbol instead of the entity reference. TinyMCE seems
to deal with this fine but others might not. Also, some of these
character references (such as &emsp; that might be generated from
copy/paste data from Word) get messed up as they are saved and pulled
from the database (I am guessing because something in the stack isn't
Unicode compatible). So the end result is that the user gets all sorts
of strange characters appearing on the screen.

So my question is: what are the solutions? It seems the problem is this
"help" Rails is trying to provide. If something is encoding twice it
should be fixed. Rails should not try to help out. But there is probably
lots of code that now depends on this "help".

I could try to find what in the stack is not Unicode compatible but that
could be difficult and/or impractical as it may be the client browser
(or maybe the database server)!

I have created the tags by hand (i.e. didn't use the helpers) but I sure
don't want to do this for all text fields that might accept entity
references.

Also triple encoding the data seems to work. So:

<% form_tag do %>
  <%=text_field_tag 'html', h(params[:html])%>
  <%=submit_tag 'Update'%>
<% end %>

But that seems tedious and error prone to do on every field that might
get an entity reference. I thought about overriding the behavior of
tag_options to always just escape, or to always triple encode before
calling escape_once, but what would that break?
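
As a rough check of why pre-escaping with h helps, the sketch below runs the two steps in plain Ruby. escape_once_sketch and ESCAPES are hypothetical names, and the regex is an assumption about the escape-once behavior rather than the Rails implementation. CGI.escapeHTML stands in for h, and CGI.unescapeHTML stands in for the browser decoding the attribute value.

```ruby
require "cgi"

# Hypothetical stand-in for the escape-once behavior; not the Rails source.
ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

def escape_once_sketch(text)
  # Skip an & that already begins something shaped like an entity reference.
  text.gsub(/[<>"]|&(?!(?:[a-zA-Z]+|#\d+);)/) { |ch| ESCAPES[ch] }
end

typed = "&copy;"

pre_escaped = CGI.escapeHTML(typed)            # stand-in for h(params[:html])
attribute   = escape_once_sketch(pre_escaped)  # "&amp;" is already escaped, so it is left alone
shown       = CGI.unescapeHTML(attribute)      # stand-in for the browser decoding the value

puts attribute
puts shown
```

Under these assumptions the text field ends up showing the characters the user typed, because the extra escape added by h is the one the browser removes.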

Am I missing something? This seems like a big deal but I couldn't find
much on the web about it. Should I submit a ticket in Trac?

Any insight would be appreciated.

