Stopping String Escaping

basic · January 7, 2010, 9:34pm

Hi,

I’m trying to parse code snippets on a website that are submitted by the
user. The problem is that when a user tries to shop escaping in their
code, the escaping actually happens.

For instance, if you submit \\ Ruby treats it as a single \. Is there
any way to stop this? I still require all the other escapes such as \n etc.

Thanks
Phil.

basic · January 7, 2010, 10:23pm

Phil Cooper-king wrote:

Hi,

I’m trying to parse code snippets on a website that are submitted by the
user. The problem is that when a user tries to shop escaping in their
code, the escaping actually happens.

For instance, if you submit \\ Ruby treats it as a single \. Is there
any way to stop this? I still require all the other escapes such as \n etc.

How are you parsing them?

If you are using File.read() then no unescaping is done.

If you are parsing them using eval(), then you are inviting your machine
to be 0wned. See
http://www.ruby-doc.org/docs/ProgrammingRuby/html/taint.html

If you are parsing them some other way, then please explain it. Please
also explain what “shop escaping” is.

Regards,

Brian.

basic · January 7, 2010, 11:03pm

How are you parsing them?
If you are using File.read() then no unescaping is done.
I am using Rails and RedCloth. I have the plain text in the database,
and at the moment the text gets parsed when the view gets called.

I am using Uv for the syntax highlighting, which I pull out before
sending to RedCloth.

def snatch_code(text)
  snippets = text.scan(/#>code\((\S+)\)(.+?)#>code/m)

  snippets.each do |snip|
    code = Uv.parse(snip[1], 'xhtml', snip[0], false, 'twilight')
    code.insert(0, "<notextile>")
    code.insert(code.length, "</notextile>")
    text.sub!(/#>code\((\S+)\)(.+?)#>code/m, code)
  end

  text
end

then redcloth parses it.

If you are parsing them using eval(), then you are inviting your machine
to be 0wned. See
Programming Ruby: The Pragmatic Programmer's Guide
ouch. and thanks

If you are parsing them some other way, then please explain it. Please
also explain what “shop escaping” is.

dyslexia rules! KO!

I want to stop the escaping that’s not dealing with whitespace, tab, new
line etc.

Phil.

basic · January 8, 2010, 10:01am

OK, then what I suggest is you make a standalone test case, outside of
Rails.

source = <<'EOS'
Put your sample source code here
EOS

yeah I did this as well.

require 'rubygems'
require 'uv'

un_parsed =<<ENDOF
  \\
ENDOF

parsed = Uv.parse(un_parsed, "xhtml", "c++", false, "twilight")
=> \

puts un_parsed
=> \

In both cases one slash gets lost. I expect the \\ to be lost in puts,
though. Using dump I see the double slash is still there.

basic · January 7, 2010, 11:25pm

Phil Cooper-king wrote:

I am using Uv for the syntax highlighting, which I pull out before
sending to RedCloth.

def snatch_code(text)
  snippets = text.scan(/#>code\((\S+)\)(.+?)#>code/m)

  snippets.each do |snip|
    code = Uv.parse(snip[1], 'xhtml', snip[0], false, 'twilight')
    code.insert(0, "<notextile>")
    code.insert(code.length, "</notextile>")
    text.sub!(/#>code\((\S+)\)(.+?)#>code/m, code)
  end

  text
end

OK, then what I suggest is you make a standalone test case, outside of
Rails.

source = <<'EOS'
Put your sample source code here
EOS

# Print it to be sure it hasn't already been escaped by Ruby

# Now process it with Uv

# Show the intermediate state

# Now process it with Redcloth

# Show the final state

Then you can see whether the problem is with Uv, or with Redcloth.

Then the question becomes much more focussed - for example, it might be
“how do I stop Redcloth turning \\ into \ inside a <notextile> section?”

basic · January 8, 2010, 11:20am

Hopefully it’s clear from the above
yes, thank you.

So try your test case again:
(1) Use the DATA.read / __END__ to get the test source in
(2) Use ‘puts’ and not ‘dump’ to see clearly what you have

require 'rubygems'
require 'redcloth'

data_read = DATA.read
string = "\\"

puts RedCloth.new(string).to_html
puts RedCloth.new(data_read).to_html

__END__
\\

yields

\

\\

Although I have no idea how to treat a string as a file.
Is all this to do with encoding? (Sorry if that was a dense question.)

ERB results are similar, which I would have thought would be happening
in Rails anyway:

ERB.new("\\").src
=> "_erbout = ''; _erbout.concat \"\\\\\"; _erbout"

basic · January 8, 2010, 10:57am

Phil Cooper-king wrote:

un_parsed =<<ENDOF
\\
ENDOF

Unfortunately, here the \\ is being turned into a single backslash by
ruby, the same as inside a quoted string. In other words, the same as
this:

irb(main):001:0> "\\".size
=> 1
irb(main):002:0> '\\'.size
=> 1

The simplest way of preventing this is to read unparsed from a file, or
you can have an inline dataset at the end of your source code, like
this:

unparsed = DATA.read
# ... rest of your code goes here

__END__
\\
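A third option, not mentioned above, is a heredoc with a quoted terminator. As far as I know, Ruby does no escape processing at all inside `<<'EOS'`, whereas a bare `<<EOS` behaves like a double-quoted string. A minimal sketch (the constant names RAW and COOKED are just illustrative):

```ruby
# Quoted heredoc terminator: no escape processing, backslashes stay as typed.
RAW = <<'EOS'
\\
EOS

# Bare heredoc terminator: double-quote rules apply, so \\ becomes one backslash.
COOKED = <<EOS
\\
EOS

puts RAW          # shows two backslashes
puts COOKED       # shows one backslash
puts RAW.size     # 3: two backslashes plus the trailing newline
puts COOKED.size  # 2: one backslash plus the trailing newline
```

This only helps for test data typed into a source file; text that arrives from a database or a form is never run through literal escaping in the first place.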

I expect the \\ to be lost in puts,
though.

No, puts never converts two backslashes into one. If your string
contains two backslashes, puts will show two backslashes.

Using the dump I see the double slash is still there.

No, this is the opposite. String#inspect turns a raw string into a
quoted string for display purposes, and as part of this quoting a single
backslash is displayed as two backslashes.

Look at this:

irb(main):001:0> s = 92.chr
=> "\\"
irb(main):002:0> s.size
=> 1
irb(main):003:0> puts s
\
=> nil
irb(main):004:0> s2 = s + s
=> "\\\\"
irb(main):005:0> s2.size
=> 2
irb(main):006:0> puts s2
\\
=> nil

Hopefully it’s clear from the above that string s has one character (a
single backslash), and s2 has two backslashes. But these are displayed
in quoted form in irb as

"\\"
"\\\\"

respectively. puts displays them correctly.

Similarly, a single newline character is displayed as backslash-n when
inspect gives the quoted form; whereas puts actually prints a newline.

irb(main):009:0> nl = 10.chr
=> "\n"
irb(main):010:0> nl.size
=> 1
irb(main):011:0> puts nl

=> nil

So try your test case again:
(1) Use the DATA.read / __END__ to get the test source in
(2) Use ‘puts’ and not ‘dump’ to see clearly what you have

basic · January 8, 2010, 11:58am

Here’s the kind of standalone test I was thinking of.

----- 8< -------------------------------------------------
require 'rubygems'
require 'uv'
require 'redcloth'

snip = DATA.read
code = Uv.parse(snip, 'xhtml', 'ruby', false, 'twilight')
code.insert(0, "<notextile>")
code.insert(code.length, "</notextile>")
puts RedCloth.new(code).to_html

__END__
puts "Hello world!\n"
puts "Hello\\one backslash"
----- 8< -------------------------------------------------

And for me the output it gives is:

puts "Hello world!\n"
puts "Hello\\one backslash"

This looks correct to me. So can you provide an example where it fails?
Otherwise you need to look elsewhere in your application to see if
you’re providing the wrong input into Uv, or you’re handling the output
wrongly.

Or maybe you have an old gem with a bug which has since been fixed. I’m
using:

ultraviolet (0.10.2)
RedCloth (4.2.2)

basic · January 8, 2010, 1:05pm

thanks again

yep I have the same gems and the same result running your code.

I went nuts with the puts all over the place

fromdb: "##code(ruby)\r\n'\\\\'\r\n##code\r\n"

before: "##code(ruby)\n'\\\\'\n##code\n"

before parse: "\n'\\\\'\n"

after parse: "<pre class=\"twilight\">\n'\\\\'\n"

after insert: "<pre class=\"twilight\">\n'\\\\'\n"

after sub: "<pre class=\"twilight\">\n'\\'\n\n"

So after the sub step I lose two of the backslashes:

  text.sub!(/##code\((\S+)\)(.+?)##code/m, code)

basic · January 8, 2010, 1:55pm

Phil Cooper-king wrote:

So after the sub step I lose two of the backslashes:
  text.sub!(/##code\((\S+)\)(.+?)##code/m, code)

Ah yes, backslashes have a special interpretation in the
string-replacement part of a (g)sub too: \1 means the first capture, \2
means the second capture etc, so \\ means a single backslash.

Note that the replacement string here is two backslashes:

puts "abc".sub(/b/, "\\\\")
a\c
=> nil

The easy solution is to use the block form of sub instead.

puts "abc".sub(/b/) { "\\\\" }
a\\c
=> nil

You could simplify your code if you rewrote to use the block form of
gsub anyway.

text.gsub!(/#>code\((\S+)\)(.+?)#>code/m) do |snip|
  # ... make a string containing the marked-up code
end
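To see the difference without installing Uv, here is a rough sketch that compares the two forms side by side. Note that fake_highlight is a made-up stand-in for Uv.parse, and PATTERN, lossy_markup and block_markup are names invented for this illustration only.

```ruby
PATTERN = /#>code\((\S+)\)(.+?)#>code/m

# Hypothetical stand-in for Uv.parse so the sketch runs without the gem.
def fake_highlight(code, lang)
  "<pre class=\"#{lang}\">#{code}</pre>"
end

# String-replacement form: sub scans the replacement for \1, \2, \\ and so on,
# so every pair of backslashes in the highlighted code collapses to one.
def lossy_markup(text)
  lang, code = text.scan(PATTERN).first
  text.sub(PATTERN, fake_highlight(code, lang))
end

# Block form: the block's return value is inserted as-is, with no
# backslash processing.
def block_markup(text)
  text.gsub(PATTERN) do
    "<notextile>" + fake_highlight($2, $1) + "</notextile>"
  end
end

# A snippet containing exactly two backslash characters.
sample = "#>code(ruby)\n'" + 92.chr * 2 + "'\n#>code"

puts lossy_markup(sample)   # only one backslash survives
puts block_markup(sample)   # both backslashes survive
```

If the string-replacement form has to stay for some reason, the backslashes in the replacement would need doubling first, which is fiddly enough that the block form is usually the simpler choice.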

basic · January 8, 2010, 11:40am

Phil Cooper-king wrote:

Hopefully it’s clear from the above
yes, thank you.

So try your test case again:
(1) Use the DATA.read / __END__ to get the test source in
(2) Use ‘puts’ and not ‘dump’ to see clearly what you have

...
data_read = DATA.read
string = "\\"
...
__END__
\\

So in this program, ‘data_read’ contains two backslash characters; and
‘string’ contains a single backslash character.

yields

\

\\

That looks correct to me - HTML doesn’t need a backslash to be escaped.
So now add Uv into your test to see if that is munging the backslashes.

although I have no idea how to treat a string as a file.

A string is just a string. In ruby 1.8 it’s a sequence of bytes; in ruby
1.9 it’s a sequence of characters. But that doesn’t matter here; a
backslash is a backslash, and is both a single character and a single
byte in either ASCII or UTF-8.

However if you enter a string literal in a ruby program (or in IRB),
then it is parsed with backslash escaping rules to turn it into an
actual String object. For example:

a = "abc\ndef"
b = 'abc\ndef'

string ‘a’ contains 7 characters (a,b,c,newline,d,e,f), whereas string b
contains 8 characters (a,b,c,backslash,n,d,e,f). This is because there
are different escaping rules for double-quoted and single-quoted
strings.

In a single-quoted string literal, \' is a single quote, and \\ is a
backslash, and everything else is treated literally, so \n is two
characters \ and n.

In a double-quoted string literal, \" is a double quote, \n is a
newline, \\ is a backslash, and there’s a whole load of other expansion
including #{…} for expression interpolation and #@… for instance
variable substitution.
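A quick way to check these rules is to compare sizes. This is only a small sketch; the constant names are arbitrary.

```ruby
DOUBLE = "abc\ndef"   # \n is converted to one newline character
SINGLE = 'abc\ndef'   # \n stays as a backslash followed by n

puts DOUBLE.size      # 7
puts SINGLE.size      # 8

# \\ is an escape in both kinds of literal, so each of these is one character.
puts "\\".size        # 1
puts '\\'.size        # 1

# \' is the only other escape a single-quoted literal understands.
puts 'don\'t'.size    # 5
```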

ERB results are similar, which I would have thought would be happening
in Rails anyway:

ERB.new("\\").src
=> "_erbout = ''; _erbout.concat \"\\\\\"; _erbout"

Now you’re just scaring yourself with backslash escaping

Firstly, note that you passed a single backslash character to ERB.
That’s what the string literal "\\" creates.

ERB compiled it to the following Ruby code:

_erbout = ''; _erbout.concat "\\"; _erbout

which just appends a single backslash to _erbout, which is what you
expect.

However, IRB displays the returned string from ERB.new using
String#inspect, so it is turned into a double-quoted string. This means:

A " is added to the start and end of the string
Any " within the string is displayed as \"
Any \ within the string is displayed as \\

In other words, String#inspect turns a string into a Ruby string
literal: something that you could paste directly into IRB. Try it:

str = "_erbout = ''; _erbout.concat \"\\\\\"; _erbout"
puts str

That will show you the actual contents of str, which is the Ruby code I
pasted above.
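The quoting can also be checked mechanically by counting backslashes before and after. The sketch below assumes Ruby 2.5 or later, where String#undump is available to reverse String#dump; SRC is just an illustrative name for the generated code discussed above.

```ruby
# The Ruby source that ERB generated: it contains exactly two backslashes.
SRC = "_erbout = ''; _erbout.concat \"\\\\\"; _erbout"

puts SRC           # the actual contents
puts SRC.inspect   # the quoted form that IRB displays

# Quoting adds one backslash in front of each " and each \ in the contents.
puts SRC.chars.count(92.chr)          # 2
puts SRC.inspect.chars.count(92.chr)  # 6

# dump produces a quoted literal and undump (Ruby 2.5+) turns it back,
# so nothing is lost along the way.
puts SRC.dump.undump == SRC           # true
```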

HTH,

Brian.

basic · January 8, 2010, 2:03pm

You could simplify your code if you rewrote to use the block form of
gsub anyway.

Try this:

----- 8< -------------------------------------------------
require 'rubygems'
require 'uv'
require 'redcloth'

text = DATA.read
text.gsub!(/#>code\((\S+)\)(.+?)#>code/m) do
  "<notextile>" +
  Uv.parse($2, 'xhtml', $1, false, 'twilight') +
  "</notextile>"
end
puts RedCloth.new(text).to_html

__END__
h1. Some code

#>code(ruby)
puts "Hello world!\n"
puts "Hello\\one backslash"
#>code

h1. The end

h1. The end
----- 8< -------------------------------------------------

Output:

Some code

puts "Hello world!\n"
puts "Hello\\one backslash"

The end

basic · January 8, 2010, 2:20pm

Try this:

require 'rubygems'
require 'uv'
require 'redcloth'

text = DATA.read
text.gsub!(/#>code\((\S+)\)(.+?)#>code/m) do
  "<notextile>" +
  Uv.parse($2, 'xhtml', $1, false, 'twilight') +
  "</notextile>"
end
puts RedCloth.new(text).to_html

__END__
h1. Some code

#>code(ruby)
puts "Hello world!\n"
puts "Hello\\one backslash"
#>code

h1. The end

thanks again, it worked like a treat, in 1/2 the lines

basic · January 8, 2010, 2:07pm

I was just reading up on them. Well, I won’t forget this mistake quickly.

The easy solution is to use the block form of sub instead.

puts "abc".sub(/b/) { "\\\\" }
a\\c

yep worked like a gem

You could simplify your code if you rewrote to use the block form of
gsub anyway.

I’m having to loop through the code blocks in order to parse the syntax
with Uv anyway. I thought that while I was in the loop I may as well
replace each code block as it’s parsed.

thanks for your effort, you’ve been a great help.