Substitution with Hash

jarvo88 · September 11, 2007, 12:41pm

OK, I'll try to explain what I mean as well as I can.

Let's say I have a hash like this:

hash = { 'a' => '1' } # just an example; it's actually far bigger

and if a user inputs abcdabcd, I want it to sub all of the a's with 1's…

As I said, the hash is far larger, which is why I can't just do it with
gsub…

Any ideas?

Thanks in advance…

Lee

jarvo88 · September 11, 2007, 12:50pm

Lee J. wrote the following on 11.09.2007 12:41 :

Any ideas?

Thanks in advance…

Lee

yourstring.split(//).map{|c| hash[c] || c}.join
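
For anyone trying this out, here is a minimal sketch of the one-liner above wrapped in a method. The name `substitute_chars` and the sample table are made up for illustration; the real hash in the question is much larger.

```ruby
# Hypothetical helper: look each character up in the table and
# keep the character unchanged when the table has no entry for it.
def substitute_chars(input, table)
  input.split(//).map { |c| table[c] || c }.join
end

# Sample table, for illustration only.
sample_table = { 'a' => '1', 'c' => '3' }

puts substitute_chars('abcdabcd', sample_table)  # 1b3d1b3d
```

Since the lookup is per character, this only helps when every key is a single character.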

jarvo88 · September 11, 2007, 12:58pm

Lionel B. wrote the following on 11.09.2007 12:48 :

As I said, the hash is far larger, which is why I can't just do it with
gsub…

yourstring.split(//).map{|c| hash[c] || c}.join

Note that if your hash is only used to convert single characters to
single characters, you can use String#tr (or tr!). If you are after
performance, as you must prepare the strings used by String#tr from your
hash, you’ll have to bench it to see if it’s worth it in your use case
even if String#tr is faster in itself.
If you are processing UTF-8 content, String#tr is probably not safe
(there are libraries out there for fixing this though, IIRC), but my
first answer probably is (assuming $KCODE = 'u'; require 'jcode'…), as
the regexp processing is UTF-8 aware, so the String#split should be
safe.
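
A rough sketch of the String#tr variant, assuming every key and value in the hash is exactly one character. The method name and sample table are hypothetical. Note that `tr` gives special meaning to `-`, `^` and `\`, so a table containing those characters would need extra care. The UTF-8 caveat above concerns Ruby 1.8; from Ruby 1.9 on, strings carry an encoding and `tr` works on characters.

```ruby
# Hypothetical helper: build the two argument strings for String#tr
# from a table whose keys and values are all single characters.
def substitute_with_tr(input, table)
  from = table.keys.join    # e.g. "ab"
  to   = table.values.join  # e.g. "12"
  input.tr(from, to)
end

# Sample table, for illustration only.
puts substitute_with_tr('abcdabcd', { 'a' => '1', 'b' => '2' })  # 12cd12cd
```

If the same table is reused for many strings, the `from` and `to` strings could be built once up front, which is the preparation cost mentioned above.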

Lionel

jarvo88 · September 11, 2007, 3:37pm

2007/9/11, Lee J.:

"hmm &#126;".split(/ /).map{|c| h[c] || c}.join(' ')

Outputs hmm ~, but obviously doing things like question marks won't work.
Maybe I'll have to use loops and String#tr.

I’d rather not do the split step, IMHO direct replacement will be
faster:

h = {"#126" => "~"}
s.gsub(/&([^;]+);/) { h[$1] || $& }  # $1 is the entity name, $& the whole match

Btw, I believe there are standard classes that do this type of
replacement (entities in HTML documents) - maybe it’s in CGI.
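
For reference, the standard library's `CGI.unescapeHTML` does handle this kind of input. A minimal sketch follows; the wrapper name `decode_entities` is hypothetical. As far as I know it decodes numeric entities and only a handful of named ones (such as `&amp;`, `&lt;`, `&gt;`, `&quot;`), so it is not a full entity decoder.

```ruby
require 'cgi'

# Hypothetical wrapper around the standard library call.
def decode_entities(text)
  CGI.unescapeHTML(text)
end

puts decode_entities('hmm &#126;')        # hmm ~
puts decode_entities('a &lt; b &amp; c')  # a < b & c
```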

Kind regards

robert

jarvo88 · September 11, 2007, 1:51pm

Thanks, that worked well. And no, it's not single chars, which is the only
reason I'm doing it this way…

I have to split on whitespace (/ /) because splitting on characters would
obviously split the text I want to transform, which means it won't match
if the characters are trailing another word. HTML special chars, for
example:

h = {"&#126;" => "~"}

"hmm &#126;".split(/ /).map{|c| h[c] || c}.join(' ')

Outputs hmm ~, but obviously doing things like question marks won't work.
Maybe I'll have to use loops and String#tr.
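
To make the limitation concrete, here is a small sketch of the whitespace-split approach. The method name is hypothetical and the table holds just the one entry from the example. A key is only replaced when it stands alone as a whole word.

```ruby
# Hypothetical helper: split on single spaces and replace whole words only.
def substitute_words(input, table)
  input.split(/ /).map { |word| table[word] || word }.join(' ')
end

entities = { '&#126;' => '~' }

puts substitute_words('hmm &#126;', entities)   # hmm ~
# A trailing "?" makes the word "&#126;?", which is not a key,
# so the input comes back unchanged.
puts substitute_words('hmm &#126;?', entities)  # hmm &#126;?
```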

jarvo88 · September 12, 2007, 1:53am

Lee J. wrote:

Thanks, that worked well. And no, it's not single chars, which is the only
reason I'm doing it this way…

I have to split on whitespace (/ /) because splitting on characters would
obviously split the text I want to transform, which means it won't match
if the characters are trailing another word. HTML special chars, for
example:

h = {"&#126;" => "~"}

If you're just trying to translate numeric HTML entities, it's easy:

str.gsub(/&#(\d+);/){ [$1.to_i].pack('U') }

If you also want named entities, I suggest the htmlentities gem.
If it's for a more general case, how about:

rx = Regexp.new(hash.keys.map{|k|Regexp.escape(k)}.join("|"))
str.gsub(rx){ hash[$&] }
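
Both suggestions above can be sketched as small methods. The method names and sample data are hypothetical. `Regexp.union` is used here as a shorthand for the escape-and-join step, and sorting the keys longest-first is my own addition (not part of the suggestion above) so that a short key cannot shadow a longer key starting with the same characters.

```ruby
# Numeric entities: turn the decimal code point into a UTF-8 character.
def decode_numeric_entities(str)
  str.gsub(/&#(\d+);/) { [$1.to_i].pack('U') }
end

# General case: replace every key of the table with its value in one pass.
def substitute_keys(str, table)
  # Longest keys first, so that e.g. "ab" is tried before "a".
  keys = table.keys.sort_by { |k| -k.length }
  pattern = Regexp.union(keys)  # escapes each key and joins them with "|"
  str.gsub(pattern) { |match| table[match] }
end

puts decode_numeric_entities('hmm &#126;')  # hmm ~

# Sample table, for illustration only.
sample = { 'a' => '1', '&#126;' => '~', '?' => '!' }
puts substitute_keys('abc &#126; ok?', sample)  # 1bc ~ ok!
```

Unlike the whitespace split, this matches keys anywhere in the string, including ones that contain regexp metacharacters such as `?`.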

Daniel

jarvo88 · September 11, 2007, 4:19pm

Robert K. wrote:

If it’s all for html entities yes. I’m not sure of what the actual use
case is though.

h = {"#126" => "~"}
s.gsub(/&([^;]+);/) { h[$1] || $& }  # $1 is the entity name, $& the whole match

Btw, I believe there are standard classes that do this type of
replacement (entities in HTML documents) - maybe it’s in CGI.

The htmlentities gem (more robust than CGI with UTF-8…) is quite good.