Argument error --- How to solve?

I'm just starting out with Ruby and keep getting this error:

program.rb:6:in `gsub': broken UTF-8 string (ArgumentError)

when I try this short code:

#coding:utf-8

temp=""
txtfile=File.open("8-3_tiedosto.txt","r");txtfile.each{|row|temp=temp+row};txtfile.close

temp = temp.gsub("Å", '')
puts temp

The original text in the file contains characters that I do not want to
include in my final result, which should only contain ASCII 65..90 and
97..122. So I do not understand what arguments should be given to gsub.

Sorry for my stupidity :)

Your first argument to gsub appears to be ASCII 197.

C. Zona wrote in post #1031196:

Your first argument to gsub appears to be ASCII 197.

Yes, you're correct, but I still do not know how to fix my code. Since the
source text contains characters outside 65..90 and 97..122, how can I
remove or replace them?

I can't edit my previous post, so an additional comment: my environment does
not allow me to change the coding statement on the first line.

On Thu, Nov 10, 2011 at 9:03 AM, Ar Ik wrote:

I can't edit my previous post, so an additional comment: my environment does
not allow me to change the coding statement on the first line.

If you insert this line at the beginning of the script, what does it
print?

p("".encoding)

Btw, you can simplify reading by doing

txtfile = File.read("8-3_tiedosto.txt", encoding: 'UTF-8')

assuming your file is encoded in UTF-8.

You might have to play with
Encoding.default_external=
Encoding.default_internal=
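As a rough sketch of what playing with those two settings could look like: this assumes the input really is UTF-8, and the temporary file is only a made-up stand-in for 8-3_tiedosto.txt.

```ruby
require "tempfile"

# Assumption: input files are UTF-8. default_external is the encoding Ruby
# tags file data with when no explicit encoding is given.
Encoding.default_external = Encoding::UTF_8
# nil means "do not transcode what is read"; strings keep the external encoding.
Encoding.default_internal = nil

# Hypothetical sample file standing in for 8-3_tiedosto.txt.
sample = Tempfile.new("tiedosto")
sample.write("\u00C5bc\n") # the text "Åbc"
sample.close

text = File.read(sample.path)
p text.encoding        # the encoding picked up from default_external
p text.valid_encoding? # false here would mean the bytes are not really UTF-8
sample.unlink
```

If valid_encoding? comes back false for the real file, the file is probably not UTF-8 at all, and changing the defaults will not make gsub happy.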

For more please see

http://blog.grayproductions.net/articles/miscellaneous_m17n_details

Kind regards

robert

Try doing this and see if it helps with your substitution, without getting
too involved with Ruby's encoding mechanism:

#coding:utf-8
## Do NOT delete the above utf-8 line, which
## you already have in your original copy

temp=""
txtfile=File.open("8-3_tiedosto.txt","r")
txtfile.each{|row|temp=temp+row}
txtfile.close

tmp = temp.gsub(/[^A-Z0-9[:punct:]\s]+/ix, '')

puts tmp

PS: I left the numerals and all kinds of punctuation marks in there, in
case you have them in the original file, though they are certainly not
within your original range of ASCII 65..90 and 97..122.

Ar Ik wrote in post #1031208:

C. Zona wrote in post #1031196:

Your first argument to gsub appears to be ASCII 197.

Yes, you're correct, but I still do not know how to fix my code. Since the
source text contains characters outside 65..90 and 97..122, how can I
remove or replace them?

Strings in ruby 1.9 are complicated beasts. I had a go at understanding
them:
https://github.com/candlerb/string19/blob/master/string19.rb

So it really depends on what you’re trying to do. If you want to
manipulate this file as a series of bytes, and match particular bytes,
then open it in binary mode ('rb'), and pass only binary strings to
gsub.

temp.gsub!("xxx".force_encoding("BINARY"), "")
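A minimal sketch of that byte-oriented approach, applied to the original goal of keeping only ASCII 65..90 and 97..122. The helper name letters_only is made up for illustration; File.binread returns a BINARY (ASCII-8BIT) string, which can never count as a "broken" UTF-8 string.

```ruby
# Hypothetical helper: keep only the bytes for A-Z and a-z.
def letters_only(path)
  bytes = File.binread(path)   # raw bytes, tagged ASCII-8BIT (BINARY)
  bytes.gsub(/[^A-Za-z]/, "")  # an ASCII-only regexp matches binary data fine
end

# Usage, assuming the file from the question exists:
#   puts letters_only("8-3_tiedosto.txt")
```

Note that this also strips digits, spaces and newlines; widen the character class if you want to keep them.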

The trouble with opening the file as UTF-8, and doing regexp matches
with UTF-8 characters, is that your program will crash when fed invalid
UTF-8 data. So it is not good for “data cleaning” exercises.
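If you would rather stay in UTF-8, one possible workaround is to drop the invalid bytes before matching. This is only a sketch and it assumes Ruby 2.1 or later, because String#scrub does not exist in 1.9; the helper name ascii_letters is made up.

```ruby
# Hypothetical helper: tolerate invalid UTF-8, then keep only A-Z and a-z.
def ascii_letters(text)
  utf8 = text.dup.force_encoding("UTF-8")
  # scrub("") deletes byte sequences that are not valid UTF-8 (Ruby 2.1+).
  utf8 = utf8.scrub("") unless utf8.valid_encoding?
  utf8.gsub(/[^A-Za-z]/, "")
end

# "\xC5" followed by "b" is an invalid sequence; "\xC3\x85" is a valid "Å".
p ascii_letters("A\xC5b\xC3\x85c") # => "Abc"
```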

But strangely, ruby 1.9 is quite happy to deal with invalid strings in
some contexts. For example, if you do

temp.size.times do |i|
puts temp[i]
end

then it will work even if the i’th character is invalid. Go figure.
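That tolerance can be put to use for cleaning. A small sketch, with a made-up sample string: walk the string character by character and keep only the characters that are valid on their own.

```ruby
# Made-up sample: "\xC5" followed by "b" is not a valid UTF-8 sequence.
broken = "a\xC5b".dup.force_encoding("UTF-8")
p broken.valid_encoding? # false

# each_char still yields one piece per character, even for the bad byte,
# so the invalid pieces can simply be filtered out.
kept = broken.each_char.select(&:valid_encoding?).join
p kept
```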

-----Original Message-----
From: Nik Z.
Sent: Thursday, 10 November 2011 23:55
To: ruby-talk ML
Subject: Re: Argument error --- How to solve?




-----Original Message-----
From: Luca (Email)
Sent: Thursday, 29 December 2011 07:58
To: ruby-talk ML
Subject: Fwd: Argument error --- How to solve?


