Issue while parsing a big file

Hi,

I have an issue with Ruby (version ruby-2.2.4-x64-mingw32, Windows 10).
I'm parsing a 2 GB file line by line, searching for the following regex:
123456$


Code:

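counter = 0   # total number of lines read
counter2 = 0  # lines ending in "123456"
# $/ = "\r\n"
# Single quotes keep the backslashes literal; inside double quotes "\10" would be an octal escape.
filename = '…\…\Tests\10-million-combos.txt'
open('…\Tests\10-million-combos_LF.txt') do |content|
  content.each_line do |line|
    counter += 1
    counter2 += 1 if line.match(/123456$/)
  end
end
p counter
p counter2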

It works fine on a small file but fails on the big one. At first there was an issue
with CR+LF line endings, but I solved it with a tool that converts all line
endings to LF. The total line count is 86931744 (it should be
185866730), and it finds 6983 occurrences instead of 14901.
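A minimal illustration of one way CR+LF endings can break this regex (the sample lines are made up): in Ruby, $ matches only just before a "\n" or at the very end of the string, so a trailing "\r" prevents the match unless the line is chomped first.

```ruby
# In Ruby, $ matches at the end of a line: just before a "\n" or at the
# very end of the string. A "\r" left over from CR+LF sits in between.
lf_line   = "qwerty123456\n"
crlf_line = "qwerty123456\r\n"

p lf_line.match(/123456$/)          # a MatchData: the LF line matches
p crlf_line.match(/123456$/)        # nil: "\r" separates "6" from "\n"
p crlf_line.chomp.match(/123456$/)  # chomp strips "\r\n", so this matches
```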

You can download the file here: MyAirBridge.com
You can verify the expected results with grep 123456$ and wc -l.
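A rough Ruby stand-in for wc -l, in case grep and wc are not available on Windows. It is only a sketch: count_newlines is a hypothetical helper name, and the 1 MiB chunk size is an arbitrary choice. Like wc -l, it counts "\n" bytes, reading the file in binary chunks so a 2 GB file never has to fit in memory.

```ruby
require "tmpdir"

# Hypothetical helper mirroring `wc -l`: counts "\n" bytes by reading the
# file in fixed-size binary chunks, independent of how each_line splits it.
def count_newlines(path, chunk_size = 1 << 20)
  count = 0
  File.open(path, "rb") do |file|
    # read(n) returns nil once the end of the file is reached
    while (chunk = file.read(chunk_size))
      count += chunk.count("\n")
    end
  end
  count
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "sample.txt")
  File.binwrite(path, "a\nb\r\nc\n")
  p count_newlines(path)  # => 3
end
```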

Where do you think the issue comes from?

Cheers.

Hi,

Your code runs fine with Ruby 2.0.0 on Windows 10;
I only changed one line, to:

open("10mio.txt", "rb") do |content|

output:

d:\dev\ruby200\bin\ruby t1.rb
185866730
14901
Exit code: 0
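For reference, a self-contained sketch of the counting loop with the "rb" mode from the one-line change above, wrapped in a hypothetical count_lines_and_matches helper and run against a tiny generated file (name and contents are made up). The chomp before matching is an extra step not in the thread: binary mode does no newline conversion, so any "\r" stays in the line and would otherwise defeat the anchor.

```ruby
require "tmpdir"

# Hypothetical helper: counts every line and the lines ending in "123456".
# "rb" opens the file in binary mode, as in the one-line change above.
def count_lines_and_matches(path)
  total = 0
  matches = 0
  File.open(path, "rb") do |file|
    file.each_line do |line|
      total += 1
      # Strip a trailing "\r\n" or "\n", then anchor at the very end with \z.
      matches += 1 if line.chomp =~ /123456\z/
    end
  end
  [total, matches]
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "sample.txt")
  File.binwrite(path, "abc123456\nxyz\r\nfoo123456\r\n")
  p count_lines_and_matches(path)  # => [3, 2]
end
```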

Regards,

olli

Hi,

Thanks Olli, switching from text mode to binary mode ("rb") was the solution!

Cheers :)