Find a (13) digit number string in .csv file using Ruby

Dtown · April 4, 2024, 5:09pm

Hi All,
I have a csv file which contains a (13) digit number string at the start of the row data.
I’d like to find it and print it on screen using Ruby.

I’ve been using below without success:

#find_numbers.rb

file_contents = File.read('log_snippet.csv')
numbers = file_contents.match(/d{13}/)
p numbers

Its output is “nil”

The input file looks like this (it’s just one row, with only the single data point I’m looking for):

1711542651000,"{""version"":""0"",""id"":""fdd11d8a-ef17-75ae-cf50-077285bb7e15"",""detail-type"":""Naut0 log"",""source"":""bxt.partner"",""account"":""654654277766"",""time"":""2024-03-27T12:30:51Z"",""region"":""us-east-2""}"

The output I’d like for it to find and display on screen is:

1711542651000

Any help is greatly appreciated!
Best Regards,
DC

Dtown · April 6, 2024, 1:06pm

========
ANSWER
========

Figured out how to print out the first column only.
In the “puts” line, the “[0]” selects the first column of each row, which is then printed:

require 'csv'

CSV.foreach("log_snippet.csv", headers: false, col_sep: ",") do |row|
  puts row[0]
end
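
The same idea can be tried without the file on disk. This is only a sketch: the sample line is a shortened stand-in for the row in log_snippet.csv, and the helper name first_column is made up for illustration. CSV.parse_line comes from the standard csv library.

```ruby
require 'csv'

# Shortened stand-in for one line of log_snippet.csv (assumption).
CSV_SAMPLE_LINE = '1711542651000,"{""version"":""0"",""region"":""us-east-2""}"'

# Returns the first field of a CSV line, or nil for an empty line.
# (first_column is a hypothetical helper, not part of the csv library.)
def first_column(line)
  fields = CSV.parse_line(line)
  fields && fields[0]
end

puts first_column(CSV_SAMPLE_LINE)
```

The CSV parser handles the quoted JSON in the second column, so the commas inside it do not split the row further.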

YeomanRando · April 10, 2024, 6:39am

/^\d{13}/ not /d{13}/

/d{13}/ will match “ddddddddddddd”

/^\d{13}/ for “1234567890123”

The backslash makes the difference. And the caret (^) makes sure it is at the start of the line.

You could also do /^[0-9]{13}/
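
A quick way to see the difference, as a sketch. The sample string is a shortened stand-in for the row in the original post, and the method names are made up for illustration.

```ruby
# Shortened stand-in for the row from the original post (assumption).
REGEX_SAMPLE_ROW = '1711542651000,"{""version"":""0""}"'

# Without the backslash, d is the literal letter "d".
def literal_d_match(text)
  text.match(/d{13}/)
end

# With the backslash, \d is any digit; ^ anchors to the start of a line.
def leading_digits_match(text)
  text.match(/^\d{13}/)
end

p literal_d_match(REGEX_SAMPLE_ROW)       # nil: no run of thirteen "d" characters
p literal_d_match('d' * 13)               # a MatchData for the thirteen letters
p leading_digits_match(REGEX_SAMPLE_ROW)  # a MatchData for the leading digits
```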

Dtown · April 10, 2024, 12:50pm

Thank you, YeomanRando, for your reply and explanation!
So I needed the backslash, as it acts as an escape character to let Ruby know I’m looking for 13 digits and not the letter “d” 13 times, correct? Knowing about the caret (^) and your other pattern will be helpful with this file and other strings as well.
Best Regards,
Donald

YeomanRando · April 12, 2024, 12:34am

I hope it helped. Regular Expressions are difficult because each language does with them what it will. Grep, egrep, sed, and Perl all changed regex mercilessly. Ruby litters its army of built-in methods with regex. One never quite knows for certain what one is going to get. It is easy to get an expression that over- or under-matches. I always try to test them with an online regex tester.

It’s easy to get unwanted results. For instance, I suspect that /\d{13}/ will match the first thirteen digits of a number longer than thirteen digits, and will match a 26-digit number twice. If you want to be rigid and not match longer numbers, the only thing I can think of is to require a space or the end of the line after the digits:

/^\d{13}(\s|$)/

“Space” (any whitespace) is represented by \s, and “end of line” is represented by $. The pipe character (|) means OR, but only inside a group in parentheses; inside square brackets, | and $ are just literal characters. For the sample row above, where a comma follows the number, a comma would need to be added to the alternatives as well. In any case, now you would be matching extra stuff that you will probably want to strip. And of course it may not be necessary at all. /\d{13}/ will probably do the job. It’s just that large data files always have errors and edge cases. Eventually, “Look for 13 digit numbers at the start of a line” will produce a problem that will need more nuanced regex.
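
The over-matching suspicion can be checked with a small sketch. The inputs are made up, the method names are hypothetical, and the (?!\d) negative lookahead is an alternative added here for comparison rather than the pattern suggested above: it rejects a following digit without consuming the next character, so nothing extra needs stripping.

```ruby
# Every non-overlapping run of exactly thirteen digits, wherever it occurs.
def loose_matches(text)
  text.scan(/\d{13}/)
end

# Thirteen digits at the start of a line, NOT followed by another digit.
# Returns the matched string, or nil when there is no such match.
def strict_match(text)
  m = text.match(/^\d{13}(?!\d)/)
  m && m[0]
end

p loose_matches('12345678901234')     # first thirteen digits of a 14-digit number
p loose_matches('1' * 26).length      # a 26-digit number yields two matches
p strict_match('12345678901234,x')    # nil: a fourteenth digit follows
p strict_match('1711542651000,x')     # the thirteen digits, comma not included
```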

And remember that Ruby doesn’t return a number object when regex matches something. The match method returns its goofy regex MatchData object (or nil when nothing matches). So if you want to deal with that number as a number and not as a MatchData object, you’ll have to convert it, for example by calling to_i on the matched text.
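
A minimal sketch of that conversion, assuming the number sits at the start of the string. The method name leading_number is made up for illustration.

```ruby
# Returns the leading 13-digit number as an Integer, or nil if absent.
def leading_number(text)
  m = text.match(/^\d{13}/)   # MatchData on success, nil otherwise
  m && m[0].to_i              # m[0] is the matched String; to_i makes it an Integer
end

p leading_number('1711542651000,"rest of row"')
p leading_number('no digits here')
```

Checking for nil before indexing avoids a NoMethodError on rows that do not start with thirteen digits.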