Check for text file

Hi guys,

After some research I still cannot find a way to tell whether a file is
plain text or binary. I want to check that a file is plain text no
matter what characters are in it.
Is this possible with Ruby?

Thanks,

Alin

Alin P. wrote:

Hi guys,

After some research I still cannot find a way how to see if a file is
plain text or binary. In fact I want to check if a file is plain text no
matter what characters are in it.
This thing may be possible by using ruby ?

I think so, but it’s a little unclear exactly what you’re trying to
achieve. Do you have an example?

Alex Y. wrote:

I think so, but it’s a little unclear exactly what you’re trying to
achieve. Do you have an example?

I’m trying to do a search-and-replace of some text across files, but I
don’t want to touch archives or other binary files.

Alin P. wrote:

I’m trying to do a replace in file for some text but I don’t want to
consider files like archives or other binary files.

Of course, on Windows I can go by the file extension and ignore
specific ones (e.g. .exe, .zip, .jar, .rar, .anything_i_want),
but I don’t know how to do it on Linux/Unix, where a file extension is
not mandatory.

Hello,

On a *nix system, you can do

file_type = `file my_file`
puts file_type

but this will not work on Windows.
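
For what it's worth, shelling out to file as above could be sketched a bit more defensively like this. It assumes a file build that understands --brief and --mime-type (recent Unix versions do; other ports may differ), and text_mime? and text_file? are names made up for illustration, not part of any library.

```ruby
# Hypothetical helpers, not part of any library.

# True when a MIME type string (as printed by `file`) is in the text/ family.
def text_mime?(mime)
  mime.strip.start_with?("text/")
end

# Runs `file` without a shell (array form), so odd file names are passed
# through untouched. Assumes `file` is on the PATH and supports these flags.
def text_file?(path)
  mime = IO.popen(["file", "--brief", "--mime-type", "--", path], &:read)
  text_mime?(mime)
end
```

Some text formats are reported outside text/ (JSON, for example, may come back as application/json), so the prefix check is only a first approximation.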

George

On 19.06.2007 09:33, Alin P. wrote:

I’m trying to do a replace in file for some text but I don’t want to
consider files like archives or other binary files.

Of course, when I’m on windows I can go after the file extension and try
to ignore some specific (eg. .exe, .zip, .jar, .rar, .anything_i_want)
but I don’t know how to do it on Linux/Unix OS where file extension is
not mandatory.

You could read the file (or portion of the file), create a histogram of
byte (or groups of bytes) occurrences and compare that to what you
expect for text files (e.g. most chars are “0-9a-zA-Z” and punctuation).
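
A minimal sketch of that histogram idea, under stated assumptions: looks_like_text? is a made-up name, and the 4096-byte sample and 0.9 threshold are arbitrary values, not tuned ones.

```ruby
# Hypothetical helper: guess "text or binary" from a byte histogram of the
# first sample_size bytes. The 4096 and 0.9 defaults are arbitrary assumptions.
def looks_like_text?(path, sample_size: 4096, threshold: 0.9)
  sample = File.binread(path, sample_size)
  return true if sample.nil? || sample.empty? # empty file: call it text

  # Histogram: byte value => number of occurrences.
  histogram = Hash.new(0)
  sample.each_byte { |byte| histogram[byte] += 1 }

  # A NUL byte almost never appears in plain text.
  return false if histogram.key?(0)

  # Count tab, LF, CR and printable ASCII (space through tilde).
  texty = histogram.sum do |byte, count|
    [9, 10, 13].include?(byte) || (32..126).cover?(byte) ? count : 0
  end
  texty.to_f / sample.bytesize >= threshold
end
```

Because it only knows about ASCII, a check like this will misjudge text written mostly in non-Latin scripts.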

You could also use the “file” command and parse its output.

Kind regards

robert

George M. wrote:

Hello,

On a *nix system, you can do

file_type = `file my_file`
puts file_type

but this will not work on Windows.

George

Thanks guys, the problem is solved thanks to your suggestions :wink:

Regarding the file command, I can use it on Windows too, since the
gnuwin32 tools include it :slight_smile:

Best regards,

Alin

On 19.06.2007 10:01, George M. wrote:

Hello,

On a *nix system, you can do

file_type = `file my_file`
puts file_type

but this will not work on Windows.

robert@fussel ~
$ file .inputrc
.inputrc: ASCII English text

robert@fussel ~
$ uname -a
CYGWIN_NT-5.1 fussel 1.5.24(0.156/4/2) 2007-01-31 10:57 i686 Cygwin

:slight_smile:

robert

On Jun 18, 2007, at 23:59 , Alin P. wrote:

After some research I still cannot find a way how to see if a file is
plain text or binary. In fact I want to check if a file is plain text no
matter what characters are in it.

http://blog.zenspider.com/archives/2006/08/i_miss_perls_b.html

On 6/19/07, Alin P. wrote:

Hi guys,

After some research I still cannot find a way how to see if a file is
plain text or binary. In fact I want to check if a file is plain text no
matter what characters are in it.
This thing may be possible by using ruby ?

If you can’t use ‘file’ directly, you should look at its source and see
how the detection works. I think CVS also detects text quite well.

Thanks,

Ryan D. wrote:

On Jun 18, 2007, at 23:59 , Alin P. wrote:

After some research I still cannot find a way how to see if a file is
plain text or binary. In fact I want to check if a file is plain text no
matter what characters are in it.

http://blog.zenspider.com/archives/2006/08/i_miss_perls_b.html

Nice, thanks.

Hi,

At Wed, 20 Jun 2007 02:10:57 +0900,
Ryan D. wrote in [ruby-talk:256206]:

After some research I still cannot find a way how to see if a file is
plain text or binary. In fact I want to check if a file is plain text no
matter what characters are in it.

http://blog.zenspider.com/archives/2006/08/i_miss_perls_b.html

You can use String#count:

def File.binary?(path)
  s = read(path, 4096) and
    !s.empty? and
    (/\0/n =~ s or s.count("\t\n -~").to_f/s.size <= 0.7)
end

In any case, it doesn’t work for non-ascii files.

On Jun 19, 11:33 am, Alin P. wrote:

Nice, thanks.

Which I shamelessly plagiarized and stuck in the ptools library.

gem install ptools

require 'ptools'
File.binary?('some_file')

Regards,

Dan

Hi,

At Wed, 20 Jun 2007 08:22:51 +0900,
Daniel DeLorme wrote in [ruby-talk:256241]:

Still, I have to say I was surprised; I didn’t know that a hyphen in
String#count had the same effect as in a regexp character class. Talk
about an undocumented feature!

It’s documented.

It can be
s.count("^\t\n -~").to_f/s.size>0.3
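
To see that the two expressions measure the same thing from opposite sides, here is a small sketch. The PRINTABLE constant and the sample string are made up for illustration.

```ruby
# Tab, newline, and the range from space to tilde (printable ASCII).
PRINTABLE = "\t\n -~"

sample = "ab\x00\x01\x02" # 2 printable bytes, 3 control bytes

printable     = sample.count(PRINTABLE)       # characters inside the set
non_printable = sample.count("^" + PRINTABLE) # a leading ^ negates the set

puts printable.to_f / sample.size
puts non_printable.to_f / sample.size
```

Every character falls into exactly one of the two counts, so the ratios always add up to one. The only difference between the two tests in this thread is which side of the boundary (exactly 0.7 versus 0.3) counts as binary.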

Nobuyoshi N. wrote:

You can use String#count:

def File.binary?(path)
  s = read(path, 4096) and
    !s.empty? and
    (/\0/n =~ s or s.count("\t\n -~").to_f/s.size <= 0.7)
end

In any case, it doesn’t work for non-ascii files.

Pedantic correction: it doesn’t work for non-western scripts. French uses
accents here and there but it would pass the test above.

Still, I have to say I was surprised; I didn’t know that a hyphen in
String#count had the same effect as in a regexp character class. Talk
about an undocumented feature!

Daniel