Validating that an image file really is an image file

I know how to validate a file based only on the file name dot
extension, but this seems wholly insecure to me.
I feel that just testing for .jpg, .png, .jpeg, .gif, etc… is not
enough.
Clearly renaming a file to anything at all is easy to do.
How can I read into the file and check to see if it is actually a
file of a given image type? Is there file header info to look for?
Such as a particular byte sequence at a particular location in the file?

John J.

On Jul 18, 2007, at 4:25 PM, John J. wrote:

I know how to validate a file based only on the file name dot
extension, but this seems wholly insecure to me.
I feel that just testing for .jpg, .png, .jpeg, .gif, etc… is not
enough.
Clearly renaming a file to anything at all is easy to do.
How can I read into the file and check to see if it is actually
a file of a given image type? Is there file header info to look for?

The canonical solution is to delegate to this library:

http://grub.ath.cx/filemagic/

– fxn

Well, looks like RMagick can do this for me.
For a minute I was starting to fear reading specs on formats, but
perhaps not.

John J.

On Jul 18, 2007, at 9:33 AM, Wayne E. Seguin wrote:


Wayne E. Seguin
Sr. Systems Architect & Systems Admin

The file command and bindings to it are OK, but results are not
consistent across common image file types. What’s worse is that the
code would be unportable. Ideally, the solution would rely simply on
the file format internally and thus be portable.

From: Daniel B.

require 'ptools'

File.image?(file)

i like the image routines, but could it be made extendible? maybe a
template where we can add file info/properties easily like…

cat /temp/image_template

bmp BM
jpg,jpeg \377\330\377\340\000\020JFIF
png \211PNG
gif GIF89a
gif GIF87a

i’ve updated my ptools to 1.5 and am looking at ptools.rb. but i have
concern, are you sure you like to add those extra methods like .jpg?
.png?, etc?
i find too many methods already in ruby. You have already image?, would
it be ok if image? return the image type like “jpg” eg, and nil if it’s
not? like,

File.image?("test.jpg") => "jpg"

also, image? should not be extension dependent since i rename some files
here =)

File.image?("test.jpg.renamed") => "jpg"

File.image?("justadatafile.data") => nil

kind regards -botp
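
For what it's worth, the table-driven idea above could be sketched roughly like this. The `SIGNATURES` table and the `image_type` method are hypothetical names for illustration and are not part of ptools. The JPEG entry is an assumption on my part: it is a looser prefix than the JFIF-specific one in the template, so that Exif files from cameras would match as well. The result is a type name or nil, and the extension is never consulted.

```ruby
require "tempfile"

# Hypothetical signature table in the spirit of the template above.
# Each type maps to the byte strings a file of that type may start with.
SIGNATURES = {
  "bmp" => ["BM".b],
  "jpg" => ["\xFF\xD8\xFF".b],          # assumption: any JPEG, not only JFIF
  "png" => ["\x89PNG\r\n\x1A\n".b],     # full 8-byte PNG signature
  "gif" => ["GIF87a".b, "GIF89a".b]
}.freeze

# Hypothetical helper (not ptools): returns the type name, or nil.
def image_type(path)
  header = File.binread(path, 8) || "".b   # binread returns nil for an empty file
  SIGNATURES.each do |type, magics|
    return type if magics.any? { |magic| header.start_with?(magic) }
  end
  nil
end

# A GIF header saved under a misleading extension is still detected.
Tempfile.create(["renamed", ".data"]) do |f|
  f.binmode
  f.write("GIF89a".b + "\x00".b * 10)
  f.flush
  puts image_type(f.path).inspect
end
```

Adding a format would then be a one-line change to the table rather than a new method.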

On Jul 19, 8:14 pm, Peña, Botp wrote:

png \211PNG
gif GIF89a
gif GIF87a

I’d rather not. Based on the information I read, those templates don’t
change (for the file formats I support anyway). I’m not sure what the
point would be, and it would be more work that I want to avoid. :)

i’ve updated my ptools to 1.5 and am looking at ptools.rb. but i have concern, are you sure you like to add those extra methods like .jpg? .png?, etc?

They’re private.

i find too many methods already in ruby. You have already image?, would it be ok if image? return the image type like “jpg” eg, and nil if it’s not? like,

File.image?("test.jpg") => "jpg"

But, the ‘?’ indicates a boolean method. I’d rather not.

also, image? should not be extension dependent since i rename some files here =)

True, but the method I implemented is meant as a poor man’s
replacement for filemagic, to deal with the more likely and common
cases. I want to keep it simple. If you want a more robust and
technically more accurate way to detect images, use filemagic
instead. :)

Regards,

Dan

On Jul 18, 2007, at 10:25 , John J. wrote:

John J.

Use the Unix file command: file #{file_name}

example:

file the.gif
the.gif: GIF image data, version 89a, 91 x 91
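
If you go this route from Ruby, a minimal sketch might look like the following. It assumes the `file` command is installed and on the PATH, and the three method names are made up for illustration. The parser only handles the "name: description" line shape shown in the example above; the wording of the description varies by format and by version of `file`, so the "image data" check is an assumption that will not hold for every type.

```ruby
require "open3"

# Splits one line of `file` output, such as
# "the.gif: GIF image data, version 89a, 91 x 91",
# into the file name and the description.
def parse_file_output(line)
  name, description = line.chomp.split(": ", 2)
  { name: name, description: description.to_s.strip }
end

# Hypothetical check: true when the description mentions image data.
def image_description?(description)
  description.include?("image data")
end

# Runs `file` without going through a shell, so unusual characters
# in file_name are not interpreted by the shell.
def describe_file(file_name)
  output, status = Open3.capture2("file", file_name)
  raise "file command failed" unless status.success?
  parse_file_output(output)
end

info = parse_file_output("the.gif: GIF image data, version 89a, 91 x 91")
puts info[:description]
puts image_description?(info[:description])
```

Passing the arguments separately to Open3.capture2 avoids the quoting problems of interpolating a file name into backticks.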

On Jul 19, 2007, at 6:05 AM, Raf C. wrote:

‘jpeg’. ‘jpg’ is an abomination introduced by the same people who
introduced the *shudder* ‘.htm’ filename extension.

R.

Regardless, .JPG is what you get from most digital cameras! Nothing
is ever guaranteed forever…!
As for .htm it is meaningless, as is .html
All that matters is the .conf or .htaccess declaration, and the mime
type(s).
html/xhtml is often served with .php, .pl, .rhtml, .py etc…

The point of all of this is the same as my OP: file extensions are
meaningless. Only Windows and poorly written apps really rely on them.
A file is a file is a file. It’s what’s inside that matters (usually).
The extensions are intended for humans to easily identify a
file and to help extend the name spaces. In addition, it makes some
processing easier when looking for particular extensions (C files
with .h and .c, for example).
Anticipating particular extensions is fine, but checking should
still occur.
Extensions do get deleted or munged, especially at the hands of users
clicking on things in a desktop GUI.

John J.

On Jul 18, 10:20 am, John J. wrote:

The file command and bindings to it are OK, but results are not
consistent across common image file types. What’s worse is that the
code would be unportable. Ideally, the solution would rely simply on
the file format internally and thus be portable.

How’s this for a portable version? I based this on a 5 minute overview
of the Wikipedia article “Magic number (programming)”,
combined with some trial and error.

class File
  # True if the file looks like a BMP, JPEG, PNG or GIF image.
  def self.image?(file)
    bmp?(file) || jpg?(file) || png?(file) || gif?(file)
  end

  # BMP files start with the two bytes "BM"; the next bytes hold the file size.
  def self.bmp?(file)
    IO.read(file, 2) == 'BM' && File.extname(file).downcase == '.bmp'
  end

  # JPEG files that carry a JFIF header.
  def self.jpg?(file)
    IO.read(file, 10) == "\377\330\377\340\000\020JFIF" &&
      File.extname(file).downcase == '.jpg'
  end

  def self.png?(file)
    IO.read(file, 4) == "\211PNG" && File.extname(file).downcase == '.png'
  end

  # The two GIF versions are 87a and 89a.
  def self.gif?(file)
    ['GIF87a', 'GIF89a'].include?(IO.read(file, 6)) &&
      File.extname(file).downcase == '.gif'
  end
end

Regards,

Dan
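
A note for anyone trying this kind of check on Ruby 1.9 or later, where strings carry an encoding: a literal such as "\211PNG" in a UTF-8 source file does not compare equal to the same bytes read in binary mode. The sketch below shows one way around that, using String#b and File.binread. `starts_with_magic?` is a hypothetical helper and not part of ptools, and the three-byte JPEG prefix is my assumption for matching JPEG files in general rather than only JFIF ones.

```ruby
require "tempfile"

# String#b returns a binary (ASCII-8BIT) copy, which matches the
# encoding of the bytes that File.binread returns.
PNG_MAGIC  = "\x89PNG".b
JPEG_MAGIC = "\xFF\xD8\xFF".b   # assumption: common prefix of JPEG files

# Hypothetical helper: does the file start with exactly these bytes?
def starts_with_magic?(path, magic)
  File.binread(path, magic.bytesize) == magic
end

Tempfile.create(["sample", ".png"]) do |f|
  f.binmode
  f.write(PNG_MAGIC + "rest of file".b)
  f.flush
  puts starts_with_magic?(f.path, PNG_MAGIC)    # true
  puts starts_with_magic?(f.path, "\x89PNG")    # false: UTF-8 literal vs binary bytes
  puts starts_with_magic?(f.path, JPEG_MAGIC)   # false
end
```

Reading in binary mode also sidesteps newline translation on Windows, which matters for portability.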

On Jul 18, 9:46 pm, John J. wrote:

I’ve not had time to test it out or kick the tires yet, but I’ve
dubbed it ‘The Daniel B. Detector’

Heh. :)

Well, I’ve officially added it to ptools, release 1.1.5, which I just
put out. So, now you can do:

require 'ptools'

File.image?(file)

Regards,

Dan

John J. wrote:

file of a given image type? Is there file header info to look for ?
the.gif: GIF image data, version 89a, 91 x 91
code would be unportable. Ideally, the solution would rely simply on the
file format internally and thus be portable.

If you know it’s an image file, then ImageMagick’s identify command will
probably do what you need, especially with the -verbose switch. I
think you get the same info from Magick::Image#inspect, if RMagick’s an
option for you.

On Jul 18, 9:47 pm, John J. wrote:

That will probably run a lot lighter than RMagick methods, I’ll

Also if you’ll tell me which parts of the documentation are lacking
img = Image.read('redzigzag.jpg')
Which apparently creates something similar to ImageList.
But when trying to use Image methods, I kept getting an error
about …private method … called for …[…]:array
(the ellipses are just leaving out specifics)
What I didn’t realize I was missing was the [0] in:
img = Image.read('redzigzag.jpg')[0]
I’m still not quite sure how that is working as part of that
statement really, but I know how to get it going anyway.

Because the file can contain multiple images (say, in the case of an
animated GIF or a multi-layer Photoshop image), the Image.read method
returns an array with an element for each image in the file. By adding
[0] you’re simply saying “the first image in the array.”

Perhaps I need to emphasize this more in the doc. I’ll see what I can
do.

I can certainly see the convenience of using ImageList to batch a
bunch of images, but I figured that Image was fine for working out
the logic and flow of what I want to do first.

There’s a lot of overlap between ImageList and Image. ImageList is
good if you’re working with animations or layers, otherwise it doesn’t
offer much. I hardly ever use it. I’ve often thought it would’ve been
smarter to just have one class, named Image but with the properties of
ImageList, but that’s water under the dam.

The only other truly confusing thing was lack of examples in many
method definitions. This is just me, I can “get it” faster when I
have a method definition and example with dummy data plugged in.

So do I. In fact, there are some who say that RMagick already has too
many examples, especially when they’re waiting for them to run during
the install. :) I look at every method to see whether it really needs
its own example, or whether it’s sufficiently similar to other methods
that someone with reasonable familiarity with RMagick can figure out
how to use it. Of course I can be wrong. Again, if you have a list of
methods that you think need an example, let me know.

BTW, the RVG stuff looks pretty interesting. I didn’t have time to
really dig deep into that so much, just skimming.

Overall, I especially like the simplicity of reading and writing
files with RMagick. I’m just glad I can do it with Ruby instead of
PHP, because Ruby is much easier for me to get a clear picture in my
mind of what I’m looking at.

Thanks for taking the time to post your impressions. If you’re really
interested in RMagick, there are a number of books that include
tutorials or recipes. I’ve posted their names on the RMagick home
page. Hal F.'s The Ruby Way is particularly thorough.

I hope you enjoy using RMagick!

On Jul 19, 5:05 am, “Raf C.” wrote:

the shudder ‘.htm’ filename extension.

Be that as it may, you will see both extensions in practice. So, the
code should probably be refactored to be:

def self.jpeg?(file)
  IO.read(file, 10) == "\377\330\377\340\000\020JFIF" &&
    ['.jpg', '.jpeg'].include?(File.extname(file).downcase)
end

alias jpg? jpeg?

Regards,

Dan

2007/7/18, Daniel B.:

def self.jpg?(file)
  IO.read(file, 10) == "\377\330\377\340\000\020JFIF" &&
    File.extname(file).downcase == '.jpg'
end

Ack! The proper abbreviation, and thus also the filename extension, is
‘jpeg’. ‘jpg’ is an abomination introduced by the same people who
introduced the *shudder* ‘.htm’ filename extension.

R.

On Jul 18, 2007, at 4:13 PM, Daniel B. wrote:

How’s this for a portable version? I based this on a 5 minute overview
IO.read(file, 3) == “BM6” && File.extname(file).downcase ==
‘.png’
Dan

I’ve not had time to test it out or kick the tires yet, but I’ve
dubbed it ‘The Daniel B. Detector’

John J.

On Jul 19, 7:06 am, Daniel B. wrote:

Ack! The proper abbreviation, and thus also the filename extension, is

alias jpg? jpeg?

Whoops, that should be:

class << self
  alias jpg? jpeg?
end

Regards,

Dan
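
For anyone wondering why the singleton class is needed there: a bare alias inside the class body looks up an instance method, and jpeg? was defined with def self., so it is a class-level method. A small self-contained sketch of the corrected form, using a made-up `Detector` class that only looks at the extension to keep the example short:

```ruby
class Detector
  # Class-level method, defined with `def self.`
  def self.jpeg?(name)
    %w[.jpg .jpeg].include?(File.extname(name).downcase)
  end

  # A bare `alias jpg? jpeg?` at this point would look for an instance
  # method named jpeg? and raise NameError. Opening the singleton class
  # makes alias operate on the class-level methods instead.
  class << self
    alias jpg? jpeg?
  end
end

puts Detector.jpg?("IMG_0001.JPG")
puts Detector.jpg?("notes.txt")
```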

On Jul 18, 2007, at 7:28 PM, Tim H. wrote:

At this point I know how to scale images and keep the aspect ratio
background of your desired size. See my article "Alpha Compositing

Also very cool. Thanks.
I will check that stuff out.
I guess it’s just a bit much the first time looking at the API.
The main thing I found a bit vague was creating a new Image or
ImageList instance.
ImageList responds basically as I would expect, but Image doesn’t.
Initially I was trying:
img = Image.read('redzigzag.jpg')
Which apparently creates something similar to ImageList.
But when trying to use Image methods, I kept getting an error
about …private method … called for …[…]:array
(the ellipses are just leaving out specifics)
What I didn’t realize I was missing was the [0] in:
img = Image.read('redzigzag.jpg')[0]
I’m still not quite sure how that is working as part of that
statement really, but I know how to get it going anyway.
I can certainly see the convenience of using ImageList to batch a
bunch of images, but I figured that Image was fine for working out
the logic and flow of what I want to do first.

The only other truly confusing thing was lack of examples in many
method definitions. This is just me, I can “get it” faster when I
have a method definition and example with dummy data plugged in.

BTW, the RVG stuff looks pretty interesting. I didn’t have time to
really dig deep into that so much, just skimming.

Overall, I especially like the simplicity of reading and writing
files with RMagick. I’m just glad I can do it with Ruby instead of
PHP, because Ruby is much easier for me to get a clear picture in my
mind of what I’m looking at.

On Jul 18, 2007, at 4:13 PM, Daniel B. wrote:

How’s this for a portable version? I based this on a 5 minute overview
IO.read(file, 3) == “BM6” && File.extname(file).downcase ==
‘.png’
Dan

Cool, Dan, that looks pretty slick. I certainly couldn’t have done
that so quickly, but I knew what basically was needed. That looks
like about what I was thinking of conceptually. Very nice and clean
code! Nice touch with downcasing file extensions too! I hate that
cameras all like to upcase filenames… (> <)!
That will probably run a lot lighter than RMagick methods, I’ll try
it out later. I’ve been fiddling with the RMagick API all afternoon.
There’s a lot of documentation but some of it is lacking in clear
examples.
At this point I know how to scale images and keep the aspect ratio
within a maximum new size, but I’d like to have the final output be
square with matting on the sides for portrait orientation, matting on
the top and bottom for landscape orientation, either one with the
scaled image centered.
Will this require subsequent compositing after scaling? Or did I miss
a method somewhere in the docs that does all of this at once?

John J.

John J. wrote:

within a maximum new size, but I’d like to have the final output be
square with matting on the sides for portrait orientation, matting on
the top and bottom for landscape orientation, either one with the
scaled image centered.
Will this require subsequent compositing after scaling? Or did I miss
a method somewhere in the docs that does all of this at once?

John J.

Yes, you’ll need to composite the scaled image on top of the background
of your desired size. See my article “Alpha Compositing - Part 1”
[http://rmagick.rubyforge.org/src_over.html]. If you’re making a lot of
thumbnails you might also be interested in “Making Thumbnails with
RMagick” [http://rmagick.rubyforge.org/resizing-methods.html], which
compares the performance of all the RMagick resizing methods.
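
The arithmetic behind the square-with-matting layout can be sketched in plain Ruby, independent of any imaging library. `fit_in_square` is a hypothetical helper, not an RMagick method. It returns the scaled size plus the offsets at which the scaled image would be placed on a square background, which are the numbers a compositing step would need.

```ruby
# Hypothetical helper: fit a width x height image inside a square of
# `side` pixels, keeping the aspect ratio, and center it.
def fit_in_square(width, height, side)
  # Use the smaller scale factor so both dimensions fit inside the square.
  scale = [side.to_f / width, side.to_f / height].min
  new_w = (width * scale).round
  new_h = (height * scale).round
  {
    width:  new_w,
    height: new_h,
    x: (side - new_w) / 2,   # left offset; non-zero for portrait images
    y: (side - new_h) / 2    # top offset; non-zero for landscape images
  }
end

p fit_in_square(400, 200, 100)   # landscape: matting above and below
p fit_in_square(200, 400, 100)   # portrait: matting on the sides
```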

Also if you’ll tell me which parts of the documentation are lacking in
clear examples I’ll see what I can do to fix it. You can always open a
documentation bug in the RMagick bug tracker on RubyForge.