Metadata extractor

url: http://dark.fhtr.org/repos/metadata
tarball: http://dark.fhtr.org/repos/metadata/metadata-0.1.tar.gz

Description

This package, 'Metadata', comes with a library called 'metadata' and
a small program called 'mdh'.

The library probes files for their metadata (e.g. jpeg dimensions
and camera make, mp3 artist, pdf word count) and returns the metadata
as a Hash.
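To make the "probe a file, return a Hash" idea concrete, here is a minimal, self-contained sketch. It is not the library's implementation: the probe function, the key names and the choice of PNG (picked only because its header is easy to read by hand) are all assumptions made for illustration. The real library relies on the external tools listed under Requirements.

```ruby
# Hypothetical probe: choose an extractor from the file's leading bytes
# and return the result as a Hash. Key names are illustrative only.
PNG_SIGNATURE = "\x89PNG\r\n\x1A\n".b

def probe(data)
  data = data.b
  if data.start_with?(PNG_SIGNATURE)
    # Width and height are big-endian 32-bit integers at byte
    # offsets 16 and 20, inside the IHDR chunk.
    width, height = data.byteslice(16, 8).unpack('NN')
    { 'Format' => 'image/png', 'Width' => width, 'Height' => height }
  else
    # Fallback: treat the bytes as text and count whitespace-separated words.
    { 'Format' => 'text/plain', 'WordCount' => data.split.size }
  end
end

# Build just enough of a PNG header to exercise the first branch.
png_header = PNG_SIGNATURE + [13].pack('N') + 'IHDR'.b + [640, 480].pack('NN')
p probe(png_header)
p probe('three short words')
```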

Mdh can print out file metadata as YAML and package the metadata
with the file.
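As a rough illustration of the YAML output, a Hash can be serialized with Ruby's standard yaml library. The keys and values below are made up for the example; they are not taken from the library or from the spec.

```ruby
require 'yaml'

# Hypothetical metadata hash; the key names are illustrative assumptions,
# not the names the library actually emits.
metadata = {
  'File.Format' => 'image/jpeg',
  'Image.Width' => 1600,
  'Image.Height' => 1200
}

# Serialize to YAML, as a header printer might do.
yaml_header = metadata.to_yaml
puts yaml_header

# The header parses back into an equal Hash.
restored = YAML.load(yaml_header)
```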

This package has many dependencies since there is no single universal
metadata header format that all files use. Blame resource forks, filename
extensions, bags of bytes and mimetypes.

The metadata hash mostly follows the shared-filemetadata-spec naming.

Usage

print out metadata header

mdh -p myfile.jpg

create myfile.jpg.mdh, which consists of metadata header + myfile.jpg

mdh myfile.jpg

print out metadata header from mdh file

mdh -e -p myfile.jpg.mdh

strip out metadata header from mdh file and save it to myfile.jpg

mdh -e myfile.jpg.mdh
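The on-disk layout of an .mdh file is not described here beyond "metadata header + file". The sketch below shows one way such a container could work, assuming a length-prefixed YAML header followed by the raw file bytes. The names pack_mdh and unpack_mdh, the key names and the layout itself are assumptions for illustration, not mdh's actual format.

```ruby
require 'yaml'

# Hypothetical container layout (an assumption, not mdh's real format):
#   <header byte length as decimal>\n<YAML header><original file bytes>
def pack_mdh(metadata, payload)
  header = metadata.to_yaml
  "#{header.bytesize}\n".b + header.b + payload.b
end

# Split a packed string back into [metadata_hash, payload_bytes].
def unpack_mdh(data)
  length_line, rest = data.b.split("\n", 2)
  header_size = Integer(length_line)
  header = rest.byteslice(0, header_size)
  payload = rest.byteslice(header_size, rest.bytesize - header_size)
  [YAML.load(header.force_encoding('UTF-8')), payload]
end

meta = { 'File.Format' => 'image/jpeg', 'Image.Width' => 1600 }
original = "\xFF\xD8\xFF\xE0 fake jpeg bytes".b
packed = pack_mdh(meta, original)
restored_meta, restored_payload = unpack_mdh(packed)
```

Prefixing the header with its byte length keeps the split unambiguous even when the payload is arbitrary binary data.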

irb> Metadata.extract('myfile.jpg')
irb> Metadata.extract_text('myfile.pdf')
irb> Pathname.new("myfile.jpg").metadata

Requirements

  • Ruby 1.8

  • Tons of metadata extraction programs,
    list of debian packages follows:
    dcraw
    libimlib2-ruby
    extract
    libimage-exiftool-perl
    poppler-utils
    mplayer
    html2text
    imagemagick
    unhtml
    pstotext
    antiword
    catdoc
    shared-mime-info

  • You do want to install the latest versions of dcraw and
    shared-mime-info to be able to handle camera raw images.
    http://cybercom.net/~dcoffin/dcraw/
    shared-mime-info

  • Python + chardet library
    http://chardet.feedparser.org/
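With this many external programs involved, it can help to check which ones are installed before running the extractor. The following is a small sketch, not part of the package. The helper missing_commands is hypothetical, and the command names are assumptions about what the Debian packages above install (for example, pdftotext from poppler-utils and identify from imagemagick).

```ruby
# Return the commands from +commands+ that cannot be found as
# executables in any directory of +path+ (defaults to ENV['PATH']).
def missing_commands(commands, path = ENV['PATH'].to_s)
  dirs = path.split(File::PATH_SEPARATOR).reject(&:empty?)
  commands.reject do |cmd|
    dirs.any? do |dir|
      candidate = File.join(dir, cmd)
      File.file?(candidate) && File.executable?(candidate)
    end
  end
end

# Assumed command names; adjust for your system.
COMMANDS = %w[dcraw extract exiftool pdftotext mplayer html2text
              identify unhtml pstotext antiword catdoc]

missing = missing_commands(COMMANDS)
puts missing.empty? ? 'All commands found' : "Missing: #{missing.join(', ')}"
```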

License

Ruby’s

Ilmari H. <ilmari.heikkinen gmail com>

Quoth Ilmari H. on Monday 10 September 2007 04:18:25 pm:

Any chance this could be expanded to add FLAC and OGG support?

Thanks!