url: http://dark.fhtr.org/repos/metadata
tarball: http://dark.fhtr.org/repos/metadata/metadata-0.1.tar.gz
Description
This package, `Metadata', comes with a library called `metadata' and
a small program called `mdh'.
The library probes files for their metadata (e.g. jpeg dimensions
and camera make, mp3 artist, pdf word count) and returns the metadata
as a Hash.
Mdh can print out file metadata as YAML and package the metadata
with the file.
This package has many dependencies since there is no single universal
metadata header format that all files use. Blame resource forks, filename
extensions, bags of bytes and mimetypes.
The metadata hash mostly follows the shared-filemetadata-spec naming:
http://wiki.freedesktop.org/wiki/Specifications/shared-filemetadata-spec
Usage
print out the metadata header
  mdh -p myfile.jpg
create myfile.jpg.mdh, which consists of the metadata header + myfile.jpg
  mdh myfile.jpg
print out the metadata header from an mdh file
  mdh -e -p myfile.jpg.mdh
strip the metadata header from an mdh file and save the original file as myfile.jpg
  mdh -e myfile.jpg.mdh
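
This README does not describe the byte layout of a .mdh file, so the following is only a sketch of the general "metadata header + file" idea, not the format mdh actually writes. It assumes a made-up layout (header length in decimal on the first line, then a YAML header, then the untouched file bytes). The names pack_with_header and unpack_with_header are hypothetical, and the code targets a current Ruby rather than 1.8.

```ruby
require 'yaml'

# HYPOTHETICAL container layout, not the real .mdh format:
#   <header size in bytes, decimal>\n<YAML header><original file bytes>

# Build a single binary string from a metadata Hash and the file contents.
def pack_with_header(metadata, payload)
  header = YAML.dump(metadata).b
  "#{header.bytesize}\n".b + header + payload.b
end

# Split such a string back into [metadata Hash, original file bytes].
def unpack_with_header(blob)
  size_line, rest = blob.b.split("\n", 2)
  size = Integer(size_line, 10)
  header  = rest.byteslice(0, size)
  payload = rest.byteslice(size, rest.bytesize - size)
  [YAML.load(header.force_encoding('UTF-8')), payload]
end

# Round trip with invented metadata keys and a few fake image bytes.
example_meta  = { 'Image.Width' => 640, 'Image.Height' => 480 }
example_bytes = "\xFF\xD8\xFF\xE0 not really a jpeg\n".b
packed = pack_with_header(example_meta, example_bytes)
restored_meta, restored_bytes = unpack_with_header(packed)
puts restored_meta.inspect
puts restored_bytes.bytesize
```

Prefixing the header with its length keeps the split unambiguous even when the file bytes themselves contain newlines or YAML-looking text.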
irb> Metadata.extract('myfile.jpg')
irb> Metadata.extract_text('myfile.pdf')
irb> Pathname.new("myfile.jpg").metadata
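
Since the library returns the metadata as a plain Hash, the usual Hash and YAML tools apply to it. The sketch below uses a hand-written stand-in Hash instead of calling Metadata.extract, so it runs without the library installed. The keys and values are invented for illustration and may not match what the library returns for a real file.

```ruby
require 'yaml'

# Stand-in for the Hash that Metadata.extract('myfile.jpg') would return.
# ASSUMPTION: these keys and values are made up for this example.
metadata = {
  'File.Name'        => 'myfile.jpg',
  'Image.Width'      => 640,
  'Image.Height'     => 480,
  'Image.CameraMake' => 'ExampleCam'
}

# Dump the whole Hash as YAML, similar in spirit to what `mdh -p` prints.
yaml_text = metadata.to_yaml
puts yaml_text

# Ordinary Hash methods work too, e.g. keeping only the image-related keys.
image_only = metadata.select { |key, _value| key.start_with?('Image.') }
puts image_only.keys.sort.join(', ')
```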
Requirements
- Ruby 1.8
- Tons of metadata extraction programs; the list of Debian packages follows:
    dcraw
    libimlib2-ruby
    extract
    libimage-exiftool-perl
    poppler-utils
    mplayer
    html2text
    imagemagick
    unhtml
    pstotext
    antiword
    catdoc
    shared-mime-info
- You do want to install the latest versions of dcraw and
  shared-mime-info to be able to handle camera raw images.
    http://cybercom.net/~dcoffin/dcraw/
    http://freedesktop.org/wiki/Software/shared-mime-info
- Python + chardet library
    http://chardet.feedparser.org/
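
With this many external tools, it can help to check which ones are on the PATH before probing files. The following is a minimal sketch and not part of the package. The helper names which and missing_commands are hypothetical. The executable names are assumptions about what the Debian packages above install (for example exiftool from libimage-exiftool-perl, pdftotext from poppler-utils, identify from imagemagick). Libraries and data packages such as libimlib2-ruby and shared-mime-info are not covered.

```ruby
# Return the full path of +command+ if an executable file with that name
# exists in one of the directories of +path+, otherwise nil.
def which(command, path = ENV['PATH'].to_s)
  path.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?
    candidate = File.join(dir, command)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# ASSUMPTION: executable names guessed from the Debian package list.
COMMANDS = %w[
  dcraw extract exiftool pdftotext mplayer html2text
  identify unhtml pstotext antiword catdoc
].freeze

# Return the subset of +commands+ that cannot be found in +path+.
def missing_commands(commands = COMMANDS, path = ENV['PATH'].to_s)
  commands.reject { |command| which(command, path) }
end

missing = missing_commands
if missing.empty?
  puts 'All checked commands were found.'
else
  puts "Not found on PATH: #{missing.join(', ')}"
end
```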
License
Ruby’s
Ilmari H. <ilmari.heikkinen gmail com>