Linux Users' Group of Davis

The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

[vox-tech] How to tell if a pdf is text or image?

Does anyone know a way to inspect a PDF programmatically to tell whether it contains text or was scanned as an image?

I basically just want to sort a directory containing many thousands of PDFs.
I figured there must be something in the header or in the file info that says either that it's an image or that it has text or, to be more complicated, gives a quick percentage of how much of the document is text, which I could use to set a sort threshold.

Alternatively, if this is easier to do on a PS file, there's no reason I can't run pdf2ps on each PDF first and then decide how to sort.
It's really a one-time deal, so I'll take the overhead of that operation.
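
One possible way to get the "percentage of the document that is text" described above is sketched here. It is not a definitive answer, and it rests on assumptions that are not in the post: the pdftotext command-line tool (from Xpdf or Poppler) is installed, it separates pages with form feed characters, and a page counts as "text" once it yields at least min_chars non-whitespace characters. The names text_page_fraction, extract_text, report and min_chars are made up for illustration. A scanned PDF that carries an OCR text layer would be reported as text by this approach.

```python
import subprocess
from pathlib import Path


def text_page_fraction(extracted: str, min_chars: int = 20) -> float:
    """Return the fraction of pages with at least min_chars non-whitespace characters.

    Assumption: pages are separated by form feeds, as pdftotext emits them.
    """
    pages = extracted.split("\f")
    # pdftotext ends every page with a form feed, so drop the empty tail.
    if pages and not pages[-1].strip():
        pages.pop()
    if not pages:
        return 0.0
    with_text = sum(1 for page in pages if len("".join(page.split())) >= min_chars)
    return with_text / len(pages)


def extract_text(pdf: Path) -> str:
    """Run pdftotext (assumed to be on PATH) and return its output as a string."""
    result = subprocess.run(
        ["pdftotext", str(pdf), "-"],  # "-" sends the text to stdout
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def report(directory: Path) -> None:
    """Print one tab-separated line per PDF: file name and text-page fraction."""
    for pdf in sorted(directory.glob("*.pdf")):
        try:
            fraction = text_page_fraction(extract_text(pdf))
        except (OSError, subprocess.CalledProcessError) as exc:
            print(f"{pdf.name}\tERROR\t{exc}")
            continue
        print(f"{pdf.name}\t{fraction:.2f}")
```

Calling report(Path("some/dir")) would print a fraction between 0 and 1 for each file, which could then be compared against whatever sort threshold seems right, for example treating anything at 0.00 as image-only.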
