[Leaplist] Scanner Setup Help?
Steve Litt
slitt at troubleshooters.com
Wed Jul 15 13:08:11 EDT 2009
On Wednesday 15 July 2009 01:00:11 pm Bruce Metcalf wrote:
> Bruce Metcalf wrote:
> > I need to be able to scan, and hopefully OCR, large amounts of text --
> > hundreds to thousands of pages -- in batches of 12 to 50 pages. Tools
> > available are several Debian boxen and an HP All-in-One with a sheet
> > feeder that connects through a USB port.
> >
> > Question 1: Does anyone have any suggestions about either specific
> > software or a web site "how-to" for this?
>
> Okay, I'll take the last day's silence as evidence that you all agree
> that OCR software simply isn't up to this task. Thanks, Steve, for your
> comments nevertheless.
>
> So, next question:
>
> Is there a software package for Linux anyone might recommend that would
> permit loading the doc feeder with a stack of paper and have it sucked
> into the system as either a PDF or JPG file for manual retyping?
XSane. It's available as a package on almost any distro. You want to select
the "multipage" method. Be sure you have plenty of disk space -- the
intermediate .pnm files add up to one big hunk of oxide.
Before deciding what file format you want for the finished product, look at
your OCR software to see what formats it takes as input. Who knows, you might
get lucky and the OCR works.
As I mentioned before, if you run 3 or 4 separate scan->OCR passes and diff
'em all against each other, you might get major sections of clean OCR, and
the diffs will point you to the parts that OCR'ed wrong.
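
Here's a rough sketch of that diff idea in Python, using the standard
difflib module. It's an illustration, not a tested workflow: the function
name flag_suspect_lines and the sample text are made up, and it assumes each
OCR pass has already been read into a list of lines. A line from the first
pass counts as clean only if every other pass matches it.

```python
import difflib


def flag_suspect_lines(runs):
    """Compare the first OCR run against all the others.

    Returns a list of (line, agreed) pairs for the first run. 'agreed' is
    True only when difflib matched that line in every other run.
    """
    base = runs[0]
    agreed = [True] * len(base)
    for other in runs[1:]:
        matched = [False] * len(base)
        sm = difflib.SequenceMatcher(None, base, other, autojunk=False)
        # Each matching block covers base[a:a+size] == other[b:b+size].
        for block in sm.get_matching_blocks():
            for i in range(block.a, block.a + block.size):
                matched[i] = True
        agreed = [a and m for a, m in zip(agreed, matched)]
    return list(zip(base, agreed))


if __name__ == "__main__":
    # Made-up OCR output from three scans of the same page.
    run1 = ["The quick brown fox", "jumps over the 1azy dog", "and runs away."]
    run2 = ["The quick brown fox", "jumps over the lazy dog", "and runs away."]
    run3 = ["The quick brown fox", "jumps over the lazy dog", "and runs away."]
    for line, ok in flag_suspect_lines([run1, run2, run3]):
        print("  " if ok else "??", line)
```

Lines marked ?? are the ones to proofread by hand. A fancier version could
take a majority vote across passes instead of trusting the first one.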
If this project is important to you, I'd recommend spending a hundred bucks
on a terabyte disk and using biiiiig resolutions like 300 or 600 dpi.
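
For a feel of how fast those intermediate files eat disk, here's a
back-of-the-envelope estimate in Python. The assumptions are mine, not
measured: US Letter pages (8.5 x 11 in), uncompressed 24-bit color, and the
small file header ignored. Grayscale would be a third of this. The function
name raw_page_bytes is just for illustration.

```python
def raw_page_bytes(width_in, height_in, dpi, channels=3):
    """Approximate size of one uncompressed page at 8 bits per channel."""
    return round(width_in * dpi) * round(height_in * dpi) * channels


if __name__ == "__main__":
    pages = 1000  # assumed project size
    for dpi in (300, 600):
        per_page = raw_page_bytes(8.5, 11, dpi)
        total_gb = per_page * pages / 1e9
        print(f"{dpi} dpi: {per_page / 1e6:.1f} MB/page, "
              f"{total_gb:.1f} GB per {pages} pages")
```

Under those assumptions a page is about 25 MB at 300 dpi and about 101 MB at
600 dpi, so a thousand color pages at 600 dpi is on the order of 100 GB.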
SteveT
Steve Litt
Recession Relief Package
http://www.recession-relief.US
Twitter: http://www.twitter.com/stevelitt