Paperless Office using the Raspberry Pi

This is a follow-up on an older blog using Ubuntu.

r by rosmary, on Flickr
Creative Commons Creative Commons Attribution 2.0 Generic License   by  rosmary 

For this purpose I used a Fujitsu ScanSnap S1300i scanner as I really like the features of this series (full duplex scan as well auto document feeder as well for around $250). It’s document feeder is not a good as the S1500 we have in the office, but very compact and can be powered from USB hub.

Raspberry Pi Prerequisites

Since this will be a purely headless install designed to sit in a corner behind the scanner I am using a Base Raspian (Debian Wheezy) install (I personally like the clean minimal install via https://github.com/debian-pi/raspbian-ua-netinst the best).

apt-get install sudo vim wget wput libusb-dev build-essential git-core

Add non-privileged user account(s)

adduser USERNAME
adduser USERNAME sudo
groupadd scanner
usermod -a -G scanner USERNAME

Install Sane

The version of sane from the Raspbian repos is not working with the Fujitsu ScanJet range and needs to be built from source.

git clone git://git.debian.org/sane/sane-backends.git
cd sane-backends
BACKENDS=epjitsu ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var
make
make install

Install S1300i Driver

You need to get the driver file (‘1300i_0D12.nal’) from the CD that came with the scanner. If you still have access to a CDROM drive that is. :(

mkdir -p /usr/share/sane/epjitsu/
cp 1300i_0D12.nal /usr/share/sane/epjitsu/

Check /etc/sane.d/epjitsu.conf and see if the following line is there (in my case it was already created by sane build).

# Fujitsu S1300i
firmware /usr/share/sane/epjitsu/1300i_0D12.nal
usb 0x04c5 0x128d

sane-find-scanner -q

found USB scanner (vendor=0x04c5 [FUJITSU], product=0x128d [ScanSnap S1300i]) at libusb:001:004
found USB scanner (vendor=0x0424, product=0xec00) at libusb:001:003

scanimage -L

device `epjitsu:libusb:001:004′ is a FUJITSU ScanSnap S1300i scanner

Copy libsane rules from the sane build directory to udev rules.
sudo cp sane-backends/tools/udev/libsane.rules /etc/udev/rules.d/60-libsane.rules

Logout and log in a the non-privileged user account previously created.

If the scanimage -L command works as above you have fully configured the scanner to work under that user account.

Start saned on boot-up

Edit the /etc/rc.local file and add the following line before the ‘0’ line to ensure saned is running as the non-privileged user when you have to reboot.

saned -a USERNAME

Installing Conversion Tools

sudo apt-get install imagemagick bc exactimage pdftk tesseract-ocr tesseract-ocr-eng unpaper

You can add other languages such as tesseract-ocr-deu if you require OCR support for those.

Scan to Repository Script

The script is hosted on Github: https://github.com/leogaggl/misc-scripts/blob/master/scan2repo.sh

#!/bin/bash
# Thanks to Andreas Gohr (http://www.splitbrain.org/) for the initial work
# https://github.com/splitbrain/paper-backup/
OUT_DIR=~/scan
TMP_DIR=`mktemp -d`
FILE_NAME=scan_`date +%Y%m%d-%H%M%S`
LANGUAGE="eng"
echo 'scanning...'
scanimage --resolution 300 \
--batch="$TMP_DIR/scan_%03d.pnm" \
--format=pnm \
--mode Gray \
--source 'ADF Duplex'
echo "Output saved in $TMP_DIR/scan*.pnm"
cd $TMP_DIR
# cut borders
echo 'cutting borders...'
for i in scan_*.pnm; do
mogrify -shave 50x5 "${i}"
done
# check if there is blank pages
echo 'checking for blank pages...'
for f in ./*.pnm; do
unpaper --size "a4" --overwrite "$f" `echo "$f" | sed 's/scan/scan_unpaper/g'`
#need to rename and delete original since newer versions of unpaper can't use same file name
rm -f "$f"
done
# apply text cleaning and convert to tif
echo 'cleaning pages...'
for i in scan_*.pnm; do
echo "${i}"
convert "${i}" -contrast-stretch 1% -level 29%,76% "${i}.tif"
done
# Starting OCR
echo 'doing OCR...'
for i in scan_*.pnm.tif; do
echo "${i}"
tesseract "$i" "$i" -l $LANGUAGE hocr
hocr2pdf -i "$i" -s -o "$i.pdf" < "$i.html" done # create PDF echo 'Converting PDF...' pdftk *.tif.pdf cat output "$FILE_NAME.pdf" wput $FILE_NAME.pdf ftp://uid:pwd@scanner.domain:21/Alfresco/scans/ cp $FILE_NAME.pdf $OUT_DIR/ rm -rf $TMP_DIR

Thanks go to Andi Gohr @ Splitbrain for the excellent blog that helped me to get over the sane problems and also gave me some ideas to make the scan script better (as unpaper was not doing such a good job): http://www.splitbrain.org/blog/2014-08/23-paper_backup_1_scanner_setup

Author: Leo Gaggl

ict business owner specialising in mobile learning systems. interests: sustainability, internet of things, ict for development, open innovation, agriculture

2 thoughts on “Paperless Office using the Raspberry Pi”

  1. Hi!

    Thanks for the great blog post! How’s the speed of your scans? I am setting up something similar but I’m experiencing painful speeds scanning and post-processing images. I am, however, using one of the old 700MHz CPU and 180M RAM Raspberrys, so I suppose upgrading to a recent board would help a lot. But even then, are the speeds OK? Or do you think I should investigate other hardware alternatives?

    Best wishes,
    Svante

Leave a Reply