Since paper and I have never gotten on well, I have always dreamed of a paperless office. A while ago I purchased a Fujitsu ScanSnap S1500 scanner for the office. I did this after researching which Automatic Document Feed (ADF) multipage and duplex scanners were both affordable and supported on Linux.
It took me a while to get around to setting all of this up, but the result is that the scanner is now connected to a headless Ubuntu VM, and pressing the scanner button will:
- scan the document
- perform OCR to convert to text
- combine the text with the scanned image to create a searchable PDF
- OPTIONAL – send the resulting document into Alfresco Document Management Server via FTP
Install dependencies
NOTE: The PPA is only required for Fujitsu ScanSnap S1500 support.
sudo apt-add-repository ppa:rolfbensch/sane-git
sudo apt-get update
sudo apt-get install sane sane-utils imagemagick tesseract-ocr pdftk libtiff-tools libsane-extras exactimage wput unpaper
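Before wiring up the scanner button, it can help to confirm that every tool the scan script below calls is actually on the PATH. This is a minimal sketch; the command list is an assumption based on reading the script (unpaper comes from its own unpaper package, and wput is only needed for the optional FTP step), and missing_commands is a hypothetical helper name.

```bash
#!/bin/bash
# Commands called by the scan script (assumed list; wput is only needed
# for the optional FTP upload).
REQUIRED_COMMANDS=(scanimage unpaper mogrify tesseract hocr2pdf pdftk wput)

# missing_commands CMD...: print every CMD that is not found on the PATH,
# one per line, and return 1 if at least one is missing.
missing_commands() {
  local cmd status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      status=1
    fi
  done
  return "$status"
}
```

For example: missing_commands "${REQUIRED_COMMANDS[@]}" || echo "Install the tools listed above first"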
Install scanbuttond
Download the “Debian Experimental” package from http://pkgs.org/download/scanbuttond
sudo dpkg -i scanbuttond_0.2.3.cvs20090713-14_i386.deb
This step is only needed for Fujitsu ScanSnap support. For other scanners you can probably install scanbuttond from the Ubuntu repositories.
Scanner config
vim 40-libsane.rules
#add this line (04c5 is Fujitsu's USB vendor ID, 11a2 the ScanSnap S1500's product ID)
ATTRS{idVendor}=="04c5", ATTRS{idProduct}=="11a2", ENV{libsane_matched}="yes"
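Editing the rules file by hand works fine. If you would rather script it, a sketch like the following appends the rule only when it is not already present, so running it twice does not duplicate the line. The function name add_udev_rule is hypothetical, and the rules file path is passed in rather than hard-coded because the location of 40-libsane.rules can differ between releases.

```bash
#!/bin/bash
# The ScanSnap S1500 rule from above (vendor 04c5, product 11a2).
SCANSNAP_RULE='ATTRS{idVendor}=="04c5", ATTRS{idProduct}=="11a2", ENV{libsane_matched}="yes"'

# add_udev_rule FILE: append SCANSNAP_RULE to FILE unless an identical line
# already exists. Returns non-zero if FILE cannot be written.
add_udev_rule() {
  local rules_file=$1
  # Rule already present: nothing to do.
  if grep -qxF -- "$SCANSNAP_RULE" "$rules_file" 2>/dev/null; then
    return 0
  fi
  # If the file does not end with a newline, add one so the rule starts on
  # its own line ($(tail -c1) is empty when the last byte is a newline).
  if [ -s "$rules_file" ] && [ -n "$(tail -c1 "$rules_file")" ]; then
    echo >> "$rules_file" || return 1
  fi
  printf '%s\n' "$SCANSNAP_RULE" >> "$rules_file"
}
```

Run it as root with the path to your rules file, then run udevadm control --reload-rules and udevadm trigger (or simply re-plug the scanner) so udev picks up the change.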
Permissions
sudo adduser saned scanner
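To double-check that the group change took effect, a small helper like this one (hypothetical name user_in_group) looks for the group in the output of id -nG. Keep in mind that a new group membership only applies to processes started afterwards, so saned may need to be restarted.

```bash
#!/bin/bash
# user_in_group USER GROUP: succeed if USER is a member of GROUP, based on
# the group list reported by "id -nG" (primary and supplementary groups).
user_in_group() {
  local user=$1 group=$2 g
  local -a groups
  # An unknown user makes id fail; the array then stays empty.
  read -r -a groups <<< "$(id -nG "$user" 2>/dev/null)"
  for g in "${groups[@]}"; do
    [ "$g" = "$group" ] && return 0
  done
  return 1
}
```

For example: user_in_group saned scanner && echo "saned is in the scanner group"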
Useful command lines for troubleshooting
Since I had some trouble getting this scanner to work properly, I found the following commands very useful for locating the issue.
man sane-usb
sane-find-scanner
scanimage -L
dmesg
tail /var/log/udev
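scanimage -L is also where the value for the scan script's --device-name option comes from (the script uses fujitsu:ScanSnap S1500:67953; the exact name will likely differ on your system). This sketch extracts just the device names from its output, assuming each scanner is reported on a line of the form device `NAME' is a ... . The function name is hypothetical.

```bash
#!/bin/bash
# list_scanner_devices: read "scanimage -L" output on stdin and print only
# the device names (the text between ` and '), one per line. Any other
# lines, such as the message printed when no scanner is found, are ignored.
list_scanner_devices() {
  sed -n "s/^device \`\([^']*\)'.*/\1/p"
}
```

For example: scanimage -L | list_scanner_devices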
NOTE: If you are using a notebook, be careful: I spent quite a few hours troubleshooting an error when opening the device from saned. It turned out that USB power management on the Toshiba notebook was causing havoc with saned (http://askubuntu.com/questions/55140/error-during-device-i-o-when-using-usb-scanner). Switching to the desktop that now houses the scanner fixed the problem. Thank you VIRTUALBOX (I ended up setting up a dedicated VM for this task)!
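If you have to stay on a notebook, one thing that may be worth trying (untested against this particular error) is turning USB autosuspend off for the scanner only. The sketch below looks under sysfs for a device with the S1500's vendor/product IDs (04c5:11a2, the same IDs as in the udev rule) and writes "on" to its power/control attribute. The function name is hypothetical, and the sysfs directory is a parameter only so the function can be tried against a fake directory tree.

```bash
#!/bin/bash
# disable_scanner_autosuspend [USB_DEVICES_DIR]
# For every USB device whose idVendor/idProduct match the ScanSnap S1500
# (04c5:11a2), write "on" to power/control, which tells the kernel not to
# autosuspend that device. Prints the path of each device it changed.
# USB_DEVICES_DIR defaults to /sys/bus/usb/devices. Needs root on a real system.
disable_scanner_autosuspend() {
  local root=${1:-/sys/bus/usb/devices} dev
  for dev in "$root"/*; do
    # Entries without these attributes (e.g. USB interfaces) are skipped.
    [ -r "$dev/idVendor" ] && [ -r "$dev/idProduct" ] || continue
    if [ "$(cat "$dev/idVendor")" = "04c5" ] && [ "$(cat "$dev/idProduct")" = "11a2" ]; then
      echo on > "$dev/power/control" && echo "$dev"
    fi
  done
}
```

The setting is not persistent: it is lost when the scanner is re-plugged or the machine reboots, so it would need to be re-applied (for example from a udev rule or a boot script).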
Configure scanbuttond
sudo vim /etc/default/scanbuttond
#change this line from no to yes
RUN=yes
cd /etc/scanbuttond
sudo cp initscanner.sh.example initscanner.sh
sudo vim initscanner.sh
Uncomment or copy any scanner init command(s).
sudo cp buttonpressed.sh.example buttonpressed.sh
sudo vim buttonpressed.sh
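Paste the contents of the scan script below into this file and adjust OUT_DIR and the --device-name value for your setup (scanimage -L lists your scanner's device name). The script is also hosted on GitHub (https://github.com/leogaggl/misc-scripts/blob/master/buttonpressed.sh).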
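The scanbuttond steps above can also be done non-interactively. This is a minimal sketch assuming the default paths used in this post; it only flips RUN=no to RUN=yes, and it never overwrites an existing initscanner.sh or buttonpressed.sh, so your edited copies are safe. The function name setup_scanbuttond is hypothetical.

```bash
#!/bin/bash
# setup_scanbuttond [CONF_DIR] [DEFAULTS_FILE]
# Scripted version of the scanbuttond steps above; run as root.
setup_scanbuttond() {
  local conf_dir=${1:-/etc/scanbuttond}
  local defaults_file=${2:-/etc/default/scanbuttond}
  local name
  # Flip RUN=no to RUN=yes (no change if it already says yes).
  sed -i 's/^RUN=no$/RUN=yes/' "$defaults_file" || return 1
  # Create the working scripts from the shipped examples, but never
  # overwrite one that already exists (e.g. your edited buttonpressed.sh).
  for name in initscanner.sh buttonpressed.sh; do
    if [ ! -e "$conf_dir/$name" ]; then
      cp "$conf_dir/$name.example" "$conf_dir/$name" || return 1
    fi
  done
  # Make sure the button script is executable.
  chmod +x "$conf_dir/buttonpressed.sh"
}
```

After running it, edit initscanner.sh and buttonpressed.sh as described and restart the scanbuttond service so it picks up the new configuration.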
Scan script
#!/bin/bash
# Called by scanbuttond when the scanner button is pressed.
# Adjust OUT_DIR and --device-name for your setup.
OUT_DIR=/output/directory/name
TMP_DIR=$(mktemp -d)
FILE_NAME=scan_$(date +%Y%m%d-%H%M%S)
cd "$TMP_DIR" || exit 1
echo "################## Scanning ###################"
scanimage --resolution 150 --batch=scan_%03d.pnm --format=pnm --mode Gray --device-name "fujitsu:ScanSnap S1500:67953" --source "ADF Duplex" --page-width 210 --page-height 297 --sleeptimer 1 -y 297 -x 210
echo "################## Cleaning ###################"
# unpaper cleans up each page; --overwrite allows it to write over the existing file
for f in ./*.pnm; do
unpaper --size "a4" --overwrite "$f" "$f"
done
echo "############## Converting to TIF ##############"
mogrify -format tif *.pnm
echo "################ OCR ################"
for f in ./*.tif; do
# older tesseract versions write the hOCR output to $f.html, newer ones to $f.hocr
tesseract "$f" "$f" -l eng hocr
HOCR_FILE="$f.hocr"
[ -f "$HOCR_FILE" ] || HOCR_FILE="$f.html"
# hocr2pdf (from exactimage) combines the page image with the OCR text layer
hocr2pdf -i "$f" -s -o "$f.pdf" < "$HOCR_FILE"
done
echo "############## Converting to PDF ##############"
# merge the per-page PDFs (zero-padded names keep the page order)
pdftk *.tif.pdf cat output "$FILE_NAME.pdf"
echo "############## Copy Output File ##############"
cp "$FILE_NAME.pdf" "$OUT_DIR/"
echo "############## clean up ##############"
cd ..
rm -rf "$TMP_DIR"
echo "############## FTP Output File ##############"
#wput "$OUT_DIR/$FILE_NAME.pdf" ftp://user:pwd@ftp.alfrescoserver.com.au:21/autoscan/pdf/
Credits:
A big thank you & hat tip to the following authors of the following pages:
- http://blog.konradvoelkel.de/2013/03/scan-to-pdfa/
- http://www.robinclarke.net/archives/the-paperless-office-with-linux
- http://askubuntu.com/questions/271271/how-do-i-produce-a-multi-page-sandwich-pdf-with-hocr2pdf
EDIT (2013-09-16): I found this link describing how to remove empty pages: http://philipp.knechtges.com/?p=190 – might have to investigate this when I have some time.
I’m looking for a portable ADF scanner (e.g. Canon imageFORMULA P-215 Scan-tini Personal Document Scanner, or HP Scanjet Pro 3000 s2 Sheet-feed Scanner) that would be Ubuntu compatible.
Any recommendations from the research that you did?
I’m not finding the portable ADF scanners listed in http://www.sane-project.org/sane-mfgs.html
Thank you!
@Ivan – sorry – I did not look at portable devices at all. In fact I needed a fairly solid stationary option and the Fujitsu was the most cost-efficient.
Hey there,
Thank you for the nice instructions, but I’ve been stuck for hours on the I/O error you mentioned. Can you help me with that? What do you mean by using VirtualBox now?
@Marlon: I ended up switching from the notebook (which I used to test all of this) to a desktop (using a VirtualBox VM on the desktop host). So it was the switch from notebook to desktop that fixed the issue (rather than VirtualBox – that was a bit misleading).
It worked like a charm on Ubuntu 14.04 LTS, thanks a lot! I had been looking for a solution for some time! Two things though:
1) The same “scanbuttond” package file (scanbuttond_0.2.3.cvs20090713-14_i386.deb) is now available in the repositories, probably after adding the cited ppa:rolfbensch/sane-git, so there’s no need to download it from the pkgs.org website; just type “sudo apt-get install scanbuttond”.
2) The actual button on the scanner does nothing when pressed, so I’m not sure what the purpose of the “scanbuttond” software is; it is probably not needed anyway if you don’t mind missing this functionality. If its purpose is just to make the physical button work, then it doesn’t, at least in my case. I scanned through Easyscan, XSane and gscan2pdf, and all worked perfectly.
Ah, by the way, I got it running on a Panasonic Let’s note laptop and there were no USB power-saving issues there.
I would recommend doing the “Scanner config” and “Permissions” sections of the article first and checking whether it works; if it doesn’t, go through “Install dependencies” with the PPA and check again. Finally, I would install and configure scanbuttond.
Thank you, very useful.