Saturday, 23 May 2009

PDFpen OCR Folder Action Script

As discussed on Mac Power Users episode 3, "Going Paperless," the nice people at Smile On My Mac put together an AppleScript that, combined with a folder action, lets you automatically OCR documents using PDFpen or PDFpenPro. Here is the promised walkthrough:

What you'll need:

1. Some scanned PDF images;
2. PDFpen or PDFpenPro (See my review here);
3. A bit of patience.

Step 1 - Load up the Script Editor


[Screenshot: Script Editor]

This little application allows you to create and save AppleScripts.

Step 2 - Paste in the script below


on adding folder items to this_folder after receiving added_items
    try
        repeat with i from 1 to number of items in added_items
            set this_item to item i of added_items
            -- Change "PDFpenPro" to "PDFpen" if that is the version you own
            tell application "PDFpenPro"
                open this_item
                set theDoc to document 1
                -- OCR each page of the newly opened document in turn
                repeat with aPage in pages of theDoc
                    ocr aPage
                    -- Looks like we need to modify PDFpen so that we can detect when OCR is done; for now use 15 seconds
                    delay 15
                end repeat
                -- Save the OCR'd document in place, then close it
                save theDoc
                close theDoc
            end tell
        end repeat
    on error errText
        display dialog "Error: " & errText
    end try
end adding folder items to

-------------

Note - the script as written targets PDFpenPro. If you use PDFpen instead, edit the line that reads tell application "PDFpenPro" so that it reads tell application "PDFpen".

Note 2 - WordPress seems to have converted the double dash before the comment into an em-dash and the straight quotes into smart quotes. Although I fixed it in the WordPress code, it keeps reverting to "fixing" things when I publish, so you may have to correct those in your editor. Sorry. If anyone knows a better way to post AppleScript via WordPress, please drop me a note.

Step 3 - Save the script


You need to save it to a specific directory:

HD/Library/Scripts/Folder Action Scripts/

I named mine "PDFpen Scriptacular."
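If you're not sure where that folder lives on your machine, AppleScript can tell you. This is a small sketch of my own, not part of Smile's script. It relies on the Standard Additions `path to` command with its `Folder Action scripts` constant, and it assumes you want the machine-wide (local domain) folder rather than the copy inside your home Library. If the folder doesn't exist yet, `path to` may try to create it, which can require admin rights.

```applescript
-- Sketch: locate the machine-wide Folder Action Scripts folder
-- and open it in the Finder so you can save your script there.
-- Assumption: "from local domain" maps to /Library/Scripts/Folder Action Scripts/;
-- use "from user domain" for the copy inside your home folder instead.
set faFolder to path to Folder Action scripts from local domain

tell application "Finder"
    activate
    open faFolder
end tell

-- Show the POSIX path in Script Editor's result pane
POSIX path of faFolder
```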

Step 4 - Create a folder


Create the folder wherever is convenient, perhaps in your Documents folder or (for you anarchists) on the desktop. By the way, did you know that Command-Shift-N creates a new folder in the Finder? I named mine "OCR Drop."
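Prefer to script it? This hypothetical snippet asks the Finder to create the same "OCR Drop" folder inside your Documents folder, and skips that step if the folder already exists. The name and location are just my choices; change them to suit.

```applescript
-- Sketch: create an "OCR Drop" folder inside ~/Documents if it isn't there yet.
set dropName to "OCR Drop" -- hypothetical name; use whatever you like
set docsFolder to path to documents folder

tell application "Finder"
    -- Build an HFS path such as "Macintosh HD:Users:you:Documents:OCR Drop"
    if not (exists folder ((docsFolder as text) & dropName)) then
        make new folder at docsFolder with properties {name:dropName}
    end if
end tell
```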

Step 5 - Enable folder actions


Secondary-click the folder and choose Enable Folder Actions from the "More" submenu.
[Screenshot: Enable Folder Actions]

Step 6 - Configure Folder Action


Right-clicking the folder a second time gives you a new option, Configure Folder Actions. Click it.
[Screenshot: Configure Folder Actions]

Step 7 - Pick Your Folder


In the Folder Actions Setup window that appears, click the plus (+) button under the "Folders with Actions" box.
[Screenshot: picking the folder]

Select your folder, wherever you put it. You will then be asked to pick a script; choose PDFpen Scriptacular.scpt.
[Screenshot: picking the script]

It should now look like this.
[Screenshot: Script menu]

Close the window and you are done.
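If you'd rather skip the clicking in Steps 5 through 7, System Events can do the same job. Treat this as a rough sketch: it assumes the folder from Step 4 is ~/Documents/OCR Drop and that the script from Step 3 is saved as PDFpen Scriptacular.scpt in the machine-wide Folder Action Scripts folder. It relies on the `folder actions enabled` property and the `attach action to` command, which I'd double-check in the System Events dictionary (File > Open Dictionary in Script Editor) on your version of Mac OS X.

```applescript
-- Sketch: turn on Folder Actions and attach the OCR script to the drop folder.
-- Assumed locations (adjust to match Steps 3 and 4):
set dropAlias to alias ((path to documents folder as text) & "OCR Drop:")
set scriptPath to (path to Folder Action scripts from local domain as text) & "PDFpen Scriptacular.scpt"

tell application "System Events"
    -- Equivalent to "Enable Folder Actions" in the Finder's contextual menu
    set folder actions enabled to true
    -- Equivalent to picking the folder and the script in Folder Actions Setup
    attach action to dropAlias using scriptPath
end tell
```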

Now just drag a few PDFs into the folder and let the script go to work. When it finishes, move the OCR'd PDFs wherever they belong. A few additional points:

1. PDFpen has no AppleScript command that reports when OCR is finished, so the script simply waits 15 seconds after each page. The PDFpen wizards report they are going to try to fix this in a future release.

2. While this script generally works, it sometimes gave me an error when I overloaded it. Be patient.
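Both points can be softened a little in the script itself. The variation below is my own sketch, not Smile's script. It uses exactly the same PDFpenPro commands as the original, but makes the 15-second wait a property you can tune, skips anything dropped in that isn't a PDF, and gives each file its own try block so one failed document doesn't stop the rest of the batch. The property name ocrDelaySeconds is my invention, and the idea that a longer delay helps on dense pages is a guess, not something PDFpen documents.

```applescript
-- Sketch: same PDFpenPro commands as the original script, with a tunable
-- delay, a PDF-only filter, and per-file error handling.
property ocrDelaySeconds : 15 -- hypothetical setting; raise it if OCR runs long

on adding folder items to this_folder after receiving added_items
    repeat with i from 1 to number of items in added_items
        set this_item to item i of added_items
        -- Ask the Finder for the file's extension (missing value for folders)
        tell application "Finder" to set ext to name extension of this_item
        if ext is "pdf" then
            try
                tell application "PDFpenPro"
                    open this_item
                    set theDoc to document 1
                    repeat with aPage in pages of theDoc
                        ocr aPage
                        delay ocrDelaySeconds
                    end repeat
                    save theDoc
                    close theDoc
                end tell
            on error errText
                -- Report the failing file, then carry on with the next one
                display dialog "Error OCRing " & (this_item as text) & ": " & errText
            end try
        end if
    end repeat
end adding folder items to
```

AppleScript's `is` comparison ignores case by default, so files ending in ".PDF" pass the filter too.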

I want to give my personal thanks to the gang at Smile On My Mac, particularly Greg, who put this script together for Mac Power Users just because we asked.

Reader Comments (8)

Very helpful. There was an article sometime last year in, I think, Macworld entitled something like "Going Paperless" that used script to control scan, OCR the product, and file the OCR on your hard drive. Acrobat was used to provide the OCR.

May 24, 2009 | Mike Harahan

WordPress seems to have converted the double dash before the comment into an em-dash and the quotes into smart quotes. You might want to edit the text to correct this so that the script works for those who don't know enough to fix that sort of little bug.

May 27, 2009 | Greg Mote

@Greg -

Thanks for catching that. I tried to fix it, but WordPress keeps reverting it, so I placed a note. If anyone knows a better way to post AppleScript to WordPress, please drop me a note.

May 28, 2009 | MacSparky

Hey Mr. Sparky

Nice one, now could you do the same with Adobe Acrobat :)

June 12, 2009 | Markus

Me again

The reason I ask about Acrobat is that Acrobat reduces the file size after OCR, which PDFpen doesn't do in the same pass.

June 12, 2009 | Markus

Markus - this script works well from DocumentSnap:

http://www.documentsnap.com/acrobat-applescript-for-scansnap-ocr/

August 17, 2009 | Rob

This script is really buggy. I keep getting these errors:

1st error: Error: PDFpen got an error: Connection is invalid.

2nd error: Error: PDFpen got an error: Can't get document 1. Invalid index.

(#2 pops up once I hit OK on error #1)

I thought the scripts Macworld published for Acrobat were buggy, but this one is just as bad. Ideas? I'll also post this over at MPUs. Thanks.

September 23, 2009 | Rob

Rob,

A new version of PDFpen has come out since this post was published, so the script needs to change. I'll be doing a new post with a new script soon.

September 23, 2009 | MacSparky
