PDFPen Gets Better OCR


PDFPen and PDFPen Pro, my favorite OS X PDF applications, were recently updated to incorporate the OmniPage OCR engine. I've been testing it for the last few days and there is a noticeable improvement in both accuracy and speed. The justification for paying Adobe a king's ransom for Acrobat just gets smaller and smaller as Smile On My Mac continues to improve PDFPen.

PDFpen OCR Folder Action Script

As discussed on Mac Power Users episode 3, "Going Paperless," the nice people at Smile On My Mac put together an AppleScript that, when combined with a folder action, gives you a way to automatically OCR documents using PDFpen or PDFpenPro. So here is the promised walkthrough:

What you'll need:

1. Some scanned PDF images;
2. PDFpen or PDFpenPro (See my review here);
3. A bit of patience.

Step 1 - Load up the Script Editor

[Screenshot: Script Editor]

This little application allows you to create and save AppleScripts.

Step 2 - Copy in the script below

on adding folder items to this_folder after receiving added_items
	try
		repeat with i from 1 to number of items in added_items
			set this_item to item i of added_items
			tell application "PDFpen"
				open this_item
				set theDoc to document 1
				repeat with aPage in pages of theDoc
					ocr aPage
					-- Looks like we need to modify PDFpen so that we can detect when OCR is done; for now use 15 seconds
					delay 15
				end repeat
				save theDoc
				close theDoc
			end tell
		end repeat
	on error errText
		display dialog "Error: " & errText
	end try
end adding folder items to


Note - if you use PDFpenPro instead of PDFpen, you'll need to open the script and edit the command that reads tell application "PDFpen" so that it reads tell application "PDFpenPro".

Note 2 - WordPress seems to have converted the double dash before the comment into an em-dash and the quotes into smart quotes. Although I fixed it in the WordPress code, it still reverts to "fixing" things when I publish, so you'll have to correct those in your editor. Sorry. If anyone knows a better way to post AppleScript via WordPress, please drop me a note.

Step 3 - Save the script

You need to save it to a specific directory:

HD/Library/Scripts/Folder Action Scripts/

I named mine "PDFpen Scriptacular."

Step 4 - Create a folder

Save the folder wherever is convenient, perhaps in your Documents folder or (for you anarchists) on the desktop. By the way, did you know that Command-Shift-N gets you a new folder? I named mine "OCR Drop."

Step 5 - Enable folder actions

Secondary click on the folder and choose Enable Folder Actions under the "More" item.
[Screenshot: Enable Folder Actions]
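
If you would rather flip this switch from Script Editor than from the contextual menu, System Events exposes a folder actions enabled property. This is only a minimal sketch, assuming the System Events dictionary on your version of OS X includes the Folder Actions suite. It turns Folder Actions on globally and does not attach any script to a folder, so the remaining steps still apply.

```applescript
-- Sketch only: enable Folder Actions system-wide from a script.
-- Assumes System Events on your system provides the Folder Actions suite.
tell application "System Events"
	-- "folder actions enabled" is a boolean property of System Events itself
	set folder actions enabled to true
end tell
```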

Step 6 - Configure Folder Action

Right-clicking the folder a second time gives you a new option, Configure Folder Action. Click it.
[Screenshot: Configure Folder Actions]

Step 7 - Pick Your Folder

On the menu that appears, hit the plus (+) sign under the "Folders with Actions" box.
[Screenshot: picking the folder]

Select your folder, wherever you located it. It will then ask you to pick a script. Pick "PDFpen Scriptacular.scpt".
[Screenshot: picking the script]

It should now look like this.
[Screenshot: Script menu]

Close the window and you are done.

Now just drag a few PDFs in and let the script go to work. Copy the OCR'd PDFs where they belong and you are done. There are a few additional points:

1. There is no AppleScript command in PDFpen that reports when it has finished an OCR pass, so instead the script waits 15 seconds after each page. The PDFpen wizards report they are going to try to fix this in a future release.

2. While this script generally works, it sometimes gave me an error when I overloaded it. Be patient.
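
Both points suggest a few small tweaks you could try. What follows is only a sketch of one possible variation, not Smile On My Mac's script: it pulls the wait into a property so you can lengthen it for dense scans, skips anything that does not look like a PDF, and wraps each file in its own try block so one failure does not stop the rest of the batch. The names ocrWaitSeconds and looksLikePDF are made up for this example. The PDFpen commands are the same ones used in the script from Step 2, and PDFpenPro users would swap in that application name.

```applescript
-- Sketch only: a variation on the folder action script from Step 2.
-- ocrWaitSeconds and looksLikePDF are hypothetical names for this example.
property ocrWaitSeconds : 15 -- per-page wait in seconds; raise it for dense scans

-- Returns true when the path ends in ".pdf".
-- AppleScript text comparisons ignore case by default, so ".PDF" also matches.
on looksLikePDF(posixPath)
	return posixPath ends with ".pdf"
end looksLikePDF

on adding folder items to this_folder after receiving added_items
	repeat with i from 1 to number of items in added_items
		set this_item to item i of added_items
		if looksLikePDF(POSIX path of this_item) then
			try
				tell application "PDFpen"
					open this_item
					set theDoc to document 1
					repeat with aPage in pages of theDoc
						ocr aPage
						-- Still no way to detect when OCR is done, so wait
						delay ocrWaitSeconds
					end repeat
					save theDoc
					close theDoc
				end tell
			on error errText
				-- Report the failure for this file, then move on to the next one
				display dialog "Error: " & errText
			end try
		end if
	end repeat
end adding folder items to
```

Save it the same way as in Step 3. If pages come back without OCR text, try a larger value for ocrWaitSeconds before anything else.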

I want to give my personal thanks to the gang at Smile On My Mac, particularly Greg, who put this script together for Mac Power Users just because we asked.

PDFPen Review


My day job requires me to spend a great deal of time working with PDF documents. For a long time, that meant I needed to have a license for Adobe Acrobat on all of my computers. This is no small expense on the Mac platform, since Adobe sells only Adobe Acrobat Professional for the Mac, which will cost you $450. Fortunately, there are other options. Apple's own Preview application does a pretty good job of displaying PDF documents and allowing basic editing. For some people, this will be plenty. If you need something more robust, however, Smile On My Mac's PDFPen may be just what you're looking for.

The tools in PDFPen are much more robust than those offered in Preview. With a PDF document open in PDFPen, you can add text, images, and signatures. You can also highlight a text field and open it as an editable text block. So when you receive a PDF document with mistakes or typos, you can easily fix it yourself. Additionally, PDFPen has a variety of useful editing tools including highlighting, underscoring, and strikethrough. It even includes a library of common proofreading marks, allowing you to simply drag and drop editing marks onto PDF documents before sending them back for processing or correction. This isn't yet as efficient as simply using a red pen, but when working electronically with someone in another state, you really can't beat it. You can also add notes and comments just as in Adobe Acrobat.

[Screenshot: pdfpen-markup]

Another nice feature in PDFPen is the ability to use your digital signature. You can take a scanned copy of your signature and literally drop it into a PDF document before returning it to the sender. This provides a truly paperless option for entering contracts or other transactions. This works hand in glove with another PDFPen feature, the library. The library can hold frequently used images and information, including your signature. If you work with PDF forms, PDFPen will also accommodate you. It allows you to fill out and save PDF forms easily. While it is now possible to delete and reorder pages using Preview, PDFPen's implementation of this feature is much easier to use.

One of the improvements in the latest release, version 5, is the inclusion of optical character recognition. Often PDF documents, when provided to you, have not already had OCR performed. PDFPen can now perform its own optical character recognition on your document, either automatically or on request. In my tests, the performance was not significantly better or worse than that obtained with Adobe Acrobat. As with all OCR, the results depend on the original source document. If you have something typed, the OCR will be much better than if something is handwritten.

For $49.95, I believe PDFPen to be an excellent value. If you need to create your own PDF forms, you can upgrade to PDFPen Pro for $99.95. Another feature added at the Pro level is the inclusion of a table of contents. This works with the "bookmarks" feature of Adobe Acrobat. I often send PDFPen bookmarked documents to my PC brethren, who are none the wiser.

If you are currently using Apple's Preview application without feeling its limits, you're probably okay in terms of PDF manipulation. However, if you are running into its shortcomings or wish you had some of the Adobe Acrobat features without the Adobe Acrobat price, you should take a serious look at PDFPen and PDFPen Pro. You can find them at Smile On My Mac's website.

You can listen to this review on the Typical Mac User Podcast #161.