Having lived in China for almost 3 years now, I can recognize a good number of characters. I can type in Chinese on the computer too, but writing that way is much easier than reading, since it doesn't require you to actually memorize the characters: you just type in pinyin (phonetics). That is not enough to understand a full, complex text.
After months of research I've finally figured out how to recognize Chinese characters automatically from a picture, so that I can copy and paste the text into a translator such as Google Translate. The solution was right under my nose all this time: Microsoft Office 2007. I had no idea that Office 2007 came with such a feature. I've always known of expensive solutions such as OmniPage Pro, but I refused to buy it considering its price and how little I would use it.
OCR, which stands for Optical Character Recognition, is the process of digitally analyzing an image to extract the characters or text it contains, so that the text can be manipulated on a computer.
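To make the principle a little more concrete, here is a toy sketch in Python. It is only an illustration and certainly not how Office does it: the 3x3 "glyphs" are made up for the example, and real OCR engines are far more sophisticated. The idea is simply to compare an unknown bitmap against known templates and keep the closest match.

```python
# Toy illustration of the OCR principle: compare an unknown glyph bitmap
# against known templates and pick the closest one.
# The 3x3 bitmaps are invented for this example ("#" = ink, "." = blank).

TEMPLATES = {
    "一": ("...",
           "###",
           "..."),
    "十": (".#.",
           "###",
           ".#."),
    "口": ("###",
           "#.#",
           "###"),
}


def distance(a, b):
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def recognize(glyph):
    """Return the template character whose bitmap is closest to the glyph."""
    return min(TEMPLATES, key=lambda ch: distance(TEMPLATES[ch], glyph))


# A slightly noisy cross shape: one extra pixel of "ink" in the top-left corner.
noisy = ("##.",
         "###",
         ".#.")
print(recognize(noisy))
```

The noisy sample differs from the cross-shaped template by a single pixel and from the other two by more, so the cross is the one that gets picked.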
The solution I describe is for PC/Windows users only. If you're interested in doing the same in real time, and in a much simpler way, directly from your iPhone, I recommend the excellent Pleco. I've tried out the demo version and am seriously considering purchasing the full version.
TL;DR: there are basically four steps involved in the process:
- Take a photo with your digital camera, or scan your document with your scanner
- (Optional) Convert your photo to a TIF image
- Open the TIF image with Microsoft Office Document Imaging, run the OCR
- Export the text to Microsoft Word then translate it
This tutorial requires the following software to be installed on your computer:
- Microsoft Office 2007 (though this supposedly works with Microsoft Office 2003 as well), installed with "Document Imaging" and "Picture Manager". Both are components that you can select during the setup process. If you don't have those two installed on your computer, modify your Office setup to include them.
- Chinese language support for Microsoft Office, which isn't exactly easy to come by. I'm lucky enough to work in China, where we have licenses for the Chinese version of Microsoft Office, so I've had no trouble. As an alternative, you can get the Microsoft Office 2007 Multi-Language Pack and install Chinese support, along with a bunch of other languages if you're interested.
To illustrate this tutorial I've chosen to work with a photo taken with my iPhone 4S. The text comes from a random advertising booklet I found at a friend's place. I tried to get a clear shot of the text so that the OCR works as accurately as possible.
Since you have Microsoft Office installed on your computer, you already have everything you need. Right-click your JPG image and choose "Open with..." - "Microsoft Office Picture Manager". Then go to "File" - "Export..." and select the TIF format.
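If you want to double-check that the export really produced a TIF file (and not just a renamed JPG), a few lines of Python can look at the first bytes of the file. This is only a minimal sketch and not part of the Office workflow; the function names and the file name in the usage note are my own.

```python
def image_kind(header: bytes) -> str:
    """Guess the image format from the first bytes of a file."""
    # TIFF files start with "II*\0" (little-endian) or "MM\0*" (big-endian).
    if header[:4] in (b"II*\x00", b"MM\x00*"):
        return "tiff"
    # JPEG files start with the bytes FF D8 FF.
    if header[:3] == b"\xff\xd8\xff":
        return "jpeg"
    return "unknown"


def file_kind(path: str) -> str:
    """Read the first four bytes of the file at `path` and classify them."""
    with open(path, "rb") as f:
        return image_kind(f.read(4))
```

For example, `file_kind("booklet.tif")` (a hypothetical file name) should return `"tiff"` for a properly exported image.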
Before performing the OCR you need to specify the document language. To do so, open the "Tools" menu, go to "Options" - "OCR" and select "Chinese" in the drop-down list. The next steps are simple...
- Select the text you want translated
- Right-click the selected text and in the menu, go to "Translate" - "Translate..."
- Select the input and output languages and click the little green arrow
- You'll be taken to Microsoft's online translation service, which gave me a surprisingly accurate translation of my original text.
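Before sending the text to a translator, it can be worth checking that the OCR output is actually Chinese, since a wrong OCR language setting may give you mostly Latin gibberish. Here is a rough sketch in Python, assuming you've pasted the recognized text into a string. The function name is my own, and it only counts characters in the basic CJK Unified Ideographs range (U+4E00 to U+9FFF), so rarer characters and punctuation are not counted as Chinese.

```python
def chinese_ratio(text: str) -> float:
    """Share of non-whitespace characters in the CJK Unified Ideographs block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars)


# Two Chinese characters followed by five Latin letters.
sample = "你好 world"
print(chinese_ratio(sample))
```

A ratio close to 1 suggests the recognition went well; a ratio close to 0 is a hint to go back and check the OCR language option.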