Having lived in China for almost 3 years now, I can recognize a good number of characters. I can also type in Chinese on a computer, but typing is much easier than reading since it doesn't require you to actually memorize the characters: you just type in pinyin (phonetics). That is not enough to understand a full, complex text.
After months of research, I've finally figured out how to recognize Chinese characters in a picture automatically, so that I can copy and paste the text into a translator such as Google Translate. The solution was right under my nose all along: Microsoft Office 2007. I had no idea Office 2007 came with such a feature. I had always known about expensive solutions such as OmniPage Pro, but I refused to buy one given the price and how rarely I would need it.
OCR (Optical Character Recognition) is the process of digitally analyzing an image to extract the characters or text it contains, so that the text can be edited on a computer.
The solution I describe is for PC/Windows users only. If you're interested in doing the same in real time, and much more simply, directly from your iPhone, I recommend the excellent Pleco. I've tried out the demo version and am seriously considering buying the full version.
TLDR: there are basically four steps involved in the process:
- Take a photo with your digital camera, or scan your document with your scanner
- (Optional) Convert your photo to a TIF image
- Open the TIF image with Microsoft Office Document Imaging and run the OCR
- Export the text to Microsoft Word then translate it
This tutorial requires the following software to be installed on your computer:
- Microsoft Office 2007 (this reportedly also works with Microsoft Office 2003), installed with the "Document Imaging" and "Picture Manager" components, both of which you can select during setup. If you don't have them installed, modify your Office setup to include them.
- Chinese language support for Microsoft Office, which isn't exactly easy to come by. I'm lucky enough to work in China, where we have licenses for the Chinese version of Microsoft Office, so I've had no trouble. As an alternative, you can get the Microsoft Office 2007 Multi-Language Pack and install Chinese support, plus a bunch of other languages if you're interested.
Step 1: Taking a picture or scanning a document
I don't need to explain how to take pictures with a digital camera, nor how to save pictures from a website with your favorite web browser. If you're going to use a scanner, though, which will probably give you the best results, you can skip the next step if your scanner supports saving as TIF/TIFF.
To illustrate this tutorial I've chosen to work with a photo taken with my iPhone 4S. It's a page from a random advertising booklet I found at a friend's place. I tried to get a clear shot of the text so that the OCR would be as accurate as possible.
Step 2: Converting to TIF/TIFF image
Unfortunately, and I must admit I find this quite odd myself, the tool we're going to use for the OCR doesn't support any format other than TIF. So if your picture was saved in another format (typically JPG, like mine), you'll have to convert it. There are plenty of ways to do so.
Since you have Microsoft Office installed on your computer, you should already have everything you need. Right-click your JPG image and "Open with..." - "Microsoft Office Picture Manager". Go to "File" - "Export..." and select the TIF format.
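If you have a lot of photos to convert, this step can also be scripted. Here's a minimal Python sketch, which is not what I used above: it assumes the third-party Pillow library is installed (`pip install Pillow`), and `is_tiff`, `convert_to_tiff` and `ensure_tiff` are just illustrative names I made up. It checks the first four bytes of each file for the TIFF signature, so files your scanner already saved as TIF are left alone, and only converts the rest.

```python
from pathlib import Path

# A TIFF file starts with a byte-order mark ("II" = little-endian,
# "MM" = big-endian) followed by the number 42 in that byte order.
TIFF_SIGNATURES = (b"II*\x00", b"MM\x00*")


def is_tiff(path):
    """Return True if the file at `path` begins with a TIFF signature."""
    with open(path, "rb") as f:
        return f.read(4) in TIFF_SIGNATURES


def convert_to_tiff(src, dst=None):
    """Convert an image (e.g. a JPG photo) to TIFF and return the new path.

    Assumes Pillow is installed. It is imported here so that is_tiff()
    keeps working on machines without it.
    """
    from PIL import Image, ImageOps

    src = Path(src)
    dst = Path(dst) if dst is not None else src.with_suffix(".tif")
    with Image.open(src) as img:
        # Phone cameras often store rotation as an EXIF tag; apply it so
        # the text isn't saved sideways.
        upright = ImageOps.exif_transpose(img)
        upright.save(dst, format="TIFF")
    return dst


def ensure_tiff(path):
    """Return a TIFF version of `path`, converting only when needed."""
    return Path(path) if is_tiff(path) else convert_to_tiff(path)
```

For example, `ensure_tiff("photo.jpg")` should write `photo.tif` next to the original and return its path, which you can then open in Document Imaging. Checking the signature rather than the file extension means a file with a misleading extension won't slip through.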
Step 3: Performing the OCR
The actual OCR (Optical Character Recognition) is performed by Microsoft Office Document Imaging. Open the tool, which should be located in your Start Menu under Microsoft Office / Microsoft Office Tools / Microsoft Office Document Imaging.
Before performing the OCR you need to specify the document language. To do so, open the "Tools" menu, go to "Options" - "OCR" and select "Chinese" in the drop-down list. The next steps are simple...
- Open the file
- Click the OCR button
- Click the "Send Text to Word" button
- Press OK and you're done!
The text should be transcribed more or less faithfully, depending on the quality of the original picture. Now on to translating it into something actually legible to the average Westerner :-)
Step 4: Translation to English or other languages
There are tons of translators out there, but I'm going to stick with Microsoft Office since that's what we've been using from the start. Yes, you can translate Chinese directly from within Word 2007 by following these simple steps:
- Select the text you want translated
- Right-click the selected text and in the menu, go to "Translate" - "Translate..."
- Select the input and output languages and click the little green arrow
- You'll be taken to Microsoft's online translation service, which in my case gave a surprisingly accurate translation of the original text.
(Before and after screenshots: the original Chinese text and its English translation.)
Note: the original document IS about shady management techniques. That's all I had.
Voilà, you've successfully translated a document written in another language, starting from nothing more than a photo and Microsoft Office. Ah, isn't technology wonderful?
11 comments:
Wow, translating documents just got more exciting! I wish we could also use OCR for other languages like Arabic, Russian, Japanese, or even Korean, in order to understand their cultures and their words.
Ruby Badcoe
Wow, wonderful. Finally, I found a way to do this. It will be very useful for me when I have large amounts of text.
You deserve it; you could write a very good book.
Keep sharing :)
Nice, it's useful.
OK, I installed everything necessary, but when I go to Options there is no Chinese language. How can I install it?
Forgot to mention: I'm running Windows 7 Ultimate SP1 (64-bit). I already tried installing a few multi-language packs for Office 2007 (SP1, SP2, SP3 for XP, because I couldn't find one for Ultimate) and still can't find Chinese in Options, so I guess I went wrong somewhere. I'd really like to install this language for my kid: he's playing a Chinese game and bothers me all day to help him translate this or that (I have a Chinese friend, but I can't keep bothering him about a game).
I'd appreciate it if you could lend me a hand with this issue.
PS: I was the Anonymous commenter before this post; this time I've written my e-mail address as my name.
I actually enjoyed reading through this post. Many thanks.
Journal support
For quick OCR and translation, I do it all online.
1. I first use this free service at
http://www.sciweavers.org/free-online-ocr
2. Then I use Google Translate
http://translate.google.com/
Wow, nice! Thanks Alec, this looks like a neat solution. And I bet it works better than the one I am talking about in this post... I'll give this a try and update my post when I can.
The importance of a technical translation being accurate and efficient indeed cannot be overstated. Especially in the ever faster-moving world of globalized business, successful information and technology transfer within multinational businesses can make the difference between winning and losing. I would say translators really play a big role in our society. I can't see machines taking over the jobs of human translators in the near future, as they have done with so many other professions.
Thanks Clement. At the start of the page you talked about the Chinese language, which most of us find very difficult to learn and write. The characters look like geometric shapes to us, but that was a wrong idea about this beautiful language. Performing OCR has become easy now that you've shown a real route past the point where most of us got stuck. Thanks once again.