A Technology Blog About Code Development, Architecture, Operating System, Hardware, Tips and Tutorials for Developers.

Saturday, May 15, 2010

INDIC LANGUAGE PDF GENERATION - JAVA

3:42:00 PM Posted by Satish Kumar, 5 comments

Recently I was working on a task to create PDF documents in languages other than English. It worked quite well for international languages, but when it came to Indic languages it did not work properly. I tried a lot of fonts, and it did not work even for Hindi. I posted a question to iText and learned that shaping for Indic languages is not yet implemented and is not going to be a weekend effort to finish; some Indian developers need to contribute source code for that. So I started looking at other alternatives, and in the meantime I kept bringing the problem up with my friends.

I worked for two years at Analytica India Pvt. Ltd. There I used an Ubuntu workstation (I am a Java developer and prefer working on Ubuntu to Windows), and I used to export my .odt files to PDF using OpenOffice. One of my old colleagues told me that he had created a Hindi .odt file and exported it to PDF, and that it worked perfectly for him. So I decided to spend some time on that approach.

While googling for the OpenOffice SDK, I came across the blog of Mr. Raman, who worked with CDAC to create Indic language Jasper reports using OpenOffice. I saw a ray of hope, and after some discussions with Mr. Raman I finally started working with the OpenOffice SDK.
I played around with the API for two or three days and was finally able to generate PDFs in some of the Indian languages. Some languages, such as Telugu, Assamese, Oriya and Malayalam, did not work, but some Indian contributors are working towards that. I hope those missing languages will be available soon.

Next I will walk through how I used the OpenOffice SDK to create PDFs.

Environment set up:

1. Download the OpenOffice engine and install it on the computer.
2. Download the OpenOffice SDK. The SDK contains a lot of examples for developer reference.
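
Before a Java program can drive OpenOffice through the SDK, the office usually has to be running and listening on a socket. The following is only a sketch of how the start-up command and the matching UNO connection URL could be assembled; it is not taken from my workspace. The class name OfficeLauncher, the "soffice" path, the host "localhost" and the port 8100 are assumptions for illustration. The single-dash options are the spelling used by OpenOffice 2.x/3.x; newer office releases may expect double dashes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the pieces needed to reach a listening OpenOffice.
public class OfficeLauncher {

    // Command line that starts OpenOffice without a GUI and makes it
    // accept UNO connections on the given host and port.
    static List<String> listenCommand(String sofficePath, String host, int port) {
        List<String> cmd = new ArrayList<>();
        cmd.add(sofficePath);   // assumed to be on the PATH or a full path
        cmd.add("-headless");   // no user interface
        cmd.add("-nologo");     // no splash screen
        cmd.add("-norestore");  // skip document recovery dialogs
        cmd.add("-accept=socket,host=" + host + ",port=" + port + ";urp;");
        return cmd;
    }

    // UNO URL that a client hands to the SDK's URL resolver to obtain
    // the remote component context of that listener.
    static String unoUrl(String host, int port) {
        return "uno:socket,host=" + host + ",port=" + port
                + ";urp;StarOffice.ComponentContext";
    }

    public static void main(String[] args) {
        // Assumed values; adjust to the local installation.
        List<String> cmd = listenCommand("soffice", "localhost", 8100);
        System.out.println(String.join(" ", cmd));
        System.out.println(unoUrl("localhost", 8100));
        // To actually start the office: new ProcessBuilder(cmd).start();
    }
}
```

The port in the command and the port in the URL must be the same, which is why both are derived from one pair of parameters here.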

Notes:

1. I used some of the classes that Mr. Raman had already written for Jasper reports, and modified them to fit my requirements.
2. I used five jars from OpenOffice. I have shared my Eclipse workspace. I did this with OpenOffice 2.4.0 and JDK 1.4; you had better try with the newer OpenOffice 3.x.x and JDK 6.

Work Flow:

1. I took Indic language text from some web sites and stored it in a database table.
2. I first created the .odt stream and wrote the content to it, and then exported all of that to .pdf.
3. My whole Eclipse workspace is shared. You may need to modify it to make it work.
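
As noted in the comments below, the program can hang while connecting to OpenOffice when the chosen port is already in use by another process. A small check like the one sketched here could be run before picking a port. It is an illustration rather than part of the shared workspace: the class name PortCheck and the range 8100 to 8110 are assumptions, and it only probes the loopback address.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical helper: probes whether a local TCP port can still be bound.
public class PortCheck {

    // Returns true if nothing is bound to the port on the loopback address.
    static boolean isFree(int port) {
        try {
            ServerSocket probe =
                    new ServerSocket(port, 1, InetAddress.getLoopbackAddress());
            probe.close(); // release it again right away
            return true;
        } catch (IOException e) {
            return false;  // bind failed: the port is taken or not permitted
        }
    }

    // Returns the first free port in [from, to], or -1 if none is free.
    static int firstFreePort(int from, int to) {
        for (int port = from; port <= to; port++) {
            if (isFree(port)) {
                return port;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Assumed range; use whatever ports are allowed on the machine.
        System.out.println("First free port: " + firstFreePort(8100, 8110));
    }
}
```

Another process could still grab the port between this check and the moment OpenOffice starts listening, so the result is a hint, not a guarantee.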

Source Code:

You can also pull the source code from GitHub.

5 comments:

  1. Will surely try this, hope it'll be really helpful stuff for the techies who are working on Indic language PDF creation :)

    Thanks,
    Barkha

  2. I used JDK 1.4 and OpenOffice 2.4.0.
    In testing I found that I am able to create bold Indic text in the ".odt" file, but when I export it to ".pdf", the boldness of the Indic letters is ignored. For English it works fine. I think it is a bug in the PDF exporter in OpenOffice 2.4.0.

    I confirmed the same manually in OpenOffice Writer (2.4.0) by exporting the ".odt" to ".pdf".

    I will soon check this with JDK 1.6 and the latest OpenOffice 3.x.x.

  3. When I tested the same on Solaris/Unix/Linux, the program got stuck while connecting to OpenOffice. After tracking the processes and ports, I found that the port was already in use by another process. I changed the port in my program and it worked.

    To see the processes of a user
    ps -ef | grep satish

    To see the process tree for a user
    ptree satish

    To see if a port is already in use (-a also lists listening sockets, -n keeps port numbers numeric)
    netstat -an | grep 1280

  4. On Solaris 10 the PDFs came out with junk characters. I installed the relevant fonts into OpenOffice using the GUI tool "/opt/openoffice.org2.4/program/spadmin", choosing the same font I was using on Windows, and the problem was resolved.

  5. thanks for valuable sharing

    Nilesh Patil
