A Technology Blog About Code Development, Architecture, Operating System, Hardware, Tips and Tutorials for Developers.

Wednesday, November 14, 2012

XML Parsing in Java - StAX Parser

11:57:00 PM Posted by Satish Kumar
The Streaming API for XML (StAX) is an API for reading and writing XML documents. StAX uses a pull-parsing model: the application controls parsing by pulling (requesting) events from the parser, rather than having the parser push events to it.
The DOM interface is the easiest XML parser to understand and use. It parses the entire XML document and loads it into memory, then models it as an object tree for easy traversal and manipulation. This makes the DOM parser slow and memory-hungry when the document contains a lot of data. SAX is faster than DOM and uses less memory, but it pushes events to the application; StAX streams the document in the same way while leaving the application in control.
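
To make the pull model concrete, here is a minimal sketch in which the application stops asking for events as soon as it has what it needs. The class name PullDemo, the method firstName, and the inline XML string are assumptions made for illustration; they are not part of the tutorial project.

```java
import java.io.StringReader;

import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class PullDemo {

    // Hypothetical helper: returns the text of the first <name> element,
    // or null if the document has none.
    static String firstName(String xml) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                // The application asks for the next event; nothing is pushed to it.
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()
                        && "name".equals(event.asStartElement().getName().getLocalPart())) {
                    // Stop here: no further events are requested from the parser.
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<products>"
                + "<product><name>R15</name><make>Yamaha</make></product>"
                + "<product><name>Duke</name><make>KTM</make></product>"
                + "</products>";
        System.out.println(firstName(xml)); // prints R15
    }
}
```

With a push parser such as SAX, the handler would keep receiving callbacks for the rest of the document unless it aborted parsing, for example by throwing an exception.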

 JAVA AND XML
                   XML Parsing using Java
                         1. DOM XML Parser
                         2. SAX XML Parser
                         3. StAX XML Parser
                         4. JAXB XML Parser

The core StAX API falls into two categories:
  • Cursor API
  • Event Iterator API
Applications can use either of these two APIs for parsing XML documents. This tutorial focuses on the event iterator API, as I consider it more convenient to use.
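
For comparison, here is a minimal sketch of the cursor API, which the rest of this tutorial does not use. It relies on the standard XMLStreamReader interface; the class name CursorDemo, the method names, and the inline XML are assumptions for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class CursorDemo {

    // Hypothetical helper: collects the text of every <name> element.
    static List<String> names(String xml) throws XMLStreamException {
        List<String> result = new ArrayList<String>();
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                // next() moves the cursor and returns an int event type;
                // no event object is created.
                int eventType = reader.next();
                if (eventType == XMLStreamConstants.START_ELEMENT
                        && "name".equals(reader.getLocalName())) {
                    result.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return result;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<products>"
                + "<product><name>R15</name></product>"
                + "<product><name>Duke</name></product>"
                + "</products>";
        System.out.println(names(xml)); // prints [R15, Duke]
    }
}
```

The cursor API reads state directly from the reader at its current position, while the event iterator API hands back XMLEvent objects that can be stored or passed around. That difference is the main trade-off between the two.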

The event iterator API has two main interfaces: XMLEventReader for parsing XML and XMLEventWriter for generating XML.
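
Since the remainder of this post only covers reading, here is a rough sketch of the writing side with XMLEventWriter. The class name StaxWriterDemo and the helper addTextElement are hypothetical, and the output goes to an in-memory string rather than a file.

```java
import java.io.StringWriter;

import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

public class StaxWriterDemo {

    // Hypothetical helper: writes <tag>text</tag> as three events.
    static void addTextElement(XMLEventWriter writer, XMLEventFactory events,
                               String tag, String text) throws XMLStreamException {
        writer.add(events.createStartElement("", "", tag));
        writer.add(events.createCharacters(text));
        writer.add(events.createEndElement("", "", tag));
    }

    // Builds a single <product> element and returns it as a string.
    static String writeProduct(String name, String make) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        // Events are created by a factory and then added to the writer.
        XMLEventFactory events = XMLEventFactory.newInstance();

        writer.add(events.createStartElement("", "", "product"));
        addTextElement(writer, events, "name", name);
        addTextElement(writer, events, "make", make);
        writer.add(events.createEndElement("", "", "product"));

        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(writeProduct("R15", "Yamaha"));
    }
}
```

The empty strings passed to createStartElement and createEndElement are the prefix and namespace URI, which this sample document does not use.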

I used the following tools for this tutorial.

1. JDK 7
2. Maven2

Let's create a Java project using Maven:

mvn archetype:generate -DgroupId=com.techiekernel -DartifactId=ParserDemo -Dpackage=com.techiekernel

Now create a sample XML document named product.xml to parse:

<?xml version="1.0" encoding="UTF-8"?>
<products>
 <product>
  <name>R15</name>
  <make>Yamaha</make>
  <engine-cc>150</engine-cc>
  <type>sports</type>
 </product>
 <product>
  <name>Duke</name>
  <make>KTM</make>
  <engine-cc>200</engine-cc>
  <type>Street</type>
 </product>
 <product>
  <name>GS650GS Sertao</name>
  <make>BMW</make>
  <engine-cc>650</engine-cc>
  <type>Enduro</type>
 </product>
 <product>
  <name>Multistada</name>
  <make>Ducati</make>
  <engine-cc>1210</engine-cc>
  <type>Touring</type>
 </product>
</products>

The program to parse the XML:

package com.techiekernel.parser.stax;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class StaxReader {
  public static void main(String[] args) {
    // First create a new XMLInputFactory
    XMLInputFactory inputFactory = XMLInputFactory.newInstance();
    // try-with-resources (JDK 7) closes the file even if parsing fails
    try (InputStream in = new FileInputStream("product.xml")) {
      // Set up a new eventReader
      XMLEventReader eventReader = inputFactory.createXMLEventReader(in);

      while (eventReader.hasNext()) {
        XMLEvent event = eventReader.nextEvent();

        if (event.isStartElement()) {
          // A string switch compares by value (equals), unlike ==,
          // which only compares object references
          String tag = event.asStartElement().getName().getLocalPart();
          switch (tag) {
            case "product":
              System.out.println("Start Product.");
              break;
            case "name":
              // getElementText() returns the element's text and
              // consumes its end tag
              System.out.println("Name : " + eventReader.getElementText());
              break;
            case "make":
              System.out.println("Make : " + eventReader.getElementText());
              break;
            case "engine-cc":
              System.out.println("Engine : " + eventReader.getElementText());
              break;
            case "type":
              System.out.println("Type : " + eventReader.getElementText());
              break;
            default:
              // Other elements, such as the <products> root, are ignored
              break;
          }
        } else if (event.isEndElement()) {
          // The end of a product element closes the current record
          if ("product".equals(event.asEndElement().getName().getLocalPart())) {
            System.out.println("End Product.");
          }
        }
      }
    } catch (IOException | XMLStreamException e) {
      e.printStackTrace();
    }
  }
}
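
The program above only prints each value. A common next step is to collect the values into objects. The sketch below shows one way to do that, assuming a hypothetical Product class and an inline XML string in place of product.xml; neither is part of the original project.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class ProductListDemo {

    // Hypothetical value class, introduced only for this example.
    static class Product {
        String name;
        String make;
        int engineCc;
        String type;
    }

    static List<Product> parse(String xml) throws XMLStreamException {
        List<Product> products = new ArrayList<Product>();
        Product current = null;
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    String tag = event.asStartElement().getName().getLocalPart();
                    if ("product".equals(tag)) {
                        // A new <product> starts a new object.
                        current = new Product();
                    } else if (current != null) {
                        if ("name".equals(tag)) {
                            current.name = reader.getElementText();
                        } else if ("make".equals(tag)) {
                            current.make = reader.getElementText();
                        } else if ("engine-cc".equals(tag)) {
                            current.engineCc = Integer.parseInt(reader.getElementText().trim());
                        } else if ("type".equals(tag)) {
                            current.type = reader.getElementText();
                        }
                    }
                } else if (event.isEndElement()
                        && "product".equals(event.asEndElement().getName().getLocalPart())) {
                    // </product> completes the object, so add it to the list.
                    products.add(current);
                    current = null;
                }
            }
        } finally {
            reader.close();
        }
        return products;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<products>"
                + "<product><name>R15</name><make>Yamaha</make>"
                + "<engine-cc>150</engine-cc><type>sports</type></product>"
                + "<product><name>Duke</name><make>KTM</make>"
                + "<engine-cc>200</engine-cc><type>Street</type></product>"
                + "</products>";
        List<Product> products = parse(xml);
        Product first = products.get(0);
        // prints: 2 products, first: R15 (150 cc)
        System.out.println(products.size() + " products, first: "
                + first.name + " (" + first.engineCc + " cc)");
    }
}
```

Only one Product is held as work in progress at any time, so memory use depends on what the application chooses to keep, not on the size of the document.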

Output:


Start Product.
Name : R15
Make : Yamaha
Engine : 150
Type : sports
End Product.
Start Product.
Name : Duke
Make : KTM
Engine : 200
Type : Street
End Product.
Start Product.
Name : GS650GS Sertao
Make : BMW
Engine : 650
Type : Enduro
End Product.
Start Product.
Name : Multistada
Make : Ducati
Engine : 1210
Type : Touring
End Product.


Source Code:

You can pull the code from GitHub.
