org.pricingnexus.tools
Class HtmlToDOMTransform

java.lang.Object
  |
  +--javax.swing.text.html.HTMLEditorKit.ParserCallback
        |
        +--org.pricingnexus.tools.HtmlToDOMTransform

public class HtmlToDOMTransform
extends javax.swing.text.html.HTMLEditorKit.ParserCallback

A small helper class. From a Reader stream (String input still has to be developed) this object reads a complete HTML page, converts it into a JDOM Document object, and returns a reference to that object. The core work is done by the Swing HTML parser (the javax.swing.text.html.parser package, a more internal part of the Swing API), which provides "well-formed" HTML, i.e. everything that is opened is also closed, as in XML. The only remaining task is therefore to handle the callbacks and construct a Document object from them. The idea for this type of implementation came from the chapter about HTML parsing in the book "Java Network Programming" by E. R. Harold.
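To make the "everything is opened and closed" behaviour concrete, the following sketch records the callbacks that the Swing parser delivers for a small fragment. It is not the code of this class: EventTrace and its trace method are hypothetical names used only for illustration, and the exact set of implied tags (html, head, body) that the parser adds may vary.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical demo class, not part of org.pricingnexus.tools.
class EventTrace extends HTMLEditorKit.ParserCallback {
    private final List<String> events = new ArrayList<>();

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attributes, int position) {
        events.add("start:" + tag);
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int position) {
        events.add("end:" + tag);
    }

    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attributes, int position) {
        events.add("simple:" + tag);
    }

    @Override
    public void handleText(char[] text, int position) {
        events.add("text:" + new String(text));
    }

    // Parses the given HTML and returns the recorded callback events in order.
    static List<String> trace(String html) {
        EventTrace callback = new EventTrace();
        try {
            // The third argument tells the parser to ignore charset declarations.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.events;
    }

    public static void main(String[] args) {
        trace("<p>Hi<br>there</p>").forEach(System.out::println);
    }
}
```

Container tags such as p arrive as a start/end pair, while empty tags such as br arrive through handleSimpleTag only. That distinction is what the callbacks documented below rely on.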


Fields inherited from class javax.swing.text.html.HTMLEditorKit.ParserCallback
IMPLIED
 
Constructor Summary
HtmlToDOMTransform()
          Default constructor of class.
 
Method Summary
 void cleanUp()
          Clean up the Document object
 void flush()
          Probably do nothing when flushing occurs
 org.jdom.Document getJDOM()
          If parsing was successful, a reference to the JDOM object can be retrieved via this method.
 void handleComment(char[] text, int position)
          Comments are ignored.
 void handleEndOfLineString(java.lang.String theString)
          Do nothing at EOL - is this correct?
 void handleEndTag(javax.swing.text.html.HTML.Tag tag, int position)
          The close tag is easy: using the getParent method of the element, we climb back one level in the tree.
 void handleError(java.lang.String errorMessage, int position)
          Error handling is an open issue for further development
 void handleSimpleTag(javax.swing.text.html.HTML.Tag tag, javax.swing.text.MutableAttributeSet attributes, int position)
          Simple tags become new elements that carry just the name of the tag as the element name.
 void handleStartTag(javax.swing.text.html.HTML.Tag tag, javax.swing.text.MutableAttributeSet attributes, int position)
          The following six methods implement the callbacks from the Parser class. A new "open tag" event triggers the creation of a new child below the currentElement, the assignment of the tag name to the element name, and finally the reassignment of the currentElement to this child.
 void handleText(char[] text, int position)
          Also quite straightforward: just add the text as new content to the currentElement.
 org.w3c.dom.Document parseToDOM()
          Works like parseToJDOM() but returns an org.w3c.dom.Document object instead.
 org.jdom.Document parseToJDOM()
          The important method: retrieves data via the stream and tries to parse it into a JDOM object.
 void setReader(java.io.Reader theReader)
          The initializer method.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HtmlToDOMTransform

public HtmlToDOMTransform()
Default constructor of class.
Method Detail

setReader

public void setReader(java.io.Reader theReader)
The initializer method. It is separated from the constructor so that the object can be reused more easily.
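The reuse pattern can be sketched as follows, under the assumption that setting a new reader should also reset any state left over from the previous run. ReusableLinkCollector is a hypothetical stand-in that collects href values instead of building a document; only the setReader idea is taken from this class.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical demo class: one instance, several inputs.
class ReusableLinkCollector extends HTMLEditorKit.ParserCallback {
    private final ParserDelegator parser = new ParserDelegator();
    private final List<String> links = new ArrayList<>();
    private Reader reader;

    // Assumption: a new reader means a fresh run, so old results are dropped.
    void setReader(Reader theReader) {
        reader = theReader;
        links.clear();
    }

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attributes, int position) {
        Object href = attributes.getAttribute(HTML.Attribute.HREF);
        if (tag == HTML.Tag.A && href != null) {
            links.add(href.toString());
        }
    }

    // Parses the current reader and returns a copy of the collected links.
    List<String> parse() {
        if (reader == null) {
            throw new IllegalStateException("setReader must be called first");
        }
        try {
            parser.parse(reader, this, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ArrayList<>(links);
    }
}
```

Calling setReader again with another Reader and then parse() yields results for the new input only, without constructing a second object.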

handleStartTag

public void handleStartTag(javax.swing.text.html.HTML.Tag tag,
                           javax.swing.text.MutableAttributeSet attributes,
                           int position)
The following six methods implement the callbacks from the Parser class. A new "open tag" event triggers the creation of a new child below the currentElement, the assignment of the tag name to the element name, and finally the reassignment of the currentElement to this child.
Overrides:
handleStartTag in class javax.swing.text.html.HTMLEditorKit.ParserCallback

handleEndTag

public void handleEndTag(javax.swing.text.html.HTML.Tag tag,
                         int position)
The close tag is easy: using the getParent method of the element, we climb back one level in the tree.
Overrides:
handleEndTag in class javax.swing.text.html.HTMLEditorKit.ParserCallback

handleSimpleTag

public void handleSimpleTag(javax.swing.text.html.HTML.Tag tag,
                            javax.swing.text.MutableAttributeSet attributes,
                            int position)
Simple tags become new elements that carry just the name of the tag as the element name. Because they have no content, they are not assigned as the currentElement; the currentElement stays where it is.
Overrides:
handleSimpleTag in class javax.swing.text.html.HTMLEditorKit.ParserCallback

handleText

public void handleText(char[] text,
                       int position)
Also quite straightforward: just add the text as new content to the currentElement.
Overrides:
handleText in class javax.swing.text.html.HTMLEditorKit.ParserCallback
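Taken together, the four callbacks above (start tag, end tag, simple tag, text) amount to a small tree builder. The sketch below shows the same idea with org.w3c.dom instead of JDOM so that it runs on a plain JDK. DomBuildingCallback and its parse helper are hypothetical names, and the real class may differ in details such as attribute handling.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Enumeration;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical demo class; uses org.w3c.dom as a stand-in for JDOM.
class DomBuildingCallback extends HTMLEditorKit.ParserCallback {
    private final Document document;
    private Element currentElement; // null until the root element exists

    DomBuildingCallback(Document document) {
        this.document = document;
    }

    // Creates an element named after the tag and hangs it below currentElement.
    private Element append(HTML.Tag tag, MutableAttributeSet attributes) {
        Element child = document.createElement(tag.toString());
        Enumeration<?> names = attributes.getAttributeNames();
        while (names.hasMoreElements()) {
            Object name = names.nextElement();
            if (name != IMPLIED) { // skip the parser's marker for implied tags
                child.setAttribute(name.toString(), String.valueOf(attributes.getAttribute(name)));
            }
        }
        if (currentElement == null) {
            document.appendChild(child);
        } else {
            currentElement.appendChild(child);
        }
        return child;
    }

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attributes, int position) {
        currentElement = append(tag, attributes); // descend into the new child
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int position) {
        if (currentElement == null) {
            return;
        }
        Node parent = currentElement.getParentNode();
        if (parent instanceof Element) {
            currentElement = (Element) parent; // climb one level, but never above the root
        }
    }

    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attributes, int position) {
        if (currentElement != null) {
            append(tag, attributes); // currentElement stays where it is
        }
    }

    @Override
    public void handleText(char[] text, int position) {
        if (currentElement != null) {
            currentElement.appendChild(document.createTextNode(new String(text)));
        }
    }

    static Document parse(String html) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            new ParserDelegator().parse(new StringReader(html), new DomBuildingCallback(doc), true);
            return doc;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The sketch does not sanitize names: a tag or attribute name that is not a legal XML name would make createElement or setAttribute throw a DOMException, so production code would need a policy for such input.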

handleComment

public void handleComment(char[] text,
                          int position)
Comments are ignored.
Overrides:
handleComment in class javax.swing.text.html.HTMLEditorKit.ParserCallback

handleError

public void handleError(java.lang.String errorMessage,
                        int position)
Error handling is an open issue for further development
Overrides:
handleError in class javax.swing.text.html.HTMLEditorKit.ParserCallback
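Since error handling is still open, one possible direction (only a suggestion, not what this class does) is to collect the parser's messages so that the caller can inspect them after parsing. ErrorCollectingCallback and errors() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical demo class: remembers every error the parser reports.
class ErrorCollectingCallback extends HTMLEditorKit.ParserCallback {
    private final List<String> errors = new ArrayList<>();

    @Override
    public void handleError(String errorMessage, int position) {
        // Keep the position so a message can be traced back to the input.
        errors.add(position + ": " + errorMessage);
    }

    // Read-only view of the messages collected so far.
    List<String> errors() {
        return Collections.unmodifiableList(errors);
    }
}
```

The Swing parser reports problems quite liberally, so a non-empty list does not necessarily mean that the resulting document is unusable.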

flush

public void flush()
Probably do nothing when flushing occurs
Overrides:
flush in class javax.swing.text.html.HTMLEditorKit.ParserCallback

handleEndOfLineString

public void handleEndOfLineString(java.lang.String theString)
Do nothing at EOL - is this correct?
Overrides:
handleEndOfLineString in class javax.swing.text.html.HTMLEditorKit.ParserCallback

parseToJDOM

public org.jdom.Document parseToJDOM()
                              throws java.io.IOException
The important method: retrieves data via the stream and tries to parse it into a JDOM object. Needs a Reader stream to work with (see setReader).

parseToDOM

public org.w3c.dom.Document parseToDOM()
                                throws java.io.IOException
Works like parseToJDOM() but returns an org.w3c.dom.Document object instead.

cleanUp

public void cleanUp()
Clean up the Document object

getJDOM

public org.jdom.Document getJDOM()
If parsing was successful, a reference to the JDOM object can be retrieved via this method. This is just a shallow copy, which may be enhanced in the future. If parsing was not successful, a null reference is returned.
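Assuming "shallow copy" means that the caller receives a reference to the same Document the object holds internally, changes made through that reference are visible to every other holder. The sketch below contrasts this with a deep copy, again using org.w3c.dom as a stand-in for JDOM. CopyDemo and its methods are hypothetical names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class contrasting a shared reference with a deep copy.
class CopyDemo {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Deep copy: import the whole tree into a fresh Document.
    static Document deepCopy(Document original) {
        Document copy = newDocument();
        Element root = original.getDocumentElement();
        if (root != null) {
            copy.appendChild(copy.importNode(root, true));
        }
        return copy;
    }

    // Small sample document: <html lang="en"/>
    static Document sample() {
        Document doc = newDocument();
        Element html = doc.createElement("html");
        html.setAttribute("lang", "en");
        doc.appendChild(html);
        return doc;
    }
}
```

A caller that must not disturb the parser's own tree would work on the result of deepCopy, while a caller that only reads can use the shared reference directly.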