There are two XML library projects that do streaming XPath reads with full
expression evaluation: Nux and dom4j. Nux is from LBL and is under a "kinda
like BSD" license, and dom4j is under a BSD license.

http://dom4j.org/dom4j-1.6.1/project-info.html
http://acs.lbl.gov/nux/

The licensing probably kills these, right?

Apache includes the Jaxen library, but I can't quite tell if it can stream
or not.

http://xml.apache.org/xalan-j/xpath_apis.html
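
For comparison, the non-streaming alternative discussed in the quoted thread
below (load the full DOM into memory and use the default Java XPath engine)
could look roughly like this. It is only a sketch using the JDK's
javax.xml.xpath API; the class name DomXPathSketch and the method evaluate
are made up for illustration and are not part of DIH.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Solr/DIH: evaluates a full XPath
// expression against an in-memory DOM instead of a stream.
public class DomXPathSketch {

    // Parses the whole XML string into a DOM and returns the text content
    // of every node that matches the expression.
    public static List<String> evaluate(String xml, String expression) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);

        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // "//para" matches para elements at any depth, which is the kind of
        // expression the streaming reader does not accept.
        String xml = "<doc><sec><para>one</para></sec><para>two</para></doc>";
        System.out.println(evaluate(xml, "//para"));
    }
}
```

The whole document is held in memory here, so this only makes sense for
documents small enough to fit in a DOM.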

On Tue, Feb 3, 2009 at 8:48 PM, Noble Paul നോബിള്‍ नोब्ळ् <
noble.p...@gmail.com> wrote:

> On Wed, Feb 4, 2009 at 6:13 AM, Chris Hostetter
> <hossman_luc...@fucit.org> wrote:
> >
> > : > The solr data field is populated properly. So I guess that bit works.
> > : > I really wish I could use xpath="//para"
> >
> > : The limitation comes from streaming the XML instead of creating a DOM.
> > : XPathRecordReader is a custom streaming XPath parser implementation and
> > : streaming is easy only because we limit the syntax. You can use
> > : PlainTextEntityProcessor which gives the XML as a string to a custom
> > : Transformer. This Transformer can create a DOM, run your XPath query and
> > : populate the fields. It's more expensive but it is an option.
> >
> > Maybe it's just me, but it seems like I'm noticing that as DIH gets used
> > more, many people are noting that the XPath processing in DIH doesn't work
> > the way they expect because it's a custom XPath parser/engine designed for
> > streaming.
> >
> > It seems like it would be helpful to have an alternate processor for
> > people who don't need the streaming support (i.e., are dealing with small
> > enough docs that they can load the full DOM tree into memory) that would
> > use the default Java XPath engine (and have fewer caveats/surprises) ... I
> > would think it would probably even make sense for this new XPath processor
> > to be the one we suggest for new users, and only suggest the existing
> > (stream based) processor if they have really big xml docs to deal with.
> >
> I guess the current XPathEntityProcessor must be able to switch
> between the streaming XPath (XPathRecordReader) and the default Java
> XPath engine.
>
> I am just hoping that all the current syntax and semantics will be
> applicable for the Java XPath engine. If not, we will need a new
> EntityProcessor.
>
> I also would like to explore if the current XPathRecordReader can
> implement more XPath syntax with streaming.
>
> The Java XPath engine is not at all efficient for large-scale data
> processing.
>
>
> > (In hindsight XPathEntityProcessor and XPathRecordReader should probably
> > have been named StreamingXPathEntityProcessor and
> > StreamingXPathRecordReader)
>
> >
> > thoughts?
> >
> >
> > -Hoss
> >
> >
>
>
>
> --
> --Noble Paul
>



-- 
Lance Norskog
goks...@gmail.com
650-922-8831 (US)
