There are two XML library projects that do streaming XPath reads with full expression evaluation: Nux and dom4j. Nux is from LBL and has a "kinda like BSD" license; dom4j has a BSD license.
http://dom4j.org/dom4j-1.6.1/project-info.html
http://acs.lbl.gov/nux/

The licensing probably kills these, right?

Apache includes the Jaxen library, but I can't quite tell if they can stream or not.
http://xml.apache.org/xalan-j/xpath_apis.html

On Tue, Feb 3, 2009 at 8:48 PM, Noble Paul നോബിള് नोब्ळ् <noble.p...@gmail.com> wrote:

> On Wed, Feb 4, 2009 at 6:13 AM, Chris Hostetter
> <hossman_luc...@fucit.org> wrote:
> >
> > : > The solr data field is populated properly. So I guess that bit works.
> > : > I really wish I could use xpath="//para"
> >
> > : The limitation comes from streaming the XML instead of creating a DOM.
> > : XPathRecordReader is a custom streaming XPath parser implementation and
> > : streaming is easy only because we limit the syntax. You can use
> > : PlainTextEntityProcessor which gives the XML as a string to a custom
> > : Transformer. This Transformer can create a DOM, run your XPath query and
> > : populate the fields. It's more expensive but it is an option.
> >
> > Maybe it's just me, but it seems like i'm noticing that as DIH gets used
> > more, many people are noting that the XPath processing in DIH doesn't work
> > the way they expect because it's a custom XPath parser/engine designed for
> > streaming.
> >
> > It seems like it would be helpful to have an alternate processor for
> > people who don't need the streaming support (ie: are dealing with small
> > enough docs that they can load the full DOM tree into memory) that would
> > use the default Java XPath engine (and have less caveats/suprises) ... i
> > wou think it would probably even make sense for this new XPath processor
> > to be the one we suggest for new users, and only suggest the existing
> > (stream based) processor if they have really big xml docs to deal with.
> >
> I guess the current XPathEntityProcessor must be able to switch
> between the streaming xpath (XPathRecordReader) and the default java
> XPath engine.
>
> I am just hoping that all the current syntax and semantics will be
> applicable for the Java Xpath engine. If not, we will need a new
> EntityProcessor.
>
> I also would like to explore if the current XPathRecordReader can
> implement more XPath syntax with streaming.
>
> The java xpath engine is not at all efficient for large scale data
> processing
>
> > (In hindsight XPathEntityProcessor and XPathRecordReader should probably
> > have been named StreamingXPathEntityProcessor and
> > StreamingXPathRecordReader)
> >
> > thoughts?
> >
> > -Hoss
>
> --
> --Noble Paul

--
Lance Norskog
goks...@gmail.com
650-922-8831 (US)
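
For reference, the DOM-based route mentioned in the quoted thread (take the XML as a string, build a DOM, run a full XPath expression such as //para with the default Java XPath engine) could look roughly like the sketch below. It uses only the JDK's javax.xml APIs. The class name DomXPathSketch and the method evaluate are hypothetical names for illustration, and the wiring into a DIH Transformer is deliberately left out, so treat this as an assumption about how that step might be written rather than as existing Solr code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: not part of Solr or DIH.
class DomXPathSketch {

    // Parses the whole XML string into a DOM (no streaming) and returns the
    // text content of every node matched by the XPath expression.
    static List<String> evaluate(String xml, String expression) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            // Default JAXP XPath engine: full XPath 1.0 syntax, including "//".
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes =
                (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);

            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            // Wrap checked parser/XPath exceptions to keep the sketch short.
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc><sec><para>first</para></sec><para>second</para></doc>";
        // "//para" matches para elements at any depth.
        System.out.println(evaluate(xml, "//para"));
    }
}
```

The whole document is held in memory here, which is the cost the thread refers to: fine for small docs, not for large-scale data.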