I think my original question/thread was accidentally hijacked. Let me take this opportunity to refocus this thread on my original question about DIH, nested entities, and xpath. I'll try to ask a very simple question instead:
Why doesn't this field xpath work? By "not working" I mean the MsgKeywordMF field does not populate in the index...unless I remove the xpath filter.

<field column="MsgKeywordMF" xpath="/Report/MsgSet/Msg/MsgList/MsgItem[Category='Manufacturer']/Keyword" />

OR

<field column="MsgKeywordMF" xpath="/Report/MsgSet/Msg/MsgList/MsgItem[@Category='Manufacturer']/Keyword" />

For the second form I modified the original xml so that Category was an attribute of MsgItem instead. It still does not work, even though it matches in other tools and is explicitly documented on the DIH wiki page.

Full details below (in original post as well).

Thx,
-- Eric

On Fri, May 13, 2011 at 4:53 AM, Weiss, Eric <wei...@llnl.gov> wrote:
> Apologies in advance if this topic/question has been previously answered...I
> have scoured the docs, mail archives, web looking for an answer(s) with no
> luck. I am sure I am just being dense or missing something obvious...please
> point out my stupidity as my head hurts trying to get this working.
>
> Solr 3.1
> Java 1.6
> Eclipse/Tomcat 7/Maven 2.x
>
> Goal: to extract manufacturer names from a repeating list of keywords, each
> denoted by a Category, one of which is "Manufacturer", and load them into a
> MsgKeywordMF field (see xml below)
>
> I have xml files I am loading via DIH. This is an abbreviated example of the
> xml data (each file has repeating "Report" items, each report has repeating
> MsgSet, Msg, MsgList, etc. items).
> Notice the nested repeating groups, namely MsgItems, within each document
> (Report):
>
> <Report>
>   <ReportMeta>
>     <ReportDate>02/22/2011</ReportDate>
>     ...
>   </ReportMeta>
>   <MsgSet>
>     <Msg>
>       <SourceDocID>http://someurl.com/path/to/doc</SourceDocID>
>       ...
>       <DocumentText>........blah blah</DocumentText>
>       <MsgList>
>         <MsgItem>
>           <MsgType>SomeType</MsgType>
>           <Category>Location</Category>
>           <Keyword>USA</Keyword>
>         </MsgItem>
>         <MsgItem>
>           <MsgType>AnotherType</MsgType>
>           <Category>Manufacturer</Category>
>           <Keyword>Apple</Keyword>
>         </MsgItem>
>         ...
>       </MsgList>
>     </Msg>
>   </MsgSet>
> </Report>
> <Report>
> ...
> </Report>
> <Report>
> ...
> </Report>
> ...
>
> Here is my data-config.xml:
>
> <dataConfig>
>   <dataSource type="FileDataSource" encoding="UTF-8" />
>   <document>
>     <entity name="fileload" rootEntity="false"
>             processor="FileListEntityProcessor" fileName="^.*\.xml$"
>             recursive="false" baseDir="/files/xml/">
>       <entity name="report"
>               rootEntity="true" pk="id"
>               url="${fileload.fileAbsolutePath}" processor="XPathEntityProcessor"
>               forEach="/Report/MsgSet/Msg" onError="skip"
>               transformer="DateFormatTransformer,RegexTransformer">
>         <field column="DocumentText" xpath="/Report/MsgSet/Msg/DocumentText"/>
>         <field column="id" xpath="/Report/MsgSet/Msg/SourceDocID"/>
>         <field column="MsgCategory" xpath="/Report/MsgSet/Msg/MsgList/MsgItem/Category" />
>         <field column="MsgKeyword" xpath="/Report/MsgSet/Msg/MsgList/MsgItem/Keyword" />
>         <field column="MsgKeywordMF" xpath="/Report/MsgSet/Msg/MsgList/MsgItem[Category='Manufacturer']/Keyword" />
>         ...
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
>
> As seen in my config and sample data above, I am extracting the repeating
> "Keywords" into the MsgKeyword field.
> Also, and the part that does NOT work, I am trying to extract into a
> separate field just the keywords that have a "Category" of "Manufacturer" -->
> <field column="MsgKeywordMF" xpath="/Report/MsgSet/Msg/MsgList/MsgItem[Category='Manufacturer']/Keyword" />
>
> I have also tried:
> <field column="MsgKeywordMF" xpath="/Report/MsgSet/Msg/MsgList/MsgItem[@Category='Manufacturer']/Keyword" />
> ...after changing the "Category" to an attribute of MsgItem (<MsgItem
> Category="Location">) but it too fails to match.
>
> I have tested my xpath notation against my xml data file using various
> xpath evaluator tools, like within Eclipse, and it matches perfectly...but I
> can't get it to match/work during import.
>
> As far as I am able to understand it, DIH does not support nested/correlated
> entities, at least not with XML data sources using nested entity tags. I've
> tried without success to nest entities but I can't "correlate" the nested
> entity with the parent. I think the way I'm trying should work, but no luck
> so far...
>
> BTW, I can't easily change the xml format, although it is possible with
> some pain...
>
> Any ideas?
>
> TIA,
> -- Eric
>
>
On 5/13/11 1:58 AM, "Gora Mohanty" <g...@mimirtech.com> wrote:
> On Fri, May 13, 2011 at 10:18 AM, Ashique <ashique....@gmail.com> wrote:
>> Hi All,
>>
>> I am a Java/J2ee programmer and very new to SOLR. I would like to index a
>> table in a postgresSql database to SOLR. Then searching the records from a
>> GUI (Jsp Page) and showing the results in tabular form. Could any one help
>> me out with a simple sample code.
> [...]
>
> This is too broad a question. Please start out by looking
> at the extensive Solr documentation:
> * Complete list: http://wiki.apache.org/solr/FrontPage
> * Initial tutorial: http://lucene.apache.org/solr/tutorial.html
>   It is a good idea to first ensure that you are able to get
>   this working.
> * If you are using Java, this should be of interest:
>   http://wiki.apache.org/solr/SolJava
> * For easy data import from a database, you could consider
>   using the DataImportHandler:
>   http://wiki.apache.org/solr/DataImportHandler
>
> You can ask here if you run into issues while trying these out.
>
> Regards,
> Gora
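
For anyone who wants to reproduce the check described in the original post (the element-predicate xpath matching in a general-purpose evaluator), here is a minimal sketch. It assumes Python's standard-library xml.etree.ElementTree as the evaluator, which is not one of the tools named in the post. The function names are hypothetical, the "..." elisions in the sample are simply left out, and the path is written relative to the <Report> root because ElementTree does not accept absolute paths on an element. It only shows that the expression selects the expected node outside of DIH; it says nothing about which xpath subset XPathEntityProcessor supports.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample data from the original post (elided parts omitted).
SAMPLE = """
<Report>
  <MsgSet>
    <Msg>
      <SourceDocID>http://someurl.com/path/to/doc</SourceDocID>
      <DocumentText>........blah blah</DocumentText>
      <MsgList>
        <MsgItem>
          <MsgType>SomeType</MsgType>
          <Category>Location</Category>
          <Keyword>USA</Keyword>
        </MsgItem>
        <MsgItem>
          <MsgType>AnotherType</MsgType>
          <Category>Manufacturer</Category>
          <Keyword>Apple</Keyword>
        </MsgItem>
      </MsgList>
    </Msg>
  </MsgSet>
</Report>
"""


def all_keywords(xml_text):
    """Return every Keyword value, like the unfiltered MsgKeyword field."""
    root = ET.fromstring(xml_text)  # root element is <Report>
    return [kw.text for kw in root.findall("./MsgSet/Msg/MsgList/MsgItem/Keyword")]


def keywords_for_category(xml_text, category):
    """Return Keyword values of MsgItems whose Category child equals `category`."""
    root = ET.fromstring(xml_text)
    # Same predicate as the MsgKeywordMF xpath, relative to <Report>.
    path = "./MsgSet/Msg/MsgList/MsgItem[Category='%s']/Keyword" % category
    return [kw.text for kw in root.findall(path)]


if __name__ == "__main__":
    print(all_keywords(SAMPLE))
    print(keywords_for_category(SAMPLE, "Manufacturer"))
```

Run against the sample above, the unfiltered path returns both keywords and the filtered path returns only the Manufacturer keyword, which is the behaviour the MsgKeywordMF field is meant to have.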