NP, Been there, done that, got the t-shirt :)...
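
For anyone landing on this thread later: the actual fix was never posted, so what follows is only a guess from reading the quoted files. The sample document's root element is <nitf>, while forEach and every xpath in the config start with /ntif. With that spelling XPathEntityProcessor would likely match nothing, which fits an import that reports success but adds no documents. A minimal sketch of the same config with the root name corrected is below. It also writes dataSource in camel case, as the DIH wiki examples do, and drops datasource="nytxmldir" from the inner entity so it falls back to the single FileDataSource. Those two changes are assumptions and have not been tested against the poster's setup.

```xml
<dataConfig>
  <dataSource type="FileDataSource"/>
  <document>
    <!-- Outer entity only lists files; it needs no data source of its own. -->
    <entity name="nytxmldir" rootEntity="false" dataSource="null"
            processor="FileListEntityProcessor"
            fileName=".*\.xml"
            recursive="true"
            baseDir="/home/farhan/Downloads/nytxml">

      <!-- forEach and xpath now use /nitf, matching the document root. -->
      <entity name="nytxml"
              pk="id"
              url="${nytxmldir.fileAbsolutePath}"
              processor="XPathEntityProcessor"
              forEach="/nitf"
              transformer="RegexTransformer">

        <field column="id" xpath="/nitf/head/docdata/doc-id/@id-string"/>
        <field column="title" xpath="/nitf/head/title"/>
        <field column="paragraph" xpath="/nitf/body/body.content/block[@class='full_text']/p"/>

      </entity>
    </entity>
  </document>
</dataConfig>
```

The id, title and paragraph columns still have to exist as fields in schema.xml for anything to be stored.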

On Wed, Mar 5, 2014 at 9:51 PM, Farhan Ali <farhan....@gmail.com> wrote:
> Sorry, I figured out my problem. It was a stupid mistake on my part. Once
> again, sorry for that.
>
> Thanks
> Farhan
>
>
> On Wed, Mar 5, 2014 at 7:14 PM, Farhan Ali <farhan....@gmail.com> wrote:
>
>> Hi,
>> I am a newbie to Solr and I am trying to index some XML documents using
>> DIH and XPath, but I am unable to do it. I get a response message reporting
>> successful indexing, but no document is added to the index. I do not know
>> what I am doing wrong.
>>
>> This is my data config XML file:
>>
>>
>> <dataConfig>
>>   <dataSource type="FileDataSource"/>
>>   <document>
>>     <entity name="nytxmldir" rootEntity="false" datasource="null"
>>             processor="FileListEntityProcessor"
>>             fileName=".*\.xml"
>>             recursive="true"
>>             baseDir="/home/farhan/Downloads/nytxml">
>>
>>       <entity name="nytxml"
>>               pk="id"
>>               datasource="nytxmldir"
>>               url="${nytxmldir.fileAbsolutePath}"
>>               processor="XPathEntityProcessor"
>>               forEach="/ntif"
>>               transformer="RegexTransformer">
>>
>>         <field column="id" xpath="/ntif/head/docdata/doc-id/@id-string"/>
>>         <field column="title" xpath="/ntif/head/title"/>
>>         <field column="paragraph" xpath="/ntif/body/body.content/block[@class='full_text']/p"/>
>>
>>       </entity>
>>     </entity>
>>   </document>
>> </dataConfig>
>>
>>
>>
>>
>>
>> This is my xml document
>>
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <!DOCTYPE nitf SYSTEM "http://www.nitf.org/IPTC/NITF/3.3/specification/dtd/nitf-3-3.dtd">
>> <nitf change.date="June 10, 2005" change.time="19:30"
>> version="-//IPTC//DTD NITF 3.3//EN">
>>   <head>
>>     <title>Paid Notice: Deaths   BRADLEY, CAROL L.</title>
>>     <meta content="dn010107" name="slug"/>
>>     <meta content="1" name="publication_day_of_month"/>
>>     <meta content="1" name="publication_month"/>
>>     <meta content="2007" name="publication_year"/>
>>     <meta content="Monday" name="publication_day_of_week"/>
>>     <meta content="Classified" name="dsk"/>
>>     <meta content="7" name="print_page_number"/>
>>     <meta content="B" name="print_section"/>
>>     <meta content="3" name="print_column"/>
>>     <meta content="Paid Death Notices" name="online_sections"/>
>>     <docdata>
>>       <doc-id id-string="1815719"/>
>>       <doc.copyright holder="The New York Times" year="2007"/>
>>       <identified-content>
>>         <person class="indexing_service">BRADLEY, CAROL L.</person>
>>         <classifier class="online_producer" type="types_of_material">Paid Death Notice</classifier>
>>         <classifier class="online_producer" type="taxonomic_classifier">Top/Classifieds/Paid Death Notices</classifier>
>>       </identified-content>
>>     </docdata>
>>     <pubdata date.publication="20070101T000000"
>>              ex-ref="http://query.nytimes.com/gst/fullpage.html?res=9B06E1DE1E3AF932A35752C0A9619C8B63"
>>              item-length="49" name="The New York Times" unit-of-measure="word"/>
>>   </head>
>>   <body>
>>     <body.head>
>>       <hedline>
>>         <hl1>Paid Notice: Deaths   BRADLEY, CAROL L.</hl1>
>>       </hedline>
>>     </body.head>
>>     <body.content>
>>       <block class="lead_paragraph">
>>         <p>BRADLEY--Carol L., 84, of Tinton Falls, NJ died peacefully at
>> Seabrook Village on December 27. Beloved wife of Floyd (Pete) Bradley, Jr.;
>> loving mother of Steven, Floyd and Lynette Bradley; adored grandmother of
>> Victoria Kent and Camilla, William and Melissa Bradley; caring
>> stepgrandmother of Matthew and Charlton Field.</p>
>>       </block>
>>       <block class="full_text">
>>         <p>BRADLEY--Carol L., 84, of Tinton Falls, NJ died peacefully at
>> Seabrook Village on December 27. Beloved wife of Floyd (Pete) Bradley, Jr.;
>> loving mother of Steven, Floyd and Lynette Bradley; adored grandmother of
>> Victoria Kent and Camilla, William and Melissa Bradley; caring
>> stepgrandmother of Matthew and Charlton Field.</p>
>>       </block>
>>     </body.content>
>>   </body>
>> </nitf>
>>
>>
>> I am really stumped as to why it is not working. I know DIH does not
>> support full XPath syntax, but according to the wiki it supports the limited
>> XPath subset that I am using. I have also read various internet forums where
>> people suggest using Groovy and XSLT, which I am unfamiliar with.
>> I hope someone can help me.
>>
>> Thanks
>> Farhan
>>
>>
>>
>>
