I would like to create a text string from the complete node tree,
expressed in XML. So, /html/body would supply a string which starts:
'<div id="header">'. Is this possible?
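
To make the goal concrete outside of DIH, here is a minimal sketch. It
assumes Python's standard xml.etree.ElementTree, not the
XPathEntityProcessor, so it only illustrates the string I am after.
`inner_xml` and `SAMPLE` are hypothetical names, and the sample is a
trimmed copy of the data file quoted further down.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample file (assumption: the XML declaration is dropped).
SAMPLE = """<html>
  <head>
    <meta content="en-US" name="DC.language" />
  </head>
  <body>
    <div id="header">
      <a href="ch05-tokenizers-filters-Solr1.4.html">First</a>
      <span class="nolink">Previous</span>
    </div>
  </body>
</html>"""


def inner_xml(document: str, path: str) -> str:
    """Return the markup inside the node at `path` as one string.

    `path` is relative to the root element, so /html/body is written "body".
    """
    root = ET.fromstring(document)
    node = root.find(path)
    if node is None:
        raise ValueError(f"no node found at {path!r}")
    # Leading text of the node, then each child serialized with its tail text.
    parts = [node.text or ""]
    parts.extend(ET.tostring(child, encoding="unicode") for child in node)
    return "".join(parts).strip()


body_xml = inner_xml(SAMPLE, "body")
print(body_xml)
```

The resulting string is what I would like to store in body_s, so that
several of them can later be concatenated into one page.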

In general, I'm trying to take the HTML body node and index it as a
text string, so that I can fetch that body text and highlight words in
it. I want to save only the body because I can then pull multiple
bodies and string them together into a page. This is how
www.lucidimagination.com/search presents our Solr reference guide
book.

Anyway, /html/body/div/span should supply the text 'Previous', and it
does not. I changed the config to use a ContentStreamDataSource and
posted the data, and now I get the response below. What does a value
of 0 for "Total Requests made to DataSource" mean?

  <?xml version="1.0" encoding="UTF-8" ?>
  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">124</int>
    </lst>
    <lst name="initArgs">
      <lst name="defaults">
        <str name="config">xhtml-data-config.xml</str>
      </lst>
    </lst>
    <str name="command">full-import</str>
    <str name="status">idle</str>
    <str name="importResponse" />
    <lst name="statusMessages">
      <str name="Total Requests made to DataSource">0</str>
      <str name="Total Rows Fetched">0</str>
      <str name="Total Documents Skipped">0</str>
      <str name="Full Dump Started">2010-01-31 21:58:50</str>
      <str name="">Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.</str>
      <str name="Committed">2010-01-31 21:58:50</str>
      <str name="Optimized">2010-01-31 21:58:50</str>
      <str name="Total Documents Processed">0</str>
      <str name="Time taken">0:0:0.124</str>
    </lst>
    <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
  </response>
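
For checking these counters from a script instead of by eye, here is a
minimal sketch, again assuming Python's standard library.
`status_messages` is a hypothetical helper and `RESPONSE` is a trimmed
copy of the response above. It only reads the values; it does not
explain what they mean.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the DIH status response shown above.
RESPONSE = """<response>
  <str name="command">full-import</str>
  <lst name="statusMessages">
    <str name="Total Requests made to DataSource">0</str>
    <str name="Total Rows Fetched">0</str>
    <str name="Total Documents Processed">0</str>
  </lst>
</response>"""


def status_messages(response_xml: str) -> dict:
    """Map each statusMessages entry name to its text value."""
    root = ET.fromstring(response_xml)
    block = root.find("lst[@name='statusMessages']")
    if block is None:
        return {}
    return {entry.get("name", ""): (entry.text or "")
            for entry in block.findall("str")}


messages = status_messages(RESPONSE)
for name, value in messages.items():
    print(f"{name}: {value}")
```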

2010/1/31 Noble Paul നോബിള്‍  नोब्ळ् <noble.p...@corp.aol.com>:
> It is clear that the xpaths provided won't fetch anything, because
> there is no data in those paths. What do you really wish to be indexed?
>
>
>
> On Sun, Jan 31, 2010 at 10:30 AM, Lance Norskog <goks...@gmail.com> wrote:
>> This DataImportHandler script does not find any documents in this HTML
>> file. The DIH definitely opens the file, but either the
>> XPathEntityProcessor gets no data or it does not recognize the xpaths
>> described. Any hints? (I'm using a recent Solr 1.5-dev build.)
>>
>> Thanks!
>>
>> Lance
>>
>>
>> xhtml-data-config.xml:
>>
>> <dataConfig>
>>        <dataSource type="FileDataSource" encoding="UTF-8" />
>>        <document>
>>        <entity name="xhtml"
>>                        forEach="/html/head | /html/body"
>>                        processor="XPathEntityProcessor" pk="id"
>>                        transformer="TemplateTransformer"
>>                        url="/cygwin/tmp/ch05-tokenizers-filters-Solr1.4.html"
>>                        >
>>                <field column="head_s" xpath="/html/head"/>
>>                <field column="body_s" xpath="/html/body"/>
>>        </entity>
>>        </document>
>> </dataConfig>
>>
>> Sample data file: "cygwin/tmp/ch05-tokenizers-filters-Solr1.4.html"
>>
>> <?xml version="1.0" encoding="UTF-8" ?>
>> <html >
>>  <head >
>>    <meta content="en-US" name="DC.language" />
>>  </head>
>>  <body>
>>    <div id="header">
>>     <a href="ch05-tokenizers-filters-Solr1.4.html">First</a>
>>        <span class="nolink">Previous</span>
>>        <a href="ch05-tokenizers-filters-Solr1.41.html">Next</a>
>>        <a href="ch05-tokenizers-filters-Solr1.460.html">Last</a>
>>    </div>
>>    <div dir="ltr" id="content" style="background-color:transparent">
>>      <h1 id="toc0">
>>        <span class="SectionNumber">1</span>
>>        <a id="RefHeading36402771"></a>
>>        <a id="bkmRefHeading36402771"></a>
>>        Understanding Analyzers, Tokenizers, and Filters
>>      </h1>
>>    </div>
>>  </body>
>> </html>
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>
>
>
>
> --
> -----------------------------------------------------
> Noble Paul | Systems Architect| AOL | http://aol.com
>



-- 
Lance Norskog
goks...@gmail.com
