I would like to create a text string from the complete node tree, expressed as XML. So /html/body would supply a string that starts with '<div id="header">'. Is this possible?
In general, I'm trying to take the HTML body node and index it as a text string. Then I can fetch that text body and highlight words. The reason I want to save only the body is that I can then pull multiple bodies and string them together into a page. This is how www.lucidimagination.com/search handles our Solr reference guide. Anyway, /html/body/div/span should supply the text 'Previous', and it does not.

I changed this to use a ContentStreamDataSource and post the data, and then I get the response below. What does <str name="Total Requests made to DataSource">0</str> mean?

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">124</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">xhtml-data-config.xml</str>
    </lst>
  </lst>
  <str name="command">full-import</str>
  <str name="status">idle</str>
  <str name="importResponse" />
  <lst name="statusMessages">
    <str name="Total Requests made to DataSource">0</str>
    <str name="Total Rows Fetched">0</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Full Dump Started">2010-01-31 21:58:50</str>
    <str name="">Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.</str>
    <str name="Committed">2010-01-31 21:58:50</str>
    <str name="Optimized">2010-01-31 21:58:50</str>
    <str name="Total Documents Processed">0</str>
    <str name="Time taken">0:0:0.124</str>
  </lst>
  <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>

2010/1/31 Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com>:
> It's clear that the xpaths provided won't fetch anything, because there
> is no data in those paths. What do you really wish to be indexed?
>
> On Sun, Jan 31, 2010 at 10:30 AM, Lance Norskog <goks...@gmail.com> wrote:
>> This DataImportHandler script does not find any documents in this HTML
>> file.
>> The DIH definitely opens the file, but either the
>> xpathprocessor gets no data or it does not recognize the xpaths
>> described. Any hints? (I'm using Solr 1.5-dev, a recent build.)
>>
>> Thanks!
>>
>> Lance
>>
>> xhtml-data-config.xml:
>>
>> <dataConfig>
>>   <dataSource type="FileDataSource" encoding="UTF-8" />
>>   <document>
>>     <entity name="xhtml"
>>             forEach="/html/head | /html/body"
>>             processor="XPathEntityProcessor" pk="id"
>>             transformer="TemplateTransformer"
>>             url="/cygwin/tmp/ch05-tokenizers-filters-Solr1.4.html"
>>             >
>>       <field column="head_s" xpath="/html/head"/>
>>       <field column="body_s" xpath="/html/body"/>
>>     </entity>
>>   </document>
>> </dataConfig>
>>
>> Sample data file: "cygwin/tmp/ch05-tokenizers-filters-Solr1.4.html"
>>
>> <?xml version="1.0" encoding="UTF-8" ?>
>> <html>
>>   <head>
>>     <meta content="en-US" name="DC.language" />
>>   </head>
>>   <body>
>>     <div id="header">
>>       <a href="ch05-tokenizers-filters-Solr1.4.html">First</a>
>>       <span class="nolink">Previous</span>
>>       <a href="ch05-tokenizers-filters-Solr1.41.html">Next</a>
>>       <a href="ch05-tokenizers-filters-Solr1.460.html">Last</a>
>>     </div>
>>     <div dir="ltr" id="content" style="background-color:transparent">
>>       <h1 id="toc0">
>>         <span class="SectionNumber">1</span>
>>         <a id="RefHeading36402771"></a>
>>         <a id="bkmRefHeading36402771"></a>
>>         Understanding Analyzers, Tokenizers, and Filters
>>       </h1>
>>     </div>
>>   </body>
>> </html>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>
> --
> -----------------------------------------------------
> Noble Paul | Systems Architect| AOL | http://aol.com

--
Lance Norskog
goks...@gmail.com
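
For reference, here is a variation on the config that might be worth trying. It is only a sketch and not a verified fix: it assumes the `flatten` attribute of XPathEntityProcessor fields is available in this build, it iterates over the root element so that each file yields one document, and the column name `prev_s` is a made-up example. Note that `flatten` collects the text under the matched node, not the markup, so it would not by itself produce a string starting with '<div id="header">'.

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <!-- Assumption: one row per file, produced by matching the root element -->
    <entity name="xhtml"
            processor="XPathEntityProcessor"
            forEach="/html"
            url="/cygwin/tmp/ch05-tokenizers-filters-Solr1.4.html">
      <!-- flatten="true" asks for the text under all descendant tags as one value -->
      <field column="body_s" xpath="/html/body" flatten="true" />
      <!-- prev_s is a hypothetical column; this path points at a leaf element with direct text -->
      <field column="prev_s" xpath="/html/body/div/span" />
    </entity>
  </document>
</dataConfig>
```

If the status page still reports 0 rows fetched with a config like this, the problem is probably upstream of the field xpaths (the data source or the forEach expression) rather than in the fields themselves.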