You might want to create a field that's analyzed using
HTMLStripCharFilter. This will index all the non-tag, non-attribute text
in the document, and if you store the value, it will store the entire XML
document as well.
I've done some work on an XmlStripCharFilter, which does the same thing
(only for well-formed XML) using the Woodstox (WSTX) XML parser, which
provides a little bit of extra XML goodness (like entity resolution and
XInclude processing) that HTMLStripCharFilter doesn't. I could share it
if there's interest.
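As a rough sketch of the first suggestion, a schema.xml fragment along these lines should work. The names text_xml and xml_body are made up for illustration, and the tokenizer and lowercase filter are just one reasonable choice, not a requirement:

```xml
<!-- Hypothetical field type: strips markup before tokenizing, so only
     the text content of the document is indexed. -->
<fieldType name="text_xml" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field: stored="true" keeps the original value,
     markup included, so the whole XML document can be returned. -->
<field name="xml_body" type="text_xml" indexed="true" stored="true"/>
```

The char filter only affects what gets indexed. The stored value is whatever was sent in, so the document structure comes back untouched in search results.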
-Mike
On 05/18/2011 05:27 PM, Judioo wrote:
Great document. I can see how to import the data directly from the database.
However, it seems as though I need to write XPaths in the config to extract
the fields that I wish to transform into a Solr document.
So it seems that there is no way of storing the document structure in Solr
as is?
2011/5/18 Yury Kats <yuryk...@yahoo.com>
On 5/18/2011 4:19 PM, Judioo wrote:
Any help is greatly appreciated. Pointers to documentation that address
my issues are even more helpful.
I think this would be a good start:
http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource
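For context, the XPath-based extraction described on that wiki page looks roughly like the data-config.xml sketch below. The URL, the /items/item structure, and the field names are placeholders invented for illustration, not taken from the actual data in this thread:

```xml
<dataConfig>
  <dataSource type="URLDataSource"/>
  <document>
    <!-- One Solr document is created per element matched by forEach. -->
    <entity name="item"
            processor="XPathEntityProcessor"
            url="http://example.com/feed.xml"
            forEach="/items/item">
      <!-- Each field maps an XPath in the source to a Solr field. -->
      <field column="id"    xpath="/items/item/id"/>
      <field column="title" xpath="/items/item/title"/>
    </entity>
  </document>
</dataConfig>
```

This approach flattens the XML into named fields, which is why it does not preserve the document structure as is.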