You might want to create a field that's analyzed using HtmlStripCharFilter. This will index all of the non-tag, non-attribute text in the document, and if you also mark the field as stored, Solr will keep the entire XML document as well.
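
To make the effect concrete, here is a rough Python sketch of what stripping markup before indexing amounts to. It only illustrates the idea and is not how Solr implements HtmlStripCharFilter; the names `TextOnly` and `strip_markup` and the sample document are made up for this example.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects character data and ignores tags and attributes (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Only text between tags arrives here; tag names and
        # attribute values are never passed to this callback.
        self.chunks.append(data)


def strip_markup(document):
    """Return the text content of a document with whitespace collapsed."""
    parser = TextOnly()
    parser.feed(document)
    parser.close()
    return " ".join("".join(parser.chunks).split())


# Made-up sample document, not taken from the thread.
doc = '<item id="42"><name>Solr &amp; XML</name> <note>Store the document <b>as is</b></note></item>'

indexed_text = strip_markup(doc)  # what the analyzer would get to tokenize
stored_value = doc                # what a stored field would hand back unchanged
```

The point of the split is that searches run against `indexed_text`, while the original markup in `stored_value` is what comes back in the results.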

I've done some work on an XmlStripCharFilter, which does the same thing (only for well-formed XML) using the Woodstox (WSTX) XML parser. That gives you a little bit of extra XML goodness, such as entity resolution and XInclude processing, that HtmlStripCharFilter doesn't provide. I could share it if there's interest.
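
For the well-formed case, the difference a real XML parser makes can be sketched as follows. This uses Python's standard `xml.etree.ElementTree` purely as a stand-in for WSTX and is not the XmlStripCharFilter code; the `xml_text` helper and the sample document are assumptions for illustration. It shows an entity declared in the internal DTD subset being resolved and malformed input being rejected. XInclude is not shown.

```python
import xml.etree.ElementTree as ET


def xml_text(document):
    """Return the character data of a well-formed XML document (hypothetical helper)."""
    # fromstring raises ET.ParseError if the input is not well-formed.
    root = ET.fromstring(document)
    # itertext() yields element text and tail text in document order.
    return " ".join("".join(root.itertext()).split())


# Made-up sample: &product; is declared in the internal DTD subset.
doc = """<!DOCTYPE item [<!ENTITY product "Solr">]>
<item id="42"><name>&product; &amp; XML</name> <note>entities are resolved</note></item>"""

text = xml_text(doc)
```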

-Mike

On 05/18/2011 05:27 PM, Judioo wrote:
Great document. I can see how to import the data direct from the database.
However, it seems as though I need to write XPaths in the config to extract
the fields that I wish to transform into a Solr document.

So it seems that there is no way of storing the document structure in Solr
as is?


2011/5/18 Yury Kats <yuryk...@yahoo.com>

On 5/18/2011 4:19 PM, Judioo wrote:

Any help is greatly appreciated. Pointers to documentation that addresses
my issues are even more helpful.
I think this would be a good start:

http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource
