Re: Storing, indexing and searching XML documents in Solr

Mike Sokolov Wed, 18 May 2011 14:48:31 -0700

You might want to create a field that's analyzed using HtmlStripCharFilter - this will index all the non-tag/non-attribute text in the document, and if you store the value, will store the entire XML document as well.

I've done some work on an XmlStripCharFilter, which does the same thing (only for well-formed XML) using the WSTX XML parser, which provides a little bit of extra XML goodness (like entity resolution and xinclude processing) that HtmlStripCharFilter doesn't. I could share if there's interest.
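
To make the indexed-versus-stored split concrete, here is a minimal Python sketch of the idea. It is not Solr code and not the XmlStripCharFilter implementation; the function name strip_xml_markup and the sample document are made up for illustration, and the standard-library parser stands in for a real XML-aware char filter. It only shows that the text handed to the analyzer has no tags or attribute values, while the stored value is still the full document.

```python
import xml.etree.ElementTree as ET


def strip_xml_markup(xml_text):
    """Return only the character data of a well-formed XML document.

    Hypothetical helper for illustration; a real char filter works on a
    character stream and also has to keep track of offsets.
    """
    root = ET.fromstring(xml_text)
    # itertext() yields element text and tails in document order;
    # tag names and attribute values are never yielded.
    chunks = (chunk.strip() for chunk in root.itertext())
    return " ".join(chunk for chunk in chunks if chunk)


# Made-up sample document.
raw = (
    '<doc id="42"><title>Solr &amp; XML</title>'
    "<body>Searchable <b>text</b> here</body></doc>"
)

indexed_text = strip_xml_markup(raw)  # roughly what would be tokenized
stored_value = raw                    # what a stored field would return

print(indexed_text)
print(stored_value)
```

Because the parser requires well-formed input, the &amp; entity comes back as a plain ampersand in the extracted text, and malformed markup raises an error instead of being skipped.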


-Mike

On 05/18/2011 05:27 PM, Judioo wrote:

Great document. I can see how to import the data directly from the database.
However, it seems as though I need to write XPath expressions in the config to
extract the fields that I wish to transform into a Solr document.

So it seems that there is no way of storing the document structure in Solr
as-is?


2011/5/18 Yury Kats <yuryk...@yahoo.com>

On 5/18/2011 4:19 PM, Judioo wrote:

Any help is greatly appreciated. Pointers to documentation that address
my issues are even more helpful.

I think this would be a good start:

http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource
