Thank you! That's even more than I wanted to know. ;)
Georg
On Tue, Mar 2, 2010 at 10:05 PM, Walter Underwood wrote:
> You are in luck, because Avi Rappoport has just written a tutorial about
> how to do this. It is available from Lucid Imagination:
>
>
> http://www.lucidimagination.com/solutions/wh
You are in luck, because Avi Rappoport has just written a tutorial about how to
do this. It is available from Lucid Imagination:
http://www.lucidimagination.com/solutions/whitepapers/Indexing-Text-and-HTML-Files-with-Solr
I've just started reviewing it, but knowing Avi, I expect it to be very he
There is an HTML filter documented here, which might be of some help -
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory
Control characters can be eliminated using code like this -
http://bitbucket.org/cogtree/python-solr/src/tip/pythonsolr/pysolr.py#cl-44
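For anyone who wants the idea without following the link, here is a minimal sketch of that kind of cleanup in Python. It is not the pysolr code itself. It assumes the goal is to drop the C0 control characters that XML 1.0 does not allow (everything below 0x20 except tab, line feed and carriage return) before the text is sent to Solr; the function name strip_control_chars and the replacement parameter are made up for illustration.

```python
import re

# Assumption: only the C0 control characters that XML 1.0 forbids are
# targeted, i.e. 0x00-0x1F minus tab (0x09), LF (0x0A) and CR (0x0D).
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_control_chars(text, replacement=""):
    """Return text with the disallowed control characters replaced.

    Hypothetical helper for illustration; by default the characters
    are simply removed.
    """
    return _CONTROL_CHARS.sub(replacement, text)


if __name__ == "__main__":
    dirty = "title\x00 with\x1f junk\tand a tab\n"
    print(repr(strip_control_chars(dirty)))
```

Passing replacement=" " instead keeps words from running together when a control character was the only thing separating them.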