Thanks! Does Solr include an HTMLTokenFilterFactory?
Paul

On 2/22/08, Reece <[EMAIL PROTECTED]> wrote:
> I did this as well, but found problems when searching (tags in between
> words caused searching nightmares). I recommend stripping out all the
> tags using the HTMLTokenFilterFactory or your own regex when indexing,
> and storing the actual HTML in an actual database.
>
> If you really want to store the HTML though, you can use CDATA in the
> XML like this:
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <add>
>   <doc>
>     <field name="id">123</field>
>     <field name="title"><![CDATA[yourbightmlstring]]></field>
>   </doc>
> </add>
>
> The CDATA section basically says that anything between its delimiters
> will be taken as the field value. It only breaks if your HTML string
> has a "]]>" in it, which ends the CDATA section early.
>
> -Reece
>
> On Fri, Feb 22, 2008 at 12:19 PM, Paul deGrandis
> <[EMAIL PROTECTED]> wrote:
> > Hi all,
> >
> > I'm working on a Solr app that pulls HTML from an embedded JavaScript
> > WYSIWYG editor, and I need to index on the content, but store and
> > reproduce the HTML. The problem I have is when I try to add and
> > commit, the HTML gets interpreted as XML. Is the way to do this
> > properly to create an HTMLTokenFilterFactory? And if so, is there a
> > collection of plugins (like filters and such) that someone can point
> > me to?
> >
> > Regards,
> > Paul
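
For anyone finding this thread later: the two suggestions above (strip the tags for the indexed text, wrap the raw HTML in CDATA for storage) can be sketched on the client side roughly as follows. This is only a minimal sketch using the Python standard library, not anything Solr ships. The field names `body_html` and `body_text` and the helper names are made up for illustration and would have to match your own schema. The "]]>" problem is handled with the usual trick of closing the CDATA section and reopening it in the middle of the offending sequence.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment, dropping the tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    """Return the visible text of `html` with whitespace collapsed.

    Text nodes are joined with a space so that adjacent block elements
    do not run their words together.
    """
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return " ".join(" ".join(parser.parts).split())


def cdata(text):
    """Wrap `text` in a CDATA section, splitting any embedded "]]>"."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def build_add_doc(doc_id, html):
    """Build an <add> message with the raw HTML and a tag-free copy.

    `body_html` and `body_text` are hypothetical field names.
    """
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<add>\n"
        "  <doc>\n"
        f'    <field name="id">{escape(str(doc_id))}</field>\n'
        f'    <field name="body_html">{cdata(html)}</field>\n'
        f'    <field name="body_text">{escape(strip_tags(html))}</field>\n'
        "  </doc>\n"
        "</add>\n"
    )


if __name__ == "__main__":
    print(build_add_doc(123, "<p>Hello <b>world</b></p>"))
```

Joining text nodes with a space is a judgment call: it keeps `<p>foo</p><p>bar</p>` from becoming "foobar", at the cost of splitting a word that has inline markup in the middle of it.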