Re: Indexing content, storing html

Paul deGrandis Fri, 22 Feb 2008 11:10:44 -0800

Thanks!

Does Solr include an HTMLTokenFilterFactory?


Paul

On 2/22/08, Reece <[EMAIL PROTECTED]> wrote:
> I did this as well, but found problems when searching (tags in between
>  words caused searching nightmares).  I recommend stripping out all the
>  tags using the HTMLTokenFilterFactory or your own regex when indexing,
>  and storing the actual HTML in an actual database.
>
>  If you really want to store the HTML though, you can wrap it in a CDATA
>  section in the XML like this:
>
>  <?xml version="1.0" encoding="UTF-8" ?>
>  <add>
>      <doc>
>          <field name="id">123</field>
>          <field name="title"><![CDATA[yourbightmlstring]]></field>
>      </doc>
>  </add>
>
>  The CDATA section basically says that anything between its delimiters
>  is taken literally as the field value.  It only breaks if your HTML
>  string has a "]]>" in it, which ends the CDATA section early.
>
>
>  -Reece
>
>
>
>
>  On Fri, Feb 22, 2008 at 12:19 PM, Paul deGrandis
>  <[EMAIL PROTECTED]> wrote:
>  > Hi all,
>  >
>  >  I'm working on a Solr app that pulls HTML from an embedded JavaScript
>  >  WYSIWYG editor, and I need to index the content, but store and
>  >  reproduce the HTML.  The problem I have is when I try to add and
>  >  commit, the HTML gets interpreted as XML.  Is the way to do this
>  >  properly to create an HTMLTokenFilterFactory?  And if so, is there a
>  >  collection of plugins (like filters and such) that someone can point
>  >  me to?
>  >
>  >  Regards,
>  >  Paul
>  >
>
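
For the "strip out all the tags when indexing" suggestion quoted above, here
is a minimal client-side sketch in Python. It is an assumption on my part,
not something from Solr: it uses the standard library's html.parser rather
than a regex, and the names TextExtractor and strip_tags are made up for
illustration. The idea is to send the stripped text to the indexed field and
keep the original HTML elsewhere.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    """Return the text content of `html` with tags removed.

    Text nodes are joined with a space so that words in adjacent block
    elements do not run together, then whitespace is collapsed.
    """
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


if __name__ == "__main__":
    print(strip_tags("<p>Hello <b>big</b> world</p>"))
```

One trade-off to be aware of: because text nodes are joined with a space, a
tag in the middle of a word (for example "wor<b>d</b>") will split that word.
The contents of script and style elements are not filtered out by this sketch.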

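On the "]]>" caveat in the CDATA approach quoted above: a common workaround
is to split the sequence across two CDATA sections, so the parser never sees
a premature terminator. The sketch below shows one way to build the add
document in Python. The function names cdata and solr_add_doc are
hypothetical, and the field names are simply copied from the example in the
thread.

```python
from xml.sax.saxutils import escape


def cdata(text):
    """Wrap `text` in a CDATA section.

    Any "]]>" inside the text is split as "]]" + ">" across two adjacent
    CDATA sections, which an XML parser reads back as the original string.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def solr_add_doc(doc_id, html):
    """Build an <add> document like the one in the thread (illustrative only)."""
    return (
        '<?xml version="1.0" encoding="UTF-8" ?>\n'
        "<add><doc>"
        f'<field name="id">{escape(str(doc_id))}</field>'
        f'<field name="title">{cdata(html)}</field>'
        "</doc></add>"
    )


if __name__ == "__main__":
    print(solr_add_doc(123, "<p>tricky ]]> content</p>"))
```

An alternative that avoids CDATA entirely is to entity-escape the HTML with
xml.sax.saxutils.escape before placing it in the field. Either way the XML
parser should hand back the original HTML string as the field value.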