Re: HTMLStripCharFilterFactory not working when using SolrJ java client

aseem cheema Tue, 10 Nov 2009 13:54:33 -0800

I printed the UpdateRequest object (getXML) and the XML is:
<add><doc boost="1.0"><field name="url">http://haha.com</field><field
name="body">&lt;center&gt;content&lt;/center&gt;</field></doc></add>


I suspect the issue arises because the HTML/XML angle brackets (< and >)
are replaced by &lt; and &gt;. I understand that this escaping is required
to keep them from interfering with the Solr XML document, but how do I
accomplish what I want? I need the HTML stripped out of the body field.
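
One way to check the escaping assumption outside Solr is to run the XML above through a standard XML parser and look at the value of the body field. The sketch below uses only the JDK's built-in DOM parser. The class name XmlEscapeDemo and the helper fieldValue are made up for illustration, and it does not exercise Solr or SolrJ at all, so it only shows what a generic XML parser hands back, not what Solr's update handler actually does.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of Solr or SolrJ.
class XmlEscapeDemo {

    // Returns the text of the first <field> whose name attribute matches,
    // or null if there is no such field.
    public static String fieldValue(String xml, String fieldName) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList fields = dom.getElementsByTagName("field");
            for (int i = 0; i < fields.getLength(); i++) {
                Element field = (Element) fields.item(i);
                if (fieldName.equals(field.getAttribute("name"))) {
                    // The parser has already decoded &lt; and &gt; here.
                    return field.getTextContent();
                }
            }
            return null;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Same XML that UpdateRequest.getXML() printed above.
        String xml = "<add><doc boost=\"1.0\">"
                + "<field name=\"url\">http://haha.com</field>"
                + "<field name=\"body\">&lt;center&gt;content&lt;/center&gt;</field>"
                + "</doc></add>";
        System.out.println(fieldValue(xml, "body"));
        // prints: <center>content</center>
    }
}
```

If a generic parser returns the original markup like this, the &lt;/&gt; escaping on the wire may not be what keeps the tags in the field, but I have not confirmed that against Solr itself.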

Any help is highly appreciated.
Thanks
Aseem

On Tue, Nov 10, 2009 at 10:56 AM, aseem cheema <aseemche...@gmail.com> wrote:
> Hey Guys,
> I have the HTMLStripCharFilterFactory char filter declared in my
> schema.xml for the fieldType text (code below). I am using this field
> type for the body field of my schema. I am seeing different behavior
> when I use SolrJ to post a document (code below) and when I use
> analysis.jsp. The text I am putting in the field is
> <center>content</center>.
>
> When SolrJ is used, the field gets the whole value
> <center>content</center>, but when analysis.jsp is used, it shows only
> "content" being used for the field.
>
> What could I be doing wrong here? How do I get
> HTMLStripCharFilterFactory to work when I am pushing data using
> SolrJ? Thanks.
>
> Your help is highly appreciated.
> Thanks
> --
> Aseem
>
> ############# schema.xml ######################
>        <analyzer type="index">
>          <charFilter class="solr.HTMLStripCharFilterFactory"/>
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <filter class="solr.StopFilterFactory"
>                  ignoreCase="true"
>                  words="stopwords.txt"
>                  enablePositionIncrements="true"
>                  />
>          <filter class="solr.WordDelimiterFilterFactory"
>                  generateWordParts="1" generateNumberParts="1"
>                  catenateWords="1" catenateNumbers="1" catenateAll="0"
>                  splitOnCaseChange="1"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
>          <filter class="solr.SynonymFilterFactory"
>                  synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>          <filter class="solr.EnglishPorterFilterFactory"
>                  protected="protwords.txt"/>
>          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>        </analyzer>
>
> ################## SolrJ Code ######################
>      CommonsHttpSolrServer server = new CommonsHttpSolrServer(
>          "http://aseem.desktop.amazon.com:8983/solr/sharepoint");
>      SolrInputDocument doc = new SolrInputDocument();
>      UpdateRequest req = new UpdateRequest();
>      doc.addField("url", "http://haha.com");
>      // doc.addField("body", sbr.toString());
>      doc.addField("body", "<center>content</center>");
>      req.add(doc);
>      req.setAction(ACTION.COMMIT, false, false);
>      UpdateResponse resp = req.process(server);
>      System.out.println(resp);
>



-- 
Aseem
