I printed the UpdateRequest object (getXML) and the XML is:

<add><doc boost="1.0"><field name="url">http://haha.com</field><field name="body">&lt;center&gt;content&lt;/center&gt;</field></doc></add>

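To illustrate the escaping in isolation, here is a minimal sketch that uses only the JDK's XML parser, not SolrJ or Solr. The class EscapeRoundTrip and its helpers escapeXml and readField are hypothetical names made up for this illustration. It builds the same <add> document with the body value escaped, then parses it back to show that an XML parser decodes &lt; and &gt; to the original < and > when it reads the field text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class EscapeRoundTrip {

    // Hypothetical helper: escape the characters that are special in XML element text.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Hypothetical helper: parse an <add><doc> payload with the JDK parser
    // and return the text of the <field> whose name attribute matches.
    static String readField(String xml, String name) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList fields = dom.getElementsByTagName("field");
            for (int i = 0; i < fields.getLength(); i++) {
                Element field = (Element) fields.item(i);
                if (name.equals(field.getAttribute("name"))) {
                    return field.getTextContent();
                }
            }
            return null; // no field with that name
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String body = "<center>content</center>";
        String xml = "<add><doc boost=\"1.0\">"
                + "<field name=\"url\">" + escapeXml("http://haha.com") + "</field>"
                + "<field name=\"body\">" + escapeXml(body) + "</field>"
                + "</doc></add>";

        // What goes over the wire: the markup in the body value is escaped.
        System.out.println(xml);

        // What a parser reads back from the body field: <center>content</center>
        System.out.println(readField(xml, "body"));
    }
}
```

Whether Solr's update handler passes this decoded value to the analyzer is not something the sketch shows; it only demonstrates standard XML entity handling.
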
I can see that the issue is because the HTML/XML < and > characters are replaced by &lt; and &gt;. I understand that this is required to keep them from interfering with the Solr XML document, but how do I accomplish what I want? I need to get the HTML in the body field stripped out. Any help is highly appreciated.

Thanks
Aseem

On Tue, Nov 10, 2009 at 10:56 AM, aseem cheema <aseemche...@gmail.com> wrote:
> Hey Guys,
> I have HTMLStripCharFilterFactory char filter declared in my
> schema.xml for fieldType text (code below). I am using this field type
> for body field of my schema. I am seeing different behavior when I use
> SolrJ to post a document (code below) and when I use the analysis.jsp.
> The text I am putting in the field is <center>content</center>.
>
> When SolrJ is used, the field gets the whole value
> <center>content</center>, but when analysis.jsp is used, it shows only
> "content" being used for the field.
>
> What am I possibly doing wrong here? How do I get
> HTMLStripCharFilterFactory to work, even if I am pushing data using
> SolrJ. Thanks.
>
> Your help is highly appreciated.
> Thanks
> --
> Aseem
>
> ############# schema.xml ######################
> <analyzer type="index">
>   <charFilter class="solr.HTMLStripCharFilterFactory"/>
>   <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>   <filter class="solr.StopFilterFactory"
>           ignoreCase="true"
>           words="stopwords.txt"
>           enablePositionIncrements="true"
>   />
>   <filter class="solr.WordDelimiterFilterFactory"
>           generateWordParts="1" generateNumberParts="1" catenateWords="1"
>           catenateNumbers="1" catenateAll="0"
>           splitOnCaseChange="1"/>
>   <filter class="solr.LowerCaseFilterFactory"/>
>   <filter class="solr.SynonymFilterFactory"
>           synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>   <filter class="solr.EnglishPorterFilterFactory"
>           protected="protwords.txt"/>
>   <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> </analyzer>
>
> ################## SolrJ Code ######################
> CommonsHttpSolrServer server = new
> CommonsHttpSolrServer("http://aseem.desktop.amazon.com:8983/solr/sharepoint");
> SolrInputDocument doc = new SolrInputDocument();
> UpdateRequest req = new UpdateRequest();
> doc.addField("url", "http://haha.com");
> /*doc.addField("body", sbr.toString());*/
> doc.addField("body", "<center>content</center>");
> req.add(doc);
> req.setAction(ACTION.COMMIT, false, false);
> UpdateResponse resp = req.process(server);
> System.out.println(resp);
>

--
Aseem