Thanks, it worked!

> From: j...@basetechnology.com
> To: solr-user@lucene.apache.org
> Subject: Re: Strip HTML Tags and Store
> Date: Thu, 30 May 2013 22:53:37 -0400
> 
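> Update Request Processors to the rescue again, namely the HTML Strip Field
> Update Processor. Analyzers (including HTMLStripCharFilterFactory) only
> change the indexed terms; the stored value is always the original input.
> An update processor modifies the value before it is indexed and stored, so
> both see the stripped text.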
> 
> Add this to your solrconfig.xml:
> 
>   <updateRequestProcessorChain name="html-strip-features">
>     <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
>       <str name="fieldName">features</str>
>     </processor>
>     <processor class="solr.LogUpdateProcessorFactory" />
>     <processor class="solr.RunUpdateProcessorFactory" />
>   </updateRequestProcessorChain>
> 
> See:
> http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/update/processor/HTMLStripFieldUpdateProcessorFactory.html
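> 
> If you would rather not name the chain on every request, one option (a
> sketch, assuming the stock /update handler from the example solrconfig.xml)
> is to make it a default parameter of that handler:
> 
> ```xml
> <requestHandler name="/update" class="solr.UpdateRequestHandler">
>   <lst name="defaults">
>     <!-- apply the HTML-strip chain to every update sent to /update -->
>     <str name="update.chain">html-strip-features</str>
>   </lst>
> </requestHandler>
> ```
> 
> An update.chain parameter passed explicitly on a request still overrides
> this default. As I understand it, marking the chain itself with
> default="true" on the <updateRequestProcessorChain> element has a similar
> effect.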
> 
> Index content:
> 
>   curl "http://localhost:8983/solr/update?commit=true&update.chain=html-strip-features" \
>   -H 'Content-type:application/json' -d '
>   [{"id": "doc-1",
>     "title": "&lt;Hello World&gt;",
>     "features": "<p>This is a <a>test</a> line &gt;.",
>     "other_t": "<p>Other <b>text</b></p>",
>     "more_t": "Some <b>more <i>text</i>.</b> The end"}]'
> 
> Results:
> 
>   "id":"doc-1",
>   "title":["&lt;Hello World&gt;"],
>   "features":["\nThis is a test line >."],
>   "other_t":"<p>Other <b>text</b></p>",
>   "more_t":"Some <b>more <i>text</i>.</b> The end",
> 
> That stripped the HTML only from the "features" field, and also decoded the
> named character entity (&gt; became >).
> 
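> To strip more than one field, add multiple <str name="fieldName"> entries,
> or use "fieldRegex" to select fields by a name pattern; other selectors,
> such as "typeName" and "typeClass", are also available. See: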
> http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html
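> 
> For example (a sketch: "articleBody" is the field from the original
> question, and ".*_html" is a hypothetical naming pattern), selectors can be
> combined in one processor; as I read the FieldMutatingUpdateProcessorFactory
> docs, a field matching any of them is stripped:
> 
> ```xml
> <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
>   <!-- explicit field names: repeat fieldName once per field -->
>   <str name="fieldName">features</str>
>   <str name="fieldName">articleBody</str>
>   <!-- plus any field whose name matches this Java regex (hypothetical) -->
>   <str name="fieldRegex">.*_html</str>
> </processor>
> ```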
> 
> -- Jack Krupansky
> 
> -----Original Message----- 
> From: Kalyan Kuram
> Sent: Thursday, May 30, 2013 8:18 PM
> To: solr-user@lucene.apache.org
> Subject: Strip HTML Tags and Store
> 
> Hi All,
> 
> I am trying to understand what gets stored when I configure a field as
> indexed and stored. For example, I have this in my schema.xml:
> 
>   <field name="articleBody" type="text_general" indexed="true" stored="true" />
> 
> and
> 
>   <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
>     <analyzer type="index">
>       <tokenizer class="solr.StandardTokenizerFactory"/>
>       <charFilter class="solr.HTMLStripCharFilterFactory"/>
>       <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
>       <filter class="solr.LowerCaseFilterFactory"/>
>     </analyzer>
>     <analyzer type="query">
>       <tokenizer class="solr.StandardTokenizerFactory"/>
>       <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
>       <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>     </analyzer>
>   </fieldType>
> 
> I was expecting Solr to index and store the HTML-stripped content, but when I
> run a query I get something like this:
> 
> <str name="articleBody"><xhtml:h1><xhtml:b>South African Miners Are Trapped by 
> Debt</xhtml:b></xhtml:h1> <xhtml:p><xhtml:b>▸ A surge in high-interest 
> lending contributes to mine violence</xhtml:b></xhtml:p> <xhtml:p><xhtml:b>▸ 
> At least one bank “may have reckless lending problems”</xhtml:b></xhtml:p> 
> <xhtml:p>In 2008, platinum miner James Ntseane borrowed 8,000 rand ($886) 
> from <xhtml:b>African Bank Investments</xhtml:b> to pay for his 
> grandmother's funeral. Soon after, he took out two more loans, totaling 
> 10,000 rand, for a sofa and house extension. Four years later he owes at 
> least 30,515 rand, according to text messages he gets from African Bank, 
> South Africa's biggest provider of unsecured loans. Under a court-ordered 
> payment plan, his employer garnishes about 13 percent of his monthly 
> 12,600-rand salary for the lender. He doesn't know how much interest he's 
> paying. “They are taking too much money,” says Ntseane, 41.</xhtml:p> 
> <xhtml:p>Ntseane is one of more than 9 million South Africans mired in debt. 
> African Bank, <xhtml:b>Bayport Financial Services, Capitec Bank 
> Holdings</xhtml:b>, and other firms have led a boom in unsecured lending, 
> charging interest as high as 80 percent a year, as is allowed there. Last 
> year a series of strikes led to at least 46 deaths, the country's worst 
> mining violence since the end of apartheid. “One of the contributing factors 
> to all of these strikes has been this surge in unsecured lending,” says Mike 
> Schussler, chief economist at the research group <a 
> href="http://economists.co.za/">Economists.co.za</a>, echoing an October 
> statement by Trade and Industry Minister Rob Davies.</xhtml:p> <xhtml:p>The 
> value of consumer loans not backed by assets such as homes rose 39 percent 
> in the year through September, to 140 billion rand, reports the National 
> Credit Regulator. The loans made up 10 percent of consumer credit on Sept. 
> 30, up from 8 percent a year earlier. In November, South Africa's National 
> Treasury and the Banking Association of South Africa agreed to review 
> lending affordability rules, improve client education, and reduce wage 
> garnishing after the number of people with bad credit rose to a record. 
> Finance Minister Pravin Gordhan called the rise “worrying” a week 
> earlier.</xhtml:p> <xhtml:p>George Roussos, an executive for central support 
> services at African Bank, says miner Ntseane borrowed more than he claims 
> and took out a credit card. (The bank received permission from Ntseane, who 
> denies the bank's figures, to discuss his account with <xhtml:i>Bloomberg 
> Businessweek</xhtml:i>.) The bank says it stopped charging interest in 2011 
> and has no record of Ntseane making contact after he was injured in a home 
> robbery in 2010. “The bank attempts to communicate clearly and 
> transparently, employing multilingual consultants,” says Roussos.</xhtml:p> 
> <xhtml:p>South African lenders have resorted to court-ordered wage 
> garnishing in more than 3 million active cases, according to the National 
> Debt Mediation Association, a credit industry group that provides consumer 
> debt counseling. Kem Westdyk, chief executive of <xhtml:b>Summit Garnishee 
> Solutions</xhtml:b>, which helps mining companies review bank requests, says 
> at some companies up to 15 percent of workers have wages garnished; at one, 
> more than a quarter of those cases involve African Bank. “They may have 
> reckless lending problems,” says Westdyk, adding that some workers have five 
> or six garnishee orders against them.</xhtml:p> <xhtml:p>Ntseane says his 
> loan agent didn't mention garnishment when she agreed to delay his loan 
> payments. Although Davies and the country's credit regulator have pledged to 
> clamp down on unsecured lending, Ntseane doesn't have high hopes. “I don't 
> know when I will stop paying,” he says.</xhtml:p> <xhtml:p 
> prism:class="byline"><xhtml:i>—Franz Wild, Mike Cohen, and Renee 
> Bonorchis</xhtml:i></xhtml:p> <xhtml:p><xhtml:i><xhtml:b>The bottom 
> line</xhtml:b> South Africa's unsecured loans jumped 39 percent in a year, 
> and millions of workers are stuck in a vicious cycle of 
> debt.</xhtml:i></xhtml:p></str>
> Can somebody suggest how to make the HTML tags that appear in the
> articleBody field disappear?
> Kalyan
> 