Update Request Processors to the rescue again. Namely, the HTML Strip Field
Update processor.
Add this to your solrconfig.xml:
<updateRequestProcessorChain name="html-strip-features">
<processor class="solr.HTMLStripFieldUpdateProcessorFactory">
<str name="fieldName">features</str>
</processor>
<processor class="solr.LogUpdateProcessorFactory" />
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
See:
http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/update/processor/HTMLStripFieldUpdateProcessorFactory.html
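For the articleBody field in the question quoted below, the same idea might look like the following sketch. The reason it helps: an analyzer (including HTMLStripCharFilterFactory) only affects the indexed terms, while the stored value is kept verbatim, so the stripping has to happen before the document reaches the index. The chain name "html-strip-articlebody" is made up for this example, and I have not tried it against xhtml:-prefixed tags like the ones in the sample document.

```xml
<!-- Sketch only: the chain name "html-strip-articlebody" is hypothetical -->
<updateRequestProcessorChain name="html-strip-articlebody">
  <!-- Strip HTML markup from the value before it is indexed and stored -->
  <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
    <str name="fieldName">articleBody</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <!-- RunUpdateProcessorFactory goes last; it performs the actual update -->
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

Documents then need to be sent with update.chain=html-strip-articlebody, and existing documents have to be reindexed for the stored values to change.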
Index content:
curl "http://localhost:8983/solr/update?commit=true&update.chain=html-strip-features" \
  -H 'Content-type:application/json' -d '
[{"id": "doc-1",
"title": "<Hello World>",
"features": "<p>This is a <a>test</a> line &gt;.",
"other_t": "<p>Other <b>text</b></p>",
"more_t": "Some <b>more <i>text</i>.</b> The end"}]'
Results:
"id":"doc-1",
"title":["<Hello World>"],
"features":["\nThis is a test line >."],
"other_t":"<p>Other <b>text</b></p>",
"more_t":"Some <b>more <i>text</i>.</b> The end",
That stripped the HTML only from the "features" field, and expanded the
named character entity as well.
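The curl call above picks the chain per request with the update.chain parameter. If you would rather not pass it every time, one option is to set it as a default on the update handler. This is a sketch that assumes the stock /update handler definition from the Solr 4.x example solrconfig.xml; adjust it to whatever your handler actually looks like.

```xml
<!-- Sketch: assumes the standard /update handler from the 4.x example config -->
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <!-- Use this chain whenever the request does not name one -->
    <str name="update.chain">html-strip-features</str>
  </lst>
</requestHandler>
```

A request can still override the default by passing its own update.chain value.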
Add multiple <str name="fieldName"> elements for multiple fields, or use
"fieldRegex", "typeName", "typeClass", or an "exclude" list. See:
http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html
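As a rough illustration of those selectors, here are two alternative processor definitions, each meant to replace the HTMLStrip processor inside the chain. The field names come from the sample document above; the regex is just an example pattern. As I read the Javadoc, a field matching any one of several "fieldName" entries is selected, but if you mix different kinds of criteria in one processor a field has to match one of each kind, so they are kept separate here.

```xml
<!-- Option 1: several explicit field names (matching any one selects the field) -->
<processor class="solr.HTMLStripFieldUpdateProcessorFactory">
  <str name="fieldName">features</str>
  <str name="fieldName">other_t</str>
</processor>

<!-- Option 2: a regex over field names; ".*_t" would cover other_t and more_t -->
<processor class="solr.HTMLStripFieldUpdateProcessorFactory">
  <str name="fieldRegex">.*_t</str>
</processor>
```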
-- Jack Krupansky
-----Original Message-----
From: Kalyan Kuram
Sent: Thursday, May 30, 2013 8:18 PM
To: solr-user@lucene.apache.org
Subject: Strip HTML Tags and Store
Hi All,
I am trying to understand what gets stored when I configure a field as
indexed and stored. For example, I have this in my schema.xml:
<field name="articleBody" type="text_general" indexed="true" stored="true" />
and
<fieldType name="text_general" class="solr.TextField"
positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<charFilter class="solr.HTMLStripCharFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="true" />
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="true" />
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
I was expecting that Solr would index and store the HTML-stripped content,
but when I run a query I get something like this:
<str
name="articleBody"><xhtml:h1><xhtml:b>South African Miners Are Trapped by
Debt</xhtml:b></xhtml:h1> <xhtml:p><xhtml:b>▸ A surge in high-interest
lending contributes to mine violence</xhtml:b></xhtml:p> <xhtml:p><xhtml:b>▸
At least one bank “may have reckless lending problems”</xhtml:b></xhtml:p>
<xhtml:p>In 2008, platinum miner James Ntseane borrowed 8,000 rand ($886)
from <xhtml:b>African Bank Investments</xhtml:b> to pay for his
grandmother's funeral. Soon after, he took out two more loans, totaling
10,000 rand, for a sofa and house extension. Four years later he owes at
least 30,515 rand, according to text messages he gets from African Bank,
South Africa's biggest provider of unsecured loans. Under a court-ordered
payment plan, his employer garnishes about 13 percent of his monthly
12,600-rand salary for the lender. He doesn't know how much interest he's
paying. “They are taking too much money,” says Ntseane, 41.</xhtml:p>
<xhtml:p>Ntseane is one of more than 9 million South Africans mired in debt.
African Bank, <xhtml:b>Bayport Financial Services, Capitec Bank
Holdings</xhtml:b>, and other firms have led a boom in unsecured lending,
charging interest as high as 80 percent a year, as is allowed there. Last
year a series of strikes led to at least 46 deaths, the country's worst
mining violence since the end of apartheid. “One of the contributing factors
to all of these strikes has been this surge in unsecured lending,” says Mike
Schussler, chief economist at the research group <a
href="http://economists.co.za/">Economists.co.za</a>, echoing an October
statement by Trade and Industry Minister Rob Davies.</xhtml:p> <xhtml:p>The
value of consumer loans not backed by assets such as homes rose 39 percent
in the year through September, to 140 billion rand, reports the National
Credit Regulator. The loans made up 10 percent of consumer credit on Sept.
30, up from 8 percent a year earlier. In November, South Africa's National
Treasury and the Banking Association of South Africa agreed to review
lending affordability rules, improve client education, and reduce wage
garnishing after the number of people with bad credit rose to a record.
Finance Minister Pravin Gordhan called the rise “worrying” a week
earlier.</xhtml:p> <xhtml:p>George Roussos, an executive for central support
services at African Bank, says miner Ntseane borrowed more than he claims
and took out a credit card. (The bank received permission from Ntseane, who
denies the bank's figures, to discuss his account with <xhtml:i>Bloomberg
Businessweek</xhtml:i>.) The bank says it stopped charging interest in 2011
and has no record of Ntseane making contact after he was injured in a home
robbery in 2010. “The bank attempts to communicate clearly and
transparently, employing multilingual consultants,” says Roussos.</xhtml:p>
<xhtml:p>South African lenders have resorted to court-ordered wage
garnishing in more than 3 million active cases, according to the National
Debt Mediation Association, a credit industry group that provides consumer
debt counseling. Kem Westdyk, chief executive of <xhtml:b>Summit Garnishee
Solutions</xhtml:b>, which helps mining companies review bank requests, says
at some companies up to 15 percent of workers have wages garnished; at one,
more than a quarter of those cases involve African Bank. “They may have
reckless lending problems,” says Westdyk, adding that some workers have five
or six garnishee orders against them.</xhtml:p> <xhtml:p>Ntseane says his
loan agent didn't mention garnishment when she agreed to delay his loan
payments. Although Davies and the country's credit regulator have pledged to
clamp down on unsecured lending, Ntseane doesn't have high hopes. “I don't
know when I will stop paying,” he says.</xhtml:p> <xhtml:p
prism:class="byline"><xhtml:i>—Franz Wild, Mike Cohen, and Renee
Bonorchis</xhtml:i></xhtml:p> <xhtml:p><xhtml:i><xhtml:b>The bottom
line</xhtml:b> South Africa's unsecured loans jumped 39 percent in a year,
and millions of workers are stuck in a vicious cycle of
debt.</xhtml:i></xhtml:p></str>
Can somebody suggest how to make the HTML tags that appear in the
articleBody field disappear?
Kalyan