Hi Chris,

Thanks for your answer. One thing to add: after checking my log, it seems the problem only concerns some HTML entities. There is no problem with &amp;, but I do have a problem with &uuml;, &ldquo;, etc.

I will go through your answer to find a solution.

Thanks!

On 29/07/2016 at 23:58, Chris Hostetter wrote:
: I have several xml files that contains html entities in some fields.
	...
: If I set my field like this:
:
: <field name="au">Brown &amp; Gammon</field>
:
: Solr generates error "Undeclared general entity"

...because that's not valid XML...

: if I add CDATA like this:
:
: <field name="au"><![CDATA[Brown &amp; Gammon]]></field>
:
: it seems that I can't search with the &

...because that is valid xml, and tells solr you want the literal string "Brown &amp; Gammon" to be indexed -- given a typical analyzer you are probably getting either "&" or "amp" as a term in your index.

: Could you help me to find the right syntax ?

the client code you are using for indexing can either "parse" these HTML snippets using an HTML parser, and then send solr the *real* string you want to index, or you can configure solr with something like HTMLStripFieldUpdateProcessorFactory (if you want both the indexed form and the stored form to be plain text) or HTMLStripCharFilterFactory (if you want to preserve the html markup in the stored value, but strip it as part of the analysis chain for indexing).

http://lucene.apache.org/solr/6_1_0/solr-core/org/apache/solr/update/processor/HTMLStripFieldUpdateProcessorFactory.html
http://lucene.apache.org/core/6_1_0/analyzers-common/org/apache/lucene/analysis/charfilter/HTMLStripCharFilterFactory.html

-Hoss
http://www.lucidworks.com/
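
For reference, the first option mentioned above (decoding the HTML entities in the client before sending the document to Solr) might look roughly like the sketch below. It assumes a Python indexing client, which the thread does not specify, and the helper name build_solr_field is hypothetical. It relies only on the standard library: html.unescape converts entities such as &uuml; and &ldquo; into the real characters, and ElementTree re-escapes the few characters XML itself reserves.

```python
import html
import xml.etree.ElementTree as ET


def build_solr_field(name: str, raw_value: str) -> str:
    """Decode HTML entities in raw_value and return a well-formed <field> element.

    Hypothetical helper for illustration; not part of Solr or any Solr client.
    """
    # Turn named and numeric HTML entities into the characters they stand for,
    # e.g. "&uuml;" becomes "ü" and "&amp;" becomes "&".
    text = html.unescape(raw_value)

    field = ET.Element("field", {"name": name})
    # ElementTree escapes &, < and > on serialization, so the result only
    # ever contains entities that XML predefines.
    field.text = text
    return ET.tostring(field, encoding="unicode")


if __name__ == "__main__":
    print(build_solr_field("au", "Brown &amp; Gammon"))
    print(build_solr_field("au", "M&uuml;ller &ldquo;quoted&rdquo;"))
```

With this approach the XML sent to Solr never contains an HTML-only entity, so the "Undeclared general entity" error should not occur, and both the indexed and stored values hold the decoded text. If the stored value has to keep the original markup, the HTMLStripCharFilterFactory route described above is the better fit.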