Hi Chris,

Thanks for your answer. I'd like to add one small detail:

After checking my log, it seems that only some HTML entities are affected.
There is no problem with &amp;, but there is one with:

&uuml;
&ldquo;
etc.
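For context, here is a sketch of why these behave differently. XML itself predefines only five entities: `&amp;`, `&lt;`, `&gt;`, `&quot;` and `&apos;`. Named HTML entities such as `&uuml;` or `&ldquo;` are undeclared in plain XML unless a DTD declares them, so an XML parser should reject them. Numeric character references and literal characters are always accepted. The values `Müller` and `"quoted"` below are hypothetical illustrations.

```xml
<!-- Accepted: &amp; is one of XML's five predefined entities
     (&amp; &lt; &gt; &quot; &apos;) -->
<field name="au">Brown &amp; Gammon</field>

<!-- Rejected: &uuml;, &ldquo; and &rdquo; are HTML entities, not XML ones,
     so the parser reports them as undeclared -->
<field name="au">M&uuml;ller &ldquo;quoted&rdquo;</field>

<!-- Accepted: numeric character references need no declaration
     (&#252; is ü, &#8220; and &#8221; are the curly double quotes) -->
<field name="au">M&#252;ller &#8220;quoted&#8221;</field>

<!-- Accepted: the literal characters, provided the file really is saved in
     the encoding its XML declaration states (UTF-8 by default) -->
<field name="au">Müller “quoted”</field>
```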

I will go through your answer to find a solution.

Thanks!

On 29/07/2016 at 23:58, Chris Hostetter wrote:
: I have several XML files that contain HTML entities in some fields.

        ...

: If I set my field like this:
:
: <field name="au">Brown &amp; Gammon</field>
:
: Solr generates error "Undeclared general entity"

...because that's not valid XML...

: if I add CDATA like this:
:
: <field name="au"><![CDATA[Brown &amp; Gammon]]></field>
:
: it seems that I can't search with the &

...because that is valid XML, and tells Solr you want the literal string
"Brown &amp; Gammon" to be indexed -- given a typical analyzer you are
probably getting either "&amp;" or "amp" as a term in your index.

: Could you help me find the right syntax?

Either the client code you are using for indexing can "parse" these HTML
snippets with an HTML parser and then send Solr the *real* string you
want to index, or you can configure Solr with something like
HTMLStripFieldUpdateProcessorFactory (if you want both the indexed form
and the stored form to be plain text) or HTMLStripCharFilterFactory (if
you want to preserve the HTML markup in the stored value, but strip it as
part of the analysis chain for indexing).
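Below is a minimal sketch of the two Solr-side options, using the `au` field from the question. The chain name `strip-html` and the type name `text_html` are hypothetical. A custom update chain only applies to requests that select it (for example with the `update.chain=strip-html` request parameter) unless it is marked as the default chain. The two options are alternatives, so pick one.

```xml
<!-- Option A (solrconfig.xml): strip markup and decode entities before the
     document is indexed, so both the stored and indexed values are plain text -->
<updateRequestProcessorChain name="strip-html">
  <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
    <!-- this processor matches no fields by default, so name them explicitly -->
    <str name="fieldName">au</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- Option B (schema): keep the markup in the stored value, but strip it
     (and decode entities) during analysis only -->
<fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <!-- splits on whitespace only, so a lone "&" survives as its own token -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="au" type="text_html" indexed="true" stored="true"/>
```

Either way, the update message itself must still be well-formed XML. One way to achieve that is to wrap the raw HTML in CDATA, as in the question. Solr then does the entity decoding.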


http://lucene.apache.org/solr/6_1_0/solr-core/org/apache/solr/update/processor/HTMLStripFieldUpdateProcessorFactory.html
http://lucene.apache.org/core/6_1_0/analyzers-common/org/apache/lucene/analysis/charfilter/HTMLStripCharFilterFactory.html


-Hoss
http://www.lucidworks.com/
