: I wrote an XSLT transformation which returns:
:
: ---------------------------------------
: si les ruches d’abeilles prouvent la
: monarchie, les fourmillières, les troupes d’éléphants ou
: de castors prouvent la république.
: ---------------------------------------
:
: I put this HTML in Solr:
:
: $doc->addField('body_strip_html', $body_norm);
	...
: But this doesn't work!
: I want this XML file (see the example) to be returned if I search for "castor".

I'm confused.

a) You said you've already transformed your input XML into plain text, so I don't see why you need HTML stripping at all.

b) Your current problem doesn't seem to have anything to do with HTML or XML. You're asking why a document containing "castors" (plural) doesn't match a query for "castor" (singular), but the field type you say you are using has a very simple analyzer that doesn't do stemming of any kind...

>> <analyzer>
>>   <charFilter class="solr.HTMLStripCharFilterFactory"/>
>>   <tokenizer class="solr.StandardTokenizerFactory"/>
>> </analyzer>

...since there is no HTML in your input, HTMLStripCharFilterFactory is a no-op. That leaves StandardTokenizerFactory, which just does tokenization.

It seems like all you need to do is add a stemmer (and, for efficiency, remove the HTMLStripCharFilterFactory). I'm no expert, but it looks like you are indexing French, so I would suggest using a French stemmer...

https://wiki.apache.org/solr/LanguageAnalysis#French

-Hoss
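
For illustration, a field type along the lines suggested above might look roughly like the sketch below. The name "text_fr" and the choice of FrenchLightStemFilterFactory are assumptions, not something taken from the original schema; the Snowball French stemmer described on the wiki page would be another option. The body_strip_html field from the question would need to be declared with this type, and the documents reindexed, before a query for "castor" could be expected to match "castors".

```xml
<!-- Hypothetical field type: the name "text_fr" is an assumption for this sketch. -->
<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
  <!-- A single analyzer is applied at both index time and query time. -->
  <analyzer>
    <!-- No HTMLStripCharFilterFactory: the input is already plain text. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Lowercase tokens so matching is case-insensitive. -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Light French stemmer, intended to reduce plurals such as "castors" to a common stem. -->
    <filter class="solr.FrenchLightStemFilterFactory"/>
  </analyzer>
</fieldType>
```

The Analysis screen in the Solr admin UI can be used to check what tokens "castors" and "castor" actually produce with whichever stemmer is chosen.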