Indeed, I saw in the Analysis tab of the Solr admin UI that the § character
is removed when using the text_general type.
But in this use case we want to run full-text searches like "_text_:§45" or
"_text_:§*" to find words starting with §.
We need a text field here, not a string field!
What is your recommended way to deal with this?
Is it possible to turn off the word-break behaviour for the § character?
Or is the best approach to encode all § characters when indexing and searching?



Thanks, Bernd



Kind regards

Bernd Schmidt
SOFTWARE DEVELOPMENT

b.schm...@eggheads.de



From: Shawn Heisey <apa...@elyograg.org>
To: <solr-user@lucene.apache.org>
Sent: 07.12.2017 16:37
Subject: Re: Howto search for § character

On 12/6/2017 9:09 AM, Bernd Schmidt wrote:
> we have defined a field named "_text_" for a full text search based on
> field-type "text_general":
> <field name="_text_" type="text_general" multiValued="true" indexed="true"
> stored="false"/>
>
> When trying to search for the "§" character, we have strange behaviour:
>
> q=_text_:§ AND entityClass:StructureNodeImpl  => numFound:469 (all nodes
> where entityClass:StructureNodeImpl)
> q=_text_:§ => numFound:0
>
> How can we search for the occurrence of the § character?
 
We can't see how your "text_general" type is defined, but if it is 
anything like the same type included in Solr examples, then it probably 
is using StandardTokenizerFactory.  It appears that this tokenizer 
treats the § character as a word break and removes it from the token 
stream.  Most likely, the reason the search with the extra clause works 
is that the part with that character is removed, and the query ends up 
ONLY being the extra clause. 
 
You will need a fieldType with an analysis chain that doesn't remove the 
§ character, and it's almost guaranteed that you'll need to reindex.  
Unless you do that, searching for that character is not going to be 
possible. 
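
As a rough illustration only, a field type along these lines might keep
the § character. This is a sketch under assumptions, not a tested
configuration: the type name "text_keep_symbols" is made up, and it
assumes that splitting on whitespace alone is acceptable for your data.

```xml
<!-- Hypothetical field type: the whitespace tokenizer splits only on
     whitespace, so "§45" should stay together as a single token. -->
<fieldType name="text_keep_symbols" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Lowercasing keeps matching case-insensitive, as in text_general. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- The existing field, switched to the hypothetical type. -->
<field name="_text_" type="text_keep_symbols" multiValued="true" indexed="true" stored="false"/>
```

The trade-off is that punctuation stays attached to tokens as well, so
"§45," would be indexed with its trailing comma. Checking the result on
the Analysis tab before reindexing would show whether that is acceptable.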
 
Also keep in mind that searching for a single character may not do what 
you expect if that character is not a single word in the text, and that 
certain filters can end up trimming out really short terms like that. 
 
Thanks, 
Shawn 
 



