You have to use a different analysis chain. There are about a zillion options; here's a _start_:
https://lucene.apache.org/solr/guide/6_6/understanding-analyzers-tokenizers-and-filters.html

You'll probably be defining a <fieldType> similar to how text_general is defined, then using your new type in your <field>. This is really the heart of how you make Solr do what you want when it comes to what's searchable and what's not.
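Something along these lines might serve as a starting point. It is an untested sketch, not a recommendation: the type name text_keep_symbols is made up for illustration, and the trailing-punctuation pattern is only an assumption about which characters you'd want stripped. The idea is to tokenize on whitespace only (so § stays attached to its token), trim punctuation from the end of each token, and lowercase.

```xml
<!-- Hypothetical field type: the name and the punctuation pattern are illustrative assumptions. -->
<fieldType name="text_keep_symbols" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split on whitespace only, so "§45" is emitted as a single token -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Strip trailing punctuation, e.g. "Cake." becomes "Cake" -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="[.,;:!?]+$" replacement=""/>
    <!-- Lowercase so that "like" matches "Like" -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Point the existing field at the new type -->
<field name="_text_" type="text_keep_symbols" multiValued="true" indexed="true" stored="false"/>
```

After a change like this you'd need to reindex, and it's worth running a value such as §45 through the admin/analysis page to confirm the token comes out the way you expect.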
When you use the admin/analysis page, hover over the light gray two-letter abbreviations and it'll pop up the class used for that transformation.

You can start with WhitespaceTokenizerFactory, which will break only on whitespace. Be aware that other filters can then also manipulate the tokens created by the tokenizer. WhitespaceTokenizerFactory will _not_ remove punctuation, for instance, so you have to deal with that. For example, the period at the end of the sentence "I Like Cake." would be included in the emitted tokens, so you'd have

I
Like
Cake.

You can use one of the filters to deal with that.

I would be very reluctant to use the "string" type; it's not analyzed in any way and is almost always the wrong solution for something like this. Input like this

I Like Cake.

would match _only_

I\ Like\ Cake.

You couldn't search on just the term "like", or even "Like", but only "*Like*", which rather defeats the purpose of using tokenized search.

Best,
Erick

On Thu, Dec 7, 2017 at 8:37 AM, Bernd Schmidt <b.schm...@eggheads.de> wrote:
> Indeed, I saw in the analysis tab of the solr admin that the § char will be
> removed when using type text_general.
> But in this use case we want to make a full text search like "_text_:§45" or
> "_text_:§*" to find words starting with §.
> We need a text field here, not a string field!
> What is your recommended way to deal with it?
> Is it possible to remove the word break behaviour for the § char?
> Or is the best way to encode all § chars when indexing and searching?
>
> Thanks, Bernd
>
> Best regards,
> Bernd Schmidt
> SOFTWARE DEVELOPMENT
> b.schm...@eggheads.de
>
> From: Shawn Heisey <apa...@elyograg.org>
> To: <solr-user@lucene.apache.org>
> Sent: 07.12.2017 16:37
> Subject: Re: Howto search for § character
>
> On 12/6/2017 9:09 AM, Bernd Schmidt wrote:
>> we have defined a field named "_text_" for a full text search based on
>> field-type "text_general":
>> <field name="_text_" type="text_general" multiValued="true" indexed="true"
>> stored="false"/>
>>
>> When trying to search for the "§" character, we have strange behaviour:
>>
>> q=_text_:§ AND entityClass:StructureNodeImpl => numFound:469 (all nodes
>> where entityClass:StructureNodeImpl)
>> q=_text_:§ => numFound:0
>>
>> How can we search for the occurrence of the § character?
>
> We can't see how your "text_general" type is defined, but if it is
> anything like the same type included in Solr examples, then it probably
> is using StandardTokenizerFactory. It appears that this tokenizer
> treats the § character as a word break and removes it from the token
> stream. Most likely, the reason the search with the extra clause works
> is that the part with that character is removed, and the query ends up
> ONLY being the extra clause.
>
> You will need a fieldType with an analysis chain that doesn't remove the
> § character, and it's almost guaranteed that you'll need to reindex.
> Unless you do that, searching for that character is not going to be
> possible.
>
> Also keep in mind that searching for a single character may not do what
> you expect if that character is not a single word in the text, and that
> certain filters can end up trimming out really short terms like that.
>
> Thanks,
> Shawn