Just go to the admin/analysis page and enter the terms in the "index" box (I usually uncheck the "verbose" checkbox). You will see exactly which element in your analysis chain is doing this. You'll see light gray two-letter codes on the side, e.g. "ST". Hover over one with your mouse and you should see the class name, and thus the easily identifiable element of your fieldType for the field in question. For instance:
solr.StandardTokenizerFactory

text_general may have fixed _this_ problem, but it's not a great solution. The French analysis chain is tuned to give better results for, well, French.

Likely solr.FrenchLightStemFilterFactory is removing the last "o", but that's a guess.

In general, stemming is incompatible with wildcards. E.g. "running" stems to "run", but there is no real algorithm that can stem "runni*".

Best,
Erick

On Wed, Sep 20, 2017 at 5:18 AM, Sascha Tuschinski <stuschin...@canto.com> wrote:
> Hello Erick and Josh,
>
> Thanks for your hints and comments.
>
> I found out that the "text_fr" field type didn't store "fraoo" as the term.
> It stored "frao" only, maybe because of the French field type. This field had
> been created automatically. I'm new to Solr, so maybe this is correct.
>
> I use "text_general" as the field type now and this works fine. That
> solves our problem.
>
> I can deliver the output of the debug query from admin/analysis for the
> text_fr field type if required.
>
> Thanks again!
> Sascha
>
>
> On 19.09.17, 20:12, "Erick Erickson" <erickerick...@gmail.com> wrote:
>
> Unfortunately the link you provided goes to "localhost", which isn't
> accessible.
>
> The very first thing I'd do is go to the admin/analysis page and put
> the terms in both the "index" and "query" boxes for the field in
> question.
> Next, attach &debug=query to the query to see how the query is actually
> parsed.
>
> My bet: You are using a different stemmer for the two cases and the
> actual token in the index is FRao in the problem field, but that's
> just a guess.
>
> It often fools people that the field returned in the document (i.e. in
> the fl list) is the _stored_ value, not the actual token in the index.
> You can also use the TermsComponent to see the actual terms in the
> index as well as the admin/schema_browser link.
>
> Best,
> Erick
>
>
> On Tue, Sep 19, 2017 at 9:01 AM, Sascha Tuschinski <stuschin...@canto.com> wrote:
> > Hello Community,
> >
> > We are using a Solr Core with Solr 6.6.0 on Windows 10 (latest updates)
> > with field names defined like "f_1179014266_txt". The number in the middle of
> > the name differs for each field we use. For language-specific fields we
> > add a language-specific extension, e.g. "f_1179014267_txt_fr",
> > "f_1179014268_txt_de", "f_1179014269_txt_en" and so on.
> >
> > We are having the following odd issue within the French "_fr" field only:
> >
> > Field
> >     f_1197829835_txt_fr <http://localhost:8983/solr/#/test_core/schema?field=f_1197829835_txt_fr>
> > Dynamic Field /
> >     *_txt_fr <http://localhost:8983/solr/#/test_core/schema?dynamic-field=*_txt_fr>
> > Type
> >     text_fr <http://localhost:8983/solr/#/test_core/schema?type=text_fr>
> >
> > * The saved value, which had been added with no problem to the Solr
> >   index, is "FRaoo".
> > * When searching within the Solr query tool for
> >   "f_1197829839_txt_fr:*FRao*" it returns the items matching the term as seen
> >   below - OK.
> >
> > {
> >   "responseHeader":{
> >     "status":0,
> >     "QTime":1,
> >     "params":{
> >       "q":"f_1197829839_txt_fr:*FRao*",
> >       "indent":"on",
> >       "wt":"json",
> >       "_":"1505808887827"}},
> >   "response":{"numFound":1,"start":0,"docs":[
> >       {
> >         "id":"129",
> >         "f_1197829834_txt_en":"EnAir",
> >         "f_1197829822_txt_de":"Lufti",
> >         "f_1197829835_txt_fr":"FRaoi",
> >         "f_1197829836_txt_it":"ITAir",
> >         "f_1197829799_txt":["Lufti"],
> >         "f_1197829838_txt_en":"EnAir",
> >         "f_1197829839_txt_fr":"FRaoo",
> >         "f_1197829840_txt_it":"ITAir",
> >         "_version_":1578520424165146624}]
> >   }}
> >
> > * When searching for "f_1197829839_txt_fr:*FRaoo*" NO item is found
> >   - Wrong!
> >
> > {
> >   "responseHeader":{
> >     "status":0,
> >     "QTime":1,
> >     "params":{
> >       "q":"f_1197829839_txt_fr:*FRaoo*",
> >       "indent":"on",
> >       "wt":"json",
> >       "_":"1505808887827"}},
> >   "response":{"numFound":0,"start":0,"docs":[]
> >   }}
> >
> > * When searching for "f_1197829839_txt_fr:FRaoo" (no wildcards) the
> >   matching items are found - OK
> >
> > {
> >   "responseHeader":{
> >     "status":0,
> >     "QTime":1,
> >     "params":{
> >       "q":"f_1197829839_txt_fr:FRaoo",
> >       "indent":"on",
> >       "wt":"json",
> >       "_":"1505808887827"}},
> >   "response":{"numFound":1,"start":0,"docs":[
> >       {
> >         "id":"129",
> >         "f_1197829834_txt_en":"EnAir",
> >         "f_1197829822_txt_de":"Lufti",
> >         "f_1197829835_txt_fr":"FRaoi",
> >         "f_1197829836_txt_it":"ITAir",
> >         "f_1197829799_txt":["Lufti"],
> >         "f_1197829838_txt_en":"EnAir",
> >         "f_1197829839_txt_fr":"FRaoo",
> >         "f_1197829840_txt_it":"ITAir",
> >         "_version_":1578520424165146624}]
> >   }}
> >
> > If we save exactly the same value into a different language field, e.g.
> > one ending in "_en" such as "f_1197829834_txt_en", then the search
> > "f_1197829834_txt_en:*FRaoo*" finds all items correctly!
> >
> > We have no idea what's wrong here. We even recreated the index and
> > can reproduce this problem every time. I can only see that the value starts
> > with "FR" and the field extension ends with "fr", but this is not a problem for
> > "en", "de" and so on. All fields are used in the same way and have the same
> > field properties.
> >
> > Any help or ideas are highly appreciated. I filed a bug for this,
> > https://issues.apache.org/jira/browse/SOLR-11367,
> > but was asked to post my question here. Thanks for reading.
> >
> > Greetings,
> > _______________________________________________________________
> > Sascha Tuschinski
> > Manager Quality Assurance // Canto GmbH
> > Phone: +49 (0) 30 390 485 - 41
> > E-mail: stuschin...@canto.com
> > Web: canto.com <http://www.canto.com/>
> >
> > Canto GmbH
> > Lietzenburger Str. 46
> > 10789 Berlin
> > Phone: +49 (0)30 390485-0
> > Fax: +49 (0)30 390485-55
> > Amtsgericht Berlin-Charlottenburg HRB 88566
> > Managing Directors: Jack McGannon, Thomas Mockenhaupt
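
As a footnote to the stemming-versus-wildcards point at the top of this thread, here is a minimal Python sketch of why the three queries in the original report behave differently. It is not Solr code. toy_stem is a hypothetical stand-in for whatever the text_fr chain really does (collapsing a doubled final letter is an assumption, in line with the guess above), and the helper names index_terms, term_query and wildcard_query are invented for illustration. The sketch assumes that a plain term query runs through the full analysis chain, while a wildcard pattern is only lowercased and never stemmed.

```python
from fnmatch import fnmatchcase


def toy_stem(token: str) -> str:
    """Hypothetical stemmer stand-in: lowercase, then drop a doubled final letter."""
    t = token.lower()
    if len(t) >= 2 and t[-1] == t[-2]:
        t = t[:-1]
    return t


def index_terms(value: str) -> set:
    """Index-time analysis: split on whitespace and stem every token."""
    return {toy_stem(tok) for tok in value.split()}


def term_query(terms: set, text: str) -> bool:
    """Plain term query: the query text gets the same analysis as the index."""
    return toy_stem(text) in terms


def wildcard_query(terms: set, pattern: str) -> bool:
    """Wildcard query: the pattern is lowercased but NOT stemmed (assumption)."""
    p = pattern.lower()
    return any(fnmatchcase(t, p) for t in terms)


terms = index_terms("FRaoo")              # the stored value is still "FRaoo"
print(terms)                              # {'frao'}  <- the actual indexed token
print(term_query(terms, "FRaoo"))         # True:  query is stemmed to "frao"
print(wildcard_query(terms, "*FRao*"))    # True:  "*frao*" matches "frao"
print(wildcard_query(terms, "*FRaoo*"))   # False: "*fraoo*" cannot match "frao"
```

With those assumptions the sketch gives the same three outcomes as the report: the term query and "*FRao*" hit, "*FRaoo*" does not. The admin/analysis page will show what the real text_fr chain produces.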