If you use the stemmer in your query analysis, it should act the same, right?
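To make Erick's point concrete: the same stemmer does run at both index and query time, but two related words can still be reduced to different tokens, and then they never match each other. Below is a minimal Python sketch that assumes the original 1980 Porter rules. It is an illustrative re-implementation, not Lucene's PorterStemFilter code, which may differ in a few minor rules. It reproduces the two tokens Erick mentions. "identify" and "identifying" both end up as "identifi" (-ing is removed, then the final y becomes i). "identification" goes through three steps: step 2 rewrites -ation to -ate, step 3 rewrites -icate to -ic, and step 4 drops -ic, which leaves "identif".

```python
# Illustrative re-implementation of the original (1980) Porter rules. Assumption:
# this is NOT Lucene's code, and Lucene's PorterStemFilter may differ in minor rules.
VOWELS = "aeiou"
# Steps 2 and 3: suffix -> replacement, applied only when m(stem) > 0.
STEP2 = {"ational": "ate", "tional": "tion", "enci": "ence", "anci": "ance", "izer": "ize",
         "abli": "able", "alli": "al", "entli": "ent", "eli": "e", "ousli": "ous",
         "ization": "ize", "ation": "ate", "ator": "ate", "alism": "al", "iveness": "ive",
         "fulness": "ful", "ousness": "ous", "aliti": "al", "iviti": "ive", "biliti": "ble"}
STEP3 = {"icate": "ic", "ative": "", "alize": "al", "iciti": "ic", "ical": "ic", "ful": "", "ness": ""}
# Step 4: suffixes deleted when m(stem) > 1 (-ion additionally needs a stem ending in s or t).
STEP4 = ("al", "ance", "ence", "er", "ic", "able", "ible", "ant", "ement",
         "ment", "ent", "ion", "ou", "ism", "ate", "iti", "ous", "ive", "ize")

def is_consonant(word, i):
    # Any non-vowel is a consonant, except a "y" that follows a consonant.
    if word[i] in VOWELS:
        return False
    if word[i] == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True

def measure(stem):
    # m in [C](VC)^m[V], i.e. the number of vowel-to-consonant transitions.
    cv = "".join("c" if is_consonant(stem, i) else "v" for i in range(len(stem)))
    return cv.count("vc")

def has_vowel(stem):
    return any(not is_consonant(stem, i) for i in range(len(stem)))

def ends_double_consonant(stem):
    return len(stem) >= 2 and stem[-1] == stem[-2] and is_consonant(stem, len(stem) - 1)

def ends_cvc(stem):
    # The *o condition: consonant-vowel-consonant ending whose last letter is not w, x or y.
    n = len(stem)
    return (n >= 3 and is_consonant(stem, n - 3) and not is_consonant(stem, n - 2)
            and is_consonant(stem, n - 1) and stem[-1] not in "wxy")

def longest_suffix(word, suffixes):
    matches = [s for s in suffixes if word.endswith(s)]
    return max(matches, key=len) if matches else None

def step1(w):
    if w.endswith(("sses", "ies")):  # 1a: sses -> ss, ies -> i
        w = w[:-2]
    elif w.endswith("s") and not w.endswith("ss"):
        w = w[:-1]
    if w.endswith("eed"):  # 1b: -eed, -ed, -ing
        if measure(w[:-3]) > 0:
            w = w[:-1]
    else:
        for suffix in ("ed", "ing"):
            if w.endswith(suffix) and has_vowel(w[:-len(suffix)]):
                w = w[:-len(suffix)]
                if w.endswith(("at", "bl", "iz")):
                    w += "e"
                elif ends_double_consonant(w) and w[-1] not in "lsz":
                    w = w[:-1]
                elif measure(w) == 1 and ends_cvc(w):
                    w += "e"
                break
    if w.endswith("y") and has_vowel(w[:-1]):  # 1c: y -> i
        w = w[:-1] + "i"
    return w

def replace_suffix(w, rules):
    suffix = longest_suffix(w, rules)
    if suffix is not None and measure(w[:-len(suffix)]) > 0:
        return w[:-len(suffix)] + rules[suffix]
    return w

def step4(w):
    suffix = longest_suffix(w, STEP4)
    if suffix is not None:
        stem = w[:-len(suffix)]
        if measure(stem) > 1 and (suffix != "ion" or stem.endswith(("s", "t"))):
            return stem
    return w

def step5(w):
    m = measure(w[:-1])
    if w.endswith("e") and (m > 1 or (m == 1 and not ends_cvc(w[:-1]))):  # 5a
        w = w[:-1]
    if w.endswith("ll") and measure(w) > 1:  # 5b
        w = w[:-1]
    return w

def porter_stem(word):
    w = word.lower()
    if len(w) > 2:  # very short words are left alone
        w = step5(step4(replace_suffix(replace_suffix(step1(w), STEP2), STEP3)))
    return w

if __name__ == "__main__":
    for word in ("identify", "identifying", "identification", "identifications"):
        print(word, "->", porter_stem(word))
    # identify -> identifi, identifying -> identifi, identification -> identif, identifications -> identif
```

Under these rules "modification" likewise ends at "modif" while "modifying" ends at "modifi", which matches the pattern reported for other -ion nouns.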
On Thu, Apr 30, 2020 at 3:54 PM Erick Erickson <erickerick...@gmail.com> wrote:
>
> They are being stemmed to two different tokens, “identif” and “identifi”.
> Stemming is algorithmic and imperfect, and in this case you’re getting bitten
> by that algorithm. It looks like you’re using PorterStemFilter; if you want,
> you can look up the exact algorithm, but I don’t think it’s a bug, just one
> of those little joys of English...
>
> To get a clearer picture of exactly what’s being searched, try adding
> &debug=query to your query, in particular looking at the parsed query that’s
> returned. That’ll tell you a bunch. In this particular case I don’t think
> it’ll tell you anything more, but for future…
>
> Best,
> Erick
>
> Oh, and unchecking the ‘verbose’ box on the analysis page removes a lot of
> distraction; the detailed information is often TMI ;)
>
> > On Apr 30, 2020, at 2:51 PM, Jhonny Lopez <jhonny.lo...@publicismedia.com>
> > wrote:
> >
> > Sure, rewriting the message with links for images:
> >
> > We’re facing an issue with stemming in Solr. Most of the cases are working
> > correctly; for example, if we search for bidding, Solr brings results for
> > bidding, bid, bids, etc. However, with nouns ending in the ‘ion’ suffix,
> > stemming is not working. Even though the analyzers seem to stem the word
> > correctly, the results do not reflect that. One example:
> > If I search ‘identifying’, this is the output:
> >
> > Analyzer (image link):
> > https://1drv.ms/u/s!AlRTlFq8tQbShd4-Cp40Cmc0QioS0A?e=1f3GJp
> >
> > A clip of results:
> > "haschildren_b":false,
> > "isbucket_text_s":"0",
> > "sectionbody_t":"\n\n\nIn order to identify 1st price auctions,
> > leverage the proprietary tools available or manually pull a log file report
> > to understand the trends and gauge auction spread overtime to assess the
> > impact of variable auction dynamics.\n\n\n\n\n\n\n",
> > "parsedupdatedby_s":"sitecorecarvaini",
> > "sectionbody_t_en":"\n\n\nIn order to identify 1st price auctions,
> > leverage the proprietary tools available or manually pull a log file report
> > to understand the trends and gauge auction spread overtime to assess the
> > impact of variable auction dynamics.\n\n\n\n\n\n\n",
> > "hide_section_b":false
> >
> > As you can see, it has applied the stemming correctly and brings results for
> > other words based on the root, in this case “Identify”.
> >
> > However, if I search for “Identification”, this is the output:
> >
> > Analyzer (image link):
> > https://1drv.ms/u/s!AlRTlFq8tQbShd49RpiQObzMgSjVhA
> >
> > Even with proper stemming, Solr only brings results for the word
> > identification (or identifications) and nothing else.
> >
> > The queries are over the same field, which has the Porter Stemming Filter
> > applied at both query and index time. This behavior is consistent with other
> > nouns ending in ‘ion’: representation, modification, etc.
> >
> > Solr version: 8.1. Does anyone know why this is happening? Is it a bug?
> >
> > Thanks.
> >
> > -----Original Message-----
> > From: Erick Erickson <erickerick...@gmail.com>
> > Sent: Thursday, April 30, 2020 1:47 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Possible issue with Stemming and nouns ended with suffix 'ion'
> >
> > The mail server is pretty aggressive about stripping links, so we can’t see
> > the images.
> >
> > Could you put them somewhere and paste a link?
> >
> > Best,
> > Erick
> >
> >> On Apr 30, 2020, at 2:40 PM, Jhonny Lopez <jhonny.lo...@publicismedia.com>
> >> wrote:
> >>
> >> We’re facing an issue with stemming in Solr. Most of the cases are working
> >> correctly; for example, if we search for bidding, Solr brings results for
> >> bidding, bid, bids, etc. However, with nouns ending in the ‘ion’ suffix,
> >> stemming is not working. Even though the analyzers seem to stem the word
> >> correctly, the results do not reflect that. One example: if
> >> I search ‘identifying’, this is the output:
> >>
> >> Analyzer (image):
> >>
> >> A clip of results:
> >> "haschildren_b":false,
> >> "isbucket_text_s":"0",
> >> "sectionbody_t":"\n\n\nIn order to identify 1st price auctions,
> >> leverage the proprietary tools available or manually pull a log file
> >> report to understand the trends and gauge auction spread overtime to
> >> assess the impact of variable auction dynamics.\n\n\n\n\n\n\n",
> >> "parsedupdatedby_s":"sitecorecarvaini",
> >> "sectionbody_t_en":"\n\n\nIn order to identify 1st price auctions,
> >> leverage the proprietary tools available or manually pull a log file
> >> report to understand the trends and gauge auction spread overtime to
> >> assess the impact of variable auction dynamics.\n\n\n\n\n\n\n",
> >> "hide_section_b":false
> >>
> >> As you can see, it has applied the stemming correctly and brings results for
> >> other words based on the root, in this case
> >> “Identify”.
> >>
> >> However, if I search for “Identification”, this is the output:
> >>
> >> Analyzer (image):
> >>
> >> Even with proper stemming, Solr only brings results for the word
> >> identification (or identifications) and nothing else.
> >>
> >> The queries are over the same field, which has the Porter Stemming Filter
> >> applied at both query and index time. This behavior is consistent with other
> >> nouns ending in ‘ion’: representation, modification, etc.
> >>
> >> Solr version: 8.1. Does anyone know why this is happening? Is it a bug?
> >>
> >> Thanks.
> >>
> >> Jhonny Lopez
> >> Technical Architect
> >> Avenida Calle 26 No. 92 - 32, Edificio BTS3
> >> APDO. 128-1255 Bogota
> >> T: +573006805461
> >> jhonny.lo...@publicismedia.com
> >> www.prodigious.com
> >>
> >> ------------------------------------------------------------------------
> >> Disclaimer: The information in this email and any attachments may
> >> contain proprietary and confidential information that is intended for the
> >> addressee(s) only. If you are not the intended recipient, you are hereby
> >> notified that any disclosure, copying, distribution, retention or use of
> >> the contents of this information is prohibited. When addressed to our
> >> clients or vendors, any information contained in this e-mail or any
> >> attachments is subject to the terms and conditions in any governing
> >> contract. If you have received this e-mail in error, please immediately
> >> contact the sender and delete the e-mail.
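Erick's &debug=query tip can also be scripted so the parsed query is checked for each problem word. The following Python sketch only builds the request URL and reads the parsedquery entry from the JSON debug section of the response. The host http://localhost:8983, the core name "mycore", and the use of sectionbody_t_en (taken from the result clip) as the searched field are all hypothetical placeholders, not details from the thread.

```python
# Hypothetical helper for the &debug=query tip. Assumptions: Solr runs at
# http://localhost:8983 and the core is named "mycore"; neither comes from the thread.
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def debug_query_url(base_url, core, field, term):
    """Build a /select URL that asks Solr to return query-parsing debug info."""
    params = {"q": f"{field}:{term}", "debug": "query", "wt": "json"}
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


def parsed_query(response):
    """Extract the parsed query from a JSON response requested with debug=query."""
    return response["debug"]["parsedquery"]


def fetch_parsed_query(url):
    # Performs a live HTTP request, so this only works against a running Solr.
    with urlopen(url) as resp:
        return parsed_query(json.load(resp))


if __name__ == "__main__":
    url = debug_query_url("http://localhost:8983", "mycore", "sectionbody_t_en", "identification")
    print(url)
    # Against a live server: print(fetch_parsed_query(url))
```

Running it for "identification" and "identifying" side by side would show whether the two searches end up with different stemmed terms in the parsed query.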