If you use the stemmer in your query analysis it should act the same, right?
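
Using the same stemmer on both sides only guarantees that a given word always produces the same token; it does not guarantee that related words end up sharing one. Here is a minimal sketch of that point. The stems are hard-coded rather than computed (the two tokens are the ones quoted below, and the word-to-token assignment is inferred from the results described in the thread). `STEMS`, `build_index`, `search`, and the text of document 2 are made up for illustration and are not Solr APIs or real data.

```python
# Stems hard-coded for illustration only; a real analyzer computes them.
STEMS = {
    "identify": "identifi",
    "identifying": "identifi",
    "identification": "identif",
    "identifications": "identif",
}


def stem(word):
    """Return the illustrative stem, falling back to the lowercased word."""
    word = word.lower()
    return STEMS.get(word, word)


def build_index(docs):
    """Map each stemmed token to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.split():
            index.setdefault(stem(word), set()).add(doc_id)
    return index


def search(index, term):
    """Stem the query term the same way, then require an exact token match."""
    return index.get(stem(term), set())


docs = {
    1: "In order to identify 1st price auctions",  # text from the thread
    2: "identification of auction trends",         # hypothetical document
}
index = build_index(docs)

print(search(index, "identifying"))     # finds document 1 via "identifi"
print(search(index, "identification"))  # finds only document 2, via "identif"
```

So the query side does act the same as the index side; the two word families simply never meet on a common token.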

On Thu, Apr 30, 2020 at 3:54 PM Erick Erickson <erickerick...@gmail.com> wrote:
>
> They are being stemmed to two different tokens, “identif” and “identifi”.
> Stemming is algorithmic and imperfect, and in this case you’re getting bitten
> by that algorithm. It looks like you’re using PorterStemFilter; if you want,
> you can look up the exact algorithm. I don’t think it’s a bug, just one
> of those little joys of English...
>
> To get a clearer picture of exactly what’s being searched, try adding
> &debug=query to your query and look in particular at the parsed query that’s
> returned. That’ll tell you a bunch. In this particular case I don’t think
> it’ll tell you anything more, but for future reference…
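
For anyone who wants to try the debug=query suggestion from a script, a rough sketch of building such a request URL follows. The host, the collection name `mycollection`, and the choice of `sectionbody_t_en` as the queried field are assumptions for illustration; `debug_query_url` is a made-up helper, not part of any Solr client.

```python
from urllib.parse import urlencode


def debug_query_url(base_url, collection, field, term):
    """Build a /select URL that also asks Solr to report the parsed query."""
    params = {"q": f"{field}:{term}", "debug": "query"}
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Host and collection name are placeholders; adjust to your installation.
url = debug_query_url(
    "http://localhost:8983/solr",
    "mycollection",
    "sectionbody_t_en",
    "identification",
)
print(url)
```

Fetching that URL and reading the parsed query in the debug section of the response should show the token the query was actually reduced to.
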
>
> Best,
> Erick
>
> Oh, and un-checking the ‘verbose’ box on the analysis page removes a lot of
> distraction; the detailed information is often TMI ;)
>
> > On Apr 30, 2020, at 2:51 PM, Jhonny Lopez <jhonny.lo...@publicismedia.com> 
> > wrote:
> >
> > Sure, rewriting the message with links for images:
> >
> >
> > We’re facing an issue with stemming in Solr. Most cases work correctly:
> > for example, if we search for bidding, Solr returns results for bidding,
> > bid, bids, etc. However, for nouns ending with the ‘ion’ suffix, stemming
> > does not seem to work. Even when the analyzer appears to stem the word
> > correctly, the results do not reflect that. For example, if I search for
> > ‘identifying’, this is the output:
> >
> > Analyzer (image link):
> > https://1drv.ms/u/s!AlRTlFq8tQbShd4-Cp40Cmc0QioS0A?e=1f3GJp
> >
> > A clip of results:
> > "haschildren_b":false,
> >        "isbucket_text_s":"0",
> >        "sectionbody_t":"\n\n\nIn order to identify 1st price auctions, 
> > leverage the proprietary tools available or manually pull a log file report 
> > to understand the trends and gauge auction spread overtime to assess the 
> > impact of variable auction dynamics.\n\n\n\n\n\n\n",
> >        "parsedupdatedby_s":"sitecorecarvaini",
> >        "sectionbody_t_en":"\n\n\nIn order to identify 1st price auctions, 
> > leverage the proprietary tools available or manually pull a log file report 
> > to understand the trends and gauge auction spread overtime to assess the 
> > impact of variable auction dynamics.\n\n\n\n\n\n\n",
> >        "hide_section_b":false
> >
> >
> > As you can see, stemming is applied correctly and Solr returns results for
> > other words based on the same root, in this case “Identify”.
> >
> > However, if I search for “Identification”, this is the output:
> >
> > Analyzer (image link):
> > https://1drv.ms/u/s!AlRTlFq8tQbShd49RpiQObzMgSjVhA
> >
> >
> > Even with proper stemming, Solr only returns results for the word
> > identification (or identifications) and nothing else.
> >
> > The queries run against the same field, which has the Porter Stemming
> > Filter applied at both query and index time. This behavior is consistent
> > across other nouns ending in ‘ion’: representation, modification, etc.
> >
> > Solr version: 8.1. Does anyone know why this is happening? Is it a bug?
> >
> > Thanks.
> >
> > -----Original Message-----
> > From: Erick Erickson <erickerick...@gmail.com>
> > Sent: Thursday, April 30, 2020 1:47 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Possible issue with Stemming and nouns ended with suffix 'ion'
> >
> > The mail server is pretty aggressive about stripping links, so we can’t see
> > the images.
> >
> > Could you put them somewhere and paste a link?
> >
> > Best,
> > Erick
> >
> >> On Apr 30, 2020, at 2:40 PM, Jhonny Lopez <jhonny.lo...@publicismedia.com>
> >> wrote:
> >>
> >> We’re facing an issue with stemming in Solr. Most cases work correctly:
> >> for example, if we search for bidding, Solr returns results for bidding,
> >> bid, bids, etc. However, for nouns ending with the ‘ion’ suffix, stemming
> >> does not seem to work. Even when the analyzer appears to stem the word
> >> correctly, the results do not reflect that. For example, if I search for
> >> ‘identifying’, this is the output:
> >>
> >> Analyzer (image):
> >>
> >> A clip of results:
> >> "haschildren_b":false,
> >>        "isbucket_text_s":"0",
> >>        "sectionbody_t":"\n\n\nIn order to identify 1st price auctions,
> >> leverage the proprietary tools available or manually pull a log file
> >> report to understand the trends and gauge auction spread overtime to
> >> assess the impact of variable auction dynamics.\n\n\n\n\n\n\n",
> >>        "parsedupdatedby_s":"sitecorecarvaini",
> >>        "sectionbody_t_en":"\n\n\nIn order to identify 1st price auctions,
> >> leverage the proprietary tools available or manually pull a log file
> >> report to understand the trends and gauge auction spread overtime to
> >> assess the impact of variable auction dynamics.\n\n\n\n\n\n\n",
> >>        "hide_section_b":false
> >>
> >> As you can see, stemming is applied correctly and Solr returns results for
> >> other words based on the same root, in this case “Identify”.
> >>
> >> However, if I search for “Identification”, this is the output:
> >>
> >> Analyzer (image):
> >>
> >> Even with proper stemming, Solr only returns results for the word
> >> identification (or identifications) and nothing else.
> >>
> >> The queries run against the same field, which has the Porter Stemming
> >> Filter applied at both query and index time. This behavior is consistent
> >> across other nouns ending in ‘ion’: representation, modification, etc.
> >>
> >> Solr version: 8.1. Does anyone know why this is happening? Is it a bug?
> >>
> >> Thanks.
> >>
> >>  Jhonny Lopez
> >>  Technical Architect
> >>  Avenida Calle 26 No. 92 - 32, Edificio BTS3
> >>  APDO. 128-1255 Bogota
> >>  T: +573006805461
> >>  jhonny.lo...@publicismedia.com
> >>  www.prodigious.com
> >>
> >> ------------------------------------------------------------------------
> >> Disclaimer The information in this email and any attachments may contain
> >> proprietary and confidential information that is intended for the
> >> addressee(s) only. If you are not the intended recipient, you are hereby
> >> notified that any disclosure, copying, distribution, retention or use of
> >> the contents of this information is prohibited. When addressed to our
> >> clients or vendors, any information contained in this e-mail or any
> >> attachments is subject to the terms and conditions in any governing
> >> contract. If you have received this e-mail in error, please immediately
> >> contact the sender and delete the e-mail.
>
