Well, when not splitting on whitespace, you can use the CharFilter for regex 
replacements [1] to clear the entire search string if a banned word is found 
anywhere in it: 

.*(cigarette|tobacco).*

[1] 
https://lucene.apache.org/solr/guide/6_6/charfilterfactories.html#CharFilterFactories-solr.PatternReplaceCharFilterFactory
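
To see what that pattern does to a query string, here is a minimal Python 
sketch that mimics the replacement outside Solr. It is only an illustration of 
the regex, not the char filter itself; the function name clear_if_banned is 
made up for this example. Note that char filters run before the tokenizer and 
any lowercase filter, so the pattern as written is case-sensitive.

```python
import re

# Same pattern as above; the two banned words are the examples from this thread.
BANNED = re.compile(r".*(cigarette|tobacco).*")


def clear_if_banned(query: str) -> str:
    """Return an empty string if a banned word occurs anywhere in the query.

    Queries without a banned word are returned unchanged.
    """
    return BANNED.sub("", query)


if __name__ == "__main__":
    for q in ["cheap tobacco pipe", "e-cigarette case", "wooden pipe"]:
        print(repr(q), "->", repr(clear_if_banned(q)))
```

A query that is cleared this way reaches the rest of the analysis chain as an 
empty string, which is why this only works when the whole search string passes 
through the char filter in one piece.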
 
-----Original message-----
> From:Walter Underwood <wun...@wunderwood.org>
> Sent: Thursday 1st October 2020 18:20
> To: solr-user@lucene.apache.org
> Subject: Re: advice on whether to use stopwords for use case
> 
> I can’t think of an easy way to do this in Solr.
> 
> Do a bunch of string searches on the query on the client side. If any of them 
> match, 
> make a “no hits” result page.
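
A rough sketch of such a client-side check, in Python. The banned list, the 
normalization, and the names normalize and is_banned are assumptions made for 
this example. It matches whole words and phrases rather than raw substrings, 
so that "cigarbox" is not caught by "cigar"; plain substring search would be 
the simpler variant.

```python
import re

# Hypothetical banned list; the entries are examples mentioned in this thread.
BANNED_TERMS = ["cigar", "cigarette", "tobacco", "electronic cigar", "e-cigarette case"]


def normalize(text: str) -> str:
    """Lowercase and reduce punctuation/whitespace runs to single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


NORMALIZED_BANNED = [normalize(term) for term in BANNED_TERMS]


def is_banned(query: str) -> bool:
    """True if any banned term occurs in the query as a whole word or phrase."""
    padded = f" {normalize(query)} "
    return any(f" {term} " in padded for term in NORMALIZED_BANNED)


if __name__ == "__main__":
    for q in ["Cuban cigar", "E-Cigarette case", "cigarbox", "wooden pipe"]:
        print(repr(q), "->", "no hits page" if is_banned(q) else "send to Solr")
```

If is_banned returns True, the application would render the empty result page 
without sending anything to Solr.
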
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> > On Sep 30, 2020, at 11:56 PM, Derek Poh <d...@globalsources.com> wrote:
> > 
> > Yes, the requirement (for now) is not to return any results. I think they 
> > may change the requirements, pending their return from the holidays.
> > 
> >> If so, then check for those words in the query before sending it to Solr.
> > That is what I think so too.
> > 
> > Thinking further, if stopwords are used for this, results will still be 
> > returned whenever the search keywords contain more words than just the 
> > stopwords.
> > 
> > On 1/10/2020 2:57 am, Walter Underwood wrote:
> >> I’m not clear on the requirements. It sounds like the query “cigar” or
> >> “cuban cigar” should return zero results. Is that right?
> >> 
> >> If so, then check for those words in the query before sending it to Solr.
> >> 
> >> But the stopwords approach seems like the requirement is different. Could
> >> you give some examples?
> >> 
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/  (my blog)
> >> 
> >>> On Sep 30, 2020, at 11:53 AM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> >>> 
> >>> You may also want to look at something like:
> >>> https://docs.querqy.org/index.html
> >>> 
> >>> ApacheCon had (is having..) a presentation on it that seemed quite
> >>> relevant to your needs. The videos should be live in a week or so.
> >>> 
> >>> Regards,
> >>>   Alex.
> >>> 
> >>> On Tue, 29 Sep 2020 at 22:56, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> >>>> I am not sure why you think stop words are your first choice. Maybe I
> >>>> misunderstand the question. I read it as that you need to exclude
> >>>> completely a set of documents that include specific keywords when
> >>>> called from specific module.
> >>>> 
> >>>> If I wanted to differentiate the searches from specific module, I
> >>>> would give that module a different end-point (Request Query Handler),
> >>>> instead of /select. So, /nocigs or whatever.
> >>>> 
> >>>> Then, in that end-point, you could do all sorts of extra things, such
> >>>> as setting appends or even invariants parameters, which would include
> >>>> filter query to exclude any documents matching specific keywords. I
> >>>> assume it is ok to return documents that are matching for other
> >>>> reasons.
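
To illustrate the shape of such an exclusion filter query, here is a small 
Python sketch that builds one. Alexandre's suggestion is to configure it 
server-side in the request handler (appends or invariants); this example only 
assembles the equivalent fq on the client to show what it would look like. The 
field name "text" and the function name with_exclusion_filter are 
placeholders, not anything from the original setup.

```python
from typing import Any, Dict, List


def with_exclusion_filter(params: Dict[str, Any], field: str,
                          banned_terms: List[str]) -> Dict[str, Any]:
    """Return a copy of params with an fq that excludes documents matching
    any of the banned terms in the given field.

    Each term is sent as a quoted phrase; backslashes and double quotes
    inside a term are escaped.
    """
    phrases = []
    for term in banned_terms:
        escaped = term.replace("\\", "\\\\").replace('"', '\\"')
        phrases.append(f'"{escaped}"')
    fq = f"-{field}:({' OR '.join(phrases)})"

    merged = dict(params)
    # Assumes an existing "fq" value, if present, is a list of strings.
    merged["fq"] = list(params.get("fq", [])) + [fq]
    return merged


if __name__ == "__main__":
    p = with_exclusion_filter({"q": "pipe"}, "text", ["cigarette", "e-cigarette case"])
    print(p["fq"][0])
```

As noted above, this removes matching documents from the results rather than 
forcing an empty result for the whole query.
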
> >>>> 
> >>>> Ideally, you would mark the cigs documents during indexing with a
> >>>> binary or enumeration flag and then during search you just need to
> >>>> check against that flag. In that case, you could copyField your text
> >>>> and run it against something like
> >>>> https://lucene.apache.org/solr/guide/8_6/filter-descriptions.html#keep-word-filter
> >>>> combined with Shingles for multiwords. Or similar. And just transform
> >>>> it as index-only so that the result is basically a yes/no flag.
> >>>> Similar thing could be done with UpdateRequestProcessor pipeline if
> >>>> you want to end up with a true boolean flag. The idea is the same,
> >>>> just to have an index-only flag that you force lock into for any
> >>>> request from specific module.
> >>>> 
> >>>> Or even with something like ElevationSearchComponent. Same idea.
> >>>> 
> >>>> Hope this helps.
> >>>> 
> >>>> Regards,
> >>>>   Alex.
> >>>> 
> >>>> On Tue, 29 Sep 2020 at 22:28, Derek Poh <d...@globalsources.com> wrote:
> >>>>> Hi
> >>>>> 
> >>>>> I have read in the mailing list that we should try to avoid using stop
> >>>>> words.
> >>>>> 
> >>>>> I have a use case where I would like to know if there are
> >>>>> alternative solutions besides using stop words.
> >>>>> 
> >>>>> There is a business requirement to return zero results when the search
> >>>>> contains cigarette-related words and comes from a particular
> >>>>> module on our site. It does not apply to all searches from our site.
> >>>>> There is a list of these cigarette-related words. The list contains
> >>>>> single words, multiple words (Electronic cigar), and multiple words with
> >>>>> punctuation (e-cigarette case).
> >>>>> I am planning to copy a different set of search fields, that will
> >>>>> include the stopword filter in the index and query stage, for this
> >>>>> module to use.
> >>>>> 
> >>>>> For this use case, other than using stop words to handle it, is there
> >>>>> any alternative solution?
> >>>>> 
> >>>>> Derek
> >>>>> 
> >>>>> ----------------------
> >>>>> CONFIDENTIALITY NOTICE
> >>>>> 
> >>>>> This e-mail (including any attachments) may contain confidential and/or 
> >>>>> privileged information. If you are not the intended recipient or have 
> >>>>> received this e-mail in error, please inform the sender immediately and 
> >>>>> delete this e-mail (including any attachments) from your computer, and 
> >>>>> you must not use, disclose to anyone else or copy this e-mail 
> >>>>> (including any attachments), whether in whole or in part.
> >>>>> 
> >>>>> This e-mail and any reply to it may be monitored for security, legal, 
> >>>>> regulatory compliance and/or other appropriate reasons.