Well, when not splitting on whitespace, you can use the CharFilter for regex replacements [1] to clear the entire search string if a banned word is found anywhere in it:
.*(cigarette|tobacco).*

[1] https://lucene.apache.org/solr/guide/6_6/charfilterfactories.html#CharFilterFactories-solr.PatternReplaceCharFilterFactory

-----Original message-----
> From:Walter Underwood <wun...@wunderwood.org>
> Sent: Thursday 1st October 2020 18:20
> To: solr-user@lucene.apache.org
> Subject: Re: advice on whether to use stopwords for use case
>
> I can’t think of an easy way to do this in Solr.
>
> Do a bunch of string searches on the query on the client side. If any of
> them match, make a “no hits” result page.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
> > On Sep 30, 2020, at 11:56 PM, Derek Poh <d...@globalsources.com> wrote:
> >
> > Yes, the requirement (for now) is not to return any results. I think they
> > may change the requirements, pending their return from the holidays.
> >
> >> If so, then check for those words in the query before sending it to Solr.
> > That is what I think too.
> >
> > Thinking further, using stopwords for this, there will still be results
> > returned when the number of words in the search keywords is more than the
> > stopwords.
> >
> > On 1/10/2020 2:57 am, Walter Underwood wrote:
> >> I’m not clear on the requirements. It sounds like the query “cigar” or
> >> “cuban cigar” should return zero results. Is that right?
> >>
> >> If so, then check for those words in the query before sending it to Solr.
> >>
> >> But the stopwords approach seems like the requirement is different. Could
> >> you give some examples?
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/ (my blog)
> >>
> >>> On Sep 30, 2020, at 11:53 AM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> >>>
> >>> You may also want to look at something like:
> >>> https://docs.querqy.org/index.html
> >>>
> >>> ApacheCon had (is having..) a presentation on it that seemed quite
> >>> relevant to your needs. The videos should be live in a week or so.
> >>>
> >>> Regards,
> >>> Alex.
> >>>
> >>> On Tue, 29 Sep 2020 at 22:56, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> >>>> I am not sure why you think stop words are your first choice. Maybe I
> >>>> misunderstand the question. I read it as that you need to exclude
> >>>> completely a set of documents that include specific keywords when
> >>>> called from specific module.
> >>>>
> >>>> If I wanted to differentiate the searches from specific module, I
> >>>> would give that module a different end-point (Request Query Handler),
> >>>> instead of /select. So, /nocigs or whatever.
> >>>>
> >>>> Then, in that end-point, you could do all sorts of extra things, such
> >>>> as setting appends or even invariants parameters, which would include
> >>>> filter query to exclude any documents matching specific keywords. I
> >>>> assume it is ok to return documents that are matching for other
> >>>> reasons.
> >>>>
> >>>> Ideally, you would mark the cigs documents during indexing with a
> >>>> binary or enumeration flag and then during search you just need to
> >>>> check against that flag.
> >>>> In that case, you could copyField your text
> >>>> and run it against something like
> >>>> https://lucene.apache.org/solr/guide/8_6/filter-descriptions.html#keep-word-filter
> >>>> combined with Shingles for multiwords. Or similar. And just transform
> >>>> it as index-only so that the result is basically a yes/no flag.
> >>>> A similar thing could be done with UpdateRequestProcessor pipeline if
> >>>> you want to end up with a true boolean flag. The idea is the same,
> >>>> just to have an index-only flag that you force lock into for any
> >>>> request from specific module.
> >>>>
> >>>> Or even with something like ElevationSearchComponent. Same idea.
> >>>>
> >>>> Hope this helps.
> >>>>
> >>>> Regards,
> >>>> Alex.
> >>>>
> >>>> On Tue, 29 Sep 2020 at 22:28, Derek Poh <d...@globalsources.com> wrote:
> >>>>> Hi
> >>>>>
> >>>>> I have read in the mailing list that we should try to avoid using stop
> >>>>> words.
> >>>>>
> >>>>> I have a use case where I would like to know if there are alternative
> >>>>> solutions besides using stop words.
> >>>>>
> >>>>> There is a business requirement to return zero results when the search
> >>>>> is for cigarette-related words and the search is coming from a
> >>>>> particular module on our site. It does not apply to all searches from
> >>>>> our site.
> >>>>> There is a list of these cigarette-related words. This list contains
> >>>>> single words, multiple words (Electronic cigar), multiple words with
> >>>>> punctuation (e-cigarette case).
> >>>>> I am planning to copy a different set of search fields, that will
> >>>>> include the stopword filter in the index and query stage, for this
> >>>>> module to use.
> >>>>>
> >>>>> For this use case, other than using stop words to handle it, is there
> >>>>> any alternative solution?
> >>>>>
> >>>>> Derek
> >>>>>
> >>>>> ----------------------
> >>>>> CONFIDENTIALITY NOTICE
> >>>>>
> >>>>> This e-mail (including any attachments) may contain confidential and/or
> >>>>> privileged information. If you are not the intended recipient or have
> >>>>> received this e-mail in error, please inform the sender immediately and
> >>>>> delete this e-mail (including any attachments) from your computer, and
> >>>>> you must not use, disclose to anyone else or copy this e-mail
> >>>>> (including any attachments), whether in whole or in part.
> >>>>>
> >>>>> This e-mail and any reply to it may be monitored for security, legal,
> >>>>> regulatory compliance and/or other appropriate reasons.
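
For what it's worth, the client-side check suggested in the thread (look for the banned words in the query before sending it to Solr) could be sketched roughly as below. This is only an illustration in Python, not anything from Solr itself: the BANNED_TERMS list, the normalize() helper and is_banned_query() are made-up names, and the term list is a placeholder for the real business list. It lowercases the query and collapses punctuation so that multi-word and punctuated entries such as "Electronic cigar" or "e-cigarette case" are matched as whole words.

```python
import re

# Hypothetical banned-term list (assumption); the real list comes from the business.
BANNED_TERMS = ["cigarette", "tobacco", "electronic cigar", "e-cigarette case"]


def normalize(text):
    """Lowercase and reduce every run of non-alphanumerics to a single space."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


# One alternation over the normalized terms, anchored on word boundaries so
# that a term only matches as whole words inside the normalized query.
_BANNED_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(normalize(t)) for t in BANNED_TERMS) + r")\b"
)


def is_banned_query(query):
    """Return True if the query contains any banned term."""
    return _BANNED_RE.search(normalize(query)) is not None


if __name__ == "__main__":
    for q in ["cheap tobacco pouch", "E-Cigarette case", "cuban coffee"]:
        # A caller would skip the Solr request and render "no hits" when True.
        print(q, "->", is_banned_query(q))
```

Note that this matches whole words only, so "cigarettes" would need its own entry, whereas the .*(cigarette|tobacco).* pattern at the top of this mail also matches inside longer words. Which behaviour is wanted depends on the requirements.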