Re: Exact matching without using new fields

Doss Thu, 21 Jan 2021 07:49:38 -0800

Hi,

You can try the search query -> "+information +retrieval"

The + operator marks each term as required, so a document must contain both
keywords to match. Doc 5 will also be in the results.

https://lucene.apache.org/solr/guide/8_7/the-standard-query-parser.html#the-boolean-operator
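
A minimal sketch of sending that query over HTTP, assuming a local Solr at
http://localhost:8983/solr with a core named "docs" and a "title" field (these
names are hypothetical, not taken from this thread). The point it illustrates
is that "+" must be percent-encoded in the URL; a raw "+" is normally decoded
as a space, and the terms would then no longer be marked as required.

```python
from urllib.parse import urlencode

# Hypothetical values for illustration only; adjust to your own setup.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "docs"


def build_select_url(query, default_field="title"):
    """Return a /select URL for a standard-query-parser query string."""
    params = {
        "q": query,           # e.g. '+information +retrieval'
        "df": default_field,  # default field for terms without a field prefix
    }
    # urlencode turns '+' into %2B and spaces into '+', so the required-term
    # operators survive the trip through the URL.
    return f"{SOLR_BASE}/{CORE}/select?{urlencode(params)}"


url = build_select_url("+information +retrieval")
print(url)
# prints http://localhost:8983/solr/docs/select?q=%2Binformation+%2Bretrieval&df=title
```

If the field's analysis chain applies stemming or synonyms, docs 3 and 4 may
match this query as well, since both terms are still analyzed.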

- Mohandoss.

On Wed, Jan 20, 2021 at 1:38 AM gnandre <arnoldbron...@gmail.com> wrote:

> Thanks for replying, Dave.
>
> I am afraid I am looking for a non-index-time, i.e. query-time, solution.
>
> Actually, in my case I expect both documents from your example to be
> returned. I am just trying to avoid returning documents that contain only
> tokenized variants of the provided search query when it is enclosed in
> double quotes to indicate that exact matching is expected.
>
> e.g.
> search query -> "information retrieval"
>
> This should match documents like following:
> doc 1: "information retrieval"
> doc 2: "Advanced information retrieval with Solr"
>
> but should NOT match documents like
> doc 3: "informed retrieval"
> doc 4: "information extraction" (considering 'extraction' was a specified
> synonym of 'retrieval')
> doc 5: "INFORMATION RETRIEVAL"
>
> etc.
>
> I am also OK with these documents showing up as long as they appear at the
> bottom of the results. Also, a query-time solution is a must.
>
> On Tue, Jan 19, 2021 at 12:22 PM David R <davidtr...@hotmail.com> wrote:
>
> > We had the same requirement. Just to echo back your requirements, I
> > understand your case to be this. Given these 2 doc titles:
> >
> > doc 1: "information retrieval"
> > doc 2: "Advanced information retrieval with Solr"
> >
> > You want a phrase search for "information retrieval" to find both
> > documents, but an EXACT phrase search for "information retrieval" to find
> > doc #1 only.
> >
> > If that's true, and case-sensitive search isn't a requirement, I indexed
> > this in the token stream, with adjacent positions of course.
> >
> > START information retrieval END
> > START advanced information retrieval with solr END
> >
> > And with our custom query parser, when an EXACT operator is found, I
> > tokenize the query to match the first case. Otherwise pass it through.
> >
> > This needs custom analyzers on both the query and index sides to generate
> > the correct token sequences.
> >
> > It's worked out well for our case.
> >
> > Dave
> >
> >
> >
> > ________________________________
> > From: gnandre <arnoldbron...@gmail.com>
> > Sent: Tuesday, January 19, 2021 4:07 PM
> > To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
> > Subject: Exact matching without using new fields
> >
> > Hi,
> >
> > I am aware that to do exact matching (only whatever is provided inside
> > double quotes should be matched) in Solr, we can copy existing fields with
> > the help of copyFields into new fields that have very minimal tokenization
> > or no tokenization (e.g. using KeywordTokenizer or using string field type).
> >
> > However, this solution is expensive in terms of index size because it might
> > almost double the size of the existing index.
> >
> > Is there any inexpensive way of achieving exact matches from the query
> > side, e.g. boosting the original tokens more at query time compared to
> > their tokenized variants?
> >
>
