Re: Exact matching without using new fields

gnandre Tue, 19 Jan 2021 12:01:51 -0800

Thanks for replying, Dave.

I am afraid I am looking for a query-time solution, not an index-time one.


Actually, in my case I expect both documents from your example to be
returned. I am just trying to avoid returning documents that match only
tokenized (analyzed) versions of the search query when the query is
enclosed in double quotes to indicate that an exact match is expected.

e.g.
search query -> "information retrieval"

This should match documents like the following:
doc 1: "information retrieval"
doc 2: "Advanced information retrieval with Solr"

but should NOT match documents like
doc 3: "informed retrieval"
doc 4: "information extraction"  (assuming 'extraction' is configured as a
synonym of 'retrieval')
doc 5: "INFORMATION RETRIEVAL"

etc

I am also OK with these documents being returned, as long as they show up
at the bottom of the results. Also, a query-time solution is a must.
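
To make the expected behaviour concrete, here is a minimal Python sketch of
one purely client-side way to get the "exact matches first" ordering. It is
only an illustration, not a Solr feature: it assumes the field is stored and
returned with each hit, and the names contains_literal and
rerank_exact_first are hypothetical. Hits that contain the quoted phrase
literally (same case, no stemming, no synonyms) move to the top; Solr's
original order is kept otherwise.

```python
import re


def contains_literal(text, phrase):
    """True if `phrase` occurs in `text` verbatim, on word boundaries."""
    # Case-sensitive on purpose: "INFORMATION RETRIEVAL" must not count.
    pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
    return re.search(pattern, text) is not None


def rerank_exact_first(docs, phrase, field="title"):
    """Move literal matches to the front, keeping the incoming order otherwise."""
    # sorted() is stable, so the relevance order returned by the search
    # engine is preserved inside each of the two groups.
    return sorted(docs, key=lambda d: not contains_literal(d.get(field, ""), phrase))


# Hypothetical hits, in the order the search engine returned them.
docs = [
    {"id": 3, "title": "informed retrieval"},
    {"id": 1, "title": "information retrieval"},
    {"id": 5, "title": "INFORMATION RETRIEVAL"},
    {"id": 2, "title": "Advanced information retrieval with Solr"},
    {"id": 4, "title": "information extraction"},
]

ranked = rerank_exact_first(docs, "information retrieval")
print([d["id"] for d in ranked])  # [1, 2, 3, 5, 4]
```

This only re-orders the hits that were actually fetched, so an exact match
ranked below the requested number of rows would still be missed.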

On Tue, Jan 19, 2021 at 12:22 PM David R <davidtr...@hotmail.com> wrote:

> We had the same requirement. Just to echo back your requirements, I
> understand your case to be this. Given these 2 doc titles:
>
> doc 1: "information retrieval"
> doc 2: "Advanced information retrieval with Solr"
>
> You want a phrase search for "information retrieval" to find both
> documents, but an EXACT phrase search for "information retrieval" to find
> doc #1 only.
>
> If that's true, and case-sensitive search isn't a requirement, I indexed
> this in the token stream, with adjacent positions of course.
>
> START information retrieval END
> START advanced information retrieval with solr END
>
> And with our custom query parser, when an EXACT operator is found, I
> tokenize the query to match the first case. Otherwise pass it through.
>
> Needs custom analyzers on the query and index sides to generate the
> correct token sequences.
>
> It's worked out well for our case.
>
> Dave
>
>
>
> ________________________________
> From: gnandre <arnoldbron...@gmail.com>
> Sent: Tuesday, January 19, 2021 4:07 PM
> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
> Subject: Exact matching without using new fields
>
> Hi,
>
> I am aware that to do exact matching (only whatever is provided inside
> double quotes should be matched) in Solr, we can copy existing fields with
> the help of copyFields into new fields that have very minimal tokenization
> or no tokenization (e.g. using KeywordTokenizer or using string field type)
>
> However this solution is expensive in terms of index size because it might
> almost double the size of the existing index.
>
> Is there any inexpensive way of achieving exact matches from the query
> side, e.g. boosting the original terms more at query time than their
> tokenized forms?
>
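
For anyone reading this thread later, the START/END idea quoted above can be
sketched in a few lines of Python. This is only a toy model under stated
assumptions: lowercased whitespace splitting stands in for the real analyzer
chain, all function names are hypothetical, and it is not Dave's actual
analyzer or query parser code. It shows why wrapping the query in the same
sentinel tokens restricts a phrase match to the whole field value.

```python
START, END = "START", "END"  # upper-case sentinels cannot collide with lowercased words


def index_tokens(text):
    """Index-side token stream: sentinels at adjacent positions around the words."""
    return [START] + text.lower().split() + [END]


def query_tokens(phrase, exact=False):
    """Query-side tokens: add the sentinels only when the EXACT operator is used."""
    words = phrase.lower().split()
    return [START] + words + [END] if exact else words


def phrase_match(doc_tokens, q_tokens):
    """True if q_tokens occur in doc_tokens as a contiguous run (a phrase match)."""
    n = len(q_tokens)
    return any(doc_tokens[i:i + n] == q_tokens
               for i in range(len(doc_tokens) - n + 1))


doc1 = index_tokens("information retrieval")
doc2 = index_tokens("Advanced information retrieval with Solr")

# Plain phrase search finds both documents.
print(phrase_match(doc1, query_tokens("information retrieval")),
      phrase_match(doc2, query_tokens("information retrieval")))

# EXACT phrase search finds doc 1 only.
print(phrase_match(doc1, query_tokens("information retrieval", exact=True)),
      phrase_match(doc2, query_tokens("information retrieval", exact=True)))
```

Because the tokens are lowercased on both sides, this scheme is
case-insensitive, which matches the caveat in the quoted message.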
