Thanks for replying, Dave. I am afraid I am looking for a non-index-time (i.e. query-time) solution.
Actually, in my case I expect both documents from your example to be returned. I am only trying to avoid returning documents that contain just an analyzed variant (stemmed, synonym-expanded, or case-folded) of the search query when the query is enclosed in double quotes to indicate that an exact match is expected.

e.g. search query -> "information retrieval"

This should match documents like the following:

doc 1: "information retrieval"
doc 2: "Advanced information retrieval with Solr"

but should NOT match documents like:

doc 3: "informed retrieval"
doc 4: "information extraction" (assuming 'extraction' is configured as a synonym of 'retrieval')
doc 5: "INFORMATION RETRIEVAL"
etc.

I am also OK with these documents showing up, as long as they show up at the bottom. Also, a query-time solution is a must.

On Tue, Jan 19, 2021 at 12:22 PM David R <davidtr...@hotmail.com> wrote:

> We had the same requirement. Just to echo back your requirements, I
> understand your case to be this. Given these 2 doc titles:
>
> doc 1: "information retrieval"
> doc 2: "Advanced information retrieval with Solr"
>
> You want a phrase search for "information retrieval" to find both
> documents, but an EXACT phrase search for "information retrieval" to find
> doc #1 only.
>
> If that's true, and case-sensitive search isn't a requirement, I indexed
> this in the token stream, with adjacent positions of course.
>
> START information retrieval END
> START advanced information retrieval with solr END
>
> And with our custom query parser, when an EXACT operator is found, I
> tokenize the query to match the first case. Otherwise pass it through.
>
> Needs custom analyzers on the query and index sides to generate the
> correct token sequences.
>
> It's worked out well for our case.
>
> Dave
>
>
>
> ________________________________
> From: gnandre <arnoldbron...@gmail.com>
> Sent: Tuesday, January 19, 2021 4:07 PM
> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
> Subject: Exact matching without using new fields
>
> Hi,
>
> I am aware that to do exact matching (only whatever is provided inside
> double quotes should be matched) in Solr, we can copy existing fields with
> the help of copyFields into new fields that have very minimal tokenization
> or no tokenization (e.g. using KeywordTokenizer or using string field type)
>
> However this solution is expensive in terms of index size because it might
> almost double the size of the existing index.
>
> Is there any inexpensive way of achieving exact matches from the query
> side. e.g. boost the original tokens more at query time compared to their
> tokens?
>
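
P.S. To make the expected behaviour concrete, here is a rough sketch in plain Python. It is not Solr code and not a proposed implementation; the function names are made up for illustration, and plain whitespace tokenization is an assumption. It only shows which of the five example documents should count as exact matches for a quoted query, and the "non-exact matches at the bottom" ordering I would also accept.

```python
def tokens(text):
    # Assumption: whitespace tokenization only, with no lowercasing,
    # stemming, or synonym expansion.
    return text.split()


def contains_exact_phrase(doc, phrase):
    # True if the phrase tokens appear in the doc as a contiguous,
    # case-sensitive run of tokens.
    d, p = tokens(doc), tokens(phrase)
    if not p:
        return False
    return any(d[i:i + len(p)] == p for i in range(len(d) - len(p) + 1))


def rank(docs, phrase):
    # Exact matches first; all other docs keep their relative order
    # and sink to the bottom (sorted() is stable).
    return sorted(docs, key=lambda doc: not contains_exact_phrase(doc, phrase))


docs = [
    "informed retrieval",                        # doc 3
    "information retrieval",                     # doc 1
    "information extraction",                    # doc 4
    "Advanced information retrieval with Solr",  # doc 2
    "INFORMATION RETRIEVAL",                     # doc 5
]
ranked = rank(docs, "information retrieval")
```

With this input, docs 1 and 2 come first in `ranked`, followed by docs 3, 4 and 5 in their original order.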