Re: Similarity plugins which are normalized

2018-11-29 Thread Tanya Bompi
Thanks a lot Doug. Maybe setting more importance to certain fields is the way to go in conjunction with the overall match. Tanu On Thu, Nov 29, 2018 at 1:52 PM Doug Turnbull < dturnb...@opensourceconnections.com> wrote: > The usual advice is relevance scores don’t exist on a scale where a > thre

Re: Similarity plugins which are normalized

2018-11-29 Thread Doug Turnbull
The usual advice is that relevance scores don’t exist on a scale where a threshold is useful, as these are just heuristics used for ranking, not a confidence level. I would instead focus on what attributes of a document make it relevant or not (strong match in certain fields). A couple of things

Re: similarity as a parameter

2015-12-16 Thread Ahmet Arslan
Hi Markus, I confirm (if that counts) that all current built-in similarities (except SweetSpot) save the same stuff into the norms. They can be switched/changed at search time. Actually, I am doing this today with Lucene, experimenting with different term-weighting models using a single index. It would

Re: similarity as a parameter

2015-12-15 Thread Jack Krupansky
There should probably be some doc notes about this stuff, at a minimum alerting the user to the prospect that changing the similarity for a field (or the default for all fields) can require reindexing and when it is likely to require reindexing. The Lucene-level Javadoc should probably say these sa

RE: similarity as a parameter

2015-12-15 Thread Markus Jelsma
try Kan > Sent: Tuesday 15th December 2015 19:07 > To: solr-user@lucene.apache.org; Ahmet Arslan > Subject: Re: similarity as a parameter > > Markus, Jack, > > I think Ahmet nails it pretty nicely: the similarity functions in question > are compatible on the index level

RE: similarity as a parameter

2015-12-15 Thread Chris Hostetter
: Sweetspot does require reindexing but is that the only one? I have not : investigated some exotic implementations, anyone to confirm sweetspot is : the only one? In that case you could patch QueryComponent right, instead : of having a custom component? I'm not sure how where this thread deve

Re: similarity as a parameter

2015-12-15 Thread Dmitry Kan
Hi Hoss, Thanks for sharing the knowledge on dangerous zones, will try to avoid them. #2 is quite a probable way of implementing this in my case, as many Query objects are custom (although not all). But #1 is compelling too and sounds like a bit less trouble. On Tue, Dec 15, 2015 at 8:13 PM, Chris

Re: similarity as a parameter

2015-12-15 Thread Dmitry Kan
Markus, Jack, I think Ahmet nails it pretty nicely: the similarity functions in question are compatible on the index level. So it is not necessary to create a separate search field. Ahmet, I like your idea. Will take a look, thanks. Rgds, Dmitry On Tue, Dec 15, 2015 at 7:58 PM, Ahmet Arslan wr

Re: similarity as a parameter

2015-12-15 Thread Ahmet Arslan
I wonder what solr-plugin would be best for this functionality. How about a custom search component, in its prepare method? I think we can access (Solr)IndexSearcher inside a SearchComponent. setSimilarity in the process method should work. Ahmet On Tuesday, December 15, 2015 7:43 PM, Ahmet

Re: similarity as a parameter

2015-12-15 Thread Chris Hostetter
: I think this is a legitimate request. Majority of the similarities are : compatible index wise. I think the only exception is sweet spot : similarity. I think you are grossly underestimating the risk of arbitrarily using different Similarities between index time and query time -- particularly in h

Re: similarity as a parameter

2015-12-15 Thread Ahmet Arslan
Hi Dmitry, I think this is a legitimate request. The majority of the similarities are compatible index-wise. I think the only exception is sweet spot similarity. In Lucene, it can be changed on the fly with a new Searcher. It should be possible to do so in Solr. Thanks, Ahmet On Tuesday, Decemb

Re: similarity as a parameter

2015-12-15 Thread Jack Krupansky
You would need to define an alternate field which copied a base field but then had the desired alternate similarity, using SchemaSimilarityFactory. See: https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements -- Jack Krupansky On Tue, Dec 15, 2015 at 10:02 AM, Dmitry Kan wrote:

RE: similarity as a parameter

2015-12-15 Thread Markus Jelsma
Hello Dmitry - this is currently not possible. Quickest way is to reconfigure and reload the cores. Some similarities also require you to reindex, so it is a bad idea anyway. Markus -Original message- > From:Dmitry Kan > Sent: Tuesday 15th December 2015 16:02 > To: solr-user@lucene.ap

Re: Similarity search with Solr

2013-12-13 Thread Jayni
okay, thanks for your help Janek -- View this message in context: http://lucene.472066.n3.nabble.com/Similarity-search-with-Solr-tp4106623p4106648.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Similarity search with Solr

2013-12-13 Thread Jack Krupansky
Do a proof of concept implementation and see for yourself if you find the performance acceptable. I mean, performance should be reasonably decent. -- Jack Krupansky -Original Message- From: Jayni Sent: Friday, December 13, 2013 12:22 PM To: solr-user@lucene.apache.org Subject: Re

Re: Similarity search with Solr

2013-12-13 Thread Jayni
@kamaci The sentences are stored in txt files, but I can also import them. The file includes a lot of RTF-stuff like a font table, but I'm only interested in the sentences, which are enclosed by tags. @Jack Krupansky-2 Do you think it will be fast enough? I have millions of sentences and I have to

Re: Similarity search with Solr

2013-12-13 Thread Jack Krupansky
Just use the edismax query parser with bigrams and trigrams enabled and the default operator set to OR. That will select all sentences even vaguely similar and will more highly score sentences that have a greater number of words and phrases that match. -- Jack Krupansky -Original Message-
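A rough sketch of the request parameters that suggestion seems to translate to, in case it helps. The field name `sentence_text` is hypothetical, and I am assuming `pf2`/`pf3` are what "bigrams and trigrams enabled" refers to (they are the edismax parameters for word-pair and word-triple phrase boosts):

```python
from urllib.parse import urlencode


def build_similar_sentence_query(sentence, field="sentence_text"):
    """Build edismax request parameters for a loose 'similar sentence' search.

    `field` is a hypothetical field name; substitute the field that
    actually holds the sentences in your schema.
    """
    return {
        "defType": "edismax",  # use the extended dismax query parser
        "q": sentence,         # the input sentence is the query itself
        "q.op": "OR",          # any matching word selects a document
        "qf": field,           # field searched for individual words
        "pf2": field,          # boost documents matching word pairs
        "pf3": field,          # boost documents matching word triples
    }


params = build_similar_sentence_query("the quick brown fox jumps")
query_string = urlencode(params)
print(query_string)
```

The resulting query string would be appended to the select URL of whichever core holds the sentences. Whether it is fast enough over millions of sentences is something only a proof of concept will show.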

Re: Similarity search with Solr

2013-12-13 Thread Furkan KAMACI
Hi; Could you explain your infrastructure? Thanks; Furkan KAMACI 2013/12/13 Jayni > Hi, > > I want to do a similarity search on millions of sentences. They are written > in natural language and I want to find sentences, which have a "similar" > set > of words. > A search based on trigrams or

Re: "Similarity" of numbers in MoreLikeThisHandler

2012-07-04 Thread nanshi
Very well explained. However, you don't know the number (integer/float) field value of a matched document in advance. So even supposing the Similarity field is constructed, how do you use it in the query?

Re: Similarity per field

2011-05-27 Thread Brian Lamb
I'm still not having any luck with this. Has anyone actually gotten this to work so far? I feel like I've followed the directions to the letter but it just doesn't work. Thanks, Brian Lamb On Wed, May 25, 2011 at 2:48 PM, Brian Lamb wrote: > I looked at the patch page and saw the files that wer

Re: Similarity per field

2011-05-25 Thread Brian Lamb
I looked at the patch page and saw the files that were changed. I went into my install and looked at those same files and found that they had indeed been changed. So it looks like I have the correct version of solr. On Wed, May 25, 2011 at 1:01 PM, Brian Lamb wrote: > Hi all, > > I sent a mail in

Re: Similarity

2011-05-24 Thread Brian Lamb
This did the trick. Thanks! On Mon, May 23, 2011 at 5:03 PM, Markus Jelsma wrote: > Hmm. I don't add code to Apache packages but create my own packages and > namespaces, build a jar and add it to the lib directory as specified in > solrconfig. Then you can use the FQCN in the similarity config

Re: Similarity

2011-05-23 Thread Markus Jelsma
Hmm. I don't add code to Apache packages but create my own packages and namespaces, build a jar and add it to the lib directory as specified in solrconfig. Then you can use the FQCN in the similarity config to point to the class. Maybe it can work when messing inside the apache namespace bu

Re: Similarity

2011-05-23 Thread Brian Lamb
Okay, well this is encouraging. I changed SweetSpotSimilarity to MyClassSimilarity. I created this class in: lucene/contrib/misc/src/java/org/apache/lucene/misc/ I am getting a ClassNotFoundException when I try to start Solr. Here are the contents of the MyClassSimilarity file: package org.apache

Re: Similarity

2011-05-23 Thread Markus Jelsma
As far as I know, SweetSpotSimilarity needs to be configured. I did use it once but wrapped a factory around it to configure the sweet spot. It worked just as expected and explained in that paper about the subject. If you use a custom similarity that, for example, caps tf to 1, does it then work?

Re: Similarity class for an individual field

2011-05-20 Thread Brian Lamb
So what was my mistake? I still have not resolved this issue. On Fri, May 20, 2011 at 11:22 AM, Brian Lamb wrote: > Yes. Was that not what I was supposed to do? > > > On Thu, May 19, 2011 at 8:26 PM, Koji Sekiguchi wrote: > >> (11/05/20 3:45), Brian Lamb wrote: >> >>> Hi all, >>> >>> Based on adv

Re: Similarity class for an individual field

2011-05-20 Thread Brian Lamb
Yes. Was that not what I was supposed to do? On Thu, May 19, 2011 at 8:26 PM, Koji Sekiguchi wrote: > (11/05/20 3:45), Brian Lamb wrote: > >> Hi all, >> >> Based on advice I received on a previous email thread, I applied patch >> https://issues.apache.org/jira/browse/SOLR-2338. My goal was to be

Re: Similarity class for an individual field

2011-05-19 Thread Koji Sekiguchi
(11/05/20 3:45), Brian Lamb wrote: Hi all, Based on advice I received on a previous email thread, I applied patch https://issues.apache.org/jira/browse/SOLR-2338. My goal was to be able to apply a similarity class to certain fields but not all fields. I ran the following commands: $ cd $ svn u

Re: Similarity class for an individual field

2011-05-19 Thread Brian Lamb
I tried editing the SweetSpotSimilarity class located at lucene/contrib/misc/src/java/org/apache/lucene/misc/SweetSpotSimilarity.java to just return 1 for each function and the score does not change at all. This has led me to believe that it does not recognize similarity at all. At this point, all

Re: Similarity class for an individual field

2011-05-19 Thread Brian Lamb
Also, I've tried adding: To the end of the schema file so that it is applied globally but it does not appear to change the score either. What am I doing incorrectly? Thanks, Brian Lamb On Thu, May 19, 2011 at 2:45 PM, Brian Lamb wrote: > Hi all, > > Based on advice I received on a previous e

Re: Similarity

2010-06-24 Thread Dave Searle
You could write some client code to translate your query into the following (Foo and baz) or (foo or baz) This seems to work well for me On 24 Jun 2010, at 21:20, Blargy wrote: > > > Yonik Seeley-2-2 wrote: >> >> Depends on the larger context of what you are trying to do. >> Do you still wa
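A minimal sketch of that kind of client-side translation, assuming whitespace-separated terms and uppercase `AND`/`OR` as the Lucene query syntax expects. The function name `and_or_query` is made up for illustration, and special characters are not escaped here:

```python
def and_or_query(user_query):
    """Rewrite 'foo baz' as '(foo AND baz) OR (foo OR baz)'.

    Documents containing every term match both clauses and so tend to
    score higher, while documents containing only some terms still match.
    """
    terms = user_query.split()
    if len(terms) < 2:
        # Nothing to combine for an empty or single-term query.
        return user_query.strip()
    all_terms = " AND ".join(terms)
    any_term = " OR ".join(terms)
    return f"({all_terms}) OR ({any_term})"


print(and_or_query("foo baz"))
```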

Re: Similarity

2010-06-24 Thread Yonik Seeley
On Thu, Jun 24, 2010 at 4:20 PM, Blargy wrote: > Yonik Seeley-2-2 wrote: >> >> Depends on the larger context of what you are trying to do. >> Do you still want the idf and length norm relevancy factors?  If not, >> use a filter, or boost the particular clause with 0. >> > > I do want the other rel

Re: Similarity

2010-06-24 Thread Blargy
Yonik Seeley-2-2 wrote: > > Depends on the larger context of what you are trying to do. > Do you still want the idf and length norm relevancy factors? If not, > use a filter, or boost the particular clause with 0. > I do want the other relevancy factors.. ie boost, phrase-boosting etc but I j

Re: Similarity

2010-06-24 Thread Yonik Seeley
On Thu, Jun 24, 2010 at 3:17 PM, Blargy wrote: > > Can someone explain how I can override the default behavior of the tf > contributing a higher score for documents with repeated words? > > For example: > > Query: "foo" > Doc1: "foo bar" score 1.0 > Doc2: "foo foo bar" score 1.1 > > Doc2 contains
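To illustrate the effect being asked about, here is a toy sketch (not Lucene code) comparing a square-root term-frequency factor, which as far as I know is what Lucene's classic TF-IDF similarity uses, with a factor capped at 1. The function names are made up, and idf, length norm and boosts are deliberately left out, so the numbers are not real Solr scores:

```python
import math


def classic_tf(freq):
    """Term-frequency factor that grows with repeated occurrences."""
    return math.sqrt(freq)


def capped_tf(freq):
    """Term-frequency factor that ignores repeats: present or absent."""
    return 1.0 if freq > 0 else 0.0


docs = {"Doc1": "foo bar", "Doc2": "foo foo bar"}
for name, text in docs.items():
    freq = text.split().count("foo")
    print(name, round(classic_tf(freq), 3), capped_tf(freq))
```

With the capped factor both documents get the same tf contribution for "foo". In Solr itself this would presumably mean a custom Similarity class that overrides the tf calculation, and the length norm would still differ between the two documents.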

Re: "Similarity" of numbers in MoreLikeThisHandler

2008-07-04 Thread Chris Hostetter
: I didn't realize that subsets were used to evaluate similarity. From your : example, I assume that the strings: 456 and 123456 are "similar". If I store : them as integers instead of strings, will Solr/Lucene still use subsets to : assign similarity? Strictly speaking MLT operates on "Terms" .

Re: "Similarity" of numbers in MoreLikeThisHandler

2008-07-04 Thread wojtekpia
I didn't realize that subsets were used to evaluate similarity. From your example, I assume that the strings: 456 and 123456 are "similar". If I store them as integers instead of strings, will Solr/Lucene still use subsets to assign similarity?

Re: "Similarity" of numbers in MoreLikeThisHandler

2008-07-04 Thread Francisco Sanmartin
The problem is the concept of "similarity". Your concept of similarity is based on the meaning of the numbers (or the words). Solr's concept of similarity is based on subsets of characters. This way for Solr "thunder" is similar to "thunderstorm" or to "under" because there are sets of characte

Re: "Similarity" of numbers in MoreLikeThisHandler

2008-07-04 Thread wojtekpia
I stored 2 copies of a single field: one as a number, the other as a string. The MLT handler returned the same documents regardless of which of the 2 fields I used for similarity. So to answer my own question, the MoreLikeThisHandler does not do numeric comparisons on numeric fields.

Re: similarity search with solr

2008-02-26 Thread Erik Hatcher
On Feb 26, 2008, at 6:11 PM, Michael Hess wrote: Is it possible to pass a document id to solr and get back the documents that are close to it? Indeed: Erik
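For readers landing here, a hedged sketch of what such a request might look like with the MoreLikeThis handler. This is not the link from the original reply. It assumes the handler is registered at `/mlt` in solrconfig.xml, that the unique key field is `id`, and that `title` and `body` are the fields to compare; the host, port and document id are placeholders as well:

```python
from urllib.parse import urlencode


def build_mlt_params(doc_id, fields, rows=10):
    """Build MoreLikeThis request parameters for one source document."""
    return {
        "q": f"id:{doc_id}",         # selects the source document
        "mlt.fl": ",".join(fields),  # fields used to find similar documents
        "mlt.mintf": 1,              # minimum term frequency in the source doc
        "mlt.mindf": 1,              # minimum document frequency in the index
        "rows": rows,                # number of similar documents to return
    }


params = build_mlt_params("doc42", ["title", "body"])
# Hypothetical base URL; adjust host, port, core and handler path.
url = "http://localhost:8983/solr/mlt?" + urlencode(params)
print(url)
```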