Re: Edgengram

2011-06-01 Thread Brian Lamb
I think in my case LowerCaseTokenizerFactory will be sufficient because there will never be spaces in this particular field. But thank you for the useful link! Thanks, Brian Lamb On Wed, Jun 1, 2011 at 11:44 AM, Erick Erickson wrote: > Be a little careful here. LowerCaseTokenizerFactory is diff

Re: Edgengram

2011-06-01 Thread Erick Erickson
Be a little careful here. LowerCaseTokenizerFactory is different than KeywordTokenizerFactory. LowerCaseTokenizerFactory will give you more than one term. e.g. the string "Intelligence can't be MeaSurEd" will give you 5 terms, any of which may match. i.e. "intelligence", "can", "t", "be", "measure
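A minimal Python sketch of the difference described above. It only approximates the two Solr tokenizers (the function names are illustrative, not Solr APIs): the lowercase tokenizer is modelled as "split at every non-letter, then lowercase", and the keyword tokenizer as "keep the whole input as one token".

```python
import re


def lowercase_tokenizer(text):
    # Approximation of LowerCaseTokenizer: every run of letters becomes
    # one lowercased term; anything that is not a letter is a separator.
    return [run.lower() for run in re.findall(r"[^\W\d_]+", text)]


def keyword_tokenizer(text):
    # Approximation of KeywordTokenizer: the entire input is a single term.
    return [text]


sample = "Intelligence can't be MeaSurEd"
lower_terms = lowercase_tokenizer(sample)
keyword_terms = keyword_tokenizer(sample)

print(lower_terms)    # five separate terms, any of which may match
print(keyword_terms)  # one term containing the whole string
```

For a field that never contains spaces or punctuation, both approaches yield a single term, which is why the choice matters less in Brian's case.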

Re: Edgengram

2011-06-01 Thread Brian Lamb
Hi Tomás, Thank you very much for your suggestion. I took another crack at it using your recommendation and it worked ideally. The only thing I had to change was to The first did not produce any results but the second worked beautifully. Thanks! Brian Lamb 2011/5/31 Tomás Fernánde

Re: Edgengram

2011-05-31 Thread Tomás Fernández Löbbe
...or also use the LowerCaseTokenizerFactory at query time for consistency, but not the edge ngram filter. 2011/5/31 Tomás Fernández Löbbe > Hi Brian, I don't know if I understand what you are trying to achieve. You > want the term query "abcdefg" to have an idf of 1 instead of 7? I think using >

Re: Edgengram

2011-05-31 Thread Tomás Fernández Löbbe
Hi Brian, I don't know if I understand what you are trying to achieve. You want the term query "abcdefg" to have an idf of 1 instead of 7? I think using the KeywordTokenizerFactory at query time should work. It would be something like: this way, at query time "abcde

Re: Edgengram

2011-05-31 Thread Brian Lamb
I believe I used that link when I initially set up the field and it worked great (and I'm still using it in other places). In this particular example however it does not appear to be practical for me. I mentioned that I have a similarity class that returns 1 for the idf and i

Re: Edgengram

2011-05-31 Thread bmdakshinamur...@gmail.com
Can you specify the analyzer you are using for your queries? Maybe you could use a KeywordAnalyzer for your queries so you don't end up matching parts of your query. http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ This should help you. On Tue,

Re: Edgengram

2011-05-31 Thread Brian Lamb
In this particular case, I will be doing a Solr search based on user preferences. So I will not be depending on the user to type "abcdefg". That will be automatically generated based on user selections. The contents of the field do not contain spaces and since I am creating the search parameters, c

Re: Edgengram

2011-05-31 Thread Erick Erickson
That'll work for your case, although be aware that string types aren't analyzed at all, so case matters, as do spaces etc. What is the use-case here? If you explain it a bit there might be better answers Best Erick On Fri, May 27, 2011 at 9:17 AM, Brian Lamb wrote: > For this, I ended u

Re: Edgengram

2011-05-27 Thread Brian Lamb
For this, I ended up just changing it to string and using "abcdefg*" to match. That seems to work so far. Thanks, Brian Lamb On Wed, May 25, 2011 at 4:53 PM, Brian Lamb wrote: > Hi all, > > I'm running into some confusion with the way edgengram works. I have the > field set up as: > > position

Re: EdgeNgram Auto suggest - doubles ignore

2011-02-08 Thread Erick Erickson
I'm afraid I'll have to pass, I'm absolutely swamped at the moment. Perhaps someone else can pick it up. I will say that you should be getting terms back when you pre-lower-case them, so look in your index via the admin page or Luke to see if what's really in your index is what you think in the "n

Re: EdgeNgram Auto suggest - doubles ignore

2011-02-08 Thread johnnyisrael
Hi Erick, If you have time, Can you please take a look and provide your comments (or) suggestions for this problem? Please let me know if you need any more information. Thanks, Johnny -- View this message in context: http://lucene.472066.n3.nabble.com/EdgeNgram-Auto-suggest-doubles-ignore-tp

Re: EdgeNgram Auto suggest - doubles ignore

2011-02-01 Thread johnnyisrael
Hi Erick, I tried to use the terms component and ended up with the following problems. Problem 1: Custom sort not working in terms component: http://lucene.472066.n3.nabble.com/Term-component-sort-is-not-working-td1905059.html#a1909386 I want to sort using one of my custom fiel

Re: EdgeNgram Auto suggest - doubles ignore

2011-01-25 Thread Erick Erickson
OK, try this. Use some analysis chain for your field like: This can be a multiValued field, BTW. now use the TermsComponent to fetch your data. See: http://wiki.apache.org/solr/TermsComponent and specify terms.prefix=apple e.g. http://localhost:8983/solr/terms?terms.prefix=app&terms.fl=bli

Re: EdgeNgram Auto suggest - doubles ignore

2011-01-25 Thread mesenthil
Right now our configuration says multivalues=true. But that need not be "true" in our case. Will make it false and try and update this thread with more details.. -- View this message in context: http://lucene.472066.n3.nabble.com/EdgeNgram-Auto-suggest-doubles-ignore-tp2321919p2334627.html Sent

Re: EdgeNgram Auto suggest - doubles ignore

2011-01-25 Thread Jonathan Rochkind
Ah, sorry, I got confused about your requirements, if you just want to match at the beginning of the field, it may be more possible. Using edgegrams or wildcard. If you have a single-valued field. Do you have a single-valued or a multi-valued field? That is, does each document have just one v

Re: EdgeNgram Auto suggest - doubles ignore

2011-01-25 Thread mesenthil
The index contains around 1.5 million documents. As this is used for the autosuggest feature, performance is an important factor. So it looks like it is difficult to achieve the following using edgeNgram: Result should return only those terms where search letter is matching with the first word

Re: EdgeNgram Auto suggest - doubles ignore

2011-01-25 Thread Markus Jelsma
Oh, I should perhaps mention that EdgeNGrams will yield results a lot quicker than using wildcards at the cost of a larger index. You should, of course, use EdgeNGrams if you worry about performance and have a huge index and a number of queries per second. > Then you don't need NGrams at all. A

Re: EdgeNgram Auto suggest - doubles ignore

2011-01-25 Thread Markus Jelsma
Then you don't need NGrams at all. A wildcard will suffice or you can use the TermsComponent. If these strings are indexed as single tokens (KeywordTokenizer with LowercaseFilter) you can simply do field:app* to retrieve the "apple milk shake". You can also use the string field type but then yo
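A rough Python sketch of the single-token approach described above, assuming each value is indexed as one lowercased term (KeywordTokenizer plus a lowercase filter) and queried with a prefix such as field:app*. The helper names are hypothetical and the sample values are the three documents used elsewhere in this thread; this is a simulation of the matching logic, not a Solr client.

```python
def index_single_token(value):
    # Models KeywordTokenizer followed by a lowercase filter:
    # the whole value is kept as a single lowercased term.
    return value.lower()


def prefix_matches(prefix, values):
    # Models a prefix query like field:app* against single-token terms.
    # The prefix is compared as given, so the caller lowercases it.
    return [v for v in values if index_single_token(v).startswith(prefix)]


titles = ["pineapple vers apple", "milk with apple", "apple milk shake"]
hits = prefix_matches("app", titles)
print(hits)
```

Because the whole value is one term, only values that begin with the prefix match; "apple" appearing later in a value does not count.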

Re: EdgeNgram Auto suggest - doubles ignore

2011-01-25 Thread Jonathan Rochkind
I haven't figured out any way to achieve that AT ALL without making a separate Solr index just to serve autosuggest queries. At least when you want to auto-suggest on a multi-value field. Someone posted a crazy tricky way to do it with a single-valued field a while ago. If you can/are willing

Re: EdgeNgram Auto suggest - doubles ignore

2011-01-25 Thread johnnyisrael
Hi Erick, What I want here is this: let's say I have 3 documents like ["pineapple vers apple", "milk with apple", "apple milk shake"] and I search for "apple". It should return only "apple milk shake" because that term alone starts with the letters "apple" which I typed in. It should not bring oth

Re: EdgeNgram Auto suggest - doubles ignore

2011-01-25 Thread Erick Erickson
Let's back up here because now I'm not clear what you actually want. EdgeNGrams are a way of matching substrings, which is what's happening here. Of course searching for "apple" matches all three examples, just as searching for "apple" without grams would; that's the expected behavior. So

Re: EdgeNgram Auto suggest - doubles ignore

2011-01-25 Thread johnnyisrael
Hi Erick, You are right, there is a copy field to EdgeNgram. I tried the configuration but it is not working as expected. Configuration I tried: edgy_user_query == When I search for th

Re: EdgeNgram Auto suggest - doubles ignore

2011-01-24 Thread Erick Erickson
See below. On Mon, Jan 24, 2011 at 1:51 PM, johnnyisrael wrote: > > Hi, > > I am trying out the auto suggest using EdgeNgram. > > Using the following tutorial as a reference. > > > http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ > > In the above

Re: EdgeNGram relevancy

2010-11-16 Thread Robert Gründler
it seems adding the '+' (required) operator to each term in a multi-term query does the trick: http://lucene.apache.org/java/2_4_0/queryparsersyntax.html#+ ie: edgytext2:(+Martin +Sco) -robert On Nov 16, 2010, at 8:52 PM, Robert Gründler wrote: > thanks for the explanation. > > the result
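As a small illustration of the query shape above, the following Python sketch builds a query string in which every whitespace-separated term is marked required with '+'. The helper name is hypothetical, and it assumes the user input contains no characters that need escaping in the Lucene query syntax.

```python
def require_all_terms(field, user_input):
    # Prefix each whitespace-separated term with '+' so that every term
    # must match, then scope the whole group to the given field.
    terms = user_input.split()
    required = " ".join("+" + term for term in terms)
    return f"{field}:({required})"


query = require_all_terms("edgytext2", "Martin Sco")
print(query)  # edgytext2:(+Martin +Sco)
```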

Re: EdgeNGram relevancy

2010-11-16 Thread Robert Gründler
thanks for the explanation. the results for the autocompletion are pretty good now, but we still have a small problem. When there are hits in the "edgytext2" fields, results which only have hits in the "edgytext" field should not be returned at all. Example: Query: "Martin Sco" Current Resu

Re: EdgeNGram relevancy

2010-11-11 Thread Jonathan Rochkind
Without the parens, the "edgytext:" only applied to "Mr"; the default field still applied to "Scorsese". The double quotes are necessary in the second case (rather than parens), because on a non-tokenized field the standard query parser will "pre-tokenize" on whitespace before sending

Re: EdgeNGram relevancy

2010-11-11 Thread Robert Gründler
> > Did you run your query without using () and "" operators? If yes can you try > this? > &q=edgytext:(Mr Scorsese) OR edgytext2:"Mr Scorsese"^2.0 I didn't use () and "" in my query before. Using the query with those operators works now, stopwords are thrown out as they should, thanks. However,

Re: EdgeNGram relevancy

2010-11-11 Thread Andy
Ah I see. Thanks for the explanation. Could you set the defaultOperator to "AND"? That way both "Bill" and "Cl" must be a match and that would exclude "Clyde Phillips". --- On Thu, 11/11/10, Robert Gründler wrote: > From: Robert Gründler > Su

Re: EdgeNGram relevancy

2010-11-11 Thread Robert Gründler
according to the fieldtype i posted previously, i think it's because of: 1. WhiteSpaceTokenizer splits the String "Clyde Phillips" into 2 tokens: "Clyde" and "Phillips" 2. EdgeNGramFilter gets the 2 tokens, and creates an EdgeNGram for each token: "C" "Cl" "Cly" ... AND "P" "Ph" "Phi" ... Th
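The two steps above can be simulated with a short Python sketch. It is only a model of the behaviour, not the Solr implementation: the function names are illustrative, the gram sizes (1 to 25) are assumed because the fieldtype itself is not shown here, and no lowercasing is applied.

```python
def edge_ngrams(token, min_gram=1, max_gram=25):
    # Front-anchored grams of one token: "Clyde" -> "C", "Cl", "Cly", ...
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def index_terms(text):
    # Step 1: split on whitespace. Step 2: edge n-gram every token.
    terms = set()
    for token in text.split():
        terms.update(edge_ngrams(token))
    return terms


def matches_any(query, text):
    # OR semantics: a single matching query term is enough.
    indexed = index_terms(text)
    return any(term in indexed for term in query.split())


def matches_all(query, text):
    # AND semantics: every query term must match.
    indexed = index_terms(text)
    return all(term in indexed for term in query.split())


print(matches_any("Bill Cl", "Clyde Phillips"))  # "Cl" is a gram of "Clyde"
print(matches_all("Bill Cl", "Clyde Phillips"))  # "Bill" matches nothing
print(matches_all("Bill Cl", "Bill Clinton"))
```

Under OR semantics the gram "Cl" alone is enough to pull in "Clyde Phillips"; requiring all terms, as suggested elsewhere in this thread, excludes it.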

Re: EdgeNGram relevancy

2010-11-11 Thread Andy
Could anyone help me understand why "Clyde Phillips" appears in the results for "Bill Cl"? "Clyde Phillips" doesn't produce any EdgeNGram that would match "Bill Cl", so why is it even in the results? Thanks. --- On Thu, 11/11/10, Ahmet Arslan wrote: > You can add an additional field, w

Re: EdgeNGram relevancy

2010-11-11 Thread Nick Martin
On 12 Nov 2010, at 01:46, Ahmet Arslan wrote: >> This setup now makes troubles regarding StopWords, here's >> an example: >> >> Let's say the index contains 2 Strings: "Mr Martin >> Scorsese" and "Martin Scorsese". "Mr" is in the stopword >> list. >> >> Query: edgytext:Mr Scorsese OR edgytext2

Re: EdgeNGram relevancy

2010-11-11 Thread Ahmet Arslan
> This setup now makes troubles regarding StopWords, here's > an example: > > Let's say the index contains 2 Strings: "Mr Martin > Scorsese" and "Martin Scorsese". "Mr" is in the stopword > list. > > Query: edgytext:Mr Scorsese OR edgytext2:Mr Scorsese^2.0 > > This way, the only result i get is

Re: EdgeNGram relevancy

2010-11-11 Thread Robert Gründler
thanks a lot, that setup works pretty well now. the only problem now is that the StopWords do not work that well anymore. I'll provide an example, but first the 2 fieldtypes:

Re: EdgeNGram relevancy

2010-11-11 Thread Ahmet Arslan
You can add an additional field, using KeywordTokenizerFactory instead of WhitespaceTokenizerFactory, and query both these fields with an OR operator. edgytext:(Bill Cl) OR edgytext2:"Bill Cl" You can even apply a boost so that begins-with matches come first. --- On Thu, 11/11/10, Robert G