Re: Deciding whether to stem at query time

2012-04-25 Otis Gospodnetic
Andrew Wagner
>To: solr-user@lucene.apache.org
>Sent: Tuesday, April 24, 2012 10:40 AM
>Subject: Re: Deciding whether to stem at query time
>
>I'm sorry, I'm missing something. What's the difference between "storing"
>and "indexing" a field?

2012-04-24 Erick Erickson
When you set stored="true" in your schema, a verbatim copy of the raw input is placed in the *.fdt file. That stored copy is what gets returned for the fields you request with the "fl" parameter, for instance. When you set indexed="true", the input is analyzed and the resulting terms are placed in the inverted index and a
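A toy Python model may help make Erick's distinction concrete. It is not how Lucene lays out its files; it only mirrors the idea that a stored field keeps the raw input verbatim (what a response would return), while an indexed field keeps only the analyzed terms in an inverted index that searches consult. The class `ToyIndex`, its methods, and the lowercase-and-split analyzer are hypothetical.

```python
from collections import defaultdict


def analyze(text):
    """Hypothetical analyzer: lowercase, turn punctuation into spaces, split on whitespace."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


class ToyIndex:
    def __init__(self):
        self.stored = {}                  # doc id -> {field: raw value}, like stored fields
        self.inverted = defaultdict(set)  # (field, term) -> doc ids, like the inverted index

    def add(self, doc_id, field, value, stored=True, indexed=True):
        if stored:
            # Keep a verbatim copy; this is what a response would display.
            self.stored.setdefault(doc_id, {})[field] = value
        if indexed:
            # Only the analyzed terms become searchable; the raw text is not kept here.
            for term in analyze(value):
                self.inverted[(field, term)].add(doc_id)

    def search(self, field, query):
        """Return ids of docs containing every analyzed query term in `field`."""
        terms = analyze(query)
        if not terms:
            return set()
        result = set(self.inverted.get((field, terms[0]), set()))
        for term in terms[1:]:
            result &= self.inverted.get((field, term), set())
        return result

    def fetch(self, doc_id, field):
        # Returns None when the field was indexed but not stored.
        return self.stored.get(doc_id, {}).get(field)
```

Here a field added with `stored=False` is searchable but cannot be displayed, and one added with `indexed=False` can be displayed but never matches a query.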

2012-04-24 Andrew Wagner
I'm sorry, I'm missing something. What's the difference between "storing" and "indexing" a field?

On Tue, Apr 24, 2012 at 10:28 AM, Paul Libbrecht wrote:
>
> On 24 Apr 2012, at 17:16, Otis Gospodnetic wrote:
> > This would not necessarily increase the size of your index that much -
> > you don't

2012-04-24 Paul Libbrecht
On 24 Apr 2012, at 17:16, Otis Gospodnetic wrote:
> This would not necessarily increase the size of your index that much - you
> don't need to store both fields, just 1 of them if you really need it for
> highlighting or displaying. If not, just index.

I second this. The query expansion process i
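Otis's point, that both variants can be indexed while the raw text is stored only once, can be sketched as follows. It is a simplified model under stated assumptions: the field names `title`, `title_exact`, and `title_stem` are hypothetical, and `naive_stem` is a toy suffix stripper standing in for a real stemmer such as Porter. In an actual Solr schema this would typically be one stored field plus `copyField` targets declared with `stored="false"`.

```python
def analyze_exact(text):
    """Hypothetical 'exact' analysis: lowercase and split on whitespace."""
    return text.lower().split()


def naive_stem(token):
    """Toy suffix stripper (NOT Porter/Snowball), for illustration only."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze_stemmed(text):
    return [naive_stem(t) for t in analyze_exact(text)]


def fan_out(doc_id, value):
    """Store the raw value once, but index it twice under two hypothetical fields."""
    stored = {doc_id: {"title": value}}  # the single verbatim copy, for display/highlighting
    postings = {}                        # (field, term) -> doc ids; no raw text kept here
    for field, analyzer in (("title_exact", analyze_exact), ("title_stem", analyze_stemmed)):
        for term in analyzer(value):
            postings.setdefault((field, term), set()).add(doc_id)
    return stored, postings
```

The extra space cost is then only the second set of postings, not a second copy of the raw text.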

2012-04-24 Otis Gospodnetic
> From: Andrew Wagner
>To: solr-user@lucene.apache.org
>Sent: Tuesday, April 24, 2012 7:21 AM
>Subject: Re: Deciding whether to stem at query time
>
>Ah, this is a really good point. Still seems like it has the downsides o

2012-04-24 Andrew Wagner
Ah, this is a really good point. Still seems like it has the downsides of #2, though: much bigger space requirements and possibly some time lost on queries.

On Mon, Apr 23, 2012 at 3:35 PM, Walter Underwood wrote:
> There is a third approach. Create two fields and always query both of
> them, wit

2012-04-23 Walter Underwood
Right. Stemming is less useful for author fields; you don't need to match "bill gate" or "steve job". Also, if you want to do fuzzy matching, you should do it only on the exact fields, not on the stemmed fields.

wunder

On Apr 23, 2012, at 3:45 PM, Michael Sokolov wrote:
> Yes, and you might ch
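The sketch below illustrates both of Walter's points with toy code. The hypothetical `naive_stem` (not a real stemmer) reduces "gates" to "gate", so a stemmed author field would match a query that has nothing to do with Bill Gates. Fuzzy matching is modeled as plain Levenshtein edit distance, an assumption for illustration rather than Lucene's exact implementation: applied to stemmed terms it widens matches that stemming already widened, while on exact terms it still catches a simple typo.

```python
def naive_stem(token):
    """Toy suffix stripper (NOT Porter/Snowball), for illustration only."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def fuzzy_match(query_term, field_terms, max_edits=1):
    """Terms in the field within `max_edits` edits of the query term."""
    return {t for t in field_terms if levenshtein(query_term, t) <= max_edits}


author = "Bill Gates"
exact_terms = author.lower().split()                  # ['bill', 'gates']
stemmed_terms = [naive_stem(t) for t in exact_terms]  # ['bill', 'gate']
```

With these values, a fuzzy query for "date" hits the stemmed field (one edit from "gate") but not the exact field, while the typo "gatez" is still found in the exact field.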

2012-04-23 Michael Sokolov
Yes, and you might choose different options for different fields. For dictionary searches, where users are looking up specific words and a high degree of precision is called for, stemming is less helpful; for full-text searches, it helps more.

-Mike

On 4/23/2012 3:35 PM, Walter Unde
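Michael's per-field choice can be sketched as a small configuration table that turns stemming on or off per field. The field names `headword` and `body`, the `FIELD_OPTIONS` table, and the toy `naive_stem` helper are all hypothetical; a real deployment would express this as different field types in the schema.

```python
# Hypothetical per-field settings: precision-oriented lookup vs. recall-oriented text.
FIELD_OPTIONS = {
    "headword": {"stem": False},  # dictionary lookups: match the word as typed
    "body": {"stem": True},       # full text: let inflected forms match
}


def naive_stem(token):
    """Toy suffix stripper (NOT Porter/Snowball), for illustration only."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(field, text):
    """Lowercase and split, then stem only if the field's options say so."""
    terms = text.lower().split()
    if FIELD_OPTIONS[field]["stem"]:
        terms = [naive_stem(t) for t in terms]
    return terms


def matches(field, query, value):
    """True if every analyzed query term occurs among the field value's analyzed terms."""
    value_terms = set(analyze(field, value))
    return all(t in value_terms for t in analyze(field, query))


HEADWORDS = ["field", "fields", "fielding"]
```

With stemming off, a lookup for "fields" returns only the headword "fields"; with the toy stemmer, "field", "fields", and "fielding" would all collapse to "field", which is the loss of precision Michael describes for dictionary searches. In the stemmed `body` field, by contrast, a query for "field" matches text containing "fields".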

2012-04-23 Walter Underwood
There is a third approach. Create two fields and always query both of them, with the exact field given a higher weight. This works great and performs well. It is what we did at Netflix and what I'm doing at Chegg.

wunder

On Apr 23, 2012, at 12:21 PM, Andrew Wagner wrote:
> So I just realized t
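One way to express Walter's third approach in Solr is with the Extended DisMax (edismax) parser, whose `qf` parameter lists the fields to search and accepts a `^` boost per field. The sketch below only builds the request URL and makes no network call. The core URL, the field names `title_exact` and `title_stem`, and the boost of 4 are hypothetical placeholders; the right weight has to be tuned against your own relevance judgments.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_dual_field_query(
    user_query,
    exact_field="title_exact",   # hypothetical unstemmed field
    stem_field="title_stem",     # hypothetical stemmed copy
    exact_boost=4.0,             # hypothetical weight; tune for your data
    base_url="http://localhost:8983/solr/mycore/select",  # hypothetical core
):
    """Build an edismax request that queries both fields, weighting the exact one higher."""
    params = {
        "defType": "edismax",
        "q": user_query,
        # qf: space-separated fields to search; field^N boosts that field's score.
        "qf": f"{exact_field}^{exact_boost:g} {stem_field}",
        "fl": "id,title",
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    url = build_dual_field_query("bill gates")
    print(url)
    # Decoding the query string shows the qf value Solr would receive.
    print(parse_qs(urlparse(url).query)["qf"])  # ['title_exact^4 title_stem']
```

Because every query touches two fields, the trade-off Andrew raises (extra index space and some query time) still applies; the boost only controls how exact matches rank relative to stemmed ones.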