Re: Deciding whether to stem at query time

Erick Erickson Tue, 24 Apr 2012 08:32:58 -0700

When you set stored="true" in your schema, a verbatim copy of
the raw input is placed in the *.fdt file. That is the information
returned when you list the field in the "fl" parameter, for instance.

When you set indexed="true", the input is analyzed and the
resulting terms are placed in the inverted index and are
searchable.

The two are completely independent of each other, even though
you specify them on the same field definition.

So, a field that's stored but not indexed would be displayable
to the user, but no searches could be performed on it.

A field that's indexed but not stored can be searched, but the
information is not retrievable.
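
To make the two flags concrete, here is a toy model in Python. It only
mimics the behaviour described above. The class and method names
(ToyIndex, add, search, retrieve) and the field names are made up for
illustration, and the "analysis" is just lowercasing and splitting on
whitespace. It is not how Lucene actually lays out the .fdt file or the
inverted index.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Toy model of the per-field 'indexed' and 'stored' flags.

    Illustration only: mimics the observable behaviour, not Lucene's files.
    """
    # (doc_id, field_name) -> verbatim input, playing the role of the .fdt data
    stored_values: dict = field(default_factory=dict)
    # (field_name, term) -> set of doc ids, playing the role of the inverted index
    postings: dict = field(default_factory=lambda: defaultdict(set))

    def add(self, doc_id, name, text, *, indexed, stored):
        if stored:
            self.stored_values[(doc_id, name)] = text
        if indexed:
            # Stand-in analysis: lowercase and split on whitespace.
            for term in text.lower().split():
                self.postings[(name, term)].add(doc_id)

    def search(self, name, term):
        # Only the inverted index is consulted; stored values are never searched.
        return sorted(self.postings.get((name, term.lower()), set()))

    def retrieve(self, doc_id, name):
        # What an "fl" request would hand back; None if the field wasn't stored.
        return self.stored_values.get((doc_id, name))


idx = ToyIndex()
idx.add(1, "title", "Quick Story", indexed=False, stored=True)   # display only
idx.add(1, "body", "A short Story", indexed=True, stored=False)  # search only

print(idx.search("title", "story"))  # []  stored but not indexed: no hits
print(idx.retrieve(1, "title"))      # Quick Story
print(idx.search("body", "story"))   # [1]
print(idx.retrieve(1, "body"))       # None  indexed but not stored
```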

Why are there two options? Well, you may use copyField to
index the data two different ways for two different purposes, as
in this thread. Putting the verbatim data in twice is wasteful;
you only ever need it once.
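
A rough sketch of that copyField setup, again as a toy in Python. In a
real schema it would be a <copyField source="body" dest="body_stem"/>
line plus two field definitions. Here the field names, the SCHEMA and
COPY_FIELDS dicts and toy_stem() are all hypothetical, and toy_stem() is
a crude stand-in, not the Porter stemmer. The point is that the raw text
is indexed twice (once stemmed, once not) but stored only once.

```python
from collections import defaultdict


def toy_stem(term: str) -> str:
    """Hypothetical, very crude stemmer (NOT Porter): just enough rules
    to map both 'story' and 'stories' to 'stori'."""
    if term.endswith("ies") and len(term) > 4:
        return term[:-3] + "i"        # stories -> stori
    if term.endswith("y") and len(term) > 2:
        return term[:-1] + "i"        # story -> stori
    if term.endswith("s") and not term.endswith("ss") and len(term) > 3:
        return term[:-1]              # cats -> cat
    return term


def analyze(text: str, stem: bool) -> list:
    terms = text.lower().split()
    return [toy_stem(t) for t in terms] if stem else terms


# Hypothetical schema: "body" is indexed and stored; "body_stem" is the
# copyField target, indexed with stemming but NOT stored.
SCHEMA = {
    "body": {"stem": False, "stored": True},
    "body_stem": {"stem": True, "stored": False},
}
COPY_FIELDS = {"body": ["body_stem"]}

postings = defaultdict(set)  # (field, term) -> doc ids
stored = {}                  # (doc_id, field) -> raw text


def add_doc(doc_id: int, fields: dict) -> None:
    # Expand copyField directives: each target receives the same raw text.
    expanded = dict(fields)
    for src, dests in COPY_FIELDS.items():
        if src in fields:
            for dest in dests:
                expanded[dest] = fields[src]
    for name, text in expanded.items():
        cfg = SCHEMA[name]
        for term in analyze(text, cfg["stem"]):
            postings[(name, term)].add(doc_id)
        if cfg["stored"]:
            stored[(doc_id, name)] = text


def search(name: str, word: str) -> list:
    # The query word goes through the same analysis as the field it targets.
    term = analyze(word, SCHEMA[name]["stem"])[0]
    return sorted(postings.get((name, term), set()))


add_doc(1, {"body": "Bedtime stories"})
print(search("body", "story"))       # []   unstemmed field only has "stories"
print(search("body_stem", "story"))  # [1]  both sides stem to "stori"
print(len(stored))                   # 1    raw text kept only once
```

Querying body_stem gives stemmed matching and querying body gives exact
matching, which is one way to decide per query whether to stem.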

Why store in the first place? Because all that gets into the
inverted index is the result of the analysis. So if you indexed
"story" with stemming turned on, it might result in "stori" being
in the index. And if you use phonetic filters, it's much worse:
your terms will be something like "UNT4" or "KMPT" which are
totally unsuitable to show the user. So if you want to _search_
phonetically but display the field to the user, you would both
index and store.
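
To see what actually lands in the index, here is a sketch that indexes
phonetic codes while keeping the raw text stored for display. The codes
mentioned above ("UNT4", "KMPT") look like a Metaphone-style filter; this
swaps in classic American Soundex only because it fits in a few lines.
The stored/index dicts are hypothetical stand-ins for the stored fields
and the inverted index.

```python
# Classic American Soundex, used here as a stand-in phonetic filter.
_CODES = {ch: digit
          for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                 ("l", "4"), ("mn", "5"), ("r", "6"))
          for ch in letters}


def soundex(word: str) -> str:
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    prev = _CODES.get(word[0], "")
    for ch in word[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "hw":
            # Vowels reset the previous code; 'h' and 'w' are transparent.
            prev = code
    return (result + "000")[:4]


# One document: the raw text is stored, only the phonetic codes are indexed.
stored = {1: "Bedtime Stories"}
index = {}  # code -> set of doc ids
for doc_id, text in stored.items():
    for token in text.split():
        index.setdefault(soundex(token), set()).add(doc_id)

print(sorted(index))                        # ['B350', 'S362']  what the index holds
hits = index.get(soundex("storys"), set())  # a misspelled query still matches
print([stored[d] for d in sorted(hits)])    # ['Bedtime Stories']  what the user sees
```

The search side only ever sees codes like S362; what you show the user
has to come from the stored copy.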

And even if you could recover the terms from the inverted
index as they were fed in, it would be a very expensive
process. Luke does this; you might try reconstructing
a document with Luke to see what a reconstructed doc
looks like, and how long it takes.
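
Why reconstruction is expensive can be sketched with a toy positional
index: there is no document-to-terms map, so rebuilding one document
means walking the postings of every term in the index, and what comes
back is the analyzed terms, not the original text. This is not Luke's
code, just an illustration assuming positions were recorded;
analyze(), build_index() and reconstruct() are hypothetical helpers.

```python
from collections import defaultdict

STOPWORDS = {"a", "of", "the"}


def analyze(text):
    """Toy analyzer: lowercase, whitespace split, drop stopwords (keeping
    their positions as gaps) and turn a trailing 'y' into 'i'."""
    for pos, tok in enumerate(text.lower().split()):
        if tok in STOPWORDS:
            continue
        yield pos, (tok[:-1] + "i" if tok.endswith("y") else tok)


def build_index(docs):
    # term -> {doc_id: [positions]}; note there is no doc -> terms map.
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in analyze(text):
            index[term][doc_id].append(pos)
    return index


def reconstruct(index, doc_id):
    """Rebuild a document's token stream by walking every term's postings."""
    slots = {}
    terms_visited = 0
    for term, postings in index.items():
        terms_visited += 1            # cost grows with the whole vocabulary
        for pos in postings.get(doc_id, ()):
            slots[pos] = term
    length = max(slots) + 1 if slots else 0
    # Positions with no surviving term (removed stopwords) show up as "_".
    return [slots.get(p, "_") for p in range(length)], terms_visited


index = build_index({1: "The Story of a Dog", 2: "A Cat Story"})
words, cost = reconstruct(index, 1)
print(" ".join(words), cost)  # _ stori _ _ dog 3
```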

Hope that helps
Erick

On Tue, Apr 24, 2012 at 10:40 AM, Andrew Wagner <wagner.and...@gmail.com> wrote:
> I'm sorry, I'm missing something. What's the difference between "storing"
> and "indexing" a field?
>
> On Tue, Apr 24, 2012 at 10:28 AM, Paul Libbrecht <p...@hoplahup.net> wrote:
>
>>
>> On 24 Apr 2012 at 17:16, Otis Gospodnetic wrote:
>> > This would not necessarily increase the size of your index that much -
>> you don't need to store both fields, just 1 of them if you really need it for
>> highlighting or displaying.  If not, just index.
>>
>> I second this.
>> The query expansion process is far from being a slow thing... you can
>> easily expand to tens of fields with a fairly small penalty.
>>
>> Where you do pay a penalty is with stored fields... these need to be
>> avoided as much as possible.
>> As long as you keep them small, the legendary performance of SOLR will
>> still hold.
>>
>> paul
