Re: No Analyzer, tokenizer or stemmer works at Solr

Erick Erickson Thu, 07 Jan 2010 12:53:47 -0800

Well, I'd approach either of these use cases
by simply performing my computations on
the input and storing the result in another
(non-indexed unless I wanted to search it)
field. This wouldn't happen in the Analyzer,
but in the code that populated the document
fields.


Which is a much cleaner solution IMO than creating
some sort of "index this but store that" capability.
The purpose of analysis is to produce *searchable*
tokens after all.
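
A rough sketch of that approach, applied to the two use cases quoted below. Python is used purely for illustration, and the field names title_md5 and price_category are made up. The 500 threshold comes from the quoted example. The client code that builds the document computes the derived values itself and puts them in ordinary fields before anything is sent to Solr:

```python
import hashlib

# Threshold taken from the quoted use case; adjust to taste.
EXPENSIVE_THRESHOLD = 500


def build_document(title, label, owner, price):
    """Compute derived values up front and place them in their own fields."""
    return {
        "title": title,
        "label": label,
        "owner": owner,
        "price": price,
        # Hypothetical derived fields: these would be declared in the schema
        # as stored, and indexed only if they need to be searchable.
        "title_md5": hashlib.md5(title.encode("utf-8")).hexdigest(),
        "price_category": "expensive" if price > EXPENSIVE_THRESHOLD else "cheap",
    }


doc = build_document("The good, the bad and the ugly", "western", "mitch", 750)
print(doc["price_category"])
```

The analyzer chain is untouched here: it still only has to produce searchable tokens, and the client reads the precomputed values back from the stored fields.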

But we're getting into angels dancing on pins here. Do
you actually have a use case you're trying to implement
or is this mostly theoretical?

Erick

On Thu, Jan 7, 2010 at 2:08 PM, MitchK <mitc...@web.de> wrote:

>
> The difference between stored and indexed is clear now.
>
> You are right, if you are responding only to "normal users".
>
> Use case:
> You have a stored field "The good, the bad and the ugly".
> And you have a really fantastic analyzer, which does some magic to this
> movie title.
> Let's say the analyzer translates the title into an MD5 hash or into another
> abstract expression.
> Instead of running the same magical function on the client side again and
> again, the client only needs to take the prepared data from your response.
>
> Another use case could be:
> Imagine you have two categories, cheap and expensive, and your document
> has a title, a label, an owner and a price field.
> Imagine you analyze, index and store them as you normally do, and
> afterwards you want to set whether the document belongs to the expensive
> item group or not.
> If the price of the item is higher than $500, it belongs to the expensive
> ones, otherwise not.
> I think this would be a job for a special analyzer, and it only makes
> sense if I also store the analyzed data.
>
> I think information retrieval is a really interesting use case.
>
>
> Erick Erickson wrote:
> >
> > What is your use case for "responding sometimes with the indexed value"?
> > Other than reconstructing a field that hasn't been stored, I can't think
> > of one.
> >
> > I still think you're missing the point. Indexing and storing are
> > orthogonal operations that have (almost) nothing to do with each
> > other, for all that they happen at the same time on the same field.
> >
> > You never search against the stored data in a field. You *always*
> > search against the indexed data.
> >
> > Contrariwise, you never display the indexed form to the user, you
> > *always* show the stored data (unless you come up with
> > a really interesting use case).
> >
> > Step back and consider what happens when you index data,
> > it gets broken up all kinds of ways. Stop words are removed,
> > case may change, etc, etc, etc. It makes no sense to
> > then display this data for a user. Take, say, a movie title
> > "The Good, The Bad, and The Ugly". Remove stopwords,
> > punctuation and lowercase, and you index three tokens:
> > "good", "bad", "ugly".
> > Even if you reconstruct this field, the user would see
> > "good bad ugly". Bad, very bad.
> >
> > Yet I want to display the original title to the user in
> > response to searching on "ugly", so I need the
> > original, unanalyzed data.
> >
> > Perhaps it would help to think of it this way.
> > 1> take some data and index it in f1
> >     but do NOT store it in f1. Store it in f2
> >     but do NOT index it in f2.
> > 2> take that same data, index AND store
> >     it in f3.
> >
> > <1> is almost entirely equivalent to <2>
> > in terms of index resources.
> >
> > Practically though, <1> is harder to use,
> > because you have to remember
> > to use f1 for searching and f2 for getting
> > the raw data.
> >
> > HTH
> > Erick
> >
> > On Thu, Jan 7, 2010 at 12:11 PM, MitchK <mitc...@web.de> wrote:
> >
> >>
> >> Thank you, Ryan. I will have a look at the Lucene material and Luke.
> >>
> >> I think I got it. :)
> >>
> >> Sometimes there will be a need to return both the stored value and
> >> the indexed version of the value.
> >> How can I fulfill such a need? By using copyField on indexed-only fields?
> >>
> >>
> >>
> >> ryantxu wrote:
> >> >
> >> >
> >> > On Jan 7, 2010, at 10:50 AM, MitchK wrote:
> >> >
> >> >>
> >> >> Erick,
> >> >>
> >> >> you mean, everything is okay, but I do not see it?
> >> >>
> >> >>>> Internally for searching the analysis takes place and writes to the
> >> >>>> index in an inverted fashion, but the stored stuff is left alone.
> >> >>
> >> >> If I use an analyzer, Solr "stores" its output two ways?
> >> >> One public output, which is similar to the original input
> >> >> and one "hidden" or internal output, which is based on the
> >> >> analyzer's work?
> >> >> Did I understand that right?
> >> >
> >> > yes.
> >> >
> >> > indexed fields and stored fields are different.
> >> >
> >> > Solr results show stored fields in the results (however facets are
> >> > based on indexed fields)
> >> >
> >> > Take a look at Lucene in Action for a better description of what is
> >> > happening.  The best tool to get your head around what is happening is
> >> > probably luke (http://www.getopt.org/luke/)
> >> >
> >> >
> >> >>
> >> >> If yes, I have got another problem:
> >> >> I don't want to waste any disk space.
> >> >
> >> > You have control over what is stored and what is indexed -- how that
> >> > is configured is up to you.
> >> >
> >> > ryan
> >> >
> >> >
> >>
> >> --
> >> View this message in context:
> >>
> http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27063452.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >>
> >
> >
>
> --
> View this message in context:
> http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27065305.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
