Okay, you're right. It really would be cleaner to do this kind of thing in the code that populates the document for Solr.
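A minimal sketch of that idea, using the two use cases from further down the thread (an md5 of the title, and a cheap/expensive flag with the 500$ threshold). The field names `title_md5` and `price_category` and the helper `prepare_document` are made up for illustration. The sketch only builds the field dictionary; actually sending it to Solr is left out, and the derived fields would presumably be declared as stored (and indexed only if they should be searchable) in the schema.

```python
import hashlib

# Threshold taken from the use case in this thread: above 500$ is "expensive".
PRICE_THRESHOLD = 500


def prepare_document(title, label, owner, price):
    """Build the fields for one document, including derived fields.

    The derived values are computed here, in the indexing client,
    rather than inside an Analyzer.
    """
    return {
        "title": title,
        "label": label,
        "owner": owner,
        "price": price,
        # Hypothetical derived field: md5 of the raw title.
        "title_md5": hashlib.md5(title.encode("utf-8")).hexdigest(),
        # Hypothetical derived field: category based on the price.
        "price_category": "expensive" if price > PRICE_THRESHOLD else "cheap",
    }


doc = prepare_document("The good, the bad and the ugly", "western", "mitch", 650)
print(doc["price_category"], doc["title_md5"])
```

The client then never has to repeat the computation; it just reads the prepared fields from the response.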
Is there a way to prepare a document in the described way with Lucene/Solr before I analyze it?
My use case is to categorize several documents automatically, which means I have to "create" data from the given input by doing some information retrieval.

The problem is that I am really new to Solr and Lucene, as you can see, and I do not know whether there are classes that fit my needs.

Any ideas?


Erick Erickson wrote:
>
> Well, I'd approach either of these use cases
> by simply performing my computations on
> the input and storing the result in another
> (non-indexed unless I wanted to search it)
> field. This wouldn't happen in the Analyzer,
> but in the code that populated the document
> fields.....
>
> Which is a much cleaner solution IMO than creating
> some sort of "index this but store that" capability.
> The purpose of analysis is to produce *searchable*
> tokens after all.
>
> But we're getting into angels dancing on pins here. Do
> you actually have a use case you're trying to implement
> or is this mostly theoretical?
>
> Erick
>
> On Thu, Jan 7, 2010 at 2:08 PM, MitchK <mitc...@web.de> wrote:
>
>>
>> The difference between stored and indexed is clear now.
>>
>> You are right, if you are responding only to "normal users".
>>
>> Use case:
>> You have a stored field "The good, the bad and the ugly".
>> And you have a really fantastic analyzer, which does some magic to this
>> movie title.
>> Let's say the analyzer translates the title into md5 or into another
>> abstract expression.
>> Instead of doing the same magical function on the client's side again and
>> again, he only needs to take the prepared data from your response.
>>
>> Another use case could be:
>> Imagine you have two categories, cheap and expensive, and your document
>> has a title, a label, an owner and a price field.
>> Imagine you would analyze, index and store them like you normally do and
>> afterwards you want to set whether the document belongs to the expensive
>> item group or not.
>> If the price for the item is higher than 500$, it belongs to the expensive
>> ones, otherwise not.
>> I think this would be a job for a special analyzer - and this only makes
>> sense if I also store the analyzed data.
>>
>> I think information retrieval is a really interesting use case.
>>
>>
>> Erick Erickson wrote:
>> >
>> > What is your use case for "responding sometimes with the indexed value"?
>> > Other than reconstructing a field that hasn't been stored, I can't think
>> > of one.
>> >
>> > I still think you're missing the point. Indexing and storing are
>> > orthogonal operations that have (almost) nothing to do with each
>> > other, for all that they happen at the same time on the same field.
>> >
>> > You never search against the stored data in a field. You *always*
>> > search against the indexed data.
>> >
>> > Contrariwise, you never display the indexed form to the user, you
>> > *always* show the stored data (unless you come up with
>> > a really interesting use case).
>> >
>> > Step back and consider what happens when you index data,
>> > it gets broken up all kinds of ways. Stop words are removed,
>> > case may change, etc, etc, etc. It makes no sense to
>> > then display this data for a user. Would you really like
>> > to have, say a movie title "The Good, The Bad, and The
>> > Ugly". Remove stopwords, punctuation and lowercase
>> > and you index three tokens "good", "bad", "ugly".
>> > Even if you reconstruct this field, the user would see
>> > "good bad ugly". Bad, very bad.
>> >
>> > Yet I want to display the original title to the user in
>> > response to searching on "ugly", so I need the
>> > original, unanalyzed data.
>> >
>> > Perhaps it would help to think of it this way.
>> > 1> take some data and index it in f1
>> >    but do NOT store it in f1.
>> >    Store it in f2
>> >    but do NOT index it in f2.
>> > 2> take that same data, index AND store
>> >    it in f3.
>> >
>> > <1> is almost entirely equivalent to <2>
>> > in terms of index resources.
>> >
>> > Practically though, <1> is harder to use,
>> > because you have to remember
>> > to use f1 for searching and f2 for getting
>> > the raw data.
>> >
>> > HTH
>> > Erick
>> >
>> > On Thu, Jan 7, 2010 at 12:11 PM, MitchK <mitc...@web.de> wrote:
>> >
>> >>
>> >> Thank you, Ryan. I will have a look at Lucene's material and Luke.
>> >>
>> >> I think I got it. :)
>> >>
>> >> Sometimes there will be a need to respond with the value on the one hand
>> >> and with the indexed version of the value on the other.
>> >> How can I fulfill such needs? Doing copyField on indexed-only fields?
>> >>
>> >>
>> >>
>> >> ryantxu wrote:
>> >> >
>> >> >
>> >> > On Jan 7, 2010, at 10:50 AM, MitchK wrote:
>> >> >
>> >> >>
>> >> >> Eric,
>> >> >>
>> >> >> you mean, everything is okay, but I do not see it?
>> >> >>
>> >> >>>> Internally for searching the analysis takes place and writes to the
>> >> >>>> index in an inverted fashion, but the stored stuff is left alone.
>> >> >>
>> >> >> if I use an analyzer, Solr "stores" its output two ways?
>> >> >> One public output, which is similar to the original input
>> >> >> and one "hidden" or internal output, which is based on the
>> >> >> analyzer's work?
>> >> >> Did I understand that right?
>> >> >
>> >> > yes.
>> >> >
>> >> > indexed fields and stored fields are different.
>> >> >
>> >> > Solr results show stored fields in the results (however facets are
>> >> > based on indexed fields)
>> >> >
>> >> > Take a look at Lucene in Action for a better description of what is
>> >> > happening. The best tool to get your head around what is happening is
>> >> > probably luke (http://www.getopt.org/luke/)
>> >> >
>> >> >
>> >> >>
>> >> >> If yes, I have got another problem:
>> >> >> I don't want to waste any diskspace.
>> >> >
>> >> > You have control over what is stored and what is indexed -- how that
>> >> > is configured is up to you.
>> >> >
>> >> > ryan
>> >> >
>> >> >
>> >>
>> >> --
>> >> View this message in context:
>> >> http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27063452.html
>> >> Sent from the Solr - User mailing list archive at Nabble.com.
>> >>
>> >>
>> >
>> >
>>
>> --
>> View this message in context:
>> http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27065305.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>
>

--
View this message in context: http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27076795.html
Sent from the Solr - User mailing list archive at Nabble.com.