See below. If this still doesn't make sense, could you show us some examples?
Best
Erick

On Tue, Nov 30, 2010 at 8:33 AM, Greg Smith <audi...@gmail.com> wrote:

> Bernd,
>
> Looking at the results returned in the search results the field is populated
> with all of the information regardless of whether there was an email
> contained in the contents.
>

Right here is what Bernd was talking about. What's returned is the stored,
verbatim text that was input. It is a literal copy. It has not been analyzed,
filtered, or otherwise manipulated. Consider the poor user if this wasn't the
case. You input "The Party is going swimmingly", would you really want the
user to see "parti go swim"?

So the returned data is the literal input, which has nothing to do with what's
searched. Searching is done against the analyzed text.

> Would the way the analysers and tokens are handled be different if using a
> copy field?
>

It's not. A literal copy of the input is sent to the copyField and the
analysis stack you've defined for that field is used. Do note that the raw
data is sent to the copy field, not the analyzed stream.

Try looking at the solr/admin/schema.jsp page (schema browser) to see the
terms, which are the analyzed form of your input for your fields....

You might get some additional mileage out of TermsComponent, see:
http://wiki.apache.org/solr/TermsComponent

> Thanks
>
> On 30 November 2010 10:54, Bernd Fehling <bernd.fehl...@uni-bielefeld.de> wrote:
>
> > On 30.11.2010 10:56, Greg Smith wrote:
> > > Hi,
> > >
> > > I have written a plugin to filter on email types and keep those tokens,
> > > however when I run it in the analysis in the admin it all works fine.
> > >
> > > But when I use the data import handler to import the data and set the
> > > field type it doesn't remove the other tokens and keeps the field in the
> > > original form.
> > >
> > > I have set the query and index analyzers to use the standard tokenizer
> > > factory and my custom email filter only.
> > >
> > > What could be causing this issue?
> > >
> >
> > It sounds like the misunderstanding I had until the end of last week
> > about indexing and storing in solr/lucene databases.
> > I also had several Tokenizers and Filters and thought they weren't
> > working except in the analysis page of admin.
> > As a matter of fact, if they work in the analysis of admin then they
> > work :-)
> > But you can't see it on the search result page, because the search result
> > page always displays the original stored value, _not_ the tokenized or
> > filtered indexed value.
> > The Tokenized/Filtered content is what gets indexed, and it is not shown
> > on the result page.
> > Check with Schema Browser from admin what the indexed content of your
> > Tokenized/Filtered field is.
> >
> > Best regards
> > Bernd
> >
>
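
To make the stored-versus-indexed distinction in this thread concrete, here is a toy sketch in Python. It is not Solr or Lucene code: ToyIndex, analyze, and the email regex are hypothetical names made up for illustration, and the filter only roughly approximates what a custom email filter might do. The point is just that the analysis chain feeds the searchable terms, while the search result hands back the verbatim stored text.

```python
import re

# Hypothetical stand-in for a custom email filter; an assumption, not Solr code.
EMAIL = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")


def analyze(text):
    """Split on whitespace, trim punctuation, lowercase, keep email-like tokens."""
    tokens = (tok.strip(".,;:<>()\"'").lower() for tok in text.split())
    return [tok for tok in tokens if EMAIL.match(tok)]


class ToyIndex:
    """Toy model of one field that is both stored and indexed."""

    def __init__(self):
        self.stored = {}    # doc id -> verbatim input (what results display)
        self.postings = {}  # analyzed term -> set of doc ids (what is searched)

    def add(self, doc_id, text):
        self.stored[doc_id] = text  # literal copy, never analyzed
        for term in analyze(text):
            self.postings.setdefault(term, set()).add(doc_id)

    def search(self, query):
        """Match on analyzed terms, but return the stored text."""
        hits = set()
        for term in analyze(query):
            hits |= self.postings.get(term, set())
        return [self.stored[doc_id] for doc_id in sorted(hits)]

    def terms(self):
        """Rough analogue of what the schema browser shows: the indexed terms."""
        return sorted(self.postings)


index = ToyIndex()
index.add(1, "Please contact Jane.Doe@example.com about the party.")
index.add(2, "No address in this one.")
```

In this sketch, index.terms() holds only the lowercased address, yet index.search("jane.doe@example.com") returns the whole original sentence for document 1. That mirrors the behaviour Greg was seeing: the filter did its job on the indexed side, and the result page shows the stored side.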