Sure. Use copyField to copy it into a new indexed, non-stored field with the following type definition:

<fieldType name="address_email" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.TypeTokenFilterFactory" types="filter_email.txt" enablePositionIncrements="true" useWhitelist="true"/>
  </analyzer>
</fieldType>
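The copyField step might look like the following in schema.xml. This is only a sketch: the field names `content` (the source field holding the extracted page text) and `emails` (the new email-only field) are assumptions, not taken from your schema, so adjust them to match.

```xml
<!-- Hypothetical target field: indexed so it can be searched, not stored -->
<field name="emails" type="address_email" indexed="true" stored="false"/>

<!-- Copy the (assumed) full-text field into it at index time -->
<copyField source="content" dest="emails"/>
```

With this in place, a leading-wildcard query such as `emails:*@gmail.com` aimed at the new field should match documents whose text contains a gmail.com address. Note that the tokenizer does not lowercase, so matching is case-sensitive unless a LowerCaseFilterFactory is added to the chain.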
The content of filter_email.txt is (including the <> signs):

<EMAIL>

Only the email addresses will be left as tokens. You can't display them easily, but you can certainly search on them.

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)


On Thu, Mar 14, 2013 at 2:33 PM, Jorge Luis Betancourt Gonzalez <jlbetanco...@uci.cu> wrote:

> Sorry for the duplicated mail :-(. Any advice on a configuration for
> searching emails in a field that does not contain only email addresses,
> i.e. where the email addresses are embedded in larger text messages?
>
> ----- Original Message -----
> From: "Ahmet Arslan" <iori...@yahoo.com>
> To: solr-user@lucene.apache.org
> Sent: Thursday, March 14, 2013 11:23:47
> Subject: Re: Question about email search
>
> Hi,
>
> Since you have the word delimiter filter in your analysis chain, I am not
> sure whether e-mail addresses are recognised. You can check that on the
> analysis page of the Solr admin UI.
>
> If e-mail addresses are kept as one token, I would use a leading wildcard query:
> &q=*@gmail.com
>
> There was a similar question recently:
> http://search-lucene.com/m/XF2ejnM6Vi2
>
> --- On Thu, 3/14/13, Jorge Luis Betancourt Gonzalez <jlbetanco...@uci.cu> wrote:
>
> > From: Jorge Luis Betancourt Gonzalez <jlbetanco...@uci.cu>
> > Subject: Question about email search
> > To: solr-user@lucene.apache.org
> > Date: Thursday, March 14, 2013, 5:11 PM
> >
> > I'm using Solr 3.6.2 with Nutch to crawl some data. In my schema I have
> > one field with all the content extracted from the page, which could
> > include email addresses. This is the configuration of my schema:
> >
> > <fieldType name="text" class="solr.TextField"
> >            positionIncrementGap="100" autoGeneratePhraseQueries="true">
> >   <analyzer type="index">
> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >     <filter class="solr.StandardFilterFactory"/>
> >     <filter class="solr.ISOLatin1AccentFilterFactory"/>
> >     <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
> >     <charFilter class="solr.HTMLStripCharFilterFactory"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >     <filter class="solr.WordDelimiterFilterFactory"
> >             generateWordParts="1" generateNumberParts="1"
> >             catenateWords="1" catenateNumbers="1" catenateAll="0"
> >             splitOnCaseChange="1"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >   </analyzer>
> > </fieldType>
> >
> > The thing is that I'm trying to search against a field of this type
> > (text) with a value like "@gmail.com", and I expect to get the documents
> > containing that text. Any advice?
> >
> > Regards
> > --
> > "It is only in the mysterious equation of love that any logical reasons can be found."
> > "Good programmers often confuse halloween (31 OCT) with christmas (25 DEC)"