Hello Jack, Steve,

Thank you for your answers. I've never used UAX29URLEmailTokenizerFactory, but I read about it before trying RegExp queries. As far as I know, UAX29URLEmailTokenizerFactory splits an input text value into tokens that match URL, E-mail, etc. patterns. Reading the documentation, I haven't found any way to select just the E-mail patterns and not the URL ones, for example. I feel it would make sense to specify one or more patterns in a configuration file, set as part of the Tokenizer definition in schema.xml, but I found nothing.
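For what it's worth, if I understand the Lucene docs correctly, the tokenizer itself has no such option, but it labels every token with a type (`<EMAIL>`, `<URL>`, ...). A `solr.TypeTokenFilterFactory` placed after it, with `useWhitelist="true"` and a types file listing `<EMAIL>`, should then keep only the E-mail tokens; I have not verified this on my Solr version. The Java sketch below only simulates that "tokenize, label, keep whitelisted types" idea with java.util.regex: both patterns are my own simplified assumptions (much looser than the real UAX#29 rules), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

public class EmailTypeFilterSketch {
    // Simplified, assumed patterns; not the real UAX#29 / RFC grammar.
    private static final Pattern EMAIL =
        Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");
    private static final Pattern URL =
        Pattern.compile("https?://\\S+");

    /** A token plus its type label, mimicking <EMAIL>/<URL>/<ALPHANUM>. */
    record Token(String text, String type) {}

    /** Splits on whitespace and labels each piece with a token type. */
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (piece.isEmpty()) continue; // blank input yields one empty piece
            String type;
            if (EMAIL.matcher(piece).matches()) type = "<EMAIL>";
            else if (URL.matcher(piece).matches()) type = "<URL>";
            else type = "<ALPHANUM>";
            tokens.add(new Token(piece, type));
        }
        return tokens;
    }

    /** Keeps only tokens whose type is whitelisted, like a type filter would. */
    static List<String> keepTypes(List<Token> tokens, Set<String> whitelist) {
        List<String> kept = new ArrayList<>();
        for (Token t : tokens) {
            if (whitelist.contains(t.type())) kept.add(t.text());
        }
        return kept;
    }

    public static void main(String[] args) {
        String text = "Write to jack@example.com or see http://example.org today";
        // Only the E-mail token survives; the URL token is dropped.
        System.out.println(keepTypes(tokenize(text), Set.of("<EMAIL>")));
    }
}
```

Swapping the whitelist to `<URL>` would keep the URLs instead, which is the kind of per-pattern selection I was looking for in the tokenizer configuration.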
I just want to retrieve the indexed documents in which at least one E-mail address appears inside the text. However, even using UAX29URLEmailTokenizerFactory, and supposing that I store that E-mail data in a field called 'emails' (I feel creative, hehe), a query like the following seems... dirty:

http://localhost:8080/mysolr/select?q=emails:[* TO *]&start=0&rows=10&sort=mydate desc

What do you think?

And Andy... I know many RegExps to find E-mail patterns in a text; that wasn't my question, and of course there is no perfect one. However, Lucene RegExp syntax is different from the classic RegExp syntax, so it is not as easy as copying and pasting any RegExp and, voilà! E-mails everywhere.

Thank you very much in advance,

Best regards,

2013/7/30 Jack Krupansky <j...@basetechnology.com>

> Just use the UAX29URLEmailTokenizerFactory, which recognizes email
> addresses.
>
> Any particular reason that you're trying to reinvent the wheel?
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Luis Cappa Banda
> Sent: Tuesday, July 30, 2013 10:53 AM
> To: solr-user@lucene.apache.org
> Subject: Email regular expression.
>
>
> Hello everyone!
>
> Unfortunately I have to search all E-mail addresses found in a text field
> from each document. I've been reading for a while how to use RegExp's in
> Solr, but after trying some of them they didn't work. I've noticed that
> Lucene RegExp syntax sometimes is very different from the classic RegExp
> syntax, so that may be the reason why they didn't work for me, and maybe
> someone more expert can help me.
>
> The syntax is the following:
>
> E-mail:
>
> text:/[a-z0-9_\|-]+(\.[a-z0-9_\|-]|)*@[a-z0-9-]|(\.[a-z0-9-]|)*\.([a-z]{2,4})/
>
> Thank you very much in advance!
>
> Best regards,
>
> --
> - Luis Cappa

--
- Luis Cappa
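On the "dirty" query: as far as I know, `emails:[* TO *]` is the usual Lucene/Solr idiom for "this field has at least one value", so it should work as written. An alternative, assuming we control the indexing code, is to extract the addresses before sending each document and also store a boolean flag, so the query becomes `q=has_email:true`. The plain-Java sketch below shows that idea; the `has_email` field, the class and method names, and the simplified pattern are all my own hypothetical choices, not anything provided by Solr.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailFieldSketch {
    // Simplified, assumed E-mail pattern; not a full RFC 5322 matcher.
    private static final Pattern EMAIL = Pattern.compile(
        "[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)*\\.[A-Za-z]{2,}");

    /** Returns every E-mail-like substring of the text, in order of appearance. */
    static List<String> extractEmails(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) found.add(m.group());
        return found;
    }

    /** Builds the extra fields for one document before it is sent to Solr. */
    static Map<String, Object> emailFields(String text) {
        List<String> emails = extractEmails(text);
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("emails", emails);               // multivalued 'emails' field
        fields.put("has_email", !emails.isEmpty()); // hypothetical boolean flag
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(emailFields("Contact jack@example.com, thanks."));
        // prints {emails=[jack@example.com], has_email=true}
    }
}
```

The flag costs one extra field per document, but the query then reads as exactly what it means.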