A faster way to do the regex transform is to use the Pattern tokenizer or the PatternReplace filter. These live in the schema (analysis) processing tree, not in the DIH tree.
You would use <copyField> to copy the data from your input field into a second field whose type uses the regex pattern analyzer. Look in schema.xml for an example of using the Pattern tools.

On Thu, May 10, 2012 at 4:54 AM, Husain, Yavar <yhus...@firstam.com> wrote:
> Thanks Jack.
>
> I tried (Regex Transformer) it out and the indexing has gone really slow. Is
> it (RegEx Transformer) slower than N-Gram Indexing? I mean they may be apples
> and oranges but what I mean is finally after extracting the field I want to
> NGram Index it. So It seems going in for NGram Indexing of Full Text (i.e.
> without extracting what I need using RegexTransformer) is a better solution
> ignoring space complexity??
>
> Any views?
>
> THANKS!!
>
> -----Original Message-----
> From: Jack Krupansky [mailto:j...@basetechnology.com]
> Sent: Thursday, May 10, 2012 4:09 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr On Fly Field creation from full text for N-Gram Indexing
>
> You can use "Regex Transformer" to extract from a source field.
>
> See:
> http://wiki.apache.org/solr/DataImportHandler#RegexTransformer
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Husain, Yavar
> Sent: Thursday, May 10, 2012 6:04 AM
> To: solr-user@lucene.apache.org
> Subject: Solr On Fly Field creation from full text for N-Gram Indexing
>
> I have full text in my database and I am indexing that using Solr. Now at
> runtime i.e. when the indexing is going on can I extract certain parameters
> based on regex and create another field/column on the fly using Solr for that
> extracted text?
>
> For example my DB has just 2 columns (DocId & FullText):
>
> DocId   FullText
> 1       My name is Avi. RoleId: GYUIOP-MN-1087456. .....
>
> Now say while indexing I want to extract RoleId, place it in another column
> created on fly and index that column using N-Gram indexing. I dont want to go
> for N-Gram of Full text as that would be too time expensive.
>
> Thanks!! Any clues would be appreciated.
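
For illustration, the <copyField> plus pattern-analyzer approach suggested in the reply might look roughly like the schema.xml fragment below. This is only a sketch under assumptions: the field name role_id_ngram, the type name roleid_ngram, the text_general type on the source field, the regex, and the gram sizes are all hypothetical and are not taken from the thread. Only the FullText column name and the RoleId example come from the original question.

```xml
<!-- Hypothetical field names, type names, and regex; adapt to your own schema.xml. -->

<!-- Source field holding the full text (type name assumed to exist in your schema). -->
<field name="FullText" type="text_general" indexed="true" stored="true"/>

<!-- Destination field: indexed only. copyField copies the raw input value,
     so storing it here would just duplicate the full text. -->
<field name="role_id_ngram" type="roleid_ngram" indexed="true" stored="false"/>

<copyField source="FullText" dest="role_id_ngram"/>

<fieldType name="roleid_ngram" class="solr.TextField">
  <analyzer type="index">
    <!-- group="1" emits only the text captured by the first group as a token. -->
    <tokenizer class="solr.PatternTokenizerFactory"
               pattern="RoleId:\s*([A-Za-z0-9\-]+)" group="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Gram sizes are placeholder values. -->
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="8"/>
  </analyzer>
  <analyzer type="query">
    <!-- Treat the query as a single lowercased token; no n-gramming at query time. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a setup like this, only the captured RoleId value is n-grammed rather than the whole document, which is the saving the original question was after. Query terms longer than maxGramSize would not match any indexed gram, so the gram sizes would need tuning for the expected queries.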
-- 
Lance Norskog
goks...@gmail.com