A faster way to do the regex transform is to use the Pattern tokenizer
or the PatternReplace filter (PatternTokenizerFactory,
PatternReplaceFilterFactory). These run in the schema's analysis chain,
not in the DIH pipeline.

You would use <copyField> to copy the data from your input field into a
second field whose field type uses the regex pattern analyzer. Look in
schema.xml for an example of using the Pattern tools.
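
As a rough, untested sketch of that setup: the field and type names, the
RoleId pattern, and the gram sizes below are all made up for
illustration, and "fulltext" stands in for whatever your input field is
called.

```xml
<!-- Hypothetical field type: extract the RoleId with a regex, then n-gram it -->
<fieldType name="roleid_ngram" class="solr.TextField">
  <analyzer type="index">
    <!-- group="1" emits only the first capture group as the token -->
    <tokenizer class="solr.PatternTokenizerFactory"
               pattern="RoleId:\s*([A-Z0-9-]+)" group="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- gram sizes are placeholders; tune them for your queries -->
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="10"/>
  </analyzer>
  <analyzer type="query">
    <!-- keep the query as one lowercased token so it matches an indexed gram -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Assumed input field holding the FullText column -->
<field name="fulltext" type="text_general" indexed="true" stored="true"/>
<!-- Derived field: only the extracted RoleId is n-grammed -->
<field name="roleid_ngram" type="roleid_ngram" indexed="true" stored="false"/>

<!-- copyField copies the raw input value; the regex runs at analysis time -->
<copyField source="fulltext" dest="roleid_ngram"/>
```

With this layout the regex runs once per document inside the analyzer,
and only the short RoleId string is n-grammed, not the whole text.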

On Thu, May 10, 2012 at 4:54 AM, Husain, Yavar <yhus...@firstam.com> wrote:
> Thanks Jack.
>
> I tried it (Regex Transformer) out and indexing has become really slow. Is
> it (Regex Transformer) slower than N-Gram indexing? They may be apples
> and oranges, but what I mean is that after extracting the field I still want
> to N-Gram index it. So it seems that N-Gram indexing the full text (i.e.
> without extracting what I need using RegexTransformer) is the better
> solution, ignoring space complexity?
>
> Any views?
>
> THANKS!!
>
> -----Original Message-----
> From: Jack Krupansky [mailto:j...@basetechnology.com]
> Sent: Thursday, May 10, 2012 4:09 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr On Fly Field creation from full text for N-Gram Indexing
>
> You can use "Regex Transformer" to extract from a source field.
>
> See:
> http://wiki.apache.org/solr/DataImportHandler#RegexTransformer
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Husain, Yavar
> Sent: Thursday, May 10, 2012 6:04 AM
> To: solr-user@lucene.apache.org
> Subject: Solr On Fly Field creation from full text for N-Gram Indexing
>
> I have full text in my database and I am indexing it using Solr. At index
> time, i.e. while the indexing is going on, can I extract certain parameters
> based on a regex and have Solr create another field/column on the fly for
> that extracted text?
>
> For example my DB has just 2 columns (DocId & FullText):
>
> DocId    FullText
> 1            My name is Avi. RoleId: GYUIOP-MN-1087456. .....
>
> Now say while indexing I want to extract RoleId, place it in another column
> created on the fly, and index that column using N-Gram indexing. I don't want
> to go for N-Gram of the full text, as that would be too expensive in time.
>
> Thanks!! Any clues would be appreciated.



-- 
Lance Norskog
goks...@gmail.com
