Thanks a bunch, got it working with a reluctant quantifier and the use of &quot; as the escaped representation of double quotes within the regex value, so that the config file doesn't crash & burn:
<field column="imageUrl" regex=".*?img src=&quot;(.*?)&quot;.*" sourceColName="description" />

Cheers,
- Pulkit

On Wed, Sep 14, 2011 at 2:24 PM, Pulkit Singhal <pulkitsing...@gmail.com> wrote:
> Hello,
>
> Feel free to point me to alternate sources of information if you deem
> this question unworthy of the Solr list :)
>
> But until then please hear me out!
>
> When my config is something like:
> <field column="imageUrl"
> regex=".*img src=.(.*)\.gif..alt=.*"
> sourceColName="description"
> />
> I don't get any data.
>
> But when my config is like:
> <field column="imageUrl"
> regex=".*img src=.(.*)..alt=.*"
> sourceColName="description"
> />
> I get the following data as the value for imageUrl:
> http://g-ecx.images-amazon.com/images/G/01/x-locale/common/customer-reviews/stars-5-0._V192240867_.gif"
> width="64"
>
> As the result shows, this is a string that should be able to match
> even on the 1st regex=".*img src=.(.*)\.gif..alt=.*" and produce a
> result like:
> http://g-ecx.images-amazon.com/images/G/01/x-locale/common/customer-reviews/stars-5-0._V192240867_
> But it doesn't!
> Can anyone tell me why that would be the case?
> Is it something about the way RegexTransformer is wired or is it just
> my regex value that isn't right?
>
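
For anyone finding this thread later, here is a minimal Java sketch of how the three patterns differ. It assumes that RegexTransformer evaluates the pattern with java.util.regex and hands back the first capture group of the first match. The class name ImageUrlRegexDemo, the helper firstGroup, and the sample description string (an example.com URL with a width attribute between src and alt) are made up for illustration and are not taken from the feed or from Solr.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImageUrlRegexDemo {
    // Hypothetical sample value for the "description" column (not real feed data).
    static final String DESCRIPTION =
        "<p><img src=\"http://example.com/stars-5-0.gif\" width=\"64\" alt=\"5 stars\" /></p>";

    // Returns capture group 1 of the first match, or null when nothing matches.
    // Assumption: this mirrors how the transformer extracts a single group.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // 1st pattern: needs exactly two characters between ".gif" and "alt=".
        // In this sample a width attribute sits in between, so nothing matches.
        System.out.println(firstGroup(".*img src=.(.*)\\.gif..alt=.*", DESCRIPTION));

        // 2nd pattern: the greedy (.*) runs on until just before "alt=",
        // so the captured value drags the width attribute along.
        System.out.println(firstGroup(".*img src=.(.*)..alt=.*", DESCRIPTION));

        // Working pattern: the reluctant (.*?) stops at the first closing quote.
        System.out.println(firstGroup(".*?img src=\"(.*?)\".*", DESCRIPTION));
    }
}
```

With this sample the first call yields null, the second yields the URL plus trailing attribute text, and the third yields only the URL. Note that the double quote is escaped as \" in a Java string literal, whereas inside the XML attribute it has to be written as &quot;.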