DIH : RegexTransformer with groupNames requires all groups to be not empty?

Chantal Ackermann Tue, 03 Nov 2009 05:46:09 -0800

follow-up:


regex="([^\|]+)\|\d+,\d+,\d+,(.+)"

is the version I chose after I had the following problems with
regex="([^\|]+)\|\d+,\d+,\d+,(.*)"
(changed * into + for the second group):

The role field contained empty values even though I had added a TrimFilterFactory with a minimum length of 1. So I changed the regular expression to match only non-empty values. It does that now, but if it cannot find a value for the second group, it does not even add the value for the first group.
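To make the difference between the two expressions concrete, here is a minimal sketch that runs both against the two sample lines from my original message (quoted below). It uses Python's `re` module instead of Java's regex engine, on the assumption that both treat these particular patterns the same way; the helper `groups` is only there for the illustration and has nothing to do with the DIH code.

```python
import re

# Patterns copied from the DIH config; only the quantifier of group 2 differs.
PLUS = re.compile(r"([^\|]+)\|\d+,\d+,\d+,(.+)")
STAR = re.compile(r"([^\|]+)\|\d+,\d+,\d+,(.*)")

version_a = "Daniel Radcliffe|24897,1,1,Harry Potter"
version_b = "Daniel Radcliffe|24897,1,1,"


def groups(pattern, line):
    """Return the captured groups, or None when the pattern does not match."""
    m = pattern.search(line)
    return m.groups() if m else None


for name, pattern in (("+", PLUS), ("*", STAR)):
    for line in (version_a, version_b):
        print(name, repr(line), "->", groups(pattern, line))
```

With `(.+)` the whole expression fails on version B, so there is no first group to add either. With `(.*)` it matches, but the second group is the empty string.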


Any help on getting this solved is greatly appreciated.
It boils down to this question:

- How can I make the RegexTransformer add a value only if it is non-empty, without it adding values only when all of the groups contain values?

Maybe the configuration with groupNames is meant to work like that. If that is the case, it is probably worth adding this information to the Wiki. I will change back to using the sourceCol attribute, as
https://issues.apache.org/jira/browse/SOLR-1498
should be fixed in this 1.4.0 RC version now.
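
For reference, the behaviour I am asking for (each group is added on its own, but only when it is non-empty) can be sketched outside of Solr like this. It is plain Python, not DIH code; `extract_fields` and `GROUP_NAMES` are hypothetical names made up for the illustration, and I am not claiming the RegexTransformer can be configured to behave this way.

```python
import re

# The "*" variant matches even when the role is missing.
PATTERN = re.compile(r"([^\|]+)\|\d+,\d+,\d+,(.*)")

# Hypothetical stand-in for the groupNames attribute.
GROUP_NAMES = ("participant", "role")


def extract_fields(line):
    """Map group names to captured values, leaving out empty groups."""
    m = PATTERN.search(line)
    if m is None:
        return {}
    return {name: value for name, value in zip(GROUP_NAMES, m.groups()) if value}


print(extract_fields("Daniel Radcliffe|24897,1,1,Harry Potter"))
print(extract_fields("Daniel Radcliffe|24897,1,1,"))
```

The second call returns only the participant, which is what I would like to end up with in the index.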

Thanks!
Chantal

Chantal Ackermann wrote:

Dear all,

my DIH config contains the following directive for the RegexTransformer:

<field column="person" groupNames="participant,role"
regex="([^\|]+)\|\d+,\d+,\d+,(.+)" />

(this is SOLR 1.4.0 RC downloaded yesterday from Grant's URL)

It expects input of the kind (version A):
Daniel Radcliffe|24897,1,1,Harry Potter

It should also work with (version B):
Daniel Radcliffe|24897,1,1,

In my index, however, I can only find documents that either contain
participant and role or neither. Of course, I didn't check all
documents. But for both fields, Luke shows the same number of documents:
Docs:  47015

(There are definitely datasets that contain participants without role.)

I'll check the code and try a different configuration (using
sourceCol). But I thought I'd spread the news before the release is final.

Thanks,
Chantal
