Re: Copy field and regex

Erick Erickson Fri, 08 Dec 2017 10:01:00 -0800

I think you're getting confused by seeing the _stored_ data rather
than the indexed data. When you return fields in documents, you get
the stored data which is a verbatim copy of the input, no analysis
done at all. To see what's in the index (and thus what would be
grouped on) look at:


adminUI>>analysis>>(your field): put in some sample values and see
what the regex tokenizer does. NOTE: uncheck the "verbose" box for
less clutter.
or
adminUI>>(select core)>>schema browser
or
the TermsComponent
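
As a rough illustration of the stored-versus-indexed distinction, here is a
plain Python sketch (not Solr itself). The helper name analyze_chapter is made
up for this example, and it only approximates what a pattern tokenizer with
group="1" would emit; the admin UI analysis page remains the authoritative
check.

```python
import re

# Pattern copied from the fieldType in the quoted message below.
CHAPTER_PATTERN = re.compile(r"^(\w+\.\d+)\.\d+-*\d*$")

def analyze_chapter(value):
    """Hypothetical stand-in for the tokenizer: return group 1 of each match."""
    return [m.group(1) for m in CHAPTER_PATTERN.finditer(value)]

for usfm in ["MAT.10.1", "MAT.10.10", "JHN.3.16", "MAT.6.33-34"]:
    stored = usfm                    # what a query returns for the field
    indexed = analyze_chapter(usfm)  # roughly what grouping would operate on
    print(f"stored={stored!r} indexed={indexed!r}")
```

The stored value is untouched in every case; only the indexed side is reduced
to the book-and-chapter prefix.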

If you require the stored value to be different, you have several choices:
1> change it on the client side before ingestion
2> use one of the field-mutating update processor classes
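
For choice 1, a minimal client-side sketch in Python might look like the
following. The function name add_chapter and the stricter pattern are
assumptions made for illustration, not part of the original schema; the sample
documents reuse values from the quoted message below.

```python
import re

# Stricter variant of the quoted pattern (an assumption): three word
# characters, chapter digits, verse digits, then an optional "-digits" range.
USFM_RE = re.compile(r"^(\w{3}\.\d+)\.\d+(?:-\d+)?$")

def add_chapter(doc):
    """Return a copy of doc with a 'chapter' field derived from 'usfm'."""
    match = USFM_RE.match(doc["usfm"])
    if match is None:
        raise ValueError(f"unexpected USFM value: {doc['usfm']!r}")
    return {**doc, "chapter": match.group(1)}

docs = [
    {"usfm": "MAT.10.1", "devo_keywords_en": "fear"},
    {"usfm": "MAT.10.10", "devo_keywords_en": "fear"},
]
prepared = [add_chapter(d) for d in docs]  # send these to Solr instead of docs
print(prepared)
```

With this approach the copyField into chapter would presumably be dropped, and
chapter could be a plain string field, since the client already supplies the
value to store and index.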

Most often, people don't bother storing the copyField destination, since
the stored value is available in the original field; the destination is
just used for things like the grouping you're interested in.

Best,
Erick

On Fri, Dec 8, 2017 at 8:56 AM, Bradley Belyeu
<bradley.belyeu@life.church> wrote:
> I’m struggling a bit getting a copy field & regex tokenizer to work like I 
> think it should…
> I have an open source project I’m just starting out with here: 
> https://github.com/youversion/solrcloud
> I have a uniqueKey field USFM defined as:
> <field name="usfm" type="string" indexed="true" required="true" stored="true" />
> And a USFM will always be in the pattern of 3 characters followed by a period 
> followed by one or more digits followed by another period and finally one or 
> more digits.
> Optionally after the final digit there may be a hyphen and another digit.
> IE: JHN.3.16 or MAT.6.33-34
>
> I’m wanting to do a result grouping by the first three characters, period, & 
> digit(s). For example, docs with the unique keys JHN.3.16 & JHN.3.17 I would 
> want grouped together.
> So my thought was to define another field and then copy the USFM into it and 
> use the regex tokenizer defined as so:
>
>     <fieldType name="chapter" class="solr.TextField" positionIncrementGap="0">
>         <analyzer>
>             <tokenizer class="solr.PatternTokenizerFactory" pattern="^(\w+\.\d+)\.\d+-*\d*$" group="1" />
>         </analyzer>
>     </fieldType>
>     <field name="chapter" type="chapter" indexed="true" required="true" stored="true" />
>     <copyField source="usfm" dest="chapter" />
>
> BUT, when I import my data the entire USFM is being stored inside the chapter 
> field. And I get query results that look like:
>        {
>         "usfm":"MAT.10.1",
>         "chapter":"MAT.10.1",
>         "devo_keywords_en":"fear",
>         "_version_":1586184983451533312},
>       {
>         "usfm":"MAT.10.10",
>         "chapter":"MAT.10.10",
>         "devo_keywords_en":"fear",
>         "_version_":1586184983451533314},
>       {
>         "usfm":"MAT.10.11",
>         "chapter":"MAT.10.11",
>         "devo_keywords_en":"fear",
>         "_version_":1586184983451533316},
>       {
>         "usfm":"MAT.10.12",
>         "chapter":"MAT.10.12",
>         "devo_keywords_en":"fear",
>         "_version_":1586184983451533318}
>
> It’s probably something simple I’ve missed, but I’ve been banging my head for 
> long enough I thought I’d ask for help.
> Thanks in advance!
