I’m struggling to get a copyField and a regex tokenizer to work the way I 
think they should…
I have an open source project I’m just starting out with here: 
https://github.com/youversion/solrcloud
I have a uniqueKey field USFM defined as:
<field name="usfm" type="string" indexed="true" required="true" stored="true" />
A USFM always follows the same pattern: 3 characters, a period, one or more 
digits, another period, and finally one or more digits.
Optionally, the final digits may be followed by a hyphen and one or more 
further digits.
E.g.: JHN.3.16 or MAT.6.33-34
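
To make the format concrete, here is a minimal Python sketch of the pattern as I understand it. The names USFM_RE and is_usfm are only for illustration and are not part of my Solr config, and I'm assuming the 3-character code is made of uppercase letters or digits, which nothing in my schema enforces.

```python
import re

# Assumed shape: 3-character book code, ".", chapter digits, ".", verse digits,
# optionally followed by "-" and more digits for a verse range.
USFM_RE = re.compile(r"[A-Z0-9]{3}\.\d+\.\d+(?:-\d+)?")


def is_usfm(value: str) -> bool:
    """Return True if the whole string matches the assumed USFM shape."""
    return USFM_RE.fullmatch(value) is not None


if __name__ == "__main__":
    for ref in ["JHN.3.16", "MAT.6.33-34", "JHN.3"]:
        print(ref, is_usfm(ref))
```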

I want to group results by the first three characters, the period, and the 
digit(s) that follow. For example, docs with the unique keys JHN.3.16 and 
JHN.3.17 should be grouped together.
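
Outside of Solr, the grouping I'm after would look roughly like this Python sketch. The helpers chapter_key and group_by_chapter are hypothetical names I made up for the example, and the sketch assumes every key has the book.chapter.verse shape described above.

```python
from collections import defaultdict


def chapter_key(usfm: str) -> str:
    """Drop the trailing ".verse" part, e.g. "JHN.3.16" -> "JHN.3"."""
    return usfm.rsplit(".", 1)[0]


def group_by_chapter(usfms):
    """Group USFM keys by their book-and-chapter prefix."""
    groups = defaultdict(list)
    for usfm in usfms:
        groups[chapter_key(usfm)].append(usfm)
    return dict(groups)


if __name__ == "__main__":
    print(group_by_chapter(["JHN.3.16", "JHN.3.17", "MAT.6.33-34"]))
```
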
So my thought was to define another field, copy the USFM into it, and use the 
regex tokenizer, defined like so:

    <fieldType name="chapter" class="solr.TextField" positionIncrementGap="0">
        <analyzer>
            <tokenizer class="solr.PatternTokenizerFactory" pattern="^(\w+\.\d+)\.\d+-*\d*$" group="1" />
        </analyzer>
    </fieldType>
    <field name="chapter" type="chapter" indexed="true" required="true" stored="true" />
    <copyField source="usfm" dest="chapter" />
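
As a sanity check on the regex itself, separate from Solr, this small Python sketch applies the same pattern and prints capture group 1. It uses Python's re module rather than the Java regex engine Solr runs on, so treating the two as equivalent for this pattern is an assumption on my part; extract_chapter is a hypothetical helper name.

```python
import re

# Same pattern as in the tokenizer definition; group 1 is the book.chapter part.
CHAPTER_RE = re.compile(r"^(\w+\.\d+)\.\d+-*\d*$")


def extract_chapter(usfm: str):
    """Return capture group 1 if the pattern matches, otherwise None."""
    match = CHAPTER_RE.match(usfm)
    return match.group(1) if match else None


if __name__ == "__main__":
    for ref in ["MAT.10.1", "MAT.10.10", "MAT.6.33-34"]:
        print(ref, "->", extract_chapter(ref))
```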

But when I import my data, the entire USFM ends up stored in the chapter 
field, and I get query results that look like:
      {
        "usfm":"MAT.10.1",
        "chapter":"MAT.10.1",
        "devo_keywords_en":"fear",
        "_version_":1586184983451533312},
      {
        "usfm":"MAT.10.10",
        "chapter":"MAT.10.10",
        "devo_keywords_en":"fear",
        "_version_":1586184983451533314},
      {
        "usfm":"MAT.10.11",
        "chapter":"MAT.10.11",
        "devo_keywords_en":"fear",
        "_version_":1586184983451533316},
      {
        "usfm":"MAT.10.12",
        "chapter":"MAT.10.12",
        "devo_keywords_en":"fear",
        "_version_":1586184983451533318}

It’s probably something simple I’ve missed, but I’ve been banging my head 
long enough that I thought I’d ask for help.
Thanks in advance!
