I’m struggling to get a copy field and regex tokenizer to work the way I think they should. I’m just starting out with an open source project here: https://github.com/youversion/solrcloud

I have a uniqueKey field, USFM, defined as:

    <field name="usfm" type="string" indexed="true" required="true" stored="true" />

A USFM always follows the same pattern: three characters, a period, one or more digits, another period, and finally one or more digits. Optionally, the final digits may be followed by a hyphen and more digits. For example: JHN.3.16 or MAT.6.33-34
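To make the key format concrete, here is a minimal sketch of the pattern described above, written in Python. The names `USFM_RE` and `is_usfm` are made up for illustration, and treating the "three characters" as word characters (`\w`) is an assumption, not something the schema enforces.

```python
import re

# Assumed format: three word characters, ".", digits, ".", digits,
# optionally followed by "-" and more digits (e.g. MAT.6.33-34).
USFM_RE = re.compile(r"\w{3}\.\d+\.\d+(?:-\d+)?")


def is_usfm(value: str) -> bool:
    """Return True if the whole string matches the assumed USFM key format."""
    return USFM_RE.fullmatch(value) is not None


if __name__ == "__main__":
    for key in ["JHN.3.16", "MAT.6.33-34", "JHN.3"]:
        print(key, is_usfm(key))
```

Using `fullmatch` rather than anchors means the whole key has to fit the pattern, so a partial key such as JHN.3 is rejected.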
I want to group results by the first three characters, the period, and the digit(s) that follow. For example, I would want docs with the unique keys JHN.3.16 and JHN.3.17 grouped together. So my thought was to define another field, copy the USFM into it, and use the regex tokenizer, defined like so:

    <fieldType name="chapter" class="solr.TextField" positionIncrementGap="0">
      <analyzer>
        <tokenizer class="solr.PatternTokenizerFactory" pattern="^(\w+\.\d+)\.\d+-*\d*$" group="1" />
      </analyzer>
    </fieldType>
    <field name="chapter" type="chapter" indexed="true" required="true" stored="true" />
    <copyField source="usfm" dest="chapter" />

But when I import my data, the entire USFM is stored in the chapter field, and I get query results that look like this:

    { "usfm":"MAT.10.1", "chapter":"MAT.10.1", "devo_keywords_en":"fear", "_version_":1586184983451533312},
    { "usfm":"MAT.10.10", "chapter":"MAT.10.10", "devo_keywords_en":"fear", "_version_":1586184983451533314},
    { "usfm":"MAT.10.11", "chapter":"MAT.10.11", "devo_keywords_en":"fear", "_version_":1586184983451533316},
    { "usfm":"MAT.10.12", "chapter":"MAT.10.12", "devo_keywords_en":"fear", "_version_":1586184983451533318}

It’s probably something simple I’ve missed, but I’ve been banging my head against this long enough that I thought I’d ask for help. Thanks in advance!
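P.S. As a sanity check on the pattern itself, here is a minimal sketch that applies the same regex outside Solr. It uses Python's `re` module, on the assumption that it behaves like Java's regex engine for this particular pattern. The names `CHAPTER_RE` and `chapter_of` are hypothetical helpers, not anything from Solr.

```python
import re

# Same pattern as in the fieldType above; group 1 is the "chapter" part.
CHAPTER_RE = re.compile(r"^(\w+\.\d+)\.\d+-*\d*$")


def chapter_of(usfm: str):
    """Return capture group 1 (book and chapter), or None if there is no match."""
    match = CHAPTER_RE.match(usfm)
    return match.group(1) if match else None


if __name__ == "__main__":
    for key in ["JHN.3.16", "JHN.3.17", "MAT.10.12", "MAT.6.33-34"]:
        print(key, "->", chapter_of(key))
```

With these inputs, group 1 comes out as the book and chapter (JHN.3, JHN.3, MAT.10, MAT.6), so the regex on its own seems to capture what I intended.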