I think you're getting confused by seeing the _stored_ data rather than the indexed data. When you return fields in documents, you get the stored data which is a verbatim copy of the input, no analysis done at all. To see what's in the index (and thus what would be grouped on) look at:
  adminUI>>analysis>>(your field), put in some sample values, and see what the regex tokenizer does. NOTE: uncheck the "verbose" box for less clutter.
or
  adminUI>>(select core)>>schema browser
or
  TermsComponent

If you require the stored value to be different, you have several choices:
1> change it on the client side before ingestion
2> use one of the field mutating classes

Most often, people don't bother storing the copyField destination, since the stored value is available in the original field; the copyField destination is just used for things like what you're interested in.

Best,
Erick

On Fri, Dec 8, 2017 at 8:56 AM, Bradley Belyeu <bradley.belyeu@life.church> wrote:
> I’m struggling a bit getting a copy field & regex tokenizer to work like I
> think it should…
> I have an open source project I’m just starting out with here:
> https://github.com/youversion/solrcloud
> I have a uniqueKey field USFM defined as:
> <field name="usfm" type="string" indexed="true" required="true" stored="true" />
> And a USFM will always be in the pattern of 3 characters followed by a period
> followed by one or more digits followed by another period and finally one or
> more digits.
> Optionally after the final digit there may be a hyphen and another digit.
> IE: JHN.3.16 or MAT.6.33-34
>
> I’m wanting to do a result grouping by the first three characters, period, &
> digit(s). For example, docs with the unique keys JHN.3.16 & JHN.3.17 I would
> want grouped together.
> So my thought was to define another field and then copy the USFM into it and
> use the regex tokenizer defined as so:
>
> <fieldType name="chapter" class="solr.TextField" positionIncrementGap="0">
>   <analyzer>
>     <tokenizer class="solr.PatternTokenizerFactory"
>                pattern="^(\w+\.\d+)\.\d+-*\d*$" group="1" />
>   </analyzer>
> </fieldType>
> <field name="chapter" type="chapter" indexed="true" required="true" stored="true" />
> <copyField source="usfm" dest="chapter" />
>
> BUT, when I import my data the entire USFM is being stored inside the chapter
> field. And I get query results that look like:
> {
>   "usfm":"MAT.10.1",
>   "chapter":"MAT.10.1",
>   "devo_keywords_en":"fear",
>   "_version_":1586184983451533312},
> {
>   "usfm":"MAT.10.10",
>   "chapter":"MAT.10.10",
>   "devo_keywords_en":"fear",
>   "_version_":1586184983451533314},
> {
>   "usfm":"MAT.10.11",
>   "chapter":"MAT.10.11",
>   "devo_keywords_en":"fear",
>   "_version_":1586184983451533316},
> {
>   "usfm":"MAT.10.12",
>   "chapter":"MAT.10.12",
>   "devo_keywords_en":"fear",
>   "_version_":1586184983451533318}
>
> It’s probably something simple I’ve missed, but I’ve been banging my head for
> long enough I thought I’d ask for help.
> Thanks in advance!
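
For anyone who wants a quick sanity check outside the admin UI, here is a rough sketch of the stored-versus-indexed distinction described in the reply. It applies the pattern from the quoted fieldType with Python's `re` module, which is only a stand-in for the Java regex engine Solr actually uses, so treat the Analysis page as the authority. The helper names `indexed_token` and `simulate` are made up for illustration and are not part of Solr.

```python
import re

# Same pattern as the PatternTokenizerFactory in the quoted schema.
# Assumption: Python's regex syntax behaves like Java's for this pattern.
CHAPTER_PATTERN = re.compile(r"^(\w+\.\d+)\.\d+-*\d*$")


def indexed_token(usfm):
    """Approximate the token emitted with group="1" (hypothetical helper)."""
    match = CHAPTER_PATTERN.search(usfm)
    # If the pattern does not match, no token is produced.
    return match.group(1) if match else None


def simulate(usfm):
    """Return (stored value, indexed token) for the copyField destination."""
    # The stored value is the verbatim input; only the indexed side is analyzed.
    return usfm, indexed_token(usfm)


if __name__ == "__main__":
    for value in ["MAT.10.1", "MAT.10.10", "JHN.3.16", "MAT.6.33-34"]:
        stored, indexed = simulate(value)
        print(f"stored={stored!r} indexed={indexed!r}")
```

Under these assumptions, MAT.10.1 and MAT.10.10 both come back with the indexed token MAT.10 while the stored value is still the full USFM. That matches the query results in the question, which show stored values only.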