Ah, thank you Erick & Shawn. That makes perfect sense. And yes, when this goes
to prod it will be distributed. Good point about docValues and needing a single
shard, thanks!
I’m new to result grouping, so I’m still prototyping to confirm that it will
work for what I need.
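
For reference, a minimal sketch of the kind of grouping request being prototyped. The host, port, and collection name ("mycollection") are placeholders, not taken from the actual project; `group`, `group.field`, and `group.limit` are the standard result grouping parameters. The sketch only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical endpoint: host, port and collection name are assumptions.
SELECT_URL = "http://localhost:8983/solr/mycollection/select"

def build_group_query(query, group_field, group_limit=5):
    """Build a select URL that asks Solr to group results on one field."""
    params = {
        "q": query,
        "group": "true",             # turn result grouping on
        "group.field": group_field,  # field whose indexed value defines the groups
        "group.limit": group_limit,  # documents returned per group
        "wt": "json",
    }
    return SELECT_URL + "?" + urlencode(params)

url = build_group_query("devo_keywords_en:fear", "chapter")
print(url)
```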

On 12/8/17, 12:00 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:

    I think you're getting confused by seeing the _stored_ data rather
    than the indexed data. When you return fields in documents, you get
    the stored data which is a verbatim copy of the input, no analysis
    done at all. To see what's in the index (and thus what would be
    grouped on) look at:
    
    adminUI>>analysis>>(your field): put in some sample values and see
    what the regex tokenizer does. NOTE: uncheck the "verbose" box for
    less clutter.
    or
    adminUI>>(select core)>>schema browser
    or
    the TermsComponent
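
The stored-versus-indexed distinction above can be illustrated outside Solr. The following is only a rough emulation in Python, not Solr code: it assumes a pattern tokenizer with group="1" emits the first capture group of each match, and the helper names `indexed_tokens` and `stored_value` are made up for the illustration.

```python
import re

# Same pattern as the "chapter" fieldType in the quoted schema.
CHAPTER_PATTERN = re.compile(r"^(\w+\.\d+)\.\d+-*\d*$")

def indexed_tokens(value, group=1):
    """Approximate the tokens emitted for a value: one per match, taken from `group`."""
    return [m.group(group) for m in CHAPTER_PATTERN.finditer(value) if m.group(group)]

def stored_value(value):
    """Stored data is a verbatim copy of the input; analysis never changes it."""
    return value

for usfm in ["JHN.3.16", "JHN.3.17", "MAT.6.33-34", "MAT.10.1"]:
    print(usfm, "| stored:", stored_value(usfm), "| indexed:", indexed_tokens(usfm))
```

Under that assumption, JHN.3.16 and JHN.3.17 share the indexed token JHN.3 even though each document still returns its full USFM as the stored value.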
    
    If you require the stored value to be different, you have a couple of choices:
    1> change it on the client side before ingestion
    2> use one of the field mutating update processor classes
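
A minimal sketch of choice 1, deriving the chapter on the client before the documents are sent. The function names are hypothetical and the sample documents are modelled on the query results quoted further down. It assumes the client fills "chapter" itself, in which case the copyField into "chapter" would presumably be removed so the field does not receive two values.

```python
def chapter_of(usfm):
    """Return the book-and-chapter prefix, e.g. 'JHN.3' for 'JHN.3.16'."""
    prefix, sep, _verse = usfm.rpartition(".")
    # Expect at least BOOK.CHAPTER.VERSE, so the prefix must still contain a period.
    if not sep or "." not in prefix:
        raise ValueError("not a USFM verse reference: %r" % usfm)
    return prefix

def with_chapter(doc):
    """Return a copy of the document with an explicit 'chapter' value added."""
    return {**doc, "chapter": chapter_of(doc["usfm"])}

docs = [
    {"usfm": "MAT.10.1", "devo_keywords_en": "fear"},
    {"usfm": "MAT.6.33-34", "devo_keywords_en": "fear"},
]
prepared = [with_chapter(d) for d in docs]
print(prepared)
```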
    
    Most often, people don't bother storing the copyField destination, since
    the stored value is available in the original field; the destination is
    just used for things like the grouping you're interested in.
    
    Best,
    Erick
    
    On Fri, Dec 8, 2017 at 8:56 AM, Bradley Belyeu
    <bradley.belyeu@life.church> wrote:
    > I’m struggling a bit getting a copy field & regex tokenizer to work like
    > I think it should…
    > I have an open source project I’m just starting out with here:
    > https://github.com/youversion/solrcloud
    > I have a uniqueKey field USFM defined as:
    > <field name="usfm" type="string" indexed="true" required="true" stored="true" />
    > And a USFM will always be in the pattern of 3 characters followed by a
    > period, followed by one or more digits, followed by another period, and
    > finally one or more digits.
    > Optionally, after the final digit there may be a hyphen and one or more
    > further digits.
    > E.g.: JHN.3.16 or MAT.6.33-34
    >
    > I’m wanting to do a result grouping by the first three characters,
    > period, & digit(s). For example, docs with the unique keys JHN.3.16 &
    > JHN.3.17 I would want grouped together.
    > So my thought was to define another field and then copy the USFM into it
    > and use the regex tokenizer defined as so:
    >
    >     <fieldType name="chapter" class="solr.TextField" positionIncrementGap="0">
    >         <analyzer>
    >             <tokenizer class="solr.PatternTokenizerFactory" pattern="^(\w+\.\d+)\.\d+-*\d*$" group="1" />
    >         </analyzer>
    >     </fieldType>
    >     <field name="chapter" type="chapter" indexed="true" required="true" stored="true" />
    >     <copyField source="usfm" dest="chapter" />
    >
    > BUT, when I import my data the entire USFM is being stored inside the
    > chapter field. And I get query results that look like:
    >        {
    >         "usfm":"MAT.10.1",
    >         "chapter":"MAT.10.1",
    >         "devo_keywords_en":"fear",
    >         "_version_":1586184983451533312},
    >       {
    >         "usfm":"MAT.10.10",
    >         "chapter":"MAT.10.10",
    >         "devo_keywords_en":"fear",
    >         "_version_":1586184983451533314},
    >       {
    >         "usfm":"MAT.10.11",
    >         "chapter":"MAT.10.11",
    >         "devo_keywords_en":"fear",
    >         "_version_":1586184983451533316},
    >       {
    >         "usfm":"MAT.10.12",
    >         "chapter":"MAT.10.12",
    >         "devo_keywords_en":"fear",
    >         "_version_":1586184983451533318}
    >
    > It’s probably something simple I’ve missed, but I’ve been banging my head
    > for long enough I thought I’d ask for help.
    > Thanks in advance!
    
