Re: Copy field and regex

2017-12-08 Thread Shawn Heisey
On 12/8/2017 1:03 PM, Erick Erickson wrote: Second, grouping works fine in distributed mode with a couple of restrictions, see the reference guide. Collapse/Expand (an alternative to standard grouping) requires that all the members of a group be on the same shard. In 5.x, distributed grouping s

Re: Copy field and regex

2017-12-08 Thread Erick Erickson
Grouping does _not_ require docValues, it's just that with docValues=false, the uninverted structure is built on the heap at run time. When docValues=true, the uninverted structure is written to disk at index time and MMapped into the OS's memory space rather than the Java heap. Second, grouping w

Re: Copy field and regex

2017-12-08 Thread Bradley Belyeu
Ah, thank you Erick & Shawn. That makes perfect sense. And yes when this goes to prod it will be distributed. Good point about docValues and needing a single shard, thanks! I’m new to result grouping, so I’m still prototyping that it will work for what I need. On 12/8/17, 12:00 PM, "Erick Erick

Re: Copy field and regex

2017-12-08 Thread Erick Erickson
I think you're getting confused by seeing the _stored_ data rather than the indexed data. When you return fields in documents, you get the stored data which is a verbatim copy of the input, no analysis done at all. To see what's in the index (and thus what would be grouped on) look at: adminUI>>an
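To make the stored/indexed distinction above concrete, here is a toy model in Python. It is not Solr code: `ToyField`, `toy_index`, and `drop_last_segment` are hypothetical names invented for illustration, and the "analysis" step is a stand-in for whatever analyzer the field type actually defines.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyField:
    stored: str   # verbatim copy of the input; what a query hands back
    indexed: str  # output of analysis; what grouping would operate on


def toy_index(value: str, analyze: Callable[[str], str]) -> ToyField:
    """Keep the raw value as 'stored' and the analyzed value as 'indexed'."""
    return ToyField(stored=value, indexed=analyze(value))


def drop_last_segment(value: str) -> str:
    """Stand-in analyzer (an assumption): drop everything after the last period."""
    return value.rsplit(".", 1)[0]


field = toy_index("JHN.3.16", drop_last_segment)
print(field.stored)   # the unmodified input
print(field.indexed)  # the analyzed form
```

Looking only at `field.stored` would suggest the analysis did nothing, which is the confusion described above; the analyzed value lives separately in `field.indexed`.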

Re: Copy field and regex

2017-12-08 Thread Shawn Heisey
On 12/8/2017 9:56 AM, Bradley Belyeu wrote:
> I’m wanting to do a result grouping by the first three characters, period, & digit(s). For example, docs with the unique keys JHN.3.16 & JHN.3.17 I would want grouped together.
> So my thought was to define another field and then copy the USFM int
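As a rough illustration of the grouping key described in the quoted question (first three characters, a period, then one or more digits), the following Python sketch extracts that prefix and buckets keys by it. The regular expression is an assumption based only on the wording above, not the pattern from the original schema, and `group_key` and `group_by_prefix` are hypothetical helper names.

```python
import re
from collections import defaultdict

# Assumed pattern: any three characters, a literal period, one or more digits.
GROUP_KEY = re.compile(r"^(.{3}\.\d+)")


def group_key(usfm):
    """Return the prefix to group on, or None if the key does not match."""
    match = GROUP_KEY.match(usfm)
    return match.group(1) if match else None


def group_by_prefix(keys):
    """Bucket unique keys by their extracted prefix, preserving input order."""
    groups = defaultdict(list)
    for key in keys:
        groups[group_key(key)].append(key)
    return dict(groups)


groups = group_by_prefix(["JHN.3.16", "JHN.3.17", "GEN.1.1"])
print(groups)
```

With this pattern JHN.3.16 and JHN.3.17 share the key JHN.3 and land in the same bucket. In Solr the equivalent extraction would be done by the analysis chain of the copy-field target rather than in client code.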

Copy field and regex

2017-12-08 Thread Bradley Belyeu
I’m struggling a bit getting a copy field & regex tokenizer to work like I think it should… I have an open source project I’m just starting out with here: https://github.com/youversion/solrcloud I have a uniqueKey field USFM defined as: And a USFM will always be in the pattern of 3 characters fo