If you copyField and don't store the copy, then only the indexed (term) representation of the copy is added, and that is much smaller than a stored copy. Just a thought.
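As a sketch of what that could look like in the schema (field and type names here are hypothetical, not from the original message):

```xml
<!-- A copy that is indexed but not stored: it adds only the term
     dictionary/postings for this field, not a second stored copy
     of the document text. -->
<field name="text_symbols" type="text_with_symbols"
       indexed="true" stored="false"/>
<copyField source="text" dest="text_symbols"/>
```

The stored representation usually dominates index size, so an extra unstored field should cost far less than a full doubling.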
The other thing is that you seem to be saying that you want to do a phrase match but with a token gap, right? Like eDisMax phrase slop?
http://lucene.apache.org/solr/guide/7_4/the-extended-dismax-query-parser.html

Regards,
   Alex.

On 25 July 2018 at 14:47, David Hastings <hastings.recurs...@gmail.com> wrote:
> Hey all, I have a situation that seems pretty rough. Currently in our data
> we have a lot of sentences like this:
>
> elements comprise the "stuff" of the tax. 3 Reg. § 1.901-2(a)(2). 4 Only
> non-Saudis are subject to the
> <https://heinonline.org/HOL/SearchVolumeSOLR?input=(((%223%20Regulation%201%22%20OR%20%223%20Regulation%201%22%20OR%20%223%20Reg.%201%22)%20AND%20NOT%20id:hein.journals/rcatorbg3.14))&div=13&handle=hein.journals/taxlr53&collection=journals>
>
> By default the word delimiter filter is treating all punctuation as a
> space, so when you search for "3 Reg. 1", your results can include
> "3 Reg. § 1.901".
>
> I have experimented with the WDF and added "§ => ALPHA", and this works:
> it treats the character as a letter. However, for some queries I still
> need searches such as:
>
> Servitudes 2.10
>
> to return results with:
>
> Servitudes § 2.10
>
> At the moment I cannot conceive of a way to do this aside from two
> separate text fields, effectively doubling the size of my index, which
> currently sits at 300 GB optimized, and 500 GB if left to its own.
>
> Thanks for any help or suggestions
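For reference, the `§ => ALPHA` mapping David describes is done by pointing the word delimiter filter at a character-types file; a minimal sketch (field type name and file name are hypothetical, and this assumes the graph variant of the filter):

```xml
<!-- Sketch of an analysis chain where § is typed as ALPHA so it
     survives tokenization instead of acting as a delimiter.
     wdftypes.txt would contain the single line:  § => ALPHA  -->
<fieldType name="text_with_symbols" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"
            types="wdftypes.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

For the "Servitudes 2.10 should still match Servitudes § 2.10" case, querying such a field with eDisMax and a nonzero phrase slop (the `qs`/`ps` parameters in the guide linked above) would allow a one-token gap in the phrase rather than requiring a second field with the opposite analysis.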