Ah, so I could index the text treating the § character as ALPHA, use no qs value when I want to ignore it, and for users add a qs value, assuming I use edismax, which I currently am.
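As a rough sketch, that plan could be wired up as handler defaults in solrconfig.xml (the handler and field names here are only illustrative, not the actual schema):

```xml
<!-- Hypothetical /select defaults. The "text" field's analyzer maps
     § => ALPHA in its WordDelimiterGraphFilter types file, so § survives
     as its own token. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">text</str>
    <!-- no qs by default: phrase matching stays strict, so a query for
         "3 Reg. 1" will not match "3 Reg. § 1.901" -->
  </lst>
</requestHandler>
```

Clients that do want the loose behavior (e.g. "Servitudes 2.10" matching "Servitudes § 2.10") would then pass qs=1 on the request, allowing a one-token gap so the § token can be skipped.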
Tested this method and it works as expected. Thanks, saved me a lot of time!
-David

On Wed, Jul 25, 2018 at 3:15 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> If you copyField and don't store the copy, then it is only the indexed
> (term) representation for the copy, which is much smaller. Just a
> thought.
>
> The other thing is that you seem to be saying that you want to do a
> phrase match but with a token gap, right? Like an eDisMax slop?
> http://lucene.apache.org/solr/guide/7_4/the-extended-dismax-query-parser.html
>
> Regards,
>    Alex.
>
> On 25 July 2018 at 14:47, David Hastings <hastings.recurs...@gmail.com> wrote:
> > Hey all, I have a situation that seems pretty rough. Currently in our data
> > we have a lot of sentences like this:
> >
> > elements comprise the "stuff" of the tax. 3 Reg. § 1.901-2(a)(2). 4 Only
> > non-Saudis are subject to the
> > <https://heinonline.org/HOL/SearchVolumeSOLR?input=(((%223%20Regulation%201%22%20OR%20%223%20Regulation%201%22%20OR%20%223%20Reg.%201%22)%20AND%20NOT%20id:hein.journals/rcatorbg3.14))&div=13&handle=hein.journals/taxlr53&collection=journals>
> >
> > By default the word delimiter is treating all punctuation as a space, so
> > when you search for:
> > 3 Reg. 1, your results can include 3 Reg. § 1.901
> >
> > I have experimented with the WDF and added § => ALPHA, and this works: it
> > treats the character as a letter. However, during some queries I still
> > need searches such as
> >
> > Servitudes 2.10
> >
> > to return results with:
> >
> > Servitudes § 2.10
> >
> > At the moment I cannot conceive of a way to do this aside from two
> > separate text fields, effectively doubling the size of my index,
> > which currently sits at 300 GB optimized, and 500 GB if left to its
> > own.
> >
> > Thanks for any help or suggestions
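For reference, a minimal sketch of the unstored-copyField idea Alexandre suggests above, with the § => ALPHA type mapping (all field, type, and file names here are hypothetical, not copied from the real schema):

```xml
<!-- Hypothetical schema.xml fragment. The main field stays as-is; the copy
     is indexed but not stored, so only its term dictionary adds to the index. -->
<field name="text" type="text_general" indexed="true" stored="true"/>
<field name="text_section" type="text_section" indexed="true" stored="false"/>
<copyField source="text" dest="text_section"/>

<fieldType name="text_section" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- wdftypes.txt contains the single line:
           § => ALPHA
         so § is treated as a letter instead of a delimiter -->
    <filter class="solr.WordDelimiterGraphFilterFactory" types="wdftypes.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this layout, strict searches go against text_section (where "3 Reg. 1" cannot silently match "3 Reg. § 1.901"), while the original field keeps the default punctuation splitting for loose matches.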