Hey all. I have a situation that seems pretty rough. Currently our data contains a lot of sentences like this:
elements comprise the "stuff" of the tax. 3 Reg. § 1.901-2(a)(2). 4 Only non-Saudis are subject to the
<https://heinonline.org/HOL/SearchVolumeSOLR?input=(((%223%20Regulation%201%22%20OR%20%223%20Regulation%201%22%20OR%20%223%20Reg.%201%22)%20AND%20NOT%20id:hein.journals/rcatorbg3.14))&div=13&handle=hein.journals/taxlr53&collection=journals>

By default the word delimiter treats all punctuation as a space. So when you search for:
3 Reg. 1
your results can include:
3 Reg. § 1.901

I have experimented with the WDF (WordDelimiterFilter) and added § => ALPHA. This works: the character is treated as a letter. However, for some queries I still need a search such as:
Servitudes 2.10
to return results containing:
Servitudes § 2.10

At the moment I cannot conceive of a way to do this other than keeping two separate text fields, which would effectively double the size of my index. It currently sits at 300 GB optimized, and 500 GB if left to its own devices.

Thanks for any help or suggestions.
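
P.S. In case it makes the problem easier to see, here is a toy Python model of the two behaviours. It is only a sketch and not how Solr's WordDelimiterFilter is actually implemented: `tokenize` and `phrase_match` are names I made up for illustration, and it assumes the searches are quoted phrase queries, as in the link above.

```python
import re


def tokenize(text, alpha_chars=""):
    """Toy stand-in for the word delimiter (hypothetical helper).

    Every character that is not an ASCII letter, a digit, or listed in
    alpha_chars is treated as a space. Tokens are lowercased.
    """
    keep = "A-Za-z0-9" + re.escape(alpha_chars)
    return [t.lower() for t in re.split(f"[^{keep}]+", text) if t]


def phrase_match(query, doc, alpha_chars=""):
    """True if the query tokens occur as a contiguous run in the doc tokens."""
    q = tokenize(query, alpha_chars)
    d = tokenize(doc, alpha_chars)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


reg_doc = "3 Reg. § 1.901-2(a)(2)"
serv_doc = "Servitudes § 2.10"

# Default: § is dropped like any other punctuation, so both queries match.
print(phrase_match("3 Reg. 1", reg_doc))                 # True (unwanted)
print(phrase_match("Servitudes 2.10", serv_doc))         # True (wanted)

# With § treated as a letter, § becomes its own token and both stop matching.
print(phrase_match("3 Reg. 1", reg_doc, "§"))            # False (wanted)
print(phrase_match("Servitudes 2.10", serv_doc, "§"))    # False (unwanted)
```

So neither setting gives me both behaviours from a single field, which is why I keep coming back to two fields.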