If you copyField and don't store the copy, you only pay for the indexed
(term) representation of the copy, which is much smaller than a full
stored duplicate. Just a thought.
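
A minimal sketch of that idea, with hypothetical field and type names
(the actual analyzer chain would depend on your schema): one field keeps
§ as a letter via a WordDelimiterGraphFilterFactory types file, the copy
uses the default delimiter behaviour, and the copy is indexed but not
stored, so the extra index size is terms only:

```xml
<!-- Hypothetical schema.xml sketch; names are illustrative. -->
<fieldType name="text_section" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- wdfftypes.txt contains the line:  § => ALPHA -->
    <filter class="solr.WordDelimiterGraphFilterFactory" types="wdfftypes.txt"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="body" type="text_section" indexed="true" stored="true"/>
<!-- Indexed only: no stored duplicate of the original text. -->
<field name="body_plain" type="text_general" indexed="true" stored="false"/>
<copyField source="body" dest="body_plain"/>
```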

The other thing is that you seem to be saying you want a phrase match
but with a token gap, right? Like eDisMax phrase slop (the ps parameter)?
http://lucene.apache.org/solr/guide/7_4/the-extended-dismax-query-parser.html
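
For example, a request along these lines (field name hypothetical)
would let a quoted phrase match with one intervening token, so
"Servitudes 2.10" could still phrase-match "Servitudes § 2.10" when §
survives as its own token; whether a slop of 1 is actually needed
depends on whether your analyzer emits § as a token or drops it without
a position gap:

```
q="Servitudes 2.10"&defType=edismax&qf=body&pf=body&ps=1
```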

Regards,
   Alex.

On 25 July 2018 at 14:47, David Hastings <hastings.recurs...@gmail.com> wrote:
> Hey all. I have a situation that seems pretty rough. Currently in our data
> we have a lot of sentences like this:
>
> elements comprise the "stuff" of the tax. 3 Reg. § 1.901-2(a)(2). 4 Only
> non-Saudis are subject to the
> <https://heinonline.org/HOL/SearchVolumeSOLR?input=(((%223%20Regulation%201%22%20OR%20%223%20Regulation%201%22%20OR%20%223%20Reg.%201%22)%20AND%20NOT%20id:hein.journals/rcatorbg3.14))&div=13&handle=hein.journals/taxlr53&collection=journals>
> By default the word delimiter treats all punctuation as a space, so
> when you search for:
> 3 Reg. 1, your results can include 3 Reg. § 1.901
>
> I have experimented with the WDF and added § => ALPHA, and this works and
> treats the character as a letter. However, during some queries I still
> need searches such as
>
> Servitudes 2.10
>
> to return results with:
>
>
> Servitudes § 2.10
>
>
> At the moment I cannot conceive of a way to do this aside from two
> separate text fields, effectively doubling the size of my index,
> which currently sits at 300 GB optimized, and 500 GB if left to its
> own.
>
>
> Thanks for any help or suggestions
