Mikhail,
Yeah, I considered that originally, but after analyzing the data I
noticed that it was not possible. Some of the content we analyze contains
large tables that, after OCR, get turned into long run-on sentences
containing 500k+ words per sentence. Overall there are probably around 10k
o
Mike,
When Lucene's Analyzer indexes the text, it adds positions to the index,
which are later used by SpanQueries. Have you considered the idea of a
position increment gap? E.g., the first sentence is indexed with word positions
0,1,2,3,... the second sentence with 100,101,102,103,..., third
200,201,
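
To make the position increment gap idea concrete, here is a minimal sketch in plain Java. It does not use Lucene; it only simulates how positions might be assigned when a fixed gap is added between sentences. The class name PositionGapSketch and the method sentenceStarts are hypothetical names made up for this illustration. In Lucene itself the gap is normally supplied by overriding Analyzer.getPositionIncrementGap(String fieldName) and indexing each sentence as a separate value of a multi-valued field.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: simulates position assignment with a gap.
class PositionGapSketch {

    // Returns the position of the first token of each sentence, assuming
    // tokens inside a sentence get consecutive positions and `gap` extra
    // positions are skipped between one sentence and the next.
    static int[] sentenceStarts(List<String[]> sentences, int gap) {
        int[] starts = new int[sentences.size()];
        int next = 0; // position the next token would receive
        for (int i = 0; i < sentences.size(); i++) {
            if (i > 0) {
                next += gap; // skip the gap before every sentence but the first
            }
            starts[i] = next;
            next += sentences.get(i).length; // one position per token
        }
        return starts;
    }

    public static void main(String[] args) {
        List<String[]> sentences = List.of(
                new String[] {"a", "b", "c"},
                new String[] {"d", "e"},
                new String[] {"f"});
        // prints [0, 103, 205]
        System.out.println(Arrays.toString(sentenceStarts(sentences, 100)));
    }
}
```

Because each sentence starts at least `gap` positions after the previous one ends, a span query whose slop is smaller than the gap cannot match across a sentence boundary. The sketch also shows why very long sentences matter: the start positions depend on the actual sentence lengths, so fixed offsets such as 0, 100, 200 only hold when every sentence is shorter than the gap.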