Thanks Erick.
I don't want to disable the phrase search option.
I just wonder whether there is any way to find the terms within the index, and
thought that analyzing the .pos files might be a direction.
I suspect that our index is full of long float numbers (e.g.
1234.4546786585899544), which may be unnecessary. Before I make any changes to
our indexing process (such as dropping these terms), I want to confirm my suspicion.
I can run a regex search to find how many _documents_ contain
those terms, but I would like to know how many such _terms_ (unique or total)
are indexed. Is there a way to do that? Maybe with Luke?
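
One possible way to count such terms without reading the .pos files directly is Solr's Terms component, which can list indexed terms matching a regex. The following is only a minimal sketch, not a tested recipe for this index: it assumes the /terms handler is enabled on the core, that the analysis chain keeps a number like 1234.4546786585899544 as a single term, and that the response uses the flat JSON list layout (json.nl=flat). The core URL, the field name, the 8-digit threshold, and the helper names are hypothetical.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed definition of a "long float" term: digits, a dot, then 8 or more digits.
# The threshold of 8 is arbitrary; adjust it to whatever counts as junk.
LONG_FLOAT = r"[0-9]+\.[0-9]{8,}"


def is_long_float(term):
    """Check the pattern locally before sending it to Solr."""
    return re.fullmatch(LONG_FLOAT, term) is not None


def terms_url(base_url, field, regex=LONG_FLOAT):
    """Build a Terms component request that lists every term matching regex."""
    params = {
        "terms.fl": field,      # field whose indexed terms are listed
        "terms.regex": regex,   # only return terms matching this pattern
        "terms.limit": -1,      # no cap on the number of terms returned
        "wt": "json",
        "json.nl": "flat",      # response as [term, count, term, count, ...]
    }
    return base_url.rstrip("/") + "/terms?" + urlencode(params)


def summarize(response, field):
    """Count unique matching terms and sum their document frequencies."""
    flat = response["terms"][field]
    terms = flat[0::2]
    doc_freqs = flat[1::2]
    return {"unique_terms": len(terms), "sum_doc_freq": sum(doc_freqs)}


def count_long_floats(base_url, field):
    """Query a running Solr core (hypothetical URL) and summarize the result."""
    with urlopen(terms_url(base_url, field)) as resp:
        return summarize(json.load(resp), field)
```

For example, count_long_floats("http://localhost:8983/solr/mycore", "content") would return the number of unique matching terms in the (hypothetical) content field. Note that the counts returned by the Terms component are document frequencies, so the sum is the number of term/document pairs, not the total number of occurrences. On a very large index, terms.limit=-1 may be slow.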


-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, June 28, 2016 8:27 AM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Positions files analysis

Positions are necessary if you need to do "phrase searches".
If that's not necessary, simply turn that option off in your schema for the 
fields where it's unnecessary. See the reference guide for termVectors,
termPositions, and termOffsets.

I'm really not sure what you're asking by:
"Is there a way I can read/analyze index files as .pos?"

The various file extensions are a result of the options you define on your 
fields, that's just the way Lucene works...

Best,
Erick

On Mon, Jun 27, 2016 at 7:25 AM, asteiner <astei...@varonis.com> wrote:
> Hi
>
> I have a very large index and I'd like to see how I can reduce it.
> Some of the largest files in the index are the .pos files (positions).
> There are many Excel files indexed with formulas, so I suspect that a
> large part of the index is used by junk terms such as very long numbers.
> Is there a way I can read/analyze index files such as .pos?
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Positions-files-analysis-tp4284485.html
> Sent from the Solr - User mailing list archive at Nabble.com.