Thanks Erick.

I don't want to disable the phrase search option. I just wonder whether there is any way to inspect the terms within the index, and thought that analyzing the .pos files might be a direction.

I suspect that our index is full of long float numbers (e.g. 1234.4546786585899544) which may be unnecessary. Before I make any changes to our indexing process (such as dropping those terms), I want to confirm my suspicion. I can run a regex search to find how many _documents_ contain those terms, but I would like to know how many such _terms_ (unique or total) are indexed. Is there a way to do that? Maybe with Luke?
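One possible way to get a per-term view, sketched here under assumptions rather than as a confirmed answer from the thread: Solr's TermsComponent can list the indexed terms of a field, optionally filtered with `terms.regex`, and the result can be tallied on the client. In the sketch below, the core URL, the field name `content`, the `/terms` handler path, and the "8 or more decimal digits" threshold are all hypothetical. The counts returned by the component are document frequencies, so the sketch reports the number of unique matching terms and the sum of their document frequencies, not the total number of occurrences.

```python
import re
from urllib.parse import urlencode

# Assumption: a "long float" is digits, a dot, then 8 or more digits.
LONG_FLOAT = re.compile(r"^\d+\.\d{8,}$")


def terms_url(base_url, field, pattern=r"\d+\.\d{8,}"):
    """Build a TermsComponent request URL.

    base_url and field are placeholders; the /terms handler must be
    enabled in solrconfig.xml for this URL to work.
    """
    params = {
        "terms.fl": field,       # field whose terms are listed
        "terms.regex": pattern,  # server-side filter on the term text
        "terms.limit": -1,       # no limit on the number of terms returned
        "wt": "json",
    }
    return base_url.rstrip("/") + "/terms?" + urlencode(params)


def summarize(flat_terms):
    """Tally long-float terms from a flat list [term, docFreq, term, docFreq, ...]."""
    pairs = zip(flat_terms[0::2], flat_terms[1::2])
    # Re-check the pattern on the client so the tally does not depend on
    # how the server applied the regex.
    matches = [(term, df) for term, df in pairs if LONG_FLOAT.match(term)]
    return {
        "unique_terms": len(matches),
        "doc_freq_sum": sum(df for _, df in matches),
    }
```

For example, `summarize(["1234.4546786585899544", 3, "hello", 10])` counts one unique long-float term. On a very large index an unlimited term listing can be expensive, so it may be safer to try this on a copy of the index or on a single field first.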
-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, June 28, 2016 8:27 AM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Positions files analysis

Positions are necessary if you need to do "phrase searches". If that's not necessary, simply turn that option off in your schema for the fields where it's unnecessary. See the reference guide for termVectors, termPositions, termOffsets.

I'm really not sure what you're asking by:
"Is there a way I can read/analyze index files as .pos?"

The various file extensions are a result of the options you define on your fields, that's just the way Lucene works...

Best,
Erick

On Mon, Jun 27, 2016 at 7:25 AM, asteiner <astei...@varonis.com> wrote:
> Hi
>
> I have a very large index and I'd like to see how can I reduce it.
> Some of the largest files in the index are the .pos files (positions).
> There are many excel files indexed with formulas, so I suspect that a
> large part of the index is used by junk terms as very long numbers.
> Is there a way I can read/analyze index files as .pos?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Positions-files-analysis-tp4284485.html