Thanks, Shawn. Actually, now that I think about it, Yonik also mentioned something about Lucene's number representation once, in reply to one of my questions. Here it is:

Could you also tell me what these `#8;#0;#0;#0;#1; strings represent in the debug output?
"That's internally how a number is encoded into a string (5 bytes, the first being binary 8, the next 0, etc.) This is not representable in XML as &#8; is illegal, hence we leave off the '&' so it's not a true character entity. -Yonik"

Hey, I followed your link, and it had a link to this talk. Did you see this example?

http://lucene.sourceforge.net/talks/pisa/

VInt Encoding Example

Value     First byte   Second byte   Third byte
0         00000000
1         00000001
2         00000010
...
127       01111111
128       10000000     00000001
129       10000001     00000001
130       10000010     00000001
...
16,383    11111111     01111111
16,384    10000000     10000000      00000001
16,385    10000001     10000000      00000001
...

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Wednesday, January 30, 2013 5:28 PM
Cc: solr-user@lucene.apache.org
Subject: Re: field space consumption - stored vs not stored

On 1/30/2013 6:24 PM, Shawn Heisey wrote:
> If I had to guess about the extra space required for storing an int
> field, I would say it's in the neighborhood of 20 bytes per document,
> perhaps less. I am also interested in a definitive answer.

The answer is very likely less than 20 bytes per doc. I was assuming a larger size for VInt than it is likely to use. See the answer to this question:

http://stackoverflow.com/questions/2752612/what-is-the-vint-in-lucene

Thanks,
Shawn
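To make the VInt table concrete, here is a minimal Java sketch that reproduces the byte layout it shows: 7 payload bits per byte, lowest-order group first, with the high bit set on every byte except the last. This is a re-implementation for illustration only, not Lucene's own code (Lucene's real versions are the writeVInt/readVInt methods in org.apache.lucene.store); the class and helper names (VIntDemo, writeVInt, readVInt, toBits) are hypothetical.

```java
import java.io.ByteArrayOutputStream;

public class VIntDemo {
    // Encodes an int as a VInt: 7 low-order bits per byte, least significant
    // group first; the high bit (0x80) means "another byte follows".
    // Intended for non-negative values; a negative int would always take 5 bytes.
    public static byte[] writeVInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // payload bits plus continuation flag
            value >>>= 7;                     // unsigned shift to the next 7-bit group
        }
        out.write(value);                     // last byte: continuation bit clear
        return out.toByteArray();
    }

    // Decodes a VInt starting at index 0, stopping at the first byte
    // whose continuation bit is clear.
    public static int readVInt(byte[] bytes) {
        int result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return result;
    }

    // Formats bytes as zero-padded 8-bit binary strings, like the table.
    public static String toBits(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            String s = Integer.toBinaryString(b & 0xFF);
            sb.append("00000000".substring(s.length())).append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] values = {0, 1, 2, 127, 128, 129, 130, 16383, 16384, 16385};
        for (int v : values) {
            byte[] enc = writeVInt(v);
            System.out.printf("%6d  %s  (%d byte(s))%n", v, toBits(enc), enc.length);
        }
    }
}
```

Running main prints each value from the table next to its encoded bytes. The arithmetic behind Shawn's revised estimate follows directly: values 0 to 127 fit in one byte, values up to 16,383 in two, and any int needs at most five, since 5 x 7 = 35 bits covers 32.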
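Yonik's description of the debug string (5 bytes, the first being 8, then zeros) matches, as far as I can tell, the prefix-coded trie encoding from Lucene 2.9/3.x-era NumericUtils, where an int is stored 7 bits per character after a shift marker character. That is an assumption, not something stated in this thread. Under it, the value 1 encodes to the marker followed by 8, 0, 0, 0, 1, and the leading backtick in the question would be the shift marker (0x60 for shift 0) rather than stray punctuation. The sketch below re-implements that layout; it is not Lucene's code, and the class name and the debugForm helper (which only imitates the "#N;" rendering) are hypothetical.

```java
public class PrefixCodedIntDemo {
    // ASSUMPTION: mirrors the layout of Lucene 2.9/3.x NumericUtils.intToPrefixCoded.
    // The first char is a shift marker; 0x60 is the backtick character.
    static final char SHIFT_START_INT = 0x60;

    public static String intToPrefixCoded(int val, int shift) {
        int nChars = (31 - shift) / 7 + 1;            // 5 payload chars when shift == 0
        char[] buf = new char[nChars + 1];
        buf[0] = (char) (SHIFT_START_INT + shift);
        // Flip the sign bit so that negative values sort before positive ones.
        int sortableBits = (val ^ 0x80000000) >>> shift;
        // Fill right to left, 7 bits per char, so the number is right-justified.
        for (int i = nChars; i >= 1; i--) {
            buf[i] = (char) (sortableBits & 0x7F);
            sortableBits >>>= 7;
        }
        return new String(buf);
    }

    // Hypothetical helper: renders control characters as "#N;", i.e. a numeric
    // character entity with the '&' left off, as Yonik describes.
    public static String debugForm(String term) {
        StringBuilder sb = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (c < 0x20) {
                sb.append('#').append((int) c).append(';');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(debugForm(intToPrefixCoded(1, 0)));
    }
}
```

If the assumption holds, main prints the same string quoted in the question. The 0x80000000 flip is what makes the first payload char 8 rather than 0 for a small positive number.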