Thanks Shawn.  Actually, now that I think about it, Yonik also mentioned 
something about Lucene's number representation once in reply to one of my 
questions.  Here is what I asked:
"Could you also tell me what these #8;#0;#0;#0;#1; strings represent in the 
debug output?"

"That's internally how a number is encoded into a string (5 bytes, the first 
being binary 8, the next 0, etc.)  This is not representable in XML as &#0; is 
illegal, hence we leave off the '&' so it's not a true character entity.  
-Yonik"

Hey I followed your link, and it had a link to this talk.  Did you see this 
example?
http://lucene.sourceforge.net/talks/pisa/

VInt Encoding Example:

Value    First byte   Second byte   Third byte
0        00000000
1        00000001
2        00000010
...
127      01111111
128      10000000     00000001
129      10000001     00000001
130      10000010     00000001
...
16,383   11111111     01111111
16,384   10000000     10000000      00000001
16,385   10000001     10000000      00000001
...
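
To convince myself of the pattern in that table, here is a rough Python 
sketch (my own illustration, not Lucene's code).  It assumes the scheme the 
table implies: each byte carries 7 bits of the value, low-order bits first, 
and the high bit is set when another byte follows.  The names encode_vint, 
decode_vint and as_bits are made up for this example.

```python
def encode_vint(value: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low-order bits first."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while value > 0x7F:
        # Keep the low 7 bits and set the high bit to signal "more bytes follow".
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)  # last byte: high bit clear
    return bytes(out)


def decode_vint(data: bytes) -> int:
    """Decode one value produced by encode_vint from the start of data."""
    result = 0
    shift = 0
    for b in data:
        result |= (b & 0x7F) << shift
        if not (b & 0x80):  # high bit clear marks the final byte
            break
        shift += 7
    return result


def as_bits(data: bytes) -> str:
    """Render bytes the way the table does, e.g. '10000000 00000001'."""
    return " ".join(f"{b:08b}" for b in data)


# Print the same rows as the table.
for n in (0, 1, 2, 127, 128, 129, 130, 16383, 16384, 16385):
    print(f"{n:>6}  {as_bits(encode_vint(n))}")
```

Under that assumption, values up to 127 take one byte, values up to 16,383 
take two, and even the largest positive 32-bit int takes five.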



-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Wednesday, January 30, 2013 5:28 PM
Cc: solr-user@lucene.apache.org
Subject: Re: field space consumption - stored vs not stored

On 1/30/2013 6:24 PM, Shawn Heisey wrote:
> If I had to guess about the extra space required for storing an int 
> field, I would say it's in the neighborhood of 20 bytes per document, 
> perhaps less.  I am also interested in a definitive answer.

The answer is very likely less than 20 bytes per doc.  I was assuming a larger 
size for a VInt than it is likely to use.  See the answer to this question:

http://stackoverflow.com/questions/2752612/what-is-the-vint-in-lucene

Thanks,
Shawn


