There aren't any tables involved. There's basically one list (per field) of the unique tokens for the entire index, plus, for each token, a list of which documents contain that token. These lists are efficiently encoded, but I don't know the details of that encoding; maybe someone who does can tell you, or you can look at the Lucene source, or get one of the several good books on Lucene. The lists are set up so you can efficiently look up a token and see which documents contain it. That's basically what Lucene does; it's the purpose of Lucene. Oh, and there are term positions and such too, so not only can you see which documents contain a token, you can also do proximity searches and the like.
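If it helps to see the shape of it, here is a toy sketch of those lists in Python. It is only an illustration of the idea, not how Lucene actually stores or encodes anything: the class name, the method names, and the whitespace "analysis" are all made up for the example.

```python
from collections import defaultdict


class TinyInvertedIndex:
    """Toy per-field inverted index: field -> token -> doc id -> positions.

    Hypothetical illustration only; Lucene's real on-disk encoding differs.
    """

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(dict))

    def add(self, doc_id, field, text):
        # Stand-in for real analysis: lowercase and split on whitespace.
        for position, token in enumerate(text.lower().split()):
            self.postings[field][token].setdefault(doc_id, []).append(position)

    def terms(self, field):
        # The one list of unique tokens for a field, across the whole index.
        return sorted(self.postings[field])

    def docs(self, field, token):
        # The list of documents that contain a given token in a given field.
        return sorted(self.postings[field].get(token, {}))

    def near(self, field, first, second, slop=1):
        # Proximity search: docs where `second` occurs at most `slop`
        # positions after `first`, using the stored term positions.
        a = self.postings[field].get(first, {})
        b = self.postings[field].get(second, {})
        hits = []
        for doc_id in sorted(a.keys() & b.keys()):
            if any(0 < q - p <= slop for p in a[doc_id] for q in b[doc_id]):
                hits.append(doc_id)
        return hits


index = TinyInvertedIndex()
index.add(1, "title", "Solr is not an RDBMS")
index.add(2, "title", "Lucene is an inverted index")

print(index.terms("title"))                      # unique tokens for the field
print(index.docs("title", "is"))                 # docs containing "is"
print(index.near("title", "inverted", "index"))  # docs with the two adjacent
```

Note that the tokens here are plain strings keyed in a dictionary; whether and how the real index maps them to integers is exactly the encoding detail I can't speak to.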
This all gets into Lucene implementation details I am not familiar with, though. Why do you want to know? If you have specific concerns about disk space or RAM usage or something, and how different schema choices affect it, ask them, and someone can probably answer those more easily than someone can explain the total architecture of Lucene in a short listserv message. But, hey, maybe someone other than me can do that too!

________________________________________
From: Dennis Gearon [gear...@sbcglobal.net]
Sent: Tuesday, January 25, 2011 7:02 PM
To: solr-user@lucene.apache.org
Subject: Re: in-index representaton of tokens

I am saying there is a list of tokens that have been parsed (a table of them) for each column? Or one for the whole index?

Dennis Gearon

Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself.
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
otherwise we all die.

----- Original Message ----
From: Jonathan Rochkind <rochk...@jhu.edu>
To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Sent: Tue, January 25, 2011 9:29:36 AM
Subject: Re: in-index representaton of tokens

Why does it matter? You can't really get at them unless you store them.

I don't know what "table per column" means, there's nothing in Solr architecture called a "table" or a "column". Although by column you probably mean more or less Solr "field". There is nothing like a "table" in Solr. Solr is still not an rdbms.

On 1/25/2011 12:26 PM, Dennis Gearon wrote:
> So, the index is a list of tokens per column, right?
>
> There's a table per column that lists the analyzed tokens?
>
> And the tokens per column are represented as what, system integers? 32/64 bit
> unsigned ints?
>
> Dennis Gearon