There aren't any tables involved. There's basically one list (per field) of 
unique tokens for the entire index, and, for each token, a list of which 
documents contain that token. These lists are efficiently encoded, but I don't 
know the details of that encoding; maybe someone who does can tell you, or you 
can look at the Lucene source, or get one of the several good books on Lucene.  
The lists are set up so you can efficiently look up a token and see which 
documents contain it.  That's basically the purpose of Lucene. There are also 
term positions and the like, so you can not only see which documents contain a 
token but also do proximity searches. 
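
To make the shape of those lists concrete, here is a toy sketch in Python. It 
is only an illustration of the idea, not how Lucene actually stores or encodes 
anything; the class name, method names, whitespace "analysis", and the 
simplified distance check in near() are all assumptions made up for this 
example.

```python
from collections import defaultdict


class TinyInvertedIndex:
    """Toy model only: per field, a map of token -> {doc_id: [positions]}."""

    def __init__(self):
        # field name -> token -> doc id -> list of token positions
        self.postings = defaultdict(lambda: defaultdict(dict))

    def add(self, doc_id, field, text):
        # Stand-in for real analysis: lowercase and split on whitespace.
        for position, token in enumerate(text.lower().split()):
            self.postings[field][token].setdefault(doc_id, []).append(position)

    def terms(self, field):
        # The one list of unique tokens for this field, across all documents.
        return sorted(self.postings[field])

    def docs(self, field, token):
        # The list of documents that contain this token in this field.
        return sorted(self.postings[field].get(token, {}))

    def near(self, field, a, b, slop=1):
        # Simplified proximity check: both tokens occur in the document
        # at positions no more than `slop` apart (not Lucene's slop rules).
        pa = self.postings[field].get(a, {})
        pb = self.postings[field].get(b, {})
        hits = []
        for doc_id in sorted(pa.keys() & pb.keys()):
            if any(abs(i - j) <= slop for i in pa[doc_id] for j in pb[doc_id]):
                hits.append(doc_id)
        return hits


index = TinyInvertedIndex()
index.add(1, "title", "quick brown fox")
index.add(2, "title", "quick brown bear and red fox")

print(index.terms("title"))                       # unique tokens for the field
print(index.docs("title", "brown"))               # documents containing "brown"
print(index.near("title", "quick", "fox", slop=2))  # proximity lookup
```

Note that there is one term list per field, not per document, and that the 
positions are what make the proximity lookup possible.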

This all gets into Lucene implementation details that I am not familiar with, 
though. 
 

Why do you want to know?  If you have specific concerns about disk space or RAM 
usage and how different schema choices affect them, ask about those, and 
someone can probably answer more easily than anyone can explain the total 
architecture of Lucene in a short listserv message. But, hey, maybe someone 
other than me can do that too!
________________________________________
From: Dennis Gearon [gear...@sbcglobal.net]
Sent: Tuesday, January 25, 2011 7:02 PM
To: solr-user@lucene.apache.org
Subject: Re: in-index representaton of tokens

Are you saying there is a list of tokens that have been parsed (a table of them)
for each column? Or one for the whole index?

 Dennis Gearon


Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a better
idea to learn from others’ mistakes, so you do not have to make them yourself.
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



----- Original Message ----
From: Jonathan Rochkind <rochk...@jhu.edu>
To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Sent: Tue, January 25, 2011 9:29:36 AM
Subject: Re: in-index representaton of tokens

Why does it matter?  You can't really get at them unless you store them.

I don't know what "table per column" means; there's nothing in Solr
architecture called a "table" or a "column", although by "column" you
probably mean more or less a Solr "field".

Solr is still not an RDBMS.

On 1/25/2011 12:26 PM, Dennis Gearon wrote:
> So, the index is a list of tokens per column, right?
>
> There's a table per column that lists the analyzed tokens?
>
> And the tokens per column are represented as what, system integers? 32/64 bit
> unsigned ints?
>
>   Dennis Gearon
