On Mar 5, 2009, at 1:07 PM, Suryasnat Das wrote:
I have some queries on Solr for which I need immediate resolution. Quick
help would be greatly appreciated.
a.) We know that fields are also indexed. So can we index some specific
fields (like author, id, etc.) first and then index the rest of the
fields (like creation date, etc.) at a later time?
You have to reindex the entire document in order to add fields to it,
but you certainly can do so at any time. In other words, you can't just
add fields to an existing document without sending in all the fields
you want on that document.
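A minimal sketch of what "resend everything" looks like, assuming the
schema's unique key is a field named id (the field names and values
here are made up for illustration). It only builds the XML add message;
you would then POST it to your Solr update handler and commit.

```python
import xml.etree.ElementTree as ET


def build_add_message(doc):
    """Build a Solr XML <add> message holding every field of one document."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Fields sent in the first indexing pass (hypothetical values).
original = {"id": "doc-1", "author": "Jane Doe"}

# Later pass: merge the new field with the old ones and resend the whole
# document. A document with the same unique key replaces the old one,
# so any field left out here would be lost.
updated = dict(original, creation_date="2009-03-05T00:00:00Z")
message = build_add_message(updated)
print(message)
```

This means you need to keep the original field values somewhere (the
source system, or stored fields in Solr) so you can rebuild the full
document later.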
b.) Solr returns the whole text content of a file during a search
operation. So how can we extract a portion of the whole content? I mean a
snippet of the content containing the search keyword. Sample code would
be of great help.
Use Solr's highlighting capabilities: <http://wiki.apache.org/solr/HighlightingParameters>
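As a rough sketch, a highlighting request is just a normal select query
with a few hl parameters added. The host, port, and the field name
content below are assumptions, so substitute your own; the field has to
be stored for Solr to return snippets from it.

```python
from urllib.parse import urlencode


def highlight_query_url(base_url, keyword, field):
    """Build a select URL that asks Solr for highlighted snippets."""
    params = {
        "q": f"{field}:{keyword}",
        "hl": "true",        # turn highlighting on
        "hl.fl": field,      # field(s) to take snippets from
        "hl.snippets": 1,    # snippets per field
        "hl.fragsize": 100,  # approximate snippet length in characters
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical local Solr instance and field name.
url = highlight_query_url("http://localhost:8983/solr", "lucene", "content")
print(url)
```

The snippets come back in a separate highlighting section of the
response, keyed by document id, rather than inside the documents
themselves.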
c.) What is multi core indexing?
Separate Solr/Lucene indexes that are all served from a single
instance of Solr.
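To illustrate, each core gets its own path under the same Solr
instance, so a client picks the index by URL. The core names core0 and
core1 and the base URL are placeholders, not anything from your setup.

```python
def core_url(base_url, core, handler="select"):
    """Return the URL of a request handler for one named core."""
    return f"{base_url.rstrip('/')}/{core}/{handler}"


# Hypothetical core names served by a single Solr instance.
cores = ["core0", "core1"]
urls = {core: core_url("http://localhost:8983/solr", core) for core in cores}

for core, url in urls.items():
    print(core, "->", url)
```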
d.) What is the number of index files that are normally created in an
index operation?
Depends on the number of fields, and how you have the index
configuration set. If file handles ever become a problem you can set
it to use the compound file format, but in practice I've never seen it
be a problem.
What will be the expected number of index files when I index 4
terabytes of file data, and what will be the index size for all the
index files? If anybody has worked on such a huge volume of data, some
pointers would be of great help.
The rule of thumb is that a Lucene index is roughly 35% the size of
the original text, assuming you are not storing the fields in Lucene,
but only indexing them.
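Applied to your numbers, and assuming the full 4 TB is extractable text
(binary formats usually contain much less text than their file size),
the back-of-the-envelope estimate works out like this:

```python
INDEX_RATIO = 0.35  # rule-of-thumb fraction for indexed-only (not stored) text


def estimate_index_size_tb(source_tb, ratio=INDEX_RATIO):
    """Estimate index size in terabytes from the size of the source text."""
    return source_tb * ratio


estimate = estimate_index_size_tb(4)
print(f"Estimated index size: {estimate:.1f} TB")
```

Treat that as an order-of-magnitude figure only; stored fields, term
vectors, and the analysis chain can all move it considerably.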
Erik