RE: Indexing Question for large dataset

Joshua Bouchair Wed, 13 Apr 2011 08:13:33 -0700

I don't know of any other way to organize the documents. We need the specific 
price that belongs to the user, so I don't think the facets would be the issue: 
the facet query would be pointed at the price list field for that user. Say the 
customer belongs to priceList1500; I would use the price from that column 
(priceList1500) instead of priceList1 or even the price column. Let me post the 
example data in another way.


INDEX FIELD | INDEX DATA
------------------------
ID          |    1       (INDEXED | STORED)
NAME        |    TEST    (INDEXED | STORED | MULTIVALUED)
PRICE       |    1.00    (INDEXED)
PRICELIST1  |    0.99    (INDEXED)
PRICELIST2  |    0.89    (INDEXED)
PRICELIST500|    0.85    (INDEXED)
------------------------
ID          |    2       (INDEXED | STORED)
NAME        |    TEST2   (INDEXED | STORED | MULTIVALUED)
PRICE       |    1.10    (INDEXED)
PRICELIST1  |    1.09    (INDEXED)
PRICELIST250|    1.05    (INDEXED)
PRICELIST600|    1.03    (INDEXED)

The price lists correspond to customer contracts with the company for contracted 
item pricing. Is there a specific limit to the number of index columns 
Solr/Lucene can handle? Is there a better way of handling this? Do you see an 
issue with RAM from what I am describing here? Also, with the index so large, 
say 5000 columns across the data set, will that degrade search performance 
dramatically (note that the searches would of course not run against all of 
those columns)?

Example Query:
q=name&fl=NAME,ID&facet=true&facet.field=PRICELIST500
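
To make the per-customer field selection concrete, here is a minimal sketch in 
Python of how the query string above could be built. The helper names 
(price_field_for, build_facet_query) are hypothetical and only for 
illustration; they are not part of Solr. It assumes a customer without a 
contract falls back to the plain PRICE column.

```python
from urllib.parse import urlencode


def price_field_for(price_list_id=None):
    """Return the index field holding the price for a customer.

    Hypothetical helper: customers without a contract price list
    are assumed to use the base PRICE field.
    """
    if price_list_id is None:
        return "PRICE"
    return f"PRICELIST{price_list_id}"


def build_facet_query(search_terms, price_list_id=None):
    """Build the query string, faceting only on the customer's price field."""
    params = [
        ("q", search_terms),
        ("fl", "NAME,ID"),
        ("facet", "true"),
        ("facet.field", price_field_for(price_list_id)),
    ]
    # safe="," keeps the comma in the fl list readable instead of %2C.
    return urlencode(params, safe=",")


# A customer on price list 500 facets on PRICELIST500 only.
print(build_facet_query("name", 500))
```

Only one price field is faceted per request, whichever list the customer 
belongs to, so no single query touches all of the price columns.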

Thanks,
Josh B.

-----Original Message-----
From: kenf_nc [mailto:ken.fos...@realestate.com] 
Sent: Wednesday, April 13, 2011 10:47 AM
To: solr-user@lucene.apache.org
Subject: Re: Indexing Question for large dataset

Indexing isn't a problem, it's just disk space and space is cheap. But, if
you do facets on all those price columns, that gets put into RAM which isn't
as cheap or plentiful. Your cache buffers may get overloaded a lot and
performance will suffer.

2000 price columns seems like a lot, could the documents be organized
differently? Hard to tell from your example.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-Question-for-large-dataset-tp2816344p2816377.html
Sent from the Solr - User mailing list archive at Nabble.com.
