First, the redundancy is certainly there, but that's what Solr does: it handles large amounts of data. 4 million documents is actually a pretty small corpus by Solr standards, so you may well be able to do exactly what you propose with acceptable performance and index size. I'd advise just trying it with, say, 200,000 docs. Why 200K? Because index growth is non-linear, with the first batch of documents taking up more space than the second. So index 100K, check the size of your index, then index 100K more and check again. Now use the delta between the two measurements to extrapolate to 4M.
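As a rough sketch of that extrapolation, something like the Python below would do. The function name and the sizes are made up for illustration; plug in whatever you actually measure on disk. It assumes the growth you see in the second batch stays roughly constant for the remaining documents, which is only an approximation.

```python
def extrapolate_index_size(size_after_first, size_after_second,
                           batch_docs, target_docs):
    """Estimate the index size at target_docs from two measurements.

    size_after_first  -- index size after the first batch (any unit, e.g. MB)
    size_after_second -- index size after the second batch (same unit)
    batch_docs        -- number of documents in each batch
    target_docs       -- corpus size to extrapolate to
    """
    if target_docs < 2 * batch_docs:
        raise ValueError("target_docs must cover both measured batches")
    # Growth caused by the second batch only; the first batch is ignored
    # because it carries the one-time cost of storing the unique terms.
    delta = size_after_second - size_after_first
    remaining_docs = target_docs - 2 * batch_docs
    return size_after_second + delta * remaining_docs / batch_docs


# Hypothetical measurements in MB, not real Solr numbers.
estimate_mb = extrapolate_index_size(520.0, 900.0, 100_000, 4_000_000)
print(f"estimated index size at 4M docs: {estimate_mb:.0f} MB")
```

If the delta keeps shrinking when you index a third batch, treat the result as an upper bound rather than a prediction.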
You don't need to store the taxonomy in each doc for auto-complete; you can get your auto-completion from a different index. Or you can index your taxonomies in a "special" document in Solr and query the (unique) field in that document for autocomplete.

For faceting, you do need the taxonomies in each document. But remember that the nature of the inverted index is that each unique term is stored only once, along with the document ID of every document the term appears in. So if you have 3/europe/germany/berlin stored in 1M documents, your index space is really <string length + overhead> + <space for 1M ids>.

Best
Erick

On Mon, Jan 24, 2011 at 4:53 AM, Damien Fontaine <dfonta...@rosebud.fr> wrote:

> Yes, i am not obliged to store taxonomies.
>
> My taxonomies are type of
>
> english_taxon_label = Berlin
> english_taxon_type = location
> english_taxon_hierarchy = 0/world
> 1/world/europe
> 2/world/europe/germany
> 3/world/europe/germany/berlin
>
> I need *_taxon_hierarchy to faceting and label to auto complete.
>
> With a RDBMs, i have 100 entry max for one taxo, but with solr and 4
> million documents the redundandcy is huge, no ?
>
> And i have 10 different taxonomies per document ....
>
> Damien
>
> On 24/01/2011 10:30, Em wrote:
>
>> Hi Damien,
>>
>> why are you storing the taxonomies?
>> When it comes to faceting, it only depends on indexed values. If there is a
>> meaningful difference between the indexed and the stored value, I would
>> prefer to use an RDBMs or something like that to reduce redundancy.
>>
>> Does this help?
>>
>> Regards