Thanks, Em and Erick, for your answers.

I now have a better understanding of how Solr works.

Damien

On 24/01/2011 16:23, Erick Erickson wrote:
First, the redundancy is certainly there, but that's what Solr does: it
handles large amounts of data. 4 million documents is actually a pretty
small corpus by Solr standards, so you may well be able to do exactly what
you propose with acceptable performance/size. I'd advise just trying it
with, say, 200,000 docs. Why 200K? Because index growth is non-linear, with
the first bunch of documents taking up more space than the second. So index
100K, examine your indexes, and index 100K more. Now use the delta to
extrapolate to 4M.
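The extrapolation described above can be sketched in a few lines of Python. This is only an illustration: the function name and the sizes in the example are made up, and it assumes growth stays roughly linear after the first batch, which should be checked against real measurements.

```python
def extrapolate_index_size(size_after_first, size_after_second,
                           batch_docs, target_docs):
    """Estimate the index size at target_docs.

    size_after_first:  index size measured after indexing batch_docs documents
    size_after_second: index size measured after indexing 2 * batch_docs documents
    Assumes every later batch adds about as much as the second one did.
    """
    delta = size_after_second - size_after_first            # cost of the second batch
    remaining_batches = (target_docs - 2 * batch_docs) / batch_docs
    return size_after_second + delta * remaining_batches


# Hypothetical measurements in MB: 500 after 100K docs, 800 after 200K docs.
estimate_mb = extrapolate_index_size(500, 800, 100_000, 4_000_000)
print(estimate_mb)
```

Using the delta rather than the size of the first batch keeps the one-time cost of the first documents out of the projection.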

You don't need to store the taxonomy in each doc for autocomplete; you can
get your autocompletion from a different index. Or you can index your
taxonomies in a "special" document in Solr and query the (unique) field in
that document for autocomplete.

For faceting, you do need taxonomies. But remember that the nature of the
inverted index is that unique terms are only stored once, and the document
ID for each document that the term appears in is recorded. So if you have
3/europe/germany/berlin stored in 1M documents, your index space is really
<string length + overhead> + <space for 1M ids>.
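A toy model of that point, written in Python: it is not how Lucene lays out its index on disk, only a sketch showing that a term repeated across documents is kept once, with a list of document IDs attached. The function name and sample documents are invented for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each unique term to the list of document IDs that contain it."""
    index = defaultdict(list)
    for doc_id, terms in docs.items():
        for term in set(terms):          # count a term once per document
            index[term].append(doc_id)
    return index


# Three hypothetical documents, two of which share the same taxonomy value.
docs = {
    1: ["3/europe/germany/berlin"],
    2: ["3/europe/germany/berlin"],
    3: ["3/europe/france/paris"],
}
index = build_inverted_index(docs)

# The Berlin string is stored once as a key; only the ID list grows.
print(len(index), index["3/europe/germany/berlin"])
```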

Best
Erick

On Mon, Jan 24, 2011 at 4:53 AM, Damien Fontaine <dfonta...@rosebud.fr> wrote:

Yes, I am not obliged to store the taxonomies.

My taxonomies look like this:

english_taxon_label = Berlin
english_taxon_type = location
english_taxon_hierarchy = 0/world
                          1/world/europe
                          2/world/europe/germany
                          3/world/europe/germany/berlin
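
For illustration, depth-prefixed values like the ones above could be generated from a path at indexing time roughly as follows. This is a sketch in Python; the function name is hypothetical and nothing here is part of Solr itself.

```python
def hierarchy_tokens(path):
    """Return one 'depth/ancestor/.../node' token per level of the path."""
    return [
        f"{depth}/" + "/".join(path[: depth + 1])
        for depth in range(len(path))
    ]


tokens = hierarchy_tokens(["world", "europe", "germany", "berlin"])
for token in tokens:
    print(token)
```

Each document then carries one token per level, which is what lets a facet on the field count documents at any depth of the taxonomy.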

I need *_taxon_hierarchy for faceting and the label for autocomplete.

With an RDBMS, I have at most 100 entries per taxonomy, but with Solr and
4 million documents the redundancy is huge, no?

And I have 10 different taxonomies per document...

Damien

On 24/01/2011 10:30, Em wrote:

Hi Damien,

Why are you storing the taxonomies? When it comes to faceting, it only
depends on indexed values. If there is a meaningful difference between the
indexed and the stored value, I would prefer to use an RDBMS or something
like that to reduce redundancy.

Does this help?

Regards


