Hi Erick,

In some use cases I really think that your suggestion of using unique
documents for meta-information is a good approach to solving some
issues.
However, there is one hurdle for me, and maybe you can help me clear it:

What is the best way to get such meta-data?
I see three possible approaches:
1st: get it in another request
2nd: get it with a requestHandler
3rd: get it with a searchComponent
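
To make the 1st option concrete, here is a minimal sketch of the two-request
flow in Python. It only builds the request URLs and does not talk to Solr.
The Solr URL, the "id" key field, and the "taxonomy_filter" field name are
assumptions made up for illustration, not something from this thread.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust to your own installation.
SOLR_SELECT = "http://localhost:8983/solr/select"


def meta_request_url(meta_id):
    """First request: fetch the single meta-information document by its id."""
    params = {"q": f'id:"{meta_id}"', "rows": 1, "wt": "json"}
    return f"{SOLR_SELECT}?{urlencode(params)}"


def main_request_url(user_query, meta_doc, meta_field="taxonomy_filter"):
    """Second request: the user's query plus one fq per value in the meta document.

    meta_field is a hypothetical multi-valued field holding filter queries.
    """
    params = [("q", user_query), ("wt", "json")]
    for value in meta_doc.get(meta_field, []):
        params.append(("fq", value))
    return f"{SOLR_SELECT}?{urlencode(params)}"
```

The cost of this option is the extra round trip per user request, which is
why the 2nd and 3rd options look cleaner to me.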

I think the 2nd and 3rd are the cleanest ways.
But in deciding between them I run into two problems:
RequestHandler: Should I extend the StandardRequestHandler to do what I
need? If so, I could just query my index for the needed information and add
it to the request before I pass it on to the SearchComponents.

SearchComponent: The problem with a SearchComponent is distributed search
and how to test it. However, if this is the cleanest way to go, it is the
way I should take.

What would you do if you wanted to add some meta-information to a request
that was not supplied by the user?

Regards,
Em


Erick Erickson wrote:
> 
> First, the redundancy is certainly there, but that's what Solr does,
> handles large amounts of data. 4 million documents is actually a pretty
> small corpus by Solr standards, so you may well be able to do exactly what
> you propose with acceptable performance/size. I'd advise just trying it
> with, say, 200,000 docs.
> Why 200K? Because index growth is non-linear, with the first bunch of
> documents taking up more space than the second. So index 100K, examine
> your indexes and index 100K more. Now use the delta to extrapolate to 4M.
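
The extrapolation described above can be sketched as follows. This is only
my reading of the suggestion: it assumes that every further batch costs
about as much as the second one did. The sizes in the usage note are
made-up numbers, not measurements.

```python
def extrapolate_index_size(size_first_batch, size_both_batches,
                           batch_docs, target_docs):
    """Estimate the index size at target_docs from two equally sized batches.

    size_first_batch:  index size after indexing the first batch
    size_both_batches: index size after indexing the second batch as well
    The growth between the two measurements (the delta) is treated as the
    cost of every further batch.
    """
    delta = size_both_batches - size_first_batch
    remaining_batches = (target_docs - 2 * batch_docs) / batch_docs
    return size_both_batches + delta * remaining_batches
```

With hypothetical sizes of 500 MB after the first 100K docs and 800 MB after
200K docs, extrapolate_index_size(500, 800, 100_000, 4_000_000) gives
12200.0 MB.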
> 
> You don't need to store the taxonomy in each doc for auto-complete, you
> can get your auto-completion from a different index. Or you can index
> your taxonomies in a "special" document in Solr and query the (unique)
> field in that document for autocomplete.
> 
> For faceting, you do need taxonomies. But remember that the nature of the
> inverted index is that unique terms are only stored once, and the document
> ID for each document that that term appears in is recorded. So if you have
> 3/europe/germany/berlin stored in 1M documents, your index space is really
> <string length + overhead> + <space for 1M ids>.
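
As a rough illustration of that formula, here is a small sketch. The
per-term overhead and the bytes per document ID are placeholder values I
picked for the example; they are not real Lucene figures, and a real
postings list may well be encoded more compactly than a fixed number of
bytes per ID.

```python
def term_space_bytes(term, doc_count, bytes_per_doc_id=4, term_overhead=16):
    """Approximate <string length + overhead> + <space for doc IDs>.

    bytes_per_doc_id and term_overhead are assumed values for illustration.
    The term itself is counted once, no matter how many documents use it.
    """
    term_bytes = len(term.encode("utf-8"))
    return term_bytes + term_overhead + doc_count * bytes_per_doc_id
```

Under these assumptions, term_space_bytes("3/europe/germany/berlin",
1_000_000) is 4000039 bytes: the 23-byte string is paid for once, and the
rest is the list of document IDs.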
> 
> Best
> Erick
> 
> On Mon, Jan 24, 2011 at 4:53 AM, Damien Fontaine <dfonta...@rosebud.fr>
> wrote:
> 
>> Yes, I am not obliged to store taxonomies.
>>
>> My taxonomies look like this:
>>
>> english_taxon_label = Berlin
>> english_taxon_type = location
>> english_taxon_hierarchy = 0/world
>>                           1/world/europe
>>                           2/world/europe/germany
>>                           3/world/europe/germany/berlin
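
The depth-prefixed values listed above can be generated from a plain list
of names. The sketch below shows one way to do it. The helper names are
made up, and using the second helper's result as a facet.prefix value to
list the children of a node is my assumption about how the hierarchy
field would be queried.

```python
def hierarchy_tokens(path):
    """Return one depth-prefixed token per level of the path."""
    return [f"{depth}/" + "/".join(path[: depth + 1])
            for depth in range(len(path))]


def child_facet_prefix(path):
    """Prefix shared by all direct children of the given node."""
    return f"{len(path)}/" + "/".join(path) + "/"
```

For example, hierarchy_tokens(["world", "europe", "germany", "berlin"])
yields the four values shown above, and child_facet_prefix(["world",
"europe"]) yields "2/world/europe/".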
>>
>> I need *_taxon_hierarchy for faceting and the label for autocomplete.
>>
>> With an RDBMS, I have at most 100 entries per taxonomy, but with Solr and
>> 4 million documents the redundancy is huge, no?
>>
>> And I have 10 different taxonomies per document ...
>>
>> Damien
>>
>> On 24/01/2011 10:30, Em wrote:
>>
>>> Hi Damien,
>>>
>>> why are you storing the taxonomies?
>>> When it comes to faceting, it only depends on indexed values. If there
>>> is a meaningful difference between the indexed and the stored value, I
>>> would prefer to use an RDBMS or something like that to reduce redundancy.
>>>
>>> Does this help?
>>>
>>> Regards
>>>
>>
>>
> 
> 

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Taxonomy-in-SOLR-tp2317955p2320666.html
Sent from the Solr - User mailing list archive at Nabble.com.
