Simple design with _single_ valued fields:

Id     Category    Product
001    TV          SONY 12345
002    Radio       Panasonic 54321
003    TV          Toshiba ABCD
004    Radio       ABCD Z-54321


We have 4 documents with single-valued fields. It's not necessary to store
the 'Category' field in the index... The data is not 'normalized' from a
DBA's viewpoint, but it is what Lucene needs...
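
To make the flat layout concrete, here is a rough plain-Python sketch (not
Solr or Lucene code; the function names build_inverted_index and search are
made up for illustration, and the whitespace tokenizer is an assumption). It
indexes the 4 documents from the table above, one document per product, and
finds products by category and by a term in the product name:

```python
from collections import defaultdict

# The four documents from the table above: one document per product,
# every field single-valued.
docs = [
    {"id": "001", "category": "TV", "product": "SONY 12345"},
    {"id": "002", "category": "Radio", "product": "Panasonic 54321"},
    {"id": "003", "category": "TV", "product": "Toshiba ABCD"},
    {"id": "004", "category": "Radio", "product": "ABCD Z-54321"},
]


def build_inverted_index(documents, field):
    """Map each lowercased whitespace token of `field` to the ids of the
    documents that contain it (a toy inverted index)."""
    index = defaultdict(set)
    for doc in documents:
        for token in doc[field].lower().split():
            index[token].add(doc["id"])
    return index


category_index = build_inverted_index(docs, "category")
product_index = build_inverted_index(docs, "product")


def search(category=None, product_term=None):
    """Return the sorted ids of documents matching all given criteria."""
    result = {doc["id"] for doc in docs}
    if category is not None:
        result &= category_index.get(category.lower(), set())
    if product_term is not None:
        result &= product_index.get(product_term.lower(), set())
    return sorted(result)


print(search(category="TV"))
print(search(category="Radio", product_term="ABCD"))
```

Each criterion simply narrows the set of matching product documents, so no
document ever needs more than these few fields.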




Britske wrote:
> 
> no, I'm using dynamic fields, they've been around for a pretty long time. 
> I use int-values in the 10k fields for filtering and sorting. On top of
> that I use a lot of full-text filtering on the other fields, as well as
> faceting, etc. 
> 
> I do understand that, at first glance, it seems possible to use
> multivalued fields, but with multivalued fields it's not possible to
> pinpoint the exact value within the multivalued field that I need.
> Consider the case with 1 multi-valued field, category, as you called it,
> which would have at most 10k fields. The meaning of these values within
> the field is completely lost, although it is a requirement to fetch
> products (thus values in the multivalued field)  given a specific set of
> criteria. In other words, there is no way of getting a specific value from
> a multivalued field given a set of criteria.  Now, compare that with my
> current design in which these criteria pinpoint a specific field / column
> to use and the difference should be clear. 
> 
> regards,
> Britske
> 
> 
> Funtick wrote:
>> 
>> 
>> Yes, it should be extremely simple! I simply can't understand how you
>> describe it:
>> 
>> Britske wrote:
>>> 
>>> Rows in solr represent productcategories. I will have up to 100k of
>>> them. 
>>> 
>>> - Each product category can have 10k products each. These are encoded as
>>> the 10k columns / fields (all 10k fields are int values) 
>>>   
>>> - At any given time at most 1 product per productcategory is returned
>>> (analogous to selecting 1 out of 10k columns). (This is the
>>> requirement that makes this scheme possible.)
>>> 
>>> - Products in the same column have certain characteristics in common,
>>> which are encoded in the column name (using dynamic fields). So the
>>> combination of these characteristics uniquely determines 1 out of 10k
>>> columns. When the user hasn't supplied all characteristics good defaults
>>> for these characteristics can be chosen, so a column can always be
>>> determined. 
>>> 
>>> - on top of that each row has 20 productcategory-fields (which all
>>> possible 10k products of that category share). 
>>> 
>> 
>> 1. You can't really define 10.000 columns; you are probably using a
>> multivalued field for that. (Sorry if I am not familiar with the
>> newest and greatest features of Solr, such as 'dynamic fields'.)
>> 
>> 2. You are trying to pass to Lucene 'normalized data'
>> - But it is indeed the job of Lucene, to normalize data!
>> 
>> 3. All 10k fields are int values!? Lucene is designed for full-text
>> search... are you trying to use Lucene instead of a database?
>> 
>> Sorry if I don't understand your design...
>> 
>> 
>> 
>> 
>> Britske wrote:
>>> 
>>> 
>>> 
>>> Funtick wrote:
>>>> 
>>>> 
>>>> Britske wrote:
>>>>> 
>>>>> - Rows in solr represent productcategories. I will have up to 100k of
>>>>> them. 
>>>>> - Each product category can have 10k products each. These are encoded
>>>>> as the 10k columns / fields (all 10k fields are int values) 
>>>>> 
>>>> 
>>>> You are using multivalued fields, you are not using 10k fields. And 10k
>>>> is huge.
>>>> 
>>>> The design is wrong... you should define only two fields: <Category,
>>>> Product>. Lucene will do the rest.
>>>> 
>>>> -Fuad
>>>> 
>>> 
>>> ;-). Well I wish it was that simple. 
>>> 
>> 
>> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/big-discrepancy-between-elapsedtime-and-qtime-although-enableLazyFieldLoading%3D-true-tp18698590p18757461.html
Sent from the Solr - User mailing list archive at Nabble.com.
