Simple design with _single_ valued fields:

Id    Category    Product
001   TV          SONY 12345
002   Radio       Panasonic 54321
003   TV          Toshiba ABCD
004   Radio       ABCD Z-54321
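
To make the flat <Category, Product> layout concrete, here is a rough sketch in Python of how the four example rows could be indexed and filtered. It is only a toy in-memory inverted index, not the Lucene or Solr API; the names build_index and search are made up for illustration, and treating Category as "indexed but not stored" is modelled simply by keeping it out of the stored dictionary.

```python
from collections import defaultdict

# The four example documents, each with one single-valued Category and Product.
DOCS = [
    {"id": "001", "category": "TV", "product": "SONY 12345"},
    {"id": "002", "category": "Radio", "product": "Panasonic 54321"},
    {"id": "003", "category": "TV", "product": "Toshiba ABCD"},
    {"id": "004", "category": "Radio", "product": "ABCD Z-54321"},
]


def build_index(docs):
    """Toy inverted index (hypothetical helper): (field, term) -> set of ids."""
    index = defaultdict(set)
    stored = {}
    for doc in docs:
        # Category is indexed as one untokenized term and is NOT stored.
        index[("category", doc["category"].lower())].add(doc["id"])
        # Product is split on whitespace, indexed, and stored for display.
        for token in doc["product"].lower().split():
            index[("product", token)].add(doc["id"])
        stored[doc["id"]] = doc["product"]
    return index, stored


def search(index, stored, category=None, term=None):
    """Intersect the posting sets for a category filter and a product term."""
    postings = []
    if category is not None:
        postings.append(index.get(("category", category.lower()), set()))
    if term is not None:
        postings.append(index.get(("product", term.lower()), set()))
    if not postings:
        return []
    ids = set.intersection(*postings)
    return [(doc_id, stored[doc_id]) for doc_id in sorted(ids)]


index, stored = build_index(DOCS)
print(search(index, stored, category="TV"))
print(search(index, stored, category="Radio", term="abcd"))
```

The point of the sketch is that one row per product keeps every field single-valued, and the category becomes just another term to intersect with, instead of one column per product.
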

We have 4 documents with single-valued fields. It's not necessary to store
the 'Category' field in the index... The data is not 'normalized' from a
DBA's viewpoint, but it is what Lucene needs...


Britske wrote:
>
> no, I'm using dynamic fields, they've been around for a pretty long time.
> I use int-values in the 10k fields for filtering and sorting. On top of
> that I use a lot of full-text filtering on the other fields, as well as
> faceting, etc.
>
> I do understand that, at first glance, it seems possible to use
> multivalued fields, but with multivalued fields it's not possible to
> pinpoint the exact value within the multivalued field that I need.
> Consider the case with 1 multi-valued field, category, as you called it,
> which would have at most 10k fields. The meaning of these values within
> the field is completely lost, although it is a requirement to fetch
> products (thus values in the multivalued field) given a specific set of
> criteria. In other words, there is no way of getting a specific value from
> a multivalued field given a set of criteria. Now, compare that with my
> current design in which these criteria pinpoint a specific field / column
> to use and the difference should be clear.
>
> regards,
> Britske
>
>
> Funtick wrote:
>>
>> Yes, it should be extremely simple! I simply can't understand how you
>> describe it:
>>
>> Britske wrote:
>>>
>>> Rows in solr represent productcategories. I will have up to 100k of
>>> them.
>>>
>>> - Each product category can have 10k products each. These are encoded as
>>> the 10k columns / fields (all 10k fields are int values)
>>>
>>> - At any given time at most 1 product per productcategory is returned
>>> (analogous to selecting 1 out of 10k columns). (This is the
>>> requirement that makes this scheme possible)
>>>
>>> - products in the same column have certain characteristics in common,
>>> which are encoded in the column name (using dynamic fields). So the
>>> combination of these characteristics uniquely determines 1 out of 10k
>>> columns. When the user hasn't supplied all characteristics, good defaults
>>> for these characteristics can be chosen, so a column can always be
>>> determined.
>>>
>>> - on top of that each row has 20 productcategory-fields (which all
>>> possible 10k products of that category share).
>>>
>>
>> 1. You can't really define 10,000 columns; you are probably using a
>> multivalued field for that. (sorry if I am not familiar with the
>> newest-greatest features of SOLR such as 'dynamic fields')
>>
>> 2. You are trying to pass 'normalized data' to Lucene
>> - But it is indeed the job of Lucene to normalize data!
>>
>> 3. All 10k fields are int values!? Lucene is designed for full-text
>> search... are you trying to use Lucene instead of a database?
>>
>> Sorry if I don't understand your design...
>>
>>
>> Britske wrote:
>>>
>>> Funtick wrote:
>>>>
>>>> Britske wrote:
>>>>>
>>>>> - Rows in solr represent productcategories. I will have up to 100k of
>>>>> them.
>>>>> - Each product category can have 10k products each. These are encoded
>>>>> as the 10k columns / fields (all 10k fields are int values)
>>>>>
>>>>
>>>> You are using multivalued fields, you are not using 10k fields. And 10k
>>>> is huge.
>>>>
>>>> Design is wrong... you should define two fields only: <Category,
>>>> Product>. Lucene will do the rest.
>>>>
>>>> -Fuad
>>>>
>>>
>>> ;-). Well I wish it was that simple.
>>>
>>
>

-- 
View this message in context: http://www.nabble.com/big-discrepancy-between-elapsedtime-and-qtime-although-enableLazyFieldLoading%3D-true-tp18698590p18757461.html
Sent from the Solr - User mailing list archive at Nabble.com.