Another option would be to extend Solr (and donate the change back) to incorporate Lucene's payload functionality, in which case you could store the percentage of each color as a payload and use the BoostingTermQuery... :-) If you're interested in this, a discussion on solr-dev is probably warranted to figure out the best way to do it.

-Grant

On Sep 28, 2007, at 9:23 AM, Yonik Seeley wrote:

If it were just a couple of colors, you could have a separate field
for each color and then index the percent in that field.

black:70
grey:20

and then you could use a function query to influence the score (or you
could sort by the color percent).

However, this doesn't scale well to a large index with a large number of colors. Each field used like that will take up 4 bytes per document in the index.

So if you have 1M documents and 100 colors, that's
1M docs * 100 colors * 4 bytes = 400MB.
Doable depending on your index size (use "int" or "float" and not
"sint" or "sfloat" type for this... it will be better on the memory).

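The estimate above can be reproduced with a few lines of Java. This is only a sketch of the arithmetic: the class and method names are hypothetical, and the 4-bytes-per-value figure is taken from the message rather than measured.

```java
// Hypothetical helper (not part of Solr): reproduces the back-of-the-envelope
// estimate of one 4-byte value per document per color field.
final class ColorFieldMemory {

    // Total bytes = documents * color fields * bytes per stored value.
    static long bytes(long docs, int colorFields, int bytesPerValue) {
        return docs * colorFields * bytesPerValue;
    }

    public static void main(String[] args) {
        long total = bytes(1_000_000L, 100, 4);
        // Decimal megabytes, matching the figure quoted in the message.
        System.out.println(total / 1_000_000 + " MB");
    }
}
```
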
If you need to use less memory, you could encode all of the
colors into a single value (perhaps a compact string, with one
percentage per byte or something) and then have a custom function that
extracts the value for a particular color.  (This involves some Java
development.)

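A minimal sketch of the "one value per byte" encoding, in plain Java. Everything here is an assumption for illustration: the class name, the method names, the fixed palette of 100 colors, and the idea of addressing colors by an integer id. It shows only the packing and extraction, not how such a value would be stored in or read from Solr.

```java
// Hypothetical encoder (not a Solr or Lucene API): one byte per color,
// indexed by color id, holding that color's percentage (0..100).
final class PackedColors {

    // Assumed palette size, based on the "about 100 colors" in the question.
    static final int NUM_COLORS = 100;

    // colorIds[i] is a palette index; percents[i] is that color's share.
    static byte[] pack(int[] colorIds, int[] percents) {
        if (colorIds.length != percents.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        byte[] packed = new byte[NUM_COLORS]; // colors not listed stay at 0
        for (int i = 0; i < colorIds.length; i++) {
            int id = colorIds[i];
            int p = percents[i];
            if (id < 0 || id >= NUM_COLORS) {
                throw new IllegalArgumentException("unknown color id: " + id);
            }
            if (p < 0 || p > 100) {
                throw new IllegalArgumentException("percent out of range: " + p);
            }
            packed[id] = (byte) p;
        }
        return packed;
    }

    // Extracts the percentage for one color from the packed value.
    static int percentOf(byte[] packed, int colorId) {
        return packed[colorId] & 0xFF;
    }

    public static void main(String[] args) {
        // Assumed ids for the example: 0 = black, 1 = gray, 2 = brown.
        byte[] dress = pack(new int[] {0, 1, 2}, new int[] {70, 20, 10});
        System.out.println("black: " + percentOf(dress, 0));
    }
}
```

This dense layout costs 100 bytes per document instead of 400. Since a product has at most 5 colors, a sparse list of (id, percent) pairs would be smaller still, but that is a variation on the idea rather than something proposed in this thread.
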
-Yonik


On 9/28/07, Guangwei Yuan <[EMAIL PROTECTED]> wrote:
Hi,

We're running an e-commerce site that provides product search. We've been able to extract colors from product images, and we think it'd be cool and useful to search products by color. A product image can have up to 5 colors (from a color space of about 100 colors), so we can implement it easily with Solr's facet search (thanks all who've developed Solr).

The problem arises when we try to sort the results by color relevancy. What's different from a normal facet search is that colors are weighted. For example, a black dress can be 70% black, 20% gray, and 10% brown. A search query "color:black" should return results in which the black dress ranks higher than other products with a lower percentage of black.

My question is: how do I configure and index the color field so that products
with a higher percentage of color X rank higher for the query "color:X"?

Thanks for your help!

- Guangwei


--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ

