Another option would be to extend Solr (and donate back) to
incorporate Lucene's payload functionality, in which case you could
associate the percentile of the color as a payload and use the
BoostingTermQuery... :-) If you're interested in this, a discussion
on solr-dev is probably warranted to figure out the best way to do this.
-Grant
On Sep 28, 2007, at 9:23 AM, Yonik Seeley wrote:
If it were just a couple of colors, you could have a separate field
for each color and then index the percent in that field.
black:70
grey:20
and then you could use a function query to influence the score (or you
could sort by the color percent).
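To make the intended ordering concrete, here is a minimal sketch in plain Java (no Solr or Lucene calls). It assumes each product carries one percentage per color field, as in the black:70 / grey:20 example, and ranks matches for a color by that percentage. The class name, method name, and sample products are hypothetical and only for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class ColorSortSketch {
    // One product with a percentage per color "field"; absent colors count as 0.
    static final class Product {
        final String name;
        final Map<String, Integer> colorPercent;

        Product(String name, Map<String, Integer> colorPercent) {
            this.name = name;
            this.colorPercent = colorPercent;
        }

        int percent(String color) {
            return colorPercent.getOrDefault(color, 0);
        }
    }

    // Keep only products that contain the color, highest percentage first.
    static List<String> rankByColor(List<Product> products, String color) {
        List<Product> hits = new ArrayList<>();
        for (Product p : products) {
            if (p.percent(color) > 0) {
                hits.add(p);
            }
        }
        hits.sort(Comparator.comparingInt((Product p) -> p.percent(color)).reversed());
        List<String> names = new ArrayList<>();
        for (Product p : hits) {
            names.add(p.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Product> products = List.of(
            new Product("black dress", Map.of("black", 70, "grey", 20, "brown", 10)),
            new Product("grey coat", Map.of("grey", 80, "black", 20)),
            new Product("red scarf", Map.of("red", 100)));
        System.out.println(rankByColor(products, "black"));
    }
}
```

In a real index the sort or function query would do this work on the per-color field; the sketch only shows the ordering you would expect back.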
However, this doesn't scale well to a large index with a large
number of colors. Each field used like that will take up 4 bytes per
document in the index, so if you have 1M documents, that's
1M docs * 100 colors * 4 bytes = 400MB.
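The estimate above can be reproduced with a few lines of Java. This is only a back-of-the-envelope sketch; the class and method names are hypothetical, and the 4 bytes per value is the figure quoted in this message.

```java
class FieldMemoryEstimate {
    // Bytes needed when every per-color field costs a fixed number of
    // bytes per document (4 bytes in the estimate quoted above).
    static long estimateBytes(long docs, int colorFields, int bytesPerValue) {
        return docs * colorFields * bytesPerValue;
    }

    public static void main(String[] args) {
        long bytes = estimateBytes(1_000_000L, 100, 4);
        // 1M docs * 100 colors * 4 bytes
        System.out.println(bytes + " bytes (~" + bytes / 1_000_000 + " MB)");
    }
}
```

The cost grows linearly in both the document count and the number of color fields, so doubling either one doubles the total.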
That's doable depending on your index size (use the "int" or "float"
type for this, not "sint" or "sfloat"... it will be easier on memory).
If you need to use less memory, you could encode all of the
colors into a single value (perhaps into a compact string... one
percentile per byte or something) and then have a custom function that
extracts the value for a particular color. (This involves some Java
development.)
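A minimal sketch of the "one percentile per byte" idea, in plain Java. It assumes each of the roughly 100 colors has a numeric id in 0..99 and each percentage is an integer in 0..100; the class name, method names, and color ids are hypothetical. How the packed bytes would be stored in a field and hooked into a custom function is left out here.

```java
import java.util.Map;

class ColorPacker {
    // Assumption: about 100 colors, each identified by an id in 0..99.
    static final int NUM_COLORS = 100;

    // Pack (color id -> percent) into one byte per color.
    // Colors missing from the map keep the default value 0.
    static byte[] pack(Map<Integer, Integer> percentByColorId) {
        byte[] packed = new byte[NUM_COLORS];
        for (Map.Entry<Integer, Integer> e : percentByColorId.entrySet()) {
            int id = e.getKey();
            int pct = e.getValue();
            if (id < 0 || id >= NUM_COLORS || pct < 0 || pct > 100) {
                throw new IllegalArgumentException("bad color entry: " + id + "=" + pct);
            }
            packed[id] = (byte) pct; // 0..100 fits in a signed byte
        }
        return packed;
    }

    // Extract the percentage for one color from the packed value.
    static int percentOf(byte[] packed, int colorId) {
        return packed[colorId];
    }

    public static void main(String[] args) {
        // Hypothetical ids: 0 = black, 1 = grey, 2 = brown.
        byte[] packed = pack(Map.of(0, 70, 1, 20, 2, 10));
        System.out.println("black: " + percentOf(packed, 0) + "%");
    }
}
```

Under these assumptions each document carries 100 bytes instead of 100 colors * 4 bytes = 400 bytes, at the price of a lookup step for every color you ask about.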
-Yonik
On 9/28/07, Guangwei Yuan <[EMAIL PROTECTED]> wrote:
Hi,
We're running an e-commerce site that provides product search. We've
been able to extract colors from product images, and we think it'd be
cool and useful to search products by color. A product image can have
up to 5 colors (from a color space of about 100 colors), so we can
implement it easily with Solr's facet search (thanks to all who've
developed Solr).
The problem arises when we try to sort the results by color relevancy.
What's different from a normal facet search is that colors are
weighted. For example, a black dress can be 70% black, 20% gray, and
10% brown. A search query "color:black" should return results in which
the black dress ranks higher than other products with a lower
percentage of black.
My question is: how do I configure and index the color field so that
products with a higher percentage of color X rank higher for the query
"color:X"?
Thanks for your help!
- Guangwei
--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ