> What I am envisioning (at least to start) is have all this add two fields in
> the index. One would be for color information for the color similarity
> search. The other would be a simple multivalued text field that we put
> keywords into based on what OpenCV can detect about the image. If it
> detects faces, we would put "face" into this field. Other things that it
> can detect would result in other keywords.
>
> For the color search, I have a few inter-related hurdles. I've got to
> figure out what form the color data actually takes and how to represent it
> in Solr. I need Java code for Solr that can take an input color value and
> find similar values in the index. Then I need some code that can go in our
> feed processing scripts for new content. That code would also go into a
> crawler script to handle existing images.
>
You are on the right track. You can create a set of representative keywords from the image. OpenCV can compute a color histogram of the image; you can make the bins as granular as you need, then use a look-up list of color names to generate a multivalued field (MVF) that represents the image.

If you want to get more sophisticated, attach a payload to each color term that reflects how much of the image that color covers.

Another approach would be to segment the image and extract the colors from each segment. So if you have a red rose on an all-white background, the textual representation would be something like:

white, white ... red ... white, white

Play around and see which works best.

HTH
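
P.S. Here is a rough sketch of the histogram-to-keywords idea, in case it helps. It is plain Python and only an illustration: it takes a list of (r, g, b) pixels rather than calling OpenCV, and the palette `NAMED_COLORS`, the helper names, and the parameters `bins_per_channel` and `max_terms` are all made up for the example. In a real script the pixels would come from whatever image library you use, and OpenCV could compute the histogram for you.

```python
from collections import Counter

# Hypothetical look-up list of color names; extend or refine as needed.
NAMED_COLORS = {
    "white": (255, 255, 255),
    "black": (0, 0, 0),
    "red": (255, 0, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
}


def nearest_color_name(rgb):
    """Return the palette name closest to rgb (squared Euclidean distance)."""
    return min(
        NAMED_COLORS,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(rgb, NAMED_COLORS[name])),
    )


def color_keywords(pixels, bins_per_channel=4, max_terms=10):
    """Turn an iterable of (r, g, b) pixels into a list of color keywords.

    Each keyword is repeated roughly in proportion to the share of the
    image it covers, so a mostly white image yields mostly "white" terms.
    """
    step = 256 // bins_per_channel
    histogram = Counter()
    total = 0
    for r, g, b in pixels:
        # Quantize each channel into a coarse bin and use the bin center.
        center = tuple((c // step) * step + step // 2 for c in (r, g, b))
        histogram[nearest_color_name(center)] += 1
        total += 1
    if total == 0:
        return []
    keywords = []
    for name, count in histogram.most_common():
        # Colors covering too little of the image round down to zero terms.
        keywords.extend([name] * round(max_terms * count / total))
    return keywords


# A "red rose on a white background": 80% white pixels, 20% reddish pixels.
rose = [(255, 255, 255)] * 80 + [(250, 10, 10)] * 20
print(color_keywords(rose))
```

For the rose example this prints eight "white" terms followed by two "red" terms, which is the kind of list you could feed into the multivalued field. The repeat counts are also the numbers you would use as weights if you go the payload route instead of repeating terms.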