Hi all!

First of all: Solr is an amazing project. Big thanks to the community!
I really appreciate the stability, and especially the pre-configured
jetty example ;)

And now for the question: I'm currently writing a RequestHandler for
Solr that does content-based image search (using Lire,
https://code.google.com/p/lire/). In general everything is running
fine, but ...

As soon as I cross a certain threshold, say 1.5 million images or an
index size of around 2GB, I see performance drop. I know from my
experience with Lucene, and from some profiling, that this can be
caused by the compression of stored fields. I'm currently using binary
fields to store byte[] objects, which are used for re-ranking after a
hash-based search. So a term query based on the hashes is issued in
the request handler, then the 500-3000 matching documents (candidate
results) are read from the index and their byte[] data is used to
re-rank the candidate results.
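
To make the re-ranking step concrete, here is a simplified sketch in
plain Java. It is not the actual handler or Lire code: the class and
method names are made up for illustration, and the L1 distance over
unsigned bytes is only a stand-in for whatever distance the image
feature really uses. It just shows the shape of the work done on each
candidate's byte[] after the hash-based term query.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ReRankSketch {

    /** Hypothetical candidate: a document id plus the byte[] read from the index. */
    public static final class Candidate {
        public final int docId;
        public final byte[] feature;

        public Candidate(int docId, byte[] feature) {
            this.docId = docId;
            this.feature = feature;
        }
    }

    /** L1 distance between two feature vectors, treating each byte as unsigned (assumed metric). */
    public static int l1Distance(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("feature vectors differ in length");
        }
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs((a[i] & 0xFF) - (b[i] & 0xFF));
        }
        return sum;
    }

    /** Returns the candidate doc ids ordered by ascending distance to the query feature. */
    public static List<Integer> reRank(byte[] query, List<Candidate> candidates) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingInt(c -> l1Distance(query, c.feature)));
        List<Integer> ids = new ArrayList<>();
        for (Candidate c : sorted) {
            ids.add(c.docId);
        }
        return ids;
    }
}
```

The distance computation itself is cheap; the expensive part is
getting the byte[] for each of the 500-3000 candidates out of the
index, which is why the storage of these fields matters.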

My question: so far I have only found ways to add single byte values
as DocValues to the index, but not whole binary fields. Do you have
any idea where to start if I want to put my binary fields into
DocValues?

cheers,
  Mathias

-- 
Dr. Mathias Lux
Assistant Professor, Klagenfurt University, Austria
http://tinyurl.com/mlux-itec
