Hi all! First of all: Solr is an amazing project. Big thanks to the community! I really appreciate the stability, and especially the pre-configured Jetty example ;)
And now for the question: I'm currently writing a RequestHandler for Solr that handles content-based image search (using Lire, https://code.google.com/p/lire/). In general everything runs fine, but as soon as I hit a certain threshold, say 1.5 million images or an index size of around 2 GB, I see performance drops. I know from my experience with Lucene, and from some profiling, that this can be caused by the compression of stored fields.

I'm currently using binary fields to store byte[] objects, which are used for re-ranking after a hash-based search. Based on the hashes, a term query is issued in the request handler; then the 500-3000 candidate documents are read from the index, and their byte[] data is used to re-rank the candidate results.

My question: so far I have only found ways to add single byte values to the index as DocValues, but not a whole binary field. Do you have any idea where to start if I want to put my binary fields into DocValues?

cheers,
Mathias

--
Dr. Mathias Lux
Assistant Professor, Klagenfurt University, Austria
http://tinyurl.com/mlux-itec
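
P.S. To make the re-ranking step concrete, here is a minimal sketch of what happens after the byte[] data of the candidate documents has been read. It is not the actual handler code: the class ReRankSketch, the Candidate holder, and the L1 distance over unsigned bytes are assumptions for illustration only (the real Lire descriptors use their own distance functions), and fetching the byte[] from the index is left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ReRankSketch {

    /** Hypothetical holder for one candidate returned by the hash-based term query. */
    static final class Candidate {
        final int docId;
        final byte[] feature; // the byte[] read from the binary field

        Candidate(int docId, byte[] feature) {
            this.docId = docId;
            this.feature = feature;
        }
    }

    /** Assumed distance: L1 over unsigned byte values. The real descriptor distance may differ. */
    static long l1Distance(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("feature lengths differ");
        }
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            // Mask with 0xFF so each byte is treated as a value in 0..255
            sum += Math.abs((a[i] & 0xFF) - (b[i] & 0xFF));
        }
        return sum;
    }

    /** Sorts the candidates by ascending distance to the query and keeps the best topK. */
    static List<Candidate> reRank(byte[] query, List<Candidate> candidates, int topK) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingLong((Candidate c) -> l1Distance(query, c.feature)));
        return new ArrayList<>(sorted.subList(0, Math.min(topK, sorted.size())));
    }
}
```

The expensive part in my case is not this sorting but getting the 500-3000 byte[] values out of the index, which is why I am asking about DocValues.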