Thanks for the feedback. What I am trying to do is "abuse" integers to store 8-bit (or even smaller) values of metrics I use for content-based image/video search (such as statistics on color distribution) and then implement similarity calculations based on vector-distance formulas. The index can become large (tens of millions of documents, each with, say, 50-100 integers describing the image metrics). I am looking at using some of those metrics to select a subset of images with range queries, and more of them to sort the result set by relevance.
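A minimal Python sketch of that approach, assuming one Solr int field per metric with hypothetical names `m0`, `m1`, ...: range filters (`fq=field:[lo TO hi]`) narrow the candidate set, and Solr's `dist(power, ...)` function query, which treats the first half of its arguments as one vector and the second half as the other, orders the results by Euclidean distance to the query vector. `build_solr_params` and `euclidean_distance` are hypothetical helpers, not part of any Solr client; the latter only mirrors what `dist(2, ...)` should compute, which can help when spot-checking rankings.

```python
import math

# Assumption: one Solr int field per metric, with hypothetical names "m0".."m99".


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors of 8-bit metric values."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if any(not 0 <= v <= 255 for v in (*a, *b)):
        raise ValueError("metric values must lie in the 8-bit range 0-255")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_solr_params(query_vector, filter_ranges, sort_fields):
    """Range filters select candidates; dist(2, fields..., values...) sorts by similarity."""
    if len(query_vector) != len(sort_fields):
        raise ValueError("need one query value per sort field")
    # One fq per metric; Solr intersects filter queries and caches each one separately.
    fq = [f"{field}:[{lo} TO {hi}]" for field, (lo, hi) in filter_ranges.items()]
    # dist() expects the first vector's components, then the second vector's.
    args = ",".join([*sort_fields, *map(str, query_vector)])
    # Smaller distance means more similar, so sort ascending.
    return {"q": "*:*", "fq": fq, "sort": f"dist(2,{args}) asc"}


if __name__ == "__main__":
    print(euclidean_distance([0, 0], [3, 4]))
    print(build_solr_params([10, 20, 30], {"m0": (0, 50)}, ["m3", "m4", "m5"]))
```

If only the ranking matters, `sqedist` (squared Euclidean distance) should produce the same order while skipping the square root.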
I was first looking at implementing those metrics as binary fields (see other posting) and then using a custom function for the distance calculation, but so far I have got the impression that this approach is not really well supported by Solr. Base64 encoding/decoding would kill performance, and implementing a custom field type with everything required for it to work properly is currently beyond my Solr knowledge. Besides, using built-in Solr features makes it easier to fine-tune and experiment with different approaches, because I can just play around with different queries and see what works best, without adjusting a custom function each time.

I hope that provides a better picture of what I am trying to achieve.

Best,
Robert

On Fri, Oct 16, 2015 at 4:50 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> Under the covers, Lucene stores ints in a packed format, so I'd just count
> on that for a first pass.
>
> What is "a lot of integer values"? Hundreds of millions? Billions?
> Trillions?
>
> Unless you give us some indication of scale, it's hard to say anything
> helpful. But unless you have some evidence that you're going to blow out
> memory I'd just ignore the "wasted" bits. Especially if you can use
> docValues, that option holds much of the underlying data in MMapDirectory
> that uses swappable OS memory....
>
> Best,
> Erick
>
> On Fri, Oct 16, 2015 at 1:53 AM, Robert Krüger <krue...@lesspain.de> wrote:
> > Hi,
> >
> > I have a data model where I would store and index a lot of integer values
> > with a very restricted range (e.g. 0-255), so theoretically the 32 bits of
> > Solr's integer fields are complete overkill. I want to be able to do things
> > like vector distance calculations on those fields. Should I worry about the
> > "wasted" bits or will Solr compress/organize the index in a way that
> > compensates for this if there are only 256 (or even fewer) distinct values?
> >
> > Any recommendations on how my fields should be defined to make things like
> > numeric functions work as fast as technically possible?
> >
> > Thanks in advance,
> >
> > Robert
>

--
Robert Krüger
Managing Partner
Lesspain GmbH & Co. KG
www.lesspain-software.com
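To put Erick's question about scale into rough numbers, here is a back-of-the-envelope sketch using hypothetical figures (10 million documents with 100 metrics each, the low end of the "tens of millions" mentioned above). It only computes the raw payload at a fixed bit width per value; it ignores index structures and does not model Lucene's actual packed/docValues encoding, which is codec-specific. `bits_needed` and `raw_size_bytes` are illustrative helpers, not Lucene APIs.

```python
def bits_needed(max_value: int) -> int:
    """Smallest bit width that can represent every integer in [0, max_value]."""
    return max(1, max_value.bit_length())


def raw_size_bytes(num_docs: int, values_per_doc: int, bits_per_value: int) -> int:
    """Raw payload if every value used a fixed bit width (no index or codec overhead)."""
    return num_docs * values_per_doc * bits_per_value // 8


if __name__ == "__main__":
    docs, metrics = 10_000_000, 100  # hypothetical scale
    # Compare plain 32-bit ints against the minimum width for the 0-255 range.
    for width in (32, bits_needed(255)):
        gb = raw_size_bytes(docs, metrics, width) / 1e9
        print(f"{width:>2} bits/value: {gb:.1f} GB")
```

At this hypothetical scale the difference is 4.0 GB versus 1.0 GB of raw values, which gives a rough sense of how much the "wasted" bits could matter before any packing Lucene does on its own.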