Re: Efficiency of integer storage/use

Mikhail Khludnev Sun, 18 Oct 2015 12:09:40 -0700

Robert,
From what I know, both the inverted index and docValues compress their
content heavily, and stored fields are compressed too. So I think you have a
good chance of experimenting successfully. You might need to tweak the schema
to disable storing unnecessary information in the index.
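
As a rough illustration of why packed storage makes the "wasted" bits mostly
a non-issue, here is a minimal Python sketch. It is not Lucene's actual
encoding; the function names (bits_required, pack, unpack, estimated_bytes)
are hypothetical, and the document and field counts are assumptions taken
from the scale Robert describes below (tens of millions of documents, 50-100
integers each).

```python
def bits_required(max_value: int) -> int:
    """Bits needed to represent any value in 0..max_value."""
    return max(1, max_value.bit_length())


def pack(values, bits: int) -> int:
    """Pack non-negative ints into one big int, `bits` bits per value."""
    packed = 0
    for i, v in enumerate(values):
        if v < 0 or v >> bits:
            raise ValueError(f"{v} does not fit in {bits} bits")
        packed |= v << (i * bits)
    return packed


def unpack(packed: int, bits: int, count: int):
    """Inverse of pack(): recover `count` values of `bits` bits each."""
    mask = (1 << bits) - 1
    return [(packed >> (i * bits)) & mask for i in range(count)]


def estimated_bytes(num_docs: int, fields_per_doc: int, bits: int) -> int:
    """Raw payload size, ignoring any index overhead or further compression."""
    return num_docs * fields_per_doc * bits // 8


if __name__ == "__main__":
    bits = bits_required(255)          # values restricted to 0-255
    metrics = [0, 255, 17, 128]
    assert unpack(pack(metrics, bits), bits, len(metrics)) == metrics

    # Assumed scale: 50 million documents, 100 metrics each.
    print("packed  :", estimated_bytes(50_000_000, 100, bits), "bytes")
    print("32-bit  :", estimated_bytes(50_000_000, 100, 32), "bytes")
```

Under those assumptions the packed payload is a quarter of the naive 32-bit
size, which is the kind of saving worth measuring on a real index before
investing in a custom field type.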


On Sat, Oct 17, 2015 at 1:15 AM, Robert Krüger <krue...@lesspain.de> wrote:

> Thanks for the feedback.
>
> What I am trying to do is to "abuse" integers to store 8-bit (or even
> smaller) values of metrics I use for content-based image/video search (such
> as statistical values regarding color distribution) and then implement
> similarity calculations based on formulas using vector distances. The index
> can become large (tens of millions of documents, each with, say, 50-100
> integers describing the image metrics). I am looking at using a part of
> those metrics for selecting a subset of images using range queries and then
> more of them for sorting the result set by relevance.
>
> I was first looking at implementing those metrics as binary fields (see
> other posting) and then using a custom function for the distance
> calculation, but so far I got the impression that this approach is not
> supported really well by Solr. Base64 encoding/decoding would kill
> performance, and implementing a custom field type with all that is probably
> required for that to work properly is currently beyond my Solr knowledge.
> Besides, using built-in Solr features makes it easier to fine-tune and
> experiment with different approaches, because I can just play around with
> different queries and see what works best, without adjusting a custom
> function each time.
>
> I hope that provides a better picture of what I am trying to achieve.
>
> Best,
>
> Robert
>
> On Fri, Oct 16, 2015 at 4:50 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > Under the covers, Lucene stores ints in a packed format, so I'd just count
> > on that for a first pass.
> >
> > What is "a lot of integer values"? Hundreds of millions? Billions?
> > Trillions?
> >
> > Unless you give us some indication of scale, it's hard to say anything
> > helpful. But unless you have some evidence that you're going to blow out
> > memory, I'd just ignore the "wasted" bits. Especially if you can use
> > docValues: that option holds much of the underlying data in MMapDirectory,
> > which uses swappable OS memory.
> >
> > Best,
> > Erick
> >
> > On Fri, Oct 16, 2015 at 1:53 AM, Robert Krüger <krue...@lesspain.de>
> > wrote:
> > > Hi,
> > >
> > > I have a data model where I would store and index a lot of integer values
> > > with a very restricted range (e.g. 0-255), so theoretically the 32 bits of
> > > Solr's integer fields are complete overkill. I want to be able to do things
> > > like vector distance calculations on those fields. Should I worry about the
> > > "wasted" bits, or will Solr compress/organize the index in a way that
> > > compensates for this if there are only 256 (or even fewer) distinct values?
> > >
> > > Any recommendations on how my fields should be defined to make things like
> > > numeric functions work as fast as technically possible?
> > >
> > > Thanks in advance,
> > >
> > > Robert
> >
>
>
>
> --
> Robert Krüger
> Managing Partner
> Lesspain GmbH & Co. KG
>
> www.lesspain-software.com
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>
