Roman,

It's covered in http://wiki.apache.org/solr/ContentStream:

| For POST requests where the content-type is not "application/x-www-form-urlencoded", the raw POST body is passed as a stream.

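A minimal client-side sketch of what that means, assuming a hypothetical Solr URL and a hypothetical body layout (big-endian 32-bit ids). How the server-side plugin reads the stream is not shown here and depends on the handler:

```python
import struct
import urllib.parse
import urllib.request


def build_id_post(base_url, params, ids):
    """Build (but do not send) a POST whose body is the raw, unencoded ids.

    The body layout (big-endian unsigned 32-bit ints) is an assumption made
    for this sketch; the receiving handler must agree on it.
    """
    body = struct.pack(">%dI" % len(ids), *ids)
    url = base_url + "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        # Anything other than application/x-www-form-urlencoded means the
        # body is handed over as a raw stream, so no base64 step is needed.
        headers={"Content-Type": "application/octet-stream"},
    )


# Hypothetical host and path; ordinary parameters still travel in the URL.
req = build_id_post("http://localhost:8983/solr/select", {"q": "*:*"}, [1, 2, 3, 2000])
print(req.get_method(), req.full_url)
print(len(req.data), "bytes in body")
# urllib.request.urlopen(req) would actually send the request.
```
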
So, there is no need to encode binary data inside the body. Regarding
encoding, I have had a positive experience passing such ids encoded as vInt,
but they need to be presorted.


On Tue, Jul 2, 2013 at 10:46 PM, Roman Chyla <roman.ch...@gmail.com> wrote:

> Hello Mikhail,
>
> Yes, GET is limited, but POST is not - so I just wanted it to work the
> same way in both. But I am not sure I understand your question
> completely. Could you elaborate on the parameters/body part? Is there no
> need for encoding of binary data inside the body? Or do you mean it is
> treated as a string? Or is it just a bytestream and other parameters are
> seen as strings?
>
> On a general note: my main concern was to send many ids fast. If we use
> ints (32bit), in one MB one can fit ~250K; with a bitset, 33 times more (sb
> check numbers please :)). But certainly, if the bitset is sparse or the
> collection of ids is just 'a few thousand', a stream of ints/longs will be
> smaller and better to use.
>
> roman
>
>
> On Tue, Jul 2, 2013 at 2:00 PM, Mikhail Khludnev
> <mkhlud...@griddynamics.com> wrote:
>
> > Hello Roman,
> >
> > Have you considered passing the long id sequence as the body and
> > accessing it internally in Solr as a content stream? That makes base64
> > encoding unnecessary. AFAIK URL length is limited somehow, anyway.
> >
> >
> > On Tue, Jul 2, 2013 at 9:32 PM, Roman Chyla <roman.ch...@gmail.com>
> > wrote:
> >
> > > Wrong link to the parser, should be:
> > >
> > > https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/java/org/apache/solr/search/BitSetQParserPlugin.java
> > >
> > >
> > > On Tue, Jul 2, 2013 at 1:25 PM, Roman Chyla <roman.ch...@gmail.com>
> > > wrote:
> > >
> > > > Hello @,
> > > >
> > > > This thread 'kicked' me into finishing some long-past task of
> > > > sending/receiving a large boolean (bitset) filter. We have been using
> > > > bitsets with solr before, but now I sat down and wrote it as a qparser.
> > > > The use cases, as you have discussed, are:
> > > >
> > > >    - necessity to send a loooong list of ids as a query (where it is
> > > >    not possible to do it the 'normal' way)
> > > >    - or filtering ACLs
> > > >
> > > > It works in the following way:
> > > >
> > > >    - an external application constructs a bitset and sends it as a
> > > >    query to solr (q or fq, depending on your needs)
> > > >    - solr unpacks the bitset (translating bits into lucene ids, if
> > > >    necessary), and wraps this into a query which then has the easy job
> > > >    of 'filtering' wanted/unwanted items
> > > >
> > > > Therefore it is good only if you can search against something that is
> > > > indexed as an integer (ids often are).
> > > >
> > > > A simple benchmark shows acceptable performance: to send the bitset
> > > > (randomly populated, 10M, with 4M bits set), it takes 110ms (25+64+20).
> > > >
> > > > To decode this string (resulting byte size 1.5Mb!) it takes ~90ms
> > > > (5+14+68ms).
> > > >
> > > > I haven't tested the latency of sending it over the network or the
> > > > query performance, but since the query is very similar to MatchAllDocs,
> > > > it is probably very fast (and I know that sending many Mbs to Solr is
> > > > fast as well).
> > > >
> > > > I know this is not exactly a 'standard' solution, and it is probably
> > > > not something you want to see with hundreds of millions of docs, but
> > > > people seem to be doing 'not the right thing' all the time ;)
> > > > So if you think this is something useful for the community, please let
> > > > me know. If somebody would be willing to test it, I can file a JIRA
> > > > ticket.
> > > >
> > > > Thanks!
> > > >
> > > > Roman
> > > >
> > > >
> > > > The code, if no JIRA is needed, can be found here:
> > > >
> > > > https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/java/org/apache/solr/search/AdsQParserPlugin.java
> > > >
> > > > https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/test/org/apache/solr/search/TestBitSetQParserPlugin.java
> > > >
> > > > 839ms.  run
> > > > 154ms.  Building random bitset indexSize=10000000 fill=0.5 -- Size=15054208,cardinality=3934477 highestBit=9999999
> > > >  25ms.  Converting bitset to byte array -- resulting array length=1250000
> > > >  20ms.  Encoding byte array into base64 -- resulting array length=1666668 ratio=1.3333344
> > > >  62ms.  Compressing byte array with GZIP -- resulting array length=1218602 ratio=0.9748816
> > > >  20ms.  Encoding gzipped byte array into base64 -- resulting string length=1624804 ratio=1.2998432
> > > >   5ms.  Decoding gzipped byte array from base64
> > > >  14ms.  Uncompressing decoded byte array
> > > >  68ms.  Converting from byte array to bitset
> > > > 743ms.  running
> > > >
> > > >
> > > > On Tue, Jun 18, 2013 at 3:51 PM, Erick Erickson
> > > > <erickerick...@gmail.com> wrote:
> > > >
> > > >> Not necessarily. If the auth tokens are available on some
> > > >> other system (DB, LDAP, whatever), one could get them
> > > >> in the PostFilter and cache them somewhere since,
> > > >> presumably, they wouldn't be changing all that often. Or
> > > >> use a UserCache and get notified whenever a new searcher
> > > >> was opened and regenerate or purge the cache.
> > > >>
> > > >> Of course you're right if the post filter does NOT have
> > > >> access to the source of truth for the user's privileges.
> > > >>
> > > >> FWIW,
> > > >> Erick
> > > >>
> > > >> On Tue, Jun 18, 2013 at 8:54 AM, Otis Gospodnetic
> > > >> <otis.gospodne...@gmail.com> wrote:
> > > >> > Hi,
> > > >> >
> > > >> > The unfortunate thing about this is that you still have to *pass*
> > > >> > that filter from the client to the server every time you want to
> > > >> > use that filter. If that filter is big/long, passing that in all
> > > >> > the time has some price that could be eliminated by using
> > > >> > "server-side named filters".
> > > >> >
> > > >> > Otis
> > > >> > --
> > > >> > Solr & ElasticSearch Support
> > > >> > http://sematext.com/
> > > >> >
> > > >> >
> > > >> > On Tue, Jun 18, 2013 at 8:16 AM, Erick Erickson
> > > >> > <erickerick...@gmail.com> wrote:
> > > >> >> You might consider "post filters". The idea
> > > >> >> is to write a custom filter that gets applied
> > > >> >> after all other filters etc. One use-case
> > > >> >> here is exactly ACL lists, and can be quite
> > > >> >> helpful if you're not doing *:* type queries.
> > > >> >>
> > > >> >> Best
> > > >> >> Erick
> > > >> >>
> > > >> >> On Mon, Jun 17, 2013 at 5:12 PM, Otis Gospodnetic
> > > >> >> <otis.gospodne...@gmail.com> wrote:
> > > >> >>> Btw. ElasticSearch has a nice feature here. Not sure what it's
> > > >> >>> called, but I call it "named filter".
> > > >> >>>
> > > >> >>> http://www.elasticsearch.org/blog/terms-filter-lookup/
> > > >> >>>
> > > >> >>> Maybe that's what OP was after?
> > > >> >>> > > > >> >>> Otis > > > >> >>> -- > > > >> >>> Solr & ElasticSearch Support > > > >> >>> http://sematext.com/ > > > >> >>> > > > >> >>> > > > >> >>> > > > >> >>> > > > >> >>> > > > >> >>> On Mon, Jun 17, 2013 at 4:59 PM, Alexandre Rafalovitch > > > >> >>> <arafa...@gmail.com> wrote: > > > >> >>>> On Mon, Jun 17, 2013 at 12:35 PM, Igor Kustov < > > ivkus...@gmail.com> > > > >> wrote: > > > >> >>>>> So I'm using query like > > > >> >>>>> > > > >> > > > > http://127.0.0.1:8080/solr/select?q=*:*&fq={!mqparser}id:%281%202%203%29 > > < > > > > > > http://127.0.0.1:8080/solr/select?q=*:*&fq=%7B!mqparser%7Did:%281%202%203%29 > > > > > > > >> >>>> > > > >> >>>> If the IDs are purely numeric, I wonder if the better way is to > > > send > > > >> a > > > >> >>>> bitset. So, bit 1 is on if ID:1 is included, bit 2000 is on if > > > >> ID:2000 > > > >> >>>> is included. Even using URL-encoding rules, you can fit at > least > > 65 > > > >> >>>> sequential ID flags per character and I am sure there are more > > > >> >>>> efficient encoding schemes for long empty sequences. > > > >> >>>> > > > >> >>>> Regards, > > > >> >>>> Alex. > > > >> >>>> > > > >> >>>> > > > >> >>>> > > > >> >>>> Personal website: http://www.outerthoughts.com/ > > > >> >>>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch > > > >> >>>> - Time is the quality of nature that keeps events from > happening > > > all > > > >> >>>> at once. Lately, it doesn't seem to be working. (Anonymous - > > via > > > >> GTD > > > >> >>>> book) > > > >> > > > > > > > > > > > > > > > > > > > -- > > Sincerely yours > > Mikhail Khludnev > > Principal Engineer, > > Grid Dynamics > > > > <http://www.griddynamics.com> > > <mkhlud...@griddynamics.com> > > > -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics <http://www.griddynamics.com> <mkhlud...@griddynamics.com>