Streaming expressions malfunctioning
Hi there,

I recently upgraded a Solr instance to version 6.0.1 and, while trying out the new streaming expressions feature, I ran into what I think might be a bug in the HTTP interface. I tried to run two simple streaming expressions, shown below:

innerJoin(
  search(collection1, zkhost="localhost:9983", qt="/export", fl="id", sort="id asc", q="*:*"),
  search(collection2, zkhost="localhost:9983", qt="/export", fl="id", sort="id asc", q="*:*"),
  on("id")
)

facet(
  collection3,
  q="*:*",
  buckets="field1",
  bucketSorts="sum(field2) asc",
  sum(field2),
  count(*)
)

What I noticed is that, while I can obtain the results of both expressions using Java, the feature does not seem to work when I request the same data via cURL. The code snippets I used are below. Can you tell me whether I am doing something wrong, or whether this is indeed a bug in this version?

Inner Join expression

HTTP/PHP (not working)
Request:

JAVA (working)

Facet Expression

HTTP/PHP
Request:

JAVA

Thanks for the help!

Regards,
João Pereira
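For reference, a minimal SolrJ sketch of submitting a streaming expression to the /stream handler looks roughly like the following; the host, port, collection name and wiring are assumptions, not the original snippet from this post:

import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.SolrStream;
import org.apache.solr.client.solrj.io.stream.StreamContext;
import org.apache.solr.client.solrj.io.stream.TupleStream;
import org.apache.solr.common.params.ModifiableSolrParams;

public class StreamingExpressionExample {
  public static void main(String[] args) throws Exception {
    // The streaming expression, written exactly as it would be passed to /stream.
    String expr = "innerJoin("
        + "search(collection1,zkhost=\"localhost:9983\",qt=\"/export\",fl=\"id\",sort=\"id asc\",q=\"*:*\"),"
        + "search(collection2,zkhost=\"localhost:9983\",qt=\"/export\",fl=\"id\",sort=\"id asc\",q=\"*:*\"),"
        + "on(\"id\"))";

    // Send the expression to the /stream handler of a node hosting collection1.
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("expr", expr);
    params.set("qt", "/stream");

    TupleStream stream = new SolrStream("http://localhost:8983/solr/collection1", params);
    stream.setStreamContext(new StreamContext());

    try {
      stream.open();
      while (true) {
        Tuple tuple = stream.read();
        if (tuple.EOF) {
          break;
        }
        System.out.println(tuple.getString("id"));
      }
    } finally {
      stream.close();
    }
  }
}

When the same expression is sent over plain HTTP, the expr parameter has to be URL-encoded (quotes, parentheses and spaces included), which is a common reason a request that works from SolrJ fails from cURL.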
Re: Streaming expressions malfunctioning
EDIT: I'll keep testing with the other stream sources/decorators. So far, only the search stream source works in both the Java and the cURL implementations.

Cheers
Re: Streaming expressions malfunctioning
Hi,

There were actually errors in the expression syntax; examining the logs allowed me to see what the error was.

Thanks
SOLR SelectStream bug
Hi,

While trying to create an example with the select stream decorator, I stumbled upon a bug in the Solr 6.0.1 core. The expression I was trying to run (via HTTP) was:

The request returned an error message, so I looked at the full stack trace in the server logs:

After examining org.apache.solr.handler.StreamHandler, I noticed that the initialization of the functionNames map of the StreamFactory class was missing the select operation. This registration is done in the StreamHandler.inform function, so I added the missing entry, recompiled and... it works :)

Will this issue be fixed in 6.0.2?

Thanks.

Cheers,
João Pereira
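The exact line added is not quoted above; based on how StreamHandler.inform registers the other streaming functions in 6.0.x, the missing registration would look roughly like this (class names and placement are assumptions):

// Sketch of the missing registration inside org.apache.solr.handler.StreamHandler#inform,
// next to the existing withFunctionName(...) calls on the StreamFactory.
// SelectStream is org.apache.solr.client.solrj.io.stream.SelectStream.
streamFactory
    .withFunctionName("search", CloudSolrStream.class)   // already registered in 6.0.1
    .withFunctionName("select", SelectStream.class);      // the missing "select" entry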
Re: Unable to integrate OpenNLP with Solr
Hi Sweta,

I recently adapted that patch to a Solr instance running version 6.4. If my memory does not fail me, the only changes I had to make were updating the package imports for the latest OpenNLP version (I am using OpenNLP 1.8):

What problem are you struggling with, exactly?

Best,
João
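The import list itself is not quoted above; as a rough illustration of the OpenNLP 1.8 package layout the updated imports would come from, here is a small stand-alone smoke test (the model path and the choice of component are assumptions, not taken from the patch):

import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;

public class OpenNlpSmokeTest {
  public static void main(String[] args) throws Exception {
    // "en-token.bin" is a stock OpenNLP tokenizer model; the local path is assumed.
    try (InputStream in = new FileInputStream("en-token.bin")) {
      TokenizerModel model = new TokenizerModel(in);
      TokenizerME tokenizer = new TokenizerME(model);
      for (String token : tokenizer.tokenize("Solr and OpenNLP working together.")) {
        System.out.println(token);
      }
    }
  }
}

Other 1.8 analysis classes live in similar packages, e.g. opennlp.tools.sentdetect, opennlp.tools.postag and opennlp.tools.namefind.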
Dynamic schema memory consumption
Hello guys,

I manage a Solr cluster and I am experiencing some problems with dynamic schemas. The cluster has 16 nodes and 1500 collections, with 12 shards per collection and 2 replicas per shard. The nodes can be divided into 2 major tiers:

- tier1 is composed of 12 machines with 4 physical cores (8 virtual), 32GB RAM and 4TB SSD; these are used mostly for direct queries and data exports;
- tier2 is composed of 4 machines with 20 physical cores (40 virtual), 128GB RAM and 4TB SSD; these are used mostly for aggregation queries (facets).

The problem I am experiencing is that when using dynamic schemas, the Solr heap usage rises dramatically. I have two tier2 machines (let's call them A and B) running one Solr instance each with a 96GB heap, with 36 collections totaling 3TB of mainly fixed-schema (55GB schemaless) data indexed on each machine, and the heap consumption is on average 60GB (it peaks at around 80GB and drops to around 40GB after a GC run).

On the other tier2 machines (C and D) I was running one Solr instance per machine with a 32GB heap and 4 fixed-schema collections with about 725GB of data indexed on each machine, which took up about 12GB of heap. Recently I added 46 collections to these machines with about 220GB of data. In order to do this I was forced to raise the heap size to 64GB, and after indexing everything the machines now have an average consumption of 48GB (!!!) (max ~55GB, after GC runs ~37GB).

I also noticed that when indexing fixed-schema data the CPU utilization is dramatically lower. I have around 100 workers indexing fixed-schema data with a CPU utilization of about 10%, while I have only one worker for schemaless data with a CPU utilization of about 20%.

So, I have two big questions here:

1. Is this dramatic rise in resource consumption when using dynamic fields "normal"?
2. Is there a way to lower the memory requirements? If so, how?

Thanks for your time!
Re: Dynamic schema memory consumption
Dorian Hoxha wrote
> Isn't 18K lucene-indexes (1 for each shard, not counting the replicas) a
> little too much for 3TB of data ?
> Something like 0.167GB for each shard ?
> Isn't that too much overhead (i've mostly worked with es but still lucene
> underneath) ?

I don't have only 3TB, I have 3TB on two tier2 machines; the whole cluster is 12TB :) So what I was trying to explain was this:

NODES A & B - 3TB per machine, 36 collections * 12 shards (432 indexes), average heap footprint of 60GB
NODES C & D - at first ~725GB per machine, 4 collections * 12 shards (48 indexes), average heap footprint of 12GB
NODES C & D - after adding 220GB of schemaless data, ~1TB per machine, 46 collections * 12 shards (552 indexes), average heap footprint of 48GB

So, what you are suggesting is that the culprit for the bump in heap footprint is the new collections?

Dorian Hoxha wrote
> Also you should change the heap 32GB->30GB so you're guaranteed to get
> pointer compression. I think you should have no need to increase it more
> than this, since most things have moved to out-of-heap stuff, like
> docValues etc.

I was forced to raise the heap size because the memory requirements rose dramatically, hence this post :)

Thanks
Re: Dynamic schema memory consumption
The way the data is spread across the cluster is not really uniform. Most of the shards are well below 50GB; I would say about 15% of the total shards have more than 50GB.

Dorian Hoxha wrote
> Each shard is a lucene index which has a lot of overhead.

And what does this overhead depend on? I mean, if I create an empty collection, will it take up much heap just for "being there"?

Dorian Hoxha wrote
> I don't know about static/dynamic memory-issue though.

I could not find anything related in the docs or the mailing list either, but I'm still not ready to discard this suspicion...

Again, thanks for your time
Re: java.lang.NullPointerException in json facet hll function
Hi,

Any updates on this issue? I am using Solr 6.3 and I have hit this same bug...

Thanks