Streaming expressions malfunctioning

2016-06-07 Thread jpereira
Hi there,

I recently upgraded a Solr instance to version 6.0.1 and, while trying out the
new streaming expressions feature, I discovered what I think might be a bug in
the HTTP interface.

I tried to create two simple streaming expressions, as described below:

innerJoin(
  search(collection1, zkhost="localhost:9983", qt="/export", fl="id", sort="id asc", q="*:*"),
  search(collection2, zkhost="localhost:9983", qt="/export", fl="id", sort="id asc", q="*:*"),
  on("id")
)

facet(
  collection3,
  q="*:*",
  buckets="field1",
  bucketSorts="sum(field2) asc",
  sum(field2),
  count(*)
)

What I noticed is that while I can obtain the expression results using Java
(SolrJ), the feature does not seem to work when I try to get the same data via
cURL. You can see the code snippets I used below. Can you tell me if I am doing
anything wrong, or if this is indeed a bug in this version?

Inner Join expression

HTTP/PHP (not working)

Request:



JAVA (working)
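Over HTTP the expression is sent URL-encoded in the expr parameter of a request
to the collection's /stream handler; from Java it is usually built through a
StreamFactory. Below is a minimal SolrJ sketch of the Java route, assuming
Solr 6.x on the classpath: the collection names, zkHost and fields come from
the expression above, while the class name and StreamContext handling are
illustrative rather than the exact snippet from this thread.

import org.apache.solr.client.solrj.io.SolrClientCache;
import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.CloudSolrStream;
import org.apache.solr.client.solrj.io.stream.InnerJoinStream;
import org.apache.solr.client.solrj.io.stream.StreamContext;
import org.apache.solr.client.solrj.io.stream.TupleStream;
import org.apache.solr.client.solrj.io.stream.expr.StreamFactory;

public class InnerJoinExample {
  public static void main(String[] args) throws Exception {
    // Map the function names used in the expression to their implementations
    // and tell the factory which ZooKeeper ensemble hosts each collection.
    StreamFactory factory = new StreamFactory()
        .withCollectionZkHost("collection1", "localhost:9983")
        .withCollectionZkHost("collection2", "localhost:9983")
        .withFunctionName("search", CloudSolrStream.class)
        .withFunctionName("innerJoin", InnerJoinStream.class);

    String expr = "innerJoin("
        + "search(collection1, q=\"*:*\", fl=\"id\", sort=\"id asc\", qt=\"/export\"),"
        + "search(collection2, q=\"*:*\", fl=\"id\", sort=\"id asc\", qt=\"/export\"),"
        + "on(\"id\"))";

    // A shared client cache so the underlying CloudSolrClients are reused.
    SolrClientCache clientCache = new SolrClientCache();
    StreamContext context = new StreamContext();
    context.setSolrClientCache(clientCache);

    TupleStream stream = factory.constructStream(expr);
    stream.setStreamContext(context);
    try {
      stream.open();
      Tuple tuple;
      while (!(tuple = stream.read()).EOF) {   // EOF marks the end-of-stream tuple
        System.out.println(tuple.getString("id"));
      }
    } finally {
      stream.close();
      clientCache.close();
    }
  }
}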



Facet Expression

HTTP/PHP

Request:


JAVA
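A corresponding hedged SolrJ sketch for the facet expression; again only the
collection name, zkHost and field names are taken from the post, and
bucketSizeLimit is added because the documented facet syntax includes it:

import org.apache.solr.client.solrj.io.SolrClientCache;
import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.FacetStream;
import org.apache.solr.client.solrj.io.stream.StreamContext;
import org.apache.solr.client.solrj.io.stream.TupleStream;
import org.apache.solr.client.solrj.io.stream.expr.StreamFactory;
import org.apache.solr.client.solrj.io.stream.metrics.CountMetric;
import org.apache.solr.client.solrj.io.stream.metrics.SumMetric;

public class FacetExample {
  public static void main(String[] args) throws Exception {
    StreamFactory factory = new StreamFactory()
        .withCollectionZkHost("collection3", "localhost:9983")
        .withFunctionName("facet", FacetStream.class)
        .withFunctionName("sum", SumMetric.class)
        .withFunctionName("count", CountMetric.class);

    // bucketSizeLimit caps the number of buckets returned per facet level.
    String expr = "facet(collection3, q=\"*:*\", buckets=\"field1\", "
        + "bucketSorts=\"sum(field2) asc\", bucketSizeLimit=100, "
        + "sum(field2), count(*))";

    SolrClientCache clientCache = new SolrClientCache();
    StreamContext context = new StreamContext();
    context.setSolrClientCache(clientCache);

    TupleStream stream = factory.constructStream(expr);
    stream.setStreamContext(context);
    try {
      stream.open();
      Tuple tuple;
      while (!(tuple = stream.read()).EOF) {
        // Metric values are keyed by their identifiers, e.g. "sum(field2)".
        System.out.println(tuple.getString("field1") + " -> " + tuple.getDouble("sum(field2)"));
      }
    } finally {
      stream.close();
      clientCache.close();
    }
  }
}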



Thanks for the help!

Regards,

João Pereira





Re: Streaming expressions malfunctioning

2016-06-07 Thread jpereira
EDIT: I'll keep testing with other stream sources/decorators. So far only the
search source works in both the Java and cURL implementations.

Cheers





Re: Streaming expressions malfunctioning

2016-06-08 Thread jpereira
Hi,

Actually, there were errors in the expression syntax; examining the logs
allowed me to see what the error was.

Thanks





SOLR SelectStream bug

2016-06-13 Thread jpereira
Hi,

While trying to create an example with the select stream decorator, I
stumbled upon a bug in the Solr 6.0.1 core.

The expression I was trying to run was:


via HTTP


The request returned an error message, so I looked at the full stack trace in
the server log:


After examining org.apache.solr.handler.StreamHandler, I noticed that the
initialization of the functionNames map on the StreamFactory was missing the
select operation. This is done in the StreamHandler.inform method, so I added
the missing registration for select, recompiled and... it works :)
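For anyone hitting the same thing from SolrJ before a patched build is
available, a hedged client-side sketch: registering select on your own
StreamFactory is the same kind of mapping that StreamHandler.inform lacks in
6.0.1 (the collection name, zkHost, fields and alias here are assumptions):

import org.apache.solr.client.solrj.io.SolrClientCache;
import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.CloudSolrStream;
import org.apache.solr.client.solrj.io.stream.SelectStream;
import org.apache.solr.client.solrj.io.stream.StreamContext;
import org.apache.solr.client.solrj.io.stream.TupleStream;
import org.apache.solr.client.solrj.io.stream.expr.StreamFactory;

public class SelectClientSide {
  public static void main(String[] args) throws Exception {
    // The "select" -> SelectStream mapping below is the registration that is
    // missing server-side; on the client we can simply add it ourselves.
    StreamFactory factory = new StreamFactory()
        .withCollectionZkHost("collection1", "localhost:9983")
        .withFunctionName("search", CloudSolrStream.class)
        .withFunctionName("select", SelectStream.class);

    String expr = "select("
        + "search(collection1, q=\"*:*\", fl=\"id\", sort=\"id asc\", qt=\"/export\"),"
        + "id as documentId)";

    SolrClientCache clientCache = new SolrClientCache();
    StreamContext context = new StreamContext();
    context.setSolrClientCache(clientCache);

    TupleStream stream = factory.constructStream(expr);
    stream.setStreamContext(context);
    try {
      stream.open();
      Tuple tuple;
      while (!(tuple = stream.read()).EOF) {
        System.out.println(tuple.getString("documentId"));
      }
    } finally {
      stream.close();
      clientCache.close();
    }
  }
}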

Will this issue be fixed in 6.0.2?

Thanks.

Cheers,

João Pereira





Re: Unable to integrate OpenNLP with Solr

2017-07-12 Thread jpereira
Hi Sweta,

I recently adapted that patch to a Solr instance running version 6.4. If my
memory does not fail me, the only changes I had to make were updating the
package imports for the latest OpenNLP version (I am using OpenNLP 1.8).
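As a quick, hedged sanity check that the OpenNLP 1.8 classes still resolve
from the expected opennlp.tools packages, something like the following
standalone sketch runs against the 1.8 jars (the model file names and the
sample sentence are assumptions, not part of the patch):

import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;

public class OpenNlpSmokeTest {
  public static void main(String[] args) throws Exception {
    // Model paths are placeholders; use the tokenizer and person-NER models
    // distributed by the OpenNLP project.
    try (InputStream tokStream = new FileInputStream("en-token.bin");
         InputStream nerStream = new FileInputStream("en-ner-person.bin")) {
      TokenizerME tokenizer = new TokenizerME(new TokenizerModel(tokStream));
      NameFinderME finder = new NameFinderME(new TokenNameFinderModel(nerStream));

      String[] tokens = tokenizer.tokenize("John Smith lives in Lisbon.");
      for (Span span : finder.find(tokens)) {
        // Each span covers the token range of a detected entity.
        System.out.println(span.getType() + ": " + tokens[span.getStart()]);
      }
    }
  }
}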



What problem are you struggling with, exactly?

Best,

João





Dynamic schema memory consumption

2017-04-10 Thread jpereira
Hello guys,

I manage a Solr cluster and I am experiencing some problems with dynamic
schemas.

The cluster has 16 nodes and 1500 collections, with 12 shards per collection
and 2 replicas per shard. The nodes can be divided into 2 major tiers:
 - tier1 is composed of 12 machines with 4 physical cores (8 virtual), 32GB
RAM and 4TB SSD; these are used mostly for direct queries and data exports;
 - tier2 is composed of 4 machines with 20 physical cores (40 virtual), 128GB
RAM and 4TB SSD; these are used mostly for aggregation queries (facets).

The problem I am experiencing is that when using dynamic schemas, the Solr
heap size rises dramatically. 

I have two tier2 machines (let's call them A and B) running one Solr instance
each with a 96GB heap, with 36 collections totaling 3TB of mainly
fixed-schema (55GB schemaless) data indexed on each machine, and the heap
consumption is on average 60GB (it peaks at around 80GB and drops to around
40GB after a GC run).

On the other tier2 machines (C and D) I was running one Solr instance on
each machine with a 32GB heap and 4 fixed-schema collections with about
725GB of data indexed per machine, which took up about 12GB of heap.
Recently I added 46 collections to these machines with about 220GB of
data. In order to do this I was forced to raise the heap size to 64GB, and
after indexing everything the machines now have an average consumption of
48GB (!!!) (max ~55GB, ~37GB after GC runs).

I also noticed that when indexing fixed-schema data the CPU utilization is
dramatically lower: I have around 100 workers indexing fixed-schema data with
a CPU utilization rate of about 10%, while a single worker indexing schemaless
data has a CPU utilization of about 20%.

So, I have two big questions here:
1. Is this dramatic rise in resource consumption when using dynamic fields
"normal"?
2. Is there a way to lower the memory requirements? If so, how?

Thanks for your time!





Re: Dynamic schema memory consumption

2017-04-11 Thread jpereira
Dorian Hoxha wrote
> Isn't 18K lucene-indexes (1 for each shard, not counting the replicas) a
> little too much for 3TB of data ?
> Something like 0.167GB for each shard ?
> Isn't that too much overhead (i've mostly worked with es but still lucene
> underneath) ?

I don't have only 3TB; I have 3TB on each of the two tier2 machines, and the
whole cluster holds 12TB :) So what I was trying to explain was this:

NODES A & B
3TB per machine, 36 collections * 12 shards (432 indexes), average heap
footprint of 60GB

NODES C & D - at first
~725GB per machine, 4 collections * 12 shards (48 indexes), average heap
footprint of 12GB

NODES C & D - after adding 220GB of schemaless data
~1TB per machine, 46 collections * 12 shards (552 indexes), average heap
footprint of 48GB

So, what you are suggesting is that the culprit for the bump in heap
footprint is the new collections?


Dorian Hoxha wrote
> Also you should change the heap 32GB->30GB so you're guaranteed to get 
> pointer compression. I think you should have no need to increase it more 
> than this, since most things have moved to out-of-heap stuff, like 
> docValues etc. 

I was forced to raise the heap size because the memory requirements rose
dramatically, hence this post :)

Thanks





Re: Dynamic schema memory consumption

2017-04-11 Thread jpereira
The way the data is spread across the cluster is not really uniform. Most of
the shards hold well under 50GB; I would say about 15% of the total shards
have more than 50GB.


Dorian Hoxha wrote
> Each shard is a lucene index which has a lot of overhead. 

And this overhead depends on what? I mean, if I create an empty collection,
will it take up much heap just for "being there"?


Dorian Hoxha wrote
> I don't know about static/dynamic memory-issue though.

I could not find anything related in the docs or on the mailing list either,
but I'm still not ready to discard this suspicion...

Again, thx for your time





Re: java.lang.NullPointerException in json facet hll function

2017-05-29 Thread jpereira
Hi,

Any updates on this issue? I am using Solr 6.3 and I have hit this same
bug...

Thanks


