Re: Solr 5.2.1 deadlock on commit

2015-12-11 Thread Ali Nazemian
I would really appreciate it if somebody could help me solve this problem.
Regards.

On Tue, Dec 8, 2015 at 9:22 PM, Ali Nazemian  wrote:

> I did that already. The situation was even worse: the autocommit part makes
> Solr unavailable.
> On Dec 8, 2015 7:13 PM, "Emir Arnautovic" 
> wrote:
>
>> Hi Ali,
>> Can you try without explicit commits and see if the threads are still
>> blocked?
>>
>> Thanks,
>> Emir
>>
>> On 08.12.2015 16:19, Ali Nazemian wrote:
>>
>>> The indexing load is as follows:
>>> - Around 1000 documents every 5 minutes.
>>> - The indexing speed is slow because of the complicated analyzer applied to
>>> each document. It takes around 60 seconds to index 1000 documents with this
>>> analyzer (it is really slow; however, given the analysis involved, I think
>>> it is acceptable).
>>> - ConcurrentUpdateSolrClient is used in all the indexing/updating cases.
>>>
>>> Regards.
>>>
>>> On Tue, Dec 8, 2015 at 6:36 PM, Ali Nazemian 
>>> wrote:
>>>
>>> Dear Emir,
 Hi,
There are some cases where I use soft commits in my application. However,
the bulk update part uses only hard commits, one per batch of 2500 documents.
Here is some information about the whole indexing/updating scenario:
- The indexing part uses soft commits.
- For single-document updates, a soft commit is used.
- For bulk updates, a hard commit is issued per batch of 2500 documents.
- Auto hard commit: 120 sec
- Auto soft commit: disabled
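
In solrconfig.xml the two auto commit settings above would typically be expressed
like this (openSearcher=false is an assumption here, not something stated above;
omitting autoSoftCommit entirely also leaves it disabled):

<autoCommit>
  <maxTime>120000</maxTime>       <!-- auto hard commit every 120 seconds -->
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>-1</maxTime>           <!-- auto soft commit disabled -->
</autoSoftCommit>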

 Best regards.


 On Tue, Dec 8, 2015 at 12:35 PM, Emir Arnautovic <
 emir.arnauto...@sematext.com> wrote:

 Hi Ali,
> This thread is blocked because it cannot obtain the update lock - in this
> particular case while doing a soft commit. I am guessing that the others are
> blocked for the same reason. Can you tell us a bit more about your setup,
> indexing load and procedure? Do you do explicit commits?
>
> Regards,
> Emir
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>
>
> On 08.12.2015 08:16, Ali Nazemian wrote:
>
> Hi,
>> I have had a problem with Solr 5.2.1 for a while now and I have not been
>> able to fix it yet. The only thing that is clear to me is that when I send
>> a bulk update to Solr, the commit thread gets blocked! Here is the thread
>> dump output:
>>
>> "qtp595445781-8207" prio=10 tid=0x7f0bf68f5800 nid=0x5785 waiting
>> for
>> monitor entry [0x7f081cf04000]
>>  java.lang.Thread.State: BLOCKED (on object monitor)
>> at
>>
>>
>> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:608)
>> - waiting to lock <0x00067ba2e660> (a java.lang.Object)
>> at
>>
>>
>> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
>> at
>>
>>
>> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
>> at
>>
>>
>> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1635)
>> at
>>
>>
>> org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1612)
>> at
>>
>>
>> org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:161)
>> at
>>
>>
>> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
>> at
>>
>>
>> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
>> at
>>
>>
>> org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:270)
>> at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:177)
>> at
>>
>>
>> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:98)
>> at
>>
>>
>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>> at
>>
>>
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
>> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
>> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
>> at
>>
>>
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
>> at
>>
>>
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
>> at
>>
>>
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>> at
>>
>>
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>>

Re: Solr 5.2.1 deadlock on commit

2015-12-11 Thread Emir Arnautovic

Hi Ali,
Is Solr busy at that time and does it eventually recover, or is it deadlocked?
Can you provide a full thread dump from when it happened?
Do you run only indexing at that time? Is "unavailable" only from an
indexing perspective, or can you not do anything with Solr at all?
Is there any indexing scenario that does not cause this (an extreme/useless
one being no commits at all)?

Did you try throttling indexing or changing bulk size?
How many indexing threads?

Thanks,
Emir

On 11.12.2015 10:06, Ali Nazemian wrote:

I really appreciate if somebody can help me to solve this problem.
Regards.

On Tue, Dec 8, 2015 at 9:22 PM, Ali Nazemian  wrote:


I did that already. The situation was worse. The autocommit part makes
solr unavailable.
On Dec 8, 2015 7:13 PM, "Emir Arnautovic" 
wrote:


Hi Ali,
Can you try without explicit commits and see if threads will still be
blocked.

Thanks,
Emir

On 08.12.2015 16:19, Ali Nazemian wrote:


The indexing load is as follows:
- Around 1000 documents every 5 mins.
- The indexing speed is slow because of the complicated analyzer which is
applied to each document. It takes around 60 seconds to index 1000
documents with applying this analyzer (It is really slow. However, based
on
the analyzing part I think it would be acceptable).
- The concurrentsolrclient is used in all the indexing/updating cases.

Regards.

On Tue, Dec 8, 2015 at 6:36 PM, Ali Nazemian 
wrote:

Dear Emir,

Hi,
There are some cases that I have soft commit in my application. However,
the bulk update part has only hard commit for a bulk of 2500 documents.
Here are some information about the whole indexing/updating scenarios:
- Indexing part uses soft commit.
- In a single update cases soft commit is used.
- For bulk update batch hard commit is used (on 2500 documents)
- Auto hard commit :120 sec
- Auto soft commit: disable

Best regards.


On Tue, Dec 8, 2015 at 12:35 PM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:

Hi Ali,

This thread is blocked because cannot obtain update lock - in this
particular case when doing soft commit. I am guessing that there
others are
blocked for the same reason. Can you tell us bit more about your setup
and
indexing load and procedure? Do you do explicit commits?

Regards,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



On 08.12.2015 08:16, Ali Nazemian wrote:

Hi,

There is a while since I have had problem with Solr 5.2.1 and I could
not
fix it yet. The only think that is clear to me is when I send bulk
update
to Solr the commit thread will be blocked! Here is the thread dump
output:

"qtp595445781-8207" prio=10 tid=0x7f0bf68f5800 nid=0x5785 waiting
for
monitor entry [0x7f081cf04000]
  java.lang.Thread.State: BLOCKED (on object monitor)
at


org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:608)
- waiting to lock <0x00067ba2e660> (a java.lang.Object)
at


org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
at


org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
at


org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1635)
at


org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1612)
at


org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:161)
at


org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
at


org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
at


org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:270)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:177)
at


org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:98)
at


org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at


org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
at


org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
at


org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
at


org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at


org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at


org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at


org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at


org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at


org.eclipse.jetty.server.handler.ContextHandler.doHan

Can Solr 5.3 support multiple geographical envelopes?

2015-12-11 Thread Frederic MERCEUR

Dear All,

Do you know if Solr 5.3 is supposed to support searching on multiple
geographical envelopes?


Indeed, we have a bbox field defined as follows:

<field name="bounding_box" type="bbox" multiValued="true"/>
<fieldType name="bbox" class="solr.BBoxField" geo="true" distanceUnits="degrees" numberType="_bbox_coord" />
<fieldType name="_bbox_coord" class="solr.TrieDoubleField" precisionStep="8" docValues="true" stored="false"/>


When we try to index this record:

<doc>
  <field name="id">68590130</field>
  <field name="bounding_box">ENVELOPE(-180.0, -118.0, 34.0, -34.0)</field>
  <field name="bounding_box">ENVELOPE(151.0, 180.0, 34.0, -34.0)</field>
</doc>


We get this error :

Exception writing document id 88590090 to the index; possible analysis error

And when we remove one of the two bounding_box fields, it is indexed with 
no problem.


Any idea ?

Thanks,
Fred

--
Fred Merceur
http://annuaire.ifremer.fr/cv/16828/


NRT vs Redis for Dynamic Data in SOLR (like counts, viewcounts, etc) -

2015-12-11 Thread Vikram Parmar
We are creating a web application which would contain posts (something like
FB or say YouTube). For the stable part of the data (i.e. the facets, search
results & its content), we plan to use SOLR.

What should we use for the unstable part of the data (i.e. dynamic and
volatile content such as Like counts, Comments counts, Viewcounts)?


Option 1) Redis

What about storing the "dynamic" data in a different data store (like
Redis)? Thus, every time the counts get refreshed, I do not have to reindex
the data into SOLR at all. Thus SOLR indexing is only triggered when new
posts are added to the site, and never on any activity on the posts by the
users.

Side-note :-
I also looked at the SOLR-Redis plugin at
https://github.com/sematext/solr-redis

The plugin looks good, but not sure if the plugin can be used to fetch the
data stored in Redis as part of the solr result set, i.e. in docs. The
description looks more like the Redis data can be used in the function
queries for boosting, sorting, etc. Anyone has experience with this?


Option 2) SOLR NRT with Soft Commits

We would depend on the in-built NRT features. Let's say we do soft-commits
every second and hard-commits every 10 seconds. Suppose a huge amount of
dynamic data is created on the site across hundreds of posts, e.g. 10
likes across 1 posts. Thus, this would mean soft-committing on 1
rows every second. And then hard-committing those many rows every 10
seconds. Isn't this overkill?
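
(For context: if the counters were kept in Solr, such changes would usually be
pushed as atomic updates rather than by reindexing whole documents; a sketch,
with hypothetical field names:

[{"id":"post123", "like_count":{"inc":1}, "view_count":{"set":4711}}]

Atomic updates still go through the normal soft/hard commit cycle, so the
volume question above applies either way.)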


Which option is preferred? How would you compare both options in terms of
scalability, maintenance, feasibility, best practices, etc.? Any real-life
experiences or links to articles?

Many thanks!


p.s. EFF (external file fields) is not an option, as I read that the data
in that file can only be used in function queries and cannot be returned as
part of a document.


Re: Indexing of annotated corpora

2015-12-11 Thread Alessandro Benedetti
Let me answer inline:

On 10 December 2015 at 06:11, Emmanuel CARTIER <
emmanuel.cart...@lipn.univ-paris13.fr> wrote:

> Hi,
>
> I am a newbie in Solr and I would like to know
>
> 1. The most efficient way(s?) to index annotated corpora with linguistic
> information at the token and chunk levels. My documents are in XML and have
> the following structure:
> 
> 
>  
> I
> am
> a
> 
> weak
> newbie
> 
> 
> ...
>
> My main use case is to be able to search for tokens or lemmas and facet
> on POS, or to search for a combination of word + specific POS tag.
> I cannot figure out how to index the token level, so as to "link" to each
> token its POS (part-of-speech) and lemma. I haven't found any documentation
> on that. At the moment, as my XML is not Solr-conformant, I use the
> DataImportHandler.
>

1) For requirement 1 I cannot see any particular problem. Just model your
Solr document as a "token".
Your fields will be:
surface_form
lemma
pos
...
Index all the fields and curate the field properties.
Then do your boolean queries with all the facets you want.
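
A minimal sketch of that token-per-document schema (the exact types, and any
field beyond the three listed above such as sentence_id, are assumptions):

<field name="surface_form" type="string" indexed="true" stored="true"/>
<field name="lemma"        type="string" indexed="true" stored="true"/>
<field name="pos"          type="string" indexed="true" stored="true"/>
<field name="sentence_id"  type="string" indexed="true" stored="true"/>

A query such as q=lemma:newbie AND pos:NN&facet=true&facet.field=pos would then
cover the "word + specific POS tag" use case (the tag value NN is made up here).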




>
> 2. if it is possible to use an existing Tokenizer or Filter to do a
> dictionary lookup for each token (the external dictionary will contain
> lemma and pos information for each word) - this is for a use case when no
> token annotation has been done on the source document.
>

Hmm, do you want to index the lemmas as synonyms of the original tokens,
and then not apply lemmatisation at query time?
How do you want to use lemmas and surface forms together in this use case?
For storing the POS you can use the token payload, and specifically a
custom token filter if you want, or a tokenizer if it fits better.
Take inspiration from this:

https://wiki.apache.org/solr/OpenNLP

Cheers

>
> Any suggestion and pointers will be much appreciated!
> Thanks in advance,
>
> Emmanuel
>
>
> --
> Emmanuel Cartier
> Enseignant-Chercheur en Linguistique Informatique
> LIPN CNRS UMR 7030 - équipe RCLN
> http://lipn.univ-paris13.fr/fr/rcln
> Université Paris 13 Sorbonne Paris Cité
> 99 avenue Jean-Baptiste Clement
> 93430 Villetaneuse
> tél. : (+33) 06 46 79 12 86
> email : emmanuel.cart...@univ-paris13.fr
>
>


-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Nested document query with wrong numFound value

2015-12-11 Thread Yago Riveiro
Hi,

I'm playing with the nested documents feature, and after running this query:

http://localhost:8983/solr/ecommerce-15/query?q=id:3181426982318142698228*

The documents have the IDs:

- Parent :  3181426982318142698228
- Child_1 : 31814269823181426982280
- Child_2 : 31814269823181426982281


I have this return:

{
responseHeader: {
status: 0,
QTime: 3,
params: {
q: "id:3181426982318142698228*"
}
},
response: {
numFound: 3,
start: 0,
maxScore: 1,
docs: [{
id: "31814269823181426982280",
child_type: "ecommerce_product",
qty: 1,
product_price: 49.99
}, {
id: "31814269823181426982281",
child_type: "ecommerce_product",
qty: 1,
product_price: 139.9
}]
}
}

As you can see, numFound is 3 even though I have only 2 child documents;
isn't it supposed to ignore the parent document?



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Nested-document-query-with-wrong-numFound-value-tp4244851.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: capacity of storage a single core

2015-12-11 Thread Alessandro Benedetti
Susheel, this is a very good idea.
I am a little bit busy at the moment, so I doubt I can contribute a blog
post, but it would be great if anyone has time.
If not, I will add it to my backlog and sooner or later I will do it :)

Furthermore, Erick's latest observations are pure gold, and I agree
completely.
I have only one question related to this:

1>  "the entire index". What's this? The size on disk?
> 90% of that size on disk may be stored data which
> uses very little memory, which is limited by the
> documentCache in Solr. OTOH, only 10% of the on-disk
> size might be stored data.


If I am correct, the documentCache in Solr is a map that relates the Lucene
document ordinal to the stored fields for that document.
We have control over that and we can assign our preferred values.
First question:
1) Does this cache use JVM memory? I assume yes.
So we need to take care of our JVM memory if we want to keep big chunks of
the stored index in memory.

2) Are the memory-mapped index segments only the segments used for searching?
Doesn't the Lucene directory memory-map the stored segments as well?
This was my understanding, but maybe I am wrong.
In that case we first memory-map the stored segments and then potentially
store them in the Solr cache as well, right?

Cheers


On 10 December 2015 at 19:43, Susheel Kumar  wrote:

> I like the details here, Erick, on how you broke memory into different parts. I
> feel that if we can combine a lot of this knowledge from your various posts, the
> above sizing blog, the Solr wiki pages, and Uwe's article on MMap/heap, and
> consolidate and present it in a single place, it may help a lot of new folks and
> folks struggling with memory/heap/sizing questions etc.
>
> Thanks,
> Susheel
>
> On Wed, Dec 9, 2015 at 12:40 PM, Erick Erickson 
> wrote:
>
> > I object to the question. And the advice. And... ;).
> >
> > Practically, IMO guidance that "the entire index should
> > fit into memory" is misleading, especially for newbies.
> > Let's break it down:
> >
> > 1>  "the entire index". What's this? The size on disk?
> > 90% of that size on disk may be stored data which
> > uses very little memory, which is limited by the
> > documentCache in Solr. OTOH, only 10% of the on-disk
> > size might be stored data.
> >
> > 2> "fit into memory". What memory? Certainly not
> > the JVM as much of the Lucene-level data is in
> > MMapDirectory which uses the OS memory. So
> > this _probably_ means JVM + OS memory, and OS
> > memory is shared amongst other processes as well.
> >
> > 3> Solr and Lucene build in-memory structures that
> > aren't reflected in the index size on disk. I've seen
> > filterCaches for instance that have been (mis) configured
> > that could grow to 100s of G. This is totally not reflected in
> > the "index size".
> >
> > 4> Try faceting on a text field with lots of unique
> > values. Bad Practice, but you'll see just how quickly
> > the _query_ can change the memory requirements.
> >
> > 5> Sure, with modern hardware we can create huge JVM
> > heaps... that hit GC pauses that'll drive performance
> > down, sometimes radically.
> >
> > I've seen 350M docs, 200-300 fields (aggregate) fit into 12G
> > of JVM. I've seen 25M docs (really big ones) strain 48G
> > JVM heaps.
> >
> > Jack's approach is what I use; pick a number and test with it.
> > Here's an approach:
> >
> >
> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> >
> > Best,
> > Erick
> >
> > On Wed, Dec 9, 2015 at 8:54 AM, Susheel Kumar 
> > wrote:
> > > Thanks, Jack for quick reply.  With Replica / Shard I mean to say on a
> > > given machine there may be two/more replicas and all of them may not
> fit
> > > into memory.
> > >
> > > On Wed, Dec 9, 2015 at 11:00 AM, Jack Krupansky <
> > jack.krupan...@gmail.com>
> > > wrote:
> > >
> > >> Yes, there are nuances to any general rule. It's just a starting
> point,
> > and
> > >> your own testing will confirm specific details for your specific app
> and
> > >> data. For example, maybe you don't query all fields commonly, so each
> > >> field-specific index may not require memory or not require it so
> > commonly.
> > >> And, yes, each app has its own latency requirements. The purpose of a
> > >> general rule is to generally avoid unhappiness, but if you have an
> > appetite
> > >> and tolerance for unhappiness, then go for it.
> > >>
> > >> Replica vs. shard? They're basically the same - a replica is a copy
> of a
> > >> shard.
> > >>
> > >> -- Jack Krupansky
> > >>
> > >> On Wed, Dec 9, 2015 at 10:36 AM, Susheel Kumar  >
> > >> wrote:
> > >>
> > >> > Hi Jack,
> > >> >
> > >> > Just to add, OS Disk Cache will still make query performant even
> > though
> > >> > entire index can't be loaded into memory. How much more latency
> > compare
> > >> to
> > >> > if index gets completely loaded into memory may vary depending to
> > index
> > >> > size etc.  I am trying to clarify this here because lot of folks
> takes
> > >> 

Re: NRT vs Redis for Dynamic Data in SOLR (like counts, viewcounts, etc) -

2015-12-11 Thread Andrea Gazzarini
Hi Vikram,
sounds like you're using those "dynamic" fields only for visualization
(i.e. you don't need to have them "indexed")...this is the big point that
could make the difference.

If the answer is yes, about the first option (NOTE: I don't know Redis and
that plugin), a custom SearchComponent would be very easy to implement. It
would contribute to search results in a dedicated section of the response
(see for example the highlight or the facet component)
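
A bare-bones sketch of such a component (class name, response section name and
the 5.x-era method set are assumptions; the external lookup is left as a comment
since it depends on the client you choose):

import java.io.IOException;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class ExternalCountsComponent extends SearchComponent {

  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    // nothing to prepare in this sketch
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    NamedList<Object> counts = new NamedList<>();
    // For each document id in rb.getResults().docList, look up the
    // like/view counters in the external store (e.g. Redis) and add them:
    // counts.add(docId, counterValue);
    rb.rsp.add("external_counts", counts); // dedicated section in the response
  }

  @Override
  public String getDescription() {
    return "Adds externally stored counters to the response";
  }

  @Override
  public String getSource() {
    return null; // part of the SolrInfoMBean contract in 5.x-era Solr
  }
}

The component would then be registered in solrconfig.xml with a searchComponent
entry and appended to the last-components list of the relevant request handler.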

I don't have a concrete experience about the second option, but still
assuming that

- you need those fields stored, not indexed
- the response page size is not huge (this is considered a bad practice in
Solr)

I would avoid to bomb Solr with repeated updates

Best,
Andrea



2015-12-11 11:48 GMT+01:00 Vikram Parmar :

> We are creating a web application which would contain posts (something like
> FB or say Youtube). For the stable part of the data (i.e.the facets, search
> results & its content), we plan to use SOLR.
>
> What should we use for the unstable part of the data (i.e. dynamic and
> volatile content such as Like counts, Comments counts, Viewcounts)?
>
>
> Option 1) Redis
>
> What about storing the "dynamic" data in a different data store (like
> Redis)? Thus, everytime the counts get refreshed, I do not have to reindex
> the data into SOLR at all. Thus SOLR indexing is only triggered when new
> posts are added to the site, and never on any activity on the posts by the
> users.
>
> Side-note :-
> I also looked at the SOLR-Redis plugin at
> https://github.com/sematext/solr-redis
>
> The plugin looks good, but not sure if the plugin can be used to fetch the
> data stored in Redis as part of the solr result set, i.e. in docs. The
> description looks more like the Redis data can be used in the function
> queries for boosting, sorting, etc. Anyone has experience with this?
>
>
> Option 2) SOLR NRT with Soft Commits
>
> We would depend on the in-built NRT features. Let's say we do soft-commits
> every second and hard-commits every 10 seconds. Suppose huge amount of
> dynamic data is created on the site across hundreds of posts, e.g. 10
> likes across 1 posts. Thus, this would mean soft-commiting on 1
> rows every second. And then hard-commiting those many rows every 10
> seconds. Isn't this overkill?
>
>
> Which option is preferred? How would you compare both options in terms of
> scalibility, maintenance, feasibility, best-practices, etc? Any real-life
> experiences or links to articles?
>
> Many thanks!
>
>
> p.s. EFF (external file fields) is not an option, as I read that the data
> in that file can only be used in function queries and cannot be returned as
> part of a document.
>


Schema API, change the defaultoperator

2015-12-11 Thread Yago Riveiro
Hi,

How can I change the defaultoperator parameter through the schema API?

Thanks.



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Schema-API-change-the-defaultoperator-tp4244857.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: capacity of storage a single core

2015-12-11 Thread Toke Eskildsen
On Thu, 2015-12-10 at 14:43 -0500, Susheel Kumar wrote:
> Like the details here Eric how you broke memory into different parts. I
> feel if we can combine lot of this knowledge from your various posts, above
> sizing blog, Solr wiki pages, Uwe article on MMap/heap,  consolidate and
> present in at single place which may help lot of new folks/folks struggling
> with memory/heap/sizing issues questions etc.

To demonstrate part of the problem:

Say we have an index with documents representing employees, with three
defined fields: name, company and the dynamic *_custom. Each company
uses 3 dynamic fields with custom names as they see fit.

Let's say we keep track of 1K companies, each with 10K employees.

The full index is now

  total documents: 10M (1K*10K)
  name: 10M unique values (or less due to names not being unique)
  company: Exactly 1K unique values
  *_custom: 3K unique fields, each with 1K unique values

We do our math-math-thing and arrive at an approximate index size of 5GB
(just an extremely loose guess here). Heap is nothing to speak of for
basic search on this, so let's set that to 1GB. We estimate that a
machine with 8GB of physical RAM is more than fine for this - halving
that to 4GB would probably also work well.

Say we want to group on company. The "company" field is UnInverted, so
there is an array of 10M pointers to 10K values. That is about 50MB
overhead. No change needed to heap allocation.

Say we want to filter on company and cache the filters. Each filter
takes ~1MB, so that is 1000*1MB = 1GB of heap. Okay, so we bump the heap
from 1 to 2GB. The 4GB machine might be a bit small here, depending on
storage, but the 8GB one will work just fine.

Say each company wants to facet on their custom fields. There are 3K of
those fields. Each one requiring ~50MB (like the company grouping) for
UnInversion. That is 150GB of heap. Yes, 150GB.
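
Spelling out the arithmetic behind those figures:

  one cached filter     ~ 10M docs / 8 bits per byte = ~1.25MB (the ~1MB used above)
  1,000 cached filters  ~ 1,000 * 1MB = ~1GB of heap
  3,000 custom facet fields * ~50MB per UnInverted field = ~150GB of heap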


What about DocValues? Well, if we just use standard String faceting, we
need a map from segment-ordinals to global-ordinals for each facet field
or in other words a map with 1K entries for each facet. Such a map can
be represented with < 20 bits/entry (finely packed), so that is ~3KB of
heap for each field or 9GB (3K*3KB) for the full range of custom fields.
Still way too much for our 8GB machine.

Say we change the custom fields to fixed fields named "custom1",
"custom2" & "custom3" and do some name-mapping in the front-end so it
just looks as if the companies chooses the names themselves.
Suddenly there are only 3 larger fields to facet on instead of 3K small
ones. That is 3*50MB of heap required, even without using DocValues.
And we're back to our 4GB machine.

But wait, the index is used quite a lot! 200 concurrent requests. Each
facet request requires a counter and for the three custom fields there
are 1M unique values (1000 for each company). Those counters take up
4 bytes * 1M = 4MB each, and for 200 concurrent requests that is 800MB +
overhead. Better bump the heap with 1GB extra.

Except that someone turned on threaded faceting, so we do that for the 3
custom fields at the same time, so we better bump with 2GB more. Whoops,
even the 8GB machine is too small.



Not sure I follow all of the above myself, but the moral should be
clear: seemingly innocuous changes to requirements or setup can easily
result in huge changes to requirements. If I were to describe such
things enough for another person (without previous in-depth knowledge in
this field) to make educated guesses, it would be a massive amount of
text with a lot of hard to grasp parts. I have tried twice and scrapped
it both times as it quickly became apparent that it would be much too
unwieldy.

Trying to not be a wet blanket, this could also be because I have my
head too far down these things. Skipping some details and making some
clearly stated choices up front could work. There is no doubt that there
are a lot of people who ask for estimates, and "we cannot say anything"
is quite a raw deal.


- Toke Eskildsen, State and University Library, Denmark




how to secure standalone solr

2015-12-11 Thread Mugeesh Husain
Hello,

Can anyone tell me how to secure standalone Solr?

1) Is using the Kerberos plugin a good practice, or is there a better alternative?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-secure-standalone-solr-tp4244866.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Nested document query with wrong numFound value

2015-12-11 Thread Mikhail Khludnev
What do you see with debugQuery=true?

On Fri, Dec 11, 2015 at 2:02 PM, Yago Riveiro 
wrote:

> Hi,
>
> I'm playing with the nested documents feature and after run this query:
>
> http://localhost:8983/solr/ecommerce-15/query?q=id:3181426982318142698228*
>
> The documents has the IDs:
>
> - Parent :  3181426982318142698228
> - Child_1 : 31814269823181426982280
> - Child_2 : 31814269823181426982281
>
>
> I have this return:
>
> {
> responseHeader: {
> status: 0,
> QTime: 3,
> params: {
> q: "id:3181426982318142698228*"
> }
> },
> response: {
> numFound: 3,
> start: 0,
> maxScore: 1,
> docs: [{
> id: "31814269823181426982280",
> child_type: "ecommerce_product",
> qty: 1,
> product_price: 49.99
> }, {
> id: "31814269823181426982281",
> child_type: "ecommerce_product",
> qty: 1,
> product_price: 139.9
> }]
> }
> }
>
> As you can see the numFound is 3, and I have only 2 child documents, it's
> not supposed to ignore the parent document?
>
>
>
> -
> Best regards
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Nested-document-query-with-wrong-numFound-value-tp4244851.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: NRT vs Redis for Dynamic Data in SOLR (like counts, viewcounts, etc) -

2015-12-11 Thread Jack Krupansky
You can consider DataStax Enterprise (DSE) which deeply integrates
Solr (not just a plugin) with the Cassandra database (DSE Search):
http://www.datastax.com/products/datastax-enterprise-search

Solr's Join queries are supported across tables in DSE Search, so you could
keep dynamic data in a separate table (use the same partition key to assure
that the join will be more efficient by being on the same node.)


-- Jack Krupansky

On Fri, Dec 11, 2015 at 6:21 AM, Andrea Gazzarini 
wrote:

> Hi Vikram,
> sounds like you're using those "dynamic" fields only for visualization
> (i.e. you don't need to have them "indexed")...this is the big point that
> could make the difference.
>
> If the answer is yes, about the first option (NOTE: I don't know Redis and
> that plugin), a custom SearchComponent would be very easy to implement. It
> would contribute to search results in a dedicated section of the response
> (see for example the highlight or the facet component)
>
> I don't have a concrete experience about the second option, but still
> assuming that
>
> - you need those fields stored, not indexed
> - the response page size is not huge (this is considered a bad practice in
> Solr)
>
> I would avoid to bomb Solr with repeated updates
>
> Best,
> Andrea
>
>
>
> 2015-12-11 11:48 GMT+01:00 Vikram Parmar :
>
> > We are creating a web application which would contain posts (something
> like
> > FB or say Youtube). For the stable part of the data (i.e.the facets,
> search
> > results & its content), we plan to use SOLR.
> >
> > What should we use for the unstable part of the data (i.e. dynamic and
> > volatile content such as Like counts, Comments counts, Viewcounts)?
> >
> >
> > Option 1) Redis
> >
> > What about storing the "dynamic" data in a different data store (like
> > Redis)? Thus, everytime the counts get refreshed, I do not have to
> reindex
> > the data into SOLR at all. Thus SOLR indexing is only triggered when new
> > posts are added to the site, and never on any activity on the posts by
> the
> > users.
> >
> > Side-note :-
> > I also looked at the SOLR-Redis plugin at
> > https://github.com/sematext/solr-redis
> >
> > The plugin looks good, but not sure if the plugin can be used to fetch
> the
> > data stored in Redis as part of the solr result set, i.e. in docs. The
> > description looks more like the Redis data can be used in the function
> > queries for boosting, sorting, etc. Anyone has experience with this?
> >
> >
> > Option 2) SOLR NRT with Soft Commits
> >
> > We would depend on the in-built NRT features. Let's say we do
> soft-commits
> > every second and hard-commits every 10 seconds. Suppose huge amount of
> > dynamic data is created on the site across hundreds of posts, e.g. 10
> > likes across 1 posts. Thus, this would mean soft-commiting on 1
> > rows every second. And then hard-commiting those many rows every 10
> > seconds. Isn't this overkill?
> >
> >
> > Which option is preferred? How would you compare both options in terms of
> > scalibility, maintenance, feasibility, best-practices, etc? Any real-life
> > experiences or links to articles?
> >
> > Many thanks!
> >
> >
> > p.s. EFF (external file fields) is not an option, as I read that the data
> > in that file can only be used in function queries and cannot be returned
> as
> > part of a document.
> >
>


Re: Nested document query with wrong numFound value

2015-12-11 Thread Yago Riveiro
This:





{
  responseHeader: {
    status: 0,
    QTime: 10,
    params: {
      q: "id:3181426982318142698228*",
      debugQuery: "true"
    }
  },
  response: {
    numFound: 3,
    start: 0,
    maxScore: 1,
    docs: [{
      id: "31814269823181426982280",
      child_type: "ecommerce_product",
      qty: 1,
      product_price: 49.99
    }, {
      id: "31814269823181426982281",
      child_type: "ecommerce_product",
      qty: 1,
      product_price: 139.9
    }]
  },
  debug: {
    track: {
      rid: "node-01-ecommerce-15_shard1_replica2-1449842438070-0",
      EXECUTE_QUERY: {
        http://node-17:8983/solr/ecommerce-15_shard2_replica1/: {
          QTime: "0",
          ElapsedTime: "2",
          RequestPurpose: "GET_TOP_IDS",
          NumFound: "0",
          Response: "{responseHeader={status=0,QTime=0,params={df=_text_,distrib=false,debug=[false, timing, track],qt=/query,fl=[id, score],shards.purpose=4,start=0,fsv=true,shard.url=http://node-17:8983/solr/ecommerce-15_shard2_replica1/,rid=node-01-ecommerce-15_shard1_replica2-1449842438070-0,rows=10,version=2,q=id:3181426982318142698228*,requestPurpose=GET_TOP_IDS,NOW=1449842438070,isShard=true,wt=javabin,debugQuery=false}},response={numFound=0,start=0,maxScore=0.0,docs=[]},sort_values={},debug={timing={time=0.0,prepare={time=0.0,query={time=0.0},facet={time=0.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},debug={time=0.0}},process={time=0.0,query={time=0.0},facet={time=0.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},debug={time=0.0}";
        },
        http://node-01:8983/solr/ecommerce-15_shard1_replica2/: {
          QTime: "0",
          ElapsedTime: "2",
          RequestPurpose: "GET_TOP_IDS",
          NumFound: "11",
          Response: "{responseHeader={status=0,QTime=0,params={df=_text_,distrib=false,debug=[false, timing, track],qt=/query,fl=[id, score],shards.purpose=4,start=0,fsv=true,shard.url=http://node-01:8983/solr/ecommerce-15_shard1_replica2/,rid=node-01-ecommerce-15_shard1_replica2-1449842438070-0,rows=10,version=2,q=id:3181426982318142698228*,requestPurpose=GET_TOP_IDS,NOW=1449842438070,isShard=true,wt=javabin,debugQuery=false}},response={numFound=11,start=0,maxScore=1.0,docs=[SolrDocument{id=31814269823181426982280, score=1.0}, SolrDocument{id=31814269823181426982280, score=1.0}, SolrDocument{id=31814269823181426982280, score=1.0}, SolrDocument{id=31814269823181426982280, score=1.0}, SolrDocument{id=31814269823181426982280, score=1.0}, SolrDocument{id=31814269823181426982281, score=1.0}, SolrDocument{id=31814269823181426982281, score=1.0}, SolrDocument{id=31814269823181426982281, score=1.0}, SolrDocument{id=31814269823181426982281, score=1.0}, SolrDocument{id=31814269823181426982281, score=1.0}]},sort_values={},debug={timing={time=0.0,prepare={time=0.0,query={time=0.0},facet={time=0.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},debug={time=0.0}},process={time=0.0,query={time=0.0},facet={time=0.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},debug={time=0.0}"
        },
        http://node-14:8983/solr/ecommerce-15_shard4_replica1/: {
          QTime: "0",
          ElapsedTime: "2",
          RequestPurpose: "GET_TOP_IDS",
          NumFound: "0",
          Response: "{responseHeader={status=0,QTime=0,params={df=_text_,distrib=false,debug=[false, timing, track],qt=/query,fl=[id, score],shards.purpose=4,start=0,fsv=true,shard.url=http://node-14:8983/solr/ecommerce-15_shard4_replica1/,rid=node-01-ecommerce-15_shard1_replica2-1449842438070-0,rows=10,version=2,q=id:3181426982318142698228*,requestPurpose=GET_TOP_IDS,NOW=1449842438070,isShard=true,wt=javabin,debugQuery=false}},response={numFound=0,start=0,maxScore=0.0,docs=[]},sort_values={},debug={timing={time=0.0,prepare={time=0.0,query={time=0.0},facet={time=0.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},debug={time=0.0}},process={time=0.0,query={time=0.0},facet={time=0.0},facet_module={time=0.0},mlt={time=0.0},highlight

Re: Schema API, change the defaultoperator

2015-12-11 Thread Shawn Heisey
On 12/11/2015 4:23 AM, Yago Riveiro wrote:
> How can I change the defaultoperator parameter through the schema API?

The default operator and default field settings in the schema have been
deprecated for quite some time, so I would imagine that you can't change
them with the schema API -- they shouldn't be there, so there's no need
to support the ability to change them.

Look into the q.op and df parameters, which can be defined in the
request handler definition (solrconfig.xml) or passed in with the query.

https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser#TheStandardQueryParser-StandardQueryParserParameters
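
For example, a per-handler default could look roughly like this in
solrconfig.xml (the handler name and the values are placeholders):

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="q.op">AND</str>
    <str name="df">text</str>
  </lst>
</requestHandler>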

Thanks,
Shawn



Re: Authorization API versus zkcli.sh

2015-12-11 Thread Shalin Shekhar Mangar
Shouldn't this be the znode version? Why put a version in
security.json? Or is the idea that the user will upload security.json
only once and then use the security APIs for all further changes?

On Fri, Dec 11, 2015 at 11:51 AM, Noble Paul  wrote:
> Please do not put any number. That number is used by the system to
> optimize loading/reloading plugins. It is not relevant for the user.
>
> On Thu, Dec 10, 2015 at 11:52 PM, Oakley, Craig (NIH/NLM/NCBI) [C]
>  wrote:
>> Looking at security.json in Zookeeper, I notice that both the authentication 
>> section and the authorization section ends with something like
>>
>> "":{"v":47}},
>>
>> Am I correct in thinking that this 47 (in this case) is a version number, 
>> and that ANY number could be used in the file uploaded to security.json 
>> using "zkcli.sh -putfile"?
>>
>> Or is this some sort of checksum whose value must match some unclear 
>> criteria?
>>
>>
>> -Original Message-
>> From: Anshum Gupta [mailto:ans...@anshumgupta.net]
>> Sent: Sunday, December 06, 2015 8:42 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Authorization API versus zkcli.sh
>>
>> There's nothing cluster specific in security.json if you're using those
>> plugins. It is totally safe to just take the file from one cluster and
>> upload it for another for things to work.
>>
>> On Sat, Dec 5, 2015 at 3:38 AM, Oakley, Craig (NIH/NLM/NCBI) [C] <
>> craig.oak...@nih.gov> wrote:
>>
>>> Looking through
>>> cwiki.apache.org/confluence/display/solr/Authentication+and+Authorization+Plugins
>>> one notices that security.json is initially created by zkcli.sh, and then
>>> modified by means of the Authentication API and the Authorization API. By
>>> and large, this sounds like a good way to accomplish such tasks, assuming
>>> that these APIs do some error checking to prevent corruption of
>>> security.json
>>>
>>> I was wondering about cases where one is cloning an existing Solr
>>> instance, such as when creating an instance in Amazon Cloud. If one has a
>>> security.json that has been thoroughly tried and successfully tested on
>>> another Solr instance, is it possible / safe / not-un-recommended to use
>>> zkcli.sh to load the full security.json (as extracted via zkcli.sh from the
>>> Zookeeper of the thoroughly tested existing instance)? Or would the
>>> official verdict be that the only acceptable way to create security.json is
>>> to load a minimal version with zkcli.sh and then to build the remaining
>>> components with the Authentication API and the Authorization API (in a
>>> script, if one wants to automate the process: although such a script would
>>> have to include plain-text passwords)?
>>>
>>> I figured there is no harm in asking.
>>>
>>
>>
>>
>> --
>> Anshum Gupta
>
>
>
> --
> -
> Noble Paul



-- 
Regards,
Shalin Shekhar Mangar.


Re: Authorization API versus zkcli.sh

2015-12-11 Thread Anshum Gupta
yes, that's the assumption. The reason why there's a version there is to
optimize on reloads i.e. Authentication and Authorization plugins are
reloaded only when the version number is changed. e.g.
* Start with Ver 1 for both authentication and authorization
* Make changes to Authentication, the version for this section is updated
to the znode version, while the version for the authorization section is
not changed. This forces the authentication plugin to be reloaded but not
the authorization plugin. Similarly for authorization.

It's a way to optimize the reloads without splitting the definition into 2
znodes, which is also an option.
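
For readers following along, the shape under discussion looks roughly like this
(plugin classes follow the Basic/Rule-Based examples; the credentials value is a
placeholder, not a real hash, and the version numbers are only illustrative):

{
  "authentication": {
    "class": "solr.BasicAuthPlugin",
    "credentials": {"solr": "<base64 sha256(password+salt)> <base64 salt>"},
    "": {"v": 47}
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "user-role": {"solr": "admin"},
    "permissions": [{"name": "security-edit", "role": "admin"}],
    "": {"v": 47}
  }
}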


On Fri, Dec 11, 2015 at 8:06 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Shouldn't this be the znode version? Why put a version in
> security.json? Or is the idea that the user will upload security.json
> only once and then use the security APIs for all further changes?
>
> On Fri, Dec 11, 2015 at 11:51 AM, Noble Paul  wrote:
> > Please do not put any number. That number is used by the system to
> > optimize loading/reloading plugins. It is not relevant for the user.
> >
> > On Thu, Dec 10, 2015 at 11:52 PM, Oakley, Craig (NIH/NLM/NCBI) [C]
> >  wrote:
> >> Looking at security.json in Zookeeper, I notice that both the
> authentication section and the authorization section ends with something
> like
> >>
> >> "":{"v":47}},
> >>
> >> Am I correct in thinking that this 47 (in this case) is a version
> number, and that ANY number could be used in the file uploaded to
> security.json using "zkcli.sh -putfile"?
> >>
> >> Or is this some sort of checksum whose value must match some unclear
> criteria?
> >>
> >>
> >> -Original Message-
> >> From: Anshum Gupta [mailto:ans...@anshumgupta.net]
> >> Sent: Sunday, December 06, 2015 8:42 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Authorization API versus zkcli.sh
> >>
> >> There's nothing cluster specific in security.json if you're using those
> >> plugins. It is totally safe to just take the file from one cluster and
> >> upload it for another for things to work.
> >>
> >> On Sat, Dec 5, 2015 at 3:38 AM, Oakley, Craig (NIH/NLM/NCBI) [C] <
> >> craig.oak...@nih.gov> wrote:
> >>
> >>> Looking through
> >>>
> cwiki.apache.org/confluence/display/solr/Authentication+and+Authorization+Plugins
> >>> one notices that security.json is initially created by zkcli.sh, and
> then
> >>> modified by means of the Authentication API and the Authorization API.
> By
> >>> and large, this sounds like a good way to accomplish such tasks,
> assuming
> >>> that these APIs do some error checking to prevent corruption of
> >>> security.json
> >>>
> >>> I was wondering about cases where one is cloning an existing Solr
> >>> instance, such as when creating an instance in Amazon Cloud. If one
> has a
> >>> security.json that has been thoroughly tried and successfully tested on
> >>> another Solr instance, is it possible / safe / not-un-recommended to
> use
> >>> zkcli.sh to load the full security.json (as extracted via zkcli.sh
> from the
> >>> Zookeeper of the thoroughly tested existing instance)? Or would the
> >>> official verdict be that the only acceptable way to create
> security.json is
> >>> to load a minimal version with zkcli.sh and then to build the remaining
> >>> components with the Authentication API and the Authorization API (in a
> >>> script, if one wants to automate the process: although such a script
> would
> >>> have to include plain-text passwords)?
> >>>
> >>> I figured there is no harm in asking.
> >>>
> >>
> >>
> >>
> >> --
> >> Anshum Gupta
> >
> >
> >
> > --
> > -
> > Noble Paul
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Anshum Gupta


Re: capacity of storage a single core

2015-12-11 Thread Susheel Kumar
Thanks, Alessandro. We can attempt to come up with such a blog and I can
volunteer for bullets/headings to start with. I also agree that we can't
come up with a definitive answer, as mentioned in other places, but we can
at least attempt to consolidate all this knowledge into one place. As of
now I see a few sources which can be referred to for some consolidated
knowledge:

https://wiki.apache.org/solr/SolrPerformanceProblems
http://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
Uwe's Article on MMAP
Erick's and others valuable posts



On Fri, Dec 11, 2015 at 6:20 AM, Alessandro Benedetti  wrote:

> Susheel, this is a very good idea.
> I am a little bit busy this period, so I doubt I can contribute with a blog
> post, but it would be great if anyone has time.
> If not I will add it to my backlog and sooner or later I will do it :)
>
> Furthermore latest observations from Erick are pure gold, and I agree
> completely.
> I have only a question related this :
>
> 1>  "the entire index". What's this? The size on disk?
> > 90% of that size on disk may be stored data which
> > uses very little memory, which is limited by the
> > documentCache in Solr. OTOH, only 10% of the on-disk
> > size might be stored data.
>
>
> If I am correct the documentCache in Solr is a map that relates the Lucene
> document ordinal to the stored fields for that document.
> We have control on that and we can assign our preferred values.
> First question :
> 1) Is this using the JVM memory to store this cache ? I assume yes.
> So we need to take care of our JVM memory if we want to store in memory big
> chunks of the stored index.
>
> 2) MMap index segments are actually only the segments used for searching ?
> Is not the Lucene directory memory mapping the stored segments as well ?
> This was my understanding but maybe I am wrong.
> In the case we first memory map the stored segments and then potentially
> store them on the Solr cache as well, right ?
>
> Cheers
>
>
> On 10 December 2015 at 19:43, Susheel Kumar  wrote:
>
> > Like the details here Eric how you broke memory into different parts. I
> > feel if we can combine lot of this knowledge from your various posts,
> above
> > sizing blog, Solr wiki pages, Uwe article on MMap/heap,  consolidate and
> > present in at single place which may help lot of new folks/folks
> struggling
> > with memory/heap/sizing issues questions etc.
> >
> > Thanks,
> > Susheel
> >
> > On Wed, Dec 9, 2015 at 12:40 PM, Erick Erickson  >
> > wrote:
> >
> > > I object to the question. And the advice. And... ;).
> > >
> > > Practically, IMO guidance that "the entire index should
> > > fit into memory" is misleading, especially for newbies.
> > > Let's break it down:
> > >
> > > 1>  "the entire index". What's this? The size on disk?
> > > 90% of that size on disk may be stored data which
> > > uses very little memory, which is limited by the
> > > documentCache in Solr. OTOH, only 10% of the on-disk
> > > size might be stored data.
> > >
> > > 2> "fit into memory". What memory? Certainly not
> > > the JVM as much of the Lucene-level data is in
> > > MMapDirectory which uses the OS memory. So
> > > this _probably_ means JVM + OS memory, and OS
> > > memory is shared amongst other processes as well.
> > >
> > > 3> Solr and Lucene build in-memory structures that
> > > aren't reflected in the index size on disk. I've seen
> > > filterCaches for instance that have been (mis) configured
> > > that could grow to 100s of G. This is totally not reflected in
> > > the "index size".
> > >
> > > 4> Try faceting on a text field with lots of unique
> > > values. Bad Practice, but you'll see just how quickly
> > > the _query_ can change the memory requirements.
> > >
> > > 5> Sure, with modern hardware we can create huge JVM
> > > heaps... that hit GC pauses that'll drive performance
> > > down, sometimes radically.
> > >
> > > I've seen 350M docs, 200-300 fields (aggregate) fit into 12G
> > > of JVM. I've seen 25M docs (really big ones) strain 48G
> > > JVM heaps.
> > >
> > > Jack's approach is what I use; pick a number and test with it.
> > > Here's an approach:
> > >
> > >
> >
> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> > >
> > > Best,
> > > Erick
> > >
> > > On Wed, Dec 9, 2015 at 8:54 AM, Susheel Kumar 
> > > wrote:
> > > > Thanks, Jack for quick reply.  With Replica / Shard I mean to say on
> a
> > > > given machine there may be two/more replicas and all of them may not
> > fit
> > > > into memory.
> > > >
> > > > On Wed, Dec 9, 2015 at 11:00 AM, Jack Krupansky <
> > > jack.krupan...@gmail.com>
> > > > wrote:
> > > >
> > > >> Yes, there are nuances to any general rule. It's just a starting
> > point,
> > > and
> > > >> your own testing will confirm specific details for your specific app
> > and
> > > >> data. For example, maybe you don't query all fields commonly, so
> each
> > 

Re: Schema API, change the defaultoperator

2015-12-11 Thread Yago Riveiro
I uploaded a schema.xml manually with the defaultoperator configuration and it's
working.




My problem is that my legacy application is huge and I can't go to all places 
to add the q.op parameter.




The solrconfig.xml route could be an option. Does the q.op param defined in
request handlers also work with POST HTTP calls?

On Fri, Dec 11, 2015 at 2:26 PM, Shawn Heisey  wrote:

> On 12/11/2015 4:23 AM, Yago Riveiro wrote:
>> How can I change the defaultoperator parameter through the schema API?
> The default operator and default field settings in the schema have been
> deprecated for quite some time, so I would imagine that you can't change
> them with the schema API -- they shouldn't be there, so there's no need
> to support the ability to change them.
> Look into the q.op and df parameters, which can be defined in the
> request handler definition (solrconfig.xml) or passed in with the query.
> https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser#TheStandardQueryParser-StandardQueryParserParameters
> Thanks,
> Shawn

RE: Use multiple instances simultaneously

2015-12-11 Thread Gian Maria Ricci - aka Alkampfer
Thanks for all of your clarifications. I know that SolrCloud is a better
configuration than any other, but it also comes with considerably higher
complexity. I just want to share the pain points I noticed while gathering
all the information I could find on SolrCloud.

1) The ZooKeeper documentation says that, for the best experience, you should
have a dedicated filesystem for persistence and it should never swap to disk.
I have not found any guidelines on how to size a ZooKeeper machine: how much
RAM, how much disk? Can I install ZooKeeper on the same machines where Solr
resides? (I suspect not, because the Solr machines are under stress, and if
ZooKeeper starts swapping it can lead to problems.)

2) What about upgrades? If I need to upgrade my SolrCloud installation and the
new version requires a new version of ZooKeeper, which is the path to take? Do
I first upgrade ZooKeeper, or upgrade Solr on the existing machines, or...?
Maybe I did not search well, but I did not find a comprehensive guideline
telling me how to upgrade a SolrCloud installation in the various situations.

3) What are the best practices for running DIH in SolrCloud? I think I can
round-robin the DIH import triggers across the servers composing the cloud
infrastructure, or is there a better way to go? (I probably need to trigger a
DIH import every 5-10 minutes, but the number of new records is really small.)

4) Since I believe it is not best practice to install ZooKeeper on the same
machines as Solr (as a separate process, not the built-in ZooKeeper), I need at
least three more machines to maintain / monitor / upgrade, and I also need to
monitor ZooKeeper, a new appliance that has to be mastered by IT
infrastructure.

Is there any guideline on how to automate promoting a slave to master in a
classic master/slave setup? I did not find anything official, and
auto-promoting a slave to master could solve my problem.

--
Gian Maria Ricci
Cell: +39 320 0136949


-Original Message-
From: Emir Arnautovic [mailto:emir.arnauto...@sematext.com] 
Sent: martedì 8 dicembre 2015 11:25
To: solr-user@lucene.apache.org
Subject: Re: Use multiple istance simultaneously

Can you tolerate having indices in different state or you plan to keep them in sync with
controlled commits. DIH-ing content from source when new machine is needed
will probably be slow and I am afraid that you will end up simulating
master-slave model (copying state from one of healthy nodes and DIH-ing
diff). I would recommend using SolrCloud with single shard and let Solr do
the hard work.
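
For reference, creating such a single-shard, fully replicated collection is a
one-liner; the collection name and replication factor here are only examples:

bin/solr create -c mycollection -shards 1 -replicationFactor 3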

Regards,
Emir

On 04.12.2015 14:37, Gian Maria Ricci - aka Alkampfer wrote:
> Many thanks for your response.
>
> I worked with Solr until early version 4.0, then switched to 
> ElasticSearch for a variety of reasons. I've used replication in the 
> past with SolR, but with Elasticsearch basically I had no problem 
> because it works similar to SolrCloud by default and with almost zero
configuration.
>
> Now I've a customer that want to use Solr, and he want the simplest 
> possible stuff to maintain in production. Since most of the work will 
> be done by Data Import Handler, having multiple parallel and 
> independent machines is easy to
> maintain. If one machine fails, it is enough to configure another 
> machine, configure core and restart DIH.
>
> I'd like to know if other people went through this path in the past.
>
> --
> Gian Maria Ricci
> Cell: +39 320 0136949
>  
>
> -Original Message-
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: giovedì 3 dicembre 2015 10:15
> To: solr-user@lucene.apache.org
> Subject: Re: Use multiple istance simultaneously
>
> On 12/3/2015 1:25 AM, Gian Maria Ricci - aka Alkampfer wrote:
>> In such a scenario could it be feasible to simply configure 2 or 3 
>> identical instance of Solr and configure the application that 
>> transfer data to solr to all the instances simultaneously (the 
>> approach will be a DIH incremental for some core and an external 
>> application that push data continuously for other cores)? Which could 
>> be the drawback of using this approach?
> When I first set up Solr, I used replication.  Then version 3.1.0 was 
> released, including a non-backward-compatible upgrade to javabin, and it was
> not possible to replicate between 1.x and 3.x.
>
> This incompatibility meant that it would not be possible to do a 
> gradual upgrade to 3.x, where the slaves are upgraded first and then the
master.
>
> To get around the problem, I basically did exactly what you've described.
> I turned off replication and configured a second copy of my build 
> program to update what used to be slave servers.
>
> Later, when I moved to a SolrJ program for index maintenance, I made 
> one copy of the maintenance program capable of updating multiple 
> copies of the index in parallel.
>
> I have stuck with this architecture through 4.x and moving into 5.x, 
> even though I could go back to replication or switch to SolrCloud.
> Having completely indepen
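
The dual-write approach Shawn describes boils down to something like this SolrJ
sketch (instance URLs and core name are hypothetical; SolrJ 5.x API):

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class DualWriter {
  public static void main(String[] args) throws Exception {
    // Two independent, identically configured Solr instances
    try (HttpSolrClient solrA = new HttpSolrClient("http://solr-a:8983/solr/mycore");
         HttpSolrClient solrB = new HttpSolrClient("http://solr-b:8983/solr/mycore")) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "42");
      doc.addField("title", "example document");
      // The indexer sends every document to both copies; there is no
      // replication between them, so each instance stays self-contained.
      solrA.add(doc);
      solrB.add(doc);
    }
  }
}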

Block Join query

2015-12-11 Thread Novin

Hi Guys,

I'm trying the block join query. I tried +{!parent
which="doctype:200"}flow:624 and it worked fine. But when I tried +{!parent
which="doctype:200"}flow:[624 TO 700]


I got the error below:

org.apache.solr.search.SyntaxError: Cannot parse 'flow_l:[624': 
Encountered \"\" at line 1, column 11.\nWas expecting one of:\n
\"TO\" ...\n ...\n  ...\n


I'm also wondering: is it possible to do a range query inside a block join query?

Thanks,
Novin
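
A side note on the syntax above: when the nested clause contains whitespace (as a
range query does), one commonly used form is to pass the child query through the
v local parameter so the space does not terminate the clause, e.g.

q={!parent which="doctype:200" v='flow:[624 TO 700]'}

or, with parameter dereferencing, q={!parent which="doctype:200" v=$childq}&childq=flow:[624 TO 700].
This is offered as a sketch of the local-params syntax, not as a confirmed fix
for the exact error above.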






Re: Solr 6 Distributed Join

2015-12-11 Thread Dennis Gove
Akiel,

Without seeing your full URL I assume that you're missing the
stream=innerJoin(...) part of it. A full sample URL would look like this:
http://localhost:8983/solr/careers/stream?stream=innerJoin(search(careers,
fl="personId,companyId,title", q=companyId:*, sort="companyId
asc",zkHost="localhost:2181",qt="/export"),search(companies,
fl="id,companyName", q=*:*, sort="id
asc",zkHost="localhost:2181",qt="/export"),on="companyId=id")

This example will return a join of career records with the company name for
all career records with a non-null companyId.

And the pieces have the following meaning:
http://localhost:8983/solr/careers/stream?  - you have a collection called
careers available on localhost:8983 and you're hitting its stream handler
?stream=  - you are passing the stream parameter to the stream handler
zkHost="localhost:2181"  - there is a zk instance running on localhost:2181
where solr can get clusterstate information. Note, that since you're
sending the request to the careers collection this param is not required in
the search(careers) part but is required in the search(companies)
part. For simplicity I usually just provide it for all.
qt="/export"  - tells solr to use the export handler. this assumes all your
fields are in docValues. if you'd rather not use the export handler then
you probably want to provide the rows=# param to tell solr to return a
large # of rows for each underlying search. Without it solr will default
to, I believe, 10 rows.
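
For example, a quick way to send the expression above from the command line,
letting curl handle the URL encoding, is something like this (same collection
names, port and zkHost as in the example; adjust for your setup):

curl --data-urlencode 'stream=innerJoin(
    search(careers, fl="personId,companyId,title", q=companyId:*,
      sort="companyId asc", zkHost="localhost:2181", qt="/export"),
    search(companies, fl="id,companyName", q=*:*,
      sort="id asc", zkHost="localhost:2181", qt="/export"),
    on="companyId=id")' \
  http://localhost:8983/solr/careers/stream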

CCing the user list so others can see this as well.

We're working on additional documentation for Streaming Aggregation and
Expressions. The page can be found at
https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions but
it's missing a lot of things we've added recently.

- Dennis

On Fri, Dec 11, 2015 at 9:51 AM, Akiel Ahmed  wrote:

> Hi,
>
> Sorry, this is out of the blue - I have joined the Solr mailing list, but
> I don't know if that it is the correct place to ask my question. If you are
> not the best person to talk to can you please point me in the right
> direction.
>
> I want to try using the Solr 6 distributed joins but cant find enough
> material on the web to make it work. I have added the stream handler to my
> solrconfig.xml (see below) and when issuing an inner join query (see below)
> I get an error - the local param named stream is missing, so I get a
> NullPointerException. Is there a way to play with the join via the Solr web
> UI, or if not do you have a code snippet via a SolrJ client that performs a
> join?
>
> solrconfig.xml
>
> <requestHandler name="/stream" class="solr.StreamHandler">
>   <lst name="invariants">
>     <str name="wt">json</str>
>     <str name="distrib">false</str>
>   </lst>
> </requestHandler>
>
> query
> innerJoin(
> search(getting_started, _search_field:john),
> search(getting_started, _search_field:friends),
> on="id=_link_from_id")
>
> Cheers
>
> Akiel
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>


RE: Authorization API versus zkcli.sh

2015-12-11 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
So, when one has finished constructing the desired security.json (by means of
the Authentication and Authorization API commands) and has then run
"zkcli.sh -cmd getfile" to fetch this security.json for use as a template: one
should edit the template to remove this "":{"v":85} clause (and the comma which
precedes it), correct?

I notice that the documented minimal security.json which simply creates the 
solr:SolrRocks login:pswd does not have such a clause: so I assume that the 
lack of such a clause is not an error.
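
For illustration, the edit in question would look like this on the tail of a
section (the plugin and role names below are just the documented defaults, and
85 is the example version number from above):

As pulled from Zookeeper:
  "authorization":{
    "class":"solr.RuleBasedAuthorizationPlugin",
    "user-role":{"solr":"admin"},
    "":{"v":85}}

As a reusable template:
  "authorization":{
    "class":"solr.RuleBasedAuthorizationPlugin",
    "user-role":{"solr":"admin"}}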


From: Anshum Gupta [ans...@anshumgupta.net]
Sent: Friday, December 11, 2015 9:48 AM
To: solr-user@lucene.apache.org
Subject: Re: Authorization API versus zkcli.sh

yes, that's the assumption. The reason why there's a version there is to
optimize on reloads i.e. Authentication and Authorization plugins are
reloaded only when the version number is changed. e.g.
* Start with Ver 1 for both authentication and authorization
* Make changes to Authentication, the version for this section is updated
to the znode version, while the version for the authorization section is
not changed. This forces the authentication plugin to be reloaded but not
the authorization plugin. Similarly for authorization.

It's a way to optimize the reloads without splitting the definition into 2
znodes, which is also an option.


On Fri, Dec 11, 2015 at 8:06 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Shouldn't this be the znode version? Why put a version in
> security.json? Or is the idea that the user will upload security.json
> only once and then use the security APIs for all further changes?
>
> On Fri, Dec 11, 2015 at 11:51 AM, Noble Paul  wrote:
> > Please do not put any number. That number is used by the system to
> > optimize loading/reloading plugins. It is not relevant for the user.
> >
> > On Thu, Dec 10, 2015 at 11:52 PM, Oakley, Craig (NIH/NLM/NCBI) [C]
> >  wrote:
> >> Looking at security.json in Zookeeper, I notice that both the
> authentication section and the authorization section ends with something
> like
> >>
> >> "":{"v":47}},
> >>
> >> Am I correct in thinking that this 47 (in this case) is a version
> number, and that ANY number could be used in the file uploaded to
> security.json using "zkcli.sh -putfile"?
> >>
> >> Or is this some sort of checksum whose value must match some unclear
> criteria?
> >>
> >>
> >> -Original Message-
> >> From: Anshum Gupta [mailto:ans...@anshumgupta.net]
> >> Sent: Sunday, December 06, 2015 8:42 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Authorization API versus zkcli.sh
> >>
> >> There's nothing cluster specific in security.json if you're using those
> >> plugins. It is totally safe to just take the file from one cluster and
> >> upload it for another for things to work.
> >>
> >> On Sat, Dec 5, 2015 at 3:38 AM, Oakley, Craig (NIH/NLM/NCBI) [C] <
> >> craig.oak...@nih.gov> wrote:
> >>
> >>> Looking through
> >>>
> cwiki.apache.org/confluence/display/solr/Authentication+and+Authorization+Plugins
> >>> one notices that security.json is initially created by zkcli.sh, and
> then
> >>> modified by means of the Authentication API and the Authorization API.
> By
> >>> and large, this sounds like a good way to accomplish such tasks,
> assuming
> >>> that these APIs do some error checking to prevent corruption of
> >>> security.json
> >>>
> >>> I was wondering about cases where one is cloning an existing Solr
> >>> instance, such as when creating an instance in Amazon Cloud. If one
> has a
> >>> security.json that has been thoroughly tried and successfully tested on
> >>> another Solr instance, is it possible / safe / not-un-recommended to
> use
> >>> zkcli.sh to load the full security.json (as extracted via zkcli.sh
> from the
> >>> Zookeeper of the thoroughly tested existing instance)? Or would the
> >>> official verdict be that the only acceptable way to create
> security.json is
> >>> to load a minimal version with zkcli.sh and then to build the remaining
> >>> components with the Authentication API and the Authorization API (in a
> >>> script, if one wants to automate the process: although such a script
> would
> >>> have to include plain-text passwords)?
> >>>
> >>> I figured there is no harm in asking.
> >>>
> >>
> >>
> >>
> >> --
> >> Anshum Gupta
> >
> >
> >
> > --
> > -
> > Noble Paul
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



--
Anshum Gupta

Re: Schema API, change the defaultoperator

2015-12-11 Thread Shawn Heisey
On 12/11/2015 8:02 AM, Yago Riveiro wrote:
> I uploaded a schema.xml manually with the defaultOperator configuration and 
> it's working.
> 
> My problem is that my legacy application is huge and I can't go to all places 
> to add the q.op parameter.
> 
> The solrconfig.xml option should be an option. Does the q.op param defined in 
> request handlers work with POST http calls?

Anything you put in the handler definition will work with either GET or
POST requests.  Solr doesn't care how the information gets there.
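
For example, something like this in the handler definition in solrconfig.xml
(the handler name and the AND operator here are just for illustration; apply it
to whichever handlers the legacy application actually hits):

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="q.op">AND</str>
  </lst>
</requestHandler>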

Thanks,
Shawn



Re: Joins with SolrCloud

2015-12-11 Thread Dennis Gove
Mugeesh,

You can use Streaming Aggregation to provide various types of
cross-collection joins. This is currently available in trunk and will be a
part of Solr 6.

To follow with your example, let's assume the following setup:
Restaurants: avail on machine1:8983 with 3 shards, zk at zk1:2345
Users: avail on machine2:8983 with 2 shards, zk at zk2:2345
Reviews: avail on machine1:8983 with 10 shards, zk at zk1:2345

You could send a streaming query to solr that would return all reviews for
restaurants in NYC and include the user's hometown

hashJoin(
  innerJoin(
search(users, q="*:*", fl="userId, full_name, hometown", sort="userId
asc", zkHost="zk2:2345", qt="/export"),
search(reviews, q="*:*", fl="userId, review, score", sort="userId asc",
zkHost="zk1:2345", qt="/export"),
on="userId"
  ),
  hashed=search(restaurants, q="city:nyc", fl="restaurantId,
restaurantName", sort="restaurantId asc", zkHost="zk1:2345", qt="/export"),
  on="restaurantId"
)

Note that the # of shards doesn't matter and doesn't need to be considered
as a part of your query. Were you to send this off to a url for result,
it'd look like this

http://machine1:8983/solr/users/stream?stream=[the expression above]
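
In practice the expression has to be URL-encoded; a sketch with curl, which
takes care of the encoding (hosts and zkHost values as in the example above):

curl --data-urlencode 'stream=hashJoin(
    innerJoin(
      search(users, q="*:*", fl="userId, full_name, hometown",
        sort="userId asc", zkHost="zk2:2345", qt="/export"),
      search(reviews, q="*:*", fl="userId, review, score",
        sort="userId asc", zkHost="zk1:2345", qt="/export"),
      on="userId"),
    hashed=search(restaurants, q="city:nyc", fl="restaurantId, restaurantName",
      sort="restaurantId asc", zkHost="zk1:2345", qt="/export"),
    on="restaurantId")' \
  http://machine1:8983/solr/users/stream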

Additional information about Streaming API, Streaming Aggregation, and
Streaming Expressions can be found at
https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions,
though this is currently incomplete as a lot of the new features have yet
to be added to the documentation.

For those interested, joins were added under tickets
https://issues.apache.org/jira/browse/SOLR-7584 and
https://issues.apache.org/jira/browse/SOLR-8188.

- Dennis


On Mon, Dec 7, 2015 at 7:42 AM, Mugeesh Husain  wrote:

> I have create 3 cores  on same machine using solrlcoud.
> core: Restaurant,User,Review
> each of core has only 1 shards and 2 replicas.
>
> Question
> 1.) It is possible to use join among 3 of cores on same machine( or
> different machine)
> 2.)I am struggling how to use join among 3 of core in solrlcoud mode.
>
> Client: is not interested to de-normalized data.
>
> Give some suggestion how to solved that problem.
>
> Thanks
> Mugeesh
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Joins-with-SolrCloud-tp4243957.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Joins with SolrCloud

2015-12-11 Thread Dennis Gove
Something I forgot to mention - the collection shards can live on any
number of machines, anywhere in the world. As long as the clusterstate in
zk knows where the shard can be found (ie, a basis of SolrCloud) then
everything will work. The example I gave had the shards living on the same
machine but that is not a requirement.

On Fri, Dec 11, 2015 at 11:00 AM, Dennis Gove  wrote:

> Mugeesh,
>
> You can use Streaming Aggregation to provide various types of
> cross-collection joins. This is currently available in trunk and will be a
> part of Solr 6.
>
> To follow with your example, let's assume the following setup:
> Restaurants: avail on machine1:8983 with 3 shards, zk at zk1:2345
> Users: avail on machine2:8983 with 2 shards, zk at zk2:2345
> Reviews: avail on machine1:8983 with 10 shards, zk at zk1:2345
>
> You could send a streaming query to solr that would return all reviews for
> restaurants in NYC and include the user's hometown
>
> hashJoin(
>   innerJoin(
> search(users, q="*:*", fl="userId, full_name, hometown", sort="userId
> asc", zkHost="zk2:2345", qt="/export"),
> search(reviews, q="*:*", fl="userId, review, score", sort="userId
> asc", zkHost="zk1:2345", qt="/export"),
> on="userId"
>   ),
>   hashed=search(restaurants, q="city:nyc", fl="restaurantId,
> restaurantName", sort="restaurantId asc", zkHost="zk1:2345", qt="/export"),
>   on="restaurantId"
> )
>
> Note that the # of shards doesn't matter and doesn't need to be considered
> as a part of your query. Were you to send this off to a url for result,
> it'd look like this
>
> http://machine1:8983/solr/users/stream?stream=
> [the
> expression above]
>
> Additional information about Streaming API, Streaming Aggregation, and
> Streaming Expressions can be found at
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions,
> though this is currently incomplete as a lot of the new features have yet
> to be added to the documentation.
>
> For those interested, joins were added under tickets
> https://issues.apache.org/jira/browse/SOLR-7584 and
> https://issues.apache.org/jira/browse/SOLR-8188.
>
> - Dennis
>
>
> On Mon, Dec 7, 2015 at 7:42 AM, Mugeesh Husain  wrote:
>
>> I have create 3 cores  on same machine using solrlcoud.
>> core: Restaurant,User,Review
>> each of core has only 1 shards and 2 replicas.
>>
>> Question
>> 1.) It is possible to use join among 3 of cores on same machine( or
>> different machine)
>> 2.)I am struggling how to use join among 3 of core in solrlcoud mode.
>>
>> Client: is not interested to de-normalized data.
>>
>> Give some suggestion how to solved that problem.
>>
>> Thanks
>> Mugeesh
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Joins-with-SolrCloud-tp4243957.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>


Re: Use multiple istance simultaneously

2015-12-11 Thread Shawn Heisey
On 12/11/2015 8:19 AM, Gian Maria Ricci - aka Alkampfer wrote:
> Thanks for all of your clarifications. I know that solrcloud is a much
> better configuration than any other, but it also has a complexity that
> is really higher. I just want to give you the pain points I've noticed while
> I was gathering all the info I could get on SolrCloud.
> 
> 1) zookeeper documentation says that to have the best experience you should
> have a dedicated filesystem for the persistence and it should never swap to
> disk. I've not found any guidelines on how I should size the zookeeper
> machine: how much RAM, disk? Can I install zookeeper on the same machines
> where Solr resides (I suspect not, because the Solr machines are under stress and
> if zookeeper starts swapping it can lead to problems)?

Standalone zookeeper doesn't require much in the way of resources.
Unless the SolrCloud installation is enormous, a machine with 1-2GB of
RAM is probably plenty, if the only thing it is doing is zookeeper and
it's not running Windows.  If the SolrCloud install has a lot of
collections, shards, and/or servers, then you might need more, because
the zookeeper database will be larger.

> 2) What about updates? If I need to update my solrcloud instance and the
> new version requires a new version of zookeeper, which is the path to go? Do I
> need to first update zookeeper, or upgrade Solr on the existing machines, or...?
> Maybe I did not search well, but I did not find a comprehensive guideline
> that tells me how to upgrade my SolrCloud installation in various situations.

If you're following recommendations and using standalone zookeeper, then
upgrading it is entirely separate from upgrading Solr.  It's probably a
good idea to upgrade your three (or more) zookeeper servers first.

Here's a FAQ entry from zookeeper about upgrades:

https://wiki.apache.org/hadoop/ZooKeeper/FAQ#A6

> 3) What are the best practices to run DIH in solrcloud? I think I can
> round-robin the DIH import triggers across the servers composing the cloud
> infrastructure, or is there a better way to go? (I probably need to trigger
> a DIH every 5/10 minutes but the number of new records is really small)

When checking the status of an import, you must send the status request
to the same machine where you sent the command to start the import.

If you're only ever going to run one DIH at a time, then I don't see any
reason to involve multiple servers.  If you want to run more than one
simultaneously, then you might want to run each one on a different machine.
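
For example (core name and host are placeholders for illustration), the pair of
calls that must go to the same node would be:

curl "http://solr1:8983/solr/mycore/dataimport?command=delta-import"
curl "http://solr1:8983/solr/mycore/dataimport?command=status"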

> 4) Since I believe that it is not best practice to install zookeeper on the
> same machine as Solr (as a separate process, not the built-in zookeeper), I need
> at least three more machines to maintain / monitor / upgrade, and I also need to
> monitor zookeeper, a new appliance that needs to be mastered by IT
> Infrastructure.

The only real reason to avoid zookeeper and Solr on the same machine is
performance under high load, and mostly that comes down to I/O
performance, so if you can put zookeeper on a separate set of disks,
you're probably good.  If the query/update load will not be high, then
sharing machines will likely work well, even if the disks are all shared.

> Are there any guidelines on how to automate promoting a slave to master in the
> classic master-slave setup? I did not find anything official; auto-promoting a
> slave to master could solve my problem.

I don't know of any explicit information explaining how to promote a new
master.  Basically what you have to do is reconfigure the new master's
replication (so it stops trying to be a slave), reconfigure every slave
to point to the new master, and reconfigure every client that makes
index updates.  DNS changes *might* be able to automate the slave and
update client reconfig, but the master reconfig requires changing Solr's
configuration, which at the very least will require reloading or
restarting that server.  That could be automated, but it's up to you to
write the automation.
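
As a rough sketch of what "reconfigure replication" means in solrconfig.xml
(hostnames, core name and poll interval below are placeholders): the node being
promoted keeps only a master section, and every remaining slave points its
masterUrl at the new master.

On the node being promoted to master:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>

On every remaining slave:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://new-master:8983/solr/corename/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>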

Thanks,
Shawn



Solrcloud 4.8.1 - Solr cores reload

2015-12-11 Thread Vincenzo D'Amore
Hi all,

in day-to-day work I often need to change the Solr configuration files,
typically adding new synonyms or changing the schema or the solrconfig.xml.

Everything is stored in zookeeper.

But I have inherited a piece of code that, after every change, reload all
the cores using CoreAdmin API.

Now I have 15 replicas in the collection, and after every core reload the
code waits for 60 seconds (I suppose it's because whoever wrote the code was
worried about cache invalidation).

Given that, it takes about 25 minutes to update all the cores. Obviously
during this time we cannot modify the collection.

The question is: to reduce this wait, if I use the collection API RELOAD,
what are the counter-indications?

Thanks in advance for your time,
Vincenzo


-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


SolrCloud 4.8.1 - commit wait

2015-12-11 Thread Vincenzo D'Amore
Hi all,

I have a SolrCloud cluster with a collection (2.5M docs) with 3 shards and
15 replicas.
There is a solrj application that feeds the collection, updating a few
documents every hour. I don't understand why, at the end of the process, the
hard commit takes about 8/10 minutes.

Even if there are only few hundreds of documents.

This is the autocommit configuration:


<autoCommit>
  <maxDocs>1</maxDocs>
  <maxTime>1000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>


In your experience, why does the hard commit take so long even for so few documents?

Now I'm changing the code to use soft commits, calling commit(waitFlush = false,
waitSearcher = false, softCommit = true):

solrServer.commit(false, false, true);

I have configured NRTCachingDirectoryFactory, but I'm a little bit worried
if a server goes down (something like: kill -9, SolrCloud crashes, out of
memory, etc.), and if, using this strategy softcommit+NRTCachingDirectory,
SolrCloud instance could not recover a replica.

Should I worry about this new configuration? I was thinking to take a
snapshot of everything every day, in order to recover immediately the
index. Could this be considered a best practice?

Thanks in advance for your time,
Vincenzo

-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Unable to create lot of cores -- Failing after 100 cores

2015-12-11 Thread Venkat Paranjothi
Hello all,

We need to create around 300 collections with replication factor 2.  But
after creating 100, we couldn't create more and most of them are in RED
state in the solrcloud.

Is this issue related to zookeeper jute.maxBuffer issue?   If so, how can
we increase the size of zookeeper maxbuffer and memory size.

Thanks
Venkat


How Json facet API works with domains and facet functions?

2015-12-11 Thread Yago Riveiro
Hi,

How does the json facet API work with domains and facet functions?

I tried to google some info but did not find anything useful.

How can I do a query that finds all parents that match a clause (a date) and
calculates the avg price of all children that have property X?

Following yonik's blog example I tried something like this:

http://localhost:8983/solr/query?q={!parent
which="parent_type:ecommerce"}date:2015-12-11T00:00:00Z&json.facet={x:'avg(price)',
domain: { blockChildren : "parent_type:ecommerce"}} 

but it doesn't work.



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-Json-facet-API-works-with-domains-and-facet-functions-tp4244907.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How Json facet API works with domains and facet functions?

2015-12-11 Thread Yonik Seeley
If you search on the parents and want to match child documents, I
think you want {!child} and not {!parent} in your queries or filters.

fq={!child of=...}date_query_on_parents
fq=child_prop:X

For this specific example, you don't even need the block-join support
in facets since the base domain (query+filters) will already be the
child docs you want to facet over.
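
For example, putting those pieces together (the endpoint and the field names
parent_type, date, price and child_prop are taken from the example above and
may not match the real schema):

curl http://localhost:8983/solr/query \
  --data-urlencode 'q={!child of="parent_type:ecommerce"}date:"2015-12-11T00:00:00Z"' \
  --data-urlencode 'fq=child_prop:X' \
  --data-urlencode 'json.facet={x:"avg(price)"}'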

-Yonik


On Fri, Dec 11, 2015 at 11:46 AM, Yago Riveiro  wrote:
> Hi,
>
> How the json facet api works with domains and facet functions?
>
> I try to google some info and I do not find nothing useful.
>
> How can do a query that find all parents that match a clause (a date) and
> calculate the avg price of all of children that have property X?
>
> Following yonik's blog example I try something like this:
>
> http://localhost:8983/solr/query?q={!parent
> which="parent_type:ecommerce"}date:2015-12-11T00:00:00Z&json.facet={x:'avg(price)',
> domain: { blockChildren : "parent_type:ecommerce"}}
>
> but doesn't work.
>
>
>
> -
> Best regards
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-Json-facet-API-works-with-domains-and-facet-functions-tp4244907.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to secure standalone solr

2015-12-11 Thread Don Bosco Durai
>Anyone told me how to secure standalone solr .
Recently there were a few discussions on this. In short, it is not tested and
there doesn't seem to be a plan to test it.

>1.)using Kerberos Plugin is a good practice or any other else.
The answer depends on how you are using it: where you are deploying it, who is
accessing it, whether you want to restrict by access type (read/write), and what
authentication environment (LDAP/AD, Kerberos, etc.) you already have.

Depending upon your use cases and environment, you may have one or more options.

Bosco






On 12/11/15, 4:27 AM, "Mugeesh Husain"  wrote:

>Hello,
>
>Anyone told me how to secure standalone solr .
>
>1.)using Kerberos Plugin is a good practice or any other else.
>
>
>
>--
>View this message in context: 
>http://lucene.472066.n3.nabble.com/how-to-secure-standalone-solr-tp4244866.html
>Sent from the Solr - User mailing list archive at Nabble.com.



Re: How Json facet API works with domains and facet functions?

2015-12-11 Thread Yago Riveiro
One more question.

Is it possible to use the domain clause in json facet without a term query?

Ex.

json.facet={
    x:'avg(price)',
    domain: { blockChildren : "parent_type:ecommerce"}
}

Does this make any sense, or should I always reduce the domain using the query
and filters?




—/Yago Riveiro

On Fri, Dec 11, 2015 at 5:17 PM, Yonik Seeley  wrote:

> If you search on the parents and want to match child documents, I
> think you want {!child} and not {!parent} in your queries or filters.
> fq={!child of=...}date_query_on_parents
> fq=child_prop:X
> For this specific example, you don't even need the block-join support
> in facets since the base domain (query+filters) will already be the
> child docs you want to facet over.
> -Yonik
> On Fri, Dec 11, 2015 at 11:46 AM, Yago Riveiro  wrote:
>> Hi,
>>
>> How the json facet api works with domains and facet functions?
>>
>> I try to google some info and I do not find nothing useful.
>>
>> How can do a query that find all parents that match a clause (a date) and
>> calculate the avg price of all of children that have property X?
>>
>> Following yonik's blog example I try something like this:
>>
>> http://localhost:8983/solr/query?q={!parent
>> which="parent_type:ecommerce"}date:2015-12-11T00:00:00Z&json.facet={x:'avg(price)',
>> domain: { blockChildren : "parent_type:ecommerce"}}
>>
>> but doesn't work.
>>
>>
>>
>> -
>> Best regards
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/How-Json-facet-API-works-with-domains-and-facet-functions-tp4244907.html
>> Sent from the Solr - User mailing list archive at Nabble.com.

Re: Joins with SolrCloud

2015-12-11 Thread Joel Bernstein
You can also do the innerJoin in parallel across worker nodes using the
parallel function:

hashJoin(
  parallel(workerCollection,
    innerJoin(
      search(users, q="*:*", fl="userId, full_name, hometown",
             sort="userId asc", zkHost="zk2:2345", qt="/export",
             partitionKeys="userId"),
      search(reviews, q="*:*", fl="userId, review, score",
             sort="userId asc", zkHost="zk1:2345", qt="/export",
             partitionKeys="userId"),
      on="userId"
    ),
    workers="20",
    zkHost="zk1:2345",
    sort="userId asc"
  ),
  hashed=search(restaurants, q="city:nyc", fl="restaurantId, restaurantName",
                sort="restaurantId asc", zkHost="zk1:2345", qt="/export"),
  on="restaurantId"
)

The parallel function will return the tuples from the innerJoin which is
performed on 20 workers in this example. The worker nodes will be selected
from "workerCollection" which can be any SolrCloud collection with enough
nodes. The "partitionKeys" parameter has been added to searches so that
results with the same userId are shuffled to the same worker node.



Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Dec 11, 2015 at 11:00 AM, Dennis Gove  wrote:

> Mugeesh,
>
> You can use Streaming Aggregation to provide various types of
> cross-collection joins. This is currently available in trunk and will be a
> part of Solr 6.
>
> To follow with your example, let's assume the following setup:
> Restaurants: avail on machine1:8983 with 3 shards, zk at zk1:2345
> Users: avail on machine2:8983 with 2 shards, zk at zk2:2345
> Reviews: avail on machine1:8983 with 10 shards, zk at zk1:2345
>
> You could send a streaming query to solr that would return all reviews for
> restaurants in NYC and include the user's hometown
>
> hashJoin(
>   innerJoin(
> search(users, q="*:*", fl="userId, full_name, hometown", sort="userId
> asc", zkHost="zk2:2345", qt="/export"),
> search(reviews, q="*:*", fl="userId, review, score", sort="userId asc",
> zkHost="zk1:2345", qt="/export"),
> on="userId"
>   ),
>   hashed=search(restaurants, q="city:nyc", fl="restaurantId,
> restaurantName", sort="restaurantId asc", zkHost="zk1:2345", qt="/export"),
>   on="restaurantId"
> )
>
> Note that the # of shards doesn't matter and doesn't need to be considered
> as a part of your query. Were you to send this off to a url for result,
> it'd look like this
>
> http://machine1:8983/solr/users/stream?stream=
>  >[the
> expression above]
>
> Additional information about Streaming API, Streaming Aggregation, and
> Streaming Expressions can be found at
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions,
> though this is currently incomplete as a lot of the new features have yet
> to be added to the documentation.
>
> For those interested, joins were added under tickets
> https://issues.apache.org/jira/browse/SOLR-7584 and
> https://issues.apache.org/jira/browse/SOLR-8188.
>
> - Dennis
>
>
> On Mon, Dec 7, 2015 at 7:42 AM, Mugeesh Husain  wrote:
>
> > I have create 3 cores  on same machine using solrlcoud.
> > core: Restaurant,User,Review
> > each of core has only 1 shards and 2 replicas.
> >
> > Question
> > 1.) It is possible to use join among 3 of cores on same machine( or
> > different machine)
> > 2.)I am struggling how to use join among 3 of core in solrlcoud mode.
> >
> > Client: is not interested to de-normalized data.
> >
> > Give some suggestion how to solved that problem.
> >
> > Thanks
> > Mugeesh
> >
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Joins-with-SolrCloud-tp4243957.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>


SolrCloud page is blank

2015-12-11 Thread Aswath Srinivasan (TMS)
Hi All,

We have set up Solr 5.3.1. Now I notice that in the Solr admin UI, the Cloud
page is blank. What could be the reason behind this? Following are the
exceptions that I'm seeing in the logs:

12/11/2015, 9:58:37 AM  WARN  null  ClientCnxn
Session 0x25111a5595ab885 for server null, unexpected error, closing socket
connection and attempting reconnect

java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
 at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
 at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)

12/11/2015, 9:58:36 AM  WARN  null  ClientCnxn
Session 0x25111a5595ab885 for server abc01.abc.anbc.com/10.15.12.122:2181,
unexpected error, closing socket connection and attempting reconnect

java.io.IOException: Unreasonable length = 2703892
 at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:95)
 at org.apache.zookeeper.proto.GetDataResponse.deserialize(GetDataResponse.java:54)
 at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:814)
 at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:94)
 at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
 at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)


Thank you,
Aswath NS
Mobile  +1 424 345 5340
Office+1 310 468 6729



Re: Unable to create lot of cores -- Failing after 100 cores

2015-12-11 Thread Erick Erickson
A quick Google search shows the following:

"you must set -Djute.maxbuffer in zookeeper and solr..."

What have you tried? What were the results?

What version of Solr are you using? 5.x defaults to
an individual state.json file per collection rather than
one big one for all collections, that will also help.

Best,
Erick

On Fri, Dec 11, 2015 at 8:40 AM, Venkat Paranjothi  wrote:
> Hello all,
>
> We need to create around 300 collections with replication factor 2.  But
> after creating 100, we couldn't create more and most of them are in RED
> state in the solrcloud.
>
> Is this issue related to zookeeper jute.maxBuffer issue?   If so, how can
> we increase the size of zookeeper maxbuffer and memory size.
>
> Thanks
> Venkat


Re: SolrCloud 4.8.1 - commit wait

2015-12-11 Thread Erick Erickson
First of all, your autocommit settings are _very_ aggressive. Committing
every second is far too frequent IMO.

As an aside, I generally prefer to omit the maxDocs as it's not all
that predictable, but that's a personal preference and really doesn't bear
on your problem.
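
For illustration only (the numbers below are generic placeholders, not a tuned
recommendation), a less aggressive hard commit configuration looks like:

<autoCommit>
  <!-- illustrative values: hard commit at most once a minute, no new searcher -->
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>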

My _guess_ is that you are doing a lot of autowarming. The number of docs
doesn't really matter if your autowarming is taking forever. Your Solr logs
should report the autowarm times at INFO level; have you checked those?

The commit settings shouldn't be a problem in terms of your server dying,
the indexing process flushes docs to the tlog independent of committing so
upon restart they should be recovered. Here's a blog on the subject:

https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick

On Fri, Dec 11, 2015 at 8:24 AM, Vincenzo D'Amore  wrote:
> Hi all,
>
> I have a SolrCloud cluster with a collection (2.5M docs) with 3 shards and
> 15 replicas.
> There is a solrj application that feeds the collection, updating few
> documents every hour, I don't understand why, at end of process, the hard
> commit takes about 8/10 minutes.
>
> Even if there are only few hundreds of documents.
>
> This is the autocommit configuration:
>
> 
> <autoCommit>
>   <maxDocs>1</maxDocs>
>   <maxTime>1000</maxTime>
>   <openSearcher>false</openSearcher>
> </autoCommit>
> 
>
> In your experience why hard commit takes so long even for so few documents?
>
> Now I'm changing the code to softcommit, calling commit (waitFlush =
> false, waitSearcher
> = false, softCommit = true);
>
> solrServer.commit(false, false, true);.
>
> I have configured NRTCachingDirectoryFactory, but I'm a little bit worried
> if a server goes down (something like: kill -9, SolrCloud crashes, out of
> memory, etc.), and if, using this strategy softcommit+NRTCachingDirectory,
> SolrCloud instance could not recover a replica.
>
> Should I worry about this new configuration? I was thinking to take a
> snapshot of everything every day, in order to recover immediately the
> index. Could this be considered a best practice?
>
> Thanks in advance for your time,
> Vincenzo
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251


Re: Solrcloud 4.8.1 - Solr cores reload

2015-12-11 Thread Erick Erickson
You should absolutely always use the Collection API rather than
any core admin API if at all possible. If for no other reason
than your client will be _lots_ simpler (i.e. you don't have
to find all the replicas and issue the core admin RELOAD
command for each one).

I'm not entirely sure whether the RELOAD command is
synchronous or not though.
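
For reference, the collection-level reload is a single call to any node in the
cluster (collection name assumed):

curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=collection1"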

Best,
erick

On Fri, Dec 11, 2015 at 8:22 AM, Vincenzo D'Amore  wrote:
> Hi all,
>
> in day by day work, often I need to change the solr configurations files.
> Often adding new synonyms, changing the schema or the solrconfig.xml.
>
> Everything is stored in zookeeper.
>
> But I have inherited a piece of code that, after every change, reload all
> the cores using CoreAdmin API.
>
> Now I have 15 replicas in the collection, and after every core reload the
> code waits for 60 seconds (I suppose it's because who wrote the code was
> worried about the cache invalidation).
>
> Given that, it takes about 25 minutes to update all the cores. Obviously
> during this time we cannot modify the collection.
>
> The question is, to reduce this wait, if I use the collection API RELOAD,
> what are the counter indication?
>
> Thanks in advance for your time,
> Vincenzo
>
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251


Re: Nested document query with wrong numFound value

2015-12-11 Thread Mikhail Khludnev
Ok. I got it. SolrCloud relies on uniqueKey (id) for merging shard results,
but in your examples it doesn't work, because nested documents disable
this. And you have duplicates, which make the merge heap mad:

false}
},response={numFound=11,start=0,maxScore=1.0,docs=[SolrDocument{id=31814269823181426982280,
score=1.0}, SolrDocument{id=31814269823181426982280, score=1.0},
SolrDocument{id=31814269823181426982280, score=1.0},
SolrDocument{id=31814269823181426982280, score=1.0},
SolrDocument{id=31814269823181426982280, score=1.0},
SolrDocument{id=31814269823181426982281, score=1.0},
SolrDocument{id=31814269823181426982281, score=1.0},
SolrDocument{id=31814269823181426982281, score=1.0},
SolrDocument{id=31814269823181426982281, score=1.0},

Yago, you encounter a quite curious fact. Congratulation!
You can only retrieve parent documents with SolrCloud, hence use {!parent
...}... or fq=type:parent.

ccing Devs:
Shouldn't it prosecute ID dupes explicitly? Is it a known feature?


On Fri, Dec 11, 2015 at 5:08 PM, Yago Riveiro 
wrote:

> This:
>
>
>
>
>
> {
>
>
> responseHeader: {
>
>
> status: 0,
>
>
> QTime: 10,
>
>
> params: {
>
>
> q: "id:3181426982318142698228*",
>
>
> debugQuery: "true"
>
>
> }
>
>
> },
>
>
> response: {
>
>
> numFound: 3,
>
>
> start: 0,
>
>
> maxScore: 1,
>
>
> docs: [{
>
>
> id: "31814269823181426982280",
>
>
> child_type: "ecommerce_product",
>
>
> qty: 1,
>
>
> product_price: 49.99
>
>
> }, {
>
>
> id: "31814269823181426982281",
>
>
> child_type: "ecommerce_product",
>
>
> qty: 1,
>
>
> product_price: 139.9
>
>
> }]
>
>
> },
>
>
> debug: {
>
>
> track: {
>
>
> rid:
> "node-01-ecommerce-15_shard1_replica2-1449842438070-0",
>
>
> EXECUTE_QUERY: {
>
>
> http:
> //node-17:8983/solr/ecommerce-15_shard2_replica1/: {
>
>
> QTime: "0",
>
>
> ElapsedTime: "2",
>
>
> RequestPurpose: "GET_TOP_IDS",
>
>
> NumFound: "0",
>
>
> Response:
> "{responseHeader={status=0,QTime=0,params={df=_text_,distrib=false,debug=[false,
> timing, track],qt=/query,fl=[id,
> score],shards.purpose=4,start=0,fsv=true,shard.url=
> http://node-17:8983/solr/ecommerce-15_shard2_replica1/,rid=node-01-ecommerce-15_shard1_replica2-1449842438070-0,rows=10,version=2,q=id:3181426982318142698228*,requestPurpose=GET_TOP_IDS,NOW=1449842438070,isShard=true,wt=javabin,debugQuery=false}
> },response={numFound=0,start=0,maxScore=0.0,docs=[]},sort_values={},debug={timing={time=0.0,prepare={time=0.0,query={time=0.0},facet={time=0.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},debug={time=0.0}},process={time=0.0,query={time=0.0},facet={time=0.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},debug={time=0.0}"
>
>
> },
>
>
> http:
> //node-01:8983/solr/ecommerce-15_shard1_replica2/: {
>
>
> QTime: "0",
>
>
> ElapsedTime: "2",
>
>
> RequestPurpose: "GET_TOP_IDS",
>
>
> NumFound: "11",
>
>
> Response:
> "{responseHeader={status=0,QTime=0,params={df=_text_,distrib=false,debug=[false,
> timing, track],qt=/query,fl=[id,
> score],shards.purpose=4,start=0,fsv=true,shard.url=
> http://node-01:8983/solr/ecommerce-15_shard1_replica2/,rid=node-01-ecommerce-15_shard1_replica2-1449842438070-0,rows=10,version=2,q=id:3181426982318142698228*,requestPurpose=GET_TOP_IDS,NOW=1449842438070,isShard=true,wt=javabin,debugQuery=false}},response={numFound=11,start=0,maxScore=1.0,docs=[SolrDocument{id=31814269823181426982280,
> score=1.0}, SolrDocument{id=31814269823181426982280, score=1.0},
> SolrDocument{id=31814269823181426982280, score=1.0},
> SolrDocument{id=31814269823181426982280, score=1.0},
> SolrDocument{id=31814269823181426982280, score=1.0},
> SolrDocument{id=31814269823181426982281, score=1.0},
> SolrDocument{id=31814269823181426982281, score=1.0},
> SolrDocument{id=31814269823181426982281, score=1.0},
> SolrDocument{id=31814269823181426982281, score=1.0}

Re: Block Join query

2015-12-11 Thread Mikhail Khludnev
Novin,

I regret this so much. It's my pet peeve in Solr query parsing. Handling a space
depends on the first symbol of the query string.
This will work (starts with '{!'):
q={!parent which="doctype:200"}flow:[624 TO 700]
These won't, because of the leading " " and "+":
q= {!parent which="doctype:200"}flow:[624 TO 700]
q=+{!parent which="doctype:200"}flow:[624 TO 700]
Subordinate clauses with spaces are better handled with "Nested Queries" or
so, check the post
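
As a concrete sketch of that nested-query / parameter-dereferencing approach,
assuming the block-join clause is the only clause in q:

q={!parent which="doctype:200" v=$childq}&childq=flow:[624 TO 700]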



On Fri, Dec 11, 2015 at 6:31 PM, Novin  wrote:

> Hi Guys,
>
> I'm trying  block join query, so I have tried   +{!parent
> which="doctype:200"}flow:624 worked fine. But when i tried +{!parent
> which="doctype:200"}flow:[624 TO 700]
>
> Got the below error
>
> org.apache.solr.search.SyntaxError: Cannot parse 'flow_l:[624':
> Encountered "<EOF>" at line 1, column 11. Was expecting one of:
> "TO" ... <RANGE_QUOTED> ... <RANGE_GOOP> ...
>
> Just wondering too, can we able to do range in block join query.
>
> Thanks,
> Novin
>
>
>
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Nested document query with wrong numFound value

2015-12-11 Thread Yago Riveiro
When you say that I have duplicates, what do you mean?

If I have duplicate documents it is not intentional; each document must be unique.

Running a query for each id:

- Parent :  3181426982318142698228
- Child_1 : 31814269823181426982280
- Child_2 : 31814269823181426982281

The result is one document for each …

responseHeader: {
  status: 0,
  QTime: 3,
  params: { q: "id:3181426982318142698228", fl: "id", q.op: "AND" }
},
response: {
  numFound: 1,
  start: 0,
  maxScore: 11.017976,
  docs: [ { id: "3181426982318142698228" } ]
}

responseHeader: {
  status: 0,
  QTime: 3,
  params: { q: "id:31814269823181426982280", fl: "id", q.op: "AND" }
},
response: {
  numFound: 1,
  start: 0,
  maxScore: 9.919363,
  docs: [ { id: "31814269823181426982280" } ]
}

responseHeader: {
  status: 0,
  QTime: 3,
  params: { q: "id:31814269823181426982281", fl: "id", q.op: "AND" }
},
response: {
  numFound: 1,
  start: 0,
  maxScore: 9.919363,
  docs: [ { id: "31814269823181426982281" } ]
}










—/Yago Riveiro





> Ok. I got it. SolrCloud relies on uniqueKey (id) for merging shard results,
> but in your examples it doesn't work, because nested documents disable
> this. And you have duplicates, which make the merge heap mad:
>
> false}
> },response={numFound=11,start=0,maxScore=1.0,docs=[SolrDocument{id=31814269823181426982280,
> score=1.0}, SolrDocument{id=31814269823181426982280, score=1.0},
> SolrDocument{id=31814269823181426982280, score=1.0},
> SolrDocument{id=31814269823181426982280, score=1.0},
> SolrDocument{id=31814269823181426982280, score=1.0},
> SolrDocument{id=31814269823181426982281, score=1.0},
> SolrDocument{id=31814269823181426982281, score=1.0},
> SolrDocument{id=31814269823181426982281, score=1.0},
> SolrDocument{id=31814269823181426982281, score=1.0},
>
> Yago, you encounter a quite curious fact. Congratulation!
> You can only retrieve parent documents with SolrCloud, hence use {!parent
> ...}... or fq=type:parent.
>
> ccing Devs:
> Shouldn't it prosecute ID dupes explicitly? Is it a known feature?




Re: Unable to create lot of cores -- Failing after 100 cores

2015-12-11 Thread Venkat Paranjothi

Thanks Erick,

This is what we did to the zookeeper and solr settings. Still, we are
not seeing any improvement in collection creation; it takes a lot of
time for the collection to show up in SolrCloud.

added the following line in zkServer.sh

export JVMFLAGS="$JVMFLAGS -Xms256m -Xmx1g -Djute.maxBuffer=10485760"

added the following into   catalina.sh
JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/opt/apache_solr/installed/solr
-Dport=8080 -DhostContext=solr -Dsolr.Data.dir=/solr/dataidx
-Djute.maxBuffer=10485760  -DzkClientTimeout=20
-DzkHost=x.x.x.x:2181,y.y.y.y:2181,z.z.z.z:2181
-Dcollection.configname=collection_configuration_v1"


Here is my zoo.cfg settings

# The number of milliseconds of each tick
tickTime=3000

# The number of ticks that the initial synchronization phase can take
initLimit=200

maxClientCnxns=0

# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=50

# the directory where the snapshot is stored.
# Choose appropriately for your environment
dataDir=/opt/apache_solr/solrcloud/zookeeper_data

# the port at which the clients will connect
clientPort=2181

# the directory where transaction log is stored.
# this parameter provides dedicated log device for ZooKeeper
dataLogDir=/opt/apache_solr/solrcloud/zookeeper_log

# ZooKeeper server and its port no.
# ZooKeeper ensemble should know about every other machine in the ensemble
# specify server id by creating 'myid' file in the dataDir
# use hostname instead of IP address for convenient maintenance
server.1=x.x.x.x:2888:3888
server.2=y.y.y.y:2888:3888
server.3=z.z.z.z:2888:3888





Thanks,

Venkat Paranjothi
Kenexa 2xB Software Engineering Team
 
 
 
   Phone: 1-978-899-2746 
   E-mail: vpara...@us.ibm.com   
 





From:   Erick Erickson 
To: solr-user 
Date:   12/11/2015 01:16 PM
Subject:Re: Unable to create lot of cores -- Failing after 100 cores



A quick Google search shows the following:

"you must set -Djute.maxbuffer in zookeeper and solr..."

What have you tried? What were the results?

What version of Solr are you using? 5.x defaults to
an individual state.json file per collection rather than
one big one for all collections, that will also help.

Best,
Erick

On Fri, Dec 11, 2015 at 8:40 AM, Venkat Paranjothi 
wrote:
> Hello all,
>
> We need to create around 300 collections with replication factor 2.  But
> after creating 100, we couldn't create more and most of them are in RED
> state in the solrcloud.
>
> Is this issue related to zookeeper jute.maxBuffer issue?   If so, how can
> we increase the size of zookeeper maxbuffer and memory size.
>
> Thanks
> Venkat




Re: Unable to create lot of cores -- Failing after 100 cores

2015-12-11 Thread Shalin Shekhar Mangar
Which version of Solr are you using? As Erick said, use the latest 5.3.1
release; it is much better at handling many collections.

On Sat, Dec 12, 2015 at 1:39 AM, Venkat Paranjothi 
wrote:

> Thanks Eric,
>
> This is what we did to the zookeeper and solr settings.. Still, we are not
> seeing the improvement in the collection creation.. it takes lot of time to
> see the collection on the Solrcloud.
>
> added the following line in zkServer.sh
>
> export JVMFLAGS="$JVMFLAGS -Xms256m -Xmx1g -Djute.maxBuffer=10485760"
>
> added the following into catalina.sh
> JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/opt/apache_solr/installed/solr
> -Dport=8080 -DhostContext=solr -Dsolr.Data.dir=/solr/dataidx
> -Djute.maxBuffer=10485760 -DzkClientTimeout=20
> -DzkHost=x.x.x.x:2181,y.y.y.y:2181,z.z.z.z:2181
> -Dcollection.configname=collection_configuration_v1"
>
>
> *Here is my zoo.cfg settings*
>
> # The number of milliseconds of each tick
> tickTime=3000
>
> # The number of ticks that the initial synchronization phase can take
> initLimit=200
>
> maxClientCnxns=0
>
> # The number of ticks that can pass between
> # sending a request and getting an acknowledgement
> syncLimit=50
>
> # the directory where the snapshot is stored.
> # Choose appropriately for your environment
> dataDir=/opt/apache_solr/solrcloud/zookeeper_data
>
> # the port at which the clients will connect
> clientPort=2181
>
> # the directory where transaction log is stored.
> # this parameter provides dedicated log device for ZooKeeper
> dataLogDir=/opt/apache_solr/solrcloud/zookeeper_log
>
> # ZooKeeper server and its port no.
> # ZooKeeper ensemble should know about every other machine in the ensemble
> # specify server id by creating 'myid' file in the dataDir
> # use hostname instead of IP address for convenient maintenance
> server.1=x.x.x.x:2888:3888
> server.2=y.y.y.y:2888:3888
> server.3=z.z.z.z:2888:3888
>
>
>
> Thanks,
>
> *Venkat Paranjothi*
> Kenexa 2xB Software Engineering Team
> --
> *Phone:* 1-978-899-2746
> *E-mail:* *vpara...@us.ibm.com* 
>
>
>
>
>
> From: Erick Erickson 
> To: solr-user 
> Date: 12/11/2015 01:16 PM
> Subject: Re: Unable to create lot of cores -- Failing after 100 cores
> --
>
>
>
> A quick Google search shows the following:
>
> "you must set -Djute.maxbuffer in zookeeper and solr..."
>
> What have you tried? What were the results?
>
> What version of Solr are you using? 5.x defaults to
> an individual state.json file per collection rather than
> one big one for all collections, that will also help.
>
> Best,
> Erick
>
> On Fri, Dec 11, 2015 at 8:40 AM, Venkat Paranjothi 
> wrote:
> > Hello all,
> >
> > We need to create around 300 collections with replication factor 2.  But
> > after creating 100, we couldn't create more and most of them are in RED
> > state in the solrcloud.
> >
> > Is this issue related to zookeeper jute.maxBuffer issue?   If so, how can
> > we increase the size of zookeeper maxbuffer and memory size.
> >
> > Thanks
> > Venkat
>
>
>
>


-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr memory usage

2015-12-11 Thread Otis Gospodnetić
Hi Steve,

Fluctuation is OK.  100% utilization for more than a moment is not :)

Not sure what tool(s) you use for monitoring your Solr servers, but look
under "JVM Pool Utilization" in SPM if you're using SPM.
Or this live demo of a Solr system:
* click on https://apps.sematext.com/demo to get into the demo account
* look at "JVM Pool Utilization" on
https://apps.sematext.com/spm-reports/mainPage.do?selectedApplication=1704&r=poolReportPage&timestamp=1449865787801&stickyFiltersOff=false

And on that JVM Pool Size chart on top of the page you will see giant saw
pattern which is a healthy sign :)

HTH
Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/


On Wed, Dec 9, 2015 at 9:56 AM, Steven White  wrote:

> Thanks Erick!!  Your summary and the blog by Uwe (thank you too Uwe) are
> very helpful.
>
> A follow up question.  I also noticed the "JVM-Memory" report off Solr's
> home page is fluctuating.  I expect some fluctuation, but it kinda worries
> me when it fluctuates up / down in a range of 4 GB and maybe more.  I.e.:
> at times it is at 5 GB and other times it is at 10 GB (this is while I'm
> running my search tests).  What does such high fluctuation means?
>
> If it helps, Solr's "JVM-Memory" report states 2.5 GB usage when Solr is
> first started and before I run any search on it.  I'm taking this as my
> base startup memory usage.
>
> Steve
>
> On Tue, Dec 8, 2015 at 3:17 PM, Erick Erickson 
> wrote:
>
> > You're doing nothing wrong, that particular bit of advice has
> > always needed a bit of explanation.
> >
> > Solr (well, actually Lucene) uses MMapDirectory for much of
> > the index structure which uses the OS memory rather than
> > the JVM heap. See Uwe's excellent:
> >
> > http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> >
> > Plus, the size on disk includes the stored data, which is in the *.fdt
> > files in data/index. Very little of the stored data is kept in the JVM
> > so that's another reason your Java heap may be smaller than
> > your raw index size on disk.
> >
> > The advice about fitting your entire index into memory really has
> > the following caveats (at least).
> > 1> "memory" includes the OS memory available to the process
> > 2> The size of the index on disk is misleading, the *.fdt files
> >  should be subtracted in order to get a truer picture.
> > 3> Both Solr and Lucene create structures in the Java JVM
> >  that are _not_ reflected in the size on disk.
> >
> > <1> and <2> mean the JVM memory necessary is smaller
> > than the size on disk.
> >
> > <3> means the JVM memory will be larger than.
> >
> > So you're doing the right thing, testing and seeing what you
> > _really_ need. I'd pretty much take your test, add some
> > padding and consider it good. You're _not_ doing the
> > really bad thing of using the same query over and over
> > again and hoping .
> >
> > Best,
> > Erick
> >
> >
> > On Tue, Dec 8, 2015 at 11:54 AM, Steven White 
> > wrote:
> > > Hi folks,
> > >
> > > My index size on disk (optimized) is 20 GB (single core, single index).
> > I
> > > have a system with 64 GB of RAM.  I start Solr with 24 GB of RAM.
> > >
> > > I have run load tests (up to 100 concurrent users) for hours where each
> > > user issuing unique searches (the same search is never executed again
> for
> > > at least 30 minute since it was last executed).  In all tests I run,
> > Solr's
> > > JVM memory never goes over 10 GB (monitoring http://localhost:8983/).
> > >
> > > I read over and over, for optimal performance, Solr should be given
> > enough
> > > RAM to hold the index in memory.  Well, I have done that and some but
> > yet I
> > > don't see Solr using up that whole RAM.  What am I doing wrong?  Is my
> > test
> > > at fault?  I doubled the test load (number of users) and didn't see
> much
> > of
> > > a difference with RAM usage but yet my search performance went down
> > (takes
> > > about 40% longer now).  I run my tests again but this time with only 12
> > GB
> > > of RAM given to Solr.  Test result didn't differ much from the 24 GB
> run
> > > and Solr never used more than 10 GB of RAM.
> > >
> > > Can someone help me understand this?  I don't want to give Solr RAM
> that
> > it
> > > won't use.
> > >
> > > PS: This is simply search tests, there is no update to the index at
> all.
> > >
> > > Thanks in advanced.
> > >
> > > Steve
> >
>


API accessible without authentication even though Basic Auth Plugin is enabled

2015-12-11 Thread Kristine Jetzke
Hi,

I noticed that it is possible to access the API even if the Basic Auth plugin 
is enabled. Is that a known issue/done on purpose? I didn’t find anything in 
JIRA or the docs.

What I did:
- Started zookeeper on port 2181 and uploaded security.json from 
https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin 

- Started Solr cluster using cloud example: bin/solr start -e cloud -c -z 
localhost:2181
- Executed the following commands:
- curl -u solr:SolrRocks 
'http://localhost:8983/solr/gettingstarted_shard1_replica1/select?q=*%3A*&wt=json&indent=true':
 Returns 200 as expected
- curl -u solr:wrongPassword 
'http://localhost:8983/solr/gettingstarted_shard1_replica1/select?q=*%3A*&wt=json&indent=true':
 Returns 401 as expected
- curl 
'http://localhost:8983/solr/gettingstarted_shard1_replica1/select?q=*%3A*&wt=json&indent=true':
 Returns 200 even though no Authorization header is set.

I don’t understand why the last part works like it does. If I don’t give 
credentials, I would expect that the behavior is the same as with invalid 
credentials. Is there a special reason why it behaves like this? I’m wondering 
because I’m working on a custom authentication plugin and was looking into the 
existing ones to understand how they work.

Thanks,

tine

Re: Unable to create lot of cores -- Failing after 100 cores

2015-12-11 Thread Venkat Paranjothi

Thanks Shalin,

Sorry, I forgot to mention the version: 4.6.1 + tomcat6 + zookeeper 3.4.5



Thanks,

Venkat Paranjothi
Kenexa 2xB Software Engineering Team
 
 
 
   Phone: 1-978-899-2746 
   E-mail: vpara...@us.ibm.com   
 





From:   Shalin Shekhar Mangar 
To: solr-user@lucene.apache.org
Date:   12/11/2015 03:18 PM
Subject:Re: Unable to create lot of cores -- Failing after 100 cores



Which version of Solr are you using? As Erick, said, use the latest 5.3.1
release, it is much more better in handling many collections.

On Sat, Dec 12, 2015 at 1:39 AM, Venkat Paranjothi 
wrote:

> Thanks Eric,
>
> This is what we did to the zookeeper and solr settings.. Still, we are
not
> seeing the improvement in the collection creation.. it takes lot of time
to
> see the collection on the Solrcloud.
>
> added the following line in zkServer.sh
>
> export JVMFLAGS="$JVMFLAGS -Xms256m -Xmx1g -Djute.maxBuffer=10485760"
>
> added the following into catalina.sh
> JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/opt/apache_solr/installed/solr
> -Dport=8080 -DhostContext=solr -Dsolr.Data.dir=/solr/dataidx
> -Djute.maxBuffer=10485760 -DzkClientTimeout=20
> -DzkHost=x.x.x.x:2181,y.y.y.y:2181,z.z.z.z:2181
> -Dcollection.configname=collection_configuration_v1"
>
>
> *Here is my zoo.cfg settings*
>
> # The number of milliseconds of each tick
> tickTime=3000
>
> # The number of ticks that the initial synchronization phase can take
> initLimit=200
>
> maxClientCnxns=0
>
> # The number of ticks that can pass between
> # sending a request and getting an acknowledgement
> syncLimit=50
>
> # the directory where the snapshot is stored.
> # Choose appropriately for your environment
> dataDir=/opt/apache_solr/solrcloud/zookeeper_data
>
> # the port at which the clients will connect
> clientPort=2181
>
> # the directory where transaction log is stored.
> # this parameter provides dedicated log device for ZooKeeper
> dataLogDir=/opt/apache_solr/solrcloud/zookeeper_log
>
> # ZooKeeper server and its port no.
> # ZooKeeper ensemble should know about every other machine in the
ensemble
> # specify server id by creating 'myid' file in the dataDir
> # use hostname instead of IP address for convenient maintenance
> server.1=x.x.x.x:2888:3888
> server.2=y.y.y.y:2888:3888
> server.3=z.z.z.z:2888:3888
>
>
>
> Thanks,
>
> *Venkat Paranjothi*
> Kenexa 2xB Software Engineering Team
> --
> *Phone:* 1-978-899-2746
> *E-mail:* *vpara...@us.ibm.com* 
>
>
>
>
> [image: Inactive hide details for Erick Erickson ---12/11/2015 01:16:57
> PM---A quick Google search shows the following: "you must set -]Erick
> Erickson ---12/11/2015 01:16:57 PM---A quick Google search shows the
> following: "you must set -Djute.maxbuffer in zookeeper and solr..."
>
> From: Erick Erickson 
> To: solr-user 
> Date: 12/11/2015 01:16 PM
> Subject: Re: Unable to create lot of cores -- Failing after 100 cores
> --
>
>
>
> A quick Google search shows the following:
>
> "you must set -Djute.maxbuffer in zookeeper and solr..."
>
> What have you tried? What were the results?
>
> What version of Solr are you using? 5.x defaults to
> an individual state.json file per collection rather than
> one big one for all collections, that will also help.
>
> Best,
> Erick
>
> On Fri, Dec 11, 2015 at 8:40 AM, Venkat Paranjothi 
> wrote:
> > Hello all,
> >
> > We need to create around 300 collections with replication factor 2.
But
> > after creating 100, we couldn't create more and most of them are in RED
> > state in the solrcloud.
> >
> > Is this issue related to zookeeper jute.maxBuffer issue?   If so, how
can
> > we increase the size of zookeeper maxbuffer and memory size.
> >
> > Thanks
> > Venkat
>
>
>
>


--
Regards,
Shalin Shekhar Mangar.



Re: Nested document query with wrong numFound value

2015-12-11 Thread Mikhail Khludnev
On Fri, Dec 11, 2015 at 11:05 PM, Yago Riveiro 
wrote:

> When you say that I have duplicates, what do you mean?
>

I mean

http: //node-01:8983/solr/ecommerce-15_shard1_replica2/: {
QTime: "0",
ElapsedTime: "2",
RequestPurpose: "GET_TOP_IDS",
NumFound: "11",
Response:
"{responseHeader={status=0,QTime=0,params={df=_text_,distrib=false,debug=[false,
timing, track],qt=/query,fl=[id,
score],shards.purpose=4,start=0,fsv=true,shard.url=
http://node-01:8983/solr/ecommerce-15_shard1_replica2/,rid=node-01-ecommerce-15_shard1_replica2-1449842438070-0,rows=10,version=2,q=id:3181426982318142698228*,requestPurpose=GET_TOP_IDS,NOW=1449842438070,isShard=true,wt=javabin,debugQuery=false}
},response={numFound=11,start=0,maxScore=1.0,*docs=[**SolrDocument{id=**31814269823181426982280,
score=1.0}, SolrDocument{id=**31814269823181426982280, score=1.0},
SolrDocument{id=**31814269823181426982280, score=1.0},
SolrDocument{id=**31814269823181426982280,
score=1.0}, SolrDocument{id=**31814269823181426982280, score=1.0},
SolrDocument{id=**31814269823181426982281, score=1.0},
SolrDocument{id=**31814269823181426982281,
score=1.0}, SolrDocument{id=**31814269823181426982281, score=1.0},
SolrDocument{id=**31814269823181426982281, score=1.0},
SolrDocument{id=**31814269823181426982281,
score=1.0}]}*
,sort_values={},debug={timing={time=0.0,prepare={time=0.0,query={time=0.0},facet={time=0.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},debug={time=0.0}},process={time=0.0,query={time=0.0},facet={time=0.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},debug={time=0.0}"

Perhaps it's worth verifying the shards one by one by sending requests with
distrib=false.
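For example, a check like this against each replica (core URL copied from the
debug output above; adjust host and core name as needed):

curl 'http://node-01:8983/solr/ecommerce-15_shard1_replica2/select?q=id:31814269823181426982280&fl=id&rows=100&distrib=false&wt=json'

If a single core already reports numFound > 1 for one id, the duplicates live in
that core's index rather than being a merge artifact.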


>
>
> If I have duplicate documents is not intentional, each document must be
> unique.
>
>
> Running a query for each id:
>
>
>
>
>
> - Parent :  3181426982318142698228
>
> - Child_1 : 31814269823181426982280
>
> - Child_2 : 31814269823181426982281
>
>
>
>
> The result is one document for each …
>
>
>
>
>
> responseHeader:
> {
> status: 0,
>
>
>
> QTime: 3,
>
>
>
> params:
> {
> q: "id:3181426982318142698228",
>
>
>
> fl: "id",
>
>
>
> q.op: "AND"
>
>
>
> }
>
>
> },
>
>
>
> response:
> {
> numFound: 1,
>
>
>
> start: 0,
>
>
>
> maxScore: 11.017976,
>
>
>
> docs:
> [
>
> {
> id: "3181426982318142698228"
>
>
> }
>
> ]
>
>
> }
>
>
>
>
>
>
>
> responseHeader:
> {
> status: 0,
>
>
>
> QTime: 3,
>
>
>
> params:
> {
> q: "id:31814269823181426982280",
>
>
>
> fl: "id",
>
>
>
> q.op: "AND"
>
>
>
> }
>
>
> },
>
>
>
> response:
> {
> numFound: 1,
>
>
>
> start: 0,
>
>
>
> maxScore: 9.919363,
>
>
>
> docs:
> [
>
> {
> id: "31814269823181426982280"
>
>
> }
>
> ]
>
>
> }
>
>
>
>
>
>
> responseHeader:
> {
> status: 0,
>
>
>
> QTime: 3,
>
>
>
> params:
> {
> q: "id:31814269823181426982281",
>
>
>
> fl: "id",
>
>
>
> q.op: "AND"
>
>
>
> }
>
>
> },
>
>
>
> response:
> {
> numFound: 1,
>
>
>
> start: 0,
>
>
>
> maxScore: 9.919363,
>
>
>
> docs:
> [
>
> {
> id: "31814269823181426982281"
>
>
> }
>
> ]
>
>
> }
>
>
>
>
>
>
>
>
>
>
> —/Yago Riveiro
>
>
>
>
>
> Ok. I got it. SolrCloud relies on uniqueKey (id) for merging shard results,
>
> but in your examples it doesn't work, because nested documents disables
>
> this. And you have duplicates, which make merge heap mad:
>
>
> false}
>
> <
> http://node-01:8983/solr/ecommerce-15_shard1_replica2/,rid=node-01-ecommerce-15_shard1_replica2-1449842438070-0,rows=10,version=2,q=id:3181426982318142698228*,requestPurpose=GET_TOP_IDS,NOW=1449842438070,isShard=true,wt=javabin,debugQuery=false%7D
> >},response={numFound=11,start=0,maxScore=1.0,docs=[SolrDocument{id=31814269823181426982280,
>
> score=1.0}, SolrDocument{id=31814269823181426982280, score=1.0},
>
> SolrDocument{id=31814269823181426982280, score=1.0},
>
> SolrDocument{id=31814269823181426982280, score=1.0},
>
> SolrDocument{id=31814269823181426982280, score=1.0},
>
> SolrDocument{id=31814269823181426982281, score=1.0},
>
> SolrDocument{id=31814269823181426982281, score=1.0},
>
> SolrDocument{id=31814269823181426982281, score=1.0},
>
> SolrDocument{id=31814269823181426982281, score=1.0},
>
>
> Yago, you have encountered quite a curious fact. Congratulations!
>
> You can only retrieve parent documents with SolrCloud, hence use {!parent
> ..}.. or fq=type:parent.
>
>
> ccing Devs:
>
> Shouldn't it prosecute ID dupes explicitly? Is it a known feature?
>
>
>
> On Fri, Dec 11, 2015 at 5:08 PM, Yago Riveiro 
>
> wrote:
>
>
> > This:
>
> >
>
> >
>
> >
>
> >
>
> >
>
> > {
>
> >
>
> >
>
> > responseHeader: {
>
> >
>
> >
>
> > status: 0,
>
> >
>
> >
>
> > QTime: 10,
>
> >
>
> >
>
> > params: {
>
> >
>
> >
>
> > q: "id:3181426982318142698228*",
>
> >
>
> >
>
> > debugQuery: "true"
>
> >
>
> >
>
> >

Re: Unable to create lot of cores -- Failing after 100 cores

2015-12-11 Thread Erick Erickson
You seem to have changed the problem statement from
being unable to create cores to it takes a long time for
them to show up.

What's the current problem you're having?

Best,
Erick

On Fri, Dec 11, 2015 at 12:45 PM, Venkat Paranjothi 
wrote:

> Thanks shalin..
>
> Sorry, I forgot to mention the version: 4.6.1 + tomcat6 + zookeeper 3.4.5
>
> Thanks,
>
> *Venkat Paranjothi*
> Kenexa 2xB Software Engineering Team
> --
> *Phone:* 1-978-899-2746
> *E-mail:* *vpara...@us.ibm.com* 
>
>
>
>
> [image: Inactive hide details for Shalin Shekhar Mangar ---12/11/2015
> 03:18:53 PM---Which version of Solr are you using? As Erick, said]Shalin
> Shekhar Mangar ---12/11/2015 03:18:53 PM---Which version of Solr are you
> using? As Erick, said, use the latest 5.3.1 release, it is much more b
>
> From: Shalin Shekhar Mangar 
> To: solr-user@lucene.apache.org
> Date: 12/11/2015 03:18 PM
> Subject: Re: Unable to create lot of cores -- Failing after 100 cores
> --
>
>
>
> Which version of Solr are you using? As Erick said, use the latest 5.3.1
> release; it is much better at handling many collections.
>
> On Sat, Dec 12, 2015 at 1:39 AM, Venkat Paranjothi 
> wrote:
>
> > Thanks Eric,
> >
> > This is what we did to the zookeeper and solr settings.. Still, we are
> not
> > seeing the improvement in the collection creation.. it takes lot of time
> to
> > see the collection on the Solrcloud.
> >
> > added the following line in zkServer.sh
> >
> > export JVMFLAGS="$JVMFLAGS -Xms256m -Xmx1g -Djute.maxBuffer=10485760"
> >
> > added the following into catalina.sh
> > JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/opt/apache_solr/installed/solr
> > -Dport=8080 -DhostContext=solr -Dsolr.Data.dir=/solr/dataidx
> > -Djute.maxBuffer=10485760 -DzkClientTimeout=20
> > -DzkHost=x.x.x.x:2181,y.y.y.y:2181,z.z.z.z:2181
> > -Dcollection.configname=collection_configuration_v1"
> >
> >
> > *Here is my zoo.cfg settings*
> >
> > # The number of milliseconds of each tick
> > tickTime=3000
> >
> > # The number of ticks that the initial synchronization phase can take
> > initLimit=200
> >
> > maxClientCnxns=0
> >
> > # The number of ticks that can pass between
> > # sending a request and getting an acknowledgement
> > syncLimit=50
> >
> > # the directory where the snapshot is stored.
> > # Choose appropriately for your environment
> > dataDir=/opt/apache_solr/solrcloud/zookeeper_data
> >
> > # the port at which the clients will connect
> > clientPort=2181
> >
> > # the directory where transaction log is stored.
> > # this parameter provides dedicated log device for ZooKeeper
> > dataLogDir=/opt/apache_solr/solrcloud/zookeeper_log
> >
> > # ZooKeeper server and its port no.
> > # ZooKeeper ensemble should know about every other machine in the
> ensemble
> > # specify server id by creating 'myid' file in the dataDir
> > # use hostname instead of IP address for convenient maintenance
> > server.1=x.x.x.x:2888:3888
> > server.2=y.y.y.y:2888:3888
> > server.3=z.z.z.z:2888:3888
> >
> >
> >
> > Thanks,
> >
> > *Venkat Paranjothi*
> > Kenexa 2xB Software Engineering Team
> > --
> > *Phone:* 1-978-899-2746
> > *E-mail:* *vpara...@us.ibm.com* 
> >
> >
> >
> >
> > [image: Inactive hide details for Erick Erickson ---12/11/2015 01:16:57
> > PM---A quick Google search shows the following: "you must set -]Erick
> > Erickson ---12/11/2015 01:16:57 PM---A quick Google search shows the
> > following: "you must set -Djute.maxbuffer in zookeeper and solr..."
> >
> > From: Erick Erickson 
> > To: solr-user 
> > Date: 12/11/2015 01:16 PM
> > Subject: Re: Unable to create lot of cores -- Failing after 100 cores
> > --
> >
> >
> >
> > A quick Google search shows the following:
> >
> > "you must set -Djute.maxbuffer in zookeeper and solr..."
> >
> > What have you tried? What were the results?
> >
> > What version of Solr are you using? 5.x defaults to
> > an individual state.json file per collection rather than
> > one big one for all collections, that will also help.
> >
> > Best,
> > Erick
> >
> > On Fri, Dec 11, 2015 at 8:40 AM, Venkat Paranjothi 
> > wrote:
> > > Hello all,
> > >
> > > We need to create around 300 collections with replication factor 2.
> But
> > > after creating 100, we couldn't create more and most of them are in RED
> > > state in the solrcloud.
> > >
> > > Is this issue related to zookeeper jute.maxBuffer issue?   If so, how
> can
> > > we increase the size of zookeeper maxbuffer and memory size.
> > >
> > > Thanks
> > > Venkat
> >
> >
> >
> >
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
>
>


Re: Unable to create lot of cores -- Failing after 100 cores

2015-12-11 Thread Venkat Paranjothi

Eric,

  Sorry for the confusion.

  I noticed both issues: sometimes a collection was showing RED in
  SolrCloud; if I leave it for a day, the next day it shows GREEN. This is
  not true for all cases and I am not seeing consistent behavior.

  After restarting the zookeeper & tomcat services, I was able to create
  the first collection successfully, but the second collection was showing RED
  in SolrCloud.

  At the end, the behavior is not consistent.





From:   Erick Erickson 
To: solr-user 
Date:   12/11/2015 03:48 PM
Subject:Re: Unable to create lot of cores -- Failing after 100 cores



You seem to have changed the problem statement from
being unable to create cores to it takes a long time for
them to show up.

What's the current problem you're having?

Best,
Erick

On Fri, Dec 11, 2015 at 12:45 PM, Venkat Paranjothi 
wrote:

> Thanks shalin..
>
> Sorry, I forgot to mention the version: 4.6.1 + tomcat6 + zookeeper 3.4.5
>
> Thanks,
>

>
>
>
>
> [image: Inactive hide details for Shalin Shekhar Mangar ---12/11/2015
> 03:18:53 PM---Which version of Solr are you using? As Erick, said]Shalin
> Shekhar Mangar ---12/11/2015 03:18:53 PM---Which version of Solr are you
> using? As Erick, said, use the latest 5.3.1 release, it is much more b
>
> From: Shalin Shekhar Mangar 
> To: solr-user@lucene.apache.org
> Date: 12/11/2015 03:18 PM
> Subject: Re: Unable to create lot of cores -- Failing after 100 cores
> --
>
>
>
> Which version of Solr are you using? As Erick said, use the latest 5.3.1
> release; it is much better at handling many collections.
>
> On Sat, Dec 12, 2015 at 1:39 AM, Venkat Paranjothi 
> wrote:
>
> > Thanks Eric,
> >
> > This is what we did to the zookeeper and solr settings.. Still, we are
> not
> > seeing the improvement in the collection creation.. it takes lot of
time
> to
> > see the collection on the Solrcloud.
> >
> > added the following line in zkServer.sh
> >
> > export JVMFLAGS="$JVMFLAGS -Xms256m -Xmx1g -Djute.maxBuffer=10485760"
> >
> > added the following into catalina.sh
> > JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/opt/apache_solr/installed/solr
> > -Dport=8080 -DhostContext=solr -Dsolr.Data.dir=/solr/dataidx
> > -Djute.maxBuffer=10485760 -DzkClientTimeout=20
> > -DzkHost=x.x.x.x:2181,y.y.y.y:2181,z.z.z.z:2181
> > -Dcollection.configname=collection_configuration_v1"
> >
> >
> > *Here is my zoo.cfg settings*
> >
> > # The number of milliseconds of each tick
> > tickTime=3000
> >
> > # The number of ticks that the initial synchronization phase can take
> > initLimit=200
> >
> > maxClientCnxns=0
> >
> > # The number of ticks that can pass between
> > # sending a request and getting an acknowledgement
> > syncLimit=50
> >
> > # the directory where the snapshot is stored.
> > # Choose appropriately for your environment
> > dataDir=/opt/apache_solr/solrcloud/zookeeper_data
> >
> > # the port at which the clients will connect
> > clientPort=2181
> >
> > # the directory where transaction log is stored.
> > # this parameter provides dedicated log device for ZooKeeper
> > dataLogDir=/opt/apache_solr/solrcloud/zookeeper_log
> >
> > # ZooKeeper server and its port no.
> > # ZooKeeper ensemble should know about every other machine in the
> ensemble
> > # specify server id by creating 'myid' file in the dataDir
> > # use hostname instead of IP address for convenient maintenance
> > server.1=x.x.x.x:2888:3888
> > server.2=y.y.y.y:2888:3888
> > server.3=z.z.z.z:2888:3888
> >
> >
> >
> > Thanks,
> >
> > *Venkat Paranjothi*
> > Kenexa 2xB Software Engineering Team
> > --
> > *Phone:* 1-978-899-2746
> > *E-mail:* *vpara...@us.ibm.com* 
> >
> >
> >
> >
> > [image: Inactive hide details for Erick Erickson ---12/11/2015 01:16:57
> > PM---A quick Google search shows the following: "you must set -]Erick
> > Erickson ---12/11/2015 01:16:57 PM---A quick Google search shows the
> > following: "you must set -Djute.maxbuffer in zookeeper and solr..."
> >
> > From: Erick Erickson 
> > To: solr-user 
> > Date: 12/11/2015 01:16 PM
> > Subject: Re: Unable to create lot of cores -- Failing after 100 cores
> > --
> >
> >
> >
> > A quick Google search shows the following:
> >
> > "you must set -Djute.maxbuffer in zookeeper and solr..."
> >
> > What have you tried? What were the results?
> >
> > What version of Solr are you using? 5.x defaults to
> > an individual state.json file per collection rather than
> > one big one for all collections, that will also help.
> >
> > Best,
> > Erick
> >
> > On Fri, Dec 11, 2015 at 8:40 AM, Venkat Paranjothi

> > wrote:
> > > Hello all,
> > >
> > > We need to create around 300 collections with replication factor 2.
> But
> > > after creating 100, we couldn't create more and most of them are in
RED
> > > state in the solrcloud.
> > >
> > > Is this issue related to zookeeper jute.maxBuffer issue?   If so, how
> can
> > > we increase the size of zookeeper maxbuffer a

RE: JSON facets and excluded queries

2015-12-11 Thread Aigner, Max
Answering one question myself after doing some testing on 5.3.1: 

Yes, facet.threads is still relevant with Json facets. 

We are seeing significant gains as we increase the number of threads from
1 up to 4. Beyond that we only observed marginal improvements -- which makes
sense because the test VM has 4 cores.
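For reference, it is passed alongside the other facet parameters, e.g. (the field
names below are just placeholders):

facet=true&facet.threads=4&facet.field=brand&facet.field=color&facet.field=size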

-Original Message-
From: Aigner, Max [mailto:max.aig...@nordstrom.com] 
Sent: Thursday, December 10, 2015 12:33 PM
To: solr-user@lucene.apache.org
Subject: RE: JSON facets and excluded queries

Another question popped up around this: 
Is the facet.threads parameter still relevant with Json facets? I saw that the 
facet prefix bug https://issues.apache.org/jira/browse/SOLR-6686 got fixed in  
5.3 so I'm looking into re-enabling this parameter for our searches. 

On a side note, I've been testing Json facet performance and I've observed that 
they're generally  faster unless facet prefix filtering comes into play, then 
they seem to be slower than standard facets. 
Is that just a fluke or should I switch to Json Query Facets instead of using 
facet prefix filtering?

Thanks again,
Max

-Original Message-
From: Aigner, Max [mailto:max.aig...@nordstrom.com] 
Sent: Wednesday, November 25, 2015 11:54 AM
To: solr-user@lucene.apache.org
Subject: RE: JSON facets and excluded queries

Yes, just tried that and it works fine. 

That just removed a showstopper for me as my queries contain lots of tagged FQs 
and multi-select facets (implemented the 'good way' :). 

Thank you for the quick help! 

-Original Message-
From: Yonik Seeley [mailto:ysee...@gmail.com] 
Sent: Wednesday, November 25, 2015 11:38 AM
To: solr-user@lucene.apache.org
Subject: Re: JSON facets and excluded queries

On Wed, Nov 25, 2015 at 2:29 PM, Yonik Seeley  wrote:
> On Wed, Nov 25, 2015 at 2:15 PM, Aigner, Max  wrote:
>> Thanks, this is great :=))
>>
>> I hadn't seen the domain:{excludeTags:...} syntax yet and it doesn't seem to 
>> be working on 5.3.1 so I'm assuming this is work slated for 5.4 or 6. Did I 
>> get that right?
>
> Hmmm, the "domain" keyword was added for 5.3 along with block join
> faceting: http://yonik.com/solr-nested-objects/
> That's when I switched "excludeTags" to also be under the "domain" keyword.
>
> Let me try it out...

Ah, I messed up that migration...
OK, for now, instead of
  domain:{excludeTags:foo}
just use
  excludeTags:foo
and it should work.

-Yonik


Re: Nested document query with wrong numFound value

2015-12-11 Thread Yago Riveiro
Mmmm,


In fact, if I run a JSON facet query the result count is 5 for both of
them, which is consistent with the debug query.

What I don't understand is where these documents come from.

I pre-cleaned the collection several times with a delete query (id:*) and always
indexed 31814269823181426982280 and 31814269823181426982281 as children of
3181426982318142698228


 

Can this issue be related to SOLR-5211?.


—/Yago Riveiro

On Fri, Dec 11, 2015 at 8:46 PM, Mikhail Khludnev
 wrote:

> On Fri, Dec 11, 2015 at 11:05 PM, Yago Riveiro 
> wrote:
>> When do you say that I have duplicates, what do you mean?
>>
> I mean
> http: //node-01:8983/solr/ecommerce-15_shard1_replica2/: {
> QTime: "0",
> ElapsedTime: "2",
> RequestPurpose: "GET_TOP_IDS",
> NumFound: "11",
> Response:
> "{responseHeader={status=0,QTime=0,params={df=_text_,distrib=false,debug=[false,
> timing, track],qt=/query,fl=[id,
> score],shards.purpose=4,start=0,fsv=true,shard.url=
> http://node-01:8983/solr/ecommerce-15_shard1_replica2/,rid=node-01-ecommerce-15_shard1_replica2-1449842438070-0,rows=10,version=2,q=id:3181426982318142698228*,requestPurpose=GET_TOP_IDS,NOW=1449842438070,isShard=true,wt=javabin,debugQuery=false}
> },response={numFound=11,start=0,maxScore=1.0,*docs=[**SolrDocument{id=**31814269823181426982280,
> score=1.0}, SolrDocument{id=**31814269823181426982280, score=1.0},
> SolrDocument{id=**31814269823181426982280, score=1.0},
> SolrDocument{id=**31814269823181426982280,
> score=1.0}, SolrDocument{id=**31814269823181426982280, score=1.0},
> SolrDocument{id=**31814269823181426982281, score=1.0},
> SolrDocument{id=**31814269823181426982281,
> score=1.0}, SolrDocument{id=**31814269823181426982281, score=1.0},
> SolrDocument{id=**31814269823181426982281, score=1.0},
> SolrDocument{id=**31814269823181426982281,
> score=1.0}]}*
> ,sort_values={},debug={timing={time=0.0,prepare={time=0.0,query={time=0.0},facet={time=0.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},debug={time=0.0}},process={time=0.0,query={time=0.0},facet={time=0.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},debug={time=0.0}"
> Perhaps it's worth verifying the shards one by one by sending requests with
> distrib=false.
>>
>>
>> If I have duplicate documents is not intentional, each document must be
>> unique.
>>
>>
>> Running a query for each id:
>>
>>
>>
>>
>>
>> - Parent :  3181426982318142698228
>>
>> - Child_1 : 31814269823181426982280
>>
>> - Child_2 : 31814269823181426982281
>>
>>
>>
>>
>> The result is one document for each …
>>
>>
>>
>>
>>
>> responseHeader:
>> {
>> status: 0,
>>
>>
>>
>> QTime: 3,
>>
>>
>>
>> params:
>> {
>> q: "id:3181426982318142698228",
>>
>>
>>
>> fl: "id",
>>
>>
>>
>> q.op: "AND"
>>
>>
>>
>> }
>>
>>
>> },
>>
>>
>>
>> response:
>> {
>> numFound: 1,
>>
>>
>>
>> start: 0,
>>
>>
>>
>> maxScore: 11.017976,
>>
>>
>>
>> docs:
>> [
>>
>> {
>> id: "3181426982318142698228"
>>
>>
>> }
>>
>> ]
>>
>>
>> }
>>
>>
>>
>>
>>
>>
>>
>> responseHeader:
>> {
>> status: 0,
>>
>>
>>
>> QTime: 3,
>>
>>
>>
>> params:
>> {
>> q: "id:31814269823181426982280",
>>
>>
>>
>> fl: "id",
>>
>>
>>
>> q.op: "AND"
>>
>>
>>
>> }
>>
>>
>> },
>>
>>
>>
>> response:
>> {
>> numFound: 1,
>>
>>
>>
>> start: 0,
>>
>>
>>
>> maxScore: 9.919363,
>>
>>
>>
>> docs:
>> [
>>
>> {
>> id: "31814269823181426982280"
>>
>>
>> }
>>
>> ]
>>
>>
>> }
>>
>>
>>
>>
>>
>>
>> responseHeader:
>> {
>> status: 0,
>>
>>
>>
>> QTime: 3,
>>
>>
>>
>> params:
>> {
>> q: "id:31814269823181426982281",
>>
>>
>>
>> fl: "id",
>>
>>
>>
>> q.op: "AND"
>>
>>
>>
>> }
>>
>>
>> },
>>
>>
>>
>> response:
>> {
>> numFound: 1,
>>
>>
>>
>> start: 0,
>>
>>
>>
>> maxScore: 9.919363,
>>
>>
>>
>> docs:
>> [
>>
>> {
>> id: "31814269823181426982281"
>>
>>
>> }
>>
>> ]
>>
>>
>> }
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> —/Yago Riveiro
>>
>>
>>
>>
>>
>> Ok. I got it. SolrCloud relies on uniqueKey (id) for merging shard results,
>>
>> but in your examples it doesn't work, because nested documents disables
>>
>> this. And you have duplicates, which make merge heap mad:
>>
>>
>> false}
>>
>> <
>> http://node-01:8983/solr/ecommerce-15_shard1_replica2/,rid=node-01-ecommerce-15_shard1_replica2-1449842438070-0,rows=10,version=2,q=id:3181426982318142698228*,requestPurpose=GET_TOP_IDS,NOW=1449842438070,isShard=true,wt=javabin,debugQuery=false%7D
>> >},response={numFound=11,start=0,maxScore=1.0,docs=[SolrDocument{id=31814269823181426982280,
>>
>> score=1.0}, SolrDocument{id=31814269823181426982280, score=1.0},
>>
>> SolrDocument{id=31814269823181426982280, score=1.0},
>>
>> SolrDocument{id=31814269823181426982280, score=1.0},
>>
>> SolrDocument{id=31814269823181426982280, score=1.0},
>>
>> SolrDocument{id=31814269823181426982281, score=1.0},
>>
>> SolrDocument{id=31814269823181426982281, score=1.0},
>>
>> SolrDocument{id=

Re: SolrCloud 4.8.1 - commit wait

2015-12-11 Thread Mark Miller
It looks like he has waitSearcher set to false, so all the time should be in the
commit itself. That amount of time does sound odd.

I would certainly change those commit settings though. I would not use
maxDocs, that is an ugly way to control this. And one second is much too
aggressive as Erick says.

If you want to attempt that kind of visibility, you should use the
softAutoCommit. The regular autoCommit should be at least 15 or 20 seconds.

- Mark

On Fri, Dec 11, 2015 at 1:22 PM Erick Erickson 
wrote:

> First of all, your autocommit settings are _very_ aggressive. Committing
> every second is far too frequent IMO.
>
> As an aside, I generally prefer to omit the maxDocs as it's not all
> that predictable,
> but that's a personal preference and really doesn't bear on your problem..
>
> My _guess_ is that you are doing a lot of autowarming. The number of docs
> doesn't really matter if your autowarming is taking forever, your Solr logs
> should report the autowarm times at INFO level, have you checked those?
>
> The commit settings shouldn't be a problem in terms of your server dying,
> the indexing process flushes docs to the tlog independent of committing so
> upon restart they should be recovered. Here's a blog on the subject:
>
>
> https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> Best,
> Erick
>
> On Fri, Dec 11, 2015 at 8:24 AM, Vincenzo D'Amore 
> wrote:
> > Hi all,
> >
> > I have a SolrCloud cluster with a collection (2.5M docs) with 3 shards
> and
> > 15 replicas.
> > There is a solrj application that feeds the collection, updating few
> > documents every hour, I don't understand why, at end of process, the hard
> > commit takes about 8/10 minutes.
> >
> > Even if there are only few hundreds of documents.
> >
> > This is the autocommit configuration:
> >
> > 
> > 1
> > 1000
> > false
> > 
> >
> > In your experience why hard commit takes so long even for so few
> documents?
> >
> > Now I'm changing the code to softcommit, calling commit (waitFlush =
> > false, waitSearcher
> > = false, softCommit = true);
> >
> > solrServer.commit(false, false, true);.
> >
> > I have configured NRTCachingDirectoryFactory, but I'm a little bit
> worried
> > if a server goes down (something like: kill -9, SolrCloud crashes, out of
> > memory, etc.), and if, using this strategy
> softcommit+NRTCachingDirectory,
> > SolrCloud instance could not recover a replica.
> >
> > Should I worry about this new configuration? I was thinking to take a
> > snapshot of everything every day, in order to recover immediately the
> > index. Could this be considered a best practice?
> >
> > Thanks in advance for your time,
> > Vincenzo
> >
> > --
> > Vincenzo D'Amore
> > email: v.dam...@gmail.com
> > skype: free.dev
> > mobile: +39 349 8513251
>
-- 
- Mark
about.me/markrmiller


Re: JSON facets and excluded queries

2015-12-11 Thread Erick Erickson
Do note that the number of threads also won't help much last I knew unless you
are faceting over that many fields too. I.e. setting this to 5 while faceting on
only 1 field won't help.

And it's not implemented for all facet types IIRC.

Best,
Erick

On Fri, Dec 11, 2015 at 1:07 PM, Aigner, Max  wrote:
> Answering one question myself after doing some testing on 5.3.1:
>
> Yes, facet.threads is still relevant with Json facets.
>
> We are seeing significant gains as we are increasing the number of threads 
> from 1 up to 4. Beyond that we only observed marginal  improvements -- which 
> makes sense because the test VM has 4 cores.
>
> -Original Message-
> From: Aigner, Max [mailto:max.aig...@nordstrom.com]
> Sent: Thursday, December 10, 2015 12:33 PM
> To: solr-user@lucene.apache.org
> Subject: RE: JSON facets and excluded queries
>
> Another question popped up around this:
> Is the facet.threads parameter still relevant with Json facets? I saw that 
> the facet prefix bug https://issues.apache.org/jira/browse/SOLR-6686 got 
> fixed in  5.3 so I'm looking into re-enabling this parameter for our searches.
>
> On a side note, I've been testing Json facet performance and I've observed 
> that they're generally  faster unless facet prefix filtering comes into play, 
> then they seem to be slower than standard facets.
> Is that just a fluke or should I switch to Json Query Facets instead of using 
> facet prefix filtering?
>
> Thanks again,
> Max
>
> -Original Message-
> From: Aigner, Max [mailto:max.aig...@nordstrom.com]
> Sent: Wednesday, November 25, 2015 11:54 AM
> To: solr-user@lucene.apache.org
> Subject: RE: JSON facets and excluded queries
>
> Yes, just tried that and it works fine.
>
> That just removed a showstopper for me as my queries contain lots of tagged 
> FQs and multi-select facets (implemented the 'good way' :).
>
> Thank you for the quick help!
>
> -Original Message-
> From: Yonik Seeley [mailto:ysee...@gmail.com]
> Sent: Wednesday, November 25, 2015 11:38 AM
> To: solr-user@lucene.apache.org
> Subject: Re: JSON facets and excluded queries
>
> On Wed, Nov 25, 2015 at 2:29 PM, Yonik Seeley  wrote:
>> On Wed, Nov 25, 2015 at 2:15 PM, Aigner, Max  
>> wrote:
>>> Thanks, this is great :=))
>>>
>>> I hadn't seen the domain:{excludeTags:...} syntax yet and it doesn't seem 
>>> to be working on 5.3.1 so I'm assuming this is work slated for 5.4 or 6. 
>>> Did I get that right?
>>
>> Hmmm, the "domain" keyword was added for 5.3 along with block join
>> faceting: http://yonik.com/solr-nested-objects/
>> That's when I switched "excludeTags" to also be under the "domain" keyword.
>>
>> Let me try it out...
>
> Ah, I messed up that migration...
> OK, for now, instead of
>   domain:{excludeTags:foo}
> just use
>   excludeTags:foo
> and it should work.
>
> -Yonik


Re: API accessible without authentication even though Basic Auth Plugin is enabled

2015-12-11 Thread Chris Hostetter

Ugh ... not sure WTF is going on here, but thanks for reporting it with 
clear steps to reproduce...

https://issues.apache.org/jira/browse/SOLR-8408

: Date: Fri, 11 Dec 2015 20:43:46 +0100
: From: Kristine Jetzke 
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: API accessible without authentication even though Basic Auth Plugin
: is enabled
: 
: Hi,
: 
: I noticed that it is possible to access the API even if the Basic Auth plugin 
is enabled. Is that a known issue/done on purpose? I didn’t find anything in 
JIRA or the docs.
: 
: What I did:
: - Started zookeeper on port 2181 and uploaded security.json from 
https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin 

: - Started Solr cluster using cloud example: bin/solr start -e cloud -c -z 
localhost:2181
: - Executed the following commands:
: - curl -u solr:SolrRocks 
'http://localhost:8983/solr/gettingstarted_shard1_replica1/select?q=*%3A*&wt=json&indent=true':
 Returns 200 as expected
: - curl -u solr:wrongPassword 
'http://localhost:8983/solr/gettingstarted_shard1_replica1/select?q=*%3A*&wt=json&indent=true':
 Returns 401 as expected
: - curl 
'http://localhost:8983/solr/gettingstarted_shard1_replica1/select?q=*%3A*&wt=json&indent=true':
 Returns 200 even though no Authorization header is set.
: 
: I don’t understand why the last part works like it does. If I don’t give 
credentials, I would expect that the behavior is the same as with invalid 
credentials. Is there a special reason why it behaves like this? I’m wondering 
because I’m working on a custom authentication plugin and was looking into the 
existing ones to understand how they work.
: 
: Thanks,
: 
: tine

-Hoss
http://www.lucidworks.com/

Re: Block Join query

2015-12-11 Thread Novin

No worries, I was just wondering what I had missed.  And thanks for the blog link.

On 11/12/2015 18:52, Mikhail Khludnev wrote:

Novin,

I regret this so much. It's my pet peeve in Solr query parsing. Handling a space
depends on the first symbol of the query string.
This will work (starts with '{!'):
q={!parent which="doctype:200"}flow:[624 TO 700]
These won't, due to the leading " " or "+":
q= {!parent which="doctype:200"}flow:[624 TO 700]
q=+{!parent which="doctype:200"}flow:[624 TO 700]
Subordinate clauses with spaces are better handled with "Nested Queries" or
so, check the post
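For example (the parameter name rangeq below is arbitrary):

q={!parent which="doctype:200" v=$rangeq}&rangeq=flow:[624 TO 700]

The range clause travels as a separate request parameter, so its spaces never hit
the main query parser.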



On Fri, Dec 11, 2015 at 6:31 PM, Novin  wrote:


Hi Guys,

I'm trying  block join query, so I have tried   +{!parent
which="doctype:200"}flow:624 worked fine. But when i tried +{!parent
which="doctype:200"}flow:[624 TO 700]

Got the below error

org.apache.solr.search.SyntaxError: Cannot parse 'flow_l:[624':
Encountered \"\" at line 1, column 11.\nWas expecting one of:\n
\"TO\" ...\n ...\n  ...\n

Just wondering too, can we do a range query inside a block join query?

Thanks,
Novin











In Solr 5.3.0, how to load customized analyzer jar file ?

2015-12-11 Thread Mingzhu Gao
Hi All ,

I switched from Solr 4.x to Solr 5.3.0.

I am creating a core and running it in standalone mode, not cloud mode.
I want to know how to load external jar files, for example my 
customized analyzer or filter.
I add a <lib> directive in solrconfig.xml, for example:





However, it doesn't work; it still complains "Cannot load 
analyzer".

It was okay to load them this way in Solr 4.10.4; however, in Solr 5.3.0
it seems the way jar files are loaded has changed.


Can anybody help me with this? Thanks in advance.


Thanks,

-Judy


Re: In Solr 5.3.0, how to load customized analyzer jar file ?

2015-12-11 Thread Ahmet Arslan
Hi,

Apparently the best thing to do is to create a lib directory under the Solr home 
directory.

Jars in this directory are loaded automatically. No solrconfig.xml entry is needed.
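For example, assuming the Solr home is /var/solr/data (i.e. the directory that
contains solr.xml; paths and jar name below are placeholders):

mkdir -p /var/solr/data/lib
cp my-custom-analyzer.jar /var/solr/data/lib/
bin/solr restart -p 8983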

thanks,
Ahmet



On Saturday, December 12, 2015 2:05 AM, Mingzhu Gao  wrote:
Hi All ,

I switched from solr 4.x version to solr 5.3.0 .

And I am creating a core and run it as standalone mode , not cloud mode .
I want to know , how to load those external jar file , for example , my 
customized analyzer or filter ?
I add a  in solrconfig.xml , for example :





However , it looks that it doesn't work , it still complain "Cannot load 
analyzer" .

It's okay for me to load them ins solr 4.10.4 , however , in solr 5.3.0 ,

It seems that it changed the way to load jar files .


Can anybody help me on this ?  Thanks in advance .


Thanks,

-Judy 


RE: JSON facets and excluded queries

2015-12-11 Thread Aigner, Max
Good to know, thank you. 

From an implementation standpoint that makes a lot of sense. 
We are only using facets of type 'term' for now and for those it works nicely. 
Our usual searches carry around 8-12 facets so we are covered from that side 
:-) 

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Friday, December 11, 2015 3:12 PM
To: solr-user 
Subject: Re: JSON facets and excluded queries

Do note that the number of threads also won't help much last I knew unless you 
are faceting over that many fields too. I.e. setting this to 5 while faceting 
on only 1 field won't help.

And it's not implemented for all facet types IIRC.

Best,
Erick

On Fri, Dec 11, 2015 at 1:07 PM, Aigner, Max  wrote:
> Answering one question myself after doing some testing on 5.3.1:
>
> Yes, facet.threads is still relevant with Json facets.
>
> We are seeing significant gains as we are increasing the number of threads 
> from 1 up to 4. Beyond that we only observed marginal  improvements -- which 
> makes sense because the test VM has 4 cores.
>
> -Original Message-
> From: Aigner, Max [mailto:max.aig...@nordstrom.com]
> Sent: Thursday, December 10, 2015 12:33 PM
> To: solr-user@lucene.apache.org
> Subject: RE: JSON facets and excluded queries
>
> Another question popped up around this:
> Is the facet.threads parameter still relevant with Json facets? I saw that 
> the facet prefix bug https://issues.apache.org/jira/browse/SOLR-6686 got 
> fixed in  5.3 so I'm looking into re-enabling this parameter for our searches.
>
> On a side note, I've been testing Json facet performance and I've observed 
> that they're generally  faster unless facet prefix filtering comes into play, 
> then they seem to be slower than standard facets.
> Is that just a fluke or should I switch to Json Query Facets instead of using 
> facet prefix filtering?
>
> Thanks again,
> Max
>
> -Original Message-
> From: Aigner, Max [mailto:max.aig...@nordstrom.com]
> Sent: Wednesday, November 25, 2015 11:54 AM
> To: solr-user@lucene.apache.org
> Subject: RE: JSON facets and excluded queries
>
> Yes, just tried that and it works fine.
>
> That just removed a showstopper for me as my queries contain lots of tagged 
> FQs and multi-select facets (implemented the 'good way' :).
>
> Thank you for the quick help!
>
> -Original Message-
> From: Yonik Seeley [mailto:ysee...@gmail.com]
> Sent: Wednesday, November 25, 2015 11:38 AM
> To: solr-user@lucene.apache.org
> Subject: Re: JSON facets and excluded queries
>
> On Wed, Nov 25, 2015 at 2:29 PM, Yonik Seeley  wrote:
>> On Wed, Nov 25, 2015 at 2:15 PM, Aigner, Max  
>> wrote:
>>> Thanks, this is great :=))
>>>
>>> I hadn't seen the domain:{excludeTags:...} syntax yet and it doesn't seem 
>>> to be working on 5.3.1 so I'm assuming this is work slated for 5.4 or 6. 
>>> Did I get that right?
>>
>> Hmmm, the "domain" keyword was added for 5.3 along with block join
>> faceting: http://yonik.com/solr-nested-objects/
>> That's when I switched "excludeTags" to also be under the "domain" keyword.
>>
>> Let me try it out...
>
> Ah, I messed up that migration...
> OK, for now, instead of
>   domain:{excludeTags:foo}
> just use
>   excludeTags:foo
> and it should work.
>
> -Yonik


Re: SolrCloud 4.8.1 - commit wait

2015-12-11 Thread Vincenzo D'Amore
Thanks Erick, Mark,

I'll raise maxTime asap.
Just to be sure I understand: given that I have openSearcher=false, I suppose
it shouldn't trigger autowarming, at least until a commit is executed,
should it?

Anyway, given that maxTime is very aggressive, I don't understand why the hard
commit takes so long.

Thanks again for your answers.
Vincenzo


On Fri, Dec 11, 2015 at 7:22 PM, Erick Erickson 
wrote:

> First of all, your autocommit settings are _very_ aggressive. Committing
> every second is far too frequent IMO.
>
> As an aside, I generally prefer to omit the maxDocs as it's not all
> that predictable,
> but that's a personal preference and really doesn't bear on your problem..
>
> My _guess_ is that you are doing a lot of autowarming. The number of docs
> doesn't really matter if your autowarming is taking forever, your Solr logs
> should report the autowarm times at INFO level, have you checked those?
>
> The commit settings shouldn't be a problem in terms of your server dying,
> the indexing process flushes docs to the tlog independent of committing so
> upon restart they should be recovered. Here's a blog on the subject:
>
>
> https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> Best,
> Erick
>
> On Fri, Dec 11, 2015 at 8:24 AM, Vincenzo D'Amore 
> wrote:
> > Hi all,
> >
> > I have a SolrCloud cluster with a collection (2.5M docs) with 3 shards
> and
> > 15 replicas.
> > There is a solrj application that feeds the collection, updating few
> > documents every hour, I don't understand why, at end of process, the hard
> > commit takes about 8/10 minutes.
> >
> > Even if there are only few hundreds of documents.
> >
> > This is the autocommit configuration:
> >
> > 
> > 1
> > 1000
> > false
> > 
> >
> > In your experience why hard commit takes so long even for so few
> documents?
> >
> > Now I'm changing the code to softcommit, calling commit (waitFlush =
> > false, waitSearcher
> > = false, softCommit = true);
> >
> > solrServer.commit(false, false, true);.
> >
> > I have configured NRTCachingDirectoryFactory, but I'm a little bit
> worried
> > if a server goes down (something like: kill -9, SolrCloud crashes, out of
> > memory, etc.), and if, using this strategy
> softcommit+NRTCachingDirectory,
> > SolrCloud instance could not recover a replica.
> >
> > Should I worry about this new configuration? I was thinking to take a
> > snapshot of everything every day, in order to recover immediately the
> > index. Could this be considered a best practice?
> >
> > Thanks in advance for your time,
> > Vincenzo
> >
> > --
> > Vincenzo D'Amore
> > email: v.dam...@gmail.com
> > skype: free.dev
> > mobile: +39 349 8513251
>



-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: API accessible without authentication even though Basic Auth Plugin is enabled

2015-12-11 Thread Noble Paul
It works as designed.

Protect the read path using the following command
curl  http://localhost:8983/solr/admin/authorization -H
'Content-type:application/json' -d '{ set-permission : {name : read,
role : admin}}'
Then, you will have the right experience

In this case /select is not protected, so an unauthenticated request
must be able to access the /select path. The authentication layer has no idea
whether a resource is protected or not. So, when no credentials
headers are sent, it sets the user principal to null and lets the
request go through. Whereas in the case of wrong credentials, the
choices are 1) fail the request or 2) forward the request as if the
principal were null. #2 would be a bad user experience because the
authorization layer would say the principal is null (unauthenticated) and
the user would not know that the credentials were wrong.

On Sat, Dec 12, 2015 at 5:14 AM, Chris Hostetter
 wrote:
>
> Ugh ... no sure WTF is going on here, but that's for reporting it with
> clear steps to reproduce...
>
> https://issues.apache.org/jira/browse/SOLR-8408
>
> : Date: Fri, 11 Dec 2015 20:43:46 +0100
> : From: Kristine Jetzke 
> : Reply-To: solr-user@lucene.apache.org
> : To: solr-user@lucene.apache.org
> : Subject: API accessible without authentication even though Basic Auth Plugin
> : is enabled
> :
> : Hi,
> :
> : I noticed that it is possible to access the API even if the Basic Auth 
> plugin is enabled. Is that a known issue/done on purpose? I didn’t find 
> anything in JIRA or the docs.
> :
> : What I did:
> : - Started zookeeper on port 2181 and uploaded security.json from 
> https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin 
> 
> : - Started Solr cluster using cloud example: bin/solr start -e cloud -c -z 
> localhost:2181
> : - Executed the following commands:
> : - curl -u solr:SolrRocks 
> 'http://localhost:8983/solr/gettingstarted_shard1_replica1/select?q=*%3A*&wt=json&indent=true':
>  Returns 200 as expected
> : - curl -u solr:wrongPassword 
> 'http://localhost:8983/solr/gettingstarted_shard1_replica1/select?q=*%3A*&wt=json&indent=true':
>  Returns 401 as expected
> : - curl 
> 'http://localhost:8983/solr/gettingstarted_shard1_replica1/select?q=*%3A*&wt=json&indent=true':
>  Returns 200 even though no Authorization header is set.
> :
> : I don’t understand why the last part works like it does. If I don’t give 
> credentials, I would expect that the behavior is the same as with invalid 
> credentials. Is there a special reason why it behaves like this? I’m 
> wondering because I’m working on a custom authentication plugin and was 
> looking into the existing ones to understand how they work.
> :
> : Thanks,
> :
> : tine
>
> -Hoss
> http://www.lucidworks.com/



-- 
-
Noble Paul


Re: Authorization API versus zkcli.sh

2015-12-11 Thread Noble Paul
Oakley,

1) Ideally you should only upload the initial empty security.json. In
that case there is no need to add the version attributes. Thereafter
you are supposed to use the API.
2) If you do need to upload security.json again, please remove
that attribute.

shalin:
The version is added inside the json because we have no idea
whether an edit caused authentication or authorization to be reloaded.
By adding separate versions I'm able to make that distinction.

On Fri, Dec 11, 2015 at 9:02 PM, Oakley, Craig (NIH/NLM/NCBI) [C]
 wrote:
> So, when one has finished constructing the desired security.json (by means of 
> Authentication and Authorization commands) and then run "zkcli.sh -cmd 
> getfile" to get this security.json in order for it to be used as a template: 
> one should edit the template to remove this "":{"v":85} clause (and the comma 
> which precedes it): correct?
>
> I notice that the documented minimal security.json which simply creates the 
> solr:SolrRocks login:pswd does not have such a clause: so I assume that the 
> lack of such a clause is not an error.
>
> 
> From: Anshum Gupta [ans...@anshumgupta.net]
> Sent: Friday, December 11, 2015 9:48 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Authorization API versus zkcli.sh
>
> yes, that's the assumption. The reason why there's a version there is to
> optimize on reloads i.e. Authentication and Authorization plugins are
> reloaded only when the version number is changed. e.g.
> * Start with Ver 1 for both authentication and authorization
> * Make changes to Authentication, the version for this section is updated
> to the znode version, while the version for the authorization section is
> not changed. This forces the authentication plugin to be reloaded but not
> the authorization plugin. Similarly for authorization.
>
> It's a way to optimize the reloads without splitting the definition into 2
> znodes, which is also an option.
>
>
> On Fri, Dec 11, 2015 at 8:06 PM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
>> Shouldn't this be the znode version? Why put a version in
>> security.json? Or is the idea that the user will upload security.json
>> only once and then use the security APIs for all further changes?
>>
>> On Fri, Dec 11, 2015 at 11:51 AM, Noble Paul  wrote:
>> > Please do not put any number. That number is used by the system to
>> > optimize loading/reloading plugins. It is not relevant for the user.
>> >
>> > On Thu, Dec 10, 2015 at 11:52 PM, Oakley, Craig (NIH/NLM/NCBI) [C]
>> >  wrote:
>> >> Looking at security.json in Zookeeper, I notice that both the
>> authentication section and the authorization section ends with something
>> like
>> >>
>> >> "":{"v":47}},
>> >>
>> >> Am I correct in thinking that this 47 (in this case) is a version
>> number, and that ANY number could be used in the file uploaded to
>> security.json using "zkcli.sh -putfile"?
>> >>
>> >> Or is this some sort of checksum whose value must match some unclear
>> criteria?
>> >>
>> >>
>> >> -Original Message-
>> >> From: Anshum Gupta [mailto:ans...@anshumgupta.net]
>> >> Sent: Sunday, December 06, 2015 8:42 AM
>> >> To: solr-user@lucene.apache.org
>> >> Subject: Re: Authorization API versus zkcli.sh
>> >>
>> >> There's nothing cluster specific in security.json if you're using those
>> >> plugins. It is totally safe to just take the file from one cluster and
>> >> upload it for another for things to work.
>> >>
>> >> On Sat, Dec 5, 2015 at 3:38 AM, Oakley, Craig (NIH/NLM/NCBI) [C] <
>> >> craig.oak...@nih.gov> wrote:
>> >>
>> >>> Looking through
>> >>>
>> cwiki.apache.org/confluence/display/solr/Authentication+and+Authorization+Plugins
>> >>> one notices that security.json is initially created by zkcli.sh, and
>> then
>> >>> modified by means of the Authentication API and the Authorization API.
>> By
>> >>> and large, this sounds like a good way to accomplish such tasks,
>> assuming
>> >>> that these APIs do some error checking to prevent corruption of
>> >>> security.json
>> >>>
>> >>> I was wondering about cases where one is cloning an existing Solr
>> >>> instance, such as when creating an instance in Amazon Cloud. If one
>> has a
>> >>> security.json that has been thoroughly tried and successfully tested on
>> >>> another Solr instance, is it possible / safe / not-un-recommended to
>> use
>> >>> zkcli.sh to load the full security.json (as extracted via zkcli.sh
>> from the
>> >>> Zookeeper of the thoroughly tested existing instance)? Or would the
>> >>> official verdict be that the only acceptable way to create
>> security.json is
>> >>> to load a minimal version with zkcli.sh and then to build the remaining
>> >>> components with the Authentication API and the Authorization API (in a
>> >>> script, if one wants to automate the process: although such a script
>> would
>> >>> have to include plain-text passwords)?
>> >>>
>> >>> I figured there is no harm in asking.
>> >>>
>> >>
>> >>
>> >>
>

Re: how to secure standalone solr

2015-12-11 Thread Noble Paul
For standalone Solr, Kerberos is the only option for authentication.
If you have a SolrCloud setup, you have other options:

https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin
https://cwiki.apache.org/confluence/display/solr/Rule-Based+Authorization+Plugin

On Fri, Dec 11, 2015 at 11:02 PM, Don Bosco Durai  wrote:
>>Anyone told me how to secure standalone solr .
> Recently there were a few discussions on this. In short, it is not tested and 
> there doesn’t seem to be a plan to test it.
>
>>1.)using Kerberos Plugin is a good practice or any other else.
> The answer depends on how you are using it: where you are deploying it, who is 
> accessing it, whether you want to restrict by access type (read/write), what 
> authentication environment (LDAP/AD, Kerberos, etc) you already have.
>
> Depending upon your use cases and environment, you may have one or more 
> options.
>
> Bosco
>
>
>
>
>
>
> On 12/11/15, 4:27 AM, "Mugeesh Husain"  wrote:
>
>>Hello,
>>
>>Anyone told me how to secure standalone solr .
>>
>>1.)using Kerberos Plugin is a good practice or any other else.
>>
>>
>>
>>--
>>View this message in context: 
>>http://lucene.472066.n3.nabble.com/how-to-secure-standalone-solr-tp4244866.html
>>Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
-
Noble Paul


Re: In Solr 5.3.0, how to load customized analyzer jar file ?

2015-12-11 Thread Mingzhu Gao
Thanks Ahmet,

You mean the solr home directory, the root of solr, or {solr_root}/bin? I tried
both; it doesn't work.

Does anybody have any other idea?

Thanks,
-Judy

On 12/11/15, 4:10 PM, "Ahmet Arslan"  wrote:

>Hi,
>
>Apparently best way thing to do is create lib directory under the solr
>home directory.
>
>Jars in this directory loaded automatically. No need a solrconfig.xml
>entry.
>
>thanks,
>Ahmet
>
>
>
>On Saturday, December 12, 2015 2:05 AM, Mingzhu Gao 
>wrote:
>Hi All ,
>
>I switched from solr 4.x version to solr 5.3.0 .
>
>And I am creating a core and run it as standalone mode , not cloud mode .
>I want to know , how to load those external jar file , for example , my
>customized analyzer or filter ?
>I add a  in solrconfig.xml , for example :
>
>
>
>
>
>However , it looks that it doesn't work , it still complain "Cannot load
>analyzer" .
>
>It's okay for me to load them ins solr 4.10.4 , however , in solr 5.3.0 ,
>
>It seems that it changed the way to load jar files .
>
>
>Can anybody help me on this ?  Thanks in advance .
>
>
>Thanks,
>
>-Judy 



Re: SolrCloud 4.8.1 - commit wait

2015-12-11 Thread Vincenzo D'Amore
Hi All,

an update: I have switched logging from WARN to INFO for everything except
these two:

- org.apache.solr.core
- org.apache.solr.handler.component.SpellCheckComponent

Well, looking at the log file I'm unable to find any autowarm log line, even
after a few updates and commits.

Looking at solrconfig.xml I see most autowarmCount parameters are set to 0
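For instance, entries of this shape (the values here are only illustrative, not
copied from that file):

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>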






Not sure what this means...

On Sat, Dec 12, 2015 at 1:13 AM, Vincenzo D'Amore 
wrote:

> Thanks Erick, Mark,
>
> I'll raise maxTime asap.
> Just to be sure understand, given that I have openSearcher=false, I
> suppose it shouldn't trigger autowarming at least until a commit is
> executed, shouldn't it?
>
> Anyway, I don't understand, given that maxTime is very aggressive, why
> hard commit takes so long.
>
> Thanks again for your answers.
> Vincenzo
>
>
> On Fri, Dec 11, 2015 at 7:22 PM, Erick Erickson 
> wrote:
>
>> First of all, your autocommit settings are _very_ aggressive. Committing
>> every second is far too frequent IMO.
>>
>> As an aside, I generally prefer to omit the maxDocs as it's not all
>> that predictable,
>> but that's a personal preference and really doesn't bear on your problem..
>>
>> My _guess_ is that you are doing a lot of autowarming. The number of docs
>> doesn't really matter if your autowarming is taking forever, your Solr
>> logs
>> should report the autowarm times at INFO level, have you checked those?
>>
>> The commit settings shouldn't be a problem in terms of your server dying,
>> the indexing process flushes docs to the tlog independent of committing so
>> upon restart they should be recovered. Here's a blog on the subject:
>>
>>
>> https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>>
>> Best,
>> Erick
>>
>> On Fri, Dec 11, 2015 at 8:24 AM, Vincenzo D'Amore 
>> wrote:
>> > Hi all,
>> >
>> > I have a SolrCloud cluster with a collection (2.5M docs) with 3 shards
>> and
>> > 15 replicas.
>> > There is a solrj application that feeds the collection, updating few
>> > documents every hour, I don't understand why, at end of process, the
>> hard
>> > commit takes about 8/10 minutes.
>> >
>> > Even if there are only few hundreds of documents.
>> >
>> > This is the autocommit configuration:
>> >
>> > 
>> > 1
>> > 1000
>> > false
>> > 
>> >
>> > In your experience why hard commit takes so long even for so few
>> documents?
>> >
>> > Now I'm changing the code to softcommit, calling commit (waitFlush =
>> > false, waitSearcher
>> > = false, softCommit = true);
>> >
>> > solrServer.commit(false, false, true);.
>> >
>> > I have configured NRTCachingDirectoryFactory, but I'm a little bit
>> worried
>> > if a server goes down (something like: kill -9, SolrCloud crashes, out
>> of
>> > memory, etc.), and if, using this strategy
>> softcommit+NRTCachingDirectory,
>> > SolrCloud instance could not recover a replica.
>> >
>> > Should I worry about this new configuration? I was thinking to take a
>> > snapshot of everything every day, in order to recover immediately the
>> > index. Could this be considered a best practice?
>> >
>> > Thanks in advance for your time,
>> > Vincenzo
>> >
>> > --
>> > Vincenzo D'Amore
>> > email: v.dam...@gmail.com
>> > skype: free.dev
>> > mobile: +39 349 8513251
>>
>
>
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251
>



-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: Solrcloud 4.8.1 - Solr cores reload

2015-12-11 Thread Vincenzo D'Amore
Thanks for your suggestion Erick, I'm changing the code and I'll use the
Collections API RELOAD.
I have done a few tests changing the synonyms dictionary or solrconfig and
everything works fine.

Well, I think you already know this, but looking at the solr.log file after the
Collections API RELOAD call, I have seen a bunch of lines like this one:

- Collection Admin sending CoreAdmin cmd to http://192.168.101.118:8080/solr
params:action=RELOAD&core=collection1_shard1_replica1&qt=%2Fadmin%2Fcores
...
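For the record, the same reload can also be issued directly, e.g. (host, port and
collection name as appropriate for your cluster):

curl 'http://localhost:8080/solr/admin/collections?action=RELOAD&name=collection1'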

Best regards and thanks again,
Vincenzo


On Fri, Dec 11, 2015 at 7:38 PM, Erick Erickson 
wrote:

> You should absolutely always use the Collection API rather than
> any core admin API if at all possible. If for no other reason
> than your client will be _lots_ simpler (i.e. you don't have
> to find all the replicas and issue the core admin RELOAD
> command for each one).
>
> I'm not entirely sure whether the RELOAD command is
> synchronous or not though.
>
> Best,
> erick
>
> On Fri, Dec 11, 2015 at 8:22 AM, Vincenzo D'Amore 
> wrote:
> > Hi all,
> >
> > in day by day work, often I need to change the solr configurations files.
> > Often adding new synonyms, changing the schema or the solrconfig.xml.
> >
> > Everything is stored in zookeeper.
> >
> > But I have inherited a piece of code that, after every change, reload all
> > the cores using CoreAdmin API.
> >
> > Now I have 15 replicas in the collection, and after every core reload the
> > code waits for 60 seconds (I suppose it's because who wrote the code was
> > worried about the cache invalidation).
> >
> > Given that, it takes about 25 minutes to update all the cores. Obviously
> > during this time we cannot modify the collection.
> >
> > The question is, to reduce this wait, if I use the collection API RELOAD,
> > what are the counter indication?
> >
> > Thanks in advance for your time,
> > Vincenzo
> >
> >
> > --
> > Vincenzo D'Amore
> > email: v.dam...@gmail.com
> > skype: free.dev
> > mobile: +39 349 8513251
>



-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: In Solr 5.3.0, how to load customized analyzer jar file ?

2015-12-11 Thread Ahmet Arslan
Hi Judy,

It is where solr.xml file resides.

Ahmet



On Saturday, December 12, 2015 3:30 AM, Mingzhu Gao  wrote:
Thanks Ahmet , 

You mean solr home directory , root of solr  or {solr_root}/bin ?  I try
both , it doesn't work .

Can anybody has any other idea ?

Thanks,
-Judy


On 12/11/15, 4:10 PM, "Ahmet Arslan"  wrote:

>Hi,
>
>Apparently best way thing to do is create lib directory under the solr
>home directory.
>
>Jars in this directory loaded automatically. No need a solrconfig.xml
>entry.
>
>thanks,
>Ahmet
>
>
>
>On Saturday, December 12, 2015 2:05 AM, Mingzhu Gao 
>wrote:
>Hi All ,
>
>I switched from solr 4.x version to solr 5.3.0 .
>
>And I am creating a core and run it as standalone mode , not cloud mode .
>I want to know , how to load those external jar file , for example , my
>customized analyzer or filter ?
>I add a  in solrconfig.xml , for example :
>
>
>
>
>
>However , it looks that it doesn't work , it still complain "Cannot load
>analyzer" .
>
>It's okay for me to load them ins solr 4.10.4 , however , in solr 5.3.0 ,
>
>It seems that it changed the way to load jar files .
>
>
>Can anybody help me on this ?  Thanks in advance .
>
>
>Thanks,
>
>-Judy 


Getting a document version back after updating

2015-12-11 Thread Debraj Manna
Is there a way I can get the version of a document back in response after
adding or updating the document via Solrj 5.2.1?
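One approach that should work with SolrJ 5.x is a real-time get right after the
add, reading the _version_ field back. This is only a sketch: it assumes the stock
/get handler and _version_ field from the default configs, and the URL, collection
name and id are placeholders.

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class VersionAfterUpdate {
    public static void main(String[] args) throws Exception {
        // base URL and collection name are placeholders
        try (SolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection")) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");
            client.add(doc);
            // real-time get reads from the update log, so no commit is needed first
            SolrDocument latest = client.getById("doc-1");
            System.out.println("_version_ = " + latest.getFieldValue("_version_"));
        }
    }
}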


Re: In Solr 5.3.0, how to load customized analyzer jar file ?

2015-12-11 Thread Binoy Dalal
I am using Solr 5.3.1 and used the lib directive in solrconfig to load one
of my search components and that worked fine.
So, I don't think that anything has changed here.
What is the stacktrace of the errors you get?

On Sat, Dec 12, 2015 at 9:32 AM Ahmet Arslan 
wrote:

> Hi Judy,
>
> It is where solr.xml file resides.
>
> Ahmet
>
>
>
> On Saturday, December 12, 2015 3:30 AM, Mingzhu Gao 
> wrote:
> Thanks Ahmet ,
>
> You mean solr home directory , root of solr  or {solr_root}/bin ?  I try
> both , it doesn't work .
>
> Can anybody has any other idea ?
>
> Thanks,
> -Judy
>
>
> On 12/11/15, 4:10 PM, "Ahmet Arslan"  wrote:
>
> >Hi,
> >
> >Apparently best way thing to do is create lib directory under the solr
> >home directory.
> >
> >Jars in this directory loaded automatically. No need a solrconfig.xml
> >entry.
> >
> >thanks,
> >Ahmet
> >
> >
> >
> >On Saturday, December 12, 2015 2:05 AM, Mingzhu Gao 
> >wrote:
> >Hi All ,
> >
> >I switched from solr 4.x version to solr 5.3.0 .
> >
> >And I am creating a core and run it as standalone mode , not cloud mode .
> >I want to know , how to load those external jar file , for example , my
> >customized analyzer or filter ?
> >I add a  in solrconfig.xml , for example :
> >
> >
> >
> >
> >
> >However , it looks that it doesn't work , it still complain "Cannot load
> >analyzer" .
> >
> >It's okay for me to load them ins solr 4.10.4 , however , in solr 5.3.0 ,
> >
> >It seems that it changed the way to load jar files .
> >
> >
> >Can anybody help me on this ?  Thanks in advance .
> >
> >
> >Thanks,
> >
> >-Judy
>
-- 
Regards,
Binoy Dalal