Solr Stale pages

2018-08-30 Thread kunhu0...@gmail.com
Hello All,

I would like to know how Solr handles stale pages. For example, 30 documents
are indexed for the domain abc.com, and in the second collection I have only
27 documents for the same abc.com domain that need to be indexed in Solr.
So how will Solr handle the old pages already indexed? Will it delete the
stale pages on every new collection update?
Thank you
Thank you







Need Help on Solr Client connection Pooling

2018-08-30 Thread Gembali Satish kumar
Hi Team,

I need some help with client connection object pooling.
I am using the SolrJ API to connect to Solr.

I use the snippet below to create the client object:

SolrClient client = new HttpSolrClient.Builder(
    SolrUtil.getSolrURL(tsConfigUtil.getClusterAdvertisedAddress(),
    aInCollectionName)).build();

After my search job is done, I close my client: client.close();

But the UI sends more requests to search the data, and I think creating the
client object on every request is costly. Is there any way to pool the
SolrClient objects? If so, kindly share a reference.

Thanks and Regards,
Satish


Re: Need Help on Solr Client connection Pooling

2018-08-30 Thread Shalin Shekhar Mangar
You should create a single HttpSolrClient and re-use it for all requests. It
is thread-safe and creates an HTTP connection pool internally (well, Apache
HttpClient does).
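
A minimal sketch of that pattern (the holder class and URL below are
illustrative, not part of SolrJ):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

// Build one client at application startup and share it across threads.
public class SolrClientHolder {
    private static final SolrClient CLIENT =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build();

    public static SolrClient get() {
        return CLIENT;
    }
}

Each request handler then calls SolrClientHolder.get() instead of building a
new client, and the client is closed once when the application shuts down.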

On Thu, Aug 30, 2018 at 2:28 PM Gembali Satish kumar <
gembalisatishku...@gmail.com> wrote:

> Hi Team,
>
> I need some help with client connection object pooling.
> I am using the SolrJ API to connect to Solr.
>
> I use the snippet below to create the client object:
>
> SolrClient client = new HttpSolrClient.Builder(
>     SolrUtil.getSolrURL(tsConfigUtil.getClusterAdvertisedAddress(),
>     aInCollectionName)).build();
>
> After my search job is done, I close my client: client.close();
>
> But the UI sends more requests to search the data, and I think creating the
> client object on every request is costly. Is there any way to pool the
> SolrClient objects? If so, kindly share a reference.
>
> Thanks and Regards,
> Satish
>


-- 
Regards,
Shalin Shekhar Mangar.


Re: SolrCore Initialization Failure Error loading class 'solr.IntField'

2018-08-30 Thread Salvo Bonanno
The Solr version in both environments is 7.4.0.

It looks like there was a problem using the IntPointField type for a key
field in my schema; I've changed the type to string and now everything
works.

Thanks everyone for the replies.
On Wed, Aug 29, 2018 at 9:39 PM Shawn Heisey  wrote:
>
> On 8/29/2018 1:27 AM, Salvo Bonanno wrote:
> > [error]
> > corename: 
> > org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> > Could not load conf for core corename: Can't load schema
> > /opt/solr/server/solr/corename/conf/managed-schema:  Plugin init
> > failure for [schema.xml] fieldType "long": Error loading class
> > 'solr.IntPointField'
>
> As Erick mentioned, this could be caused by running a version of Solr
> that doesn't have Point field classes.  I probably wouldn't try to use
> Points unless I were on the latest 6.x release (6.6.5 right now) or a
> 7.x release.  If the version isn't the problem, read on.
>
> What might be happening here is that the Solr jars (the ones that are
> part of Solr itself, like solr-core-7.4.0.jar) are on your classpath
> more than once.  They could even be the exact same version as the Solr
> version you're running, but if they are on the classpath more than once,
> it's very confusing to Java, and interferes with attempts to load classes.
>
> Thanks,
> Shawn
>


Re: Solr Stale pages

2018-08-30 Thread Jan Høydahl
Hi

Please give us more context. You can start by telling us which crawler you
are using and more about your architecture.
It is NOT Solr's responsibility to add/delete documents on its own; it is the
client (crawler) that has to know when a document is stale or gone from the
source, and the crawler then needs to explicitly send a delete request for
that doc.
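
For example, with the JSON update endpoint a crawler-side delete could look
like this (URL, collection name and IDs are made up for illustration):

curl -X POST 'http://localhost:8983/solr/mycollection/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '{"delete": ["doc-28", "doc-29", "doc-30"]}'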

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 30 Aug 2018, at 08:48, kunhu0...@gmail.com wrote:
> 
> Hello All,
> 
> I would like to know how Solr handles stale pages. For example, 30 documents
> are indexed for the domain abc.com, and in the second collection I have only
> 27 documents for the same abc.com domain that need to be indexed in Solr.
> So how will Solr handle the old pages already indexed? Will it delete the
> stale pages on every new collection update?
> Thank you



Re: Solr Stale pages

2018-08-30 Thread kunhu0...@gmail.com
Thanks for the update.

I'm using Nutch 1.14, Solr 6.6.3 and ZooKeeper 3.4.12. We are using two Solr
instances configured as SolrCloud. Please let me know if anything is missing.





Re: Solr Stale pages

2018-08-30 Thread Cassandra Targett
As Jan pointed out, unless your client sends Solr some instructions for
what to do with those documents specifically, Solr doesn't do anything.

In your example, Nutch crawls 30 documents at first, and 30 documents are
sent to Solr and added to the index. On the next crawl, it finds 27
documents, and 27 documents are sent to Solr. If these documents have the
same unique keys (IDs) as 27 documents already in the index, the documents in
the index will be updated (someone can correct me on this, but I believe
these documents get reindexed even if the content itself has not changed).

Unless Nutch (or any other client) specifically tells Solr to do something
with the 3 documents that were not sent as part of this second update, Solr
does nothing with regard to those documents. Which makes sense: you don't
want Solr deleting documents just because you didn't happen to update them
with every indexing request.

Solr maintains no record of where a document came from, what client sent it,
or whether subsequent updates from the same client touch the same set of
documents as previous requests from that client. It is up to the client
process itself to keep track of this and to tell Solr what to do in
subsequent update requests. In this case, what you want is for Nutch to send
Solr a delete-by-ID request for those 3 documents so they are removed. I'm
not sure whether Nutch is capable of doing that, however.
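
If Nutch cannot do it, a client-side sketch with SolrJ would be (the URL and
IDs here are hypothetical, and error handling is omitted):

import java.util.Arrays;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

SolrClient client =
    new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build();
// Remove the documents the latest crawl no longer found, then commit.
client.deleteById(Arrays.asList("doc-28", "doc-29", "doc-30"));
client.commit();
client.close();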

On Thu, Aug 30, 2018 at 7:00 AM kunhu0...@gmail.com 
wrote:

> Thanks for the update.
>
> I'm using Nutch 1.14, Solr 6.6.3 and ZooKeeper 3.4.12. We are using two
> Solr instances configured as SolrCloud. Please let me know if anything is
> missing.
>


Split on whitespace parameter doubt

2018-08-30 Thread David Argüello Sánchez
Hi everyone,

I am doing some tests to understand how the split-on-whitespace parameter
works with the eDisMax query parser. I understand the behaviour, but I have a
question about why it works the way it does.

When sow=true, it works as it did in previous Solr versions.
When sow=false, the behaviour changes and all the terms have to be present in
the same field. However, if all queried fields produce the same query
structure, it works as if sow=true. This is the thing I don't fully
understand.
By specifying sow=false I might want to match only those documents containing
all the terms in the same field, but because all queried fields have the same
query structure, I would get back documents containing both terms in any of
the fields.

Does anyone know the reasoning behind this decision?
Thank you in advance.

Regards,
David


Re: cloud disk space utilization

2018-08-30 Thread Kudrettin Güleryüz
Thank you Shalin. I'll try creating a policy with practically zero effect
for now.
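
As a sketch, a near-no-op rule can be posted to the autoscaling API; the
limit here is arbitrary, so pick one your nodes will never actually hit:

POST /solr/admin/autoscaling
{
  "set-cluster-policy": [
    {"replica": "<100", "node": "#ANY"}
  ]
}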

On Wed, Aug 29, 2018 at 11:31 PM Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> There is a bad oversight on our part which causes preferences to not be
> used for placing replicas unless a cluster policy also exists. We hope to
> fix it in the next release (Solr 7.5). See
> https://issues.apache.org/jira/browse/SOLR-12648
>
> You may also be interested in
> https://issues.apache.org/jira/browse/SOLR-12592
>
>
> On Tue, Aug 28, 2018 at 2:47 AM Kudrettin Güleryüz 
> wrote:
>
> > Hi,
> >
> > We have six Solr nodes with ~1TiB disk space on each, mounted as ext4. The
> > indexers sometimes update the collections and create new ones if an update
> > wouldn't be faster than indexing from scratch. (Up to around 5 million
> > documents are indexed for each collection.) On average there are around 130
> > collections on this SolrCloud. Collection sizes vary from 1GiB to 150GiB.
> >
> > Preferences set:
> >
> >   "cluster-preferences":[{
> >   "maximize":"freedisk",
> >   "precision":10}
> > ,{
> >   "minimize":"cores",
> >   "precision":1}
> > ,{
> >   "minimize":"sysLoadAvg",
> >   "precision":3}],
> >
> > * Is it possible to run out of disk space on one of the nodes while the
> > others have plenty? I observe some getting close to ~80% utilization while
> > others stay at ~60%.
> > * Would this difference be due to differences in collection index sizes, or
> > due to an error on my side in coming up with a useful policy/preferences?
> >
> > Thank you
> >
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


join works with a core, doesn't work with a collection

2018-08-30 Thread Steve Pruitt
Is there something different I need to do for a query with a join in a
Collection?  A single Collection, not across Collections.

Initially, I used a Core for simple development.  One of my queries uses a
join.  It works fine.

I created a Collection with the same schema and indexed the same documents,
but the join returns an empty list.
I am running SolrCloud entirely locally, using the Getting Started with
SolrCloud instructions.

The Query is:

q=expctr-type:journey&{!join from=expctr-label-memberIds 
to=expctr-id}expctr-id:4b6f7d34-a58b-3399-b077-685951d06738

The document 4b6f7d34-a58b-3399-b077-685951d06738 has the multivalued field
memberIds. It contains the identifiers of other documents.

Thanks in advance.

-S


Re: Need Help on Solr Client connection Pooling

2018-08-30 Thread Shawn Heisey

On 8/30/2018 2:13 AM, Gembali Satish kumar wrote:

SolrClient client = new HttpSolrClient.Builder(
    SolrUtil.getSolrURL(tsConfigUtil.getClusterAdvertisedAddress(),
    aInCollectionName)).build();

After my search job is done, I close my client: client.close();

But the UI sends more requests to search the data, and I think creating the
client object on every request is costly. Is there any way to pool the
SolrClient objects? If so, kindly share a reference.


Yes, creating the client on every request is costly.

Supplementing what Shalin told you:

Exactly which version of SolrJ you're running can affect how many 
threads can use the object at the same time, unless you explicitly build 
it to handle more.  Newer versions set it up with lots of thread 
capability, but older versions just create the internal HttpClient 
object with defaults.  By default, HttpClient only allows two threads.
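
For an older SolrJ, one way around that default is to hand the builder a
pre-configured Apache HttpClient (a sketch; the limits are arbitrary):

import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

CloseableHttpClient httpClient = HttpClients.custom()
    .setMaxConnTotal(100)      // total connections across all routes
    .setMaxConnPerRoute(100)   // connections to a single Solr host
    .build();
SolrClient client =
    new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection")
        .withHttpClient(httpClient)
        .build();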


What version of SolrJ are you using?

Thanks,
Shawn



Re: SolrCore Initialization Failure Error loading class 'solr.IntField'

2018-08-30 Thread Shawn Heisey

On 8/30/2018 3:14 AM, Salvo Bonanno wrote:

The Solr version in both environments is 7.4.0.

It looks like there was a problem using the IntPointField type for a key
field in my schema; I've changed the type to string and now everything
works.


Seeing that problem in 7.4.0 definitely sounds like you've got extra copies
of jars on your classpath.  The problems get even worse if the extra copies
are a different version, and you might run into additional problems beyond
this one.


If using a string type (which is usually solr.StrField) works for you, 
then great, you might be done configuring that field.


Thanks,
Shawn



Re: join works with a core, doesn't work with a collection

2018-08-30 Thread Shawn Heisey

On 8/30/2018 9:00 AM, Steve Pruitt wrote:

Is there something different I need to do for a query with a join for a 
Collection?  Singular Collection, not across Collections.

Initially, I used a Core for simple development.  One of my queries uses a 
join.  It works fine.


I know very little about using the join feature. Everything that I know 
about it has been picked up by reading this mailing list.


I think one of the key elements is that all of the indexes involved in a 
join must exist on the same server.  It is quite common with a SolrCloud 
setup to have the data spread across multiple servers, especially when 
collections have multiple shards.


I do not know whether you're having a problem because of having your 
collection spread across multiple servers or not, but it does seem to be 
a likely possibility.


Thanks,
Shawn



RE: [EXTERNAL] - Re: join works with a core, doesn't work with a collection

2018-08-30 Thread Steve Pruitt
Single server.  Localhost.  I am using the simple setup and took all the 
defaults.



-Original Message-
From: Shawn Heisey  
Sent: Thursday, August 30, 2018 11:14 AM
To: solr-user@lucene.apache.org
Subject: [EXTERNAL] - Re: join works with a core, doesn't work with a collection

On 8/30/2018 9:00 AM, Steve Pruitt wrote:
> Is there something different I need to do for a query with a join for a 
> Collection?  Singular Collection, not across Collections.
>
> Initially, I used a Core for simple development.  One of my queries uses a 
> join.  It works fine.

I know very little about using the join feature. Everything that I know about 
it has been picked up by reading this mailing list.

I think one of the key elements is that all of the indexes involved in a join 
must exist on the same server.  It is quite common with a SolrCloud setup to 
have the data spread across multiple servers, especially when collections have 
multiple shards.

I do not know whether you're having a problem because of having your collection 
spread across multiple servers or not, but it does seem to be a likely 
possibility.

Thanks,
Shawn



Re: [EXTERNAL] - Re: join works with a core, doesn't work with a collection

2018-08-30 Thread Shawn Heisey

On 8/30/2018 9:17 AM, Steve Pruitt wrote:

Single server.  Localhost.  I am using the simple setup and took all the 
defaults.


Is there more than one Solr instance on that server? SolrCloud considers 
multiple instances to be completely separate, even if they're actually 
on the same hardware.


Thanks,
Shawn



RE: [EXTERNAL] - Re: join works with a core, doesn't work with a collection

2018-08-30 Thread Steve Pruitt
If you mean another Solr server running, then no.

-Original Message-
From: Shawn Heisey  
Sent: Thursday, August 30, 2018 11:31 AM
To: solr-user@lucene.apache.org
Subject: Re: [EXTERNAL] - Re: join works with a core, doesn't work with a 
collection

On 8/30/2018 9:17 AM, Steve Pruitt wrote:
> Single server.  Localhost.  I am using the simple setup and took all the 
> defaults.

Is there more than one Solr instance on that server? SolrCloud considers 
multiple instances to be completely separate, even if they're actually on the 
same hardware.

Thanks,
Shawn



Solr suggestions: why are exact matches omitted

2018-08-30 Thread Clemens Wyss DEV
Given the following configuration:
...


suggest_word_fuzzy
org.apache.solr.spelling.suggest.Suggester
org.apache.solr.spelling.suggest.fst.FuzzyLookupFactory
true
_my_suggest_word
2
0.01
.01
suggest_word 

false 
false 
true 


...
When I try to find suggestions for "11000.35" I get
"11000.33"
"11000.34"
"11000.36"
"11000.37"
...
but not "11000.35", although "11000.35" exists (and is suggested when I, for
example, type "11000.34").

Thx in advance
- Clemens


Re: Solr suggestions: why are exact matches omitted

2018-08-30 Thread Clemens Wyss DEV
Or do the spellcheck results give an indication that "11000.35" has an exact 
match?

-----Original Message-----
From: Clemens Wyss DEV
Sent: Thursday, 30 August 2018 18:01
To: 'solr-user@lucene.apache.org'
Subject: Solr suggestions: why are exact matches omitted

Given the following configuration:
...


suggest_word_fuzzy
org.apache.solr.spelling.suggest.Suggester
org.apache.solr.spelling.suggest.fst.FuzzyLookupFactory
true
_my_suggest_word
2
0.01
.01
suggest_word 

false 
false 
true 


...
When I try to find suggestions for "11000.35" I get "11000.33"
"11000.34"
"11000.36"
"11000.37"
...
but not "11000.35", although "11000.35" exists (and is suggested when I for 
example type "11000.34")

Thx in advance
- Clemens


Re: [EXTERNAL] - Re: join works with a core, doesn't work with a collection

2018-08-30 Thread Shawn Heisey

On 8/30/2018 9:49 AM, Steve Pruitt wrote:

If you mean another Solr server running, then no.


I mean multiple Solr processes.

The cloud example (started with bin/solr -e cloud) starts two Solr 
instances if you give it the defaults.  They are both running on the 
same machine, but if part of the data is on the instance running on port 
8983 and part of the data is on the instance running on port 7574, I 
don't think you can do a join.


Thanks,
Shawn



Cannot Figure out Reason for Persistent Zookeeper Warning

2018-08-30 Thread THADC
Hello,
We have a Solr setup with a pair of replicated Solr servers and a three-node
ZooKeeper ensemble in front. This configuration is replicated in several
environments. In one environment, we frequently receive the following
ZooKeeper-related warning from each of the three webapps that have Solr
client interfaces:

[BLAH_WEBAPP] 2018-08-30 08:34:38,639 WARN [org.apache.zookeeper.ClientCnxn] -
java.lang.NoClassDefFoundError: org/apache/zookeeper/proto/SetWatches
        at org.apache.zookeeper.ClientCnxn$SendThread.primeConnection(ClientCnxn.java:927)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:363)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1144)

This warning does not cause any noticeable issues, other than the fact that
it is generated with considerable frequency and therefore fills both our
Tomcat and ZooKeeper log files.

Still, I would love to figure out what the problem might be and get rid of
it. I asked on the ZooKeeper Nabble forum but got no replies. I would be
grateful for any insights. Thanks!





RE: [EXTERNAL] - Re: join works with a core, doesn't work with a collection

2018-08-30 Thread Steve Pruitt
Gosh, really?  This is not mentioned anywhere in the documentation that I can
find.  There are node-to-hardware considerations if you are joining across
different Collections.
But the same Collection?  Tell me this is not so.

-S

-Original Message-
From: Shawn Heisey  
Sent: Thursday, August 30, 2018 12:11 PM
To: solr-user@lucene.apache.org
Subject: Re: [EXTERNAL] - Re: join works with a core, doesn't work with a 
collection

On 8/30/2018 9:49 AM, Steve Pruitt wrote:
> If you mean another Solr server running, then no.

I mean multiple Solr processes.

The cloud example (started with bin/solr -e cloud) starts two Solr instances if 
you give it the defaults.  They are both running on the same machine, but if 
part of the data is on the instance running on port
8983 and part of the data is on the instance running on port 7574, I don't 
think you can do a join.

Thanks,
Shawn



RE: [EXTERNAL] - Re: join works with a core, doesn't work with a collection

2018-08-30 Thread Steve Pruitt
Shawn,

You are correct.  I created another setup, this time with 1 node, 1 shard and
2 replicas, and the join worked!
Running with the example SolrCloud setup doesn't work for join queries.
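
For reference, a single-shard collection like the one that worked can be
created from the command line (the collection name is illustrative):

bin/solr create -c mycollection -shards 1 -replicationFactor 2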

Thanks.

-S


-Original Message-
From: Steve Pruitt  
Sent: Thursday, August 30, 2018 12:25 PM
To: solr-user@lucene.apache.org
Subject: RE: [EXTERNAL] - Re: join works with a core, doesn't work with a 
collection

Gosh, really?  This is not mentioned anywhere in the documentation that I can
find.  There are node-to-hardware considerations if you are joining across
different Collections.
But the same Collection?  Tell me this is not so.

-S

-Original Message-
From: Shawn Heisey  
Sent: Thursday, August 30, 2018 12:11 PM
To: solr-user@lucene.apache.org
Subject: Re: [EXTERNAL] - Re: join works with a core, doesn't work with a 
collection

On 8/30/2018 9:49 AM, Steve Pruitt wrote:
> If you mean another Solr server running, then no.

I mean multiple Solr processes.

The cloud example (started with bin/solr -e cloud) starts two Solr instances if 
you give it the defaults.  They are both running on the same machine, but if 
part of the data is on the instance running on port
8983 and part of the data is on the instance running on port 7574, I don't 
think you can do a join.

Thanks,
Shawn



Re: Split on whitespace parameter doubt

2018-08-30 Thread Emir Arnautović
Hi David,
Your observations seem correct. If all fields produce the same tokens then
Solr goes for a "term centric" query, but if different fields produce
different tokens, then it uses a field centric query. Here is a blog post
that explains it from the multi-word synonyms perspective:
https://opensourceconnections.com/blog/2018/02/20/edismax-and-multiterm-synonyms-oddities/

IMO the issue is that it is not clear what a term centric query would look
like in the case of different tokens. Imagine that your query is "a b" and
you are searching two fields, title (analysed) and title_s (string), so you
end up with the tokens 'a', 'b' and 'a b'. A term centric query would then be
(title:a || title_s:a) (title:b || title_s:b) (title:a b || title_s:a b). If
that is not already weird enough, assume you also allow one token to be
missed…

I am not sure why the field centric query is not used all the time, or at
least why there is no parameter to force it.
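
As a quick experiment, the two behaviours can be compared with plain edismax
requests (the collection and field names here are hypothetical):

/solr/mycollection/select?defType=edismax&q=a+b&qf=title+title_s&sow=true
/solr/mycollection/select?defType=edismax&q=a+b&qf=title+title_s&sow=false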

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 30 Aug 2018, at 15:02, David Argüello Sánchez 
>  wrote:
> 
> Hi everyone,
> 
> I am doing some tests to understand how the split-on-whitespace parameter
> works with the eDisMax query parser. I understand the behaviour, but I have
> a question about why it works the way it does.
> 
> When sow=true, it works as it did in previous Solr versions.
> When sow=false, the behaviour changes and all the terms have to be present
> in the same field. However, if all queried fields produce the same query
> structure, it works as if sow=true. This is the thing I don't fully
> understand.
> By specifying sow=false I might want to match only those documents
> containing all the terms in the same field, but because all queried fields
> have the same query structure, I would get back documents containing both
> terms in any of the fields.
> 
> Does anyone know the reasoning behind this decision?
> Thank you in advance.
> 
> Regards,
> David



ZooKeeper issues with AWS

2018-08-30 Thread Jack Schlederer
Hi all,

My team is attempting to spin up a SolrCloud cluster with an external
ZooKeeper ensemble. We're trying to engineer our solution to be HA and
fault-tolerant such that we can lose either 1 Solr instance or 1 ZooKeeper
and not take downtime. We use chaos engineering to randomly kill instances
to test our fault-tolerance. Killing Solr instances seems to be solved, as
we use a high enough replication factor and Solr's built in autoscaling to
ensure that new Solr nodes added to the cluster get the replicas that were
lost from the killed node. However, ZooKeeper seems to be a different
story. We can kill 1 ZooKeeper instance and still maintain quorum, and
everything is good. It comes back and starts participating in leader
elections, etc.
Kill 2, however, and we lose the quorum and we have collections/replicas
that appear as "gone" on the Solr Admin UI's cloud graph display, and we
get Java errors in the log reporting that collections can't be read from
ZK. This means we aren't servicing search requests. We found an open JIRA
that reports this same issue, but its only affected version is 5.3.1. We
are experiencing this problem in 7.3.1. Has there been any progress or
potential workarounds on this issue since?

Thanks,
Jack

Reference:
https://issues.apache.org/jira/browse/SOLR-8868


Re: ZooKeeper issues with AWS

2018-08-30 Thread Walter Underwood
How many Zookeeper nodes in your ensemble? You need five nodes to
handle two failures.

Are your Solr instances started with a zkHost that lists all five Zookeeper 
nodes?

What version of Zookeeper?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Aug 30, 2018, at 1:45 PM, Jack Schlederer 
>  wrote:
> 
> Hi all,
> 
> My team is attempting to spin up a SolrCloud cluster with an external
> ZooKeeper ensemble. We're trying to engineer our solution to be HA and
> fault-tolerant such that we can lose either 1 Solr instance or 1 ZooKeeper
> and not take downtime. We use chaos engineering to randomly kill instances
> to test our fault-tolerance. Killing Solr instances seems to be solved, as
> we use a high enough replication factor and Solr's built in autoscaling to
> ensure that new Solr nodes added to the cluster get the replicas that were
> lost from the killed node. However, ZooKeeper seems to be a different
> story. We can kill 1 ZooKeeper instance and still maintain quorum, and
> everything is good. It comes back and starts participating in leader
> elections, etc.
> Kill 2, however, and we lose the quorum and we have collections/replicas
> that appear as "gone" on the Solr Admin UI's cloud graph display, and we
> get Java errors in the log reporting that collections can't be read from
> ZK. This means we aren't servicing search requests. We found an open JIRA
> that reports this same issue, but its only affected version is 5.3.1. We
> are experiencing this problem in 7.3.1. Has there been any progress or
> potential workarounds on this issue since?
> 
> Thanks,
> Jack
> 
> Reference:
> https://issues.apache.org/jira/browse/SOLR-8868



Analyzer used if the field type has only index type specified

2018-08-30 Thread Natarajan, Rajeswari
Hi,

In the case of fieldTypes that specify only an 'index'-time analyzer, what
analyzer will be used during query time?

The example below specifies only an index-time analyzer. So what will be used
during query time?

<fieldType name="..." class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory"
            words="lang/en/stopwords.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>




Regards
Rajeswari


change DocExpirationUpdateProcessorFactory deleteByQuery NOW parameter time zone

2018-08-30 Thread Derek Poh

Hi

Can the time zone of the NOW parameter in the deleteByQuery of the
DocExpirationUpdateProcessorFactory be changed to my time zone?

I am in SG and using Solr 6.5.1.

The timestamps of the entries in solr.log are in my time zone, but the NOW
parameter of the deleteByQuery is in a different time zone (UTC?).

The deleteByQuery entry in solr.log:

2018-08-30 16:34:03.941 INFO  (qtp834133664-3600) [c:exhibitor_product_2 
s:shard1 r:core_node1 x:exhibitor_product_2_shard1_replica2] 
o.a.s.u.p.LogUpdateProcessorFactory 
[exhibitor_product_2_shard1_replica2]  webapp=/solr path=/update 
params={update.distrib=FROMLEADER&_version_=-1610212229046599680&distrib.from=http://192.168.83.152:8983/solr/exhibitor_product_2_shard1_replica1/&wt=javabin&version=2}{deleteByQuery={!cache=false}P_TradeShowOnlineEndDate:[* 
TO 2018-08-30T08:34:06.804Z] (-1610212229046599680)} 0 23



DocExpirationUpdateProcessorFactory definition in solrconfig.xml:


  
    P_SupplierId
    P_TradeShowId
    P_ProductId
    id
  
  
    id
    
  
  
 -1
  
  
    
    
    86400
    P_TradeShowOnlineEndDate
  
  
  



stored="true" multiValued="false"/>


Derek


Solrcloud collection file location on zookeeper

2018-08-30 Thread Sushant Vengurlekar
Where does ZooKeeper store the collection info on its local filesystem?

Thank you


Re: ZooKeeper issues with AWS

2018-08-30 Thread Jack Schlederer
We run a 3 node ZK cluster, but I'm not concerned about 2 nodes failing at
the same time. Our chaos process only kills approximately one node per
hour, and our cloud service provider automatically spins up another ZK node
when one goes down. All 3 ZK nodes are back up within 2 minutes, talking to
each other and syncing data. It's just that Solr doesn't seem to recognize
it. We'd have to restart Solr to get it to recognize the new Zookeepers,
which we can't do without taking downtime or losing data that's stored on
non-persistent disk within the container.

The ZK_HOST environment variable lists all 3 ZK nodes.

We're running ZooKeeper version 3.4.13.
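
For reference, a three-node connection string for Solr typically looks like
this (the hostnames are made up):

ZK_HOST=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181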

Thanks,
Jack

On Thu, Aug 30, 2018 at 4:12 PM Walter Underwood 
wrote:

> How many Zookeeper nodes in your ensemble? You need five nodes to
> handle two failures.
>
> Are your Solr instances started with a zkHost that lists all five
> Zookeeper nodes?
>
> What version of Zookeeper?
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Aug 30, 2018, at 1:45 PM, Jack Schlederer <
> jack.schlede...@directsupply.com> wrote:
> >
> > Hi all,
> >
> > My team is attempting to spin up a SolrCloud cluster with an external
> > ZooKeeper ensemble. We're trying to engineer our solution to be HA and
> > fault-tolerant such that we can lose either 1 Solr instance or 1
> ZooKeeper
> > and not take downtime. We use chaos engineering to randomly kill
> instances
> > to test our fault-tolerance. Killing Solr instances seems to be solved,
> as
> > we use a high enough replication factor and Solr's built in autoscaling
> to
> > ensure that new Solr nodes added to the cluster get the replicas that
> were
> > lost from the killed node. However, ZooKeeper seems to be a different
> > story. We can kill 1 ZooKeeper instance and still maintain quorum, and
> > everything is good. It comes back and starts participating in leader
> > elections, etc.
> > Kill 2, however, and we lose the quorum and we have collections/replicas
> > that appear as "gone" on the Solr Admin UI's cloud graph display, and we
> > get Java errors in the log reporting that collections can't be read from
> > ZK. This means we aren't servicing search requests. We found an open JIRA
> > that reports this same issue, but its only affected version is 5.3.1. We
> > are experiencing this problem in 7.3.1. Has there been any progress or
> > potential workarounds on this issue since?
> >
> > Thanks,
> > Jack
> >
> > Reference:
> > https://issues.apache.org/jira/browse/SOLR-8868
>
>