Re: Negative CDCR Queue Size?

2018-11-09 Thread Amrit Sarkar
Hi Webster,

The queue size "*-1*" suggests the target is not initialized, and you
should see a "WARN" in the logs suggesting something bad happened at the
respective target. I am also posting the source code for reference.

Any chance you can look for WARN entries in the logs, or verify at the
respective source and target that CDCR is configured correctly and was
running fine without any manual intervention?
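
For example, a quick way to compare both ends is to hit the CDCR handler on
the source and on the target (a sketch; host and collection names below are
placeholders):

# on the source: per-target queue sizes and last-processed timestamps
curl "http://source-host:8983/solr/mycollection/cdcr?action=QUEUES"

# on source and target: whether CDCR is started and whether buffering is enabled
curl "http://source-host:8983/solr/mycollection/cdcr?action=STATUS"
curl "http://target-host:8983/solr/mycollection/cdcr?action=STATUS"

# on the source: recent forwarding errors per target, if any
curl "http://source-host:8983/solr/mycollection/cdcr?action=ERRORS"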

Also, you mentioned there are a number of intermittent issues with CDCR; I
see you have reported a few JIRAs. I would be grateful if you could report
the rest as well.

Code:

> for (CdcrReplicatorState state : replicatorManager.getReplicatorStates()) {
>   NamedList queueStats = new NamedList();
>   CdcrUpdateLog.CdcrLogReader logReader = state.getLogReader();
>   if (logReader == null) {
>     String collectionName = req.getCore().getCoreDescriptor().getCloudDescriptor().getCollectionName();
>     String shard = req.getCore().getCoreDescriptor().getCloudDescriptor().getShardId();
>     log.warn("The log reader for target collection {} is not initialised @ {}:{}",
>         state.getTargetCollection(), collectionName, shard);
>     queueStats.add(CdcrParams.QUEUE_SIZE, -1l);
>   } else {
>     queueStats.add(CdcrParams.QUEUE_SIZE, logReader.getNumberOfRemainingRecords());
>   }
>   queueStats.add(CdcrParams.LAST_TIMESTAMP, state.getTimestampOfLastProcessedOperation());
>   if (hosts.get(state.getZkHost()) == null) {
>     hosts.add(state.getZkHost(), new NamedList());
>   }
>   ((NamedList) hosts.get(state.getZkHost())).add(state.getTargetCollection(), queueStats);
> }
> rsp.add(CdcrParams.QUEUES, hosts);
>
>
Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2


On Wed, Nov 7, 2018 at 12:47 AM Webster Homer <
webster.ho...@milliporesigma.com> wrote:

> I'm sorry, I should have included that. We are running Solr 7.2. We use
> CDCR for almost all of our collections. We have experienced several
> intermittent problems with CDCR; this one seems to be new, at least I
> hadn't seen it before.
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Tuesday, November 06, 2018 12:36 PM
> To: solr-user 
> Subject: Re: Negative CDCR Queue Size?
>
> What version of Solr? CDCR has changed quite a bit in the 7x  code line so
> it's important to know the version.
>
> On Tue, Nov 6, 2018 at 10:32 AM Webster Homer <
> webster.ho...@milliporesigma.com> wrote:
> >
> > Several times I have noticed that the CDCR action=QUEUES will return a
> negative queueSize. When this happens we seem to be missing data in the
> target collection. How can this happen? What does a negative Queue size
> mean? The timestamp is an empty string.
> >
> > We have two targets for a source. One looks like this, with a negative
> > queue size
> > queues":
> > ["uc1f-ecom-mzk01.sial.com:2181,uc1f-ecom-mzk02.sial.com:2181,uc1f-eco
> > m-mzk03.sial.com:2181/solr",["ucb-catalog-material-180317",["queueSize
> > ",-1,"lastTimestamp",""]],
> >
> > The other is healthy
> > "ae1b-ecom-mzk01.sial.com:2181,ae1b-ecom-mzk02.sial.com:2181,ae1b-ecom
> > -mzk03.sial.com:2181/solr",["ucb-catalog-material-180317",["queueSize"
> > ,246980,"lastTimestamp","2018-11-06T16:21:53.265Z"]]
> >
> > We are not seeing CDCR errors.
> >
> > What could cause this behavior?
>


Re: Master Slave Replication Issue

2018-11-09 Thread damian.pawski
Hi, 
We have switched from 5.4 to 7.2.1, and we have started to see more issues
with replication.
I think it may be related to the fact that a delta import was started during
a full import (which was not the case with Solr 5.4).

I am getting the error below:

XXX: java.lang.IllegalArgumentException:java.lang.IllegalArgumentException:
Directory MMapDirectory@XXX\index
lockFactory=org.apache.lucene.store.NativeFSLockFactory@21ff4974 still has
pending deleted files; cannot initialize IndexWriter

Are there more known issues with Solr 7.X and replication?
Based on https://issues.apache.org/jira/browse/SOLR-11938, I cannot trust
Solr 7.X anymore.

How can I fix the 
"XXX: java.lang.IllegalArgumentException:java.lang.IllegalArgumentException:
Directory MMapDirectory@XXX\index
lockFactory=org.apache.lucene.store.NativeFSLockFactory@21ff4974 still has
pending deleted files; cannot initialize IndexWriter
"
issue?

Thank you
Damian



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Sql server data import

2018-11-09 Thread Verthosa
Hello, I managed to set up a connection to my SQL Server to import data into
Solr. The idea is to import FileTables, but for now I first want to get it
working using regular tables. So I created:

*data-config.xml*
  <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
      url="jdbc:sqlserver://localhost;databaseName=inConnexion_Tenant2;integratedSecurity=true" />
  (the entity/query and field mappings were stripped by the list archive)

*schema.xml*
i added
  <field ... multiValued="false" />
  <field ... multiValued="false" />
  (the field names and other attributes were stripped by the list archive;
  presumably Id and PublicId)

and changed uniqueKey entry to
<uniqueKey>Id</uniqueKey>
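
Since the archive stripped the tags, a minimal data-config.xml of the kind
described here would look roughly like this sketch; the entity name, table
name and query are placeholders rather than the originals from this post:

<dataConfig>
  <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost;databaseName=inConnexion_Tenant2;integratedSecurity=true"/>
  <document>
    <!-- "dbo.MyTable" and the column list are illustrative placeholders -->
    <entity name="item" query="SELECT Id, PublicId FROM dbo.MyTable">
      <field column="Id" name="Id"/>
      <field column="PublicId" name="PublicId"/>
    </entity>
  </document>
</dataConfig>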

When I want to import my data (which is just data like Id: 5, PublicId:
"test"), I get the following error in the logging:

Error creating document : SolrInputDocument(fields: [PublicId=10065,​
Id=117])


I tried all sorts of things but can't get it fixed. Would anyone be willing
to give me a hand?

thanks in advance!




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Sql server data import

2018-11-09 Thread Alexandre Rafalovitch
Which version of Solr is it? Solr has not used schema.xml for a
very long time; it has been managed-schema instead.

Also, have you tried using the DIH example that uses a database and
modifying it just enough to read data from your own database? Even if it
has a lot of extra junk, this would test half of the pipeline, which
you can then transfer to the clean setup.
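
If it helps, the bundled DIH example can be started straight from the Solr
install directory (a sketch; the example core names come from the shipped
configs):

bin/solr -e dih
# starts Solr with several example cores, including a "db" core backed by a
# small HSQLDB database; its config lives under example/example-DIH/solr/db/conf/
# and can be copied and trimmed down for your own SQL Server setup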

Regards,
   Alex.
On Fri, 9 Nov 2018 at 08:09, Verthosa  wrote:
>
> Hello, i managed to set up a connection to my sql server to import data into
> Solr. The idea is to import filetables but for now i first want to get it
> working using regular tables. So i created
>
> *data-config.xml*
>   <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
>       url="jdbc:sqlserver://localhost;databaseName=inConnexion_Tenant2;integratedSecurity=true" />
>   (the entity/query and field mappings were stripped by the list archive)
>
> *schema.xml*
> i added
>   <field ... multiValued="false" />
>   <field ... multiValued="false" />
>   (the field names and other attributes were stripped by the list archive;
>   presumably Id and PublicId)
>
> and changed uniqueKey entry to
> <uniqueKey>Id</uniqueKey>
>
> When i want to import my data (which is just data like Id: 5, PublicId:
> "test"), i get the following error in the logging.
>
> Error creating document : SolrInputDocument(fields: [PublicId=10065,​
> Id=117])
>
>
> I tried all sorts of things but can't get it fixed. Is anyone want to give
> me a hand?
>
> thanks in advance!
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


RE: Sql server data import

2018-11-09 Thread Gu, Steve (CDC/DDPHSS/OS) (CTR)
What is "​"  in the PublicId?  Is it part of the data?  Did you check if 
the special characters in your data cause the problem?

Steve

###
Error creating document : SolrInputDocument(fields: [PublicId=10065,​
Id=117])

-Original Message-
From: Verthosa  
Sent: Friday, November 9, 2018 7:51 AM
To: solr-user@lucene.apache.org
Subject: Sql server data import

Hello, i managed to set up a connection to my sql server to import data into 
Solr. The idea is to import filetables but for now i first want to get it 
working using regular tables. So i created 

*data-config.xml*
  <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
      url="jdbc:sqlserver://localhost;databaseName=inConnexion_Tenant2;integratedSecurity=true" />
  (the entity/query and field mappings were stripped by the list archive)

*schema.xml*
i added
  <field ... multiValued="false" />
  <field ... multiValued="false" />
  (the field names and other attributes were stripped by the list archive;
  presumably Id and PublicId)

and changed uniqueKey entry to
<uniqueKey>Id</uniqueKey>

When i want to import my data (which is just data like Id: 5, PublicId:
"test"), i get the following error in the logging. 

Error creating document : SolrInputDocument(fields: [PublicId=10065,​
Id=117])


I tried all sorts of things but can't get it fixed. Is anyone want to give me a 
hand?

thanks in advance!




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Disabling jvm properties from ui

2018-11-09 Thread Jan Høydahl
Yes, it is important to understand that only trusted clients and persons should 
be given access to Solr's port.

But it may still be surprising to users that e.g. passwords to a DB or SSL 
keystore are available over HTTP when there is no need for them at the client 
side. I'm not saying it is a bug, but it may be surprising. So I think we should 
continue step by step to address these and have Solr behave according to the 
principle of least surprise, hence the discussion in 
https://issues.apache.org/jira/browse/SOLR-12976

After locking down secrets as well as possible, the next logical step would be 
to couple Solr's Authentication/Authorization feature to this, so that if a 
client has a role with the read/edit securityconfig permission, then she could 
be allowed to see those properties. So far the authorization is true/false 
based on handler/HTTP method, meaning we'd have to add a new 
/solr/admin/info/system/secrets/ handler which could return those hidden props. 
But there may not be a need to retrieve these at the API level at all.
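
For reference, the solr.xml exclusion I refer to in the quoted mail below looks
roughly like this (a sketch; the listed names are the stock defaults, and you
would add the same properties you redact via solr.redaction.system.pattern):

<metrics>
  <hiddenSysProps>
    <str>javax.net.ssl.keyStorePassword</str>
    <str>javax.net.ssl.trustStorePassword</str>
    <str>basicauth</str>
    <str>zkDigestPassword</str>
    <str>zkDigestReadonlyPassword</str>
    <!-- add your own sensitive property names here -->
  </hiddenSysProps>
</metrics>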

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 8. nov. 2018 kl. 19:54 skrev Gus Heck :
> 
> That's an interesting feature, and it addresses X, but there are lots of
> ways to discover system properties. In a managed schema, enter a field name
> ${java.version} and you'll get a field named 1.8.0_144 (or whatever). I
> still think it's important to address Y: they are trying to hide the system
> properties from someone they have placed their trust in already.
> 
> On Thu, Nov 8, 2018 at 1:16 PM Jan Høydahl  wrote:
> 
>> It's not documented in the Ref Guide, but you can set this system property
>> to fix it:
>> 
>> 
>> SOLR_OPTS="-Dsolr.redaction.system.pattern=(.*password.*|.*your-own-regex.*)"
>> 
>> Then the property will show as --REDACTED-- in the UI.
>> 
>> Note that the property still will leak through /solr/admin/metrics and you
>> need to add the same exclusion in solr.xml, see
>> https://lucene.apache.org/solr/guide/7_5/metrics-reporting.html#the-metrics-hiddensysprops-element
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> 
>>> 7. nov. 2018 kl. 20:51 skrev Naveen M :
>>> 
>>> Hi,
>>> 
>>> Is there a way to disable jvm properties from the solr UI.
>>> 
>>> It has some information which we don’t want to expose. Any pointers would
>>> be helpful.
>>> 
>>> 
>>> Thanks
>> 
>> 
> 
> -- 
> http://www.the111shift.com



Re: Want to subscribe to this list

2018-11-09 Thread Steve Rowe
Hi Michela,

For subscription info see: 
http://lucene.apache.org/solr/community.html#mailing-lists-irc

I'm not aware of any Slack discussion groups, but there are two freenode.net 
IRC channels - see: http://lucene.apache.org/solr/community.html#irc

Steve

> On Nov 8, 2018, at 10:42 AM, Michela Dennis  wrote:
> 
> Do you by any chance have a slack discussion group as well?
> 
> Michela Dennis



Re: Master Slave Replication Issue

2018-11-09 Thread Erick Erickson
Damian:

You say you've switched from 5x to 7x. Did you try to use an index
created with 5x or did you index fresh with 7x? Solr/Lucene do not
guarantee backward compatibility across more than one major version.
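
If you're not sure which version wrote the index, CheckIndex prints the Lucene
version of every segment (a sketch; the jar and index paths below are
illustrative for a default install and need to be adjusted):

# run Lucene's CheckIndex against the core's index directory
java -cp server/solr-webapp/webapp/WEB-INF/lib/lucene-core-7.2.1.jar \
  org.apache.lucene.index.CheckIndex /var/solr/data/yourcore/data/index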

Best,
Erick
On Fri, Nov 9, 2018 at 2:34 AM damian.pawski  wrote:
>
> Hi,
> We have switched from 5.4 to 7.2.1 and we have started to see more issues
> with the replication.
> I think it may be related to the fact that a delta import was started during
> a full import (not the case for the Solr 5.4).
>
> I am getting below error:
>
> XXX: java.lang.IllegalArgumentException:java.lang.IllegalArgumentException:
> Directory MMapDirectory@XXX\index
> lockFactory=org.apache.lucene.store.NativeFSLockFactory@21ff4974 still has
> pending deleted files; cannot initialize IndexWriter
>
> Are there more known issues with Solr 7.X and the replication?
> Based on https://issues.apache.org/jira/browse/SOLR-11938 I can not trust
> Solr 7.X anymore.
>
> How can I fix the
> "XXX: java.lang.IllegalArgumentException:java.lang.IllegalArgumentException:
> Directory MMapDirectory@XXX\index
> lockFactory=org.apache.lucene.store.NativeFSLockFactory@21ff4974 still has
> pending deleted files; cannot initialize IndexWriter
> "
> issue?
>
> Thank you
> Damian
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Sql server data import

2018-11-09 Thread Verthosa
Hello, I managed to fix the problem. I'm using Solr 7.5.0. My problem was
that in the server logs I got "This IndexSchema is not mutable" (I did not
know about the logs folder, so I just found out 5 minutes ago). I fixed it
by modifying solrconfig.xml to:

false*}"

processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date,add-schema-fields">





Since then the indexing is done correctly. I even got the blob field
indexing working now! Thanks for your reply; everything is fixed for now.
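
In case it is useful to anyone else: the same default can apparently also be
flipped without hand-editing solrconfig.xml, through the Config API user
properties that the ${update.autoCreateFields:...} placeholder reads from. A
sketch, with the core name as a placeholder:

curl "http://localhost:8983/solr/yourcore/config" \
  -H 'Content-type:application/json' \
  -d '{ "set-user-property": { "update.autoCreateFields": "false" } }'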




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Sql server data import

2018-11-09 Thread Erick Erickson
Ok, what that means is you're letting Solr do its best to figure out
what fields you should have in the schema and how they're defined.
Almost invariably, you can do better by explicitly defining the fields
you need in your schema rather than enabling add-unknown. It's
fine for getting started, but not advised for production.
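
For example, the two fields from the earlier post could be declared up front
with the Schema API instead of letting add-unknown guess at them (a sketch;
the field types and flags here are assumptions, pick whatever matches your
data):

curl -X POST -H 'Content-type:application/json' \
  "http://localhost:8983/solr/yourcore/schema" \
  -d '{ "add-field": [
        { "name": "Id",       "type": "string", "indexed": true, "stored": true, "multiValued": false },
        { "name": "PublicId", "type": "string", "indexed": true, "stored": true, "multiValued": false }
      ] }'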

Best,
Erick
On Fri, Nov 9, 2018 at 7:52 AM Verthosa  wrote:
>
> Hello, i managed to fix the problem. I'm using Solr 7.5.0. My problem was
> that in the server logs i got "This Indexschema is not mutable" (i did not
> know about the logs folder, so i just found out 5 minutes ago). I fixed it
> by modifying solrconfig.xml to
>
>  <updateRequestProcessorChain name="add-unknown-fields-to-the-schema"
>      default="${update.autoCreateFields:false}"
>      processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date,add-schema-fields">
>    ...
>  </updateRequestProcessorChain>
>
> Since then the indexing is done correctly. I even got the blob fields
> indexation working now ! Thanks for your reply, everything is fixed for now.
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Indexing vs Search node

Hi guys,
I read in several blog posts that it's never a good idea to index and
search on the same node. I wonder how that can be achieved in Solr Cloud or
if it happens automatically.




-- 

Fernando Otero

Sr Engineering Manager, Panamera

Buenos Aires - Argentina

Mobile: +54 911 67697108

Email:  fernando.ot...@olx.com


Re: Indexing vs Search node


On 11/9/2018 12:13 PM, Fernando Otero wrote:

 I read in several blog posts that it's never a good idea to index and
search on the same node. I wonder how that can be achieved in Solr Cloud or
if it happens automatically.


I would disagree with that blanket assertion.

Indexing does put extra load on a server that can interfere with query 
performance.  Whether that will be a real problem pretty much depends on 
exactly how much indexing you're doing, and what kind of query load you 
need to handle.  For extreme scaling, it can be a good idea to separate 
indexing and searching.


With a master/slave architecture, any version of Solr can separate 
indexing and querying.


Before 7.x, it wasn't possible to separate indexing and querying with 
SolrCloud.  With previous major versions, ALL replicas do the same 
indexing.  With 7.x, that's still the default behavior, but 7.x has new 
replica types that make it possible for indexing to only take place on 
shard leaders. The latest version of Solr 7.x has a way to prefer 
certain replica types, which is how the separation can be achieved.
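
A sketch of that preference, assuming Solr 7.4 or later and placeholder names:

# route the query to PULL replicas where available; other types are used as a fallback
curl "http://localhost:8983/solr/mycollection/select?q=*:*&shards.preference=replica.type:PULL"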


Thanks,
Shawn



Re: Indexing vs Search node

Fernando:

I'd phrase it more strongly than Shawn. Prior to 7.0
all replicas both indexed and search (they were NRT replica),
so there wasn't any choice but to index and search on
every replica.

It's one of those things that if you have very high
throughput (indexing) situations, you _might_
want to use TLOG and/or PULL replicas.

But TANSTAAFL (There Ain't  No Such Thing As A Free Lunch).
TLOG/PULL replicas copy index segments around, which
may be up to 5G each (default TieredMergePolicy cap on individual
segment sizes), whereas NRT replicas just get the raw document.

So in the TLOG/PULL situations, you'll get bursts of network traffic
but each replica has less CPU load because all the replicas but one
for each shard do not  have to index the doc.

In the NRT case, the raw documents are forwarded so the
network is less bursty, but all of the replicas spend CPU
cycles indexing.

So I wouldn't worry about it unless you're running into performance
problems, _then_ I'd investigate TLOG/PULL replicas.
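
If you do experiment with them, replicas of a specific type can be added to an
existing collection like this (a sketch; collection and shard names are
placeholders):

curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&type=pull"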

Best,
Erick
On Fri, Nov 9, 2018 at 11:37 AM Shawn Heisey  wrote:
>
> On 11/9/2018 12:13 PM, Fernando Otero wrote:
> >  I read in several blog posts that it's never a good idea to index and
> > search on the same node. I wonder how that can be achieved in Solr Cloud or
> > if it happens automatically.
>
> I would disagree with that blanket assertion.
>
> Indexing does put extra load on a server that can interfere with query
> performance.  Whether that will be a real problem pretty much depends on
> exactly how much indexing you're doing, and what kind of query load you
> need to handle.  For extreme scaling, it can be a good idea to separate
> indexing and searching.
>
> With a master/slave architecture, any version of Solr can separate
> indexing and querying.
>
> Before 7.x, it wasn't possible to separate indexing and querying with
> SolrCloud.  With previous major versions, ALL replicas do the same
> indexing.  With 7.x, that's still the default behavior, but 7.x has new
> replica types that make it possible for indexing to only take place on
> shard leaders. The latest version of Solr 7.x has a way to prefer
> certain replica types, which is how the separation can be achieved.
>
> Thanks,
> Shawn
>


Re: Indexing vs Search node

I personally like standalone Solr for this reason: I can tune the indexing
"master" to do nothing but take in documents, and that way the slaves
don't battle for resources in the process.

On Fri, Nov 9, 2018 at 3:10 PM Erick Erickson 
wrote:

> Fernando:
>
> I'd phrase it more strongly than Shawn. Prior to 7.0
> all replicas both indexed and search (they were NRT replica),
> so there wasn't any choice but to index and search on
> every replica.
>
> It's one of those things that if you have very high
> throughput (indexing) situations, you _might_
> want to use TLOG and/or PULL replicas.
>
> But TANSTAAFL (There Ain't  No Such Thing As A Free Lunch).
> TLOG/PULL replicas copy index segments around, which
> may be up to 5G each (default TieredMergePolicy cap on individual
> segment sizes), whereas NRT replicas just get the raw document.
>
> So in the TLOG/PULL situations, you'll get bursts of network traffic
> but each replica has less CPU load because all the replicas but one
> for each shard do not  have to index the doc.
>
> In the NRT case, the raw documents are forwarded so the
> network is less bursty, but all of the replicas spend CPU
> cycles indexing.
>
> So I wouldn't worry about it unless you running into performance
> problems, _then_ I'd investigate TLOG/PULL replicas.
>
> Best,
> Erick
> On Fri, Nov 9, 2018 at 11:37 AM Shawn Heisey  wrote:
> >
> > On 11/9/2018 12:13 PM, Fernando Otero wrote:
> > >  I read in several blog posts that it's never a good idea to index
> and
> > > search on the same node. I wonder how that can be achieved in Solr
> Cloud or
> > > if it happens automatically.
> >
> > I would disagree with that blanket assertion.
> >
> > Indexing does put extra load on a server that can interfere with query
> > performance.  Whether that will be a real problem pretty much depends on
> > exactly how much indexing you're doing, and what kind of query load you
> > need to handle.  For extreme scaling, it can be a good idea to separate
> > indexing and searching.
> >
> > With a master/slave architecture, any version of Solr can separate
> > indexing and querying.
> >
> > Before 7.x, it wasn't possible to separate indexing and querying with
> > SolrCloud.  With previous major versions, ALL replicas do the same
> > indexing.  With 7.x, that's still the default behavior, but 7.x has new
> > replica types that make it possible for indexing to only take place on
> > shard leaders. The latest version of Solr 7.x has a way to prefer
> > certain replica types, which is how the separation can be achieved.
> >
> > Thanks,
> > Shawn
> >
>


Re: Indexing vs Search node


On 11/9/2018 1:58 PM, David Hastings wrote:

I personally like standalone solr for this reason, i can tune the indexing
"master" for doing nothing but taking in documents and that way the slaves
dont battle for resources in the process.


SolrCloud can be set up pretty similar to this if you're running 7.5.  
You set things up so each collection has two TLOG replicas and the rest 
of them are PULL.


SolrCloud doesn't have master and slave in the same way as the old 
architecture.  There are no single points of failure if the hardware is 
set up correctly.  But because PULL replicas cannot become leader, they 
are a lot like slaves.  Solr 7.5 and later can configure a preference 
for different replica types at query time.  So with the setup described 
above, you tell it to prefer PULL replicas.  If all the PULL replicas 
were to die, then SolrCloud would use whatever is left.


Let's say that you set up a collection so it has two TLOG replicas and 
four PULL replicas.  You could have the TLOG replicas live on a pair of 
servers with SSD drives and less memory than the other four servers that 
have PULL replicas, which could be running standard hard drives.  
Queries love memory, indexing loves fast disks.  The preference that 
indicates PULL replicas would keep the queries so they are running only 
on the four machines with more memory.
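
A sketch of creating a collection laid out that way (names and shard count are
placeholders; nrtReplicas=0 so that only TLOG and PULL replicas are created):

curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=1&nrtReplicas=0&tlogReplicas=2&pullReplicas=4"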


The reason that you want two TLOG replicas instead of one is so that if 
the current leader dies, there is another TLOG replica available to 
become leader.


Thanks,
Shawn