Re: Expressing "not equals" in Block Join Parent Query

2017-04-06 Thread Mikhail Khludnev
&fq={!parent which="contentType_s:Header"}accountNo_s:* -accountNo_s:"123456"
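A purely negative clause needs a positive query alongside it; if the aim is to avoid the explicit accountNo_s:* guard, the match-all *:* can supply that positive clause instead (an untested variant, same parser and which filter):

```
&fq={!parent which="contentType_s:Header"}(*:* -accountNo_s:"123456")
```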

On 05 Apr 2017 at 21:47, "Zheng Lin Edwin Yeo" <
edwinye...@gmail.com> wrote:

> Hi,
>
> Is there any way in which we can express "not equals" in a Block Join
> Parent Query?
>
> For example, I want to find documents where accountNo_s is not equal to
> 123456.
>
> Currently, I am writing it this way for it to work:
> &fq={!parent which="contentType_s:Header"}accountNo_s:* AND !accountNo_s:"123456"
>
> It does not work when I put it this way:
> &fq={!parent which="contentType_s:Header"}!accountNo_s:"123456"
>
> But I don't think it is good practice to add the extra accountNo_s:*
> clause before it. Is there a better way to express this query?
>
> I'm using Solr 6.4.2
>
>
> Regards,
> Edwin
>


Solr Index size keeps fluctuating, becomes ~4x normal size.

2017-04-06 Thread Himanshu Sachdeva
Hi all,

We use Solr on our website for product search. Currently, we have 2.1
million documents in the products core, and each document has around
350 fields, of which more than 90% are indexed. Our master instance of
Solr runs on 15GB RAM and a 200GB drive. We have also configured 10 slaves
to handle reads from the website. The slaves poll the master at an interval of
20 minutes. We monitored the index size for a few days and found that it
varies widely, from 11GB to 43GB.


Recently, we started getting a lot of out-of-memory errors on the master.
Every time, Solr becomes unresponsive and we need to restart Jetty to bring
it back up. At the same time, we observed the variation in index size. We
suspect that these two problems may be linked.

What could be the reason that the index size becomes almost 4x?  Why does
it vary so much? Any pointers will be appreciated. If you need any more
details on the config, please let me know.

-- 
Himanshu Sachdeva


Re: Solr Index size keeps fluctuating, becomes ~4x normal size.

2017-04-06 Thread Toke Eskildsen
On Thu, 2017-04-06 at 16:30 +0530, Himanshu Sachdeva wrote:
> We monitored the index size for a few days and found that it varies
> widely from 11GB to 43GB. 

Lucene/Solr indexes consist of segments, each holding a number of
documents. When a document is deleted, its bytes are not removed
immediately; it is only marked as deleted. When a document is updated,
it is effectively a delete plus an add.

If you have an index with 3 documents
  segment-0 (live docs [0, 1, 2], deleted docs [])
and update documents 0 and 1, you will have
  segment-0 (live docs [2], deleted docs [0, 1])
  segment-1 (live docs [0, 1], deleted docs [])
If you then update document 1 again, you will have
  segment-0 (live docs [2], deleted docs [0, 1])
  segment-1 (live docs [0], deleted docs [1])
  segment-2 (live docs [1], deleted docs [])

for a total of ([2] + [0, 1]) + ([0] + [1]) + ([1] + []) = 6 documents.

The space is reclaimed when segments are merged, but depending on your setup
and update pattern that may take some time. Furthermore, there is a temporary
overhead during merging, while the merged segment is being written and the old
segments are still present. 4x the minimum size is fairly large, but not
unrealistic with enough index updates.
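The bookkeeping above can be sketched in a few lines of Java (an illustrative toy model only — real Lucene batches flushes and merges quite differently):

```java
import java.util.*;

// Toy model of segment growth under updates: an update marks the old
// copy of a document as deleted in whatever segment holds it, then
// flushes the new copy into a fresh segment. Dead copies accumulate
// until a merge rewrites the segments without them.
public class SegmentGrowth {
    static List<List<Integer>> live = new ArrayList<>();
    static List<List<Integer>> deleted = new ArrayList<>();

    static void update(int docId) {
        for (int s = 0; s < live.size(); s++) {
            if (live.get(s).remove(Integer.valueOf(docId))) {
                deleted.get(s).add(docId);  // old copy only marked, not removed
            }
        }
        live.add(new ArrayList<>(List.of(docId)));  // new copy, new segment
        deleted.add(new ArrayList<>());
    }

    static int liveDocs() {
        int n = 0;
        for (List<Integer> seg : live) n += seg.size();
        return n;
    }

    static int totalDocs() {  // what actually sits on disk
        int n = 0;
        for (int s = 0; s < live.size(); s++)
            n += live.get(s).size() + deleted.get(s).size();
        return n;
    }

    public static void main(String[] args) {
        live.add(new ArrayList<>(List.of(0, 1, 2)));  // segment-0
        deleted.add(new ArrayList<>());
        update(0);
        update(1);
        update(1);
        // prints "6 on disk, 3 live": the index is already 2x minimum size
        System.out.println(totalDocs() + " on disk, " + liveDocs() + " live");
    }
}
```

Here every update flushes its own segment, so the intermediate states differ slightly from the batched example above, but the end result is the same: six stored documents for three live ones.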

> Recently, we started getting a lot of out of memory errors on the
> master. Everytime, solr becomes unresponsive and we need to restart
> jetty to bring it back up. At the same we observed the variation in
> index size. We are suspecting that these two problems may be linked.

Quick sanity check: Look for "Overlapping onDeckSearchers" in your
solr.log to see if your memory problems are caused by multiple open
searchers:
https://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F
-- 
Toke Eskildsen, Royal Danish Library


Where does the value for ${dih.delta.id} come from exactly?

2017-04-06 Thread Mathias
Hello, 

I'm trying to write the data config for our DataImportHandler, importing
from a Postgres database into Solr.

In many tables we use a bytea as the primary key, and for these tables the
value returned by ${dih.delta.id} looks like '[B@3bbd583d', and I get a
query exception at the '['.

Does someone know how to work with these values?

Also, I tried writing the delta queries like "SELECT encode(id, 'hex') AS
id ...", but the returned value does not change.
So is the value in ${dih.delta.id} really the result of the delta query, or
is it just the id field from the returned DB record?
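A value like '[B@3bbd583d' is Java's default toString() for a byte[] — the JVM type code "[B" plus an identity hash — which suggests the raw array object is being interpolated into the query string rather than any encoded form. A minimal sketch of that behaviour, with hex encoding done on the Java side shown as a hypothetical workaround:

```java
// '[B@...' is what Java prints for a byte[] by default: the JVM type
// code "[B" plus an identity hash code -- not the array's contents.
public class ByteArrayId {
    public static void main(String[] args) {
        byte[] id = {0x12, 0x34};
        System.out.println(id.toString().startsWith("[B@")); // true

        // Hex-encoding the bytes yields a stable, query-safe key:
        StringBuilder hex = new StringBuilder();
        for (byte b : id) hex.append(String.format("%02x", b & 0xff));
        System.out.println(hex); // prints 1234
    }
}
```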

Thanks in advance,
Mathias 







SOLR 4.10.3 - Replication for DR strategy

2017-04-06 Thread Muhammad Imad Qureshi
I know this is a very old version and we are working to upgrade to Solr
5.5. In the meantime, we also know how to do the replication by copying
indexed data from the file system. Currently on prod, a particular shard
(shard1/replica1) has its data in a directory called "core_node97". When we
move data from prod to DR, it copies the data to the same directory on DR
(core_node97). How do we ensure that the shard is also pointing to the exact
same directory? In this setup, shard1/replica1 has data in core_node97 on
prod, but on DR the directory is "core_node1", which is not right. I would
want shard1/replica1 on the DR side to point to core_node97.
I have tried exporting clusterstate.json from ZooKeeper after modifying the
IP addresses to point to the IP addresses of DR, but that didn't help.
Thanks,
Imad

SolrCloudProxy

2017-04-06 Thread Florian Gleixner
Hi,

I created a Solr Cloud Proxy. Its intention is to provide a real cloud
connection to clients that are not cloud-aware. Since SolrJ is Java
only, this applies to almost all clients written in other
languages.

https://gitlab.lrz.de/a2814ad/SolrCloudProxy

It has not been tested much, is not feature-complete, and is surely not
bug-free. Feedback is welcome.

Flo





Re: Solr 6.4.1 Issue

2017-04-06 Thread Shawn Heisey
On 4/3/2017 9:55 AM, Islam Omar wrote:
> I have a trouble problem when doing *full import in solr 6.4.1 using MySQL
> DB , the problem is : *
>
> i need to create 1 core which will be around 9,500,000 documents , when i
> do full import with *batchSize* *= -1* in datasource  , everything was ok
> but when the solr finish fetching data from database it *can't stop* running
> full import command and continue trying to fetch another data.
>
> *the log throws this exception *
>
> Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
> java.sql.SQLException: Operation not allowed after ResultSet closed

> Caused by: java.sql.SQLException: Operation not allowed after ResultSet
> closed

> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:458)

This error is happening in code added by SOLR-2199, in version 6.1.0:

https://issues.apache.org/jira/browse/SOLR-2199

Based on the error message and what I know about JDBC, I believe that it
is isolated to the specific new code added by that issue.

I have done DIH imports from MySQL with version 6.3.0, creating indexes
much larger than your 9.5 million rows, without any problems.

This is the first time I've seen this particular problem mentioned.  If
it WERE a bug in the SOLR-2199 code, it seems like it would have already
been noticed in the nine months since 6.1.0 was released.  It is more
likely that there's an issue with the MySQL JDBC driver, or possibly a
result of the specific SQL statements that are being used in the import.

It is always possible that a new bug was introduced in 6.4.x, but DIH is
heavily used by Solr users, and 6.4.x has been out for some time, so I
think that a general problem with DIH would have already been reported
at least once.  At this point, I think Solr probably does not have a
bug, but I'm not completely ruling that possibility out.

What version of the MySQL JDBC driver are you using with Solr?  I would
strongly recommend that you use the latest GA version.  That is
currently version 5.1.41.  There are development versions of the 6.0
driver available, but I do not recommend using those except in test
environments where you can make changes without affecting production
systems.

https://dev.mysql.com/downloads/connector/j/5.1.html

If a new driver doesn't help, we can look deeper into whether Solr has a
bug.  You may need to share your dataimport config, schema, and
solrconfig.xml, after redacting any sensitive information.

Thanks,
Shawn



Possible bug

2017-04-06 Thread OTH
I'm not sure if anyone else has had this problem, but this is a problem I had:

I'm using Solr 6.4.1 on Windows, and when I would run 'bin\solr delete -c
<collection name>', it wouldn't work properly.  It turned out this was
because there was a space character which shouldn't have been there at the
end of line 1380 in the solr.bat file.  I'm not sure if that's the way it
came or if I accidentally added that space at some point, though I don't
remember doing anything like that.

After removing that space, the delete command works fine.

Regards


Re: Possible bug

2017-04-06 Thread Steve Rowe
Thanks for reporting.

This was fixed by , which
will be included in the forthcoming Solr 6.5.1.

--
Steve
www.lucidworks.com

> On Apr 6, 2017, at 12:54 PM, OTH  wrote:
> 
> I'm not sure if anyone else has had this problem, but this is a problem I had:
> 
> I'm using Solr 6.4.1 on Windows, and when I would run 'bin\solr delete -c
> <collection name>', it wouldn't work properly.  It turned out this was
> because there was a space character which shouldn't have been there at the
> end of line 1380 in the solr.bat file.  I'm not sure if that's the way it
> came or if I accidentally added that space at some point, though I don't
> remember doing anything like that.
> 
> After removing that space, the delete command works fine.
> 
> Regards



Re: Solr 6.4.1 Issue

2017-04-06 Thread Mikhail Khludnev
Hello,

I can't understand what is actually meant by "everything was ok
but when the solr finish fetching data from database it *can't stop*
running full import command and continue trying to fetch another data."

Islam,
How do you know that it finishes fetching the data?
Are you sure that there was no error or interrupt command which could
have caused the exception?

On Mon, Apr 3, 2017 at 6:55 PM, Islam Omar  wrote:

> Hi All ,
>
> I have a trouble problem when doing *full import in solr 6.4.1 using MySQL
> DB , the problem is : *
>
> i need to create 1 core which will be around 9,500,000 documents , when i
> do full import with *batchSize* *= -1* in datasource  , everything was ok
> but when the solr finish fetching data from database it *can't stop*
> running
> full import command and continue trying to fetch another data.
>
> *the log throws this exception *
>
> Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
> java.sql.SQLException: Operation not allowed after ResultSet closed
> at
> org.apache.solr.handler.dataimport.DataImportHandlerException.
> wrapAndThrow(DataImportHandlerException.java:61)
> at
> org.apache.solr.handler.dataimport.JdbcDataSource$
> ResultSetIterator.hasnext(JdbcDataSource.java:464)
> at
> org.apache.solr.handler.dataimport.JdbcDataSource$
> ResultSetIterator$1.hasNext(JdbcDataSource.java:377)
> at
> org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(
> EntityProcessorBase.java:133)
> at
> org.apache.solr.handler.dataimport.SqlEntityProcessor.
> nextRow(SqlEntityProcessor.java:75)
> at
> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(
> EntityProcessorWrapper.java:244)
> at
> org.apache.solr.handler.dataimport.DocBuilder.
> buildDocument(DocBuilder.java:475)
> at
> org.apache.solr.handler.dataimport.DocBuilder.
> buildDocument(DocBuilder.java:414)
> ... 7 more
> Caused by: java.sql.SQLException: Operation not allowed after ResultSet
> closed
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1074)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:988)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:974)
> at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:919)
> at com.mysql.jdbc.ResultSetImpl.checkClosed(ResultSetImpl.java:803)
> at com.mysql.jdbc.ResultSetImpl.next(ResultSetImpl.java:6985)
> at com.mysql.jdbc.StatementImpl.getMoreResults(StatementImpl.java:2232)
> at com.mysql.jdbc.StatementImpl.getMoreResults(StatementImpl.java:2216)
> at
> org.apache.solr.handler.dataimport.JdbcDataSource$
> ResultSetIterator.hasnext(JdbcDataSource.java:458)
>
> *Best regards* ,
> *Islam omar*
> *Java developer*
>



-- 
Sincerely yours
Mikhail Khludnev


clusterstate.json not updated in zookeeper after creating the collection using API

2017-04-06 Thread Xie, Sean
Hi

I created a collection in a SolrCloud cluster (6.3.0), but found that
clusterstate.json is not updated in ZooKeeper. It's empty.

I'm able to query the cluster state using the API:
admin/collections?action=CLUSTERSTATUS&wt=json

Any reason why clusterstate.json is not updated?

Thanks
Sean

Confidentiality Notice::  This email, including attachments, may include 
non-public, proprietary, confidential or legally privileged information.  If 
you are not an intended recipient or an authorized agent of an intended 
recipient, you are hereby notified that any dissemination, distribution or 
copying of the information contained in or transmitted with this e-mail is 
unauthorized and strictly prohibited.  If you have received this email in 
error, please notify the sender by replying to this message and permanently 
delete this e-mail, its attachments, and any copies of it immediately.  You 
should not retain, copy or use this e-mail or any attachment for any purpose, 
nor disclose all or any part of the contents to any other person. Thank you.


Re: clusterstate.json not updated in zookeeper after creating the collection using API

2017-04-06 Thread Erick Erickson
clusterstate.json is a remnant of when all collection information was
held in that single node. In current versions it should always be empty.

The state for the collection should be in
collections>>collection_name1>>state.json
collections>>collection_name2>>state.json

and so on.

Best,
Erick

On Thu, Apr 6, 2017 at 8:10 PM, Xie, Sean  wrote:
> Hi
>
> I created a collection in a SolrCloud cluster (6.3.0), but found that
> clusterstate.json is not updated in ZooKeeper. It's empty.
>
> I'm able to query the cluster state using the API:
> admin/collections?action=CLUSTERSTATUS&wt=json
>
> Any reason why clusterstate.json is not updated?
>
> Thanks
> Sean
>


Re: Expressing "not equals" in Block Join Parent Query

2017-04-06 Thread Zheng Lin Edwin Yeo
Thanks for your reply.

So there is still a need to include accountNo_s:* before the "not equals"
clause?

Regards,
Edwin


On 6 April 2017 at 15:49, Mikhail Khludnev  wrote:

> &fq={!parent which="contentType_s:Header"}accountNo_s:* -accountNo_s:"123456"
>
> On 05 Apr 2017 at 21:47, "Zheng Lin Edwin Yeo" <
> edwinye...@gmail.com> wrote:
>
> > Hi,
> >
> > Is there any way in which we can express "not equals" in a Block Join
> > Parent Query?
> >
> > For example, I want to find documents where accountNo_s is not equal to
> > 123456.
> >
> > Currently, I am writing it this way for it to work:
> > &fq={!parent which="contentType_s:Header"}accountNo_s:* AND !accountNo_s:"123456"
> >
> > It does not work when I put it this way:
> > &fq={!parent which="contentType_s:Header"}!accountNo_s:"123456"
> >
> > But I don't think it is good practice to add the extra accountNo_s:*
> > clause before it. Is there a better way to express this query?
> >
> > I'm using Solr 6.4.2
> >
> >
> > Regards,
> > Edwin
> >
>