Apache SOLR httpclient library upgrade

2018-09-17 Thread padmanabhan1616
Hi Team,

We are planning to upgrade httpclient-4.3.1.jar to httpclient-4.3.6.jar, as
it fixes some security vulnerabilities.

Is it a good idea to upgrade the httpclient jar alone?

Thanks,
Padmanabhan 





Apache zookeeper jar upgrade for SOLR

2018-09-17 Thread padmanabhan1616
Hi Team,

We are using Apache SOLR-5.2.1 and Zookeeper-3.4.5.

Is it a good idea to upgrade zookeeper-3.4.5.jar to zookeeper-3.4.10.jar for
the SOLR-5.2.1 version?

Is there any impact from upgrading specific libraries for Apache SOLR instead of
doing a full upgrade?

Thanks,
Padmanabhan





Re: Apache zookeeper jar upgrade for SOLR

2018-09-17 Thread Jan Høydahl
Hi

First of all, you are running a super old version of Solr, with its own set of 
known security vulnerabilities[1].
We highly recommend a full upgrade to the latest 7.x version.

In some cases, however, an urgent need may call for temporarily patching single 
jars. Whether that is possible
or not is something you have to assess yourself by reading the changelogs 
for the library in question,
paying special attention to deprecations and back-compat breaks. If your 
research concludes that it should be
possible, you simply have to try it yourself, and thoroughly test the features you 
depend upon in a test environment.

Wrt the zookeeper and httpclient jars you ask about, the version number 
difference indicates that an in-place
upgrade *may* be possible, but you really have to double-check yourself :)

[1] http://lucene.apache.org/solr/news.html 


--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com




Re: Haystack, the search relevance conference comes to London on October 2nd 2018

2018-09-17 Thread Charlie Hull

On 21/08/2018 15:14, Charlie Hull wrote:
> Hi all,
>
> We're very happy to announce the first Haystack Europe conference in
> London on October 2nd.
>
> https://opensourceconnections.com/events/haystack-europe-2018/
>
> Come and hear talks by Doug Turnbull, co-author of Relevant Search,
> Karen Renshaw, Head of Search and Content for Grainger Global Online, and
> other relevance experts, plus the usual networking and knowledge sharing.
>
> Hope to meet some of you there!

Hi all,

Just to note the full conference programme is now up, including talks on
Learning to Rank, tools for visualising and tuning relevance, building
search relevance teams and more. Hope to see some of you there!

https://opensourceconnections.com/events/haystack-europe-2018/

Cheers

Charlie




--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: Explode kind of function in Solr

2018-09-17 Thread Rushikesh Garadade
@joels...@gmail.com
Thanks for the reply. This is what I want.
However, my current implementation uses Spring Boot with Solr, and I did not find
a cartesianProduct implementation in Spring Boot.

Please let me know if you know anything about the implementation of
cartesianProduct in Spring Data.

Thanks,
Rushikesh Garadade

On Thu, Sep 13, 2018 at 6:48 PM Joel Bernstein  wrote:

> Solr Streaming Expressions allow you to do this with the cartesianProduct
> function:
>
>
> http://lucene.apache.org/solr/guide/7_4/stream-decorator-reference.html#cartesianproduct
>
> The structure of the expression is:
>
> cartesianProduct(search(...))
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Thu, Sep 13, 2018 at 6:21 AM Rushikesh Garadade <
> rushikeshgarad...@gmail.com> wrote:
>
> > Hello All,
> > Is there any functionality in solr that can convert (explode) results
> from
> > 1 document to many docuement.
> > *Example: *
> > Lets say I have doc:
> > {
> > id:1,
> > phone: [11,22,33]
> > }
> >
> > when I query to solr with id=1 I want result as below:
> > [{
> > id:1,
> > phone:11
> > },
> > {
> > id:1,
> > phone:22
> > },
> > {
> > id:1,
> > phone:33
> > }]
> >
> > Please let me know if this is possible in Solr , if Yes how?
> >
> > Thanks,
> > Rushikesh Garadade
> >
>
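For reference, a complete streaming expression for the phone example above
might look like this (a minimal sketch; the collection name "phones" and the
use of the /export handler are assumptions, and both fields need docValues):

cartesianProduct(
  search(phones, q="id:1", fl="id,phone", sort="id asc", qt="/export"),
  phone,
  productSort="id asc"
)

Each emitted tuple then carries the id together with a single phone value,
i.e. three tuples for the document in the example.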


20180917-Need Apache SOLR support

2018-09-17 Thread KARTHICKRM
Dear SOLR Team,

We are beginners to Apache SOLR. We need the following clarifications from you.

1.  In SOLRCloud, how can we install more than one shard on a single PC?

2.  What is the maximum number of shards that can be added under one SOLRCloud?

3.  In my application there is no need for ACID properties; other than
this, can I use SOLR as a complete database?

4.  On which OS will we see better performance, Windows Server OS or
Linux?

5.  If a SOLR core contains 2 billion indexes, what is the recommended
RAM size and Java heap space for better performance?

6.  I have 20 fields per document; what is the maximum number of documents
that can be inserted / retrieved in a single request?

7.  If I have billions of indexes, and the "start" parameter is the 10
millionth index and the "end" parameter is start+100, will this case
raise any performance issue?

8.  Which .NET client is best for SOLR?

9.  Is there any limitation for a single field, I mean regarding the size of
blob data?

Thanks,

Karthick.R.M

+91 8124774480



Re: 20180917-Need Apache SOLR support

2018-09-17 Thread Jan Høydahl
> We are beginners to Apache SOLR. We need the following clarifications from you.
> 
> 1.  In SOLRCloud, how can we install more than one shard on a single PC?

You typically have one installation of Solr on each server. Then you can add a 
collection with multiple shards, specifying how many shards you wish when 
creating the collection, e.g.

bin/solr create -c mycoll -shards 4

Although possible, it is normally not advised to install multiple instances of 
Solr on the same server.

> 2.  What is the maximum number of shards that can be added under one SOLRCloud?

There is no limit. You should find a good number based on the number of 
documents, the size of your data, the number of servers in your cluster, 
available RAM and disk size and the required performance.

In practice you will guess the initial #shards and then benchmark a few 
different settings before you decide.
Note that you can also adjust the number of shards as you go, through the 
CREATESHARD / SPLITSHARD APIs, so even if you start out with few shards you can 
grow later.
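For example, splitting an existing shard is a single Collections API call (a
sketch; the collection name "mycoll" and shard name "shard1" are assumptions):

http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycoll&shard=shard1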

> 3.  In my application there is no need for ACID properties; other than
> this, can I use SOLR as a complete database?

You COULD, but Solr is not intended to be your primary data store. You should 
always design your system so that you can re-index all content from some source 
(does not need to be a database) when needed. There are several use cases for a 
complete re-index that you should consider.

> 4.  On which OS will we see better performance, Windows Server OS or
> Linux?

I'd say Linux if you can. If you HAVE to, then you could also run on Windows :-)

> 5.  If a SOLR core contains 2 billion indexes, what is the recommended
> RAM size and Java heap space for better performance?

It depends. It is not likely that you will ever put 2bn docs in one single 
core. Normally you would have sharded long before that number.
The amount of physical RAM and the amount of Java heap to allocate to Solr must 
be calculated and decided on a per case basis.
You could also benchmark this - test if a larger RAM size improves performance 
due to caching. Depending on your bottlenecks, adding more RAM may be a way to 
scale further before needing to add more servers.

Sounds like you should consult with a Solr expert to dive deep into your exact 
usecase and architect the optimal setup for your case, if you have these 
amounts of data.

> 6.  I have 20 fields per document; what is the maximum number of documents
> that can be inserted / retrieved in a single request?

No limit. But there are practical limits.
For indexing (update), attempt various batch sizes and find which gives the 
best performance for you. It is just as important to do inserts (updates) in 
many parallel connections as in large batches.

For searching, why would you want to know a maximum? Normally the use case for 
search is to get the TOP N docs, not a maximum number.
If you need to retrieve thousands of results, you should have a look at the /export 
handler and/or streaming expressions.
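As a sketch, an /export request looks like a normal query but streams the full
sorted result set (the collection and field names here are assumptions; the
exported fields must have docValues):

http://localhost:8983/solr/mycoll/export?q=*:*&fl=id,name&sort=id+asc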

> 7.  If I have billions of indexes, and the "start" parameter is the 10
> millionth index and the "end" parameter is start+100, will this case
> raise any performance issue?

Don't do it!
This is a warning sign that you are using Solr in a wrong way.

If you need to scroll through all docs in the index, have a look at streaming 
expressions or cursorMark instead!

> 8.  Which .NET client is best for SOLR?

The only one I'm aware of is SolrNet. There may be others. None of them are 
supported by the Solr project.

> 9.  Is there any limitation for a single field, I mean regarding the size of
> blob data?

I think there is some default cutoff for very large values.

Why would you want to put very large blobs into documents?
This is a warning flag that you may be using the search index in a wrong way. 
Consider storing large blobs outside of the search index and reference them 
from the docs.


In general, it would help a lot if you started by telling us WHAT you intend to use 
Solr for, what you are trying to achieve, what performance goals/requirements you have, 
etc., instead of a lot of very specific max/min questions. There are very seldom 
hard limits, and if there are, it is usually not a good idea to approach them :)

Jan



Re: 20180917-Need Apache SOLR support

2018-09-17 Thread Susheel Kumar
I'd highly advise using the Java library (SolrJ) to connect to Solr
rather than .NET. There are many things taken care of by CloudSolrClient and other
classes when communicating with a Solr Cloud that has shards/replicas etc., and
if your .NET port of SolrJ is not up to date / doesn't have all the functionality
(which I am sure is the case), you may run into issues.
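
As a minimal sketch of what SolrJ gives you out of the box (the ZooKeeper
hosts and the collection name "mycoll" are assumptions):

import java.util.Arrays;
import java.util.Optional;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CloudClientExample {
  public static void main(String[] args) throws Exception {
    // CloudSolrClient discovers the cluster state from ZooKeeper and
    // routes requests to the right shards/replicas automatically.
    try (CloudSolrClient client = new CloudSolrClient.Builder(
        Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181"),
        Optional.empty()).build()) {
      client.setDefaultCollection("mycoll");
      QueryResponse rsp = client.query(new SolrQuery("*:*"));
      System.out.println("Found " + rsp.getResults().getNumFound() + " docs");
    }
  }
}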

Thnx



Re: Logging fails when starting Solr in Windows using solr.cmd

2018-09-17 Thread Shawn Heisey

On 9/16/2018 3:05 PM, marcostocch...@gmail.com wrote:

I experienced the same issue on Windows 10 professional. I don't think the OS 
version is important. The solr version might. I have solr-7.4.0.


The problem has been fixed in the source code and the next version of 
Solr (7.5.0) won't experience this issue.


https://issues.apache.org/jira/browse/SOLR-12538

We had a problem with Jira and ended up erasing all logs in Jira related 
to the change.  The git repository still has it.


https://git1-us-west.apache.org/repos/asf?p=lucene-solr.git;a=commit;h=93ae3669

Thanks,
Shawn



Re: Logging fails when starting Solr in Windows using solr.cmd

2018-09-17 Thread Erick Erickson
The file:/// change was made in
https://issues.apache.org/jira/browse/SOLR-12538, so how do we reconcile
these two?
On Sun, Sep 16, 2018 at 10:54 PM marcostocch...@gmail.com
 wrote:
>
>
>
> On 2018/07/03 08:53:20, ja...@jafurrer.ch wrote:
> > Hi,
> >
> > I was intending to open an Issue in Jira when I read that I'm supposed
> > to first contact this mailing list.
> >
> > Problem description
> > ==
> >
> > System: Microsoft Windows 10 Enterprise Version 10.0.16299 Build 16299
> >
> > Steps to reproduce the problem:
> > 1) Download solr-7.4.0.tgz
> > 2) Unzip to C:\solr-7.4.0
> > 3) No changes (configuration or otherwise) whatsoever
> > 4) Open cmd.exe
> > 5) Execute the following command: cd c:\solr-7.4.0\bin
> > 6) Execute the following command: solr.cmd start -p 8983
> > 7) The following console output appears:
> >
> >
> > c:\solr-7.4.0\bin>solr.cmd start -p 8983
> > ERROR StatusLogger Unable to access
> > file:/c:/solr-7.4.0/server/file:c:/solr-7.4.0/server/scripts/cloud-scripts/log4j2.xml
> >   java.io.FileNotFoundException:
> > c:\solr-7.4.0\server\file:c:\solr-7.4.0\server\scripts\cloud-scripts\log4j2.xml
> > (The filename, directory name, or volume label syntax is incorrect)
> >  at java.io.FileInputStream.open0(Native Method)
> >  at java.io.FileInputStream.open(FileInputStream.java:195)
> >  at java.io.FileInputStream.<init>(FileInputStream.java:138)
> >  at java.io.FileInputStream.<init>(FileInputStream.java:93)
> >  at
> > sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
> >  at
> > sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
> >  at java.net.URL.openStream(URL.java:1045)
> >  at
> > org.apache.logging.log4j.core.config.ConfigurationSource.fromUri(ConfigurationSource.java:247)
> >  at
> > org.apache.logging.log4j.core.config.ConfigurationFactory$Factory.getConfiguration(ConfigurationFactory.java:404)
> >  at
> > org.apache.logging.log4j.core.config.ConfigurationFactory$Factory.getConfiguration(ConfigurationFactory.java:346)
> >  at
> > org.apache.logging.log4j.core.config.ConfigurationFactory.getConfiguration(ConfigurationFactory.java:260)
> >  at
> > org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:615)
> >  at
> > org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:636)
> >  at
> > org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:231)
> >  at
> > org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:153)
> >  at
> > org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:45)
> >  at
> > org.apache.logging.log4j.LogManager.getContext(LogManager.java:194)
> >  at
> > org.apache.logging.log4j.spi.AbstractLoggerAdapter.getContext(AbstractLoggerAdapter.java:121)
> >  at
> > org.apache.logging.slf4j.Log4jLoggerFactory.getContext(Log4jLoggerFactory.java:43)
> >  at
> > org.apache.logging.log4j.spi.AbstractLoggerAdapter.getLogger(AbstractLoggerAdapter.java:46)
> >  at
> > org.apache.logging.slf4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:29)
> >  at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:358)
> >  at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:383)
> >  at org.apache.solr.util.SolrCLI.<clinit>(SolrCLI.java:228)
> > ERROR StatusLogger Unable to access
> > file:/c:/solr-7.4.0/server/file:c:/solr-7.4.0/server/resources/log4j2.xml
> >   java.io.FileNotFoundException:
> > c:\solr-7.4.0\server\file:c:\solr-7.4.0\server\resources\log4j2.xml (The
> > filename, directory name, or volume label syntax is incorrect)
> >  at java.io.FileInputStream.open0(Native Method)
> >  at java.io.FileInputStream.open(FileInputStream.java:195)
> >  at java.io.FileInputStream.<init>(FileInputStream.java:138)
> >  at java.io.FileInputStream.<init>(FileInputStream.java:93)
> >  at
> > sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
> >  at
> > sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
> >  at java.net.URL.openStream(URL.java:1045)
> >  at
> > org.apache.logging.log4j.core.config.ConfigurationSource.fromUri(ConfigurationSource.java:247)
> >  at
> > org.apache.logging.log4j.core.config.ConfigurationFactory$Factory.getConfiguration(ConfigurationFactory.java:404)
> >  at
> > org.apache.logging.log4j.core.config.ConfigurationFactory$Factory.getConfiguration(ConfigurationFactory.java:346)
> >  at
> > org.apache.logging.log4j.core.config.ConfigurationFactory.getConfiguration(ConfigurationFactory.java:260)
> >  at
> > org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:615)
> >   

Re: 20180917-Need Apache SOLR support

2018-09-17 Thread Walter Underwood
Do not use Solr as a database. It was never designed to be a database.
It is missing a lot of features that are normal in databases.

* no transactions
* no rollback (in Solr Cloud)
* no session isolation (one client’s commit will commit all data in progress)
* no schema migration
* no version migration
* no real backups (Solr backup is a cold server, not a dump/load)
* no dump/load
* no modify-record support (atomic updates are a subset of this)

Solr assumes you can always reload all the data from a repository. This is done
instead of migration or backups.

If you use Solr as a database and lose all your data, don’t blame us. It was
never designed to do that.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


OOM Solr 4.8.1

2018-09-17 Thread Vincenzo D'Amore
Hi there,

recently I had a few Java OOMs in my Solr 4.8.1 instance.

Here is the configuration I have.

-Djava.util.logging.config.file=/opt/tomcat/conf/logging.properties
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Dsolr.log=/opt/tomcat/logs
-DzkHost=ep-1:2181,ep-2:2181,ep-3:2181
-Dsolr.solr.home=/store/solr
-Xms2g -Xmx16g
-server
-XX:+UseG1GC
-XX:+ParallelRefProcEnabled
-XX:G1HeapRegionSize=8m
-XX:MaxGCPauseMillis=400
-XX:+UseLargePages
-XX:+AggressiveOpts
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/opt/tomcat/dumpoom/dump.hprof
-Dcom.sun.management.jmxremote.port=1616
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.rmi.port=1616
-Dcom.sun.management.jmxremote.local.only=false
-Djava.rmi.server.hostname=localhost
-Djava.endorsed.dirs=/opt/tomcat/endorsed

This is the error:

org.apache.solr.common.SolrException: Error while processing facet fields:
java.lang.OutOfMemoryError: Java heap space

Here is the complete stacktrace:
https://gist.github.com/freedev/a14aa9e6ae33fc3ddb2f02d602b34e2b

I suppose these errors are generated by an increase in traffic coming from
crawlers/spiders.
So, given the sudden appearance of these errors, I've configured a memory dump
of the JVM in case of OOM.

Analyzing the memory dump with the Eclipse Memory Analyzer and running the
usual "Leak Suspects Report", I found that about 80% of the
memory was occupied by one instance of FieldCacheImpl:

One instance of "org.apache.lucene.search.FieldCacheImpl" loaded by
"org.apache.catalina.loader.WebappClassLoader @ 0x3c145b028" occupies
8,248,329,008 (79.69%) bytes. The memory is accumulated in one instance of
"java.util.WeakHashMap$Entry[]" loaded by "<system class loader>".

I was unable to understand which field it was; it seems to be a float.

Does anyone have any advice for me? For a long time this server has worked well,
without problems. Recently we have huge traffic coming from
spiders/crawlers, but I don't understand how these requests can consume all
the available memory.

Best regards,
Vincenzo

-- 
Vincenzo D'Amore


Re: 20180917-Need Apache SOLR support

2018-09-17 Thread Shawn Heisey

On 9/17/2018 7:04 AM, KARTHICKRM wrote:

Dear SOLR Team,

We are beginners to Apache SOLR. We need the following clarifications from you.


Much of what I'm going to say is a mirror of what you were already told 
by Jan.  All of Jan's responses are good.



1.  In SOLRCloud, how can we install more than one shard on a single PC?


One Solr instance can run multiple indexes.  Except for one specific 
scenario that I hope you don't run into, you should NOT run multiple 
Solr instances per server.  There should only be one.  If your query 
rate is very low, then you can get good performance from multiple shards 
per node, but with a high query rate, you'll only want one shard per node.



2.  What is the maximum number of shards that can be added under one SOLRCloud?


There is no practical limit.  If you create enough of them (more than a 
few hundred), you can end up with severe scalability problems related to 
SolrCloud's interaction with ZooKeeper.



3.  In my application there is no need for ACID properties; other than
this, can I use SOLR as a complete database?


Solr is NOT a database.  All of its capability and all the optimizations 
it contains are all geared towards search.  If you try to use it as a 
database, you're going to be disappointed with it.



4.  On which OS will we see better performance, Windows Server OS or
Linux?


From those two choices, I would strongly recommend Linux. If you have 
an open source operating system that you prefer to Linux, go with that.



5.  If a SOLR core contains 2 billion indexes, what is the recommended
RAM size and Java heap space for better performance?


I hope you mean 2 billion documents here, not 2 billion indexes.  Even 
though technically speaking there's nothing preventing SolrCloud from 
handling that many indexes, you'll run into scalability problems long 
before you reach that many.


If you do mean documents ... don't put that many documents in one core.  
That number includes deleted documents, which means there's a good 
possibility of going beyond the actual limit if you try to have 2 
billion documents that haven't been deleted.



6.  I have 20 fields per document; what is the maximum number of documents
that can be inserted / retrieved in a single request?


There's no limit to the number that can be retrieved.  But because the 
entire response must be built in memory, you can run your Solr install 
out of heap memory by trying to build a large response.  Streaming 
expressions can be used for really large results to avoid the memory issues.


As for the number of documents that can be inserted by a single request 
... Solr defaults to a maximum POST body size of 2 megabytes.  This can 
be increased through an option in solrconfig.xml.  Unless your documents 
are huge, this is usually enough to send several thousand at once, which 
should be plenty.
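
The relevant setting is on the <requestParsers> element in solrconfig.xml (a
sketch; the 2 GB multipart value shown here is an assumption, not a
recommendation):

<requestDispatcher>
  <requestParsers multipartUploadLimitInKB="2048000"
                  formdataUploadLimitInKB="2048"/>
</requestDispatcher>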



7.  If I have billions of indexes, and the "start" parameter is the 10
millionth index and the "end" parameter is start+100, will this case
raise any performance issue?


Let's say that you send a request with these parameters, and the index 
has three shards:


start=10000000&rows=100

Every shard in the index is going to return a result to the coordinating 
node of ten million plus 100.  That's thirty million individual 
results.  The coordinating node will combine those results, sort them, 
and then request full documents for the 100 specific rows that were 
requested.  This takes a lot of time and a lot of memory.


For deep paging, use cursorMark.  For large result sets, use streaming 
expressions.  I have used cursorMark ... its only disadvantage is that 
you can't jump straight to an arbitrary page, you must go through all of the 
earlier pages too.  But the last page will be just as fast as page 1.  I 
have never used streaming expressions.
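
A minimal cursorMark loop in SolrJ looks roughly like this (a sketch; the
client setup and the uniqueKey field "id" are assumptions; the sort must
include the uniqueKey):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

void scrollAll(SolrClient client) throws Exception {
  SolrQuery q = new SolrQuery("*:*");
  q.setRows(500);
  q.setSort(SolrQuery.SortClause.asc("id"));   // must sort on the uniqueKey
  String cursorMark = CursorMarkParams.CURSOR_MARK_START;  // "*"
  while (true) {
    q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
    QueryResponse rsp = client.query(q);
    // ... process rsp.getResults() here ...
    String next = rsp.getNextCursorMark();
    if (cursorMark.equals(next)) break;  // same mark twice means we're done
    cursorMark = next;
  }
}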



8.  Which .NET client is best for SOLR?


No idea.  The only client produced by this project is the Java client.  
All other clients are third-party, including .NET clients.



9.  Is there any limitation for a single field, I mean regarding the size of
blob data?


There are technically no limitations here.  But if your data is big 
enough, it begins to cause scalability problems.  It takes time to read 
data off the disk, for the CPU to process it, etc.


In conclusion, I have much the same thing to say as Jan said.  It sounds 
to me like you're not after a search engine, and that Solr might not be 
the right product for what you're trying to accomplish.  I'll say this 
again: Solr is NOT a database.


Thanks,
Shawn



Re: OOM Solr 4.8.1

2018-09-17 Thread Shawn Heisey

On 9/17/2018 9:52 AM, Vincenzo D'Amore wrote:

recently I had few Java OOM in my Solr 4.8.1 instance.

Here the configuration I have.


The only part of your command-line options that matters for OOM is the 
max heap, which is 16GB for your server.  Note, you should set the min 
heap and max heap to the same value.  Java will eventually allocate the 
entire max heap it has been allowed ... better to do so right from the 
start.



This is the error:

org.apache.solr.common.SolrException: Error while processing facet fields:
java.lang.OutOfMemoryError: Java heap space


Your heap isn't big enough.  You have two choices.  Make the heap 
bigger, or change something so Solr doesn't need as much heap memory.


https://wiki.apache.org/solr/SolrPerformanceProblems#Reducing_heap_requirements

If you enable docValues for fields that you use for faceting, much less 
heap memory will be required to get facet results.
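
For example, a facet field would be declared with docValues enabled in
schema.xml (a sketch; the field name and type are assumptions):

<field name="category" type="string" indexed="true" stored="true" docValues="true"/>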


Sometimes the only real way to change how much memory is required is to 
reduce the size of the index.  Put fewer documents into the index, 
probably by spreading the index across multiple servers (shards).


Thanks,
Shawn



Solr standalone health checks

2018-09-17 Thread Christopher Schultz

All,

I can see three possibilities for monitoring a Solr (7.4.0) deployment:

1. bin/solr healthcheck
2. curl /solr/[collection]/admin/ping
3. JMX

Option #1 isn't available unless ZK is in use, and I'm not using ZK in
my case.

Option #2 issues a very simple query and essentially returns a
"service is up" response.

Option #3 requires a JVM to be launched in order to check to see if
things are working well.

I have read about the Prometheus/Grafana reporting, but that includes
much more information about the performance of Solr than I'm currently
interested in.

The basic questions I'd like to have answered on a regular basis are:

1. Is the JVM up (this can be done with a ping, of course)
2. Is the heap healthy? Any OOMEs?
3. Will a sample query return in a reasonable amount of time?

1 and 3 are quite easily done using e.g. /solr/[c]/ping, but #2 is
trickier. I can do this via JMX, but I'd prefer to avoid spinning-up a
whole JVM just to probe Solr for one or two values.

Are there any other options for monitoring Solr that I am missing?

-chris


Re: Boost matches occurring early in the field (offset)

2018-09-17 Thread Chris Hostetter


: I have seen that one. But as I understand spanFirst, it only allows you 
: to define a boost if your span matches, i.e. not a gradually lower score 
: the further down in the document the match is?

I believe you are incorrect.

Unless something has drastically changed in SpanQuery in the past few 
years, all SpanQueries automatically "boost" the resulting scores of 
matching documents based on the "width" of the spans that match -- similar 
to how a phrase query with a high slop value will score higher for a doc 
with one "tight" match than for a doc with one "loose" match...

https://lucene.apache.org/core/7_4_0/core/org/apache/lucene/search/similarities/Similarity.SimScorer.html

So in the specific case of SpanFirst -- any matching span is not 
only anchored (on the left) at the start of the field value, and (on the 
right) by at most the max term position value specified, but the closer the 
sub-span match is to the start of the field value, the smaller the 
resulting Span, and the higher the score.

(If this general relationship of Span "width" to score isn't clear from 
the high level jdocs, then it should probably be called out better? ... 
I'm not sure if it's particularly clear/obvious in the PhraseQuery jdocs 
either)
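
A small Lucene sketch of the SpanFirst behavior described above (the field
and term names are assumptions):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanFirstQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class SpanFirstExample {
  public static void main(String[] args) {
    // Matches "solr" only within the first 10 positions of the "body"
    // field; matches closer to the start produce smaller spans and
    // therefore higher scores.
    SpanFirstQuery q = new SpanFirstQuery(
        new SpanTermQuery(new Term("body", "solr")), 10);
    System.out.println(q);
  }
}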



-Hoss
http://www.lucidworks.com/


Re: Atomic Update Failure With solr.UUID Field

2018-09-17 Thread Chris Hostetter


My suggestion:

* completely avoid using UUIDField
* use StrField instead
* use the UUIDUpdateProcessorFactory if you want solr to generate the 
UUIDs for you when adding a new doc.

The fact that UUIDField internally passes values around as java.util.UUID 
objects (and other classes like it that don't stick to java "primitive" 
values) is the source of a large amount of pain in various places of the 
code base, with almost no value added for end users.
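
A minimal chain in solrconfig.xml might look like this (a sketch; the chain
name and the target field name are assumptions):

<updateRequestProcessorChain name="uuid">
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">MyUUID</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>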


: Date: Wed, 29 Aug 2018 11:11:58 -0700
: From: Stephen Lewis Bianamara 
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Re: Atomic Update Failure With solr.UUID Field
: 
: Hi All,
: 
: Just checking back in. Did anyone have a chance to take a look? Would love
: to get some help here. My design requires docs with many UUIDs which should
: not need to be updated each time and should be optimally performant for
: filters. So I think this bug is currently a hard blocker for me to be able
: to use SOLR :( Is anyone from the SOLR community able to assist? I've
: gathered some additional data in the mean time, and I would really
: appreciate someone familiar with the area taking a look.
: 
: Here are my additional discoveries
: 
:1. Turning on doc values and turning off stored, atomic updates work as
:they're supposed to with UUID
:2. Turning on doc values and turning on stored, atomic updates break as
:before with UUID. Thus it is 100% an effect of turning on stored.
:    3. The error is being thrown here.
: 
: 
: From the point that the error is thrown, I see a couple of possible options
: as to what the fix may be. However, I'm relatively new to the innards of
: the SOLR stack and only an occasional Java dev, so I'd love some guidance
: on the matter.
: 
: Perhaps the fix is to make java.util.UUID implement BytesRef? Perhaps the
: fix is to add another bit of logic after the " if (o instanceof BytesRef) "
: conditional block. Something like, cast the object to a UUID and then
: serialize to a byte array?
: 
: Cheers,
: Stephen
: 
: On Wed, Aug 22, 2018 at 8:53 AM Stephen Lewis Bianamara <
: stephen.bianam...@gmail.com> wrote:
: 
: > Hello again! I found a thread which seems relevant. It looks like someone
: > else found this occurred as well, but did not follow up with repro steps.
: > But I did! :)
: >
: >
: > http://lucene.472066.n3.nabble.com/TransactionLog-doesn-t-know-how-to-serialize-class-java-util-UUID-try-implementing-ObjectResolver-td4332277.html
: >
: > Would love to work together to get this fixed.
: >
: > On Tue, Aug 21, 2018 at 6:50 PM Stephen Lewis Bianamara <
: > stephen.bianam...@gmail.com> wrote:
: >
: >> Hello SOLR Community,
: >>
: >> I'm prototyping a collection on SOLR 6.6.3 with UUID fields, and I'm
: >> hitting some trouble with atomic updates. At a high level, here's the
: >> problem: suppose you have a schema with an optional field of type solr.UUID
: >> field, and a document with a value for that field. Any atomic update on
: >> that document which does not contain the UUID field will fail. Below I
: >> provide an example and then an exact set of repro steps.
: >>
: >> So for example, suppose I have the following doc: {"Id":1,
: >> "SomeString":"woof", "MyUUID":"617c7768-7cc3-42d0-9ae1-74398bc5a3e7"}. If I
: >> run an atomic update on it like {"Id":1,"SomeString":{"set":"meow"}}, it
: >> will fail with message "TransactionLog doesn't know how to serialize class
: >> java.util.UUID; try implementing ObjectResolver?"
: >>
: >> Is this a known issue? Precise repro below. Thanks!
: >>
: >> Exact repro
: >> -
: >> 1. Define collection MyCollection with the following schema:
: >>
: >> [schema XML stripped in transit; it defined the fields Id, SomeString,
: >> and an optional MyUUID field of type solr.UUID, with Id as the uniqueKey]
: >> 2. Create a document {"Id":1, "SomeString":"woof"} in the admin UI
: >> (MyCollection > Documents > /update). The update succeeds and the doc is
: >> searchable.
: >> 3. Apply the following atomic update. It succeeds. {"Id":1,
: >> "SomeString":{"set":"bark"}}
: >> 4. Add a value for MyUUID (either with atomic update or regular). It
: >> succeeds. {"Id":1,  
"MyUUID":{"set":"617c7768-7cc3-42d0-9ae1-74398bc5a3e7"}}
: >> 5. Try to atomically update just the SomeString field. It fails.
: >> {"Id":1,  "SomeString":{"set":"meow"}}
: >>
: >> The error that happens on failure is the following.
: >>
: >> Status: 
{"data":{"responseHeader":{"status":500,"QTime":2},"error":{"metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","org.apache.solr.common.SolrException"],"msg":"TransactionLog
: >> doesn't know how to serialize class java.util.UUID; try implementing
: >> ObjectResolver?","trace":"org.apache.solr.common.SolrException:
: >> TransactionLog doesn't know how to serialize class java.util.UUID; try
: >

Re: Solr standalone health checks

2018-09-17 Thread Shawn Heisey

On 9/17/2018 3:01 PM, Christopher Schultz wrote:

The basic questions I'd like to have answered on a regular basis are:

1. Is the JVM up (this can be done with a ping, of course)
2. Is the heap healthy? Any OOMEs?
3. Will a sample query return in a reasonable amount of time?

1 and 3 are quite easily done using e.g. /solr/[c]/ping, but #2 is
trickier. I can do this via JMX, but I'd prefer to avoid spinning-up a
whole JVM just to probe Solr for one or two values.


If your Solr version is at least 5.5.1 and you're NOT on Windows, number 
2 can also be verified by a ping request.


With a new enough version on the correct operating system, Solr is 
started with an option that will kill the process should an 
OutOfMemoryError occur.  When that happens, it won't be able to answer a 
ping request.
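
For example (the collection name is an assumption):

curl "http://localhost:8983/solr/mycollection/admin/ping?wt=json"

A healthy instance answers with "status":"OK"; after an OOM-triggered kill
there is no process left to answer.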


Here's the issue that fixes a problem with the startup on 5.5.1 or later:

https://issues.apache.org/jira/browse/SOLR-8145

Thanks,
Shawn



Re: Solr standalone health checks

2018-09-17 Thread Christopher Schultz

Shawn,

On 9/17/18 17:21, Shawn Heisey wrote:
> On 9/17/2018 3:01 PM, Christopher Schultz wrote:
>> The basic questions I'd like to have answered on a regular basis
>> are:
>> 
>> 1. Is the JVM up (this can be done with a ping, of course) 2. Is
>> the heap healthy? Any OOMEs? 3. Will a sample query return in a
>> reasonable amount of time?
>> 
>> 1 and 3 are quite easily done using e.g. /solr/[c]/ping, but #2
>> is trickier. I can do this via JMX, but I'd prefer to avoid
>> spinning-up a whole JVM just to probe Solr for one or two
>> values.
> 
> If your Solr version is at least 5.5.1 and you're NOT on Windows,
> number 2 can also be verified by a ping request.

Interesting. I did mention 7.4.0 but not my OS. I'm on Debian Linux,
and I'm running Solr using the Solr-supplied init.d scripts (via solr
install).

> With a new enough version on the correct operating system, Solr is 
> started with an option that will kill the process should an 
> OutOfMemoryError occur.  When that happens, it won't be able to
> answer a ping request.
> 
> Here's the issue that fixes a problem with the startup on 5.5.1 or
> later:
> 
> https://issues.apache.org/jira/browse/SOLR-8145

Given that, I'll go ahead and set things up to do a simple
/solr/[c]/ping request for health-monitoring.

Thanks,
-chris


Re: Is that a mistake or bug?

2018-09-17 Thread zhenyuan wei
Oh~ got it, Thanks

 wrote on Sunday, 16 September 2018 at 9:54 PM:

> I think we both understand you well :) So once again, to explain it to
> you, please have a look at the aforementioned
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/component/ResponseBuilder.java,
> these lines:
>
> final Boolean segmentTerminatedEarly =
> result.getSegmentTerminatedEarly();
> if (segmentTerminatedEarly != null) {
>
> rsp.getResponseHeader().add(SolrQueryResponse.RESPONSE_HEADER_SEGMENT_TERMINATED_EARLY_KEY,
> segmentTerminatedEarly);
> }
>
> Got it now? :)
>
> Petr
> __
> > From: "zhenyuan wei" 
> > To: solr-user@lucene.apache.org
> > Date: 03.09.2018 11:21
> > Subject: Re: Is that a mistake or bug?
> >
>Oh ~ I feel embarrassed to explain it again; maybe my English is not so
>good~ What I actually mean is: IF QueryResult.segmentTerminatedEarly were
>declared as boolean, not Boolean, in QueryResult:
>
>public class QueryResult {
>   private boolean partialResults;
>   private Boolean segmentTerminatedEarly;  // -> private boolean segmentTerminatedEarly;
>   ..
>}
> >
>then in the QueryComponent.process() method, like the following:
>
>QueryResult result = new QueryResult();
>cmd.setSegmentTerminateEarly(params.getBool(CommonParams.SEGMENT_TERMINATE_EARLY,
>    CommonParams.SEGMENT_TERMINATE_EARLY_DEFAULT));
>
>if (cmd.getSegmentTerminateEarly()) {  // this if block could be deleted
>  result.setSegmentTerminatedEarly(Boolean.FALSE);
>}
> >
> >
> >
> >
> >
> >
> >
>  wrote on Monday, 3 September 2018 at 4:52 PM:
> >
> >> Hi, really nope :) Because as MK writes below,
> >> result.segmentTerminatedEarly is used as a 3-state variable.
> >>
> >> The only line that could be improved, is probably replacing
> >> "Boolean.FALSE" by simply "false", but that is really a minor thing...
> >>
> >> Regards
> >>
> >> PB
> >> __
> >> > From: "zhenyuan wei" 
> >> > To: solr-user@lucene.apache.org
> >> > Date: 03.09.2018 10:24
> >> > Subject: Re: Is that a mistake or bug?
> >> >
> >> >I mean, use terminatedEarly as basic boolean type, then  no need to
> >> explicitly
> >> >assign it as Boolean.FALSE,  because basic boolean's default value is
> >> false.
> >> >
> >> >Mikhail Khludnev  wrote on Monday, 3 September 2018 at 4:13 PM:
> >> >
> >> >> Nope. In this case, it will respond terminatedEarly=false even if
> noone
> >> >> request it.
> >> >>
> >> >> On Mon, Sep 3, 2018 at 9:09 AM zhenyuan wei 
> wrote:
> >> >>
> >> >> > Yeah,got it~. So the QueryResult.segmentTerminatedEarly maybe a
> >> boolean,
> >> >> > instead of Boolean,  is better, right?
> >> >> >
> >> >> > Mikhail Khludnev  wrote on Monday, 3 September 2018 at 1:36 PM:
> >> >> >
> >> >> > > It's neither, it's on purpose. By default
> >> >> result.segmentTerminatedEarly
> >> >> > is
> >> >> > > null, hence it doesn't appear in result output. see
> >> >> > > ResponseBuilder.setResult(QueryResult).
> >> >> > > So, if cmd requests early termination, it sets false by default,
> >> >> enabling
> >> >> > > "false" output even it won't be the case. And later it might be
> >> flipped
> >> >> > to
> >> >> > > true.
> >> >> > >
> >> >> > >
> >> >> > > On Mon, Sep 3, 2018 at 5:57 AM zhenyuan wei 
> >> wrote:
> >> >> > >
> >> >> > > > Hi all,
> >> >> > > > I saw the code like following:
> >> >> > > >
> >> >> > > > QueryResult result = new QueryResult();
> >> >> > > >
> >> >> > > >
> >> >> > > >
> >> >> > >
> >> >> >
> >> >>
> >>
> cmd.setSegmentTerminateEarly(params.getBool(CommonParams.SEGMENT_TERMINATE_EARLY,
> >> >> > > > CommonParams.SEGMENT_TERMINATE_EARLY_DEFAULT));
> >> >> > > > if (cmd.getSegmentTerminateEarly()) {
> >> >> > > >   result.setSegmentTerminatedEarly(Boolean.FALSE);
> >> >> > > > }
> >> >> > > >
> >> >> > > > It says if request's param segmentTerminateEarly=true, which
> means
> >> >> > search
> >> >> > > > maybe terminated early within a segment,  then set
> >> >> > > > result.setSegmentTerminatedEarly as false , this code is of a
> >> little
> >> >> > > > confusion
> >> >> > > > .
> >> >> > > >
> >> >> > >
> >> >> > >
> >> >> > > --
> >> >> > > Sincerely yours
> >> >> > > Mikhail Khludnev
> >> >> > >
> >> >> >
> >> >>
> >> >>
> >> >> --
> >> >> Sincerely yours
> >> >> Mikhail Khludnev
> >> >>
> >> >
> >> >
> >>
> >
> >
>


Re: 20180917-Need Apache SOLR support

2018-09-17 Thread Ere Maijala



Shawn Heisey wrote on 17.9.2018 at 19.03:

7.  If I have billions of indexes, and the "start" parameter is the 10
millionth index and the "end" parameter is start+100, will this case
raise any performance issue?


Let's say that you send a request with these parameters, and the index 
has three shards:


start=10000000&rows=100

Every shard in the index is going to return a result to the coordinating 
node of ten million plus 100.  That's thirty million individual 
results.  The coordinating node will combine those results, sort them, 
and then request full documents for the 100 specific rows that were 
requested.  This takes a lot of time and a lot of memory.


What Shawn says above means that even if you give Solr a heap big enough 
to handle that, you'll run into serious performance issues even with a 
light load, since these huge allocations easily lead to 
stop-the-world garbage collections that kill performance. I've tried it, 
and it was bad.


If you are thinking of a user interface that allows jumping to an 
arbitrary result page, you'll have to limit it to some sensible number 
of results (10 000 is probably safe, 100 000 may also work) or use 
something else than Solr. Cursor mark or streaming are great options, 
but only if you want to process all the records. Often the deep paging 
need is practically the need to see the last results, and that can also 
be achieved by allowing reverse sorting.


Regards,
Ere

--
Ere Maijala
Kansalliskirjasto / The National Library of Finland


Re: Boost matches occurring early in the field (offset)

2018-09-17 Thread Ere Maijala
The original question is interesting and also I'd like to boost terms 
with lower positions, but if it's possible with the payload stuff, the 
slides and the article at 
https://lucidworks.com/2017/09/14/solr-payloads/ left me completely 
confused. A simple complete example would be so great.


Regards,
Ere

Alexandre Rafalovitch wrote on 29.8.2018 at 23.51:

TokenOffsetPayloadTokenFilter ? It is mentioned in
https://www.slideshare.net/lucidworks/payloads-in-solr-erik-hatcher-lucidworks
, but no detailed example seems to be given.

I do see this question from time to time, so a definitive feedback
would be useful for the future.

Regards,
Alex.

On 29 August 2018 at 16:18, Jan Høydahl  wrote:

I also tend to use "sentinel tokens" for exact match or to anchor a search. But 
in order to obtain a decaying boost the further down in the article a match is, you'd need 
to write several such span/slop queries with varying slops, e.g. highest boost for first 
10 words, medium boost for first 50 words, low boost for first 150 words, no boost below 
that.

As I wrote in my initial mail, we can do such workarounds, or play with 
payloads etc. But my real question is whether/how it is possible to factor the 
actual term offset information from a matching term into the scoring algorithm? 
Would you need to implement your own Scorer/Weight impl?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com


On 29 Aug 2018, at 15:37, Doug Turnbull wrote:

You can also insert a token at the beginning of the query during analysis
using a char filter. I call these sort of boundary tokens "sentinel
tokens". So a phrase search for "red shoes" becomes " red shoes".
You can add some slop to allow for permissible distance (with

You can also use the Limit Token Count Token Filter and create a copyField,
so if you want to boost on first 10 matches, just limit to 10 tokens then
use this as a boost query
https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-LimitTokenCountFilter

-Doug

On Wed, Aug 29, 2018 at 6:26 AM Mikhail Khludnev  wrote:



https://lucene.apache.org/solr/guide/6_6/other-parsers.html#OtherParsers-XMLQueryParser




On Wed, Aug 29, 2018 at 1:19 PM Jan Høydahl  wrote:


Hi,

Is there an ootb way to boost term matches based on their position/offset
inside a field, so that the term gets a higher score if it occurs in the
befinning of the field and lower boost or a deboost if it occurs towards
the end of a field?

I know that I could index the first part of the text in a new field and
boost on that, but that is kind of "binary".
I could also add the term offset as payload for every term and boost on
that, but this should not be necessary since offset info is already part

of

the index?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com




--
Sincerely yours
Mikhail Khludnev


--
CTO, OpenSource Connections
Author, Relevant Search
http://o19s.com/doug




--
Ere Maijala
Kansalliskirjasto / The National Library of Finland


Re: 20180917-Need Apache SOLR support

2018-09-17 Thread zhenyuan wei
Does that mean a small number of shards gives better performance?
I also have a use case with 3 billion documents; the collection
contains 60 shards now. Would 10 shards be better than 60 shards?



Shawn Heisey  wrote on Tuesday, 18 September 2018 at 12:04 AM:

> On 9/17/2018 7:04 AM, KARTHICKRM wrote:
> > Dear SOLR Team,
> >
> > We are beginners to Apache SOLR. We need the following clarifications from
> you.
>
> Much of what I'm going to say is a mirror of what you were already told
> by Jan.  All of Jan's responses are good.
>
> > 1.  In SOLRCloud, how can we install more than one shard on a single
> PC?
>
> One Solr instance can run multiple indexes.  Except for one specific
> scenario that I hope you don't run into, you should NOT run multiple
> Solr instances per server.  There should only be one.  If your query
> rate is very low, then you can get good performance from multiple shards
> per node, but with a high query rate, you'll only want one shard per node.
>
> Thanks,
> Shawn
>
>


Re: 20180917-Need Apache SOLR support

2018-09-17 Thread Shawn Heisey

On 9/17/2018 9:05 PM, zhenyuan wei wrote:

Does that mean a small number of shards gives better performance?
I also have a use case with 3 billion documents; the collection
contains 60 shards now. Would 10 shards be better than 60 shards?


There is no definite answer to this question.  It depends on a bunch of 
things.  How big is each shard once it's finally built?  What's your 
query rate?  How many machines do you have, and how much memory do those 
machines have?


Thanks,
Shawn



Re: using uuid for documents

2018-09-17 Thread Zahra Aminolroaya
Hello Alfonso,


Thanks. You used the dedupe updateRequestProcessorChain; so for this
application, can't we use the uuid updateRequestProcessorChain on its
own?!


Best,
Zahra





Modify Schema for Solr Cloud

2018-09-17 Thread Rathor, Piyush (US - Philadelphia)
Hi All,

I am new to Solr Cloud.

Can you please let me know how to update the schema on Solr Cloud.

Thanks & Regards
Piyush Rathor
Consultant
Deloitte Digital (Salesforce.com / Force.com)
Deloitte Consulting Pvt. Ltd.
Office: +1 (615) 209 4980
Mobile : +1 (302) 397 1491
prat...@deloitte.com | 
www.deloitte.com
Please consider the environment before printing.

