Generic questions - increase performance

2016-04-13 Thread Bastien Latard - MDPI AG

Dear Folks, :-)

From this source, I read:
"Each incoming request requires a thread [...] If still more 
simultaneous requests (more than maxThreads) are received, they are 
stacked up inside the server socket"


I have a couple of generic questions.

1) *How would an increase of maxThreads affect RAM usage?
e.g.: if I double it, would it use twice as much?*


2) *What are the default values of maxThreads and maxConnections?*
This post says
"maxConnections=10,000 and maxThreads=200"


3) Here is my config (/etc/tomcat7/server.xml):

*Is there a way to kill the request if someone makes a big query (e.g.:
one taking 50 seconds), and either close the connection or get a timeout
after 5 seconds?* *(Or is that the default behavior?)*
Thanks!

Kind regards,
Bastien



Re: Cache problem

2016-04-13 Thread Shawn Heisey
On 4/13/2016 12:57 AM, Bastien Latard - MDPI AG wrote:
> Thank you Shawn & Reth!
>
> So I now have some questions, again.
>
>
> Reminder: I have only Solr running on this server (i.e.: java + tomcat).
>
> /BTW: I previously needed to increase the java heap size because I
> ran out of memory. Actually, you only see 2Gb here (8Gb previously)
> for the JVM because I automatically restart tomcat every 30 minutes,
> for better performance, if no DIH is running./

If you size your heap appropriately and properly tune garbage
collection, restarting like this should be unnecessary.
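As a rough illustration only -- the collector choice and sizes below are
assumptions, not a recommendation for this setup -- a fixed, right-sized
heap with explicit GC tuning might look like this in Tomcat's JAVA_OPTS:

  JAVA_OPTS="-Xms8g -Xmx8g -XX:+UseG1GC \
             -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=250"

The idea is to let a tuned collector keep pauses manageable instead of
working around GC trouble with scheduled restarts.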

> Question #1:
> From the picture above, we see Physical memory: ~60Gb
> *  -> is this because of -Xmx40960m AND -XX:MaxPermSize=20480m ? *

I don't actually know whether permgen is allocated from the heap, or *in
addition* to the heap.  Your current allocated heap size is 20GB, which
means that at most Java is taking up 30GB, but it might be just 20GB. 
The other 30-40GB is used by the operating system -- for disk caching
(the page cache).  It's perfectly normal for physical memory to be
almost completely maxed out.  The physical memory graph is nearly
useless for troubleshooting.

Here's a screenshot of one of my servers:

https://www.dropbox.com/s/55d4x33tpyyaoff/solr-dashboard-physical-mem.png?dl=0

Notice that the max heap here is 8GB ... yet physical memory has 59GB
allocated -- 95 percent.  There are some additional java processes
taking up a few GB, but the vast majority of the memory is used by the
OS page cache.

> Question #2:
> /"The OS caches the actual index files"./
>
> *Does this mean that OS will try to cache 47.48Gb for this index? (if
> not, how can I know the size of the cache)
> /Or are you speaking about the page cache?/

I am talking about the page cache, also known as the disk cache.  The OS
will potentially use *all* unassigned memory for the page cache.  You
can ask your operating system how much memory is being used for this
purpose.
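On Linux, for example, "free -m" reports it in the cached column
(illustrative numbers):

  $ free -m
               total       used       free     shared    buffers     cached
  Mem:         64409      63150       1259          0        412      41872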

> Question #3:
> /"documentCache does live in Java heap"
> /*Is there a way to know the real size used/needed by this caching?*

Solr does not report memory usage with that much detail.  Perhaps one
day it will, but we're not there yet.  The size of an entry in the
documentCache should be approximately the size of the stored data for
that document, plus Java overhead required to hold the data.  The
filterCache is the one that usually uses a large amount of memory.

Thanks,
Shawn



Re: Generic questions - increase performance

2016-04-13 Thread Shawn Heisey
On 4/13/2016 1:50 AM, Bastien Latard - MDPI AG wrote:
> From this source, I read:
> "Each incoming request requires a thread [...] If still more
> simultaneous requests (more than maxThreads) are received, they are
> stacked up inside the server socket"
>
> I have a couple of generic questions.
>
> 1) *How would an increase of maxThreads affect RAM usage?
> e.g.: if I double it, would it use twice as much?*

Java threads themselves use a *very* small amount of memory.  The memory
used will be determined by what those threads are *doing*.

> 2) *What are the default values of maxThreads and maxConnections?*
> This post says
> "maxConnections=10,000 and maxThreads=200"

The maxThreads setting defaults to 200 in every container I've looked
at.  The Jetty that comes with Solr has this setting increased to 10000
-- so that the limit is effectively removed for typical installations.
The tomcat documentation says the following about maxConnections:  For
NIO the default is 10000. For APR/native, the default is 8192.

> 3) Here is my config (/etc/tomcat7/server.xml):
> <Connector ... connectionTimeout="20000"
>            URIEncoding="UTF-8"
>            redirectPort="8443" />
> *Is there a way to kill the request if someone makes a big query (e.g.:
> one taking 50 seconds), and either close the connection or get a timeout
> after 5 seconds?* *(Or is that the default behavior?)*

We strongly recommend using the Jetty that comes with Solr.  This is the
only deployment option that has official support starting with Solr
5.0.  That Jetty has been properly tuned for Solr -- an out-of-the-box
config for Tomcat (or any other container) isn't tuned.

For stopping queries that take too long, Solr has this setting:

https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThetimeAllowedParameter

The timeAllowed parameter does not always work -- it depends on which
phase of a query is taking too long.
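For example, to give up after roughly five seconds (timeAllowed is in
milliseconds; the host and collection names are placeholders):

  http://localhost:8983/solr/collection1/select?q=*:*&timeAllowed=5000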

You can also configure the soTimeout in your client's TCP settings, and
in Tomcat, to kill the connection if it is idle for too long.

There may be other options that would work; these are the ones that I
know about.

Thanks,
Shawn



Re: Soft commit does not affecting query performance

2016-04-13 Thread Bhaumik Joshi
Hi Bill,


Please find below reference.

http://www.cloudera.com/documentation/enterprise/5-4-x/topics/search_tuning_solr.html
* "Enable soft commits and set the value to the largest value that 
meets your requirements. The default value of 1000 (1 second) is too aggressive 
for some environments."


Thanks & Regards,

Bhaumik Joshi



From: billnb...@gmail.com 
Sent: Monday, April 11, 2016 7:07 AM
To: solr-user@lucene.apache.org
Subject: Re: Soft commit does not affecting query performance

Why do you think it would ?

Bill Bell
Sent from mobile


> On Apr 11, 2016, at 7:48 AM, Bhaumik Joshi  wrote:
>
> Hi All,
>
> We are doing query performance tests with different soft commit intervals. In
> the tests with a 1sec soft commit interval and a 1min soft commit interval
> we didn't notice any improvement in query timings.
>
>
>
> We did test with SolrMeter (Standalone java tool for stress tests with Solr) 
> for 1sec soft commit and 1min soft commit.
>
> Index stats of test solr cloud: 0.7 million documents and 1 GB index size.
>
> Solr cloud has 2 shards and each shard has one replica.
>
>
>
> Please find below detailed test readings: (all timings are in milliseconds)
>
>
> Soft commit - 1sec
>
> Queries/sec  Updates/sec  Total Queries  Total Q time  Avg Q time  Total Client time  Avg Client time
> 1            5            100            44340         443         48834               488
> 5            5            101            128914        1276        143239              1418
> 10           5            104            295325        2839        330931              3182
> 25           5            102            675319        6620        793874              7783
>
> Soft commit - 1min
>
> Queries/sec  Updates/sec  Total Queries  Total Q time  Avg Q time  Total Client time  Avg Client time
> 1            5            100            44292         442         48569               485
> 5            5            105            131389        1251        147174              1401
> 10           5            102            299518        2936        337748              3311
> 25           5            108            742639        6876        865222              8011
>
> As theory suggests, soft commit affects query performance, but in my case it
> doesn't. Can you shed some light on this?
> Also suggest if I am missing something here.
>
> Regards,
> Bhaumik Joshi


overseer status documentation

2016-04-13 Thread GOURAUD Emmanuel
Hi all, 

I'm looking for exact definitions of the results returned by an "OVERSEERSTATUS"
API query, and the documentation talks about "various overseer APIs".

Does someone have more information about that?
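For reference, the call in question is a Collections API request (host and
port are placeholders):

  http://localhost:8983/solr/admin/collections?action=OVERSEERSTATUS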

cheers, 




Emmanuel GOURAUD 
Ingénieur Infogérance | IT Outsourcing Engineer 


egour...@jouve.fr 


Tél. : +33(0) 2 43 08 25 54 



1, rue du docteur Sauvé, 53100 Mayenne, France 
www.jouve.com   


Solr facet using gap function

2016-04-13 Thread Ali Nazemian
Dear all,
Hi,

I am wondering, is there any way to introduce and add a custom function for
the facet gap parameter? I already know there is some Date Math that can be
used (such as DAY, MONTH, etc.). I want to add some functions and try to use
them as the gap in a range facet; is that possible?
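For reference, the built-in Date Math form looks like this (field, host and
collection names are placeholders; %2B is the URL-encoded "+"):

  http://localhost:8983/solr/collection1/select?q=*:*&facet=true
      &facet.range=pub_date
      &facet.range.start=NOW/YEAR-1YEAR
      &facet.range.end=NOW
      &facet.range.gap=%2B1MONTH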

Sincerely,
Ali.


Configuring F5 load balancer with Solr cloud to switch between clusters on separate servers

2016-04-13 Thread preeti kumari
Hi,

I have a solr cloud setup.
Two clusters - Primary and Secondary. Each cluster has two collections.
Primary cluster is hosted on Solr Node N1,   N2,  N3
Secondary Cluster is hosted on Solr Nodes N3, N4, N5 .
All solr nodes are on different physical servers.
Primary clusters are running on zk1,zk2,zk3 quorum  whereas Secondary is
running on zk4,zk5,zk6.

Now for production I need to configure F5 such that if I don't have a
response from the Primary cluster, it should route to the Secondary cluster.

Please let me know how F5 would help to achieve this.

How do I configure solr nodes behind an F5 load balancer?
Would any other load balancer help to achieve this, or would using
CloudSolrClient make my setup fault tolerant?

Thanks
Preeti


Re: Cache problem

2016-04-13 Thread Bastien Latard - MDPI AG

Thank you all again for your good and detailed answer.
I will combine all of them to try to build a better environment.

*Just a last question...*
/I don't remember exactly when I needed to increase the java heap.../
/but is it possible that this was for the DataImport.../

*Would the DIH work if it cannot "load" the temporary index into the
java heap in full-import mode?*
I thought that's why I needed to increase this value...but I might be 
confused!


kind regards,
Bastien

On 13/04/2016 09:54, Shawn Heisey wrote:

>Question #1:
> From the picture above, we see Physical memory: ~60Gb
>*  -> is this because of -Xmx40960m AND -XX:MaxPermSize=20480m ? *

I don't actually know whether permgen is allocated from the heap, or *in
addition* to the heap.  Your current allocated heap size is 20GB, which
means that at most Java is taking up 30GB, but it might be just 20GB.
The other 30-40GB is used by the operating system -- for disk caching
(the page cache).  It's perfectly normal for physical memory to be
almost completely maxed out.  The physical memory graph is nearly
useless for troubleshooting.


Kind regards,
Bastien Latard
Web engineer
--
MDPI AG
Postfach, CH-4005 Basel, Switzerland
Office: Klybeckstrasse 64, CH-4057
Tel. +41 61 683 77 35
Fax: +41 61 302 89 18
E-mail:
lat...@mdpi.com
http://www.mdpi.com/



Re: Configuring F5 load balancer with Solr cloud to switch between clusters on separate servers

2016-04-13 Thread preeti kumari
Updating the info :

Primary cluster is hosted on Solr Node N1,   N2,  N3
Secondary Cluster is hosted on Solr Nodes N4, N5, N6.
All solr nodes (N1-N6)  are on different physical servers.

On Wed, Apr 13, 2016 at 3:59 PM, preeti kumari 
wrote:

> Hi,
>
> I have a solr cloud setup.
> Two clusters - Primary and Secondary. Each cluster has two collections.
> Primary cluster is hosted on Solr Node N1,   N2,  N3
> Secondary Cluster is hosted on Solr Nodes N3, N4, N5 .
> All solr nodes are on different physical servers.
> Primary clusters are running on zk1,zk2,zk3 quorum  whereas Secondary is
> running on zk4,zk5,zk6.
>
> Now for production i need to configure F5 such that if i don't have
> response from Primary cluster , i should route it to Secondary cluster.
>
> Please let me know how F5 would help to achieve this.
>
> How to configure solr nodes to F5 load balancer?
> If any other load balancer would help to achieve this or using
> cloudsolrclient will make my setup fault tolerant.
>
> Thanks
> Preeti
>
>
>


Re: Solr document duplicated during pagination

2016-04-13 Thread Anil
Yes Erick.

I have attached the queries generated from the logs.

I see many duplicate records :( . I could not see any duplicates on the solr
admin console.

Each run gives a different number of duplicates.

Do you think the NOT (-) on the query is an issue? Please advise.

Thanks,
Anil




On 10 April 2016 at 21:28, Erick Erickson  wrote:

> If the index is being updated while you're querying, this
> can happen.
>
> But what do you mean by "I see pages 1 & 2 have
> common documents and similarly in other pages as well"?
>
> Is it the _same_ id (as Lior mentions)?
> Docs are "the same" to Solr if and only if they have the
> same <uniqueKey>.
>
> Best,
> Erick
>
> On Sun, Apr 10, 2016 at 6:13 AM, Lior Sapir  wrote:
> > It will not happen but you must:
> > 1. Have Unique ID for each document
> > 2. Make sure you define this field in the schema.xml
> > <uniqueKey>YOUR_DOC_UQ_ID_FIELD_NAME</uniqueKey>
> > 3. If you are using a multi-shard query and not using solr cloud then
> > you have to make sure you are not inserting the same document into two
> > different shards. The uniqueness I mentioned in sections 1,2  is only
> for a
> > specific shard/core. There is no way that one solr core will enforce
> > uniqueness on other shards/cores unless you use solr cloud.
> >
> >
> > On Sun, Apr 10, 2016 at 2:53 PM, Anil  wrote:
> >
> >> HI,
> >>
> >> I am loading solr records for a particular query into an application cache.
> >>
> >> Lets say total number of eligible records (numFound) are 501.
> >>
> >> my solr queries would be
> >>
> >> page 1 : q=*:*&start=0&rows=100
> >> page 2 : q=*:*&start=100&rows=100
> >> page 3 : q=*:*&start=200&rows=100
> >> page 4 : q=*:*&start=300&rows=100
> >> page 5 : q=*:*&start=400&rows=100
> >> page 6 : q=*:*&start=500&rows=100
> >>
> >> I see pages 1 & 2 have common documents, and similarly in other pages as
> well.
> >> Is this correct behavior? Please correct me.
> >>
> >>
> >> Thanks,
> >> Anil
> >>
>
Query : q=-status:"CLOSED" AND 
productType:"p"&rows=100&start=100&debugQuery=false&collection=product-collection_default&fl=id
Query : q=-status:"CLOSED" AND 
productType:"p"&rows=100&start=200&debugQuery=false&collection=product-collection_default&fl=id
Query : q=-status:"CLOSED" AND 
productType:"p"&rows=100&start=300&debugQuery=false&collection=product-collection_default&fl=id
Query : q=-status:"CLOSED" AND 
productType:"p"&rows=100&start=400&debugQuery=false&collection=product-collection_default&fl=id
Query : q=-status:"CLOSED" AND 
productType:"p"&rows=100&start=500&debugQuery=false&collection=product-collection_default&fl=id
Query : q=-status:"CLOSED" AND 
productType:"p"&rows=100&start=600&debugQuery=false&collection=product-collection_default&fl=id
Query : q=-status:"CLOSED" AND 
productType:"p"&rows=100&start=700&debugQuery=false&collection=product-collection_default&fl=id
Query : q=-status:"CLOSED" AND 
productType:"p"&rows=100&start=800&debugQuery=false&collection=product-collection_default&fl=id
Query : q=-status:"CLOSED" AND 
productType:"p"&rows=100&start=900&debugQuery=false&collection=product-collection_default&fl=id
Query : q=-status:"CLOSED" AND 
productType:"p"&rows=100&start=1000&debugQuery=false&collection=product-collection_default&fl=id
Query : q=-status:"CLOSED" AND 
productType:"p"&rows=100&start=1100&debugQuery=false&collection=product-collection_default&fl=id
Query : q=-status:"CLOSED" AND 
productType:"p"&rows=100&start=1200&debugQuery=false&collection=product-collection_default&fl=id
Query : q=-status:"CLOSED" AND 
productType:"p"&rows=100&start=1300&debugQuery=false&collection=product-collection_default&fl=id
Query : q=-status:"CLOSED" AND 
productType:"p"&rows=100&start=1400&debugQuery=false&collection=product-collection_default&fl=id
Query : q=-status:"CLOSED" AND 
productType:"p"&rows=100&start=1500&debugQuery=false&collection=product-collection_default&fl=id
Query : q=-status:"CLOSED" AND 
productType:"p"&rows=100&start=1600&debugQuery=false&collection=product-collection_default&fl=id
Query : q=-status:"CLOSED" AND 
productType:"p"&rows=100&start=1700&debugQuery=false&collection=product-collection_default&fl=id
Query : q=-status:"CLOSED" AND 
productType:"p"&rows=100&start=1800&debugQuery=false&collection=product-collection_default&fl=id
Query : q=-status:"CLOSED" AND 
productType:"p"&rows=100&start=1900&debugQuery=false&collection=product-collection_default&fl=id
Query : q=-status:"CLOSED" AND 
productType:"p"&rows=100&start=2000&debugQuery=false&collection=product-collection_default&fl=id
Query : q=-status:"CLOSED" AND 
productType:"p"&rows=100&start=2100&debugQuery=false&collection=product-collection_default&fl=id
Query : q=-status:"CLOSED" AND 
productType:"p"&rows=100&start=2200&debugQuery=false&collection=product-collection_default&fl=id
Query : q=-status:"CLOSED" AND 
productType:"p"&rows=100&start=2300&debugQuery=false&collection=product-collection_default&fl=id
Query : q=-status:"CLOSED" AND 
productType:"p"&rows=100&start=2400&debugQuery=false&collection=product-collection_default&fl=id

RE: EmbeddedSolr for unit tests in Solr 6

2016-04-13 Thread Rohana Rajapakse
Thanks Shalin,

I am now trying to use MiniSolrCloudCluster to create a mini cluster in solr 6/7
as below:
 
MiniSolrCloudCluster miniCluster = new MiniSolrCloudCluster(1, 
temp_folder_path, path_to_solr.xml, null);


It throws the following exception:

org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
NodeExists for /solr/solr.xml
at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
at 
org.apache.solr.common.cloud.SolrZkClient$10.execute(SolrZkClient.java:503)
at 
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:500)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:458)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:445)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:432)
at 
org.apache.solr.cloud.MiniSolrCloudCluster.(MiniSolrCloudCluster.java:199)
at 
org.apache.solr.cloud.MiniSolrCloudCluster.(MiniSolrCloudCluster.java:168)
at 
com.gossinteractive.solr.TestMiniSolrCloudCluster.setup(TestMiniSolrCloudCluster.java:52)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
at 
org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
at 
org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)


This is a Zookeeper exception and I cannot figure out what is wrong. Can 
someone shed some light please?

Rohana


-Original Message-
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] 
Sent: 12 April 2016 13:19
To: solr-user@lucene.apache.org
Subject: Re: EmbeddedSolr for unit tests in Solr 6

Rohana, as I said earlier, the MiniSolrCloudCluster is specifically made for 
your use-case i.e. where you want to quickly setup a SolrCloud cluster in your 
own application for testing. It is available in the solr-test-framework 
artifact.

On Tue, Apr 12, 2016 at 4:31 PM, Rohana Rajapakse < 
rohana.rajapa...@gossinteractive.com> wrote:

> Please note that I am not writing unit tests for testing classes in Solr.
> I need a temporary Solr index to test classes in my own application 
> that needs a Solr index. I would like to use classes that are 
> available in solr-core and solr-solrj jars. I could do this easily in 
> solr-4.x versions using EmbeddedSolrServer. I prefer not to extend 
> SolrTestCaseJ4 class. Also MiniSolrCloudCluster is not available in solr-core 
> or solr-solrj jar.
>
> What is the best way of doing this in Solr-6.x / Solr-7.0  ?
>
> -Original Message-
> From: Joe Lawson [mailto:jlaw...@opensourceconnections.com]
> Sent: 11 April 2016 17:31
> To: solr-user@lucene.apache.org
> Subject: Re: EmbeddedSolr for unit tests in Solr 6
>
> Check for example tests here too:
>
> https://github.com/apache/lucene-solr/tree/master/solr/core/src/test/o
> rg/apache/solr
>
> On Mon, Apr 11, 2016 at 12:24 PM, Shalin Shekhar Mangar < 
> shalinman...@gmail.com> wrote:
>
> > Please use MiniSolrCloudCluster instead of EmbeddedSolrServer for 
> > unit/integration tests.
> >
> > On Mon, Apr 11, 2016 at 2:26 PM, Rohana Rajapakse < 
> > rohana.rajapa...@gossinteractive.com> wrote:
> >
> > > Thanks Shawn,
> > >
> > > I am now pointing solrHomeFolder to 
> > > lucene-solr-master\solr\server\solr
> > > which contains the correct solr.xml file.
> > > Tried the following two ways to create an EmbeddedSolrServer:
> > >
> > >
> > > 1. CoreContainer corecon =
> > > CoreContainer.createAndLoad(Paths.get(solr

CloudDescription sometimes null

2016-04-13 Thread Markus Jelsma
Hello - we use CloudDescriptor to get information about the collection. Very 
early after starting Solr, we obtain an instance:
   cloudDescriptor = core.getCoreDescriptor().getCloudDescriptor();

In some strange cases, at some later point, cloudDescriptor is null. Is it
possible cloudDescriptor is being set at some later stage in Solr? When reading
cloud information, do I always have to get a new cloudDescriptor instance?

Many thanks!
Markus


Re: boost parent fields BlockJoinQuery

2016-04-13 Thread michael solomon
Thank your for the response.
this query worked without errors:

> (city:"tucson"^1000) +{!parent which="is_parent:true"
> score=max}(normal_text:"silver ring")
>
however, this is not exactly what I was looking for..
I got from solr all the documents whose city field has the value
Tucson,
but I wanted to BOOST only on city:"Tucson", not search in this field.
Thank you a lot,
Micheal

On Tue, Apr 12, 2016 at 10:41 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Going by the error message, you under-copy-pasted the search query and
> omitted the closing bracket.
>
> On Tue, Apr 12, 2016 at 3:30 PM, michael solomon 
> wrote:
>
> > Thanks,
> > when I'm trying:
> > city:"walla walla"^10 {!parent which="is_parent:true"
> > score=max}(normal_text:walla)
> > I get:
> >
> > > "msg": "org.apache.solr.search.SyntaxError: Cannot parse
> > > '(normal_text:walla': Encountered \"<EOF>\" at line 1, column 18.\nWas
> > > expecting one of:\n ...\n ...\n ...\n
> \"+\"
> > > ...\n\"-\" ...\n ...\n\"(\" ...\n\")\" ...\n
> > > \"*\" ...\n\"^\" ...\n ...\n ...\n
> > >  ...\n ...\n ...\n
> > >  ...\n\"[\" ...\n\"{\" ...\n ...\n
> > > \"filter(\" ...\n ...\n"
> >
> >
> > On Tue, Apr 12, 2016 at 1:30 PM, Mikhail Khludnev <
> > mkhlud...@griddynamics.com> wrote:
> >
> > > Hello,
> > >
> > > It's usually
> > > parent_field:"bla bla"^10 {!parent which="is_parent:true"
> > > score=max}(child_field:bla)
> > > or
> > > parent_field:"bla bla"^10 +{!parent which="is_parent:true"
> > > score=max}(child_field:bla)
> > >
> > > there should be no spaces in the child clause, otherwise extract it to a
> param
> > > and refer to it via v=$param
> > >
> > >
> > > On Tue, Apr 12, 2016 at 9:56 AM, michael solomon  >
> > > wrote:
> > >
> > > > Hi,
> > > > I'm using in BlockJoin Parser Query for return the parent of the
> > relevant
> > > > child i.e:
> > > > {!parent which="is_parent:true" score=max}(child_field:bla)
> > > >
> > > > It's possible to boost the parent? something like:
> > > >
> > > > {!parent which="is_parent:true" score=max}(child_field:bla)
> > > > parent_field:"bla bla"^10
> > > > Thanks,
> > > > Michael
> > > >
> > >
> > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > > Principal Engineer,
> > > Grid Dynamics
> > >
> > > 
> > > 
> > >
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>


Re: Configuring F5 load balancer with Solr cloud to switch between clusters on separate servers

2016-04-13 Thread Shawn Heisey
On 4/13/2016 4:29 AM, preeti kumari wrote:
> I have a solr cloud setup.
> Two clusters - Primary and Secondary. Each cluster has two collections.
> Primary cluster is hosted on Solr Node N1,   N2,  N3
> Secondary Cluster is hosted on Solr Nodes N3, N4, N5 .
> All solr nodes are on different physical servers.
> Primary clusters are running on zk1,zk2,zk3 quorum  whereas Secondary is
> running on zk4,zk5,zk6.
>
> Now for production i need to configure F5 such that if i don't have
> response from Primary cluster , i should route it to Secondary cluster.
>
> Please let me know how F5 would help to achieve this.
>
> How to configure solr nodes to F5 load balancer?
> If any other load balancer would help to achieve this or using
> cloudsolrclient will make my setup fault tolerant.

You will need to ask someone who knows about the F5 how to do this. 
This question is outside the scope of this mailing list.

With haproxy (software load balancer) instead of F5, you would set all
of the servers in the second cluster as backup servers, which means they
would not be considered for usage unless all of the first cluster
servers were down.  I use haproxy with Solr, so I happen to know how it
works.

CloudSolrClient (from SolrJ) is only designed to handle one cluster --
it would not be able to fail over to a second cluster.

Thanks,
Shawn



Re: Cache problem

2016-04-13 Thread Shawn Heisey
On 4/13/2016 4:34 AM, Bastien Latard - MDPI AG wrote:
> Thank you all again for your good and detailed answer.
> I will combine all of them to try to build a better environment.
>
> *Just a last question...*
> /I don't remember exactly when I needed to increase the java heap.../
> /but is it possible that this was for the DataImport.../
>
> *Would the DIH work if it cannot "load" the temporary index into the
> java heap in the full-index mode?*
> I thought that's why I needed to increase this value...but I might be
> confused!

The default behavior on many JDBC drivers is to load the *entire* SQL
result into memory *before* sending those results to the requesting
application.  This is the way the MySQL driver behaves by default, and
the way that older versions of the Microsoft driver for SQL Server
behave by default.

There should be a way to tell the JDBC driver to stream the results back
instead of loading them into memory.  For MySQL, you just have to set
the batchSize parameter in the DIH config to -1, which causes the
underlying code to do "setFetchSize(Integer.MIN_VALUE)".  For SQL
Server, you need a recent version of the driver, where they changed the
default behavior.   For other databases, you may need a JDBC url parameter.

Thanks,
Shawn



Re: Solr document duplicated during pagination

2016-04-13 Thread Shawn Heisey
On 4/13/2016 4:57 AM, Anil wrote:
> Yes Erick.
>
> I have attached the queries generated from the logs.
>
> I see many duplicate records :( . I could not see any duplicates on the
> solr admin console.
>
> Each run gives a different number of duplicates.
>
> Do you think the NOT (-) on the query is an issue? Please advise.

There are two ways this can happen.  One is that the index has changed
between different queries, pushing or pulling results between the end of
one page and the beginning of the next page.  The other is having the
same uniqueKey value in more than one shard.

Lior Sapir indicated that SolrCloud would behave differently and
eliminate all duplicates from multiple shards, but this is *not* the
case.  Both cloud and non-cloud behave the same.  When the duplicates
are on different pages, they will not be filtered out.  Solr *will*
eliminate duplicates from all results *in the same query* ... but
different pages are different queries.

Thanks,
Shawn



number of zookeeper & aws instances

2016-04-13 Thread Jay Potharaju
Hi,

In my current setup I have about 30 million docs which will grow to 100
million by the end of the year. In order to accommodate scaling and query
load, i am planning to have atleast 2 shards and 2/3 replicas to begin
with. With the above solrcloud setup I plan to have 3 zookeepers in the
quorum.

If the number of replicas and shards increases, the number of solr
instances will also go up. With keeping that in mind I was wondering if
there are any guidelines on the number of zk instances to solr instances.

Secondly are there any recommendations for setting up solr in AWS?

-- 
Thanks
Jay


Adding custom filter plugin to solr cloud

2016-04-13 Thread Harsha JSN
Hi,
  I have set up solr cloud with some set of nodes. I am trying to add an
external library which has custom query parser logic. I have done this by
copying the custom jar file to the lib folder on each node.
May I know if this is the correct way to do it, or is there a standard way to
add custom libraries in solr cloud?

Thanks
Harsha.


Re: Adding custom filter plugin to solr cloud

2016-04-13 Thread Erick Erickson
Copying files "to the right place" certainly is one way. It does
suffer from bookkeeping
issues, i.e. when you make changes you have to get the new jar files
all pushed out
to the right place on all the nodes.

Another possibility is:
https://cwiki.apache.org/confluence/display/solr/Adding+Custom+Plugins+in+SolrCloud+Mode
more docs here:
https://cwiki.apache.org/confluence/display/solr/Blob+Store+API

I haven't tried these personally, but as you can see the intent is to
deal with just
this case.
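Per those docs, the flow is roughly the following (jar, blob and collection
names are placeholders; it assumes the .system collection exists and Solr was
started with -Denable.runtime.lib=true):

  # upload the jar to the .system collection
  curl -X POST -H 'Content-Type: application/octet-stream' \
       --data-binary @myparser.jar \
       http://localhost:8983/solr/.system/blob/myparser

  # register it for a collection via the Config API
  curl http://localhost:8983/solr/collection1/config \
       -H 'Content-Type: application/json' \
       -d '{"add-runtimelib": {"name": "myparser", "version": 1}}'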

Best,
Erick

On Wed, Apr 13, 2016 at 6:52 AM, Harsha JSN  wrote:
> Hi,
>   I had set up solr cloud with some set of nodes. I am trying to add an
> external library which has custom query parser logic. I have done this by
> copying the custom jar file to lib folder in each node.
> May i know if this is the correct way to do or is there a standard way to
> add custom libraries in solr cloud.
>
> Thanks
> Harsha.


Re: number of zookeeper & aws instances

2016-04-13 Thread Erick Erickson
For collections with this few nodes, 3 zookeepers are plenty. From
what I've seen people don't go to 5 zookeepers until they have
hundreds and hundreds of nodes.

100M docs can fit on 2 shards, I've actually seen many more. That
said, if the docs are very large and/or the searchers are complex
performance may not be what you need. Here's a long blog on
testing a configuration to destruction to be _sure_ you can scale
as you need:

https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Best,
Erick

On Wed, Apr 13, 2016 at 6:47 AM, Jay Potharaju  wrote:
> Hi,
>
> In my current setup I have about 30 million docs which will grow to 100
> million by the end of the year. In order to accommodate scaling and query
> load, I am planning to have at least 2 shards and 2/3 replicas to begin
> with. With the above solrcloud setup I plan to have 3 zookeepers in the
> quorum.
>
> If the number of replicas and shards increases, the number of solr
> instances will also go up. With keeping that in mind I was wondering if
> there are any guidelines on the number of zk instances to solr instances.
>
> Secondly are there any recommendations for setting up solr in AWS?
>
> --
> Thanks
> Jay


Re: number of zookeeper & aws instances

2016-04-13 Thread Jay Potharaju
Thanks for the feedback Eric.
I am assuming the number of replicas helps with load balancing and reliability.
That being said, are there any recommendations for that, or is it dependent on
query load and performance SLAs?

Any suggestions on aws setup?
Thanks


> On Apr 13, 2016, at 7:12 AM, Erick Erickson  wrote:
> 
> For collections with this few nodes, 3 zookeepers are plenty. From
> what I've seen people don't go to 5 zookeepers until they have
> hundreds and hundreds of nodes.
> 
> 100M docs can fit on 2 shards, I've actually seen many more. That
> said, if the docs are very large and/or the searchers are complex
> performance may not be what you need. Here's a long blog on
> testing a configuration to destruction to be _sure_ you can scale
> as you need:
> 
> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> 
> Best,
> Erick
> 
>> On Wed, Apr 13, 2016 at 6:47 AM, Jay Potharaju  wrote:
>> Hi,
>> 
>> In my current setup I have about 30 million docs which will grow to 100
>> million by the end of the year. In order to accommodate scaling and query
>> load, i am planning to have atleast 2 shards and 2/3 replicas to begin
>> with. With the above solrcloud setup I plan to have 3 zookeepers in the
>> quorum.
>> 
>> If the number of replicas and shards increases, the number of solr
>> instances will also go up. With keeping that in mind I was wondering if
>> there are any guidelines on the number of zk instances to solr instances.
>> 
>> Secondly are there any recommendations for setting up solr in AWS?
>> 
>> --
>> Thanks
>> Jay


Re: CloudDescription sometimes null

2016-04-13 Thread Erick Erickson
It takes a little time for core discovery to enumerate all of the
cores and fill in the various descriptors. That said, I'd be surprised
if you actually can hit this very often since the coreDescriptor
creation code also creates the cloudDescriptor and they're both loaded
by the enumeration process and are just loaded from the
core.properties file.

And the coreDescriptor isn't even added to the list of coreDescriptors
until the cloudDescriptor has been built, so I'd only ever expect
getCoreDescriptor() to return null, but _not_
getCoreDescriptor().getCloudDescriptor().

So I'm really puzzled (or reading the code wrong).

Erick

On Wed, Apr 13, 2016 at 5:11 AM, Markus Jelsma
 wrote:
> Hello - we use CloudDescriptor to get information about the collection. Very 
> early after starting Solr, we obtain an instance:
>cloudDescriptor = core.getCoreDescriptor().getCloudDescriptor();
>
> In some strange cases, at some later point cloudDescriptor is null? Is it 
> possible cloudDescriptor is being set at some later stage in Solr? When 
> reading cloud information, do i always have to get a new cloudDescriptor 
> instance?
>
> Many thanks!
> Markus


Re: number of zookeeper & aws instances

2016-04-13 Thread Erick Erickson
bq: or is it dependent on query load and performance sla's

Exactly. The critical bit is that every single replica meets your SLA.
By that I mean let's claim that your SLA is 500ms. If you can
serve 10 qps at that SLA with one replica/shard (i.e. leader only)
you can serve 50 QPS by adding 4 more replicas.

What you _cannot_ do is reduce the 500ms response time by
adding more replicas. You'll need to add more shards, which probably
means re-indexing. Which is why I recommend pushing a test system
to destruction before deciding on the final numbers.

And having at least 2 replicas per shard (leader and replica) is usually
a very good thing because Solr will stop serving queries or indexing
if all the replicas for any shard are down.

Best,
Erick

On Wed, Apr 13, 2016 at 7:19 AM, Jay Potharaju  wrote:
> Thanks for the feedback Eric.
> I am assuming the number of replicas help in load balancing and reliability. 
> That being said are there any recommendation for that, or is it dependent on 
> query load and performance sla's.
>
> Any suggestions on aws setup?
> Thanks
>
>
>> On Apr 13, 2016, at 7:12 AM, Erick Erickson  wrote:
>>
>> For collections with this few nodes, 3 zookeepers are plenty. From
>> what I've seen people don't go to 5 zookeepers until they have
>> hundreds and hundreds of nodes.
>>
>> 100M docs can fit on 2 shards, I've actually seen many more. That
>> said, if the docs are very large and/or the searchers are complex
>> performance may not be what you need. Here's a long blog on
>> testing a configuration to destruction to be _sure_ you can scale
>> as you need:
>>
>> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>>
>> Best,
>> Erick
>>
>>> On Wed, Apr 13, 2016 at 6:47 AM, Jay Potharaju  
>>> wrote:
>>> Hi,
>>>
>>> In my current setup I have about 30 million docs which will grow to 100
>>> million by the end of the year. In order to accommodate scaling and query
>>> load, i am planning to have atleast 2 shards and 2/3 replicas to begin
>>> with. With the above solrcloud setup I plan to have 3 zookeepers in the
>>> quorum.
>>>
>>> If the number of replicas and shards increases, the number of solr
>>> instances will also go up. With keeping that in mind I was wondering if
>>> there are any guidelines on the number of zk instances to solr instances.
>>>
>>> Secondly are there any recommendations for setting up solr in AWS?
>>>
>>> --
>>> Thanks
>>> Jay


RE: CloudDescription sometimes null

2016-04-13 Thread Markus Jelsma
Hello,

That core.getCoreDescriptor().getCloudDescriptor(); piece of code runs in
RequestHandlerBase.inform(core), so maybe that is too early. I'll add an NPE
check and re-fetch the cloudDescriptor if it is missing.

Thanks!
Markus 
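A minimal sketch of that guard (the field and method names are made up for
illustration):

  private volatile CloudDescriptor cloudDescriptor;

  private CloudDescriptor cloudDescriptor(SolrCore core) {
    if (cloudDescriptor == null) {
      CoreDescriptor cd = core.getCoreDescriptor();
      if (cd != null) {
        // may still be null very early in startup; retry on next call
        cloudDescriptor = cd.getCloudDescriptor();
      }
    }
    return cloudDescriptor;
  }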
 
-Original message-
> From:Erick Erickson 
> Sent: Wednesday 13th April 2016 16:23
> To: solr-user 
> Subject: Re: CloudDescription sometimes null
> 
> It takes a little time for core discovery to enumerate all of the
> cores and fill in the various descriptors. That said, I'd be surprised
> if you actually can hit this very often since the coreDescriptor
> creation code also creates the cloudDescriptor and they're both loaded
> by the enumeration process and are just loaded from the
> core.properties file.
> 
> And the coreDescriptor isn't even added to the list of coreDescriptors
> until the cloudDescriptor has been built so I'd always expect
> getCoreDescriptor() to return null but _not_
> getCoreDescriptor().getCloudDescriptor().
> 
> So I'm really puzzled (or reading the code wrong).
> 
> Erick
> 
> On Wed, Apr 13, 2016 at 5:11 AM, Markus Jelsma
>  wrote:
> > Hello - we use CloudDescriptor to get information about the collection. 
> > Very early after starting Solr, we obtain an instance:
> >cloudDescriptor = core.getCoreDescriptor().getCloudDescriptor();
> >
> > In some strange cases, at some later point cloudDescriptor is null? Is it 
> > possible cloudDescriptor is being set at some later stage in Solr? When 
> > reading cloud information, do i always have to get a new cloudDescriptor 
> > instance?
> >
> > Many thanks!
> > Markus
> 


Re: Which line is solr following in terms of a BI Tool?

2016-04-13 Thread Kevin Risden
For Solr 6, ParallelSQL and Solr JDBC driver are going to be developed more
as well as JSON facets. The Solr JDBC driver that is in Solr 6 contains
SOLR-8502. There are further improvements coming in SOLR-8659 that didn't
make it into 6.0. The Solr JDBC piece leverages ParallelSQL and in some
cases uses JSON facets under the hood.

The Solr JDBC driver should enable BI tools to connect to Solr and use the
language of SQL. This is also a familiar interface for many Java developers.

Just a note: Solr is not an RDBMS and shouldn't be treated like one even
with a JDBC driver. The Solr JDBC driver is more of a convenience for
querying.
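A minimal sketch of the JDBC usage (zkhost, port and collection are
placeholders; this assumes the Solr 6 solr-solrj artifact on the classpath):

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.ResultSet;
  import java.sql.Statement;

  public class SolrJdbcExample {
    public static void main(String[] args) throws Exception {
      // the driver connects through ZooKeeper, not a Solr node
      String url = "jdbc:solr://zkhost:9983?collection=collection1";
      try (Connection con = DriverManager.getConnection(url);
           Statement stmt = con.createStatement();
           ResultSet rs = stmt.executeQuery(
               "SELECT fielda, fieldb FROM collection1 LIMIT 10")) {
        while (rs.next()) {
          System.out.println(rs.getString("fielda"));
        }
      }
    }
  }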

Kevin Risden

On Tue, Apr 12, 2016 at 6:24 PM, Erick Erickson 
wrote:

> The unsatisfactory answer is that they have different characteristics.
>
> The analytics contrib does not work in distributed mode. It's not
> receiving a lot of love at this point.
>
> The JSON facets are estimations. Generally very close but are not
> guaranteed to be 100% accurate. The variance, as I understand it,
> is something on the order of < 1% in most cases.
>
> The pivot facets are accurate, but more expensive than the JSON
> facets.
>
> And, to make matters worse, the ParallelSQL way of doing some
> aggregations is going to give yet another approach.
>
> Best,
> Erick
>
> On Tue, Apr 12, 2016 at 7:15 AM, Pablo  wrote:
> > Hello,
> > I think this topic is important for solr users that are planning to use
> solr
> > as a BI Tool.
> > Speaking about facets, nowadays there are three major ways of doing
> (more or
> > less) the same thing in solr.
> > First, you have the pivot facets, on the other hand you have the
> Analytics
> > component and finally you have the JSON Facet Api.
> > So, which line is Solr following? Which of these component is going to
> be in
> > constant development and which one is going to be deprecated sooner.
> > In Yonik page, there are some test that shows how JSON Facet Api performs
> > better than legacy facets, also the Api was way simpler than the pivot
> > facets, so in my case that was enough to base my solution around the JSON
> > Api. But I would like to know what are the thoughts of the solr
> developers.
> >
> > Thanks!
> >
> >
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/Which-line-is-solr-following-in-terms-of-a-BI-Tool-tp4269597.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>


update log not in ACTIVE or REPLAY state

2016-04-13 Thread michael dürr
Hello,

when I launch my two nodes in Solr cloud, I always get the following error
at node2:

PeerSync: core=portal_shard1_replica2 url=http://127.0.1.1:8984/solr ERROR,
update log not in ACTIVE or REPLAY state. FSUpdateLog{state=BUFFERING,
tlog=null}

Actually, I don't experience any problems, but before going to production,
I wanted to know why I get this error.

I'm running two nodes (node1 and node2) in a Solr Cloud cluster (5.4.1).
node1 is started with embedded zookeeper and listens to port 8983. Node2
listens on port 8984 and registers with the embedded zookeeper of node1 at
port 9983.
I have one collection "portal" (1 shard, 2 replicas), where each node
serves one replica.
The settings for commit on both nodes are:



<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

and

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>

Can you give me some advice on how to get rid of this error?
Should I simply ignore it?

Thanks,
Michael


Re: Which line is solr following in terms of a BI Tool?

2016-04-13 Thread Pablo Anzorena
Thank you very much both of you for your insights!
I really appreciate it.



2016-04-13 11:30 GMT-03:00 Kevin Risden :

> For Solr 6, ParallelSQL and Solr JDBC driver are going to be developed more
> as well as JSON facets. The Solr JDBC driver that is in Solr 6 contains
> SOLR-8502. There are further improvements coming in SOLR-8659 that didn't
> make it into 6.0. The Solr JDBC piece leverages ParallelSQL and in some
> cases uses JSON facets under the hood.
>
> The Solr JDBC driver should enable BI tools to connect to Solr and use the
> language of SQL. This is also a familiar interface for many Java
> developers.
>
> Just a note: Solr is not an RDBMS and shouldn't be treated like one even
> with a JDBC driver. The Solr JDBC driver is more of a convenience for
> querying.
>
> Kevin Risden
>
> On Tue, Apr 12, 2016 at 6:24 PM, Erick Erickson 
> wrote:
>
> > The unsatisfactory answer is that they have different characteristics.
> >
> > The analytics contrib does not work in distributed mode. It's not
> > receiving a lot of love at this point.
> >
> > The JSON facets are estimations. Generally very close but are not
> > guaranteed to be 100% accurate. The variance, as I understand it,
> > is something on the order of < 1% in most cases.
> >
> > The pivot facets are accurate, but more expensive than the JSON
> > facets.
> >
> > And, to make matters worse, the ParallelSQL way of doing some
> > aggregations is going to give yet another approach.
> >
> > Best,
> > Erick
> >
> > On Tue, Apr 12, 2016 at 7:15 AM, Pablo  wrote:
> > > Hello,
> > > I think this topic is important for solr users that are planning to use
> > solr
> > > as a BI Tool.
> > > Speaking about facets, nowadays there are three major ways of doing
> > (more or
> > > less) the same thing in solr.
> > > First, you have the pivot facets, on the other hand you have the
> > Analytics
> > > component and finally you have the JSON Facet Api.
> > > So, which line is Solr following? Which of these component is going to
> > be in
> > > constant development and which one is going to be deprecated sooner.
> > > In Yonik page, there are some test that shows how JSON Facet Api
> performs
> > > better than legacy facets, also the Api was way simpler than the pivot
> > > facets, so in my case that was enough to base my solution around the
> JSON
> > > Api. But I would like to know what are the thoughts of the solr
> > developers.
> > >
> > > Thanks!
> > >
> > >
> > >
> > > --
> > > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/Which-line-is-solr-following-in-terms-of-a-BI-Tool-tp4269597.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>


Solrj: SystemInfoRequest fails if no default collection specified.

2016-04-13 Thread Iana Bondarska
Hi All,
I'm trying to get the solr version via the solrj api. If I try to use
SystemInfoRequest without specifying a collection -- I'm getting an error "No
collection is set and no default collection specified".
Could you tell me please, is there any way to get the solr version without
specifying a collection?

Thanks,
Iana


Querying of multiple string value

2016-04-13 Thread Zheng Lin Edwin Yeo
Hi,

Would like to find out: is there any way to do a multi-value query on a
field of type String, besides using the OR operator?

Currently, I am using the OR operator like
http://localhost:8983/solr/collection1/highlight?q=id:collection1_0001 OR
id:collection1_0002

But this will get longer and longer if I were to have many records to
retrieve based on their ID. The fieldType is string, so it is not possible
to do things like sorting, or more-than / less-than comparisons.

I'm using Solr 5.4.0
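For reference, one shorter form is the terms query parser that ships with
Solr 4.10+ (the field and ids below are taken from the example above):

  http://localhost:8983/solr/collection1/highlight?q={!terms f=id}collection1_0001,collection1_0002

It accepts a comma-separated list of values for one field, so the query no
longer grows by a field name and an OR per id.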

Regards,
Edwin


Re: Solrj: SystemInfoRequest fails if no default collection specified.

2016-04-13 Thread Shawn Heisey
On 4/13/2016 9:01 AM, Iana Bondarska wrote:
> I'm trying to get solr version via solrj api. If I try to use
> SystemInfoRequest without specifying collection -- I'm getting an error "No
> collection is set and no default collection specified".
> Could you tell me please, is there any way to get solr version without
> specifying collection?

What version of Solr?  What version of SolrJ?  is it SolrCloud?  Which
SolrClient implementation are you using?

I do not see anything in the Solr source code named "SystemInfoRequest",
so it would probably be a good idea to share your .java file on the
Internet and provide a URL to reach it.

Thanks,
Shawn



RE: number of zookeeper & aws instances

2016-04-13 Thread Garth Grimm
I thought that if you start with 3 Zk nodes in the ensemble, and only lose 1, 
it will have no effect on indexing at all, since you still have a quorum.

If you lose 2 (which takes you below quorum), then the cloud loses "confidence" 
in which solr core is the leader of each shard and stops indexing.  But queries 
will continue since no zk managed information is needed for that.

Please correct me if I'm wrong, on any of that.

-Original Message-
From: Daniel Collins [mailto:danwcoll...@gmail.com] 
Sent: Wednesday, April 13, 2016 10:34 AM
To: solr-user@lucene.apache.org
Subject: Re: number of zookeeper & aws instances

Just to chip in, more ZKs are probably only necessary if you are doing NRT 
indexing.

Loss of a single ZK (in a 3 machine setup) will block indexing for the time it 
takes to get that machine/instance back up, however it will have less impact on 
search, since the search side can use the existing state of the cloud to work.  
If you only index once a day, then that's fine, but in our scenario, we 
continually index all day long, so we can't afford a "break".
Hence we actually run 7 ZKs currently though we plan to go down to 5.  That 
gives us the ability to lose 2 machines without affecting indexing.

But as Erick says, for "normal" scenarios, where search load is much greater 
than indexing load, 3 should be sufficient.


On 13 April 2016 at 15:27, Erick Erickson  wrote:

> bq: or is it dependent on query load and performance sla's
>
> Exactly. The critical bit is that every single replica meets your SLA.
> By that I mean let's claim that your SLA is 500ms. If you can serve 10 
> qps at that SLA with one replica/shard (i.e. leader only) you can 
> serve 50 QPS by adding 4 more replicas.
>
> What you _cannot_ do is reduce the 500ms response time by adding more 
> replicas. You'll need to add more shards, which probably means 
> re-indexing. Which is why I recommend pushing a test system to 
> destruction before deciding on the final numbers.
>
> And having at least 2 replicas per shard (leader and replica) is usually a
> very good thing because Solr will stop serving queries or indexing if 
> all the replicas for any shard are down.
>
> Best,
> Erick
>
> On Wed, Apr 13, 2016 at 7:19 AM, Jay Potharaju 
> wrote:
> > Thanks for the feedback Eric.
> > I am assuming the number of replicas help in load balancing and
> reliability. That being said are there any recommendation for that, or 
> is it dependent on query load and performance sla's.
> >
> > Any suggestions on aws setup?
> > Thanks
> >
> >
> >> On Apr 13, 2016, at 7:12 AM, Erick Erickson 
> >> 
> wrote:
> >>
> >> For collections with this few nodes, 3 zookeepers are plenty. From 
> >> what I've seen people don't go to 5 zookeepers until they have 
> >> hundreds and hundreds of nodes.
> >>
> >> 100M docs can fit on 2 shards, I've actually seen many more. That 
> >> said, if the docs are very large and/or the searchers are complex 
> >> performance may not be what you need. Here's a long blog on testing 
> >> a configuration to destruction to be _sure_ you can scale as you 
> >> need:
> >>
> >>
> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract
> -why-we-dont-have-a-definitive-answer/
> >>
> >> Best,
> >> Erick
> >>
> >>> On Wed, Apr 13, 2016 at 6:47 AM, Jay Potharaju 
> >>> 
> wrote:
> >>> Hi,
> >>>
> >>> In my current setup I have about 30 million docs which will grow 
> >>> to 100 million by the end of the year. In order to accommodate 
> >>> scaling and
> query
> >>> load, i am planning to have atleast 2 shards and 2/3 replicas to 
> >>> begin with. With the above solrcloud setup I plan to have 3 
> >>> zookeepers in the quorum.
> >>>
> >>> If the number of replicas and shards increases, the number of solr 
> >>> instances will also go up. With keeping that in mind I was 
> >>> wondering if there are any guidelines on the number of zk 
> >>> instances to solr
> instances.
> >>>
> >>> Secondly are there any recommendations for setting up solr in AWS?
> >>>
> >>> --
> >>> Thanks
> >>> Jay
>


Re: number of zookeeper & aws instances

2016-04-13 Thread Daniel Collins
Just to chip in, more ZKs are probably only necessary if you are doing NRT
indexing.

Loss of a single ZK (in a 3 machine setup) will block indexing for the time
it takes to get that machine/instance back up, however it will have less
impact on search, since the search side can use the existing state of the
cloud to work.  If you only index once a day, then that's fine, but in our
scenario, we continually index all day long, so we can't afford a "break".
Hence we actually run 7 ZKs currently though we plan to go down to 5.  That
gives us the ability to lose 2 machines without affecting indexing.

But as Erick says, for "normal" scenarios, where search load is much
greater than indexing load, 3 should be sufficient.


On 13 April 2016 at 15:27, Erick Erickson  wrote:

> bq: or is it dependent on query load and performance sla's
>
> Exactly. The critical bit is that every single replica meets your SLA.
> By that I mean let's claim that your SLA is 500ms. If you can
> serve 10 qps at that SLA with one replica/shard (i.e. leader only)
> you can serve 50 QPS by adding 4 more replicas.
>
> What you _cannot_ do is reduce the 500ms response time by
> adding more replicas. You'll need to add more shards, which probably
> means re-indexing. Which is why I recommend pushing a test system
> to destruction before deciding on the final numbers.
>
> And having at least 2 replicas per shard (leader and replica) is usually
> a very good thing because Solr will stop serving queries or indexing
> if all the replicas for any shard are down.
>
> Best,
> Erick
>
> On Wed, Apr 13, 2016 at 7:19 AM, Jay Potharaju 
> wrote:
> > Thanks for the feedback Eric.
> > I am assuming the number of replicas help in load balancing and
> reliability. That being said are there any recommendation for that, or is
> it dependent on query load and performance sla's.
> >
> > Any suggestions on aws setup?
> > Thanks
> >
> >
> >> On Apr 13, 2016, at 7:12 AM, Erick Erickson 
> wrote:
> >>
> >> For collections with this few nodes, 3 zookeepers are plenty. From
> >> what I've seen people don't go to 5 zookeepers until they have
> >> hundreds and hundreds of nodes.
> >>
> >> 100M docs can fit on 2 shards, I've actually seen many more. That
> >> said, if the docs are very large and/or the searchers are complex
> >> performance may not be what you need. Here's a long blog on
> >> testing a configuration to destruction to be _sure_ you can scale
> >> as you need:
> >>
> >>
> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> >>
> >> Best,
> >> Erick
> >>
> >>> On Wed, Apr 13, 2016 at 6:47 AM, Jay Potharaju 
> wrote:
> >>> Hi,
> >>>
> >>> In my current setup I have about 30 million docs which will grow to 100
> >>> million by the end of the year. In order to accommodate scaling and
> query
> >>> load, i am planning to have atleast 2 shards and 2/3 replicas to begin
> >>> with. With the above solrcloud setup I plan to have 3 zookeepers in the
> >>> quorum.
> >>>
> >>> If the number of replicas and shards increases, the number of solr
> >>> instances will also go up. With keeping that in mind I was wondering if
> >>> there are any guidelines on the number of zk instances to solr
> instances.
> >>>
> >>> Secondly are there any recommendations for setting up solr in AWS?
> >>>
> >>> --
> >>> Thanks
> >>> Jay
>


Re: number of zookeeper & aws instances

2016-04-13 Thread Shawn Heisey
On 4/13/2016 9:34 AM, Daniel Collins wrote:
> Just to chip in, more ZKs are probably only necessary if you are doing NRT
> indexing.
>
> Loss of a single ZK (in a 3 machine setup) will block indexing for the time
> it takes to get that machine/instance back up

That would NOT block indexing.  If you have three zookeepers and you
lose one, SolrCloud functionality will not change.  If you lose TWO,
then you would no longer be able to index.

If you've seen a situation where losing one zookeeper out of three
causes indexing to stop, then either something is not configured
correctly, or you've encountered a bug.  I would bet more on a
misconfiguration than a bug.

A 5-node ensemble would allow you to lose a server and still be able to
take down another server for maintenance, without affecting SolrCloud
operation.
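
For reference, here is the quorum arithmetic behind that (a quick sketch, not
Solr or ZooKeeper code): an ensemble of N nodes needs floor(N/2) + 1 of them
up to keep a quorum, so it tolerates floor((N - 1) / 2) failures.

public class QuorumMath {
    public static void main(String[] args) {
        for (int n : new int[] {3, 5, 7}) {
            // majority = floor(n/2) + 1; tolerable failures = n - majority
            System.out.printf("ensemble=%d, quorum=%d, tolerates %d failure(s)%n",
                    n, n / 2 + 1, (n - 1) / 2);
        }
    }
}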

Thanks,
Shawn



Re: number of zookeeper & aws instances

2016-04-13 Thread Daniel Collins
Yeah, sorry, my maths was clearly flawed today; thanks for correcting me,
Shawn.

What I meant was that in a 3-ZK setup, if you lose one machine, you are okay,
but you are also "at risk", since losing anything else would lose quorum.
So in our NRT-style scenario, we would have to get that dead machine back up
ASAP.

As Shawn says, we have a larger ensemble to allow for another machine
crashing during a planned maintenance window (so we are down 2 ZKs for some
period of time, and that is still ok).

It all depends on how much disaster-recovery resilience you need.

On 13 April 2016 at 16:48, Shawn Heisey  wrote:

> On 4/13/2016 9:34 AM, Daniel Collins wrote:
> > Just to chip in, more ZKs are probably only necessary if you are doing NRT
> > indexing.
> >
> > Loss of a single ZK (in a 3 machine setup) will block indexing for the time
> > it takes to get that machine/instance back up
>
> That would NOT block indexing.  If you have three zookeepers and you
> lose one, SolrCloud functionality will not change.  If you lose TWO,
> then you would no longer be able to index.
>
> If you've seen a situation where losing one zookeeper out of three
> causes indexing to stop, then either something is not configured
> correctly, or you've encountered a bug.  I would bet more on a
> misconfiguration than a bug.
>
> A 5-node ensemble would allow you to lose a server and still be able to
> take down another server for maintenance, without affecting SolrCloud
> operation.
>
> Thanks,
> Shawn
>
>


Re: Solrj: SystemInfoRequest fails if no default collection specified.

2016-04-13 Thread Iana Bondarska
Yes, it's SolrCloud, version 4.8. The SolrJ version is 5.4.1. Sorry, yes,
there is no SystemInfoRequest; I'm sending:
new GenericSolrRequest(SolrRequest.METHOD.GET, "/admin/info/system",
new MapSolrParams(ImmutableMap.of()))

2016-04-13 18:38 GMT+03:00 Shawn Heisey :

> On 4/13/2016 9:01 AM, Iana Bondarska wrote:
> > I'm trying to get solr version via solrj api. If I try to use
> > SystemInfoRequest without specifying collection -- I'm getting an error
> "No
> > collection is set and no default collection specified".
> > Could you tell me please, is there any way to get solr version without
> > specifying collection?
>
> What version of Solr?  What version of SolrJ?  is it SolrCloud?  Which
> SolrClient implementation are you using?
>
> I do not see anything in the Solr source code named "SystemInfoRequest",
> so it would probably be a good idea to share your .java file on the
> Internet and provide a URL to reach it.
>
> Thanks,
> Shawn
>
>


Committing with no updates

2016-04-13 Thread Robert Brown

Hi,

My autoSoftCommit is set to 1 minute.  Does this actually affect things 
if no documents have actually been updated/created?  Will this also 
affect the clearing of any caches?


Is this also the same for hard commits, either with autoCommit or making 
an explicit HTTP request to commit?


Thanks,
Rob



Re: Committing with no updates

2016-04-13 Thread Chris Hostetter

the autoCommit settings initialize trackers so that they only fire after 
some updates have been made -- don't think of it as a cron that fires 
every X seconds, think of it as an update monitor that triggers timers.  
if an update comes in, and there are no timers currently active, a timer 
is created to do the commit in X seconds.

independent of autoCommit, there is other intelligence lower down in Solr 
that tries to recognize when a redundant commit is fired but no changes would 
result in a new searcher, to prevent unnecessary object churn and cache 
clearing.
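
A rough sketch of that pattern in plain Java (an illustration only, not
Solr's actual CommitTracker code):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

class SoftCommitTracker {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final long maxTimeMs;
    private ScheduledFuture<?> pending;   // the currently armed timer, if any

    SoftCommitTracker(long maxTimeMs) { this.maxTimeMs = maxTimeMs; }

    // Called on every document update: arms a timer only if none is pending,
    // so no updates means no timers and no commits fire at all.
    synchronized void onUpdate() {
        if (pending == null || pending.isDone()) {
            pending = scheduler.schedule(this::commit, maxTimeMs, TimeUnit.MILLISECONDS);
        }
    }

    private synchronized void commit() {
        System.out.println("soft commit fired");
        pending = null;
    }
}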

: My autoSoftCommit is set to 1 minute.  Does this actually affect things if no
: documents have actually been updated/created?  Will this also affect the
: clearing of any caches?
: 
: Is this also the same for hard commits, either with autoCommit or making an
: explicit http request to commit.


-Hoss
http://www.lucidworks.com/


How to search for a First, Last of contact which are stored in different multivalued fields

2016-04-13 Thread Thrinadh Kuppili
Hi,

 I have created 2 multivalued fields, FirstName and LastName.

 In Solr the values available are:
 "FirstName": [ "Kim", "Jake", "NATALIE", "Tammey" ]
 "LastName": [ "Lara", "Sharan", "Taylor", "Taylor" ]

 I am trying to search where FirstName is Tammey and LastName is Taylor.

 I should be able to match FirstName [4] together with LastName [4] and get the
 record, but currently it is matching FirstName [4] with LastName [3], which
 shouldn't happen.

 Do let me know if more details are needed.

 Thnx






Re: Querying of multiple string value

2016-04-13 Thread Shawn Heisey
On 4/13/2016 9:25 AM, Zheng Lin Edwin Yeo wrote:
> Would like to find out, is there any way to do a multiple value query of a
> field that is of type String, besides using the OR parameters?
>
> Currently, I am using the OR parameters like
> http://localhost:8983/solr/collection1/highlight?q=id:collection1_0001 OR
> id:collection1_0002
>
> But this will get longer and longer if I were to have many records to
> retrieve based on their ID. The fieldType is string, so it is not possible
> to do things like sorting, more than or less than.
>
> I'm using Solr 5.4.0

The terms query parser was added in 4.10, and would do what you need:

https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser
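
With SolrJ, the terms query looks something like this (a sketch; the base
URL and the IDs are taken from the example above):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TermsQueryExample {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                new HttpSolrClient("http://localhost:8983/solr/collection1")) {
            // {!terms f=id} takes a comma-separated list of values,
            // so the query no longer grows into a long chain of ORs.
            SolrQuery q = new SolrQuery("{!terms f=id}collection1_0001,collection1_0002");
            QueryResponse rsp = client.query(q);
            System.out.println("numFound: " + rsp.getResults().getNumFound());
        }
    }
}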

Thanks,
Shawn



Certain Plurals Not Working SOLR v3.6

2016-04-13 Thread Sara Woodmansee
Hello all,

We are having a website built with a search engine based on SOLR v3.6.  We 
submit terms (keywords) as singulars, and the code is supposed to add (search 
for) plurals.

Most search terms work, but we have an issue with plural terms that end in -ee, 
-oe, -ie, -ae,  and words that end in -s.  In comparison, the following work 
fine: words that end with -oo, -ue, -e, -a.

The developers have been unable to find a solution, but this has to be a common 
issue (?) Someone surely has found a solution to this problem??  I am not a 
coder, but we are quite desperate to find a solution to this issue, as search 
absolutely needs to work. Here was the developers feedback: "Unfortunately we 
tried to apply all the filters for stemming but this problem is not resolved". 
https://wiki.apache.org/solr/LanguageAnalysis#English

Any suggestions greatly appreciated.

Thanks!
Sara 
_

DO NOT WORK:
Plural terms that end in -ee, -oe, -ie, -ae,  and words that end in -s.  

Examples: 

palm tree = 0 results
palm trees = 21 results

bees = 1 result
bee = 0 results

dungarees = 1 result
dungaree = 0 results

shoes = 1 result
shoe = 0 results

toe = 1 result
toes = 0 results

tie = 1 result
ties = 0 results

Crees = 1 result
Cree = 0 results

dais = 1 result
daises = 0 results

bias = 1 result
biases = 0 results

dress = 1 result
dresses = 0 results
_

WORK:
In comparison, the following work fine:  words that end with -oo, -ue, -e, -a

Examples: 
tide = 1 result
tides = 1 result

hues = 2 results
hue = 2 results

dakota = 1 result
dakotas = 1 result

loo = 1 result
loos = 1 result
_



Re: Solrj: SystemInfoRequest fails if no default collection specified.

2016-04-13 Thread Shawn Heisey
On 4/13/2016 10:15 AM, Iana Bondarska wrote:
> Yes, it's SolrCloud, version 4.8. The SolrJ version is 5.4.1. Sorry, yes,
> there is no SystemInfoRequest; I'm sending:
> new GenericSolrRequest(SolrRequest.METHOD.GET, "/admin/info/system",
> new MapSolrParams(ImmutableMap.of()))

Side note:  When running SolrCloud, you should not use a significantly
different SolrJ version.  Significantly different versions may not
communicate correctly -- especially when the *major* version is different.

SolrJ is generating the error you see, because you're using
CloudSolrClient and that client requires a collection, unless the
handler that you are accessing is one that works on the whole cluster,
such as /admin/collections.

Because the /admin/info/system request is node-specific, it doesn't make
sense to try to use it with CloudSolrClient -- you won't know which node
it's being sent to.  Use HttpSolrClient instead.  See this comment I
wrote on an issue in Jira, with a code example:

https://issues.apache.org/jira/browse/SOLR-8216?focusedCommentId=14976963&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14976963
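
A minimal sketch of that approach (assuming a node at localhost:8983; the
"lucene" / "solr-spec-version" response keys are what the handler returns in
recent versions, but may vary):

import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.GenericSolrRequest;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.util.NamedList;

public class NodeVersionCheck {
    public static void main(String[] args) throws Exception {
        // The base URL points at one node, not a collection.
        try (HttpSolrClient client =
                new HttpSolrClient("http://localhost:8983/solr")) {
            NamedList<Object> rsp = client.request(new GenericSolrRequest(
                    SolrRequest.METHOD.GET, "/admin/info/system",
                    new ModifiableSolrParams()));
            // The version lives under the "lucene" section of the response.
            NamedList<?> lucene = (NamedList<?>) rsp.get("lucene");
            System.out.println("solr-spec-version: " + lucene.get("solr-spec-version"));
        }
    }
}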

Thanks,
Shawn



Re: How to search for a First, Last of contact which are stored in different multivalued fields

2016-04-13 Thread Erick Erickson
Parallel multivalued fields simply do not support this type of
search. You'd have to do something like index kim_lara, jake_sharan,
natalie_taylor, etc., and then search for those terms.

A more general option is to put these in a multiValued field "name",
with a positionIncrementGap of, say, 10. Now searching for phrases
will find these, as "kim lara", which would also match "kim a lara"
assuming you searched with a "slop" of 2.

You won't get matches across names as long as the slop is < 10, due to
the positionIncrementGap.
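
The query side of that could look like this (a sketch; the collection name
"contacts" is made up, and "name" is assumed to be the multiValued, analyzed
field described above):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class NamePhraseSearch {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                new HttpSolrClient("http://localhost:8983/solr/contacts")) {
            // Slop 2 tolerates e.g. "kim a lara", but stays well under the
            // positionIncrementGap of 10, so first/last names from different
            // values in the field can never be matched together.
            SolrQuery q = new SolrQuery("name:\"tammey taylor\"~2");
            System.out.println(client.query(q).getResults().getNumFound());
        }
    }
}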

Best,
Erick

On Wed, Apr 13, 2016 at 10:23 AM, Thrinadh  Kuppili
 wrote:
> Hi,
>
>  I have created 2 multivalued fields, FirstName and LastName.
>
>  In Solr the values available are:
>  "FirstName": [ "Kim", "Jake", "NATALIE", "Tammey" ]
>  "LastName": [ "Lara", "Sharan", "Taylor", "Taylor" ]
>
>  I am trying to search where FirstName is Tammey and LastName is Taylor.
>
>  I should be able to match FirstName [4] together with LastName [4] and get
>  the record, but currently it is matching FirstName [4] with LastName [3],
>  which shouldn't happen.
>
>  Do let me know if more details are needed.
>
>  Thnx
>
>
>
>


Re: How to search for a First, Last of contact which are stored in different multivalued fields

2016-04-13 Thread Ahmet Arslan
Hi Thrinadh,

I think you can pull something together with FieldMaskingSpanQuery
http://blog.griddynamics.com/2011/07/solr-experience-search-parent-child.html

Ahmet



On Wednesday, April 13, 2016 8:24 PM, Thrinadh Kuppili  
wrote:
Hi,

I have created 2 multivalued fields, FirstName and LastName.

In Solr the values available are:
"FirstName": [ "Kim", "Jake", "NATALIE", "Tammey" ]
"LastName": [ "Lara", "Sharan", "Taylor", "Taylor" ]

I am trying to search where FirstName is Tammey and LastName is Taylor.

I should be able to match FirstName [4] together with LastName [4] and get the
record, but currently it is matching FirstName [4] with LastName [3], which
shouldn't happen.

Do let me know if more details are needed.

Thnx






Re: How to search for a First, Last of contact which are stored in different multivalued fields

2016-04-13 Thread Jack Krupansky
I was also going to point out the field masking span query, but... also
that it is at the Lucene level and not surfaced in Solr:
http://lucene.apache.org/core/6_0_0/core/org/apache/lucene/search/spans/FieldMaskingSpanQuery.html

Also see:
https://issues.apache.org/jira/browse/LUCENE-1494

But no hint anywhere that I know of for how to surface this Lucene feature
in Solr.
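
For reference, the Lucene-level usage looks roughly like this (a sketch
following the pattern in the FieldMaskingSpanQuery javadoc, with the field
names taken from the question):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.FieldMaskingSpanQuery;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class MaskedNameQuery {
    public static void main(String[] args) {
        SpanQuery first = new SpanTermQuery(new Term("firstname", "tammey"));
        // Mask the lastname spans as "firstname" so both queries appear to
        // target the same field and their positions can be compared.
        SpanQuery last = new FieldMaskingSpanQuery(
                new SpanTermQuery(new Term("lastname", "taylor")), "firstname");
        // Slop -1, unordered: the two terms must sit at the same position,
        // i.e. entry N of firstname pairs with entry N of lastname.
        SpanNearQuery q = new SpanNearQuery(new SpanQuery[] {first, last}, -1, false);
        System.out.println(q);
    }
}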

I would suggest the workaround of using an update processor to combine the
first and last names into a single multivalued field.

-- Jack Krupansky

On Wed, Apr 13, 2016 at 4:20 PM, Ahmet Arslan 
wrote:

> Hi Thrinadh,
>
> I think you can pull something together with FieldMaskingSpanQuery
>
> http://blog.griddynamics.com/2011/07/solr-experience-search-parent-child.html
>
> Ahmet
>
>
>
> On Wednesday, April 13, 2016 8:24 PM, Thrinadh Kuppili <
> thrinadh...@gmail.com> wrote:
> Hi,
>
> I have created 2 multivalued fields, FirstName and LastName.
>
> In Solr the values available are:
> "FirstName": [ "Kim", "Jake", "NATALIE", "Tammey" ]
> "LastName": [ "Lara", "Sharan", "Taylor", "Taylor" ]
>
> I am trying to search where FirstName is Tammey and LastName is Taylor.
>
> I should be able to match FirstName [4] together with LastName [4] and get
> the record, but currently it is matching FirstName [4] with LastName [3],
> which shouldn't happen.
>
> Do let me know if more details are needed.
>
> Thnx
>
>
>
>
>


Get number of results in filtered query

2016-04-13 Thread Fundera Developer

Hi all,

we are developing a search engine in which all the possible results have 
one or more countries associated. If, apart from writing the query, the 
user selects a country, we use a filter query to restrict the results to 
those that match the query and are associated with that country. Nothing 
spectacular so far  :-D


However, we would like to show the number of results that are returned 
by the unfiltered query, since we already have the number of results 
associated with each country, as we are also faceting on that field. Is it 
possible to have that number without executing the query twice?


Thanks in advance!



Problem retaining PDF text

2016-04-13 Thread Alan G Quan
I am indexing PDF documents in Solr 5.3.0 like this:
curl 
"http://localhost:8983/solr/mycore1/update/extract?literal.id=101&commit=true"; 
-F "myfile=@101.pdf".
This works fine and I can search for keywords in the PDF text in Solr and it 
finds the document correctly.  But when I make any subsequent changes to the 
Solr record for that document, using atomic updates "set" or "add", the PDF 
text is lost.  I verified this by searching for the same keyword after the 
update and the document is not found.  The Solr record for the document with 
the literal id field value "101" is still there after the update but the text 
is gone.  Why does Solr delete the PDF text after any update of the record for 
the document, and is there a way to prevent that?

Regards,
Alan


Re: Get number of results in filtered query

2016-04-13 Thread Jack Krupansky
If you just do a faceted query without the filter, each facet will give you
the number of results for that country and numResults will give you the
total number of results across all countries. But once you apply one or
more filters, numResults reflects only the post-filtering documents.

-- Jack Krupansky

On Wed, Apr 13, 2016 at 4:43 PM, Fundera Developer <
funderadevelo...@outlook.com> wrote:

> Hi all,
>
> we are developing a search engine in which all the possible results have
> one or more countries associated. If, apart from writing the query, the
> user selects a country, we use a filter query to restrict the results to
> those that match the query and are associated with that country. Nothing
> spectacular so far  :-D
>
> However, we would like to show the number of results that are returned by
> the unfiltered query, since we already have the number of results
> associated with each country, as we are also faceting on that field. Is it
> possible to have that number without executing the query twice?
>
> Thanks in advance!
>
>


Re: Problem retaining PDF text

2016-04-13 Thread Alexandre Rafalovitch
Atomic update requires reloading the content of all _other_ fields to
reconstruct the full document before putting it back into the Lucene index.
That's because Lucene does not support 'update'; every update
actually deletes the original document and recreates it.

The problem is that your PDF text is probably not stored. So, when you
do the update, it does not form a part of the new document and just
disappears. Changing the field to stored should fix the issue, at the cost
of storing the untokenized text. If that has performance implications, you
could look at the lazy field loading setting.
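
Once the text field is stored, an atomic update along these lines keeps it
intact (a sketch with SolrJ; "title" is just a hypothetical field being set,
while the core name and id come from the earlier message):

import java.util.Collections;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class AtomicUpdateExample {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                new HttpSolrClient("http://localhost:8983/solr/mycore1")) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "101");
            // "set" marks this as an atomic update; Solr reconstructs every
            // other field from its stored value and rewrites the document.
            doc.addField("title", Collections.singletonMap("set", "new title"));
            client.add(doc);
            client.commit();
        }
    }
}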

Regards,
Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 14 April 2016 at 08:49, Alan G Quan  wrote:
> I am indexing PDF documents in Solr 5.3.0 like this:
> curl 
> "http://localhost:8983/solr/mycore1/update/extract?literal.id=101&commit=true";
>  -F "myfile=@101.pdf".
> This works fine and I can search for keywords in the PDF text in Solr and it 
> finds the document correctly.  But when I make any subsequent changes to the 
> Solr record for that document, using atomic updates "set" or "add", the PDF 
> text is lost.  I verified this by searching for the same keyword after the 
> update and the document is not found.  The Solr record for the document with 
> the literal id field value "101" is still there after the update but the text 
> is gone.  Why does Solr delete the PDF text after any update of the record 
> for the document, and is there a way to prevent that?
>
> Regards,
> Alan


Re: Get number of results in filtered query

2016-04-13 Thread Alexandre Rafalovitch
Sounds like a job for tagging and excluding:
https://cwiki.apache.org/confluence/display/solr/Faceting?focusedCommentId=42569404#Faceting-TaggingandExcludingFilters
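
A sketch of what that looks like with SolrJ (the field name "country" comes
from the question; the collection name and tag label are made up):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class TagExcludeFacets {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                new HttpSolrClient("http://localhost:8983/solr/mycollection")) {
            SolrQuery q = new SolrQuery("user query here");
            // Tag the country filter...
            q.addFilterQuery("{!tag=ctry}country:France");
            q.setFacet(true);
            // ...and exclude that tag when faceting, so the per-country
            // counts reflect the unfiltered query while numFound stays
            // restricted to the selected country.
            q.addFacetField("{!ex=ctry}country");
            System.out.println(client.query(q).getResults().getNumFound());
        }
    }
}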

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 14 April 2016 at 06:43, Fundera Developer
 wrote:
> Hi all,
>
> we are developing a search engine in which all the possible results have one
> or more countries associated. If, apart from writing the query, the user
> selects a country, we use a filter query to restrict the results to those
> that match the query and are associated with that country. Nothing spectacular
> so far  :-D
>
> However, we would like to show the number of results that are returned by
> the unfiltered query, since we already have the number of results associated
> with each country, as we are also faceting on that field. Is it possible to
> have that number without executing the query twice?
>
> Thanks in advance!
>


Re: How to search for a First, Last of contact which are stored in different multivalued fields

2016-04-13 Thread Thrinadh Kuppili
I will look into it. Thanks.





Re: How to search for a First, Last of contact which are stored in different multivalued fields

2016-04-13 Thread Thrinadh Kuppili
Thanks, Erick,

I tried indexing with one multivalued field, name, but I am not getting
the expected result. Below are the stored data and the query used to fetch from Solr:

Name: James_Cook

fq = name:Jam*

I should be able to search using the first name (james), the last name (cook), or
partial names (jam or coo),
so I appended * before searching.

The field name is defined as solr.StrField with no filters/tokenizers.

Thnx
Thrinadh










Re: How to search for a First, Last of contact which are stored in different multivalued fields

2016-04-13 Thread Erick Erickson
String fields are unanalyzed, so this simply will not work. String
fields are case-sensitive, don't split on whitespace or non-letters,
etc.

You might get significant mileage out of the admin/analysis page to
see exactly what tokenizations occur. Your original question implied
you were looking for complete names. For partial matches you need
analyzed terms.

Try the multiValued approach with positionIncrementGap, then. Also
consider the ComplexPhraseQueryParser; it handles things like "jam* cook"
where the usual parsers don't.
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
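
For example (a sketch; it assumes "name" is by then a tokenized, lowercased
field, and reuses the made-up "contacts" collection from earlier):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class ComplexPhraseExample {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                new HttpSolrClient("http://localhost:8983/solr/contacts")) {
            // Wildcards inside a phrase: matches e.g. "james cook".
            SolrQuery q = new SolrQuery(
                    "{!complexphrase inOrder=true}name:\"jam* coo*\"");
            System.out.println(client.query(q).getResults().getNumFound());
        }
    }
}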

Best,
Erick

On Wed, Apr 13, 2016 at 4:54 PM, Thrinadh  Kuppili
 wrote:
> Thanks, Erick,
>
> I tried indexing with one multivalued field, name, but I am not getting
> the expected result. Below are the stored data and the query used to fetch from Solr:
>
> Name: James_Cook
>
> fq = name:Jam*
>
> I should be able to search using the first name (james), the last name (cook), or
> partial names (jam or coo),
> so I appended * before searching.
>
> The field name is defined as solr.StrField with no filters/tokenizers.
>
> Thnx
> Thrinadh
>
>
>
>
>
>
>
>


Re: Querying of multiple string value

2016-04-13 Thread Zheng Lin Edwin Yeo
Hi Shawn,

Thanks for the reply. It works.

Regards,
Edwin


On 14 April 2016 at 01:40, Shawn Heisey  wrote:

> On 4/13/2016 9:25 AM, Zheng Lin Edwin Yeo wrote:
> > Would like to find out, is there any way to do a multiple value query of a
> > field that is of type String, besides using the OR parameters?
> >
> > Currently, I am using the OR parameters like
> > http://localhost:8983/solr/collection1/highlight?q=id:collection1_0001 OR
> > id:collection1_0002
> >
> > But this will get longer and longer if I were to have many records to
> > retrieve based on their ID. The fieldType is string, so it is not possible
> > to do things like sorting, more than or less than.
> >
> > I'm using Solr 5.4.0
>
> The terms query parser was added in 4.10, and would do what you need:
>
>
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser
>
> Thanks,
> Shawn
>
>


Optimal indexing speed in Solr

2016-04-13 Thread Zheng Lin Edwin Yeo
Hi,

Would like to find out, what is the optimal indexing speed in Solr?

Previously, I managed to get more than 3GB/hour, but now the speed has dropped
to 0.7GB/hour. What could be the potential reason behind this?

Besides the index size getting bigger, I have only added more
collections into the core and added another field. Other than that, nothing
else has changed.

Could the source file which I'm indexing make a difference in the indexing
speed?

I'm using Solr 5.4.0 for now, but am planning to migrate to Solr 6.0.0.

Regards,
Edwin


How to declare field type for IntPoint field in solr 6.0 schema?

2016-04-13 Thread Rafis Ismagilov
Should it be PointType, BinaryField, or something else? All the examples use
TrieIntField for int.

Thanks,
Rafis