using variables in data-config.xml

2016-08-11 Thread Prasanna S. Dhakephalkar
Hi,

 

I have 7 cores.

 

In each data-config.xml, I have

 



 

There are similar structures on the production, testing and partner instances.

So if I have to make changes, I have to make them in all the data-config files.

 

I am looking for a mechanism where some variables

like

dbname=abcd

dbuser=username

dbpass=password

are defined in the solr.xml file under

the ../server/solr_test directory

 

and can be referenced in
.../server/solr_test/{core_1,core_2,...core_7}/conf/data-config.xml

 

I tried looking on the net; the articles I found tell me to edit
solrconfig.xml in each core, which does not satisfy my need.

 

I am using solr 5.3.1

 

Regards,

 

Prasanna.



Re: commit is taking 1300 ms

2016-08-11 Thread Emir Arnautovic

Hi Midas,

1. How many indexing threads?
2. Do you batch documents and what is your batch size?
3. How frequently do you commit?

I would recommend:
1. Move commits to Solr (set auto soft commit to the max allowed time)
2. Use batches (bulks)
3. Tune bulk size and number of threads to achieve max performance.
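
For illustration, a batched update over HTTP might look like this (a sketch;
the URL, core name, and documents are made up, and no commit parameter is
sent so that visibility is left to the autoSoftCommit settings):

  curl 'http://localhost:8983/solr/mycore/update' \
    -H 'Content-Type: application/json' \
    -d '[{"id": "1", "title": "first doc"},
         {"id": "2", "title": "second doc"}]'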

Thanks,
Emir


On 11.08.2016 08:21, Midas A wrote:

Emir,

other queries:

a) Solr cloud : NO
b) 
c)  
d) 
e) we are using multi threaded system.

On Thu, Aug 11, 2016 at 11:48 AM, Midas A  wrote:


Emir,

We post JSON documents through curl and it takes the time (at the same time I
would like to say that we are not hard committing). That curl call takes the
time, i.e. 1.3 sec.

On Wed, Aug 10, 2016 at 2:29 PM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:


Hi Midas,

According to your autocommit configuration and your worry about commit
time, I assume that you are doing explicit commits from client code and that
1.3s is the client-observed commit time. If that is the case, then it might be
the opening of a searcher that is taking the time.

How do you index data - single threaded or multithreaded? How frequently
do you commit from the client? Can you let Solr do soft commits instead of
explicitly committing? Do you have warmup queries? Is this SolrCloud? What
is the number of servers (what spec), shards, docs?

In any case, monitoring can give you more info about server/Solr behavior
and help you diagnose issues more easily/precisely. One such monitoring
tool is our SPM.

Regards,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

On 10.08.2016 05:20, Midas A wrote:


Thanks for replying

index size:9GB
2000 docs/sec.

Actually, earlier it was taking less, but suddenly it has increased.

Currently we do not have any monitoring tool.

On Tue, Aug 9, 2016 at 7:00 PM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:

Hi Midas,

Can you give us more details on your index: size, number of new docs
between commits? Why do you think 1.3s for a commit is too much, and why do
you need it to take less? Did you do any system/Solr monitoring?

Emir


On 09.08.2016 14:10, Midas A wrote:

Please reply, it is urgent.

On Tue, Aug 9, 2016 at 11:17 AM, Midas A  wrote:

Hi,


Commit is taking more than 1300 ms. What should I check on the server?

Below is my configuration:

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>



--

Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/





--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



RE: using variables in data-config.xml

2016-08-11 Thread Srinivasa Meenavali
Hi Prasanna,


You can use Request Parameters in Solr 5.5, but not in your version.



"these parameters can be passed to the full-import command or defined in the
<defaults> section in solrconfig.xml. This example shows the parameters with
the full-import command:
dataimport?command=full-import&jdbcurl=jdbc:hsqldb:./example-DIH/hsqldb/ex&jdbcuser=sa&jdbcpassword=secret"
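
With that feature, the request parameters are referenced inside
data-config.xml through the ${dataimporter.request.*} placeholders, roughly
like this (a sketch; the driver and the surrounding entity definitions are
illustrative):

  <dataConfig>
    <dataSource driver="org.hsqldb.jdbcDriver"
                url="${dataimporter.request.jdbcurl}"
                user="${dataimporter.request.jdbcuser}"
                password="${dataimporter.request.jdbcpassword}"/>
    <!-- document/entity definitions as before -->
  </dataConfig>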

Regards
Srinivas Meenavalli

-Original Message-
From: Prasanna S. Dhakephalkar [mailto:prasann...@merajob.in] 
Sent: Thursday, August 11, 2016 4:40 PM
To: solr-user@lucene.apache.org
Subject: using variables in data-config.xml

Hi,

 

I have 7 cores.

 

In each data-config.xml , I have

 



 

There are similar structures on the production, testing and partner instances.

So if I have to make changes, I have to make them in all the data-config files.

 

I am looking for a mechanism where some variables

like

dbname=abcd

dbuser=username

dbpass=password

are defined in the solr.xml file under

the ../server/solr_test directory

 

and can be referenced in
.../server/solr_test/{core_1,core_2,...core_7}/conf/data-config.xml

 

I tried looking on the net; the articles I found tell me to edit
solrconfig.xml in each core, which does not satisfy my need.

 

I am using solr 5.3.1

 

Regards,

 

Prasanna.


This electronic mail transmission may contain privileged,
confidential and/or proprietary information intended only for the
person(s) named.  Any use, distribution, copying or disclosure to
another person is strictly prohibited.  If you are not the
addressee indicated in this message (or responsible for delivery
of the message to such person), you may not copy or deliver this
message to anyone. In such case, you should destroy this message
and kindly notify the sender by reply email.


Authenticating the SOLR 6.0 using Kerberos Authentication

2016-08-11 Thread Preeti Bhat
Hi All,

We are running Solr 6.0 on an AWS EC2 instance (Windows Server 2012). Based on 
the URL below, I found that we would be able to authenticate Solr in 
standalone mode using "Kerberos Authentication".
But since we are running this in AWS, we don't have control over the 
domain, and Kerberos uses users from Active Directory, which we are 
unable to add in our case.

Could anyone please advise on any other authentication process which we might 
be able to use, or on any process steps for Kerberos?

https://cwiki.apache.org/confluence/display/solr/Authentication+and+Authorization+Plugins



Thanks and Regards,
Preeti Bhat



NOTICE TO RECIPIENTS: This communication may contain confidential and/or 
privileged information. If you are not the intended recipient (or have received 
this communication in error) please notify the sender and 
it-supp...@shoregrp.com immediately, and destroy this communication. Any 
unauthorized copying, disclosure or distribution of the material in this 
communication is strictly forbidden. Any views or opinions presented in this 
email are solely those of the author and do not necessarily represent those of 
the company. Finally, the recipient should check this email and any attachments 
for the presence of viruses. The company accepts no liability for any damage 
caused by any virus transmitted by this email.




Wildcard search not working

2016-08-11 Thread Ribeaud, Christian (Ext)
Hi,

What could be the reasons why the wildcard search with the Lucene Query 
Parser is NOT working?

We are using Solr 5.4.1 and, using the admin console, I am triggering searches 
with, for instance, the term 'roche' in a specific core. Everything is fine; I 
am getting, for instance, two matches. I would expect at least the same number 
of matches with the term 'r?che'. However, this does NOT happen. I am getting 
zero matches. The same problem occurs with 'r*che'. 'roch?' does not work 
either, but 'roch*' works.

Switching on debug mode gives the following output:

"debug": {
"rawquerystring": "roch?",
"querystring": "roch?",
"parsedquery": "text:roch?",
"parsedquery_toString": "text:roch?",
"explain": {},
"QParser": "LuceneQParser",
...

Any idea? Thanks and cheers,

christian



Re: Wildcard search not working

2016-08-11 Thread Ahmet Arslan
Hi Christian,

The query r?che may not return at least the same number of matches as roche, 
depending on your analysis chain.
The difference is that roche is analyzed but r?che isn't. Wildcard queries are 
executed against the indexed/analyzed terms.
For example, if roche is indexed/analyzed as roch, the query r?che won't match 
it.

Please see : https://wiki.apache.org/solr/MultitermQueryAnalysis

Ahmet



On Thursday, August 11, 2016 4:42 PM, "Ribeaud, Christian (Ext)" 
 wrote:
Hi,

What could be the reasons why the wildcard search with the Lucene Query 
Parser is NOT working?

We are using Solr 5.4.1 and, using the admin console, I am triggering searches 
with, for instance, the term 'roche' in a specific core. Everything is fine; I 
am getting, for instance, two matches. I would expect at least the same number 
of matches with the term 'r?che'. However, this does NOT happen. I am getting 
zero matches. The same problem occurs with 'r*che'. 'roch?' does not work 
either, but 'roch*' works.

Switching debug mode brings following output:

"debug": {
"rawquerystring": "roch?",
"querystring": "roch?",
"parsedquery": "text:roch?",
"parsedquery_toString": "text:roch?",
"explain": {},
"QParser": "LuceneQParser",
...

Any idea? Thanks and cheers,

christian


Consume sql response using solrj

2016-08-11 Thread Pablo Anzorena
Hey,

I'm trying to get the response from Solr via QueryResponse using
QueryResponse queryResponse = client.query(solrParams); (where client is a
CloudSolrClient)

The error it throws is:

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://tywin:8983/solr/testcollection1_shard1_replica1:
Expected mime type application/octet-stream but got text/plain.
{"result-set":{"docs":[
{"count(*)":5304,"d1":2},
{"count(*)":5160,"d1":1},
{"count(*)":5016,"d1":3},
{"count(*)":4893,"d1":4},
{"count(*)":4824,"d1":5},
{"EOF":true,"RESPONSE_TIME":11}]}}
at
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:558)

Then I tried to implement a custom ResponseParser that overrides
getContentType() and returns "text/plain", but it returns another error.

So... is there a way to get the SQL response via this method?

I made it work via Connection and ResultSets, but I need to use the other
way (if possible).

Thanks!


Re: Consume sql response using solrj

2016-08-11 Thread Joel Bernstein
There are two ways to do this with SolrJ:

1) Use the JDBC driver.

2) Use the SolrStream to send the request and then read() the Tuples. This
is what the JDBC driver does under the covers. The sample code can be found
here:
https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/io/sql/StatementImpl.java

The constructStream() method creates a SolrStream with the request.
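
For illustration, option 2 might look roughly like this (a sketch: the node
URL, collection name, and statement are made up, and in newer SolrJ releases
the SolrStream constructor takes a SolrParams object instead of a raw Map):

  import java.util.HashMap;
  import java.util.Map;

  import org.apache.solr.client.solrj.io.Tuple;
  import org.apache.solr.client.solrj.io.stream.SolrStream;

  public class SqlStreamExample {
    public static void main(String[] args) throws Exception {
      // Parameters routed to the /sql handler of the collection.
      Map<String, String> params = new HashMap<>();
      params.put("qt", "/sql");
      params.put("stmt", "select d1, count(*) from testcollection1 group by d1");
      params.put("aggregationMode", "facet");

      // Point the stream at a node that hosts the collection.
      SolrStream stream =
          new SolrStream("http://tywin:8983/solr/testcollection1", params);
      try {
        stream.open();
        Tuple tuple;
        // The EOF tuple marks the end of the stream.
        while (!(tuple = stream.read()).EOF) {
          System.out.println(tuple.getString("d1") + " -> "
              + tuple.getLong("count(*)"));
        }
      } finally {
        stream.close();
      }
    }
  }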

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Aug 11, 2016 at 10:05 AM, Pablo Anzorena 
wrote:

> Hey,
>
> I'm trying to get the response of solr via QueryResponse using
> QueryResponse queryResponse = client.query(solrParams); (where client is a
> CloudSolrClient)
>
> The error it thows is:
>
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
> Error
> from server at http://tywin:8983/solr/testcollection1_shard1_replica1:
> Expected mime type application/octet-stream but got text/plain.
> {"result-set":{"docs":[
> {"count(*)":5304,"d1":2},
> {"count(*)":5160,"d1":1},
> {"count(*)":5016,"d1":3},
> {"count(*)":4893,"d1":4},
> {"count(*)":4824,"d1":5},
> {"EOF":true,"RESPONSE_TIME":11}]}}
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.
> executeMethod(HttpSolrClient.java:558)
>
> Then I tryed to implement a custom ResponseParser that override the
> getContentType() and returns "text/plain", but it returns another error.
>
> So... Is it a way to get the sql response via this method?
>
> I make it works via Connection and ResultSets, but I need to use the other
> way (if possible).
>
> Thanks!
>


Re: AnalyticsQuery fails on a sharded collection

2016-08-11 Thread tedsolr
OK, some more info ... it's not aggregating because the doc values it's using
for grouping are the unique ID field's. There are some big differences in
the whole flow between searches against a single-shard collection and
searches against a multi-shard collection. In a single-shard collection the
AnalyticsQuery is called one time, and there's only one pass through the
delegating collector. If someone could explain what's going on in a
multi-sharded search, that would help a lot, I think. My test collection has
two shards, each of which has a replica.

For this search
.../aggr?q=*:*&fl=VENDOR_NAME&sort=VENDOR_NAME+asc 
The user has selected just one field to view, so I make VENDOR_NAME the
group by field.

This is what I see while debugging:
1. custom AnalyticsQuery is instantiated and the "fl" param is VENDOR_NAME +
[AggregationStats]
2. custom AnalyticsQuery is instantiated (again) and the "fl" param is id +
[AggregationStats]
3. custom AnalyticsQuery is instantiated (again) and the "fl" param is id +
[AggregationStats]
4. getAnalyticsCollector() is called (fl is id + [AggregationStats])
5. getAnalyticsCollector() is called again (fl is id + [AggregationStats])
6. custom DelegatingCollector finish() is called
7. custom DelegatingCollector finish() is called
8. custom AnalyticsQuery is instantiated and the "fl" param is VENDOR_NAME +
[AggregationStats] + id +  [AggregationStats]
9. custom AnalyticsQuery is instantiated and the "fl" param is VENDOR_NAME +
[AggregationStats] + id +  [AggregationStats]

And from the log:

INFO  - 2016-08-11 09:19:47.245; [ShardTest1 shard1_1 core_node4
ShardTest1_shard1_1_replica1] org.apache.solr.core.SolrCore;
[ShardTest1_shard1_1_replica1] webapp=/solr path=/aggr
params={distrib=false&qt=/aggr&fl=id&shards.purpose=4&start=0&fsv=true&sort=VENDOR_NAME+asc&fq={!AggregationPostFilter+count%3DCount+spend%3DINVOICE_AMOUNT}&shard.url=http://localhost:8983/solr/ShardTest1_shard1_1_replica1/|http://localhost:8984/solr/ShardTest1_shard1_1_replica2/&rows=10&version=2&q=*:*&NOW=1470925120206&isShard=true&wt=javabin&_=1470925120222}
hits=12096 status=0 QTime=64734 

INFO  - 2016-08-11 09:19:48.876; [ShardTest1 shard1_0 core_node3
ShardTest1_shard1_0_replica1] org.apache.solr.core.SolrCore;
[ShardTest1_shard1_0_replica1] webapp=/solr path=/aggr
params={distrib=false&qt=/aggr&fl=id&shards.purpose=4&start=0&fsv=true&sort=VENDOR_NAME+asc&fq={!AggregationPostFilter+count%3DCount+spend%3DINVOICE_AMOUNT}&shard.url=http://localhost:8983/solr/ShardTest1_shard1_0_replica1/|http://localhost:8984/solr/ShardTest1_shard1_0_replica2/&rows=10&version=2&q=*:*&NOW=1470925120206&isShard=true&wt=javabin&_=1470925120222}
hits=12062 status=0 QTime=66365 

INFO  - 2016-08-11 09:19:50.952; [ShardTest1 shard1_1 core_node4
ShardTest1_shard1_1_replica1] org.apache.solr.core.SolrCore;
[ShardTest1_shard1_1_replica1] webapp=/solr path=/aggr
params={distrib=false&qt=/aggr&fl=VENDOR_NAME&fl=[AggregationStats]&fl=id&shards.purpose=64&fq={!AggregationPostFilter+count%3DCount+spend%3DINVOICE_AMOUNT}&shard.url=http://localhost:8983/solr/ShardTest1_shard1_1_replica1/|http://localhost:8984/solr/ShardTest1_shard1_1_replica2/&version=2&q=*:*&NOW=1470925120206&ids=100713,940122,44812,210965,584851&isShard=true&wt=javabin&_=1470925120222}
status=0 QTime=2070 

INFO  - 2016-08-11 09:19:53.176; [ShardTest1 shard1_0 core_node3
ShardTest1_shard1_0_replica1] org.apache.solr.core.SolrCore;
[ShardTest1_shard1_0_replica1] webapp=/solr path=/aggr
params={distrib=false&qt=/aggr&fl=VENDOR_NAME&fl=[AggregationStats]&fl=id&shards.purpose=64&fq={!AggregationPostFilter+count%3DCount+spend%3DINVOICE_AMOUNT}&shard.url=http://localhost:8983/solr/ShardTest1_shard1_0_replica1/|http://localhost:8984/solr/ShardTest1_shard1_0_replica2/&version=2&q=*:*&NOW=1470925120206&ids=533737,44864,100672,940123,96752&isShard=true&wt=javabin&_=1470925120222}
status=0 QTime=4293 

INFO  - 2016-08-11 09:19:53.178; [ShardTest1 shard1_0 core_node3
ShardTest1_shard1_0_replica1] org.apache.solr.core.SolrCore;
[ShardTest1_shard1_0_replica1] webapp=/solr path=/aggr
params={q=*:*&indent=true&fl=VENDOR_NAME&sort=VENDOR_NAME+asc&wt=json&_=1470925120222}
hits=24158 status=0 QTime=72972 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/AnalyticsQuery-fails-on-a-sharded-collection-tp4289274p4291301.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Consume sql response using solrj

2016-08-11 Thread Pablo Anzorena
Excellent!

Thanks Joel

2016-08-11 11:19 GMT-03:00 Joel Bernstein :

> There are two ways to do this with SolrJ:
>
> 1) Use the JDBC driver.
>
> 2) Use the SolrStream to send the request and then read() the Tuples. This
> is what the JDBC driver does under the covers. The sample code can be found
> here:
> https://github.com/apache/lucene-solr/blob/master/solr/
> solrj/src/java/org/apache/solr/client/solrj/io/sql/StatementImpl.java
>
> The constructStream() method creates a SolrStream with the request.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Aug 11, 2016 at 10:05 AM, Pablo Anzorena 
> wrote:
>
> > Hey,
> >
> > I'm trying to get the response of solr via QueryResponse using
> > QueryResponse queryResponse = client.query(solrParams); (where client is
> a
> > CloudSolrClient)
> >
> > The error it thows is:
> >
> > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
> > Error
> > from server at http://tywin:8983/solr/testcollection1_shard1_replica1:
> > Expected mime type application/octet-stream but got text/plain.
> > {"result-set":{"docs":[
> > {"count(*)":5304,"d1":2},
> > {"count(*)":5160,"d1":1},
> > {"count(*)":5016,"d1":3},
> > {"count(*)":4893,"d1":4},
> > {"count(*)":4824,"d1":5},
> > {"EOF":true,"RESPONSE_TIME":11}]}}
> > at
> > org.apache.solr.client.solrj.impl.HttpSolrClient.
> > executeMethod(HttpSolrClient.java:558)
> >
> > Then I tryed to implement a custom ResponseParser that override the
> > getContentType() and returns "text/plain", but it returns another error.
> >
> > So... Is it a way to get the sql response via this method?
> >
> > I make it works via Connection and ResultSets, but I need to use the
> other
> > way (if possible).
> >
> > Thanks!
> >
>


Re: AnalyticsQuery fails on a sharded collection

2016-08-11 Thread Joel Bernstein
Yes, the AnalyticsQuery is being called twice in the logs, which is not a
good thing. Originally I believe this was not the case, but changes in the
QueryComponent in later releases have caused this to happen. The test cases
aren't broken by this, so it didn't get caught.

The actual merge of the results from the AnalyticsQuery, which is done in
the MergeStrategy, will only happen in the first stage. In the second stage
the results from the AnalyticsQuery should be ignored. As a workaround
for the double call to the AnalyticsQuery, you can look for the "ids" param
in your AnalyticsQuery and skip gathering the analytics if it's present.
The ids param is sent in the second phase of a distributed search.
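
A sketch of that workaround in a custom AnalyticsQuery subclass (the class
and collector names here are hypothetical, not from the code in this thread):

  import org.apache.lucene.search.IndexSearcher;
  import org.apache.solr.handler.component.ResponseBuilder;
  import org.apache.solr.search.AnalyticsQuery;
  import org.apache.solr.search.DelegatingCollector;

  public class AggregationAnalyticsQuery extends AnalyticsQuery {
    @Override
    public DelegatingCollector getAnalyticsCollector(ResponseBuilder rb,
                                                     IndexSearcher searcher) {
      if (rb.req.getParams().get("ids") != null) {
        // Second phase of a distributed search: the analytics were already
        // gathered and merged in phase one, so just pass documents through.
        return new DelegatingCollector();
      }
      // Hypothetical collector that holds the real aggregation logic.
      return new AggregationCollector();
    }
  }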

What you're running into here is that the MergeStrategy is not really in
use in combination with the AnalyticsQuery. There are users that use the
MergeStrategy to handle custom merging of documents to produce custom
rankings. But the AnalyticsQuery really hasn't been used much with the
MergeStrategy that I'm aware of. So this has not been reported before.

I have moved away from using the MergeStrategy for merging custom
analytics. I'll give you a little context for how this has evolved.

The MergeStrategy was originally introduced for an e-commerce customer that
wanted to produce custom rankings. As part of that work the AnalyticsQuery
was added to support custom analytics. And the MergeStrategy supported that
as well.

Later, Streaming Expressions were added, which took control of the merge in
a much more elegant way than the MergeStrategy. So now there are features
in Solr that nicely combine an AnalyticsQuery which is merged through the
Streaming Expression framework. The FeatureSelectionStream and the
TextLogitStream use this approach. These two streams are in master and
branch_6x if you want to see how they operate.

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Aug 11, 2016 at 10:29 AM, tedsolr  wrote:

> OK, some more info ... it's not aggregating because the doc values it's
> using
> for grouping are the unique ID field's. There are some big differences in
> the whole flow between searches against a single shard collection, and
> searches against a multi-shard collection. In a single shard collection the
> AnalyticsQuery is called one time, and there's only one pass through the
> delegating collector. If someone could explain what's going on in a
> multi-sharded search that would help a lot I think. My test collection has
> two shards each one has a replica.
>
> For this search
> .../aggr?q=*:*&fl=VENDOR_NAME&sort=VENDOR_NAME+asc
> The user has selected just one field to view, so I make VENDOR_NAME the
> group by field.
>
> This is what I see while debugging:
> 1. custom AnalyticsQuery is instantiated and the "fl" param is VENDOR_NAME
> +
> [AggregationStats]
> 2. custom AnalyticsQuery is instantiated (again) and the "fl" param is id +
> [AggregationStats]
> 3. custom AnalyticsQuery is instantiated (again) and the "fl" param is id +
> [AggregationStats]
> 4. getAnalyticsCollector() is called (fl is id + [AggregationStats])
> 5. getAnalyticsCollector() is called again (fl is id + [AggregationStats])
> 6. custom DelegatingCollector finish() is called
> 7. custom DelegatingCollector finish() is called
> 8. custom AnalyticsQuery is instantiated and the "fl" param is VENDOR_NAME
> +
> [AggregationStats] + id +  [AggregationStats]
> 9. custom AnalyticsQuery is instantiated and the "fl" param is VENDOR_NAME
> +
> [AggregationStats] + id +  [AggregationStats]
>
> And from the log:
>
> INFO  - 2016-08-11 09:19:47.245; [ShardTest1 shard1_1 core_node4
> ShardTest1_shard1_1_replica1] org.apache.solr.core.SolrCore;
> [ShardTest1_shard1_1_replica1] webapp=/solr path=/aggr
> params={distrib=false&qt=/aggr&fl=id&shards.purpose=4&
> start=0&fsv=true&sort=VENDOR_NAME+asc&fq={!AggregationPostFilter+count%
> 3DCount+spend%3DINVOICE_AMOUNT}&shard.url=http://localhost:8983/solr/
> ShardTest1_shard1_1_replica1/|http://localhost:8984/solr/
> ShardTest1_shard1_1_replica2/&rows=10&version=2&q=*:*&NOW=
> 1470925120206&isShard=true&wt=javabin&_=1470925120222}
> hits=12096 status=0 QTime=64734
>
> INFO  - 2016-08-11 09:19:48.876; [ShardTest1 shard1_0 core_node3
> ShardTest1_shard1_0_replica1] org.apache.solr.core.SolrCore;
> [ShardTest1_shard1_0_replica1] webapp=/solr path=/aggr
> params={distrib=false&qt=/aggr&fl=id&shards.purpose=4&
> start=0&fsv=true&sort=VENDOR_NAME+asc&fq={!AggregationPostFilter+count%
> 3DCount+spend%3DINVOICE_AMOUNT}&shard.url=http://localhost:8983/solr/
> ShardTest1_shard1_0_replica1/|http://localhost:8984/solr/
> ShardTest1_shard1_0_replica2/&rows=10&version=2&q=*:*&NOW=
> 1470925120206&isShard=true&wt=javabin&_=1470925120222}
> hits=12062 status=0 QTime=66365
>
> INFO  - 2016-08-11 09:19:50.952; [ShardTest1 shard1_1 core_node4
> ShardTest1_shard1_1_replica1] org.apache.solr.core.SolrCore;
> [ShardTest1_shard1_1_replica1] webapp=/solr path=/aggr
> params={distrib=false&qt=/aggr&f

Re: commit is taking 1300 ms

2016-08-11 Thread Erick Erickson
bq:  we post json documents through the curl it takes the time (same time i
would like to say that we are not hard committing ). that curl takes time
i.e. 1.3 sec.

OK, I'm really confused. _what_ is taking 1.3 seconds? When you said
commit, I was thinking of Solr's commit operation, which is totally distinct
from just adding a doc to the index. But I read the above statement
as you're saying it takes 1.3 seconds just to send a doc to Solr.

Let's see the exact curl command you're using, please?

Best,
Erick


On Thu, Aug 11, 2016 at 5:32 AM, Emir Arnautovic
 wrote:
> Hi Midas,
>
> 1. How many indexing threads?
> 2. Do you batch documents and what is your batch size?
> 3. How frequently do you commit?
>
> I would recommend:
> 1. Move commits to Solr (set auto soft commit to max allowed time)
> 2. Use batches (bulks)
> 3. tune bulk size and number of threads to achieve max performance.
>
> Thanks,
> Emir
>
>
>
> On 11.08.2016 08:21, Midas A wrote:
>>
>> Emir,
>>
>> other queries:
>>
>> a) Solr cloud : NO
>> b) > size="5000" initialSize="5000" autowarmCount="10"/>
>> c)  > size="1000" initialSize="1000" autowarmCount="10"/>
>> d) > size="1000" initialSize="1000" autowarmCount="10"/>
>> e) we are using multi threaded system.
>>
>> On Thu, Aug 11, 2016 at 11:48 AM, Midas A  wrote:
>>
>>> Emir,
>>>
>>> we post json documents through the curl it takes the time (same time i
>>> would like to say that we are not hard committing ). that curl takes time
>>> i.e. 1.3 sec.
>>>
>>> On Wed, Aug 10, 2016 at 2:29 PM, Emir Arnautovic <
>>> emir.arnauto...@sematext.com> wrote:
>>>
 Hi Midas,

 According to your autocommit configuration and your worry about commit
 time I assume that you are doing explicit commits from client code and
 that
 1.3s is client observed commit time. If that is the case, than it might
 be
 opening searcher that is taking time.

 How do you index data - single threaded or multithreaded? How frequently
 do you commit from client? Can you let Solr do soft commits instead of
 explicitly committing? Do you have warmup queries? Is this SolrCloud?
 What
 is number of servers (what spec), shards, docs?

 In any case monitoring can give you more info about server/Solr behavior
 and help you diagnose issues more easily/precisely. One such monitoring
 tool is our SPM .

 Regards,
 Emir

 --
 Monitoring * Alerting * Anomaly Detection * Centralized Log Management
 Solr & Elasticsearch Support * http://sematext.com/

 On 10.08.2016 05:20, Midas A wrote:

> Thanks for replying
>
> index size:9GB
> 2000 docs/sec.
>
> Actually earlier it was taking less but suddenly it has increased .
>
> Currently we do not have any monitoring  tool.
>
> On Tue, Aug 9, 2016 at 7:00 PM, Emir Arnautovic <
> emir.arnauto...@sematext.com> wrote:
>
> Hi Midas,
>>
>> Can you give us more details on your index: size, number of new docs
>> between commits. Why do you think 1.3s for commit is to much and why
>> do
>> you
>> need it to take less? Did you do any system/Solr monitoring?
>>
>> Emir
>>
>>
>> On 09.08.2016 14:10, Midas A wrote:
>>
>> please reply it is urgent.
>>>
>>> On Tue, Aug 9, 2016 at 11:17 AM, Midas A 
>>> wrote:
>>>
>>> Hi ,
>>>
 commit is taking more than 1300 ms. What should I check on the server?

 below is my configuration:

 <autoCommit>
   <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
   <openSearcher>false</openSearcher>
 </autoCommit>

 <autoSoftCommit>
   <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
 </autoSoftCommit>



 --
>>
>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>>
>>
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>


Re: Consume sql response using solrj

2016-08-11 Thread Pablo Anzorena
Joel, one more thing.

Is there any way to use both the SQL and the Lucene query syntax? The thing is
that my business application is tightly coupled with the Lucene query
syntax, so I need a way to use both the SQL features (without the where
clause) and the query syntax of Lucene.

Thanks.

2016-08-11 11:40 GMT-03:00 Pablo Anzorena :

> Excellent!
>
> Thanks Joel
>
> 2016-08-11 11:19 GMT-03:00 Joel Bernstein :
>
>> There are two ways to do this with SolrJ:
>>
>> 1) Use the JDBC driver.
>>
>> 2) Use the SolrStream to send the request and then read() the Tuples. This
>> is what the JDBC driver does under the covers. The sample code can be
>> found
>> here:
>> https://github.com/apache/lucene-solr/blob/master/solr/solrj
>> /src/java/org/apache/solr/client/solrj/io/sql/StatementImpl.java
>>
>> The constructStream() method creates a SolrStream with the request.
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Thu, Aug 11, 2016 at 10:05 AM, Pablo Anzorena > >
>> wrote:
>>
>> > Hey,
>> >
>> > I'm trying to get the response of solr via QueryResponse using
>> > QueryResponse queryResponse = client.query(solrParams); (where client
>> is a
>> > CloudSolrClient)
>> >
>> > The error it thows is:
>> >
>> > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
>> > Error
>> > from server at http://tywin:8983/solr/testcollection1_shard1_replica1:
>> > Expected mime type application/octet-stream but got text/plain.
>> > {"result-set":{"docs":[
>> > {"count(*)":5304,"d1":2},
>> > {"count(*)":5160,"d1":1},
>> > {"count(*)":5016,"d1":3},
>> > {"count(*)":4893,"d1":4},
>> > {"count(*)":4824,"d1":5},
>> > {"EOF":true,"RESPONSE_TIME":11}]}}
>> > at
>> > org.apache.solr.client.solrj.impl.HttpSolrClient.
>> > executeMethod(HttpSolrClient.java:558)
>> >
>> > Then I tryed to implement a custom ResponseParser that override the
>> > getContentType() and returns "text/plain", but it returns another error.
>> >
>> > So... Is it a way to get the sql response via this method?
>> >
>> > I make it works via Connection and ResultSets, but I need to use the
>> other
>> > way (if possible).
>> >
>> > Thanks!
>> >
>>
>
>


Re: Consume sql response using solrj

2016-08-11 Thread Joel Bernstein
There are no test cases for this but you can try this syntax:

select a from b where _query_=(a:c AND d:f)

This should get translated to:

_query_:(a:c AND d:f)

This link describes the behavior of _query_
https://lucidworks.com/blog/2009/03/31/nested-queries-in-solr/

I'm just not positive how the SQL parser will treat the : in the query.




Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Aug 11, 2016 at 12:22 PM, Pablo Anzorena 
wrote:

> Joel, one more thing.
>
> Is there anyway to use the sql and the lucene query syntax? The thing is
> that my bussiness application is tightly coupled with the lucene query
> syntax, so I need a way to use both the sql features (without the where
> clause) and the query syntax of lucene.
>
> Thanks.
>
> 2016-08-11 11:40 GMT-03:00 Pablo Anzorena :
>
> > Excellent!
> >
> > Thanks Joel
> >
> > 2016-08-11 11:19 GMT-03:00 Joel Bernstein :
> >
> >> There are two ways to do this with SolrJ:
> >>
> >> 1) Use the JDBC driver.
> >>
> >> 2) Use the SolrStream to send the request and then read() the Tuples.
> This
> >> is what the JDBC driver does under the covers. The sample code can be
> >> found
> >> here:
> >> https://github.com/apache/lucene-solr/blob/master/solr/solrj
> >> /src/java/org/apache/solr/client/solrj/io/sql/StatementImpl.java
> >>
> >> The constructStream() method creates a SolrStream with the request.
> >>
> >> Joel Bernstein
> >> http://joelsolr.blogspot.com/
> >>
> >> On Thu, Aug 11, 2016 at 10:05 AM, Pablo Anzorena <
> anzorena.f...@gmail.com
> >> >
> >> wrote:
> >>
> >> > Hey,
> >> >
> >> > I'm trying to get the response of solr via QueryResponse using
> >> > QueryResponse queryResponse = client.query(solrParams); (where client
> >> is a
> >> > CloudSolrClient)
> >> >
> >> > The error it thows is:
> >> >
> >> > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
> >> > Error
> >> > from server at http://tywin:8983/solr/testcollection1_shard1_replica1
> :
> >> > Expected mime type application/octet-stream but got text/plain.
> >> > {"result-set":{"docs":[
> >> > {"count(*)":5304,"d1":2},
> >> > {"count(*)":5160,"d1":1},
> >> > {"count(*)":5016,"d1":3},
> >> > {"count(*)":4893,"d1":4},
> >> > {"count(*)":4824,"d1":5},
> >> > {"EOF":true,"RESPONSE_TIME":11}]}}
> >> > at
> >> > org.apache.solr.client.solrj.impl.HttpSolrClient.
> >> > executeMethod(HttpSolrClient.java:558)
> >> >
> >> > Then I tryed to implement a custom ResponseParser that override the
> >> > getContentType() and returns "text/plain", but it returns another
> error.
> >> >
> >> > So... Is it a way to get the sql response via this method?
> >> >
> >> > I make it works via Connection and ResultSets, but I need to use the
> >> other
> >> > way (if possible).
> >> >
> >> > Thanks!
> >> >
> >>
> >
> >
>


Create collection on all nodes using the Collection API

2016-08-11 Thread Alexandre Drouin
Hi,

What would be the best/easiest way to create a collection (only one shard) 
using the Collection API and have a replica created on all live nodes?

Using the 'create collection' API, I can use the 'replicationFactor' parameter 
and specify the number of replicas I want for my collection.  So if I have 3 
live nodes I can say 'replicationFactor=3' and my collection will have a 
replica on all live nodes.  However, I do not want to 'hardcode' my number of 
live nodes for obvious reasons, so because of that I have the following 
questions:

1) Is there a way to create a collection (only one shard) and have a replica 
of the shard on all live nodes?

2) Assuming #1 is not possible, is it possible to get the list of live nodes? 
If I can get the list of live nodes, I could determine the number required for 
the replicationFactor parameter.

Thanks

Alexandre Drouin


Re: Consume sql response using solrj

2016-08-11 Thread Joel Bernstein
Actually try this:

select a from b where _query_='a:b'

This produces the query:

(_query_:"a:b")

which should run.





Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Aug 11, 2016 at 1:04 PM, Joel Bernstein  wrote:

> There are no test cases for this but you can try this syntax:
>
> select a from b where _query_=(a:c AND d:f)
>
> This should get translated to:
>
> _query_:(a:c AND d:f)
>
> This link describes the behavior of _query_ https://lucidworks.
> com/blog/2009/03/31/nested-queries-in-solr/
>
> Just not positive how the SQL parser will treat the : in the query.
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Aug 11, 2016 at 12:22 PM, Pablo Anzorena 
> wrote:
>
>> Joel, one more thing.
>>
>> Is there anyway to use the sql and the lucene query syntax? The thing is
>> that my bussiness application is tightly coupled with the lucene query
>> syntax, so I need a way to use both the sql features (without the where
>> clause) and the query syntax of lucene.
>>
>> Thanks.
>>
>> 2016-08-11 11:40 GMT-03:00 Pablo Anzorena :
>>
>> > Excellent!
>> >
>> > Thanks Joel
>> >
>> > 2016-08-11 11:19 GMT-03:00 Joel Bernstein :
>> >
>> >> There are two ways to do this with SolrJ:
>> >>
>> >> 1) Use the JDBC driver.
>> >>
>> >> 2) Use the SolrStream to send the request and then read() the Tuples.
>> This
>> >> is what the JDBC driver does under the covers. The sample code can be
>> >> found
>> >> here:
>> >> https://github.com/apache/lucene-solr/blob/master/solr/solrj
>> >> /src/java/org/apache/solr/client/solrj/io/sql/StatementImpl.java
>> >>
>> >> The constructStream() method creates a SolrStream with the request.
>> >>
>> >> Joel Bernstein
>> >> http://joelsolr.blogspot.com/
>> >>
>> >> On Thu, Aug 11, 2016 at 10:05 AM, Pablo Anzorena <
>> anzorena.f...@gmail.com
>> >> >
>> >> wrote:
>> >>
>> >> > Hey,
>> >> >
>> >> > I'm trying to get the response of solr via QueryResponse using
>> >> > QueryResponse queryResponse = client.query(solrParams); (where client
>> >> is a
>> >> > CloudSolrClient)
>> >> >
>> >> > The error it thows is:
>> >> >
>> >> > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrE
>> xception:
>> >> > Error
>> >> > from server at http://tywin:8983/solr/testcol
>> lection1_shard1_replica1:
>> >> > Expected mime type application/octet-stream but got text/plain.
>> >> > {"result-set":{"docs":[
>> >> > {"count(*)":5304,"d1":2},
>> >> > {"count(*)":5160,"d1":1},
>> >> > {"count(*)":5016,"d1":3},
>> >> > {"count(*)":4893,"d1":4},
>> >> > {"count(*)":4824,"d1":5},
>> >> > {"EOF":true,"RESPONSE_TIME":11}]}}
>> >> > at
>> >> > org.apache.solr.client.solrj.impl.HttpSolrClient.
>> >> > executeMethod(HttpSolrClient.java:558)
>> >> >
>> >> > Then I tryed to implement a custom ResponseParser that override the
>> >> > getContentType() and returns "text/plain", but it returns another
>> error.
>> >> >
>> >> > So... Is it a way to get the sql response via this method?
>> >> >
>> >> > I make it works via Connection and ResultSets, but I need to use the
>> >> other
>> >> > way (if possible).
>> >> >
>> >> > Thanks!
>> >> >
>> >>
>> >
>> >
>>
>
>


Effects of insert order on query performance

2016-08-11 Thread Jeff Wartes

This isn’t really a question, although some validation would be nice. It’s more 
of a warning.

TL;DR: the insert order of documents in my collection appears to have had 
a huge effect on my query speed.


I have a very large (sharded) SolrCloud 5.4 index. One aspect of this index is 
a multi-valued field (“permissions”) that for 90% of docs contains one 
particular value (“A”) and for 10% of docs contains another distinct value 
(“B”). It’s intended to represent something like permissions, so more values 
are possible in the future, but none are present currently. In fact, the 
addition of docs with value “B” to this index was very recent; previously all 
docs had value “A”. All queries, in addition to various other Boolean-query 
type restrictions, 
f=permissions v=A,B}

Last week, I tried to re-index the whole collection from scratch, using source 
data. Query performance on the resulting re-index proved to be abysmal: I could 
get barely 10% of my previous query throughput, and even that was at latencies 
that were orders of magnitude higher than what I had in production.

I hooked up some CPU profiling to a server that had shards from both the old 
and new version of the collection, and eventually it looked like the 
significant difference in processing the two collections was coming from 
ConstantWeight.scorer()
Specifically, this line
https://github.com/apache/lucene-solr/blob/0a1dd10d5262153f4188dfa14a08ba28ec4ccb60/solr/core/src/java/org/apache/solr/search/SolrConstantScoreQuery.java#L102
was far more expensive in my re-indexed collection. From there, the call chain 
goes through an LRUQueryCache, down to a BulkScorer, and ends up with the extra 
work happening here:
https://github.com/apache/lucene-solr/blob/0a1dd10d5262153f4188dfa14a08ba28ec4ccb60/lucene/core/src/java/org/apache/lucene/search/Weight.java#L169

I don’t pretend to understand all that code, but the difference in my re-index 
appears to have something to do either with that cache, or with the aggregate 
docIdSets that need weights generated simply being much bigger in my re-index.


But the queries didn’t change, and the data is basically the same; what else 
could have changed?

The documents with the “B” distinct value were added recently to the 
high-performance collection, but the A’s and the B’s were all mixed up in the 
source data dump I used to re-index. On a hunch, I manually ordered the docs 
such that the A’s were all first and re-indexed again, and performance is great!

Here’s my theory: Using TieredMergePolicy, the vast majority of the documents 
in an index are contained in the largest segments. I’m guessing there’s an 
optimization somewhere that says something like “This segment only has A’s”. By 
indexing all the A’s first, those biggest segments only contain A’s, and only 
the smallest, newest segments are unable to make use of that optimization.

Here’s the scary part: Although my re-index is now performing well, if this 
theory is right, some random insert (or a deliberate optimize) at some random 
point in the future could cascade a segment merge such that the largest 
segment(s) now contain both A’s and B’s, and performance suddenly goes over a 
cliff. I have no way to prevent this possibility except to stop doing inserts.

My current thinking is that I need to pull the terms-query part out of the 
query and do a filter query for it instead. Probably as a post-filter, since 
I’ve had bad luck with very large filter queries and the filter cache. I’d 
tested this originally (when I only had A’s), but found the performance was a 
bit worse than just leaving it in the query. I’ll take a bit worse and 
predictability over a bit better and a time bomb though, if those are my 
choices.


If anyone has any comments refuting or supporting this theory, I’d certainly 
like to hear it. This is the first time I’ve encountered anything about insert 
order mattering from a performance perspective, and it becomes a general-form 
question around how to handle low-cardinality fields.



Re: Create collection on all nodes using the Collection API

2016-08-11 Thread Anshum Gupta
Hi Alexandre,

You can use the CLUSTERSTATUS Collections API (
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api18)
to get a list of live nodes.
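
If you're using SolrJ, the same information is available from the cluster
state; a minimal sketch (the ZooKeeper addresses are made up):

  import java.util.Set;

  import org.apache.solr.client.solrj.impl.CloudSolrClient;

  public class LiveNodesExample {
    public static void main(String[] args) throws Exception {
      try (CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181")) {
        client.connect();
        // The cluster state kept in ZooKeeper includes the live node set.
        Set<String> liveNodes =
            client.getZkStateReader().getClusterState().getLiveNodes();
        System.out.println("replicationFactor = " + liveNodes.size());
      }
    }
  }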

-Anshum

On Thu, Aug 11, 2016 at 10:16 AM Alexandre Drouin <
alexandre.dro...@orckestra.com> wrote:

> Hi,
>
> What would be the best/easiest way to create a collection (only one shard)
> using the Collection API and have a replica created on all live nodes?
>
> Using the 'create collection' API, I can use the 'replicationFactor'
> parameter and specify the number of replica I want for my collection.  So
> if I have 3 lives nodes I can say 'replicationFactor=3' and my collection
> will have a replica on all lives nodes.  However I do not want to
> 'hardcode' my number of live nodes for obvious reasons, so because of that
> I have the following questions:
>
> 1) Is there a way to create a collection (only one shard) and having a
> replica of the shard on all live nodes?
>
> 2) Assuming #1 is not possible, is it possible to have the list of live
> nodes ?  If I can have the list of live nodes I could detect the number
> required for the replicationFactor parameter.
>
> Thanks
>
> Alexandre Drouin
>


Want zero results from SOLR when there are no matches for "querystring"

2016-08-11 Thread John Bickerstaff
First let me say that this is very possibly the "x - y problem" so let me
state up front what my ultimate need is -- then I'll ask about the thing I
imagine might help...  which, of course, is heavily biased in the direction
of my experience coding Java and writing SQL...

I have a piece of a query that calculates a score based on a "weighting"
number stored in each Solr doc.  I'm including the XML for my custom
endpoint below...

The specific line is this:
product(field(category_weight),20)

What I just realized is that when I query Solr for a string that has NO
matches in the entire corpus, I still get a slew of results because EVERY
doc has the weighting value in the category_weight field - and therefore
every doc gets some score.

What I would like is to return zero results if there is no match for the
querystring.  My collection is small enough that I don't care if the actual
calculation runs on each doc (although that's wasteful) -- I just don't
want to see results come back for zero matches to the querystring.

(The /select endpoint does this of course, but my custom endpoint includes
this "weighting" piece and therefore returns every doc in the corpus
because they all have the weighting.)


Enter my imagined solution...  The potential X-Y problem...


So - given that I come from a programming background, I immediately start
thinking of an if statement ...

 if(some_score_for_the_primary_search_string) {
  run_the_category_weight_calculation;
 } else {
  do_NOT_run_category_weight_calc;
 }


Another way of thinking of it would be something like the "WHERE" clause in
SQL...

 run_category_weight_calculation WHERE "searchstring" is found in the
document, not otherwise.

I'm aware that things could be handled on the client side of my web app,
but if possible, I'd like the interface to SOLR to be as clean as possible,
and to massage incoming SOLR data as little as possible.

In other words, do NOT return any docs if the querystring (and any
synonyms) match zero docs.

Here is the endpoint XML for the query.  I've highlighted the specific line
that is causing the unintended results...


 

 
   all
   20
   
   text
  
   synonym_edismax>
   true

   1.5
   1.1
   75%
   *:*
   20
   meta_doc_type:chapterDoc
   {!synonym_edismax qf='title' synonyms='true'
synonyms.originalBoost='2.5' synonyms.synonymBoost='1.1' bf='' bq=''
v=$q}
   id category_weight title category_ss score
contentType
   {!edismax qf='title' bf='' bq='' v=$q}
=
   *product(field(category_weight),20)*
=
   product(query($titleQuery),4)
   text contentType^1000
   python
   true
   true
   true
   all
 
  

And here is the debug output for a query.  (This was a test for synonyms,
which you'll see in the output.) The original query string was, of
course, "μ-heavy
chain disease"

You'll note that although there is no score in the first doc explain for
the actual querystring, the highlighted section does get a score for
product(double(category_weight)=1.5,const(20))

... which is the thing that is currently causing all the docs in the
collection to "match" even though the querystring is not in any of them.

"debug":{ "rawquerystring":"\"μ-heavy chain disease\"",
"querystring":"\"μ-heavy
chain disease\"", "parsedquery":"(DisjunctionMaxQuery((text:\"μ heavy chain
disease\" | (contentType:\"μ heavy chain disease\")^1000.0))^1.5
((+DisjunctionMaxQuery((text:\"mu heavy chain disease\" | (contentType:\"mu
heavy chain disease\")^1000.0)))/no_coord^1.1)
((+DisjunctionMaxQuery((text:\"μ hcd\" | (contentType:\"μ
hcd\")^1000.0)))/no_coord^1.1) ((+DisjunctionMaxQuery((text:\"μ heavy chain
disease\" | (contentType:\"μ heavy chain disease\")^1000.0)))/no_coord^1.1)
((+DisjunctionMaxQuery((text:\"μ hcd\" | (contentType:\"μ
hcd\")^1000.0)))/no_coord^1.1)) ((DisjunctionMaxQuery((title:\"μ heavy
chain disease\"))^2.5 ((+DisjunctionMaxQuery((title:\"mu heavy chain
disease\")))/no_coord^1.1) ((+DisjunctionMaxQuery((title:\"μ
hcd\")))/no_coord^1.1) ((+DisjunctionMaxQuery((title:\"μ heavy chain
disease\")))/no_coord^1.1) ((+DisjunctionMaxQuery((title:\"μ
hcd\")))/no_coord^1.1)))
FunctionQuery(product(double(category_weight),const(20)))
FunctionQuery(product(query(+(title:\"μ heavy chain
disease\"),def=0.0),const(4)))", "parsedquery_toString":"(((text:\"μ heavy
chain disease\" | (contentType:\"μ heavy chain disease\")^1000.0))^1.5
((+(text:\"mu heavy chain disease\" | (contentType:\"mu heavy chain
disease\")^1000.0))^1.1) ((+(text:\"μ hcd\" | (contentType:\"μ
hcd\")^1000.0))^1.1) ((+(text:\"μ heavy chain disease\" | (contentType:\"μ
heavy chain disease\")^1000.0))^1.1) ((+(text:\"μ hcd\" | (contentType:\"μ
hcd\")^1000.0))^1.1)) title:\"μ heavy chain disease\"))^2.5
((+(title:\"mu heavy 

RE: Wildcard search not working

2016-08-11 Thread Ribeaud, Christian (Ext)
Hi Ahmet,

Many thanks for your reply. I had a look at the URL you pointed out but, 
honestly, I have to admit that I did not fully understand you.
Let's be a bit more concrete. Below is the schema snippet for the 
corresponding field:

...




 









...

What is wrong with this schema? Or rather, what should I change to be able 
to do wildcard searches correctly?

Many thanks for your time. Cheers,

christian
--
Christian Ribeaud
Software Engineer (External)
NIBR / WSJ-310.5.17
Novartis Campus
CH-4056 Basel


-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Donnerstag, 11. August 2016 16:00
To: solr-user@lucene.apache.org; Ribeaud, Christian (Ext)
Subject: Re: Wildcard search not working

Hi Christian,

The query r?che may not return at least the same number of matches as roche, 
depending on your analysis chain.
The difference is that roche is analyzed but r?che isn't. Wildcard queries are 
executed against the indexed/analyzed terms.
For example, if roche is indexed/analyzed as roch, the query r?che won't match 
it.

Please see : https://wiki.apache.org/solr/MultitermQueryAnalysis

Ahmet



On Thursday, August 11, 2016 4:42 PM, "Ribeaud, Christian (Ext)" 
 wrote:
Hi,

What could be the reasons why the wildcard search with the Lucene Query 
Parser is NOT working?

We are using Solr 5.4.1 and, using the admin console, I am triggering searches 
with, for instance, the term 'roche' in a specific core. Everything is fine; I 
am getting, for instance, two matches. I would expect at least the same number 
of matches with the term 'r?che'. However, this does NOT happen. I am getting 
zero matches. The same problem occurs with 'r*che'. 'roch?' does not work 
either, but 'roch*' works.

Switching debug mode brings following output:

"debug": {
"rawquerystring": "roch?",
"querystring": "roch?",
"parsedquery": "text:roch?",
"parsedquery_toString": "text:roch?",
"explain": {},
"QParser": "LuceneQParser",
...

Any idea? Thanks and cheers,

christian


Need Help Resolving Unknown Shape Definition Error

2016-08-11 Thread Jennifer Coston

Hello,

I am trying to set up a local Solr core so that I can perform spatial
searches on it. I am using version 5.2.1. I have updated my schema.xml file
to include the location-rpt fieldType:



And I have defined my field to use this type:



I also added the jts-1.4.0.jar file to C:\solr-5.2.1\server\solr-webapp
\webapp\WEB-INF\lib.
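
For comparison, the Solr 5.x reference guide's example of an RPT field type
wired up for JTS looks roughly like the sketch below (not the actual config
from this message). WKT shapes such as POLYGON are only parsed when the
spatialContextFactory attribute points at the JTS factory, so a missing
attribute is one likely cause of an "Unknown Shape definition" error:

  <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
             spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
             geo="true" distErrPct="0.025" maxDistErr="0.001"
             distanceUnits="kilometers"/>

  <field name="positionWkt" type="location_rpt" indexed="true" stored="true"/>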

However when I try to add a document through the Solr Admin Console I am
seeing this response:

{
  "responseHeader": {
"status": 400,
"QTime": 6
  },
  "error": {
"msg": "Unknown Shape definition [POLYGON((-77.23 38.922, -77.23
38.923, -77.228 38.923, -77.228 38.922, -77.23 38.922))]",
"code": 400
  }
}

I can submit documents successfully if I remove the positionWkt field. Did
I miss a configuration step?

Here is the document I am trying to add:

{
"observationId": "8e09f47f",
"observationType": "image",
"startTime": "2015-09-19T21:03:51Z",
"endTime": "2015-09-19T21:03:51Z",
"receiptTime": "2016-07-29T15:49:49.328Z",
"locationLat": 38.9225015078814,
"locationLon": -77.22900299194423,
"position": "38.9225015078814,-77.22900299194423",
"positionWkt": "POLYGON((-77.23 38.922, -77.23 38.923, -77.228
38.923, -77.228 38.922, -77.23 38.922))",
"provider": "a"
}

Here are the fields I added to the schema.xml file (I started with the
template, please let me know if you need the whole thing):

observationId














Thank you!

Jennifer

Re: Wildcard search not working

2016-08-11 Thread Upayavira
You have a stemming filter in your analysis chain. Go to the analysis
tab, select the 'text' field, and put "Roche" into both boxes. Click
analyse. I bet you will see Roch, not Roche, because of your
stemming filter shown below.

That's what Ahmet shrewdly identified above.
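
If stemming is wanted for regular searches, one common workaround (a sketch;
the field and type names are hypothetical) is to copy the content into an
unstemmed field and run wildcard queries against that field instead:

  <field name="text_wild" type="text_general_nostem" indexed="true"
         stored="false"/>
  <copyField source="text" dest="text_wild"/>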

Upayavira

On Thu, 11 Aug 2016, at 08:31 PM, Ribeaud, Christian (Ext) wrote:
> Hi Ahmet,
> 
> Many thanks for your reply. I had a look at the URL you pointed out but,
> honestly, I have to admit that I did not fully understand you.
> Let's be a bit more concrete. Following the schema snippet for the
> corresponding field:
> 
> ...
>  required="false" multiValued="false" />
> 
> 
>  positionIncrementGap="100">
>  
> 
> 
>  words="lang/stopwords_de.txt" format="snowball" />
> 
> 
> 
> 
> 
> 
> ...
> 
> What is wrong with this schema? Respectively, what should I change to be
> able to correctly do wildcard searches?
> 
> Many thanks for your time. Cheers,
> 
> christian
> --
> Christian Ribeaud
> Software Engineer (External)
> NIBR / WSJ-310.5.17
> Novartis Campus
> CH-4056 Basel
> 
> 
> -Original Message-
> From: Ahmet Arslan [mailto:iori...@yahoo.com] 
> Sent: Donnerstag, 11. August 2016 16:00
> To: solr-user@lucene.apache.org; Ribeaud, Christian (Ext)
> Subject: Re: Wildcard search not working
> 
> Hi Chiristian,
> 
> The query r?che may not return at least the same number of matches as
> roche depending on your analysis chain.
> The difference is roche is analyzed but r?che don't. Wildcard queries are
> executed on the indexed/analyzed terms.
> For example, if roche is indexed/analyzed as roch, the query r?che won't
> match it.
> 
> Please see : https://wiki.apache.org/solr/MultitermQueryAnalysis
> 
> Ahmet
> 
> 
> 
> On Thursday, August 11, 2016 4:42 PM, "Ribeaud, Christian (Ext)"
>  wrote:
> Hi,
> 
> What would be the reasons making the wildcard search for Lucene Query
> Parser NOT working?
> 
> We are using Solr 5.4.1 and, using the admin console, I am triggering for
> instance searches with term 'roche' in a specific core. Everything fine,
> I am getting for instance two matches. I would expect at least the same
> number of matches with term 'r?che'. However, this does NOT happen. I am
> getting zero matches. Same problem occurs with 'r*che'. 'roch?' does not
> work neither but 'roch*' works.
> 
> Switching debug mode brings following output:
> 
> "debug": {
> "rawquerystring": "roch?",
> "querystring": "roch?",
> "parsedquery": "text:roch?",
> "parsedquery_toString": "text:roch?",
> "explain": {},
> "QParser": "LuceneQParser",
> ...
> 
> Any idea? Thanks and cheers,
> 
> christian


Re: Want zero results from SOLR when there are no matches for "querystring"

2016-08-11 Thread Chris Hostetter

: First let me say that this is very possibly the "x - y problem" so let me
: state up front what my ultimate need is -- then I'll ask about the thing I
: imagine might help...  which, of course, is heavily biased in the direction
: of my experience coding Java and writing SQL...

Thank you so much for asking your question this way!

Right off the bat, the background you've provided seems suspicious...

: I have a piece of a query that calculates a score based on a "weighting"
...
: The specific line is this:
: product(field(category_weight),20)
: 
: What I just realized is that when I query Solr for a string that has NO
: matches in the entire corpus, I still get a slew of results because EVERY
: doc has the weighting value in the category_weight field - and therefore
: every doc gets some score.

...that is *NOT* how dismax and edismax normally work.

While both the "bf" and "bq" params result in "additive" boosting, and the 
implementation of that "additive boost" comes from adding new optional 
clauses to the top level BooleanQuery that is executed, that only happens 
after the "main" query (from your "q" param) is added to that top level 
BooleanQuery as a "mandatory" clause.

So, for example, "bf=true()" and "bq=*:*" should match & boost every doc, 
but with the techproducts configs/data these requests still don't match 
anything...

/select?defType=edismax&q=bogus&bf=true()&bq=*:*&debug=query
/select?defType=dismax&q=bogus&bf=true()&bq=*:*&debug=query

...and if you look at the debug output, the parsed queries shows that the 
"bogus" part of the query is mandatory...

+DisjunctionMaxQuery((text:bogus)) MatchAllDocsQuery(*:*) 
FunctionQuery(const(true))

(i didn't use "pf" in that example, but the effect is the same, the "pf" 
based clauses are optional, while the "qf" based clauses are mandatory)

If you compare that example to your debug output, you'll notice a 
difference in structure -- it's a bit hard to see in your example, but if 
you simplify your qf, pf, and q fields it should be more obvious, but 
AFAICT the "main" parts of your query are getting wrapped in an extra 
layer of parens (ie: an extra BooleanQuery) which is *not* mandatory in 
the top level query ... i don't see *any* mandatory clauses in your top 
level BooleanQuery, which is why any match on a bf or bq function is 
enough to cause a document to match.

I suspect the reason your parsed query structure is so diff has to do with 
this...

:synonym_edismax>


1) how exactly is "synonym_edismax" defined in your solrconfig.xml? 
2) what QParserPlugin are you using to implement that?

I suspect whatever QParserPlugin you are using has a bug in it :)


If you can't fix the bug, one possible workaround would be to abandon the bf 
and bq params completely, and instead wrap the query it produces in a 
{!boost} parser with whatever function you want (using functions like
sum() or prod() to combine multiple functions, and query() to incorporate 
your current bq param).  Doing this will require changing how you specify 
your input (example below) and it will result in *multiplicative* boosts -- 
so your scores will be much diff, and you will likely have to adjust your 
constants, but: 1) multiplicative boosts are almost always what people 
*really* want anyway; 2) it will ensure the boosts are only applied for 
things matching your main query, no matter how that query parser works or 
what bugs it has.

Example of using {!boost} to wrap an arbitrary other parser...

instead of...
  defType=foofoo
  q=barbarbar

use...
   q={!boost b=$func defType=foofoo v=$qq}
  qq=barbarbar
func=sum(something,somethingelse)

https://cwiki.apache.org/confluence/display/solr/Other+Parsers
https://cwiki.apache.org/confluence/display/solr/Function+Queries




: 
: What I would like is to return zero results if there is no match for the
: querystring.  My collection is small enough that I don't care if the actual
: calculation runs on each doc (although that's wasteful) -- I just don't
: want to see results come back for zero matches to the querystring
: 
: (The /select endpoint does this of course, but my custom endpoint includes
: this "weighting" piece and therefore returns every doc in the corpus
: because they all have the weighting.
: 
: 
: Enter my imagined solution...  The potential X-Y problem...
: 
: 
: So - given that I come from a programming background, I immediately start
: thinking of an if statement ...
: 
:  if(some_score_for_the_primary_search_string) {
:   run_the_category_weight_calculation;
:  } else {
:   do_NOT_run_category_weight_calc;
:  }
: 
: 
: Another way of thinking of it would be something like the "WHERE" clause in
: SQL...
: 
:  run_category_weight_calculation WHERE "searchstring" is found in the
: document, not otherwise.
: 
: I'm aware that things could be handled in the client-side of my web app,
: but if possible, I'd like the interface to SOL

Re: Want zero results from SOLR when there are no matches for "querystring"

2016-08-11 Thread John Bickerstaff
Thanks!

To answer your questions, while I digest the rest of that information...

I'm using the hon-lucene-synonyms.5.0.4.jar from here:
https://github.com/healthonnet/hon-lucene-synonyms

The config looks like this - and IIRC, is simply a copy of the
recommended config from the site mentioned above.

<queryParser name="synonym_edismax"
             class="com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin">
  <lst name="synonymAnalyzers">
    <lst name="myCoolAnalyzer">
      <lst name="tokenizer">
        <str name="class">solr.PatternTokenizerFactory</str>
        <str name="pattern"></str>
      </lst>
      <lst name="filter">
        <str name="class">solr.ShingleFilterFactory</str>
        <str name="outputUnigramsIfNoShingles">true</str>
        <str name="outputUnigrams">true</str>
        <str name="minShingleSize">2</str>
        <str name="maxShingleSize">4</str>
      </lst>
      <lst name="filter">
        <str name="class">solr.SynonymFilterFactory</str>
        <str name="tokenizerFactory">solr.KeywordTokenizerFactory</str>
        <str name="synonyms">example_synonym_file.txt</str>
        <str name="ignoreCase">true</str>
        <str name="expand">true</str>
      </lst>
    </lst>
  </lst>
</queryParser>

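For reference, registering the parser this way is only half of the wiring;
a request handler (or the request itself) still has to select it via
defType. A minimal sketch along the lines of the hon-lucene-synonyms
README (the handler name is illustrative):

<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">synonym_edismax</str>
    <str name="synonyms">true</str>
  </lst>
</requestHandler>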


RE: using variables in data-config.xml

2016-08-11 Thread Prasanna S. Dhakephalkar
Hi Srinivasa,

Thanks for your reply.
I think I need to investigate more.

Regards,

Prasanna.

-Original Message-
From: Srinivasa Meenavali [mailto:smeenav...@professionalaccess.com] 
Sent: Thursday, August 11, 2016 6:21 PM
To: solr-user@lucene.apache.org
Subject: RE: using variables in data-config.xml

Hi Prasanna,

You can use Request Parameters in Solr 5.5, but not in your version.

"these parameters can be passed to the full-import command or defined in the
<defaults> section in solrconfig.xml. This example shows the parameters
with the full-import command:
dataimport?command=full-import&jdbcurl=jdbc:hsqldb:./example-DIH/hsqldb/ex&jdbcuser=sa&jdbcpassword=secret"
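
For illustration, on a version where this is supported, data-config.xml
picks those request parameters up through the dataimporter.request
namespace. A minimal sketch (the driver, entity name, and query are
placeholders):

<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="${dataimporter.request.jdbcurl}"
              user="${dataimporter.request.jdbcuser}"
              password="${dataimporter.request.jdbcpassword}"/>
  <document>
    <entity name="example" query="select * from some_table"/>
  </document>
</dataConfig>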

Regards
Srinivas Meenavalli



This electronic mail transmission may contain privileged, confidential
and/or proprietary information intended only for the
person(s) named.  Any use, distribution, copying or disclosure to another
person is strictly prohibited.  If you are not the addressee indicated in
this message (or responsible for delivery of the message to such person),
you may not copy or deliver this message to anyone. In such case, you should
destroy this message and kindly notify the sender by reply email.