Merge update request with existing documents

2018-09-12 Thread Vincenzo D'Amore
Hi all,

I have to update a bunch of documents, but the update requests contain
only parts of the documents.

Atomic updates are not a feasible option, because sometimes I have to
remove one or more instances of a dynamic field whose name(s) I cannot
know at update time.

At the very least, I would like to avoid having to read (remotely) the
documents I have to update and merge them with the update requests myself.

Can I use StatelessScriptUpdateProcessorFactory to do this?
Could I extend UpdateProcessor to add my behaviour?
Are there other feasible ways, in your opinion?
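
For context, this is roughly what I have in mind for the second option (a
sketch only; the lookup-and-merge step is the part I am unsure about):

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class MergePartialUpdateProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument partial = cmd.getSolrInputDocument();
        // Here I would fetch the stored document (e.g. via real-time get)
        // and copy into "partial" every field the request did not supply,
        // skipping the dynamic fields that have to be removed.
        super.processAdd(cmd);
      }
    };
  }
}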

Thanks in advance for your help,
Vincenzo


-- 
Vincenzo D'Amore


Re: 6.x to 7.x differences

2018-09-12 Thread Shawn Heisey

On 9/11/2018 8:32 PM, John Blythe wrote:

we recently migrated to cloud. part of that migration jumped us from 6.1 to
7.4.

one example query between our old solr instance and our new cloud instance
produces 42 results and 19k results.

the analyzer is the same aside from WordDelimiterFilterFactory moving over
to the graph variation of it and the lucene parser moving from 6.1 to 7.4
obviously.


Did you completely reindex after changing your schema?  Not doing this, 
especially if attempting to use the index from the earlier version, can 
lead to problems.  Have you checked what happens if you use the 
non-graph version of WDF (and completely reindex), so you can see 
whether that changes anything?  That filter will disappear in 8.0, but 
it's still there for all of 7.x.


Adding "debug=query" to your URL parameters is very useful in locating 
differences.  Maybe 6.1 and 7.4 are parsing the query differently.  
There's a good chance that this will reveal something we can pursue.



i've used the analysis tool in solr admin to try to determine the
difference between the two. i'm seeing the same output between index and
query results yet when actually running the queries have that huge
divergence of results.


One of the big differences between 6.x and 7.x for query parsing is that 
the sow (split on whitespace) parameter defaults to true in 6.x (and I 
think it didn't even exist in 6.1, so it's effectively true).  In 7.x, 
that parameter defaults to false.  So the query parser in 7.x tends to 
behave *exactly* like what you see in the analysis tool, whereas in 6.x 
the input would be split on whitespace before ever reaching analysis, 
which can result in very subtle differences in how the input is 
analyzed.  Adding "sow=true" to your URL parameters is something you can 
try as a quick test.
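
For example (with a hypothetical core name, combining that with the debug 
suggestion above):

http://localhost:8983/solr/mycore/select?q=some+multi+word+query&sow=true&debug=query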


Thanks,
Shawn



Re: Error casting to PointField

2018-09-12 Thread Shawn Heisey

On 9/11/2018 10:15 PM, Zahra Aminolroaya wrote:

Thanks Erick. We used to use TrieLongField for our unique id and in the
document it is said that all Trie* fieldtypes are casting to
*pointfieldtypes. What would be the alternative solution?


I've never heard of Trie casting to Point.

Point is the recommended replacement for *similar* but not *identical* 
functionality to what Trie provides.


Point fields are excellent at range queries (as an example, field:[500 
TO 1000]) ... but they are terrible for single value lookups (such as 
id:75224575).  They do not work for the uniqueKey.  I suspect that the 
aspect that makes them terrible at single-value lookups is the same 
aspect that prevents their use as uniqueKey, but I'm not familiar enough 
with the low level details to be able to say for certain.


Trie is also excellent for range queries, but not as good as Point.  It 
is very fast at single-value lookups, and can be used as uniqueKey.


Using a string type (class solr.StrField) is one workaround for the 
uniqueKey problem.  It allows fast single-value lookups, but you lose 
the ability to do numeric range queries.  The string type sorts 
alphanumerically, not numerically.
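
A minimal schema sketch of that workaround (the field name is illustrative):

<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>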


You can keep using Trie for all 7.x versions, but that type will be gone 
in 8.0.  Hopefully by that release the limitations in Point types will 
be removed.  You'll probably have to reindex if that happens, especially 
if the problem is fixed by adding a new set of field types instead of 
altering the existing Point types.


Thanks,
Shawn



Retrieving binary fields

2018-09-12 Thread Dominik Safaric
Hi,

I've implemented a custom FieldType, BinaryDocValuesField, that stores
binary values as doc-values.

I am interested in how I can retrieve the stored values via a Solr query
as a base64 encoded string. If I issue a "*" query with all fields
selected, I get all fields except the binary one - the same behaviour
applies to the native Solr BinaryField.

Thanks in advance,
Dominik


switch query parser and solr cloud

2018-09-12 Thread Dwane Hall
Good afternoon Solr brains trust, I'm seeking some community advice, if somebody 
can spare a minute from their busy schedules.

I'm attempting to use the switch query parser to influence client search 
behaviour based on a client specified request parameter.

Essentially I want the following to occur:

-A user has the option to pass through an optional request parameter 
"allResults" to solr
-If "allResults" is true then return all matching query records by appending a 
filter query for all records (fq=*:*)
-If "allResults" is empty then apply a filter using the collapse query parser 
({!collapse field=SUMMARY_FIELD})

Environment
Solr 7.3.1 (1 solr node DEV, 4 solr nodes PTST)
4 shard collection

My Implementation
I'm using the switch query parser to choose client behaviour by appending a 
filter query to the user request very similar to what is documented in the solr 
reference guide here 
(https://lucene.apache.org/solr/guide/7_4/other-parsers.html#switch-query-parser)

The request uses the params api (pertinent line below is the _appends_ filter 
queries)
(useParams=firstParams,secondParams)

  "set":{
"firstParams":{
"op":"AND",
"wt":"json",
"start":0,
"allResults":"false",
"fl":"FIELD_1,FIELD_2,SUMMARY_FIELD",
  "_appends_":{
"fq":"{!switch default=\"{!collapse field=SUMMARY_FIELD}\" 
case.true=*:* v=${allResults}}",
  },
  "_invariants_":{
"deftype":"edismax",
"timeAllowed":2,
"rows":"30",
"echoParams":"none",
}
  }
   }

   "set":{
"secondParams":{
"df":"FIELD_1",
"q":"{!edismax v=${searchString} df=FIELD_1 q.op=${op}}",
  "_invariants_":{
"qf":"FIELD_1,FIELD_2,SUMMARY_FIELD",
}
  }
   }}

Everything works nicely until I move from a single-node solr instance (DEV) to 
a clustered solr instance (PTST), at which point I receive a null pointer exception 
from Solr that I'm having trouble picking apart.  I've co-located the solr 
documents using document routing, which appears to be the only requirement for 
the collapse query parser's use.

Does anyone know if the switch query parser has any limitations in a sharded 
solr cloud environment or can provide any possible troubleshooting advice?

Any community recommendations would be greatly appreciated

Solr stack trace
2018-09-12 12:16:12,918 4064160860 ERROR : [c:my_collection s:shard1 r:core_node3 x:my_collection_ptst_shard1_replica_n1] org.apache.solr.common.SolrException : org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at https://myserver:1234/solr/my_collection_ptst_shard2_replica_n2: java.lang.NullPointerException
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:643)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
    at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
    at org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:172)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:188)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Thanks for taking the time to assist,

Dwane


Re: Retrieving binary fields

2018-09-12 Thread Mikhail Khludnev
Hello, Dominik.
IIRC, fl=field(foo_field_1)
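
For example (keeping your field name; assuming it has docValues):

http://localhost:8983/solr/yourcore/select?q=*:*&fl=id,field(foo_field_1)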

On Wed, Sep 12, 2018 at 1:49 PM Dominik Safaric 
wrote:

> Hi,
>
> I've implemented a custom FieldType, BinaryDocValuesField, that stores
> binary values as doc-values.
>
> I am interested onto how can I retrieve the stored values via a Solr query
> as base64 encoded string? Because if I issue a "*" query with all fields
> selected, I get all fields expect the binary - the same behaviour implies
> to the native Solr BinaryField.
>
> Thanks in advance,
> Dominik
>


-- 
Sincerely yours
Mikhail Khludnev


Re: switch query parser and solr cloud

2018-09-12 Thread Shawn Heisey

On 9/12/2018 5:47 AM, Dwane Hall wrote:

Good afternoon Solr brains trust I'm seeking some community advice if somebody 
can spare a minute from their busy schedules.

I'm attempting to use the switch query parser to influence client search 
behaviour based on a client specified request parameter.

Essentially I want the following to occur:

-A user has the option to pass through an optional request parameter 
"allResults" to solr
-If "allResults" is true then return all matching query records by appending a 
filter query for all records (fq=*:*)
-If "allResults" is empty then apply a filter using the collapse query parser 
({!collapse field=SUMMARY_FIELD})


I'm looking at the documentation for the switch parser and I'm having 
difficulty figuring out what it actually does.


This is the kind of thing that is better to handle in your client 
instead of asking Solr to do it for you.  You'd have to have your code 
construct the complex localparam for the switch parser ... it would be 
much easier to write code to insert your special collapse filter when it 
is required.
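
A rough SolrJ sketch of that client-side approach (parameter and field names 
taken from your message; treat it as a template for your own client code):

import org.apache.solr.client.solrj.SolrQuery;

public class QueryBuilder {
  // Add the collapse filter only when the caller did not ask for all results.
  static SolrQuery build(String userInput, String allResults) {
    SolrQuery query = new SolrQuery(userInput);
    if (!"true".equals(allResults)) {
      query.addFilterQuery("{!collapse field=SUMMARY_FIELD}");
    }
    return query;
  }
}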



Everything works nicely until I move from a single node solr instance (DEV) to 
a clustered solr instance (PTST) in which I receive a null pointer exception 
from Solr which I'm having trouble picking apart.  I've co-located the solr 
documents using document routing which appear to be the only requirement for 
the collapse query parser's use.


Some features break down when working with sharded indexes.  This is one 
of the reasons that sharding should only be done when it is absolutely 
required.  A single-shard index tends to perform better anyway, unless 
it's really really huge.


The error is a remote exception, from 
https://myserver:1234/solr/my_collection_ptst_shard2_replica_n2, which 
suggests that maybe not all your documents are co-located on the same 
shard the way you think they are.  Is this a remote server/shard?  I am 
completely guessing here.  It's always possible that you've encountered 
a bug.  Does this one (not fixed) look like it might apply?


https://issues.apache.org/jira/browse/SOLR-9104

There should be a server-side error logged by the Solr instance running 
on myserver:1234 as well.  Have you looked at that?


I do not know what PTST means.  Is that important for me to understand?

Thanks,
Shawn



Idle Timeout while DIH indexing and implicit sharding in 7.4

2018-09-12 Thread Вадим Иванов
Hello gurus, 
I am using SolrCloud with DIH for indexing my data.
Testing 7.4.0 with an implicitly sharded collection, I have noticed that any
indexing run
longer than 2 minutes always fails, with many timeout records in the log coming
from all replicas in the collection.

Such as:
x:Mycol_s_0_replica_t40 RequestHandlerBase
java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout
expired: 120001/120000 ms
null:java.io.IOException: java.util.concurrent.TimeoutException: Idle
timeout expired: 120000/120000 ms
    at org.eclipse.jetty.server.HttpInput$ErrorState.noContent(HttpInput.java:1075)
    at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:313)
    at org.apache.solr.servlet.ServletInputStreamWrapper.read(ServletInputStreamWrapper.java:74)
...
Caused by: java.util.concurrent.TimeoutException: Idle timeout expired:
120000/120000 ms
    at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:166)
    at org.eclipse.jetty.io.IdleTimeout$1.run(IdleTimeout.java:50)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    ... 1 more
    Suppressed: java.lang.Throwable: HttpInput failure
        at org.eclipse.jetty.server.HttpInput.failed(HttpInput.java:821)
        at org.eclipse.jetty.server.HttpConnection$BlockingReadCallback.failed(HttpConnection.java:649)
        at org.eclipse.jetty.io.FillInterest.onFail(FillInterest.java:134)

Resulting indexing status:
  "statusMessages":{
"Total Requests made to DataSource":"1",
"Total Rows Fetched":"2828323",
"Total Documents Processed":"2828323",
"Total Documents Skipped":"0",
"Full Dump Started":"2018-09-12 14:28:21",
"":"Indexing completed. Added/Updated: 2828323 documents. Deleted 0
documents.",
"Committed":"2018-09-12 14:33:41",
"Time taken":"0:5:19.507",
"Full Import failed":"2018-09-12 14:33:41"}}

Nevertheless, all these documents seem to be indexed fine and are searchable.
If the same collection is not sharded, or is sharded as "compositeId", indexing
completes without any errors.
The type of replicas - nrt or tlog - doesn't matter.
Small indexing runs (taking less than 2 minutes) complete smoothly.

Testing environment - 1 node, Collection with 6 shards, 1 replica for each
shard
Collection:
/admin/collections?action=CREATE&name=Mycol
&numShards=6
&router.name=implicit
&shards=s_0,s_1,s_2,s_3,s_4,s_5
&router.field=sf_shard
&collection.configName=Mycol 
&maxShardsPerNode=10
&nrtReplicas=0&tlogReplicas=1


I have never noticed such behavior before on my prod configuration (solr
6.3.0).
It seems like a bug in the new version, but I could not find any JIRA issue
about it.
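
One workaround I may try (an assumption on my part: the 120000 ms figure
matches Jetty's default idle timeout, which Solr 7.x exposes as the
solr.jetty.http.idleTimeout property) is raising that timeout at startup:

bin/solr start -c -a "-Dsolr.jetty.http.idleTimeout=600000"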

Any ideas, please...

--
BR
Vadim Ivanov



how to access solr in solrcloud

2018-09-12 Thread Gu, Steve (CDC/OD/OADS) (CTR)
Hi, all

I am upgrading our solr to 7.4 and would like to set up solrcloud for failover 
and load balancing.   There are three zookeeper servers (zk1:2181, zk1:2182) and 
two solr instances, solr1:8983 and solr2:8983.  So what solr url should the 
client use for access?  Will it be solr1:8983, the leader?

If we use solr1:8983 to access solr, what happens if solr1:8983 is down?  Will 
the request be routed to solr2:8983 via zookeeper?  I understand that 
zookeeper is doing all the coordination work but wanted to understand how this 
works.

Any insight would be greatly appreciated.
Steve



Re: Docker and Solr Indexing

2018-09-12 Thread Dominique Bejean
Hi,

Are you aware of the issues with Java applications in Docker when the Java
version is not 10?
https://blog.docker.com/2018/04/improved-docker-container-integration-with-java-10/

Regards.

Dominique


On Wed, Sep 12, 2018 at 05:42, Shawn Heisey  wrote:

> On 9/11/2018 9:20 PM, solrnoobie wrote:
> > So what we did is we upgraded the instances to 16 gigs and we rarely
> > encounter this now.
> >
> > So what we did was to increase the batch size to 500 instead of 50 and it
> > worked for our test data. But when we tried 1000 batch size, the invalid
> > content type error returned. Can you guys shed some light on why this is
> > happening? I don't think that a thousand per batch is too much (although
> we
> > have documents with many fields and child documents) so I am not really
> sure
> > what's causing this aside from a docker containter restart.
>
> At no point in this thread have you shared the actual error messages.
> Without those and the exact version of Solr, it's difficult to help
> you.  Saying that you got a "content type error" doesn't mean anything.
> We need to see the actual error, complete with all stacktrace data.  The
> best information will be found in the logfile -- solr.log.
>
> Solr (as packaged by this project) is not designed to restart itself
> automatically.  If the JVM encounters an OutOfMemoryError exception and
> the platform is NOT Windows, then Solr is designed to kill itself ...
> but it will NOT automatically restart without outside intervention or a
> change to its startup scripts.  This is done because program operation
> is completely unpredictable when OOME hits, so the best course of action
> is to self-terminate and let the admin fix the problem that cause the OOME.
>
> The publicly available Solr docker container is NOT an official product
> of this project.  It is third-party, so problems specific to the docker
> container may need to be handled by the project that created it.  If the
> docker container is set up to automatically restart Solr when it dies, I
> would consider that to be a bug. About the only reason that Solr will
> ever die is the OOME self-termination that I already described ... and
> since the OOME is likely to occur again after restart, it's usually
> better for the software to stay offline until the admin fixes the problem.
>
> Thanks,
> Shawn
>
>


Re: 6.x to 7.x differences

2018-09-12 Thread John Blythe
hey guys.

preeti: good thought, but this was something we were already aware of and
had accounted for. thanks tho!

shawn: at first, no. we rsynced data up after running it through the
migration tool. we'd gotten errors when using WDF so updated all instances
of it to WDGF (and subsequently added FlattenGraphFilterFactory to each
index analyzer that used WDGF to avoid errors).

the sow seems to be the key here. adding that to the query url dropped me
from +19k to 62 results lol. 'subtle' is a not so subtle understatement in
this case! i'm a big fan of finally being able to not be driven batty by
the analysis vs. query results though, so looking forward to playing w that
some more. for our immediate purposes, however, i think this solves it!

--
John Blythe


On Wed, Sep 12, 2018 at 1:35 AM Preeti Bhat 
wrote:

> Hi John,
>
> Please check the solrQueryParser option, it was removed in 7.4 version, so
> you will need to provide AND in solrconfig.xml or
> give the q.op option while querying to solve this problem. By default solr
> makes it an "OR" operation leading to too many results.
>
> Old Way: In Managed-schema or schema.xml
> <solrQueryParser defaultOperator="AND"/>
>
> New Way: in solrconfig.xml
>
> <initParams path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse">
>   <lst name="defaults">
>     <str name="q.op">AND</str>
>   </lst>
> </initParams>
>
>
> Thanks and Regards,
> Preeti Bhat
>
> -Original Message-
> From: John Blythe [mailto:johnbly...@gmail.com]
> Sent: Wednesday, September 12, 2018 8:02 AM
> To: solr-user@lucene.apache.org
> Subject: 6.x to 7.x differences
>
> hi, all.
>
> we recently migrated to cloud. part of that migration jumped us from 6.1
> to 7.4.
>
> one example query between our old solr instance and our new cloud instance
> produces 42 results and 19k results.
>
> the analyzer is the same aside from WordDelimiterFilterFactory moving over
> to the graph variation of it and the lucene parser moving from 6.1 to 7.4
> obviously.
>
> i've used the analysis tool in solr admin to try to determine the
> difference between the two. i'm seeing the same output between index and
> query results yet when actually running the queries have that huge
> divergence of results.
>
> i'm left scratching my head at this point. i'm guessing it's from the
> lucene parser? hoping to get some clarity from you guys!
>
> thanks!
>
> --
> John Blythe
>
> NOTICE TO RECIPIENTS: This communication may contain confidential and/or
> privileged information. If you are not the intended recipient (or have
> received this communication in error) please notify the sender and
> it-supp...@shoregrp.com immediately, and destroy this communication. Any
> unauthorized copying, disclosure or distribution of the material in this
> communication is strictly forbidden. Any views or opinions presented in
> this email are solely those of the author and do not necessarily represent
> those of the company. Finally, the recipient should check this email and
> any attachments for the presence of viruses. The company accepts no
> liability for any damage caused by any virus transmitted by this email.
>
>
>


Re: switch query parser and solr cloud

2018-09-12 Thread Erick Erickson
You will run into significant problems if, when returning "all
results", you return large result sets. For regular queries I like to
limit the return to 100, although 1,000 is sometimes OK.

Millions will blow you out of the water, use CursorMark or Streaming
for very large result sets. CursorMark gets you a page at a time, but
efficiently and Streaming doesn't consume huge amounts of memory.

And assuming you could possibly return 1M rows, say, what would the
user do with it? Displaying in a browser is problematic, for instance.
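
A minimal cursor sketch (the sort must include the uniqueKey field; "id" is
assumed here):

/select?q=*:*&rows=100&sort=id asc&cursorMark=*

Then pass the returned nextCursorMark value as cursorMark on each following
request, until it stops changing.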

Best,
Erick
On Wed, Sep 12, 2018 at 5:54 AM Shawn Heisey  wrote:
>
> On 9/12/2018 5:47 AM, Dwane Hall wrote:
> > Good afternoon Solr brains trust I'm seeking some community advice if 
> > somebody can spare a minute from their busy schedules.
> >
> > I'm attempting to use the switch query parser to influence client search 
> > behaviour based on a client specified request parameter.
> >
> > Essentially I want the following to occur:
> >
> > -A user has the option to pass through an optional request parameter 
> > "allResults" to solr
> > -If "allResults" is true then return all matching query records by 
> > appending a filter query for all records (fq=*:*)
> > -If "allResults" is empty then apply a filter using the collapse query 
> > parser ({!collapse field=SUMMARY_FIELD})
>
> I'm looking at the documentation for the switch parser and I'm having
> difficulty figuring out what it actually does.
>
> This is the kind of thing that is better to handle in your client
> instead of asking Solr to do it for you.  You'd have to have your code
> construct the complex localparam for the switch parser ... it would be
> much easier to write code to insert your special collapse filter when it
> is required.
>
> > Everything works nicely until I move from a single node solr instance (DEV) 
> > to a clustered solr instance (PTST) in which I receive a null pointer 
> > exception from Solr which I'm having trouble picking apart.  I've 
> > co-located the solr documents using document routing which appear to be the 
> > only requirement for the collapse query parser's use.
>
> Some features break down when working with sharded indexes.  This is one
> of the reasons that sharding should only be done when it is absolutely
> required.  A single-shard index tends to perform better anyway, unless
> it's really really huge.
>
> The error is a remote exception, from
> https://myserver:1234/solr/my_collection_ptst_shard2_replica_n2. Which
> suggests that maybe not all your documents are co-located on the same
> shard the way you think they are.  Is this a remote server/shard?  I am
> completely guessing here.  It's always possible that you've encountered
> a bug.  Does this one (not fixed) look like it might apply?
>
> https://issues.apache.org/jira/browse/SOLR-9104
>
> There should be a server-side error logged by the Solr instance running
> on myserver:1234 as well.  Have you looked at that?
>
> I do not know what PTST means.  Is that important for me to understand?
>
> Thanks,
> Shawn
>


RE: how to access solr in solrcloud

2018-09-12 Thread Vadim Ivanov
Hi,  Steve
If you are using solr1:8983 to access solr and solr1 is down, IMHO nothing
helps you access a dead IP.
You should switch to any other live node in the cluster, or I'd propose to
have nginx as a frontend to access
Solrcloud. 
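
Something along these lines (a minimal sketch, using the host names from your
mail):

upstream solr_backends {
    server solr1:8983;
    server solr2:8983;
}
server {
    listen 8080;
    location /solr/ {
        proxy_pass http://solr_backends;
    }
}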

-- 
BR, Vadim



-Original Message-
From: Gu, Steve (CDC/OD/OADS) (CTR) [mailto:c...@cdc.gov] 
Sent: Wednesday, September 12, 2018 4:38 PM
To: 'solr-user@lucene.apache.org'
Subject: how to access solr in solrcloud

Hi, all

I am upgrading our solr to 7.4 and would like to set up solrcloud for
failover and load balance.   There are three zookeeper servers (zk1:2181,
zk1:2182) and two solr instance solr1:8983, solr2:8983.  So what will be the
solr url should the client to use for access?  Will it be solr1:8983, the
leader?

If we  use solr1:8983 to access solr, what happens if solr1:8983 is down?
Will the request be routed to solr2:8983 via the zookeeper?  I understand
that zookeeper is doing all the coordination works but wanted to understand
how this works.

Any insight would be greatly appreciated.
Steve




Re: how to access solr in solrcloud

2018-09-12 Thread David Santamauro
... or haproxy.

On 9/12/18, 10:23 AM, "Vadim Ivanov"  wrote:

Hi,  Steve
If you are using  solr1:8983 to access solr and solr1 is down IMHO nothing
helps you to access dead ip.
You should switch to any other live node in the cluster or I'd propose to
have nginx as frontend to access
Solrcloud. 

-- 
BR, Vadim



-Original Message-
From: Gu, Steve (CDC/OD/OADS) (CTR) [mailto:c...@cdc.gov] 
Sent: Wednesday, September 12, 2018 4:38 PM
To: 'solr-user@lucene.apache.org'
Subject: how to access solr in solrcloud

Hi, all

I am upgrading our solr to 7.4 and would like to set up solrcloud for
failover and load balance.   There are three zookeeper servers (zk1:2181,
zk1:2182) and two solr instance solr1:8983, solr2:8983.  So what will be the
solr url should the client to use for access?  Will it be solr1:8983, the
leader?

If we  use solr1:8983 to access solr, what happens if solr1:8983 is down?
Will the request be routed to solr2:8983 via the zookeeper?  I understand
that zookeeper is doing all the coordination works but wanted to understand
how this works.

Any insight would be greatly appreciated.
Steve





Re: how to access solr in solrcloud

2018-09-12 Thread Walter Underwood
Use a load balancer. It doesn’t have to be fancy, we use the Amazon ALB because 
our clusters are in AWS.

Zookeeper never handles queries. It coordinates cluster changes with the Solr 
instances.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Sep 12, 2018, at 6:38 AM, Gu, Steve (CDC/OD/OADS) (CTR)  
> wrote:
> 
> Hi, all
> 
> I am upgrading our solr to 7.4 and would like to set up solrcloud for 
> failover and load balance.   There are three zookeeper servers (zk1:2181, 
> zk1:2182) and two solr instance solr1:8983, solr2:8983.  So what will be the 
> solr url should the client to use for access?  Will it be solr1:8983, the 
> leader?
> 
> If we  use solr1:8983 to access solr, what happens if solr1:8983 is down?  
> Will the request be routed to solr2:8983 via the zookeeper?  I understand 
> that zookeeper is doing all the coordination works but wanted to understand 
> how this works.
> 
> Any insight would be greatly appreciated.
> Steve
> 



RE: how to access solr in solrcloud

2018-09-12 Thread Gu, Steve (CDC/OD/OADS) (CTR)
Thanks, David

-Original Message-
From: David Santamauro  
Sent: Wednesday, September 12, 2018 10:28 AM
To: solr-user@lucene.apache.org
Cc: David Santamauro 
Subject: Re: how to access solr in solrcloud

... or haproxy.

On 9/12/18, 10:23 AM, "Vadim Ivanov"  wrote:

Hi,  Steve
If you are using  solr1:8983 to access solr and solr1 is down IMHO nothing
helps you to access dead ip.
You should switch to any other live node in the cluster or I'd propose to
have nginx as frontend to access
Solrcloud. 

-- 
BR, Vadim



-Original Message-
From: Gu, Steve (CDC/OD/OADS) (CTR) [mailto:c...@cdc.gov] 
Sent: Wednesday, September 12, 2018 4:38 PM
To: 'solr-user@lucene.apache.org'
Subject: how to access solr in solrcloud

Hi, all

I am upgrading our solr to 7.4 and would like to set up solrcloud for
failover and load balance.   There are three zookeeper servers (zk1:2181,
zk1:2182) and two solr instance solr1:8983, solr2:8983.  So what will be the
solr url should the client to use for access?  Will it be solr1:8983, the
leader?

If we  use solr1:8983 to access solr, what happens if solr1:8983 is down?
Will the request be routed to solr2:8983 via the zookeeper?  I understand
that zookeeper is doing all the coordination works but wanted to understand
how this works.

Any insight would be greatly appreciated.
Steve





RE: how to access solr in solrcloud

2018-09-12 Thread Gu, Steve (CDC/OD/OADS) (CTR)
Thanks, Walter

-Original Message-
From: Walter Underwood  
Sent: Wednesday, September 12, 2018 10:41 AM
To: solr-user@lucene.apache.org
Subject: Re: how to access solr in solrcloud

Use a load balancer. It doesn’t have to be fancy, we use the Amazon ALB because 
our clusters are in AWS.

Zookeeper never handles queries. It coordinates cluster changes with the Solr 
instances.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Sep 12, 2018, at 6:38 AM, Gu, Steve (CDC/OD/OADS) (CTR)  
> wrote:
> 
> Hi, all
> 
> I am upgrading our solr to 7.4 and would like to set up solrcloud for 
> failover and load balance.   There are three zookeeper servers (zk1:2181, 
> zk1:2182) and two solr instance solr1:8983, solr2:8983.  So what will be the 
> solr url should the client to use for access?  Will it be solr1:8983, the 
> leader?
> 
> If we  use solr1:8983 to access solr, what happens if solr1:8983 is down?  
> Will the request be routed to solr2:8983 via the zookeeper?  I understand 
> that zookeeper is doing all the coordination works but wanted to understand 
> how this works.
> 
> Any insight would be greatly appreciated.
> Steve
> 



RE: how to access solr in solrcloud

2018-09-12 Thread Gu, Steve (CDC/OD/OADS) (CTR)
Vadim,

That makes perfect sense.

Thanks
Steve

-Original Message-
From: Vadim Ivanov  
Sent: Wednesday, September 12, 2018 10:23 AM
To: solr-user@lucene.apache.org
Subject: RE: how to access solr in solrcloud

Hi,  Steve
If you are using  solr1:8983 to access solr and solr1 is down IMHO nothing 
helps you to access dead ip.
You should switch to any other live node in the cluster or I'd propose to have 
nginx as frontend to access Solrcloud. 

--
BR, Vadim



-Original Message-
From: Gu, Steve (CDC/OD/OADS) (CTR) [mailto:c...@cdc.gov] 
Sent: Wednesday, September 12, 2018 4:38 PM
To: 'solr-user@lucene.apache.org'
Subject: how to access solr in solrcloud

Hi, all

I am upgrading our solr to 7.4 and would like to set up solrcloud for
failover and load balance.   There are three zookeeper servers (zk1:2181,
zk1:2182) and two solr instance solr1:8983, solr2:8983.  So what will be the
solr url should the client to use for access?  Will it be solr1:8983, the
leader?

If we  use solr1:8983 to access solr, what happens if solr1:8983 is down?
Will the request be routed to solr2:8983 via the zookeeper?  I understand
that zookeeper is doing all the coordination works but wanted to understand
how this works.

Any insight would be greatly appreciated.
Steve





solr, multiple ports

2018-09-12 Thread David Hastings
is there a way to start the default solr installation on more than one
port?  The only thing I could find was adding another connector to Jetty, via
https://stackoverflow.com/questions/6905098/how-to-configure-jetty-to-listen-to-multiple-ports

however, the default solr start command takes the -p parameter; can this
start listening on multiple ports?

-Thanks, David


large query producing graph error ... maybe?

2018-09-12 Thread John Blythe
hey all!

i'm having an issue w large queries. one of our use cases is for users to
drop in an untold amount of product skus. we previously had our
maxBooleanClause limit set to 20k (eek!). but it worked phenomenally well
and i think our record amount from a user was ~19k items.

we're now on 7.4 Cloud. i'm getting this error when testing with a measly
600 skus:

org.apache.lucene.util.graph.GraphTokenStreamFiniteStrings.articulationPointsRecurse(GraphTokenStreamFiniteStrings.java:278)\n\tat


there's a lot more to the error message but that is the tail end of it all
and is repeated a lot of times (maybe 600? idk).

no mention of maxBooleanClauses issues specifically in the output; it shows as
a stack overflow error.

is this something we can solve in our solr/cloud/zk configuration or is it
somewhere else to be solved?

thanks!

--
John Blythe


Faceting with EnumFieldType in 7.1

2018-09-12 Thread Peter Tyrrell
I updated an older Solr 4.10 core to Solr 7.1 recently. In so doing, I took an 
old 'gradeLevel_enum' field of type EnumField and made it an EnumFieldType, 
since the former has been deprecated. The old core was able to facet on 
gradeLevel_enum, but the new 7.1 core just returns no facet values whatsoever 
for that field. Both cores return gradeLevel_enum values ok when 
fl=gradeLevel_enum.

In the schema, gradeLevel_enum is defined dynamically:




This simple query fails to return any facet values in 7.1, but does facet in 
4.10:

http://localhost:8983/solr/core1/select?facet.field=gradeLevel_enum&facet=on&fl=id,gradeLevel_enum&q=*:*&wt=json

Thanks for any insight.

Peter Tyrrell, MLIS
Lead Developer at Andornot
1-866-266-2525 x706 / ptyrr...@andornot.com



Re: Unable to enable SSL with self-sign certs

2018-09-12 Thread Chris Hostetter


: WARN: (main) AbstractLifeCycle FAILED org.eclipse.jetty.server.Server@...
: java.io.FileNotFoundException: /opt/solr-5.4.1/server (Is a directory)
: java.io.FileNotFoundException: /opt/solr-5.4.1/server (Is a directory)
: at java.io.FileInputStream.open0(Native Method)
: at java.io.FileInputStream.open(FileInputStream.java:195) 
: 
: The above jks is in the etc folder (/opt/solr-5.4.1/server/etc) and the
: permissions are 644. The settings in the /etc/default/solr.in.sh file are as
: follows:

What are the owner/group/perms of all the following...

/opt/solr-5.4.1/server/etc/solr-ssl.keystore.jks
/opt/solr-5.4.1/server/etc
/opt/solr-5.4.1/server
/opt/solr-5.4.1
/opt

...because my best guess for the cause would be a read perms issue on 
"solr-5.4.1" preventing it from "finding" the server directory inside of it.
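
For example, something like this would show them all at once:

ls -ld /opt /opt/solr-5.4.1 /opt/solr-5.4.1/server /opt/solr-5.4.1/server/etc /opt/solr-5.4.1/server/etc/solr-ssl.keystore.jks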



-Hoss
http://www.lucidworks.com/


Re: solr, multiple ports

2018-09-12 Thread Christopher Schultz

David,

On 9/12/18 11:03 AM, David Hastings wrote:
> is there a way to start the default solr installation on more than
> one port?  Only thing I could find was adding another connector to
> Jetty, via 
> https://stackoverflow.com/questions/6905098/how-to-configure-jetty-to-
listen-to-multiple-ports
>
>  however the default solr start command takes the -p parameter, can
> this start listening on multiple ports?

What's your use-case?

-chris


Re: solr, multiple ports

2018-09-12 Thread David Hastings
Use case: we are upgrading our servers, and have been running solr 5 and 7 
side by side on the same machines to make sure we got 7 to reflect the results 
of our current install. However, finally making the switch would require 
changing many, many scripts and servers that have already been modified to use 
both servers.

On Sep 12, 2018, at 12:15 PM, Christopher Schultz 
<ch...@christopherschultz.net> wrote:


David,

On 9/12/18 11:03 AM, David Hastings wrote:
is there a way to start the default solr installation on more than
one port?  Only thing I could find was adding another connector to
Jetty, via
https://stackoverflow.com/questions/6905098/how-to-configure-jetty-to-
listen-to-multiple-ports

however the default solr start command takes the -p parameter, can
this start listening on multiple ports?

What's your use-case?

-chris


Re: solr, multiple ports

2018-09-12 Thread Christopher Schultz

David,

On 9/12/18 12:21 PM, David Hastings wrote:
>> On Sep 12, 2018, at 12:15 PM, Christopher Schultz
>> <ch...@christopherschultz.net>
>> wrote:
>> 
>> David,
>> 
>> On 9/12/18 11:03 AM, David Hastings wrote:
>>> is there a way to start the default solr installation on more
>>> than one port? Only thing I could find was adding another
>>> connector to Jetty, via
>>> 
https://stackoverflow.com/questions/6905098/how-to-configure-jetty-to-li
sten-to-multiple-ports
>>> however the default solr start command takes the -p parameter, 
>>> can this start listening on multiple ports?>>
>> What's your use-case?
> 
> Use case is we are upgrading our servers, and have been running
> solr 5 and 7 side by side on the same machines to make sure we got
> 7 to reflect the results of our current install. However to finally
> make the switch, it would require changing many many scripts and
> servers that have already been modified to use both servers
Can you configure your servers to redirect port X -> port Y?

This is trivial using iptables, but you didn't mention your
environment. What OS, etc. are you using?
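
For example, a minimal sketch on Linux (assuming old clients call port 8983
and the new Solr listens on 8984; locally-generated traffic would need a
similar rule in the OUTPUT chain):

iptables -t nat -A PREROUTING -p tcp --dport 8983 -j REDIRECT --to-port 8984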

-chris


Re: large query producing graph error ... maybe?

2018-09-12 Thread Erick Erickson
Looks like your SKU field is points-based? Strings would probably be
better; if you switched to points-based, that's new code.

And maxBooleanClauses is so old-school ;) You're better off with
TermsQueryParser, especially if you pre-sort the tokens. see:
https://lucene.apache.org/solr/guide/6_6/other-parsers.html

Although IIRC this is automagic in recent Solr's. I'd also put it in
an fq clause with cache=false...
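
Something like this (the sku field name is assumed):

fq={!terms f=sku cache=false}12345,12346,12347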

Best,
Erick
On Wed, Sep 12, 2018 at 8:27 AM John Blythe  wrote:
>
> hey all!
>
> i'm having an issue w large queries. one of our use cases is for users to
> drop in an untold amount of product skus. we previously had our
> maxBooleanClause limit set to 20k (eek!). but it worked phenomenally well
> and i think our record amount from a user was ~19k items.
>
> we're now on 7.4 Cloud. i'm getting this error when testing with a measly
> 600 skus:
>
> org.apache.lucene.util.graph.GraphTokenStreamFiniteStrings.articulationPointsRecurse(GraphTokenStreamFiniteStrings.java:278)\n\tat
>
>
> there's a lot more to the error message but that is the tail end of it all
> and is repeated a lot of times (maybe 600? idk).
>
> no mention of maxBooleanClause issues specifically in the output, shows as
> a stack overflow error.
>
> is this something we can solve in our solr/cloud/zk configuration or is it
> somewhere else to be solved?
>
> thanks!
>
> --
> John Blythe


Re: large query producing graph error ... maybe?

2018-09-12 Thread John Blythe
well, it's our general text field that things get dumped into. this special
use case that is sku specific just ends up being done on the general input.

ended up raising the Xss value and i'm able to get results :)

i imagine this is a n00b or stupid question but imma go for it: what would
be the value add of the fq + cache=false variation?

thanks for the help!




--
John Blythe


On Wed, Sep 12, 2018 at 1:02 PM Erick Erickson 
wrote:

> Looks like your SKU field is points-based? Strings would probably be
> better, if you switched to points-based it's new code.
>
> And maxBooleanClauses is so old-school ;) You're better off with
> TermsQueryParser, especially if you pre-sort the tokens. see:
> https://lucene.apache.org/solr/guide/6_6/other-parsers.html
>
> Although IIRC this is automagic in recent Solr's. I'd also put it in
> an fq clause with cache=false...
>
> Best,
> Erick
> On Wed, Sep 12, 2018 at 8:27 AM John Blythe  wrote:
> >
> > hey all!
> >
> > i'm having an issue w large queries. one of our use cases is for users to
> > drop in an untold amount of product skus. we previously had our
> > maxBooleanClause limit set to 20k (eek!). but it worked phenomenally well
> > and i think our record amount from a user was ~19k items.
> >
> > we're now on 7.4 Cloud. i'm getting this error when testing with a measly
> > 600 skus:
> >
> >
> org.apache.lucene.util.graph.GraphTokenStreamFiniteStrings.articulationPointsRecurse(GraphTokenStreamFiniteStrings.java:278)\n\tat
> >
> >
> > there's a lot more to the error message but that is the tail end of it
> all
> > and is repeated a lot of times (maybe 600? idk).
> >
> > no mention of maxBooleanClause issues specifically in the output, shows
> as
> > a stack overflow error.
> >
> > is this something we can solve in our solr/cloud/zk configuration or is
> it
> > somewhere else to be solved?
> >
> > thanks!
> >
> > --
> > John Blythe
>


Re: Docker and Solr Indexing

2018-09-12 Thread Shawn Heisey

On 9/12/2018 7:43 AM, Dominique Bejean wrote:

Are you aware about issues in Java applications in Docker if java version
is not 10 ?
https://blog.docker.com/2018/04/improved-docker-container-integration-with-java-10/


Solr explicitly sets heap size when it starts, so Java is *NOT* 
determining the heap size automatically.


As for CPUs, if the container isn't sized appropriately, then I guess 
you might have an issue there.


The latest version of Solr should start and run just fine in Java 10.  
Some earlier versions of 7.x have problems starting in Java 10, but 
should *run* fine after the script is fixed to detect the version 
correctly.  Solr 6.x is not qualified for Java 9, and therefore not 
qualified for Java 10.


Thanks,
Shawn



Re: 6.x to 7.x differences

2018-09-12 Thread Shawn Heisey

On 9/12/2018 8:12 AM, John Blythe wrote:

shawn: at first, no. we rsynced data up after running it through the
migration tool. we'd gotten errors when using WDF so updated all instances
of it to WDGF (and subsequently added FlattenGraphFilterFactory to each
index analyzer that used WDGF to avoid errors).


The messages you get in the log from WDF are not errors. They are 
warnings.  Just letting you know that the filter will be removed in the 
next major version.



the sow seems to be the key here. adding that to the query url dropped me
from +19k to 62 results lol. 'subtle' is a not so subtle understatement in
this case! i'm a big fan of finally being able to not be driven batty by
the analysis vs. query results though, so looking forward to playing w that
some more. for our immediate purposes, however, i think this solves it!


Setting sow=false is a key part of the "graph" nature of the new filters 
that aren't deprecated.  Mostly this is to support multi-word synonyms 
properly.


https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/

Thanks,
Shawn



enquoted searches

2018-09-12 Thread John Blythe
hi (again!). hopefully this will be the last question for a while—i've
really gotten my money's worth the last day or two :)

searches like "foo bar" aren't working the same way they used to for us
since our 7.4 upgrade this weekend.

in both cases our phrase was wrapped in double quotes. the case that
performed as expected had the quotes escaped with a backslash.

this is the debug info from the one that is working as expected:

"parsedquery":"text:craniofacial text:bone text:screw
> (text:nonbioabsorbable PhraseQuery(text:\"non bioabsorbable\"))
> text:sterile",
> "parsedquery_toString":"text:craniofacial text:bone text:screw
> (text:nonbioabsorbable text:\"non bioabsorbable\") text:sterile",


and the other:

"parsedquery":"SpanNearQuery(spanNear([text:craniofacial, text:bone,
> text:screw, spanOr([text:nonbioabsorbable, spanNear([text:non,
> text:bioabsorbable], 0, true)]), text:sterile], 0, true))",
> "parsedquery_toString":"spanNear([text:craniofacial, text:bone,
> text:screw, spanOr([text:nonbioabsorbable, spanNear([text:non,
> text:bioabsorbable], 0, true)]), text:sterile], 0, true)",


it seems to be related to the spanNear() and/or spanOr() usage that is
injected in the latter case.

this is the query, by the way: "Craniofacial bone screw, non-bioabsorbable,
sterile"

removing ", sterile" will render results as expected, too. from the bit of
reading i did on the spanquery stuff i was thinking that maybe it was
related to positioning issues, specifically with 'sterile'. in the Analysis
tab, however, it's in position 6 in both indexing and querying output.

thanks for any thoughts or assists here!

best,

--
John Blythe


Re: how to access solr in solrcloud

2018-09-12 Thread Shawn Heisey

On 9/12/2018 7:38 AM, Gu, Steve (CDC/OD/OADS) (CTR) wrote:

I am upgrading our solr to 7.4 and would like to set up solrcloud for failover 
and load balance.   There are three zookeeper servers (zk1:2181, zk1:2182) and 
two solr instance solr1:8983, solr2:8983.  So what will be the solr url should 
the client to use for access?  Will it be solr1:8983, the leader?

If we  use solr1:8983 to access solr, what happens if solr1:8983 is down?  Will 
the request be routed to solr2:8983 via the zookeeper?  I understand that 
zookeeper is doing all the coordination works but wanted to understand how this 
works.


Zookeeper does not handle Solr requests.  It doesn't know anything at 
all about Solr.  It is Solr that uses ZK to coordinate the cluster.


If you are using the Java client called CloudSolrClient, then you will 
most likely be informing it about ZK, not Solr, and it will 
automatically determine what Solr servers there are by talking to ZK, 
and then will talk directly to the correct Solr servers.  If you are not 
using a client that is ZK-aware, then you will need a load balancer 
sitting in front of your Solr servers. Don't put a load balancer in 
front of ZooKeeper.  Your clients will then talk to the load balancer.
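
A minimal SolrJ sketch of the ZK-aware approach (host and collection names
are illustrative):

import java.util.Arrays;
import java.util.Optional;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;

public class ZkAwareClientExample {
  public static void main(String[] args) throws Exception {
    // The client discovers live Solr nodes from ZooKeeper, so there is no
    // single Solr URL to hard-code and no load balancer is required.
    try (CloudSolrClient client = new CloudSolrClient.Builder(
            Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181"),
            Optional.empty())          // no ZK chroot
        .build()) {
      client.setDefaultCollection("mycollection");
      long hits = client.query(new SolrQuery("*:*")).getResults().getNumFound();
      System.out.println("matching docs: " + hits);
    }
  }
}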


Thanks,
Shawn



Re: 6.x to 7.x differences

2018-09-12 Thread John Blythe
thanks, shawn. yep, i saw the multi term synonym discussion when googling
around a bit after your first reply. pretty jazzed about finally getting to
tinker w that instead of creating our regex ducktape solution
for_multi_term_synonyms!

thanks again-

--
John Blythe


On Wed, Sep 12, 2018 at 2:15 PM Shawn Heisey  wrote:

> On 9/12/2018 8:12 AM, John Blythe wrote:
> > shawn: at first, no. we rsynced data up after running it through the
> > migration tool. we'd gotten errors when using WDF so updated all
> instances
> > of it to WDGF (and subsequently added FlattenGraphFilterFactory to each
> > index analyzer that used WDGF to avoid errors).
>
> The messages you get in the log from WDF are not errors. They are
> warnings.  Just letting you know that the filter will be removed in the
> next major version.
>
> > the sow seems to be the key here. adding that to the query url dropped me
> > from +19k to 62 results lol. 'subtle' is a not so subtle understatement
> in
> > this case! i'm a big fan of finally being able to not be driven batty by
> > the analysis vs. query results though, so looking forward to playing w
> that
> > some more. for our immediate purposes, however, i think this solves it!
>
> Setting sow=false is a key part of the "graph" nature of the new filters
> that aren't deprecated.  Mostly this is to support multi-word synonyms
> properly.
>
>
> https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/
>
> Thanks,
> Shawn
>
>


Solr File based spellchecker is not returning expected result

2018-09-12 Thread Rajdeep Sahoo
Hi,
I am using the Solr 4.6 version.
My document has "iphone 7", but when I search with
"iphone7" I get the result, because WordDelimiterFilterFactory
takes care of it with its split-on-numerics functionality.
  (Iphone7 --> iphone 7)
But I want Solr to return a spellcheck suggestion of "iphone 7".
When I configure "Iphone 7" in the spellings.txt file, it does not
return the expected result, which is like:
iphone7 ---> "iphone 7"

Another problem is how I can use FileBasedSpellChecker, WordBreakSolrSpellChecker
and DirectSolrSpellChecker at the same time. Here I am getting an error
for the distanceMeasure param.

  Please help. Thanks in advance.


Solr Usage Question

2018-09-12 Thread Samatha Sajja
Hi,

I have a use case where I am not sure which type of fields to use.

Use case: For every order line we would like to store statuses and quantities.
For example, I have placed an order with an item quantity of 6. 4 of them got 
shipped and 2 of them are in processing. I would like to search on status and also 
need to know how much quantity is in that status.

Our current data model has one record per order line. In the above scenario an order 
line can have multiple statuses and multiple quantities in those statuses. We 
don't want to duplicate the data, as the statuses and quantities will update 
very often.

Solution:  Thinking of having a status:[SHIPPED,PROCESSING] and 
status_quantity:{"SHIPPED":4,"PROCESSING":2}

Question: What is your recommendation? How should I define these fields?

Regards
Samatha Sajja
Staff Software Engineer - TechCA
Samsclub.com Engineering
Location: 860 -1st Floor
Email: ssa...@walmartlabs.com
Slack: ssajja



Re: Solr Usage Question

2018-09-12 Thread Walter Underwood
My recommendation is to put that data in a relational database. That does not 
look like an appropriate use for Solr.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Sep 12, 2018, at 1:10 PM, Samatha Sajja 
>  wrote:
> 
> Hi,
> 
> I have a use case where I am not sure which type of fields to use.
> 
> Use case: For every order line we would like to store statuses and quantities
> For ex, I have placed an order with some item quantity of 6. 4 of them got 
> shipped and 2 of them in processing. I would like to search on status and 
> also need to know how much quantity is in that status
> 
> Our current data model is for every order line. In the above scenario an 
> order line can have multiple statuses and multiple quantities in those 
> statuses. Don’t want to duplicate the data as it will update the statuses and 
> quantity very often.
> 
> Solution:  Thinking of having a status:[SHIPPED,PROCESSING] and 
> status_quantity:{"SHIPPED":4,"PROCESSING":2}
> 
> Question: What is your recommendation? How should I define these fields?
> 
> Regards
> Samatha Sajja
> Staff Software Engineer - TechCA
> Samsclub.com Engineering
> Location: 860 -1st Floor
> Email: ssa...@walmartlabs.com
> Slack: ssajja
> 



Re: Solr Usage Question

2018-09-12 Thread Alexandre Rafalovitch
Your use case should not start from data stored, but from the queries
you want to search. Then you massage your data to fit that.

Don't worry too much about 'duplicate' too much at this stage. You
could delete the historical records if needed. Or index them without
storing.

What you should try to avoid is updating a record, as - under the
covers - a completely new record is created anyway. That's how the
search speed is achieved.

Regards,
   Alex.

On 12 September 2018 at 16:10, Samatha Sajja
 wrote:
> Hi,
>
> I have a use case where I am not sure which type of fields to use.
>
> Use case: For every order line we would like to store statuses and quantities
> For ex, I have placed an order with some item quantity of 6. 4 of them got 
> shipped and 2 of them in processing. I would like to search on status and 
> also need to know how much quantity is in that status
>
> Our current data model is for every order line. In the above scenario an 
> order line can have multiple statuses and multiple quantities in those 
> statuses. Don’t want to duplicate the data as it will update the statuses and 
> quantity very often.
>
> Solution:  Thinking of having a status:[SHIPPED,PROCESSING] and 
> status_quantity:{"SHIPPED":4,"PROCESSING":2}
>
> Question: What is your recommendation? How should I define these fields?
>
> Regards
> Samatha Sajja
> Staff Software Engineer - TechCA
> Samsclub.com Engineering
> Location: 860 -1st Floor
> Email: ssa...@walmartlabs.com
> Slack: ssajja
>


Re: switch query parser and solr cloud

2018-09-12 Thread Dwane Hall
Thanks for the suggestions and responses Erick and Shawn.  Erick, I only return 
30 records irrespective of the query (not the entire payload); I removed some of 
my configuration settings for readability. The parameter "allResults" was a 
little misleading, I apologise for that, but I appreciate your input.

Shawn, thanks for your comments. Regarding the switch query parser, Hossman has 
a great description of its use and application here 
(https://lucidworks.com/2013/02/20/custom-solr-request-params/).  PTST is just 
our performance testing environment and is not important in the context of the 
question, other than it being a multi-node solr environment.  The server side 
error was the null pointer, which is why I was having a few difficulties 
debugging it, as there was not a lot of info to troubleshoot.  I'll keep playing 
and explore the client filter option for addressing this issue.

Thanks again for both of your input

Cheers,

Dwane

From: Erick Erickson 
Sent: Thursday, 13 September 2018 12:20 AM
To: solr-user
Subject: Re: switch query parser and solr cloud

You will run into significant problems if, when returning "all
results", you return large result sets. For regular queries I like to
limit the return to 100, although 1,000 is sometimes OK.

Millions will blow you out of the water, use CursorMark or Streaming
for very large result sets. CursorMark gets you a page at a time, but
efficiently and Streaming doesn't consume huge amounts of memory.

And assuming you could possible return 1M rows, say, what would the
user do with it? Displaying in a browser is problematic for instance.

Best,
Erick
On Wed, Sep 12, 2018 at 5:54 AM Shawn Heisey  wrote:
>
> On 9/12/2018 5:47 AM, Dwane Hall wrote:
> > Good afternoon Solr brains trust I'm seeking some community advice if 
> > somebody can spare a minute from their busy schedules.
> >
> > I'm attempting to use the switch query parser to influence client search 
> > behaviour based on a client specified request parameter.
> >
> > Essentially I want the following to occur:
> >
> > -A user has the option to pass through an optional request parameter 
> > "allResults" to solr
> > -If "allResults" is true then return all matching query records by 
> > appending a filter query for all records (fq=*:*)
> > -If "allResults" is empty then apply a filter using the collapse query 
> > parser ({!collapse field=SUMMARY_FIELD})
>
> I'm looking at the documentation for the switch parser and I'm having
> difficulty figuring out what it actually does.
>
> This is the kind of thing that is better to handle in your client
> instead of asking Solr to do it for you.  You'd have to have your code
> construct the complex localparam for the switch parser ... it would be
> much easier to write code to insert your special collapse filter when it
> is required.
>
> > Everything works nicely until I move from a single node solr instance (DEV) 
> > to a clustered solr instance (PTST) in which I receive a null pointer 
> > exception from Solr which I'm having trouble picking apart.  I've 
> > co-located the solr documents using document routing which appear to be the 
> > only requirement for the collapse query parser's use.
>
> Some features break down when working with sharded indexes.  This is one
> of the reasons that sharding should only be done when it is absolutely
> required.  A single-shard index tends to perform better anyway, unless
> it's really really huge.
>
> The error is a remote exception, from
> https://myserver:1234/solr/my_collection_ptst_shard2_replica_n2. Which
> suggests that maybe not all your documents are co-located on the same
> shard the way you think they are.  Is this a remote server/shard?  I am
> completely guessing here.  It's always possible that you've encountered
> a bug.  Does this one (not fixed) look like it might apply?
>
> https://issues.apache.org/jira/browse/SOLR-9104
>
> There should be a server-side error logged by the Solr instance running
> on myserver:1234 as well.  Have you looked at that?
>
> I do not know what PTST means.  Is that important for me to understand?
>
> Thanks,
> Shawn
>


Re: how to access solr in solrcloud

2018-09-12 Thread Florian Gleixner
On 9/12/18 8:21 PM, Shawn Heisey wrote:
> On 9/12/2018 7:38 AM, Gu, Steve (CDC/OD/OADS) (CTR) wrote:
>> I am upgrading our solr to 7.4 and would like to set up solrcloud for
>> failover and load balance.   There are three zookeeper servers
>> (zk1:2181, zk1:2182) and two solr instance solr1:8983, solr2:8983.  So
>> what will be the solr url should the client to use for access?  Will
>> it be solr1:8983, the leader?
>>
>> If we  use solr1:8983 to access solr, what happens if solr1:8983 is
>> down?  Will the request be routed to solr2:8983 via the zookeeper?  I
>> understand that zookeeper is doing all the coordination works but
>> wanted to understand how this works.
> 
> Zookeeper does not handle Solr requests.  It doesn't know anything at
> all about Solr.  It is Solr that uses ZK to coordinate the cluster.
> 
> If you are using the Java client called CloudSolrClient, then you will
> most likely be informing it about ZK, not Solr, and it will
> automatically determine what Solr servers there are by talking to ZK,
> and then will talk directly to the correct Solr servers.  If you are not
> using a client that is ZK-aware, then you will need a load balancer
> sitting in front of your Solr servers. Don't put a load balancer in
> front of ZooKeeper.  Your clients will then talk to the load balancer.

The advantage over haproxy/nginx/... solutions is that a client that
uses ZooKeeper registers with ZooKeeper, and in case a Solr node goes
down, the Solr node may inform ZooKeeper, which may inform all
registered clients. Failover can be much faster with CloudSolrClient
than with haproxy or similar solutions.
CloudSolrClient also knows which node is the leader, and when indexing it
routes documents to the leader, which avoids overhead.
I've written a SolrCloudProxy which can be used to connect non-cloud-aware
clients to a Solr cloud. The proxy uses CloudSolrClient with all
its advantages. It is not yet production ready, but you may want to try it:
https://gitlab.lrz.de/a2814ad/SolrCloudProxy







Re: Solr File based spellchecker is not returning expected result

2018-09-12 Thread Rajdeep Sahoo
Another ask: how can I use multiple spellcheckers at the same time,
based on a condition?
Currently we are using two spellcheckers [spellcheck.dictionary=wordbreak,
spellcheck.dictionary=en].
If the wordbreak dictionary has a suggestion, it will make a second call for
fetching the result, and in the same call we are using the direct Solr
spellchecker.
If we are not getting a result in the second call, we are using the
DirectSolrSpellChecker.

How can I write a function query for getting suggestions against multiple
spellcheckers?




On Thu, Sep 13, 2018 at 12:31 AM Rajdeep Sahoo 
wrote:

> Hi ,
> I am using solr 4.6 version.
> My document is having a "iphone 7" but when I am searching with with
> "iphone7" I am getting the result because here worddelimiterfilterfactory
> is taking care by slipt on numerics functionality.
>   (Iphone7-->iphone 7)
> But I want solr to return a spellcheck suggestion as "iphone 7" .
> When I am configuring "Iphone 7" in the spellings.txt file it is not
> returning expected result which i slike
> iphone7--->"iphone 7" 7
>
> Another problem is how can I use filebasedspellchecke ,   wordbreak
> spellchecke and directsolrspellchecker at the same time.Here getting error
> for distanceMeasure param
>
>   please help.Thanks in advance
>