Can Solr store the inverted index and forward information on different disks?

2018-08-31 Thread shreck





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Can Solr store the inverted index and forward information on different disks?

2018-08-31 Thread Mikhail Khludnev
Nope. You can try to hack DirectoryFactory to make it lay out files based on
extension.
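A custom DirectoryFactory would have to make exactly this kind of per-file decision. The sketch below only illustrates the routing rule, not a working Solr plugin; the /ssd and /hdd mount points are hypothetical, and the extension sets reflect Lucene's stored-field (.fdt/.fdx) and doc-values (.dvd/.dvm) files versus the rest of the index:

```java
import java.nio.file.Path;
import java.util.Set;

public class ExtensionRouting {
    // Lucene extensions holding "forward" data: stored fields (.fdt/.fdx)
    // and doc values (.dvd/.dvm). Everything else (terms dict .tim/.tip,
    // postings .doc/.pos, etc.) is treated as inverted-index data here.
    static final Set<String> FORWARD_EXTS = Set.of("fdt", "fdx", "dvd", "dvm");

    // Pick a base directory for a Lucene file name by its extension.
    static Path route(Path invertedDir, Path forwardDir, String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1);
        Path base = FORWARD_EXTS.contains(ext) ? forwardDir : invertedDir;
        return base.resolve(fileName);
    }

    public static void main(String[] args) {
        Path ssd = Path.of("/ssd/index");  // hypothetical fast disk
        Path hdd = Path.of("/hdd/index");  // hypothetical slow disk
        System.out.println(route(ssd, hdd, "_0.tim")); // inverted-index file -> /ssd/index/_0.tim
        System.out.println(route(ssd, hdd, "_0.fdt")); // stored fields -> /hdd/index/_0.fdt
    }
}
```

The hard part Mikhail alludes to is not this mapping but wiring it into Solr's Directory machinery so merges, deletes, and directory listings still see one logical index.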

On Fri, Aug 31, 2018 at 11:58 AM shreck  wrote:



-- 
Sincerely yours
Mikhail Khludnev


Re: Can Solr store the inverted index and forward information on different disks?

2018-08-31 Thread shreck
Thanks. So Elasticsearch doesn't support this feature either?
In some situations, I think we could improve search performance this
way.





Re: [EXTERNAL] - Re: join works with a core, doesn't work with a collection

2018-08-31 Thread Jan Høydahl
Hi,

You can have multiple nodes as long as you make sure that your collection has 
only one shard, then the joins will work.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 30. aug. 2018 kl. 19:51 skrev Steve Pruitt :
> 
> Shawn,
> 
> You are correct.  I created another setup.  This time with 1 node, 1 shard, 2 
> replicas and the join worked!
> Running with the example SolrCloud setup doesn't work for join queries.
> 
> Thanks.
> 
> -S
> 
> 
> -Original Message-
> From: Steve Pruitt  
> Sent: Thursday, August 30, 2018 12:25 PM
> To: solr-user@lucene.apache.org
> Subject: RE: [EXTERNAL] - Re: join works with a core, doesn't work with a 
> collection
> 
> Gosh, really?  This is not mentioned anywhere in the documentation that I can 
> find.  There are node to HW considerations if you are joining across 
> different Collections.
> But, the same Collection?  Tell me this is not so.
> 
> -S
> 
> -Original Message-
> From: Shawn Heisey  
> Sent: Thursday, August 30, 2018 12:11 PM
> To: solr-user@lucene.apache.org
> Subject: Re: [EXTERNAL] - Re: join works with a core, doesn't work with a 
> collection
> 
> On 8/30/2018 9:49 AM, Steve Pruitt wrote:
>> If you mean another running Solr server running, then no.
> 
> I mean multiple Solr processes.
> 
> The cloud example (started with bin/solr -e cloud) starts two Solr instances 
> if you give it the defaults.  They are both running on the same machine, but 
> if part of the data is on the instance running on port
> 8983 and part of the data is on the instance running on port 7574, I don't 
> think you can do a join.
> 
> Thanks,
> Shawn
> 



Re: Split on whitespace parameter doubt

2018-08-31 Thread Jan Høydahl
> I am not sure why field centric field is not used all the time or at least 
> why there is no parameter to force it.

Yea, we should have a parameter to force a field/term centric mode if possible.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com 

> 30. aug. 2018 kl. 20:13 skrev Emir Arnautović :
> 
> Hi David,
> Your observations seem correct. If all fields produce the same tokens then 
> Solr goes for a “term centric” query, but if different fields produce different 
> tokens, then it uses a field centric query. Here is a blog post that explains it 
> from the multiword synonyms perspective: 
> https://opensourceconnections.com/blog/2018/02/20/edismax-and-multiterm-synonyms-oddities/
> 
> IMO the issue is that it is not clear how term centric would look like in 
> case of different tokens: Imagine that your query is “a b” and you are 
> searching  two fields title (analysed) and title_s (string) so you will end 
> up with tokens ‘a’, ‘b’ and ‘a b’. So term centric query would be (title:a || 
> title_s:a) (title:b || title_s:b)(title:a b || title_s:a b). If not already 
> weird, lets assume you allow one token to be missed…
> 
> I am not sure why field centric field is not used all the time or at least 
> why there is no parameter to force it.
> 
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/ 
> 
> 
> 
> 
>> On 30 Aug 2018, at 15:02, David Argüello Sánchez wrote:
>> 
>> Hi everyone,
>> 
>> I am doing some tests to understand how the split on whitespace
>> parameter works with eDisMax query parser. I understand the behaviour,
>> but I have a doubt about why it works like that.
>> 
>> When sow=true, it works as it did with previous Solr versions.
>> When sow=false, the behaviour changes and all the terms have to be
>> present in the same field. However, if all queried fields' query
>> structure is the same, it works as if it had sow=true. This is the
>> thing that I don’t fully understand.
>> Specifying sow=false I might want to match only those documents
>> containing all the terms in the same field, but because of all queried
>> fields having the same query structure, I would get back documents
>> containing both terms in any of the fields.
>> 
>> Does anyone know the reasoning behind this decision?
>> Thank you in advance.
>> 
>> Regards,
>> David
> 
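Emir's "a b" example in the thread above can be made concrete. The sketch below is plain illustrative Java, not Solr's actual query-building code; the `||` notation follows the dismax-style pretty-printing used in Emir's email:

```java
import java.util.List;
import java.util.stream.Collectors;

public class QueryShape {
    // Term centric: for each token, a disjunction across fields.
    // Minimum-match style options (e.g. "allow one token to be missed")
    // apply per token, which is why mixed tokenization makes this shape hard.
    static String termCentric(List<String> tokens, List<String> fields) {
        return tokens.stream()
            .map(t -> fields.stream().map(f -> f + ":" + t)
                .collect(Collectors.joining(" || ", "(", ")")))
            .collect(Collectors.joining(" "));
    }

    // Field centric: for each field, a conjunction of all its tokens.
    static String fieldCentric(List<String> tokens, List<String> fields) {
        return fields.stream()
            .map(f -> tokens.stream().map(t -> f + ":" + t)
                .collect(Collectors.joining(" ", "(", ")")))
            .collect(Collectors.joining(" || "));
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("a", "b");
        List<String> fields = List.of("title", "title_s");
        System.out.println(termCentric(tokens, fields));  // (title:a || title_s:a) (title:b || title_s:b)
        System.out.println(fieldCentric(tokens, fields)); // (title:a title:b) || (title_s:a title_s:b)
    }
}
```

The term centric shape requires the same token list for every field, which is exactly the condition Emir describes: once title_s (string) yields the single token 'a b' while title yields 'a' and 'b', the per-token disjunctions no longer line up and Solr falls back to the field centric shape.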



Bugs on documentation

2018-08-31 Thread waizou mailing
Hi guys,

Am I on the right channel to report possible bugs in the Solr Ref Guide
(7.3 and 7.4, page
https://lucene.apache.org/solr/guide/7_3/solrcloud-autoscaling-overview.html)?


Re: Bugs on documentation

2018-08-31 Thread Mikhail Khludnev
Hi.
Right. You can create a JIRA ticket and attach a patch for
solrcloud-autoscaling-overview.adoc.
Here's an example of a similar contribution:
https://issues.apache.org/jira/browse/SOLR-11834

On Fri, Aug 31, 2018 at 2:25 PM waizou mailing 
wrote:

> Hi Guies,
>
> I'm I on the right channel to report possible bugs in the Solr Ref Guide
> (7.3 and 7.4 page
>
> https://lucene.apache.org/solr/guide/7_3/solrcloud-autoscaling-overview.html
> )
> ?
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Can Solr store the inverted index and forward information on different disks?

2018-08-31 Thread Shawn Heisey

On 8/31/2018 3:19 AM, shreck wrote:

Thanks. So Elasticsearch doesn't support this feature either?
In some situations, I think we could improve search performance this
way.


I cannot see how it would improve performance.

The best way to improve search performance is to add memory to the 
system so the operating system doesn't need to read the index data off 
the disk at all -- in that situation, the data will come from memory, 
which is like lightning compared to molasses.


Thanks,
Shawn



Re: Solrcloud collection file location on zookeeper

2018-08-31 Thread Shawn Heisey

On 8/30/2018 10:15 PM, Sushant Vengurlekar wrote:

Where does zookeeper store the collection info on local filesystem on
zookeeper?


ZooKeeper doesn't store info in a way that you can find on the 
filesystem.  ZK data is stored in a file structure that's basically a 
database.   The way I understand it, the entire database is contained in 
a single file with a binary structure that cannot be easily examined by 
anything other than ZooKeeper itself.


If you want detailed information about how ZK stores data, you will need 
to visit a support resource for the ZooKeeper project. It is a 
completely separate project from Solr.


SolrCloud does not store index data in ZooKeeper.

Thanks,
Shawn



Re: change DocExpirationUpdateProcessorFactory deleteByQuery NOW parameter time zone

2018-08-31 Thread Shawn Heisey

On 8/30/2018 7:26 PM, Derek Poh wrote:
Can the timezone of the NOW parameter in the deleteByQuery of the 
DocExpirationUpdateProcessorFactory be changed to my timezone?


I am in SG and using solr 6.5.1.


I do not know what SG is.

The timezone cannot be changed.  Solr *always* handles dates in UTC.  
You can assign a timezone when doing date math, but this is only used to 
determine when a new day or week starts -- the dates themselves will be 
in UTC.
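Shawn's point about date math can be illustrated with plain java.time (an illustrative sketch of the semantics, not Solr's implementation): rounding NOW down to the start of day with a TZ parameter changes which absolute instant the day boundary falls on, but the result is still an instant on the UTC timeline.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class DateMathTz {
    // Round an instant down to the start of day, as Solr's NOW/DAY does.
    // The zone only decides where the day boundary falls; the result is
    // still an absolute point in time (stored in UTC).
    static Instant startOfDay(Instant now, ZoneId tz) {
        return now.atZone(tz).toLocalDate().atStartOfDay(tz).toInstant();
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2018-08-31T02:00:00Z");
        // Default: day boundary computed in UTC.
        System.out.println(startOfDay(now, ZoneOffset.UTC));           // 2018-08-31T00:00:00Z
        // With TZ=Asia/Singapore (UTC+8), 02:00Z is already 10:00 local
        // on Aug 31, so that day started 8 hours earlier in absolute terms.
        System.out.println(startOfDay(now, ZoneId.of("Asia/Singapore"))); // 2018-08-30T16:00:00Z
    }
}
```

So a TZ-aware rounding in the expiration query would shift which documents count as "expired today", but the stored dates and the comparison itself remain UTC.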


Thanks,
Shawn



Re: Solrcloud collection file location on zookeeper

2018-08-31 Thread Sushant Vengurlekar
Any idea where this database is stored on the file system? I don’t want to
read it, just know where it resides.

Thanks

On Fri, Aug 31, 2018 at 7:15 AM Shawn Heisey  wrote:

> On 8/30/2018 10:15 PM, Sushant Vengurlekar wrote:
> > Where does zookeeper store the collection info on local filesystem on
> > zookeeper?
>
> ZooKeeper doesn't store info in a way that you can find on the
> filesystem.  ZK data is stored in a file structure that's basically a
> database.   The way I understand it, the entire database is contained in
> a single file with a binary structure that cannot be easily examined by
> anything other than ZooKeeper itself.
>
> If you want detailed information about how ZK stores data, you will need
> to visit a support resource for the ZooKeeper project. It is a
> completely separate project from Solr.
>
> SolrCloud does not store index data in ZooKeeper.
>
> Thanks,
> Shawn
>
>


Re: Solrcloud collection file location on zookeeper

2018-08-31 Thread Shawn Heisey

On 8/31/2018 8:40 AM, Sushant Vengurlekar wrote:

Any idea where this database is stored on the file system.  I don’t want to
read it but just know where it resides.


If you followed recommendations, your ZooKeeper ensemble is NOT using 
the ZK server that's embedded inside Solr.  In that case I would have 
absolutely no idea, as the setup of that would not be part of Solr at 
all and the locations can be customized to be anything you want.


If you didn't follow advice and are using the embedded zookeeper, there 
is probably a zoo_data directory under your solr home.  Inside that 
location should be a version-2 directory, where the actual data will reside.


Thanks,
Shawn



Re: Solrcloud collection file location on zookeeper

2018-08-31 Thread Erick Erickson
You should have a "zoo.cfg" file in the conf directory (a sibling to
the bin directory where you run ZK). Inside there the dataDir property
specifies where ZooKeeper stores data. NOTE: the default is somewhere
under /tmp and should NOT be used for production since the contents of
/tmp can disappear when you reboot on *nix op systems.
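Since zoo.cfg is a plain key=value file, the dataDir setting Erick mentions can be read with standard tooling; a minimal sketch (the values shown are typical examples, not from any particular installation):

```java
import java.io.StringReader;
import java.util.Properties;

public class ZooCfg {
    public static void main(String[] args) throws Exception {
        // A typical zoo.cfg body (example values). ZooKeeper's config format
        // is key=value per line, so java.util.Properties can parse it.
        String cfg = "tickTime=2000\n"
                   + "dataDir=/var/lib/zookeeper\n"
                   + "clientPort=2181\n";
        Properties p = new Properties();
        p.load(new StringReader(cfg));
        // This is the directory where ZK keeps its snapshot/log data.
        System.out.println(p.getProperty("dataDir")); // /var/lib/zookeeper
    }
}
```

If dataDir is absent or points under /tmp, that is the production hazard Erick warns about.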

Best,
Erick

On Fri, Aug 31, 2018 at 7:55 AM Shawn Heisey  wrote:
>
> On 8/31/2018 8:40 AM, Sushant Vengurlekar wrote:
> > Any idea where this database is stored on the file system.  I don’t want to
> > read it but just know where it resides.
>
> If you followed recommendations, your ZooKeeper ensemble is NOT using
> the ZK server that's embedded inside Solr.  In that case I would have
> absolutely no idea, as the setup of that would not be part of Solr at
> all and the locations can be customized to be anything you want.
>
> If you didn't follow advice and are using the embedded zookeeper, there
> is probably a zoo_data directory under your solr home.  Inside that
> location should be a version-2 directory, where the actual data will reside.
>
> Thanks,
> Shawn
>


Re: Bugs on documentation

2018-08-31 Thread Erick Erickson
Atom is a free editor that lets you see what the AsciiDoc looks like
via the "preview" pane.

IntelliJ also has an AsciiDoc preview plugin that will help as well.

Best,
Erick
On Fri, Aug 31, 2018 at 6:23 AM Mikhail Khludnev  wrote:
>
> Hi.
> Right. You can create JIRA ticket and attach patch for
> solrcloud-autoscaling-overview.adoc.
> Here's the example of the similar contribution
> https://issues.apache.org/jira/browse/SOLR-11834
>
> On Fri, Aug 31, 2018 at 2:25 PM waizou mailing 
> wrote:
>
> > Hi Guies,
> >
> > I'm I on the right channel to report possible bugs in the Solr Ref Guide
> > (7.3 and 7.4 page
> >
> > https://lucene.apache.org/solr/guide/7_3/solrcloud-autoscaling-overview.html
> > )
> > ?
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev


Re: ZooKeeper issues with AWS

2018-08-31 Thread Erick Erickson
Jack:

Is it possible to reproduce "manually"? By that I mean without the
chaos bit by the following:

- Start 3 ZK nodes
- Create a multi-node, multi-shard Solr collection.
- Sequentially stop and start the ZK nodes, waiting for the ZK quorum
to recover between restarts.
- Solr does not reconnect to the restarted ZK node and will think it's
lost quorum after the second node is restarted.

bq. Kill 2, however, and we lose the quorum and we have
collections/replicas that appear as "gone" on the Solr Admin UI's
cloud graph display.

It's odd that replicas appear as "gone", and suggests that your ZK
ensemble is possibly not correctly configured, although exactly how is
a mystery. Solr pulls it's picture of the topology of the network from
ZK, establishes watches and the like. For most operations, Solr
doesn't even ask ZooKeeper for anything since it's picture of the
cluster is stored locally. ZKs job is to inform the various Solr nodes
when the topology changes, i.e. _Solr_ nodes change state. For
querying and indexing, there's no ZK involved at all. Even if _all_
ZooKeeper nodes disappear, Solr should still be able to talk to other
Solr nodes and shouldn't show them as down just because it can't talk
to ZK. Indeed, querying should be OK although indexing will fail if
quorum is lost.

But you say you see the restarted ZK nodes rejoin the ZK ensemble, so
the ZK config seems right. Is there any chance your chaos testing
"somehow" restarts the ZK nodes with any changes to the configs?
Shooting in the dark here.

For a replica to be "gone", the host node should _also_ be removed
from the "live_nodes" znode. Hmmm. I do wonder if what you're
observing is a consequence of both killing ZK nodes and Solr nodes.
I'm not saying this is what _should_ happen, just trying to understand
what you're reporting.

My theory here is that your chaos testing kills some Solr nodes and
that fact is correctly propagated to the remaining Solr nodes. Then
your ZK nodes are killed and somehow Solr doesn't reconnect to ZK
appropriately so its picture of the cluster has the node as
permanently down. Then you restart the Solr node and that information
isn't propagated to the Solr nodes since they didn't reconnect. If
that were the case, then I'd expect the admin UI to correctly show the
state of the cluster when hit on a Solr node that has never been
restarted.

As you can tell, I'm using something of a scattergun approach here b/c
this isn't what _should_ happen given what you describe.
Theoretically, all the ZK nodes should be able to go away and come
back and Solr reconnect...

As an aside, if you are ever in the code you'll see that for a replica
to be usable, it must have both the state set to "active" _and_ the
corresponding node has to be present in the live_nodes ephemeral
zNode.

Is there any chance you could try the manual steps above (AWS isn't
necessary here) and let us know what happens? And if we can get a
reproducible set of steps, feel free to open a JIRA.
On Thu, Aug 30, 2018 at 10:11 PM Jack Schlederer
 wrote:
>
> We run a 3 node ZK cluster, but I'm not concerned about 2 nodes failing at
> the same time. Our chaos process only kills approximately one node per
> hour, and our cloud service provider automatically spins up another ZK node
> when one goes down. All 3 ZK nodes are back up within 2 minutes, talking to
> each other and syncing data. It's just that Solr doesn't seem to recognize
> it. We'd have to restart Solr to get it to recognize the new Zookeepers,
> which we can't do without taking downtime or losing data that's stored on
> non-persistent disk within the container.
>
> The ZK_HOST environment variable lists all 3 ZK nodes.
>
> We're running ZooKeeper version 3.4.13.
>
> Thanks,
> Jack
>
> On Thu, Aug 30, 2018 at 4:12 PM Walter Underwood 
> wrote:
>
> > How many Zookeeper nodes in your ensemble? You need five nodes to
> > handle two failures.
> >
> > Are your Solr instances started with a zkHost that lists all five
> > Zookeeper nodes?
> >
> > What version of Zookeeper?
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> > > On Aug 30, 2018, at 1:45 PM, Jack Schlederer <
> > jack.schlede...@directsupply.com> wrote:
> > >
> > > Hi all,
> > >
> > > My team is attempting to spin up a SolrCloud cluster with an external
> > > ZooKeeper ensemble. We're trying to engineer our solution to be HA and
> > > fault-tolerant such that we can lose either 1 Solr instance or 1
> > ZooKeeper
> > > and not take downtime. We use chaos engineering to randomly kill
> > instances
> > > to test our fault-tolerance. Killing Solr instances seems to be solved,
> > as
> > > we use a high enough replication factor and Solr's built in autoscaling
> > to
> > > ensure that new Solr nodes added to the cluster get the replicas that
> > were
> > > lost from the killed node. However, ZooKeeper seems to be a different
> > > story. We can kill 1 ZooKeeper instance and still maintain

Re: Solr7 embeded req: Bad content type error

2018-08-31 Thread Andrea Gazzarini

Hi Alfonso, thank you, too: I've learned something new ;)

Andrea

On 29/08/18 14:31, Alfonso Noriega wrote:

Thanks for your time and help Andrea,
I guess we should try to use the JSON API provided by Solr and figure
out a way to do so with SolrJ.


On Wed, 29 Aug 2018 at 14:21, Andrea Gazzarini  wrote:


Well, I don't know the actual reason why the behavior is different
between Cloud and Embedded client: maybe things are different because in
the Embedded Solr HTTP is not involved at all, but I'm just shooting in
the dark.

I'm not aware about POST capabilities you mentioned, sorry

Andrea


On 29/08/2018 14:07, Alfonso Noriega wrote:

Yes, I realized that changing the method to GET solves the issue, but it
is intentionally set to POST, as in a real case scenario we had the issue
of users creating too-long queries which were hitting the REST length
limits.

I think a possible solution would be to send the Solr params as JSON in
the request body, but I am not sure if SolrJ supports this.

Alfonso.

On Wed, 29 Aug 2018 at 13:46, Andrea Gazzarini 

wrote:

I think that's the issue: just guessing because I do not have the code
in front of me.

POST requests put the query in the request body, and the
EmbeddedSolrServer expects to find a valid JSON. Did you try to remove
the Method param?

Andrea


On 29/08/2018 13:12, Alfonso Noriega wrote:

Hi Andrea,

Thanks for your help; something which is relevant and I forgot to
mention before is that the requests always use the POST method.
As mentioned before, it is not a single query which fails but all of
the requests done to the search handler.

final SolrQuery q = new SolrQuery("!( _id_:" + doc.getId() + ")AND(_root_:" + doc.getId() + ")");
final QueryResponse query = solrClient.query(q, SolrRequest.METHOD.POST);

Regarding the embedded server instantiation:
.
.
.
final CoreContainer container = CoreContainer.createAndLoad(tmpSolrHome, tmpSolrConfig);
return new SolrClientWrapper(container, CORE_NAME, tmpSolrHome);
.
.
.
private class SolrClientWrapper extends EmbeddedSolrServer {

    private final Path temporaryFolder;

    public SolrClientWrapper(CoreContainer coreContainer, String coreName, Path temporaryFolder) {
        super(coreContainer, coreName);
        this.temporaryFolder = temporaryFolder;
    }
.
.
.

The whole embedded server instantiation can be seen here:
https://github.com/RBMHTechnology/vind/blob/master/server/embedded-solr-server/src/main/java/com/rbmhtechnology/vind/solr/backend/EmbeddedSolrServerProvider.java

Best,
Alfonso.

On Wed, 29 Aug 2018 at 12:57, Andrea Gazzarini 

wrote:

Hi Alfonso,
could you please paste an extract of the client code? Specifically
those few lines where you create the SolrQuery with params.

The line you mentioned is dealing with ContentStream which as far as I
remember wraps the request body, and not the request params. So as
request body Solr expects a valid JSON payload, but in your case, if I
got you, you're sending plain query parameters (through SolrJ).

Best,
Andrea

On 29/08/2018 12:45, Alfonso Noriega wrote:

Hi,
I am implementing a migration of the Vind library from Solr 5 to 7.4.0
and I am facing an error which I have no idea how to solve...

The library provides a wrapper (and some extra stuff) to develop search
tools over Solr and uses SolrJ to access it; more info about it can be
seen in the public repo, but basically all requests are done to Solr
through a client provided by SolrJ.
Actually, the migration is done and the tests running against a cloud
instance are working perfectly fine. The issue arises when testing
against an embedded server (which is the default case as you may not
have a local Solr instance running), in which case every request fails
throwing the following exception:

org.apache.solr.common.SolrException: Bad contentType for search handler :application/javabin request={q=!(+_id_:P3)AND(_root_:P3)}

        at org.apache.solr.request.json.RequestUtil.processParams(RequestUtil.java:73)
        at org.apache.solr.util.SolrPluginUtils.setDefaults(SolrPluginUtils.java:167)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:196)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:2539)
        at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:191)
        at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
        at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:974)
        at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:990)

After some debugging I have seen that the embedded Solr server uses
org.apache.solr.request.json.RequestUtil to process the parameters,
which invariably expects an 'application/json' contentType, but SolrJ
produces 'application/javabin'.
Any ideas on how to fix

Re: Split on whitespace parameter doubt

2018-08-31 Thread David Argüello Sánchez
Thank you for your response Emir!

Your message was really useful for getting a better understanding of the parameter.

I understand your point and it makes sense.

I think we are on the same page. The weird thing is that the parser doesn't 
create field centric queries all the time (or at least, as you said, that 
there is no way to choose to do it).

Regards,
David

On 31 Aug 2018, at 12:04, Jan Høydahl  wrote:

>> I am not sure why field centric field is not used all the time or at least 
>> why there is no parameter to force it.
> 
> Yea, we should have a parameter to force a field/term centric mode if 
> possible.
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com 
> 
>> 30. aug. 2018 kl. 20:13 skrev Emir Arnautović :
>> 
>> Hi David,
>> Your observations seem correct. If all fields produce the same tokens then 
>> Solr goes for a “term centric” query, but if different fields produce 
>> different tokens, then it uses a field centric query. Here is a blog post 
>> that explains it from the multiword synonyms perspective: 
>> https://opensourceconnections.com/blog/2018/02/20/edismax-and-multiterm-synonyms-oddities/
>> 
>> IMO the issue is that it is not clear how term centric would look like in 
>> case of different tokens: Imagine that your query is “a b” and you are 
>> searching  two fields title (analysed) and title_s (string) so you will end 
>> up with tokens ‘a’, ‘b’ and ‘a b’. So term centric query would be (title:a 
>> || title_s:a) (title:b || title_s:b)(title:a b || title_s:a b). If not 
>> already weird, lets assume you allow one token to be missed…
>> 
>> I am not sure why field centric field is not used all the time or at least 
>> why there is no parameter to force it.
>> 
>> HTH,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/ 
>> 
>> 
>> 
>> 
>>> On 30 Aug 2018, at 15:02, David Argüello Sánchez wrote:
>>> 
>>> Hi everyone,
>>> 
>>> I am doing some tests to understand how the split on whitespace
>>> parameter works with eDisMax query parser. I understand the behaviour,
>>> but I have a doubt about why it works like that.
>>> 
>>> When sow=true, it works as it did with previous Solr versions.
>>> When sow=false, the behaviour changes and all the terms have to be
>>> present in the same field. However, if all queried fields' query
>>> structure is the same, it works as if it had sow=true. This is the
>>> thing that I don’t fully understand.
>>> Specifying sow=false I might want to match only those documents
>>> containing all the terms in the same field, but because of all queried
>>> fields having the same query structure, I would get back documents
>>> containing both terms in any of the fields.
>>> 
>>> Does anyone know the reasoning behind this decision?
>>> Thank you in advance.
>>> 
>>> Regards,
>>> David
>> 
> 


Re: Solrcloud collection file location on zookeeper

2018-08-31 Thread Jan Høydahl
Once Solr 7.5 is released you will have a new "Cloud -> ZK Status" tab that 
will among other things show the data path on each ZK server.
Until then, log in to the ZK server, locate zoo.cfg and check.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 31. aug. 2018 kl. 17:05 skrev Erick Erickson :
> 
> You should have a "zoo.cfg" file in the conf directory (a sibling to
> the bin directory where you run ZK). Inside there the dataDir property
> specifies where ZooKeeper stores data. NOTE: the default is somewhere
> under /tmp and should NOT be used for production since the contents of
> /tmp can disappear when you reboot on *nix op systems.
> 
> Best,
> Erick
> 
> On Fri, Aug 31, 2018 at 7:55 AM Shawn Heisey  wrote:
>> 
>> On 8/31/2018 8:40 AM, Sushant Vengurlekar wrote:
>>> Any idea where this database is stored on the file system.  I don’t want to
>>> read it but just know where it resides.
>> 
>> If you followed recommendations, your ZooKeeper ensemble is NOT using
>> the ZK server that's embedded inside Solr.  In that case I would have
>> absolutely no idea, as the setup of that would not be part of Solr at
>> all and the locations can be customized to be anything you want.
>> 
>> If you didn't follow advice and are using the embedded zookeeper, there
>> is probably a zoo_data directory under your solr home.  Inside that
>> location should be a version-2 directory, where the actual data will reside.
>> 
>> Thanks,
>> Shawn
>> 



MLT in Cloud Mode - Not Returning Fields?

2018-08-31 Thread Doug Turnbull
Hello,

We're working on a Solr More Like This project (Solr 6.6.2), using the More
Like This searchComponent. What we note is that in standalone Solr, when we
request MLT using the search component, we get every more-like-this
document fully formed with complete fields in the moreLikeThis section.

In cloud, however, with the exact same query and config, we only get the
doc ids under "moreLikeThis" requiring us to fetch the metadata associated
with each document.

I can't easily share an example due to confidentiality, but I want to check
if we're missing something? Documentation doesn't mention any limitations.
The only interesting note I've found is this one which points to a
potential difference in behavior

>  The Cloud MLT Query Parser uses the realtime get handler to retrieve the
fields to be mined for keywords. Because of the way the realtime get
handler is implemented, it does not return data for fields populated using
copyField.

https://stackoverflow.com/a/46307140/8123

Any thoughts?

-Doug
-- 
CTO, OpenSource Connections
Author, Relevant Search
http://o19s.com/doug


Re: ZooKeeper issues with AWS

2018-08-31 Thread Jack Schlederer
Thanks Erick. After some more testing, I'd like to correct the failure case
we're seeing. It's not when 2 ZK nodes are killed that we have trouble
recovering, but rather when all 3 ZK nodes that came up when the cluster
was initially started get killed at some point. Even if it's one at a time,
and we wait for a new one to spin up and join the cluster before killing
the next one, we get into a bad state when none of the 3 nodes that were in
the cluster initially are there anymore, even though they've been replaced
by our cloud provider spinning up new ZK's. We assign DNS names to the
ZooKeepers as they spin up, with a 10 second TTL, and those are what get
set as the ZK_HOST environment variable on the Solr hosts (i.e., ZK_HOST=
zk1.foo.com:2182,zk2.foo.com:2182,zk3.foo.com:2182). Our working hypothesis
is that Solr's JVM is caching the IP addresses for the ZK hosts' DNS names
when it starts up, and doesn't re-query DNS for some reason when it finds
that that IP address is no longer reachable (i.e., when a ZooKeeper node
dies and spins up at a different IP). Our current trajectory has us finding
a way to assign known static IPs to the ZK nodes upon startup, and
assigning those IPs to the ZK_HOST env var, so we can take DNS lookups out
of the picture entirely.
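If the DNS-caching hypothesis holds, one commonly suggested mitigation (an assumption on my part; the thread does not confirm it fixes this case) is to lower the JVM's positive DNS cache TTL before any lookups happen, so the process re-resolves the ZK hostnames after their records change:

```java
import java.security.Security;

public class DnsCacheTtl {
    public static void main(String[] args) {
        // networkaddress.cache.ttl is a java.security property: the number
        // of seconds a successful hostname lookup is cached. Setting it low
        // (here matching the 10s DNS TTL in the thread) makes the JVM
        // re-query DNS instead of pinning a stale IP. It must be set before
        // the first InetAddress lookup to take effect.
        Security.setProperty("networkaddress.cache.ttl", "10");
        // Failed lookups have a separate cache that can be tuned too.
        Security.setProperty("networkaddress.cache.negative.ttl", "5");
        System.out.println(Security.getProperty("networkaddress.cache.ttl")); // 10
    }
}
```

Whether this helps here also depends on the ZooKeeper client: older 3.4.x clients resolve the connect string once at startup, in which case static IPs (as proposed above) or a client restart are the only fixes regardless of JVM caching.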

We can reproduce this in our cloud environment, as each ZK node has its own
IP and DNS name, but it's difficult to reproduce locally due to all the
ZooKeeper containers having the same IP when running locally (127.0.0.1).

Please let us know if you have insight into this issue.

Thanks,
Jack

On Fri, Aug 31, 2018 at 10:40 AM Erick Erickson 
wrote:

> Jack:
>
> Is it possible to reproduce "manually"? By that I mean without the
> chaos bit by the following:
>
> - Start 3 ZK nodes
> - Create a multi-node, multi-shard Solr collection.
> - Sequentially stop and start the ZK nodes, waiting for the ZK quorum
> to recover between restarts.
> - Solr does not reconnect to the restarted ZK node and will think it's
> lost quorum after the second node is restarted.
>
> bq. Kill 2, however, and we lose the quorum and we have
> collections/replicas that appear as "gone" on the Solr Admin UI's
> cloud graph display.
>
> It's odd that replicas appear as "gone", and suggests that your ZK
> ensemble is possibly not correctly configured, although exactly how is
> a mystery. Solr pulls it's picture of the topology of the network from
> ZK, establishes watches and the like. For most operations, Solr
> doesn't even ask ZooKeeper for anything since it's picture of the
> cluster is stored locally. ZKs job is to inform the various Solr nodes
> when the topology changes, i.e. _Solr_ nodes change state. For
> querying and indexing, there's no ZK involved at all. Even if _all_
> ZooKeeper nodes disappear, Solr should still be able to talk to other
> Solr nodes and shouldn't show them as down just because it can't talk
> to ZK. Indeed, querying should be OK although indexing will fail if
> quorum is lost.
>
> But you say you see the restarted ZK nodes rejoin the ZK ensemble, so
> the ZK config seems right. Is there any chance your chaos testing
> "somehow" restarts the ZK nodes with any changes to the configs?
> Shooting in the dark here.
>
> For a replica to be "gone", the host node should _also_ be removed
> form the "live_nodes" znode, H. I do wonder if what you're
> observing is a consequence of both killing ZK nodes and Solr nodes.
> I'm not saying this is what _should_ happen, just trying to understand
> what you're reporting.
>
> My theory here is that your chaos testing kills some Solr nodes and
> that fact is correctly propagated to the remaining Solr nodes. Then
> your ZK nodes are killed and somehow Solr doesn't reconnect to ZK
> appropriately so it's picture of the cluster has the node as
> permanently down. Then you restart the Solr node and that information
> isn't propagated to the Solr nodes since they didn't reconnect. If
> that were the case, then I'd expect the admin UI to correctly show the
> state of the cluster when hit on a Solr node that has never been
> restarted.
>
> As you can tell, I'm using something of a scattergun approach here b/c
> this isn't what _should_ happen given what you describe.
> Theoretically, all the ZK nodes should be able to go away and come
> back and Solr reconnect...
>
> As an aside, if you are ever in the code you'll see that for a replica
> to be usable, it must have both the state set to "active" _and_ the
> corresponding node has to be present in the live_nodes ephemeral
> zNode.
>
> Is there any chance you could try the manual steps above (AWS isn't
> necessary here) and let us know what happens? And if we can get a
> reproducible set of steps, feel free to open a JIRA.
> On Thu, Aug 30, 2018 at 10:11 PM Jack Schlederer
>  wrote:
> >
> > We run a 3 node ZK cluster, but I'm not concerned about 2 nodes failing
> at
> > the same time. Our chaos process only kills approximately one node per
> > hour, and our cloud service prov

Re: ZooKeeper issues with AWS

2018-08-31 Thread Walter Underwood
I would not run ZooKeeper in a container. That seems like a very bad idea.
Each ZooKeeper node has an identity. They are not interchangeable.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Aug 31, 2018, at 11:14 AM, Jack Schlederer 
>  wrote:
> 
> Thanks Erick. After some more testing, I'd like to correct the failure case
> we're seeing. It's not when 2 ZK nodes are killed that we have trouble
> recovering, but rather when all 3 ZK nodes that came up when the cluster
> was initially started get killed at some point. Even if it's one at a time,
> and we wait for a new one to spin up and join the cluster before killing
> the next one, we get into a bad state when none of the 3 nodes that were in
> the cluster initially are there anymore, even though they've been replaced
> by our cloud provider spinning up new ZK's. We assign DNS names to the
> ZooKeepers as they spin up, with a 10 second TTL, and those are what get
> set as the ZK_HOST environment variable on the Solr hosts (i.e., ZK_HOST=
> zk1.foo.com:2182,zk2.foo.com:2182,zk3.foo.com:2182). Our working hypothesis
> is that Solr's JVM is caching the IP addresses for the ZK hosts' DNS names
> when it starts up, and doesn't re-query DNS for some reason when it finds
> that that IP address is no longer reachable (i.e., when a ZooKeeper node
> dies and spins up at a different IP). Our current trajectory has us finding
> a way to assign known static IPs to the ZK nodes upon startup, and
> assigning those IPs to the ZK_HOST env var, so we can take DNS lookups out
> of the picture entirely.
> 
> We can reproduce this in our cloud environment, as each ZK node has its own
> IP and DNS name, but it's difficult to reproduce locally due to all the
> ZooKeeper containers having the same IP when running locally (127.0.0.1).
> 
> Please let us know if you have insight into this issue.
> 
> Thanks,
> Jack
> 
> On Fri, Aug 31, 2018 at 10:40 AM Erick Erickson 
> wrote:
> 
>> Jack:
>> 
>> Is it possible to reproduce "manually"? By that I mean without the
>> chaos bit by the following:
>> 
>> - Start 3 ZK nodes
>> - Create a multi-node, multi-shard Solr collection.
>> - Sequentially stop and start the ZK nodes, waiting for the ZK quorum
>> to recover between restarts.
>> - Solr does not reconnect to the restarted ZK node and will think it's
>> lost quorum after the second node is restarted.
>> 

Re: ZooKeeper issues with AWS

2018-08-31 Thread Shawn Heisey

On 8/31/2018 12:14 PM, Jack Schlederer wrote:

Our working hypothesis is that Solr's JVM is caching the IP addresses for the 
ZK hosts' DNS names when it starts up, and doesn't re-query DNS for some reason 
when it finds that that IP address is no longer reachable (i.e., when a 
ZooKeeper node dies and spins up at a different IP).


It might be the Solr JVM that's doing this, but it is NOT Solr code.  It 
is ZooKeeper code.


Solr incorporates the ZooKeeper jar and uses the ZooKeeper API for all 
interaction with ZooKeeper.  There is nothing we can do for this DNS 
problem -- it is a problem that must be raised with the ZooKeeper project.


As Walter hinted, ZooKeeper 3.4.x is not capable of dynamically 
adding/removing servers to/from the ensemble.  To do this successfully, 
all ZK servers and all ZK clients must be upgraded to 3.5.x.  Solr is a 
ZK client when running in cloud mode.  The 3.5.x version of ZK is 
currently in beta.  When a stable version is released, Solr will have 
its dependency upgraded in the next release.  We do not know if you can 
successfully replace the ZK jar in Solr with a 3.5.x version without 
making changes to the code.
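
As an aside on the DNS-caching hypothesis: independent of the ZooKeeper client's own resolution behavior, the JVM's InetAddress cache can also pin stale lookups. A minimal sketch of capping it (these are standard `java.security` properties, not Solr- or ZooKeeper-specific; the 30/5 second values are arbitrary choices):

```java
import java.security.Security;

public class DnsCacheTtl {
    public static void main(String[] args) {
        // Cap successful DNS lookups at 30 seconds instead of the JVM default,
        // so a hostname that moves to a new IP is re-resolved reasonably soon.
        Security.setProperty("networkaddress.cache.ttl", "30");
        // Don't cache failed lookups for long either.
        Security.setProperty("networkaddress.cache.negative.ttl", "5");
        System.out.println(Security.getProperty("networkaddress.cache.ttl"));
    }
}
```

This has to take effect before the first lookup (or go in `$JAVA_HOME/lib/security/java.security`), and it does not help if the client library never re-resolves the name at all -- which is why the static-IP workaround remains the safer bet here.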


Thanks,
Shawn



Re: Multiple solr instances per host vs Multiple cores in same solr instance

2018-08-31 Thread Wei
Hi Erick,

I am looking into the rule-based replica placement documentation and am
confused. How do I ensure there is no more than one replica of any shard on
the same host? There is an example rule, shard:*,replica:<2,node:*, that seems
to serve the purpose, but I am not sure whether 'node' refers to a Solr
instance or an actual physical host. Is there an example for defining node?

Thanks
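
For what it's worth: in the 6.x rule syntax, node refers to one Solr instance (a host:port entry in live_nodes), while host is a separate snitch tag for the physical machine, so a per-host restriction can use host directly. A sketch, going from the 6.6 docs (collection name, counts, and URL are placeholders):

```shell
# No more than one replica of any shard on the same physical host:
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=mycoll&numShards=2&replicationFactor=2&rule=shard:*,replica:<2,host:*"
```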



On Sun, Aug 26, 2018 at 8:37 PM Erick Erickson 
wrote:

> Yes, you can use the "node placement rules", see:
> https://lucene.apache.org/solr/guide/6_6/rule-based-replica-placement.html
>
> This is a variant of "rack awareness".
>
> Of course the simplest way if you're not doing very many collections is to
> create the collection with the special "EMPTY" createNodeSet then just
> build out your collection with ADDREPLICA, placing each replica on a
> particular node. The idea of that capability was exactly to explicitly
> control
> where each and every replica landed.
>
> As a third alternative, just create the collection and let Solr put
> the replicas where
> it will, then use MOVEREPLICA to position replicas as you want.
>
> The node placement rules are primarily intended for automated or very large
> setups. Manually placing replicas is simpler for limited numbers.
>
> Best,
> Erick
> On Sun, Aug 26, 2018 at 8:10 PM Wei  wrote:
> >
> > Thanks Shawn. When using multiple Solr instances per host, is there any
> way
> > to prevent solrcloud from putting multiple replicas of the same shard on
> > same host?
> > I see it makes sense if we can splitting into multiple instances with
> > smaller heap size. Besides that, do you think multiple instances will be
> > able to get better CPU utilization on multi-core server?
> >
> > Thanks,
> > Wei
> >
> > On Sun, Aug 26, 2018 at 4:37 AM Shawn Heisey 
> wrote:
> >
> > > On 8/26/2018 12:00 AM, Wei wrote:
> > > > I have a question about the deployment configuration in solr cloud.
> When
> > > > we need to increase the number of shards in solr cloud, there are two
> > > > options:
> > > >
> > > > 1.  Run multiple solr instances per host, each with a different port
> and
> > > > hosting a single core for one shard.
> > > >
> > > > 2.  Run one solr instance per host, and have multiple cores(shards)
> in
> > > the
> > > > same solr instance.
> > > >
> > > > Which would be better performance wise? For the first option I think
> JVM
> > > > size for each solr instance can be smaller, but deployment is more
> > > > complicated? Are there any differences for cpu utilization?
> > >
> > > My general advice is to only have one Solr instance per machine.  One
> > > Solr instance can handle many indexes, and usually will do so with less
> > > overhead than two or more instances.
> > >
> > > I can think of *ONE* exception to this -- when a single Solr instance
> > > would require a heap that's extremely large. Splitting that into two or
> > > more instances MIGHT greatly reduce garbage collection pauses.  But
> > > there's a caveat to the caveat -- in my strong opinion, if your Solr
> > > instance is so big that it requires a huge heap and you're considering
> > > splitting into multiple Solr instances on one machine, you very likely
> > > need to run each of those instances on *separate* machines, so that
> each
> > > one can have access to all the resources of the machine it's running
> on.
> > >
> > > For SolrCloud, when you're running multiple instances per machine, Solr
> > > will consider those to be completely separate instances, and you may
> end
> > > up with all of the replicas for a shard on a single machine, which is a
> > > problem for high availability.
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
>
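
Erick's second alternative above (the special EMPTY createNodeSet plus explicit ADDREPLICA) can be sketched with the Collections API; the hostnames, ports, and names below are placeholders:

```shell
# 1. Create the collection with no replicas placed anywhere:
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=mycoll&numShards=2&createNodeSet=EMPTY&collection.configName=myconf"

# 2. Then place each replica on exactly the node you want:
curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1&node=host1:8983_solr"
curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard2&node=host2:8983_solr"
```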


Need to connect solr with solrj from AWS lambda

2018-08-31 Thread nalsrini
Hi,
I need to connect to Solr with SolrJ from an AWS Java Lambda. I use Solr 5.3.

I get the client object like this:
SolrClient client = new
HttpSolrClient(System.getenv(SysEnvConstants.SOLR_HOST));

I neither get an error nor a response when I call these (for example) from
the lambda:

SolrDocument sorld = client.getById(id);

OR

UpdateResponse ur = client.deleteByQuery(sb.toString());

thanks
Srini
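
Two things worth ruling out in a Lambda are swallowed exceptions and default socket timeouts longer than the Lambda's own timeout. A hedged sketch against the SolrJ 5.x API (the env var name, document id, and timeout values are assumptions; this needs the solrj jar and a reachable Solr to run):

```java
import java.io.IOException;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class SolrLookup {
    public static void main(String[] args) {
        HttpSolrClient client = new HttpSolrClient(System.getenv("SOLR_HOST"));
        // Fail fast instead of hanging past the Lambda timeout.
        client.setConnectionTimeout(5000); // ms to establish the TCP connection
        client.setSoTimeout(10000);        // ms to wait for a response
        try {
            System.out.println(client.getById("some-id"));
        } catch (SolrServerException | IOException e) {
            // In a Lambda, log explicitly -- an unlogged exception looks like "no response".
            e.printStackTrace();
        } finally {
            try { client.close(); } catch (IOException ignored) { }
        }
    }
}
```

If the calls still return nothing with no exception, check that the Lambda's VPC/security-group setup can actually reach the Solr host.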



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Unknown field "cache"

2018-08-31 Thread kunhu0...@gmail.com
Hello Team,

Need suggestions on Solr Indexing. We are using Solr-6.6.3 and Nutch 1.14. 

I see an unknown field 'cache' error while indexing the data to Solr, so I
added the entry below in the field section of schema.xml for Solr:
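
(The entry itself appears to have been stripped by the archive; a typical definition for these Nutch fields might look like the sketch below -- the types are guesses, so check what your Nutch index writer actually sends.)

```
<field name="cache" type="string" stored="true" indexed="false"/>
<field name="date" type="tdate" stored="true" indexed="true"/>
```

Adding fields one by one as errors appear works, but it may be quicker to diff the full Nutch 1.14 example schema.xml against yours.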



Tried indexing the data again and this time the error is unknown field 'date'.
However I have the 

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Solr unknown field 'metatag.description'

2018-08-31 Thread kunhu0...@gmail.com
Team,

Need suggestions on a Solr indexing error: unknown field 'metatag.description'.

We are using Nutch 1.14 and Solr 6.6.3.

The plugin.includes from nutch-site.xml is below:

protocol-httpclient|urlfilter-regex|index-(basic|more)|query-(basic|site|url|lang)|indexer-solr|nutch-extensionpoints|parse-(html|tika|text|msexcel|msword|mspowerpoint|pdf)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|parse-(html|tika|metatags|msword|msexcel|pdf)|index-(basic|anchor|more|metadata)|language-identifier


Should I add anything under schema.xml and managed-schema for Solr? Please
help.
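
The usual fix for this class of error is to declare the missing field in the schema; a sketch (the type here is an assumption):

```
<field name="metatag.description" type="text_general" stored="true" indexed="true"/>
```

If memory serves, with the parse-metatags/index-metadata plugins you generally also need the metatags.names and index.parse.md properties set in nutch-site.xml so the metatag actually reaches the indexer.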




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


BUMP: Atomic updates and POST command?

2018-08-31 Thread Scott Prentice

Just bumping this post from a few days ago.

Is anyone using atomic updates? If so, how are you passing the updates 
to Solr? I'm seeing a significant difference between the REST API and 
the post command .. is this to be expected? What's the recommended 
method for doing the update?


Thanks!
...scott


On 8/29/18 3:02 PM, Scott Prentice wrote:

Hi...

I'm trying to get atomic updates working and am seeing some 
strangeness. Here's my JSON with the data to update ..


[{"id":"/unique/path/id",
  "field1":{"set":"newvalue1"},
  "field2":{"set":"newvalue2"}
}]

If I use the REST API via curl it works fine. With the following 
command, the field1 and field2 fields get the new values, and all's well.


curl 'http://localhost:8983/solr/core01/update/json?commit=true' 
--data-binary @test1.json -H 'Content-type:application/json'


BUT, if I use the post command ..

./bin/post -c core01 /home/xtech/solrtest/test1.json

.. the record gets updated with new fields named "field1.set" and 
"field2.set", and the managed-schema file is modified to include these 
new field definitions. Not at all what I'd expect or want. Is there 
some setting or switch that will let the post command work "properly", 
or am I misunderstanding what's correct? I can use curl, but our 
current workflow uses the post command so I thought that might do the 
job.


Any thoughts are welcome!

Thanks,
...scott









Re: BUMP: Atomic updates and POST command?

2018-08-31 Thread Alexandre Rafalovitch
I think you are using different endpoints there: /update by default vs
/update/json.

So I think the post gets treated as generic JSON parsing.

Can you try the same endpoint?

Regards,
 Alex


On Fri, Aug 31, 2018, 7:05 PM Scott Prentice,  wrote:

> Just bumping this post from a few days ago.
>
> Is anyone using atomic updates? If so, how are you passing the updates
> to Solr? I'm seeing a significant difference between the REST API and
> the post command .. is this to be expected? What's the recommended
> method for doing the update?
>
> Thanks!
> ...scott
>
>


Re: BUMP: Atomic updates and POST command?

2018-08-31 Thread Scott Prentice
Hmm. That makes sense .. but where do you provide the endpoint to post?
Is that an additional command within the JSON or a parameter at the
command line?


Thanks,
...scott


On 8/31/18 4:48 PM, Alexandre Rafalovitch wrote:

I think you are using different end points there. /update by default vs
/update/json

So i think the post gets treated as generic json parsing.

Can you try the same end point?

Regards,
  Alex








Re: BUMP: Atomic updates and POST command?

2018-08-31 Thread Scott Prentice

Ah .. is this done with the -url parameter? As in ..

./bin/post -url http://localhost:8983/solr/core01/update/json 
/home/xtech/solrtest/test1.json


Will test.

Thanks,
...scott


On 8/31/18 5:15 PM, Scott Prentice wrote:
Hmm. That makes sense .. but where do you provide the endpoint to 
post? Is that additional commands within the JSON or a parameter at 
the command line?


Thanks,
...scott











Re: BUMP: Atomic updates and POST command?

2018-08-31 Thread Scott Prentice

Nope. That's not it. It complains about this path not being found ..

    /solr/core01/update/json/json/docs

So, I changed the -url value to this 
"http://localhost:8983/solr/core01/update"; .. which was "successful", 
but created the same odd index structure of "field.set".


I'm clearly flailing. If you have any thoughts on this, do let me know.

Thanks!
...scott


On 8/31/18 5:20 PM, Scott Prentice wrote:

Ah .. is this done with the -url parameter? As in ..

./bin/post -url http://localhost:8983/solr/core01/update/json 
/home/xtech/solrtest/test1.json


Will test.

Thanks,
...scott













Re: BUMP: Atomic updates and POST command?

2018-08-31 Thread Alexandre Rafalovitch
Ok,

Try "-format solr" instead of "-url ...".

Regards,
   Alex.

On 31 August 2018 at 20:54, Scott Prentice  wrote:
> Nope. That's not it. It complains about this path not being found ..
>
> /solr/core01/update/json/json/docs
>
> So, I changed the -url value to this
> "http://localhost:8983/solr/core01/update"; .. which was "successful", but
> created the same odd index structure of "field.set".
>
> I'm clearly flailing. If you have any thoughts on this, do let me know.
>
> Thanks!
> ...scott


Re: BUMP: Atomic updates and POST command?

2018-08-31 Thread Scott Prentice

Yup. That does the trick! Here's my command line ..

    $ ./bin/post -c core01 -format solr /home/xtech/solrtest/test1b.json

I saw the "-format solr" option, but it wasn't clear what it did. It's
still not clear to me how that changes the endpoint to allow for
updates. But nice to see that it works!
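
For the archives, the difference seems to be the endpoint bin/post targets: by default a .json file goes to /update/json/docs, which treats the payload as arbitrary JSON and flattens the nested {"set": ...} maps into literal field names like field1.set, while -format solr sends it to /update as Solr-native update syntax, so the atomic operations are honored. Roughly equivalent curl calls (paths per the 6.x API):

```shell
# Default bin/post mapping mode -- atomic-update maps become literal fields:
curl 'http://localhost:8983/solr/core01/update/json/docs?commit=true' \
  --data-binary @test1.json -H 'Content-type:application/json'

# What "-format solr" effectively does -- native update syntax:
curl 'http://localhost:8983/solr/core01/update?commit=true' \
  --data-binary @test1.json -H 'Content-type:application/json'
```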


Thanks for your help!
...scott


On 8/31/18 6:04 PM, Alexandre Rafalovitch wrote:

Ok,

Try "-format solr" instead of "-url ...".

Regards,
Alex.



Re: ZooKeeper issues with AWS

2018-08-31 Thread Erick Erickson
Jack:

Yeah, I understood that you were only killing one ZK at a time.

I think Walter and Shawn are pointing you in the right direction.


Sorting by custom order

2018-08-31 Thread Salvo Bonanno
Hello

I need to sort a result set in a particular way... the documents
look like this:

{
"customer_id":28998,
"name_txt":["Equal Corp"],
"address_txt":["Austin Ring Center"],
"municipality_txt":["Austin"],
"province_txt":["Austin"],
"region_txt":["TX"],
"profile_txt":["Base"],
"visibility_weight_txt":["2"]
},
{
"customer_id":28997,
"name_txt":["Mustard Ltd"],
"address_txt":["Telegraph Road"],
"municipality_txt":["London"],
"province_txt":["London"],
"region_txt":["UK"],
"profile_txt":["Gold"],
"visibility_weight_txt":["2"]
}

I need to sort them by the profile_txt value (it's a multiValued field, but
in practice it contains just a single value). Since there are only 5
possible values, I'd like to map them to ranks to decide in which order
the documents should come.

The order should follow this simple scheme:

1. profile_txt = "Gold"
2. profile_txt = "Super"
3. profile_txt = "Smart"
4. profile_txt = "Base"
5. profile_txt = "Essential"

Then an additional sort by visibility_weight_txt should be done.

Is it possible to do this somehow using function queries? Unfortunately
I can't modify the schema.
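
One hedged possibility, untested: if the profile values survive your field's analysis as single indexed terms (mind lowercasing), a termfreq-based function sort might work without schema changes. Note that Solr cannot sort directly on a truly multiValued field, so the secondary sort below assumes visibility_weight_txt is effectively single-valued and sortable:

```
sort=if(termfreq(profile_txt,'Gold'),1,
     if(termfreq(profile_txt,'Super'),2,
     if(termfreq(profile_txt,'Smart'),3,
     if(termfreq(profile_txt,'Base'),4,5)))) asc,
     visibility_weight_txt asc
```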

Thanks for reading