Re: Deploying multiple ZooKeeper ensemble on a single machine

2015-04-08 Thread Zheng Lin Edwin Yeo
Thank you nutchsolruser and Shawn.

I've changed the clientPort to a different port for each instance.
That works in another setup of mine, in which I have 3 separate
zookeeper folders, each with its own configuration, all named
zoo.cfg. For that setup I can start the 3 servers individually using
zkServer.cmd.

However, when I use zkServer.cmd in the setup I posted earlier, only
the first server manages to start up, and I see the same error
message for the other 2 servers. Some documents say to use the
following commands, but that doesn't help either, since I'm supposed to use
zkServer.cmd:
zkServer.cmd start zoo.cfg
zkServer.cmd start zoo2.cfg
zkServer.cmd start zoo3.cfg


Regards,
Edwin



On 8 April 2015 at 13:46, Shawn Heisey  wrote:

> On 4/7/2015 9:16 PM, Zheng Lin Edwin Yeo wrote:
> > I'm using SolrCloud 5.0.0 and ZooKeeper 3.4.6 running on Windows, and now
> > I'm trying to deploy a multiple ZooKeeper ensemble (3 servers) on a
> single
> > machine. These are the settings which I have configured, according to the
> > Solr Reference Guide.
> >
> > These files are under \conf\ directory
> > (C:\Users\edwin\zookeeper-3.4.6\conf)
> >
> > *zoo.cfg*
> > tickTime=2000
> > initLimit=10
> > syncLimit=5
> > dataDir=C:\\Users\\edwin\\zookeeper-3.4.6\\1
> > clientPort=2181
> > server.1=localhost:2888:3888
> > server.2=localhost:2889:3889
> > server.3=localhost:2890:3890
>
> 
>
> >  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot
> > open channel to 2 at election address localhost/127.0.0.1:3889
> > java.net.ConnectException: Connection refused: connect
>
> The first thing I would suspect when running any network program on a
> Windows machine that won't communicate is the Windows firewall, unless
> you have either turned off the firewall or you have explicitly
> configured an exception in the firewall for the relevant ports.
>
> Your other reply that you got from nutchsolruser does point out that all
> three zookeeper configs are using 2181 as the clientPort.  Because these
> are all running on the same machine, you must use a different port for
> each one.  I'm not sure what happens to subsequent processes after the
> first one starts, but they won't work even if they do manage to start.
>
> Thanks,
> Shawn
>
>


Re: What is the best way of Indexing different formats of documents?

2015-04-08 Thread sangs8788
I just want to index only certain documents, and there will not be any updates
happening on the indexed documents.

In our existing system we already have DIH implemented, which indexes
documents from SQL Server (as you said, based on last index time). In that
case the metadata is available in the database. But if we are streaming
via URL, we would need to append the metadata too. Correct me if I am
wrong.

Is /update/extract the ExtractingRequestHandler you are talking about? Is
that where we post the metadata in the URL to the Solr server? And when you
say manual operation, what is it you are talking about?

Can you please clarify.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/What-is-the-best-way-of-Indexing-different-formats-of-documents-tp4198053p4198288.html
Sent from the Solr - User mailing list archive at Nabble.com.


search on special characters

2015-04-08 Thread avinash09
I am not able to search on special characters like ".", "," and "_".

my query
http://localhost:8983/solr/rna/select?q=name:"UAE
B"&wt=json&fl=name&rows=100

getting result UAE_bhdgsfsdbj

but for
http://localhost:8983/solr/rna/select?q=name:"UAE_"&wt=json&fl=name&rows=100

no result found

I am using the field type below (markup partially stripped by the mailing
list archive; elided attributes are marked "..."):

<fieldType ...>
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter .../>
    <filter ... minGramSize="1"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^\w\d\*æøåÆØÅ ])" replacement="" replace="all"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter .../>
  </analyzer>
</fieldType>



--
View this message in context: 
http://lucene.472066.n3.nabble.com/search-on-special-characters-tp4198286.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: distributed search on tables

2015-04-08 Thread avinash09
thanks Erick



--
View this message in context: 
http://lucene.472066.n3.nabble.com/distributed-search-on-tables-tp4197456p4198285.html
Sent from the Solr - User mailing list archive at Nabble.com.


solrconfig.xml error

2015-04-08 Thread Pradeep
We have installed solr-4.3.0 on our local machine, but we are getting an error.
Please find the attachment, and help us fix this error.


Thank You.
Regards,
Pradeep


Re: solrconfig.xml error

2015-04-08 Thread Andrea Gazzarini

Hi Pradeep,
AFAIK the mailing list doesn't allow attachments. I think pasting the 
error should be enough


Best,
Andrea

On 04/08/2015 09:02 AM, Pradeep wrote:
We have installed solr-4.3.0 is our local but we are getting error. 
Please find attachment. And help us to fix this error.


Thank You.
Regards,
Pradeep




RE: What is the best way of Indexing different formats of documents?

2015-04-08 Thread sangeetha.subraman...@gtnexus.com
Hi Swaraj,



Thanks for the answers.

From my understanding we can index:

·   Using DIH from a DB

·   Using DIH from the filesystem - this is where I am concentrating.

o   For this we can use SolrJ with Tika (Solr Cell) from the Java layer in order to
extract the content and send the data through the REST API to the Solr server

o   Or we can use the ExtractingRequestHandler to do the job.



I just want to index only certain documents, and there will not be any updates
happening on the indexed documents.



In our existing system we already have DIH implemented, which indexes documents
from SQL Server (as you said, based on last index time). In this case the
metadata is available in the database.



But if we are streaming via URL, we would need to append the metadata too.
Correct me if I am wrong. And how does the indexing happen here - based on
last index time or something else? Also, for the ExtractingRequestHandler, when
you say manual operation, what is it you are talking about? Can you please clarify.



Thanks

Sangeetha



-Original Message-
From: Swaraj Kumar [mailto:swaraj2...@gmail.com]
Sent: 07 April 2015 18:02
To: solr-user@lucene.apache.org
Subject: Re: What is the best way of Indexing different formats of documents?



You can always choose either DIH or /update/extract to index docs in solr.

Now there are multiple benefits of DIH which I am listing below :-



1. Clean and update using a single command.

2. DIH also optimizes indexing using optimize=true.

3. You can do delta-import based on last index time, whereas in the case of
/update/extract you need to do a manual operation for delta imports.

4. You can use multiple entity processors and transformers with DIH, which
is very useful for indexing exactly the data you want.

5. The query parameter "rows" limits the number of records.
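
As a rough illustration of point 3, a delta-import is driven by a
data-config.xml along these lines (a minimal sketch; the table, column,
and field names are placeholders, not from this thread):

<dataConfig>
  <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost;databaseName=docs"
              user="solr" password="..."/>
  <document>
    <!-- query: full import; deltaQuery/deltaImportQuery: incremental import
         using the last_index_time DIH keeps in dataimport.properties -->
    <entity name="doc"
            query="SELECT id, title, body FROM documents"
            deltaQuery="SELECT id FROM documents
                        WHERE updated &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="SELECT id, title, body FROM documents
                              WHERE id='${dih.delta.id}'"/>
  </document>
</dataConfig>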



Regards,





Swaraj Kumar

Senior Software Engineer I

MakeMyTrip.com

Mob No- 9811774497



On Tue, Apr 7, 2015 at 4:18 PM, 
sangeetha.subraman...@gtnexus.com < 
sangeetha.subraman...@gtnexus.com> 
wrote:



> Hi,

>

> I am a newbie to SOLR and basically from database background. We have

> a requirement of indexing files of different formats (x12,edifact, csv,xml).

> The files which are inputted can be of any format and we need to do a

> content based search on it.

>

> From the web I understand we can use TIKA processor to extract the

> content and store it in SOLR. What I want to know is, is there any

> better approach for indexing files in SOLR ? Can we index the document

> through streaming directly from the Application ? If so what is the

> disadvantage of using it (against DIH which fetches from the

> database)? Could someone share me some insight on this ? ls there any

> web links which I can refer to get some idea on it ? Please do help.

>

> Thanks

> Sangeetha

>

>


Re: Deploying multiple ZooKeeper ensemble on a single machine

2015-04-08 Thread Swaraj Kumar
Hi Zheng,

I am not sure whether the command *"zkServer.cmd start zoo.cfg"* works on
Windows or not, but zkServer.cmd calls zkEnv.cmd, where "
*ZOOCFG=%ZOOCFGDIR%\zoo.cfg*" is set. So, if you want to run multiple
instances of zookeeper, point that setting at your own config file and start
zookeeper.
On Windows the command does not take any "start" argument.
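
For reference, the relevant lines in zkEnv.cmd look like this (from ZooKeeper
3.4.6; zkServer.cmd reads ZOOCFG to locate its configuration):

set ZOOCFGDIR=%~dp0%..\conf
set ZOOCFG=%ZOOCFGDIR%\zoo.cfg

So one way to run multiple instances on one machine (a sketch, not verified
on Windows) is to keep a separate copy of the installation per instance, so
that each copy's ZOOCFG resolves to that instance's own zoo.cfg.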



Regards,


Swaraj Kumar
Senior Software Engineer I
MakeMyTrip.com
Mob No- 9811774497

On Wed, Apr 8, 2015 at 12:29 PM, Zheng Lin Edwin Yeo 
wrote:

> Thank you nutchsolruser and Shawn.
>
> I've changed the clientPort to different port for each of the machine.
> It is able to work for my another setup, in which I have 3 different
> zookeeper folder, and each has its own configuration and all are using
> zoo.cfg. For that setup I can start the 3 servers individually using
> zkServer.cmd.
>
> However, when I use zkServer.cmd in the setup which I posted earlier, only
> the first server managed to get started up, and I see the same error
> message for the other 2 servers. Although some documents says use the
> following commands, it doesn't help too, since I'm supposed to use the
> zkServer.cmd
> zkServer.cmd start zoo.cfg
> zkServer.cmd start zoo2.cfg
> zkServer.cmd start zoo3.cfg
>
>
> Regards,
> Edwin
>
>
>
> On 8 April 2015 at 13:46, Shawn Heisey  wrote:
>
> > On 4/7/2015 9:16 PM, Zheng Lin Edwin Yeo wrote:
> > > I'm using SolrCloud 5.0.0 and ZooKeeper 3.4.6 running on Windows, and
> now
> > > I'm trying to deploy a multiple ZooKeeper ensemble (3 servers) on a
> > single
> > > machine. These are the settings which I have configured, according to
> the
> > > Solr Reference Guide.
> > >
> > > These files are under \conf\ directory
> > > (C:\Users\edwin\zookeeper-3.4.6\conf)
> > >
> > > *zoo.cfg*
> > > tickTime=2000
> > > initLimit=10
> > > syncLimit=5
> > > dataDir=C:\\Users\\edwin\\zookeeper-3.4.6\\1
> > > clientPort=2181
> > > server.1=localhost:2888:3888
> > > server.2=localhost:2889:3889
> > > server.3=localhost:2890:3890
> >
> > 
> >
> > >  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] -
> Cannot
> > > open channel to 2 at election address localhost/127.0.0.1:3889
> > > java.net.ConnectException: Connection refused: connect
> >
> > The first thing I would suspect when running any network program on a
> > Windows machine that won't communicate is the Windows firewall, unless
> > you have either turned off the firewall or you have explicitly
> > configured an exception in the firewall for the relevant ports.
> >
> > Your other reply that you got from nutchsolruser does point out that all
> > three zookeeper configs are using 2181 as the clientPort.  Because these
> > are all running on the same machine, you must use a different port for
> > each one.  I'm not sure what happens to subsequent processes after the
> > first one starts, but they won't work even if they do manage to start.
> >
> > Thanks,
> > Shawn
> >
> >
>


Re: What is the best way of Indexing different formats of documents?

2015-04-08 Thread Swaraj Kumar
Hi Sangeetha,

/update/extract refers to the ExtractingRequestHandler.

If you only want to index the data, you can do it with the
ExtractingRequestHandler. I don't think it requires metadata, but you need to
provide literal.id to populate the unique id field.

For more information :-
https://wiki.apache.org/solr/ExtractingRequestHandler
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika
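
For example, a document can be posted to the handler with curl like this (a
sketch based on the wiki pages above; the id and file name are placeholders):

curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" \
  -F "myfile=@mydocument.pdf"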



Regards,


Swaraj Kumar
Senior Software Engineer I
MakeMyTrip.com
Mob No- 9811774497

On Wed, Apr 8, 2015 at 2:20 PM, sangeetha.subraman...@gtnexus.com <
sangeetha.subraman...@gtnexus.com> wrote:

> Hi Swaraj,
>
>
>
> Thanks for the answers.
>
> From my understanding We can index,
>
> ·   Using DIH from db
>
> ·   Using DIH from filesystem - this is where I am concentrating on.
>
> o   For this we can use SolrJ with Tika(solr cell) from Java layer in
> order to extract the content and send the data through REST API to
> solrserver
>
> o   Or we can use extractrequesthandler to do the job.
>
>
>
> I just want to index only certain documents and there will not be any
> update happening on the indexed document.
>
>
>
> In our existing system we already have DIH implemented which indexes
> document from sql server (As you said based on last index time). In this
> case the metadata is there available in database.
>
>
>
> But if we are streaming via url, we would need to append the metadata too.
> correct me if i am wrong. And how does the indexing happening here based on
> last index time or something else ? Also for  extractrequesthandler when
> you say manual operation what is it you are talking about ? Can you please
> clarify.
>
>
>
> Thanks
>
> Sangeetha
>
>
>
> -Original Message-
> From: Swaraj Kumar [mailto:swaraj2...@gmail.com]
> Sent: 07 April 2015 18:02
> To: solr-user@lucene.apache.org
> Subject: Re: What is the best way of Indexing different formats of
> documents?
>
>
>
> You can always choose either DIH or /update/extract to index docs in solr.
>
> Now there are multiple benefits of DIH which I am listing below :-
>
>
>
> 1. Clean and update using a single command.
>
> 2. DIH also optimize indexing using optimize=true 3. You can do
> delta-import based on last index time where as in case of /update/extract
> you need to do manual operation in case of delta import.
>
> 4. You can use multiple entity processor and transformers in case of DIH
> which is very useful to index exact data you want.
>
> 5. Query parameter "rows" limits the num of records.
>
>
>
> Regards,
>
>
>
>
>
> Swaraj Kumar
>
> Senior Software Engineer I
>
> MakeMyTrip.com
>
> Mob No- 9811774497
>
>
>
> On Tue, Apr 7, 2015 at 4:18 PM, sangeetha.subraman...@gtnexus.com <
> sangeetha.subraman...@gtnexus.com> wrote:
>
>
>
> > Hi,
>
> >
>
> > I am a newbie to SOLR and basically from database background. We have
>
> > a requirement of indexing files of different formats (x12,edifact,
> csv,xml).
>
> > The files which are inputted can be of any format and we need to do a
>
> > content based search on it.
>
> >
>
> > From the web I understand we can use TIKA processor to extract the
>
> > content and store it in SOLR. What I want to know is, is there any
>
> > better approach for indexing files in SOLR ? Can we index the document
>
> > through streaming directly from the Application ? If so what is the
>
> > disadvantage of using it (against DIH which fetches from the
>
> > database)? Could someone share me some insight on this ? ls there any
>
> > web links which I can refer to get some idea on it ? Please do help.
>
> >
>
> > Thanks
>
> > Sangeetha
>
> >
>
> >
>


Re: Deploying multiple ZooKeeper ensemble on a single machine

2015-04-08 Thread Jürgen Wagner (DVT)
To be precise: create one zoo.cfg for each of the instances. One config
file for all is a bad idea.

In each config file, use the same server.X lines, but use a unique
clientPort.

As you will also have separate data directories, I would recommend
having one root directory .../zookeeper where you create subdirectories
for each instance. In each of these subdirectories, you may have your
zoo.cfg. To start a zookeeper instance, simply have ZOOCFGDIR point to
the proper relative path, change to the respective directory and start
zookeeper.
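
Concretely, the layout could look like this (a sketch; the client ports are
examples, and the server lines come from the config earlier in this thread):

.../zookeeper/zookeeper1/zoo.cfg   (clientPort=2181, dataDir under zookeeper1)
.../zookeeper/zookeeper2/zoo.cfg   (clientPort=2182, dataDir under zookeeper2)
.../zookeeper/zookeeper3/zoo.cfg   (clientPort=2183, dataDir under zookeeper3)

with all three zoo.cfg files sharing the same server lines:

server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890

and each instance's dataDir containing a myid file that holds just its server
number (1, 2 or 3).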

Best regards,
--Jürgen

On 08.04.2015 11:22, Swaraj Kumar wrote:
> Hi Zheng,
>
> I am not sure if this command *"zkServer.cmd start zoo.cfg" * works in
> windows or not, but in zkServer.cmd it calls zkEnv.cmd where "
> *ZOOCFG=%ZOOCFGDIR%\zoo.cfg*" is set. So, if you want to run multiple
> instances of zookeeper, change zoo.cfg to your config file and start
> zookeeper.
> The command will not include any start.
>
>
>
> Regards,
>
>
> Swaraj Kumar
> Senior Software Engineer I
> MakeMyTrip.com
> Mob No- 9811774497
>
> On Wed, Apr 8, 2015 at 12:29 PM, Zheng Lin Edwin Yeo 
> wrote:
>
>

-- 

Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С
уважением
*i.A. Jürgen Wagner*
Head of Competence Center "Intelligence"
& Senior Cloud Consultant

Devoteam GmbH, Industriestr. 3, 70565 Stuttgart, Germany
Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864 1543
E-Mail: juergen.wag...@devoteam.com
, URL: www.devoteam.de



Managing Board: Jürgen Hatzipantelis (CEO)
Address of Record: 64331 Weiterstadt, Germany; Commercial Register:
Amtsgericht Darmstadt HRB 6450; Tax Number: DE 172 993 071




Permission Denied Error

2015-04-08 Thread Vijaya Narayana Reddy Bhoomi Reddy
Hi,

I am trying to setup a SolrCloud cluster on top of Hadoop (HDP). The
upconfig and linkconfig commands were run successfully and the
configuration is now centrally manged in Zookeeper.

However, when I run the command to create a core, I am shown the following
permission denied error. This is related to HDFS, but I am not sure how to
get rid of it, as nowhere am I specifying user details such as running the
REST command as the root user. I tried running the command logged in as the
'hdfs' user, but without any luck. I also added two HDFS impersonation
configuration parameters to core-site.xml on the Hadoop side to see if that
would resolve the issue, but the issue still prevails. The two parameters
added were *hadoop.proxyusers.hdfs.groups* and *hadoop.proxyusers.hdfs.hosts*,
both set to a value of *.
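
For reference, those additions would look like this in core-site.xml (a
sketch of what is described above; the property names are as stated in this
message):

<property>
  <name>hadoop.proxyusers.hdfs.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyusers.hdfs.hosts</name>
  <value>*</value>
</property>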

Command being run is curl '
http://datanode1.hdp.lgcgroup.com:8080/solr/admin/cores?action=CREATE&name=shard1-replica-1&collection=mycollection&shard=shard1
'

FYI, inode="/user" is the 'user' directory under HDFS under which Solr is
going to store its data.

Error Message is

Error CREATEing SolrCore 'shard1-replica-1': Unable to create core
[shard1-replica-1] Caused by: Permission denied: user=root, access=WRITE,
inode="/user":hdfs:hdfs:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.
checkFsPermission(FSPermissionChecker.java:271)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.
check(FSPermissionChecker.java:257)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.
check(FSPermissionChecker.java:238)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.
checkPermission(FSPermissionChecker.java:179)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.
checkPermission(FSNamesystem.java:6515)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.
checkPermission(FSNamesystem.java:6497)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.
checkAncestorAccess(FSNamesystem.java:6449)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.
mkdirsInternal(FSNamesystem.java:4251)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.
mkdirsInt(FSNamesystem.java:4221)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(
FSNamesystem.java:4194)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.
mkdirs(NameNodeRpcServer.java:813)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSi
deTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$
ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.
java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$
ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(
UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

I am attaching both solr.xml and solrconfig.xml files as well here for
reference.

Thanks & Regards
Vijay


*Vijay Bhoomireddy*, Big Data Architect

1000 Great West Road, Brentford, London, TW8 9DW
*T:  +44 20 3475 7980*
*M: **+44 7481 298 360*
*W: *www.whishworks.com



  


-- 
The contents of this e-mail are confidential and for the exclusive use of 
the intended recipient. If you receive this e-mail in error please delete 
it from your system immediately and notify us either by e-mail or 
telephone. You should not copy, forward or otherwise disclose the content 
of the e-mail. The views expressed in this communication may not 
necessarily be the view held by WHISHWORKS.





  

  
[The attached solr.xml and solrconfig.xml lost most of their markup in the
archive; the recoverable solrconfig.xml settings are reconstructed below.]

<config>
  <luceneMatchVersion>4.10.2</luceneMatchVersion>

  <dataDir>${solr.data.dir:}</dataDir>

  <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
    <str name="solr.hdfs.home">hdfs://ambariserver.hdp.lgcgroup.com:8020/user/solr</str>
    <bool name="solr.hdfs.blockcache.enabled">true</bool>
    <int name="solr.hdfs.blockcache.slab.count">1</int>
    <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
    <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
    <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
    <bool name="solr.hdfs.blockcache.write.enabled">true</bool>
    <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
    <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
    <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>
  </directoryFactory>

  <indexConfig>
    <lockType>hdfs</lockType>
  </indexConfig>

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
    </autoSoftCommit>
  </updateHandler>

  <query>
    <maxBooleanClauses>1024</maxBooleanClauses>
  </query>
</config>

Ignoring metatags in solr

2015-04-08 Thread Anchit Jain

I have crawled a website using nutch.
When I try to index it with solr I get the following error:
org.apache.solr.common.SolrException: ERROR: [doc=http://xyz.htm]
unknown field 'metatag.keywords'

*unknown field 'metatag.keywords'*

I cannot figure out where the error is, as I have not defined any
field in schema.xml for metatags. I just copied the schema.xml from nutch
into solr.

I am using Nutch 1.9 with Solr 4.10

My *schema.xml* for *solr* (most of the markup was stripped by the archive;
it is the schema.xml shipped with Nutch, of which these fragments survive):

<fieldType name="string" class="solr.StrField" sortMissingLast="true"
    omitNorms="true"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0"
    omitNorms="true" positionIncrementGap="0"/>

[... the field definitions themselves were stripped; at least one field is
declared multiValued="true" ...]

<uniqueKey>id</uniqueKey>
<defaultSearchField>content</defaultSearchField>

my *solrindex-mapping.xml* (markup stripped by the archive; the surviving
fragment is the unique-key mapping):

<uniqueKey>id</uniqueKey>


Re: search on special characters

2015-04-08 Thread Jack Krupansky
Text search means searching of text, and special characters are not... text.

Why are you using the standard tokenizer if you are not trying to search
for standard text?

Try using the white space tokenizer, which will preserve special characters.

That said, the word delimiter filter will remove this punctuation. You can
specify a character type map to treat specific characters as letters. See
the doc. (or the examples in my e-book.)
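
For example (a sketch; "wdftypes.txt" is an assumed file name, not from this
thread), the word delimiter filter takes a "types" attribute pointing at a
file that reclassifies individual characters:

<filter class="solr.WordDelimiterFilterFactory" types="wdftypes.txt"
        generateWordParts="1" generateNumberParts="1"/>

where wdftypes.txt contains lines such as:

_ => ALPHA
. => ALPHA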




-- Jack Krupansky

On Wed, Apr 8, 2015 at 2:50 AM, avinash09  wrote:

> not able to search on special characters like . ,_
>
> my query
> http://localhost:8983/solr/rna/select?q=name:"UAE
> B"&wt=json&fl=name&rows=100
>
> getting result UAE_bhdgsfsdbj
>
> but for
> http://localhost:8983/solr/rna/select?q=name
> :"UAE_"&wt=json&fl=name&rows=100
>
> no result found
>
> I am using the field type below (markup partially stripped; elided
> attributes are marked "..."):
>
> <fieldType ...>
>   <analyzer type="index">
>     <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
>             catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>     <filter .../>
>     <filter ... minGramSize="1"/>
>     <filter class="solr.PatternReplaceFilterFactory" pattern="([^\w\d\*æøåÆØÅ ])" replacement="" replace="all"/>
>   </analyzer>
>   <analyzer type="query">
>     <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter .../>
>   </analyzer>
> </fieldType>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/search-on-special-characters-tp4198286.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


CloudSolrServer - Unknown type 19

2015-04-08 Thread Chaushu, Shani
I'm using Solr 4.4.
The query request works fine, but when I try to add a doc into Solr Cloud
(cloudSolrServer.request(updateRequest))
I get an error:

Exception in thread "main"
org.apache.solr.client.solrj.impl.CloudSolrServer$RouteException: Unknown type 19
at org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:360)
at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:533)

I think this is a problem between the versions; is there a way to make it work?

-
Intel Electronics Ltd.

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


Re: CloudSolrServer - Unknown type 19

2015-04-08 Thread Emre Sevinc
Hello Shani,

Are you using SolrJ? Did you try using the same version of SolrJ (e.g v.
4.4 of SolrJ, if you're using Solr 4.4)? That's what generally worked for
me.

Kind regards,

Emre Sevinç
http://www.bigindustries.be/


On Wed, Apr 8, 2015 at 1:46 PM, Chaushu, Shani 
wrote:

> i'm using solr 4.4.
> the query request works fine but when i try to add doc into solr cloud
> (cloudSolrServer.request(updateRequest))
>  i get an error:
>
> Exception in thread "main"
> org.apache.solr.client.solrj.impl.CloudSolrServer$RouteException: Unknown
> type 19
> at
> org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:360)
> at
> org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:533)
>
> i think this is problem between the versions, there is a way to make it
> work?
>
> -
> Intel Electronics Ltd.
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
>



-- 
Emre Sevinc


RE: CloudSolrServer - Unknown type 19

2015-04-08 Thread Chaushu, Shani
Hi,
I tried using an older version of SolrJ, but I'm using the solr-spark package and
it fails with compilation errors, probably because it uses functions from newer
versions...
I can't find any solution...

-Original Message-
From: Emre Sevinc [mailto:emre.sev...@gmail.com] 
Sent: Wednesday, April 08, 2015 14:56
To: solr-user@lucene.apache.org
Subject: Re: CloudSolrServer - Unknown type 19

Hello Shani,

Are you using SolrJ? Did you try using the same version of SolrJ (e.g v.
4.4 of SolrJ, if you're using Solr 4.4)? That's what generally worked for me.

Kind regards,

Emre Sevinç
http://www.bigindustries.be/


On Wed, Apr 8, 2015 at 1:46 PM, Chaushu, Shani 
wrote:

> i'm using solr 4.4.
> the query request works fine but when i try to add doc into solr cloud
> (cloudSolrServer.request(updateRequest))
>  i get an error:
>
> Exception in thread "main"
> org.apache.solr.client.solrj.impl.CloudSolrServer$RouteException: 
> Unknown type 19
> at
> org.apache.solr.client.solrj.impl.CloudSolrServer.directUpdate(CloudSolrServer.java:360)
> at
> org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrSer
> ver.java:533)
>
> i think this is problem between the versions, there is a way to make 
> it work?
>
> -
> Intel Electronics Ltd.
>
> This e-mail and any attachments may contain confidential material for 
> the sole use of the intended recipient(s). Any review or distribution 
> by others is strictly prohibited. If you are not the intended 
> recipient, please contact the sender and delete all copies.
>



--
Emre Sevinc
-
Intel Electronics Ltd.

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


Re: CloudSolrServer - Unknown type 19

2015-04-08 Thread Shawn Heisey
On 4/8/2015 6:30 AM, Chaushu, Shani wrote:
> I tried to get the SolrJ in older version, but I'm using solr-spark package 
> and it fails with compilation errors probably because it uses function from 
> newer versions...
> I can't find any solution... 

Looking at the github repo for spark-solr, I see no releases, so I
assume you're building it from source.  The pom.xml that's in the repo
currently uses SolrJ 4.10.3.  With the server running 4.4, that's a very
wide version spread.  SolrCloud often will not work properly with a wide
version difference between SolrJ and Solr.

It seems that you've tried to downgrade SolrJ in the client code without
success, which doesn't surprise me.  I think you will have no choice but
to upgrade Solr to the same version as SolrJ.  It should be safe to go
with 4.10.4 rather than 4.10.3 -- get the latest bugfixes.

Thanks,
Shawn



Keeping frequently changing fields out of SOLR

2015-04-08 Thread Achim Domma
Hi,

I have a core with about 20M documents, and the size on disk is about
50GB. It is running on a single EC2 instance. Once the core is warmed up,
everything runs fine. The problem is the following:

We assign categories (similar to tags) to documents. Those are stored in
a multivalued string field. After a commit, query times are
unacceptably slow.

Those categories are the only field that is ever changed, so I was
thinking about a way to keep the information outside SOLR. I had some
ideas, but my knowledge of SOLR internals would need some improvement to
implement them. Looking for other solutions, I stumbled upon this
comment in a JIRA issue:

https://issues.apache.org/jira/browse/LUCENE-4258?focusedCommentId=13423159&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13423159

The following words sound quite good to me:

"People could instead solve this by putting their apps primary key into
a docvalues field, allowing them to keep these scoring factors
completely external to lucene (e.g. their own array or whatever),
indexed by their own primary key. But the problem is I think people want
lucene to manage this, they don't want to implement themselves whats
necessary to make it consistent with commits etc."

Sounds like there is an obvious solution for keeping data outside SOLR
while making it somehow accessible via DocValues. But I have no idea
what kind of solution he is talking about.

Could somebody give me a starting point? I would need to filter on that
field and facet over it.

cheers,
Achim


Re: Keeping frequently changing fields out of SOLR

2015-04-08 Thread Jack Krupansky
How much RAM do you have? Check whether your system is compute-bound or
I/O-bound. If all or most of your index doesn't fit in the system memory
available for file caching, you're asking for trouble.

Is the indexing time also unacceptably slow, or just the query time?

-- Jack Krupansky

On Wed, Apr 8, 2015 at 9:03 AM, Achim Domma  wrote:

> Hi,
>
> I have a core with about 20M documents and the size on disc is about
> 50GB. It is running on a single EC2 instance. If the core is warmed up,
> everything is running fine. The problem is the following:
>
> We assign categories (similar to tags) to documents. Those are stored in
> a multivalue string field. After the commit, query times are
> unacceptable slow.
>
> Those categories are the only field that is every changed, so I was
> thinking about a way to keep the information outside SOLR. I had some
> ideas, but my knowledge of SOLR internals would need some improvement to
> implement them. Looking for other solutions, I stumbled about this
> comment in a JIRA issue:
>
>
> https://issues.apache.org/jira/browse/LUCENE-4258?focusedCommentId=13423159&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13423159
>
> The following words sound quite good to me:
>
> "People could instead solve this by putting their apps primary key into
> a docvalues field, allowing them to keep these scoring factors
> completely external to lucene (e.g. their own array or whatever),
> indexed by their own primary key. But the problem is I think people want
> lucene to manage this, they don't want to implement themselves whats
> necessary to make it consistent with commits etc."
>
> Sounds like there is an obvious solution, how to keep data outside SOLR,
> but make it somehow accessible via DocValues. But I have no idea about
> what kind of solution he is talking.
>
> Could somebody give me a starting point? I would need to filter on that
> field and facet over it.
>
> cheers,
> Achim
>


Re: Deploying multiple ZooKeeper ensemble on a single machine

2015-04-08 Thread Zheng Lin Edwin Yeo
Thank you Swaraj and Jurgen for the information.

I'll just stick with one zoo.cfg for each instance. I now have one root
directory .../zookeeper in which I created 3 subdirectories, one for each
instance (zookeeper1, zookeeper2 and zookeeper3), and each of them
has its own zoo.cfg.

Regards,
Edwin


On 8 April 2015 at 17:33, "Jürgen Wagner (DVT)"  wrote:

>  To be precise: create one zoo.cfg for each of the instances. One config
> file for all is a bad idea.
>
> In each config file, use the same server.X lines, but use a unique
> clientPort.
>
> As you will also have separate data directories, I would recommend having
> one root directory .../zookeeper where you create subdirectories for each
> instance. In each of these subdirectories, you may have your zoo.cfg. To
> start a zookeeper instance, simply have ZOOCFGDIR point to the proper
> relative path, change to the respective directory and start zookeeper.
>
> Best regards,
> --Jürgen
>
> On 08.04.2015 11:22, Swaraj Kumar wrote:
>
> Hi Zheng,
>
> I am not sure if this command *"zkServer.cmd start zoo.cfg" * works in
> windows or not, but in zkServer.cmd it calls zkEnv.cmd where "
> *ZOOCFG=%ZOOCFGDIR%\zoo.cfg*" is set. So, if you want to run multiple
> instances of zookeeper, change zoo.cfg to your config file and start
> zookeeper.
> The command will not include any start.
>
>
>
> Regards,
>
>
> Swaraj Kumar
> Senior Software Engineer I
> MakeMyTrip.com
> Mob No- 9811774497
>
> On Wed, Apr 8, 2015 at 12:29 PM, Zheng Lin Edwin Yeo  
> 
> wrote:
>
>
>
>
> --
>
> Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С
> уважением
> *i.A. Jürgen Wagner*
> Head of Competence Center "Intelligence"
> & Senior Cloud Consultant
>
> Devoteam GmbH, Industriestr. 3, 70565 Stuttgart, Germany
> Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864 1543
> E-Mail: juergen.wag...@devoteam.com, URL: www.devoteam.de
> --
> Managing Board: Jürgen Hatzipantelis (CEO)
> Address of Record: 64331 Weiterstadt, Germany; Commercial Register:
> Amtsgericht Darmstadt HRB 6450; Tax Number: DE 172 993 071
>
>
>


Re: Problem with new solr.xml format and core swaps

2015-04-08 Thread Erick Erickson
Well, at least it's _some_  progress ;).

Agreed, the segments hanging around are still something of a mystery,
although if I really stretch I could relate them, maybe.

I believe there's clean-up logic when a core starts up to nuke cruft
in the index directory. If the cruft was created after a core swap on
the core where Solr couldn't write the core.properties file, then when
the core started back up it may have been looking in the
wrong directory to clean stuff up. This is a total and complete guess,
though, as I don't know that bit of code so...

If the undeleted segment files were in a directory related to the core
whose core.properties file wasn't persisted, that would lend some
credence to the idea though.

FWIW,
Erick

On Tue, Apr 7, 2015 at 12:18 PM, Shawn Heisey  wrote:
> On 4/7/2015 10:54 AM, Erick Erickson wrote:
>> I'm pretty clueless why you would be seeing this, and slammed with
>> other stuff so I can't dig into this right now.
>>
>> What do the "core.properties" files look like when you see this? They
>> should be re-written when you swap cores. Hmmm, I wonder if there's
>> some condition where the files are already open and the persistence
>> fails? If so we should be logging that error, I have no proof either
>> way whether we are or not though.
>>
>> Guessing that your log files in the problem case weren't all that
>> helpful, but let's have a look at them if this occurs again?
>
> I hadn't had a chance to review the logs, but when I did just now, I
> found this:
>
> ERROR - 2015-04-07 11:56:15.568;
> org.apache.solr.core.CorePropertiesLocator; Couldn't persist core
> properties to /index/solr4/cores/sparkinc_0/core.properties:
> java.io.FileNotFoundException:
> /index/solr4/cores/sparkinc_0/core.properties (Permission denied)
>
> That's fairly clear.  I guess my permissions were wrong.  My best guess
> as to why -- things owned by root from when I created the
> core.properties files.  Solr does not run as root.  I didn't think to
> actually look at the permissions before I ran a script that I maintain
> which fixes all the ownership on my various directories involved in my
> full search installation.
>
> I don't think this explains the not-deleted segment files problem.
> Those segment files were written by solr running as the regular user, so
> there couldn't have been a permission problem.
>
> Thanks,
> Shawn
>


Re: Keeping frequently changing fields out of SOLR

2015-04-08 Thread Erick Erickson
bq: After the commit, query times are unacceptable slow

First, please quantify "unacceptable". 100ms? 10,000ms? Details matter.

Second, the purpose of autowarming is exactly to smooth out the first few
searches when a new searcher is opened, are you doing any?

Third: What are your autocommit settings, and how are you committing
in general? How often?

LUCENE-4258 has never been implemented. Updateable DocValues are
certainly something I'd really like to see in Solr, but they're not there yet,
there are some consistency issues that have to be dealt with, see:
https://issues.apache.org/jira/browse/SOLR-5944

All that aside, lots and lots of people have solved this problem with
appropriate commit policies and autowarming, so that's what I'd
look at first.
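
For reference, the knobs Erick mentions live in solrconfig.xml and look
roughly like this (a sketch; the values are illustrative, not recommendations):

<!-- hard commit: flush to disk regularly, but don't open a new searcher -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- soft commit: controls how often changes become visible to searches -->
<autoSoftCommit>
  <maxTime>120000</maxTime>
</autoSoftCommit>

<!-- autowarming: pre-populate caches on the new searcher from the old one -->
<filterCache class="solr.FastLRUCache" size="512" initialSize="512"
             autowarmCount="128"/>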

Best,
Erick

On Wed, Apr 8, 2015 at 6:22 AM, Jack Krupansky  wrote:
> How much RAM do you have? Check whether your system is compute-bound or
> I/O-bound? If all or most of your index doesn't fit in the system memory
> available for file caching, you're asking for trouble.
>
> Is the indexing time also unacceptably slow, or just the query time?
>
> -- Jack Krupansky
>
> On Wed, Apr 8, 2015 at 9:03 AM, Achim Domma  wrote:
>
>> Hi,
>>
>> I have a core with about 20M documents and the size on disc is about
>> 50GB. It is running on a single EC2 instance. If the core is warmed up,
>> everything is running fine. The problem is the following:
>>
>> We assign categories (similar to tags) to documents. Those are stored in
>> a multivalue string field. After the commit, query times are
>> unacceptable slow.
>>
>> Those categories are the only field that is every changed, so I was
>> thinking about a way to keep the information outside SOLR. I had some
>> ideas, but my knowledge of SOLR internals would need some improvement to
>> implement them. Looking for other solutions, I stumbled about this
>> comment in a JIRA issue:
>>
>>
>> https://issues.apache.org/jira/browse/LUCENE-4258?focusedCommentId=13423159&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13423159
>>
>> The following words sound quite good to me:
>>
>> "People could instead solve this by putting their apps primary key into
>> a docvalues field, allowing them to keep these scoring factors
>> completely external to lucene (e.g. their own array or whatever),
>> indexed by their own primary key. But the problem is I think people want
>> lucene to manage this, they don't want to implement themselves whats
>> necessary to make it consistent with commits etc."
>>
>> Sounds like there is an obvious solution, how to keep data outside SOLR,
>> but make it somehow accessible via DocValues. But I have no idea about
>> what kind of solution he is talking.
>>
>> Could somebody give me a starting point? I would need to filter on that
>> field and facet over it.
>>
>> cheers,
>> Achim
>>


Search speed issue on new core creation

2015-04-08 Thread dhaivat dave
Hello All,

I am using a Master - Slave architecture with hundreds of cores getting
replicated between master and slave servers. I am facing a very weird issue
when creating a new core.

Whenever there is a call to create a new core (using
CoreAdminRequest.createCore(coreName,instanceDir,serverObj)), all the
searches issued to other cores are blocked.

Any help or thoughts would be highly appreciated.

Regards,
Dhaivat


Solr Development for E-Commerce Appllication

2015-04-08 Thread jainam vora
Hi,

Brief:
I am new to Solr, e-commerce web apps, and Java.
I want to integrate Solr into an e-commerce web application (developed using Java
on Linux).

I have the following queries.

1. How to set up SolrCloud on Tomcat. I searched on the internet but could not
find clear steps yet. I also tried some steps but have not succeeded yet.

2. What is the best way to update the index regularly in production:
 curl, post.jar, or SolrJ?
 I mean, what should the (scalable) architecture be?

3. If I want to display Rs., inches, etc. prefixes with my facets, is there
any ready-made option, or do I need to format them on the web page using JavaScript?
Price
Rs. 1 -1000
Rs.  1001-1

stored data 1050 , 2050 , 3050 , 4050 etc..

Screen size
Inch 5
Inch 4.5

stored data 5 , 4.5 , 4 etc..


4. I have two types of products, clothing and electronics... how many
collections do I need to create, with distributed support for all features,
i.e. facets etc., in a single query?

5. How to decide on facets, as the web page won't give details about them;
the decision needs to be made at the web application server.
If I query "red", which facets should be displayed - in fact, which facets
should be queried?
If I query "samsung", which facets should be displayed - in fact, which
facets should be queried?
How to architect this?


6. How to develop a custom component and integrate it into Solr. Is there any
need for this?
 What is it used for?

7. Use of Solr collection aliases and their impact on query formation.


8. What is the best practice for updating the data in Solr? I mean, I have
data from MySQL, CSV files, and Hadoop-analyzed data,
and I want to index all of it in Solr. What should I use in production?
1. The curl command on the command prompt
2. An application developed using the SolrJ client API
3. The bin/post command

9. Hadoop integration benefits and limitations for large-scale SolrCloud
- needed or not?

10. Variable-gap facets are not supported yet. How can I develop the same?
facet.range.spec is not working.

-- 
Thanks & Regards,
Jainam Vora


curl on debian linux gives http authentication error

2015-04-08 Thread jainam vora
Hi,

I have installed curl on Debian Linux. But when I use curl to create a
collection, I get an HTTP authentication error.

-- 
Thanks & Regards,
Jainam Vora


i am using text_general not able to search on space

2015-04-08 Thread avinash09


http://localhost:8983/solr/rna/select?q=test_name:*Uae
blow*&wt=json&rows=100

getting 
{
responseHeader: {
status: 400,
QTime: 28
},
error: {
msg: "no field name specified in query and no default specified via 'df'
param",
code: 400
}
}

plz help!!





--
View this message in context: 
http://lucene.472066.n3.nabble.com/i-am-using-text-general-not-able-to-search-on-space-tp4198470.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: i am using text_general not able to search on space

2015-04-08 Thread Test Test
Re,
You have to specify the defaultSearchField tag in schema.xml.
Regards,
Andy


 On Wednesday, 8 April 2015 at 21:33, avinash09 wrote:

 

http://localhost:8983/solr/rna/select?q=test_name:*Uae
blow*&wt=json&rows=100

getting 
{
responseHeader: {
status: 400,
QTime: 28
},
error: {
msg: "no field name specified in query and no default specified via 'df'
param",
code: 400
}
}

plz help!!





--
View this message in context: 
http://lucene.472066.n3.nabble.com/i-am-using-text-general-not-able-to-search-on-space-tp4198470.html
Sent from the Solr - User mailing list archive at Nabble.com.


  

Re: Solr Development for E-Commerce Appllication

2015-04-08 Thread Erick Erickson
See inline for a few answers:

On Wed, Apr 8, 2015 at 10:32 AM, jainam vora  wrote:
> Hi,
>
> Brief:
> I am new to Solr and E commerce web apps and Java.
> i want to integrate solr in eCommerce web application (developed using Java
> on Linux).
>
> I have following queries.
>
> 1. how to setup SolrCloud on Tomcat. Searched on internet but could not get
> clear steps yet. Also tried some steps but not succeeded yet.

Don't. Solr 5 is moving to using scripts to start/stop Solr and does
NOT require a
container, Tomcat, Jetty or anything else, just use the bin/solr
[start|stop|etc] script.

>
> 2. what is the best way to update the index regularly in production
>  curl or post.jar or solrj
>  i mean what should be the architecture (scalable)
>

My vast preference is SolrJ. IMO post.jar and curl are there for getting started
quickly and aren't really suitable for production. Here's a SolrJ sample:
https://lucidworks.com/blog/indexing-with-solrj/
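
As a taste of what that looks like (a minimal sketch using the SolrJ 4.x API;
the URL and field names are placeholders - in 5.0 the *Server classes were
renamed to *Client):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexOneDoc {
  public static void main(String[] args) throws Exception {
    // points at one core/collection; for SolrCloud, CloudSolrServer + zkHost
    // is generally preferred
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr/products");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "sku-123");
    doc.addField("name", "example product");
    server.add(doc);   // sends the document to Solr
    server.commit();   // in production, prefer auto(soft)commit over explicit commits
    server.shutdown();
  }
}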

> 3. if i want to display rs , inches etc prefixes with my facets .. is there
> any ready   made option or i need to format it on web page using javascript.
> Price
> Rs. 1 -1000
> Rs.  1001-1
>
> stored data 1050 , 2050 , 3050 , 4050 etc..
>
> Screen size
> Inch 5
> Inch 4.5
>
> stored data 5 , 4.5 , 4 etc..

Not quite sure what you're getting at here. But how to display the facets
"prettily" is usually handled in the app layer, but see the "tag" facet
parameter.

>
>
> 4. i have two types of products clothing and electronics... how many
> collections i need to create.. with distributed support for all features
> i.e facets etc.in a single query.

Unknown. I'd start with one though, include a "type" field to allow you to
slice and dice them.

You can use an "alias" that points to more than one collection, but there are
some subtle issues with comparing scores from different collections that'll bite
you.

>
> 5. how to decide on facets.. as web page wont give details about the same.
> decision needs to be done at web application server.
> if i query red then which facet should be displayed in fact which facets
> should be queried?
> if i query samsung then which facet should be displayed in fact which
> facets should be queried?
> how to architect on this?

This is mostly all app-layer/UI decisions. Facets are a purely query-time
construct so you can do most anything you want with them depending on
how you form your _query_. The fields you facet on must be indexed is all.

>
>
> 6. how to develop custom component and integrate it in solr.. any need for
> this?
>  use of this..
>

Too big a topic right now. Personally I wouldn't waste time on this
until I ran up against a situation where I couldn't make stock Solr
work. But most components are pluggable (i.e. query, update etc).

> 7.Use of Solr Collection Alias and impact of query formation..
>

Don't know if it's big enough to measure, much less be noticeable.

>
> 8. What is the best practice to update the data in Solr..  I mean i have
> data from mysql , csv files and hadoop analyzed data..
> I want to index the same in Solr. what should i use in production.
> 1. Curl Command on command prompt
> 2. Develop an application using SolrJ Client API ..
> 3. bin/post command..

This seems like the second time you've asked this, what's different from
question 2?

>
> 9. Hadoop integration benefits and limitation for large scale solrCloud
> ..needed or not?

Unless you're _already_ on a Hadoop infrastructure, there's no need
to put one in just for Solr. So not really needed IMO unless you have
massive amounts of data that you already have on HDFS.

>
> 10. Variable Gap facets are not supported yet. how to develop same?
> facet.range.spec not working.

You've got to tell us _how_ it's not working, what version of Solr you're using,
etc. If you're going from this page:
http://wiki.apache.org/solr/VariableRangeGaps

then all the cautions at the top are pertinent.

"interval faceting" can accomplish much of this I think, see:
https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-IntervalFaceting
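
For instance, the variable-width buckets you describe can be expressed with
interval faceting roughly like this (a sketch; the field name and bounds are
placeholders):

facet=true
facet.interval=price
f.price.facet.interval.set=[0,1000]
f.price.facet.interval.set=(1000,10000]

Square brackets include the endpoint; parentheses exclude it.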


>
> --
> Thanks & Regards,
> Jainam Vora


Re: i am using text_general not able to search on space

2015-04-08 Thread Erick Erickson
Specifying the default search field in the schema has been
deprecated for a while, it wasn't flexible enough.

The recommended way is to specify a "df" parameter in
your request handler defaults.

The space separates the field specification from the second term.
Assuming that you want to search both terms in the same field, you
have several options:
test_name:(term1 term2)
will search for both in the test_name field using the default
operator, usually OR.
test_name:(+term1 +term2)
will require both terms to appear in the field, you get a similar effect from
test_name:(term1 AND term2)
(note the AND must be capitalized).

test_name:"term1 term2"
will look for both terms in test_name, but they have to be next to each other.

test_name:"term1 term2"~5
will look for them within 5 words of each other, and will also hit if
term2 is in front of term1.
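
Applied to the original query, the phrase form (URL-encoded) would look like
this - a sketch assuming the two words should be adjacent:

http://localhost:8983/solr/rna/select?q=test_name:%22Uae%20blow%22&wt=json&rows=100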

Best,
Erick

On Wed, Apr 8, 2015 at 12:42 PM, Test Test  wrote:
> Re,
> You have to specify defautSearchField tag in Schema.xml
> Regards,Andy
>
>
>  On Wednesday, 8 April 2015 at 21:33, avinash09 wrote:
>
>
>
>
> http://localhost:8983/solr/rna/select?q=test_name:*Uae
> blow*&wt=json&rows=100
>
> getting
> {
> responseHeader: {
> status: 400,
> QTime: 28
> },
> error: {
> msg: "no field name specified in query and no default specified via 'df'
> param",
> code: 400
> }
> }
>
> plz help!!
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/i-am-using-text-general-not-able-to-search-on-space-tp4198470.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
>


Lucene updateDocument does not affect index until restarting solr

2015-04-08 Thread Ali Nazemian
Dear all,
Hi,
As part of my code I have to update a Lucene document. For this purpose I
used the writer.updateDocument() method. My problem is that the update does
not affect the index until I restart Solr. Would you please tell me what part
of my code is wrong? Or what should I add in order to apply the changes?

RefCounted<IndexWriter> iw = solrCoreState.getIndexWriter(core);
try {
  IndexWriter writer = iw.get();
  FieldType type = new FieldType(StringField.TYPE_STORED);
  for (int i = 0; i < hits.length; i++) {
    Document document = searcher.doc(hits[i].doc);
    List<String> keywords = keyword.getKeywords(hits[i].doc);
    // replace the existing keyword fields with the freshly extracted ones
    if (keywords.size() > 0) document.removeFields(keywordField);
    for (String word : keywords) {
      document.add(new Field(keywordField, word, type));
    }
    String uniqueKey = searcher.getSchema().getUniqueKeyField().getName();
    writer.updateDocument(new Term(uniqueKey, document.get(uniqueKey)),
        document);
  }
  // makes the changes durable, but Solr keeps serving searches from the
  // already-open searcher until a new one is opened
  writer.commit();
} finally {
  iw.decref();
}


Best regards.

-- 
A.Nazemian


Re: Help understanding addreplica error message re: maxShardsPerNode

2015-04-08 Thread Ian Rose
Whoops - sorry folks, I sent this prematurely.  After typing this out I think
I have it figured out - although SPLITSHARD ignores maxShardsPerNode,
ADDREPLICA does not.  So ADDREPLICA fails because I already have too many
shards on a single node.

On Wed, Apr 8, 2015 at 11:18 PM, Ian Rose  wrote:

> On my local machine I have the following test setup:
>
> * 2 "nodes" (JVMs)
> * 1 collection named "testdrive", that was originally created with
> numShards=1 and maxShardsPerNode=1.
> * After a series of SPLITSHARD commands, I now have 4 shards, as follows:
>
> testdrive_shard1_0_0_replica1 (L) Active 115
> testdrive_shard1_0_1_replica1 (L) Active 0
> testdrive_shard1_1_0_replica1 (L) Active 5
> testdrive_shard1_1_1_replica1 (L) Active 88
>
> The number in the last column is the number of documents.  The 4 shards
> are all on the same node; the second node holds nothing for this collection.
>
> Already, this situation is a little strange because I have 4 shards on one
> node, despite the fact that maxShardsPerNode is 1.  My guess is that
> SPLITSHARD ignores the maxShardsPerNode value - is that right?
>
> Now, if I issue an ADDREPLICA command
> with collection=testdrive&shard=shard1_0_0, I get the following error:
>
> "Cannot create shards testdrive. Value of maxShardsPerNode is 1, and the
> number of live nodes is 2. This allows a maximum of 2 to be created. Value
> of numShards is 4 and value of replicationFactor is 1. This requires 4
> shards to be created (higher than the allowed number)"
>
> I don't totally understand this.
>
>
>


Re: change maxShardsPerNode for existing collection?

2015-04-08 Thread Ian Rose
Thanks, I figured that might be the case (hand-editting clusterstate.json).

- Ian


On Wed, Apr 8, 2015 at 11:46 PM, ralph tice  wrote:

> It looks like there's a patch available:
> https://issues.apache.org/jira/browse/SOLR-5132
>
> Currently the only way without that patch is to hand-edit
> clusterstate.json, which is very ill advised.  If you absolutely must,
> it's best to stop all your Solr nodes, backup the current clusterstate
> in ZK, modify it, and then start your nodes.
>
> On Wed, Apr 8, 2015 at 10:21 PM, Ian Rose  wrote:
> > I previously created several collections with maxShardsPerNode=1 but I
> > would now like to change that (to "unlimited" if that is an option).  Is
> > changing this value possible?
> >
> > Cheers,
> > - Ian
>


RE: Clusterstate - state active

2015-04-08 Thread Matt Kuiper
I found this error, which likely explains my issue with new replicas not coming
up; I'm not sure of the next step.  It almost looks like Zookeeper's record of a
shard's leader is not being updated?

4/8/2015, 4:56:03 PM
ERROR
ShardLeaderElectionContext
There was a problem trying to register as the 
leader:org.apache.solr.common.SolrException: Could not register as the leader 
because creating the ephemeral registration node in ZooKeeper failed
There was a problem trying to register as the 
leader:org.apache.solr.common.SolrException: Could not register as the leader 
because creating the ephemeral registration node in ZooKeeper failed
at 
org.apache.solr.cloud.ShardLeaderElectionContextBase.runLeaderProcess(ElectionContext.java:150)
at 
org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:306)
at 
org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163)
at 
org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125)
at org.apache.solr.cloud.LeaderElector.access$200(LeaderElector.java:55)
at 
org.apache.solr.cloud.LeaderElector$ElectionWatcher.process(LeaderElector.java:358)
at 
org.apache.solr.common.cloud.SolrZkClient$3$1.run(SolrZkClient.java:209)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: 
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
NodeExists for /collections/kla_collection/leaders/shard4
at 
org.apache.solr.common.util.RetryUtil.retryOnThrowable(RetryUtil.java:40)
at 
org.apache.solr.cloud.ShardLeaderElectionContextBase.runLeaderProcess(ElectionContext.java:137)
... 11 more
Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: 
KeeperErrorCode = NodeExists for /collections/kla_collection/leaders/shard4
at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
at 
org.apache.solr.common.cloud.SolrZkClient$11.execute(SolrZkClient.java:462)
at 
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:74)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:459)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:416)
at 
org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:403)
at 
org.apache.solr.cloud.ShardLeaderElectionContextBase$1.execute(ElectionContext.java:142)
at 
org.apache.solr.common.util.RetryUtil.retryOnThrowable(RetryUtil.java:34)

Matt


-Original Message-
From: Matt Kuiper [mailto:matt.kui...@issinc.com] 
Sent: Wednesday, April 08, 2015 4:36 PM
To: solr-user@lucene.apache.org
Subject: RE: Clusterstate - state active

Erick, Anshum,

Thanks for your replies!  Yes, it is the replica state that I am looking at, and
this is the answer I was hoping for.

I am working on a solution that involves moving some replicas to new Solr nodes
as they are made available.  Before deleting the original replicas backing the
shard, I check the replica state to make sure it is active for the new replicas.

Initially it was working pretty well, but in more recent testing I regularly
see the shard go down.  The two new replicas go into a failed recovery state
after the original replicas are deleted, and the logs report that a registered
leader was not found for the shard.  Initially I was concerned that maybe the
new replicas were not fully synced with the leader, even though I checked for
active state.

Now I am wondering if the new replicas are somehow competing (or somehow
reluctant) to become leader, and thus neither becomes leader.  I plan to test
creating just one new replica on a new Solr node, checking that its state is
active, then deleting the original replicas, and then creating the second new
replica.

Any thoughts?

Matt

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, April 08, 2015 4:13 PM
To: solr-user@lucene.apache.org
Subject: Re: Clusterstate - state active

Matt:

In a word, "yes". Depending on the size of the index for that shard, the 
transition from Down->Recovering->Active may be too fast to catch.
If replicating the index takes a while, though, you should at least see the 
"Recovering" state, during which time there won't be any searches forwarded to 
that node.

Best,
Erick

On Wed, Apr 8, 2015 at 2:58 PM, Matt Kuiper  wrote:
> Hello,
>
> When creating a new replica, and the 

Clusterstate - state active

2015-04-08 Thread Matt Kuiper
Hello,

When creating a new replica, and the state is recorded as active within the ZK 
clusterstate, does that mean that the new replica has synced with the leader 
replica for the particular shard?

Thanks,
Matt



SOLR searching

2015-04-08 Thread Brian Usrey
I am extremely new to SOLR and am wondering if it is possible to do something 
like the following. Basically I have been tasked with researching SOLR to see 
if we can replace our current searching algorithm.
We have a website with product data.  Product data includes standard things 
like Name, SKU, Description, StandardPrice and other things.  I can search that 
information with no issue, however in addition to the product data, the Price 
of the product can depend on the user that is logged in and doing the 
searching.  So, for one user, the product costs $2.50 and for another the same 
product costs $2.65.  Additionally a user might not have a price specifically 
for them, so we would display the standard price for the product, which might 
be $2.75.  The tables in the database look similar to this:
Product: ProductID, Name, SKU, Description, StandardPrice

ProductUserPrice: ProductID, UserID, Price

How can I create the SOLR installation so that when searching for products, I 
return the product information, including the correct Price for the logged in 
user?  We will know the UserID when searching, so that can be part of the 
query.  Also, part of the search might be for all products in a certain price 
range ($.50 - $10.00) which should take into account the price for the specific 
user.
I have started using the DataImporter and have come up with this 
design:

    <entity name="product" pk="productid"
            query="select productid as id, ProductName, ProductDescription,
                   StandardPrice from product"
            deltaImportQuery=""
            deltaQuery="">
        <!-- nested child entity for ProductUserPrice -->
    </entity>


I can use this to import Parent information, but I have no idea how to use the 
Child entity.  I don't see how to handle nested docs in the schema.xml.  Is it 
possible to do so?

I would appreciate any help, and if there is another way to solve this beyond 
nested documents I am completely open to a different way. 

Thank You!
Brian


Help understanding addreplica error message re: maxShardsPerNode

2015-04-08 Thread Ian Rose
On my local machine I have the following test setup:

* 2 "nodes" (JVMs)
* 1 collection named "testdrive", that was originally created with
numShards=1 and maxShardsPerNode=1.
* After a series of SPLITSHARD commands, I now have 4 shards, as follows:

testdrive_shard1_0_0_replica1 (L) Active 115
testdrive_shard1_0_1_replica1 (L) Active 0
testdrive_shard1_1_0_replica1 (L) Active 5
testdrive_shard1_1_1_replica1 (L) Active 88

The number in the last column is the number of documents.  The 4 shards are
all on the same node; the second node holds nothing for this collection.

Already, this situation is a little strange because I have 4 shards on one
node, despite the fact that maxShardsPerNode is 1.  My guess is that
SPLITSHARD ignores the maxShardsPerNode value - is that right?

Now, if I issue an ADDREPLICA command
with collection=testdrive&shard=shard1_0_0, I get the following error:

"Cannot create shards testdrive. Value of maxShardsPerNode is 1, and the
number of live nodes is 2. This allows a maximum of 2 to be created. Value
of numShards is 4 and value of replicationFactor is 1. This requires 4
shards to be created (higher than the allowed number)"
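
For reference, the check the error message appears to apply works out as:

    allowed  = maxShardsPerNode x live nodes = 1 x 2 = 2
    required = numShards x replicationFactor = 4 x 1 = 4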

I don't totally understand this.


omitTermFreqAndPositions issue

2015-04-08 Thread Ryan Josal
Hey guys, it seems that omitTermFreqAndPositions is not very usable with
edismax, and I'm wondering if this is intended behavior, and how I can get
around the problem.

The setup:
define field "foo" with omitTermFreqAndPositions=true
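
In schema.xml terms, roughly (the field type here is illustrative):

    <field name="foo" type="text_general" indexed="true" stored="true"
           omitTermFreqAndPositions="true"/>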

The query:
q="ground coffee"&qf=foo bar baz

The error:
IllegalStateException: field "foo" indexed without position data; cannot
run PhraseQuery.

It would actually be ok for us to index position data but there isn't an
option for that without term frequencies.  No TF is important for us when
it comes to searching product titles.

I should say that only a small fraction of user queries contained quoted
phrases that trigger this error, so it works much of the time, but we'd
also like to continue supporting user quoted phrase queries.

So how can I index a field without TF and use it in edismax qf?

Thanks for your help!
Ryan


Re: change maxShardsPerNode for existing collection?

2015-04-08 Thread ralph tice
It looks like there's a patch available:
https://issues.apache.org/jira/browse/SOLR-5132

Currently the only way without that patch is to hand-edit
clusterstate.json, which is very ill-advised.  If you absolutely must,
it's best to stop all your Solr nodes, back up the current clusterstate
in ZK, modify it, and then start your nodes.
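
A rough sketch of that procedure, assuming the zkcli.sh script that ships
with Solr and a ZooKeeper at localhost:2181 (file names are illustrative):

    # with all Solr nodes stopped, pull down a backup first:
    ./zkcli.sh -zkhost localhost:2181 -cmd getfile /clusterstate.json cs-backup.json
    cp cs-backup.json cs-edited.json
    # hand-edit maxShardsPerNode in cs-edited.json, then push it back:
    ./zkcli.sh -zkhost localhost:2181 -cmd putfile /clusterstate.json cs-edited.json
    # restart the Solr nodes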

On Wed, Apr 8, 2015 at 10:21 PM, Ian Rose  wrote:
> I previously created several collections with maxShardsPerNode=1 but I
> would now like to change that (to "unlimited" if that is an option).  Is
> changing this value possible?
>
> Cheers,
> - Ian


Re: Memory Leak in solr 4.8.1

2015-04-08 Thread Toke Eskildsen
On Wed, 2015-04-08 at 14:00 -0700, pras.venkatesh wrote:
> 1. 8 nodes, 4 shards (2 nodes per shard).
> 2. Each node holds about 55 GB of data; in total there are 450 million
> documents in the collection, so the individual documents are not huge.

So ~120M docs/shard.

> 3. The schema has 42 fields, and the index is reloaded every 15 minutes with
> about 50,000 documents. We have a primary key for the index, so any
> duplicate document gets re-written.
> 4. The GC policy is CMS, with min and max heap size = 8 GB, perm size =
> 512 MB, and 24 GB of RAM on the VM.

Do you have a large and active filter cache? Each entry is a bitset with one
bit per document, so at ~120M docs/shard an entry runs to roughly 15MB (more
if deleted documents inflate maxDoc); it does not take many entries to fill
an 8GB heap. That would match the description of ever-running GC.
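
If the filterCache does turn out to be the culprit, shrinking it in
solrconfig.xml is one mitigation — a sketch with illustrative sizes:

    <filterCache class="solr.FastLRUCache"
                 size="64"
                 initialSize="64"
                 autowarmCount="0"/>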

- Toke Eskildsen, State and University Library, Denmark




change maxShardsPerNode for existing collection?

2015-04-08 Thread Ian Rose
I previously created several collections with maxShardsPerNode=1 but I
would now like to change that (to "unlimited" if that is an option).  Is
changing this value possible?

Cheers,
- Ian


Re: omitTermFreqAndPositions issue

2015-04-08 Thread Erick Erickson
Ryan:

bq:  I don't want it to issue phrase queries to that field ever

This is one of those requirements that you'd have to enforce at the
app layer. Having Solr (or Lucene) enforce a rule like this for
everyone would be terrible.

So you're turning off TF but also saying title is "one of the primary
components of score"; since TF is integral to calculating scores, I'm
not quite sure what that means.

You could write a custom similarity class that returns whatever you
want (1.0 comes to mind) from the tf() method.
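
A minimal sketch of that tf() override, assuming Solr 4.x/5.x where
DefaultSimilarity is the default (class and package names are illustrative):

    package com.example;

    import org.apache.lucene.search.similarities.DefaultSimilarity;

    /** Scores any term frequency > 0 as if the term occurred exactly once. */
    public class NoTfSimilarity extends DefaultSimilarity {
        @Override
        public float tf(float freq) {
            // the default is sqrt(freq); flatten it so repeats don't add score
            return freq > 0 ? 1.0f : 0.0f;
        }
    }

Wired up per-field in schema.xml, assuming the schema-level similarity is
solr.SchemaSimilarityFactory (needed for per-fieldType overrides):

    <similarity class="solr.SchemaSimilarityFactory"/>

    <fieldType name="text_no_tf" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
        <similarity class="com.example.NoTfSimilarity"/>
    </fieldType>

The field keeps full positions, so quoted phrase queries still work; only
the TF contribution to the score is flattened.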

Best,
Erick

On Wed, Apr 8, 2015 at 4:50 PM, Ryan Josal  wrote:
> Thanks for your thought Shawn, I don't think fq will be helpful here.  The
> field for which I want to turn TF off is "title", which is actually one of
> the primary components of score, so I really need it in qf.  I just don't
> want the TF portion of the score for that field only.  I don't want it to
> issue phrase queries to that field ever, but if the user quotes something,
> it does, and I don't know how to make it stop.  To me it seems potentially
> more appropriate to send that to the pf fields, although I can think of a
> couple good reasons to put it against qf.  That's fine as long as it
> doesn't try to build a phrase query against a no TF no pos field.
>
> Ryan
>
> On Wednesday, April 8, 2015, Shawn Heisey  wrote:
>
>> On 4/8/2015 5:06 PM, Ryan Josal wrote:
>> > The error:
>> > IllegalStateException: field "foo" indexed without position data; cannot
>> > run PhraseQuery.
>> >
>> > It would actually be ok for us to index position data but there isn't an
>> > option for that without term frequencies.  No TF is important for us when
>> > it comes to searching product titles.
>> >
>> > I should say that only a small fraction of user queries contained quoted
>> > phrases that trigger this error, so it works much of the time, but we'd
>> > also like to continue supporting user quoted phrase queries.
>> >
>> > So how can I index a field without TF and use it in edismax qf?
>>
>> If you omit positions, you can't do phrase queries.  As far as I know,
>> there is no option in Solr to omit only frequencies and not positions.
>>
>> I think there is a way that you can achieve what you want, though.  What
>> you are looking for is filters.  The fq parameter (filter query) will
>> restrict the result set to only entries that match the query, but will
>> not affect the relevancy score *at all*.  Here is an example of a filter
>> query that restricts the results to items that are in stock, assuming
>> you have the appropriate schema:
>>
>> fq=inStock:true
>>
>> Queries specified in fq will default to the lucene query parser, but you
>> can override that if you need to.  This query would be equivalent to the
>> previous one, but it would be parsed using edismax:
>>
>> fq={!edismax}inStock:true
>>
>> Here's another example of a useful filter, using yet another query parser:
>>
>> fq={!terms f=userId}bob,alice,susan
>>
>> Remember, the reason I have suggested filters is that they do not
>> influence score.
>>
>>
>> https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-Thefq%28FilterQuery%29Parameter
>>
>> Thanks,
>> Shawn
>>
>>


Re: Clusterstate - state active

2015-04-08 Thread Erick Erickson
Matt:

In a word, "yes". Depending on the size of the index for that shard,
the transition from Down->Recovering->Active may be too fast to catch.
If replicating the index takes a while, though, you should at least
see the "Recovering" state, during which time there won't be any
searches forwarded to that node.

Best,
Erick

On Wed, Apr 8, 2015 at 2:58 PM, Matt Kuiper  wrote:
> Hello,
>
> When creating a new replica, and the state is recorded as active within the ZK 
> clusterstate, does that mean that the new replica has synced with the leader 
> replica for the particular shard?
>
> Thanks,
> Matt
>


Memory Leak in solr 4.8.1

2015-04-08 Thread pras.venkatesh
I have a SolrCloud instance with 8 nodes and 4 shards, and I am facing a memory
leak on the JVMs.

here are the details of the instance.


1. 8 nodes, 4 shards (2 nodes per shard).
2. Each node holds about 55 GB of data; in total there are 450 million
documents in the collection, so the individual documents are not huge.
3. The schema has 42 fields, and the index is reloaded every 15 minutes with
about 50,000 documents. We have a primary key for the index, so any
duplicate document gets re-written.
4. The GC policy is CMS, with min and max heap size = 8 GB, perm size =
512 MB, and 24 GB of RAM on the VM.


When users start searching in Solr, often (though not always) the heap keeps
growing and the GC cycles do not clear it. I see GC running for almost
100,000 ms and still not clearing the heap.

Appreciate any advice on this.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Memory-Leak-in-solr-4-8-1-tp4198488.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: omitTermFreqAndPositions issue

2015-04-08 Thread Ryan Josal
Thanks for your thought Shawn, I don't think fq will be helpful here.  The
field for which I want to turn TF off is "title", which is actually one of
the primary components of score, so I really need it in qf.  I just don't
want the TF portion of the score for that field only.  I don't want it to
issue phrase queries to that field ever, but if the user quotes something,
it does, and I don't know how to make it stop.  To me it seems potentially
more appropriate to send that to the pf fields, although I can think of a
couple good reasons to put it against qf.  That's fine as long as it
doesn't try to build a phrase query against a no TF no pos field.

Ryan

On Wednesday, April 8, 2015, Shawn Heisey  wrote:

> On 4/8/2015 5:06 PM, Ryan Josal wrote:
> > The error:
> > IllegalStateException: field "foo" indexed without position data; cannot
> > run PhraseQuery.
> >
> > It would actually be ok for us to index position data but there isn't an
> > option for that without term frequencies.  No TF is important for us when
> > it comes to searching product titles.
> >
> > I should say that only a small fraction of user queries contained quoted
> > phrases that trigger this error, so it works much of the time, but we'd
> > also like to continue supporting user quoted phrase queries.
> >
> > So how can I index a field without TF and use it in edismax qf?
>
> If you omit positions, you can't do phrase queries.  As far as I know,
> there is no option in Solr to omit only frequencies and not positions.
>
> I think there is a way that you can achieve what you want, though.  What
> you are looking for is filters.  The fq parameter (filter query) will
> restrict the result set to only entries that match the query, but will
> not affect the relevancy score *at all*.  Here is an example of a filter
> query that restricts the results to items that are in stock, assuming
> you have the appropriate schema:
>
> fq=inStock:true
>
> Queries specified in fq will default to the lucene query parser, but you
> can override that if you need to.  This query would be equivalent to the
> previous one, but it would be parsed using edismax:
>
> fq={!edismax}inStock:true
>
> Here's another example of a useful filter, using yet another query parser:
>
> fq={!terms f=userId}bob,alice,susan
>
> Remember, the reason I have suggested filters is that they do not
> influence score.
>
>
> https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-Thefq%28FilterQuery%29Parameter
>
> Thanks,
> Shawn
>
>


Re: Clusterstate - state active

2015-04-08 Thread Anshum Gupta
Hi Matt,

If it's the replica state that you're looking at, yes, it means that the
Replica is in sync with the leader and serving/ready to serve requests.

On Wed, Apr 8, 2015 at 2:58 PM, Matt Kuiper  wrote:

> Hello,
>
> When creating a new replica, and the state is recorded as active within
> the ZK clusterstate, does that mean that the new replica has synced with the
> leader replica for the particular shard?
>
> Thanks,
> Matt
>
>


-- 
Anshum Gupta


RE: Clusterstate - state active

2015-04-08 Thread Matt Kuiper
Erick, Anshum,

Thanks for your replies!  Yes, it is the replica state that I am looking at, and 
this is the answer I was hoping for.  

I am working on a solution that involves moving some replicas to new Solr nodes 
as they are made available.  Before deleting the original replicas backing the 
shard, I check the replica state to make sure it is active for the new replicas.  

Initially it was working pretty well, but with more recent testing I regularly 
see the shard go down.  The two new replicas go into a failed recovery state 
after the original replicas are deleted, and the logs report that a registered 
leader was not found for the shard.  Initially I was concerned that maybe the 
new replicas were not fully synced with the leader, even though I checked for 
the active state.

Now I am wondering if the new replicas are somehow competing (or somehow 
reluctant) to become leader, and thus neither becomes leader.  I plan to test 
creating just one new replica on a new Solr node, checking that its state is 
active, then deleting the original replicas, and then creating the second new 
replica.
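
A sketch of that sequence against the Collections API (host, node, and
replica names are illustrative):

    # 1. add a replica on the new node
    curl 'http://host:8983/solr/admin/collections?action=ADDREPLICA&collection=kla_collection&shard=shard4&node=newhost:8983_solr'

    # 2. poll until the new replica reports state=active
    curl 'http://host:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=kla_collection&shard=shard4'

    # 3. only then remove an original replica
    curl 'http://host:8983/solr/admin/collections?action=DELETEREPLICA&collection=kla_collection&shard=shard4&replica=core_node1'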

Any thoughts?

Matt

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, April 08, 2015 4:13 PM
To: solr-user@lucene.apache.org
Subject: Re: Clusterstate - state active

Matt:

In a word, "yes". Depending on the size of the index for that shard, the 
transition from Down->Recovering->Active may be too fast to catch.
If replicating the index takes a while, though, you should at least see the 
"Recovering" state, during which time there won't be any searches forwarded to 
that node.

Best,
Erick

On Wed, Apr 8, 2015 at 2:58 PM, Matt Kuiper  wrote:
> Hello,
>
> When creating a new replica, and the state is recorded as active within the ZK 
> clusterstate, does that mean that the new replica has synced with the leader 
> replica for the particular shard?
>
> Thanks,
> Matt
>


Re: omitTermFreqAndPositions issue

2015-04-08 Thread Shawn Heisey
On 4/8/2015 5:06 PM, Ryan Josal wrote:
> The error:
> IllegalStateException: field "foo" indexed without position data; cannot
> run PhraseQuery.
>
> It would actually be ok for us to index position data but there isn't an
> option for that without term frequencies.  No TF is important for us when
> it comes to searching product titles.
>
> I should say that only a small fraction of user queries contained quoted
> phrases that trigger this error, so it works much of the time, but we'd
> also like to continue supporting user quoted phrase queries.
>
> So how can I index a field without TF and use it in edismax qf?

If you omit positions, you can't do phrase queries.  As far as I know,
there is no option in Solr to omit only frequencies and not positions.

I think there is a way that you can achieve what you want, though.  What
you are looking for is filters.  The fq parameter (filter query) will
restrict the result set to only entries that match the query, but will
not affect the relevancy score *at all*.  Here is an example of a filter
query that restricts the results to items that are in stock, assuming
you have the appropriate schema:

fq=inStock:true

Queries specified in fq will default to the lucene query parser, but you
can override that if you need to.  This query would be equivalent to the
previous one, but it would be parsed using edismax:

fq={!edismax}inStock:true

Here's another example of a useful filter, using yet another query parser:

fq={!terms f=userId}bob,alice,susan

Remember, the reason I have suggested filters is that they do not
influence score.

https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-Thefq%28FilterQuery%29Parameter

Thanks,
Shawn



Re: Clusterstate - state active

2015-04-08 Thread Erick Erickson
Matt:

How are you creating the new replica? Are you giving it an explicit
name? And especially is it the same name as one you've already
deleted?

'cause I can't really imagine why you'd be getting a ZK exception
saying the node already exists.

Shot in the dark here..

On Wed, Apr 8, 2015 at 4:11 PM, Matt Kuiper  wrote:
> Found this error which likely explains my issue with new replicas not coming 
> up, not sure next step.  Almost looks like Zookeeper's record of a Shard's 
> leader is not being updated?
>
> 4/8/2015, 4:56:03 PM
> ERROR
> ShardLeaderElectionContext
> There was a problem trying to register as the leader:org.apache.solr.common.SolrException: Could not register as the leader because creating the ephemeral registration node in ZooKeeper failed
> There was a problem trying to register as the leader:org.apache.solr.common.SolrException: Could not register as the leader because creating the ephemeral registration node in ZooKeeper failed
> at org.apache.solr.cloud.ShardLeaderElectionContextBase.runLeaderProcess(ElectionContext.java:150)
> at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:306)
> at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163)
> at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125)
> at org.apache.solr.cloud.LeaderElector.access$200(LeaderElector.java:55)
> at org.apache.solr.cloud.LeaderElector$ElectionWatcher.process(LeaderElector.java:358)
> at org.apache.solr.common.cloud.SolrZkClient$3$1.run(SolrZkClient.java:209)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.solr.common.SolrException: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /collections/kla_collection/leaders/shard4
> at org.apache.solr.common.util.RetryUtil.retryOnThrowable(RetryUtil.java:40)
> at org.apache.solr.cloud.ShardLeaderElectionContextBase.runLeaderProcess(ElectionContext.java:137)
> ... 11 more
> Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /collections/kla_collection/leaders/shard4
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
> at org.apache.solr.common.cloud.SolrZkClient$11.execute(SolrZkClient.java:462)
> at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:74)
> at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:459)
> at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:416)
> at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:403)
> at org.apache.solr.cloud.ShardLeaderElectionContextBase$1.execute(ElectionContext.java:142)
> at org.apache.solr.common.util.RetryUtil.retryOnThrowable(RetryUtil.java:34)
>
> Matt
>
>
> -Original Message-
> From: Matt Kuiper [mailto:matt.kui...@issinc.com]
> Sent: Wednesday, April 08, 2015 4:36 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Clusterstate - state active
>
> Erick, Anshum,
>
> Thanks for your replies!  Yes, it is the replica state that I am looking at,
> and this is the answer I was hoping for.
>
> I am working on a solution that involves moving some replicas to new Solr
> nodes as they are made available.  Before deleting the original replicas
> backing the shard, I check the replica state to make sure it is active for
> the new replicas.
>
> Initially it was working pretty well, but with more recent testing I
> regularly see the shard go down.  The two new replicas go into a failed
> recovery state after the original replicas are deleted, and the logs report
> that a registered leader was not found for the shard.  Initially I was
> concerned that maybe the new replicas were not fully synced with the leader,
> even though I checked for the active state.
>
> Now I am wondering if the new replicas are somehow competing (or somehow
> reluctant) to become leader, and thus neither becomes leader.  I plan to
> test creating just one new replica on a new Solr node, checking that its
> state is active, then deleting the original replicas, and then creating the
> second new replica.
>
> Any thoughts?
>
> Matt
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Wednesday, April 08,

Documentation for Solr Cloud

2015-04-08 Thread Arumugam, Suresh
Hi All,

We are trying to set up Solr Cloud in our team and have been able to set up 
multiple nodes on one server as a cloud.

Need clarifications on the following.

Is there any good documentation that can help us build a Solr Cloud with 
multiple physical servers?
Since Solr Cloud is distributed, will there be any added latency in searching 
data in Solr Cloud vs a standalone Solr?

Thanks in advance.

Regards,
Suresh.A




Re: SOLR searching

2015-04-08 Thread Jack Krupansky
Are there at least a small number of categories of users with discrete
prices, or can each user have their own price? The former is doable; the
latter is not, unless the number of users is relatively small, in which case
they are equivalent to categories.

You could have a set of dynamic fields, price_<userid>, and fill in the
user id when doing the query.
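
A sketch of that approach (field names and the user id are illustrative, and
it assumes a float field type as in the stock schema):

    <!-- schema.xml: one indexed price per user id, via a dynamic field -->
    <dynamicField name="price_*" type="float" indexed="true" stored="true"/>

    q=Name:coffee&fq=price_42:[0.50 TO 10.00]&fl=Name,SKU,price_42,StandardPrice

One caveat: the range filter only matches documents that actually have a
price_42 field, so the StandardPrice fallback is easiest to get by indexing
the standard price into a user's price_* slot whenever there is no override.
The trade-off is index size, which is why the number of distinct price
categories matters.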

-- Jack Krupansky

On Wed, Apr 8, 2015 at 5:21 PM, Brian Usrey 
wrote:

> I am extremely new to SOLR and am wondering if it is possible to do
> something like the following. Basically I have been tasked with researching
> SOLR to see if we can replace our current searching algorithm.
> We have a website with product data.  Product data includes standard
> things like Name, SKU, Description, StandardPrice and other things.  I can
> search that information with no issue, however in addition to the product
> data, the Price of the product can depend on the user that is logged in and
> doing the searching.  So, for one user, the product costs $2.50 and for
> another the same product costs $2.65.  Additionally a user might not have a
> price specifically for them, so we would display the standard price for
> the product, which might be $2.75.  The tables in the database look similar
> to this:
> Product: ProductID, Name, SKU, Description, StandardPrice
>
> ProductUserPrice: ProductID, UserID, Price
>
> How can I create the SOLR installation so that when searching for
> products, I return the product information, including the correct Price for
> the logged in user?  We will know the UserID when searching, so that can be
> part of the query.  Also, part of the search might be for all products in a
> certain price range ($.50 - $10.00) which should take into account the
> price for the specific user.
> I have started using the DataImporter and have come up with this
> design:
>    <entity name="product" pk="productid"
>            query="select productid as id, ProductName, ProductDescription,
>                   StandardPrice from product"
>            deltaImportQuery=""
>            deltaQuery="">
>        <!-- nested child entity for ProductUserPrice -->
>    </entity>
>
> I can use this to import Parent information, but I have no idea how to use
> the Child entity.  I don't see how to handle nested docs in the
> schema.xml.  Is it possible to do so?
>
> I would appreciate any help, and if there is another way to solve this
> beyond nested documents I am completely open to a different way.
>
> Thank You!
> Brian
>