Solr Queries
Hi,
I am new to Solr and have the following questions:
1. Does Solr work in a distributed environment? If yes, how do I configure it?
2. Does Solr have Hadoop support? If yes, how do I set it up with Hadoop/HDFS? (Note: I am familiar with Hadoop.)
3. I have 1 TB of employee information (id, name, address, cell no., personal info). To post (index) this data to the Solr server, do I have to create an XML file from the data and then post it? Or is there a more efficient way? In future my data will grow up to 10 TB; how can I index it then? (Creating XML is a headache at that size.)
Thanks in advance,
-Pravin
DISCLAIMER == This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.
How to post (index) a large file of 5 GB or more
Hi,
I am new to Solr. I am able to index, search and update with small files (around 500 MB). But if I try to index a file of 5 to 10 GB (anything above roughly 500 MB), I get a heap memory exception. While investigating, I found that post.jar and post.sh load the whole file into memory. As a workaround I split the large file into smaller files, and that works. Is there any other way to post a large file? The workaround is not feasible for a 1 TB file.
Thanks,
-Pravin
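The splitting workaround described above does not need intermediate files: the source can be read one record at a time and sent in bounded batches. The following is a minimal sketch of that idea, assuming the data is available as CSV; the function name iter_add_batches, the batch size and the sample columns are illustrative and not from the thread. It only builds the XML update messages; each one would then be POSTed to the Solr update handler, followed by a commit.

```python
import csv
import io
from xml.sax.saxutils import escape, quoteattr


def iter_add_batches(rows, batch_size=1000):
    """Yield Solr <add> XML messages holding at most batch_size docs each."""
    batch = []
    for row in rows:  # rows is consumed lazily, one record at a time
        fields = "".join(
            "<field name=%s>%s</field>" % (quoteattr(name), escape(value))
            for name, value in row.items()
        )
        batch.append("<doc>%s</doc>" % fields)
        if len(batch) == batch_size:
            yield "<add>%s</add>" % "".join(batch)
            batch = []
    if batch:  # flush the final, possibly short, batch
        yield "<add>%s</add>" % "".join(batch)


# Tiny in-memory stand-in for a multi-gigabyte file (hypothetical columns);
# a real run would pass csv.DictReader(open(path, newline="")) instead.
sample = io.StringIO("id,name\n1,Asha\n2,Ravi & Co\n3,Meera\n")
batches = list(iter_add_batches(csv.DictReader(sample), batch_size=2))
print(len(batches))
```

Because only one batch is held in memory at a time, memory use depends on batch_size rather than on the size of the input file.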
RE: Solr Queries
Thanks for your help. Can you please provide the detailed configuration for a distributed Solr environment? How do I set up master and slave, and which file(s) do I have to change for this? What are the shard parameters? Can we integrate ZooKeeper with this? Please provide details.
Thanks in advance.
-Pravin
-----Original Message-----
From: Sandeep Tagore [mailto:sandeep.tag...@gmail.com]
Sent: Wednesday, October 07, 2009 4:29 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Queries
Hi Pravin,
> 1. Does Solr work in a distributed environment? If yes, how do I configure it?
Yep. You can achieve this with sharding. For example: install and configure Solr on two machines and declare one of them as master. Pass the shard parameters when you index and search your data.
> 2. Does Solr have Hadoop support? If yes, how do I set it up with Hadoop/HDFS? (Note: I am familiar with Hadoop.)
Sorry. No idea.
> 3. I have 1 TB of employee information (id, name, address, cell no., personal info). To post (index) this data to the Solr server, do I have to create an XML file from the data and then post it? Or is there a more efficient way? In future my data will grow up to 10 TB; how can I index it then? (Creating XML is a headache at that size.)
I think XML is not the best way, and I don't recommend it. If you have that 1 TB of data in a database, you can do this simply with the full-import command of the DataImportHandler. Configure your DB details in solrconfig.xml and data-config.xml, and add your DB driver jar to the Solr lib directory. Then import the data in slices (say department-wise, or by some other category). In future, you can import the data from a DB, or you can index it directly using the client API with simple Java beans.
Hope this info helps you.
Regards,
Sandeep Tagore
--
View this message in context: http://www.nabble.com/Solr-Quries-tp25780371p25783891.html
Sent from the Solr - User mailing list archive at Nabble.com.
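The suggestion to import the data in slices can be sketched as a series of DataImportHandler requests, one per slice. This is only an illustration under stated assumptions: the handler is assumed to be registered at /dataimport, the slice key is a hypothetical request parameter named dept (which data-config.xml would have to reference in its query, for example as ${dataimporter.request.dept}), and the department names are made up. The sketch only builds the URLs; each import should be allowed to finish before the next one is requested.

```python
from urllib.parse import urlencode


def slice_import_urls(base_url, slices, handler="/dataimport"):
    """Build one DataImportHandler full-import URL per slice."""
    urls = []
    for i, value in enumerate(slices):
        params = {
            "command": "full-import",
            # clean=true on the first slice only, so the old index is wiped
            # once; later slices must keep what earlier slices added.
            "clean": "true" if i == 0 else "false",
            "commit": "true",
            "dept": value,  # hypothetical custom parameter, not a Solr built-in
        }
        urls.append("%s%s?%s" % (base_url, handler, urlencode(params)))
    return urls


# Assumed host and slice names, for illustration only.
urls = slice_import_urls("http://localhost:8983/solr", ["sales", "hr"])
for url in urls:
    print(url)
```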
RE: Solr Queries
Thanks for your reply. I have one more question about the distributed Solr environment. I have configured Solr on two machines as per http://wiki.apache.org/solr/DistributedSearch
But I have the following test case: suppose I have two machines, Server1 and Server2. I posted a record with id 1 to server1 and another record with the same id, 1, to server2. Now when I send the query
http://server1:8983/solr/select?shards=server1:8983/solr,server2:8983/solr&q=1
I get the result from server1, while
http://server2:8983/solr/select?shards=server2:8983/solr,server1/solr&q=1
gives the result from server2.
How do I solve this? Is any other setting required?
Thanks in advance,
-Pravin
-----Original Message-----
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: Wednesday, October 07, 2009 3:37 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Queries
First, please do not cross-post messages to both solr-dev and solr-user. Solr-dev is only for development-related discussions.
Comments inline:
On Wed, Oct 7, 2009 at 9:59 AM, Pravin Karne wrote:
> Hi,
> I am new to Solr and have the following questions:
>
> 1. Does Solr work in a distributed environment? If yes, how do I configure it?
Yes, Solr works in a distributed environment. See http://wiki.apache.org/solr/DistributedSearch
> 2. Does Solr have Hadoop support? If yes, how do I set it up with Hadoop/HDFS? (Note: I am familiar with Hadoop.)
Not currently. There is some work going on at https://issues.apache.org/jira/browse/SOLR-1457
> 3. I have 1 TB of employee information (id, name, address, cell no., personal info). To post (index) this data to the Solr server, do I have to create an XML file from the data and then post it? Or is there a more efficient way? In future my data will grow up to 10 TB; how can I index it then? (Creating XML is a headache at that size.)
XML is just one way. You could also use CSV. If you use the SolrJ Java client with Solr 1.4 (soon to be released), it uses an efficient binary format for posting data to Solr.
--
Regards,
Shalin Shekhar Mangar.
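For reference, the distributed query from the test case above can be assembled as follows. This is a small sketch; the helper name distributed_query_url is illustrative, and the host names are the ones used in the message. Note that the DistributedSearch wiki page lists as a limitation that the unique key must be unique across all shards, so two documents with id 1 on different shards are outside what distributed search is designed to handle, and which copy is returned should not be relied upon.

```python
from urllib.parse import urlencode


def distributed_query_url(entry_host, shards, query, path="/solr/select"):
    """Build a select URL that fans the query out to every listed shard."""
    params = {
        # shards takes host:port/path entries, comma separated, no scheme
        "shards": ",".join(shards),
        "q": query,
    }
    # safe=":/," keeps the shard list readable instead of percent-encoding it
    return "http://%s%s?%s" % (entry_host, path, urlencode(params, safe=":/,"))


url = distributed_query_url(
    "server1:8983",
    ["server1:8983/solr", "server2:8983/solr"],
    "1",
)
print(url)
```

Listing every shard with its full host:port/path entry in each request keeps the two entry points equivalent, whichever server receives the query.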
How to deploy an index on Solr
Hi,
I have indexed data with Lucene, and I want to deploy these indexes on Solr for search. Normally we both index and search data with Solr, but in this case I only want to search the existing Lucene indexes. How can we do this?
-Pravin
Does Solr support distributed index storage?
Hi,
I am new to Solr. I have configured Solr successfully and it is working smoothly.
I have one question: I want to index a large amount of data (around 100 GB). Can we store these indexes on different machines, as a distributed system? There would be one master and several slaves, and we have to keep the data in sync across all the nodes. So when I send an update request, Solr would update that record on the corresponding node.
In short, I want to create a scalable and optimal search system. Is this possible with Solr? Please help with this. Any pointers will be highly appreciated.
Thanks in advance,
-Pravin
RE: Does Solr support distributed index storage?
How do I set up master/slave for Solr? What are the configuration steps for this?
-----Original Message-----
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: Friday, October 09, 2009 6:51 PM
To: solr-user@lucene.apache.org
Subject: Re: Does Solr support distributed index storage?
On Fri, Oct 9, 2009 at 6:10 PM, Pravin Karne wrote:
> Hi,
> I am new to Solr. I have configured Solr successfully and it is working smoothly.
>
> I have one question:
>
> I want to index a large amount of data (around 100 GB). Can we store these indexes on different machines, as a distributed system?
Are you talking about one large index with 100 GB of data? Or do you plan to shard the data into multiple smaller indexes and use Solr's distributed search?
> There would be one master and several slaves, and we have to keep the data in sync across all the nodes.
>
> So when I send an update request, Solr would update that record on the corresponding node.
Solr will not update the corresponding node automatically. You have to make sure to send the add/delete request to the master of the correct shard. Solr does not support an update operation (it is always a replace by uniqueKey).
> In short, I want to create a scalable and optimal search system.
>
> Is this possible with Solr?
Of course you can create a scalable and optimal search system with Solr. We do that all the time ;)
--
Regards,
Shalin Shekhar Mangar.
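Since the add/delete request has to go to the master of the correct shard, the indexing client needs a rule that always maps the same uniqueKey to the same shard. One common approach, shown here as a sketch and not as something Solr provides in this setup, is to hash the key and take it modulo the number of shards. The function name shard_for and the master URLs are assumptions for illustration.

```python
import zlib


def shard_for(unique_key, shard_masters):
    """Pick the shard master responsible for a document's uniqueKey."""
    # crc32 is stable across runs and machines, unlike Python's built-in hash()
    index = zlib.crc32(unique_key.encode("utf-8")) % len(shard_masters)
    return shard_masters[index]


# Hypothetical master URLs, one per shard.
masters = ["http://server1:8983/solr", "http://server2:8983/solr"]
first = shard_for("1", masters)
again = shard_for("1", masters)
print(first == again)
```

With such a rule, a later add for the same id reaches the shard that already holds the old copy and replaces it there, so the key stays unique across shards. The mapping changes if the number of shards changes, in which case the data would need to be redistributed.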
RE: Does Solr support distributed index storage?
I am looking at one large index with 100 GB of data. How do I store this on a distributed system?
-Thanks
-----Original Message-----
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: Friday, October 09, 2009 6:51 PM
To: solr-user@lucene.apache.org
Subject: Re: Does Solr support distributed index storage?
On Fri, Oct 9, 2009 at 6:10 PM, Pravin Karne wrote:
> I want to index a large amount of data (around 100 GB). Can we store these indexes on different machines, as a distributed system?
Are you talking about one large index with 100 GB of data? Or do you plan to shard the data into multiple smaller indexes and use Solr's distributed search?
Hadoop configurations for SOLR-1301 patch
Hi,
I am using the SOLR-1301 patch. I have built Solr with the patch applied, but I am not able to configure Hadoop for the resulting war.
I want to run Solr (create the index) on a 3-node (1+2) cluster.
How do I do the Hadoop configuration for this patch? How do I set master and slave?
Thanks,
-Pravin
RE: Hadoop configurations for SOLR-1301 patch
Hi,
The patch (SOLR-1301) provides distributed indexing using Hadoop. I now have a Hadoop cluster with 1 master and 2 slaves. I have also applied the patch to Solr and built Solr.
So how do I integrate the resulting Solr executables with the Hadoop cluster? Can you please tell me the steps for this? Do I just have to copy the Solr war to the Hadoop cluster, or is something else needed?
(Note: I have two setups: 1. a Hadoop setup, 2. a Solr setup.)
How do I bridge these two setups to run distributed indexing?
Thanks,
-Pravin
-----Original Message-----
From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
Sent: Friday, October 16, 2009 7:45 AM
To: solr-user@lucene.apache.org
Subject: Re: Hadoop configurations for SOLR-1301 patch
Hi Pravin,
You'll need to set up a Hadoop cluster, which is independent of SOLR-1301. 1301 is for building Solr indexes only, so there isn't a master and slave. After building the indexes, one needs to provision them to Solr servers. In my case I only have slaves because I'm not incrementally indexing on the Hadoop-generated shards. 1301 does need a Hadoop-specific unit test, which I got started and need to complete; that could help a little in understanding.
-J
On Wed, Oct 14, 2009 at 5:45 AM, Pravin Karne wrote:
> Hi,
> I am using the SOLR-1301 patch. I have built Solr with the patch applied, but I am not able to configure Hadoop for the resulting war.
>
> I want to run Solr (create the index) on a 3-node (1+2) cluster.
>
> How do I do the Hadoop configuration for this patch?
> How do I set master and slave?
>
> Thanks,
> -Pravin