Solr Queries

2009-10-06 Thread Pravin Karne
Hi,
I am new to Solr. I have the following questions:

1.   Does Solr work in a distributed environment? If yes, how do I configure it?

2.   Does Solr have Hadoop support? If yes, how do I set it up with Hadoop/HDFS?
(Note: I am familiar with Hadoop.)

3.   I have 1 TB of employee information (id, name, address, cell no., personal
info). To post (index) this data to the Solr server, do I have to create an XML
file with this data and then post it to the Solr server, or is there a better
way? In future my data will grow up to 10 TB; how can I index it then? (Creating
XML is a headache.)





Thanks in advance

-Pravin




DISCLAIMER
==
This e-mail may contain privileged and confidential information which is the 
property of Persistent Systems Ltd. It is intended only for the use of the 
individual or entity to which it is addressed. If you are not the intended 
recipient, you are not authorized to read, retain, copy, print, distribute or 
use this message. If you have received this communication in error, please 
notify the sender and delete all copies of this message. Persistent Systems 
Ltd. does not accept any liability for virus infected mails.


How to post (index) a large file of 5 GB or more

2009-10-08 Thread Pravin Karne
Hi,
I am new to Solr. I am able to index, search and update with small files (around
500 MB).
But if I try to index a file of 5 to 10 GB (anything much larger than 500 MB), I
get a heap memory exception.
While investigating, I found that post.jar or post.sh loads the whole file into
memory.

As a workaround I split the large file into smaller files, and that works.

Is there any other way to post a large file? The workaround above is not
feasible for a 1 TB file.

Thanks
-Pravin
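
A minimal sketch of doing the splitting workaround on the fly instead of on disk: records are read lazily and sent to Solr in fixed-size batches, so only one batch is held in memory at a time. The function names, the batch size and the update URL are assumptions for illustration, not something given in this thread; the records are assumed to arrive as dictionaries of field name to value.

```python
import itertools
import urllib.request
from xml.sax.saxutils import escape, quoteattr


def batches(records, size):
    """Yield lists of at most `size` records without materializing the input."""
    iterator = iter(records)
    while True:
        chunk = list(itertools.islice(iterator, size))
        if not chunk:
            return
        yield chunk


def to_add_xml(docs):
    """Render a list of {field: value} dicts as one Solr <add> update message."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            parts.append("<field name=%s>%s</field>" % (quoteattr(name), escape(str(value))))
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def post_in_batches(records, update_url, size=1000):
    """POST each batch to the update handler, then send a single commit.

    `update_url` is an assumption, e.g. http://localhost:8983/solr/update
    """
    headers = {"Content-Type": "text/xml; charset=utf-8"}
    for chunk in batches(records, size):
        body = to_add_xml(chunk).encode("utf-8")
        urllib.request.urlopen(urllib.request.Request(update_url, body, headers)).close()
    commit = urllib.request.Request(update_url, b"<commit/>", headers)
    urllib.request.urlopen(commit).close()
```

Because `records` can be any iterator (for example a generator reading rows from a file or a database cursor), the total input size does not affect the memory needed on the client.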




RE: Solr Queries

2009-10-08 Thread Pravin Karne
Thanks for your help.
Can you please provide the detailed configuration for a distributed Solr environment?
How do I set up master and slave? Which file(s) do I have to change for this?
What are the shard parameters?

Can we integrate ZooKeeper with this?

Please provide details for this.

Thanks in advance.
-Pravin

-Original Message-
From: Sandeep Tagore [mailto:sandeep.tag...@gmail.com]
Sent: Wednesday, October 07, 2009 4:29 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Queries


Hi Pravin,

1. Does Solr work in a distributed environment? If yes, how do I configure it?
Yep. You can achieve this with sharding.
For example: install and configure Solr on two machines and declare one
of them as the master. Supply the shard parameters when you index and search
your data.

2. Does Solr have Hadoop support? If yes, how do I set it up with Hadoop/HDFS?
(Note: I am familiar with Hadoop.)
Sorry. No idea.

3. I have 1 TB of employee information (id, name, address, cell no., personal
info). To post (index) this data to the Solr server, do I have to create an XML
file with this data and then post it to the Solr server, or is there a better
way? In future my data will grow up to 10 TB; how can I index it then? (Creating
XML is a headache.)
I think XML is not the best way; I don't suggest it. If you have that 1 TB of
data in a database, you can do this simply with the DataImportHandler
full-import command. Configure your DB details in solrconfig.xml and
data-config.xml, and add your DB driver jar to the Solr lib directory. Then
import the data in slices (say, department-wise or by some other category). In
future, you can import the data from a DB, or you can index the data directly
using the client API with simple Java beans.
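
The slice-wise import suggested above could be driven by a small script along these lines. This is only a sketch under assumptions: it assumes the DataImportHandler is registered at `/dataimport` in solrconfig.xml, and that the SQL query in data-config.xml filters on a request parameter (here the hypothetical name `dept`, referenced as `${dataimporter.request.dept}`). `clean=false` on later slices keeps the documents imported by earlier ones.

```python
from urllib.parse import urlencode


def full_import_url(base_url, dept, first):
    """Build a DataImportHandler full-import URL for one slice of the data.

    `dept` is a hypothetical request parameter; only the first slice clears
    the existing index (clean=true), later slices add to it.
    """
    params = {
        "command": "full-import",
        "clean": "true" if first else "false",
        "commit": "true",
        "dept": dept,
    }
    return base_url + "/dataimport?" + urlencode(params)


if __name__ == "__main__":
    # Department names are made up for illustration.
    for i, dept in enumerate(["sales", "engineering", "finance"]):
        print(full_import_url("http://localhost:8983/solr", dept, first=(i == 0)))
```

A full-import request returns before the import has finished, so a real driver would wait for each slice to complete (for example by polling `command=status`) before requesting the next one.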

Hope this info helps you.

Regards,
Sandeep Tagore
--
View this message in context: 
http://www.nabble.com/Solr-Quries-tp25780371p25783891.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Solr Queries

2009-10-08 Thread Pravin Karne
Thanks for your reply.
I have one more question regarding a distributed Solr environment.

I have configured Solr on two machines as per
http://wiki.apache.org/solr/DistributedSearch

But I have the following test case:

Suppose I have two machines, Server1 and Server2.

I posted a record with id 1 to Server1 and put another record on Server2 with
the same id, i.e. 1.

When I issue a query like
http://server1:8983/solr/select?shards=server1:8983/solr,server2:8983/solr&q=1
it returns the result from Server1, whereas

http://server2:8983/solr/select?shards=server2:8983/solr,server1/solr&q=1
returns the result from Server2.

How do I solve this?

Is any other setting required for this?

Thanks in advance
-Pravin
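
For reference, the distributed query URLs used in the test case above can be built programmatically so that the `shards` list and the `q` parameter are always joined and encoded the same way. This is a small sketch; the function name is made up, and the host names are the ones from the message.

```python
from urllib.parse import urlencode


def distributed_query_url(entry_point, shards, query):
    """Build a /select URL that fans the query out to every listed shard.

    `entry_point` is the Solr instance that receives the request, and
    `shards` lists host:port/path for each shard to be searched.
    """
    params = {"shards": ",".join(shards), "q": query}
    # Keep ':', '/' and ',' unescaped so the shards list stays readable.
    return "http://%s/select?%s" % (entry_point, urlencode(params, safe=":/,"))


if __name__ == "__main__":
    shards = ["server1:8983/solr", "server2:8983/solr"]
    print(distributed_query_url("server1:8983/solr", shards, "1"))
```

The DistributedSearch wiki page linked above lists the requirement that the uniqueKey field be unique across all shards, which is the situation this test case deliberately violates.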

-Original Message-
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] 
Sent: Wednesday, October 07, 2009 3:37 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Queries

First, please do not cross-post messages to both solr-dev and solr-user.
Solr-dev is only for development related discussions.

Comments inline:

On Wed, Oct 7, 2009 at 9:59 AM, Pravin Karne
wrote:

> Hi,
> I am new to Solr. I have the following questions:
>
>
> 1.   Does Solr work in a distributed environment? If yes, how do I configure
> it?
>

Yes, Solr works in a distributed environment. See
http://wiki.apache.org/solr/DistributedSearch


>
>
>
> 2.   Does Solr have Hadoop support? If yes, how do I set it up with
> Hadoop/HDFS? (Note: I am familiar with Hadoop.)
>
>
Not currently. There is some work going on at
https://issues.apache.org/jira/browse/SOLR-1457


>
>
> 3.   I have 1 TB of employee information (id, name, address, cell no.,
> personal info). To post (index) this data to the Solr server, do I have to
> create an XML file with this data and then post it to the Solr server, or is
> there a better way? In future my data will grow up to 10 TB; how can I index
> it then? (Creating XML is a headache.)
>
>
XML is just one way. You could also use CSV. If you use the SolrJ Java
client with Solr 1.4 (soon to be released), it uses an efficient binary
format for posting data to Solr.
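
As an illustration of the CSV option mentioned above, the sketch below turns records into CSV text with a header row. It assumes the CSV update handler is configured (commonly at `/update/csv`) and is left at its default of reading field names from the first line. The field names come from the employee example in the question; the sample values and the function name are made up.

```python
import csv
import io


def records_to_csv(records, fieldnames):
    """Serialize dict records as CSV with a header row naming the Solr fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    employees = [
        {"id": "1", "name": "Karne, Pravin", "address": "Pune"},
        {"id": "2", "name": "A. Example", "address": "Mumbai"},
    ]
    print(records_to_csv(employees, ["id", "name", "address"]))
```

The `csv` module quotes values that contain commas, so fields such as names or addresses survive the round trip without manual escaping.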

-- 
Regards,
Shalin Shekhar Mangar.



How to deploy an index on Solr

2009-10-09 Thread Pravin Karne
Hi
I have indexed data with Lucene. I want to deploy these indexes on Solr for search.

Generally we index and search data with Solr, but now I want to use Solr only to
search existing Lucene indexes.

How can we do this?

-Pravin



Does Solr support distributed index storage?

2009-10-09 Thread Pravin Karne
Hi,
I am new to Solr. I have configured Solr successfully and it's working smoothly.

I have one question:

I want to index a large amount of data (around 100 GB). Can we store these
indexes on different machines, as a distributed system?

There would be one master and several slaves, and we would have to keep the
data in sync across all the nodes.

So when I send an update request, Solr will update that record on the
corresponding node.

In short, I want to create a scalable and optimal search system.

Is this possible with Solr?

Please help with this. Any pointers will be highly appreciated.

Thanks in advance


-Pravin



RE: Does Solr support distributed index storage?

2009-10-11 Thread Pravin Karne
How do I set up master/slave for Solr?

What are the configuration steps for this?


-Original Message-
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] 
Sent: Friday, October 09, 2009 6:51 PM
To: solr-user@lucene.apache.org
Subject: Re: Does Solr support distributed index storage?

On Fri, Oct 9, 2009 at 6:10 PM, Pravin Karne
wrote:

> Hi,
> I am new to Solr. I have configured Solr successfully and it's working
> smoothly.
>
> I have one question:
>
> I want to index a large amount of data (around 100 GB). Can we store these
> indexes on different machines, as a distributed system?
>
>
Are you talking about one large index with 100GB of data? Or do you plan to
shard the data into multiple smaller indexes and use Solr's distributed
search?


> There would be one master and several slaves, and we would have to keep the
> data in sync across all the nodes.
>
> So when I send an update request, Solr will update that record on the
> corresponding node.
>
>
Solr will not update the corresponding node automatically. You have to make sure
to send the add/delete request to the master of the correct shard. Solr does
not support an update operation (it is always a replace by uniqueKey).
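
Since the client has to pick the master of the correct shard itself, one common approach (not prescribed in this thread) is to derive the shard from a stable hash of the uniqueKey, so that adds, replacements and deletes for a given id always go to the same master. A minimal sketch, with hypothetical master URLs:

```python
import zlib


def shard_for(unique_key, shard_masters):
    """Pick the master that owns `unique_key` using a stable CRC32 hash.

    The same key always maps to the same entry as long as the list of
    masters (and its order) does not change.
    """
    index = zlib.crc32(str(unique_key).encode("utf-8")) % len(shard_masters)
    return shard_masters[index]


if __name__ == "__main__":
    # Hypothetical master URLs, one per shard.
    masters = [
        "http://master1:8983/solr/update",
        "http://master2:8983/solr/update",
    ]
    for key in ["1", "2", "emp-42"]:
        print(key, "->", shard_for(key, masters))
```

Routing by key this way also keeps a given uniqueKey from ending up on two shards. The trade-off is that changing the number of shards changes the mapping, so existing documents would need to be reindexed.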


> In short, I want to create a scalable and optimal search system.
>
> Is this possible with Solr?
>
>
Of course you can create a scalable and optimal search system with Solr. We
do that all the time ;)

-- 
Regards,
Shalin Shekhar Mangar.



RE: Does Solr support distributed index storage?

2009-10-11 Thread Pravin Karne
I am looking at one large index with 100 GB of data.

How do I store this on a distributed system?

-Thanks




Hadoop configurations for SOLR-1301 patch

2009-10-14 Thread Pravin Karne
Hi,
I am using the SOLR-1301 patch. I have built Solr with the given patch,
but I am not able to configure Hadoop for the resulting war.

I want to run Solr (to create the index) on a 3-node (1+2) cluster.

How do I do the Hadoop configuration for the above patch?
How do I set up master and slave?


Thanks
-Pravin






RE: Hadoop configurations for SOLR-1301 patch

2009-10-15 Thread Pravin Karne
Hi,
Patch SOLR-1301 provides distributed indexing (using Hadoop).

Now I have a Hadoop cluster with 1 master and 2 slaves.

I have also applied the above patch to Solr and built Solr.

So how do I integrate the resulting Solr executables with the Hadoop cluster?

Can you please tell me what the steps are for this?

Shall I just copy the Solr war to the Hadoop cluster, or is something else needed?

(Note: I have two setups:
  1. Hadoop setup
  2. Solr setup)

So, to run distributed indexing, how do I bridge these two setups?

Thanks
-Pravin
-Original Message-
From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
Sent: Friday, October 16, 2009 7:45 AM
To: solr-user@lucene.apache.org
Subject: Re: Hadoop configurations for SOLR-1301 patch

Hi Pravin,

You'll need to set up a Hadoop cluster which is independent of
SOLR-1301. 1301 is for building Solr indexes only, so there
isn't a master and slave. After building the indexes one needs
to provision the indexes to Solr servers. In my case I only have
slaves because I'm not incrementally indexing on the Hadoop
generated shards.

1301 does need a Hadoop specific unit test, which I got started
and need to complete, that could help a little in understanding.

-J

On Wed, Oct 14, 2009 at 5:45 AM, Pravin Karne
 wrote:
> Hi,
> I am using the SOLR-1301 patch. I have built Solr with the given patch,
> but I am not able to configure Hadoop for the resulting war.
>
> I want to run Solr (to create the index) on a 3-node (1+2) cluster.
>
> How do I do the Hadoop configuration for the above patch?
> How do I set up master and slave?
>
>
> Thanks
> -Pravin
>
>
>
>
