Index structuring

2008-05-31 Thread Ritesh Ambastha

Dear Readers, 

I am new to Solr. I have successfully deployed Solr on my machine, and I
am able to index a large DB table. I am fairly sure that Solr's internal
index structure is capable of handling large data sets.

But if my data keeps growing very quickly, how should the index be
structured? Do I need to follow specific index-structuring patterns or
algorithms to handle such massive data?

I apologize if this sounds like a novice question. I would appreciate your
thoughts and suggestions.

Regards,
Ritesh Ambastha
-- 
View this message in context: 
http://www.nabble.com/Index-structuring-tp17576449p17576449.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to describe 2 entities in dataConfig for the DataImporter?

2008-05-31 Thread Noble Paul നോബിള്‍ नोब्ळ्
Julio,
This looks like a bug.
We can give you a new TemplateTransformer.java, which we will incorporate
into the next patch.
--Noble

On Sat, May 31, 2008 at 12:24 AM, Julio Castillo
<[EMAIL PROTECTED]> wrote:
> I'm sorry Shalin, but I still get the same NullPointerException. This is
> my complete dataconfig.xml (I removed the parallel entity to narrow down the
> scope of the problem).
> 
>  
>query="select id as idAlias,first_name,last_name FROM vets"
>deltaQuery="SELECT id as idAlias FROM vets WHERE last_modified >
> '${dataimporter.last_index_time}'"
>transformer="TemplateTransformer">
> template="vets-${vets.idAlias}"/>
>
>
>
>  
> 
>
> Thanks again.
>
> ** julio
>
> -----Original Message-----
> From: Shalin Shekhar Mangar [mailto:[EMAIL PROTECTED]
> Sent: Friday, May 30, 2008 11:38 AM
> To: solr-user@lucene.apache.org
> Subject: Re: How to describe 2 entities in dataConfig for the DataImporter?
>
> The surname is used just as an example of a field.
>
> The NullPointerException is because the same field "id" tries to use its
> own value in a template. The template cannot contain the same field on which
> it is being applied. I'd suggest that you get the id aliased to another
> name, for example using a query "select id as idAlias from vets" and then
> use:
> 
>
> That should work, let me know if you face a problem.
>
> On Fri, May 30, 2008 at 10:40 PM, Julio Castillo <[EMAIL PROTECTED]>
> wrote:
>> Thanks for all the leads.
>> However, I did get a NullPointerException while implementing it:
>>
>> May 30, 2008 9:57:50 AM org.apache.solr.handler.dataimport.EntityProcessorBase applyTransformer
>> WARNING: transformer threw error
>> java.lang.NullPointerException
>>   at org.apache.solr.handler.dataimport.TemplateTransformer.transformRow(TemplateTransformer.java:55)
>>   at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:186)
>>
>> Looking at the source code, it appears that the resolverMap is null.
>> The resolver returned null given the entityName.
>>
>> Looking at the documentation, there is a reference to an eparent.surname.
>> The example says:
>> > />
>>
>> I'm afraid, I don't know what an eparent.surname is. This is my
>> current dataconfig.xml configuration excerpt:
>>
>> > transformer="TemplateTransformer">
>>...
>>
>> Am I missing a surname? Whatever that may be?
>>
>> Thanks
>>
>> ** julio
>>
>> -----Original Message-----
>> From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED]
>> Sent: Thursday, May 29, 2008 11:10 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: How to describe 2 entities in dataConfig for the DataImporter?
>>
>> Sorry I forgot to mention that.
>> http://wiki.apache.org/solr/DataImportHandler#head-a6916b30b5d7605a990fb03c4ff461b3736496a9
>> --Noble
>>
>> On Fri, May 30, 2008 at 11:37 AM, Shalin Shekhar Mangar
>> <[EMAIL PROTECTED]> wrote:
>>> You need to enable TemplateTransformer for your entity. For example:
>>> >> transformer="TemplateTransformer">
>>>
>>> On Fri, May 30, 2008 at 11:31 AM, Julio Castillo
>>> <[EMAIL PROTECTED]> wrote:
>>>> Noble,
>>>> I tried the template setting for the "id" field, but I didn't notice
>>>> any different behavior. I also didn't see where this would be reflected.
>>>> I looked at the fields and the debug output for the dataImporter and
>>>> couldn't see any reference to a modified id name (per the template
>>>> instructions).
>>>>
>>>> The behavior in the end seemed to be the same. Did I miss anything?
>>>> I assume that the id setting in the
>>>> schema.xml remains the same?
>>>>
>>>> Thanks again
>>>>
>>>> ** julio
>>>>
>>>> -----Original Message-----
>>>> From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED]
>>>> Sent: Thursday, May 29, 2008 9:46 PM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: How to describe 2 entities in dataConfig for the DataImporter?
>>>>
>>>> Consider constructing the id by concatenating an extra string for each
>>>> document. You can construct that field using the TemplateTransformer.
>>>> in the entity owners keep the id as
>>>>
>>>> and in
>>>> vets
>>>>
>>>> or anything else which can make it unique
>>>>
>>>> --Noble
>>>>
>>>> On Fri, May 30, 2008 at 10:05 AM, Shalin Shekhar Mangar
>>>> <[EMAIL PROTECTED]> wrote:
>>>>> That will happen only if id is the uniqueKey in Solr and the ids
>>>>> coming from both your tables have the same values. In that case, they
>>>>> will overwrite each other. You will need a separate uniqueKey (one
>>>>> other than the id field).
>>>>>
>>>>> On Fri, May 30, 2008 at 6:34 AM, Julio Castillo
>>>>> <[EMAIL PROTECTED]> wrote:
>>>>>> Thanks Shalin,
>>>>>> I tried putting everything under the same document (two different
>>>>>> unrelated entities), and got a bit further.
>>>>>>
>>>>>> My problem now appears to be both of them stepping on each other
>>>>>> due to "id"
>>>>>> conflicts. Currently my id is define

Re: How to describe 2 entities in dataConfig for the DataImporter?

2008-05-31 Thread Shalin Shekhar Mangar
Hi Julio,

I've fixed the bug. Can you please replace the existing
TemplateTransformer.java in the SOLR-469.patch with the attached
TemplateTransformer.java file? We'll add the changes to our next
patch. Sorry for all the trouble.
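
For readers following the thread, here is a rough standalone sketch of why a template such as vets-${vets.idAlias} avoids the uniqueKey collision discussed in the quoted messages. It is plain Python, not DataImportHandler code: resolve_template, the sample rows, and the owners template are all hypothetical and only mimic the idea of prefixing each entity's id.

```python
import re

# Matches placeholders of the form ${entity.column}
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve_template(template, entity, row):
    """Replace each ${entity.column} placeholder with the row's value."""
    prefix = entity + "."

    def lookup(match):
        name = match.group(1)
        if not name.startswith(prefix):
            raise KeyError("unknown variable: " + name)
        return str(row[name[len(prefix):]])

    return PLACEHOLDER.sub(lookup, template)


# Hypothetical rows: both tables reuse the same numeric ids.
vets = [{"idAlias": 1}, {"idAlias": 2}]
owners = [{"idAlias": 1}]

# Using the raw id as the unique key makes vet 1 and owner 1 collide.
raw_ids = [row["idAlias"] for row in vets + owners]

# Prefixing the id per entity keeps every document distinct.
templated_ids = (
    [resolve_template("vets-${vets.idAlias}", "vets", row) for row in vets]
    + [resolve_template("owners-${owners.idAlias}", "owners", row) for row in owners]
)

print(len(set(raw_ids)), len(set(templated_ids)))  # prints: 2 3
```

With raw ids, three rows yield only two distinct keys, so one document would overwrite another; with the prefixed ids all three stay distinct.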

On Sat, May 31, 2008 at 10:31 PM, Noble Paul നോബിള്‍ नोब्ळ्
<[EMAIL PROTECTED]> wrote:
> Julio,
> This looks like a bug.
> We can give you a new TemplateTransformer.java, which we will incorporate
> into the next patch.
> --Noble

Re: Index structuring

2008-05-31 Thread Noble Paul നോബിള്‍ नोब्ळ्
You could have been more specific on the dataset size.

If your data volumes are growing you can partition your index into
multiple shards.
http://wiki.apache.org/solr/DistributedSearch
--Noble
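
As a rough illustration of the sharding suggestion: the DistributedSearch wiki page describes a "shards" request parameter that lists the shards to query. The sketch below only builds such a request URL in Python. The host names, ports, and query are placeholders, and build_distributed_query is a hypothetical helper, not part of Solr.

```python
from urllib.parse import urlencode


def build_distributed_query(base_url, shards, query, rows=10):
    """Build a select URL that asks one Solr node to search several shards."""
    params = {
        "q": query,
        "rows": rows,
        # Comma-separated list of shard addresses (host:port/path)
        "shards": ",".join(shards),
    }
    return base_url + "/select?" + urlencode(params)


# Placeholder shard addresses; replace with the real shard hosts.
url = build_distributed_query(
    "http://localhost:8983/solr",
    ["localhost:8983/solr", "localhost:7574/solr"],
    "ipod",
)
print(url)
```

Each shard holds its own part of the index, so documents still need a uniqueKey that is unique across all shards.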

On Sat, May 31, 2008 at 9:02 PM, Ritesh Ambastha <[EMAIL PROTECTED]> wrote:
>
> Dear Readers,
>
> I am new to Solr. I have successfully deployed Solr on my machine, and I
> am able to index a large DB table. I am fairly sure that Solr's internal
> index structure is capable of handling large data sets.
>
> But if my data keeps growing very quickly, how should the index be
> structured? Do I need to follow specific index-structuring patterns or
> algorithms to handle such massive data?
>
> I apologize if this sounds like a novice question. I would appreciate your
> thoughts and suggestions.
>
> Regards,
> Ritesh Ambastha



-- 
--Noble Paul


RE: How to describe 2 entities in dataConfig for the DataImporter?

2008-05-31 Thread Julio Castillo
Not a problem Shalin,
On the contrary, thanks for all your hard work.

I will try it as soon as possible.

** julio 

-----Original Message-----
From: Shalin Shekhar Mangar [mailto:[EMAIL PROTECTED] 
Sent: Saturday, May 31, 2008 10:26 AM
To: solr-user@lucene.apache.org
Subject: Re: How to describe 2 entities in dataConfig for the DataImporter?

Hi Julio,

I've fixed the bug. Can you please replace the existing
TemplateTransformer.java in the SOLR-469.patch with the attached
TemplateTransformer.java file? We'll add the changes to our next patch.
Sorry for all the trouble.
