Re: what does the version parameter in the query mean?

2009-05-22 Thread Shalin Shekhar Mangar
On Fri, May 22, 2009 at 7:40 AM, Anshuman Manur  wrote:

> ah, I see! Thank you so much for the response!
>
> I'm using SolrJ, so I probably don't need to set XML version since the wiki
> tells me that it uses binary as a default!
>
>
Solrj automatically adds the correct version parameter/value. You do not
need to add it yourself.
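
For reference, a minimal SolrJ sketch (the URL and query string are just placeholders); SolrJ picks the response format and the matching version parameter itself, so there is nothing extra to set:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SolrjQueryExample {
  public static void main(String[] args) throws Exception {
    // Placeholder URL -- point this at your own Solr instance
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

    SolrQuery query = new SolrQuery("ipod");
    query.setRows(10);

    // No need to set 'version' or the response format here; SolrJ sends the
    // values that match its own response parser automatically.
    QueryResponse rsp = server.query(query);
    System.out.println("Found " + rsp.getResults().getNumFound() + " docs");
  }
}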

-- 
Regards,
Shalin Shekhar Mangar.


Re: No sanity checks before replicating files?

2009-05-22 Thread Shalin Shekhar Mangar
I think this problem might happen when there are uncommitted changes in S2
and the master S1 comes back online. In that case, the slave's generation is
still less than the master's, and installation of the index diff from the
master may fail.

However, I do not understand a few points. Damien, if S1 comes back online
and S2 starts replicating from S1, any changes to S2 will be discarded when
a successful replication happens. How do you intend to protect against that?

A better way is to detect when S1 comes back online and make it a slave of
S2.
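
For illustration, a rough, untested sketch of that kind of sanity check on the slave side: ask the master's ReplicationHandler for its generation via the indexversion command and only pull if the master is actually ahead. The URL handling and response parsing below are simplified assumptions.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplicationSanityCheck {

  // Returns true only if the master reports a newer generation than ours.
  public static boolean shouldPull(String masterUrl, long localGeneration) throws Exception {
    URL url = new URL(masterUrl + "/replication?command=indexversion");
    StringBuilder body = new StringBuilder();
    BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
    try {
      String line;
      while ((line = in.readLine()) != null) {
        body.append(line);
      }
    } finally {
      in.close();
    }
    // The XML response contains something like <long name="generation">42</long>
    Matcher m = Pattern.compile("name=\"generation\">(\\d+)<").matcher(body);
    if (!m.find()) {
      return false; // could not read the master's generation -- play it safe
    }
    long masterGeneration = Long.parseLong(m.group(1));
    return masterGeneration > localGeneration;
  }
}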

2009/5/22 Noble Paul നോബിള്‍ नोब्ळ् 

> Let us see what the desired behavior is.
>
> When s1 comes back online, s2 must download a fresh copy of the index
> from s1, because s2 is replicating from s1 and has a newer version of the
> index than s1.
>
> Are you suggesting that s2 downloads the index files and then the commit
> fails? The code is written as follows:
>
> boolean freshDownloadneeded = myIndexGeneration >= mastersIndexgeneration;
>
> then it should be a problem
>
> Can you post the stack trace?
>
> On Thu, May 21, 2009 at 11:45 PM, Otis Gospodnetic
>  wrote:
> >
> > Aha, I see.  Perhaps you can post the error message/stack trace?
> >
> > As for the sanity check, I bet a call to 
> > http://host:port/solr/replication?command=indexversion
> > could be used to ensure only newer versions of the index are being pulled.
>  We'll see what Paul says when he wakes up. :)
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> >
> >
> > - Original Message 
> >> From: Damien Tournoud 
> >> To: solr-user@lucene.apache.org
> >> Sent: Thursday, May 21, 2009 1:26:30 PM
> >> Subject: Re: No sanity checks before replicating files?
> >>
> >> Hi Otis,
> >>
> >> Thanks for your answer.
> >>
> >> On Thu, May 21, 2009 at 7:14 PM, Otis Gospodnetic
> >> wrote:
> >> > Interesting, this is similar to my suggestion to another person I just
> >> > replied to here on solr-user.
> >> > Have you actually run into this problem?  I haven't tried it, but I'd
> >> > think the next replication (copying index from s1 to s2) would not
> >> > necessarily fail, but would simply overwrite any changes that were made
> >> > on s2 while it was serving as the master.  Is that not what happens?
> >>
> >> No it doesn't. For some reason, Solr downloads all the files of the
> >> index, but fails to commit the changes locally. At the next poll, the
> >> process restarts. Not only does this clog the network, but it also
> >> unnecessarily uses resources on the newly promoted slave, until we
> >> change its configuration.
> >>
> >> > If that's what happens, then I think what you'd simply have to do is to:
> >> >
> >> > 1) bring s1 back up, but don't make it a master immediately
> >> > 2) take away the master role from s2
> >> > 3) make s1 copy the index from s2, since s2 might have a more up to
> >> >    date index now
> >> > 4) make s1 the master
> >>
> >> Once s2 is the master, we want it to stay this way. We will reassign
> >> s1 as the slave at a later stage, when resources allow. What worries
> >> me is that strange behavior of Solr 1.4 replication when the "slave"
> >> index is fresher than the "master" one.
> >>
> >> Damien
> >
> >
>
>
>
> --
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: How to index large set data

2009-05-22 Thread Jianbin Dai

About 2.8M total docs were created. Only the first run finished. In my 2nd
try, it hangs there forever at the end of indexing (I guess right before
commit), with CPU usage of 100%. In total, 5GB of index files (2050 files)
were created. Now I have two problems:
1. Why does it hang there and fail?
2. How can I speed up the indexing?


Here is my solrconfig.xml

false
3000
1000
2147483647
1
false




--- On Thu, 5/21/09, Noble Paul നോബിള്‍  नोब्ळ्  wrote:

> From: Noble Paul നോബിള്‍  नोब्ळ् 
> Subject: Re: How to index large set data
> To: solr-user@lucene.apache.org
> Date: Thursday, May 21, 2009, 10:39 PM
> What is the total no. of docs created?  I guess it may not be memory
> bound. Indexing is mostly an IO bound operation. You may be able to
> get better perf if an SSD is used (solid state disk).
> 
> On Fri, May 22, 2009 at 10:46 AM, Jianbin Dai 
> wrote:
> >
> > Hi Paul,
> >
> > Thank you so much for answering my questions. It
> really helped.
> After some adjustment, basically setting mergeFactor
> to 1000 from the default value of 10, I could finish the
> whole job in 2.5 hours. I checked that during running time,
> only around 18% of memory is being used, and VIRT is always
> 1418m. I am thinking it may be restricted by JVM memory
> setting. But I run the data import command through web,
> i.e.,
> >
> http://:/solr/dataimport?command=full-import,
> how can I set the memory allocation for JVM?
> > Thanks again!
> >
> > JB
> >
> > --- On Thu, 5/21/09, Noble Paul നോബിള്‍
>  नोब्ळ् 
> wrote:
> >
> >> From: Noble Paul നോബിള്‍
>  नोब्ळ् 
> >> Subject: Re: How to index large set data
> >> To: solr-user@lucene.apache.org
> >> Date: Thursday, May 21, 2009, 9:57 PM
> >> check the status page of DIH and see
> >> if it is working properly. and
> >> if, yes what is the rate of indexing
> >>
> >> On Thu, May 21, 2009 at 11:48 AM, Jianbin Dai
> 
> >> wrote:
> >> >
> >> > Hi,
> >> >
> >> > I have about 45GB xml files to be indexed. I
> am using
> >> DataImportHandler. I started the full import 4
> hours ago,
> >> and it's still running
> >> > My computer has 4GB memory. Any suggestion on
> the
> >> solutions?
> >> > Thanks!
> >> >
> >> > JB
> >> >
> >> >
> >> >
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >>
> -
> >> Noble Paul | Principal Engineer| AOL | http://aol.com
> >>
> >
> >
> >
> >
> >
> 
> 
> 
> -- 
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
> 





Re: Solr statistics of top searches and results returned

2009-05-22 Thread Umar Shah
Hi,

Good feature to have.
Maintaining a top N would also require storing all the search queries
done so far and keeping them updated (or at least within some time window).

Having pluggable persistent storage for all-time search queries would be great.
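
As a starting point, a minimal in-memory sketch of such a running top-N counter (all names are made up, and there is no persistence yet):

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class TopQueryTracker {
  private final ConcurrentHashMap<String, AtomicLong> counts =
      new ConcurrentHashMap<String, AtomicLong>();

  // Call this once per incoming query string.
  public void record(String q) {
    AtomicLong count = counts.get(q);
    if (count == null) {
      AtomicLong fresh = new AtomicLong();
      count = counts.putIfAbsent(q, fresh);
      if (count == null) {
        count = fresh;
      }
    }
    count.incrementAndGet();
  }

  // Sort all entries by count and return the first n query strings.
  public List<String> topN(int n) {
    List<Map.Entry<String, AtomicLong>> entries =
        new ArrayList<Map.Entry<String, AtomicLong>>(counts.entrySet());
    Collections.sort(entries, new Comparator<Map.Entry<String, AtomicLong>>() {
      public int compare(Map.Entry<String, AtomicLong> a, Map.Entry<String, AtomicLong> b) {
        return Long.valueOf(b.getValue().get()).compareTo(a.getValue().get());
      }
    });
    List<String> top = new ArrayList<String>();
    for (int i = 0; i < Math.min(n, entries.size()); i++) {
      top.add(entries.get(i).getKey());
    }
    return top;
  }
}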

Tell me, how can I help?

-umar

On Fri, May 22, 2009 at 12:21 PM, Shalin Shekhar Mangar
 wrote:
> On Fri, May 22, 2009 at 3:22 AM, Grant Ingersoll wrote:
>
>>
>> I think you will want some type of persistence mechanism otherwise you will
>> end up consuming a lot of resources keeping track of all the query strings,
>> unless I'm missing something.  Either a Lucene index (Solr core) or the
>> option of embedding a DB.  Ideally, it would be pluggable such that people
>> could choose their storage mechanism.  Most people do this kind of thing
>> offline via log analysis as logs can grow quite large quite quickly.
>>
>
> For a general case, yes. But I was thinking more of a top 'n' queries as a
> running statistic.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: clustering SOLR-769

2009-05-22 Thread Stanislaw Osinski
Hi there,


> Is it possible to specify more than one snippet field or should I use copy
> field to copy two or three fields into a single field and specify it in
> the snippet field?


Currently, you can specify only one snippet field, so you'd need to use
copy.

Cheers,

S.


solr replication 1.3

2009-05-22 Thread Ashish P

I want to add a master-slave configuration for Solr. I have the following
Solr configuration: I am using Solr 1.3 on Windows, and I am also using
EmbeddedSolrServer. In this case, is it possible to set up a master-slave
configuration?

My second question: if I use Solr 1.4, which has Solr replication in Java,
is it still possible to do Solr replication using EmbeddedSolrServer on
Windows?

Thanks,
Ashish
-- 
View this message in context: 
http://www.nabble.com/solr-replication-1.3-tp23667360p23667360.html
Sent from the Solr - User mailing list archive at Nabble.com.



Solr in cluster

2009-05-22 Thread Reza Safari

Hi,

One of the problems I have with Lucene is the lock obtained by the
IndexWriter. I want to use one Solr instance running inside a cluster behind
the load balancer. Are multiple webservers able to write and commit to
Lucene using Solr without locking issues etc.? Is Solr the solution for the
concurrency problem or do I have to use some JMS queue or something to
update/commit? I can use synchronization techniques to fix concurrency
problems on one webserver, but on more than one webserver... I think you
know what I mean.


Gr, Reza

--
Reza Safari
LUKKIEN
Copernicuslaan 15
6716 BM Ede

The Netherlands
-
http://www.lukkien.com
t: +31 (0) 318 698000

This message is for the designated recipient only and may contain  
privileged, proprietary, or otherwise private information. If you have  
received it in error, please notify the sender immediately and delete  
the original. Any other use of the email by you is prohibited.

Re: clustering SOLR-769

2009-05-22 Thread Grant Ingersoll


On May 22, 2009, at 4:40 AM, Stanislaw Osinski wrote:


Hi there,


Is it possbile to specify more than one snippet field or should I  
use copy
field to copy copy two or three field into single field and specify  
it in

snippet field.



Currently, you can specify only one snippet field, so you'd need to  
use

copy.



Do note, though, that nothing is set in stone on this stuff.  What you  
have right now is a first attempt.  We are definitely open to  
suggestions on improvements.


-Grant


Re: solr replication 1.3

2009-05-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Fri, May 22, 2009 at 3:12 PM, Ashish P  wrote:
>
> I want to add master slave configuration for solr. I have following solr
> configuration:
> I am using solr 1.3 on windows. I am also using EmbeddedSolrServer.
> In this case is it possible to perform master slave configuration??
>
> My second question is if I user solr 1.4 which has solr replication using
> java..
> Still is it possible to do solr replication using EmbeddedSolrServer on
> windows??
No. The replication in 1.4 relies on HTTP transport; for an
EmbeddedSolrServer there is no HTTP endpoint.
>
> Thanks,
> Ashish
> --
> View this message in context: 
> http://www.nabble.com/solr-replication-1.3-tp23667360p23667360.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: How to index large set data

2009-05-22 Thread Grant Ingersoll
Can you parallelize this?  I don't know that the DIH can handle it,
but having multiple threads sending docs to Solr is the best
performance-wise, so maybe you need to look at alternatives to pulling
with DIH and instead use a client to push into Solr.
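
To make that concrete, a rough sketch of pushing from several threads with SolrJ instead of pulling with DIH; the URL and field names are placeholders, and the batch-building loop stands in for your own XML parsing:

import java.util.ArrayList;
import java.util.Collection;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ParallelIndexer {
  public static void main(String[] args) throws Exception {
    // One SolrServer instance can be shared by all indexing threads
    final SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

    ExecutorService pool = Executors.newFixedThreadPool(4);
    for (int t = 0; t < 4; t++) {
      final int threadId = t;
      pool.submit(new Runnable() {
        public void run() {
          try {
            // In a real indexer, parse your XML here and build the documents.
            Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
            for (int i = 0; i < 1000; i++) {
              SolrInputDocument doc = new SolrInputDocument();
              doc.addField("id", "thread-" + threadId + "-doc-" + i);
              batch.add(doc);
            }
            server.add(batch);
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
    server.commit(); // one commit at the end, not one per batch
  }
}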



On May 22, 2009, at 3:42 AM, Jianbin Dai wrote:



about 2.8 m total docs were created. only the first run finishes. In  
my 2nd try, it hangs there forever at the end of indexing, (I guess  
right before commit), with cpu usage of 100%. Total 5G (2050) index  
files are created. Now I have two problems:

1. why it hangs there and failed?
2. how can i speed up the indexing?


Here is my solrconfig.xml

   false
   3000
   1000
   2147483647
   1
   false




--- On Thu, 5/21/09, Noble Paul നോബിള്‍  नो 
ब्ळ्  wrote:


From: Noble Paul നോബിള്‍  नोब्ळ्  


Subject: Re: How to index large set data
To: solr-user@lucene.apache.org
Date: Thursday, May 21, 2009, 10:39 PM
what is the total no:of docs created
?  I guess it may not be memory
bound. indexing is mostly amn IO bound operation. You may
be able to
get a better perf if a SSD is used (solid state disk)

On Fri, May 22, 2009 at 10:46 AM, Jianbin Dai 
wrote:


Hi Paul,

Thank you so much for answering my questions. It

really helped.

After some adjustment, basically setting mergeFactor

to 1000 from the default value of 10, I can finished the
whole job in 2.5 hours. I checked that during running time,
only around 18% of memory is being used, and VIRT is always
1418m. I am thinking it may be restricted by JVM memory
setting. But I run the data import command through web,
i.e.,



http://:/solr/dataimport?command=full-import,
how can I set the memory allocation for JVM?

Thanks again!

JB

--- On Thu, 5/21/09, Noble Paul നോബിള്‍

 नोब्ळ् 
wrote:



From: Noble Paul നോബിള്‍

 नोब्ळ् 

Subject: Re: How to index large set data
To: solr-user@lucene.apache.org
Date: Thursday, May 21, 2009, 9:57 PM
check the status page of DIH and see
if it is working properly. and
if, yes what is the rate of indexing

On Thu, May 21, 2009 at 11:48 AM, Jianbin Dai



wrote:


Hi,

I have about 45GB xml files to be indexed. I

am using

DataImportHandler. I started the full import 4

hours ago,

and it's still running

My computer has 4GB memory. Any suggestion on

the

solutions?

Thanks!

JB









--


-

Noble Paul | Principal Engineer| AOL | http://aol.com











--
-
Noble Paul | Principal Engineer| AOL | http://aol.com







--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: How to index large set data

2009-05-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
There is already an issue for writing to Solr in multiple threads: SOLR-1089.

On Fri, May 22, 2009 at 6:08 PM, Grant Ingersoll  wrote:
> Can you parallelize this?  I don't know that the DIH can handle it, but
> having multiple threads sending docs to Solr is the best performance wise,
> so maybe you need to look at alternatives to pulling with DIH and instead
> use a client to push into Solr.
>
>
> On May 22, 2009, at 3:42 AM, Jianbin Dai wrote:
>
>>
>> about 2.8 m total docs were created. only the first run finishes. In my
>> 2nd try, it hangs there forever at the end of indexing, (I guess right
>> before commit), with cpu usage of 100%. Total 5G (2050) index files are
>> created. Now I have two problems:
>> 1. why it hangs there and failed?
>> 2. how can i speed up the indexing?
>>
>>
>> Here is my solrconfig.xml
>>
>>   false
>>   3000
>>   1000
>>   2147483647
>>   1
>>   false
>>
>>
>>
>>
>> --- On Thu, 5/21/09, Noble Paul നോബിള്‍  नोब्ळ् 
>> wrote:
>>
>>> From: Noble Paul നോബിള്‍  नोब्ळ् 
>>> Subject: Re: How to index large set data
>>> To: solr-user@lucene.apache.org
>>> Date: Thursday, May 21, 2009, 10:39 PM
>>> what is the total no:of docs created
>>> ?  I guess it may not be memory
>>> bound. indexing is mostly amn IO bound operation. You may
>>> be able to
>>> get a better perf if a SSD is used (solid state disk)
>>>
>>> On Fri, May 22, 2009 at 10:46 AM, Jianbin Dai 
>>> wrote:

 Hi Paul,

 Thank you so much for answering my questions. It
>>>
>>> really helped.

 After some adjustment, basically setting mergeFactor
>>>
>>> to 1000 from the default value of 10, I can finished the
>>> whole job in 2.5 hours. I checked that during running time,
>>> only around 18% of memory is being used, and VIRT is always
>>> 1418m. I am thinking it may be restricted by JVM memory
>>> setting. But I run the data import command through web,
>>> i.e.,

>>> http://:/solr/dataimport?command=full-import,
>>> how can I set the memory allocation for JVM?

 Thanks again!

 JB

 --- On Thu, 5/21/09, Noble Paul നോബിള്‍
>>>
>>>  नोब्ळ् 
>>> wrote:

> From: Noble Paul നോബിള്‍
>>>
>>>  नोब्ळ् 
>
> Subject: Re: How to index large set data
> To: solr-user@lucene.apache.org
> Date: Thursday, May 21, 2009, 9:57 PM
> check the status page of DIH and see
> if it is working properly. and
> if, yes what is the rate of indexing
>
> On Thu, May 21, 2009 at 11:48 AM, Jianbin Dai
>>>
>>> 
>
> wrote:
>>
>> Hi,
>>
>> I have about 45GB xml files to be indexed. I
>>>
>>> am using
>
> DataImportHandler. I started the full import 4
>>>
>>> hours ago,
>
> and it's still running
>>
>> My computer has 4GB memory. Any suggestion on
>>>
>>> the
>
> solutions?
>>
>> Thanks!
>>
>> JB
>>
>>
>>
>>
>>
>
>
>
> --
>
>>> -
>
> Noble Paul | Principal Engineer| AOL | http://aol.com
>





>>>
>>>
>>>
>>> --
>>> -
>>> Noble Paul | Principal Engineer| AOL | http://aol.com
>>>
>>
>>
>>
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Solr in cluster

2009-05-22 Thread Otis Gospodnetic

Reza,

You can't have multiple Solr instances write to the same index at the same time.
But you can add documents to a single Solr instance in parallel (e.g. from 
multiple threads of one or more applications) and Solr will do the right thing 
without you having to put JMS or some other type of queue in front of Solr.
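
A small sketch of that pattern, with hypothetical names: each webserver keeps its own SolrJ client pointed at the one Solr URL and simply adds documents; Solr takes care of the concurrent updates on its side.

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class WebserverIndexingClient {
  private final SolrServer solr;

  // Each webserver in the cluster creates its own client pointing at the
  // single Solr instance (the URL is a placeholder).
  public WebserverIndexingClient(String solrUrl) throws Exception {
    this.solr = new CommonsHttpSolrServer(solrUrl);
  }

  public void save(String id, String title) throws Exception {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", id);
    doc.addField("title", title);
    solr.add(doc);   // safe to call concurrently from many webservers/threads
    solr.commit();   // in practice, commit less often or rely on autoCommit
  }
}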


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Reza Safari 
> To: solr-user@lucene.apache.org
> Sent: Friday, May 22, 2009 6:17:56 AM
> Subject: Solr in cluster
> 
> Hi,
> 
> One of the problems I have with Lucene is Lock obtained by the IndexWriter. I 
> want to use one Solr running inside a cluster behind the load balancer. Are 
> multiple webservers able to write and commit to Lucene using Solr with out 
> locking issues etc? Is Solr the solution for concurrency problem or do I have 
> to 
> use some JMS queue or something to update/commit? I can use synchronization 
> technics to fix concurrency problems on one webserver but on more than one 
> webserver, I think that you what I mean.
> 
> Gr, Reza
> 
> --
> Reza Safari
> LUKKIEN
> Copernicuslaan 15
> 6716 BM Ede
> 
> The Netherlands
> -
> http://www.lukkien.com
> t: +31 (0) 318 698000
> 
> This message is for the designated recipient only and may contain privileged, 
> proprietary, or otherwise private information. If you have received it in 
> error, 
> please notify the sender immediately and delete the original. Any other use 
> of 
> the email by you is prohibited.



Re: How to index large set data

2009-05-22 Thread Otis Gospodnetic

Hi,

Those settings are a little "crazy".  Are you sure you want to give Solr/Lucene
3G to buffer documents before flushing them to disk?  Are you sure you want to
use a mergeFactor of 1000?  Check the logs to see if there are any errors.
Look at the index directory to see if Solr is actually still writing to it
(file sizes are changing, number of files is changing).  kill -QUIT the JVM pid
to see where things are "stuck" if they are stuck...


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Jianbin Dai 
> To: solr-user@lucene.apache.org; noble.p...@gmail.com
> Sent: Friday, May 22, 2009 3:42:04 AM
> Subject: Re: How to index large set data
> 
> 
> about 2.8 m total docs were created. only the first run finishes. In my 2nd 
> try, 
> it hangs there forever at the end of indexing, (I guess right before commit), 
> with cpu usage of 100%. Total 5G (2050) index files are created. Now I have 
> two 
> problems:
> 1. why it hangs there and failed?
> 2. how can i speed up the indexing?
> 
> 
> Here is my solrconfig.xml
> 
> false
> 3000
> 1000
> 2147483647
> 1
> false
> 
> 
> 
> 
> --- On Thu, 5/21/09, Noble Paul നോബിള്‍  नोब्ळ् wrote:
> 
> > From: Noble Paul നോബിള്‍  नोब्ळ् 
> > Subject: Re: How to index large set data
> > To: solr-user@lucene.apache.org
> > Date: Thursday, May 21, 2009, 10:39 PM
> > what is the total no:of docs created
> > ?  I guess it may not be memory
> > bound. indexing is mostly amn IO bound operation. You may
> > be able to
> > get a better perf if a SSD is used (solid state disk)
> > 
> > On Fri, May 22, 2009 at 10:46 AM, Jianbin Dai 
> > wrote:
> > >
> > > Hi Paul,
> > >
> > > Thank you so much for answering my questions. It
> > really helped.
> > > After some adjustment, basically setting mergeFactor
> > to 1000 from the default value of 10, I can finished the
> > whole job in 2.5 hours. I checked that during running time,
> > only around 18% of memory is being used, and VIRT is always
> > 1418m. I am thinking it may be restricted by JVM memory
> > setting. But I run the data import command through web,
> > i.e.,
> > >
> > http://:/solr/dataimport?command=full-import,
> > how can I set the memory allocation for JVM?
> > > Thanks again!
> > >
> > > JB
> > >
> > > --- On Thu, 5/21/09, Noble Paul നോബിള്‍
> >  नोब्ळ् 
> > wrote:
> > >
> > >> From: Noble Paul നോബിള്‍
> >  नोब्ळ् 
> > >> Subject: Re: How to index large set data
> > >> To: solr-user@lucene.apache.org
> > >> Date: Thursday, May 21, 2009, 9:57 PM
> > >> check the status page of DIH and see
> > >> if it is working properly. and
> > >> if, yes what is the rate of indexing
> > >>
> > >> On Thu, May 21, 2009 at 11:48 AM, Jianbin Dai
> > 
> > >> wrote:
> > >> >
> > >> > Hi,
> > >> >
> > >> > I have about 45GB xml files to be indexed. I
> > am using
> > >> DataImportHandler. I started the full import 4
> > hours ago,
> > >> and it's still running
> > >> > My computer has 4GB memory. Any suggestion on
> > the
> > >> solutions?
> > >> > Thanks!
> > >> >
> > >> > JB
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >>
> > -
> > >> Noble Paul | Principal Engineer| AOL | http://aol.com
> > >>
> > >
> > >
> > >
> > >
> > >
> > 
> > 
> > 
> > -- 
> > -
> > Noble Paul | Principal Engineer| AOL | http://aol.com
> > 



Re: Solr in cluster

2009-05-22 Thread Reza Safari

Master work. This is exactly what I'm looking for. Now I'm happy :)

Gr, Reza

On May 22, 2009, at 4:23 PM, Otis Gospodnetic wrote:



Reza,

You can't have multiple Solr instances write to the same index at  
the same time.
But you can add documents to a single Solr instance in parallel  
(e.g. from multiple threads of one or more applications) and Solr  
will do the right thing without you having to put JMS or some other  
type of queue in front of Solr.



Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: Reza Safari 
To: solr-user@lucene.apache.org
Sent: Friday, May 22, 2009 6:17:56 AM
Subject: Solr in cluster

Hi,

One of the problems I have with Lucene is Lock obtained by the  
IndexWriter. I
want to use one Solr running inside a cluster behind the load  
balancer. Are
multiple webservers able to write and commit to Lucene using Solr  
with out
locking issues etc? Is Solr the solution for concurrency problem or  
do I have to
use some JMS queue or something to update/commit? I can use  
synchronization
technics to fix concurrency problems on one webserver but on more  
than one

webserver, I think that you what I mean.

Gr, Reza

--
Reza Safari
LUKKIEN
Copernicuslaan 15
6716 BM Ede

The Netherlands
-
http://www.lukkien.com
t: +31 (0) 318 698000

This message is for the designated recipient only and may contain  
privileged,
proprietary, or otherwise private information. If you have received  
it in error,
please notify the sender immediately and delete the original. Any  
other use of

the email by you is prohibited.





--
Reza Safari
LUKKIEN
Copernicuslaan 15
6716 BM Ede

The Netherlands
-
http://www.lukkien.com
t: +31 (0) 318 698000

This message is for the designated recipient only and may contain  
privileged, proprietary, or otherwise private information. If you have  
received it in error, please notify the sender immediately and delete  
the original. Any other use of the email by you is prohibited.

Re: Plugin Not Found

2009-05-22 Thread Jeff Newburn
I have included the configuration and the log for the error on startup. It
does appear it tries to load the lib but then simply can't reference it.
   


explicit
0.01

productId^10.0

personality^15.0
subCategory^20.0
category^10.0
productType^8.0

brandName^10.0
realBrandName^9.5
productNameSearch^20

size^1.2
width^1.0
heelHeight^1.0

productDescription^5.0
color^6.0
price^1.0

expandedGender^0.5


brandName^5.0  productNameSearch^5.0 productDescription^5.0
personality^10.0 subCategory^20.0 category^10.0 productType^8.0


productId, productName, price, originalPrice,
brandNameFacet, productRating, imageUrl, productUrl, isNew, onSale

rord(popularity)^1
100%
1
5
*:*


brandNameFacet,productTypeFacet,productName,categoryFacet,subC
ategoryFacet,personalityFacet,colorFacet,heelHeight,expandedGender
1
1

 
   spellcheck
   facetcube
 






LOGS
May 22, 2009 7:38:24 AM org.apache.catalina.startup.SetAllPropertiesRule
begin
WARNING: [SetAllPropertiesRule]{Server/Service/Connector} Setting property
'maxProcessors' to '500' did not find a matching property.
May 22, 2009 7:38:24 AM org.apache.catalina.startup.SetAllPropertiesRule
begin
WARNING: [SetAllPropertiesRule]{Server/Service/Connector} Setting property
'maxProcessors' to '500' did not find a matching property.
May 22, 2009 7:38:24 AM org.apache.catalina.core.AprLifecycleListener init
INFO: The APR based Apache Tomcat Native library which allows optimal
performance in production environments was not found on the
java.library.path: /usr/local/apr/lib
May 22, 2009 7:38:24 AM org.apache.tomcat.util.net.NioSelectorPool
getSharedSelector
INFO: Using a shared selector for servlet write/read
May 22, 2009 7:38:24 AM org.apache.coyote.http11.Http11NioProtocol init
INFO: Initializing Coyote HTTP/1.1 on http-8080
May 22, 2009 7:38:24 AM org.apache.tomcat.util.net.NioSelectorPool
getSharedSelector
INFO: Using a shared selector for servlet write/read
May 22, 2009 7:38:24 AM org.apache.coyote.http11.Http11NioProtocol init
INFO: Initializing Coyote HTTP/1.1 on http-8443
May 22, 2009 7:38:24 AM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 1011 ms
May 22, 2009 7:38:24 AM org.apache.catalina.core.StandardService start
INFO: Starting service Catalina
May 22, 2009 7:38:24 AM org.apache.catalina.core.StandardEngine start
INFO: Starting Servlet Engine: Apache Tomcat/6.0.16
May 22, 2009 7:38:24 AM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deploying web application archive solr.war
May 22, 2009 7:38:25 AM org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init()
May 22, 2009 7:38:25 AM org.apache.solr.core.SolrResourceLoader
locateInstanceDir
INFO: No /solr/home in JNDI
May 22, 2009 7:38:25 AM org.apache.solr.core.SolrResourceLoader
locateInstanceDir
INFO: using system property solr.solr.home: /home/zetasolr
May 22, 2009 7:38:25 AM org.apache.solr.core.CoreContainer$Initializer
initialize
INFO: looking for solr.xml: /home/zetasolr/solr.xml
May 22, 2009 7:38:25 AM org.apache.solr.core.SolrResourceLoader 
INFO: Solr home set to '/home/zetasolr/'
May 22, 2009 7:38:25 AM org.apache.solr.core.SolrResourceLoader
createClassLoader
INFO: Adding 'file:/home/zetasolr/lib/FacetCubeComponent.jar' to Solr
classloader
May 22, 2009 7:38:25 AM org.apache.solr.core.SolrResourceLoader 
INFO: Solr home set to '/home/zetasolr/cores/zeta-main/'
May 22, 2009 7:38:25 AM org.apache.solr.core.SolrResourceLoader
createClassLoader
INFO: Reusing parent classloader
May 22, 2009 7:38:25 AM org.apache.solr.core.SolrConfig 
INFO: Loaded SolrConfig: solrconfig.xml
May 22, 2009 7:38:25 AM org.apache.solr.schema.IndexSchema readSchema
INFO: Reading Solr Schema
May 22, 2009 7:38:25 AM org.apache.solr.schema.IndexSchema readSchema
INFO: Schema name=Zappos Zeta (zeta-main)
May 22, 2009 7:38:25 AM org.apache.solr.util.plugin.AbstractPluginLoader
load
INFO: created string: org.apache.solr.schema.StrField
May 22, 2009 7:38:25 AM org.apache.solr.util.plugin.AbstractPluginLoader
load
INFO: created boolean: org.apache.solr.schema.BoolField
May 22, 2009 7:38:25 AM org.apache.solr.util.plugin.AbstractPluginLoader
load
INFO: created integer: org.apache.solr.schema.IntField
May 22, 2009 7:38:25 AM org.apache.solr.util.plugin.AbstractPluginLoader
load
INFO: created long: org.apache.solr.schema.LongField
May 22, 2009 7:38:25 AM org.apache.solr.util.plugin.AbstractPluginLoader
load
INFO: created float: org.apache.solr.schema.FloatField
M

Re: How to index large set data

2009-05-22 Thread Jianbin Dai

I don't know exactly what this 3G RAM buffer is used for. But what I noticed
was that both the index size and the file number kept increasing, but it got
stuck in the commit.

--- On Fri, 5/22/09, Otis Gospodnetic  wrote:

> From: Otis Gospodnetic 
> Subject: Re: How to index large set data
> To: solr-user@lucene.apache.org
> Date: Friday, May 22, 2009, 7:26 AM
> 
> Hi,
> 
> Those settings are a little "crazy".  Are you sure you
> want to give Solr/Lucene 3G to buffer documents before
> flushing them to disk?  Are you sure you want to use
> the mergeFactor of 1000?  Checking the logs to see if
> there are any errors.  Look at the index directory to
> see if Solr is actually still writing to it? (file sizes are
> changing, number of files is changing).  kill -QUIT the
> JVM pid to see where things are "stuck" if they are
> stuck...
> 
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> 
> - Original Message 
> > From: Jianbin Dai 
> > To: solr-user@lucene.apache.org;
> noble.p...@gmail.com
> > Sent: Friday, May 22, 2009 3:42:04 AM
> > Subject: Re: How to index large set data
> > 
> > 
> > about 2.8 m total docs were created. only the first
> run finishes. In my 2nd try, 
> > it hangs there forever at the end of indexing, (I
> guess right before commit), 
> > with cpu usage of 100%. Total 5G (2050) index files
> are created. Now I have two 
> > problems:
> > 1. why it hangs there and failed?
> > 2. how can i speed up the indexing?
> > 
> > 
> > Here is my solrconfig.xml
> > 
> >     false
> >     3000
> >     1000
> >     2147483647
> >     1
> >     false
> > 
> > 
> > 
> > 
> > --- On Thu, 5/21/09, Noble Paul
> നോബിള്‍  नोब्ळ् wrote:
> > 
> > > From: Noble Paul നോബിള്‍ 
> नोब्ळ् 
> > > Subject: Re: How to index large set data
> > > To: solr-user@lucene.apache.org
> > > Date: Thursday, May 21, 2009, 10:39 PM
> > > what is the total no:of docs created
> > > ?  I guess it may not be memory
> > > bound. indexing is mostly amn IO bound operation.
> You may
> > > be able to
> > > get a better perf if a SSD is used (solid state
> disk)
> > > 
> > > On Fri, May 22, 2009 at 10:46 AM, Jianbin Dai 
> > > wrote:
> > > >
> > > > Hi Paul,
> > > >
> > > > Thank you so much for answering my
> questions. It
> > > really helped.
> > > > After some adjustment, basically setting
> mergeFactor
> > > to 1000 from the default value of 10, I can
> finished the
> > > whole job in 2.5 hours. I checked that during
> running time,
> > > only around 18% of memory is being used, and VIRT
> is always
> > > 1418m. I am thinking it may be restricted by JVM
> memory
> > > setting. But I run the data import command
> through web,
> > > i.e.,
> > > >
> > > http://:/solr/dataimport?command=full-import,
> > > how can I set the memory allocation for JVM?
> > > > Thanks again!
> > > >
> > > > JB
> > > >
> > > > --- On Thu, 5/21/09, Noble Paul
> നോബിള്‍
> > >  नोब्ळ् 
> > > wrote:
> > > >
> > > >> From: Noble Paul നോബിള്‍
> > >  नोब्ळ् 
> > > >> Subject: Re: How to index large set
> data
> > > >> To: solr-user@lucene.apache.org
> > > >> Date: Thursday, May 21, 2009, 9:57 PM
> > > >> check the status page of DIH and see
> > > >> if it is working properly. and
> > > >> if, yes what is the rate of indexing
> > > >>
> > > >> On Thu, May 21, 2009 at 11:48 AM,
> Jianbin Dai
> > > 
> > > >> wrote:
> > > >> >
> > > >> > Hi,
> > > >> >
> > > >> > I have about 45GB xml files to be
> indexed. I
> > > am using
> > > >> DataImportHandler. I started the full
> import 4
> > > hours ago,
> > > >> and it's still running.
> > > >> > My computer has 4GB memory. Any
> suggestion on
> > > the
> > > >> solutions?
> > > >> > Thanks!
> > > >> >
> > > >> > JB
> > > >> >
> > > >> >
> > > >> >
> > > >> >
> > > >> >
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >>
> > >
> -
> > > >> Noble Paul | Principal Engineer| AOL |
> http://aol.com
> > > >>
> > > >
> > > >
> > > >
> > > >
> > > >
> > > 
> > > 
> > > 
> > > -- 
> > >
> -
> > > Noble Paul | Principal Engineer| AOL | http://aol.com
> > > 
> 
> 






Re: How to index large set data

2009-05-22 Thread Jianbin Dai

If I do the XML parsing by myself and use an embedded client to do the push,
would it be more efficient than DIH?
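
For reference, the push side with an embedded server could look roughly like this; the XML parsing is left out and the paths and field names are made up:

import java.util.ArrayList;
import java.util.Collection;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.core.CoreContainer;

public class EmbeddedPusher {
  public static void main(String[] args) throws Exception {
    System.setProperty("solr.solr.home", "/home/solr");  // placeholder path
    CoreContainer cores = new CoreContainer.Initializer().initialize();
    SolrServer server = new EmbeddedSolrServer(cores, "");

    Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    // Parse your XML however you like (StAX, SAX, ...) and build documents;
    // the loop below just stands in for that step.
    for (int i = 0; i < 10000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", i);
      doc.addField("price", 301.46);
      batch.add(doc);
      if (batch.size() == 1000) {   // push in batches, not one doc at a time
        server.add(batch);
        batch.clear();
      }
    }
    if (!batch.isEmpty()) {
      server.add(batch);
    }
    server.commit();
    cores.shutdown();
  }
}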


--- On Fri, 5/22/09, Grant Ingersoll  wrote:

> From: Grant Ingersoll 
> Subject: Re: How to index large set data
> To: solr-user@lucene.apache.org
> Date: Friday, May 22, 2009, 5:38 AM
> Can you parallelize this?  I
> don't know that the DIH can handle it,  
> but having multiple threads sending docs to Solr is the
> best  
> performance wise, so maybe you need to look at alternatives
> to pulling  
> with DIH and instead use a client to push into Solr.
> 
> 
> On May 22, 2009, at 3:42 AM, Jianbin Dai wrote:
> 
> >
> > about 2.8 m total docs were created. only the first
> run finishes. In  
> > my 2nd try, it hangs there forever at the end of
> indexing, (I guess  
> > right before commit), with cpu usage of 100%. Total 5G
> (2050) index  
> > files are created. Now I have two problems:
> > 1. why it hangs there and failed?
> > 2. how can i speed up the indexing?
> >
> >
> > Here is my solrconfig.xml
> >
> >   
> false
> >   
> 3000
> >   
> 1000
> >   
> 2147483647
> >   
> 1
> >   
> false
> >
> >
> >
> >
> > --- On Thu, 5/21/09, Noble Paul
> നോബിള്‍  नो 
> > ब्ळ् 
> wrote:
> >
> >> From: Noble Paul നോബിള്‍ 
> नोब्ळ्  
> >> 
> >> Subject: Re: How to index large set data
> >> To: solr-user@lucene.apache.org
> >> Date: Thursday, May 21, 2009, 10:39 PM
> >> what is the total no:of docs created
> >> ?  I guess it may not be memory
> >> bound. indexing is mostly amn IO bound operation.
> You may
> >> be able to
> >> get a better perf if a SSD is used (solid state
> disk)
> >>
> >> On Fri, May 22, 2009 at 10:46 AM, Jianbin Dai
> 
> >> wrote:
> >>>
> >>> Hi Paul,
> >>>
> >>> Thank you so much for answering my questions.
> It
> >> really helped.
> >>> After some adjustment, basically setting
> mergeFactor
> >> to 1000 from the default value of 10, I can
> finished the
> >> whole job in 2.5 hours. I checked that during
> running time,
> >> only around 18% of memory is being used, and VIRT
> is always
> >> 1418m. I am thinking it may be restricted by JVM
> memory
> >> setting. But I run the data import command through
> web,
> >> i.e.,
> >>>
> >>
> http://:/solr/dataimport?command=full-import,
> >> how can I set the memory allocation for JVM?
> >>> Thanks again!
> >>>
> >>> JB
> >>>
> >>> --- On Thu, 5/21/09, Noble Paul
> നോബിള്‍
> >>  नोब्ळ् 
> >> wrote:
> >>>
>  From: Noble Paul നോബിള്‍
> >>  नोब्ळ् 
>  Subject: Re: How to index large set data
>  To: solr-user@lucene.apache.org
>  Date: Thursday, May 21, 2009, 9:57 PM
>  check the status page of DIH and see
>  if it is working properly. and
>  if, yes what is the rate of indexing
> 
>  On Thu, May 21, 2009 at 11:48 AM, Jianbin
> Dai
> >> 
>  wrote:
> >
> > Hi,
> >
> > I have about 45GB xml files to be
> indexed. I
> >> am using
>  DataImportHandler. I started the full
> import 4
> >> hours ago,
>  and it's still running
> > My computer has 4GB memory. Any
> suggestion on
> >> the
>  solutions?
> > Thanks!
> >
> > JB
> >
> >
> >
> >
> >
> 
> 
> 
>  --
> 
> >>
> -
>  Noble Paul | Principal Engineer| AOL | http://aol.com
> 
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >> -- 
> >>
> -
> >> Noble Paul | Principal Engineer| AOL | http://aol.com
> >>
> >
> >
> >
> 
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem
> (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
> using Solr/Lucene:
> http://www.lucidimagination.com/search
> 
> 






Filtering query terms

2009-05-22 Thread Branca Marco
Hi,

I am experiencing problems using filters.

I'm using the following version of Solr:
  solr/nightly of 2009-04-12

The part of the schema.xml I'm using for setting filters is the following:


  





  
  





  


and the field I'm querying is a field called "all" declared as follows:



When I try testing the filter "solr.LowerCaseFilterFactory" I get different 
results calling the following urls:

 1. 
http://[server-ip]:[server-port]/solr/[core-name]/select/?q=all%3Apapa&version=2.2&start=0&rows=10&indent=on
 2. 
http://[server-ip]:[server-port]/solr/[core-name]/select/?q=all%3APaPa&version=2.2&start=0&rows=10&indent=on

Besides, when trying to test the "solr.ISOLatin1AccentFilterFactory" I get 
different results calling the following urls:

 1. 
http://[server-ip]:[server-port]/solr/[core-name]/select/?q=all%3Apapa&version=2.2&start=0&rows=10&indent=on
 2. 
http://[server-ip]:[server-port]/solr/[core-name]/select/?q=all%3Apapà&version=2.2&start=0&rows=10&indent=on

Is this the expected behavior or is it a (known) bug? I would like to apply
some filter converting all searched words to the corresponding lowercase
version without accents.

Thanks for your help,

Marco


--
The information transmitted is intended for the person or entity to which it is 
addressed and may contain confidential and/or privileged material. Any review, 
retransmission, dissemination or other use of, or taking of any action in 
reliance upon, this information by persons or entities other than the intended 
recipient is prohibited. If you received this in error, please contact the 
sender and delete the material from any computer.


RE: Filtering query terms

2009-05-22 Thread Ensdorf Ken
> When I try testing the filter "solr.LowerCaseFilterFactory" I get
> different results calling the following urls:
>
>  1. http://[server-ip]:[server-port]/solr/[core-
> name]/select/?q=all%3Apapa&version=2.2&start=0&rows=10&indent=on
>  2. http://[server-ip]:[server-port]/solr/[core-
> name]/select/?q=all%3APaPa&version=2.2&start=0&rows=10&indent=on

In this case, the WordDelimiterFilterFactory is kicking in on your second
search, so "PaPa" is split into "Pa" and "Pa".  You can double-check this by
using the analysis tool in the admin UI -
http://localhost:8983/solr/admin/analysis.jsp

>
> Besides, when trying to test the "solr.ISOLatin1AccentFilterFactory" I
> get different results calling the following urls:
>
>  1. http://[server-ip]:[server-port]/solr/[core-
> name]/select/?q=all%3Apapa&version=2.2&start=0&rows=10&indent=on
>  2. http://[server-ip]:[server-port]/solr/[core-
> name]/select/?q=all%3Apapà&version=2.2&start=0&rows=10&indent=on

Not sure what is happening here, but again I would check it with the analysis
tool.


Re: Multicore Solr not showing Cache Stats

2009-05-22 Thread Otis Gospodnetic

Old email. Hoss, thanks for doing this.  I had a closer look at my 
solrconfig.xml and found that I didn't put  elements around the settings 
for caches.  Solr didn't complain, so I didn't notice earlier...

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Chris Hostetter 
> To: solr-user@lucene.apache.org
> Sent: Tuesday, April 7, 2009 5:41:48 PM
> Subject: Re: Multicore Solr not showing Cache Stats
> 
> 
> : - Going to http://localhost:8983/core1/admin/stats.jsp#cache shows a 
> : nearly empty Cache section.  The only cache that shows up there is 
> : fieldValueCache (which is really commented out in solrconfig.xml, but 
> : Solr creates it anyway, which is normal).  All other caches are missing.
> : 
> : Any ideas why cache stats might not be getting displayed or where I 
> : could look to figure out what's going on?
> 
> Otis: I can't reproduce on the trunk...
> 
> chr...@chrishmposxl:~/lucene/solr/example$ mkdir otis
> chr...@chrishmposxl:~/lucene/solr/example$ cp multicore/solr.xml otis/
> chr...@chrishmposxl:~/lucene/solr/example$ cp -r solr otis/core0
> chr...@chrishmposxl:~/lucene/solr/example$ cp -r solr otis/core1
> chr...@chrishmposxl:~/lucene/solr/example$ java -Dsolr.solr.home=otis -jar 
> start.jar
> 
> http://localhost:8983/solr/core1/admin/stats.jsp#cache
> http://localhost:8983/solr/core0/admin/stats.jsp#cache
> 
> ...both show full cache stats for all of the expected caches.
> 
> 
> are you sure there isn't a bug in your configs? if you set 
> -Dsolr.solr.home=/data/solr_home/cores/core1 can you see the stats for 
> that core?
> 
> 
> -Hoss



R: Filtering query terms

2009-05-22 Thread Branca Marco
Thank you very much for the instantaneous support.
I couldn't find the conflict for hours :(

When I have a response for the ISOLatin1AccentFilterFactory I will write it to
the mailing list.

Thanks again,

Marco

From: Ensdorf Ken [ensd...@zoominfo.com]
Sent: Friday, 22 May 2009 18:16
To: 'solr-user@lucene.apache.org'
Subject: RE: Filtering query terms

> When I try testing the filter "solr.LowerCaseFilterFactory" I get
> different results calling the following urls:
>
>  1. http://[server-ip]:[server-port]/solr/[core-
> name]/select/?q=all%3Apapa&version=2.2&start=0&rows=10&indent=on
>  2. http://[server-ip]:[server-port]/solr/[core-
> name]/select/?q=all%3APaPa&version=2.2&start=0&rows=10&indent=on

In this case, the WordDelimiterFilterFactory is kicking in on your second 
search, so "APaPa" is split into "APa" and "Pa".  You can double-check this by 
using the analysis tool in the admin UI - 
http://localhost:8983/solr/admin/analysis.jsp

>
> Besides, when trying to test the "solr.ISOLatin1AccentFilterFactory" I
> get different results calling the following urls:
>
>  1. http://[server-ip]:[server-port]/solr/[core-
> name]/select/?q=all%3Apapa&version=2.2&start=0&rows=10&indent=on
>  2. http://[server-ip]:[server-port]/solr/[core-
> name]/select/?q=all%3Apapà&version=2.2&start=0&rows=10&indent=on

Not sure what it happening here, but again I would check it with the analysi 
tool

--
The information transmitted is intended for the person or entity to which it is 
addressed and may contain confidential and/or privileged material. Any review, 
retransmission, dissemination or other use of, or taking of any action in 
reliance upon, this information by persons or entities other than the intended 
recipient is prohibited. If you received this in error, please contact the 
sender and delete the material from any computer.


Re: How to index large set data

2009-05-22 Thread Otis Gospodnetic

If the file numbers and index size were increasing, that means Solr was still
working.  It's possible it's taking extra long because of such high settings.
Bring them both down and try.  For example, don't go over 20 with mergeFactor, 
and try just 1GB for ramBufferSizeMB.


Bona fortuna!

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Jianbin Dai 
> To: solr-user@lucene.apache.org
> Sent: Friday, May 22, 2009 11:05:27 AM
> Subject: Re: How to index large set data
> 
> 
> I dont know exactly what is this 3G Ram buffer used. But what I noticed was 
> both 
> index size and file number were keeping increasing, but stuck in the commit. 
> 
> --- On Fri, 5/22/09, Otis Gospodnetic wrote:
> 
> > From: Otis Gospodnetic 
> > Subject: Re: How to index large set data
> > To: solr-user@lucene.apache.org
> > Date: Friday, May 22, 2009, 7:26 AM
> > 
> > Hi,
> > 
> > Those settings are a little "crazy".  Are you sure you
> > want to give Solr/Lucene 3G to buffer documents before
> > flushing them to disk?  Are you sure you want to use
> > the mergeFactor of 1000?  Checking the logs to see if
> > there are any errors.  Look at the index directory to
> > see if Solr is actually still writing to it? (file sizes are
> > changing, number of files is changing).  kill -QUIT the
> > JVM pid to see where things are "stuck" if they are
> > stuck...
> > 
> > 
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > 
> > 
> > 
> > - Original Message 
> > > From: Jianbin Dai 
> > > To: solr-user@lucene.apache.org;
> > noble.p...@gmail.com
> > > Sent: Friday, May 22, 2009 3:42:04 AM
> > > Subject: Re: How to index large set data
> > > 
> > > 
> > > about 2.8 m total docs were created. only the first
> > run finishes. In my 2nd try, 
> > > it hangs there forever at the end of indexing, (I
> > guess right before commit), 
> > > with cpu usage of 100%. Total 5G (2050) index files
> > are created. Now I have two 
> > > problems:
> > > 1. why it hangs there and failed?
> > > 2. how can i speed up the indexing?
> > > 
> > > 
> > > Here is my solrconfig.xml
> > > 
> > > false
> > > 3000
> > > 1000
> > > 2147483647
> > > 1
> > > false
> > > 
> > > 
> > > 
> > > 
> > > --- On Thu, 5/21/09, Noble Paul
> > നോബിള്‍  नोब्ळ् wrote:
> > > 
> > > > From: Noble Paul നോബിള്‍ 
> > नोब्ळ् 
> > > > Subject: Re: How to index large set data
> > > > To: solr-user@lucene.apache.org
> > > > Date: Thursday, May 21, 2009, 10:39 PM
> > > > what is the total no:of docs created
> > > > ?  I guess it may not be memory
> > > > bound. indexing is mostly amn IO bound operation.
> > You may
> > > > be able to
> > > > get a better perf if a SSD is used (solid state
> > disk)
> > > > 
> > > > On Fri, May 22, 2009 at 10:46 AM, Jianbin Dai 
> > > > wrote:
> > > > >
> > > > > Hi Paul,
> > > > >
> > > > > Thank you so much for answering my
> > questions. It
> > > > really helped.
> > > > > After some adjustment, basically setting
> > mergeFactor
> > > > to 1000 from the default value of 10, I can
> > finished the
> > > > whole job in 2.5 hours. I checked that during
> > running time,
> > > > only around 18% of memory is being used, and VIRT
> > is always
> > > > 1418m. I am thinking it may be restricted by JVM
> > memory
> > > > setting. But I run the data import command
> > through web,
> > > > i.e.,
> > > > >
> > > > http://:/solr/dataimport?command=full-import,
> > > > how can I set the memory allocation for JVM?
> > > > > Thanks again!
> > > > >
> > > > > JB
> > > > >
> > > > > --- On Thu, 5/21/09, Noble Paul
> > നോബിള്‍
> > > >  नोब्ळ् 
> > > > wrote:
> > > > >
> > > > >> From: Noble Paul നോബിള്‍
> > > >  नोब्ळ् 
> > > > >> Subject: Re: How to index large set
> > data
> > > > >> To: solr-user@lucene.apache.org
> > > > >> Date: Thursday, May 21, 2009, 9:57 PM
> > > > >> check the status page of DIH and see
> > > > >> if it is working properly. and
> > > > >> if, yes what is the rate of indexing
> > > > >>
> > > > >> On Thu, May 21, 2009 at 11:48 AM,
> > Jianbin Dai
> > > > 
> > > > >> wrote:
> > > > >> >
> > > > >> > Hi,
> > > > >> >
> > > > >> > I have about 45GB xml files to be
> > indexed. I
> > > > am using
> > > > >> DataImportHandler. I started the full
> > import 4
> > > > hours ago,
> > > > >> and it's still running.
> > > > >> > My computer has 4GB memory. Any
> > suggestion on
> > > > the
> > > > >> solutions?
> > > > >> > Thanks!
> > > > >> >
> > > > >> > JB
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >>
> > > >
> > -
> > > > >> Noble Paul | Principal Engineer| AOL |
> > http://aol.com
> > > > >>
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > 
> > > > 
> > > > 
> > > > -- 
> > > >
> > -
> > > > Noble Paul | Principal Engineer| AOL | http://aol.com
>

Re: R: Filtering query terms

2009-05-22 Thread Otis Gospodnetic

Marco,

Open-source can be good like that. :)
See http://www.jroller.com/otis/entry/lucene_solr_nutch_amazing_tech for a 
similar example

Ciao,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Branca Marco 
> To: "solr-user@lucene.apache.org" 
> Sent: Friday, May 22, 2009 12:27:45 PM
> Subject: R: Filtering query terms
> 
> Thank you very much for the instantaneous support.
> I couldn't find the conflict for hours :(
> 
> When I have a response for the ISOLatin1AccentFilterFactory I will write it 
> on 
> the mailing-list.
> 
> Thanks again,
> 
> Marco
> 
> From: Ensdorf Ken [ensd...@zoominfo.com]
> Sent: Friday, 22 May 2009 18:16
> To: 'solr-user@lucene.apache.org'
> Subject: RE: Filtering query terms
> 
> > When I try testing the filter "solr.LowerCaseFilterFactory" I get
> > different results calling the following urls:
> >
> >  1. http://[server-ip]:[server-port]/solr/[core-
> > name]/select/?q=all%3Apapa&version=2.2&start=0&rows=10&indent=on
> >  2. http://[server-ip]:[server-port]/solr/[core-
> > name]/select/?q=all%3APaPa&version=2.2&start=0&rows=10&indent=on
> 
> In this case, the WordDelimiterFilterFactory is kicking in on your second 
> search, so "APaPa" is split into "APa" and "Pa".  You can double-check this 
> by 
> using the analysis tool in the admin UI - 
> http://localhost:8983/solr/admin/analysis.jsp
> 
> >
> > Besides, when trying to test the "solr.ISOLatin1AccentFilterFactory" I
> > get different results calling the following urls:
> >
> >  1. http://[server-ip]:[server-port]/solr/[core-
> > name]/select/?q=all%3Apapa&version=2.2&start=0&rows=10&indent=on
> >  2. http://[server-ip]:[server-port]/solr/[core-
> > name]/select/?q=all%3Apapà&version=2.2&start=0&rows=10&indent=on
> 
> Not sure what it happening here, but again I would check it with the analysi 
> tool
> 
> --
> The information transmitted is intended for the person or entity to which it 
> is 
> addressed and may contain confidential and/or privileged material. Any 
> review, 
> retransmission, dissemination or other use of, or taking of any action in 
> reliance upon, this information by persons or entities other than the 
> intended 
> recipient is prohibited. If you received this in error, please contact the 
> sender and delete the material from any computer.



DIH uses == instead of = in SQL

2009-05-22 Thread Eric Pugh

I am getting this error:

Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException:  
You have an error in your SQL syntax; check the manual that  
corresponds to your MySQL server version for the right syntax to use  
near '=='1433'' at line 1
at  
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)


during a select for a specific institution:

org.apache.solr.handler.dataimport.DataImportHandlerException: Unable
to execute query: select institution_id, name, acronym as i_acronym
from institutions where institution_id=='1433' Processing Document # 1
        at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:248)
        at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:205)
        at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38)
        at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58)
        at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71)


I just switched to using the paired deltaImportQuery and deltaQuery  
approach.   I am using the latest from trunk.  Any ideas?


Eric

-
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com
Free/Busy: http://tinyurl.com/eric-cal






Document Boosts don't seem to be having an effect

2009-05-22 Thread Jodi Showers
Greetings - first post here - hoping someone can direct me - grasping
at straws. Thank you in advance. Jodi



I'm trying to tune the sort order using a combination of document and
query time boosts. When searching for the term 'builder', with almost
identical quantities of this term and a much larger document boost
for doc #1, it seems the score should be much higher for doc #1.


Doc 1 boost - 21.542363409468
Doc 1 scoring - 6.7017727
Doc 2 boost - 12.6390725007673
Doc 2 scoring - 8.00193
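
For comparison, index-time boosts set through SolrJ look roughly like this (the values and field names are only examples); adding debugQuery=on to the search request then shows exactly how each document's score is computed:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BoostedIndexing {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr"); // placeholder URL

    SolrInputDocument doc = new SolrInputDocument();
    doc.setDocumentBoost(21.54f);                     // document-level boost
    doc.addField("id", "211623");
    doc.addField("name_t", "J. Roberts & Associates Interiors", 2.0f); // field boost
    doc.addField("keywords_t", "Builders, Home Builders, Home Contractors");

    // Index-time boosts are folded into the field norms, so a field with
    // omitNorms="true" silently drops them; the debugQuery output shows
    // whether the boost is actually part of the score explanation.
    server.add(doc);
    server.commit();
  }
}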

All fields being searched on are _t fields - all are:



where text is defined as:

positionIncrementGap="100">

  

words="stopwords.txt"/>
generateWordParts="1" generateNumberParts="1" catenateWords="1"  
catenateNumbers="1" catenateAll="0"/>


protected="protwords.txt"/>


  
  

synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
words="stopwords.txt"/>
generateWordParts="1" generateNumberParts="1" catenateWords="0"  
catenateNumbers="0" catenateAll="0"/>


protected="protwords.txt"/>


  


omitNorms isn't indicated - I've tried adding it to the Text  
definition - but no change.



To illustrate I have the following documents (I may be overly verbose):

#1

Companyfield>211623Company: 
211623-79.3761name='lat'>43.6496J. Roberts  
& Associates Interiorsboost='1.0'>J. Roberts & Associates Interiorsname='profile_search_s' boost='1.0'>J. ROBERTS & ASSOCIATES  
INTERIORS

<br />-30 years construction experience
<br />-Quality service, on time, on budget
<br />-All sub-trades are licensed and certified - We  
are fully licensed, insured and covered by WSIB.
<br />-References available from our satisfied  
clients.<br/>


<br/>BUILDER
<br />-Custom Homes , Additions and Major Renovations
<br />-Project Management and Planning - Design /  
Build , Engineering , Permits

<br />-Renovation Advisors for DIY homeowners
<br />KITCHENS & INTERIORS
<br />-Design and planning
<br />-Custom kitchens and interior renovations
<br />-Complete painting services
<br />STRUCTURAL SERVICES
<br />-Engineering , permits required, Foundations and  
underpinning
<br />-Wall removal and beam installation<br/ 
>


<br/>Maintenance and Repairs Services
<br />-Masonry repairs and Stone work  (in house staff )
<br />-Windows and doors
<br />-Eave troughs and metal work<br/>

<br/>J.  
ROBERTS & ASSOCIATES INTERIORS

-30 years construction experience
-Quality service, on time, on budget
-All sub-trades are licensed and certified - We are fully licensed,  
insured and covered by WSIB.

-References available from our satisfied clients.

BUILDER
-Custom Homes , Additions and Major Renovations
-Project Management and Planning - Design / Build , Engineering ,  
Permits

-Renovation Advisors for DIY homeowners
KITCHENS & INTERIORS
-Design and planning
-Custom kitchens and interior renovations
-Complete painting services
STRUCTURAL SERVICES
-Engineering , permits required, Foundations and underpinning
-Wall removal and beam installation

Maintenance and Repairs Services
-Masonry repairs and Stone work  (in house staff )
-Windows and doors
-Eave troughs and metal work

Builders, Home  
Builders, Home Contractors, residential builders, residential  
contractors, home construction companies, design build companies,  
design build contractors, residential building  
contractor,Foundations, ,General Contractors, Residential General  
Contractor, Building Contractor, Additions, Remodeling Contractor,   
Renovation, Builder,Home Additions, General contractor, home  
improvement, building addition, home expansion, house  
expansion,Kitchen & Bathroom - Cabinets & Design,  
Kitchen Cabinet And Counter, Kitchen Cabinet Hardware, Bathroom  
Cabinet, Bathroom Wall Cabinet, Bathroom Sink Cabinet,Kitchen Planning  
& Renovation, Kitchen Planning And Design, Kitchen Cabinet  
Planning, Kitchen Design, Kitchen Remodeling,Masonry &  
Bricklaying, Masonry Supply, Masonry Contractor, Concrete Masonry,  
Stone Masonry, Brick Laying Technique, Brick Laying Pattern, building  
a fireplace, constructing a firplace, stone fence, stone wall, brick  
wall, masonry repair, brick repairs,Paint & Wallpaper  
Contractors, Paint Colors, Paint Store, Paint Brush, Paint Shop, Home  
Wallpaper, Home Decorating, Wallpaper, Paint colour advice, paint  
colour consultants, wallpapering,name='reviews_info_cache_t' boost='0.0'>name='position_rf' boost='0.0'>12.1115name='first_letter_of_name_t' boost='0.0'>Jname='country_t' boost='0.0'>CANADAboost='0.0'>9.8913boost='0.0'>66boost='0.0'>23boost='0.0'>232field>approvedname='category_name_facet'>Buildersname='category_name_facet'>Foundationsname='category_name_facet'>General Contractorsname='category_name_facet'>Home Additi

Re: DIH uses == instead of = in SQL

2009-05-22 Thread Otis Gospodnetic

Eric,

WHERE institution_id=1433
  vs.
WHERE institution_id==1433

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Eric Pugh 
> To: solr-user@lucene.apache.org
> Sent: Friday, May 22, 2009 2:43:59 PM
> Subject: DIH uses == instead of = in SQL
> 
> I am getting this error:
> 
> Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You 
> have 
> an error in your SQL syntax; check the manual that corresponds to your MySQL 
> server version for the right syntax to use near '=='1433'' at line 1
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> 
> during a select for a specific institution:
> 
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to 
> execute 
> query: select institution_id, name, acronym as i_acronym from institutions 
> where 
> institution_id=='1433' Processing Document # 1
> at 
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:248)
> at 
> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:205)
> at 
> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38)
> at 
> org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58)
> at 
> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71)
> 
> I just switched to using the paired deltaImportQuery and deltaQuery approach. 
>  
> I am using the latest from trunk.  Any ideas?
> 
> Eric
> 
> -
> Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | 
> http://www.opensourceconnections.com
> Free/Busy: http://tinyurl.com/eric-cal



Data Import Handler - parentDeltaImport

2009-05-22 Thread Michael Korthuis
I have the data-config.xml detailed below  (Stripped down a bit for
simplicity) -
When I run the delta import, the design_template delta query is running and
modified rows are being returned.  However, the parentDeltaQuery is never
executed.

Any thoughts?

Thanks,

Michael




 


 

 
   
   


 



How to use DIH to index attributes in xml file

2009-05-22 Thread Jianbin Dai

I have an xml file like this 




301.46


In the data-config.xml, I use


but how can I index "id", "mid"?

Thanks.


  


solr machine freeze up during first replication after optimization

2009-05-22 Thread Kyle Lau
Hi all,

We recently started running into this solr slave server freeze up problem.
After looking into the logs and the timing of such occurrences, it seems
that the problem always follows the first replication after an
optimization.  Once the server freezes up, we are unable to ssh into it, but
ping still returns fine.  The only way to recover is by rebooting the
machine.

In our replication setup, the masters are optimized nightly because we have
a fairly large index (~60GB per master) and are adding millions of documents
every day.  After the optimization, a snapshot happens automatically.  When
replication kicks in, the corresponding slave server will retrieve the
snapshot using rsync.

Here is the snappuller.log capturing one of the failed pull and one
successful pull before and after it:

2009/05/21 22:55:01 started by biz360
2009/05/21 22:55:01 command: /mnt/solr/bin/snappuller ...
2009/05/21 22:55:04 pulling snapshot snapshot.20090521221402
2009/05/21 22:55:11 ended (elapsed time: 10 sec)

# optimization completes sometime during this gap, and a new snapshot is
created

2009/05/21 23:55:01 started by biz360
2009/05/21 23:55:01 command: /mnt/solr/bin/snappuller ...
2009/05/21 23:55:02 pulling snapshot snapshot.20090521233922

# slave freezes up, and machine has to be rebooted

2009/05/22 01:55:02 started by biz360
2009/05/22 01:55:02 command: /mnt/solr/bin/snappuller ...
2009/05/22 01:55:03 pulling snapshot snapshot.20090522014528
2009/05/22 02:56:12 ended (elapsed time: 3670 sec)


A more detailed debug log shows snappuller simply stopped at some point:

started by biz360
command: /mnt/solr/bin/snappuller ...
pulling snapshot snapshot.20090521233922
receiving file list ... done
deleting segments_16a
deleting _cwu.tis
deleting _cwu.tii
deleting _cwu.prx
deleting _cwu.nrm
deleting _cwu.frq
deleting _cwu.fnm
deleting _cwt.tis
deleting _cwt.tii
deleting _cwt.prx
deleting _cwt.nrm
deleting _cwt.frq
deleting _cwt.fnm
deleting _cws.tis
deleting _cws.tii
deleting _cws.prx
deleting _cws.nrm
deleting _cws.frq
deleting _cws.fnm
deleting _cwr_1.del
deleting _cwr.tis
deleting _cwr.tii
deleting _cwr.prx
deleting _cwr.nrm
deleting _cwr.frq
deleting _cwr.fnm
deleting _cwq.tis
deleting _cwq.tii
deleting _cwq.prx
deleting _cwq.nrm
deleting _cwq.frq
deleting _cwq.fnm
deleting _cwq.fdx
deleting _cwq.fdt
deleting _cwp.tis
deleting _cwp.tii
deleting _cwp.prx
deleting _cwp.nrm
deleting _cwp.frq
deleting _cwq.fnm
deleting _cwq.fdx
deleting _cwq.fdt
deleting _cwp.tis
deleting _cwp.tii
deleting _cwp.prx
deleting _cwp.nrm
deleting _cwp.frq
deleting _cwp.fnm
deleting _cwp.fdx
deleting _cwp.fdt
deleting _cwo_1.del
deleting _cwo.tis
deleting _cwo.tii
deleting _cwo.prx
deleting _cwo.nrm
deleting _cwo.frq
deleting _cwo.fnm
deleting _cwo.fdx
deleting _cwo.fdt
deleting _cwe_1.del
deleting _cwe.tis
deleting _cwe.tii
deleting _cwe.prx
deleting _cwe.nrm
deleting _cwe.frq
deleting _cwe.fnm
deleting _cwe.fdx
deleting _cwe.fdt
deleting _cw2_3.del
deleting _cw2.tis
deleting _cw2.tii
deleting _cw2.prx
deleting _cw2.nrm
deleting _cw2.frq
deleting _cw2.fnm
deleting _cw2.fdx
deleting _cw2.fdt
deleting _cvs_4.del
deleting _cvs.tis
deleting _cvs.tii
deleting _cvs.prx
deleting _cvs.nrm
deleting _cvs.frq
deleting _cvs.fnm
deleting _cvs.fdx
deleting _cvs.fdt
deleting _csp_h.del
deleting _csp.tis
deleting _csp.tii
deleting _csp.prx
deleting _csp.nrm
deleting _csp.frq
deleting _csp.fnm
deleting _csp.fdx
deleting _csp.fdt
deleting _cpn_q.del
deleting _cpn.tis
deleting _cpn.tii
deleting _cpn.prx
deleting _cpn.nrm
deleting _cpn.frq
deleting _cpn.fnm
deleting _cpn.fdx
deleting _cpn.fdt
deleting _cmk_x.del
deleting _cmk.tis
deleting _cmk.tii
deleting _cmk.prx
deleting _cmk.nrm
deleting _cmk.frq
deleting _cmk.fnm
deleting _cmk.fdx
deleting _cmk.fdt
deleting _cjg_14.del
deleting _cjg.tis
deleting _cjg.tii
deleting _cjg.prx
deleting _cjg.nrm
deleting _cjg.frq
deleting _cjg.fnm
deleting _cjg.fdx
deleting _cjg.fdt
deleting _cge_19.del
deleting _cge.tis
deleting _cge.tii
deleting _cge.prx
deleting _cge.nrm
deleting _cge.frq
deleting _cge.fnm
deleting _cge.fdx
deleting _cge.fdt
deleting _cd9_1m.del
deleting _cd9.tis
deleting _cd9.tii
deleting _cd9.prx
deleting _cd9.nrm
deleting _cd9.frq
deleting _cd9.fnm
deleting _cd9.fdx
deleting _cd9.fdt
./
_cww.fdt

We have random Solr slaves failing in the exact same manner almost daily.
Any help is appreciated!


Re: solr machine freeze up during first replication after optimization

2009-05-22 Thread Otis Gospodnetic

Hm, are you sure this is not a network/switch/disk/something like that problem?
Also, precisely because you have such a large index I'd avoid optimizing the 
index and then replicating it.  My wild guess is that simply rsyncing this much 
data over the network kills your machines.  Have you tried manually doing the 
rsync and watching the machine/switches/NICs/disks to see what's going on?  
That's what I'd do.
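
For example, something along these lines would reproduce the pull by hand
while capping bandwidth and watching disk and network load. This is only a
sketch: the rsync module name, the paths and the ~10 MB/s cap are
illustrative, not taken from your snappuller setup.

  # pull the snapshot manually, throttled, so the slave's NIC and disks are
  # not saturated the way an unthrottled pull would be
  rsync -avW --bwlimit=10000 \
      master-host::solr/snapshot.20090521233922/ \
      /mnt/solr/data/snapshot.20090521233922/

  # on the slave, in another terminal, watch I/O and memory pressure while it runs
  iostat -x 5
  vmstat 5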


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Kyle Lau 
> To: solr-user@lucene.apache.org
> Sent: Friday, May 22, 2009 7:54:53 PM
> Subject: solr machine freeze up during first replication after optimization
> 
> Hi all,
> 
> We recently started running into this solr slave server freeze up problem.
> After looking into the logs and the timing of such occurrences, it seems
> that the problem always follows the first replication after an
> optimization.  Once the server freezes up, we are unable to ssh into it, but
> ping still returns fine.  The only way to recover is by rebooting the
> machine.
> 
> In our replication setup, the masters are optimized nightly because we have
> a fairly large index (~60GB per master) and are adding millions of documents
> every day.  After the optimization, a snapshot happens automatically.  When
> replication kicks in, the corresponding slave server will retrieve the
> snapshot using rsync.
> 
> Here is the snappuller.log capturing one of the failed pull and one
> successful pull before and after it:
> 
> 2009/05/21 22:55:01 started by biz360
> 2009/05/21 22:55:01 command: /mnt/solr/bin/snappuller ...
> 2009/05/21 22:55:04 pulling snapshot snapshot.20090521221402
> 2009/05/21 22:55:11 ended (elapsed time: 10 sec)
> 
> # optimization completes sometime during this gap, and a new snapshot is
> created
> 
> 2009/05/21 23:55:01 started by biz360
> 2009/05/21 23:55:01 command: /mnt/solr/bin/snappuller ...
> 2009/05/21 23:55:02 pulling snapshot snapshot.20090521233922
> 
> # slave freezes up, and machine has to be rebooted
> 
> 2009/05/22 01:55:02 started by biz360
> 2009/05/22 01:55:02 command: /mnt/solr/bin/snappuller ...
> 2009/05/22 01:55:03 pulling snapshot snapshot.20090522014528
> 2009/05/22 02:56:12 ended (elapsed time: 3670 sec)
> 
> 
> A more detailed debug log shows snappuller simply stopped at some point:
> 
> started by biz360
> command: /mnt/solr/bin/snappuller ...
> pulling snapshot snapshot.20090521233922
> receiving file list ... done
> deleting segments_16a
> deleting _cwu.tis
> deleting _cwu.tii
> deleting _cwu.prx
> deleting _cwu.nrm
> deleting _cwu.frq
> deleting _cwu.fnm
> deleting _cwt.tis
> deleting _cwt.tii
> deleting _cwt.prx
> deleting _cwt.nrm
> deleting _cwt.frq
> deleting _cwt.fnm
> deleting _cws.tis
> deleting _cws.tii
> deleting _cws.prx
> deleting _cws.nrm
> deleting _cws.frq
> deleting _cws.fnm
> deleting _cwr_1.del
> deleting _cwr.tis
> deleting _cwr.tii
> deleting _cwr.prx
> deleting _cwr.nrm
> deleting _cwr.frq
> deleting _cwr.fnm
> deleting _cwq.tis
> deleting _cwq.tii
> deleting _cwq.prx
> deleting _cwq.nrm
> deleting _cwq.frq
> deleting _cwq.fnm
> deleting _cwq.fdx
> deleting _cwq.fdt
> deleting _cwp.tis
> deleting _cwp.tii
> deleting _cwp.prx
> deleting _cwp.nrm
> deleting _cwp.frq
> deleting _cwq.fnm
> deleting _cwq.fdx
> deleting _cwq.fdt
> deleting _cwp.tis
> deleting _cwp.tii
> deleting _cwp.prx
> deleting _cwp.nrm
> deleting _cwp.frq
> deleting _cwp.fnm
> deleting _cwp.fdx
> deleting _cwp.fdt
> deleting _cwo_1.del
> deleting _cwo.tis
> deleting _cwo.tii
> deleting _cwo.prx
> deleting _cwo.nrm
> deleting _cwo.frq
> deleting _cwo.fnm
> deleting _cwo.fdx
> deleting _cwo.fdt
> deleting _cwe_1.del
> deleting _cwe.tis
> deleting _cwe.tii
> deleting _cwe.prx
> deleting _cwe.nrm
> deleting _cwe.frq
> deleting _cwe.fnm
> deleting _cwe.fdx
> deleting _cwe.fdt
> deleting _cw2_3.del
> deleting _cw2.tis
> deleting _cw2.tii
> deleting _cw2.prx
> deleting _cw2.nrm
> deleting _cw2.frq
> deleting _cw2.fnm
> deleting _cw2.fdx
> deleting _cw2.fdt
> deleting _cvs_4.del
> deleting _cvs.tis
> deleting _cvs.tii
> deleting _cvs.prx
> deleting _cvs.nrm
> deleting _cvs.frq
> deleting _cvs.fnm
> deleting _cvs.fdx
> deleting _cvs.fdt
> deleting _csp_h.del
> deleting _csp.tis
> deleting _csp.tii
> deleting _csp.prx
> deleting _csp.nrm
> deleting _csp.frq
> deleting _csp.fnm
> deleting _csp.fdx
> deleting _csp.fdt
> deleting _cpn_q.del
> deleting _cpn.tis
> deleting _cpn.tii
> deleting _cpn.prx
> deleting _cpn.nrm
> deleting _cpn.frq
> deleting _cpn.fnm
> deleting _cpn.fdx
> deleting _cpn.fdt
> deleting _cmk_x.del
> deleting _cmk.tis
> deleting _cmk.tii
> deleting _cmk.prx
> deleting _cmk.nrm
> deleting _cmk.frq
> deleting _cmk.fnm
> deleting _cmk.fdx
> deleting _cmk.fdt
> deleting _cjg_14.del
> deleting _cjg.tis
> deleting _cjg.tii
> deleting _cjg.prx
> deleting _cjg.nrm
> deleting _cjg.frq
> deleting _cjg.fnm
> deleting _cjg.fdx
> deleting _cjg.fdt
> deleting _cge_19.del
> deleting _cge.

questions about Clustering

2009-05-22 Thread Koji Sekiguchi
I'm thinking of using the clustering (SOLR-769) function for my project.

I have a couple of questions:

1. if q=*:* is requested, Carrot2 will receive "MatchAllDocsQuery"
via attributes. Is it OK?

2. I'd like to use it in an environment other than English, e.g. Japanese.
I've implemented Carrot2JapaneseAnalyzer (w/ Payload/ITokenType)
for this purpose.
It worked well with ClusteringDocumentList example, but didn't
work with CarrotClusteringEngine.

What I did was insert the following lines (marked with +) into
CarrotClusteringEngine:

attributes.put(AttributeNames.QUERY, query.toString());
+ attributes.put(AttributeUtils.getKey(Tokenizer.class, "analyzer"),
+ Carrot2JapaneseAnalyzer.class);

There are no runtime errors, but Carrot2 didn't use my analyzer;
it just ignored it and used ExtendedWhitespaceAnalyzer (confirmed via
debugger).
Is it a classloader problem? I placed my jar in ${solr.solr.home}/lib .

Thank you,

Koji




Re: DIH uses == instead of = in SQL

2009-05-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
Are you using delta-import without a deltaImportQuery? Please paste the
relevant portion of your data-config.xml.
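
For reference, the paired form typically ends up looking something like the
sketch below, with a single '=' in the deltaImportQuery. Only the select
list is taken from Eric's error message; the entity name and the
last_modified column are assumptions, not his actual config.

  <entity name="institution" pk="institution_id"
          query="select institution_id, name, acronym as i_acronym from institutions"
          deltaQuery="select institution_id from institutions
                      where last_modified > '${dataimporter.last_index_time}'"
          deltaImportQuery="select institution_id, name, acronym as i_acronym from institutions
                            where institution_id = '${dataimporter.delta.institution_id}'">
    ...
  </entity>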

On Sat, May 23, 2009 at 12:13 AM, Eric Pugh
 wrote:
> I am getting this error:
>
> Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You
> have an error in your SQL syntax; check the manual that corresponds to your
> MySQL server version for the right syntax to use near '=='1433'' at line 1
>        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
>
> during a select for a specific institution:
>
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
> execute query: select institution_id, name, acronym as i_acronym from
> institutions where institution_id=='1433' Processing Document # 1
>        at
> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:248)
>        at
> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:205)
>        at
> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38)
>        at
> org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58)
>        at
> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71)
>
> I just switched to using the paired deltaImportQuery and deltaQuery
> approach.   I am using the latest from trunk.  Any ideas?
>
> Eric
>
> -
> Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com
> Free/Busy: http://tinyurl.com/eric-cal
>
>
>
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: How to use DIH to index attributes in xml file

2009-05-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
Wildcards are not supported. You must use the full xpath.

On Sat, May 23, 2009 at 4:55 AM, Jianbin Dai  wrote:
>
> I have an xml file like this
>
> 
>                    
>                    
>                    301.46
> 
>
> In the data-config.xml, I use
> 
>
> but how can I index "id", "mid"?
>
> Thanks.
>
>
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: How to index large set data

2009-05-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
No need to use an embedded SolrServer. You can use SolrJ with streaming
in multiple threads.
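
A minimal sketch of that approach, assuming the 1.4 StreamingUpdateSolrServer;
the URL, queue size, thread count and field names below are illustrative, not
taken from Jianbin's setup.

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class BulkIndexer {
      public static void main(String[] args) throws Exception {
          // queueSize=100, threadCount=4: four background threads stream queued
          // documents to Solr while the main thread keeps parsing and adding
          SolrServer server =
              new StreamingUpdateSolrServer("http://localhost:8983/solr", 100, 4);

          for (int i = 0; i < 1000; i++) {       // stand-in for the XML parsing loop
              SolrInputDocument doc = new SolrInputDocument();
              doc.addField("id", "doc-" + i);
              doc.addField("price", 301.46);     // hypothetical field
              server.add(doc);                   // queued and sent in the background
          }
          server.commit();                       // one commit at the end
      }
  }

The add() calls return quickly because documents are buffered and sent by the
background threads, which is usually enough to keep a single parsing thread busy.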

On Fri, May 22, 2009 at 8:36 PM, Jianbin Dai  wrote:
>
> If I do the xml parsing by myself and use embedded client to do the push, 
> would it be more efficient than DIH?
>
>
> --- On Fri, 5/22/09, Grant Ingersoll  wrote:
>
>> From: Grant Ingersoll 
>> Subject: Re: How to index large set data
>> To: solr-user@lucene.apache.org
>> Date: Friday, May 22, 2009, 5:38 AM
>> Can you parallelize this?  I
>> don't know that the DIH can handle it,
>> but having multiple threads sending docs to Solr is the
>> best
>> performance wise, so maybe you need to look at alternatives
>> to pulling
>> with DIH and instead use a client to push into Solr.
>>
>>
>> On May 22, 2009, at 3:42 AM, Jianbin Dai wrote:
>>
>> >
>> > about 2.8 m total docs were created. only the first
>> run finishes. In
>> > my 2nd try, it hangs there forever at the end of
>> indexing, (I guess
>> > right before commit), with cpu usage of 100%. Total 5G
>> (2050) index
>> > files are created. Now I have two problems:
>> > 1. why it hangs there and failed?
>> > 2. how can i speed up the indexing?
>> >
>> >
>> > Here is my solrconfig.xml
>> >
>> >
>> false
>> >
>> 3000
>> >
>> 1000
>> >
>> 2147483647
>> >
>> 1
>> >
>> false
>> >
>> >
>> >
>> >
>> > --- On Thu, 5/21/09, Noble Paul
>> നോബിള്‍  नो
>> > ब्ळ् 
>> wrote:
>> >
>> >> From: Noble Paul നോബിള്‍
>> नोब्ळ्
>> >> 
>> >> Subject: Re: How to index large set data
>> >> To: solr-user@lucene.apache.org
>> >> Date: Thursday, May 21, 2009, 10:39 PM
>> >> what is the total no:of docs created
>> >> ?  I guess it may not be memory
>> >> bound. indexing is mostly amn IO bound operation.
>> You may
>> >> be able to
>> >> get a better perf if a SSD is used (solid state
>> disk)
>> >>
>> >> On Fri, May 22, 2009 at 10:46 AM, Jianbin Dai
>> 
>> >> wrote:
>> >>>
>> >>> Hi Paul,
>> >>>
>> >>> Thank you so much for answering my questions.
>> It
>> >> really helped.
>> >>> After some adjustment, basically setting
>> mergeFactor
>> >> to 1000 from the default value of 10, I can
>> finished the
>> >> whole job in 2.5 hours. I checked that during
>> running time,
>> >> only around 18% of memory is being used, and VIRT
>> is always
>> >> 1418m. I am thinking it may be restricted by JVM
>> memory
>> >> setting. But I run the data import command through
>> web,
>> >> i.e.,
>> >>>
>> >>
>> http://:/solr/dataimport?command=full-import,
>> >> how can I set the memory allocation for JVM?
>> >>> Thanks again!
>> >>>
>> >>> JB
>> >>>
>> >>> --- On Thu, 5/21/09, Noble Paul
>> നോബിള്‍
>> >>  नोब्ळ् 
>> >> wrote:
>> >>>
>>  From: Noble Paul നോബിള്‍
>> >>  नोब्ळ् 
>>  Subject: Re: How to index large set data
>>  To: solr-user@lucene.apache.org
>>  Date: Thursday, May 21, 2009, 9:57 PM
>>  check the status page of DIH and see
>>  if it is working properly. and
>>  if, yes what is the rate of indexing
>> 
>>  On Thu, May 21, 2009 at 11:48 AM, Jianbin
>> Dai
>> >> 
>>  wrote:
>> >
>> > Hi,
>> >
>> > I have about 45GB xml files to be
>> indexed. I
>> >> am using
>>  DataImportHandler. I started the full
>> import 4
>> >> hours ago,
>>  and it's still running
>> > My computer has 4GB memory. Any
>> suggestion on
>> >> the
>>  solutions?
>> > Thanks!
>> >
>> > JB
>> >
>> >
>> >
>> >
>> >
>> 
>> 
>> 
>>  --
>> 
>> >>
>> -
>>  Noble Paul | Principal Engineer| AOL | http://aol.com
>> 
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >>
>> -
>> >> Noble Paul | Principal Engineer| AOL | http://aol.com
>> >>
>> >
>> >
>> >
>>
>> --
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem
>> (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>> using Solr/Lucene:
>> http://www.lucidimagination..com/search
>>
>>
>
>
>
>
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Data Import Handler - parentDeltaImport

2009-05-22 Thread Noble Paul നോബിള്‍ नोब्ळ्
How do you know it is not being executed? Use a deltaImportQuery as well
if you are using Solr 1.4.
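
Sketched out, that means giving the parent entity its own deltaImportQuery,
so the catalog_item_id values emitted by the child's parentDeltaQuery can be
turned back into full parent rows. The select lists below are assumptions
pieced together from Michael's config, not a tested configuration.

  <entity name="catalog_item" pk="catalog_item_id"
          query="select catalog_item_id, catalog_item_code from catalog_item"
          deltaQuery="select catalog_item_id from catalog_item
                      where date_updated > '${dataimporter.last_index_time}'"
          deltaImportQuery="select catalog_item_id, catalog_item_code from catalog_item
                            where catalog_item_id = '${dataimporter.delta.catalog_item_id}'">
    <entity name="design_template" pk="design_template_id"
            query="select name from design_template
                   where design_template_id = '${catalog_item.design_template_id_fk}'"
            deltaQuery="select design_template_id from design_template
                        where date_updated > '${dataimporter.last_index_time}'"
            parentDeltaQuery="select catalog_item_id from catalog_item
                              where design_template_id_fk = '${design_template.design_template_id}'"/>
  </entity>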

On Sat, May 23, 2009 at 4:29 AM, Michael Korthuis
 wrote:
> I have the data-config.xml detailed below  (Stripped down a bit for
> simplicity) -
> When I run the delta import, the design_template delta query is running and
> modified rows are being returned.  However, the parentDeltaQuery is never
> executed.
>
> Any thoughts?
>
> Thanks,
>
> Michael
>
>
> 
>  user="USER" password="PASSWORD"/>
>  
>  pk="catalog_item_id"
>  query="select catalog_item_id,catalog_item_code  from catalog_item"
> deltaQuery="select catalog_item_id from catalog_item where date_updated >
> '${dataimporter.last_index_time}'"
>  deletedPkQuery="select catalog_item_id from catalog_item_delete where
> date_deleted > '${dataimporter.last_index_time}'"
>>
>
>  
>
>   id="design_template_id" pk="design_template_id"
>  query="select name from design_template where
> design_template_id='${catalog_item.design_template_id_fk}'"
>  deltaQuery="select design_template_id from design_template where
> date_updated > '${dataimporter.last_index_time}'"
> parentDeltaQuery="select catalog_item_id from catalog_item where
> design_template_id_fk = '${design_template.design_template_id}'"
>  >
>
>  
>
>  
>
>  
>  
> 
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: How to index large set data

2009-05-22 Thread Jianbin Dai

Hi Paul, but in your previous post, you said "there is already an issue for
writing to Solr in multiple threads, SOLR-1089". Do you think using SolrJ alone
would be better than DIH?
Thanks, and have a good weekend!

--- On Fri, 5/22/09, Noble Paul നോബിള്‍  नोब्ळ्  wrote:

> no need to use embedded Solrserver.
> you can use SolrJ with streaming
> in multiple threads
> 
> On Fri, May 22, 2009 at 8:36 PM, Jianbin Dai 
> wrote:
> >
> > If I do the xml parsing by myself and use embedded
> client to do the push, would it be more efficient than DIH?
> >
> >
> > --- On Fri, 5/22/09, Grant Ingersoll 
> wrote:
> >
> >> From: Grant Ingersoll 
> >> Subject: Re: How to index large set data
> >> To: solr-user@lucene.apache.org
> >> Date: Friday, May 22, 2009, 5:38 AM
> >> Can you parallelize this?  I
> >> don't know that the DIH can handle it,
> >> but having multiple threads sending docs to Solr
> is the
> >> best
> >> performance wise, so maybe you need to look at
> alternatives
> >> to pulling
> >> with DIH and instead use a client to push into
> Solr.
> >>
> >>
> >> On May 22, 2009, at 3:42 AM, Jianbin Dai wrote:
> >>
> >> >
> >> > about 2.8 m total docs were created. only the
> first
> >> run finishes. In
> >> > my 2nd try, it hangs there forever at the end
> of
> >> indexing, (I guess
> >> > right before commit), with cpu usage of 100%.
> Total 5G
> >> (2050) index
> >> > files are created. Now I have two problems:
> >> > 1. why it hangs there and failed?
> >> > 2. how can i speed up the indexing?
> >> >
> >> >
> >> > Here is my solrconfig.xml
> >> >
> >> >
> >>
> false
> >> >
> >>
> 3000
> >> >
> >> 1000
> >> >
> >>
> 2147483647
> >> >
> >>
> 1
> >> >
> >>
> false
> >> >
> >> >
> >> >
> >> >
> >> > --- On Thu, 5/21/09, Noble Paul
> >> നോബിള്‍  नो
> >> > ब्ळ् 
> >> wrote:
> >> >
> >> >> From: Noble Paul നോബിള്‍
> >> नोब्ळ्
> >> >> 
> >> >> Subject: Re: How to index large set data
> >> >> To: solr-user@lucene.apache.org
> >> >> Date: Thursday, May 21, 2009, 10:39 PM
> >> >> what is the total no:of docs created
> >> >> ?  I guess it may not be memory
> >> >> bound. indexing is mostly amn IO bound
> operation.
> >> You may
> >> >> be able to
> >> >> get a better perf if a SSD is used (solid
> state
> >> disk)
> >> >>
> >> >> On Fri, May 22, 2009 at 10:46 AM, Jianbin
> Dai
> >> 
> >> >> wrote:
> >> >>>
> >> >>> Hi Paul,
> >> >>>
> >> >>> Thank you so much for answering my
> questions.
> >> It
> >> >> really helped.
> >> >>> After some adjustment, basically
> setting
> >> mergeFactor
> >> >> to 1000 from the default value of 10, I
> can
> >> finished the
> >> >> whole job in 2.5 hours. I checked that
> during
> >> running time,
> >> >> only around 18% of memory is being used,
> and VIRT
> >> is always
> >> >> 1418m. I am thinking it may be restricted
> by JVM
> >> memory
> >> >> setting. But I run the data import
> command through
> >> web,
> >> >> i.e.,
> >> >>>
> >> >>
> >>
> http://:/solr/dataimport?command=full-import,
> >> >> how can I set the memory allocation for
> JVM?
> >> >>> Thanks again!
> >> >>>
> >> >>> JB
> >> >>>
> >> >>> --- On Thu, 5/21/09, Noble Paul
> >> നോബിള്‍
> >> >>  नोब्ळ् 
> >> >> wrote:
> >> >>>
> >>  From: Noble Paul
> നോബിള്‍
> >> >>  नोब्ळ् 
> >>  Subject: Re: How to index large
> set data
> >>  To: solr-u...@lucene.apache..org
> >>  Date: Thursday, May 21, 2009,
> 9:57 PM
> >>  check the status page of DIH and
> see
> >>  if it is working properly. and
> >>  if, yes what is the rate of
> indexing
> >> 
> >>  On Thu, May 21, 2009 at 11:48 AM,
> Jianbin
> >> Dai
> >> >> 
> >>  wrote:
> >> >
> >> > Hi,
> >> >
> >> > I have about 45GB xml files
> to be
> >> indexed. I
> >> >> am using
> >>  DataImportHandler. I started the
> full
> >> import 4
> >> >> hours ago,
> >>  and it's still running.
> >> > My computer has 4GB memory.
> Any
> >> suggestion on
> >> >> the
> >>  solutions?
> >> > Thanks!
> >> >
> >> > JB
> >> >
> >> >
> >> >
> >> >
> >> >
> >> 
> >> 
> >> 
> >>  --
> >> 
> >> >>
> >>
> -
> >>  Noble Paul | Principal Engineer|
> AOL | http://aol.com
> >> 
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >>
> >>
> -
> >> >> Noble Paul | Principal Engineer| AOL | http://aol.com
> >> >>
> >> >
> >> >
> >> >
> >>
> >> --
> >> Grant Ingersoll
> >> http://www.lucidimagination.com/
> >>
> >> Search the Lucene ecosystem
> >> (Lucene/Solr/Nutch/Mahout/Tika/Droids)
> >> using Solr/Lucene:
> >> http://www.lucidimagination...com/search
> >>
> >>
> >
> >
> >
> >
> >
> 
> 
> 
> -- 
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
> 






Re: How to use DIH to index attributes in xml file

2009-05-22 Thread Jianbin Dai

Oh, I guess I didn't say it clearly in my post. 
I didn't use wild cards in xpath. My question was how to index attributes "id" 
and "mid" in the following xml file.




301.46


In the data-config.xml, I use


but what are the xpath for "id" and "mid"?

Thanks again!





--- On Fri, 5/22/09, Noble Paul നോബിള്‍  नोब्ळ्  wrote:

> From: Noble Paul നോബിള്‍  नोब्ळ् 
> Subject: Re: How to use DIH to index attributes in xml file
> To: solr-user@lucene.apache.org
> Date: Friday, May 22, 2009, 9:03 PM
> wild cards are not supported . u must
> use full xpath
> 
> On Sat, May 23, 2009 at 4:55 AM, Jianbin Dai 
> wrote:
> >
> > I have an xml file like this
> >
> > 
> >                     type="stock-4" />
> >                     type="cond-0" />
> >                  
>  301.46
> > 
> >
> > In the data-config.xml, I use
> >   xpath="/.../merchantProduct/price" />
> >
> > but how can I index "id", "mid"?
> >
> > Thanks.
> >
> >
> >
> >
> 
> 
> 
> -- 
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
> 






Re: How to use DIH to index attributes in xml file

2009-05-22 Thread Shalin Shekhar Mangar
On Sat, May 23, 2009 at 10:31 AM, Jianbin Dai  wrote:

>
> Oh, I guess I didn't say it clearly in my post.
> I didn't use wild cards in xpath. My question was how to index attributes
> "id" and "mid" in the following xml file.
>
> 
>
>
>301.46
> 
>
> In the data-config.xml, I use
> 
>
> but what are the xpath for "id" and "mid"?
>

That would be /merchantProduct/@id and /merchantProduct/@mid
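
Concretely, the field entries would then look something like the lines below.
This assumes merchantProduct is the element shown in the thread; if it sits
deeper in the file, prefix each xpath with the same parent path already used
for the price field.

  <field column="id"    xpath="/merchantProduct/@id" />
  <field column="mid"   xpath="/merchantProduct/@mid" />
  <field column="price" xpath="/merchantProduct/price" />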

-- 
Regards,
Shalin Shekhar Mangar.