Solr1.3 / MySql / Tomcat55 multiple delta-import inside a big full-import

2008-10-31 Thread sunnyfr
Hi, I would like to know whether it takes much longer to run a limited full-import followed by multiple delta-imports to index the whole database. If I fire a full-import without a limit of 4M in my request, it runs OOM because I have 8.5M documents. If I fire a full-import without a limit and with batchsize=-1, I will s

Re: Performance Lucene / Solr

2008-10-31 Thread Kraus, Ralf | pixelhouse GmbH
Hey, I think it will have the disadvantage of being a lot slower though... How were you handling things with Lucene? You must have used Java then? If you even want to get close to that performance I think you need to use non http embedded solr. I am using this : - I wrote a JAVA JSP file to

Re: DataImportHandler running out of memory

2008-10-31 Thread sunnyfr
Hi Grant, How did you finally manage it? I have the same problem with less data (8.5M documents). If I set batchsize to -1, I slow down the database a lot, which is not good for the website and makes requests stack up. What did you do? Thanks, Grant Ingersoll-6 wrote: > > I think it's a bit di

Re: Changing mergeFactor in mid-stream?

2008-10-31 Thread Mark Miller
Otis Gospodnetic wrote: Yes, you can change the mergeFactor. More important than the mergeFactor is this: <ramBufferSizeMB>32</ramBufferSizeMB> Pump it up as much as your hardware/JVM allows. And use appropriate -Xmx, of course. Is that true? I thought there was a sweet spot for the RAM buffer (and not as high as you'd th

Re: Performance Lucene / Solr

2008-10-31 Thread Kraus, Ralf | pixelhouse GmbH
Hi, Thanks a lot for the tip! But when I try it I get > HTTP/1.1 500 null java.lang.NullPointerException at org.apache.solr.common.util.StrUtils.splitSmart(StrUtils.java:37) My Request is : INFO: [core_de] webapp=/solr path=/select/ params={wt=phps&query=Tools&records=30&start_record=0} statu

Re: Performance Lucene / Solr

2008-10-31 Thread Shalin Shekhar Mangar
On Fri, Oct 31, 2008 at 5:10 PM, Kraus, Ralf | pixelhouse GmbH < [EMAIL PROTECTED]> wrote: > > > HTTP/1.1 500 null java.lang.NullPointerException at > org.apache.solr.common.util.StrUtils.splitSmart(StrUtils.java:37) > > My Request is : > INFO: [core_de] webapp=/solr path=/select/ > params={wt=php

Re: Using Solrj

2008-10-31 Thread Shalin Shekhar Mangar
On Fri, Oct 31, 2008 at 4:32 PM, Raghunandan Rao < [EMAIL PROTECTED]> wrote: > I am doing that but the API is in experimental stage. Not sure to use it or > not. BTW can you also let me know how clustering works on Windows OS cos I > saw clustering scripts for Unix OS bundled out with Solr release

RE: Using Solrj

2008-10-31 Thread Raghunandan Rao
Thank you. I was talking about DataImportHandler API. -Original Message- From: Shalin Shekhar Mangar [mailto:[EMAIL PROTECTED] Sent: Friday, October 31, 2008 5:19 PM To: solr-user@lucene.apache.org Subject: Re: Using Solrj On Fri, Oct 31, 2008 at 4:32 PM, Raghunandan Rao < [EMAIL PROTE

Re: Solr1.3 / MySql / Tomcat55 multiple delta-import inside a big full-import

2008-10-31 Thread sunnyfr
Sorry, I wasn't clear. The stacking is not on the Solr database or index queries; the stacked requests are on our main MySQL database. When I do a full import to create indexes for Solr, MySQL honors it and won't drive it OOM, but with a batchsize of -1 it uses MySQL memory, which leaves less memory for the rest of the

Re: Performance Lucene / Solr

2008-10-31 Thread Erik Hatcher
On Oct 31, 2008, at 6:14 AM, Kraus, Ralf | pixelhouse GmbH wrote: Hey, I think it will have the disadvantage of being a lot slower though... How were you handling things with Lucene? You must have used Java then? If you even want to get close to that performance I think you need to use non

Re: Using Solrj

2008-10-31 Thread Shalin Shekhar Mangar
On Fri, Oct 31, 2008 at 5:21 PM, Raghunandan Rao < [EMAIL PROTECTED]> wrote: > Thank you. > I was talking about DataImportHandler API. > > Most likely, you will not need to use the API. DataImportHandler will let you index your database without writing code -- you just need an XML configuration fi

RE: Using Solrj

2008-10-31 Thread Raghunandan Rao
I am doing that but the API is in experimental stage. Not sure to use it or not. BTW can you also let me know how clustering works on Windows OS cos I saw clustering scripts for Unix OS bundled out with Solr release. -Original Message- From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROT

Re: Performance Lucene / Solr

2008-10-31 Thread Erik Hatcher
On Oct 31, 2008, at 7:42 AM, Shalin Shekhar Mangar wrote: On Fri, Oct 31, 2008 at 5:10 PM, Kraus, Ralf | pixelhouse GmbH < [EMAIL PROTECTED]> wrote: HTTP/1.1 500 null java.lang.NullPointerException at org.apache.solr.common.util.StrUtils.splitSmart(StrUtils.java:37) My Request is : INFO:

Re: Solr1.3 / MySql / Tomcat55 multiple delta-import inside a big full-import

2008-10-31 Thread Shalin Shekhar Mangar
On Fri, Oct 31, 2008 at 3:27 PM, sunnyfr <[EMAIL PROTECTED]> wrote: > > I would like to know whether it takes much longer to run a limited full-import followed by > multiple delta-imports to index the whole database. > If I fire a full-import without a limit of 4M in my request, it runs OOM > because I have 8.5M do

Re: Performance Lucene / Solr

2008-10-31 Thread Kraus, Ralf | pixelhouse GmbH
Hi, And rows instead of records, and start instead of start_record. :) Erik You're my man :-) Greets -Ralf-
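The parameter renames mentioned in this message can be sketched as follows. This is only an illustration, assuming a Solr instance at http://localhost:8983/solr (the URL used in another message of this thread); the helper name build_select_url is hypothetical, and q is the standard Solr query parameter. The code only builds the URL and does not contact Solr.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, rows=30, start=0, wt="phps"):
    """Build a Solr /select URL with the standard parameter names.

    q     -> the query string (not "query")
    rows  -> number of results to return (not "records")
    start -> offset of the first result (not "start_record")
    """
    params = {"q": query, "rows": rows, "start": start, "wt": wt}
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical base URL; adjust to the actual core path (e.g. /solr/core_de).
url = build_select_url("http://localhost:8983/solr", "Tools")
print(url)
```

With the request from the earlier message rewritten this way, the handler receives a q parameter, which is presumably what the NullPointerException in splitSmart was complaining about.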

Re: Performance Lucene / Solr

2008-10-31 Thread Kraus, Ralf | pixelhouse GmbH
Hi, <queryResponseWriter name="phps" class="org.apache.solr.request.PHPSerializedResponseWriter"/> Then in PHP, hit Solr directly like this: $response = unserialize(file_get_contents($url)); Where $url is something like http://localhost:8983/solr/select?q=*:* Now SOLR is 2 times faster than LUCENE => Strike! He

Re: DataImportHandler running out of memory

2008-10-31 Thread Noble Paul നോബിള്‍ नोब्ळ्
I've moved the FAQ to a new Page http://wiki.apache.org/solr/DataImportHandlerFaq The DIH page is too big and editing has become harder On Thu, Jun 26, 2008 at 6:07 PM, Shalin Shekhar Mangar <[EMAIL PROTECTED]> wrote: > I've added a FAQ section to DataImportHandler wiki page which captures > quest

Re: DIH and rss feeds

2008-10-31 Thread Jon Baer
Is that right? I find the wording of "clean" a little confusing. I would have thought this is what I had needed earlier but the topic came up regarding the fact that you can not deleteByQuery for an entity you want to flush w/ delta-import. I just noticed that the original JIRA request sa

What are the ways to update / delete Solr data?

2008-10-31 Thread Vincent Pérès
Hello, I'm trying to find the best way to update / delete data for my project (developed with JavaScript and Rails). I would like to do something like this: http://localhost:8983/solr/update/?q=id:1&rate=4 and http://localhost:8983/solr/delete/?q=id:1 Is it possible? But I found onl
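For reference, Solr's update handler accepts XML messages POSTed to /update rather than query-string edits like the ones above. The sketch below only builds those XML payloads; the function names are hypothetical, and the field names id and rate are taken from the question. As far as I know, Solr 1.3 has no in-place field update: re-adding a document with the same uniqueKey replaces the whole document, so an "update" of rate means re-sending every field.

```python
from xml.sax.saxutils import escape, quoteattr


def add_doc_xml(fields):
    """Build an <add> message; re-adding an existing uniqueKey replaces the doc."""
    parts = [
        "<field name=%s>%s</field>" % (quoteattr(name), escape(str(value)))
        for name, value in fields.items()
    ]
    return "<add><doc>" + "".join(parts) + "</doc></add>"


def delete_by_id_xml(doc_id):
    """Build a <delete> message that removes one document by its uniqueKey."""
    return "<delete><id>%s</id></delete>" % escape(str(doc_id))


def delete_by_query_xml(query):
    """Build a <delete> message that removes every document matching a query."""
    return "<delete><query>%s</query></delete>" % escape(query)


# A commit message must follow before the changes become visible to searches.
COMMIT_XML = "<commit/>"

print(add_doc_xml({"id": 1, "rate": 4}))
print(delete_by_id_xml(1))
print(delete_by_query_xml("id:1"))
```

Each payload would be sent with an HTTP POST to http://localhost:8983/solr/update with a text/xml content type, which any HTTP library (including one called from Rails) can do.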

Re: DIH and rss feeds

2008-10-31 Thread Shalin Shekhar Mangar
The "clean" parameter is there in the 1.3 release. The full-import is by definition "full" so we delete all existing documents at the start. If you don't want to clean the index, you can pass clean=false and DIH will just add them. On Fri, Oct 31, 2008 at 8:58 PM, Jon Baer <[EMAIL PROTECTED]> wrote
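A small sketch of the two variants described here, assuming the DataImportHandler is registered at /dataimport (the conventional path, but it depends on solrconfig.xml) and a Solr instance at http://localhost:8983/solr. The helper name dih_command_url is hypothetical, and the code only builds the URL.

```python
from urllib.parse import urlencode


def dih_command_url(base_url, command="full-import", clean=True,
                    handler="/dataimport"):
    """Build a DataImportHandler command URL.

    clean=True leaves the parameter out, so DIH applies its default
    (a full-import deletes existing documents first). clean=False adds
    clean=false, so the imported documents are added to the existing index.
    """
    params = {"command": command}
    if not clean:
        params["clean"] = "false"
    return base_url.rstrip("/") + handler + "?" + urlencode(params)


print(dih_command_url("http://localhost:8983/solr"))
print(dih_command_url("http://localhost:8983/solr", clean=False))
```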

Re: What are the ways to update / delete Solr data?

2008-10-31 Thread Erik Hatcher
On Oct 31, 2008, at 11:40 AM, Vincent Pérès wrote: The last possibility is to use the solr-ruby library. If you're using Ruby, that's what I'd use. Were your other proposals to still do those calls from Ruby, but with the HTTP library directly? Erik

Re: What are the ways to update / delete Solr data?

2008-10-31 Thread Vincent Pérès
Thanks for your quick answer. I'm using only HTTP to display my results, which is why I would like to continue this way. If I can use HTTP instead of solr, it will be better for me. Erik Hatcher wrote: > > > On Oct 31, 2008, at 11:40 AM, Vincent Pérès wrote: >> The last possibility is t

Re: date range query performance

2008-10-31 Thread Chris Hostetter
: Concrete example, this query just took 18s: : : instance:client\-csm.symplicity.com AND dt:[2008-10-01T04:00:00Z TO : 2008-10-30T03:59:59Z] AND label_facet:"Added to Position" : I saw a thread from Apr 2008 which explains the problem being due to too much : precision on the DateField type

Re: date range query performance

2008-10-31 Thread Alok Dhir
We have implemented the suggested reduction in granularity by dropping time altogether and simply disallowing time filtering. This, in light of other search filters we have provided, should prove to be sufficient for our user base. We did keep the fine granularity field not for filtering, but
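The granularity reduction described here can be sketched on the client side as follows. This is an illustration only, not the poster's actual code: the helper names are hypothetical, the field name dt comes from the query quoted earlier in the thread, and the inputs are assumed to be timezone-aware datetimes. Truncating to the day means far fewer distinct date values in the index, which is the reason given in the thread for the speedup.

```python
from datetime import datetime, timezone


def solr_day(dt):
    """Truncate an aware datetime to midnight UTC, formatted as a Solr date."""
    day = dt.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return day.strftime("%Y-%m-%dT%H:%M:%SZ")


def day_range_query(field, start, end):
    """Build a range query whose bounds carry day precision only."""
    return "%s:[%s TO %s]" % (field, solr_day(start), solr_day(end))


start = datetime(2008, 10, 1, 4, 0, 0, tzinfo=timezone.utc)
end = datetime(2008, 10, 30, 3, 59, 59, tzinfo=timezone.utc)
print(day_range_query("dt", start, end))
```

Note that rounding changes what the bounds mean (here both are moved back to midnight UTC), so the indexed values need to be rounded the same way for the filter to behave as expected.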

Re: corrupt solr index on ec2

2008-10-31 Thread Michael McCandless
Bill Graham wrote: Then it seemed to run well for about an hour and I saw this: Oct 28, 2008 10:38:51 PM org.apache.solr.update.DirectUpdateHandler2 commit INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true) Oct 28, 2008 10:38:51 PM org.apache.solr.common.SolrException log S

TermVectorComponent for tag generation?

2008-10-31 Thread Jon Baer
Hi, So I'm looking to either use this or build a component which might do what I'm looking for. I'd like to figure out if it's possible to use a single doc to get tag generation based on the matches within that document, for example: 1 News Doc -> contains 5 Players and 8 Teams (show them as pos

Re: TermVectorComponent for tag generation?

2008-10-31 Thread Grant Ingersoll
Hey Jon, Not following how the TVC (TermVectorComp) would help here. I suppose you could use the "most important" terms, as defined by TF-IDF, as suggested tags. The MLT (MoreLikeThis) uses this to generate query terms. However, I'm not following the different filter query piece. Can

Re: TermVectorComponent for tag generation?

2008-10-31 Thread Jon Baer
Well for example in any given text (which is a field on a document); "While suitable for any application which requires full text indexing and searching capability, Lucene has been widely recognized for its utility in the implementation of Internet search engines and local, single-site search

RE: DIH and rss feeds

2008-10-31 Thread Lance Norskog
Thanks all. I knew there had to be something :) Perhaps I should read the complete wiki page over and over again some more. It is a complex tool. Lance -Original Message- From: Shalin Shekhar Mangar [mailto:[EMAIL PROTECTED] Sent: Friday, October 31, 2008 8:42 AM To: solr-user@lucene.a

DIH Http input bug - problem with two-level RSS walker

2008-10-31 Thread Lance Norskog
I wrote a nested HttpDataSource RSS poller. The outer loop reads an rss feed which contains N links to other rss feeds. The nested loop then reads each one of those to create documents. (Yes, this is an obnoxious thing to do.) Let's say the outer RSS feed gives 10 items. Both feeds use the same str

Re: date range query performance

2008-10-31 Thread Michael Lackhoff
On 31.10.2008 19:16 Chris Hostetter wrote: > for the record, you don't need to index as a "StrField" to get this > benefit, you can still index using DateField you just need to round your > dates to some less granular level .. if you always want to round down, you > don't even need to do the rou

Re: date range query performance

2008-10-31 Thread Erik Hatcher
On Nov 1, 2008, at 1:07 AM, Michael Lackhoff wrote: On 31.10.2008 19:16 Chris Hostetter wrote: for the record, you don't need to index as a "StrField" to get this benefit, you can still index using DateField you just need to round your dates to some less granular level .. if you always want

Re: DIH Http input bug - problem with two-level RSS walker

2008-10-31 Thread Shalin Shekhar Mangar
On Sat, Nov 1, 2008 at 10:30 AM, Lance Norskog <[EMAIL PROTECTED]> wrote: > I wrote a nested HttpDataSource RSS poller. The outer loop reads an rss > feed > which contains N links to other rss feeds. The nested loop then reads each > one of those to create documents. (Yes, this is an obnoxious thi

Re: date range query performance

2008-10-31 Thread Michael Lackhoff
On 01.11.2008 06:10 Erik Hatcher wrote: > Yeah, this should work fine: > > default="NOW/DAY" multiValued="false"/> Wow, that was fast, thanks! -Michael

Re: TermVectorComponent for tag generation?

2008-10-31 Thread Vaijanath N. Rao
Hi Jon, Isn't it similar to what Grant just said: the topmost terms (after removing the stop words)? You would need to get how many terms there are and their related frequencies, and any term which is beyond a certain threshold you would mark as a member of the tag set. One can also build a
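The thresholding idea in this message can be sketched roughly as below. It is an assumption-laden illustration: it counts raw term frequencies in a single text rather than reading Solr term vectors or computing TF-IDF, the stop-word list is a tiny made-up sample, and the names suggest_tags and min_count are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = frozenset({"a", "an", "and", "for", "in", "is", "of", "the", "to"})


def suggest_tags(text, min_count=2, stop_words=STOP_WORDS):
    """Return terms occurring at least min_count times, most frequent first."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(t for t in tokens if t not in stop_words)
    # Keep terms at or above the threshold; break frequency ties alphabetically.
    return sorted(
        (term for term, n in counts.items() if n >= min_count),
        key=lambda term: (-counts[term], term),
    )


tags = suggest_tags("Lucene search and Solr search: the Lucene index")
print(tags)
```

Swapping the raw counts for TF-IDF weights, as suggested earlier in the thread, would favor terms that are distinctive for the document rather than merely frequent.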