Embedded SOLR using the SOLR collection distribution
Hello, I would like to know if I can implement Embedded SOLR using the SOLR collection distribution? Regards, Dilip

-----Original Message----- From: mike topper [mailto:[EMAIL PROTECTED] Sent: Wednesday, August 22, 2007 8:29 PM To: solr-user@lucene.apache.org Subject: almost realtime updates with replication

Hello, Currently in our application we are using the master/slave setup and have a batch update/commit about every 5 minutes. There are a couple of queries that we would like to run in near real time, so I would like to have our client send an update on every new document and then have Solr configured to do an autocommit every 5-10 seconds. Reading the wiki, it seems this isn't possible because of the strain of snapshotting and pulling to the slaves at such a high rate. What I was thinking was to have these few queries just query the master and the rest query the slave with the non-realtime data, although I'm assuming this wouldn't work either: since a snapshot is created on every commit, we would still impact performance too much. Does anyone have any suggestions? If I set autowarmingCount=0, would I be able to pull to the slave faster than every couple of minutes (say, every 10 seconds)? What if I take out the postcommit hook on the master and just have snapshooter run from cron every 5 minutes? -Mike
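[The cron option Mike mentions at the end would look roughly like the crontab entry below. The paths, the 5-minute interval, and the tomcat5 user are assumptions, not taken from the thread; check bin/snapshooter in your install for the flags it actually supports.]

    # hypothetical crontab entry: snapshot the index every 5 minutes
    # instead of firing snapshooter from the postCommit hook
    */5 * * * * /opt/solr/bin/snapshooter -d /opt/solr/data -u tomcat5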
The mechanism of data replication in Solr?
Hello, everybody :-) I'm interested in the mechanism of data replication in Solr. In the "Introduction to the Solr enterprise search server", replication is listed as one of Solr's features, but I can't find anything about the replication details on the web site or in the documents: how the index is split, how the chunks of the index are distributed, how the replicas are placed, whether replication is eager or lazy, etc. I think these problems are different from the ones in HDFS. Can anybody help me? Thank you in advance. Best Wishes.
Re: Indexing longer documents using Solr...memory issue after index grows to about 800 MB...
thanks for your reply, my response below:

On 9/5/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
> On 4-Sep-07, at 4:50 PM, Ravish Bhagdev wrote:
> > - I have about 11K html documents to index.
> > - I'm trying to index these documents (along with 3 more small string
> > fields) so that when I search within the "doc" field (the field with the
> > html file content), I can get results with snippets or highlights as I
> > get when using Nutch.
> > - While going through the wiki I noticed that if I need to do highlighting
> > on a particular field, I have to make sure it is indexed and stored.
> >
> > But when I try to do the above, after indexing about 3K files, which
> > creates an index of about 800MB (which is fine as the files are quite
> > lengthy), it keeps giving out-of-heap-space errors.
> >
> > Things I've tried without much help:
> >
> > - Increasing the memory of Tomcat
> > - Playing around with settings like autoCommit (documents and time)
> > - Reducing mergeFactor to 5
> > - Reducing maxBufferedDocs to 100
>
> Merge factor should not affect memory usage. You say that you
> increased memory usage... but to what? I've found reducing
> maxBufferedDocs decreases my peak memory usage significantly.

OK

> > My question is also: if it is required to store fields in the index to be
> > able to do highlighting/return field content, how does Nutch/Lucene
> > do it without that (because the index for the same documents created
> > using Nutch is much, much smaller)?
>
> Are you sure that it doesn't? According to:
>
> http://svn.apache.org/viewvc/lucene/nutch/trunk/src/plugin/summary-basic/src/java/org/apache/nutch/summary/basic/BasicSummarizer.java?view=markup
>
> Nutch does indeed take the stored text and re-analyse it when
> generating a summary. Does Nutch perhaps store less content of a
> document, or in a different store?

I am not sure what it does internally, but my educated guess is that it doesn't store entire documents in the index (going by index size). The index created using Nutch is way too small to store entire documents (pretty sure of this part).

> > But also, when querying partially added documents, if I turn field
> > highlighting on (for a particular field) it doesn't seem to have
> > any effect.
>
> Does the field contain a match against one of the terms you are
> querying for?

Yup

> -Mike

Cheers, Ravi
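[A minimal sketch of the stored/indexed requirement being discussed, assuming the stock Solr 1.2 example configuration; the field name "doc" comes from the thread, the type name and parameter values are illustrative only.]

    <!-- schema.xml: a field must be indexed and stored to be highlighted -->
    <field name="doc" type="text" indexed="true" stored="true"/>

    http://localhost:8983/solr/select?q=doc:lucene&hl=true&hl.fl=doc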
Re: The mechanism of data replication in Solr?
On Wed, 2007-09-05 at 15:56 +0800, Dong Wang wrote:
> Hello, everybody :-)
> I'm interested in the mechanism of data replication in Solr. In the
> "Introduction to the Solr enterprise search server", replication is
> one of the features of Solr, but I can't find anything about replication
> issues on the web site and documents, including how to split the
> index, how to distribute the chunks of the index, how to place the
> replicas, eager replication or lazy replication, etc. I think they are
> different from the problem in HDFS.
> Can anybody help me? Thank you in advance.

http://wiki.apache.org/solr/CollectionDistribution

HTH

> Best Wishes.

-- Thorsten Scherler thorsten.at.apache.org Open Source Java consulting, training and solutions
Re: Embedded SOLR using the SOLR collection distribution
On Sep 5, 2007, at 3:30 AM, Dilip.TS wrote:
> I would like to know if I can implement Embedded SOLR using the SOLR
> collection distribution? [...]

Partly... the rsync method of getting a master index to the slaves would work, but you'd need a way to notify the slaves so that they reload their IndexSearchers.

Erik
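[For context: in the standard, non-embedded collection distribution setup, that reload happens because snapinstaller posts a commit to the slave's update handler, which makes Solr open a new IndexSearcher. Roughly, with host and port as assumptions:]

    curl http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=utf-8' --data-binary '<commit/>'

[An embedded setup has no HTTP endpoint, so it would need an equivalent in-process trigger after each new snapshot is installed.]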
Re: The mechanism of data replication in Solr?
The front page of the Solr wiki has a small section on replication:

http://wiki.apache.org/solr/

Solr's built-in replication does not split the index. It replicates the entire index by copying only the files that have changed.

Bill

On 9/5/07, Dong Wang <[EMAIL PROTECTED]> wrote:
> Hello, everybody :-)
> I'm interested in the mechanism of data replication in Solr. [...]
Indexing very large files.
Hello all, I will apologize up front if this comes through twice. I've been trying to index a 300MB file to Solr 1.2. I keep getting out-of-memory heap errors. Even on an empty index with one gig of VM memory it still won't work. Is it even possible to get Solr to index such large files? Do I need to write a custom index handler? Thanks, Brian
Can't get 1.2 running under Tomcat 5.5
Hi, I'm having no luck getting Solr 1.2 to run under Tomcat 5.5 using context fragments. I've followed the example on wiki: http://wiki.apache.org/solr/SolrTomcat The only thing I've changed is the installation method. I'm using the Tomcat manager to create a context path, and also point to my context config. This worked fine with Solr 1.1. If I tail the Tomcat log (catalina.out) I get this (among other things): java.lang.ArrayIndexOutOfBoundsException Solr does create the data/index directory though along with a few index files. Anyone have any ideas as to what I could be doing wrong? Thanks, Matt
Re: Can't get 1.2 running under Tomcat 5.5
OK, found the start of the trail... I had a duplicate entry for fulltext in my schema. Removed that. Now when I first try to deploy Solr, I get this error:

SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.IndexInfoRequestHandler'

Matt

On Sep 5, 2007, at 11:25 AM, Matt Mitchell wrote:
> Hi, I'm having no luck getting Solr 1.2 to run under Tomcat 5.5 using
> context fragments. I've followed the example on wiki:
> http://wiki.apache.org/solr/SolrTomcat [...]

Matt Mitchell Digital Scholarship Services Box 400129 Alderman Library University of Virginia Charlottesville, VA 22904 [EMAIL PROTECTED]
Re: Indexing very large files.
On 9/5/07, Brian Carmalt <[EMAIL PROTECTED]> wrote:
> I've been trying to index a 300MB file to solr 1.2. I keep getting out of
> memory heap errors.

300MB of what... a single 300MB document? Or does that file represent multiple documents in XML or CSV format?

-Yonik
How to search Case Sensitive words?
Hi All, I am facing a problem with case-sensitive text. I index a word in lower case, but when I search for the same word in upper case, it is not found. Example: indexed word: "corent"; search word: "CORENT". If I search for "CORENT" it retrieves nothing. Do I have to change any configuration? I am using the default configuration, and it has a lower-case filter as well. Any help appreciated. Regards, V.Nithya. -- View this message in context: http://www.nabble.com/How-to-search-Case-Sensitive-words--tf4386665.html#a12506345 Sent from the Solr - User mailing list archive at Nabble.com.
Re: The mechanism of data replication in Solr?
Thank you, Thorsten Scherler and Bill Au. I'm sorry for posting this question so hastily; thanks for your patience. OK, here come my new questions. Solr's wiki says: "All the files in the index directory are hard links to the latest snapshot. This technique has these advantages: Can keep multiple snapshots on each host without the need to keep multiple copies of index files that have not changed. File copying from master to slave is very fast..." and so on. Why do hard links make file copying between master and slave fast? Thanks. Best Regards. -- Wang

2007/9/5, Bill Au <[EMAIL PROTECTED]>:
> The front page of the Solr wiki has a small section on replication:
> http://wiki.apache.org/solr/
> Solr's built-in replication does not split the index. It replicates the
> entire index by copying only the files that have changed.
> Bill [...]
Re: How to search Case Sensitive words?
I am a pretty new user of Lucene, but I think the simple answer is "what analyzer are you using when you index" and "use the same analyzer when you search". I believe StandardAnalyzer for example does lowercasing, so if you use the same one when you search all should work as you wish.
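[In Solr specifically, the analyzer is configured per field type in schema.xml and is applied at both index and query time unless the two are configured separately. A minimal sketch of a lower-casing type; the names are illustrative, and the stock example schema's "text" type already includes a lower-case filter:]

    <fieldtype name="text_lc" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldtype>

    <field name="title" type="text_lc" indexed="true" stored="true"/>

[If the field being searched uses a type without the lower-case filter, such as the plain "string" type, "CORENT" will not match "corent".]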
Re: The mechanism of data replication in Solr?
: snapshot. This technique has these advantages: Can keep multiple
: snapshots on each host without the need to keep multiple copies of
: index files that have not changed. File copying from master to slave

: Why do hard links make file copying between master and slave fast?
: Thanks. Best Regards.

Bullets 2 and 3 build off of bullet 1 ... the Lucene file format is designed such that files are only ever added, appended to, or deleted -- there is never any rewriting of existing bytes in a file. So having hard links to the original files in the snapshot directories on both the master and the slave means that the rsync operation for a new snapshot only needs to send the new data, not diffs or full contents of existing files.

-Hoss
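[A rough shell illustration of the idea; the paths and snapshot names are assumptions, and the real work is a bit more involved in Solr's snapshooter/snappuller/snapinstaller scripts:]

    # master: a snapshot is just a directory of hard links to the live index files
    cp -lr /opt/solr/data/index /opt/solr/data/snapshot.20070905150504

    # slave: seed the new snapshot with hard links to the previous one, then
    # rsync over it -- only segment files that did not exist before are transferred
    cp -lr /opt/solr/data/snapshot.20070905143000 /opt/solr/data/snapshot.20070905150504
    rsync -av master:/opt/solr/data/snapshot.20070905150504/ /opt/solr/data/snapshot.20070905150504/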
Re: Can't get 1.2 running under Tomcat 5.5
: OK found the start of the trail... I had a duplicate entry for fulltext in my
: schema. Removed that. Now when I first try to deploy Solr, I get this error:

Really? Defining the same field name twice gave you an ArrayIndexOutOfBounds? ... that's bad, I'll open a bug on that.

: SEVERE: org.apache.solr.common.SolrException: Error loading class
: 'solr.IndexInfoRequestHandler'

Your solrconfig.xml seems to refer to solr.IndexInfoRequestHandler ... this was a class that was added to the trunk after Solr 1.1 was released, and was removed before 1.2 was released (all of its functionality was replaced by the LukeRequestHandler).

-Hoss
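[If your solrconfig.xml came from that older build, the fix is to drop the stale handler registration; if you still want the index/field introspection it provided, the Luke handler mentioned above can be registered instead. A hedged sketch -- the fully qualified class name here is from memory, so verify it against the example solrconfig.xml that ships with your 1.2 release:]

    <!-- remove the old <requestHandler ... class="solr.IndexInfoRequestHandler"/> entry -->
    <!-- optional replacement: -->
    <requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />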
Tomcat logging
Hi - Here are the lines to add to the end of Tomcat's conf/logging.properties file to get rid of query/update logging noise:

org.apache.solr.core.SolrCore.level = WARNING
org.apache.solr.handler.XmlUpdateRequestHandler.level = WARNING
org.apache.solr.search.SolrIndexSearcher.level = WARNING

I would prefer not to get involved in editing the wiki; it's generally better to have a few editors. Also, it crosses the line into company property. Also, I'm lazy. Will somebody please add this to the Tomcat page? Thanks, Lance
Re: Distribution Information?
Not that I've noticed. I'll do a more careful grep soon here - I just got back from a long weekend. ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++ On Aug 31, 2007, at 6:12 PM, Bill Au wrote: Are there any error message in your appserver log files? Bill On 8/31/07, Matthew Runo <[EMAIL PROTECTED]> wrote: Hello! /solr/admin/distributiondump.jsp This server is set up as a master server, and other servers use the replication scripts to pull updates from it every few minutes. My distribution information screen is blank.. and I couldn't find any information on fixing this in the wiki. Any chance someone would be able to explain how to get this page working, or what I'm doing wrong? ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++
Indexing a URL
Hello, I am trying to post the following to my index:

http://www.nytimes.com/2007/08/25/business/worldbusiness/25yuan.html?ex=1345694400&en=499af384a9ebd18f&ei=5088&partner=rssnyt&emc=rss

The url field is defined as:

However, I get the following error:

Posting file docstor/ffc110ee5c9a2ed28c8f35aa243bb53b.xml to http://localhost:8983/news_feed/update
Error 500 HTTP ERROR: 500 ParseError at [row,col]:[3,104] Message: The reference to entity "en" must end with the ';' delimiter.

It is apparently attempting to parse &en=499af384a9ebd18f in the URL. I am not clear why it would do this, as I specified indexed="false". I need to store this because that is how the user gets to the original article. Is there any data type that simply ignores the characters in the field? I don't care that it can't be a search field. I've tried the "ignored" field type and it still gives me the same error. Thanks, Bill
Re: Indexing a URL
> It is apparently attempting to parse &en=499af384a9ebd18f in the URL. I am
> not clear why it would do this as I specified indexed="false". I need to
> store this because that is how the user gets to the original article.

The ampersand is an XML reserved character. You have to escape it (turn it into &amp;), whether you are indexing the data or not. Nothing to do w/ Solr, just XML files in general. Whatever you're using to render the XML should be able to handle this for you.
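[For illustration, the posted document with the ampersands escaped might look like this; the field name "url" and the URL itself come from the thread, while the surrounding add/doc wrapper is the standard Solr update XML:]

    <add>
      <doc>
        <field name="url">http://www.nytimes.com/2007/08/25/business/worldbusiness/25yuan.html?ex=1345694400&amp;en=499af384a9ebd18f&amp;ei=5088&amp;partner=rssnyt&amp;emc=rss</field>
      </doc>
    </add>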
Re: Distribution Information?
When I load distributiondump.jsp, there is no output in my catalina.out file.

++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++

On Sep 5, 2007, at 1:55 PM, Matthew Runo wrote:
> Not that I've noticed. I'll do a more careful grep soon here - I just got
> back from a long weekend. [...]
Re: Replication broken.. no helpful errors?
It seems that the scripts cannot open new searchers at the end of the process, for some reason. Here's a message from cron, but I'm not sure what to make of it... It looks like the files properly copied over, but failed the install. I removed the temp* directory, but still SOLR could not launch a new searcher. I don't see any activity in catalina.out though... started by tomcat5 command: /opt/solr/bin/snappuller -M search1 -P 18080 -D /opt/solr/ data -S /opt/solr/logs -d /opt/solr/data -v pulling snapshot temp-snapshot.20070905150504 receiving file list ... done deleting segments_1ine deleting _164h_1.del deleting _164h.tis deleting _164h.tii deleting _164h.prx deleting _164h.nrm deleting _164h.frq deleting _164h.fnm deleting _164h.fdx deleting _164h.fdt deleting _164g_1.del deleting _164g.tis deleting _164g.tii deleting _164g.prx deleting _164g.nrm deleting _164g.frq deleting _164g.fnm deleting _164g.fdx deleting _164g.fdt deleting _164f_1.del deleting _164f.tis deleting _164f.tii deleting _164f.prx deleting _164f.nrm deleting _164f.frq deleting _164f.fnm deleting _164f.fdx deleting _164f.fdt deleting _164e_1.del deleting _164e.tis deleting _164e.tii deleting _164e.prx deleting _164e.nrm deleting _164e.frq deleting _164e.fnm deleting _164e.fdx deleting _164e.fdt deleting _164d_1.del deleting _164d.tis deleting _164d.tii deleting _164d.prx deleting _164d.nrm deleting _164d.frq deleting _164d.fnm deleting _164d.fdx deleting _164d.fdt deleting _164c_1.del deleting _164c.tis deleting _164c.tii deleting _164c.prx deleting _164c.nrm deleting _164c.frq deleting _164c.fnm deleting _164c.fdx deleting _164c.fdt deleting _164b_1.del deleting _164b.tis deleting _164b.tii deleting _164b.prx deleting _164b.nrm deleting _164b.frq deleting _164b.fnm deleting _164b.fdx deleting _164b.fdt deleting _164a_1.del deleting _164a.tis deleting _164a.tii deleting _164a.prx deleting _164a.nrm deleting _164a.frq deleting _164a.fnm deleting _164a.fdx deleting _164a.fdt deleting _163z_3.del deleting _163z.tis deleting _163z.tii deleting _163z.prx deleting _163z.nrm deleting _163z.frq deleting _163z.fnm deleting _163z.fdx deleting _163z.fdt deleting _163o_3.del deleting _163o.tis deleting _163o.tii deleting _163o.prx deleting _163o.nrm deleting _163o.frq deleting _163o.fnm deleting _163o.fdx deleting _163o.fdt deleting _163d_4.del deleting _163d.tis deleting _163d.tii deleting _163d.prx deleting _163d.nrm deleting _163d.frq deleting _163d.fnm deleting _163d.fdx deleting _163d.fdt deleting _1632_6.del deleting _1632.tis deleting _1632.tii deleting _1632.prx deleting _1632.nrm deleting _1632.frq deleting _1632.fnm deleting _1632.fdx deleting _1632.fdt deleting _162r_7.del deleting _162r.tis deleting _162r.tii deleting _162r.prx deleting _162r.nrm deleting _162r.frq deleting _162r.fnm deleting _162r.fdx deleting _162r.fdt deleting _162g_d.del deleting _162g.tis deleting _162g.tii deleting _162g.prx deleting _162g.nrm deleting _162g.frq deleting _162g.fnm deleting _162g.fdx deleting _162g.fdt deleting _1625_m.del deleting _1625.tis deleting _1625.tii deleting _1625.prx deleting _1625.nrm deleting _1625.frq deleting _1625.fnm deleting _1625.fdx deleting _1625.fdt deleting _161u_w.del deleting _161u.tis deleting _161u.tii deleting _161u.prx deleting _161u.nrm deleting _161u.frq deleting _161u.fnm deleting _161u.fdx deleting _161u.fdt deleting _161j_16.del ./ _161j_17.del _164m.fdt _164m.fdx _164m.fnm _164m.frq _164m.nrm _164m.prx _164m.tii _164m.tis _164m_1.del _164x.fdt _164x.fdx _164x.fnm _164x.frq _164x.nrm _164x.prx 
_164x.tii _164x.tis _164x_1.del segments.gen segments_1inv sent 516 bytes received 105864302 bytes 30247090.86 bytes/sec total size is 966107226 speedup is 9.13 + [[ -z search1 ]] + [[ -z /opt/solr/logs ]] + fixUser -M search1 -S /opt/solr/logs -d /opt/solr/data -V + [[ -z tomcat5 ]] ++ whoami + [[ tomcat5 != tomcat5 ]] ++ who -m ++ cut '-d ' -f1 ++ sed '-es/^.*!//' + oldwhoami= + [[ '' == '' ]] +++ pgrep -g0 snapinstaller ++ tail -1 ++ cut -f1 '-d ' ++ ps h -Hfp 3621 3629 3630 3631 + oldwhoami=tomcat5 + [[ -z /opt/solr/data ]] ++ echo /opt/solr/data ++ cut -c1 + [[ / != \/ ]] ++ echo /opt/solr/logs ++ cut -c1 + [[ / != \/ ]] ++ date +%s + start=1189030205 + logMessage started by tomcat5 ++ timeStamp ++ date '+%Y/%m/%d %H:%M:%S' + echo 2007/09/05 15:10:05 started by tomcat5 + [[ -n '' ]] + logMessage command: /opt/solr/bin/snapinstaller -M search1 -S /opt/ solr/logs -d /opt/solr/data -V ++ timeStamp ++ date '+%Y/%m/%d %H:%M:%S' + echo 2007/09/05 15:10:05 command: /opt/solr/bin/snapinstaller -M search1 -S /opt/solr/logs -d /opt/solr/data -V + [[ -n '' ]] ++ ls /opt/solr/data ++ grep 'snapshot\.' ++ grep -v wip ++ sort -r ++ head -1 + name=temp-snapshot.20070905150504 + trap 'echo "caught INT/TERM, exiting now but partial installation may have already occured";/bin/rm -rf ${data_dir"/index.tmp$$;logExit aborted 13' INT TERM + [[ temp-snapshot.20070905150504 == '' ]] + name=/opt/solr/data/temp-snapsh
Re: Replication broken.. no helpful errors?
If it helps anyone, this index is around a gig in size.

++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++

On Sep 5, 2007, at 3:14 PM, Matthew Runo wrote:
> It seems that the scripts cannot open new searchers at the end of the
> process, for some reason. Here's a message from cron, but I'm not sure
> what to make of it... It looks like the files properly copied over, but
> failed the install. I removed the temp* directory, but still SOLR could
> not launch a new searcher. I don't see any activity in catalina.out
> though... [...]
Re: Can't get 1.2 running under Tomcat 5.5
On Sep 5, 2007, at 11:37 AM, Matt Mitchell wrote:
> SEVERE: org.apache.solr.common.SolrException: Error loading class
> 'solr.IndexInfoRequestHandler'

You're using my old hand-built version of Solr, I suspect. Hoss explained it fully in his previous message on this thread. Care needs to be taken when upgrading Solr but leaving solrconfig.xml untouched, because additional config may be necessary. Comparing your solrconfig.xml with the one that ships with the example app of the version of Solr you're upgrading to is recommended.

Erik
Re: Can't get 1.2 running under Tomcat 5.5
: Care needs to be taken when upgrading Solr but leaving solrconfig.xml
: untouched because additional config may be necessary. Comparing your
: solrconfig.xml with the one that ships with the example app of the version of
: Solr you're upgrading to is recommended.

Hmmm... that's kind of a scary statement, and it may mislead people into thinking that they need to throw away their configs when updating and start over with the newest examples -- that's certainly not true.

I think it's safe to say that if you are using official releases of Solr and not trunk builds, then either:

* any "old" config files will continue to work as is

OR:

* any known config syntax which no longer works exactly the same way will be called out loudly in the CHANGES.txt file for the release.

If however you are using a nightly snapshot, items that work in your config may not continue to work in future versions as functionality is tweaked and revised.

However: Erik's point about comparing your configs with the examples is still a good idea -- because there may be cool new features that you'd like to take advantage of that don't immediately jump out at you when looking at the CHANGES.txt file, but do when looking at sample configs.

-Hoss
Re: Can't get 1.2 running under Tomcat 5.5
I guess my warning is more because I play on the edge and have several times ended up tweaking various apps' solrconfig.xml files as I upgraded them to keep things working. Anyway, we'll all agree that diff'ing your config files with the example app can be useful.

Erik

On Sep 5, 2007, at 9:26 PM, Chris Hostetter wrote:
> Hmmm... that's kind of a scary statement, and it may mislead people into
> thinking that they need to throw away their configs when updating and
> start over with the newest examples -- that's certainly not true. [...]
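[For what it's worth, that comparison can be as simple as the one-liner below; the paths are assumptions for a 1.2 install:]

    diff -u /path/to/your/solr/conf/solrconfig.xml apache-solr-1.2.0/example/solr/conf/solrconfig.xml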
Re: Can't get 1.2 running under Tomcat 5.5
Not really. It is a very poor substitute for reading the release notes, and sufficiently inadequate that it might not be worth the time. Diffing the example with the previous release is probably more instructive, but might or might not help for your application. A config file checker would be useful. wunder On 9/5/07 6:55 PM, "Erik Hatcher" <[EMAIL PROTECTED]> wrote: > Anyway, we'll all agree that diff'ing your config files with the > example app can be useful.
Re: Indexing very large files.
On Wed, 05 Sep 2007 17:18:09 +0200 Brian Carmalt <[EMAIL PROTECTED]> wrote:
> I've been trying to index a 300MB file to solr 1.2. I keep getting out of
> memory heap errors.
> Even on an empty index with one gig of VM memory it still won't work.

Hi Brian,

VM != heap memory. VM = OS memory; heap memory = memory made available by the Java VM to the Java process. Heap memory errors are hardly ever an issue of the app itself (other than, of course, with bad programming... but that doesn't seem to be the issue here so far).

[EMAIL PROTECTED] [Thu Sep 6 14:59:21 2007] /usr/home/betom $ java -X
[...]
-Xms<size>  set initial Java heap size
-Xmx<size>  set maximum Java heap size
-Xss<size>  set java thread stack size
[...]

For example, start Solr as: java -Xms64m -Xmx512m -jar start.jar

YMMV with respect to the actual values you use. Good luck, B

_ {Beto|Norberto|Numard} Meijome

Windows caters to everyone as though they are idiots. UNIX makes no such assumption. It assumes you know what you are doing, and presents the challenge of figuring it out for yourself if you don't.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Indexing very large files.
Yonik Seeley schrieb:
> On 9/5/07, Brian Carmalt <[EMAIL PROTECTED]> wrote:
> > I've been trying to index a 300MB file to solr 1.2. I keep getting out of
> > memory heap errors.
> 300MB of what... a single 300MB document? Or does that file represent
> multiple documents in XML or CSV format?
> -Yonik

Hello Yonik,

Thank you for your fast reply. It is one large document. If it were made up of smaller docs, I would split it up and index them separately. Can Solr be made to handle such large docs?

Thanks, Brian
Re: Indexing very large files.
Hello again,

I run Solr on Tomcat under Windows and use the Tomcat monitor to start the service. I have set the minimum heap size to 512MB and the maximum to 1024MB. The system has 2 gigs of RAM. The error that I get after sending approximately 300 MB is:

java.lang.OutOfMemoryError: Java heap space
    at org.xmlpull.mxp1.MXParser.fillBuf(MXParser.java:2947)
    at org.xmlpull.mxp1.MXParser.more(MXParser.java:3026)
    at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1384)
    at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093)
    at org.xmlpull.mxp1.MXParser.nextText(MXParser.java:1058)
    at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:332)
    at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:162)
    at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:230)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:104)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:261)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:581)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)

After sleeping on the problem, I see that it does not stem directly from Solr, but from the module org.xmlpull.mxp1.MXParser. Hmmm. I'm open to suggestions and ideas. First: is this doable? If yes, will I have to modify the code to save the file to disk and then read it back in order to index it in chunks? Or can I get it working on a stock Solr install?

Thanks, Brian

Norberto Meijome schrieb:
> On Wed, 05 Sep 2007 17:18:09 +0200 Brian Carmalt <[EMAIL PROTECTED]> wrote:
> > I've been trying to index a 300MB file to solr 1.2. I keep getting out of
> > memory heap errors.
> > Even on an empty index with one gig of VM memory it still won't work.
> Hi Brian,
> VM != heap memory. VM = OS memory; heap memory = memory made available by
> the Java VM to the Java process. [...]