Embedded SOLR using the SOLR collection distribution

2007-09-05 Thread Dilip.TS
Hello,

 I would like to know if we can implement Embedded SOLR using the SOLR
collection distribution?


Regards,
Dilip


-Original Message-
From: mike topper [mailto:[EMAIL PROTECTED]
Sent: Wednesday, August 22, 2007 8:29 PM
To: solr-user@lucene.apache.org
Subject: almost realtime updates with replication


Hello,

Currently in our application we are using the master/slave setup and
have a batch update/commit about every 5 minutes.

There are a couple of queries that we would like to run in near-realtime,
so I would like our client to send an update on every new document and
then have Solr configured to do an autocommit every 5-10 seconds.
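
(For reference, a minimal sketch of what such an autocommit setting
looks like in solrconfig.xml -- the values below are illustrative
assumptions, and time-based autocommit may not exist in every Solr
build, so check your version:)

  <!-- solrconfig.xml: auto-commit pending docs; example values only -->
  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxDocs>1000</maxDocs>   <!-- commit after this many buffered docs -->
      <maxTime>10000</maxTime>  <!-- or after this many milliseconds -->
    </autoCommit>
  </updateHandler>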

Reading the wiki, it seems like this isn't possible because of the
strain of snapshotting and pulling to the slaves at such a high rate.
What I was thinking was, for these few queries, to just query the master
and have the rest query the slave with the not-quite-realtime data --
although I'm assuming this wouldn't work either, because a snapshot is
created on every commit, so we would still impact performance too much?

Anyone have any suggestions?  If I set autowarmingCount=0, would I be
able to pull to the slave faster than every couple of minutes (say,
every 10 seconds)?

What if I take out the postCommit hook on the master and just have
snapshooter run from cron every 5 minutes?

-Mike



The mechanism of data replication in Solr?

2007-09-05 Thread Dong Wang
Hello, everybody:-)
I'm interested in the mechanism of data replication in Solr. In the
"Introduction to the solr enterprise Search Server", replication is
listed as one of the features of Solr, but I can't find anything about
replication issues on the web site or in the documents, including how
to split the index, how to distribute the chunks of index, how to
place the replicas, eager or lazy replication, etc. I think these are
different from the problems in HDFS.
Can anybody help me? Thank you in advance.

Best Wishes.


Re: Indexing longer documents using Solr...memory issue after index grows to about 800 MB...

2007-09-05 Thread Ravish Bhagdev
Thanks for your reply; my responses are below:

On 9/5/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
> On 4-Sep-07, at 4:50 PM, Ravish Bhagdev wrote:
>
> > - I have about 11K html documents to index.
> > - I'm trying to index these documents (along with 3 more small string
> > fields) so that when I search within the "doc" field (field with the
> > html file content), I can get results with snippets or highlights as I
> > get when using nutch.
> > - While going through Wiki I noticed that if I need to do highlighting
> > in a particular field, I have to make sure it is indexed and stored.
> >
> > But when I try to do the above, after indexing about 3K files (which
> > creates an index of about 800MB -- fine, as the files are quite
> > lengthy), it keeps giving out-of-heap-space errors.
> >
> > Things I've tried without much help:
> >
> > - Increase memory of tomcat
> > - Play around with settings like autoCommit (documents and time)
> > - Reducing mergefactor to 5
> > - Reducing maxBufferedDocs to 100
>
> Merge factor should not affect memory usage.  You say that you
> increased the memory... but to what?  I've found that reducing
> maxBufferedDocs decreases my peak memory usage significantly.
>

OK
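
(For reference, both of those settings live in the <mainIndex> section
of solrconfig.xml -- a minimal sketch using the values tried above:)

  <!-- solrconfig.xml: indexing buffers; values match those mentioned above -->
  <mainIndex>
    <mergeFactor>5</mergeFactor>
    <maxBufferedDocs>100</maxBufferedDocs>
  </mainIndex>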

> > My question is also: if it's required to store fields in the index to
> > be able to do highlighting/return field content, how does nutch/lucene
> > do it without that (because the index for the same documents created
> > using nutch is much, much smaller)?
>
> Are you sure that it doesn't?  According to:
>
> http://svn.apache.org/viewvc/lucene/nutch/trunk/src/plugin/summary-
> basic/src/java/org/apache/nutch/summary/basic/BasicSummarizer.java?
> view=markup
>
> nutch does indeed take the stored text and re-analyse it when
> generating a summary.  Does nutch perhaps store less content of a
> document, or in a different store?
>

I am not sure what it does internally, but my educated guess is that it
doesn't store entire documents in the index (going by index size).  The
index created using nutch is way too small to be storing entire
documents (pretty sure of this part).

> > But also, when trying to query partially added documents, if I set
> > field highlighting on (for a particular field) it doesn't seem to
> > have any effect.
>
> Does the field contain a match against one of the terms you are
> querying for?
>

Yup

> -Mike
>
>
>
Cheers,
Ravi


Re: The mechanism of data replication in Solr?

2007-09-05 Thread Thorsten Scherler
On Wed, 2007-09-05 at 15:56 +0800, Dong Wang wrote:
> I'm interested in the mechanism of data replication in Solr [...]
> Can anybody help me? Thank you in advance.

http://wiki.apache.org/solr/CollectionDistribution

HTH
> 
> Best Wishes.
-- 
Thorsten Scherler thorsten.at.apache.org
Open Source Java  consulting, training and solutions



Re: Embedded SOLR using the SOLR collection distribution

2007-09-05 Thread Erik Hatcher


On Sep 5, 2007, at 3:30 AM, Dilip.TS wrote:
> I would like to know if we can implement Embedded SOLR using the SOLR
> collection distribution?


Partly... the rsync method of getting a master index to the slaves
would work, but you'd need a way to trigger a commit on the slaves so
that they reload their IndexSearchers.
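
(Concretely -- a sketch, assuming the stock XML update handler is
reachable on each slave -- POSTing this one-element message to a
slave's /update URL is what commits and opens the new IndexSearcher:)

  <commit/>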


Erik





Regards,
Dilip


-Original Message-
From: mike topper [mailto:[EMAIL PROTECTED]
Sent: Wednesday, August 22, 2007 8:29 PM
To: solr-user@lucene.apache.org
Subject: almost realtime updates with replication

[message quoted in full in the original thread above -- trimmed]




Re: The mechanism of data replication in Solr?

2007-09-05 Thread Bill Au
The front page of the Solr Wiki has a small section on replication:

http://wiki.apache.org/solr/

Solr's built-in replication does not split the index.  It replicates
the entire index by only copying files that have changed.

Bill


On 9/5/07, Dong Wang <[EMAIL PROTECTED]> wrote:
>
> I'm interested in the mechanism of data replication in Solr [...]
> Can anybody help me? Thank you in advance.
>
> Best Wishes.
>
>


Indexing very large files.

2007-09-05 Thread Brian Carmalt

Hello all,

I will apologize up front if this comes through twice.

I've been trying to index a 300MB file into Solr 1.2. I keep getting
out-of-memory heap errors.

Even on an empty index with one gig of VM memory it still won't work.
Is it even possible to get Solr to index such large files?
Do I need to write a custom index handler?

Thanks,  Brian



Can't get 1.2 running under Tomcat 5.5

2007-09-05 Thread Matt Mitchell

Hi,

I'm having no luck getting Solr 1.2 to run under Tomcat 5.5 using
context fragments. I've followed the example on the wiki:

http://wiki.apache.org/solr/SolrTomcat

The only thing I've changed is the installation method. I'm using the
Tomcat manager to create a context path, and also to point to my
context config. This worked fine with Solr 1.1. If I tail the Tomcat
log (catalina.out) I get this (among other things):


java.lang.ArrayIndexOutOfBoundsException

Solr does create the data/index directory, though, along with a few
index files.


Anyone have any ideas as to what I could be doing wrong?

Thanks,
Matt


Re: Can't get 1.2 running under Tomcat 5.5

2007-09-05 Thread Matt Mitchell
OK, found the start of the trail... I had a duplicate entry for
fulltext in my schema. Removed that. Now when I first try to deploy
Solr, I get this error:

SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.IndexInfoRequestHandler'


Matt

On Sep 5, 2007, at 11:25 AM, Matt Mitchell wrote:


[original message quoted in full above -- trimmed]


Matt Mitchell
Digital Scholarship Services
Box 400129
Alderman Library
University of Virginia
Charlottesville, VA 22904

[EMAIL PROTECTED]




Re: Indexing very large files.

2007-09-05 Thread Yonik Seeley
On 9/5/07, Brian Carmalt <[EMAIL PROTECTED]> wrote:
> I've bin trying to index a 300MB file to solr 1.2. I keep getting out of
> memory heap errors.

300MB of what... a single 300MB document?  Or does that file represent
multiple documents in XML or CSV format?

-Yonik


How to search Case Sensitive words?

2007-09-05 Thread nithyavembu

Hi All,

I am now facing a problem with case-sensitive text. I am indexing a
lower-case word, but when I give the same word in upper case for a
search, it doesn't get found.
Example : Indexing word : "corent"
          Searching word : "CORENT"
If I search "CORENT" it retrieves nothing. Do I have to change any
configuration? I am using the default configuration, and it has a
lowercase filter also.
Any help is appreciated.

Regards,
V.Nithya.



Re: The mechanism of data replication in Solr?

2007-09-05 Thread Dong Wang
Thank you, Thorsten Scherler and Bill Au. Sorry for being so hasty in
posting this question; thanks for your patience.
OK, here come my new questions. Solr's wiki says:

"All the files in the index directory are hard links to the latest
snapshot. This technique has these advantages: Can keep multiple
snapshots on each host without the need to keep multiple copies of
index files that have not changed. File copying from master to slave
is very fast..."

Why do hard links make file copying between master and slave fast?
Thanks. Best Regards.

--
Wang

2007/9/5, Bill Au <[EMAIL PROTECTED]>:
> The front page of the Solr Wiki has a small section on replication:
>
> http://wiki.apache.org/solr/
>
> Solr's built-in replication does not split the index.  It replicates
> the entire index by only copying files that have changed.
>
> Bill
>
> [original question quoted above -- trimmed]


Re: How to search Case Sensitive words?

2007-09-05 Thread Donna L Gresh
I am a pretty new user of Lucene, but I think the simple answer is:
what analyzer are you using when you index? Use the same analyzer
when you search. I believe StandardAnalyzer, for example, does
lowercasing, so if you use the same one when you search, all should
work as you wish.
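
(In Solr terms, the equivalent is making sure the field's analyzer
chain lowercases at both index and query time. A minimal schema.xml
sketch, assuming the stock tokenizer and filter factories; the type
name "text_lc" is made up for illustration:)

  <fieldType name="text_lc" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>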

Re: The mechanism of data replication in Solr?

2007-09-05 Thread Chris Hostetter
: snapshot. This technique has these advantages: Can keep multiple
: snapshots on each host without the need to keep multiple copies of
: index files that have not changed. File copying from master to slave

: Why do hard links make file copying between master and slave fast?
: Thanks. Best Regards.

bullets 2 and 3 build off of bullet 1 ... the Lucene file format is
designed such that files are only ever added, appended to, or deleted --
there is never any rewriting of existing bytes in a file.  so having
hardlinks to the original files in the snapshot directories on both
the master/slave means that the rsync operation of a new snapshot only
needs to send the new data, not diffs or full contents of existing files.



-Hoss



Re: Can't get 1.2 running under Tomcat 5.5

2007-09-05 Thread Chris Hostetter

: OK found the start of the trail... I had a duplicate entry for fulltext in my
: schema. Removed that. Now when I first try to deploy Solr, I get this error:

really? defining the same field name twice gave you an
ArrayIndexOutOfBoundsException? ... that's bad, i'll open a bug on that.

: SEVERE: org.apache.solr.common.SolrException: Error loading class
: 'solr.IndexInfoRequestHandler'

your solrconfig.xml seems to refer to solr.IndexInfoRequestHandler ...
this was a class that was added to the trunk after Solr 1.1 was released,
and was removed before 1.2 was released (all of its functionality was
replaced by the LukeRequestHandler).




-Hoss



Tomcat logging

2007-09-05 Thread Lance Norskog
Hi-
Here are the lines to add to the end of Tomcat's conf/logging.properties
file to get rid of query/update logging noise:
 
org.apache.solr.core.SolrCore.level = WARNING
org.apache.solr.handler.XmlUpdateRequestHandler.level = WARNING
org.apache.solr.search.SolrIndexSearcher.level = WARNING

 
I would prefer not to get involved in editing the wiki; it's generally
better to have a few editors. Also, it crosses the line into company
property. Also, I'm lazy. Will somebody please add this to the Tomcat
page?
 
Thanks,
 
Lance


Re: Distribution Information?

2007-09-05 Thread Matthew Runo
Not that I've noticed. I'll do a more careful grep soon here - I just
got back from a long weekend.


++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++


On Aug 31, 2007, at 6:12 PM, Bill Au wrote:


Are there any error messages in your appserver log files?

Bill

On 8/31/07, Matthew Runo <[EMAIL PROTECTED]> wrote:

Hello!

/solr/admin/distributiondump.jsp

This server is set up as a master server, and other servers use the
replication scripts to pull updates from it every few minutes. My
distribution information screen is blank.. and I couldn't find any
information on fixing this in the wiki.

Any chance someone would be able to explain how to get this page
working, or what I'm doing wrong?

++
  | Matthew Runo
  | Zappos Development
  | [EMAIL PROTECTED]
  | 702-943-7833
++









Indexing a URL

2007-09-05 Thread Bill Fowler
Hello,

I am trying to post the following to my index:

http://www.nytimes.com/2007/08/25/business/worldbusiness/25yuan.html?ex=1345694400&en=499af384a9ebd18f&ei=5088&partner=rssnyt&emc=rss


The url field is defined as:

   

However, I get the following error:

Posting file docstor/ffc110ee5c9a2ed28c8f35aa243bb53b.xml to
http://localhost:8983/news_feed/update



Error 500

HTTP ERROR: 500
ParseError at [row,col]:[3,104]
Message: The reference to entity "en" must end with the ';' delimiter.

It is apparently attempting to parse &en=499af384a9ebd18f in the URL.
I am not clear why it would do this, as I specified indexed="false".
I need to store this field because that is how the user gets to the
original article.

Is there any data type that simply ignores the characters in the field?  I
don't care that it can't be a search field.  I've tried the "ignored" field
type and it still gives me the same error.

Thanks,

Bill


Re: Indexing a URL

2007-09-05 Thread Brian Whitman


> It is apparently attempting to parse &en=499af384a9ebd18f in the URL.
> I am not clear why it would do this as I specified indexed="false".
> I need to store this because that is how the user gets to the
> original article.

the ampersand is an XML reserved character. you have to escape it
(turn it into &amp;), whether you are indexing the data or not.
Nothing to do w/ Solr, just xml files in general. Whatever you're
using to render the xml should be able to handle this for you.
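
(For example, using the URL from the original post, the field value in
the posted XML would need to look like this -- the field name "url" is
taken from that post:)

  <field name="url">http://www.nytimes.com/2007/08/25/business/worldbusiness/25yuan.html?ex=1345694400&amp;en=499af384a9ebd18f&amp;ei=5088&amp;partner=rssnyt&amp;emc=rss</field>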





Re: Distribution Information?

2007-09-05 Thread Matthew Runo
When I load distributiondump.jsp, there is no output in my
catalina.out file.


++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++


On Sep 5, 2007, at 1:55 PM, Matthew Runo wrote:

[earlier messages in this thread quoted in full above -- trimmed]





Re: Replication broken.. no helpful errors?

2007-09-05 Thread Matthew Runo
It seems that the scripts cannot open new searchers at the end of the
process, for some reason. Here's a message from cron, but I'm not sure
what to make of it... It looks like the files properly copied over,
but failed the install. I removed the temp* directory, but still SOLR
could not launch a new searcher. I don't see any activity in
catalina.out though...



started by tomcat5
command: /opt/solr/bin/snappuller -M search1 -P 18080 -D /opt/solr/data -S /opt/solr/logs -d /opt/solr/data -v

pulling snapshot temp-snapshot.20070905150504
receiving file list ... done
deleting segments_1ine
deleting _164h_1.del
deleting _164h.tis
deleting _164h.tii
deleting _164h.prx
deleting _164h.nrm
deleting _164h.frq
deleting _164h.fnm
deleting _164h.fdx
deleting _164h.fdt
deleting _164g_1.del
deleting _164g.tis
deleting _164g.tii
deleting _164g.prx
deleting _164g.nrm
deleting _164g.frq
deleting _164g.fnm
deleting _164g.fdx
deleting _164g.fdt
deleting _164f_1.del
deleting _164f.tis
deleting _164f.tii
deleting _164f.prx
deleting _164f.nrm
deleting _164f.frq
deleting _164f.fnm
deleting _164f.fdx
deleting _164f.fdt
deleting _164e_1.del
deleting _164e.tis
deleting _164e.tii
deleting _164e.prx
deleting _164e.nrm
deleting _164e.frq
deleting _164e.fnm
deleting _164e.fdx
deleting _164e.fdt
deleting _164d_1.del
deleting _164d.tis
deleting _164d.tii
deleting _164d.prx
deleting _164d.nrm
deleting _164d.frq
deleting _164d.fnm
deleting _164d.fdx
deleting _164d.fdt
deleting _164c_1.del
deleting _164c.tis
deleting _164c.tii
deleting _164c.prx
deleting _164c.nrm
deleting _164c.frq
deleting _164c.fnm
deleting _164c.fdx
deleting _164c.fdt
deleting _164b_1.del
deleting _164b.tis
deleting _164b.tii
deleting _164b.prx
deleting _164b.nrm
deleting _164b.frq
deleting _164b.fnm
deleting _164b.fdx
deleting _164b.fdt
deleting _164a_1.del
deleting _164a.tis
deleting _164a.tii
deleting _164a.prx
deleting _164a.nrm
deleting _164a.frq
deleting _164a.fnm
deleting _164a.fdx
deleting _164a.fdt
deleting _163z_3.del
deleting _163z.tis
deleting _163z.tii
deleting _163z.prx
deleting _163z.nrm
deleting _163z.frq
deleting _163z.fnm
deleting _163z.fdx
deleting _163z.fdt
deleting _163o_3.del
deleting _163o.tis
deleting _163o.tii
deleting _163o.prx
deleting _163o.nrm
deleting _163o.frq
deleting _163o.fnm
deleting _163o.fdx
deleting _163o.fdt
deleting _163d_4.del
deleting _163d.tis
deleting _163d.tii
deleting _163d.prx
deleting _163d.nrm
deleting _163d.frq
deleting _163d.fnm
deleting _163d.fdx
deleting _163d.fdt
deleting _1632_6.del
deleting _1632.tis
deleting _1632.tii
deleting _1632.prx
deleting _1632.nrm
deleting _1632.frq
deleting _1632.fnm
deleting _1632.fdx
deleting _1632.fdt
deleting _162r_7.del
deleting _162r.tis
deleting _162r.tii
deleting _162r.prx
deleting _162r.nrm
deleting _162r.frq
deleting _162r.fnm
deleting _162r.fdx
deleting _162r.fdt
deleting _162g_d.del
deleting _162g.tis
deleting _162g.tii
deleting _162g.prx
deleting _162g.nrm
deleting _162g.frq
deleting _162g.fnm
deleting _162g.fdx
deleting _162g.fdt
deleting _1625_m.del
deleting _1625.tis
deleting _1625.tii
deleting _1625.prx
deleting _1625.nrm
deleting _1625.frq
deleting _1625.fnm
deleting _1625.fdx
deleting _1625.fdt
deleting _161u_w.del
deleting _161u.tis
deleting _161u.tii
deleting _161u.prx
deleting _161u.nrm
deleting _161u.frq
deleting _161u.fnm
deleting _161u.fdx
deleting _161u.fdt
deleting _161j_16.del
./
_161j_17.del
_164m.fdt
_164m.fdx
_164m.fnm
_164m.frq
_164m.nrm
_164m.prx
_164m.tii
_164m.tis
_164m_1.del
_164x.fdt
_164x.fdx
_164x.fnm
_164x.frq
_164x.nrm
_164x.prx
_164x.tii
_164x.tis
_164x_1.del
segments.gen
segments_1inv

sent 516 bytes  received 105864302 bytes  30247090.86 bytes/sec
total size is 966107226  speedup is 9.13
+ [[ -z search1 ]]
+ [[ -z /opt/solr/logs ]]
+ fixUser -M search1 -S /opt/solr/logs -d /opt/solr/data -V
+ [[ -z tomcat5 ]]
++ whoami
+ [[ tomcat5 != tomcat5 ]]
++ who -m
++ cut '-d ' -f1
++ sed '-es/^.*!//'
+ oldwhoami=
+ [[ '' == '' ]]
+++ pgrep -g0 snapinstaller
++ tail -1
++ cut -f1 '-d '
++ ps h -Hfp 3621 3629 3630 3631
+ oldwhoami=tomcat5
+ [[ -z /opt/solr/data ]]
++ echo /opt/solr/data
++ cut -c1
+ [[ / != \/ ]]
++ echo /opt/solr/logs
++ cut -c1
+ [[ / != \/ ]]
++ date +%s
+ start=1189030205
+ logMessage started by tomcat5
++ timeStamp
++ date '+%Y/%m/%d %H:%M:%S'
+ echo 2007/09/05 15:10:05 started by tomcat5
+ [[ -n '' ]]
+ logMessage command: /opt/solr/bin/snapinstaller -M search1 -S /opt/solr/logs -d /opt/solr/data -V

++ timeStamp
++ date '+%Y/%m/%d %H:%M:%S'
+ echo 2007/09/05 15:10:05 command: /opt/solr/bin/snapinstaller -M search1 -S /opt/solr/logs -d /opt/solr/data -V

+ [[ -n '' ]]
++ ls /opt/solr/data
++ grep 'snapshot\.'
++ grep -v wip
++ sort -r
++ head -1
+ name=temp-snapshot.20070905150504
+ trap 'echo "caught INT/TERM, exiting now but partial installation may have already occured";/bin/rm -rf ${data_dir}/index.tmp$$;logExit aborted 13' INT TERM

+ [[ temp-snapshot.20070905150504 == '' ]]
+ name=/opt/solr/data/temp-snapsh

Re: Replication broken.. no helpful errors?

2007-09-05 Thread Matthew Runo

If it helps anyone, this index is around a gig in size.

++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++


On Sep 5, 2007, at 3:14 PM, Matthew Runo wrote:

[previous message quoted in full above -- trimmed]

Re: Can't get 1.2 running under Tomcat 5.5

2007-09-05 Thread Erik Hatcher


On Sep 5, 2007, at 11:37 AM, Matt Mitchell wrote:
> SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.IndexInfoRequestHandler'


You're using my old hand-built version of Solr, I suspect.  Hoss
explained it fully in his previous message on this thread.

Care needs to be taken when upgrading Solr but leaving solrconfig.xml
untouched, because additional config may be necessary.  Comparing your
solrconfig.xml with the one that ships with the example app of the
version of Solr you're upgrading to is recommended.


Erik



Re: Can't get 1.2 running under Tomcat 5.5

2007-09-05 Thread Chris Hostetter

: Care needs to be taken when upgrading Solr but leaving solrconfig.xml
: untouched, because additional config may be necessary.  Comparing your
: solrconfig.xml with the one that ships with the example app of the version of
: Solr you're upgrading to is recommended.

Hmmm... that's kind of a scary statement, and it may mislead people into
thinking that they need to throw away their configs when updating and
start over with the newest examples -- that's certainly not true.

I think it's safe to say that if you are using official releases of Solr
and not trunk builds, then either:
* any "old" config files will continue to work as is
OR: * any known config syntax which no longer works exactly the same way
  will be called out loudly in the CHANGES.txt file for the release.

If however you are using a nightly snapshot, items that work in your 
config may not continue to work in future versions as functionality is 
tweaked and revised.

However: Erik's point about comparing your configs with the examples is
still a good idea -- because there may be cool new features that you'd
like to take advantage of that don't immediately jump out at you when
looking at the CHANGES.txt file, but do when looking at sample configs.



-Hoss



Re: Can't get 1.2 running under Tomcat 5.5

2007-09-05 Thread Erik Hatcher
I guess my warning is more because I play on the edge and have several
times ended up tweaking various apps' solrconfig.xml files as I
upgraded them, to keep things working.


Anyway, we'll all agree that diff'ing your config files with the
example app can be useful.


Erik

On Sep 5, 2007, at 9:26 PM, Chris Hostetter wrote:



[Hoss's message quoted in full above -- trimmed]




Re: Can't get 1.2 running under Tomcat 5.5

2007-09-05 Thread Walter Underwood
Not really. It is a very poor substitute for reading the release notes,
and sufficiently inadequate that it might not be worth the time.

Diffing the example with the previous release is probably more
instructive, but might or might not help for your application.

A config file checker would be useful.

wunder

On 9/5/07 6:55 PM, "Erik Hatcher" <[EMAIL PROTECTED]> wrote:

> Anyway, we'll all agree that diff'ing your config files with the
> example app can be useful.



Re: Indexing very large files.

2007-09-05 Thread Norberto Meijome
On Wed, 05 Sep 2007 17:18:09 +0200
Brian Carmalt <[EMAIL PROTECTED]> wrote:

> I've been trying to index a 300MB file into Solr 1.2. I keep getting
> out-of-memory heap errors.
> Even on an empty index with one gig of VM memory it still won't work.

Hi Brian,

VM != heap memory.

VM = OS memory
heap memory = memory made available by the Java VM to the Java process.
Heap memory errors are hardly ever an issue of the app itself (other
than, of course, with bad programming... but that doesn't seem to be
the issue here so far).


[EMAIL PROTECTED] [Thu Sep  6 14:59:21 2007]
/usr/home/betom
$ java -X
[...]
-Xms<size>        set initial Java heap size
-Xmx<size>        set maximum Java heap size
-Xss<size>        set java thread stack size
[...]

For example, start solr as :
java  -Xms64m -Xmx512m   -jar start.jar

YMMV with respect to the actual values you use.

Good luck,
B
_
{Beto|Norberto|Numard} Meijome

Windows caters to everyone as though they are idiots. UNIX makes no such 
assumption. 
It assumes you know what you are doing, and presents the challenge of figuring 
it out for yourself if you don't.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: Indexing very large files.

2007-09-05 Thread Brian Carmalt

Yonik Seeley schrieb:
> On 9/5/07, Brian Carmalt <[EMAIL PROTECTED]> wrote:
> > I've been trying to index a 300MB file into Solr 1.2. I keep getting
> > out-of-memory heap errors.
>
> 300MB of what... a single 300MB document?  Or does that file represent
> multiple documents in XML or CSV format?
>
> -Yonik

Hello Yonik,

Thank you for your fast reply.  It is one large document. If it were
made up of smaller docs, I would split it up and index them separately.

Can Solr be made to handle such large docs?

Thanks, Brian


Re: Indexing very large files.

2007-09-05 Thread Brian Carmalt

Hello again,

I run Solr on Tomcat under Windows and use the Tomcat monitor to start
the service. I have set the minimum heap size to 512MB and the maximum
to 1024MB. The system has 2 gigs of RAM. The error that I get after
sending approximately 300MB is:

java.lang.OutOfMemoryError: Java heap space
    at org.xmlpull.mxp1.MXParser.fillBuf(MXParser.java:2947)
    at org.xmlpull.mxp1.MXParser.more(MXParser.java:3026)
    at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1384)
    at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093)
    at org.xmlpull.mxp1.MXParser.nextText(MXParser.java:1058)
    at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:332)
    at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:162)
    at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:230)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:104)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:261)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:581)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)

After sleeping on the problem, I see that it does not stem directly
from Solr, but from the org.xmlpull.mxp1.MXParser module. Hmmm. I'm
open to suggestions and ideas.

First: is this doable at all?
If yes, will I have to modify the code to save the file to disk and
then read it back in, in order to index it in chunks?
Or can I get it working on a stock Solr install?

Thanks,

Brian

Norberto Meijome schrieb:

[Norberto's message quoted in full above -- trimmed]