Re: Performance issue.

2006-12-05 Thread Yonik Seeley
On 12/5/06, Gmail Account <[EMAIL PROTECTED]> wrote: > There's nothing wrong with CPU jumping to 100% each query, that just > means you aren't IO bound :-) What do you mean not IO bound? There is always going to be a bottleneck somewhere. In very large indices, the bottleneck may be waiting f

Re: Performance issue.

2006-12-05 Thread Gmail Account
There's nothing wrong with CPU jumping to 100% each query, that just means you aren't IO bound :-) What do you mean not IO bound? >- I did an optimize index through Luke with compound format and > noticed > in the solrconfig file that useCompoundFile is set to false. Don't do this unle

Re: Index-time Boosting

2006-12-05 Thread Yonik Seeley
Yep, sounds like you got it. Query-time boosting is what you want. however, the above document *will* have a higher score, in general, because the "title" portion was nearly half of the "text" field. Well, if you boost *all* of the "title" fields by 100, it also has the net effect of boosting

Re: Performance issue.

2006-12-05 Thread Yonik Seeley
On 12/5/06, Gmail Account <[EMAIL PROTECTED]> wrote: Sorry.. I put the wrong subject on my message. I also wanted to mention that my cpu jumps to almost 100% each query. There's nothing wrong with CPU jumping to 100% each query, that just means you aren't IO bound :-) It's the 3 seconds tha

Performance issue.

2006-12-05 Thread Gmail Account
Sorry.. I put the wrong subject on my message. I also wanted to mention that my cpu jumps to almost 100% each query. I'm having slow performance with my solr index. I'm not sure what to do. I need some suggestions on what to try. I have updated all my records in the last couple of days. I'm

Re: Index-time Boosting

2006-12-05 Thread Tracey Jaquith
ahh, after rereading this about 20 times today 8-) i think i finally "get it" (your final question below). if i do index-time boosts, and search only "text" (default field) the boosts will propagate into "text", but only insofar that the document will weight higher when a phrase is found in the "

Re: Initial import problems

2006-12-05 Thread Gmail Account
I'm having slow performance with my solr index. I'm not sure what to do. I need some suggestions on what to try. I have updated all my records in the last couple of days. I'm not sure how much it degraded because of that, but it now takes about 3 seconds per search. My cache statistics don't loo

Re: Initial import problems

2006-12-05 Thread Mike Klaas
On 12/5/06, Andrew Nagy <[EMAIL PROTECTED]> wrote: Hello, I am new to SOLR but very excited for its possibilities. I am having some difficulties with my data import which I hope can be solved very easily. First I wrote an xslt to transform my xml into the solr schema and modified the schema.xml

Re: Initial import problems

2006-12-05 Thread Yonik Seeley
On 12/5/06, Andrew Nagy <[EMAIL PROTECTED]> wrote: Hello, I am new to SOLR but very excited for its possibilities. I am having some difficulties with my data import which I hope can be solved very easily. First I wrote an xslt to transform my xml into the solr schema and modified the schema.xml

Initial import problems

2006-12-05 Thread Andrew Nagy
Hello, I am new to SOLR but very excited for its possibilities. I am having some difficulties with my data import which I hope can be solved very easily. First I wrote an xslt to transform my xml into the solr schema and modified the schema.xml to match the fields that I created. I then ran

Re: Index-time Boosting

2006-12-05 Thread Tracey Jaquith
ok, great to know -- all this is invaluable. i'm stashing away "ideas" like this for the future (because..) i think for now i'll stick with XSL transforming the fields to lowercase because we already need this small XSLT from our item XML to XML that solr can index. -t Chris Hostetter wrote:

Re: Index-time Boosting

2006-12-05 Thread Tracey Jaquith
oh, and yes, i've always understood, thankfully, that queries of    "q=commute&fl=title" and    "q=title:commute&fl=title" are *quite* different (but that is probably mostly due to my prior experience with  lucene with our current broken SE 8-) -t Mike Klaas wrote: On 12/5/06, Tracey Jaquith

Re: Index-time Boosting

2006-12-05 Thread Tracey Jaquith
Hi Mike, OK, I guess my "problem" is more of a partially still coming up to speed / partially wanting to be lazy. If I make a dismax handler called "dissed", I'd like it to "work" whether or not i pass in "commute" or "title:commute" to the query. (Now I *do* realize those are two completely dif

Re: Indexing XML files

2006-12-05 Thread Chris Hostetter
: At some point, it would be simpler to write a custom response handler : and generate the output in your desired XML format. I think Walter's got the right idea ... as a general rule, we want to make the XmlResponseWriter "bullet proof" so that no matter what data you put into your index, it is g

Re: Index-time Boosting

2006-12-05 Thread Mike Klaas
On 12/5/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: : Indeed--those are different queries. The "fl" parameter controlled : the stored fields returned by Solr; it does not affect which documents : are returned. The first query asks for the titles of all documents : containing the word "commu
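The difference between the two queries discussed above can be sketched in Python. This is only an illustration: the host, port, and `/solr/select` path are assumptions (nothing in the thread says where the server runs), and `select_url` is a hypothetical helper, not part of Solr.

```python
from urllib.parse import urlencode

# Assumed endpoint; substitute the address of your own Solr instance.
BASE = "http://localhost:8983/solr/select"


def select_url(q, fl):
    """Build a select URL. `q` decides which documents match;
    `fl` only decides which stored fields come back for each match."""
    return BASE + "?" + urlencode({"q": q, "fl": fl})


# Searches the default field for "commute", returns only the title field.
default_field_url = select_url("commute", "title")

# Searches the title field for "commute", returns only the title field.
title_field_url = select_url("title:commute", "title")

print(default_field_url)
print(title_field_url)
```

Both URLs carry the same `fl=title`, so the responses look alike, but the sets of matching documents can differ because only the second query restricts the search to the title field.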

Re: Index-time Boosting

2006-12-05 Thread Chris Hostetter
: so for my dwindling number of remaining "string" types, in my XSL : transform (on the input to index the doc) i'll lowercase them all, too 8-) I don't believe that is strictly necessary, these two field types should be functionally equivalent... ...so i'm p

Re: Index-time Boosting

2006-12-05 Thread Chris Hostetter
: > (For example, I do "indent=on&fl=title&q=commute" in a wget and grep the : > results : > for and then grep -i for commute, there are 23 hits. But doing : > "&q=title:commute" only returns one of those hits..) : : Indeed--those are different queries. The "fl" parameter controlled : the sto

Re: Indexing XML files

2006-12-05 Thread Walter Underwood
At some point, it would be simpler to write a custom response handler and generate the output in your desired XML format. wunder On 12/5/06 1:52 PM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote: > Hi, > > the idea is to apply XSLT transformation on the result. But it seems that > I would have

Re: Indexing XML files

2006-12-05 Thread mirko
Hi, the idea is to apply XSLT transformation on the result. But it seems that I would have to apply two transformations in a row, one which unescapes the escaped node and a second which performs the actual transformation... mirko Quoting Yonik Seeley <[EMAIL PROTECTED]>: > On 12/5/06, [EMAIL

Re: Index-time Boosting

2006-12-05 Thread Tracey Jaquith
wow, that makes sense now.  my bad. OK, great.  further testing shows "you mean what you say" -- not only verbatim, but case sensitive. so for my dwindling number of remaining "string" types, in my XSL transform (on the input to index the doc) i'll lowercase them all, too 8-) thanks!! --t Yo

Re: Indexing XML files

2006-12-05 Thread mirko
You are right, it is escaped. But my question is: (how) can I make it unescaped? mirko Quoting Yonik Seeley <[EMAIL PROTECTED]>: ... > > I bet it is escaped, but your browser has helpfully displayed it as > unescaped. > Try doing CTRL-U in firefox to see the real source for the reply. > > > -Y

Re: Indexing XML files

2006-12-05 Thread Yonik Seeley
On 12/5/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: Thanks for the quick response. Now, I have one more question. Is it possible to get the result for a query back in the following form (considering the input is the escaped xml, what you mentioned before): 0 0 As You Like

Re: Index-time Boosting

2006-12-05 Thread Mike Klaas
On 12/5/06, Tracey Jaquith <[EMAIL PROTECTED]> wrote: Now I have one new mystery that's popped up for me. With std req handler, this simple query q=title:commute is *not* returning me all documents that have the word "commute" in the title. There must be some other filter/clause or something

Re: Index-time Boosting

2006-12-05 Thread Yonik Seeley
On 12/5/06, Tracey Jaquith <[EMAIL PROTECTED]> wrote: Now I have one new mystery that's popped up for me. With std req handler, this simple query q=title:commute is *not* returning me all documents that have the word "commute" in the title. There must be some other filter/clause or something

Re: Indexing XML files

2006-12-05 Thread mirko
Hi, Thanks for the quick response. Now, I have one more question. Is it possible to get the result for a query back in the following form (considering the input is the escaped xml, what you mentioned before): 0 0 As You Like It (Promptbook of McVicars 1860)Shakespeare, William,

Re: Indexing XML files

2006-12-05 Thread Mike Klaas
On 12/5/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: You are right, it is escaped. But my question is: (how) can I make it unescaped? I don't think solr will support such functionality. The xml that solr uses to return data is completely orthogonal to the xml embedded in the data, and mix

Re: Indexing XML files

2006-12-05 Thread Yonik Seeley
On 12/5/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: You are right, it is escaped. But my question is: (how) can I make it unescaped? For what purpose? If you use an XML parser, the values it gives back to you will be unescaped. -Yonik
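Yonik's point can be checked with any XML parser. Below is a minimal sketch using Python's standard library; the response fragment is a simplified stand-in for a Solr result, and the `play` and `act` elements are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical response fragment: the "record" field holds
# an XML document that was escaped when it was indexed.
response = (
    '<doc>'
    '<str name="title">As You Like It</str>'
    '<str name="record">&lt;play&gt;&lt;act n="1"/&gt;&lt;/play&gt;</str>'
    '</doc>'
)

doc = ET.fromstring(response)

# The parser hands back the field value already unescaped.
record_text = doc.find("str[@name='record']").text
print(record_text)

# The unescaped string is itself XML, so it can be parsed a second time.
record = ET.fromstring(record_text)
print(record.tag, record.find("act").get("n"))
```

So the client does not need Solr to unescape anything: it parses the response once to get the field value, then parses that value again if it wants the embedded document as a tree.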

Re: Index-time Boosting

2006-12-05 Thread Tracey Jaquith
Hi Yonik! Yonik Seeley wrote: On 12/5/06, Tracey Jaquith <[EMAIL PROTECTED]> wrote: Quick intro. Server Engineer at Internet Archive. I just spent a mere 3 days porting nearly our entire site to use your *wonderful* project! I, too, am looking for a kind of "boosting". If I understand your re

Re: Indexing XML files

2006-12-05 Thread Chris Hostetter
Since XML is the transport for sending data to Solr, you need to make sure all field values are XML escaped. If you wanted to index a plain text "title" and that title contained an ampersand character Sense & Sensability ...you would need to XML escape that as... Sense &amp; Sen
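The escaping step described above might look like this in Python. It is a sketch, not the poster's code: `solr_field` is a hypothetical helper, and the only Solr-specific assumption is the `<field name="...">` element used in update messages.

```python
from xml.sax.saxutils import escape, quoteattr


def solr_field(name, value):
    """Return one <field> element with the value XML-escaped.

    escape() rewrites &, < and > so the value is safe as element text;
    quoteattr() quotes and escapes the attribute value.
    """
    return "<field name=%s>%s</field>" % (quoteattr(name), escape(value))


title = "Sense & Sensability"
field_xml = solr_field("title", title)
print(field_xml)
```

The same call also handles a value that is itself XML, since `<` and `>` are escaped along with `&`.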

Indexing XML files

2006-12-05 Thread mirko
Hi, I am trying to index an xml file as a field in lucene, see example below: As You Like it Shakespeare, William here goes the xml... I can index the title and author fields because they are strings, but the record field is an xml itself and I bump into some problems as I cannot dir

Re: Index-time Boosting

2006-12-05 Thread Yonik Seeley
One last thing to keep in mind is the tradeoffs: Querying a single all encompassing "text" field will be faster, but the scoring won't be as relevant. The types of queries dismax generates can get you better relevance, at the cost of performance. -Yonik On 12/5/06, Chris Hostetter <[EMAIL PROTEC

Re: Index-time Boosting

2006-12-05 Thread Chris Hostetter
: > [We are most interested in always having "title", "description", and a : > few other : > fields boosted. We have both user queries of phrases/words as well as : > "field-specific" queries (eg: "mediatype:moves AND collection:prelinger") : > so my thought is std might be better than dismax.

Re: Index-time Boosting

2006-12-05 Thread Yonik Seeley
On 12/5/06, Tracey Jaquith <[EMAIL PROTECTED]> wrote: Quick intro. Server Engineer at Internet Archive. I just spent a mere 3 days porting nearly our entire site to use your *wonderful* project! I, too, am looking for a kind of "boosting". If I understand your reply here, if i reindex *all* my

Re: Resin error question

2006-12-05 Thread David Halsted
Great, Yonik -- I was hoping somebody would have seen it before (and I didn't think to look in web.xml!). I thought it would be easier to uncomment than to get the host to upgrade, so I did and presto -- no more errors. Thanks much, Dave On 12/5/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: I re

Re: Resin error question

2006-12-05 Thread Yonik Seeley
I recognize this error: [02:30:31.613]Caused by: java.lang.UnsupportedOperationException [02:30:31.613] at com.caucho.xml.QAbstractNode.getTextContent(QAbstractNode.java:355) It's caused by a resin bug in their xpath implementation. I think it's fixed in their latest version, so the simplest sol

Resin error question

2006-12-05 Thread David Halsted
I'm trying to get Solr running with Resin on a hosted site and I'm having a problem in the initialization sequence. I get the stack trace below. I had a look at the mailing list archives and this kind of error seems to be caused mostly when the config files can't be seen, but it looks as though