Re: OOE during indexing

2008-01-22 Thread Marcus Herou
Yep, but we hire these god damn boxes, and then, my friend, memory costs per month = not cheap in the long term. Something like $50/month for 2G more. I might be an ultra-geek when it comes to Linux and programming, but I'm not an ultra-geek at building servers from scratch. But I will straighten up and buy

Re: OOE during indexing

2008-01-22 Thread Mike Klaas
On 22-Jan-08, at 9:46 PM, Marcus Herou wrote: > OK I got the conclusion myself. add memory to the box and get some more boxes :) I'm glad you've come to that conclusion, but to reinforce it: Solr/Lucene heavily benefits from loads of memory. Not just for Solr caching, but it also depends

Re: OOE during indexing

2008-01-22 Thread Marcus Herou
Thanks! Yes, I agree (to a certain level) on me being naive. Currently I'm only using one server for this but will go into distributed snapshot/pull mode soon. Then I can tune the slaves differently than the master, I believe. The master, for instance, does not need autowarming nor caches if not searche

Re: multivalued dynamic fields performance

2008-01-22 Thread Yonik Seeley
On Jan 22, 2008 9:29 PM, Jonathan Ariel <[EMAIL PROTECTED]> wrote: > If I'm going to have nearly always one value and in some cases 4 or 5 values > I would feel the penalty when faceting? Does it depends on the amount of > values in my field? For those documents that I'm going to have just one > va

Re: multivalued dynamic fields performance

2008-01-22 Thread Jonathan Ariel
Thanks! So there is just one penalty when faceting, which is my case. "TermEnum is good for a limited number of different indexed terms in the field, and allows multiple terms per field per document" How much is limited number of different indexed terms in the field? If I'm going to have nearly alw

Re: Solr feasibility with terabyte-scale data

2008-01-22 Thread Mike Klaas
On 22-Jan-08, at 4:20 PM, Phillip Farber wrote: We would need all 7M ids scored so we could push them through a filter query to reduce them to a much smaller number on the order of 100-10,000 representing just those that correspond to items in a collection. You could pass the filter to S
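For readers unfamiliar with filter queries, here is a minimal sketch of how a collection filter could be sent alongside the main query using Solr's `fq` request parameter. The host, port, field names (`ocr`, `collection_id`) and values are hypothetical and do not come from the thread; only `q`, `fq` and `rows` are standard Solr parameters.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, filter_query, rows=10):
    """Build a Solr /select URL with a main query (q) and a filter query (fq)."""
    params = {"q": query, "fq": filter_query, "rows": rows}
    # urlencode percent-encodes reserved characters such as ':' in the values.
    return base_url + "/select?" + urlencode(params)


# Hypothetical host, fields and values, for illustration only.
url = build_select_url("http://localhost:8983/solr", "ocr:whale", "collection_id:42")
print(url)
```

With this shape the filter restricts the result set to one collection, so the client does not have to receive and post-filter all matching ids itself.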

Re: Logging in Solr

2008-01-22 Thread Chris Hostetter
: I'm new to Solr and Tomcat and I'm trying to track down some odd errors. : How do I set up Tomcat to do fine-grained Solr-specific logging? I have : looked around enough to know that it should be possible to do per-webapp : logging in Tomcat 5.5, but the details are hard to follow for a newbie

Re: Storing Related Data - At Different Times

2008-01-22 Thread Chris Hostetter
: details. This is a simple join in the db. But how do we achieve this in : Solr. The problem is when personal details are changed we will have to : update all 5 resumes. That is, in a nutshell, what you need to do. From the perspective of clients, a Solr index is a very flattened data structur

Re: Solr feasibility with terabyte-scale data

2008-01-22 Thread Erick Erickson
Just to add another wrinkle, how clean is your OCR? I've seen it range from very nice (i.e. 99.9% of the words are actually words) to horrible (60%+ of the "words" are nonsense). I saw one attempt to OCR a family tree. As in a stylized tree with the data hand-written along the various branches in e

Re: Solr feasibility with terabyte-scale data

2008-01-22 Thread Phillip Farber
Otis Gospodnetic wrote: Hi, Some quick notes, since it's late here. - You'll need to wait for SOLR-303 - there is no way even a big machine will be able to search such a large index in a reasonable amount of time, plus you may simply not have enough RAM for such a large index. Are you bas

Re: SolrPhpClient with example jetty

2008-01-22 Thread Daniel Andersson
On Jan 23, 2008, at 12:47 AM, Brian Whitman wrote: $document->title = 'Some Title'; $document->content = 'Some content for this wonderful document. Blah blah blah.'; did you change the schema? There's no title or content field in the default example schema. But I believe solr d

Re: SolrPhpClient with example jetty

2008-01-22 Thread Brian Whitman
$document->title = 'Some Title'; $document->content = 'Some content for this wonderful document. Blah blah blah.'; did you change the schema? There's no title or content field in the default example schema. But I believe solr does output different errors for that.

SolrPhpClient with example jetty

2008-01-22 Thread Daniel Andersson
Hi (again) I'm trying to add documents using the SolrPhpClient (if there's a specific mailing list for it, please let me know and I'll ask there instead). I've searched the net for "missing content stream", but found nothing that makes sense. This is what Solr spits out when I run the ex
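The resolution of this thread is not shown here, so the following is only a point of comparison: a "missing content stream" error generally indicates that the update handler received a request without a body. This minimal sketch builds the XML add message that Solr's update handler expects. The field names `id` and `name` and their values are hypothetical; whatever names are used must exist in the schema, as Brian Whitman points out in his reply.

```python
import xml.etree.ElementTree as ET


def build_add_message(fields):
    """Build a Solr <add><doc>...</doc></add> update message from a dict."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for field_name, value in fields.items():
        field = ET.SubElement(doc, "field", {"name": field_name})
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical field names and values, for illustration only.
message = build_add_message({"id": "doc-1", "name": "Some Title"})
print(message)
```

The resulting string would be sent as the body of an HTTP POST to the update URL with an XML content type. If the body is empty, the server has no content stream to read.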

Re: multivalued dynamic fields performance

2008-01-22 Thread Chris Hostetter
: Hi, : Do you know if there is a performance impact when using multivalued dynamic : fields when it's not always necessary to store more than one value? http://www.nabble.com/Performance-penalty-for-Multivalued-field--to9496992.html -Hoss

Re: OOE during indexing

2008-01-22 Thread Chris Hostetter
: I get OOE with Solr 1.3. Autowarm seems to be the villain in conjunction with : FieldCache somehow. : JVM args: -Xmx512m -Xms512m -Xss128k : : Index size is ~4 Million docs, where I index text and store database primary it seems naive to me to only allow 512MB for an index of 4 million docs -- n

Re: Problem with applying stylesheets

2008-01-22 Thread Chris Hostetter
: I am trying to apply style sheet to result xml by passing argument like : stylesheet=tabular.xml . but it complained : that stylesheet may be empty.. when i checked the source code for XMLWriter : .. it's looking under /admin as noted in the wiki the stylesheet param is (vastly) discouraged ...

using xalan:tokenize in output xslt...

2008-01-22 Thread Sean Laval
I am using xalan:tokenize in an xsl that transforms solr output and the stylesheet is failing to compile. Any ideas? I am sure it's straightforward. Any help appreciated. Regards, Sean

Updating and Appending

2008-01-22 Thread Owens, Martin
Hello, We've got some memory constraint worries from using Java RMI, although I can see this problem could affect the xml requests too. The Java code doesn't seem to handle large files as streams. Now we're thinking that there are two possible solutions, either the exists or we create a file pa

Re: copyField limitation

2008-01-22 Thread Ryan McKinley
Solr does not now do this. I don't know if the Solr processing stack has this flexibility, or if it is worth adding it. I understand every example you have suggested -- I just don't get how it isn't possible. Can you post an example of the schema+commands that give you an error? If your go

Problem with applying stylesheets

2008-01-22 Thread Ismail Siddiqui
I am trying to apply style sheet to result xml by passing argument like stylesheet=tabular.xml . but it complained that stylesheet may be empty.. when i checked the source code for XMLWriter .. it's looking under /admin private static final char[] XML_START1="\n".toCharArray(); private static fi

Re: Solr feasibility with terabyte-scale data

2008-01-22 Thread Mike Klaas
On 22-Jan-08, at 11:05 AM, Phillip Farber wrote: Currently 1M docs @ ~1.4M/doc. Scaling to 7M docs. This is OCR so we are talking perhaps 50K words total to index so as you point out the index might not be too big. It's the *data* that is big not the *index*, right? So I don't think S

Re: OOE during indexing

2008-01-22 Thread Marcus Herou
Thanks for your reply. I set autowarmCount = 0 for both LRUCache and the queryCache but still I got these errors on heavy reindexing (4M docs as fast as possible, each doc < 10K). I removed firstSearcher and newSearcher but I still got the same errors. The strange thing is that now when the server

RE: copyField limitation

2008-01-22 Thread Lance Norskog
A more interesting use case: Analyzing text and finding a number, like the mean word length or the mean number of repeated words. These are standard tools for spam detection. To create these, we would want to shovel text into a text processing chain that creates an integer. We then want to both st

Re: OOE during indexing

2008-01-22 Thread Mike Klaas
Queries involving sorting can occupy a lot of memory. During autowarming you need 2x peak memory usage. The only thing you can do is increase your max heap size or be careful about cache autowarming (possibly turning it off). cheers, -Mike On 21-Jan-08, at 9:44 PM, Marcus Herou wrote:
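To put rough numbers on the "2x peak memory" remark, here is a back-of-envelope sketch. It assumes that sorting on one integer-like field costs about 4 bytes per document in the field cache (an assumption for illustration; string sort fields cost more), and that two searchers hold their caches at the same time during autowarming. The 4 million document count comes from the thread; everything else is illustrative.

```python
def sort_cache_bytes(num_docs, bytes_per_doc=4, searchers=2):
    """Estimate memory held by per-document sort caches.

    bytes_per_doc=4 assumes one int per document (an illustrative assumption).
    searchers=2 models the old and the warming searcher coexisting.
    """
    return num_docs * bytes_per_doc * searchers


estimate = sort_cache_bytes(4_000_000)
print(f"{estimate / 2**20:.1f} MiB per sorted field during autowarming")
```

The estimate covers a single sort field only. Every additional sorted or faceted field, plus the query, filter and document caches, adds to it, which is why a 512 MB heap can run out on an index of this size.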

multivalued dynamic fields performance

2008-01-22 Thread Jonathan Ariel
Hi, Do you know if there is a performance impact when using multivalued dynamic fields when it's not always necessary to store more than one value? Since I'm going to add dynamic fields to my schema and I'm not sure if the field will be multivalued or not, I thought about doing them multivalued. In

Re: Solr feasibility with terabyte-scale data

2008-01-22 Thread Ryan McKinley
Obviously as the number of documents increases the index size must increase to some degree -- I think linearly? But what index size will result for 7M documents over 50K words where we're talking just 2 fields per doc: 1 id field and one OCR field of ~1.4M? Ballpark? Regarding single word qu

Re: spellcheckhandler

2008-01-22 Thread anuvenk
I did try with the latest nightly build and followed the steps outlined in http://wiki.apache.org/solr/SpellCheckerRequestHandler with regards to creating new catchall field 'spell' of type 'spell' and copied my text fields to 'spell' at index time. Still q=grapics returns 'graphics' but q=grapic

Re: spellcheckhandler

2008-01-22 Thread anuvenk
I did try with the latest nightly build. The problem still exists. I tested with the example data that comes with the Solr package. 1) with termSourceField set to 'word', which is a string fieldtype, q=iped nano returns 'ipod nano', which is good 2) with termSourceField set to 'spell' (which is the c

Re: Solr feasibility with terabyte-scale data

2008-01-22 Thread Phillip Farber
Ryan McKinley wrote: We are considering Solr 1.2 to index and search a terabyte-scale dataset of OCR. Initially our requirements are simple: basic tokenizing, score sorting only, no faceting. The schema is simple too. A document consists of a numeric id, stored and indexed and a large

Re: auto Warming and Special Character

2008-01-22 Thread Ryan McKinley
Same way you put any & in XML... &amp; Jae Joo wrote: In the firstSearcher listener, I need to use special character "&" in the q string, but it complains "Error - filterStart" company_desc:"Advertising & Marketing" 0 20 company_name,
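A small sketch of the escaping Ryan describes, written in Python for illustration (the listener itself lives in the Solr XML config, not in Python). It escapes the query from the question for use as XML element content and checks that an XML parser reads the original string back. The `<str name="q">` wrapper follows the usual Solr config element form and is assumed here, since the archive stripped the original markup.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The query string from the question, containing a raw ampersand.
query = 'company_desc:"Advertising & Marketing"'

# escape() turns & into &amp; (and < and > into &lt; and &gt;).
element = '<str name="q">' + escape(query) + "</str>"
print(element)

# Parsing the escaped element yields the original query text again.
parsed = ET.fromstring(element)
print(parsed.text == query)
```

Pasting the raw `&` into the config file instead makes the file invalid XML, which is consistent with the parse error reported in the question.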

auto Warming and Special Character

2008-01-22 Thread Jae Joo
In the firstSearcher listener, I need to use special character "&" in the q string, but it complains "Error - filterStart" company_desc:"Advertising & Marketing" 0 20 company_name, score Thanks, Jae Joo