Getting SolrSharp to work, Part 2

2008-01-23 Thread Peter Thygesen
I wrote a small client in .NET which queries Solr and dumps the result on screen. Fantastic low-tech ;) However, I ran into new SolrSharp problems. My schema allows a particular field to be multiValued, but if it has only one value, SolrSharp fails at line 88 of class IndexFiledAttr

Out of heap space with simple updates

2008-01-23 Thread Michael Lackhoff
I wanted to try to do the daily update with XML updates (which was mentioned recently as the recommended way) but got an "OutOfMemoryError: Java heap space" after 319000 records. I am sending one document at a time through the http update interface, so every request should be short enough to not run o
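For readers unfamiliar with the XML update interface, here is a minimal sketch of building the `<add>` message that gets POSTed to the update handler. It assumes each document is a plain dict of field name to value (a list for multiValued fields, written as repeated `field` elements). The helper names `build_add_message` and `batches` are hypothetical, and the sketch stops short of the HTTP call. It only illustrates the message format and is not a fix for the heap error.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Build a Solr XML <add> message from a list of field-name -> value dicts."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A multiValued field is written as one <field> element per value.
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")


def batches(docs, size):
    """Yield successive slices of `docs` holding at most `size` documents each."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


if __name__ == "__main__":
    records = [{"id": 1, "title": "first"}, {"id": 2, "subject": ["a", "b"]}]
    for chunk in batches(records, 100):
        print(build_add_message(chunk))
```

Whether one document per request or a few hundred per request works better is a tuning question; the batch size above is arbitrary.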

Embedded Solr and Servlet Solr

2008-01-23 Thread Jonathan Ariel
Hi! I am using Solr (the classic servlet one) in my application. But for special cases, and just for the sake of optimization, I would like to access it locally so I can avoid HTTP calls and XML serialization. This is for full re-indexing of Solr, which is much faster in an embedded environment than s

Re: Embedded Solr and Servlet Solr

2008-01-23 Thread Ryan McKinley
Is it possible to use SolrJ in my web application to access Solr remotely and use SolrJ in a simple application to access Solr locally for full re-indexing? Yes, see http://wiki.apache.org/solr/Solrj ; for remote access use CommonsHttpSolrServer, and for local access use EmbeddedSolrServer. ryan

SnowballPorterFilterFactory and protected words

2008-01-23 Thread Daniele Salvatico
Hi all, I'm working with Solr and I have a set of Italian-language documents. I have a question about using the "protected=" attribute with the SnowballPorterFilterFactory filter. Here's my schema.xml

Re: Solr feasibility with terabyte-scale data

2008-01-23 Thread Phillip Farber
For sure this is a problem. We have considered some strategies. One might be to use a dictionary to clean up the OCR but that gets hard for proper names and technical jargon. Another is to use stop words (which has the unfortunate side effect of making phrase searches like "to be or not to be
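The dictionary idea, and its weakness with proper names, can be sketched with Python's standard `difflib.get_close_matches`. This is a stand-in for whatever correction method one would actually apply to OCR output; the function name, the tiny dictionary and the 0.75 similarity cutoff are illustrative assumptions.

```python
import difflib


def clean_ocr_tokens(tokens, dictionary, cutoff=0.75):
    """Replace tokens missing from `dictionary` with their closest dictionary word.

    Tokens with no dictionary word at least `cutoff` similar are left unchanged.
    """
    words = list(dictionary)
    vocab = set(words)
    cleaned = []
    for tok in tokens:
        if tok in vocab:
            cleaned.append(tok)
            continue
        close = difflib.get_close_matches(tok, words, n=1, cutoff=cutoff)
        cleaned.append(close[0] if close else tok)
    return cleaned


if __name__ == "__main__":
    print(clean_ocr_tokens(["searcb", "engine", "thygesen"], ["search", "engine", "index"]))
```

Here the proper name survives only because no dictionary word resembles it; a name that happens to look like a dictionary word would be silently "corrected", which is the risk described above.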

Re: Updating and Appending

2008-01-23 Thread Yonik Seeley
On Jan 22, 2008 4:10 PM, Owens, Martin <[EMAIL PROTECTED]> wrote: > We've got some memory constraint worries from using Java RMI, although I can > see this problem could effect the xml requests too. The Java code doesn't > seem to handle large files as streams. It depends on what component we ar
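The difference between reading an upload fully into memory and handling it as a stream can be illustrated with a generic sketch (not Solr or Lucene code): the generator below yields fixed-size chunks from any binary file-like object, so peak memory is bounded by the chunk size rather than the file size. Hashing stands in for whatever processing would really happen; the 64 KiB default and the function names are arbitrary assumptions.

```python
import hashlib
import io


def iter_chunks(stream, chunk_size=64 * 1024):
    """Yield successive chunks of at most chunk_size bytes from a binary stream."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # b"" signals end of stream
            break
        yield chunk


def streaming_sha1(stream, chunk_size=64 * 1024):
    """Hash a stream chunk by chunk instead of reading it all into memory."""
    digest = hashlib.sha1()
    for chunk in iter_chunks(stream, chunk_size):
        digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # In-memory stand-in for a large uploaded file.
    big = io.BytesIO(b"x" * 200_000)
    print(streaming_sha1(big))
```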

Solr-303 Re: Solr in a distributed multi-machine high-performance environment

2008-01-23 Thread Srikant Jakilinki
Yes, 303 looks very promising, and I would like to get involved. I have gone to the JIRA thread and am very impressed by the activity there. It is THE hangout :-) Following up, does anyone (especially Yonik or Sharad) have any documentation of this feature? Such as goals, use cases, requ

Inverted Search Engine

2008-01-23 Thread George Everitt
Verity had a function called "profiler" which was essentially an inverted search engine. Instead of evaluating a single query at a time against a large corpus of documents, the profiler evaluated a single document at a time against a large number of queries. This kind of functionality is
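The profiler idea can be sketched in a few lines: index the stored queries by their terms, then for each incoming document look up only the queries that share at least one term with it and check those fully. This assumes every stored query is a simple conjunction of terms (no phrases, boolean operators or scoring), and the class and method names are hypothetical.

```python
from collections import defaultdict


class QueryMatcher:
    """Match one document at a time against many stored conjunctive queries."""

    def __init__(self):
        self.queries = {}                 # query id -> frozenset of required terms
        self.by_term = defaultdict(set)   # term -> ids of queries containing it

    def add_query(self, qid, terms):
        required = frozenset(t.lower() for t in terms)
        self.queries[qid] = required
        for term in required:
            self.by_term[term].add(qid)

    def match(self, text):
        doc_terms = set(text.lower().split())
        # Only queries sharing at least one term with the document are candidates.
        candidates = set()
        for term in doc_terms:
            candidates |= self.by_term.get(term, set())
        # A query matches when all of its required terms occur in the document.
        return sorted(q for q in candidates if self.queries[q] <= doc_terms)


if __name__ == "__main__":
    m = QueryMatcher()
    m.add_query("q1", ["solr", "heap"])
    m.add_query("q2", ["lucene"])
    print(m.match("Solr ran out of heap"))
```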

Re: Transactions and Solr Was: Re: Delte by multiple id problem

2008-01-23 Thread Chris Harris
Suppose I wanted to use this log approach. Does anyone have suggestions about the best way to do it? The approach that first comes to mind is to store the log as a separate DB table, and to maintain that table using a DB trigger attached to the underlying source data table. This is clearly not the
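The trigger-maintained log table can be sketched with SQLite from the Python standard library. The table, trigger and function names are hypothetical; the sketch assumes primary keys are never updated and that the indexer remembers the last sequence number it processed. It collapses the log so each id ends up as either an add (re-index it from the source table) or a delete.

```python
import sqlite3

SCHEMA = """
CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT);
CREATE TABLE change_log (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    doc_id INTEGER NOT NULL,
    op TEXT NOT NULL
);
-- Every change to the source table appends a row to the log.
CREATE TRIGGER docs_ins AFTER INSERT ON docs BEGIN
    INSERT INTO change_log (doc_id, op) VALUES (NEW.id, 'add');
END;
CREATE TRIGGER docs_upd AFTER UPDATE ON docs BEGIN
    INSERT INTO change_log (doc_id, op) VALUES (NEW.id, 'add');
END;
CREATE TRIGGER docs_del AFTER DELETE ON docs BEGIN
    INSERT INTO change_log (doc_id, op) VALUES (OLD.id, 'delete');
END;
"""


def setup_change_log(conn):
    conn.executescript(SCHEMA)


def pending_changes(conn, last_seq):
    """Collapse log rows after last_seq into (ids to re-index, ids to delete, new last_seq)."""
    rows = conn.execute(
        "SELECT seq, doc_id, op FROM change_log WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    latest = {}
    for seq, doc_id, op in rows:
        latest[doc_id] = op  # later operations on the same id win
        last_seq = seq
    adds = sorted(d for d, op in latest.items() if op == "add")
    deletes = sorted(d for d, op in latest.items() if op == "delete")
    return adds, deletes, last_seq
```

Collapsing per id means a document inserted, updated and then deleted between two runs produces only a single delete.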

Re: Inverted Search Engine

2008-01-23 Thread Erick Erickson
As chance would have it, this was just discussed over on the Lucene user's list. See the thread: Inverted search / Search on profilenet. Best, Erick On Jan 23, 2008 1:38 PM, George Everitt <[EMAIL PROTECTED]> wrote: > Verity had a function called "profiler" which was essentially an > inverted sea

Re: Out of heap space with simple updates

2008-01-23 Thread Chris Harris
I'm using java -Xms512M -Xmx1500M -jar start.jar which gives the Java VM a min heap of 512MB RAM and a max of 1500MB. I don't know if 1500MB is enough to fix your problem. I do know that when I try to increase it much beyond there using the standard Sun VM on Windows 2003 Server, Java refuses

solr synonyms behaviour

2008-01-23 Thread anuvenk
I need to understand this synonym behaviour. I have this synonym: divorce mediation,alternative dispute resolution. So when I do a debug, this is the parsedquery_tostring I see: (((text:divorc^0.8 | name:divorc^2.0)~0.01 (text:mediat^0.8 | name:mediat^2.0)~0.01)~2) (text:"(divorc altern) (disput med
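A simplified model (not Solr's actual SynonymFilter code) of why the phrase in the debug output looks scrambled: when a multi-word synonym is expanded, each alternative's tokens are stacked onto the same positions as the matched words, so position 0 holds divorce/alternative, position 1 holds mediation/dispute, and resolution spills into position 2. The function name and data layout are illustrative, and stemming is left out.

```python
def stack_synonyms(tokens, group):
    """Expand a multi-word synonym group by stacking alternatives on shared positions.

    Returns one list of alternative tokens per position.
    """
    positions = [[t] for t in tokens]
    start = 0
    while start < len(tokens):
        # Find a synonym-group member that matches the tokens starting here.
        match = next((m for m in group if tokens[start:start + len(m)] == m), None)
        if match is None:
            start += 1
            continue
        for other in group:
            if other is match:
                continue
            # Token i of each alternative lands on position start + i,
            # even if that position already holds a later query word.
            for i, tok in enumerate(other):
                while len(positions) <= start + i:
                    positions.append([])
                positions[start + i].append(tok)
        start += len(match)
    return positions


GROUP = [["divorce", "mediation"], ["alternative", "dispute", "resolution"]]

if __name__ == "__main__":
    print(stack_synonyms(["divorce", "mediation"], GROUP))
```

Read as a phrase, this model demands a token from every position in sequence, so text containing only "divorce mediation" would not satisfy the phrase part, while an odd mix such as "alternative mediation resolution" would.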

RE: Solr feasibility with terabyte-scale data

2008-01-23 Thread Lance Norskog
We use two indexed copies of the same text, one with stemming and stopwords and the other with neither. We do phrase search on the second. You might use two different OCR implementations and cross-correlate the output. Lance -----Original Message----- From: Phillip Farber [mailto:[EMAIL PROT
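A toy illustration of the two-copy approach, using the "to be or not to be" example raised earlier in this thread: the stemmed, stopworded analysis throws away every word of the phrase, while the unprocessed copy still supports an exact phrase match. The stopword list and the one-rule stemmer are deliberately tiny stand-ins for real Solr analyzers, and all names are hypothetical.

```python
import re

STOPWORDS = {"a", "be", "is", "not", "or", "that", "the", "to"}  # toy list


def toy_stem(word):
    """One-rule stand-in for a real stemmer: drop a plural 's'."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def analyze_stemmed(text):
    """First copy: stopwords removed and terms stemmed (better recall)."""
    return [toy_stem(t) for t in tokenize(text) if t not in STOPWORDS]


def analyze_exact(text):
    """Second copy: no stopword removal, no stemming (used for phrase search)."""
    return tokenize(text)


def contains_phrase(tokens, phrase_tokens):
    """True if phrase_tokens occurs as a contiguous run inside tokens."""
    n = len(phrase_tokens)
    return n > 0 and any(tokens[i:i + n] == phrase_tokens
                         for i in range(len(tokens) - n + 1))


if __name__ == "__main__":
    text = "To be, or not to be: that is the question"
    print(analyze_stemmed(text))
    print(contains_phrase(analyze_exact(text), analyze_exact("to be or not to be")))
```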

Re: spellcheckhandler

2008-01-23 Thread anuvenk
I did try with the latest nightly build and followed the steps outlined in http://wiki.apache.org/solr/SpellCheckerRequestHandler with regard to creating a new catch-all field 'spell' of type 'spell', and copied my text fields to 'spell' at index time. Still q=grapics returns 'graphics' but q=grapics
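For intuition about why `grapics` is close enough to suggest `graphics`, here is a plain Levenshtein edit distance with a naive suggestion helper. Treat it as an illustration under assumptions: the handler's actual candidate lookup and distance measure are not described in this thread, and the function names and the `max_distance` threshold are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def suggest(word, vocabulary, max_distance=2):
    """Return vocabulary words within max_distance edits, closest first."""
    scored = sorted((edit_distance(word, w), w) for w in vocabulary if w != word)
    return [w for d, w in scored if d <= max_distance]


if __name__ == "__main__":
    print(suggest("grapics", ["graphics", "graphic", "physics"]))
```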

Re: Inverted Search Engine

2008-01-23 Thread George Everitt
Wow, that's spooky. Thanks for the heads-up; looks like a good list to subscribe to as well. George Everitt Applied Relevance LLC [EMAIL PROTECTED] Tel: +1 (727) 641-4660 Fax: +1 (727) 233-0672 Skype: geverit4 AIM: [EMAIL PROTECTED] On Jan 23, 2008, at 2:30 PM, Erick Erickson wrote: As cha

Re: Updating and Appending

2008-01-23 Thread Chris Harris
On Jan 23, 2008 9:04 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote: > On Jan 22, 2008 4:10 PM, Owens, Martin <[EMAIL PROTECTED]> wrote: > > We've got some memory constraint worries from using Java RMI, although I > > can see this problem could effect the xml requests too. The Java code > > doesn't s

Re: Updating and Appending

2008-01-23 Thread Yonik Seeley
On Jan 23, 2008 4:29 PM, Chris Harris <[EMAIL PROTECTED]> wrote: > Supposing you could do this -- i.e. that you could get Solr to pass a > particular field's data to Lucene without reading it all into memory > first --, are there any potential problems on the Lucene end? It's not > going to turn ar

Re: copyField limitation

2008-01-23 Thread Grant Ingersoll
This may be possible to do with Lucene's new SinkTokenizer/ TeeTokenFilter functionality. You might find http://www.mail-archive.com/[EMAIL PROTECTED]/msg06863.html useful in that context. Also, search the Lucene dev list for discussion. -Grant On Jan 22, 2008, at 3:13 PM, Lance Norskog w

Re: SnowballPorterFilterFactory and protected words

2008-01-23 Thread Chris Hostetter
: I have a question about using the "protected=" attribute with : SnowballPorterFilterFactory filter. SnowballPorterFilterFactory doesn't support (and never has supported) a protwords option ... that feature is unique to the EnglishPorterFilterFactory. This is probably just due to how they came about
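Conceptually, a protected-words list just tells the stemming step to leave certain tokens alone. A minimal sketch of that idea follows; `toy_italian_stem` is a hypothetical one-rule stand-in for the Snowball stemmer (it only strips a final vowel), and in practice the protected set would be loaded from a word-list file.

```python
def toy_italian_stem(word):
    """Hypothetical stand-in for a Snowball stemmer: strip one final vowel."""
    return word[:-1] if len(word) > 3 and word[-1] in "aeiou" else word


def stem_tokens(tokens, protected, stemmer=toy_italian_stem):
    """Stem every token except those in `protected` (compared lowercased)."""
    return [t if t.lower() in protected else stemmer(t) for t in tokens]


if __name__ == "__main__":
    print(stem_tokens(["case", "roma", "libri"], {"roma"}))
```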

Re: Transactions and Solr Was: Re: Delte by multiple id problem

2008-01-23 Thread Chris Hostetter
: Suppose I wanted to use this log approach. Does anyone have : suggestions about the best way to do it? The approach that first comes : to mind is to store the log as a separate DB table, and to maintain it largely depends on your DB schema and the mechanisms you use to update your data. In mo

Re: Out of heap space with simple updates

2008-01-23 Thread Michael Lackhoff
On 23.01.2008 20:57 Chris Harris wrote: I'm using java -Xms512M -Xmx1500M -jar start.jar Thanks! I did see the -X... params in recent threads but didn't know where to place them -- not being a java guy at all ;-) -Michael