Re: Embedded about 50% faster for indexing

2007-08-27 Thread climbingrose
Agree. I was actually thinking of developing the embedded version early this year for one of my projects. I'm sure it will be needed in cases where running another web server is an overkill. On 8/28/07, Jonathan Woods <[EMAIL PROTECTED]> wrote: > > I don't think you should apologise for highlighti

RE: Embedded about 50% faster for indexing

2007-08-27 Thread Jonathan Woods
I don't think you should apologise for highlighting embedded usage. For circumstances in which you're at liberty to run a Solr instance in the same JVM as an app which uses it, I find it very strange that you should have to use anything _other_ than embedded, and jump through all the unnecessary h

RE: Embedded about 50% faster for indexing

2007-08-27 Thread Sundling, Paul
At this point I think I'm going to recommend against embedded, regardless of any performance advantage. The level of documentation is just too low, while the XML API is clearly documented. It's clear that XML is preferred. The embedded example on the wiki is pretty good, but until multiple core sup

Re: Embedded about 50% faster for indexing

2007-08-27 Thread Mike Klaas
On 27-Aug-07, at 12:44 PM, Sundling, Paul wrote: Whether embedded solr should give me a performance boost or not, it did. :) I'm not surprised, since it skips XML parsing. Although you never know where cycles are used for sure until you profile. It certainly is possible that XML parsing dw

RE: solr + carrot2

2007-08-27 Thread Lance Norskog
Thanks very much! It came up and worked on our Solr. I found three quirks: 1) Clicking the URL sends something not quite right to IE. I don't know what it is. For some links for file types like .MOV (quicktime) it gives the 'download this file' popup. For other links it gives something about Java

RE: Embedded about 50% faster for indexing

2007-08-27 Thread Sundling, Paul
Sorry, I got mixed up with the numbers: it was faster with 200 records (2:37:17) than with 10 records (3:21:36), but still definitely slower than embedded (2:10:23) and requires a larger memory footprint. Embedded and post with 10 records was run with 64M, but the later 200 records were run with 128M

RE: Embedded about 50% faster for indexing

2007-08-27 Thread Sundling, Paul
Whether embedded solr should give me a performance boost or not, it did. :) I'm not surprised, since it skips XML parsing. Although you never know where cycles are used for sure until you profile. I tried doing more records per post (200) and it was actually slightly slower and seemed to require

RE: Filtering using data only available at query time

2007-08-27 Thread Daniel Pitts
Okay, but you can put into your index the [permission affecting data], and add a filter for the [current access permission]. In other words, your front-end handles the current business rules to create the appropriate filter query, and passes that to the solr query handler. > -Original Messa

RE: Filtering using data only available at query time

2007-08-27 Thread Jonathan Woods
But [the type of user] which has permission can change too. > -Original Message- > From: Daniel Pitts [mailto:[EMAIL PROTECTED] > Sent: 27 August 2007 19:07 > To: solr-user@lucene.apache.org > Subject: RE: Filtering using data only available at query time > > I think you're missing my po

RE: Solr and terracotta

2007-08-27 Thread Orion Letizi
Jeryl, I put your idea in our JIRA: http://jira.terracotta.org/jira/browse/CDV-399 --Orion Jeryl Cook wrote: > > had no problems with Terracotta, I got a good handle on the product.. > > Maybe you all at Terracotta could lead the implementation to patch SOLR > to allow it to use the RAMDir

RE: Filtering using data only available at query time

2007-08-27 Thread Daniel Pitts
I think you're missing my point. Don't index which users have permission, index which type of user has permission. Then _filter_ based on that. > -Original Message- > From: Jonathan Woods [mailto:[EMAIL PROTECTED] > Sent: Monday, August 27, 2007 10:26 AM > To: solr-user@lucene.apache.org

RE: range index

2007-08-27 Thread Jonathan Woods
I don't know of any - sorry. I guess this is more a Lucene issue than a Solr one, though Solr analyzers should subclass SolrAnalyzer rather than org.apache.lucene.analysis.Analyzer. I guess you could Google around for something useful - I had a quick look, but couldn't find anything compelling.

Re: Solr and terracotta

2007-08-27 Thread Jonathan Ariel
I'm looking forward to this implementation! I think it'll be a really great feature! Does anybody know how long it takes Terracotta to sync a RAMDir? Can it be configured? What's the chance of committing one document on server A and querying on server B that wasn't synchronized yet? On 8/27/07, Je

RE: Filtering using data only available at query time

2007-08-27 Thread Jonathan Woods
I know what you mean, and maybe I'm just being obstinate. But in the general case, it isn't possible to know these things ahead of time. The indexing machinery isn't told about changes in user permissions (e.g. demotion from administrative to ordinary user), and even if it were I'd hate to have t

RE: Filtering using data only available at query time

2007-08-27 Thread Daniel Pitts
Can you add some fields that let you set a filter or query that weeds out the results that the user doesn't have access to? If it's as simple as Admin versus User, you could have a boolean field called AdminOnly, and when a User is querying, add a fq=[* TO *] -AdminOnly:true You could get more specifi
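A minimal sketch of the front-end side of this suggestion, in Python. It assumes a boolean field named AdminOnly as in the message above; the function name and the q value are hypothetical. It also assumes the Solr version in use accepts a purely negative filter query; if it does not, "*:* -AdminOnly:true" is the usual alternative.

```python
from urllib.parse import urlencode


def build_select_params(user_query, is_admin):
    """Build the query string for a Solr select request.

    Non-admin users get an extra filter query (fq) that excludes
    documents flagged AdminOnly:true. The field name comes from the
    message above; everything else here is illustrative.
    """
    params = [("q", user_query)]
    if not is_admin:
        # Purely negative filter: drop admin-only documents.
        params.append(("fq", "-AdminOnly:true"))
    return urlencode(params)
```

The filter is decided at query time from the caller's current role, so nothing needs to be reindexed when a user's role changes; only the documents' own AdminOnly flag lives in the index.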

Filtering using data only available at query time

2007-08-27 Thread Jonathan Woods
I've got a Lucene-based search implementation which searches over documents in a CMS and weeds out those hits which aren't accessible to the user carrying out the search. The raw search results are returned as an iterator, and I wrap another iterator around this to silently consume the inaccessibl

Re: range index

2007-08-27 Thread Jae Joo
Is any sample code or how-to for writing an Analyzer and Tokenizer available? Jae On 8/27/07, Jonathan Woods <[EMAIL PROTECTED]> wrote: > > Or you could write your own Analyzer and Tokenizer to produce single > values > corresponding, say, to the start of each range. > > Jon > > > -Original Message-

RE: Solr and terracotta

2007-08-27 Thread Jeryl Cook
had no problems with Terracotta, I got a good handle on the product.. Maybe you all at Terracotta could lead the implementation to patch SOLR to allow it to use the RAMDirectory ( a setter) so terracotta can hook into the RAMDirectory... the way Terracotta handles clustering , Those of you w

Re: range index

2007-08-27 Thread Jae Joo
I could build index with Sales Vol ranges using PatternReplaceFilterFactory Thanks, Jae On 8/27/07, Erik Hatcher <[EMAIL PROTECTED]> wrote: > > > On Aug 27, 2007, at 9:48 AM, Jae Joo wrote: > > That works. But I am looking how to do that at INDEXING TIME, but > > at

RE: range index

2007-08-27 Thread Jonathan Woods
Or you could write your own Analyzer and Tokenizer to produce single values corresponding, say, to the start of each range. Jon > -Original Message- > From: Jae Joo [mailto:[EMAIL PROTECTED] > Sent: 27 August 2007 16:46 > To: solr-user@lucene.apache.org > Subject: Re: range index > > I

Re: Indexing HTML

2007-08-27 Thread Erik Hatcher
On Aug 27, 2007, at 10:00 AM, Michael Kimsal wrote: What's odd about this is that the error seems to indicate that I did. Actually the error message looks like you escaped too much. You should _not_ escape , only the contents of it. Erik The full text (minus the stack trace)

Re: Embedded about 50% faster for indexing

2007-08-27 Thread Kevin Osborn
At 10,000 documents per post, I was actually finding that embedded Solr was providing a significant performance boost. It has been a while since I did any comparisons, but it was probably on the order of 40% or so. - Original Message From: climbingrose <[EMAIL PROTECTED]> To: solr-user@

Optimize and Merging

2007-08-27 Thread Stu Hood
In the situation where a small index is generated (all at once) and then merged into a larger index, when should the indexes be optimized? Should both be optimized before/after? Thanks, Stu Hood Webmail.us "You manage your business. We'll manage your email."®

Re: Indexing HTML

2007-08-27 Thread Michael Kimsal
What's odd about this is that the error seems to indicate that I did. The full text (minus the stack trace) was org.xmlpull.v1.XmlPullParserException: parser must be on START_TAG or TEXT to read text (position: START_TAG seen ...... @4:37) Or is that just a by

Re: range index

2007-08-27 Thread Erik Hatcher
On Aug 27, 2007, at 9:48 AM, Jae Joo wrote: That works. But I am looking how to do that at INDEXING TIME, but at query time. Any way for that? I'm not sure I understand the question. The example provided works at query time. If you want to bucket things at indexing time you could do

Re: range index

2007-08-27 Thread Jae Joo
That works. But I am looking how to do that at INDEXING TIME, but at query time. Any way for that? Thanks, Jae On 8/27/07, Erik Hatcher <[EMAIL PROTECTED]> wrote: > > > On Aug 27, 2007, at 9:32 AM, Jae Joo wrote: > > Is there any way to categorize by price range? > > > > I would like to do face

Re: range index

2007-08-27 Thread Erik Hatcher
On Aug 27, 2007, at 9:32 AM, Jae Joo wrote: Is there any way to categorize by price range? I would like to do facet by price range. (ex. 100-200, 201-500, 501-1000, ...) Yes, look at using facet queries using range queries. There is an example of this very thing here:
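The example Erik links to is not preserved in this listing. As a rough, independent illustration of facet queries over ranges, the Python sketch below builds one facet.query parameter per price bucket. The field name price and the function name are assumptions; facet and facet.query are standard Solr request parameters.

```python
from urllib.parse import urlencode


def price_facet_params(user_query, ranges, field="price"):
    """Return Solr request parameters with one facet.query per range.

    `ranges` is a list of (low, high) pairs; each becomes an inclusive
    range query such as price:[100 TO 200]. The field name is assumed.
    """
    params = [("q", user_query), ("facet", "true")]
    for low, high in ranges:
        params.append(("facet.query", f"{field}:[{low} TO {high}]"))
    return params


# Buckets taken from the question: 100-200, 201-500, 501-1000.
query_string = urlencode(
    price_facet_params("*:*", [(100, 200), (201, 500), (501, 1000)])
)
```

Each facet.query comes back with its own count, so the buckets are computed at query time and can be changed without reindexing.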

Re: Indexing HTML

2007-08-27 Thread Erik Hatcher
Michael, I think the issue is that you're not escaping the values. Send something like this to Solr instead: linktext Erik On Aug 27, 2007, at 9:29 AM, Michael Kimsal wrote: Hello I'm trying to index individual lines of an HTML file, a
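The markup in Erik's example did not survive in this listing. As a hedged sketch of the general idea (escape the field's contents, not the surrounding XML), the Python below builds an update message whose field value is an HTML fragment. The field names id and content and the function name are hypothetical, not taken from the thread.

```python
from xml.sax.saxutils import escape


def build_add_message(doc_id, html_line):
    """Build a Solr XML <add> message for a single document.

    Only the field *contents* are escaped, so the HTML markup reaches
    the XML parser as text rather than as nested tags. Field names are
    illustrative assumptions.
    """
    return (
        "<add><doc>"
        f'<field name="id">{escape(str(doc_id))}</field>'
        f'<field name="content">{escape(html_line)}</field>'
        "</doc></add>"
    )


message = build_add_message(4, '<a href="x">linktext</a>')
```

Escaping the whole message, including the field tags themselves, would be the "escaped too much" case described elsewhere in this thread.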

Re: Indexing HTML

2007-08-27 Thread Thierry Collogne
I think you can use the HTMLStripWhitespaceTokenizerFactory. Look here : http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-031d5d370010955fdcc529d208395cd556f4a73e I hope this helps On 27/08/07, Michael Kimsal <[EMAIL PROTECTED]> wrote: > > Hello > > I'm trying to index individu

range index

2007-08-27 Thread Jae Joo
Is there any way to categorize by price range? I would like to do facet by price range. (ex. 100-200, 201-500, 501-1000, ...) Thanks, Jae Joo

Indexing HTML

2007-08-27 Thread Michael Kimsal
Hello I'm trying to index individual lines of an HTML file, and I'm hitting this error: TEXT must be immediately followed by END_TAG and not START_TAG I've got something that looks like 4 linktext Actually, that sample code above, as its own data file POSTed to SOLR, throws parser must be

Re: solr + carrot2

2007-08-27 Thread Stanislaw Osinski
Hi Lance and all, I've just implemented a configuration UI for Solr, similar to the one we have for Lucene. The new UI is available in the HEAD version of the browser: http://demo.carrot2.org/head/dist/carrot2-demo-browser-head.zip or through WebStart: http://demo.carrot2.org/head/webstart/ Pl

Re: Embedded about 50% faster for indexing

2007-08-27 Thread climbingrose
Haven't tried the embedded server but I think I have to agree with Mike. We're currently sending 2000 job batches to SOLR server and the amount of time required to transfer documents over http is insignificant compared with the time required to index them. So I do think unless you are sending docum

Re: Solr and JBOSS Integration

2007-08-27 Thread Thierry Collogne
Hi, The method works, but has the drawback that you need to configure your solr home inside the war of the web application. What we did is the following: Add this to the jboss-service.xml http://www.w3.org/2001/XMLSchema-instance"; xmlns:jndi="urn:jboss:jndi-binding-