Re: processing documents in solr

2013-07-28 Thread Aditya
Hi, The easiest solution would be to have the timestamp field indexed. Is there any issue with re-indexing? If you want to process records in batches, you need an ordered list and a bookmark: a field to sort on, plus a counter or the last id processed as the bookmark. This is mandatory to solve your probl
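The bookmark approach can be sketched in Python as follows. This is a minimal sketch that assumes a unique, sortable numeric id field. fetch_batch is a hypothetical in-memory stand-in for a Solr request that sorts by id ascending and returns only ids greater than the bookmark. process_in_batches is also a hypothetical name.

```python
from typing import Callable, Dict, List, Optional

Doc = Dict[str, int]


def fetch_batch(index: List[Doc], after_id: Optional[int], rows: int) -> List[Doc]:
    """Hypothetical stand-in for one Solr request: up to `rows` docs with
    id > after_id, sorted ascending by id."""
    matched = [d for d in index if after_id is None or d["id"] > after_id]
    matched.sort(key=lambda d: d["id"])
    return matched[:rows]


def process_in_batches(index: List[Doc], rows: int,
                       handle: Callable[[Doc], None]) -> Optional[int]:
    """Walk the whole index in id order, `rows` docs at a time."""
    bookmark: Optional[int] = None  # last id processed; persist it to resume later
    while True:
        batch = fetch_batch(index, bookmark, rows)
        if not batch:
            return bookmark  # nothing left after the bookmark
        for doc in batch:
            handle(doc)
        bookmark = batch[-1]["id"]  # advance the bookmark to the last id seen


if __name__ == "__main__":
    docs = [{"id": i} for i in (5, 1, 9, 3, 7)]
    seen: List[int] = []
    last = process_in_batches(docs, 2, lambda d: seen.append(d["id"]))
    print(seen, last)  # [1, 3, 5, 7, 9] 9
```

If the bookmark is saved after each batch, an interrupted run can resume from the last completed batch instead of starting over.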

Merged segment warmer Solr 4.4

2013-07-28 Thread Manuel Le Normand
Hi, I have a slow storage machine and insufficient RAM to hold the whole index. This causes the first queries (~5000) to be very slow: they are read from disk, and the CPU spends most of its time in iowait. After that, reads from the index become very fast and come mainly from

ANNOUNCE: Apache Solr Reference Guide 4.4 Available

2013-07-28 Thread Chris Hostetter
The Lucene PMC is pleased to announce the release of the Apache Solr Reference Guide for Solr 4.4. This 431-page PDF serves as the definitive user's manual for Solr 4.4. As the first document of its kind released by the Lucene project, this release demonstrates a major milestone in the growt

Re: processing documents in solr

2013-07-28 Thread Joe Zhang
Basically, I was thinking about running a range query on the tstamp field, as Shawn suggested, but unfortunately that field is not indexed. Range queries only work on indexed fields, right? On Sun, Jul 28, 2013 at 9:49 PM, Joe Zhang wrote: > I've been thinking about tstamp solution in the past few d
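Suppose tstamp were indexed (as noted above, it currently is not). The incremental query could then be built as below. This is a sketch only: since_query and solr_date are hypothetical helpers. The inclusive lower bound is a deliberate choice. Documents stamped exactly at the checkpoint are returned again, so processing should tolerate duplicates.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def solr_date(dt: datetime) -> str:
    """Format a timezone-aware datetime in Solr's UTC form, e.g. 2013-07-28T21:49:00Z."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def since_query(field: str, last_run: datetime, rows: int) -> str:
    """Query string for documents stamped at or after `last_run`, oldest first."""
    # '[' makes the lower bound inclusive; '*' leaves the upper end open.
    q = f"{field}:[{solr_date(last_run)} TO *]"
    return urlencode({"q": q, "sort": f"{field} asc", "rows": rows})


if __name__ == "__main__":
    checkpoint = datetime(2013, 7, 28, 21, 49, tzinfo=timezone.utc)
    print(since_query("tstamp", checkpoint, 100))
```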

Re: processing documents in solr

2013-07-28 Thread Joe Zhang
I've been thinking about the tstamp solution for the past few days, but too bad: the field is available but not indexed... I'm not familiar with SolrJ. Again, it sounds like SolrJ is providing the counter value. If so, that would be equivalent to an auto-increment id. I'm indexing from Nutch though; don'

Re: lang.fallback doesn't work when using lang.fallbackFields

2013-07-28 Thread Sam Dillingham
unsubscribe On Sun, Jul 28, 2013 at 5:59 PM, Jan Høydahl wrote: > Hi, > > Looking at the code, you are right. Whitelist processing is only done on > detected languages, not on the fallback or fallbackFields languages, since > these are assumed to be correct. Thus you should not pass in a fallba

Re: new field type - enum field

2013-07-28 Thread Erick Erickson
You should be able to attach a patch; I wonder if there was a temporary glitch in JIRA. Is this persisting? Let us know if it continues... Erick On Sun, Jul 28, 2013 at 12:11 PM, Elran Dvir wrote: > Hi, > > I have created an issue: https://issues.apache.org/jira/browse/SOLR-5084 > I trie

Re: lang.fallback doesn't work when using lang.fallbackFields

2013-07-28 Thread Jan Høydahl
Hi, Looking at the code, you are right. Whitelist processing is only done on detected languages, not on the fallback or fallbackFields languages, since these are assumed to be correct. Thus you should not pass in a fallback language, either in the input document or with langid.fallback which ca
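Here is a simplified Python model of the behaviour described above. Whitelist filtering applies only to detected languages, while fallbackFields and fallback values are trusted as-is. This is not Solr's actual code. resolve_language and its parameters are hypothetical, the detected list is assumed to be ordered by confidence, and the order in which the two fallbacks are checked is an assumption.

```python
from typing import Optional, Sequence


def resolve_language(detected: Sequence[str], whitelist: Sequence[str],
                     fallback_field_value: Optional[str],
                     fallback: Optional[str]) -> Optional[str]:
    """Toy model: the whitelist filters detected languages only."""
    allowed = [lang for lang in detected if not whitelist or lang in whitelist]
    if allowed:
        return allowed[0]  # best detected language that passes the whitelist
    # Fallback values are assumed correct, so the whitelist is NOT applied here.
    if fallback_field_value:
        return fallback_field_value
    return fallback
```

For example, resolve_language(["no"], ["en", "de"], None, "en") gives "en". However, resolve_language([], ["en", "de"], "fr", "en") gives "fr", even though "fr" is not whitelisted.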

Re: Query Performance

2013-07-28 Thread Jack Krupansky
start is a window into the sorted, matched documents. So, whether the second query matches far fewer documents, and hence has less to sort, depends once again on where X lies in the distribution of documents. If X is the first term in the field, the second query would match all documents (exc
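To illustrate the point about start, here is a toy Python model of the two queries from this thread over a handful of ids. It assumes the range in Query 2 applies to the id field and treats each query as "match, sort, then slice". The function names are hypothetical. Real Solr keeps a bounded priority queue of roughly start+rows hits rather than fully sorting every match.

```python
from typing import List, Tuple


def query1(ids: List[int], start: int, rows: int) -> Tuple[int, List[int]]:
    """q=*:*&sort=id asc&start=X: every doc matches; X is a position offset."""
    matched = sorted(ids)
    return len(matched), matched[start:start + rows]


def query2(ids: List[int], x: int, rows: int) -> Tuple[int, List[int]]:
    """q=id:{X TO *}&sort=id asc&start=0: only ids above the value X match."""
    matched = sorted(i for i in ids if i > x)
    return len(matched), matched[:rows]


if __name__ == "__main__":
    ids = [50, 10, 40, 20, 30]
    print(query1(ids, 2, 2))   # (5, [30, 40]): all 5 docs matched and sorted
    print(query2(ids, 20, 2))  # (3, [30, 40]): same page, only 3 docs matched
    print(query2(ids, 2, 2))   # (5, [10, 20]): X below the first term matches all
```

X means different things in the two queries: a position offset in the first and an id value in the second. How many fewer documents the second query matches depends entirely on where that value falls among the ids.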

Re: Query Performance

2013-07-28 Thread Furkan KAMACI
Actually I have to rewrite my question: Query 1: q=*:*&rows=row_count&sort=id asc&start=X and Query2: q={X TO *}&rows=row_count&sort=id asc&start=0 2013/7/29 Jack Krupansky > The second query excludes documents matched by [* TO X], while the first > query matches all documents. > > Relati

Re: Query Performance

2013-07-28 Thread Jack Krupansky
The second query excludes documents matched by [* TO X], while the first query matches all documents. Relative performance will depend on the relative match count and the time needed to sort the matched documents. For an equal number of documents, sorting will likely be the dominant factor. So, it depends

Query Performance

2013-07-28 Thread Furkan KAMACI
What is the difference between: q=*:*&rows=row_count&sort=id asc and q={X TO *}&rows=row_count&sort=id asc? Does the first one fetch all the documents and then cut the result, or are the two the same? What happens inside Solr for these two queries?

Re: Solr-4663 - Alternatives to use same data dir in different cores for optimal cache performance

2013-07-28 Thread Roman Chyla
Hi, Yes, it can be done. If you search the mailing list for 'two solr instances same datadir', you will find a post where I describe our setup; it works well even with automated deployments. How do you measure performance? I am asking because one reason for us having the same setup is sharing the O

Re: Solr-4663 - Alternatives to use same data dir in different cores for optimal cache performance

2013-07-28 Thread Dominik Siebel
Maybe you're right. The problem is that, with the different types of queries, it is hard to size the documentCache and queryResultCache properly (one query requests 10 results per page, others up to 12000). We tried different approaches and cache sizes, and spent hours on JVM configuration (OutOfMemory proble

RE: new field type - enum field

2013-07-28 Thread Elran Dvir
Hi, I have created an issue: https://issues.apache.org/jira/browse/SOLR-5084 I tried to attach my patch, but it failed: "Cannot attach file Solr-5084.patch: Unable to communicate with JIRA." What am I doing wrong? Thanks. -----Original Message----- From: Erick Erickson [mailto:erickerick...@g

Re: paging vs streaming. spawn from (Processing a lot of results in Solr)

2013-07-28 Thread Erick Erickson
Shawn had an interesting idea on another thread. It depends on having basically an identity field (which I see how to do manually, but don't see how to make work as a new field type in a distributed environment). And it's brilliantly simple, just a range query identity:{ TO *]&sort=identity asc
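The range-query paging idea could be expressed as request parameters like this. It is a sketch only. next_page_params is hypothetical, the identity field name comes from the message above, and values are assumed to be plain numbers or tokens that need no query-syntax escaping. The first page uses [* TO *] (any document with a value). Each later page starts just after the last identity seen, so start always stays 0.

```python
from typing import Optional, Union
from urllib.parse import urlencode


def next_page_params(last_identity: Optional[Union[int, str]], rows: int,
                     field: str = "identity") -> str:
    """Request parameters for the page after `last_identity` (None = first page)."""
    if last_identity is None:
        q = f"{field}:[* TO *]"  # first page: every doc that has the field
    else:
        # '{' excludes the value already seen; '*]' leaves the top open.
        q = f"{field}:{{{last_identity} TO *]"
    # start stays 0: the range query, not an offset, moves through the results.
    return urlencode({"q": q, "sort": f"{field} asc", "rows": rows, "start": 0})
```

The caller reads the last identity value from each response and passes it back in. An empty page means the walk is finished.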

Re: processing documents in solr

2013-07-28 Thread Erick Erickson
Why wouldn't a simple timestamp work for the ordering? Although I guess "simple timestamp" isn't really simple if the time settings change. So how about a simple counter field in your documents? Assuming you're indexing from SolrJ, your setup is to query q=*:*&sort=counter desc. Take the counter f
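The counter scheme might look like this minimal sketch. It assumes a single indexing process (concurrent writers could hand out the same value) and a hypothetical integer field named counter. next_counter mimics q=*:*&sort=counter desc&rows=1 against an in-memory list instead of a real Solr core.

```python
from typing import List


def next_counter(index: List[dict]) -> int:
    """Mimic q=*:*&sort=counter desc&rows=1: one more than the highest counter."""
    top = max((d["counter"] for d in index), default=None)
    return 0 if top is None else top + 1


def add_with_counters(index: List[dict], new_docs: List[dict]) -> List[dict]:
    """Stamp each new doc with the next counter value, then 'index' it."""
    start = next_counter(index)
    for offset, doc in enumerate(new_docs):
        doc["counter"] = start + offset
        index.append(doc)
    return index
```

Each document then carries a monotonically increasing value, which can serve as the sort field and bookmark discussed in this thread.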

Re: Sending shard requests to all replicas

2013-07-28 Thread Erick Erickson
You'd probably start with CloudSolrServer in the SolrJ code; as far as I know, that's where the request is sent out. I'd think that would be better than changing Solr itself, since if you found this useful you wouldn't be patching your Solr release, just keeping your client up to date. Best Eric

Re: plugin init failure for ShingleFilterFactory

2013-07-28 Thread Erick Erickson
My first guess is that you have old jars in your classpath. As a first test, try a fresh install outside of your current setup. If that works, then you'll need to track down where your old jars are. Best Erick On Fri, Jul 26, 2013 at 7:26 PM, Mingfeng Yang wrote: > I am trying to upgrade
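One way to hunt for stale jars is to look for the same artifact present in more than one version. This is a hypothetical helper, not a Solr tool. It assumes jar names follow the common name-version.jar pattern (for example lucene-analyzers-common-4.4.0.jar), which not every jar does.

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import DefaultDict, Dict, Iterable, List, Set

# name-version.jar, where the version starts with a digit (an assumption).
JAR_RE = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def find_duplicate_jars(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Map each artifact name found in more than one version to its versions."""
    versions: DefaultDict[str, Set[str]] = defaultdict(set)
    for p in paths:
        m = JAR_RE.match(Path(p).name)  # ignore the directory part
        if m:
            versions[m.group("name")].add(m.group("version"))
    return {name: sorted(v) for name, v in versions.items() if len(v) > 1}
```

To scan an installation, it could be called as find_duplicate_jars(str(p) for p in Path("/path/to/solr").rglob("*.jar")), where the path is a placeholder.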
