Re: solr export get wrong results

2015-01-03 Thread Sandy Ding
Thanks a lot for your help, Joel. Just wondering, why does "export" have such limitations? It uses the same query handler as "select", doesn't it? 2014-12-31 10:28 GMT+08:00 Joel Bernstein : > For the initial release only JSON output format is supported with the > /export feature. Also t

Re: How large is your solr index?

2015-01-03 Thread Jack Krupansky
That's a laudable goal - to support low-latency queries - including faceting - for "hundreds of millions" of documents, using Solr "out of the box" on a random, commodity box selected by IT and just adding a dozen or two fields to the default schema that are both indexed and stored, without any "ex

RE: How large is your solr index?

2015-01-03 Thread Toke Eskildsen
Erick Erickson [erickerick...@gmail.com] wrote: > I can't disagree. You bring up some of the points that make me _extremely_ > reluctant to try to get this in to 5.x though. 6.0 at the earliest I should > think. Ignoring the magic 2b number for a moment, I think the overall question is whether or

Re: How large is your solr index?

2015-01-03 Thread Jack Krupansky
Back in June on a similar thread I asked "Anybody care to forecast when hardware will catch up with Solr and we can routinely look forward to newbies complaining that they indexed "some" data and after only 10 minutes they hit this weird 2G document count limit?" Still not there. So the race is o

Re: How large is your solr index?

2015-01-03 Thread Erick Erickson
I can't disagree. You bring up some of the points that make me _extremely_ reluctant to try to get this in to 5.x though. 6.0 at the earliest I should think. And who knows? Java may get a GC process that's geared to modern amounts of memory and get by the current pain Best, Erick On Sat, Jan

RE: How large is your solr index?

2015-01-03 Thread Toke Eskildsen
Erick Erickson [erickerick...@gmail.com] wrote: > Of course I wouldn't be doing the work so I really don't have much of > a vote, but it's not clear to me at all that enough people would actually > have a use-case for 2b+ docs in a single shard to make it > worthwhile. At that scale GC potentially

Re: How large is your solr index?

2015-01-03 Thread Shawn Heisey
On 1/3/2015 9:02 AM, Erick Erickson wrote: > bq: For Solr 5 why don't we switch it to 64 bit ?? > > -1 on this for a couple of reasons >> it'd be pretty invasive, and 5.0 may be imminent. Far too big a change to >> implement at the last second >> It's not clear that it's even useful. Once you get

Re: How large is your solr index?

2015-01-03 Thread Erick Erickson
bq: For Solr 5 why don't we switch it to 64 bit ?? -1 on this for a couple of reasons > it'd be pretty invasive, and 5.0 may be imminent. Far too big a change to > implement at the last second > It's not clear that it's even useful. Once you get to that many documents, > performance usually suff

Re: De Duplication using Solr

2015-01-03 Thread Amit Jha
Thanks for the reply. I have already seen the wiki. My case is more like record matching. On Sat, Jan 3, 2015 at 7:39 PM, Jack Krupansky wrote: > First, see if you can get your requirements to align to the de-dupe feature > that Solr already has: > https://cwiki.apache.org/confluence/display/solr/De-D

Re: De Duplication using Solr

2015-01-03 Thread Jack Krupansky
First, see if you can get your requirements to align to the de-dupe feature that Solr already has: https://cwiki.apache.org/confluence/display/solr/De-Duplication -- Jack Krupansky On Sat, Jan 3, 2015 at 2:54 AM, Amit Jha wrote: > I am trying to find out duplicate records based on distance and

RE: How large is your solr index?

2015-01-03 Thread Toke Eskildsen
Bill Bell [billnb...@gmail.com] wrote: [solr maxdoc limit of 2b] > For Solr 5 why don't we switch it to 64 bit ?? The biggest challenge for a switch is that Java's arrays can only hold 2b values. I support the idea of switching to much larger minimums throughout the code. But it is a larger fi

RE: De Duplication using Solr

2015-01-03 Thread steve
One possible "match" is using Python's FuzzyWuzzy https://github.com/seatgeek/fuzzywuzzy http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/ > Date: Sat, 3 Jan 2015 13:24:17 +0530 > Subject: De Duplication using Solr > From: shanuu@gmail.com > To: solr-user@lucene.apache.
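
For readers who want to see what fuzzy string matching for de-duplication looks like in practice, here is a minimal sketch. It does not use FuzzyWuzzy itself; it swaps in `difflib.SequenceMatcher` from the Python standard library so that it runs without extra packages. The function names `similarity` and `find_near_duplicates`, the 0.85 threshold, and the sample records are all illustrative assumptions, not anything from the thread.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_near_duplicates(records, threshold=0.85):
    """Compare every pair of records and return (i, j, score) for likely duplicates.

    The threshold is an assumed value; it would need tuning on real data.
    """
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            score = similarity(records[i], records[j])
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    sample = ["Apache Solr", "apache solr", "Lucene"]
    for i, j, score in find_near_duplicates(sample):
        print(sample[i], "~", sample[j], round(score, 2))
```

The pairwise loop is quadratic in the number of records, so a sketch like this only suits small batches; for a large index some form of candidate blocking would be needed before scoring.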

Re: How large is your solr index?

2015-01-03 Thread Bill Bell
For Solr 5 why don't we switch it to 64 bit ?? Bill Bell Sent from mobile > On Dec 29, 2014, at 1:53 PM, Jack Krupansky wrote: > > And that Lucene index document limit includes deleted and updated > documents, so even if your actual document count stays under 2^31-1, > deleting and updating do
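
The arithmetic behind the 2^31-1 limit mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: `MAX_DOCS` and `remaining_headroom` are hypothetical names, and the figures in the usage lines are made up. The one point taken from the thread is that deleted documents count toward the limit as well as live ones.

```python
# Largest value of a signed 32-bit integer: the 2^31-1 limit quoted in the thread.
MAX_DOCS = 2**31 - 1


def remaining_headroom(live_docs: int, deleted_docs: int) -> int:
    """Return how many more documents fit before the limit is reached.

    Deleted documents are added to the total because, per the thread,
    the limit includes them as well as live documents.
    """
    used = live_docs + deleted_docs
    if used > MAX_DOCS:
        raise ValueError("document count already exceeds the limit")
    return MAX_DOCS - used


if __name__ == "__main__":
    # Hypothetical numbers: 1.5 billion live documents plus 0.5 billion deleted.
    print(MAX_DOCS)
    print(remaining_headroom(1_500_000_000, 500_000_000))
```

With these made-up figures the shard holds only 1.5 billion live documents, yet less than 150 million of the available range remains.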