Thanks a lot for your help, Joel.
Just wondering, why does "export" have such limitations? It uses the same
query handler as "select", doesn't it?
2014-12-31 10:28 GMT+08:00 Joel Bernstein :
> For the initial release, only the JSON output format is supported with the
> /export feature. Also t
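(For reference, a minimal /export request looks roughly like the sketch below.
The collection and field names are invented, and /export requires the sort and
fl fields to have docValues; in this release JSON is the only output format.)

# Illustrative sketch against a hypothetical collection: /export streams the
# whole sorted result set, but only as JSON and only over docValues fields.
import requests

resp = requests.get(
    "http://localhost:8983/solr/mycollection/export",
    params={
        "q": "*:*",
        "sort": "id asc",   # sort field must have docValues
        "fl": "id,price",   # returned fields must have docValues as well
        "wt": "json",       # JSON is the only supported output format here
    },
)
print(resp.json()["response"]["numFound"])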
That's a laudable goal - to support low-latency queries - including
faceting - for "hundreds of millions" of documents, using Solr "out of the
box" on a random commodity box selected by IT and just adding a dozen or
two fields to the default schema that are both indexed and stored, without
any "ex
Erick Erickson [erickerick...@gmail.com] wrote:
> I can't disagree. You bring up some of the points that make me _extremely_
> reluctant to try to get this into 5.x though. 6.0 at the earliest, I should
> think.
Ignoring the magic 2b number for a moment, I think the overall question is
whether or
Back in June on a similar thread I asked "Anybody care to forecast when
hardware will catch up with Solr and we can routinely look forward to
newbies complaining that they indexed "some" data and after only 10 minutes
they hit this weird 2G document count limit?"
Still not there.
So the race is o
I can't disagree. You bring up some of the points that make me _extremely_
reluctant to try to get this into 5.x though. 6.0 at the earliest, I should
think.
And who knows? Java may get a GC process that's geared to modern
amounts of memory and gets past the current pain
Best,
Erick
On Sat, Jan
Erick Erickson [erickerick...@gmail.com] wrote:
> Of course I wouldn't be doing the work so I really don't have much of
> a vote, but it's not clear to me at all that enough people would actually
> have a use-case for 2b+ docs in a single shard to make it
> worthwhile. At that scale GC potentially
On 1/3/2015 9:02 AM, Erick Erickson wrote:
> bq: For Solr 5 why don't we switch it to 64 bit ??
>
> -1 on this for a couple of reasons
>> it'd be pretty invasive, and 5.0 may be imminent. Far too big a change to
>> implement at the last second
>> It's not clear that it's even useful. Once you get
bq: For Solr 5 why don't we switch it to 64 bit ??
-1 on this for a couple of reasons
> it'd be pretty invasive, and 5.0 may be imminent. Far too big a change to
> implement at the last second
> It's not clear that it's even useful. Once you get to that many documents,
> performance usually suff
Thanks for the reply... I have already seen the wiki. It is more like record
matching.
On Sat, Jan 3, 2015 at 7:39 PM, Jack Krupansky wrote:
> First, see if you can get your requirements to align to the de-dupe feature
> that Solr already has:
> https://cwiki.apache.org/confluence/display/solr/De-D
First, see if you can get your requirements to align to the de-dupe feature
that Solr already has:
https://cwiki.apache.org/confluence/display/solr/De-Duplication
-- Jack Krupansky
On Sat, Jan 3, 2015 at 2:54 AM, Amit Jha wrote:
> I am trying to find out duplicate records based on distance and
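(To illustrate the signature idea behind de-dupe: the field names, the hash
choice, and the "sig" field below are assumptions for the sketch, not Solr's
actual SignatureUpdateProcessorFactory, which does this server-side in the
update chain.)

# Client-side sketch: hash the fields that define a record's identity and
# index the hash; true duplicates then share a signature and can be grouped
# or overwritten on it.
import hashlib

def signature(doc, fields=("name", "address", "city")):
    # Normalize and concatenate the chosen fields, then hash them.
    joined = "|".join(str(doc.get(f, "")).strip().lower() for f in fields)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

doc = {"id": "1", "name": "ACME Corp", "address": "1 Main St", "city": "Springfield"}
doc["sig"] = signature(doc)  # index "sig" and group/de-dupe on it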
Bill Bell [billnb...@gmail.com] wrote:
[solr maxdoc limit of 2b]
> For Solr 5 why don't we switch it to 64 bit ??
The biggest challenge for a switch is that Java's arrays can only hold about 2
billion values (they are indexed by a signed 32-bit int). I support the idea of
switching to much larger limits throughout the code. But it is a larger fi
One possible "match" is using Python's FuzzyWuzzy
https://github.com/seatgeek/fuzzywuzzy
http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/
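(A small, hedged example of what FuzzyWuzzy provides; the strings and the
candidate list are invented:)

# pip install fuzzywuzzy python-Levenshtein
from fuzzywuzzy import fuzz, process

# Pairwise similarity scores, 0-100
print(fuzz.ratio("Apache Solr", "Apache Solr!"))            # near 100
print(fuzz.token_sort_ratio("Solr Apache", "Apache Solr"))  # ignores word order

# Pick the best match for a record out of a list of candidates
candidates = ["ACME Corporation", "Acme Corp.", "Acme Inc"]
print(process.extractOne("ACME Corp", candidates))          # (best match, score)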
> Date: Sat, 3 Jan 2015 13:24:17 +0530
> Subject: De Duplication using Solr
> From: shanuu@gmail.com
> To: solr-user@lucene.apache.
For Solr 5 why don't we switch it to 64 bit ??
Bill Bell
Sent from mobile
> On Dec 29, 2014, at 1:53 PM, Jack Krupansky wrote:
>
> And that Lucene index document limit includes deleted and updated
> documents, so even if your actual document count stays under 2^31-1,
> deleting and updating do
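(For anyone wondering how close a core actually is to that ceiling, the Luke
handler reports both counts; the core name below is illustrative:)

# Rough headroom check against Lucene's per-index document cap.
import requests

LIMIT = 2**31 - 1  # 2,147,483,647

info = requests.get(
    "http://localhost:8983/solr/mycore/admin/luke",
    params={"numTerms": 0, "wt": "json"},
).json()["index"]

# maxDoc includes deleted/updated docs that merges haven't reclaimed yet,
# and maxDoc (not the live numDocs count) is what runs into the limit.
print("numDocs:", info["numDocs"])
print("maxDoc: ", info["maxDoc"], "of", LIMIT)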