Re: How large is your solr index?
For Solr 5 why don't we switch it to 64 bit ?? Bill Bell Sent from mobile > On Dec 29, 2014, at 1:53 PM, Jack Krupansky wrote: > > And that Lucene index document limit includes deleted and updated > documents, so even if your actual document count stays under 2^31-1, > deleting and updating documents can push the apparent document count over > the limit unless you very aggressively merge segments to expunge deleted > documents. > > -- Jack Krupansky > > On Mon, Dec 29, 2014 at 12:54 PM, Erick Erickson > wrote: > >> When you say 2B docs on a single Solr instance, are you talking only one >> shard? Because if you are, you're very close to the absolute upper limit of a >> shard; internally the doc id is an int, or 2^31. 2^31 + 1 will cause all sorts of problems. >> >> But yeah, your 100B documents are going to use up a lot of servers... >> >> Best, >> Erick >> >> On Mon, Dec 29, 2014 at 7:24 AM, Bram Van Dam >> wrote: >>> Hi folks, >>> >>> I'm trying to get a feel of how large Solr can grow without slowing down too >>> much. We're looking into a use-case with up to 100 billion documents >>> (SolrCloud), and we're a little afraid that we'll end up requiring 100 >>> servers to pull it off. >>> >>> The largest index we currently have is ~2 billion documents in a single Solr >>> instance. Documents are smallish (5k each) and we have ~50 fields in the >>> schema, with an index size of about 2TB. Performance is mostly OK. Cold >>> searchers take a while, but most queries are alright after warming up. I >>> wish I could provide more statistics, but I only have very limited access to >>> the data (...banks...). >>> >>> I'd be very grateful to anyone sharing statistics, especially on the larger end >>> of the spectrum -- with or without SolrCloud. >>> >>> Thanks, >>> >>> - Bram
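Jack's point about deleted and updated documents is easy to keep an eye on: Lucene's maxDoc includes deleted-but-not-yet-merged documents while numDocs counts only live ones, so the gap between the two is what quietly eats into the 2^31-1 ceiling. A minimal sketch of such a check, assuming a Solr 4.x core named "collection1" on localhost, the Luke request handler, and an expungeDeletes commit (standard parameters, but verify them against your version):

import requests

SOLR = "http://localhost:8983/solr/collection1"  # hypothetical core URL

def deleted_doc_pressure():
    # maxDoc = live docs plus deleted docs still held in segments; numDocs = live docs only.
    info = requests.get(SOLR + "/admin/luke",
                        params={"numTerms": 0, "wt": "json"}).json()["index"]
    headroom = (2**31 - 1) - info["maxDoc"]
    print("numDocs=%d maxDoc=%d headroom=%d" % (info["numDocs"], info["maxDoc"], headroom))
    return info["maxDoc"] - info["numDocs"]

def expunge_deletes():
    # Merge away deleted documents; expensive on a large index, so use sparingly.
    requests.get(SOLR + "/update", params={"commit": "true", "expungeDeletes": "true"})

if deleted_doc_pressure() > 0:
    expunge_deletes()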
RE: De Duplication using Solr
One possible "match" is using Python's FuzzyWuzzy https://github.com/seatgeek/fuzzywuzzy http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/ > Date: Sat, 3 Jan 2015 13:24:17 +0530 > Subject: De Duplication using Solr > From: shanuu@gmail.com > To: solr-user@lucene.apache.org > > I am trying to find out duplicate records based on distance and phonetic > algorithms. Can I utilize Solr for that? I have the following fields and > conditions to identify exact or possible duplicates. > > 1. Fields > prefix > suffix > firstname > lastname > email(primary_email1, email2, email3) > phone(primary_phone1, phone2, phone3) > 2. Conditions: > Two records are said to be exact duplicates if > > 1. IsExactMatchFunction(record1_prefix, record2_prefix) AND > IsExactMatchFunction(record1_suffix, record2_suffix) AND > IsExactMatchFunction(record1_firstname,record2_firstname) AND > IsExactMatchFunction(record1_lastname,record2_lastname) AND > IsExactMatchFunction(record1_primary_email,record2_primary_email) OR > IsExactMatchFunction(record1_primary_phone,record2_primary_phone) > Two records are said to be possible duplicates if > > 1. IsExactMatchFunction(record1_prefix, record2_prefix) OR > IsExactMatchFunction(record1_suffix, record2_suffix) OR > IsExactMatchFunction(record1_firstname,record2_firstname) AND > IsExactMatchFunction(record1_lastname,record2_lastname) AND > IsExactMatchFunction(record1_primary_email,record2_primary_email) OR > IsExactMatchFunction(record1_primary_phone,record2_primary_phone) > ELSE > 2. IsFuzzyMatchFunction(record1_firstname,record2_firstname) AND > IsExactMatchFunction(record1_lastname,record2_lastname) AND > IsExactMatchFunction(record1_primary_email,record2_primary_email) OR > IsExactMatchFunction(record1_primary_phone,record2_primary_phone) > ELSE > 3. IsFuzzyMatchFunction(record1_firstname,record2_firstname) AND > IsExactMatchFunction(record1_lastname,record2_lastname) AND > IsExactMatchFunction(record1_any_email,record2_any_email) OR > IsExactMatchFunction(record1_any_phone,record2_any_phone) > > IsFuzzyMatchFunction() will perform distance and phonetic algorithm > calculations and compare the result with a predefined threshold. > > For example: > > if the threshold defined for firstname is 85, the IsFuzzyMatchFunction() function > returns "true" if and only if one of the algorithms (distance or > phonetic) returns a similarity score >= 85. > > Can I use Solr to perform this job? Or can you suggest how I can > approach this problem? I have seen Duke (a de-duplication API) but I > cannot use Duke out of the box.
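For the IsFuzzyMatchFunction piece, here is a rough sketch of that threshold logic using FuzzyWuzzy for the edit-distance score and, as one arbitrary choice, the jellyfish library's Metaphone for the phonetic comparison. The 85 threshold and the field names come from the question above; everything else is illustrative.

from fuzzywuzzy import fuzz
import jellyfish

def is_exact_match(a, b):
    return (a or "").strip().lower() == (b or "").strip().lower()

def is_fuzzy_match(a, b, threshold=85):
    # "true" if and only if the distance OR the phonetic comparison clears the threshold.
    distance_score = fuzz.ratio((a or "").lower(), (b or "").lower())   # 0..100
    phonetic_score = 100 if jellyfish.metaphone(a or "") == jellyfish.metaphone(b or "") else 0
    return max(distance_score, phonetic_score) >= threshold

def classify(r1, r2):
    """Return 'exact', 'possible' or None for two record dicts."""
    contact_match = (is_exact_match(r1["primary_email"], r2["primary_email"])
                     or is_exact_match(r1["primary_phone"], r2["primary_phone"]))
    if (is_exact_match(r1["prefix"], r2["prefix"])
            and is_exact_match(r1["suffix"], r2["suffix"])
            and is_exact_match(r1["firstname"], r2["firstname"])
            and is_exact_match(r1["lastname"], r2["lastname"])
            and contact_match):
        return "exact"
    if (is_fuzzy_match(r1["firstname"], r2["firstname"])
            and is_exact_match(r1["lastname"], r2["lastname"])
            and contact_match):
        return "possible"
    return None

Whether this comparison runs inside Solr (as an update processor) or outside it against query results is a separate design choice; the replies below point at what Solr offers out of the box.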
RE: How large is your solr index?
Bill Bell [billnb...@gmail.com] wrote: [solr maxdoc limit of 2b] > For Solr 5 why don't we switch it to 64 bit ?? The biggest challenge for a switch is that Java's arrays can only hold 2b values. I support the idea of switching to much larger minimums throughout the code. But it is a larger fix than replacing int with long. - Toke Eskildsen
Re: De Duplication using Solr
First, see if you can get your requirements to align to the de-dupe feature that Solr already has: https://cwiki.apache.org/confluence/display/solr/De-Duplication -- Jack Krupansky On Sat, Jan 3, 2015 at 2:54 AM, Amit Jha wrote: > I am trying to find out duplicate records based on distance and phonetic > algorithms. Can I utilize Solr for that? [...]
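For reference, the linked feature works by computing a signature over chosen fields at index time (SignatureUpdateProcessorFactory) and optionally overwriting documents that share a signature. The gist of it, re-stated as a Python illustration rather than Solr's actual Java implementation, with arbitrary field and hash choices:

import hashlib

DEDUPE_FIELDS = ["firstname", "lastname", "primary_email"]  # hypothetical selection

def signature(doc):
    # Normalize and concatenate the configured fields, then hash them.
    joined = "|".join((doc.get(f) or "").strip().lower() for f in DEDUPE_FIELDS)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

def add_signature(doc):
    # Writing the signature into the uniqueKey field means later copies of the
    # "same" record overwrite earlier ones (the effect of overwriteDupes=true).
    doc["id"] = signature(doc)
    return doc

Exact-hash signatures only cover the "exact duplicate" rules above; the fuzzy rules still need custom logic.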
Re: De Duplication using Solr
Thanks for the reply... I have already seen the wiki. My problem is more like record matching. On Sat, Jan 3, 2015 at 7:39 PM, Jack Krupansky wrote: > First, see if you can get your requirements to align to the de-dupe feature > that Solr already has: > https://cwiki.apache.org/confluence/display/solr/De-Duplication > > -- Jack Krupansky > > On Sat, Jan 3, 2015 at 2:54 AM, Amit Jha wrote: > > I am trying to find out duplicate records based on distance and phonetic > > algorithms. Can I utilize Solr for that? [...]
Re: How large is your solr index?
bq: For Solr 5 why don't we switch it to 64 bit ?? -1 on this for a couple of reasons: > it'd be pretty invasive, and 5.0 may be imminent. Far too big a change to > implement at the last second. > It's not clear that it's even useful. Once you get to that many documents, > performance usually suffers. Of course I wouldn't be doing the work so I really don't have much of a vote, but it's not clear to me at all that enough people would actually have a use-case for 2b+ docs in a single shard to make it worthwhile. At that scale GC potentially becomes really unpleasant, for instance. FWIW, Erick On Sat, Jan 3, 2015 at 2:45 AM, Toke Eskildsen wrote: > Bill Bell [billnb...@gmail.com] wrote: > > [solr maxdoc limit of 2b] > >> For Solr 5 why don't we switch it to 64 bit ?? > > The biggest challenge for a switch is that Java's arrays can only hold 2b > values. I support the idea of switching to much larger minimums throughout > the code. But it is a larger fix than replacing int with long. > > - Toke Eskildsen
Re: How large is your solr index?
On 1/3/2015 9:02 AM, Erick Erickson wrote: > bq: For Solr 5 why don't we switch it to 64 bit ?? > > -1 on this for a couple of reasons >> it'd be pretty invasive, and 5.0 may be imminent. Far too big a change to >> implement at the last second >> It's not clear that it's even useful. Once you get to that many documents, >> performance usually suffers > > Of course I wouldn't be doing the work so I really don't have much of > a vote, but it's not clear to me at > all that enough people would actually have a use-case for 2b+ docs in > a single shard to make it > worthwhile. At that scale GC potentially becomes really unpleasant for > instance I agree, 2 billion documents in a single index is MORE than enough. If you actually create an index that large, you're going to have performance problems, and most of those performance problems will likely be related to garbage collection. I can extrapolate one such problem from personal experience on a much smaller index. A filterCache entry for a 2 billion document index is 256MB in size. Assuming you're using the G1 collector, the maximum size for a G1 heap region is 32MB, which means that at that size, every single filter will result in an object that is allocated immediately from the old generation (it's called a humongous allocation). Allocating that much memory from the old generation will eventually (and frequently) result in a full garbage collection ... and you do not want your application to wait for a full garbage collection on the heap size that would be required for a 2 billion document index. It could easily exceed 30 or 60 seconds. When you consider the current limitations of G1GC, it would be advisable to keep each Solr index below 100 million documents. At 134,217,728 documents, each filter object will be too large (more than 16MB) to be considered a normal allocation on the max heap region size (32MB). Even with the older battle-tested CMS collector (assuming good tuning options), I think the huge object sizes (and the huge number of smaller objects) resulting from a 2 billion document index will have major garbage collection problems. Thanks, Shawn
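Shawn's numbers check out with simple arithmetic - a cached filter is essentially one bit per document in the index:

MAX_G1_REGION = 32 * 1024 * 1024        # largest G1 heap region (32MB)
HUMONGOUS = MAX_G1_REGION // 2          # allocations >= half a region are "humongous"

def filter_entry_bytes(num_docs):
    return num_docs // 8                # one bit per document

print(filter_entry_bytes(2**31) // 2**20)   # 256 (MB) per cached filter at 2 billion docs
print(HUMONGOUS * 8)                        # 134217728 docs -> 16MB filter entries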
RE: How large is your solr index?
Erick Erickson [erickerick...@gmail.com] wrote: > Of course I wouldn't be doing the work so I really don't have much of > a vote, but it's not clear to me at all that enough people would actually > have a use-case for 2b+ docs in a single shard to make it > worthwhile. At that scale GC potentially becomes really unpleasant for > instance Over the last few years we have seen a few use cases here on the mailing list. I would be very surprised if the number of such cases does not keep rising. Currently the work for a complete overhaul does not measure up to the rewards, but that is slowly changing. At the very least I find it prudent not to limit new Lucene/Solr interfaces to ints. As for GC: Right now a lot of structures are single-array oriented (for example using a long-array to represent bits in a bitset), which might not work well with current garbage collectors. A change to higher limits also means re-thinking such approaches: if the garbage collector likes objects below a certain size, then split the arrays into chunks of that size. Likewise, iterations over structures linear in size to the index could be threaded. These are issues even with the current 2b limitation. - Toke Eskildsen
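One way to picture the "split the arrays" suggestion - purely a toy sketch, not how Lucene's bitsets are actually implemented - is a bitset backed by many fixed-size chunks instead of one huge array, so no single allocation exceeds whatever size the collector handles gracefully:

CHUNK_BYTES = 8 * 1024 * 1024   # arbitrary, GC-friendly chunk size

class ChunkedBitSet:
    def __init__(self, num_bits):
        num_bytes = (num_bits + 7) // 8
        # Many small allocations instead of one array spanning the whole index.
        self.chunks = [bytearray(min(CHUNK_BYTES, num_bytes - i))
                       for i in range(0, num_bytes, CHUNK_BYTES)]

    def set(self, bit):
        byte_index, mask = bit // 8, 1 << (bit % 8)
        self.chunks[byte_index // CHUNK_BYTES][byte_index % CHUNK_BYTES] |= mask

    def get(self, bit):
        byte_index, mask = bit // 8, 1 << (bit % 8)
        return bool(self.chunks[byte_index // CHUNK_BYTES][byte_index % CHUNK_BYTES] & mask)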
Re: How large is your solr index?
I can't disagree. You bring up some of the points that make me _extremely_ reluctant to try to get this into 5.x though. 6.0 at the earliest, I should think. And who knows? Java may get a GC process that's geared to modern amounts of memory and gets past the current pain. Best, Erick On Sat, Jan 3, 2015 at 1:00 PM, Toke Eskildsen wrote: > Over the last few years we have seen a few use cases here on the mailing list. > I would be very surprised if the number of such cases does not keep rising. > [...]
Re: How large is your solr index?
Back in June on a similar thread I asked "Anybody care to forecast when hardware will catch up with Solr and we can routinely look forward to newbies complaining that they indexed "some" data and after only 10 minutes they hit this weird 2G document count limit?" Still not there. So the race is on between when Lucene will relax the 2G limit and when hardware gets fast enough that 2G documents can be indexed within a small number of hours. -- Jack Krupansky On Sat, Jan 3, 2015 at 4:00 PM, Toke Eskildsen wrote: > Over the last few years we have seen a few use cases here on the mailing list. > I would be very surprised if the number of such cases does not keep rising. > [...]
RE: How large is your solr index?
Erick Erickson [erickerick...@gmail.com] wrote: > I can't disagree. You bring up some of the points that make me _extremely_ > reluctant to try to get this in to 5.x though. 6.0 at the earliest I should > think. Ignoring the magic 2b number for a moment, I think the overall question is whether or not single shards should perform well in the hundreds of millions of documents range. The alternative is more shards, but it is quite an explicit process to handle shard-juggling. From an end-user perspective, the underlying technology matters little: Whatever the choice, it should be possible to install "something" on a machine and expect it to scale within the hardware limitations without much ado. - Toke Eskildsen
Re: How large is your solr index?
That's a laudable goal - to support low-latency queries - including faceting - for "hundreds of millions" of documents, using Solr "out of the box" on a random, commodity box selected by IT and just adding a dozen or two fields to the default schema that are both indexed and stored, without any "expert" tuning, by an "average" developer. The reality doesn't seem to be there today. 50 to 100 million documents, yes, but beyond that takes some kind of "heroic" effort, whether a much beefier box, very careful and limited data modeling or limiting of query capabilities or tolerance of higher latency, expert tuning, etc. The proof is always in the pudding - pick a box, install Solr, setup the schema, load 20 or 50 or 100 or 250 or 350 million documents, try some queries with the features you need, and you get what you get. But I agree that it would be highly desirable to push that 100 million number up to 350 million or even 500 million ASAP since the pain of unnecessarily sharding is unnecessarily excessive. I wonder what changes will have to occur in Lucene, or... what evolution in commodity hardware will be necessary to get there. -- Jack Krupansky On Sat, Jan 3, 2015 at 6:11 PM, Toke Eskildsen wrote: > Erick Erickson [erickerick...@gmail.com] wrote: > > I can't disagree. You bring up some of the points that make me > _extremely_ > > reluctant to try to get this in to 5.x though. 6.0 at the earliest I > should > > think. > > Ignoring the magic 2b number for a moment, I think the overall question is > whether or not single shards should perform well in the hundreds of > millions of documents range. The alternative is more shards, but it is > quite an explicit process to handle shard-juggling. From an end-user > perspective, the underlying technology matters little: Whatever the choice, > it should be possible to install "something" on a machine and expect it to > scale within the hardware limitations without much ado. > > - Toke Eskildsen >
Re: solr export get wrong results
Thanks a lot for your help, Joel. Just wondering, why does "export" have such limitations? It uses the same query handler as "select", doesn't it? 2014-12-31 10:28 GMT+08:00 Joel Bernstein : > For the initial release only JSON output format is supported with the > /export feature. Also there is no built-in distributed support yet. Both of > these features are likely to follow in future releases. > > For the initial release you'll need a client that can handle the JSON > format and distributed logic. The Heliosearch project includes a client > called CloudSolrStream that you can use for this purpose. Here are two > links to get started with CloudSolrStream: > > https://github.com/Heliosearch/heliosearch/blob/helio_4_10/solr/solrj/src/java/org/apache/solr/client/solrj/streaming/CloudSolrStream.java > http://heliosearch.org/streaming-aggregation-for-solrcloud/ > > Joel Bernstein > Search Engineer at Heliosearch > > On Mon, Dec 29, 2014 at 2:20 AM, Sandy Ding wrote: > > Hi, Joel > > > > Thanks for your reply. > > It seems that the weird export results are because I removed the "xsort" > > invariant of the export request handler in the default solrconfig.xml to > > get csv-format output. > > I don't quite understand the meaning of "xsort", but I removed it because I > > always get a JSON response (as you said) with the xsort invariant. > > Is there a way to get csv output using export? > > And also, can I get full results from all shards? (I tried to set > > "distrib=true" but get "SyntaxError:xport RankQuery is required for xsort: > > rq={!xport}", and I do have rq={!xport} in the export invariants) > > > > 2014-12-27 3:21 GMT+08:00 Joel Bernstein : > > > Hi Sandy, > > > > > > I pulled Solr 4.10.3 to see if I could recreate the issue you are seeing > > > with export and I wasn't able to recreate the bug you are seeing. For > > > example the following query: > > > > > > http://localhost:8983/solr/collection1/export?q=join_i:[50 TO 500010]&wt=json&indent=true&sort=join_i+asc&fl=join_i,ShopId_i > > > > > > Brings back the following result: > > > > > > {"responseHeader": {"status": 0}, "response":{"numFound":11, "docs":[{"join_i":50,"ShopId_i":578917},{"join_i":51,"ShopId_i":294217},{"join_i":52,"ShopId_i":199805},{"join_i":53,"ShopId_i":633461},{"join_i":54,"ShopId_i":472995},{"join_i":55,"ShopId_i":672122},{"join_i":56,"ShopId_i":394637},{"join_i":57,"ShopId_i":446443},{"join_i":58,"ShopId_i":697329},{"join_i":59,"ShopId_i":166988},{"join_i":500010,"ShopId_i":191261}]}} > > > > > > Notice the join_i values are all within the correct range. > > > > > > If you can post the export handler configuration we should be able to > > > see the issue. > > > > > > Joel Bernstein > > > Search Engineer at Heliosearch > > > > > > On Fri, Dec 26, 2014 at 1:50 PM, Joel Bernstein wrote: > > > > Hi Sandy, > > > > > > > > The export handler should only return documents in JSON format. The > > > > results in your second example are in XML format so something looks > > > > to be wrong in the configuration. Can you post what your solrconfig > > > > looks like?
> > > > > > > > Joel > > > > > > > > Joel Bernstein > > > > Search Engineer at Heliosearch > > > > > > > > On Fri, Dec 26, 2014 at 12:43 PM, Erick Erickson < > > > erickerick...@gmail.com> > > > > wrote: > > > > > > > >> I think you missed a very important part of Jack's reply: > > > >> > > > >> bq: I notice that you don't have distrib=false on your select, which > > > >> would make your select be from all nodes, while export would only be > > > >> docs from the specific node you sent the request to. > > > >> > > > >> And from the Reference Guide on export > > > >> > > > >> bq: The initial release treats all queries as non-distributed > > > >> requests. So the client is responsible for making the calls to each > > > >> Solr instance and merging the results. > > > >> > > > >> So the export statement you're sending is _only_ exporting the > results > > > >> from the shard on 8983 and completely ignoring the other (6?) > shards, > > > >> whereas the query you're sending is getting the results from all the > > > >> shards. > > > >> > > > >> As Jack said, add &distrib=false to the query, send it to the same > > > >> shard you send the export command to and the results should match. > > > >> > > > >> Also, be sure your configuration for the /select handler doesn't > have > > > >> any additional default parameters that might alter the results, but > I > > > >> doubt that's really a problem here. > > > >> > > > >> Best, > > > >> Erick > > > >> > > > >> On Fri, Dec 26, 2014 at 7:02 AM, Ahmet Arslan > > > > > > > > >> wrote: > > > >> > Hi, > > > >> > > > > >> > Do you have any custom solr components deployed? May be custom > > > response > > > >> writer? > > > >> > > > > >> > Ahmet > > > >> > > > > >> > > > > >> > > > > >>
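To close the loop on the original question: with the stock /export configuration restored (wt=json and rq={!xport} kept as invariants), a single-shard export can be consumed by any JSON-capable client, and the per-shard results merged by hand until distributed support lands (or via CloudSolrStream from SolrJ). A rough sketch, assuming the collection1 core and fields from Joel's example and the Python requests library; for very large result sets a streaming JSON parser would be a better fit than resp.json():

import requests

def export_docs(core_url, query, sort, fields):
    # /export is per-shard in this release: call each core directly and merge yourself.
    params = {"q": query, "sort": sort, "fl": ",".join(fields)}
    resp = requests.get(core_url + "/export", params=params, stream=True)
    resp.raise_for_status()
    return resp.json()["response"]["docs"]

docs = export_docs("http://localhost:8983/solr/collection1",
                   "join_i:[50 TO 500010]", "join_i asc", ["join_i", "ShopId_i"])
print(len(docs), "documents exported")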