Hi Steve,
I cannot do the deduplication at index time; rather, I need to find the
duplicates of each document and then report the duplicate data back to the user.
Yes, I need to run a query for each of the 40 million rows. It will be about
10 mapper tasks max. I will try SolrJ for this purpose. Thanks Steve.
Be
Hi Tim,
Thank you for the great pointer. Will join the group.
Thanks,
Dino
On Tue, Jan 5, 2016 at 2:10 AM, Tim Williams wrote:
> Apache Blur (Incubating) has several approaches (hive, spark, m/r)
> that could probably help with this ranging from very experimental to
> stable. If you're inter
bq. There's no good reason to have 5 with a small cluster and by "small" I
mean < 100s of nodes.
Well, a good reason would be if you want your system to continue to operate
if 2 ZK nodes lose communication with the rest of the cluster or go down
completely. Just to be clear though, the ZK nodes de
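The arithmetic behind the point above can be sketched as follows. This is a minimal illustration of majority-quorum sizing, not code from ZooKeeper itself; the class and method names are made up for the example.

```java
// Majority-quorum arithmetic for a ZooKeeper ensemble.
// Class and method names are hypothetical, chosen only for this sketch.
final class ZkQuorum {
    /** Smallest number of servers that still forms a strict majority. */
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    /** Number of servers that can be lost while a majority remains. */
    static int tolerableFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 4, 5}) {
            System.out.printf("ensemble=%d quorum=%d tolerates=%d%n",
                    n, quorumSize(n), tolerableFailures(n));
        }
    }
}
```

With 3 nodes the ensemble survives the loss of 1; with 5 it survives the loss of 2, which is the case described above. A 4-node ensemble still tolerates only 1 failure, which is why odd sizes are the usual choice.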
Dear All,
I've a use case where I need to do selective replication from master to
slave.
Basically I am going with master slave approach - the application pushing
data to master will need to preview the search and if the search is deemed
useful/appropriate I need the data to be replicated to slav
You might consider trying to get the de-duplication done at index time:
https://cwiki.apache.org/confluence/display/solr/De-Duplication that way
the map reduce job wouldn't even be necessary.
When it comes to the map reduce job, you would need to be more specific
with *what* you are doing for peop
Hi,
Eric:
I changed updateLog as follows.
/mnt/nitin_test/
I made this change after the collection was created and then updated zk and
reloaded the collection.
Mark: Ok that might be the issue. I will try doing this without the reload.
Thanks,
Nitin
On Sat, Jan 9, 2016 at 2:32 PM, Mark Mi
bq: is it best/good to get the CLUSTERSTATUS via the collection API
and explicitly send queries to a replica to ensure I don't send
queries to the leaders of my collection
In a word _no_. SolrCloud is vastly different than the old
master/slave. In SolrCloud, each and every node (leader and replica
dataDir and tlog dir cannot be changed with a core reload.
- Mark
On Sat, Jan 9, 2016 at 1:20 PM Erick Erickson
wrote:
> Please show us exactly what you did, and exactly
> what you saw to say that "does not seem to work".
>
> Best,
> Erick
>
> On Fri, Jan 8, 2016 at 7:47 PM, KNitin wrote:
> >
Hi Mark,
Try using the set method instead of the add method: params1.set("fl", "id");
I also suggest using the static String constant for "fl" (CommonParams.FL), just as you used CommonParams.Q for "q".
Congrats on your first search component!
happy searching,
Ahmet
On Saturday, January 9, 2016 11:32 PM, Mark Robinson
wrote:
Th
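The difference between the two methods suggested above can be sketched without a Solr dependency. In SolrJ's ModifiableSolrParams, add appends another value to a parameter while set replaces whatever is there. The MultiParams class below is a hypothetical stand-in that mimics only that behaviour; it is not the SolrJ class, and the initial "fl" value is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a multi-valued parameter map; NOT the SolrJ class.
final class MultiParams {
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    /** Replaces any values already stored under the name. */
    MultiParams set(String name, String... vals) {
        values.put(name, new ArrayList<>(Arrays.asList(vals)));
        return this;
    }

    /** Appends to the values already stored under the name. */
    MultiParams add(String name, String... vals) {
        values.computeIfAbsent(name, k -> new ArrayList<>()).addAll(Arrays.asList(vals));
        return this;
    }

    /** Returns the stored values, or an empty list if the name is unknown. */
    List<String> get(String name) {
        return values.getOrDefault(name, new ArrayList<>());
    }

    public static void main(String[] args) {
        // Assume the incoming request already carries an "fl" value.
        MultiParams params = new MultiParams().set("fl", "id,title");
        params.add("fl", "id");   // "fl" now has two values
        System.out.println(params.get("fl"));
        params.set("fl", "id");   // "fl" now has exactly one value
        System.out.println(params.get("fl"));
    }
}
```

If the request may already contain the parameter, set is the safer choice because the component ends up with exactly the value it asked for.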
Thanks Eric!
Appreciate your valuable suggestions.
Now I am getting the concept of a search-component better!
So my custom class is just this after removing the SOLRJ part, as I just
need to modify the query by adding some parameters dynamically before the
query actually is executed by SOLR:-
pu
Whoa, Mark… you’re making a search request within a search component.
Instead, let the built-in “query” component do the work for you.
I think one fix for you is to make your “components” be “first-components”
instead (allowing the other default search components to come into play). You
don
Hi,
Ahmet, Jack, Thanks for the pointers.
My requirement is that the facets, the sort fields, and their order are not
static.
For example suppose for a particular scenario I need to show only 2 facets
and sort on only one field.
For another scenario I may have to do facet.field for a differ
Thank you all so much for your responses. Very helpful indeed!
> On Jan 8, 2016, at 12:03 PM, Erick Erickson wrote:
>
> First, Daniel nailed the XY problem, but this isn't that...
>
> You're correct that hand-editing the schema file is error-prone.
> The managed schema API is your friend here
Hi,
(btw, when is 5.5 due? I see the docs reference it, but not the
download page)
Anyway, I index and query Solr over HTTP (no SolrJ, etc.) - is it
best/good to get the CLUSTERSTATUS via the collection API and explicitly
send queries to a replica to ensure I don't send queries to the leade
For some reason, "slice" is the preferred term in the _code_, while
"shard" is preferred in docs
FWIW
Erick
On Fri, Jan 8, 2016 at 3:51 PM, Jeff Wartes wrote:
>
> Honestly, I have no idea which is "old". The solr source itself uses slice
> pretty consistently, so I stuck with that when I st
I don't really know unless there's _something_ different
about the docs, and you could delete by _query_, something
like id=XXX AND (condition unique to the doc you want to remove).
I'm more concerned about how there got to be duplicate entries in the
first place. There really shouldn't be any wit
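A delete-by-query of the kind described above might be assembled like this. It is only a sketch: it assumes the unique key field is named "id", and the extra condition (source:legacy) is a made-up placeholder for whatever actually distinguishes the unwanted document. The output is the XML delete message accepted by Solr's update handler.

```java
// Builds a delete-by-query message of the form "id:XXX AND (condition)".
// Class name, field names and the sample condition are hypothetical.
final class DeleteByQuery {
    /** Escapes the characters that are special in XML text content. */
    static String xmlEscape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Quotes the id as a phrase and ANDs it with the distinguishing condition. */
    static String buildQuery(String id, String extraCondition) {
        String quotedId = "\"" + id.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        return "id:" + quotedId + " AND (" + extraCondition + ")";
    }

    /** Wraps the query in the XML delete message for the update handler. */
    static String deleteMessage(String query) {
        return "<delete><query>" + xmlEscape(query) + "</query></delete>";
    }

    public static void main(String[] args) {
        String query = buildQuery("XXX", "source:legacy");
        System.out.println(deleteMessage(query));
    }
}
```

Running the same query as a normal search first is a cheap way to confirm it matches only the document that should go.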
Please show us exactly what you did, and exactly
what you saw to say that "does not seem to work".
Best,
Erick
On Fri, Jan 8, 2016 at 7:47 PM, KNitin wrote:
> Hi,
>
> How do I specify a different directory for transaction logs? I tried using
> the updatelog entry in solrconfig.xml and reloaded t
Sure, you CAN do this, but why would you want to? I mean, what exactly is
the motivation here? If you truly have custom code to execute, fine, but if
all you are trying to do is set parameters, a custom request handler is
hitting a tack with a sledge hammer. For example, why isn't setting
defaults
Hi Mark,
Yes this is possible. Better, you can use a custom SearchComponent for this
task too.
You retrieve the Solr parameters and wrap them in ModifiableSolrParams. Add the
extra parameters, then pass them on to the underlying search components.
Ahmet
On Saturday, January 9, 2016 3:59 PM, Mark Robinson
w
Hi,
When I initially fire a query against my Solr instance using SOLRJ I pass
only, say q=*:*&fq=(myfield:value1).
I have written a custom RequestHandler, which is what I call in my SolrJ
query.
Inside this custom request handler can I add more query params like say the
facets etc.. so that ultimat