Hi Ketan,
Each shard is a separate index, and if you are indexing 100 docs/sec without
routing across two shards, you are indexing 50 docs/sec per shard. If you have
routing, and all documents are from a single tenant, a single shard has to be
able to process 100 docs/sec. If you have two nodes it means that you h
Hi,
I have Solr 4.10.3 as part of a CDH5 installation, and I would like to index
a huge amount of CSV files on HDFS. I was wondering what is the best way of
doing that.
Here is the current approach:
data.csv:
id, fruit
10, apple
20, orange
Indexing with the following command using search-mr-1.0.0-cd
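The MapReduceIndexerTool command above is cut off, so for comparison here is a minimal sketch of the other common route: a direct HTTP upload of a small CSV to Solr's CSV update handler. The core name `collection1` is an assumption, and since the call needs a running Solr, the curl command is only echoed here:

```shell
# Recreate the sample file from the post above.
cat > data.csv <<'EOF'
id,fruit
10,apple
20,orange
EOF

# Hypothetical upload command (core name "collection1" is an assumption);
# echoed rather than run, since it requires a live Solr instance.
CMD="curl 'http://localhost:8983/solr/collection1/update?commit=true' -H 'Content-type: text/csv' --data-binary @data.csv"
echo "$CMD"
```

For genuinely huge HDFS datasets the MapReduce-based indexer is still the better fit; the HTTP handler is mainly useful for sanity-checking the CSV format first.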
Hi,
For an LTR query, is there any way of checking how the `rq` is being
parsed? or specifically how the `efi` queries are treated?
For e.g. let's say my `rq` looks like this:
"rq":"{!ltr model=my_efi_model efi.text=my car}"
And my corresponding feature is:
SolrFeature [name=my_efi, params={q={!f
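One way to see how the rescore query and the `efi` values are being handled is to single-quote multi-word `efi` values inside the local params and request debug output. A hedged sketch, assuming a core named `techproducts` with the LTR plugin enabled; the URL is shown unencoded for readability, and in practice the parameters would need URL-encoding (e.g. `curl -G --data-urlencode`):

```shell
# Single quotes keep "my car" together as one efi value inside the
# local-params syntax; debug=results asks Solr to explain scoring.
RQ="{!ltr model=my_efi_model efi.text='my car'}"
QUERY="http://localhost:8983/solr/techproducts/query?q=test&rq=${RQ}&fl=id,score,[features]&debug=results"
echo "$QUERY"
# curl -G ... --data-urlencode "rq=${RQ}"   # needs a running Solr with LTR
```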
You probably get much more informed responses from
the Cloudera folks, especially about Hue.
Best,
Erick
On Wed, Oct 11, 2017 at 6:05 AM, István wrote:
> Hi,
>
> I have Solr 4.10.3 as part of a CDH5 installation, and I would like to index
> a huge amount of CSV files on HDFS. I was wondering what is t
Hi,
I have an issue, described below, when using Document Routing.
1: Queries are slower on a heavy index; details below.
Config: 4 shards and 4 replicas, with 8.5 GB total index size (about 2 GB
per shard).
-With routing parameter:
q=worksetid_l:2028446%20AND%20modelid_l:23718&rows=1&_route_=10
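For reference, the same query can be sketched with and without the `_route_` parameter. With the compositeId router, `_route_` restricts the request to the shard(s) owning that route key instead of fanning out to all four shards; collection name and host below are assumptions:

```shell
# Hypothetical reconstruction of the query above (decoded for readability).
BASE='http://localhost:8983/solr/collection1/select'
Q='worksetid_l:2028446 AND modelid_l:23718'
# Without _route_: the request fans out to all 4 shards.
# With _route_=10: only the shard owning route key "10" is searched.
URL="${BASE}?q=${Q}&rows=1&_route_=10"
echo "$URL"
```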
Hi,
We're trying to investigate a possible data issue between two replicas in
our cloud setup. We have docValues enabled for a string field, and when we
facet by it, the results come back with either the expected numbers per
value or zero for all values.
Is there a way to tell which replica is handling
I want to use Solr to index a Markdown website. The files
are in native Markdown, but they are served as HTML (by markserv).
Here's what I did:
docker run --name solr -d -p 8983:8983 -t solr
docker exec -it --user=solr solr bin/solr create_core -c handbook
Then, to crawl the site:
quadra[git:m
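The crawl command above is cut off; one hypothetical way to finish the setup is Solr's `bin/post` tool, which can fetch a URL (and follow links with `-recursive`) into the `handbook` core. The markserv address is an assumption, and the command is only echoed here since it needs the running container and site:

```shell
# Hypothetical crawl of the markserv-rendered site into the "handbook"
# core created above; host/port are placeholders.
CMD="docker exec --user=solr solr bin/post -c handbook http://host.docker.internal:8080/ -recursive 1"
echo "$CMD"
```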
You can route a request to a specific replica by
solr_node:port/solr/collection1_shard1_replica1/query?distrib=false&blah
blah blah
The "distrib=false" bit will cause the query to go to that replica and
only that replica. You can get the shard (collection1_shard1_replica1)
from the admin UI "cores
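Putting that together, a minimal sketch of a replica-pinned query; the node address, port, and core name are placeholders taken from the example above:

```shell
# Address a single core directly and keep the query local to it.
CORE='http://solr_node:8983/solr/collection1_shard1_replica1/query'
URL="${CORE}?q=*:*&distrib=false"
echo "$URL"
# curl "$URL"   # needs a running SolrCloud node
```

On a normal distributed query, adding `shards.info=true` instead will report which replica answered for each shard, which is often what you want when comparing replicas.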
I find myself in the same boat as TI when a Solr node goes into recovery.
The Solr UI and the logs are really of no help at that time.
It would be really nice to enhance the Solr UI with the features mentioned
in the original post.
On Tue, Oct 10, 2017 at 4:14 AM, Charlie Hull wrote:
> On 10/10/201
Thanks! I was trying the distrib=false option but was apparently using it
incorrectly for the cloud. The shards.info parameter was what I was
originally looking for.
On Wed, Oct 11, 2017 at 1:09 PM Erick Erickson
wrote:
> You can route a request to a specific replica by
> solr_node:port/solr/col
Hi,
We've run into a strange issue with our deployment of SolrCloud 6.3.0.
Essentially, a standard facet query on a string field usually comes back
empty when it shouldn't. However, every now and again the query actually
returns the correct values. This is only affecting a single shard in our
setu
I have a similar question. I am performing my feature extraction with the
following:
fl= [features+efi.query=bakeware 3-piece set]
I'm pretty sure the dash is causing my query to error. But I'm also not sure
how the spaces impact the efi param. I tried putting the term in quotes, but
that does n
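A hedged sketch of one thing to try: single-quote the `efi` value inside the `[features]` transformer so the spaces and the dash stay part of one value, then URL-encode the whole `fl` parameter when sending it. Core and field names here are assumptions:

```shell
# Single quotes inside the transformer keep the multi-word, dashed value
# together as one efi parameter.
FL="[features efi.query='bakeware 3-piece set']"
echo "fl=id,score,${FL}"
# e.g. curl -G 'http://localhost:8983/solr/techproducts/query' \
#        --data-urlencode "fl=id,score,${FL}" --data-urlencode 'q=bakeware'
```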
bq: ...but the collection wasn't emptied first
This is what I'd suspect is the problem. Here's the issue: Segments
aren't merged identically on all replicas. So at some point you had
this field indexed without docValues, changed that and re-indexed. But
the segment merging could "read" the fir
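Erick's point suggests the usual remedy: declare docValues on the field, then re-index from scratch (into an emptied or fresh collection) so no pre-docValues segments survive merging. A hypothetical schema sketch; the field and type names are placeholders, not taken from the poster's setup:

```xml
<!-- Hypothetical schema.xml fragment. After flipping docValues, delete
     all documents (or create a new collection) and re-index everything,
     so merges never mix old and new segments. -->
<field name="category" type="string" indexed="true" stored="true"
       docValues="true"/>
```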
Hello,
Could someone please let me know what this user-level KeeperException in
ZooKeeper means, and how to fix it?
Thanks,
GVK
2017-10-12 01:56:25,276 [myid:3] - INFO
[CommitProcessor:3:ZooKeeperServer@687] - Established session 0x35f0e3edd390001
with negotiated timeout 15000 for c