We are not doing anything special in terms of routing.
The issue seems to be fixed after setting the numShards=2 parameter in the solr.in.cmd file.
set -DnumShards=2
Not sure if anything changed in Solr 5.2 that requires adding this parameter
in the solr.in.cmd file. In Solr 4.8 it was working fine even with
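For reference, the usual way to pass such a system property in solr.in.cmd is through the SOLR_OPTS variable rather than a bare `set` line; a sketch (the solr.in.sh equivalent appends to $SOLR_OPTS):

```
REM solr.in.cmd (Windows) -- append the numShards system property
set SOLR_OPTS=%SOLR_OPTS% -DnumShards=2
```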
.nabble.com is indexing each post; is it possible to delete my post or hide
my email ID?
On Mon, Aug 10, 2015 at 11:24 AM, Roshan Agarwal
wrote:
> Dear All,
>
> Can anyone let us know how to implement a plagiarism checker with Solr,
> how to index content with shingles and what to send in queries
>
>
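For anyone exploring this, a minimal sketch of the shingle idea (word n-grams) — this is what Solr's ShingleFilterFactory produces at index time, and the overlap between two documents' shingle sets is a rough plagiarism signal. Plain Python only to illustrate the concept, not Solr code:

```python
def shingles(text, n=3):
    """Return the set of word n-grams ("shingles") in text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap(a, b, n=3):
    """Jaccard similarity of the two texts' shingle sets."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

In Solr you would instead configure ShingleFilterFactory on the field's index analyzer and send the suspect text as a query against that field, letting the scorer surface documents sharing many shingles.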
Hi Mikhail,
I'm trying to read 7-8 XML files that contain realistic data from our
production server. Then I would like to read this data into EmbeddedSolrServer
to test for edge cases for our custom date search. The use of
EmbeddedSolrServer is purely to separate the data testing from an
Hi,
I've tried to split my collection from 1 shard to 2 shards using the
command:
http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1
The shard was split successfully with all the index intact. The search and
highlight gives the same results before a
Hi
Check out the CollapsingQParser
(https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results).
As long as you have a field that will be the same for all duplicates, you can
“collapse” on that field. If you do not have a “group id”, you can create one
using e.g. an MD5 signatur
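Applying the collapse is just a filter query; a sketch, assuming a string field named `sig` that holds the shared per-duplicate-group value (add `expand=true` if you also want the collapsed members back):

```
fq={!collapse field=sig}&expand=true
```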
Apache only removes or modifies posts when personal information is
revealed, such as social security numbers. Email addresses and phone
numbers are not considered such. Apache has no control over Nabble and
such third party services.
I would suggest you resubscribe with a different email address t
Hi,
I am trying to model an index from a relational database and I have 3
main entity types: products, buyers and sellers.
I am using nested documents for sellers and buyers, as I have many
sellers and many buyers for one product:
{ "Active" : "true",
"CategoryID" : 59,
"CategoryName" : "
Florin,
I disclosed some details in a recent post:
http://blog.griddynamics.com/2015/08/scoring-join-party-in-solr-53.html.
Let me know if you have further questions afterwards.
I also notice that you use the "obvious" syntax BuyerID=83, but it's hardly
ever valid. There is a good habit of debugQ
Endre,
As I suggested before, consider avoiding the test framework; just put all the
code interacting with EmbeddedSolrServer into a main() method.
On Mon, Aug 31, 2015 at 12:15 PM, Moen Endre wrote:
> Hi Mikhail,
>
> Im trying to read 7-8 xml files of data that contain realistic data from
> our producti
It doesn't matter which node you do it on. And, you can replace an
existing alias by just creating another one with the same name.
Upayavira
On Mon, Aug 31, 2015, at 02:04 PM, Bill Au wrote:
> Thank, Shawn. So I only need to issue the command to update the alias on
> one of the node in the SolrC
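A sketch of the Collections API call for that; issuing it against any node (re)points the alias, and the alias and collection names here are assumptions:

```
http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=current&collections=collection2
```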
We are using solrcloud 5.2 with 1 shard (in UK Data Center) and 1 replica
(in Australia Data Center). We observed that data inserted/updated in shard
(UK Data center) is replicated very slowly to Replica in AUSTRALIA Data
Center (Due to high latency between UK and AUSTRALIA). We are looking to
impr
For 1-3, test and see. The problem I often see is that it is _assumed_ that
flattening the data will cost a lot in terms of index size and maintenance.
Test that assumption before going down the relational road.
You haven't talked about how many documents you have, how much data
would have to be r
On Mon, Aug 31, 2015, at 02:23 PM, Maulin Rathod wrote:
> We are using solrcloud 5.2 with 1 shard (in UK Data Center) and 1 replica
> (in Australia Data Center). We observed that data inserted/updated in
> shard
> (UK Data center) is replicated very slowly to Replica in AUSTRALIA Data
> Center (Due
If you really must expunge deletes, use optimize. That will merge all
index segments into one, and in the process will remove any deleted
documents.
Why do you need to expunge deleted documents anyway? It is generally
done in the background for you, so you shouldn't need to worry about it.
Upayav
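For completeness, expungeDeletes is passed as a commit parameter on the update handler; a sketch (unlike optimize, it only merges the segments that actually contain deletions):

```
curl "http://localhost:8983/solr/collection1/update?commit=true&expungeDeletes=true"
```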
Erick,
Apologies for not following up with the status of the indexing (replication)
issues, as I originally started this thread. After implementing
CloudSolrServer instead of ConcurrentUpdateSolrServer things were much
better. I simply wanted to follow up on understanding the memory
behavior better tho
I am having a hard time finding documentation on DataImportHandler
scheduling in SolrCloud. Can someone please post a link to that? I have a
requirement that the DIH should be initiated at a specific time Monday
through Friday.
Thanks!
Hi Troy,
I think folks use corncobs (with the curl utility) provided by the operating system.
Ahmet
On Monday, August 31, 2015 8:26 PM, Troy Edwards
wrote:
I am having a hard time finding documentation on DataImportHandler
scheduling in SolrCloud. Can someone please post a link to that? I have a
So, I think corncobs is not a utility, but a pattern - you have cron run curl
to invoke something on your web application on the localhost (and elsewhere),
and it runs the job if the job needs running, thus the webapp keeps the state.
There's a utility cronlock (https://github.com/kvz/cronlock)
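A sketch of that pattern applied to the DIH requirement above (weekdays at a fixed time); the core name and the time are assumptions:

```
# crontab entry: trigger a DIH full-import at 02:00, Monday-Friday
0 2 * * 1-5  curl -s "http://localhost:8983/solr/mycore/dataimport?command=full-import" > /dev/null
```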
Thank you Erick. What about cache size? If we add replicas to our
cluster and each replica has nGBytes of RAM allocated for HDFS caching,
would that help performance? Specifically the performance we want to
increase is time to facet data, time to cluster data and search time.
While we index
Hi Folks,
I need to merge docs received from multiple shards via a custom logic, a
straightforward score based priority queue doesn't work for my scenario (I
need to maintain a blend/distribution of docs).
How can I plug in my custom merge logic? One way might be to fully implement
the QueryCompon
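As an illustration of the blending itself (independent of where you hook it in), a sketch that interleaves per-shard ranked lists round-robin instead of by pure score — the hook point being a custom QueryComponent is an assumption here:

```python
from itertools import zip_longest

def blend(shard_results, rows):
    """Round-robin interleave of per-shard ranked lists.

    shard_results: list of lists, each already ranked within its shard.
    Returns the first `rows` docs while preserving a per-shard blend,
    instead of letting one high-scoring shard dominate the page.
    """
    merged = []
    for tier in zip_longest(*shard_results):   # one doc per shard per round
        merged.extend(doc for doc in tier if doc is not None)
    return merged[:rows]
```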
We have about 15 million items. Each item has 10 attributes that we are
indexing at this time. We are planning on adding 15 more attributes in
future.
We have about 1 customers. Each of the items mentioned above can have
special pricing, etc for each of the customers. There are 6 attributes of
Apologies for cross posting a question from SO here.
I am very interested in the new faceting on child documents feature of Solr
5.3 and would like to know if somebody has figured out how to do it as
asked in the question on
http://stackoverflow.com/questions/32212949/solr-5-3-faceting-on-children
Mostly just do the most naive data-flattening you can and see
how big the index is. You really have to generate the index then
run representative queries at it.
But naively flattening the data in this case approaches
15B documents, which is a problem; you'd be sharding over quite a
few shards, etc.
Yes, No, Maybe.
bq: Specifically the performance we want to increase is time to facet
data, time to cluster data and search time
Well, that about covers everything ;)
You cannot talk about this without also talking about cache warming. Given your
setup, I'm guessing you have very few searches on
OK, thanks for wrapping this up!
On Mon, Aug 31, 2015 at 10:08 AM, Rallavagu wrote:
> Erick,
>
> Apologies for missing out on status on indexing (replication) issues as I
> have originally started this thread. After implementing CloudSolrServer
> instead of ConcurrentUpdateSolrServer things were
Sorry Jamie, I totally missed this email. There was no Jira that I could
find. I created SOLR-7996
On Sat, Aug 29, 2015 at 5:26 AM, Jamie Johnson wrote:
> This sounds like a good idea, I'm assuming I'd need to make my own
> UnInvertingReader (or subclass) to do this right? Is there a way to do
Anyone else running into any issues trying to get the authentication and
authorization plugins in 5.3 working?
> On Aug 29, 2015, at 2:30 AM, Kevin Lee wrote:
>
> Hi,
>
> I’m trying to use the new basic auth plugin for Solr 5.3 and it doesn’t seem
> to be working quite right. Not sure if I’m
On 8/31/2015 7:23 AM, Maulin Rathod wrote:
> We are using solrcloud 5.2 with 1 shard (in UK Data Center) and 1 replica
> (in Australia Data Center). We observed that data inserted/updated in shard
> (UK Data center) is replicated very slowly to Replica in AUSTRALIA Data
> Center (Due to high latenc
On 8/31/2015 11:26 AM, Troy Edwards wrote:
> I am having a hard time finding documentation on DataImportHandler
> scheduling in SolrCloud. Can someone please post a link to that? I have a
> requirement that the DIH should be initiated at a specific time Monday
> through Friday.
Every modern operat
Hi All,
I have a cluster where the overseer leader is gone. This is on Solr version
4.10.3.
It's completely gone from ZooKeeper, and bouncing any instance does not start
a new election process.
Has anyone experienced this issue before, and any ideas on how to fix it?
Thanks,
Rishi.
Hi Upayavira
In fact we are using optimize currently, but we were advised to use expunge
deletes as it is less resource intensive.
So expunge deletes will only remove deleted documents, it will not merge
all index segments into one?
If we don't use optimize, the deleted documents in the index will
Thanks Jan.
But I read that the field that is being collapsed on must be a single
valued String, Int or Float. As I'm required to get the distinct results
from "content" field that was indexed from a rich text document, I got the
following error:
"error":{
"msg":"java.io.IOException: 64 bit
Can't you just treat it as String?
Also, do you actually want those documents in your index in the first
place? If not, have you looked at De-duplication:
https://cwiki.apache.org/confluence/display/solr/De-Duplication
Regards,
Alex.
Solr Analyzers, Tokenizers, Filters, URPs and even a ne
Hi Alexandre,
Will treating it as String affect the search or other functions like
highlighting?
Yes, the content must be in my index, unless I do a copyField to do
de-duplication on that field. Will that help?
Regards,
Edwin
On 1 September 2015 at 10:04, Alexandre Rafalovitch
wrote:
> Can'
Re-read the question. You want to de-dupe on the full text-content.
I would actually try to use the dedupe chain as per the link I gave
but put results into a separate string field. Then, you group on that
field. You cannot actually group on the long text field; that would
kill any performance. So
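The signature idea sketched in plain Python — this mirrors what Solr's SignatureUpdateProcessorFactory computes into a separate string field, though the exact normalization shown here is an assumption:

```python
import hashlib

def content_signature(text):
    """Normalize case/whitespace and hash the content, yielding a short
    string value suitable for grouping or collapsing exact duplicates."""
    normalized = " ".join(text.lower().split())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()
```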
Thank you for your advice Alexandre.
Will try out the de-duplication from the link you gave.
Regards,
Edwin
On 1 September 2015 at 10:34, Alexandre Rafalovitch
wrote:
> Re-read the question. You want to de-dupe on the full text-content.
>
> I would actually try to use the dedupe chain as per
I tried to follow the de-duplication guide, but after I configured it in
solrconfig.xml and schema.xml, nothing is indexed into Solr, and there is
no error message. I'm using SimplePostTool to index rich-text documents.
Below are my configurations:
In solrconfig.xml
dedupe
true
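For comparison, a typical de-duplication setup looks roughly like the following (the `signature` and `content` field names are assumptions); note the chain must also be attached to the /update handler, otherwise documents bypass it:

```xml
<!-- solrconfig.xml -->
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">content</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">dedupe</str>
  </lst>
</requestHandler>
```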
The Admin UI is not protected by any of these permissions. It asks for a
password only when you try to perform a protected operation.
I'll investigate the restart problem and report my findings
On Tue, Sep 1, 2015 at 3:10 AM, Kevin Lee wrote:
> Anyone else running into any issues trying to get the
We have a Solr cloud (4.7) consisting of 5 servers.
At some point we noticed that one of the servers had a very high CPU and
was not responding. A few minutes later, the other 4 servers were
responding very slowly. A restart was required.
Looking at the Solr logs, we mainly saw symptoms, i.e. error