Hi all
We have changed all the Solr configs and commit parameters that Shawn mentioned,
but still - when inserting the same 300 documents from 20 threads we see no
latency, yet when inserting 300 different docs from 20 threads it is very slow,
and none of the CPU/RAM/disk/network metrics show anything high.
Can you paste the stack trace here?
On Tue, Apr 11, 2017 at 1:19 PM, Zheng Lin Edwin Yeo
wrote:
> I found from StackOverflow that we should declare it this way:
> http://stackoverflow.com/questions/43335419/using-basicauth-with-solrj-code
>
>
SolrRequest req = new QueryRequest(new SolrQuery("*:*"));
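For what it's worth, a minimal sketch of that pattern (the client URL, collection name
and credentials below are placeholders, so adjust to your setup):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.client.solrj.response.QueryResponse;

SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build();
QueryRequest req = new QueryRequest(new SolrQuery("*:*"));
req.setBasicAuthCredentials("id", "password");  // attaches Basic Auth to this request only
QueryResponse rsp = req.process(client);

setBasicAuthCredentials() sets the credentials per request, so update requests built the
same way can carry them as well.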
When will 1.15 be released? Maybe you have some beta version that I could
test it :)
SAX sounds interesting, and from the info I found on Google it could solve
my issues.
On Tue, Apr 11, 2017 at 10:48 PM, Allison, Timothy B.
wrote:
> It depends. We've been trying to make parsers more, erm, flexible, but there
> are some problems from which we cannot recover.
Hi,
I'm getting an error when indexing with SolrJ after setting up Basic
Authentication with the following code:
Credentials defaultcreds = new UsernamePasswordCredentials("id", "password");
appendAuthentication(defaultcreds, "BASIC", solr);
private static void appendAuthentication(Credenti
JVM version? We’re running v8 update 121 with the G1 collector and it is
working really well. We also have an 8GB heap.
Graph your heap usage. You’ll see a sawtooth shape, where it grows, then there
is a major GC. The maximum of the base of the sawtooth is the working set of
heap that your Solr instance actually needs.
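One way to get that graph, assuming you are on Oracle/OpenJDK 8 (the tools and paths
below are just examples):

# sample GC/heap utilisation of the running Solr JVM every 5 seconds
jstat -gcutil <solr-pid> 5000

# or have the JVM write a GC log you can plot afterwards (Java 8 flags)
-Xloggc:/path/to/solr/logs/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps

Watch the heap level right after each major GC over a day of normal traffic and size
the heap a comfortable margin above the base of that sawtooth.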
On 4/11/2017 2:56 PM, Chetas Joshi wrote:
> I am using Solr (5.5.0) on HDFS. SolrCloud of 80 nodes. Solr collection
> with number of shards = 80 and replication factor = 2
>
> Solr JVM heap size = 20 GB
> solr.hdfs.blockcache.enabled = true
> solr.hdfs.blockcache.direct.memory.allocation = true
> MaxDirectMemorySize = 25 GB
On 4/11/2017 2:19 PM, Scruggs, Matt wrote:
> I’m updating our schema.xml file with 1 change: deleting a field.
>
> Do I need to re-index all of my documents in Solr, or can I simply reload my
> collection config by calling:
>
> http://mysolrhost:8000/solr/admin/collections?action=RELOAD&name=myco
Hi Jordi,
Thanks for the advice.
Regards,
Edwin
On 11 April 2017 at 18:27, Jordi Domingo Borràs
wrote:
> Browsers retain basic auth information. You have to close the browser or
> clear the browsing history. You can also change the user's password on the
> server side.
>
> Best
>
> On Tue, Apr 11, 2017 at 7:18 AM
When I have done this, it has been in multiple steps:
1. Change the indexing so that no data is going to that field.
2. Reindex, so the field is empty.
3. Remove the field from the schema.
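Since the question is about a hand-edited schema.xml, treat this as an aside: if the
collection were on the managed schema, step 3 could also be done through the Schema API
instead of editing the file, roughly like this (collection and field names are placeholders):

curl -X POST -H 'Content-type:application/json' \
  --data-binary '{ "delete-field": { "name": "my_deleted_field" } }' \
  http://localhost:8983/solr/mycollection/schema

With schema.xml you edit the file, upload the config, and RELOAD as described.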
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Apr 11, 2017, at 3
Hi - We did this on one occasion and Solr started complaining in the logs about
a field that is present but not defined. We thought the problem would go away
within 30 days - the time within which every document is reindexed or deleted -
but it did not, for some reason. Forcing a merge did not solve the problem either.
I’m updating our schema.xml file with 1 change: deleting a field.
Do I need to re-index all of my documents in Solr, or can I simply reload my
collection config by calling:
http://mysolrhost:8000/solr/admin/collections?action=RELOAD&name=mycollection
Thanks,
Matt
Hi - I cannot think of any real drawback right away, but you can probably
expect a slightly differently ordered MLT response. It should not be a problem if
you select enough terms for the MLT lookup.
Regards,
Markus
-Original message-
> From:David Hastings
> Sent: Tuesday 11th April 2017
On 4/8/2017 6:42 PM, Mike Thomsen wrote:
> I'm running two nodes of SolrCloud in Docker on Windows using Docker
> Toolbox. The problem I am having is that Docker Toolbox runs inside of a
> VM and so it has an internal network inside the VM that is not accessible
> to the Docker Toolbox VM's host OS
Here is a small snippet that I copy-pasted from Shawn Heisey (who is a core
contributor, I think; he's good):
> One thing to note: SolrCloud begins to have performance issues when the
> number of collections in the cloud reaches the low hundreds. It's not
> going to scale very well with a collecti
Hello,
I am using Solr (5.5.0) on HDFS. SolrCloud of 80 nodes. Solr collection
with number of shards = 80 and replication factor = 2
Solr JVM heap size = 20 GB
solr.hdfs.blockcache.enabled = true
solr.hdfs.blockcache.direct.memory.allocation = true
MaxDirectMemorySize = 25 GB
I am querying a solr
John,
Here I mean a query which matches a doc that is expected to be matched
by the problem query.
https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-TheexplainOtherParameter
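For example (a made-up request; replace the collection name, the problem query and the
doc id with your own), turning on debug output plus explainOther shows why the expected
doc did or did not match:

http://localhost:8983/solr/mycollection/select?q=<the problem query>&debug=true&explainOther=id:EXPECTED_DOC_ID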
On Tue, Apr 11, 2017 at 11:32 PM, John Blythe wrote:
> first off, i don't
first off, i don't think i have a full handle on the import of what is
outputted by the debugger.
that said, if "...PhraseQuery(manufacturer_split_syn:\"vendor vendor\")" is
matching against `vendor_coolmed | coolmed | vendor`, then 'vendor' should
match. the query analyzer is keywordtokenizer, pa
Hi, I was wondering if there are any known drawbacks to using the CommonGrams
factory with regard to features such as "more like this"?
John,
How do you expect to match any of "parsed_filter_queries":["
MultiPhraseQuery(manufacturer_syn_both:\"(vendor_vendor_us vendor)
vendor\")", "PhraseQuery(manufacturer_split_syn:\"vendor vendor\")"
against
vendor_coolmed | coolmed | vendor ?
I just can't see any chance to match them.
One po
It depends. We've been trying to make parsers more, erm, flexible, but there
are some problems from which we cannot recover.
Tl;dr there isn't a short answer. :(
My sense is that DIH/ExtractingRequestHandler is intended to get people up and
running with Solr easily, but it is not really a great fit for production use.
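If you do end up parsing outside Solr, a rough client-side sketch (assuming Tika is on
the classpath; the file list and document handling here are placeholders) is to wrap each
parse in its own try/catch so one bad file does not stop the whole indexing run:

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

AutoDetectParser parser = new AutoDetectParser();
for (Path file : filesToIndex) {                              // filesToIndex: your own list of paths
    try (InputStream in = Files.newInputStream(file)) {
        BodyContentHandler text = new BodyContentHandler(-1); // -1 = no limit on extracted text
        Metadata metadata = new Metadata();
        parser.parse(in, text, metadata);
        // build a SolrInputDocument from text.toString() + metadata and send it to Solr here
    } catch (Exception e) {
        // log and skip only the document that failed to parse; keep indexing the rest
        System.err.println("Skipping " + file + ": " + e);
    }
}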
>
> And this overhead depends on what? I mean, if I create an empty collection
> will it take up much heap size just for "being there" ?
Yes. You can search the Elasticsearch/Solr/Lucene mailing lists and see
that it's true. But nobody has `empty` collections, so yours will have a
schema and some
hi, erick.
appreciate the feedback.
1> i'm sending the terms to solr enquoted
2> i'd thought that at one point and reran the indexing. i _had_ had two of
the fields not indexed, but this represented one pass (same analyzer) from
two diff source fields while 2 or 3 of the other 4 fields _were_ see
Thanks for your responses.
Are there any possibilities to ignore parsing errors and continue indexing?
Because right now Solr/Tika stops parsing the whole document if it finds any exception.
On Apr 11, 2017 19:51, "Allison, Timothy B." wrote:
> You might want to drop a note to the dev or user's list on Apac
The way the data is spread across the cluster is not really uniform. Most of the
shards have way less than 50GB; I would say about 15% of the total shards
have more than 50GB.
Dorian Hoxha wrote
> Each shard is a lucene index which has a lot of overhead.
And this overhead depends on what? I mean,
Skimming, I don't think this is inconsistent. First, I assume that
you're OK with the second example; it's this one that seems odd to you:
sort=score asc
group.sort=score desc
You're telling Solr to return the highest scoring doc in each group.
However, you're asking to order the _groups_ in ascending
&debug=query is your friend. There are several issues that often trip people up:
1> The analysis tab pre-supposes that what you put in the boxes gets
all the way to the field in question. Trivial example:
I put (without quotes) "erick erickson" in the "name" field on the
analysis page and see that everything matches - but that tells me nothing about
what the query parser actually did with the input.
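For instance (a made-up query against a hypothetical "mycollection"), appending
debug=query shows the actual parsed query Solr will run:

http://localhost:8983/solr/mycollection/select?q=name:%22erick%20erickson%22&debug=query

The parsedquery entry in the debug section is what really gets matched against the
index, regardless of what the analysis page suggested.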
You might want to drop a note to the dev or user's list on Apache POI.
I'm not extremely familiar with the vsd(x) portion of our code base.
The first item ("PolylineTo") may be caused by a mismatch between your doc and the
ooxml spec.
The second item appears to be an unsupported feature.
The thir
Ok :)
But if you have time, have a look at my project
https://github.com/freedev/solrcloud-zookeeper-docker
The project builds a couple of docker instances (solr - zookeeper) or a
cluster with 6 nodes.
Then you just have to put the IP addresses of your VM in your hosts file
and you can play with
I am looking for best practices for when a search component in one handler
needs to invoke another handler, say /basic. So far, I got this working
prototype:
public void process(ResponseBuilder rb) throws IOException {
SolrQueryResponse response = new SolrQueryResponse();
ModifiableSolrParams params = new ModifiableSolrParams();
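For comparison, a minimal sketch of one way to do this (the handler name, params and
response key are placeholders; exception handling omitted), using SolrCore.execute()
with a LocalSolrQueryRequest:

import java.io.IOException;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.core.SolrCore;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.request.LocalSolrQueryRequest;
import org.apache.solr.request.SolrRequestHandler;
import org.apache.solr.response.SolrQueryResponse;

public void process(ResponseBuilder rb) throws IOException {
    SolrCore core = rb.req.getCore();
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("q", "*:*");                                  // whatever the inner request needs
    LocalSolrQueryRequest innerReq = new LocalSolrQueryRequest(core, params);
    SolrQueryResponse innerRsp = new SolrQueryResponse();
    try {
        SolrRequestHandler basic = core.getRequestHandler("/basic");
        core.execute(basic, innerReq, innerRsp);             // run /basic in-process
        rb.rsp.add("basicResponse", innerRsp.getValues());   // attach its output to our response
    } finally {
        innerReq.close();                                    // local requests must be closed
    }
}

I don't know whether that is considered the blessed way, but it avoids an HTTP round
trip back into the same node.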
What I'm suggesting is that you should aim for a maximum of 50GB per shard of
data. How much is it currently?
Each shard is a Lucene index, which has a lot of overhead. If you can, try
to have 20x-50x-100x fewer shards than you currently do and you'll see a lower
heap requirement. I don't know about static/d
Dorian Hoxha wrote
> Isn't 18K lucene-indexes (1 for each shard, not counting the replicas) a
> little too much for 3TB of data ?
> Something like 0.167GB for each shard ?
> Isn't that too much overhead (i've mostly worked with es but still lucene
> underneath) ?
I don't have only 3TB, I have 3TB
Thanks. I think I'll take a look at that. I decided to just build a big
vagrant-managed desktop VM to let me run Ubuntu on my company machine, so I
expect that this pain point may be largely gone soon.
On Mon, Apr 10, 2017 at 12:31 PM, Vincenzo D'Amore
wrote:
> Hi Mike
>
> disclaimer I'm the aut
hi everyone.
i recently wrote in ('analysis matching, query not') but never heard back
so wanted to follow up. i'm at my wit's end currently. i have several
fields that are showing matches in the analysis tab. when i dumb down the
string sent over to query it still gives me issues in some field ca
I modified and cleaned up the previous query. As you can see, the first query's
sorting is a bit odd.
Using parameters
sort=score asc
group.sort=score desc
http://localhost:8983/solr/mcontent.ph_post/select?=&fl=*,score&group.field=partnerId&group.limit=1&group.main=false&group.ngroups=true&gro
#field() is defined in _macros.vm as this monstrosity:
# TODO: make this parameterized fully, no context sensitivity
#macro(field $f)
#if($response.response.highlighting.get($docId).get($f).get(0))
#set($pad = "")
#foreach($v in $response.response.highlighting.get($docId).get($f))
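So when highlighting is enabled, #field() prints the highlight fragments for the field
rather than its stored value, and highlight fragments default to roughly 100 characters
(hl.fragsize=100), which would explain the ~90-110 character cut-off. If that is indeed
the cause (an assumption on my part), telling the highlighter not to fragment that field
should help, e.g.:

&hl.fl=LONG_TEXT&hl.fragsize=0

where hl.fragsize=0 means the whole field value is used as a single fragment.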
the group.sort spec is specified twice in the URL
group.sort=score desc&
group.sort=score desc
Is there a chance that during testing you only changed _one_ of them so you had
group.sort=score desc&
group.sort=score asc
? I think the last one should win.. Shot in the dark.
Best,
Erick
On Tue,
Can't the filter be used in cases where you're paginating in a
sharded scenario?
So if you do limit=10, offset=10, each shard will return 20 docs ?
While if you do limit=10, _score<=last_page.min_score, then each shard will
return 10 docs ? (they will still score all docs, but merging will be
faster)
Hey guys,
I have a problem:
In Velocity:
*Beschreibung:*#field('LONG_TEXT')
In Solr, the field "LONG_TEXT" doesn't show everything, only the first ~90-110
characters.
But if I put "$doc.getFieldValue('LONG_TEXT')" in the Velocity file, then it
shows me everything that is inside the field "LONG_TEXT".
Hi,
history:
1. we're using a single-core Solr 6.4 instance on Windows Server (Windows
Server 2012 R2 Standard),
2. Java v8, (build 1.8.0_121-b13).
3. as a workaround for earlier issues with visio files, we have in
solr-6.4.0\contrib\extraction\lib:
3.1. ooxml-schemas-1.3.jar instead of poi-ooxml-
Can I ask what the final requirement is here?
What are you trying to do?
- Just display fewer results?
You can easily do that at search-client time, cutting off after a certain amount.
- Make search faster by returning fewer results?
This is not going to work, as you need to score all of them, as Erick
expla
On Mon, 2017-04-10 at 13:27 +0530, Himanshu Sachdeva wrote:
> Thanks for your time and quick response. As you said, I changed our
> logging level from SEVERE to INFO and indeed found the performance
> warning *Overlapping onDeckSearchers=2* in the logs.
If you only see it occasionally, it is proba
Browsers retain basic auth information. You have to close the browser or
clear the browsing history. You can also change the user's password on the
server side.
Best
On Tue, Apr 11, 2017 at 7:18 AM, Zheng Lin Edwin Yeo
wrote:
> Does anyone have any idea whether the authentication will expire automatically? Mine
> has alrea
To be fair, the second result seems consistent with the Solr grouping logic:
*First Query results (Suspicious)*
1) group.sort=score desc -> selects the group head; as you have 1 doc per
group, the head will be the top-scoring doc per group
2) sort=score asc -> sorts the groups by the score of the h
Also you should change the heap 32GB->30GB so you're guaranteed to get
pointer compression. I think you should have no need to increase it more
than this, since most things have moved to out-of-heap stuff, like
docValues etc.
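If you want to double-check that compressed oops really are in effect at the heap size
you pick (a generic JVM check, nothing Solr-specific):

java -Xmx30g -XX:+PrintFlagsFinal -version | grep UseCompressedOops

At 32GB and above the flag flips to false, which is why staying just under that
threshold matters.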
On Tue, Apr 11, 2017 at 12:07 PM, Dorian Hoxha
wrote:
> Isn't 18K luce
Isn't 18K lucene-indexes (1 for each shard, not counting the replicas) a
little too much for 3TB of data ?
Something like 0.167GB for each shard ?
Isn't that too much overhead (i've mostly worked with es but still lucene
underneath) ?
Can't you use 1/100 the current number of collections ?
On Mon