Hey Team
Is there a way to extract part of a string field, group by it, and
obtain a histogram?
For example, the field value is a DateTime of the form 20180911T00, and
I want to do a substring like substring(field1,0,7) and then do a streaming
expression of the form:
rollup(
selec
Alex I use solr 7.
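For comparison, here is a client-side sketch of the same grouping. It is not a streaming expression and makes no claim about which evaluators Solr 7 provides. It buckets values of a DateTime-style string field by a fixed-length prefix and counts each bucket. The helper name `prefix_histogram` and the sample values are hypothetical. With 0-based, end-exclusive substring semantics, `substring(field1,0,7)` keeps 7 characters (`2018091`), so keeping the full `yyyyMMdd` date needs a length of 8.

```python
from collections import Counter
from typing import Dict, Iterable

def prefix_histogram(values: Iterable[str], length: int) -> Dict[str, int]:
    """Count values grouped by their first `length` characters.

    Equivalent to grouping on substring(field, 0, length) with
    0-based, end-exclusive indices.
    """
    counts = Counter(v[:length] for v in values)
    # Sort buckets by key so the histogram reads in chronological order.
    return dict(sorted(counts.items()))

# Hypothetical field values in the 20180911T00 format from the question.
docs = ["20180911T00", "20180911T05", "20180912T00", "20181001T23"]
print(prefix_histogram(docs, 8))  # per-day buckets
print(prefix_histogram(docs, 6))  # per-month buckets
```

Changing the prefix length changes the bucket size: 8 characters give a per-day histogram and 6 give a per-month one.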
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
I agree.
1. Shut down each Solr server process using the “bin/solr” script.
2. Shut down the Zookeeper ensemble.
3. Take backups.
4. Shut down the OS.
Do that in reverse to get going.
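The ordering above can be written down as a plan, where starting up means walking the same steps backwards. The sketch below is only an illustration. The command strings are assumptions to adapt to your installation (script paths, the `<zk-hosts>` ZooKeeper connect string, how you take backups). The backup step has no counterpart on the way up.

```python
from typing import List, NamedTuple, Optional

class Step(NamedTuple):
    down: str           # action on the way down
    up: Optional[str]   # matching action on the way back up; None if nothing to undo

# The four steps above. Command strings are illustrative assumptions:
# run the Solr step on every Solr host and the ZooKeeper step on every ZK host.
STEPS: List[Step] = [
    Step("bin/solr stop -all", "bin/solr start -c -z <zk-hosts>"),
    Step("bin/zkServer.sh stop", "bin/zkServer.sh start"),
    Step("take backups (method depends on your setup)", None),
    Step("sudo shutdown -h now", "power on the machines"),
]

def shutdown_plan(steps: List[Step]) -> List[str]:
    # Solr first, then ZooKeeper, then backups, then the OS.
    return [s.down for s in steps]

def startup_plan(steps: List[Step]) -> List[str]:
    # "Do that in reverse": walk the steps backwards, skipping ones with no undo.
    return [s.up for s in reversed(steps) if s.up is not None]

for i, cmd in enumerate(shutdown_plan(STEPS), 1):
    print(f"down {i}: {cmd}")
for i, cmd in enumerate(startup_plan(STEPS), 1):
    print(f"up {i}: {cmd}")
```

The plan brings ZooKeeper up before any Solr node, so the nodes have an ensemble to register with when they start.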
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Oct 30, 2018, at
bin/solr stop
As long as you don't kill it with extreme prejudice (i.e. kill -9 or
pull the plug) it should be fine. Assuming you're running ZooKeeper
in an external ensemble, I'd certainly stop those after all the Solr
instances were stopped.
Powering the nodes up is irrelevant to Solr, the bin/
Thanks Doug.
It is funny that you should mention that. It is very hard trying to
convince people that just because words are somehow related, we really
don't know how they are related. This is especially true when they are
handed the results of a shallow neural net that took a research team a few
You may already know this, but just be very careful. Embeddings are useful,
but people often think of them as detecting synonyms when they really just
encode contexts. For example, antonyms and words with similar functions
are often seen as similar.
There are also issues with terms that occur in sparsely
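To make the antonym point concrete, here is cosine similarity, a common way to compare embedding vectors, applied to three 3-dimensional vectors. The vectors are invented for illustration and are not output from word2vec, doc2vec, or GloVe. "hot" and "cold" are given similar vectors because they appear in similar contexts, and that is enough for them to score as near neighbours.

```python
import math
from typing import Dict, List

def cosine(a: List[float], b: List[float]) -> float:
    # Dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Made-up vectors: "hot" and "cold" share contexts (weather, temperature),
# so a context-trained model can place them close together even though
# they are antonyms. "banana" lives in different contexts.
vectors: Dict[str, List[float]] = {
    "hot": [0.9, 0.8, 0.1],
    "cold": [0.85, 0.75, 0.2],
    "banana": [0.1, 0.2, 0.9],
}
print(round(cosine(vectors["hot"], vectors["cold"]), 3))
print(round(cosine(vectors["hot"], vectors["banana"]), 3))
```

A high similarity score only says two words share contexts. It does not say whether they mean the same thing or the opposite.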
Hi All,
We have a SolrCloud cluster running 3 shards on 3 hosts, with 6 NRT replicas in total and
the data directory on HDFS. It has 950 million documents in the index,
occupying 700GB of disk space.
We need to completely power off the system to move it.
Are there any actions we should take on shutdown to help t
Hello Webster,
It smells like KeywordRepeat. In general it is not a problem if all terms are
scored twice. But you also have RemoveDuplicates, which means that in some
cases a term is scored twice in one field but only once in the other field, and then
you have a problem.
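A toy simulation of that chain shows why only some terms are doubled. It mimics the behaviour of KeywordRepeatFilter (emit each token twice at the same position, the first copy marked as a keyword), a stemmer that skips keyword-marked tokens, and RemoveDuplicatesTokenFilter (drop a token with the same text at the same position). The suffix-stripping `toy_stem` is a crude stand-in for a real stemmer, assumed only for illustration.

```python
from typing import List, Tuple

Token = Tuple[str, int, bool]  # (text, position, is_keyword)

def keyword_repeat(words: List[str]) -> List[Token]:
    # Emit each token twice at the same position; the first copy is
    # flagged as a keyword so the stemmer leaves it alone.
    out: List[Token] = []
    for pos, w in enumerate(words):
        out.append((w, pos, True))
        out.append((w, pos, False))
    return out

def toy_stem(word: str) -> str:
    # Crude stand-in for a real stemmer (assumption, for illustration only).
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def stem_filter(tokens: List[Token]) -> List[Token]:
    # Stem only the tokens that are not flagged as keywords.
    return [(t if kw else toy_stem(t), pos, kw) for t, pos, kw in tokens]

def remove_duplicates(tokens: List[Token]) -> List[Token]:
    # Drop a token whose text already occurred at the same position.
    seen = set()
    out: List[Token] = []
    for t, pos, kw in tokens:
        if (t, pos) not in seen:
            seen.add((t, pos))
            out.append((t, pos, kw))
    return out

def analyze(text: str) -> List[str]:
    tokens = remove_duplicates(stem_filter(keyword_repeat(text.lower().split())))
    return [t for t, _, _ in tokens]

print(analyze("jumping cats jump"))
```

"jumping" survives as two terms at one position (the original and its stem), so a query term analyzed the same way can match, and be scored, twice. "jump" collapses to a single term. If two fields use different chains, the same word can therefore score twice in one field and once in the other.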
Due to lack of replies
Oh very cool. I will have to look into this more. This is something up and
coming I take it?
Thanks,
~Ben
On Tue, Oct 30, 2018 at 4:36 PM Alexandre Rafalovitch
wrote:
> Simon Hughes presentation on just finished Activate may be relevant:
>
> https://www.slideshare.net/SimonHughes13/vectors-in-s
I noticed that sometimes query matches seem to get counted twice when they are
scored. This will happen if the fieldtype is being stemmed, and there is a
matching synonym.
It seems that the score for the field is 2X higher than it should be. We see
this only when there is a matching synonym that
I will second the SolrJ method. You don’t want to be doing this on your Solr
instance. One question is whether your PDFs are scanned or are already
searchable. I use tesseract offline to convert all scanned PDFs into searchable
PDFs, so I don’t want Tika to be doing that. The core of my code is:
Hello Martin,
We also use a URP for this in some cases. We index documents into one
collection, and the URP reads a field from each document that holds an ID in another
collection. We fetch that remote Solr document on the fly and use its
fields to enrich the incoming document.
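The enrichment step can be sketched client-side. This is not a URP: a real update request processor runs as Java inside Solr. Here a plain dict stands in for fetching the document from the other collection. The field names (`customer_ref`, `customer_name`, `region`) and the helper `enrich` are hypothetical. Fields already on the incoming document are not overwritten, and its own unique key is kept.

```python
from typing import Any, Callable, Dict, Optional

Doc = Dict[str, Any]

def enrich(doc: Doc, ref_field: str, lookup: Callable[[str], Optional[Doc]]) -> Doc:
    """Copy fields from the referenced document into a copy of `doc`."""
    merged = dict(doc)
    ref_id = doc.get(ref_field)
    if ref_id is None:
        return merged  # nothing to look up
    remote = lookup(ref_id) or {}
    for key, value in remote.items():
        if key == "id":
            continue  # keep the incoming document's own unique key
        merged.setdefault(key, value)  # never overwrite incoming values
    return merged

# Hypothetical stand-in for the other collection, keyed by its unique id.
other_collection: Dict[str, Doc] = {
    "cust-7": {"id": "cust-7", "customer_name": "Acme", "region": "EU"},
}
incoming = {"id": "order-1", "customer_ref": "cust-7", "amount": 12.5}
print(enrich(incoming, "customer_ref", other_collection.get))
```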
It is very stra
Hi Alex,
Thanks for your help. I will take a look at the update-request-processor.
I wonder if there is a way to link documents together, so that they always show
up together should one of the documents match a search query?
-Original Message-
From: Alexandre Rafalovitch
Sent: 30. okto
Simon Hughes' presentation at the just-finished Activate conference may be relevant:
https://www.slideshare.net/SimonHughes13/vectors-in-search-towards-more-semantic-matching
The video will be available in a couple of weeks, I'm guessing from the
LucidWorks channel.
Related repos:
*) https://github.com/DiceTechJobs/
Hello all,
We came up with a fascinating question. We have word2vec, doc2vec, and GloVe
results for our corpora. Is it possible to use these datasets
within the search engine? If so, could you please point me to documentation
on how to get Solr to use them?
Thank you so much,
~Ben
On 10/29/2018 7:24 AM, Sofiya Strochyk wrote:
Actually the smallest server doesn't look bad in terms of performance;
it has been consistently better than the other ones (without
replication), which seems a bit strange (it should be about the same or
slightly worse, right?). I guess the memory be
I have done a production implementation of this, and it has been running for the last four
months without any issues. Just a restart of all components every week.
http://blog.cloudera.com/blog/2015/10/how-to-index-scanned-pdfs-at-scale-using-fewer-than-50-lines-of-code/
Best, Ravion
On Tue, Oct 30, 2018, 1:00 PM Er
All of the above work, but for robust production situations you'll
want to consider a SolrJ client, see:
https://lucidworks.com/2012/02/14/indexing-with-solrj/. That blog
combines indexing from a DB and using Tika, but those are independent.
Best,
Erick
On Tue, Oct 30, 2018 at 12:21 AM Kamuela Lau
Chris:
Please follow the instructions here:
http://lucene.apache.org/solr/community.html#mailing-lists-irc. You
must use the _exact_ same e-mail as you used to subscribe.
If the initial try doesn't work and following the suggestions at the
"problems" link doesn't work for you, let us know. But no
Sure, here is the IO for the bigger machine:
https://upload.cc/i1/2018/10/30/tQovyM.png
for the smaller machine:
https://upload.cc/i1/2018/10/30/cP8DxU.png
CPU utilization including iowait:
https://upload.cc/i1/2018/10/30/eSs1YT.png
iowait only:
https://upload.cc/i1/2018/10/30/CHgx41.png
On 30.10.18
Please see inline...
Deepak
"The greatness of a nation can be judged by the way its animals are
treated. Please consider stopping the cruelty by becoming a Vegan"
+91 73500 12833
deic...@gmail.com
Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool
"Plant a Tree, G
On 10/29/2018 8:56 PM, Erick Erickson wrote:
The interval between when a commit happens and all the autowarm
queries are finished is 52 seconds for the filterCache. I've rarely seen warming
take that long unless something's very unusual. I'd actually be very
surprised if you're really only firing 64 autowarm
UNSUBSCRIBE
On Tue, 30 Oct 2018 at 8:24 pm, Stefan Kuhn wrote:
> Hi,
>
> last week I found an error in the result sorting regarding a field of the
> type "solr.CurrencyFieldType" in solr version 7.3.1.
>
> There are multiple documents which I must sort with this field, but the
> order of the res
Hi,
last week I found an error in the result sorting on a field of type
"solr.CurrencyFieldType" in Solr version 7.3.1.
There are multiple documents that I must sort by this field, but the order
of the results apparently does not follow the sorting parameters
(price_
Maybe
https://lucene.apache.org/solr/guide/7_5/update-request-processors.html#atomicupdateprocessorfactory
Regards,
Alex
On Tue, Oct 30, 2018, 7:57 AM Martin Frank Hansen (MHQ), wrote:
> Hi,
>
> I am trying to merge files from different sources and with different
> content (except for one k
Hi,
I am trying to merge files from different sources with different content
(except for one key field). How can this be done in Solr?
An example could be:
Document 1
001 Unique id for
Document 1
test-123
…
Do
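One way to merge partial records that share only a key field is Solr's atomic updates. Each source sends only its own fields for the document with that unique key, using the JSON `{"field": {"set": value}}` syntax posted to the collection's `/update` handler. As I understand the Reference Guide, this needs the update log enabled and fields that are stored or have docValues, so that Solr can rebuild the rest of the document. The sketch below only builds the payload. The field names `title` and `author` and the helper `to_atomic_update` are hypothetical.

```python
import json
from typing import Any, Dict, List

def to_atomic_update(doc: Dict[str, Any], key_field: str = "id") -> Dict[str, Any]:
    """Turn a partial source document into an atomic update that 'set's each field."""
    update: Dict[str, Any] = {key_field: doc[key_field]}
    for name, value in doc.items():
        if name != key_field:
            update[name] = {"set": value}
    return update

# Two hypothetical sources that share only the key field.
source_a = {"id": "001", "title": "test-123"}
source_b = {"id": "001", "author": "someone"}
payload: List[Dict[str, Any]] = [to_atomic_update(d) for d in (source_a, source_b)]
print(json.dumps(payload))
```

Sending both updates leaves a single document with id `001` carrying both `title` and `author`, instead of the second source replacing the first.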
My swappiness is set to 10, swap is barely used (the used space is on the
scale of a few MB), and there is no swap IO.
There is disk IO like this, though:
https://upload.cc/i1/2018/10/30/43lGfj.png
https://upload.cc/i1/2018/10/30/T3u9oY.png
However, CPU iowait is still zero, so I'm not sure if the disk
Hi,
We had the same thing happen with PULL replicas on Solr 7.5. Solr was
showing that they all had the correct index version, but the changes were
not showing up. Unfortunately the solr.log size was too small to catch any
issues, so I've now increased it and am waiting for it to happen again.
Regards,
Ere
Yes. Swapping from disk to memory & vice versa
Deepak
Hi there,
Here are a couple of ways I'm aware of:
1. Extract-handler / post tool
You can use the curl command with the extract handler or bin/post to upload
a single document.
Reference:
https://lucene.apache.org/solr/guide/7_5/uploading-data-with-solr-cell-using-apache-tika.html
2. DataImportHa
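For the extract handler route, the request URL carries the document's id as a `literal.id` parameter and can ask for a commit. The sketch below only builds that URL. The host, port, collection name `mycollection`, and id `doc1` are assumptions. The helper `extract_url` is hypothetical.

```python
from urllib.parse import urlencode

def extract_url(base: str, collection: str, doc_id: str, commit: bool = True) -> str:
    """Build a Solr Cell (/update/extract) URL that sets the document id via literal.id."""
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return f"{base.rstrip('/')}/{collection}/update/extract?{urlencode(params)}"

url = extract_url("http://localhost:8983/solr", "mycollection", "doc1")
print(url)
```

The file itself then goes in the body of a POST to that URL, for example with `urllib.request.Request(url, data=pdf_bytes, headers={"Content-Type": "application/pdf"})`, which is what the curl command or bin/post does for you.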