date:20181030

streaming expressions substring-evaluator

2018-10-30 Thread Aroop Ganguly

Hey Team, is there a way to extract part of a string field, group by it, and obtain a histogram? For example, the field value is a DateTime of the form 20180911T00, and I want to take a substring like substring(field1,0,7) and then run a streaming expression of the form: rollup( selec
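The grouping asked about above can be sketched client-side in Python. This is only an illustration of the substring-then-count idea, not a Solr streaming expression: the function name `prefix_histogram` and the sample values are hypothetical, and the slice bounds mirror `substring(field1,0,7)` under the assumption that the end index is exclusive.

```python
from collections import Counter

def prefix_histogram(values, start=0, end=7):
    """Count how many values share the same substring [start:end)."""
    return Counter(value[start:end] for value in values)

# Hypothetical field values in the DateTime form mentioned in the question.
sample = ["20180911T00", "20180911T05", "20180921T00"]
histogram = prefix_histogram(sample)
```

Changing `end` changes the bucket width, for example `end=8` groups by the full date part of the value.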

Re: partial update in solr

2018-10-30 Thread Zahra Aminolroaya

Alex, I use Solr 7. -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Solr cloud - poweroff procedure

2018-10-30 Thread Walter Underwood

I agree. 1. Shut down each Solr server process using the “bin/solr” script. 2. Shut down the Zookeeper ensemble. 3. Take backups. 4. Shut down the OS. Do that in reverse to get going. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 30, 2018, at

Re: Solr cloud - poweroff procedure

2018-10-30 Thread Erick Erickson

bin/solr stop As long as you don't kill it with extreme prejudice (i.e. kill -9 or pull the plug) it should be fine. Assuming you're running ZooKeeper in an external ensemble, I'd certainly stop those after all the Solr instances were stopped. Powering the nodes up is irrelevant to Solr, the bin/

Re: Integrating word2vec and glove results into Solr

2018-10-30 Thread Benedict Holland

Thanks Doug. It is funny that you should mention that. It is very hard trying to convince people that just because words are somehow related, we really don't know how they are related. This is especially true when they are handed the results of a shallow neural net that took a research team a few

Re: Integrating word2vec and glove results into Solr

2018-10-30 Thread Doug Turnbull

You may already know this, but just be very careful. Embeddings are useful, and people often think of them as detecting synonyms, but they really just encode contexts. For example, antonyms and words with similar functions are often seen as similar. There are also issues with terms that occur in sparsely
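A small sketch may make the caution above concrete. Cosine similarity between embedding vectors only says that two words appear in similar contexts, not how they are related. The three-dimensional vectors below are made up for illustration and do not come from any trained word2vec or GloVe model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up toy vectors: "hot" and "cold" share contexts, "keyboard" does not.
toy_vectors = {
    "hot": [0.9, 0.8, 0.1],
    "cold": [0.8, 0.9, 0.1],
    "keyboard": [0.1, 0.0, 0.9],
}
hot_cold = cosine_similarity(toy_vectors["hot"], toy_vectors["cold"])
hot_keyboard = cosine_similarity(toy_vectors["hot"], toy_vectors["keyboard"])
```

In this toy setup the antonym pair scores far higher than the unrelated pair, so treating a high score as "synonym" would be a mistake.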

Solr cloud - poweroff procedure

2018-10-30 Thread lstusr 5u93n4

Hi All, We have a Solr cloud running 3 shards, 3 hosts, 6 total NRT replicas, and the data directory on HDFS. It has 950 million documents in the index, occupying 700GB of disk space. We need to completely power off the system to move it. Are there any actions we should take on shutdown to help t

RE: Odd Scoring behavior

2018-10-30 Thread Markus Jelsma

Hello Webster, It smells like KeywordRepeat. In general it is not a problem if all terms are scored twice. But you also have RemoveDuplicates, and this means that in some cases a term is scored twice in one field but only once in the other field, and then you have a problem. Due to lack of replies
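The interaction suspected above can be modelled with a toy token pipeline. This is a simplified Python simulation, not Lucene's filter code: `toy_stem` is a hypothetical stemmer that only strips a trailing "s", and the two chains (with and without duplicate removal) stand in for two differently configured fields.

```python
def keyword_repeat(tokens):
    """Emit each token twice at the same position: a keyword copy, then a stemmable copy."""
    out = []
    for position, text in enumerate(tokens):
        out.append((position, text, True))   # keyword copy, protected from stemming
        out.append((position, text, False))  # copy the stemmer may change
    return out

def toy_stem(tokens):
    """Hypothetical stemmer: strips a trailing 's' from non-keyword tokens."""
    return [
        (pos, text[:-1] if not is_keyword and text.endswith("s") else text, is_keyword)
        for pos, text, is_keyword in tokens
    ]

def remove_duplicates(tokens):
    """Drop tokens whose text already occurred at the same position."""
    seen = set()
    out = []
    for pos, text, _ in tokens:
        if (pos, text) not in seen:
            seen.add((pos, text))
            out.append((pos, text))
    return out

def analyze(tokens, dedupe):
    """Run the toy chain, optionally ending with duplicate removal."""
    stream = toy_stem(keyword_repeat(tokens))
    return remove_duplicates(stream) if dedupe else [(p, t) for p, t, _ in stream]

with_dedupe = analyze(["acid"], dedupe=True)
without_dedupe = analyze(["acid"], dedupe=False)
```

When the stem equals the original term, the chain with duplicate removal yields one token and the chain without it yields two, which is the kind of asymmetry between fields that could double a score.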

Re: Integrating word2vec and glove results into Solr

2018-10-30 Thread Benedict Holland

Oh very cool. I will have to look into this more. This is something up and coming I take it? Thanks, ~Ben On Tue, Oct 30, 2018 at 4:36 PM Alexandre Rafalovitch wrote: > Simon Hughes presentation on just finished Activate may be relevant: > > https://www.slideshare.net/SimonHughes13/vectors-in-s

Odd Scoring behavior

2018-10-30 Thread Webster Homer

I noticed that sometimes query matches seem to get counted twice when they are scored. This will happen if the fieldtype is being stemmed, and there is a matching synonym. It seems that the score for the field is 2X higher than it should be. We see this only when there is a matching synonym that

RE: Indexing PDF file in Apache SOLR via Apache TIKA

2018-10-30 Thread Phil Scadden

I will second the SolrJ method. You don’t want to be doing this on your SOLR instance. One question is whether your PDFs are scanned or are already searchable. I use tesseract offline to convert all scanned PDFs into searchable PDF so I don’t want Tika to be doing that. My code core is:

RE: Merging data from different sources

2018-10-30 Thread Markus Jelsma

Hello Martin, We also use a URP for this in some cases. We index documents to some collection, the URP reads a field from that document which is an ID in another collection. So we fetch that remote Solr document on-the-fly, and use those fields to enrich the incoming document. It is very stra

RE: Merging data from different sources

2018-10-30 Thread Martin Frank Hansen (MHQ)

Hi Alex, Thanks for your help. I will take a look at the update-request-processor. I wonder if there is a way to link documents together, so that they always show up together should one of the documents match a search query? -Original Message- From: Alexandre Rafalovitch Sent: 30. okto

Re: Integrating word2vec and glove results into Solr

2018-10-30 Thread Alexandre Rafalovitch

Simon Hughes's presentation at the just-finished Activate may be relevant: https://www.slideshare.net/SimonHughes13/vectors-in-search-towards-more-semantic-matching The video will be available in a couple of weeks, I am guessing from LucidWorks channel. Related repos: *) https://github.com/DiceTechJobs/

Integrating word2vec and glove results into Solr

2018-10-30 Thread Benedict Holland

Hello all, We came up with a fascinating question. We actually have for our corpora, word2vec, doc2vec, and GloVe results. Is it possible to use these datasets within the search engine? If so, could you please point me to documentation on how to get Solr to use them? Thank you so much, ~Ben

Re: SolrCloud scaling/optimization for high request rate

2018-10-30 Thread Shawn Heisey

On 10/29/2018 7:24 AM, Sofiya Strochyk wrote: Actually the smallest server doesn't look bad in terms of performance, it has been consistently better than the other ones (without replication) which seems a bit strange (it should be about the same or slightly worse, right?). I guess the memory be

Re: Indexing PDF file in Apache SOLR via Apache TIKA

2018-10-30 Thread ☼ R Nair

I have done a production implementation of this, running for the last four months without any issue. Just a restart every week of all components. http://blog.cloudera.com/blog/2015/10/how-to-index-scanned-pdfs-at-scale-using-fewer-than-50-lines-of-code/ Best, Ravion On Tue, Oct 30, 2018, 1:00 PM Er

Re: Indexing PDF file in Apache SOLR via Apache TIKA

2018-10-30 Thread Erick Erickson

All of the above work, but for robust production situations you'll want to consider a SolrJ client, see: https://lucidworks.com/2012/02/14/indexing-with-solrj/. That blog combines indexing from a DB and using Tika, but those are independent. Best, Erick On Tue, Oct 30, 2018 at 12:21 AM Kamuela Lau

Re: Sorting of solr.CurrencyFieldType in 7.3.1

2018-10-30 Thread Erick Erickson

Chris: Please follow the instructions here: http://lucene.apache.org/solr/community.html#mailing-lists-irc. You must use the _exact_ same e-mail as you used to subscribe. If the initial try doesn't work and following the suggestions at the "problems" link doesn't work for you, let us know. But no

Re: SolrCloud scaling/optimization for high request rate

2018-10-30 Thread Sofiya Strochyk

Sure, here is IO for bigger machine: https://upload.cc/i1/2018/10/30/tQovyM.png for smaller machine: https://upload.cc/i1/2018/10/30/cP8DxU.png CPU utilization including iowait: https://upload.cc/i1/2018/10/30/eSs1YT.png iowait only: https://upload.cc/i1/2018/10/30/CHgx41.png On 30.10.18

Re: SolrCloud scaling/optimization for high request rate

2018-10-30 Thread Deepak Goel

Please see inline... Deepak "The greatness of a nation can be judged by the way its animals are treated. Please consider stopping the cruelty by becoming a Vegan" +91 73500 12833 deic...@gmail.com Facebook: https://www.facebook.com/deicool LinkedIn: www.linkedin.com/in/deicool "Plant a Tree, G

Re: SolrCloud scaling/optimization for high request rate

2018-10-30 Thread Shawn Heisey

On 10/29/2018 8:56 PM, Erick Erickson wrote: The interval between when a commit happens and all the autowarm queries are finished is 52 seconds for the filterCache. I've never seen warming take that long unless something's very unusual. I'd actually be very surprised if you're really only firing 64 autowarm

Re: Sorting of solr.CurrencyFieldType in 7.3.1

2018-10-30 Thread Chris Gerke

UNSUBSCRIBE On Tue, 30 Oct 2018 at 8:24 pm, Stefan Kuhn wrote: > Hi, > > last week I found an error in the result sorting regarding a field of the > type "solr.CurrencyFieldType" in solr version 7.3.1. > > There are multiple documents which I must sort with this field, but the > order of the res

Sorting of solr.CurrencyFieldType in 7.3.1

2018-10-30 Thread Stefan Kuhn

Hi, last week I found an error in the result sorting regarding a field of the type "solr.CurrencyFieldType" in Solr version 7.3.1. There are multiple documents which I must sort with this field, but the results are apparently not sorted correctly according to the sorting parameters (price_

Re: Merging data from different sources

2018-10-30 Thread Alexandre Rafalovitch

Maybe https://lucene.apache.org/solr/guide/7_5/update-request-processors.html#atomicupdateprocessorfactory Regards, Alex On Tue, Oct 30, 2018, 7:57 AM Martin Frank Hansen (MHQ), wrote: > Hi, > > I am trying to merge files from different sources and with different > content (except for one k
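For context on the atomic-update approach suggested above, here is a minimal Python sketch that builds a Solr atomic-update JSON document using the `set` modifier. It only constructs the payload and sends nothing. The helper name `atomic_set`, the `id` unique key, and the `title` field are assumptions for illustration; the real field names depend on the schema.

```python
import json

def atomic_set(doc_id, fields):
    """Build an atomic-update document that sets the given fields on an existing id."""
    doc = {"id": doc_id}
    for name, value in fields.items():
        doc[name] = {"set": value}
    return doc

# Hypothetical enrichment of document 001 with a field from a second source.
payload = json.dumps([atomic_set("001", {"title": "test-123"})])
```

Posting such a payload to a collection's update handler would change only the listed fields, provided the schema meets Solr's requirements for atomic updates.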

Merging data from different sources

2018-10-30 Thread Martin Frank Hansen (MHQ)

Hi, I am trying to merge files from different sources and with different content (except for one key field). How can this be done in Solr? An example could be: Document 1 001 Unique id for Document 1 test-123 … Do
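One way to picture the merge described above is to combine the records by their shared key before indexing. The Python sketch below is a client-side illustration under that assumption, not a Solr feature; `merge_by_key`, the `id` key, and the sample fields are all hypothetical.

```python
def merge_by_key(sources, key="id"):
    """Merge documents from several sources into one document per key value."""
    merged = {}
    for source in sources:
        for doc in source:
            # Later sources add to (or overwrite) fields from earlier ones.
            merged.setdefault(doc[key], {}).update(doc)
    return list(merged.values())

# Hypothetical records that share only the key field.
source_a = [{"id": "001", "title": "test-123"}]
source_b = [{"id": "001", "category": "example"}]
combined = merge_by_key([source_a, source_b])
```

Each merged dictionary can then be indexed as a single document, so a match on any of its fields returns the combined record.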

Re: SolrCloud scaling/optimization for high request rate

2018-10-30 Thread Sofiya Strochyk

My swappiness is set to 10, swap is almost not used (used space is on scale of a few MB) and there is no swap IO. There is disk IO like this, though: https://upload.cc/i1/2018/10/30/43lGfj.png https://upload.cc/i1/2018/10/30/T3u9oY.png However CPU iowait is still zero, so not sure if the disk

Re: TLOG replica stucks

2018-10-30 Thread Ere Maijala

Hi, We had the same happen with PULL replicas with Solr 7.5. Solr was showing that they all had the correct index version, but the changes were not showing. Unfortunately the solr.log size was too small to catch any issues, so I've now increased it and am waiting for it to happen again. Regards, Ere

Re: SolrCloud scaling/optimization for high request rate

2018-10-30 Thread Deepak Goel

Yes. Swapping from disk to memory & vice versa Deepak "The greatness of a nation can be judged by the way its animals are treated. Please consider stopping the cruelty by becoming a Vegan" +91 73500 12833 deic...@gmail.com Facebook: https://www.facebook.com/deicool LinkedIn: www.linkedin.com/in

Re: Indexing PDF file in Apache SOLR via Apache TIKA

2018-10-30 Thread Kamuela Lau

Hi there, Here are a couple of ways I'm aware of: 1. Extract-handler / post tool You can use the curl command with the extract handler or bin/post to upload a single document. Reference: https://lucene.apache.org/solr/guide/7_5/uploading-data-with-solr-cell-using-apache-tika.html 2. DataImportHa
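As a rough illustration of the first option above, the Python sketch below builds the request URL for the extract handler with a `literal.id` and `commit` parameter. It only builds the string and performs no upload. The host, the collection name `mycollection`, and the document id are placeholders, not values from the thread.

```python
from urllib.parse import urlencode

def extract_url(base_url, collection, doc_id, commit=True):
    """Build an /update/extract URL that assigns an id to the uploaded file."""
    params = {"literal.id": doc_id, "commit": "true" if commit else "false"}
    return f"{base_url}/solr/{collection}/update/extract?{urlencode(params)}"

# Placeholder host and collection; adjust to the actual deployment.
url = extract_url("http://localhost:8983", "mycollection", "doc1")
```

The PDF itself would be sent as the body of a POST to this URL, for example with curl as the message describes.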


30 matches
