Re: Extract a list of the most recent field values?

2021-02-05 Thread Emir Arnautović
Hi Jimi, It seems to me that you could get the results using the collapsing query parser: https://lucene.apache.org/solr/guide/6_6/collapse-and-expand-results.html HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly

Re: SOLR uses too much CPU and GC is also weird on Windows server

2020-10-27 Thread Emir Arnautović
to true for simple types in recent schemas. > 2> Perhaps you pulled > over an old definition from your former schema? > > > One other thing: you mention a bit of custom code you needed to change. I > always try to investigate that first. Is it possible to > 1> reproduce

Re: SOLR uses too much CPU and GC is also weird on Windows server

2020-10-27 Thread Emir Arnautović
Hi Jaan, It can be several things: caches fieldCache/fieldValueCache - it can be that you are missing doc values on some fields that are used for faceting/sorting/functions and that uninverted field structures are eating your memory. filterCache - you’ve changed setting for filter caches and

Re: Question on solr metrics

2020-10-27 Thread Emir Arnautović
Hi, In order to see time range metrics, you’ll need to collect metrics periodically and send it to some storage and then query/visualise. Solr has exporters for some popular backends, or you can use some cloud based solution. One such solution is our: https://sematext.com/integrations/solr-monit

Re: Any blog or url that explain step by step configure grafana dashboard to monitor solr metrics

2020-09-25 Thread Emir Arnautović
Hi, In case you decide to go with cloud solution, you can check how you can monitor Solr with Sematext: https://sematext.com/blog/solr-monitoring-made-easy-with-sematext/ Regards, Emir -- Monitoring - Log Management - Alerting

Re: Replication in soft commit

2020-09-03 Thread Emir Arnautović
, > Tushar > On Thu, 3 Sep 2020 at 16:17, Emir Arnautović > wrote: > >> Hi Tushar, >> Replication is file based process and hard commit is when segment is >> flushed to disk. It is not common that you use soft commits on master. The >> only usecase that I can

Re: Replication in soft commit

2020-09-03 Thread Emir Arnautović
Hi Tushar, Replication is file based process and hard commit is when segment is flushed to disk. It is not common that you use soft commits on master. The only usecase that I can think of is when you read your index as part of indexing process, but even that is bad practice and should be avoided

Re: Understanding Negative Filter Queries

2020-07-14 Thread Emir Arnautović
Hi Chris, tag:* is a wildcard query while *:* is match all query. I believe that adjusting pure negative is turned on by default so you can safely just use -tag:email and it’ll be translated to *:* -tag:email. HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elastic
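A minimal sketch of the rewrite described above, done on the client side purely for illustration. The function name `ensure_not_pure_negative` is hypothetical, and the check is deliberately naive (it only handles a single top-level negative clause); Solr is expected to do this adjustment itself, so this only shows the equivalence between the two forms.

```python
def ensure_not_pure_negative(q: str) -> str:
    """Prefix a single pure-negative clause with the match-all query *:*.

    Hypothetical helper: only handles one clause such as "-tag:email".
    """
    q = q.strip()
    if q.startswith("-") and " " not in q:
        return "*:* " + q
    return q


print(ensure_not_pure_negative("-tag:email"))
```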

Re: Search for term except within phrase

2020-07-06 Thread Emir Arnautović
Hi Stavros, I didn’t check what’s supported in ComplexPhraseQueryParser but that is wrapper around span queries, so you should be able to do what you need: https://lucene.apache.org/solr/guide/7_6/other-parsers.html#complex-phrase-query-parser

Re: Searching document content and mult-valued fields

2020-07-06 Thread Emir Arnautović
Hi Shaun, If project content is relatively static, you could use nested documents or you could play with the join query parser. HTH, Emir -- Mon

Re: Solr caches per node or per core

2020-06-24 Thread Emir Arnautović
Hi Reinaldo, It is per core. Single node can have cores from different collections, each configured differently. When you size caches from memory consumption point of view, you have to take into account how many cores will be placed on each node. Of course, you have to count replicas as well. H

Re: [EXTERNAL] Getting rid of Master/Slave nomenclature in Solr

2020-06-24 Thread Emir Arnautović
Hi all, Here is how I see it and explain to others that are not too familiar with Solr: Solr comes in two flavours - Cloud and Standalone. In any mode Solr writes to primary core(s). There is option to have different types of replicas, but in Standalone mode one can only have pull replica. In ad

Re: Solr Deletes

2020-05-26 Thread Emir Arnautović
Hi Dwane, DBQ does not play well with concurrent updates - it’ll block updates on replicas causing replicas to fall behind, trigger full replication and potentially OOM. My advice is to go with cursors (or even better use some DB as source of IDs) and DBID with some batching. You’ll need some te
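A rough sketch of the batching idea, assuming "DBID" above means delete by ID. The helper name `delete_by_id_batches` and the batch size are assumptions; the sketch only builds JSON update bodies of the form `{"delete": [...]}` and leaves sending them to the update handler (and choosing the source of IDs, such as a cursor or a database) to the caller.

```python
import json
from itertools import islice


def delete_by_id_batches(ids, batch_size=1000):
    """Yield JSON delete-by-ID request bodies, batch_size IDs at a time.

    Hypothetical helper: `ids` can be any iterable of document IDs.
    """
    it = iter(ids)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield json.dumps({"delete": batch})


for body in delete_by_id_batches(["a", "b", "c"], batch_size=2):
    print(body)
```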

Re: solr payloads performance

2020-05-11 Thread Emir Arnautović
Hi Wei, In order to use payload you have to use functions and that’s not cheap. In order to make it work fast, you could use it as post filter and filter on some summary field like minPrice/maxPrice/defaultPrice. HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elas

Re: Minimum Match Query

2020-05-07 Thread Emir Arnautović
Hi Russel, You are right about mm - it is about min term matches. Frequencies are usually used to determine score. But you can also filter on number of matches using function queries: fq={!frange l=3}sum(termfreq(field, 'barker'), termfreq(field, 'jones'), termfreq(field, 'baker')) It is not pe
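The filter above can be assembled programmatically; the following is only a sketch. The helper name `termfreq_filter` is hypothetical, the field and terms are the ones from the message, and terms are not escaped, so it assumes they contain no quote characters.

```python
from urllib.parse import urlencode


def termfreq_filter(field, terms, min_total):
    """Build an frange filter on the summed term frequencies of `terms`.

    Hypothetical helper: no escaping is applied to the terms.
    """
    parts = ",".join(f"termfreq({field},'{t}')" for t in terms)
    return f"{{!frange l={min_total}}}sum({parts})"


fq = termfreq_filter("field", ["barker", "jones", "baker"], 3)
# URL-encoded query string that could be appended to a select request
print(urlencode({"q": "*:*", "fq": fq}))
```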

Re: Reindexing using dataimporthandler

2020-04-27 Thread Emir Arnautović
Hi Bjarke, I don’t see a problem with that approach if you have enough resources to handle both cores at the same time, especially if you are doing that while serving production queries. The only issue is that if you plan to do that then you have to have all fields stored. Also note that cursorM

Re: Rule of thumb for determining maxTime of AutoCommit

2020-02-27 Thread Emir Arnautović
depends on the system, the > service, etc... ) > > > > Sincerely, > Kaya Ota > > > > 2020年2月26日(水) 17:36 Emir Arnautović : > >> Hi Kaya, >> The answer is simple: as much as your requirements allow delay between >> data being indexed and change

Re: Rule of thumb for determining maxTime of AutoCommit

2020-02-26 Thread Emir Arnautović
Hi Kaya, The answer is simple: as much as your requirements allow delay between data being indexed and changes being visible. It is sometimes seconds and sometimes hours or even a day is tolerable. On each commit your caches are invalidated and warmed (if it is configured like that) so in order

Re: How to monitor the performance of the SolrCloud cluster in real time

2020-02-23 Thread Emir Arnautović
Hi Adonis, If you are up to 3rd party, cloud based monitoring solution, you can try our integration for Solr/SolrCloud: https://sematext.com/cloud/ Thanks, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Tra

Re: SOLR PERFORMANCE Warning

2020-02-20 Thread Emir Arnautović
Hi, It means that you are either committing too frequently or your warming up takes too long. If you are committing on every bulk, stop doing that and use autocommit. Regards, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - ht

Re: Solr Analyzer : Filter to drop tokens based on some logic which needs access to adjacent tokens

2020-02-10 Thread Emir Arnautović
how more > than one tokens can be consumed. I can implement my custom logic once I > have access to multiple tokens from previous filter. > > Thanks > Pratik > > On Mon, Feb 10, 2020 at 2:47 AM Emir Arnautović < > emir.arnauto...@sematext.com> wrote: > >

Re: Solr Analyzer : Filter to drop tokens based on some logic which needs access to adjacent tokens

2020-02-09 Thread Emir Arnautović
Hi Pratik, You might be able to do some of the required things using PatternReplaceCharFilter, but as you can see it does not operate at the token level but on the input string. Your best bet is a custom token filter. Not sure how familiar you are with how token filters work, but you have access to tokens fro

Re: Number of requested rows

2020-02-05 Thread Emir Arnautović
Wed, 2020-02-05 at 13:00 +0100, Emir Arnautović wrote: >> I was thinking in that direction. Do you know where it is in the >> codebase or which structure is used - I am guessing some array of >> objects? > > Yeah. More precisely a priority queue of Objects, initialized wit

Re: Number of requested rows

2020-02-05 Thread Emir Arnautović
h.HitQueue.HitQueue(int, > boolean), you may found an alternative usage you probably is looking for. > > On Wed, Feb 5, 2020 at 3:01 PM Emir Arnautović > wrote: > >> Hi Mikhail, >> I was thinking in that direction. Do you know where it is in the codebase >> or w

Re: Number of requested rows

2020-02-05 Thread Emir Arnautović
http://sematext.com/ > On 5 Feb 2020, at 12:54, Mikhail Khludnev wrote: > > Absolutely. Searcher doesn't know the number of hits a priori. It eagerly > allocates the results heap before collecting results. The only cap I'm aware of > is maxDocs. > > On Wed, Feb 5, 2020 at 2:42 PM

Number of requested rows

2020-02-05 Thread Emir Arnautović
Hi, Does somebody know if requested number of rows is used internally to set some temp structures? In other words will query with rows=100 be more expensive than query with rows=1000 if number of hits is 1000? Thanks, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr &

Re: How expensive is core loading?

2020-01-29 Thread Emir Arnautović
Hi Rahul, It depends. You might have warm up queries that would populate caches. For each core Solr exposes JMX stats so you can read just those without “touching” core. You can also try using some of existing tools for monitoring Solr, but I don’t think that any of them provides you info about

Re: Easiest way to export the entire index

2020-01-29 Thread Emir Arnautović
Hi Amanda, I assume that you have all the fields stored so you will be able to export full document. Several thousands records should not be too much to use regular start+rows to paginate results, but the proper way of doing that would be to use cursors. Adjust page size to avoid creating huge
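A minimal sketch of cursor-based export, under stated assumptions: `fetch_page` is a hypothetical callable that sends the given parameters to a select handler and returns the parsed JSON response, and `id` is assumed to be the uniqueKey field (cursors require the sort to include the uniqueKey). The loop starts with `cursorMark=*` and stops when the returned `nextCursorMark` equals the one that was sent.

```python
def iterate_with_cursor(fetch_page, rows=500, sort="id asc"):
    """Yield every document by following cursorMark until it stops changing.

    `fetch_page` is a hypothetical callable: params dict -> parsed JSON dict.
    """
    cursor = "*"
    while True:
        response = fetch_page(
            {"q": "*:*", "rows": rows, "sort": sort, "cursorMark": cursor}
        )
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:
            break  # same cursor returned: nothing left to read
        cursor = next_cursor
```

Keeping `rows` moderate follows the advice above about adjusting page size; the value 500 is only a placeholder.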

Re: In-place re-indexing after DocValue schema change

2020-01-29 Thread Emir Arnautović
Hi, 1. No, it’s not valid. Solr will look at schema to see if it can use docValues or if it has to uninvert field and it assumes that all fields will have doc values. You can expect anything from wrong results to errors if you do something like that. 2. Not sure if it would work, but it is not better t

Re: Coming back to search after some time... SOLR or Elastic for text search?

2020-01-16 Thread Emir Arnautović
Hi Jan, Here is a blog post related to this topic: https://sematext.com/blog/solr-vs-elasticsearch-differences/ It also contains links to other resources that might help you make a decision. HTH, Emir -- Monitoring - Log Management -

Re: Boosting only top n results that match a criteria

2019-12-28 Thread Emir Arnautović
) the top 5 results of class A1 from my potentially 100s of >>> results. Then boost them to first page. >>> Do you think this(or near this) behaviour is possible >>> using RerankQParserPlugin? Please shed more light how. >>> >>> On Fri, 27 Dec 2019 at 19:48, Erick

Re: Boosting only top n results that match a criteria

2019-12-27 Thread Emir Arnautović
Hi Nitin, Can you simply filter and return top 5: ….&fq=class:A1&rows=5 Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 27 Dec 2019, at 13:55, Nitin Arora wrote: > > Hello, I have a comple

Re: Indexing with customized parameters

2019-12-12 Thread Emir Arnautović
Hi Anuj, Maybe I am missing something but this is more question for some SQL group than for Solr group. I am surprised that you get any records. You can consult your DB documentation for some more elegant solution, but a brute-force solution, if your column is string, could be: WHERE sector = 27

Re: Is it possible to have different Stop words depending on the value of a field?

2019-12-04 Thread Emir Arnautović
Hi, I’ve spent quite a lot time working on a similar issue but I did not think about it much since (at the time it was Solr 1.3) so some new features could push me to some other direction, but here is what I remember: You cannot rely on users entering standardised address format even within one

Re: Solr Case Insensitive Search while preserving cases in Index and allowing Boolean AND/OR searches

2019-12-02 Thread Emir Arnautović
Hi Lewin, Not sure I follow your example. From what I read, you could have one field lowercased and other not and filter on the first field and facet on the second. There is probably something that I am missing, so some example would probably help. Thanks, Emir -- Monitoring - Log Management -

Re: Exact match

2019-12-02 Thread Emir Arnautović
Hi Omer, From performance perspective, it is the best if you index title as a single token: KeywordTokenizer + LowerCaseFilter If you need to query that field in some other way, you can index it differently as some other field using copyField. HTH, Emir -- Monitoring - Log Management - Alerting

Re: How to implement NOTIN operator with Solr

2019-11-19 Thread Emir Arnautović
port Training - http://sematext.com/ > On 19 Nov 2019, at 11:08, Raboah, Avi wrote: > > In that case I got only doc1 > > -Original Message- > From: Emir Arnautović [mailto:emir.arnauto...@sematext.com] > Sent: Tuesday, November 19, 2019 11:51 AM > To: solr-user@luc

Re: How to implement NOTIN operator with Solr

2019-11-19 Thread Emir Arnautović
Hi Avi, There are span queries, but in this case you don’t need them. It is enough to simply filter out documents that contain “credit card”. Your query can be something like +text:credit -text:"credit card" If you prefer using boolean operators, you can write it as: text:credit AND NOT text:"cre

Re: Use of TLog

2019-11-18 Thread Emir Arnautović
to build > other solr cluster. I am able to make that work but Is this design okay? or > any other approach I can try to get a new cluster spin up with the same > data as in the old one. > > Thanks, > Sripradeep P > > > On Mon, Nov 18, 2019 at 2:12 PM Emir Arnautović

Re: Use of TLog

2019-11-18 Thread Emir Arnautović
Hi Sripradeep, Simplified: TLog files are used to replay index updates from the last successful hard commit in case of some Solr crashes. It is used on the next Solr startup. It does not contain all updates, otherwise, it would duplicate the index size. If you start from these premises, you will

Re: Solr 7.2.1 - unexpected docvalues type

2019-11-11 Thread Emir Arnautović
Hi Antony, Like Erick explained, you still have to preprocess your field in order to be able to use doc values. What you can do is use update request processor chain and have all the logic in Solr. Here is blog post explaining how it could work: https://www.od-bits.com/2018/02/solr-docvalues-on-

Re: Commit disabled

2019-11-08 Thread Emir Arnautović
Hi David, Index will get updated (hard commit is happening every 15s) but changes will not be visible until you explicitly commit or you reload core. Note that Solr restart reloads cores. HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Supp

Re: Query on changing FieldType

2019-10-23 Thread Emir Arnautović
gt;> Monitoring - Log Management - Alerting - Anomaly Detection >>>> Solr & Elasticsearch Consulting Support Training - >> http://sematext.com/ >>>> >>>> >>>> >>>>> On 22 Oct 2019, at 10:53, Shubham Goswami >> >&g

Re: Query on changing FieldType

2019-10-22 Thread Emir Arnautović
; different types defined ? > or if i talk about my previous query, can we index some data for the same > field with different unique id after replacing the type ? > > Thanks again > Shubham > > On Tue, Oct 22, 2019 at 1:23 PM Emir Arnautović < > emir.arnauto...@semate

Re: Query on changing FieldType

2019-10-22 Thread Emir Arnautović
Hi Shubham, Changing type is not allowed without full reindexing. If you do something like that, Solr will end up with segments with different types for the same field. Remember that segments are immutable and that reindexing some document will be in new segment, but old segment will still be th

Re: Metrics API - Documentation

2019-10-07 Thread Emir Arnautović
Hi Richard, We do not use API to collect metrics but JMX, but I believe that those are the same (did not verify it in code). You can see how we handled those metrics into reports/charts or even use our agent to send data to Prometheus: https://github.com/sematext/sematext-agent-integrations/tree

Re: Optimizing after daily Replication - Does optimization resets cache?

2019-09-15 Thread Emir Arnautović
Hi Paras, In master-slave model, optimisation will affect only master since slaves will get optimised segments from master. But note that slaves get what changed from master and in case of optimisation entire index will be replicated so you can experience longer replications and in case of large

Re: Is it possible to skip scoring completely?

2019-09-11 Thread Emir Arnautović
Hi Ash, I did not check the code, so not sure if your question is based on something that you find in the codebase or you are just assuming that scoring is called? I would assume differently: if you use only fq, then Solr does not have anything to score. Also, if you order by something other tha

Re: how to use copy filed as only taken after the suffix

2019-07-15 Thread Emir Arnautović
Hi Uma, Take a look at https://lucene.apache.org/solr/guide/8_1/charfilterfactories.html#solr-patternreplacecharfilterfactory Depending on your usecase, this might be enough for you. HTH, Em

Re: Function Query with multi-value field

2019-07-15 Thread Emir Arnautović
Hi Wei, I see two options: 1. create custom distance function for colors 2. split each color component to a separate numeric fields and try calculate distance function using standard set of functions. (I think that Solr does not support 3d points). HTH, Emir -- Monitoring - Log Management - Aler

Re: Solr cloud setup

2019-06-07 Thread Emir Arnautović
Hi Abhishek, Here is a nice blog post about migrating to SolrCloud: https://sematext.com/blog/solr-master-slave-solrcloud-migration/ Re number of shards - there is no definite answer - it depends on your indexing/search latency

Re: Softer version of grouping and/or filter query

2019-05-08 Thread Emir Arnautović
Hi Doug, It seems to me that you’ve found a way to increase score for those that are within selected price range, but “A price higher than $150 should not increase the score”. I’ll just remind you that scores in Solr are relevant to query and that you cannot do much other than sorting on it so i

Re: Solr monitoring

2019-04-29 Thread Emir Arnautović
Hi Shruti, One such tool is our https://sematext.com/spm . It provides Solr integration and ability to also send Solr logs. HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/

Re: Optimal RAM to size index ration

2019-04-15 Thread Emir Arnautović
Hi, The recommendation to have RAM enough to place your entire index into memory is sort of worst case scenario (maybe better called the best case scenario) where your index is optimal and is fully used all the time. OS will load pages that are used and those that might be used to memory, so eve

Re: Filters and data cleansing

2019-04-15 Thread Emir Arnautović
Hi Ken, What Solr returns is stored value which is original value. Analysis is applied and its result is stored as “index” and is used for searching. In order to get what you want, you have to move analysis at least one step earlier. It can be moved to update request processor chain where you ap

Re: nested documents performance

2019-04-15 Thread Emir Arnautović
Hi Roi, I don’t know the details about your test, but trying to assume how it looks like and explain observed. With your flat test you are denormalising data, meaning creating data duplication so the resulting document set is larger. That means more fields/text for Solr/Lucene to analyse and to

Re: DocValues or stored fields to enable atomic updates

2019-04-05 Thread Emir Arnautović
Hi Andreas, Stored values are compressed so should take less disk. I am thinking that doc values might perform better when it comes to executing atomic update. HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematex

Re: Solr index slow response

2019-03-19 Thread Emir Arnautović
> wiki.apache.org > Schema Design Considerations. indexed fields. The number of indexed fields > greatly increases the following: Memory usage during indexing ; Segment merge > time > > > > > > From: Emir Arnautović > Sent: Tu

Re: Solr index slow response

2019-03-19 Thread Emir Arnautović
hat it is really solr slow response. > > Those long response time is not really spikes, it's constantly happening, > almost half of the request has such long delay. The more document added in > one request the more delay it has. > > >

Re: Solr index slow response

2019-03-19 Thread Emir Arnautović
long periods of constant indexing of documents to a staging collection (~2 >> billion documents), we have following commit settings >> >> softcommit: 360ms (for periodic validation of data, since it's not in >> production) >> hardcommit: openSearcher -> false, 1

Re: Solr index slow response

2019-03-18 Thread Emir Arnautović
ap > is set to 4GB, there 3.2 free, I doubt swap would affect it since there is > such huge free memory. > > I could try to with set Xms and Xmx to the same value, but I doubt how much > would that change the response time. > > > BRs > > //Aaron > > _

Re: Solr index slow response

2019-03-18 Thread Emir Arnautović
Hi Aaron, You are right - large heap means that there will be no major GC all the time, but eventually it will happen and then the larger the heap the longer it will take. So with 300GB heap it takes observed 300s. If you used to run on 32GB heap and it was slow, it probably means that heap is t

Re: Solr index slow response

2019-03-18 Thread Emir Arnautović
One more thing - it is considered a good practice to use the same value for Xmx and Xms. Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 18 Mar 2019, at 14:19, Emir Arnautović > wrot

Re: Solr index slow response

2019-03-18 Thread Emir Arnautović
lr.home=..", >"-Djetty.port=8983"], > "startTime":"2019-03-18T09:35:27.892Z", > "upTimeMS":9258422}}, > "system":{ >"name":"Linux", >"arch":"amd64", >"av

Re: Solr index slow response

2019-03-18 Thread Emir Arnautović
Hi Aaron, Which version of Solr? How did you configure your heap? Is it standalone Solr or SolrCloud? A single server? Do you use some monitoring tool? Do you see some spikes, pauses or CPU usage is constant? Thanks, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elast

Re: questions regrading stored fields role in query time

2019-02-26 Thread Emir Arnautović
hanks > Saurabh > > On Tue, Feb 26, 2019 at 2:41 PM Emir Arnautović < > emir.arnauto...@sematext.com> wrote: > >> Hi Saurabh, >> Welcome to the channel! >> Storing fields should not affect query performances directly if you use >> lazy field loading an

Re: questions regrading stored fields role in query time

2019-02-26 Thread Emir Arnautović
Hi Saurabh, Welcome to the channel! Storing fields should not affect query performances directly if you use lazy field loading and it is the default set. And it should not affect at all if you have enough RAM compared to index size. Otherwise OS caches might be affected by stored fields. The bes

Re: What's the deal with dataimporthandler overwriting indexes?

2019-02-12 Thread Emir Arnautović
Hi Joakim, This might not be what you expect but it is expected behaviour. When you do clean=true, DIH will first delete all records. That is how it works in both M/S and Cloud. The diff might be that you disabled replication or disabled auto commits in your old setup so it is not visible. You c

Re: Load balance writes

2019-02-11 Thread Emir Arnautović
wed Go client and write just to > one of 12 available nodes. I believe I should find out this smart way to > handle this :) > > > > >> On 11. Feb 2019, at 15:21, Emir Arnautović >> wrote: >> >> Hi Boban, >> If you use SolrCloud Solrj client and

Re: Load balance writes

2019-02-11 Thread Emir Arnautović
Hi Boban, If you use SolrCloud Solrj client and initialise it with ZK, it should be aware of masters and send documents in a smart way. HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 11 Feb 2

Re: shingles + stop words

2018-12-10 Thread Emir Arnautović
Hi David, As you already observed shingles are concatenating tokens based on positions and in case of stopwords it results in empty string (you can configure it to be something else with fillerToken option). You can do the following: 1. if you do not have too many stopwords, you could use Patter

Re: Delete by query in SOLR 6.3

2018-11-15 Thread Emir Arnautović
Hi Rakesh, Since Solr has to maintain eventual consistency of all replicas, it has to block updates while DBQ is running. Here is a blog post with a high-level explanation of the issue: http://www.od-bits.com/2018/03/dbq-or-delete-by-query.html

Re: A different result with filters

2018-10-26 Thread Emir Arnautović
Hi, The second query is equivalent to: > { > "query": "*:*", > "limit": 0, > "filter": [ >"{!parent which=kind_s:edition}condition_s:0", >"price_i:[* TO 75]" > ] > } HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Tra

Re: Internal Solr communication question

2018-10-25 Thread Emir Arnautović
Hi Fernando, I did not look at code and not sure if there is special handling in case of a single shard collection, but Solr does not have to choose local shard to query. It assumes that one node will receive all requests and that it needs to balance. What you can do is add preferLocalShards=tru

Re: indexed and stored for fields that are sources of a copy field

2018-10-22 Thread Emir Arnautović
ection of the Solr documentation. Should I create a JIRA issue asking > for this? > > Regards, > > Chris > > On 22/10/2018 14:28, Emir Arnautović wrote: >> Hi Chris, >> Yes you can do that. There is also type=“ignored” that you can use in such >> scenar

Re: indexed and stored for fields that are sources of a copy field

2018-10-22 Thread Emir Arnautović
Hi Chris, Yes you can do that. There is also type=“ignored” that you can use in such scenario. HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 22 Oct 2018, at 15:22, Chris Wareham > wrote: >

Re: Custom typeahead using Solr

2018-10-17 Thread Emir Arnautović
Hi Vineet, You can index your jobtitle field in two different ways: 1. standard tokenizer -> edge ngram 2. keyword tokenizer -> edge ngram The first field will be used to match word regardless of its position and the second one to prefer exact matches. HTH, Emir -- Monitoring - Log Management -

Re: Using function in fiter query

2018-10-08 Thread Emir Arnautović
Hi Skanth, You can use FunctionRangeQueryParser to do that: https://lucene.apache.org/solr/guide/6_6/other-parsers.html#OtherParsers-FunctionRangeQueryParser Let us know if you are having troubles

Re: Migrate cores from 4.10.2 to 7.5.0

2018-10-03 Thread Emir Arnautović
Hi Wolfgang, I would say that your safest bet is to start from the 7.5 schema, adjust it to suit your needs and reindex (better than trying to adjust your existing schema to 7.5). If all your fields are stored in current collection, you might be able to use DIH to reindex: http://www.od-bits.com/2018/

Re: How to do rollback from solrclient using python

2018-10-03 Thread Emir Arnautović
Hi Chetra, In addition to what Jason explained, rollbacks do not work in Solr Cloud. Thanks, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 3 Oct 2018, at 14:45, Jason Gerlowski wrote: > > Hi Chet

Re: Clarification about Solr Cloud and Shard

2018-10-03 Thread Emir Arnautović
Hi Rekha, In addition to what Shawn explained, the answer to your last question is yes and no: You can split shards, but cannot change number of shards without reindexing. And you can add nodes but you should make sure adding nodes will result in well balanced cluster. You can address scalabilit

Re: Dynamic filters

2018-10-02 Thread Emir Arnautović
Hi Tamas, Maybe I am missing the point and you already discarded that option, but you should be able to cover such cases with simple faceting? Thanks, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On

Re: CACHE -> fieldValueCache usage

2018-09-20 Thread Emir Arnautović
Hi Vincenzo, Are you saying that you used to see some numbers other than 0 and now you see 0? If it is always zero, it means that you are not using features that require uninverted version of field (mainly faceting) or that you have doc values enabled on all fields that are used in such scenario

Re: 20180913 - Clarification about Limitation

2018-09-14 Thread Emir Arnautović
Hi, Here are some thoughts on how to resolve some of “it depends”: http://www.od-bits.com/2018/01/solrelasticsearch-capacity-planning.html HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elast

Re: Data Import Handler with Solr Source behind Load Balancer

2018-09-14 Thread Emir Arnautović
Hi Thomas, Is this SolrCloud or Solr master-slave? Do you update index while indexing? Did you check if all your instances behind LB are in sync if you are using master-slave? My guess would be that DIH is using cursors to read data from another Solr. If you are using multiple Solr instances beh

Re: Boost only first 10 records

2018-09-03 Thread Emir Arnautović
Hi, The requirement is not 100% clear or logical. If user selects filter type:comedy, it does not make sense to show anything else. You might have “Other categories relevant results” and that can be done as a separate query. It seems that you want to prefer comedy, but you have an issue with boo

Re: Split on whitespace parameter doubt

2018-08-30 Thread Emir Arnautović
Hi David, Your observations seem correct. If all fields produce the same tokens then Solr goes for a “term centric” query, but if different fields produce different tokens, then it uses a “field centric” query. Here is a blog post that explains it from the multiword synonyms perspective: https://opensourc

Re: Issue with adding an extra Solr Slave

2018-08-28 Thread Emir Arnautović
he > connectivity between the master and slaves exist. The only error I see in the > new Slave log is what I shared originally. > > Thanks, > Zafar. > > > > -Original Message- > From: Emir Arnautović [mailto:emir.arnauto...@sematext.com] > Sent: Tue

Re: Issue with adding an extra Solr Slave

2018-08-28 Thread Emir Arnautović
Hi Zafar, How do you access admin console? Through ELB or you see this behaviour when accessing admin console of a new slave? Do you see any replication related errors in new slave’s logs? Did you check connectivity of a new slave and master nodes? Thanks, Emir -- Monitoring - Log Management -

Re: How to hit filterCache?if filterQuery is a sub range query of another already cache range filterQuery

2018-08-24 Thread Emir Arnautović
Hi, No it will not and it does not make sense to - it would still have to apply filter on top of cached results since they can include values with 2. You can consider a query as entry into cache. Thanks, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Cons

Re: How to add solr admin ui

2018-08-22 Thread Emir Arnautović
Hi Ahmed, I am not aware of some extension point in UI, but you can maybe use some combination of request handler and velocity response writer to get what you want, but you will not have some link in UI. Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Cons

Re: Metrics for a healthy Solr cluster

2018-08-16 Thread Emir Arnautović
Hi, If you are up to ready-to-go Solr monitoring, you can check out Sematext’s Solr integration: https://sematext.com/integrations/solr-monitoring/ Thanks, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch

Re: Solr changing the search when given many qf fields?

2018-08-16 Thread Emir Arnautović
Hi Aaron, It is probably not about number of fields but related to different analysis of different fields. As long as all your fields analyzers produce the same tokens you should get “term centric” query. Once any of your analyzers produce different token, it’ll become “field centric”. It is lik

Re: Ignored fields and copyfield

2018-08-06 Thread Emir Arnautović
Hi John, Yes it can and it is common pattern when you want to index multiple fields into a single field or if you want to standardise naming without changing indexing logic. Thanks, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Trainin

Re: Need an advice for architecture.

2018-07-19 Thread Emir Arnautović
Hi Francois, If I got your numbers right, you are indexing on a single server and indexing rate is ~31 doc/s. I would first check if something is wrong with indexing logic. You should check where the bottleneck is: do you read documents from DB fast enough, do you batch documents… Assuming you cannot h

Re: Solr7.3.1 Installation

2018-07-11 Thread Emir Arnautović
Hi, Why are you building Solr? Because you added your custom query parser? If that’s the case, then it is not the way to do it. You should set up separate project for your query parser, build it and include jar in your Solr setup. It is not query parser, but here is blog/code for simple update pr

Re: Maximum number of SolrCloud collections in limited hardware resource

2018-06-29 Thread Emir Arnautović
Hi, It is probably the best if you merge some of your collections (or all) and have discriminator field that will be used to filter out tenant’s documents only. In case you go with multiple collections serving multiple tenants, you would have to have logic on top of it to resolve tenant to colle

Re: SolrCloud Large Cluster Performance Issues

2018-06-25 Thread Emir Arnautović
Hi, With such a big cluster a lot of things can go wrong and it is hard to give any answer without looking into it more and understanding your model. I assume that you are monitoring your system (both Solr/ZK and components that index/query) so it should be the first thing to look at and see if

Re: Delete By Query issue followed by Delete By Id Issues

2018-06-24 Thread Emir Arnautović
Hi Sujatha, Did I get it right that you are deleting the same documents that will be updated afterward? If that’s the case, then you can simply skip deleting, and just send updated version of document. Solr (Lucene) does not have delete - it’ll just flag document as deleted. Updating document (a

Re: Solr cloud with different JVM size nodes

2018-06-19 Thread Emir Arnautović
Hi Rishi, It is not uncommon to have tiers in your cluster assuming you weighed whether it is the best choice. I would remind you that 32GB is not a good heap size since you cannot use compressed OOPs. Check what is the limit of your JVM but 30GB is a safe bet. Also, what did you mean by “got high f
