Re: Search by similarity?

2017-08-28 Thread Emir Arnautovic
Hi Darko, The issue is wrong expectations: title-1-end is parsed into 3 tokens (guessing), and mm=99% of 3 tokens is 2.97, which is rounded down to 2. Since all your documents have 'title' and 'end' tokens, all match. If you want to round up, you can use mm=-1% - that will result in zero (or
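
The rounding described above can be checked with a few lines of arithmetic. This is only a sketch of the rule as stated in the message (scale by the percentage, truncate, and for a negative percentage subtract the truncated value from the clause count); the function name is hypothetical and it is not Solr's own code.

```python
def min_should_match(optional_clauses: int, percent: int) -> int:
    """Clauses that must match for an mm percentage (illustrative only)."""
    # Scale by the percentage and drop the fractional part.
    scaled = optional_clauses * abs(percent) // 100
    if percent < 0:
        # A negative percentage states how many clauses may be missing.
        return optional_clauses - scaled
    return scaled


required_99 = min_should_match(3, 99)       # 2.97 is truncated to 2
required_minus_1 = min_should_match(3, -1)  # 0.03 is truncated to 0, so all 3 are required
```

With three tokens, mm=99% therefore asks for two matching clauses, while mm=-1% allows zero missing clauses.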

Re: JMX property keys

2017-06-09 Thread Emir Arnautovic
Hi Ari, It is common that the way an app reports metrics is not monitoring friendly. It is not just how a metric is named; some metrics also require you to create a stateful monitoring agent in order to display them on a time axis. I am not aware that this can be overridden for Solr, but y

Re: Search substring in field

2017-05-10 Thread Emir Arnautovic
Hi, Solr works on top of a data structure called an inverted index. You can misuse it by not inverting your documents and using regex or wildcards to find matches, but that is not the way to use it - it'll be significantly slower. Solr does support su

Re: Search inside grouping list

2017-05-09 Thread Emir Arnautovic
Can you try reproducing this issue on fresh Solr, and if you manage to, can you please share documents and steps to reproduce it. Which version of Solr do you run and do you have any custom plugins running on it? Emir On 09.05.2017 13:01, donjose wrote: Yes. I am getting the same result fo

Re: Search inside grouping list

2017-05-09 Thread Emir Arnautovic
Do you get the same result if you use q instead of fq? On 09.05.2017 07:38, donjose wrote: Hi Emir, Grouping by default is part of the configuration true assetid true Don. -- View this message in context: http://lucene.472066.n3.nabble.com/

Re: Search inside grouping list

2017-05-08 Thread Emir Arnautovic
Hi Don, This is query without grouping and returns expected results. But when you apply grouping by some field, you get wrong results? Can you share query results and query with grouping. Emir On 08.05.2017 14:28, donjose wrote: Hi Emir, Thank you for the response. Please find the query w

Re: Search inside grouping list

2017-05-08 Thread Emir Arnautovic
Hi, Can you please provide full query that you are sending to Solr. Thanks, Emir On 08.05.2017 07:18, donjose wrote: Could anyone can please reply for this query -- View this message in context: http://lucene.472066.n3.nabble.com/Search-inside-grouping-list-tp4333488p4333870.html Sent fro

Re: Search substring in field

2017-05-05 Thread Emir Arnautovic
Hi, I would start from https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers%2C+Tokenizers%2C+and+Filters And this https://cwiki.apache.org/confluence/display/solr/Solr+Field+Types And after that https://cwiki.apache.org/confluence/display/solr/Query+Syntax+and+Parsing P

Re: in-place atomic updates for numeric docValue field

2017-05-05 Thread Emir Arnautovic
d inc. Emir On 04.05.2017 16:57, Dan . wrote: Hi Emir, Yes I thought of representing -1 as null, but this makes the index unnecessarily larger, particularly if we have to default all docs to this value. Cheers, Dan On 4 May 2017 at 15:16, Emir Arnautovic wrote: Hi Dan, Remove does not make

Re: in-place atomic updates for numeric docValue field

2017-05-04 Thread Emir Arnautovic
Hi Dan, Remove does not make sense when it comes to in-place updates of docValues - it has to have some value, so only thing that you can do is introduce some int value as null. HTH, Emir On 04.05.2017 15:40, Dan . wrote: Hi, I have a field like this: so I can do a fast in-place atomi

Re: solr 6.3.0 monitoring

2017-05-04 Thread Emir Arnautovic
Hi Satya, In order to have more complete picture of your production (host, JVM, ZK, Solr metrics), I would suggest using one of monitoring solutions. One such solution is Sematext's SPM: http://sematext.com/spm/. It is much easier if you are up to SaaS setup, but we also provide on premise i

Re: Poll: Master-Slave or SolrCloud?

2017-04-27 Thread Emir Arnautovic
I think creating poll for ES ppl with question: "How do you run master nodes? A) on some data nodes B) dedicated node C) dedicated server" would give some insight how big issue is having ZK and if hiding ZK behind Solr would do any good. Emir On 25.04.2017 23:13, Otis Gospodnetić wrote: Hi

Re: distinct records based on a field

2017-04-05 Thread Emir Arnautovic
s 5 records AB C XYZFoo cat1 XYZFoo cat2 XYZBar cat1 XYZBar cat1 XYZBar cat2 out of those 10 records there may be duplicate values for B and then I am faceting it on C, So I get somethi

Re: distinct records based on a field

2017-04-05 Thread Emir Arnautovic
Hi VJ, You can use field collapsing feature to do distinct (https://cwiki.apache.org/confluence/display/solr/Result+Grouping) or maybe you can use facet pivoting and pivot on distinct field to get number of doc in each if needed (https://cwiki.apache.org/confluence/display/solr/Faceting#Facet

Re: Is there a way to retrieve the a term's position/offset in Solr

2017-03-27 Thread Emir Arnautovic
It seems to me that you are looking for Solr's highlighting functionality: https://cwiki.apache.org/confluence/display/solr/Highlighting HTH, Emir On 27.03.2017 09:09, forest_soup wrote: We are going to implement a feature: When opening a document whose body field is already indexed in Solr,

Re: to handle expired documents: collection alias or delete by id query

2017-03-23 Thread Emir Arnautovic
Hi Derek, There are both pros and cons for both approaches: 1. if you are doing full reindexing PRO is that you have clean index all the time and even if something goes wrong, you don't have to switch alias to updated index so your users will not notice issues. CON is that you are doing full

Re: Architecture suggestions

2017-03-23 Thread Emir Arnautovic
Hi Vrindavda, It is hard to tell anything without testing and details on what/how is indexed, how it is going to be queried and what are latency/throughput requirements. 25M or 12.5M documents per shard might be too much if you have strict latency requirements, but testing is the only way to

Re: Solr Query Suggestion

2017-03-03 Thread Emir Arnautovic
Hi Vrinda, You should use field collapsing (https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results) or if you cannot live with its limitations, you can use results grouping (https://cwiki.apache.org/confluence/display/solr/Result+Grouping) HTH, Emir On 03.03.2017 10:5

Re: OR condition between !frange and normal query

2017-03-03 Thread Emir Arnautovic
sults. Regards, Edwin On 2 March 2017 at 17:04, Emir Arnautovic wrote: Hi Edwin, You can use subqueries: q=_query_:"({!frange l=1}ms(startDate_dt,endDate_dt)" OR _query_:"startDate:[2000-01-01T00:00:00Z TO *] AND endDate:[2016-12-31T23:59:59Z]" HTH, Emir On 02.03.2017

Re: Distinguish exact match from wildcard match

2017-03-02 Thread Emir Arnautovic
e yourself. Regards, Alex. http://www.solr-start.com/ - Resources for Solr users, new and experienced On 2 March 2017 at 09:09, Сергей Твердохлеб wrote: Hi Emir, Thanks for your answer. However in my case I really need to separate results, because I need to treat those resultsets differentl

Re: Distinguish exact match from wildcard match

2017-03-02 Thread Emir Arnautovic
Hi Sergei, Usually you don't want to know which is which, but you do want to have exact matches first. In case of simple queries and depending on your usecase, you can use score to make distinction. If "bolter" matches "bolt" because of some filters, you will need to index it in two fields an

Re: OR condition between !frange and normal query

2017-03-02 Thread Emir Arnautovic
Hi Edwin, You can use subqueries: q=_query_:"({!frange l=1}ms(startDate_dt,endDate_dt)" OR _query_:"startDate:[2000-01-01T00:00:00Z TO *] AND endDate:[2016-12-31T23:59:59Z]" HTH, Emir On 02.03.2017 04:51, Zheng Lin Edwin Yeo wrote: Hi, Would like to check, how can we do an OR condition bet

Re: Boolean expression for spatial query

2017-02-27 Thread Emir Arnautovic
Hi Michael, I haven't been playing with spatial for a while, but if it fully supports WKT, you could use Intersects instead of Contains and MULTIPOINT instead of POINT. Something like: fq={!field f=regionGeometry}Intersects(MULTIPOINT((x1 y1), (x2 y2))) In any case you can use OR-ed _query_
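
A small helper can assemble the MULTIPOINT filter suggested above from a list of coordinates. This is a sketch under assumptions: the function name and the sample coordinates are hypothetical, and whether the field accepts MULTIPOINT depends on the spatial configuration, as the message itself notes.

```python
def multipoint_intersects_filter(field, points):
    """Build an fq value using Intersects over a WKT MULTIPOINT."""
    # WKT separates the two coordinates of a point with a space
    # and separates the points themselves with commas.
    wkt_points = ", ".join(f"({x} {y})" for x, y in points)
    return "{!field f=%s}Intersects(MULTIPOINT(%s))" % (field, wkt_points)


# Hypothetical coordinates, used only to show the resulting string.
fq = multipoint_intersects_filter("regionGeometry", [(10, 20), (30, 40)])
```

The resulting string would be passed as the fq parameter of the request.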

Re: Phrase field matches not counting towards minimum match

2017-02-24 Thread Emir Arnautovic
Hi, mm applies to qf only and pf2/3 is about boosting results that are matched. What you can do is play with additional fields in qf and/or try making it work close to your requirement with autoRelax parameter. Note that in case of autorelax it might result in unexpected results if one field

Re: Select TOP 10 items from Solr Query

2017-02-20 Thread Emir Arnautovic
the normal faceting. Regards, Edwin On 20 February 2017 at 17:24, Emir Arnautovic wrote: Hi Edwin, I am also bit confused but, it seems to me that you could achieve what you need with pivot faceting: https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Pivot(DecisionTree)Faceting

Re: Select TOP 10 items from Solr Query

2017-02-20 Thread Emir Arnautovic
Hi Edwin, I am also bit confused but, it seems to me that you could achieve what you need with pivot faceting: https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Pivot(DecisionTree)Faceting HTH, Emir On 18.02.2017 08:46, Zheng Lin Edwin Yeo wrote: Although I have nested doc

Re: Indexing of documents in more than one step (SOLRJ)

2017-02-15 Thread Emir Arnautovic
and send to Solr. Emir On 15.02.2017 12:24, Maciej Ł. PCSS wrote: No, it's not the case. In both steps I'm indexing documents from the same set of IDs (I mean the values of the 'id'). Maciej On 15.02.2017 at 11:07, Emir Arnautovic wrote: I did not have time to te

Re: Indexing of documents in more than one step (SOLRJ)

2017-02-15 Thread Emir Arnautovic
I did not have time to test it or look at the code, but can you check if it could be the case when there is no document with a, b, c fields and you are trying to update it with d, e, f using partial update syntax. Emir On 15.02.2017 09:25, Maciej Ł. PCSS wrote: Dear All, how should I handle

Re: Removing duplicate terms from query

2017-02-09 Thread Emir Arnautovic
Hi Ere, I don't think that there is such filter. Implementing such filter would require looking backward which violates streaming approach of token filters and unpredictable memory usage. I would do it as part of query preprocessor and not necessarily as part of Solr. HTH, Emir On 09.02.

Re: including dependency jars for SOLR plugins

2016-12-29 Thread Emir Arnautovic
Hi Vinay, You need to include libs using lib directives in Solr config: https://cwiki.apache.org/confluence/display/solr/Lib+Directives+in+SolrConfig. Regards, Emir On 29.12.2016 19:11, Vinay B, wrote: I'm modifying our custom update handler and the modifications need access to a third par

Re: Solr Suggester

2016-12-22 Thread Emir Arnautovic
That is because my_field_2 is not indexed. Regards, Emir On 21.12.2016 18:04, Furkan KAMACI wrote: Hi All, I've a field like that: When I run a suggester on my_field_1 it returns response. However my_field_2 doesn't. I've defined suggester as: suggester FuzzyLooku

Re: price sort

2016-11-14 Thread Emir Arnautovic
approach that does not change users intent . On Mon, Nov 14, 2016 at 2:38 PM, Emir Arnautovic < emir.arnauto...@sematext.com> wrote: Hi Midas, Sorting by price means that score (~relevancy) is ignored/used as second sorting criteria. My assumption is that you have long tail of false posit

Re: spell checking on query

2016-11-14 Thread Emir Arnautovic
Hi Midas, You can use Solr's spellcheck component: https://cwiki.apache.org/confluence/display/solr/Spell+Checking Emir On 14.11.2016 08:37, Midas A wrote: How can we do the query time spell checking with help of solr . -- Monitoring * Alerting * Anomaly Detection * Centralized Log Manag

Re: price sort

2016-11-14 Thread Emir Arnautovic
Hi Midas, Sorting by price means that score (~relevancy) is ignored/used as second sorting criteria. My assumption is that you have long tail of false positives causing sort by price to sort cheap, unrelated items first just because they matched by some stop word. Or I missed your question?

Re: Merge policy

2016-10-28 Thread Emir Arnautovic
I got some notification from mailer, so not sure if my reply reached you: "If you are using TieredMergePolicy, you can try setting /*reclaimDeletesWeight*/." HTH, Emir On 28.10.2016 09:20, Arkadi Colson wrote: The index size of 1 shard is about 125GB and we are running 11 shards with repl

Re: Comparing between 2 String fields

2016-10-28 Thread Emir Arnautovic
, Emir Arnautovic wrote: Hi Edwin, You can use functions to do that, e.g. fq={!frange l=1}strdist(field1,field2, edit) Solr now has eq func as well, so you can use that one in case you are running latest version. HTH, Emir On 27.10.2016 13:39, Zheng Lin Edwin Yeo wrote: Hi, Is it possible to

Re: Comparing between 2 String fields

2016-10-27 Thread Emir Arnautovic
Hi Edwin, You can use functions to do that, e.g. fq={!frange l=1}strdist(field1,field2, edit) Solr now has eq func as well, so you can use that one in case you are running latest version. HTH, Emir On 27.10.2016 13:39, Zheng Lin Edwin Yeo wrote: Hi, Is it possible to compare between 2 Str

Re: Query by distance

2016-10-13 Thread Emir Arnautovic
Hi, Did you try a simple phrase query: PositionNSD:"Chief Executive Officer"? Did you apply the synonym filter at query or index time? Emir On 11.10.2016 17:49, marotosg wrote: Hi, I have a field which contains Job Positions for people. This field uses a SynonymFilterFactory The field contains

Re: multivalued coordinate for geospatial search

2016-10-13 Thread Emir Arnautovic
Hi Chris, In order to make it work you have to concatenate lat/lon before it reaches indexing. You can do that by using processor chain and adding ConcatFieldUpdateProcessorFactory. Emir On 12.10.2016 11:26, Chris Chris wrote: Hello solr users! I am trying to use geospatial to do some bas

Re: How to retrieve 200K documents from Solr 4.10.2

2016-10-13 Thread Emir Arnautovic
Hi Obaid, You may also want to check out https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets Emir On 13.10.2016 00:33, Nick Vasilyev wrote: Check out cursorMark, it should be available in your release. There is some good information on this page: https://cwiki.apache.org/

Re: help with field definition

2016-09-16 Thread Emir Arnautovic
Hi, I missed that you already did define field and you are having troubles with query (did not read stackoverflow). Added answer there, but just in case somebody else is having similar troubles, issue is how query is written - space has to be escaped: q=Justin\ Bieber Regards, Emir On 13
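
The escaping mentioned above can be applied on the client before the query is sent. This is a minimal sketch that handles only the space character discussed in the message; the helper name is hypothetical, and other query-syntax characters are deliberately left alone.

```python
def escape_spaces(term: str) -> str:
    """Prefix each space with a backslash so the value stays one term."""
    return term.replace(" ", "\\ ")


# Produces the query value from the message: Justin\ Bieber
q = escape_spaces("Justin Bieber")
```

The escaped value is what goes into the q parameter, so the query parser does not split it at the space.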

Re: help with field definition

2016-09-14 Thread Emir Arnautovic
Hi Gandham, It seems to me that you need exact matches on singerName so it should be untokenized - use KeywordTokenizerFactory. If you want to make it case insensitive, add LowerCaseFilterFactory and that's for indexing. Query analysis chain can use standard tokenizer, LowerCaseFilterFactory

Re: Default stop word list

2016-09-09 Thread Emir Arnautovic
I would partially agree with Walter - having more resources allows us to include stopwords in index and let scoring model do its job. However, there are other Solr features that can suffer from that approach: e.g. if you use edismax and mm=80%, in case of query with stopwords, you can end up wi

Re: solr query time

2016-09-07 Thread Emir Arnautovic
Hi Kshitij, Query time depends on query parameters, number of docs matched, collection size, index size on disk, resources available and caches. Number of fields per doc will results in index being bigger on disk, but assuming there are enough resources - mainly RAM for OS caches - that shou

Re: Use function in condition

2016-09-05 Thread Emir Arnautovic
k you. Regards,Nabil. From: Emir Arnautovic To: solr-user@lucene.apache.org Sent: Monday, 5 September 2016 10:30 Subject: Re: Use function in condition Hi Nabil, It should work. I've just tested on gettingstarted collection (sample that comes with Solr) and this query return

Re: Function query. Not in range

2016-09-05 Thread Emir Arnautovic
Hi NKI, You'll have to negate range or negate - in case you expect only positive values then it would be {!frange l=100} and if you want to include negative results, you will have to use {!frange l=1}or(query($q1),query($q2))&q1={!frange u=0}sum(Field1, Ffield2)&q2={!frange l=100}sum(Field1,

Re: Use function in condition

2016-09-05 Thread Emir Arnautovic
ery($sub3)))&sub1=F3:Active&sub2={!frange u=2000}sum(F3,F4)&sub3={!frange l=3000}sum(F5,F6) Regards,Nabil. From: Emir Arnautovic To: solr-user@lucene.apache.org Sent: Monday, 29 August 2016 14:06 Subject: Re: Use function in condition Hi Nabil, Can you try following

Re: Always add the marker when elevating documents

2016-09-02 Thread Emir Arnautovic
Hi Alexandre, You can specify default fl parameter for search handler in Solr config. You can use *,[elevated] to return all fields + elevated, but it is recommended to limit fl to fields needed - if you truly need all fields, then using * is ok. Regards, Emir On 01.09.2016 22:11, Alexandre

Re: query issue

2016-08-31 Thread Emir Arnautovic
Hi Kris, It is because there is no token 'ddd' in content field. There are tokens that start with 'ddd', but that is not what you asked for. If you want 'ddd' to match 'd' then your query should be content:ddd* Please take a look at how Solr tokenization works: https://cwiki.apache.o

Re: Spike in SOLR Process and Frequent GC

2016-08-31 Thread Emir Arnautovic
Hi Thiru, Two Solrs with different data and usage patterns should be tuned separately and comparing one to another does not give much value. Like Shawn suggested, first thing that you can try is increase heap size. Having different Xms and Xmx is bad practice so make sure it is set to the sa

Re: Sorting on different language fields

2016-08-31 Thread Emir Arnautovic
time? Thank you, Vasu On Wed, Aug 31, 2016 at 12:46 PM, Emir Arnautovic < emir.arnauto...@sematext.com> wrote: Hi Vasu, It is expected behavior, and you can control it with sortMissingLast and sortMissingFirst. Here is comment from schema: In any case it does not seem right to me to have re

Re: Solr for Multi Tenant architecture

2016-08-31 Thread Emir Arnautovic
Hi Chamil, One thing to consider is relevancy, especially in case tenants' domains are different (e.g. one is tech and other pharmacy). If you go with one collection and use same field (e.g. desc) for all tenants, you will get one field stats and could skew results ordering if you order by sco

Re: Sorting on different language fields

2016-08-31 Thread Emir Arnautovic
Hi Vasu, It is expected behavior, and you can control it with sortMissingLast and sortMissingFirst. Here is comment from schema: In any case it does not seem right to me to have results first just because it is declared as French - in some cases it will be same as English version and will

Re: Monitoring Apache Solr

2016-08-30 Thread Emir Arnautovic
Hi Hardika, You can try Sematext's SPM: http://sematext.com/spm Regards, Emir -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://sematext.com/ On 30.08.2016 12:59, vrindavda wrote: Hi Hardika, To stop/restart solr you can try explor

Re: Use function in condition

2016-08-29 Thread Emir Arnautovic
syntax. Regards,Nabil. From: Emir Arnautovic To: solr-user@lucene.apache.org Sent: Thursday, 25 August 2016 16:51 Subject: Re: Use function in condition Hi Nabil, You have limited set functions, but there are logical functions: or, and, not and you have query function so can d

Re: High load, frequent updates, low latency requirement use case

2016-08-26 Thread Emir Arnautovic
Hi Brent, Please see inline comments. Thanks, Emir -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://sematext.com/ On 26.08.2016 04:51, Brent P wrote: I'm trying to set up a Solr Cloud cluster to support a system with the following

Re: Use function in condition

2016-08-25 Thread Emir Arnautovic
know that I can use multiple fq but the problem is I can have complex filter like (cond1 OR cond2 AND cond3) Could you please help. Regards,Nabil. From: Emir Arnautovic To: solr-user@lucene.apache.org Sent: Wednesday, 17 August 2016 17:08 Subject: Re: Use function in condition

Re: help with DIH transformer to add a suffix to column names

2016-08-23 Thread Emir Arnautovic
Hi Wendy, Why don't you simply specify column names in your query? Do you have that much columns that "SELECT *" is THE way to go? For the transformer - you changed the row, but fields in context are still using old names - maybe try setting field names in context (if possible - did not look

Re: help with DIH transformer to add a suffix to column names

2016-08-22 Thread Emir Arnautovic
Hi Wendy, It seems to me that you misunderstood concept of dynamic fields. It is something that is defined in Solr schema, e.g. *_text, and then in your DIH config you define fields that match that pattern, e.g. name_text, desc_text etc. HTH, Emir On 20.08.2016 00:58, Alexandre Rafalovitch

Re: Use function in condition

2016-08-17 Thread Emir Arnautovic
Hi Nabil, You can use frange queries, e.g. you can use fq={!frange l=100}sum(field1,field2) to filter doc with sum greater than 100. Regards, Emir On 17.08.2016 16:26, nabil Kouici wrote: Hi, Is it possible to use functions (function query https://cwiki.apache.org/confluence/display/solr/F
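
The frange filter from the reply above can be generated for any function and bounds. This is a sketch with a hypothetical helper name; it emits only the l and u local parameters. As far as I know the bounds are inclusive by default, so treat l=100 as "100 or more" unless that is verified otherwise.

```python
from urllib.parse import urlencode


def frange_filter(function, lower=None, upper=None):
    """Build an fq value that keeps docs whose function value is in range."""
    bounds = []
    if lower is not None:
        bounds.append(f"l={lower}")
    if upper is not None:
        bounds.append(f"u={upper}")
    return "{!frange " + " ".join(bounds) + "}" + function


# The filter from the message: sum of two fields with a lower bound of 100.
fq = frange_filter("sum(field1,field2)", lower=100)

# URL-encoded parameters, ready to append to a select request.
query_string = urlencode({"q": "*:*", "fq": fq})
```

Passing both lower and upper gives a two-sided range on the same function.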

Re: Indexing (posting document) taking a lot of time

2016-08-16 Thread Emir Arnautovic
document and i am sending 100 documents per request. solr heap size is 16gb and running on multithread. On Tue, Aug 16, 2016 at 5:10 PM, Emir Arnautovic < emir.arnauto...@sematext.com> wrote: Hi, 400KB/doc * 100doc = 40MB. If you are running it single threaded, Solr will be idle while acc

Re: Indexing (posting document) taking a lot of time

2016-08-16 Thread Emir Arnautovic
request are in same pool. On Tue, Aug 16, 2016 at 4:51 PM, Emir Arnautovic < emir.arnauto...@sematext.com> wrote: Hi, Do you send one doc per request? How frequently do you commit? Where is Solr running? What is network connection between your machine and Solr? What are JVM settings?

Re: Indexing (posting document) taking a lot of time

2016-08-16 Thread Emir Arnautovic
Hi, Do you send one doc per request? How frequently do you commit? Where is Solr running? What is network connection between your machine and Solr? What are JVM settings? Is 10-30s for entire indexing or single doc? Regards, Emir On 16.08.2016 11:34, kshitij tyagi wrote: Hi alexandre, 1 do

Re: insertion time

2016-08-15 Thread Emir Arnautovic
Hi Mahmoud, I haven't been looking for new DIH features, but I don't think there is anything that provides such functionality; the only thing you can do is track it in your source and index it (like createDate and lastUpdatedDate). Regards, Emir On 14.08.2016 20:56, Mahmoud Almok

Re: Effects of insert order on query performance

2016-08-12 Thread Emir Arnautovic
Hi Jeff, I will not comment on your theory (will let that to guys more familiar with Lucene code) but will point to one alternative solution: routing. You can use routing to split documents with different permission to different shards and use composite hash routing to split "A" (and maybe "B

Re: commit it taking 1300 ms

2016-08-11 Thread Emir Arnautovic
to say that we are not hard committing ). that curl takes time i.e. 1.3 sec. On Wed, Aug 10, 2016 at 2:29 PM, Emir Arnautovic < emir.arnauto...@sematext.com> wrote: Hi Midas, According to your autocommit configuration and your worry about commit time I assume that you are doing explicit c

Re: Help for -- Filter in the text field + highlight + no affect on boosting(if done with q instead of fq)

2016-08-10 Thread Emir Arnautovic
Hi Mayur, Not sure if I get your case completely, but if you need query but not sorted by score, you can use boost factors 0 in your edismax definition (e.g. qf=title^0) or you can order by doc id (sort= _docid_ asc) HTH, Emir -- Monitoring * Alerting * Anomaly Detection * Centralized Log Ma

Re: display filter based on existence of facet

2016-08-10 Thread Emir Arnautovic
Hi Derek, Not sure if there is some shortcut but you could try setting facet.sort=index and for sure use facet.limit=1. Regards, Emir -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://sematext.com/ On 10.08.2016 09:32, Derek Poh

Re: Solr 5.2.1 heap issues

2016-08-10 Thread Emir Arnautovic
Hi Preeti, 3GB heap is too small for such setup. I would try 10-15GB, but that depends on usage patterns. You have 50GB machine and assuming that you do not run anything other than solr you have 30GB to spare on Solr and still leave enough to OS to cache entire index. The best way to do heap

Re: commit it taking 1300 ms

2016-08-10 Thread Emir Arnautovic
10.08.2016 05:20, Midas A wrote: Thanks for replying index size:9GB 2000 docs/sec. Actually earlier it was taking less but suddenly it has increased . Currently we do not have any monitoring tool. On Tue, Aug 9, 2016 at 7:00 PM, Emir Arnautovic < emir.arnauto...@sematext.com> wrote: Hi

Re: commit it taking 1300 ms

2016-08-09 Thread Emir Arnautovic
Hi Midas, Can you give us more details on your index: size, number of new docs between commits. Why do you think 1.3s for commit is too much and why do you need it to take less? Did you do any system/Solr monitoring? Emir On 09.08.2016 14:10, Midas A wrote: please reply it is urgent. On Tue

Re: GC implications on Solr

2016-07-25 Thread Emir Arnautovic
Hi Madhur, Shawn described an extreme case (not unusual though), and it is not hard to detect since the effects will be catastrophic. You can use one of Solr monitoring tools to see how GC (and other interrupting events such as commits, segment merges, saturated network) affect Solr numbers. One such to

Re: Cold replication

2016-07-19 Thread Emir Arnautovic
Hi Mahmoud, What you can do is use local SSD disk as cache for EBS. You can try lvmcache or bcache. It will boost your performance while data will remain on EBS. Thanks, Emir -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://semate

Re: Define search query parameters in Solr or let clients applications craft them?

2016-06-14 Thread Emir Arnautovic
you for pointing out the cons of defining them in Solr config. One of the things I am worried about in letting the client application define the parameters is that the developers will use or include unnecessary, wrong and resource intensive parameters. On 6/13/2016 5:50 PM, Emir Arnautovic wrote: Hi

Re: Define search query parameters in Solr or let clients applications craft them?

2016-06-13 Thread Emir Arnautovic
Hi Derek, Maybe I am looking at this from the perspective of someone who works with other people's setups, but I prefer when it is defined in Solr configs: I can get a sense of the queries from looking at the configs, you have a mechanism to lock some parameters, updates are centralized... However, it does come with s

Re: Indexing date types

2016-06-03 Thread Emir Arnautovic
Hi Steve, The best way to make sure everything work is to test, but without testing on target version, my answers would be: 1. if Solr accepts date without time it'll be the same as time 00:00:00 so if it does not accept, you can always append. 2. it'll work just expect that sum of facet count c

Re: solr sql & streaming

2016-04-28 Thread Emir Arnautovic
Hi Shani, Are you running in SolrCloud mode? Here is blog post you can follow: https://sematext.com/blog/2016/04/18/solr-6-solrcloud-sql-support/ Thanks, Emir -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://sematext.com/ On 28.

Re: Optimal indexing speed in Solr

2016-04-14 Thread Emir Arnautovic
Hi Edwin, Indexing speed depends on multiple factors: HW, Solr configurations and load, documents, indexing client: More complex documents, more CPU time to process each document before indexing structure is written down to disk. Bigger the index, more heap is used, more frequent GCs. Maybe you

Re: Facet by truncated date

2016-03-31 Thread Emir Arnautovic
don't want to specify a range? Or would I have to do year 0 to NOW? Thanks, Rob On 03/31/2016 10:26 AM, Emir Arnautovic wrote: Hi Yago, Not sure if I misunderstood the case, but assuming you have date field called my_date you can facet last 10 days by day using range qu

Re: Complex Sort

2016-03-31 Thread Emir Arnautovic
You would have to write your custom function for that. On 31.03.2016 11:24, ~$alpha` wrote: I am not sure how to use "Sort By Function" for Case. |10#40|14#19|33#17|27#6|15#6|19#5|7#2|6#1|29#1|5#1|30#1|28#1|12#0|20#0| Can you tell how to fetch 40 when input is 10. -- View this message in co

Re: Facet by truncated date

2016-03-31 Thread Emir Arnautovic
3ce59) On Mar 31 2016, at 10:08 am, Emir Arnautovic <emir.arnauto...@sematext.com> wrote: Hi Robert, You can use range faceting and use facet.range.gap to set how dates are "truncated". Regards, Emir On 31.03.2016 10:52, Robert Brown wrote: > Hi, > > Is it possi

Re: Facet by truncated date

2016-03-31 Thread Emir Arnautovic
Hi Robert, You can use range faceting and use facet.range.gap to set how dates are "truncated". Regards, Emir On 31.03.2016 10:52, Robert Brown wrote: Hi, Is it possible to facet by a date (solr.TrieDateField) but truncated to the day, or even the hour? If not, are there any other opt
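
The range-faceting parameters referred to above can be put together as follows. This is a sketch only: the field name my_date and the ten-day window are assumptions chosen for illustration, and the helper name is hypothetical.

```python
from urllib.parse import urlencode


def day_facet_params(date_field, days_back):
    """Parameters that bucket a date field into one facet count per day."""
    return {
        "q": "*:*",
        "rows": 0,  # only the facet counts are needed
        "facet": "true",
        "facet.range": date_field,
        "facet.range.start": f"NOW/DAY-{days_back}DAYS",
        "facet.range.end": "NOW/DAY+1DAY",
        "facet.range.gap": "+1DAY",  # bucket width; "+1HOUR" would give hourly buckets
    }


params = day_facet_params("my_date", 10)
query_string = urlencode(params)  # percent-encodes "+" and "/" for the URL
```

Changing the gap is what changes the "truncation" level, so the same request shape covers both the per-day and the per-hour case.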

Re: Complex Sort

2016-03-31 Thread Emir Arnautovic
Hi, Not sure if I fully understood your case, but here are some ideas: - if you have small number of ids you can have score_%id% field that can be used for sorting - if number of ids is large you can use sort by function to parse score data and find right score - if number of results is small,

Re: Use default field, if more specific field does not exist

2016-03-28 Thread Emir Arnautovic
d the individual price values (not the buckets), just like facet.field=price but with respect to the user prices. Is this possible as well? About the performance: Are there any specific bottlenecks you would expect? Best regards, Georg Emir Arnautovic wrote on Fri, 25 March 2016 at 11:47: Hi

Re: Use default field, if more specific field does not exist

2016-03-25 Thread Emir Arnautovic
Hi Georg, One solution that could work on existing schema is to use query faceting and queries like (for USER_ID = 1, bucket 100 to 200): price_1:[100 TO 200] OR (-price_1:[* TO *] AND price:[100 TO 200]) Same query is used for filtering. What you should test is if performance is acceptable
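
The fallback query above can be generated per user and per bucket. This is a minimal sketch: the helper name and the second bucket are hypothetical, and the price_<id> naming follows the example in the message.

```python
def user_price_query(user_id, low, high):
    """Match the user's own price if present, else the default price."""
    user_field = f"price_{user_id}"
    price_range = f"[{low} TO {high}]"
    return (
        f"{user_field}:{price_range} OR "
        f"(-{user_field}:[* TO *] AND price:{price_range})"
    )


# One query string per bucket, for USER_ID = 1.
buckets = [(100, 200), (200, 300)]
facet_queries = [user_price_query(1, low, high) for low, high in buckets]
```

Each string can be sent as a facet.query value, and the same string reused as fq when the user selects a bucket.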

Re: Document Cache

2016-03-19 Thread Emir Arnautovic
Hi, Your cache will be cleared on soft commits - every two minutes. It seems that it is either configured to be huge, or you have big documents and are retrieving all fields, or you don't have lazy field loading set to true. Can you please share your document cache config and heap settings. Thanks, Emir

Re: Document Cache

2016-03-19 Thread Emir Arnautovic
Problem starts with autowarmCount="5000" - that executes 5000 queries when new searcher is created and as queries are executed, document cache is filled. If you have large queryResultWindowSize and queries return big number of documents, that will eat up memory before new search is executed. It

Re: Document Cache

2016-03-19 Thread Emir Arnautovic
/16 8:56 AM, Emir Arnautovic wrote: Problem starts with autowarmCount="5000" - that executes 5000 queries when new searcher is created and as queries are executed, document cache is filled. If you have large queryResultWindowSize and queries return big number of documents, that will eat

Re: Text search NGram

2016-03-16 Thread Emir Arnautovic
all copies of this email and its attachments. The publication, copying, in whole or in part, or use or dissemination in any other way of this e-mail and attachments by anyone other than the intended person(s) is prohibited. -Original Message----- From: Emir Arnautovic [mailto:em

Re: Text search NGram

2016-03-16 Thread Emir Arnautovic
y anyone other than the intended person(s) is prohibited. -Original Message- From: Emir Arnautovic [mailto:emir.arnauto...@sematext.com] Sent: Wednesday, March 16, 2016 2:39 PM To: solr-user@lucene.apache.org Subject: Re: Text search NGram Hi Rajesh, Did you reindex after setting omitNor

Re: Text search NGram

2016-03-16 Thread Emir Arnautovic
alent Measurement products and services. If you have received this e-mail in error, please notify the sender and immediately, destroy all copies of this email and its attachments. The publication, copying, in whole or in part, or use or dissemination in any other way of this e-mail and attachmen

Re: ngrams with position

2016-03-11 Thread Emir Arnautovic
e trying to solve with that complex way of tokenisation? Solr is really good at storing positions along with tokens, so I am curious to know why you are mixing things up. Cheers On 8 March 2016 at 10:08, elisabeth benoit <elisaelisael...@gmail.com> wrote: Thanks

Re: ngrams with position

2016-03-08 Thread Emir Arnautovic
Hi Elisabeth, I don't think there is such a token filter, so you would have to create your own token filter that takes a token and emits ngram tokens of a specific length. It should not be too hard to create such a filter - you can take a look at how the ngram filter is coded - yours should be simpler than th

Re: Text search NGram

2016-03-07 Thread Emir Arnautovic
other way of this e-mail and attachments by anyone other than the intended person(s) is prohibited. -Original Message- From: Emir Arnautovic [mailto:emir.arnauto...@sematext.com] Sent: Monday, March 7, 2016 7:36 PM To: solr-user@lucene.apache.org Subject: Re: Text search NGram Hi Rajes

Re: Text search NGram

2016-03-07 Thread Emir Arnautovic
, please notify the sender and immediately, destroy all copies of this email and its attachments. The publication, copying, in whole or in part, or use or dissemination in any other way of this e-mail and attachments by anyone other than the intended person(s) is prohibited. -----Original Mess

Re: Text search NGram

2016-03-07 Thread Emir Arnautovic
Hi Rajesh, It is most likely related to norms - you can try setting omitNorms="true" and reindexing content. Anyway, it is not common to use just ngrams for matching content - in such case you can expect more unexpected ordering/results. You should combine ngrams fields with normally tokenized

Re: Spatial Search on Postal Code

2016-03-04 Thread Emir Arnautovic
Emir, Obviously the #2 approach is much better. I know it's not straightforward. But is it really achievable in Solr? Like building a polygon for a postal code. If so, can you throw some light on how to do it? Thanks, Manohar On Friday, March 4, 2016, Emir Arnautovic wrote: Hi Manohar, This depends on

Re: Spatial Search on Postal Code

2016-03-04 Thread Emir Arnautovic
Hi Manohar, This depends on your requirements/usecase. If the postal code is interpreted as a point, then the radius is expected to be significantly larger than the postal code diameter. In such a case you can go with the first approach. In order to avoid missing results from postal code in case of sma

Re: Commit after every document - alternate approach

2016-03-04 Thread Emir Arnautovic
Hi Sangeetha, It seems to me that you are using Solr as primary data store? If that is true, you should not do that - you should have some other store that is transactional and can support what you are trying to do with Solr. If you are not using Solr as primary store, and it is critical to hav

Re: Commit after every document - alternate approach

2016-03-02 Thread Emir Arnautovic
Hi Sangeetha, What is sure is that it is not going to work - with 200-300K doc/hour, there will be >50 commits/second, meaning there is <20ms for doc+commit. What you can do is let Solr handle commits and maybe use real time get to verify doc is in Solr or do some periodic sanity checks. Are y
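
The arithmetic behind the figures above can be checked with a short sketch. The function name commit_budget is hypothetical; the only inputs are the 200-300K documents per hour quoted in the message and the assumption of one commit per document.

```python
def commit_budget(docs_per_hour: int) -> tuple[float, float]:
    """Return (commits per second, milliseconds available per doc+commit),
    assuming one commit is issued for every indexed document."""
    commits_per_second = docs_per_hour / 3600
    ms_per_doc = 1000 / commits_per_second
    return commits_per_second, ms_per_doc


if __name__ == "__main__":
    for rate in (200_000, 300_000):
        per_second, budget_ms = commit_budget(rate)
        print(f"{rate} docs/hour: {per_second:.1f} commits/s, {budget_ms:.1f} ms each")
```

At the lower rate this gives roughly 56 commits per second and 18 ms per document, which is where the ">50 commits/second" and "<20ms" bounds come from.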
