Re: integrate solr with preprocessor tools

2015-12-16 Thread Emir Arnautovic
must use solr interfaces? i see in above link that i can use solr analyzer.but how i use that? plz say me how i start to write my own analyzer step by step... which interface i can use and change to achieve my goal? tnx On Wed, Dec 9, 2015 at 1:50 AM, Emir Arnautovic < emir.arnauto...@sematext.

Re: TPS with Solr Cloud

2015-12-21 Thread Emir Arnautovic
Hi Anshul, TPS depends on the number of concurrent requests you can run and on request processing time. With sharding you reduce processing time by reducing the amount of data a single node processes, but you have the overhead of inter-shard communication and of merging results from different shards. If that overh

Re: new data structure for some fields

2015-12-21 Thread Emir Arnautovic
Maybe missing something but if c and b are one-to-one and you are filtering by c, how can you sort on b since all values will be the same? On 21.12.2015 13:10, Abhishek Mishra wrote: Hi binoy it will not work as category and integer is one to one mapping so if category_id is multivalued same go

Re: Best practices on monitoring Solr

2015-12-23 Thread Emir Arnautovic
Hi Shail, As William mentioned, our SPM allows you to monitor all main Solr/JVM/host metrics and also set up alerts for some values or use anomaly detection to notify you when something is about to go wrong. You can test all features for free for 30 days (

Re: Does soft commit re-opens searchers in disk?

2016-01-04 Thread Emir Arnautovic
Hi Gili, Visibility is related to searcher - if you reopen searcher it will be visible. If hard commit happens without reopening searcher, documents will not be visible till next soft commit happens. You can find more details about commits on https://lucidworks.com/blog/2013/08/23/understanding

Re: Query behavior difference.

2016-01-06 Thread Emir Arnautovic
Hi Modassar, It usually helps if you analyze extreme case: e.g. fl:a* What terms should be better match? Those who are shorter or all should be equally good? What should be top document? Assuming standard TF/IDF scoring is used, that would be one with the most terms that start with 'a' especiall

Re: Newbie: Searching across 2 collections ?

2016-01-06 Thread Emir Arnautovic
Hi Bruno, Can you check counts? Is it possible that first page is only with results from collection that you sent request to so you assumed it returns only results from single collection? Thanks, Emir On 06.01.2016 14:33, Susheel Kumar wrote: Hi Bruno, I just tested this scenario in my loca

Re: solr BooleanClauses issue with space

2016-01-13 Thread Emir Arnautovic
Hi Sara, You can run your query (or smaller one) with debugQuery=true and see how it is rewritten. Thanks, Emir On 13.01.2016 16:01, sara hajili wrote: tnx. and my main question is about maxBooleanDefault in solr config. it is 1024 by default. and i have a edismax query with about 500 words i

Re: Position increment in WordDelimiterFilter.

2016-01-14 Thread Emir Arnautovic
Hi Modassar, Why do you think it should be at position 1? In that case searching for "3 d" would not find anything. Is it what you expect? Thanks, Emir On 14.01.2016 10:15, Modassar Ather wrote: Hi, I have following definition for WordDelimiterFilter. The analysis of 3d shows following fo

Re: Position increment in WordDelimiterFilter.

2016-01-14 Thread Emir Arnautovic
's what I get: 3d 1 3 1 d 2 3d 2 1) can you confirm if you've made a typo while typing out your results? 2 ) you'll get the d and 3d as 2 since they're the 2nd token once 3d is split. Try the same thing with d3 and you'll get 3 and d3 at position 2 On Thu, 14 Jan 2016,

Re: Position increment in WordDelimiterFilter.

2016-01-15 Thread Emir Arnautovic
Modassar, Are you saying that WiFi Wi-Fi and Wi Fi should not match each other? Why do you use WordDelimiterFilter? Can you give us few examples where it is useful? Thanks, Emir On 15.01.2016 05:13, Modassar Ather wrote: Thanks for your responses. It seems to me that you don't want to split

Re: Speculation on Memory needed to efficently run a Solr Instance.

2016-01-15 Thread Emir Arnautovic
Hi, The OS does not care much about search vs. retrieve, so the amount of RAM needed for file caches depends on your index usage patterns. If you are not retrieving stored fields much and most/all results are only id+score, then it can be assumed that you can go with less RAM than actual index si

Re: Position increment in WordDelimiterFilter.

2016-01-15 Thread Emir Arnautovic
en in two terms like lucene and search then it will be helpful to get the documents containing it for queries like lucene documentation or search documentation. Best, Modassar On Fri, Jan 15, 2016 at 2:14 PM, Emir Arnautovic < emir.arnauto...@sematext.com> wrote: Modassar, Are you saying

Re: ramBufferSizeMB and maxIndexingThreads

2016-01-20 Thread Emir Arnautovic
Kind of obvious/logical, but seen some people forgetting that it is per core - if single node host multiple shards, each will take 100MB. Emir -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://sematext.com/ On 20.01.2016 07:02, Sha

Re: Returning all documents in a collection

2016-01-20 Thread Emir Arnautovic
Hi Salman, You should use cursors in order to avoid "deep paging issues". Take a look at https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results. Regards, Emir -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://semate
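
A minimal Python sketch of the cursor loop described above. The cursorMark and nextCursorMark parameters and the requirement to sort on the unique key come from Solr's cursor paging. The fetch_page callable, the "id" key name and the page size are assumptions for illustration; fetch_page stands in for whatever HTTP or SolrJ-style client actually sends the request.

```python
from typing import Callable, Dict, Iterator


def iterate_all_docs(
    fetch_page: Callable[[Dict[str, str]], dict],
    rows: int = 100,
    unique_key: str = "id",  # assumed name of the collection's uniqueKey field
) -> Iterator[dict]:
    """Yield every document by following Solr cursor marks.

    fetch_page is a hypothetical helper: it takes request parameters and
    returns the parsed JSON response of a /select request.
    """
    cursor = "*"  # "*" asks Solr to start a new cursor
    while True:
        params = {
            "q": "*:*",
            "rows": str(rows),
            "sort": f"{unique_key} asc",  # cursors need a sort that includes the unique key
            "cursorMark": cursor,
        }
        response = fetch_page(params)
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:  # same mark returned: nothing left to read
            break
        cursor = next_cursor
```

Unlike start/rows paging, each request here costs roughly the same regardless of how deep into the result set it is.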

Re: solr score threashold

2016-01-20 Thread Emir Arnautovic
Hi Sara, You can use func and frange to achieve what you need, but note that scores are not normalized, meaning a score of 8 does not mean it is a good match - it is just the best match. There are examples online of how to normalize scores (e.g. http://wiki.apache.org/lucene-java/ScoresAsPercentages). Other approach

Re: Couple of question about Virtualization and Load Balancer

2016-01-22 Thread Emir Arnautovic
There is other reason to avoid virtualization - fault tolerance. It is common to use virtualization on huge box and keep replications on same box. Such setup will survive VM failure but not HW failure. Regards, Emir On 22.01.2016 11:05, Gian Maria Ricci - aka Alkampfer wrote: Thanks, my actua

Re: Understanding solr commit

2016-01-25 Thread Emir Arnautovic
Hi Rahul, If I got your mail right, there is a misconception about SolrCloud - nodes are the infrastructure of the cloud and the collection is the "unit". So when you commit, you are committing changes you made to the collection, and SolrCloud will handle the nodes. When you commit to three nodes it is ac

Re: Understanding solr commit

2016-01-25 Thread Emir Arnautovic
Hi Rahul, It is good that you commit only once, but not sure how external commits can do something auto commit cannot. Can you give us bit more details about Solr heap parameters. Running Solr on the edge of OOM is always risk of starting snowball effect and crashing entire cluster. Also can yo

Re: Understanding solr commit

2016-01-25 Thread Emir Arnautovic
info about auto commit (both hard and soft) you used when experienced OOM. 15000 15000 false soft commit is not enabled. -Rahul On Mon, Jan 25, 2016 at 6:00 PM, Emir Arnautovic < emir.arnauto...@sematext.com> wrote: Hi Rahul, It is good that you commit only once, but not sure how exter

Re: indexing rich data with solr 5.3.1 integreting in Ubuntu server

2016-01-26 Thread Emir Arnautovic
Hi, I would first check if external libraries are present and loaded. How do you start Solr? Try explicitly setting solr.install.dir or set absolute path to libs and see in logs if they are loaded. Thanks, Emir On 25.01.2016 15:16, kostali hassan wrote:

Re: To Detect Wheter Core is Available To Post

2016-01-26 Thread Emir Arnautovic
Hi Edwin, Assuming you are using SolrCloud - why do you need specific core? Can you use some of status actions from collection API - there is CLUSTERSTATUS action? Thanks, Emir On 26.01.2016 05:34, Edwin Lee wrote: Hi All, Our team is using the Solr to process log and we met a problem in SO

Re: Apache solr can be made near-real-Time???

2016-01-28 Thread Emir Arnautovic
Hi Samina, First to thank you for teaching me what "lakh" is :) Solr is capable of handling large amount of data, but that requires large Solr cluster. What you need to determine is what is your real time - what is max time you can tolerate update to be visible; and determine acceptable query

Re: implement exact match for one of the search fields only?

2016-01-28 Thread Emir Arnautovic
Hi Derek, It is not clear what you are trying to achieve: "one of the search fields is an exact phrase match while the rest of the search fields can be exact or partial matches". What does "while" mean - it has to match in other fields as well or result should be scored better if it does but n

Re: implement exact match for one of the search fields only?

2016-01-29 Thread Emir Arnautovic
http://sematext.com/ On 29.01.2016 02:03, Derek Poh wrote: Hi Emir For the other search fields, if they have matches it should be return. On 1/28/2016 8:17 PM, Emir Arnautovic wrote: Hi Derek, It is not clear what you are trying to achieve: "one of the search fields is an exact phrase match while

Re: Solr segment merging in different replica

2016-02-01 Thread Emir Arnautovic
Hi Edwin, What is your setup - SolrCloud or Master-Slave? If it is SolrCloud, then under normal index updates, each core behaves as an independent index. In theory, if all changes happen at the same time on all nodes, merges will happen at the same time. But that is not realistic and it is ex

Re: Solr segment merging in different replica

2016-02-02 Thread Emir Arnautovic
lso, will it be good to use a separate network interface to connect the two node with the interface that is used to connect to the network for searching? Regards, Edwin On 1 February 2016 at 19:01, Emir Arnautovic wrote: Hi Edwin, What is your setup - SolrCloud or Master-Slave? If it is SolrCl

Re: Solr segment merging in different replica

2016-02-03 Thread Emir Arnautovic
e querying. This issue should be eliminated when I shift my replica to another server. Would like to check, will there be any advantage if I change to the Master-Slave setup, as compared to the SolrCloud setup which I am currently using? Regards, Edwin On 2 February 2016 at 21:23, Emir Arnauto

Re: solr performance issue

2016-02-08 Thread Emir Arnautovic
Hi Sara, Not sure if I am reading this right, but I read it as you have 1000 doc index and issues? Can you tell us bit more about your setup: number of servers, hw, index size, number of shards, queries that you run, do you index at the same time... It seems to me that you are running Solr on

Re: solr performance issue

2016-02-08 Thread Emir Arnautovic
Hi Sara, It is still considered to be small index. Can you give us bit details about your setup? Thanks, Emir On 08.02.2016 12:04, sara hajili wrote: sorry i made a mistake i have a bout 1000 K doc. i mean about 100 doc. On Mon, Feb 8, 2016 at 1:35 AM, Emir Arnautovic < emir.arna

Re: Solr architecture

2016-02-08 Thread Emir Arnautovic
Hi Mark, Can you give us bit more details: size of docs, query types, are docs grouped somehow, are they time sensitive, will they update or it is rebuild every time, etc. Thanks, Emir On 08.02.2016 16:56, Mark Robinson wrote: Hi, We have a requirement where we would need to index around 2 B

Re: Solr architecture

2016-02-10 Thread Emir Arnautovic
of 2 billion docs as NRT or if it will be offline (during off hours etc). For more accurate sizing you may also want to index say 10 million documents which may give you idea how much is your index size and then use that for extrapolation to come up with memory requirements. Thanks

Re: Solr architecture

2016-02-11 Thread Emir Arnautovic
so could some one please recommend a sizing to cater to this levels of data. The queries per second is around 320 qps. Thanks! Mark On Wed, Feb 10, 2016 at 3:38 AM, Emir Arnautovic < emir.arnauto...@sematext.com> wrote: Hi Mark, Appending session actions just to be able to return more th

Re: solr-4.3.1 docValues usage

2016-02-15 Thread Emir Arnautovic
Hi, Not sure how ordering will help (maybe missing question) but what seems to me that would help your case is simple boosting. See https://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_make_.22superman.22_in_the_title_field_score_higher_than_in_the_subject_field Regards, Emir On 15.02.201

Re: solr-4.3.1 docValues usage

2016-02-15 Thread Emir Arnautovic
Sorry - replied to wrong thread :( On 15.02.2016 15:17, Emir Arnautovic wrote: Hi, Not sure how ordering will help (maybe missing question) but what seems to me that would help your case is simple boosting. See https://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_make_.22superman

Re: SOLR ranking

2016-02-15 Thread Emir Arnautovic
Hi, Not sure how ordering will help (maybe missing question) but what seems to me that would help your case is simple boosting. See https://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_make_.22superman.22_in_the_title_field_score_higher_than_in_the_subject_field Regards, Emir On 15.02.2

Re: SOLR ranking

2016-02-15 Thread Emir Arnautovic
Hi Nitin, You can use pf parameter to boost results with exact phrase. You can also use pf2 and pf3 to boost results with bigrams (phrase matches with 2 or 3 words in case input is with more than 3 words) Regards, Emir On 16.02.2016 06:18, Nitin.K wrote: I am using edismax parser with the fo
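
As a rough illustration of the advice above, the request parameters might be assembled like this in Python. The parameter names (defType, qf, pf, pf2, pf3) are edismax parameters; the field names and boost values are hypothetical, and the phrase fields are assumed to be tokenized text fields rather than string fields.

```python
def edismax_phrase_boost_params(user_query: str) -> dict:
    """Build edismax parameters that favor phrase matches.

    Field names and boosts are made-up examples, not taken from the thread.
    """
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": "title^10 content",  # fields searched term by term
        "pf": "title^50",          # boost docs where the whole query matches as a phrase
        "pf2": "title^20",         # boost docs matching any two adjacent query words
        "pf3": "title^30",         # boost docs matching any three adjacent query words
    }
```

The phrase parameters only influence ranking; which documents match is still decided by q and qf.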

Re: SOLR ranking

2016-02-16 Thread Emir Arnautovic
Hi Nitin, Not sure if you changed what fields you use for phrase boost, but in example you sent, all fields except content are "string" fields and content is boosted with 6 while topic_title in qf is boosted with 100. Try setting same field you use in qf in pf2 and you should see the differenc

Re: Which open-source crawler to use with SolrJ and Postgresql ?

2016-02-16 Thread Emir Arnautovic
Hi, It is most common to use Nutch as crawler, but it seems that it still does not have support for SolrCloud (if I am reading this ticket correctly https://issues.apache.org/jira/browse/NUTCH-1662). Anyway, I would recommend Nutch with standard http client. Regards, Emir On 16.02.2016 16:02

Re: Which open-source crawler to use with SolrJ and Postgresql ?

2016-02-16 Thread Emir Arnautovic
Markus, Ticket I run into is for Nutch2 and NUTCH-2197 is for Nutch1. Haven't been using Nutch for a while so cannot recommend version. Thanks, Emir On 16.02.2016 16:37, Markus Jelsma wrote: Nutch has Solr 5 cloud support in trunk, i committed it earlier this month. https://issues.apache.org/j

Re: SOLR ranking

2016-02-18 Thread Emir Arnautovic
Hi Nitin, Can you send us what your parsed query looks like (from debug output)? Thanks, Emir On 17.02.2016 08:38, Nitin.K wrote: Hi Binoy, We are searching for both phrases and individual words but we want that only those documents which are having phrases will come first in the order and then

Re: Sort vs boost

2016-02-22 Thread Emir Arnautovic
Hi Anil, The decision also depends on your use case - if you are sure that there will be no cases where matching documents have different scores, or you don't care how well a document matches the query (e.g. all queries will be single-term queries), then sorting by time is the way to go. But, if there is cha

Re: Query time de-boost

2016-02-24 Thread Emir Arnautovic
Hi Shamik, Is boosting others acceptable option to you, e.g. ContentGroup:"NonDeveloper"^100. Which query parser do you use? Regards, Emir -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://sematext.com/ On 23.02.2016 23:42, Shami

Re: What search metrics are useful?

2016-02-24 Thread Emir Arnautovic
Hi Bill, You can take a look at Sematext's search analytics (https://sematext.com/search-analytics). It provides some of metrics you mentioned, plus some additional (top queries, CTR, click stats, paging stats etc.). In combination with Sematext's performance metrics (https://sematext.com/spm)

Re: Query time de-boost

2016-02-25 Thread Emir Arnautovic
Hi Shamik, You are right, boosting with values lower than 1 is still positive, but you can boost with a negative value and that should do the trick, so you can do bq=ContenGroup-local:Developer^-99 (note that it can result in a negative score). If you need more than just Developer/Others you

Re: Query time de-boost

2016-02-26 Thread Emir Arnautovic
Hi Jack, I just checked on 5.5 and 0.1 is positive boost. Regards, Emir On 26.02.2016 01:11, Jack Krupansky wrote: 0.1 is a fractional boost - all intra-query boosts are multiplicative, not additive, so term^0.1 reduces the term by 90%. -- Jack Krupansky On Wed, Feb 24, 2016 at 11:29 AM, sham

Re: Query time de-boost

2016-02-28 Thread Emir Arnautovic
f 0.5 boosted by 0.1 would become 0.05. IOW, it de-boosts occurrences of the term. The point remains that you do not need a "negative boost" to de-boost a term. -- Jack Krupansky On Fri, Feb 26, 2016 at 4:01 AM, Emir Arnautovic < emir.arnauto...@sematext.com> wrote: Hi Jack, I ju

Re: Filter factory to reduce word from plural forms to singular forms correctly?

2016-02-29 Thread Emir Arnautovic
Hi Derek, Why does aggressive stemming worry you? You might have false positives, but that is desired behavior in most cases. In your case "iphone" documents will also be returned for "iphon" query. Is this something that is not desired behavior? You can have more than one field if you want

Re: Indexing books, chapters and pages

2016-03-01 Thread Emir Arnautovic
Hi, From the top of my head - probably does not solve problem completely, but may trigger brainstorming: Index chapters and include page break tokens. Use highlighting to return matches and make sure fragment size is large enough to get page break token. In such scenario you should use slop fo

Re: FW: Difference Between Tokenizer and filter

2016-03-02 Thread Emir Arnautovic
Hi Rajesh, Processing flow is same for both indexing and querying. What is compared at the end are resulting tokens. In general flow is: text -> char filter -> filtered text -> tokenizer -> tokens -> filter1 -> tokens ... -> filterN -> tokens. You can read more about analysis chain in Solr wi
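
The flow above (char filter, then tokenizer, then a chain of token filters) can be mimicked with a toy Python pipeline. This is only a sketch of the idea, not Solr's analysis API; every function and the stop-word list are invented for illustration.

```python
import re

STOP_WORDS = {"and", "the"}  # toy stop-word list, assumed for the example


def char_filter(text: str) -> str:
    """Works on raw text before tokenization."""
    return text.replace("&", " and ")


def tokenizer(text: str) -> list:
    """Splits the filtered text into tokens."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens: list) -> list:
    """Token filter 1: normalize case."""
    return [t.lower() for t in tokens]


def stop_filter(tokens: list) -> list:
    """Token filter 2: drop stop words."""
    return [t for t in tokens if t not in STOP_WORDS]


def analyze(text: str) -> list:
    """text -> char filter -> tokenizer -> filter1 -> filter2 -> tokens."""
    tokens = tokenizer(char_filter(text))
    for token_filter in (lowercase_filter, stop_filter):
        tokens = token_filter(tokens)
    return tokens
```

Running the same chain on the indexed text and on the query text is what makes "Solr & Lucene" and "solr lucene" end up as the same tokens, and those final tokens are what get compared.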

Re: understand scoring

2016-03-02 Thread Emir Arnautovic
Hi Michael, Can you please run query with debug and share title field configuration. Thanks, Emir -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://sematext.com/ On 02.03.2016 09:14, michael solomon wrote: Thanks you, @Doug Turnbu

Re: Commit after every document - alternate approach

2016-03-02 Thread Emir Arnautovic
Hi Sangeetha, What is sure is that it is not going to work - with 200-300K doc/hour, there will be >50 commits/second, meaning there is <20ms for doc+commit. What you can do is let Solr handle commits and maybe use real-time get to verify that the doc is in Solr, or do some periodic sanity checks. Are y
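
The numbers above can be checked with a few lines of Python. The only assumptions are the ones stated in the message: one commit per document and an even spread of 200-300K documents over an hour.

```python
SECONDS_PER_HOUR = 3600


def commits_per_second(docs_per_hour: float) -> float:
    """One commit per document, spread evenly over the hour."""
    return docs_per_hour / SECONDS_PER_HOUR


def time_budget_ms(docs_per_hour: float) -> float:
    """Milliseconds available to index one document and commit it."""
    return 1000.0 / commits_per_second(docs_per_hour)


for rate in (200_000, 300_000):
    print(f"{rate} docs/hour: {commits_per_second(rate):.1f} commits/s, "
          f"{time_budget_ms(rate):.1f} ms per doc+commit")
```

At the low end of the range this is already about 56 commits per second, leaving roughly 18 ms per document and commit.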

Re: Commit after every document - alternate approach

2016-03-04 Thread Emir Arnautovic
Hi Sangeetha, It seems to me that you are using Solr as primary data store? If that is true, you should not do that - you should have some other store that is transactional and can support what you are trying to do with Solr. If you are not using Solr as primary store, and it is critical to hav

Re: Spatial Search on Postal Code

2016-03-04 Thread Emir Arnautovic
Hi Manohar, This depends on your requirements/use case. If postal code is interpreted as a point, then the radius is expected to be significantly larger than the postal code diameter. In that case you can go with the first approach. In order to avoid missing results from postal code in case of sma

Re: Spatial Search on Postal Code

2016-03-04 Thread Emir Arnautovic
Emir, Obviously #2 approach is much better. I know its not straight forward. But, is it really acheivable in Solr? Like building a polygon for a postal code. If so, can you throw some light how to do? Thanks, Manohar On Friday, March 4, 2016, Emir Arnautovic wrote: Hi Manohar, This depends on

Re: Text search NGram

2016-03-07 Thread Emir Arnautovic
Hi Rajesh, It is most likely related to norms - you can try setting omitNorms="true" and reindexing content. Anyway, it is not common to use just ngrams for matching content - in such case you can expect more unexpected ordering/results. You should combine ngrams fields with normally tokenized

Re: Text search NGram

2016-03-07 Thread Emir Arnautovic
, please notify the sender and immediately, destroy all copies of this email and its attachments. The publication, copying, in whole or in part, or use or dissemination in any other way of this e-mail and attachments by anyone other than the intended person(s) is prohibited. -----Original Mess

Re: Text search NGram

2016-03-07 Thread Emir Arnautovic
-Original Message- From: Emir Arnautovic [mailto:emir.arnauto...@sematext.com] Sent: Monday, March 7, 2016 7:36 PM To: solr-user@lucene.apache.org Subject: Re: Text search NGram Hi Rajes

Re: ngrams with position

2016-03-08 Thread Emir Arnautovic
Hi Elisabeth, I don't think there is such a token filter, so you would have to create your own token filter that takes a token and emits ngram tokens of a specific length. It should not be too hard to create such a filter - you can take a look at how the ngram filter is coded - yours should be simpler than th

Re: ngrams with position

2016-03-11 Thread Emir Arnautovic
e trying to solve with that complex way of tokenisation ? Solr is really good in storing positions along with token, so I am curious to know why your are mixing the things up. Cheers On 8 March 2016 at 10:08, elisabeth benoit < elisaelisael...@gmail.com> wrote: Thanks

Re: Text search NGram

2016-03-16 Thread Emir Arnautovic

Re: Text search NGram

2016-03-16 Thread Emir Arnautovic
-Original Message- From: Emir Arnautovic [mailto:emir.arnauto...@sematext.com] Sent: Wednesday, March 16, 2016 2:39 PM To: solr-user@lucene.apache.org Subject: Re: Text search NGram Hi Rajesh, Did you reindex after setting omitNor

Re: Text search NGram

2016-03-16 Thread Emir Arnautovic
-Original Message----- From: Emir Arnautovic [mailto:em

Re: Document Cache

2016-03-19 Thread Emir Arnautovic
/16 8:56 AM, Emir Arnautovic wrote: Problem starts with autowarmCount="5000" - that executes 5000 queries when new searcher is created and as queries are executed, document cache is filled. If you have large queryResultWindowSize and queries return big number of documents, that will eat

Re: Document Cache

2016-03-19 Thread Emir Arnautovic
Problem starts with autowarmCount="5000" - that executes 5000 queries when new searcher is created and as queries are executed, document cache is filled. If you have large queryResultWindowSize and queries return big number of documents, that will eat up memory before new search is executed. It

Re: Document Cache

2016-03-19 Thread Emir Arnautovic
Hi, Your cache will be cleared on soft commits - every two minutes. It seems that it is either configured to be huge or you have big documents and are retrieving all fields, or don't have lazy field loading set to true. Can you please share your document cache config and heap settings? Thanks, Emir

Re: Use default field, if more specific field does not exist

2016-03-25 Thread Emir Arnautovic
Hi Georg, One solution that could work on the existing schema is to use query faceting and queries like (for USER_ID = 1, bucket 100 to 200): price_1:[100 TO 200] OR (-price_1:[* TO *] AND price:[100 TO 200]) The same query is used for filtering. What you should test is whether performance is acceptable
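
A small Python helper can generate the fallback query shown above for any user and bucket. The price_<user id> naming follows the example in the message; the function name and the bucket list are hypothetical.

```python
def user_price_range_query(user_id: int, low: int, high: int,
                           default_field: str = "price") -> str:
    """Match the user-specific price if it exists, else the default price."""
    user_field = f"{default_field}_{user_id}"
    price_range = f"[{low} TO {high}]"
    return (f"{user_field}:{price_range} OR "
            f"(-{user_field}:[* TO *] AND {default_field}:{price_range})")


# One facet.query per bucket; the same string can be sent as fq when filtering.
buckets = [(0, 100), (100, 200), (200, 300)]  # example buckets, assumed
facet_queries = [user_price_range_query(1, low, high) for low, high in buckets]
```

Each bucket becomes one facet.query value, so the number of buckets directly drives the number of queries Solr has to evaluate, which is the part worth measuring.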

Re: Use default field, if more specific field does not exist

2016-03-28 Thread Emir Arnautovic
d the individual price values (not the buckets), just like facet.field=price but with respect to the user prices. Is this possible as well? About the performance: Are there any specific bottlenecks you would expect? Best regards, Georg Emir Arnautovic schrieb am Fr., 25. März 2016 um 11:47 Uhr: Hi

Re: Complex Sort

2016-03-31 Thread Emir Arnautovic
Hi, Not sure if I fully understood your case, but here are some ideas: - if you have small number of ids you can have score_%id% field that can be used for sorting - if number of ids is large you can use sort by function to parse score data and find right score - if number of results is small,

Re: Facet by truncated date

2016-03-31 Thread Emir Arnautovic
Hi Robert, You can use range faceting and use facet.range.gap to set how dates are "truncated". Regards, Emir On 31.03.2016 10:52, Robert Brown wrote: Hi, Is it possible to facet by a date (solr.TrieDateField) but truncated to the day, or even the hour? If not, are there any other opt
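
A sketch of the request parameters for per-day buckets, written as a Python dict. The facet.range.* parameters and the date math are standard Solr; the field name my_date and the 10-day window are assumptions chosen for the example.

```python
def daily_facet_params(date_field: str = "my_date", days: int = 10) -> dict:
    """Range-facet a date field into one bucket per day.

    date_field and days are example values; change the gap to "+1HOUR"
    for hourly buckets.
    """
    return {
        "q": "*:*",
        "rows": "0",                                  # only facet counts are needed
        "facet": "true",
        "facet.range": date_field,
        "facet.range.start": f"NOW/DAY-{days}DAYS",   # rounded down to midnight
        "facet.range.end": "NOW/DAY+1DAY",            # end of the current day
        "facet.range.gap": "+1DAY",                   # the "truncation" unit
    }
```

The gap plays the role of the truncation: every document falls into the bucket whose start is its date rounded down to the gap.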

Re: Facet by truncated date

2016-03-31 Thread Emir Arnautovic
3ce59) On Mar 31 2016, at 10:08 am, Emir Arnautovic <emir.arnauto...@sematext.com> wrote: Hi Robert, You can use range faceting and use facet.range.gap to set how dates are "truncated". Regards, Emir On 31.03.2016 10:52, Robert Brown wrote: > Hi, > > Is it possi

Re: Complex Sort

2016-03-31 Thread Emir Arnautovic
You would have to write your custom function for that. On 31.03.2016 11:24, ~$alpha` wrote: I am not sure how to use "Sort By Function" for Case. |10#40|14#19|33#17|27#6|15#6|19#5|7#2|6#1|29#1|5#1|30#1|28#1|12#0|20#0| Can you tell how to fetch 40 when input is 10. -- View this message in co
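
To make the lookup itself concrete, here is the parsing logic in plain Python, using the encoded value quoted above. This is not a Solr function; inside Solr the same logic would have to live in a custom function plugin (for example a Java ValueSourceParser), which is not shown here. The function name is hypothetical.

```python
from typing import Optional


def lookup_score(encoded: str, wanted_id: str) -> Optional[int]:
    """Return the score paired with wanted_id in a '|id#score|id#score|' string."""
    for pair in encoded.strip("|").split("|"):
        key, _, value = pair.partition("#")
        if key == wanted_id:
            return int(value)
    return None  # id not present in the encoded field


encoded = "|10#40|14#19|33#17|27#6|15#6|19#5|7#2|6#1|29#1|5#1|30#1|28#1|12#0|20#0|"
print(lookup_score(encoded, "10"))  # 40
```

Parsing the string for every matching document at sort time is linear in the number of pairs, so the per-id score_%id% field approach mentioned earlier in the thread may be cheaper when the set of ids is small.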

Re: Facet by truncated date

2016-03-31 Thread Emir Arnautovic
don't want to specify a range? Or would I have to do year 0 to NOW? Thanks, Rob On 03/31/2016 10:26 AM, Emir Arnautovic wrote: Hi Yago, Not sure if I misunderstood the case, but assuming you have date field called my_date you can facet last 10 days by day using range qu

Re: Optimal indexing speed in Solr

2016-04-14 Thread Emir Arnautovic
Hi Edwin, Indexing speed depends on multiple factors: HW, Solr configurations and load, documents, indexing client: More complex documents, more CPU time to process each document before indexing structure is written down to disk. Bigger the index, more heap is used, more frequent GCs. Maybe you

Re: solr sql & streaming

2016-04-28 Thread Emir Arnautovic
Hi Shani, Are you running in SolrCloud mode? Here is blog post you can follow: https://sematext.com/blog/2016/04/18/solr-6-solrcloud-sql-support/ Thanks, Emir -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://sematext.com/ On 28.

Re: problem with index size

2015-07-22 Thread Emir Arnautovic
Hi Daniel, Do you need all fields stored in your index? Only field that is not stored is host. Thanks, Emir On 22.07.2015 12:27, Daniel Holmes wrote: Hi All I have problem with index size in solr 4.7.2. My OS is Ubuntu 14.10 64-bit. my fields are : In one case for instance my seg

Re: Running SolrJ from Solr's REST API

2015-07-22 Thread Emir Arnautovic
Hi Edwin, Not sure if I understood your case, but if I got it right you are trying to write some code that will run as part of SOLR. If that's the case, then you should take a look how to write SOLR plugins (https://wiki.apache.org/solr/SolrPlugins). SolrJ is client side library that simplifies

Re: problem with index size

2015-07-22 Thread Emir Arnautovic
Is this test index? Do you rewrite documents with same ids? Did you try to optimize index? Emir -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://sematext.com/ On 22.07.2015 13:10, Daniel Holmes wrote: Upayavira number of docs in

Re: `cat /dev/null > solr-8983-console.log` frees host's memory

2015-10-21 Thread Emir Arnautovic
Hi Eric, As Shawn explained, memory is freed because it was used to cache portion of log file. Since you are already with Sematext, I guess you are aware, but doesn't hurt to remind you that we also have Logsene that you can use to manage your logs: http://sematext.com/logsene/index.html Th

Re: result grouping on all documents

2015-10-21 Thread Emir Arnautovic
Hi Christian, It seems to me that you can use range faceting to get counts. Thanks, Emir -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://sematext.com/ On 20.10.2015 17:05, Christian Reuschling wrote: Hi, we try to get the number

Re: Is it possible to specigfy only one-character term synonym for 2-gram tokenizer?

2015-10-22 Thread Emir Arnautovic
Hi Scott, I don't have experience with Chinese, but SynonymFilter works on tokens, so if CJKTokenizer recognizes C1 and Cm as tokens, it should work. If not, then you can try configuring PatternReplaceCharFilter to replace C1 with C2 during indexing and searching and get a match. Thanks, Emir

Re: Is it possible to specigfy only one-character term synonym for 2-gram tokenizer?

2015-10-22 Thread Emir Arnautovic
ly, especially when applying highlight, e.g. search "C1C2" Solr returns highlight snippet such as "...C1C2...". Scott Chu,scott@udngroup.com <mailto:scott@udngroup.com> 2015/10/22 - Original Message - *From: *Emir Arnautovic <mailto:

Re: Is it possible to specigfy only one-character term synonym for 2-gram tokenizer?

2015-10-23 Thread Emir Arnautovic
...@udngroup.com <mailto:scott@udngroup.com> 2015/10/23 - Original Message - *From: *Emir Arnautovic <mailto:emir.arnauto...@sematext.com> *To: *solr-user <mailto:solr-user@lucene.apache.org> *Date: *2015-10-22, 18:20:38 *Subject: *Re: Is it possible to

Re: Does docValues impact termfreq ?

2015-10-26 Thread Emir Arnautovic
If I got it right, you are using term query, use function to get TF as score, iterate all documents in results and sum up total number of occurrences of specific term in index? Is this only way you use index or this is side functionality? Thanks, Emir On 24.10.2015 22:28, Aki Balogh wrote: C

Re: Does docValues impact termfreq ?

2015-10-26 Thread Emir Arnautovic
will have updatable, fast total frequency lookups. Thanks, Emir On 26.10.2015 14:43, Aki Balogh wrote: Hi Emir, This is correct. This is the only way we use the index. Thanks, Aki On Mon, Oct 26, 2015 at 9:31 AM, Emir Arnautovic < emir.arnauto...@sematext.com> wrote: If I got it right, y

Re: solr 5.3.0 master-slave: TWO segments after optimize

2015-10-28 Thread Emir Arnautovic
Hi Andrii, Is the observed high CPU on master or slave? If on slave, is it on all slaves? Can you do a thread dump and see what is running? Based on the numbers this seems like a small index, and one segment is a flush with only 160 docs. Anyway, this is small and something is really wrong if you notice i

Re: Solr results relevancy / scoring

2015-11-09 Thread Emir Arnautovic
To get an answer for why 15, you can use field analysis for index/query and see that "15%" is probably tokenized as both 15 and 15%. Emir On 06.11.2015 20:22, Erick Erickson wrote: I'm not sure what the question you're asking is. You say that you have debugged the query and the score for 15 is

Re: solr search relevancy

2015-11-09 Thread Emir Arnautovic
Hi Dhanesh, Several things you could try: * when you are searching for "bank" you are actually searching for tag/category and in your query you are boosting name 300 while tag is 3. * you must not sort on premium content weight - you can either use boost query clauses to prefer premium content *

Re: Search query speed

2015-11-12 Thread Emir Arnautovic
What are the HW specs? 4 threads is not much, but it still makes the test less deterministic, especially when queries are not equally "heavy". Can you also collect QTime from the Solr response and see if the differences are caused by networking? Emir On 11.11.2015 20:44, John Stric wrote: There is a .N

Re: Solr Cloud 5.3.0 Errors in Logs

2015-11-16 Thread Emir Arnautovic
Hi Adrian, Can you give us a bit more detail about the warmup queries you use and the test that you are running when the error occurs? Thanks, Emir On 16.11.2015 08:40, Adrian Liew wrote: Hi there, I would like to get some opinions on the errors encountered below. I have currently set up a SolrCloud cluster

Re: Undo Split Shard

2015-11-17 Thread Emir Arnautovic
Hi, You can try manually adjusting the cluster state in ZK to include the parent shard and exclude the splits, then reload the collection and try the split again. Btw, were there any errors in the logs when the split failed? Thanks, Emir On 17.11.2015 07:08, kiyer_adobe wrote: We had 32 shards of 30GB each. The query performance was aw

Re: Performance testing on SOLR cloud

2015-11-18 Thread Emir Arnautovic
Hi Aswath, It is not common to test only QPS unless the index is static most of the time. Usually you have to test and tune the worst-case scenario - max expected indexing rate + queries. You can get more QPS by reducing query latency or by increasing the number of replicas. You manage latency by tunin

Re: solr indexing warning

2015-11-19 Thread Emir Arnautovic
This means that one searcher is still warming when another searcher is created due to a commit with openSearcher=true. This can be due to frequent commits or searcher warmup taking too long. Emir -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support *

Re: solr indexing warning

2015-11-20 Thread Emir Arnautovic
Hi, Since this is the master node and is not expected to serve queries, you can disable caches completely. However, judging from the numbers, cache autowarming is not the issue here; it is more likely the frequency of commits and/or the warmup queries. How do you do commits? Since this is master-slave, I don't see a reason to have them too

Re: ZooKeeper nodes die taking down Solr Cluster?

2015-12-01 Thread Emir Arnautovic
Hi Frank, Can you please confirm that the Solr nodes are aware of the entire ZK ensemble? Can you give more info on how it is deployed - is ZK on separate servers? What is the load on Solr when it happens? Do you see any errors in the Solr logs? Thanks, Emir -- Monitoring * Alerting * Anomaly Detection * Centraliz

Re: ZooKeeper nodes die taking down Solr Cluster?

2015-12-01 Thread Emir Arnautovic
"solr.StandardTokenizerFactory" }, "filters":[ { "class":"solr.StopFilterFactory", "ignoreCase":true, "words":"stopwords.txt" }, { "class":"solr.SynonymFilterFactory", "synonyms"

Re: FastVector Highlighter

2015-12-07 Thread Emir Arnautovic
Hi Edwin, FastVector Highlighter requires term vector, positions and frequencies, so if it is not enabled on fields that you want to highlight, it will increase index size. Since it is common to have those enabled for the standard highlighter to speed up highlighting, they might already be enable

Re: FastVector Highlighter

2015-12-07 Thread Emir Arnautovic
xing. That is where I found that the index size is bigger than previously when I was using the Original Highlighter. Regards, Edwin On 7 December 2015 at 19:19, Emir Arnautovic wrote: Hi Edwin, FastVector Highlighter requires term vector, positions and frequencies, so if it is not enabled on f

Re: Solr 5.2.1 deadlock on commit

2015-12-08 Thread Emir Arnautovic
Hi Ali, This thread is blocked because it cannot obtain the update lock - in this particular case, while doing a soft commit. I am guessing that others are blocked for the same reason. Can you tell us a bit more about your setup, indexing load, and procedure? Do you do explicit commits? Regards, E

Re: Use multiple istance simultaneously

2015-12-08 Thread Emir Arnautovic
Can you tolerate having indices in different states, or do you plan to keep them in sync with controlled commits? DIH-ing content from the source when a new machine is needed will probably be slow, and I am afraid that you will end up simulating the master-slave model (copying state from one of healthy nodes
