Re: Heavy operations in PostFilter are heavy

2018-01-03 Thread Solrmails
Yes, I do. The problem is that the collect method is called for EVERY document the query matches, even if the user only wants to see around 10 documents. The operation I have to perform takes maybe 50 ms per document if I have to process the documents one by one, and maybe 30 ms if I could get a list of documents.
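To put those numbers in perspective: if a query matches 1,000 documents, 50 ms each adds up to 50 s of post-filter work, however few rows the user displays. The Python sketch below models one way to amortize that cost: buffer matches and check them in batches. It is a simplified model, not Solr's Java `DelegatingCollector` API. `BatchingCollector`, `check_batch` and `run` are hypothetical names. In real Solr, `DelegatingCollector` has a `finish()` hook where buffered work can be completed, but collectors receive per-segment doc IDs, so buffering across segments needs extra bookkeeping.

```python
from typing import Callable, Iterable, List


class BatchingCollector:
    """Simplified stand-in for a post-filter collector (not Solr's Java API).

    Rather than running the expensive check once per collected document,
    it buffers doc IDs and checks them a batch at a time, keeping only the
    accepted IDs (a real collector would forward them to its delegate).
    """

    def __init__(self, check_batch: Callable[[List[int]], List[bool]], batch_size: int = 100):
        self.check_batch = check_batch
        self.batch_size = batch_size
        self.buffer: List[int] = []
        self.accepted: List[int] = []
        self.batch_calls = 0

    def collect(self, doc_id: int) -> None:
        # Called once per matching document, as Solr does for post filters.
        self.buffer.append(doc_id)
        if len(self.buffer) >= self.batch_size:
            self._flush()

    def finish(self) -> None:
        # Handle the remainder once every match has been collected.
        if self.buffer:
            self._flush()

    def _flush(self) -> None:
        self.batch_calls += 1
        verdicts = self.check_batch(self.buffer)
        self.accepted.extend(d for d, ok in zip(self.buffer, verdicts) if ok)
        self.buffer = []


def run(matches: Iterable[int], collector: BatchingCollector) -> List[int]:
    for doc_id in matches:
        collector.collect(doc_id)
    collector.finish()
    return collector.accepted
```

For example, suppose a hypothetical check accepts even IDs and `batch_size=10`. Then `run(range(25), collector)` returns the 13 even IDs from 0 to 24. It makes only 3 batch calls instead of 25 individual checks.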

Re: Query fields with data of certain length

2018-01-03 Thread Zheng Lin Edwin Yeo
Hi Emir, so this would likely differ from what the operating system counts, since the operating system may count each Chinese character as 3 to 4 bytes. That is probably why I could not find any record with subject:/.{255,}.*/. Are there other tools that we can use to query the length for d

problem with Solr Sorting by score and distance together

2018-01-03 Thread Deepak Udapudi
Hi all, Problem: Assume that I am searching for car care centers. The Solr collection has the data for all the major car care centers. For example, I search for Firestone car care centers within a 5-mile radius. In the search results I am supposed to receive the Firestone car care centers list w
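One common way to express this request has two parts. First, filter by radius with `{!geofilt}`. Then sort by score, with `geodist()` as a tie-breaker. The sketch below only builds the request parameters. The field name `location`, the coordinates, and the query `firestone` are assumptions. Note that `geofilt`'s `d` parameter is in kilometers, so 5 miles becomes roughly 8.05 km.

```python
from urllib.parse import urlencode


def firestone_nearby_params(lat: float, lon: float, radius_miles: float) -> dict:
    """Request params: keyword match, radius filter, relevance-then-distance sort.

    'location' is a hypothetical spatial field; adjust to your schema.
    """
    radius_km = radius_miles * 1.609344  # geofilt's d parameter is in km
    return {
        "q": "firestone",
        "fq": "{!geofilt}",        # reads sfield, pt and d from the params below
        "sfield": "location",
        "pt": f"{lat},{lon}",
        "d": f"{radius_km:.2f}",
        "sort": "score desc,geodist() asc",  # distance only breaks score ties
        "fl": "*,score,dist:geodist()",      # also return the computed distance
    }


params = firestone_nearby_params(37.77, -122.42, 5)
query_string = urlencode(params)
```

Exact score ties are rare, so here distance only reorders documents with identical scores. If proximity should matter more, two alternatives are worth testing: sort by `geodist()` first, or boost by distance.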

Re: Small Tokenization issue

2018-01-03 Thread Shawn Heisey
On 1/3/2018 1:56 PM, Nawab Zada Asad Iqbal wrote: Thanks Emir, Erick. What I want to do is remove empty tokens after WordDelimiterGraphFilter. Is there any such option in WordDelimiterGraphFilter to not generate empty tokens? I use LengthFilterFactory with a minimum of 1 and a maximum of 512

Re: Is DataImportHandler ready for production-usage?

2018-01-03 Thread Shawn Heisey
On 1/3/2018 2:20 PM, Tech Id wrote: I stumbled across https://wiki.apache.org/solr/DataImportHandler and found it matching my needs exactly. So I just wanted to confirm if it is an actively supported plugin, before I start using it for production. Are there any users who have had a good or a bad

Re: Is DataImportHandler ready for production-usage?

2018-01-03 Thread Erick Erickson
It's been around forever and lots of people use it in production. That said, an independent client using SolrJ is often preferable for reasons outlined here: https://lucidworks.com/2012/02/14/indexing-with-solrj/ If DIH fits your needs, by all means use it. The article I linked to, though, provide
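Here is a minimal sketch of the independent-client approach Erick mentions, written in Python rather than SolrJ. It pulls rows from a source database, maps them to documents, and groups them into JSON batches for Solr's `/update` handler. The table `articles`, its columns, and the batch size are hypothetical. The HTTP POST itself (for example with `Content-Type: application/json`) is left out so the example runs without a Solr server.

```python
import json
import sqlite3
from typing import Iterator, List


def rows_to_docs(conn: sqlite3.Connection) -> Iterator[dict]:
    # Read rows from the (hypothetical) source table and map them to Solr docs.
    query = "SELECT id, title, body FROM articles ORDER BY id"
    for row_id, title, body in conn.execute(query):
        yield {"id": str(row_id), "title": title, "body": body}


def json_batches(docs: Iterator[dict], size: int) -> Iterator[str]:
    # Group docs into JSON arrays; each string is one /update request body.
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield json.dumps(batch)
            batch = []
    if batch:
        yield json.dumps(batch)
```

Keeping extraction, mapping and batching in ordinary code is what makes this route easier to debug and extend than a DIH configuration once the transformation logic grows.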

Re: Small Tokenization issue

2018-01-03 Thread Erick Erickson
WordDelimiterGraphFilterFactory is a new implementation, so it's also quite possible that the behavior just changed. I just took a look and indeed it does. WordDelimiterFilterFactory (applied to "p / n whatever") produces tokens: p n whatever, positions: 1 2 3, whereas WordDelimiterGraphFilt

Is DataImportHandler ready for production-usage?

2018-01-03 Thread Tech Id
Hi, I stumbled across https://wiki.apache.org/solr/DataImportHandler and found it matching my needs exactly. So I just wanted to confirm if it is an actively supported plugin, before I start using it for production. Are there any users who have had a good or a bad experience with DIH ? Thanks TI

Small Query.

2018-01-03 Thread Fiz Newyorker
Hello Solr group, I have a small question: how do autosuggest and spell check work together in Solr? I need to implement autosuggest on the word “iPhine”, but it should return the results for “iPhone”. What is the best suggester component for addressing this requirement?
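The idea behind typo-tolerant autosuggest can be sketched as prefix matching within a small edit distance. Solr's Suggester offers a `FuzzyLookupFactory` lookup implementation aimed at this. The Python below is not that implementation, only an illustration. `fuzzy_suggest`, its `max_edits=1` default, and the sample terms are assumptions.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    # Levenshtein distance via dynamic programming, one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def fuzzy_suggest(typed: str, terms: List[str], max_edits: int = 1) -> List[str]:
    # Keep terms whose leading characters are within max_edits of the input.
    typed = typed.lower()
    return [t for t in terms
            if edit_distance(typed, t.lower()[:len(typed)]) <= max_edits]
```

`fuzzy_suggest("iPhine", ["iPhone", "iPhone case", "iPad"])` returns `["iPhone", "iPhone case"]`, because "iphine" is one substitution away from the first six letters of both.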

Re: Small Tokenization issue

2018-01-03 Thread Nawab Zada Asad Iqbal
Thanks Emir, Erick. What I want to do is remove empty tokens after WordDelimiterGraphFilter. Is there any such option in WordDelimiterGraphFilter to not generate empty tokens? This index field is intended to be used for strange strings, e.g. part numbers: P/N HSC0424PP The benefit of removing the emp

Re: Small Tokenization issue

2018-01-03 Thread Emir Arnautović
Hi Nawab, the reason you do not get the shingle is that there is an empty token: after the tokenizer you have 3 tokens, ‘abc’, ‘-’ and ‘def’, so the tokens you are interested in are not next to each other and cannot form a shingle. What you can do is apply a char filter before tokenization to re
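Here is a simplified model of Emir's explanation. It rests on two assumptions. First, the word-delimiter step drops the hyphen but leaves a gap in token positions. Second, the shingle step fills such gaps with a filler token (Lucene's `ShingleFilter` uses `_` by default). `fill_gaps` and `bigram_shingles` are illustrative helpers, not Lucene APIs.

```python
from typing import List, Tuple

FILLER = "_"  # Lucene's ShingleFilter fills position gaps with "_" by default


def fill_gaps(tokens: List[Tuple[str, int]]) -> List[str]:
    # Lay tokens out by position; empty positions get the filler token.
    by_pos = {pos: text for text, pos in tokens}
    return [by_pos.get(p, FILLER) for p in range(min(by_pos), max(by_pos) + 1)]


def bigram_shingles(tokens: List[Tuple[str, int]]) -> List[str]:
    # Two-word shingles over adjacent positions only.
    seq = fill_gaps(tokens)
    return [f"{a} {b}" for a, b in zip(seq, seq[1:])]
```

Suppose `abc` sits at position 1 and `def` at position 3. This yields `abc _` and `_ def` rather than `abc def`. Only when the two words sit at adjacent positions does `abc def` appear.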

Re: Small Tokenization issue

2018-01-03 Thread Erick Erickson
If it's regular, you could try using a PatternReplaceCharFilterFactory (PRCFF), which gets applied to the input before tokenization (note, this is NOT PatternReplaceFilterFactory, which gets applied after tokenization). I don't really see how you could make this work, though. WhitespaceTokenizer wil
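The char-filter suggestion can be illustrated in Python. The replacement runs on the raw text before tokenization, so the hyphen never becomes a token, and `abc` and `def` end up adjacent. The pattern `\s+-\s+` is an assumption about the data. In Solr the same idea would be configured through `PatternReplaceCharFilterFactory`'s `pattern` and `replacement` attributes.

```python
import re
from typing import List


def char_filter(text: str) -> str:
    # Runs on raw text before tokenizing, like a pattern-replace char filter.
    # Assumed pattern: a hyphen surrounded by whitespace becomes one space.
    return re.sub(r"\s+-\s+", " ", text)


def whitespace_tokenize(text: str) -> List[str]:
    # Split on runs of whitespace, as a whitespace tokenizer would.
    return text.split()


def analyze(text: str) -> List[str]:
    return whitespace_tokenize(char_filter(text))
```

This pattern leaves hyphens without surrounding spaces untouched, as in part numbers such as `HSC-0424` (a made-up example).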

Small Tokenization issue

2018-01-03 Thread Nawab Zada Asad Iqbal
Hi, I have a string for indexing: abc - def (notice the space on either side of the hyphen), which is being processed with this filter list: I get two shingle tokens at the e

Re: Sorting on Child document.

2018-01-03 Thread crezy
Hello all, any updates on my post? It's very urgent. Thanks -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Always use leader for searching queries

2018-01-03 Thread Walter Underwood
If you have a field for the indexed datetime, you can use a filter query to get rid of recent updates that might be in transit. I’d use double the autocommit time to leave time for the followers to index. If the autocommit interval is one minute: fq=indexed_datetime:[* TO NOW-2MINUTES] wunder Wal
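Walter's rule of thumb generalizes to any autocommit interval. The helper below is a hypothetical sketch that builds the filter with Solr date math. It uses `SECONDS` so the interval need not be a whole number of minutes. It assumes the `indexed_datetime` field is set at index time, for example by giving a date field a default of `NOW` in the schema.

```python
def in_transit_filter(field: str, autocommit_seconds: int) -> str:
    """Filter out docs indexed within twice the autocommit interval.

    Builds Solr date math such as NOW-120SECONDS; SECONDS is a valid unit.
    """
    window = 2 * autocommit_seconds
    return f"{field}:[* TO NOW-{window}SECONDS]"
```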

Re: Always use leader for searching queries

2018-01-03 Thread Erick Erickson
[I probably don't need to do this because I have only one shard, but I did anyway and the count was different.] This isn't what I meant. I meant to query each replica directly _within_ the same shard. Your problem statement is that the leader and replicas (I use "followers") have different document counts. H
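Erick's suggestion is to compare counts replica by replica inside the same shard. Here is a sketch of the mechanics, assuming hypothetical core URLs. Send `q=*:*&rows=0&distrib=false` to each replica core so the request is not distributed. Then read `numFound` from each JSON response and compare. The HTTP calls themselves are omitted.

```python
import json
from urllib.parse import urlencode


def replica_count_url(core_url: str) -> str:
    # distrib=false keeps the request on this single core (no fan-out to
    # other replicas), so the count reflects only what this replica holds.
    params = {"q": "*:*", "rows": 0, "distrib": "false", "wt": "json"}
    return core_url.rstrip("/") + "/select?" + urlencode(params)


def num_found(response_body: str) -> int:
    # Extract the hit count from a standard JSON select response.
    return json.loads(response_body)["response"]["numFound"]


def counts_agree(counts: dict) -> bool:
    # True when every replica reports the same document count.
    return len(set(counts.values())) <= 1
```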

Re: DIH XPathEntityProcessor XPath subset?

2018-01-03 Thread Erik Hatcher
Stefan - If you pre-transform the XML, I’d personally recommend either transforming it into straight up Solr XML (docs/fields/values) or some other format or posting directly to Solr. Avoid this DIH thing when things get complicated. Erik > On Jan 3, 2018, at 11:40 AM, Stefan Moises
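If the WordPress export is pre-transformed, the target Erik mentions is Solr's XML update format. It is an `<add>` element containing `<doc>` elements, each with `<field name="...">` children. Below is a minimal Python sketch using only the standard library. The field names in the example (`id`, `title`, `tags`) are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Union


def to_solr_xml(docs: List[Dict[str, Union[str, List[str]]]]) -> str:
    # Solr XML update format: <add><doc><field name="...">value</field></doc></add>
    add = ET.Element("add")
    for doc in docs:
        d = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for v in values:  # multi-valued fields repeat the <field> element
                f = ET.SubElement(d, "field", name=name)
                f.text = v
    return ET.tostring(add, encoding="unicode")
```

The resulting string can then be posted to the collection's `/update` handler.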

DIH XPathEntityProcessor XPath subset?

2018-01-03 Thread Stefan Moises
Hi there, I'm trying to index a WordPress site using DIH XPathEntityProcessor... I've read it only supports a subset of XPath, but I couldn't find any docs on what exactly is supported. After some painful trial and error, I've found that XPath expressions like the following don't work:

Re: Removing some fields from uprefix

2018-01-03 Thread Zheng Lin Edwin Yeo
Hi Alex, Thanks for your advice. It works. Regards, Edwin On 3 January 2018 at 23:06, Alexandre Rafalovitch wrote: > uprefix is only for the fields that do NOT exist in schema. So, you > can define your x_parsed_by in schema, but map it to the type that has > index=false, store=false, docvalu

Re: Query fields with data of certain length

2018-01-03 Thread Emir Arnautović
Hi Edwin, I do not know, but my guess would be that each character is counted as 1 in the regex, regardless of how many bytes it takes in the encoding used. Regards, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ >
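Emir's guess matches how Lucene regular expressions work: they operate on Unicode code points. Byte counts, by contrast, depend on the encoding. The Python below compares three ways of measuring the same string. `length_report` is a hypothetical helper. For Chinese text in the Basic Multilingual Plane, each character is one code point but three UTF-8 bytes.

```python
def length_report(text: str) -> dict:
    return {
        # Unicode code points: what a regex such as /.{255,}/ counts per '.'
        "code_points": len(text),
        # UTF-8 bytes: what byte-oriented tools (e.g. file sizes) report
        "utf8_bytes": len(text.encode("utf-8")),
        # UTF-16 code units: what Java's String.length() returns
        "utf16_units": len(text.encode("utf-16-le")) // 2,
    }
```

For example, a four-character Chinese subject such as "中文主题" gives 4 code points, 12 UTF-8 bytes and 4 UTF-16 units. A single character outside the BMP (such as U+20000) gives 1, 4 and 2.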

Re: Query fields with data of certain length

2018-01-03 Thread Zheng Lin Edwin Yeo
Thanks for the reply. I am doing the search on existing data that has already been indexed, and it is likely to be a one-time thing. This subject:/.{255,}.*/ works for English characters. However, there are Chinese characters in some of the records. The length seems to be more than 255, but it

Re: SolrJ with Async Http Client

2018-01-03 Thread Walter Underwood
HTTPClient is non-blocking. Send the request, then the client gets control back. It only blocks when you do the read. So one thread can send multiple requests then check for each response. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jan 3, 2018

Re: SolrCloud Nodes going to recovery state during indexing

2018-01-03 Thread Emir Arnautović
Hi Sravan, DBQ does not play well with indexing - it causes indexing to be completely blocked on replicas while it is running. It is highly likely that it is the root cause of your issues. If you can change indexing logic to avoid it, you can quickly test it. What you can do as a workaround is t

Re: Heavy operations in PostFilter are heavy

2018-01-03 Thread Alexandre Rafalovitch
Are you doing cache=false and cost >= 100? See the recent article on the topic deep-dive, if you haven't: https://lucidworks.com/2017/11/27/caching-and-filters-and-post-filters/ Regards, Alex. On 3 January 2018 at 05:31, Solrmails wrote: > Hello, > > I tried to write a Solr PostFilter to do f
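Post-filtering is requested through local params on the `fq`. Solr runs a filter as a post filter only when three conditions hold: its query implements `PostFilter`, caching is disabled, and the cost is at least 100. The helper below just assembles that string. The parser name `myfilter` and its `user` parameter are hypothetical stand-ins for a custom query parser.

```python
def post_filter_fq(parser: str, cost: int = 200, **params: str) -> str:
    """Build an fq with local params so Solr can run it as a post filter.

    cache=false and cost >= 100 are required for post-filter execution.
    """
    if cost < 100:
        raise ValueError("post filters need cost >= 100")
    extra = "".join(f" {k}={v}" for k, v in params.items())
    return f"{{!{parser} cache=false cost={cost}{extra}}}"
```

`post_filter_fq("myfilter", user="alice")` returns `{!myfilter cache=false cost=200 user=alice}`.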

Re: Removing some fields from uprefix

2018-01-03 Thread Alexandre Rafalovitch
uprefix is only for the fields that do NOT exist in the schema. So, you can define your x_parsed_by in the schema, but map it to a type that has indexed=false, stored=false, docValues=false, which means the field is acknowledged but effectively dropped. Regards, Alex. On 3 January 2018 at 05:53, Zheng
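Alex's suggestion can also be applied through the Schema API instead of editing the schema file. Here is a sketch of the request body in Python, assuming a managed schema. It mirrors the `ignored` field type found in some of Solr's example schemas. If your schema already defines an `ignored` type, send only the `add-field` command.

```python
import json


def ignore_field_commands(field: str) -> str:
    # Schema API body: a type that is not indexed, not stored and has no
    # docValues, plus the field itself declared with that type. Because the
    # field now exists in the schema, uprefix no longer applies to it.
    commands = {
        "add-field-type": {
            "name": "ignored",
            "class": "solr.StrField",
            "indexed": False,
            "stored": False,
            "docValues": False,
            "multiValued": True,
        },
        "add-field": {"name": field, "type": "ignored"},
    }
    return json.dumps(commands)
```

The JSON would be POSTed to the collection's `/schema` endpoint.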

Re: SolrCloud Nodes going to recovery state during indexing

2018-01-03 Thread Sravan Kumar
Emir, yes, there is a delete_by_query on every bulk insert. This delete_by_query deletes all the documents whose last update is earlier than one day before the current time. Is the bulk delete_by_query the reason? On Wed, Jan 3, 2018 at 7:58 PM, Emir Arnautović <emir.arnauto...@sematext.com> wro
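For reference, the kind of command Sravan describes looks like this in Solr's JSON update syntax. The sketch assumes a hypothetical timestamp field `last_updated`. It reads the intent as removing documents not updated within the last day. Temporarily not sending this command is the quick test Emir proposes.

```python
import json


def stale_delete_command(field: str = "last_updated") -> str:
    # JSON update command: delete docs whose (hypothetical) timestamp field
    # is more than one day old, expressed with Solr date math.
    return json.dumps({"delete": {"query": f"{field}:[* TO NOW-1DAY]"}})
```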

Re: Query fields with data of certain length

2018-01-03 Thread Alexandre Rafalovitch
Do that during indexing as Emir suggested. Specifically, use an UpdateRequestProcessor chain, probably with the Clone and FieldLength processors: http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/FieldLengthUpdateProcessorFactory.html Regards, Alex. On 31 December
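What the Clone + FieldLength chain would do to each incoming document can be emulated in Python. It copies the text field into a second field, then replaces the copy with its length. `clone_then_length` and the destination field `subject_chars` are hypothetical. In Solr the two steps would be `CloneFieldUpdateProcessorFactory` (`source`/`dest`) followed by `FieldLengthUpdateProcessorFactory`. The sketch counts UTF-16 code units, on the assumption that the processor uses Java's `String.length()`.

```python
from typing import Any, Dict


def clone_then_length(doc: Dict[str, Any], source: str, dest: str) -> Dict[str, Any]:
    """Emulate Clone + FieldLength update processors on one document.

    Copies `source` into `dest`, then replaces each copied value with its
    length in UTF-16 code units (what Java's String.length() reports).
    """
    out = dict(doc)
    values = doc.get(source)
    if values is None:
        return out  # nothing to clone
    if not isinstance(values, list):
        values = [values]
    out[dest] = [len(v.encode("utf-16-le")) // 2 for v in values]
    return out
```

Once indexed, a range query such as `subject_chars:[255 TO *]` finds the long subjects, for Chinese text as well as English.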

Re: SolrCloud Nodes going to recovery state during indexing

2018-01-03 Thread Emir Arnautović
Do you have deletes by query while indexing, or is it an append-only index? Regards, Emir > On 3 Jan 2018, at 12:16, sravan wrote: > > SolrCloud Nodes going to rec

Re: Query fields with data of certain length

2018-01-03 Thread Emir Arnautović
Hi Edwin, if it is a one-time thing, you can use a regex to filter out results that are not long enough, something like: subject:/.{255,}.*/. Of course, this means subject is not tokenized. It would probably be best to index the subject length as a separate field and include it in the query as subject_leng

Re: Limit edismax search to a certain field value and find out matched fields on the results

2018-01-03 Thread Emir Arnautović
Hi Sami, I would just add that it is probably better to use fq to limit results to some category, e.g. q=iphone&fq=category:phones. Regards, Emir > On 31 Dec 20

Re: SolrJ with Async Http Client

2018-01-03 Thread Joel Bernstein
Streaming expressions have an event-driven architecture built in. There are two blogs that describe how it works. This describes the message queues: http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html This describes an async model of execution: http://joelsolr.blogspot.

Re: SolrJ with Async Http Client

2018-01-03 Thread RAUNAK AGRAWAL
Yes, I am talking about an event-driven way of calling Solr, so that I can write a purely async web service. Does SolrJ provide support for non-blocking calls? On Wed, Jan 3, 2018 at 6:22 PM, Hendrik Haddorp wrote: > There is asynchronous and non-blocking. If I use 100 threads to perform > calls to So

Re: SolrJ with Async Http Client

2018-01-03 Thread Hendrik Haddorp
There is a difference between asynchronous and non-blocking. If I use 100 threads to perform calls to Solr using the standard Java HTTP client or SolrJ, I block 100 threads, even if I don't block my program-logic threads by using async calls. However, if I perform those HTTP calls using a non-blocking HTTP client, lik
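Hendrik's distinction can be illustrated with Python's asyncio, as an analogy rather than SolrJ code. The 100 simulated requests below all run on one event-loop thread. That works because awaiting a non-blocking operation hands the thread back instead of parking it. `fake_solr_call` stands in for a real non-blocking HTTP request; an actual client would need a non-blocking HTTP library.

```python
import asyncio
import threading
from typing import List, Tuple


async def fake_solr_call(i: int) -> Tuple[int, int]:
    # Stand-in for a non-blocking HTTP request: awaiting the sleep hands the
    # thread back to the event loop instead of parking it.
    await asyncio.sleep(0.01)
    return i, threading.get_ident()


async def fan_out(n: int) -> List[Tuple[int, int]]:
    # Start all n requests, then wait for every reply on the same thread.
    return list(await asyncio.gather(*(fake_solr_call(i) for i in range(n))))


results = asyncio.run(fan_out(100))
```

All 100 results come back in order, and every call reports the same thread ID.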

SolrCloud Nodes going to recovery state during indexing

2018-01-03 Thread sravan
We have a SolrCloud setup with the settings shared below. We have a collection with 3 shards and a replica for each of them. Normal state (as soon as the whole cluster is restarted): - Status of all the shards is UP. - a bulk updat

Removing some fields from uprefix

2018-01-03 Thread Zheng Lin Edwin Yeo
Hi, I'm using Solr 7.2.0, and I have this /extract handler in my solrconfig.xml /xhtml:html/xhtml:body/descendant:node() content attr_meta_ attr_ true dedupe I understand that this attr_ will cause all generated fields that aren't defined in the

Heavy operations in PostFilter are heavy

2018-01-03 Thread Solrmails
Hello, I tried to write a Solr PostFilter to do filtering within the 'collect' method (DelegatingCollector). I have to do some heavy operations within the 'collect' method. This isn't a problem for a few results, but unfortunately it takes forever with 50 or more results. This is because I have

Re: Always use leader for searching queries

2018-01-03 Thread Novin Novin
Hi Erick, Thanks for your reply. [First of all, replicas can be off in terms of counts for the soft commit interval. The commits don't all happen on the replicas at the same wall-clock time. Solr promises eventual consistency, in this case NOW-autocommit time.] I realized that, to stop it. I ha