Yes, I do. The problem is that the collect method is called for EVERY
document the query matches, even if the user only wants to see, say, 10
documents. The operation I have to perform takes maybe 50ms per document if I
have to process them singly, and maybe 30ms if I could get a document list.
Hi Emir,
So this would likely be different from what the operating system counts, as
the operating system may consider each Chinese character as 3 to 4 bytes.
Which is probably why I could not find any record with subject:/.{255,}.*/
Are there other tools that we can use to query the length for d
Hi all,
Problem :-
Assume that I am searching for car care centers. The Solr collection has the
data for all the major car care centers. As an example, I search for Firestone
car care centers in a 5-mile radius. In the search results I am supposed to
receive the Firestone car care centers list w
On 1/3/2018 1:56 PM, Nawab Zada Asad Iqbal wrote:
Thanks Emir, Erick.
What I want to do is remove the empty tokens after WordDelimiterGraphFilter.
Is there any option in WordDelimiterGraphFilter to not generate empty
tokens?
I use LengthFilterFactory with a minimum of 1 and a maximum of 512.
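In the schema that looks something like this (a sketch; the field type name
and the WordDelimiterGraphFilterFactory options are just illustrative):

  <fieldType name="text_parts" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- WDGF options here are illustrative, not a recommendation -->
      <filter class="solr.WordDelimiterGraphFilterFactory"
              generateWordParts="1" generateNumberParts="1" catenateAll="1"/>
      <!-- min=1 drops the zero-length tokens WDGF can leave behind -->
      <filter class="solr.LengthFilterFactory" min="1" max="512"/>
    </analyzer>
  </fieldType>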
On 1/3/2018 2:20 PM, Tech Id wrote:
I stumbled across https://wiki.apache.org/solr/DataImportHandler and found
it matching my needs exactly.
So I just wanted to confirm whether it is an actively supported plugin before I
start using it in production.
Are there any users who have had a good or a bad
It's been around forever and lots of people use it in production.
That said, an independent client using SolrJ is often preferable for
reasons outlined here:
https://lucidworks.com/2012/02/14/indexing-with-solrj/
If DIH fits your needs, by all means use it. The article I linked to,
though, provide
WordDelimiterGraphFilterFactory is a new implementation so it's also
quite possible that the behavior just changed.
I just took a look and indeed it did. WordDelimiterFilterFactory
(run on "p / n whatever") produces
token:    p  n  whatever
position: 1  2  3
whereas WordDelimiterGraphFilt
Hi,
I stumbled across https://wiki.apache.org/solr/DataImportHandler and found
it matching my needs exactly.
So I just wanted to confirm whether it is an actively supported plugin before I
start using it in production.
Are there any users who have had a good or a bad experience with DIH?
Thanks
TI
Hello Solr Group,
I have a small question: how do autosuggest and spell check work
together in Solr? I need to implement autosuggest on the word “iPhine”, but
this should return the results for “iPhone” in autosuggest. What is the
best suggester component for addressing this requirement?
Thanks Emir, Erick.
What I want to do is remove the empty tokens after WordDelimiterGraphFilter.
Is there any option in WordDelimiterGraphFilter to not generate empty
tokens?
This index field is intended to be used for strange strings, e.g. part numbers:
P/N HSC0424PP
The benefit of removing the emp
Hi Nawab,
The reason why you do not get a shingle is that there is an empty token: after
the tokenizer you have 3 tokens, ‘abc’, ‘-’ and ‘def’, so the tokens that you
are interested in are not next to each other and cannot form a shingle.
What you can do is apply a char filter before tokenization to re
If it's regular, you could try using a PatternReplaceCharFilterFactory
(PRCFF), which gets applied to the input before tokenization (note,
this is NOT PatternReplaceFilterFactory, which gets applied after
tokenization).
I don't really see how you could make this work though.
WhitespaceTokenizer wil
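For reference, Emir's char-filter suggestion could look something like this
(a sketch; the pattern, which collapses a hyphen surrounded by spaces into a
single space so ‘abc’ and ‘def’ become adjacent, is an assumption about the
goal):

  <analyzer>
    <!-- runs before tokenization; turns "abc - def" into "abc def" -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern=" - " replacement=" "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory"
            minShingleSize="2" maxShingleSize="2"/>
  </analyzer>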
Hi,
So, I have a string for indexing:
abc - def (notice the space on either side of hyphen)
which is being processed with this filter-list:-
I get two shingle tokens at the e
Hello All,
Any updates on my post?
It's quite urgent.
Thanks
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
If you have a field for the indexed datetime, you can use a filter query to get
rid of recent updates that might be in transit. I’d use double the autocommit
time, to leave time for the followers to index.
If the autocommit interval is one minute:
fq=indexed_datetime:[* TO NOW-2MINUTES]
wunder
Wal
[I probably did not need to do this because I have only one shard, but I did it
anyway; the count was different.]
This isn't what I meant. I meant to query each replica directly
_within_ the same shard. Your problem statement is that the leader and
replicas (I use "followers") have different document counts. H
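Concretely, that means hitting each core directly with distribution turned
off, e.g. (host, port, and core name are placeholders):

  http://localhost:8983/solr/collection1_shard1_replica_n1/select?q=*:*&rows=0&distrib=false

With distrib=false the query is answered only by that one core, so the
numFound values of the replicas in a shard can be compared directly.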
Stefan -
If you pre-transform the XML, I’d personally recommend transforming it into
straight-up Solr XML (docs/fields/values) or some other format and posting
directly to Solr. Avoid this DIH thing when things get complicated.
Erik
> On Jan 3, 2018, at 11:40 AM, Stefan Moises
Hi there,
I'm trying to index a WordPress site using the DIH XPathEntityProcessor...
I've read that it only supports a subset of XPath, but I couldn't find any
docs on what exactly is supported.
After some painful trial and error, I've found that XPath expressions
like the following don't work:
Hi Alex,
Thanks for your advice. It works.
Regards,
Edwin
On 3 January 2018 at 23:06, Alexandre Rafalovitch
wrote:
> uprefix is only for the fields that do NOT exist in schema. So, you
> can define your x_parsed_by in schema, but map it to the type that has
> index=false, store=false, docvalu
Hi Edwin,
I do not know, but my guess would be that each character is counted as 1 in
regex, regardless of how many bytes it takes in the encoding used.
Regards,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/
Thanks for the reply.
I am doing the search on existing data that has already been indexed, and
it is likely to be a one-time thing.
This subject:/.{255,}.*/ works for English characters. However, there are
Chinese characters in some of the records. The length seems to be more than
255, but it
HTTPClient is non-blocking. Send the request, then the client gets control
back. It only blocks when you do the read. So one thread can send multiple
requests then check for each response.
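A sketch of that send-then-read pattern with Apache HttpAsyncClient (an
assumption; the thread doesn't pin down a specific client, and the URL is a
placeholder):

  import java.util.ArrayList;
  import java.util.List;
  import java.util.concurrent.Future;
  import org.apache.http.HttpResponse;
  import org.apache.http.client.methods.HttpGet;
  import org.apache.http.impl.nio.client.CloseableHttpAsyncClient;
  import org.apache.http.impl.nio.client.HttpAsyncClients;

  public class AsyncSolrQueries {
    public static void main(String[] args) throws Exception {
      CloseableHttpAsyncClient client = HttpAsyncClients.createDefault();
      client.start();
      List<Future<HttpResponse>> pending = new ArrayList<>();
      for (int i = 0; i < 10; i++) {
        // placeholder URL; each execute() returns immediately
        HttpGet get = new HttpGet("http://localhost:8983/solr/collection1/select?q=*:*");
        pending.add(client.execute(get, null));
      }
      for (Future<HttpResponse> f : pending) {
        HttpResponse rsp = f.get(); // blocks only here, on the read
        System.out.println(rsp.getStatusLine());
      }
      client.close();
    }
  }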
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Jan 3, 2018
Hi Sravan,
DBQ does not play well with indexing - it causes indexing to be completely
blocked on replicas while it is running. It is highly likely that it is the
root cause of your issues. If you can change indexing logic to avoid it, you
can quickly test it. What you can do as a workaround is t
Are you doing cache=false and cost > 100?
See the recent deep-dive article on the topic, if you haven't:
https://lucidworks.com/2017/11/27/caching-and-filters-and-post-filters/
Regards,
Alex.
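For a custom PostFilter that means invoking it via its query parser plugin
with something like (myPostFilter is a hypothetical parser name):

  fq={!myPostFilter cache=false cost=200}

A cost of 100 or more, combined with cache=false, is what makes Solr run it
as a true post-filter over documents that already matched the other clauses.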
On 3 January 2018 at 05:31, Solrmails wrote:
> Hello,
>
> I tried to write a Solr PostFilter to do f
uprefix is only for the fields that do NOT exist in schema. So, you
can define your x_parsed_by in schema, but map it to the type that has
index=false, store=false, docvalues=false. Which means the field is
acknowledged but effectively dropped.
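A sketch of that mapping (the type and field names are only examples):

  <fieldType name="ignored" class="solr.StrField"
             indexed="false" stored="false" docValues="false" multiValued="true"/>
  <field name="x_parsed_by" type="ignored"/>

Updates may then send x_parsed_by without errors, but the value is never
indexed, stored, or available for sorting or faceting.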
Regards,
Alex.
On 3 January 2018 at 05:53, Zheng
Emir,
Yes, there is a delete_by_query on every bulk insert.
This delete_by_query deletes all the documents which were updated less than
a day before the current time.
Is bulk delete_by_query the reason?
On Wed, Jan 3, 2018 at 7:58 PM, Emir Arnautović <
emir.arnauto...@sematext.com> wro
Do that during indexing as Emir suggested. Specifically, use an
UpdateRequestProcessor chain, probably with the Clone and FieldLength
processors:
http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/FieldLengthUpdateProcessorFactory.html
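A sketch of such a chain in solrconfig.xml (the chain and field names are
illustrative):

  <updateRequestProcessorChain name="subject-length">
    <!-- copy the subject text into a second field -->
    <processor class="solr.CloneFieldUpdateProcessorFactory">
      <str name="source">subject</str>
      <str name="dest">subject_length</str>
    </processor>
    <!-- replace the copied text with its character count -->
    <processor class="solr.FieldLengthUpdateProcessorFactory">
      <str name="fieldName">subject_length</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>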
Regards,
Alex.
On 31 December
Do you have deletes by query while indexing, or is it an append-only index?
Regards,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> On 3 Jan 2018, at 12:16, sravan wrote:
>
> SolrCloud Nodes going to rec
Hi Edwin,
If it is a one-time thing you can use a regex to filter out results that are not
long enough. Something like: subject:/.{255,}.*/.
Of course, this means subject must not be tokenized.
It would probably be best if you index the subject length as a separate field
and include it in the query as subject_leng
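With such a length field in place, the query side is then a simple range
filter, e.g. (field name illustrative): fq=subject_length:[255 TO *]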
Hi Sami,
I would just add that it is probably better to use fq to limit results to some
category, e.g. q=iphone&fq=category:phones.
Regards,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> On 31 Dec 20
Streaming expressions have an event-driven architecture built in. There are
two blogs that describe how it works.
This describes the message queues:
http://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html
This describes an async model of execution:
http://joelsolr.blogspot.
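For a flavor of it, a minimal sketch (the collections, checkpoint collection,
query, and fields are all placeholders): daemon() re-runs the wrapped
expression on an interval, and topic() returns only documents that arrived
since its last checkpoint:

  daemon(id="alertDaemon",
         runInterval="1000",
         topic(checkpoints,
               logs,
               id="alertTopic",
               q="level:ERROR",
               fl="id,message"))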
Yes, I am talking about an event-driven way of calling Solr, so that I can
write a purely async web service. Does SolrJ provide support for non-blocking
calls?
On Wed, Jan 3, 2018 at 6:22 PM, Hendrik Haddorp
wrote:
> There is asynchronous and non-blocking. If I use 100 threads to perform
> calls to So
There is asynchronous, and there is non-blocking. If I use 100 threads to perform
calls to Solr using the standard Java HTTP client or SolrJ I block 100
threads even if I don't block my program logic threads by using async
calls. However if I perform those HTTP calls using a non-blocking HTTP
client, lik
SolrCloud Nodes going to recovery state during indexing
We have a SolrCloud setup with the settings shared below. We have a
collection with 3 shards and a replica for each of them.
Normal state (as soon as the whole cluster is restarted):
- Status of all the shards is UP.
- a bulk updat
Hi,
I'm using Solr 7.2.0, and I have this /extract handler in my solrconfig.xml
<requestHandler name="/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="xpath">/xhtml:html/xhtml:body/descendant:node()</str>
    <str name="fmap.content">content</str>
    <str name="fmap.meta">attr_meta_</str>
    <str name="uprefix">attr_</str>
    <str name="lowernames">true</str>
    <str name="update.chain">dedupe</str>
  </lst>
</requestHandler>
I understand that this attr_ will cause all
generated fields that aren't defined in the
Hello,
I tried to write a Solr PostFilter to do filtering within the
'collect' method (DelegatingCollector). I have to do some heavy operations
within the 'collect' method. This isn't a problem for a few results. But
unfortunately it takes forever with 50 or more results. This is because I have
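For reference, a minimal sketch of the collector side (expensiveCheck is a
placeholder for the heavy operation; the point is that Solr invokes collect()
once for every matching document, regardless of rows):

  import java.io.IOException;
  import org.apache.solr.search.DelegatingCollector;

  public class MyCollector extends DelegatingCollector {
    @Override
    public void collect(int doc) throws IOException {
      if (expensiveCheck(doc)) { // the ~50ms-per-document work happens here
        super.collect(doc);      // keep the document in the results
      }                          // otherwise the document is dropped
    }

    private boolean expensiveCheck(int doc) {
      return true; // placeholder for the real per-document operation
    }
  }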
Hi Erick,
Thanks for your reply.
[ First of all, replicas can be off in terms of counts for the soft
commit interval. The commits don't all happen on the replicas at the
same wall-clock time. Solr promises eventual consistency, in this case
NOW-autocommit time.]
I realized that, to stop it. I ha