Try URLClassifyProcessorFactory in the processing chain instead, configured
in solrconfig.xml
There is very little documentation for it, so check the source for exact
params. Or search for the blog post introducing it several years ago.
Documentation patches would be welcome.
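For anyone who just wants to see what "extracting the top-level URL" amounts to before wiring up an update processor, here is a minimal client-side sketch in Python. It is not URLClassifyProcessorFactory and does not reproduce its parameters. The function name top_level_url is made up for illustration, and it assumes that "top-level URL" means scheme plus host (and port, if any).

```python
from urllib.parse import urlsplit


def top_level_url(url: str) -> str:
    """Return scheme://host[:port]/ for an absolute URL, or "" if there is no host.

    Hypothetical helper: the definition of "top-level URL" used here
    (scheme + network location) is an assumption, not Solr's.
    """
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        return ""
    return f"{parts.scheme}://{parts.netloc}/"


if __name__ == "__main__":
    # The value returned here could be sent along as a separate field at index time.
    print(top_level_url("https://www.example.com/news/2018/06/story.html?id=7"))
```

Computing the value before the document reaches the analysis chain means stop word handling cannot affect it, since no tokenizer is involved.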
Regards,
Alex
Thank you Shawn. It looks like it is being applied. This could be some
sort of chain reaction where:
Drive or server fails. HDFS starts to replicate blocks which causes
network congestion. Solr7 can't talk, so initiates a replication
process which causes more network congestion which ca
bq. So my question, what does happen when I'm sending index request to
replica instead of leader server?
The replica forwards the document to the leader which then distributes
to _all_ replicas, including the replica that originally forwarded the
document.
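The flow described above can be pictured with a toy model in Python. This is not Solr code. The Node and Shard classes and the hops list are invented for illustration, and the model only records who sends the document to whom.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    docs: list = field(default_factory=list)


class Shard:
    """Toy model of one shard: a leader plus its other replicas."""

    def __init__(self, leader: Node, replicas: list):
        self.leader = leader
        self.replicas = replicas
        self.hops = []  # (sender, receiver) pairs, in order

    def index(self, receiver: Node, doc: str) -> None:
        # A replica that receives the request forwards it to the leader first.
        if receiver is not self.leader:
            self.hops.append((receiver.name, self.leader.name))
        self.leader.docs.append(doc)
        # The leader then distributes to all replicas, including the one
        # that originally forwarded the document.
        for replica in self.replicas:
            self.hops.append((self.leader.name, replica.name))
            replica.docs.append(doc)


if __name__ == "__main__":
    leader, r1, r2 = Node("leader"), Node("r1"), Node("r2")
    shard = Shard(leader, [r1, r2])
    shard.index(r1, "doc1")
    for sender, target in shard.hops:
        print(f"{sender} -> {target}")
```

In this model, sending to a replica costs one extra hop compared with sending straight to the leader.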
Best,
Erick
On Tue, Jun 12, 2018 at 1:3
Looks like the stop words (in, and, on) are what is breaking. The regex looks
like it is correct.
Kevin Risden
On Tue, Jun 12, 2018, 18:02 Hanjan, Harinder
wrote:
> Hello!
>
> I am indexing web documents and have a need to extract their top-level URL
> to be stored in a different field. I have had s
Hello!
I am indexing web documents and have a need to extract their top-level URL to
be stored in a different field. I have had some success with the
PatternTokenizerFactory (relevant schema bits at the bottom) but the behavior
appears to be inconsistent. Most of the time, the top level URL i
I believe it becomes a federator and resends the request to the leader, but
someone else more intimately familiar can correct me.
Devansh Dhutia
Development Manager, Content Ingestion
USA TODAY Network
From: SOLR4189
Reply-To: "solr-user@lucene.apache.org"
Date: Friday, June 8, 2018 at 6:03 AM
Having the tlogs be huge is a red flag. Do you have buffering enabled
in CDCR? This was something of a legacy option that's going to be
removed; it's been made obsolete by the ability of CDCR to bootstrap
the entire index. Buffering should always be disabled.
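As a rough sketch of how one might check and disable buffering, the Python below builds the CDCR API URLs for a collection. The STATUS and DISABLEBUFFER actions are part of the CDCR API; the base URL and the collection name "mycollection" are placeholders, and the helper cdcr_url is made up for illustration. Nothing is sent over the network.

```python
from urllib.parse import urlencode


def cdcr_url(base_url: str, collection: str, action: str) -> str:
    """Build a CDCR API URL such as .../solr/<collection>/cdcr?action=STATUS."""
    query = urlencode({"action": action})
    return f"{base_url.rstrip('/')}/{collection}/cdcr?{query}"


if __name__ == "__main__":
    base = "http://localhost:8983/solr"  # placeholder host and port
    # STATUS reports, among other things, whether the buffer is enabled.
    print(cdcr_url(base, "mycollection", "STATUS"))
    # DISABLEBUFFER turns buffering off for this collection.
    print(cdcr_url(base, "mycollection", "DISABLEBUFFER"))
```

The printed URLs can be pasted into a browser or passed to curl against the cluster in question.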
Another reason tlogs can grow is if yo
Hi all,
Recently we have gone live using CDCR on our 2 node solr cloud cluster
(7.2.1). From a CDCR perspective, everything seems to be working
fine...collections are staying in sync across the cluster, everything looks
good.
The issue we are seeing is with 1 collection in particular, after we se
In a mixed-hardware situation you can certainly place replicas as you
choose. Create a minimal collection or use the special nodeset EMPTY
and then place your replicas one-by-one.
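A hedged sketch of that approach in Python: it builds one Collections API CREATE call with createNodeSet=EMPTY, then one ADDREPLICA call per replica with an explicit node. The collection name, node names, and the placement plan are invented for illustration, and depending on the Solr version CREATE may need further parameters (a config set name, for example). The script only prints URLs.

```python
from urllib.parse import urlencode


def collections_api(base_url: str, **params) -> str:
    """Build a Collections API URL from keyword parameters."""
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


def placement_urls(base_url: str, collection: str, num_shards: int, plan: list) -> list:
    """plan is a list of (shard, node) pairs; returns the URLs to call in order."""
    urls = [collections_api(base_url, action="CREATE", name=collection,
                            numShards=num_shards, createNodeSet="EMPTY")]
    for shard, node in plan:
        urls.append(collections_api(base_url, action="ADDREPLICA",
                                    collection=collection, shard=shard, node=node))
    return urls


if __name__ == "__main__":
    # Hypothetical plan: the bigger machine ("big1") takes more replicas.
    plan = [("shard1", "big1:8983_solr"), ("shard2", "big1:8983_solr"),
            ("shard1", "small1:8983_solr"), ("shard2", "small2:8983_solr")]
    for url in placement_urls("http://localhost:8983/solr", "mycoll", 2, plan):
        print(url)
```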
You can also consider "replica placement rules", see:
https://lucene.apache.org/solr/guide/6_6/rule-based-replica-plac
Hi Team,
I am trying to index a document from HDFS in Solr version 4.9 and am getting
the error below:
Command used by me :
QUEUE_NAME=default
MORPHLINE_CONF=/home/sshuser/solrruncls/morphline_retail_tr_2017.conf
OUTPUT_DIR=adl:///solr
ZK_HOST=zk0-hnrhba.thxycti2fi4ejjab25g3ibtkog.gx.internal.cloudapp
On 6/11/2018 9:46 AM, Joe Obernberger wrote:
> We are seeing an issue on our Solr Cloud 7.3.1 cluster where
> replication starts and pegs network interfaces so aggressively that
> other tasks cannot talk. We will see it peg a bonded 2GB interface.
> In some cases the replication fails over and o
On 6/12/2018 9:12 AM, Michael Braun wrote:
> The way to handle this right now looks to be running additional Solr
> instances on nodes with increased resources to balance the load (so if the
> machines are 1x, 1.5x, and 2x, run 2 instances, 3 instances, and 4
> instances, respectively). Has anyone
What does your base hardware configuration look like?
You could have several VMs on machines with higher configuration.
Deepak
"The greatness of a nation can be judged by the way its animals are
treated. Please consider stopping the cruelty by becoming a Vegan"
+91 73500 12833
deic...@gmail.c
We have a case of a Solr Cloud cluster with different kinds of nodes - some
may have significant differences in hardware specs (50-100% more
HD/RAM/CPU, etc). Ideally nodes with increased resources could take on more
shard replicas.
It looks like the Collections API (
https://lucene.apache.org/sol
I observed that the build works if the data size is below 25M. The moment
the records go beyond that, this OOM error shows up. Solr itself shows 56%
usage of 20GB space during the build. So, is there some settings I need to
change to handle larger data size?
On Tue, Jun 12, 2018 at 3:17 PM, Aless
On 6/12/2018 2:56 AM, Marc Lammers wrote:
I want to sort my data by a multivalued field. I add this to my query
"sort=field(foo,min) asc". The configuration in the schema for this field is
The documentation for the field function says that the field must
contain numeric docvalues. Your fi
Derek -
One trick I like to do is try various forms of a query all in one go. With
facet=on, you can:
&facet.query=big brown bear
&facet.query=big brown
&facet.query=brown bear
&facet.query=big
&facet.query=brown
&facet.query=bear
The returned counts give you an indication of what
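The list of facet.query parameters above can be generated rather than typed by hand. Here is a small Python sketch that produces every contiguous sub-phrase of a query, longest first, and assembles the request parameters. The helper names and the q=*:* and rows=0 parameters are assumptions added for illustration; only facet=on and facet.query come from the message.

```python
from urllib.parse import urlencode


def sub_phrases(query: str) -> list:
    """All contiguous word sequences of the query, longest first."""
    words = query.split()
    n = len(words)
    return [" ".join(words[i:i + size])
            for size in range(n, 0, -1)
            for i in range(n - size + 1)]


def facet_query_string(query: str) -> str:
    """Build a query string with one facet.query per sub-phrase."""
    params = [("q", "*:*"), ("rows", "0"), ("facet", "on")]  # q and rows are assumed
    params += [("facet.query", phrase) for phrase in sub_phrases(query)]
    return urlencode(params)


if __name__ == "__main__":
    for phrase in sub_phrases("big brown bear"):
        print(phrase)
    print(facet_query_string("big brown bear"))
```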
On top of that, I would not recommend using the schema-less mode in
production.
That mode is useful for experimenting and prototyping, but with a managed
schema you would have much more control over a production instance.
Regards
-
---
Alessandro Benedetti
Search Consultant, R&D
Hi,
First of all, the two different suggesters you are using are based on
different data structures (with different memory utilisation):
- FuzzyLookupFactory -> FST (in memory and stored binary on disk)
- AnalyzingInfixLookupFactory -> Auxiliary Lucene Index
Both the data structures should be v
I would recommend looking into the Highlight feature [1].
There are a few implementations and they should all be fine for your user
requirement.
Regards
[1] https://lucene.apache.org/solr/guide/7_3/highlighting.html
-
---
Alessandro Benedetti
Search Consultant, R&D Software Engi
Hi All.
I want to sort my data by a multivalued field. I add this to my query
"sort=field(foo,min) asc". The configuration in the schema for this field is
The Solr documentation says that I have to add the docValues="true"
attribute for this field. After this I deleted the collection an
Sorry, I realized the strikethrough on the term "bear" in "big brown
bear" cannot be displayed accordingly in the mailing list.
My aim is to have the search terms "big brown bear" displayed on the
search result page with the term "bear" struck through, since it does
not have a match in the search res