Re: Extracting top level URL when indexing document

2018-06-12 Thread Alexandre Rafalovitch
Try URLClassifyProcessorFactory in the processing chain instead, configured in solrconfig.xml. There is very little documentation for it, so check the source for the exact params, or search for the blog post that introduced it several years ago. Documentation patches would be welcome. Regards, Alex

Re: Solr 7 + HDFS issue

2018-06-12 Thread Joe Obernberger
Thank you Shawn. It looks like it is being applied. This could be some sort of chain reaction where: Drive or server fails. HDFS starts to replicate blocks, which causes network congestion. Solr7 can't talk, so initiates a replication process which causes more network congestion which ca

Re: Indexing to replica instead leader

2018-06-12 Thread Erick Erickson
bq. So my question, what does happen when I'm sending index request to replica instead of leader server? The replica forwards the document to the leader which then distributes to _all_ replicas, including the replica that originally forwarded the document. Best, Erick On Tue, Jun 12, 2018 at 1:3

Re: Extracting top level URL when indexing document

2018-06-12 Thread Kevin Risden
Looks like the stop words (in, and, on) are what is breaking it. The regex looks like it is correct. Kevin Risden On Tue, Jun 12, 2018, 18:02 Hanjan, Harinder wrote: > Hello! > > I am indexing web documents and have a need to extract their top-level URL > to be stored in a different field. I have had s

Extracting top level URL when indexing document

2018-06-12 Thread Hanjan, Harinder
Hello! I am indexing web documents and have a need to extract their top-level URL to be stored in a different field. I have had some success with the PatternTokenizerFactory (relevant schema bits at the bottom) but the behavior appears to be inconsistent. Most of the time, the top level URL i
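As an illustration of the goal described in this message, here is a minimal Python sketch that derives a top-level URL on the client side before a document is sent for indexing. It is not the schema-based approach from the thread. It assumes "top-level URL" means the scheme plus host (and port, if any), and the field names `url` and `url_top` are hypothetical.

```python
from urllib.parse import urlsplit


def top_level_url(url: str) -> str:
    """Return scheme://host[:port] for an absolute URL, or "" if it has no host."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        return ""
    return f"{parts.scheme}://{parts.netloc}"


# Hypothetical document; "url" and "url_top" are illustrative field names.
doc = {"id": "1", "url": "https://www.example.com/in/and/on/page.html?x=1"}
doc["url_top"] = top_level_url(doc["url"])
```

Because the path is never tokenized here, path segments that happen to be stop words (in, and, on) do not affect the result.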

Re: Indexing to replica instead leader

2018-06-12 Thread Dhutia, Devansh
I believe it becomes a federator and resends the request to the leader, but someone else more intimately familiar can correct me. Devansh Dhutia Development Manager, Content Ingestion USA TODAY Network From: SOLR4189 Reply-To: "solr-user@lucene.apache.org" Date: Friday, June 8, 2018 at 6:03 AM

Re: Suggestions for debugging performance issue

2018-06-12 Thread Erick Erickson
Having the tlogs be huge is a red flag. Do you have buffering enabled in CDCR? This was something of a legacy option that's going to be removed, it's been made obsolete by the ability of CDCR to bootstrap the entire index. Buffering should be disabled always. Another reason tlogs can grow is if yo

Suggestions for debugging performance issue

2018-06-12 Thread Chris Troullis
Hi all, Recently we have gone live using CDCR on our 2 node solr cloud cluster (7.2.1). From a CDCR perspective, everything seems to be working fine...collections are staying in sync across the cluster, everything looks good. The issue we are seeing is with 1 collection in particular, after we se

Re: Hardware-Aware Solr Coud Sharding?

2018-06-12 Thread Erick Erickson
In a mixed-hardware situation you can certainly place replicas as you choose. Create a minimal collection or use the special nodeset EMPTY and then place your replicas one-by-one. You can also consider "replica placement rules", see: https://lucene.apache.org/solr/guide/6_6/rule-based-replica-plac

Issues while running map-reduce index tool.

2018-06-12 Thread Sujeet Singh
Hi Team, I am trying to index a document from HDFS in Solr 4.9 and am getting the error below. Command used: QUEUE_NAME=default MORPHLINE_CONF=/home/sshuser/solrruncls/morphline_retail_tr_2017.conf OUTPUT_DIR=adl:///solr ZK_HOST=zk0-hnrhba.thxycti2fi4ejjab25g3ibtkog.gx.internal.cloudapp

Re: Solr 7 + HDFS issue

2018-06-12 Thread Shawn Heisey
On 6/11/2018 9:46 AM, Joe Obernberger wrote: > We are seeing an issue on our Solr Cloud 7.3.1 cluster where > replication starts and pegs network interfaces so aggressively that > other tasks cannot talk.  We will see it peg a bonded 2GB interfaces.  > In some cases the replication fails over and o

Re: Hardware-Aware Solr Coud Sharding?

2018-06-12 Thread Shawn Heisey
On 6/12/2018 9:12 AM, Michael Braun wrote: > The way to handle this right now looks to be running additional Solr > instances on nodes with increased resources to balance the load (so if the > machines are 1x, 1.5x, and 2x, run 2 instances, 3 instances, and 4 > instances, respectively). Has anyone
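The arithmetic in the quoted suggestion (machines at 1x, 1.5x, and 2x running 2, 3, and 4 instances) can be sketched in Python as the smallest whole-number counts proportional to relative capacity. The function name and the use of capacity strings are assumptions made for illustration; nothing here is a Solr API.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def instance_counts(capacities):
    """Smallest whole-number instance counts proportional to relative capacities.

    `capacities` is a list of strings such as "1.5" (relative hardware size).
    """
    fracs = [Fraction(c) for c in capacities]
    # Least common multiple of the denominators turns every ratio into an integer.
    lcm_den = reduce(lambda a, b: a * b // gcd(a, b), (f.denominator for f in fracs), 1)
    ints = [int(f * lcm_den) for f in fracs]
    # Divide by the common factor so the counts are as small as possible.
    common = reduce(gcd, ints)
    return [i // common for i in ints]


counts = instance_counts(["1", "1.5", "2"])
```

For the ratios quoted above this gives 2, 3, and 4 instances, matching the message.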

Re: Hardware-Aware Solr Coud Sharding?

2018-06-12 Thread Deepak Goel
What does your base hardware configuration look like? You could have several VMs on machines with higher configuration. Deepak

Hardware-Aware Solr Coud Sharding?

2018-06-12 Thread Michael Braun
We have a case of a Solr Cloud cluster with different kinds of nodes - some may have significant differences in hardware specs (50-100% more HD/RAM/CPU, etc). Ideally nodes with increased resources could take on more shard replicas. It looks like the Collections API ( https://lucene.apache.org/sol

Re: Solr Suggest Component and OOM

2018-06-12 Thread Ratnadeep Rakshit
I observed that the build works if the data size is below 25M. The moment the records go beyond that, this OOM error shows up. Solr itself shows 56% usage of 20GB space during the build. So, are there some settings I need to change to handle larger data size? On Tue, Jun 12, 2018 at 3:17 PM, Aless

Re: Solr sort multivalued field

2018-06-12 Thread Shawn Heisey
On 6/12/2018 2:56 AM, Marc Lammers wrote: I want to sort my data by a multivalued field. I add this to my query „*sort=field(foo,min) asc“*. The configuration in the schema for this field is The documentation for the field function says that the field must contain numeric docvalues.  Your fi

Re: How to find out which search terms have matches in a search

2018-06-12 Thread Erik Hatcher
Derek - One trick I like to do is try various forms of a query all in one go. With facet=on, you can: &facet.query=big brown bear &facet.query=big brown &facet.query=brown bear &facet.query=big &facet.query=brown &facet.query=bear The returned counts give you an indication of what
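The facet.query trick above can be generated rather than typed by hand. The following Python sketch builds one facet.query per contiguous sub-phrase of the search terms, longest first, and URL-encodes the result. The helper names are hypothetical; only `facet=on` and `facet.query` come from the message.

```python
from urllib.parse import urlencode


def sub_phrases(query: str) -> list:
    """All contiguous sub-phrases of the query terms, longest first."""
    terms = query.split()
    n = len(terms)
    return [
        " ".join(terms[start:start + size])
        for size in range(n, 0, -1)
        for start in range(n - size + 1)
    ]


def facet_params(query: str) -> str:
    """URL-encoded parameters: facet=on plus one facet.query per sub-phrase."""
    params = [("facet", "on")]
    params += [("facet.query", phrase) for phrase in sub_phrases(query)]
    return urlencode(params)


phrases = sub_phrases("big brown bear")
encoded = facet_params("big brown bear")
```

The resulting string would still need to be appended to a real select request alongside the main query; that part is left out because the request details are not given in the thread.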

Re: Changing Field Assignments

2018-06-12 Thread Alessandro Benedetti
On top of that, I would not recommend using the schema-less mode in production. That mode is useful for experimenting and prototyping, but with a managed schema you would have much more control over a production instance. Regards - --- Alessandro Benedetti Search Consultant, R&D

Re: Solr Suggest Component and OOM

2018-06-12 Thread Alessandro Benedetti
Hi, first of all the two different suggesters you are using are based on different data structures (with different memory utilisation): - FuzzyLookupFactory -> FST (in memory and stored binary on disk) - AnalyzingInfixLookupFactory -> Auxiliary Lucene Index Both the data structures should be v

Re: How to find out which search terms have matches in a search

2018-06-12 Thread Alessandro Benedetti
I would recommend looking into the Highlight feature [1]. There are a few implementations and they should be all right for your user requirement. Regards [1] https://lucene.apache.org/solr/guide/7_3/highlighting.html - --- Alessandro Benedetti Search Consultant, R&D Software Engi

Solr sort multivalued field

2018-06-12 Thread Marc Lammers
Hi All. I want to sort my data by a multivalued field. I add this to my query: "sort=field(foo,min) asc". The configuration in the schema for this field is The Solr documentation says that I have to add the docValues="true" attribute for this field. After this I deleted the collection an

Re: How to find out which search terms have matches in a search

2018-06-12 Thread Derek Poh
Sorry, I realized the strikethrough on the term "bear" in "big brown bear" cannot be displayed accordingly in the mailing list. My aim is to have the search terms "big brown bear" display on the search result page with the term "bear" struck through, since it does not have a match in the search res