Re: Undefined field - solr 7.2.1 cloud

2019-09-04 Thread Antony A
Hi, I have confirmed that the ZK ensemble is external. Even though both managed-schema and schema.xml are on the Admin UI, I see the below class defined in solrconfig. The workaround is still to run "solr zk upconfig" followed by restarting the cores of the collection. Anything else I should be looki

Re: Undefined field - solr 7.2.1 cloud

2019-09-04 Thread Erick Erickson
This almost always means that you really _didn’t_ update the schema and reload the collection, you just thought you did ;). One common reason is to fire up Solr with an internal ZooKeeper but have the rest of your collection be using an external ensemble. Another is to be modifying schema.xml w
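
For reference, reloading a collection after a schema change is a single Collections API call (`action=RELOAD`). The sketch below only builds the request URL; the host, port, and collection name are hypothetical placeholders, not values from this thread.

```python
from urllib.parse import urlencode


def reload_collection_url(base_url: str, collection: str) -> str:
    """Build the Collections API URL that reloads all cores of a collection."""
    params = urlencode({"action": "RELOAD", "name": collection})
    return f"{base_url.rstrip('/')}/admin/collections?{params}"


# Hypothetical host and collection name, for illustration only.
url = reload_collection_url("http://localhost:8983/solr", "mycollection")
print(url)
```

Sending a GET to this URL on a node that talks to the same ZooKeeper ensemble the config was uploaded to is what makes the new schema visible; pointing at a different ensemble is the failure mode described above.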

Re: Atomic indexing as default indexing mode in Solr

2019-09-04 Thread Erick Erickson
Because atomic updates require special preparation: specifically, all original fields must be stored, which is not otherwise a requirement and is, in fact, an anti-pattern in large installations. Best, Erick > On Sep 4, 2019, at 7:51 PM, Arnold Bronley wrote: > > Why atomic indexing is not the default mo
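
To make the distinction concrete, here is a minimal sketch of what an atomic update document looks like compared with a full-document add. It assumes the uniqueKey field is `id`; the field names `price` and `tags` and the helper `atomic_update` are hypothetical. Each changed field maps to a modifier such as `set`, `add`, `remove`, or `inc`, and Solr rebuilds the rest of the document from stored values, which is why the stored-fields requirement exists.

```python
import json

# Modifiers used in this sketch; Solr supports a few more.
MODIFIERS = {"set", "add", "remove", "inc"}


def atomic_update(doc_id, changes):
    """Return an atomic-update document: each field maps to {modifier: value}."""
    doc = {"id": doc_id}  # assumes the uniqueKey field is named "id"
    for field, (modifier, value) in changes.items():
        if modifier not in MODIFIERS:
            raise ValueError(f"unsupported modifier: {modifier}")
        doc[field] = {modifier: value}
    return doc


# Hypothetical fields: overwrite "price", append one value to "tags".
update = atomic_update("doc-1", {"price": ("set", 99), "tags": ("add", "sale")})
payload = json.dumps([update])
print(payload)
```

The payload would be posted to the collection's `/update` handler like any other JSON update; fields not named in it are left as they were.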

Atomic indexing as default indexing mode in Solr

2019-09-04 Thread Arnold Bronley
Why is atomic indexing not the default mode of indexing in Solr? That way, the ownership model of the content changes from document level to field level for clients. Multiple clients can contribute to the same Solr document without overwriting each other.

Re: upgrading from solr4 to solr8 searches taking 4 to 10 times as long to return

2019-09-04 Thread Russell Bahr
Hi Toke, Please see below. We reindexed the solr8 cluster to make sure it was up to date with content. * What is your Xmx for the Solrs? - solr8 SOLR_JAVA_MEM="-Xms11235m -Xmx11235m" - 70% of OS Memory solr4 export CATALINA_OPTS="-Xms11235m -Xmx11235m" - 70% of OS Memory (If you use most

Re: Returning the value of an ExternalFileField in Solr 3.4

2019-09-04 Thread Mikhail Khludnev
Hello, Adam. It's hard to vouch for 3.4, but that assert works on master diff --git a/solr/core/src/test/org/apache/solr/schema/ExternalFileFieldSortTest.java b/solr/core/src/test/org/apache/solr/schema/ExternalFileFieldSortTest.java index 632b413..4106e15 100644 --- a/solr/core/src/test/org/a

Re: Undefined field - solr 7.2.1 cloud

2019-09-04 Thread Antony A
Hi, I ran the collection reload after a new "leader" core was selected for the collection due to a heap failure on the previous core. But I still get a stack trace with common.SolrException: undefined field. On Thu, Aug 29, 2019 at 1:36 PM Antony A wrote: > Yes. I do restart the cores on all the d

RE: SolrClient from inside processAdd function

2019-09-04 Thread Markus Jelsma
Hello Arnold, Yes, we do this too for several cases. You can create the SolrClient in the Factory's inform() method, and pass it to the URP when it is created. You must implement SolrCoreAware and close the client when the core closes as well. Use a CloseHook for this. If you do not close the

Re: SolrClient from inside processAdd function

2019-09-04 Thread Arnold Bronley
Hi Simon, I am interested in knowing what you ended up doing in your use case then. Can you please share it, at least at a high level? On Wed, Sep 4, 2019 at 2:26 PM Simon Rosenthal wrote: > Similarly, I had considered a URP which would call the Solr Tagger to add > new metadata fields for inde

Re: upgrading from solr4 to solr8 searches taking 4 to 10 times as long to return

2019-09-04 Thread Russell Bahr
Hi Shawn, Thank you for the feedback and advice. I have uploaded the 2 screenshots to Dropbox. Here is the link. https://www.dropbox.com/s/c5b41a61za0ojw7/solr4_Screen%20Shot%202019-09-03%20at%203.37.08%20PM.png?dl=0 Thank you, *Manzama* a MODERN GOVERNANCE company Russell Bahr Lead Infrast

Re: SolrClient from inside processAdd function

2019-09-04 Thread Simon Rosenthal
Similarly, I had considered a URP which would call the Solr Tagger to add new metadata fields for indexing to incoming documents (and recall discussing this with David Smiley), but eventually decided against this approach on the grounds of complexity. -Simon On Wed, Sep 4, 2019 at 2:10 PM Arnold

Re: SolrClient from inside processAdd function

2019-09-04 Thread Arnold Bronley
I need to search some other collection inside the processAdd function and append that information to the indexing request. On Tue, Sep 3, 2019 at 7:55 PM Erick Erickson wrote: > This really sounds like an XY problem. What do you need the SolrClient > _for_? I suspect there’s an easier way to do this

Returning the value of an ExternalFileField in Solr 3.4

2019-09-04 Thread Adam Taylor
Hi All, We're running Solr 3.4 (I know, I know - we have another project to upgrade this) and we have a fieldtype defined with an externalFileField as: (Actually, I have tried this with both `stored="true"` and `stored="false"`). The field using this fieldtype is defined as:

Re: Solr 7.7.2 Autoscaling policy - Poor performance

2019-09-04 Thread Andrew Kettmann
> there are known perf issues in computing very large clusters Is there any documentation/open tickets on this that you have handy? If that is the case, then we might be back to looking at separate Znodes. Right now if we provide a nodeset on collection creation, it is creating them quickly. I

Re: Skip Headers & Footers while text extraction using Apache Tika parsing for PPT & PDF formats

2019-09-04 Thread Alexandre Rafalovitch
I think you have to start from the lowest level and then go up the stack. Solr uses Tika if you use the extract handler (and for production you may not want to). Tika uses PDFBox to extract from PDF. Searching for "PDFBox remove headers" gets you: https://stackoverflow.com/questions/18126035/how-to-remove-hea

Re: Re: Re: Multi-lingual Search & Accent Marks

2019-09-04 Thread Walter Underwood
On Sep 3, 2019, at 1:13 PM, Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: > > The main issue we are anticipating with the above strategy surrounds scoring. > Since we will be increasing the frequency of accented terms, we might bias > our page ranker... You will not be increasing the f

Re: Re: Re: Re: Multi-lingual Search & Accent Marks

2019-09-04 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Thanks, Alex! We'll look into this. -- Audrey Lorberfeld Data Scientist, w3 Search IBM audrey.lorberf...@ibm.com On 9/3/19, 4:27 PM, "Alexandre Rafalovitch" wrote: What about combining: 1) KeywordRepeatFilterFactory 2) An existing folding filter (need to check it ignores Keyword
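
The idea of indexing both the original and the accent-folded form of a token can be illustrated outside Solr. The sketch below is plain Python, not Solr's analysis chain: `fold_accents` and `index_tokens` are hypothetical helpers that mimic repeating each token, folding one copy, and dropping the copy when folding changed nothing.

```python
import unicodedata


def fold_accents(token: str) -> str:
    """Strip combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_tokens(tokens):
    """Emit each token, plus its folded form when the two differ."""
    out = []
    for tok in tokens:
        out.append(tok)
        folded = fold_accents(tok)
        if folded != tok:
            out.append(folded)
    return out


# "caf\u00e9" is "café" with a precomposed e-acute.
print(index_tokens(["caf\u00e9", "menu"]))
```

Unaccented tokens are emitted once, so only accented terms gain an extra folded variant in the index.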

Re: Skip Headers & Footers while text extraction using Apache Tika parsing for PPT & PDF formats

2019-09-04 Thread Walter Underwood
> On Sep 3, 2019, at 10:46 PM, Jörn Franke wrote: > > PDF is a problematic format as headers and footers are not specified per se > as headers and footers in the document, but only as drawing instructions on > the page. There is no way for software to find them based on the > structure.

Re: Question: Solr perform well with thousands of replicas?

2019-09-04 Thread Hongxu Ma
Hi Erick, Thanks for your help. Before visiting the wiki/mailing list, I knew Solr is unstable with 1000+ collections and should be safe with 10~100 collections. But in a specific environment, what is the exact number at which Solr begins to become unstable? I don't know. So I try to deploy a test cluster to get the num

Re: upgrading from solr4 to solr8 searches taking 4 to 10 times as long to return

2019-09-04 Thread Toke Eskildsen
On Tue, 2019-09-03 at 12:35 -0700, Russell Bahr wrote: > Also, if it helps, the content on each server is between around 6.2Gb > and 7.8Gb. We're still missing something here. The trivial query http://solr.obscured.com:8990/solr/content/select?q=*%3A*&wt=json&indent=true on such a modest index si