Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-12-09 Thread Hendrik Haddorp
Hi, for the HDFS case, wouldn't it be nice if there were a mode in which the replicas just read the same index files as the leader? I mean, after all, the data is already on a shared, readable file system, so why would one even need to replicate the transaction log files? regards, Hendrik On 08.1

RE: FW: Need Help Configuring Solr To Work With Nutch

2017-12-09 Thread Rick Leir
Ara, the config for soft commit would not be in schema.xml; please look in solrconfig.xml. Look in solr.log for evidence of commits occurring. Explore the Solr Admin console: what are the document counts? You can post snippets from your config files here. Cheers --Rick On December 8, 2017 4:23:

Re: Learning to Rank (LTR) with grouping

2017-12-09 Thread Roopa Rao
Hi, I tried to apply JIRA SOLR-8776 as a patch, as this feature is critical. Here are the steps I took on my Mac: On branch branch_6_5 Your branch is up-to-date with 'origin/branch_6_5' patch -p1 -i 162.patch --dry-run I am getting failures for certain hunks. Example: patching file solr/c

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-12-09 Thread Erick Erickson
This has been bandied about on a number of occasions; it boils down to the fact that nobody has stepped up to make it happen. It turns out there are a number of tricky issues: > how does leadership change if the leader goes down? > the raw complexity of getting it right. Getting it wrong corrupts indexes > how

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-12-09 Thread Hendrik Haddorp
OK, thanks for the answer. The leader election and update notification sound like they should work using ZooKeeper (leader election recipe and a normal watch), but I guess there are some details that make things more complicated. On 09.12.2017 20:19, Erick Erickson wrote: This has been bandied
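For readers unfamiliar with the "leader election recipe" mentioned above, here is a minimal in-memory sketch of its core idea, assuming the standard scheme: each candidate creates an ephemeral sequential znode, the lowest sequence number leads, and every other candidate watches only its immediate predecessor. The class and method names (`ElectionSim`, `join`, `crash`, `watched_predecessor`) are hypothetical and nothing here talks to a real ZooKeeper ensemble. Note that it only models *who* leads; the index-state questions raised in this thread are outside its scope.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ElectionSim:
    """Hypothetical in-memory stand-in for ZooKeeper election znodes."""
    next_seq: int = 0
    nodes: Dict[int, str] = field(default_factory=dict)  # seq -> candidate

    def join(self, name: str) -> int:
        # Stands in for creating an EPHEMERAL_SEQUENTIAL znode.
        seq = self.next_seq
        self.next_seq += 1
        self.nodes[seq] = name
        return seq

    def leader(self) -> str:
        # The lowest surviving sequence number is the leader.
        return self.nodes[min(self.nodes)]

    def watched_predecessor(self, seq: int) -> Optional[int]:
        # Each non-leader watches only the next-lower znode, so one
        # crash wakes one candidate instead of the whole herd.
        smaller = [s for s in self.nodes if s < seq]
        return max(smaller) if smaller else None

    def crash(self, seq: int) -> None:
        # Session expiry removes the ephemeral znode.
        del self.nodes[seq]


sim = ElectionSim()
a, b, c = sim.join("replica-a"), sim.join("replica-b"), sim.join("replica-c")
print(sim.leader())
print(sim.watched_predecessor(c))
sim.crash(a)
print(sim.leader())
```

Watching only the predecessor, rather than the whole election node, is what keeps a leader failure from triggering a burst of notifications to every replica.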

joining across sharded collection

2017-12-09 Thread chris
I'm trying to figure out how to structure this query. I have two types of documents: items and sources. Previously, they were all in the same collection. I'm now testing a cluster with separate collections. The items collection has 38,034,895,527 documents, and the sources collection has 41

Re: joining across sharded collection

2017-12-09 Thread Erick Erickson
Have you looked at the streaming functionality (Streaming Expressions and Parallel SQL in particular)? While it has some restrictions, it easily handles cross-collection joins. It's generally intended for analytic-type queries, but at your scale that may be what you need. At that scale denormalizing
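To make the cross-collection join idea concrete, here is a small Python sketch of an inner hash join, the general technique behind joining one stream of documents against an in-memory table built from another. It is not Solr code and does not show Solr's streaming expression syntax; the function name `hash_join` and the field names `source_id`, `id`, and `name` are hypothetical stand-ins for the items and sources collections described above.

```python
from typing import Dict, Iterable, Iterator, List


def hash_join(left: Iterable[dict], right: Iterable[dict],
              left_key: str, right_key: str) -> Iterator[dict]:
    """Inner hash join: index `right` in memory, then stream `left` past it."""
    table: Dict[object, List[dict]] = {}
    for doc in right:
        key = doc.get(right_key)
        if key is not None:  # documents without a join key never match
            table.setdefault(key, []).append(doc)
    for doc in left:
        for match in table.get(doc.get(left_key), []):
            merged = dict(match)
            merged.update(doc)  # left-side fields win on name clashes
            yield merged


# Hypothetical documents: items point at sources through "source_id".
items = [
    {"id": "i1", "source_id": "s1"},
    {"id": "i2", "source_id": "s2"},
    {"id": "i3", "source_id": "s9"},  # no matching source, so it is dropped
]
sources = [{"id": "s1", "name": "feed A"}, {"id": "s2", "name": "feed B"}]

for row in hash_join(items, sources, "source_id", "id"):
    print(row)
```

The side that gets hashed must fit in memory, so in a sketch like this it should be whichever collection is smaller, while the larger one (here, billions of items) is only streamed.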

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-12-09 Thread Erick Erickson
The complications are things like this: Say an update comes in and gets written to the tlog and indexed but not committed. Now the leader goes down. How does the replica that takes over leadership 1> understand the current state of the index, i.e. that there are uncommitted updates 2> replay the u
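The scenario above (indexed but uncommitted updates in the tlog when the leader dies) can be illustrated with a toy model: split the log at the last commit, treat everything before it as what searchers could already see, and replay everything after it to rebuild the state the new leader must take over. The entry shapes and function names (`split_at_last_commit`, `replay`) are hypothetical and do not reflect Solr's actual UpdateLog format. The sketch also ignores concurrency and failures during replay, which is where the real difficulty described in this thread lies.

```python
from typing import Dict, List, Tuple

# Hypothetical entry shapes, not Solr's on-disk tlog format:
#   ("add", doc_id, fields) | ("delete", doc_id) | ("commit",)
Entry = tuple


def split_at_last_commit(tlog: List[Entry]) -> Tuple[List[Entry], List[Entry]]:
    """Split the log into (already committed, still uncommitted) entries."""
    last = max((i for i, e in enumerate(tlog) if e[0] == "commit"), default=-1)
    return tlog[: last + 1], tlog[last + 1:]


def replay(index: Dict[str, dict], entries: List[Entry]) -> Dict[str, dict]:
    """Apply add/delete entries on top of `index`, returning a new dict."""
    out = dict(index)
    for e in entries:
        if e[0] == "add":
            out[e[1]] = e[2]
        elif e[0] == "delete":
            out.pop(e[1], None)
        # "commit" only changes visibility, so it is skipped here.
    return out


tlog = [("add", "1", {"v": 1}), ("commit",), ("add", "2", {"v": 2}), ("delete", "1")]
committed, pending = split_at_last_commit(tlog)
visible = replay({}, committed)       # what searchers saw before the crash
recovered = replay(visible, pending)  # what the new leader must reconstruct
print(visible, recovered)
```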