On Tue, Jan 12, 2016 at 3:00 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 1/12/2016 7:45 AM, Tom Evans wrote:
>> That makes no sense whatsoever. DIH loads the data_import.conf from ZK
>> just fine, or is that provided to DIH from another module that does
>> know about ZK?
>
> This is accomplished indirectly through a resource loader in the
> SolrCore object that is responsible for config files.  Also, the
> dataimport handler is created by the main Solr code which then hands the
> configuration to the dataimport module.  DIH itself does not know about
> zookeeper.

ZkPropertiesWriter seems to know a little about ZK...

>
>> Either way, it is entirely sub-optimal to have SolrCloud store "all"
>> its configuration in ZK, but still require manually storing and
>> updating files on specific nodes in order to influence DIH. If a
>> server is mistakenly not updated, or manually modified locally on
>> disk, that node would start indexing documents differently than other
>> replicas, which sounds dangerous and scary!
>
> The entity processor you are using accesses files through a Java
> interface for mounted filesystems.  As already mentioned, it does not
> know about zookeeper.
>
>> If there is not a ZkFileDataSource, it shouldn't be too tricky to add
>> one... I'll see how much I dislike having config files on the host...
>
> Creating your own DIH class would be the only solution available right now.
>
> I don't know how useful this would be in practice.  Without special
> config in multiple places, Zookeeper limits the size of the files it
> contains to 1MB.  It is not designed to deal with a large amount of data
> at once.

This is not a large amount of data; it is a 5 KB XML file containing
the configuration of which tables to query for which fields, and how to
map them into the document.
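To compare the two sizes directly, here is a hypothetical Java helper (not part of Solr or ZooKeeper) that checks a config payload against ZooKeeper's documented default `jute.maxbuffer` of 0xfffff bytes. It is only an approximation because it ignores protocol overhead.

```java
// Hypothetical helper, not part of Solr or ZooKeeper: a rough pre-upload check
// that a config file is well under the size a single znode accepts by default.
final class ZnodeSizeCheck {
    // ZooKeeper's documented default for the jute.maxbuffer system property is
    // 0xfffff bytes (just under 1 MB). Raising it requires matching settings on
    // every ZooKeeper server and client.
    static final int DEFAULT_JUTE_MAXBUFFER = 0xfffff;

    private ZnodeSizeCheck() {}

    // Approximate: ignores request/packet overhead, so leave some headroom.
    static boolean fitsInZnode(byte[] payload, int limitBytes) {
        return payload.length <= limitBytes;
    }

    // How many copies of the payload would fit under the limit (integer division).
    static int headroomFactor(byte[] payload, int limitBytes) {
        if (payload.length == 0) {
            return Integer.MAX_VALUE;
        }
        return limitBytes / payload.length;
    }
}
```

For a 5 KB file (5,120 bytes), `headroomFactor` returns 204. The mapping file therefore uses less than 1% of the default limit.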

>
> You could submit a feature request in Jira, but unless you supply a
> complete patch that survives the review process, I do not know how
> likely an implementation would be.

We've already started an implementation, based on FileDataSource and
using SolrZkClient, which we will deploy as an additional library
whilst that process is ongoing, or if the patch doesn't survive it.
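The following is a minimal sketch of the idea. It assumes the ZooKeeper read is hidden behind a hypothetical `ZkBytesSource` interface so that the example compiles without Solr on the classpath. In the library described above, that read would go through `SolrZkClient`, and the class would extend DIH's `DataSource<Reader>` in the same way `FileDataSource` does. `ZkFileReaderSketch`, `ZkBytesSource` and `InMemoryZk` are illustrative names, not Solr API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstraction over the ZooKeeper read (would be backed by
// SolrZkClient in a real plugin; assumed here so the sketch is self-contained).
interface ZkBytesSource {
    byte[] read(String znodePath) throws IOException;
}

// Sketch of a FileDataSource-like reader that resolves entity "files" under a
// base znode instead of a local directory.
final class ZkFileReaderSketch {
    private final ZkBytesSource zk;
    private final String baseZnode;

    ZkFileReaderSketch(ZkBytesSource zk, String baseZnode) {
        this.zk = zk;
        // Normalise so "/configs/foo" and "/configs/foo/" behave the same.
        this.baseZnode = baseZnode.endsWith("/")
                ? baseZnode.substring(0, baseZnode.length() - 1)
                : baseZnode;
    }

    // Plays the role of DataSource<Reader>.getData(String): relative names are
    // resolved under the base znode, absolute paths are used as-is.
    Reader getData(String fileName) {
        String path = fileName.startsWith("/") ? fileName : baseZnode + "/" + fileName;
        try {
            byte[] data = zk.read(path);
            if (data == null) {
                throw new IllegalStateException("No data at znode " + path);
            }
            return new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new IllegalStateException("Could not read znode " + path, e);
        }
    }
}

// In-memory stand-in for ZooKeeper, for unit-testing the sketch.
final class InMemoryZk implements ZkBytesSource {
    private final Map<String, byte[]> nodes = new HashMap<>();

    void put(String path, String content) {
        nodes.put(path, content.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public byte[] read(String path) {
        return nodes.get(path);
    }
}
```

Keeping the ZooKeeper access behind an interface like this also lets the path-resolution logic be tested against the in-memory stand-in without a running ensemble.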

Cheers

Tom
