On May 3, 2012, at 8:30 AM, Markus Jelsma wrote:

> Hi.
> 
> Compression is a good suggestion. All large dictionaries are compressed well 
> below 1MB with GZIP. Where should this be implemented? SolrZkClient or 
> ZkController?

Hmm...I'm not sure - we want to be careful with this feature. Offhand, I'd 
guess if we can get it in SolrZkClient that is the right level.

The main issue is that we don't want to compress by default - we want to do it 
based on size or request - because it's much harder to inspect the data in zk 
if it's compressed. We will probably want to add support to auto-uncompress to
the Admin Zk view UI.

> Which good compressor is already in Solr's lib?

I don't know that we have one yet - though the benchmark contrib uses a lib for 
compression (commons-compress from Apache).

> And what's the 
> difference between SolrZkClient setData and create?

setData sets data on an existing node - create creates a new node (with or 
without data).

> Should it autocompress 
> files larger than N bytes?

This seems like a reasonable approach to me...
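A minimal sketch of the "compress only above N bytes" idea, assuming the JDK's built-in java.util.zip GZIP support rather than commons-compress. The class ZkUploadPrep, the threshold parameter and the ".gzip" suffix are hypothetical illustrations, not an existing SolrZkClient or ZkController API:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: decides, per file, whether to gzip before upload to ZooKeeper.
final class ZkUploadPrep {
    /** Znode name to write under, plus the bytes to store there. */
    record Prepared(String name, byte[] data) {}

    // Assumed naming convention: compressed files get this suffix appended.
    static final String GZIP_EXT = ".gzip";

    static Prepared prepare(String fileName, byte[] raw, int thresholdBytes) {
        if (raw.length <= thresholdBytes) {
            // Small files stay uncompressed so they remain readable in the admin ZK view.
            return new Prepared(fileName, raw);
        }
        // Large files (e.g. big dictionaries) are gzipped and renamed so readers can tell.
        return new Prepared(fileName + GZIP_EXT, gzip(raw));
    }

    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream writes the gzip trailer into bos.
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }
}
```

The returned name and bytes would then be written with create (new node) or setData (existing node). Keeping the decision out of the ZooKeeper client itself is just one possible design; a real patch might instead expose it as an option on the upload call.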

> And how should we detect if data is compressed when 
> reading from ZooKeeper?

I was thinking we could somehow use file extensions?

e.g. synonyms.txt.gzip - then you can use different compression algs depending on
the ext, etc.

We would want to try and make it as transparent as possible though...
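On the read side, the extension idea could look roughly like the sketch below: a table maps a znode-name suffix to a decompressing stream, and names without a known suffix are returned untouched. This assumes JDK java.util.zip streams; ZkDecoders, the Codec interface and the ".deflate" entry (zlib format) are hypothetical, included only to show how "different compression algs depending on the ext" might be wired:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical read-side helper: picks a decompressor from the znode name's suffix.
final class ZkDecoders {
    /** Wraps a raw stream in a decompressing stream. */
    interface Codec {
        InputStream wrap(InputStream raw) throws IOException;
    }

    private static final Codec GZIP = GZIPInputStream::new;
    private static final Codec DEFLATE = InflaterInputStream::new; // zlib-wrapped deflate

    // Assumed suffix table; ".gzip" follows the example in this thread.
    static final Map<String, Codec> CODECS = Map.of(".gzip", GZIP, ".deflate", DEFLATE);

    /** Returns the stored bytes, decompressed if the name ends with a known suffix. */
    static byte[] read(String znodeName, byte[] stored) {
        for (Map.Entry<String, Codec> e : CODECS.entrySet()) {
            if (znodeName.endsWith(e.getKey())) {
                try (InputStream in = e.getValue().wrap(new ByteArrayInputStream(stored))) {
                    return in.readAllBytes();
                } catch (IOException ex) {
                    throw new UncheckedIOException(ex);
                }
            }
        }
        return stored; // no known suffix: treat as plain, uncompressed data
    }

    /** Strips a known suffix so callers can keep asking for "synonyms.txt". */
    static String logicalName(String znodeName) {
        for (String ext : CODECS.keySet()) {
            if (znodeName.endsWith(ext)) {
                return znodeName.substring(0, znodeName.length() - ext.length());
            }
        }
        return znodeName;
    }
}
```

Mapping the stored name back to the logical name is what would keep this reasonably transparent to code that loads synonyms.txt by its plain name.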

> 
> On Thursday 03 May 2012 14:04:31 Mark Miller wrote:
>> On May 3, 2012, at 5:15 AM, Markus Jelsma wrote:
>>> Hi,
>>> 
>>> We've increased ZooKeeper's znode size limit to accommodate some larger
>>> dictionaries and other files. It isn't the best idea to increase the
>>> maximum znode size. Any plans for splitting up larger files and storing
>>> them with multi? Does anyone have another suggestion?
>>> 
>>> Thanks,
>>> Markus
>> 
>> Patches welcome :) You can compress, you can break up the files, or you can
>> raise the limit - that's about the options I know of.
>> 
>> You might start by creating a JIRA issue.
>> 
>> - Mark Miller
>> lucidimagination.com
> 
> -- 
> Markus Jelsma - CTO - Openindex

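For the "break up the files" option, a rough sketch of the splitting and reassembly logic, independent of any ZooKeeper calls. The chunk size and the "part-NNNN" child-node naming are hypothetical choices; the actual writes (one create per chunk, or grouped with multi as Markus suggests) are left out:

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: split a large file into pieces that each fit under the znode limit.
final class ZkChunks {
    static List<byte[]> split(byte[] data, int maxChunkBytes) {
        if (maxChunkBytes <= 0) {
            throw new IllegalArgumentException("maxChunkBytes must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < data.length; off += maxChunkBytes) {
            int end = Math.min(data.length, off + maxChunkBytes);
            chunks.add(Arrays.copyOfRange(data, off, end));
        }
        return chunks;
    }

    /** Hypothetical child znode path for chunk i; zero-padded so names sort in order. */
    static String chunkName(String parent, int i) {
        return String.format("%s/part-%04d", parent, i);
    }

    /** Concatenates chunks (read back in name order) into the original bytes. */
    static byte[] join(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] c : chunks) {
            out.write(c, 0, c.length);
        }
        return out.toByteArray();
    }
}
```

Compression and chunking are not exclusive: a file could be gzipped first and only split if the compressed form still exceeds the limit.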
- Mark Miller
lucidimagination.com