I've not delved into 4D's object fields much yet and now have a situation
where they make sense...but may not be workable. I'm looking for some
comments, advice and, hopefully, code.

I have been under the misimpression that object fields use an efficient
binary format for storage. Based on my own limited experimentation and
comments from Miyako and John DeSoi, it seems that this is not the case.
Instead, the data is stored as JSON.

Saying that something "is JSON" tells you only two things:

1) The data follows the rules of a specific grammar.
2) Any grammar-complete parser can parse the data.

That's it. It tells you nothing about the structure or contents of the
data. More to the point, the same data can be stored in a whole lot of
different ways. For example, check out this record stored in JSON:

{
   "id":119736,
   "statecode":"FL",
   "county":"CLAY COUNTY",
   "eq_site_limit":498960,
   "hu_site_limit":498960,
   "fl_site_limit":498960,
   "fr_site_limit":498960,
   "tiv_2011":498960,
   "tiv_2012":792148.9,
   "eq_site_deductible":0,
   "hu_site_deductible":9979.2,
   "fl_site_deductible":0,
   "fr_site_deductible":0,
   "point_latitude":30.102261,
   "point_longitude":-81.711777,
   "line":"Residential",
   "construction":"Masonry",
   "point_granularity":1
}

Pretty standard stuff. In this case, I grabbed some point data off the Web
for a bit of experimentation - it's not my data...but it's illustrative.
Now imagine the same data as a row in a spreadsheet/CSV/TSV file:

119736 FL CLAY COUNTY 498960 498960 498960 498960 498960 792148.9 0 9979.2 0
0 30.102261 -81.711777 Residential Masonry 1

The difference is that the name-value pairs in the JSON above *take up
more space than the data itself.* Repeating the key names for every
object/element means you get some super large JSON in a hurry. Just for
comparison, I took 100 rows of this point data and compared the size of the
verbose format against a flat file. The JSON is nearly four times larger.
The ratio depends a lot on your key names and data, but 4:1 isn't rare;
I've seen far worse, and you'll rarely see less than 2:1 using the format
above.

The JSON format shown above is easy to read (if you ever need to read it),
but it is pretty much the most de-optimized format for data storage and
transfer over the network. This seems to be 4D's native format and it makes
object fields a non-starter for most of the applications I was considering
them for.

The issue I'm raising here isn't unique to 4D and is not new. Common
solutions:

* Use super short names for your keys like "a" and "b", etc. So horrible.
You've lost the legibility of nice name-values.

* Use a header object that describes the 'columns' and then use compact
JSON arrays for the data. Rob Laveaux reminded me about this option some
months back and it's a really decent compromise.

* Avoid JSON entirely and use either a format like CSV, TSV, or something
binary.
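To make the second option concrete, here's a minimal sketch of the
header-plus-arrays idea, in Python purely for illustration (the same shape
is buildable with 4D objects and collections). Each key name is stored once
in a "columns" list and the data travels as plain value arrays:

```python
def pack(records):
    """Collapse a list of objects into one column list plus value rows."""
    columns = list(records[0].keys())
    return {"columns": columns,
            "rows": [[rec[c] for c in columns] for rec in records]}

def unpack(packed):
    """Rebuild the original list of objects from the packed form."""
    return [dict(zip(packed["columns"], row)) for row in packed["rows"]]

records = [
    {"id": 119736, "statecode": "FL", "county": "CLAY COUNTY"},
    {"id": 119737, "statecode": "FL", "county": "CLAY COUNTY"},
]
packed = pack(records)
assert unpack(packed) == records
```

Note this assumes every record has the same keys; ragged records would need
a null-filling step, which is part of why it's a compromise rather than a
free lunch.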

On this last point, the TSV (Tab Separated Values) option is also one that
I like. It makes for a relatively efficient text format which compresses as
well as any other text. My beloved D3.js has a whole suite of routines to
convert incoming TSV, CSV, etc. into an array of JSON objects on-the-fly.
Why? Because network connections are often poor and you never know what
they're like. Also, network download time tends to be roughly proportional
to download size: twice as much data takes pretty much twice as long to
download. Once it's in RAM on the client side, unpacking it is fast.
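The client-side unpacking step is trivial. Here's what the D3-style
TSV-to-objects conversion looks like, sketched in Python with the standard
csv module (d3.tsvParse does the equivalent in JavaScript):

```python
import csv
import io

def tsv_to_objects(text):
    """Parse TSV text (header row first) into a list of dicts,
    the way d3.tsvParse turns rows into objects on the client."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]

tsv = "id\tstatecode\tcounty\n119736\tFL\tCLAY COUNTY\n"
objects = tsv_to_objects(tsv)
```

One caveat, same as with d3.tsvParse: everything comes back as a string, so
numeric columns need an explicit conversion pass afterwards.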

So, that leads to the code I'm after. Has anyone written
serializers/deserializers for 4D's objects to make them more compact? I
appreciate that this adds a layer on top of normal operations and means not
using native object field features. Given the volumes of data I'm likely to
be working with, the standard format just won't cut it so the alternatives
are to push the data out to PostgreSQL or a big cloudy thing. Not
necessarily a bad idea (and one that will eventually come into play), but
it would be nice to stay in 4D with the data for longer.

Any comments appreciated. If I've overlooked something obvious and am asking a
foolish question, well, I'd rather be embarrassed for a bit and get the
info - so feel free to tell me I'm being a bit thick.

Also, please vote for this request:

Support storing object data in a compressed format
http://forums.4d.fr/Post//19672585/1/#19672586

Thanks!
**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:[email protected]
**********************************************************************
