Hi Kashyap,
I have a bit of experience with ActiveRecord and some more with Entity Framework.
As said in the other responses, the big fundamental difference between
the PicoLisp database architecture and ORMs is that in PicoLisp the
application layer and the database layer are the same layer; there are
not two very different conceptual systems (OOP vs. relational) with a
"translation system" in between.
This results in several benefits when developing applications using
PicoLisp database:
* higher productivity
o in PicoLisp, the database "table definition" and the application
class are the same thing. In other stacks you have to do the
"object definition" multiple times: in the database schema, in an
application class, and usually you additionally define a "business
model" object (or even several). Every time it is basically the
same "grouping of some properties/columns" with only small (but
substantial) deviations.
Even when these frameworks generate some or all of this extra
code for you, in practice it is often necessary to look at this
code and adjust it so it works as intended, especially when making
changes to an existing application.
* less code, less space for bugs
o It is easier to analyze and debug what is happening with the
data between input/storage and storage/output without having to
track the values being copied from one intermediate (layer
transition) object to the next.
* simpler mental model
o as fewer screws and mechanics are present in the PicoLisp stack,
it is easier to understand and imagine in your head what happens,
and easier to anticipate which potential effects you need to
take care of.
o in mainstream stacks, the conceptual and technical barrier
between OOP application programming and relational SQL database
programming is often very pronounced. In my experience,
application programmers these days often have a very incomplete
understanding of how a database works (e.g. they don't know
about transactions). In the best case this results in
insufficient usage of the DBMS (e.g. amateurishly re-implementing
functionality in the application which would be available
from the DBMS in a tested, optimized and reliable way); in the
worst case it results in widespread disregard of fundamental
data integrity vulnerabilities, which depending on the
application might cause anything from minor annoyances up to
complete and utter failure of the whole thing (e.g. losing vital
data, which might bankrupt a company).
* queries and application cache
o As you saw, it is possible to write very involved and powerful
queries on the application level in PicoLisp.
These queries make use of the database data cached on the
"application layer", meaning the database is only asked for
records which were not already accessed in the same session.
o Most (maybe all?) ORMs cannot take advantage of the records they
have already cached, because the ORM has only a very
limited implementation of SQL: the query (e.g. written with
LINQ on the application level) gets translated to SQL and sent
to the database to be executed there, and then the results are
sent back to the application - even data which is already
cached on the application layer is sent back and forth.
+ e.g. Entity Framework only uses its own cache when directly
fetching a record by primary key (id),
all other queries are executed on the database as Entity
Framework has no query engine.
+ this hurts performance even more when the database is
located on a different server than the application, as is
often the case.
PicoLisp database is not relational:
* there is no "primary key" in PicoLisp
o while you might often have a (rel id (+Key +Number)) to have a
simple index to find all records of the same entity, such a
property is completely optional
o a record only needs to be in at least one index or be linked
to from another record, so that it can be found and is not seen as
garbage (as in garbage collection: an object to which nothing points)
* every object has a globally unique identifier
o global per database
o this is the name of the external symbol
+ which encodes the physical location in the database file
where the record is stored
o this makes it possible to directly link a record to another
(also many to many)
+ in relational databases this requires a foreign key (which
is an index, roughly comparable to +Ref)
+ in relational databases, many-to-many relations require an
interim table
# this interim table is usually hidden by the ORM
* lazy loading is possible because every record has a globally
unique identifier
o in relational databases, the "link" is storing the value of the
primary key of another record
o retrieving the complete record in a relational database means:
+ 1) find the corresponding entry in the specific primary key
index
+ 2) from this the database retrieves the pointer to the
physical location in the database file
+ 3) read the record from the database file
o retrieving the complete record in PicoLisp database means:
+ 1) read the record from the database file
o scenario: you have a number of interlinked records (as is
natural in OOP), and you want to have them available on the
application layer
+ ORM:
# 1) get a list of records from the database (or a single
record)
# 2) create for each of them an application object,
copying all properties/fields from the database record
to the application object
# 3) when the property/field is a foreign key, retrieve
the corresponding record from the database, create
another application object and link it (by memory
pointer) to the other related application object(s)
# 4) repeat 2) and 3) for every such indirectly retrieved
record because it was referred to (by foreign key) from
the previous processed record
# this is eager loading
* loading everything reachable from the "starting
point" records returned by the query
# hope you have no indirect circular relations - direct
circular relations might be handled by the ORM, but
indirect ones often aren't - resulting in endless recursion
until you get a stack overflow within the ORM code
* I just recently managed to do this by accident with
an Entity Framework application
# obviously it often leads to "over"-loading:
* retrieving more records from the database than
actually required/used for the current operation in
the application
* creating all those extra application objects and
doing all these copying of property/field values
# ORMs do allow customizing this "automatic default loading"
* extra work, extra source of bugs, ...
* when looking at an application you haven't written
yourself, you have to dive into this code to know
what is loaded, when and how.
+ PicoLisp database:
# 1) get a list of records from the database (or a single
record)
# 2) when the property/field is a +Link (or +Joint), we do
nothing special with it - the value just happens to be
the name of an external symbol (or multiple)
# only when the related record is accessed in the
application is it retrieved from the database
* e.g. accessed via (get This address street name)
# this is lazy loading
* only load when we actually use it
# circular relations are no problem, no matter how
indirect/nested
* on first access the record is loaded
* on further access the already cached object is re-used
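The lazy-loading behaviour described in this scenario can be sketched in Python. This is a toy model, not PicoLisp's actual implementation; `DB`, `LazyRecord` and the `"ref:<id>"` convention are invented for illustration (the `ref:` strings play the role of external symbol names). A record is read from the "database" only on first attribute access, links are followed lazily, and an identity map makes circular relations harmless:

```python
# Toy model of lazy loading via globally unique identifiers.
# DB plays the on-disk database; "ref:<id>" values play external symbols.
DB = {
    1: {"name": "Alice", "address": "ref:2"},
    2: {"street": "Main St", "resident": "ref:1"},  # circular link back
}

class LazyRecord:
    _cache = {}  # identity map: every id is materialized at most once

    def __new__(cls, oid):
        if oid not in cls._cache:
            obj = super().__new__(cls)
            obj._oid = oid
            obj._data = None          # nothing read from "disk" yet
            cls._cache[oid] = obj
        return cls._cache[oid]

    def __getattr__(self, field):
        if self._data is None:        # first access: load the record
            self._data = DB[self._oid]
        value = self._data[field]
        if isinstance(value, str) and value.startswith("ref:"):
            return LazyRecord(int(value[4:]))  # follow the link lazily
        return value
```

`LazyRecord(1).address.resident` follows the circular link and arrives back at the very same cached object, so no infinite recursion can occur.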
PicoLisp database is a multi-paradigm database system:
* direct support for OOP
o application objects can easily be persisted, without copying
from/to another structure
o inheritance and polymorphism can be applied to "tables" (as
Entities are real OOP classes)
* each record can have its own specific schema
o same as "NoSQL" document databases (e.g. MongoDB)
o maintenance nightmare when used in typical business
applications, but extremely useful for e.g. archive systems (old
records may keep their old schema)
* records can be directly interlinked, and these links can be
traversed easily and efficiently
o same as in graph databases
* no limits on field (property) size or record size
o well, there are (high) limits eventually, but if you hit them,
you most likely did something very badly wrong in your design
o text/strings can be of arbitrary length
o numbers can be of arbitrary length
o so some rather typical issues of applications using SQL databases
simply don't exist
+ e.g. my name/street/city/comment/whatever is 21 characters
long but the database only allows 15
+ e.g. we have to change the primary key from an int to a
long, because we have so many records we reached the size
limit of the int....
# yep, this happens.
+ this feature is even more advantageous than one might think
at first; it is really very useful in practice
# typical applications require various changes to the
application and the database whenever such a limit has
to be increased
# and all the ORM-generated code usually has to be adapted too
o relational databases have limits on the maximum possible size of
an individual record
+ practically this means: the number of columns in a table is
limited, as the combined maximum size of all columns must
not exceed the record size limit
+ no such limit in PicoLisp (as long as the symbol fits into
memory and you can still come up with names for all these
properties...)
* support for various indexing strategies:
o unique index
+ +Key
o non-unique index
+ +Ref
o various text index strategies
+ +Idx
+ +IdxFold
+ +Fold +Idx
+ +Sn (soundex, find by phonetic similarity)
o index based on multiple properties (columns)
+ +Aux
o spatial index
+ +UB
+ for geospatial or other data with multiple numeric dimensions
o and more...
+ e.g. creating your own custom indexing mechanism is quite
possible in PicoLisp
# not so much in usual relational database systems
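To give a feeling for what a phonetic index like +Sn does, here is a minimal sketch of the classic Soundex encoding in Python. This is the textbook algorithm for illustration only; PicoLisp's +Sn uses its own soundex-style variant:

```python
def soundex(name: str) -> str:
    """Classic 4-character Soundex code, grouping similar-sounding
    consonants, e.g. "Robert" and "Rupert" both encode to "R163"."""
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit
    name = name.lower()
    out = []
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        if ch in "hw":
            continue                  # h and w are transparent
        code = codes.get(ch, "")      # vowels get "" and break runs
        if code and code != prev:     # collapse adjacent duplicates
            out.append(code)
        prev = code
    return (name[0].upper() + "".join(out) + "000")[:4]
```

A phonetic index stores this code instead of (or next to) the literal spelling, so a search for "Rupert" also finds "Robert".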
* +Swap for lazy loading fields/properties
o in relational databases, you do this with a separate table and a
1-to-1 relation
+ an ORM often doesn't know better than to do eager loading,
so when using an ORM this method usually needs additional
tweaking of the ORM
+ 1-to-1 relations are often vehemently discouraged in SQL
teachings, but in practice often a meaningful method to
improve performance by preventing "loading too much"
# it violates the beauty of the relational model
# fun fact: the founders of the relational model (the
theory applied in "relational" databases) argue that the
current implementations do not adhere to the relational
model
* so these "relational" databases are not even really
relational in pure theory
* full ACID <https://en.wikipedia.org/wiki/ACID> support
o NoSQL databases usually lack it, basically shoveling the
responsibility for data integrity onto the application layer
+ the application programmers are usually not aware of this
+ more than one company went bankrupt because of this
o mainstream relational databases have full ACID support
+ often the default transaction guarantees are rather weak, as
they are optimized for performance benchmarks
# but strong guarantees can be easily activated, if you
know the right magic words
* the right magic words differ between different
database management systems, and the words
themselves are often somewhat misleading
+ support for transactions in ORMs is usually rather weak
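The value of atomic transactions is easy to demonstrate with any ACID-compliant database; here is a minimal sketch using Python's built-in SQLite. A money transfer "crashes" halfway through, and atomicity guarantees the half-done change is rolled back rather than silently corrupting the data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 0)])
con.commit()

try:
    with con:  # one atomic transaction: commit on success, rollback on error
        con.execute("UPDATE account SET balance = balance - 50 WHERE id = 1")
        # ... "crash" before the second leg of the transfer:
        raise RuntimeError("power failure")
except RuntimeError:
    pass

balances = dict(con.execute("SELECT id, balance FROM account"))
# the debit was rolled back, so no money disappeared: {1: 100, 2: 0}
```

Without the transaction, account 1 would have lost 50 with nobody receiving it, which is exactly the kind of integrity bug an application-level re-implementation tends to have.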
* the block size can be configured at a fine granularity
o see (dbs)
o most database systems store data in evenly sized slots in the
database file
+ filesystems (which are a kind of database system) work the
same (e.g. sectors on a harddrive)
+ because this allows quickly jumping to a specific location
within the database file, e.g. the Nth entry = N multiplied
by the blocksize (same as pointer arithmetic)
+ in contrast to a text file in which the lines have varying
lengths separated by a linebreak character
# so to get to the Nth line, you have to go from the start
through every character one by one, and count the linebreaks
o so a database record typically occupies at minimum one such block
+ or two, or many, but it can't occupy only half a block
+ unused space in a block is overhead (e.g. many filesystems
allow you to see the "real file size" vs. "size on disk",
where "size on disk" is always equal to or bigger than the
real size, rounded up to the filesystem blocksize)
o so you want the blocksize to fit the typical (median) size of
your database records
+ of course this is specific to every application
o in most database systems, you can have multiple database files
(or "partitions"), each with a fixed blocksize, and tables get
assigned to those files/partitions
+ same in PicoLisp database
o in most database systems, this blocksize is something to
configure in the database software, by an expert for that
database software, not by the typical programmer
+ so usually this is actually not adjusted to the use case,
but the database software defaults are used
o most database systems start with blocksizes of 4kB or higher
+ afaik because this is usually the page-size of the
filesystem and/or the OS caching of files
o a table in a relational database is kinda one big array
o in PicoLisp database the blocksize can be much smaller, as data
is not stored in fixed arrays but in external symbols (random
access)
o in relational databases, a table usually has to be stored
completely in one database file, as a consequence there is only
one blocksize for the whole table
o in PicoLisp, every record could be (created) in another database
file
+ moving a record later to another database file is not
feasible, because this would change the name of its external
symbol, so all references to that record would need to be
changed.
# if a re-adjustment of the blocksize has to be done on a
PicoLisp database, the usual way is to export and
freshly import the data
o so if the blocksize is properly defined in *Dbs for every
Entity, index and +Swap, then:
+ the PicoLisp database might have less storage overhead due
to padding in ill-fitting blocks (in the database file) than
a comparable relational database
+ (automatic) caching by the OS / filesystem might be more
efficient (as more records fit into the cache)
+ though this effect is highly dependent on the size and usage
patterns of the data (so all in all this point might be
rather academic)
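The block arithmetic behind all of the above can be sketched in Python (BLOCKSIZE is an arbitrary example value, not a PicoLisp default): the offset of the Nth slot is plain multiplication, and a record always occupies whole blocks, so the padded tail of the last block is the storage overhead:

```python
BLOCKSIZE = 64  # hypothetical block size in bytes (application-specific)

def block_offset(n: int) -> int:
    """Byte offset of the Nth fixed-size slot: plain pointer arithmetic."""
    return n * BLOCKSIZE

def size_on_disk(record_bytes: int) -> int:
    """A record occupies whole blocks; the padded tail is overhead."""
    blocks = -(-record_bytes // BLOCKSIZE)  # ceiling division
    return blocks * BLOCKSIZE
```

A 100-byte record in 64-byte blocks occupies 128 bytes on disk, 28 of which are padding; a larger blocksize would waste more per record, a smaller one would force typical records to span more blocks.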
* Support for Blobs (files, or other arbitrary binary data)
o PicoLisp actually stores Blobs not in the database files, but as
separate files, referenced from the database
o this is usually much more efficient than pressing such Blobs
into the database blocksize
o such files can easily (and usually efficiently) be accessed
o but such files are not cached in the application/database cache
+ this can be an advantage or a disadvantage, depending on
the use case
+ typically Blobs are used to store files which often are not
much processed by the application, mostly just imported (or
generated) and exported to the user
# then this mechanic for storing Blobs is clearly an advantage
+ in other scenarios such Blobs might be Bit-Tables used in
the application, which would better be cached
# though in PicoLisp such Bit-Tables could be stored as
+Number, no need to use +Blob for this
While PicoLisp database is well-suited for business applications, it is
less ideal for storing huge amounts of strictly uniformly structured
data (e.g. raw data for machine learning).
Here it has two disadvantages compared to relational databases:
* PicoLisp stores the property names for every non-NIL property on a
record
o so this is a small overhead per record and property, which might
accumulate to unfavorable levels when storing huge amounts of
similar data (similar to JSON, XML, ...)
o relational databases store data as arrays, so the field names
are not stored on every record (similar to CSV)
o this is why Alex uses very short property names (e.g. "nm"
instead of "name")
+ readability of the code is a habit thing
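The per-record overhead of stored property names is easy to make visible with Python's standard library: serializing the same 1000 records with per-record field names (JSON, similar to PicoLisp's stored properties) versus a one-line header (CSV, similar to a relational table's column definitions):

```python
import csv
import io
import json

# the same 1000 records, serialized two ways
rows = [{"name": f"user{i}", "city": "Berlin"} for i in range(1000)]

# field names repeated per record (like JSON, or stored properties):
json_size = len(json.dumps(rows))

# field names stored once in a header line (like CSV, or a table):
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "city"])
writer.writeheader()
writer.writerows(rows)
csv_size = len(buf.getvalue())
# json_size is noticeably larger, purely due to the repeated key names
```

Shorter property names shrink exactly this repeated-per-record part, which is the rationale behind names like "nm".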
* External symbols are good for random access, not so good for bulk
read/export of data
o this point is also valid for bulk insert, to a degree
o a relational database stores data similar to an array - in
contiguous blocks (per table)
o PicoLisp database stores data in external symbols
+ external symbols stored next to each other might be
completely unrelated (except same blocksize)
o so when a database query happens to request the data exactly in
the order as it is physically stored in the relational database
(usually by primary key, Microsoft calls this the "clustered"
key), then the relational database can very efficiently just
dump the whole part of the database file to the client, without
having to do any (or much) processing of that data.
+ though using an ORM probably ruins this advantage, as the
ORM is processing each record individually and copying the
data into application objects
I hope I could give you a good insight into how the PicoLisp database
compares to mainstream database stacks. In the end you have to decide
what best fits your requirements and the nature of the application/data.
I've now worked for several years in parallel on business applications
implemented in C#/MVC .NET/MSSQL and in PicoLisp, and I observe much
higher productivity and maintainability (flexibility, ease of making
changes to an existing application) and better performance (for the
application user) in the PicoLisp stack - but of course it also depends
on the actual software architecture and database design used.
And of course PicoLisp programming requires capable developers who are
interested in digging into the stuff, while the popularity and the "hand
holding" features/restrictions of those mainstream enterprise stacks
make a kind of minimum productivity possible even when no capable
developer is available, yadda-yadda...
Feedback and corrections are very welcome.
Kind regards,
beneroth
On 28.11.19 18:06, C K Kashyap wrote:
> Hi Alex,
> There is a plethora of ORM systems such as ActiveRecords (in
> Ruby/Rails) or Microsoft EntityFramework and similar solutions in
> other languages where Objects are mapped to SQL DB records.
>
> I'd love to know your thoughts about how PicoLisp's approach is
> similar/different from them.
>
> Regards,
> Kashyap