Reply to copyright concerns for libzdb in Seafile

Jonathan Xu Mon, 03 Jun 2019 07:15:59 -0700

Hi all,

I'm Jonathan Xu, main developer of Seafile. I personally wrote the replacement 
code to libzdb.


Sorry for not replying to the original thread since I'm late to the discussion 
and was not in the mailing list before. The original discussion can be found 
here: https://lists.debian.org/debian-legal/2019/05/msg00002.html

I'll reply to the concerns and evidence below.

Concern:
------------

We later discovered that the code that replaced libzdb is mostly a copy of 
libzdb's code and structures. This stand in contrast to the statement “we 
completely replaced libzdb with our own code.”  [4]

Libzdb is licensed under GPLv3. Copying and modifying GPL code is perfectly 
fine as long as the original copyright notice and license are kept. 
Unfortunately, this is not what the Seafile team did. Instead they copied code 
from libzdb, removed the copyright notice, claimed the code as their own and 
re-license it under another license. 
Evidence: 
-------------

To do a side by side comparison I’m going to use Seafile’s version of libzdb 
which they forked on GitHub [6] at version 2.11.1 and based their new code on 
and claimed as their own [5]. The comparison is going to be against our same 
version on Bitbucket. 

Libzdb is a database connection pool library which consists of 4 major 
components: ConnectionPool, Connection, ResultSet and PreparedStatement. 
Forward declared data structures are used to abstract the concrete database 
implementation. These components together with their method association are 
quite unique for a _C_ database connection pool library as far as I know. The 
Seafile "rewrite" uses the exact same components and method association between 
components with modest renaming. Mostly by going from camel-case to snake-case.

**** Seafile's reply

When we rewrote the database access layer, we did learn some ideas from libzdb. 
But we did a lot of redesign of the interface and architecture, to make it 
better suit our needs and make it easier for us to understand.

To be honest I didn't fully understand the code of libzdb at the time of 
writing our own access layer. The code libzdb uses some unusual coding 
techniques in C languaage. For example it implements its own Exception to 
handle errors. And they widely use object-oriented design patterns like 
delegation paradigms in the code.

So I admit that I borrow some ideas from libzdb, but all the code are written 
by myself so that I can understand it and maintain it in the future. It would 
be hard to believe someone can copy some code that he/she doesn't fully 
understand, make modifications and make it work flawlessly.

I also want to say that the interfaces of libzdb are not so unique, as they're 
very natural and common choices to provide a "middle ground" for all the APIs 
of different databases. Connection pools, connection, prepared statement, 
result set, are all basic concepts in DBMS APIs. What libzdb and us do are just 
to implement some "wrapper" to these APIs, to hide the differences among DBMS.

Let's first compare the interfaces of our db access layer and libzdb's. The 
source code are referenced below.

- Seafile: 
https://github.com/haiwen/seafile-server/blob/master/common/db-wrapper/db-wrapper.h
- libzdb: 
https://github.com/haiwen/libzdb/blob/master/src/db/PreparedStatement.h , 
https://github.com/haiwen/libzdb/blob/master/src/db/Connection.h , 
https://github.com/haiwen/libzdb/blob/master/src/db/ResultSet.h , 
https://github.com/haiwen/libzdb/blob/master/src/db/ConnectionPool.h

We'll go into details of comparing each interfaces and their implementation 
later. But as you can find by taking a quick look at the source code, you'll 
find that the interfaces are already very different except from the conceptual 
level.

First of all, since we don't use Exceptions to report errors, we explicitly 
include a GError **error parameter in each function. E.g. 

ResultSet *
db_connection_execute_query (DBConnection *conn, const char *sql, GError 
**error);

Secondly, the data structures we define doesn't use "delegate", compared to 
libzdb's.

--- Seafile
struct DBStmt {
 ResultSet *result_set;
};

--- Libzdb
#define T PreparedStatement_T
struct PreparedStatement_S {
 Pop_T op;
 ResultSet_T resultSet;
 PreparedStatementDelegate_T D;
};

Our DBStmt structure is defined to be the "parent" of all concrete DB statement 
structures, like 

typedef struct MySQLDBStmt {
 DBStmt parent;
 int param_count;
 MYSQL_STMT *stmt;
 MYSQL_BIND *bind;
} MySQLDBStmt;

Compared to libzdb's

#define T PreparedStatementDelegate_T
struct T {
 int maxRows;
 int lastError; 
 int paramCount;
 param_t params;
 MYSQL_STMT *stmt;
 MYSQL_BIND *bind;
};

I don't understand how the delegate works here :-(

You may find some similarity in the fields of both structures. But I think 
they're reasonable similarity due to the fact that we both need to call the 
same set of APIs provided by the databases.

Below I'll reply inline to the evidence provided by libzdb authors.

I’m going to limit the comparison somewhat for brevity, but it should be enough 
to demonstrate copyright concern. The full comparison can be done by comparing 
[5] and [7]. 


1. The Connection Pool has two significant methods:

- Get a connection from the pool

a:libzdb: 
https://bitbucket.org/tildeslash/libzdb/src/2958e023fcee44f313e6d3f3592b02cc06783e0f/src/db/ConnectionPool.c#lines-314

a:seafile: 
https://github.com/haiwen/seafile-server/blob/9f30eedc467bf5938ff57e24cee3a5b473e72314/common/db-wrapper/db-wrapper.c#L180

- And return a connection to the pool

b:libzdb: 
https://bitbucket.org/tildeslash/libzdb/src/2958e023fcee44f313e6d3f3592b02cc06783e0f/src/db/ConnectionPool.c#lines-345

b:seafile: 
https://github.com/haiwen/seafile-server/blob/9f30eedc467bf5938ff57e24cee3a5b473e72314/common/db-wrapper/db-wrapper.c#L220

Apart from Seafile using glib array and libzdb using its own vector module the 
above demonstrate copy of code with the same logic, method and variable names. 
Libzdb’s Connection_setAvailable is equal to their conn->is_available = FALSE; 
And our LOCK macro is just pthread_mutex_lock. I.e. the same code and logic, 
just expanded and inlined. 


***** Seafile's reply

Connection pools are very common programming constructs seen in many 
programming languages. The implementations are more or less the same. They all 
use some data structure to store the connections, and having a lock. It's 
unnecessary to use another data structure (say hash table) just to be different.

And if you compare the interfaces of connection pool from libzdb and Seafile, 
you'll find the interfaces are quite different.

We have functions like 'db_conn_pool_new_mysql' and much less other functions. 
While libzdb has more functions.

- libzdb: https://github.com/haiwen/libzdb/blob/master/src/db/ConnectionPool.h
- Seafile: 
https://github.com/haiwen/seafile-server/blob/master/common/db-wrapper/db-wrapper.h

What we have in Seafile:

- db_conn_pool_new_mysql
- db_conn_pool_new_pgsql
- db_conn_pool_new_sqlite

While there are many more functions in libzdb, they don't have the above 3 
functions.

2. Connection

In libzdb a Connection has three significant  methods, Connection_execute, 
Connection_executeQuery and Connection_prepareStatement. Seafile has the same 
methods implemented in the same way

a:libzdb: 
https://bitbucket.org/tildeslash/libzdb/src/2958e023fcee44f313e6d3f3592b02cc06783e0f/src/db/Connection.c#lines-308

a:seafile: 
https://github.com/haiwen/seafile-server/blob/9f30eedc467bf5938ff57e24cee3a5b473e72314/common/db-wrapper/db-wrapper.c#L258

b:libzdb: 
https://bitbucket.org/tildeslash/libzdb/src/2958e023fcee44f313e6d3f3592b02cc06783e0f/src/db/Connection.c#lines-323

b:seafile: 
https://github.com/haiwen/seafile-server/blob/9f30eedc467bf5938ff57e24cee3a5b473e72314/common/db-wrapper/db-wrapper.c#L350

Seafile has not copied all methods from libzdb’s Connection, but 
Connection_ping is is there as well as Connection_beginTransaction, 
Connection_rollback and Connection_commit

c:libzdb: 
https://bitbucket.org/tildeslash/libzdb/src/2958e023fcee44f313e6d3f3592b02cc06783e0f/src/db/Connection.c#lines-228

c:seafile: 
https://github.com/haiwen/seafile-server/blob/9f30eedc467bf5938ff57e24cee3a5b473e72314/common/db-wrapper/db-wrapper.c#L241


**** Seafile's reply

As I said in the beginning, connection, execute sql, prepare statement, ping 
are just basic concepts borrowed from DBMS client libraries like MySQL. If you 
look at MySQL's C API manual, you'll find all of them. They're natural choice 
for the function names.
What is special about our transaction code in libzdb is that we keep a counter 
called “isInTransaction” which Seafile has as “in_transaction”. 

d:libzdb: 
https://bitbucket.org/tildeslash/libzdb/src/2958e023fcee44f313e6d3f3592b02cc06783e0f/src/db/Connection.c#lines-252

d:seafile: 
https://github.com/haiwen/seafile-server/blob/9f30eedc467bf5938ff57e24cee3a5b473e72314/common/db-wrapper/db-wrapper.c#L424


**** Seafile's reply

We also need this field since we have to know whether to rollback a transaction 
when we close a connection. It seem to me there are no other good choice to 
implement the same goal.
3. ResultSet and PreparedStatment are also clearly copied from libzdb. We see 
ResultSet_next, ResultSet_getString, ResultSet_getInt etc and 
PreparedStatement_setString, PreparedStatement_setInt etc. Also 
PreparedStatement_executeQuery is faithfully copied:

a:libzdb: 
https://bitbucket.org/tildeslash/libzdb/src/2958e023fcee44f313e6d3f3592b02cc06783e0f/src/db/PreparedStatement.c#lines-122

a:seafile: 
https://github.com/haiwen/seafile-server/blob/9f30eedc467bf5938ff57e24cee3a5b473e72314/common/db-wrapper/db-wrapper.c#L392


**** Seafile's reply

Again the concepts of prepared statement and result set are from MySQL, 
PostgreSQL etc.

About the code similarity, I think it arises from the fact that there are only 
very few ways to implement a db wrapper library. We all need a function pointer 
table to abstract the details of various DBMS. And given the similarity of the 
concepts of different DBMS, we all need to define some functions: execute, 
query, prepare_statement, etc.
4. Concrete Database implementations. 

When it comes to the concrete database implementation for SQLite, MySQL and 
PostgreSQL the same copy of code is repeated. For example, MysqlResultSet_new.

a:libzdb: 
https://bitbucket.org/tildeslash/libzdb/src/2958e023fcee44f313e6d3f3592b02cc06783e0f/src/db/mysql/MysqlResultSet.c#lines-102

a:seafile: 
https://github.com/haiwen/seafile-server/blob/9f30eedc467bf5938ff57e24cee3a5b473e72314/common/db-wrapper/mysql-db-ops.c#L189

and the special way we ensure column field capacity in MySQL where they very 
telling even has copied our comment: 

b:libzdb: 
https://bitbucket.org/tildeslash/libzdb/src/2958e023fcee44f313e6d3f3592b02cc06783e0f/src/db/mysql/MysqlResultSet.c#lines-84

b:seafile: 
https://github.com/haiwen/seafile-server/blob/9f30eedc467bf5938ff57e24cee3a5b473e72314/common/db-wrapper/mysql-db-ops.c#L277

**** Seafile's reply

The code similarity is demanded by the MySQL client API. If we don't construct 
the data structures in this way, we cannot correctly call the MySQL library.

Summary:
==================
The code similarity is largely due to below facts:
1. Various DBMS have some similar interface concepts. So it's natural for 
anyone who wants to implement db wrapper libraries to abstract the interfaces 
this way. And we made our own changes to the interface and architecture. It's 
not a direct copy of the interfaces and architecture. (BTW, in my humble 
opinion, interfaces and architecture is not covered by GPL license.)
2. Another factor is that in C language there are only limited ways to 
architect the code of db wrapper libraries.
3. Many code are required to be written as it is since we have to conform to 
DBMS's client API.

We're happy to attribute to the ideas we learn from libzdb. And we'll try to 
change the code that look too similar to libzdb's.

--
Regards
Jonathan Xu

Reply to copyright concerns for libzdb in Seafile

Reply via email to