[Dbix-class] Database: Slave or Master - call for paper review

Thu Sep 28 10:45:27 CEST 2006

At 6:12 PM +1200 9/28/06, Sam Vilain wrote:
>Hi all,
>
>I've recently written a paper for OSDC.com.au, and I was wondering if
>anyone here would like to provide comments about it.  Amongst other
>things it lays some groundwork for sensible translation of DDL to the
>Perl 6 metamodel and back.

I assume you refer to the conference happening mid-December of this year?

In any event, this whole topic is closely related to what I am doing, 
which includes eliminating any object/relational impedance mismatch 
at the level of the fundamentals.  I will give you what feedback I 
can.

>The Perl 6 fragments I'm hoping to get reviewed on the perl6-language
>list, but the rest of it is for general review.
>
>There is a PDF at http://utsl.gen.nz/talks/dbic-tangram.pdf, and if
>anyone wants to look at the source, that's at
>git://utsl.gen.nz/dbic-tangram-paper

Note that I noticed at least one grammatical error in the PDF, but I 
will not mention those here since you'll probably catch them yourself 
on another read-through.  Also, I won't comment on the Perl 6 code 
examples that you also posted to perl6-language, since p6l is doing 
that.

1.  The paper in general looks very well done, and I wouldn't 
recommend any major changes to it.  You raise a lot of good points 
and ideas that afaik are not part of the status quo, and so 
implementing what you suggest will certainly improve our lot.  Well 
done!

2.  Some of your Perl 6 syntax is already out of date.  Perl 6 no 
longer has a .id, but rather a .WHICH, afaik.

3.  While you can use SQL or its terminology for illustrative 
examples, you should never word your proposal as if it is 
SQL-specific; that would be limiting yourself too much.  There are 
many useful database implementations that don't use SQL as their 
query language, and even among those that do use SQL, each one uses 
it differently, so it's not as if SQL were a single language.

Towards that end, here are some suggested rewordings.

Replace:

   Abstract

   finally expressions are related to SQL fragments and iterations 
over expressions related to SQL queries.

With:

   Abstract

   finally Perl expressions are related to definitions of database 
queries and iterations over expressions related to executions of 
those queries.

Replace:

   1. Introduction

   This schism is what we are trying to eliminate, by mapping the 
entire Perl metamodel to some representation of the SQL Data 
Definition Language (DDL), and the entire variety of the DDL to 
corresponding Perl 6 metamodel components.

With:

   1. Introduction

   This schism is what we are trying to eliminate, by mapping the 
entire Perl metamodel to some representation of a Data Definition 
Language (DDL), and the entire variety of the DDL to corresponding 
Perl 6 metamodel components.

4.  Regarding this section:

   2.1 Object Identity as Primary Key

   One could say that instead of using a surrogate row ID, you could 
use the entire object as the 'primary key' - this would certainly sit 
better with Set Theory, and get rid of that 'duplicate row problem' 
that drives the theorists crazy.

   However, much as requiring all tables without primary keys to have 
a big primary key for the entire row would be insane, defining .id so 
that it would include all the properties of the object has side 
effects.

I would say that there is nothing insane about a primary key being 
over all of the columns in a table, if there is no subset of those 
columns that can by themselves uniquely identify an object.

The primary key / object id should correspond to the simplest or 
smallest collection of its attributes that are guaranteed to be 
unique per object, but it may be that with some objects, all of the 
attributes must be included.

But that says more about the object class definition itself than the database.

In any event, if we actually want to store the same object definition 
more than once in a database, because the number of such identical 
definitions carries some useful information, then the better approach 
is to store it just once and add an extra database column holding the 
quantity.  Given this simple alternative, there is never any valid 
reason to store actual duplicate rows in a database, so set theorists 
and everyone else can both be happy.
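
As a minimal sketch of that alternative (written in Python with the 
standard sqlite3 module purely for illustration; the "widget" table, 
its columns, and the add_widget helper are hypothetical names, not 
anything from the paper):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The key spans every descriptive column; the quantity column records
# how many identical items exist, so no duplicate rows are needed.
conn.execute("""
    CREATE TABLE widget (
        colour   TEXT    NOT NULL,
        size     TEXT    NOT NULL,
        quantity INTEGER NOT NULL DEFAULT 1,
        PRIMARY KEY (colour, size)
    )
""")

def add_widget(conn, colour, size):
    """Record one more widget: bump the count if the row exists, else insert it."""
    cur = conn.execute(
        "UPDATE widget SET quantity = quantity + 1 "
        "WHERE colour = ? AND size = ?",
        (colour, size),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO widget (colour, size) VALUES (?, ?)",
            (colour, size),
        )

for item in [("red", "small"), ("red", "small"), ("blue", "large")]:
    add_widget(conn, *item)

rows = conn.execute(
    "SELECT colour, size, quantity FROM widget ORDER BY colour"
).fetchall()
```

The two identical "red, small" widgets end up as one row with a 
quantity of 2 rather than as two indistinguishable rows.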

(I would also say that there is no reason to have a "primary" key at 
all, since any uniquely identifying set of columns is as good as any 
other, so we simply have one or more keys, and any of them can be 
used to reference the record.  Though if setting a key aside as 
"primary" makes something easier to do, it isn't bad.)
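
A rough sketch of a table with two equally good keys and no "primary" 
one (again Python with sqlite3 as an illustration only; the "country" 
table and its columns are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two candidate keys, neither marked as primary: each column is
# unique on its own, so either one identifies a row.
conn.execute("""
    CREATE TABLE country (
        iso_code TEXT NOT NULL UNIQUE,
        name     TEXT NOT NULL UNIQUE
    )
""")
conn.execute("INSERT INTO country VALUES ('NZ', 'New Zealand')")

# The same record can be referenced through either key.
by_code = conn.execute(
    "SELECT name FROM country WHERE iso_code = ?", ("NZ",)
).fetchone()
by_name = conn.execute(
    "SELECT iso_code FROM country WHERE name = ?", ("New Zealand",)
).fetchone()

# Each key is enforced independently of the other.
try:
    conn.execute("INSERT INTO country VALUES ('NZ', 'Aotearoa')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```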

5.  Regarding this section:

   2.2  Attributes as Columns

   Any attributes that do not have a [corresponding native database] 
type defined are fair game for storing using a proprietary storage 
mechanism, similar to that employed by Tangram::Type::Dump::Any. Such 
a schema can not be called 'normalised' and will not be stable.

Please defend that last statement, or rewrite it to clarify what you 
mean, since I interpret it by itself to be flat-out wrong.

I say that it is perfectly valid for a data type to be arbitrarily 
complicated, and that this in no way violates any relational database 
normalization principles.  For example, a data type need not be just 
a number or a string; it could be a date or a geometric shape or a 
picture or an XML document or whatever.  Any of these could 
conceptually be stored in a single field of a table row, either 
directly, in a serialized or database-supported form, or in another 
related table.
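
To illustrate the "serialized form" option, here is a small sketch 
that stores a geometric shape as one value in one field.  JSON is my 
own choice of serialization for the example, and the "drawing" table 
and its columns are hypothetical; this is not a description of how 
Tangram::Type::Dump::Any stores its data.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# The shape column holds one value of a complex type; the row is
# still a single tuple of single-valued attributes.
conn.execute(
    "CREATE TABLE drawing (name TEXT PRIMARY KEY, shape TEXT NOT NULL)"
)

triangle = {"kind": "polygon", "points": [[0, 0], [4, 0], [0, 3]]}

# Serialize the whole value into the one field.
conn.execute(
    "INSERT INTO drawing VALUES (?, ?)",
    ("t1", json.dumps(triangle, sort_keys=True)),
)

# Read it back and rebuild the original value.
(stored,) = conn.execute(
    "SELECT shape FROM drawing WHERE name = ?", ("t1",)
).fetchone()
restored = json.loads(stored)
```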

Regarding stability, the only way something like this can destabilize 
is if the proprietary storage mechanism changes how it works between 
the time the database was first used and now, and no conversion op is 
run on the database.  But then you could run into the same problem if 
you change which database wrapper module you use.

6.  Regarding this section:

   4 Query Expressions

   Query expressions are a difficult issue. Nobody seems to want to 
write SQL for everything, but people still want the full power of SQL 
available to them. Why is it that most database abstractions leave 
wide gaping sections of SQL unable to be generated without cumbersome 
manual SQL fragments?

If things go according to my plan, there will be released in a few 
weeks (prior to the end of October) a working Perl 5 module that 
entirely lacks this problem, "QDRDBMS".  So if that actually comes to 
pass, feel free to update the above statement to say that it 
describes the general case, but that my module is an exception.  But 
regardless, I agree with the statement in the general case.

That's all the feedback for this round.

Keep up the good work!

-- Darren Duncan