[Dbix-class] Database: Slave or Master - call for paper review

Sam Vilain sam at vilain.net
Tue Oct 3 06:24:42 CEST 2006


Darren, replying to individual points now...

Darren Duncan wrote:
> 3.  While you can use SQL or its terminology for illustrative 
> examples, you should never word your proposal as if it is 
> SQL-specific; that would be limiting yourself too much.  There are 
> many useful database implementations that don't use SQL as their 
> query language, and even within those that do use SQL, each one uses 
> it differently, so it's not like one language.
>   

Good point; I'll keep the word SQL around but change the wording.

> 4.  Regarding this section:
>
>    2.1 Object Identity as Primary Key
>
>    One could say that instead of using a surrogate row ID, you could 
> use the entire object as the 'primary key' - this would certainly sit 
> better with Set Theory, and get rid of that 'duplicate row problem' 
> that drives the theorists crazy.
>
>    However, much as requiring all tables without primary keys to have 
> a big primary key for the entire row would be insane, defining .id so 
> that it would include all the properties of the object has side 
> effects.
>
> I would say that there is nothing insane about a primary key being 
> over all of the columns in a table, if there is no subset of those 
> columns that can by themselves uniquely identify an object.
>   

You would say that; you appear to have an in-depth knowledge of
relational theory :).

I mean it would be insane because then the primary key index would be
the same size as the table, unless you made it index-organised and the
RDBMS did the right thing. It's the 'correct' thing to do on one level,
but really not what people expect; it is more likely to indicate an error.
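To make the whole-row key concrete, here's a minimal sketch using SQLite
via Python's stdlib (an illustrative analogue, not anything from the paper
itself). The primary key spans every column, and SQLite's WITHOUT ROWID
stores the table index-organised, so the key doesn't duplicate the data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE point (
        x     INTEGER NOT NULL,
        y     INTEGER NOT NULL,
        label TEXT    NOT NULL,
        PRIMARY KEY (x, y, label)  -- the key spans the entire row
    ) WITHOUT ROWID                -- index-organised: no separate PK index
""")
conn.execute("INSERT INTO point VALUES (1, 2, 'a')")

dup_rejected = False
try:
    # a literal duplicate row now violates the all-columns key
    conn.execute("INSERT INTO point VALUES (1, 2, 'a')")
except sqlite3.IntegrityError:
    dup_rejected = True

print(dup_rejected)  # True: the table behaves as a set, not a bag
```

Without WITHOUT ROWID (or an RDBMS that supports index-organised tables),
that all-columns unique index really would be a second copy of the table.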

I'll change the wording there, too.

> The primary key / object id should correspond to the simplest or 
> smallest collection of its attributes that are guaranteed to be 
> unique per object, but it may be that with some objects, all of the 
> attributes must be included.
>   

In one sense, I agree with what you're saying. However, if this is
happening, it means that people are treating the object like it has some
kind of /je ne sais quoi/ binding it together, and that's what they're
referring to - its "object-ness". Or its tuple-ness, if you prefer. In
a relational database, that's fine - you could never uniquely refer to a
row with no constraints anyway; the 'correct' way to refer to such a row
is with all of its values. Or the rowid, but nobody does that except bad
Oracle DB developers, IME.

Then the distinction between the rows being an index only (set) or just
a table (bag) comes down to whether you care to honour the relational
idea that tuples should represent facts, and a fact stated twice does
not make it doubly true - as you say, there should be an explicit
category for it.

I have a sneaking feeling that object pointers actually point to the
.WHICH value. That would mean you'd invalidate pointers if you changed
those values. But with objects, people won't be used to being unable to
change tuples. Well, except Erlang programmers (Erlang is a language
with no[*] mutable state).

Ok, after a long debate on #perl6, it seems that this is simply Wrong™,
and results in undefined behaviour. i.e. if WHICH were to default to all
of the attributes in an object, then changing the object changes the
WHICH, and either your pointers are invalidated or a kitten dies. If you
set WHICH to return something that embodies the entire value of the
object, you're effectively making a "value type" (or "immutable type"),
like the object that represents "3" or "(1..10)". See S12.

* - Yes, that's not entirely true; there are the per-process maps. But
real Erlang programmers don't use them :).

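The pointer-invalidation problem translates directly into any language
where identity can be derived from attribute values. A hedged Python
analogue (the class and names are mine, not from Perl 6 or the paper):
identity is built over all attributes, so mutating one silently breaks
any "pointer" keyed on the old identity.

```python
class Row:
    """Mutable object whose identity derives from all its attributes,
    roughly analogous to a .WHICH built over the whole attribute set."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def which(self):
        # WHICH-like identity value: the full attribute tuple
        return (self.x, self.y)

    def __hash__(self):
        return hash(self.which())

    def __eq__(self, other):
        return isinstance(other, Row) and self.which() == other.which()

r = Row(1, 2)
index = {r: "stored"}   # a 'pointer' keyed on r's identity
print(r in index)       # True

r.x = 99                # mutate an attribute: the WHICH changes...
print(r in index)       # False: the old pointer is now invalidated
```

The only safe way to use such an identity is to forbid mutation
entirely, which is exactly the "value type" conclusion above.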
> But that says more about the object class definition itself than the database.
>
> In any event, if we actually want to store the same object definition 
> more than once in a database, because the quantity of such identical 
> definitions carries some useful information, then the better approach 
> is to just store it once and add an extra database column to store a 
> quantity number.  Considering this simple alternative, there is never 
> any valid reason to store actual duplicate rows in a database, so set 
> theorists and everyone else both can be happy.
>
> (I would also say that there is no reason to have a "primary" key at 
> all, since any uniquely identifying set of columns is as good as any 
> other, so we simply have one or more "key", and either can be used to 
> reference the record.  Though if setting a key aside as "primary" 
> makes something easier to do, it isn't bad.)
>
> 5.  Regarding this section:
>
>    2.2  Attributes as Columns
>
>    Any attributes that do not have a [corresponding native database] 
> type defined are fair game for storing using a proprietary storage 
> mechanism, similar to that employed by Tangram::Type::Dump::Any. Such 
> a schema can not be called 'normalised' and will not be stable.
>
> Please defend that last statement, or rewrite it to clarify what you 
> mean, since I interpret it by itself to be flat-out wrong.
>   

If your database doesn't have the ability to store arbitrary, custom
column types, then you have two options:

- express the object's type in a shape you can map, and keep mapping it.
- give up and serialise it to a BLOB or TEXT

Actually, the second can be a lot faster, too. And as long as you don't
care about your database ensuring referential integrity, or about being
able to query or inspect the data :-), it's fine.
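A sketch of that second option, again as a Python/SQLite analogue
(Tangram::Type::Dump::Any uses a Perl-specific dump; JSON stands in here).
The serialised attribute round-trips fine, but the database can no longer
see inside it:

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE obj (id INTEGER PRIMARY KEY, payload TEXT)")

# an attribute with no native column type: just serialise the lot
attrs = {"shape": "circle", "radius": 3, "tags": ["red", "dashed"]}
db.execute("INSERT INTO obj (id, payload) VALUES (?, ?)",
           (1, json.dumps(attrs)))

# round-trip works, but the TEXT column is opaque to the database:
# no referential integrity, no querying on 'radius' or 'tags'
(payload,) = db.execute("SELECT payload FROM obj WHERE id = 1").fetchone()
print(json.loads(payload) == attrs)  # True
```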

> I say that it is perfectly valid for a data type to be arbitrarily 
> complicated, and this does in no way violate any relational database 
> normalization principles.

In principle, yes. But can you point to a database that supports this?

>   For example, it is valid for a data type 
> to be not only a number or a string, but it could be a date or a 
> geometric shape or a picture or an XML document or whatever.  Any of 
> these could conceptually be stored in a single field of a table row, 
> either actually in a serialized or database supported form, or in 
> another related table.
>   

I agree.

> 6.  Regarding this section:
>
>    4 Query Expressions
>
>    Query expressions are a difficult issue. Nobody seems to want to 
> write SQL for everything, but people still want the full power of SQL 
> available to them. Why is it that most database abstractions leave 
> wide gaping sections of SQL unable to be generated without cumbersome 
> manual SQL fragments?
>
> If things go according to my plan, there will be released in a few 
> weeks (prior to the end of October) a working Perl 5 module that 
> entirely lacks this problem, "QDRDBMS".  So if that actually comes to 
> pass, feel free to update the above statement to say that it 
> describes the general case, but that my module is an exception.  But 
> regardless, I agree with the statement in the general case.
>
> That's all the feedback for this round.
>   

I've published an update to my paper, mostly based on your feedback, in
the same place. Diffs at http://www.utsl.gen.nz/gitweb/?p=dbic-tangram-paper

Thanks again!

Sam.

> Keep up the good work!
>
> -- Darren Duncan
>
> _______________________________________________
> List: http://lists.rawmode.org/cgi-bin/mailman/listinfo/dbix-class
> Wiki: http://dbix-class.shadowcatsystems.co.uk/
> IRC: irc.perl.org#dbix-class
> SVN: http://dev.catalyst.perl.org/repos/bast/trunk/DBIx-Class/
> Searchable Archive: http://www.mail-archive.com/dbix-class@lists.rawmode.org/
>   
