[Dbix-class] Explicit ASTs (ping nate)
Darren Duncan
darren at DarrenDuncan.net
Sun Sep 3 05:24:55 CEST 2006
At 7:12 PM -0700 9/2/06, Darren Duncan wrote:
>From my experience with working on Set::Relation and the Rosetta DBMS
>(and the now defunct SQL::Routine), if you *really* want to have an
>explicit AST that says exactly what you mean, is expressive enough
>for 99+% of any real-world uses, and is very portable, you
>essentially have to define a whole turing complete language,
>including the basics (which is what I am doing).
As an addenum to my previous message ...
I want to emphasize the importance of explicitly defining not only
the syntax but also the semantics of all built-in operators that your
explicit AST represents, so that there is no troublesome ambiguity in
user's minds, and so that translators of this AST into various SQL
dialects can produce SQL that does exactly what is expected.
For example, you need to explicitly say what you get out of a
division operation taking 2 integers that don't divide evenly, such
as 11/2; eg, does that return an integer of value 5 or a rational of
value 5.5? Different DBMS will implicitly go one way or the other,
and consistent default behaviour is not portable with the plan N / M
syntax.
So you need to explicitly define the bahaviour of your AST's division
operator, or better yet, provide multiple division operators that
have different fully-qualified names (eg: System::Int::div, which
uses integers as input and output vs System::Rat::div which uses
rationals as input and output), so that both behaviours are available
to choose from. Then, depending on which version is chosen and what
the default behaviour of the DBMS is, the SQL generator can either
make a plain N/M or something more complicated that achieves the same
result. Eg, if you want to simulate the Int version on a system that
only natively has a "number" type, you could render Int::div( N, M )
into "floor( floor(N) / floor(M) )".
Of course, the exact syntax and implementation is up to you, but I
hope you get my point.
Likewise, you want to explicitly define what data types your system
has (eg, Boolean, Integer, Rational, String, Table, Row), and what
values each is or is not allowed to hold, and only allow explicit
casts of values from one type to another. One advantage of explicit
casts is that you can include parameters in the casting operator
which lets you say how the mapping is done. For example, the
String-to-Integer operator can include an argument where you say what
numeric base the string is in, eg decimal vs octal vs hex vs binary,
so that something like '123' is turned into the correct number.
Similarly, you will want to explicitly declare what types your
literals are, so that eg "Int(2.0)" and "Rat(2.0)" produce a value of
the data type you want, unlike the ambiguous "2.0", which really
trips people up with some DBMSs, such as SQLite.
Trust me that I have thought about these matters a lot, and if you
want a reliable system, you want to take my suggestions to heart.
-- Darren Duncan
P.S. For those of you who hadn't already heard, I decided a few
months ago that my projects like Set::Relation and Rosetta will be
written in Perl 6 only, for various reasons, and so the only way you
will be able to use them in your Perl 5 programs on a pure Perl 5
system is by installing the excellent v6.pm CPAN module, which is
what I will be doing. So they will still integrate with Perl 5 well
and at a respectable speed (due to Module::Compile-like-mojo that
v6.pm does/can use), but there will be the v6.pm dependency overhead.
More information about the Dbix-class
mailing list