[Dbix-class] Explicit ASTs (ping nate)

Darren Duncan darren at DarrenDuncan.net
Sun Sep 3 05:24:55 CEST 2006


At 7:12 PM -0700 9/2/06, Darren Duncan wrote:
>From my experience with working on Set::Relation and the Rosetta DBMS
>(and the now defunct SQL::Routine), if you *really* want to have an
>explicit AST that says exactly what you mean, is expressive enough
>for 99+% of any real-world uses, and is very portable, you
>essentially have to define a whole turing complete language,
>including the basics (which is what I am doing).

As an addenum to my previous message ...

I want to emphasize the importance of explicitly defining not only 
the syntax but also the semantics of all built-in operators that your 
explicit AST represents, so that there is no troublesome ambiguity in 
user's minds, and so that translators of this AST into various SQL 
dialects can produce SQL that does exactly what is expected.

For example, you need to explicitly say what you get out of a 
division operation taking 2 integers that don't divide evenly, such 
as 11/2; eg, does that return an integer of value 5 or a rational of 
value 5.5?  Different DBMS will implicitly go one way or the other, 
and consistent default behaviour is not portable with the plan N / M 
syntax.

So you need to explicitly define the bahaviour of your AST's division 
operator, or better yet, provide multiple division operators that 
have different fully-qualified names (eg: System::Int::div, which 
uses integers as input and output vs System::Rat::div which uses 
rationals as input and output), so that both behaviours are available 
to choose from.  Then, depending on which version is chosen and what 
the default behaviour of the DBMS is, the SQL generator can either 
make a plain N/M or something more complicated that achieves the same 
result.  Eg, if you want to simulate the Int version on a system that 
only natively has a "number" type, you could render Int::div( N, M ) 
into "floor( floor(N) / floor(M) )".

Of course, the exact syntax and implementation is up to you, but I 
hope you get my point.

Likewise, you want to explicitly define what data types your system 
has (eg, Boolean, Integer, Rational, String, Table, Row), and what 
values each is or is not allowed to hold, and only allow explicit 
casts of values from one type to another.  One advantage of explicit 
casts is that you can include parameters in the casting operator 
which lets you say how the mapping is done.  For example, the 
String-to-Integer operator can include an argument where you say what 
numeric base the string is in, eg decimal vs octal vs hex vs binary, 
so that something like '123' is turned into the correct number.

Similarly, you will want to explicitly declare what types your 
literals are, so that eg "Int(2.0)" and "Rat(2.0)" produce a value of 
the data type you want, unlike the ambiguous "2.0", which really 
trips people up with some DBMSs, such as SQLite.

Trust me that I have thought about these matters a lot, and if you 
want a reliable system, you want to take my suggestions to heart.

-- Darren Duncan

P.S. For those of you who hadn't already heard, I decided a few 
months ago that my projects like Set::Relation and Rosetta will be 
written in Perl 6 only, for various reasons, and so the only way you 
will be able to use them in your Perl 5 programs on a pure Perl 5 
system is by installing the excellent v6.pm CPAN module, which is 
what I will be doing.  So they will still integrate with Perl 5 well 
and at a respectable speed (due to Module::Compile-like-mojo that 
v6.pm does/can use), but there will be the v6.pm dependency overhead.




More information about the Dbix-class mailing list