[Dbix-class] Explicit ASTs (ping nate)

David E. Wheeler david at kineticode.com
Sun Sep 3 21:52:22 CEST 2006


On Sep 2, 2006, at 19:12, Darren Duncan wrote:

> From my experience with working on Set::Relation and the Rosetta DBMS
> (and the now defunct SQL::Routine), if you *really* want to have an
> explicit AST that says exactly what you mean, is expressive enough
> for 99+% of any real-world uses, and is very portable, you
> essentially have to define a whole turing complete language,
> including the basics (which is what I am doing).

I think that's true if you want to write an RDBMS. But we're just  
trying to support SELECT statements, here.

> It also helps to, on one hand, abstract away any differences in the
> syntax for invoking built-in operators (eg, '=', 'AND', 'LIKE') and
> user-defined operators (stored functions / procedures), so that
> syntax-wise, user-defined operators are simply extensions to the
> language.

Agreed. They wouldn't look any different, syntactically.

> But at the same time, it is very useful to have the
> concept of name-spaces (loosely analogous to SQL schemas and/or
> packages).  Have a pre-defined namespace for built-ins, eg, System::,
> and another for user-defined functions, eg, User::, and use these
> everywhere in the AST.

Well, what for? I mean, if I write Perl, and I use the uc() function  
in one place and my own foo() function somewhere else, Perl doesn't  
have to know the namespaces of each to tell them apart.

> Then almost all things in the AST are simply
> expressions involving function calls, each of which takes zero or
> more arguments and returns a value.  All parts of a SELECT statement
> can also be composed in this format.

Yes, although I *really* want binary operators. Object::Relation just  
uses functions, and it works, but this confuses people:

   my $iter = $store->query(
     'Object::Relation::Phony::Person' =>
     last_name                  => 'Wall',
     first_name                 => 'Larry',
     OR (
         bio                   => LIKE ' perl ',
         'contact.type'        => LIKE 'email',
     )
   );

So, does OR make all of its arguments OR together and AND them  
relative to what comes before, or does it AND its arguments together  
and OR them relative to what comes before? The answer, for  
Object::Relation, is the latter, because I wanted OR to behave like a  
binary operator. But one could interpret it either way.
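
To make the two readings concrete, here is a rough sketch. It is not
how Object::Relation is implemented; render() and the nested-array
tree are made up for illustration, and I'm assuming the conditions
that come before the OR are ANDed together by default.

```perl
use strict;
use warnings;

# Hypothetical helper: render a nested [ OP => @operands ] tree as a
# parenthesized boolean expression. Leaves are plain condition strings.
sub render {
    my ($node) = @_;
    return $node unless ref $node;
    my ($op, @kids) = @$node;
    return '(' . join(" $op ", map { render($_) } @kids) . ')';
}

my @before = ("last_name = 'Wall'", "first_name = 'Larry'");
my @inside = ("bio LIKE ' perl '", "contact.type LIKE 'email'");

# Reading 1: OR joins its own arguments, and the group is ANDed with
# what comes before.
my $reading1 = [ AND => @before, [ OR => @inside ] ];

# Reading 2 (the one Object::Relation uses): the arguments are ANDed
# together, and the group is ORed with what comes before.
my $reading2 = [ OR => [ AND => @before ], [ AND => @inside ] ];

print render($reading1), "\n";
print render($reading2), "\n";
```

Same call, two quite different WHERE clauses, which is exactly why
people get confused.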

> I suggest making the arguments
> named as well, so they are better self-documenting.

I see no reason to have named arguments for uc() or count() or sum().  
But if you're talking about the arguments to the DWIM query function/ 
method, I think that's how it'd likely work.

> Note that the
> names you use in the built-ins under System:: don't have to look the
> same as the SQL analogs, especially when different DBMSs aren't the
> same as each other in those regards.

They wouldn't. More than likely I'd make them look like the Perl  
analogs. But in the AST, it really doesn't matter, since it's more  
convention than interface.

> You'll also want to explicitly define a type system, and you'll want
> at least the Boolean type.

What for? The database handles that.

On Sep 2, 2006, at 20:24, Darren Duncan wrote:

> For example, you need to explicitly say what you get out of a
> division operation taking 2 integers that don't divide evenly, such
> as 11/2; eg, does that return an integer of value 5 or a rational of
> value 5.5?  Different DBMS will implicitly go one way or the other,
> and consistent default behaviour is not portable with the plain N / M
> syntax.

I'm okay with certain behaviors being dependent on the data store.  
For example, the three major open-source databases all support  
regular expressions, but different flavors: POSIX in PostgreSQL, PCRE  
in MySQL (IIRC), and whatever you plug in to SQLite (which in our  
case would be full-blown Perl regular expressions). There is no way  
to normalize these for different back ends, and as I said in another  
message to Matt, certain data stores may not support regular  
expressions at all.
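
A minimal sketch of what I mean, assuming the SQL generator simply
picks the back end's own match operator and passes the pattern
through untouched. The operators are the ones those databases use;
%regex_op and render_match() are names I made up for illustration.

```perl
use strict;
use warnings;

# Regex-match operator per back end. The lookup table and helper are
# hypothetical; no attempt is made to translate the pattern itself.
my %regex_op = (
    PostgreSQL => '~',
    MySQL      => 'REGEXP',
    SQLite     => 'REGEXP',   # needs a regexp() function registered first
);

sub render_match {
    my ($backend, $column, $placeholder) = @_;
    my $op = $regex_op{$backend}
        or die "No regular expression support known for $backend\n";
    return "$column $op $placeholder";
}

print render_match(PostgreSQL => 'bio', '?'), "\n";
print render_match(SQLite     => 'bio', '?'), "\n";
```

A back end that isn't in the table just dies, which is the "may not
support regular expressions at all" case.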

We're going for a 90% solution here, not writing an RDBMS of our own,  
so a certain level of knowledge of the back end in edge case  
situations is perfectly acceptable to me.

> So you need to explicitly define the behaviour of your AST's division
> operator, or better yet, provide multiple division operators that
> have different fully-qualified names (eg: System::Int::div, which
> uses integers as input and output vs System::Rat::div which uses
> rationals as input and output), so that both behaviours are available
> to choose from.  Then, depending on which version is chosen and what
> the default behaviour of the DBMS is, the SQL generator can either
> make a plain N/M or something more complicated that achieves the same
> result.  Eg, if you want to simulate the Int version on a system that
> only natively has a "number" type, you could render Int::div( N, M )
> into "floor( floor(N) / floor(M) )".

That's way too fucking much work for what I, at least, want to be  
able to use this system for. Knowing the quirks of the back end data  
store does not bother me in the slightest.
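
For what it's worth, the per-back-end rendering described above might
be sketched like this for the one Int::div case. The back-end names
and the int_div_truncates flag are invented for illustration, and a
real generator would need an entry like this for every operator.

```perl
use strict;
use warnings;

# Hypothetical back-end descriptions. The flag says whether "/" on two
# integers already yields an integer on that back end.
my %backend = (
    int_native  => { int_div_truncates => 1 },
    number_only => { int_div_truncates => 0 },
);

# Render the AST's integer-division operator for a given back end,
# using the floor() fallback from the quoted example.
sub render_int_div {
    my ($name, $n, $m) = @_;
    my $be = $backend{$name} or die "Unknown back end: $name\n";
    return "$n / $m" if $be->{int_div_truncates};
    return "floor( floor($n) / floor($m) )";
}

print render_int_div(int_native  => 11, 2), "\n";
print render_int_div(number_only => 11, 2), "\n";
```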

> Of course, the exact syntax and implementation is up to you, but I
> hope you get my point.

I do, and it's valid, but I think that it is beyond the scope of what  
we're actually trying to accomplish here.

> Likewise, you want to explicitly define what data types your system
> has (eg, Boolean, Integer, Rational, String, Table, Row), and what
> values each is or is not allowed to hold, and only allow explicit
> casts of values from one type to another.  One advantage of explicit
> casts is that you can include parameters in the casting operator
> which lets you say how the mapping is done.  For example, the
> String-to-Integer operator can include an argument where you say what
> numeric base the string is in, eg decimal vs octal vs hex vs binary,
> so that something like '123' is turned into the correct number.

Again, we don't want to write an RDBMS. To my mind, data type  
enforcement is up to the class definitions (e.g., Moose constraints  
or Class::Meta data types) and the data store.

> Similarly, you will want to explicitly declare what types your
> literals are, so that eg "Int(2.0)" and "Rat(2.0)" produce a value of
> the data type you want, unlike the ambiguous "2.0", which really
> trips people up with some DBMSs, such as SQLite.

I'll wait for your Perl 6-based RDBMS so that I don't have to do all  
that work myself. Seems fair enough? ;-)

> Trust me that I have thought about these matters a lot, and if you
> want a reliable system, you want to take my suggestions to heart.

I do, but I think that your goals and mine are somewhat different.

Best,

David




