[Dbix-class] Explicit ASTs (ping nate)

Mon Sep 4 08:10:04 CEST 2006

At 12:52 PM -0700 9/3/06, David E. Wheeler wrote:
>On Sep 2, 2006, at 19:12, Darren Duncan wrote:
<snip>
>On Sep 2, 2006, at 20:24, Darren Duncan wrote:
<snip>
>I do, but I think that your goals and mine are somewhat different.

While your goals don't require as much attention to detail and/or 
"work" to implement, I raised the detail points I did because I know 
that people will keep wanting more out of this solution under 
discussion, and that having an eye on design principles used by more 
involved systems will let us design this good-enough smaller system 
in such a way that it is easier to scale up with new features later, 
rather than resorting to premature hacks and/or breaking backwards 
compatibility unnecessarily.

Also, based on the discussions to date, I didn't/don't really know 
what the intended scope of the project is, so I just defaulted to 
assuming this may grow to resemble a "complete" solution.  Perhaps 
more commentary about what people do or don't actually want in the 
explicit AST would be helpful going forward, such as where the line 
might be between good-enough and insufficient.  Mind you, some of 
what you said in your reply does just that, assuming others agree.

>I think that's true if you want to write an RDBMS. But we're just
>trying to support SELECT statements, here.

Fair enough, if that's actually true, and it does cut down the 
problem space by orders of magnitude.

In fact, this means that every AST can simply be a single 
arbitrary-depth expression which represents the select statement, all 
generally in the functional language sense.  Each node in said 
expression represents either a literal or variable name or an 
operator invocation that takes zero or more arguments and returns a 
value.  The root node's value would be of a Table or Relation type, 
since that's what a SELECT returns.

If you want to say the equivalent of "SELECT * FROM foo" then the 
root node is simply the database table variable named "foo".  If you 
want to say "SELECT a FROM foo" then we have 3 nodes instead, where 
the root node invokes a built-in function that extracts columns from 
a table value to produce another one, and its arguments are the 
source table variable name and the names of the columns to extract.
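
As a rough sketch of those two cases as plain Perl data structures 
(the node shapes and the "project" operator name are only my 
assumptions for illustration, not an existing API):

```perl
use strict;
use warnings;

# Hypothetical node shapes, not taken from any existing module:
#   { var => NAME }                      a table variable
#   { lit => VALUE }                     a literal, eg a column name
#   { op => NAME, args => [ NODE... ] }  an operator invocation

# "SELECT * FROM foo": the root node is just the table variable.
my $select_all = { var => 'foo' };

# "SELECT a FROM foo": three nodes.  The root invokes a hypothetical
# "project" operator whose arguments are the source table variable
# and the name of the column to extract.
my $select_a = {
    op   => 'project',
    args => [ { var => 'foo' }, { lit => 'a' } ],
};

# Count the nodes in an expression tree by walking the arguments.
sub count_nodes {
    my ($node) = @_;
    my $count = 1;
    $count += count_nodes($_) for @{ $node->{args} || [] };
    return $count;
}

print count_nodes($select_all), "\n";   # prints 1
print count_nodes($select_a), "\n";     # prints 3
```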

Generally speaking, just use a collection of separate simple 
relational operators (take the "original 8" and/or D for inspiration) 
that together do what a SELECT does, and then compose them into a 
SELECT when generating the SQL.
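
Something like the following is what I have in mind, as a minimal 
sketch only.  The constructor names (table, restrict, project) and 
to_sql() are hypothetical, and the condition is passed as an 
already-rendered SQL string just to keep the example short:

```perl
use strict;
use warnings;

# Hypothetical constructors for small relational operators.
sub table {
    my ($name) = @_;
    return { op => 'table', name => $name };
}
sub restrict {
    my ($src, $cond) = @_;
    return { op => 'restrict', src => $src, cond => $cond };
}
sub project {
    my ($src, @cols) = @_;
    return { op => 'project', src => $src, cols => [@cols] };
}

# Fold a chain of operators into the clauses of one SELECT statement.
# This sketch only handles simple chains ending in a table; when
# projections are nested, the outermost one wins.
sub to_sql {
    my ($node) = @_;
    my (@cols, @conds, $from);
    while ($node) {
        if ($node->{op} eq 'project') {
            @cols = @{ $node->{cols} } unless @cols;
        }
        elsif ($node->{op} eq 'restrict') {
            unshift @conds, $node->{cond};
        }
        elsif ($node->{op} eq 'table') {
            $from = $node->{name};
        }
        $node = $node->{src};   # undef once we reach the table
    }
    my $sql = 'SELECT ' . (@cols ? join(', ', @cols) : '*') . " FROM $from";
    $sql .= ' WHERE ' . join(' AND ', @conds) if @conds;
    return $sql;
}

print to_sql(table('foo')), "\n";
print to_sql(project(restrict(table('foo'), 'b > 1'), 'a')), "\n";
```

The user composes the small operators, and only the SQL generator 
needs to know how they map onto the parts of a SELECT.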

That makes something that is a lot more Perl-like and easier for 
programmers to understand, while people that know SQL already know 
what the parts of a SELECT do and can easily compose the analogous 
simpler functions.

Or just have a big "select" operator instead that is relatively 
complicated, though I would strongly suggest that several smaller 
functions are less work than the one big one.  (I know from 
experience when making the defunct SQL::Routine, where much of the 
complexity was modelling an actual SELECT statement.)

But however you do it, if you just deal with function operators 
everywhere, including for both the select and any 
math/string/whatever operations, your syntax will be straightforward 
and simple, and Perl-like, and easier to make work over a non-SQL 
backend like LDAP or whatever.

>  > But at the same time, it is very useful to have the
>>  concept of name-spaces (loosely analogous to SQL schemas and/or
>>  packages).  Have a pre-defined namespace for built-ins, eg, System::,
>>  and another for user-defined functions, eg, User::, and use these
>>  everywhere in the AST.
>
>Well, what for? I mean, if I write Perl, and I use the uc() function 
>in one place and my own foo() function somewhere else, Perl doesn't 
>have to know the namespaces of each to tell them apart.

Yes.  But we are making an EXPLICIT AST, right?  The DWIM wrapper 
would just take uc() of course, but I suggest that the explicit 
version including a namespace will just make it less ambiguous in 
important ways.  Eg, a user may define a function with the same name 
as one of our AST's built-ins, because that name isn't a reserved 
word in their choice of underlying DBMS.  Of course, the explicit 
AST should be easy to use, but one point of it being intended for use 
under a wrapper is that we can make it more verbose to aid clarity.
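
To illustrate the name clash, here is a small sketch.  The System:: 
and User:: prefixes are the ones I suggested; the registries, the 
function names in them, and the qualify() and call_node() helpers are 
all made up for this example:

```perl
use strict;
use warnings;

# Hypothetical registries of known function names.
my %builtin = map { $_ => 1 } qw(uc lc length);
my %user    = map { $_ => 1 } qw(uc foo);   # user-defined uc() collides

# What a DWIM wrapper might do: map a bare name to every explicit
# candidate, so it can detect that a bare "uc" is ambiguous.
sub qualify {
    my ($name) = @_;
    my @found;
    push @found, "System::$name" if $builtin{$name};
    push @found, "User::$name"   if $user{$name};
    return @found;
}

# The explicit AST only ever stores a fully qualified name.
sub call_node {
    my ($qualified, @args) = @_;
    die "unqualified function name: $qualified\n"
        unless $qualified =~ /^(?:System|User)::\w+$/;
    return { op => $qualified, args => [@args] };
}

my @candidates = qualify('uc');             # two candidates: ambiguous
my $node = call_node('System::uc', 'name'); # explicit: no ambiguity
print join(', ', @candidates), "\n";
print $node->{op}, "\n";
```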

>  > Then almost all things in the AST are simply
>>  expressions involving function calls, each of which takes zero or
>>  more arguments and returns a value.  All parts of a SELECT statement
>>  can also be composed in this format.
>
>Yes, although I *really* want binary operators. Object::Relation just 
>uses functions, and it works, but this confuses people:
>
>    my $iter = $store->query(
>      'Object::Relation::Phony::Person' =>
>      last_name                  => 'Wall',
>      first_name                 => 'Larry',
>      OR (
>          bio                   => LIKE ' perl ',
>          'contact.type'        => LIKE 'email',
>      )
>    );
>
>So, does OR make all of its arguments OR together and AND them 
>relative to what comes before, or does it AND its arguments together 
>and OR them relative to what comes before? The answer, for 
>Object::Relation, is the latter, because I wanted OR to behave like a
>binary operator. But one could interpret it either way.

I fully agree that that code example is confusing.  I would expect 
there to be explicit operators for both AND() and OR() wherever 
they are intended; you should be constructing an expression 
where the root node returns a boolean value, which is what AND(), 
OR(), and any arguments to those return.
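
For example, the query above could be spelled out as follows, using 
the interpretation you say Object::Relation takes.  This is only a 
sketch; the AND, OR, EQ, LIKE constructors and render() are 
hypothetical, not Object::Relation's or anyone else's actual API, and 
the values are copied from your example as quoted:

```perl
use strict;
use warnings;

# Hypothetical constructors: every node is an operator invocation
# that returns a boolean.
sub AND  { return { op => 'AND', args => [@_] } }
sub OR   { return { op => 'OR',  args => [@_] } }
sub EQ   { my ($col, $val) = @_; return { op => '=',    col => $col, val => $val } }
sub LIKE { my ($col, $val) = @_; return { op => 'LIKE', col => $col, val => $val } }

# Render the tree with explicit parentheses so the grouping is visible.
# No value escaping is done; this is for display only.
sub render {
    my ($node) = @_;
    if ($node->{args}) {
        return '('
            . join(" $node->{op} ", map { render($_) } @{ $node->{args} })
            . ')';
    }
    return "$node->{col} $node->{op} '$node->{val}'";
}

my $where = OR(
    AND( EQ( last_name => 'Wall' ), EQ( first_name => 'Larry' ) ),
    AND( LIKE( bio => ' perl ' ), LIKE( 'contact.type' => 'email' ) ),
);

print render($where), "\n";
```

Since the root node is an OR() whose arguments are two AND() nodes, 
there is only one way to read it.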

-- Darren Duncan