[Dbix-class] Explicit ASTs (ping nate)
Darren Duncan
darren at DarrenDuncan.net
Mon Sep 4 08:10:04 CEST 2006
At 12:52 PM -0700 9/3/06, David E. Wheeler wrote:
>On Sep 2, 2006, at 19:12, Darren Duncan wrote:
<snip>
>On Sep 2, 2006, at 20:24, Darren Duncan wrote:
<snip>
>I do, but I think that your goals and mine are somewhat different.
While your goals don't require as much attention to detail and/or
"work" to implement, I raised the detail points I did because I know
that people will keep wanting more out of this solution under
discussion, and that having an eye on design principles used by more
involved systems will let us design this good-enough smaller system
in such a way that it is easier to scale up with new features later,
rather than resorting to premature hacks and/or breaking backwards
compatibility unnecessarily.
Also, based on the discussions to date, I didn't/don't really know
what the intended scope of the project is, so I just defaulted to
assuming this may grow to resemble a "complete" solution. Perhaps
more commentary about what people do or don't actually want in the
explicit AST would be helpful going forward, such as where the line
might be between good-enough and insufficient. Mind you, some of
what you said in your reply does just that, assuming others agree.
>I think that's true if you want to write an RDBMS. But we're just
>trying to support SELECT statements, here.
Fair enough, if that's actually true, and it does cut down the
problem space by orders of magnitude.
In fact, this means that every AST can simply be a single
arbitrary-depth expression which represents the select statement, all
generally in the functional language sense. Each node in said
expression represents either a literal or variable name or an
operator invocation that takes zero or more arguments and returns a
value. The root node's value would be of a Table or Relation type,
since that's what a SELECT returns.
If you want to say the equivalent of "SELECT * FROM foo" then the
root node is simply the database table variable named "foo". If you
want to say "SELECT a FROM foo" then we have 3 nodes instead, where
the root node invokes a built-in function that extracts columns from
a table value to produce another one, and its arguments are the
source table variable name and the names of the columns to extract.
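As a rough sketch of those two cases, assuming nodes are plain Perl hashes: the constructor names lit(), var() and call(), and the operator name 'project', are made up for illustration and are not part of any existing module.

```perl
use strict;
use warnings;

# Hypothetical node constructors (illustrative names only).
sub lit  { return { kind => 'lit', value => $_[0] } }   # a literal value
sub var  { return { kind => 'var', name  => $_[0] } }   # a named variable, eg a table
sub call {                                               # an operator invocation
    my ($op, @args) = @_;
    return { kind => 'call', op => $op, args => \@args };
}

# "SELECT * FROM foo": the root node is just the table variable.
my $select_all = var('foo');

# "SELECT a FROM foo": a root operator node plus two argument nodes,
# the source table and the list of column names to extract.
my $select_a = call('project', var('foo'), lit(['a']));

# Count the nodes in an expression tree.
sub count_nodes {
    my ($node) = @_;
    my $n = 1;
    if ($node->{kind} eq 'call') {
        $n += count_nodes($_) for @{ $node->{args} };
    }
    return $n;
}

print count_nodes($select_all), "\n";   # prints 1
print count_nodes($select_a),   "\n";   # prints 3
```

Any deeper expression is just more call() nodes nested in the argument lists.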
Generally speaking, just use a collection of separate simple
relational operators (take Codd's "original 8" and/or Date and
Darwen's D for inspiration) that together do what a SELECT does, and
then compose them into a SELECT when generating the SQL.
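To illustrate the composing step, here is a minimal sketch, assuming just two hypothetical operators, 'project' and 'restrict', nested over a table node. The node layout and to_sql() are invented for this example; predicates are kept as raw strings and nothing is quoted or validated.

```perl
use strict;
use warnings;

# Hypothetical tree: project(restrict(foo)).
my $ast = {
    op   => 'project',
    cols => ['a', 'b'],
    from => {
        op    => 'restrict',
        where => 'a > 1',              # raw predicate string, for brevity
        from  => { table => 'foo' },
    },
};

# Walk from the root down to the table, folding the small operators
# into the clauses of a single SELECT.
sub to_sql {
    my ($node) = @_;
    my %part = (cols => '*', where => []);
    while (!exists $node->{table}) {
        if ($node->{op} eq 'project') {
            # The outermost projection decides the column list.
            $part{cols} = join(', ', @{ $node->{cols} }) if $part{cols} eq '*';
        }
        elsif ($node->{op} eq 'restrict') {
            push @{ $part{where} }, $node->{where};
        }
        else {
            die "unknown operator: $node->{op}";
        }
        $node = $node->{from};
    }
    my $sql = "SELECT $part{cols} FROM $node->{table}";
    if (@{ $part{where} }) {
        $sql .= ' WHERE ' . join(' AND ', map { "($_)" } @{ $part{where} });
    }
    return $sql;
}

print to_sql($ast), "\n";   # prints: SELECT a, b FROM foo WHERE (a > 1)
```

A real generator would also have to notice when the operators cannot be flattened (eg a restriction on a column that an inner projection already dropped) and emit a subquery instead; this sketch ignores that.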
That makes something that is a lot more Perl-like and easier for
programmers to understand, while people that know SQL already know
what the parts of a SELECT do and can easily compose the analogous
simpler functions.
Or just have one big "select" operator instead that is relatively
complicated, though I would strongly suggest that several smaller
functions are less work than the one big one. (I know this from
experience making the now-defunct SQL::Routine, where much of the
complexity was in modelling an actual SELECT statement.)
But however you do it, if you just deal with function operators
everywhere, including for both the select and any
math/string/whatever operations, your syntax will be straightforward
and simple, and Perl-like, and easier to make work over a non-SQL
backend like LDAP or whatever.
> > But at the same time, it is very useful to have the
>> concept of name-spaces (loosely analogous to SQL schemas and/or
>> packages). Have a pre-defined namespace for built-ins, eg, System::,
>> and another for user-defined functions, eg, User::, and use these
>> everywhere in the AST.
>
>Well, what for? I mean, if I write Perl, and I use the uc() function
>in one place and my own foo() function somewhere else, Perl doesn't
>have to know the namespaces of each to tell them apart.
Yes. But we are making an EXPLICIT AST, right? The DWIM wrapper
would just take uc() of course, but I suggest the explicit version
including a namespace will just make it less ambiguous in important
ways. For example, a user might define a function with the same name
as one of our AST's built-ins, because that name isn't a reserved word
in their choice of underlying DBMS. Of course, the explicit
AST should be easy to use, but one point of it being intended for use
under a wrapper is that we can make it more verbose to aid clarity.
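A small sketch of how a wrapper might fill in the namespaces, assuming a made-up list of built-ins and a made-up helper named qualify(); only the System:: and User:: prefixes come from the suggestion above.

```perl
use strict;
use warnings;

# Hypothetical set of built-ins provided by the AST layer.
my %builtin = map { $_ => 1 } qw(uc lc length);

# What a DWIM wrapper might do with a bare function name before it
# reaches the explicit AST. Already-qualified names pass through.
sub qualify {
    my ($name) = @_;
    return $name if $name =~ /^(?:System|User)::/;
    return ($builtin{$name} ? 'System::' : 'User::') . $name;
}

print qualify('uc'),       "\n";   # prints System::uc
print qualify('foo'),      "\n";   # prints User::foo
print qualify('User::uc'), "\n";   # prints User::uc
```

The last case is the one that matters: a user-defined uc() stays distinct from the built-in because the explicit form always carries the namespace.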
> > Then almost all things in the AST are simply
>> expressions involving function calls, each of which takes zero or
>> more arguments and returns a value. All parts of a SELECT statement
>> can also be composed in this format.
>
>Yes, although I *really* want binary operators. Object::Relation just
>uses functions, and it works, but this confuses people:
>
>   my $iter = $store->query(
>       'Object::Relation::Phony::Person' =>
>       last_name  => 'Wall',
>       first_name => 'Larry',
>       OR (
>           bio            => LIKE ' perl ',
>           'contact.type' => LIKE 'email',
>       )
>   );
>
>So, does OR make all of its arguments OR together and AND them
>relative to what comes before, or does it AND its arguments together
>and OR them relative to what comes before? The answer, for
>Object::Relation, is the latter, because I wanted OR to behave like a
>binary operator. But one could interpret it either way.
I fully agree that that code example is confusing. I would expect
there to be explicit operators for both AND() and OR() wherever
they are intended; you should be constructing an expression
where the root node returns a boolean value, which is what AND(),
OR(), and any arguments to those return.
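For instance, the query above could be spelled out like this, taking the grouping you describe for Object::Relation. This is only a sketch: the constructors AND(), OR(), EQ() and LIKE() and the render() helper are invented here and are not Object::Relation's API.

```perl
use strict;
use warnings;

# Hypothetical constructors: every node is an operator call that
# returns a boolean.
sub AND  { return { op => 'AND',  args => [@_] } }
sub OR   { return { op => 'OR',   args => [@_] } }
sub EQ   { return { op => '=',    args => [@_] } }
sub LIKE { return { op => 'LIKE', args => [@_] } }

# Render with full parentheses so the grouping is visible.
sub render {
    my ($node) = @_;
    return $node unless ref $node;   # plain strings stand in for columns and literals
    return '(' . join(" $node->{op} ", map { render($_) } @{ $node->{args} }) . ')';
}

my $where = OR(
    AND( EQ('last_name', "'Wall'"), EQ('first_name', "'Larry'") ),
    AND( LIKE('bio', "' perl '"),   LIKE('contact.type', "'email'") ),
);

print render($where), "\n";
```

With both operators written out, there is nothing left to interpret: the nesting of the calls is the grouping.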
-- Darren Duncan