[Dbix-class] Explicit ASTs (ping nate)

David E. Wheeler david at kineticode.com
Mon Sep 4 20:04:49 CEST 2006


On Sep 3, 2006, at 23:10, Darren Duncan wrote:

> While your goals don't require as much attention to detail and/or
> "work" to implement, I raised the detail points I did because I know
> that people will keep wanting more out of this solution under
> discussion, and that having an eye on design principles used by more
> involved systems will let us design this good-enough smaller system
> in such a way that it is easier to scale up with new features later,
> rather than resorting to premature hacks and/or breaking backwards
> compatibility unnecessarily.

Oh, agreed. It's because of the insufficient design of the current  
implementations that we're having this discussion (see mst's comments  
about Object::Relation in the first post in this thread).

> Also, based on the discussions to date, I didn't/don't really know
> what the intended scope of the project is, so I just defaulted to
> assuming this may grow to resemble a "complete" solution.  Perhaps
> more commentary about what people do or don't actually want in the
> explicit AST is helpful when going forward, such as where the line
> might be between good-enough and insufficient.  Mind you, some of
> what you said in your reply does just that, assuming others agree.

Right.

> Fair enough, if that's actually true, and it does cut down the
> problem space by orders of magnitude.

Hell yeah! We'll leave creating an actual RDBMS to insane geniuses  
like you. ;-)

> In fact, this means that every AST can simply be a single
> arbitrary-depth expression which represents the select statement, all
> generally in the functional language sense.  Each node in said
> expression represents either a literal or variable name or an
> operator invocation that takes zero or more arguments and returns a
> value.  The root node's value would be of a Table or Relation type,
> since that's what a SELECT returns.

Yes, exactly, although the return value from the root node is not  
specified; that will be implementation-specific.
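
To make the node shapes concrete, here is a minimal sketch of such an expression tree. It is written in Python purely for illustration (the thread's target is Perl), and the class names Literal, Var, and Op, along with the render() and depth() helpers, are hypothetical and not something anyone in this thread has proposed. Note that it deliberately says nothing about what the root node returns.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Literal:
    """A constant value, e.g. a string or number."""
    value: object


@dataclass(frozen=True)
class Var:
    """A named thing, e.g. a table or column name."""
    name: str


@dataclass(frozen=True)
class Op:
    """An operator invocation taking zero or more argument nodes."""
    name: str
    args: Tuple["Node", ...] = ()


Node = Union[Literal, Var, Op]


def depth(node):
    """Return the nesting depth of an expression tree."""
    if isinstance(node, Op) and node.args:
        return 1 + max(depth(arg) for arg in node.args)
    return 1


def render(node):
    """Render a tree in plain functional notation."""
    if isinstance(node, Literal):
        return repr(node.value)
    if isinstance(node, Var):
        return node.name
    return "%s(%s)" % (node.name, ", ".join(render(arg) for arg in node.args))


# Hypothetical query: restrict the "person" relation to rows where name = 'x'.
tree = Op("restrict", (Var("person"), Op("eq", (Var("name"), Literal("x")))))
print(render(tree))
```

A backend would walk the same tree to emit SQL (or an LDAP filter) instead of the functional notation that render() produces here.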

> Generally speaking, just use a collection of separate simple
> relational operators (take the "original 8" and/or D for inspiration)
> that together do what a SELECT does, and then compose them into a
> SELECT when generating the SQL.

FYI to others, the "original 8" are:

Restrict
Project
Join
Intersect
Union
Difference
Cartesian Product
Divide

See "Database in Depth" pp 86-92 for detailed descriptions of each.

   http://www.amazon.com/exec/obidos/ASIN/0596100124/
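
For anyone who finds code easier than set notation, a rough sketch of four of these operators follows. It assumes a relation is just a list of dicts with hashable values, which is a simplification of my own and not how the book defines one; the function names and the sample data are made up for illustration, and Python is used only because it is compact.

```python
def _dedupe(rows):
    """Drop duplicate rows, since a relation is a set of tuples."""
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


def restrict(rel, pred):
    """Keep only the rows for which pred(row) is true (SQL's WHERE)."""
    return [row for row in rel if pred(row)]


def project(rel, attrs):
    """Keep only the named attributes (SQL's column list), minus duplicates."""
    return _dedupe({a: row[a] for a in attrs} for row in rel)


def join(left, right):
    """Natural join: pair up rows that agree on all shared attribute names."""
    out = []
    for l_row in left:
        for r_row in right:
            common = l_row.keys() & r_row.keys()
            if all(l_row[k] == r_row[k] for k in common):
                out.append({**l_row, **r_row})
    return _dedupe(out)


def union(a, b):
    """All rows that appear in either relation."""
    return _dedupe(list(a) + list(b))


emp = [
    {"name": "alice", "dept": "eng"},
    {"name": "bob", "dept": "ops"},
    {"name": "carol", "dept": "eng"},
]
dept = [{"dept": "eng", "floor": 2}, {"dept": "ops", "floor": 1}]

# Roughly: SELECT name FROM emp NATURAL JOIN dept WHERE floor = 2
names = project(restrict(join(emp, dept), lambda r: r["floor"] == 2), ["name"])
print(names)
```

When the two inputs share no attribute names, this join() degenerates into the Cartesian product, which is one way to see how the operators relate to each other.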

> That makes something that is a lot more Perl-like and easier for
> programmers to understand, while people that know SQL already know
> what the parts of a SELECT do and can easily compose the analogous
> simpler functions.

Well, maybe. I had to really struggle to understand those operators,  
myself. I still don't have much of a grasp (I am not a mathematician,  
let alone a set theorist).

> Or just have a big "select" operator instead that is relatively
> complicated, though I would strongly suggest that several
> smaller functions are less work than one big one.  (I know from
> experience when making the defunct SQL::Routine, where much of the
> complexity was modelling an actual SELECT statement.)

Yes, smaller is definitely better.

> But however you do it, if you just deal with function operators
> everywhere, including for both the select and any
> math/string/whatever operations, your syntax will be straightforward
> and simple, and Perl-like, and easier to make work over a non-SQL
> backend like LDAP or whatever.

Yes, that's the ideal.

> Yes.  But we are making an EXPLICIT AST, right?  The DWIM wrapper
> would just take uc() of course, but I suggest the explicit version
> including a namespace will just make it less ambiguous in important
> ways.  For example, if a user defines a function that has the same name
> as one of our AST's built-ins, because it isn't the same as their
> choice of underlying DBMS' reserved word.  Of course, the explicit
> AST should be easy to use, but one point of it being intended to use
> under a wrapper is that we can make it more verbose to aid clarity.

Fair enough; your point is well-taken. We'd need to have a way to  
distinguish them when new core functions are added, though, where  
they might conflict with user-defined functions.
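
One way this could look, as a sketch only: keep built-ins and user-defined functions in separate namespaces and make the explicit AST refuse bare names. The FunctionRegistry class, the "core" and "user" namespace names, and the "::" separator are all assumptions of mine for illustration (and the code is Python rather than Perl just for brevity).

```python
class FunctionRegistry:
    """Maps (namespace, name) pairs to implementations."""

    def __init__(self):
        self._funcs = {}

    def register(self, namespace, name, impl):
        key = (namespace, name)
        if key in self._funcs:
            raise ValueError("already defined: %s::%s" % key)
        self._funcs[key] = impl

    def resolve(self, qualified):
        """Look up a fully qualified name such as 'core::uc'."""
        namespace, sep, name = qualified.partition("::")
        if not sep:
            # The explicit AST never guesses; only a DWIM wrapper would.
            raise ValueError("explicit AST needs a namespace: %r" % qualified)
        return self._funcs[(namespace, name)]


registry = FunctionRegistry()
registry.register("core", "uc", str.upper)
# A user-defined function may reuse the name without clashing.
registry.register("user", "uc", lambda s: s.capitalize())

print(registry.resolve("core::uc")("abc"))
print(registry.resolve("user::uc")("abc"))
```

With this arrangement, adding a new core function later cannot silently shadow a user-defined one, because the two never share a key.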

> I fully agree that that code example is confusing.  I would expect
> there to be explicit operators for both AND() and OR() at any time
> where they are intended; you should be constructing an expression
> where the root node returns a boolean value, which is what AND(),
> OR(), and any arguments to those return.

Yes, but the syntax in pure Perl is hard; see my recent post on p5p  
for details.

Best,

David




