[Dbix-class] Explicit ASTs (ping nate)

Tue Sep 5 00:54:18 CEST 2006

At 11:04 AM -0700 9/4/06, David E. Wheeler wrote:
>  > In fact, this means that every AST can simply be a single
>>  arbitrary-depth expression which represents the select statement, all
>>  generally in the functional language sense.  Each node in said
>>  expression represents either a literal or variable name or an
>>  operator invocation that takes zero or more arguments and returns a
>>  value.  The root node's value would be of a Table or Relation type,
>>  since that's what a SELECT returns.
>
>Yes, exactly, although the return value from the root node is not 
>specified; that will be implementation-specific.

Well, I'm saying the return type here in the sense to what a SELECT 
conceptually returns.  By implementation-specific, I assume you mean 
what Perl data type is returned, since Perl 5 doesn't have a Table or 
Relation type.

My own impression of a best type of response is that an object is 
returned from which you can either get the resultset all at once, eg 
as an array of hashes, or you can enumerate through it one row at a 
time, etc, such as if you have a large result set that uses a cursor 
behind the scenes since it won't fit in RAM.

But still, this matter is largely orthogonal to query syntax, unless 
your query can specify what kind of result you want to conceptually 
be given as a result, eg, all-at-once vs cursor handle.

>  > Generally speaking, just use a collection of separate simple
>>  relational operators (take the "original 8" and/or D for inspiration)
>>  that together do what a SELECT does, and then compose them into a
>>  SELECT when generating the SQL.
>
>FYI to others, the "original 8" are:

FYI, I'm writing some pseudo-SQL for each to help illustrate them; 
they may be technically incorrect, but they are only meant to give 
the general idea.  Note that every single one of these examples 
produces a <table-valued-expr> / <tve> as its output, and so can be 
used in the FROM clause of another one where it says <tve>, and so 
any of these can be chained in any order.  (Trivial cases of a <tve> 
are either a table name or a "SELECT <expr> AS <new-col-name>".)

>Restrict / Where

Like: SELECT * FROM <table-valued-expr> WHERE <condition>

(Note that SQL's HAVING clause is exactly the same as the above, but 
that it is applied to a <tve> that comes out of a group+summarize 
process.  Pretend that SQL has no HAVING clause, and you are using a 
subquery in the FROM clause to get the same effect, and you can do 
HAVING with another WHERE.)

>Project / Select

Like: SELECT <column-name-list> FROM <table-valued-expr>

>Join

Like: <tve> NATURAL INNER JOIN <tve>

>Intersect

Like: <tve> INTERSECT CORRESPONDING <tve>

>Union

Like: <tve> UNION CORRESPONDING <tve>

>Difference

Like: <tve> MINUS CORRESPONDING <tve>

>Cartesian Product

Like: <tve> CROSS JOIN <tve>

>Divide

(This has no terse SQL equivalent afaik.)

And then, besides these, there are various other useful ones defined 
by D, which are necessary for everything a modern SQL-SELECT does, 
like:

Rename

Like: SELECT <original-col-name> AS <new-col-name> FROM <tve>

Extend

Like: SELECT *, <expr> AS <new-col-name> FROM <tve>

Semi Join / Matching

Like: SELECT * FROM <tve> WHERE <col-list> IN (SELECT <col-list> FROM <tve2>)

Semi Difference / Not Matching

Like: SELECT * FROM <tve> WHERE <col-list> NOT IN (SELECT <col-list> 
FROM <tve>)

Compose

If table-valued-expressions "foo" and "bar" contain fields [a,b] and 
[b,c] respectively, like: SELECT a, c FROM foo NATURAL INNER JOIN bar

Disjoint Union

This is like xor, which afaik SQL doesn't directly support, so is 
emulatable like: (<tve> UNION CORRESPONDING <tve>) MINUS 
CORRESPONDING (<tve> INTERSECT CORRESPONDING <tve>)

Transitive Closure

(This is for recursive queries, like WITH RECURSIVE or START WITH CONNECT BY.)

Summarize

Like: SELECT count(*) AS <new-col-name> FROM <tve>
Or: SELECT sum(<old-col-name>) AS new-col-name> FROM <tve>

Group

(Many SQL products don't support this directly, AFAIK, since it is 
analagous to producing tables whose field values are themselves 
tables, but it produces an intermediate part of SQL's GROUP BY 
process.)

Substitute

Like an expressional version of SQL's UPDATE statement.

... etc.

>  > That makes something that is a lot more Perl-like and easier for
>>  programmers to understand, while people that know SQL already know
>>  what the parts of a SELECT do and can easily compose the analogous
>>  simpler functions.
>
>Well, maybe. I had to really struggle to understand those operators, 
>myself. I still don't have much of a grasp (I am not a mathematician, 
>let alone a set theorist).

Well, I used the word "inspiration" on purpose.  The point is that 
you can use simpler operators that are used together without those 
operators being the same as any of the above (especially if some DBMS 
products don't easily support some of the above in isolation from a 
combination with others).  The important principle is breaking down 
the bigger problem into clearly defined and easier to use smaller 
ones.

>See "Database in Depth" pp 86-92 for detailed descriptions of each.
>
>    http://www.amazon.com/exec/obidos/ASIN/0596100124/

I will also take this moment to point out a brand new book by the 
same authors, which is "The Relational Database Dictionary".  This 
book is more of a pocket reference type thing at 122 pages, and is 
inexpensive.

See: http://www.oreilly.com/catalog/relationaldb/index.html

I've already ordered a copy for myself, today.

>Yes, but the syntax in pure Perl is hard; see my recent post on p5p
>for details.

That depends on how you do it.  There are various Perl idioms that can be used.

If you want to generate SQL or other non-Perl things from it, then 
you could make the input more AST-like in appearance.

If you want it to execute in Perl, you could replace some things with 
anonymous subroutines; eg, a WHERE clause is essentially like Perl's 
"grep" in structure, so you could take an anonymous sub ref that is 
invoked for each row, given that row as its argument, and it returns 
true or false based on whether the row matches its criteria.

Note that my Set::Relation module will be taking the latter approach, 
while Rosetta will be taking the former approach.

If you don't want either of those approaches, then I don't know 
enough about Perl to do what you want, at least not without more 
prompting or help.

-- Darren Duncan