[Dbix-class] Explicit ASTs (ping nate)
Darren Duncan
darren at DarrenDuncan.net
Tue Sep 5 00:54:18 CEST 2006
At 11:04 AM -0700 9/4/06, David E. Wheeler wrote:
> > In fact, this means that every AST can simply be a single
>> arbitrary-depth expression which represents the select statement, all
>> generally in the functional language sense. Each node in said
>> expression represents either a literal or variable name or an
>> operator invocation that takes zero or more arguments and returns a
>> value. The root node's value would be of a Table or Relation type,
>> since that's what a SELECT returns.
>
>Yes, exactly, although the return value from the root node is not
>specified; that will be implementation-specific.
Well, I'm saying the return type here in the sense to what a SELECT
conceptually returns. By implementation-specific, I assume you mean
what Perl data type is returned, since Perl 5 doesn't have a Table or
Relation type.
My own impression of a best type of response is that an object is
returned from which you can either get the resultset all at once, eg
as an array of hashes, or you can enumerate through it one row at a
time, etc, such as if you have a large result set that uses a cursor
behind the scenes since it won't fit in RAM.
But still, this matter is largely orthogonal to query syntax, unless
your query can specify what kind of result you want to conceptually
be given as a result, eg, all-at-once vs cursor handle.
> > Generally speaking, just use a collection of separate simple
>> relational operators (take the "original 8" and/or D for inspiration)
>> that together do what a SELECT does, and then compose them into a
>> SELECT when generating the SQL.
>
>FYI to others, the "original 8" are:
FYI, I'm writing some pseudo-SQL for each to help illustrate them;
they may be technically incorrect, but they are only meant to give
the general idea. Note that every single one of these examples
produces a <table-valued-expr> / <tve> as its output, and so can be
used in the FROM clause of another one where it says <tve>, and so
any of these can be chained in any order. (Trivial cases of a <tve>
are either a table name or a "SELECT <expr> AS <new-col-name>".)
>Restrict / Where
Like: SELECT * FROM <table-valued-expr> WHERE <condition>
(Note that SQL's HAVING clause is exactly the same as the above, but
that it is applied to a <tve> that comes out of a group+summarize
process. Pretend that SQL has no HAVING clause, and you are using a
subquery in the FROM clause to get the same effect, and you can do
HAVING with another WHERE.)
>Project / Select
Like: SELECT <column-name-list> FROM <table-valued-expr>
>Join
Like: <tve> NATURAL INNER JOIN <tve>
>Intersect
Like: <tve> INTERSECT CORRESPONDING <tve>
>Union
Like: <tve> UNION CORRESPONDING <tve>
>Difference
Like: <tve> MINUS CORRESPONDING <tve>
>Cartesian Product
Like: <tve> CROSS JOIN <tve>
>Divide
(This has no terse SQL equivalent afaik.)
And then, besides these, there are various other useful ones defined
by D, which are necessary for everything a modern SQL-SELECT does,
like:
Rename
Like: SELECT <original-col-name> AS <new-col-name> FROM <tve>
Extend
Like: SELECT *, <expr> AS <new-col-name> FROM <tve>
Semi Join / Matching
Like: SELECT * FROM <tve> WHERE <col-list> IN (SELECT <col-list> FROM <tve2>)
Semi Difference / Not Matching
Like: SELECT * FROM <tve> WHERE <col-list> NOT IN (SELECT <col-list>
FROM <tve>)
Compose
If table-valued-expressions "foo" and "bar" contain fields [a,b] and
[b,c] respectively, like: SELECT a, c FROM foo NATURAL INNER JOIN bar
Disjoint Union
This is like xor, which afaik SQL doesn't directly support, so is
emulatable like: (<tve> UNION CORRESPONDING <tve>) MINUS
CORRESPONDING (<tve> INTERSECT CORRESPONDING <tve>)
Transitive Closure
(This is for recursive queries, like WITH RECURSIVE or START WITH CONNECT BY.)
Summarize
Like: SELECT count(*) AS <new-col-name> FROM <tve>
Or: SELECT sum(<old-col-name>) AS new-col-name> FROM <tve>
Group
(Many SQL products don't support this directly, AFAIK, since it is
analagous to producing tables whose field values are themselves
tables, but it produces an intermediate part of SQL's GROUP BY
process.)
Substitute
Like an expressional version of SQL's UPDATE statement.
... etc.
> > That makes something that is a lot more Perl-like and easier for
>> programmers to understand, while people that know SQL already know
>> what the parts of a SELECT do and can easily compose the analogous
>> simpler functions.
>
>Well, maybe. I had to really struggle to understand those operators,
>myself. I still don't have much of a grasp (I am not a mathematician,
>let alone a set theorist).
Well, I used the word "inspiration" on purpose. The point is that
you can use simpler operators that are used together without those
operators being the same as any of the above (especially if some DBMS
products don't easily support some of the above in isolation from a
combination with others). The important principle is breaking down
the bigger problem into clearly defined and easier to use smaller
ones.
>See "Database in Depth" pp 86-92 for detailed descriptions of each.
>
> http://www.amazon.com/exec/obidos/ASIN/0596100124/
I will also take this moment to point out a brand new book by the
same authors, which is "The Relational Database Dictionary". This
book is more of a pocket reference type thing at 122 pages, and is
inexpensive.
See: http://www.oreilly.com/catalog/relationaldb/index.html
I've already ordered a copy for myself, today.
>Yes, but the syntax in pure Perl is hard; see my recent post on p5p
>for details.
That depends on how you do it. There are various Perl idioms that can be used.
If you want to generate SQL or other non-Perl things from it, then
you could make the input more AST-like in appearance.
If you want it to execute in Perl, you could replace some things with
anonymous subroutines; eg, a WHERE clause is essentially like Perl's
"grep" in structure, so you could take an anonymous sub ref that is
invoked for each row, given that row as its argument, and it returns
true or false based on whether the row matches its criteria.
Note that my Set::Relation module will be taking the latter approach,
while Rosetta will be taking the former approach.
If you don't want either of those approaches, then I don't know
enough about Perl to do what you want, at least not without more
prompting or help.
-- Darren Duncan
More information about the Dbix-class
mailing list