[Dbix-class] SQLA refactor: proposal

Tue Nov 20 22:23:44 GMT 2007

This is an initial short reply, and I'll add more over time and/or in 
response to the comments/questions of others.

My first general comment is that probably anything which can be 
expressed as a pure functional expression, which generally is any SQL 
SELECT, is probably the least complicated to translate, because it 
isn't modifying database state.

At 8:14 PM +0000 11/20/07, Matt S Trout wrote:
>Far, far too much of the code is currently in _recurse_where, and the DWIM
>AST provides quite a few ways to say the same thing. From a few hours'
>thought, I think dduncan's
>
>http://search.cpan.org/~duncand/Language-MuldisD-0.11.0/lib/Language/MuldisD/Core.pod
>
>provides most of what we'll need - certainly the SELECT list, simple FROM
>and WHERE stuff is covered.

To help out with understanding this stuff, I'll list a few specific 
Muldis D operators that correspond to different parts of a SQL 
SELECT; the syntax is representative rather than exact:

----

   To return a rowset of all rows in a table.

   SQL:
     SELECT * FROM tab1

   Muldis D:
     tab1

----

   To return a scalar count of rows in a table.

   SQL (as a scalar query):
     (SELECT COUNT(*) FROM tab1)

   Muldis D:
     sys.Core.Relation.cardinality( topic => tab1 )

----

   To return a rowset having just some columns from all rows of a table.

   SQL:
     SELECT DISTINCT col1, col2
     FROM tab1

   Muldis D:
     sys.Core.Relation.project(
       topic => tab1,
       attrs => [col1, col2],
     )

----

   To return a rowset having all cols of just some rows of a table, 
where certain columns match desired values.

   SQL:
     SELECT *
     FROM tab1
     WHERE col1 = 'hello' AND col2 = 5
        OR col1 = 'world' AND col2 = 7
   or:
     SELECT *
     FROM tab1
     WHERE (col1, col2) IN (
       SELECT 'hello' AS col1, 5 AS col2
       UNION
       SELECT 'world' AS col1, 7 AS col2
     )

   Muldis D:
     sys.Core.Relation.semijoin(
       source => tab1,
       filter => [
         { col1 => 'hello', col2 => 5 },
         { col1 => 'world', col2 => 7 },
       ],
     )

   Alternately you can use sys.Core.Relation.semidifference if you 
want rows that *don't* match aka NOT IN.

   Alternately you can use sys.Core.Relation.restrict for generic 
boolean expressions besides matches/not-matches, aka Perl's "grep".

----

   To do a natural inner join on 2 tables, say tab1(col1,col2,col3), 
tab2(col2,col3,col4), such that the result is 
tab(col1,col2,col3,col4).

   SQL:
     SELECT DISTINCT tab1.*, tab2.col4
     FROM tab1 NATURAL INNER JOIN tab2
   or:
     SELECT DISTINCT tab1.*, tab2.col4
     FROM tab1
       INNER JOIN tab2 USING (col2, col3)
   or:
     SELECT DISTINCT tab1.*, tab2.col4
     FROM tab1
       INNER JOIN tab2 ON tab2.col2 = tab1.col2 AND tab2.col3 = tab1.col3

   Muldis D:
     sys.Core.Relation.join(
       topic => [tab1, tab2],
     )

----

   To select rows but with some cols renamed:

   SQL:
     SELECT col1 AS foo, col2, col3 AS bar
     FROM tab1

   Muldis D:
     sys.Core.Relation.rename(
       topic => tab1,
       map => { col1 => foo, col3 => bar },
     )

----

   To select rows with extra columns:

   SQL:
     SELECT tab1.*, 'x' AS a, (tab1.m - tab1.n) AS b FROM tab1

   Muldis D
     sys.Core.Relation.extend(
       topic => tab1,
       func => func:{ result {
         a => 'x',
         b => sys.Core.Int.difference(
           minuend => :topic.m,
           subtrahend => :topic.n
         ),
       } },
       assuming => {},
     )

   Note that 'extend' works sort of like Perl's "map" operator; 
"topic" is the inner function's argument corresponding to "$_", which 
is each input row, and "assuming" carries any other arguments that 
might be needed.

----

   To just select a constant scalar value:

   SQL:
     (SELECT 'x')
   or:
     (SELECT 'x' FROM dual)

   Muldis D:
     'x'

----

More examples are available on demand.

Note that all Muldis D examples above can be nested arbitrarily (any 
"tab1" et al can be a subquery rather than a table variable name), so 
eg combine a restrict and project in either nesting order to pick 
columns and rows both; this extends naturally to both what SQL calls 
subqueries or derived tables, and to typical scalar value expressions.

I have not yet demonstrated group-by etc, but they are supported too 
by way of operators like sys.Core.Relation.summarize() and 
sys.Core.Relation.group().

I also have not yet demonstrated union/intersect/minus, but they are 
all represented by sys.Core.Relation.union/intersection/difference().

I have not yet demonstrated hierarchical queries, but they are 
represented by sys.Core.Relation.tclose().

As for outer-joins, they have yet no specific operator for them, but 
you can achieve the desired effect using your choice of group() or 
extend() and union() etc; in the near future I can add operators 
specific to outer-joins which essentially are short-hand for another 
combination.

As for joins that want to work on matches between columns with 
different names (or avoid matches on columns with the same names), 
just invoke rename() before join() to get the desired effect; this 
includes to make self-joins work as desired.

As for invoking user-defined functions, you do it the same way as 
invoking any sys.Core.whatever function, except it is spelled eg 
dep.myschema.myfunction().

As for SQL's ORDER and LIMIT clauses, I haven't put analogies of 
those in Muldis D yet, but they are pending.

Note that all Muldis D relational operators are set oriented, and 
have an implicit SQL DISTINCT clause / GROUP BY all rowset columns. 
If you want to express a count of things, use an extra count-value 
column rather than storing duplicate rows.  Or add some 
distinguishing column (eg, a date stamp) to make otherwise duplicate 
rows non-duplicated.

>There are a few SQL-y operators such as LIKE or BETWEEN that don't seem to
>be in there, but I don't see why they can't be in sqla.Core.Str.Between
>etc.

If you want to extend the system-defined operators, better namespaces 
are the likes of imp.sqla.Str.Between, or sys.sqla.Str.Between, and 
the first is preferable.  Also, Muldis D differentiates character and 
binary strings with Text and Blob respectively, so I recommend using 
those rather than Str as names.

Note that Muldis D has distinct function and procedure routines, 
unlike Perl 5, and functions can't have side-effects.  Between() 
would be a function, as would the main body of any Select(), but 
insert/update/delete would be a procedure.

As for LIKE, the common case where you just want to see if something 
is a substring of something else is covered by 
sys.Core.Text.contains(); this has the flexability to specify 
at-start/at-end/may-be-middle.  But the less common cases covered by 
a generic LIKE aren't in Muldis D yet.

As for BETWEEN, there is currently no Muldis D operator specifically 
for that, though one could be added easily enough; meanwhile, the 
likes of "x >= 4 and x <= 7" works, as does "x IN (4,5,6,7)", 
assuming integers; the latter form is semijoin().

>So I think the first stage is to create an SQL::Abstract::Parser that
>parses SQLA syntax into MuldisD (+extra sqla. definitions) and an
>SQL::Abstract::Producer::SQL that converts from MuldisD to SQL, and then
>make core SQL::Abstract use that and pass all tests.
>
>Thoughts?

Superficially, that sounds fine with me.  I'll wait for further 
comments on this to address them directly.

-- Darren Duncan