[Dbix-class] SQLA refactor: proposal

Sat Nov 24 01:09:53 GMT 2007

Two main things for this email, separated by the dashed lines.

---------

I'll now briefly introduce the Muldis D analogues of a few more SQL 
expressions, all related to row grouping in some way (and remember 
that Muldis D supports rowset field values that are themselves 
rowsets).

Sorry, no code examples this time, just descriptions.

1.  When you want to use aggregate operators like sum/avg/min/max/etc 
on an entire rowset such that you want to produce exactly 1 result 
row, then that's what the sys.Core.Relation.summarize function is 
for.  What 'summarize' does is create a new row (sort of like 
'extend' does), where each named field is the result of some N-ary 
function that uses one column of a source rowset as input.

FYI, Muldis D currently provides all of these N-ary (0..N operand) 
functions, any of which can be used in a 'summarize':

     sys.Core.Ordered.min of Some.Ordered (Set{Some.Ordered} $topic)
     sys.Core.Ordered.max of Some.Ordered (Set{Some.Ordered} $topic)
     sys.Core.Ordered.maybe_min of Maybe{Some.Ordered} 
(Set{Some.Ordered} $topic)
     sys.Core.Ordered.maybe_max of Maybe{Some.Ordered} 
(Set{Some.Ordered} $topic)
     sys.Core.Bool.and of Bool (Set{Bool} $topic)
     sys.Core.Bool.or of Bool (Set{Bool} $topic)
     sys.Core.Bool.xor of Bool (Bag{Bool} $topic)
     sys.Core.Int.sum of Int (Bag{Int} $addends)
     sys.Core.Int.product of Int (Bag{Int} $factors)
     sys.Core.Int.range of Int (Set{Int} $topic)
     sys.Core.Int.median of Set{Int} (Bag{Int} $topic)
     sys.Core.Int.mode of Set{Int} (Bag{Int} $topic)
     sys.Core.Rat.sum of Rat (Bag{Rat} $addends)
     sys.Core.Rat.product of Rat (Bag{Rat} $factors)
     sys.Core.Rat.range of Rat (Set{Rat} $topic)
     sys.Core.Rat.mean of Rat (Bag{Rat} $topic)
     sys.Core.Rat.maybe_mean of Maybe{Rat} (Bag{Rat} $topic)
     sys.Core.Rat.median of Set{Rat} (Bag{Rat} $topic)
     sys.Core.Rat.mean_of_median of Rat (Bag{Rat} $topic)
     sys.Core.Rat.mode of Set{Rat} (Bag{Rat} $topic)
     sys.Core.Blob.catenate of Blob (Seq{Blob} $topic)
     sys.Core.Blob.and of Blob (Set{Blob} $topic)
     sys.Core.Blob.or of Blob (Set{Blob} $topic)
     sys.Core.Blob.xor of Blob (Bag{Blob} $topic)
     sys.Core.Text.catenate of Text (Seq{Text} $topic)
     sys.Core.Tuple.product of Tuple (QuasiSet{Tuple} $topic)
     sys.Core.Relation.union of Relation (Set{Relation} $topic)
     sys.Core.Relation.exclusion of Relation (Bag{Relation} $topic)
     sys.Core.Relation.intersection of Relation (Set{Relation} $topic)
     sys.Core.Relation.join of Relation (QuasiSet{Relation} $topic)
     sys.Core.Relation.product of Relation (QuasiSet{Relation} $topic)

That said, the 2 .catenate are a bit more complicated to summarize, 
needing more or different arguments than a typical unordered rowset, 
due to the order of their arguments being significant.

2.  When you want to do a "GROUP BY" such that you produce M rows 
from N rows, which is the first main step of doing a sub-summary of a 
rowset, then that's what the sys.Core.Relation.group function is for. 
Specifically what 'group' does is take a rowset of N rows and divides 
them into M groups where all the rows in a group have the same 
value/s in a certain subset of their columns (those are the columns 
you are "grouping by" or "grouping per").  It then produces a new 
rowset of M rows which has a new/replacement column where each field 
value is itself a rowset which is the rows of one of the M groups. 
In Perl parlance, it is like turning an array of foo into an array of 
arrays of foo.  But the columns that were grouped by, and whose 
values would be common to all rows in the inner rowsets, remain in 
just the main rowset, with 1 copy per main row.  The function 
sys.Core.Relation.ungroup is the inverse function of group, and would 
result in what 'group' started with given its output.

3.  Following the last point, if you want to do a SQL GROUP BY in 
combination with the use of aggregate operators, you would generally 
first perform a 'group' on the rowset, then a 'summarize' on each of 
the inner rowsets, where you then end up with M rows, some values 
being normal ones that were grouped by, and others being the results 
of aggregate functions on the normal values not grouped by.

4.  It doesn't exist yet, but I anticipate adding a 
'group_then_summarize' or some such function which combines the 
otherwise 2 steps of #3 into one system function call, so you then 
have the most direct analogy to SQL's group-by-and-summarize.

5.  The kind of output you would get from a 'group' by itself, namely 
a rowset of rowsets, is also directly useful in the common scenarios 
where you want to return a bunch of parent rows where each has 
multiple child rows, all in one database query that doesn't have 
parent column redundancies.

6.  You could also use rowsets of rowsets in the process of 
performing a so-called "outer join", but that for the main rows, 
their associated child rowsets would have zero rows when there are no 
corresponding child rows, and they have at least or exactly one row 
each when there are corresponding children.  This could also be the 
actual result of an outer-join, or alternately the query could remove 
the inner rowsets and, where they are empty, place some default or 
generated normal value there instead.  Note that *keeping* an empty 
inner rowset (sometimes called a "Maybe") is Muldis D's analogy to a 
SQL Null for missing/unknown/not-applic data, but that it retains 
2-valued-logic; any "default values" would not be Null, but some real 
value of the column type, such as zero or empty string or 
what-have-you.

---------

At 2:01 AM -0800 11/21/07, Darren Duncan wrote:
>At 2:23 PM -0800 11/20/07, Darren Duncan wrote:
>>At 8:14 PM +0000 11/20/07, Matt S Trout wrote:
>>>There are a few SQL-y operators such as LIKE or BETWEEN that don't seem to
>>>be in there, but I don't see why they can't be in sqla.Core.Str.Between
>>>etc.
>>
>>As for BETWEEN, there is currently no Muldis D operator 
>>specifically for that, though one could be added easily enough; 
>>meanwhile, the likes of "x >= 4 and x <= 7" works, as does "x IN 
>>(4,5,6,7)", assuming integers; the latter form is semijoin().
>
>As an update, for our convenience, I just added a pair of generic 
>range-checking operators to the Muldis D spec in Core.pod;
<snip>
>>As for LIKE, the common case where you just want to see if 
>>something is a substring of something else is covered by 
>>sys.Core.Text.contains(); this has the flexability to specify 
>>at-start/at-end/may-be-middle.  But the less common cases covered 
>>by a generic LIKE aren't in Muldis D yet.
>
>To explain this further, Muldis D currently supports these uses of 
>SQL's LIKE in contains():
<snip>

As a further followup ... and someone can tell me if this level of 
changelogging is too much for the DBIx-Class list ...

Language::MuldisD v0.12.0 is now released on CPAN, and mainly that 
consisted of a bunch of updates to 
http://search.cpan.org/dist/Language-MuldisD/lib/Language/MuldisD/Core.pod 
, both previously reported and new:

1.  Added the 2 operators sys.Core.Ordered.is_(in|out)side_range , 
which correspond to SQL's BETWEEN / NOT BETWEEN.

2.  Renamed the previously mentioned .contains to .is_substr, and 
added a .is_not_substr ; these correspond to simpler versions of 
SQL's LIKE / NOT LIKE.

3.  Made a large overhaul to the rational numeric operators with 
concern to specifying rounding operations, so doing rounding is now 
both simpler and more elegant / generic.  Eg, just 1 general .round 
function with a "rational rounding rule" parameter, rather than 5 
special-purpose round functions.

4.  A few other updates.

I'm just bringing these updates to your attention now in case you 
were looking through the Muldis D spec now following Matt's request, 
and noticed some language deficiencies in the old version that these 
now address.

----------

-- Darren Duncan