[Dbix-class] SQLA refactor: proposal
Darren Duncan
darren at DarrenDuncan.net
Sat Nov 24 01:09:53 GMT 2007
Two main things for this email, separated by the dashed lines.
---------
I'll now briefly introduce the Muldis D analogues of a few more SQL
expressions, all related to row grouping in some way (and remember
that Muldis D supports rowset field values that are themselves
rowsets).
Sorry, no code examples this time, just descriptions.
1. When you want to use aggregate operators like sum/avg/min/max/etc
on an entire rowset such that you want to produce exactly 1 result
row, then that's what the sys.Core.Relation.summarize function is
for. What 'summarize' does is create a new row (sort of like
'extend' does), where each named field is the result of some N-ary
function that uses one column of a source rowset as input.
FYI, Muldis D currently provides all of these N-ary (0..N operand)
functions, any of which can be used in a 'summarize':
sys.Core.Ordered.min of Some.Ordered (Set{Some.Ordered} $topic)
sys.Core.Ordered.max of Some.Ordered (Set{Some.Ordered} $topic)
sys.Core.Ordered.maybe_min of Maybe{Some.Ordered}
(Set{Some.Ordered} $topic)
sys.Core.Ordered.maybe_max of Maybe{Some.Ordered}
(Set{Some.Ordered} $topic)
sys.Core.Bool.and of Bool (Set{Bool} $topic)
sys.Core.Bool.or of Bool (Set{Bool} $topic)
sys.Core.Bool.xor of Bool (Bag{Bool} $topic)
sys.Core.Int.sum of Int (Bag{Int} $addends)
sys.Core.Int.product of Int (Bag{Int} $factors)
sys.Core.Int.range of Int (Set{Int} $topic)
sys.Core.Int.median of Set{Int} (Bag{Int} $topic)
sys.Core.Int.mode of Set{Int} (Bag{Int} $topic)
sys.Core.Rat.sum of Rat (Bag{Rat} $addends)
sys.Core.Rat.product of Rat (Bag{Rat} $factors)
sys.Core.Rat.range of Rat (Set{Rat} $topic)
sys.Core.Rat.mean of Rat (Bag{Rat} $topic)
sys.Core.Rat.maybe_mean of Maybe{Rat} (Bag{Rat} $topic)
sys.Core.Rat.median of Set{Rat} (Bag{Rat} $topic)
sys.Core.Rat.mean_of_median of Rat (Bag{Rat} $topic)
sys.Core.Rat.mode of Set{Rat} (Bag{Rat} $topic)
sys.Core.Blob.catenate of Blob (Seq{Blob} $topic)
sys.Core.Blob.and of Blob (Set{Blob} $topic)
sys.Core.Blob.or of Blob (Set{Blob} $topic)
sys.Core.Blob.xor of Blob (Bag{Blob} $topic)
sys.Core.Text.catenate of Text (Seq{Text} $topic)
sys.Core.Tuple.product of Tuple (QuasiSet{Tuple} $topic)
sys.Core.Relation.union of Relation (Set{Relation} $topic)
sys.Core.Relation.exclusion of Relation (Bag{Relation} $topic)
sys.Core.Relation.intersection of Relation (Set{Relation} $topic)
sys.Core.Relation.join of Relation (QuasiSet{Relation} $topic)
sys.Core.Relation.product of Relation (QuasiSet{Relation} $topic)
That said, the 2 .catenate are a bit more complicated to summarize,
needing more or different arguments than a typical unordered rowset,
due to the order of their arguments being significant.
2. When you want to do a "GROUP BY" such that you produce M rows
from N rows, which is the first main step of doing a sub-summary of a
rowset, then that's what the sys.Core.Relation.group function is for.
Specifically what 'group' does is take a rowset of N rows and divides
them into M groups where all the rows in a group have the same
value/s in a certain subset of their columns (those are the columns
you are "grouping by" or "grouping per"). It then produces a new
rowset of M rows which has a new/replacement column where each field
value is itself a rowset which is the rows of one of the M groups.
In Perl parlance, it is like turning an array of foo into an array of
arrays of foo. But the columns that were grouped by, and whose
values would be common to all rows in the inner rowsets, remain in
just the main rowset, with 1 copy per main row. The function
sys.Core.Relation.ungroup is the inverse function of group, and would
result in what 'group' started with given its output.
3. Following the last point, if you want to do a SQL GROUP BY in
combination with the use of aggregate operators, you would generally
first perform a 'group' on the rowset, then a 'summarize' on each of
the inner rowsets, where you then end up with M rows, some values
being normal ones that were grouped by, and others being the results
of aggregate functions on the normal values not grouped by.
4. It doesn't exist yet, but I anticipate adding a
'group_then_summarize' or some such function which combines the
otherwise 2 steps of #3 into one system function call, so you then
have the most direct analogy to SQL's group-by-and-summarize.
5. The kind of output you would get from a 'group' by itself, namely
a rowset of rowsets, is also directly useful in the common scenarios
where you want to return a bunch of parent rows where each has
multiple child rows, all in one database query that doesn't have
parent column redundancies.
6. You could also use rowsets of rowsets in the process of
performing a so-called "outer join", but that for the main rows,
their associated child rowsets would have zero rows when there are no
corresponding child rows, and they have at least or exactly one row
each when there are corresponding children. This could also be the
actual result of an outer-join, or alternately the query could remove
the inner rowsets and, where they are empty, place some default or
generated normal value there instead. Note that *keeping* an empty
inner rowset (sometimes called a "Maybe") is Muldis D's analogy to a
SQL Null for missing/unknown/not-applic data, but that it retains
2-valued-logic; any "default values" would not be Null, but some real
value of the column type, such as zero or empty string or
what-have-you.
---------
At 2:01 AM -0800 11/21/07, Darren Duncan wrote:
>At 2:23 PM -0800 11/20/07, Darren Duncan wrote:
>>At 8:14 PM +0000 11/20/07, Matt S Trout wrote:
>>>There are a few SQL-y operators such as LIKE or BETWEEN that don't seem to
>>>be in there, but I don't see why they can't be in sqla.Core.Str.Between
>>>etc.
>>
>>As for BETWEEN, there is currently no Muldis D operator
>>specifically for that, though one could be added easily enough;
>>meanwhile, the likes of "x >= 4 and x <= 7" works, as does "x IN
>>(4,5,6,7)", assuming integers; the latter form is semijoin().
>
>As an update, for our convenience, I just added a pair of generic
>range-checking operators to the Muldis D spec in Core.pod;
<snip>
>>As for LIKE, the common case where you just want to see if
>>something is a substring of something else is covered by
>>sys.Core.Text.contains(); this has the flexability to specify
>>at-start/at-end/may-be-middle. But the less common cases covered
>>by a generic LIKE aren't in Muldis D yet.
>
>To explain this further, Muldis D currently supports these uses of
>SQL's LIKE in contains():
<snip>
As a further followup ... and someone can tell me if this level of
changelogging is too much for the DBIx-Class list ...
Language::MuldisD v0.12.0 is now released on CPAN, and mainly that
consisted of a bunch of updates to
http://search.cpan.org/dist/Language-MuldisD/lib/Language/MuldisD/Core.pod
, both previously reported and new:
1. Added the 2 operators sys.Core.Ordered.is_(in|out)side_range ,
which correspond to SQL's BETWEEN / NOT BETWEEN.
2. Renamed the previously mentioned .contains to .is_substr, and
added a .is_not_substr ; these correspond to simpler versions of
SQL's LIKE / NOT LIKE.
3. Made a large overhaul to the rational numeric operators with
concern to specifying rounding operations, so doing rounding is now
both simpler and more elegant / generic. Eg, just 1 general .round
function with a "rational rounding rule" parameter, rather than 5
special-purpose round functions.
4. A few other updates.
I'm just bringing these updates to your attention now in case you
were looking through the Muldis D spec now following Matt's request,
and noticed some language deficiencies in the old version that these
now address.
----------
-- Darren Duncan
More information about the DBIx-Class
mailing list