[Dbix-class] rfc - how best to do design sorting, collations, etc
Darren Duncan
darren at DarrenDuncan.net
Wed Feb 20 04:38:05 GMT 2008
Hello,
In my work on designing the Muldis D language, one of the biggest
unresolved design problems I'm having is working out features that
involve sorting a set of values for various reasons, including
implementing ordered output of database queries, or implementing
'<'|'>' operators, or min|max|between operators, or implementing
quota queries or windowed queries.
I'm looking for some input on how I might best proceed with getting
these working.
The design needs to have semantics explicitly designed enough that
the features provide the range of actual features or behaviours that
people want to use with a database, that are highly deterministic
while being highly portable, so its easy to predict what any request
would give you and have that be the same in any implementation, and
that is easy to translate both ways semantics intact between Muldis D
and various SQL dialects and various normal programming languages
like Perl et al, and that is easy to use.
In various SQL dialects, it is common practice that when you want to
sort a rowset deterministically, you use an ORDER BY clause that
lists which columns (stored or computed) of the rowset whose values
you are sorting the rows on, and their order of precedence when some
column values compare as equal and others inequal. This has the
advantage of being very terse and generally polymorphic but it
requires that the type of the column's values has some built-in
sorting method to automatically use.
In Perl in the generic case, you sort a list by saying eg "sort {
<binary-order-compare-expr> } @rowset", and the expression would
explicitly invoke whatever behaviour-specific operators you want;
this gives you the most control, but it is more verbose and
potentially less polymorphic.
AFAIK, most generic languages take the Perl approach, where a generic
higher-order sort function takes a binary comparison function as an
argument which determines order of 2 arbitrary list items.
Up to now in Muldis D, I have tried to setup an environment more like
SQL's in that if a data type is marked as being Ordered and defines a
certain fundamental compare operator, then values of that type can be
used in generic sorting or order-compare or quota context without
said using code having to code differently depending on what the data
type is, as per SQL's ORDER BY.
I've had some problems so far in conceiving how to get all that to
work, some of which are related to certain other language design
issues which could potentially be changed, and others which I will
discuss here next.
One main issue is that so-called ordered types may have more than one
desired linear order of values depending on the context, and so the
desired algorithm would have to be specified when asking to sort
values of this type, in order to provide feature flexibility.
For a common example, see text collations; depending on the user,
base characters with particular accents may sort above or below or
beside the same characters with other accents or no accents. (Note
that Muldis D only has a single built-in character repertoire, which
is latest-Unicode, though one can define subtypes of it that just
allow a subset of those characters. Its character strings are also
encoding and normal-form agnostic.)
Now, design and implementation-wise I'm inclined to think that it
would be easiest to adopt the approach typical in non-SQL languages
for Muldis D's order-sensitive operators, where there is explicitly a
different-named one for each data type and you invoke that explicitly
when you want to do a sort or compare or what have you. And of
course, users can define their own operators and have them invoked
here. I see this as a simpler and more flexible design, letting
users say exactly what they want and get it.
But in the general case we don't get the terseness of SQL's "ORDER BY
foo ASC, bar DESC, baz ASC"; instead we say something along this
level of verbosity: "Seq.sort( 'what' => $myrowset, 'how' => [['foo',
'asc', 'Int.cmp'], ['bar', 'desc', 'Text.cmp', 'french'], ['baz',
'asc', 'Date.cmp']] )". Or alternately the user can define their own
MyPkg.foobarbazsort() function and then at use time they simply say
"MyPkg.foobarbazsort( 'what' => $myrowset )".
So one main question for the moment, does it seem okay for you as
potential users that an Ordered role would be eliminated, and that
each applicable type would have their own ones of these operators
rather than sharing same-spelling generic ones: '<', '>', between,
min, max, sort, etc (or alternately just '<' or 'compare' and all the
others become generic higher-order functions taking the prior ones as
arguments?); and that you would invoke each of these directly where
applicable?
Or generally speaking, are there any other advice about dealing with
these matters like collations and such in language design? I'm
looking for something easy and extensible. What do your projects
already do about these?
Barring feedback, I'll probably try eliminating the Ordered role and
do what I said above, and just see how that works.
Thank you. -- Darren Duncan
More information about the DBIx-Class
mailing list