[Dbix-class] rfc - how best to do design sorting, collations, etc

Wed Feb 20 04:38:05 GMT 2008

Hello,

In my work on designing the Muldis D language, one of the biggest 
unresolved design problems I'm having is working out features that 
involve sorting a set of values for various reasons, including 
implementing ordered output of database queries, or implementing 
'<'|'>' operators, or min|max|between operators, or implementing 
quota queries or windowed queries.

I'm looking for some input on how I might best proceed with getting 
these working.

The design needs to have semantics explicitly designed enough that 
the features provide the range of actual features or behaviours that 
people want to use with a database, that are highly deterministic 
while being highly portable, so its easy to predict what any request 
would give you and have that be the same in any implementation, and 
that is easy to translate both ways semantics intact between Muldis D 
and various SQL dialects and various normal programming languages 
like Perl et al, and that is easy to use.

In various SQL dialects, it is common practice that when you want to 
sort a rowset deterministically, you use an ORDER BY clause that 
lists which columns (stored or computed) of the rowset whose values 
you are sorting the rows on, and their order of precedence when some 
column values compare as equal and others inequal.  This has the 
advantage of being very terse and generally polymorphic but it 
requires that the type of the column's values has some built-in 
sorting method to automatically use.

In Perl in the generic case, you sort a list by saying eg "sort { 
<binary-order-compare-expr> } @rowset", and the expression would 
explicitly invoke whatever behaviour-specific operators you want; 
this gives you the most control, but it is more verbose and 
potentially less polymorphic.

AFAIK, most generic languages take the Perl approach, where a generic 
higher-order sort function takes a binary comparison function as an 
argument which determines order of 2 arbitrary list items.

Up to now in Muldis D, I have tried to setup an environment more like 
SQL's in that if a data type is marked as being Ordered and defines a 
certain fundamental compare operator, then values of that type can be 
used in generic sorting or order-compare or quota context without 
said using code having to code differently depending on what the data 
type is, as per SQL's ORDER BY.

I've had some problems so far in conceiving how to get all that to 
work, some of which are related to certain other language design 
issues which could potentially be changed, and others which I will 
discuss here next.

One main issue is that so-called ordered types may have more than one 
desired linear order of values depending on the context, and so the 
desired algorithm would have to be specified when asking to sort 
values of this type, in order to provide feature flexibility.

For a common example, see text collations; depending on the user, 
base characters with particular accents may sort above or below or 
beside the same characters with other accents or no accents.  (Note 
that Muldis D only has a single built-in character repertoire, which 
is latest-Unicode, though one can define subtypes of it that just 
allow a subset of those characters.  Its character strings are also 
encoding and normal-form agnostic.)

Now, design and implementation-wise I'm inclined to think that it 
would be easiest to adopt the approach typical in non-SQL languages 
for Muldis D's order-sensitive operators, where there is explicitly a 
different-named one for each data type and you invoke that explicitly 
when you want to do a sort or compare or what have you.  And of 
course, users can define their own operators and have them invoked 
here.  I see this as a simpler and more flexible design, letting 
users say exactly what they want and get it.

But in the general case we don't get the terseness of SQL's "ORDER BY 
foo ASC, bar DESC, baz ASC"; instead we say something along this 
level of verbosity: "Seq.sort( 'what' => $myrowset, 'how' => [['foo', 
'asc', 'Int.cmp'], ['bar', 'desc', 'Text.cmp', 'french'], ['baz', 
'asc', 'Date.cmp']] )".  Or alternately the user can define their own 
MyPkg.foobarbazsort() function and then at use time they simply say 
"MyPkg.foobarbazsort( 'what' => $myrowset )".

So one main question for the moment, does it seem okay for you as 
potential users that an Ordered role would be eliminated, and that 
each applicable type would have their own ones of these operators 
rather than sharing same-spelling generic ones: '<', '>', between, 
min, max, sort, etc (or alternately just '<' or 'compare' and all the 
others become generic higher-order functions taking the prior ones as 
arguments?); and that you would invoke each of these directly where 
applicable?

Or generally speaking, are there any other advice about dealing with 
these matters like collations and such in language design?  I'm 
looking for something easy and extensible.  What do your projects 
already do about these?

Barring feedback, I'll probably try eliminating the Ordered role and 
do what I said above, and just see how that works.

Thank you. -- Darren Duncan