[Dbix-class] rfc, treating NULL as a value like any other

Sun Jan 27 01:28:00 GMT 2008

Hello,

This email is primarily concerned with the possible future internals 
of DBIx-Class with respect to an intermediate AST based on Muldis D 
that you can convert to/from SQL, but this email is secondarily 
concerned with a possible future enhancement to the public interface 
of DBIx-Class.

In a nutshell, one issue that often trips up new (and experienced) 
SQL users is its 3-valued logic such that you can't just say "foo = 
NULL" to match null values, requiring "IS NULL" instead, or that both 
"foo = 3" and "foo <> 3" will not match foo that are NULL.  And AFAIK 
in some situations SQL treats 2 NULL as matching and other times 
non-matching.

My question for you is whether, from a typical or new or experienced 
user's point of view, it would be preferable for the DBMS to treat 
NULL like an actual single value, which is equal to itself while not 
equal to every other normal value that is not NULL?  Consequences of 
this include that "foo = NULL" will DWIM (result in true when foo is 
NULL), and "foo <> 3" will be true when foo is NULL (foo is anything 
but 3).

(Other consequences are that null-allowing columns with a 
single-column unique key constraint will allow a NULL value to appear 
exactly once in the column.  In a naive case, a foreign key 
relationship would then match a NULL in one table with one in 
another, or require one in the parent.  Though if this isn't desired, 
then the foreign key constraint could be defined to only apply to 
child table values that do not equal NULL.  Both of which I think SQL 
already does and that doesn't necessarily need to change.)

Regardless of whether any updates to the DBIx-Class public interface 
are made, I believe it is preferrable to think in terms of NULL = 
NULL and structure the AST along those principles, so you get more 
AST simplicity or DWIM when you want that to be the case, while you 
may or may not get more AST complexity when you don't want that to be 
the case.  Or regardless of whether that happens, the AST is simpler 
when dealing with not-nullable data, and marginally more complex when 
dealing with nullable data.

The latest release 0.18.0 of the Muldis D spec, 
http://search.cpan.org/dist/Language-MuldisD/ , clarifies that any 2 
table ("relation") values are identical when they contain zero rows 
("tuples") and they have the same column ("attribute") names, 
regardless of the declared types of those columns.  Also, new 
subtypes Nothing,Just of Maybe were added.  Now, Muldis D is 
optimized towards not-nullable data, so in SQL terms, if you simply 
say "foo Int" as a field type, that is implicitly not nullable, and 
you have to explicitly say nullable to make it so, as "foo 
Maybe{Int}".  Defining a field nullable is actually defining the 
field to contain a table value whose sole column name is 'value' and 
which may contain only zero rows or 1 row; if it contains zero, it is 
null, if it contains 1 row, namely the normal value, it is not null. 
You could say that not-nullable means unboxed while nullable means 
boxed.  But those are details that users typically don't have to know 
about.  The single Nothing value matches the empty case, and Just 
constructs a full case.  Testing a not nullable field is simple as 
"foo = 3" or "foo <> 3", while testing a nullable is "foo = Nothing" 
(SQL "foo IS NULL") "foo = Just(3)" (SQL "foo = 3") or "foo <> 
Just(3)" (SQL "foo <> 3 OR foo IS NULL") "foo <> Just(3) and foo <> 
Nothing" (SQL "foo <> 3").  In this system, "Nothing = Nothing" is 
true, "Nothing = 3" and "Nothing = Just(3)" are both false, "Nothing 
<> 3" and "Nothing <> Just(3)" are both true, "3 = Just(3)" is false, 
"Just(3) = Just(3)" is true, "3 = Just(3).attr" is true.  To do math 
with not-nullable, you just say eg "row.x / row.y", with nullable but 
known not null you say "row.x.attr / row.y.attr".

So, assuming it is clearly documented, do you forsee any problems in 
using an AST that is 2-valued-logic and assumes NULL = NULL, where 
you have to explicitly say when you don't want that behaviour?  It 
should still be possible to generate SQL with the desired semantics, 
so this is more a useability question.

Thank you. -- Darren Duncan