[Dbix-class] DBIx::Class field metadata and validation

Dan Kubb dan.kubb at autopilotmarketing.com
Fri Jul 29 22:30:12 CEST 2005


I've got an idea on how DBIx::Class could handle field metadata
and validation and I wanted to get some input on it -- I'm not
really sure what the best approach is, but I wanted to spark some
discussion anyway.

Some of the frustrations I have with Class::DBI seem to stem from
the fact that a field's metadata isn't stored in an organized way.
For example, there are hacks like accessor_name() that map the
field name onto a database column name.

At the same time there's no easy way for me to attach other
metadata to a specific field, like specifying a default value,
or the data type, or some other constraints.

To try and solve this I've hacked up some basic code that's not
quite ready yet for posting to the list, but I wanted to outline
my approach to handling field metadata anyway. I'd love to get
some feedback before I get too deeply into this:

   - Each field is put in its own unique class with the naming
     convention: $table_class::$field_name, so if I have a table
     called "contact" specified in the class Local::Model::Contact,
     the name_first field would be defined in the class
     Local::Model::Contact::name_first

   - Field classes can inherit from one or more classes that
     define the metadata and validation rules for a specific type
     of data.  For example, there might be a Type class for strings,
     which provides properties like length_min, length_max, format
     (for a regex the column values must match), etc.

   - The Type classes use Class::Data::Inheritable to specify the
     properties they provide.

   - A Field class is defined like so:

       package Local::Model::Contact::name_first;

       use base qw/
         DBIx::Class::Type::database_column
         DBIx::Class::Type::string
       /;

       __PACKAGE__->name('name_first');

       __PACKAGE__->length_min(1);
       __PACKAGE__->length_max(50);

       __PACKAGE__->allow_null(0);

       __PACKAGE__->table('Local::Model::Contact');
       __PACKAGE__->database_column_name('name_first');

       1;

   - Normally typing all this in would be really tedious, so I'm
     using an approach similar to Sebastian's Class::DBI::Loader
     where I iterate over all the tables in a database, and create
     the code like the above for each column.  This would be
     optional of course.

   - You're free to add further properties for things that cannot
     be inferred from the database in your model code, such as
     regex patterns that must match, or a human readable label.
     You could also specify inflate/deflate routines for the
     column here as well.

   - Each of the DBIx::Class::Type::* classes has a validate()
     method that checks the data type. The string class'
     validate() method looks like this:

       sub validate : method {
         my $self = shift;

         return
           $self->__validate_is_string,
           $self->__validate_with_length_min,
           $self->__validate_with_length_max,
           $self->__validate_with_allowed_chars,
           $self->__validate_with_disallowed_chars,
           $self->__validate_with_format,
           $self->__validate_with_encoding,
           $self->NEXT::DISTINCT::validate;
       }

     If a validation rule fails, it returns an array of hashrefs
     of errors. The hashrefs contain a name, a message, and
     possibly some data to help explain what failed the validation
     rule if it can't be found in another way.

     I toyed with the idea of throwing an exception when a rule
     fails, but I would rather have all the rules execute and know
     everything that's wrong with the data in one go.  What to do
     with the errors is punted to the caller. ;)

   - Fields don't necessarily need to be columns in a database;
     but they can have most of the same properties.

   - Foreign key fields can inherit from the primary key field
     of the table they link to to show the relationship between
     them.
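
A minimal sketch of the "run every rule, collect every error" idea from
the list above, in plain Perl. This is only an illustration: the package
name Sketch::StringField, the constructor, the single-underscore rule
names and the error messages are all made up, and the properties live in
a plain hash rather than in Class::Data::Inheritable accessors.

```perl
use strict;
use warnings;

package Sketch::StringField;   # hypothetical name, not part of DBIx::Class

# Properties are kept in a plain hash for this sketch; the approach in the
# post would use Class::Data::Inheritable accessors instead.
sub new {
    my ($class, %props) = @_;
    return bless {%props}, $class;
}

# Each rule returns an empty list on success, or one error hashref.
sub _validate_length_min {
    my ($self, $value) = @_;
    my $min = $self->{length_min};
    return unless defined($min) && length($value) < $min;
    return { name => 'length_min', message => "shorter than $min characters" };
}

sub _validate_length_max {
    my ($self, $value) = @_;
    my $max = $self->{length_max};
    return unless defined($max) && length($value) > $max;
    return { name => 'length_max', message => "longer than $max characters" };
}

# Run every rule and hand back all errors as one flat list, so the
# caller sees everything that is wrong in one go.
sub validate {
    my ($self, $value) = @_;
    return (
        $self->_validate_length_min($value),
        $self->_validate_length_max($value),
    );
}

package main;

my $field  = Sketch::StringField->new(length_min => 1, length_max => 5);
my @errors = $field->validate('far too long');
print "$_->{name}: $_->{message}\n" for @errors;
```

Because every rule is called in list context, a rule that passes simply
contributes nothing to the returned list, and an empty list means the
value is valid.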

Here are the DBIx::Class::Type::* classes I've made so far and
the properties each provides:

   base
     name                 - the method name to use to access the field
     label                - the human-readable label (for GUIs mostly)
     description          - a description (for documentation purposes)
     default              - the default value to use if undef is supplied
     allowed_values       - a list of values that are allowed
     disallowed_values    - a list of values that are not allowed
     callback             - a list of subrefs (or method names) to use in checking the value
     allow_null           - a flag that says if the field can be undefined
     read_only            - a flag that says if the value is read-only

   string
     length_min           - minimum length of the string
     length_max           - maximum length of the string
     allowed_chars        - a list of characters allowed
     disallowed_chars     - a list of characters not allowed
     format               - a regex to match
     encoding             - character encoding that must match

   numeric
     range_min            - the smallest the number can be
     range_max            - the largest the number can be
     fractional_min       - the smallest length the fractional part of the number can be
     fractional_max       - the largest length the fractional part of the number can be

   database_column
     table                - the name of the table the field belongs to
     database_column_name - the name of the column inside the table
     inflate              - the subref (or method name) to execute to inflate the value
     deflate              - the subref (or method name) to execute to deflate the value

   object
     roles                - names of the methods the object value must have
     classes              - names of the classes the object must inherit from

So far most of this is working great. Of course it really doesn't
do much on its own, but it provides a lot of information that
DBIx::Class (and other classes) can use.

The one thing I'm really not sure about is that I've made it so
each column value is a blessed scalar in its Field class. That
way the data is stored, but I can easily get at the property
values if they are needed.  I'm concerned about performance and
memory usage though, although it allows you to do nice things
like this once an object is instantiated:

   # get the maximum length for the field
   my $length_max = $obj->name_first->length_max;

Which you could use in a TT template like so:

   <input type="text" name="contact.name_first"
          value="[% contact.name_first %]"
          maxlength="[% contact.name_first.length_max %]" />

Scalar objects are pretty light-weight so I'm not sure it would
make much of a difference either way, but I do like the idea of
keeping all the properties close to the column. Using a simple
AUTOLOAD would allow pass-through to underlying object calls as
long as there wasn't a collision with the method names.
(NOTE: I haven't quite gotten to working with objects yet).

Worst case though, as long as the table knew the class names for
each column, it could use the validation methods on its own. I
could handle not making the object a blessed Scalar; but I do
like the syntax it allows you ;)
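
A minimal sketch of the blessed-scalar idea, assuming the property is a
plain class method and that stringification is handled with the core
overload pragma (the post doesn't say how the value is read back, so
that part is a guess). The package name and constructor are hypothetical.

```perl
use strict;
use warnings;

package Sketch::Contact::name_first;   # hypothetical field class

# Stringify to the wrapped value so the object behaves like a plain
# string in templates and comparisons (an assumption of this sketch).
use overload '""' => sub { ${ $_[0] } }, fallback => 1;

# The column value is stored as a blessed scalar reference.
sub new {
    my ($class, $value) = @_;
    return bless \$value, $class;
}

# The property lives on the class, not on each value, so every value
# costs only one blessed scalar.
sub length_max { 50 }

package main;

my $name = Sketch::Contact::name_first->new('Dan');
print "value: $name, max length: ", $name->length_max, "\n";
```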

Here are a few nice side effects I can think of with this system:

Default values:

We could know what the default value of a field should be without
having to create a corresponding record in the database first.

Query optimization:

If someone performs a search, where the string 'foo' is used for
an integer column, we can skip going to the database and just
return no results, since we know there can't be any.

Likewise if a character column has to match a regex like
qr/\A[A-Z]+\z/, and must be between 5 and 10 characters in length,
and the supplied value is either "BAR" or "foobar" we can also
return immediately.

Also, if a field is NOT NULL, but undef is supplied for the
value, then we can return immediately.
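
The short-circuit described here could look roughly like this. It is a
sketch only: the %column hash and the could_match() function are
hypothetical names, and the metadata is a plain hash rather than a Field
class.

```perl
use strict;
use warnings;

# Hypothetical column metadata; the keys mirror the property names
# listed earlier in this post.
my %column = (
    format     => qr/\A[A-Z]+\z/,
    length_min => 5,
    length_max => 10,
    allow_null => 0,
);

# Returns true if $value could possibly be stored in the column, so a
# false result means the search can return no rows without a query.
sub could_match {
    my ($meta, $value) = @_;
    return $meta->{allow_null} ? 1 : 0 unless defined $value;
    return 0 if length($value) < $meta->{length_min};
    return 0 if length($value) > $meta->{length_max};
    return 0 unless $value =~ $meta->{format};
    return 1;
}

for my $value ('BAR', 'foobar', 'FOOBAR') {
    my $verdict = could_match(\%column, $value)
        ? 'query the database'
        : 'skip, no rows possible';
    printf "%-6s => %s\n", $value, $verdict;
}
```

"BAR" is rejected for being too short and "foobar" for failing the
regex; only "FOOBAR" would actually reach the database.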

Database Table Creation:

If tables and columns can be described in a rich enough way, then
it should be possible to make CREATE TABLE statements based on
the descriptions of the columns in the perl code. You should be
able to just change the DSN and rerun a script to re-create
everything with a different database engine.

This will require a way of richly describing things at the table
level, but I think it is do-able.
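
A rough sketch of what generating a CREATE TABLE statement from column
descriptions might look like. The "type" key is an assumption (it is not
one of the properties listed above), create_table_sql() is a made-up
name, and a real version would need per-engine type mapping, keys and
quoting.

```perl
use strict;
use warnings;

# Hypothetical column descriptions for the "contact" table.
my @columns = (
    { name => 'id',         type => 'INTEGER', allow_null => 0 },
    { name => 'name_first', type => 'VARCHAR', length_max => 50, allow_null => 0 },
    { name => 'nickname',   type => 'VARCHAR', length_max => 20, allow_null => 1 },
);

# Build one column definition per description and join them together.
sub create_table_sql {
    my ($table, @cols) = @_;
    my @defs = map {
        my $def = "$_->{name} $_->{type}";
        $def .= "($_->{length_max})" if defined $_->{length_max};
        $def .= ' NOT NULL' unless $_->{allow_null};
        "  $def";
    } @cols;
    return "CREATE TABLE $table (\n" . join(",\n", @defs) . "\n)";
}

print create_table_sql('contact', @columns), "\n";
```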

Simplified code:

I'm still getting my head wrapped around DBIx::Class, but I'm
pretty sure that when (if) I refactor DBIx::Class to use this
module some things will become simpler in the code.

Filters:

Once there is a way of handling things at the column level, it
should be fairly easy to make something that pipes the data
through a chain of filters that can remove leading/trailing
whitespace, or properly case words.
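
A filter chain along these lines could be as small as the sketch below.
The filter names, the %filter registry and apply_filters() are all
hypothetical.

```perl
use strict;
use warnings;

# Hypothetical filters: each takes a value and returns the filtered value.
my %filter = (
    # Remove leading and trailing whitespace.
    trim => sub {
        my $v = shift;
        $v =~ s/\A\s+|\s+\z//g;
        return $v;
    },
    # Capitalise each word; this also collapses runs of whitespace
    # between words into a single space.
    titlecase => sub {
        my $v = shift;
        return join ' ', map { ucfirst(lc($_)) } split ' ', $v;
    },
);

# Pipe a value through the named filters, in order.
sub apply_filters {
    my ($value, @names) = @_;
    $value = $filter{$_}->($value) for @names;
    return $value;
}

print apply_filters('  dAN   kubb ', qw/trim titlecase/), "\n";
```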

Anyway, that's all for now.  The main point I want to get across
is that the level of detail this allows should be optional.  I
think I can derive the critical information from the interface
DBIx::Class provides right now.  This just provides a way of
going deeper and specifying the properties each field should
have at a more granular level.

Sorry for the length of this post; I wanted to make sure I
described everything properly. Comments and suggestions are
welcomed and appreciated.

--

Thanks,

Dan
__________________________________________________________________

Dan Kubb                  Email: dan.kubb at autopilotmarketing.com
Autopilot Marketing Inc.  Phone: 1 (604) 820-0212
                             Web: http://www.autopilotmarketing.com
__________________________________________________________________





