[Dbix-class] help building query w/ ->search() or using arbitrary select?

George Hartzell hartzell at alerce.com
Wed Aug 9 18:59:37 CEST 2006


[Once you wade through the explanation of my problem, schema, etc...,
 this is either a question about fancy use of resultset()->search's
 join attribute or how to hook an arbitrary sql select statement to a
 resultset.  Jump to QUESTION_STARTS_HERE to skip the prelude... ]

I'm working with a postgres database that's a fairly literal
transmogfication of a large (6GB) xml dataset representing biological
data.

The xml model is very generic/flexible and there are a variety of
biological tidbits that have been dumped into it.  I'm currently only
interested in a certain set of the elements.

The XML looks something like this (i've elided various attributes in
the elements that just make things messier and this is in fact
representative of the elements I want to pull out):

 <rn>
  <node>
   <node_attr name='Name' value='aName' />
   <node_attr name='NodeType' value='Protein' />
   <node_attr name='Organism value='Homo sapiens />
  </node>
  <node>
   <node_attr name='Name' value='anotherName' />
   <node_attr name='NodeType' value='Protein' />
   <node_attr name='Organism value='Homo sapiens />
  </node>
  <control>
   <control_attr name='ControlType' value='Regulation' />
  </control>
 </rn>

Generally, an <rn> element can have 0 or more <node>'s, and zero or
more <control>'s.  <node>'s and <control>'s can have zero or more
<{node,control}_attr> elements.

The simplified schema looks like this:

   create table control (id serial not null,
                         rn int,
                         PRIMARY KEY (id));
   
   create table control_attr (id serial not null,
                              name text not null,
                              value text not null,
                              control int not null,
                              PRIMARY KEY (id));
   
   create table node (id serial not null,
                      rn int,
                      PRIMARY KEY (id));
   
   create table node_attr (id serial not null,
                           name text not null,
                           value text not null,
                           node int not null,
                           PRIMARY KEY (id));
   
   create table rn (id serial,
                    PRIMARY KEY (id));

Once I have the data loaded, I also run a query across it that counts
the number of nodes of each NodeType that a <rn> has, which simplifies
some of the queries that I want to do (e.g. I'm only interested in
<rn>'s with two Protein nodes).  It looks like this:

   create table rncbt (rn_id integer,
                       value text,
                       count bigint);

I also have a pair of immutable functions that count the total number
of <node>'s and <control>'s that an <rn> has, they're created and
indexed once the data is loaded.
   
I have a simple set of DBIx::Class's that tie this all together with
has_many and belongs_to relationships and it works like a charm.
Phew!

I've figured out how to dig all kinds of crazy things out of the
dataset using hand crafted sql via psql and using my DBIx::Class
objects programmatically .  Again, phew!

[QUESTION_STARTS_HERE]

At the moment, I'm trying to get access to a particular subset of the
<rn>'s.  The <rn>'s that I'm searching for:

   - have exactly two <nodes>'s, both of which have <node_attr>'s with
     name='NodeType' and value='Protein

   - have a single <control>

This search seems to find them.  It knows to join to the rncbt table
using the <rn>'s id, and things seem to work:

   $rs = $s->resultset('RN')->search({
                                      'node_count(me.id)' => 2,
                                      'control_count(me.id)' => 1,
                                      'rncbts.value' => 'Protein',
                                      'rncbts.count' => 2,
                                     },
                                     {
                                      join => 'rncbts',
                                     },
                                    );

What I want is actually fancier.  Not only do I want to constrain an
<rn> to have exactly two <node>'s that have <node_attr>'s that say
they're 'Protein', but I want both of those nodes to have a
<node_attr> with name='Organism' and value='Homo sapiens'.  In reality
I want to add some additional constraints on the <node>'s based on
their <node_attr>'s, but this get's the point across.

I think that I understand that I can do something like this:

  { join => {nodes => attrs}}

to get at <node_attr>'s that belong_to <node>'s that belong_to an
<rn>.  But at this point I no longer know how to get the rn and rncbt
join that I achieved in the example above with the simple:

  { join => 'rncbts' }

Is there a way to do both?

And even goofier, I'm working with pairs of nodes and contraining on
their node_attr's, so I think I need to do this:

  { join => {nodes => attrs,
             nodes => attrs}}

which I think should give me attrs and attrs_2 that belong to node and
node_2 that belong_to <rn>???

Alternatively, this sql does what I've described (and more, it also
requires a <control> with a <control_attr> named ControlType).

  select distinct rn.id
  from 
    rn, 
    node as n1, 
    node as n2, 
    node_attr as na1,
    node_attr as na2,
    node_attr as na3,
    node_attr as na4,
    control as c1,
    control_attr as ca1
  where
    node_count(rn.id) = 2 and
    control_count(rn.id) = 1 and
    n1.rn = rn.id and
    n2.rn = rn.id and
    n1.id != n2.id and
    na1.node = n1.id and
    na2.node = n1.id and
    na3.node = n2.id and
    na4.node = n2.id and
    na1.name = 'NodeType' and
    na1.value = 'Protein' and
    na2.name = 'Organism' and
    na2.value = 'Homo sapiens' and
    na3.name = 'NodeType' and
    na3.value = 'Protein' and
    na4.name = 'Organism' and
    na4.value = 'Homo sapiens' and
    c1.rn = rn.id and
    ca1.control = c1.id and
    ca1.name = 'ControlType';

Is there a way to declare an arbitrary sql select statement and hook
the results up to my rn object?

If you've made it this far, THANK YOU.

Any insights would be welcome!

g. 



More information about the Dbix-class mailing list