New features of DB and dsql planned for ATK 4.0

Wednesday, June 9th, 2010|Brainstorming, Version 4|by Romans

Based on our current usage and user feedback we are preparing to start development on a new database layer. Thanks to new cool features in PHP (PDO) it’s going to be real easy. However we plan to add some exciting new features such as:

  • Rely on PDO, DSN format will change
  • Single-table dsql. You can still use joins, which will create multiple dsql instances and make them work together seamlessly.
  • Automatic alias management.
  • Database driver decides when to fetch data and how.
  • Clone-friendly
  • Multi-table update() and insert(). MySQL have some support for multi-table update, but this feature will take it to a whole new level.
  • Support for simple storage such as SimpleDB. Database engine will allow to compensate for missing [R]DB features.
  • dsql Implementation of Iterator, Countable, Seekable and more for nice syntactic sugar.
  • Much stricter towards field use. Do not allow developers to slip in any SQL logic.
  • Support for expression objects. Similar to $this->js(), this will allow you to build sql operators and functions dynamically
  • Proper support for sub-selects. Sub-select being another dsql query.
  • Better debugging
  • Automated tests
  • Transaction support.

I’m sure some of those features are really nice, while others might be not welcome. This is your chance to vote for which features you want to see and which you think are useless. But first of all – the whole controller have been rewritten from scratch. We do however plan to maintain API compatibility.

PDO support

PDO is a object layer enabled by default since PHP5.1+. There however will be way to add your own database drivers.

Sample config line, which is by the way now optional. DB driver will attempt to connect to localhost and use realm as database by default:

$config['db']['dsn']=’mysql:host=127.0.0.1;dbname=agileweb’;

PDO also comes with a great support for prepared queries. Those are used absolutely transparently so you as developer don’t need to worry about those at all.

Single-table dsql

Current (3.8) implementation of dsql would store all the joins in a array property and it wouldn’t care about the actual tables and fields. New implementation creates one object for each table and all objects are working in conjunction to build queries.

But following this approach, we can now reliably track which tables participate in the query. Example:

[php]
$q=$this->api->db->dsql();
$q->table(‘employee’)
->field(‘name’)
->field(‘surname’)
->join(‘job’)
->field(‘job.salary’)
;
[/php]

In this example  we implement a simple query across two tables. The following will be executed:

SELECT name,surname,job.salary FROM employee INNER JOIN job ON employee.job_id=job.id;

However what you seen here is compatibility form and you can be in problem if you use same table several times. There is a way to do it more flexibly however.

Automatic alias management

When you work with multiple tables and large selects, you often will face problem of table aliases. Here is how new dsql solves that:

[php]
$q=$this->api->db->dsql();
$q->table(‘employee’)
->field(‘name’)
->field(‘surname’)
->addJoin(‘employee’,'boss_id’)
->field(‘name’)
->getJoiner()
->where(‘status’,1)
;
[/php]

This syntax will produce the following query:

SELECT employee.name,employee.surname,employee_2.name name_2 FROM employee INNER JOIN employee employee_2 ON employee.boss_id=employee_2.id WHERE  employee.status=1;

This gets even more attractive when used in automated building of dsql such as Models. You can use “addField” instead of field() which will return the exact alias which will be used for this field in the output. So a easy-scenario for Model implementing – when you join two models, you simply call “addJoin” which will produce new dsql() which you can pass to the new model which can set it’s fields and conditions and will remember field aliases if it will expect any data return.

Just-on-time Fetches

With new dsql() you don’t need to worry about do_select() and fetching rows, etc. Depending on the database a full data can be pre-fetched using one or multiple selects and your code does not need to know anything about it. Here is how you use data:

[php]

foreach($q as $id=>$row){  echo "$id: ".$row['name']."\n" };

[/php]

You can even call for-each twice in the row. If database driver does not support joins, then main table will be queried first, compose element of sub-tables and query them after. It will then combine results in such a way so you don’t notice any difference. This logic however is not included in core of DB, but is rather implemented in extended database driver.

Some other cool thins you can do with aliases is removing what you have added. See example.

[php]
foreach($q as $id=>$row){
 $new_name=secretFormula($row['name']);
$cond=$q->addWhere(‘id’,$id);
$q->set(‘name’,$new_name)->do_update();
$q->removeWhere($cond);
};
[/php]

However this example can be also implemented with cloning.

Cloning

Like other objects in ATK4, dsql can be safely cloned. Cloning is important when you want to preserve all the conditions, joins and other settings of original query to perform a different type of query.

[php]
<foreach($q as $id=>$row){
 $new_name=secretFormula($row['name']);
$upd_q=clone $q;
$upd_q->where(‘id’,$id)->set(‘name’,$new_name)->do_update();
};
[/php]

Multi-table update and insert

When you are operating with multiple tables, sometimes your Models can span across multiple tables. Let’s say you have a “person” table and “address” table and both are joined with “inner” join. What should happen if you use set() for both tables and call insert()

New dsql is smart enough to insert a record into address first (it will also convert where clauses into set clauses) and then set data on person table.

[we are not sure if we will implement this fully or not, depending on our Model/Entity implementation]

SimpleDB, S3 and other simple-storage support

In some situations regular RDB become inefficient and it’s required to deal with data storage which does not support relations. Amazon AWS’s service SimpleDB is exactly this. It operates using standard SQL but it have a very lightweight functionality.

Mixing and matching different drivers

With our unique dsql-per-table implementation, we no longer require for query to be inside one database even one driver. So our main dsql can be a MySQL based table, but we can join it with dsql from other driver.

[php]
$data=$this->api->db->dsql()
->table(‘files’)
->field(‘filename’)
->addJoin($this->api->s3->dsql($node), ‘filename’)
->field(‘content’)
->getJoiner();</pre>
<pre>foreach($data as $row){
saveIntoFile($row['filename'],$row['content']);
}
[/php]

This implementation can help you easily relocate some part of your data between different storage devices. Now you can use dsql syntax to save your image files into local folder and tie them in with table at the same time.

Implementation of interfaces

Interfaces allow to implement really cool things. Consider this syntax:

$q[$id]->set(‘name’,'John’)->set(‘surname’,'Smith’);

or

$data=$q[$id]->getAllData();

Implementation behind this is surprisingly simple.

[php]
public function offsetGet($id){
$c=clone $this;
return $c
->where(‘id’,$id)
->select()
->autoUpdate()
;
}
[/php]

Of course few more checks will be in place.

Reliance on table.id fields

You should have noticed by now, how we rely on “id” fields to present in all tables. I am not sure if this should be enforced or configurable. In over 10 years I haven’t seen a case where table couldn’t have id column. I mean – people do design tables with weird keys or without keys at all, but only as design decision and not limitation.

Much stricter towards validated SQL

We have been relying on mysql quoting in $dq->where(‘id’,$_GET['id']) for a long time. However in some development approaches people need to sneak in custom code into the query. $dq->where(‘id=’.$myvar); is no more secure and will not do any quotes anymore.

New dsql implementation will not allow this syntax anymore. Arguments should be what they are and this way we eliminate possibility of developer to sneak data through it. We on other hand should provide a better argument support. You have seen join() syntax:

function join($table_or_dq, $our_field=null, $their_field=null, $join_type=’inner’);

We can almost always figure out what’s their field is – “id”. And our field is simple to figure out by $table.’_id’), this is true in many database designs. But even if you do

$dq->join($_GET['join'], $_GET['ourfield'], $_GET['theirfield'], $_GET['type']) – it would be safe with the dsql doing validations and checks on the arguments.

Conditions and Expressions

Another weak point where developers of 3.8 would want to inject some SQL code was where() clause. New implementation of where() takes very radical approach. Here are some samples:

  • ->where(‘myname’,'John’);    // compatibility. First argument can be field. (WHERE myname=:arg)
  • ->where(‘=’,$q->getField(‘myname’),’John’);    // first argument can be operator, but you’ll need to wrap fields into objects returned by getField()
  • ->where(‘or’,$q->expr(‘myname’,'John’),$q->expr(‘!=’,$q->getField(‘myname’),’Peter’)); // (WHERE myname=:arg1 OR myname != :arg2)
  • ->where(‘in’,$q->getField(‘age’),array(1,2,3,4))
  • expressions can also be passed to set()
    • ->set(‘ts’,$q->expr(‘now’));
    • ->set(‘password’,$q->expr(‘md5′,$mypass));

If you feel that this implementation could pose a security concern, tell me about the case.

Sub-selects

Expressions are compatible with other dsql objects, for instance:

  • $q->where(‘status’,$this->api->db->dsql()->table(‘statuses’)->field(‘id’)->where(‘active’,'Y’)); // WHERE status in (SELECT id from statuses where active=:arg)
  • However to avoid alias clashes and add ability to cross-reference fields, use this syntax:
  • $q->where(‘sattus’,$q->addSubSelect()->table(‘statuses’)->field(‘id’)->where(‘active’,'Y’));
  • $q->table(‘employee’)->field(‘name’)->field(
    • ($q2=$q->addSubSelect())->table(‘salaries’)->field($q2->expr(‘max’,$q2->getField(‘salary’)))->where($q2->getField(‘employee_id’),$q->getField(‘id’))
    • ,’max_salary’);
    • Result: SELECT employee.name,(SELECT MAX(salaries.salary) from salary WHERE salary.employee_id=employee.id) max_salary from employee;

If you are getting scared by syntax, note that you don’t need to do it in one line, you can have several lines to build expressions and join them together.

Bottom-line

The new approach might seem more complex, but it’s designed to work best with Model structure, where developer would hardly even use dsql. All this functionality is still in a planning / prototype state, so any feedback would be very appreciated.