Why extending the Agile Data `Model` class is OK

This article is an answer to the question: "Is it OK that my Entities extend some 3rd party class?"

Consider this beautiful PHP code:

class Book {
    protected $title;
    protected $is_read = false;

    function makeRead() {
        $this->is_read = true;
    }
}

Isn't it nice when you can load a book from the database just like this:

$book = BookStore::loadBy('title', 'How to PHP');
$book->makeRead();

The reality, however, is riddled with practical problems, and the approach above simply does not always work.

During extensive research, I looked through various PHP ORMs and non-PHP persistence mappers. In this article, I want to share my findings and some of the conclusions that shaped my own solution to this problem in the Agile Data persistence framework. Hopefully, I will also be able to explain why some frameworks ask your Book class to extend a Model/Entity class and others don't.

Origin of Persistence Frameworks

Persistence frameworks have come a long way, but they originated in non-network environments. The core idea is that an object's data is stored in a local file. When the data is needed, it is loaded and the object's attributes are populated (hydration). Changes you make to the attributes are stored back.
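To make hydration concrete, here is a minimal sketch in plain PHP 8, assuming an in-memory array stands in for the local file. The `ArrayStore` class and its `load()`/`save()` methods are hypothetical and not taken from any framework; the point is that the plain `Book` class from the opening example can be populated and stored back without extending anything.

```php
<?php
declare(strict_types=1);

// The plain entity from the opening example: it extends nothing.
class Book
{
    protected $title;
    protected $is_read = false;

    public function makeRead(): void
    {
        $this->is_read = true;
    }
}

// Hypothetical persistence manager; an in-memory array stands in for the file.
final class ArrayStore
{
    /** @var array<int, array<string, mixed>> stored records, keyed by id */
    public array $records = [];

    // Hydration: create the object without running its constructor, then
    // populate its (protected) attributes from the stored record.
    public function load(string $class, int $id): object
    {
        $object = (new ReflectionClass($class))->newInstanceWithoutConstructor();
        // Closure::call() binds $this to $object and uses $object's class scope,
        // which is what makes the protected properties writable here.
        (function (array $row): void {
            foreach ($row as $name => $value) {
                $this->$name = $value;
            }
        })->call($object, $this->records[$id]);
        return $object;
    }

    // The reverse step: read all attributes and store them back.
    public function save(object $object, int $id): void
    {
        $this->records[$id] = (function (): array {
            return get_object_vars($this); // includes protected properties in this scope
        })->call($object);
    }
}

$store = new ArrayStore();
$store->records[1] = ['title' => 'How to PHP', 'is_read' => false];

$book = $store->load(Book::class, 1);
$book->makeRead();
$store->save($book, 1);

var_dump($store->records[1]['is_read']); // bool(true)
```

Note that even this tiny manager has to bypass property visibility to do its job; that is the kind of trick a "clean" entity pushes into the persistence layer.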

Frameworks are designed to "make us, developers, care less". They solve some of our problems, whether image watermarking, URL routing or long-term data storage. With data storage, however, if we keep "not caring how data is persisted", we get into trouble:

foreach ($book->chapters as $chapter) {
    echo $chapter->title.' by '.$chapter->author->name;
}

"Not caring" is not always practical: each pass of this loop may silently trigger extra queries to load a chapter and its author. Before long, you run into issues, and the initial concept of "relieving you from being aware that data is stored elsewhere" crumbles.

When the concept's fundamental principle fails, it can be patched up or a new approach can be created. So how do some popular frameworks deal with it?

Ideology of Doctrine

Doctrine pretty much chooses to patch the failed concept. It tries to make your entities look clean and simple. But things are not simple, and that's why your comments now influence your code. Meet annotations:

class User
{
    /**
     * @ORM\Id @ORM\Column @ORM\GeneratedValue
     * @dummy
     * @var int
     */
    private $id;

    /**
     * @ORM\Column(type="string")
     * @Assert\NotBlank
     * @Assert\Email
     * @var string
     */
    private $email;
}

Annotations give the Persistence Manager (the code that stores and loads data) more knowledge of your object. PHP's type handling is weak, so you annotate types. If your database has a column named 'my-field', you cannot declare a property with that name, because it is not a valid PHP identifier, so you have to map it, again, through an annotation.
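To illustrate how a Persistence Manager can extract that knowledge, the sketch below reads a property's docblock with PHP's `ReflectionProperty::getDocComment()` and pulls the options out of an `@ORM\Column(...)` annotation. The `columnOptions()` helper and its regular expressions are a hypothetical, deliberately naive stand-in; Doctrine uses a full annotation parser, not this code.

```php
<?php
declare(strict_types=1);

class User
{
    /**
     * @ORM\Column(type="string", name="my-field")
     */
    private $email;
}

// Hypothetical helper: returns the key="value" options of an @ORM\Column annotation.
function columnOptions(string $class, string $property): array
{
    $doc = (new ReflectionProperty($class, $property))->getDocComment();
    // Capture everything between the parentheses of @ORM\Column(...).
    if ($doc === false || !preg_match('/@ORM\\\\Column\(([^)]*)\)/', $doc, $m)) {
        return [];
    }
    // Split the captured text into key="value" pairs.
    preg_match_all('/(\w+)="([^"]*)"/', $m[1], $pairs, PREG_SET_ORDER);
    $options = [];
    foreach ($pairs as [, $key, $value]) {
        $options[$key] = $value;
    }
    return $options;
}

print_r(columnOptions(User::class, 'email'));
```

The mapper now knows that `$email` is a string stored in the `my-field` column, information that the PHP code itself never states.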

Ideology of CakeORM, Eloquent, Yii ORM, Fuel ORM

These ORMs don't even try to sell the concept of "not caring how data is persisted". Putting practicality over concept, they make the developer painfully aware that data is stored elsewhere, and the "you'd better help me fetch it" concept takes over.

A practical solution requires PHP code (not annotations) to augment the "Entity". In other words, your Entity must now extend a class supplied by the framework, which allows it to:

Use magic properties, detect changes and convert types
Help the Persistence Manager query data accurately
Keep the 1 object = 1 record paradigm.
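A minimal sketch of such a base class, assuming PHP 8: magic `__get()`/`__set()` stand in for real properties, values are converted with `settype()`, and changed fields are recorded so a persistence manager could write back only what changed. `Entity`, `$types` and `getDirty()` are hypothetical names, not the API of Eloquent, CakePHP or any other ORM.

```php
<?php
declare(strict_types=1);

// Hypothetical framework base class, for illustration only.
abstract class Entity
{
    /** @var array<string, string> field name => PHP type; declared by subclasses */
    protected array $types = [];
    private array $data = [];
    private array $dirty = [];

    public function __get(string $name)
    {
        return $this->data[$name] ?? null;
    }

    // Magic setter: converts the value to the declared type and marks the field as changed.
    public function __set(string $name, $value): void
    {
        if (!isset($this->types[$name])) {
            throw new InvalidArgumentException("Unknown field: $name");
        }
        settype($value, $this->types[$name]);
        if (!array_key_exists($name, $this->data) || $this->data[$name] !== $value) {
            $this->data[$name] = $value;
            $this->dirty[$name] = true;
        }
    }

    // Only the changed fields, e.g. for building an UPDATE statement.
    public function getDirty(): array
    {
        return array_intersect_key($this->data, $this->dirty);
    }
}

class Book extends Entity
{
    protected array $types = ['title' => 'string', 'is_read' => 'bool'];
}

$book = new Book();
$book->title = 'How to PHP';
$book->is_read = 1; // stored as bool(true)
var_dump($book->getDirty());
```

The price of these features is visible in the class hierarchy: `Book` now has to extend `Entity`.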

Ideology of Agile Data

Agile Data has no Entities. It has a Model, but one that works differently: in Agile Data, a Model instance represents a set of records:

$book = new Book($db);
// set of all Book entities

$chapter = $book->withID(1)->ref('Chapter');
// set of all Chapters related to Book with id=1
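One way to picture a "set of records" is as a table plus conditions, where traversing a reference produces a new set described by a sub-query rather than by loaded rows. The `RecordSet` class below is a hypothetical PHP 8 sketch of that idea only; its `withId()`/`ref()` methods and the SQL it builds are assumptions for illustration, not the SQL Agile Data actually generates.

```php
<?php
declare(strict_types=1);

// Hypothetical illustration: a "set" is a table plus conditions, not loaded rows.
final class RecordSet
{
    /** @var string[] */
    private array $conditions = [];

    public function __construct(private string $table) {}

    // Returns a narrower set; the original set is left untouched.
    public function withId(int $id): self
    {
        $copy = clone $this;
        $copy->conditions[] = "id = $id";
        return $copy;
    }

    // Traversing a one-to-many reference yields a new set whose condition is a
    // sub-query on this set, so nothing has to be loaded first.
    public function ref(string $table, string $foreignKey): self
    {
        $related = new self($table);
        $related->conditions[] = "$foreignKey IN (" . $this->select('id') . ')';
        return $related;
    }

    public function select(string $fields = '*'): string
    {
        $sql = "SELECT $fields FROM {$this->table}";
        return $this->conditions ? $sql . ' WHERE ' . implode(' AND ', $this->conditions) : $sql;
    }
}

$book = new RecordSet('book');                          // all books
$chapter = $book->withId(1)->ref('chapter', 'book_id'); // chapters of book 1
echo $chapter->select(), PHP_EOL;
// SELECT * FROM chapter WHERE book_id IN (SELECT id FROM book WHERE id = 1)
```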

Next, let's see how these different approaches help us address a practical problem.

Fine-tuning the query

Since we are now aware that the data is elsewhere, we want to fine-tune and minimize the amount of data retrieved and sent back to the database. How will different approaches deal with this task?

Some are helpless

foreach ($book->chapters as $chapter) {
    echo $chapter->title.' by '.$chapter->author->name;
}

Our original code is not efficient, because the iteration is not aware of what information we will need. Persistence Mappers that allow iterating related objects must choose between two equally bad options:

Retrieve all chapter data (lots of data)
Lazy-load only chapter IDs and then fetch more data on demand (lots of requests)
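The trade-off can be made concrete with a hypothetical in-memory `FakeDb` that counts queries and the number of column values it returns. The sketch below (PHP 8, sample data invented for illustration) runs the chapter/author loop both ways: fetching all data up front, and lazy-loading IDs and then each record on demand.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory "database" that counts queries and values returned.
final class FakeDb
{
    public int $queries = 0;
    public int $values = 0;

    public function __construct(private array $tables) {}

    // Returns rows of $table matching every $where pair, keeping only $columns.
    public function fetch(string $table, array $where = [], ?array $columns = null): array
    {
        $this->queries++;
        $rows = [];
        foreach ($this->tables[$table] as $row) {
            if (array_intersect_assoc($where, $row) !== $where) {
                continue; // at least one condition does not match
            }
            if ($columns !== null) {
                $row = array_intersect_key($row, array_flip($columns));
            }
            $this->values += count($row);
            $rows[] = $row;
        }
        return $rows;
    }
}

$db = new FakeDb([
    'chapter' => [
        ['id' => 1, 'book_id' => 1, 'title' => 'Intro', 'body' => 'long text', 'author_id' => 1],
        ['id' => 2, 'book_id' => 1, 'title' => 'Classes', 'body' => 'long text', 'author_id' => 2],
        ['id' => 3, 'book_id' => 1, 'title' => 'Traits', 'body' => 'long text', 'author_id' => 1],
    ],
    'author' => [
        ['id' => 1, 'name' => 'Ann', 'bio' => 'long text'],
        ['id' => 2, 'name' => 'Bob', 'bio' => 'long text'],
    ],
]);

// Option 1: retrieve all chapter and author data up front (few queries, lots of data).
$chapters = $db->fetch('chapter', ['book_id' => 1]);
$authors = array_column($db->fetch('author'), null, 'id');
$eagerLines = [];
foreach ($chapters as $c) {
    $eagerLines[] = $c['title'] . ' by ' . $authors[$c['author_id']]['name'];
}
$eager = [$db->queries, $db->values];

// Option 2: lazy-load chapter IDs, then fetch each record on demand (lots of requests).
$db->queries = $db->values = 0;
$lazyLines = [];
foreach ($db->fetch('chapter', ['book_id' => 1], ['id']) as $ref) {
    $c = $db->fetch('chapter', ['id' => $ref['id']])[0];
    $a = $db->fetch('author', ['id' => $c['author_id']])[0];
    $lazyLines[] = $c['title'] . ' by ' . $a['name'];
}
$lazy = [$db->queries, $db->values];

printf("eager: %d queries, %d values\n", ...$eager);
printf("lazy:  %d queries, %d values\n", ...$lazy);
```

With these three chapters, the first option issues 2 queries but returns 21 values, and the second issues 7 queries (1 plus 2 per chapter) returning 27 values, while the loop itself only needs 6 values: a title and an author name per chapter.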

Doctrine - Partial Queries

Doctrine can approach the problem with some DQL:

$q = $em->createQuery("select partial b.{id,title} from MyApp\Domain\Book b");

but if you also want to get author.name in the same query, it becomes increasingly cryptic: https://stackoverflow.com/a/9505215/204819

The other ORMs - Query Object

CakePHP (and a few other ORMs) solves the issue by using a separate, read-only query object:

$query = $book->find()->select([
    'Book.id',
    'Book.title'
])
->contain([
    'RealestateAttributes' => [
        'fields' => [
            'Author.name',
        ]
    ]
])
->where($condition);

As with Doctrine, this requires the developer to know the database intimately.

Agile Data - Paradigm of Data Sets

I understand how important selective queries are, especially in bigger applications. It's very important to let the developer choose which data they receive without going into database implementation details.

How do we solve this problem in Agile Data?

Defining fields

Agile Data defines fields, references and referenced fields all in one place:

class Book extends \atk4\data\Model {
    public $table = 'book';
    function init() {
        parent::init();
        $this->addField('title');

        $this->hasMany('Chapter', new Chapter());
    }
}

class Chapter extends \atk4\data\Model {
    public $table = 'chapter';
    function init() {
        parent::init();
        $this->addField('title');

        // chapter.book_id links each chapter to its book
        $this->hasOne('book_id', new Book());

        // imports the author's full_name as the 'author' field
        $this->hasOne('author_id', new Author())
            ->addField('author', 'full_name');
    }
}

class Author extends \atk4\data\Model {
    public $table = 'author';
    function init() {
        parent::init();

        $this->addField('full_name');
    }
}

Yet when you create a model instance, you can specify which fields to work with:

$book = new Book($db);

$chapter = $book->withID(1)->ref('Chapter');
$chapter->onlyFields('title', 'author');

Because Book and Chapter extend Model, you can use its onlyFields, ref and withID methods. By contrast, Doctrine uses a plain, user-defined Entity class, and CakePHP diverts query-specific logic into a separate Query object. Are we asking for trouble by making Model too smart? Well... no.

First, Model and Persistence are fully separate in Agile Data, which keeps the design clean. But there is another major difference:

Breaking the "1 object = 1 record" paradigm

When Doctrine populates 100 instances of a class that extends nothing and contains only a few properties, that's actually not bad. Other PHP ORMs hack around the problem and try to avoid populating their heavy Entity objects, offering other options such as working with a Query class or fetching result arrays.

I saw the opportunity here to introduce a different approach.

In Agile Data, one model represents a set of records, and it is a set you can operate on: update it, call its methods and traverse its references. The next example shows how you can select specific fields for your query:

foreach ($book->ref('Chapter')->onlyFields('title', 'author') as $chapter) {
    echo $chapter['title'].' by '.$chapter['author'];
}

This executes a single query that fetches only three fields from the database, yet:

$chapter is actually a model object, so you can execute methods, update data or traverse references.
The $chapter object persists through the iteration; it is not re-created. Only $chapter->data and $chapter->id effectively change, and $chapter['title'] accesses $chapter->data['title'].
Only one database query is executed, and only the selected fields are fetched.
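The iteration behaviour described above can be sketched with PHP's `IteratorAggregate` and `ArrayAccess` interfaces. The `SetModel` class below is a hypothetical, much-simplified PHP 8 illustration, not Agile Data's implementation: a single object is yielded on every step, only its `$id` and `$data` change, and `onlyFields()` here merely filters in-memory rows where a real persistence layer would narrow the query.

```php
<?php
declare(strict_types=1);

// Hypothetical, heavily simplified model: one object stands for the whole set
// and doubles as the "current record" while iterating.
class SetModel implements IteratorAggregate, ArrayAccess
{
    public ?int $id = null;
    public array $data = [];
    private ?array $visibleFields = null;

    public function __construct(private array $rows) {}

    // Restricts which fields each record exposes (an in-memory filter in this sketch).
    public function onlyFields(string ...$fields): static
    {
        $this->visibleFields = $fields;
        return $this;
    }

    // Each step overwrites $this->id and $this->data, then yields the same object.
    public function getIterator(): Generator
    {
        foreach ($this->rows as $row) {
            $this->id = $row['id'];
            $this->data = $this->visibleFields === null
                ? $row
                : array_intersect_key($row, array_flip($this->visibleFields));
            yield $this->id => $this;
        }
    }

    // $model['title'] reads $model->data['title'].
    public function offsetExists(mixed $offset): bool { return isset($this->data[$offset]); }
    public function offsetGet(mixed $offset): mixed { return $this->data[$offset] ?? null; }
    public function offsetSet(mixed $offset, mixed $value): void { $this->data[$offset] = $value; }
    public function offsetUnset(mixed $offset): void { unset($this->data[$offset]); }
}

$chapters = new SetModel([
    ['id' => 1, 'title' => 'Intro', 'author' => 'Ann', 'body' => 'long text'],
    ['id' => 2, 'title' => 'Classes', 'author' => 'Bob', 'body' => 'long text'],
]);

$objectIds = [];
foreach ($chapters->onlyFields('title', 'author') as $chapter) {
    echo $chapter['title'] . ' by ' . $chapter['author'] . PHP_EOL;
    $objectIds[] = spl_object_id($chapter);
}
```

Both lines are printed through the same object: `spl_object_id()` returns the same value on every iteration, so iterating a large set never creates one object per record.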

The design of Agile Data allows you to query without writing any database logic. You only need to decide which fields you want in your records. Quite often, you don't even have to make this choice yourself:

Making an HTML table? Let's query only the data needed for its columns.
Editing a record in a form? Why load data that has no form field?
Creating a CRUD? Same thing.
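The idea that a UI component can decide the field list on the developer's behalf can be sketched as follows. `ChapterModel`, `Table` and `selectSql()` are hypothetical names for a PHP 8 illustration; keeping the `id` column even when it isn't displayed is an assumption of this sketch, made so the record can still be updated or traversed.

```php
<?php
declare(strict_types=1);

// Hypothetical model that only tracks which fields a query would select.
final class ChapterModel
{
    public string $table = 'chapter';
    private array $fields = ['id', 'title', 'author', 'body'];

    public function onlyFields(string ...$fields): self
    {
        // Assumption of this sketch: the id is always kept.
        $this->fields = array_values(array_unique(['id', ...$fields]));
        return $this;
    }

    public function selectSql(): string
    {
        return 'SELECT ' . implode(', ', $this->fields) . ' FROM ' . $this->table;
    }
}

// Hypothetical UI component: its columns double as the model's field list,
// so the developer never writes the projection by hand.
final class Table
{
    public function __construct(private array $columns) {}

    public function setModel(ChapterModel $model): string
    {
        return $model->onlyFields(...$this->columns)->selectSql();
    }
}

echo (new Table(['title', 'author']))->setModel(new ChapterModel()), PHP_EOL;
// SELECT id, title, author FROM chapter
```

The `body` column is never requested, simply because no table column asks for it.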

Suppose you want to display a CRUD that lists the chapters of a book with a specific title, lets the user edit those records, and shows only specific fields.

That's easily manageable through PHP code alone:

// Agile UI - Create a fully interactive CRUD for editing chapters of a specific book

$book = new Book($db);
$book->loadBy('title', 'How to PHP');

$crud = $app->add(new CRUD());
$crud->setModel($book->ref('Chapter'), ['title', 'author']);

Conclusions and Advice

Finally, some advice: not on which data persistence framework to use, but on how to evaluate them.

Any magical mappers are bad news

If you use a database abstraction framework that magically saves and loads data, it is likely to prove impractical, and you may run into issues with large data sets.

Compare how you can tweak your queries

Whether through annotations, a YAML file or PHP method calls, one way or another you must tell the Persistence Mapper about your data. Some ways involve extra file parsing, code generation or caching. Others rely on the magical qualities of PHP.

I think the more straightforward the approach, the better.

Extending a heavy class may be good or bad

From a framework design perspective, extending is good, because it's more efficient and elegant than hacking around. Yet be wary when your data framework spits out 100 copies of your class: that code won't scale.

Try Agile Data

There are not many Data Persistence frameworks that try to address actual problems in a consistent and complete way. In this article I have mentioned only a few features of Agile Data, but I invite you to have a closer look at:

https://github.com/atk4/data