Using the open source tools Power Architect, LiquiBase and Subversion, we have built an integrated tool chain for data modeling and database script management for various DBMS products. The evolution of the data model is guided by a change process on top of the tool chain. In this first part of the series, I introduce the tool-chain and discuss its benefits such as portability, reuse, and generated documentation.
At mgm, we develop and maintain a variety of complex web applications which rely on a database for backend data storage. As the design of the data model has a crucial impact on the performance of the application and the integrity of the data, it is most important that the database design matches the application. And getting it right once is not enough as change and hence adaptation is inevitable. Based on our experiences with the difficulties to manage a growing amount of database scripts to keep the data model consistent and aligned with project needs, we have selected a set of tools to help us at the various stages.
Physical Data Modeling with Power Architect
Physical data modeling is about creating and changing a representation of the tables, attributes and relationships that describe the structure in which the data will be stored in the database. This data structure should on the one hand reflect the customer’s needs (as represented in the logical or business process data model) and on the other hand the needs of the applications or services which will access and work with the data. mgm has chosen SQL Power Architect to aid this complex task.
Power Architect is a lightweight open source tools available on platforms like Windows, Linux, and Mac OS X. The data model is created as a visual model which allows the user to see the tables and relationships at a glance. For complex models, the auto layout option can be used to optimize space usage.
The data model is independent of the DBMS that is used later on and is saved as a generic, highly portable XML file. You can also export the model as Confluence Wiki Markup code as documentation.
Existing databases can be reverse engineered to create a data model which can be used for further development. There is, however, one caution as the tool is currently not able to reverse engineer more complex concepts like stored procedures or check constraints. They have to be extracted and later re-added to the database script.
The auto reverse engineering is the basis for a part of Power Architect’s Diff function that allows not only to compare Power Architect models with one another, but also to compare a model to a database or even two databases with one another. In combination with the export function, this allows us to export only the differences (changes), which is very handy during adaptation and maintenance to ensure that all adjustments have been applied correctly.
The diff result can be exported to several formats as well. My colleague Thomas Kellerer has developed the diff export as a LiquiBase changeset. The changeset can then easily be inserted into an existing LiquiBase changelog. This feature is part of Power Architect starting with release 0.9.16.
Script Management with LiquiBase
While the physical data model we created using Power Architect is highly portable, there are certain problems still to be solved:
- How to track the separate changes?
- How to implement concepts like Check Constraints, etc?
- How to apply models or model changes at different times to different databases which may or may not use various DBMS and have different preferred access modes?
Script management deals these problems. It takes care of the transformation of the physical data model into scripts that take the constraints and SQL modifications of the various DBMS products into account (like Oracle, PostgreSQL, MS SQL Server, MySQL, etc.). Script management also complements Power Architect by organizing model changes so they can be tracked and applied to one or multiple databases in a comprehensive and comprehendible manner.
mgm uses the open source tool LiquiBase for script management. LiquiBase stores the information about the data model and all changes to it along with author and id tags in a generic XML file, the Changelog.
Power Architect can export the data model to this format automatically as a complete file. Additionally model changes can be exported (called Changeset) and then appended to the one Changelog. This way, we can bundle all data model changes for a given project.
Here’s an excerpt of a Changelog with several Changesets:
<changeSet author="thomas" id="25">
remarks="Stores different types of awards">
<column name="award_id" type="INTEGER" autoIncrement="true">
<column name="award_name" type="VARCHAR(50)"
remarks="The name of the award ">
<column name="issued_by" type="VARCHAR(50)"
remarks="The institution issuing the award"/>
<changeSet author="thomas" id="26">
<changeSet author="thomas" id="27">
remarks="Stores awards a movie has received">
<column name="film_id" type="INTEGER">
<column name="award_id" type="INTEGER">
<changeSet author="thomas" id="28">
<changeSet author="thomas" id="29">
The organization of the various changes into separate Changesets makes it easier to manage and track changes over time and to determine where any given change came from. This features turned out to be crucial when trying to determine when and where a problem was introduced. As all changes are kept in one place, the overhead for searching and deciding which version of a database script is the correct one, becomes obsolete.
However, organizing, tracking and managing the changes to the data model is only one benefit of LiquiBase. The other — maybe even bigger one — is that it offers very flexible and efficient ways to apply any given set of database changes. Why always apply the complete database script if only one or two changes in it are new?
LiquiBase can apply a Changelog to all major DBMS in an automated way and while doing so, it ensures that only changes that have not yet been applied to the respective database are considered (unless it is explicitly instructed to apply all changes). This saves the DBA quite some effort. If the DBA insists on applying the database scripts manually, there is no problem either, as LiquiBase also offers the option to export the Changelog to different flavors of SQL scripts.
As mentioned before, Power Architect does not offer to model certain concepts like stored procedures, etc. This can easily be remedied in LiquiBase because you can add any number of customized SQL statements to a Changelog manually. This grants full flexibility of the design and change process.
Since the Changelog is independent of a specific DBMS (except for custom SQL), it can be applied to any number of databases at any time to create, for example, matching databases for test, development and production. You can also choose to apply certain changes to specific databases.
In order to optimize the tracking of data model changes even more, we decided to employ Subversion to store and version the Power Architect files as well as the Changelogs of LiquiBase. This allows our users to compare any state in time with any other and identify which changes were made and when.
The history of data models also allows us to (re-)create databases with any stored data model — this is quite handy for testing or reviewing. The benefit is that Subversion is already a well known tool, so it can be used right away.
Change Process for the Data Model
Now that I have briefly walked you through the tools, let me illustrate how we deal with the evolution of an data model by using this tool chain:
- mgm uses Power Architect to create and maintain a physical data model from the various inputs generated by business data modeling and the development team in the later stages of a project.
- The Power Architect model is versioned and stored for further reference and reuse in Subversion. Power Architect is also used to automatically generate documentation of the data model.
- mgm uses LiquiBase to create a Changelog from the physical data mode. This transforms the data model into a file that can be used to track changes and for application to databases.
- Over time changes will be added to the Changelog as separate Changesets. The Changelog records all changes made to the respective data model and is versioned and stored in Subversion for further use.
- LiquiBase is also used to apply the data model and model changes to one or more databases. LiquiBase allows the application of the same model to any number of databases and a variety of DBMS (using different SQL dialects) so they can be kept in consistent states.
Power Architect files and LiquiBase Changelogs are both generic XML-type file formats. This makes them human readable (though not comfortable) and allows the use of XSLT to transform them to other formats.
As some of our customers require written documentation and as we also subscribe to the concept of a documented model, we use a special XSLT transformation to export the data model from Power Architect directly to the Markup needed for our Confluence Wiki. This makes documenting the model as easy as two clicks and one copy&paste action to create a table listing or one click to create a PDF file that contains the visual model.
In the next part of this blog series, we will use a sample data model and take a closer look at Power Architect and LiquiBase. Watch out for a lot of tips and tricks that have proven helpful.