Data Management

According to the Data Management Body of Knowledge, data management is “the development, execution and supervision of plans, policies, programs and practices that control, protect, deliver and enhance the value of data and information assets.” In our opinion this is a very good definition. Unfortunately, data management strategies tend to be hard to implement in practice because of the traditional, documentation-heavy mindset. This mindset tends to produce onerous, bureaucratic strategies that more often than not struggle to support the goals of your organization.

That said, data management is still very important to the success of your organization. The Disciplined Agile framework’s Data Management process blade promotes a pragmatic, streamlined approach to data management that fits into the rest of your IT processes: we need to optimize the entire workflow, not sub-optimize our data management strategy. We need to support the overall needs of our organization and produce real value for our stakeholders. Disciplined agile data management does this in an evolutionary and collaborative manner, via concrete data management strategies that provide the right data to the right people at the right time.

This article addresses several topics:

  • Why Data Management?
  • The Process
  • External Workflow With Other IT Teams
  • Internal Workflow

Why Data Management?

There are several reasons why a disciplined agile approach to data management is important:

  1. Data is the lifeblood of your organization. Without data, or more accurately information, you quickly find that you cannot run your business. That said, data is only one part of the overall picture. Yes, blood is important, but so are your skeleton, your muscles, your organs, and many other body parts. We need to optimize the whole organizational body, not just the “data blood.”
  2. Data is a corporate asset and needs to be treated as such. Unfortunately, the traditional approach to data management has resulted in data of questionable quality: data that is inconsistent, incomplete, and often not available in a timely manner. Traditional strategies are too slow-moving and heavyweight to address the needs of modern, lean enterprises. To treat data like a real asset we must adopt concrete agile data quality techniques, such as database regression testing to discover quality problems and database refactoring to fix them. We also need to support delivery teams with lightweight agile data models and agile/lean data governance.
  3. People deserve appropriate access to data in a timely manner. People need access to the right data at the right time to make effective decisions. The implication is that your organization must be able to provide, in a streamlined and timely manner, the data that each individual should have access to.
  4. Data management must be an enabler of DevOps.  As you can see in the following diagram, Data Management is an important part of our overall Disciplined DevOps strategy. A successful DevOps approach requires you to streamline the entire flow between delivery and operations, and part of that effort is to evolve existing production data sources to support new functionality.

Disciplined DevOps
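The database regression testing mentioned in the second reason above can be sketched in a few lines. This is a minimal sketch, assuming a SQLite database and hypothetical `customer` and `country` tables: each quality rule is a query that returns the offending rows, so a regression suite can run every rule after each schema or data change and fail when any rule finds violations. The rules, table names, and column names are illustrative assumptions, not part of the Disciplined Agile framework.

```python
import sqlite3

# Hypothetical quality rules (illustrative only): each query returns the
# customer_id of every row that violates the rule.
QUALITY_RULES = {
    "customer email is present": (
        "SELECT customer_id FROM customer "
        "WHERE email IS NULL OR TRIM(email) = ''"
    ),
    "customer country code is known reference data": (
        "SELECT c.customer_id FROM customer AS c "
        "LEFT JOIN country AS r ON r.code = c.country_code "
        "WHERE r.code IS NULL"
    ),
}


def run_quality_rules(conn: sqlite3.Connection) -> dict[str, list[int]]:
    """Run every rule; return the offending customer_ids for each failed rule."""
    failures = {}
    for name, query in QUALITY_RULES.items():
        offending = [row[0] for row in conn.execute(query)]
        if offending:
            failures[name] = offending
    return failures


def build_sample_db() -> sqlite3.Connection:
    """Create an in-memory database (hypothetical schema) with two bad rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE country (code TEXT PRIMARY KEY, name TEXT NOT NULL);
        INSERT INTO country VALUES ('CA', 'Canada'), ('US', 'United States');

        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            email TEXT,
            country_code TEXT
        );
        INSERT INTO customer VALUES
            (1, 'ann@example.com', 'CA'),
            (2, '', 'US'),
            (3, 'bob@example.com', 'XX');
        """
    )
    return conn


if __name__ == "__main__":
    conn = build_sample_db()
    for rule, ids in run_quality_rules(conn).items():
        print(f"FAILED: {rule} -> customer_id {ids}")
```

Running the script reports customer 2 (blank email) and customer 3 (unknown country code). In practice the same rules would typically live in the team's test framework and run as part of continuous database integration.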


The Process

The following process goal diagram summarizes the potential activities associated with disciplined agile data management. These activities are often performed, or at least supported, by a data management team.

Goal - IT - Data Management

The process factors that you need to consider for data management are:

  1. Improve data quality. There is a range of strategies that you can adopt to ensure data quality. The agile community has developed concrete quality techniques, in particular database testing, continuous database integration, and database refactoring, that prove more effective than traditional strategies. Metadata management (MDM) proves to be fragile in practice because the overhead of collecting and maintaining the metadata is far greater than the benefit of doing so. Extract, transform, and load (ETL) strategies are commonplace for data warehouse (DW) efforts, but they are in effect band-aids that do nothing to fix data quality problems at the source.
  2. Evolve data assets. Several categories of data prove to be true assets over the long term: test data, which is used to support your testing efforts; reference data, also called lookup data, which describes relatively static entities such as states/provinces, product categories, or lines of business; master data, which is critical to your business, such as customer or supplier data; and metadata, which is data about data. Traditional data management tends to be reasonably good at this, although it can be heavy-handed at times and may lack the configuration management discipline that is common within the agile community.
  3. Ensure data security. This is a very important aspect of security in general. The fundamental issue is to ensure that people get access only to the information that they should have, and that information is not available to people who shouldn’t have it. Data security must be addressed at both the virtual and physical levels.
  4. Specify data structures. At the enterprise level your models should be high level: lean thinking is that the more complex something is, the less detailed your models should be to describe it. This is why, in most cases, it is better to have a high-level conceptual model than a detailed enterprise data model (EDM). Detailed models, such as physical data models (PDMs), are often needed by delivery teams for specific legacy data sources.
  5. Refactor legacy data sources. Database refactoring is a key technique for safely improving the quality of your production databases. Although delivery teams perform the short-term work of implementing a refactoring, there is organizational work to be done: communicating the refactoring, monitoring usage of the deprecated schema, and eventually removing the deprecated schema and any scaffolding required to implement the refactoring.
  6. Govern data.  Data, and the activities surrounding it, should be governed within your organization.  Data governance is part of your overall IT governance efforts.
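The fifth factor describes a database refactoring that unfolds over a transition period. The following is a minimal sketch, assuming SQLite and a hypothetical `customer` table whose `fname` column is being renamed to `first_name`: the delivery team adds the new column plus scaffolding triggers that keep both columns synchronized, and once the deprecated column is no longer used the scaffolding and the old column are removed. Triggers are one common scaffolding choice, not the only one; the table, column, and trigger names are illustrative assumptions, and monitoring usage of the deprecated column is not shown.

```python
import sqlite3

# Hypothetical Rename Column refactoring: customer.fname -> customer.first_name.

# Delivery team: add the new column, back-fill it, and install scaffolding
# triggers that keep both columns in sync during the transition period, so
# applications that still use the deprecated column keep working.
START_REFACTORING = """
ALTER TABLE customer ADD COLUMN first_name TEXT;
UPDATE customer SET first_name = fname;

CREATE TRIGGER customer_sync_insert AFTER INSERT ON customer
BEGIN
    UPDATE customer
       SET first_name = COALESCE(NEW.first_name, NEW.fname),
           fname      = COALESCE(NEW.fname, NEW.first_name)
     WHERE customer_id = NEW.customer_id;
END;

-- The WHEN clauses stop the two update triggers from ping-ponging forever.
CREATE TRIGGER customer_sync_from_old AFTER UPDATE OF fname ON customer
WHEN NEW.fname IS NOT OLD.fname
BEGIN
    UPDATE customer SET first_name = NEW.fname
     WHERE customer_id = NEW.customer_id;
END;

CREATE TRIGGER customer_sync_from_new AFTER UPDATE OF first_name ON customer
WHEN NEW.first_name IS NOT OLD.first_name
BEGIN
    UPDATE customer SET fname = NEW.first_name
     WHERE customer_id = NEW.customer_id;
END;
"""

# Data management team, once the transition period ends and monitoring shows
# the deprecated column is unused: remove the scaffolding, then the old column.
# ALTER TABLE ... DROP COLUMN requires SQLite 3.35.0 or later.
COMPLETE_REFACTORING = """
DROP TRIGGER customer_sync_insert;
DROP TRIGGER customer_sync_from_old;
DROP TRIGGER customer_sync_from_new;
ALTER TABLE customer DROP COLUMN fname;
"""


def demo_transition_period() -> sqlite3.Connection:
    """Show legacy and updated applications coexisting during the transition."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, fname TEXT);"
        "INSERT INTO customer VALUES (1, 'Ann');"
    )
    conn.executescript(START_REFACTORING)
    # A legacy application still writes only the deprecated column...
    conn.execute("INSERT INTO customer (customer_id, fname) VALUES (2, 'Bob')")
    # ...while an updated application writes only the new column.
    conn.execute("UPDATE customer SET first_name = 'Annie' WHERE customer_id = 1")
    return conn


if __name__ == "__main__":
    conn = demo_transition_period()
    query = "SELECT customer_id, fname, first_name FROM customer ORDER BY customer_id"
    for row in conn.execute(query):
        print(row)
```

During the transition both columns hold the same values regardless of which one an application writes, which is what lets the deprecated schema be retired on the organization's schedule rather than in a single coordinated release.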

Looking at the diagram above, traditional data management professionals may believe that some activities are missing.  These activities may include:

  • Enterprise data architecture. This is addressed by the Enterprise Architecture process blade. The DA philosophy is to optimize the whole. When data architecture (or security architecture, or network architecture, or…) is split out from EA, it tends to be locally optimized and, as a result, does not fit well with the rest of the architectural vision.
  • Operational database administration.  This is addressed by the Operations process blade, once again to optimize the operational whole over locally optimizing the “data part.”


External Workflow With Other IT Teams


This section is a work in progress.


Internal Workflow

This section is a work in progress.