This posting, the latest in a series focused on a disciplined agile approach to data management, overviews the activities that a disciplined agile data management team may perform. The Disciplined Agile Delivery (DAD) framework promotes an adaptive, context-sensitive strategy for data management. DAD does this via its goal-driven approach, which indicates the process factors you need to consider, a range of techniques or strategies for addressing each process factor, and the advantages and disadvantages of each technique. In this posting we present the goal diagram for the Data Management process blade and overview its process factors.
The following diagram overviews the potential activities associated with disciplined agile data management.
The process factors that you need to consider for data management are:
- Improve data quality. There is a range of strategies that you can adopt to ensure data quality. The agile community has developed concrete quality techniques – in particular database testing, continuous database integration, and database refactoring – that prove more effective than traditional strategies. Meta data management (MDM) proves to be fragile in practice, as the overhead of collecting and maintaining the meta data proves to be far greater than the benefit of doing so. Extract, transform, and load (ETL) strategies are commonplace for data warehouse (DW) efforts, but they are in effect band-aids that do nothing to fix data quality problems at the source.
- Evolve data assets. There are several categories of data that prove to be true assets over the long term: test data, which is used to support your testing efforts; reference data, also called lookup data, which describes relatively static entities such as states/provinces, product categories, or lines of business; master data, which is critical to your business, such as customer or supplier data; and meta data, which is data about data. Traditional data management tends to be reasonably good at this, although it can be heavy-handed at times and may lack the configuration management discipline that is common within the agile community.
- Ensure data security. This is a very important aspect of security in general. The fundamental issue is to ensure that people get access to only the information that they should and that information is not available to people who shouldn’t have it. Data security must be addressed at both the virtual and physical levels.
- Specify data structures. At the enterprise level your models should be high level – lean thinking is that the more complex something is, the less detailed your models should be to describe it. This is why it is better to have a high-level conceptual model than a detailed enterprise data model (EDM) in most cases. Delivery teams often need detailed models, such as physical data models (PDMs), for specific legacy data sources.
- Refactor legacy data sources. Database refactoring is a key technique for safely improving the quality of your production databases. While delivery teams perform the short-term work of implementing the refactoring, there is organizational work to be done to communicate the refactoring, monitor usage of the deprecated schema, and eventually remove the deprecated schema and any scaffolding required to implement the refactoring.
- Govern data. Data, and the activities surrounding it, should be governed within your organization. Data governance is part of your overall IT governance efforts.
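The two phases of work described under "Refactor legacy data sources" can be made concrete with a small sketch. The example below is only an illustration, assuming SQLite through Python's standard `sqlite3` module and a hypothetical legacy table `cust` that is renamed to `customer`. The table, column, and function names are invented for this example and do not come from the DAD framework. A compatibility view plays the role of the scaffolding that keeps legacy readers working until the deprecated schema is removed.

```python
import sqlite3


def apply_rename_table(conn: sqlite3.Connection) -> None:
    """Short-term delivery work: rename the table, leave scaffolding behind."""
    conn.execute("ALTER TABLE cust RENAME TO customer")
    # Scaffolding (hypothetical): a read-only view under the old name so
    # legacy queries against `cust` keep working during the transition.
    conn.execute("CREATE VIEW cust AS SELECT id, name FROM customer")


def retire_deprecated_schema(conn: sqlite3.Connection) -> None:
    """Follow-up work after the transition period: remove the scaffolding."""
    conn.execute("DROP VIEW IF EXISTS cust")


# Set up a throwaway in-memory database with the legacy schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO cust (name) VALUES (?)", [("Ada",), ("Grace",)])

apply_rename_table(conn)

# During the transition, both the old and the new name return the same rows.
legacy_rows = conn.execute("SELECT name FROM cust ORDER BY id").fetchall()
new_rows = conn.execute("SELECT name FROM customer ORDER BY id").fetchall()

retire_deprecated_schema(conn)

# After retirement, only the new table remains in the schema catalog.
remaining = [
    row[0] for row in conn.execute("SELECT name FROM sqlite_master ORDER BY name")
]
```

The gap between the two function calls stands in for the transition period, during which the refactoring is communicated and usage of the deprecated name is monitored. That monitoring depends on your database and tooling and is deliberately left out of this sketch.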
Looking at the diagram above, traditional data management professionals may believe that some activities are missing. These activities may include:
- Enterprise data architecture. This is addressed by the Enterprise Architecture process blade. The DA philosophy is to optimize the whole. When data architecture (or security architecture, or network architecture, or…) is split out from EA it often tends to be locally optimized and as a result does not fit well with the rest of the architectural vision.
- Operational database administration. This is addressed by the Operations process blade, once again to optimize the operational whole over locally optimizing the “data part.”
Future blog postings in this series will explore the workflow associated with data management.
- The Data Management process blade
- DevOps Strategies: Data Management
- Disciplined Agile DW/BI: Artifact Creation
- Disciplined Agile DW/BI Teams
- The Enterprise Architecture process blade
- Why Data Management?