
Wednesday, 27 November 2013

Bill Inmon agrees with Aussie-based Scotsman!

Big ambitions for Big Data? Be prepared for Big Problems...

I was excited to be able to attend the recent Enterprise IQ Future Enterprise summit held in Sydney. The event was up to Enterprise IQ's usual high standard, with a mix of great keynote speakers, breakout sessions, discussions and exhibitors.

Martin Rennhackkamp has already posted an excellent overall summary of the general proceedings (saving me the trouble, thanks Martin!), so in one respect it only remains for me to offer congratulations and thanks to Daniel McMurray and the team for having me involved.

However, I can't let the event pass without exploring one aspect of the conference in more detail. Bill Inmon's keynote speech was of course much anticipated, and I would be pretty surprised if anyone reading this blog was unaware of Bill's work and his impact on IT and Information Management. (If you've been under a rock for the last twenty-odd years, Bill is generally heralded as "The Father of Data Warehousing", and together with the contributions of Ralph Kimball, his seminal works have laid the foundations for an entire industry and thousands of careers, including my own.)

Now, I had never seen Bill present before, so I don't know if his contribution was typical or if he was just having a particularly "engaged" day (jet-lag probably contributes!). But I was totally unprepared for, and blown away by, the strength of opinion that he offered on the current wave of "Big Data".

Boy oh boy, did he launch into one! I think it's fair to say that Bill Inmon is not a fan of what is currently going on in the Information Management sector, and of Big Data in particular. It was certainly gratifying to find that most of Bill's points aligned closely with my own views on Big Data (obviously he's a smart fellow...).

Observations that resonated with me included:

  • The current technologies are generally not living up to the hype.
  • "Big Data" vendors are not engaging with (and don't understand) business problems.
  • Data processing methods based on programming-intensive techniques (Hadoop/MapReduce etc.) are not extensible or flexible enough.
  • The dependencies on "Data Scientists" are unsustainable and not scalable. Where are all these in-demand gurus coming from? 
  • There is an invalid assumption within semantic and natural language processing that context can be inferred from the words alone. 
For me, it was this last point that was most revelatory. 

Bill highlighted that while "traditional" data warehousing approaches create and imply context and meaning for the data by means of a structured data model, the "Big Data" approach does not impose such structure and a lot more work is required in order to contextualise the data, perform text disambiguation and make it usable (machine readable). 

Aspects that need to be considered in the data preparation and parsing steps are:
  • Defined taxonomies and ontologies.
  • Homographic resolution (words that are written the same, but have different meanings).
  • Deriving meaning of terms based on their textual proximity to other items.
  • Document metadata.
  • Acronym resolution.
  • Inference of additional or missing information from surrounding content.
  • Interpretation and decryption of encoded data streams.
Bill is clearly of the view that these functions can be systemised, configured and made repeatable. Indeed, they must be moved away from custom data processing and into encoded, re-usable tools and data products if we are to really start harnessing the benefits that Big Data promises. This is the direction that Bill is taking with his company Forest Rim and their Textual ETL tool. 
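To make a couple of the steps above a little more concrete (acronym resolution, and deriving the meaning of a term from its textual proximity to other words), here is a minimal Python sketch of the idea that such rules can be configured once and re-used rather than hand-coded for each analysis. It assumes a small, hand-built glossary: the acronyms, expansions and cue words are purely hypothetical examples, and this is only an illustration, not how Forest Rim's Textual ETL actually works.

```python
import re

# Hypothetical glossary (a tiny "defined taxonomy"): each acronym maps to its
# candidate expansions, and each expansion to cue words that suggest that sense.
ACRONYMS = {
    "ETL": {"extract, transform, load": {"warehouse", "data", "pipeline"}},
    "PR": {
        "public relations": {"media", "press", "campaign"},
        "pull request": {"code", "merge", "review"},
    },
}


def tokenize(text):
    """Split text into alphabetic tokens, preserving case."""
    return re.findall(r"[A-Za-z]+", text)


def resolve(term, tokens, index, window=5):
    """Pick the sense of `term` whose cue words appear most often within
    `window` tokens either side of position `index`.
    Returns None when no cue word is nearby (context is insufficient)."""
    senses = ACRONYMS.get(term)
    if not senses:
        return None
    lo = max(0, index - window)
    nearby = {t.lower() for t in tokens[lo:index] + tokens[index + 1:index + 1 + window]}
    best, best_score = None, 0
    for sense, cues in senses.items():
        score = len(cues & nearby)
        if score > best_score:
            best, best_score = sense, score
    return best


def annotate(text, window=5):
    """Return (acronym, resolved sense or None) for each known acronym in text."""
    tokens = tokenize(text)
    return [(tok, resolve(tok, tokens, i, window))
            for i, tok in enumerate(tokens) if tok in ACRONYMS]


if __name__ == "__main__":
    print(annotate("Please review the PR before we merge the code"))
    print(annotate("The PR team issued a press release to the media"))
```

The same token "PR" resolves to "pull request" in the first sentence and "public relations" in the second, purely from the surrounding words; where no cue word is nearby, the sketch deliberately returns None rather than guessing, which is exactly the point about context not being inferable from the word alone.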

I await further developments with anticipation. Given his track record of predicting market trends ahead of the curve, I guess I'll return the favour and say that on this occasion, I agree with Bill's point of view...!

My thanks also to Neil Currie and his team at QFire Software for inviting me to be a guest at their post-conference networking event. Over a quiet drink or two, I was delighted to have the opportunity to meet Bill in person, enjoy panoramic views of Sydney Harbour and exchange a few views and opinions (mostly on the relative merits of living in Sydney versus Bill's home of Colorado!).

Tuesday, 12 November 2013

To Centralise or Not To Centralise, That is the Question...

Which Data Governance model is right for me?

There’s still much debate about whether Data Governance should be centralised or not. 

Thinking back to the Data Quality AsiaPacific event in March, Nonna Milmeister at Telstra made the point that, as far as she’s concerned, there’s little interest in the “where does it sit” discussion. I broadly agree with Nonna: the hierarchical placement of the team may be something of a red herring with respect to the core Data Governance function, which exists to facilitate the overall flow of information within the organisation. (See also my post “What Do The Simple Folk Do”…)

However, when it comes to the overall organisational operating models for Data Governance, the question is somewhat different. It’s not so much about the hierarchical situation of the core team; it’s more to do with the overall approach to bringing the various governance groups together in a cross-functional, enterprise-wide approach. I suggest that rather than being a hierarchical or functional issue, it will be the social and cultural characteristics of an organisation that will be the deciding factors in determining which approach to adopt.

Each organisation will of course have its own dynamics, cultural constraints and behavioural norms that influence the way the business runs; these are often not formally recognised or dealt with, even in organisations that have well-documented Mission or Value Statements. However, I have identified three broad categories of organisational cultural model that I think have an overarching influence on the approach to establishing a Data Governance environment:

1. Centralised IM Governance
Each business unit may operate separately, but there is significant commonality across the whole business lifecycle. Customers, product lines and service channels are inter-related and cross-business effectiveness is enhanced by close co-operation.
Approaches & processes for Information Management and Data Governance need to be held in common.
One set of controls and policies is put in place for Information Management throughout the organisation.

GOOD FOR: Hierarchical organisations where there needs to be a significant amount of information sharing between departments and business units. This is typical in organisations where information sharing and re-use is required at the detailed level.

EXAMPLES: Banking, Telecoms, Retailing

2. Federated IM Governance
Each unit operates autonomously and may have very different approaches & processes. However, each executes the same overall functionality & responsibilities as the others. There are shared guiding principles & objectives for Information Management.

GOOD FOR: Geographically diverse organisations with a loose hierarchy. Information sharing is appropriate at a high level (themes, approaches, learning). This would suit organisations where significant differences in the operational environment exist, but where the overall high-level objectives are held in common.

EXAMPLES: Universities, Healthcare authorities, Social Welfare programmes.

3. Distributed IM Governance
In the distributed model, there is little or no commonality within business units, customer base and product lines, with each operating fully autonomously.

GOOD FOR: Organisations that are operationally diverse and functionally independent, with little or no need to share information between silos.

EXAMPLES: Fast-Moving Consumer Goods businesses operating widely differing product markets (e.g. Cosmetics, Foodstuffs and Cleaning products); government departments with diverse portfolios.

Clearly these are very generalised models, but in my experience, any organisation will fall into one of these categories of cultural behaviour. It then becomes part of the Data Governance function’s role to identify the most suitable model and match any initiatives to fit within the cultural norms that apply. (Beware trying to enforce one approach on an organisation that has another social structure!)

Does your organisation fit into one of these three cultural orders? Will mapping these behavioural norms help you to identify the best approach to rolling out Data Governance capability within your company? Or are there other types of environment that would require a different approach? Please let me know your views.

(See also Share the Love... of Data Quality for some thoughts on a distributed approach to more operational aspects of data governance & data quality management.)