There’s been a lot of talk about scalability recently, some emerging technologies even use it as a differentiator against the established big players…but what constitutes a truly scalable solution and why does it matter?
I’ll start with the latter first…if a method or solution or database or tool is not scalable to meet all the challenges of the modern business…it is like trying to dig a motorway with a teaspoon…you start with small holes and you pretty much stay there, never really completing the entire process or getting to the value proposition in enough volume that matters.
Although business may thank you for plugging a tiny hole in the metaphorical water dyke of their data problems, the water is still leaking out from many other places…in floods and torrents…your tiny hole plugging exercise is pretty much meaningless in the bigger picture.
The data industry has survived so far on a “Divide and Conquer” approach with disparate disciplines and niche solutions all doing their own thing…but things like Digital Transformation and GDPR are now on the scene and being discussed in the boardrooms…and these things need a holistic approach…they need scalability in all directions.
Even Scalability itself has been dissected and divided…with some data people touting the ability to chew through petabytes of data 5 times quicker than anybody else (though why you’d want to do that often defeats me…has anybody heard of “less is more”!).
But then I’m reminded of the Exit Poll in the UK 2019 General Election…its prediction was so incredibly close to what actually happened, and was based on merely 19607 face to face chats…did that need petabytes? No it didn’t…just the right focus and scope.
Now I’m no Data Scientist…but there must be a point where increasing sample size bears no additional value…so is such huge record munching scalability just a symptom that we’ve got too much data that we know not enough about? Is Big Data merely a plaster on a gaping wound that could be better treated with a couple of well positioned stitches?
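To put a rough number on that gut feeling, here is a minimal sketch. It assumes a simple random sample and the textbook 95% margin of error formula for a proportion…a real exit poll is clustered by polling station, so its true error will differ. The function name is my own and purely illustrative.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion.

    Assumes simple random sampling; proportion=0.5 is the worst case.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# The error shrinks with the square root of the sample size,
# so 100 times more data only buys 10 times less error.
for n in (1_000, 19_607, 1_960_700):
    print(f"n={n:>9,}  margin of error = +/-{margin_of_error(n) * 100:.2f}%")
```

Under those assumptions, 19607 responses already give an error of roughly 0.7 percentage points, and a hundred times more data would only bring that down to about 0.07…which is the "less is more" point in numbers.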
I’ve got a gut feeling that it is…so let’s get down to what scalability truly should mean in the context of Enterprise Information Management…
Yes, of course, a method, platform or tool should be able to handle the volume and throughput of data needed to both operate and steer a business forwards…but it certainly shouldn’t be an excuse to process ever bigger quantities of data…just because it can.
Ideally, it should encourage a “less is more” mindset based around business need and focused information insights, processes and storage.
Businesses can “buy” strategic insights and trends from industry and academic researchers very possibly more cost effectively than finding them out themselves via wading through petabytes of Big Data…and when you find them, they will most likely involve such a huge and cross functional transformation to deliver them that realising the value takes too much investment and too much time.
What business can get quite easily is a thousand small insights from the operational data they already have, through the tribal knowledge they already possess within their workforce and management teams. Such small nuggets can be more easily shaped, incorporated locally and evolved over time…this is natural business evolution.
So basically, volume isn’t everything!
So then there is…
Semantic Scalability
This area is certainly flavour of the month at the moment!
Be careful though, because there is a big difference between having loads of semantic metadata and having scalable semantics…if care is not taken here, you end up with your business semantics being just as hard to manage as your business data!
This is one of the areas of scalability where effort to reduce your overhead and maximise reuse is the key to success…and the banishment of ambiguity is the ultimate goal here.
Focus should be on business semantics over technology semantics…your business only has one set of business semantics, a relatively small set in fact, but your technology landscape can have tens if not hundreds of lexicons…
So beware methods and tools that base semantic naming and meaning primarily on technology applications and database models…these sources generally lack the context, clarity, consistency and verbose detail needed to form the basis of a business semantics catalogue. Use business SMEs instead and provide them with the methods and tools to describe the semantics of their business areas directly…ideally, it is then up to the technology to adopt this and not vice versa.
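As a rough illustration of "business first, technology adopts", here is a minimal sketch of a catalogue where a term must be defined by a business SME before any system can map its own field names onto it. Every name here (BusinessTerm, define_term, adopt_term, the systems and field names) is hypothetical and not taken from any real tool.

```python
from dataclasses import dataclass, field

@dataclass
class BusinessTerm:
    name: str
    definition: str
    steward: str  # the business SME accountable for the meaning
    technical_aliases: dict = field(default_factory=dict)  # system -> field name

catalogue = {}

def define_term(name, definition, steward):
    """The business describes the term once, in its own words."""
    catalogue[name] = BusinessTerm(name, definition, steward)
    return catalogue[name]

def adopt_term(name, system, field_name):
    """A technology maps its own field onto an existing business term."""
    if name not in catalogue:
        raise KeyError(f"'{name}' has not been defined by the business yet")
    catalogue[name].technical_aliases[system] = field_name

# Hypothetical example data: one business term, two technology lexicons.
define_term("Customer", "A party that has bought at least one product", "Sales SME")
adopt_term("Customer", "CRM", "ACCT_NM")
adopt_term("Customer", "Billing", "cust_name")
print(catalogue["Customer"].technical_aliases)
```

The design choice is the direction of the dependency…one small set of business terms, with as many technology lexicons as you like hanging off each one, rather than the other way round.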
Separation (Roles and Concerns) Scalability
Believe it or not, Data People cannot solve all this on their own. However, take a closer look at most methods, platforms and tools in the marketplace and this is pretty much what they assume happens…they expose all their features without much consideration of separation of roles and concerns.
Quite often, this means that well-meaning data practitioners assume too much responsibility and take too much on themselves, becoming a bottleneck to wider adoption…and in turn, never really getting data management and governance out of the data team and into the business.
In many ways, this is because business sees this as an IT problem…for data experts to drive forwards…and not a business problem for data practitioners to steer and the business to drive…so data methods, platforms and tools have been designed on that basis.
This is a systemic issue…data is a people business and therefore all roles must participate, each adding their own piece to the enterprise puzzle…so look for solutions that do not assume all features = one user or role, but instead are able to define who does what, how and when, and tailor roles and access accordingly.
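A minimal sketch of what "who does what" could look like, assuming a simple mapping from roles to permitted actions. The role and action names are invented for illustration only.

```python
# Hypothetical roles and the actions each one is allowed to perform.
ROLE_PERMISSIONS = {
    "business_sme": {"define_term", "approve_definition"},
    "data_steward": {"raise_quality_issue", "resolve_quality_issue"},
    "data_architect": {"edit_model", "publish_baseline"},
    "reader": set(),
}

def visible_features(roles):
    """Union of the actions granted by every role a person holds."""
    allowed = set()
    for role in roles:
        allowed |= ROLE_PERMISSIONS.get(role, set())
    return allowed

def can(roles, action):
    """True if any of the person's roles grants the action."""
    return action in visible_features(roles)

print(sorted(visible_features(["business_sme", "data_steward"])))
print(can(["reader"], "edit_model"))
```

Each participant sees only their own piece of the puzzle, and nobody needs the whole feature set just to contribute one definition or fix one issue.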
Federation (Organisation and Outcomes) Scalability
Following on from separation of concerns, and closely linked to it, is the ability to then disperse activities and outcomes across the entire business. This is purely the appropriate allocation of time and effort to the individuals most accountable and closest to the data.
Data governance and quality management are key aspects to consider here…continuous business as usual activities that take a small amount of an individual’s time when dispersed widely across as many participants as possible.
To achieve this, such tasks must be focussed, interconnected, within the capability, skill, knowledge and accountability of the individual and must have clear, consistent and measurable outcomes. So clearly, data practitioners will only have to play a small part in this overall process…this is an enterprise scale workflow problem.
Scalable methods, tools and platforms should therefore be able to demonstrate the ability for mass federation and workflow across diverse business and technology functions.
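As a rough sketch of federation, assume each data set has one accountable owner and every quality issue is routed straight to that owner, with the data team only as a fallback. The owners, data sets and issues below are made up for illustration.

```python
from collections import Counter

# Hypothetical accountable owner for each data set.
DATA_OWNERS = {"customer": "alice", "product": "bob", "invoice": "carol"}

def route_issues(issues, owners, fallback="data_team"):
    """Assign each (data_set, issue) pair to the owner accountable for that data set."""
    return [(issue, owners.get(data_set, fallback)) for data_set, issue in issues]

issues = [
    ("customer", "missing postcode"),
    ("product", "duplicate SKU"),
    ("invoice", "negative total"),
    ("customer", "invalid email"),
    ("supplier", "no owner recorded"),  # unowned data set falls back to the data team
]

assignments = route_issues(issues, DATA_OWNERS)
workload = Counter(owner for _, owner in assignments)
print(workload)
```

The more owners you register, the thinner the workload is spread…and the data team ends up handling only what nobody is yet accountable for.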
Have you ever seen a method or tool that is great with small amounts of content…but falls apart or becomes unusable when loaded with enterprise scope volumes? Of course you have! Unfortunately, it seems to be the norm rather than the exception.
There are two main aspects here…the key is visual focus and dexterity.
Firstly, and linked with federation and separation of concerns, the visuals should only show the scope of the individual’s involvement and nothing that is irrelevant to them.
Secondly, the format of those visuals should reflect the amount of information needed to complete the task at hand, and provide multiple viewpoints of the same data based on audience and outcomes needed.
In other words, one shape or size fits all does not work at enterprise scale…look out for close harmony between visuals and needs and assess whether a method, platform or tool is clearly designed for diverse enterprise scale use…many come from a single user, small problem mentality.
The business world changes constantly, so methods, platforms and tools should clearly support all enterprise information management and governance aspects of business transformations.
One might expect that, in this day and age, there is no excuse for a transformation endeavour not to start from a recorded architectural baseline and deliver the next baseline as a matter of course…but for this to happen, such information landscape management must be seamlessly intertwined with transformational change.
Many methods, platforms and tools are point solutions…they describe transformation end points only, are weak on version control, change lifecycle and delivery management and therefore virtually useless to a transformation endeavour during detailed design stages.
The key here though, is to stop transformations making the enterprise information landscape more fragmented and disparate by engaging throughout the entire change process. Semantic consistency and many other data problems can be fixed by design…this is not just Privacy by Design and Default…this is EIM by Design and Default…architected not just developed.
So look out for information landscape management features and the ability to roll out a method, platform or tool across business transformations as well as business as usual activities. Look out for robust version control and lifecycle management.
Many data solutions, especially in the master data management space focus on a particular information domain or industry vertical, for example, Product Management or Telecommunications.
These are quite attractive to the businesses or business functions they serve…they feel targeted, focussed…instantly relevant (easy for people to sell internally)…as long as your business can fit into their interpretation of what such things should cover and do of course.
These are generally “least common denominator” solutions…they implement a core of the domain that everybody, hopefully, can have a consensus on. You can extend them normally, but is this defeating the point a little? They should be fitting your business, not vice versa…these things are never, ever turnkey implementations.
Information domains are also, so often, so interconnected that by separating one or two away, you lose some of the synergy of those inter-relationships…this is a trade-off most of the time…what you end up with if you split these things up is many methods and tools not playing well with each other…basically, the very functional silos we are trying to break down, but now under the name of data management…this is why so many fail to deliver value.
Domain agnostic methods and tools come with fewer of these restrictions, but take a little more effort to both sell internally and realise their value. On the whole though, going domain agnostic provides a clearer path to a holistic information architecture and treats all domains equally…so if you have a complex business landscape, weigh up these pros and cons carefully to avoid expensive and time consuming mistakes that will discourage future sponsorship of enterprise wide solutions.
Large businesses employ a myriad of technologies…many flavours of databases, software stacks, integration and data structure styles. You can quickly see when a method or tool is biased towards a particular set of these to the detriment of all others. This is more common than you may think and is a symptom of the preferences and skillsets of the designers of the tools themselves.
Relational thinking is the most prevalent out there in the marketplace and quite often steers the design and features available to fit that theory…but what about everything else in the information landscape?
Scalability in this area comes from being totally technology agnostic…the ability to cope with any style and structure as data is captured, processed, stored, moved and monetised. For example, recently a single information flow we modelled had Oracle Forms, Oracle DB, XML over MQ, SQL Server DB, .NET, HTML, JMS, XML over remote procedure call, SAP BAPI, SAP Function Modules, SAP Tables and SAP UIs…I’m sure this is not unusual.
So beware and watch out for technology limitations that will not cover your full technology portfolio.
Providing full access even just to a handful of “Data People” for the market leading vendors is shockingly expensive most of the time. Letting hundreds if not thousands of users make changes is pretty much beyond what most, even large, businesses will pay for what is seen generally as an IT tool. Many tools offer free “Read Only” access to try and offset this challenge, but that’s diverting the problem away from licensing models that don’t scale economically past a handful of full access users. Data People can’t do everything, there simply aren’t enough of them…
Look to get to the bottom of unlimited update license costs and then see how the options on the market really compare.
Any data technology or method that requires complex and rare skills to work is heading for a heap of long term trouble. It can take years for a new coding language or database technology, for example, to fully embed into general acceptance and use. Niche and emerging skills are also expensive to buy…this is simple supply and demand economics.
Of course, this is where the IT hype machine kicks in and sells on the “getting left behind with legacy stuff coz this is new and better” message. This is rarely true immediately, more of an ambition much of the time based on perceived benefit and the sales machine smoothing over these kinds of negative messages.
The equation is simple, commonly known technology = cheap and widely available resources.
Remove any one of these scalability aspects and your method, framework or platform will eventually hit a hard wall of scope, usability and value creation. This is purely mathematics in action, complex businesses need scalability in all of these directions. Small problems are easily solved in desktop tools like Excel and Visio, but such end user tools will not scale well to enterprise level.
A truly scalable solution takes you to the next level, beyond the world of the single scenario into the entropy of the full enterprise…and the above are measures that you should use together to determine what real enterprise scope scalability is on offer…