Any organisation looking to truly optimise its big data stack knows it has enemies stacked against it, throwing blockages in the way of its DataOps and DevOps teams. It can feel like climbing a mountain: the fabled golden city of uptime and peak efficiency always beyond another crag, around another boulder.
It’s a tough area, particularly when the IT team and the wider business have conflicting ideas about how that time should be spent. Yet it’s a fantastic challenge for those who like nothing better than overcoming whatever block business, technology, or chance throws into their path.
For the DevOps and DataOps teams operationalising big data and keeping models and services in production, certain blocks loom larger than the rest. Here’s how teams can look at these blockages with determination rather than fear.
Let’s get it out of the way first, because when the business talks analytics and big data, the number one, front-of-mind issue is volume. It’s one of the Vs of big data, and it’s part of why we call it ‘big’ data.
The volume of data can be a challenge. Expanding gigabytes, terabytes, and petabytes need some ‘tin’ to store them in. Everyone knows that the digital society is producing great quantities of data, and the volume is rising day on day from digital transactions and records, the internet of things (IoT), and the ever-greater digitalisation and ‘chips-with-everything’ mentality that’s connecting so many new categories of device each year. Even that sentence was high in word volume.
And so, as data volumes rise, enterprises continue to leverage this resource to create new business value. The trend fuels a plethora of new applications spanning an alphabet soup of acronyms (ETL, AI, IoT, ML), aimed at many different business drivers.
As these applications are deployed on new data platforms, they need an application performance management solution to meet robust enterprise production requirements.
Then there are the systems that begin to creak, or fall over entirely, when data volumes push at their technical boundaries. These are often solutions like relational database management systems, or statistics or visualisation software, many of which struggle to manage truly big data.
Data applications (data consumers) don’t exist in isolation from the underlying big data stack. Endlessly hunting for more storage space, reconfiguring clusters, and ensuring that databases stay optimised is no mean feat for the DevOps and DataOps teams.
Applications are threaded together from many different systems (ETL, Spark, MapReduce, Kafka, Hive, and so on), and how the stack performs has a direct impact on downstream consumers. Managing these applications is highly complex and requires an end-to-end solution, especially to meet SLAs.
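To make the SLA point concrete, here is a minimal sketch of the kind of end-to-end check such a solution performs: comparing each pipeline stage’s run time against its agreed threshold. The stage names, durations, and SLA values are all illustrative assumptions, not taken from any particular product.

```python
from datetime import timedelta

# Hypothetical timings for one run of a multi-stage pipeline.
stage_durations = {
    "etl_ingest": timedelta(minutes=42),
    "spark_transform": timedelta(minutes=95),
    "hive_load": timedelta(minutes=20),
}

# Hypothetical per-stage SLA thresholds.
stage_slas = {
    "etl_ingest": timedelta(minutes=60),
    "spark_transform": timedelta(minutes=90),
    "hive_load": timedelta(minutes=30),
}

def sla_breaches(durations, slas):
    """Return the names of stages whose run time exceeded their SLA."""
    return [name for name, took in durations.items() if took > slas[name]]

print(sla_breaches(stage_durations, stage_slas))  # -> ['spark_transform']
```

A real monitoring setup would pull these timings from scheduler or cluster metrics rather than hard-coded values, but the core comparison is the same: every stage is checked, because a breach anywhere ripples downstream.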
Stitch those silos together
The barriers and blocks that get in the way of a strong big data stack and a good analytics process go deeper, almost inevitably encompassing the silos most organisations have built up as data pools in departments and teams.
Each department guards its information and its narrowly optimised processes, so the business grows a complex data estate of co-mingled assets. Each silo is a barrier to a single version of the truth and to a strong analytic process that encompasses the whole organisation. As is said in the context of customer service, ‘it’s easier when there’s one throat to choke’.
In fact, merging data sources and cataloguing data (best-practice steps for busting those silos) also help combat another block.
The data pipeline is only as good as its weakest link. Unexplained run-time problems in your applications often occur because one part of the analytics pipeline has changed, moved, been reconfigured, or is starved of compute resources.
Troubleshooting is time-intensive and complex, and configuration variables are a pain to parse through. When aiming for perfection, the wise DataOps person knows that ‘perfect is the enemy of good’.
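One common first step in that troubleshooting grind is simply working out what changed. A small sketch of configuration-drift detection between a known-good snapshot and the current settings is shown below; the Spark-style property names are examples only.

```python
def config_drift(baseline: dict, current: dict) -> dict:
    """Report keys added, removed, or changed between two configuration
    snapshots -- a quick way to spot the reconfiguration behind an
    unexplained run-time problem."""
    added = {k: current[k] for k in current.keys() - baseline.keys()}
    removed = {k: baseline[k] for k in baseline.keys() - current.keys()}
    changed = {
        k: (baseline[k], current[k])
        for k in baseline.keys() & current.keys()
        if baseline[k] != current[k]
    }
    return {"added": added, "removed": removed, "changed": changed}

# Illustrative snapshots: someone halved executor memory and enabled
# dynamic allocation since the baseline was captured.
baseline = {"spark.executor.memory": "4g", "spark.sql.shuffle.partitions": "200"}
current = {
    "spark.executor.memory": "2g",
    "spark.sql.shuffle.partitions": "200",
    "spark.dynamicAllocation.enabled": "true",
}

drift = config_drift(baseline, current)
```

Diffing snapshots like this won’t explain a failure on its own, but it narrows hundreds of configuration variables down to the handful that actually moved.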
The automata, your friend
Automation is one big data block-busting ally. The DataOps team has some big fish to fry, and only so many pairs of hands with which to work. In fact, there’s a real shortage of good big data architects and cloud specialists.
Morgan McKinley, the recruiters, published an IT salary guide showing soaring demand and a massive shortage of available talent in emerging technologies and specialist fields. The dearth of talent has really driven up industry salaries.
A cloud architect could expect £100,000 to £130,000, given the right industry experience on top of their technology acumen.
Given the lack of freely available talent, it makes sense to safeguard the teams you have and let them use their skills to best effect, automating the routine parts of their roles so they can focus on higher-skill work.
In case you were wondering, the hottest data technologies are still SQL, R, Python, Hadoop within Data Science, and Kafka, Scala, Spark underpinned with Java within Data Engineering – according to Morgan McKinley.
Giving the existing DataOps and analytical troubleshooting talent a hand with a strong application performance management solution will help them bust through blockages, more easily troubleshooting and optimising every aspect of the big data stack.
Not only does it make sense to optimise the process for better, faster business results; given the strength of the DataOps team’s negotiating power amid the current skills shortage, it’s also wise to keep them happy and allow them to enjoy their jobs, rather than spending all day debugging and chasing their own tails.
By Kunal Agarwal, CEO, Unravel Data