As businesses use more data with every passing day, new practices and disciplines are needed to improve the coordination between the analysis of that data and the overall operation of the enterprise.
This practice is known as DataOps (Data Operations), and it is now of essential value to businesses. We talked with experts in the industry to shed light on this ever-growing topic and on why DataOps matters.
What is DataOps?
DataOps is an emerging methodology that intertwines DevOps teams with data engineer and data scientist roles so as to provide the necessary tools, processes, and structures to support data-focused organizations.
Simon Trewin, DataOps transformation expert and thought leader at Kinaesis, describes DataOps as the art of driving data-driven projects in a continuous integration and continuous deployment cycle. This enables teams to build solutions incrementally and fast, potentially fail fast, and get those solutions in front of stakeholders to help guide the direction and to build collaboration and momentum.
By carrying out a structured DataOps process, projects can maintain momentum through changing requirements, which are likely, and through multiple phases and releases. A good DataOps methodology enables a project team to establish a process that maintains the velocity of the release train.
Neill Cain, Lead Software Engineer at Craneware, adds that DataOps is foremost about how you as an organization enable the democratization of data – how you capture datasets across the enterprise, how you communicate the governance of that data and how people discover and use that data.
Thereafter, it becomes somewhat similar to DevOps [at least the ops side] in that DataOps focuses on the efficient and cost-effective delivery of that data to consumers.
The difference between DevOps and DataOps, Simon continues, is that DataOps incorporates DevOps processes but extends them to cater to the specific nuances of data:
- data is continually changing; therefore, you need to build patterns to manage this,
- data is owned by the business, therefore it is crucial to build collaboration,
- data models need to be understood to structure data pipelines and support iteration,
- requirements processes need to capture additional elements that software engineers do not need to consider.
The tooling is varied, and each tool has performance characteristics that suit it to specific problems. Those problems can change, so flexibility is key.
Neill also points out that DevOps is a blend of tooling, culture, and working practices leveraged to deliver software efficiently combined with an agile and metrics-driven approach to incrementally enhancing that software.
The business value of DataOps
DataOps brings a step change in the productivity of your data and analytics projects, in line with the changes that DevOps brought.
Indeed, as Simon states, all businesses are keen to be data-driven and to use the latest AI and ML tools and techniques. Much of this is experimental and evolving; therefore, it is important to be able to iterate fast, learn fast, and be prepared to test theories and throw them away.
DataOps gives you the agility and the flexibility to move quickly with controls in place. It lets you establish reusability and consistency around knowledge and information, so that you can support decisions and make the right choices rapidly enough to stay ahead of the competition.
Businesses adopt DataOps to remain competitive, Neill underlines. Every organization wishes to deliver quickly to customers as well as evaluate a hypothesis without having to traverse layer upon layer of bureaucracy and red tape.
Moreover, DataOps will position data in front of the business efficiently and effectively. Hence, as Simon notes, DataOps will:
- Be responsive to their needs,
- Enable them to experiment and gain insight from trusted sources of data to prototype theories rapidly and then transition them into IT processes,
- Provide them with metadata alongside their data to provide transparency and trust,
- Help break down silos,
- Build collaboration,
- Significantly reduce costs.
Whilst it would be a challenge to demonstrate a direct correlation between adopting DataOps practices and the bottom line, Neill continues, it stands to reason that introducing agility into the delivery of data, together with data democratization initiatives and data discovery, will get you to market quicker and let you answer that all-important question: is this of value to our customers?
If it is, then the business should prosper and be a viable alternative to companies that do not work this way.
According to Neill, the best DataOps practices are automation; the cataloging of data assets; applying Agile principles to the process and to the supporting apps, infrastructure, etc. around the delivery of the data; and finally, enabling data discovery, and therefore self-serve access, while adding layers of governance as necessary. The key is to make the data governance crystal clear, i.e., who the data owner is and how you request access to the data.
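To make the cataloging and governance point concrete, here is a minimal sketch of a data catalog in Python. It is an illustration only, not a tool Neill describes: the `CatalogEntry` and `DataCatalog` names, the dataset, the email address, and the ticket queue are all hypothetical. The one idea it tries to capture is that an asset cannot be registered without stating who owns it and how to request access.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One data asset plus the governance facts a consumer needs (hypothetical schema)."""
    name: str
    description: str
    owner: str           # who the data owner is
    access_request: str  # how you request access to the data
    tags: tuple = ()


class DataCatalog:
    """In-memory catalog; a real one would sit on a database or a catalog product."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Refuse assets whose governance is not spelled out.
        if not entry.owner or not entry.access_request:
            raise ValueError(f"{entry.name}: owner and access_request are required")
        self._entries[entry.name] = entry

    def search(self, term):
        """Self-serve discovery: match on name, description, or an exact tag."""
        term = term.lower()
        matches = (
            e for e in self._entries.values()
            if term in e.name.lower()
            or term in e.description.lower()
            or any(term == t.lower() for t in e.tags)
        )
        return sorted(matches, key=lambda e: e.name)


catalog = DataCatalog()
catalog.register(CatalogEntry(
    name="claims_monthly",
    description="Monthly aggregated claims by provider",
    owner="finance-data@example.com",
    access_request="Raise a ticket in the (hypothetical) DATA-ACCESS queue",
    tags=("finance", "claims"),
))

for entry in catalog.search("claims"):
    print(f"{entry.name}: owner={entry.owner}; access: {entry.access_request}")
```

Anyone searching the catalog sees the owner and the access route next to the dataset itself, which is the "crystal clear" governance the paragraph above asks for.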
In order to have the best DataOps practices, Simon adds that you need to:
- Define the use case and vision effectively,
- Instrument your data pipeline all the way along,
- Capture the right metadata,
- Build extensibility into your platforms to cope with changing requirements,
- Be able to build collaboration,
- Govern the projects in a lean and efficient way that adds value to the customer.
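As a rough illustration of instrumenting a pipeline "all the way along" and capturing metadata, the Python sketch below wraps every step of a toy pipeline and records row counts, timing, and a timestamp for each one. The function names, the sample data, and the choice of metadata fields are assumptions made for this example; which metadata is "right" depends on the use case.

```python
import time
from datetime import datetime, timezone


def run_pipeline(rows, steps):
    """Run each step in order and record metadata about every step."""
    run_log = []
    for step in steps:
        started = time.perf_counter()
        rows_in = len(rows)
        rows = step(rows)
        run_log.append({
            "step": step.__name__,
            "rows_in": rows_in,
            "rows_out": len(rows),
            "seconds": round(time.perf_counter() - started, 6),
            "finished_at": datetime.now(timezone.utc).isoformat(),
        })
    return rows, run_log


# Two hypothetical steps: a filter and a unit conversion.
def drop_missing_amounts(rows):
    return [r for r in rows if r.get("amount") is not None]


def convert_pence_to_pounds(rows):
    return [{**r, "amount": r["amount"] / 100} for r in rows]


raw = [{"id": 1, "amount": 1250}, {"id": 2, "amount": None}, {"id": 3, "amount": 400}]
clean, run_log = run_pipeline(raw, [drop_missing_amounts, convert_pence_to_pounds])

for record in run_log:
    print(record["step"], record["rows_in"], "->", record["rows_out"])
```

Because the log is produced by the runner rather than by each step, adding a new step to the list extends the pipeline without any extra instrumentation work, and a drop in `rows_out` immediately shows where records were lost.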
Firstly, Simon states, you need to start with a good candidate project and establish what works for you as an organization. You need to bring in experts from the start, either to guide you or to help you get things off the ground, and also to educate your teams on how to deliver effectively. Then, you need to build out your tooling and frameworks to support you, much like with DevOps. Build a thin slice of functionality front to back. Learn from it and then extend out.
‘Start small, build incrementally’, Neill emphasizes. You have to first establish a candidate pilot project. Then, you have to ensure there is diversity in project team members and ensure that you have at least one key stakeholder involved.
The benefits of DataOps…
One of the main advantages of DataOps is the facilitation of conversations around leveraging the company’s data assets, Neill points out.
Another is breaking down any so-called data silos in the company and applying the same tools and approaches to all data. Uniformity comes with multiple benefits: efficiency, and a reduction in code rot and in archaic tools that cannot be supported, such as ETL tooling built in-house by a developer who left two years ago, which everyone is terrified of touching.
Another example is an industry-recognized technology that was over-engineered by one person on their own PC. That person has since left, and the company cannot figure out how it works, how to deploy it, or how to support it, so it has literally kept the former employee’s workstation running for years!
Simon also lists several benefits of DataOps. It:
- Lets you test out models and analysis in a sandbox environment,
- Builds collaboration across business and IT teams,
- Speeds up the cadence of your releases,
- Reduces the side effects of changes,
- Empowers the IT teams and the business teams,
- Helps align responsibilities around data with the right people,
- Makes the teams more productive.
…and the challenges
For Neill, organizations that wish to implement DataOps can be impeded by various data governance policies, and their data ETL processes and technologies can be siloed in disparate teams.
For instance, suppose you have a question you want to ask of your data, but you cannot access the data because it is “owned” by a centralized team, such as a dedicated BI team. You have to ask that team to ask the question for you, which immediately puts you at the mercy of someone else’s work schedule and load: what if they are too busy to perform the task or, even worse, on annual leave for a week?
According to Simon, here are some of the biggest challenges of DataOps:
- Cultural change: it is new and therefore different from what people know,
- It is not zero cost to set it up,
- The tooling, support, and knowledge are not 100% consistent, and some of the practices lack maturity,
- Small data gets in the way: data processes start with ingestion from sources and normally end with SMEs making adjustments at the end of the process. Without full visibility of this, it is not possible to establish accuracy and consistency,
- The skills gap: there is a significant knowledge gap in the market and a lack of educational resources to upskill people.
The future of DataOps
Simon believes that DataOps will, like DevOps, become the standard for data and analytics teams.
In the future, the tooling and the support for DataOps will improve and the methods will become more standardized and refined. The approaches will become mainstream. A new generation of data scientists and data engineers will accept it as the way to do things, and thus, this will drive the data-driven world forward rapidly and effectively.
Neill thinks that the forecast growth in data volumes will warrant automating data ingestion and analysis. Hence, this is where AI/ML solutions will be developed: to do the heavy lifting that users cannot perform with point-and-click devices and keyboards!
Besides, organizations will also diversify across solutions and build custom code to “glue” them together. There are no silver bullets or multi-purpose tools that can accommodate all the needs of an organization, so be prepared to stitch multiple solutions together. Finally, user interfaces will need to evolve to keep pace with the advancing backend technologies. This is vital because a large proportion of data users and experts cannot drive a CLI or write exploratory code, so the UI components will need to enable data discovery and exploration.
Special thanks to Simon Trewin and Neill Cain for their insights!