Tuesday, 6 March 2018

Machine learning could unlock the power of self-driving data centers

For Ben Treynor Sloss, Google’s VP of engineering, the data center of the future will not only benefit from the use of machine learning, but will be run by AI.

Sloss pointed to the significant cost savings gleaned from Google’s own DeepMind machine learning system, which was instrumental in running the technology giant’s data center in 2016.

The DeepMind system significantly improved the power efficiency of the data center by adjusting how servers were run and how power and cooling equipment operated. Reductions in the energy used for cooling reached 40%, and if similar systems were rolled out across all of Google's data centers globally, the saving could add up to tens of millions of dollars each year.
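Google has not published the DeepMind controller itself, but the underlying idea — learn a model of facility efficiency from sensor data, then pick the cooling setpoint that minimises predicted energy use — can be sketched. Everything below is hypothetical: the toy surrogate model, its coefficients and the candidate setpoints all stand in for a regressor trained on real sensor logs.

```python
# Hypothetical sketch: choose a cooling setpoint that minimises predicted PUE
# (power usage effectiveness). predicted_pue() is a stand-in for a learned model.

def predicted_pue(server_load, outside_temp_c, setpoint_c):
    """Toy surrogate: PUE worsens as the setpoint strays from an (assumed)
    load- and weather-dependent sweet spot."""
    sweet_spot = 18.0 + 0.05 * outside_temp_c + 2.0 * server_load
    return 1.1 + 0.002 * (setpoint_c - sweet_spot) ** 2

def best_setpoint(server_load, outside_temp_c, candidates):
    # Grid search over candidate setpoints, as a controller might each interval.
    return min(candidates, key=lambda s: predicted_pue(server_load, outside_temp_c, s))

candidates = [c / 2 for c in range(32, 57)]   # 16.0 °C .. 28.0 °C in 0.5 °C steps
chosen = best_setpoint(server_load=0.6, outside_temp_c=20.0, candidates=candidates)
print(round(chosen, 1))                        # → 20.0
```

A real controller would retrain the model continuously and constrain its choices to safe operating ranges; the sketch only shows the optimise-over-a-learned-model loop.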

For Alex Robbio, cofounder and president of Belatrix Software, the potential of machine learning and artificial intelligence goes beyond power management. Robbio has long nurtured an interest in neural networks and machine learning, and within Belatrix he has created a research group to examine the application of different AI and machine learning frameworks to customer programmes. As much as 95% of Belatrix's 500-odd engineers end up working in some kind of data center for the company's customers, so the application of machine learning in these environments is of particular interest.

Robbio can imagine a scenario where, like the promise of self-driving cars, we will have self-driving data centers that are operating self-driving computers that are operating self-driving software. He says that while Google’s use of DeepMind to optimise power management is exciting, there are many other applications of machine learning and AI in the data center environment that we are going to see soon.

Artia Moghbel, head of operations and director of communications at DFINITY Networks, says that AI is already being used in data centers to reduce the cost of operation by identifying inefficiencies and opportunities for cost savings, particularly in relation to data center operation variables such as cooling fans, windows, and so on.

How might machine learning in the data center work in practice?

The next step, then, is the use of AI to manage the other physical aspects of running a data center. “We could see robotics managing the physical network at a data center. If you have a faulty machine that is doing something in the network because it is infected by a virus, for example, usually software is spun out to notify the operator that something needs to be done,” Robbio says, adding that AI could play this role.

“Software does not have the capability to act on those situations, but with machine learning you could get into the proactive or reactive action where the data center uncouples that device from the network automatically without the need for human intervention.”
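The reactive loop Robbio describes can be sketched as a simple anomaly detector: score each device's behaviour against its own learned baseline and uncouple outliers automatically. The metric, the z-score threshold and the `quarantine()` stub below are all invented for illustration.

```python
# Hypothetical sketch: flag a device whose traffic sits far outside its own
# history, then "uncouple" it without human intervention.
from statistics import mean, stdev

def is_anomalous(history, latest, z_threshold=4.0):
    """True if `latest` is more than z_threshold standard deviations
    from the device's historical mean."""
    if len(history) < 10:
        return False               # not enough data to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > z_threshold

def quarantine(device_id):
    # Stand-in for the real action: an SDN rule change or, as Robbio
    # imagines, a robot physically unplugging the network card.
    print(f"uncoupled {device_id} from the network")

baseline = [100, 98, 103, 99, 101, 97, 102, 100, 99, 101]  # packets/s, fabricated
if is_anomalous(baseline, latest=5000):
    quarantine("rack7-server12")
```

A production system would learn a richer baseline than a single mean and deviation, but the shape of the loop — observe, score against learned behaviour, act — is the point.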

He adds that robotics does not necessarily imply a humanoid robot like those often shown in science fiction. “It is about the physical action of plugging or unplugging a network card, not a complex robot,” Robbio says.

Jerry Melnick, president and CEO of SIOS Technology, which creates machine learning IT analytics tools, adds that virtual environments have rapidly become the heart of many data centers, running companies' most important business applications. Melnick says this trend has outpaced the conventional approach to data center management and optimisation. Most IT teams are still organised along traditional physical silos of compute, storage, application and network, using a combination of manual know-how and tools that rely on single-dimensional, threshold-based analysis.


“Today, virtual environments are simply too complex and dynamic for humans to manage with this approach. Without an effective way to see across the virtual infrastructure silos and into the interactions between components, IT is blindsided by performance issues, capacity over-runs and other unexpected consequences,” Melnick says. This is where AI comes in. Advanced machine learning and deep learning analytic tools solve this problem by identifying the root cause of application performance issues in virtual environments instantaneously, and by recommending specific steps to resolve them.

“They look across the IT silos and learn the complex patterns of behaviour between interrelated components in the virtual ecosystem. This allows them to identify even complex, subtle interactions, such as ‘noisy neighbour’ scenarios where the slow performance on one VM may be caused by the behaviour of other VMs that share resources. Most importantly, advanced machine learning analytics tools can predict when performance issues will arise, based on the past behaviour,” he says.

Longer term, Robbio believes we could have fully automated data centers. “You could have your supply of servers and devices, put in and out by robots controlled by machine learning algorithms that can proactively handle the maintenance of the data center.” This could represent a significant cost saving, since such maintenance is very expensive.

The other area where Robbio sees potential for machine learning is the logical side. As machine learning evolves, decision making will become part of the functionality of software deployed to maintain data centers. “Most of software is very rules-based with no ability to make decisions based on past patterns, but with machine learning there is the potential to apply more fuzzy logic and identify possible patterns and make decisions on that basis,” Robbio says.

Dinesh Nirmal, VP of analytics development at IBM, agrees. “The future is about transforming the data center to being self-optimising, self-managing—where things like root cause analysis and predicting/reducing equipment hotspots is automatic.” Nirmal believes that, in the future, the data center should be able to operate by itself. “The AI should know what patch should go on, when to apply, for example. It should be able to conduct anomaly detection, monitoring how you look at unexpected machine failures.”

Moghbel says that, while the industry’s use of AI in data centers is still nascent, its application in next-generation data centers, such as the decentralised cloud, may prove to be essential. For instance, in a decentralised cloud computer, decisions about how information is stored and regulated may fall into the hands of human decision-makers, who may not all agree on whether new code should be implemented to improve the efficiency of the network.

“With a system that uses AI and self-governance like Dfinity, these types of decisions are governed by algorithms that ensure the cohesiveness of the platform and operational updates are executed in a timely and efficient manner throughout the network,” Moghbel says.

Implementing artificial intelligence and machine learning in the data center is not without challenges, however, even at the most basic level of deployment. It is not technology, though, that is the barrier. Rather, Robbio says, it is the quantity, quality and nature of the available data.

“The challenge is more collecting and making sense of the data because you need so much data to make such applications work. Not everything is instrumented, not everything has logging capabilities, there are no common standards, so it is a matter of getting it all together, cleansing it and putting it in a format that can be learned by machine learning algorithms,” he says.
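The "getting it all together" step Robbio describes is essentially a normalisation layer: devices from different vendors report the same quantity under different field names and units, and everything must be mapped onto one schema before any learning can happen. The record shapes and field names below are invented for illustration.

```python
# Hypothetical sketch: map heterogeneous sensor records onto one common
# schema (temperature in °C), dropping records with no usable signal.

def normalise(record):
    if "temp_f" in record:                       # vendor A logs Fahrenheit
        celsius = (record["temp_f"] - 32) * 5 / 9
        return {"device": record["id"], "temp_c": round(celsius, 1)}
    if "temperature" in record:                  # vendor B logs Celsius
        return {"device": record["device_name"], "temp_c": record["temperature"]}
    return None                                  # not instrumented: drop it

raw = [
    {"id": "crac-1", "temp_f": 68.0},
    {"device_name": "rack-09", "temperature": 21.5},
    {"id": "pdu-3"},                             # no temperature logged at all
]
clean = [r for r in (normalise(rec) for rec in raw) if r is not None]
print(clean)   # → [{'device': 'crac-1', 'temp_c': 20.0}, {'device': 'rack-09', 'temp_c': 21.5}]
```

The real work, as Robbio notes, is that there are no common standards: each new device type adds another branch to this mapping, and gaps in instrumentation simply leave holes in the training data.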

Does machine learning in the data center create a talent shortage?

For those looking to implement machine learning in the data center context, another stumbling block may be talent. “Getting the data is one thing, getting the people to work with the data is another,” Robbio cautions, explaining that machine learning and AI are very different from what traditional developers have learned to do. “Developers tend to think in a rules-based mindset, while machine learning requires a different thought process. Getting people with that kind of mindset and experience is incredibly hard. That is why you are seeing companies like Google that are trying to train all their developers in machine learning.”

Michael Elliott, cloud strategy and evangelist at NetApp, says that the deployment of machine learning and AI should be treated like any other business automation program. Executives should ask: What are you ultimately trying to gain from the insight?

Elliott adds: “The C-suite should also consider the optimal storage infrastructure for their business and what they want to deliver to their customers. Once established, keeping data management and protection top of mind to maintain control while some of the day-to-day hurdles are minimised with the help of machine learning and AI advancements will maintain data governance for risk mitigation.”

Nirmal warns that moving to AI is not an easy transition. “There’s a tremendous amount of testing you have to do. The better prepared you are to take on the task, the more smoothly it will go and the cost of a data center failure is much greater than the cost of implementing AI or machine learning. But it all starts with understanding what you want to get done.”

https://www.networkworld.com
