Wednesday, 29 January 2020

Hybrid cloud management requires new tools, skills

Hybrid cloud environments can deliver an array of benefits, but in many enterprises, they're becoming increasingly complex and difficult to manage. To cope, adopters typically turn to some type of management software. What soon becomes apparent, however, is that hybrid cloud management tools can be as complex and confounding as the environments they're designed to support.

A hybrid cloud typically includes a mix of computing, storage and other services. The environment is formed by a combination of on-premises infrastructure resources, private cloud services, and one or more public cloud offerings, such as Amazon Web Services (AWS) or Microsoft Azure, as well as orchestration among the various platforms.

Any organization contemplating a hybrid cloud deployment should begin building a transition framework at the earliest possible stage. "The biggest decision is what data and which applications should be on-premises due to the sensitivity of data, and what goes into the cloud," says Umesh Padval, a partner at venture capital firm Thomvest Ventures.

Numerous other issues also need to be sorted out at the start, including the ultimate destination of lower priority, yet still critical, data and applications. Will they be kept on premises forever or migrated at some point into the cloud? With applications and data scattered, security is another major concern. Operational factors and costs also need to be addressed at the very beginning. "Your email application may run great in your data center, but may operate differently in the cloud," Padval notes.

Hybrid cloud tools immature yet evolving
A complex hybrid cloud requires constant oversight as well as a way to intuitively and effectively manage an array of operations, including network performance, workload management, security and cost control. Not surprisingly, given the large number of management tasks needed to run an efficient and reliable hybrid cloud environment, adopters can select from a rapidly growing array of management tools.

"There’s a dizzying array of options from vendors, and it can be difficult to sort through them all," says R. Leigh Henning, principal network architect for data center operator Markley Group. "Vendors don’t always do the best job at making their differentiators clear, and a lot of time and effort is wasted as a result of this confusion. Companies are getting bogged down in an opaque field of choices."

The current hybrid cloud management market is both immature and evolving, declares Paul Miller, vice president of hybrid cloud at Hewlett Packard Enterprise. Vendors are still getting a handle on the types of management tools their customers need. "Offerings are limited and may not be supported across all public, on-premises and edges," Miller adds.

Perhaps the biggest challenge to hybrid cloud management is that the technology adds new, complex and frequently discordant layers to operations management. "Many solutions have compatibility restrictions on the components they can manage, locking your management platform into a vendor or group of vendors, which may or may not align with your current or future system architecture," warns George Burns III, senior consultant of cloud operations for IT professional services firm SPR.

A lack of standardized APIs, which in turn results in a shortage of standardized management tools, presents another adoption challenge. "The lack of standardized tools increases operational complexity through the creation of multiple incongruent tools; this leads to vendor lock-in and, in some cases, gross inefficiencies in terms of resource utilization," explains Vipin Jain, CTO of Pensando, a software-defined services platform developer. "To make it worse, these kinds of problems are typically 'solved' by adding another layer of software, which further increases complexity, reduces debuggability, and results in suboptimal use of features and resources."

Meanwhile, using standardized open-source tools can be an effective starting point to safeguard against compatibility issues. "Cloud Native Computing Foundation (CNCF) tools, such as Kubernetes and Prometheus, are good examples," Jain says. "Open-source tools from HashiCorp, such as Vault, Vagrant, Packer, and Terraform, [provide] a good normalization layer for multi-cloud and hybrid cloud deployments, but they are by no means sufficient," he notes. Ideally, the leading public cloud vendors would all agree on a standardized set of APIs that the rest of the industry could then follow. "Standardization can be a moving target, but it's critical from an efficiency and customer satisfaction perspective," Jain says.

Developers writing API configurations, as well as developers using API configurations, form a symbiotic relationship that should be mutually maintained, Burns advises. "Hardware vendors need to be open about changes and enhancements coming to their products and how that will affect their APIs," he explains. "Equally, management platform developers need to be mindful of changes to hardware platform APIs, [and] regularly participate in testing releases and provide adequate feedback to the vendor about results and functionality."

Prioritize management requirements; expect gaps
Even when everything works right, there are often gaps remaining between intended and actual management functionality. "In an ideal world, developers would have the perfect lab environments that would allow them to successfully test each product implementation, allowing functionality to be seamless across upgrades," Burns observes. "Unfortunately, we can’t expect everything to function perfectly and cannot forgo [on-site] testing."

When selecting a hybrid cloud management platform, it's important to not only be aware of its documented limitations, but also to know that nothing is certain until it's tested in its user's own hybrid cloud environment, Burns advises. "Gaps will exist, but it's ultimately your responsibility to fully identify and verify those gaps in your own environment," he says.

Further muddling the situation is the fact that many management tool packages are designed to supply multiple functions, which can make product selection difficult and confusing. "To simplify, customers need to consider which features are most important to them based on their use cases and can show a quick return on investment, mapping to their specific cloud journey," Miller explains.

Real-world experience with hybrid cloud management
Despite management challenges, most hybrid cloud adopters find a way to get their environment to function effectively, reliably and securely.

Gavin Burris, senior project leader, research computing, at the Wharton School of the University of Pennsylvania, appreciates the flexibility a hybrid cloud provides. "We have a small cluster ... that's generally available to all the faculty and PhD students," he notes. The school's hybrid environment supports a fair share prioritization scheme, which ensures that all users have access to the resources they need to support their work. "When they need more, they're able to request their own dedicated job queue that's run in the cloud," he says.

Burris, who uses Univa management products, says that having a management tool that allows fast and easy changes is perfect for individuals who like to maintain firm control over their hybrid environment. "I like to do things with scripting and automation, so to be able to go in and write my own rules and policies and build my own cluster with these management tools is really what I’m looking for," he explains.

James McGibney, senior director of cybersecurity and compliance at Rosendin Electric, an electrical contractor headquartered in San Jose, Calif., relies on a hybrid cloud to support a variety of essential operations. "Approximately two years ago we embarked on our journey from an on-premises disaster recovery, quality assurance and production environment to a cloud migration encompassing hundreds of terabytes of data," he says. McGibney relies on a management console provided by AWS and VMware. The tool meets his current needs, but like many hybrid cloud administrators, he's keeping a close eye on industry developments. "We're currently investigating [other] options, just to see what’s out there," he says. Yet he doesn't expect to make any changes in the short term. "We're happy with the tools currently provided by AWS and VMware."

Sharpen network skills for hybrid cloud
Selecting a hybrid cloud management platform is not as simple as purchasing software and spinning up some VMs to run it. "During implementation, ensure that you have selected the proper product owners and engineers, and then determine what, if any, additional education or credentials they will need to effectively deploy and maintain the platform," Burns suggests. "Fully define your architecture, ensure buy-in from your staff, work with them to identify education gaps and create a solid operational plan for going forward."

Most hybrid cloud management tasks focus on configuration and access control operations, which tend to be both complex and challenging to implement. "At the same time, the beauty of the cloud is its ability to automate," says Mike Lamberg, vice president and CISO at ION Group and its Openlink unit, which provides risk management, operations and finance software. Yet deploying a high level of automation also requires new skills and developers who can expertly handle the demands of virtual software-defined infrastructures as well as traditional environments. "We can’t assume that because teams can build applications in physical data centers that these skills will translate as they move to the cloud; new skills are required for success," Lamberg notes.

Hybrid cloud management requires a new team mindset. "IT networking staff literally need to unlearn what they know about physical networks and connectivity and recognize that the moving of packets and data is now handled by a forwarding software configuration, not by physical routers or switches," Lamberg says. "You can’t take what you did in building and supporting physical data centers and just apply it to the cloud—it simply doesn’t work."

In the big picture, transitioning to a hybrid cloud environment can solve many problems, yet it can also create some new obstacles if not properly implemented and managed. "Don't rush into any decision without considering all the points of impact that you can identify," Burns advises. "Make sure that you understand the breadth of a hybrid infrastructure and how it will be used to address business needs."

Tuesday, 21 January 2020

Deep learning vs. machine learning: Understand the differences

Machine learning and deep learning are both forms of artificial intelligence. You can also say, correctly, that deep learning is a specific kind of machine learning. Both machine learning and deep learning start with training and test data and a model and go through an optimization process to find the weights that make the model best fit the data. Both can handle numeric (regression) and non-numeric (classification) problems, although there are several application areas, such as object recognition and language translation, where deep learning models tend to produce better fits than machine learning models.

Machine learning explained
Machine learning algorithms are often divided into supervised (the training data are tagged with the answers) and unsupervised (any labels that may exist are not shown to the training algorithm). Supervised machine learning problems are further divided into classification (predicting non-numeric answers, such as whether a mortgage payment will be missed) and regression (predicting numeric answers, such as the number of widgets that will sell next month in your Manhattan store).

Unsupervised learning is further divided into clustering (finding groups of similar objects, such as running shoes, walking shoes, and dress shoes), association (finding common sequences of objects, such as coffee and cream), and dimensionality reduction (projection, feature selection, and feature extraction).

Classification algorithms
A classification problem is a supervised learning problem that asks for a choice between two or more classes, usually providing probabilities for each class. Leaving out neural networks and deep learning, which require a much higher level of computing resources, the most common algorithms are Naive Bayes, Decision Tree, Logistic Regression, K-Nearest Neighbors, and Support Vector Machine (SVM). You can also use ensemble methods (combinations of models), such as Random Forest, other Bagging methods, and boosting methods such as AdaBoost and XGBoost.
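
As a simplified illustration (the synthetic data set and parameter choices below are invented for the example), a scikit-learn classifier can be trained and asked for class probabilities in a few lines:

```python
# Minimal classification sketch with scikit-learn; the synthetic data is illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)  # an ensemble (bagging) method
clf.fit(X_train, y_train)

print(clf.predict_proba(X_test[:5]))  # per-class probabilities for five held-out rows
print(clf.score(X_test, y_test))      # accuracy on the test set
```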

Regression algorithms
A regression problem is a supervised learning problem that asks the model to predict a number. The simplest and fastest algorithm is linear (least squares) regression, but you shouldn’t stop there, because it often gives you a mediocre result. Other common machine learning regression algorithms (short of neural networks) include Naive Bayes, Decision Tree, K-Nearest Neighbors, LVQ (Learning Vector Quantization), LARS Lasso, Elastic Net, Random Forest, AdaBoost, and XGBoost. You’ll notice that there is some overlap between machine learning algorithms for regression and classification.
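
A comparable sketch for regression, again on invented data, shows why it pays to try more than plain least squares; the cross-validated R-squared scores make the comparison direct:

```python
# Compare linear regression with a tree ensemble on synthetic data (illustrative only).
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=1000, n_features=10, noise=15.0, random_state=0)

for model in (LinearRegression(), RandomForestRegressor(n_estimators=100, random_state=0)):
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()  # 5-fold cross-validated R^2
    print(type(model).__name__, round(r2, 3))
```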

Clustering algorithms
A clustering problem is an unsupervised learning problem that asks the model to find groups of similar data points. The most popular algorithm is K-Means Clustering; others include Mean-Shift Clustering, DBSCAN (Density-Based Spatial Clustering of Applications with Noise), GMM (Gaussian Mixture Models), and HAC (Hierarchical Agglomerative Clustering).
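
A minimal K-Means sketch on synthetic data (the numbers of clusters and samples are arbitrary choices for the example) looks like this:

```python
# K-Means clustering on synthetic blobs (illustrative only).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)  # coordinates of the three group centers
print(kmeans.labels_[:10])      # cluster assignment for the first ten points
```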

Dimensionality reduction algorithms
Dimensionality reduction is an unsupervised learning problem that asks the model to drop or combine variables that have little or no effect on the result. This is often used in combination with classification or regression. Dimensionality reduction algorithms include removing variables with many missing values, removing variables with low variance, Decision Tree, Random Forest, removing or combining variables with high correlation, Backward Feature Elimination, Forward Feature Selection, Factor Analysis, and PCA (Principal Component Analysis).
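
For instance, a quick PCA sketch on the built-in iris data set (chosen only for convenience) projects four correlated measurements onto two principal components:

```python
# Reduce the 4-feature iris data to its top two principal components (illustrative only).
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (150, 2): four columns reduced to two
print(pca.explained_variance_ratio_)  # share of variance each component retains
```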

Optimization methods
Training and evaluation turn supervised learning algorithms into models by optimizing their parameter weights to find the set of values that best matches the ground truth of your data. The algorithms often rely on variants of steepest descent for their optimizers, for example stochastic gradient descent, which is essentially steepest descent performed multiple times from randomized starting points.

Common refinements on stochastic gradient descent add factors that correct the direction of the gradient based on momentum, or adjust the learning rate based on progress from one pass through the data (called an epoch) to the next.
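
A bare-bones NumPy sketch, on made-up data, shows mini-batch stochastic gradient descent with a momentum correction fitting a simple linear model:

```python
# Mini-batch SGD with momentum for a linear model (NumPy sketch; the data is made up).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=1000)

w, velocity = np.zeros(3), np.zeros(3)
lr, momentum, batch_size = 0.05, 0.9, 32

for epoch in range(20):                       # one shuffled pass through the data per epoch
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        grad = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)  # gradient of the squared error
        velocity = momentum * velocity - lr * grad              # momentum correction
        w += velocity

print(w)  # should land close to true_w
```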

Data cleaning for machine learning
There is no such thing as clean data in the wild. To be useful for machine learning, data must be aggressively filtered. For example, you’ll want to:

  • Look at the data and exclude any columns that have a lot of missing data.
  • Look at the data again and pick the columns you want to use (feature selection) for your prediction. This is something you may want to vary when you iterate.
  • Exclude any rows that still have missing data in the remaining columns.
  • Correct obvious typos and merge equivalent answers. For example, U.S., US, USA, and America should be merged into a single category.
  • Exclude rows that have data that is out of range. For example, if you’re analyzing taxi trips within New York City, you’ll want to filter out rows with pickup or drop-off latitudes and longitudes that are outside the bounding box of the metropolitan area.

There is a lot more you can do, but it will depend on the data collected. This can be tedious, but if you set up a data cleaning step in your machine learning pipeline you can modify and repeat it at will.
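
A pandas version of the steps above might look like the following sketch; the file name, column names, and bounding box are hypothetical stand-ins:

```python
# Pandas sketch of the cleaning steps described above (hypothetical file and column names).
import pandas as pd

df = pd.read_csv("taxi_trips.csv")

df = df.loc[:, df.isna().mean() < 0.5]                    # drop columns that are mostly missing
df = df[["pickup_lat", "pickup_lon", "fare", "country"]]  # pick the features you want to use
df = df.dropna()                                          # drop rows still missing values

df["country"] = df["country"].replace(                    # merge equivalent answers
    {"U.S.": "USA", "US": "USA", "America": "USA"})

in_nyc = df["pickup_lat"].between(40.4, 41.0) & df["pickup_lon"].between(-74.3, -73.6)
df = df[in_nyc]                                           # exclude out-of-range coordinates
```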

Data encoding and normalization for machine learning
To use categorical data for machine classification, you need to encode the text labels into another form. There are two common encodings.

One is label encoding, which means that each text label value is replaced with a number. The other is one-hot encoding, which means that each text label value is turned into a column with a binary value (1 or 0). Most machine learning frameworks have functions that do the conversion for you. In general, one-hot encoding is preferred, as label encoding can sometimes confuse the machine learning algorithm into thinking that the encoded column is supposed to be an ordered list.
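
A small sketch of both encodings, using a made-up column of color labels:

```python
# Label encoding vs. one-hot encoding of a small categorical column (illustrative only).
import pandas as pd
from sklearn.preprocessing import LabelEncoder

colors = pd.Series(["red", "green", "blue", "green"])

print(LabelEncoder().fit_transform(colors))    # label encoding: [2 1 0 1]
print(pd.get_dummies(colors, prefix="color"))  # one-hot encoding: three 0/1 columns
```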

To use numeric data for machine regression, you usually need to normalize the data. Otherwise, the numbers with larger ranges might tend to dominate the Euclidean distance between feature vectors, their effects could be magnified at the expense of the other fields, and the steepest descent optimization might have difficulty converging. There are a number of ways to normalize and standardize data for machine learning, including min-max normalization, mean normalization, standardization, and scaling to unit length. This process is often called feature scaling.
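
Two of the scalers in scikit-learn illustrate the idea on a toy matrix (the values are invented):

```python
# Two common feature-scaling approaches applied to a toy matrix (illustrative only).
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 600.0]])

print(MinMaxScaler().fit_transform(X))    # min-max normalization: each column rescaled to [0, 1]
print(StandardScaler().fit_transform(X))  # standardization: zero mean, unit variance per column
```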

Feature engineering for machine learning
A feature is an individual measurable property or characteristic of a phenomenon being observed. The concept of a “feature” is related to that of an explanatory variable, which is used in statistical techniques such as linear regression. Feature vectors combine all the features for a single row into a numerical vector.

Part of the art of choosing features is to pick a minimum set of independent variables that explain the problem. If two variables are highly correlated, either they need to be combined into a single feature, or one should be dropped. Sometimes people perform principal component analysis to convert correlated variables into a set of linearly uncorrelated variables.

Some of the transformations that people use to construct new features or reduce the dimensionality of feature vectors are simple. For example, subtract Year of Birth from Year of Death and you construct Age at Death, which is a prime independent variable for lifetime and mortality analysis. In other cases, feature construction may not be so obvious.
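
The Age at Death example translates directly into code; the column names here are hypothetical:

```python
# Constructing the Age at Death feature described above (hypothetical column names).
import pandas as pd

df = pd.DataFrame({"year_of_birth": [1890, 1912, 1931],
                   "year_of_death": [1965, 1980, 2004]})
df["age_at_death"] = df["year_of_death"] - df["year_of_birth"]  # new feature: 75, 68, 73
```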

Splitting data for machine learning
The usual practice for supervised machine learning is to split the data set into subsets for training, validation, and test. One way of working is to assign 80% of the data to the training data set, and 10% each to the validation and test data sets. (The exact split is a matter of preference.) The bulk of the training is done against the training data set, and prediction is done against the validation data set at the end of every epoch.

The errors in the validation data set can be used to identify stopping criteria, or to drive hyperparameter tuning. Most importantly, the errors in the validation data set can help you find out whether the model has overfit the training data.

Prediction against the test data set is typically done on the final model. If the test data set was never used for training, it is sometimes called the holdout data set.

There are several other schemes for splitting the data. One common technique, cross-validation, involves repeatedly splitting the full data set into a training data set and a validation data set; in k-fold cross-validation, the data is shuffled and divided into k folds, and each fold takes a turn as the validation set while the remaining folds are used for training.
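
A sketch of an 80/10/10 split plus k-fold cross-validation with scikit-learn, on synthetic data, might look like this:

```python
# An 80/10/10 split plus 5-fold cross-validation with scikit-learn (illustrative only).
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.2, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# 5-fold cross-validation: every row of the training data serves in a validation fold exactly once.
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X_train):
    pass  # fit on X_train[train_idx], evaluate on X_train[val_idx]
```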

Machine learning libraries
In Python, Spark MLlib and Scikit-learn are excellent choices for machine learning libraries. In R, some machine learning package options are caret, randomForest, e1071, and kernlab. In Java, good choices include Java-ML, RapidMiner, and Weka.

Deep learning explained
Deep learning is a form of machine learning in which the model being trained has more than one hidden layer between the input and the output. In most discussions, deep learning means using deep neural networks. There are, however, a few algorithms that implement deep learning using other kinds of hidden layers besides neural networks.

The ideas for “artificial” neural networks go back to the 1940s. The essential concept is that a network of artificial neurons built out of interconnected threshold switches can learn to recognize patterns in the same way that an animal brain and nervous system (including the retina) does.

Backprop
The learning occurs basically by strengthening the connection between two neurons when both are active at the same time during training. In modern neural network software this is most commonly a matter of increasing the weight values for the connections between neurons using a rule called back propagation of error, backprop, or BP.
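
A compact NumPy sketch, on made-up data, makes the idea concrete: run a forward pass, measure the error at the output, and propagate that error backward to adjust the connection weights:

```python
# Backpropagation through a single hidden layer, written out by hand (sketch; data is made up).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float).reshape(-1, 1)  # XOR-like target

W1, b1 = rng.normal(size=(2, 8)) * 0.5, np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)) * 0.5, np.zeros(1)
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(2000):
    h = np.tanh(X @ W1 + b1)                 # forward pass: hidden layer
    p = sigmoid(h @ W2 + b2)                 # forward pass: output layer
    d_out = (p - y) / len(X)                 # error at the output (cross-entropy gradient)
    d_hid = (d_out @ W2.T) * (1.0 - h ** 2)  # error propagated back through tanh
    W2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(axis=0)  # strengthen/weaken connections
    W1 -= lr * (X.T @ d_hid); b1 -= lr * d_hid.sum(axis=0)

print(((p > 0.5) == y).mean())  # training accuracy after the weight updates
```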

Neurons in artificial neural networks
How are the neurons modeled? Each has a propagation function that transforms the outputs of the connected neurons, often with a weighted sum. The output of the propagation function passes to an activation function, which fires when its input exceeds a threshold value.

Activation functions in neural networks
In the 1940s and ’50s artificial neurons used a step activation function and were called perceptrons. Modern neural networks may say they are using perceptrons, but actually have smooth activation functions, such as the logistic or sigmoid function, the hyperbolic tangent, or the Rectified Linear Unit (ReLU). ReLU is usually the best choice for fast convergence, although it has an issue of neurons “dying” during training if the learning rate is set too high.

The output of the activation function can pass to an output function for additional shaping. Often, however, the output function is the identity function, meaning that the output of the activation function is passed to the downstream connected neurons.
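
Put together, a single artificial neuron reduces to a weighted sum followed by an activation function; the numbers below are arbitrary:

```python
# One artificial neuron: a weighted-sum propagation function followed by an activation (illustrative only).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # smooth, saturating activation

def relu(x):
    return np.maximum(0.0, x)        # Rectified Linear Unit

inputs = np.array([0.5, -1.2, 3.0])   # outputs of the connected upstream neurons
weights = np.array([0.8, 0.1, -0.4])
bias = 0.2

z = weights @ inputs + bias             # propagation function: weighted sum
print(sigmoid(z), relu(z), np.tanh(z))  # the same input through three activation functions
```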

Neural network topologies
Now that we know about the neurons, we need to learn about the common neural network topologies. In a feed-forward network, the neurons are organized into distinct layers: one input layer, n hidden processing layers, and one output layer. The outputs from each layer go only to the next layer.

In a feed-forward network with shortcut connections, some connections can jump over one or more intermediate layers. In recurrent neural networks, neurons can influence themselves, either directly or indirectly through the next layer.

Training neural networks
Supervised learning of a neural network is done just like any other machine learning: You present the network with groups of training data, compare the network output with the desired output, generate an error vector, and apply corrections to the network based on the error vector. Groups of training data that are run together before corrections are applied are called batches (or mini-batches); a complete pass through the entire training data set is called an epoch.

Optimizers for neural networks
Optimizers for neural networks typically use some form of gradient descent algorithm to drive the back propagation, often with a mechanism to help avoid becoming stuck in local minima, such as optimizing randomly selected mini-batches (Stochastic Gradient Descent) and applying momentum corrections to the gradient. Some optimization algorithms also adapt the learning rates of the model parameters by looking at the gradient history (AdaGrad, RMSProp, and Adam).

As with all machine learning, you need to check the predictions of the neural network against a separate validation data set. Without doing that you risk creating neural networks that only memorize their inputs instead of learning to be generalized predictors.

Deep learning algorithms
A deep neural network for a real problem might have upwards of 10 hidden layers. Its topology might be simple, or quite complex.

The more layers in the network, the more characteristics it can recognize. Unfortunately, the more layers in the network, the longer it will take to calculate, and the harder it will be to train.

Convolutional neural networks (CNN) are often used for machine vision. Convolutional neural networks typically use convolutional, pooling, ReLU, fully connected, and loss layers to simulate a visual cortex. The convolutional layer basically takes the integrals of many small overlapping regions. The pooling layer performs a form of non-linear down-sampling. ReLU layers apply the non-saturating activation function f(x) = max(0,x). In a fully connected layer, the neurons have connections to all activations in the previous layer. A loss layer computes how the network training penalizes the deviation between the predicted and true labels, using a Softmax or cross-entropy loss function for classification, or a Euclidean loss function for regression.
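
A minimal Keras sketch of those layer types, sized for 28-by-28 grayscale images (the layer counts and sizes are arbitrary choices for the example):

```python
# A small convolutional network in Keras showing the layer types described above (illustrative sizes).
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),  # convolution over small regions
    layers.MaxPooling2D(),                                             # non-linear down-sampling
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),    # fully connected layer
    layers.Dense(10, activation="softmax"),  # class probabilities
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # the "loss layer" for classification
              metrics=["accuracy"])
model.summary()
```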

Recurrent neural networks (RNN) are often used for natural language processing (NLP) and other sequence processing, as are Long Short-Term Memory (LSTM) networks and attention-based neural networks. In feed-forward neural networks, information flows from the input, through the hidden layers, to the output. This limits the network to dealing with a single state at a time.

In recurrent neural networks, the information cycles through a loop, which allows the network to remember recent previous outputs. This allows for the analysis of sequences and time series. RNNs have two common issues: exploding gradients (easily fixed by clamping the gradients) and vanishing gradients (not so easy to fix).

In LSTMs, the network is capable of forgetting (gating) previous information as well as remembering it, in both cases by altering weights. This effectively gives an LSTM both long-term and short-term memory, and solves the vanishing gradient problem. LSTMs can deal with sequences of hundreds of past inputs.
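
A minimal Keras LSTM classifier for sequences of word indices might look like this sketch; the vocabulary size and layer widths are assumptions for the example:

```python
# A minimal LSTM classifier for token sequences (illustrative sizes and vocabulary).
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Embedding(input_dim=10000, output_dim=64),  # assumed vocabulary of 10,000 tokens
    layers.LSTM(64),                                   # gated memory over the whole sequence
    layers.Dense(1, activation="sigmoid"),             # e.g. positive vs. negative sentiment
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```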

Attention modules are generalized gates that apply weights to a vector of inputs. A hierarchical neural attention encoder uses multiple layers of attention modules to deal with tens of thousands of past inputs.

Random Decision Forests (RDF), which are not neural networks, are useful for a range of classification and regression problems. RDFs are constructed from many layers, but instead of neurons an RDF is constructed from decision trees, and outputs a statistical average (mode for classification or mean for regression) of the predictions of the individual trees. The randomized aspects of RDFs are the use of bootstrap aggregation (a.k.a. bagging) for individual trees, and taking random subsets of the features for the trees.

XGBoost (eXtreme Gradient Boosting), also not a deep neural network, is a scalable, end-to-end tree boosting system that has produced state-of-the-art results on many machine learning challenges. Bagging and boosting are often mentioned in the same breath; the difference is that instead of generating an ensemble of randomized trees, gradient tree boosting starts with a single decision or regression tree, optimizes it, and then builds the next tree from the residuals of the first tree.
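
A short sketch using XGBoost's scikit-learn-style wrapper on synthetic data (all parameters here are illustrative):

```python
# Gradient tree boosting with the XGBoost scikit-learn wrapper (illustrative only).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = XGBClassifier(n_estimators=300, learning_rate=0.1, max_depth=4)
model.fit(X_train, y_train)  # each new tree is fit to the residuals of the trees before it
print(model.score(X_test, y_test))
```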

Some of the best Python deep learning frameworks are TensorFlow, Keras, PyTorch, and MXNet. Deeplearning4j is one of the best Java deep learning frameworks. ONNX Runtime and TensorRT are inference runtimes for deep learning models.

Deep learning vs. machine learning
In general, classical (non-deep) machine learning algorithms train and predict much faster than deep learning algorithms; one or more CPUs will often be sufficient to train a classical model. Deep learning models often need hardware accelerators such as GPUs, TPUs, or FPGAs for training, and also for deployment at scale; without them, the models would take months to train.

For many problems, some classical machine learning algorithms will produce a “good enough” model. For other problems, classical machine learning algorithms have not worked terribly well in the past.

One area that is usually attacked with deep learning is natural language processing, which encompasses language translation, automatic summarization, co-reference resolution, discourse analysis, morphological segmentation, named entity recognition, natural language generation, natural language understanding, part-of-speech tagging, sentiment analysis, and speech recognition.

Another prime area for deep learning is image classification, which includes image classification with localization, object detection, object segmentation, image style transfer, image colorization, image reconstruction, image super-resolution, and image synthesis.

In addition, deep learning has been used successfully to predict how molecules will interact in order to help pharmaceutical companies design new drugs, to search for subatomic particles, and to automatically parse microscope images used to construct a three-dimensional map of the human brain.

Monday, 20 January 2020

Do containers need backup?

Containers are breaking backups around the world, but there are steps you can take to make sure that the most critical parts of your container infrastructure are protected against the worst things that can happen to your data center.

At first glance it may seem that containers don’t need to be backed up, but on closer inspection, backing up the most critical pieces of the container infrastructure does make sense, both to protect against catastrophic events and to cover other, less disastrous eventualities.

Container basics

Containers are another type of virtualization, and Docker is the most popular container platform. Containers are a specialized environment in which you can run a particular application. One way to think of them is like lightweight virtual machines. Where each VM in a hypervisor server contains an entire copy of an operating system, containers share the underlying operating system, and each of them contains only the required libraries needed by the application that will run in that container. As a result, many containers on a single node (a physical or virtual machine running an OS and the container runtime environment) take up far fewer resources than the same number of VMs.

Another difference between VMs and containers is that where companies tend to simultaneously run many applications in a single VM, containers are typically designed to each serve a single application component that does a single task, such as logging or monitoring. If multiple application components need to interact, each will typically run in its own container and communicate across the network. This allows for individual scaling of each application and provides some fault and security isolation between applications.

Where VMs are designed to run inside a particular hypervisor running on a particular set of hardware, containers are much more portable. Containers are designed to run on virtually any Linux system, and can even run on Windows if the appropriate software has been installed. Finally, containers are designed to be much more temporary than VMs. Where a typical VM might run for months or even years, 95% of all containers live for less than a week, according to a recent Sysdig survey.

Running a lot of containers in a production environment requires orchestration, and that’s where Kubernetes (often spelled K8s) comes in. It groups containers into pods, which are one or more containers accomplishing a single purpose. Containers in a pod can easily communicate with each other and can share storage by mounting a shared volume.
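
One quick way to see that grouping in practice is the official Kubernetes Python client, which can list each pod in a namespace along with its containers and shared volumes; this sketch assumes a working kubeconfig:

```python
# List each pod with its containers and shared volumes (sketch; assumes a working kubeconfig).
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside a cluster
v1 = client.CoreV1Api()

for pod in v1.list_namespaced_pod(namespace="default").items:
    containers = [c.name for c in pod.spec.containers]
    volumes = [v.name for v in (pod.spec.volumes or [])]
    print(pod.metadata.name, containers, volumes)
```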

How containers break backups
Historically backups were accomplished by placing an agent in a server that needed to be backed up. Virtualization broke that model, so a different model was created where the agent runs at the hypervisor level and backs up the VMs as images.  Containers offer neither of these options.

While you could theoretically place an agent inside a container image, that is considered very bad form for many reasons, so no one does that. In addition, there is currently no way to run an agent at the container runtime layer, which is analogous to the hypervisor level. Finally, the idea of backing up containers seems rather foreign to many who use them.  Think about it; most containers live for less than a week.

Why containers need backing up
In one sense, a typical container does not need to have its running state backed up; it is not unique enough to warrant such an operation. Furthermore, most containers are stateless – there is no data stored in the container. It’s just another running instance of a given container image that is already saved via some other operation.

Many container advocates are quick to point out that high availability is built into every part of the container infrastructure. Kubernetes is always run in a cluster.  Containers are always spawned and killed off as needed.  Unfortunately, many confuse this high availability with the ability to recover from a disaster.

To change the conversation, ask someone how they would replicate their entire Kubernetes and Docker environment should something take out their entire cluster, container nodes and associated persistent storage. Yes, there are reasons Kubernetes, Docker and associated applications need to be backed up.

First, to recover from disasters.  What do you do if the worst happens? Second, to replicate the environment as when moving from a test/dev environment to production, or from production to staging before an upgrade. And third, to migrate a Kubernetes cluster more easily.

What would you need in a disaster?
There are several things you would need to replicate an entire environment in case of disaster:

Container images – A container image is a static file that contains all the executable code necessary for a container to run. Container images do not change; they are what is used to run a given container.  If changes need to be made to the libraries and code for a given container, a new image would be created for that container. Container images need to be protected in some way, often using a repository for such things. In turn, that repository should be protected against disasters.
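
One simple way to capture an image outside of a registry is to stream it to a tar archive with the Docker SDK for Python; in this sketch the image tag and backup path are hypothetical:

```python
# Save a container image to a tar archive that a backup system can pick up (hypothetical names).
import docker

client = docker.from_env()
image = client.images.get("registry.example.com/myapp:1.4")

with open("/backups/myapp_1.4.tar", "wb") as f:
    for chunk in image.save():  # streams the image layers as a tar archive
        f.write(chunk)
```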

Attached storage, databases – Containers often create data that outlives the life of the container. To accomplish that, a container mounts a volume via NFS, an object store or a similar mechanism and writes data to that volume. A container may also make a connection to an external database. That storage and those databases need their own protection.

Persistent volumes – Kubernetes pods are increasingly using persistent storage. That data should also get backed up if the data stored on it is valuable to the business.

Deployments – A deployment is a Kubernetes concept of a set of pods accomplishing a particular function. Deployments are stored as YAML files that need to be backed up.

Kubernetes etcd – The Kubernetes central database is etcd, and it needs to be backed up. It’s relatively small, and K8s provides tools to dump its contents to a file that you can then back up.
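
A minimal sketch of that dump, driving etcdctl from Python (the snapshot path is hypothetical, and endpoint and certificate flags are omitted):

```python
# Dump the etcd database to a snapshot file for backup (sketch; endpoints and certs omitted).
import os
import subprocess

env = dict(os.environ, ETCDCTL_API="3")
subprocess.run(
    ["etcdctl", "snapshot", "save", "/backups/etcd-snapshot.db"],
    env=env,
    check=True,
)
```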

Prometheus – Prometheus is often used to monitor K8s and Docker. Its configuration should also be backed up.

Kubernetes resources – As developers create resources in K8s, those resources need to be backed up with the right group and version.

What shouldn’t need backup?
Not everything needs to be backed up. For example:

Running stateless containers – A running container is temporary. It was spawned from an image – which needs to be backed up – but the running instance of the container does not need to be backed up. Any data it creates should probably be backed up, but if the container itself needs to be backed up, something is wrong. If a container actually contains data, as opposed to storing it on an external volume, then it would need to be backed up – but that should be very rare.

Pods – Since pods are simply groups of running containers, they also do not need to be backed up.

Each entity mentioned above offers a native tool that can be used to back up that entity to local or remote storage. There are also commercial utilities starting to come on the market that run in a variety of ways. This article covers these methods in detail, including how to use them to restore the various parts of your Kubernetes and Docker environment.

https://www.networkworld.com/

Friday, 17 January 2020

Enterprises spend more on cloud IaaS than on-premises data-center gear

Enterprise tech crossed a significant line as the decade ended. For the first time, enterprises spent more annually on cloud infrastructure services than on data-center hardware and software, according to Synergy Research Group.

Synergy reports that total spending on cloud infrastructure services in 2019 will reach $97 billion, a 38% increase over the prior year. Ten years ago, that spending was near zero. Total spending on data center hardware and software, on the other hand, is expected to hit $93 billion in 2019, an increase of only 1% when compared to 2018.

It should be noted that Synergy derived its figures from actual sales in the first three quarters and projections for the fourth quarter, so both stats are subject to adjustment, but not likely very much.

From 2009 to 2019, average annual spending growth for cloud infrastructure services was 56%, according to Synergy, while on-prem hardware sales grew only 4% on average. Data center spending did jump in 2018 due to the popularity of hyperconverged infrastructure systems, which are cloud-like in their operation.

The major segments with the highest growth rates over the decade were virtualization software, Ethernet switches and network security. Server share of the total data center market remained steady, while storage share declined.

"The decade has seen a dramatic increase in computer capabilities, increasingly sophisticated enterprise applications and an explosion in the amount of data being generated and processed, pointing to an ever-growing need for data center capacity," said John Dinsdale, chief analyst at Synergy Research Group, in a statement.

However, more than half of the servers now being sold are going into cloud providers’ data centers and not those of enterprises, Dinsdale added. "Over the last ten years we have seen a remarkable transformation in the IT market. Enterprises are now spending almost $200 billion per year on buying or accessing data center facilities, but cloud providers have become the main beneficiaries of that spending."

I’ve written in the past that the trend is for continued support and use of on-prem data centers – but Dinsdale believes they are on the way out. "Lots of companies are getting out of running their own data centers, and there is no end in sight to the trend. Things or news items that run against the grain of a big trend tend to get media coverage and can give readers the sense the trend isn’t there. This one most definitely is," he said via email.

There are many different approaches going on, however. Some companies are going all in on the cloud, while others are reducing investment in owned infrastructure and moving some workloads to the cloud. Some are maintaining ownership of hardware but pushing it into colocation facilities. Others are maintaining ownership of hardware but consolidating down to a smaller number of larger data centers, and some are maintaining the status quo.

"But when you net it all out the numbers from [Synergy's report] tell the big story – that cloud is growing like gangbusters, while enterprise investments in their own data center infrastructure are being heavily constrained," Dinsdale said.

At the same time, there is a trend toward companies making combined on-prem and cloud purchases, according to Dinsdale. So when a company does a data center overhaul, it balances in-house priorities with a planned cloud migration as part of the overall purchase.

"This isn’t a simple ‘either x or y’ situation, and a common practice is moving suitable workloads and apps to the cloud while continuing to manage in-house IT for more complex or sensitive tasks. But the balance is very clearly swinging heavily towards using cloud providers," Dinsdale said.

With hyperscale data center operators such as Amazon, Microsoft, Google and Facebook buying servers in the tens of thousands, we have seen the emergence over the last few years of original design manufacturers (ODM) that make huge volumes of cloud provider-designed hardware just for the big hyperscale operators. That’s why companies like Inspur, Huawei, and Supermicro have muscled their way onto the top server sales lists from Gartner and IDC. And these operators use their own custom hardware, so they aren’t causing component shortages for mainline server vendors like HPE and Dell.

"Hyperscale companies operate at such a scale that this strategy works for them. They design hardware that is very specifically optimized for their own usage, stripping out all extraneous components and functions. They then use contract manufacturers to produce large volumes at relatively low cost per unit. And one result is that ODMs have steadily eaten into the market share of more traditional hardware vendors," Dinsdale said.

Google Cloud launches Archive cold storage service

Google Cloud announced the general availability of Archive, a long-term data retention service intended as an alternative to on-premises tape backup.

Google pitches it as cold storage, meaning it is for data which is accessed less than once a year and has been stored for many years. Cold storage data is usually consigned to tape backup, which remains a surprisingly successful market despite repeated predictions of its demise.

Of course, Google's competitors have their own products. Amazon Web Services has Glacier, Microsoft has Cool Blob Storage, and IBM has Cloud Storage. Google also offers its own Coldline and Nearline cloud storage offerings; Coldline is designed for data a business expects to touch less than once a quarter, while Nearline is aimed at data that requires access less than once a month.

With Archive, Google highlights a few differentiators from the competition and its own archival offerings. First, Google promises no delay on data retrieval, claiming millisecond latency, whereas retrieval from AWS can take minutes or hours. Archive costs a little more than AWS and Azure ($1.23 per terabyte per month vs. $1 per terabyte per month for AWS and Azure), but that’s due in part to the longer minimum storage period before an early-deletion charge applies: Google requires 365 days, compared with 180 days for AWS and Azure.

"Having flexible storage options allows you to optimize your total cost of ownership while meeting your business needs," wrote Geoffrey Noer, Google Cloud storage product manager in a blog post announcing the service’s availability. "At Google Cloud, we think that you should have a range of straightforward storage options that allow you to more securely and reliably access your data when and where you need it, without performance bottlenecks or delays to your users."

Archive is a store-and-forget service, where you keep stuff only because you have to. Tape replacement and archiving data under regulatory retention requirements are two of the most common use cases, according to Google. Other examples include long-term backups and original master copies of videos and images.

The Archive class can also be combined with Bucket Lock, Google Cloud’s data locking mechanism to prevent data from being modified, which is available to enterprises for meeting various data retention laws, according to Noer.
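
As a sketch of how that combination might be configured with the google-cloud-storage Python library (the bucket name and seven-year retention window are examples, not recommendations):

```python
# Default a bucket to the Archive class and lock a retention policy (sketch; names and periods are examples).
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-archive-bucket")

bucket.storage_class = "ARCHIVE"               # new objects default to the Archive class
bucket.retention_period = 7 * 365 * 24 * 3600  # e.g. a seven-year retention window, in seconds
bucket.patch()

bucket.lock_retention_policy()                 # Bucket Lock: the policy can no longer be reduced or removed
```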

The Archive class can be set up in dual-regions or multi-regions for geo-redundancy, offers checksum verification, and is designed for durability of "11 nines" – 99.999999999 percent.


https://www.networkworld.com/

Thursday, 2 January 2020

How to Use Machine Learning to Drive Real Value

Continuously connected customers with multiple devices and an endless number of interaction touchpoints aren't easy to engage. They’re on a multi-dimensional journey and can appear to a brand at any time, on any channel.

It’s not surprising, then, that consumers give brands low marks for their ability to deliver an exceptional customer experience. According to a recent Harris Poll survey, only 18 percent of consumers rated brands’ ability to deliver an exceptional experience as excellent.

Even if the data about a customer is well managed, to successfully engage the connected consumer and deliver highly personalized experiences requires advanced analytical tools. Artificial intelligence and machine learning are now being applied by innovative businesses to create real-time, personalized experiences at scale with models that intelligently orchestrate offerings throughout the customer journey.

How to Deploy Effective In-Line Analytics
It’s easy to get caught up in the hype surrounding AI and machine learning, with business leaders chasing shiny objects for an AI application that might have little to do with critical business goals.

When paired with a persistent, real-time, single customer record, AI and automated machine learning platforms can be utilized to meet those business goals, increase revenue and fundamentally change the way brands communicate with customers.

In this eWEEK Data Points article, George Corugedo, Chief Technology Officer and co-founder of customer engagement hub maker RedPoint Global, suggests several truths about machine learning that every business leader should keep in mind when thinking about customer records.

Data Point No. 1: Machine learning should drive revenue.
The ultimate goal of machine learning shouldn’t be a flashy, futuristic example but instead a system to drive revenue and results for the business. The result of effective machine learning isn’t likely a robot, chatbot or facial recognition tool – it’s machine learning-driven programs that are embedded behind the scenes, driving intelligent decisions for optimized customer engagement.

Data Point No. 2: Having one model–or even many–is not enough.
Organizations need many models running and working in real time to truly make machine learning work for their needs. For future-forward organizations, intelligence and analysis needs to be embedded, so instead of using one model, multiple in-line analytic models can incrementally adjust and find opportunities for growth. These fleets of ML models can optimize business functions and drive associated revenues.

Data Point No. 3: When applied in silos, machine learning is not as effective.
Today’s consumer is omnichannel. Businesses must forego the traditional channel-specific “batch and blast” approach that sufficed when customer choice was limited and the buying journey followed a mostly straight-line path. Today’s customer journey is dynamic, and the learning applied to the customer relationship should be, as well. Machine learning is particularly well-suited to solving these multidimensional problems.

Data Point No. 4: Analytics are only intelligent when models are up to date.
News flash: Machine-learning models age and can quickly become stale. For this reason, organizations must consistently rebuild and retrain models using today’s data. In fact, models should be developed, updated and even applied in real-time on an ongoing testing basis so that businesses can truly capitalize on the moment of interaction. This is most effective in a closed loop system that continually looks for incremental changes to optimize.

Data Point No. 5: You don’t need to be a data scientist to benefit from machine learning.
When models are configured correctly, they will run 24/7 looking for opportunities within the data, set up and managed by marketers. These systems can be set once and guided to produce the specific business metrics needed. With every record tracked in the system, insights are pulled easily, and the recommendations can be made automatically. Businesses should focus on producing continually updated data and let the automation tools use machine learning to drive greater revenue.

Data Point No. 6: Summary: The power is in your hands.
Machine learning has the power to fully transform an enterprise. Therefore, it’s natural for business leaders to get lost in the hype and lose sight of the real value it can deliver day-to-day. The truth is, the real value of machine learning is that it allows businesses to try new things, amplify creative strengths, reveal new discoveries and enable collaboration across the organization. However, these benefits will only be realized once organizations get past the hype and are willing to walk into the weeds.