Friday, 26 July 2019

IBM expands its storage management software to support competitor products

Most companies like to keep their specialty software to themselves, but IBM is making a move to expand its Spectrum Discover metadata management software to support competitive storage products.

Spectrum Discover is modern metadata management software that provides data insight for petabyte-scale unstructured storage. It was designed for IBM Cloud Object Storage and IBM Spectrum Scale, a clustered file system, to rapidly ingest, consolidate and index metadata for billions of files and objects.

Its metadata layer enables storage administrators, data stewards and data scientists to efficiently manage, classify, and gain insights from massive amounts of unstructured data while helping to mitigate risk and accelerate large-scale analytics.

And now it’s coming to competitive storage systems. IBM has announced the availability of connectors for Spectrum Discover to Dell EMC Isilon, NetApp, Amazon S3, Ceph, and other S3- and NFSv3-compliant data sources.
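
A connector for an S3-compliant source essentially sweeps the namespace and harvests each object’s metadata. Here is a minimal sketch of that idea in Python with boto3; the endpoint, credentials, and bucket name are hypothetical placeholders, and this is not IBM’s connector code:

```python
import boto3

# Any S3-compliant service can be addressed by overriding the endpoint.
# All identifiers below are placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.example.com",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="research-data"):
    for obj in page.get("Contents", []):
        # The listing carries system metadata; user-defined metadata
        # requires a HEAD request per object.
        head = s3.head_object(Bucket="research-data", Key=obj["Key"])
        print(obj["Key"], obj["Size"], obj["LastModified"], head["Metadata"])
```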

IBM has also added new content-based data classification and tagging capabilities, with the ability to extract metadata from more than a thousand file types. The software can now also automatically detect certain types of personally identifiable information (PII) and other sensitive data, making it potentially easier for organizations to manage data in compliance with GDPR and other regulations.
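
IBM hasn’t detailed how its PII detection works, but content-based classification generally reduces to scanning file contents for telltale patterns and attaching tags. A minimal, purely illustrative sketch in Python:

```python
import re

# Illustrative patterns only; production systems use far more robust
# detection than these simple regexes.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def tag_document(text):
    """Return the set of PII tags whose patterns appear in the text."""
    return {tag for tag, pattern in PII_PATTERNS.items() if pattern.search(text)}

print(tag_document("Contact jane.doe@example.com or call 555-867-5309"))
# e.g. {'email', 'phone'}
```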

“These enhancements are designed to enable your enterprise to create an open and transparent data ecosystem for AI and analytics, make it easier for data scientists to find and curate data for analysis, and enable storage administrators to optimize storage and manage unstructured data,” IBM said in a blog post announcing the news.

IBM Spectrum Protect Plus enhanced
IBM also announced enhancements to IBM Spectrum Protect Plus, its data protection and access control software for virtual environments. These include integration with IBM tape products; support for Amazon Glacier, the Microsoft Azure archive tier, and the IBM Cloud Object Storage archive tier; and new disaster recovery from backup repositories running on Amazon Web Services (AWS).

Other improvements include a new user interface and additional language translations. All of the improvements are aimed at making things easier for partners to help customers perform needed operations in hybrid multicloud environments.

IBM cloud storage improved
Finally, IBM announced the next generation of capabilities for IBM Cloud Object Storage, its cloud-based storage service similar to Amazon S3. IBM Cloud Object Storage Gen 2 delivers a new architecture for object storage that scales to over 10 PB in a single rack at a cost of less than one cent per GB per month. It is also more flexible, allowing the creation of tailored configurations that meet unique performance or capacity requirements.

Data centers may soon recycle heat into electricity

Waste heat is the scourge of computing. In fact, much of the cost of powering a computer goes into creating unwanted heat, because inefficiencies in electronic circuits, caused by resistance in the materials, generate it. In effect, processors convert expensively produced electrical energy into waste energy through resistive heating.

It’s a fundamental problem, and one that hasn’t been going away. But what if you could convert the unwanted heat back into electricity, recycling it into its original energy form? Instead of simply being dumped into the atmosphere, with dubious environmental effects, data center heat could actually run more machines. Your cooling costs would be taken care of, too: there’s nothing left to cool once you’ve captured the hot air.

Scientists at Rice University are trying to make that a reality by developing heat-scavenging and conversion solutions.

Currently, the most efficient way to convert heat into electricity is through the use of traditional turbines.

Turbines “can give you nearly 50% conversion efficiency,” says Chloe Doiron, a graduate student at Rice University and co-lead on the project, in a news article on the school’s website. Turbines convert the kinetic energy of moving fluids, like steam or combustion gases, into mechanical energy. The moving steam pushes blades mounted on a shaft, which turns a generator and produces the power.

Not a bad solution. The problem, though, is “those systems are not easy to implement,” the researchers explain. The issue is that turbines are full of moving parts, and they’re big, noisy, and messy.

Thermal emitter better than turbines for converting heat to energy

A better option would be a solid-state, thermal device that could absorb heat at the source and simply convert it, perhaps straight into attached batteries.

The researchers say a thermal emitter could absorb heat, squeeze it into a narrow, easy-to-capture bandwidth and then emit it as light. They would then simply turn the light into electricity, just as solar cells do today.

“Thermal photons are just photons emitted from a hot body,” says Rice University professor Junichiro Kono in the article. “If you look at something hot with an infrared camera, you see it glow. The camera is capturing these thermally excited photons.” Indeed, all heated surfaces, to some extent, send out light as thermal radiation.

The Rice team wants to use a film of aligned carbon nanotubes to do the job. The test system will be structured as an actual solar panel. That’s because solar panels, too, lose energy through heat, so are a good environment in which to work. The concept applies to other inefficient technologies, too. “Anything else that loses energy through heat [would become] far more efficient,” the researchers say.

Around 20% of industrial energy consumption is unwanted heat, Doiron says. That's a lot of wasted energy.

Other heat conversion solutions

Other heat-scavenging devices are making inroads, too. Now-commercially-available thermoelectric technology can convert a temperature difference into power, also with no moving parts. These devices work by exposing a specially made material to heat: electrons flow when one part is cold and the other is hot. And the University of Utah is working on silicon for chips that generates electricity as one of two wafers heats up.

https://www.networkworld.com

Saturday, 20 July 2019

How edge computing is driving a new era of CDN

We are living in a hyperconnected world where anything can now be pushed to the cloud. The idea of keeping content in one place, convenient though it may be from a management perspective, is now obsolete. Today, users and data are everywhere.

Customer expectations have surged because of this evolution: users now expect high-quality service and have far less patience. In the past, one could patiently wait 10 hours to download content, but that is certainly not the case today. We now have high expectations and high performance requirements, but there are concerns as well. The internet is a strange place, with unpredictable asymmetric traffic patterns, bufferbloat and a list of other performance-related problems.

The internet is also growing at an accelerated rate. By the year 2020, internet traffic is expected to reach 1.5 GB per person per day. In the coming years, the world of the Internet of Things (IoT), driven by objects, will far exceed these figures. For example, a connected airplane will generate around 5 TB of data per day. This spiraling volume requires a new approach to data management and forces us to rethink how we deliver applications.

Why? Because all this information cannot be processed by a single cloud or on-premises location. Latency will always be a problem. In virtual reality (VR), for example, anything over 7 milliseconds of latency will cause motion sickness. When decisions must be made in real time, you cannot send data to the cloud first. You can, however, make use of edge computing and a multi-CDN design.

Introducing edge computing and multi-CDN
The rate of cloud adoption, all-things-video, IoT and edge computing are bringing life back to CDNs and multi-CDN designs. Typically, a multi-CDN is an implementation pattern that includes more than one CDN vendor. Traffic direction is performed using various metrics, whereby traffic can either be load balanced across, or failed over between, the different vendors.
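
As a sketch of the idea, not any vendor’s actual logic, traffic direction can be as simple as filtering out unhealthy vendors and choosing among the rest by a measured metric. The vendor names and numbers here are hypothetical:

```python
# Hypothetical health and latency measurements per CDN vendor.
CDNS = {
    "vendor_a": {"healthy": True,  "latency_ms": 38},
    "vendor_b": {"healthy": True,  "latency_ms": 52},
    "vendor_c": {"healthy": False, "latency_ms": 20},  # failed health check
}

def pick_cdn(cdns):
    """Fail over past unhealthy vendors, then pick the fastest one."""
    healthy = {name: m for name, m in cdns.items() if m["healthy"]}
    if not healthy:
        raise RuntimeError("no healthy CDN available")
    return min(healthy, key=lambda name: healthy[name]["latency_ms"])

print(pick_cdn(CDNS))  # -> vendor_a (vendor_c is faster but down)
```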

Edge computing moves processing as close as possible to the source; it is the point where the physical world interacts with the digital world. The decentralized approach of edge computing will not take over from the centralized approach, however. The two will complement each other, so that an application can run at its peak level depending on its position in the network.

For example, in IoT, saving battery life is crucial. If an IoT device can complete a transaction in a 10 ms round-trip time (RTT) instead of a 100 ms RTT, it can use roughly a tenth of the battery power.

The internet, a performance bottleneck
The internet is designed on the principle that everyone can talk to everyone, providing universal connectivity whether it is required or not. There have been a number of design changes, with network address translation (NAT) being the biggest, but essentially the role of the internet has remained the same in terms of connectivity, regardless of location.

With this type of connectivity model, distance is an important determinant of application performance. Users on the other side of the planet will suffer regardless of buffer sizes or other device optimizations. Long RTTs are experienced as packets go back and forth before the actual data transmission begins. Caching and traffic redirection are being used, but with limited success so far.

The principles of application delivery
When transmission control protocol (TCP) starts, it thinks it is back in the late 1970s. It assumes that all services are on a local area network (LAN) and that there is no packet loss, and it works backward from there. Back when it was designed, we didn’t have real-time traffic, such as voice and video, which is latency- and jitter-sensitive.

TCP was designed for ease of use and reliability, not to boost performance, so you actually need to optimize the TCP stack. This is why CDNs are very good at performing such tasks. For example, if a connection is received from a mobile phone, a CDN will start with the assumption that there is going to be high jitter and packet loss. This allows it to size the TCP window to accurately match the network conditions.
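
To make the window-sizing point concrete: a sender needs a window at least as large as the bandwidth-delay product (BDP) to keep a link full. A minimal Python sketch with hypothetical link figures (the operating system may clamp the requested buffer sizes):

```python
import socket

def bdp_bytes(bandwidth_mbps, rtt_ms):
    """Bandwidth-delay product: bytes in flight needed to fill the pipe."""
    return int(bandwidth_mbps * 1e6 / 8 * rtt_ms / 1e3)

# A mobile connection: modest bandwidth, long and jittery RTT.
window = bdp_bytes(bandwidth_mbps=10, rtt_ms=200)  # 250,000 bytes

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Request socket buffers big enough to sustain that window.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, window)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, window)
```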

How do you improve performance, and what options do you have? Generally, many look to lowering latency. However, with applications such as video streaming, latency does not tell you whether the video is going to buffer; one can only assume that lower latency will lead to less buffering. In such a scenario, a throughput-based measurement is a far better performance metric, since it tells you how fast an object will load.

We also have to consider page load times. At the network level, the usual metrics are time to first byte (TTFB) and ping. However, these mechanisms don’t tell you much about the user experience, as everything fits into one packet. Using ping will not inform you about bandwidth problems.

And if a web page slows by 25% once packet loss exceeds 5%, and you are measuring time to first byte, which is roughly the fourth packet, what exactly can you learn? TTFB is comparable to an internet control message protocol (ICMP) request just one layer up the stack: it’s good if something is broken, but not if there is an underperformance issue.
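
The difference is easy to see if you measure both on the same transfer. A hedged sketch using Python’s requests library (the URL is a placeholder): TTFB is known as soon as the first bytes arrive, while throughput is only known once the whole object has loaded.

```python
import time
import requests

def measure(url):
    start = time.perf_counter()
    resp = requests.get(url, stream=True)
    ttfb, size = None, 0
    for chunk in resp.iter_content(chunk_size=65536):
        if ttfb is None:
            ttfb = time.perf_counter() - start  # first bytes arrived
        size += len(chunk)
    total = time.perf_counter() - start
    return ttfb, total, size / total  # throughput in bytes per second

ttfb, total, tput = measure("https://example.com/large-object")
print(f"TTFB {ttfb:.3f}s, full load {total:.3f}s, {tput / 1e6:.2f} MB/s")
```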

When you examine the history of TTFB measurement, you will find that it was deployed due to the lack of real user monitoring (RUM) measurements. Previously, TTFB was a reasonable approximation of how fast something was going to load, but we no longer have to approximate: we can measure it with RUM. RUM consists of measurements from end users. An example would be the metrics generated from a web page that is being served to an actual user.

In short, TTFB, ping and page load times are not sophisticated measurements. We should prefer RUM time measurements as much as we can, as they provide a more accurate picture of the user experience. This has become critical over the last decade.

Now we are living in a world of RUM, which lets us build our networks based on what matters to business users. All CDNs should aim for RUM measurements. For this, they may need to integrate with traffic management systems that intelligently measure what the end user really sees.

The need for multi-CDN
Primarily, the reasons one would opt for a multi-CDN environment are availability and performance. No single CDN can be the fastest for everyone, everywhere in the world; that is impossible because of the internet’s connectivity model. However, combining the best of two or more CDN providers will increase performance.

A multi-CDN gives faster performance and higher availability than can be achieved with a single CDN. A good design runs two availability zones. A better design runs two availability zones with a single CDN provider. A superior design, however, runs two availability zones in a multi-CDN environment.

Edge applications will be the new norm
It’s not that long ago that we transitioned from heavy, physical, monolithic architectures to the agile cloud. But all that really happened was a move from physical appliances to virtual cloud-based appliances. Maybe now is the time to ask: is this the future we really want?

One of the main issues in introducing edge applications is the mindset. It is challenging to convince yourself or your peers that the infrastructure you have spent all your time working on and investing in is not the best way forward for your business. 

Although the cloud has created a big buzz, just because you migrate to the cloud does not mean that your applications will run faster. In fact, all you are really doing is abstracting the physical pieces of the architecture and paying someone else to manage it. The cloud has, however, opened the door for the edge application conversation. We have already taken the first step to the cloud and now it's time to make the second move.

Basically, an edge application at its simplest is a programmable CDN. A CDN is an edge application, and an edge application is a superset of what your CDN is doing. Edge applications denote cloud computing at the edge: a paradigm that distributes the application closer to the source for lower latency, additional resilience and simplified infrastructure, while you retain control and privacy.

From an architectural point of view, an edge application provides more resilience than deploying centralized applications. In today’s world of high expectations, resilience is a necessity for business continuity. Edge applications let you collapse the infrastructure into an architecture that is cheaper, simpler and more attentive to the application. The less infrastructure you have to manage, the more time you can focus on what really matters to your business: the customer.

An example of an edge architecture
As an example of an edge architecture, within each point of presence (PoP), every application has its own isolated JavaScript (JS) environment. JavaScript is great for security isolation, and its performance guarantees scale. Each JavaScript environment is a dedicated, isolated instance that executes the code at the edge.

Most likely, each JavaScript environment has its own virtual machine (VM). The only thing the VM runs is the JavaScript runtime engine, and the only thing the engine runs is the customer’s code. One could use Google’s V8, the open-source, high-performance JavaScript and WebAssembly engine.

Let’s face it: if you continue building more PoPs, you will hit the law of diminishing returns. With applications such as mobile, you are really maxed out when throwing PoPs at the problem, so we need to find another solution.

In the coming years, we are going to witness a trend where most applications become global, which means edge applications. It makes little sense to place the entire application in one location when your users are everywhere else.

Wednesday, 17 July 2019

Juniper Adds Multi-Cloud Kubernetes Support to Juke

Juniper Networks released Juke 2.2, the multi-cloud container platform it acquired when it bought composable infrastructure firm HTBASE late last year. The updated version integrates with Kubernetes and its container storage interface (CSI) and has new snapshot and scheduler capabilities.
The product itself isn’t composable infrastructure — it’s essentially container-focused software-defined storage, explained Scott Sneddon, senior director and evangelist for multi-cloud solutions at Juniper. But it plays partially in that space by stretching across the multi-cloud gap that exists between composable offerings.
“Juke is primarily a distributed, persistent storage solution,” he explained. “We’re really focused in on the storage problem that exists and would coexist with some of the composable solutions that are out there by delivering storage, and some orchestration capabilities to help manage containers — Kubernetes in particular.”
The product had been “getting some traction” prior to Juniper’s acquisition of HTBASE. But the 2.2 version aims to solve two key problems within the Kubernetes ecosystem, Sneddon said. One is the challenge associated with multi-cloud Kubernetes management. “The second challenge is that there really isn’t a good platform for persistent storage in Kubernetes,” he added.
The updates address these challenges by integrating with Kubernetes and its CSI. This adds Kubernetes cluster scale-out, with compute and storage nodes that can now span clouds and sites or stay local to one cluster availability zone.
The new version also adds volume snapshots and clone management for better reliability and mobility as well as deployment improvements to make Juke easier to install.
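Juniper hasn’t published Juke’s manifests, but a CSI-backed snapshot request in Kubernetes generally looks like the sketch below; the snapshot class and claim names are placeholders, not Juke’s actual identifiers:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: app-data-snapshot
spec:
  volumeSnapshotClassName: juke-snapshot-class   # hypothetical class name
  source:
    persistentVolumeClaimName: app-data          # existing PVC to snapshot
```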
These improvements give customers “the ability to manage multiple clusters across multiple clouds and then deliver a distributed storage platform to support containerized applications,” Sneddon said.
Plus, this ability to manage core storage objects for Kubernetes from the Juke user interface means administrators can do things like automate multi-cloud resource access to fit changing edge compute or proximity requirements and automate multi-cloud arbitrage for changing day-and-night usage patterns.
“Because we are able to orchestrate this persistent distributed storage, we can also understand the latency between running containers and their access to storage,” Sneddon explained. “So we can take that latency and performance information, feed that back into the orchestrator, and determine where is the best place to deploy that workload,” such as on-premises to save costs during low usage times but then to the cloud when usage spikes.
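A hedged sketch of that feedback loop: given measured latency, capacity and cost per site (all figures hypothetical), pick the cheapest placement that still meets the workload’s requirements.

```python
SITES = {
    "on_prem": {"latency_ms": 2,  "cost_per_hour": 0.05, "capacity": 40},
    "cloud":   {"latency_ms": 11, "cost_per_hour": 0.20, "capacity": 1000},
}

def place_workload(demand, max_latency_ms):
    """Cheapest site that meets both the latency target and the demand."""
    viable = [(site, m) for site, m in SITES.items()
              if m["latency_ms"] <= max_latency_ms and m["capacity"] >= demand]
    if not viable:
        raise RuntimeError("no site meets the requirements")
    return min(viable, key=lambda sm: sm[1]["cost_per_hour"])[0]

print(place_workload(demand=20, max_latency_ms=15))   # low usage -> on_prem
print(place_workload(demand=500, max_latency_ms=15))  # usage spike -> cloud
```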

IBM Launches Trio of Kubernetes-Focused Open Source Projects

IBM unveiled a trio of new open source projects targeted at helping developers more quickly build cloud-native applications for Kubernetes. The moves come on the heels of the company closing its $34 billion acquisition of open source powerhouse Red Hat.
The new projects are Codewind, Appsody, and Kabanero. Appsody uses components from Codewind, while Kabanero uses components from both Codewind and Appsody.
Codewind provides extensions to integrated development environments (IDEs) like VS Code, Eclipse, and Eclipse Che to build applications in containers. It allows developers to iterate, debug, and performance test applications inside of containers similar to how those applications run in a production environment. IBM said that this essentially allows developers to build applications in containers without knowing they are working in containers.
Codewind is being managed by the Eclipse Foundation, which emerged from the Eclipse Project that IBM founded in late 2001. It’s focused on supporting Eclipse open source projects, with the most well known being the Eclipse multi-language software development environment platform.

Building a Platform

Appsody provides developers with pre-configured stacks and templates for a set of open source runtimes and frameworks. These can be used as a foundation to build applications for Kubernetes and Kubernetes-based Knative serverless deployments. IBM noted that this allows developers to focus on their code, reduces the learning curve for cloud-native deployments, and allows for quicker development for those applications.
The pre-configured stacks can be customized for specific development requirements. If customized, developers can then have a single point of control to roll out those changes to all applications built from that customized stack.
Kabanero is an end-to-end DevOps platform that integrates already established runtimes and frameworks with a Kubernetes-native DevOps toolchain. It supports the architecting, building, deploying, and lifecycle management of Kubernetes-based applications. This includes the ability to integrate security and to build pipelines into a customized stack to support governance and compliance requirements.
In addition to the other two new projects, Kabanero integrates the Knative serverless platform, Istio service mesh, Google-developed Tekton continuous integration/continuous development (CI/CD) platform, and IBM’s Razee multi-cluster continuous delivery tool.
Nate Ziemann, an IBM Cloud product manager, noted in a blog post that the integration is what sets Kabanero apart in the market.
“While there are open source projects that address individual aspects of what Kabanero addresses, no other open source project provides an integrated experience from the creation of a containerized cloud-native application through its production lifecycle on Kubernetes,” Ziemann wrote.

Friday, 12 July 2019

Julia vs. Python: Which is best for data science?

Among the many use cases Python covers, data analytics has become perhaps the biggest and most significant. The Python ecosystem is loaded with libraries, tools, and applications that make the work of scientific computing and data analysis fast and convenient.

But for the developers behind the Julia language — aimed specifically at “scientific computing, machine learning, data mining, large-scale linear algebra, distributed and parallel computing”—Python isn’t fast or convenient enough. Python represents a trade-off, good for some parts of data analytics work but terrible for others.

What is the Julia language?

Created in 2009 by a four-person team and unveiled to the public in 2012, Julia is meant to address the shortcomings in Python and other languages and applications used for scientific computing and data processing. “We are greedy,” they wrote. They wanted more: 

We want a language that’s open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that’s homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled.

(Did we mention it should be as fast as C?)

Here are some of the ways Julia implements those aspirations:

  • Julia is compiled, not interpreted. For faster runtime performance, Julia is just-in-time (JIT) compiled using the LLVM compiler framework. At its best, Julia can approach or match the speed of C.
  • Julia is interactive. Julia includes a REPL (read-eval-print loop), or interactive command line, similar to what Python offers. Quick one-off scripts and commands can be punched right in.
  • Julia has a straightforward syntax. Julia’s syntax is similar to Python’s—terse, but also expressive and powerful.
  • Julia combines the benefits of dynamic typing and static typing. You can specify types for variables, like “unsigned 32-bit integer.” But you can also create hierarchies of types to allow general cases for handling variables of specific types—for instance, to write a function that accepts integers without specifying the length or signing of the integer. You can even do without typing entirely if it isn’t needed in a particular context.
  • Julia can call Python, C, and Fortran libraries. Julia can interface directly with external libraries written in C and Fortran. It’s also possible to interface with Python code by way of the PyCall library, and even share data between Python and Julia (see the sketch after this list).
  • Julia supports metaprogramming. Julia programs can generate other Julia programs, and even modify their own code, in a way that is reminiscent of languages like Lisp.
  • Julia has a full-featured debugger. Julia 1.1 introduced a debugging suite, which executes code in a local REPL and allows you to step through the results, inspect variables, and add breakpoints in code. You can even perform fine-grained tasks like stepping through a function generated by code.
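
As a small, hedged illustration of that interop from the Python side, here is a sketch using PyJulia (the `julia` Python package, the companion bridge to PyCall); it assumes both Julia and that package are installed:

```python
# Requires a Julia installation plus the `julia` Python package (PyJulia).
from julia import Main

Main.xs = [2.0, 4.0, 6.0]        # a Python list becomes a Julia array
print(Main.eval("sum(xs)"))      # run Julia code on shared data -> 12.0
print(Main.eval("sqrt(xs[1])"))  # Julia arrays are 1-indexed -> 1.414...
```
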
https://www.infoworld.com