Thursday, 27 September 2018

Edge computing is the place to address a host of IoT security concerns

Edge computing can greatly improve the efficiency of gathering, processing and analyzing data gathered by arrays of IoT devices, but it’s also an essential place to inject security between these inherently vulnerable devices and the rest of the corporate network.

First designed for the industrial IoT (IIoT), edge computing refers to placing an edge router or gateway locally with a group of IIoT endpoints, such as an arrangement of connected valves, actuators and other equipment on a factory floor.

Because the lifespan of industrial equipment is frequently measured in decades, the connectivity features of those endpoints either date back to their first installation or they’ve been grafted on after the fact. In either case, the ability of those endpoints to secure themselves is seriously limited, since they’re probably not particularly powerful computing devices. Encryption is hard to cram into a system-on-a-chip designed to open and close a valve and relay status back to a central control panel.

IIoT can be a security blind spot
As a result, IIoT is a rich new target for malicious hackers, thanks in large part to the difficulty of organizing and gaining visibility into what’s happening on an IIoT network, according to Eddie Habibi, CEO of industrial cybersecurity company PAS Global, who has worked in industrial control and automation for about 15 years.

A lot of connected IIoT devices have known, exploitable vulnerabilities, but operators might not have the ability to know for certain what systems they have on their networks. “The hardest thing about these older systems that have been connected over the past 25 years is that you can’t easily do discovery on them,” he said. Operators don’t know all the devices they have, so they don’t know what vulnerabilities to patch.

It’ll be decades, Habibi said, before many IIoT users – whose core devices can date back to the 1980s and even the 1970s – update this important hardware.

Edge networks provide security
That’s where the edge comes in, say the experts. Placing a gateway between the industrial endpoints and the rest of a company’s computing resources lets businesses implement current security and visibility technology without ripping and replacing expensive IIoT machinery.

The edge model also helps IIoT implementations in an operational sense, by providing a lower-latency management option than would otherwise be possible if those IIoT endpoints were calling back to a cloud or a data center for instructions and to process data.

Most of the technical tools used to secure an IoT network in an edge configuration are similar to those in use on IT networks – encryption, network segmentation, and the like. Edge networking creates a space to locate security technologies that limited-capacity endpoints can’t handle on their own.

Mike Mackey is CTO and vice president of engineering at Atonomi, makers of a blockchain-based identity and reputation-tracking framework for IIoT security. He said edge computing adds an important layer of trust between a company’s backend and its potentially vulnerable IIoT devices.

“[N]ow you’re adding network translation to the end-to-end communication between that IoT device and whatever it’s ultimately communicating with, which, today, is typically the cloud,” he said.

Other experts, such as Windmill Enterprise CEO Michael Hathaway, also highlighted that widely used cloud-based backends pose problems of their own. Enterprises are losing control over their security policies and access with every new cloud service they subscribe to, he said.

“Enterprise customers can be very nervous about hooking up an automation system directly to the Internet – it needs a last layer of intelligence and security,” Hathaway said.

Consequently, some of the most effective IIoT implementations can be those that leave the existing structures and networks in place – hence the popularity of the edge architecture, which works both as a buffer and a link between the IT network and a company’s operational technology.

Russ Dietz, chief product security officer at GE Digital, said that old-yet-irreplaceable technology already on the factory floor plays an enormous role in shaping the IIoT infrastructure laid on top of it.

“Over time, we might migrate to a fully digital world where we blend those two together, but because industrial is going to live in this very long-tail environment, we have to be able to provide separate trust for both of those,” he said. “So we may weight how much we trust sensors in a different category than how much we trust a control system.”

Edge networks must fit unique sets of needs
According to Hathaway, it’s important to recognize that not all edge solutions are created equal, and that different businesses will have different requirements for an edge computing deployment. An automotive manufacturer might need to track a lot of process-oriented data and rate information about productivity, while an oil-production facility is likely to need to track things like pressures and volumes through a vast array of pipelines.

“You can’t possibly have provided a cookie-cutter solution,” said Hathaway, adding that, while the tools and approaches used will have commonalities, everyone’s security needs will be different.

The eventual hope for most IIoT deployments is that they provide enough machine-generated data to help businesses make smart decisions for the future, according to Simon Dowling, CTO of edge compute vendor ORI.

Protecting the data those machines send back for analysis – whether at the edge layer or back in the cloud or data center – is of paramount importance.

“As we’re moving towards a world where there is – whether it’s industrial IoT or it’s more commercial/consumer-focused IoT – a level of expectation that these devices will provide more meaningful action,” he said.

And if businesses want to stay on top of cybersecurity threats, they have to realize that it’s not simply a matter of pushing out updates and getting the latest and greatest technology up and running on their systems, said Aruba/HPE's vice president of strategic partnerships, Mike Tennefoss. It’s also a matter of understanding how those updates and additions will tie into the operational technology stack.

“Security is the heart and soul of IT, and what you see happening is that IT systems and processes of cybersecurity are pushing down deeper and deeper into the operational technologist’s realm,” he said.

https://www.networkworld.com

Google Cloud grants $9M in credits for the operation of the Kubernetes project

Kubernetes, the container orchestrator created and open-sourced here at Google, has experienced incredible development and adoption since it was introduced in 2014. Today, a reported 54% of Fortune 100 businesses use Kubernetes in some capacity and developers have made nearly a million comments on the project in GitHub.

Since Kubernetes’ inception, we’ve provided the cloud resources that support the project development—namely CI/CD testing infrastructure, container downloads, and other services like DNS, all running on Google Cloud Platform (GCP). And in that time, Kubernetes has become one of the world’s most popular open-source projects. To put it in perspective, just last month, the Kubernetes container registry we host served 129,537,369 container image downloads of core Kubernetes components. That’s over 4 million per day—and a lot of bandwidth!

In 2015, we contributed the project to the then newly formed Cloud Native Computing Foundation (CNCF) to help facilitate project management and develop an open, vibrant community of contributors. CNCF, under the direction of the Linux Foundation, helps nurture project growth—such as establishing a certified Kubernetes program that’s helped maintain a consistent experience across Kubernetes distributions, and guiding Kubernetes through the incubation process.

As a testament to Kubernetes’ maturity, we’re excited to take the next step, and are opening the Kubernetes project's cloud resources up to contributors. We’re funding this move with a $9 million grant of GCP credits to the CNCF, split over three years, to cover infrastructure costs. In addition to the world-wide network and storage capacity required to serve all those container downloads, a large part of this grant will be dedicated to funding scalability testing, which regularly runs 150,000 containers across 5,000 virtual machines. Our goal is to make sure Kubernetes is ready to scale when your enterprise needs it to.

We believe that all aspects of a mature open-source project—including its testing and release infrastructure—should be maintained by the people developing it. In the coming months, all project operations will be transferred to be administered by members of the Kubernetes project (including many Googlers), who will take ownership of day-to-day operational tasks such as testing and builds, as well as maintaining and operating the image repository and download infrastructure.

https://cloud.google.com

Tuesday, 18 September 2018

Dell EMC Improves Its SMB Block-Storage Arrays

Dell EMC continues to push hard to become the go-to storage vendor for small-to-medium-sized businesses and startups that need to have a data center or two in operation.

The company on Sept. 12 launched its PowerVault ME4 Series, the fourth-generation lineup of entry-level data storage arrays designed for SMBs. These machines are aimed at organizations that want to modernize IT and boost application performance lest they fall behind competitors.

Small- and medium-sized businesses have similar needs to larger enterprises but must meet those needs in the face of smaller budgets and fewer IT support staff. Balancing price and performance is a reality for these organizations, along with selecting storage solutions that are simple to use and manage. Dell EMC designed the PowerVault ME4 Series to meet these needs by offering a feature-rich, lower-priced storage array designed to be intuitive to install and use.

Dell claims the new arrays bring significant increases over previous Dell EMC entry systems in capacity, performance, simplicity and features. Dell EMC built the PowerVault ME4 Series with 75 percent more drives to increase raw storage capacity by 122 percent, while also boosting read IOPS performance by 4X, the company said.

ME4s are designed for a range of block-based storage use cases such as video surveillance, HPC, virtualization/VDI, entry SAN, low cost consolidation, video editing, NoSQL databases, direct attached and OEM solutions, Dell said.

Enterprise-class performance data points and features include:


  • Flexible array configurations:  Three models offer from 12 to 84 storage drives that can be configured from 0-100 percent flash and support any mix of SSDs and HDDs. This provides the capabilities to deliver the optimal performance and low response times of flash with the economics of disk.
  • Enterprise-level capacity and performance: PowerVault ME4 Series can expand up to 4PB and drive up to 320,000 IOPs.
  • All-inclusive software: Essential components to manage and protect data to drive the business forward including built-in data protection, thin provisioning, remote replication, snapshots, 3-level tiering, volume cloning, integration with VMware vCenter and SRM, drive-level encryption and others.
  • Single-vendor integration:  Out-of-the-box integration with Dell EMC PowerEdge 13th and 14th Generation servers simplifies the challenges of server capacity expansion, providing business applications with high-speed data access in either SAN or DAS environments.
Each aspect of the PowerVault ME4 has been designed for ease of use, Dell said. Dell claims the array is customer-installable in 15 minutes and takes just 15 more minutes to configure and begin storing data as either hybrid or all-flash storage. These estimates will vary.

The PowerVault ME4 Series also has a new HTML5 user interface, which lets users handle common management tasks from a web browser anywhere, such as configuring and managing storage profiles and network connections, responding to alerts and others.

Every PowerVault ME4 Series storage controller is powered by a dual-core Intel Broadwell processor. Each array supports multiple protocols including Fibre Channel, iSCSI (and SFP+) and SAS host connectivity.

According to new data from IDC, Dell’s 31.6 percent revenue share lead in the External Entry Storage segment is more than the next six competitors combined. Dell EMC already has an installed base of more than 400,000 PowerVault units worldwide.

Pricing and Availability
Dell EMC PowerVault ME4 Series minimum configurations are available now starting at a list price of about $13,000.
http://www.eweek.com

Monday, 17 September 2018

TensorFlow.js puts machine learning in the browser

Google’s TensorFlow open source machine learning library has been extended to JavaScript with Tensorflow.js, a JavaScript library for deploying machine learning models in the browser.

A WebGL-accelerated library, Tensorflow.js also works with the Node.js server-side JavaScript runtime and is part of the TensorFlow ecosystem.  With machine learning directly in the browser, there is no need for drivers; developers can just run code.

The project, which features an ecosystem of JavaScript tools, evolved from the Deeplearn.js library for browser-based machine learning; Deeplearn.js is now known as Tensorflow.js Core.

TensorFlow.js APIs can be used to build models using the low-level JavaScript linear algebra library or the higher-level layers API. TensorFlow.js model converters can run existing models in the browser or under Node.js. Existing models can be retrained using sensor data connected to the browser.

A tensor serves as the central unit of data. Also, a high-level, Keras-inspired API is included for building neural networks.
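
The article notes that the layers API is Keras-inspired and that the JavaScript API tracks TensorFlow’s Python API. As a rough sketch of that shared style, here is a minimal Keras model definition in Python; the layer sizes, data, and training settings are illustrative assumptions, and the JavaScript version mirrors these calls with tf.sequential(), tf.layers.dense(), and model.fit():

    # Minimal sketch of the Keras-style layers API (illustrative shapes/data).
    import numpy as np
    from tensorflow import keras

    # Stack fully connected layers; TensorFlow.js mirrors this structure.
    model = keras.Sequential([
        keras.layers.Dense(16, activation="relu", input_shape=(4,)),
        keras.layers.Dense(1, activation="sigmoid"),  # binary output
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])

    # Train on random stand-in data; a real model would use app or sensor data.
    X = np.random.rand(128, 4).astype("float32")
    y = (X.sum(axis=1) > 2.0).astype("float32")  # synthetic labels
    model.fit(X, y, epochs=3, batch_size=32, verbose=0)
    print(model.predict(X[:3]))
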
But TensorFlow.js is not the only JavaScript library built for neural networking; TensorFire, built by MIT students, executes neural networks in a webpage.

Tensorflow.js has an API similar to TensorFlow’s Python API, but the JavaScript API does not yet support all the functionality of the Python API. The builders of Tensorflow.js pledge to achieve parity where it makes sense, but also want to provide an idiomatic JavaScript API. TensorFlow.js with WebGL runs at 50 to 60 percent of the speed of the TensorFlow Python API used with the AVX library. Planned improvements to TensorFlow.js include:
  • A visualization library to perform quick visualizations of the model and data.
  • Performance improvements in the browser.
  • WebGL optimization.
  • A browser- and Node-specific data API.
  • Cloud integration on the Node.js side, including serverless-type integration points.
  • Better async support with the libuv asynchronous I/O library.

Where to download TensorFlow.js

You can download TensorFlow.js from GitHub.
https://www.infoworld.com

Friday, 14 September 2018

MongoDB takes on the world

If you’ve built a medium-sized to large-scale web application in the last few years, you probably considered basing it on the open source LAMP or MEAN stack. The older LAMP stack uses the Linux operating system, Apache web server, MySQL relational database, and PHP programming language. MEAN uses the MongoDB NoSQL database, the Express back-end web application framework, the Angular application platform, and the Node.js JavaScript runtime. MEAN is essentially an end-to-end JavaScript stack. Linux isn’t explicitly mentioned in the acronym, but is usually the OS underneath Node.

In this review, I’ll discuss the MongoDB database, now at version 4. MongoDB is a highly scalable, operational database available in both open source and commercial enterprise versions, and it can be run on-premises or as a managed cloud service. The managed cloud service is called MongoDB Atlas.

MongoDB is far and away the most popular of the NoSQL databases. Its document data model gives developers great flexibility, while its distributed architecture allows for great scalability. As a result, MongoDB is often chosen for applications that must manage large volumes of data, that benefit from horizontal scalability, and that handle data structures that don’t fit the relational model.  

Because MongoDB is appropriate for a wide variety of use cases, it is often put forth as a replacement for relational databases. However, while freedom from rigid schema constraints is often beneficial, it’s important to keep in mind that no document database is a universal solution—not even MongoDB.

MongoDB origins
The company behind MongoDB was founded in 2007 as 10gen by a team that was behind DoubleClick, the Internet advertising company. The original motivation for the MongoDB database was to be able to handle the agility and scale required for Internet advertising. As an example of scale, DoubleClick served 400,000 ads per second in 2007, and struggled to perform with the existing databases of the time.

MongoDB is a document-based store that also has a graph-based store implemented on top of it. The other kinds of NoSQL databases are key-value stores and column-based stores. All kinds of NoSQL databases share the ability to scale out in ways that were not possible in the SQL relational databases of 2007, but the different varieties of NoSQL databases have different strengths, weaknesses, and use cases.

Some of the main NoSQL competitors to MongoDB as operational databases are Amazon DynamoDB (key-value store), Google Cloud BigTable (column store), Google Cloud Datastore (document store), Redis (in-memory, key-value store), Couchbase (multi-model key-value and document store), DataStax/Cassandra (column store), and Azure Cosmos DB (multi-model including a SQL option as well as several NoSQL stores).

What is MongoDB?
MongoDB Inc. describes MongoDB as “a document database with the scalability and flexibility that you want with the querying and indexing that you need.” To parse that, we first need to understand the nature of a document database, which is one of the kinds of NoSQL designs.

Rather than storing strongly typed data in related normalized tables with fixed schemas like a relational database, a document database stores related data in de-normalized form embedded in JSON-like name-value documents. MongoDB doesn’t actually store JSON, however: MongoDB stores BSON (Binary JSON), which extends the JSON representation (strings) to include additional types such as int, long, date, floating point, decimal128, and geospatial coordinates. BSON documents contain one or more fields, and each field contains a value of a specific data type, including arrays, binary data, and subdocuments. BSON also tracks the size of each document, to allow efficient seeking.
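
As a concrete sketch of that document model, the snippet below uses pymongo to insert and query a de-normalized order document; the connection string, database, and field names are illustrative assumptions, not anything from the article. Note how the customer subdocument and the array of line items live inside one document rather than in separate normalized tables, and how pymongo maps Python types such as datetime to the richer BSON types:

    # Sketch: MongoDB's document model via pymongo (assumes a local mongod;
    # database, collection, and field names are illustrative).
    from datetime import datetime, timezone
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    orders = client.shop.orders

    # De-normalized document: customer info and line items embedded together.
    orders.insert_one({
        "customer": {"name": "Ada", "email": "ada@example.com"},  # subdocument
        "items": [                                                # array field
            {"sku": "A-100", "qty": 2, "price": 9.99},
            {"sku": "B-200", "qty": 1, "price": 24.50},
        ],
        "placed_at": datetime.now(timezone.utc),  # stored as a BSON date
    })

    # Dot notation reaches into embedded arrays and subdocuments.
    for doc in orders.find({"items.sku": "A-100"}):
        print(doc["customer"]["name"], len(doc["items"]))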

https://goo.gl/4tzJQ5

Wednesday, 12 September 2018

What is Docker? Docker containers explained

Software is anything but simple. Even seemingly basic applications might have a rat’s nest of dependencies, with scores of packages, libraries, and other software components required for them to run. When you try to run multiple applications in the same operating system environment, you might well find that some of these components don’t get along. And when you try to keep those software stacks up-to-date, or change them to keep up with business needs, you bring in all kinds of maintenance headaches.

For many years now, the leading way to isolate and organize applications and their dependencies has been to place each application in its own virtual machine. Virtual machines make it possible to run multiple applications on the same physical hardware while keeping conflicts among software components and competition for hardware resources to a minimum. But virtual machines are bulky—typically gigabytes in size. They don’t really solve problems like portability, software updates, or continuous integration and continuous delivery.

Enter Docker containers. Containers make it possible to isolate applications into small, lightweight execution environments that share the operating system kernel. Typically measured in megabytes, containers use far fewer resources than virtual machines and start up almost immediately. They can be packed far more densely on the same hardware and spun up and down en masse with far less effort and overhead.

Thus containers provide a highly efficient and highly granular mechanism for combining software components into the kinds of application and service stacks needed in a modern enterprise, and for keeping those software components updated and maintained.

Docker container basics
Docker containers are the most modern incarnation of an idea that has been in Unix operating systems such as BSD and Solaris for decades—the idea that a given process can be run with some degree of isolation from the rest of the operating environment.

Virtual machines provide isolation by devoting an entire operating system instance to each application that needs compartmentalizing. This approach provides almost total isolation, but at the cost of significant overhead. Each guest operating system instance eats up memory and processing power that could be better devoted to apps themselves.

Containers take a different approach. Each application and its dependencies use a partitioned segment of the operating system’s resources. The container runtime (Docker, most often) sets up and tears down the containers by drawing on the low-level container services provided by the host operating system.

To understand Linux containers, for example, we have to start with cgroups and namespaces, the Linux kernel features that create the walls between containers and other processes running on the host. Linux namespaces, originally developed by IBM, wrap a set of system resources and present them to a process to make it look like they are dedicated to that process.

Linux cgroups, originally developed by Google, govern the isolation and usage of system resources, such as CPU and memory, for a group of processes. For example, if you have an application that takes up a lot of CPU cycles and memory, such as a scientific computing application, you can put the application in a cgroup to limit its CPU and memory usage.

Namespaces deal with resource isolation for a single process, while cgroups manage resources for a group of processes. Together, cgroups and namespaces were used to create a container technology called, appropriately enough, Linux Containers, or LXC.
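
As a rough illustration of the cgroup half of that story (a sketch of the kernel mechanism, not of how Docker itself is implemented), the following creates a cgroup through the cgroup v2 filesystem and caps the memory of a process placed in it; it assumes a Linux host with the unified hierarchy mounted at /sys/fs/cgroup and must run as root:

    # Sketch: cap a process's memory with cgroups v2 (assumes root and a
    # unified hierarchy at /sys/fs/cgroup; the cgroup name is illustrative).
    import pathlib
    import subprocess

    cg = pathlib.Path("/sys/fs/cgroup/demo")
    cg.mkdir(exist_ok=True)                      # making the dir creates the cgroup
    (cg / "memory.max").write_text("104857600")  # 100 MiB memory cap

    # Start a workload, then move its PID into the cgroup.
    proc = subprocess.Popen(["sleep", "30"])
    (cg / "cgroup.procs").write_text(str(proc.pid))

    # The kernel now enforces the cap on every process in this cgroup.
    print(f"pid {proc.pid} is memory-capped at 100 MiB")
    proc.terminate()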

How Docker changed containers
The original Linux container technology, LXC, is an operating-system-level virtualization method for running multiple isolated Linux systems on a single host. Namespaces and cgroups make LXC possible.

Containers decouple applications from operating systems, which means that users can have a clean and minimal Linux operating system and run everything else in one or more isolated containers.

Also, because the operating system is abstracted away from containers, you can move a container across any Linux server that supports the container runtime environment.

Docker introduced several significant changes to LXC that make containers more portable and flexible to use. Using Docker containers, you can deploy, replicate, move, and back up a workload even more quickly and easily than you can do so using virtual machines. Docker brings cloudlike flexibility to any infrastructure capable of running containers.

Docker also provides a way to create container images—specifications for which software components a given container would run and how. Docker’s container image tools allow a developer to build libraries of images, compose images together into new images, and launch the apps in them on local or remote infrastructure.
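
A minimal sketch of that build-and-launch workflow using the Docker SDK for Python (pip install docker); it assumes a local Docker daemon and a Dockerfile in the current directory, and the image tag, port mapping, and container name are illustrative:

    # Sketch: build an image and launch a container with the Docker SDK for
    # Python. Assumes a local daemon and ./Dockerfile; names are illustrative.
    import docker

    client = docker.from_env()

    # Build an image from the Dockerfile in the current directory.
    image, build_logs = client.images.build(path=".", tag="myapp:latest")

    # Launch a container from it, mapping container port 8080 to the host.
    container = client.containers.run(
        "myapp:latest", detach=True, ports={"8080/tcp": 8080}, name="myapp-1"
    )
    print(container.short_id, container.status)

    container.stop()
    container.remove()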

Docker also makes it easier to coordinate behaviors between containers, and thus build application stacks by hitching containers together. More advanced versions of these behaviors—what’s called container orchestration—are offered by third-party products, such as Kubernetes. But Docker provides the basics.

By taking the LXC concept and building an API and ecosystem around it, the developers of Docker have made working with containers far more accessible to developers and far more useful to enterprises.

Finally, although Docker was originally built atop LXC, eventually the Docker team created its own runtime, called libcontainer. Libcontainer not only provides a richer layer of services for containers, but also makes it easier for the Docker team to develop Docker container technology separately from Linux.

Today, Docker is a Linux or Windows utility that can efficiently create, ship, and run containers.

Docker advantages
Docker containers provide a way to build enterprise and line-of-business applications that are easier to assemble, maintain, and move around than their conventional counterparts.

Docker containers enable isolation and throttling
Docker containers keep apps isolated not only from each other, but from the underlying system. This not only makes for a cleaner software stack, but makes it easier to dictate how a given containerized application uses system resources—CPU, GPU, memory, I/O, networking, and so on. It also makes it easier to ensure that data and code are kept separate.

Docker containers enable portability
A Docker container runs on any machine that supports the container’s runtime environment. Applications don’t have to be tied to the host operating system, so both the application environment and the underlying operating environment can be kept clean and minimal.

For instance, a MySQL for Linux container will run on most any Linux system that supports containers. All of the dependencies for the app are typically delivered in the same container.

Container-based apps can be moved easily from on-prem systems to cloud environments or from developers’ laptops to servers, as long as the target system supports Docker and any of the third-party tools that might be in use with it, such as Kubernetes.

Normally, Docker container images must be built for a specific platform. A Windows container, for instance, will not run on Linux and vice versa. Previously, one way around this limitation was to launch a virtual machine that ran an instance of the needed operating system, and run the container in the virtual machine.

However, the Docker team has since devised a more elegant solution, called manifests, which allow images for multiple operating systems to be packed side-by-side in the same image. Manifests are still considered experimental, but they hint at how containers might become a cross-platform application solution as well as a cross-environment one.

Docker containers enable composability
Most business applications consist of several separate components organized into a stack—a web server, a database, an in-memory cache. Containers make it possible to compose these pieces into a functional unit with easily changeable parts. Each piece is provided by a different container and can be maintained, updated, swapped out, and modified independently of the others.

This is essentially the microservices model of application design. By dividing application functionality into separate, self-contained services, the microservices model offers an antidote to slow traditional development processes and inflexible monolithic apps. Lightweight and portable containers make it easier to build and maintain microservices-based applications.

Docker containers ease orchestration and scaling
Because containers are lightweight and impose little overhead, it’s possible to launch many more of them on a given system. But containers can also be used to scale an application across clusters of systems, and to ramp services up or down to meet spikes in demand or to conserve resources.

The most enterprise-grade versions of the tools for deploying, managing, and scaling containers are provided by way of third-party projects. Chief among them is Google’s Kubernetes, a system for automating how containers are deployed and scaled, but also how they’re connected together, load-balanced, and managed. Kubernetes also provides ways to create and re-use multi-container application definitions or “Helm charts,” so that complex app stacks can be built and managed on demand.

Docker also includes its own built-in orchestration system, swarm mode, which is still used for cases that are less demanding. That said, Kubernetes has become something of the default choice; in fact, Kubernetes is bundled with Docker Enterprise Edition.

Docker caveats
Containers solve a great many problems, but they aren’t cure-alls. Some of their shortcomings are by design; some are by-products of their nature.

Docker containers are not virtual machines
The most common conceptual mistake people make with containers is to equate them with virtual machines. However, because containers and virtual machines use different isolation mechanisms, they have distinctly different advantages and disadvantages.

Virtual machines provide a high degree of isolation for processes, since they run in their own instance of an operating system. That operating system doesn’t have to be the same as the one run on the host, either. A Windows virtual machine can run on a Linux hypervisor and vice versa.

Containers, by contrast, use controlled portions of the host operating system’s resources; many applications share the same OS kernel, in a highly managed way. As a result, containerized apps aren’t as thoroughly isolated as virtual machines, but they provide enough isolation for the vast majority of workloads.

Microsoft offers two types of containers on Windows that blur the lines slightly between containers and virtual machines:
  • Windows Server Containers are essentially Docker-style containers on Windows. Microsoft provided the Windows kernel with some of the same mechanisms used in Linux to perform the isolation, so Docker containers could have the same behaviors on both platforms.
  • Hyper-V Containers are containers that run in their own virtual machine with their own kernel for additional isolation. Thus Hyper-V Containers can run different versions of the Windows kernel if needed. Conventional containers can be converted to Hyper-V Containers if the need arises.

Keep in mind that, while Hyper-V Containers run on the Hyper-V hypervisor and take advantage of Hyper-V isolation, they are still a different animal than full-blown virtual machines.

Docker containers don’t provide bare-metal speed
Containers don’t have nearly the overhead of virtual machines, but their performance impact is still measurable. If you have a workload that requires bare-metal speed, a container might be able to get you close enough—much closer than a VM—but you’re still going to see some overhead.

Docker containers are stateless and immutable
Containers boot and run from an image that describes their contents. That image is immutable by default—once created, it doesn’t change.

Consequently, containers don’t have persistence. If you start a container instance, then kill it and restart it, the new container instance won’t have any of the stateful information associated with the old one.

This is another way containers differ from virtual machines. A virtual machine has persistence across sessions by default, because it has its own file system. With a container, the only thing that persists is the image used to boot the software that runs in the container; the only way to change that is to create a new, revised container image.

On the plus side, the statelessness of containers makes the contents of containers more consistent, and easier to compose predictably into application stacks. It also forces developers to keep application data separate from application code.

If you want a container to have any kind of persistent state, you need to place that state somewhere else. That could be a database or a stand-alone data volume connected to the container at boot time.
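
For instance, here is a hedged sketch with the Docker SDK for Python that attaches a named volume so a database's files outlive any single container instance; the image and mount path follow the standard Postgres image, but treat the names as illustrative:

    # Sketch: persistent state via a named volume (Docker SDK for Python;
    # volume and container names are illustrative).
    import docker

    client = docker.from_env()
    client.volumes.create(name="pgdata")  # the volume outlives containers

    db = client.containers.run(
        "postgres:15",
        detach=True,
        environment={"POSTGRES_PASSWORD": "example"},
        # Mount the named volume where Postgres keeps its data files.
        volumes={"pgdata": {"bind": "/var/lib/postgresql/data", "mode": "rw"}},
        name="db-1",
    )

    # Stopping and removing the container leaves the data volume intact;
    # a replacement container can mount "pgdata" and pick up where it left off.
    db.stop()
    db.remove()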

Docker containers are not microservices
I mentioned earlier how containers lend themselves to creating microservices applications. That doesn’t mean taking a given application and sticking it into a container will automatically create a microservice. A microservices application must be built according to a microservice design pattern, whether it is deployed in containers or not. It is possible to containerize an application as part of the process of converting it to a microservice, but that’s only one step among many.

When virtual machines came along, they made it possible to decouple applications from the systems they ran on. Docker containers take that idea several steps further—not just by being more lightweight, more portable, and faster to spin up than virtual machines, but also by offering scaling, composition, and management features that virtual machines can’t.

https://www.infoworld.com

Machine learning: When to use each method and technique

You’re probably hearing more and more about machine learning, a subset of artificial intelligence. But what exactly can you do with machine learning?

The technology encompasses a number of methods and techniques, and each has a set of potential use cases. Enterprises would do well to examine them before plunging ahead with plans to invest in machine learning tools and infrastructure.

Machine learning methods

Supervised learning

Supervised learning is ideal if you know what you want a machine to learn. You can expose it to an enormous set of training data, examine the output, and tweak parameters until you get the results you expect. Later, you can see what the machine has learned by having it predict the results for a set of validation data it hasn’t seen before.

The most common supervised learning tasks involve classification and prediction, or regression.

Supervised learning methods can be used for applications such as determining the financial risk of individuals and organizations, based on past information about financial performance. They can also provide a good sense of how customers will act or what their preferences are based on previous behavior patterns.

For example, online loan marketplace Lending Tree is using an automated machine learning platform from DataRobot to customize experiences for its customers and to predict their intent based on what they’ve done in the past, says Akshay Tandon, vice president and head of strategy and analytics.

By predicting customer intent—primarily via lead scoring—Lending Tree can sort out the people who are just shopping around for a rate versus those who are actually looking for a loan and ready to apply for one. Using supervised learning techniques, it built out a classification model to define the probability of a lead closing.
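
As a generic sketch of that kind of lead-scoring classifier (invented features and synthetic data, not Lending Tree’s actual model), scikit-learn can fit a supervised model and report the probability of a lead closing:

    # Sketch: supervised lead scoring (hypothetical features; synthetic data).
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    # Invented features: [pages_viewed, minutes_on_site, prior_inquiries]
    X = rng.random((500, 3)) * [20, 30, 5]
    y = (X[:, 0] + X[:, 1] > 25).astype(int)  # synthetic "lead closed" label

    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
    model = LogisticRegression().fit(X_train, y_train)

    # Validation data the model hasn't seen shows what it actually learned.
    print("validation accuracy:", model.score(X_val, y_val))
    print("close probability for one lead:", model.predict_proba(X_val[:1])[0, 1])
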
Unsupervised learning

Unsupervised learning lets a machine explore a data set and identify hidden patterns linking different variables. This method can be used to group the data into clusters based on their statistical properties alone.

A good application of unsupervised learning is the clustering algorithm used to perform probabilistic record linking, a technique that extracts connections among data elements and builds on them to identify individuals and organizations and their connections in the physical or virtual world.

This is especially useful for enterprises that need to, for example, integrate data from disparate sources and/or across different business units to build a consistent and comprehensive view of their customers, says Flavio Villanustre, vice president of technology at LexisNexis Risk Solutions, a company that uses analytics to help customers predict and manage risk.

Unsupervised learning can be used for sentiment analysis, which identifies the emotional state of individuals based on their social media posts, emails, or other written feedback, says Sally Epstein, a specialist machine learning engineer at consulting firm Cambridge Consultants. The firm has seen an increasing number of companies in financial services use unsupervised learning to derive insight into customer satisfaction.

Semisupervised learning

Semisupervised learning is a hybrid of supervised and unsupervised learning. By labeling a small portion of the data, a trainer can give the machine clues as to how it should cluster the rest of the data set.

Semisupervised learning can be used to detect identity fraud, among other uses. Fortunately, fraud is not as frequent as nonfraudulent activity, Villanustre notes, and as such fraudulent activity can be considered an “anomaly” in the universe of legitimate activity. Still, fraud exists, and semisupervised anomaly-detection machine learning methods can be used to model solutions to these types of problems. This type of learning is deployed to identify fraud in online transactions.

Semisupervised learning can also be used when there’s a mixture of labeled and unlabeled data, which is often seen in large enterprise settings, Epstein says. Amazon has been able to enhance the natural language understanding of its Alexa offering by training AI algorithms on a mix of labeled and unlabeled data, she says. This has helped increase the accuracy of Alexa’s responses.
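
A minimal sketch of that idea using scikit-learn’s label spreading, where unlabeled samples are marked -1 and the algorithm propagates the handful of known labels across the rest of the (synthetic) data set:

    # Sketch: semisupervised learning with LabelSpreading (synthetic data).
    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.semi_supervised import LabelSpreading

    X, y_true = make_blobs(n_samples=200, centers=2, random_state=42)

    y = np.full(200, -1)   # -1 marks a sample as unlabeled
    y[:5] = y_true[:5]     # label a small portion of the data...
    y[-5:] = y_true[-5:]   # ...as the "clues" for the rest

    model = LabelSpreading().fit(X, y)
    print("agreement with true labels:", (model.transduction_ == y_true).mean())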

Reinforcement learning

With reinforcement learning, you let the machine interact with its environment (for example, pushing damaged products off a conveyor into a bin) and provide a reward when it does what you want. By automating the calculation of the reward, you can leave the machine to learn in its own time.
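
A toy sketch of that reward loop, using tabular Q-learning on an invented five-cell corridor where the agent is rewarded only for reaching the right end; the environment and hyperparameters are illustrative:

    # Sketch: tabular Q-learning on a toy 5-cell corridor (illustrative
    # environment; actions are 0 = left, 1 = right; reward 1 at the right end).
    import numpy as np

    n_states, n_actions = 5, 2
    Q = np.zeros((n_states, n_actions))
    alpha, gamma, epsilon = 0.5, 0.9, 0.2
    rng = np.random.default_rng(1)

    for episode in range(200):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy: mostly exploit what's learned, sometimes explore.
            a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
            s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
            r = 1.0 if s_next == n_states - 1 else 0.0   # automated reward
            # Update: nudge Q toward reward plus discounted future value.
            Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
            s = s_next

    print(Q.round(2))   # learned values favor moving right in every state
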
One use case for reinforcement learning is the sorting of clothing and other items at a retail establishment.

Some apparel retailers have been piloting new types of technology such as robotics to help sort items such as clothing, shoes, and accessories, says David Schatsky, an analyst at consulting firm Deloitte who focuses on emerging technology and business trends.

The robots use reinforcement learning (as well as deep learning) to figure out how much pressure they should use when grasping items and how best to grab these items in inventory, Schatsky says.

A variation of reinforcement learning is deep reinforcement learning, which is well-suited for autonomous decision-making where supervised learning or unsupervised learning techniques alone can’t do the job.

Deep learning

Deep learning can be applied to learning methods such as unsupervised or reinforcement learning. Broadly speaking, deep learning mimics some aspects of how people learn, mainly by using neural networks to identify characteristics of the data set in more and more detail.

Deep learning, in the form of deep neural networks (DNN), has been used to accelerate high-content screening for drug discovery, Schatsky says. It involves applying DNN acceleration techniques to process multiple images in significantly less time, while extracting greater insight from image features that the model ultimately learns.

This machine learning method is also letting many companies fight fraud, improving detection rates by using automation to detect wrongdoing.

Deep learning can also be used in the auto industry. One company has developed a neural network-based system that allows early detection of problems with cars, Schatsky says. This system can recognize noises and vibrations, and it uses any deviations from the norm to interpret the nature of the failure. It can become part of predictive maintenance, because it monitors the vibrations of any moving parts of the car and can notice even minor changes in their performance.

Machine learning techniques

Neural networks

Neural networks are designed to mimic the structure of neurons in human brains, with each artificial neuron connecting to other neurons inside the system. Neural networks are arranged in layers, with neurons in one layer passing data to multiple neurons in the next layer, and so on. Eventually they reach the output layer, where the network presents its best guesses to solve a problem, identify an object, and so on. Potential use cases span industries:
  • In life sciences and health care, they can be used to analyze medical images to speed up diagnostic processes and for drug discovery, Schatsky says.
  • In telecom and media, neural networks can be used for language translations, fraud detection, and virtual assistant services.
  • In financial services, they can be used for fraud detection, portfolio management, and risk analysis.
  • In retail, they can be used to eliminate checkout lines and personalize customer experience.

Decision trees

A decision tree algorithm aims to classify items by identifying questions about their attributes that will help decide in which class to place them. Each node in the tree is a question, with branches leading to more questions about the items, and the leaves being the final classifications.

Use cases for decision trees include building knowledge management platforms for customer service, pricing predictions, and product planning.

An insurance company might use a decision tree when it requires insights into what type of insurance products and premium adjustments are needed based on potential risk, says Ray Johnson, chief data scientist at business and technology consulting firm SPR. Using location data overlaid with weather-related loss data, it can create risk categories based on submitted claims and expenditure amounts. Then it can evaluate new applications for coverage against the models to provide a risk category and the potential financial impact, he says.
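
A brief scikit-learn sketch of that question-at-each-node structure, trained on invented risk data (the features, labels, and thresholds are illustrative, not SPR’s model); export_text prints the questions the tree learned to ask:

    # Sketch: decision tree on invented insurance-risk data (illustrative).
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(7)
    # Invented features: [annual_claims, storm_exposure]
    X = rng.random((300, 2)) * [5, 10]
    y = (X[:, 0] * 2 + X[:, 1] > 10).astype(int)   # 1 = high risk

    tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

    # Each internal node is literally a question about an attribute.
    print(export_text(tree, feature_names=["annual_claims", "storm_exposure"]))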

Random forests

While a single decision tree must be trained to provide accurate results, the random forest algorithm takes an ensemble of randomly created decision trees that base their decisions on different sets of attributes, and lets them vote on the most popular class.

Random forests are versatile tools for finding relationships in data sets and are quick to train, Epstein says. For example, unsolicited bulk email has long been a problem, not just for users but also for the internet service providers that have to manage the increased load on servers. In response, automated methods for filtering spam from normal email have been developed, using random forests to quickly and accurately identify unwanted email, she says.

Other uses for random forests include identifying a disease by analyzing a patient’s medical records, detecting fraud in banking, predicting call volume in call centers, and forecasting profits or losses from the purchase of a particular stock.
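
A short sketch of the ensemble idea on stand-in spam features (the feature set and labels are invented); each randomly grown tree votes, and the forest reports the majority as a probability:

    # Sketch: random forest as an ensemble of voting trees (invented
    # features: [num_links, exclamation_marks, sender_reputation]).
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(3)
    X = rng.random((400, 3)) * [10, 8, 1]
    y = ((X[:, 0] > 6) & (X[:, 2] < 0.3)).astype(int)  # synthetic "spam" label

    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    msg = [[9.0, 5.0, 0.1]]  # many links, low sender reputation
    print("share of trees voting spam:", forest.predict_proba(msg)[0, 1])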

Clustering

Clustering algorithms use techniques such as K-means, mean-shift, or expectation-maximization to group data points based on shared or similar characteristics. This is an unsupervised learning technique that can be applied to classification problems.

The clustering technique is particularly useful when you need to segment or categorize, Schatsky says. Examples include segmenting customers by distinct characteristics to better target marketing campaigns, recommending news articles to certain readers, and effective police enforcement.

Clustering is also effective for discovering groupings in complex data sets that may not be obvious to the human eye. Examples range from categorizing similar documents in a database to identifying crime hot spots from crime reports, Epstein says.
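
A compact sketch of K-means grouping unlabeled points purely by their statistical properties (synthetic blobs; the choice of three clusters is illustrative):

    # Sketch: unsupervised K-means clustering on synthetic data.
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

    # Each point is assigned to the nearest learned cluster center.
    print(kmeans.cluster_centers_.round(2))
    print("first ten assignments:", kmeans.labels_[:10])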

Association rule learning

Association rule learning is an unsupervised technique used in recommendation engines that looks for relationships between variables.

This is the technique behind the “people who bought X also bought Y” suggestions on many e-commerce sites, and examples of how it is being used are common.

A specific use case might be a specialty food retailer that wants to drive additional sales, Johnson says. It would use this technique to examine customer buying behavior and offer special tins and bundles of products celebrating events, sports teams, and so on. The association rules technique provides insights that can uncover when and where customers bought the preferred combination of products.
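
A small hand-rolled sketch of the counting behind such rules (the baskets are invented): for each ordered pair of products it computes the confidence that a basket containing X also contains Y, which is exactly the “people who bought X also bought Y” signal:

    # Sketch: association-rule counting over invented baskets.
    # confidence(X -> Y) = count(X and Y) / count(X)
    # support(X -> Y)    = count(X and Y) / total baskets
    from itertools import permutations

    baskets = [
        {"chips", "salsa", "soda"},
        {"chips", "salsa"},
        {"chips", "soda"},
        {"team-tin", "chips", "salsa"},
        {"team-tin", "soda"},
    ]

    items = set().union(*baskets)
    for x, y in permutations(items, 2):
        both = sum(1 for b in baskets if x in b and y in b)
        has_x = sum(1 for b in baskets if x in b)
        if has_x and both / has_x >= 0.6:  # keep high-confidence rules
            print(f"{x} -> {y}: confidence {both / has_x:.2f}, "
                  f"support {both / len(baskets):.2f}")
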
https://www.infoworld.com

Breakthrough: Python reaches Tiobe index Top 3

As expected, Python has climbed into the Top 3 of the Tiobe index of language popularity, achieving that milestone for the first time ever in the September 2018 edition of the index.

With a rating of 7.653 percent, Python placed third behind first-place Java, which had a rating of 17.436 percent, and second-place C, rated at 15.447. Python displaced C++, which finished third last month and took fourth place this month, with a rating of 7.394 percent.

Python is becoming increasingly ubiquitous, having conquered universities and the industrial world, Tiobe said, because it is easy to learn and deploy.

The monthly Tiobe index is based on a formula involving searches on languages in popular search engines such as Google, Bing, Yahoo, and Wikipedia. Tiobe provides software quality services.

Python also has been scoring high in two other language rankings:

  • The PyPL Popularity of Programming Language index, where it ranked No. 1  this month, as it has done before, and has had the most growth in the past five years.
  • The RedMonk Programming Language Rankings, where Python again placed third.

PyPL’s ratings are based on an assessment of searches on language tutorials in Google. Analyst firm RedMonk bases its ratings on an assessment of code in GitHub and discussions in Stack Overflow.

The Tiobe Top 10
The top 10 languages in Tiobe’s September index were:


  1. Java, with a rating of 17.436 percent
  2. C, at 15.447 percent
  3. Python, at 7.653 percent
  4. C++, at 7.394 percent
  5. Visual Basic .Net, at 5.308 percent
  6. C#, at 3.295 percent
  7. PHP, at 2.775 percent
  8. JavaScript, at 2.131 percent
  9. SQL, at 2.062 percent
  10. Objective-C, at 1.509 percent

The PyPL Top 10

PyPL’s top 10 for September were:
  1. Python, with a share of 24.58 percent
  2. Java, at 22.14 percent
  3. JavaScript, at 8.41 percent
  4. PHP, at 7.77 percent
  5. C#, at 7.74 percent
  6. C/C++, at 6.22 percent
  7. R, at 4.04 percent
  8. Objective-C, at 3.33 percent
  9. Swift, at 2.65 percent
  10. Matlab, at 2.1 percent

The RedMonk Top 10

The most-recent RedMonk rankings, labeled as the June 2018 rankings but actually released in August, were:
  1. JavaScript
  2. Java
  3. Python
  4. PHP
  5. C#
  6. C++
  7. CSS
  8. Ruby
  9. (tie) C, Objective-C
https://www.infoworld.com