Tuesday, 25 February 2020

Hybrid-cloud management requires new tools, skills

Hybrid cloud environments can deliver an array of benefits, but in many enterprises, they're becoming increasingly complex and difficult to manage. To cope, adopters typically turn to some type of management software. What soon becomes apparent, however, is that hybrid cloud management tools can be as complex and confounding as the environments they're designed to support.

A hybrid cloud typically includes a mix of computing, storage and other services. The environment is formed by a combination of on-premises infrastructure resources, private cloud services, and one or more public cloud offerings, such as Amazon Web Services (AWS) or Microsoft Azure, as well as orchestration among the various platforms.

Any organization contemplating a hybrid cloud deployment should begin building a transition framework at the earliest possible stage. "The biggest decision is what data and which applications should be on-premises due to the sensitivity of data, and what goes into the cloud," says Umesh Padval, a partner at venture capital firm Thomvest Ventures.

Numerous other issues also need to be sorted out at the start, including the ultimate destination of lower-priority, yet still critical, data and applications. Will they be kept on premises forever or migrated at some point into the cloud? With applications and data scattered, security is another major concern. Operational factors and costs also need to be addressed at the very beginning. "Your email application may run great in your data center, but may operate differently in the cloud," Padval notes.

Hybrid cloud tools immature yet evolving
A complex hybrid cloud requires constant oversight as well as a way to intuitively and effectively manage an array of operations, including network performance, workload management, security and cost control. Not surprisingly, given the large number of management tasks needed to run an efficient and reliable hybrid cloud environment, adopters can select from a rapidly growing array of management tools.

"There’s a dizzying array of options from vendors, and it can be difficult to sort through them all," says R. Leigh Henning, principal network architect for data center operator Markley Group. "Vendors don’t always do the best job at making their differentiators clear, and a lot of time and effort is wasted as a result of this confusion. Companies are getting bogged down in an opaque field of choices."

The current hybrid cloud management market is both immature and evolving, declares Paul Miller, vice president of hybrid cloud at Hewlett Packard Enterprise. Vendors are still getting a handle on the types of management tools their customers need. "Offerings are limited and may not be supported across all public, on-premises and edges," Miller adds.

Perhaps the biggest challenge to hybrid cloud management is that the technology adds new, complex and frequently discordant layers to operations management. "Many solutions have compatibility restrictions on the components they can manage, locking your management platform into a vendor or group of vendors, which may or may not align with your current or future system architecture," warns George Burns III, senior consultant of cloud operations for IT professional services firm SPR.

A lack of standardized APIs, which in turn results in a shortage of standardized management tools, presents another adoption challenge. "The lack of standardized tools increases operational complexity through the creation of multiple incongruent tools; this leads to vendor lock-in and, in some cases, gross inefficiencies in terms of resource utilization," explains Vipin Jain, CTO of Pensando, a software-defined services platform developer. "To make it worse, these kinds of problems are typically 'solved' by adding another layer of software, which further increases complexity, reduces debuggability, and results in suboptimal use of features and resources."

Meanwhile, using standardized open-source tools can be an effective starting point to safeguard against compatibility issues. "Cloud Native Computing Foundation (CNCF) tools, such as Kubernetes and Prometheus, are good examples," Jain says. "Open-source tools from HashiCorp, such as Vault, Vagrant, Packer, and Terraform, [provide] a good normalization layer for multi-cloud and hybrid cloud deployments, but they are by no means sufficient," he notes. Ideally, the leading public cloud vendors would all agree on a standardized set of APIs that the rest of the industry could then follow. "Standardization can be a moving target, but it's critical from an efficiency and customer satisfaction perspective," Jain says.

Developers writing API configurations, as well as developers using API configurations, form a symbiotic relationship that should be mutually maintained, Burns advises. "Hardware vendors need to be open about changes and enhancements coming to their products and how that will affect their APIs," he explains. "Equally, management platform developers need to be mindful of changes to hardware platform APIs, [and] regularly participate in testing releases and provide adequate feedback to the vendor about results and functionality."

Prioritize management requirements; expect gaps
Even when everything works right, there are often gaps remaining between intended and actual management functionality. "In an ideal world, developers would have the perfect lab environments that would allow them to successfully test each product implementation, allowing functionality to be seamless across upgrades," Burns observes. "Unfortunately, we can’t expect everything to function perfectly and cannot forgo [on-site] testing."

When selecting a hybrid cloud management platform, it's important to not only be aware of its documented limitations, but also to know that nothing is certain until it's tested in its user's own hybrid cloud environment, Burns advises. "Gaps will exist, but it's ultimately your responsibility to fully identify and verify those gaps in your own environment," he says.

Further muddling the situation is the fact that many management tool packages are designed to supply multiple functions, which can make product selection difficult and confusing. "To simplify, customers need to consider which features are most important to them based on their use cases and can show a quick return on investment, mapping to their specific cloud journey," Miller explains.

Real-world experience with hybrid cloud management
Despite management challenges, most hybrid cloud adopters find a way to get their environment to function effectively, reliably and securely.

Gavin Burris, senior project leader, research computing, at the Wharton School of the University of Pennsylvania, appreciates the flexibility a hybrid cloud provides. "We have a small cluster ... that's generally available to all the faculty and PhD students," he notes. The school's hybrid environment supports a fair share prioritization scheme, which ensures that all users have access to the resources they need to support their work. "When they need more, they're able to request their own dedicated job queue that's run in the cloud," he says.

Burris, who uses Univa management products, says that having a management tool that allows fast and easy changes is perfect for individuals who like to maintain firm control over their hybrid environment. "I like to do things with scripting and automation, so to be able to go in and write my own rules and policies and build my own cluster with these management tools is really what I’m looking for," he explains.

James McGibney, senior director of cybersecurity and compliance at Rosendin Electric, an electrical contractor headquartered in San Jose, Calif., relies on a hybrid cloud to support a variety of essential operations. "Approximately two years ago we embarked on our journey from an on-premises disaster recovery, quality assurance and production environment to a cloud migration encompassing hundreds of terabytes of data," he says. McGibney relies on a management console provided by AWS and VMware. The tool meets his current needs, but like many hybrid cloud administrators, he's keeping a close eye on industry developments. "We're currently investigating [other] options, just to see what’s out there," he says. Yet he doesn't expect to make any changes in the short term. "We're happy with the tools currently provided by AWS and VMware."

Sharpen network skills for hybrid cloud
Selecting a hybrid cloud management platform is not as simple as purchasing software and spinning up some VMs to run it. "During implementation, ensure that you have selected the proper product owners and engineers, and then determine what, if any, additional education or credentials they will need to effectively deploy and maintain the platform," Burns suggests. "Fully define your architecture, ensure buy-in from your staff, work with them to identify education gaps and create a solid operational plan for going forward."

Most hybrid cloud management tasks focus on configuration and access control operations, which tend to be both complex and challenging to implement. "At the same time, the beauty of the cloud is its ability to automate," says Mike Lamberg, vice president and CISO at ION Group and its Openlink unit, which provides risk management, operations and finance software. Yet deploying a high level of automation also requires new skills and developers who can expertly handle the demands of virtual software-defined infrastructures as well as traditional environments. "We can’t assume that because teams can build applications in physical data centers that these skills will translate as they move to the cloud; new skills are required for success," Lamberg notes.

Hybrid cloud management requires a new team mindset. "IT networking staff literally need to unlearn what they know about physical networks and connectivity and recognize that the moving of packets and data is now handled by a forwarding software configuration, not by physical routers or switches," Lamberg says. "You can’t take what you did in building and supporting physical data centers and just apply it to the cloud—it simply doesn’t work."

In the big picture, transitioning to a hybrid cloud environment can solve many problems, yet it can also create some new obstacles if not properly implemented and managed. "Don't rush into any decision without considering all the points of impact that you can identify," Burns advises. "Make sure that you understand the breadth of a hybrid infrastructure and how it will be used to address business needs."

Wednesday, 12 February 2020

How to query and extract data from SaaS applications

Behind every SaaS application are databases storing business information about employees, suppliers, customers, and other partners. SaaS applications support workflows such as CRM for sales and marketing, cloud ERPs for financials, workforce management for human resource functions, and other enterprise and departmental services. Today, many businesses use a wide range of SaaS applications—from mainstream products such as Salesforce, Slack, Workday, and Atlassian, to many smaller SaaS tools.

SaaS applications shouldn’t operate in silos, and most organizations need to integrate capabilities across them and with other enterprise applications managed in private or public clouds.

If a workflow across multiple applications requires application integration, then development teams can leverage a SaaS platform’s APIs to trigger events from one platform to another. Enterprise integration platforms such as Boomi, SnapLogic, or MuleSoft are options when many applications and services need integration. If lighter-weight integrations that follow the If This, Then That pattern are required, then an IFTTT platform may provide sufficient integration. Development teams should also explore low-code platforms such as Appian, OutSystems, and PowWow if they are developing new applications that connect to multiple SaaS and enterprise workflows.
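To make the If This, Then That pattern concrete, here is a minimal sketch of a trigger-and-action dispatcher. It is illustrative only: the `Rule` class, the `dispatch` function, and the event fields (`type`, `amount`, `account`) are hypothetical names, not part of any platform mentioned above. A real integration would receive events from a SaaS webhook or API and call another platform's API in the action.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Rule:
    """One 'if this, then that' rule (hypothetical structure)."""
    event_type: str                      # the trigger: which event to react to
    condition: Callable[[dict], bool]    # extra filter on the event payload
    action: Callable[[dict], None]       # what to do when the rule matches


def dispatch(event: dict, rules: Iterable[Rule]) -> int:
    """Run every rule whose trigger and condition match; return how many fired."""
    fired = 0
    for rule in rules:
        if rule.event_type == event["type"] and rule.condition(event):
            rule.action(event)
            fired += 1
    return fired
```

A rule such as "when a deal of 10,000 or more closes in the CRM, post a message to the team chat" would then be a single `Rule` whose action calls the chat platform.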

Leveraging SaaS data for different business needs

What if you need to integrate the data from a SaaS platform with other data sources? There are a few reasons why data integration across SaaS tools may be required:

  • Business analysts want to develop reports and dashboards using this data.
  • Data science teams want the data for machine learning experiments.
  • Business teams want to centralize the data to support workflows and other types of applications. For example, marketing teams often use customer data platforms or master data platforms to centralize data on customers, products, and other business entities.
  • IT teams may need to extract the data for backups or to enable transitioning data to other platforms.
  • Legal teams sometimes need to perform legal discovery on the underlying data.
  • Data stewards often want to cleanse, transform, or enrich the underlying data.

Sure, you can leverage the SaaS platforms’ APIs to extract data, but this may require a significant development effort to learn the APIs, understand the SaaS platform’s data model, create data stores for any new data, write the code to load the data, and develop the logic for any transformations. In addition, IT teams have to define cloud or data center infrastructure to host this application or service. Lastly, ongoing support is required for any data integrations designed to run on a schedule or on demand. Developing the integration from scratch may be expensive for development teams and IT organizations with other, more strategic priorities.
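The sketch below gives a rough sense of the pieces such a from-scratch integration involves: paginated extraction, a transformation step, and a load into a data store. Everything in it is an assumption made for illustration. `FAKE_PAGES` and `fetch_page` stand in for a real, authenticated SaaS API, the `accounts` table is hypothetical, and SQLite stands in for whatever database the team would actually provision.

```python
import sqlite3
from typing import Callable, Iterator, Optional

# Hypothetical stand-in for a paginated SaaS API response, keyed by page cursor.
FAKE_PAGES = {
    None: {"records": [{"id": 1, "name": " Acme "}, {"id": 2, "name": "Globex"}], "next": "p2"},
    "p2": {"records": [{"id": 3, "name": "initech"}], "next": None},
}


def fetch_page(cursor: Optional[str]) -> dict:
    """Return one page of records; a real version would make an HTTP request."""
    return FAKE_PAGES[cursor]


def extract(fetch: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Follow the 'next' cursor until the API reports no more pages."""
    cursor = None
    while True:
        page = fetch(cursor)
        yield from page["records"]
        cursor = page["next"]
        if cursor is None:
            break


def transform(record: dict) -> tuple:
    """Example cleanup: trim whitespace and normalize capitalization."""
    return (record["id"], record["name"].strip().title())


def load(conn: sqlite3.Connection, rows: list) -> None:
    """Create the target table if needed and write the rows."""
    conn.execute("CREATE TABLE IF NOT EXISTS accounts (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT OR REPLACE INTO accounts (id, name) VALUES (?, ?)", rows)
    conn.commit()


def run_pipeline(conn: sqlite3.Connection, fetch=fetch_page) -> int:
    rows = [transform(r) for r in extract(fetch)]
    load(conn, rows)
    return len(rows)
```

Even this toy version leaves out authentication, error handling, scheduling, and monitoring, which is where much of the ongoing support effort tends to go.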

Another approach is to use data integration, data streaming, ETL (extraction, transformation, and loading), or other data prep platforms. Using a data integration platform may be the optimal method when working with large volumes of data that frequently change, since these platforms enable flexible extraction and transformation. However, they also require upfront development for the integration before end-users can access and use the information.


Lighter-weight means of querying and managing SaaS data may be desirable. Sometimes, these are useful to experiment, discover, and prototype rapidly. Other times these approaches can easily be used for operational or production needs, especially when data volumes are low and query throughput isn’t significant. Here are three options.

1. BI platforms that directly query SaaS applications

If your primary requirement is reporting, then many self-service BI and data visualization platforms have direct connectors to the more popular SaaS applications.
  • Tableau can connect to platforms such as Intuit QuickBooks, Google Analytics, LinkedIn Sales Navigator, ServiceNow, Eloqua, Marketo, and Salesforce.
  • Microsoft Power BI also integrates with online services such as Adobe Analytics, Facebook, GitHub, MailChimp, Stripe, Quick Base, and Zendesk.
  • Domo claims to have more than a thousand connectors, including platforms such as HubSpot, Jira, Instagram, Qualtrics, Shopify, SurveyMonkey, Twitter, and Workday.
At a minimum, these integrations provide an easy way to query and discover the underlying SaaS data sources. At best, the out-of-the-box integration is sufficient for end-users to create the required data blending, reports, and dashboards.

There are some considerations.
  • These platforms enable joins and data blends when columns have matching keys. They become harder to use if significant data transformation is required before integrating the data source or blending it with other data sources.
  • Review whether SaaS data integrations are performed with real-time queries, or whether the data is extracted or cached.
  • Performance may be a factor if the SaaS application contains large data volumes, if there are complex joins with many other data sources, or if dashboards will be utilized concurrently by many users.
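The first consideration, matching keys, is easy to underestimate. The sketch below blends two hypothetical data sets, CRM contacts and support tickets, on an email address that is formatted differently in each source. The field names (`email`, `requester`, `open_tickets`) and both functions are assumptions for illustration; the point is only that a small normalization step is needed before the join works.

```python
def normalize_key(email: str) -> str:
    """Make keys comparable across sources: trim whitespace, ignore case."""
    return email.strip().lower()


def blend(crm_rows: list, support_rows: list) -> list:
    """Left-join CRM rows to a per-contact count of open support tickets."""
    tickets_by_email = {}
    for row in support_rows:
        key = normalize_key(row["requester"])
        tickets_by_email[key] = tickets_by_email.get(key, 0) + row["open_tickets"]
    return [
        {**row, "open_tickets": tickets_by_email.get(normalize_key(row["email"]), 0)}
        for row in crm_rows
    ]
```

Without `normalize_key`, "Ann@Example.com" in one system and " ann@example.com " in the other would not match, and the blended report would silently show zero tickets for that contact.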
2. Platforms that emulate ODBC, JDBC, OData, or other drivers

If the business needs to go beyond reporting and dashboarding, and a lightweight integration approach is still desirable, then some commercial tools convert SaaS APIs into standard database drivers such as ODBC, JDBC, or OData. Two options for drivers to common SaaS platforms are Progress DataDirect and CData Driver Technologies.

The driver method may be most useful to data science teams who want to perform ad hoc queries into SaaS databases before pulling the data into their analysis. It’s also a good option for application developers who require real-time querying of SaaS application data.
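As a rough illustration of why the driver approach appeals to these teams: once a driver presents SaaS objects as tables, the query code is ordinary SQL issued through a standard database interface. The sketch below uses Python's DB-API with an in-memory SQLite database as a stand-in, because a real driver needs a license and a vendor-specific connection string. The `Opportunity` table, its columns, and the sample rows are hypothetical. With an ODBC driver, the assumption is that `open_connection` would instead return a connection from an ODBC library such as pyodbc, using the vendor's documented connection string.

```python
import sqlite3


def open_connection() -> sqlite3.Connection:
    """Stand-in connection seeded with a hypothetical 'Opportunity' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Opportunity (Id TEXT, StageName TEXT, Amount REAL)")
    conn.executemany(
        "INSERT INTO Opportunity VALUES (?, ?, ?)",
        [
            ("001", "Closed Won", 5000.0),
            ("002", "Prospecting", 1200.0),
            ("003", "Closed Won", 800.0),
            ("004", "Prospecting", 3000.0),
        ],
    )
    return conn


def pipeline_by_stage(conn, min_amount: float) -> list:
    """Ad hoc query: total amount per stage for deals at or above min_amount."""
    cur = conn.cursor()
    cur.execute(
        "SELECT StageName, SUM(Amount) FROM Opportunity "
        "WHERE Amount >= ? GROUP BY StageName ORDER BY StageName",
        (min_amount,),
    )
    return cur.fetchall()
```

Which SQL features a given driver supports, and how it maps them onto API calls, varies by vendor and is worth checking before relying on complex queries.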

Development and data science teams should investigate the performance of this integration, especially if high query volumes, large data sets, or low latency is required. Also, many SaaS applications throttle or charge customers based on API usage, so this may be a factor if higher query or data volumes are needed.
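One common way to cope with throttling is to retry with exponential backoff when the API signals that a rate limit was hit. The following is a minimal sketch of that technique, not any vendor's recommended client. `RateLimitError` is a hypothetical exception; a real client would typically raise it on an HTTP 429 response, and some APIs also return a header saying how long to wait, which should take precedence where available.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised when the SaaS API rejects a call as throttled."""


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), doubling the wait after each throttled attempt."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

Backoff keeps an integration polite, but it does not reduce the number of billable calls, so caching results or batching requests may still be needed when usage is metered.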

3. Lightweight ETL platforms that sync SaaS data to cloud databases

One final idea is to implement a data integration out of the SaaS application into a cloud database that your organization sets up and manages. This strategy adds some operational complexity and costs, and it may not be ideal if real-time querying of the SaaS application data is required. But it does have several advantages:
  • It provides more control over the database platform and data architecture that business users, data scientists (including citizen data scientists), and application developers utilize. The platform and architecture should meet the volume, performance, and latency requirements.
  • Storing the data independent of the SaaS database provides greater flexibility to transform, join, cleanse, cube, or aggregate data as required by downstream users and applications.
  • If data security, data privacy, or other data governance controls for querying this data are different from the access and entitlement controls available in the SaaS applications, then hosting the data in a separate database may be required.
  • Hosting the data independent of the SaaS platform may be more cost-effective for higher data and query volume needs.
Although you could implement this integration with data integration or data prep platforms, there are SaaS data integration platforms with out-of-the-box connectors to many SaaS applications. Stitch, a Talend company, is a plug-and-play solution if your objective is to stream data from SaaS applications to cloud databases. You can select what data to replicate and the replication frequency, but it does not provide any tools for transforming or filtering the data. Skyvia offers a similar product, and both have free tiers to let development teams try out integrations. Alooma, part of Google Cloud, focuses on moving data into big data platforms such as Google BigQuery, Amazon Redshift, and Snowflake, and provides some data transformation capabilities.
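To show what replicating SaaS data on a schedule involves conceptually, here is a minimal sketch of incremental replication using a high-water mark. It is not how any of the products above are implemented. The `contacts` table, the `updated_at` field, and the `sync` function are hypothetical, and SQLite stands in for the cloud database.

```python
import sqlite3


def sync(conn: sqlite3.Connection, source_rows: list) -> int:
    """Copy rows changed since the last sync into the target table.

    Assumes every source row carries an ISO 8601 'updated_at' string in one
    consistent format and time zone, so plain string comparison orders them.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS contacts "
        "(id INTEGER PRIMARY KEY, email TEXT, updated_at TEXT)"
    )
    # High-water mark: the newest change already replicated ('' on first run).
    (watermark,) = conn.execute(
        "SELECT COALESCE(MAX(updated_at), '') FROM contacts"
    ).fetchone()
    # Strict '>' can miss rows that share the watermark's exact timestamp;
    # a production sync would need to handle that edge case.
    changed = [row for row in source_rows if row["updated_at"] > watermark]
    conn.executemany(
        "INSERT OR REPLACE INTO contacts (id, email, updated_at) "
        "VALUES (:id, :email, :updated_at)",
        changed,
    )
    conn.commit()
    return len(changed)
```

Running `sync` on a timer gives the replication frequency; the trade-off is that downstream queries see data only as fresh as the last run, which is why this option is a weaker fit when real-time querying is required.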

If your organization is utilizing many SaaS platforms, then a one-size-fits-all strategy may not work. Each integration path supports different SaaS integrations, and the type of integration must align with anticipated business needs. Reviewing the tools and considering multiple options is a best practice, especially when data integration needs vary.