December 6, 2018

So, WTF is Artificial Intelligence Anyway?


Image By Seanbatty (Pixabay)

According to Encyclopedia Britannica, artificial intelligence (AI) can be defined as:

"The ability of a digital computer or computer-controlled robot to perform tasks commonly associated with intelligent beings. The term is frequently applied to the project of developing systems endowed with the intellectual processes characteristic of humans, like the ability to reason, discover meaning, generalize, or learn from previous experiences."

By now, we have all heard about how AI can make it possible for computers, machines and other electronic devices to perform increasingly complex and human-like tasks.

While all this sounds almost like magic, with machines performing increasingly complex tasks, from gaming computers to self-driving cars, in reality most AI technologies rely on a blend of software methods and technologies that collect, process, and recognize patterns within large amounts of data.

So, How Does AI Work?

AI development began as an effort to create systems with human-like intelligence capabilities, pursuing two main goals:

  • The Creation of Expert Systems − Systems that exhibit intelligent behavior: they learn, demonstrate, explain, and advise their users.
  • The Implementation of Artificial “Human Intelligence” in Machines − Creating systems that understand, think, learn, and behave like humans.

By providing these devices with key abilities, like learning from experience and adjusting to the type of input received, software providers enable them to change and adapt, producing insights by detecting patterns and variations in the data.

Through its evolution, AI has continuously incorporated technological contributions from many sciences and disciplines, ranging from mathematics to biology and computer science, and has evolved in parallel with many sub-field disciplines or subareas of AI (Figure 1).

Some of these subareas include:

  • Machine learning (ML). Basically, ML uses methods like statistics, neural networks, operations research and others to automate the analytical model building process that makes it possible to find hidden patterns and insights using large data sets. You can check WTF is ML here.
  • Neural networks. A neural network is a specific type of machine learning method built of interconnected units (network) able to iteratively process data by responding to external inputs and relaying data between each unit. The process requires multiple runs over the data set to find connections and derive meaning from undefined data.
  • Deep learning. A special case of ML that applies neural networks composed of many layers of processing units. Deep learning has taken advantage of continuous advances in computing power, as well as new training techniques, to “learn” complex patterns within large data sets. Image and speech recognition are among its most prominent applications. Check WTF is a Neural Network here.
  • Natural language processing (NLP), or the technology that gives computers the ability to “understand” and generate human language, written or in speech. Today, NLP includes human-computer interaction, in which both devices and humans communicate using normal, everyday language to interact.
  • Computer vision. Relying on some of the technologies mentioned above, especially pattern recognition and deep learning, computer vision aims to recognize what is in an existing image or video. By analyzing and understanding images, computers and devices can capture images or video in real time and interpret with accuracy what they contain.
  • Cognitive computing. A more recent addition to the AI field, cognitive computing also aims to provide information and insights that improve decision-making, while enabling natural interaction between computers, devices, and users. The main objective is to enable machines to mimic human processes and provide insights in a natural, human fashion (language, images, etc.).
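To make the machine learning idea above a bit more tangible, here is a minimal, hypothetical sketch of one classic ML method, k-nearest neighbors, which labels a new data point purely from patterns in example data (the data points and labels are invented for the illustration):

```python
from math import dist

# Toy training data: [height_cm, weight_kg] pairs with made-up labels.
training_data = [
    ([150, 50], "A"), ([155, 55], "A"), ([160, 58], "A"),
    ([180, 85], "B"), ([185, 90], "B"), ([178, 82], "B"),
]

def knn_classify(point, data, k=3):
    """Label a new point by majority vote among its k nearest neighbors."""
    neighbors = sorted(data, key=lambda item: dist(point, item[0]))[:k]
    labels = [label for _, label in neighbors]
    return max(set(labels), key=labels.count)

print(knn_classify([152, 53], training_data))  # lands in the "A" cluster
```

No explicit rule for "A" versus "B" was ever written; the classification emerges from the structure of the data itself, which is the essence of the ML subarea described above.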

Figure 1. Some of the many AI sub-fields

In essence, AI consists of developing algorithms and models that can ingest large amounts of data and, through an iterative process, progressively learn and adapt to improve the information they produce.

With each iteration, AI learns and acquires a new “skill” that improves the way it performs a classification or a prediction.

Today, AI and many of its sub-fields are especially suited to problems that require working with data that:
  • Comes in large amounts
  • Is not structured, well organized, or well formatted
  • Changes constantly
As AI finds structure and regularities in the data, the algorithm keeps improving and acquiring skill, executing until it can accurately classify or predict.
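The iterative improvement described above can be sketched with a toy example: a one-parameter model that repeatedly shrinks its prediction error until it can predict accurately (the data, model, and learning rate are illustrative choices, not from any specific product):

```python
# The model must discover the pattern y = 2x hidden in the data.
data = [(1, 2), (2, 4), (3, 6), (4, 8)]
w = 0.0            # the model's single learned "skill": one weight
learning_rate = 0.01

for iteration in range(200):
    # Gradient of the mean squared prediction error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= learning_rate * grad  # each iteration nudges w toward less error

print(round(w, 3))  # approaches 2.0 as the error shrinks
```

Each pass over the data is one "iteration" in the sense used above: the model is slightly less wrong afterward, until its classifications or predictions stabilize.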

A key aspect of AI models is that they adapt when given new data, which allows the model to adjust through training.

Traditional and AI-based programs differ in important ways: while traditional programs are coded using a set of precise instructions and rules to answer specific questions, AI programs are flexible enough to allow the answering of more generic questions.

According to Dr. Rajiv Desai, there are important differences between traditional and AI-based software solutions, including processing, the nature of the data input, and structure, among others (Figure 2):

Figure 2. Conventional programming vs AI programming (Credit: Dr. Rajiv Desai, An Educational Blog)

As opposed to conventional coding, where the code is the key guide to the process, in AI the data, rather than the algorithm, is the key value. Conventional software programs are provided with data and told how to solve a problem, while AI programs exploit inference capabilities to gain knowledge about a specific domain.
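As a hypothetical illustration of this contrast, compare a hand-coded rule with a "rule" inferred from labeled examples (the function names, data, and threshold logic are all invented for the sketch):

```python
# Conventional style: the rule (threshold) is written by the programmer.
def is_tall_rule(height_cm):
    return height_cm > 175  # fixed, hand-coded rule

# AI style: the "rule" is derived from labeled examples instead.
examples = [(160, False), (170, False), (180, True), (190, True)]

def learn_threshold(examples):
    """Place the decision boundary midway between the two labeled groups."""
    tall = [h for h, label in examples if label]
    short = [h for h, label in examples if not label]
    return (min(tall) + max(short)) / 2

threshold = learn_threshold(examples)  # inferred from the data, not coded

def is_tall_learned(height_cm):
    return height_cm > threshold
```

The two functions behave the same here, but only the second would adapt if the examples changed, which is the key value of data over code described above.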

The following table (Figure 3), also provided by Dr. Rajiv Desai, illustrates the main differences between programming with and without AI.

Figure 3. Programming with and without AI (Credit: Dr. Rajiv Desai, An Educational Blog)

Due to its modular nature, AI is, in many cases, incorporated within existing applications rather than sold as an individual solution, although a new generation of programming platforms now enables users and organizations to develop AI-based applications.

Good, But Still, What’s with All the Recent Hype Around AI?

While we can date AI’s initial development back to the 1940s, roughly in parallel with the evolution of computer systems themselves, it is only in recent years that AI has become almost omnipresent in every type of software system available. Why?

While traditional computer programs can today perform simple and increasingly complex tasks and data analysis, especially thanks to advances in processing speed, memory, and storage, new business models keep increasing the demand for systems that can provide better insights and even act or decide on them. Such is the case with new technologies like mobility, cloud computing, and the internet of things.

All of this is triggering the need for systems capable of analyzing, predicting, and autonomously improving, features that traditional systems don't have.

So, aside from the new AI-based applications that keep emerging, AI's modular nature means its set of methods and technologies can embed “intelligence” into existing software applications. Today, a myriad of computers and devices already on the market are being improved with new AI capabilities, which is why more and more applications are being infused with this pervasive technology.

So, today, a myriad of services, ranging from conversational platforms to bots and smart machines, are being applied to ever more software and products to improve their services at home, in the workplace, and even on the streets.

From Siri, added as a feature to all Apple products, to the brand-new autonomous database services offered by the Oracle DWH Automation service, many products are now poised to be infused with advanced AI capabilities.

How About the Potential Applications of AI?

As mentioned, software applications in all industries and business areas keep incorporating small and big pieces of AI functionality within their domains. A good sample of the many current uses of AI includes:

  • Cybersecurity. A growing number of organizations incorporate AI and ML algorithms to, for example, detect malware. ML algorithms and models can predict with increasing accuracy which files carry malware by looking at patterns within the file, or at how the data was accessed, that can signal its presence.
  • Fraud Detection. As AI and ML algorithms improve and become more efficient, so do the solutions to detect potential fraud. New systems now incorporate AI for spotting and predicting potential cases of fraud across diverse fields, including banking and online purchasing sites. Organizations use AI to continuously improve their fraud-spotting mechanisms by comparing millions of transactions and distinguishing between legitimate and fraudulent ones.
  • Health Care. New AI applications can provide personalized medicine and X-ray readings by analyzing images, while AI-based personal health care assistants can remind you to take your pills, exercise, or eat healthier, relying on the analysis of your personal health data.
  • Manufacturing. As data streams in from connected equipment, AI-based software can analyze it to forecast expected load and demand, or predict maintenance cycles, by using specific types of deep learning networks that work with sequence data.
  • Retail. AI can now provide retailers with virtual shopping capabilities and offer users personalized services and recommendations, while also enabling more efficient stock management and site layout through improved analysis and insight.
  • Sports. New AI-based solutions in sports can capture images of game plays and provide coaches with reports that help them improve game tactics and strategy.
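Staying with the fraud-detection example, the core idea of flagging transactions that deviate from an established pattern can be sketched with a simple statistical outlier test. This is a toy stand-in for the far richer models real systems use, and the amounts are invented:

```python
from statistics import mean, stdev

# Made-up history of a customer's typical transaction amounts.
history = [12.5, 9.9, 14.2, 11.0, 13.3, 10.7, 12.1, 9.5]

def is_suspicious(amount, history, z_cutoff=3.0):
    """Flag an amount more than z_cutoff standard deviations from the mean."""
    mu, sigma = mean(history), stdev(history)
    return abs(amount - mu) / sigma > z_cutoff

print(is_suspicious(11.8, history))   # a typical amount: False
print(is_suspicious(500.0, history))  # an extreme outlier: True
```

Production fraud systems learn far subtler patterns (merchant, location, timing) across millions of transactions, but the principle is the same: legitimate behavior defines a pattern, and deviations from it raise a flag.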

As we can see from the samples above, there are several cases where AI can be effectively applied for process improvement, efficient analysis, and better decision making.

How About the Software Available and its Adoption in an Organization?

Although AI sounds like a complicated and, worst of all, expensive technology to adopt, it has currently become accessible to almost any type of organization.

AI is now embedded in so many software solutions, and for such a wide range of business uses, that organizations of all sizes can adopt it in some form. It wouldn't even be surprising if you are already using an AI-enabled solution without being aware of it.

So, where should we start using AI within our organization? Well, this will depend on the organization's budget, the complexity of its use case(s), and its existing AI expertise, which together define what type of AI and, consequently, what type of provider and vendor we should pick.

A good starting point is to classify the companies offering AI solutions, to understand in general the varied types of AI companies and how they could potentially help us adopt some form of AI within our organization.
In her blog post The 3 major categories of AI companies, Catherine Lu makes an interesting classification of AI companies, dividing them into three main categories:

  • Data science consulting firms: low productization
“Data science consulting firms are defined by their low level of productization. Their main advantage is that it’s easier for them to deliver great results, as AI models require customization and are highly dependent on customer data. Their disadvantage is that they cannot scale quickly. For companies that are expected to be high growth, they will need to figure out how to move out of this category.”

  • AI platform companies: high productization targeting many use cases
“AI platform companies offer to be the underlying infrastructure on top of which specific AI solutions live. They can allow end users to import data, perform data wrangling and transformations, train models, and perform model validation.”
This includes platforms such as Databricks.

  • Vertical AI companies: high productization targeting few use cases
“Vertical AI companies solve a particular business problem or set of problems with a productized solution. They enable their enterprise customers to achieve additional lift from AI without needing to build or maintain models in-house. Examples on this end are more numerous.”
This includes companies like DigitalGenius (customer support), Entelo (recruiting), Cylance (cybersecurity), or DataVisor (fraud detection).

On a brief note: while Ms. Lu believes vertical AI companies will be the ones to succeed, thanks to their ability to provide productized solutions that scale, the recent development and evolution of low-code technologies makes me think a bit differently. These technologies are enabling a larger number of organizations, instead of adopting vertical solutions, to acquire AI development platforms with lower learning curves, and consequently to produce custom solutions with less effort yet more customized capabilities.

Examples? Some include IBM (Watson), Amazon and Microsoft.

So… What’s in it for Me and my Organization?

Well, in short, AI can offer effective ways to achieve improvement on different fronts, including business operations, analytics efficiency, and decision making.
In a wider view, the benefits of AI adoption come in different forms: properly deployed AI solutions can allow organizations to streamline and improve operations via automation and adaptation, while also improving analysis processes to increase accuracy and the chances of successful decisions.

Whether your organization decides to go easy and adopt a proven vertical AI solution or to jump directly into developing AI solutions in-house, as more and more software providers keep infusing AI into their offerings, it is only natural to expect that AI will keep evolving and, as it does, improving the way many software solutions work.

So, while science fiction novels and movies portray AI as machines and robots that can and eventually will rule the world, in reality AI has so far been more about enhancing than replacing what humans can do. Or can't it?


August 30, 2018

The BBBT Sessions: WhereScape

Originally founded in 1997 in Auckland, NZ, as a data warehouse consulting company, WhereScape has evolved to become a solution provider and, especially in the last five years, a key player in the data management market, particularly in the data warehousing and big data spaces.

During a great session with the BBBT, WhereScape showed their “latest and greatest” news and innovations, triggering meaningful discussions and interactions with the analysts of the BBBT.

Here, a summary and commentary of that cool session.

WhereScape at a glance

As mentioned before, through an evolution process that spans more than 20 years, WhereScape has become a provider of data infrastructure and automation solutions. It currently offers three main solutions:
  • WhereScape 3D. A solution to assist in planning, modeling and designing data infrastructure projects as well as enabling rapid prototyping.
  • WhereScape RED. A solution to enable fast-track development, deployment and operation of data infrastructure projects and reduce delivery times, effort, cost and risk of new projects.
  • WhereScape Data Vault Express. A solution especially designed to automate the entire life cycle of project delivery based on Data Vault 2.0, a modern database modeling method.
Along with its vast experience in the consulting field and, later, its software solutions, WhereScape has become a major player navigating the frontline of the data warehouse and data infrastructure automation market.

It is in this context of “automation” that the BBBT session with WhereScape was focused, centering on the software company's advances within its data infrastructure software.

Today, and especially over the last couple of years, it comes as no surprise that automation is becoming increasingly important for almost all major software technology footprints, including the data warehousing and data management markets.

Why? Well, because, as Neil Barton, WhereScape's chief technology officer, explained neatly during the briefing, automation of a data infrastructure can provide great value to an organization across many aspects of its operation, from cost, development time, and risk reduction to enabling IT to keep pace with business needs.
This is especially true given that data warehousing has traditionally been done in a complicated and cumbersome way (Figure 1).

Figure 1.  The traditional data warehousing development/maintenance cycle (Image courtesy of WhereScape)

Consequently, WhereScape aims to automate and simplify most, if not all, of the data warehousing process by incorporating not just plain task automation but also the encapsulation of methodologies, best practices, and industry standards, to enhance simplification, reduce delivery time, and still ensure compliance with internal and external data management regulations.

One aspect I think is unique to WhereScape is the company's “holistic” view of automation, in which the whole data cycle is considered via a metadata-driven approach. This makes it possible for WhereScape to easily provide documentation and lineage, to manage the full cycle within a single solution (Figure 2), and, moreover, to integrate its own set of solutions.

Figure 2. Simplification and automation with WhereScape (Image courtesy of WhereScape)

Interestingly, while not trying to “reinvent the wheel”, and remaining loyal to a well-proven life cycle (Figure 3), WhereScape has made this life cycle intelligent and automated, aiming to improve all of its phases, from discovery through the rest of its stages, to reduce time and effort and increase efficiency.

By being metadata-driven and having an integrated structure across all its components, WhereScape incorporates automation into the entire life cycle process and enables smoother documentation.

This is especially true for the design and operation processes, normally complex and tedious stages. For the former, WhereScape introduces an automated model generation engine that speeds up the common iterative design cycle; for the latter, solid dependency management plus integrated scheduling and logging ensure efficient auditing and operation analysis.

Figure 3. WhereScape’s automation life cycle (Image courtesy of WhereScape)

In addition to showing efficient automation control and management, WhereScape also shows flexibility by offering support for a range of project types, including:
  • Data Warehouses
  • Data Marts
  • Data Vaults
  • Cloud deployments and
  • Big data support
This makes WhereScape count as an important emerging player in the data management automation market.

WhereScape at the BBBT

After a summary of what WhereScape is all about, allow me to highlight some of the most relevant aspects of what was a great presentation conducted by Neil Barton, WhereScape’s Chief Technology Officer (CTO).

The briefing gave us good insight into what WhereScape has been working on in recent months and what it is now planning for the future, including:
  • Increasing data platform coverage. WhereScape has been working to expand its set of “data infrastructure automation” solutions to a wider number of data platforms, which now includes:
    • Amazon Redshift
    • Microsoft Azure SQL Data Warehouse
    • Snowflake
    • EXASOL
    • SAP HANA
    • PostgreSQL
  • Extensive support for Data Vaults. WhereScape has worked on providing extensive support for Data Vault, a database modeling method designed to provide long-term historical storage of data coming in from multiple operational systems, and more specifically for Data Vault 2.0.
  • Support for Data Streaming. The briefing included an interesting discussion about WhereScape's new support for data streaming, a feature aimed at helping IT teams manage hybrid flows of streaming real-time and traditional batch-based data by enabling the design, development, and deployment of more advanced data infrastructures.
  • Reinforcing Support for Cloud Data Platforms. Within the company's efforts to expand support for data platforms, it is worth highlighting the continuous support for cloud-based data management engines, including Snowflake, Amazon Redshift, and Microsoft SQL Azure.
  • Addressing a Hybrid Reality. WhereScape has taken steps to address the reality of companies whose data now exists in both cloud and on-premises sources, making it possible to move data between both worlds and easing this transition.

At another key moment during the presentation, WhereScape walked us through what the company has developed, via its solutions combined with key partners including StreamSets, to support streaming IoT integration.

Real-time data flows, or “streaming” data sources, can now be collected from many areas within an organization’s data landscape, including from in-field units sharing sensor-based data, social media feeds to support sentiment analysis, or from internal systems feeds.

Utilizing industry-leading dataflow management technology developed by StreamSets, together with the proven efficiencies of the WhereScape automation solutions WhereScape RED and WhereScape Data Vault Express, WhereScape automation with streaming minimizes the learning curve for IT teams working with these new data sources and ensures best-practice integration of streaming data (Figure 4).

Figure 4. WhereScape’s Streaming IoT Integration (Image courtesy of WhereScape)

Serving as a central element, WhereScape aims to provide all the necessary tools to complete the full IoT cycle, from data collection to the provision of insights and information for users, automating the event queuing, transformation, storage, and administration stages. According to WhereScape, this helps to:
  • Open new analytic opportunities
  • Speed up pipeline development
  • Hide the complexity of underlying technologies
  • Minimize the learning curve for teams
  • Integrate with batch-based data
  • Ease ongoing management
The briefing painted a great picture, helping us confirm WhereScape's role in the market as well as what the company is doing to improve all its solutions and adapt to new technologies and needs. A great, clear, and concise briefing by WhereScape.

WhereScape: A Couple of Final Thoughts

From what we can see, WhereScape has been evolving in interesting ways, from being a data warehouse automation solution to becoming what the company calls a data infrastructure automation solution.

Interestingly, the company has shown the savviness to evolve its solutions, via model-driven design, to incorporate not just connectors to new data sources but also the tools and design needed to enable organizations to modernize full data management platforms in completely hybrid environments.

Figure 5. WhereScape’s RED Multiple Data Model Architecture Overview (Image courtesy of WhereScape)

Of course, even though WhereScape has built a solid and intelligent solution strategy, from a market perspective gaining market share hasn't been, and will not be, a walk in the park.

As automation technologies continue to gain steam in all major software areas, a larger number of software companies are incorporating increasingly sophisticated automation mechanisms into their data management and analytics solutions, making automation another key feature in an already extremely competitive market. Such is the case for Oracle with its new Oracle Autonomous Data Warehouse Cloud Service, and for offerings from other competitors like Attunity Compose or Panoply.

So, can WhereScape lead in data infrastructure automation?

Of course, but it will need to make sure it can consistently provide IT organizations with the means to achieve effective automation across all phases (design, development, and deployment) of, especially, a data warehouse and, more generally, a full-fledged data infrastructure.

This includes achieving seamless collection and processing of multiple source types and, of course, effective complexity cutting.

From the briefing, it seems WhereScape is, if not there yet, well on the way to achieving many of these goals, making it one of those companies up to the challenge of winning the favor of companies with complex data infrastructures to deal with.

WhereScape is rapidly realigning to serve a new generation of data management systems and needs.


July 26, 2018

Informatica Partners with Google Cloud to Provide AI-Driven Integration


As cloud computing adoption continues to grow, so does the need for modern and more efficient business and data integration capabilities.

And while many aspects of business and data integration are being simplified and automated, the increasing sophistication of business needs, and the requirement to perform highly efficient integration continuously, are pushing organizations toward new and ongoing digital transformation efforts.

In this vein, interesting news came in just a couple of weeks ago, when a partnership was announced between Informatica, a big player in the integration platform as a service (iPaaS) market, and tech giant Google.

Of course, the mere fact that two major players in the software industry decided to partner is already worth paying attention to, but the partnership is particularly interesting because it involves the provision of artificial intelligence (AI)-driven integration services, in an enormous effort from both companies to bring integration services to a new level.

The Details

The announcement specifically describes a new relationship between Informatica and Google Cloud's Apigee product group, the company's group devoted to helping organizations design, secure, and scale application programming interfaces (APIs), to enable customers to rapidly leverage data and applications deployed within hybrid and multi-cloud environments through innovative AI-driven integrations and APIs.

According to both companies' communiqué, customers can now develop APIs that easily enable access to applications, data, and metadata, and make use of AI-driven predictive analysis and recommendations powered by CLAIRE, Informatica's unified enterprise metadata intelligence engine.

The new Informatica Integration Cloud for Apigee aims to provide “zero-code” API development and management capabilities. According to the announcement:
“Developers, integrators, and analysts will be able to point to any data or application, turn it into a secure, managed API with the click of a button, and then integrate and orchestrate business processes with a simple, wizard-driven drag and drop.”
Other relevant aspects of the Informatica-Google partnership include product-level integrations, enabling organizations to take business integration processes built in Informatica and publish them as managed and secure APIs to the Apigee Edge API Platform. Secure API proxies can be quickly built by automatically discovering Informatica business integration data and processes.

From Apigee's side, the platform will provide customers with Informatica Intelligent Cloud Services as an integrated edition within the Apigee Edge API Platform.

So What?

Well... while perhaps not flashy, this partnership announcement carries, in my view, a relevant message to the data management market. On one hand, it signals the increasing importance of seamless integration of enterprise software services and a new approach to designing, developing, and deploying intelligent, universally embeddable, and easy-to-use processes and data management tasks. On the other, it shows the continuous effort of software providers to simplify business and function integration through APIs, including efforts made by other companies such as Microsoft and Oracle.

To this point, Ed Anuff, director of product management at Google Cloud, mentioned:
“Modern business isn’t just about adopting a mobile strategy or using the cloud to generate efficiency savings. Enterprises are leveraging new integrations for seamless workflows that allow them to use data and applications to create remarkable experiences for their customers, employees and partners. With the product-level integrations between Apigee Edge and Informatica's Integration Cloud, we can deliver end-to-end API life cycle management and integration capabilities to help enterprises accelerate their journey to become modern, connected digital businesses.”
Personally, I think this news keeps signaling what will be a next stage in enterprise software, where integration is developed under new paradigms to enable low-code development, increased device and platform portability, and neater third-party integration.

Finally, you can either take a moment to read a good piece on the Google Cloud Platform Blog introducing these and more new capabilities being introduced within the new version of the Apigee platform or throw me a comment in the lines below.

June 5, 2018

WTF is Deep Learning Anyway

Following on my previous WTF post on machine learning, it just makes sense to continue this line of thought and address another of the many popular and trendy concepts out there. We are talking about: deep learning.

So, without further ado, let's explain WTF deep learning is, shall we?

Simply put, and as can be inferred from the post mentioned above, deep learning is one of the now many approaches to machine learning we can find out there, along the lines of other approaches like decision tree learning, association rule learning, or Bayesian networks.

While deep learning is not new (it was introduced by Dr. Rina Dechter in 1986), it is only in recent years that this approach has gained fame and popularity among users, and particularly among software companies adopting it within their analytics arsenals.

Deep learning makes it possible to train a computer to perform tasks such as recognizing speech, identifying images, or making predictions. Instead of organizing data to run through predefined equations, it sets up basic parameters about the data and trains the computer to “learn” by recognizing patterns through many layers of processing.

So, What Has Made Deep Learning so Popular?

Many factors have played out to enable the popularity of machine learning in general, as well as deep learning in particular.

Today, modern deep learning provides a powerful framework for supervised learning and for addressing increasingly complex problems. Consequently, it has gained huge popularity in many fields of computing, including computer vision, speech and audio recognition, natural language processing (NLP), bioinformatics, drug design, and many others. But why?

This popularity has to do, on one hand, with the fast evolution of deep learning algorithms, of course, but also with the converged evolution of core processing technologies, including big data, cloud computing, and in-memory processing, which has allowed deep learning algorithms, which require intensive computing resources, to be deployed on increasingly fast and efficient computing infrastructures.

On the other hand, the evolution and consumerization of peripheral technologies like mobile and smart devices has made it possible for providers to embed deep learning functionality in a growing number of systems and use cases, reaching wider audiences that can use and develop deep learning in a more “natural” way.

How Does Deep Learning Work?

In general, most deep learning architectures are built on a type of computing system called the artificial neural network (ANN) —I know, we will get to its own WTF soon— although they can also include other computing structures and techniques. Inspired by the structure and functions of the brain, deep learning's use of ANNs recreates the interconnection of neurons through algorithms that mimic the brain's biological structure.

Within an ANN, units (neurons) are organized in discrete layers and connected to other units so that each layer chooses a specific feature to learn (shapes, patterns, etc.). Each layer adds a depth of “learning”, so that by adding more layers, and more units within a layer, a deep network can represent functions of increasing complexity. It is this layering, or depth, that gives deep learning its name (Figure 1).

Figure 1.  A 3-layer neural network with three inputs, two hidden layers of 4 neurons each and one output layer. (Source: CS231n Convolutional Neural Networks for Visual Recognition)
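To make the idea concrete, here is a minimal sketch in plain Python (not a real deep learning framework) of a forward pass through a network shaped like the one in Figure 1: three inputs, two hidden layers of four units each, and one output. The weights here are random stand-ins for the values that training would normally learn.

```python
import math
import random

random.seed(0)  # make the random weights reproducible

def dense_layer(inputs, weights, biases):
    """One fully connected layer: each unit mixes all inputs, then applies a sigmoid."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(unit_w, inputs)) + b)))
        for unit_w, b in zip(weights, biases)
    ]

def random_layer(n_in, n_units):
    """Random (untrained) weights; training would adjust these from data."""
    weights = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_units)]
    biases = [random.uniform(-1, 1) for _ in range(n_units)]
    return weights, biases

# 3 inputs -> hidden layer of 4 -> hidden layer of 4 -> 1 output
layers = [random_layer(3, 4), random_layer(4, 4), random_layer(4, 1)]

activation = [0.5, -0.2, 0.8]  # the input vector
for weights, biases in layers:
    activation = dense_layer(activation, weights, biases)

print(activation)  # a single output value between 0 and 1
```

Each call to `dense_layer` is one "layer of processing"; stacking more of them is exactly the depth the name refers to.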

Until now most, if not all, deep learning applications have dealt with tasks or problems that, as the previous figure shows, consist of mapping an input vector to an output vector, allowing such problems to be solved given large enough models and large enough datasets.

These are commonly problems that humans solve relatively easily and without needing to reflect on them (identifying forms and shapes, for example). Yet, thanks to the increasing computing power available and the continuous evolution of deep learning, computers can now perform some of these tasks even faster than humans.

It's clear, then, that both machine learning in general and deep learning in particular change the common paradigm for analytics: instead of developing an algorithm to instruct a computer system on how to specifically solve a problem, a model is developed and trained so that the system can “learn” to solve the problem by itself (Figure 2).

Figure 2.  Traditional programming vs Machine learning approaches. (Source: CS231n Convolutional Neural Networks for Visual Recognition)
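As a toy illustration of this shift (the "hot day" task and its data are invented for the example), compare a hand-written rule with a rule derived from labeled examples. A trivial nearest-mean threshold stands in here for a real learning algorithm:

```python
# Traditional programming: the rule is written by hand.
def hand_written_rule(temp_c):
    return temp_c > 25  # threshold hard-coded by the programmer

# Machine learning: the rule is derived from labeled examples instead.
def learn_threshold(examples):
    hot = [x for x, label in examples if label == 1]
    cold = [x for x, label in examples if label == 0]
    # decision boundary: midpoint between the two class averages
    return (sum(hot) / len(hot) + sum(cold) / len(cold)) / 2

# Labeled data (temperature, 1 = "hot day", 0 = "not hot") replaces the rule.
data = [(30, 1), (28, 1), (35, 1), (10, 0), (15, 0), (22, 0)]
threshold = learn_threshold(data)

def learned_rule(temp_c):
    return temp_c > threshold

print(learned_rule(33), learned_rule(12))  # True False
```

The point is that no one typed the decision boundary in: it came out of the data, and feeding in different data would produce a different rule without touching the code.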

A key advantage of deep learning lies in how it handles feature engineering. A traditional approach starts by using the available data to perform feature engineering, then selects a model and estimates its parameters within an often repetitive and complex cycle before finally arriving at an effective model. Deep learning replaces this with a layered approach in which each layer can recognize key features from patterns or regularities in the data.

Hence, deep learning replaces the formulation of a model with hierarchically organized characterizations (or layers) that can “learn” to recognize features from the available data (Figure 3), resulting in the construction of "systems of prediction" that can:

  • Avoid the use of hard parameters and business rules
  • Make better generalizations
  • Improve continuously

Figure 3.  Machine learning vs Deep learning. (Source: Xenonstack)

On the downside, one common challenge when deploying an application of deep learning is that it requires intensive computational power due to:

  1. The iterative nature of deep learning algorithms
  2. The increasing complexity as the number of layers increases
  3. The need for large volumes of data to train the neural networks

Still, deep learning’s capacity for continuous improvement sets an ideal stage for any organization to implement dynamic behavior features within its analytics platforms.

What Are Some Applications of Deep Learning?

Today, deep learning has already been applied in many industries and lines of business, and its adoption keeps increasing at a constant pace. Some areas where deep learning has been successfully applied include:

Recommendation Systems

This is perhaps the flagship use case for machine learning and deep learning. Companies including Amazon and Netflix have used these techniques to build systems that can predict, with a good chance of success, what a viewer might be interested in watching or purchasing next, based on his/her past behavior.

Deep learning enhances these recommendations in complex environments by increasingly learning users' interests across multiple platforms.
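The underlying intuition can be sketched as a tiny collaborative filter: people with overlapping histories probably share tastes. The viewing histories and names below are invented for illustration, and production systems at Netflix or Amazon scale use far richer models than this.

```python
from collections import Counter

# Hypothetical viewing histories (all names are made up for the example)
histories = {
    "ana":   {"Stranger Things", "Dark", "Black Mirror"},
    "bruno": {"Dark", "Black Mirror", "The OA"},
    "carla": {"Stranger Things", "Dark"},
}

def recommend(user, histories):
    """Suggest the titles most watched by users with overlapping tastes."""
    seen = histories[user]
    scores = Counter()
    for other, other_seen in histories.items():
        if other == user:
            continue
        overlap = len(seen & other_seen)  # shared titles = taste similarity
        for title in other_seen - seen:   # only suggest unseen titles
            scores[title] += overlap
    return [title for title, _ in scores.most_common()]

print(recommend("carla", histories))  # ['Black Mirror', 'The OA']
```

Deep learning replaces the simple overlap count above with learned representations of users and items, but the goal, ranking what the user has not seen yet, is the same.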

Image and Speech Recognition

Another common application of deep learning in the software industry is speech and image recognition. On the speech recognition side, companies like Google, Apple and Microsoft have applied deep learning to products like Google Now, Siri and Cortana to recognize voice patterns and human speech.

On the image recognition side, regardless of how challenging it can be, it's possible to find projects already applying deep learning with different levels of success. Companies like DeepGlint are using deep learning to recognize and acquire real-time insights from the behavior of cars, people and practically any object.

Applications like this have huge potential in sectors including law enforcement or self-driving cars.

Natural Language Processing

Neural networks and deep learning have been key to the development of natural language processing (NLP), an area of artificial intelligence that develops techniques and solutions allowing “natural” interaction between computers and human languages, especially to enable the processing of large amounts of natural language data.

Companies like MindMeld use deep learning and other techniques to develop intelligent conversational interfaces.

We could go on describing more use cases for deep learning, but perhaps it is fair to say that the number and types of applications for deep learning keep growing.

What is Out There in the Market?

Currently, there are varied options for using or deploying deep learning, whether to start experimenting and developing or to deploy enterprise-ready solutions that apply deep learning.

For those organizations with a will for development and innovation, open source deep learning frameworks like TensorFlow, Caffe or PyTorch represent a great opportunity to get up and running.

Other great solutions for developing and applying deep learning include data science platforms like Dataiku, DataRobot or the one just recently acquired by Oracle.

Also, users and organizations can take a practical approach and use niche vertical solutions like cloud-native endpoint protection platform CrowdStrike, healthcare software provider Flatiron Health, or security intelligence & analytics (SIA) company Endgame among many others.

Today, deep learning and machine learning solutions are increasingly available to small, medium and large companies, promoting a continuous and fast evolution of these techniques within the market landscape. Not surprisingly, users' expectations are high for addressing and solving increasingly complex problems.

It also hints that perhaps, with new advances and techniques seeing the light of day so frequently, we are just at the beginning of a new era in the analytics marketplace.

It seems deep learning is no joke, or is it?


May 16, 2018

Oracle’s New Cloud Services: A New Big Push for Automation

With a recent announcement, Oracle, the global software and hardware powerhouse, follows up on its continuing effort to equip all the solutions of its Cloud Platform with autonomous capabilities.

As part of a venture that started early this year with the announcement of the first set of autonomous services —including Oracle Autonomous Data Warehouse Cloud Service— and the announcement of Oracle 18c as Oracle's first fully autonomous database, the company is now extending these capabilities with the launch of another set of services in the cloud.

This time the turn is for three new services: Oracle Autonomous Analytics Cloud, Oracle Autonomous Integration Cloud, and Oracle Autonomous Visual Builder Cloud. According to Oracle, these will be followed by the release of more autonomous services later in the year, focused on mobile, chatbots, data integration, blockchain, security and management, as well as more traditional database workloads including OLTP.

Built from the ground up with advanced artificial intelligence (AI) and machine learning algorithms, Oracle Cloud Platform's new set of autonomous services aims, according to Oracle, to automate and/or eliminate tasks so organizations can lower costs, reduce risks and accelerate innovation while also gaining predictive insights.

In this regard, Amit Zavery, executive vice president of development for Oracle Cloud Platform, mentioned:
“Embedding AI and machine learning in these cloud services will help organizations innovate in revolutionary new ways. These new cloud services are the latest in a series of steps from Oracle to incorporate industry-first autonomous capabilities that will enable customers to significantly reduce operational costs, increase productivity, and decrease risk.”

Oracle's new and existing autonomous services within the Oracle Cloud Platform all follow the company's guidelines and fundamental autonomous capabilities, which can be summarized as:

  • Self-Driving capabilities that reduce or eliminate human labor throughout all processes: provisioning, securing, monitoring, storing, copying and troubleshooting.
  • Self-Securing capabilities that protect services from external attacks and malicious internal users, including the automatic application of security updates, protection against cyberattacks, and automatic encryption of all data.
  • Self-Repairing capabilities that provide automated protection against planned and unplanned downtime, including maintenance.
The new autonomous services announced by Oracle are planned to impact different functional aspects of an organization's enterprise software services, from analytics to software development. A brief description of these new services follows:

Oracle Autonomous Analytics Cloud

This service assembles a combination of technologies including machine learning, adaptive intelligence, and service automation within an analytics platform, aiming to change the way users analyze, understand, and act on information.

Oracle's Autonomous Analytics Cloud service also includes functionality enabling business users to uncover insights by asking questions on their mobile devices. Natural language processing techniques convert those questions into queries to be processed in the back end, so the system can deliver visualizations on the device.

The service’s machine learning functionality can autonomously gain intelligence and proactively suggest insight on data the user might not even have asked for or reveal hidden patterns.

The service is designed so it can provide predictive analytics on IoT data by applying domain-specific machine learning algorithms to large volumes of sensor data and historical patterns.

Oracle Autonomous Integration Cloud

This service aims to speed up an organization's complex application integration processes via automation.

Business processes that span both Oracle and non-Oracle applications, whether on-premises or SaaS, can be embedded and integrated through a best-practice, guided, autonomous application integration process using machine learning and pre-built application integration techniques.

The Autonomous Integration Cloud Service delivers an adaptive case management system through APIs with AI and machine learning frameworks, which also enables Robotic Process Automation (RPA) to deliver process automation to systems with no APIs enabled.

Autonomous Visual Builder Cloud

Oracle’s Autonomous Visual Builder is designed to help companies accelerate their mobile and web application development cycles by providing business users and developers a framework to build applications with no coding.

By using the latest industry-standard technologies, the service automates code generation and allows deployment with a single click.

Aside from enabling rapid application development, the service also automates the delivery of mobile applications across multiple mobile platforms, including iOS and Android, and makes it possible to develop on standard open-source technologies including Oracle JET and Swagger.

So what?

Well, with a set of significant and continuous moves towards automation, Oracle aims to gain a significant edge in a software industry that has become increasingly competitive.

Oracle is making clear it will extend autonomous capabilities throughout its entire Cloud Platform, committed to providing self-driving, self-securing, and self-repairing capabilities in all its PaaS services. Yet, in my view, even with all the potential advantages these moves might bring, the company is taking no small risk, one perhaps comparable with IBM's Watson, which for some time seemed to have launched at a moment when most users were not ready for all the goodness the AI system could provide them with.

That being said, it's still hard, of course, not to be excited about Oracle's new promise of a new generation of fully autonomous software, able to achieve many of the end objectives that expert systems and artificial intelligence visionaries have dreamed of.

In the meantime, I can hardly wait to see the response in the market, both from users and, of course, from Oracle's competitors, if any, to these new releases from Oracle.


May 4, 2018

Hadoop Platforms: The Elephants in the Room

"When there’s an elephant in the room, introduce him"
-Randy Pausch

It is common that, when speaking about Big Data, two major assumptions take place:

One: Hadoop comes to mind right alongside it, and many times the two are even considered synonyms, which they are not.

While Big Data is the boilerplate concept that refers to the process of handling enormous amounts of data coming in different forms (structured and unstructured), independent of the use of a particular technology or tool, Hadoop is, in fact, a specific open source technology for dealing with these sorts of voluminous data sets.

But before we continue, and as a mind refresher, let's remind ourselves what Hadoop is, using the Apache Software Foundation's own definition:
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
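The "simple programming model" the definition refers to is MapReduce, and it is easiest to see in a toy, single-process sketch. In a real Hadoop cluster the map and reduce phases below would run distributed across many machines, each working on its own slice of the data; here everything runs in one Python process just to show the shape of the computation.

```python
from collections import defaultdict
from itertools import chain

documents = ["big data needs big clusters", "hadoop splits big jobs"]

# Map: each "node" independently emits (word, 1) pairs for its share of the data
mapped = chain.from_iterable(
    ((word, 1) for word in doc.split()) for doc in documents
)

# Shuffle: group all pairs by key, as the framework would do across the cluster
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: combine each group of values into a final result
counts = {word: sum(values) for word, values in groups.items()}
print(counts["big"])  # 3
```

Because the map step touches each record independently and the reduce step only sees one key's values at a time, the framework is free to spread both across as many machines as the data requires, which is exactly the scaling property the Apache definition describes.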
Commercial Hadoop distributions assemble different combinations of various open source components from the Apache Software Foundation and more specifically from the Apache Hadoop stack.

These distributions integrate all components within a single product, offered as an enterprise-ready commercial solution. In many cases, some distributions also offer proprietary software, support, consulting services, and training as part of their offering.

Two: When talking about Hadoop and its commercial use, quite often three usual suspects come to mind. Due to their history and ties with the evolution of Hadoop, these have become major players: we are talking about Cloudera, Hortonworks and MapR.

While there's no doubt these Hadoop-based data platforms are major players, nowadays we can find a significant number of options from which a company can choose. So, to follow Mr. Pausch's advice, let's take a look at a list of Hadoop-based data platforms available in the market and introduce them.

Alibaba Cloud
Solution: Alibaba E-MapReduce Service

Alibaba Cloud Elastic MapReduce (E-MapReduce) is a cloud-based big data processing solution built on Apache Hadoop and Apache Spark. E-MapReduce's flexibility allows the platform to be applied in different big data use cases, including trend analysis, data warehousing, and analysis of continuously streaming data.

Being in the cloud, E-MapReduce makes big data processing available within a flexible and scalable platform of distributed Hadoop clusters, with seamless integration with the rest of the available Alibaba Cloud offerings.

Amazon Web Services
Solution: Amazon EMR

With Amazon EMR, the company provides a cloud-based managed Hadoop framework intended to make it easy, fast, and cost-effective to process vast amounts of data across dynamically scalable Amazon EC2 instances.
With Amazon EMR it is also possible to deploy and run other open source distributed frameworks including Spark, HBase, Presto, and Flink, and to interact with data stored in other AWS data stores like Amazon S3 and Amazon DynamoDB.

Amazon EMR includes interesting features for log analysis, web indexing, data transformations (ETL), machine learning, financial analysis, scientific simulation, and bio-informatics capabilities.

Solution:  Arenadata Hadoop (Open Analytical Platform)

The ArenaData Unified Data Platform is composed of a set of components along with Hadoop, including all the necessary software to access, manipulate, protect and analyze data.

Arenadata Hadoop (ADH) is an enterprise-ready, Apache Hadoop based distribution aimed at handling semi-structured and unstructured data. Today, ADH is certified as fully compliant with the ODPi (Open Data Platform initiative) standard, assembling a complete set of Apache-based open source products without proprietary software.

Arenadata Hadoop provides a full set of tools for autonomous installation on physical as well as virtual machines. Monitoring and administration software helps optimize performance across all system components, while Apache Ambari provides the interfaces required for integration with existing administrative systems such as Microsoft System Center and Teradata ViewPoint.

Solution: Cloudera Enterprise Data Hub

The Enterprise Data Hub (EDH) is Cloudera's Hadoop data platform distribution, a solution intended to make fast, secure, and easy big data software available, from data science and engineering, to powering an operational database, to running large-scale analytics, all within the same product.

Offered in different flavors (Analytic DB, Operational DB, Data Science & Engineering, as well as an Essentials version), Cloudera's EDH also offers, aside from its analytics and data management capabilities, features to run in the cloud, such as:

  • High-performance analytics. Run any analytics tool of choice against the cloud-native object store, Amazon S3.
  • Elasticity and flexibility. Support transient Hadoop clusters and the ability to scale up and down as needed as well as use of permanent clusters for long-running BI and operational jobs.
  • Multi-cloud provisioning. Deploy and manage Cloudera Enterprise across AWS, Google Cloud Platform, Microsoft Azure, and private networks.
  • Automated metering and billing. To only pay for what a company needs, when it needs it.

Solution:  Gluent Data Platform

Implemented in large organizations around the world across industries including finance, telecom, retail and healthcare, the Gluent Data Platform offers a Hadoop data platform for data offloading, access and analysis.

Some benefits and features offered by Gluent include, among others:

  • High parallelism in Hadoop using cheap Hadoop cluster hardware and software
  • No changes required to existing application code for connection with sources by using Gluent’s Smart Connector
  • Offers capability to choose from and use multiple data engines (like Impala, Hive and Spark) to process your data
  • No data conversion or export/import is needed when using new engines on Hadoop

Google Cloud Platform
Solution:  Cloud Dataproc

Google's Cloud Dataproc is a fully-managed cloud service for running Apache Spark and Apache Hadoop clusters. Some of the features of Cloud Dataproc include:

  • Automated cluster management
  • Re-sizable clusters
  • Versioning
  • High availability
  • Integration with developer tools
  • Automatic or manual configuration
  • Flexible virtual machines

Cloud Dataproc also easily integrates with other Google Cloud Platform (GCP) services to provide a complete platform for data processing, analytics and machine learning.

Solution:  Hortonworks Data Platform (HDP)

HDP is an enterprise-ready and secure Apache Hadoop distribution designed around a centralized, YARN-based architecture. HDP aims to address the complete set of needs for data-at-rest, power real-time customer applications, and deliver robust big data analytics solutions.

Whether on-premises or in the cloud, Hortonworks provides the flexibility to run the same industry-leading, open source platform to gain data insights in the data center as well as on the public cloud of choice (Microsoft Azure, Amazon Web Services or Google Cloud Platform).

Solution:  Infosys Information Platform (IIP)

IIP is a data and analytics platform designed to help enterprises leverage their data assets for innovation and enhanced business growth. The solution easily integrates with proprietary software, allowing companies to maximize value from existing investments.

According to Infosys, IIP is a collaborative platform that enables data engineers, data analysts and data scientists to work jointly across business domains and verticals. IIP can be deployed with ease and without vendor lock-in.

With improved security through role-based access controls that include cell-level authorizations, IIP helps enterprises simplify their data management operations and understand data better, accelerating the data-insight-action cycle.

IIP aims to be the right tool for organizations that want to gain real-time insights, get faster business value, stay compliant with updated governance and robust security, and reduce total cost of ownership with high availability.

Solution:  MapR Converged Data Platform

MapR’s Converged Data Platform integrates Hadoop, Spark, and Apache Drill along with real-time database capabilities, global event streaming, and scalable enterprise storage to provide a full enterprise ready big data management platform with Hadoop.

The MapR Platform aims to deliver enterprise grade security, reliability, and provide real-time performance capabilities while lowering both hardware and operational costs for applications and data.

The MapR Converged Data Platform has the ability to simultaneously run analytics and applications at high speed while enabling scaling and reliability. The strategy of converging all data within a data fabric allows its storage, management, processing, and analysis as the data is being generated.

Mastodon C
Solution:  Kixi

Mastodon C’s open source data platform Kixi uses Hadoop, Cassandra and a set of open source technologies to ingest and integrate batch and real-time data within a single repository, from which the platform  can aggregate, model, and analyze it.

Some of Kixi's main features include:

  • Handling of real-time and sensor data via Apache Kafka
  • ETL and batch processing capabilities
  • Data Science capabilities for advanced data analysis
  • Ongoing support to ensure efficient data processing and continuous review and improvement of customers data pipelines and models.

Microsoft Azure
Solution:  Microsoft Azure HDInsight

Backed by Hortonworks, Azure’s HDInsight is, according to Microsoft, a fully-managed, full-spectrum open source analytics service for enterprises.

The Azure HDInsight service aims to provide a fully-managed cloud service to make it easy for organizations to process massive amounts of data via popular open source frameworks including Hadoop, Spark, Hive, LLAP, Kafka, Storm, R and others.

Azure HDInsight provides an architecture landscape for different use cases including ETL, Data Warehousing, Machine Learning, IoT and other services within an integrated platform.

Solution:  NEC Data Platform for Hadoop

Another offering powered by Hortonworks, NEC’s "Data Platform for Hadoop" is a pre-designed and pre-validated Hadoop appliance which integrates NEC's specialized hardware and Hortonworks’ Data Platform.

This NEC Hadoop-based appliance comes tuned to work with an enterprise-ready Hortonworks platform, certified for NEC's server hardware.

Solutions: Oracle Big Data Cloud Service and Oracle Big Data Cloud 

Oracle has gone “big” with big data: the mega tech vendor offers a couple of Hadoop-based data management platforms, the Oracle Big Data Cloud Service and Oracle Big Data Cloud.

Derived from a partnership with Cloudera, the Oracle Big Data Cloud Service aims to enable organizations to launch their big data efforts by providing a data platform within a secure, automated and scalable service that can be fully integrated with existing enterprise data in Oracle Database. The service has been designed to:

  • Deliver high performance through dedicated instances
  • Allow dynamic scaling as needed
  • Reinforce and extend security to Hadoop and NoSQL processes
  • Deliver a comprehensive solution that includes robust data integration capabilities and integration with R, spatial and graph software

Oracle Big Data Cloud is an enterprise-ready Hadoop data platform intended for organizations that want to run big data workloads, including batch processing, streaming and/or machine learning, in either a public or a private cloud configuration.

Solution:  Qubole Data Service (Apache Hadoop as a Service)

Qubole offers an autonomous data platform implementation of Apache Hadoop in the cloud. Apache Hadoop as a Service, part of the Qubole Data Service, offers a self-managing and self-optimizing implementation of Apache Hadoop that can run on different public cloud infrastructures including AWS, Azure and Oracle Cloud.

Qubole’s Hadoop service runs applications in MapReduce, Cascading, Pig, Hive, and Scalding. The service is optimized for faster workload performance and incorporates an enterprise-ready data security infrastructure.

Solution:  SAP Cloud Platform Big Data Services

SAP’s Big Data Services on its Cloud Platform is a full-service big data cloud-based Hadoop and Spark data platform.

The platform allows companies to utilize Apache Hadoop, Spark, Hive and Pig, as well as several third-party applications to take advantage of the most recent innovations in big data and attend the diverse set of use cases an organization might have.

Also worth mentioning is that the service integrates with SAP Leonardo, the company's IoT and digital innovation platform, enabling a systematic approach to digital innovation with SAP Leonardo's capabilities while, according to SAP, meeting rigorous demands for reliability, scalability, and security.

Solution:  Syncfusion Big Data Platform

Syncfusion Big Data Platform is a full-fledged Hadoop distribution designed for Windows, Linux, and Azure. One of the things that makes this Hadoop platform interesting, aside from its features for managing huge data loads, is its ability to easily create, deploy, and scale a secure Syncfusion Hadoop cluster with basic or Kerberos-enabled authentication in a Microsoft Azure Virtual Machines environment.

The Syncfusion cluster manager allows users to effectively manage resources in Microsoft Azure, with options to track billing details; shut down, restart, and destroy virtual machines as required; or start and stop the virtual machines in the Hadoop cluster at scheduled intervals.

Additionally, Syncfusion Big Data Platform includes support for creating and managing Hadoop clusters within Linux environments, Azure Blob storage for Azure VM-based Hadoop clusters as well as integration with Elasticsearch and MongoDB data access with Spark, among many other features.

Solution:  T-Systems Big Data Platform

The T-Systems Big Data Platform is a full Hadoop and in-memory based solution comprising consultancy, planning, implementation and the optimization of big data analysis solutions and processes.

Through partnerships around Cloudera and SAP HANA, along with other best-of-breed data management tools, T-Systems provides organizations with a Hadoop ecosystem. T-Systems' big data solution offers a scalable big data platform in the cloud.

The solution offers a full set of functions for the collection, backup and processing of large sets of unstructured data.

Additionally, T-Systems' big data solution includes capabilities for real-time analytics via SAP HANA's in-memory architecture, which allows all data to be stored directly in main memory (RAM).

Solution:  Teradata Appliance for Hadoop

The Teradata Appliance for Hadoop is Teradata's enterprise Hadoop implementation: a ready-to-run enterprise platform pre-configured and optimized specifically to run enterprise-class big data workloads.

The appliance features optimized versions of either Hortonworks HDP or Cloudera CDH running on top of Teradata hardware and a comprehensive set of Teradata-developed software components. Some features of the Teradata Appliance for Hadoop include:

  • Optimized hardware and flexible configurations
  • High-speed connectors and enhanced software usability features
  • Systems monitoring and management portals
  • Continuous availability and linear scalability
  • Teradata's world-class service and support

Solution: TickVault

TickVault is a Hadoop-based big data platform with the purpose of collecting, storing, transforming, analyzing and providing insights from structured and unstructured financial data. This includes trade & quote history, news and events, research and corporate actions among others.

The platform has been designed to help organizations speed up the development and management of finance-related big data projects. It provides APIs and integrates with pre-existing business software solutions, including Matlab, R, or Excel, to avoid business disruptions and speed up the analytics process.

Its unified web interface aims to provide easy data access and distribution within a secure environment, allowing flexible management of granular permissions.

Hadoop Platforms: Mature and Enterprise Ready Big Data Platforms

From the list above, it's easy to see that gone are the days when just a few vendors would provide an enterprise-ready option for undertaking a Hadoop-based big data project. The Hadoop space continues to evolve, and a more than decent number of vendors now offer reliable solutions for deploying Hadoop, both on-premises and in the cloud, to comply with most of the use cases an organization needs to address.

Granted, of course, much more information is needed before deciding which Hadoop data platform is best for an organization, but this list can provide a place to start exploring the possibilities for new small or big data projects involving Hadoop.

Finally, I wouldn't be surprised to discover there are other Hadoop platforms I have not mentioned here. Please feel free to let me know about any other distribution I'm not considering in this list, or drop me a comment or feedback below.


  • During the writing of this piece, it wasn't possible to gather links and information regarding Huawei's FusionInsight Big Data Platform, which is why it does not appear as part of our list.
  • While IBM will continue offering a Hadoop-based product, it will do so by integrating Hortonworks into its analytics arsenal rather than via the existing IBM BigInsights. For more information read here.
  • All logos and trademarks are the property of their respective owners.

