data science (7)

A to Z of Analytics

Analytics has taken the world by storm, and it is the powerhouse behind the digital transformation happening in every industry.

Today everybody is generating tons of data: we as consumers leave digital footprints on social media, IoT generates millions of records from sensors, and mobile phones are in use from morning until we sleep. All these varieties of data formats are stored in Big Data platforms. But merely storing this data is not going to take us anywhere unless analytics is applied to it. Hence it is extremely important to close the loop with analytics insights.
Here is my version of A to Z for Analytics:
Artificial Intelligence: AI is the capability of a machine to imitate intelligent human behavior. BMW, Tesla and Google are using AI for self-driving cars. AI should be used to solve tough real-world problems, from climate modeling to disease analysis, for the betterment of humanity.
Boosting and Bagging: These are techniques used to generate more accurate models by ensembling multiple models together. Bagging trains models independently on resampled data and combines them, while boosting trains models in sequence so that each one corrects the errors of the previous one.
CRISP-DM: This is the Cross-Industry Standard Process for Data Mining. It was developed by a consortium of companies including SPSS, Teradata, Daimler and NCR Corporation in 1997 to bring order to the development of analytics models. The six major steps are business understanding, data understanding, data preparation, modeling, evaluation and deployment.
Data preparation: In analytics deployments, more than 60% of the time is spent on data preparation. The general rule is garbage in, garbage out. Hence it is important to cleanse and normalize the data and make it available for consumption by the model.
Ensembling: This is the technique of combining two or more algorithms to get more robust predictions. It is like combining all the marks we obtain in exams to arrive at a final overall score. Random Forest is one such example, combining multiple decision trees.
Feature selection: Simply put, this means selecting only those features or variables from the data that really matter and removing the irrelevant ones. This improves model accuracy.
Gini Coefficient: This is used to measure the predictive power of a model. It is typically used in credit scoring tools to find out who will repay and who will default on a loan.
Histogram: This is a graphical representation of the distribution of a set of numeric data, usually a vertical bar graph, used in the exploratory analytics and data preparation steps.
Independent Variable: This is the variable that is changed or controlled in a scientific experiment to test the effects on the dependent variable, such as the effect of increasing the price on sales.
Jubatus: This is an online machine learning library covering classification, regression, recommendation (nearest neighbor search), graph mining, anomaly detection and clustering.
KNN: The k-nearest neighbors algorithm is used in machine learning for classification problems, based on the distance or similarity between data points.
Lift Chart: These are widely used in campaign targeting problems to determine which deciles of customers to target for a specific campaign. They also tell you how much response you can expect from the new target base.
Model: There are more than 50 modeling techniques, such as regression, decision trees, SVM, GLM and neural networks, available in technology platforms like SAS Enterprise Miner, IBM SPSS or R. They are broadly categorized as supervised or unsupervised methods, covering classification, clustering and association rules.
Neural Networks: These are typically organized in layers made up of nodes and mimic the way the brain learns. Today Deep Learning is an emerging field based on deep neural networks.
Optimization: This is the use of simulation techniques to identify the scenarios that will produce the best results within the available constraints, e.g. sale price optimization, or identifying the optimal inventory to maximize fulfillment and avoid stock-outs.
PMML: This stands for Predictive Model Markup Language. It is an XML-based file format developed by the Data Mining Group to transfer models between various technology platforms.
Quartile: This means dividing the sorted output of a model into 4 groups for further action.
R: Today many universities and corporations use R for statistical model building. It is freely available, and there are licensed versions like Microsoft R. More than 7,000 packages are now at the disposal of data scientists.
Sentiment Analytics: This is the process of determining whether information or a service provided by a business leads to positive, negative or neutral human feelings or opinions. Consumer product companies are measuring sentiment 24/7 and adjusting their marketing strategies.
Text Analytics: This is used to discover and extract meaningful patterns and relationships from collections of text from social media sites such as Facebook, Twitter and LinkedIn, as well as blogs and call center scripts.
Unsupervised Learning: These are algorithms that are given only input data and are expected to find patterns in it. Clustering and association algorithms like k-means and Apriori are the best examples.
Visualization: This is the method of enhancing exploratory data analysis and showing the output of modeling results with highly interactive statistical graphics. Any model output has to be presented to senior management in the most compelling way. Tableau, QlikView and Spotfire are leading visualization tools.
What-If analysis: This is the method of simulating various business scenarios, with questions like: if we increased our marketing budget by 20%, what would be the impact on sales? Monte Carlo simulation is a very popular technique for this.
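The marketing budget question in the What-If entry can be sketched as a small Monte Carlo simulation. This is only an illustration, not a calibrated model: the baseline sales figure, the assumed response of sales to budget (an elasticity drawn from a normal distribution) and the function name `simulate_sales_uplift` are all hypothetical.

```python
import random
import statistics

def simulate_sales_uplift(baseline_sales, budget_increase, n_trials=10_000, seed=42):
    """Monte Carlo what-if: distribution of sales after a marketing budget change.

    Assumption (hypothetical): the fractional change in sales per unit
    fractional change in budget is uncertain, drawn from Normal(0.3, 0.1).
    """
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n_trials):
        elasticity = rng.gauss(0.3, 0.1)   # one possible market response
        outcomes.append(baseline_sales * (1 + elasticity * budget_increase))
    outcomes.sort()
    return {
        "mean": statistics.mean(outcomes),
        "p05": outcomes[int(0.05 * n_trials)],   # pessimistic scenario
        "p95": outcomes[int(0.95 * n_trials)],   # optimistic scenario
    }

# What if the marketing budget goes up by 20%?
result = simulate_sales_uplift(baseline_sales=1_000_000, budget_increase=0.20)
print(result)
```

Reporting a range (5th to 95th percentile) rather than a single number is the point of the simulation: it shows how much the answer depends on the uncertain assumption.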
What do you think should come for X, Y, Z?
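Two entries from the list above, the Gini Coefficient and the Lift Chart, lend themselves to a short worked example. The sketch below uses the convention common in credit scoring, Gini = 2 × AUC − 1, and computes lift per decile as the response rate in that decile divided by the overall response rate. The data is invented for illustration and the function names are hypothetical.

```python
def gini_coefficient(labels, scores):
    """Gini = 2 * AUC - 1, with AUC computed by comparing every
    positive/negative pair (a tie counts as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return 2 * auc - 1

def decile_lift(labels, scores, n_bins=10):
    """Lift per bin: response rate in the bin divided by the overall rate.
    Bin 0 holds the highest-scored customers. Assumes the number of
    records is divisible by n_bins (any remainder is ignored)."""
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda t: -t[0])]
    overall = sum(ranked) / len(ranked)
    size = len(ranked) // n_bins
    return [
        (sum(ranked[i * size:(i + 1) * size]) / size) / overall
        for i in range(n_bins)
    ]

# Invented example: 20 customers, model scores from 1.00 down to 0.05,
# label 1 means the customer responded.
scores = [i / 20 for i in range(20, 0, -1)]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]

print("Gini:", round(gini_coefficient(labels, scores), 3))
print("Lift by decile:", [round(v, 2) for v in decile_lift(labels, scores)])
```

A lift above 1 in the top deciles means that targeting those customers first yields more responses than targeting at random.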

Machine Learning (ML) has revolutionized the world of computers by allowing them to learn as they progress forward with large datasets, thus mitigating many previous programming pitfalls and impasses. Machine Learning builds algorithms, which when exposed to high volumes of data, can self-teach and evolve. When this unique technology powers Artificial Intelligence (AI) applications, the combination can be powerful. We can soon expect to see smart robots around us doing all our jobs – much quicker, much more accurately, and even improving themselves at every step. Will this world need intelligent humans anymore or shall we soon be outclassed by self-thinking robots? What are the most visible 2017 Machine Learning trends?

2017 Machine Learning Trends in Research

In research, Machine Learning is steadily moving away from abstractions and engaging more in business problem solving, with support from AI and Deep Learning. In What Is the Future of Machine Learning, Forbes predicts that theoretical research in ML will gradually pave the way for business problem-solving. With Big Data making its way back to mainstream business activities, smart (ML) algorithms can now use massive loads of both static and dynamic data to continuously learn and improve their performance.

2017 ML Application Development Trends

Gartner’s Top 10 Technology Trends for 2017 predicts that the combined AI and advanced ML practice, which ignited about four years ago and has continued unscathed since, will dominate Artificial Intelligence application development in 2017. This lethal combination will deliver more systems that “understand, learn, predict, adapt and potentially operate autonomously.” Cheap hardware, cheap memory, cheap storage technologies, more processing power, superior algorithms, and massive data streams will all contribute to the success of ML-powered AI applications. There will be a steady rise in ML-powered AI applications in industry sectors like preventive healthcare, banking, finance, and media. For businesses, that means more automated functions and fewer human checkpoints. 2017 Predictions from Forrester suggests that the Artificial Intelligence and Machine Learning Cloud will increasingly feed on IoT data as sensors and smart apps take over every facet of our daily lives.

Democratization of Machine Learning in the Cloud          

The democratization of AI and ML through Cloud technologies, open standards, and the algorithm economy will continue. The growing trend of deploying prebuilt ML algorithms to enable Self-Service Business Intelligence and Analytics is a positive step towards the democratization of ML. In Google Says Machine Learning is the Future, the author champions the democratization of ML through idea sharing. A case in point is Google’s TensorFlow, which has championed the need for open standards in Machine Learning. The article claims that almost anyone with a laptop and an Internet connection can dare to be a Machine Learning expert today, provided they have the right mindset.

The provisioning of Cloud-based IT services was already a good step towards making advanced Data Science a mainstream activity, and now, with Cloud and packaged algorithms, mid-sized and smaller businesses will have access to Self-Service BI and Analytics, which was only a dream until now. Mainstream business users will also gradually take an active role in data-centric business systems. Machine Learning Trends – Future AI claims that more enterprises in 2017 will capitalize on the Machine Learning Cloud and do their part to lobby for democratized data technologies.

Platform Wars will Peak in 2017

The platform war between IBM, Microsoft, Google, and Facebook to be the leader in ML developments will peak in 2017. Where Machine Learning Is Headed predicts that 2017 will experience tremendous growth in smart apps, digital assistants and mainstream use of Artificial Intelligence. Although many ML-enabled AI systems have turned into success stories, self-driving cars may die a premature death.

Humans will Make Peace with Machines

Since 2012 the global business community has witnessed a meteoric rise and widespread proliferation of data technologies. Finally, humans will realize that it is time to stop fearing the machines and begin working with them. The InfoWorld article titled Application Development, Docker, Machine Learning Are Top Tech Trends for 2017 asserts that humans and machines will work with each other, not against each other. In this context, readers should review the DATAVERSITY® article The Future of Machine Learning: Trends, Observations, and Forecasts, which reminds readers that as businesses develop a strong dependence on pre-built ML algorithms for Advanced Analytics, the need for Data Scientists or large IT departments may diminish.

Demand-Supply Gaps in Data Science and Machine Learning will Rise

The business world is steadily heading toward the prophetic 2018, when according to McKinsey the first void in data technology expertise will be felt in the US and then gradually in the rest of the world. The demand-supply gap in Data Science and Machine Learning skills will continue to rise till academic programs and industry workshops begin to produce a ready workforce. In response to this sharp rise in the demand-supply gap, more enterprises and academic institutions will collaborate to train future Data Scientists and ML experts. This kind of training will compete with the traditional Data Science classroom and will focus more on practical skills rather than on theoretical knowledge. 

The Algorithm Economy will take Centre Stage

Over the next year or two, businesses will be using canned algorithms for all data-centric activities like BI, Predictive Analytics, and CRM. The algorithm economy, which Forbes mentions, will usher in a marketplace where all data companies will compete for space. In 2017, global businesses will engage in Self-Service BI, and experience the growth of algorithmic business solutions, and ML in the Cloud. So far as algorithm-driven business decision making is concerned, 2017 may actually see two distinct types of algorithm economies. On one hand, average businesses will utilize canned algorithmic models for their operational and customer-facing functions. On the other hand, proprietary ML algorithms will become a market differentiator among large, competing enterprises.

Some Thoughts to Ponder

If the threat of intelligent machines taking over Data Scientists is really as real as it is made out to be, then 2017 is probably the year when the global Data Science community should take a new look at the capabilities of so-called “smart machines.” The repeated failure of autonomous cars has made one point clear – that even learning machines cannot surpass the natural thinking faculties bestowed by nature on human beings. If autonomous or self-guided machines have to be useful to human society, then the current Artificial Intelligence and Machine Learning research should focus on acknowledging the limits of machine power and assign tasks that are suitable for the machines and include more human interventions at necessary checkpoints to avert disasters. Repetitive, routine tasks can be well handled by machines, but any out-of-the-ordinary situations will still require human intervention.

To know more about High-End professional training on ML, AI, IoT, Big Data, Cloud, Analytics, Data Science and more, feel free to drop a line at: [email protected]

This article originally appeared here.


What is Deep Learning?

Remember how, from childhood, you started recognizing fruits, animals, cars and for that matter any other object just by looking at them?
Our brain gets trained over the years to recognize these images and then further classify them as apple, orange, banana, cat, dog, horse, Toyota, Honda, BMW and so on.
Inspired by these biological processes of the human brain, artificial neural networks (ANNs) were developed. Deep learning refers to artificial neural networks that are composed of many layers. It is the fastest-growing field in machine learning. It uses many-layered Deep Neural Networks (DNNs) to learn levels of representation and abstraction that make sense of data such as images, sound, and text.
Why is ‘Deep Learning’ called deep? It is because of the structure of ANNs. Forty years ago, neural networks were only 2 layers deep, as it was not computationally feasible to build larger networks. Now it is common to have neural networks with 10+ layers, and even ANNs with 100+ layers are being tried.
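To make "depth" concrete, the sketch below stacks several fully connected layers in plain Python and runs one forward pass. It is only an illustration: the weights are random rather than trained, and the layer sizes and helper names (`make_layer`, `forward`) are assumptions chosen for the example.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """One fully connected layer: a weight matrix and a bias vector."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(layers, inputs):
    """Pass the inputs through every layer in turn; each layer's output
    becomes the next layer's input."""
    activations = inputs
    for weights, biases in layers:
        activations = [
            math.tanh(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations

rng = random.Random(0)
sizes = [4, 8, 8, 8, 8, 3]   # 4 inputs, four hidden layers of 8 nodes, 3 outputs
network = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
output = forward(network, [0.5, -0.2, 0.1, 0.9])
print(len(network), "layers ->", output)
```

Making the network deeper is just a matter of adding entries to `sizes`; what made this impractical decades ago was the cost of training all those weights, which this sketch does not attempt.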
Using multiple levels of neural networks in Deep Learning, computers now have the capacity to see, learn, and react to complex situations, in some tasks as well as or better than humans.
Normally data scientists spend a lot of time on data preparation: feature extraction, or selecting the variables that are actually useful for predictive analytics. Deep learning does much of this job automatically and makes life easier.
Many technology companies have open-sourced their deep learning libraries:
  • Google’s TensorFlow
  • Facebook’s open source modules for Torch
  • Amazon’s DSSTNE, released on GitHub
  • Microsoft’s CNTK, its open source deep learning toolkit, released on GitHub

Today we see a lot of examples of Deep Learning around us:

  • Google Translate uses deep learning and image recognition to translate not only speech but written language as well.
  • With the CamFind app, simply take a picture of any object and it uses mobile visual search technology to tell you what it is. It provides fast, accurate results with no typing necessary. Snap a picture, learn more. That’s it.
  • Digital assistants like Siri, Cortana, Alexa and Google Now use deep learning for natural language processing and speech recognition.
  • Amazon, Netflix and Spotify use deep learning in their recommendation engines for next best offers, movies and music.
  • Google PlaNet can look at a photo and tell where it was taken.
  • DCGAN is used for enhancing and completing images of human faces.
  • DeepStereo turns images from Street View into a 3D space that shows unseen views from different angles by figuring out the depth and color of each pixel.
  • DeepMind’s WaveNet can generate speech that mimics a human voice and sounds more natural than the best existing text-to-speech systems.
  • PayPal uses H2O-based deep learning to prevent payment fraud.
Till now, Deep Learning has aided image classification, language translation and speech recognition. It can be applied to many pattern recognition problems, and much of it happens without human intervention.
Deep learning is a disruptive Digital technology that is being used by more and more companies to create new business models.

Using Data Science for Predictive Maintenance

Remember that a few years ago there were two recall announcements from the National Highway Traffic Safety Administration, for GM and Tesla, both related to problems that could cause fires. These cost a great deal of money to resolve.
Aerospace companies, the rail industry, equipment manufacturers and automakers often face the challenge of ensuring maximum availability of critical assembly line systems and keeping those assets in good working order, while simultaneously minimizing the cost of maintenance and of time-based or count-based repairs.
Identification of the root causes of faults and failures must also happen without the need for a lab or testing. As more vehicles, industrial equipment and assembly robots begin to communicate their current status to a central server, detection of faults becomes easier and more practical.
Early identification of these potential issues helps organizations deploy maintenance teams more cost-effectively and maximize parts and equipment uptime. The critical factors that help to predict failure may be deeply buried in structured data such as equipment year, make, model and warranty details, and in unstructured data covering millions of log entries, sensor data, error messages, odometer readings, speed, engine temperature, engine torque, acceleration, and repair and maintenance reports.
Predictive maintenance, a technique to predict when an in-service machine will fail so that maintenance can be planned in advance, encompasses failure prediction, failure diagnosis, failure type classification, and recommendation of maintenance actions after failure.
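As a rough illustration of the simplest piece of this, early detection of abnormal behavior, the sketch below flags sensor readings that deviate sharply from the recent baseline. It is not a full failure-prediction model: the rolling z-score rule, the window and threshold values, the function name `flag_anomalies` and the temperature log are all assumptions made for the example.

```python
import statistics

def flag_anomalies(readings, window=5, threshold=3.0):
    """Return the indices where a reading deviates from the mean of the
    previous `window` readings by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mean = statistics.mean(history)
        spread = statistics.stdev(history)
        if spread > 0 and abs(readings[i] - mean) > threshold * spread:
            flagged.append(i)
    return flagged

# Hypothetical engine temperature log (degrees C); the spike is invented.
engine_temp = [90.1, 90.4, 89.8, 90.2, 90.0, 90.3, 89.9, 97.5, 90.1, 90.2]
print(flag_anomalies(engine_temp))
```

In practice a flagged reading would be combined with the structured data mentioned above (make, model, warranty details, repair history) before scheduling a maintenance visit.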
Business benefits of Data Science with predictive maintenance:
  • Minimize maintenance costs - Don’t waste money on over-cautious, time-bound maintenance. Repair equipment only when repairs are actually needed.
  • Reduce unplanned downtime - Predict future equipment malfunctions and failures, and minimize the risk of unplanned outages that put your business at risk.
  • Root cause analysis - Find the causes of equipment malfunctions and work with suppliers to eliminate the reasons for high failure rates. Increase the return on your assets.
  • Efficient labor planning - No time is wasted replacing or fixing equipment that doesn’t need it.
  • Avoid warranty costs for failure recovery - For automakers this means fewer recalls, and on the assembly line it means less production loss.

Trenitalia has invested 50M euros in an Internet of Things project that is expected to cut maintenance costs by up to 130M euros and to increase train availability and customer satisfaction.

Rolls-Royce is teaming up with Microsoft on Azure cloud-based streaming analytics for predicting engine failures and ensuring the right maintenance.
Sudden machine failures can ruin the reputation of a business, resulting in potential contract penalties and lost revenue. Data Science can help, both in real time and ahead of time, to avoid all this trouble.

Why Data Science Is The Top Job In Digital Transformation

Digital Transformation has become a burning question for all businesses, and the foundation for riding the wave is being data-driven.
DJ Patil and Thomas Davenport wrote in their 2012 HBR article that Data Scientist is the sexiest job of the 21st century, and how true! Even the latest Glassdoor ranking puts Data Scientist 1st among the top 25 best jobs in America.
Over the last decade there has been a massive explosion in the data both generated and retained by companies. Uber, Airbnb, Netflix, Walmart, Amazon, LinkedIn and Twitter all process tons of data every minute and use it for revenue growth, cost reduction and increased customer satisfaction.
Most industries, such as Retail, Banking, Travel, Financial Services, Healthcare and Manufacturing, want to be able to make better decisions. With the speed of change and the profitability pressures on businesses, the time available to make decisions has shrunk to real time. Data has become an asset for every company, so they need someone who can comb through these data sets, apply logic and tools to find patterns, and provide insights for the future.
Think about Facebook, Twitter and other social media platforms, smartphone apps, in-store purchase behavior data, online website analytics, and now all the connected devices of the Internet of Things: together they are generating a tsunami of new data streams.
All this data is useless if not analyzed for actions or new insights.
The importance of Data Scientists has risen to the top due to two key issues:
  • Increased need & desire among businesses to gain greater value from their data
  • Over 80% of data/information that businesses generate and collect is unstructured or semi-structured data that need special treatment 

Data Scientists:

  • Typically need a mix of skills: mathematics, statistics, computer science, machine learning and, most importantly, business knowledge
  • Use a programming language such as R or Python to clean the data and remove what is irrelevant
  • Create algorithms to solve business problems
  • Finally, communicate the findings effectively to management
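The data cleaning step in the list above might look like the following minimal Python sketch. The field names (`customer_id`, `amount`), the sample rows and the rule that two rows with the same customer and amount count as duplicates are all hypothetical simplifications.

```python
def clean_records(records, required=("customer_id", "amount")):
    """Drop rows with missing required fields or non-numeric amounts,
    normalize types, and remove duplicates while keeping the original order."""
    seen = set()
    cleaned = []
    for row in records:
        if any(row.get(field) in (None, "") for field in required):
            continue                      # incomplete row
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            continue                      # amount is not numeric
        key = (str(row["customer_id"]).strip(), amount)
        if key in seen:
            continue                      # duplicate row
        seen.add(key)
        cleaned.append({"customer_id": key[0], "amount": amount})
    return cleaned

raw = [
    {"customer_id": "C1", "amount": "19.99"},
    {"customer_id": "C1", "amount": "19.99"},   # duplicate
    {"customer_id": "", "amount": "5.00"},      # missing id
    {"customer_id": "C2", "amount": "n/a"},     # bad amount
    {"customer_id": " C3 ", "amount": 42},      # needs trimming and type fix
]
print(clean_records(raw))
```

Real projects would more likely use a data frame library for this, but the logic is the same: decide what counts as invalid or irrelevant, then filter it out before any modeling starts.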

Any company, in any industry, that crunches large volumes of numbers, possesses lots of operational and customer data, or can benefit from social media streams, credit data, consumer research or third-party data sets can benefit from having a data scientist or a data science team.

Top data scientists in the world today are:
  • Kirk D. Borne of Booz Allen
  • DJ Patil, Chief Data Scientist at the White House
  • Gregory Piatetsky of KDnuggets
  • Vincent Granville of Analyticsbridge
  • Jonathan Goldman of LinkedIn
  • Ronald Van Loon

Data science will involve all aspects of statistics, machine learning, artificial intelligence, deep learning and cognitive computing, with the addition of storage from big data.


What is Cognitive Computing?

Although computers are better at data processing and making calculations, until now they were not able to accomplish some of the most basic human tasks, like recognizing an apple or an orange in a basket of fruit.

Computers can capture, move, and store the data, but they cannot understand what the data mean. Thanks to Cognitive Computing, machines are bringing human-like intelligence to a number of business applications.
Cognitive Computing is a term that IBM coined for machines that can interact and think like humans.
In today's Digital Transformation age, various technological advancements have given machines a greater ability to understand information, to learn, to reason, and act upon it. 
Today, IBM Watson and Google DeepMind are leading the cognitive computing space.
Cognitive Computing systems may include the following components:
  • Natural Language Processing - understanding meaning and context in language, allowing a deeper, more intuitive level of discovery and even interaction with information
  • Machine Learning with Neural Networks - algorithms that help train the system to recognize images and understand speech
  • Algorithms that learn and adapt with Artificial Intelligence
  • Deep Learning - to recognize patterns
  • Image recognition - like humans, but faster
  • Reasoning and decision automation - based on limitless data
  • Emotional Intelligence
Cognitive computing can help banking and insurance companies to identify risks and frauds. It analyses information to predict weather patterns. In healthcare it is helping doctors to treat patients based on historical data.
Some of the recent examples of Cognitive Computing:
  • ANZ Bank of Australia used Watson-based financial services apps to offer investment advice by reading through thousands of investment options and suggesting the best fit based on customer-specific profiles, taking into consideration age, life stage, financial position and risk tolerance.
  • Geico is using Watson-based cognitive computing to learn the underwriting guidelines, read the risk submissions, and effectively help underwrite.
  • Brazilian bank Banco Bradesco is using cognitive assistants at work, helping build more intimate, personalized relationships.
  • Among personal digital assistants we have Siri, Google Now and Cortana. I feel Google Now is much easier to use and adapts quickly to your spoken language. There is a voice command for just about everything you need to do: texting, emailing, and searching for directions, weather and news. Speak it; don’t text it!
Just as Big Data gives the ability to store huge amounts of data and Analytics gives the ability to predict what is going to happen, Cognitive gives the ability to learn from further interactions and suggest the best actions.
