

Do you want to hire a Data Scientist?

As Tom Davenport noted a few years back, Data Scientist is still the hottest job of the century.
Data scientists are those elite people who solve business problems by analyzing tons of data and communicate the results in a very compelling way to senior leadership and persuade them to take action.
They have the critical responsibility to understand the data and help business get more knowledgeable about their customers.
The importance of Data Scientists has risen to the top due to two key issues:
·     Increased need & desire among businesses to gain greater value from their data to be competitive
·     Over 80% of data/information that businesses generate and collect is unstructured or semi-structured data that need special treatment
So it is extremely important to hire the right person for the job. Requirements for being a data scientist are pretty rigorous, and truly qualified candidates are few and far between.
Data scientists are in very high demand, hard to attract and come at a very high cost, so a wrong hire is all the more frustrating.
Here are some guidelines for evaluating candidates:
·     Check the logical reasoning ability
·     Problem solving skills
·     Ability to collaborate & communicate with business folks
·     Practical experience with Big Data tools
·     Statistical and machine learning experience
·     Should be able to describe their projects very clearly where they have solved business problems
·     Should be able to tell story from the data
·     Should know the latest in cognitive computing and deep learning
I have seen some of the smartest data scientists in my career do the best work but fail to communicate the results to senior leaders effectively. Ideally they should know the data in depth and be able to explain its significance properly. Data visualization comes in very handy at this stage.
Today, with digital disrupting every field, data science is affected as well.
Gartner has called this new breed citizen data scientists. Their primary job function is outside analytics; they don’t know much about statistics but can work with ready-to-use algorithms available through APIs and well-known tools like Watson, TensorFlow and Azure.
Good data scientists can make use of them to spread awareness and expand their influence.
It has become all the more important to hire the right data scientist, as the results they deliver may make or break the company.
Read more…

A to Z of Analytics

Analytics has taken the world by storm and is the powerhouse for all the digital transformation happening in every industry.

Today everybody is generating tons of data: as consumers we leave digital footprints on social media, IoT sensors generate millions of records, and mobile phones are in use from morning till we sleep. All these varied data formats are stored in Big Data platforms. But merely storing this data will not take us anywhere unless analytics is applied to it. Hence it is extremely important to close the loop with analytics insights.
Here is my version of A to Z for Analytics:
Artificial Intelligence: AI is the capability of a machine to imitate intelligent human behavior. BMW, Tesla, Google are using AI for self-driving cars. AI should be used to solve real world tough problems like climate modeling to disease analysis and betterment of humanity.
Boosting and Bagging: techniques for generating more accurate models by ensembling multiple models together.
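As a rough illustration, bagging can be sketched in a few lines of plain Python. The toy 1-D dataset and the mean-threshold stump used as the weak learner below are both made up for the example:

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    # Weak learner: threshold at the mean of the (resampled) points,
    # predicting the majority class on each side of the threshold.
    t = mean(xs)
    right = [y for x, y in zip(xs, ys) if x >= t]
    left = [y for x, y in zip(xs, ys) if x < t]
    r = max(set(right), key=right.count) if right else 0
    l = max(set(left), key=left.count) if left else 0
    return lambda x: r if x >= t else l

def bagging_predict(xs, ys, x_new, n_models=25, seed=0):
    random.seed(seed)
    votes = []
    for _ in range(n_models):
        # Bootstrap: resample the training data with replacement.
        idx = [random.randrange(len(xs)) for _ in range(len(xs))]
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        votes.append(stump(x_new))
    # Aggregate: majority vote across all resampled models.
    return max(set(votes), key=votes.count)

xs = [1.0, 1.2, 0.8, 3.9, 4.1, 4.3]
ys = [0, 0, 0, 1, 1, 1]
print(bagging_predict(xs, ys, 4.0), bagging_predict(xs, ys, 1.0))
```

Each stump is weak on its own, but the majority vote over many bootstrap resamples is more stable; boosting differs in that it trains models sequentially, each focusing on the previous model's mistakes.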
CRISP-DM: the Cross-Industry Standard Process for Data Mining. It was developed in 1997 by a consortium of companies – SPSS, Teradata, Daimler and NCR Corporation – to bring order to the development of analytics models. Its six major steps are business understanding, data understanding, data preparation, modeling, evaluation and deployment.
Data preparation: in analytics deployments more than 60% of the time is spent on data preparation. The normal rule is garbage in, garbage out, so it is important to cleanse and normalize the data and make it available for consumption by the model.
Ensembling: the technique of combining two or more algorithms to get more robust predictions. It is like combining all the marks we obtain in exams to arrive at a final overall score. Random Forest, which combines multiple decision trees, is one such example.
Feature selection: simply put, this means keeping only those features or variables in the data which really make sense and removing the non-relevant ones. This improves model accuracy.
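A very simple filter-style sketch of the idea, assuming a made-up table and a variance threshold: a column that barely varies carries no signal and can be dropped:

```python
from statistics import pvariance

def select_features(rows, names, min_var=0.01):
    # Keep only columns whose variance exceeds min_var (a simple filter method).
    cols = list(zip(*rows))
    keep = [i for i, col in enumerate(cols) if pvariance(col) > min_var]
    return [names[i] for i in keep]

rows = [
    [1.0, 5.0, 0.0],
    [2.0, 5.0, 0.0],
    [3.0, 5.0, 0.0],
]
print(select_features(rows, ["age", "constant_flag", "unused"]))  # ['age']
```

Real pipelines also use correlation with the target, model-based importances, or stepwise methods; the variance filter is just the cheapest first pass.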
Gini Coefficient: used to measure the predictive power of a model, typically in credit scoring tools, to find out who will repay and who will default on a loan.
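One common way to compute it is via the relation Gini = 2 * AUC - 1; a small sketch with made-up risk scores and default labels:

```python
def gini(scores, labels):
    # AUC here is the probability that a randomly chosen defaulter (label 1)
    # gets a higher risk score than a randomly chosen non-defaulter (label 0).
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    auc = wins / (len(pos) * len(neg))
    return 2 * auc - 1  # Gini = 2*AUC - 1

scores = [0.9, 0.8, 0.7, 0.3, 0.2]  # made-up model scores
labels = [1, 1, 0, 1, 0]            # 1 = defaulted, 0 = repaid
print(gini(scores, labels))
```

A Gini of 0 means the model ranks no better than chance; 1 means perfect separation of defaulters from repayers.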
Histogram: This is a graphical representation of the distribution of a set of numeric data, usually a vertical bar graph used for exploratory analytics and data preparation step.
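For quick exploration, even a text histogram is often enough; a sketch using only the standard library, with made-up customer ages bucketed into 10-year bins:

```python
from collections import Counter

ages = [23, 25, 31, 35, 36, 38, 41, 44, 47, 52, 55, 61]  # made-up data
bins = Counter((a // 10) * 10 for a in ages)  # 10-year buckets

for lo in sorted(bins):
    print(f"{lo}-{lo + 9}: {'#' * bins[lo]}")
```

The shape (here, a bulge in the 30s) is what analysts look for: skew, outliers and gaps all show up before any modeling starts.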
Independent Variable: the variable that is changed or controlled in a scientific experiment to test its effect on the dependent variable, like the effect of a price increase on sales.
Jubatus: an online machine learning library covering classification, regression, recommendation (nearest neighbor search), graph mining, anomaly detection and clustering.
KNN: the k-nearest-neighbors algorithm, used in machine learning for classification problems, based on distance or similarity between data points.
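A minimal sketch of KNN classification in plain Python, using toy 2-D points, Euclidean distance and a majority vote among the k closest neighbors:

```python
from collections import Counter
import math

def knn_predict(train, labels, query, k=3):
    # Classify `query` by majority vote among its k nearest training points.
    dists = sorted((math.dist(p, query), y) for p, y in zip(train, labels))
    top = [y for _, y in dists[:k]]
    return Counter(top).most_common(1)[0][0]

train = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]  # toy data
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train, labels, (2, 2)))  # 'A'
print(knn_predict(train, labels, (8, 7)))  # 'B'
```

KNN needs no training phase at all: the "model" is the data itself, which is why feature scaling and the choice of k matter so much in practice.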
Lift Chart: widely used in campaign targeting problems to determine which deciles of customers to target for a specific campaign. It also tells you how much response to expect from the new target base.
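A sketch of how decile lift might be computed, using made-up scores and responses; lift is each decile's response rate divided by the overall rate:

```python
def decile_lift(scores, responses, n_bins=10):
    # Sort customers by model score, highest first.
    ranked = sorted(zip(scores, responses), key=lambda t: -t[0])
    overall = sum(responses) / len(responses)
    size = len(ranked) // n_bins
    lifts = []
    for d in range(n_bins):
        chunk = ranked[d * size:(d + 1) * size]
        rate = sum(r for _, r in chunk) / len(chunk)
        lifts.append(rate / overall)  # lift = decile rate / overall rate
    return lifts

# 20 toy customers: scores descending, all 4 responders near the top.
scores = [s / 20 for s in range(20, 0, -1)]
responses = [1, 1, 1, 1] + [0] * 16
lifts = decile_lift(scores, responses)
print(lifts[0])  # top decile responds at 5x the average rate
```

A lift of 5 in the top decile means targeting just those customers yields five times the response of a random mailing of the same size.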
Model: there are more than 50 modeling techniques – regressions, decision trees, SVMs, GLMs, neural networks etc. – available in technology platforms like SAS Enterprise Miner, IBM SPSS or R. They are broadly categorized under supervised and unsupervised methods into classification, clustering and association rules.
Neural Networks: typically organized in layers made up of nodes, these mimic the way the brain learns. Today Deep Learning is an emerging field based on deep neural networks.
Optimization: the use of simulation techniques to identify the scenarios that produce the best results within available constraints, e.g. sale price optimization, or identifying the optimal inventory for maximum fulfillment while avoiding stock-outs.
PMML: Predictive Model Markup Language, an XML-based file format developed by the Data Mining Group to transfer models between technology platforms.
Quartile: dividing the sorted output of a model into four groups for further action.
R: today every university and even corporations are using R for statistical model building. It is freely available, and there are licensed versions like Microsoft R. More than 7,000 packages are now at the disposal of data scientists.
Sentiment Analytics: the process of determining whether information or a service provided by a business leads to positive, negative or neutral human feelings or opinions. Consumer product companies measure sentiment 24/7 and adjust their marketing strategies accordingly.
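The simplest possible version is a word-list scorer; the tiny lexicons here are made up for illustration, and real systems use trained models rather than fixed lists:

```python
POSITIVE = {"great", "love", "excellent", "fast", "happy"}
NEGATIVE = {"slow", "broken", "terrible", "hate", "angry"}

def sentiment(text):
    # Score = count of positive words minus count of negative words.
    words = text.lower().replace(",", " ").replace(".", " ").split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("I love this phone, great battery"))              # positive
print(sentiment("Delivery was slow and the box arrived broken"))  # negative
```

Lexicon methods break down on negation and sarcasm ("not great at all"), which is exactly where machine-learned sentiment models earn their keep.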
Text Analytics: used to discover and extract meaningful patterns and relationships from text collections from social media sites such as Facebook, Twitter and LinkedIn, from blogs, and from call center scripts.
Unsupervised Learning: algorithms that receive only input data and are expected to find patterns in it. Clustering and association algorithms like k-means and apriori are the best examples.
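A bare-bones k-means sketch in plain Python on toy 2-D points; real work would use a library implementation with smarter initialization:

```python
import math
import random

def kmeans(points, k=2, iters=20, seed=1):
    # Plain k-means: assign each point to its nearest centroid,
    # then recompute each centroid as the mean of its cluster.
    random.seed(seed)
    centroids = random.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[i].append(p)
        centroids = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
cents, clusters = kmeans(points, k=2)
print(clusters)
```

No labels are given anywhere: the algorithm discovers the two groups purely from distances, which is what makes it unsupervised.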
Visualization: the method of enhanced exploratory data analysis and of presenting modeling results with highly interactive statistical graphics. Any model output has to be presented to senior management in the most compelling way. Tableau, QlikView and Spotfire are leading visualization tools.
What-If Analysis: a method to simulate business scenario questions such as: what if we increased our marketing budget by 20% – what would be the impact on sales? Monte Carlo simulation is very popular here.
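A toy Monte Carlo what-if in plain Python; the base sales, uplift-per-budget and noise figures are all invented for illustration:

```python
import random

def simulate_sales(budget_m, n_trials=10_000, seed=42):
    # Toy model: sales = base + sensitivity * budget + noise (made-up numbers).
    random.seed(seed)
    total = 0.0
    for _ in range(n_trials):
        sensitivity = random.gauss(1.5, 0.3)  # uncertain uplift per unit budget
        noise = random.gauss(0, 2.0)          # everything we can't control
        total += 100 + sensitivity * budget_m + noise
    return total / n_trials

base = simulate_sales(10)
boosted = simulate_sales(12)  # "what if we raise the budget by 20%?"
print(round(boosted - base, 1))
```

Because the uplift itself is uncertain, the simulation answers with a distribution of outcomes rather than a single number; here we just report the mean difference.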
What do you think should come for X, Y and Z?
Read more…

What is Deep Learning ?

Remember how you started recognizing fruits, animals, cars and, for that matter, any other object by looking at them in your childhood?
Our brain gets trained over the years to recognize these images and then further classify them as apple, orange, banana, cat, dog, horse, Toyota, Honda, BMW and so on.
Inspired by these biological processes of the human brain, artificial neural networks (ANNs) were developed. Deep learning refers to artificial neural networks that are composed of many layers. It is the fastest-growing field in machine learning. It uses many-layered Deep Neural Networks (DNNs) to learn levels of representation and abstraction that make sense of data such as images, sound, and text.
Why is ‘Deep Learning’ called deep? Because of the structure of ANNs. Forty years ago, neural networks were only two layers deep, as it was not computationally feasible to build larger networks. Now it is common to have neural networks with 10+ layers, and even 100+ layer ANNs are being tried.
Using multiple levels of neural networks in Deep Learning, computers now have the capacity to see, learn, and react to complex situations as well or better than humans.
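To make "layers" concrete, here is a two-layer network in plain Python with hand-picked weights that computes XOR, the classic function a single layer cannot represent; real networks learn such weights from data instead of having them written by hand:

```python
def relu(v):
    # Activation function: pass positives through, clamp negatives to zero.
    return [max(0.0, x) for x in v]

def dense(x, W, b):
    # One fully connected layer: out_j = sum_i x_i * W[i][j] + b[j].
    return [sum(xi * wij for xi, wij in zip(x, col)) + bj
            for col, bj in zip(zip(*W), b)]

# Hand-picked weights for XOR: h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1),
# output = h1 - 2 * h2.
W1 = [[1.0, 1.0], [1.0, 1.0]]; b1 = [0.0, -1.0]
W2 = [[1.0], [-2.0]];          b2 = [0.0]

def xor_net(x):
    h = relu(dense(x, W1, b1))  # hidden layer
    return dense(h, W2, b2)[0]  # output layer

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, xor_net(list(x)))  # 0.0, 1.0, 1.0, 0.0
```

Stacking more such layers is all "deep" means structurally; the hard part, solved by backpropagation and modern hardware, is learning the weights.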
Normally data scientists spend a lot of time on data preparation – feature extraction, or selecting the variables that are actually useful for prediction. Deep learning does this job automatically and makes life easier.
Many technology companies have made their deep learning libraries as open source:
  • Google’s TensorFlow
  • Facebook open source modules for Torch
  • Amazon released DSSTNE on GitHub
  • Microsoft released CNTK, its open source deep learning toolkit, on GitHub

Today we see lot of examples of Deep learning around:

  • Google Translate is using deep learning and image recognition to translate not only voice but written languages as well. 
  • With CamFind app, simply take a picture of any object and it uses mobile visual search technology to tell you what it is. It provides fast, accurate results with no typing necessary. Snap a picture, learn more. That’s it.
  • All digital assistants like Siri, Cortana, Alexa & Google Now are using deep learning for natural language processing and speech recognition
  • Amazon, Netflix & Spotify are using recommendation engines using deep learning for next best offer, movies and music
  • Google PlaNet can look at the photo and tell where it was taken
  • DCGAN is used for enhancing and completing human faces
  • DeepStereo: Turns images from Street View into a 3D space that shows unseen views from different angles by figuring out the depth and color of each pixel
  • DeepMind’s WaveNet can generate speech mimicking any human voice, sounding more natural than the best existing text-to-speech systems
  • PayPal is using H2O-based deep learning to prevent payment fraud
Till now, Deep Learning has aided image classification, language translation and speech recognition; it can be used to solve any pattern recognition problem, and all of it happens without human intervention.
Deep learning is a disruptive Digital technology that is being used by more and more companies to create new business models.
Read more…

Using Data Science for Predictive Maintenance

Remember, a few years ago there were two recall announcements from the National Highway Traffic Safety Administration for GM and Tesla – both related to problems that could cause fires. These took a ton of money to resolve.
Aerospace, Rail industry, Equipment manufacturers and Auto makers often face this challenge of ensuring maximum availability of critical assembly line systems, keeping those assets in good working order, while simultaneously minimizing the cost of maintenance and time based or count based repairs.
Identification of the root causes of faults and failures must also happen without the need for a lab or testing. As more vehicles, industrial equipment and assembly robots communicate their current status to a central server, detection of faults becomes easier and more practical.
Early identification of these potential issues helps organizations deploy maintenance teams more cost-effectively and maximize parts/equipment up-time. All the critical factors that help predict failure may be deeply buried in structured data – equipment year, make, model, warranty details etc. – and in unstructured data covering millions of log entries, sensor data, error messages, odometer readings, speed, engine temperature, engine torque, acceleration and repair & maintenance reports.
Predictive maintenance, a technique to predict when an in-service machine will fail so that maintenance can be planned in advance, encompasses failure prediction, failure diagnosis, failure type classification, and recommendation of maintenance actions after failure.
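As a hypothetical sketch of the failure-prediction piece, consider a logistic score over a few sensor readings; the coefficients below are invented for illustration, where a real model would learn them from historical sensor and repair data:

```python
import math

def failure_risk(temp_c, vibration_mm_s, hours_since_service):
    # Toy logistic model: risk rises with temperature, vibration and
    # time since the last service. All coefficients are made up.
    z = (-8.0
         + 0.05 * temp_c
         + 0.8 * vibration_mm_s
         + 0.002 * hours_since_service)
    return 1 / (1 + math.exp(-z))  # squash to a 0..1 probability

healthy = failure_risk(60, 2.0, 200)   # cool, smooth, recently serviced
worn = failure_risk(95, 7.5, 4000)     # hot, vibrating, overdue
print(round(healthy, 3), round(worn, 3))
```

A maintenance planner would then schedule work on the machines whose risk crosses a chosen threshold, rather than on a fixed calendar.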
Business benefits of Data Science with predictive maintenance:
  • Minimize maintenance costs - Don’t waste money through over-cautious time bound maintenance. Only repair equipment when repairs are actually needed.
  • Reduce unplanned downtime - Implement predictive maintenance to predict future equipment malfunctioning and failures and minimize the risk for unplanned disasters putting your business at risk.
  • Root cause analysis - Find causes for equipment malfunctions and work with suppliers to switch-off reasons for high failure rates. Increase return on your assets.
  • Efficient labor planning — no time wasted replacing/fixing equipment that doesn’t need it
  • Avoid warranty costs for failure recovery – thousands of recalls in the case of automakers, and production loss on the assembly line

Trenitalia has invested 50M euros in an Internet of Things project that it expects to cut maintenance costs by up to 130M euros while increasing train availability and customer satisfaction.

Rolls Royce is teaming up with Microsoft for Azure cloud based streaming analytics for predicting engine failures and ensuring right maintenance.
Sudden machine failures can ruin the reputation of a business, resulting in potential contract penalties and lost revenue. Data science can help in real time – and ahead of time – to save all this trouble.
Read more…

Why Data Science Is The Top Job In Digital Transformation

Digital Transformation has become a burning question for all businesses, and the foundation for riding the wave is being data driven.
DJ Patil and Thomas Davenport wrote in a 2012 HBR article that Data Scientist is the sexiest job of the century, and how true! Even the latest Glassdoor ranking placed Data Scientist 1st among the top 25 best jobs in America.
Over the last decade there has been a massive explosion in the data generated and retained by companies. Uber, Airbnb, Netflix, Walmart, Amazon, LinkedIn and Twitter all process tons of data every minute and use it for revenue growth, cost reduction and increased customer satisfaction.
Most industries, such as Retail, Banking, Travel, Financial Services, Healthcare and Manufacturing, want to be able to make better decisions. With the speed of change and profitability pressures on businesses, the window for taking decisions has shrunk to real time. Data has become an asset for every company, hence they need someone who can comb through these data sets, apply logic, use tools to find patterns and provide insights for the future.
Think about Facebook, Twitter and other social media platforms, smartphone apps, in-store purchase behavior data, online website analytics, and now all the connected devices of the Internet of Things: they are generating a tsunami of new data streams.
All this data is useless if not analyzed for actions or new insights.
The importance of Data Scientists has risen to the top due to two key issues:
  • Increased need & desire among businesses to gain greater value from their data
  • Over 80% of data/information that businesses generate and collect is unstructured or semi-structured data that need special treatment 

Data Scientists:

  • Typically need a mix of skills: mathematics, statistics, computer science, machine learning and, most importantly, business knowledge
  • Employ programming languages like R or Python to clean data and remove irrelevant records
  • Create algorithms to solve business problems
  • Finally, communicate the findings to management effectively

Any company, in any industry, that crunches large volumes of numbers, possesses lots of operational and customer data, or can benefit from social media streams, credit data, consumer research or third-party data sets can benefit from having a data scientist or a data science team.

Top data scientists in the world today are:
  • Kirk D Borne of BoozAllen
  • D J Patil Chief Data Scientist at White House
  • Gregory Piatetsky of kdnuggets
  • Vincent Granville of Analyticsbridge
  • Jonathan Goldman of LinkedIn
  • Ronald Van Loon

Data science involves all aspects of statistics, machine learning, artificial intelligence, deep learning and cognitive computing, with the addition of storage from big data.

Read more…

What is Cognitive Computing?

Although computers are better at data processing and making calculations, until now they were not able to accomplish some of the most basic human tasks, like recognizing an apple or an orange in a basket of fruits.

Computers can capture, move, and store the data, but they cannot understand what the data mean. Thanks to Cognitive Computing, machines are bringing human-like intelligence to a number of business applications.
Cognitive Computing is a term IBM coined for machines that can interact and think like humans.
In today's Digital Transformation age, various technological advancements have given machines a greater ability to understand information, to learn, to reason, and act upon it. 
Today, IBM Watson and Google DeepMind are leading the cognitive computing space.
Cognitive Computing systems may include the following components:
·      Natural Language Processing - understand meaning and context in a language, allowing deeper, more intuitive level of discovery and even interaction with information.
·    Machine Learning with Neural Networks - algorithms that help train the system to recognize images and understand speech
·    Algorithms that learn and adapt with Artificial Intelligence
·    Deep Learning – to recognize patterns
·    Image recognition – like humans, but much faster
·    Reasoning and decision automation – based on limitless data
·    Emotional Intelligence
Cognitive computing can help banking and insurance companies to identify risks and frauds. It analyses information to predict weather patterns. In healthcare it is helping doctors to treat patients based on historical data.
Some of the recent examples of Cognitive Computing:
·   ANZ bank of Australia used Watson-based financial services apps to offer investment advice, reading through thousands of investment options and suggesting best fits for customer-specific profiles, taking into consideration their age, life stage, financial position, and risk tolerance.
·   Geico is using Watson based cognitive computing to learn the underwriting guidelines, read the risk submissions, and effectively help underwrite
·   Brazilian bank Banco Bradesco is using Cognitive assistants at work helping build more intimate, personalized relationships
·   Of the personal digital assistants – Siri, Google Now and Cortana – I feel Google Now adapts most easily and quickly to your spoken language. There is a voice command for just about everything you need to do: texting, emailing, searching for directions, weather, and news. Speak it; don’t text it!
As Big Data gives the ability to store huge amounts of data, and Analytics gives the ability to predict what is going to happen, Cognitive gives the ability to learn from further interactions and suggest best actions.
Read more…
