We are entering the age of the Internet of Things (IoT): vast numbers of connected devices will talk to each other through sensors and signals, and those devices are expected to generate an enormous amount of data.
Handling that much data (big data) and distilling actionable signals from it will be a challenge.
Companies are investing heavily in platforms that can process big data and extract intelligence from it. But the IoT will generate practically limitless data. Will these platforms still be able to cope, and will it remain cost-effective to process the data of thousands of devices and then distil signals that other devices can understand?
Consider SkyServer: the SDSS data it hosts exceeds 150 terabytes, covering more than 220 million galaxies and 260 million stars. The images alone include 2.5 trillion pixels of original raw data.
As device accuracy improves over time, so will their discovery capabilities, resulting in yet more petabytes of data. Will we simply keep adding servers, RAM, hard drives and processors to manage it all?
Why can't we invest those resources in making the devices themselves intelligent, so that every device does the necessary processing at its own end and its output interface exposes only actionable data to the outside world?
For example, a sky telescope could compare all its images itself and send SkyServer only those that contain new insights: some kind of machine learning built into the telescope that lets it learn from the past and generate actionable insights. In simple words, fit a brain inside the telescope that transforms big data into smart data. Storing, managing and processing garbage (big data) at SkyServer makes no sense to me. Why should we bear the cost of transferring garbage? Perhaps the device could keep that raw data in its "unconscious" and draw on it only when needed to generate smart data.
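To make the idea concrete, here is a minimal sketch (in Python, with invented data and an invented threshold; not any real telescope API) of on-device filtering, where only frames that differ enough from a reference image are sent upstream:

```python
# Hypothetical on-device filtering: the telescope keeps a reference image and
# only "transmits" frames that differ enough from it.

def novelty_score(frame, reference):
    """Mean absolute pixel difference between a frame and the reference."""
    return sum(abs(a - b) for a, b in zip(frame, reference)) / len(frame)

def filter_frames(frames, reference, threshold=10.0):
    """Return only the frames novel enough to be worth sending upstream."""
    return [f for f in frames if novelty_score(f, reference) > threshold]

reference = [100] * 16                   # a known patch of sky
frames = [
    [100] * 16,                          # identical: discard
    [101] * 16,                          # sensor noise: discard
    [100] * 8 + [180] * 8,               # a bright new object: keep
]
to_send = filter_frames(frames, reference)
```

Only the third frame clears the threshold, so only one image would ever cross the wire.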
In the world of IoT, if the output interface of every device exposed smart data, it would dramatically reduce the volume of data travelling over the wire, and the software world would be far more comfortable handling those megabytes of data.
We also need to rethink the way we use the internet today: the data it generates is increasing tremendously over time, and software and hardware technology will not be able to keep up forever.
Perhaps we should think of a new language for the internet, one that can convey a message in a few signals instead of a thousand-word story. Watch a conversation between deaf people: they tell a full story in a few signs. Imagine internet users communicating the same way, with a language that tells a story in a few signals.
We should think of a platform that helps reduce the size of the data we generate on the internet today.
Let's move the world towards smart data instead of spending money handling garbage (big data).
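As a toy illustration of such a "few signals" language: devices that share a codebook can exchange short codes instead of full messages. The codebook entries below are invented examples:

```python
# A shared codebook lets frequent messages travel as short codes instead of
# full text. Both sides must agree on the codebook in advance.

CODEBOOK = {
    "temperature normal, no action needed": 0,
    "temperature high, reduce load": 1,
    "sensor failure, dispatch technician": 2,
}
DECODE = {code: msg for msg, code in CODEBOOK.items()}

def encode(message):
    """Replace a known message with its compact code."""
    return CODEBOOK[message]

def decode(code):
    """Recover the full message from its code."""
    return DECODE[code]

signal = encode("temperature high, reduce load")
restored = decode(signal)
```

A single small integer on the wire carries what would otherwise be a whole sentence.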
Originally posted on Data Science Central
As we move towards widespread deployment of sensor-based technologies, three issues come to the fore: (1) many of these applications will need machine learning to be localized and personalized, (2) machine learning needs to be simplified and automated, and (3) machine learning needs to be hardware-based.
Beginning of the era of personalization of machine learning
Imagine a complex plant or machinery equipped with all kinds of sensors to monitor and control its performance and to predict potential points of failure. Such plants can range from an oil rig out in the ocean to an automated production line. Or such complex plants can be human beings, perhaps millions of them, who are being monitored with a variety of devices in a hospital or at home. Although we can use some standard models to monitor and compare the performance of these physical systems, it would make more sense to either rebuild these models from scratch or adjust them to individual situations.

This would be similar to what we do in economics. Although we might have some standard models to predict GDP and other economic variables, we would need to adjust each one of them to individual countries or regions to take into account their individual differences. The same principle of adjustment to individual situations would apply to sensor-based physical systems. And, similar to adjusting or rebuilding models of various economic phenomena, the millions of sensor-based models of our physical systems would have to be adjusted or rebuilt to account for differences in plant behavior.

We are, therefore, entering an era of personalization of machine learning at a scale we have never imagined before. The scenario is daunting because we wouldn't have the resources to pay attention to these millions of individual models. Cisco projects 50 billion devices to be connected by 2020 and the global IoT market size to be over $14 trillion by 2022 [1, 2].
The need for simplification and automation of machine learning technologies
If this scenario of widespread deployment of personalized machine learning is to play out, we absolutely need automation of machine learning to the extent that it requires far less expert assistance. Machine learning cannot continue to depend on high levels of professional expertise. It has to be simplified until it is similar to automobiles and spreadsheets, where some basic training in high school can certify one to use the tool. Once we simplify the usage of machine learning tools, it will lead to widespread deployment and usage of sensor-based technologies that also use machine learning, and will create plenty of new jobs worldwide. Thus, simplification and automation of machine learning technologies is critical to the economics of deploying and using sensor-based systems. It should also open the door to many new kinds of devices and technologies.
The need for hardware-based localized machine learning for "anytime, anywhere" deployment and usage
Although we talk about the Internet of Things, it would simply be too expensive to transmit all of the sensor-based data to a cloud-based platform for analysis and interpretation. It would make sense to process most of the data locally. Many experts predict that, in the future, about 60% of data will be processed at the local level, in local networks; most of it may simply be discarded after processing, and only some stored locally. There is a name for this kind of local processing: "edge computing".
The main characteristics of the data generated by these sensor-based systems are: high-velocity, high-volume, high-dimensional and streaming. There are not many machine learning technologies that can learn in such an environment other than hardware-based neural network learning systems. The advantages of neural network systems are: (1) learning involves simple computations; (2) learning can take advantage of massively parallel, brain-like computations; (3) they can learn from all of the data instead of samples of data; (4) scalability issues are non-existent; and (5) implementations on massively parallel hardware can provide real-time predictions in microseconds. Thus, massively parallel neural network hardware can be particularly useful with high-velocity streaming data in these sensor-based systems. Researchers at Arizona State University, in particular, are working on such a technology, and it is available for licensing.
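To illustrate why such learners suit streams, here is a minimal sketch of online learning: each arriving sample triggers a simple, constant-cost weight update and is then discarded. This is a plain perceptron written for clarity, not the hardware technology mentioned above, and the sensor stream is a toy example:

```python
# Online (streaming) learning with simple per-sample updates -- the property
# that makes neural-style learners suit high-velocity sensor streams.

def train_online(stream, lr=0.1, epochs=20):
    """Perceptron trained one sample at a time; no dataset is ever stored."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in stream:                      # see the sample, update, move on
            pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            err = y - pred
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b

# Toy sensor stream: alarm (label 1) only when both readings are high.
stream = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_online(stream)
predict = lambda x: 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
```

Each update touches only a handful of numbers, which is exactly the kind of computation that maps well onto parallel hardware.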
Hardware-based localized learning and monitoring will not only reduce the volume of Internet traffic and its cost, it will also reduce (or even eliminate) the dependence on a single control center, such as the cloud, for decision-making and control. Localized learning and monitoring will allow for distributed decision-making and control of machinery and equipment in IoT.
We are gradually moving to an era where machine learning can be deployed on an “anytime, anywhere” basis even when there is no access to a network and/or a cloud facility.
Gartner (2013). "Forecast: The Internet of Things, Worldwide, 2013."
We have seen the birth of a generation of enterprises that are data-rich and analytically driven, eagerly following trends in big data and analytics. Let's take a closer look as I provide some use cases demonstrating how IBM is helping clients find innovative big data solutions.
1. Datafication-led innovation
Data is the new basis of competitive advantage. Enterprises that use data and sophisticated analytics turn insight into innovation, creating efficient new business processes, informing strategic decision making and outpacing their peers on a variety of fronts.
2. Sophisticated analytics for rich media
Much of the data produced is useless unless appropriate analytics are applied to it. Where does opportunity lie? According to the International Data Corporation (IDC), rich media (video, audio, images) analytics will at least triple in 2015 to emerge as a key driver for big data and analytics technology investment. And such data requires sophisticated analytics tools. Indeed, consider e-commerce–based image search: accurate, relevant image search analysis that doesn't require human tagging or intervention is a significant opportunity in the market. We can expect similar smart analytics capabilities to offer similar opportunities.
3. Predictive analytics driving efficiency
Applications featuring predictive capabilities are picking up speed. Predictive analytics enhances value by boosting effectiveness, providing measurability of the application itself, recognizing the value of the data scientist and maintaining a dynamically adaptive infrastructure. For these reasons, predictive analytics capabilities are becoming an integral component of analytics tools.
4. Big data in the cloud
Over the next five years, IDC predicts, spending on cloud-based big data analytics solutions will grow three times more quickly than spending on on-premises solutions—and hybrid deployments will become a must-have. Moreover, says IDC, with data sources located both in and out of the cloud, business-level metadata repositories will be used to relate data. Organizations should evaluate offerings from public cloud providers to seek help overcoming challenges associated with big data management, including the following:
- Security and privacy policies and regulations affecting deployment options
- Data movement and integration requirements for supporting hybrid cloud environments
- Building a business glossary and managing data mappings to avoid being overwhelmed by data
- Building a cloud metadata repository (containing business terms, IT assets, data definitions and logical data models) that points to physical data elements.
5. Cognitive computing
Cognitive computing is a game-changing technology that uses natural language processing and machine learning to help humans and machines interact naturally and to augment human expertise. Personalization applications using cognitive computing will help consumers shop for clothes, choose a bottle of wine or even create a new recipe. And IBM Watson is leading the charge.
6. Big money for big data
Increasingly, organizations are monetizing their data, whether by selling it or by providing value-added content. According to IDC, 70 percent of large organizations already purchase external data, and 100 percent are expected to do so by 2019. Accordingly, organizations must understand what their potential customers value and must become proficient at packaging data and value-added content products, experimenting to find the “right” mix of data and combining content analytics with structured data, delivered through dashboards, to help create value for external parties interacting with the analysis.
7. Real-time analytics and the Internet of Things
The Internet of Things (IoT) is expected to grow at a five-year CAGR of 30 percent and, in its role as a business driver, to lead many organizations to their first use of streaming analytics. Indeed, the explosion of data coming from the Internet of Things will accelerate real-time and streaming analytics, requiring data scientists and subject matter experts to sift through data in search of repeatable patterns that can be developed into event processing models. Event processing can then process incoming events, correlating them with relevant models and detecting in real time conditions requiring response. Moreover, event processing is an integral part of systems and applications that operationalize big data, for doing so involves continuous processing and thus requires response times as near to real time as possible.
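A minimal sketch of that event-processing loop might look like the following; the device models, thresholds and field names are illustrative assumptions, not any particular product's API:

```python
# Incoming sensor events are correlated with a simple per-device model and
# conditions requiring a response are flagged as they arrive.

models = {"pump-1": {"max_temp": 80.0}, "pump-2": {"max_temp": 95.0}}

def process(events):
    """Check each event against its device's model; return alerts."""
    alerts = []
    for ev in events:
        model = models.get(ev["device"])
        if model and ev["temp"] > model["max_temp"]:
            alerts.append((ev["device"], ev["temp"]))
    return alerts

events = [
    {"device": "pump-1", "temp": 75.0},
    {"device": "pump-1", "temp": 88.0},   # exceeds pump-1's limit
    {"device": "pump-2", "temp": 88.0},   # within pump-2's limit
]
alerts = process(events)
```

In a real deployment the models would be learned patterns rather than fixed thresholds, but the shape of the loop (event in, correlate, respond) is the same.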
8. Increased investments in skills
Many organizations want to combine business knowledge and analytics but have difficulty finding individuals who are skilled enough to do so. Leading companies in particular feel this talent gap keenly, for as they move to broaden skills across the enterprise, the need for combined skills becomes ever more apparent. Indeed, combined skills are of critical importance in speed-driven organizations, for such skills speed the translation of insights into actions through deep knowledge of the business drivers—and the data related to them—that are likely to affect performance.
Originally posted on Data Science Central
Guest blog post by Daniel Calvo-Marin
People interested in data are always looking or researching for new data to integrate into their business, academic research, software solutions and so on.
There are lots of data sets that are meaningful to some and unimportant to others, but there is one kind of data that every person passionate about data will find interesting: personal data.
We're leaving logs of everything we do, often without even knowing it. Here is a list of data sets we are generating and can study to learn a little more about how we behave:
- Fitness data: if you use apps like Endomondo, Nike+, Adidas miCoach or MapMyRun, or wear a Jawbone, Misfit, Fitbit, Garmin or one of many similar devices, you have data to study. Step count, distance, speed, pace, calories burned and heart rate are some of the dimensions you can analyze to gain a deep understanding of your exercise and improve on it. In a way, you can become your own coach: set realistic goals, schedule exercise sessions more efficiently and plan rest days when you need them.
- Personal schedule: this tool may not sound as exciting as fitness bands or the latest gadgets, but it holds data that could help you a lot. If you are a heavy user of agendas and schedules, you can analyze them to, in a sense, predict your future. Yes, it sounds crazy; you are the owner of your future. But how many times have you been late to an appointment? If you analyze your schedule, you could define a "late index" for every entry in your agenda. Or set up reminders to get in touch with people you care about, based on when you last saw them. As for academics, you could fit a model that sets the right time to start studying for an exam based on your past scores and study time, so that the next time you schedule an exam it automatically schedules your study sessions too.
- Chat history: depending on the chat service you use, you can export your chat history. First of all, it's fun! You will definitely find things you don't remember, and some will make you laugh. Beyond that, you can analyze the data to see how your relationships with others are going. Topics, number of messages, number of images and emoticons are some indicators to analyze. You can also measure how much importance you are giving to a person. Are you being absorbed by your job? That is something your chat history can reveal. And what if you applied sentiment analysis to your conversations? Are they pleasant, or is there someone you should avoid to keep from getting angry or upset? Go ahead and give "chat analytics" a try.
- Personal mail: in most cases this is a less intense communication channel than chat, but it will also help you understand how you behave. Analyze your vacation bookings, your order history across online stores, blogs, forums, advertising and much more; you have a lot of information here! For a start, you could build your own spam detector. The ones used by mail services are very good, but building one is great analytics practice, and we all receive messages that matter little to us. After that, you can analyze your preferences in products or vacation spots to improve your next choice. Short on time to read every email? Build a discount searcher that finds the promotions in your mail that actually interest you.
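A toy spam detector along these lines only takes a few lines. A real filter would use naive Bayes with proper smoothing; this sketch just scores a message by whether its words were seen more often in your past spam than in your legitimate mail (all the messages here are made up):

```python
# Score a message by comparing word frequencies in past spam vs. past ham.
from collections import Counter

spam = ["win a free prize now", "free money win big"]
ham = ["lunch meeting tomorrow", "project report attached"]

spam_words = Counter(w for m in spam for w in m.split())
ham_words = Counter(w for m in ham for w in m.split())

def looks_like_spam(message):
    """True when spammy words outnumber ham words in the message."""
    score = sum(1 if spam_words[w] > ham_words[w] else
                -1 if ham_words[w] > spam_words[w] else 0
                for w in message.split())
    return score > 0

flagged = looks_like_spam("win free prize")
kept = looks_like_spam("meeting about the report")
```

On this tiny corpus, "win free prize" is flagged and "meeting about the report" is kept, which is all the exercise needs to get you started.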
This isn't an exhaustive list. From now on, keep your eyes open and you will discover new sources of personal data you can analyze to improve your life. Most of this kind of analytics can help you become more efficient in all of your activities, but one piece of advice: despite all this information, live every day as you want. Don't feel restricted by your agenda, your chat analytics or your vacation-planning algorithm; you are free to choose what you prefer at every moment of your life. This is just a helping guide, not the final word!
Originally posted here.
Did you know that athletes are monitored not only by cameras in stadiums, but also by many quirky devices such as accelerometers, heart rate sensors and even local GPS-like systems? Indeed, Big Data and modern technologies are currently revolutionizing sports and even powering the Fantasy Sports industry.
Sports clubs, media outlets and fans around the world all share a thirst for advanced statistics and information. Big clubs use them to improve the performance of their own players, prepare tactics against other teams or scout potentially interesting players. Media outlets, meanwhile, love data just as much, since it gives added value to their reports. Finally, the stats are of outsized importance for fantasy sports managers, who build their fantasy teams from individual athletes in top form.
Let's start with last year's soccer World Cup in Brazil. While Germany's World Cup success can be attributed to many things, the Germans, known for their technological know-how, had a trump card in their hands. Many soccer fans raised their eyebrows when it was revealed that the national squad wore Adidas' miCoach elite team system during training sessions before and during the competition. The physiological monitoring service collects and transmits information directly from the athletes' bodies, including heart rate, distance, speed, acceleration and power. All this information is made available live on an iPad to coaches and trainers on the sideline during training, as well as post-session for in-depth analysis. Interestingly, analysis of the data can help separate the fit players from those who could use a rest.
Effective Use of GNSS and GPS
Of course, miCoach is not the only device of this kind on the market. The major player is actually the Australian company Catapult Sports, focused on Global Navigation Satellite System (GNSS) data, which is increasingly important to sports scientists and coaches, who monitor it to measure player movement and fatigue. As you have likely guessed, the tracking devices rely on GNSS satellites.
However, they also have their own local positioning system, ClearSky, which can be installed around the indoor area of the stadium when obstacles, like a closed roof, interfere with the ability to lock on to individual units. ClearSky uses anchor nodes to track players’ movements, while the devices with transmitters are worn at the top of the back, held in place by a compression shirt that looks a bit like a sports bra and can be worn over or under the uniform.
Another pioneer of wearable tracking devices is GPSport, acquired by Catapult Sports in July, known for its sophisticated performance-monitoring devices that combine advanced GPS tracking with heart rate. The combined group now works with more than 450 teams worldwide, including Chelsea, Real Madrid and the Brazilian national team.
All these tracking devices fall under the Electronic Performance and Tracking System (EPTS) category. Until mid-2015, soccer players were allowed to wear them only during training. However, on July 7, the international governing body of soccer FIFA issued a memorandum announcing the approval of wearable electronic performance and tracking systems in matches – on the condition that they do not endanger player safety and that information is not available to coaches during matches.
It’s worth noting that every respective association, league or competition has the final decision on whether to adopt or reject EPTS devices.
FIFA has already taken a step forward in controlling the use of these tools in its own competitions. After suffering several concussions, U.S. soccer veteran Ali Krieger chose to wear performance protection headgear from FIFA-approved Unequal Technologies during this month's women's soccer tournament in Canada. Unequal Halo, which looks similar to Petr Cech's famous black protective helmet, combines Kevlar, high-tensile strain fibres and an Accelleron composite for ballistic-strength protection against concussions. So when a force hits the 10-millimetre-thick Halo, instead of concentrating directly into the head at the point of impact, the energy is dispersed throughout the entire headband.
Going back to soccer data, FIFA relies on a visual-tracking technology called Matrics to provide extensive data set in real time that makes up the on-site heat maps, passes completed and distance covered. The company behind Matrics is an Italian firm called Deltatre that uses several technologies and manual inputs from a large crew to deliver the real-time stats. This technology has been around for years, but it’s becoming more diverse, detailed and accurate.
Deltatre has also been appointed by UEFA to provide a number of services from each Champions League venue. These include on-air graphics generation, which is directly embedded into the multilateral feed, and data capture for UEFA’s official results system, produced using a combination of a player tracking system and dedicated in-venue spotters. On UEFA’s website one can find a comprehensive list of stats for players, clubs, groups, matches and goal times.
The Rise of Sports Analytics
FIFA also collaborates with Infostrada, which delivers live-match coverage (streaming) services and develops and calculates a World Ranking for the women's national teams associated with FIFA's competitions. Apart from Deltatre and Infostrada, the sports data market has other big players. One of them is ChyronHego, mostly known for its player-tracking technology TRACAB, which uses advanced patented image-processing technology to identify the position and speed of all moving objects within a soccer arena. Twenty-five times per second, the system generates live, accurate X, Y and Z coordinates for every viewable object, including players, referees and even the ball. The data gives coaches insight to evaluate player performance and track metrics such as distance run, speed, stamina, pass completion and team formations. TRACAB is installed in over 125 stadiums and is used in more than 2,000 matches per year by the Premier League, Bundesliga and Spanish La Liga.
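To see how a metric like distance covered falls out of such positional feeds, here is a small sketch that reconstructs distance and average speed from per-frame (x, y) samples at 25 Hz. The track data is invented, and real feeds also carry z coordinates and object identities:

```python
# Reconstruct distance covered and average speed from 25 Hz position samples.
from math import hypot

FRAME_RATE = 25  # position samples per second

def distance_covered(positions):
    """Sum of straight-line steps between consecutive samples (metres)."""
    return sum(hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(positions, positions[1:]))

def average_speed(positions):
    """Distance divided by elapsed time, in metres per second."""
    seconds = (len(positions) - 1) / FRAME_RATE
    return distance_covered(positions) / seconds

# A player moving 0.2 m along x every frame, for one second of footage.
track = [(0.2 * i, 0.0) for i in range(FRAME_RATE + 1)]
dist = distance_covered(track)
speed = average_speed(track)
```

Summing small per-frame steps like this is the standard way to turn raw coordinates into the distance and speed figures quoted in broadcasts.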
ChyronHego has an agreement with the Bundesliga, while the Premier League's first sports technology partner is EA Sports. The official data provider for the French Ligue 1 is Prozone (acquired by U.S.-based competitor Stats LLC in May 2015), which has a product that works in a similar way to TRACAB. Using around 10 cameras, the programme creates a two-dimensional animation of the playing field and is capable of registering 3,000 touches of the ball per game. It offers real-time, post-match and opposition analysis, and most Premier League teams now use the system.
Many soccer leagues and clubs also collaborate with Opta, the leading provider of soccer sports data. Opta’s analytics can determine every single action of a player in a specific zone on the field, regardless of whether he has a ball or not. It can also measure the distance the player runs during the course of a game. There are more than 100 match-specific statistic categories, for instance shots, goals, assists, yellow and red cards, won and lost duels and also some lesser-known categories, such as accurate corners into box, effective blocked cross or accurate keeper throws.
This type of analytics is useful for clubs when scouting and helps shape roster development decisions. Probably the best-known supporter of this approach among top managers is Arsene Wenger. The Arsenal manager once said that the personal touch in player scouting remains decisive, but computer-generated statistics can certainly help his management find the players they need. Not surprisingly, back in 2012 the English club even bought StatDNA, a U.S.-based data company that provides expert analysis, guiding everything from identifying new players to post-game tactical analysis.
Moreover, the stats are also used by betting companies (William Hill embeds Opta's regular contextual facts directly within its betting platform, while SkyBet uses them to add detail and colour to its regular email marketing communications) and, more recently, by Daily Fantasy Sports (DFS) operators. Sports data is used in two ways: first, companies use it to build scoring systems; second, it is a must for fantasy players, who have to study many pieces of information to be successful fantasy sports managers. The more information they have, and the better their ability to dissect it, the higher their chances of staying competitive.
The Forefront of Fantasy Soccer
Fantasy football (soccer) operators usually form partnerships with sports data companies and use a limited number of stats, normally from five to 20, to sum up a game. However, there is one company whose scoring system stands out as the most advanced. Malta-based Oulala Games Ltd has built a mathematical matrix that uses Opta's data to create an efficient scoring system. The company's platform uses a highly sophisticated algorithm to assess the crucial aspects of an athlete's performance that contribute to an overall result. Its system includes a total of 70 different criteria depending on a player's position (keeper, defender, midfielder and striker), resulting in a total of 275 ways to gain or lose points. The aggregated numbers of these actions, made by the players included in virtual teams, determine the overall winners of the different daily leagues.
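A heavily simplified sketch of a position-dependent scoring matrix of this kind might look as follows; the weights and actions are invented for illustration, not Oulala's actual criteria:

```python
# Each action is worth different points depending on the player's position.

SCORING = {
    "keeper":   {"save": 2, "goal_conceded": -2, "goal": 10},
    "defender": {"tackle": 2, "goal_conceded": -1, "goal": 6},
    "striker":  {"shot_on_target": 1, "goal": 4},
}

def score_player(position, actions):
    """Sum points for a player's actions using their position's weights."""
    weights = SCORING[position]
    return sum(weights.get(action, 0) * count
               for action, count in actions.items())

striker_pts = score_player("striker", {"goal": 2, "shot_on_target": 3})
keeper_pts = score_player("keeper", {"save": 5, "goal_conceded": 1})
```

Scaling this idea up to 70 criteria across four positions is what yields the hundreds of distinct ways to gain or lose points described above.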
Despite the fact that a high level of skill is required to become a league's winner, some professional DFS players, especially in the U.S., have found a bypass that gives them a substantial edge over their competition: they use advanced software known as scripting, which helps them decide which players to pick and, in many cases, automatically enters hundreds of line-ups in multi-entry guaranteed prize pool (GPP) contests. Currently, all major DFS operators allow the practice.
Soccer, which has always been a numbers game, is apparently driven by more and more Big Data. Clubs are now likely hiring fewer scouts and more computer analysts; TV, radio and newspapers drive more stats-based conversation about the performance of players, managers and teams than ever before. Numbers are also seeping out of real soccer and into the fantasy – the stats that surround players are not only used to measure their actual performance, but also to evaluate their contribution to fantasy soccer teams. It’s fair to say that this Big Data revolution in soccer will only continue and change the whole experience of watching the most popular sport in the world.
If you want to stay current on the latest Daily Fantasy Sports industry news, updates and opinions, then don't hesitate to join the Daily Fantasy Sports in Europe group. I look forward to welcoming you and sharing success stories with you!
Originally posted on Data Science Central
Guest blog post by Martin Doyle
It seems we've been warning about the dangers of low-quality data forever. Our warnings have been reinforced and echoed by some of the world's biggest think tanks. Despite this, however, some organisations still haven't acted to improve the quality of their data. And we're left wondering why.
Over the last 12 months, we've blogged about business automation, and about cutting the waste that's destroying your ROI. We've reminded you that your data is vulnerable to decay, and we correctly predicted that Google Now would become a bigger presence in our lives.
Despite our best efforts, though, Experian's data quality statistics show that businesses are failing to take action. Their data quality challenges are growing, even though data quality software is getting better all the time.
Will 2016 be the year that the message finally gets through, or will we be singing from the same carol book this time next year?
What might you achieve in 2016?
The benefits of better data management are vast, and they benefit everyone who comes into contact with that data. For a profit-making business or an efficiency-driven public sector organisation, we can divide the benefits into three distinct categories: efficiency, innovation and experience.
Laying the foundations for 2016
Have you decided to get to grips with data quality in 2016? Whether you’re planning cloud migration, digital transformation, or simply want to improve your bottom line, you can start putting the basics in place as soon as the Christmas tree is packed away.
Consider adding a Chief Data Officer to the organisation to act as a data quality ambassador. Why? Data is going to have to be valid and reliable all the time if customers are going to receive the quality of service they are looking for. This means that there has to be constant focus on data quality, rather than conducting occasional data quality reviews, and you need someone who can drive change in your processes and culture.
Additionally, look at the way the world around you is changing. A couple of years ago, tablet computers were on everyone's Christmas list. This year, it's wearable technology and products to automate the home. Data is already shifting towards centre stage, and this should provide all the inspiration you need to modernise your business accordingly.
Finally, imagine a world where your organisation was more streamlined and agile. Imagine the cost savings of automation and efficiency. Think about how much your staff could do if they didn’t have to duplicate their work. Consider how much more accurate your reports would be if you had access to reliable data, and how much money you’re currently gambling on data that doesn’t make sense.
By 2016, the amount of money spent on digital marketing will consume 35% of total marketing budgets. Experian says we’re already wasting £197 million because of bad data. How much more can you afford to waste? If you continue wasting money at the current rate, how long will it take for a more agile competitor to overtake you?
Turn over a new leaf in 2016 with better data
Data is the one constant in every business. It flows through every process and helps us make sense of what we do. We owe it to our staff, our users and our customers to manage data properly and improve its accuracy. The New Year presents a great time to change the way we’re managing data.
Data quality software is no longer a niche purchase, or something that can be pushed back into next year's budget. It's now an essential component in the workings of an efficient business. Not only that, but data quality is married to automation and integration, and your business needs both if it's to survive.
From marketing to business intelligence, data quality is becoming a prime concern. Forrester predicts that 2016 will bring more personalisation, better customer experience, more ‘digitally savvy’ leaders and a requirement to make digital a “core driver of business transformation”. Will your organisation be one of the few that puts data quality at the heart of its strategy?
The original blog can be seen here.
The first prediction is that data and analytics will continue to grow at an astounding pace and with increased velocity
This is no big surprise as all the past reports have pointed towards this growth and expansion -
Venturebeat * notes that “Although the big data market will be nearly $50B by 2019 according to analysts, what’s most exciting is that the disruptive power of machine data analytics is only in its infancy. Machine analytics will be the fastest growing area of big data, which will have CAGR greater than 1000%.”
The move towards cloud based solutions opens up opportunities and it is not going to reverse. Following the trend of recent years, more and more companies are increasing their use of cloud based solutions, and the opportunity to extract and collect data that comes with this provides the potential to glean information and knowledge from that data.
Suhale Kapoor, Co-Founder and Executive Vice President, Absolutdata Analytics * highlights “The fast shift to the cloud: The cloud has become a preferred information storage place. Its rapid adoption is likely to continue even in 2016. According to Technology Business Research, big data will lead to tremendous cloud growth; Revenues for top 50 public cloud providers shot up from 47% in the last quarter of 2013 to $ 6.2 billion"
It is not difficult to predict that in 2016 the cloud, and the opportunities it opens up for data, analytics and machine learning, will become huge drivers for business.
Applications will learn how to make themselves better
Applications will be designed to discover self improvement strategies as a new breed of log and machine data analytics, at the cloud layer, uses predictive algorithms to enable continuous improvement, continuous integration and continuous deployment. The application will learn from its users; in this sense the users will become the system architects, teaching the system what they, the users, want and how the system is to deliver it to them.
Gartner views advanced machine learning as among the top trends to emerge in 2016 *, with “advanced machine learning where deep neural nets (DNNs) move beyond classic computing and information management to create systems that can autonomously learn to perceive the world, on their own … (being particularly applicable to large, complex datasets) this is what makes smart machines appear "intelligent." DNNs enable hardware- or software-based machines to learn for themselves all the features in their environment, from the finest details to broad sweeping abstract classes of content. This area is evolving quickly, and organisations must assess how they can apply these technologies to gain competitive advantage.” The capability of systems to use advanced machine learning need not be confined to the information they find outside: it will also be introspective, applied to the system itself and to how it interfaces with human users.
A system performing data analytics needs to learn what questions it is being asked, how the questions are framed, and the vocabulary and syntax the user chooses to ask those questions. No longer will the user be required to struggle with the structure of queries and programming languages aimed at eliciting insight from data. The system will understand the user’s natural language requests, such as “get me all the results that are relevant to my understanding of ‘x, y and z’ ”. The system will be able to do this because of the experience it has of users asking these questions many times in structured programming languages (a corpus of language that the machine has long understood), matching them to a new vocabulary that is more native to the non-specialised user.
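A crude illustration of the idea, with a toy vocabulary-overlap matcher standing in for the learned model (all names, queries and data here are invented for illustration, not any real product’s API):

```python
# Toy sketch: map a user's natural-language request to a stored structured
# query, based on phrasings the "system" has seen answered before.

# Corpus of learned request-vocabularies, each paired with the structured
# query that historically answered it. Both sides are illustrative.
LEARNED_QUERIES = {
    frozenset({"revenue", "region", "quarter"}):
        "SELECT region, SUM(revenue) FROM sales GROUP BY region;",
    frozenset({"results", "relevant", "x", "y", "z"}):
        "SELECT * FROM results WHERE topic IN ('x', 'y', 'z');",
}

def to_structured_query(request: str) -> str:
    """Pick the learned query whose vocabulary best overlaps the request."""
    words = set(request.lower().replace(",", " ").split())
    best_key = max(LEARNED_QUERIES, key=lambda k: len(k & words))
    return LEARNED_QUERIES[best_key]

print(to_structured_query("get me all the results that are relevant to x, y and z"))
```

A real system would of course learn this mapping from usage rather than from a hand-written table, but the shape of the problem, matching informal vocabulary onto a corpus of structured queries, is the same.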
2016 will be the year these self learning applications emerge due to changes in the technology landscape, for as Himanshu Sareen, CEO at Icreon Tech * points out, this move to machine learning is being fuelled by the technology that is becoming available: “Just as all of the major cloud companies (Amazon Web Services, Google, IBM, etc.) provide analytics as a service, so do these companies provide machine learning APIs in the cloud. These APIs allow everyday developers to ‘build smart, data-driven applications’ ”. It would be foolish for these developers not to consider making their systems self learning.
Our prediction is that through 2016 many more applications will become self learning thanks to developments in deep learning technology
Working with data will become easier
While the highly specialised roles of the programmer, the data scientist, and the data analyst are not going to disappear, the exclusivity of the insights they have been party to is set to dissipate. Knowledge gleaned from data will not remain in the hands of the specialist, and technology will once again democratise information. The need for easy to use applications providing self serve reports and self serve analysis is already recognised by business. According to Hortonworks Chief Technology Officer Scott Gnau * “There is a market need to simplify big data technologies, and opportunities for this exist at all levels: technical, consumption, etc.” … “Next year there will be significant progress towards simplification,”
Data will become democratised, first from programmers, then from data scientists and finally from analysts as Suhale Kapoor, Co-Founder and Executive Vice President, Absolutdata remarks “Even those not specially trained in the field will begin to crave a more mindful engagement with analytics. This explains why companies are increasingly adopting platforms that allow end users to apply statistics, seek solutions and be on top of numbers.” … “Humans can’t possibly know all the right questions and, by our very nature, those questions are loaded with bias, influenced by our presumptions, selections and what we intuitively expect to see. In 2016, we’ll see a strong shift from presumptive analytics — where we rely on human analysts to ask the right, bias-free questions — toward automated machine learning and smart pattern discovery techniques that objectively ask every question, eliminating bias and overcoming limitations.”
“Historically, self-service data discovery and big data analyses were two separate capabilities of business intelligence. Companies, however, will soon see an increased shift in the blending of these two worlds. There will be an expansion of big data analytics with tools to make it possible for managers and executives to perform comprehensive self-service exploration with big data when they need it, without major handholding from information technology (IT), predicts a December study by business intelligence (BI) and analytics firm Targit Inc.” *…“Self-service BI allows IT to empower business users to create and discover insights with data, without sacrificing the greater big data analytics structures that help shape a data-driven organisation,” Ulrik Pedersen, chief technology officer of Targit, said in the report.
We are able to confidently predict that in 2016 more and more applications for analysing data will require less technical expertise.
Data integration will become the key to gaining useful information
The maturity of big data processing engines enables an agile exploration of data and agile analytics able to make huge volumes of disparate and complex data fathomable. Connecting and combining datasets unlocks the insights held across data silos, and this will be done automatically in the background by SaaS applications rather than by manually manipulating spreadsheets.
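The kind of silo-bridging join such an application would run in the background can be sketched in a few lines of pandas (the datasets and column names are invented for illustration):

```python
import pandas as pd

# Two "silos": CRM records and web analytics, sharing a customer key.
crm = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "segment": ["enterprise", "smb", "smb"],
})
web = pd.DataFrame({
    "customer_id": [1, 2, 4],
    "visits_last_30d": [14, 3, 7],
})

# An outer join keeps customers that appear in only one silo; the
# indicator column surfaces exactly the coverage gaps that a manual
# spreadsheet mash-up tends to hide.
blended = crm.merge(web, on="customer_id", how="outer", indicator=True)
print(blended)
```

The value is less in the join itself than in running it continuously and automatically, so that every downstream report sees the combined view rather than one silo at a time.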
David Cearley, vice president and Gartner Fellow, postulates “The Device Mesh”, which “refers to an expanding set of endpoints people use to access applications and information or interact with people, social communities, governments and businesses”. He notes that "In the postmobile world the focus shifts to the mobile user who is surrounded by a mesh of devices extending well beyond traditional mobile devices," devices that are “increasingly connected to back-end systems through various networks”, and that “As the device mesh evolves, we expect connection models to expand and greater cooperative interaction between devices to emerge”.
In the same report Cearley says that “Information has always existed everywhere but has often been isolated, incomplete, unavailable or unintelligible. Advances in semantic tools such as graph databases as well as other emerging data classification and information analysis techniques will bring meaning to the often chaotic deluge of information.”
It is an easy prediction, but more and more data sets will be blended from different sources, allowing more insights; this will be a noticeable trend that emerges during 2016.
Seeing becomes all important, visualisations are the key to unlocking the path from data to information to knowledge
Having the ability to collect and explore complex data leads to an inevitable need for a toolset to understand it. Tools that can present the information in these complex data sets as visual representations have been getting more mature and more widely adopted. As Suhale Kapoor, Co-Founder and Executive Vice President, Absolutdata Analytics * puts it: “Visuals will come to rule: The power of pictures over words is not a new phenomenon – the human brain has been hardwired to favour charts and graphs over reading a pile of staid spreadsheets. This fact has hit data engineers who are readily welcoming visualisation softwares that enable them to see analytical conclusions in a pictorial format.”
The fact that visualisations do leverage knowledge from data will lead to more adaptive and dynamic visualisation tools: “Graphs and charts are very compelling, but also static, sometimes giving business users a false sense of security about the significance — or lack of it — in the data they see represented. … data visualisation tools will need to become more than pretty graphs — they’ll need to give us the right answers, dynamically, as trends change … leading to dynamic dashboards … automatically populating with entirely new charts and graphs depicting up-to-the-minute changes as they emerge, revealing hidden insights that would otherwise be ignored”*
We predict that in 2016 a new data centric semiotic, a visual language for communicating data derived information, will become stronger, grow in importance and be the engine of informatics.
Originally posted on Data Science Central
While many of us recognize that companies are empowered by actionable data insights that help drive sales, loyalty and superior customer experiences, the thought of making sense of enormous quantities of information and undertaking the task of unifying it is daunting. But that is slowly changing. Experts forecast that this year most companies will allocate budgets to discover the best tools and resources to really harness their data, and that 2015 will undoubtedly be the year of big data.
Data gathering has developed radically, and both C-level executives and their teams now recognize they have to join the big data arms race to keep and grow their customer base, and to stay competitive in today's data-driven marketplace. Terms like in-memory databases, sensor data, customer data platforms and predictive analytics will become more widely understood.
With terabytes of data being gathered by companies at multiple touchpoints, platforms, devices and offline places, companies will start to focus more on owning their data, on being able to access, visualize and control this data, and on monetizing their audience in real-time with the right content. More emphasis will likely be placed on how ethically data is collected, how clean the big data is, and on not being a data hoarder that accumulates information you don't really want.
Here are the top 5 data trends that we predict will reign in 2015:
1. Data agility will take center stage
It's not sufficient to just own quantities of customer data if that data is not agile. More companies are seeking simple, quick and easy approaches to offer unified and protected access to customer data across departments and systems. CMOs, CTOs, data scientists, business analysts, programmers and sales teams all have the same pressing need for tools and training to help them navigate their customer data. With the growing popularity of wearables, sensors and IoT devices, there's additional real time data flooding in. Plus, having customer data saved on multiple legacy platforms and third party vendor systems only makes data agility that much more challenging. Most firms only use about 12.5% of their available data to grow their business. Having access to the proper tools which make customer data more agile and easy to use is going to be a significant focus of businesses in 2015.
2. Data is the New Gold & Puts Businesses In Control
For several businesses, the most commonly-faced data need is ownership and unification: volumes of data being generated every second, saved on multiple legacy platforms that still use dated structures, and the inability to access all this customer data in a single place to get a "complete view" of their customers. But with technology that makes data unification easier and the introduction of new tools, businesses are beginning to appreciate the value of controlling and owning their customer data. The frustrations of working with multiple third party vendors to gain possession of data, along with a lack of data rights that permit you to automatically pull data from these vendors, are major pain points that will be addressed. Companies can now select from a variety of systems like Umbel to help gather first-party customer data from multiple online and offline sources, platforms and vendors, own and unify the data, and make use of it in real-time to power and optimize marketing and sales efforts.
3. The Rise of Customer Data Platforms
While DMPs and CRMs help fulfill many business needs, today's marketers want a centralized customer data platform like Umbel that examines their customer base and gives them deep insights into it. Very few businesses really have one genuinely complete, unified customer database solution. They're largely still using multiple systems and platforms that collect data separately.
A CMO's top priority will probably be to possess a reliable Customer Data Platform that collects exact customer data from all online and offline touch points (including website visits and purchases, social interactions, beacon data, mobile and in store interactions etc.), removes duplicates and appends it with added data (demographic, geographic, behavioral, brand affinity) from other trusted sources.
4. Data Democratization Across Departments
The abundance of customer data available to brands today is staggering, and yet many companies are yet to fully use the data to supercharge marketing and sales efforts. Among the biggest hurdles that marketers face is the fact that access to this data is quite limited at most firms. First, only larger companies with IT resources have had the capacity to gather, store, analyze, and monetize this precious data. Second, even when data is collected, only the IT department and/or the business analytics teams have access to it, and the sales and marketing teams that actually use this data must undergo a convoluted, time-consuming procedure to get the data and insights they need.
With new tools like Umbel, teams don't need a Data Scientist to make sense of their data.
For data to be genuinely valuable to an organization, it is critical that it be democratized across teams and departments, empowering all employees, irrespective of their technical expertise, to access data and make more informed decisions. In 2015 more companies will start to use automated platforms that enable anyone in the organization to see, assess and take action based on customer data.
5. Mobile Data and Strategy Will Become Vital to Advertising
According to eMarketer, mobile search ad spend in the U.S. grew 120.8% in 2013 (an overall gain of 122.0% for all mobile advertising). Meanwhile, desktop ad spending went up by just 2.3% last year. Mobile apps and mobile sites have become an essential component of any marketing plan for retailers. For companies to remain competitive, a seamless, secure, fast and intuitive experience on mobile devices, and the ability to capture this mobile data and add it to a unified customer database, is critical. Having this unified view of customers from every touchpoint (including mobile and offline) will enable firms to identify trends and shape a better customer experience. More companies are becoming aware of how important it is to be able to unify their data and compare analytics across all platforms to help them create personalised marketing campaigns centered on a "complete customer view."
Originally posted on Data Science Central
It’s no secret that analytics are everywhere. We can now measure everything, from exabytes of organizational “big data” to smaller, personal information like your heart rate during a run. And when this data is collected, deciphered, and used to create actionable items, the possibilities, both for businesses and individuals, are virtually endless.
One area tailor-made for analytics is the sports industry. In a world where phrases like “America’s pastime” are thrown around and “the will to win” is revered as an intangible you can’t put a number on, stats lovers with PhDs in analytics are becoming more and more essential to sports franchises. Since the sabermetric revolution, sports franchises have begun investing time and money in using sports analytics from wearable technology to help their athletes train and even make more money from their stadiums.
Today, Sports Fans Prefer the Couch Over the Stadium
For decades, television networks have tried to create an at-home experience that’s on par with the stadium experience — and they’ve succeeded emphatically. In a 1998 ESPN poll, 54% of sports fans reported that they would rather be at the game than watch it at home; however, when that same poll was readministered in 2011, only 29% preferred being at the game.
While this varies by sport to some degree, the conclusion is clear: people would rather watch a game in the comfort of their own climate-controlled homes, with easy access to the fridge and a clean bathroom, than experience the atmosphere of the stadium in person. Plus, sports fans today want the ability to watch multiple games at once; it’s not unusual for diehard fans to have two televisions set up with different games on, plus another game streaming on a tablet.
However, fans could be persuaded to make their way back to the stadiums; 45% of “premium fans” (who always or often buy season tickets) would pay more money for a better in-person experience. That’s where wearable technology comes into play.
Wearable Data — for Fans Too
At first glance, the sole application of wearable technology and data science would seem to be monitoring and improving athlete performance. These tasks might include measuring heart rate and yards run, timing reactions and hand speed, gauging shot arc, and more, while also monitoring the body for signs of concussion or fatigue.
And that’s largely true. For example, every NBA arena now uses SportVU, a series of indoor GPS technology-enabled cameras, to track the movements of the ball and all players on the court at a rate of 25 times per second. With that data, they can use myriad statistics concerning speed, distance, player separation, and ball possession to decide when to rest players.
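The figures above give a feel for the raw volume involved; a back-of-the-envelope calculation (game length, coordinate count and bytes per sample are our own illustrative assumptions, not SportVU's actual storage format):

```python
# Rough size of one game of optical tracking data, assuming 10 players
# plus the ball sampled 25 times per second, as described above.

samples_per_second = 25
tracked_objects = 10 + 1          # ten players and the ball
game_seconds = 48 * 60            # regulation NBA game, excluding stoppages
coords_per_sample = 3             # x, y for players; x, y, z for the ball
bytes_per_coord = 8               # a double-precision float

samples = samples_per_second * tracked_objects * game_seconds
raw_bytes = samples * coords_per_sample * bytes_per_coord

print(f"{samples:,} position samples per game")         # 792,000
print(f"~{raw_bytes / 1e6:.1f} MB of raw coordinates")  # ~19.0 MB
```

Even under these conservative assumptions, a single season of 82 games per team produces tens of millions of position samples, which is why teams need analytics pipelines rather than spreadsheets to act on this data.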
Similarly, Adidas’ Micoach is used by the German national soccer team during training to monitor speed, running distances, and heart rates of each player. In fact, this system is credited with the decision to sub in German soccer player Mario Gotze in the 88th minute of the 2014 World Cup final; in the 113th minute, the midfielder scored the World Cup-winning goal.
However, some sports franchises are using that wearable technology to benefit the fan sitting in the stadium. For example, the Cleveland Cavaliers’ Quicken Loans Arena (an older stadium) was retrofitted with SportVU; however, the Cavs don’t use it just for determining when LeBron James needs a break. Instead, they use the data tracked by SportVU to populate their Humungotron with unique statistics tracked in real-time during the game. The Cavs then took this data to the next level by using the stats in their social media marketing and to partner with various advertisers.
How Analytics Are Improving the Stadium Experience
Besides sharing interesting statistics on the JumboTron during the game, stadiums are using data from athletes and fans to enhance the spectators’ experience. In fact, stadiums are actually mirroring the in-home experience, through various apps and amenities that reach the spectator right in their seat.
And at times, they’re going above and beyond simply imitating the in-home experience. Take the Sacramento Kings, for example. In 2014, the team partnered with Google to equip many of its courtside personnel (mascots, reporters, and even dancers) with Google Glass. Fans were able to stream close-up, first-person views of the action through their mobile devices, allowing them to feel closer than their upper-level seats would suggest.
Levi’s Stadium in Santa Clara (home of the San Francisco 49ers) boasts a fiber optic network that essentially powers every activity in their thoroughly modern stadium. The stadium contains 680 Wi-Fi access ports (one for every 100 seats in the stadium) and around 12,000 ethernet ports, allowing everything from video cameras and phones to connect to a 40 gigabit-per-second network that’s 10,000 times faster than the federal classification for broadband. 1700 wireless beacons use a version of Bluetooth to triangulate a fan’s position within the stadium and give them directions. And for fans who don’t want to leave their seats, a specially developed app can be used for tickets, food delivery to your seat, and watching replays of on-field action.
The Miami Dolphins, meanwhile, have partnered with IBM and use technology from their “Smart Cities” initiative to monitor and react to weather forecasts, parking delays, and even shortages of concessions at specific stands in Sun Life Stadium. The Dallas Cowboys’ AT&T Stadium features 2,800 video monitors throughout the stadium as well as more than five million feet of fiber optic cable, used for everything from gathering data to ordering food in-suite.
NFL teams aren’t the only franchises making use of sports analytics. The Barclays Center, home of the Brooklyn Nets, uses Vixi to display properly hashtagged tweets on multiple big screens throughout the arena. They also use AmpThink, a series of networking tools that require the user to submit some personal information before logging onto the arena’s Wi-Fi; that way, they’re able to collect data on how and where people are logging in, as well as what services they’re using while in the arena. Fans can already order food and drink from their seats and replay sequences from various camera angles, and in the future, they’ll be able to use an app that gives information about restroom waits and directions to the restrooms with the shortest lines.
To some, the increase of connectivity might seem to take away from the experience of watching a game live; after all, how can you enjoy live action if you’re constantly staring down at your phone? On the contrary: by employing these apps to judge the shortest bathroom lines or order food directly to their seats, fans are able to stay in their seat longer and watch more of the games.
While this technology certainly isn’t cheap (and will be reflected in increased ticket prices), those extra minutes of action may be worth the higher cost to some fans. Ultimately, it’s up to the fans to decide if paying more for tickets is worth the premium experience — and the time saved waiting in line.
Bringing Fans Back, One Byte at a Time
Sports teams aren’t going to lose their fans to television without a fight. And with the majority of sports franchises embracing wearable and mobile data in some form or another, it’s a natural transition for marketing departments to apply that data to the fan experience. With easy access to Wi-Fi, snacks, replays, and shorter restroom lines, sports fans can combine the atmosphere of game day with the comfort of being in their own homes.
Originally posted on Data Science Central
Predictions are in our DNA. Millions of us live with them daily, from checking the weather to reading daily horoscopes. When it comes to Big Data, the industry has shown no shortage of predictions for 2014. In fact, you might have read about insights on women in data science, ambitions for Machine Learning or a vision for the consumerization of Advanced Analytics.
It is quite difficult to accurately assess when these predictions will materialize. Some of them will see the light of the day in 2014 but many might take until 2020 to fully mature.
Wearable Devices and Big Data
Take the case of wearable devices. There is no question that mobile phones, tablets and smart watches will become pervasive over the next 5 years. According to Business Insider, the market for wearables could reach $12B in 2018, and these devices have a strong potential for changing our habits altogether.
The only issue is how quickly we will adopt them and in turn get clear value from them. Pioneers like Robert Scoble have made a great case for the opportunity but also have provided a down to earth perspective for the rest of us (his recent article on “Why Google Glass is doomed ” is a gem).
So, I predict that, while the tipping point for such technologies might be 2014, the true disruption might not happen before 2020. Why? Definitions and Center of Design.
For starters, the definition of a “wearable device” is still very loose. I’m a big fan of devices like the Jawbone UP, the Fitbit and the Basis watch. In fact, I’ve built an analytical system that allows me to visualize my goals, measure and predict my progress already. My “smart devices” collect information I couldn’t easily understand before and offer the opportunity to know more about myself. Big Data growth will primarily come from these types of smart devices.
The wearables that are still confusing are the so-called “smart-watches”. These watches, in my opinion, suffer from a “Center of Design” Dilemma.
Let me explain: the technology industry is famous for wanting new technologies to sunset old ones. When Marc Benioff introduced Chatter, he said it would obliterate email. When PC shipments went down, the industry rushed to talk about the “Post-PC” era. Has either of these trends fully materialized yet?!
The answer is unfortunately not simple. Smart watches, phones, tablets and PCs all have distinct use cases, just like email and social apps. Expecting one technology to completely overlap another would be disregarding what I call a product’s “center of design”. The expression refers to the idea that a particular technology can be stretched for many uses but is particularly relevant for a set of defined use cases. Let’s take the example of the phone, tablet and PC:
- A phone is best used for quickly checking texts, browsing emails, calendar invites…and of course making phone calls (duh!)
- A tablet is best used for reading and browsing websites, documents, books and emails. Typing for 12 hours and creating content is possible but it’s not a tablet’s center of design…
- A PC or a MacBook is best for creating content for many hours. They might be best for typing, correcting and working on projects that require lots of editing.
When I see an ad like this on the freeway, I really question the value of an additional device. What can a watch add in this case, if the wrist that wears it is also connected to a hand that holds a much more appropriate device?
Big Data from Wearables is a Predictive Insight for 2020 in my opinion, because I think that, by then, the broad public will have embraced them into use cases that truly add value to their lives.
Bruno Aziza is a Big Data entrepreneur and author. He’s led Marketing at multiple start-ups and has worked at Microsoft, Apple and BusinessObjects/SAP. One of his startups sold to Symantec in 2008 and two of them have raised tens of millions and experienced triple digit growth. Bruno is currently Chief Marketing Officer at Alpine Data Labs, loves soccer and has lived in France, Germany and the U.K.
Originally posted on Data Science Central
Guest blog post by Mike Davie.
With the exponential growth of IoT and M2M, data is seeping out of every nook and cranny of our corporate and personal lives. However, harnessing data and turning it into a valuable asset is still in its infancy. In a recent study, IDC estimates that only 5% of data created is actually analyzed. Thankfully, this is set to change, as companies have now found lucrative revenue streams by converting their data into products.
Impediments to Data Monetization
Many companies are unaware of the value of their data, the types of customers who might be interested in that data, and how to go about monetizing it. To further complicate matters, many are also concerned that the data they possess, if sold, could reveal trade secrets and personal information about their customers, thus violating personal data protection laws.
Dashboards and Applications
The most common approach for companies who have embarked on data monetization is to develop a dashboard or application for the data, thinking that it would give them greater control over the data. However, there are several downsides to this approach:
- Limited customer base: the dashboard or application is developed with only one type of customer in mind, thus limiting the potential of the underlying data to reach a wider customer base.
- Data is non-extractable: the data in a dashboard or application cannot be extracted to be mashed up with other data, with which valuable insights and analytics could be developed.
- Long lead time and high cost to develop: average development time for a dashboard or application is 18 months, and expensive resources, including those of data scientists and developers, are required.
Data as a Product
What many companies have failed to realize is that the raw data they possess could be cleansed, sliced and diced to meet the needs of data buyers. Aggregated and anonymized data products have a number of advantages over dashboards and applications.
- Short lead time and less cost to develop
- The process of cleaning and slicing data into bite size data products could be done in a 2-3 month time frame without the involvement of data scientists.
- Wide customer base
- Many companies and organizations could be interested in your data product. For example, real time footfall data from a telco could be used in a number of ways:
- A retailer could use mall foot traffic to determine the best time of the day to launch a new promotion to drive additional sales during off-peak hours.
- A logistics provider could combine footfall data with operating expenses to determine the best location for a new distribution centre.
- A maintenance company could use footfall to determine where to allocate cleaners to maximize efficiency, while ensuring clean facilities.
- Data is extractable
- Data in its original form could be meshed and blended with other data sources to provide unique competitive advantages. For example:
- An airline could blend real time weather forecast data with customer profile data to launch a promotion package prior to severe bad weather for those looking to escape for the weekend.
- Real time ship positioning data could be blended with a port’s equipment operation data to minimize downtime of the equipment and increase overall efficiency of the port.
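As a sketch of the "cleansed, sliced and diced" aggregation described above, the following shows how raw footfall events might be turned into an aggregated, anonymized data product. The event schema, mall names, and k-anonymity threshold are all invented for illustration, not taken from any real telco feed:

```python
from collections import defaultdict

# Invented raw footfall events: (visitor_id, mall, hour_of_day).
# A real telco feed would look different; this is only a sketch.
raw_events = [
    ("u1", "Central Mall", 10), ("u2", "Central Mall", 10),
    ("u3", "Central Mall", 14), ("u1", "Harbour Mall", 18),
    ("u4", "Harbour Mall", 18), ("u5", "Harbour Mall", 18),
]

def footfall_product(events, k_anonymity=2):
    """Aggregate events into per-mall, per-hour counts, dropping
    visitor identifiers and suppressing cells below the k-anonymity
    threshold so no individual can be singled out."""
    counts = defaultdict(int)
    for _visitor, mall, hour in events:  # visitor_id is discarded here
        counts[(mall, hour)] += 1
    return {cell: n for cell, n in counts.items() if n >= k_anonymity}

print(footfall_product(raw_events))
# → {('Central Mall', 10): 2, ('Harbour Mall', 18): 3}
```

The single-visitor cell is suppressed, which is what makes the aggregated product safe to sell without exposing personal data.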
Monetizing your data does not have to be a painful and drawn-out undertaking if you view data itself as the product. By taking your data product to market, data itself can become one of your company's most lucrative and profitable revenue streams. By developing a data monetization plan now, you can reap the rewards of the new Data Economy.
About the Author:
Mike Davie has been leading the commercialization of disruptive mobile technology and ICT infrastructure for a decade with leading global technology firms in Asia, the Middle East, and North America.
He parlayed his vision and knowledge of the evolution of ICT into the creation of DataStreamX, the world's first online marketplace for real time data. DataStreamX's powerful platform enables data sellers to stream their data to global buyers across various industries in real time, multiplying their data revenue without having to invest in costly infrastructure and sales teams. DataStreamX's online platform puts a plethora of real time data at data-hungry buyers' fingertips, enabling them to broaden and deepen their understanding of the industry they compete in, and to devise effective strategies to out-manoeuvre their competitors.
Prior to founding DataStreamX, Mike was a member of the Advanced Mobile Product Strategy Division at Samsung where he developed go-to-market strategies for cutting edge technologies created in the Samsung R&D Labs. He also provided guidance to Asia and Middle East telcos on their 4G/LTE infrastructure data needs and worked closely with them to monetize their M2M and telco analytics data.
Mike has spoken at ICT and Big Data conferences including 4G World, LTE Asia, and the Infocomm Development Authority of Singapore's IdeaLabs Sessions. Topics of his talks include Monetization of Data Assets, Data-as-a-Service, and the Dichotomy of Real-Time vs. Static Data.
Originally posted on Data Science Central
Guest blog post by ajit jaokar
The Open Cloud – Apps in the Cloud
Based on my discussions at Messe Hannover, this blog explores the potential of applying Data Science to manufacturing and process control industries. In my new course at Oxford University (Data Science for IoT) and community (Data Science and Internet of Things), I explore the application of predictive algorithms to Internet of Things (IoT) datasets.
The Internet of Things plays a key role here because sensors in machines and process control industries generate a lot of data. This data has real, actionable business value (Smart Data). The objective of Smart Data is to improve productivity through digitization. I had a chance to speak to Siemens management and engineers about how this vision of Smart Data is translated into reality.
When I discussed the idea of Smart Data with Siegfried Russwurm, Prof. Dr.-Ing., Member of the Managing Board of Siemens AG, he spoke of key use cases that involve transforming big data into business value by providing context, increasing efficiency, and addressing large, complex problems. These include applications for oil rigs, wind turbines, process control industries, and more. In these industries, the smallest productivity increase translates to huge commercial gains.
This blog is my view on how this vision (Smart Data) could translate into reality within the context of Data Science and IoT.
Data: the main driver for Industrie 4.0 ecosystem
At Messe Hannover, it was hard to escape the term 'Industry 4.0' (in German, Industrie 4.0). Broadly, Industry 4.0 refers to the use of electronics and IT to automate production and to create intelligent networks along the entire value chain that can control each other autonomously. Machines generate a lot of data. In many cases, if you consider a large installation such as an oil rig, this data is bigger than the traditional 'Big Data'. Its use case is also slightly different: the value does not lie in capturing a lot of data from outside the enterprise, but rather in capturing (and making innovative uses of) a large volume of data generated within the enterprise. The 'smart' in Smart Data is predictive and algorithmic. Thus, data is the main driver of Industry 4.0, and it is important to understand the flow of data before it can be optimized.
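To make "predictive and algorithmic" concrete, here is a deliberately minimal sketch of the kind of check that turns raw machine data into Smart Data; the readings, window, and tolerance are invented for the example:

```python
# Toy rolling-baseline check on a machine vibration signal: flag
# readings that deviate sharply from the recent average, the simplest
# form of the predictive monitoring described above.
def flag_anomalies(readings, window=3, tolerance=0.5):
    flags = []
    for i in range(window, len(readings)):
        baseline = sum(readings[i - window:i]) / window
        flags.append(abs(readings[i] - baseline) > tolerance)
    return flags

vibration = [1.0, 1.1, 0.9, 1.0, 2.2, 1.0]  # invented sensor values
print(flag_anomalies(vibration))
# → [False, True, False]
```

The flagged spike, not the full stream, is the actionable signal; on an oil rig, even this trivial reduction applied at scale illustrates why the smallest productivity gain matters.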
The flow of Data in the Digital Enterprise
The 'Digital Factory' is already a reality, built on Industrial Ethernet standards like Profinet, PLM (Product Lifecycle Management) software like Teamcenter, and data models for lifecycle engineering and plant management such as Comos. To extend the Digital Factory to achieve end-to-end interconnection and autonomous operation across the value chain (as is the vision of Industry 4.0), we need one more component in the architecture.
The Open Cloud: Paving the way for Smart Data analytics
In that context, the cooperation of Siemens with SAP to create an open cloud platform is very interesting. The Open Cloud enables 'apps in the cloud' based on the intelligent use of large quantities of data. The SAP HANA architecture, based on an in-memory, columnar database, provides analytics services in the cloud. Examples include "Asset Analytics" (increasing the availability of machines through online monitoring, pattern recognition, simulation, and prediction of issues) and "Energy Analytics" (revealing hidden energy-savings potential).
While it is early days, based on the above, the manufacturing domain offers real value and tangible benefits to customers. Even now, we see that customers who harness value from large quantities of data through predictive analytics stand to gain significantly. I will cover this subject in more detail as it evolves.
About the author
Ajit's work spans research, entrepreneurship and academia relating to IoT, predictive analytics and mobility. His current research focus is on applying data science algorithms to IoT applications. This includes time series, sensor fusion and deep learning. This research underpins his teaching at Oxford University (Big Data and Telecoms) and the City Sciences program at the Technical University of Madrid (UPM). Ajit also runs a community/learning program for Data Science and IoT through his company, futuretext.
guest blog by Jin Kim, VP Product Development for Objectivity, Inc.
Almost any popular, fast-growing market experiences at least a bit of confusion around terminology. Multiple firms are frantically competing to insert their own “marketectures,” branding, and colloquialisms into the conversation with the hope their verbiage will come out on top.
Add in the inherent complexity at the intersection of Business Intelligence and Big Data, and it’s easy to understand how difficult it is to discern one competitive claim from another. Everyone and their strategic partner is focused on “leveraging data to glean actionable insights that will improve your business.” Unfortunately, the process involved in achieving this goal is complex, multi-layered, and very different from application to application depending on the type of data involved.
For our purposes, let’s compare and contrast two terms that are starting to be used interchangeably – Information Fusion and Data Integration. These two terms in fact refer to distinctly separate functions with different attributes. By putting them side-by-side, we can showcase their differences and help practitioners understand when to use each.
Before we delve into their differences, let's take a look at their most striking similarity. Both of these technologies and best practices are designed to integrate and organize data coming in from multiple sources and present a unified view of that data for consumption by various applications, making it easier for analytics applications to derive the "actionable insights" everyone is looking to generate.
However, Information Fusion diverges from Data Integration in a few key ways that make it much more appropriate for many of today’s environments.
• Data Reduction – Information Fusion is, first and foremost, designed to enable data abstraction. So, while Data Integration focuses on combining data to create consumable data, Information Fusion frequently involves "fusing" data at different abstraction levels and with differing levels of uncertainty to support a narrower set of application workloads.
• Handling Streaming/Real-Time Data – Data Integration is best used with data-at-rest or batch-oriented data. The problem is that the most compelling applications associated with Big Data and the Industrial Internet of Things are often based on streaming sensor data. Information Fusion is capable of integrating, transforming and organizing all manner of data (structured, semi-structured, and unstructured), and specifically time-series data, for use by today's most demanding analytics applications, bridging the gap between Fast Data and Big Data. Another way to put this: Data Integration creates an integrated set of data in which the larger set is retained. By comparison, Information Fusion uses multiple techniques to reduce the amount of stateless data and deliver only the stateful (valuable and relevant) data, with improved confidence.
• Human Interfaces – Information Fusion also adds in the opportunity for a human analyst to incorporate their own contributions to the data in order to further reduce uncertainty. By adding and saving inferences and detail that can only be derived with human analysis and support into existing and new data, organizations are able to maximize their analytics efforts and deliver a more complete “Big Picture” view of a situation.
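The data-reduction idea in the first two points can be sketched in a few lines: keep only the readings that change the known state, and drop the stateless repeats. The scheme and threshold below are illustrative, not taken from any particular fusion product:

```python
def fuse_stream(readings, min_delta=1.0):
    """Keep only readings that move the known state by more than
    min_delta; near-identical repeats are dropped. This is the
    simplest form of the stateless-data reduction described above."""
    state = None
    kept = []
    for r in readings:
        if state is None or abs(r - state) > min_delta:
            kept.append(r)
            state = r
    return kept

stream = [20.0, 20.1, 20.2, 25.0, 25.1, 19.8]  # e.g. a temperature sensor
print(fuse_stream(stream))
# → [20.0, 25.0, 19.8]
```

Six raw readings become three stateful ones; at sensor-network scale, this is the difference between shipping Fast Data downstream and drowning in it.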
As you can see, Information Fusion, unlike Data Integration, focuses on deriving insight from real-time streaming data and enriching this stream with semantic context from other Big Data sources. This is a critical distinction, as today's most advanced, mission-critical, analytical applications begin looking to Information Fusion to add real-time value.
Originally posted on Data Science Central
Guest blog post by Bill Vorhies
Now that everyone is thinking about IoT and the phenomenal amount of data that will stream past us and presumably need to be stored, we need to break out a vocabulary well beyond our comfort zone of mere terabytes (about the size of a good hard drive on your desk).
In this article, Beyond Just “Big” Data, author Paul McFedries argues for nomenclature even beyond geopbytes (and I'd never heard of that one). There is a presumption, though, that all that IoT data actually needs to be stored, which is misleading. We may want to store some big chunks of it, but increasingly our tools allow for 'in-stream analytics' and for filtering the stream to identify only the packets we're interested in. I don't know that we'll ever need to store geopbytes, but you'll enjoy his argument. Use the link Beyond Just “Big” Data.
Here's the beginning of his thoughts:
Beyond Just “Big” Data
We need new words to describe the coming wave of machine-generated information
When Gartner released its annual Hype Cycle for Emerging Technologies for 2014, it was interesting to note that big data was now located on the downslope from the “Peak of Inflated Expectations,” while the Internet of Things (often shortened to IoT) was right at the peak, and data science was on the upslope. This felt intuitively right. First, although big data—those massive amounts of information that require special techniques to store, search, and analyze—remains a thriving and much-discussed area, it’s no longer the new kid on the data block. Second, everyone expects that the data sets generated by the Internet of Things will be even more impressive than today’s big-data collections. And third, collecting data is one significant challenge, but analyzing and extracting knowledge from it is quite another, and the purview of data science.
Guest blog post by ajit jaokar
Often, Data Science for IoT differs from conventional data science due to the presence of hardware.
Hardware could be involved in integration with the Cloud or Processing at the Edge (which Cisco and others have called Fog Computing).
Alternatively, we see entirely new classes of hardware specifically designed for Data Science for IoT (such as the SyNAPSE chip for deep learning).
Hardware will increasingly play an important role in Data Science for IoT.
A good example is from a company called Cognimem, which natively implements classifiers (unfortunately, the company no longer appears to be active, as per their Twitter feed).
In IoT, speed and real time response play a key role. Often it makes sense to process the data closer to the sensor.
This allows for a limited / summarized data set to be sent to the server if needed and also allows for localized decision making. This architecture leads to a flow of information out from the Cloud and the storage of information at nodes which may not reside in the physical premises of the Cloud.
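The edge pattern described above can be sketched as follows: the device acts locally on each raw reading, and only a compact summary leaves for the cloud. The readings, summary fields, and alarm threshold are invented for illustration:

```python
# Sketch of edge-side processing: decide locally on each raw sample,
# and forward only a small summary to the server.
def process_at_edge(samples, alarm_threshold=80.0):
    alarms = [s for s in samples if s > alarm_threshold]  # handled locally
    summary = {                      # only this summary leaves the device
        "count": len(samples),
        "mean": sum(samples) / len(samples),
        "max": max(samples),
        "alarms": len(alarms),
    }
    return summary

readings = [72.0, 75.5, 81.2, 79.9]  # invented sensor samples
print(process_at_edge(readings))
```

Four raw samples collapse into one summary record, which is exactly the "limited / summarized data set sent to the server" and "localized decision making" the architecture calls for.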
In this post, I try to explore the various hardware touchpoints for Data analytics and IoT to work together.
Cloud integration: Making decisions at the Edge
Intel's Wind River edge management system is certified to work with the Intel stack and includes capabilities such as data capture, rules-based data analysis and response, configuration, file transfer, and remote device management.
Integration of Google Analytics into Lantronix hardware allows sensors to send real-time data to any node on the Internet or to a cloud-based application.
Microchip's integration with Amazon Web Services uses an embedded application with the Amazon Elastic Compute Cloud (EC2) service, based on the Wi-Fi Client Module Development Kit. Languages like Python or Ruby can be used for development.
Integration of Freescale and Oracle, which consolidates data collected from multiple appliances across multiple Internet of Things service providers.
Libraries are another avenue for analytics engines to be integrated into products, often at the point of creation of the device. Xively cloud services is an example of this strategy, through the Xively libraries.
In contrast, keen.io provides APIs for IoT devices to create their own analytics engines (for example, the Pebble smartwatch's use of keen.io) without locking equipment providers into a particular data architecture.
We see increasing deployment of specialized hardware for analytics, for example Egburt from Camgian, which uses sensor fusion technologies for IoT.
In the deep learning space, GPUs are widely used, and more specialized hardware is emerging, such as IBM's SyNAPSE chip. Even more interesting hardware platforms are appearing, such as Nervana Systems, which creates hardware specifically for neural networks.
Ubuntu Core and IFTTT spark
Two more initiatives on my radar deserve a space of their own, even though neither currently has an analytics engine: Ubuntu Core (Docker containers plus a lightweight Linux distribution as an IoT OS) and the IFTTT Spark initiative.
This post leads to the vision for the Data Science for IoT course/certification. Please sign up on the link if you wish to know more when it launches in Feb.
Image source: cognimem