Originally posted by Vincent Granville
It's time again to share your predictions for 2017. I did my homework and came up with these 10 predictions. I invite you to post your own predictions in the comment section, or to write a blog post about them. Ramon Chen's predictions are posted here, while you can read Tableau's predictions here. Top programming languages for 2017 can be found here. Gil Press' list of the top 10 hot data science technologies is also worth reading. For those interested, here were the predictions for 2016. Finally, MariaDB discusses the future of analytics and data warehousing in their Dec 20 webinar.
- Data science and machine learning will become more mainstream, especially in the following industries: energy, finance (banking, insurance), agriculture (precision farming), transportation, urban planning, healthcare (customized treatments), even government.
- Some people with no familiarity with data science will want to create a legal framework governing how data can be analyzed and how algorithms should behave, and to force public disclosure of algorithm secrets. I believe that they will fail, though Obamacare is an example where predictive algorithms were required to ignore metrics such as gender or age when computing premiums, resulting in more expensive premiums for everyone.
- The rise of sensor data - that is, IoT - will create data inflation. Data quality, data relevancy, and security will continue to be of critical importance.
- With the rise of IoT, more processes will be automated (piloting, medical diagnosis and treatment) using machine-to-machine or device-to-device communications powered by algorithms relying on artificial intelligence (AI), deep learning, and automated data science. I am currently writing an article that describes the differences between machine learning, IoT, AI, deep learning and data science. You can sign up on DSC to make sure that you won't miss it.
- The frontiers between AI, IoT, data science, machine learning, deep learning and operations research will become fuzzier. Statistical engineering will be present in more and more applications, be it machine learning, AI or data science.
- Many systems will continue to work poorly. The solution will have to be found not in algorithms, but in people. Read my article Why So Many Machine Learning Implementations Fail. An example is Google Analytics, which fails to catch huge amounts of robotic traffic that is so rudimentary and so obvious that you don't need any statistical or data science knowledge to filter or block it. People publish elementary solutions to address these issues, yet the problem continues unabated. Fake reviews, fake news, undetected hate speech on Twitter, and plagiarism undetected by Google search are in the same category. Eventually, this leaves room for new players to jump in and build a system that will actually work.
- Reliance on public data and public news will come under greater scrutiny. Some say that the failure to predict the elections is a data science failure. In my opinion, it is a different type of failure: it is the failure to recognize that the media are biased (they publish whatever predictions fit their agenda) and that maybe even those doing the surveys are biased or incompetent (there are lies, damn lies, and statistics, as the saying goes). It is also a failure to recognize the very high volatility in these elections, and the fact that day-to-day variations were huge. Anyone able to compute sound confidence intervals that incorporate historical data would have said that the results were not reliably predictable. Finally, I always thought that the winner would be the one best at manipulation and playing tricks, be it hacking or paying the media.
- More and more data cleaning, pre-processing, and exploratory data analysis will be automated. We will also face more unstructured data, along with more powerful ways to structure it. Multiple algorithms and models will increasingly be blended together to provide the best pattern recognition and predictive systems, and to boost accuracy.
- Data science education will evolve, with perhaps a comeback of strong university curricula run by leading practitioners, and fewer people finding a job through data science camps only. Many of these camps do not train you to become a data scientist, but instead a Python / R / SQL coder with classic, elementary, even outdated and dangerous statistical knowledge. Alternatively, data camps will have to evolve, or risk becoming another kind of University of Phoenix.
- Attacks against data-dependent infrastructure will shift from stealing or erasing data to modifying data. Some will be launched from IoT devices if security holes are not fixed.
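
To make the prediction about blending algorithms and models a bit more concrete, here is a minimal sketch of the simplest form of blending: averaging the predictions of several models. Everything in it is an assumption made for illustration. The function names `blend` and `mean_abs_error`, the two "models", and all the numbers are made up and do not come from any real system.

```python
def blend(predictions, weights=None):
    """Weighted average of per-model predictions (one list of scores per model)."""
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all models must score the same observations")
    if weights is None:
        weights = [1.0] * len(predictions)  # equal weights by default
    if len(weights) != len(predictions):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    return [
        sum(w * p[i] for w, p in zip(weights, predictions)) / total
        for i in range(n)
    ]


def mean_abs_error(predicted, actual):
    """Average absolute difference between predictions and observed values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical data: observed values and the output of two imaginary models
# whose errors point in opposite directions on most observations.
actual = [10.0, 20.0, 30.0, 40.0]
model_a = [12.0, 18.0, 33.0, 37.0]
model_b = [9.0, 23.0, 28.0, 44.0]

blended = blend([model_a, model_b])

print("model A error:", mean_abs_error(model_a, actual))
print("model B error:", mean_abs_error(model_b, actual))
print("blended error:", mean_abs_error(blended, actual))
```

In this toy case the blended predictions are closer to the observed values than either model alone, because the errors of the two models partly cancel out. That is the general caveat with blending: it helps most when the models make different mistakes, and much less when their errors are strongly correlated.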