The Predictive Analytics Summit was held in San Diego this week. Kudos to Innovation Enterprise for doing a very good job of pulling together a diverse lineup of speakers, representing a balanced cross section of analytics/data science leaders at their respective companies. This does not always happen at data conferences. Listening to several of the talks throughout the first day was a good way to take the pulse of analytics integration across multiple industries. It was also good to hear that while many of the organizations represented are facing similar challenges, they are indeed bullish on the opportunities ahead of us. In addition, the talks served as a good sense-check of how ‘data science’ has evolved; not as a discipline, but as a way of thinking about technology and analytics, and their integration into any type of commercial organization. Remember that data science and analytics are not just for the Bloombergs, the Googles, and the IBMs of the world. Fortunately for me, my talk came at the end of the day, giving me the ability to weave some of what I heard throughout the day into the discussion. A few of the points that I stressed in my talk follow:
Let’s get past the ‘Data Science’ moniker. There is no one-size-fits-all data scientist. Just as a biologist can range from a molecular biologist working on genomic sequencing to a field ecologist working in wildlife biology, a data scientist can be many different things. Being cognizant of the other skill sets that contribute to data science is not the same thing as being an expert in everything, which, given the pace and scope of the tools and technologies involved in analysis and machine learning, is nothing short of impossible.
Domain expertise is undervalued. This is related to the point above, but goes further: while many people can be fantastic technical specialists, understanding what they want to get out of the data and translating the findings into something of value is not a simple thing to do. The perspective that an individual brings when developing solutions in a team environment is also important, as is creativity in the approach. Given the bias that comes from my own route into data science (mathematics & physics), I can almost look at the team that I want as a Complex Adaptive System, evolving along with the profession and the relevant technologies. (See the writings of Yaneer Bar-Yam of the New England Complex Systems Institute for more on this.)
It is also important to be clear on the distinction between the roles of data scientist and business analyst. This is not in any way an attempt to diminish the value of an analyst, but the work of an analyst by definition does differ from the role of a scientist. A scientist needs to be looking at problems and asking questions of the data, constructing an environment that is amenable to machine learning, and thinking about scale. Many roles that fall under data science can easily default to analyst work.
Data Science/Analytics/Machine Learning is not a Zero-Sum Game. When improvements enter the market, whether new insights or new technologies, the entire ecosystem benefits.
Diversity & STEM. We do not only hire students right out of their master’s or PhD programs. IBM has been a very big proponent of expanding opportunities for non-traditional candidates. For more on this, continue reading here.
The Wisdom of Crowds and Blind Faith in Models. I feel that this is one of the most important themes in data science today, and one which I will expand upon in future articles. Data science and machine learning are centered around deriving value from data and building quantitative solutions that (hopefully) have some predictive capacity. And we are getting very good at doing this. However, the misuse of models, and the trust that can be placed in learning systems without skepticism, can also be dangerous. Here, I draw on the many years that I spent (and still spend) in the commodities sector, where I built models that pulled together diverse data to build market positions around risk. Every day at the end of the trading session, my scoreboard was the market. If my thesis was correct, there was confirmation, and if wrong, there was nowhere to hide. But this transparency is not always present in other disciplines that require decision support. What this should force data science/machine learning practitioners to do is to always be suspicious of models, even if they ‘know’ that they are accurate. There will always be something that happens tomorrow that today’s model equations do not account for, so there are no true quantitative absolutes. It follows that the appropriate course is to always anticipate that your model or predictive system will fail, and to then build into your structure of decisions a way to protect against the downside associated with bad recommendations.
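To make that last point a little more concrete, here is a minimal Python sketch of one way to build downside protection around a model's recommendation. It is only an illustration, not the system I used in the commodities sector: the `Recommendation` class, the `position_size` and `should_exit` functions, and all of the numbers are hypothetical. The idea is to decide up front how much capital you are willing to lose if the model is wrong, size the position from that limit, and define the exit before the trade is placed.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    """Hypothetical model output: a direction and the price it was issued at."""
    direction: int       # +1 for long, -1 for short
    entry_price: float


def position_size(capital: float, max_loss_fraction: float,
                  entry_price: float, stop_price: float) -> float:
    """Units to hold so that exiting at stop_price loses at most
    max_loss_fraction of capital. Ignores gaps, slippage, and fees."""
    loss_per_unit = abs(entry_price - stop_price)
    if loss_per_unit == 0:
        raise ValueError("stop_price must differ from entry_price")
    return (capital * max_loss_fraction) / loss_per_unit


def should_exit(rec: Recommendation, stop_price: float,
                market_price: float) -> bool:
    """True once the market has moved against the recommendation past the stop."""
    if rec.direction > 0:
        return market_price <= stop_price
    return market_price >= stop_price


# Illustrative numbers only: risk 1% of capital on a long recommendation.
rec = Recommendation(direction=+1, entry_price=50.0)
stop = 48.0
units = position_size(capital=100_000, max_loss_fraction=0.01,
                      entry_price=rec.entry_price, stop_price=stop)
worst_case_loss = units * abs(rec.entry_price - stop)
```

Note that the size of the position comes from the loss limit, not from how confident the model is. A stop like this does not guarantee the cap in practice, since markets can gap through it, which is one more reason to stay suspicious of any single safeguard.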
This conversation will extend through more talks at upcoming conferences and forums. Thanks again to Meg Rimmer and the Innovation Enterprise staff for organizing an engaging and worthwhile event.