The Zen of Data Science Eugene Dubossarsky Chief Data Scientist – Principal Founder –
Presentation Summary - Promised -Key concepts, dos and don'ts of Data Science -Science and engineering : very different! - What are Data Scientists for? - Where should Data Science sit in the business? - How should data science be measured, managed, planned? - Starting, nourishing and growing a successful Data Science function in your business skills and experience - Becoming an effective data scientist
Presentation Summary – But Actually More Like... Shameless self promotion Parables Metaphors Abstract Philosophical Stuff Surprises Challenges and Reframes You saying “This is relevant to my life how?”
Presentation Summary Tools vs Ideas – Science vs Technology Finding vs Building – Science and Engineering Engagement Exploration – a legitimate, vital and strategic business activity Intelligence – a business function Mastery Apprenticeship
The “Zen” bit The bare essence The kernel of truth The thing that isn't illusion The way (Tao) to enlightenment (Satori) Clarity and simplicity derived from meditation, possibly quite different to everyday experience
Parable 1: Getting Airports Wrong Everybody thinks that this is an airplane:
Parable 1: Getting Airports Wrong Imagine your job is to build an airport. You need to take the design of airplanes into account. The only problem is:
Parable 1: Getting Airports Wrong This is what is called a “fundamental category error”. Anything done with this misconception in place will be a waste of time, money and resources. “Working around it”, and “being realistic about the client's expectations” is a bit beside the point.
Parable 1: Getting Airports Wrong Most people probably want to focus on the aerodynamics of the “airplane” as currently conceived, the buzz around technology to support such “airplanes” and may see this as being “business focused”, while more fundamental discussions would be seen as “negative”, “academic” or too “challenging”.
Parable 1: Getting Airports Wrong Nevertheless, getting the fundamental issue sorted out would seem to be the first order of business, no matter how abstract, controversial, politically inconvenient or offensive to some quarters, or how many people have built careers managing, selling and practicing in this paradigm.
Parable 1: Getting Airports Wrong Because... Uh.. Donkey ?
Data, Science, Tools and Definitions Data Scientist = “Hadoop Guy” ? “Guy Who Does Stuff with Data” ? Guy Who Does Stuff with Lots of Data ? Guy Who Does Stuff with Big Data ? Guy Who Does Stuff With Big Data That Sounds Cool or Businessy? (And what makes Data “Big” anyway?)
Science and Engineering Is there a difference ? What is it ? What is a “Data Engineer” ? What is a “non-Data Engineer” ?
Science and Engineering Are actually direct opposites Skills, positioning, personality types, appropriate management frameworks and place in the business are quite different. The confusion needs sorting out.
Now I've Lost You... That's not “realistic” - most “data scientists” are actually “engineers” by this framework ! That sounds too “technical”, “academic” or not “relevant to business”
Now I've Lost You... That's not “realistic” - most “data scientists” are actually “engineers” Yep. That sounds too “technical”, “academic” or “not relevant to business” Maybe, Too Bad and No
Engineering Start with an identified idea, end with a design Build or maintain something to pre-defined parameters Uncertainty is the enemy (time, budget, resources, performance)
Engineering Plans, Timeframes and Specifications, vs ongoing (loosely focused) discussion Delivers Products and pre-determined KPIs. The Unexpected is a (usually unwelcome) exception Works to milestones and a specification Engaged with operational and technical management
Engineers Outcomes are Things An Engineer may do more or less the same thing many times An Engineer performs “projects” and manages “processes” An engineer is managed according to tight requirements
Engineers easier to identify easier to manage easier to understand less stressful to deal with Easier to train more plentiful easier to recruit
Engineers And Data Data is a resource to move and manipulate Focus is on building and maintaining processes that do that Data is a “commodity” that flows through the system. The focus is on the system.
Science and Scientists Start with reality - derive new insights Uncertainty is your job “Projects” and “processes” are anathema, and people who manage them don't help Explore and Interrogate Data No two jobs are the same No job can be specified too tightly Findings are inherently uncertain, otherwise why bother ?
Scientists and Data Focused on The Data. Tools help but don't feature. Data is complex, an undiscovered country to explore. Data is not a commodity : it is complex, ever-changing and information rich
Scientists and The CEO Data is “The Last Frontier”, where dangers lurk and opportunities abound. The scientist is the guide. Objective is to Tell the Story of the Data, to someone who cares and matters (ideally CEO), preferably as part of an ongoing conversation
Science and Engineering Scientists help you identify new risks and opportunities, they provide transformational insights. Engineers make transformations tangible Scientists explore Engineers deliver and maintain The personality types are actually quite different
Science and Engineering There is a lot of crossover It is good to be skilled in both Many of the tools used are the same The distinction is not obvious to most outsiders The distinction is crucial
Why the Confusion? It's all “technical”, apparently. It has the word “data” in it. Some vendors like it that way. Much of management likes it that way. Much of management is out of its depth. And almost all of HR and recruiting.
Science and Engineering Real Business Needs Both Pretend Business only needs Engineering (and maybe not even that) Science is crucial for real competition and risk Science is irrelevant otherwise Engineering is Delivery Science is Intelligence
The Intelligence Function – Where Data Science Should Sit in the Business? Absent in most “enterprises” Present informally in most real businesses A strategic, secret asset not to be bragged about or shared “Data” is not just structured, electronic, concrete or even conscious
The Intelligence Function Strategic, secret role Trusted, discreet, low-key advisor, mentor, guide A mix of Mr Spock, James Bond and Steve Jobs Many guises, many names Well understood by militaries at war, and organisations with real challenges, risks and uncertainty Often next in line for CEO
The Intelligence Function – Where Data Science Should Sit in the Business Not IT Not Operations Right near the CEO Reporting directly, discreetly, interactively Not managed by Prince2, waterfall or any other “project management” or “Business Analysis” methods Lean Startup, real Agile (see Manifesto) and OODA loop much more like it
Data Science and Analytics Today Insights or Process ? Tools or Outcomes ? Transformation or BAU ? Value or Compliance ? Asset or Vanity ? Engaged or Disengaged ? Measured ?
Insights vs Process Insights CANNOT be the same each time. But Much of “Analytics” can Deriving value from predictive targeting is a repeatable, mechanical process. Deriving value from insights derived from the same model is not.
Insights vs Process Only one requires a scientist. Only one is valued by businesses that don't have real competitive, environmental and other change pressures.
Data Science and Analytics Today Insights or Process ? Tools ? Transformation or BAU ? Value or Compliance ? Asset or Vanity ? Engaged or Disengaged ? Measured ?
Tools and Trinkets Is “Hadoop” really the most important thing on a “data scientist's” resume ? Why or why not ? What is missing ?
Data Science and Analytics Today Insights or Process ? Tools ? Transformation or BAU ? Value or Compliance ? Asset or Vanity ? Engaged or Disengaged ? Measured ?
Data Science and Analytics Today Insights or Process ? Tools or Science ? Transformation or BAU ? Value or Compliance ? Asset or Vanity ? Engaged or Disengaged ? Measured ?
Data Science and Analytics Today Insights or Process ? Tools or Science ? Transformation or BAU ? Value or Compliance ? Vital Asset or Vanity ? Engaged or Disengaged ? Measured ?
Value, Compliance or Vanity ? What would happen to the business if the analytics/data science/data mining function disappeared overnight ? Who would care ? Why ? Why does the function exist in the business in the first place ? Science does not serve vanity well, and is not necessary for compliance.
Data Science and Analytics Today Insights or Process ? Tools or Science ? Transformation or BAU ? Value or Compliance ? Vital Asset or Vanity ? Leadership Engaged or Disengaged ? Measured ?
Engagement in Parables Is investing in data analytics like investing in stocks or investing in an education (or gym membership) ? If analytics were a taxi, would the CEO see the analytics function as car mechanics, drivers or tour guides ? Does he know ? Does he care ?
Engagement in Extremes Analytics in a hedge fund Analytics in a bank Basel II compliance analytics in a bank What are the KPIs ? Does the CEO personally care about them ? Can the organisation do without the analytics function ? Can the organisation afford the CEO ignoring the analytics function ?
Data Science and Analytics Today Insights or Process ? Tools or Science ? Transformation or BAU ? Value or Compliance ? Vital Asset or Vanity ? Leadership Engaged or Disengaged ? Measured ?
Measurement How many predictive analytics functions in banking, telco, insurance etc. are measured explicitly on improvement in predictive accuracy, with the CEO keeping an eye on this (retention, acquisition, risk, pricing models) ? How many know or care about the predictive accuracy of their competitors ?
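What "measured on improvement in predictive accuracy" could look like in practice, as a minimal sketch only. The labels and scores are made up, the model names are hypothetical, and AUC is just one possible accuracy measure (the slides do not name one).

```python
from itertools import product


def auc(labels, scores):
    """Chance that a random positive case outscores a random negative one.

    Ties count as half a win. This is the pairwise (Mann-Whitney) form of AUC.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical retention data: 1 = customer churned, 0 = customer stayed.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
# Hypothetical churn scores from last quarter's model and this quarter's model.
old_model_scores = [0.6, 0.5, 0.4, 0.7, 0.2, 0.8, 0.3, 0.1]
new_model_scores = [0.7, 0.4, 0.6, 0.65, 0.2, 0.9, 0.3, 0.1]

old_auc = auc(labels, old_model_scores)
new_auc = auc(labels, new_model_scores)
improvement = new_auc - old_auc  # the number a CEO could keep an eye on

print(f"old AUC {old_auc:.3f}, new AUC {new_auc:.3f}, change {improvement:+.3f}")
```

The same comparison, run on the same held-out customers each period, gives a single trend line for a retention, acquisition, risk or pricing model.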
Finding Data Scientists Data Scientists are part engineer, part entrepreneur and part hunter/gatherer – outcome focused explorers ! ADHD is an asset, personality profile is not typical corporate Communication skills and lateral thinking as important as technical skill Technical skills are DEEEEP, eclectic