Cassie Kozyrkov: “Everyone already has a foot in the data science profession”

September 25, 2018

IT Arena kicks off on Friday! The organizing team is busy with the finishing touches, but we continue to introduce you to our speakers. This time, we talked to Cassie Kozyrkov, Chief Decision Scientist at Google Cloud. Learn how the profession of data scientist has evolved over the last decade, how data-driven organizations function, and how research AI/ML differs from applied AI/ML.


Does your degree in cognitive neuroscience help in the Chief Decision Scientist job?

My area of expertise is neuroeconomics, otherwise known as the neuroscience of human decision-making, which sounds like something that should be right up a decision scientist’s alley… and it is.  Since I’m interested in the everything of decision-making, it helps to have a biological appreciation for the topic.

It seems you have devoted your entire career to data science. How has the role of data science changed over the last 10 years, both in the technology world and in everyday life?

I remember the jokes going around 10 years ago, back when statistics ruled the roost, for example: “How do you define a data scientist? It’s a statistician who lives in Silicon Valley.” Or “How do you define data science? It’s statistics on a Mac.” Today, we recognize it as something broader than statistics.

Data science wasn’t on such firm footing back then, but more recently it’s grown in popularity.

So much so that everyone seems to want to rebrand themselves into it.  With every ‘data scientist’ title I’ve held, I had already been doing the job under a different name before naming czars in HR applied a little nip-tuck to the employee database. My duties didn’t change in the slightest.  So it’s hot enough that today it sometimes feels like everyone wants in, and that’s definitely a change from a decade ago.

Another difference is that the work in data science has become much more vital. More companies are collecting data and they’re demanding that it be made useful. To me, that’s what data science is all about – it’s the discipline of making your data useful. As data grows, so does the need for data science.

Finally, now that there are more people in it, I’ve noticed how much more vibrant the community is. It’s an exciting space and I’m thrilled to be part of it!

How do you envision the near future of data science?

Thanks to the academic researchers who have already done so much amazing work in developing general-purpose tools, the rest of the community is able to do work that’s more about creatively applying the components made by researchers. I think the near future will be about standing on the shoulders of giants. We’re moving towards an era of application. I also believe it will be an era of more diversity and inclusion in the field. Creativity thrives when we blend different perspectives. After all, we’re so much more creative together than any one of us is alone.

Is data science something anyone can do?

Everyone – and I mean everyone – already has a foot in the data science profession.  Descriptive analytics (a.k.a. data-mining) is all about visualizing data and if you’ve opened a digital photo, you’ve visualized data. Getting started with that doesn’t take a special degree.  You’re already doing it. But if you want to talk about excellence in analytics, the key is speed. Beginners aren’t very quick at looking at a million photos because they haven’t learned the most convenient tools for that and they might not be sure how to look at something that isn’t a photo.  The good news is that you start out slow and as you learn, you get faster and faster at looking at different kinds of data. Go for it and have fun; analytics is something that everyone can get into.

To be a data scientist, as opposed to a data analyst, you need to be able to do the whole thing: data-mining plus the other two areas in data science (statistical inference and machine learning).  Machine learning is almost as easy as analytics, but statistics is heavy on philosophy, so it takes more time and effort to learn. I believe that everyone can become a full data scientist if they put in the hard work.  But will everyone have the time to do it? That’s a different question.

How is the academic approach different from the business approach to AI and ML?

I prefer to call these research AI/ML and applied AI/ML, since it’s possible to be in the business of research.  An analogy I often use is that the research side is about innovating in microwaves, while the applied side is about innovating in recipes.

In the research approach, the emphasis is on how to make the algorithms (microwaves) better-faster-stronger or how to apply them to whole new classes of problems. Usually, the dataset is not a primary focus because most researchers work with curated and pre-cleaned benchmark datasets. The business application is even less important because figuring out what to apply the shiny new algorithm to will be someone else’s problem. Once you have your benchmark dataset and you know how well everyone else’s algorithms performed on it according to a standard metric for your academic focus area, you work hard to make a new algorithm that leaves those others in the dust. That’s more or less what the research game is all about in machine learning.

The applied approach, on the other hand, cares most about solving a problem that an organization has.  It isn’t about general or benchmark datasets, it’s about specific datasets a specific business has. What it means to get the job done is defined in terms of what matters to the leadership team of that specific business. It may or may not take into account computational efficiency and hardware compatibility.  The data are of paramount importance – they pose unique challenges and opportunities to that specific organization. They’re not a theoretical concept that will be someone else’s problem. They’re immediate. The algorithm, though, is only a means to an end. The business needs a system that works on their data and usually doesn’t care so much what algorithm wins.  If you’re business savvy, you’d adore finding out that a simple algorithm from a century ago does the job perfectly for you (it means the solution is cheap!) and you’re happy not to have to shell out piles of cash for a bespoke solution (you’d only do that if the simple options don’t work). Researchers, on the other hand, would be out of a job if the algorithm that performed best for their setting was last century’s old news.  Different perspectives indeed!

How does a data-driven organization function, and what are some examples of data-driven organizations? How does this approach help organizations succeed?

In order for data to drive decisions, business leaders have to frame the decision before anyone examines the data. There can be a mountain of setup work, especially in statistical inference (which is part of the testing step in AI/ML). If decision-makers aren’t skilled and thorough at doing this initial work, the organization’s decision-making will at best be data-inspired, not data-driven. Data-driven organizations have leaders with the expertise and skills to transform information into better actions. Such organizations make higher quality decisions, control risk more reliably, and are more likely to succeed at building good ML/AI systems.

You dedicate quite a lot of time to public speaking. What is your mission and what is your main message to people?

The world is collecting more data than ever before.  With it, you can achieve things that would have seemed impossible even a few years ago.  I don’t want people to miss the opportunity to make their dreams real.

Literacy with data and the science of decision-making goes a long way towards making sure that people are the architects of their own bright futures. That’s something I’m really passionate about.  I hope to empower everyone to become data literate, to get value out of their data, and to do more with less. Now is the time to invite everyone in – this isn’t just for the experts and professors.  Let’s all be part of the conversation!

Your profile says you particularly enjoy arts. So do we 🙂 Please share your top 3 artworks, novels and performances.

  • (Artwork) The Entire City by Max Ernst

  • (Literature) The Master and Margarita by Mikhail Bulgakov

  • (Performing arts) Underneath by Pat Kinevane

If you still haven’t purchased an IT Arena ticket, this is your last chance.

You can attend Cassie Kozyrkov’s talk on September 29, 11:30-12:15 at Tech Track 1.