What’s in a Ph.D. Degree: The Make of a Data Scientist


I recognized the young man when he came up to me after my lecture, while other people were still gathered around. During the lecture, he had been sitting in the front row, watching attentively. He looked like a typical engineer you see everywhere in the Valley. He hesitated a little, then talked about his current job at a famous company (the one I buy my phone from). He had been a software engineer for a while. “But I am excited about AI, and I want to become a data scientist. What should I do? Should I go back to school to get a Ph.D. or master’s in AI?” I paused before answering. This is a common question engineers ask me these days.

Earlier I had a conversation with a friend who manages a team and is trying to hire a data scientist. He asked, “Who should I hire? A new Ph.D. with no work experience, or someone who has a master’s and a good amount of work experience?” Others have asked: “Is a degree really that important for being a data scientist?”

The answer becomes clear when you observe the job market. In the past 2 years, new computer science Ph.D.s, particularly those specializing in machine learning, have been getting multiple job offers and are snapped up by large AI companies immediately. The demand is so high that the total compensation for a new Ph.D. graduate reaches $200k annually. Data scientists are in high demand, and those with computer science (CS) Ph.D. degrees especially so.

What’s in a Ph.D. degree? What are employers paying for with the higher salary?

Having received my Ph.D. in AI and built Ph.D.-dominated AI teams, I can share my perspective on what is unique about Ph.D. training. This is by no means to say that people without a Ph.D. should feel left out. It is only to illustrate what computer science Ph.D. training gives, so that you can make an informed decision when you hire someone or decide on your next career move.

It takes on average 5 years to finish a Ph.D. degree. The first 2 years are classes, and the next 3 years are research. I would say the class part is nothing exceptional: it gives you more knowledge, but it is master’s-level training at most. What’s unique about a Ph.D. degree is the 3 years after the classes are done. These are years of intensive research, in which a young student morphs into a researcher. This transformation happens through the following 3 activities:

  1. Publishing research papers
  2. Interacting in a research group
  3. Writing a Ph.D. thesis

Let’s take a look at each of these activities, and the training they give.

1. Publishing research papers

A computer science Ph.D. student is expected to publish at least 3–4 papers by the time he or she graduates. Some people have written 10 papers, and some more prolific people can have 20. Such papers are typically published at conferences. (The major machine learning conferences are NIPS, ICML, ICLR, KDD, and AAAI.) The paper acceptance rate of a top AI conference is around 20–25%. That means out of 2,000 papers submitted, only 400 to 500 are accepted. This fierce competition brings out excellence in the published papers.

To be accepted for publication, a paper has to demonstrate the following properties: (1) innovation, (2) significance of the problem, (3) a feasible solution, demonstrated through experimental results, and (4) good writing (clear and concise).

Being innovative means you propose a new method that no one has tried before. For example, if you propose a method that classifies videos, your method has to be new. This requires an inquisitive mind, extensive reading of other people’s work, and deep thinking about all possible solutions. For AI researchers, this also means good training in mathematics and statistics so that you can understand the current work.

A researcher also has to have good judgment in order to pick an important problem to solve. If you pick a problem that is trivial, such as counting the number of chairs in an image, it’s not that interesting. If you pick a problem that’s already solved, like identifying objects in an image, then you are wasting your time (your solution does not advance knowledge). An important problem is one that has a big impact and no good existing solution. Reviewing the literature (all previous papers) helps you select your topic. This forces you to read widely and attend conferences to stay up to date. The topic selection reflects an inquisitive mind that is able to understand what is feasible and what is not.

After designing your solution, you run many experiments on the computer to show that it is superior to all the other solutions known so far. If you are lucky enough to discover a good solution, you will then write a paper. Writing a research paper means putting together your thoughts, constructing a coherent solution, comparing it with the literature, and demonstrating its superiority. It takes on average 3 months to finish a good paper, which includes the literature review, algorithm design, experiments and the writing itself.

Finally, you submit your paper for publication, typically to a conference. You then receive feedback from the reviewers. Here is where you will receive sharp opinions, not only on your content but also on your writing. Many times you have to revise your paper so that it is clearer and its literature review is comprehensive. Your writing has to be concise, and your logic has to be right.

Getting a paper published is a rigorous and competitive process. Publishing sharpens one’s mind and forces a person to work hard, read widely and explore deeply. A good paper reflects curiosity, good judgment, persistence, and high energy. (This is why, when I interview a data scientist candidate, I review their publications.)

Writing a paper involves identifying a problem, designing innovative ways to solve the problem, doing experiments, and presenting the solution in a coherent way. All of these are required skills for a data scientist in the workplace.

2. Interacting in a research group

A Ph.D. student is supervised by one professor, who is the student’s advisor (on classes), financial supporter (for tuition and stipend), and research partner in writing papers.

Your advisor is your mentor. Your advisor’s rigor demands a rigorous approach from you, and their high standards push you to higher ground. In weekly meetings with my Ph.D. advisor, I discussed my ideas and got his feedback. I learned to sharpen my research, my mathematics and my argument for why my research is innovative. He also gave me ideas on how to present and where to look for literature.

A successful professor can get many grants and fund a large group. My advisor used to have 10 students in his group. Many other professors have a similar number of students or more. A research group is a learning ground. You observe how other people do research, how they get papers published, and how they present their ideas. In the weekly group meeting, you get to listen, to discuss and to share.

A large research group provides a stimulating environment where you are pushed toward excellence. When fellow students publish in top conferences, you want to do the same thing. When they do an internship at a top research lab, you want to do the same thing. The list goes on and on. When someone graduates and lands a good job, you have something to strive for.

I remember one time a fellow group member came back from his internship at Microsoft Research. He presented the idea of a recommender system. This was before Netflix. He talked about user-user similarity and other ways to infer user preference. It was fascinating, but I didn’t know where to apply it. (My own research was on a different subject.) Many years later, I led a data science team at PayPal building recommender systems for marketing. It was very easy for me to pick up this topic, as the foundation had been laid in graduate school.
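For readers who have not met the idea, here is a minimal sketch of user-user similarity in Python. It is only an illustration, not the system from that talk or the one built at PayPal: the ratings, the names `cosine_similarity` and `predict`, and the choice of cosine similarity over shared items are all assumptions made for this example.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "ann": {"film_a": 5, "film_b": 4, "film_c": 1},
    "bob": {"film_a": 4, "film_b": 5, "film_d": 5},
    "cat": {"film_a": 1, "film_c": 5, "film_d": 2},
}

def cosine_similarity(u, v):
    """Cosine similarity between two users, over the items both have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def predict(user, item, ratings):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += sim
    return num / den if den else None

# Ann has not rated film_d; infer her preference from similar users.
print(predict("ann", "film_d", ratings))
```

In this toy data, Ann's tastes are closer to Bob's than to Cat's, so Bob's high rating of `film_d` weighs more heavily in the prediction for Ann.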

3. Writing a Ph.D. thesis

The most important part of finishing a Ph.D. degree is writing a thesis. A Ph.D. thesis is typically more than 100 pages, requiring much more depth and breadth than a research paper that is merely 10 pages.

The requirements for a Ph.D. thesis are the same as for a research paper: (1) innovation, (2) significance of the problem, (3) feasibility of your solution, and (4) good writing. But a thesis is on a much larger scale. If writing a paper is like building a house, writing a thesis is like building a skyscraper. It takes 2–3 years to finish a thesis. This process builds endurance and perseverance.

In order to show that your research is innovative, you have to make sure that nobody has invented the method you are proposing. This requires reviewing all the publications in the world related to your work. This is called a literature review. It typically takes 1 year to finish the literature review for a thesis.

By the time you finish the literature review, your thesis topic is becoming clear. The topic is a problem you want to solve that has not been solved by others. In addition, you have a vague idea of how you want to solve it.

Deciding on a thesis topic is a long process. You have to try and experiment. This is where some graduate students give up. How do you demonstrate that your idea is new? This is particularly tricky for students in the social sciences and humanities, where it is hard to demonstrate novelty. Many humanities Ph.D. students take 6 or 7 years to graduate.

Fortunately for computer science students, publishing papers trains a person to do innovative work. Since each CS Ph.D. student is expected to publish by Year 2, they have to swim in the vast ocean of academia and survive. By publishing papers, they become familiar with research methodology and with how to create new algorithms. Therefore the time it takes to complete a computer science Ph.D. is relatively short. It is very common for CS students to incorporate their papers into their thesis, as long as these papers fit into one coherent theme.

The coherent theme is the real innovation of a thesis. This is the true value of Ph.D. training: you learn to see the whole picture. It is the consistent exploration of one topic. For example, before writing my Ph.D. thesis, I wrote papers on self-fulfilling bias in a multiagent environment, and I wrote papers on auction agents and how they act by learning about each other. I also experimented with agents playing soccer and wondered what their strategies would be. But what is the general theme that summarizes these different topics? It’s about learning in a multiagent environment. Then what is the general theoretical framework that describes the interaction? I remember staying in the library, reading and thinking, making notes. I remember visiting another department, stumbling on a book on control theory and the work on Markov games. That was the framework I was looking for. Those were quiet times, with no outside distractions or obligations.

Once you settle on a coherent framework, and know that whatever you invent is general enough (applicable to different scenarios), the next question is: how do you solve this general problem? You provide a new computing algorithm to solve it. Here is your unique invention. You demonstrate its soundness with experimental results along with mathematical proof. You also demonstrate its computational efficiency. What distinguishes computer science Ph.D. training from other Ph.D. training is the emphasis on computational feasibility. In computer science, we care about time complexity (how long does the computation take?) and space complexity (how much memory is required to store the data and intermediate results?). We have to show that whatever we invent can be implemented, runs fast and does not take excessive memory.
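To make the trade-off between time and space concrete, here is a small Python sketch. It is a textbook-style illustration rather than anything from my thesis: both functions answer the same question (does a list contain a duplicate?), and their names are made up for this example.

```python
def has_duplicate_quadratic(items):
    """O(n^2) time, O(1) extra space: compare every pair of elements."""
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            if items[i] == items[j]:
                return True
    return False

def has_duplicate_linear(items):
    """O(n) expected time, O(n) extra space: remember what has been seen."""
    seen = set()
    for x in items:
        if x in seen:
            return True
        seen.add(x)
    return False

data = [3, 1, 4, 1, 5]
print(has_duplicate_quadratic(data), has_duplicate_linear(data))
```

The second version runs much faster on large inputs, but it pays for that speed with extra memory. Arguing about which cost matters more for a given problem is exactly the kind of analysis a thesis algorithm has to survive.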

When you are satisfied with your experimental results, you write up your thesis. Here you put all the work together into one piece. As a non-native speaker, I struggled with my English writing then. I remember the red ink on my thesis from my advisor, mostly marking grammatical mistakes. I felt very embarrassed. Looking back today, I realize how much it helped me hold a higher standard for good writing.

By the time you finish your thesis, you have endured the lonely life of a graduate student (others have already graduated and are working) and suffered through the low income of a poor graduate student. You are secluded from the real world, staying in the academic ivory tower, for the purpose of attacking a big problem. Like a martial arts master who retreats into the mountains to practice his moves, you are practicing the moves of research thinking and building the muscle for attacking a difficult problem.

By now your research ability has taken a leap from a year before. You have climbed a mountain, and you can see the whole horizon of the research field. Like a caterpillar turning into a butterfly, you are transformed into an independent researcher.

This is what Ph.D. training gives you.

I hope I have peeled back a veil that mystifies many people. When you decide on entering a Ph.D. program or on hiring a Ph.D. graduate, you will know what you are getting.
