What profession did Harvard Business Review call the Sexiest Job of the 21st Century? That’s right… the data scientist.

Ah yes, the ever-mysterious data scientist. So what exactly is the data scientist’s secret sauce, and what does this “sexy” person actually do at work every day?

While data scientists come from many different educational and professional backgrounds, most should be strong in, and ideally expert in, four fundamental areas. In no particular order, these are:

    • Business/Domain
    • Mathematics (includes statistics and probability)
    • Computer science (e.g., software/data architecture and engineering)
    • Communication (both written and verbal)

Because programming is a large part of the job, data scientists must be proficient with languages such as Python, R, SQL, Java, Julia, and Scala. It’s usually not necessary to be an expert in all of these, but Python or R, plus SQL, are key.

For statistics, mathematics, algorithms, modeling, and data visualization, data scientists usually rely on existing packages and libraries where possible. Some of the more popular Python-based ones include scikit-learn, TensorFlow, PyTorch, pandas, NumPy, and Matplotlib.

For reproducible research and reporting, data scientists commonly use notebook environments such as Jupyter Notebook and JupyterLab. Their strength is that the code and data can be delivered along with the key results, so that anyone can rerun the same analysis and build on it if desired.

Increasingly, data scientists are also expected to work with the tools and technologies associated with big data. Some of the most popular examples include Hadoop, Spark, Kafka, Hive, Pig, Drill, Presto, and Mahout.

Data scientists should also know how to access and query the leading RDBMS, NoSQL, and NewSQL database management systems, as well as cloud data warehouses. Some of the most common are MySQL, PostgreSQL, Redshift, Snowflake, MongoDB, Redis, and HBase (which runs on top of Hadoop).
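
To make the idea of querying a database from code a little more concrete, here is a minimal sketch in Python. It uses SQLite through Python’s built-in sqlite3 module purely as a stand-in, since it needs no server; SQLite is not one of the systems listed above, and the orders table, its columns, the sample rows, and the function names are all made up for illustration. Much the same SQL would typically work against MySQL or PostgreSQL through their own drivers.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [
            ("alice", 30.0),
            ("bob", 12.5),
            ("alice", 20.0),
            ("carol", 45.0),
            ("bob", 5.0),
        ],
    )
    conn.commit()
    return conn


def top_customers(conn, limit=2):
    """Return (customer, total) rows sorted by total spend, highest first."""
    query = """
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
        ORDER BY total DESC
        LIMIT ?
    """
    # The limit is passed as a bound parameter rather than formatted into the SQL.
    return conn.execute(query, (limit,)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for customer, total in top_customers(conn):
        print(customer, total)
    conn.close()
```

The aggregation (GROUP BY with SUM) happens inside the database, and only the small summarized result comes back to Python. That division of labor is a large part of why SQL sits alongside Python or R as a core skill.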