Big data, data analytics, predictive analytics, data visualization … these are all emerging as critical skill sets at many companies. Ten years ago a lot of us were busy pulling together our data into data warehouses and layering on business intelligence systems. But what is happening today is going way beyond that.

The key themes

Many business intelligence veterans have gone on to start new companies based on data analytics. For example, some of the SAP team went on to start the HR analytics company Visier, and Josh James, former CEO of Omniture, now leads the data integration giant Domo.

But not all of us are giants in the data analytics world, staffed with mathematicians, data scientists and visualization experts. Yet we still need to be able to build a data team and bring data analysis into our projects. So what skills should we be looking for?

Let’s begin with a quick look at the data science process. This helps organize how the skills fit together.

[Figure: the data science process]

At TeamFit, we organize skills into technical skills, business skills and core skills. We also look for patterns on how skills are associated – for any skill there are associated skills and complementary skills.

A high-level skill map for data scientists might look something like this:

[Figure: high-level skill map for data scientists]

So what skills should we be looking for as we search for data scientists?

Start with the technical skills. For data scientists that means math. (When I showed this to a data scientist who works with some of the companies in VentureLabs, he said that for him math is a core skill, not a technical one. That is an example of how skill categories depend on one’s personal skill map.) Some of the math skills needed for modern data analysis are:

  • Linear algebra and optimization
  • Calculus
  • Probability and statistics
  • Nonlinear analysis

These will do for most routine analysis, but once you get out into the wild you might also want to go deeper using disciplines like:

  • Graph theory
  • Set theory
  • Category theory

Big data means dealing with data, lots of data, and that means data scientists have to be able to program, or at least know enough about programming to work closely with programmers. Think of this as getting data and organizing data before you can analyze data.

Getting data

  • ETL
  • Spidering
  • Scraping
  • API integration
  • SQL and SPARQL
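To make the "getting data" skills a little more concrete, here is a minimal sketch of an extract, transform, load (ETL) step followed by a SQL query. It uses only the Python standard library. The sample data, the table name `skills` and the function names are all made up for illustration; a real pipeline would pull from files, APIs or scraped pages rather than an inline string.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file, an API, or a scraper.
RAW_CSV = """name,skill,years
Ana,Python,4
Ben,R,
Cho,SQL,7
Ana,SQL,2
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing 'years' value and cast it to int."""
    return [
        (row["name"], row["skill"], int(row["years"]))
        for row in rows
        if row["years"].strip()
    ]


def load(records):
    """Load: write the cleaned records into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE skills (name TEXT, skill TEXT, years INTEGER)")
    conn.executemany("INSERT INTO skills VALUES (?, ?, ?)", records)
    conn.commit()
    return conn


def years_by_person(conn):
    """Query with SQL: total years of experience per person."""
    cursor = conn.execute(
        "SELECT name, SUM(years) FROM skills GROUP BY name ORDER BY name"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = load(transform(extract(RAW_CSV)))
    print(years_by_person(conn))
```

Keeping extract, transform and load as separate functions is a design choice, not a requirement. It makes each step easy to test and to swap out when the data source changes.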

Organizing data (and storing it as these two often go together)

  • Relational database data schema
  • RDF (and OWL if you want to add logical inference)
  • Distributed storage and processing like Hadoop (you are going to have a lot of data, right?)
  • NoSQL databases like MongoDB
  • Graph databases like Neo4j, and RDF triple stores
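As a rough illustration of how graph-style storage differs from a relational table, the sketch below keeps facts as (subject, predicate, object) triples, the basic unit of RDF, and answers questions by pattern matching. It is a toy in-memory model, not how Neo4j or any real triple store works internally, and the people, predicates and function names are invented for the example.

```python
# Each fact is a (subject, predicate, object) triple.
# All names below are hypothetical sample data.
TRIPLES = {
    ("ana", "has_skill", "python"),
    ("ana", "has_skill", "sql"),
    ("ben", "has_skill", "r"),
    ("ben", "works_with", "ana"),
    ("python", "is_a", "programming_language"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    )


def skills_of_colleagues(triples, person):
    """Follow two edges: person -works_with-> colleague -has_skill-> skill."""
    colleagues = [o for (_, _, o) in match(triples, person, "works_with")]
    return sorted(
        o
        for colleague in colleagues
        for (_, _, o) in match(triples, colleague, "has_skill")
    )


if __name__ == "__main__":
    print(match(TRIPLES, predicate="has_skill"))
    print(skills_of_colleagues(TRIPLES, "ben"))
```

New kinds of facts can be added without changing a schema, which is the main appeal of this style. The price is that multi-hop questions like `skills_of_colleagues` need explicit traversal logic or a query language such as SPARQL.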

Analyzing data

  • SPSS
  • SAS
  • R
  • Python
  • … you can solve data analysis problems in most programming languages.
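As a small example of what "analyzing data" can look like in Python, here is a sketch of an ordinary least squares line fit written with the standard library only. In practice most analysts would reach for a statistics package instead; the function name `fit_line` and the sample numbers are assumptions made for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two or more paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


if __name__ == "__main__":
    # Hypothetical data: years of experience vs. projects delivered.
    years = [1, 2, 3, 4]
    projects = [3, 5, 7, 9]
    slope, intercept = fit_line(years, projects)
    print(f"projects = {slope:.2f} * years + {intercept:.2f}")
```

The arithmetic here draws on the calculus and statistics skills listed earlier: the slope is the value that minimizes the sum of squared errors.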

Presenting data

If you want to go deeper you can try to model causal relationships. Here the best approach is Bayesian causal networks as developed by Judea Pearl. The big math company Via Science implements this method.
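To give a feel for why causal modeling is different from ordinary correlation, here is a minimal sketch of a three-variable causal Bayesian network in which Z influences both X and Y, and X also influences Y. It compares the observational quantity P(Y=1 | X=1) with the interventional quantity P(Y=1 | do(X=1)), computed with the back-door adjustment over Z. The network structure and every probability are invented for illustration, and this is not a description of how Via Science or any other product implements the method.

```python
from itertools import product

# Hypothetical network: Z -> X, Z -> Y, X -> Y (Z confounds X and Y).
# All variables are binary and all probabilities are made up.
P_Z1 = 0.5
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}
P_Y1_GIVEN_XZ = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.3, (1, 1): 0.7}


def bernoulli(p_one, value):
    """Probability of a binary value, given the probability that it is 1."""
    return p_one if value == 1 else 1.0 - p_one


def joint(z, x, y):
    """P(Z=z, X=x, Y=y) from the chain-rule factorization of the network."""
    return (
        bernoulli(P_Z1, z)
        * bernoulli(P_X1_GIVEN_Z[z], x)
        * bernoulli(P_Y1_GIVEN_XZ[(x, z)], y)
    )


def p_y1_given_x(x):
    """Observational: P(Y=1 | X=x), by enumerating the joint distribution."""
    numerator = sum(joint(z, x, 1) for z in (0, 1))
    denominator = sum(joint(z, x, y) for z, y in product((0, 1), repeat=2))
    return numerator / denominator


def p_y1_do_x(x):
    """Interventional: P(Y=1 | do(X=x)) via back-door adjustment over Z."""
    return sum(bernoulli(P_Z1, z) * P_Y1_GIVEN_XZ[(x, z)] for z in (0, 1))


if __name__ == "__main__":
    print("P(Y=1 | X=1)     =", round(p_y1_given_x(1), 3))
    print("P(Y=1 | do(X=1)) =", round(p_y1_do_x(1), 3))
```

With these made-up numbers the observational probability comes out higher than the interventional one, because part of the association between X and Y is due to the common cause Z rather than to X itself.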

And now the open question: do data analysts need domain knowledge?

A practical answer is “It is going to be hard to find the data scientists you need, so hire smart people and train them in your domain as needed.”

An alternative view is that “The less the data scientist knows about the domain the better, let the meaning of the data emerge from the data.”

But in most cases some understanding of the domain is needed in order to get the data for the analysis, organize it and then present the analysis in a meaningful way. So domain knowledge is a set of complementary skills that are needed for practical data analysis.

Some questions TeamFit is investigating are listed below. Any insights you can share from your own experience will be much appreciated. Send us an e-mail!

  • What firms are using data scientists?
  • How did they find them?
  • What projects do they work on?
  • Who do they work with?
  • What are the complementary skills that other people need in order to work effectively with data scientists?
  • What are the connector skills that help data scientists become effective team members?

Image of the Butterfly Nebula taken by the Hubble Telescope.