The most important part of Data Science is its application, and that means applications of every kind. Machine learning is one example.
The Data Revolution
Around 2010, an abundance of data made it possible to train machines with a data-driven approach rather than a knowledge-driven one. The theoretical papers on recurrent neural networks and support vector machines became feasible to put into practice, with the potential to change the way we live and how we experience the world. Deep learning was no longer an academic concept confined to a thesis paper. It became a tangible, useful class of learning that would affect our everyday lives. As a result, Machine Learning and AI dominated the media, overshadowing every other aspect of Data Science, such as Exploratory Analysis, Metrics, Analytics, ETL, Experimentation, A/B testing and what was traditionally called Business Intelligence.
Data Science – the General Perception
So now the general public thinks of Data Scientists as researchers focused on machine learning and AI, while the industry is hiring Data Scientists as Analysts. That is a misalignment. Most of these scientists could probably work on more technical problems, but big companies like Google, Facebook and Netflix have so much low-hanging fruit for improving their products that they do not need any additional machine learning or statistical knowledge to find impact in their analysis.
A good Data Scientist is not just about complex models
Being a good Data Scientist is not about how advanced your models are. It is about how much impact you can have with your work. You are not a data cruncher, you are a problem solver. You are a strategist. Companies will give you their most ambiguous and difficult problems, and they expect you to guide the company in the right direction.
A Data Scientist’s job starts with collecting data. This includes user-generated content, instrumentation, sensors, external data and logging.
The next aspect of a Data Scientist’s role is to move or store this data. This involves the storage of unstructured data, flow of reliable data, infrastructure, ETL, pipelines and storage of structured data.
Moving up the hierarchy of work for a Data Scientist, the next level is transforming or exploring the data. This set of work encompasses preparation, anomaly detection and cleaning.
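To make the cleaning and anomaly detection step a little more concrete, here is a minimal sketch using only the Python standard library. The sample readings, the function names clean and flag_anomalies, and the 3.5 cut-off on the modified z-score are all assumptions chosen for illustration. Real pipelines usually need rules specific to the data.

```python
from statistics import median

def clean(values):
    """Drop missing entries (None) and convert the rest to float."""
    return [float(v) for v in values if v is not None]

def flag_anomalies(values, threshold=3.5):
    """Return values whose modified z-score exceeds the threshold.

    The score is based on the median and the median absolute deviation
    (MAD), which are less sensitive to outliers than mean and std dev.
    The threshold of 3.5 is a common rule of thumb, assumed here.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median, so nothing can be scored.
        return []
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical raw sensor readings with gaps and one suspicious spike.
raw = [10, 12, None, 11, 13, 250, 12, None, 9]
cleaned = clean(raw)
outliers = flag_anomalies(cleaned)
print(cleaned)
print(outliers)
```

Whether a flagged value is an error or a genuine event is a judgment call, which is why exploration sits right next to cleaning at this level.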
Next in the hierarchy of work for a Data Scientist is aggregating and labelling data. This work involves metrics, analytics, aggregates, segments, training data and features.
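As an illustration of what an aggregate metric per segment can look like, the sketch below computes a conversion rate for each segment from a handful of event records. The record fields (user, segment, purchased) and the function name conversion_by_segment are hypothetical, made up for this example.

```python
from collections import defaultdict

# Hypothetical event log: one record per user visit.
events = [
    {"user": "a", "segment": "mobile", "purchased": True},
    {"user": "b", "segment": "mobile", "purchased": False},
    {"user": "c", "segment": "mobile", "purchased": False},
    {"user": "d", "segment": "mobile", "purchased": False},
    {"user": "e", "segment": "desktop", "purchased": True},
    {"user": "f", "segment": "desktop", "purchased": False},
]

def conversion_by_segment(records):
    """Aggregate raw events into a conversion rate per segment."""
    totals = defaultdict(int)
    conversions = defaultdict(int)
    for record in records:
        totals[record["segment"]] += 1
        conversions[record["segment"]] += int(record["purchased"])
    return {segment: conversions[segment] / totals[segment] for segment in totals}

rates = conversion_by_segment(events)
print(rates)
```

The same per-segment numbers can serve as a dashboard metric or as a feature for a model later on.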
Learning and Optimizing forms the next set of work for Data Scientists. This set of work includes simple machine learning algorithms, A/B testing and experimentation.
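A/B testing often comes down to asking whether two conversion rates really differ. One common way to check is a two-proportion z-test, sketched below with the standard library only. The visitor and conversion counts are invented for illustration, and the function name two_proportion_z_test is an assumption. Other tests may suit a given experiment better.

```python
from math import erfc, sqrt

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    Uses the pooled-proportion standard error, which assumes both groups
    share the same underlying rate under the null hypothesis.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided tail probability of a standard normal at |z|.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

# Hypothetical experiment: control (A) versus variant (B).
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be noise, but deciding whether the lift is worth shipping is still a product question.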
At the top of the hierarchy is the most complex work of Data Scientists. It consists of Artificial Intelligence and Deep Learning.
All of this data engineering effort is very important. The job is not just about creating complex models; there is a lot more to it.