Thursday, December 28, 2017

Building a Data Science team

Below is a proposal for forming a data science team. I have tried to articulate the functions that are not typically part of a data management team. In my view, these functions are the gap between the existing "classical" data management (aka Enterprise Information Management) team and an analytics-empowered data team. None of the existing functions should be omitted; the following roles are simply added to them.
Data Science Team formation
1 Responsibilities
This team applies mathematical, statistical and machine learning methods to big data to solve strategic problems and explore strategic opportunities.
Its central focus is the customer: it studies in depth every customer interaction and every business action that affects customer experience, in order to measure that experience objectively and produce solid plans to enhance it across all of the customer's interactions with corporate resources and processes.
It helps the business understand precisely how the corporation is performing, highlights weaknesses and strengths, and provides optimized solutions according to corporate priorities and constraints.
It helps the business plan and allocate its precious resources optimally, efficiently and fairly, so that it achieves a pioneering market position through its value to both customers and shareholders.
It discovers hidden patterns in business processes that use resources inefficiently and provides optimal alternatives to them.
It transforms the way the business plans and evaluates its activities by providing thorough quantitative analyses of all aspects of the business.
It builds and maintains an intelligent platform that supports business decision makers, all levels of management and even customers in making the right decision at the right time. This platform uses the most advanced high-performance computing and machine learning techniques to give its audience advice that is both accurate and fast.
This team must have strong influence over all data custodians across the business, because the quality and resolution of data are key success factors in delivering the value of the team's studies and research. A specialized data team should also be dedicated to this team to help with all data integration and preparation tasks.
This team consists of two groups:
1- Data Scientists
2- Domain Experts / Strategic Analysts
2 Roles
2.1 Data Scientist
2.1.1 Responsibilities
The data scientist is the key element in the Data Science team. He/She applies mathematical, statistical and machine learning methods to big data using high-performance computing tools and techniques. He/She draws on the ideas and expertise of the domain experts and strategists to tailor specific models and algorithms to the problem at hand.
The data scientist writes, uses and executes the computational techniques needed to apply his/her algorithms to the data. He/She also manipulates the data with tools such as SQL, Pig and Python to prepare it for the algorithms and tools applied.
The data scientist uses high-performance computing methodologies such as parallel and distributed computing, as well as mathematical and statistical tools such as R and Matlab, in his/her daily work. He/She must master scientific computing techniques, C/C++ and/or Java, and MapReduce patterns.
2.1.2 Skills
Basic Skills
Programming (C/C++, Java, R, Matlab)
Statistics
Machine Learning
Mathematical Modeling
Database Development
Scientific Computing
High Performance Computing
Big Data (Hadoop, Hive, Pig, Spark, etc.)
Educational and academic background
B.Sc. in Mathematics, Statistics, Physics or Computer Science is a must
M.Sc. is highly preferable
Work Experience
A track record of successfully developing predictive and statistical models for a range of behavioral and business phenomena
2.2 Domain Expert / Strategic Analyst
2.2.1 Responsibilities
This expert brings a deep understanding of his/her domain, along with strong strategic analysis and analytical skills, to the Data Science team. The domains of interest are determined by the priorities and projects of the team. This expert could work full time or be invited only for a specific project or task. The presence of these experts ensures that the work of the Data Science team delivers business value to the corporation.
The expert's role runs from the definition of the study through every phase of the project to the formulation of the results. Different experts could be engaged in the same project, depending on the need and the value they can bring to it.
Permanent experts could be hired in strategic domains such as customer insight and network planning.
This expert also plays a consulting role for the corresponding business stakeholders, helping them understand and apply the research results to achieve the promised optimization and business benefits.
2.2.2 Skills
Basic Skills
Strong Analytical Skills
Problem Solving
Critical Thinking
Creative Ideas
Research Skills
Up to date with state-of-the-art practice

Work Experience
Experience working in a business environment is mandatory, as is a track record of achievements.
10+ years of experience in the domain.

Friday, May 9, 2014

Enterprise Information Architecture



To understand the data landscape, we have to decompose it into parts that are simple to understand and work with. I have classified it into layers. Each layer identifies either the way data is structured or the processes used to manipulate the data in that layer.

Raw Data Sources Layer


These sources fall into three types:

1 – Databases of business applications that run the daily business and capture all business transactions, such as CRM, Billing, Ticketing, etc. These types of data are always structured.

2 – Logs of monitoring systems, such as network traffic, call center and system logs. These types of data are normally unstructured.

3 – External Sources like social media and competition data.

Data Integration Layer


In this layer, raw data from different sources is integrated and transformed into a unified logical data model. Data quality measures are applied to the data in this layer to keep it accurate and consistent. The data integration layer is a processing-only layer that stores no data.
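As a minimal sketch of this idea, the Python snippet below maps records from two sources onto one unified structure and applies a simple quality check. The source names, field names and the validity rule are all hypothetical and chosen only for illustration; a real integration layer would normally be built with an ETL tool. The function is a generator, which mirrors the point that this layer processes data but does not store it.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass(frozen=True)
class UnifiedCustomer:
    """One row of the (hypothetical) unified logical data model."""
    customer_id: str
    name: str
    email: str
    source: str


# Hypothetical mapping: unified field name -> field name used by each source.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "crm": {"customer_id": "cust_id", "name": "full_name", "email": "email"},
    "billing": {"customer_id": "account_no", "name": "holder", "email": "contact_email"},
}


def normalize(record: dict, source: str) -> UnifiedCustomer:
    """Translate one source record into the unified model and clean its values."""
    mapping = FIELD_MAPS[source]
    return UnifiedCustomer(
        customer_id=str(record[mapping["customer_id"]]).strip(),
        # Collapse repeated whitespace and use consistent capitalization.
        name=" ".join(str(record[mapping["name"]]).split()).title(),
        email=str(record[mapping["email"]]).strip().lower(),
        source=source,
    )


def is_valid(customer: UnifiedCustomer) -> bool:
    """A deliberately simple quality rule: an id is present and the email contains '@'."""
    return bool(customer.customer_id) and "@" in customer.email


def integrate(batches: Dict[str, List[dict]]) -> Iterator[UnifiedCustomer]:
    """Yield unified, quality-checked records; nothing is stored in this layer."""
    for source, records in batches.items():
        for record in records:
            customer = normalize(record, source)
            if is_valid(customer):
                yield customer
```

Records that fail the quality rule are simply dropped here; in practice they would more likely be routed to a rejection report so that the source system can be corrected.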

Core Data Layer


This layer is the most important layer and represents the corporate core data reservoir. It holds the data after processing it in the previous layer and makes it ready for the subsequent computing.

This layer consists of three systems:



1 – Master Data Management (MDM)

This system holds the unified and integrated version of the main entities in the business. The customer profile is the first entity to be held in this system. It consolidates customer profiles and information from the different systems that hold customer data (partially or in full), giving the decision maker a holistic view of the customer that is free of any operational system constraints. It provides a single and consistent version of customer data.

Product is the second entity that is normally included in the MDM.
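The consolidation of customer profiles can be illustrated with a small Python sketch. It assumes, purely for illustration, that every record carries a `customer_id` and a `source` field, and that conflicts are resolved by a fixed source priority. Real MDM products use far richer matching and survivorship rules, so this is not a description of any particular product.

```python
from typing import Dict, List


def merge_profiles(records: List[dict], priority: List[str]) -> Dict[str, dict]:
    """Build one consolidated profile per customer_id.

    For each attribute, the value from the highest-priority source that has a
    non-empty value wins. `priority` lists source names from most to least trusted.
    """
    rank = {source: position for position, source in enumerate(priority)}
    golden: Dict[str, dict] = {}
    # Visit the most trusted sources first so their values are set first.
    for record in sorted(records, key=lambda r: rank[r["source"]]):
        profile = golden.setdefault(record["customer_id"], {})
        for field, value in record.items():
            if field == "source" or value in (None, ""):
                continue  # skip bookkeeping fields and empty values
            profile.setdefault(field, value)  # keep the value already set, if any
    return golden


# Hypothetical sample data: the same customer as seen by two systems.
sample_records = [
    {"source": "billing", "customer_id": "42", "email": "old@example.com", "address": "1 Main St"},
    {"source": "crm", "customer_id": "42", "email": "new@example.com", "address": ""},
]
single_view = merge_profiles(sample_records, priority=["crm", "billing"])
```

With this sample, the consolidated profile takes the email from the CRM record and the address from the billing record, because the CRM address is empty.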


2 – Big Data

This is a Hadoop platform used for massively parallel processing of structured and unstructured data.


3 – Enterprise Data Warehouse (EDW)

This system is the custodian of the unified logical data model that unifies all data from database sources and makes it ready for the subsequent layers.


Computational Layer



This layer is where intensive computing techniques are applied to the data held in the core data layer. This includes the following computing techniques:

1 – Multidimensional analysis, which is the basis for BI cubes.

2 – Statistical Analysis

3 – Data Mining

4 – Real-Time Analytics (processing data streams as they are generated)
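To make the first technique more concrete, here is a minimal Python sketch of the roll-up operation behind a BI cube: a measure is summed over whichever dimensions the analyst chooses. The fact rows and the dimension names (`region`, `product`) are invented for illustration, and a real cube engine would precompute and index these aggregates rather than scan the facts each time.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def roll_up(facts: List[dict], dimensions: Sequence[str], measure: str) -> Dict[Tuple, float]:
    """Sum `measure` over every combination of values of the given dimensions."""
    totals: Dict[Tuple, float] = defaultdict(float)
    for fact in facts:
        key = tuple(fact[dimension] for dimension in dimensions)
        totals[key] += fact[measure]
    return dict(totals)


# Hypothetical fact rows, as they might be read from the core data layer.
sales_facts = [
    {"region": "North", "product": "A", "revenue": 100.0},
    {"region": "North", "product": "B", "revenue": 50.0},
    {"region": "South", "product": "A", "revenue": 70.0},
]

by_region = roll_up(sales_facts, ["region"], "revenue")
by_region_and_product = roll_up(sales_facts, ["region", "product"], "revenue")
grand_total = roll_up(sales_facts, [], "revenue")
```

Passing fewer dimensions gives a coarser view (up to a single grand total), while passing more dimensions drills down into finer detail.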

Presentation and Visualization Layer



This is the delivery layer, which presents the outcomes of all the preceding layers to business stakeholders. It consists of the following:

1 – Reporting System, which presents detailed data reports to support daily business users.

2 – Analysis, which consists of typical BI cubes with advanced visualization tools.

3 – Dashboards, which help management review business performance quickly.

4 – Predictive Models, which are the outcome of statistical analysis and data mining. They can be used by business stakeholders for planning purposes, or applied by other systems in automatic decision making.

Governance Layer


This layer spans the data throughout its entire lifecycle, whether it is in data stores or in processing layers. It controls how the data is accessed and stored, as well as how and why it is processed.