The Roles of Data and Predictive Analytics in Business
Chapter 1
© 2019 McGraw-Hill Education. All rights reserved. Authorized only for instructor use in the classroom. No reproduction or distribution without the prior written consent of McGraw-Hill Education.
Learning Objectives
Explain how predictive analytics can help in business strategy formulation.
Distinguish structured from unstructured data.
Differentiate units of observation.
Outline a data-generating process.
Describe the primary ways that data analysis is used to aid business performance.
Discriminate between lead and lag information.
Discriminate between active and passive prediction.
Recognize questions pertaining to business strategy that may utilize (active) predictive analytics.
Defining Data & Data Uses in Business
Data
A collection of information
Database
Organized collection of data that firms use for analysis
Business analytics
The use of data analysis to aid in business decision making
Predictive analytics
The use of data analysis designed to form predictions about future, unknown events or outcomes
Business Strategy
Plan of action designed by a business practitioner to achieve a business objective
Business objectives include profit maximization, enhanced employee satisfaction, etc.
Examples of action include pricing decisions, advertising campaigns, and methods of employee compensation
Predictive Analytics for Business Strategy
Without data, a strong theoretical model is often not enough to predict effective business strategies
Sound theoretical arguments coupled with data become a strong tool for predicting effective business strategies
Predictive analytics is an ideal complement to theory in creating a successful business strategy
Data Features
Structured Data
Data with well-defined units of observation which can be classified and structured in the form of a spreadsheet.
An example:
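As a hedged illustration (the stores, column names, and values below are hypothetical and not taken from the original slide), structured data can be pictured as rows that share one unit of observation and the same well-defined columns, much like a spreadsheet:

```python
# Hypothetical structured dataset: each row is one store (the unit of
# observation) and every row has the same well-defined columns.
stores = [
    {"store_id": 1, "city": "Austin", "weekly_sales": 12500.0},
    {"store_id": 2, "city": "Boston", "weekly_sales": 9800.0},
    {"store_id": 3, "city": "Denver", "weekly_sales": 11100.0},
]


def is_structured(rows):
    """Return True when every row exposes the same set of columns."""
    columns = set(rows[0])
    return all(set(row) == columns for row in rows)


print(is_structured(stores))
```

A row that lacks the shared columns, such as a free-text note, would fail this simple check, which is one rough way to see the difference from unstructured data.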
Data Features
Unstructured Data
Any data that cannot be classified and structured.
An example:
The Unit of Observation
The entity for which information has been collected
Crucial component of structured data
Tells us the way in which the information in a dataset varies
Answers the questions: What, Where, Who, When?
Four main groupings: cross-sectional data, pooled cross-sectional data, time-series data, and panel data
Data Types
Cross-Sectional Data
Data that provide a snapshot of information at one fixed point in time.
An example:
Data Types
Pooled Cross-Sectional Data
A combination of two or more unrelated cross-sectional datasets merged into one.
An example
More Data Types
Time-Series Data
Data that exhibit only variation in time
An example
More Data Types
Panel Data
Same cross-sectional units over multiple points in time
An example
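To make the four data types above concrete, here is a minimal sketch that classifies a dataset from its (unit, period) pairs. The function name `classify`, the firm labels, and the years are hypothetical. The rule used for panel data (every unit observed in every period) is a simplifying assumption for illustration.

```python
def classify(rows):
    """Classify records given as (unit, period) pairs into a data type."""
    units = {unit for unit, _ in rows}
    periods = {period for _, period in rows}
    if len(periods) == 1:
        return "cross-sectional"      # many units, one point in time
    if len(units) == 1:
        return "time-series"          # one unit, variation only in time
    # Panel data: the same units are observed in every period.
    units_by_period = [
        {unit for unit, period in rows if period == p} for p in periods
    ]
    if all(observed == units for observed in units_by_period):
        return "panel"
    return "pooled cross-sectional"   # different units in different periods


# Hypothetical firms (A, B, ...) observed in hypothetical years.
cross = [("A", 2018), ("B", 2018), ("C", 2018)]
series = [("A", 2016), ("A", 2017), ("A", 2018)]
panel = [("A", 2017), ("B", 2017), ("A", 2018), ("B", 2018)]
pooled = [("A", 2017), ("B", 2017), ("C", 2018), ("D", 2018)]

for name, data in [("cross", cross), ("series", series),
                   ("panel", panel), ("pooled", pooled)]:
    print(name, "->", classify(data))
```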
Data Generating Process (DGP)
Data Generating Process
The underlying mechanism that produces the pieces of information contained in a dataset
Steps for DGP
Establish both formal and informal DGP
Understand what variables are important
Create a representative statistical model
Collect and analyze relevant variables and perform simple tests
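The steps above can be sketched with a simulated example. The linear form, the parameter values (intercept 100, slope -2), and the function names are assumptions made for illustration only. The point is that when the data-generating process is known, a simple statistical model fitted to the data should recover it.

```python
import random


def generate_sales(prices, intercept=100.0, slope=-2.0, noise_sd=1.0, seed=0):
    """Hypothetical DGP: sales = intercept + slope * price + random noise."""
    rng = random.Random(seed)
    return [intercept + slope * p + rng.gauss(0.0, noise_sd) for p in prices]


def ols_slope(x, y):
    """Least-squares slope from a simple regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


prices = [float(p) for p in range(1, 41)]
sales = generate_sales(prices)

# The estimated slope should be close to the assumed value of -2.0.
print(round(ols_slope(prices, sales), 2))
```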
Basic Uses of Data Analysis for Business
Categories include:
Queries
Pattern discovery
Causal inference
Queries
Any request for information from a database
Descriptive Statistics
Quantitative measures meant to summarize and interpret properties of a dataset
Pivot Table
A tool for data summarization that enables different views of the underlying dataset.
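A minimal sketch of the three ideas above, using only the Python standard library. The transactions table and its column names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical transactions table.
sales = [
    {"region": "East", "product": "A", "revenue": 100},
    {"region": "East", "product": "B", "revenue": 150},
    {"region": "West", "product": "A", "revenue": 200},
    {"region": "West", "product": "B", "revenue": 50},
    {"region": "East", "product": "A", "revenue": 120},
]

# Query: request the rows that satisfy a condition.
east_rows = [row for row in sales if row["region"] == "East"]

# Descriptive statistics: summarize one column of the dataset.
revenues = [row["revenue"] for row in sales]
summary = {"mean": mean(revenues), "median": median(revenues)}

# Pivot table: total revenue by region (rows) and product (columns).
pivot = defaultdict(lambda: defaultdict(int))
for row in sales:
    pivot[row["region"]][row["product"]] += row["revenue"]

print(len(east_rows), summary)
for region, by_product in pivot.items():
    print(region, dict(by_product))
```

Swapping the roles of `region` and `product` in the last loop gives a different view of the same underlying data, which is what a pivot table is for.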
Pattern Discovery
Pattern
Any distinct relationship between observations within a dataset
Pattern discovery
The process of identifying distinctive relationships between observations in a dataset
Data mining
Pattern discovery, typically in large datasets
Pattern Discovery
Types of Pattern Discovery
Association analysis
Looking for conditional probabilities to determine relationships between two or more variables
Cluster analysis
Grouping observations according to some measure of similarity
Outlier detection
Identifying small subsets of observations, if they exist, that contain information far different from the vast majority of the observations in the dataset
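Two of these techniques can be sketched briefly. The shopping baskets, order counts, function names, and the two-standard-deviation threshold are all hypothetical choices for illustration. The first function assumes at least one basket contains the `given` item. Cluster analysis is omitted to keep the example short.

```python
from statistics import mean, pstdev


def conditional_probability(baskets, given, target):
    """Estimate P(target in basket | given in basket) from observed baskets."""
    with_given = [b for b in baskets if given in b]
    return sum(target in b for b in with_given) / len(with_given)


def outliers(values, threshold=2.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [v for v in values if abs(v - mu) > threshold * sigma]


# Association analysis: how often is salsa bought when chips are bought?
baskets = [
    {"chips", "salsa"},
    {"chips", "salsa", "soda"},
    {"chips"},
    {"soda"},
    {"chips", "salsa"},
]
print(conditional_probability(baskets, "chips", "salsa"))

# Outlier detection: one day's order count is far from the rest.
daily_orders = [10, 12, 11, 9, 10, 11, 50, 10]
print(outliers(daily_orders))
```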
Examples of Pattern Discovery
Example of Outlier Detection and Cluster Analysis
Example of Association Analysis: Scatter Plot on Profit & Price
Causal Inference
The process of establishing a causal relationship between one or more variables representing a cause and one or more variables representing an effect, where a change in the cause variable results in changes in the effect variable
Causal Inference
Direct: A change in the causal variable, X, directly causes a change in variable Y
Indirect: A change in X causes a change in Y, but only through its impact on a third variable, Z
Use of Causal Inference
Causal inference occurs in two ways:
Using experimentation
Econometric models
Causal inference has two important applications:
Prediction
Campaign evaluation
Data Analysis for the Past, Present, and Future
Lag information
Information about past outcomes
Typically contains information on key performance indicators (KPIs), or variables that are used to help measure firm performance
Designed to answer the questions, “What happened?” and “What is happening?”
Lag information can be generated by queries, pattern discovery, and causal inference
Examples of Lag Information
Reports
Any structured presentation of the information in a dataset
Scorecards
Any structured assessment of variables of interest, typically KPIs, against a given benchmark
Dashboards
A graphical presentation of the current standing and historical trends for variables of interest, typically KPIs
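As a rough sketch of a scorecard, the snippet below compares each KPI against its benchmark. The KPI names, values, and benchmarks are hypothetical, and the rule assumes that a higher value is better for every KPI.

```python
# Hypothetical KPIs with their actual values and benchmarks.
kpis = {
    "revenue_growth_pct": {"actual": 4.2, "benchmark": 5.0},
    "customer_retention_pct": {"actual": 91.0, "benchmark": 90.0},
    "on_time_delivery_pct": {"actual": 97.5, "benchmark": 95.0},
}


def scorecard(kpis):
    """Label each KPI as having met or missed its benchmark."""
    return {
        name: "met" if values["actual"] >= values["benchmark"] else "missed"
        for name, values in kpis.items()
    }


for name, status in scorecard(kpis).items():
    print(f"{name}: {status}")
```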
Report Example
Dashboard Example
Scorecard Example
Lead Information
Lead Information
Information that provides insights about the future
Designed to answer the question, “What is going to happen?”
Helps firms in their planning process by informing expectations and strategic moves
Lead information is not generally presented in a standardized format
Predictive Analytics and Lead Information
Predictive analytics is data analysis designed to provide lead information
Two ways predictive analytics can predict the future
Active prediction
Passive prediction
Passive Prediction
Passive prediction uses predictive analytics to make predictions based on actual or hypothetical data, where no variables are exogenously altered.
Exogenously altered – a variable in a dataset that changes due to factors outside the data-generating process that are independent of all other variables within the data-generating process
Examples: weather forecasting, predicting which customers are likely to drop service, etc.
Pattern discovery (data mining), when used to make predictions, is generally used for passive prediction
Model fit – the basis on which analysts choose among competing models for passive prediction
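One possible way to compare competing models on fit is sketched below. The monthly demand figures are hypothetical, and the choice of mean squared error on held-out months as the fit measure is an assumption, not something the slide specifies. No variable is altered exogenously, so this is a passive prediction.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(actual, predicted):
    """Mean squared prediction error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


months = list(range(1, 13))
# Hypothetical monthly demand with an upward trend.
demand = [102, 105, 109, 110, 115, 117, 121, 124, 126, 131, 133, 137]

# Fit on the first nine months, then judge fit on the last three.
train_x, test_x = months[:9], months[9:]
train_y, test_y = demand[:9], demand[9:]

# Model 1: always predict the training mean.
mean_pred = [sum(train_y) / len(train_y)] * len(test_x)

# Model 2: linear trend in time.
a, b = fit_line(train_x, train_y)
trend_pred = [a + b * x for x in test_x]

best = "trend" if mse(test_y, trend_pred) < mse(test_y, mean_pred) else "mean"
print(best)
```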
Active Prediction
Active prediction uses predictive analytics to make predictions based on actual or hypothetical data, for which one or more variables are exogenously altered.
Making active predictions requires a causal relationship between variable ‘X’ and variable ‘Y’.
If a change in X affects Y, it does so because of a causal relationship between the two.
Active Prediction for Business Strategy Formation
Predicting an outcome for alternative strategies requires the application of active prediction
To accurately predict an outcome for a range of competing strategies, you must establish the causal effects of those strategies on that outcome
The leap from correlation to causality is a large one, and can lead to grossly incorrect predictions
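The gap between correlation and causality can be illustrated with a small simulation. Everything here is assumed for illustration: a third variable Z drives both X and Y, the true causal effect of X on Y is set to 1.0, and the coefficients 2.0 and 5.0 are arbitrary. The slope estimated from observational data overstates the effect, while the slope from data in which X is set at random (exogenously altered) is close to the assumed causal effect.

```python
import random


def slope(x, y):
    """Least-squares slope from a simple regression of y on x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(42)
n = 5000
TRUE_EFFECT = 1.0  # assumed causal effect of X on Y

# Observational data: Z drives both X and Y.
z = [rng.gauss(0, 1) for _ in range(n)]
x_obs = [2.0 * zi + rng.gauss(0, 1) for zi in z]
y_obs = [TRUE_EFFECT * xi + 5.0 * zi + rng.gauss(0, 1)
         for xi, zi in zip(x_obs, z)]

# Experimental data: X is set at random, independently of Z.
x_exp = [rng.gauss(0, 1) for _ in range(n)]
y_exp = [TRUE_EFFECT * xi + 5.0 * zi + rng.gauss(0, 1)
         for xi, zi in zip(x_exp, z)]

naive = slope(x_obs, y_obs)    # mixes the effect of X with that of Z
causal = slope(x_exp, y_exp)   # close to TRUE_EFFECT

print(round(naive, 2), round(causal, 2))
```

Using the naive slope to predict the outcome of a strategy that changes X would overstate the payoff, which is the kind of grossly incorrect prediction the slide warns about.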