Main
Excel Data Analysis: Modeling and Simulation
Excel Data Analysis: Modeling and Simulation
Hector Guerrero
This book offers a comprehensive and readable introduction to modern business and data analytics. It is based on the use of Excel, a tool that virtually all students and professionals have access to. The explanations are focused on understanding the techniques and their proper application, and are supplemented by a wealth of inchapter and endofchapter exercises. In addition to the general statistical methods, the book also includes Monte Carlo simulation and optimization. The second edition has been thoroughly revised: new topics, exercises and examples have been added, and the readability has been further improved. The book is primarily intended for students in business, economics and government, as well as professionals, who need a more rigorous introduction to business and data analytics – yet also need to learn the topic quickly and without overly academic explanations.
Categories:
Mathematics\\Optimization. Operations Research
Year:
2019
Edition:
2
Publisher:
Springer
Language:
english
Pages:
358
ISBN 13:
9783030012786
File:
PDF, 17.85 MB
Download (pdf, 17.85 MB)
Preview
 Open in Browser
 Checking other formats...
 Please login to your account first

Need help? Please read our short guide how to send a book to Kindle.
The file will be sent to your email address. It may take up to 15 minutes before you receive it.
The file will be sent to your Kindle account. It may takes up to 15 minutes before you received it.
Please note you need to add our NEW email km@bookmail.org to approved email addresses. Read more.
Please note you need to add our NEW email km@bookmail.org to approved email addresses. Read more.
You may be interested in
teacher
excellent materials for me，thanks a lot！
28 March 2019 (09:18)
sanda
usefull and incredible books
23 July 2019 (21:33)
dee
great minds, great people
07 August 2019 (12:45)
BIGSIAW
Excel Data Analysis_Modeling and Simulation, 2nd2019_(Hector Guerrero).pdf
pages: 358
pages: 358
27 August 2019 (05:58)
Philly
Thanks so much for sharing. Great useful book. God Bless
17 September 2019 (21:53)
nyalo
This is a wonderful site.Didn't know that there is a lot of information.
04 October 2019 (10:00)
MJ
very nice books and wonderful site
20 October 2019 (17:09)
حمدي المقري
thanks for giving the free books
02 November 2019 (09:10)
Daniel
Quite a resourceful website. Thanks very much
26 November 2019 (12:13)
laBiased
Amazing content,a researcher's partner
11 December 2019 (12:23)
Vishnu B
Thanks a lot for sharing all these knowledge, searching for these books is at very ease.. A perfect platform for those who seeking knowledge.
26 December 2019 (13:49)
Ross
Many thanks for the various materials...generous sharing of excellent books beyond the reach of those in third world countries. Best wishes
19 January 2020 (22:40)
You can write a book review and share your experiences. Other readers will always be interested in your opinion of the books you've read. Whether you've loved the book or not, if you give your honest and detailed thoughts then people will find new books that are right for them.
1

2

Hector Guerrero Excel Data Analysis Modeling and Simulation Second Edition Excel Data Analysis Hector Guerrero Excel Data Analysis Modeling and Simulation Second Edition Hector Guerrero College of William & Mary Mason School of Business Williamsburg, VA, USA ISBN 9783030012786 ISBN 9783030012793 https://doi.org/10.1007/9783030012793 (eBook) Library of Congress Control Number: 2018958317 © Springer Nature Switzerland AG 2019 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, speciﬁcally the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microﬁlms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a speciﬁc statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional afﬁliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland To my parents . . ..Paco and Irene Preface Why Does the World Need—Excel Data Analysis, Modeling, and Simulation? When spreadsheets ﬁrst became widely available in the early 1980s, it spawned a revolution in teaching. What previously could only be done with arcane software and largescale computing was now available to the common man, on a desktop. Also, before spreadsheets, most substantial analytical work was done outside the classroom where the tools were; spreadsheets and personal computers moved the work into the classroom. Not only did it change how the data analysis curriculum was taught, but it also empowered students to venture out on their own to explore new ways to use the tools. I can’t tell you how many phone calls, ofﬁce visits, and/or emails I have received in my teaching career from ecstatic students crowing about what they have just done with a spreadsheet model. I have been teaching courses related to business and data analytics and modeling for over 40 years, and I have watched and participated in the spreadsheet revolution. During that time, I have been a witness to the following important observations: • Each successive year has led to more and more demand for Excelbased analysis and modeling skills, both from students, practitioners, and recruiters. • Excel has evolved as an ever more powerful suite of tools, functions, and capabilities, including the recent iteration and basis for this book—Excel 2013. • The ingenuity of Excel users to create applications and tools to deal with complex problems continues to amaze me. • Those students who preceded the spreadsheet revolution often ﬁnd themselves at a loss as to where to go for an introduction to what is commonly taught to most undergraduates in business and sciences. Each one of these observations has motivated me to write this book. The ﬁrst suggests that there is no foreseeable end to the demand for the skills that Excel enables; in fact, the need for continuing productivity in all economies guarantees that an individual with proﬁciency in spreadsheet analysis will be highly prized by an vii viii Preface organization. At a minimum, these skills permit you freedom from specialists that can delay or hold you captive while waiting for a solution. This was common in the early days of information technology (IT); you requested that the IT group provide you with a solution or tool and you waited, and waited, and waited. Today if you need a solution you can do it yourself. The combination of the second and third observations suggests that when you couple bright and energetic people with powerful tools and a good learning environment, wonderful things can happen. I have seen this throughout my teaching career, as well as in my consulting practice. The trick is to provide a teaching vehicle that makes the analysis accessible. My hope is that this book is such a teaching vehicle. I believe that there are three simple factors that facilitate learning—select examples that contain interesting questions, methodically lead students through the rationale of the analysis, and thoroughly explain the Excel tools to achieve the analysis. The last observation has fueled my desire to lend a hand to the many students who passed through the educational system before the spreadsheet analysis revolution: to provide them with a book that points them in the right direction. Several years ago, I encountered a former MBA student in a Cincinnati Airport bookstore. He explained to me that he was looking for a good Excelbased book on data analysis and modeling—“You know it’s been more than 20 years since I was in a Tuck School classroom, and I desperately need to understand what my interns seem to be able to do so easily.” By providing a broad variety of exemplary problems, from graphical/ statistical analysis to modeling/simulation to optimization, and the Excel tools to accomplish these analyses, most readers should be able to achieve success in their selfstudy attempts to master spreadsheet analysis. Besides a good compass, students also need to be made aware of the possible. It is not usual to hear from students “Can you use Excel to do this?” or “I didn’t know you could do that with Excel!” Who Beneﬁts from This Book? This book is targeted at the student or practitioner who is looking for a single introductory Excelbased resource that covers three essential business skills—data analysis, business modeling, and simulation. I have successfully used this material with undergraduates, MBAs, and executive MBAs and in executive education programs. For my students, the book has been the main teaching resource for both semester and halfsemester long courses. The examples used in the books are sufﬁciently ﬂexible to guide teaching goals in many directions. For executives, the book has served as a compliment to classroom lectures, as well as an excellent postprogram, selfstudy resource. Finally, I believe that it will serve practitioners, like that former student I met in Cincinnati, who have the desire and motivation to refurbish their understanding of data analysis, modeling, and simulation concepts through selfstudy. Preface ix Key Features of This Book I have used a number of examples in this book that I have developed over many years of teaching and consulting. Some are brief and to the point; others are more complex and require considerable effort to digest. I urge you to not become frustrated with the more complex examples. There is much to be learned from these examples, not only the analytical techniques, but also approaches to solving complex problems. These examples, as is always the case in real world, messy problems, require making reasonable assumptions and some concession to simpliﬁcation if a solution is to be obtained. My hope is that the approach will be as valuable to the reader as the analytical techniques. I have also taken great pains to provide an abundance of Excel screen shots that should give the reader a solid understanding of the chapter examples. But, let me vigorously warn you of one thing—this is not an Excel howto book. Excel howto books concentrate on the Excel tools and not on analysis—it is assumed that you will ﬁll in the analysis blanks. There are many excellent Excel howto books on the market and a number of excellent websites (e.g., MrExcel.com) where you can ﬁnd help with the details of speciﬁc Excel issues. I have attempted to write a book that is about analysis, analysis that can be easily and thoroughly handled with Excel. Keep this in mind as you proceed. So in summary, remember that the analysis is the primary focus and that Excel simply serves as an excellent vehicle by which to achieve the analysis. Second Edition The second edition of this book has updated to the current version of Excel, 2013. The additions and changes to Excel, since the ﬁrst publication of the book, have been signiﬁcant; thus, a revision was requested by many users. Additionally, topics have been extended for a more complete coverage. For example, in Chaps. 2–6 a more indepth discussion of statistical techniques (sampling, conﬁdence interval analysis, regression, and graphical analysis) is provided. Also, in numerous passages, changes have been made to provide greater ease of understanding. Williamsburg, VA, USA Hector Guerrero Acknowledgements I would like to thank the editorial staff of Springer for their invaluable support— Christian Rauscher and Barbara Bethke. Thanks to Ms. Elizabeth Bowman and Traci Walker for their invaluable editing effort over many years. Special thanks to the countless students I have taught over the years, particularly Bill Jelen, the World Wide Web’s Mr. Excel who made a believer out of me. Finally, thanks to my family and friends who provided support over the years. xi Contents 1 Introduction to Spreadsheet Modeling . . . . . . . . . . . . . . . . . . . . . . 1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2 What’s an MBA to do? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.3 Why Model Problems? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.4 Why Model Decision Problems with Excel? . . . . . . . . . . . . . . . . 1.5 The Feng Shui of Spreadsheets . . . . . . . . . . . . . . . . . . . . . . . . . 1.6 A Spreadsheet Makeover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.6.1 Julia’s Business Problem–A Very Uncertain Outcome . . . 1.6.2 Ram’s Critique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.6.3 Julia’s New and Improved Workbook . . . . . . . . . . . . . . . 1.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Key Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Problems and Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1 2 3 3 5 8 8 11 11 17 18 18 2 Presentation of Quantitative Data: Data Visualization . . . . . . . . . . . 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Data Classiﬁcation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Data Context and Data Orientation . . . . . . . . . . . . . . . . . . . . . . . . 2.3.1 Data Preparation Advice . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4 Types of Charts and Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4.1 Ribbons and the Excel Menu System . . . . . . . . . . . . . . . . 2.4.2 Some Frequently Used Charts . . . . . . . . . . . . . . . . . . . . . . 2.4.3 Speciﬁc Steps for Creating a Chart . . . . . . . . . . . . . . . . . . 2.5 An Example of Graphical Data Analysis and Presentation . . . . . . . 2.5.1 Example—Tere’s Budget for the 2nd Semester of College . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5.2 Collecting Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5.3 Summarizing Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5.4 Analyzing Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5.5 Presenting Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.6 Some Final Practical Graphical Presentation Advice . . . . . . . . . . . 21 21 22 23 26 29 29 31 35 38 39 40 40 43 48 51 xiii xiv 3 4 Contents 2.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Key Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Problems and Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 56 56 Analysis of Quantitative Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 What Is Data Analysis? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3 Data Analysis Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4 Data Analysis for Two Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . 3.4.1 Time Series Data: Visual Analysis . . . . . . . . . . . . . . . . . . 3.4.2 CrossSectional Data: Visual Analysis . . . . . . . . . . . . . . . . 3.4.3 Analysis of Time Series Data: Descriptive Statistics . . . . . . 3.4.4 Analysis of CrossSectional Data: Descriptive Statistics . . . 3.5 Analysis of Time Series Data: Forecasting/Data Relationship Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.5.1 Graphical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.5.2 Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.5.3 Covariance and Correlation . . . . . . . . . . . . . . . . . . . . . . . 3.5.4 Other Forecasting Models . . . . . . . . . . . . . . . . . . . . . . . . 3.5.5 Findings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.6 Analysis of CrossSectional Data: Forecasting/Data Relationship Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.6.1 Findings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Key Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Problems and Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 59 60 61 64 66 68 71 72 Presentation of Qualitative Data—Data Visualization . . . . . . . . . . . . 4.1 Introduction–What Is Qualitative Data? . . . . . . . . . . . . . . . . . . . . 4.2 Essentials of Effective Qualitative Data Presentation . . . . . . . . . . . 4.2.1 Planning for Data Presentation and Preparation . . . . . . . . . 4.3 Data Entry and Manipulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3.1 Tools for Data Entry and Accuracy . . . . . . . . . . . . . . . . . . 4.3.2 Data Transposition to Fit Excel . . . . . . . . . . . . . . . . . . . . . 4.3.3 Data Conversion with the Logical IF . . . . . . . . . . . . . . . . . 4.3.4 Data Conversion of Text from Non–Excel Sources . . . . . . . 4.4 Data Queries with Sort, Filter, and Advanced Filter . . . . . . . . . . . . 4.4.1 Sorting Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.4.2 Filtering Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.4.3 Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.4.4 Advanced Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.5 An Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Key Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Problems and Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 103 104 104 107 108 112 115 118 121 122 124 125 129 133 139 140 141 75 76 80 86 87 88 89 96 97 98 99 Contents xv 5 Analysis of Qualitative Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2 Essentials of Qualitative Data Analysis . . . . . . . . . . . . . . . . . . . . 5.2.1 Dealing with Data Errors . . . . . . . . . . . . . . . . . . . . . . . . . 5.3 PivotChart or PivotTable Reports . . . . . . . . . . . . . . . . . . . . . . . . . 5.3.1 An Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3.2 PivotTables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3.3 PivotCharts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4 TiendaMía.com Example: Question 1 . . . . . . . . . . . . . . . . . . . . . . 5.5 TiendaMía.com Example: Question 2 . . . . . . . . . . . . . . . . . . . . . . 5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Key Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Problems and Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145 145 147 147 151 151 153 164 168 169 175 175 176 6 Inferential Statistical Analysis of Data . . . . . . . . . . . . . . . . . . . . . . . 6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2 Let the Statistical Technique Fit the Data . . . . . . . . . . . . . . . . . . 6.3 χ2—ChiSquare Test of Independence for Categorical Data . . . . . 6.3.1 Tests of Hypothesis—Null and Alternative . . . . . . . . . . . 6.4 zTest and tTest of Categorical and Interval Data . . . . . . . . . . . . 6.5 An Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.5.1 zTest: 2 Sample Means . . . . . . . . . . . . . . . . . . . . . . . . . 6.5.2 Is There a Difference in Scores for SC Nonprisoners and EB Trained SC Prisoners? . . . . . . . . . . . . . . . . . . . . . . . 6.5.3 tTest: Two Samples Unequal Variances . . . . . . . . . . . . . 6.5.4 Do Texas Prisoners Score Higher than Texas Nonprisoners? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.5.5 Do Prisoners Score Higher Than Nonprisoners Regardless of the State? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.5.6 How Do Scores Differ Among Prisoners of SC and Texas Before Special Training? . . . . . . . . . . . . . . . . . . . . . . . . 6.5.7 Does the EB Training Program Improve Prisoner Scores? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.5.8 What If the Observations Means Are the Same, But We Do Not See Consistent Movement of Scores? . . . . . . . . . . . . 6.5.9 Summary Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.6 Conﬁdence Intervals for Sample Statistics . . . . . . . . . . . . . . . . . 6.6.1 What Are the Ingredients of a Conﬁdence Interval? . . . . . 6.6.2 A Conﬁdence Interval Example . . . . . . . . . . . . . . . . . . . 6.6.3 Single Sample Hypothesis Tests Are Similar to Conﬁdence Intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.7 ANOVA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.7.1 ANOVA: Single Factor Example . . . . . . . . . . . . . . . . . . 6.7.2 Do the Mean Monthly Losses of Reefers Suggest That the Means Are Different for the Three Ports? . . . . . . . . . . . . 179 180 181 181 182 186 186 189 190 193 193 195 196 198 199 199 201 202 203 204 207 207 209 xvi Contents 6.8 Experimental Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.8.1 Randomized Complete Block Design Example . . . . . . . 6.8.2 Factorial Experimental Design Example . . . . . . . . . . . . 6.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Key Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Problems and Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210 213 216 219 221 221 7 Modeling and Simulation: Part 1 . . . . . . . . . . . . . . . . . . . . . . . . . . 7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.1.1 What Is a Model? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.2 How Do We Classify Models? . . . . . . . . . . . . . . . . . . . . . . . . . 7.3 An Example of Deterministic Modeling . . . . . . . . . . . . . . . . . . . 7.3.1 A Preliminary Analysis of the Event . . . . . . . . . . . . . . . . 7.4 Understanding the Important Elements of a Model . . . . . . . . . . . 7.4.1 Premodeling or Design Phase . . . . . . . . . . . . . . . . . . . . 7.4.2 Modeling Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.4.3 Resolution of Weather and Related Attendance . . . . . . . . 7.4.4 Attendees Play Games of Chance . . . . . . . . . . . . . . . . . . 7.4.5 Fr. Eﬁa’s Whatif Questions . . . . . . . . . . . . . . . . . . . . . . 7.4.6 Summary of OLPS Modeling Effort . . . . . . . . . . . . . . . . 7.5 Model Building with Excel . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.5.1 Basic Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.5.2 Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.5.3 Controls from the Forms Control Tools . . . . . . . . . . . . . . 7.5.4 Option Buttons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.5.5 Scroll Bars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Key Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Problems and Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225 225 227 229 231 232 235 236 236 240 241 243 244 245 246 248 255 256 259 261 261 262 8 Modeling and Simulation: Part 2 . . . . . . . . . . . . . . . . . . . . . . . . . . 8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.2 Types of Simulation and Uncertainty . . . . . . . . . . . . . . . . . . . . . 8.2.1 Incorporating Uncertain Processes in Models . . . . . . . . . . 8.3 The Monte Carlo Sampling Methodology . . . . . . . . . . . . . . . . . . 8.3.1 Implementing Monte Carlo Simulation Methods . . . . . . . 8.3.2 A Word About Probability Distributions . . . . . . . . . . . . . 8.3.3 Modeling Arrivals with the Poisson Distribution . . . . . . . 8.3.4 VLOOKUP and HLOOKUP Functions . . . . . . . . . . . . . . 8.4 A Financial Example–Income Statement . . . . . . . . . . . . . . . . . . 8.5 An Operations Example–Autohaus . . . . . . . . . . . . . . . . . . . . . . 8.5.1 Status of Autohaus Model . . . . . . . . . . . . . . . . . . . . . . . 8.5.2 Building the Brain Worksheet . . . . . . . . . . . . . . . . . . . . . 8.5.3 Building the Calculation Worksheet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265 265 267 267 268 269 274 278 280 282 286 291 292 294 Contents xvii 8.5.4 9 Variation in Approaches to Poisson Arrivals: Consideration of Modeling Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.5.5 Sufﬁcient Sample Size . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.5.6 Building the Data Collection Worksheet . . . . . . . . . . . . . . 8.5.7 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Key Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Problems and Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296 297 298 303 307 307 308 Solver, Scenarios, and Goal Seek Tools . . . . . . . . . . . . . . . . . . . . . . . 9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.2 Solver–Constrained Optimization . . . . . . . . . . . . . . . . . . . . . . . . . 9.3 Example–York River Archaeology Budgeting . . . . . . . . . . . . . . . 9.3.1 Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.3.2 Formulation of YRA Problem . . . . . . . . . . . . . . . . . . . . . . 9.3.3 Preparing a Solver Worksheet . . . . . . . . . . . . . . . . . . . . . . 9.3.4 Using Solver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.3.5 Solver Reports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.3.6 Some Questions for YRA . . . . . . . . . . . . . . . . . . . . . . . . . 9.4 Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.4.1 Example 1—Mortgage Interest Calculations . . . . . . . . . . . 9.4.2 Example 2—An Income Statement Analysis . . . . . . . . . . . 9.5 Goal Seek . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.5.1 Example 1—Goal Seek Applied to the PMT Cell . . . . . . . . 9.5.2 Example 2—Goal Seek Applied to the CUMIPMT Cell . . . 9.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Key Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Problems and Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311 311 313 314 316 318 318 322 323 328 334 334 337 338 339 341 342 343 344 About the Author Hector Guerrero is a Professor Emeritus at Mason School of Business at the College of William and Mary, in Williamsburg, Virginia. He teaches in the areas of business analytics, decision making, statistics, operations, and business quantitative methods. He has previously taught at the Amos Tuck School of Business at Dartmouth College and the College of Business of the University of Notre Dame. He is well known among his students for his quest to bring clarity to complex decision problems. He earned a PhD in Operations and Systems Analysis at the University of Washington and a BS in Electrical Engineering and an MBA at the University of Texas. He has published scholarly work in the areas of operations management, product design, and catastrophic planning. Prior to entering academe, he worked as an engineer for Dow Chemical Company and Lockheed Missiles and Space Co. He is also very active in consulting and executive education with a wide variety of clients––U.S. Government, international ﬁrms, as well as many small and large U.S. manufacturing and service ﬁrms. It is not unusual to ﬁnd him relaxing on a quiet beach with a challenging Excel workbook and an excellent cabernet. xix Chapter 1 Introduction to Spreadsheet Modeling Contents 1.1 1.2 1.3 1.4 1.5 1.6 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . What’s an MBA to do? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Why Model Problems? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Why Model Decision Problems with Excel? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Feng Shui of Spreadsheets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A Spreadsheet Makeover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.6.1 Julia’s Business Problem–A Very Uncertain Outcome . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.6.2 Ram’s Critique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.6.3 Julia’s New and Improved Workbook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Key Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Problems and Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.1 1 2 3 3 5 8 8 11 11 17 18 18 Introduction Spreadsheets have become as commonplace as calculators in data analysis and decision making. In this chapter, we explore the value and importance of building decisionmaking models with Excel. We also consider the characteristics that make spreadsheets useful, not only for ourselves, but for others with whom we collaborate. As with any tool, learning to use the tool effectively requires carefully conceived planning and practice; thus, we will terminate the chapter with an example of a poorly planned spreadsheet that is rehabilitated into a shining example of what a spreadsheet can be. Some texts provide you with very detailed and indepth explanations of the intricacies of Excel; this text opts to concentrate on the types of analysis and model building you can perform with Excel. The principal goal of this book is to provide you with an Excelcentric approach to solving problems and to do so with relatively simple and abbreviated examples. In other words, this book is for the individual that shouts, “I’m not interested in a 900page text, full of difﬁculttoremember CtlShiftF4R key stroke shortcuts! What I need is a good and instructive example of analytics, so I can solve this problem before I leave the ofﬁce tonight.” © Springer Nature Switzerland AG 2019 H. Guerrero, Excel Data Analysis, https://doi.org/10.1007/9783030012793_1 1 2 1 Introduction to Spreadsheet Modeling Finally, for many texts the introductory chapter is a “throwaway” to be read casually before getting to substantial material in the chapters that follow, but that is not the case for this chapter. It sets the stage for some important guidelines for constructing worksheets and workbooks that will be essential throughout the remaining chapters. I urge you to read this material carefully and consider the content seriously. Let’s begin by considering the following encounter between two graduate school classmates of the class of 1990. In it, we begin to answer the question that decisionmakers face as Excel becomes their standard for analysis and collaboration—How can I quickly and effectively learn the capabilities of this powerful tool, Excel? 1.2 What’s an MBA to do? It was late Friday afternoon when Julia Lopez received an unexpected phone call from an MBA classmate, Ram Das, whom she had not heard from in years. They both work in Washington, DC, and after the call, they agreed to meet at a coffee shop on Connecticut Avenue to catch up and discuss their career experiences. Ram: Julia, it’s great to see you. I don’t remember you looking as prosperous when we were struggling with our quantitative and computer classes in school. Julia: No kidding! In those days, I was just trying to keep up and survive. You don’t look any worse for wear yourself. Still doing that rocketscience analysis you loved in school? Ram: Yes, but it’s getting tougher to defend my status as a rocket scientist. This summer we hired an undergraduate intern that just blew us away. This kid could do any type of analysis we asked, and do it on one software plat form, Excel. Now my boss expects the same from me, but many years out of school, there is no way I have the training to equal that intern’s skills. Julia: Join the club. We had an intern we called the Excel Wonder Woman. I don’t know about you, but in the last few years, people are expecting more and better data analytic skills from MBAs. As a product manager, I’m expected to know as much about complex business analytics as I do about understanding my customers and markets. I even bought ﬁve or six books on business analytics with Excel. It’s just impossible to get through hundreds of pages of detailed keystrokes and tricks for using Excel, much less simultaneously understand the basics of the analytics. Who has the time to do it? Ram: I’d be satisﬁed with a brief, readable book that gives me a clear view of the kinds of things you can do with Excel, and just one straightforward example. Our intern was doing things that I would never have believed possible— analyzing qualitative data, querying databases, simulations, optimization, statistical analysis, collecting data on web pages (web crawling), you name it. It used to take me six separate software packages to do all those things. I would love to do it all in Excel, and I know that to some degree you can. 1.4 Why Model Decision Problems with Excel? 3 Julia: Just before I came over here, my boss dumped another project on my desk that he wants done in Excel. The Excel Wonder Woman convinced him that we ought to be building all our important analytical tools on Excel—Decision Support Systems she calls them. And, if I hear the term collaborative one more time, I’m going to explode! Ram: Julia, I should go, but let’s talk more about this. Maybe we can help each other learn more about the capabilities of Excel. Julia: Ram, this is exciting. Reminds me of our study group work in the MBA. This brief episode is occurring with uncomfortable frequency for many people in analytical and decisionmaking roles. Technology, in the form of desktop software and hardware, is becoming as much a part of daytoday business analytics as the concepts and techniques that have been with us for years. Although sometimes complex, the difﬁculty has not been in understanding analytical concepts and techniques, but more often, how to put them to use. For many individuals, if analytic software is available for modeling problems, it is often unfriendly and inﬂexible; if software is not available, then they are limited to solving baby problems that are generally of little practical interest. This is why Excel has become so valuable—it is easilymanaged technology. 1.3 Why Model Problems? It may appear to be trivial to ask why we model problems, but it is worth considering. Usually, there are at least two reasons for modeling problems—(1) if a problem has important ﬁnancial and organizational implications, then it deserves serious consideration, and models permit serious analytical investigation, and (2) on a very practical level, often we are directed by superiors to model a problem because they believe it is important. For a subordinate analyst, important problems generally call for more than a gratuitous “I think. . .” or “I feel. . .” to convincingly satisfy a superior’s probing questions. Increasingly, superiors are asking questions about decisions that require careful consideration of assumptions, and about the sensitivity of decision outcomes to possible changes in environmental conditions and the assumptions (sensitivity analysis). To deal with these questions, formality in decision making is a must; thus, we build models that can accommodate this higher degree of scrutiny. Ultimately, careful modeling can (and should) lead to better overall decision making. 1.4 Why Model Decision Problems with Excel? So, if the modeling of decision problems is important and necessary in our work, then what modeling tool(s) do we select? In recent years, there has been little doubt as to the answer of this question for most decision makers: Microsoft Excel. Excel is 4 1 Introduction to Spreadsheet Modeling the most pervasive, allpurpose and ﬁrststop modeling tool on the planet, due to its ease of use. It has a wealth of internal capability that continues to grow as each new version is introduced. Excel also resides in Microsoft Ofﬁce, a suite of similarly popular tools that permit interoperability. Finally, there are tremendous advantages to onestop shopping in the selection of a modeling tool (that is, a single tool with many capabilities). There is so much power and capability built into Excel that unless you have received very recent training in its latest capabilities, you might be unaware of the variety of modeling that is possible with Excel. Of course, there are occasions where advanced tools are required, but for most circumstances, Excel is sufﬁcient. Here is the ﬁrst layer of questions that decision makers should ask when considering Excel as tool: 1. What types of analysis are possible with Excel? 2. If my modeling effort requires multiple forms of analysis, can Excel handle the various techniques required? 3. If I commit to using Excel, will it be capable of handling new forms of analysis and a potential increase in the scale and complexity of my models? The general answer to these questions is that just about any analytical technique that you can conceive that ﬁts in the rowcolumn structure of spreadsheets can be modeled with Excel. Note that this is a very broad and bold statement. Obviously, if you are modeling phenomena related to high energy physics or theoretical mathematics, you are very likely to choose other modeling tools. Yet, for the individual looking to model business problems, Excel is a must, and that is why this book will be of value to you. More speciﬁcally, Table 1.1 provides a partial list of the types of analytical tools this book will address. When we ﬁrst conceptualize and plan to solve a decision problem, one of the ﬁrst considerations we face is which modeling tool or approach to use. There are business problems that are sufﬁciently unique and complex that will require a much more targeted and specialized modeling approach than Excel. Yet, most of us are involved with business problems that span a variety of problem areas—e.g. marketing issues that require qualitative database analysis, ﬁnance problems that require simulation of ﬁnancial statements, and risk analysis that requires the determination of risk proﬁles. Spreadsheets permit us to unify these analyses on a single modeling platform. This makes our modeling effort: (1) durable—a robust structure that can anticipate varied Table 1.1 Types of analysis this book will undertake Quantitative data visualization/presentation—Graphs and charts Quantitative data analysis—Summary statistics and data exploration and manipulation Qualitative data visualization/presentation—Pivot tables and Pivot charts Qualitative data analysis—Data tables, data queries, and data ﬁlters Advanced statistical analysis—Hypothesis testing, conﬁdence intervals, correlation analysis, and regression models Sensitivity analysis—Oneway, twoway, data tables, visualization/graphical presentation Optimization models and goal seek—Solver, optimizationconstrained/unconstrained Modeling uncertainty—Monte Carlo simulation, scenarios 1.5 The Feng Shui of Spreadsheets 5 future use, (2) ﬂexible—capable of adaptation as the problem changes and evolves, and (3) shareable—models that can be shared by a variety of individuals at many levels of the organization, all of whom are collaborating in the solution process of the problem. Additionally, the standard programming required for spreadsheets is easier to learn than other forms of sophisticated programming languages found in many modeling systems. Even so, Excel has anticipated the occasional need for more formal programming by providing a resident, powerful programming language, VBA (Visual Basic for Applications). The ubiquitous nature of Excel spreadsheets has led to serious academic research and investigation into their use and misuse. Under the general title of spreadsheet engineering, academics have begun to apply many of the important principles of software engineering to spreadsheets, attempting to achieve better modeling results: more useful models, fewer mistakes in programming, and a greater impact on decision making. The growth in the importance of this topic is evidence of the potentially high costs associated with poorlydesigned spreadsheets. In the next section, I address some best practices that will lead to superior everyday spreadsheet and workbook design, or good spreadsheet engineering and data analytics. Unlike some of the highlevel concepts of spreadsheet engineering, I provide very simple and speciﬁc guidance for spreadsheet development. My recommendations are aimed at daytoday users, and just as the ancient art of Feng Shui provides a sense of order and wellbeing in a building, public space, or home, these best practices can do the same for builders of spreadsheets. 1.5 The Feng Shui1 of Spreadsheets The initial development of a spreadsheet project should focus on two areas— (1) planning and organizing the problem to be modeled, and (2) use of general practices of good spreadsheet engineering. In this section, we focus on the latter. In succeeding chapters, we will deal with the former by presenting numerous forms of analysis that can be used to model business decisions. So, let us begin by presenting the following ﬁve best practices to consider when designing a spreadsheet model: A. Think workbooks not worksheets—Spare the worksheet; spoil the workbook. When spreadsheets were ﬁrst introduced, a workbook consisted of a single worksheet. Over time, spreadsheets have evolved into multiworksheet workbooks, with interconnectivity between worksheets and other workbooks and ﬁles. In workbooks that represent serious analytical effort, you should be conscious of not attempting to place too much information, data, or analysis on a single worksheet. Thus, I always include on separate worksheets: (1) an introductory or cover page with documentation that identiﬁes the purpose, authors, contact information, and 1 The ancient Chinese study of arrangement and location in one’s physical environment, currently very popular in ﬁelds of architecture and interior design. 6 1 Introduction to Spreadsheet Modeling intended use of the spreadsheet model and, (2) a table of contents providing users with a glimpse of how the workbook will proceed. In deciding on whether or not to include additional worksheets, it is important to ask yourself the following question—Does the addition of a worksheet make the workbook easier to view and use? If the answer is yes, then your course of action is clear. Yet, there is a cost to adding worksheets—extra worksheets lead to the use of extra computer memory for a workbook. Thus, it is always a good idea to avoid the inclusion of gratuitous worksheets which, regardless of their memory overhead cost, can be annoying to users. When in doubt, I cautiously decide in favor of adding a worksheet. B. Place variables and parameters in a central location—Every workbook needs a “Brain.” I deﬁne a workbook’s “Brain” as a central location for variables and parameters. Call it what you like—data center, variable depot, etc.—these values generally do not belong in cell formulas hidden from easy viewing. Why? If it is necessary to change a value that is used in the individual cell formulas of a worksheet, the change must be made in every cell containing the value. This idea can be generalized in the following concept: if you have a value that is used in numerous cell locations, and you anticipate the possibility of changing that value, then you should have the cells that use the value reference it at some central location (“Brain”). For example, if a speciﬁc interest or discount rate is used in many cell formulas and/or in many worksheets, you should locate that value in a single cell in the Brain to make a change in the value easier to manage. As we will see later, a Brain is also quite useful in conducting the sensitivity analysis for a model. C. Design workbook layout with users in mind—User friendliness and designer control. As the lead designer of the workbook, you should consider how you want others to interact with your workbook. User interaction should consider not only the ultimate end use of the workbook, but also the collaborative interaction by others involved in the workbook design and creation process. Here are some speciﬁc questions to consider that facilitate userfriendliness and designer control: a) What areas of the workbook will the end users be allowed to access when the design becomes ﬁxed? b) Should certain worksheets or ranges be hidden from users? c) What speciﬁc level of design interaction will collaborators be allowed? d) What speciﬁc worksheets and ranges will collaborators be allowed to access? Remember that your authority as lead designer extends to testing the workbook and determining how end users will employ the workbook. Therefore, not only do you need to exercise direction and control for the development process of the workbook, but also how it will be used. D. Document workbook content and development—Insert text and comments liberally. There is nothing more annoying than viewing a workbook that is incomprehensible. This can occur even in carefully designed spreadsheets. What leads to spreadsheets that are difﬁcult to comprehend? From the user perspective, the complexity of a workbook can be such that it may be necessary for you to provide explanatory documentation; otherwise, worksheet details and overall analytical approach can bewilder the user. Additionally, the designer often needs to provide 1.5 The Feng Shui of Spreadsheets 7 users and collaborators with perspective on how and why a workbook developed as it did—e.g. why were certain analytical approaches incorporated in the design, what assumptions were made, and what were the alternatives considered? You might view this as justiﬁcation, or defense, of the workbook design. There are a number of choices available for documentation: (1) text entered directly into cells, (2) naming cell ranges with descriptive titles (e.g. Revenue, Expenses, COGS, etc.), (3) explanatory text placed in text boxes, and (4) comments inserted into cells. I recommend the latter three approaches—text boxes for more detailed and longer explanations, range names to provide users with descriptive and understandable formulas, since these names will appear in cell formulas that reference them, and cell comments for quick and brief explanations. In later chapters, I will demonstrate each of these forms of documentation. E. Provide convenient workbook navigation— Beam me up Scotty! The ability to easily navigate around a welldesigned workbook is a must. This can be achieved using hyperlinks. Hyperlinks are convenient connections to other cell locations within a worksheet, to other worksheets in the same workbook, or to other workbooks or other ﬁles. Navigation is not only a convenience, but also it provides a form of control for the workbook designer. Navigation is integral to our discussion of “Design workbook layout with users in mind.” It permits control and inﬂuence over the user’s movement and access to the workbook. For example, in a serious spreadsheet project, it is essential to provide a table of contents on a single worksheet. The table of contents should contain a detailed list of the worksheets, a brief explanation of what is contained in the worksheet, and hyperlinks the user can use to access the various worksheets. Organizations that use spreadsheet analysis are constantly seeking ways to incorporate best practices into operations. By standardizing the ﬁve general practices, you provide valuable guidelines for designing workbooks that have a useful and enduring life. Additionally, standardization will lead to a common “structure and look” that allows decision makers to focus more directly on the modeling content of a workbook, rather than the noise often caused by poor design and layout. The ﬁve best practices are summarized in Table 1.2. Table 1.2 Five best practices for workbook deign A. Think workbooks, not worksheets—Spare the worksheet; spoil the workbook B. Place variables and parameters in a central location—Every workbook needs a Brain C. Design workbook layout with users in mind—User friendliness and designer control D. Document workbook content and development—Insert text and comments liberally E. Provide convenient workbook navigation—Beam me up Scotty 8 1.6 1 Introduction to Spreadsheet Modeling A Spreadsheet Makeover Now, let’s consider a speciﬁc problem that will allow us to apply the best practices we have discussed. Our friends, Julia and Ram, are meeting several weeks after their initial encounter. It is early Sunday afternoon, and they have just returned from running a 10km road race. The following discussion takes place after the run. Julia: Ram, you didn’t do badly on the run. Ram: Thanks, but you’re obviously being kind. I can’t keep up with you. I’m exhausted. Julia: Speaking of exhaustion, remember that project I told you my boss dumped on my desk? Well, I have a spreadsheet that I think does a pretty good job of solving the problem. Can you take a look at it? Ram: Sure. By the way, do you know that Professor Lopez from our MBA has written a book on spreadsheet analysis? The old guy did a pretty good job of it, too. I brought along a copy for you. Julia: Thanks. I remember him as being pretty good at simplifying some tough concepts. Ram: His ﬁrst chapter discusses a simple way to think about spreadsheet structure and workbook design—workbook Feng Shui, as he puts it. It’s actually ﬁve best practices to consider in workbook design. Julia: Maybe we can apply it to my spreadsheet? Ram: Yes, let’s do it! 1.6.1 Julia’s Business Problem–A Very Uncertain Outcome Julia works for a consultancy, Market Focus International (MFI), which advises ﬁrms on marketing to American, ethic markets—Hispanic Americans, Lebanese Americans, Chinese Americans, etc. One of her customers, MidAtlantic Foods Inc., a prominent food distributor in the MidAtlantic of the U.S., is considering the addition of a new product to their ethnic foods line—ﬂour tortillas.2 The ﬁrm is interested in a forecast of the ﬁnancial effect of adding ﬂour tortillas to their product lines. This is considered a controversial product line extension by some of the MidAtlantic’s management, so much so, that one of the executives has named the project A Very Uncertain Outcome. Julia has decided to perform a pro forma (forecasted or projected) proﬁt or loss analysis, with a relatively simple structure. The proﬁt or loss statement is one of the most important ﬁnancial statements in business. After interviews with the relevant individuals at the client ﬁrm, Julia assembles the important variable values and relationships that she will incorporate into her spreadsheet analysis. These values and relationships are shown in Fig. 1.1. The information collected reveals the considerable uncertainty involved in forecasting the success of the ﬂour tortilla 2 A tortilla is a form of ﬂat, unleavened bread popular in Mexico, Latin America, and the U.S. 1.6 A Spreadsheet Makeover 9 Fig. 1.1 A Very Uncertain Outcome data introduction. For example, the Sales Revenue (Sales Volume ∗ Average Unit Selling Price) forecast is based on three possible values of Sales Volume and three possible values of Average Unit Selling Price. This leads to nine (3 3) possible combinations of Sales Revenue. One combination of values leading to Sales Revenue is volume of 3.5 million units in sales and a selling unit price of $5, or Sales Revenue of $17.5 million. Another source of uncertainty is the percentage of the Sales Revenue used to calculate Costs of Goods Sold Expense, either 40% or 80% with equal probability of occurrence. Uncertainty in sales volume and sales price also affects the variable expenses. Volume driven and revenue driven variable expenses are also dependent on the uncertain outcomes of Sales Revenue and Sales Volume. Julia’s workbook appears in Figs. 1.2 and 1.3. These ﬁgures provide details on the cell formulas used in the calculations. Note that Fig. 1.2 consists of a single 10 1 Introduction to Spreadsheet Modeling Fig. 1.2 Julia’s initial workbook Fig. 1.3 Julia’s initial workbook with cell formulas shown worksheet comprised of a single forecasted Proﬁt or Loss scenario; that is, she has selected a single value for the uncertain variables (the most likely) for her calculations. The Sales Revenue in Fig. 1.3 is based on sales of 3.5 million units, the most likely value for volume, and a unit price of $5, the mean/average, of equally possible unit selling prices. 1.6 A Spreadsheet Makeover 11 Her calculation of Cost of Goods Sold Expense (COGS) is not quite as simple to determine. There are two equally possible percentages, 40% or 80%, that can be multiplied times the Sales Revenue to determine COGS. Rather than select one, she has decided to use a percentage value that is at the midpoint of the range, 60%. Thus, she has made some assumptions in her calculations that may need explanation to the client, yet there is no documentation of her reasons for this choice, or any other assumption. Additionally, in Fig. 1.3 the inﬂexibility of the workbook is apparent—all parameters and variables are imbedded in the workbook formulas; thus, if Julia wants to make changes to these assumed values, it will be difﬁcult to undertake. To make these changes quickly and accurately, it would be wiser to place these parameters in a central location—in a Brain—and have the cell formulas refer to this location. It is quite conceivable that the client will want to ask some whatif questions about her analysis. For example, what if the unit price range is changed from 4, 5 and 6 dollars to 3, 4, and 5 dollars; what if the most likely Sales Volume is raised to 4.5 million. Obviously, there are many more questions that Ram could ask before providing a formal critique of Julia’s workbook and analysis, a critique that is organized around the ﬁve best practices. Julia hopes that by sending the workbook to Ram, he will suggest changes to improve the workbook. 1.6.2 Ram’s Critique After considerable examination of the worksheet, Ram gives Julia his recommendations for a “spreadsheet makeover” in Table 1.3. He also makes some general analytical recommendations that he believes will improve the usefulness of the workbook. Ram has serious misgivings about her analytical approach. It does not, in his opinion, capture the substantial uncertainty of her A Very Uncertain Outcome problem. Although there are many possible avenues for improvement, it is important to provide Julia with rapid and actionable feedback; she has a deadline that must be met for the presentation of her analytical ﬁndings. His recommendations are organized in terms of the ﬁve best practices (P/A ¼ Practice A, etc.) in Table 1.3. 1.6.3 Julia’s New and Improved Workbook Julia’s initial reaction to Ram’s critique is a bit guarded. She wonders what added value will result from applying the best practices to workbook, and how the sophisticated analysis that Ram is suggesting will help the client’s decisionmaking. More importantly, she also wonders if she is capable of making the changes. Yet, she understands that the client is quite interested in the results of the analysis, and anything she can do to improve her ability to provide insight to this problem and of course, sell future consulting services, are worth considering carefully. With Ram’s critique in mind, she begins the process of rehabilitating the spreadsheet 12 1 Introduction to Spreadsheet Modeling Table 1.3 Makeover recommendations for Julia General Comment—I don’t believe that you have adequately captured the uncertainty associated with the problem. In all cases you have used a single value of a set, or distribution, of possible values—e.g. you use 3,500,000 as the Sales Volume. Although this is the most likely value, 2,000,000 and 5,000,000 have a combined probability of occurrence of 35% —a nontrivial probability of occurrence. By using the full range of possible values, you can provide the user with a view of the variability of the resulting “bottom line valueNet Income” in the form of a risk proﬁle. This requires randomly selecting (random sampling) values of the uncertain parameters from their stated distributions. You can do this through the use of the RAND() function in Excel, and repeating these experiments many times, say 100 times. This is known as Monte Carlo Simulation. (Chaps. 7 and 8 are devoted to this topic.) P/A—The Workbook is simply a single spreadsheet. Although it is possible that an analysis would only require a single spreadsheet, I don’t believe that it is sufﬁcient for this complex problem, and certainly the customer will expect a more complete and sophisticated analysis. —Modify the workbook to include more analysis, more documentation, and expanded presentation of results on separate worksheets. P/B—There are many instances where variables in this problem are imbedded in cell formulas (see Fig. 1.2 cell D3. . .3500000*5*0.6). The variables should have a separate worksheet location for quick access and presentation—a Brain. The cell formulas can then reference the cell location in the Brain to access the value of the variable or parameter. This will allow you to easily make changes in a single location and note the sensitivity of the model to these changes. If the client asks what if questions during your presentation of results, the current spreadsheet will be very difﬁcult to use. —Create a Brain worksheet in the workbook. P/C—The new layout that results from the changes I suggest should include several userfriendliness considerations— (1) create a table of contents, (2) place important analysis on separate worksheets, and (3) place the results of the analysis into a graph that provides a “risk proﬁle” of the problem results (see Fig. 1.7). Number (3) is related to a larger issue of appropriateness of analysis (see General Comment). P/D—Document the workbook to provide the user with information regarding the assumptions and form of analysis employed— Use text boxes to provide users with information on assumed values (Sales Volume, Average Selling Price, etc.), use cell comments to guide users to cells where the input of data can be performed, and name cell ranges so formulas reﬂect directly the operation being performed in the cell. P/E—Provide the user with navigation from the table of content to, and within, the various worksheets of the workbook— Insert hypertext links throughout the workbook. she has constructed by concentrating on three issues: reconsideration of the overall analysis to provide greater insight of the uncertainty, structuring and organizing the analysis within the new multiworksheet structure, and incorporating the ﬁve best practices to improve spreadsheet functionality. In reconsidering the analysis, Julia agrees that a singlepoint estimate of the P/L statement is severely limited in its potential to provide MidAtlantic Foods with a broad view of the uncertainty associated with the extension of the product line. A risk proﬁle, a distribution of the net income outcomes associated with the uncertain values of volume, price, and expenses, is a far more useful tool for this purpose. Thus, to create a risk proﬁle it will be necessary to perform the following: 1. place important input data on a single worksheet that can be referenced (“Brain”) 2. simulate the possible P/L outcomes on a single worksheet (“Analysis”) by randomly selecting values of uncertain factors 1.6 A Spreadsheet Makeover 13 Fig. 1.4 Improved workbook—Table of Contents 3. repeat the process numerous times––100 (an arbitrary choice for this example) 4. collect the data on a separate worksheet and present the data in a graphical format that provides the Risk Proﬁle for the simulation (“Rsk Prof and Data Coln”) This suggests two worksheets associated with the analysis (“Analysis” and “Rsk Prof and Data Coln”)). If we consider the additional worksheet for the location of important parameter values (“Brain”) and a location from which the user can navigate the multiple worksheets (“Table of Contents”), we are now up to a total of four worksheets. Additionally, Julia realizes that she must avoid the issues of inﬂexibility we discussed above in her initial workbook (Fig. 1.3). Finally, she is aware that she will have to automate the data collection process by creating a simple macro that generates simulated outcomes, captures the results, and stores 100 such results in worksheet. A macro is a computer program written in a simple language (VBA) that performs speciﬁc Excel programming tasks for the user, and it is beyond Julia’s capabilities. Ram has skill in creating macros and has volunteered to help her. Figure 1.4 presents the new four worksheet structure that Julia has settled on. Each of the colored tabs, a useful identiﬁcation feature, represents a worksheet. The worksheet displayed, “Ex 1.4 T of C”, is the Table of Contents. Note that the underlined text items in the table are hyperlinks that transfer you to the various worksheets. Moving the cursor over the link will permit you to click the link and then automatically transfer you to the speciﬁed location. Insertion of a hyperlink is 14 1 Introduction to Spreadsheet Modeling Fig. 1.5 Improved workbook—Brain performed by selecting the icon in the Insert menu bar that is represented by a globe and links of a chain (see the Insert menu tab in Fig. 1.4). When this Globe icon is selected, a dialog box will guide you to where you would like the link to transfer the cursor, including questions regarding whether the transfer will be to this or other worksheets, or even other workbooks or ﬁles. Note that this worksheet also provides documentation describing the project in a text box. In Fig. 1.5, Julia has created a Brain, which she has playfully entitled Señor (Mr.) Brain. We can see how data from her earlier spreadsheet (see Fig. 1.1) is carefully organized to permit direct and simple referencing by formulas in the Analysis worksheet. If the client should desire a change to any of the assumed parameters or variables, the Brain is the place to perform the change. Observing the sensitivity of the P/L outcomes to these changes is simply a matter of adjusting the relevant data elements in the Brain, and noting the new outcomes. Thus, Julia is prepared for the 1.6 A Spreadsheet Makeover 15 Fig. 1.6 Improved workbook—Analysis clients what if questions. In later chapters we will refer to this process as Sensitivity Analysis. The heart of the workbook, the Analysis worksheet in Fig. 1.6, simulates individual scenarios of P/L Net Income based on randomly generated values of uncertain parameters. The determination of these uncertain values occurs off the screen image in columns N, O, and P. The values of sales volume, sales price, and COGS percentage are selected randomly according to the speciﬁed distributions and used to calculate a Net Income. This can be thought of as a single scenario: a result based on a speciﬁc set of randomly selected variables. Then, the process is repeated to generate many P/L outcome scenarios. All of this is managed by the macro that automatically makes the random selection, calculates new Net Income, and records the Net Income to a worksheet called Data Collection Area. The appropriate number of scenarios, or iterations, for this process is a question of simulation design. It is important to select a number of scenarios that reﬂect accurately the full behavior of the Net Income. Too few scenarios may lead to unrepresentative results, and too many scenarios can be costly and tedious to collect. Note that the particular scenario in Fig. 1.6 shows a loss of $2.97 million. This is a very different result from her simple analysis in Fig. 1.2, where a proﬁt of over $1,000,000 was presented. (More discussion of the proper number of scenarios can be found in Chaps. 7 and 8.) In Fig. 1.7, GraphRisk proﬁle, simulation results (recorded in the data collection area and shown in Fig. 1.8) are arranged into a frequency distribution by using the Data Analysis tool (more on this tool in Chaps. 2, 3, 4, and 5) available in the Data Tab. A frequency distribution is determined from a sample of variable values and provides the number of scenarios that fall into a relatively narrow range of Net 16 1 Introduction to Spreadsheet Modeling Fig. 1.7 Improved workbook—GraphRisk proﬁle Fig. 1.8 Improved workbook—Data Collection Area Income performance; for example, a range from $1,000,000 to $1,100,000. By carefully selecting these ranges, also known as bins, and counting the scenarios falling in each, a proﬁle of outcomes can be presented graphically. We often refer to these graphs as Risk Proﬁles. This title is appropriate given that the client is presented with both the positive (higher net income) and negative (lower net income) risk associated with the adoption of the ﬂour tortilla product line. 1.7 Summary 17 It is now up to the client to take this information and apply some decision criteria to either accept or reject the product line. Those executives that are not predisposed to adopting the product line might concentrate on the negative potential outcomes. Note that in 26 of the 100 simulations, the P/L outcome is zero or a loss, and with a substantial downside risk—13 observations are losses of at least $16 million. This information can be gleaned from the risk proﬁle or the frequency distribution that underlies the risk proﬁle. Clearly, the information content of the risk proﬁle is far more revealing than Julia’s original calculation of a single proﬁt of $1,257,300, based on her selective use of speciﬁc parameter values. As a manager seeking as thorough of an analysis as possible, there is little doubt that they would prefer the risk proﬁle to the single scenario that Julia initially produced. 1.7 Summary This example is one that is relatively sophisticated for the casual or ﬁrsttime user of Excel. Do not worry if you do not understand every detail of the simulation. It is presented here to help us focus on how a simple analysis can be extended and how our best practices can improve the utility of a spreadsheet analysis. In later chapters, we will return to these types of models and you will see how such models can be constructed. It is easy to convince oneself of the lack of importance of an introductory chapter of a textbook, especially one that in later chapters focuses on relatively complex analytical issues. Most readers often skip an introduction or skim the material in a casual manner, preferring instead to get the “real meat of the book.” Yet, in my opinion, this chapter may be one of the most important chapters of this book. With an understanding of the important issues in spreadsheet design, you can turn an ineffective, cumbersome, and unfocused analysis into one that users will hail as an “analytical triumph.” Remember that spreadsheets are used by a variety of individuals in the organization—some at higher levels and some at lower levels. The design effort required to create a workbook that can easily be used by others and serve as a collaborative document by numerous colleagues is not an impossible goal to achieve, but it does require thoughtful planning and the application of a few simple, best practices. As we saw in our example, even the analysis of a relatively simple problem can be greatly enhanced by applying the ﬁve practices in Table 1.2. Of course, the signiﬁcant change in the analytical approach is also important, and the remaining chapters of the book are dedicated to these analytical topics. In the coming chapters, we will continue to apply the ﬁve practices and explore the numerous analytical techniques that are contained in Excel. For example, in the next four chapters, we examine the data analytic capabilities of Excel with quantitative (numerical—e.g. 2345.81 or 53%) and qualitative (categorical—e.g. registered voter vs. unregistered voter or individuals born in Beijing City vs. Chengdu City) data. We will also see how both quantitative and qualitative data can be presented in charts and tables to answer many important business questions; graphical data analysis, or data visualization, can be very persuasive in decisionmaking. 18 1 Introduction to Spreadsheet Modeling Key Terms Web Crawling Decision Support Systems Spreadsheet Engineering Best Practices Feng Shui User Friendliness Hyperlinks Pro Forma Uncertainty Whatif Monte Carlo Simulation Risk Proﬁle Macro VBA Sensitivity Analysis Data Visualization Problems and Exercises 1. Consider a workbook project that you or a colleague have developed in the past and apply the best practices of the Feng Shui of Spreadsheets to your old work book. Show both workbooks to a friend or colleague and ask which they prefer. 2. Create a workbook that has four worksheets—Table of Contents, Me, My Favorite Pet, and My Least Favorite Pet. Place hyperlinks on the Table of Contents to permit you to link to each of the pages and return to the Table of Contents. Insert a picture of yourself on the Me page and a picture of pets on the My Favorite Pet and My Least Favorite Pet page. Be creative and insert any text you like in text boxes explaining who you are and why these pets are your favorite and least favorite. 3. What is a risk proﬁle? How can it be used for decision making? 4. Explain to a classmate or colleague why Best Practices in creating workbooks and worksheets are important. Ask them if they have a favorite or any to add. 5. Advanced Problem—An investor is considering the purchase of one to three condominiums in the tropical paradise of Costa Rica. The investor has no intention of using the condo for her personal use, and is only concerned with the income producing capability that it will produce. After some discussion with a long time and real estate savvy resident of Costa Rica, the investor decides to perform a simple analysis of the operating proﬁt/loss based on the following information: Problems and Exercises Variable property Costa Fixed property costa Daily revenuea a 19 A Based on: Most likely monthly occupancy of 20 day 12 months per year operation 2000 Colones per occupancy day cost 3,000,000 B Based on: Most likely monthly occupancy of 25 day 12 months per year operation 1000 Colones per occupancy day cost 2,500,000 C Based on: Most likely monthly occupancy of 15 day 10 months per year operation 3500 Colones per occupancy day cost 4,500,000 33,800 26,000 78,000 All Cost and Revenues in Colones–520 Costa Rican Colones/US Dollar Additionally, the exchange rate may vary 15%, and the most likely occupancy days can vary from a low and high of 15–25, 20–30, and 10–20 for A, B, and C, respectively. Based on this information, create a workbook that determines the best case, most likely, and worse case annual cash ﬂows for each of the properties. Chapter 2 Presentation of Quantitative Data: Data Visualization Contents 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Data Classiﬁcation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Data Context and Data Orientation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3.1 Data Preparation Advice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4 Types of Charts and Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4.1 Ribbons and the Excel Menu System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4.2 Some Frequently Used Charts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4.3 Speciﬁc Steps for Creating a Chart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5 An Example of Graphical Data Analysis and Presentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5.1 Example—Tere’s Budget for the 2nd Semester of College . . . . . . . . . . . . . . . . . . . . . . . 2.5.2 Collecting Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5.3 Summarizing Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5.4 Analyzing Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5.5 Presenting Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.6 Some Final Practical Graphical Presentation Advice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Key Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Problems and Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1 21 22 23 26 29 29 31 35 38 39 40 40 43 48 51 55 56 56 Introduction We often think of data as being numerical values, and in business, those values are often stated in terms of units of currency (dollars, pesos, dinars, etc.). Although data in the form of currency are ubiquitous, it is quite easy to imagine other numerical units: percentages, counts in categories, units of sales, etc. This chapter, in conjunction with Chap. 3, discusses how we can best use Excel’s graphic capabilities to effectively present quantitative data (ratio and interval) to inform and inﬂuence an audience, whether it is in euros or some other quantitative measure. In Chaps. 4 and 5, we will acknowledge that not all data are numerical by focusing on qualitative (categorical/nominal or ordinal) data. The process of data gathering often produces a combination of data types, and throughout our discussions, it will be impossible to © Springer Nature Switzerland AG 2019 H. Guerrero, Excel Data Analysis, https://doi.org/10.1007/9783030012793_2 21 22 2 Presentation of Quantitative Data: Data Visualization ignore this fact: quantitative and qualitative data often occur together. Let us begin our study of data visualization. Unfortunately, the scope of this book does not permit in depth coverage of the data collection process, so I strongly suggest you consult a reference on data research methods before you begin a signiﬁcant data collection project. I will make some brief remarks about the planning and collection of data, but we will generally assume that data has been collected in an efﬁcient manner, and that the data accurately represents what is intended to be measured. Now, let us consider the essential ingredients of good data visualization and the issues that can make it either easy or difﬁcult to succeed. We will begin with a general discussion of data: how to classify it, and the context within which it exists. 2.2 Data Classiﬁcation Skilled data analysts spend a great deal of time and effort in planning a data collection effort. They begin by considering the type of data they can and will collect to meet their goals for the use of the data. Just as carpenters are careful in selecting their tools, so are analysts in their choice of data. You cannot expect a low precision tool to perform high precision work. The same is true for data. A good analyst is cognizant of the types of analyses they can perform on various categories of data. This is particularly true in statistical analysis, where there are often strict rules for the types of analyses that can be performed on various types of data. The standard characteristics that help us categorize data are presented in Table 2.1. Each successive category, nominal to ratio, permits greater measurement precision and more extensive statistical analysis. Thus, we can see from Table 2.1 that ratio data measurement is more precise than nominal data measurement. It is important to remember that all these forms of data, regardless of their classiﬁcation, are valuable, and we collect data in different forms by considering availability and our analysis goals. For example, nominal data are used in many marketing studies, while ratio data are more often the tools of ﬁnance, operations, and economics; yet, all business functions collect data in each of these categories. For nominal and ordinal data, we use nonmetric measurement scales in the form of categorical properties or attributes. Interval and ratio data are based on metric measurement scales allowing a wide variety of mathematical operations to be performed on the data. The major difference between interval and ratio measurement scales is the existence of an absolute zero for ratio scales, and arbitrary zero points for interval scales. For example, consider a comparison of the Fahrenheit and Celsius temperature scales. The zero points for these scales are arbitrarily set and do not indicate an “absolute absence” of temperature. Similarly, it is incorrect to suggest that 40 C is half as hot as 80 C. By contrast, it can be said that 16 ounces of coffee are, in fact, twice as heavy as 8 ounces. Ultimately, the ratio scale has the highest information content of any of the measurement scales. 2.3 Data Context and Data Orientation 23 Table 2.1 Data categorization Data Nominal or categorical data Description Data that can be placed into mutually exclusive categories Ordinal data Data are ordered or ranked according to some characteristic Interval data Data characterized and ordered by a speciﬁc distance between each observation, but having no natural zero Data that have a natural zero Ratio data Properties Quantitative relationships among and between data are meaningless and descriptive statistics are meaningless Categories can be compared to one another, but the difference in categories is generally meaningless and calculating averages is suspect Ratios are meaningless, thus 15 C is not half as warm as 30 C These data have both ratios and differences that are meaningful Examples Country in which you were born, a geographic region, your gender— These are either/or categories Ranking breakfast cereals—Preferring cereal X more than Y implies nothing about how much more you like one versus the other The Fahrenheit (or Celsius) temperature scale or consumer survey scales that are speciﬁed to be interval scales Sales revenue, time to perform a task, length, or weight Just as thorough problem deﬁnition is essential to problemsolving, careful selection of appropriate data categories is essential in a data collection effort. Data collection is an arduous and often costly task, so why not carefully plan for the use of the data prior to its collection? Additionally, remember that there are few things that will anger a cost conscious superior more than the news that you have to repeat a data collection effort. 2.3 Data Context and Data Orientation The data that we collect and assemble for presentation purposes exists in a unique data context: a set of conditions (an environment) related to the data. This context is important to our understanding of the data. We relate data to time (e.g. daily, quarterly, yearly, etc.), to categorical treatments (e.g. an economic downturn, sales in Europe, etc.), and to events (e.g. sales promotions, demographic changes, etc.). Just as we record the values of quantitative data, we also record the context of data, such as revenue generated by product A, in quarter B, due to salesperson C, in sales territory D. Thus, associated with the quantitative data element that we record are numerous other important data elements that may, or may not, be quantitative. Sometimes the context is obvious, sometimes the context is complex and difﬁcult to identify, and oftentimes, there is more than a single context that is essential to consider. Without an understanding of the data context, important insights related to the data can be lost. To make matters worse, the context related to the data may 24 2 Presentation of Quantitative Data: Data Visualization change or reveal itself only after substantial time has passed. For example, consider data that indicates a substantial loss of value in your stock portfolio, recorded from 1990 to 2008. If the only context that is considered is time, it is possible to ignore a host of important contextual issues—e.g. the bursting of the dotcom bubble of the late 1990s. Without knowledge of this contextual event, you may simply conclude that you are a poor stock picker. It is impossible to anticipate all the elements of data context that should be collected, but whatever data we collect should be sufﬁcient to provide a context that suits our needs and goals. If I am interested in promoting the idea that the revenues of my business are growing over time and growing only in selected product categories, I will assemble timeoriented revenue data for the various products of interest. Thus, the related dimensions of my revenue data are time and product. There may also be an economic context, such as demographic conditions that may inﬂuence types of sales. Determining the contextual dimensions that are important will inﬂuence what data we collect and how we present it. Additionally, you can save a great deal of effort and afterthefact data adjustment by carefully considering, in advance, the various dimensions that you will need. Consider the owner of a small business that is interested in recording expenses in a variety of accounts for cash ﬂow management, income statement preparation, and tax purposes. This is an important activity for any small business. Cash ﬂow is the lifeblood of these businesses, and if it is not managed well, the results can be catastrophic. Each time the business owner incurs an expense, he either collects a receipt (upon ﬁnal payment) or an invoice (a request for payment). Additionally, suppliers to small businesses often request a deposit that represents a form of partial payment and a commitment to the services provided by the supplier. An example of these data is shown in the worksheet in Table 2.2. Each of the primary data entries, referred to as records, contain several important and diverse dimensions, referred to as ﬁelds—date, account, amount, nature of the expense, names, a comment, etc. A record represents a single observation of the collected data ﬁelds, as in item 3 (printing on 1/5/2004) of Table 2.2. This record contains 7 ﬁelds—Item 1, Printing, $2543.21, etc.—and each record is a row in the worksheet. Somewhere in our business owner’s ofﬁce is an old shoebox that is the ﬁnal resting place for his primary data. It is ﬁlled with scraps of paper: invoices and receipts. At the end of each week, our businessperson empties the box and records what he believes to be the important elements of each receipt or invoice. Table 2.2 is an example of the type of data that the owner might collect from the receipts and invoices over time. The receipts and invoices can contain more data than needs to be recorded or used for analysis and decision making. The dilemma that the owner faces is the amount and type of data to record in the worksheet; recording too much data can lead to wasted effort and neglect of other important activities, and recording too little data can lead to overlooking important business issues. What advice can we provide our businessperson that might make his efforts in collecting, assembling, and recording data more useful and efﬁcient? Below I provide several guidelines that can make the process of planning for a data collection effort more straightforward. 2.3 Data Context and Data Orientation 25 Table 2.2 Payment example Item Account 1 Ofﬁce supply 2 Ofﬁce supply 3 Printing 4 Cleaning Service 5 Coffee Service 6 Ofﬁce supply 7 Printing 8 Ofﬁce supply 9 Ofﬁce rent 10 Fire insurance 11 Cleaning Service 12 Orphan’s Fund 13 Ofﬁce supply 14 Printing 15 Coffee Service 16 Cleaning Service 17 Printing 18 Ofﬁce supply 19 Ofﬁce supply 20 Ofﬁce supply 21 Ofﬁce rent 22 Police fund 23 Printing 24 Printing 25 Entertaining 26 Orphan’s Fund 27 Ofﬁce supply 28 Ofﬁce supply 29 Ofﬁce supply 30 Coffee Service 31 Ofﬁce supply 32 Cleaning $ Amount $123.45 $54.40 $2543.21 $78.83 $56.92 Date Rcvd. 1/2/2004 1/5/2004 1/5/2004 1/8/2004 1/9/2004 Deposit $10.00 $0.00 $350.00 $0.00 Days to pay 0 0 45 15 Comment Project X Project Y Feb. brochure Monthly $0.00 15 Monthly Project X Hand bills Computer Monthly Quarterly Water damage $914.22 $755.00 $478.88 $1632.00 $1254.73 $135.64 1/12/2004 1/13/2004 1/16/2004 1/19/2004 1/22/2004 1/22/2004 $100.00 $50.00 $50.00 $0.00 $0.00 $0.00 30 30 30 15 60 15 $300.00 1/27/2004 $0.00 0 $343.78 $2211.82 $56.92 1/30/2004 2/4/2004 2/5/2004 $100.00 $350.00 $0.00 15 45 15 Laser printer Mar. brochure Monthly $78.83 2/10/2004 $0.00 15 Monthly $254.17 $412.19 $1467.44 $221.52 $1632.00 $250.00 $87.34 $94.12 $298.32 $300.00 2/12/2004 2/12/2004 2/13/2004 2/16/2004 2/18/2004 2/19/2004 2/23/2004 2/23/2004 2/26/2004 2/27/2004 $50.00 $50.00 $150.00 $50.00 $0.00 $0.00 $25.00 $25.00 $0.00 $0.00 15 30 30 15 15 15 0 0 0 0 Hand bills Project Y Project W Project X Monthly Charity Posters Posters Project Y Charity $1669.76 $1111.02 $76.21 $56.92 3/1/2004 3/2/2004 3/4/2004 3/5/2004 $150.00 $150.00 $25.00 $0.00 45 30 0 15 Project Z Project W Project W Monthly $914.22 $78.83 3/8/2004 3/9/2004 $100.00 $0.00 30 15 Project X Monthly Charity∗ (continued) 26 2 Presentation of Quantitative Data: Data Visualization Table 2.2 (continued) Item Account Service 33 Printing 34 Ofﬁce supply 35 Ofﬁce rent 36 Police fund 37 Ofﬁce supply 38 Ofﬁce supply 39 Orphan’s Fund 2.3.1 $ Amount Date Rcvd. Deposit Days to pay Comment $455.10 $1572.31 $1632.00 $250.00 $642.11 $712.16 $300.00 3/12/2002 3/15/2002 3/17/2002 3/23/2002 3/26/2002 3/29/2002 3/29/2002 $100.00 $150.00 $0.00 $0.00 $100.00 $100.00 $0.00 15 45 15 15 30 30 0 Hand bills Project Y Monthly Charity Project W Project Z Charity Data Preparation Advice 1. Not all data are created equal—Spend some time and effort considering the category of data (nominal, ratio, etc.) that you will collect and how you will use it. Do you have choices in the categorical type of data you can collect? How will you use the data in analysis and presentation? 2. More is better—If you are uncertain of the speciﬁc dimensions of a data observation that you will need for analysis, err on the side of recording a greater number of dimensions (more information on the context). It is easier not to use collected data than it is to add the uncollected data later. Adding data later can be costly and assumes that you will be able to locate it, which may be difﬁcult or impossible. 3. More is not better—If you can communicate what you need to communicate with less data, then by all means do so. Bloated databases can lead to distractions and misunderstanding. With new computer memory technology, the cost of data storage is declining rapidly, but there is still a cost to data entry, storage, verifying data accuracy, and achieving records for long periods of time. 4. Keep it simple and columnar—Select a simple, unique title for each data dimension or ﬁeld (e.g. Revenue, Address, etc.) and record the data in a column, with each row representing a record, or observation, of recorded data. Each column (ﬁeld) represents a different dimension of the data. Table 2.2 is a good example of columnar data entry for seven data ﬁelds. 5. Comments are useful—It may be wise to place a miscellaneous dimension or ﬁeld reserved for a variety of written observations—a comment ﬁeld. Be careful! Because of their unique nature, comments are often difﬁcult, if not impossible, to query via structured database query languages (SQL). Try to pick key words for entry (overdue, lost sale, etc.) if you plan to later query the ﬁeld. 6. Consistency in category titles—Although you may not consider that there is a signiﬁcant difference between the category titles Deposit and $Deposit, Excel will view them as completely distinct ﬁeld titles. Excel is not capable of understanding that the terms may be synonymous in your mind. 2.3 Data Context and Data Orientation 27 Let’s examine Table 2.2 in light of the data preparation advice we have just received, but ﬁrst, let’s take a look at a typical invoice and the data that it might contain. Figure 2.1 shows an invoice for ofﬁce supply items purchased at Hamm Ofﬁce Supply, Inc. Note the amount of data that this generic invoice (an MS Ofﬁce Template) contains is quite substantial: approximately 20 ﬁelds. Of course, some of the data are only of marginal value, such as our address—we know that the invoice was intended for our ﬁrm, and we know where we are located. Yet, it is veriﬁcation that the Hamm invoice is in fact intended for our ﬁrm. Notice that each line item in the invoice will require multiple item entries—qty (quantity), description, unit price, and total. Given the potential for large quantities of data, it would be wise to consider a relational database, such as MS Access, to optimize data entry effort. Of course, even if the data are stored in a relational database, that does not Invoice No. AB1234 INVOICE Customer Misc. Name Address City Phone State Qty Date Order No. Rep FOB ZIP Description Unit Price TOTAL SubTotal Shipping Select One… Comments Name CC # Expires Fig. 2.1 Generic invoice Tax Rate(s) TOTAL Office Use Only 28 2 Presentation of Quantitative Data: Data Visualization restrict us from using Excel to analyze the data by downloading it from Access to Excel; in fact, this is a wonderful advantage of the Ofﬁce suite. Now, let’s examine the data in Table 2.2, in light of our advice: 1. Not all data are created equal—Our businessperson has assembled a variety of data dimensions or ﬁelds to provide the central data element ($ Amount) with ample context and orientation. The seven ﬁelds that comprise each record appear to be sufﬁcient for the businessperson’s goal of recording the expenses and describing the context associated with his business operation. This includes recording each expense to ultimately calculate annual proﬁt or loss, tracking expenses associated with projects or other uses of funds (e.g. charity), and the timing of expenses (Date Rcvd., Days to Pay, etc.) and subsequent cash ﬂow. If the businessperson expands his examination of the transactions, some data may be missing (for example, Order Number or Shipping Cost). Only the future will reveal if these data elements will become important, but for now, these data are not collected in the spreadsheet. 2. More is better—The data elements that our businessperson has selected may not all be used in our graphical presentation, but this could change in the future. It is better to collect more data initially than to perform an extensive collection of data at a later date. The invoices and scraps of paper representing primary data may be difﬁcult to ﬁnd or identify in 3 months. 3. More is not better—Our businessperson has carefully selected the data that he feels is necessary, without creating excessive data entry effort. 4. Keep it simple and columnar—Unique and simple titles for the various data dimensions (e.g. Account, Date Rcvd., etc.) have been selected and arranged in columnar fashion. Adding, inserting, or deleting a column is virtually costless for an Excel user, skilled or unskilled. 5. Comments are useful—The Comment ﬁeld has been designated for the speciﬁc project (e.g. Project X), source item (e.g. Computer), or other important information (e.g. Monthly charge). If any criticism can be made here, it is that maybe these data elements deserve a title other than Comment. For example, entitle this data element Project/Sources of Expense and use the Comment title as a less structured data category. These could range from comments relating to customer service experiences to information on possible competitors that provide similar services. 6. Consistency in category titles—Although you may not consider there to be a signiﬁcant difference between the account titles Ofﬁce Supply and Ofﬁce Supplies, Excel will view them as completely distinct accounts. Our businessperson appears to have been consistent in the use of account types and comment entries. It is not unusual for these entries to be converted to numerical codes, thus replacing Printing with an account code, such as 351. 2.4 Types of Charts and Graphs 2.4 29 Types of Charts and Graphs There are literally hundreds of types of charts and graphs (these are synonymous terms) available in Excel. Thus, the possibilities for selecting a presentation format are both interesting and daunting. Which graph type is best for my needs? Often the answer is that more than one type of graph will perform the presentation goal required; thus, the selection is a matter of your taste or that of your audience. Therefore, it is convenient to divide the problem of selecting a presentation format into two parts: 1) the actual data presentation, and 2) the embellishment that will surround it. In certain situations, we choose to do as little embellishment as possible; in others, we ﬁnd it necessary to dress the data presentation in lovely colors, backgrounds, and labeling. To determine how to blend these two parts, ask yourself few simple questions: 1. What is the purpose of the data presentation? Is it possible to show the data without embellishment, or do you want to attract attention through your presentation style? In a business world where people are exposed to many, many presentations, it may be necessary to do something extraordinary to gain attention or simply conform to the norm. 2. At what point does my embellishment of the data become distracting? Does the embellishment cover or conceal the data? Don’t forget that from an information perspective, it is all about the data, so don’t detract from its presentation by adding superﬂuous and distracting adornment. 3. Am I being true to my taste and style of presentation? This author’s taste in formatting is guided by some simple principles that can be stated in several familiar laws: less is more, small is beautiful, and keep it simple. As long as you are able to deliver the desired information and achieve your presentation goal, there is no problem with our differences in taste. 4. Formatting should be consistent among graphs in a workbook—don’t mix various formats, unless there is good reason to do so. 2.4.1 Ribbons and the Excel Menu System So, how do we put together a graph or chart? In pre2007 Excel, an ingenious tool called a Chart Wizard was available to perform these tasks. As the name implies, the Chart Wizard guided you through standardized steps (four, to be exact) that took the guesswork out of creating charts. If you followed the four steps, it was almost foolproof, and if you read all the options available to you for each of the four steps, it would allow you to create charts very quickly. In Excel 2007, the wizard was replaced because of a major development in the Excel 2007 user interface—ribbons. Ribbons replaced the old hierarchical pulldown menu system that was the basis for user interaction with Excel. Ribbons are menus and commands organized in tabs that provide access to the functionality for speciﬁc uses. Some of these will appear 30 2 Presentation of Quantitative Data: Data Visualization familiar to preExcel 2007 users and others will not—Home, Insert, Page Layout, Formulas, Data, Review, Format, and View. Within each tab, you will ﬁnd groups of related functionality and commands. Additionally, some menus speciﬁc to an activity, for example, the creation of a graph or chart, will appear as the activity is taking place. For those just beginning to use Excel 2016 and with no previous exposure to Excel, you will probably ﬁnd the menu system quite easy to use; for those with prior experience with Excel, the transition may be a bit frustrating at times. I have found the ribbon system quite useful, in spite of the occasional difﬁculty of ﬁnding functionality that I was accustomed to before Excel 2007. Figure 2.2 shows the Insert tab where the Charts group is found. In this Figure, a very simple graph of six data points for two data series, data 1 and data 2, is shown as two variations of the column graph. One also displays the data used to create the graph. Additionally, since the leftmost graph has been selected, indicated by the border that surrounds the graph, a group of menus appear at the top of the ribbon—Chart Tools. These tools contain menus for Design, Layout, and Format. This group is relevant to the creation of a chart or graph. Ultimately, ribbons lead to a ﬂatter, or less hierarchical, menu system. Our ﬁrst step in chart creation is to organize our data in a worksheet. In Fig. 2.2 the six data points for the two series have a familiar columnar orientation and have titles, data 1 and data 2. By capturing the data range containing the data that you intend to chart before engaging the charts group in the Insert tab, you automatically identify the data to be graphed. Note that this can, but need not, include the column title of the data speciﬁed as text. By capturing the title, the graph will assume that you want to name the data series the same as title selected. If you place alphabetic Fig. 2.2 Insert tab and excel chart group 2.4 Types of Charts and Graphs 31 characters, a through f in this case, in the ﬁrst column of the captured data, the graph will use these characters as the xaxis of the chart. If you prefer not to capture the data prior to engaging the charts group, you can either: (1) open and capture a blank chart type and copy and paste the data to the blank chart type, or (2) use a right click of your mouse to select data. Obviously, there will be numerous detailed steps to capturing data and labeling the graph appropriately. We defer to a detailed example of creating graphs using the chart group for the next section. 2.4.2 Some Frequently Used Charts It is always dangerous to make bold assertions, but it is generally understood that the mother of all graphs is the Column or Bar chart. They differ only in their vertical and horizontal orientation, respectively. They easily represent the most often occurring data situation: some observed numerical variable that is measured in a single dimension (often time). Consider a simple set of data related to ﬁve products (A–E) and their sales over a 2year period of time and measured in millions of dollars. The ﬁrst four quarters represent year one, and the second four quarters represent year two. These data are shown in Table 2.3. In Quarter 1 of the second year, sales revenue for product B was $49,000,000. A quick visual examination of the data in Table 2.3 reveals that the product sales are relatively similar in magnitude (less than 100), but with differences in quarterly increases and decreases within the individual products. For example, product A varies substantially over the eight quarters, while product D shows relatively little variation. Additionally, it appears that when product A shows high sales in early quarters (1 and 2), product E shows low sales in early quarters—they appear to be somewhat negatively correlated, although a graph may reveal more conclusive information. Negative correlation implies that one data series moves in the opposite direction from another; positive correlation suggests that both series move in the same direction. In later chapters, we will discuss statistical correlation in greater detail. Table 2.3 Sales data for products A–E (in millions of dollars) Quarter 1 2 3 4 1 2 3 4 A 98 58 23 43 89 52 24 34 B 45 21 36 21 49 20 43 21 C 64 45 21 14 27 40 58 76 D 21 23 31 30 35 40 37 40 E 23 14 56 78 27 20 67 89 32 2 Presentation of Quantitative Data: Data Visualization Let’s experiment with a few chart types to examine the data and tease out insights related to product A–E sales. The ﬁrst graph, Fig. 2.3, displays a simple column chart of sales for the ﬁveproduct series in each of eight quarters. The relative magnitude of the ﬁve products in a quarter is easily observed, but note that the ﬁveproduct series are difﬁcult to follow through time, despite the color coding. It is difﬁcult to concentrate solely on a single series (e.g. Product A) through time; it is even more difﬁcult to follow multiple series. In Fig. 2.4, the chart type used is a Stacked Column. This graph provides a view not only of the individual product sales, but also of the quarterly totals for products A–E. By observing the absolute height of each stacked column, one can see that total Product Sales Millions of Dollars 120 100 80 A 60 B 40 C 20 D 0 E 1 2 3 4 1 2 3 4 Quarters Fig. 2.3 Column chart for products A–E Product Sales 300 Millions of Dollars 250 200 E 150 D C 100 B 50 0 A 1 2 3 4 1 Quarters Fig. 2.4 Stacked column chart for products A–E 2 3 4 2.4 Types of Charts and Graphs 33 product sales in Quarter 1 of Year