Learn Keras for Deep Neural Networks: A Fast-Track Approach to Modern Deep Learning with Python
Jojo John Moolayil
Learn, understand, and implement deep neural networks in a math- and programming-friendly approach using Keras and Python. The book focuses on an end-to-end approach to developing supervised learning algorithms in regression and classification with practical business-centric use cases implemented in Keras.
The overall book comprises three sections with two chapters in each section. The first section prepares you with all the necessary basics to get started in deep learning. Chapter 1 introduces you to the world of deep learning and its difference from machine learning, the choices of frameworks for deep learning, and the Keras ecosystem. You will cover a real-life business problem that can be solved by supervised learning algorithms with deep neural networks. You’ll tackle one use case for regression and another for classification leveraging popular Kaggle datasets.
Later, you will see an interesting and challenging part of deep learning: hyperparameter tuning, which helps you further improve your models when building robust deep learning applications. Finally, you’ll further hone your skills in deep learning and cover areas of active development and research in deep learning.
At the end of Learn Keras for Deep Neural Networks, you will have a thorough understanding of deep learning principles and practical hands-on experience in developing enterprise-grade deep learning solutions in Keras.
What You’ll Learn
• Master fast-paced practical deep learning concepts with math- and programming-friendly abstractions
• Design, develop, train, validate, and deploy deep neural networks using the Keras framework
• Use best practices for debugging and validating deep learning models
• Deploy and integrate deep learning as a service into a larger software service or product
• Extend deep learning principles into other popular frameworks
Who This Book Is For
Software engineers and data engineers with basic programming skills in any language who are keen on exploring deep learning for a career move or an enterprise project.
Categories:
Computers \ Cybernetics: Artificial Intelligence
Year:
2019
Edition:
1
Publisher:
Apress
Language:
English
Pages:
182 / 192
ISBN 10:
1484242394
ISBN 13:
9781484242391
File:
PDF, 2.74 MB
Learn Keras for Deep Neural Networks: A Fast-Track Approach to Modern Deep Learning with Python

Jojo Moolayil
Vancouver, BC, Canada

ISBN-13 (pbk): 978-1-4842-4239-1
ISBN-13 (electronic): 978-1-4842-4240-7
https://doi.org/10.1007/978-1-4842-4240-7
Library of Congress Control Number: 2018965596

Copyright © 2019 by Jojo Moolayil

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, logo, or image, we use the names, logos, and images only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.
Managing Director, Apress Media LLC: Welmoed Spahr
Acquisitions Editor: Celestin Suresh John
Development Editor: Matthew Moodie
Coordinating Editor: Aditee Mirashi
Cover designed by eStudioCalamar
Cover image designed by Freepik (www.freepik.com)

Distributed to the book trade worldwide by Springer Science+Business Media New York, 233 Spring Street, 6th Floor, New York, NY 10013. Phone 1-800-SPRINGER, fax (201) 348-4505, email orders-ny@springer-sbm.com, or visit www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science + Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.

For information on translations, please email rights@apress.com, or visit http://www.apress.com/rights-permissions.

Apress titles may be purchased in bulk for academic, corporate, or promotional use. eBook versions and licenses are also available for most titles. For more information, reference our Print and eBook Bulk Sales web page at http://www.apress.com/bulk-sales.

Any source code or other supplementary material referenced by the author in this book is available to readers on GitHub via the book’s product page, located at www.apress.com/9781484242391. For more detailed information, please visit http://www.apress.com/source-code.
Printed on acid-free paper

Table of Contents

About the Author
About the Technical Reviewer
Acknowledgments
Introduction

Chapter 1: An Introduction to Deep Learning and Keras
    Introduction to DL
        Demystifying the Buzzwords
        What Are Some Classic Problems Solved by DL in Today’s Market?
        Decomposing a DL Model
    Exploring the Popular DL Frameworks
        Low-Level DL Frameworks
        High-Level DL Frameworks
    A Sneak Peek into the Keras Framework
        Getting the Data Ready
        Defining the Model Structure
        Training the Model and Making Predictions
    Summary

Chapter 2: Keras in Action
    Setting Up the Environment
        Selecting the Python Version
        Installing Python for Windows, Linux, or macOS
        Installing Keras and TensorFlow Back End
    Getting Started with DL in Keras
        Input Data
        Neuron
        Activation Function
        Sigmoid Activation Function
        Model
        Layers
        The Loss Function
        Optimizers
        Metrics
        Model Configuration
        Model Training
        Model Evaluation
    Putting All the Building Blocks Together
    Summary

Chapter 3: Deep Neural Networks for Supervised Learning: Regression
    Getting Started
    Problem Statement
        Why Is Representing a Problem Statement with a Design Principle Important?
        Designing an SCQ
        Designing the Solution
    Exploring the Data
        Looking at the Data Dictionary
        Finding Data Types
        Working with Time
        Predicting Sales
        Exploring Numeric Columns
        Understanding the Categorical Features
    Data Engineering
    Defining Model Baseline Performance
    Designing the DNN
        Testing the Model Performance
        Improving the Model
        Increasing the Number of Neurons
        Plotting the Loss Metric Across Epochs
        Testing the Model Manually
    Summary

Chapter 4: Deep Neural Networks for Supervised Learning: Classification
    Getting Started
    Problem Statement
        Designing the SCQ
        Designing the Solution
    Exploring the Data
    Data Engineering
    Defining Model Baseline Accuracy
    Designing the DNN for Classification
    Revisiting the Data
        Standardize, Normalize, or Scale the Data
        Transforming the Input Data
    DNNs for Classification with Improved Data
    Summary

Chapter 5: Tuning and Deploying Deep Neural Networks
    The Problem of Overfitting
    So, What Is Regularization?
        L1 Regularization
        L2 Regularization
        Dropout Regularization
    Hyperparameter Tuning
        Hyperparameters in DL
        Approaches for Hyperparameter Tuning
    Model Deployment
        Tailoring the Test Data
        Saving Models to Memory
        Retraining the Models with New Data
        Online Models
        Delivering Your Model As an API
        Putting All the Pieces of the Puzzle Together
    Summary

Chapter 6: The Path Ahead
    What’s Next for DL Expertise?
        CNN
        RNN
        CNN + RNN
    Why Do We Need GPU for DL?
    Other Hot Areas in DL (GAN)
    Concluding Thoughts

Index

About the Author

Jojo Moolayil is an artificial intelligence, deep learning, machine learning, and decision science professional and the author of the book Smarter Decisions: The Intersection of IoT and Decision Science (Packt, 2016). He has worked with industry leaders on several high-impact and critical data science and machine learning projects across multiple verticals. He is currently associated with Amazon Web Services as a Research Scientist–AI.

Jojo was born and raised in Pune, India and graduated from the University of Pune with a major in Information Technology Engineering. He started his career with Mu Sigma Inc., the world’s largest pure-play analytics provider, and worked with the leaders of many Fortune 50 clients. He later worked with Flutura, an IoT analytics startup, and GE, the pioneer and leader in industrial AI. He currently resides in Vancouver, BC. Apart from authoring books on deep learning, decision science, and IoT, Jojo has also been a technical reviewer for various books on the same subjects with Apress and Packt Publishing.
He is an active data science tutor and maintains a blog at http://blog.jojomoolayil.com.

Jojo’s personal website: www.jojomoolayil.com
Business email: mail@jojomoolayil.com

About the Technical Reviewer

Manohar Swamynathan is a data science practitioner and an avid programmer, with over 13 years of experience in various data science–related areas that include data warehousing, business intelligence (BI), analytical tool development, ad hoc analysis, predictive modeling, data science product development, consulting, formulating strategy, and executing analytics programs. He’s had a career covering the life cycles of data across different domains such as US mortgage banking, retail/e-commerce, insurance, and industrial IoT. He has a bachelor’s degree with a specialization in physics, mathematics, and computers, and a master’s degree in project management. He currently lives in Bengaluru, the Silicon Valley of India. He has authored the book Mastering Machine Learning with Python in Six Steps (Apress, 2017). You can learn more about his various other activities on his website, http://www.mswamynathan.com.

Acknowledgments

I would like to thank my parents, my brother Tijo, and my sister Josna for their constant support and love.

Introduction

This book is intended to gear the readers with a super-fast crash course on deep learning. Readers are expected to have basic programming skills in any modern-day language; Python experience would be great, but is not necessary. Given the limitations on the size and depth of the subject we can cover, this short guide is intended to equip you as a beginner with a sound understanding of the topic, including tangible practical experience in model development that will help develop a foundation in the deep learning domain. This guide is not recommended if you are already above the beginner level and are keen to explore advanced topics in deep learning like computer vision, speech recognition, and so on.
The topics of CNN, RNN, and modern unsupervised learning algorithms are beyond the scope of this guide. We provide only a brief introduction to these to keep the readers contextually aware of more advanced topics, and also provide recommended sources to explore these topics in more detail.

What will you learn from this guide?

The book is focused on a fast-paced approach to exploring practical deep learning concepts with math- and programming-friendly abstractions. You will learn to design, develop, train, validate, and deploy deep neural networks using the industry’s favorite Keras framework. You will also learn about the best practices for debugging and validating deep learning models and briefly learn about deploying and integrating deep learning as a service into a larger software service or product. Finally, with the experience gained in building deep learning models with Keras, you will also be able to extend the same principles into other popular frameworks.

Who is this book for?

The primary target audience for this book consists of software engineers and data engineers keen on exploring deep learning for a career move or an upcoming enterprise tech project. We understand the time crunch you may be under and the pain of assimilating new content to get started with the least amount of friction. Additionally, this book is for data science enthusiasts and academic and research professionals exploring deep learning as a tool for research and experiments.

What is the approach to learning in the book?

We follow the lazy programming approach in this guide. We start with a basic introduction, and then cater to the required context incrementally at each step. We discuss how each building block functions in a lucid way and then learn about the abstractions available to implement them.

How is the book structured?

The book is organized into three sections with two chapters each.
Section 1 equips you with all the necessary gear to get started on the fast-track ride into deep learning. Chapter 1 introduces the topic of deep learning, details its differences from similar fields, and explores the choices of frameworks for deep learning with a deeper look at the Keras ecosystem. Chapter 2 will help you get started with a hands-on exercise in Keras, understanding the basic building blocks of deep learning and developing the first basic DNN.

Section 2 embraces the fundamentals of deep learning in simple, lucid language while abstracting the math and complexities of model training and validation with the least amount of code, without compromising on flexibility, scale, and the required sophistication. Chapter 3 explores a business problem that can be solved by supervised learning algorithms with deep neural networks. We tackle one use case for regression and another for classification, leveraging popular Kaggle datasets. Chapter 4 delves into the craft of validating deep neural networks (i.e., measuring performance and understanding the shortcomings and the means to circumvent them).

Section 3 concludes the book with topics on further model improvement and the path forward. Chapter 5 discusses an interesting and challenging part of deep learning (i.e., hyperparameter tuning). Finally, Chapter 6, the conclusion, discusses the path ahead for the reader to further hone his or her skills in deep learning and discusses a few areas of active development and research in deep learning.

At the end of this crash course, the reader will have gained a thorough understanding of the deep learning principles within the shortest possible time frame and will have obtained practical hands-on experience in developing enterprise-grade deep learning solutions in Keras.
CHAPTER 1
An Introduction to Deep Learning and Keras

In this chapter, we will explore the field of deep learning (DL) with a brief introduction and then move on to look at the popular choices of available frameworks for DL development. We will also take a closer look at the Keras ecosystem to understand why it is special, and look at sample code to understand how easy the framework is for developing DL models. Let’s get started.

Introduction to DL

We’ll first start with a formal definition and then tackle a simple way of delineating the topic.

DL is a subfield of machine learning (ML) in artificial intelligence (AI) that deals with algorithms inspired from the biological structure and functioning of a brain to aid machines with intelligence.

Maybe this was too high level or probably difficult to consume, so let’s break it down step by step. We see three important terms in the definition, in a specific order: DL, ML, and AI. Let’s first tackle these buzzwords individually, starting with AI.

Demystifying the Buzzwords

AI in its most generic form can be defined as the quality of intelligence being introduced into machines. Machines are usually dumb, so to make them smarter we induce some sort of intelligence in them whereby they can take a decision independently. One example would be a washing machine that can decide on the right amount of water to use and on the required time for soaking, washing, and spinning; that is, it makes a decision when specific inputs are provided and therefore works in a smarter way. Similarly, an ATM could make a call on disbursing the amount you want with the right combination of notes available in the machine. This intelligence is technically induced in the machine in an artificial way, thus the name AI.
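A rule-based system like the washing machine above can be sketched in a few lines of plain Python. This is a purely hypothetical illustration; the thresholds and water quantities below are invented for the example and are not taken from any real appliance.

```python
# A hypothetical rule-based "intelligent" washing machine: every decision
# path is explicitly programmed by the engineer as if-else rules.
def washing_plan(load_kg, fabric):
    """Return (water_liters, soak_minutes) from hand-written rules."""
    if fabric == "delicate":
        water = load_kg * 12   # gentler cycles use more water per kg
        soak = 10
    elif load_kg > 6:
        water = load_kg * 10
        soak = 30              # heavy loads soak longer
    else:
        water = load_kg * 10
        soak = 20
    return water, soak

print(washing_plan(4, "cotton"))  # (40, 20)
```

Every decision path here had to be anticipated by the engineer in advance; the machine never learns anything beyond the rules it was given.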
Another point to note is that the intelligence here is explicitly programmed, say as a comprehensive list of if-else rules. The engineer who designed the system carefully thought through all the possible combinations and designed a rule-based system that can make decisions by traversing the defined rule path. What if we need to introduce intelligence into a machine without explicit programming, probably something where the machine can learn on its own? That’s when we touch base with ML.

Machine learning can be defined as the process of inducing intelligence into a system or machine without explicit programming. —Andrew Ng, Stanford Adjunct Professor

An example of ML could be a system that predicts whether a student will fail or pass a test by learning from historical test results and student attributes. Here, the system is not encoded with a comprehensive list of all possible rules that can decide whether a student will pass or fail; instead, the system learns on its own based on the patterns it finds in the historical data.

So, where does DL stand within this context? It happens that while ML works very well for a variety of problems, it fails to excel in some specific cases that seem to be very easy for humans: say, classifying an image as a cat or dog, distinguishing an audio clip as a male or female voice, and so on. ML performs poorly with images and other unstructured data types. Upon researching the reasons for this poor performance, an inspiration led to the idea of mimicking the human brain’s biological process, which is composed of billions of neurons connected and orchestrated to adapt to learning new things. On a parallel track, neural networks had already been a research topic for several years, but only limited progress had been made due to the computational and data limitations at the time.
When researchers reached the cusp of ML and neural networks, there came the field of DL, which was framed by developing deep neural networks (DNNs), that is, improved neural networks with many more layers. DL excelled at the new frontiers where ML was falling behind. In due course, additional research and experimentation led to the understanding that we could leverage DL for all ML tasks and expect better performance, provided there was surplus data availability. DL, therefore, became a ubiquitous field to solve predictive problems rather than just being confined to areas of computer vision, speech, and so on.

Today, we can leverage DL for almost all use cases that were earlier solved using ML and expect to outperform our previous achievements, provided that there is a surplus of data. This realization has led to distinguishing the order of the fields based on data. A new rule of thumb was established: ML would not be able to improve performance with increased training data after a certain threshold, whereas DL was able to leverage the surplus data more effectively for improved performance. The same was true a few years back in the debate between statistical models and ML. The following chart is an illustration of the overall idea of model performance with data size for the three aforementioned fields.

Now, if we revisit the formal definition, you can probably make better sense of the statement that DL, the ML subfield of AI, is inspired by the biological aspects of a human brain. We can simplify the three fields using a simple Venn diagram, as shown in the following.

Putting it all together, we can say that AI is the field of inducing intelligence into a machine or system artificially, with or without explicit programming. ML is a subfield of AI where intelligence is induced without explicit programming.
Lastly, DL is a field within ML where intelligence is induced into systems without explicit programming, using algorithms inspired by the biological functioning of the human brain.

What Are Some Classic Problems Solved by DL in Today's Market?

Today, we can see the adoption of DL in a variety of day-to-day aspects of our life in the digital world. If you are active on social media, you might have noticed Facebook suggesting tags for your friends when you upload a picture. Also note the self-driving mode in Tesla's cars, predictions of the next word in the messaging system on your iOS or Android phone, and Alexa, Siri, and Google Assistant responding to you like a human, and so on. If we analyze the types of use cases we can solve using DL, we can already witness the power of DL in almost any system you use in today's world.

Decomposing a DL Model

In its most basic form, a DL model is designed using a neural network architecture. A neural network is a hierarchical organization of neurons (similar to the neurons in the brain) with connections to other neurons. These neurons pass a message or signal to other neurons based on the received input and form a complex network that learns with some feedback mechanism. The following is a simplistic representation of a basic neural network.

As you can see in the preceding figure, the input data is consumed by the neurons in the first hidden layer, which then provide an output to the next layer, and so on, eventually resulting in the final output. Each layer can have one or many neurons, and each of them computes a small function (e.g., an activation function). The connection between two neurons of successive layers has an associated weight. The weight defines the influence of the input on the output for the next neuron and, eventually, on the overall final output.
In a neural network, the weights are all random at the start of model training, but they are updated iteratively so that the network learns to predict a correct output. Decomposing the network, we can define a few logical building blocks: neuron, layer, weight, input, output, an activation function inside the neuron to compute on the inputs, a learning procedure, and so on.

For an intuitive understanding, let's take the example of how a human brain learns to identify different people. When you meet a person for the second time, you will be able to identify them. How does this happen? People have a resemblance in overall structure: two eyes, two ears, a nose, lips, and so on. Everyone has the same structure, yet we are able to distinguish between people quite easily, right?

The nature of the learning process in the brain is quite intuitive. Rather than learning the structure of the face to identify people, the brain learns the deviation from a generic face (e.g., how different an individual's eyes are from the reference eyes), which can then be quantified as an electrical signal with a defined strength. Similarly, it learns deviations of all parts of the face from a reference base, combines these deviations into new dimensions, and finally gives an output. All of this happens so quickly that none of us realizes what our subconscious mind has actually done.

Similarly, the neural network showcased in the preceding illustration tries to mimic the same process using a mathematical approach. The input is consumed by neurons in the first layer, and an activation function is calculated within each neuron. Based on a simple rule, each neuron forwards an output to the neurons in the next layer, similar to the deviations learned by the human brain. The larger the output of a neuron, the larger the significance of that input dimension.
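To make this concrete, the following is a minimal sketch (in plain NumPy, not Keras) of the forward pass through the network pictured earlier: three inputs, hidden layers of five and four neurons, and two outputs. The weight values are random placeholders, and a sigmoid activation is assumed throughout purely for illustration.

```python
import numpy as np

def sigmoid(z):
    # Squashes the combined input into the 0-1 range
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Random placeholder weights: 3 inputs -> 5 neurons -> 4 neurons -> 2 outputs
W1, b1 = rng.standard_normal((3, 5)), np.zeros(5)
W2, b2 = rng.standard_normal((5, 4)), np.zeros(4)
W3, b3 = rng.standard_normal((4, 2)), np.zeros(2)

def forward(x):
    # Each layer computes a weighted sum of its inputs plus a bias,
    # then applies the activation function
    h1 = sigmoid(x @ W1 + b1)
    h2 = sigmoid(h1 @ W2 + b2)
    return sigmoid(h2 @ W3 + b3)

x = np.array([1.0, 0.5, -0.2])  # one input sample with 3 dimensions
print(forward(x).shape)  # (2,)
```

Training then consists of nudging those placeholder weights so the two output values move toward the correct answer, which is what the feedback mechanism discussed next achieves.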
These dimensions are then combined in the next layer to form additional new dimensions that we probably can't make sense of, but the system learns them intuitively. The process, multiplied several times over, develops a complex network with several connections.

Now that the structure of the neural network is understood, let's understand how the learning happens. When we provide the input data to the defined structure, the end output is a prediction, which could be either correct or incorrect. Based on the output, if we provide feedback to the network so it can adapt and make a better prediction, the system learns by updating the weights of the connections. To provide this feedback and define the next step for changing the weights in the correct direction, we use a beautiful mathematical algorithm called "backpropagation." Iterating the process step by step several times, with more and more data, helps the network update the weights appropriately, creating a system where it can make a decision for predicting output based on the rules it has created for itself through the weights and connections.

The name "deep neural networks" evolved from the use of many more hidden layers, making it a "deep" network that can learn more complex patterns. The success stories of DL have surfaced only in the last few years because the process of training a network is computationally heavy and needs large amounts of data. The experiments finally saw the light of day only when computing and data storage became more available and affordable.

Exploring the Popular DL Frameworks

Given that the adoption of DL has proceeded at a rapid pace, the maturity of the ecosystem has also shown phenomenal improvement. Thanks to many large tech organizations and open source initiatives, we now have a plethora of options to choose from.
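The feedback loop described above can be sketched in a few lines. The example below is a deliberately tiny stand-in for backpropagation, not the real algorithm: a single neuron with one weight, trained by gradient descent on a squared-error loss. The learning rate and the single training sample are made-up illustration values.

```python
# Toy illustration of learning by feedback: one weight, gradient descent
# on a squared-error loss. Real backpropagation applies the same idea
# through every layer of the network via the chain rule.
w = 0.0               # initial weight
lr = 0.1              # learning rate (illustration value)
x, target = 2.0, 1.0  # one made-up training sample

for step in range(50):
    prediction = w * x
    error = prediction - target
    gradient = 2 * error * x   # derivative of the loss w.r.t. the weight
    w -= lr * gradient         # feedback: nudge the weight downhill

print(round(w, 3))  # converges toward 0.5, since 0.5 * 2.0 == target
```

Each pass computes a prediction, measures how wrong it is, and uses the derivative of the loss to decide in which direction (and by how much) to adjust the weight; a full network repeats this for every weight on every iteration.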
Before we delve into the specifics of the various frameworks, let us understand why we would need a framework at all and what could be used as an alternative. Let's start by understanding how the software industry evolved with frameworks. If you observe the evolution of the software industry, you will see that today it is far easier to develop high-end software than it was a few years back. Credit for this goes to the available tools that have automated or abstracted complex problems in a way that's simple to use. The tech fraternity has been benevolent and innovative in contributing great ideas. We build new services on top of previous ones and ultimately create complex services capable of orchestrating a collection of services while being secure as well as scalable. Given the maturity of the software tools available today, we can afford to abstract away several complexities that happen in the background. These tools are nothing but building blocks for software systems. You technically don't need to start from scratch; you can instead rely on available, powerful tools that have matured significantly to take care of several software-building services.

Similarly, in DL there is a set of code blocks that can be reused for different types of use cases. The same algorithm with different parameter values can be used for a different use case, so why not package the algorithm as a simple function or class? Several aspects of DL have been developed as reusable code that can today be used directly from frameworks that do an excellent job of abstracting the idea. Building blocks in a DL model include neurons, activation functions, optimization algorithms, data augmentation tools, and so on. You could indeed develop a DNN from scratch, say in C++, Java, or Python, with ~1,000 lines of code, or instead use a framework and reuse the available tools with maybe 10–15 lines of code.
That being said, let's have a look at the popular DL frameworks used in the industry today.

Low-Level DL Frameworks

Given the level of abstraction a framework provides, we can classify it as a low-level or high-level DL framework. While this is by no means industry-recognized terminology, we can use this segregation for a more intuitive understanding of the frameworks. The following are a few of the popular low-level frameworks for DL.

Theano

Theano was one of the first DL libraries to gain popularity. It was developed by the Montreal Institute for Learning Algorithms (MILA) at the University of Montreal. Theano is an open source Python library that was made available in 2007; the last main release was published by MILA in late 2017. Additional details are available at
http://deeplearning.net/software/theano/
https://github.com/Theano/Theano/

Torch

Torch is another popular ML and DL framework, based on the Lua programming language. It was initially developed by Ronan Collobert, Koray Kavukcuoglu, and Clement Farabet but was later improved by Facebook with a set of extension modules released as open source software. Additional details are available at http://torch.ch/

PyTorch

PyTorch is an open source ML and DL library for Python developed by the Facebook AI research team. PyTorch has become more popular than Torch, since anyone with a basic understanding of Python can get started developing DL models, and it is far easier and more transparent to use for DL development. Additional details are available at https://pytorch.org/

MxNet

Pronounced "mix-net," MxNet stands for both "mix" and "maximize" and was developed by researchers from CMU, NYU, NUS, MIT, and others. The idea was to combine declarative and imperative programming (mix) to maximize efficiency and productivity.
It supports the use of multiple GPUs and is widely supported by major cloud providers like AWS and Azure. Additional details are available at https://mxnet.apache.org/

TensorFlow

TensorFlow is undoubtedly one of the most popular and widely used frameworks in the DL fraternity. It was developed and open sourced by Google and supports deployment across CPUs, GPUs, and mobile and edge devices as well. It was released in November 2015 and has since seen a huge increase in industry adoption. Additional details are available at www.tensorflow.org/

The list of DL frameworks is a long one, and discussing all of them is beyond the scope of this book. A few other popular frameworks you could additionally research are Caffe, Microsoft CNTK, Chainer, PaddlePaddle, and so on. Discussing the pros and cons of one framework over another is an interesting and never-ending debate; I would highly recommend that you explore and understand what improvements each framework has to offer. This would be a good starting point: https://blogs.technet.microsoft.com/machinelearning/2018/03/14/comparing-deep-learning-frameworks-a-rosetta-stone-approach/

High-Level DL Frameworks

The previously mentioned frameworks can be regarded as the first level of abstraction for DL models. You would still need to write fairly long code and scripts to get your DL model ready, although much less than using just Python or C++. The advantage of using the first level of abstraction is the flexibility it provides in designing a model. However, to simplify the development of DL models further, we have frameworks that work at a second level of abstraction; that is, rather than using the previously mentioned frameworks directly, we can use a new framework on top of an existing framework and thereby simplify DL model development even further.
The most popular high-level DL framework that provides this second level of abstraction is Keras. Other frameworks like Gluon, Lasagne, and so on are also available, but Keras has been the most widely adopted.

Note: While Gluon works on top of MxNet, and Lasagne on top of Theano, Keras can work on top of TensorFlow, Theano, MxNet, and Microsoft CNTK. The list has been expanding aggressively, and quite possibly by the time you read this book many more will have been added.

Keras is a high-level neural network API written in Python that can help you develop a fully functional DL model with fewer than 15 lines of code. Since it is written in Python, it has a large community of users and supporters and is extremely easy to get started with. The simplicity of Keras is that it helps users quickly develop DL models while still providing a ton of flexibility, even though it is a high-level API. This really makes Keras a special framework to work with. Moreover, given that it supports several other frameworks as a back end, it adds the flexibility to leverage a different low-level API for a different use case if required. By far the most widely adopted usage of Keras is with TensorFlow as a back end (i.e., Keras as a high-level DL API and TensorFlow as its low-level back end). In a nutshell, the code you write in Keras gets converted to TensorFlow, which then runs on a compute instance. You can read more about Keras and its recent developments here: https://keras.io/

A Sneak Peek into the Keras Framework

Now that we have an understanding of the different frameworks available for DL as well as the need to use one of them, we can take a sneak peek at why Keras has an unfair advantage in DL development before we conclude the chapter.
We will definitely take a deeper look at what Keras has to offer in the next chapter, but it is interesting to see the beauty of Keras in action before we end this chapter. Have a look at the DNN showcased in the following. Yes, this is the same figure we saw earlier while exploring the topic "Decomposing a DL Model." If we try to define the network, we can say that it is a DNN with two hidden layers of five and four neurons, respectively. The first hidden layer accepts input data that has three dimensions, and the final result is given by the output layer.

To make this more intuitive, we can assume that this is a simple DNN for a problem like predicting whether a student will pass or fail based on some input data. Say we have the age, the number of hours studied, and the average score out of 100 across all previous tests as the input data point. Building this neural network in Keras is as simple as the following script. It is absolutely fine not to understand the whole code at the moment; we will explore it step by step in more detail in the next chapter.

#Import required packages
from keras.models import Sequential
from keras.layers import Dense
import numpy as np

# Getting the data ready
# Generate dummy train data for 1000 students and dummy test data for 500
# Columns: Age, Hours of Study & Avg Previous test scores
np.random.seed(2018)  #Setting seed for reproducibility
train_data, test_data = np.random.random((1000, 3)), np.random.random((500, 3))

#Generate dummy results for 1000 students: Whether Passed (1) or Failed (0)
labels = np.random.randint(2, size=(1000, 1))

#Defining the model structure with the required layers,
#number of neurons, activation function and optimizer
model = Sequential()
model.add(Dense(5, input_dim=3, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

#Train the model
model.fit(train_data, labels, epochs=10, batch_size=32)

#Make predictions from the trained model
predictions = model.predict(test_data)

The preceding code can be divided into three sections.

Getting the Data Ready

Normally, we would spend some time with the data, importing and exploring the content and making the necessary augmentations to use it as the model's input. Here, since this is a dummy use case, we just use the random number generator in Python's numpy package to create a dummy training dataset for 1,000 students, a dummy test dataset for 500 students, and lastly the labels or actual outputs for the students (i.e., whether they passed or failed).

Defining the Model Structure

Once we have the data ready in the necessary format, we need to design the structure of the DNN. We define the number and types of layers, the number of neurons in each layer, the required activation functions, the optimizer to use, and a few other network attributes.

Training the Model and Making Predictions

Once the network is defined, we can use the training data with the correct outputs to train the network using the model's "fit" method. Finally, once the model is trained, we can use it to make predictions on the new test dataset.
I hope this example, though oversimplified, gives you a sense of how easy it is to use the Keras framework to develop DL models. If understanding the code was overwhelming at this point, that's absolutely fine; we will tackle the code step by step in detail in the next chapter.

Summary

In this chapter, we learned the basics of DL with a simple introduction and explored a few examples of common use cases that leverage DL in our day-to-day digital lives. We then studied the need for using a DL framework for developing models and explored a few low-level as well as high-level frameworks available in the industry. Finally, we looked at Keras, our preferred framework for this book, with a simple dummy example that demonstrates the simplicity of creating DL models. In the next chapter, we will take a deeper look at Keras and the various building blocks it offers, and we will try developing a simple DL model with hands-on exercises using Keras and Python.

CHAPTER 2 Keras in Action

In this chapter, we will explore the Keras framework and get started with hands-on exercises to learn the basics of Keras along with a bit of Python and the necessary DL topics. A word of caution, given that this is a fast-track guide: we will not have the scope to discuss DL topics exhaustively. Instead, we will start with a simple topic, explore the basic idea behind it, and add references where you can dive deeper for a more foundational understanding of the topic.

Setting Up the Environment

As discussed earlier, we will be developing DL models with the Keras stack using TensorFlow as a back end in Python. Hence, to get started, we need to set up our playground environment by installing Python, a few important Python packages, TensorFlow, and finally Keras. Let's get started.

Selecting the Python Version

Python is currently available in two major versions: 2.7.x and 3.x.
Although Python 3.x is the most recent version and the future of Python, there has been a series of conflicts in the developer community due to backward incompatibility in the transition from 2.7 to 3.x. Unfortunately, many developers are still tied to the Python 2.7.x version.

© Jojo Moolayil 2019. J. Moolayil, Learn Keras for Deep Neural Networks, https://doi.org/10.1007/978-1-4842-4240-7_2

However, for our use case, I highly recommend getting started with Python 3.x, given that it is the future. Some may be reluctant to start with Python 3, assuming there will be issues with many packages in the 3.x version, but for almost all practical use cases, all major DL, ML, and other useful packages have already been updated for 3.x.

Installing Python for Windows, Linux, or macOS

There are many distributions of Python available in the market. You could either download and install Python from the official python.org website or choose any popular distribution. For ML and DL, the most recommended distribution of Python is the Anaconda distribution from Continuum Analytics. Anaconda is a free and open source distribution of Python, built especially for ML and DL large-scale processing. It simplifies the entire package management and deployment process and comes with a very easy-to-use virtual environment manager and a couple of additional coding tools like Jupyter Notebooks and the Spyder IDE.

To get started with Anaconda, you can go to www.anaconda.com/download/ and select the appropriate version based on the OS (Mac/Windows/Linux) and architecture (32-bit/64-bit) of your choice. At the time of writing this book, the most recent version of Python 3 is 3.6. By the time you read this book, there might be a newer version available; you can comfortably download and install the most recent version of Anaconda Python. Once you have downloaded the installer, please install the application.
For Windows users, this is a simple executable file installation. Double-click the .exe file downloaded from Anaconda's website and follow the visual on-screen guidelines to complete the installation process. Linux users can use the following command after navigating to the downloaded folder:

bash Anaconda-latest-Linux-x86_64.sh

Mac users can install the software by double-clicking the downloaded .pkg file and then following the on-screen instructions. The Anaconda distribution of Python eases the process for DL and ML by installing all the major Python packages required.

Installing Keras and the TensorFlow Back End

Now that Python is set up, we need to install TensorFlow and Keras. Installing packages in Python can be done easily using pip, the package manager for Python. You can install any Python package with the command pip install package-name in the terminal or command prompt. So, let's install our required packages (i.e., TensorFlow and Keras):

pip install keras

followed by

pip install tensorflow

In case you face any issues setting up Anaconda Python with TensorFlow and Keras, or you want to experiment only within a Python virtual environment, you can explore a more detailed installation guide here: https://medium.com/@margaretmz/anaconda-jupyter-notebook-tensorflow-and-keras-b91f381405f8

Also, you might want to install TensorFlow with GPU support if your system has any NVIDIA CUDA–compatible GPUs. Here is a link to a step-by-step guide to installing TensorFlow with GPU support on Windows, Mac, and Linux: www.tensorflow.org/install/

To check whether your GPU is CUDA compatible, please explore the list available on NVIDIA's official website: https://developer.nvidia.com/cuda-gpus

To write code and develop models, you can choose the IDE provided by Anaconda (i.e., Spyder), the native terminal or command prompt, or a web-based notebook IDE called Jupyter Notebooks.
For all data science–related experiments, I highly recommend using Jupyter Notebooks for the convenience it provides in exploratory analysis and reproducibility. We will be using Jupyter Notebooks for all experiments in this book. Jupyter Notebooks comes preinstalled with Anaconda Python; in case you are using a virtual environment, you might have to install it using the package manager or simply the command

conda install jupyter

To start Jupyter Notebooks, you can use the Anaconda Navigator or just enter the command jupyter notebook in your command prompt or terminal; Jupyter should then start in your default browser on localhost. The following screenshot shows Jupyter running in the browser. Click the 'New' button at the extreme right and select Python from the drop-down menu. If you have installed one or more virtual environments, all of them will show up in the drop-down; please select the Python environment of your choice. Once selected, your Jupyter notebook should open, ready to get started. The following screenshot showcases a Jupyter notebook up and running in the browser.

The green highlighted cell is where you write your code, and Ctrl + Enter executes the selected cell. You can add more cells with the '+' icon in the control bar or explore additional options from the menu bar. If this is your first time with Jupyter, I recommend exploring the available options in the navigation menu. Now that we have all the required tools set up and running, let's start with simple DL building blocks in Keras.

Getting Started with DL in Keras

Let's start by studying the DNN and its logical components, understanding what each component is used for and how these building blocks are mapped in the Keras framework.
If you recall the topic "Decomposing a DL Model" from Chapter 1, we defined the logical components in a DNN as input data, neurons, activation functions, layers (i.e., groups of neurons), connections between neurons or edges, a learning procedure (i.e., the backpropagation algorithm), and the output layer. Let's look at these logical components one by one.

Input Data

Input data for a DL algorithm can be of a variety of types. Essentially, the model understands data as "tensors." Tensors are nothing but a generic form of vectors, or in computer engineering terms, a simple n-dimensional matrix. Data of any form is ultimately represented as a homogeneous numeric matrix. So, if the data is tabular, it will be a two-dimensional tensor where each row represents one training sample and each column represents one attribute of the samples. In the context of the student passing/failing the test example, one row would hold all the attributes of one student (his marks, age, etc.), and for m rows we would have a dataset with m training samples. This samples-in-rows layout, a tensor of shape (number of samples, number of features), is the convention Keras expects for tabular input.

Additionally, DL models can interpret only numeric data. If the dataset has any categorical data like "gender" with values of "male" and "female," we will need to convert them to one-hot encoded variables (i.e., simply representing the categories with columns of 0s and 1s, where 0 could represent "male" and 1 "female," or vice versa).

Image data also needs to be transformed into an n-dimensional tensor. We will not cover DL models for image data in this book, but I do want to make you aware of its representation as input data.
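Before moving on to image data, the one-hot conversion just described can be sketched with plain NumPy on a made-up "gender" column (Keras also ships a helper, keras.utils.to_categorical, for integer-coded labels):

```python
import numpy as np

# Hypothetical categorical column from a tabular dataset
gender = np.array(["male", "female", "female", "male"])

# Map each category to its own 0/1 column (one-hot encoding)
categories = ["male", "female"]
one_hot = (gender[:, None] == np.array(categories)).astype(int)

print(one_hot)
# Each row is one sample: [1, 0] means male, [0, 1] means female
```

The resulting columns are purely numeric and can be fed to the network in place of the original string column.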
An image is stored as a three-dimensional tensor, where two dimensions define the pixel values on a 2D plane and the third dimension holds the values for the RGB color channels. So essentially, one image becomes a three-dimensional tensor, and n images become a four-dimensional tensor, where the additional dimension stacks the 3D image tensors as training samples. Therefore, if we have 100 images with a 512 × 512-pixel resolution, they can be represented as a 4D tensor of shape 100 × 512 × 512 × 3, again with the samples dimension first, matching the layout Keras expects.

Lastly, it is good practice to normalize, standardize, or scale the input values before training. Normalizing the values brings all values in the input tensor into a 0–1 range, and standardization brings the values into a range where the mean is 0 and the standard deviation is 1. This helps reduce computation, improves learning by a great margin, and improves performance, as the activation functions (covered in the following) behave more appropriately.

Neuron

At the core of the DNN, we have neurons, where the computation for an output is executed. A neuron receives one or more inputs from the neurons in the previous layer; if the neurons are in the first hidden layer, they receive the data from the input data stream. In the biological neuron, an electric signal is given as output when the received input has a high enough influence. To map that functionality onto the mathematical neuron, we need a function that operates on the sum of the inputs multiplied by their corresponding weights (denoted as f(z) in the following visual) and responds with an appropriate value based on the input. If a higher-influence input is received, the output should be higher, and vice versa. It is in a way analogous to an activation signal (i.e., higher influence → activate, otherwise deactivate). The function that works on the computed input is called the activation function.
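The combined input just described can be written out in a couple of lines of NumPy. The input values and weights below are made-up illustration values, and a bias term is included as is commonly done in practice:

```python
import numpy as np

# Made-up inputs from the previous layer, with their connection weights
x = np.array([0.5, 0.8, 0.2])   # incoming values
w = np.array([0.4, -0.3, 0.9])  # one weight per connection
b = 0.1                          # bias term (a commonly added extra)

# Combined input: weighted sum of the inputs plus the bias
z = np.dot(w, x) + b
print(round(z, 2))  # 0.24
```

This z is exactly the quantity the activation function f(z) of the next section operates on.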
Activation Function

An activation function takes the combined input z shown in the preceding illustration, applies a function to it, and passes on the output value, thus trying to mimic the activate/deactivate behavior. The activation function, therefore, determines the state of a neuron by computing on the combined input.

A quick thought crossing your mind might be: why do we really need an activation function on the combined input z, when we could just pass the value of z as the final output? There are several problems here. Firstly, the range of the output value would then be −Infinity to +Infinity, leaving us with no clear way of defining a threshold where activation should happen. Secondly, the network would in a way be rendered useless, as it wouldn't really learn. This is where a bit of calculus and derivatives come into the picture. To simplify the story: if your activation function is a linear function (basically no activation), its derivative is a constant that carries no information about the input; this becomes a big issue because training with the backpropagation algorithm gives feedback to the network about wrong predictions and helps a neuron adjust its weights by using the derivative of the activation function. To put it another way, there would really be no point in having a DNN, as the output of a network with just one layer would be equivalent to that of one with n layers. To keep things simple, we always need a nonlinear activation function (at least in all hidden layers) for the network to learn properly.

There is a variety of choices available for the activation function. The most common ones are the sigmoid function and the ReLU (rectified linear unit).
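The claim that a purely linear network collapses into a single layer can be verified in a few lines of NumPy: stacking two layers without activations is mathematically identical to one layer whose weights are the product of the two.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two stacked layers with NO activation function (purely linear)
W1 = rng.standard_normal((3, 4))
W2 = rng.standard_normal((4, 2))

x = rng.standard_normal(3)

# Passing through both layers in sequence...
deep_output = (x @ W1) @ W2

# ...is identical to a single layer with the combined weight matrix W1 @ W2
single_output = x @ (W1 @ W2)

print(np.allclose(deep_output, single_output))  # True
```

This is why depth buys nothing without a nonlinearity between the layers.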
Sigmoid Activation Function

The sigmoid function is defined as f(z) = 1 / (1 + e^(−z)), which renders the output between 0 and 1, as shown in the following illustration. The nonlinear, s-shaped output improves the learning process very well, as it closely follows the principle—lower influence: lower output, higher influence: higher output—and also confines the output to the 0-to-1 range. In Keras, the sigmoid activation function is available as keras.activations.sigmoid(x). We can import it into Python simply with

from keras.activations import sigmoid

ReLU Activation Function

Similarly, the ReLU uses the function f(z) = max(0, z), which means that if the output is positive it outputs the same value, and otherwise it outputs 0. The function's output range is shown in the following visual. Keras provides ReLU as

keras.activations.relu(x, alpha=0.0, max_value=None)

The function may look linear, but it isn't. ReLU is a valid nonlinear function and in fact works really well as an activation function. It not only improves performance but also significantly reduces the number of computations during the training phase, a direct result of the 0 output when z is negative, which deactivates the neuron. But because of the horizontal line with 0 as the output, we can sometimes face serious issues. For instance, as discussed in the previous section, a horizontal line is a constant with a derivative of 0 and can therefore become a bottleneck during training, as the weights will not easily get updated. To circumvent this problem, a new activation function was proposed: Leaky ReLU, where for negative values the output is a slightly slanting line instead of a horizontal line, which helps the weights update effectively through backpropagation.
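The sigmoid, ReLU, and the leaky variant just introduced can be written out in a few lines of plain NumPy, just to see the numbers (in Keras you would use the built-in versions rather than these hand-rolled ones):

```python
import numpy as np

def sigmoid(z):
    # Squashes any real input into the 0-1 range
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Passes positive values through unchanged, outputs 0 otherwise
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.005):
    # Like ReLU, but negative inputs get a small slope instead of a flat 0
    return np.where(z > 0, z, alpha * z)

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z))     # [~0.12, 0.5, ~0.95]
print(relu(z))        # [0., 0., 3.]
print(leaky_relu(z))  # [-0.01, 0., 3.]
```

Note how the leaky variant keeps a small nonzero output (and hence a nonzero gradient) for negative inputs, which is exactly what lets the weights keep updating where plain ReLU would go silent.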
Leaky ReLU is defined as

    f(z) = z     when z > 0
    f(z) = αz    when z < 0

where α is a parameter defined as a small constant, say 0.005. Keras provides Leaky ReLU as a layer:

    keras.layers.LeakyReLU(alpha=0.3)

We can use it directly, setting the value of alpha to a small constant.

There are many more activation functions that can be used in a DNN and are available in Keras. A few other popular ones are tanh (hyperbolic tangent activation), swish activation, elu (exponential linear unit), selu (scaled ELU), and so on.

Model

The overall structure of a DNN is developed using the model object in Keras. It provides a simple way to create a stack of layers by adding new layers one after the other. The easiest way to define a model is with the sequential model, which allows easy creation of a linear stack of layers. The following example showcases the creation of a simple sequential model with one layer followed by an activation. The layer has 10 neurons, receives an input with 15 dimensions, and is activated with the ReLU activation function.

    from keras.models import Sequential
    from keras.layers import Dense, Activation

    model = Sequential()
    model.add(Dense(10, input_dim=15))
    model.add(Activation('relu'))

Layers

A layer in a DNN is defined as a group of neurons, or a logically separated group in a hierarchical network structure. As DL became more and more popular, several experiments were conducted with network architectures to improve performance for a variety of use cases. The use cases centered on regular supervised algorithms like classification and regression, computer vision experiments, extending DL to natural language processing and understanding, speech recognition, and combinations of different domains. To simplify the model development process, Keras provides several types of layers and various means to connect them.
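As a quick sketch of the definition above (in plain numpy, not the Keras layer itself), a Leaky ReLU can be written as:

```python
import numpy as np

def leaky_relu(z, alpha=0.005):
    # Positive inputs pass through unchanged; negative inputs are scaled
    # by a small constant alpha, so the slope (and gradient) is never 0.
    return np.where(z > 0, z, alpha * z)

print(leaky_relu(np.array([-100.0, 3.0])))  # [-0.5  3. ]
```

Unlike plain ReLU, a large negative input still produces a small nonzero output, which keeps the gradient alive during backpropagation.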
Discussing all of them would be beyond the scope of the book. However, we will take a close look at a few layers and also glance through some important layers for other advanced use cases, which you can explore later.

Core Layers

There are a few important layers that we will be using in most use cases.

Dense Layer

A dense layer is a regular DNN layer that connects every neuron in the defined layer to every neuron in the previous layer. For instance, if Layer 1 has 5 neurons and Layer 2 (a dense layer) has 3 neurons, the total number of connections between Layer 1 and Layer 2 would be 15 (5 × 3). Since it accommodates every possible connection between the layers, it is called a "dense" layer. Keras offers the dense layer with the following default parameters:

    keras.layers.Dense(units, activation=None, use_bias=True,
                       kernel_initializer='glorot_uniform',
                       bias_initializer='zeros', kernel_regularizer=None,
                       bias_regularizer=None, activity_regularizer=None,
                       kernel_constraint=None, bias_constraint=None)

It offers a lot of customization for any given layer. We can specify the number of units (i.e., neurons for the layer), the activation type, the type of initialization for the kernel and bias, and other constraints. Most often, we just use parameters like units and activation; the rest can be left at their defaults for simplicity. These additional parameters become important when we are working on specialized use cases where using specific types of constraints and initializers for a given layer is paramount.

We also need to define the input shape for the Keras layer. The input shape needs to be defined only for the first layer; subsequent layers just need the number of neurons defined. We can use the input_dim attribute to define how many dimensions the input has. For instance, if we have a table with 10 features and 1000 samples, we need to provide input_dim as 10 for the layer to understand the shape of the input data.
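The connection count described above is easy to verify by hand. The following sketch also counts the bias terms (one per neuron when use_bias=True), which is our addition, not something spelled out in the text:

```python
# Count the connections (weights) between two dense layers: every neuron
# in the new layer connects to every neuron in the previous layer,
# plus one bias term per neuron in the new layer.
prev_neurons = 5   # neurons in Layer 1
units = 3          # neurons in the dense Layer 2

connections = prev_neurons * units   # 15 weights, as in the text
parameters = connections + units     # 18 trainable parameters incl. biases
print(connections, parameters)
```

This matches the parameter counts reported by model.summary() for a dense layer.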
Example: a network with one hidden layer and an output layer for simple binary classification. Layer 1 has 5 neurons and expects an input with 10 features; therefore, input_dim=10. The final layer is the output, which has one neuron.

    model = Sequential()
    model.add(Dense(5, input_dim=10, activation="sigmoid"))
    model.add(Dense(1, activation="sigmoid"))

Dropout Layer

The dropout layer in DL helps reduce overfitting by introducing regularization and generalization capabilities into the model. In the literal sense, the dropout layer drops out a few neurons, or sets them to 0, and reduces computation in the training process. The process of arbitrarily dropping neurons works quite well in reducing overfitting. We will take up this topic in more depth and understand the rationale behind overfitting and model generalization in Chapter 5. Keras offers the dropout layer with the following default parameters:

    keras.layers.Dropout(rate, noise_shape=None, seed=None)

We add the dropout layer after a regular layer in the DL model architecture. The following code shows a sample:

    model = Sequential()
    model.add(Dense(5, input_dim=10, activation="sigmoid"))
    model.add(Dropout(rate=0.1, seed=100))
    model.add(Dense(1, activation="sigmoid"))

Other Important Layers

Considering the diversity of use cases, Keras has built-in layers for most of them. In computer vision use cases, the input is usually an image. There are special layers to extract features from images; they are called convolutional layers. Similarly, for natural language processing and similar use cases, there is an advanced type of DNN called the recurrent neural network (RNN), and Keras provides several different types of recurrent layers for its development. The list is quite long, and we won't cover the other advanced layers now.
However, in order to keep you updated, here are some of the other important layers in Keras that will be handy for advanced use cases in the future:

• Embedding layers - https://keras.io/layers/embeddings/
• Convolutional layers - https://keras.io/layers/convolutional/
• Pooling layers - https://keras.io/layers/pooling/
• Merge layers - https://keras.io/layers/merge/
• Recurrent layers - https://keras.io/layers/recurrent/
• Normalization layers, and many more - https://keras.io/layers/normalization/

You can also write your own layers in Keras for a different type of use case. More details can be explored here: https://keras.io/layers/writing-your-own-keras-layers/

The Loss Function

The loss function is the metric that helps a network understand whether it is learning in the right direction. To frame the loss function in simple words, consider it the test score you achieve in an examination. Say you appeared for several tests on the same subject: what metric would you use to understand your performance on each test? Obviously, the test score. Assume you scored 56, 60, 78, 90, and 96 out of 100 in five consecutive language tests. The improving test scores are a clear indication of how well you are performing. Had the test scores been decreasing, the verdict would be that your performance is decreasing and you would need to change your studying methods or materials to improve. Similarly, how does a network understand whether it is improving its learning process in each iteration? It uses the loss function, which is analogous to the test score. The loss function essentially measures the loss from the target. Say you are developing a model to predict whether a student will pass or fail, where the chance of passing or failing is expressed as a probability. So, 1 would indicate that the student will pass with 100% certainty, and 0 would indicate that the student will definitely fail.
The model learns from the data and predicts a score of 0.87 for the student to pass. So, the actual loss here would be 1.00 − 0.87 = 0.13. If the model repeats the exercise with some parameter updates in order to improve and the loss then rises to 0.40, it would understand that the changes it made are not helping the network learn appropriately. Alternatively, a new loss of 0.05 would indicate that the updates or changes from the learning are in the right direction.

Based on the type of outcome, we have several standard loss functions defined in ML and DL. For regression use cases (i.e., where the final prediction is a continuous number, like the marks scored by a student, the number of product units sold by a shop, the number of calls received from customers in a contact center, etc.), here are some popular loss functions:

• Mean Squared Error - the average squared difference between the actual and predicted values. Squaring the difference makes it easy to penalize the model more for a larger difference: a difference of 3 results in a loss of 9, but a difference of 9 returns a loss of 81.

  The mathematical equivalent would be

      MSE = (1/k) × Σ (Actual_n − Predicted_n)²,  summed over n = 1 … k

  Keras equivalent: keras.losses.mean_squared_error(y_actual, y_pred)

• Mean Absolute Error - the average absolute difference between the actual and predicted values.

  The mathematical equivalent would be

      MAE = (1/k) × Σ |Actual_n − Predicted_n|,  summed over n = 1 … k

  Keras equivalent: keras.losses.mean_absolute_error(y_actual, y_pred)

• Similarly, a few other variants are

  • MAPE (mean absolute percentage error): keras.losses.mean_absolute_percentage_error
  • MSLE (mean squared logarithmic error): keras.losses.mean_squared_logarithmic_error

For categorical outcomes, your prediction is for a class, like whether a student will pass (1) or fail (0), whether a customer will make a purchase or not, whether a customer will default on payment or not, and so on.
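As a quick numeric check, the two regression losses defined above can be computed by hand with numpy (the actual/predicted values here are made up for illustration):

```python
import numpy as np

actual = np.array([3.0, -0.5, 2.0, 7.0])
predicted = np.array([2.5, 0.0, 2.0, 8.0])

# Mean Squared Error: average of the squared differences
mse = np.mean((actual - predicted) ** 2)

# Mean Absolute Error: average of the absolute differences
mae = np.mean(np.abs(actual - predicted))

print(mse)  # 0.375
print(mae)  # 0.5
```

Note how the squaring in MSE weights the single difference of 1.0 four times as heavily as each difference of 0.5, exactly the penalization effect described above.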
Some use cases may have multiple classes as an outcome, like classifying types of disease (Type A, B, or C) or classifying images as cats, dogs, cars, horses, landscapes, and so on. In such cases, the losses defined in the preceding cannot be used, for obvious reasons. We need to quantify the outcome of each class as a probability and define losses based on the probability estimates as predictions. A few popular choices of losses for categorical outcomes in Keras are as follows:

• Binary crossentropy: defines the loss when the categorical outcome is a binary variable, that is, one with two possible outcomes: (Pass/Fail) or (Yes/No).

  The mathematical form would be

      Loss = − [ y × log(p) + (1 − y) × log(1 − p) ]

  Keras equivalent: keras.losses.binary_crossentropy(y_actual, y_predicted)

• Categorical crossentropy: defines the loss when the categorical outcome is nonbinary, that is, has more than two possible outcomes: (Yes/No/Maybe) or (Type 1/Type 2/…/Type n).

  The mathematical form would be

      Loss = − Σ y_i × log(ŷ_i),  summed over the n classes i,

  where y_i is the actual outcome for class i and ŷ_i is the predicted probability for that class.

  Keras equivalent: keras.losses.categorical_crossentropy(y_actual, y_predicted)

Optimizers

The most important part of model training is the optimizer. Up to this point, we have referred to the process of giving feedback to the model through an algorithm called backpropagation; this is actually an optimization algorithm. To add more context, imagine the model structure you have defined to classify whether a student will pass or fail. The structure created by defining the sequence of layers, with the number of neurons, the activation functions, and the input and output shapes, is initialized with random weights in the beginning. The weights, which determine the influence of a neuron on the next neuron or the final output, are updated by the network during the learning process. In a nutshell, a network with randomized weights and a defined structure is the starting point for a model.
The model can make a prediction at this point, but it would almost always be of no value. The network takes one training sample and uses its values as inputs to the neurons in the first layer, which then produces an output with the defined activation function. The output becomes an input for the next layer, and so on. The output of the final layer is the prediction for the training sample. This is where the loss function comes into the picture: it helps the network understand how well or poorly the current set of weights has performed on the training sample. The next step for the model is to reduce the loss. How does it know what updates it should make to the weights to reduce the loss? The optimizer function helps it understand this step. The optimizer function is a mathematical algorithm that uses derivatives, partial derivatives, and the chain rule from calculus to understand how much change the network will see in the loss function by making a small change in the weights of the neurons. The change in the loss function, an increase or decrease, helps determine the direction of the change required in the weight of each connection.

The computation of one training sample from the input layer to the output layer is called a pass. Usually, training is done in batches due to memory constraints in the system. A batch is a collection of training samples from the entire input. The network updates its weights after processing all samples in a batch; this is called an iteration (i.e., a successful pass over all samples in a batch followed by a weight update in the network). Processing all training samples provided in the input data, with batch-by-batch weight updates, is called an epoch.
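The single-sample story above, and the pass/batch/iteration/epoch bookkeeping, can be sketched in plain numpy. The tiny two-weight network and its numbers below are made up for illustration; this is not the Keras internals.

```python
import math
import numpy as np

# --- One forward pass of a single training sample through a tiny network ---
x = np.array([0.5, 1.5])        # one training sample with two features
w = np.array([0.8, 0.4])        # "randomly" initialized weights (fixed here)
b = 0.1                         # bias

z = np.dot(x, w) + b            # combined input to the output neuron
p = 1.0 / (1.0 + np.exp(-z))    # sigmoid activation -> predicted probability

# The loss function scores this prediction against the true label y = 1
y = 1.0
loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))  # binary crossentropy, ~0.287

# --- Batch / iteration / epoch bookkeeping ---
n_samples, batch_size, epochs = 500, 64, 3
iterations_per_epoch = math.ceil(n_samples / batch_size)  # 8 updates per epoch
total_updates = iterations_per_epoch * epochs             # 24 updates in total
print(iterations_per_epoch, total_updates)
```

Each of the 8 iterations per epoch is one weight update; the last batch of an epoch holds only the remaining 52 samples.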
In each iteration, the network leverages the optimizer function to make a small change to its weight parameters (which were randomly initialized at the beginning) to improve the final prediction by reducing the loss function. Step by step, over several iterations and then several epochs, the network updates its weights and learns to make correct predictions for the given training samples.

The mathematical explanation of the optimizer function's workings has been abstracted in a simple way here, so you can understand and appreciate the background operations that happen in a DNN during training. The in-depth math equations and the reasoning behind the optimization process are beyond the scope of this book. In case you are curious about the math and the actual workings of the optimization algorithms, I recommend reading the relevant chapter of Pro Deep Learning with TensorFlow by Santanu Pattanayak (Apress, 2017). The book does an amazing job of explaining the math behind DL with a very intuitive approach; I highly recommend it to all PhD students exploring DL.

Given that you now have a fair understanding of the overall optimization process, I would like to take a minute to discuss the various optimization algorithms available in Keras.

Stochastic Gradient Descent (SGD)

SGD performs an iteration with each training sample (i.e., after the pass of every training sample, it calculates the loss and updates the weights). Since the weights are updated very frequently, the overall loss curve is very noisy. However, the optimization is relatively fast compared to others. The formula for the weight update can be expressed in a simple way as follows:

    Weights = Weights − learning rate × gradient of the loss

where the learning rate is a parameter we define in the network architecture.
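To make the update rule concrete, here is a minimal gradient-descent loop on a one-weight model pred = w × x with a squared-error loss. This is a toy illustration of the idea; real SGD in Keras operates on all the network's weights at once.

```python
# One training sample: input x = 2.0 with target y = 4.0,
# so the ideal weight for the model pred = w * x is w = 2.0.
x, y = 2.0, 4.0
w = 0.0          # "randomly" initialized weight
lr = 0.1         # learning rate

for _ in range(50):
    pred = w * x
    grad = 2 * (pred - y) * x   # derivative of the loss (pred - y)^2 w.r.t. w
    w -= lr * grad              # Weights = Weights - learning rate * gradient

print(round(w, 4))  # 2.0 -- the loop converges to the ideal weight
```

Each repetition of the loop body plays the role of one iteration: compute a prediction, measure how the loss changes with the weight, and nudge the weight in the direction that reduces the loss.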
For example, with a learning rate of 0.01, Keras provides SGD as

    keras.optimizers.SGD(lr=0.01, momentum=0.0, decay=0.0, nesterov=False)

For updates with every training sample, we would need to use batch_size=1 in the model training function. To reduce the high fluctuations of SGD optimization, a better approach is to reduce the number of iterations by providing mini-batches, which enables averaging the loss over all samples in a batch and updating the weights at the end of the batch. This approach has been more successful and results in a smoother training process. Batch size is usually set in powers of 2 (i.e., 32, 64, 128, etc.).

Adam

Adam, which stands for Adaptive Moment Estimation, is by far the most popular and widely used optimizer in DL. In most cases, you can blindly choose the Adam optimizer and forget about the optimization alternatives. This optimization technique computes an adaptive learning rate for each parameter. It defines momentum and variance of the gradient of the loss and leverages their combined effect to update the weight parameters. The momentum and variance together help smooth the learning curve and effectively improve the learning process. The math representation can be simplified in the following way:

    Weights = Weights − (momentum and variance combined)

Keras provides the Adam optimizer as

    keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)

The parameters beta_1 and beta_2 are used in computing the momentum and variance, respectively. The default values work quite effectively and don't need to be changed for most use cases.

Other Important Optimizers

There are many other popular optimizers that can be used for different DL models. Discussing all of them would be beyond the scope of this book.
In the interest of keeping you well informed about the available options, here are a few of the other popular optimization alternatives available within Keras:

• Adagrad
• Adadelta
• RMSProp
• Adamax
• Nadam

Each of these optimization techniques has its own pros and cons. Major problems we often face in DL are the vanishing gradient and saddle point problems. You can explore these problems in more detail while choosing the best optimizer for your problem, but for most use cases Adam works fine.

Metrics

Similar to the loss function, we also define metrics for the model in Keras. In simple terms, metrics are the functions used to judge the performance of the model on a different, unseen dataset, also called the validation dataset. The only difference between metrics and the loss function is that the results from metrics are not used to train the model with respect to optimization; they are used only to validate and report test results. A few available options for metrics in Keras are as follows:

• Binary accuracy - keras.metrics.binary_accuracy
• Categorical accuracy - keras.metrics.categorical_accuracy
• Sparse categorical accuracy - keras.metrics.sparse_categorical_accuracy

You can also define custom functions for your model metrics. Keras provides the ability to easily configure a model with user-defined metrics.

Model Configuration

Now that we understand the most fundamental building blocks of a DNN in Keras, we can take a look at the final model configuration step, which orchestrates all the preceding components together. Once you have designed your network, Keras provides an easy one-step model configuration process with the compile command. To compile a model, we need to provide three parameters: an optimization function, a loss function, and a metric for the model to measure performance on the validation dataset.
The following example builds a DNN with two hidden layers, with 32 and 16 neurons, respectively, each with a ReLU activation function. The final output is for a binary categorical numeric outcome using a sigmoid activation. We compile the model with the Adam optimizer, define binary crossentropy as the loss function, and use "accuracy" as the metric for validation.

    from keras.models import Sequential
    from keras.layers import Dense, Activation

    model = Sequential()
    model.add(Dense(32, input_dim=10, activation="relu"))
    model.add(Dense(16, activation="relu"))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(optimizer='Adam', loss='binary_crossentropy', metrics=['accuracy'])

Model Training

Once we configure a model, we have all its required pieces ready. We can now go ahead and train the model with the data. While training, it is always good practice to provide a validation dataset so we can evaluate whether the model is performing as desired after each epoch. The model leverages the training data to train itself and learn the patterns, and at the end of each epoch it uses the unseen validation data to make predictions and compute metrics. Performance on the validation dataset is a good cue for overall performance.

For validation data, it is common practice to divide your available data into three parts with a 60:20:20 ratio: 60% for training, 20% for validation, and the last 20% for testing. This ratio is not a mandate; you have the flexibility to change it as you see fit. In general, when you have a really large training dataset, say more than 1 million samples, it is fine to take 95% for training, 2% for validation, and 3% for testing. Again, the ratio is a choice you make based on your judgment and the available data.

Keras provides a fit function on the model object for training with the provided training data. Here is a sample model invoking its fit method.
At this point, it is assumed that you have the model architecture defined and configured (compiled) as discussed in the preceding.

    model.fit(x_train, y_train, batch_size=64, epochs=3, validation_data=(x_val, y_val))

Here, a model is being trained on a training dataset named x_train with the actual labels in y_train. We choose a batch size of 64; therefore, if there were 500 training samples, the model would take in and process 64 samples at a time in a batch before updating the model weights. The last batch may have fewer than 64 samples if the total is not an exact multiple of the batch size. We have set the number of epochs to 3; therefore, the whole process of training 500 samples in batches of 64 is repeated three times. We have also provided the validation dataset as x_val and y_val. At the end of each epoch, the model uses the validation data to make predictions and compute the performance metrics as defined in the metrics parameter of the model configuration.

Now that we have all the pieces required for the model to be designed, configured, and trained, let's put all the pieces of the puzzle together and see it in action.
    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense, Activation

    # Generate dummy training dataset
    np.random.seed(2018)
    x_train = np.random.random((6000, 10))
    y_train = np.random.randint(2, size=(6000, 1))

    # Generate dummy validation dataset
    x_val = np.random.random((2000, 10))
    y_val = np.random.randint(2, size=(2000, 1))

    # Generate dummy test dataset
    x_test = np.random.random((2000, 10))
    y_test = np.random.randint(2, size=(2000, 1))

    # Define the model architecture
    model = Sequential()
    model.add(Dense(64, input_dim=10, activation="relu"))  # Layer 1
    model.add(Dense(32, activation="relu"))                # Layer 2
    model.add(Dense(16, activation="relu"))                # Layer 3
    model.add(Dense(8, activation="relu"))                 # Layer 4
    model.add(Dense(4, activation="relu"))                 # Layer 5
    model.add(Dense(1, activation="sigmoid"))              # Output Layer

    # Configure the model
    model.compile(optimizer='Adam', loss='binary_crossentropy', metrics=['accuracy'])

    # Train the model
    model.fit(x_train, y_train, batch_size=64, epochs=3, validation_data=(x_val, y_val))

While training, after every epoch the model prints the mean training loss and accuracy as well as the validation loss and accuracy. We can use these intermediate results to make a judgment on the model's performance. In most large DL use cases, we have many epochs of training. It is good practice to keep track of the model's performance with the metrics we have configured at intervals, checking the results after every few epochs. If the results don't seem to be in your favor, it might be a good idea to stop the training and revisit the model architecture and configuration.

Model Evaluation

In all of the preceding examples, we have either looked at a specific portion of the model development process or concluded with model training. We haven't discussed model performance so far.
Understanding how effectively your model performs on an unseen test dataset is of paramount importance. Keras equips the model object with a built-in evaluation method and another method to predict outcomes for a test dataset. Let's have a look at both of these using the trained model and the dummy test data generated in the preceding example. The evaluation method Keras provides for the sequential model is as follows:

    evaluate(x=None, y=None, batch_size=None, verbose=1, sample_weight=None, steps=None)

We provide the test data and the test labels in the parameters x and y. In cases where the test data is huge and expected to consume a significant amount of memory, you can use the batch size to tell the Keras model to make predictions batch-wise and then consolidate the results.

    print(model.evaluate(x_test, y_test))

    [0.6925005965232849, 0.521]

The evaluate method returns the loss value followed by all metrics defined in the model configuration. The metric labels are available in the model property metrics_names.

    print(model.metrics_names)

    ['loss', 'acc']

We can therefore see that the model has an overall accuracy of 52% on the test dataset. This is definitely not a good result, but it was expected given that we used just a dummy dataset. Alternatively, you could use the model's predict method and leverage the actual predictions, which are probabilities (for this use case, since it is binary classification):

    # Make predictions on the test dataset and print the first 10 predictions
    pred = model.predict(x_test)
    pred[:10]

This prints the first 10 predicted probabilities for the test samples. This output can be used to make even more refined final predictions. As a simple example, the model would use 0.5 as the threshold for the predictions: any predicted value above 0.5 is classified as 1 (say, Pass), and the rest as 0 (Fail).
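The thresholding step just described can be sketched with numpy; the probability values below are made up for illustration, standing in for the output of model.predict.

```python
import numpy as np

# Hypothetical predicted probabilities, e.g. from model.predict(x_test)
pred = np.array([0.12, 0.55, 0.87, 0.49, 0.73])

labels = (pred > 0.5).astype(int)   # default threshold of 0.5
print(labels)                       # [0 1 1 0 1]

strict = (pred > 0.6).astype(int)   # stricter threshold for predicting 1 (Pass)
print(strict)                       # [0 0 1 0 1]
```

Raising the threshold from 0.5 to 0.6 flips the borderline 0.55 prediction from Pass to Fail, which is exactly the kind of tweak discussed next.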
Depending on your use case, you might want to slightly tweak the threshold, choosing say 0.6 instead of 0.5 to be more conservative about predicting 1 (Pass), or vice versa.

Putting All the Building Blocks Together

I hope you can now make sense of the first DNN model we saw in the last section of Chapter 1. Before understanding all the basic building blocks, it would have been overwhelming to grasp the reasoning behind the code used in the model development. Now that we have all the basic necessary ingredients ready, let's look at a more tangible use case before we conclude this chapter. To do so, let's take a better dataset and see what things look like.

Keras also provides a few datasets to play with. These are real datasets and are usually used by most beginners during their initial experiments with ML and DL. For our experiment, let's select a popular Keras dataset for developing a model. We can start with the Boston House Prices dataset. It is taken from the StatLib library, which is maintained at Carnegie Mellon University. The data is hosted in an Amazon S3 bucket, and we can download it by using the simple Keras commands provided exclusively for the datasets.

    # Download the data using Keras; this will need an active internet connection
    from keras.datasets import boston_housing
    (x_train, y_train), (x_test, y_test) = boston_housing.load_data()

The dataset is downloaded directly into the Python environment and is ready to use. Let's have a look at what the data looks like. We will use basic Python commands to look at the type of data, its length and breadth, and a preview of the content.
    # Explore the data structure using basic Python commands
    print("Type of the Dataset:", type(y_train))
    print("Shape of training data :", x_train.shape)
    print("Shape of training labels :", y_train.shape)
    print("Shape of testing data :", type(x_test))
    print("Shape of testing labels :", y_test.shape)

Output:

    Type of the Dataset: <class 'numpy.ndarray'>
    Shape of training data : (404, 13)
    Shape of training labels : (404,)
    Shape of testing data : <class 'numpy.ndarray'>
    Shape of testing labels : (102,)

We can see that the training and test datasets are Python numpy arrays. Numpy is a Python library for handling large multidimensional arrays. We have 404 rows of data with 13 features in the training dataset and 102 rows with the same number of features in the test dataset. Overall, that is approximately an 80:20 split between train and test. We can further divide the 404 rows of training data into 300 for training and 104 for validation.

Alright, the data structure and its shape look great. Let's have a quick look at the contents of the dataset. The preceding code showed that we have 13 columns in the data. To understand the actual column names, we need to refer to the data dictionary provided by CMU. You can find more details about the dataset here: http://lib.stat.cmu.edu/datasets/boston

The descriptions of the features in the data are shown in the following list. The last row in the list refers to the label, the actual house price in our use case.

    Column Name    Description
    CRIM           per capita crime rate by town
    ZN             proportion of residential land zoned for lots over 25,000 sq. ft.
    INDUS          proportion of non-retail business acres per town
    CHAS           Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
    NOX            nitric oxide concentration (parts per 10 million)
    RM             average number of rooms per dwelling
    AGE            proportion of owner-occupied units built prior to 1940
    DIS            weighted distances to five Boston employment centers
    RAD            index of accessibility to radial highways
    TAX            full-value property tax rate per $10,000
    PTRATIO        pupil-teacher ratio by town
    B              1000(Bk − 0.63)^2, where Bk is the proportion of blacks by town
    LSTAT          % lower status of the population
    MEDV           median value of owner-occupied homes in $1000's

To look at the contents of the training dataset, we can use the index-slicing option provided by Python's numpy library for its n-dimensional arrays.

    x_train[:3,:]

Output:

    array([[1.23247e+00, 0.00000e+00, 8.14000e+00, 0.00000e+00, 5.38000e-01,
            6.14200e+00, 9.17000e+01, 3.97690e+00, 4.00000e+00, 3.07000e+02,
            2.10000e+01, 3.96900e+02, 1.87200e+01],
           [2.17700e-02, 8.25000e+01, 2.03000e+00, 0.00000e+00, 4.15000e-01,
            7.61000e+00, 1.57000e+01, 6.27000e+00, 2.00000e+00, 3.48000e+02,
            1.47000e+01, 3.95380e+02, 3.11000e+00],
           [4.89822e+00, 0.00000e+00, 1.81000e+01, 0.00000e+00, 6.31000e-01,
            4.97000e+00, 1.00000e+02, 1.33250e+00, 2.40000e+01, 6.66000e+02,
            2.02000e+01, 3.75520e+02, 3.26000e+00]])

All columns have numeric values, so there is no need for data transformation. Usually, once we have imported a dataset, we need to explore it extensively and will almost always clean, process, and augment it before we can start developing models. But for now, we will go ahead directly with a simple model and see what the results look like.

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense, Activation

    # Extract the last 104 rows of the training data to create the validation datasets
    x_val = x_train[300:,]
    y_val = y_train[300:,]

    # Define the model architecture
    model = Sequential()
    model.add(Dense(13, input_dim=13, kernel_initializer='normal', activation='relu'))
    model.add(Dense(6, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))

    # Compile model
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mean_absolute_percentage_error'])

    # Train the model
    model.fit(x_train, y_train, batch_size=32, epochs=3, validation_data=(x_val, y_val))

Output:

    Train on 404 samples, validate on 104 samples
    Epoch 1/3
    404/404 [==============================] - 2s 4ms/step - loss: 598.8595 - mean_absolute_percentage_error: 101.7889 - val_loss: 681.4912 - val_mean_absolute_percentage_error: 100.0789
    Epoch 2/3
    404/404 [==============================] - 0s 81us/step - loss: 583.6991 - mean_absolute_percentage_error: 99.7594 - val_loss: 674.8345 - val_mean_absolute_percentage_error: 99.2616
    Epoch 3/3
    404/404 [==============================] - 0s 94us/step - loss: 573.6101 - mean_absolute_percentage_error: 98.3180 - val_loss: 654.3787 - val_mean_absolute_percentage_error: 96.9662

We have created a simple two-hidden-layer model for this regression use case and chosen MAPE as the metric. Generally, this is not the best metric for studying model performance, but its advantage is simplicity in comprehending the results: it gives a simple percentage value for the error, say a 10% error. So, if you know the average range of your predictions, you can easily estimate what they are going to look like. Let's now use the evaluate function to study the results of the model.
results = model.evaluate(x_test, y_test)
for i in range(len(model.metrics_names)):
    print(model.metrics_names[i], " : ", results[i])

Output

102/102 [==============================] - 0s 87us/step
loss : 589.7658882889093
mean_absolute_percentage_error : 96.48218611174939

We can see that MAPE is around 96%, which is actually not a great number for model performance; it translates to model predictions that are off by around 96%. So, in general, if a house was priced at 10K, our model would have predicted ~20K. In DL, the model updates its weights after every iteration (i.e., every batch) and evaluates after every epoch. Since the individual updates are quite small, it usually takes a fairly high number of epochs for a generic model to learn appropriately. To test the performance once again, let's increase the number of epochs to 30 instead of 3. This increases the computation significantly and might take a while to execute, but since this is a fairly small dataset, training with 30 epochs should not be a problem; it should execute in ~1 min on your system.

#Train the model
model.fit(x_train, y_train, batch_size=32, epochs=30, validation_data=(x_val,y_val))

Output

Train on 404 samples, validate on 104 samples
Epoch 1/30
404/404 [==============================] - 0s 114us/step - loss: 536.6662 - mean_absolute_percentage_error: 93.4381 - val_loss: 580.3155 - val_mean_absolute_percentage_error: 88.6968
Epoch 2/30
404/404 [==============================] - 0s 143us/step - loss: 431.7025 - mean_absolute_percentage_error: 79.0697 - val_loss: 413.4064 - val_mean_absolute_percentage_error: 67.0769

Skipping the output for the in-between epochs.
(Showing output for only the last three epochs, i.e., 28 to 30.)

Epoch 28/30
404/404 [==============================] - 0s 111us/step - loss: 6.0758 - mean_absolute_percentage_error: 9.5185 - val_loss: 5.2524 - val_mean_absolute_percentage_error: 8.3853
Epoch 29/30
404/404 [==============================] - 0s 100us/step - loss: 6.2895 - mean_absolute_percentage_error: 10.1037 - val_loss: 6.0818 - val_mean_absolute_percentage_error: 8.9386
Epoch 30/30
404/404 [==============================] - 0s 111us/step - loss: 6.0761 - mean_absolute_percentage_error: 9.8201 - val_loss: 7.3844 - val_mean_absolute_percentage_error: 8.9812

If we take a closer look at the loss and MAPE for the validation dataset, we can see a significant improvement: MAPE has fallen from 96% in the previous run to about 9% now. Let's have a look at the test results.

results = model.evaluate(x_test, y_test)
for i in range(len(model.metrics_names)):
    print(model.metrics_names[i], " : ", results[i])

Output

102/102 [==============================] - 0s 92us/step
loss : 22.09559840782016
mean_absolute_percentage_error : 16.22196163850672

The results have improved significantly, but there still is a noticeable gap between the MAPE for the validation dataset and for the test dataset. As discussed earlier, this gap is an indicator that the model has overfit, or in simple terms, has overcomplicated the process of learning. We will look in detail at the steps to reduce overfitting in DNNs in the next chapter, with a bigger and better use case. For now, we have successfully explored Keras on a real dataset (though a small one) and applied our learnings about the building blocks of DL in Keras.

Summary

In this chapter, we explored Keras in depth, with hands-on exercises as well as contextual depth on the topics. We studied the basic building blocks of DL and their implementation in Keras, and looked at how we can combine the different building blocks in Keras to develop DNN models.
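As a closing note on this chapter's epoch experiments: rather than hand-picking 3 or 30 epochs, a common practice is early stopping — keep training while the validation loss improves and halt once it plateaus. Keras provides this via the EarlyStopping callback; the core idea is simple enough to sketch in plain Python (the loss values below are made up for illustration):

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training would stop: the first
    epoch after which the validation loss has failed to improve for
    `patience` consecutive epochs. Returns len(val_losses) otherwise."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)

# Made-up validation losses: improving quickly, then plateauing.
history = [680.0, 413.0, 150.0, 40.0, 9.0, 5.2, 6.0, 7.3, 6.8]
print(early_stop_epoch(history, patience=3))  # -> 9
```

In Keras itself, this corresponds roughly to passing keras.callbacks.EarlyStopping(monitor='val_loss', patience=3) via the callbacks argument of model.fit.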
In the next chapter, we will start exploring a real use case step by step: exploring, cleaning, extracting, and applying the necessary transformations to get the data ready for developing DL models.

CHAPTER 3
Deep Neural Networks for Supervised Learning: Regression

In Chapters 1 and 2, we explored the topic of DL and studied how DL evolved from ML to solve an interesting class of problems. We discussed the need for DL frameworks and briefly explored a few popular frameworks available in the market. We then studied why Keras is special, spent some time playing with the basic building blocks it provides for developing DNNs, and built a holistic intuition for a DL model. We then put together all our learnings from the practical exercises to develop a baby neural network for the Boston house prices use case. Now that we have a fair understanding of the different DL building blocks and the associated science, let's explore a practical DNN for a regression use case in this chapter.

© Jojo Moolayil 2019, J. Moolayil, Learn Keras for Deep Neural Networks, https://doi.org/10.1007/9781484242407_3

Getting Started

The evolution of AI as a field and the increasing number of researchers and practitioners in the field have created a mature and benevolent community. Today, it's fairly easy to access tools, research papers, datasets, and even infrastructure for practicing DL. For our first use case, we need a dataset and a business problem to get started. Here are a few popular choices.

• Kaggle: www.kaggle.com/
Kaggle is the world's largest community of data scientists and machine learners. It started off as an online ML competition forum and later evolved into a mature platform that is highly recommended for every individual in data science.
It still hosts ML competitions and also provides ML datasets, kernels (community-developed scripts for solving ML problems), ML jobs, and a platform to develop and execute ML models for the hosted competitions and public datasets.

• US Government Open Data: www.data.gov/
Provides access to thousands of datasets on agriculture, climate, finance, and so on.

• Indian Government Open Data: https://data.gov.in/
Provides open datasets for India's demography, education, economy, industries, and so on.

• Amazon Web Services Datasets: https://registry.opendata.aws/
Provides a few large datasets from NASA NEX and Openstreetmap, the Deutsche Bank public dataset, and so on.

• Google Dataset Search: https://toolbox.google.com/datasetsearch
This is relatively new and still in beta (at the writing of this book), but very promising. It provides access to thousands of public datasets for research experiments with a simple search query, aggregating datasets from several public dataset repositories.

• UCI ML Repository: https://archive.ics.uci.edu/ml/
Another popular repository for exploring datasets for ML and DL.

We will use the Kaggle public data repository to get the dataset for our DL use case: the Rossmann Store Sales dataset, available at www.kaggle.com/c/rossmann-store-sales/data. This was a very popular competition hosted a couple of years ago and has a fairly large dataset. You will need to register with Kaggle and accept the competition rules to be able to download the data. In case you have not already registered with Kaggle, I would highly recommend doing so; every data science professional should keep a close watch on Kaggle for its great learning, experimentation, and discussion platform. From the datasets, you need only train.csv and store.csv, which are around 38MB and 45KB, respectively. Please download the data and keep it ready in a separate folder.
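Once the two files are downloaded, a typical first step is to join the store-level attributes from store.csv onto the daily sales records in train.csv using their shared Store column. A minimal pandas sketch of that join — the tiny inline frames here stand in for the real CSV files, and the columns other than Store are illustrative stand-ins:

```python
import pandas as pd

# In practice: train = pd.read_csv("train.csv"); store = pd.read_csv("store.csv")
# Tiny stand-in frames with a few illustrative columns.
train = pd.DataFrame({
    "Store": [1, 2, 1, 2],
    "Date": ["2015-07-31", "2015-07-31", "2015-07-30", "2015-07-30"],
    "Sales": [5263, 6064, 5020, 6380],
})
store = pd.DataFrame({
    "Store": [1, 2],
    "StoreType": ["c", "a"],
})

# Left-join store attributes onto the daily sales records, keyed on Store.
df = train.merge(store, on="Store", how="left")
print(df.shape)  # -> (4, 4)
```

A left join keeps every daily sales row even if a store were missing from store.csv, which is usually the safer default for this kind of enrichment.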
Problem Statement

Rossmann is one of the largest drugstore chains in Germany, with operations across Europe. As of 2018, they had well over 3,900 stores in Europe, with an annual turnover of 9 billion euros. Our task is to predict the sales for a few identified stores on a given day.

Now, let's look at the problem from a pure business perspective. The first question you need to ask is: who is the end stakeholder for the business problem, and how are they going to utilize the solution? Given that this was an online data science competition, we won't have a validated answer for this question, but we can more or less figure out what one would look like. First, we need to reframe the problem statement in a slightly more strategic way, so that it can be represented as a design solution. There are several problem-solving frameworks recognized by the market that help define and represent a problem statement in a standard way, making it easier to solve the problem effectively. McKinsey's "Situation–Complication–Resolution" (SCR) and Mu Sigma Inc.'s "Situation Complication Question" (SCQ) are among the most popular frameworks. We will leverage one of these frameworks to represent our problem statement in a more effective and concise way. But let us first understand why this is important.

Why Is Representing a Problem Statement with a Design Principle Important?

Most large, complex problems need detailed design, peer reviews, validation of approach and strategy, a ton of brainstorming, and probably even a small proof of concept before getting started. Enterprise software development is a classic example: you would have a team defining the business requirements and documenting them for future reference, designing a high-level diagram followed by a low-level design, and eventually detailing the specifics of each software component and how the end solution would look.
At any point in time, if a new engineer joins the team to collaborate, the design documents, approach, and business requirements help them understand the larger picture without the need for individual discussions. The design and approach also help in the smooth execution of the overall objective.