Learning To Love Data Science

Learning to Love Data Science is a collection of reports that Mike Barlow wrote for O’Reilly Media in 2013, 2014, and 2015. The reports focused on topics generally associated with data science, machine learning, predictive analytics, and “big data,” a term that has largely fallen from favour. Since Mike is a journalist and not a scientist, he approached the reports from the perspective of a curious outsider.

The reports betray his sense of amused detachment, which is probably the right way to approach writing about a field like data science, and his ultimate faith in the value of technology, which seems unjustifiably optimistic. At any rate, the reports provide valuable snapshots, taken almost randomly, of a field whose scale, scope, and influence are growing steadily. Mike’s reports are like dispatches from a battlefield; they aren’t history, but they provide an exciting and reasonably accurate picture of life on the front lines.

With this book, you’ll learn how:

■ Big data is driving a new generation of predictive analytics, creating new products, new business models, and new markets

■ New analytics tools let businesses leap beyond data analysis and go straight to decision-making

■ Indie manufacturers are blurring the lines between hardware and software products

■ Companies are learning to balance their desire for rapid innovation with the need to tighten data security

■ Big data and predictive analytics are applied for social good, resulting in higher standards of living for millions of people

■ Advanced analytics and low-cost sensors are transforming equipment maintenance from a cost centre to a profit centre

Statistical Analysis with R

Although the field of statistics proceeds in a logical way, the author organized Statistical Analysis with R so that you can open it at any chapter and start reading. The idea is for you to find the information you’re looking for in a hurry and use it immediately, whether it’s a statistical concept or an R-related one. On the other hand, reading from cover to cover is okay if you’re so inclined. If you’re a statistics newbie and you have to use R to analyze data, the author recommends that you begin at the beginning.

You can start reading this book anywhere, but here are a couple of hints. Want to learn the foundations of statistics? Turn the page. Want to introduce yourself to R? That’s Chapter 2. Want to start with graphics? Hit Chapter 3. For anything else, find it in the table of contents or the index and go for it. In addition to the book itself, this product comes with a free access-anywhere Cheat Sheet that presents a selected list of R functions and describes what they do. To get this Cheat Sheet, type Statistical Analysis with R For Dummies Cheat Sheet in the search box on the publisher’s website.

R For Dummies

R For Dummies is an introduction to the statistical programming language known as R. The book starts by introducing the interface and works its way from the very basic concepts of the language through more sophisticated data manipulation and analysis. It illustrates every step with easy-to-follow examples and contains numerous code snippets, several write-it-yourself functions you can use later on, and complete analysis scripts. All of these are for you to try out yourself.

The book doesn’t attempt to give a technical description of how R is programmed internally, but it does focus as much on the why as on the how. R doesn’t function like your average scripting language, and it has plenty of unique features that may seem surprising at first. Instead of just telling you how you have to talk to R, the writers believe it’s important to explain how the R engine reads what you tell it to do. After reading this book, you should be able to manipulate your data into the form you want and understand how to use functions the book doesn’t cover (as well as the ones it does).

R For Dummies takes the steepness out of the learning curve for using R. It does not guarantee that you’ll be a guru once you have read it, but you should be able to do the following:

  • Perform data analysis by using a variety of powerful tools
  • Use the power of R to do statistical analysis and other data-processing tasks
  • Appreciate the beauty of using vector-based operations rather than loops to do speedy calculations
  • Appreciate the meaning of the following line of code: knowledge <- apply(theory, 1, sum)
  • Know how to find, download, and use code that has been contributed to R by its very active community of developers
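
For readers who have not met apply() yet: in R, apply(theory, 1, sum) calls sum on each row of the matrix theory (the 1 selects rows) and returns one total per row. The sketch below is a rough Python analogue rather than R code, and the matrix theory is a hypothetical stand-in, not data from the book.

```python
# Hypothetical stand-in for the R matrix `theory`: three rows, two columns.
theory = [
    [1, 2],
    [3, 4],
    [5, 6],
]

# Rough equivalent of R's apply(theory, 1, sum): one sum per row.
knowledge = [sum(row) for row in theory]

print(knowledge)
```

In R itself the same result comes from a single call with no explicit loop, which is the style the book encourages.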

Advanced Analytics with Power BI + R

Data is everywhere. The world contains an astronomical amount of data, an amount that grows larger each day. This vast collection of information has changed the way the world interacts, uncovered breakthroughs in medicine, and revealed new ways to understand trends in business and in our daily lives. With the increasing availability of data come new challenges and opportunities as business leaders seek to gain important insights and transform information into actionable and meaningful results. Even as data becomes more accessible, manipulating vast amounts of it to drive insights and make business decisions can be challenging.

Business leaders at every level need to become data literate and be able to understand data and analytical concepts that may have previously seemed out of reach, including statistical methods, machine learning, and data manipulation. With this spread of data literacy comes the powerful ability to make educated business decisions that rely on the smart use of data, rather than on an individual’s opinions. In the past, these tasks were extremely complex and would be handed off to engineers.

With the tools that exist today, business leaders are able to dive into their own analytics and uncover powerful insights. Microsoft Power BI brings advanced analytics to the daily business decision process, allowing users to extract valuable knowledge from data to solve business problems. This white paper will cover the advanced analytic capabilities of Power BI, including predictive analytics, data visualisations, R integration, and data analysis expressions.

Table of contents

Advanced analytics in Power BI
  • Predictive analytics with Azure
  • R integration
  • Quick Insights feature
Segmentation and cohort analysis
  • Data grouping and binning
Data streaming in Power BI
  • Real-time dashboards
  • Setup of real-time streaming data sets
Visualizations in Power BI
  • Community-sourced visualizations
  • R visualizations
  • Custom visualizations
Data connection and shaping
  • Azure services
  • Data fetching with the R connector
  • Data shaping in Power Query with R
Data Analysis Expressions
Conclusion

R for Everyone

With the increasing prevalence of data in our daily lives, new and better tools are needed to analyze the deluge. Traditionally there have been two ends of the spectrum: lightweight, individual analysis using tools like Excel or SPSS and heavy-duty, high-performance analysis built with C++ and the like. With the increasing strength of personal computers grew a middle ground that was both interactive and robust. Analysis done by an individual on his or her own computer in an exploratory fashion could quickly be transformed into something destined for a server, underpinning advanced business processes. This area is the domain of R, Python, and other scripted languages.

R, invented by Robert Gentleman and Ross Ihaka of the University of Auckland in 1993, grew out of S, which was invented by John Chambers at Bell Labs. It is a high-level language that was originally intended to be run interactively where the user runs a command, gets a result, and then runs another command. It has since evolved into a language that can also be embedded in systems and tackle complex problems. In addition to transforming and analyzing data, R can produce amazing graphics and reports with ease. It is now being used as a full stack for data analysis, extracting and transforming data, fitting models, drawing inferences and making predictions, and plotting and reporting results.

R’s popularity has skyrocketed since the late 2000s, as it has stepped out of academia and into banking, marketing, pharmaceuticals, politics, genomics and many other fields. Its new users are often shifting from low-level, compiled languages like C++, from other statistical packages such as SAS or SPSS, and from the 800-pound gorilla, Excel. This time period also saw a rapid surge in the number of add-on packages, libraries of prewritten code that extend R’s functionality. While R can sometimes be intimidating to beginners, especially those without programming experience, the author finds that programming an analysis, instead of pointing and clicking, soon becomes much easier, more convenient and more reliable. The book’s goal is to make that learning process easier and quicker.

R for Everyone lays out information in the way the author wishes he had been taught when learning R in graduate school. Coming full circle, the content of this book was developed in conjunction with the data science course that the author teaches at Columbia University. It is not meant to cover every minute detail of R, but rather the 20% of functionality needed to accomplish 80% of the work.

Microsoft Excel And Business Analysis For Professionals

Microsoft Excel is the world’s most-used business intelligence tool. Knowledge of it is even compulsory for an MBA degree, and the business world depends greatly on it. Microsoft Excel and Business Analysis for Professionals is intended for Sales Managers, Financial Analysts, Business Analysts, Data Analysts, MIS Analysts, HR Executives and frequent Excel users.

It is written by Michael Olafusi, a two-time Microsoft Excel MVP (Most Valuable Professional) and a full-time Microsoft Excel consultant. He is the founder of UrBizEdge, a business data analysis and Microsoft Excel consulting firm.

He has trained hundreds of business professionals on Microsoft Excel. He drew on the experience gained from interacting with them, both during training and while consulting for companies, to write this guide for the busy professional who needs the improved work productivity Microsoft Excel provides.

Microsoft Excel: It’s more powerful and easier to…
How Excel Handles What You Type
Data Consistency, starting with the end in view
Building Datasheets that can easily scale
Data Cleaning
Data Formatting
PivotTable and PivotChart
Business Data Analysis
Power Excel Formulas
Named Range, Goal Seek and Scenario Manager
Introduction To Excel VBA (macros)

Statistics In Python

In the era of big data and artificial intelligence, data science and machine learning have become essential in many fields of science and technology. A necessary aspect of working with data is describing, summarising, and visually representing it. Statistics in Python covers popular and widely used tools that will assist you in working with data.

There are many Python statistics libraries out there for you to work with, but in this book, you’ll be learning about some of the most popular and widely used ones:

  • Python’s statistics module is a built-in library for descriptive statistics. You can use it if your datasets are not too large or if you can’t rely on importing other libraries.
  • NumPy is a third-party library for numerical computing, optimized for working with single- and multi-dimensional arrays. Its primary type is the array type called ndarray. This library contains many routines for statistical analysis.
  • SciPy is a third-party library for scientific computing based on NumPy. It offers additional functionality compared to NumPy, including scipy.stats for statistical analysis.
  • Pandas is a third-party library for numerical computing based on NumPy. It excels in handling labelled one-dimensional (1D) data with Series objects and two-dimensional (2D) data with DataFrame objects.
  • Matplotlib is a third-party library for data visualization. It works well in combination with NumPy, SciPy, and Pandas.
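
As a small illustration of the first library in the list, the sketch below uses only the built-in statistics module to describe a dataset. The sample values are invented for this illustration and do not come from the book.

```python
import statistics

# Hypothetical sample data, invented for this illustration.
values = [2.5, 3.0, 4.5, 4.5, 6.0, 9.5]

# A few common descriptive statistics from the standard library.
summary = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "mode": statistics.mode(values),
    "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
}

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

For larger or multi-dimensional datasets, the third-party libraries in the list above are usually the more practical choice.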

Data Science Interview Questions and Answers

164 Data Science Interview Questions and Answers will help you master the art of interviewing for a data science position, from job-specific technical questions to tricky behavioural inquiries and unexpected brainteasers and guesstimates. This book will prepare you for any job candidacy in the field: data scientist, data analyst, BI analyst, data engineer or data architect.

Its goal is to teach by example – not only by giving you a list of interview questions and their answers but also by sharing the techniques and thought processes behind each question and the expected answer. Once you read it, you’ll have all the knowledge and tools to succeed during the data science interview.

How should you use this book for best results? Give yourself enough time to work through the questions. This way, you’ll really understand what they are asking and what information you should highlight for the best response. If studied well, this book will enhance both your technical and communication skills.

Regression Models For Data Science In R

This book is designed as a companion to the Regression Models Coursera class as part of the Data Science Specialization, a ten-course program offered by three faculty, Jeff Leek, Roger Peng and Brian Caffo, at the Johns Hopkins University Department of Biostatistics. The videos associated with this book can be watched in full here, though the relevant links to specific videos are placed at the appropriate locations throughout. Before beginning, we assume that you have a working knowledge of the R programming language.

If not, there is a wonderful Coursera class by Roger Peng that can be found here. In addition, students should know the basics of frequentist statistical inference. There is a Coursera class here and a LeanPub book here. The entirety of the book is on GitHub here. Please submit pull requests if you find errata! In addition, the course notes can also be found on GitHub here. While most code is in the book, all of the code for every figure and analysis in the book is in the R markdown files (.Rmd) for the respective lectures.

Finally, we should mention swirl (statistics with interactive R programming). swirl is an intelligent tutoring system developed by Nick Carchedi, with contributions by Sean Kross and Bill and Gina Croft. It offers a way to learn R in R. Download swirl here. There’s a swirl module for this course! Try it out; it’s probably the most effective way to learn.


The Field Guide to Data Science

The Field Guide to Data Science is a textbook for students who love data science. Its writers have a deep understanding of the concepts at the heart of Data Science. Data is the byproduct of our new digital existence. Recorded bits of data, from mundane traffic cameras to telescopes peering into the depths of space, are propelling us into the greatest age of discovery our species has ever known. Every aspect of our lives, from life-saving disease treatments to national security, to economic stability and even the convenience of selecting a restaurant, can be improved by creating better data analytics through Data Science.

The Field Guide to Data Science provides Booz Allen’s perspective on the complex and sometimes mysterious field of Data Science. The writers acknowledge that they cannot capture all that is Data Science, nor can they keep up: the pace at which this field progresses outdates work as fast as it is produced. As a result, they have opened this field guide to the world as a living document to bend and grow with technology, expertise, and evolving techniques. If you find the guide to be useful, neat, or even lacking, then they encourage you to add your expertise, including:
› Case studies from which you have learned
› Citations for journal articles or papers that inspire you
› Algorithms and techniques that you love
› Your thoughts and comments on other people’s additions
