Most Useful R Functions You Might Not Know

Almost every R user knows about popular packages like dplyr and ggplot2. But with 10,000+ packages on CRAN and yet more on GitHub, it’s not always easy to unearth libraries with great R functions. Here are the ten most useful R functions that you might not know and that make my life easier when working in R. If you already know them all, sorry for wasting your reading time, and please consider adding a comment with something else you find useful for the benefit of other readers.


1. RStudio shortcut keys

This is less an R hack and more about the RStudio IDE, but the shortcut keys available for common commands are super useful and can save a lot of typing time. My two favourites are Ctrl+Shift+M for the pipe operator %>% and Alt+- for the assignment operator <-. If you want to see the full set of these awesome shortcuts, just type Alt+Shift+K in RStudio.

2. Automate tidyverse styling with styler

It’s been a tough day and you’ve had a lot on your plate. Your code isn’t as neat as you’d like and you don’t have time to line edit it. Fear not. The styler package has numerous functions that allow automatic restyling of your code to match tidyverse style. It’s as simple as running styler::style_file() on your messy script, and it will do a lot (though not all) of the work for you.
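
If you just want a quick feel for what it does, styler can also restyle a string of code passed to styler::style_text(). A minimal sketch, where the messy snippet is just an invented example:

# restyle an ad-hoc snippet rather than a whole file
styler::style_text("my_result<-c( 1,2 ,3)%>%sum()")
# should print something like: my_result <- c(1, 2, 3) %>% sum()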

3. The switch() function

I LOVE switch(). It’s basically a convenient shortening of an if statement that chooses its value according to the value of another variable. I find it particularly useful when I am writing code that needs to load a different dataset according to a prior choice. For example, if you have a variable called animal and you want to load a different set of data according to whether animal is a dog, cat or rabbit, you might write this:

data <- read.csv(
  switch(animal, 
         "dog" = "dogdata.csv", 
         "cat" = "catdata.csv",
         "rabbit" = "rabbitdata.csv")
)

4. k-means on long data

k-means is an increasingly popular statistical method to cluster observations in data, often to simplify a large number of data points into a smaller number of clusters or archetypes. The kml package now allows k-means clustering to take place on longitudinal data, where the ‘data points’ are actually data series. This is super useful where the data points you are studying are actually readings over time. This could be the clinical observation of weight gain or loss in hospital patients or compensation trajectories of employees.

kml works by first transforming the data into an object of class ClusterLongData using the cld function. It then partitions the data using a ‘hill climbing’ algorithm, testing several values of k 20 times each. Finally, the choice() function allows you to view the results of the algorithm for each k graphically and decide what you believe to be the optimal clustering.
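
As a rough sketch of that workflow, assuming a hypothetical wide-format matrix called weights with one row per patient and one column per weekly weight reading (the object name and layout here are invented for illustration):

library(kml)

# 'weights' is a hypothetical matrix: one row per patient,
# one column per weekly weight reading
weight_cld <- cld(weights)

# partition the data, testing several values of k 20 times each
kml(weight_cld, nbRedrawing = 20)

# view the results for each k graphically and pick a clustering
choice(weight_cld)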

5. Text searching

If you’ve been using regular expressions to search for text that starts or ends with a certain character string, there’s an easier way. “startsWith() and endsWith() — did I really not know these?” tweeted data scientist Jonathan Carroll. “That’s it, I’m sitting down and reading through dox for every #rstats function.”
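
Both functions are in base R, so there’s nothing extra to install. For example, reusing the animal file names from earlier:

startsWith("dogdata.csv", "dog")    # TRUE
endsWith("dogdata.csv", ".csv")     # TRUE
endsWith("dogdata.csv", ".xlsx")    # FALSE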

6. The req and validate functions in R Shiny

R Shiny development can be frustrating, especially when you get generic error messages that don’t help you understand what is going wrong under the hood. As Shiny develops, more and more validation and testing functions are being added to help better diagnose and alert when specific errors occur. The req() function allows you to prevent an action from occurring unless another variable is present in the environment, but does so silently and without displaying an error. So you can make the display of UI elements conditional on previous actions. For example:


output$go_button <- shiny::renderUI({
  # only display button if an animal input has been chosen
  
  shiny::req(input$animal)
  # display button
  shiny::actionButton("go", 
                      paste("Conduct", input$animal, "analysis!") 
  )
})

validate() runs checks before an output renders and lets you return a tailored error message if a certain condition is not fulfilled, for example if the user uploaded the wrong file:

# get the csv input file and read it in
inFile <- input$file1
data <- read.csv(inFile$datapath)
# render the table only if it is dogs
shiny::renderTable({
  # check that it is the dog file, not cats or rabbits
  shiny::validate(
    shiny::need("Dog Name" %in% colnames(data),
                "Dog Name column not found - did you load the right file?")
  )
  data
})

7. revealjs

revealjs is a package that allows you to create beautiful presentations in HTML, with an intuitive slide navigation menu and embedded R code. It can be used inside R Markdown and has very intuitive HTML shortcuts to allow you to create a nested, logical structure of pretty slides with a variety of styling options. The fact that the presentation is in HTML means that people can follow along on their tablets or phones as they listen to you speak, which is really handy. You can set up a revealjs presentation by installing the package and then calling it in your YAML header. Here’s an example YAML header from a talk I gave recently using revealjs:

---
title: "Exporing the Edge of the People Analytics Universe"
author: "Keith McNulty"
output:
  revealjs::revealjs_presentation:
    center: yes
    template: starwars.html
    theme: black
date: "HR Analytics Meetup London - 18 March, 2019"
resource_files:
- darth.png
- deathstar.png
- hanchewy.png
- millenium.png
- r2d2-threepio.png
- starwars.html
- starwars.png
- stormtrooper.png
---

8. Datatables in RMarkdown or Shiny using DT

The DT package is an interface from R to the DataTables JavaScript library. It allows very easy display of tables within a Shiny app or R Markdown document, with a lot of built-in functionality and responsiveness. It saves you from having to code separate data download functions, gives the user flexibility around the presentation and ordering of the data, and has a data search capability built in. For example, a command as simple as this creates a fully featured interactive table:

DT::datatable(
  head(iris),
  caption = 'Table 1: This is a simple caption for the table.'
)

9. Pimp your RMarkdown with prettydoc

prettydoc is a package by Yixuan Qiu which offers a simple set of themes to create a different, prettier look and feel for your RMarkdown documents. This is super helpful when you just want to jazz up your documents a little but don’t have time to get into the styling of them yourself. It’s really easy to use. Simple edits to the YAML header of your document can invoke a specific style theme throughout the document, with numerous themes available. For example, this will invoke a lovely clean blue colouring and style across titles, tables, embedded code and graphics:

---
title: "My doc"
author: "Me"
date: June 3, 2019
output:
  prettydoc::html_pretty:
    theme: architect
    highlight: github
---

10. Get minimum and maximum values with a single command

Speaking of useful R functions you might not know, how could I leave out finding the minimum and maximum values in a vector with a single call? Base R’s range() function does just that, returning a two-value vector with the lowest and highest values. The help file says range() works on numeric and character values, but I’ve also had success using it with date objects.
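
For example, with a made-up numeric vector and a few dates:

x <- c(13, 2, 7, 42, 5)
range(x)
# [1]  2 42

range(as.Date(c("2019-06-03", "2018-01-01", "2018-11-30")))
# [1] "2018-01-01" "2019-06-03"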

Learning To Love Data Science

Learning to love data science is a collection of reports that Mike Barlow wrote for O’Reilly Media in 2013, 2014, and 2015. The reports focused on topics generally associated with data science, machine learning, predictive analytics, and “big data,” a term that has largely fallen from favour. Since Mike is a journalist and not a scientist, he approached the reports from the perspective of a curious outsider.

The reports betray his sense of amused detachment, which is probably the right way to approach writing about a field like data science, and his ultimate faith in the value of technology, which seems unjustifiably optimistic. At any rate, the reports provide valuable snapshots, taken almost randomly, of a field whose scale, scope, and influence are growing steadily. Mike’s reports are like dispatches from a battlefield; they aren’t history, but they provide an exciting and reasonably accurate picture of life on the front lines.


With this book, you’ll learn how:

■ Big data is driving a new generation of predictive analytics, creating new products, new business models, and new markets

■ New analytics tools let businesses leap beyond data analysis and go straight to decision-making

■ Indie manufacturers are blurring the lines between hardware and software products

■ Companies are learning to balance their desire for rapid innovation with the need to tighten data security

■ Big data and predictive analytics are applied for social good, resulting in higher standards of living for millions of people

■ Advanced analytics and low-cost sensors are transforming equipment maintenance from a cost centre to a profit centre

Statistical Analysis with R

Although the field of statistics proceeds in a logical way, the writer organized Statistical Analysis with R so that you can open it up at any chapter and start reading. The idea is for you to find the information you’re looking for in a hurry and use it immediately, whether it’s a statistical concept or an R-related one. On the other hand, reading from cover to cover is okay if you’re so inclined. If you’re a statistics newbie and you have to use R to analyze data, the writer recommends that you begin at the beginning.

You can start reading this book anywhere, but here are a couple of hints. Want to learn the foundations of statistics? Turn the page. Introduce yourself to R? That’s Chapter 2. Want to start with graphics? Hit Chapter 3. For anything else, find it in the table of contents or the index and go for it. In addition to what you’re reading right now, this product comes with a free access-anywhere Cheat Sheet that presents a selected list of R functions and describes what they do. To get this Cheat Sheet, visit www.dummies.com and type Statistical Analysis with R For Dummies Cheat Sheet in the search box.


Solving System of Equations in R With Example

In this article, we will discuss solving a system of equations in the R programming language. The solve() function in R is used to solve the equations. Here the equation is of the form a * x = b, where a is a coefficient matrix, b is a vector or matrix, and x is the variable whose value is to be calculated.

Syntax: solve(a, b)

Parameters:

  • a: coefficients of the equation
  • b: vector or matrix of the equation

Example 1: Solving a system of three equations

Given Equations:
x + 2y + 3z = 20  
2x + 2y + 3z = 100  
3x + 2y + 8z = 200

Matrices A and B, built from the coefficients of the equations:
A->
1   2   3
2   2   3
3   2   8
B->
20
100
200

To solve this using two matrices in R we use the following code:

# create matrix A and B using given equations
A <- rbind(c(1, 2, 3),
           c(2, 2, 3),
           c(3, 2, 8))
B <- c(20, 100, 200)

# Solve them using solve function in R
solve(A, B)

Output:

80 -36 3.99999999999999

Example 2: Solving a system of three equations with fractional solutions

To get the solutions in the form of fractions, we use the MASS package and wrap the solve() call in fractions().

Given Equations:
19x + 32y + 31z = 1110
22x + 28y + 31z = 1406
31x + 12y + 81z = 3040

Matrices A and B, built from the coefficients of the equations:
A->
19   32   31
22   28   31
31   12   81
B->
1110
1406
3040

To solve this using two matrices in R we use the following code:

# Load package MASS
library(MASS)

# create matrix A and B using given equations
A <- rbind(c(19, 32, 31),
           c(22, 28, 31),
           c(31, 12, 81))
B <- c(1110, 1406, 3040)

# Solve them using solve
# function wrapped in fractions
fractions(solve(A, B))

Output:

[1] 159950/2243 -92039/4486  29784/2243

This means x = 159950/2243, y = -92039/4486 and z = 29784/2243 is the solution of the above linear system.

Example 3: Finding the inverse of a matrix

# create a 2 x 2 matrix A
A <- matrix(c(4, 7, 3, 6), ncol = 2)
print(A)

print("Inverse matrix")

# calling solve() with a single argument returns the inverse of A
print(solve(A))

Output:

     [,1] [,2]
[1,]    4    3
[2,]    7    6
[1] "Inverse matrix"
          [,1]      [,2]
[1,]  2.000000 -1.000000
[2,] -2.333333  1.333333

Related post: Solving a System of Equations in Pure Python without Numpy or Scipy

Solving a System of Equations in Pure Python without Numpy or Scipy

Solving a System of Equations in Pure Python without Numpy or Scipy covers a system of equations from the math through to complete code, and it’s very closely related to the matrix inversion post. There are times we’d want an inverse matrix of a system for repeated uses of solving for X, but most of the time we simply need a single solution of X for a system of equations, and there is a method that allows us to solve directly for X without needing to know the inverse of the system matrix. We’ll use Python again, and even though the code is similar, it is a bit different. So there’s a separate GitHub repository for this project.

Also, we know that numpy or scipy or sklearn modules could be used, but we want to see how to solve for X in a system of equations without using any of them, because this post, like most posts on this site, is about understanding the principles from math to complete code. However, near the end of the post, there is a section that shows how to solve for X in a system of equations using numpy / scipy. Remember too, try to develop the code on your own with as little help from the post as possible, and use the post to compare to your math and approach. However, just working through the post and making sure you understand the steps thoroughly is also a great thing to do.

An Introduction to Statistical Learning with Applications in R

An Introduction to Statistical Learning with Applications in R is intended for anyone who is interested in using modern statistical methods for modelling and prediction from data. This group includes scientists, engineers, data analysts, data scientists, and quants, but also less technical individuals with degrees in non-quantitative fields such as the social sciences or business.

The authors expect that the reader will have had at least one elementary course in statistics. A background in linear regression is also useful, though not required, since the book reviews the key concepts behind linear regression in Chapter 3. The mathematical level of this book is modest, and detailed knowledge of matrix operations is not required. This book provides an introduction to the statistical programming language R. Previous exposure to a programming language, such as MATLAB or Python, is useful but not required.

The first edition of this book has been used to teach masters and PhD students in business, economics, computer science, biology, earth sciences, psychology, and many other areas of the physical and social sciences. It has also been used to teach advanced undergraduates who have already taken a course on linear regression. In the context of a more mathematically rigorous course in which ESL serves as the primary textbook, ISL could be used as a supplementary text for teaching computational aspects of the various approaches.

R For Dummies

R For Dummies is an introduction to the statistical programming language known as R. The book starts by introducing the interface and works its way from the very basic concepts of the language through more sophisticated data manipulation and analysis. It illustrates every step with easy-to-follow examples. The book contains numerous code snippets, several write-it-yourself functions you can use later on, and complete analysis scripts, all of which are there for you to try out yourself.

The book doesn’t attempt to give a technical description of how R is programmed internally, but it does focus as much on the why as on the how. R doesn’t behave like your average scripting language, and it has plenty of unique features that may seem surprising at first. Instead of just telling you how to talk to R, the authors believe it’s important to explain how the R engine reads what you tell it to do. After reading this book, you should be able to manipulate your data into the form you want and understand how to use functions that aren’t covered in the book (as well as the ones that are).

R For Dummies takes the steepness out of the learning curve for using R. R For Dummies does not guarantee that you’ll be a guru if you read this book, but you should be able to do the following:

  • Perform data analysis by using a variety of powerful tools
  • Use the power of R to do statistical analysis and other data-processing tasks
  • Appreciate the beauty of using vector-based operations rather than loops to do speedy calculations
  • Appreciate the meaning of the following line of code (see the sketch after this list): knowledge <- apply(theory, 1, sum)
  • Know how to find, download, and use code that has been contributed to R by its very active community of developers
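
As a rough sketch of what that apply() call is doing, with an invented matrix named theory (the names are just for illustration):

# an invented 3 x 4 matrix of 'theory' values
theory <- matrix(1:12, nrow = 3)

# summing each row with a loop
knowledge <- numeric(nrow(theory))
for (i in seq_len(nrow(theory))) {
  knowledge[i] <- sum(theory[i, ])
}

# the same result in a single vector-based call
knowledge <- apply(theory, 1, sum)
knowledge
# [1] 22 26 30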

Advanced Analytics with Power BI + R

Data is everywhere. The world contains an astronomical amount of data, an amount that grows larger and larger each day. This vast collection of information has changed the way the world interacts, uncovered breakthroughs in medicine, and revealed new ways to understand trends in business and in our daily lives. With the increasing availability of data come new challenges and opportunities as business leaders seek to gain important insights and transform information into actionable and meaningful results. As data becomes more accessible, manipulating vast amounts of available data to drive insights and make business decisions can be challenging.

Business leaders at every level need to become data literate and be able to understand data and analytical concepts that may have previously seemed out of reach, including statistical methods, machine learning, and data manipulation. With this spread of data literacy comes the powerful ability to make educated business decisions that rely on the smart use of data, rather than on an individual’s opinions. In the past, these tasks were extremely complex and would be handed off to engineers.

With the tools that exist today, business leaders are able to dive into their own analytics and uncover powerful insights. Microsoft Power BI brings advanced analytics to the daily business decision process, allowing users to extract valuable knowledge from data to solve business problems. This white paper will cover the advanced analytic capabilities of Power BI, including predictive analytics, data visualisations, R integration, and data analysis expressions.


Table of contents
Advanced analytics in Power BI
  Predictive analytics with Azure
  R integration
  Quick Insights feature
Segmentation and cohort analysis
  Data grouping and Binning
Data streaming in Power BI
  Real-time dashboards
  Setup of real-time streaming data sets
Visualizations in Power BI
  Community-sourced visualizations
  R visualisations
  Custom visualizations
Data connection and shaping
  Azure services
  DirectQuery
  Data fetching with the R connector
  Data shaping in Power Query with R
Data Analysis Expressions
Conclusion

R for Everyone

With the increasing prevalence of data in our daily lives, new and better tools are needed to analyze the deluge. Traditionally there have been two ends of the spectrum: lightweight, individual analysis using tools like Excel or SPSS and heavy-duty, high-performance analysis built with C++ and the like. With the increasing strength of personal computers grew a middle ground that was both interactive and robust. Analysis done by an individual on his or her own computer in an exploratory fashion could quickly be transformed into something destined for a server, underpinning advanced business processes. This area is the domain of R, Python, and other scripted languages.

R, invented by Robert Gentleman and Ross Ihaka of the University of Auckland in 1993, grew out of S, which was invented by John Chambers at Bell Labs. It is a high-level language that was originally intended to be run interactively where the user runs a command, gets a result, and then runs another command. It has since evolved into a language that can also be embedded in systems and tackle complex problems. In addition to transforming and analyzing data, R can produce amazing graphics and reports with ease. It is now being used as a full stack for data analysis, extracting and transforming data, fitting models, drawing inferences and making predictions, and plotting and reporting results.


R’s popularity has skyrocketed since the late 2000s, as it has stepped out of academia and into banking, marketing, pharmaceuticals, politics, genomics and many other fields. Its new users are often shifting from low-level, compiled languages like C++, other statistical packages such as SAS or SPSS, and from the 800-pound gorilla, Excel. This time period also saw a rapid surge in the number of add-on package libraries of prewritten code that extend R’s functionality. While R can sometimes be intimidating to beginners, especially for those without programming experience, I find that programming analysis, instead of pointing and clicking, soon becomes much easier, more convenient and more reliable. It is my goal to make that learning process easier and quicker.


R for Everyone lays out information in a way I wish I had been taught when learning R in graduate school. Coming full circle, the content of this book was developed in conjunction with the data science course I teach at Columbia University. It is not meant to cover every minute detail of R, but rather the 20% of functionality needed to accomplish 80% of the work.

Microsoft Excel And Business Analysis For Professionals

Microsoft Excel is the world’s most-used business intelligence tool. Knowledge of it is even compulsory for an MBA degree, and the business world depends greatly on it. Microsoft Excel and Business Analysis for Professionals is intended for Sales Managers, Financial Analysts, Business Analysts, Data Analysts, MIS Analysts, HR Executives and frequent Excel users.

It is written by Michael Olafusi, a two-time Microsoft Excel MVP (Most Valuable Professional) and a full-time Microsoft Excel consultant. He is the founder of UrBizEdge, a business data analysis and Microsoft Excel consulting firm.

He has trained hundreds of business professionals on Microsoft Excel and has used the experience gained from interacting with them both during such training and while consulting for companies to write this excellent guide for the busy professional who needs the improved work productivity Microsoft Excel provides.


Contents
Preface
Microsoft Excel: It’s more powerful and easier to…
How Excel Handles What You Type
Data Consistency, starting with the end in view
Building Datasheets that can easily scale
Sorting
Filtering
Data Cleaning
Data Formatting
Charts
PivotTable and PivotChart
Business Data Analysis
Power Excel Formulas
Named Range, Goal Seek and Scenario Manager
Introduction To Excel VBA (macros)