Introduction
R is free software originally designed for teaching statistics, and over time its resources have grown to include many complex programs that are used by researchers. R is well documented on the web.
Other than these initial links, I shall not explain in detail how or why to use R. I included R on this web site to help users access the more complex programs that are beyond my ability to program, and snippets of R code are presented to help them.
Installing R
The CRAN project page provides instructions and the necessary links to download R, and RStudio can be downloaded from its developer's web site. RStudio is a working platform on which code can be imported, edited, run, and saved. All the code provided on this site assumes that RStudio is available to the user.
Help for doing statistics using R
This site provides templates for some statistical procedures using R. The resources and code provided here are therefore minimal, covering only what is relevant to the particular problems to be solved. This site gives no general instructions on how to program in R. For general help with R, users may find the following resources useful.
Web sites
Textbooks
- R for Dummies by de Vries and Meys. John Wiley & Sons Inc. ISBN 978-1-119-05580-9. This is good for the novice with no prior experience of R.
- R Cookbook by Teetor. O'Reilly Media Inc. ISBN 978-93-5023-379-5. This takes the user step by step through learning how to use and program R, and introduces the basic R resources.
- R Graphics Cookbook by Chang. O'Reilly Media Inc. ISBN 978-1-449-31695-2. This is an excellent book providing detailed instructions on how to do graphics using R.
Using R
There are, of course, many ways R is used, depending on the experience and preference of the user. On this site, however, R is used via RStudio, a user interface that can be downloaded and installed alongside R.
RStudio contains 4 panels, which are used as follows:
- The top left panel, the "source", is essentially a text editor, into which the user types data and commands. Once entered, the relevant lines can be marked and executed by clicking the "Run" button near the top right corner of this panel. In the use of R on this website, data and commands are always entered into and executed from this panel.
- The bottom left panel, the "console", is where all the responses from the R engine are shown. These responses can be viewed, marked, copied onto the clipboard, and exported to other applications.
- The top right panel, the "history", records all the activities that have taken place. For those using R just to do a procedure, this panel is not important. However, for professional users, who may carry out sequences of complex data entry and manipulation, this history is an important record. The history can also be saved, so that the session can be continued at a different time.
- The bottom right panel, the "utilities", has many useful functions. However, the main function used on this website is the "Plots" sub-panel, as this is where all the graphics are produced.
Although the 4 panels can be used in many other ways in programming and data entry, the usual cycle of operation for programs from this site consists of:
- Entering code and instructions into the source panel (top left)
- Marking the relevant code in the source panel, and executing it by clicking "Run" above the source panel
- Examining the results in the console panel (bottom left), and in the plots panel (bottom right) if there are any graphics
- Copying the results to the clipboard, and exporting them to other applications
When the amount of data is too large for direct entry, the data can be imported from a file. The following sections give examples of each method of data entry: direct typing, an Excel file, a comma-delimited file, and other text files. Importing more complex files is discussed in the resources listed in the Introduction.
When the results are too large to copy and paste using the clipboard, they can be exported directly to a file. Exporting to a text file or an Excel file is discussed in the following sections.
Data Entry
For the resources on this site, data entry is carried out by typing code into the "source" (top left) panel of RStudio.
Step 1 : link a matrix of data to a variable (the example is MyInput)
The first step is to link the data to a variable name
- For direct data entry, the following can be typed directly into the "source" panel
MyInput = ("
v1 v2 v3
1 2 3
4 5 6")
- The same table, in a text file delimited by spaces or tabs, can also be imported by the following command in the "source" panel
MyInput = read.table("MyInputFile.txt", header = TRUE)
- If the file is delimited by a particular character, say a space, comma, semicolon, or tab, the command is
MyInput = read.delim("MyInputFile.txt", sep = "\t") # sep can be " ", ",", ";" or "\t"
- A special case of the delimited file is the comma-delimited (csv) file, where the columns are separated by commas. In this case, the command is
MyInput = read.csv("MyInputFile.csv")
- If the data are in a table in an Excel file called MyInputExcel.xlsx, in a worksheet called ThisSheet, the commands are
install.packages("XLConnect")
library('XLConnect')
MyInput = readWorksheetFromFile("MyInputExcel.xlsx", sheet = "ThisSheet")
or
MyInput = readWorksheetFromFile("MyInputExcel.xlsx", sheet = 1) # sheet = 1 means the first worksheet in the file
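Note: The XLConnect package is needed to read Excel files. install.packages() usually needs to be run only once on a computer, but library() must be run in every new R session. XLConnect also requires Java to be installed.
Note: When data are read from a file, the file is assumed to be in the current working directory. This is the folder set by the setwd() command or, if setwd() has not been used, the folder in which R started. To check the current folder, use the command getwd().
Also note: in data entry on this site, I always use header=TRUE, because the headers (column names) are necessary for executing complex statistical programs. This is the default for read.csv() and read.delim(), but not for read.table(), where it has to be given explicitly.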
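The file-based commands above can be tried out in a minimal, self-contained sketch. It assumes nothing about the user's own folders. tempdir() stands in for the real data folder, and the two input files are created on the spot purely for illustration. The sketch shows getwd()/setwd() and the header behaviour of the three reading functions.

```r
# Remember the current working directory so it can be restored afterwards
oldDir <- getwd()

# Hypothetical data folder: tempdir() stands in for the real folder
setwd(tempdir())
print(getwd())

# Create example input files (in practice these already exist)
writeLines(c("v1 v2 v3", "1 2 3", "4 5 6"), "MyInputFile.txt")
writeLines(c("v1,v2,v3", "1,2,3", "4,5,6"), "MyInputFile.csv")
print(file.exists("MyInputFile.txt"))

# read.table(): header = TRUE must be given explicitly
MyInput1 <- read.table("MyInputFile.txt", header = TRUE)

# read.delim() with a space separator; header = TRUE is already its default
MyInput2 <- read.delim("MyInputFile.txt", sep = " ")

# read.csv(): comma separator and header = TRUE are both defaults
MyInput3 <- read.csv("MyInputFile.csv")

print(MyInput1)

# Return to the original working directory
setwd(oldDir)
```

All three reads should give the same 2-row table with columns v1, v2 and v3. Leaving out header = TRUE in read.table() would instead turn the header line into a data row.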
Step 2 : link the variable to a dataframe (the example is MyDataFrame)
Once a set of data is linked to a variable, all sorts of manipulations and programs can be applied to the data. However, complex algorithms in R expect the data to be in a dataframe. A dataframe is a table in which each column has a name, usually taken from the first row (the header).
In order to convert our variable MyInput to the dataframe MyDataFrame, the following command is necessary
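Here is a hedged sketch of one common way to do this conversion. It assumes MyInput holds the directly typed text from Step 1. read.table() takes that string through its text argument, and header = TRUE makes the first line supply the column names. If MyInput was instead read from a file with read.table(), read.csv() or readWorksheetFromFile(), it is already a data frame and can simply be assigned, for example MyDataFrame <- MyInput.

```r
# Step 1: the directly typed data, held as a single character string
MyInput = ("
v1 v2 v3
1 2 3
4 5 6")

# Step 2: parse the string as if it were a file; header = TRUE makes
# the first line (v1 v2 v3) the column names. The blank first line is skipped.
MyDataFrame <- read.table(text = MyInput, header = TRUE)

print(MyDataFrame)
```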
Summary
Getting the data into a state where complex R procedures can be carried out takes two stages:
- Input the data, by typing or from a text file or Excel file, into a named variable
- Read the variable into a named dataframe
The data frame now has a name, and within that frame, each column has a name.
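Later procedures refer to both the name of the data frame and the names of its columns. The short sketch below rebuilds the example data frame so it runs on its own. It then shows how a column is picked out by name.

```r
MyDataFrame <- read.table(text = "
v1 v2 v3
1 2 3
4 5 6", header = TRUE)

# Column names of the data frame
print(names(MyDataFrame))

# A column is selected by name with $ or [[ ]]
print(MyDataFrame$v1)
print(mean(MyDataFrame[["v3"]]))

# str() shows the structure: number of rows, column names and types
str(MyDataFrame)
```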
Results Output
Common practices
After the data are entered as a variable, editing, transformation, and programs may be applied.
After the data are transformed into a dataframe, complex statistical procedures in R can be executed on it. The results of the calculations are then presented in the console (bottom left panel) of RStudio.
In most cases, the results are statements or tables small enough to handle manually. These are marked, copied (Ctrl+C in Windows or Command+C in Mac), and pasted into other applications (Ctrl+V in Windows or Command+V in Mac).
Similarly, graphic outputs are displayed in the "Plots" panel, bottom right of RStudio. Plots can be copied to the clipboard or saved as an image or PDF file by clicking the "Export" button.
Export results to files
On some occasions, the amount of output from R is too large to handle manually, or the output is to be used for further analysis. In these cases, the output has to be exported directly to files.
There are, of course, many ways of handling large amounts of information in R, but this site provides only the common and simpler methods.
Output into text files
In most circumstances, the results of R procedures are already text, and these can be directly exported to files, as follows
write.table(MyResults, "MyResults.txt", sep = "\t") # sep can be any delimiter; common ones are " ", "," and "\t"
Sometimes the procedures are complex and the results multilayered, so that a program or subroutine is required to display any subset of the results. In these circumstances, the output of the subroutine can be exported directly into a text file:
fit <- glm(y ~ x, family = binomial, data = MyDataFrame) # a procedure such as logistic regression; y and x are hypothetical column names
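The following is a self-contained sketch of write.table() with a tab delimiter. It uses a small hypothetical results table and a temporary file in place of MyResults.txt. The row.names = FALSE and quote = FALSE arguments are optional. They limit the file to just the header and the data, which is often easier to paste into other applications.

```r
# Hypothetical results table standing in for MyResults
MyResults <- data.frame(group = c("A", "B"), mean = c(1.5, 2.25), n = c(10, 12))

# Temporary file standing in for "MyResults.txt"
outFile <- tempfile(fileext = ".txt")

# Tab-delimited output without row numbers or quotation marks
write.table(MyResults, outFile, sep = "\t", row.names = FALSE, quote = FALSE)

# Read the file back to check what was written
print(readLines(outFile))
```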
capture.output(summary(fit), file = "MyResults.txt")
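Below is a runnable version of the two lines above. It assumes a logistic regression fitted with glm() on R's built-in mtcars data. The outcome is am (transmission type) and the predictor is wt (weight). The data set and variables are only an illustration, and a temporary file stands in for MyResults.txt.

```r
# Logistic regression: probability of a manual transmission (am = 1) by weight
fit <- glm(am ~ wt, family = binomial, data = mtcars)

outFile <- tempfile(fileext = ".txt")

# capture.output() writes exactly what summary(fit) would print in the console
capture.output(summary(fit), file = outFile)

# Show the first few lines of the saved file
print(head(readLines(outFile)))
```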
Output into Excel files
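These commands must precede any command using Excel files:
install.packages("XLConnect") # usually needed only once on each computer
library('XLConnect') # needed in every new R session
It takes 4 steps to save a set of data to Excel: create a workbook, create a worksheet, write to the worksheet, and save the workbook. The middle two steps can be repeated for any number of worksheets.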
MyWorkbook <- loadWorkbook("MyOutputExcelFile.xlsx", create = TRUE) # creates a workbook MyWorkbook
createSheet(MyWorkbook, name = "MySheet1") # creates a worksheet MySheet1 in MyWorkbook
writeWorksheet(MyWorkbook, MyContent, sheet = "MySheet1", startRow = 1, startCol = 1) # put the content of variable MyContent into MySheet1, starting in row 1 and col 1 (A)
saveWorkbook(MyWorkbook) # save MyWorkbook as the Excel file MyOutputExcelFile.xlsx
An example: suppose I have a matrix called mx and want to save it to MyOutputExcel.xlsx in a sheet called MySheet1
install.packages("XLConnect")
library('XLConnect')
MyWorkbook <- loadWorkbook("MyOutputExcel.xlsx", create = TRUE) # creates a workbook MyWorkbook
createSheet(MyWorkbook, name = "MySheet1") # creates a worksheet MySheet1 in MyWorkbook
writeWorksheet(MyWorkbook, as.data.frame(mx), sheet = "MySheet1", startRow = 1, startCol = 1)
# put mx (as a data frame) into MySheet1, starting in row 1 and col 1 (A)
saveWorkbook(MyWorkbook) # save MyWorkbook as the Excel file MyOutputExcel.xlsx
Sometimes the procedures are complex and the results multilayered, so that a program or subroutine is required to display any subset of the results. In these circumstances, the output of the subroutine can be exported directly into an Excel file. For example, suppose the result of an R procedure is stored in fit, and I want to save its summary, summary(fit), directly into Excel. The code is as follows:
fit <- glm(y ~ x, family = binomial, data = MyDataFrame) # a procedure such as logistic regression; y and x are hypothetical column names
MyOutput <- capture.output(summary(fit)) # MyOutput is now a character vector, one element per printed line
install.packages("XLConnect")
library('XLConnect')
MyWorkbook <- loadWorkbook("MyOutputExcelFile.xlsx", create = TRUE) # creates a workbook MyWorkbook
createSheet(MyWorkbook, name = "MySheet1") # creates a worksheet MySheet1 in MyWorkbook
writeWorksheet(MyWorkbook, data.frame(MyOutput), sheet = "MySheet1", startRow = 1, startCol = 1)
# put MyOutput into MySheet1, starting in row 1 and col 1 (A)
saveWorkbook(MyWorkbook) # save MyWorkbook as an excelfile MyOutputExcelFile.xlsx
Note: When the summary is captured into MyOutput, each printed line becomes a single piece of text in which the columns are aligned with spaces, not tabs. Each line therefore ends up in a single cell of the Excel file, and the columns of the table do not separate. In other words, the procedure works, but it is probably not much use; direct export to a text file gives better formatted output.
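One possible way around this is offered here as a suggestion, not as the site's own method. coef() extracts the coefficient table from the summary as a numeric matrix with proper columns. That matrix can be converted to a data frame and exported instead of the printed text. The sketch below assumes the same illustrative mtcars model and writes a tab-delimited file. The resulting data frame could equally be passed to writeWorksheet().

```r
fit <- glm(am ~ wt, family = binomial, data = mtcars)

# coef() on a glm summary gives a matrix: Estimate, Std. Error, z value, Pr(>|z|)
coefTable <- as.data.frame(coef(summary(fit)))

# Keep the term names as a proper column instead of row names
coefTable <- cbind(Term = rownames(coefTable), coefTable)

outFile <- tempfile(fileext = ".txt")
write.table(coefTable, outFile, sep = "\t", row.names = FALSE, quote = FALSE)

print(coefTable)
```

Each statistic now sits in its own column, so the columns separate cleanly in a spreadsheet.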
Links to Procedures
StatTools is not a teaching site for R, nor does it provide a comprehensive set of statistical procedures in R. However, a number of complex procedures are beyond the ability of the authors of StatTools, so instructions and code in R are offered for these procedures instead. As there are only a limited number of such procedures, links to their pages are listed in alphabetical order.
- Analysis of Variance and Covariance: ANCOVA Explained and R Codes Page
- Logistic Regression: Logistic Regression Explained and Codes