
Quantitative Analysis Guide: Which Statistical Software to Use?

Resources and support for statistical and numerical data analysis


Statistical Software Comparison

Software Access

 

Software | Mac/Windows | Personal Access at NYU
SPSS     | Both        | Purchase via NYU Computer Store
JMP      | Both        | Purchase via NYU Computer Store (Departments, Faculty and Staff); download a free 30-day trial
Stata    | Both        | Purchase via Stata Grad Plan
SAS      | Windows     | Purchase via NYU Computer Store (Departments, Faculty and Staff); SAS University Edition is free for students and professors
R        | Both        | Free download via CRAN website
MATLAB   | Both        | Contact hpc@nyu.edu; purchase via NYU Computer Store

 

SPSS

History
  • The first version of SPSS was developed by Norman H. Nie, Dale H. Bent and C. Hadlai Hull and released in 1968 as the Statistical Package for the Social Sciences.
  • In July 2009, IBM acquired SPSS.

 
Users
  • Social sciences
  • Health sciences
  • Marketing
  • Academia
 
Data Format and Compatibility
  • .sav file to save data
  • Optional syntax files (.sps)
  • Easily export .sav file from Qualtrics
  • Import Excel files (.xls, .xlsx), Text files (.csv, .txt, .dat), SAS (.sas7bdat), Stata (.dta)
  • Export Excel files (.xls, .xlsx), Text files (.csv, .dat), SAS (.sas7bdat), Stata (.dta)
 
Graphics

 
Highlights
  • Easy and intuitive user interface; menus and dialog boxes
  • Similar feel to Excel
  • Structural equation models (SEMs) through SPSS Amos
  • Easily exclude data and handle missing data

 
Limitations
  • Absence of robust methods (e.g., least absolute deviation regression, quantile regression)
  • Unable to perform complex many-to-many merges
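In a many-to-many merge, every row in one file is paired with every row in the other file that shares the same key. A key that appears m times on one side and k times on the other therefore produces m × k rows. The sketch below illustrates that relational join semantics in plain Python. The tables and column names (`visits`, `labs`, `id`) are hypothetical examples, and this is not how SPSS or any other package implements merging.

```python
from collections import defaultdict


def many_to_many_merge(left, right, key):
    """Inner-join two lists of dict rows on `key`, pairing every left row with
    every right row that has the same key value (m x k rows per key)."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)          # group right-hand rows by key
    merged = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            merged.append({**lrow, **rrow})  # right-hand values win on shared column names
    return merged


# Hypothetical example: two visits and three lab results for participant 1
visits = [{"id": 1, "visit": "baseline"}, {"id": 1, "visit": "follow-up"},
          {"id": 2, "visit": "baseline"}]
labs = [{"id": 1, "lab": "A"}, {"id": 1, "lab": "B"}, {"id": 1, "lab": "C"},
        {"id": 3, "lab": "A"}]

result = many_to_many_merge(visits, labs, "id")
print(len(result))  # 6: 2 visits x 3 labs for id 1; ids 2 and 3 have no match
```

Rows whose key has no partner are dropped, as in an inner join. An outer variant would keep them with missing values.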

 

Sample Data (Sex: 0 = Male, 1 = Female)

Sex Test1 Test2
0 86 83
0 93 79
0 85 81
0 83 80
0 91 76
1 94 79
1 91 94
1 83 84
1 96 81
1 95 75
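The sample syntax for Stata, SAS, R and MATLAB in this guide runs a paired t-test of Test1 against Test2. Here is a minimal Python sketch that makes that computation concrete. It uses the standard library only; Python is not one of the packages compared here. The sketch applies the textbook formula t = mean(d) / (sd(d) / sqrt(n)), with n - 1 degrees of freedom, where d is each student's Test1 score minus their Test2 score. It stops at the t statistic because the standard library has no t-distribution function for the p-value.

```python
from math import sqrt
from statistics import mean, stdev

# Sample data from the table above (same order as the Stata, SAS, R and MATLAB examples)
test1 = [86, 93, 85, 83, 91, 94, 91, 83, 96, 95]
test2 = [83, 79, 81, 80, 76, 79, 94, 84, 81, 75]


def paired_t(x, y):
    """Return (mean difference, t statistic, degrees of freedom) for paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    d = [a - b for a, b in zip(x, y)]  # within-pair differences
    n = len(d)
    d_bar = mean(d)                     # mean difference
    se = stdev(d) / sqrt(n)             # standard error; stdev uses the n - 1 divisor
    return d_bar, d_bar / se, n - 1


d_bar, t, df = paired_t(test1, test2)
print(f"mean difference = {d_bar:.2f}, t = {t:.3f}, df = {df}")
```

On this data it prints a mean difference of 8.50 and t = 3.313 with 9 degrees of freedom. The paired t-test output from each package's sample syntax should report the same t and degrees of freedom.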

JMP

History
  • Developed by SAS Institute
  • Created in the 1980s by John Sall to take advantage of the graphical user interface introduced by the Macintosh
  • Originally stood for "John's Macintosh Program"
  • Five products: JMP, JMP Pro, JMP Clinical, JMP Genomics, JMP Graph Builder App

 
Users
  • Engineering: Six Sigma, Quality Control, Scientific Research, Design of Experiments
  • Biology
  • Healthcare/Pharmaceutical
 
Data Format and Compatibility
  • .jmp file to save data
  • Optional syntax files (.jsl)
  • Import Excel files (.xls, .xlsx), Text files (.csv, .txt, .dat), SAS (.sas7bdat), Stata (.dta), SPSS (.sav)
  • Export Excel files (.xls, .xlsx), Text files (.csv, .dat), SAS (.sas7bdat)
 
Graphics
  • Gallery of JMP Graphs
  • The drag-and-drop graph editor tries to guess which chart suits your data
  • Dynamic interface can be used to zoom and change view
  • Ability to lasso outliers on a graph and regraph without the outliers
 
Highlights
  • Interactive Graphics
  • Scripting Language (JSL)
  • SAS, R and MATLAB can be executed using JSL
  • Interface for using R from within JMP, and an add-in for Excel
  • Great interface for easily managing output
  • Graphs and data tables are dynamically linked
  • Great set of online resources!
 
Limitations
  • Absence of some robust regression methods (two-stage least squares (2SLS), least absolute deviation (LAD), quantile regression)
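Least absolute deviation (LAD) regression, listed above as missing, minimizes the sum of absolute residuals rather than the sum of squared residuals. This makes the fit less sensitive to outlying y values. The following is a small illustrative sketch in plain Python that fits Test2 on Test1 from the sample data. It relies on the fact that some optimal simple-regression LAD line passes through at least two data points, so trying every pair is exact but only practical for small n. Production implementations typically solve LAD as a linear program instead; this is not how any of these packages implement it.

```python
from itertools import combinations

# Sample data from this guide: regress Test2 (y) on Test1 (x)
x = [86, 93, 85, 83, 91, 94, 91, 83, 96, 95]
y = [83, 79, 81, 80, 76, 79, 94, 84, 81, 75]


def sum_abs_residuals(a, b, xs, ys):
    """Sum of |y - (a + b*x)| over all observations."""
    return sum(abs(yi - (a + b * xi)) for xi, yi in zip(xs, ys))


def lad_line(xs, ys):
    """Exact LAD fit of y = a + b*x by brute force over point pairs (O(n^3))."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line is not a function of x
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        sae = sum_abs_residuals(a, b, xs, ys)
        if best is None or sae < best[2]:
            best = (a, b, sae)
    return best


def ols_line(xs, ys):
    """Ordinary least squares fit, for comparison."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
    sxx = sum((xi - mx) ** 2 for xi in xs)
    b = sxy / sxx
    return my - b * mx, b


a_lad, b_lad, sae_lad = lad_line(x, y)
a_ols, b_ols = ols_line(x, y)
print(f"LAD: y = {a_lad:.2f} + {b_lad:.3f}x, sum |resid| = {sae_lad:.2f}")
print(f"OLS: y = {a_ols:.2f} + {b_ols:.3f}x, "
      f"sum |resid| = {sum_abs_residuals(a_ols, b_ols, x, y):.2f}")
```

By construction, the LAD line's sum of absolute residuals is never larger than the least squares line's. The least squares line, in turn, has the smaller sum of squared residuals.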

Sample Data (Sex: 0 = Male, 1 = Female)

Sex Test1 Test2
0 86 83
0 93 79
0 85 81
0 83 80
0 91 76
1 94 79
1 91 94
1 83 84
1 96 81
1 95 75

Stata

History
  • Stata was first released in January 1985 as a regression and data management package with 44 commands, written by Bill Gould and Sean Becketti.
  • The name Stata is a syllabic abbreviation of the words statistics and data.
  • The graphical user interface (menus and dialog boxes) wasn't released until version 8.0 in 2003.

 
Users

 
Data Format and Compatibility
  • .dta file to save dataset
  • .do syntax file, where commands can be written and saved
  • Import Excel files (.xls, .xlsx), Text files (.txt, .csv, .dat), SAS (.XPT), Other (.XML)
  • Export Excel files (.xls, .xlsx), Text files (.txt, .csv, .dat), SAS (.XPT), Other (.XML)
  • Older versions of Stata cannot read datasets saved by newer versions

 
Graphics

 
Highlights
  • Mainly syntax-driven, but menus are available as well
  • User-written programs are available to install
  • Offers matrix programming in Mata
  • Works well with panel data, survey data, multiple imputation, etc.
  • Data management
 
Limitations
  • Can only hold one dataset in memory at a time
  • Cannot handle very large datasets; you may have to trade off the number of variables against the number of observations
  • Graphs have limited flexibility

Sample Syntax

* First enter the data manually (clear removes any data already in memory)
clear
input str10 sex test1 test2
  "Male" 86 83
  "Male" 93 79
  "Male" 85 81
  "Male" 83 80
  "Male" 91 76
  "Female" 94 79
  "Female" 91 94
  "Female" 83 84
  "Female" 96 81
  "Female" 95 75
end

* Next run a paired t-test
ttest test1 == test2

* Create a scatterplot of test2 against test1, one series per sex
twoway (scatter test2 test1 if sex == "Male") (scatter test2 test1 if sex == "Female"), legend(label(1 "Male") label(2 "Female"))

SAS

History
  • Development of SAS (Statistical Analysis System) began in 1966 with Anthony Barr of North Carolina State University, who was later joined by James Goodnight.
  • The National Institutes of Health funded the project, whose goal was to analyze agricultural data to improve crop yields.
  • SAS was first released in 1972. In 2012, SAS held 36.2% of the market, making it the largest market-share holder in 'advanced analytics.'

 
Users
  • Financial Services
  • Government
  • Manufacturing
  • Health and Life Sciences

 
Data Format and Compatibility
  • Available for Windows only
  • Import Excel files (.xls, .xlsx), Text files (.txt, .dat, .csv), SPSS (.sav), Stata (.dta), JMP (.jmp), Other (.xml)
  • Export Excel files (.xls, .xlsx), Text files (.txt, .dat, .csv), SPSS (.sav), Stata (.dta), JMP (.jmp), Other (.xml)
 
Graphics
 
Highlights
  • Base SAS contains the data management facility, programming language, data analysis and reporting tools
  • SAS libraries collect the SAS datasets you create
  • More than 200 components are available to complement Base SAS, including SAS/GRAPH, SAS/PH (Clinical Trial Analysis), SAS/ETS (Econometrics and Time Series), SAS/Insight (Data Mining), etc.
  • SAS Certification exams
  • Handles extremely large datasets
  • Predominantly used for data management and statistical procedures
  • SAS has two main types of code: DATA steps and PROC steps
  • A single procedure can produce test results, post-estimation output and plots
  • Size of datasets analyzed is only limited by the machine
 
Limitations
  • Graphics can be cumbersome to manipulate
  • Because SAS is proprietary software, there may be a long lag before new methods are implemented
  • Documentation and books tend to be very technical and not necessarily friendly to new users

 

Sample Syntax

* First enter the data manually;
data example;
  input sex $ test1 test2;
  datalines;
    M 86 83
    M 93 79
    M 85 81
    M 83 80
    M 91 76
    F 94 79
    F 91 94
    F 83 84
    F 96 81
    F 95 75 
  ;
run;

* Next run a paired t-test;
proc ttest data = example;
  paired test1*test2;
run;

* Create a scatterplot;
proc sgplot data = example;
  scatter x = test1 y = test2 / group = sex;
run;

R

History
  • R first appeared in 1993 and was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand.
  • R is an implementation of the S programming language, which was developed at Bell Labs.
  • It is named partly after its first authors and partly as a play on the name of S.
  • R is currently developed by the R Development Core Team.
  • RStudio, an integrated development environment (IDE), was first released in 2011.
 
Users
  • Companies Using R
  • Data Science
  • Finance and Economics
  • Bioinformatics
  • Sociology
  • Marketing
 
Data Format and Compatibility
  • Import Excel files (.xls, .xlsx), Text files (.txt, .dat, .csv), SPSS (.sav), Stata (.dta), SAS (.sas7bdat), Other (.xml, .json)
  • Export Excel files (.xlsx), Text files (.txt, .csv), SPSS (.sav), Stata (.dta), Other (.json)
 
Graphics
  • ggplot2 package, grammar of graphics
  • Graphs available through ggplot2
  • Network analysis (igraph)
  • Flexible aesthetics and options
  • Interactive graphics with Shiny
  • Many available packages to create field specific graphics
 
Highlights
  • R is free and open source
  • Over 6,000 user-contributed packages available through CRAN
  • Large online community
  • Network Analysis, Text Analysis, Data Mining, Web Scraping
  • Interacts with other software such as Python, Bioconductor, WinBUGS and JAGS
  • Broad scope of functions; flexible and versatile
  • Size of datasets analyzed is only limited by the machine
 
Limitations
  • Large online help community but no 'formal' tech support
  • Requires a good understanding of R's data types before it becomes easy to use
  • The sheer number of user-written packages can be hard to sift through

 

Sample Syntax

# Manually enter the data into a data frame
dataset <- data.frame(sex = c("Male", "Male", "Male", "Male", "Male", "Female", "Female", "Female", "Female", "Female"), 
                      test1 = c(86, 93, 85, 83, 91, 94, 91, 83, 96, 95), 
                      test2 = c(83, 79, 81, 80, 76, 79, 94, 84, 81, 75))

# Store sex as a factor; levels sort alphabetically, so Female = 1 and Male = 2
dataset$sex <- factor(dataset$sex)

# Now we will run a paired t-test
t.test(dataset$test1, dataset$test2, paired = TRUE)

# Last, plot the two test variables (indexing by the factor: Female = red, Male = blue)
plot(dataset$test1, dataset$test2, col = c("red", "blue")[dataset$sex])
legend("topright", legend = c("Male", "Female"), fill = c("blue", "red"))

# Making the same graph using ggplot2 (install it once if needed)
# install.packages("ggplot2")
library(ggplot2)
mygraph <- ggplot(data = dataset, aes(x = test1, y = test2, color = sex))
mygraph + geom_point(size = 5) + ggtitle("Test1 versus Test2 Scores")

 

MATLAB

History
  • Cleve Moler of the University of New Mexico began development in the late 1970s.
  • Moler and Jack Little cofounded MathWorks and released MATLAB (matrix laboratory) commercially in 1984.
 
Users
  • Education (linear algebra and numerical analysis)
  • Popular among scientists involved in image processing
  • Engineering
 
Data Format and Compatibility
  • .m syntax files (scripts and functions)
  • Import Excel files (.xls, .xlsx), Text files (.txt, .dat, .csv), Other (.xml, .json)
  • Export Excel files (.xls, .xlsx), Text files (.txt, .dat, .csv), Other (.xml, .json)
 
Graphics
 
Highlights
  • Optimized for data analysis, matrix manipulation in particular
  • Basic unit is a matrix
  • Vectorized operations are quick
  • Diverse set of available toolboxes (apps) [Statistics, Optimization, Image Processing, Signal Processing, Parallel Computing, etc.]
  • Large online community (MATLAB Central File Exchange)
  • Image processing
  • Vast number of pre-defined functions and implemented algorithms
 
Limitations
  • Lacks implementation of some advanced statistical methods
  • Integrates easily with some languages such as C, but not others, such as Python
  • Limited GIS capabilities

Sample Syntax

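% Enter the data (ttest and ttest2 require the Statistics and Machine Learning Toolbox)
sex = {'Male','Male','Male','Male','Male','Female','Female','Female','Female','Female'};
t1 = [86,93,85,83,91,94,91,83,96,95];
t2 = [83,79,81,80,76,79,94,84,81,75];

% paired t-test
[h,p,ci,stats] = ttest(t1,t2)

% independent samples t-test of test1 by sex
sex = categorical(sex);
[h,p,ci,stats] = ttest2(t1(sex=='Male'),t1(sex=='Female'))

% scatterplot of test2 against test1: Male = blue x, Female = red o
g = sex=='Male';
figure;
plot(t1(g),t2(g),'bx'); hold on; plot(t1(~g),t2(~g),'ro'); hold off
legend('Male','Female'); xlabel('test1'); ylabel('test2')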
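`ttest2` assumes equal variances unless told otherwise, so the statistic it reports is the classic pooled-variance two-sample t. A minimal Python sketch of that formula (standard library only) is shown here as a cross-check. It applies the formula to the Test1 scores split by sex, and it computes t and the degrees of freedom but not the p-value.

```python
from math import sqrt
from statistics import mean, variance

# Test1 scores split by sex, as in the MATLAB ttest2 call above
male = [86, 93, 85, 83, 91]
female = [94, 91, 83, 96, 95]


def pooled_t(x, y):
    """Two-sample t statistic assuming equal variances; returns (t, df)."""
    nx, ny = len(x), len(y)
    # pooled variance: the two sample variances weighted by their n - 1 degrees of freedom
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    se = sqrt(sp2 * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / se, nx + ny - 2


t, df = pooled_t(male, female)
print(f"t = {t:.3f}, df = {df}")
```

It prints t = -1.392 with 8 degrees of freedom. The value is negative because the female mean (91.8) exceeds the male mean (87.6). To drop the equal-variance assumption in MATLAB, `ttest2` accepts `'Vartype','unequal'` for Welch's test.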

Software Features and Capabilities

 

Software | Interface*     | Learning Curve | Data Manipulation | Statistical Analysis                | Graphics  | Specialties
SPSS     | Menus & Syntax | Gradual        | Moderate          | Moderate Scope, Low Versatility     | Good      | Custom Tables, ANOVA and Multivariate Analysis
JMP      | Menus & Syntax | Gradual        | Strong            | Moderate Scope, Medium Versatility  | Great     | Design of Experiments, Quality Control, Model Fit
Stata    | Menus & Syntax | Moderate       | Strong            | Broad Scope, Medium Versatility     | Good      | Panel Data, Mixed Models, Survey Data Analysis
SAS      | Syntax         | Steep          | Very Strong       | Very Broad Scope, High Versatility  | Very Good | Large Datasets, Reporting, Password Encryption, Components for Specific Fields
R        | Syntax         | Steep          | Very Strong       | Very Broad Scope, High Versatility  | Excellent | Graphic Packages, Machine Learning, Predictive Modeling
MATLAB   | Syntax         | Steep          | Very Strong       | Limited Scope, High Versatility     | Excellent | Simulations, Multidimensional Data, Image and Signal Processing

*The primary interface is bolded in the case of multiple interface types available.

Learning Curve

Further Reading

