Business Analytics (MSBA) Strategic Capstone

This guide contains resources selected for completing the Business Analytics (MSBA) strategic capstone.

Streamlining Your Internet Searches

When searching for data on the web, include search terms that describe the content of the data you want, along with the format or source you expect it to be in:

  1. Search by file type. Add a ‘filetype:’ operator to your search terms to restrict results to a given format: spreadsheets (‘filetype:xls’, ‘filetype:csv’), geodata (‘filetype:shp’), or database extracts (‘filetype:mdb’, ‘filetype:sql’, ‘filetype:db’).
  2. Search by part of a URL. In Google, ‘inurl:downloads filetype:xls’ finds Excel files that have “downloads” in their web address. You can also limit results to a single domain name by searching for, e.g., ‘site:agency.gov’ (no space after the colon).
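
The two tips above can be combined in a single query. The following is a minimal Python sketch, not part of the original guide: the helper names `build_query` and `search_url` are hypothetical, the search terms are made up for illustration, and it assumes the search engine honors the `filetype:`, `inurl:`, and `site:` operators and treats `OR` between them as "either one". Operator support varies between search engines, so check the results before relying on them.

```python
from urllib.parse import urlencode


def build_query(terms, filetypes=(), inurl=None, site=None):
    """Combine plain search terms with filetype:, inurl:, and site: operators.

    Hypothetical helper for illustration only.
    """
    parts = [terms.strip()]
    if filetypes:
        # Several file types are alternatives, so join them with OR.
        parts.append(" OR ".join(f"filetype:{ft.lower()}" for ft in filetypes))
    if inurl:
        parts.append(f"inurl:{inurl}")
    if site:
        # No space after the colon, or the operator is read as a plain word.
        parts.append(f"site:{site}")
    return " ".join(p for p in parts if p)


def search_url(query, base="https://www.google.com/search"):
    """Return a search URL with the query percent-encoded in the q parameter."""
    return f"{base}?{urlencode({'q': query})}"


# Example terms are placeholders; agency.gov is the sample domain from the tip above.
query = build_query("unemployment rate", filetypes=["xls", "csv"], site="agency.gov")
print(query)
print(search_url(query))
```

Pasting the printed query into the search box has the same effect as opening the printed URL; the URL form is mainly useful if you want to save or share a search.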

Single datasets and data repositories

  1. http://archive.ics.uci.edu/ml/
  2. http://crawdad.org/
  3. http://data.austintexas.gov
  4. http://data.cityofchicago.org
  5. http://data.govloop.com
  6. http://data.gov.uk/
  7. http://data.gov.in
  8. http://data.medicare.gov
  9. http://data.seattle.gov
  10. http://data.sfgov.org
  11. http://data.sunlightlabs.com
  12. https://datamarket.azure.com/
  13. http://developer.yahoo.com/geo/g...
  14. http://econ.worldbank.org/datasets
  15. http://en.wikipedia.org/wiki/Wik...
  16. http://factfinder.census.gov/ser...
  17. http://ftp.ncbi.nih.gov/
  18. http://gettingpastgo.socrata.com
  19. http://googleresearch.blogspot.c...
  20. http://books.google.com/ngrams/
  21. http://medihal.archives-ouvertes.fr
  22. http://public.resource.org/
  23. http://rechercheisidore.fr
  24. http://snap.stanford.edu/data/in...
  25. http://timetric.com/public-data/
  26. https://wist.echo.nasa.gov/~wist...
  27. http://www2.jpl.nasa.gov/srtm
  28. http://www.archives.gov/research...
  29. http://www.bls.gov/
  30. http://www.crunchbase.com/
  31. http://www.dartmouthatlas.org/
  32. http://www.data.gov/
  33. http://www.datakc.org
  34. http://dbpedia.org
  35. http://www.delicious.com/jbaldwi...
  36. http://www.faa.gov/data_research/
  37. http://www.factual.com/
  38. http://research.stlouisfed.org/f...
  39. http://www.freebase.com/
  40. http://www.google.com/publicdata...
  41. http://www.guardian.co.uk/news/d...
  42. http://www.infochimps.com
  43. http://www.kaggle.com/
  44. http://build.kiva.org/
  45. http://www.nationalarchives.gov....
  46. http://www.nyc.gov/html/datamine...
  47. http://www.ordnancesurvey.co.uk/...
  48. http://www.philwhln.com/how-to-g...
  49. http://www.imdb.com/interfaces
  50. http://imat-relpred.yandex.ru/en...
  51. http://www.dados.gov.pt/pt/catal...
  52. http://knoema.com
  53. http://daten.berlin.de/
  54. http://databib.org/
  55. http://datacite.org/
  56. http://data.reegle.info/
  57. http://data.wien.gv.at/
  58. http://data.gov.bc.ca
  59. https://pslcdatashop.web.cmu.edu/ (interaction data in learning environments)
  60. http://www.icpsr.umich.edu/icpsrweb/CPES/ - Collaborative Psychiatric Epidemiology Surveys (a collection of three national surveys focused on the major ethnic groups, studying psychiatric illnesses and health services use)
  61. http://www.dati.gov.it
  62. http://dati.trentino.it
  63. http://www.databagg.com/
  64. http://networkrepository.com - Network and machine learning data repository with interactive visual analytics
  65. https://www.usafacts.org/ 

Data Science

  1. Datasets used in the book "Doing Data Science" by Rachel Schutt and Cathy O'Neil (O'Reilly 2014), available on the book site: https://github.com/oreillymedia/doing_data_science
  2. Enron Email Dataset: http://www.cs.cmu.edu/~enron/
  3. Half a million Hubway rides: http://hubwaydatachallenge.org/trip-history-data/

Health Care

  1. Gapminder: http://www.gapminder.org/data/
  2. U.S. Department of Health and Human Services (largest collection of longitudinal hospital care data in the United States): http://www.ahrq.gov/research/data/dataresources/index.html

Other

  1. Peer-to-peer lending data (LendingClub): https://www.lendingclub.com/info/download-data.action
  2. Facebook data (complete set of friends for various school networks): http://masonporter.blogspot.ae/2011/02/facebook100-data-set.html
  3. Public datasets on AWS: http://aws.amazon.com/public-data-sets/
  4. YouTube networks dataset: http://netsg.cs.sfu.ca/youtubedata/
  5. IMDb dataset: http://www.imdb.com/interfaces
  6. UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/
  7. Google Books n-gram dataset: http://aws.amazon.com/datasets/8172056142375670
  8. Million Song Dataset: http://labrosa.ee.columbia.edu/millionsong/