Data Science and Informatics
- Data Wrangling Translations: Common data wrangling methods (such as filtering, sorting, and adding columns) in JavaScript, Python, SQL, R, and Excel. All examples use mock data.
- How to Debug Small Programming Scripts: "This methodology will not find every bug in every program, but it is highly effective for the sort of short programs that beginner programmers are assigned as homework. These techniques then scale up to finding bugs in non-trivial programs."
- OpenRefine: OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it, transforming it from one format into another, and extending it with web services and external data.
- Data Science: A First Introduction: "An open source textbook aimed at introducing undergraduate students to data science. [...] In this book, we define data science as the study and development of reproducible, auditable processes to obtain value (i.e., insight) from data." Uses R's tidyverse packages and Jupyter notebooks.
- Practical Computing for Biologists (Call Number: QH 324.2 .H33 2011; ISBN: 9780878933914; Publication Date: 2010). "Although many of the techniques are relevant to molecular bioinformatics, the motivation for the text is much broader, focusing on topics and techniques that are applicable to a range of scientific endeavors."
- Exploratory Desktop: Exploratory Desktop provides an advanced, interactive, and reproducible environment for data wrangling, visualization, and analysis, powered by R.
- R for Data Science: Import, Tidy, Transform, Visualize, and Model Data (ISBN: 9781491910368; Publication Date: 2017). A clearly written guide to using R for data science, covering visualization, data tidying, and more, by the creator of the tidyverse packages. Highly recommended!
- Python Data Science Handbook (Call Number: QA76.73.P98 V365 2016, Youngblood Energy Library; ISBN: 9781491912058; Publication Date: 2016). "For many researchers, Python is a first-class tool mainly because of its libraries for storing, manipulating, and gaining insight from data. Several resources exist for individual pieces of this data science stack, but only with the Python Data Science Handbook do you get them all--IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and other related tools. Working scientists and data crunchers familiar with reading and writing Python code will find this comprehensive desk reference ideal for tackling day-to-day issues: manipulating, transforming, and cleaning data; visualizing different types of data; and using data to build statistical or machine learning models."
- Python Data Analytics (ISBN: 1484209583; Publication Date: 2015). "Python Data Analytics will help you tackle the world of data acquisition and analysis using the power of the Python language. At the heart of this book lies the coverage of pandas, an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language."
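The three core operations named in the Data Wrangling Translations entry (filtering, sorting, and adding a column) can be sketched in plain Python. This is a minimal illustration using only the standard library. The mock rows and the helper names `filter_rows`, `sort_rows`, and `add_column` are hypothetical and not taken from any of the resources above. The pandas library covered in the handbooks above offers the same operations on DataFrames.

```python
# Hypothetical mock data: one dict per row, like a small table.
rows = [
    {"name": "Ana", "species": "cat", "weight_kg": 4.2},
    {"name": "Bo", "species": "dog", "weight_kg": 12.5},
    {"name": "Cy", "species": "dog", "weight_kg": 8.1},
    {"name": "Di", "species": "cat", "weight_kg": 3.6},
]


def filter_rows(table, predicate):
    """Keep only rows for which predicate(row) is True (like SQL WHERE)."""
    return [row for row in table if predicate(row)]


def sort_rows(table, column, descending=False):
    """Return a new list sorted by one column (like SQL ORDER BY)."""
    return sorted(table, key=lambda row: row[column], reverse=descending)


def add_column(table, new_column, compute):
    """Return copies of each row with a derived column added; input rows are untouched."""
    return [{**row, new_column: compute(row)} for row in table]


# Filter to dogs, sort heaviest first, then add a weight in pounds.
dogs = filter_rows(rows, lambda r: r["species"] == "dog")
by_weight = sort_rows(dogs, "weight_kg", descending=True)
with_lb = add_column(by_weight, "weight_lb", lambda r: round(r["weight_kg"] * 2.20462, 1))

for row in with_lb:
    print(row["name"], row["weight_kg"], row["weight_lb"])
```

Each helper returns a new list rather than modifying its input. This mirrors how spreadsheet formulas and SQL queries leave the source table unchanged.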
Statistical Programming in R
- The R Book (Call Number: QA 276.45 .R3 C73 2007; ISBN: 1299190286; Publication Date: 2012). An extensive guide to R code for a wide range of statistical topics.
- Modern Applied Statistics with S (Call Number: QA 276.4 .V46 2002; ISBN: 0387954570; Publication Date: 2003). This book is written for S but can be used for R with minimal modifications. Available as an ebook or a physical copy. "A guide to using S environments to perform statistical analyses providing both an introduction to the use of S and a course in modern statistical methods. The emphasis is on presenting practical problems and full analyses of real data sets."
- An Introduction to Applied Multivariate Analysis with R (Call Number: eBook; ISBN: 1441996508; Publication Date: 2011). "Multivariate analysis includes methods both for describing and exploring such data and for making formal inferences about them. The aim of all the techniques is, in general sense, to display or extract the signal in the data in the presence of noise and to find out what the data show us in the midst of their apparent chaos."
- Just Enough R: From the introduction, "R makes it easy to work with and learn from data. It also happens to be a complete programming language, but if you’re reading this guide then that might not be of interest to you. That’s OK — the goal here is not to teach you how to program in R. The goal is to teach you just enough R to be confident to explore your data. In this guide, we use R in the same way we use any other statistics software: To check and visualise data, run statistical analyses, and share our results with others. To do that it’s worth learning the absolute basics of the R language and key recent extensions to it."
- Research Methods in R: Designed around psychology data, with two sections: "Absolute Beginners' Guide to R" and "Putting R to Work," the latter containing more detailed case studies.
Statistical Programming in Python
- Statistics in Python - Scipy Lectures: Introductory statistics in Python using SciPy.
- An Introduction to Statistics with Python (ISBN: 9783319283166; Publication Date: 2016). "This textbook provides an introduction to the free software Python and its use for statistical data analysis. It covers common statistical tests for continuous, discrete and categorical data, as well as linear regression analysis and topics from survival analysis and Bayesian statistics. Working code and data for Python solutions for each test, together with easy-to-follow Python examples, can be reproduced by the reader and reinforce their immediate understanding of the topic."
- Bayesian statistics in Python: "Implements a probabilistic programming language in Python."
- Handbook of Applied Spatial Analysis (ISBN: 3642036473; Publication Date: 2009). "This Handbook summarizes, explains, and demonstrates the nature of current models, methods, and techniques particularly designed for the analysis of spatial data. The book is designed to be a desk reference for all researchers just getting into the field of spatial data analysis as well as for seasoned spatial analysts."
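An Introduction to Statistics with Python covers common tests for continuous data. The sketch below illustrates one such test, Welch's two-sample t statistic, and is not taken from that book. It uses Python's standard-library `statistics` module and returns the t statistic together with the Welch-Satterthwaite degrees of freedom. The function name `welch_t` and the sample values are hypothetical mock data. Converting t into a p-value requires the t distribution, which the standard library does not provide. SciPy's `scipy.stats.ttest_ind` with `equal_var=False` performs the complete test.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom) for two samples."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Sample variances use the n - 1 denominator.
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean.
    se_a, se_b = var_a / n_a, var_b / n_b
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical mock measurements from two groups of unequal size.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.2, 4.5, 4.8, 4.1]
t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

Welch's version is used here instead of Student's pooled-variance t test because it does not assume the two groups share a common variance.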