Pre-Course Setup: EcoInformatics Tools
The purpose of this course is to train you in key ecoinformatics practices.
As an ecoinformatician, you therefore need to be able to:
- Pull data from Application Programming Interfaces (APIs)
  - More on this in Chapter 2
- Organize and document your code and data
- Version control your code to avoid disaster and make it reproducible
  - For you, your collaborators, and/or the wider community
- Push your code up to public-facing repositories
- Pull others' code from public repositories
More thoughts on the benefits and power of reproducibility can be found here
To be successful, both in this course and in your career, you will need these skills, which is why they are a requirement for this course. If you already use these skills on a daily basis, fantastic! If you don’t feel that you have mastered the workflows listed above, we have placed lesson links throughout this chapter so that you can build these skills and be successful in this course.
0.1 Pre-Course Skills & Setup
For the purpose of this course we will largely be using the following tools to access, pull, and explore data:
- R & RStudio
- Git, GitHub, & Atom.io
- Markdown & R Markdown
As such, you will need to install and/or update these tools on your personal computer before our first day of class. While we chose R for this course, nearly all of the packages and data are available in, or transferable to, Python or other languages. If you’d like to brush up on your R skills, I highly recommend Data Carpentry Boot camp’s free R for Reproducible Scientific Analysis course.
0.1.1 Installing or Updating R
Please check your version of R. You will need R 3.6.0 or later.
How to check your version in R or RStudio if you already have it:
> version
_
platform x86_64-apple-darwin15.6.0
arch x86_64
os darwin15.6.0
system x86_64, darwin15.6.0
status
major 3
minor 5.1
year 2018
month 07
day 02
svn rev 74947
language R
version.string R version 3.5.1 (2018-07-02)
nickname Feather Spray
If you don’t already have R, or your version is older than 3.6.0 (as in the sample output above, which shows 3.5.1), download or update it here.
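If you prefer to check from a terminal, the version requirement can be sketched as a small shell test. This is only an illustration: `r_version_ok` is a hypothetical helper name, and the sketch assumes `Rscript` (installed alongside R) is on your PATH.

```shell
# Succeed (exit status 0) when the version string in $1 is at least 3.6.0.
# r_version_ok is a hypothetical helper name, not part of R or RStudio.
r_version_ok() {
  major=${1%%.*}     # text before the first dot, e.g. "3"
  rest=${1#*.}       # text after the first dot, e.g. "5.1"
  minor=${rest%%.*}  # text before the next dot, e.g. "5"
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 6 ]; }
}

if command -v Rscript >/dev/null 2>&1; then
  # getRversion() returns the running R version; cat() prints it without decoration
  installed=$(Rscript -e 'cat(as.character(getRversion()))')
  if r_version_ok "$installed"; then
    echo "R $installed meets the 3.6.0+ requirement"
  else
    echo "R $installed is too old; please update"
  fi
else
  echo "Rscript was not found on your PATH; install R first"
fi
```

With the sample output above, `r_version_ok 3.5.1` fails, so that installation would need updating.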
0.1.2 Windows R/RStudio Setup
1. After you have downloaded R, run the .exe file that was just downloaded.
2. Go to the RStudio Download page.
3. Under Installers, select RStudio X.XX.XXX - e.g. Windows Vista/7/8/10.
4. Double click the file to install it.

Once R and RStudio are installed, click to open RStudio. If you don’t get any error messages you are set. If there is an error message, you will need to re-install the program.
0.1.3 Mac R/RStudio Setup
1. After you have downloaded R, double click on the file that was downloaded and R will install.
2. Go to the RStudio Download page.
3. Under Installers, select RStudio 1.2.1135 - Mac OS X XX.X (64-bit) to download it.
4. Once it’s downloaded, double click the file to install it.

Once R and RStudio are installed, click to open RStudio. If you don’t get any error messages you are set. If there is an error message, you will need to re-install the program.
0.2 Linux R/RStudio Setup
R is available through most Linux package managers. You can either download the binary files for your distribution from CRAN or use your package manager. For example, on Debian/Ubuntu run
sudo apt-get install r-base
and on Fedora run
sudo yum install R
To install RStudio:
1. Go to the RStudio Download page.
2. Under Installers, select the version for your distribution.
3. Once it’s downloaded, double click the file to install it.

Once R and RStudio are installed, click to open RStudio. If you don’t get any error messages you are set. If there is an error message, you will need to re-install the program.
0.2.1 Install basic packages for this course
You can run the following script to make sure all the required packages are properly installed on your computer.
# list of required packages
list.of.packages <- c(
  'data.table',
  'tidyverse',
  'jsonlite',
  'jpeg',
  'png',
  'raster',
  'rgdal',
  'rmarkdown',
  'knitr'
)
# identify new (not installed) packages
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[, "Package"])]
# install new (not installed) packages
if (length(new.packages)) {
  install.packages(new.packages, repos = 'http://cran.rstudio.com/')
}
# load all of the required libraries
sapply(list.of.packages, library, character.only = TRUE)
Note: On some operating systems, you may need to install the Geospatial Data Abstraction Library (GDAL) separately. More information about GDAL can be found here.
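One quick way to see whether GDAL's command-line tools are already present is sketched below. `gdal_status` is a hypothetical helper name; the sketch only assumes that a GDAL installation puts `gdalinfo` on your PATH, which may not hold for every packaging of GDAL.

```shell
# Report whether the GDAL command-line tools are on your PATH.
# gdal_status is a hypothetical helper name; gdalinfo ships with GDAL itself.
gdal_status() {
  if command -v gdalinfo >/dev/null 2>&1; then
    gdalinfo --version
  else
    echo "GDAL command-line tools not found on PATH"
  fi
}

gdal_status
```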
0.3 Installing and Setting up Git & GitHub on Your Machine
For this course you will need:
1. Git installed on your local machine
2. Very basic bash scripting
3. A linked GitHub account
4. Git linked to your editor, via either RStudio or Atom.io
As we will be using these skills constantly, they are a pre-requisite for this course. If you don’t yet have these skills it’s okay! You can learn everything that you need to know via the following freely available resources:
If you are learning these skills from scratch, I estimate that you will need to devote ~4-6 hours to getting set up and comfortable with the various workflows. Also remember that I hold code office hours every week and that Stack Exchange is your friend.
0.4 Installing Atom
Atom.io is a powerful and useful text editor for the following reasons:
- It is language agnostic
- It fully integrates with Git and GitHub
  - You can use it to push, pull, resolve conflicts, and write code all in one space.
0.5 Linking RStudio to Git
Happy Git with R has a fantastic tutorial to help you link RStudio, Git, and GitHub on your local machine so that you can push to and pull from public repositories.
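Before working through that tutorial, it can help to confirm that Git is installed and that it knows your name and email, since every commit carries them. The sketch below is an illustration only: `git_identity_ok` is a hypothetical helper name, and the name and email shown in the messages are placeholders.

```shell
# Print the identity Git will attach to commits; fail if either part is missing.
# git_identity_ok is a hypothetical helper name, not a git command.
git_identity_ok() {
  name=$(git config --get user.name)
  email=$(git config --get user.email)
  if [ -n "$name" ] && [ -n "$email" ]; then
    echo "Git identity: $name <$email>"
  else
    echo "Git identity incomplete; set it with:" >&2
    echo '  git config --global user.name "Your Name"' >&2
    echo '  git config --global user.email "you@example.com"' >&2
    return 1
  fi
}

git --version                  # confirms that Git is installed
git_identity_ok || true        # report the identity without stopping the script
```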
0.6 How we will be Conducting this Course
If you find a broken link or an error in this course text, submit an issue on the course GitHub repository.
At the end of each chapter you will find a set of Exercises. At the end of the assigned chapter you will be expected to submit two files to the course webdrive:
- An RMarkdown file with the naming convention: LASTNAME_COURSECODE_Section#.Rmd, and
- A knitted PDF with the same naming convention: LASTNAME_COURSECODE_Section#.pdf
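The naming convention can be sketched as a tiny shell function. This is an illustration under assumptions: `submission_name` is a hypothetical helper, `ABC123` is a made-up course code, and the sketch reads the `#` in `Section#` as the section number.

```shell
# Build a submission file name of the form LASTNAME_COURSECODE_Section#.ext
# submission_name is a hypothetical helper.
# Arguments: last name, course code, section number, file extension.
submission_name() {
  printf '%s_%s_Section%s.%s\n' "$1" "$2" "$3" "$4"
}

submission_name DUFFY ABC123 0 Rmd   # prints DUFFY_ABC123_Section0.Rmd
submission_name DUFFY ABC123 0 pdf   # prints DUFFY_ABC123_Section0.pdf
```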
To generate these files you have two options:
- Click on the pencil and pad logo at the top of this text, copy the exercise section code, and drop it into your own .Rmd.
- Clone our course GitHub repository with git clone, navigate to the ‘_Exercises’ folder, and use that .Rmd as a template.
Note: Exercises submitted in any other format, or with questions missing, will not be graded.
To generate the PDF that you will upload, open your RMarkdown file in RStudio and click the ‘Knit’ button at the top of the document.
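If you would rather knit from a terminal, a minimal sketch follows. It assumes `Rscript` is on your PATH, that the rmarkdown package is installed, and that a LaTeX distribution is available for PDF output. `knit_pdf` is a hypothetical helper name and the file name in the example call is a placeholder.

```shell
# Knit an R Markdown file to PDF from a terminal.
# knit_pdf is a hypothetical helper name; rmarkdown::render() does the knitting.
knit_pdf() {
  if [ ! -f "$1" ]; then
    echo "No such file: $1" >&2
    return 1
  fi
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript was not found on your PATH; install R first" >&2
    return 1
  fi
  # The file name is passed as a trailing argument, so it needs no quoting inside R.
  Rscript -e 'rmarkdown::render(commandArgs(trailingOnly = TRUE)[1], output_format = "pdf_document")' "$1"
}

# Example call (placeholder file name):
# knit_pdf DUFFY_ABC123_Section0.Rmd
```

By default the PDF is written next to the .Rmd file, with the same base name.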
0.7 Exercises:
0.7.1 Exercise 0.1: A git introduction
- Navigate to our course GitHub repository.
- Fork our repo onto your own personal GitHub account. Forking is done with the Fork button on GitHub; it is not a git command.
- git clone the repo onto your own personal machine in a place that is functional and not temporary (e.g. not your downloads folder).
#hints
cd 'Your/Path/Here'
git clone 'repo HTTPS'
- Add 2-3 sentences introducing yourself in the _Course-participants folder. For example:
***
Hi, I'm Dr. Katharyn Duffy. I have a Ph.D in Earth Science from Northern Arizona University. Over the past two years I've worked as an open-source software engineer in the PhenoCam lab, and now I'm the coding and lab support for your course. I really look forward to working with all of you!
***
- Submit a pull request to add your introduction to our course participants folder.
#hints
git add ...
git commit ...
git status....
git push --set-upstream
git remote -v
git remote add upstream...
Note: You may complete these either on the command line or via a program like Atom.io. If you haven’t yet made commits to a remote repository or submitted pull requests, please reference the resources listed above.
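To rehearse the hinted commands without touching GitHub, the sketch below runs the whole sequence against two throwaway local repositories. Everything in it is an assumption made for practice: the bare repositories stand in for the course repo and your fork, and the branch name `add-introduction`, the file name `LASTNAME.md`, and the identity values are placeholders.

```shell
# Practice run of the exercise workflow with no network access.
# Two bare repositories in a temporary directory stand in for the real remotes.
sandbox=$(mktemp -d)
git init --quiet --bare "$sandbox/course.git"   # stand-in for the course repository
git init --quiet --bare "$sandbox/fork.git"     # stand-in for your fork on GitHub

# Clone "your fork" (git may warn that the repository is empty; expected here)
git clone --quiet "$sandbox/fork.git" "$sandbox/work"
work="$sandbox/work"
git -C "$work" config user.name "Your Name"
git -C "$work" config user.email "you@example.com"

# Link the clone back to the "course repo" and list both remotes
git -C "$work" remote add upstream "$sandbox/course.git"
git -C "$work" remote -v

# Do the work on its own branch
git -C "$work" checkout --quiet -b add-introduction
mkdir -p "$work/_Course-participants"
echo "Hi, I'm LASTNAME. (2-3 sentences about yourself)" > "$work/_Course-participants/LASTNAME.md"

# Stage, inspect, commit, and push
git -C "$work" add _Course-participants/LASTNAME.md
git -C "$work" status --short
git -C "$work" commit --quiet -m "Add introduction for LASTNAME"
git -C "$work" push --quiet --set-upstream origin add-introduction
```

In the real exercise you would replace the two local paths with the HTTPS URLs of your fork and of the course repository, then open the pull request from your fork's page on GitHub.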