Pre-Course Setup: EcoInformatics Tools

The purpose of this course is to train you in key ecoinformatics practices.

As an ecoinformatician, you need to be able to:

  1. Pull data from Application Programming Interfaces (APIs)

    • More on this in Chapter 2
  2. Organize and document your code and data

  3. Version control your code to avoid disaster and make it reproducible

    • For you, your collaborators, and/or the wider community
  4. Push your code up to public-facing repositories

  5. Pull others’ code from public repositories

More thoughts on the benefits and power of reproducibility can be found here

To be successful, both in this course and in your careers, you will need these skills, which is why they are a requirement for this course. If you are already using these skills on a daily basis, fantastic! If you don’t feel that you have mastered the workflows listed above, we have placed lesson links throughout this chapter so that you can build these skills before the course starts.

0.1 Pre-Course Skills & Setup

For the purpose of this course we will largely be using the following tools to access, pull, and explore data:

  1. R & RStudio
  2. Git, GitHub, & Atom.io
  3. Markdown & RMarkdown

As such, you will need to install and/or update these tools on your personal computer before our first day of class. While we chose R for this course, nearly all of the packages and data are fully available and transferable to Python or other languages. If you’d like to brush up on your R skills, I highly recommend Data Carpentry Boot camp’s free R for Reproducible Scientific Analysis course.

0.1.1 Installing or Updating R

Please check your version of R. You will need R 3.6.0 or later.

To check your version in R or RStudio if you already have it (the sample output below comes from R 3.5.1, which would need to be updated):

> version
               _                           
platform       x86_64-apple-darwin15.6.0   
arch           x86_64                      
os             darwin15.6.0                
system         x86_64, darwin15.6.0        
status                                     
major          3                           
minor          5.1                         
year           2018                        
month          07                          
day            02                          
svn rev        74947                       
language       R                           
version.string R version 3.5.1 (2018-07-02)
nickname       Feather Spray  
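
The same check can be done programmatically. This is a minimal sketch using base R's `getRversion()`; the variable names `required` and `meets_requirement` are illustrative only and not part of the course materials.

```r
# Minimum version required for this course (from the text above)
required <- "3.6.0"

# getRversion() returns the running R version as a comparable object
meets_requirement <- getRversion() >= required

if (meets_requirement) {
  message("R ", getRversion(), " meets the course requirement.")
} else {
  message("R ", getRversion(), " is older than ", required, "; please update.")
}

# The sample output above (R 3.5.1) would not meet the requirement:
package_version("3.5.1") >= required  # FALSE
```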

If you don’t already have R, or need to update it, do so here.

0.1.2 Windows R/RStudio Setup

  1. After you have downloaded R, run the .exe file that was just downloaded.
  2. Go to the RStudio Download page.
  3. Under Installers, select RStudio X.XX.XXX - e.g. Windows Vista/7/8/10.
  4. Double click the file to install it.

Once R and RStudio are installed, click to open RStudio. If you don’t get any error messages you are set. If there is an error message, you will need to re-install the program.

0.1.3 Mac R/RStudio Setup

  1. After you have downloaded R, double click on the file that was downloaded and R will install.
  2. Go to the RStudio Download page.
  3. Under Installers, select RStudio 1.2.1135 - Mac OS X XX.X (64-bit) to download it.
  4. Once it’s downloaded, double click the file to install it.

Once R and RStudio are installed, click to open RStudio. If you don’t get any error messages you are set. If there is an error message, you will need to re-install the program.

0.2 Linux R/RStudio Setup

R is available through most Linux package managers. You can download the binary files for your distribution from CRAN, or install R with your package manager. For example, on Debian/Ubuntu run:

  sudo apt-get install r-base

and on Fedora run:

  sudo yum install R

To install RStudio:

  1. Go to the RStudio Download page.
  2. Under Installers, select the version for your distribution.
  3. Once it’s downloaded, double click the file to install it.

Once R and RStudio are installed, click to open RStudio. If you don’t get any error messages you are set. If there is an error message, you will need to re-install the program.

0.2.1 Install basic packages for this course

You can run the following script to make sure all the required packages are properly installed on your computer.

# list of required packages
# (note: rgdal was archived on CRAN in 2023, so it may fail to install on newer setups)
list.of.packages <- c(
  'data.table',
  'tidyverse',
  'jsonlite',
  'jpeg',
  'png',
  'raster',
  'rgdal',
  'rmarkdown', 
  'knitr'
)

# identify new (not installed) packages
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[, "Package"])]

# install new (not installed) packages
if (length(new.packages) > 0) {
  install.packages(new.packages,
                   repos = 'https://cran.rstudio.com/')
}

# load all of the required libraries (invisible() suppresses the printed return value)
invisible(lapply(list.of.packages, library, character.only = TRUE))

Note: On some operating systems, you may need to install the Geospatial Data Abstraction Library (GDAL). More information about GDAL can be found here.
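
If the loading step stops at the first package it cannot find, it can help to see every missing package at once. The following is a minimal sketch, assuming a hypothetical helper `missing_packages()` that is not part of the course materials; it relies on base R's `requireNamespace()`, which checks whether a package can be loaded without attaching it.

```r
# Hypothetical helper: return the names of packages that cannot be loaded
missing_packages <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

# Example with packages that ship with R; swap in the course list to check your setup
still_missing <- missing_packages(c("stats", "utils"))

if (length(still_missing) == 0) {
  message("All packages are available.")
} else {
  message("Still missing: ", paste(still_missing, collapse = ", "))
}
```

Calling `missing_packages(list.of.packages)` after running the install script would list anything that still needs attention.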

0.3 Installing and Setting up Git & Github on Your Machine

For this course you will need:

  1. Git installed on your local machine
  2. Very basic bash scripting
  3. A linked GitHub account
  4. To link RStudio to Git via RStudio or Atom.io
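
To confirm from within R that Git is installed and visible, something like the following may help. It is a sketch using base R's `Sys.which()` and `system2()`; the variable name `git_path` is illustrative, and an empty result only means Git is not on the PATH that R sees, not necessarily that it is absent from the machine.

```r
# Sys.which() returns the full path to an executable, or "" if it is not found
git_path <- Sys.which("git")

if (nzchar(git_path)) {
  message("Git found at: ", git_path)
  # Print the installed Git version to the console
  system2("git", "--version")
} else {
  message("Git was not found on the PATH; install it before the first class.")
}
```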

As we will be using these skills constantly, they are a prerequisite for this course. If you don’t yet have these skills, it’s okay! You can learn everything that you need to know via the following freely available resources:

If you are learning these skills from scratch, I estimate that you will need to devote ~4-6 hours to get set up and comfortable with the various workflows. Also remember that I hold code office hours every week and that Stack Exchange is your friend.

0.4 Installing Atom

Atom.io is a powerful and useful text editor for the following reasons:

  1. It is language agnostic

  2. It fully integrates with Git and GitHub

    • You can use it to push/pull/resolve conflicts and write code all in one space.

0.5 Linking RStudio to Git

Happy Git with R has a fantastic tutorial to help you link RStudio, Git, and GitHub on your local machine and push to or pull from public repositories.

0.6 How we will be Conducting this Course

If you find a broken link or an error in this course text, submit an issue on the course GitHub repository.

At the end of each chapter you will find a set of Exercises. At the end of the assigned chapter you will be expected to submit two files to the course webdrive:

  1. An RMarkdown file with the naming convention: LASTNAME_COURSECODE_Section#.Rmd, and

  2. A knitted .PDF with the same naming convention: LASTNAME_COURSECODE_Section#.pdf

To generate these files you have two options:

  1. Click on the pencil and pad logo in the top of this text, copy the exercise section code, and drop it into your own .Rmd.

  2. Git clone our course GitHub repository, navigate to the ‘_Exercises’ folder, and use that .Rmd as a template.

Note: Exercises submitted in any other format, or those missing questions, will not be graded.

To generate your .PDF to upload, in your RMarkdown file simply push the ‘Knit’ button at the top of your document.
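
The naming convention and the knitting step can also be scripted. This is a minimal sketch, assuming a hypothetical helper `submission_files()` and placeholder values ("SMITH", "COURSECODE", section 0) that are not part of the course materials. `rmarkdown::render()` is the function the ‘Knit’ button calls; PDF output additionally assumes a working LaTeX installation.

```r
# Hypothetical helper: build the two required file names for a given section
submission_files <- function(lastname, coursecode, section) {
  stem <- sprintf("%s_%s_Section%s", lastname, coursecode, section)
  c(rmd = paste0(stem, ".Rmd"), pdf = paste0(stem, ".pdf"))
}

# Placeholder values for illustration only
files <- submission_files("SMITH", "COURSECODE", 0)
print(files)

# Knit to PDF only if the .Rmd actually exists in the working directory
if (file.exists(files[["rmd"]])) {
  rmarkdown::render(files[["rmd"]], output_format = "pdf_document")
}
```

By default, `render()` writes the PDF next to the .Rmd with the same file stem, so the two names stay in sync.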

0.7 Exercises:

0.7.1 Exercise 0.1: A git introduction

  1. Navigate to our course GitHub repository.
  2. Fork our repo onto your own personal GitHub account.
  3. git clone the repo onto your own personal machine in a place that is functional and not temporary (e.g. not your downloads folder).
# hints
cd 'Your/Path/Here'
git clone 'repo HTTPS'
  4. Add 2-3 sentences introducing yourself in the _Course-participants folder. For example:
***
Hi, I'm Dr. Katharyn Duffy.  I have a Ph.D in Earth Science from Northern Arizona University.  Over the past two years I've worked as an open-source software engineer in the PhenoCam lab, and now I'm the coding and lab support for your course.  I really look forward to working with all of you!
***
  5. Submit a pull request to add your introduction to our course participants folder.
# hints
git add ...
git commit ...
git status
git push --set-upstream
git remote -v
git remote add upstream...

Note: You may complete these steps either on the command line or via a program like Atom.io. If you haven’t yet made commits to a remote repository or submitted pull requests, please refer to the resources listed above.