There are many definitions of a project: it can be a single analysis for a paper or an entire cohort study with many papers.
In this workshop, a project is a folder on your computer where we will analyse the NHANES data.
RStudio natively supports a project-oriented workflow.
Advantages:
No single organization fits everybody’s needs, but being consistent across projects or within a research group makes collaboration easier.
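As a minimal sketch, a project skeleton can be created directly from R. The project name `my-project`, the choice of folders, and the use of a temporary directory are assumptions made for illustration; adapt them to your own conventions.

```r
# Hypothetical project location: a temporary directory, so nothing is
# written into your real files while trying this out.
project_dir <- file.path(tempdir(), "my-project")

# A subset of commonly used folder names (pick the ones you need).
folders <- c("data", "data-raw", "R", "scripts", "output", "docs")

# Create each folder; recursive = TRUE also creates project_dir itself.
for (folder in folders) {
  dir.create(file.path(project_dir, folder),
             recursive = TRUE, showWarnings = FALSE)
}

# Every project should have at least one README.
writeLines("# my-project", file.path(project_dir, "README.md"))

# Inspect the result.
list.files(project_dir)
```

Using the same skeleton for every project means collaborators always know where to look for raw data, functions, and outputs.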
data/: Ready-to-analyze dataset, intermediate datasets.
data-raw/: Data from the outside world untouched. Can contain scripts to import data from the internet and prepare it.
R/: R files containing functions. python/ for Python.
scripts/ or code/: Scripts for things that need to be run once.
qmd/, md/, Rmd/: Quarto and markdown documents.
output/: Folder with outputs, can contain images, graphs, or other stuff.
figs/: Folder with figures produced by your scripts.
results/: Results from the project, e.g., CSV tables.
docs/: Documentation or rendered documents.
man/: Documentation for R packages.
extra/: Extra, non-code, files.
README: Must-read file. At least one in the project directory, but can be added to any folder.
LICENSE: License file for your project.
.gitignore: List of files that Git should ignore.
Human and machine-readable.
files functions.
Some examples
Such names facilitate this kind of operation:
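A minimal sketch of what machine-readable names make possible. The file names below are hypothetical, and the convention (`_` between metadata fields, `-` between words, ISO dates first) is one common choice assumed here for illustration.

```r
# Hypothetical file names following a date_type_description convention.
files <- c(
  "2022-03-20_report_draft.qmd",
  "2022-04-02_report_final.qmd",
  "2022-04-02_figure_scatterplot.png"
)

# 1. Select files by type with a simple pattern match.
reports <- files[grepl("_report_", files, fixed = TRUE)]

# 2. Recover metadata by splitting each name on the "_" delimiter.
parts <- strsplit(tools::file_path_sans_ext(files), "_", fixed = TRUE)
metadata <- as.data.frame(do.call(rbind, parts))
names(metadata) <- c("date", "type", "description")

# ISO 8601 dates convert directly and sort chronologically.
metadata$date <- as.Date(metadata$date)

reports
metadata
```

Because every name has the same number of fields in the same order, the split always yields a clean table.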
More examples
# Good
other/2014-06-08_abstract-for-sla.docx
other/filenames-are-getting-better.xlsx
01-load-data.R
02-exploratory-analysis.R
03-model-approach-1.R
04-model-approach-2.R
fig-01.png
fig-02.png
fig01_scatterplot-talk-length-vs-interest.png
fig02_histogram-talk-attendance.png
report-2022-03-20.qmd
report-2022-04-02.qmd
report-draft-notes.txt
Why adopt a coding style:
Tip
Tidyverse style guide: https://style.tidyverse.org/
Use _ to separate words within a name, e.g., day_one; day_1.
mean(x, na.rm = TRUE).
Curly braces {}:
{ should be the last character on the line.
} should be the first character of the line.
Quickly fix the syntax of your code using the styler package:
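A minimal sketch of restyling code with styler, assuming the package is installed. The messy one-liner is a made-up example; `styler::style_text()` restyles a character string according to the tidyverse style by default.

```r
# Only run the example when styler is available.
if (requireNamespace("styler", quietly = TRUE)) {
  # Hypothetical badly formatted code: no spaces, everything on one line.
  messy <- "my_mean<-function(x){mean(x,na.rm=TRUE)}"

  # Restyle the code; the result is a character vector, one element per line.
  styled <- styler::style_text(messy)
  print(styled)
}
```

To restyle a whole script in place, `styler::style_file()` takes the path to the file instead of a string.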
do_something().
%>% or |>).
Bad:
Better:
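One way to illustrate the contrast, assuming it concerns nested calls versus a pipe: the values below are made up, and the native pipe `|>` requires R 4.1 or later.

```r
# Hypothetical values, invented for illustration.
values <- c(4, 9, 16, NA)

# Harder to read: nested calls must be read from the inside out.
result_nested <- round(mean(sqrt(values), na.rm = TRUE), 1)

# Easier to read: each step on its own line, in the order it happens.
result_piped <- values |>
  sqrt() |>
  mean(na.rm = TRUE) |>
  round(1)

# Both versions compute the same thing.
identical(result_nested, result_piped)
```

The piped version makes it easy to add, remove, or comment out a single step.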
Comment as you code and provide as much detail as you can. Your future self will thank you.
Important
With Git, you only need one version of your files!
Git can track text files:
%%{init:{'themeCSS': ".actor {stroke: DarkBlue;fill: White;stroke-width:1.5px;}", 'sequence':{'mirrorActors': false}}}%%
sequenceDiagram
participant W as Working folder
participant S as Staged
participant H as History
W->>S: Add
S->>H: Commit
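The Add and Commit arrows in the diagram can be tried out from R by calling the Git command line. This is only a sketch under several assumptions: Git is installed and on the path, the repository lives in a throwaway temporary folder, and the file name, user name, and email are placeholders. In practice you may prefer the Git pane in RStudio.

```r
# Run only when the git executable can be found.
if (nzchar(Sys.which("git"))) {
  # Hypothetical throwaway repository in a fresh temporary folder.
  repo <- tempfile("git-demo-")
  dir.create(repo)

  # Small helper: run a git command inside the repository folder.
  git <- function(...) {
    system2("git", c("-C", shQuote(repo), ...), stdout = TRUE, stderr = TRUE)
  }

  git("init")                                          # start tracking the folder
  writeLines("x <- 1", file.path(repo, "analysis.R"))  # new file in the working folder
  git("add", "analysis.R")                             # working folder -> staged

  # staged -> history; the identity is passed inline so that no global
  # Git configuration is needed for this demo.
  git("-c", "user.name=Demo", "-c", "user.email=demo@example.com",
      "commit", "-m", shQuote("Add analysis script"))

  # One line per commit in the history.
  log_lines <- git("log", "--oneline")
  print(log_lines)
}
```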
repository, repo: a folder tracked by Git.
working folder: files that are not tracked by Git, or that contain new modifications not yet saved in Git.
add: add files or modifications to the staging area.
commit: create a snapshot of changes and save it in the Git history. Commits must have a short description.
stage: files and modifications here are tracked by Git and can be put into the history with a commit. They can still be lost (unsafe).
history: all the changes that have been committed. Everything that has been committed to the Git history will never be completely gone (safe).
Branches and repository:
local: refers to the repository stored on your computer. We only work locally today.
remote: refers to a repository stored online, for example on GitHub.
branch: branches are parallel versions of your project. They let you experiment without affecting the main project until you’re ready to merge them back.
merge: merging is the process of integrating changes from one branch into another. It combines the histories of both branches, creating a single, unified history.
Why write functions?
DRY - Don’t Repeat Yourself
Create functions for actions that are often repeated:
Important
A function is a bundled sequence of steps that achieve a specific action.
For example, + (addition) is a function, mean() is a function …
A function definition is made of a name, arguments, and a body:
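A minimal sketch of these parts, using a hypothetical helper `coef_variation()` that is not part of the workshop material:

```r
# name            arguments
coef_variation <- function(x, na.rm = FALSE) {
  # body: the steps that are run each time the function is called
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

# Calling the function with made-up values.
cv <- coef_variation(c(2, 4, 6))
cv

# The parts can be inspected from R.
names(formals(coef_variation))  # the argument names
body(coef_variation)            # the code inside the braces

# Operators are functions too: this is the same as 2 + 3.
five <- `+`(2, 3)
```

The last expression evaluated in the body is the value the function returns, so no explicit `return()` is needed here.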
Tip
Type sd in R (without parentheses) to see how sd() calculates the standard deviation.
Ideally, the output of a function depends only on its inputs, so identical inputs give identical results.
Functions can replace loops and make your code much clearer.
Functions are easier to share between projects and can be gathered in a package.
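A minimal sketch of replacing a loop with a function. The measurements and the helper `value_range()` are hypothetical, invented for illustration.

```r
# Made-up measurements for illustration only.
measurements <- list(
  height = c(170, 165, 180),
  weight = c(70, 60, 85)
)

# Loop version: bookkeeping (pre-allocation, indexing) hides the intent.
ranges_loop <- numeric(length(measurements))
names(ranges_loop) <- names(measurements)
for (name in names(measurements)) {
  ranges_loop[name] <- max(measurements[[name]]) - min(measurements[[name]])
}

# Function version: the action gets a name and is applied to each element.
value_range <- function(x) {
  max(x) - min(x)
}
ranges_fun <- vapply(measurements, value_range, numeric(1))

# Same result, and value_range() can be reused in other projects.
all.equal(ranges_loop, ranges_fun)
```

Because `value_range()` depends only on its input, it can be tested once and then trusted everywhere it is used.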