2024-06-05
For this workshop: a folder on your computer
There are many definitions of a project: it can be a single analysis for a paper or an entire cohort study with many papers.
In this workshop: a project will refer to a folder on your computer.
RStudio natively supports a project-oriented workflow.
Advantages:
No single organization fits everybody's needs, but being consistent across projects or within a research group makes collaboration easier.
data/: Ready-to-analyze datasets and intermediate datasets.
data-raw/: Data from the outside world, untouched. Can contain scripts to import data from the internet and prepare it.
R/: R files containing functions. python/ for Python.
scripts/ or code/: Scripts for things that need to be run once.
qmd/, md/, Rmd/: Quarto and Markdown documents.
output/: Folder with outputs; can contain images, graphs, or other files.
figs/: Folder with figures produced by your scripts.
results/: Results from the project, e.g., CSV tables.
docs/: Documentation or rendered documents.
man/: Documentation for R packages.
extra/: Extra, non-code files.
README: Must-read file. At least one in the project directory, but can be added to any folder.
LICENSE: License file for your project.
.gitignore: List of files that Git should ignore.
Names should be human and machine-readable. This applies to:
files
functions
Some examples
2024-04-01_air-pollution_PM25.csv
2024-04-01_air-pollution_NO2.csv
2021-04-01_air-pollution_PM25.csv
2021-04-01_air-pollution_NO2.csv
Names like these facilitate operations such as selecting files by pattern or splitting a name into its parts.
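As a minimal sketch of such operations, the base R code below takes the four example file names, selects the PM25 files by pattern, and splits each name on the underscore to recover its parts. The column names (date, topic, pollutant) are assumptions made for illustration; they are not defined in the workshop material.

```r
# File names taken from the examples above
files <- c(
  "2024-04-01_air-pollution_PM25.csv",
  "2024-04-01_air-pollution_NO2.csv",
  "2021-04-01_air-pollution_PM25.csv",
  "2021-04-01_air-pollution_NO2.csv"
)

# Select files by pattern: keep only the PM25 files
pm25_files <- grep("_PM25\\.csv$", files, value = TRUE)

# Split each name on "_" to recover its parts
# (the column names below are assumptions for illustration)
parts <- strsplit(tools::file_path_sans_ext(files), "_", fixed = TRUE)
file_info <- data.frame(
  date      = as.Date(vapply(parts, `[`, character(1), 1)),
  topic     = vapply(parts, `[`, character(1), 2),
  pollutant = vapply(parts, `[`, character(1), 3),
  stringsAsFactors = FALSE
)

# ISO 8601 dates at the start of a name sort chronologically
sorted_files <- sort(files)
```

Because the underscore only separates the parts and the hyphen only separates words within a part, a single split is enough to turn a file name into metadata.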
More examples
# Good
other/2014-06-08_abstract-for-sla.docx
other/filenames-are-getting-better.xlsx
01-load-data.R
02-exploratory-analysis.R
03-model-approach-1.R
04-model-approach-2.R
fig-01.png
fig-02.png
fig01_scatterplot-talk-length-vs-interest.png
fig02_histogram-talk-attendance.png
report-2022-03-20.qmd
report-2022-04-02.qmd
report-draft-notes.txt
Why adopt a coding style:
Tip
Tidyverse style guide: https://style.tidyverse.org/
Use _ to separate words within a name, e.g., day_one; day_1.
Spacing example: mean(x, na.rm = TRUE).
Curly braces {}:
{ should be the last character on the line.
} should be the first character of the line.
Quickly fix the syntax of your code using the styler package:
Function names, e.g., do_something().
Pipes (%>% or |>).
Bad:
Better:
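A minimal sketch contrasting the two styles: the badly styled version is kept as a string, and the better version runs as regular code. The variable names and values are hypothetical. The last step calls styler::style_text(), which reformats code passed as text; it is guarded so that the example still runs when styler is not installed.

```r
# Badly styled version, kept as a string so it can be inspected (hypothetical code)
bad_code <- "DayOne<-c(1,2,NA);if(mean(DayOne,na.rm=TRUE)>1){print('high')}"

# Better: snake_case names, spaces after commas and around operators,
# `{` is the last character on its line and `}` starts its own line
day_one <- c(1, 2, NA)
if (mean(day_one, na.rm = TRUE) > 1) {
  print("high")
}

# Optional: let styler reformat the badly styled code, if the package is installed
if (requireNamespace("styler", quietly = TRUE)) {
  print(styler::style_text(bad_code))
}
```

Note that styler adjusts spacing, indentation, and line breaks, but it does not rename objects, so naming conventions still have to be applied by hand.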
Comment as you code, provide as much detail as you can. Your future self will thank you.
Important
With Git, you only need one version of your files!
Git can track text files:
Configure a Git repository for each project.
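A minimal sketch of setting up a repository and recording a first commit, driven from R through system2(). It assumes the git command-line tool is installed and on the PATH. The helper name git(), the temporary folder, the file name, the commit message, and the demo identity are all hypothetical and chosen for illustration only.

```r
# Hypothetical demo project in a fresh temporary folder
repo <- tempfile("demo-repo-")
dir.create(repo)

# Hypothetical helper: run a git command inside `repo` and capture its output
git <- function(...) {
  system2("git", c("-C", shQuote(repo), ...), stdout = TRUE, stderr = TRUE)
}

git_available <- nzchar(Sys.which("git"))
history <- character(0)

if (git_available) {
  git("init")                                          # turn the folder into a repository
  writeLines("x <- 1", file.path(repo, "analysis.R"))  # new file, not tracked yet
  git("add", "analysis.R")                             # tell Git to track the file
  git("-c", "user.name=Demo", "-c", "user.email=demo@example.com",
      "commit", "-m", shQuote("Add analysis script"))  # save a snapshot with a short description
  history <- git("log", "--oneline")                   # one line per commit
}
```

In day-to-day work the same steps are usually run from a terminal or from the RStudio Git pane; the R wrapper here only keeps the example in one language.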
repository, repo: a folder tracked by Git.
working folder: files that are not tracked by Git, or that contain new modifications not saved yet.
add: add files or modifications to be tracked by Git.
commit: create a snapshot of changes and save it in the Git history. Commits must have a short description.
stage: files here are tracked by Git and can be put into the history with a commit.
history: all the changes that have been committed. Everything that has been committed to the Git history will never be completely gone.
Branches and repository:
local: refers to the repository that you store on your computer.
remote: refers to a repository that is stored online, e.g., on GitHub.
branch: branches are parallel versions of your project. They allow you to experiment without affecting the main project until you're ready to merge them back.
merge: merging is the process of integrating changes from one branch into another. It combines the histories of both branches, creating a single, unified history.
Why write functions?
Golden rule of programming: DRY - Don’t Repeat Yourself
Create functions for actions that are often repeated:
Important
A function is a bundled sequence of steps that achieve a specific action.
For example, + (add) is a function, mean() is a function, …
Functions are made of a function call, its arguments, and the function body:
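A minimal sketch of these parts, using a hypothetical function rescale_01() that is not part of the workshop material. It rescales a numeric vector to the range 0 to 1.

```r
# name            arguments (na.rm has a default value)
rescale_01 <- function(x, na.rm = TRUE) {
  # body: the steps applied to the arguments
  rng <- range(x, na.rm = na.rm)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Function call with its argument
rescale_01(c(0, 5, 10))
# returns 0.0 0.5 1.0
```

The value of the last expression in the body is what the function returns, so no explicit return() is needed here.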
Tip
Type sd in R to see how sd() calculates the standard deviation.
The output of the function depends only on the inputs. Identical inputs will give identical results.
Functions can replace loops and make your code much clearer.
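As an illustration, the sketch below computes the mean of each element of a list twice: once with a for loop, and once by passing mean() to vapply(). The list measurements and its values are made up for the example.

```r
# Made-up data: three groups of measurements, one with a missing value
measurements <- list(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, NA))

# Loop version: pre-allocate the result, then fill it element by element
means_loop <- numeric(length(measurements))
for (i in seq_along(measurements)) {
  means_loop[i] <- mean(measurements[[i]], na.rm = TRUE)
}

# Function version: apply mean() to every element, expecting one number each
means_apply <- vapply(measurements, mean, numeric(1), na.rm = TRUE)
```

Both versions give the same values; the vapply() call states the intent in one line and keeps the names of the list elements in the result.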
Functions are easier to share between projects and can be gathered in a package.