Data Management Pipelines
In this lecture, we will cover:
- data annotation
- data management pipelines
- metadata
- reproducibility
- versioning your data
Why should I care about data management?
- Project edits and manuscript revisions, sometimes many months (or years!) down the line
- Maximizing the benefit of your work to the scientific community and society
- Facilitating collaborations based on your contributions
The Data Life Cycle
DataONE has an excellent overview of the steps in a data management and analysis pipeline.
Best practices
We will be presenting an overview of this DataONE Primer.
Another resource is Shuai’s AI & Data Blog, which offers practical insights from an experienced data scientist and startup founder.