TidyData
Significance
TidyData, through its core principle of "variables as columns, observations as rows," dramatically reduces the complexity of data cleaning — letting us focus on business problems rather than data format conversion.
Paper
Author: Hadley Wickham. The paper discusses a small module in data processing — data tidying — because tidy datasets are easier to manipulate, model, and visualize, and have a specific structure.
This paper is highly recommended. See: Tidy Data
TidyData in VSeed
The dataset configuration in VSeed DSL uses the TidyData format.
Core characteristics:
- Each variable has a column: Variable values are stored in separate columns, e.g., "age", "gender".
- Each observation has a row: All variable values for one observation form a row, e.g., a person's age and gender.
- Each unit of observation has a table: Different types of observation units (e.g., person, time, location) should be stored separately.
Therefore, the result of an SQL query can be passed directly into VSeed's dataset configuration — no additional data processing needed for quick analysis and visualization.