Data Reshape-Implementation
This is the most interesting and central module of VSeed. It looks complex, but it is actually simple and ingenious, consisting of fewer than 200 lines of code.
As long as foldMeasures and unfoldDimensions are used properly, any number of Measures and Dimensions can be converted to a fixed number of Measures and Dimensions, which makes visual mapping highly flexible.
foldMeasures
foldMeasures folds all Measures into one measure, adding a Measure Name Dimension and a Measure ID Dimension. Any potentially lost information is stored in foldInfo, and data statistics can also be computed during this process.
Features
- Feature 1: After foldMeasures finishes executing, there is exactly 1 measure field. Data described by multiple measures can therefore always be converted to 1 measure, so data with any number of measures maps to exactly one graphic element.
- Feature 2: Each data item is strictly consistent with the data of its graphic element (geometric element). One data item corresponds to one graphic element.
- Feature 3: Data statistics are computed during this process.
- 1 measure, 0 dimensions -> After foldMeasures, you get 1 measure and 2 dimensions (Measure Name and Measure ID).
- 4 measures, 1 dimension -> After 2 passes of foldMeasures, you get 2 measures and 3 dimensions (including Measure Name and Measure ID), which supports scenarios like Dual Axis Charts.
- N measures, 0 dimensions -> After Y (Y ≤ N) passes of foldMeasures, you get Y measures and 2 dimensions (Measure Name and Measure ID).
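The second case above (4 measures, 1 dimension, 2 passes) can be sketched as follows. This is only an illustration under stated assumptions: the helper foldPass, the field names measureId, measureName, primaryValue and secondaryValue, and the choice to give each pass its own value field are all hypothetical and are not taken from VSeed's source.

```typescript
type Datum = Record<string, string | number>;

// One fold pass: the listed measures collapse into a single value field.
// Helper name and field names are illustrative assumptions, not VSeed's API.
function foldPass(
  dataset: Datum[],
  dimensions: string[],
  measures: string[],
  valueField: string,
): Datum[] {
  const rows: Datum[] = [];
  for (const datum of dataset) {
    for (const id of measures) {
      const row: Datum = {};
      for (const dim of dimensions) row[dim] = datum[dim];
      row["measureId"] = id;
      row["measureName"] = id; // a display alias would normally go here
      row[valueField] = datum[id];
      rows.push(row);
    }
  }
  return rows;
}

const source: Datum[] = [
  { month: "Jan", sales: 100, profit: 30, growthRate: 0.12, marginRate: 0.3 },
  { month: "Feb", sales: 120, profit: 42, growthRate: 0.2, marginRate: 0.35 },
];

// Pass 1 feeds the primary axis, pass 2 feeds the secondary axis.
const primary = foldPass(source, ["month"], ["sales", "profit"], "primaryValue");
const secondary = foldPass(source, ["month"], ["growthRate", "marginRate"], "secondaryValue");
const dualAxisData: Datum[] = primary.concat(secondary);

// Collect the distinct field names that appear after both passes.
const fieldSet: Record<string, boolean> = {};
for (const row of dualAxisData) {
  for (const key of Object.keys(row)) fieldSet[key] = true;
}
const fieldNames = Object.keys(fieldSet);
console.log(fieldNames);
// 3 dimensions (month, measureId, measureName) and 2 measures (primaryValue, secondaryValue)
```

Each output row still carries exactly one measure value, so one row continues to correspond to one graphic element on its axis.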
Minimal Runnable Example
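A minimal sketch of the folding idea, written from the description above rather than from VSeed's source. The types Datum, Measure and FoldInfo, the function signature, and the default field names are assumptions made for illustration.

```typescript
// Hypothetical types and field names for illustration; not VSeed's actual API.
type Datum = Record<string, string | number>;

interface Measure {
  id: string; // field key in the source data
  alias: string; // display name
}

interface FoldInfo {
  measureId: string; // name of the added Measure ID dimension
  measureName: string; // name of the added Measure Name dimension
  measureValue: string; // name of the single folded measure
  foldMap: Record<string, string>; // measure id -> display name
  min: number; // statistics gathered during the same pass
  max: number;
}

function foldMeasures(
  dataset: Datum[],
  measures: Measure[],
  measureId = "__MeaId__",
  measureName = "__MeaName__",
  measureValue = "__MeaValue__",
): { dataset: Datum[]; foldInfo: FoldInfo } {
  const foldInfo: FoldInfo = {
    measureId,
    measureName,
    measureValue,
    foldMap: {},
    min: Infinity,
    max: -Infinity,
  };
  const result: Datum[] = [];

  for (const datum of dataset) {
    // Keep every non-measure field (the dimensions) on each output row.
    const dims: Datum = {};
    for (const key of Object.keys(datum)) {
      if (!measures.some((m) => m.id === key)) dims[key] = datum[key];
    }
    // Emit one row per measure, so each row holds exactly one measure value.
    for (const { id, alias } of measures) {
      if (!(id in datum)) continue;
      const value = Number(datum[id]);
      result.push({ ...dims, [measureId]: id, [measureName]: alias, [measureValue]: value });
      foldInfo.foldMap[id] = alias;
      foldInfo.min = Math.min(foldInfo.min, value);
      foldInfo.max = Math.max(foldInfo.max, value);
    }
  }
  return { dataset: result, foldInfo };
}

const source: Datum[] = [
  { category: "A", sales: 100, profit: 30 },
  { category: "B", sales: 80, profit: 20 },
];
const folded = foldMeasures(source, [
  { id: "sales", alias: "Sales" },
  { id: "profit", alias: "Profit" },
]);
console.log(folded.dataset.length); // 4 rows: 2 source rows x 2 measures
console.log(folded.dataset[0]);
// { category: "A", __MeaId__: "sales", __MeaName__: "Sales", __MeaValue__: 100 }
```

The mapping from measure id to display name and the min/max statistics are recorded in foldInfo during the same traversal, so no second pass over the data is needed to obtain them.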
unfoldDimensions
unfoldDimensions concatenates any subset of Dimensions into a new Dimension without losing any information. All newly added information is stored in unfoldInfo.
A complete unfoldDimensions is equivalent to converting all Dimension values to Measures, followed by one foldMeasures pass.
However, iterating over the dataset is expensive, and an extra foldMeasures pass would degrade performance.
Because foldMeasures already guarantees that each data item holds exactly one measure, we can instead apply a simple merge directly on the source data. This achieves the equivalent effect without the extra pass, which improves performance substantially.
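The simple merge described above can be sketched as a per-row concatenation of the selected dimension values. The function signature, the UnfoldInfo shape, the "-" separator and the field name color are assumptions for illustration and may differ from VSeed's implementation.

```typescript
// Hypothetical types and names for illustration; not VSeed's actual API.
type Datum = Record<string, string | number>;

interface UnfoldInfo {
  encodingField: string; // name of the newly created dimension
  sourceDimensions: string[]; // dimensions that were concatenated
  values: string[]; // distinct concatenated values, in first-seen order
}

function unfoldDimensions(
  dataset: Datum[],
  dimensions: string[],
  encodingField: string,
  separator = "-",
): { dataset: Datum[]; unfoldInfo: UnfoldInfo } {
  const seen: Record<string, boolean> = {};
  const unfoldInfo: UnfoldInfo = { encodingField, sourceDimensions: dimensions, values: [] };

  const result: Datum[] = dataset.map((datum) => {
    // Concatenate the chosen dimension values into one new dimension value.
    const merged = dimensions.map((dim) => String(datum[dim])).join(separator);
    if (!seen[merged]) {
      seen[merged] = true;
      unfoldInfo.values.push(merged);
    }
    // The original fields are kept, so no information is lost.
    return { ...datum, [encodingField]: merged };
  });
  return { dataset: result, unfoldInfo };
}

// Input that has already been folded: each row holds exactly one measure value.
const foldedRows: Datum[] = [
  { region: "East", measureId: "sales", measureName: "Sales", measureValue: 100 },
  { region: "East", measureId: "profit", measureName: "Profit", measureValue: 30 },
  { region: "West", measureId: "sales", measureName: "Sales", measureValue: 80 },
  { region: "West", measureId: "profit", measureName: "Profit", measureValue: 20 },
];

const unfolded = unfoldDimensions(foldedRows, ["region", "measureName"], "color");
console.log(unfolded.dataset[0]["color"]); // "East-Sales"
console.log(unfolded.unfoldInfo.values);
// ["East-Sales", "East-Profit", "West-Sales", "West-Profit"]
```

Because the row count does not change and only one field is added per row, the merge costs a single map over the already folded data.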
In theory, unfoldDimensions and foldMeasures could be fully merged so that all data processing completes within a single iteration over the dataset. For the sake of readability and maintainability, however, they are kept separate for now, as long as there is no performance bottleneck.
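To make the single-iteration idea concrete, here is a hypothetical sketch that folds measures and concatenates dimensions in the same loop. It is not how VSeed is organized (the text above states the two steps are kept separate), and reshapeInOnePass and all field names are invented for illustration.

```typescript
// Hypothetical combined pass for illustration; not part of VSeed.
type Datum = Record<string, string | number>;

function reshapeInOnePass(
  dataset: Datum[],
  dimensions: string[],
  measures: string[],
  mergeFields: string[], // fields to concatenate; may include "measureName"
  encodingField: string,
): Datum[] {
  const rows: Datum[] = [];
  for (const datum of dataset) {
    for (const id of measures) {
      // Fold step: one output row per measure.
      const row: Datum = {};
      for (const dim of dimensions) row[dim] = datum[dim];
      row["measureId"] = id;
      row["measureName"] = id;
      row["measureValue"] = datum[id];
      // Unfold step: concatenate on the row that was just built.
      row[encodingField] = mergeFields.map((field) => String(row[field])).join("-");
      rows.push(row);
    }
  }
  return rows;
}

const rawRows: Datum[] = [
  { region: "East", sales: 100, profit: 30 },
  { region: "West", sales: 80, profit: 20 },
];

const reshaped = reshapeInOnePass(
  rawRows,
  ["region"],
  ["sales", "profit"],
  ["region", "measureName"],
  "group",
);
console.log(reshaped[1]);
// { region: "East", measureId: "profit", measureName: "profit", measureValue: 30, group: "East-profit" }
```

The trade-off is the one the text names: a single traversal instead of two, at the cost of coupling two concerns in one function.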
Features
- Feature 1: After unfoldDimensions is executed, exactly 1 measure field remains.
- Feature 2: It can merge Dimensions without losing the original data structure.
- As long as it runs after foldMeasures, the expansion of Dimensions and the merging of Measures can be achieved with a simple concat operation, which performs well.
- Arbitrary Dimensions can be merged to form an entirely new Dimension field, allowing flexible visual channel mappings.
- Since it is not inherently complex, it could in theory be merged into foldMeasures to reduce the number of traversal passes and improve performance.