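Data Reshaping: Implementation

Simple yet Clever

This is the most interesting and most central module in VSeed. It looks complicated, but it is actually simple and clever, at fewer than 200 lines of code.

With foldMeasures and unfoldDimensions used well, any combination of measures and dimensions can be converted into a fixed set of measures and dimensions, which leaves the visual mapping free to be whatever is needed.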

foldMeasures

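Source code location

foldMeasures folds all measures into a single measure and adds two dimensions: a measure-name dimension and a measure-ID dimension. Any information that might otherwise be lost is stored in foldInfo, and statistics can be collected during the same pass.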

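Properties

  1. Property 1: After foldMeasures runs, exactly one measure field remains. Data described by any number of measures is converted into a single measure, so all of it can be drawn with one kind of mark.
  2. Property 2: Data rows and marks (geometric elements) correspond strictly one to one: each row maps to one mark.
  3. Property 3: Statistics are computed during this process.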
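
The three properties can be checked with a compact sketch. The `foldOnce` helper below is a hypothetical, simplified stand-in for foldMeasures, not the VSeed source, and the field names `__MeaId__`, `__MeaName__`, and `__MeaValue__` are assumed for illustration.

```javascript
// Hypothetical, simplified stand-in for foldMeasures (not the VSeed source).
function foldOnce(rows, measures) {
  const ids = measures.map(m => m.id)
  const stats = { min: Infinity, max: -Infinity, sum: 0, count: 0 }
  const folded = []
  for (const row of rows) {
    for (const { id, alias } of measures) {
      const out = { ...row }
      for (const key of ids) delete out[key] // drop the original measure fields
      out.__MeaId__ = id
      out.__MeaName__ = alias || id
      out.__MeaValue__ = row[id]
      folded.push(out)
      // Statistics are gathered in the same pass (property 3).
      const value = Number(row[id])
      stats.min = Math.min(stats.min, value)
      stats.max = Math.max(stats.max, value)
      stats.sum += value
      stats.count += 1
    }
  }
  return { folded, stats }
}

const rows = [
  { category: 'A', sales: 100, profit: 30 },
  { category: 'B', sales: 200, profit: 50 },
]
const measures = [
  { id: 'sales', alias: 'Sales' },
  { id: 'profit', alias: 'Profit' },
]
const { folded, stats } = foldOnce(rows, measures)

// Property 1: the only measure field left is __MeaValue__.
const hasOldMeasures = folded.some(r => 'sales' in r || 'profit' in r)
console.log(hasOldMeasures) // false
// Property 2: one output row per (input row, measure) pair, i.e. per mark.
console.log(folded.length === rows.length * measures.length) // true
// Property 3: statistics are available without another traversal.
console.log(stats) // { min: 30, max: 200, sum: 380, count: 4 }
```

A chart layer could then, for example, derive an axis range from `stats.min` and `stats.max` without scanning the data again.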
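
The cleverest part

  • 1 measure and 0 dimensions: after foldMeasures, you get 1 measure and 2 dimensions (the measure name and the measure ID).
  • 4 measures and 1 dimension: after 2 foldMeasures passes, you get 2 measures and 3 dimensions (including the measure name and the measure ID), which cleanly supports scenarios such as dual-axis charts.
  • N measures and 0 dimensions: after Y (Y ≤ N) foldMeasures passes, you get Y measures and 2 dimensions (the measure name and the measure ID).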
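
The dual-axis case can be sketched as follows. This is one plausible reading of "fold twice", not necessarily how VSeed combines the two folds: the `foldGroup` helper, the value field names `__MeaValue0__` and `__MeaValue1__`, and the sample data are all assumptions made for illustration.

```javascript
// Hypothetical helper: fold one group of measures into a single value field.
function foldGroup(rows, allMeasureIds, group, valueField) {
  const out = []
  for (const row of rows) {
    // Keep only the non-measure (dimension) fields of the source row.
    const base = {}
    for (const [key, value] of Object.entries(row)) {
      if (!allMeasureIds.includes(key)) base[key] = value
    }
    for (const { id, alias } of group) {
      out.push({ ...base, __MeaId__: id, __MeaName__: alias || id, [valueField]: row[id] })
    }
  }
  return out
}

// 4 measures and 1 dimension (hypothetical sample data).
const rows = [
  { month: 'Jan', sales: 100, profit: 30, ratio: 0.3, growth: 0.1 },
  { month: 'Feb', sales: 200, profit: 50, ratio: 0.25, growth: 1.0 },
]
const primary = [{ id: 'sales', alias: 'Sales' }, { id: 'profit', alias: 'Profit' }]
const secondary = [{ id: 'ratio', alias: 'Ratio' }, { id: 'growth', alias: 'Growth' }]
const allIds = [...primary, ...secondary].map(m => m.id)

// Two folds, one per axis, each writing to its own value field.
const dualAxisData = [
  ...foldGroup(rows, allIds, primary, '__MeaValue0__'),
  ...foldGroup(rows, allIds, secondary, '__MeaValue1__'),
]

const fieldNames = [...new Set(dualAxisData.flatMap(Object.keys))]
console.log(fieldNames)
// [ 'month', '__MeaId__', '__MeaName__', '__MeaValue0__', '__MeaValue1__' ]
```

The result has 2 measure fields (one per axis) and 3 dimensions: the original `month` plus the measure name and the measure ID.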

Minimal runnable example

foldMeasures
const data = [
  { category: 'A', sales: 100, profit: 30 },
  { category: 'B', sales: 200, profit: 50 },
]

const measures = [
  { id: 'sales', alias: 'Sales' },
  { id: 'profit', alias: 'Profit' },
]

function foldMeasures(dataset, measures, options) {
  const {
    measureId,
    measureName,
    measureValue,
    colorMeasureId,
    // Part of the options, but not used by this minimal example.
    allowEmptyFold = true,
  } = options || {}

  // Everything that folding would otherwise lose is recorded here.
  const foldInfo = {
    measureId,
    measureName,
    measureValue,
    statistics: {
      max: -Infinity,
      min: Infinity,
      sum: 0,
      count: 0,
      colorMin: Infinity,
      colorMax: -Infinity,
    },
    foldMap: {},
  }

  const ids = measures.map(m => m.id)
  const result = []

  // Emit one output row per (input row, measure) pair.
  for (const row of dataset) {
    for (const measure of measures) {
      const { id, alias } = measure
      const newRow = { ...row }

      // Remove the original measure fields to avoid duplicating them.
      for (const key of ids) {
        delete newRow[key]
      }

      newRow[measureId] = id
      newRow[measureName] = alias || id
      newRow[measureValue] = row[id]

      if (colorMeasureId) {
        const colorValue = row[colorMeasureId]
        newRow.color = colorValue
        foldInfo.statistics.colorMin = Math.min(foldInfo.statistics.colorMin, Number(colorValue))
        foldInfo.statistics.colorMax = Math.max(foldInfo.statistics.colorMax, Number(colorValue))
      }

      // Collect statistics in the same pass.
      const val = Number(row[id])
      foldInfo.statistics.min = Math.min(foldInfo.statistics.min, val)
      foldInfo.statistics.max = Math.max(foldInfo.statistics.max, val)
      foldInfo.statistics.sum += val
      foldInfo.statistics.count++

      // Use the same fallback as the measure-name field above.
      foldInfo.foldMap[id] = alias || id

      result.push(newRow)
    }
  }

  return { dataset: result, foldInfo }
}

const { dataset: foldedData, foldInfo } = foldMeasures(data, measures, {
  measureId: '__MeaId__',
  measureName: '__MeaName__',
  measureValue: '__MeaValue__',
})

console.log(foldedData)
Expected output
[
  {
    "category": "A",
    "__MeaId__": "sales",
    "__MeaName__": "Sales",
    "__MeaValue__": 100
  },
  {
    "category": "A",
    "__MeaId__": "profit",
    "__MeaName__": "Profit",
    "__MeaValue__": 30
  },
  {
    "category": "B",
    "__MeaId__": "sales",
    "__MeaName__": "Sales",
    "__MeaValue__": 200
  },
  {
    "category": "B",
    "__MeaId__": "profit",
    "__MeaName__": "Profit",
    "__MeaValue__": 50
  }
]

unfoldDimensions

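Source code location

unfoldDimensions concatenates any set of dimensions into one new dimension without losing information. All of the added information is stored in unfoldInfo.

A complete unfoldDimensions == converting every dimension value into a measure + one foldMeasures pass

Traversing the dataset is expensive, though, and an extra foldMeasures pass would hurt performance.

foldMeasures already guarantees that each row carries exactly one measure, so a plain merge applied directly to the source data achieves an equivalent result and avoids that extra pass.

In theory, unfoldDimensions could be merged completely into foldMeasures so that all data processing happens in a single traversal of the dataset. For the sake of readability and maintainability, and as long as there is no performance bottleneck, the two stay separate for now.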
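
What such a merged single pass might look like is sketched below. `foldAndUnfold` is a hypothetical function, not part of VSeed, and the folded field names and the `-` separator default are assumptions made for illustration.

```javascript
// Hypothetical single-pass variant: fold the measures and build the
// encoding fields in the same traversal. Not part of VSeed.
function foldAndUnfold(rows, measures, encoding, separator = '-') {
  const ids = measures.map(m => m.id)
  const result = []
  for (const row of rows) {
    for (const { id, alias } of measures) {
      const out = { ...row }
      for (const key of ids) delete out[key] // drop the original measure fields
      out.__MeaId__ = id
      out.__MeaName__ = alias || id
      out.__MeaValue__ = row[id]
      // Each folded row already holds exactly one measure, so an encoding
      // field is simply the joined dimension values of that row.
      for (const [channel, dimIds] of Object.entries(encoding)) {
        out[channel] = dimIds.map(dimId => String(out[dimId])).join(separator)
      }
      result.push(out)
    }
  }
  return result
}

const merged = foldAndUnfold(
  [{ category: 'A', sales: 100, profit: 30 }],
  [{ id: 'sales', alias: 'Sales' }, { id: 'profit', alias: 'Profit' }],
  { __DimX__: ['category'], __DimColor__: ['__MeaName__'] },
)
console.log(merged.map(d => [d.__DimX__, d.__DimColor__, d.__MeaValue__]))
// [ [ 'A', 'Sales', 100 ], [ 'A', 'Profit', 30 ] ]
```

The trade-off is the one described above: one traversal instead of two, at the cost of a function that mixes two responsibilities.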

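Properties

  1. Property 1: After unfoldDimensions runs, there is still exactly one measure field.
  2. Property 2: Dimensions can be merged without losing any of the original data.

The cleverest part
  1. As long as it runs after foldMeasures, a simple concat is enough to both unfold dimensions and merge measures, which keeps it fast.
  2. Any set of dimensions can be merged into one brand-new dimension field, so dimensions can be mapped to any visual channel.
  3. Because the operation itself is not complicated, it could in theory be merged into foldMeasures to reduce the number of traversals and improve performance.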
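
As a small illustration of the second point, the sketch below merges two dimensions into one x-axis field. The `concatDimensions` helper and the `country` and `city` sample fields are hypothetical; only the join-with-separator idea comes from this page.

```javascript
// Hypothetical helper: join the values of several dimensions into one string.
function concatDimensions(datum, dimensionIds, separator) {
  return dimensionIds.map(id => String(datum[id])).join(separator)
}

// Already-folded rows: each row carries exactly one measure value.
const folded = [
  { country: 'CN', city: 'Beijing', __MeaName__: 'Sales', __MeaValue__: 100 },
  { country: 'CN', city: 'Shanghai', __MeaName__: 'Sales', __MeaValue__: 120 },
  { country: 'US', city: 'Boston', __MeaName__: 'Sales', __MeaValue__: 90 },
]

// Map both dimensions to the x channel by writing one merged field.
// The original fields stay on the row, so no information is lost.
const withX = folded.map(d => ({
  ...d,
  __DimX__: concatDimensions(d, ['country', 'city'], '-'),
}))

console.log(withX.map(d => d.__DimX__))
// [ 'CN-Beijing', 'CN-Shanghai', 'US-Boston' ]
```

This sketch copies each row instead of modifying it in place, which keeps the input untouched at the cost of extra allocations.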

Minimal runnable example

const XEncoding = '__DimX__'
const ColorEncoding = '__DimColor__'

/**
 * Unfold and merge the dimensions of each visual channel. Because the merge
 * happens after foldMeasures, no Cartesian product is needed.
 * @param {Array<Object>} dataset Source dataset; each datum is modified in place
 * @param {Array<Object>} dimensions Dimension list; each entry has at least an `id` field
 * @param {Object} encoding Encoding object: key is the channel name, value is an array of dimension ids
 * @param {Object} options Options
 *  - foldMeasureId: field name of the folded measure id
 *  - separator: separator used when joining dimension values
 *  - colorItemAsId: whether to use only the color item as colorId, default false
 * @returns {Object} { dataset, unfoldInfo }
 */
function unfoldDimensions(dataset, dimensions, encoding, options) {
  // foldMeasureId and colorItemAsId are documented options, but this minimal
  // example does not use them, so unfoldInfo.colorIdMap stays empty.
  const { foldMeasureId, separator, colorItemAsId } = options || {}

  const unfoldInfo = {
    encodingX: XEncoding,
    encodingColor: ColorEncoding,

    colorItems: [],
    colorIdMap: {},
  }

  // Pick the dimensions that belong to each channel according to the encoding.
  const xDimensions = encoding.x ? dimensions.filter(d => encoding.x.includes(d.id)) : []
  const colorDimensions = encoding.color ? dimensions.filter(d => encoding.color.includes(d.id)) : []

  const colorItemsSet = new Set()

  for (let i = 0; i < dataset.length; i++) {
    const datum = dataset[i]

    applyEncoding(XEncoding, xDimensions, datum, separator)
    applyEncoding(ColorEncoding, colorDimensions, datum, separator)

    // Collect distinct color items, skipping rows that have no color field.
    if (colorDimensions.length) {
      colorItemsSet.add(String(datum[ColorEncoding]))
    }
  }

  unfoldInfo.colorItems = Array.from(colorItemsSet)

  return {
    dataset,
    unfoldInfo,
  }
}

/**
 * Apply an encoding to a datum, modifying the datum in place.
 * @param {string} encoding Name of the encoding field to write
 * @param {Array<Object>} dimensions Dimension list
 * @param {Object} datum A single data row
 * @param {string} separator Separator used when joining values
 */
function applyEncoding(encoding, dimensions, datum, separator) {
  if (encoding && dimensions.length) {
    datum[encoding] = dimensions.map(dim => String(datum[dim.id])).join(separator)
  }
}

const dataset = [
  { "category": "A", "__MeaId__": "sales",  "__MeaName__": "Sales",  "__MeaValue__": 100 },
  { "category": "A", "__MeaId__": "profit", "__MeaName__": "Profit", "__MeaValue__": 30  },
  { "category": "B", "__MeaId__": "sales",  "__MeaName__": "Sales",  "__MeaValue__": 200 },
  { "category": "B", "__MeaId__": "profit", "__MeaName__": "Profit", "__MeaValue__": 50  }
]
const dimensions = [
  { id: 'category' },
  { id: '__MeaName__' },
]

const encoding = {
  x: ['category'],
  color: ['__MeaName__'],
}

const options = {
  foldMeasureId: '__MeaId__',
  separator: '-',
  colorItemAsId: false,
}

const { dataset: unfoldedData, unfoldInfo } = unfoldDimensions(dataset, dimensions, encoding, options)

console.log(unfoldedData)
Expected output
[
  {
    "category": "A",
    "__MeaId__": "sales",
    "__MeaName__": "Sales",
    "__MeaValue__": 100,
    "__DimX__": "A",
    "__DimColor__": "Sales"
  },
  {
    "category": "A",
    "__MeaId__": "profit",
    "__MeaName__": "Profit",
    "__MeaValue__": 30,
    "__DimX__": "A",
    "__DimColor__": "Profit"
  },
  {
    "category": "B",
    "__MeaId__": "sales",
    "__MeaName__": "Sales",
    "__MeaValue__": 200,
    "__DimX__": "B",
    "__DimColor__": "Sales"
  },
  {
    "category": "B",
    "__MeaId__": "profit",
    "__MeaName__": "Profit",
    "__MeaValue__": 50,
    "__DimX__": "B",
    "__DimColor__": "Profit"
  }
]