数据重塑-实现

简单而又巧妙

这是VSeed最有意思的, 也是最核心的模块, 看似复杂, 实则非常简单与巧妙, 仅仅不到200行代码.

只要善用foldMeasures 与 unfoldDimensions, 可以将任意指标维度, 转换为固定的指标与维度, 做到足够自由的可视化映射.

foldMeasures

源代码位置

foldMeasures 将所有的指标 fold 为一个指标, 增加一个指标名称维度和一个指标Id维度, 所有的可能丢失信息都存储在foldInfo内, 并且在这个过程中可以进行数据统计

特性

  1. 特性1: foldMeasures执行完之后, 一定只有1个指标字段, 即能够将多指标描述的数据, 都转换成1个指标; 将任意多指标数据对应一个图元
  2. 特性2: 1. 数据条目与图元(几何元素)的数据严格一致,一条数据对应一个图元
  3. 特性3: 该过程进行数据统计
最妙的地方!!!
  • 1个指标0个维度, foldMeasures 后可以获得1个指标2个维度(包括指标名称和指标Id)
  • 4个指标1个维度, 经过2次foldMeasures 后可以获得2个指标3个维度(包括指标名称和指标Id), 从而完美的可以支持双轴图等场景.
  • N个指标0个维度, 经过Y(Y ≤ N)次foldMeasures 后, 可以获得Y个指标和2个维度(包括指标名称和指标Id)

最小可运行示例

foldMeasures
const data = [
  { category: 'A', sales: 100, profit: 30 },
  { category: 'B', sales: 200, profit: 50 },
]

const measures = [
  { id: 'sales', alias: 'Sales' },
  { id: 'profit', alias: 'Profit' },
]

function foldMeasures(dataset, measures, options) {
  const {
    measureId,
    measureName,
    measureValue,
    colorMeasureId,
    allowEmptyFold = true,
  } = options || {}

  const foldInfo = {
    measureId,
    measureName,
    measureValue,
    statistics: {
      max: -Infinity,
      min: Infinity,
      sum: 0,
      count: 0,
      colorMin: Infinity,
      colorMax: -Infinity,
    },
    foldMap: {},
  }

  const ids = measures.map(m => m.id)
  const result = []

  for (const row of dataset) {
    for (const measure of measures) {
      const { id, alias } = measure
      const newRow = { ...row }

      // 删除其他指标字段,避免重复
      for (const key of ids) {
        delete newRow[key]
      }

      newRow[measureId] = id
      newRow[measureName] = alias || id
      newRow[measureValue] = row[id]

      if (colorMeasureId) {
        const colorValue = row[colorMeasureId]
        newRow.color = colorValue
        foldInfo.statistics.colorMin = Math.min(foldInfo.statistics.colorMin, Number(colorValue))
        foldInfo.statistics.colorMax = Math.max(foldInfo.statistics.colorMax, Number(colorValue))
      }

      const val = Number(row[id])
      foldInfo.statistics.min = Math.min(foldInfo.statistics.min, val)
      foldInfo.statistics.max = Math.max(foldInfo.statistics.max, val)
      foldInfo.statistics.sum += val
      foldInfo.statistics.count++

      foldInfo.foldMap[id] = alias

      result.push(newRow)
    }
  }

  return { dataset: result, foldInfo }
}

const { dataset: foldedData, foldInfo } = foldMeasures(data, measures, {
  measureId: '__MeaId__',
  measureName: '__MeaName__',
  measureValue: '__MeaValue__',
})

console.log(foldedData)
预期输出
[
  {
    "category": "A",
    "__MeaId__": "sales",
    "__MeaName__": "Sales",
    "__MeaValue__": 100
  },
  {
    "category": "A",
    "__MeaId__": "profit",
    "__MeaName__": "Profit",
    "__MeaValue__": 30
  },
  {
    "category": "B",
    "__MeaId__": "sales",
    "__MeaName__": "Sales",
    "__MeaValue__": 200
  },
  {
    "category": "B",
    "__MeaId__": "profit",
    "__MeaName__": "Profit",
    "__MeaValue__": 50
  }
]

unfoldDimensions

源代码位置

unfoldDimensions 在不丢失信息的前提下, 将任意的维度 concat 为一个新的维度, 所有的增加的信息都存储在unfoldInfo内.

一个完整unfoldDimensions == 所有维度值转指标 + 一次foldMeasures

但遍历dataset的开销是巨大的, 一次多余的 foldMeasures 会导致性能下降.

foldMeasures 可以直接保证一条数据只有一个指标, 因此可以直接在源数据上进行单纯的合并, 就能巧妙的达到等价效果, 最终从而大幅度提升性能.

经过思考, 理论上unfoldDimensions可以和foldMeasures完全合并, 在一次dataset 遍历中完成所有数据处理, 但为了可读性和可维护性, 在没有性能瓶颈的情况下, 暂定不合并.

特性

特性1: unfoldDimensions执行完之后, 一定只有1个指标字段, 特性2: 可以在不丢失原数据的情况下, 合并维度

最妙的地方!!!
  1. 只要在foldMeasures后进行, 就可以通过最简单的 concat 操作, 即可完成展开维度与合并指标, 性能极其优异.
  2. 任意的维度都能合并为一个全新的维度字段, 做到任意的视觉通道映射.
  3. 因为本身并不复杂, 所以理论上可以和 foldMeasures 合并在一起, 降低遍历次数, 提升性能.

最小可运行示例

const XEncoding = '__DimX__'
const ColorEncoding = '__DimColor__'
/**
 * 展开并合并视觉通道的维度, 在foldMeasures后合并维度, 所以不需要进行笛卡尔积
 * @param {Array<Object>} dataset 原始数据集
 * @param {Array<Object>} dimensions 维度数组,每个维度对象至少包含 id 字段
 * @param {Object} encoding 编码对象,key为通道名,value为维度id数组
 * @param {Object} options 配置项
 *  - foldMeasureId: 折叠指标的字段名
 *  - separator: 维度值拼接分隔符
 *  - colorItemAsId: 是否只用颜色项作为 colorId,默认 false
 * @returns {Object} { dataset, unfoldInfo }
 */
function unfoldDimensions(dataset, dimensions, encoding, options) {
  const { foldMeasureId, separator, colorItemAsId } = options || {}

  const unfoldInfo = {
    encodingX: XEncoding,
    encodingColor: ColorEncoding,

    colorItems: [],
    colorIdMap: {},
  }

  // 根据 encoding 过滤对应维度
  const xDimensions = encoding.x ? dimensions.filter(d => encoding.x.includes(d.id)) : []
  const colorDimensions = encoding.color ? dimensions.filter(d => encoding.color.includes(d.id)) : []

  const colorItemsSet = new Set()
  const colorIdMap = {}

  for (let i = 0; i < dataset.length; i++) {
    const datum = dataset[i]

    applyEncoding(XEncoding, xDimensions, datum, separator)
    applyEncoding(ColorEncoding, colorDimensions, datum, separator)

    const measureId = String(datum[foldMeasureId])
    const colorItem = String(datum[ColorEncoding])
    colorItemsSet.add(colorItem)
  }

  unfoldInfo.colorItems = Array.from(colorItemsSet)

  return {
    dataset,
    unfoldInfo,
  }
}

/**
 * 应用编码至数据中, 原地修改 datum
 * @param {string} encoding 编码字段名
 * @param {Array<Object>} dimensions 维度数组
 * @param {Object} datum 单条数据
 * @param {string} separator 拼接分隔符
 */
function applyEncoding(encoding, dimensions, datum, separator) {
  if (encoding && dimensions.length) {
    datum[encoding] = dimensions.map(dim => String(datum[dim.id])).join(separator)
  }
}


const dataset = [
  { "category": "A", "__MeaId__": "sales",  "__MeaName__":  "Sales",  "__MeaValue__": 100 },
  { "category": "A", "__MeaId__": "profit", "__MeaName__": "Profit",  "__MeaValue__": 30  },
  { "category": "B", "__MeaId__": "sales",  "__MeaName__":  "Sales",  "__MeaValue__": 200 },
  { "category": "B", "__MeaId__": "profit", "__MeaName__": "Profit",  "__MeaValue__": 50  }
]
const dimensions = [
  { id: 'category'},
  { id: '__MeaName__'},
]

const encoding = {
  x: ['category'],
  color: ['__MeaName__'],
}

const options = {
  foldMeasureId: '__MeaId__',
  separator: '-',
  colorItemAsId: false,
}

const { dataset: unfoldedData, unfoldInfo } = unfoldDimensions(dataset, dimensions, encoding, options)

console.log(unfoldedData)

预期输出
[
  {
    "category": "A",
    "__MeaId__": "sales",
    "__MeaName__": "Sales",
    "__MeaValue__": 100,
    "__DimX__": "A",
    "__DimColor__": "Sales"
  },
  {
    "category": "A",
    "__MeaId__": "profit",
    "__MeaName__": "Profit",
    "__MeaValue__": 30,
    "__DimX__": "A",
    "__DimColor__": "Profit"
  },
  {
    "category": "B",
    "__MeaId__": "sales",
    "__MeaName__": "Sales",
    "__MeaValue__": 200,
    "__DimX__": "B",
    "__DimColor__": "Sales"
  },
  {
    "category": "B",
    "__MeaId__": "profit",
    "__MeaName__": "Profit",
    "__MeaValue__": 50,
    "__DimX__": "B",
    "__DimColor__": "Profit"
  }
]