JSON数据列展开

数据库中常常需要处理JSON数据,例如,将JSON数据中的某些属性字段进行展开转换,转成行、列的关系型形式。JSON数据的展开方式灵活多变,本文通过详细的SQL示例,介绍如何展开JSON数据列。

基本思路

  1. 使用JSON函数,对JSON字符串进行解析和数据提取;

  2. 提取、转换为ARRAY或者MAP的数据结构,也可以使用Lambda函数式表达式进行转换处理;

  3. 使用UNNEST语法进行列展开。

使用UNNEST对MAP进行关系型展开

SQL示例:

  1. SELECT t.m, t.n
  2. FROM (
  3. SELECT MAP(ARRAY['foo', 'bar'], ARRAY[1, 2]) as map_data
  4. )
  5. CROSS JOIN unnest(map_data) AS t(m, n);

结果:

  1. +------+------+
  2. | m | n |
  3. +------+------+
  4. | foo | 1 |
  5. | bar | 2 |
  6. +------+------+

使用UNNEST对JSON数据进行关系型展开

SQL示例1:

  1. SELECT json_extract(t.a, '$.a') AS a,
  2. json_extract(t.a, '$.b') AS b
  3. FROM (
  4. SELECT cast(json_extract('{"x":[{"a":1,"b":2},{"a":3,"b":4}]}', '$.x')
  5. AS array<JSON>) AS package_array
  6. )
  7. CROSS JOIN UNNEST(package_array) AS t(a);

结果:

  1. +------+------+
  2. | a | b |
  3. +------+------+
  4. | 1 | 2 |
  5. | 3 | 4 |
  6. +------+------+

SQL示例2:

  1. SELECT t.m AS _col1, t.n AS _col2
  2. FROM (
  3. SELECT cast(json_extract('{"x":[{"a":1,"b":2},{"a":3,"b":4}]}', '$.x')
  4. AS array<JSON>) AS array_1,
  5. cast(json_extract('{"x":[{"a":5,"b":6}, {"a":7,"b":8}, {"a":9,"b":10}, {"a":11,"b":12}]}', '$.x')
  6. AS array<JSON>) AS array_2
  7. )
  8. CROSS JOIN UNNEST(array_1, array_2) AS t(m, n);

结果:

  1. +---------------+-----------------+
  2. | _col1 | _col2 |
  3. +---------------+-----------------+
  4. | {"a":1,"b":2} | {"a":5,"b":6} |
  5. | {"a":3,"b":4} | {"a":7,"b":8} |
  6. | NULL | {"a":9,"b":10} |
  7. | NULL | {"a":11,"b":12} |
  8. +---------------+-----------------+

SQL示例3:

  1. SELECT json_extract(t.m, '$.a') AS _col1,
  2. json_extract(t.m, '$.b') AS _col2,
  3. json_extract(t.n, '$.a') AS _col3,
  4. json_extract(t.n, '$.b') AS _col4
  5. FROM (
  6. SELECT cast(json_extract('{"x":[{"a":1,"b":2},{"a":3,"b":4}]}', '$.x')
  7. AS array<JSON>) AS array_1,
  8. cast(json_extract('{"x":[{"a":5,"b":6}, {"a":7,"b":8}, {"a":9,"b":10}, {"a":11,"b":12}]}', '$.x')
  9. AS array<JSON>) AS array_2
  10. )
  11. CROSS JOIN UNNEST(array_1, array_2) AS t(m, n);

结果:

  1. +-------+-------+-------+-------+
  2. | _col1 | _col2 | _col3 | _col4 |
  3. +-------+-------+-------+-------+
  4. | 1 | 2 | 5 | 6 |
  5. | 3 | 4 | 7 | 8 |
  6. | NULL | NULL | 9 | 10 |
  7. | NULL | NULL | 11 | 12 |
  8. +-------+-------+-------+-------+

结合Lambda表达式,使用UNNEST对JSON数据进行关系型展开

SQL示例:

  1. SELECT count(*) AS cnt,
  2. package_name
  3. FROM (
  4. SELECT t.a AS package_name
  5. FROM (
  6. SELECT transform(packages_map_array, x -> Element_at(x, 'packageName'))
  7. AS package_array
  8. FROM (
  9. SELECT cast(Json_extract(data_json, '$.packages')
  10. AS array<map<VARCHAR, VARCHAR>>) AS packages_map_array
  11. FROM (
  12. SELECT json_parse(data) AS data_json
  13. FROM (
  14. SELECT '{
  15. "packages": [
  16. {
  17. "appName": "铁路12306",
  18. "packageName": "com.MobileTicket",
  19. "versionName": "4.1.9",
  20. "versionCode": "194"
  21. },
  22. {
  23. "appName": "QQ飞车",
  24. "packageName": "com.tencent.tmgp.speedmobile",
  25. "versionName": "1.11.0.13274",
  26. "versionCode": "1110013274"
  27. },
  28. {
  29. "appName": "掌阅",
  30. "packageName": "com.chaozh.iReaderFree",
  31. "versionName": "7.11.0",
  32. "versionCode": "71101"
  33. }
  34. ]
  35. }'
  36. AS data
  37. )
  38. )
  39. )
  40. ) AS x (package_array)
  41. CROSS JOIN UNNEST(package_array) AS t (a)
  42. )
  43. GROUP BY package_name
  44. ORDER BY cnt DESC;

结果:

  1. +------+------------------------------+
  2. | cnt | package_name |
  3. +------+------------------------------+
  4. | 1 | com.MobileTicket |
  5. | 1 | com.tencent.tmgp.speedmobile |
  6. | 1 | com.chaozh.iReaderFree |
  7. +------+------------------------------+