JSON

本文介绍如何在DLA中为JSON类型的数据创建表。

前提条件

  • 创建文件数据表之前需要先创建OSS Schema,本文示例中所有文件数据表均使用以下OSS Schema。

    1. CREATE SCHEMA dla_oss_db with DBPROPERTIES(
    2. catalog='oss',
    3. location 'oss://dlaossfile1/dla/'
    4. )
  • JSON文件必须满足以下条件,否则无法在DLA中为JSON文件创建表

    • 无论JSON数据是否为嵌套格式,必须将一条完整的JSON数据放在一行中,否则无法在DLA中为JSON文件创建表。

    • JSON文件以纯文本格式存储,例如TXT等。

    • JSON文件中数据的编码方式是UTF-8。

    • JSON文件中每一行必须是一个完整的JSON对象。

      例如,DLA无法为以下文件创建表:

      1. {"id": 123, "name": "jack",
      2. "c3": "2001-02-03 12:34:56"}
      3. {"id": 456, "name": "rose", "c3": "1906-04-18 05:12:00"}
      4. {"id": 789, "name": "tom", "c3": "2001-02-03 12:34:56"}
      5. {"id": 234, "name": "alice", "c3": "1906-04-18 05:12:00"}

      可改写成:

      1. {"id": 123, "name": "jack", "c3": "2001-02-03 12:34:56"}
      2. {"id": 456, "name": "rose", "c3": "1906-04-18 05:12:00"}
      3. {"id": 789, "name": "tom", "c3": "2001-02-03 12:34:56"}
      4. {"id": 234, "name": "alice", "c3": "1906-04-18 05:12:00"}

不含嵌套的JSON数据

例如,OSS中有一个json_table_1.txt文件:oss://dlaossfile1/dla/json_table_1.txt,存储的数据如下所示。

  1. {"id": 123, "name": "jack", "c3": "2001-02-03 12:34:56"}
  2. {"id": 456, "name": "rose", "c3": "1906-04-18 05:12:00"}
  3. {"id": 789, "name": "tom", "c3": "2001-02-03 12:34:56"}
  4. {"id": 234, "name": "alice", "c3": "1906-04-18 05:12:00"}

DLA中的建表语句为:

  1. CREATE EXTERNAL TABLE json_table_1
  2. (id int,
  3. name string,
  4. c3 timestamp
  5. )STORED AS JSON
  6. LOCATION 'oss://dlaossfile1/dla/';

查询结果为:

  1. SELECT * FROM dla_oss_db.json_table_1
  2. |----|------|---------------------|
  3. | id | name | c3 |
  4. | 123| jack | 2001-02-03 12:34:560|
  5. | 456| rose | 1906-04-18 05:12:00 |
  6. | 789| tome | 2001-02-03 12:34:56 |
  7. | 234| alice|1906-04-18 05:12:00 |

含有嵌套的JSON数据

含有嵌套的JSON数据是指使用struct和array定义的嵌套JSON数据。

例如,以下一条原始JSON数据。

  1. { "DocId": "Alibaba", "User_1": { "Id": 1234, "Username": "bob1234", "Name": "Bob", "ShippingAddress": { "Address1": "969 Wenyi West St.", "Address2": null, "City": "Hangzhou", "Province": "Zhejiang" }, "Orders": [{ "ItemId": 6789, "OrderDate": "11/11/2017" }, { "ItemId": 4352, "OrderDate": "12/12/2017" } ] } }

为了便于您查看JSON数据,使用在线JSON格式化工具处理后,数据内容为:

  1. {
  2. "DocId": "Alibaba",
  3. "User_1": {
  4. "Id": 1234,
  5. "Username": "bob1234",
  6. "Name": "Bob",
  7. "ShippingAddress": {
  8. "Address1": "969 Wenyi West St.",
  9. "Address2": null,
  10. "City": "Hangzhou",
  11. "Province": "Zhejiang"
  12. },
  13. "Orders": [
  14. {
  15. "ItemId": 6789,
  16. "OrderDate": "11/11/2017"
  17. },
  18. {
  19. "ItemId": 4352,
  20. "OrderDate": "12/12/2017"
  21. }
  22. ]
  23. }
  24. }

建表语句为:

  1. CREATE EXTERNAL TABLE json_table_2 (
  2. docid string,
  3. user_1 struct<
  4. id:INT,
  5. username:string,
  6. name:string,
  7. shippingaddress:struct<
  8. address1:string,
  9. address2:string,
  10. city:string,
  11. province:string
  12. >,
  13. orders:array<
  14. struct<
  15. itemid:INT,
  16. orderdate:string
  17. >
  18. >
  19. >
  20. )
  21. STORED AS JSON
  22. LOCATION 'oss://dlaossfile1/dla/json_table_2.txt';

查询结果为:

  1. select * from json_table_2;
  2. +---------+----------------------------------------------------------------------------------------------------------------+
  3. | docid | user_1 |
  4. +---------+----------------------------------------------------------------------------------------------------------------+
  5. | Alibaba | [1234, bob1234, Bob, [969 Wenyi West St., null, Hangzhou, Zhejiang], [[6789, 11/11/2017], [4352, 12/12/2017]]] |
  6. +---------+----------------------------------------------------------------------------------------------------------------+

查询数据时,对于使用struct定义的嵌套结构,可以通过.进行层次对象引用;对于array定义的数组结构,可以通过[数组下标](数组下标从1开始)进行对象引用。

  1. select DocId,
  2. User_1.Id,
  3. User_1.ShippingAddress.Address1,
  4. User_1.Orders[1].ItemId
  5. from json_table_2
  6. where User_1.Username = 'bob1234'
  7. and User_1.Orders[2].OrderDate = '12/12/2017';
  8. +---------+------+--------------------+-------+
  9. | DocId | id | address1 | _col3 |
  10. +---------+------+--------------------+-------+
  11. | Alibaba | 1234 | 969 Wenyi West St. | 6789 |

使用JSON函数处理JSON数据

以下JSON数据中,keyvalue_string对应的value为嵌套字符。

  1. {"data_key":"com.taobao.vipserver.domains.meta.biz.alibaba.com","ts":1524550275112,"value_string":"{\"appName\":\"\",\"apps\":[],\"checksum\":\"50fa0540b430904ee78dff07c7350e1c\",\"clusterMap\":{\"DEFAULT\":{\"defCkport\":80,\"defIPPort\":80,\"healthCheckTask\":null,\"healthChecker\":{\"checkCode\":200,\"curlHost\":\"\",\"curlPath\":\"/status.taobao\",\"type\":\"HTTP\"},\"name\":\"DEFAULT\",\"nodegroup\":\"\",\"sitegroup\":\"\",\"submask\":\"0.0.0.0/0\",\"syncConfig\":{\"appName\":\"trade-ma\",\"nodegroup\":\"tradema\",\"pubLevel\":\"publish\",\"role\":\"\",\"site\":\"\"},\"useIPPort4Check\":true}},\"disabledSites\":[],\"enableArmoryUnit\":false,\"enableClientBeat\":false,\"enableHealthCheck\":true,\"enabled\":true,\"envAndSites\":\"\",\"invalidThreshold\":0.6,\"ipDeleteTimeout\":1800000,\"lastModifiedMillis\":1524550275107,\"localSiteCall\":true,\"localSiteThreshold\":0.8,\"name\":\"biz.alibaba.com\",\"nodegroup\":\"\",\"owners\":[\"junlan.zx\",\"张三\",\"李四\",\"cui.yuanc\"],\"protectThreshold\":0,\"requireSameEnv\":false,\"resetWeight\":false,\"symmetricCallType\":null,\"symmetricType\":\"warehouse\",\"tagName\":\"ipGroup\",\"tenantId\":\"\",\"tenants\":[],\"token\":\"1cf0ec0c771321bb4177182757a67fb0\",\"useSpecifiedURL\":false}"}

为了便于您查看JSON数据,使用在线JSON格式化工具处理后,数据内容为:

  1. {
  2. "data_key": "com.taobao.vipserver.domains.meta.biz.alibaba.com",
  3. "ts": 1524550275112,
  4. "value_string": "{\"appName\":\"\",\"apps\":[],\"checksum\":\"50fa0540b430904ee78dff07c7350e1c\",\"clusterMap\":{\"DEFAULT\":{\"defCkport\":80,\"defIPPort\":80,\"healthCheckTask\":null,\"healthChecker\":{\"checkCode\":200,\"curlHost\":\"\",\"curlPath\":\"/status.taobao\",\"type\":\"HTTP\"},\"name\":\"DEFAULT\",\"nodegroup\":\"\",\"sitegroup\":\"\",\"submask\":\"0.0.0.0/0\",\"syncConfig\":{\"appName\":\"trade-ma\",\"nodegroup\":\"tradema\",\"pubLevel\":\"publish\",\"role\":\"\",\"site\":\"\"},\"useIPPort4Check\":true}},\"disabledSites\":[],\"enableArmoryUnit\":false,\"enableClientBeat\":false,\"enableHealthCheck\":true,\"enabled\":true,\"envAndSites\":\"\",\"invalidThreshold\":0.6,\"ipDeleteTimeout\":1800000,\"lastModifiedMillis\":1524550275107,\"localSiteCall\":true,\"localSiteThreshold\":0.8,\"name\":\"biz.alibaba.com\",\"nodegroup\":\"\",\"owners\":[\"junlan.zx\",\"张三\",\"李四\",\"cui.yuanc\"],\"protectThreshold\":0,\"requireSameEnv\":false,\"resetWeight\":false,\"symmetricCallType\":null,\"symmetricType\":\"warehouse\",\"tagName\":\"ipGroup\",\"tenantId\":\"\",\"tenants\":[],\"token\":\"1cf0ec0c771321bb4177182757a67fb0\",\"useSpecifiedURL\":false}"
  5. }

建表语句为:

  1. CREATE external TABLE json_table_3 (
  2. data_key string,
  3. ts bigint,
  4. value_string string
  5. )
  6. STORED AS JSON
  7. LOCATION 'oss://dlaossfile1/dla/json_table_3.txt';

查询结果为:

查询结果

查询数据时,可以使用json_parse()json_extract_scalar()json_extract()等常用JSON函数对数据进行处理。

  1. select json_extract_scalar(json_parse(value_string),'$.owners[1]') from json_table_3
  2. +--------+
  3. | _col0 |
  4. +--------+
  5. | 张三 |
  1. select json_extract_scalar(json_obj.json_col, '$.DEFAULT.submask')
  2. from (
  3. select json_extract(json_parse(value_string), '$.clusterMap') as json_col from json_table_3
  4. ) json_obj
  5. where json_extract_scalar(json_obj.json_col, '$.DEFAULT.healthChecker.curlPath') = '/status.taobao';
  6. +-----------+
  7. | _col0 |
  8. +-----------+
  9. | 0.0.0.0/0 |
  1. with json_obj as (select json_extract(json_parse(value_string), '$.clusterMap') as json_col from json_table_3)
  2. select json_extract_scalar(json_obj.json_col, '$.DEFAULT.submask')
  3. from json_obj
  4. where json_extract_scalar(json_obj.json_col, '$.DEFAULT.healthChecker.curlPath') = '/status.taobao';
  5. +-----------+
  6. | _col0 |
  7. +-----------+
  8. | 0.0.0.0/0 |