多值子列是HybridDB for MySQL分析引擎的独特特性,用于表示一个列(某个Cell)有多个不确定的值,列中列的概念。一个表中可以定义一个或者多个多值列,多值列可以做筛选、分组、参与连接过滤等。以下是几个具体的例子。
建表
CREATE TABLE test1 (
user_id BIGINT,
city VARCHAR,
tags MULTIVALUE delimiter_tokenizer ',' value_type 'varchar',
brands MULTIVALUE delimiter_tokenizer ': ,' value_type 'varchar int'
)
DISTRIBUTE BY HASH (user_id)
ENGINE = 'CSTORE'
INDEX_ALL='Y';
说明:
列 | 说明 | 业务意义 |
---|---|---|
tags | 只有一个多值子列,用`,`分隔 | 每个用户可以打多个标签 |
brands | 第一个多值子列类型为varchar,用`:`分隔 | 表示每个用户偏好的品牌 |
第二个多值子列类型为int,用`,`分隔 | 表示每个用户偏好品牌的偏好度(分数) |
写入数据
insert into test1 values(1, 'HZ', 'A,B,C', 'X:80,Y:70');
insert into test1 values(2, 'BJ', 'B,D', 'X:50,Z:90,W:60');
insert into test1 values(3, 'SH', 'A,C,D,F', 'X:90');
mysql> select * from test1 order by user_id;
+---------+------+---------+----------------+
| user_id | city | tags | brands |
+---------+------+---------+----------------+
| 1 | HZ | A,B,C | X:80,Y:70 |
| 2 | BJ | B,D | X:50,Z:90,W:60 |
| 3 | SH | A,C,D,F | X:90 |
+---------+------+---------+----------------+
3 rows in set (0.01 sec)
多值子列查询
过滤查询
#查询包含A、B标签的用户数目
mysql> select count(*) from test1 where ref(tags,0) in ('A', 'B');
+----------+
| COUNT(*) |
+----------+
| 3 |
+----------+
1 row in set (0.10 sec)
#查询同时包含A和B标签的用户数目
mysql> select * from test1 where ref(tags,0) in ('A') and ref(tags,0) in ('B');
+---------+------+-------+-----------+
| user_id | city | tags | brands |
+---------+------+-------+-----------+
| 1 | HZ | A,B,C | X:80,Y:70 |
+---------+------+-------+-----------+
1 row in set (0.01 sec)
#查询偏好品牌包含X和偏好度在80以上的用户
mysql> select * from test1 where ref(brands, 0) in ('X') and ref(brands, 1) > 80;
+---------+------+---------+----------------+
| user_id | city | tags | brands |
+---------+------+---------+----------------+
| 2 | BJ | B,D | X:50,Z:90,W:60 |
| 3 | SH | A,C,D,F | X:90 |
+---------+------+---------+----------------+
2 rows in set (0.01 sec)
分组查询
#不在杭州且标签为A或者B的用户在A、B标签下的人数
mysql> select ref(tags,0), count(*) from test1
where ref(tags,0) in ('A', 'B') and city != 'HZ' group by ref(tags,0);
+--------------+----------+
| ref(tags, 0) | COUNT(*) |
+--------------+----------+
| A | 1 |
| B | 1 |
+--------------+----------+
2 rows in set (0.05 sec)
#查询A或者B标签和其对应城市的联合分组
mysql> select ref(tags,0), city, count(*) from test1
where ref(tags,0) in ('A', 'B') group by ref(tags,0), city;
+--------------+----------+
| ref(tags, 0) | COUNT(*) |
+--------------+----------+
| B | BJ |
| B | HZ |
| A | SH |
| A | HZ |
+--------------+----------+
4 rows in set (0.02 sec)
#多个多值子列的联合分组
mysql> select ref(tags,0), ref(brands,0), count(*) from test1
where ref(tags,0) in ('A', 'B') and ref(brands,0) in ('X')
group by ref(tags,0), ref(brands,0) ;
+--------------+----------------+----------+
| ref(tags, 0) | ref(brands, 0) | COUNT(*) |
+--------------+----------------+----------+
| A | X | 2 |
| B | X | 2 |
+--------------+----------------+----------+
2 rows in set (0.02 sec)
连接查询
mysql> select count(*) from test1 join test
on test1.user_id = test.id where ref(tags,0) in ('A', 'B') ;
+----------+
| COUNT(*) |
+----------+
| 3 |
+----------+
1 row in set (0.16 sec)
mysql> select sum(test.int_test) from test1 join test
on test1.user_id = test.id where ref(tags,0) in ('A', 'B') ;
+--------------------+
| SUM(test.int_test) |
+--------------------+
| -207 |
+--------------------+
1 row in set (0.03 sec)
mysql> select test.id, test1.* from test1 join test
on test1.user_id = test.id
where ref(tags,0) in ('A', 'B') and ref(brands, 0) in ('Y');
+----+---------+------+-------+-----------+
| id | user_id | city | tags | brands |
+----+---------+------+-------+-----------+
| 1 | 1 | HZ | A,B,C | X:80,Y:70 |
+----+---------+------+-------+-----------+
1 row in set (0.02 sec)
注意事项
- 多值列的GROUP BY一定要带WHERE条件,否则引擎底层需要遍历所有的value(默认只遍历1024个value)。
- 多个子列之间的组合筛选是Cell内的或(OR)关系。