HybridDB for MySQL的分析引擎支持在SQL的全文检索语法。需要说明的是,HybridDB for MySQL的全文检索能力和SQL中的like是有区别的,这里的全文检索是指在SQL中需要分词的搜索需求,而SQL中的like是模糊匹配的需求,只需要字符串列建索引就可以。目前HybridDB for MySQL的全文检索性能可以达到亿级别数据毫秒级响应。
全文检索列,底层实现是多值列,只是分词器为nlp。
建表
CREATE TABLE test3 (
user_id BIGINT,
city VARCHAR,
text MULTIVALUE nlp_tokenizer 'ik' value_type 'varchar'
)
DISTRIBUTE BY HASH (user_id)
ENGINE = 'CSTORE'
INDEX_ALL='Y';
说明
- 全文检索列,底层实现是多值列,只是分词器类型为nlp_tokenizer
- 默认的分词器为ElasticSearch的ik分词器
写入数据
insert into test3 values(1, 'HZ', '中华人民共和国');
insert into test3 values(2, 'BJ', 'HybridDB for MySQL是新型HTAP数据库');
insert into test3 values(3, 'SH', 'hello, world');
mysql> select * from test3 order by user_id;
+---------+------+------------------------------------------+
| user_id | city | text |
+---------+------+------------------------------------------+
| 1 | HZ | 中华人民共和国 |
| 2 | BJ | HybridDB for MySQL是新型HTAP数据库 |
| 3 | SH | hello, world |
+---------+------+------------------------------------------+
3 rows in set (0.02 sec)
检索查询
mysql> select * from test3 where text in ('中华', '数据库');
+---------+------+------------------------------------------+
| user_id | city | text |
+---------+------+------------------------------------------+
| 1 | HZ | 中华人民共和国 |
| 2 | BJ | HybridDB for MySQL是新型HTAP数据库 |
+---------+------+------------------------------------------+
2 rows in set (0.30 sec)
mysql> select * from test3 where text in ('hello') and city != 'HZ';
+---------+------+--------------+
| user_id | city | text |
+---------+------+--------------+
| 3 | SH | hello, world |
+---------+------+--------------+
1 row in set (0.93 sec)
mysql> select * from test3 where ref(text,0) in ('hybriddb');
+---------+------+---------------------------------------------+
| user_id | city | text |
+---------+------+---------------------------------------------+
| 2 | BJ | HybridDB for MySQL是新型HTAP数据库 |
+---------+------+---------------------------------------------+
4 rows in set (0.02 sec)
# 普通字符串列也可以做like查找
mysql> select * from test3 where city like '%J%';
+---------+------+---------------------------------------------+
| user_id | city | text |
+---------+------+---------------------------------------------+
| 2 | BJ | HybridDB for MySQL是新型HTAP数据库 |
+---------+------+---------------------------------------------+
4 rows in set (0.02 sec)
注意
全文检索的英文单词,默认都是小写归一化处理。