加入收藏 | 设为首页 | 会员中心 | 我要投稿 焦作站长网 (https://www.0391zz.com/)- 科技、建站、经验、云计算、5G、大数据,站长网!
当前位置: 首页 > 站长资讯 > 动态 > 正文

postgresql中的like搜索优化方法

发布时间:2021-10-09 16:03:36 所属栏目:动态 来源:互联网
导读:当时数量量比较庞大的时候,做模糊查询效率很慢,为了优化查询效率,尝试如下方法做效率对比。 一、对比情况说明: 1、数据量100w条数据 2、执行sql 二、对比结

当时数量量比较庞大的时候,做模糊查询效率很慢,为了优化查询效率,尝试如下方法做效率对比。

 

一、对比情况说明:

 

1、数据量100w条数据

 

2、执行sql

 

二、对比结果

 

explain analyze SELECT

 c_patent,

 c_applyissno,

 d_applyissdate,

 d_applydate,

 c_patenttype_dimn,

 c_newlawstatus,

 c_abstract

FROM

 public.t_knowl_patent_zlxx_temp

WHERE

 c_applicant LIKE '%本溪满族自治县连山关镇安平安养殖场%';

 

 

 

1、未建索时执行计划:

 

"Gather (cost=1000.00..83803.53 rows=92 width=1278) (actual time=217.264..217.264 rows=0 loops=1)

 Workers Planned: 2

 Workers Launched: 2

 -> Parallel Seq Scan on t_knowl_patent_zlxx (cost=0.00..82794.33 rows=38 width=1278) (actual time=212.355..212.355 rows=0 loops=3)

  Filter: ((c_applicant)::text ~~ '%本溪满族自治县连山关镇安平安养殖场%'::text)

  Rows Removed by Filter: 333333

Planning time: 0.272 ms

Execution time: 228.116 ms"

 

2、btree索引

 

建索引语句

 

1CREATE INDEX idx_public_t_knowl_patent_zlxx_applicant ON public.t_knowl_patent_zlxx(c_applicant varchar_pattern_ops);

 

执行计划

 

"Gather (cost=1000.00..83803.53 rows=92 width=1278) (actual time=208.253..208.253 rows=0 loops=1)

 Workers Planned: 2

 Workers Launched: 2

 -> Parallel Seq Scan on t_knowl_patent_zlxx (cost=0.00..82794.33 rows=38 width=1278) (actual time=203.573..203.573 rows=0 loops=3)

  Filter: ((c_applicant)::text ~~ '%本溪满族自治县连山关镇安平安养殖场%'::text)

  Rows Removed by Filter: 333333

Planning time: 0.116 ms

Execution time: 218.189 ms"

 

但是如果将查询sql稍微改动一下,把like查询中的前置%去掉是这样的

 

Index Scan using idx_public_t_knowl_patent_zlxx_applicant on t_knowl_patent_zlxx_temp (cost=0.55..8.57 rows=92 width=1278) (actual time=0.292..0.292 rows=0 loops=1)

 Index Cond: (((c_applicant)::text ~>=~ '本溪满族自治县连山关镇安平安养殖场'::text) AND ((c_applicant)::text ~<~ '本溪满族自治县连山关镇安平安养殖圻'::text))

 Filter: ((c_applicant)::text ~~ '本溪满族自治县连山关镇安平安养殖场%'::text)

Planning time: 0.710 ms

Execution time: 0.378 ms

 

3、gin索引

 

创建索引语句(postgresql要求在9.6版本及以上)

 

create extension pg_trgm;CREATE INDEX idx_public_t_knowl_patent_zlxx_applicant ON public.t_knowl_patent_zlxx USING gin (c_applicant gin_trgm_ops);

 

执行计划

 

Bitmap Heap Scan on t_knowl_patent_zlxx (cost=244.71..600.42 rows=91 width=1268) (actual time=0.649..0.649 rows=0 loops=1)

 Recheck Cond: ((c_applicant)::text ~~ '%本溪满族自治县连山关镇安平安养殖场%'::text)

 -> Bitmap Index Scan on idx_public_t_knowl_patent_zlxx_applicant (cost=0.00..244.69 rows=91 width=0) (actual time=0.647..0.647 rows=0 loops=1)

  Index Cond: ((c_applicant)::text ~~ '%本溪满族自治县连山关镇安平安养殖场%'::text)

Planning time: 0.673 ms

Execution time: 0.740 ms

 

三、结论

 

btree索引可以让后置% "abc%"的模糊匹配走索引,gin + gp_trgm可以让前后置% "%abc%" 走索引。但是gin 索引也有弊端,以下情况可能导致无法命中:

 

搜索字段少于3个字符时,不会命中索引,这是gin自身机制导致。

 

当搜索字段过长时,比如email检索,可能也不会命中索引,造成原因暂时未知。

 

补充:PostgreSQL LIKE 查询效率提升实验

 

一、未做索引的查询效率

 

作为对比,先对未索引的查询做测试

 

EXPLAIN ANALYZE select * from gallery_map where author = '曹志耘';

             QUERY PLAN            

-----------------------------------------------------------------------------------------------------------------

 Seq Scan on gallery_map (cost=0.00..7002.32 rows=1025 width=621) (actual time=0.011..39.753 rows=1031 loops=1)

 Filter: ((author)::text = '曹志耘'::text)

 Rows Removed by Filter: 71315

 Planning time: 0.194 ms

 Execution time: 39.879 ms

(5 rows)

 

Time: 40.599 ms

EXPLAIN ANALYZE select * from gallery_map where author like '曹志耘';

             QUERY PLAN            

-----------------------------------------------------------------------------------------------------------------

 Seq Scan on gallery_map (cost=0.00..7002.32 rows=1025 width=621) (actual time=0.017..41.513 rows=1031 loops=1)

 Filter: ((author)::text ~~ '曹志耘'::text)

 Rows Removed by Filter: 71315

 Planning time: 0.188 ms

 Execution time: 41.669 ms

(5 rows)

 

Time: 42.457 ms

 

EXPLAIN ANALYZE select * from gallery_map where author like '曹志耘%';

             QUERY PLAN            

-----------------------------------------------------------------------------------------------------------------

 Seq Scan on gallery_map (cost=0.00..7002.32 rows=1028 width=621) (actual time=0.017..41.492 rows=1031 loops=1)

 Filter: ((author)::text ~~ '曹志耘%'::text)

 Rows Removed by Filter: 71315

 Planning time: 0.307 ms

 Execution time: 41.633 ms

(5 rows)

 

Time: 42.676 ms

 

 

 

很显然都会做全表扫描

 

二、创建btree索引

 

PostgreSQL默认索引是btree

 

CREATE INDEX ix_gallery_map_author ON gallery_map (author);

 

EXPLAIN ANALYZE select * from gallery_map where author = '曹志耘'; 

                QUERY PLAN               

-------------------------------------------------------------------------------------------------------------------------------------

 Bitmap Heap Scan on gallery_map (cost=36.36..2715.37 rows=1025 width=621) (actual time=0.457..1.312 rows=1031 loops=1)

 Recheck Cond: ((author)::text = '曹志耘'::text)

 Heap Blocks: exact=438

 -> Bitmap Index Scan on ix_gallery_map_author (cost=0.00..36.10 rows=1025 width=0) (actual time=0.358..0.358 rows=1031 loops=1)

   Index Cond: ((author)::text = '曹志耘'::text)

 Planning time: 0.416 ms

 Execution time: 1.422 ms

(7 rows)

 

Time: 2.462 ms

 

EXPLAIN ANALYZE select * from gallery_map where author like '曹志耘';

                QUERY PLAN               

-------------------------------------------------------------------------------------------------------------------------------------

 Bitmap Heap Scan on gallery_map (cost=36.36..2715.37 rows=1025 width=621) (actual time=0.752..2.119 rows=1031 loops=1)

 Filter: ((author)::text ~~ '曹志耘'::text)

 Heap Blocks: exact=438

 -> Bitmap Index Scan on ix_gallery_map_author (cost=0.00..36.10 rows=1025 width=0) (actual time=0.560..0.560 rows=1031 loops=1)

   Index Cond: ((author)::text = '曹志耘'::text)

 Planning time: 0.270 ms

 Execution time: 2.295 ms

(7 rows)

 

Time: 3.444 ms

EXPLAIN ANALYZE select * from gallery_map where author like '曹志耘%';

             QUERY PLAN            

-----------------------------------------------------------------------------------------------------------------

 Seq Scan on gallery_map (cost=0.00..7002.32 rows=1028 width=621) (actual time=0.015..41.389 rows=1031 loops=1)

 Filter: ((author)::text ~~ '曹志耘%'::text)

 Rows Removed by Filter: 71315

 Planning time: 0.260 ms

 Execution time: 41.518 ms

(5 rows)

 

Time: 42.430 ms

EXPLAIN ANALYZE select * from gallery_map where author like '%研究室';

             QUERY PLAN            

-----------------------------------------------------------------------------------------------------------------

 Seq Scan on gallery_map (cost=0.00..7002.32 rows=2282 width=621) (actual time=0.064..52.824 rows=2152 loops=1)

 Filter: ((author)::text ~~ '%研究室'::text)

 Rows Removed by Filter: 70194

 Planning time: 0.254 ms

 Execution time: 53.064 ms

(5 rows)

 

Time: 53.954 ms

 

 

 

可以看到,等于、like的全匹配是用到索引的,like的模糊查询还是全表扫描

(编辑:焦作站长网)

【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容!

    热点阅读