How Much Do You Know About Elasticsearch Syntax: Match Query (Code Snippets)

身前一尺是我的世界     2022-10-22

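Contents

Goal

ES Version

Official Documentation

Terminology

Creating the Indices and Documents (Data for the Hands-On Examples)

Creating the Indices

Indexing the Documents

Match Query Common Parameters in Practice

Basic Syntax

analyzer (Querying with a Specified Analyzer)

operator (Boolean Logic for Interpreting the Query Terms)

minimum_should_match (Minimum Number of Matches)

fuzzy (Fuzzy Search)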


Goal

Master the match query. This article walks through common cases that show what each match query parameter does and how to use it.


ES Version

7.17.5


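Official Documentation

Match query: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl.html


Terminology

Match query

Returns documents that match a provided text, number, date, or boolean value. The provided text is analyzed before matching. The match query is the standard query for performing a full-text search, and it includes options for fuzzy matching.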


Creating the Indices and Documents (Data for the Hands-On Examples)

Creating the Indices

PUT /student_db
{
  "settings": {
    "index": {
      "analysis.analyzer.default.type": "ik_max_word"
    }
  }
}

PUT /address_list
{
  "mappings": {
    "properties": {
      "province": {
        "type": "text",
        "copy_to": "fullAddress"
      },
      "city": {
        "type": "text",
        "copy_to": "fullAddress"
      },
      "county": {
        "type": "text",
        "copy_to": "fullAddress"
      }
    }
  },
  "settings": {
    "index": {
      "analysis.analyzer.default.type": "ik_max_word"
    }
  }
}

Indexing the Documents

PUT /student_db/_bulk
{"index":{"_id":"1"}}
{"province":"湖南省","city":"长沙市","county":"天心区","describe":"侠客岛服务员A。","stu_id":"10001","stu_name":"张三","age":10,"sex":true,"birthday":"2000-01-01","hobby":["唱歌","跳舞","篮球"],"examination_results":{"Math":{"value":98.5,"level":"优"},"English":{"value":87.5,"level":"良"}}}
{"index":{"_id":"2"}}
{"province":"湖南省","city":"长沙市","county":"芙蓉区","describe":"侠客岛服务员B。","stu_id":"10002","stu_name":"李四","age":12,"sex":true,"birthday":"1998-01-01","hobby":["唱歌","跳舞","游泳"],"examination_results":{"English":{"value":97.5,"level":"优"},"Chinese":{"value":85.5,"level":"良"}}}
{"index":{"_id":"3"}}
{"province":"湖北省","city":"武汉市","county":"江夏区","describe":"会九阳神功、乾坤大挪移、圣火令武功、太极拳,太极剑等武功。","stu_id":"10003","stu_name":"张无忌","age":11,"sex":false,"birthday":"1999-01-01","hobby":["乒乓球","跳舞","游泳"],"examination_results":{"Physics":{"value":77.5,"level":"一般"},"Chinese":{"value":100,"level":"优"}}}
{"index":{"_id":"4"}}
{"province":"湖北省","city":"黄石市","county":"铁山区","describe":"会黯然销魂掌、弹指神功、玉女剑法等武功。","stu_id":"10004","stu_name":"杨过","age":9,"sex":false,"birthday":"2001-01-01","hobby":["乒乓球","唱歌","游泳"],"examination_results":{"Chemistry":{"value":70.5,"level":"一般"},"Chinese":{"value":91.5,"level":"优"}}}
{"index":{"_id":"5"}}
{"province":"广东省","city":"广州市","county":"南沙区","describe":"辽国南院大王,精通降龙十八掌,真正的战神。","stu_id":"10005","stu_name":"萧峰","age":13,"sex":true,"birthday":"1997-01-01","hobby":["篮球","足球","乒乓球"],"examination_results":{"FineArts":{"value":92.5,"level":"优"},"Sports":{"value":91.5,"level":"优"}}}
{"index":{"_id":"6"}}
{"province":"广东省","city":"广州市","county":"南沙区","describe":"精通降龙十八掌,为国为民的侠之大者。","stu_id":"10006","stu_name":"郭靖","age":13,"sex":true,"birthday":"1997-01-01","hobby":["篮球","足球","乒乓球"],"examination_results":{"History":{"value":92.5,"level":"优"},"Chemistry":{"value":91.5,"level":"优"}}}
{"index":{"_id":"7"}}
{"province":"广东省","city":"广州市","county":"白云区","describe":"会降龙十八掌,逍遥派诸多武功。","stu_id":"10007","stu_name":"虚竹","age":14,"sex":false,"birthday":"1996-01-01","hobby":["篮球","足球","乒乓球"],"examination_results":{"History":{"value":90.5,"level":"优"},"Chemistry":{"value":94.5,"level":"优"}}}
{"index":{"_id":"8"}}
{"province":"广东省","city":"广州市","county":"白云区","describe":"会六脉神剑和北冥神功。","stu_id":"10008","stu_name":"段誉","age":14,"sex":false,"birthday":"1996-01-01","hobby":["篮球","足球","乒乓球"],"examination_results":{"History":{"value":90.5,"level":"优"},"Chemistry":{"value":94.5,"level":"优"}}}
{"index":{"_id":"9"}}
{"province":"广东省","city":"广州市","county":"白云区","describe":"以光复大燕国为己任,会斗转星移和参合指。","stu_id":"10009","stu_name":"慕容复","age":15,"sex":false,"birthday":"1995-01-01","hobby":["篮球","游泳","乒乓球"],"examination_results":{"History":{"value":90.5,"level":"优"},"Chemistry":{"value":94.5,"level":"优"}}}
{"index":{"_id":"10"}}
{"province":"广东省","city":"广州市","county":"白云区","describe":"斗转星移的创作者。","stu_id":"10010","stu_name":"慕容龙城","age":15,"sex":false,"birthday":"1995-01-01","hobby":["篮球","游泳","乒乓球"],"examination_results":{"History":{"value":90.5,"level":"优"}}}
{"index":{"_id":"11"}}
{"province":"北京市","city":"朝阳区","county":"三里屯街道","describe":"会少林七十二绝技,以佛法和慈悲度化慕容博和萧远山,是佛法和武功的集大成者。","stu_id":"10011","stu_name":"扫地僧","age":9,"sex":false,"birthday":"2001-01-01","hobby":["篮球","游泳","乒乓球"],"examination_results":{"History":{"value":100,"level":"优"},"Chinese":{"value":100,"level":"优"},"Chemistry":{"value":94.5,"level":"优"},"English":{"value":100,"level":"优"},"Physics":{"value":100,"level":"优"},"Math":{"value":100,"level":"优"}}}
{"index":{"_id":"12"}}
{"province":"湖南省","city":"长沙市","county":"天心区","describe":"九阴真经的作者,武学创作天赋真正的第一人。","stu_id":"10012","stu_name":"黄裳","age":10,"sex":true,"birthday":"2000-01-01","hobby":["唱歌","跳舞","篮球"],"examination_results":{"Math":{"value":98.5,"level":"优"},"English":{"value":87.5,"level":"良"}}}
{"index":{"_id":"13"}}
{"province":"湖南省","city":"长沙市","county":"天心区","describe":"根据九阴真经创作了九阳神功。","stu_id":"10013","stu_name":"斗酒僧","age":10,"sex":true,"birthday":"2000-01-01","hobby":["唱歌","跳舞","篮球"],"examination_results":{"Math":{"value":100,"level":"优"},"English":{"value":100,"level":"优"}}}
{"index":{"_id":"14"}}
{"province":"湖南省","city":"长沙市","county":"天心区","describe":"绝技先天功,大器晚成,第一届华山论剑夺得九阴真经。","stu_id":"10014","stu_name":"王重阳","age":10,"sex":true,"birthday":"2000-01-01","hobby":["唱歌","跳舞","篮球"],"examination_results":{"Math":{"value":100,"level":"优"},"English":{"value":100,"level":"优"}}}

PUT /address_list/_bulk
{"index":{"_id":"1"}}
{"province":"湖南省","city":"长沙市","county":"天心区"}
{"index":{"_id":"2"}}
{"province":"湖南省","city":"长沙市","county":"芙蓉区"}
{"index":{"_id":"3"}}
{"province":"广东省","city":"广州市","county":"白云区"}
{"index":{"_id":"4"}}
{"province":"湖北省","city":"武汉市","county":"江夏区"}
{"index":{"_id":"5"}}
{"province":"内蒙古自治区","city":"呼和浩特","county":"玉泉区"}

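The bulk requests above follow a fixed shape: one action line, then one document line, with a newline after every line. As a rough sketch of how such a body could be generated, the hypothetical helper `to_bulk_body` below (not part of Elasticsearch or of the original article) builds it from a list of (id, document) pairs. Giving every document its own `_id` matters, since a repeated `_id` overwrites the earlier document.

```python
import json


def to_bulk_body(docs):
    """Build an NDJSON _bulk body: one action line, then one source line, per document."""
    lines = []
    for doc_id, source in docs:
        # Action line: index the document on the next line under this _id.
        lines.append(json.dumps({"index": {"_id": str(doc_id)}}))
        # Source line: the document itself; ensure_ascii=False keeps the Chinese text readable.
        lines.append(json.dumps(source, ensure_ascii=False))
    # A bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


addresses = [
    (1, {"province": "湖南省", "city": "长沙市", "county": "天心区"}),
    (2, {"province": "湖南省", "city": "长沙市", "county": "芙蓉区"}),
]
body = to_bulk_body(addresses)
print(body, end="")
```

The resulting string can be pasted under `PUT /address_list/_bulk` in the console or sent as the request body by any HTTP client.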
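Match Query Common Parameters in Practice

Basic Syntax

Requirement: run a full-text search on the describe field for the value 真经.

Step 1: analyze 真经 with the ik analyzer. The result is a single token: "真经".

POST _analyze
{
  "analyzer": "ik_max_word",
  "text": "真经"
}

Step 2: run the match query.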

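# Method 1: shorthand form
GET /student_db/_search
{
  "query": {
    "match": {
      "describe": "真经"
    }
  }
}

# Method 2: full form, which leaves room for extra parameters
GET /student_db/_search
{
  "query": {
    "match": {
      "describe": {
        "query": "真经"
      }
    }
  }
}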
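The two forms above express the same search; the full form simply puts the text under "query" so that other parameters can sit beside it. A minimal sketch of building that body in Python follows. The helper name `match_query` is an assumption made for illustration, not a client-library API.

```python
import json


def match_query(field, text, **params):
    """Return a match query body in the full form; extra parameters sit next to "query"."""
    return {"query": {"match": {field: {"query": text, **params}}}}


# Method 1 (shorthand) written out by hand, for comparison.
shorthand = {"query": {"match": {"describe": "真经"}}}
# Method 2 (full form) produced by the helper.
full_form = match_query("describe", "真经")
# Any match parameter can be passed as a keyword argument.
with_analyzer = match_query("describe", "真经", analyzer="standard")

print(json.dumps(full_form, ensure_ascii=False, indent=2))
```

Keeping every request in the full form means adding a parameter later is a one-argument change.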
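analyzer (Querying with a Specified Analyzer)

Requirement: run a full-text search on the describe field for the value 真经, using the standard analyzer.

Step 1: analyze 真经 with the standard analyzer. The result is two tokens: "真" and "经".

POST _analyze
{
  "analyzer": "standard",
  "text": "真经"
}

Step 2: run the match query with the standard analyzer specified.

GET /student_db/_search
{
  "query": {
    "match": {
      "describe": {
        "query": "真经",
        "analyzer": "standard"
      }
    }
  }
}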
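operator (Boolean Logic for Interpreting the Query Terms)

Requirement: run a match query for "湖南天心区" against fullAddress, the field that combines province, city, and county. Use AND and OR to demonstrate the parameter.

Step 1: analyze "湖南天心区" with the ik analyzer. The resulting tokens are "湖南", "南天", "天心区", "天心", and "区".

POST _analyze
{
  "analyzer": "ik_max_word",
  "text": "湖南天心区"
}

Step 2: run the query with operator set to AND, then to OR. AND returns nothing, because no address in fullAddress, once analyzed by the ik analyzer, contains all of "湖南", "南天", "天心区", "天心", and "区". OR does return documents, because a document matches as soon as its analyzed address contains any one of those tokens. Note: the default value of this parameter is OR.

GET /address_list/_search
{
  "query": {
    "match": {
      "fullAddress": {
        "query": "湖南天心区",
        "operator": "AND"
      }
    }
  }
}

GET /address_list/_search
{
  "query": {
    "match": {
      "fullAddress": {
        "query": "湖南天心区",
        "operator": "OR"
      }
    }
  }
}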
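The difference between AND and OR can be pictured as a check over the query tokens. The sketch below is only a model of that logic, not how Elasticsearch evaluates the query internally. The query tokens come from the _analyze call above, while the document tokens are an assumed ik_max_word output for 湖南省 / 长沙市 / 天心区; run _analyze on the real values to confirm them.

```python
def matches(query_tokens, doc_tokens, operator="OR"):
    """Model of the match query operator: AND needs every token, OR needs at least one."""
    hits = [token in doc_tokens for token in query_tokens]
    if operator.upper() == "AND":
        return all(hits)
    return any(hits)


# Tokens produced by ik_max_word for the query text "湖南天心区" (from the article).
query_tokens = ["湖南", "南天", "天心区", "天心", "区"]
# ASSUMPTION: illustrative tokens for one indexed address, not verified analyzer output.
doc_tokens = {"湖南省", "湖南", "省", "长沙市", "长沙", "市", "天心区", "天心", "区"}

print(matches(query_tokens, doc_tokens, "AND"))
print(matches(query_tokens, doc_tokens, "OR"))
```

Under these assumed tokens, the token "南天" is the one that never appears in the address, which is why AND fails while OR succeeds.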
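minimum_should_match (Minimum Number of Matches)

Requirement 1: run a match query for "湖南天心区" against the combined province, city, and county field. Set the minimum number of matches to 3, 2, and 1 in turn and compare the results.

Step 1: analyze "湖南天心区" with the ik analyzer. The resulting tokens are "湖南", "南天", "天心区", "天心", and "区". That is 5 tokens, or in other words 5 clauses.

POST _analyze
{
  "analyzer": "ik_max_word",
  "text": "湖南天心区"
}

Step 2: run the queries. The larger the number, the more precise the results; the smaller the number, the more documents are returned. In production this value needs to be chosen with care.

GET /address_list/_search
{
  "query": {
    "match": {
      "fullAddress": {
        "query": "湖南天心区",
        "minimum_should_match": 3
      }
    }
  }
}

GET /address_list/_search
{
  "query": {
    "match": {
      "fullAddress": {
        "query": "湖南天心区",
        "minimum_should_match": 2
      }
    }
  }
}

GET /address_list/_search
{
  "query": {
    "match": {
      "fullAddress": {
        "query": "湖南天心区",
        "minimum_should_match": 1
      }
    }
  }
}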
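Requirement 2: run a match query for "湖南天心区" against the combined province, city, and county field. Set minimum_should_match to 1<60% and then to 1<59% and compare the results.

Step 1: as in Requirement 1, the ik analyzer splits "湖南天心区" into "湖南", "南天", "天心区", "天心", and "区".

Step 2: run the queries. minimum_should_match=1<60% returns 1 document, and minimum_should_match=1<59% returns 2 documents. The form 1<60% means: if there is at most 1 clause, every clause is required; if there are more than 1, the percentage applies. Here there are 5 clauses, so 60% requires floor(5 × 0.60) = 3 matching clauses, while 59% requires floor(5 × 0.59) = floor(2.95) = 2.

GET /address_list/_search
{
  "query": {
    "match": {
      "fullAddress": {
        "query": "湖南天心区",
        "minimum_should_match": "1<60%"
      }
    }
  }
}

GET /address_list/_search
{
  "query": {
    "match": {
      "fullAddress": {
        "query": "湖南天心区",
        "minimum_should_match": "1<59%"
      }
    }
  }
}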
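Appendix

The official documentation describes several kinds of values this parameter accepts, for example a percentage of the number of tokens. The table below lists how each one is used.

| Type | Example value | Description |
| --- | --- | --- |
| Positive integer | 3 | At least 3 tokens must match. |
| Negative integer | -2 | minimum_should_match = clause count + this negative integer. The smaller (more negative) the value, the more documents are returned. If its absolute value reaches or exceeds the clause count, the effective value is 1. |
| Positive percentage | 75% | At least 75% of the clauses must match, rounded down: minimum_should_match = floor(clause count × percentage). With 4 clauses, 3 must match; with 5 clauses, 3 are still enough. |
| Negative percentage | -25% | At least (100% - 25%) of the clauses must match: minimum_should_match = ceil(clause count × (100% + negative percentage)), which is the same as the clause count minus floor(clause count × 25%). With 4 clauses, 3 must match; with 5 clauses, 4 must match. |
| Combination | 1<60% | See the walkthrough in Requirement 2. |
| Multiple combinations | 2<60% 9<-4 | Conditions are separated by spaces. With 2 clauses or fewer, every clause must match; with 3 to 9 clauses, 60% must match; with more than 9 clauses, minimum_should_match = clause count - 4. |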
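To check the arithmetic in the table, the rules can be written as a small function. This is a sketch that follows the rules as described here and in the official documentation (including the statement that the result never goes below 1 or above the clause count); it is not Elasticsearch's own code, and the name `required_matches` is made up for illustration.

```python
import re


def required_matches(clause_count, spec):
    """Number of clauses that must match for a given minimum_should_match value (sketch)."""
    spec = re.sub(r"\s*<\s*", "<", str(spec).strip())
    if "<" in spec:
        # At or below the first bound, every clause is required.
        required = clause_count
        for part in spec.split():
            bound, rule = part.split("<", 1)
            if clause_count <= int(bound):
                break
            # Above this bound, its rule applies unless a later bound is also exceeded.
            required = required_matches(clause_count, rule)
        return required
    if spec.endswith("%"):
        percent = int(spec[:-1])
        # The computed share is rounded down; a negative percentage is the share allowed to miss.
        share = clause_count * abs(percent) // 100
        required = share if percent >= 0 else clause_count - share
    else:
        number = int(spec)
        required = number if number >= 0 else clause_count + number
    # Documented bounds: never lower than 1, never higher than the clause count.
    return max(1, min(clause_count, required))


for spec in (3, -2, "75%", "-25%", "1<60%", "1<59%", "2<60% 9<-4"):
    print(spec, "->", required_matches(5, spec))
```

Integer arithmetic is used for the percentages so that a case such as 5 × 60% lands exactly on 3 instead of depending on floating-point rounding.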

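fuzzy (Fuzzy Search)

Parameters

fuzziness (edit distance): the number of operations needed to turn the search keyword into the corresponding value in the document. An operation is an insertion, a deletion, a substitution, or a swap of two adjacent characters.

# Substitute "制" with "治": 1 edit.
内蒙古自制区->内蒙古自治区
# Insert "治区": 2 edits.
内蒙古自->内蒙古自治区
# Delete "区": 1 edit.
内蒙古自治区区->内蒙古自治区
# Swap "治自" to "自治": 1 edit.
内蒙古治自区->内蒙古自治区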
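The four counts above can be reproduced with a short dynamic-programming routine. The sketch below implements the optimal string alignment variant of Damerau-Levenshtein distance, which counts insertions, deletions, substitutions, and adjacent swaps as one edit each. It is written for illustration and is not taken from Elasticsearch or Lucene.

```python
def edit_distance(a, b):
    """Edits needed to turn a into b: insert, delete, substitute, or swap adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] holds the distance between the first i characters of a and the first j of b.
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution, or no change when characters match
            )
            # Adjacent transposition counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


target = "内蒙古自治区"
for keyword in ("内蒙古自制区", "内蒙古自", "内蒙古自治区区", "内蒙古治自区"):
    print(keyword, "->", target, ":", edit_distance(keyword, target))
```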

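A value of 0 disables fuzzy matching; in a match query, fuzziness is off unless this parameter is set. A value of 1 allows one edit. For example, if the field value in the document is "内蒙古自治区", the search terms "内蒙古古自治区", "内蒙股自治区", "内蒙自治区", and "内蒙古治自区" can all find that document, because each of them is only one edit away. Points to note:

  1. The maximum fuzziness is 2.
  2. With "AUTO", a search term of length 2 or less must match exactly.
  3. With "AUTO", a search term of length 3 to 5 allows 1 edit.
  4. With "AUTO", a search term longer than 5 allows 2 edits.
  5. The official documentation recommends "AUTO", which picks the edit distance from the term length as described in items 2 to 4.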
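The length thresholds in the list can be summarised in a few lines. The function below is a sketch of the AUTO rule as the list describes it, assuming the default thresholds of 3 and 6; the function name and the `low`/`high` parameter names are chosen for this example only.

```python
def auto_fuzziness(term, low=3, high=6):
    """Edits allowed for a term under AUTO, assuming the default thresholds 3 and 6."""
    length = len(term)  # counted in characters, so each Chinese character counts as 1
    if length < low:
        return 0  # length 0 to 2: must match exactly
    if length < high:
        return 1  # length 3 to 5: one edit allowed
    return 2      # length above 5: two edits allowed


for term in ("湖南", "湖南省", "内蒙古自治", "内蒙古自治区"):
    print(term, len(term), auto_fuzziness(term))
```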

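prefix_length (prefix length): in a fuzzy search, the leading characters of the search term must match exactly and are not subject to edits. This parameter sets how many characters that covers (default 0).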
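How fuzziness and prefix_length interact can be modelled as two checks: the first prefix_length characters must be identical, and the edit distance must stay within fuzziness. The sketch below is a simplified model under that assumption. The function names are hypothetical, and the comparison is against a single candidate term, whereas Elasticsearch compares against the terms actually stored in the index after analysis.

```python
from functools import lru_cache


def edit_distance(a, b):
    """Edit distance with adjacent swaps counted as one edit (memoized recursion)."""
    @lru_cache(maxsize=None)
    def dist(i, j):
        # Distance between the first i characters of a and the first j characters of b.
        if i == 0 or j == 0:
            return i + j
        best = min(
            dist(i - 1, j) + 1,
            dist(i, j - 1) + 1,
            dist(i - 1, j - 1) + (0 if a[i - 1] == b[j - 1] else 1),
        )
        if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
            best = min(best, dist(i - 2, j - 2) + 1)
        return best

    return dist(len(a), len(b))


def fuzzy_candidate(keyword, term, fuzziness, prefix_length=0):
    """True if term could match keyword: identical prefix and distance within fuzziness."""
    if keyword[:prefix_length] != term[:prefix_length]:
        return False
    return edit_distance(keyword, term) <= fuzziness


# ASSUMPTION: the candidate terms below are treated as single indexed terms for illustration.
print(fuzzy_candidate("湖x省", "湖南省", fuzziness=1))
print(fuzzy_candidate("湖x省", "湖南省", fuzziness=1, prefix_length=2))
print(fuzzy_candidate("内蒙自治区", "内蒙古自治区", fuzziness=1, prefix_length=2))
```

In this model the second call fails only because the edit falls inside the protected two-character prefix, which is the effect prefix_length is meant to have.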

Requirement: run fuzzy searches on the province field with different keywords, varying the edit distance and the prefix length to get familiar with both parameters.

GET /address_list/_search
{
  "query": {
    "fuzzy": {
      "province": {
        "value": "湖x省",
        "fuzziness": 1
      }
    }
  }
}

GET /address_list/_search
{
  "query": {
    "fuzzy": {
      "province": {
        "value": "湖x省",
        "fuzziness": 1,
        "prefix_length": 2
      }
    }
  }
}

GET /address_list/_search
{
  "query": {
    "fuzzy": {
      "province": {
        "value": "内蒙自治区",
        "fuzziness": 1,
        "prefix_length": 2
      }
    }
  }
}

GET /address_list/_search
{
  "query": {
    "fuzzy": {
      "province": {
        "value": "内蒙古字智区",
        "fuzziness": 2,
        "prefix_length": 2
      }
    }
  }
}

GET /address_list/_search
{
  "query": {
    "fuzzy": {
      "province": {
        "value": "内蒙古治自区",
        "fuzziness": 1,
        "prefix_length": 2
      }
    }
  }
}