Itheima (黑马程序员) Technical Exchange Community

Title: A Brief Discussion

Author: 小伙伴m    Time: 2019-3-21 15:58
The requirement goes like this:

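Certain keywords can be highlighted, and any number (n) of highlight keywords can be configured. The final results should be sorted by how many of the highlight conditions each one matches, so the results with the most highlighted keywords are shown first. The search results are paginated.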
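
The ranking rule itself can be written down in a few lines. The sketch below is a hypothetical client-side version (KeywordRanking and its methods are illustrative names, not from the post). It counts how many of the configured keywords each document contains and sorts in descending order, using plain case-sensitive substring matching instead of analyzed tokens. Because the results are paginated, a reordering like this would have to happen before a page is sliced out, which is one reason to let Elasticsearch do the ranking instead.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: ranks documents by how many distinct keywords they contain.
// Plain substring matching; Elasticsearch would match analyzed tokens instead.
final class KeywordRanking {

    // Counts how many of the given keywords appear at least once in the text.
    static int matchedKeywords(String text, List<String> keywords) {
        int count = 0;
        for (String keyword : keywords) {
            if (text.contains(keyword)) {
                count++;
            }
        }
        return count;
    }

    // Returns a new list in which documents matching more keywords come first.
    // List.sort is stable, so documents with equal counts keep their original order.
    static List<String> rank(List<String> docs, List<String> keywords) {
        List<String> ranked = new ArrayList<>(docs);
        ranked.sort(Comparator.comparingInt((String doc) -> matchedKeywords(doc, keywords)).reversed());
        return ranked;
    }

    // Returns one page of an already ranked list: ranking must happen before paging.
    static List<String> page(List<String> ranked, int from, int size) {
        if (from < 0 || size < 0) {
            throw new IllegalArgumentException("from and size must be non-negative");
        }
        int end = Math.min(ranked.size(), from + size);
        return from >= end ? List.of() : ranked.subList(from, end);
    }
}
```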

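First, here is the query that is built once the conditions are set. It is JSON and a bit long. The important part is the highlight configuration at the end.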

{
  "from" : 0,
  "size" : 50,
  "query" : {
    "bool" : {
      "must" : [
        {
          "query_string" : {
            "query" : "\" /usr/libexec/atrun)\" \"88555\" \"18:00:00\" \"85464\" \"CMD\" \"root\" \"newsyslog\"",
            "default_field" : "content",
            "fields" : [ ],
            "use_dis_max" : true,
            "tie_breaker" : 0.0,
            "default_operator" : "or",
            "auto_generate_phrase_queries" : false,
            "max_determinized_states" : 10000,
            "enable_position_increments" : true,
            "fuzziness" : "AUTO",
            "fuzzy_prefix_length" : 0,
            "fuzzy_max_expansions" : 50,
            "phrase_slop" : 0,
            "escape" : false,
            "split_on_whitespace" : true,
            "boost" : 1.0
          }
        }
      ],
      "filter" : [
        {
          "bool" : {
            "must" : [
              {
                "range" : {
                  "start_at" : {
                    "from" : 1552298160,
                    "to" : 1552300042,
                    "include_lower" : true,
                    "include_upper" : false,
                    "boost" : 1.0
                  }
                }
              }
            ],
            "disable_coord" : false,
            "adjust_pure_negative" : true,
            "boost" : 1.0
          }
        }
      ],
      "disable_coord" : false,
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  },
  "highlight" : {
    "pre_tags" : [
      "<span class=\"highlight\">"
    ],
    "post_tags" : [
      "</span>"
    ],
    "order" : "score",
    "fields" : {
      "content" : {
        "fragment_size" : 1000
      }
    }
  }
}
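Everything inside query is search criteria, which you can set however your use case requires: for example, the from/to time range, or searching one field for n keywords at the same time.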
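
Searching one field for n keywords shows up in the query value of query_string above. Each keyword is wrapped in double quotes so it is matched as a phrase, and the quoted keywords are separated by spaces, so with default_operator "or" any of them can match. A hypothetical helper that produces a value in that shape might look like the following. QueryStringBuilder is an illustrative name, not from the post. The helper also escapes backslashes and quotes inside a keyword.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: builds a query_string value from n keywords by quoting each one
// (escaping \ and " inside it) and joining the quoted keywords with spaces.
final class QueryStringBuilder {
    static String phraseQuery(List<String> keywords) {
        return keywords.stream()
                .map(k -> "\"" + k.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
                .collect(Collectors.joining(" "));
    }
}
```

For example, passing the keywords 88555 and CMD yields the string "88555" "CMD", including the quotes.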

What the highlight settings mean:

pre_tags and post_tags together form one tag pair, which wraps each highlighted keyword.
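
Conceptually, the highlighter surrounds each matched term with pre_tags and post_tags. The sketch below does the same thing with plain string replacement, using the tag pair from the query above. It is only an illustration with a hypothetical class name: the real highlighter works on the analyzed tokens of the field, not on raw substrings.

```java
import java.util.List;

// Illustration only: wraps every occurrence of each keyword with the configured tags,
// the way pre_tags/post_tags appear around matches in highlight fragments.
final class TagWrapper {
    static final String PRE_TAG = "<span class=\"highlight\">";
    static final String POST_TAG = "</span>";

    static String wrap(String text, List<String> keywords) {
        String result = text;
        for (String keyword : keywords) {
            // Naive: assumes keywords do not overlap and do not occur inside the tags themselves.
            result = result.replace(keyword, PRE_TAG + keyword + POST_TAG);
        }
        return result;
    }
}
```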

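order is the setting that is easiest to misread. According to the Elasticsearch highlighting documentation, "score" sorts the highlighted fragments inside each hit by relevance, most relevant first. By default, fragments come back in the order they appear in the field. order does not reorder the hits themselves. The hits are ranked by the query's score, and with this OR-style query_string a document that matches more of the keywords tends to score higher. That is what produces the "most highlights first" effect, as discussed further below.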
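
The two order values can be modeled for the fragments of a single hit. The sketch is hypothetical: FragmentOrder and Fragment are illustrative names, and the fragment scores are inputs here, whereas Elasticsearch computes them internally. It shows that "score" only rearranges fragments within one hit and leaves the hit ranking alone.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of the highlight "order" setting for the fragments of ONE hit.
final class FragmentOrder {
    record Fragment(String text, float score) {}

    static List<Fragment> order(List<Fragment> fragments, String order) {
        if ("none".equals(order)) {
            return fragments; // keep the order in which fragments appear in the field
        }
        if ("score".equals(order)) {
            List<Fragment> sorted = new ArrayList<>(fragments);
            sorted.sort(Comparator.comparingDouble(Fragment::score).reversed()); // best fragment first
            return sorted;
        }
        throw new IllegalArgumentException("Unknown order [" + order + "]");
    }
}
```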

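fields specifies which fields to highlight. Inside it, fragment_size sets the maximum length, in characters, of each highlighted fragment that is displayed.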
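
A rough sketch of what fragment_size limits: the field text is cut into pieces of at most fragment_size characters, and only pieces that contain a match are returned. Fragmenter is a hypothetical name. The fixed-width cutting is a simplification of my own: the real highlighter chooses fragment boundaries by its own rules, and in this sketch a keyword that straddles a cut is missed.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration of fragment_size: fixed-width windows of at most
// fragmentSize characters, keeping only windows that contain a keyword.
final class Fragmenter {
    static List<String> fragments(String text, int fragmentSize, List<String> keywords) {
        if (fragmentSize <= 0) {
            throw new IllegalArgumentException("fragmentSize must be positive");
        }
        List<String> result = new ArrayList<>();
        for (int start = 0; start < text.length(); start += fragmentSize) {
            String piece = text.substring(start, Math.min(text.length(), start + fragmentSize));
            for (String keyword : keywords) {
                if (piece.contains(keyword)) {
                    result.add(piece);
                    break; // one match is enough to keep this fragment
                }
            }
        }
        return result;
    }
}
```

With fragment_size set to 1000, as in the query above, a short log line fits into a single fragment.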

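This is hard to follow in words alone, so let's look at screenshots. The figure below shows the query results and the n query parameters that were passed in.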

[Figure: the search page with the results and the n keyword parameters]

And here is the query response:

[Figure: the query response]

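As you can see, 48 hits were found in total (the total field), and max_score is 10.9. The first element of the hits array has a score equal to max_score, and the scores decrease from there. So the hits array is sorted by score in descending order. On the page, the first row is element 0 of this array.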
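
The ordering visible in the response can be expressed as a small check. This is a hypothetical helper with an illustrative name. It takes the hit scores of the first page (from = 0) in response order, plus max_score, and confirms that the scores start at max_score and never increase.

```java
import java.util.List;

// Hypothetical check: hits of the first page are in descending score order
// and the first hit's score equals max_score.
final class ScoreOrderCheck {
    static boolean isSortedByScore(List<Float> scores, float maxScore) {
        if (scores.isEmpty()) {
            return true;
        }
        if (Float.compare(scores.get(0), maxScore) != 0) {
            return false;
        }
        for (int i = 1; i < scores.size(); i++) {
            if (scores.get(i) > scores.get(i - 1)) {
                return false; // a later hit scored higher than an earlier one
            }
        }
        return true;
    }
}
```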

The highlight settings in the code:

[Figure: the highlight settings in the code]

This must be the string "score", not "_score". The string is converted into an enum value, and that value is what drives the order setting.
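
The exact spelling matters, and a hypothetical enum can show why. HighlightOrder is an illustrative name, not the client library's type. The sketch assumes that the conversion compares the upper-cased string with the enum names and falls back to NONE for anything else. Under that assumption, "_score" would silently turn fragment sorting off instead of raising an error. Check the HighlightBuilder source of the version you use before relying on this.

```java
import java.util.Locale;

// Hypothetical mirror of the string-to-enum conversion described above.
// Assumption: unrecognized values fall back to NONE instead of throwing.
enum HighlightOrder {
    NONE, SCORE;

    static HighlightOrder fromString(String value) {
        if (value != null && value.toUpperCase(Locale.ROOT).equals(SCORE.name())) {
            return SCORE;
        }
        return NONE; // "_score" ends up here
    }
}
```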

[Figure: element 0 of the hits array, showing highlightFields]

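Now look at the query result again, taking element 0 of the array. It contains highlightFields, which holds the content field and its fragments. The matching keywords have been wrapped in the tags we passed in. Note that the text was also tokenized: 18:00:00 looks like a single term to us, but ES split it into 18 and 00 and tagged each piece.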
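
The split into 18 and 00 comes from analysis. The sketch below is not how ik works, since ik_max_word is dictionary-based. It is a deliberately naive, hypothetical tokenizer that splits on every character that is not a letter or digit. The point it shows is that a value which reads as one term can become several separate terms, and the highlighter then wraps each term on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Not the ik analyzer: a naive tokenizer that splits on every non-letter/digit character.
final class NaiveTokenizer {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) { // split keeps a leading empty string, e.g. for "/usr"
                tokens.add(part);
            }
        }
        return tokens;
    }
}
```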

Some of you may say: I set all of this too, so why aren't my results in the expected order?

The likely cause is not knowing when this score is actually computed.

First, look at this query. It is almost the same as the one above, with one extra sort.

{
  "from" : 0,
  "size" : 50,
  "query" : {
    "bool" : {
      "must" : [
        {
          "query_string" : {
            "query" : "\" /usr/libexec/atrun)\" \"88555\" \"18:00:00\" \"85464\" \"CMD\" \"root\" \"newsyslog\"",
            "default_field" : "content",
            "fields" : [ ],
            "use_dis_max" : true,
            "tie_breaker" : 0.0,
            "default_operator" : "or",
            "auto_generate_phrase_queries" : false,
            "max_determinized_states" : 10000,
            "enable_position_increments" : true,
            "fuzziness" : "AUTO",
            "fuzzy_prefix_length" : 0,
            "fuzzy_max_expansions" : 50,
            "phrase_slop" : 0,
            "escape" : false,
            "split_on_whitespace" : true,
            "boost" : 1.0
          }
        }
      ],
      "filter" : [
        {
          "bool" : {
            "must" : [
              {
                "range" : {
                  "start_at" : {
                    "from" : 1552298160,
                    "to" : 1552300042,
                    "include_lower" : true,
                    "include_upper" : false,
                    "boost" : 1.0
                  }
                }
              }
            ],
            "disable_coord" : false,
            "adjust_pure_negative" : true,
            "boost" : 1.0
          }
        }
      ],
      "disable_coord" : false,
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  },
  "sort" : [
    {
      "start_at" : {
        "order" : "asc",
        "unmapped_type" : "string"
      }
    }
  ],
  "highlight" : {
    "pre_tags" : [
      "<span class=\"highlight\">"
    ],
    "post_tags" : [
      "</span>"
    ],
    "order" : "score",
    "fields" : {
      "content" : {
        "fragment_size" : 1000
      }
    }
  }
}
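Compared with the first request, this one has an extra sort clause at the top level, next to query. It sorts by the start_at field in ascending order.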

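Because the request now specifies a sort field, Elasticsearch no longer computes the score, so none of the returned hits get scored (unless track_scores is set to true).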

[Figure: the response of the query with sort]

Remove this sort, and ES scores the hits against the content field and returns them ordered by that score.
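
The difference between the two requests can be modeled in a few lines. HitOrdering and Hit are hypothetical types. Following the Elasticsearch sort documentation, the sketch assumes that a field sort skips score computation unless track_scores is true, and it represents the missing score as NaN. Without a sort, the hits are ordered by score, highest first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of how the two requests order their hits.
final class HitOrdering {
    record Hit(String id, long startAt, float score) {}

    static List<Hit> order(List<Hit> hits, boolean sortByStartAt) {
        List<Hit> result = new ArrayList<>();
        if (sortByStartAt) {
            // Field sort: no score is computed, so every hit reports NaN.
            for (Hit hit : hits) {
                result.add(new Hit(hit.id(), hit.startAt(), Float.NaN));
            }
            result.sort(Comparator.comparingLong(Hit::startAt)); // "order": "asc"
        } else {
            // No sort: hits are ranked by score, highest first.
            result.addAll(hits);
            result.sort(Comparator.comparingDouble(Hit::score).reversed());
        }
        return result;
    }
}
```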

Finally:

The Elasticsearch version used is 5.6.

I used Spring Data Elasticsearch 3.0. Its source code has a few problems, for example the ID generation rule, which I changed.

Here is the index mapping and other index information. The content field is of type text and is analyzed with ik (ik_max_word).

{
  "state": "open",
  "settings": {
    "index": {
      "creation_date": "1552295229216",
      "number_of_shards": "3",
      "number_of_replicas": "0",
      "uuid": "p4aoDav2QuaV89L5CvP_lQ",
      "version": {
        "created": "5060699"
      },
      "provided_name": "ezs_log_2019-03-11"
    }
  },
  "mappings": {
    "message": {
      "_all": {
        "enabled": false
      },
      "properties": {
        "firm": {
          "type": "keyword"
        },
        "file": {
          "type": "keyword"
        },
        "logSearchLevel": {
          "type": "keyword"
        },
        "topic": {
          "type": "keyword"
        },
        "id": {
          "type": "text"
        },
        "source": {
          "type": "ip"
        },
        "start_at": {
          "type": "long"
        },
        "content": {
          "analyzer": "ik_max_word",
          "type": "text"
        },
        "key": {
          "type": "text",
          "fields": {
            "keyword": {
              "ignore_above": 256,
              "type": "keyword"
            }
          }
        }
      }
    }
  },
  "aliases": [],
  "primary_terms": {
    "0": 1,
    "1": 1,
    "2": 1
  },
  "in_sync_allocations": {
    "0": [
      "-RGEY80ORX-jm5jMD4hyHg"
    ],
    "1": [
      "gWYUT_sBTCmhFjtaqV4qwQ"
    ],
    "2": [
      "qh5eDAT-Tb6241m43LR2MA"
    ]
  }
}

Welcome to the Itheima (黑马程序员) Technical Exchange Community (http://bbs.itheima.com/)
