Controlling Elasticsearch search result scoring with Painless scripts
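Elasticsearch offers several ways to control the relevance of search results, and implementing custom scoring logic with a script is the most powerful of them. The script returns a score, which is then combined with the original _score by addition or another operation. Writing such a script is simple. Let's follow an example (based on version 5.5) to see how a script can produce a custom ranking.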
Create the user_info index with a single name field that is not analyzed:
    PUT user_info
    {
      "mappings": {
        "user": {
          "properties": {
            "name": {
              "type": "keyword"
            }
          }
        }
      }
    }
Index the test data:
    POST /user_info/_bulk
    {"index" : {"_type":"user","_id":"1"}}
    {"name":"高波"}
    {"index" : {"_type":"user","_id":"2"}}
    {"name":"高大山"}
    {"index" : {"_type":"user","_id":"3"}}
    {"name":"闫高峰"}
    {"index" : {"_type":"user","_id":"4"}}
    {"name":"李高峰"}
    {"index" : {"_type":"user","_id":"5"}}
    {"name":"安建高"}
    {"index" : {"_type":"user","_id":"6"}}
    {"name":"高峰玉成"}
For a search on the keyword 高, the ideal order of results is:
高波
高大山
高峰玉成
李高峰
闫高峰
安建高
Run the search:
    GET user_info/_search?size=20
    {
      "query": {
        "query_string": {
          "query": "(name:(*高*))"
        }
      }
    }
The response is as follows:
    {
      "hits": {
        "total": 6,
        "max_score": 1.0,
        "hits": [
          {
            "_index": "user_info",
            "_type": "user",
            "_id": "5",
            "_score": 1.0,
            "_source": { "name": "安建高" }
          },
          {
            "_index": "user_info",
            "_type": "user",
            "_id": "AWIEksAoMnf4TVgYEH8P",
            "_score": 1.0,
            "_source": { "name": "高峰玉成" }
          },
          {
            "_index": "user_info",
            "_type": "user",
            "_id": "2",
            "_score": 1.0,
            "_source": { "name": "高大山" }
          },
          {
            "_index": "user_info",
            "_type": "user",
            "_id": "4",
            "_score": 1.0,
            "_source": { "name": "李高峰" }
          },
          {
            "_index": "user_info",
            "_type": "user",
            "_id": "1",
            "_score": 1.0,
            "_source": { "name": "高波" }
          },
          {
            "_index": "user_info",
            "_type": "user",
            "_id": "3",
            "_score": 1.0,
            "_source": { "name": "闫高峰" }
          }
        ]
      }
    }
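Every document scores 1.0, so the order is effectively arbitrary. The order we want follows from two principles:
1. The earlier the keyword appears in the field value, the higher the document should rank.
2. The shorter the field value, the closer the match, so the higher the document should rank.
Below we implement this custom scoring with a script.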
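Elasticsearch supports several scripting languages. After evolving across versions, it introduced its own dedicated language, Painless, in 5.0. Groovy is deprecated, so this example uses Painless, which is supported out of the box. A script can be handed to Elasticsearch in several ways, including through the REST API or by placing it in the config/scripts directory. The default settings are: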
script.inline: false
script.stored: false
script.file: true
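Scripts placed as local files in the config/scripts directory are enabled by default. Any file in the scripts directory is compiled automatically after the node starts. The directory is rescanned every 60 seconds by default, and the interval is set with resource.reload.interval. Write the script file and name it user_info_score.painless: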
    double position_score = 0;
    double similarity_score = 0;

    // Position score: 10 for a match at index 0, one point less per position.
    int pos = doc['name'].value.indexOf(params.keyword);
    if (pos != -1) {
        position_score = 10 - pos;
        if (position_score < 0) position_score = 0; // positions beyond 10 are treated as unimportant
    }

    // Similarity score: 10 minus the length difference between the value and the keyword.
    double similarity = Math.abs(1.0 * doc['name'].value.length() - params.keyword.length());
    similarity_score = 10 - similarity;
    if (similarity_score < 0) similarity_score = 0; // a difference of more than 10 characters is treated as unimportant

    // Adjust the weight of each component here.
    return position_score * 0.6 + similarity_score * 0.4;
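The document value is read with doc['name'].value.
The implementation is simple: compute one score from the keyword position and one from the length similarity, multiply each by its weight, and add the results.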
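To make the arithmetic concrete, here is a minimal Python sketch that mirrors the Painless script on the six test names. It is only an illustration, not something Elasticsearch runs; the function names are made up for this example, and Python's str.find and len stand in for Java's indexOf and length (they agree for these names).

```python
def position_score(name: str, keyword: str) -> float:
    """10 for a match at index 0, one point less per position, floored at 0."""
    pos = name.find(keyword)
    if pos == -1:
        return 0.0
    return max(0.0, 10.0 - pos)


def similarity_score(name: str, keyword: str) -> float:
    """10 minus the length difference between value and keyword, floored at 0."""
    return max(0.0, 10.0 - abs(len(name) - len(keyword)))


def script_score(name: str, keyword: str) -> float:
    """Weighted sum with the same 0.6 / 0.4 weights as the Painless script."""
    return position_score(name, keyword) * 0.6 + similarity_score(name, keyword) * 0.4


names = ["高波", "高大山", "闫高峰", "李高峰", "安建高", "高峰玉成"]
# Print the names from highest to lowest script score.
for name in sorted(names, key=lambda n: script_score(n, "高"), reverse=True):
    print(name, round(script_score(name, "高"), 1))
```

Note that 李高峰 and 闫高峰 get the same value, since both contain the keyword at index 1 and have the same length; the formula alone does not break that tie.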
Query again:
    GET user_info/_search?size=20
    {
      "query": {
        "function_score": {
          "query": {
            "query_string": {
              "query": "(name:(*高*))"
            }
          },
          "script_score": {
            "script": {
              "lang": "painless",
              "file": "user_info_score",
              "params": {
                "keyword": "高"
              }
            }
          },
          "boost_mode": "sum"
        }
      }
    }
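One detail worth noticing in this request is that the keyword appears twice: once inside the wildcard query string and once in the script params. The following Python sketch builds the same request body from a single keyword so the two cannot drift apart. build_scored_query is a hypothetical helper written for this post, not part of any Elasticsearch client, and it does not escape query_string special characters.

```python
import json


def build_scored_query(keyword: str, script_file: str = "user_info_score") -> dict:
    """Build the function_score request body shown above (hypothetical helper)."""
    return {
        "query": {
            "function_score": {
                # Main query: wildcard match on the name field.
                "query": {
                    "query_string": {"query": f"(name:(*{keyword}*))"}
                },
                # Scoring script: the same keyword is passed in as a parameter.
                "script_score": {
                    "script": {
                        "lang": "painless",
                        "file": script_file,
                        "params": {"keyword": keyword},
                    }
                },
                "boost_mode": "sum",
            }
        }
    }


body = build_scored_query("高")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON can be pasted as the body of the GET user_info/_search request.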
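The function_score query is the ultimate tool for controlling the scoring process. It applies a function to every document that matches the main query, in order to modify or even completely replace the original query's _score.
script_score: uses a custom script to take full control of the score calculation and implement whatever logic is needed.
lang: specifies the scripting language.
inline, stored, file: specify the source of the script. This example uses a file, and the script name is user_info_score. Note that the .painless extension must be dropped here.
params: specifies any named parameters that are passed into the script as variables.
boost_mode: how the newly computed score is combined with _score. Possible values:
multiply: multiply the two scores (default)
replace: replace _score with the new score
sum: add the two scores
avg: take the average
max: take the larger value
min: take the smaller value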
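The boost_mode options can be pictured as a small combining function. The Python sketch below is an illustration of the list above, not Elasticsearch's actual code, and combine is a name made up for this example.

```python
def combine(query_score: float, function_score: float, boost_mode: str = "multiply") -> float:
    """Combine the original _score with the function score, as boost_mode describes."""
    modes = {
        "multiply": lambda q, f: q * f,
        "replace": lambda q, f: f,       # the original _score is ignored
        "sum": lambda q, f: q + f,
        "avg": lambda q, f: (q + f) / 2,
        "max": lambda q, f: max(q, f),
        "min": lambda q, f: min(q, f),
    }
    return modes[boost_mode](query_score, function_score)


# The wildcard query scores every hit 1.0; the script returns 9.6 for 高波.
for mode in ("multiply", "replace", "sum", "avg", "max", "min"):
    print(mode, round(combine(1.0, 9.6, mode), 2))
```

With a constant query score of 1.0, as in this example, sum simply shifts every script score up by one, so the ranking is decided entirely by the script.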
Search results:
    {
      "hits": {
        "total": 6,
        "max_score": 10.6,
        "hits": [
          {
            "_index": "user_info",
            "_type": "user",
            "_id": "1",
            "_score": 10.6,
            "_source": { "name": "高波" }
          },
          {
            "_index": "user_info",
            "_type": "user",
            "_id": "2",
            "_score": 10.2,
            "_source": { "name": "高大山" }
          },
          {
            "_index": "user_info",
            "_type": "user",
            "_id": "AWIEksAoMnf4TVgYEH8P",
            "_score": 9.8,
            "_source": { "name": "高峰玉成" }
          },
          {
            "_index": "user_info",
            "_type": "user",
            "_id": "4",
            "_score": 9.6,
            "_source": { "name": "李高峰" }
          },
          {
            "_index": "user_info",
            "_type": "user",
            "_id": "3",
            "_score": 9.6,
            "_source": { "name": "闫高峰" }
          },
          {
            "_index": "user_info",
            "_type": "user",
            "_id": "5",
            "_score": 9.0,
            "_source": { "name": "安建高" }
          }
        ]
      }
    }
Painless syntax references:
https://www.elastic.co/guide/en/elasticsearch/painless/5.5/painless-syntax.html
https://www.elastic.co/guide/en/elasticsearch/painless/5.5/index.html
https://www.elastic.co/guide/en/elasticsearch/painless/5.5/painless-api-reference.html
Other references:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/modules-scripting-using.html#modules-scripting-file-scripts
(When reposting, please credit the author and the source, easyice.cn. Not for commercial use.)
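2 thoughts on "Controlling Elasticsearch search result scoring with Painless scripts"
Painless performance is poor. Think twice before going down this path.
Right, scripts will certainly perform much worse. Best not to use them unless there is no other option.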