Boosting Scores in Solr with eDisMax and Printing the Score Explanation

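This article introduces Solr's DisMax and eDisMax query parsers and how to use them, focusing on how the eDisMax boost parameter adjusts the weight of search results. Worked examples show how document scores change under different configurations.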

First, the definition of the DisMax query parser:

The DisMax query parser is designed to process simple phrases (without complex syntax) entered by users and to search for individual terms across several fields using different weighting (boosts) based on the significance of each field. Additional options enable users to influence the score based on rules specific to each use case (independent of user input).

Next, the definition of eDisMax (the Extended DisMax query parser):

The Extended DisMax (eDisMax) query parser is an improved version of the DisMax query parser. It includes an improved boost function: in Extended DisMax, the boost function is a multiplier rather than an addend, improving your boost results; the additive boost functions of DisMax (bf and bq) are also supported.

In addition to all the DisMax parameters, Extended DisMax includes these query parameters:
[The boost Parameter]

A multivalued list of strings parsed as queries with scores multiplied by the score from the main query for all matching documents. This parameter is shorthand for wrapping the query produced by eDisMax using the BoostQParserPlugin.

In other words, the boost parameter multiplies the original score by the value of the parameter, which may simply be the name of a field.


For example, suppose the following data is imported from MySQL into Solr:

+----+-------------------------+--------+
| id | keyword                 | weight |
+----+-------------------------+--------+
|  3 | 中国                    |    1.0 |
|  4 | 美国人民                |    1.0 |
|  5 | 人民群众                |    1.0 |
|  6 | 美国人民                |    1.0 |
|  7 | 中国人民解放军          |    2.0 |
|  8 | 中国很好,美国也不错     |   10.0 |
|  9 | chinese people          |    1.0 |
| 10 | my god, you are chinese |    1.0 |
| 11 | you are chinese people  |    1.0 |
| 12 | 中国中国                |    1.0 |
+----+-------------------------+--------+

When running a query, set debugQuery to print the score explanation: either add debugQuery=true under Raw Query Parameters or tick the debugQuery checkbox on the query screen.

For example, querying the keyword directly, with no boost applied:
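The same query can also be sent outside the Admin UI by putting debugQuery=true on the request URL. The following is a minimal sketch using only the Python standard library; the host, port, and core name (`mycore`) are assumptions for illustration and do not come from the original setup.

```python
from urllib.parse import urlencode

# Assumed endpoint: adjust host, port, and core name to your installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"


def build_debug_query_url(query: str) -> str:
    """Return a select URL that asks Solr to include the score explanation."""
    params = {
        "q": query,
        "wt": "json",
        "debugQuery": "true",  # adds the "debug" section to the response
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_debug_query_url("keyword:中国")
print(url)
```

urlencode percent-encodes the colon and the Chinese characters, so the resulting string is safe to paste into a browser or an HTTP client.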



The scoring details in the response:

"debug": {
    "rawquerystring": "keyword:中国",
    "querystring": "keyword:中国",
    "parsedquery": "(keyword:中国 keyword:china)/no_coord",
    "parsedquery_toString": "keyword:中国 keyword:china",
    "explain": {
      "3": "\n0.7724356 = sum of:\n  0.7724356 = weight(keyword:中国 in 0) [ClassicSimilarity], result of:\n    0.7724356 = score(doc=0,freq=1.0), product of:\n      0.4562129 = queryWeight, product of:\n        1.6931472 = idf(docFreq=4, maxDocs=10)\n        0.2694467 = queryNorm\n      1.6931472 = fieldWeight in 0, product of:\n        1.0 = tf(freq=1.0), with freq of:\n          1.0 = termFreq=1.0\n        1.6931472 = idf(docFreq=4, maxDocs=10)\n        1.0 = fieldNorm(doc=0)\n",
      "7": "\n0.24138615 = sum of:\n  0.24138615 = weight(keyword:中国 in 4) [ClassicSimilarity], result of:\n    0.24138615 = score(doc=4,freq=1.0), product of:\n      0.4562129 = queryWeight, product of:\n        1.6931472 = idf(docFreq=4, maxDocs=10)\n        0.2694467 = queryNorm\n      0.5291085 = fieldWeight in 4, product of:\n        1.0 = tf(freq=1.0), with freq of:\n          1.0 = termFreq=1.0\n        1.6931472 = idf(docFreq=4, maxDocs=10)\n        0.3125 = fieldNorm(doc=4)\n",
      "8": "\n0.3862178 = sum of:\n  0.3862178 = weight(keyword:中国 in 5) [ClassicSimilarity], result of:\n    0.3862178 = score(doc=5,freq=1.0), product of:\n      0.4562129 = queryWeight, product of:\n        1.6931472 = idf(docFreq=4, maxDocs=10)\n        0.2694467 = queryNorm\n      0.8465736 = fieldWeight in 5, product of:\n        1.0 = tf(freq=1.0), with freq of:\n          1.0 = termFreq=1.0\n        1.6931472 = idf(docFreq=4, maxDocs=10)\n        0.5 = fieldNorm(doc=5)\n",
      "12": "\n0.54619443 = sum of:\n  0.54619443 = weight(keyword:中国 in 9) [ClassicSimilarity], result of:\n    0.54619443 = score(doc=9,freq=2.0), product of:\n      0.4562129 = queryWeight, product of:\n        1.6931472 = idf(docFreq=4, maxDocs=10)\n        0.2694467 = queryNorm\n      1.1972358 = fieldWeight in 9, product of:\n        1.4142135 = tf(freq=2.0), with freq of:\n          2.0 = termFreq=2.0\n        1.6931472 = idf(docFreq=4, maxDocs=10)\n        0.5 = fieldNorm(doc=9)\n"
    },
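The numbers in this explain output can be reproduced by hand. The sketch below assumes the classic TF-IDF formulas that are consistent with the values shown (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); other Lucene versions or similarities compute these terms differently. queryNorm and fieldNorm are copied from the output rather than derived, and the function names are hypothetical.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf as it appears in the explain output: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_score(freq: float, doc_freq: int, max_docs: int,
                  query_norm: float, field_norm: float) -> float:
    """score = queryWeight * fieldWeight, following the explain tree."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm          # idf * queryNorm
    tf = math.sqrt(freq)                     # tf(freq) = sqrt(termFreq)
    field_weight = tf * idf * field_norm     # tf * idf * fieldNorm
    return query_weight * field_weight


QUERY_NORM = 0.2694467  # copied from the explain output

# (termFreq, fieldNorm) per document id, copied from the explain output.
docs = {"3": (1.0, 1.0), "7": (1.0, 0.3125), "8": (1.0, 0.5), "12": (2.0, 0.5)}

scores = {
    doc_id: classic_score(freq, 4, 10, QUERY_NORM, field_norm)
    for doc_id, (freq, field_norm) in docs.items()
}
for doc_id, score in scores.items():
    print(doc_id, round(score, 7))
```

The differences between the four documents come from only two inputs: termFreq (id 12 contains the term twice) and fieldNorm, which is smaller for longer keyword values such as id 7.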


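After switching to the edismax query parser and setting the boost parameter (this example uses the weight column as the boost, so the final score is the original Lucene score multiplied by weight), for example: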
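Expressed as request parameters, this configuration might look like the sketch below. defType=edismax selects the parser and boost=weight names the field used as the multiplier; the helper name is hypothetical and the resulting query string would be appended to your own select URL.

```python
from urllib.parse import urlencode


def build_edismax_boost_params(query: str, boost_field: str) -> str:
    """Return a query string that runs `query` through eDisMax with a field boost."""
    params = {
        "q": query,
        "defType": "edismax",   # use the Extended DisMax query parser
        "boost": boost_field,   # multiply each score by this field's value
        "debugQuery": "true",   # include the score explanation
    }
    return urlencode(params)


query_string = build_edismax_boost_params("keyword:中国", "weight")
print(query_string)
```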



Scoring details:

"debug": {
    "rawquerystring": "keyword:中国",
    "querystring": "keyword:中国",
    "parsedquery": "BoostedQuery(boost(+(keyword:中国 keyword:china),float(weight)))",
    "parsedquery_toString": "boost(+(keyword:中国 keyword:china),float(weight))",
    "explain": {
      "3": "\n0.7724356 = boost(keyword:中国 keyword:china,float(weight)), product of:\n  0.7724356 = sum of:\n    0.7724356 = weight(keyword:中国 in 0) [ClassicSimilarity], result of:\n      0.7724356 = score(doc=0,freq=1.0), product of:\n        0.4562129 = queryWeight, product of:\n          1.6931472 = idf(docFreq=4, maxDocs=10)\n          0.2694467 = queryNorm\n        1.6931472 = fieldWeight in 0, product of:\n          1.0 = tf(freq=1.0), with freq of:\n            1.0 = termFreq=1.0\n          1.6931472 = idf(docFreq=4, maxDocs=10)\n          1.0 = fieldNorm(doc=0)\n  1.0 = float(weight)=1.0\n",
      "7": "\n0.4827723 = boost(keyword:中国 keyword:china,float(weight)), product of:\n  0.24138615 = sum of:\n    0.24138615 = weight(keyword:中国 in 4) [ClassicSimilarity], result of:\n      0.24138615 = score(doc=4,freq=1.0), product of:\n        0.4562129 = queryWeight, product of:\n          1.6931472 = idf(docFreq=4, maxDocs=10)\n          0.2694467 = queryNorm\n        0.5291085 = fieldWeight in 4, product of:\n          1.0 = tf(freq=1.0), with freq of:\n            1.0 = termFreq=1.0\n          1.6931472 = idf(docFreq=4, maxDocs=10)\n          0.3125 = fieldNorm(doc=4)\n  2.0 = float(weight)=2.0\n",
      "8": "\n3.862178 = boost(keyword:中国 keyword:china,float(weight)), product of:\n  0.3862178 = sum of:\n    0.3862178 = weight(keyword:中国 in 5) [ClassicSimilarity], result of:\n      0.3862178 = score(doc=5,freq=1.0), product of:\n        0.4562129 = queryWeight, product of:\n          1.6931472 = idf(docFreq=4, maxDocs=10)\n          0.2694467 = queryNorm\n        0.8465736 = fieldWeight in 5, product of:\n          1.0 = tf(freq=1.0), with freq of:\n            1.0 = termFreq=1.0\n          1.6931472 = idf(docFreq=4, maxDocs=10)\n          0.5 = fieldNorm(doc=5)\n  10.0 = float(weight)=10.0\n",
      "12": "\n0.54619443 = boost(keyword:中国 keyword:china,float(weight)), product of:\n  0.54619443 = sum of:\n    0.54619443 = weight(keyword:中国 in 9) [ClassicSimilarity], result of:\n      0.54619443 = score(doc=9,freq=2.0), product of:\n        0.4562129 = queryWeight, product of:\n          1.6931472 = idf(docFreq=4, maxDocs=10)\n          0.2694467 = queryNorm\n        1.1972358 = fieldWeight in 9, product of:\n          1.4142135 = tf(freq=2.0), with freq of:\n            2.0 = termFreq=2.0\n          1.6931472 = idf(docFreq=4, maxDocs=10)\n          0.5 = fieldNorm(doc=9)\n  1.0 = float(weight)=1.0\n"
    },


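As the output shows, the record with id 7 has a weight of 2.0 and its score doubled (0.24138615 to 0.4827723), while the record with id 8 has a weight of 10.0 and its score increased tenfold (0.3862178 to 3.862178).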
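The multiplication can be checked directly against the two explain outputs. This is a small sketch that copies the unboosted scores and the weight values from the output above and compares the resulting order; the variable and function names are illustrative only.

```python
# Unboosted scores and weight values, copied from the explain output.
base_scores = {"3": 0.7724356, "7": 0.24138615, "8": 0.3862178, "12": 0.54619443}
weights = {"3": 1.0, "7": 2.0, "8": 10.0, "12": 1.0}

# With boost=weight, the final score is the base score times the field value.
boosted = {doc_id: base_scores[doc_id] * weights[doc_id] for doc_id in base_scores}


def ranking(scores: dict) -> list:
    """Return document ids ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


print(ranking(base_scores))
print(ranking(boosted))
```

With the boost applied, id 8 moves from third place to first, while id 7 stays last even though its score doubled.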
