首先看一下DisMax query parser的定义:
The DisMax query parser is designed to process simple phrases (without complex syntax) entered by users and to search for individual terms across several fields using different weighting (boosts) based on the significance of each field. Additional options
enable users to influence the score based on rules specific to each use case
(independent of user input).
再看eDisMax(The Extended DisMax Query Parser)的定义:
The Extended DisMax (eDisMax) query parser is an improved version of the DisMax query parser,includes improved boost function: in Extended DisMax, the boost function is a multiplier rather than an addend, improving your boost results; the additive boost functions of DisMax (bf and bq) are also supported.
In addition to all the DisMax parameters, Extended DisMax includes these query parameters:
【The boost Parameter 】
A multivalued list of strings parsed as queries with scores multiplied by the score from the main query for all matching documents. This parameter is shorthand for wrapping the query produced by eDisMax using the BoostQParserPlugin
即通过boost参数可以在原有的评分基础上再乘以这个参数,该参数可以为某个field。
比如从Mysql中向solr导入以下数据:
+----+--------------------------------+--------+
| id | keyword | weight |
+----+--------------------------------+--------+
| 3 | 中国 | 1.0 |
| 4 | 美国人民 | 1.0 |
| 5 | 人民群众 | 1.0 |
| 6 | 美国人民 | 1.0 |
| 7 | 中国人民解放军 | 2.0 |
| 8 | 中国很好,美国也不错 | 10.0 |
| 9 | chinese people | 1.0 |
| 10 | my god, you are chinese | 1.0 |
| 11 | you are chinese people | 1.0 |
| 12 | 中国中国 | 1.0 |
+----+--------------------------------+--------+
在执行查询时,可以通过设置debugQuery来打印评分规则(可以在Raw Query Parameters中设置debugQuery=true或者直接勾选debugQuery如下图所示),
例如,不进行boost提分,直接查询关键词:
返回结果中的评分详情:
"debug": {
"rawquerystring": "keyword:中国",
"querystring": "keyword:中国",
"parsedquery": "(keyword:中国 keyword:china)/no_coord",
"parsedquery_toString": "keyword:中国 keyword:china",
"explain": {
"3": "\n0.7724356 = sum of:\n 0.7724356 = weight(keyword:中国 in 0) [ClassicSimilarity], result of:\n 0.7724356 = score(doc=0,freq=1.0), product of:\n 0.4562129 = queryWeight, product of:\n 1.6931472 = idf(docFreq=4, maxDocs=10)\n
0.2694467 = queryNorm\n 1.6931472 = fieldWeight in 0, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 1.6931472 = idf(docFreq=4, maxDocs=10)\n 1.0 = fieldNorm(doc=0)\n",
"7": "\n0.24138615 = sum of:\n 0.24138615 = weight(keyword:中国 in 4) [ClassicSimilarity], result of:\n 0.24138615 = score(doc=4,freq=1.0), product of:\n 0.4562129 = queryWeight, product of:\n 1.6931472 = idf(docFreq=4, maxDocs=10)\n
0.2694467 = queryNorm\n 0.5291085 = fieldWeight in 4, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 1.6931472 = idf(docFreq=4, maxDocs=10)\n 0.3125 = fieldNorm(doc=4)\n",
"8": "\n0.3862178 = sum of:\n 0.3862178 = weight(keyword:中国 in 5) [ClassicSimilarity], result of:\n 0.3862178 = score(doc=5,freq=1.0), product of:\n 0.4562129 = queryWeight, product of:\n 1.6931472 = idf(docFreq=4, maxDocs=10)\n
0.2694467 = queryNorm\n 0.8465736 = fieldWeight in 5, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 1.6931472 = idf(docFreq=4, maxDocs=10)\n 0.5 = fieldNorm(doc=5)\n",
"12": "\n0.54619443 = sum of:\n 0.54619443 = weight(keyword:中国 in 9) [ClassicSimilarity], result of:\n 0.54619443 = score(doc=9,freq=2.0), product of:\n 0.4562129 = queryWeight, product of:\n 1.6931472 = idf(docFreq=4, maxDocs=10)\n
0.2694467 = queryNorm\n 1.1972358 = fieldWeight in 9, product of:\n 1.4142135 = tf(freq=2.0), with freq of:\n 2.0 = termFreq=2.0\n 1.6931472 = idf(docFreq=4, maxDocs=10)\n 0.5 = fieldNorm(doc=9)\n"
},
当设置 edismax query方式以及boost参数以后(本例中用weight 列作为要提分的权重,lucene的原始评分乘以这个权重为最终得分),如:
评分详情:
"debug": {
"rawquerystring": "keyword:中国",
"querystring": "keyword:中国",
"parsedquery": "BoostedQuery(boost(+(keyword:中国 keyword:china),float(weight)))",
"parsedquery_toString": "boost(+(keyword:中国 keyword:china),float(weight))",
"explain": {
"3": "\n0.7724356 = boost(keyword:中国 keyword:china,float(weight)), product of:\n 0.7724356 = sum of:\n 0.7724356 = weight(keyword:中国 in 0) [ClassicSimilarity], result of:\n 0.7724356 = score(doc=0,freq=1.0), product of:\n 0.4562129 = queryWeight,
product of:\n 1.6931472 = idf(docFreq=4, maxDocs=10)\n 0.2694467 = queryNorm\n 1.6931472 = fieldWeight in 0, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 1.6931472 = idf(docFreq=4,
maxDocs=10)\n 1.0 = fieldNorm(doc=0)\n 1.0 = float(weight)=1.0\n",
"7": "\n0.4827723 = boost(keyword:中国 keyword:china,float(weight)), product of:\n 0.24138615 = sum of:\n 0.24138615 = weight(keyword:中国 in 4) [ClassicSimilarity], result of:\n 0.24138615 = score(doc=4,freq=1.0), product of:\n 0.4562129
= queryWeight, product of:\n 1.6931472 = idf(docFreq=4, maxDocs=10)\n 0.2694467 = queryNorm\n 0.5291085 = fieldWeight in 4, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 1.6931472
= idf(docFreq=4, maxDocs=10)\n 0.3125 = fieldNorm(doc=4)\n 2.0 = float(weight)=2.0\n",
"8": "\n3.862178 = boost(keyword:中国 keyword:china,float(weight)), product of:\n 0.3862178 = sum of:\n 0.3862178 = weight(keyword:中国 in 5) [ClassicSimilarity], result of:\n 0.3862178 = score(doc=5,freq=1.0), product of:\n 0.4562129 = queryWeight,
product of:\n 1.6931472 = idf(docFreq=4, maxDocs=10)\n 0.2694467 = queryNorm\n 0.8465736 = fieldWeight in 5, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 1.6931472 = idf(docFreq=4,
maxDocs=10)\n 0.5 = fieldNorm(doc=5)\n 10.0 = float(weight)=10.0\n",
"12": "\n0.54619443 = boost(keyword:中国 keyword:china,float(weight)), product of:\n 0.54619443 = sum of:\n 0.54619443 = weight(keyword:中国 in 9) [ClassicSimilarity], result of:\n 0.54619443 = score(doc=9,freq=2.0), product of:\n 0.4562129
= queryWeight, product of:\n 1.6931472 = idf(docFreq=4, maxDocs=10)\n 0.2694467 = queryNorm\n 1.1972358 = fieldWeight in 9, product of:\n 1.4142135 = tf(freq=2.0), with freq of:\n 2.0 = termFreq=2.0\n 1.6931472
= idf(docFreq=4, maxDocs=10)\n 0.5 = fieldNorm(doc=9)\n 1.0 = float(weight)=1.0\n"
},
可以看到id为7的记录其weight为2.0, 评分提升了两倍,id为8的记录其weight为10.0, 评分提升了10倍.
本文介绍Solr中DisMax和eDisMax查询解析器的功能及使用方法,特别是如何利用eDisMax中的boost参数调整搜索结果的权重,通过实例展示了不同配置下查询结果的评分变化。

4万+

被折叠的 条评论
为什么被折叠?



