We use a 100K-dimensional vector to represent a query.
But obviously, that is too large, and it still can't cover all the words.
So we propose a word hashing method.
E.g. for the word "shirt":
First, we pad the word with a pair of '#' boundary markers.
#shirt#
Then, we take every window of 3 consecutive letters.
#sh, shi, hir, irt, rt#
Finally, the word is represented as a vector of letter n-grams (here, trigrams).
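
The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the function names `letter_ngrams` and `hash_query`, the lowercasing, and the whitespace tokenization are assumptions made for the example.

```python
from collections import Counter


def letter_ngrams(word, n=3, boundary="#"):
    """Pad the word with boundary markers and slide a window of n letters."""
    padded = f"{boundary}{word}{boundary}"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def hash_query(query, n=3):
    """Count the letter n-grams of every word in a query (hypothetical helper).

    The result is a sparse bag of n-grams; mapping each n-gram to a fixed
    vector index is left out of this sketch.
    """
    counts = Counter()
    for word in query.lower().split():  # assumption: whitespace tokenization
        counts.update(letter_ngrams(word, n))
    return counts


print(letter_ngrams("shirt"))
# ['#sh', 'shi', 'hir', 'irt', 'rt#']
```

Because the number of distinct letter trigrams is far smaller than the number of distinct words, the resulting vector is much shorter than a word-level one, and unseen words can still be represented through their trigrams.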
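This post introduces Word Hashing, a method for compressing high-dimensional query vectors. It reduces the vector's dimensionality by padding each word and converting it into letter n-grams. For example, the word "shirt" is padded to "#shirt#" and then broken into smaller pieces such as "#sh" and "shi".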




