cai-jinlin commented on PR #59014:
URL: https://github.com/apache/doris/pull/59014#issuecomment-3649182995
`// Core computation: count the positions at which two equal-length strings differ
static int64_t hamming_distance_impl(const char* s1, const char* s2, size_t len) {
    int64_t count = 0;
    for (size_t i = 0; i < len; ++i) {
        count += (s1[i] != s2[i]); // add 1 when the characters differ
    }
    return count;
}
// Vectorized execution function (Doris BE core)
Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& args,
                    size_t result_idx, size_t rows) override {
    // Fetch the input columns
    const ColumnPtr& col_left = block.get_by_position(args[0]).column;
    const ColumnPtr& col_right = block.get_by_position(args[1]).column;
    // Prepare the result column and its null map
    auto result_col = ColumnInt64::create();
    auto null_map = ColumnUInt8::create();
    auto& results = result_col->get_data();
    auto& nulls = null_map->get_data();
    results.resize(rows);
    nulls.resize_fill(rows, 0);
    // Process every row in the batch
    for (size_t i = 0; i < rows; ++i) {
        // Check for NULL values (Doris handles this automatically; checked explicitly here)
        if (col_left->is_null_at(i) || col_right->is_null_at(i)) {
            nulls[i] = 1;
            continue;
        }
        // Fetch the two strings
        auto s1 = col_left->get_data_at(i);
        auto s2 = col_right->get_data_at(i);
        // Return NULL when the lengths differ
        if (s1.size != s2.size) {
            nulls[i] = 1;
            continue;
        }
        // Compute the Hamming distance
        results[i] = hamming_distance_impl(s1.data, s2.data, s1.size);
    }
    // Wrap the result as a nullable column
    block.get_by_position(result_idx).column =
            ColumnNullable::create(std::move(result_col), std::move(null_map));
    return Status::OK();
}`
(By definition, the Hamming distance is the number of positions at which two equal-length strings have different characters.)
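
To make the definition easy to check outside the BE, here is a minimal standalone sketch of the same per-row logic. It is not Doris code: the name `hamming_distance` and the use of `std::optional` to stand in for SQL NULL are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical standalone helper, not part of Doris.
// Returns std::nullopt where the snippet above would mark the row NULL
// (inputs of different length); otherwise returns the number of differing positions.
std::optional<int64_t> hamming_distance(std::string_view a, std::string_view b) {
    if (a.size() != b.size()) {
        return std::nullopt;
    }
    int64_t count = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // Byte-wise comparison, same as hamming_distance_impl above.
        count += (a[i] != b[i]);
    }
    return count;
}
```

For example, `hamming_distance("karolin", "kathrin")` yields 3, and `hamming_distance("abc", "ab")` yields `std::nullopt`. Because the comparison is byte-wise, a multi-byte UTF-8 character can contribute more than 1 to the count; whether the function should compare bytes or code points is left open here.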