Exploring the Redis key hashing source code

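While working on a Lua-based solution, I ran into the restriction that multi-key operations in a Redis cluster require all keys to be in the same slot.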

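According to what I found, the restriction is meant to avoid the Max redirect exception: in some scenarios a node can block (for longer than cluster-node-timeout) and be marked as offline.
My guess is that a node is blocked while it executes a multi-key command, so without the single-slot restriction every node that owns one of the key slots would be affected by the blocking.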

Exploring the commands

Definitions

The redisCommandTable in server.c defines the function that implements each command.
The set command maps to setCommand, which is defined in t_string.c.
The cluster command maps to clusterCommand, which is defined in cluster.c.

https://github.com/antirez/redis/blob/6.0.0/src/server.c

line:182
struct redisCommand redisCommandTable[] = {
    ...
    {"set",setCommand,-3,
     "write use-memory @string",
     0,NULL,1,1,1,0,0,0},
     ...
     {"cluster",clusterCommand,-2,
     "admin ok-stale random",
     0,NULL,0,0,0,0,0,0},
     ...
}
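
To make the table's role concrete, here is a minimal sketch of a name-to-function table with a lookup and an arity check. Every identifier prefixed with demo is hypothetical and not part of Redis. The real server stores commands in a dict rather than scanning an array, and struct redisCommand has many more fields. The sketch assumes the usual reading of the third column: a positive arity means an exact argument count, a negative one means "at least that many", with the command name counted as an argument.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>  /* strcasecmp (POSIX) */

/* Hypothetical, trimmed-down stand-in for struct redisCommand. */
typedef int (*demoProc)(int argc, const char **argv);

struct demoCommand {
    const char *name;
    demoProc proc;
    int arity;  /* >0: argc must match exactly; <0: argc must be >= -arity */
};

/* Placeholder command bodies: they only report how many args they got. */
int demoSet(int argc, const char **argv) { (void)argv; return argc; }
int demoCluster(int argc, const char **argv) { (void)argv; return argc; }

struct demoCommand demoCommandTable[] = {
    {"set", demoSet, -3},
    {"cluster", demoCluster, -2},
};

/* Case-insensitive linear lookup by command name; NULL if unknown. */
struct demoCommand *demoLookupCommand(const char *name) {
    assert(name != NULL);
    size_t n = sizeof(demoCommandTable) / sizeof(demoCommandTable[0]);
    for (size_t i = 0; i < n; i++)
        if (strcasecmp(name, demoCommandTable[i].name) == 0)
            return &demoCommandTable[i];
    return NULL;
}

/* Returns 1 when argc satisfies the command's arity rule, else 0. */
int demoArityOk(const struct demoCommand *cmd, int argc) {
    assert(cmd != NULL);
    if (cmd->arity > 0) return argc == cmd->arity;
    return argc >= -cmd->arity;
}
```

With this table, looking up "SET" returns the entry whose proc is demoSet, and "set key" (argc 2) fails the arity check while "set key value" (argc 3) passes.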

Tracing the `set` command

Open t_string.c and follow the call chain:
setCommand -> setGenericCommand -> genericSetKey. The function genericSetKey is defined in db.c.
https://github.com/antirez/redis/blob/6.0.0/src/t_string.c

line:96
/* SET key value [NX] [XX] [KEEPTTL] [EX <seconds>] [PX <milliseconds>] */
void setCommand(client *c) {
    ...
    setGenericCommand(c,flags,c->argv[1],c->argv[2],expire,unit,NULL,NULL);
}

line:68
void setGenericCommand(client *c, int flags, robj *key, robj *val, robj *expire, int unit, robj *ok_reply, robj *abort_reply) {
    ...
    genericSetKey(c,c->db,key,val,flags & OBJ_SET_KEEPTTL,1);
    ...
    addReply(c, ok_reply ? ok_reply : shared.ok);
}

Open db.c and follow the call chain:
genericSetKey -> dbAdd -> slotToKeyAdd -> slotToKeyUpdateKey -> keyHashSlot
At last we reach the function that computes the hash. keyHashSlot is defined in cluster.c.
https://github.com/antirez/redis/blob/6.0.0/src/db.c

line:244
void genericSetKey(client *c, redisDb *db, robj *key, robj *val, int keepttl, int signal) {
    if (lookupKeyWrite(db,key) == NULL) {
        dbAdd(db,key,val);
    } else {
        dbOverwrite(db,key,val);
    }
    ...
}

line:179
void dbAdd(redisDb *db, robj *key, robj *val) {
    ...
    if (server.cluster_enabled) slotToKeyAdd(key->ptr);
}

line:1691
void slotToKeyAdd(sds key) {
    slotToKeyUpdateKey(key,1);
}

line:1672
void slotToKeyUpdateKey(sds key, int add) {
    ...
    unsigned int hashslot = keyHashSlot(key,keylen);
    ...
}

Tracing the `cluster keyslot` command

The cluster keyslot command returns the slot of a key in the cluster. As the code shows, it also calls keyHashSlot.
https://github.com/antirez/redis/blob/6.0.0/src/cluster.c

line:4251
void clusterCommand(client *c) {
    ...
    } else if (!strcasecmp(c->argv[1]->ptr,"keyslot") && c->argc == 3) {
        /* CLUSTER KEYSLOT <key> */
        sds key = c->argv[2]->ptr;

        addReplyLongLong(c,keyHashSlot(key,sdslen(key)));
    }
    ...
}

The hash function

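Open cluster.c to see how the hash is computed.
The function and its comment show that when a key contains a '{' followed by a '}' with at least one character between them, only the content between the braces is hashed. Otherwise the whole key is hashed.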
https://github.com/antirez/redis/blob/6.0.0/src/cluster.c

line:694
/* -----------------------------------------------------------------------------
 * Key space handling
 * -------------------------------------------------------------------------- */

/* We have 16384 hash slots. The hash slot of a given key is obtained
 * as the least significant 14 bits of the crc16 of the key.
 *
 * However if the key contains the {...} pattern, only the part between
 * { and } is hashed. This may be useful in the future to force certain
 * keys to be in the same node (assuming no resharding is in progress). */
unsigned int keyHashSlot(char *key, int keylen) {
    int s, e; /* start-end indexes of { and } */

    for (s = 0; s < keylen; s++)
        if (key[s] == '{') break;

    /* No '{' ? Hash the whole key. This is the base case. */
    if (s == keylen) return crc16(key,keylen) & 0x3FFF;

    /* '{' found? Check if we have the corresponding '}'. */
    for (e = s+1; e < keylen; e++)
        if (key[e] == '}') break;

    /* No '}' or nothing between {} ? Hash the whole key. */
    if (e == keylen || e == s+1) return crc16(key,keylen) & 0x3FFF;

    /* If we are here there is both a { and a } on its right. Hash
     * what is in the middle between { and }. */
    return crc16(key+s+1,e-s-1) & 0x3FFF;
}
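
To try the hashing outside of Redis, the following is a self-contained sketch, not the Redis source. It assumes that crc16 is the CRC16-CCITT (XMODEM) variant (polynomial 0x1021, initial value 0) and computes it bit by bit instead of using a lookup table. The names demoCrc16, demoKeyHashSlot and demoSlotOf are hypothetical; the slot logic mirrors keyHashSlot above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0.
 * Assumption: this matches the crc16() that keyHashSlot calls. */
uint16_t demoCrc16(const char *buf, int len) {
    uint16_t crc = 0;
    for (int i = 0; i < len; i++) {
        crc ^= (uint16_t)((unsigned char)buf[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Same decision logic as keyHashSlot: hash only the text between the
 * first '{' and the next '}' when that text is non-empty. */
unsigned int demoKeyHashSlot(const char *key, int keylen) {
    assert(key != NULL && keylen >= 0);
    int s, e;

    for (s = 0; s < keylen; s++)
        if (key[s] == '{') break;
    if (s == keylen) return demoCrc16(key, keylen) & 0x3FFF;

    for (e = s + 1; e < keylen; e++)
        if (key[e] == '}') break;
    if (e == keylen || e == s + 1) return demoCrc16(key, keylen) & 0x3FFF;

    return demoCrc16(key + s + 1, e - s - 1) & 0x3FFF;
}

/* Convenience wrapper for NUL-terminated keys. */
unsigned int demoSlotOf(const char *key) {
    return demoKeyHashSlot(key, (int)strlen(key));
}
```

Under these assumptions, demoSlotOf("{user1}:name") and demoSlotOf("{user1}:age") return the same slot as demoSlotOf("user1"), because only "user1" is hashed. The plain keys "a" and "b" land in different slots, which is the situation behind the CROSSSLOT error traced later in this article.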

Tracing the error

> mset a 1 b 2
CROSSSLOT Keys in request don't hash to the same slot

Working backwards along the chain that triggers the error:

Definition

The error is defined in cluster.c
https://github.com/antirez/redis/blob/6.0.0/src/cluster.c

line:5711
void clusterRedirectClient(client *c, clusterNode *n, int hashslot, int error_code) {
    if (error_code == CLUSTER_REDIR_CROSS_SLOT) {
        addReplySds(c,sdsnew("-CROSSSLOT Keys in request don't hash to the same slot\r\n"));
    } ...
}

Trigger

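The error is triggered in the function clusterNode *getNodeByQuery.
The outer loop walks the commands (more than one when the request is a queued MULTI/EXEC transaction) and the inner loop walks the keys of each command, making sure that every key in the request falls into the same slot.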

line:5515
clusterNode *getNodeByQuery(client *c, struct redisCommand *cmd, robj **argv, int argc, int *hashslot, int *error_code) {
    ...
    /* Check that all the keys are in the same hash slot, and obtain this
     * slot and the node associated. */
    for (i = 0; i < ms->count; i++) {
        ...
        for (j = 0; j < numkeys; j++) {
            ...
            if (firstkey == NULL) {
                ...
            } else {
                /* If it is not the first key, make sure it is exactly
                 * the same key as the first we saw. */
                if (!equalStringObjects(firstkey,thiskey)) {
                    if (slot != thisslot) {
                        /* Error: multiple keys from different slots. */
                        getKeysFreeResult(keyindex);
                        if (error_code)
                            *error_code = CLUSTER_REDIR_CROSS_SLOT;
                        return NULL;
                    } else {
                        /* Flag this request as one with multiple different
                         * keys. */
                        multiple_keys = 1;
                    }
                }
            }
            ...
        }
    }
    ...
}
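
The core of that check can be sketched on its own. The version below is a simplified illustration, not the Redis implementation: it takes keys whose slots have already been computed, compares each one against the first key, and reports a cross-slot error on the first mismatch. The names demoKey, demoCommonSlot, DEMO_OK and DEMO_CROSS_SLOT are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_OK 0
#define DEMO_CROSS_SLOT 1

/* A key name paired with its precomputed hash slot. */
struct demoKey {
    const char *name;
    unsigned int slot;
};

/* Returns the slot shared by all keys, or -1 if two different keys map
 * to different slots. *multiple_keys is set to 1 when the request holds
 * at least two distinct keys that share a slot. */
int demoCommonSlot(const struct demoKey *keys, int numkeys,
                   int *multiple_keys, int *error_code) {
    assert(keys != NULL && numkeys > 0);
    const struct demoKey *first = &keys[0];

    if (multiple_keys) *multiple_keys = 0;
    if (error_code) *error_code = DEMO_OK;

    for (int j = 1; j < numkeys; j++) {
        /* The same key repeated is always fine. */
        if (strcmp(first->name, keys[j].name) == 0) continue;

        if (keys[j].slot != first->slot) {
            /* Error: multiple keys from different slots. */
            if (error_code) *error_code = DEMO_CROSS_SLOT;
            return -1;
        }
        if (multiple_keys) *multiple_keys = 1;
    }
    return (int)first->slot;
}
```

Passing two keys with different slots yields -1 with the error code set to DEMO_CROSS_SLOT, which corresponds to the branch that sets CLUSTER_REDIR_CROSS_SLOT above. Two different keys in the same slot return that slot and set the multiple-keys flag.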

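Looking further up, the function is called from several places. Here I focus on server.c.
In cluster mode, command processing performs the redirection check here. If getNodeByQuery finds an error, the reply is sent from clusterRedirectClient.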
https://github.com/antirez/redis/blob/6.0.0/src/server.c

line:3368
int processCommand(client *c) {
    ...
    line:3448
    clusterNode *n = getNodeByQuery(c,c->cmd,c->argv,c->argc,
                                        &hashslot,&error_code);
    ...
    line:3456
    clusterRedirectClient(c,n,hashslot,error_code);
    ...
}

Networking

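Looking further up, processCommand is called only from networking.c.
Things start to feel different here: judging by the names, these functions sit closer to the low-level plumbing.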
https://github.com/antirez/redis/blob/6.0.0/src/networking.c

line:1757
int processCommandAndResetClient(client *c) {
    ...
    if (processCommand(c) == C_OK) {
        commandProcessed(c);
    }
    ...
}

line:1775
void processInputBuffer(client *c) {
    /* Keep processing while there is something in the input buffer */
    while(c->qb_pos < sdslen(c->querybuf)) {
        ...
        /* Multibulk processing could see a <= 0 length. */
        if (c->argc == 0) {
            resetClient(c);
        } else {
            ...
            /* We are finally ready to execute the command. */
            if (processCommandAndResetClient(c) == C_ERR) {
                /* If the client is no longer valid, we avoid exiting this
                 * loop and trimming the client buffer later. So we return
                 * ASAP in that case. */
                return;
            }
        }
    }
    ...
}

line:1858
void readQueryFromClient(connection *conn) {
    ...
    /* There is more data in the client input buffer, continue parsing it
     * in case to check if there is a full command to execute. */
     processInputBuffer(c);
}

line:3091
int handleClientsWithPendingReadsUsingThreads(void) {
    ...
    while((ln = listNext(&li))) {
        client *c = listNodeValue(ln);
        readQueryFromClient(c->conn);
    }
    ...
    /* Run the list of clients again to process the new buffers. */
    while(listLength(server.clients_pending_read)) {
        ...
        if (c->flags & CLIENT_PENDING_COMMAND) {
            c->flags &= ~CLIENT_PENDING_COMMAND;
            if (processCommandAndResetClient(c) == C_ERR) {
                /* If the client is no longer valid, we avoid
                 * processing the client later. So we just go
                 * to the next. */
                continue;
            }
        }
        processInputBuffer(c);
    }
    return processed;
}

Getting stuck

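At this point I had lost my bearings. Following the callers further led only to thread- and I/O-related functions whose mechanics I could not make sense of.
I started searching online, at first for "get/set source code analysis", but could not find what I was looking for: where does the error originate? Where does a normal command get executed?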

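Then, in one article, I realized I had walked right past the place where commands are invoked: int processCommand(client *c).
My tracing approach had also been wrong. I had assumed the error was reported after the command function ran, when in fact the check happens before the command function is executed.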

[Redis source reading] What Redis does when you type a get/set command (original title: ［Redis源码阅读］当你输入get/set命令的时候，Redis做了什么)

https://github.com/antirez/redis/blob/6.0.0/src/server.c

line:3368
int processCommand(client *c) {
    ...
    line:3381
    /* Now lookup the command and check ASAP about trivial error conditions
     * such as wrong arity, bad command name and so forth. */
    c->cmd = c->lastcmd = lookupCommand(c->argv[0]->ptr);
    ...
    line:3448
    clusterNode *n = getNodeByQuery(c,c->cmd,c->argv,c->argc,
                                            &hashslot,&error_code);
    ...
    line:3601
    /* Exec the command */
    if (c->flags & CLIENT_MULTI &&
        c->cmd->proc != execCommand && c->cmd->proc != discardCommand &&
        c->cmd->proc != multiCommand && c->cmd->proc != watchCommand)
    {
        queueMultiCommand(c);
        addReply(c,shared.queued);
    } else {
        call(c,CMD_CALL_FULL);
        ...
    }
    return C_OK;
}


line:3200
void call(client *c, int flags) {
    ...
    c->cmd->proc(c);
    ...
}

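proc is the command function described at the beginning of this article, i.e. the function pointer registered in redisCommandTable.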
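
The ordering is the important part: the slot check runs first, and the command function is only reached when the check passes. The toy model below illustrates just that ordering. It is not how Redis structures its client or return codes; demoClient, demoProcessCommand, demoCall and demoSetProc are hypothetical names, and the cross_slot flag stands in for the result of getNodeByQuery.

```c
#include <assert.h>
#include <stddef.h>

struct demoClient;
typedef void (*demoProc)(struct demoClient *c);

struct demoClient {
    demoProc proc;      /* resolved command function, like c->cmd->proc */
    int cross_slot;     /* 1 if the pre-check found keys in different slots */
    int executed;       /* set to 1 by the command function when it runs */
    const char *reply;  /* last reply written for this client */
};

/* Stand-in for a command implementation such as setCommand. */
void demoSetProc(struct demoClient *c) {
    c->executed = 1;
    c->reply = "+OK";
}

/* Stand-in for call(): simply invokes the command function. */
void demoCall(struct demoClient *c) {
    c->proc(c);
}

/* Returns 1 if the command function ran, 0 if the request was
 * rejected by the slot check before execution. */
int demoProcessCommand(struct demoClient *c) {
    assert(c != NULL && c->proc != NULL);
    if (c->cross_slot) {
        c->reply = "-CROSSSLOT";
        return 0;
    }
    demoCall(c);
    return 1;
}
```

A client flagged with cross_slot gets the error reply while executed stays 0, which matches the observation that the error is produced before the command function ever runs.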

Summary

Command:

redis-cli -> network -> acceptTcpHandler -> anetTcpAccept -> acceptCommonHandler -> createClient -> readQueryFromClient

Error:

readQueryFromClient -> processInputBuffer -> processCommandAndResetClient -> processCommand -> getNodeByQuery -> clusterRedirectClient

Execution:

readQueryFromClient -> processInputBuffer -> processCommandAndResetClient -> processCommand -> lookupCommand -> call

Hashing:

proc(c) -> setCommand -> setGenericCommand -> genericSetKey -> dbAdd -> slotToKeyAdd -> slotToKeyUpdateKey -> keyHashSlot

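At this point we know roughly where the braces in a key are taken into account when hashing, where the cross-slot error is raised, and where the function that implements a command is executed.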

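Five ways to use a Redis cluster, with the pros and cons of each (original title: Redis集群的5种使用方式，各自优缺点分析)
Source code analysis of the set command in redis (original title: redis中set命令的源码分析)
[Redis source reading] What Redis does when you type a get/set command (original title: ［Redis源码阅读］当你输入get/set命令的时候，Redis做了什么)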