### Description
When restarting Kamailio nodes in our infrastructure, we noticed that under traffic some
nodes started using 100% of the CPU. With the valuable help of @giavac we tracked the
issue down to an infinite loop inside the htable module when synchronizing
fairly large (60K) htables via DMQ.
### Troubleshooting
#### Reproduction
Run one Kamailio instance with a 60K+ htable and start a new instance. The first instance
will try to send the whole table to the new one and will enter an infinite loop
that consumes 100% of the CPU.

This is caused by a double call to **ht_dmq_cell_group_flush**, which creates a circular
list in the JSON structure hierarchy. The second call happens in this block of code (which
is why a 60K htable is required):
https://github.com/kamailio/kamailio/blob/5.2/src/modules/htable/ht_dmq.c#L…
When this happens, **ht_dmq_cell_group_flush** tries to add
**ht_dmq_jdoc_cell_group.jdoc_cells** to **ht_dmq_jdoc_cell_group.jdoc->root**, but
this root already has **jdoc_cells** as its child,
so when **srjson_AddItemToObject** is called (and in turn **srjson_AddItemToArray**) it
gets appended as the child of itself:
https://github.com/kamailio/kamailio/blob/master/src/lib/srutils/srjson.c#L…
This circular structure then causes an endless loop when **srjson_PrintUnformatted** is
called, because the **print_object** function iterates over the circular list:
https://github.com/kamailio/kamailio/blob/master/src/lib/srutils/srjson.c#L…
### Possible Solutions
One possible solution would be to destroy and re-initialize the **ht_dmq_jdoc_cell_group**
structure after calling the flush:
```
if (ht_dmq_jdoc_cell_group.size >= dmq_cell_group_max_size) {
	LM_DBG("sending group count[%d]size[%d]\n", ht_dmq_jdoc_cell_group.count,
			ht_dmq_jdoc_cell_group.size);
	if (ht_dmq_cell_group_flush(node) < 0) {
		ht_slot_unlock(ht, i);
		goto error;
	}
	ht_dmq_cell_group_destroy();
	ht_dmq_cell_group_init();
}
```
But we are not sure about the performance implications.
### Additional Information
`# kamailio -v
version: kamailio 5.2.1 (x86_64/linux) 44e488
flags: STATS: Off, USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE,
USE_MCAST, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC,
DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER,
USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144 MAX_URI_SIZE 1024, BUF_SIZE 65535,
DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: 44e488
compiled on 11:52:58 Feb 21 2019 with gcc 5.4.0
`
* **Operating System**:
ubuntu:xenial docker container
`# uname -a
Linux kama-0 4.4.0-135-generic #161-Ubuntu SMP Mon Aug 27 10:45:01 UTC 2018 x86_64 x86_64
x86_64 GNU/Linux`