### Description
Trying to get NATS module working but Kamailio is refusing to start.
``` Sep 27 23:08:27 server kernel: kamailio[338880]: segfault at 70 ip 00007fe97b06f1cc sp 00007fff72d158f0 error 4 in libuv.so.1.0.0[7fe97b066000+1f000] Sep 27 23:08:27 server kernel: Code: 5a 85 ff ff b8 ea ff ff ff c3 0f 1f 40 00 f3 0f 1e fa 8b 4e 30 83 f9 ff 0f 84 80 00 00 00 48 83 ec 08 85 c9 0f 88 90 00 00 00 <3b> 4f 70 73 45 f7 d2 21 56 28 48 8d 4e 18 48 89 f0 48 8b 76 18 75 Sep 27 23:08:27 server systemd[1]: Started Process Core Dump (PID 338886/UID 0). ```
#### Log Messages
In the log message I'm getting the following error:
``` INFO: nats [nats_mod.c:437]: nats_init_connection(): adding server [nats://192.168.1.1:4222] [1] ERROR: nats [nats_mod.c:456]: nats_init_connection(): could not create nats options [System Error] ``` ### Additional Information
* **Kamailio Version** - output of `kamailio -v`
``` (version: kamailio 5.6.1 (x86_64/linux) 34e6dd) ``` NATS Server version: ``` nats-server: v2.9.0 ```
libnats version:
``` libnats.so.3.4.0 ```
* **Operating System**:
``` CentOS Stream 9 5.14.0-163.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Sep 8 13:52:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux ``` * **Module Configuration**: ``` loadmodule "nats.so" modparam("nats", "nats_url", "nats://192.168.1.1:4222") ```
There is no issue at the network/firewall level as by using telnet I'm able to connect to the NATS server successfully. What is weird is that with tcpdump I can't see even attempts to connect to the NATS server.
Thanks for the report. Could you please create a core dump from the created dump file? Refer e.g. to [http://www.kamailio.org/wiki/tutorials/troubleshooting/coredumpfile%5D(wiki)
Hi @henningw ,
Please find the core dump below:
``` (gdb) bt #0 0x00007fb35c0961cc in uv.io_stop () from /lib64/libuv.so.1 #1 0x00007fb35c09c71a in uv.poll_stop.lto_priv () from /lib64/libuv.so.1 #2 0x00007fb35c09c763 in uv_poll_stop () from /lib64/libuv.so.1 #3 0x00007fb35c146d3d in nats_destroy_workers () at nats_mod.c:591 #4 0x00007fb35c146dc0 in mod_destroy () at nats_mod.c:603 #5 0x0000000000636b22 in destroy_modules () at core/sr_module.c:842 #6 0x000000000041f059 in cleanup (show_status=0) at main.c:561 #7 0x0000000000420ac1 in shutdown_children (sig=15, show_status=0) at main.c:704 #8 0x0000000000439a81 in main (argc=5, argv=0x7ffd6bebd598) at main.c:3093 (gdb) bt full #0 0x00007fb35c0961cc in uv.io_stop () from /lib64/libuv.so.1 No symbol table info available. #1 0x00007fb35c09c71a in uv.poll_stop.lto_priv () from /lib64/libuv.so.1 No symbol table info available. #2 0x00007fb35c09c763 in uv_poll_stop () from /lib64/libuv.so.1 No symbol table info available. #3 0x00007fb35c146d3d in nats_destroy_workers () at nats_mod.c:591 i = 0 worker = 0x100000598 pub_worker = 0x7fb35c256b20 __func__ = "nats_destroy_workers" #4 0x00007fb35c146dc0 in mod_destroy () at nats_mod.c:603 __func__ = "mod_destroy" #5 0x0000000000636b22 in destroy_modules () at core/sr_module.c:842 t = 0x7fb36125b6a8 foo = 0x7fb36125b1d0 __func__ = "destroy_modules" #6 0x000000000041f059 in cleanup (show_status=0) at main.c:561 memlog = 0 __func__ = "cleanup" #7 0x0000000000420ac1 in shutdown_children (sig=15, show_status=0) at main.c:704 __func__ = "shutdown_children" #8 0x0000000000439a81 in main (argc=5, argv=0x7ffd6bebd598) at main.c:3093 cfg_stream = 0x244d380 c = -1 r = 0 tmp = 0xc5 <error: Cannot access memory at address 0xc5> tmp_len = 0 port = 0 proto = 0 ahost = 0x0 aport = 0 options = 0x8152f8 ":f:cm:M:dVIhEeb:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:" ret = -1 seed = 4101768862 rfd = 4 debug_save = 0 debug_flag = 0 dont_fork_cnt = 0 n_lst = 0x300000000 p = 0x40 <error: Cannot access memory at address 0x40> st = {st_dev = 25, st_ino = 1402, st_nlink = 2, st_mode = 16832, st_uid = 1001, st_gid = 1002, __pad0 = 0, st_rdev = 0, st_size = 60, st_blksize = 4096, st_blocks = 0, st_atim = {tv_sec = 1663600770, tv_nsec = 489002484}, st_mtim = {tv_sec = 1664316775, tv_nsec = 681629684}, st_ctim = {tv_sec = 1664316775, tv_nsec = 681629684}, __glibc_reserved = {0, 0, 0}} tbuf = "\342\214\317a\263\177\000\000\070K\314a\263\177\000\000\020\322\353k\375\177\000\000\a\000\000\000\000\000\000\000\330\070\314a\263\177\000\000\a\000\000\000#\000\000\000\060\232\317a\263\177\000\000\225f\315a\263\177\000\000\000\000\000\000\000\000\000\000\200y\315a\263\177\000\000\070\300\234a\263\177\000\000\320\035\314a\263\177\000\000\000\000\000\000\000\000\000\000\340\321\353k\375\177\000\000\360\216\317a\263\177\000\000\330J\314a\263\177\000\000\330J\314a\263\177\000\000\000\000\000\000\000\000\000\000\230,\314a\263\177\000\000\340\321\353k\375\177\000\000\220\245\317a\263\177\000\000\000\060\314a\263\177\000\000\022<\314a\263\177\000\000\066\061\315a\263\177\000\000\344\343\233a\263\177\000\000"... option_index = 0 long_options = {{name = 0x817966 "help", has_arg = 0, flag = 0x0, val = 104}, {name = 0x812514 "version", has_arg = 0, flag = 0x0, val = 118}, {name = 0x81796b "alias", has_arg = 1, flag = 0x0, val = 1024}, {name = 0x817971 "subst", has_arg = 1, flag = 0x0, val = 1025}, { name = 0x817977 "substdef", has_arg = 1, flag = 0x0, val = 1026}, {name = 0x817980 "substdefs", has_arg = 1, flag = 0x0, val = 1027}, {name = 0x81798a "server-id", has_arg = 1, flag = 0x0, val = 1028}, {name = 0x817994 "loadmodule", has_arg = 1, flag = 0x0, val = 1029}, { name = 0x81799f "modparam", has_arg = 1, flag = 0x0, val = 1030}, {name = 0x8179a8 "log-engine", has_arg = 1, flag = 0x0, val = 1031}, {name = 0x8179b3 "debug", has_arg = 1, flag = 0x0, val = 1032}, {name = 0x8179b9 "cfg-print", has_arg = 0, flag = 0x0, val = 1033}, { name = 0x8179c3 "atexit", has_arg = 1, flag = 0x0, val = 1034}, {name = 0x0, has_arg = 0, flag = 0x0, val = 0}} __func__ = "main" ```
can you please go to frame 3 and check the value of ``pub_worker``?
``` f 3 p *pub_worker ```
There you go: ``` (gdb) f 3 #3 0x00007fb35c146d3d in nats_destroy_workers () at nats_mod.c:591 591 uv_poll_stop(&pub_worker->poll); (gdb) p *pub_worker $1 = {pid = 0, fd = 0, uvLoop = 0x0, pipe = {data = 0x0, loop = 0x0, type = UV_UNKNOWN_HANDLE, close_cb = 0x0, handle_queue = {0x0, 0x0}, u = {fd = 0, reserved = {0x0, 0x0, 0x0, 0x0}}, next_closing = 0x0, flags = 0, write_queue_size = 0, alloc_cb = 0x0, read_cb = 0x0, connect_req = 0x0, shutdown_req = 0x0, io_watcher = {cb = 0x0, pending_queue = {0x0, 0x0}, watcher_queue = {0x0, 0x0}, pevents = 0, events = 0, fd = 0}, write_queue = {0x0, 0x0}, write_completed_queue = {0x0, 0x0}, connection_cb = 0x0, delayed_error = 0, accepted_fd = 0, queued_fds = 0x0, ipc = 0, pipe_fname = 0x0}, poll = {data = 0x0, loop = 0x0, type = UV_UNKNOWN_HANDLE, close_cb = 0x0, handle_queue = {0x0, 0x0}, u = {fd = 0, reserved = {0x0, 0x0, 0x0, 0x0}}, next_closing = 0x0, flags = 0, poll_cb = 0x0, io_watcher = {cb = 0x0, pending_queue = {0x0, 0x0}, watcher_queue = {0x0, 0x0}, pevents = 0, events = 0, fd = 0}}, nc = 0x0} (gdb) ```
@eschmidbauer maybe this?
``` diff --git a/src/modules/nats/nats_mod.c b/src/modules/nats/nats_mod.c index 852d0fbbab..35e6b2612f 100644 --- a/src/modules/nats/nats_mod.c +++ b/src/modules/nats/nats_mod.c @@ -588,7 +588,8 @@ int nats_destroy_workers() LM_ERR("could not cleanup worker connection\n"); } } - uv_poll_stop(&pub_worker->poll); + if(pub_worker->poll->type != UV_UNKNOWN_HANDLE) + uv_poll_stop(&pub_worker->poll); shm_free(pub_worker); } } ```
Hi- I think the module requires you to subscribe to a channel. The module was originally built only as a consumer but now since the module can publish, that is no longer the case. Please confirm my suspicion and I will try to work on a fix to disable subscribe if only needed it for publish
@eschmidbauer, not sure about that, just tried now to subscribe to a queue and it's still the same
``` modparam("nats", "subject_queue_group", "test") ``` Log Error: ``` ERROR: nats [nats_mod.c:456]: nats_init_connection(): could not create nats options [System Error] ERROR: nats [nats_mod.c:269]: init_pub_worker(): failed to init nat connections ERROR: nats [nats_mod.c:355]: mod_child_init(): failed to init struct for pub worker[0] ```
I believe you need to use the correct format like `modparam("nats", "subject_queue_group", "Kamailio-World:2020")`
Hi @eschmidbauer,
thanks for looking into this, but unfortunately still not working for me, this is the format I've tried:
``` modparam("nats", "nats_url", "nats://192.168.1.1:4222") modparam("nats", "subject_queue_group", "sql.test_sub.com:testqueue") ```
Could you try the changes @linuxmaniac proposed
@linuxmaniac, I'm trying to apply the patch you provided but it's not working for me. Maybe it's for a different branch? I tried with 5.6, master and none of them worked.
``` patching file src/modules/nats/nats_mod.c Hunk #1 FAILED at 588. 1 out of 1 hunk FAILED -- saving rejects to file src/modules/nats/nats_mod.c.rej ```
If I'm changing file manually, then it does not compile any more ``` CC (gcc) [M nats.so] nats_mod.o nats_mod.c: In function ‘nats_destroy_workers’: nats_mod.c:591:44: error: invalid type argument of ‘->’ (have ‘uv_poll_t’ {aka ‘struct uv_poll_s’}) 591 | if(pub_worker->poll->type != UV_UNKNOWN_HANDLE) | ^~ make: *** [../../Makefile.rules:100: nats_mod.o] Error 1 ```
@iliesh Sorry is a structure not a pointer... try this, please ``` diff --git a/src/modules/nats/nats_mod.c b/src/modules/nats/nats_mod.c index 852d0fbbab..51277e4ac0 100644 --- a/src/modules/nats/nats_mod.c +++ b/src/modules/nats/nats_mod.c @@ -588,7 +588,8 @@ int nats_destroy_workers() LM_ERR("could not cleanup worker connection\n"); } } - uv_poll_stop(&pub_worker->poll); + if(pub_worker->poll.type != UV_UNKNOWN_HANDLE) + uv_poll_stop(&pub_worker->poll); shm_free(pub_worker); } } ```
Hi @linuxmaniac, @eschmidbauer - unfortunately the patch didn't work either, kamailio still crashing with the same error:
``` INFO: nats [nats_mod.c:437]: nats_init_connection(): adding server [nats://192.168.1.1:4222] [1] ERROR: nats [nats_mod.c:456]: nats_init_connection(): could not create nats options [System Error] ERROR: nats [nats_mod.c:230]: init_worker(): failed to init nat connections ERROR: nats [nats_mod.c:346]: mod_child_init(): failed to init struct for worker [0] ```
``` #0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44 #1 0x00007fa7a25445b3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78 #2 0x00007fa7a24f7ce6 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 #3 0x00007fa7a24cb7f3 in __GI_abort () at abort.c:79 #4 0x00000000006fbc43 in qm_debug_check_frag (qm=0x7fa79cc5c000, f=0x7fa79cd3de40, file=0x7fa79cc33688 "nats: nats_mod.c", line=593, efile=0x882e79 "core/mem/q_malloc.c", eline=511) at core/mem/q_malloc.c:129 #5 0x00000000006ffe6b in qm_free (qmp=0x7fa79cc5c000, p=0x7fa79cd3de78, file=0x7fa79cc33688 "nats: nats_mod.c", func=0x7fa79cc34360 <__func__.13> "nats_destroy_workers", line=593, mname=0x7fa79cc33000 "nats") at core/mem/q_malloc.c:511 #6 0x000000000070a971 in qm_shm_free (qmp=0x7fa79cc5c000, p=0x7fa79cd3de78, file=0x7fa79cc33688 "nats: nats_mod.c", func=0x7fa79cc34360 <__func__.13> "nats_destroy_workers", line=593, mname=0x7fa79cc33000 "nats") at core/mem/q_malloc.c:1350 #7 0x00007fa79cc2dd89 in nats_destroy_workers () at nats_mod.c:593 #8 0x00007fa79cc2ddce in mod_destroy () at nats_mod.c:604 #9 0x0000000000636b22 in destroy_modules () at core/sr_module.c:842 #10 0x000000000041f059 in cleanup (show_status=0) at main.c:561 #11 0x0000000000420ac1 in shutdown_children (sig=15, show_status=0) at main.c:704 #12 0x0000000000439a81 in main (argc=5, argv=0x7ffe957de5a8) at main.c:3093 ```
``` #0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44 tid = <optimized out> ret = 0 pd = <optimized out> old_mask = {__val = {16949132684318090892, 1844065090856224981, 0, 0, 0, 0, 4410266781315238445, 10, 0, 0, 0, 0, 0, 0, 0, 0}} ret = <optimized out> #1 0x00007fa7a25445b3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78 No locals. #2 0x00007fa7a24f7ce6 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 ret = <optimized out> #3 0x00007fa7a24cb7f3 in __GI_abort () at abort.c:79 save_stage = 1 act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {0, 0, 0, 1880844493789993498, 1880844493789993498, 1880844493789993498, 1880844493789993498, 0, 0, 0, 0, 2314885530818453536, 2314885530818453536, 2314885530818453536, 2314885530818453536, 0}}, sa_flags = 0, sa_restorer = 0x0} sigs = {__val = {32, 0, 0, 0, 13816831175882180370, 13816973010994691967, 13816973010643234476, 13816973012072644543, 0, 0, 0, 0, 4702111234474983745, 4702111234474983745, 4702111234474983745, 4702111234474983745}} #4 0x00000000006fbc43 in qm_debug_check_frag (qm=0x7fa79cc5c000, f=0x7fa79cd3de40, file=0x7fa79cc33688 "nats: nats_mod.c", line=593, efile=0x882e79 "core/mem/q_malloc.c", eline=511) at core/mem/q_malloc.c:129 p = 0x501000001ff __func__ = "qm_debug_check_frag" #5 0x00000000006ffe6b in qm_free (qmp=0x7fa79cc5c000, p=0x7fa79cd3de78, file=0x7fa79cc33688 "nats: nats_mod.c", func=0x7fa79cc34360 <__func__.13> "nats_destroy_workers", line=593, mname=0x7fa79cc33000 "nats") at core/mem/q_malloc.c:511 qm = 0x7fa79cc5c000 f = 0x7fa79cd3de40 size = 140731406474448 next = 0x1 prev = 0x7fa79cc956c8 __func__ = "qm_free" #6 0x000000000070a971 in qm_shm_free (qmp=0x7fa79cc5c000, p=0x7fa79cd3de78, file=0x7fa79cc33688 "nats: nats_mod.c", func=0x7fa79cc34360 <__func__.13> "nats_destroy_workers", line=593, mname=0x7fa79cc33000 "nats") at core/mem/q_malloc.c:1350 No locals. #7 0x00007fa79cc2dd89 in nats_destroy_workers () at nats_mod.c:593 i = 1 worker = 0x7fa79cd3dc18 pub_worker = 0x7fa79cd3de78 __func__ = "nats_destroy_workers" #8 0x00007fa79cc2ddce in mod_destroy () at nats_mod.c:604 __func__ = "mod_destroy" #9 0x0000000000636b22 in destroy_modules () at core/sr_module.c:842 t = 0x7fa7a1d426a8 foo = 0x7fa7a1d421d0 __func__ = "destroy_modules" #10 0x000000000041f059 in cleanup (show_status=0) at main.c:561 memlog = 0 __func__ = "cleanup" #11 0x0000000000420ac1 in shutdown_children (sig=15, show_status=0) at main.c:704 __func__ = "shutdown_children" #12 0x0000000000439a81 in main (argc=5, argv=0x7ffe957de5a8) at main.c:3093 cfg_stream = 0x2473380 c = -1 r = 0 tmp = 0xc5 <error: Cannot access memory at address 0xc5> tmp_len = 0 port = 0 proto = 0 ahost = 0x0 aport = 0 options = 0x8152f8 ":f:cm:M:dVIhEeb:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:" ret = -1 seed = 532749240 rfd = 4 debug_save = 0 debug_flag = 0 dont_fork_cnt = 0 n_lst = 0x300000000 p = 0x40 <error: Cannot access memory at address 0x40> st = {st_dev = 25, st_ino = 1308, st_nlink = 2, st_mode = 16832, st_uid = 1001, st_gid = 1002, __pad0 = 0, st_rdev = 0, st_size = 40, st_blksize = 4096, st_blocks = 0, st_atim = { tv_sec = 1664442191, tv_nsec = 175708131}, st_mtim = {tv_sec = 1664442396, tv_nsec = 174674806}, st_ctim = {tv_sec = 1664442396, tv_nsec = 174674806}, __glibc_reserved = {0, 0, 0}} tbuf = "\342\374}\242\247\177\000\000\070\273z\242\247\177\000\000 \342}\225\376\177\000\000\a\000\000\000\000\000\000\000بz\242\247\177\000\000\a\000\000\000#\000\000\000\060\n~\242\247\177\000\000\225\326{\242\247\177\000\000\000\000\000\000\000\000\000\000\200\351{\242\247\177\000\000\070\060K\242\247\177\000\000Ѝz\242\247\177\000\000\000\000\000\000\000\000\000\000\360\341}\225\376\177\000\000\360\376}\242\247\177\000\000غz\242\247\177\000\000غz\242\247\177\000\000\000\000\000\000\000\000\000\000\230\234z\242\247\177\000\000\360\341}\225\376\177\000\000\220\025~\242\247\177\000\000\000\240z\242\247\177\000\000\022\254z\242\247\177\000\000\066\241{\242\247\177\000\000\344SJ\242\247\177\000\000\272\070}\242"... option_index = 0 long_options = {{name = 0x817966 "help", has_arg = 0, flag = 0x0, val = 104}, {name = 0x812514 "version", has_arg = 0, flag = 0x0, val = 118}, {name = 0x81796b "alias", has_arg = 1, flag = 0x0, val = 1024}, {name = 0x817971 "subst", has_arg = 1, flag = 0x0, val = 1025}, {name = 0x817977 "substdef", has_arg = 1, flag = 0x0, val = 1026}, { name = 0x817980 "substdefs", has_arg = 1, flag = 0x0, val = 1027}, {name = 0x81798a "server-id", has_arg = 1, flag = 0x0, val = 1028}, {name = 0x817994 "loadmodule", has_arg = 1, flag = 0x0, val = 1029}, {name = 0x81799f "modparam", has_arg = 1, flag = 0x0, val = 1030}, {name = 0x8179a8 "log-engine", has_arg = 1, flag = 0x0, val = 1031}, { name = 0x8179b3 "debug", has_arg = 1, flag = 0x0, val = 1032}, {name = 0x8179b9 "cfg-print", has_arg = 0, flag = 0x0, val = 1033}, {name = 0x8179c3 "atexit", has_arg = 1, flag = 0x0, val = 1034}, {name = 0x0, has_arg = 0, flag = 0x0, val = 0}} __func__ = "main" ```
yes but not in the same place :-)
what about with this? ``` diff --git a/src/modules/nats/nats_mod.c b/src/modules/nats/nats_mod.c index 852d0fbbab..aff106c48f 100644 --- a/src/modules/nats/nats_mod.c +++ b/src/modules/nats/nats_mod.c @@ -226,12 +226,12 @@ int init_worker(
nats_init_environment(); nc = _init_nats_connection(); + memset(worker, 0, sizeof(*worker)); if(nats_init_connection(nc) < 0) { LM_ERR("failed to init nat connections\n"); return -1; }
- memset(worker, 0, sizeof(*worker)); worker->subject = shm_malloc(strlen(subject) + 1); strcpy(worker->subject, subject); worker->subject[strlen(subject)] = '\0'; @@ -265,11 +265,11 @@ int init_pub_worker( { nats_connection_ptr nc = NULL; nc = _init_nats_connection(); + memset(worker, 0, sizeof(*worker)); if(nats_init_connection(nc) < 0) { LM_ERR("failed to init nat connections\n"); return -1; } - memset(worker, 0, sizeof(*worker)); worker->nc = nc; return 0; } @@ -588,7 +588,8 @@ int nats_destroy_workers() LM_ERR("could not cleanup worker connection\n"); } } - uv_poll_stop(&pub_worker->poll); + if(pub_worker->poll.type != UV_UNKNOWN_HANDLE) + uv_poll_stop(&pub_worker->poll); shm_free(pub_worker); } } ```
Hi @linuxmaniac,
``` Program terminated with signal SIGABRT, Aborted. #0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44 44 return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0; (gdb) bt #0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44 #1 0x00007efc5586f5b3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78 #2 0x00007efc55822ce6 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 #3 0x00007efc557f67f3 in __GI_abort () at abort.c:79 #4 0x00000000006fbc43 in qm_debug_check_frag (qm=0x7efc4ff87000, f=0x7efc50068e90, file=0x7efc4ff5e688 "nats: nats_mod.c", line=593, efile=0x882eb9 "core/mem/q_malloc.c", eline=511) at core/mem/q_malloc.c:129 #5 0x00000000006ffe6b in qm_free (qmp=0x7efc4ff87000, p=0x7efc50068ec8, file=0x7efc4ff5e688 "nats: nats_mod.c", func=0x7efc4ff5f360 <__func__.13> "nats_destroy_workers", line=593, mname=0x7efc4ff5e000 "nats") at core/mem/q_malloc.c:511 #6 0x000000000070a971 in qm_shm_free (qmp=0x7efc4ff87000, p=0x7efc50068ec8, file=0x7efc4ff5e688 "nats: nats_mod.c", func=0x7efc4ff5f360 <__func__.13> "nats_destroy_workers", line=593, mname=0x7efc4ff5e000 "nats") at core/mem/q_malloc.c:1350 #7 0x00007efc4ff58d89 in nats_destroy_workers () at nats_mod.c:593 #8 0x00007efc4ff58dce in mod_destroy () at nats_mod.c:604 #9 0x0000000000636b22 in destroy_modules () at core/sr_module.c:842 #10 0x000000000041f059 in cleanup (show_status=0) at main.c:561 #11 0x0000000000420ac1 in shutdown_children (sig=15, show_status=0) at main.c:704 #12 0x0000000000439a81 in main (argc=5, argv=0x7fff506497d8) at main.c:3093 ```
``` #0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44 tid = <optimized out> ret = 0 pd = <optimized out> old_mask = {__val = {3255307778118664515, 3478467806582802989, 10, 132380695343104, 0, 0, 0, 0, 296, 301, 10, 0, 0, 0, 0, 0}} ret = <optimized out> #1 0x00007efc5586f5b3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78 No locals. #2 0x00007efc55822ce6 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 ret = <optimized out> #3 0x00007efc557f67f3 in __GI_abort () at abort.c:79 save_stage = 1 act = {__sigaction_handler = {sa_handler = 0x1a1a1a1a1a1a1a1a, sa_sigaction = 0x1a1a1a1a1a1a1a1a}, sa_mask = {__val = {1880844493789993498, 0, 0, 0, 0, 2314885530818453536, 2314885530818453536, 2314885530818453536, 2314885530818453536, 0, 0, 0, 0, 9016478961296, 270582939648, 0}}, sa_flags = 0, sa_restorer = 0x829484} sigs = {__val = {32, 13816973012072644543, 0, 0, 0, 0, 4702111234474983745, 4702111234474983745, 4702111234474983745, 4702111234474983745, 0, 0, 0, 0, 1880844493789993498, 1880844493789993498}} #4 0x00000000006fbc43 in qm_debug_check_frag (qm=0x7efc4ff87000, f=0x7efc50068e90, file=0x7efc4ff5e688 "nats: nats_mod.c", line=593, efile=0x882eb9 "core/mem/q_malloc.c", eline=511) at core/mem/q_malloc.c:129 p = 0x501000001ff __func__ = "qm_debug_check_frag" #5 0x00000000006ffe6b in qm_free (qmp=0x7efc4ff87000, p=0x7efc50068ec8, file=0x7efc4ff5e688 "nats: nats_mod.c", func=0x7efc4ff5f360 <__func__.13> "nats_destroy_workers", line=593, mname=0x7efc4ff5e000 "nats") at core/mem/q_malloc.c:511 qm = 0x7efc4ff87000 f = 0x7efc50068e90 size = 140734542155520 next = 0x1 prev = 0x7efc4ffc06c8 __func__ = "qm_free" #6 0x000000000070a971 in qm_shm_free (qmp=0x7efc4ff87000, p=0x7efc50068ec8, file=0x7efc4ff5e688 "nats: nats_mod.c", func=0x7efc4ff5f360 <__func__.13> "nats_destroy_workers", line=593, mname=0x7efc4ff5e000 "nats") at core/mem/q_malloc.c:1350 No locals. #7 0x00007efc4ff58d89 in nats_destroy_workers () at nats_mod.c:593 i = 1 worker = 0x7efc50068c68 pub_worker = 0x7efc50068ec8 __func__ = "nats_destroy_workers" #8 0x00007efc4ff58dce in mod_destroy () at nats_mod.c:604 __func__ = "mod_destroy" #9 0x0000000000636b22 in destroy_modules () at core/sr_module.c:842 t = 0x7efc5506d6a8 foo = 0x7efc5506d1d0 __func__ = "destroy_modules" #10 0x000000000041f059 in cleanup (show_status=0) at main.c:561 memlog = 0 __func__ = "cleanup" #11 0x0000000000420ac1 in shutdown_children (sig=15, show_status=0) at main.c:704 __func__ = "shutdown_children" #12 0x0000000000439a81 in main (argc=5, argv=0x7fff506497d8) at main.c:3093 cfg_stream = 0xced380 --Type <RET> for more, q to quit, c to continue without paging-- c = -1 r = 0 tmp = 0xc5 <error: Cannot access memory at address 0xc5> tmp_len = 0 port = 0 proto = 0 ahost = 0x0 aport = 0 options = 0x8152f8 ":f:cm:M:dVIhEeb:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:" ret = -1 seed = 3432664775 rfd = 4 debug_save = 0 debug_flag = 0 dont_fork_cnt = 0 n_lst = 0x300000000 p = 0x40 <error: Cannot access memory at address 0x40> st = {st_dev = 25, st_ino = 1308, st_nlink = 2, st_mode = 16832, st_uid = 1001, st_gid = 1002, __pad0 = 0, st_rdev = 0, st_size = 60, st_blksize = 4096, st_blocks = 0, st_atim = { tv_sec = 1664442191, tv_nsec = 175708131}, st_mtim = {tv_sec = 1664452599, tv_nsec = 641505726}, st_ctim = {tv_sec = 1664452599, tv_nsec = 641505726}, __glibc_reserved = {0, 0, 0}} tbuf = "⬰U\374~\000\000\070k\255U\374~\000\000P\224dP\377\177\000\000\a\000\000\000\000\000\000\000\330X\255U\374~\000\000\a\000\000\000#\000\000\000\060\272\260U\374~\000\000\225\206\256U\374~\000\000\000\000\000\000\000\000\000\000\200\231\256U\374~\000\000\070\340}U\374~\000\000\320=\255U\374~\000\000\000\000\000\000\000\000\000\000 \224dP\377\177\000\000\360\256\260U\374~\000\000\330j\255U\374~\000\000\330j\255U\374~\000\000\000\000\000\000\000\000\000\000\230L\255U\374~\000\000 \224dP\377\177\000\000\220ŰU\374~\000\000\000P\255U\374~\000\000\022\\255U\374~\000\000\066Q\256U\374~\000\000\344\003}U\374~\000\000\272\350\257"... option_index = 0 long_options = {{name = 0x817966 "help", has_arg = 0, flag = 0x0, val = 104}, {name = 0x812514 "version", has_arg = 0, flag = 0x0, val = 118}, {name = 0x81796b "alias", has_arg = 1, flag = 0x0, val = 1024}, {name = 0x817971 "subst", has_arg = 1, flag = 0x0, val = 1025}, {name = 0x817977 "substdef", has_arg = 1, flag = 0x0, val = 1026}, { name = 0x817980 "substdefs", has_arg = 1, flag = 0x0, val = 1027}, {name = 0x81798a "server-id", has_arg = 1, flag = 0x0, val = 1028}, {name = 0x817994 "loadmodule", has_arg = 1, flag = 0x0, val = 1029}, {name = 0x81799f "modparam", has_arg = 1, flag = 0x0, val = 1030}, {name = 0x8179a8 "log-engine", has_arg = 1, flag = 0x0, val = 1031}, { name = 0x8179b3 "debug", has_arg = 1, flag = 0x0, val = 1032}, {name = 0x8179b9 "cfg-print", has_arg = 0, flag = 0x0, val = 1033}, {name = 0x8179c3 "atexit", has_arg = 1, flag = 0x0, val = 1034}, {name = 0x0, has_arg = 0, flag = 0x0, val = 0}} __func__ = "main" ```
The last backtraces are from crashes at shut down, probably because kamailio was not able to start and the nats structures were not initialized. I pushed the commit bc97b7adf77406dfc08b57b4a7df912a76d324e5 with a safety check for this case.
Below you can find the core dump from the last commit in the master branch:
``` (gdb) bt #0 0x00007f4ac66041cc in uv.io_stop () from /lib64/libuv.so.1 #1 0x00007f4ac660a71a in uv.poll_stop.lto_priv () from /lib64/libuv.so.1 #2 0x00007f4ac660a763 in uv_poll_stop () from /lib64/libuv.so.1 #3 0x00007f4ac66b4d3d in nats_destroy_workers () at nats_mod.c:591 #4 0x00007f4ac66b4dd3 in mod_destroy () at nats_mod.c:606 #5 0x0000000000641fb5 in destroy_modules () at core/sr_module.c:861 #6 0x000000000041f089 in cleanup (show_status=0) at main.c:563 #7 0x0000000000420af1 in shutdown_children (sig=15, show_status=0) at main.c:706 #8 0x0000000000439a8e in main (argc=13, argv=0x7fff8f4a7cc8) at main.c:3100 (gdb) bt full #0 0x00007f4ac66041cc in uv.io_stop () from /lib64/libuv.so.1 No symbol table info available. #1 0x00007f4ac660a71a in uv.poll_stop.lto_priv () from /lib64/libuv.so.1 No symbol table info available. #2 0x00007f4ac660a763 in uv_poll_stop () from /lib64/libuv.so.1 No symbol table info available. #3 0x00007f4ac66b4d3d in nats_destroy_workers () at nats_mod.c:591 i = 0 worker = 0x7f4ac67c4c68 pub_worker = 0x7f4ac67c4d08 __func__ = "nats_destroy_workers" #4 0x00007f4ac66b4dd3 in mod_destroy () at nats_mod.c:606 __func__ = "mod_destroy" #5 0x0000000000641fb5 in destroy_modules () at core/sr_module.c:861 t = 0x7f4bc77ca570 foo = 0x7f4bc77ca098 __func__ = "destroy_modules" #6 0x000000000041f089 in cleanup (show_status=0) at main.c:563 memlog = -1077952723 __func__ = "cleanup" #7 0x0000000000420af1 in shutdown_children (sig=15, show_status=0) at main.c:706 __func__ = "shutdown_children" #8 0x0000000000439a8e in main (argc=13, argv=0x7fff8f4a7cc8) at main.c:3100 cfg_stream = 0x13f1380 c = -1 r = 0 tmp = 0x7fff8f4a8eaf "" tmp_len = 0 port = 0 proto = 0 ahost = 0x0 aport = 0 options = 0x823308 ":f:cm:M:dVIhEeb:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:" ret = -1 seed = 1254916872 rfd = 4 debug_save = 0 debug_flag = 0 dont_fork_cnt = 0 n_lst = 0x300000000 p = 0x40 <error: Cannot access memory at address 0x40> st = {st_dev = 25, st_ino = 1308, st_nlink = 2, st_mode = 16832, st_uid = 1001, st_gid = 1002, __pad0 = 0, st_rdev = 0, st_size = 40, st_blksize = 4096, st_blocks = 0, st_atim = {tv_sec = 1664442191, tv_nsec = 175708131}, st_mtim = {tv_sec = 1665675696, tv_nsec = 415332480}, st_ctim = {tv_sec = 1665675696, tv_nsec = 415332480}, __glibc_reserved = {0, 0, 0}} tbuf = "\342lf\310K\177\000\000\070+c\310K\177\000\000@yJ\217\377\177\000\000\a\000\000\000\000\000\000\000\330\030c\310K\177\000\000\a\000\000\000#\000\000\000\060zf\310K\177\000\000\225Fd\310K\177\000\000\000\000\000\000\000\000\000\000\200Yd\310K\177\000\000\070\240\063\310K\177\000\000\320\375b\310K\177\000\000\000\000\000\000\000\000\000\000\020yJ\217\377\177\000\000\360nf\310K\177\000\000\330*c\310K\177\000\000\330*c\310K\177\000\000\000\000\000\000\000\000\000\000\250\fc\310K\177\000\000\020yJ\217\377\177\000\000\220\205f\310K\177\000\000\000\020c\310K\177\000\000\022\034c\310K\177\000\000\066\021d\310K\177\000\000\344\303\062\310K\177\000\000"... option_index = 0 long_options = {{name = 0x8259a6 "help", has_arg = 0, flag = 0x0, val = 104}, {name = 0x820521 "version", has_arg = 0, flag = 0x0, val = 118}, {name = 0x8259ab "alias", has_arg = 1, flag = 0x0, val = 1024}, {name = 0x8259b1 "subst", has_arg = 1, flag = 0x0, val = 1025}, { name = 0x8259b7 "substdef", has_arg = 1, flag = 0x0, val = 1026}, {name = 0x8259c0 "substdefs", has_arg = 1, flag = 0x0, val = 1027}, {name = 0x8259ca "server-id", has_arg = 1, flag = 0x0, val = 1028}, {name = 0x8259d4 "loadmodule", has_arg = 1, flag = 0x0, val = 1029}, { name = 0x8259df "modparam", has_arg = 1, flag = 0x0, val = 1030}, {name = 0x8259e8 "log-engine", has_arg = 1, flag = 0x0, val = 1031}, {name = 0x8259f3 "debug", has_arg = 1, flag = 0x0, val = 1032}, {name = 0x8259f9 "cfg-print", has_arg = 0, flag = 0x0, val = 1033}, { name = 0x825a03 "atexit", has_arg = 1, flag = 0x0, val = 1034}, {name = 0x825a0a "all-errors", has_arg = 0, flag = 0x0, val = 1035}, {name = 0x0, has_arg = 0, flag = 0x0, val = 0}} __func__ = "main" (gdb) ```
There was another destroy case that should have been checked. Try again with latest master branch and see how it goes.
Was the latest master branch fixing it?
Hi @miconda , sorry missed your last comment - I'm going to test it and let you know the status.
Thanks
Hi @miconda ,
I can't compile the module anymore, getting the error:
``` make nats_mod.c: In function ‘nats_cleanup_connection’: nats_mod.c:540:2: error: ‘for’ loop initial declarations are only allowed in C99 mode for(int s = 0; s < NATS_MAX_SERVERS; s++) { ^ nats_mod.c:540:2: note: use option -std=c99 or -std=gnu99 to compile your code make: *** [nats_mod.o] Error 1 ```
I don't recall enforcing specific c standard in makefile, although we try to use older standards in c code to enable using on old systems, although c99 is very old and probably we can consider it as ok for everything now. What are the OS, kernel and gcc versions you use?
For the sake of testing it easier in your case, I pushed a commit so the compile error should be gone if you try with latest master now.
Hi @miconda ,
Now it's compiling successfully and kamailio doesn't seem to crash anymore but it's not starting either. Those are the errors from the log file:
``` Nov 28 16:20:26 k1-dev k1-dev[24701]: INFO: nats [nats_mod.c:439]: nats_init_connection(): adding server [nats://192.168.1.1:4222] [1] Nov 28 16:20:26 k1-dev k1-dev[24701]: ERROR: nats [nats_mod.c:458]: nats_init_connection(): could not create nats options [System Error] Nov 28 16:20:26 k1-dev k1-dev[24701]: ERROR: nats [nats_mod.c:232]: init_worker(): failed to init nat connections Nov 28 16:20:26 k1-dev k1-dev[24701]: ERROR: nats [nats_mod.c:348]: mod_child_init(): failed to init struct for worker [0] Nov 28 16:20:26 k1-dev k1-dev[24701]: ERROR: <core> [core/sr_module.c:903]: init_mod_child(): error while initializing module nats (/usr/local/lib64/kamailio/modules/nats.so) (idx: 0 rank: -127 desc: [main]) Nov 28 16:20:26 k1-dev k1-dev[24701]: ERROR: <core> [main.c:1665]: main_loop(): error in init_child(PROC_INT) -- exiting Nov 28 16:20:26 k1-dev kamailio: ERROR: <core> [core/daemonize.c:303]: daemonize(): Main process exited before writing to pipe ```
OS: CentOS 7 X64, Kernel: 3.10.0-1160.76.1.el7.x86_64, gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)
Hi, that last error appears to be related to libnats `could not create nats options [System Error]`
Can you confirm the Nats server is available?
Hi @eschmidbauer ,
Yes, at least using telnet I'm able to connect from Kamailio server.
``` [root@dev src]# telnet 192.168.1.1 4222 Trying 192.168.1.1... Connected to 192.168.1.1. Escape character is '^]'. INFO {"server_id":"NAK6SMFE53GJYNICKK6QFTXDAIWEL753AC4ZN4CMRFC5XSWXBIA6GUJA","server_name":"NAK6SMFE53GJYNICKK6QFTXDAIWEL753AC4ZN4CMRFC5XSWXBIA6GUJA","version":"2.9.3","proto":1,"git_commit ``` I'm using NATS Streaming server, not sure that this does matter.
I installed a nats server on the same box where kamailio is running, and got the same error:
``` Nov 28 18:02:25 k1-dev k1-dev[25790]: INFO: nats [nats_mod.c:439]: nats_init_connection(): adding server [nats://127.0.0.1:4222] [1] Nov 28 18:02:25 k1-dev k1-dev[25790]: ERROR: nats [nats_mod.c:458]: nats_init_connection(): could not create nats options [System Error] Nov 28 18:02:25 k1-dev k1-dev[25790]: ERROR: nats [nats_mod.c:232]: init_worker(): failed to init nat connections Nov 28 18:02:25 k1-dev k1-dev[25790]: ERROR: nats [nats_mod.c:348]: mod_child_init(): failed to init struct for worker [0] Nov 28 18:02:25 k1-dev k1-dev[25790]: ERROR: <core> [core/sr_module.c:903]: init_mod_child(): error while initializing module nats (/usr/local/lib64/kamailio/modules/nats.so) (idx: 0 rank: -127 desc: [main]) ```
Just noticed now that I'm using libnats 3.5.1 which is beta still, I will try to downgrade it to 3.5.0 and try again.
unfortunately that didn't help either, the same error:
``` Nov 28 18:26:35 k1-dev k1-dev[27366]: INFO: nats [nats_mod.c:439]: nats_init_connection(): adding server [nats://127.0.0.1:4222] [1] Nov 28 18:26:35 k1-dev k1-dev[27366]: ERROR: nats [nats_mod.c:458]: nats_init_connection(): could not create nats options [System Error] Nov 28 18:26:35 k1-dev k1-dev[27366]: ERROR: nats [nats_mod.c:232]: init_worker(): failed to init nat connections Nov 28 18:26:35 k1-dev k1-dev[27366]: ERROR: nats [nats_mod.c:348]: mod_child_init(): failed to init struct for worker [0] Nov 28 18:26:35 k1-dev k1-dev[27366]: ERROR: <core> [core/sr_module.c:903]: init_mod_child(): error while initializing module nats (/usr/local/lib64/kamailio/modules/nats.so) (idx: 0 rank: -127 desc: [main]) Nov 28 18:26:35 k1-dev k1-dev[27366]: ERROR: <core> [main.c:1665]: main_loop(): error in init_child(PROC_INT) -- exiting Nov 28 18:26:35 k1-dev kamailio: ERROR: <core> [core/daemonize.c:303]: daemonize(): Main process exited before writing to pipe ```
in the kernel logs I do see also some error related to the libuv, not sure that this can cause nats module to panic ``` Nov 28 18:26:35 k1-dev kernel: kamailio[27366]: segfault at 70 ip 00007f42e55f0888 sp 00007ffcce36f970 error 4 in libuv.so.1.0.0[7f42e55e0000+2c000] ```
is this problem still persisting? Reviewing this again, it looks like it's an issue in libuv. Could you try uninstalling/installing libuv?
Sorry, I haven't had a chance to test with the latest Kamailio version yet. I just gave up and moved to the evapi module where I can use NATS connection without any issues.
ok - closing this issue since i cannot reproduce it.
Closed #3253 as completed.