There seems to be an intermittent issue with kamailio db_postgres module creating new TLS connection with postgresql 9.4 server. Every now and then, kamailio would crash when initiation db connection.
Here is the gdb core dump output. Thanks,
Thread 1 (Thread 0x7f02f2dab7c0 (LWP 1502)): #0 0x00007f02f261de84 in RSA_free () from /usr/local/ssl/lib/libcrypto.so.1.0.0 No symbol table info available. #1 0x00007f02f264810a in EVP_PKEY_free () from /usr/local/ssl/lib/libcrypto.so.1.0.0 No symbol table info available. #2 0x00007f02f2657a24 in pubkey_cb () from /usr/local/ssl/lib/libcrypto.so.1.0.0 No symbol table info available. #3 0x00007f02f265d8ee in asn1_item_combine_free () from /usr/local/ssl/lib/libcrypto.so.1.0.0 No symbol table info available. #4 0x00007f02f265dabf in ASN1_template_free () from /usr/local/ssl/lib/libcrypto.so.1.0.0 No symbol table info available. #5 0x00007f02f265d9a6 in asn1_item_combine_free () from /usr/local/ssl/lib/libcrypto.so.1.0.0 No symbol table info available. #6 0x00007f02f265dabf in ASN1_template_free () from /usr/local/ssl/lib/libcrypto.so.1.0.0 No symbol table info available. #7 0x00007f02f265d9a6 in asn1_item_combine_free () from /usr/local/ssl/lib/libcrypto.so.1.0.0 No symbol table info available. #8 0x00007f02f265daf5 in ASN1_item_free () from /usr/local/ssl/lib/libcrypto.so.1.0.0 No symbol table info available. #9 0x00007f02f2980b89 in ssl_set_pkey () from /usr/local/ssl/lib/libssl.so.1.0.0 No symbol table info available. #10 0x00007f02f2981076 in SSL_use_PrivateKey_file () from /usr/local/ssl/lib/libssl.so.1.0.0 No symbol table info available. #11 0x00007f02f2304f8a in initialize_SSL (conn=0xfe1150) at fe-secure.c:1310 buf = {st_dev = 64772, st_ino = 1331, st_nlink = 1, st_mode = 33152, st_uid = 498, st_gid = 498, __pad0 = 0, st_rdev = 0, st_size = 1704, st_blksize = 4096, st_blocks = 8, st_atim = {tv_sec = 1439918487, tv_nsec = 915125877}, st_mtim = {tv_sec = 1439918474, tv_nsec = 632125886}, st_ctim = {tv_sec = 1439918479, tv_nsec = 551125884}, __unused = {0, 0, 0}} homedir = "/var/run/kamailio\000r\000\000\000\000\000\334-r\000\000\000\000\000\200-)\357\002\177\000\000\020\320\022\357\002\177\000\000H-)\357\002\177\000\000e&T", '\000' <repeats 13 times>"\200, -)\357\002\177\000\000\000\000\000\000\000\000\000\000\200-)\357\002\177\000\000\260]A\000\000\000\000\000@\245\064\214\377\177", '\000' <repeats 18 times>"\340, g4\214\377\177\000\000\302\033T\000\000\000\000\000\260]A\000\000\000\000\000 \b&\357\002\177\000\000\260]A\000\000\000\000\000@\245\064\214\377\177\000\000\340g4\214\377\177\000\000\200-)\357\002\177\000\000\200h4\214\000\000\000\000\340\000&\357\002\177\000\000\000\000\000\000\000\000\000\000 \b&\357\002\177\000\000\260]A\000\000\000\000\000@\245\064\214\377\177\000\000\360j4\214\377\177\000\000\001\225U\000\000\000\000\000\300"... fnbuf = "/var/run/kamailio/.postgresql/postgresql.key\000\177\000\000\360c4\214\377\177\000\000\212\332a\000\000\000\000\000(\000\000\000\004\000\000\000\250\324\335\252\002\177\000\000\000`\255\252\002\177\000\000\340\000&\357\002\177\000\000\001\000\000\000\000\000\000\000\340\324\335\252\002\177\000\000\000\000\000\000\000\000\000\000\260c4\214\002\000\000\000\064\253a\000\022\001\000\000\340\000&\357\002\177", '\000' <repeats 18 times>, "\020\000\000\000\000\000\000\000\000`\255\252\003\000\000\000\004\000\000\000\000\000\000\000\360c4\214\377\177\000\000g\002\000\000\000\000\000\000\030d\324\352\002\177\000\000\000\000\000\000\000\000\000\000\020d4\214\377\177\000\000\352\252a\000\022\001\000\000\030d\324\352\002\177\000\000\340\065\324\352\004\000\000\000"... sebuf = "|Ak\362\002\177\000\000\002\000\000\000\n\000\000\000@k4\214\377\177\000\000\020" \253\002\177\000\000\000\000\000\000\000\000\000\000\240r\333\252\002\177\000\000p" \253\002\177\000\000p" \253\264\001\000\000|Ak\362\002\177\000\000\002\000\000\000\006\000\000\000\000\000\000\000\377\177\000\000w\271\320\352\002\177\000\000\340\065\324\352\022\001\000\000\240r\333\252\002\177\000\000\000\000\000\000\000\000\000\000p" \253\002\177\000\000\240}\222\362\002\177\000\000\063\364\\362\002\177\000\000\240k4\214\377\177\000\000p\000\000\000\000\000\000\000\260k4\214\377\177\000\000\020" \253\002\177\000\000\240}\222\362\002\177\000\000p" \253\002\177\000\000p\024\376\000\000\000\000\000\360m4\214\377\177\000\000\020" \253\002\177\000\000\300\005 \253\002\177\000\000\006\000\000\000\000\000\000\000p\024\376\000\000\000\000\000\020" \253\002\177\000\000\202"... have_homedir = <value optimized out> have_cert = <value optimized out> pkey = 0x0 #12 0x00007f02f2305996 in pqsecure_open_client (conn=0xfe1150) at fe-secure.c:299 rc = <value optimized out> #13 0x00007f02f22f5566 in PQconnectPoll (conn=0xfe1150) at fe-connect.c:2203 pollres = <value optimized out> res = <value optimized out> sebuf = "\000\000\000\000\000\000\000\000@", '\000' <repeats 31 times>, "\002\000\000\000\060\000\000\000[\000\000\000|\000\000\000w\000\000\000n\000\000\000[\000\000\000|\000\000\000\360\024\376\000\000\000\000\000\002\000\000\000\000\000\000\000\200n\354\361\002\177\000\000\350\021\376\000\000\000\000\000\020\251\376\000\000\000\000\000P\021\376\000\000\000\000\000\261\026\273\361\002\177\000\000\360\024\376\000\000\000\000\000\002\000\000\000\000\000\000\000\000%Q\362\002\177\000\000\222~\273\361\002\177\000\000p\024\376\000\000\000\000\000\070\255\376\000\000\000\000\000\020\251\376\000\000\000\000\000\320\217\064\214\377\177\000\000\270\251\376\000\000\000\000\000h8f/\220;\241g\020\251\376\000\000\000\000\000\004\000\000\000\000\000\000\000\320\217\064\214\377\177\000\000\270\251\376", '\000' <repeats 13 times>, "h8f/\220;\241g" optval = 0 #14 0x00007f02f22f657e in connectDBComplete (conn=0xfe1150) at fe-connect.c:1596 flag = <value optimized out> finish_time = -1 #15 0x00007f02f22f72bf in PQconnectdbParams (keywords=<value optimized out>, values=<value optimized out>, expand_dbname=<value optimized out>) at fe-connect.c:462 conn = 0xfe1150 #16 0x00007f02eeefe8e7 in db_postgres_new_connection (id=0x7f02ef203850) at km_pg_con.c:105 ptr = 0x7f02ef26fd10 ports = 0x0 i = 4 keywords = {0x7f02eef21e24 "host", 0x7f02eef21e29 "dbname", 0x7f02eef21e30 "user", 0x7f02eef21e35 "password", 0x0, 0x7f02eeabc844 "H\213E\310H\213\020H\213E\270\213@\bH\230H\215\004\002\306", 0x7f02ef1e2a28 "postgres", 0x7f02ef1e2a28 "postgres", 0x415db0 "1\355I\211\321^H\211\342H\203\344\360PTI\307\300\060Tn", 0x1268c34a540 <Address 0x1268c34a540 out of bounds>} values = {0x7f02ef150788 "localhost", 0x7f02ef151338 "asterisk", 0x7f02ef2009b8 "asterisk", 0x7f02ef14fa48 "password", 0x0, 0x7f02ef203850 "\230\070 \357\002\177", 0x7f02ef12d268 "", 0x0, 0x500000011 <Address 0x500000011 out of bounds>, 0x260000002f <Address 0x260000002f out of bounds>} to = "\343S\026\357\002\177\000\000\210\070 \357\002\177\000" __FUNCTION__ = "db_postgres_new_connection" #17 0x00007f02eeab7927 in db_do_init2 (url=0x7f02ef165488, new_connection=0x7f02eeefd272 <db_postgres_new_connection>, pooling=DB_POOLING_PERMITTED) at db.c:312 id = 0x7f02ef203850 con = 0x0 res = 0x7f02ef1723b0 con_size = 32 __FUNCTION__ = "db_do_init2" #18 0x00007f02eeab69b4 in db_do_init (url=0x7f02ef165488, new_connection=0x7f02eeefd272 <db_postgres_new_connection>) at db.c:265 No locals. #19 0x00007f02eeeef4f2 in db_postgres_init (_url=0x7f02ef165488) at km_dbase.c:137 No locals. #20 0x00007f02ee89ab3b in sql_connect () at sql_api.c:160 sc = 0x7f02ef165470 __FUNCTION__ = "sql_connect" #21 0x00007f02ee8a966a in child_init (rank=6) at sqlops.c:146 No locals. #22 0x0000000000594061 in init_mod_child (m=0x7f02ef151ff8, rank=6) at sr_module.c:898 __FUNCTION__ = "init_mod_child" #23 0x0000000000593d9c in init_mod_child (m=0x7f02ef152538, rank=6) at sr_module.c:895 __FUNCTION__ = "init_mod_child" #24 0x0000000000593d9c in init_mod_child (m=0x7f02ef152760, rank=6) at sr_module.c:895 __FUNCTION__ = "init_mod_child" #25 0x0000000000593d9c in init_mod_child (m=0x7f02ef152e38, rank=6) at sr_module.c:895 __FUNCTION__ = "init_mod_child" #26 0x0000000000593d9c in init_mod_child (m=0x7f02ef153400, rank=6) at sr_module.c:895 __FUNCTION__ = "init_mod_child" #27 0x0000000000593d9c in init_mod_child (m=0x7f02ef154ce0, rank=6) at sr_module.c:895 __FUNCTION__ = "init_mod_child" #28 0x0000000000593d9c in init_mod_child (m=0x7f02ef1557e8, rank=6) at sr_module.c:895 __FUNCTION__ = "init_mod_child" #29 0x0000000000593d9c in init_mod_child (m=0x7f02ef155cf8, rank=6) at sr_module.c:895 __FUNCTION__ = "init_mod_child" #30 0x0000000000593d9c in init_mod_child (m=0x7f02ef156670, rank=6) at sr_module.c:895 __FUNCTION__ = "init_mod_child" #31 0x0000000000593d9c in init_mod_child (m=0x7f02ef15b600, rank=6) at sr_module.c:895 __FUNCTION__ = "init_mod_child" #32 0x0000000000593d9c in init_mod_child (m=0x7f02ef15ba20, rank=6) at sr_module.c:895 __FUNCTION__ = "init_mod_child" #33 0x0000000000593d9c in init_mod_child (m=0x7f02ef15bcf8, rank=6) at sr_module.c:895 __FUNCTION__ = "init_mod_child" #34 0x0000000000593d9c in init_mod_child (m=0x7f02ef15c438, rank=6) at sr_module.c:895 __FUNCTION__ = "init_mod_child" #35 0x0000000000593d9c in init_mod_child (m=0x7f02ef15d120, rank=6) at sr_module.c:895 __FUNCTION__ = "init_mod_child" #36 0x0000000000593d9c in init_mod_child (m=0x7f02ef15da48, rank=6) at sr_module.c:895 __FUNCTION__ = "init_mod_child" #37 0x0000000000593d9c in init_mod_child (m=0x7f02ef15df68, rank=6) at sr_module.c:895 __FUNCTION__ = "init_mod_child" #38 0x0000000000593d9c in init_mod_child (m=0x7f02ef15e2e0, rank=6) at sr_module.c:895 __FUNCTION__ = "init_mod_child" #39 0x0000000000593d9c in init_mod_child (m=0x7f02ef15e7b8, rank=6) at sr_module.c:895 __FUNCTION__ = "init_mod_child" #40 0x0000000000593d9c in init_mod_child (m=0x7f02ef15fd30, rank=6) at sr_module.c:895 __FUNCTION__ = "init_mod_child" #41 0x0000000000593d9c in init_mod_child (m=0x7f02ef160230, rank=6) at sr_module.c:895 __FUNCTION__ = "init_mod_child" #42 0x0000000000593d9c in init_mod_child (m=0x7f02ef160568, rank=6) at sr_module.c:895 __FUNCTION__ = "init_mod_child" #43 0x0000000000593d9c in init_mod_child (m=0x7f02ef160ad8, rank=6) at sr_module.c:895 __FUNCTION__ = "init_mod_child" #44 0x0000000000593d9c in init_mod_child (m=0x7f02ef160f30, rank=6) at sr_module.c:895 __FUNCTION__ = "init_mod_child" #45 0x0000000000593d9c in init_mod_child (m=0x7f02ef162eb0, rank=6) at sr_module.c:895 __FUNCTION__ = "init_mod_child" #46 0x0000000000593d9c in init_mod_child (m=0x7f02ef163228, rank=6) at sr_module.c:895 __FUNCTION__ = "init_mod_child" #47 0x0000000000593d9c in init_mod_child (m=0x7f02ef1637b0, rank=6) at sr_module.c:895 __FUNCTION__ = "init_mod_child" #48 0x0000000000593d9c in init_mod_child (m=0x7f02ef163af0, rank=6) at sr_module.c:895 __FUNCTION__ = "init_mod_child" #49 0x0000000000593d9c in init_mod_child (m=0x7f02ef164628, rank=6) at sr_module.c:895 __FUNCTION__ = "init_mod_child" #50 0x0000000000593d9c in init_mod_child (m=0x7f02ef1649e0, rank=6) at sr_module.c:895 __FUNCTION__ = "init_mod_child" #51 0x000000000059436b in init_child (rank=6) at sr_module.c:924 No locals. #52 0x00000000004ea7d4 in fork_process (child_id=6, desc=0x7fff8c34a210 "udp receiver child=1 sock=127.0.0.1:5060", make_sock=1) at pt.c:327 pid = 0 child_process_no = 6 ret = -1 new_seed1 = 436139511 new_seed2 = 2008969640 sockfd = {19, 20} __FUNCTION__ = "fork_process" #53 0x00000000004a7cae in main_loop () at main.c:1563 i = 1 pid = 1500 si = 0x7f02ef14f4d0 si_desc = "udp receiver child=1 sock=127.0.0.1:5060\000\344\255\252\002\177\000\000\200\242\064\214\377\177\000\000+\216N\000\000\000\000\000\001\000\000\000\377\177\000\000\260\062\335\252\002\177\000\000\220a\324\352\002\177\000\000\022\000\000\000\002\177\000\000\360\065\324\352\002\177\000\000\003\000\000\000\001\000\000\000\320\242\064\214\377\177\000\000\340\020\000\020\000\000\000" nrprocs = 4 __FUNCTION__ = "main_loop" #54 0x00000000004ae302 in main (argc=11, argv=0x7fff8c34a548) at main.c:2533 cfg_stream = 0xf6b010 c = -1 r = 0 tmp = 0x7fff8c34af10 "" tmp_len = 0 port = -1 proto = 0 options = 0x704fe8 ":f:cm:M:dVIhEeb:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:" ret = -1 seed = 3091819154 rfd = 4 debug_save = 0 debug_flag = 0 dont_fork_cnt = 0 n_lst = 0x40d892 p = 0x1 <Address 0x1 out of bounds> __FUNCTION__ = "main"
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/issues/292
Just some updates.
The "out of bounds" problems in the above trace are caused by pointers not being initialized properly, containing random values, We added the following lines before the pointers are being used.
memset(keywords, 0,(sizeof(char *) * 10)); memset(values, 0,(sizeof(char *) * 10)); memset(to, 0, (sizeof(char) * 16));
This seems to get rid of the "out of bounds" in the trace. However, we still see intermittent core dump with the some of the forked child processes. When the core dump happens, netstats shows the tls connection for the child processes in time_wait state. After some time, those connections seem to be recover to established state. If a call is handled by the dead child, it will fail, But other calls still go through. The dead child would produce a core dump file.
Hope the experts here can provide some insights on what we should try next.
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/issues/292#issuecomment-133043682
I pushed the memset fixes to master branch. Can you give the backtrace of the new core after adding those lines?
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/issues/292#issuecomment-133649381
Any update on this?
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/issues/292#issuecomment-137696760
Yes. It seems the core dumps are related to the timing when the child processes are brought up.
After some tweaking, we added:
fork_delay=100000 modinit_delay=100000
in kamailio.cfg. And haven't seen any core dumps for some time. Will test more and let you know if we find anything new. We can close this issue now.
Thanks
On Fri, Sep 4, 2015 at 5:18 AM, Daniel-Constantin Mierla < notifications@github.com> wrote:
Any update on this?
— Reply to this email directly or view it on GitHub https://github.com/kamailio/kamailio/issues/292#issuecomment-137696760.
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/issues/292#issuecomment-137814655
If you had to enable those parameters, it could be selinux rate policy, killing the app when opening new connections at higher rate than it allows.
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/issues/292#issuecomment-138059977
Closed #292.
--- Reply to this email directly or view it on GitHub: https://github.com/kamailio/kamailio/issues/292#event-402116774