Greetings,
I've run into a bug in Kamailio 1.4.4 where, if using the permissions module with db_mysql the database becomes unavailable, Kamailio will crash (SIGSEGV) somewhat arbitrarily after about 30-45 seconds. The process is a little random; sometimes it can take longer.
At first, I suspected the problem may be with 1.4.x, but got the exact same effect when trying 1.5.2. I also assumed the problem may be with db_mysql or even libmysqlclient, but when I remove the permissions functionality from the configuration the process does not crash. This is a configuration from the 1.4.x era with a lot of custom DB queries via avp_db_query() (pre-sqlops); when the database disappears, Kamailio will complain loudly in the logs, of course, but it will not crash unless I am calling allow_trusted() as part of the configuration and am using the permissions module with database backing.
The test system is CentOS 4.7 64-bit, but I've reproduced the problem on Debian lenny and unstable just as easily with libmysqlclient15.
Here are my steps for producing the bug:
1) Initialise Kamailio (the only modules that use database are permissions and avpops);
2) Firewall MySQL database off or shut it down;
3) Place a call - even one will do;
4) Kamailio will not crash immediately; it usually takes several request retransmissions or successive attempts to contact the database for this to happen, and it can take close to a minute. Eventually, though, there is a segfault that appears to be bubbling up from allow_trusted() in the call stack.
Here is the backtrace. IP addresses and phone numbers are sanitised to protect confidentiality, but have been substituted consistently:
(gdb) where 0 0x00000030eba21b70 in mysql_real_escape_string () from /usr/lib64/mysql/libmysqlclient.so.15 1 0x0000002a95be64c1 in db_mysql_val2str (_c=0x686e60, _v=0x7fbfffd900, _s=0x6331a7 "222.22.222.22'", _len=0x7fbfffd2cc) at val.c:233 0000002 0x00000000004b2f14 in db_print_where (_c=0x686e60, _b=0x63319f "src_ip='222.22.222.22'", _l=65473, _k=0x7fbfffd440, _o=0x0, _v=0x7fbfffd900, _n=1, val2str=0x2a95be62ca <db_mysql_val2str>) at db/db_ut.c:275 0000003 0x00000000004affb8 in db_do_query (_h=0x686e60, _k=0x7fbfffd440, _op=0x0, _v=0x7fbfffd900, _c=Variable "_c" is not available. ) at db/db_query.c:78 0000004 0x0000002a95be2dbf in db_mysql_query (_h=0x0, _k=0x6331a7, _op=0x2a962492a0, _v=0xe, _c=0x7fbfffd900, _n=6844000, _nc=0, _o=0x0, _r=0x0) at dbase.c:249 0000005 0x0000002a961428c0 in allow_trusted (msg=0x687038, src_ip=0x2a962492a0 "222.22.223.23", proto=1) at trusted.c:422 0000006 0x0000002a96143356 in allow_trusted_0 (_msg=0x687038, str1=Variable "str1" is not available. ) at ../../parser/../ip_addr.h:398 0000007 0x000000000040d4df in do_action (a=0x6566f8, msg=0x687038) at action.c:850 0000008 0x000000000040ec2b in run_action_list (a=Variable "a" is not available. ) at action.c:138 0000009 0x00000000004523dc in eval_expr (e=0x6567c8, msg=0x687038, val=0x0) at route.c:1116 0000010 0x00000000004527c2 in eval_expr (e=0x656810, msg=0x687038, val=0x0) at route.c:1429 11 0x0000000000452159 in eval_expr (e=0x656858, msg=0x687038, val=0x0) at route.c:1434 12 0x000000000040c933 in do_action (a=0x656e70, msg=0x687038) at action.c:705 13 0x000000000040ec2b in run_action_list (a=Variable "a" is not available. ) at action.c:138 14 0x000000000040e95f in do_action (a=0x657f00, msg=0x687038) at action.c:728 15 0x000000000040ec2b in run_action_list (a=Variable "a" is not available. ) at action.c:138 16 0x000000000040e95f in do_action (a=0x657fd0, msg=0x687038) at action.c:728 17 0x000000000040ec2b in run_action_list (a=Variable "a" is not available. ) at action.c:138 18 0x000000000040ef30 in run_top_route (a=0x651008, msg=0x687038) at action.c:118 ---Type <return> to continue, or q <return> to quit--- 19 0x00000000004467ef in receive_msg ( buf=0x622ac0 "INVITE sip:8005551212@123.132.132.7:5060;transport=udp SIP/2.0\r\nRecord-Route: sip:222.22.223.23;lr;rpdicor=VPSF506071629460;vsf=AAAAAB0GDgULBQIABgJ3AW4CFhgKGwIbARoJNDg-\r\nRecord-Route: <sip:90.148.216.254"..., len=922, rcv_info=0x7fbffff700) at receive.c:165 0000020 0x0000000000480c44 in udp_rcv_loop () at udp_server.c:449 21 0x0000000000422b4f in main (argc=Variable "argc" is not available. ) at main.c:692
BTW, for what it's worth: these effects have not been observed when using db_postgres instead.
Alex Balashov wrote:
Greetings,
I've run into a bug in Kamailio 1.4.4 where, if using the permissions module with db_mysql the database becomes unavailable, Kamailio will crash (SIGSEGV) somewhat arbitrarily after about 30-45 seconds. The process is a little random; sometimes it can take longer.
At first, I suspected the problem may be with 1.4.x, but got the exact same effect when trying 1.5.2. I also assumed the problem may be with db_mysql or even libmysqlclient, but when I remove the permissions functionality from the configuration the process does not crash. This is a configuration from the 1.4.x era with a lot of custom DB queries via avp_db_query() (pre-sqlops); when the database disappears, Kamailio will complain loudly in the logs, of course, but it will not crash unless I am calling allow_trusted() as part of the configuration and am using the permissions module with database backing.
The test system is CentOS 4.7 64-bit, but I've reproduced the problem on Debian lenny and unstable just as easily with libmysqlclient15.
Here are my steps for producing the bug:
- Initialise Kamailio (the only modules that use database are
permissions and avpops);
Firewall MySQL database off or shut it down;
Place a call - even one will do;
Kamailio will not crash immediately; it usually takes several
request retransmissions or successive attempts to contact the database for this to happen, and it can take close to a minute. Eventually, though, there is a segfault that appears to be bubbling up from allow_trusted() in the call stack.
Here is the backtrace. IP addresses and phone numbers are sanitised to protect confidentiality, but have been substituted consistently:
(gdb) where 0 0x00000030eba21b70 in mysql_real_escape_string () from /usr/lib64/mysql/libmysqlclient.so.15 1 0x0000002a95be64c1 in db_mysql_val2str (_c=0x686e60, _v=0x7fbfffd900, _s=0x6331a7 "222.22.222.22'", _len=0x7fbfffd2cc) at val.c:233 0000002 0x00000000004b2f14 in db_print_where (_c=0x686e60, _b=0x63319f "src_ip='222.22.222.22'", _l=65473, _k=0x7fbfffd440, _o=0x0, _v=0x7fbfffd900, _n=1, val2str=0x2a95be62ca <db_mysql_val2str>) at db/db_ut.c:275 0000003 0x00000000004affb8 in db_do_query (_h=0x686e60, _k=0x7fbfffd440, _op=0x0, _v=0x7fbfffd900, _c=Variable "_c" is not available. ) at db/db_query.c:78 0000004 0x0000002a95be2dbf in db_mysql_query (_h=0x0, _k=0x6331a7, _op=0x2a962492a0, _v=0xe, _c=0x7fbfffd900, _n=6844000, _nc=0, _o=0x0, _r=0x0) at dbase.c:249 0000005 0x0000002a961428c0 in allow_trusted (msg=0x687038, src_ip=0x2a962492a0 "222.22.223.23", proto=1) at trusted.c:422 0000006 0x0000002a96143356 in allow_trusted_0 (_msg=0x687038, str1=Variable "str1" is not available. ) at ../../parser/../ip_addr.h:398 0000007 0x000000000040d4df in do_action (a=0x6566f8, msg=0x687038) at action.c:850 0000008 0x000000000040ec2b in run_action_list (a=Variable "a" is not available. ) at action.c:138 0000009 0x00000000004523dc in eval_expr (e=0x6567c8, msg=0x687038, val=0x0) at route.c:1116 0000010 0x00000000004527c2 in eval_expr (e=0x656810, msg=0x687038, val=0x0) at route.c:1429 11 0x0000000000452159 in eval_expr (e=0x656858, msg=0x687038, val=0x0) at route.c:1434 12 0x000000000040c933 in do_action (a=0x656e70, msg=0x687038) at action.c:705 13 0x000000000040ec2b in run_action_list (a=Variable "a" is not available. ) at action.c:138 14 0x000000000040e95f in do_action (a=0x657f00, msg=0x687038) at action.c:728 15 0x000000000040ec2b in run_action_list (a=Variable "a" is not available. ) at action.c:138 16 0x000000000040e95f in do_action (a=0x657fd0, msg=0x687038) at action.c:728 17 0x000000000040ec2b in run_action_list (a=Variable "a" is not available. ) at action.c:138 18 0x000000000040ef30 in run_top_route (a=0x651008, msg=0x687038) at action.c:118 ---Type <return> to continue, or q <return> to quit--- 19 0x00000000004467ef in receive_msg ( buf=0x622ac0 "INVITE sip:8005551212@123.132.132.7:5060;transport=udp SIP/2.0\r\nRecord-Route: sip:222.22.223.23;lr;rpdicor=VPSF506071629460;vsf=AAAAAB0GDgULBQIABgJ3AW4CFhgKGwIbARoJNDg-\r\nRecord-Route: <sip:90.148.216.254"..., len=922, rcv_info=0x7fbffff700) at receive.c:165 0000020 0x0000000000480c44 in udp_rcv_loop () at udp_server.c:449 21 0x0000000000422b4f in main (argc=Variable "argc" is not available. ) at main.c:692
On Mittwoch, 30. September 2009, Alex Balashov wrote:
BTW, for what it's worth: these effects have not been observed when using db_postgres instead.
Hi Alex,
ok, this is perhaps because the postgres code uses another escape function which is probably more stable against invalid input, or the escape is not used in this case at all. (It looks like your crash was in the function mysql_real_escape_string)
One reason that it needs some time to crash could be that every process has its own database handle open, and because of the blocking different processes gets the request from time to time, until it find the "right" one that cause a crash.
Do you use allow_trusted in caching mode, or does it query the db for every invocation? If the latter, i'd take a closer look to the failure handling logic in the function, perhaps it lets some variables staying in a unclean state after a failure.
Cheers,
Henning
I've run into a bug in Kamailio 1.4.4 where, if using the permissions module with db_mysql the database becomes unavailable, Kamailio will crash (SIGSEGV) somewhat arbitrarily after about 30-45 seconds. The process is a little random; sometimes it can take longer.
At first, I suspected the problem may be with 1.4.x, but got the exact same effect when trying 1.5.2. I also assumed the problem may be with db_mysql or even libmysqlclient, but when I remove the permissions functionality from the configuration the process does not crash. This is a configuration from the 1.4.x era with a lot of custom DB queries via avp_db_query() (pre-sqlops); when the database disappears, Kamailio will complain loudly in the logs, of course, but it will not crash unless I am calling allow_trusted() as part of the configuration and am using the permissions module with database backing.
The test system is CentOS 4.7 64-bit, but I've reproduced the problem on Debian lenny and unstable just as easily with libmysqlclient15.
Here are my steps for producing the bug:
- Initialise Kamailio (the only modules that use database are
permissions and avpops);
Firewall MySQL database off or shut it down;
Place a call - even one will do;
Kamailio will not crash immediately; it usually takes several
request retransmissions or successive attempts to contact the database for this to happen, and it can take close to a minute. Eventually, though, there is a segfault that appears to be bubbling up from allow_trusted() in the call stack.
Here is the backtrace. IP addresses and phone numbers are sanitised to protect confidentiality, but have been substituted consistently:
(gdb) where 0 0x00000030eba21b70 in mysql_real_escape_string () from /usr/lib64/mysql/libmysqlclient.so.15 1 0x0000002a95be64c1 in db_mysql_val2str (_c=0x686e60, _v=0x7fbfffd900, _s=0x6331a7 "222.22.222.22'", _len=0x7fbfffd2cc) at val.c:233 0000002 0x00000000004b2f14 in db_print_where (_c=0x686e60, _b=0x63319f "src_ip='222.22.222.22'", _l=65473, _k=0x7fbfffd440, _o=0x0, _v=0x7fbfffd900, _n=1, val2str=0x2a95be62ca <db_mysql_val2str>) at db/db_ut.c:275 0000003 0x00000000004affb8 in db_do_query (_h=0x686e60, _k=0x7fbfffd440, _op=0x0, _v=0x7fbfffd900, _c=Variable "_c" is not available. ) at db/db_query.c:78 0000004 0x0000002a95be2dbf in db_mysql_query (_h=0x0, _k=0x6331a7, _op=0x2a962492a0, _v=0xe, _c=0x7fbfffd900, _n=6844000, _nc=0, _o=0x0, _r=0x0) at dbase.c:249 0000005 0x0000002a961428c0 in allow_trusted (msg=0x687038, src_ip=0x2a962492a0 "222.22.223.23", proto=1) at trusted.c:422 0000006 0x0000002a96143356 in allow_trusted_0 (_msg=0x687038, str1=Variable "str1" is not available. ) at ../../parser/../ip_addr.h:398 0000007 0x000000000040d4df in do_action (a=0x6566f8, msg=0x687038) at action.c:850 0000008 0x000000000040ec2b in run_action_list (a=Variable "a" is not available. ) at action.c:138 0000009 0x00000000004523dc in eval_expr (e=0x6567c8, msg=0x687038, val=0x0) at route.c:1116 0000010 0x00000000004527c2 in eval_expr (e=0x656810, msg=0x687038, val=0x0) at route.c:1429 11 0x0000000000452159 in eval_expr (e=0x656858, msg=0x687038, val=0x0) at route.c:1434 12 0x000000000040c933 in do_action (a=0x656e70, msg=0x687038) at action.c:705 13 0x000000000040ec2b in run_action_list (a=Variable "a" is not available. ) at action.c:138 14 0x000000000040e95f in do_action (a=0x657f00, msg=0x687038) at action.c:728 15 0x000000000040ec2b in run_action_list (a=Variable "a" is not available. ) at action.c:138 16 0x000000000040e95f in do_action (a=0x657fd0, msg=0x687038) at action.c:728 17 0x000000000040ec2b in run_action_list (a=Variable "a" is not available. ) at action.c:138 18 0x000000000040ef30 in run_top_route (a=0x651008, msg=0x687038) at action.c:118 ---Type <return> to continue, or q <return> to quit--- 19 0x00000000004467ef in receive_msg ( buf=0x622ac0 "INVITE sip:8005551212@123.132.132.7:5060;transport=udp SIP/2.0\r\nRecord-Route: sip:222.22.223.23;lr;rpdicor=VPSF506071629460;vsf=AAAAAB0GDgULBQIABgJ3AW 4CFhgKGwIbARoJNDg-\r\nRecord-Route: <sip:90.148.216.254"..., len=922, rcv_info=0x7fbffff700) at receive.c:165 0000020 0x0000000000480c44 in udp_rcv_loop () at udp_server.c:449 21 0x0000000000422b4f in main (argc=Variable "argc" is not available. ) at main.c:692
Henning Westerholt wrote:
Do you use allow_trusted in caching mode, or does it query the db for every invocation? If the latter, i'd take a closer look to the failure handling logic in the function, perhaps it lets some variables staying in a unclean state after a failure.
Non-caching mode, and by design. The list changes with some frequency.
I understood from the backtrace that the crash was *in* the function mysql_real_escape_string(), but I did not necessarily assume that means the problem is with the implementation of that function in the sense of a "bug."
Perhaps the function should do a better job of validating all input it receives, yes, but the fundamental problem may be with the way that it's being invoked by the caller under that particular set of circumstances. That's assuming the problem has to do with invalid parameters to the function at all, and not some indirect cause.
I have tried very hard to recreate this problem with a few other modules that utilise the database and also call standard libmysqlclient functions (including mysql_real_escape_string(), I would imagine) such as avpops, and I cannot get it to crash. My only conclusion is that there has got to be something that the implementation of allow_trusted() is doing that is a little bit ... special.
-- Alex
alex,
i took a look at allow_trusted code and perhaps the reason for the crash is that the code does not check that dh_handle exists before making the query and db connection is closed if db query fails.
-- juha
alex,
try with the enclosed trusted.c (for k 1.5). it checks in allow_trusted() function that dh_handle exists and does not close db connection if query fails.
-- juha
Juha,
Yes, this appears to have fixed the problem. Thank you.
There also do not seem to be any harmful side-effects to backporting your trusted.c into 1.4.4.
-- Alex
Juha Heinanen wrote:
alex,
try with the enclosed trusted.c (for k 1.5). it checks in allow_trusted() function that dh_handle exists and does not close db connection if query fails.
-- juha
On Mittwoch, 30. September 2009, Alex Balashov wrote:
Do you use allow_trusted in caching mode, or does it query the db for every invocation? If the latter, i'd take a closer look to the failure handling logic in the function, perhaps it lets some variables staying in a unclean state after a failure.
Non-caching mode, and by design. The list changes with some frequency.
I understood from the backtrace that the crash was *in* the function mysql_real_escape_string(), but I did not necessarily assume that means the problem is with the implementation of that function in the sense of a "bug."
Hi Alex,
yes, perhaps i expressed myself a bit unclear.
Perhaps the function should do a better job of validating all input it receives, yes, but the fundamental problem may be with the way that it's being invoked by the caller under that particular set of circumstances. That's assuming the problem has to do with invalid parameters to the function at all, and not some indirect cause.
This is also my opinion.
I have tried very hard to recreate this problem with a few other modules that utilise the database and also call standard libmysqlclient functions (including mysql_real_escape_string(), I would imagine) such as avpops, and I cannot get it to crash. My only conclusion is that there has got to be something that the implementation of allow_trusted() is doing that is a little bit ... special.
I also suspect this, as i don't get this behaviour so far in similar test/ production scenarios.
Henning