Am 18.05.2015 um 14:36 schrieb Daniel-Constantin
Mierla:
On 18/05/15 13:31, Klaus Darilion wrote:
On 15.05.2015 22:48, Daniel-Constantin Mierla
wrote:
On 15/05/15 21:55, Klaus Darilion wrote:
> Am 15.05.2015 um 17:54 schrieb Klaus Darilion:
>> Hi Daniel!
>>
>> On 08.05.2015 15:33, Daniel-Constantin Mierla wrote:
>>> Hello,
>>>
>>> On 08/05/15 15:13, Klaus Darilion wrote:
>>>> Hi!
>>>>
>>>> (Kamailio 4.2.2 on Debian Squeeze with libmysqlclient16 5.1.73)
>>>>
>>>> I have a problem that a lost TCP connection is not detected by
>>>> Kamailio/MySQL-CLient and this blocks the Kamailio process for 15
>>>> minutes until Linux reports the TCP connection as broken.
>>>>
>>>> I am using db_mysql with the default settings. Thus, according
>>>> to the
>>>> db_mysql README, the default timeout_interval() of 2 seconds
>>>> should
>>>> be used.
>>>>
>>>> As this is not the case I took a look at the code and got
>>>> confused: it
>>>> seems there are 2 different mysql modules in modules/db_mysql,
>>>> eg: the
>>>> interface is defined in mysql_mod.c and km_db_mysql.c. Which
>>>> one is
>>>> actually used? I see they have different timeout setting code.
>>>>
>>>> Any hints on why the timeout is not working?
>>> the files starting with km_* are used by kamailio modules. The
>>> other
>>> are
>>> for db api v2, used by few modules inherited from ser (those
>>> prefixed
>>> with uid_*).
>> I checked the code and the timeouts are correctly initialized.
>>
>> During all my tests Kamailio behaves correct and the lost TCP
>> connection
>> is detected after the configured timeout.
>>
>> But what drives my nuts is: it does not work on my production
>> server. Do
>> you have any ideas what could be the case that the timeout is not
>> detected?
> I think it is related to the debugger module. First tests show that
> the mysql timeout works if the debugger module is not loaded. Is
> there
> a known issue, e.g. with signal handlers, when the debugger module is
> loaded?
>
I am not aware of issues and I don't remember debugger module
overwriting any signal handler. What parameters do you set for
debugger
module?
It seems that I was wrong. After I enabled the debugger module
again, it
still works. Maybe it only worked due to the restart and the problem
starts only after a certain uptime.
Once I connected with strace to the process which waited for a mysql
timeout and the timeout did not worked. The strace output was (a few
times per second):
sigreturn() = ? (mask now [])
read(6, 0xa395f60, 16384) = -1 EINTR (Interrupted
system call)
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn() = ? (mask now [])
read(6, 0xa395f60, 16384) = -1 EINTR (Interrupted
system call)
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn() = ? (mask now [])
read(6, 0xa395f60, 16384) = -1 EINTR (Interrupted
system call)
--- SIGALRM (Alarm clock) @ 0 (0) ---
Do you have an idea what that could be related too?
Just a guess now, it might be
the non-blocking read inside the
libmysqlclient.
Very strange: This is an strace on the timer process (failure
route)
with a good DB connection:
22:51:51.042122 write(6, "e\0\0\0\3select value,attribute,type"...,
105) = 105
22:51:51.042188 read(6,
"\1\0\0\1\3F\0\0\2\3def\10kamailio\17usr_prefe"..., 16384) = 251
This is an strace on the timer process (failure route) with a broken
DB connection:
22:54:37.042057 poll([{fd=6, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
22:54:37.042092 write(6, "e\0\0\0\3select value,attribute,type"...,
105) = 105
22:54:37.042154 read(6, 0x9cabb28, 16384) = -1 EINTR (Interrupted
system call)
22:54:37.103256 --- SIGALRM (Alarm clock) @ 0 (0) ---
22:54:37.103314 sigreturn() = ? (mask now [])
22:54:37.103360 read(6, 0x9cabb28, 16384) = -1 EINTR (Interrupted
system call)
22:54:37.165745 --- SIGALRM (Alarm clock) @ 0 (0) ---
22:54:37.165800 sigreturn() = ? (mask now [])
22:54:37.165843 read(6, 0x9cabb28, 16384) = -1 EINTR (Interrupted
system call)
22:54:37.228245 --- SIGALRM (Alarm clock) @ 0 (0) ---
...
The strange thing is, that this does not happen always. Sometimes the
read just returns EAGAIN and then the read is tried 3 times and the
timeout is detected after 3x2s=6 seconds. Really weird. I never had
issues with Kamailio 3.1, but now with Kamailio 4.1 (I still use the
same old libmysqlclient 5.1.73)
If I understand it correctly then the read is interruted by a EINTR
signal and it seems libmysqlclient handles this buggy and retries
endlessly (until the IP stock reports the broken TCP connection). But
who sends the SIGALRM (approx every 60 ms)? Is it Kamailio? Any ideas
somebody?
SIGALRM could be issued for the timer process.
Cheers,
Daniel
--
Daniel-Constantin Mierla