Hi, I've finally discovered the cause of serious issues with my Kamailio deployments, which sometimes freeze "for no reason":
- We use Debian Lenny with rsyslog writing the logs to local files (/var/log/kamailio.log), and also the rsyslog-mysql extension to send the logs to a remote MySQL server over the MySQL TCP protocol.
- For reasons external to me, that MySQL server sometimes fails or is down.
- Then kamailio tries to log to syslog. rsyslog receives the request and tries to send it via MySQL. Unfortunately it gets frozen since the DB doesn't work, and it doesn't write the log to the local log file either!!! (this is really surprising to me).
- Worse is the fact that rsyslog replies nothing to the kamailio process, so the kamailio process gets 100% *blocked* waiting for the response (this is what I suspect).
- As a dirty workaround, stopping the local rsyslog server makes kamailio work again.
So, I think this is mostly a bug in rsyslog-mysql, as it shouldn't get frozen if the MySQL server doesn't work. However, I also think that Kamailio should not depend on it, and perhaps it shouldn't wait indefinitely for the rsyslog response. What do you think? Would a "ping" mechanism, like the one for RtpProxy servers, make sense in order to enable/disable syslog logging?
Regards.
On Wed, 28 Jul 2010 17:29:42 +0200, Iñaki Baz Castillo ibc@aliax.net wrote:
- Then kamailio tries to log to syslog. rsyslog receives the request
and tries to send it via MySQL. Unfortunately it gets frozen since the DB doesn't work, and it doesn't write the log to the local log file either!!! (this is really surprising to me).
If the entry for mysql is before the entry for the local file, it's not surprising.
- Worse is the fact that rsyslog replies nothing to the kamailio process, so
the kamailio process gets 100% *blocked* waiting for the response (this is what I suspect).
- As a dirty workaround, stopping the local rsyslog server makes
kamailio work again.
A better workaround is to have queue support in rsyslog. Messages are stored in memory and in a disk queue, and they are pushed to the db when the connection recovers.
Try something like this in your rsyslog.conf configuration:
# Memory and disk queuing support
$WorkDirectory /var/log/queue
$ActionQueueType LinkedList
$ActionQueueFileName SL
$ActionQueueMaxDiskSpace 4g
$ActionQueueSaveOnShutdown on
$ActionResumeRetryCount -1
So, I think this is mostly a bug in rsyslog-mysql, as it shouldn't get frozen if the MySQL server doesn't work. However, I also think that Kamailio should not depend on it, and perhaps it shouldn't wait indefinitely for the rsyslog response. What do you think? Would a "ping" mechanism, like the one for RtpProxy servers, make sense in order to enable/disable syslog logging?
I think this is not kamailio's fault. Kamailio just uses syslog services, and it's up to the syslog server to be fault tolerant.
My $0.02
On 28 Jul 2010, at 17:36, Jon Bonilla (Manwe) wrote:
On Wed, 28 Jul 2010 17:29:42 +0200, Iñaki Baz Castillo ibc@aliax.net wrote:
- Then kamailio tries to log to syslog. rsyslog receives the request
and tries to send it via MySQL. Unfortunately it gets frozen since the DB doesn't work, and it doesn't write the log to the local log file either!!! (this is really surprising to me).
If the entry for mysql is before the entry for the local file, it's not surprising.
- Worse is the fact that rsyslog replies nothing to the kamailio process, so
the kamailio process gets 100% *blocked* waiting for the response (this is what I suspect).
- As a dirty workaround, stopping the local rsyslog server makes
kamailio work again.
A better workaround is to have queue support in rsyslog. Messages are stored in memory and in a disk queue, and they are pushed to the db when the connection recovers.
Try something like this in your rsyslog.conf configuration:
# Memory and disk queuing support
$WorkDirectory /var/log/queue
$ActionQueueType LinkedList
$ActionQueueFileName SL
$ActionQueueMaxDiskSpace 4g
$ActionQueueSaveOnShutdown on
$ActionResumeRetryCount -1
This is great advice, thanks!
Found these docs: http://www.rsyslog.com/doc/queues.html
/O
2010/7/28 Jon Bonilla manwe@aholab.ehu.es:
On Wed, 28 Jul 2010 17:29:42 +0200, Iñaki Baz Castillo ibc@aliax.net wrote:
- Then kamailio tries to log to syslog. rsyslog receives the request
and tries to send it via MySQL. Unfortunately it gets frozen since the DB doesn't work, and it doesn't write the log to the local log file either!!! (this is really surprising to me).
If the entry for mysql is before the entry for the local file, it's not surprising.
- Worse is the fact that rsyslog replies nothing to the kamailio process, so
the kamailio process gets 100% *blocked* waiting for the response (this is what I suspect).
- As a dirty workaround, stopping the local rsyslog server makes
kamailio work again.
A better workaround is to have queue support in rsyslog. Messages are stored in memory and in a disk queue, and they are pushed to the db when the connection recovers.
Try something like this in your rsyslog.conf configuration:
# Memory and disk queuing support
$WorkDirectory /var/log/queue
$ActionQueueType LinkedList
$ActionQueueFileName SL
$ActionQueueMaxDiskSpace 4g
$ActionQueueSaveOnShutdown on
$ActionResumeRetryCount -1
Awesome! I'll test your suggested configuration. Thanks a lot.
On 28 Jul 2010, at 17:29, Iñaki Baz Castillo wrote:
Hi, I've finally discovered the cause of serious issues with my Kamailio deployments, which sometimes freeze "for no reason":
- We use Debian Lenny with rsyslog writing the logs to local files
(/var/log/kamailio.log), and also the rsyslog-mysql extension to send the logs to a remote MySQL server over the MySQL TCP protocol.
- For "external" reason (external to me) such MySQL server sometimes
fails or it's down.
- Then kamailio tries to log to syslog. rsyslog receives the request
and tries to send it via MySQL. Unfortunately it gets frozen since the DB doesn't work, and it doesn't write the log to the local log file either!!! (this is really surprising to me).
- Worse is the fact that rsyslog replies nothing to the kamailio process, so
the kamailio process gets 100% *blocked* waiting for the response (this is what I suspect).
- As a dirty workaround, stopping the local rsyslog server makes
kamailio work again.
So, I think this is mostly a bug in rsyslog-mysql, as it shouldn't get frozen if the MySQL server doesn't work. However, I also think that Kamailio should not depend on it, and perhaps it shouldn't wait indefinitely for the rsyslog response. What do you think? Would a "ping" mechanism, like the one for RtpProxy servers, make sense in order to enable/disable syslog logging?
Syslog is defined in http://tools.ietf.org/html/rfc5424, but I can't find anything there about this situation. It's a simplex protocol, so the implementation should probably handle this situation; as you say, rsyslog is at fault here.
If it does, we need to figure out a way so that Kamailio doesn't lock up when this happens: not logging in the same process or thread, or something...
/O
2010/7/28 Olle E. Johansson oej@edvina.net:
Syslog is defined in http://tools.ietf.org/html/rfc5424, but I can't find anything there about this situation. It's a simplex protocol, so the implementation should probably handle this situation
Note that we don't use the "syslog remote protocol" defined in RFC 5424. Kamailio logs to the local rsyslog server, and rsyslog logs to local file(s) and also performs a SQL query to store the log in a MySQL server.
If it does, we need to figure out a way so that Kamailio doesn't lock up when this happens: not logging in the same process or thread, or something...
With a multiprocess blocking design, such a solution would be a bit difficult. Perhaps the "ping syslog" mechanism I suggested in my initial mail :)
On 28 Jul 2010, at 17:44, Iñaki Baz Castillo wrote:
2010/7/28 Olle E. Johansson oej@edvina.net:
Syslog is defined in http://tools.ietf.org/html/rfc5424, but I can't find anything there about this situation. It's a simplex protocol, so the implementation should probably handle this situation
Note that we don't use the "syslog remote protocol" defined in RFC 5424. Kamailio logs to the local rsyslog server, and rsyslog logs to local file(s) and also performs a SQL query to store the log in a MySQL server.
Ahh.
If it does, we need to figure out a way so that Kamailio doesn't lock up when this happens: not logging in the same process or thread, or something...
With a multiprocess blocking design, such a solution would be a bit difficult. Perhaps the "ping syslog" mechanism I suggested in my initial mail :)
I thought it wasn't easy... But still, we need some sort of solution since this can affect a lot of systems.
/O
2010/7/28 Olle E. Johansson oej@edvina.net:
With a multiprocess blocking design, such a solution would be a bit difficult. Perhaps the "ping syslog" mechanism I suggested in my initial mail :)
I thought it wasn't easy... But still, we need some sort of solution since this can affect a lot of systems.
Let me suggest this proposal:
- Have a dedicated kamailio process that acts as a syslogger.
- Other kamailio processes send logging messages to that syslogger process using a POSIX message queue, which is non-blocking. This means that kamailio workers would never wait for the "syslog()" call to return.
- POSIX message queues implement a configurable queue. But also, the syslogger process could manage its own queue (using an array as a FIFO) so that messages received from the POSIX queue are inserted into the FIFO and extracted from it when they are logged to syslog.
- This way, if the syslog daemon gets frozen, kamailio workers wouldn't notice. Some logs would be lost if the FIFO gets full, but that's less important than Kamailio not processing SIP messages :)
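A minimal sketch of the worker side of this idea (the queue name "/kamailio_log" and the sizes are just illustrative assumptions, not existing Kamailio code):

```c
/* Illustrative sketch only: a worker opens a POSIX message queue in
 * non-blocking mode and drops the log line instead of blocking when
 * the queue is full. Link with -lrt on older glibc. */
#include <fcntl.h>
#include <mqueue.h>
#include <string.h>

#define LOG_MQ_NAME "/kamailio_log"
#define LOG_MSG_MAX 1024

static mqd_t log_mq = (mqd_t)-1;

/* open (or create) the queue; O_NONBLOCK makes mq_send() fail with
 * EAGAIN instead of blocking when the queue is full */
int log_open(void)
{
    struct mq_attr attr = { .mq_flags = 0, .mq_maxmsg = 8,
                            .mq_msgsize = LOG_MSG_MAX, .mq_curmsgs = 0 };
    log_mq = mq_open(LOG_MQ_NAME, O_WRONLY | O_CREAT | O_NONBLOCK,
                     0600, &attr);
    return log_mq == (mqd_t)-1 ? -1 : 0;
}

/* returns 0 if queued, -1 if the line was dropped (queue full) */
int log_send(const char *line)
{
    return mq_send(log_mq, line, strnlen(line, LOG_MSG_MAX - 1) + 1, 0);
}
```

The dedicated syslogger process would then mq_receive() from the same queue and be the only process that ever calls syslog().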
Opinions? Do you consider it feasible?
On 28 Jul 2010, at 22:18, Iñaki Baz Castillo wrote:
2010/7/28 Olle E. Johansson oej@edvina.net:
With a multiprocess blocking design, such a solution would be a bit difficult. Perhaps the "ping syslog" mechanism I suggested in my initial mail :)
I thought it wasn't easy... But still, we need some sort of solution since this can affect a lot of systems.
Let me suggest this proposal:
- Have a dedicated kamailio process that acts as a syslogger.
- Other kamailio processes send logging messages to that syslogger
process using a POSIX message queue, which is non-blocking. This means that kamailio workers would never wait for the "syslog()" call to return.
- POSIX message queues implement a configurable queue. But also, the
syslogger process could manage its own queue (using an array as a FIFO) so that messages received from the POSIX queue are inserted into the FIFO and extracted from it when they are logged to syslog.
- This way, if the syslog daemon gets frozen, kamailio workers wouldn't
notice. Some logs would be lost if the FIFO gets full, but that's less important than Kamailio not processing SIP messages :)
Opinions? Do you consider it feasible?
Other people are better judges of the architecture, but I do agree with the results. We can accept syslog messages getting lost as long as Kamailio keeps the SIP handling processes running.
/O
On Wed, 28 Jul 2010 22:27:26 +0200, "Olle E. Johansson" oej@edvina.net wrote:
On 28 Jul 2010, at 22:18, Iñaki Baz Castillo wrote:
2010/7/28 Olle E. Johansson oej@edvina.net:
With a multiprocess blocking design, such a solution would be a bit difficult. Perhaps the "ping syslog" mechanism I suggested in my initial mail :)
I thought it wasn't easy... But still, we need some sort of solution since this can affect a lot of systems.
Let me suggest this proposal:
- Have a dedicated kamailio process that acts as a syslogger.
- Other kamailio processes send logging messages to that syslogger
process using a POSIX message queue, which is non-blocking. This means that kamailio workers would never wait for the "syslog()" call to return.
- POSIX message queues implement a configurable queue. But also, the
syslogger process could manage its own queue (using an array as a FIFO) so that messages received from the POSIX queue are inserted into the FIFO and extracted from it when they are logged to syslog.
- This way, if the syslog daemon gets frozen, kamailio workers wouldn't
notice. Some logs would be lost if the FIFO gets full, but that's less important than Kamailio not processing SIP messages :)
Opinions? Do you consider it feasible?
Other people are better judges of the architecture, but I do agree with the results. We can accept syslog messages getting lost as long as Kamailio keeps the SIP handling processes running.
I do not know, from the programmer's point of view, whether this approach is feasible. But kamailio is never the only process running on a system. Other processes such as cron, an MTA, and an ssh server (at least) run on the same machine. If syslog is blocked, you might get a frozen system anyway. Almost every service relies on syslog, which makes this solution incomplete.
I think that the solution to this problem is on syslog's side. It's a critical service in any system, and this solution would just mask the problem.
2010/7/28 Jon Bonilla manwe@aholab.ehu.es:
I do not know, from the programmer's point of view, whether this approach is feasible.
I've coded a very similar solution for a personal project, and it works really well in high-traffic scenarios, so I consider it feasible. I just wonder whether it's feasible from the current Kamailio design point of view.
But kamailio is never the only process running on a system. Other processes such as cron, an MTA, and an ssh server (at least) run on the same machine. If syslog is blocked, you might get a frozen system anyway. Almost every service relies on syslog, which makes this solution incomplete.
Sure, but anyway I don't care about ssh, an MTA (never running on my kamailio servers!) or cron (just to rotate logs, no more). If kamailio keeps working, that's what I need :)
I think that the solution to this problem is on syslog's side. It's a critical service in any system, and this solution would just mask the problem.
Don't think of my suggestion just from the "reliability" point of view; also think about "efficiency":
Currently a kamailio worker must wait for the syslog() call to return before continuing to process the SIP message (blocking design). Using a POSIX queue, the worker sends the log to the queue, which is not a blocking call. Only the kamailio syslogger process would have to deal with the syslog daemon.
Regards.
On Thu, 29 Jul 2010 00:08:54 +0200, Iñaki Baz Castillo ibc@aliax.net wrote:
But kamailio is never the only process running on a system. Other processes such as cron, an MTA, and an ssh server (at least) run on the same machine. If syslog is blocked, you might get a frozen system anyway. Almost every service relies on syslog, which makes this solution incomplete.
Sure, but anyway I don't care about ssh, an MTA (never running on my kamailio servers!) or cron (just to rotate logs, no more). If kamailio keeps working, that's what I need :)
What you're suggesting here is having queues in kamailio instead of having them in syslog. You're trying to fix a syslog problem from the kamailio side.
If syslog blocks because its message buffer is full, kamailio will do what syslog should do. This solution only works for kamailio, and you risk getting a frozen system anyway, with no guarantee of a working kamailio.
Example: you have a cron job every 5 minutes. This cron job cannot finish because of the unresponsive syslog. You'll get a new cron process every 5 minutes until no more processes can be spawned.
Don't think of my suggestion just from the "reliability" point of view; also think about "efficiency":
Ok. This is a different point of view. That "efficiency" word should now trigger an "Andrei respawn" :)
On Wednesday 28 July 2010, Iñaki Baz Castillo wrote:
With a multiprocess blocking design, such a solution would be a bit difficult. Perhaps the "ping syslog" mechanism I suggested in my initial mail :)
I thought it wasn't easy... But still, we need some sort of solution since this can affect a lot of systems.
Let me suggest this proposal: [..]
Hi Iñaki,
in fact kamailio already has such a solution, since 1.5.0:
- non-blocking logging
  * support for non-blocking log writes to prevent blocking of the server in certain error situations
  * can be enabled by setting the SYSLOG_ASYNC define in the Makefile.defs, the default is the old implementation
We implemented it some time ago because of a somewhat similar issue. There is maybe still one bug left (need to dig for the report..), but in general it should work fine. I'm not sure if we ported this to sr, but it should not be that hard.
Cheers,
Henning
2010/7/29 Henning Westerholt henning.westerholt@1und1.de:
Hi Iñaki,
in fact kamailio already has such a solution, since 1.5.0:
- non-blocking logging
* support for non-blocking log writes to prevent blocking of the server in certain error situations
* can be enabled by setting the SYSLOG_ASYNC define in the Makefile.defs, the default is the old implementation
We implemented it some time ago because of a somewhat similar issue. There is maybe still one bug left (need to dig for the report..), but in general it should work fine. I'm not sure if we ported this to sr, but it should not be that hard.
Oops, I didn't know about it! Do you use it in production?
PS: The problem for me is that changes to Makefile.defs are not picked up when doing "make deb" :(
On Thursday 29 July 2010, Iñaki Baz Castillo wrote:
in fact kamailio already has such a solution, since 1.5.0:
- non-blocking logging
- support for non-blocking log writes to prevent blocking of the server
in certain error situations
- can be enabled by setting the SYSLOG_ASYNC define in the
Makefile.defs, the default is the old implementation
We implemented it some time ago because of a somewhat similar issue. There is maybe still one bug left (need to dig for the report..), but in general it should work fine. I'm not sure if we ported this to sr, but it should not be that hard.
Oops, I didn't know about it! Do you use it in production?
Hi Iñaki,
we planned to use it during a period when we needed some extensive logging on balancer-class systems. But in the end we didn't need it, and we don't use it so far. But the implementation should be straightforward; it's a bit simpler than you suggested: it just throws away log lines if some internal buffer is full.
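For illustration only (this is not the actual SYSLOG_ASYNC code, just a sketch of the idea), the drop-on-full behaviour amounts to a fixed-size ring buffer:

```c
/* Sketch of drop-on-full buffering: producers push log lines into a
 * fixed-size ring buffer and lose them silently when it is full;
 * the async writer pops lines whenever syslog is responsive.
 * All names and sizes here are arbitrary. */
#include <string.h>

#define BUF_SLOTS 8
#define LINE_MAX_LEN 256

static char ring[BUF_SLOTS][LINE_MAX_LEN];
static int head, tail, count;

/* returns 0 on success, -1 if the line was dropped (buffer full) */
int buf_push(const char *line)
{
    if (count == BUF_SLOTS)
        return -1;                     /* full: throw the line away */
    strncpy(ring[head], line, LINE_MAX_LEN - 1);
    ring[head][LINE_MAX_LEN - 1] = '\0';
    head = (head + 1) % BUF_SLOTS;
    count++;
    return 0;
}

/* returns 0 and copies the oldest line into out, -1 if empty */
int buf_pop(char *out)
{
    if (count == 0)
        return -1;
    memcpy(out, ring[tail], LINE_MAX_LEN);
    tail = (tail + 1) % BUF_SLOTS;
    count--;
    return 0;
}
```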
Cheers,
Henning
2010/7/29 Henning Westerholt henning.westerholt@1und1.de:
Oops, I didn't know about it! Do you use it in production?
Hi Iñaki,
we planned to use it during a period when we needed some extensive logging on balancer-class systems. But in the end we didn't need it, and we don't use it so far. But the implementation should be straightforward; it's a bit simpler than you suggested: it just throws away log lines if some internal buffer is full.
Ok, thanks a lot.