Dear Kamailians!
I’m trying to figure out DMQ - the base protocol - and see how it reacts to server outages and downtime.
I’ve configured three nodes on different ports and run them locally. Each has the others as notification peers, plus one peer that is disabled. They are configured like this, with an explicit transport:
modparam("dmq", "notification_address", "sip:192.168.40.107:5060;transport=udp")
It quickly turns out that DMQ has every node twice - when sending out “notification_peer” messages, it lists all six.
KDMQ sip:notification_peer@192.168.40.107:5070 SIP/2.0
Via: SIP/2.0/UDP 192.168.40.107:5080;branch=z9hG4bK96bc.875f2460000000000000000000000000.0
To: sip:notification_peer@192.168.40.107:5070
From: sip:notification_peer@192.168.40.107:5080;tag=e0f3ecb7441aaf62d1734fc723545165-9deb0be2
CSeq: 10 KDMQ
Call-ID: 0024de987491cd3e-30304@192.168.40.107
Content-Length: 234
User-agent: Kamailio Server 02 DMQtest
Max-Forwards: 1
Content-Type: text/plain

sip:192.168.40.107:5080;status=active
sip:192.168.40.107:5060;status=active
sip:192.168.40.107:5090;status=active
sip:192.168.40.107:5090;status=active
sip:192.168.40.107:5060;status=active
sip:192.168.40.107:5080;status=active
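For completeness, these KDMQ requests are handed over to the module in the routing script - a minimal sketch, following the pattern from the dmq module docs:

request_route {
    # let the dmq module process bus traffic;
    # dmq_handle_message() stops further processing of the request
    if (is_method("KDMQ")) {
        dmq_handle_message();
    }
    # ... normal routing continues below
}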
When I shut down one node, it sends out a “disabled” notification, but the other nodes respond that the duplicate entry on the same IP/port is still active.
Looking into dmq with kamctl, I see that three entries have transport “udp” and three have transport “*” (asterisk).
I don’t see transport being given as a parameter in the notification_peer messages.
It seems to me like there’s a bug in here somewhere. Or is this the way it’s supposed to work?
/O
On 30 Dec 2022, at 14:26, Olle E. Johansson oej@edvina.net wrote:
<snip>
modparam("dmq", "notification_address", "sip:192.168.40.107:5060;transport=udp”)
It seems like “transport=udp” is not supported. Only “transport=tls” is documented.
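For comparison, the documented TLS form would be something like this (the port is just an example):

modparam("dmq", "notification_address", "sip:192.168.40.107:5061;transport=tls")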
/O
After removing “transport=udp” from the config it looks better, but some nodes are still duplicated in the messages going back and forth. In memory, as checked with dmq.list_nodes, only the proper nodes are present, without any duplication.
/O
On 30 Dec 2022, at 15:11, Olle E. Johansson oej@edvina.net wrote:
On 30 Dec 2022, at 14:26, Olle E. Johansson oej@edvina.net wrote:
<snip>
On Fri, 2022-12-30 at 14:26 +0100, Olle E. Johansson wrote:
<snip>
Same here. The best I could figure is “it works when it works, until it doesn’t”. In our tests, the edge cases got weird.
Anyone who wants to explain how DMQ works in failure scenarios - there are two of us who are interested!
On 30 Dec 2022, at 15:14, Nathan Angelacos nangel@tetrasec.net wrote:
<snip>
Same here. The best I could figure is “it works when it works, until it doesn’t”. In our tests, the edge cases got weird.
Anyone who wants to explain how DMQ works in failure scenarios - there are two of us who are interested!
There’s a core protocol that keeps each node updated about the status of the other nodes - active, pending, disabled, timeout - according to the source code. I haven’t seen timeout in my tests.
On top of that there are other implementations using it - htable and usrloc are two examples - and each app has implemented its own protocol on top of the DMQ bus.
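For example, per-app replication is switched on roughly like this - a sketch going by the htable and dmq_usrloc module docs, so double-check the parameter names:

# htable content replication over the DMQ bus
modparam("htable", "enable_dmq", 1)

# usrloc replication goes through the separate dmq_usrloc module
loadmodule "dmq_usrloc.so"
modparam("dmq_usrloc", "enable", 1)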
I’m trying to focus on the core protocol at this point.
/O