Hello,
I pushed two patches to prevent the crash, even when the module is not used
as expected in the config.
Charles: can you check whether both make sense? The one in the
worker_loop() function is to prevent the crash:
*
This should be backported if all goes fine with it.
The second one, in empty_peer_callback(), is to generate a 202 Accepted
response; otherwise the sender will retransmit in such cases:
*
But maybe not sending a response was on purpose (i.e., to allow
sending the response from config); in that case it can be reverted.
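To illustrate the idea behind the two changes, here is a minimal,
self-contained C sketch. It is not the actual Kamailio code: the types
str_t and peer_response_t and the functions empty_callback_no_reply(),
empty_callback_accepted(), reply_is_sendable() and process_job() are
hypothetical stand-ins, assumed only for this example. It models a worker
that skips the reply when the callback left the response empty, and a
callback that fills in 202 Accepted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counted string (pointer + length); a hypothetical stand-in type. */
typedef struct {
    const char *s;
    int len;
} str_t;

/* Hypothetical model of the response a peer callback fills in. */
typedef struct {
    int resp_code;
    str_t reason;
} peer_response_t;

/* Hypothetical callback signature: returns < 0 on error. */
typedef int (*peer_callback_f)(peer_response_t *resp);

/* Models a callback that sets nothing: code stays 0, reason stays empty. */
static int empty_callback_no_reply(peer_response_t *resp)
{
    (void)resp;
    return 0;
}

/* Models a callback that asks for a 202 Accepted reply. */
static int empty_callback_accepted(peer_response_t *resp)
{
    assert(resp != NULL);
    resp->resp_code = 202;
    resp->reason.s = "Accepted";
    resp->reason.len = (int)strlen(resp->reason.s);
    return 0;
}

/* Guard: only a positive code with a non-empty reason can be sent.
 * Without it, code that reads reason.s[reason.len - 1] would
 * dereference a NULL pointer at index -1 when the reason is empty. */
static int reply_is_sendable(const peer_response_t *resp)
{
    return resp != NULL && resp->resp_code > 0
           && resp->reason.s != NULL && resp->reason.len > 0;
}

/* One worker step: run the callback, then reply only if it is safe.
 * Returns the reply code that would be sent, 0 if the reply is
 * skipped, or -1 if the callback failed. */
static int process_job(peer_callback_f cb)
{
    peer_response_t resp;

    memset(&resp, 0, sizeof(resp));
    if (cb(&resp) < 0) {
        return -1;
    }
    if (!reply_is_sendable(&resp)) {
        return 0; /* nothing valid to send; do not call the reply function */
    }
    /* a real worker would call its stateless reply function here */
    return resp.resp_code;
}
```

In this model, process_job(empty_callback_no_reply) skips the reply instead
of crashing, and process_job(empty_callback_accepted) yields a 202.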
Cheers,
Daniel
On 24.04.20 20:57, Charles Chance wrote:
Hi,
Did you try the config snippet I provided?
Basically dmq_handle_message() must be called if the message is not
your own, otherwise the node discovery/health check will not work and
you will see nodes disappearing as you described.
Here it is again:
    if(is_method("KDMQ")){
        if($rU =~ "userOnline"){
            // user came online in cluster, resume transactions if any suspended
            $avp(remoteUser) = $rb;
        } else {
            dmq_handle_message();
        }
    }
Notice that we check for your own/custom message first, then call
handle message if not matched.
Let me know if it works.
Cheers,
Charles
On Fri, 24 Apr 2020 at 19:52, SamyGo <govoiper(a)gmail.com> wrote:
Yes,
I read all of his replies about DMQ and DMQ USRLOC from the past 3+
years; only one matched this exact description, and it has no
resolution.
The open and closed GitHub issues for DMQ didn't have anything similar
either. Could it be something I'm doing wrong?
Additional info: one of the servers is directly on a public IP and the
other one is behind NAT. Another test setup where it is consistently
reproducible has two servers behind NAT (AWS).
Here are the mod params. Only usrloc sync is done via DMQ and no
other module is using DMQ.
listen=udp:LocalIP:5060 advertise PublicIP:5060
modparam("dmq","server_address", DMQ_LOCAL_SERVER)
modparam("dmq", "notification_address", DMQ_REMOTE_SERVER)
modparam("dmq", "multi_notify", 0) //1 for DNS SRV
modparam("dmq", "num_workers", 10)
modparam("dmq", "ping_interval", 60)
modparam("dmq_usrloc", "enable", 1)
modparam("dmq_usrloc", "sync", 1)
modparam("dmq_usrloc", "batch_size", 4000)
modparam("dmq_usrloc", "batch_usleep", 1000)
modparam("dmq_usrloc", "usrloc_domain", "location")
Where: DMQ_REMOTE_SERVER = sip:PublicIP2:5060
GDB info as requested:
Core was generated by `/usr/local/sbin/kamailio -w /tmp/kamailio -P /var/run/kamailio/kamailio.pid -f'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f248c4cef15 in send_reply (msg=0x7f2469f88d40, code=0, reason=0x7ffd775e3ab8) at sl.c:276
276 if(reason->s[reason->len-1]=='\0') {
(gdb) frame 0
#0 0x00007f248c4cef15 in send_reply (msg=0x7f2469f88d40, code=0, reason=0x7ffd775e3ab8) at sl.c:276
276 if(reason->s[reason->len-1]=='\0') {
(gdb) p *reason
$1 = {s = 0x0, len = 0}
(gdb) frame 1
#1 0x00007f24656c6549 in worker_loop (id=2) at worker.c:129
129 if(slb.freply(current_job->msg, peer_response.resp_code,
(gdb) p *worker
$3 = {queue = 0x7f2469f240a8, jobs_processed = 5, lock = {val = 2}, pid = 935}
(gdb) p *current_job
$6 = {f = 0x7f24656d6d8d <empty_peer_callback>, msg = 0x7f2469f88d40, orig_peer = 0x7f2469f6ed50, next = 0x0, prev = 0x0}
On Fri, Apr 24, 2020 at 1:30 PM Daniel-Constantin Mierla
<miconda(a)gmail.com> wrote:
Hello,
have you tried the suggestion from Charles in the other
response? It can help figuring out where the problem resides.
Now, from the C point of view, I would need the following output
from gdb for the core file:
frame 0
p *reason
frame 1
p *worker
p *current_job
I would also need to know the modparams for dmq and the other
dmq_* modules, plus the list of modules for which you enabled
dmq (e.g., htable, dialog, presence, ...).
Cheers,
Daniel
On 24.04.20 18:10, SamyGo wrote:
Oops, apologies, I missed that:
version: kamailio 5.3.3 (x86_64/linux) 44ccb9-dirty
flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS,
DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC,
Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX,
FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER,
USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144,
MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: 44ccb9 -dirty
compiled on 17:04:55 Apr 17 2020 with gcc 4.9.2
I tried this with versions 5.0, 5.2, and now 5.3, with the same result.
Thank you for looking into this,
Sammy
On Fri, Apr 24, 2020 at 2:33 AM Daniel-Constantin Mierla
<miconda(a)gmail.com> wrote:
Hello,
you have to provide the version of kamailio for each
reported kamailio issue, otherwise it is hard to match it with
the source code. Use 'kamailio -v' to get version details.
Cheers,
Daniel
On 23.04.20 23:36, SamyGo wrote:
Hi,
Is there a way to broadcast a KDMQ message to the cluster without
expecting a reply? As far as I can tell from the source
code, dmq_bcast_message works exactly like dmq_send_message
in that it expects a callback to be executed on
response, i.e. it expects a reply.
So, the situation I'm facing is that I'm broadcasting a message
to the cluster and I do not want a reply back. Both options I
have tried cause problems; the second ends in a crash and core dump.
1 - If my script doesn't respond, i.e. doesn't call
dmq_handle_message, the destination servers get marked as
"inactive" and the usrloc sync process stops, which
isn't desirable.
2 - If I respond with dmq_handle_message, it
crashes the Kamailio instance that just received the
broadcast message.
Here is how it's done in the script:
*broadcasting message to cluster:*
dmq_bcast_message("userOnline", "$fu", "text/plain");
*Receiving and handling a broadcast message:*
route[DMQ_HANDLE] {
    if(!(is_method("KDMQ") || $rm == "KDMQ")) return;
    if(is_method("KDMQ") || $rm == "KDMQ"){
        if($rU =~ "userOnline"){
            // user came online in cluster, resume transactions if any suspended
            $avp(remoteUser) = $rb;
        }
        dmq_handle_message();
        exit;
    }
}
*Related log lines:*
Apr 23 21:15:48 kamailio[916]: ALERT: <script>: [da2c1-2f499] ------ DMQ_HANDLE: UserOnline Event Received ------
Apr 23 21:15:48 kamailio[916]: DEBUG: dmq [message.c:53]: ki_dmq_handle_message_rc(): dmq_handle_message [KDMQ sip:userOnline@9.8.7.123:5060]
Apr 23 21:15:48 kamailio[916]: DEBUG: dmq [message.c:66]: ki_dmq_handle_message_rc(): dmq_handle_message peer found: userOnline
Apr 23 21:15:48 kamailio[916]: DEBUG: <core> [core/receive.c:437]: receive_msg(): request-route executed in: 401461 usec
Apr 23 21:15:48 kamailio[935]: DEBUG: dmq [worker.c:87]: worker_loop(): dmq_worker [2 935] lock acquired
and crash/segfault..
Core dump:
https://pastebin.com/S7ekCPfF
Any help or pointers to solve this would be really
appreciated.
Best Regards,
Sammy
_______________________________________________
Kamailio (SER) - Users Mailing List
sr-users(a)lists.kamailio.org <mailto:sr-users@lists.kamailio.org>
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
--
Daniel-Constantin Mierla --
www.asipto.com <http://www.asipto.com>
www.twitter.com/miconda <http://www.twitter.com/miconda> --
www.linkedin.com/in/miconda <http://www.linkedin.com/in/miconda>
--
*Charles Chance*
Managing Director
t. 0330 120 1200 m. 07932 063 891
Sipcentric Ltd. Company registered in England & Wales no.
7365592. Registered office: Faraday Wharf, Innovation Birmingham
Campus, Holt Street, Birmingham Science Park, Birmingham B7 4BB.