I'm observing the following scenario:
* mod_dialog callbacks fire two or more times (nearly) simultaneously for the same dialog
* pua_dialoginfo sends PUBLISH 1, referencing etag A
* pua_dialoginfo sends PUBLISH 2, also referencing etag A
* presence_dialoginfo processes PUBLISH 1 and replies with a new etag B
* presence_dialoginfo processes PUBLISH 2 and replies with a 412, because etag A no longer exists
* pua_dialoginfo receives the 412 and retries the request as PUBLISH 3 ("sent a PUBLISH within a dialog that no longer exists, send again an intial PUBLISH")
* presence_dialoginfo processes PUBLISH 3 and may or may not accept it


As described, this situation is not ideal because it fills the logs with errors, but it isn't critical per se. It becomes much more problematic when more than two PUBLISHes are generated for the same dialog simultaneously, as this can cause a (near) infinite race between the PUBLISH requests, all fighting to update the same etag. For example, 10 PUBLISHes are sent out for etag A and all but one are rejected with a 412; the other 9 then keep bouncing back and forth between pua_dialoginfo and presence_dialoginfo, because the two sides do not share the same view of the dialog's latest etag.


Even worse is the case where presence_dialoginfo rejects *all* incoming PUBLISHes with a 412, for example because of a database/memory/replication problem or a malformed request. A `t_reply("412", "Not today")` in the presence_dialoginfo server, combined with a single PUBLISH from pua_dialoginfo, is enough to reproducibly brick the pua_dialoginfo server: since every 412 is retried again without limit, it ends up with critical levels of memory fragmentation.

I think there are multiple ways to fix or alleviate this problem.
## pua generic
* pua (`publ_cback_func`) should not retry PUBLISHes that failed with a 412 indefinitely, but, for example, at most once.
* pua should not generate simultaneous PUBLISHes for the same presentity. It should either delay PUBLISH 2 until PUBLISH 1 has been (permanently) accepted or rejected, or discard PUBLISH 2 immediately when it is generated.
* Perhaps make the handling of 412 replies more fine-grained. Currently every 412 reply is treated as "sent a PUBLISH within a dialog that no longer exists", although that explanation does not apply to every possible 412 reply.
## pua_dialoginfo specific
* pua_dialoginfo currently subscribes to a lot of mod_dialog callbacks. For example, subscribing to both `DLGCB_CONFIRMED` and `DLGCB_CONFIRMED_NA` always gets you two PUBLISHes in rapid succession with exactly the same contents. Subscribing to `DLGCB_REQ_WITHIN` means you get a new PUBLISH for every re-INVITE, such as hold/unhold or codec negotiation, which is useless in many use cases. It would be helpful to allow configuring pua_dialoginfo with the list of callbacks to subscribe to.
* With or without a smaller set of mod_dialog callbacks, pua_dialoginfo can generate multiple PUBLISH requests with exactly the same contents. Since pua knows the (last) state it published for the presentity, it could check whether a newly generated PUBLISH differs from that last known state, and discard it if it doesn't.

https://github.com/kamailio/kamailio/issues/2048