Hi Bogdan, all,
I've made more digging and I need your opinion on my findings.
Pre-condition: run an AS in forked mode with 2 children.
1. AS sends a request (RQ).
2. AS receives a "408 Request Timeout" reply (RPL) for RQ.
3. After some time (about 30 seconds) AS receives another "408 Request Timeout",
initiated at another network node.
4. At this point, you want to send another request from AS.
Result: Very often, the replies for the last request cannot be matched with its
transaction in "tm" module, so
request retransmissions occur.
After a few days I've found that in "t_reply_matching" function from
"t_lookup.c" file, just before returning
(at the end of the function, not in the case when the reply gets matched) T is set to zero
(set_t(0)).
So, at step 3, the reply cannot be matched with an existing transaction and T is set to
zero in a child process.
If a reply of the request sent at step 4 gets handled by exactly the child process that
last set T to zero, the
transaction will not get matched (see t_check() and reply_received() functions), so the
retransmission
timer will hit => request retransmissions!!!
I think the problem is that T is set to zero in t_reply_matching() when a reply cannot be
matched.
I've changed that to set_t(T_UNDEFINED) and everything seems to work just fine.
What do you say Bogdan? Is this change safe? Is this the fix for my problem?
Thanks,
Sica.
----- Original Message ----
From: Bogdan-Andrei Iancu <bogdan(a)voice-system.ro>
To: Vasile Zaharia <sicavz(a)yahoo.com>
Cc: users(a)openser.org
Sent: Tuesday, October 30, 2007 9:49:38 AM
Subject: Re: [OpenSER-Users] retransmissions after final response
Salut Sica,
So you say that openser does retransmission even after receiving the
200 OK for the request? are you sure it is about the same transaction?
Could you provide the debug log showing also the retransmission events?
(just send me the entire log).
Salutari,
Bogdan
Vasile Zaharia wrote:
Salut Bogdan,
Thank you for your response.
I think the reply is correct, because in the log file I can see:
2007-10-15 06:28:48.545|t_lookup.c:824|t_reply_matching|DEBUG:
t_reply_matching:
reply matched (T=0xb5e98bd8)!\n
2007-10-15 06:28:48.545|t_lookup.c:924|t_check|DEBUG:
t_check:
end=0xb5e98bd8\n
2007-10-15
06:28:48.546|t_hooks.c:203|run_trans_callbacks|DBG:
trans=0xb5e98bd8, callback type
256, id 0 entered\n
2007-10-15
06:28:52.754|timer.c:434|wait_handler|DEBUG: wait_handler
: removing 0xb5e98bd8
from table \n
2007-10-15
06:28:52.754|timer.c:714|remove_timer_unsafe|DEBUG:
unlinking timer: tl=0xb5e98d54,
timeout=364, group=0\n
2007-10-15 06:28:52.753|timer.c:239|delete_cell|DEBUG:
delete
transaction 0xb5e98bd8\n
2007-10-15
06:28:52.753|timer.c:443|wait_handler|DEBUG: wait_handler
: done\n
2007-10-15 06:29:15.378|t_lookup.c:824|t_reply_matching|DEBUG:
t_reply_matching:
reply matched (T=0xb5e98bd8)!\n
2007-10-15 06:29:15.378|t_lookup.c:924|t_check|DEBUG:
t_check:
end=0xb5e98bd8\n
2007-10-15
06:29:15.378|t_hooks.c:203|run_trans_callbacks|DBG:
trans=0xb5e98bd8, callback type
256, id 0 entered\n
2007-10-15
06:29:19.871|timer.c:434|wait_handler|DEBUG: wait_handler
: removing 0xb5e98bd8
from table \n
2007-10-15
06:29:19.871|timer.c:714|remove_timer_unsafe|DEBUG:
unlinking timer: tl=0xb5e98d54,
timeout=391, group=0\n
2007-10-15 06:29:19.871|timer.c:239|delete_cell|DEBUG:
delete
transaction 0xb5e98bd8\n
2007-10-15
06:29:19.871|timer.c:443|wait_handler|DEBUG: wait_handler
: done\n
2007-10-15 06:29:21.853|t_lookup.c:824|t_reply_matching|DEBUG:
t_reply_matching:
reply matched (T=0xb5e98bd8)!\n [22569]
2007-10-15 06:29:21.853|t_lookup.c:924|t_check|DEBUG:
t_check:
end=0xb5e98bd8\n [22569]
2007-10-15
06:29:25.894|timer.c:434|wait_handler|DEBUG: wait_handler
: removing 0xb5e98bd8
from table \n
2007-10-15
06:29:25.894|timer.c:714|remove_timer_unsafe|DEBUG:
unlinking timer: tl=0xb5e98d54,
timeout=397, group=0\n
2007-10-15 06:29:25.894|timer.c:239|delete_cell|DEBUG:
delete
transaction 0xb5e98bd8\n
2007-10-15
06:29:25.894|timer.c:443|wait_handler|DEBUG: wait_handler
: done\n
2007-10-15 06:29:50.049|t_lookup.c:824|t_reply_matching|DEBUG:
t_reply_matching:
reply matched (T=0xb5e98bd8)!\n
2007-10-15 06:29:50.049|t_lookup.c:924|t_check|DEBUG:
t_check:
end=0xb5e98bd8\n
2007-10-15
06:29:50.051|t_hooks.c:203|run_trans_callbacks|DBG:
trans=0xb5e98bd8, callback type
256, id 0 entered\n
2007-10-15
06:29:55.014|timer.c:434|wait_handler|DEBUG: wait_handler
: removing 0xb5e98bd8
from table \n
2007-10-15
06:29:55.014|timer.c:714|remove_timer_unsafe|DEBUG:
unlinking timer: tl=0xb5e98d54,
timeout=425, group=0\n
2007-10-15 06:29:55.014|timer.c:239|delete_cell|DEBUG:
delete
transaction 0xb5e98bd8\n
2007-10-15
06:29:55.014|timer.c:443|wait_handler|DEBUG: wait_handler
: done\n
2007-10-15 06:31:45.071|t_lookup.c:824|t_reply_matching|DEBUG:
t_reply_matching:
reply matched (T=0xb5e98bd8)!\n
2007-10-15 06:31:45.071|t_lookup.c:924|t_check|DEBUG:
t_check:
end=0xb5e98bd8\n
2007-10-15
06:31:45.071|t_hooks.c:203|run_trans_callbacks|DBG:
trans=0xb5e98bd8, callback type
256, id 0 entered\n
2007-10-15
06:31:49.834|timer.c:434|wait_handler|DEBUG: wait_handler
: removing 0xb5e98bd8
from table \n
2007-10-15
06:31:49.834|timer.c:714|remove_timer_unsafe|DEBUG:
unlinking timer: tl=0xb5e98d54,
timeout=540, group=0\n
2007-10-15 06:31:49.834|timer.c:239|delete_cell|DEBUG:
delete
transaction 0xb5e98bd8\n
2007-10-15
06:31:49.834|timer.c:443|wait_handler|DEBUG: wait_handler
: done\n
Maybe I don't understand it correctly? It's about T=0xb5e98bd8. It
gets
matched, removed, deleted. Many times.
The situation was reproduced using 2 different clients: sipp and a
proprietary
client (using two opensource libraries: osip and exosip).
What do you think?
Thanks,
Sica.
----- Original Message ----
From: Bogdan-Andrei Iancu <bogdan(a)voice-system.ro>
To: Vasile Zaharia <sicavz(a)yahoo.com>
Cc: users(a)openser.org
Sent: Tuesday, October 30, 2007 6:35:10 AM
Subject: Re: [OpenSER-Users] retransmissions after final response
Salut Sica,
The problem may reside in two places:
1) the reply is not correct and the client is not able to match
it
to the sent request (and keeps retransmitting)
2) the client is not able to match (due whatever other bugs) the
reply.
Salutari,
Bogdan
Vasile Zaharia wrote:
> Hi all,
>
>
> I'm developing a Presence AS OpenSER module (derived from the
OpenSER
Presence module)
> and, from time to time, it happens the next situation:
> - AS sends a NOTIFY message
> - the client receives the NOTIFY and responds with 200 OK
> - AS receives 200 OK response, but it starts to retransmit the
original NOTIFY and it
> retransmits it until it gets back a 408 Request timeout.
>
> Please note that I use "tm" module for sending SIP messages, so I
don't
> handle retransmissions explicitly in my module.
> Also, note that it does not happen all the time, but from time to
time.
> Does anyone else met a similar situation?
>
>
> Thanks,
> Sica.
>
>
>
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam? Yahoo! Mail has the best spam protection around
>
http://mail.yahoo.com
>
> _______________________________________________
> Users mailing list
> Users(a)openser.org
>
http://openser.org/cgi-bin/mailman/listinfo/users
>
>
__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com
__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com