Hi Klaus, responses inline... (command and file names are from memory, so don't pay much attention to the spelling)
Samuel.
Klaus Darilion klaus.mailinglists@pernau.at 06/07/05 01:43PM >>>
Hi all!
Using ser with ENUM for some time has revealed several problems which I would like to discuss with you. Be aware - this email is long!
ENUM is a wonderful thing for call routing. Nevertheless, as it is DNS based, there are some important things an ENUM-aware application has to consider:
- You never know how long the lookup takes
- You never know if the lookup will fail or succeed
- If the lookup was successful, you may never trust the result
I will explain these points in detail now:
1+2: Using enum, the application is giving control to the DNS resolver of the OS and the DNS infrastructure. Thus, the ser thread which performs the ENUM lookup will be blocked until there is a result from the system's DNS resolver.
If DNS is slow, or misconfigured (e.g. a zone is delegated to a nameserver which is down), the thread will be blocked for several seconds. E.g. if you use debian woody and 2 nameservers in /etc/resolv.conf, the timeout is 20 seconds. If you are lucky, the OS allows configuration of the DNS timeouts. Nevertheless, you have to consider that a ser thread will be blocked up to 20 seconds. This has impacts on your configuration:
I don't know the details, but would it really be difficult to use an asynchronous resolver, as the reSIProcate SIP stack does with ARES? Besides exec_* calls, SER's main performance bottleneck is the DNS resolving step, so adding asynchronous DNS queries would be a great improvement.
Typically, you use some kind of the following logic:
    if (uri =~ "+[0-9].") {
        if (enum_lookup()) {
            t_relay();
            break;
        } else {
            forward to PSTN gateway;
            break;
        }
    }
Thus, the INVITE will be received and the ENUM lookup will be performed. If the lookup takes longer than 0.5s, the SIP client will start retransmitting the INVITE. Thus, another thread will process this INVITE and another ENUM lookup will be performed. After several seconds, all of ser's threads will be blocked in ENUM lookups and your SIP proxy will not handle any requests until the DNS query times out. Thus, it is very easy to mount a DoS attack against the proxy. Another funny thing is that the SIP client will detect a proxy error and hang up, but the INVITEs are still being processed in the SIP proxy and are forwarded to the PSTN gateway after the timeout.
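To put a number on the retransmission problem described above, here is a rough sketch. It assumes the RFC 3261 defaults for INVITE over UDP (T1 = 0.5s, Timer A doubling after every retransmission, Timer B = 64*T1) and assumes that every copy of the INVITE ties up one ser process until the DNS query returns. The function names are made up for this illustration.

```python
import math

T1 = 0.5  # RFC 3261 default RTT estimate in seconds (assumed, not taken from the thread)


def invite_copies(blocked_for: float, t1: float = T1) -> int:
    """Count INVITE copies (original plus UDP retransmissions) a client sends
    while the proxy stays silent for `blocked_for` seconds."""
    timer_b = 64 * t1              # the client transaction gives up here
    deadline = min(blocked_for, timer_b)
    copies, elapsed, interval = 1, 0.0, t1
    while True:
        elapsed += interval
        if elapsed >= deadline:
            return copies
        copies += 1
        interval *= 2              # Timer A doubles after every retransmission


def calls_to_stall(children: int, blocked_for: float) -> int:
    """Concurrent calls needed to occupy every child process, assuming each
    INVITE copy blocks one child for the whole DNS timeout."""
    return math.ceil(children / invite_copies(blocked_for))


if __name__ == "__main__":
    print(invite_copies(20.0))       # copies sent during a 20s DNS timeout
    print(calls_to_stall(8, 20.0))   # calls needed to block 8 children
```

Under these assumptions a single call to an unresolvable number produces six INVITE copies within the 20s timeout, so two such calls are enough to occupy a proxy running with 8 children.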
A solution to stop the retransmissions is to immediately sl_send_reply("100","Trying"). But this raises another problem. Now, if the caller hangs up before the DNS timeout, the SIP client will send CANCEL (as it received 100) to the SIP proxy. But the SIP proxy cannot cancel the transaction, as it has not been created yet - the INVITE thread is still waiting for the ENUM lookup, and the transaction will only be created after the ENUM lookup (after the 20s timeout). Thus, we still end up with an INVITE forwarded to the PSTN gateway although the SIP client has already hung up.
I thought of using t_newtran and t_forward_nonack_uri instead of t_relay to create the transaction before doing the ENUM lookup. Thus, the thread which processes the CANCEL should find a transaction and stop it. But will this really prevent the INVITE from being sent to the PSTN gateway once the DNS times out? (not tested)
I think the best way would be to add a reply(100) AND t_forward_nonack, something like:
    sl_send_reply("100","Trying");
    t_newtran();
    if (uri =~ "+[0-9].") {
        if (enum_lookup()) {
            t_forward_nonack();
            break;
        } else {
            t_forward_nonack();
            break;
        }
    }
But it adds a complexity level which will lead to many non-working config files......but I vote for it!
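The CANCEL race discussed above can be modelled with a toy transaction table. This is only a sketch of the idea, not of ser's tm module: the class, its methods and the "cancelled" check after the lookup are all hypothetical, and whether ser really re-checks the transaction state once the DNS query returns is exactly the untested question.

```python
class ToyProxy:
    """Toy model: a CANCEL only takes effect if a transaction already exists."""

    def __init__(self):
        self.transactions = {}   # branch id -> "proceeding" or "cancelled"
        self.forwarded = []      # (branch, target) pairs that were sent out

    def invite(self, branch, create_transaction_first):
        """Start processing an INVITE and return a callable that stands for
        the code running after the blocking ENUM lookup."""
        if create_transaction_first:
            # corresponds to calling t_newtran() before enum_lookup()
            self.transactions[branch] = "proceeding"

        def after_lookup(target):
            # Assumption: the proxy re-checks the state before forwarding.
            state = self.transactions.setdefault(branch, "proceeding")
            if state == "cancelled":
                return False
            self.forwarded.append((branch, target))
            return True

        return after_lookup

    def cancel(self, branch):
        """Return True if the CANCEL matched a transaction."""
        if branch in self.transactions:
            self.transactions[branch] = "cancelled"
            return True
        return False
```

With create_transaction_first=False the CANCEL finds nothing and the INVITE is still forwarded after the lookup. With True the CANCEL matches, and forwarding is suppressed as long as the state is checked again after the lookup.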
btw: this problem does not only occur for ENUM lookups, but for all DNS lookups (SRV, A, AAAA).
3: If the ENUM lookup succeeds, you may never trust the result. It may be an invalid SIP URI, or a tel: URI, or anything else a user puts into their NAPTRs. This may result in a failed transaction or, as revealed at the ENUM plugtest, in failed accounting. Even worse, maybe it is possible to completely crash ser using really badly formatted URIs?
Thus you can't avoid doing some checks on the URI received from the ENUM lookup. Performance concerns are not a valid argument! Once I give control to external services (DNS, radius, exec), the performance penalty of parsing the SIP URI is much smaller than that of the ENUM lookup.
In case of ser, I would do the URI parsing in the ENUM module, or maybe generate a dedicated function/module for checking SIP URIs inside the routing logic. Thus, I can also check the result of exec calls.
Since URI check is a really global action, it should be implemented in SER's core, providing commands to the config file (something like uri_check(To|From|Req-URI|Contact) ) and functions to be called from any module.
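As an illustration of what such a check could look like, here is a small sketch. It is deliberately conservative and is not the full RFC 3261 grammar: it accepts only sip:/sips: URIs with a host name (no IP literals), an optional user part, port and parameters, and the 256-character limit is an arbitrary assumption. The function name is made up and is not an existing ser command.

```python
import re

MAX_URI_LEN = 256  # arbitrary cap chosen for this sketch

_SIP_URI = re.compile(
    r"sips?:"
    r"(?:[A-Za-z0-9\-_.!~*'()&=+$,;?/%]+@)?"               # optional user part
    r"(?:[A-Za-z0-9](?:[A-Za-z0-9\-]*[A-Za-z0-9])?\.)*"    # leading host labels
    r"[A-Za-z](?:[A-Za-z0-9\-]*[A-Za-z0-9])?"              # top-level label
    r"(?::[0-9]{1,5})?"                                    # optional port
    r"(?:;[A-Za-z0-9\-_.!~*'()%=+$&:/\[\]]+)*"             # optional parameters
)


def is_plausible_sip_uri(uri: str) -> bool:
    """Return True if `uri` looks like a well-formed sip:/sips: URI.

    Anything else (tel: URIs, embedded whitespace or CRLF, empty host)
    is rejected so that it never reaches the routing logic."""
    if len(uri) > MAX_URI_LEN:
        return False
    return _SIP_URI.fullmatch(uri) is not None
```

The same function could be applied to the output of exec calls before it is used as a request URI.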
I'm eagerly waiting for your opinions.
regards, Klaus
_______________________________________________ Serdev mailing list serdev@lists.iptel.org http://lists.iptel.org/mailman/listinfo/serdev
On Jun 07, 2005 at 14:56, Samuel Osorio Calvo samuel.osorio@nl.thalesgroup.com wrote:
If DNS is slow, or misconfigured (e.g. a zone is delegated to a nameserver which is down), the thread will be blocked for several seconds. E.g. if you use debian woody and 2 nameservers in /etc/resolv.conf, the timeout is 20 seconds. If you are lucky, the OS allows configuration of the DNS timeouts. Nevertheless, you have to consider that a ser thread will be blocked up to 20 seconds. This has impacts on your configuration:
I don't know the details, but would it really be difficult to use an asynchronous resolver, as the reSIProcate SIP stack does with ARES? Besides exec_* calls, SER's main performance bottleneck is the DNS resolving step, so adding asynchronous DNS queries would be a great improvement.
Using asynchronous dns would work as long as you have memory to save the state of the pending dns requests. It could be easily attacked in the same way (lots of DNS requests that will take a long time to resolve => out of memory => no more messages processed). Besides, using it would mean saving the complete state of the message and of the ser processing of the message at the moment the dns request was made. For example, if you make a dns request in module foo, function bar(), you should be able to continue from exactly the same point in exactly the same state when you receive the reply. This would mean something equivalent to saving the whole call trace (the whole stack for that matter) and a lot of global variables. The amount of complexity involved in converting ser to such a model (where such detailed state is saved that processing can be resumed at a later time) would be huge. I don't think this would be doable in finite time :-)
As an alternative one could fork threads (which would save all the information involved except all the global vars.), or new processes (which would save everything). However in this case we would deal with the forking overhead. This can be attacked too (turning ser into a fork bomb). I think it's much better to start ser with lots of children processes (let's say 500, or the maximum acceptable for your machine configuration).
So, I don't think async. dns would be a solution.
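The memory concern raised above (every pending request keeps state around) is usually handled by capping the number of lookups in flight and giving each one a deadline. The following is only a sketch of that idea using Python's asyncio with an injected resolver coroutine; it says nothing about the much harder problem of suspending and resuming ser's script processing, and all names in it are made up.

```python
import asyncio


class BoundedResolver:
    """Wrap an async resolver so that pending state cannot grow without limit."""

    def __init__(self, resolve, max_pending, timeout):
        self._resolve = resolve          # coroutine function: name -> result
        self._max_pending = max_pending  # hard cap on lookups in flight
        self._timeout = timeout          # per-lookup deadline in seconds
        self._pending = 0

    async def lookup(self, name):
        """Return the resolver's result, or None if the lookup was shed
        (too many pending) or did not finish within the deadline."""
        if self._pending >= self._max_pending:
            return None                  # shed load instead of storing more state
        self._pending += 1
        try:
            return await asyncio.wait_for(self._resolve(name), self._timeout)
        except asyncio.TimeoutError:
            return None
        finally:
            self._pending -= 1
```

A caller that gets None back would treat it like a failed ENUM lookup, for example by routing to the PSTN gateway or rejecting the request.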
[...]
Andrei
Andrei Pelinescu-Onciul wrote:
On Jun 07, 2005 at 14:56, Samuel Osorio Calvo samuel.osorio@nl.thalesgroup.com wrote:
If DNS is slow, or misconfigured (e.g. a zone is delegated to a nameserver which is down), the thread will be blocked for several seconds. E.g. if you use debian woody and 2 nameservers in /etc/resolv.conf, the timeout is 20 seconds. If you are lucky, the OS allows configuration of the DNS timeouts. Nevertheless, you have to consider that a ser thread will be blocked up to 20 seconds. This has impacts on your configuration:
...
I think it's much better to start ser with lots of children processes (let's say 500, or the maximum acceptable for your machine configuration).
Nevertheless, this does not solve the retransmission problems. Any suggestions how to solve this?
btw: how much memory do I need for 500 threads? Any suggestions for a Dual P3 1.3 GHz with 512 MB ram?
So, I don't think async. dns would be a solution.
It also would be nice to have parallel ENUM requests, to parse multiple trees in parallel instead of serial.
regards, klaus
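A sketch of the parallel ENUM idea from the last message: build the ENUM domain for each tree (digits reversed and dot-separated, followed by the tree suffix) and query all trees concurrently, so the total wait is roughly that of the slowest tree rather than the sum. The query function is injected and hypothetical, and the suffixes used in the usage note are just examples.

```python
from concurrent.futures import ThreadPoolExecutor


def enum_domain(e164_number: str, suffix: str = "e164.arpa") -> str:
    """Turn an E.164 number such as "+4319793321" into its ENUM domain."""
    digits = [c for c in e164_number if c in "0123456789"]
    if not digits:
        raise ValueError("no digits in number")
    return ".".join(reversed(digits)) + "." + suffix


def lookup_all_trees(number, suffixes, query):
    """Run query(domain) for every tree concurrently.

    Returns (domain, result) for the first tree in `suffixes` order that
    produced a result, or None if no tree did."""
    domains = [enum_domain(number, s) for s in suffixes]
    if not domains:
        return None
    with ThreadPoolExecutor(max_workers=len(domains)) as pool:
        results = list(pool.map(query, domains))   # preserves suffix order
    for domain, result in zip(domains, results):
        if result:
            return domain, result
    return None
```

For example, lookup_all_trees("+4319793321", ["e164.arpa", "e164.org"], query) asks both trees at once and still prefers e164.arpa when both answer.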