Hi all!
Using ser with ENUM for some time has revealed several problems that I would like to discuss with you. Be aware: this email is long!
ENUM is a wonderful thing for call routing. Nevertheless, as it is DNS based, there are some important things an ENUM-aware application has to consider:
1. You never know how long the lookup takes.
2. You never know whether the lookup will fail or succeed.
3. If the lookup was successful, you can never trust the result.
I will now explain these points in detail:
1+2: When using ENUM, the application hands control to the OS's DNS resolver and to the DNS infrastructure. Thus, the ser thread that performs the ENUM lookup is blocked until the system's DNS resolver returns a result.
If DNS is slow or misconfigured (e.g. a zone is delegated to a nameserver that is down), the thread will be blocked for several seconds. For example, if you use Debian woody with 2 nameservers in /etc/resolv.conf, the timeout is 20 seconds. If you are lucky, the OS allows configuration of the DNS timeouts. Nevertheless, you have to assume that a ser thread may be blocked for up to 20 seconds. This has implications for your configuration:
Typically, you use logic like the following:

    if (uri =~ "^sip:[+][0-9]+@") {
        if (enum_query()) {
            t_relay();
            break;
        } else {
            # forward to the PSTN gateway (hypothetical address)
            rewritehostport("pstn-gw.example.com:5060");
            t_relay();
            break;
        };
    };
So the INVITE is received and the ENUM lookup is performed. If the lookup takes longer than 0.5 s, the SIP client starts retransmitting the INVITE. Each retransmission is processed by another thread, which performs another ENUM lookup. After several seconds, all of ser's threads are blocked in ENUM lookups and your SIP proxy will not handle any requests until the DNS queries time out. This makes it very easy to mount a DoS attack against the proxy. Another funny thing is that the SIP client detects a proxy error and hangs up, but the INVITEs are still being processed in the SIP proxy and, after the timeout, are forwarded to the PSTN gateway.
A solution to stop the retransmissions is to immediately call sl_send_reply("100","Trying"). But this raises another problem: if the caller hangs up before the DNS timeout, the SIP client sends a CANCEL (as it has received a 100) to the SIP proxy. But the SIP proxy cannot cancel the transaction, as it has not been created yet: the INVITE thread is still waiting for the ENUM lookup, and the transaction will only be created after the lookup returns (after the 20 s timeout). Thus, we still end up with an INVITE forwarded to the PSTN gateway although the SIP client has already hung up.
I thought of using t_newtran and t_forward_nonack_uri instead of t_relay, so that the transaction is created before the ENUM lookup. Then the thread that processes the CANCEL should find a transaction and stop it. But will this really prevent the INVITE from being sent to the PSTN gateway once the DNS lookup times out? (not tested)
btw: this problem occurs not only for ENUM lookups, but for all DNS lookups (SRV, A, AAAA).
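To see why the worker pool drains so quickly, here is a rough model in plain Python (not ser code). It assumes INVITEs over UDP with the RFC 3261 default T1 = 500 ms: Timer A starts at T1 and doubles after each retransmission, and Timer B (64 x T1 = 32 s) ends the attempt. It also assumes the worst case described above: no provisional response reaches the client, and every copy of the INVITE occupies its own worker for the whole lookup. The worker count (ser's children setting) is a parameter; 8 is only an example value.

```python
# Rough model of INVITE retransmissions hitting a proxy that blocks on DNS.
# Assumptions: UDP transport, RFC 3261 default T1 = 0.5 s, Timer A doubling
# after each retransmission, Timer B = 64 * T1, no 1xx reaches the client,
# and each received copy of the INVITE occupies one worker for the lookup.

T1 = 0.5


def invite_send_times(t1: float = T1) -> list[float]:
    """Times (s) at which the client sends the INVITE until Timer B fires."""
    timer_b = 64 * t1
    times, now, interval = [], 0.0, t1
    while now < timer_b:
        times.append(now)
        now += interval
        interval *= 2  # Timer A doubles after each retransmission
    return times


def copies_during_lookup(lookup_s: float, t1: float = T1) -> int:
    """Copies of one INVITE that arrive before the first lookup returns."""
    return sum(1 for t in invite_send_times(t1) if t < lookup_s)


def calls_to_exhaust(workers: int, lookup_s: float) -> int:
    """Call attempts needed to block every worker (worst case)."""
    per_call = max(1, copies_during_lookup(lookup_s))
    return -(-workers // per_call)  # ceiling division
```

With a 20 s DNS timeout, six copies of one INVITE arrive before the first lookup gives up, so in this model two call attempts are enough to block 8 workers.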
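The ordering question can be sketched independently of ser. In the hypothetical Python model below (not ser's tm module), the transaction is registered before the blocking lookup, a CANCEL only marks it as cancelled, and the INVITE path re-checks that mark after the lookup returns. Whether t_newtran() plus t_forward_nonack_uri() gives ser such a re-check is exactly the open question above; the sketch only shows that creating the transaction early is not enough unless something checks for the CANCEL before forwarding.

```python
# Hypothetical model, NOT ser's tm module: transactions are kept in a table
# keyed by Call-ID (cleanup omitted). A CANCEL only marks the transaction;
# the INVITE path re-checks the mark after its slow lookup, before forwarding.
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Transaction:
    call_id: str
    cancelled: bool = False


transactions: Dict[str, Transaction] = {}


def handle_cancel(call_id: str) -> int:
    """Return the status code this model sends in reply to a CANCEL."""
    tx = transactions.get(call_id)
    if tx is None:
        return 481  # no transaction yet: nothing to cancel
    tx.cancelled = True
    return 200


def handle_invite(call_id: str, lookup: Callable[[], Optional[str]],
                  pstn_uri: str) -> str:
    # Create the transaction BEFORE the blocking lookup so that a CANCEL
    # arriving in the meantime can find it.
    tx = Transaction(call_id)
    transactions[call_id] = tx
    target = lookup() or pstn_uri  # slow ENUM/DNS step; PSTN on failure
    # Re-check after the lookup: without this, creating the transaction
    # early would not stop the INVITE from being forwarded.
    if tx.cancelled:
        return "487 Request Terminated"
    return "forward to " + target
```

If the transaction were only created after lookup() returns, the same CANCEL would get a 481 in this model and the INVITE would still be forwarded.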
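The 20-second figure from the Debian woody example can be reproduced with a simplified model, assuming the resolver defaults documented in resolv.conf(5) (timeout:5, attempts:2) and assuming each listed nameserver is tried once per attempt with the full timeout. The real retry loop may vary the wait between rounds, so treat the numbers as an estimate. Where the resolver honours it, a line such as "options timeout:1 attempts:1" in /etc/resolv.conf shrinks the worst case accordingly.

```python
# Simplified model of the worst-case time a blocking resolver call can take
# when no nameserver answers. Assumption: each nameserver listed in
# /etc/resolv.conf is tried once per attempt and each try waits the full
# per-try timeout. Real resolvers may vary the wait between rounds.


def worst_case_block_seconds(nameservers: int, timeout_s: float = 5.0,
                             attempts: int = 2) -> float:
    """Return the worst-case blocking time in seconds."""
    if nameservers < 1 or attempts < 1:
        raise ValueError("need at least one nameserver and one attempt")
    return nameservers * attempts * timeout_s


if __name__ == "__main__":
    # Defaults with two nameservers: 2 * 2 * 5 = 20 seconds.
    print(worst_case_block_seconds(2))
    # With "options timeout:1 attempts:1": 2 * 1 * 1 = 2 seconds.
    print(worst_case_block_seconds(2, timeout_s=1, attempts=1))
```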
3: If the ENUM lookup succeeds, you can never trust the result. It may be an invalid SIP URI, a tel: URI, or anything else a user puts into their NAPTR records. This may result in a failed transaction or, as revealed at the ENUM plugtest, in failed accounting. Even worse, might it be possible to crash ser completely with really badly formatted URIs?
Thus you can't avoid doing some checks on the URI received from the ENUM lookup. Performance is not a valid argument! Once I hand control to external services (DNS, RADIUS, exec), the performance penalty of parsing the SIP URI is much smaller than that of the ENUM lookup itself.
In the case of ser, I would do the URI parsing in the ENUM module, or maybe create a dedicated function/module for checking SIP URIs inside the routing logic. That way, I could also check the result of exec calls.
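A minimal sketch of the kind of check proposed here, in plain Python rather than as a ser module: accept the ENUM result only if it looks like a forwardable sip:/sips: URI, and otherwise keep the original Request-URI so the call can still go to the PSTN (the role revert_uri() plays in ser.cfg). The helper names and the 255-character limit are assumptions for illustration; the pattern is deliberately rough and is no substitute for a full RFC 3261 parser.

```python
# Hypothetical post-ENUM sanity check (not part of ser's enum module).
import re
from typing import Optional

MAX_URI_LEN = 255  # assumption: illustrative upper bound on URI length

_SIP_URI = re.compile(
    r"^sips?:"                                  # only sip: or sips:
    r"(?:[^@\s]+@)?"                            # optional user part
    r"(?:[A-Za-z0-9.-]+|\[[0-9A-Fa-f:.]+\])"    # hostname/IPv4 or [IPv6]
    r"(?::[0-9]{1,5})?"                         # optional port
    r"(?:[;?]\S*)?$",                           # optional params/headers
    re.IGNORECASE,
)


def is_usable_sip_uri(uri: str) -> bool:
    """Rough check that an ENUM result is a forwardable SIP URI."""
    if len(uri) > MAX_URI_LEN:
        return False
    # Reject spaces and control characters outright.
    if any(ord(c) <= 0x20 or ord(c) == 0x7F for c in uri):
        return False
    return _SIP_URI.match(uri) is not None


def choose_target(enum_result: Optional[str], original_uri: str) -> str:
    """Use the ENUM result if it passes the check, else keep the original
    Request-URI (e.g. to route the call to the PSTN gateway)."""
    if enum_result and is_usable_sip_uri(enum_result):
        return enum_result
    return original_uri
```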
I'm eagerly waiting for your opinions.
regards, Klaus
Hi
Have been using ENUM for a month or so now, and point 3 is a problem: I found some users not formatting their entries properly, which meant I got no route to them, and testing this for every single entry is/will be a pain.
Lookups here didn't really take much time, I found. I dropped freenum because it went pear-shaped and started to hang, i.e. I got no reply.
Don't really have many other opinions, except that parsing URIs for correct format is a great idea.
Iqbal
Serusers mailing list serusers@lists.iptel.org http://lists.iptel.org/mailman/listinfo/serusers
On Jun 07, 2005 at 13:43, Klaus Darilion klaus.mailinglists@pernau.at wrote:
If DNS is slow or misconfigured (e.g. a zone is delegated to a nameserver that is down), the thread will be blocked for several seconds. For example, if you use Debian woody with 2 nameservers in /etc/resolv.conf, the timeout is 20 seconds. If you are lucky, the OS allows configuration of the DNS timeouts. Nevertheless, you have to assume that a ser thread may be blocked for up to 20 seconds. This has implications for your configuration:
This could be fixed by limiting the amount of time a DNS lookup can take in ser (e.g. a new config parameter). Right now the best practice is to use a caching DNS server/proxy that also caches negative replies.
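One way to bound how long a caller waits for a blocking lookup, sketched in Python: the resolver call runs in a helper thread and the caller gives up after a deadline. The function names and the 0.5 s default are assumptions, and this is not how ser works internally. Note the caveat in the docstring: the helper thread stays busy until the resolver itself returns, so this caps the caller's wait but not the resolver's work.

```python
# Hypothetical deadline wrapper for blocking lookups (not a ser feature).
import concurrent.futures
import socket
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Helper threads that run the blocking resolver calls.
_resolver_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_deadline(fn: Callable[[], T], deadline_s: float) -> Optional[T]:
    """Run fn in a helper thread; return None if it misses the deadline.

    Caveat: a call that misses the deadline keeps running in its helper
    thread until the resolver returns; only the caller stops waiting.
    """
    future = _resolver_pool.submit(fn)
    try:
        return future.result(timeout=deadline_s)
    except concurrent.futures.TimeoutError:
        return None  # treat like "no result", e.g. fall back to the PSTN


def resolve_host(host: str, deadline_s: float = 0.5):
    """getaddrinfo() goes through the system resolver."""
    return call_with_deadline(lambda: socket.getaddrinfo(host, 5060),
                              deadline_s)


def simulated_hung_lookup() -> str:
    """Stands in for a nameserver that answers far too late."""
    time.sleep(0.3)
    return "late answer"
```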
[...]
3: If the ENUM lookup succeeds, you can never trust the result. It may be an invalid SIP URI, a tel: URI, or anything else a user puts into their NAPTR records. This may result in a failed transaction or, as revealed at the ENUM plugtest, in failed accounting. Even worse, might it be possible to crash ser completely with really badly formatted URIs?
No, it shouldn't be able to crash ser. ser survives bad uris.
Thus you can't avoid doing some checks on the URI received from the ENUM lookup. Performance is not a valid argument! Once I hand control to external services (DNS, RADIUS, exec), the performance penalty of parsing the SIP URI is much smaller than that of the ENUM lookup itself.
What kind of checks? Run parse_uri and return an error if it fails? This will happen anyway at the first forward attempt that takes the URI into account (the forward will fail).
In the case of ser, I would do the URI parsing in the ENUM module, or maybe create a dedicated function/module for checking SIP URIs inside the routing logic. That way, I could also check the result of exec calls.
This could easily be done, but as I said above, if the URI is bad, forwarding will fail anyway.
Andrei
No, it shouldn't be able to crash ser. ser survives bad uris.
It does not crash, but it causes SER to send a stateless 500 back to the client and not to start the transaction, which in turn means accounting does not start.
Thus you can't avoid doing some checks on the URI received from the ENUM lookup. Performance is not a valid argument! Once I hand control to external services (DNS, RADIUS, exec), the performance penalty of parsing the SIP URI is much smaller than that of the ENUM lookup itself.
What kind of checks? Run parse_uri and return an error if it fails?
Check AT LEAST whether the URI returned by the NAPTR lookup has a valid scheme that SER supports (e.g. sip:, sips:). h323, tel, etc. should be checked in the enum lookup function.
This will happen anyway at the first forward attempt that takes the URI into account (the forward will fail).
In the case of ser, I would do the URI parsing in the ENUM module, or maybe create a dedicated function/module for checking SIP URIs inside the routing logic. That way, I could also check the result of exec calls.
This could easily be done, but as I said above, if the URI is bad, forwarding will fail anyway.
Failing to parse the URI is not bad; exiting the routing logic and leaving behind unaccounted transactions is.
Adrian
Adrian Georgescu writes:
It does not crash, but it causes SER to send a stateless 500 back to the client and not to start the transaction, which in turn means accounting does not start.
there is not yet enough data on why ser sent back 500. perhaps the uri returned from enum was too long for ser to handle. this would need to be analyzed more carefully.
Check AT LEAST whether the URI returned by the NAPTR lookup has a valid scheme that SER supports (e.g. sip:, sips:). h323, tel, etc. should be checked in the enum lookup function.
that kind of check cannot be done, because ser.cfg may decide to send a 302 reply back to the ua instead of forwarding the request. the ua may very well support h323 or tel uris.
Failing to parse the URI is not bad; exiting the routing logic and leaving behind unaccounted transactions is.
i agree, but most likely this has nothing to do with enum; it could happen whichever way the same uri was added to the destination set.
-- juha
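Both positions can be reconciled by making the check depend on what the routing logic does with the result. In this hypothetical Python helper, a proxied request requires a scheme the proxy can forward to (assumed here to be sip and sips), while a redirect only requires a syntactically valid scheme (RFC 3986 scheme syntax) and leaves h323:, tel:, etc. to the UA.

```python
# Hypothetical helper: which URI schemes to accept depends on the action.
import re

PROXY_SCHEMES = {"sip", "sips"}  # assumption: what this proxy forwards to
_SCHEME = re.compile(r"[a-z][a-z0-9+.-]*")  # RFC 3986 scheme (lowercased)


def scheme_of(uri: str) -> str:
    """Lowercased scheme, or "" if the URI has no valid scheme."""
    scheme, sep, _ = uri.partition(":")
    scheme = scheme.lower()
    if not sep or not _SCHEME.fullmatch(scheme):
        return ""
    return scheme


def acceptable(uri: str, action: str) -> bool:
    """action: "proxy" (forward the request) or "redirect" (3xx to the UA)."""
    scheme = scheme_of(uri)
    if not scheme:
        return False
    if action == "proxy":
        return scheme in PROXY_SCHEMES
    if action == "redirect":
        return True  # the UA may understand h323:, tel:, ...
    raise ValueError(f"unknown action: {action!r}")
```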
On Jun 7, 2005, at 4:43 PM, Juha Heinanen wrote:
that kind of check cannot be done, because ser.cfg may decide to send a 302 reply back to the ua instead of forwarding the request. the ua may very well support h323 or tel uris.
SER does not allow you to take control and send a 302 back; again, a stateless 500 is sent back to the client and the transaction fails.
Adrian
Adrian Georgescu writes:
SER does not allow you to take control and send a 302 back; again, a stateless 500 is sent back to the client and the transaction fails.
what? you simply do like this:
if (enum_query()) { sl_send_reply("302", "Moved Temporarily"); break; }
-- juha
On Jun 7, 2005, at 4:55 PM, Juha Heinanen wrote:
if (enum_query()) { sl_send_reply("302", "Moved Temporarily"); break; }
Not all clients support redirects, so if you proxy the request instead, it fails in SER. The client should check the Contact if it gets a 302, but SER should check the URI and should NOT fail to create the transaction when it proxies the request further.
Adrian
Adrian Georgescu writes:
Not all clients support redirects, so if you proxy the request instead, it fails in SER. The client should check the Contact if it gets a 302, but SER should check the URI.
as i said, ser cannot check uris returned from enum, because it may not understand all possible uri schemes. it could check uris whose scheme it understands, but making such partial checks makes no sense. any uri that is later processed by ser IS checked, as we discussed before. why ser sent back 500 is not yet known, but most likely it is a general problem that has nothing to do with enum.
-- juha
Andrei Pelinescu-Onciul writes:
This could be fixed by limiting the amount of time a DNS lookup can take in ser (e.g. a new config parameter).
that kind of parameter would indeed be a good thing. a slow dns query has nothing specifically to do with enum; it would also affect lookups of the request uri host.
This may result in a failed transaction or, as revealed at the ENUM plugtest, in failed accounting.
accounting should succeed. the failure may be the result of yet another bug in the radiusclient library, not necessarily in ser. i asked for more details about this incident, but i guess adrian has been too busy to provide them.
Thus you can't avoid doing some checks on the URI received from the ENUM lookup. Performance is not a valid argument! Once I hand control to external services (DNS, RADIUS, exec), the performance penalty of parsing the SIP URI is much smaller than that of the ENUM lookup itself.
What kind of checks? Run parse_uri and return an error if it fails? This will happen anyway at the first forward attempt that takes the URI into account (the forward will fail).
this is what i said too. i could try to parse every uri returned from enum, but i have considered it a waste of time, because bad uris will be detected later anyway, either by ser or, in the case of a 302, by the sip ua.
This could easily be done, but as I said above, if the URI is bad, forwarding will fail anyway.
exactly.
-- juha
On Jun 7, 2005, at 4:38 PM, Juha Heinanen wrote:
accounting should succeed. the failure may be the result of yet another bug in the radiusclient library, not necessarily in ser. i asked for more details about this incident, but i guess adrian has been too busy to provide them.
Sorry guys,
I quote from my last message:
"We ran into a new problem after stripping the spaces; the RADIUS accounting packet could not be sent to the server:
Jun 1 11:00:00 ns3 ser[18991]: ACC: call missed: method=INVITE, i-uri=sip:+441414960912@ag-projects.com, o-uri=sip:etsi7@IsThisAVeryVeryVeryLongLabelAnnoyToTheENUMClientsAndDNSClients.ThisAIsVeryVeryVeryLongLabelToAnnoyTheENUMClientsAndDNSClients.AThisIsVeryVeryVeryLongLabelToAnnoyTheENUMClientsAndDNSClients.1.1.9.0.6.9.4.1.4.1.4.4.e164.arpa., call_id=3c29000e249f-2yfy4pqkpyps@snom190, from="AG - ETSI 01" sip:40317109901@ag-projects.com;tag=iywbz91xdh, code=408 Request Timeout
Jun 1 11:00:00 ns3 ser[18991]: ERROR: acc_rad_request: rc_avpaid_add failed for number <2> attr <Sip-Translated-Request-URI> "
Basically, we could produce calls to PSTN numbers for which accounting did not start because the domain part was too long. Unfortunately, my provisioning system does not allow me to enter the long entries we had at the ETSI plugtest, but it is fairly easy to reproduce the problem.
Regards, Adrian
Adrian Georgescu writes:
Basically, we could produce calls to PSTN numbers for which accounting did not start because the domain part was too long. Unfortunately, my provisioning system does not allow me to enter the long entries we had at the ETSI plugtest, but it is fairly easy to reproduce the problem.
i asked if you could add a log statement to acc.c so that we could see for sure what attribute value ser tries to send to radius. if that value looks ok, then we need to start fixing the radiusclient library.
-- juha
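The requested log statement can be sketched as follows: log the exact value and its length before handing it to the RADIUS client, so a failure can be attributed either to the value or to the library. RFC 2865 limits an attribute value to 253 octets (the one-octet Length field also covers the Type and Length octets); a client library may enforce a lower compile-time limit, so the limit is a parameter here. This is a hypothetical Python stand-in for what would be a log line in acc.c, and the AVP list is illustrative only.

```python
# Python stand-in for "log the value before adding it" in acc.c (hypothetical).
import logging

log = logging.getLogger("acc")

# RFC 2865: the attribute Length octet covers Type + Length + Value,
# so a value can be at most 255 - 2 = 253 octets.
RADIUS_MAX_VALUE = 253


def add_avp(avps: list, name: str, value: str,
            max_len: int = RADIUS_MAX_VALUE) -> bool:
    """Append (name, value) to avps unless the value is empty or too long."""
    encoded = value.encode("utf-8")
    log.debug("adding %s (%d octets): %r", name, len(encoded), value)
    if not 1 <= len(encoded) <= max_len:
        log.error("%s: value is %d octets, limit is %d; not added",
                  name, len(encoded), max_len)
        return False
    avps.append((name, encoded))
    return True
```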
Juha, I did not do that at the time, my apologies. I have now added this number to ENUM with a long domain:
+40317105104
This is what I get in SER:
Jun 7 16:58:52 ns3 ser[13479]: ERROR: acc_rad_request: rc_avpaid_add failed for number <2> attr <Sip-Translated-Request-URI>
The START packet is not sent to the RADIUS server.
Adrian
On Jun 7, 2005, at 4:53 PM, Juha Heinanen wrote:
Adrian Georgescu writes:
Juha, I did not do that at the time, my apologies. I have now added this number to ENUM with a long domain:
+40317105104
This is what I get in SER:
Jun 7 16:58:52 ns3 ser[13479]: ERROR: acc_rad_request: rc_avpaid_add failed for number <2> attr <Sip-Translated-Request-URI>
and because you have the setup ready, i asked if you could add a log statement to acc.c in order to also see the value of the avp that ser tried to add. if the value looks ok, the bug is in radiusclient and not in ser.
-- juha
Hi Andrei!
Andrei Pelinescu-Onciul wrote:
On Jun 07, 2005 at 13:43, Klaus Darilion klaus.mailinglists@pernau.at wrote:
If DNS is slow or misconfigured (e.g. a zone is delegated to a nameserver that is down), the thread will be blocked for several seconds. For example, if you use Debian woody with 2 nameservers in /etc/resolv.conf, the timeout is 20 seconds. If you are lucky, the OS allows configuration of the DNS timeouts. Nevertheless, you have to assume that a ser thread may be blocked for up to 20 seconds. This has implications for your configuration:
This could be fixed by limiting the amount of time a DNS lookup can take in ser (e.g. a new config parameter). Right now the best practice is to use a caching DNS server/proxy that also caches negative replies.
There is no caching for requests without an answer. If the DNS returns a negative answer (e.g. NXDOMAIN), it will be cached until the negative TTL expires. If the authoritative NS does not answer at all, there is no answer that could be cached.
Limiting the time is the first step and will make ser more stable, but it is only the first step. What value will you set the DNS timeout to? Values > 0.5 s still require handling of the retransmissions.
[...]
3: If the ENUM lookup succeeds, you can never trust the result. It may be an invalid SIP URI, a tel: URI, or anything else a user puts into their NAPTR records. This may result in a failed transaction or, as revealed at the ENUM plugtest, in failed accounting. Even worse, might it be possible to crash ser completely with really badly formatted URIs?
No, it shouldn't be able to crash ser. ser survives bad uris.
Fine. What will be the response? 500? 4xx Bad URI? ...
In the case of ser, I would do the URI parsing in the ENUM module, or maybe create a dedicated function/module for checking SIP URIs inside the routing logic. That way, I could also check the result of exec calls.
This could easily be done, but as I said above, if the URI is bad, forwarding will fail anyway.
But by catching the bad URI before relaying the call (which would fail), I can use revert_uri and send the call to the PSTN.
regards, klaus