I am curious, what exactly is the purpose and philosophy of the various hash algorithms in the dispatcher module? I am referring to the ones that allow the gateway in the route set to be determined through a hash of various SIP headers, such as the From URI, the To URI, the Call-ID GUID, etc.
No guarantee of fair distribution - or in fact, any distribution that can be characterised by any describable pattern whatsoever - is implied by these algorithms.
Additionally, it seems that without knowing
(1) The exact hash algorithm in use;
(2) The distribution that this hash algorithm would yield for a given set of possible values of these header fields, which in many cases are specifically intended to be pseudorandom (e.g. Call-ID),
there is absolutely no way to determine, from any meaningfully deterministic perspective, which numerical entry in the route set these algorithms would computationally yield.
So, I guess my question is: With no implied uniformity or weighting in the distribution whatsoever based on the incidental character of such values, what practical use does it serve to use any algorithms except round-robin or random? Is it expected that the user will plot the hash values against a log of given input strings to determine how the distribution will shape up? Is there some reason why the sort of profoundly lopsided distribution that this may create might be desirable?
Thanks,
The Call-ID hash is used to send all requests of the same dialog to the same endpoint (proxy, application server, gateway, whatever...). The reason behind this behaviour is to avoid having these SIP endpoints share the state of all the dialogs (not many applications out there share this state, so you are required to send all in-dialog messages to the same SIP instance).
Put simply: to send the BYE to the same gateway you sent the INVITE to...
This does not guarantee fair load distribution... which is why, depending on what you are dispatching, you can hash on several headers (if you don't have to keep dialog state, ...)
Hope it helps, Sam
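The stickiness described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gateway list is hypothetical, and `zlib.crc32` is a stand-in hash, not the function the dispatcher module uses internally. The point is that any deterministic hash of the Call-ID maps every request of a dialog to the same index, with no state shared between the endpoints.

```python
import zlib

# Hypothetical destination set; the addresses are made up for illustration.
GATEWAYS = ["sip:10.0.0.1:5060", "sip:10.0.0.2:5060", "sip:10.0.0.3:5060"]


def pick_gateway(call_id: str, gateways=GATEWAYS) -> str:
    """Map a Call-ID to one gateway using a deterministic hash.

    zlib.crc32 is a stand-in here, not the dispatcher's own hash function.
    """
    index = zlib.crc32(call_id.encode("utf-8")) % len(gateways)
    return gateways[index]


# The INVITE and the later BYE of one dialog carry the same Call-ID,
# so a stateless balancer sends both to the same gateway.
call_id = "a84b4c76e66710@pc33.example.com"
invite_target = pick_gateway(call_id)
bye_target = pick_gateway(call_id)
print(invite_target == bye_target)  # True: same input, same hash, same gateway
```

Which gateway a given Call-ID lands on is not interesting in itself; what matters is that the answer is the same every time it is asked.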
2008/9/29 Alex Balashov abalashov@evaristesys.com
-- Alex Balashov Evariste Systems Web : http://www.evaristesys.com/ Tel : (+1) (678) 954-0670 Direct : (+1) (678) 954-0671 Mobile : (+1) (706) 338-8599
samuel wrote:
simple: to send the BYE to the same gateway you sent the INVITE...
Yes, but is that not a relatively minor use case that applies to situations in which stateful transaction forwarding (TM module) is not used?
Meaning, if I t_relay() an INVITE to a gateway selected by dispatcher, subsequent provisional responses and in-dialog requests will be passed between the original endpoints without further intervention.
If it didn't work that way, round-robin and random wouldn't work as algorithms because the next message would be sent to another server.
So, stateless forwarding aside (why would you want to do that in a dispatcher load balancing or failover scenario?), why do these hash algorithms make any sense to use?
2008/9/29 Alex Balashov abalashov@evaristesys.com
Meaning, if I t_relay() an INVITE to a gateway selected by dispatcher, subsequent provisional responses and in-dialog requests will be passed between the original endpoints without further intervention.
Responses travel back along the same path and therefore pass through the t_relay() host. Further in-dialog requests may or may not traverse the t_relay() host (record-route stuff)... It depends on the network topology and the application. There are lots of HA deployments and proxies with PSTN termination that require this hash algorithm. In those cases you just want a load balancer with low processing overhead (no tm), or you want to be in the middle of the dialog signalling exchange (the PSTN gateway case).
samuel wrote:
Further in-dialog requests might traverse the t_relay host or not (record-route stuff)...
You mean, loose route stuff?
On 09/29/08 14:22, Alex Balashov wrote:
So, stateless forwarding aside (why would you want to do that in a dispatcher load balancing or failover scenario?), why do these hash algorithms make any sense to use?
There are use cases even when doing stateful processing. So:
- hash over call id - it is fast, good distribution, can be used for calls to be sent to gateways, etc, works for stateless processing as well
- hash over from uri - caller is sent to same server, good for cdr collection, authentication, etc
- hash over to uri - good to send registrations for a user to same location server
- hash over r-uri - good to send calls to same location server as the registration server for that user
Using a farm of servers, grouped by users, you can combine the last three to route the SIP messages inside your network so that auth, acc and location services work properly, and use the first one to send to gateways :-)
Cheers, Daniel
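The pairing of "hash over to uri" for registrations with "hash over r-uri" for calls can be illustrated with a small Python sketch. Everything here is an assumption made for the example: the server names, the dictionary layout of a message, and `zlib.crc32` as a stand-in for the module's own hash. The real module may also hash only selected parts of a URI rather than the whole string.

```python
import zlib

# Hypothetical farm of location servers; names are illustrative only.
SERVERS = ["loc1.example.net", "loc2.example.net", "loc3.example.net"]


def select_server(key: str, servers=SERVERS) -> str:
    """Pick a server from a deterministic hash of the chosen key (stand-in hash)."""
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]


def hash_key(message: dict, algorithm: str) -> str:
    """Return the part of the message each strategy hashes over."""
    field = {
        "callid": "call_id",
        "from_uri": "from_uri",
        "to_uri": "to_uri",
        "ruri": "ruri",
    }[algorithm]
    return message[field]


# A REGISTER for alice carries her AoR in the To URI ...
register = {"call_id": "reg-1@ua", "from_uri": "sip:alice@example.com",
            "to_uri": "sip:alice@example.com", "ruri": "sip:example.com"}
# ... and a call to alice carries the same AoR in the Request-URI.
invite = {"call_id": "call-9@ua", "from_uri": "sip:bob@example.com",
          "to_uri": "sip:alice@example.com", "ruri": "sip:alice@example.com"}

registrar = select_server(hash_key(register, "to_uri"))
lookup = select_server(hash_key(invite, "ruri"))
print(registrar == lookup)  # True: both hash the string "sip:alice@example.com"
```

Because both lookups hash the same string, the call reaches the server that holds alice's registration without any table mapping users to servers.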
Daniel,
I understand the concept of same keys hashing to the same values. :-) If one hashes a value that stays consistent within a dialog, then all requests within that dialog will go to the same place (and not just the transaction, which is the only thing TM is good for). If one hashes a value that's going to always be the same for a given user (such as a From URI), they will always be directed to the same gateway, etc.
What I still don't understand is what benefit this deterministic domain of values - this sameness - confers from a practical perspective. Yes, I know that if I hash the From URI, the caller will be sent to the same server, but, which server? Clearly, the answer is, "Whichever server their From URI hashes to." Sure. But what particular usefulness does that have, whether one is doing stateful or stateless processing?
Obviously, using a hash is more elegant and simple than statically assigning my users various bindings, as you point out in examples like "good to send calls to the same location server as the registration server for that user." But still, I am brought to ask - without having some means of determining exactly what that server will be, what's the advantage? It's obviously not load balancing, unless I know that my From URIs are going to have a certain desirable distribution when hashed, which I don't. Just keeping certain paths the same is nice, of course, but I fail to see how it's actually useful.
Sure, it's great if my registrants always go to the same location server, but if 90% of my users end up going to one location server because of the distribution that the variance of their From URIs provides, what does this really give me except a predictable route? It's not as if I can use the hash to "find" a user's location server -- unless the location server was determined using the same hash also. What's the point?
-- Alex
On 09/30/08 08:17, Alex Balashov wrote:
Sure, it's great if my registrants always go to the same location server, but if 90% of my users end up going to one location server because of the distribution that the variance of their From URIs provides, what does this really give me except a predictable route? It's not as if I can use the hash to "find" a user's location server -- unless the location server was determined using the same hash also. What's the point?
The hash function was tested to give a pretty fair distribution for AoRs, and most of the hashed values follow that format. If 90% of your users end up on the same server, then you may need to code a bit :-) and add an alternative hash function to the module. For me the existing one seems good so far.
I do not need to know where a user's call is going. In practice, the servers could share the same db backend for auth, while the location and other user profile details are kept in memory for speed. In what I am doing, all the servers in a group have the same config; if I add a new one, I get a new dispersion of the users across servers.
Getting the distribution is quite simple: take the hash function and make a simple app that takes a string as a parameter and outputs the hash value. Knowing your subscriber base ids, you can estimate the results of dispatching.
If you are looking for a fairer distribution, round robin is your solution, with its own limitations as well. Some people even use a random hashing value and it meets their needs. So I believe we see the benefit of something when we have a use case for it. I am not using many of the kamailio/openser features/modules, but I am sure they have a practical use somewhere and I may need them some time.
Cheers, Daniel
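The "simple app" suggested above could look roughly like the following Python sketch. It is not the dispatcher's code: `zlib.crc32` stands in for the function from hash_func.h, and the subscriber list is a hypothetical set of sequential numeric AoRs. To get numbers that match a real deployment, the hash would have to be replaced with the one the module actually uses.

```python
import zlib
from collections import Counter


def bucket(key: str, n_servers: int) -> int:
    """Index of the server a key lands on (zlib.crc32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % n_servers


def distribution(keys, n_servers: int) -> Counter:
    """Count how many keys land on each server index."""
    return Counter(bucket(k, n_servers) for k in keys)


# Hypothetical subscriber base: sequential numeric user IDs as AoRs.
subscribers = [f"sip:{1000 + i}@example.com" for i in range(10000)]

before = distribution(subscribers, 4)
after = distribution(subscribers, 5)  # one server added to the group

for name, counts in (("4 servers", before), ("5 servers", after)):
    shares = {idx: f"{100 * n / len(subscribers):.1f}%"
              for idx, n in sorted(counts.items())}
    print(name, shares)

# How many users change server when the group grows from 4 to 5.
moved = sum(bucket(s, 4) != bucket(s, 5) for s in subscribers)
print(f"{moved} of {len(subscribers)} subscribers moved")
```

The last figure makes the "new dispersion" visible: with a plain modulo over the number of servers, adding one server reassigns a large share of the users, which is harmless only if per-user state can be rebuilt on the new server.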
Daniel-Constantin Mierla wrote:
I do not need to know where a user's call is going. In practice, the servers could share the same db backend for auth, while the location and other user profile details are kept in memory for speed. In what I am doing, all the servers in a group have the same config; if I add a new one, I get a new dispersion of the users across servers.
Ah, yes. That makes complete sense. With round-robin I would only get RR distribution at the transactional level, but not based on any characteristics of the request. Thank you for clearing all that up!
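The contrast between the two strategies can be shown with a short Python sketch. The server names are hypothetical and `zlib.crc32` is again only a stand-in hash. Round-robin rotates per request and ignores the request's content, while a hash over the From URI keeps one caller on one server.

```python
import itertools
import zlib

SERVERS = ["srv-a", "srv-b", "srv-c"]  # hypothetical server names

_rr = itertools.cycle(range(len(SERVERS)))


def round_robin() -> str:
    """Next server in rotation, regardless of what the request contains."""
    return SERVERS[next(_rr)]


def by_from_uri(from_uri: str) -> str:
    """Server chosen from a hash of the caller's URI (stand-in hash)."""
    return SERVERS[zlib.crc32(from_uri.encode("utf-8")) % len(SERVERS)]


# Three separate requests from the same caller.
caller = "sip:alice@example.com"
rr_targets = [round_robin() for _ in range(3)]
hash_targets = [by_from_uri(caller) for _ in range(3)]

print(rr_targets)              # ['srv-a', 'srv-b', 'srv-c']
print(len(set(hash_targets)))  # 1: the caller always lands on one server
```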
On Tuesday 30 September 2008, Daniel-Constantin Mierla wrote:
Getting the distribution is quite simple: take the hash function and make a simple app that takes a string as a parameter and outputs the hash value. Knowing your subscriber base ids, you can estimate the results of dispatching.
Hi Alex,
I don't know that much about the dispatcher internals, but I think it also uses the CRC32 hash. In my tests with the carrierroute module, which uses it too, requests hashed over a random callid / from user are distributed equally to within about +/- 5%, which I think is ok. If you need to adjust the target host distribution, take a look at the hash_id functionality (carrierroute config file mode); there you can specify the exact destination. We also use the approach of pre-calculating the hash value that Daniel described.
Cheers,
Henning
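A test of this kind can be approximated with the Python sketch below. It is not a reproduction of the carrierroute tests: the number of targets, the number of calls, the Call-ID format and the seed are all assumptions chosen for the example, and `zlib.crc32` is used because CRC32 is the hash named above.

```python
import random
import zlib
from collections import Counter

N_TARGETS = 4      # hypothetical number of destinations
N_CALLS = 100_000  # hypothetical number of simulated calls

rng = random.Random(2008)  # fixed seed so the run is repeatable

# Simulated Call-IDs: 32 random hex digits plus a host part.
call_ids = [f"{rng.getrandbits(128):032x}@ua.example.com" for _ in range(N_CALLS)]

counts = Counter(zlib.crc32(c.encode("ascii")) % N_TARGETS for c in call_ids)

expected = N_CALLS / N_TARGETS
worst = max(abs(counts[i] - expected) / expected for i in range(N_TARGETS))
print(f"largest deviation from an even split: {100 * worst:.2f}%")
```

Random Call-IDs are the easy case for any reasonable hash. Structured keys such as sequential user IDs are the ones worth checking against a real subscriber list.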
Hello,
On 10/06/08 13:27, Henning Westerholt wrote:
I don't know that much about the dispatcher internals, but I think it also uses the CRC32 hash. In my tests with the carrierroute module, which uses it too, requests hashed over a random callid / from user are distributed equally to within about +/- 5%, which I think is ok.
dispatcher is using an internal function, from hash_func.h. Good to know you have tested with crc32; I might make it available as an alternative hashing function for dispatcher.
Cheers, Daniel
On Monday 06 October 2008, Daniel-Constantin Mierla wrote:
dispatcher is using an internal function, from hash_func.h. Good to know you have tested with crc32; I might make it available as an alternative hashing function for dispatcher.
Hi Daniel,
ah, good to know, thanks for the clarification.
Cheers,
Henning