Hi, in the doc of HTable module all the examples show "a=>size=4;". As the doc says:
size - number specifying the size of hash table. The number of entries in the table is 2^size
This is, the examples just allow 16 concurrent entries, which is not enough in most of the cases. Did I misunderstand something? or do the module examples use a non very pertinent value?
Thanks a lot.
Iñaki Baz Castillo writes:
Hi, in the doc of HTable module all the examples show "a=>size=4;". As the doc says:
size - number specifying the size of hash table. The number of entries in the table is 2^size
This is, the examples just allow 16 concurrent entries, which is not enough in most of the cases. Did I misunderstand something? or do the module examples use a non very pertinent value?
inaki,
size=4 means that the hash table has 16 entries. it does not mean that the table could not contain more objects, but collisions will happen.
there is an example with size=8 too. for some reason exampe 1.13 shows two identical htables:
modparam("htable", "htable", "a=>size=4;autoexpire=7200;dbtable=htable_a;") modparam("htable", "htable", "a=>size=4;autoexpire=7200;dbtable=htable_a;")
-- juha
2010/10/16 Juha Heinanen jh@tutpro.com:
size=4 means that the hash table has 16 entries. it does not mean that the table could not contain more objects, but collisions will happen.
And those collisions mean that I could look for a key name and retrieve another key (as both key names produce the same hash), am I right? So, if I want to handle ~ 300 concurrent entries, with *no* collision, which "size" value is good enough? how to determine it?
Thanks a lot.
Iñaki Baz Castillo writes:
And those collisions mean that I could look for a key name and retrieve another key (as both key names produce the same hash), am I right?
no, the values whose keys hash to the same index of hash table, are added into a linked list and then list list will be searched until the correct key is found.
So, if I want to handle ~ 300 concurrent entries, with *no* collision, which "size" value is good enough? how to determine it?
you cannot unless you know the hash algorithm and your keys in advance.
-- juha
Iñaki,
Just as an addition from experience on hash tables, the first "sweet-spot" on the design is to have a fairly good algorithm so that different keys are mapped on different hash values. You cannot pick this (as far as I read) but we'll think it as fair enough. What you can do to "help" the algorithm is to pick a field already random or large. If you have redundant fields (like username and table ID from a database) try with both "unique" fields as a key and check which gives you better results.
Then you should use a table big enough to map a big number of different hash values. As computing is discrete you will find a maximum size, above that size you'll find collisions on some slots and empty slots (because the algorithm cannot generate those hash values). Keep in mind that sizing the hash table will consume memory resources.
Last you know that statistically you will bump into collisions, they are inevitable and no real-life practice will guarantee collision-free operation. When you have collisions, what you can do is have a powerful processor to operate searches and inserts quickly.
What would be best to optimize the system is to have some statistic information, save every now and then the status of the htable and act accordingly to: - If the table is pretty much or completely used and you have collisions, make the size parameter bigger - If the table is big enough so you'll have some slots free and still have collisions you can try picking some key (if you have redundant ones) that is more "random" so the hash result will be more "random". - If all above is not producing any result, add processor power to the system so searches and inserts are faster
Hope these guidelines help, they are some conclusions from hashing and queuing theory books.
Cheers, Uriel
On Sat, Oct 16, 2010 at 7:47 AM, Juha Heinanen jh@tutpro.com wrote:
Iñaki Baz Castillo writes:
And those collisions mean that I could look for a key name and retrieve another key (as both key names produce the same hash), am I right?
no, the values whose keys hash to the same index of hash table, are added into a linked list and then list list will be searched until the correct key is found.
So, if I want to handle ~ 300 concurrent entries, with *no* collision, which "size" value is good enough? how to determine it?
you cannot unless you know the hash algorithm and your keys in advance.
-- juha
SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list sr-users@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users
On 10/16/2010 09:18 AM, Juha Heinanen wrote:
Uriel Rozenbaum writes:
What would be best to optimize the system is to have some statistic information, save every now and then the status of the htable and act accordingly to:
it would indeed be nice of htable module would exports stats about collisions.
http://sip-router.org/docbook/sip-router/branch/master/rpc_list/rpc_list.htm...
... is a crude way of visualising the distribution, I imagine.
2010/10/16 Uriel Rozenbaum uriel.rozenbaum@gmail.com:
What would be best to optimize the system is to have some statistic information, save every now and then the status of the htable and act accordingly to:
- If the table is pretty much or completely used and you have
collisions, make the size parameter bigger
- If the table is big enough so you'll have some slots free and still
have collisions you can try picking some key (if you have redundant ones) that is more "random" so the hash result will be more "random".
- If all above is not producing any result, add processor power to the
system so searches and inserts are faster
Hope these guidelines help, they are some conclusions from hashing and queuing theory books.
Sure, thanks a lot.
Maybe the term entry should be replaced with slot, looks to be more standard. A good doc about hash tables is on wikipedia:
http://en.wikipedia.org/wiki/Hash_table
The htable module keeps the number of items in a slot, but not exported via rpc, this could be added in the future.
Ramona
On 10/17/2010 04:39 PM, Iñaki Baz Castillo wrote:
2010/10/16 Uriel Rozenbaumuriel.rozenbaum@gmail.com:
What would be best to optimize the system is to have some statistic information, save every now and then the status of the htable and act accordingly to:
- If the table is pretty much or completely used and you have
collisions, make the size parameter bigger
- If the table is big enough so you'll have some slots free and still
have collisions you can try picking some key (if you have redundant ones) that is more "random" so the hash result will be more "random".
- If all above is not producing any result, add processor power to the
system so searches and inserts are faster
Hope these guidelines help, they are some conclusions from hashing and queuing theory books.
Sure, thanks a lot.