HTable 'size' parameter

List overview All Threads
Download

newer

older

siremis 2.0 ?

Kamailio v3.0.4 Released

Iñaki Baz Castillo

16 Oct 2010 16 Oct '10

10:15 a.m.

Hi, in the doc of HTable module all the examples show "a=>size=4;". As the doc says:

size - number specifying the size of hash table. The number of entries in the table is 2^size

This is, the examples just allow 16 concurrent entries, which is not enough in most of the cases. Did I misunderstand something? or do the module examples use a non very pertinent value?

Thanks a lot.

-- Iñaki Baz Castillo ibc@aliax.net

Show replies by date

Juha Heinanen

16 Oct 16 Oct

10:37 a.m.

Iñaki Baz Castillo writes:

...

Hi, in the doc of HTable module all the examples show "a=>size=4;". As the doc says:

size - number specifying the size of hash table. The number of entries in the table is 2^size

This is, the examples just allow 16 concurrent entries, which is not enough in most of the cases. Did I misunderstand something? or do the module examples use a non very pertinent value?

inaki,

size=4 means that the hash table has 16 entries. it does not mean that the table could not contain more objects, but collisions will happen.

there is an example with size=8 too. for some reason exampe 1.13 shows two identical htables:

modparam("htable", "htable", "a=>size=4;autoexpire=7200;dbtable=htable_a;") modparam("htable", "htable", "a=>size=4;autoexpire=7200;dbtable=htable_a;")

-- juha

Iñaki Baz Castillo

10:43 a.m.

2010/10/16 Juha Heinanen jh@tutpro.com:

...

size=4 means that the hash table has 16 entries. it does not mean that the table could not contain more objects, but collisions will happen.

And those collisions mean that I could look for a key name and retrieve another key (as both key names produce the same hash), am I right? So, if I want to handle ~ 300 concurrent entries, with *no* collision, which "size" value is good enough? how to determine it?

Thanks a lot.

-- Iñaki Baz Castillo ibc@aliax.net

Juha Heinanen

10:47 a.m.

Iñaki Baz Castillo writes:

...

And those collisions mean that I could look for a key name and retrieve another key (as both key names produce the same hash), am I right?

no, the values whose keys hash to the same index of hash table, are added into a linked list and then list list will be searched until the correct key is found.

...

So, if I want to handle ~ 300 concurrent entries, with *no* collision, which "size" value is good enough? how to determine it?

you cannot unless you know the hash algorithm and your keys in advance.

-- juha

Uriel Rozenbaum

1:12 p.m.

Iñaki,

Just as an addition from experience on hash tables, the first "sweet-spot" on the design is to have a fairly good algorithm so that different keys are mapped on different hash values. You cannot pick this (as far as I read) but we'll think it as fair enough. What you can do to "help" the algorithm is to pick a field already random or large. If you have redundant fields (like username and table ID from a database) try with both "unique" fields as a key and check which gives you better results.

Then you should use a table big enough to map a big number of different hash values. As computing is discrete you will find a maximum size, above that size you'll find collisions on some slots and empty slots (because the algorithm cannot generate those hash values). Keep in mind that sizing the hash table will consume memory resources.

Last you know that statistically you will bump into collisions, they are inevitable and no real-life practice will guarantee collision-free operation. When you have collisions, what you can do is have a powerful processor to operate searches and inserts quickly.

What would be best to optimize the system is to have some statistic information, save every now and then the status of the htable and act accordingly to: - If the table is pretty much or completely used and you have collisions, make the size parameter bigger - If the table is big enough so you'll have some slots free and still have collisions you can try picking some key (if you have redundant ones) that is more "random" so the hash result will be more "random". - If all above is not producing any result, add processor power to the system so searches and inserts are faster

Hope these guidelines help, they are some conclusions from hashing and queuing theory books.

Cheers, Uriel

On Sat, Oct 16, 2010 at 7:47 AM, Juha Heinanen jh@tutpro.com wrote:

...

Iñaki Baz Castillo writes:

...
And those collisions mean that I could look for a key name and retrieve another key (as both key names produce the same hash), am I right?

no, the values whose keys hash to the same index of hash table, are added into a linked list and then list list will be searched until the correct key is found.

...
So, if I want to handle ~ 300 concurrent entries, with *no* collision, which "size" value is good enough? how to determine it?

you cannot unless you know the hash algorithm and your keys in advance.

-- juha

SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list sr-users@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users

Juha Heinanen

1:18 p.m.

Uriel Rozenbaum writes:

...

What would be best to optimize the system is to have some statistic information, save every now and then the status of the htable and act accordingly to:

it would indeed be nice of htable module would exports stats about collisions.

-- juha

Alex Balashov

8:18 p.m.

On 10/16/2010 09:18 AM, Juha Heinanen wrote:

...

Uriel Rozenbaum writes:

...
What would be best to optimize the system is to have some statistic information, save every now and then the status of the htable and act accordingly to:

it would indeed be nice of htable module would exports stats about collisions.

http://sip-router.org/docbook/sip-router/branch/master/rpc_list/rpc_list.htm...

... is a crude way of visualising the distribution, I imagine.

-- Alex Balashov - Principal Evariste Systems LLC 1170 Peachtree Street 12th Floor, Suite 1200 Atlanta, GA 30309 Tel: +1-678-954-0670 Fax: +1-404-961-1892 Web: http://www.evaristesys.com/

Iñaki Baz Castillo

17 Oct 17 Oct

2:39 p.m.

2010/10/16 Uriel Rozenbaum uriel.rozenbaum@gmail.com:

...

What would be best to optimize the system is to have some statistic information, save every now and then the status of the htable and act accordingly to:

If the table is pretty much or completely used and you have

collisions, make the size parameter bigger

If the table is big enough so you'll have some slots free and still

have collisions you can try picking some key (if you have redundant ones) that is more "random" so the hash result will be more "random".

If all above is not producing any result, add processor power to the

system so searches and inserts are faster

Hope these guidelines help, they are some conclusions from hashing and queuing theory books.

Sure, thanks a lot.

-- Iñaki Baz Castillo ibc@aliax.net

Elena-Ramona Modroiu

21 Oct 21 Oct

9:01 a.m.

Maybe the term entry should be replaced with slot, looks to be more standard. A good doc about hash tables is on wikipedia:

http://en.wikipedia.org/wiki/Hash_table

The htable module keeps the number of items in a slot, but not exported via rpc, this could be added in the future.

Ramona

On 10/17/2010 04:39 PM, Iñaki Baz Castillo wrote:

...

2010/10/16 Uriel Rozenbaumuriel.rozenbaum@gmail.com:

...
What would be best to optimize the system is to have some statistic information, save every now and then the status of the htable and act accordingly to:

If the table is pretty much or completely used and you have

collisions, make the size parameter bigger

If the table is big enough so you'll have some slots free and still

have collisions you can try picking some key (if you have redundant ones) that is more "random" so the hash result will be more "random".

If all above is not producing any result, add processor power to the

system so searches and inserts are faster

Hope these guidelines help, they are some conclusions from hashing and queuing theory books.

Sure, thanks a lot.

5374

Age (days ago)

5379

Last active (days ago)

sr-users@lists.kamailio.org

8 comments

5 participants

tags (0)

participants (5)

Alex Balashov
Elena-Ramona Modroiu
Iñaki Baz Castillo
Juha Heinanen
Uriel Rozenbaum