Hi!
I've been running some performance tests on my OpenSER box, and while registering a large number of users I ran into some very strange problems. At least they seem strange to me, but I hope someone on this list can shed some light on this.
So, I use OpenSER 1.0.1 and load test it with SIPp. I've run tests generating about 1000 registrations per second, registering 300k-450k different users. The register rate is constant, and each REGISTER uses a different user. What is strange to me is that just about when 300k users have registered, the CPU of the PC running the proxy hits 100% usage and messages start dropping. I made a graph in Excel to illustrate the problem and attached it to this mail. In this test I registered 300k different users at 1000 RPS; once those users are registered, the registering starts again from the beginning. As you can see from the graph, the CPU usage rises constantly and drops immediately after I've registered 300k users and the registering starts over. Each user is then being registered all over again, but the CPU load still grows. Does anyone have an explanation for this? Why does the CPU usage grow with the number of users registered? And why does it drop for a while when registering starts all over again? I've tried different usrloc modes, but there seems to be no difference. This test was done with usrloc kept only in memory.
Regards,
Teemu
-- Teemu Harju http://www.teemuharju.net
Hi Teemu,
This is a very interesting test :).
The increased CPU load is a result of the increased number of AOR records in the usrloc hash table. For each REGISTER, the usrloc module first checks whether the user (AOR) is already registered, which means searching through the hash table, and the more loaded the table is, the higher the CPU load will be.
You may experiment with increasing the hash table size in order to reduce the number of collisions per branch (by default there are only 512 hash branches, so for 450K distinct users you will have ~900 AORs per branch). If you want to change it, see modules/usrloc/dlist.c line 341 (for unstable): if (new_udomain(&(ptr->name), 512, &(ptr->d)) < 0) {
Maybe in the future the hash size should be controllable via a module parameter. NOTE that the size must be a power of 2!!
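To make the sizing argument concrete, here is a minimal C sketch of the arithmetic. It is not taken from the usrloc sources: the FNV-1a hash and every function name below are assumptions chosen for illustration, and the mask-based index is only one common reason a table size would need to be a power of 2 (the thread does not say why usrloc requires it).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical string hash (FNV-1a). This is NOT the hash that
 * OpenSER's usrloc module uses; it only stands in for "some hash". */
uint32_t fnv1a(const char *s, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;
    }
    return h;
}

/* Returns nonzero if n is a power of two. */
int is_power_of_two(unsigned int n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Maps an AOR to a bucket by masking the hash. The mask trick only
 * covers all buckets when size is a power of two. */
unsigned int bucket_index(const char *aor, size_t len, unsigned int size)
{
    assert(is_power_of_two(size));
    return fnv1a(aor, len) & (size - 1);
}

/* Average AORs per bucket (rounded down), assuming a perfectly even
 * spread. This is roughly the chain length each lookup has to walk. */
unsigned int avg_chain_length(unsigned int users, unsigned int size)
{
    return users / size;
}
```

With the figures from this thread, avg_chain_length(450000, 512) gives 878, which matches the "~900 AORs per branch" estimate, while a hypothetical table of 65536 buckets would bring the average down to 6.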
Now, regarding the peaks: the decreasing load may be a result of expiring contacts -> the load of the table decreases and so does the CPU load. What is the expiry time of the registrations?
regards, bogdan
Hi,
Thanks for the reply. I was thinking about something like that since the CPU load grows. I'll try what you mentioned and report back the result.
The strange thing still is that when the registrations start over again, the load drops immediately. If I follow the number of registered users, the load drops exactly when that number reaches 300k (the number of users I've been testing with). You can also see it in the graph I attached to the first email: after 5 minutes the load drops and starts growing again (5*60*1000 = 300k). From then on it is re-registering the same users. The expire time for registrations is the default 3600s, so that should not be the reason for the drop in CPU load. I've thought that maybe there is something strange about the test I'm running, but those are just regular REGISTER messages.
- Teemu
Problem solved. It seems that registering the users in sequential order was more or less the worst-case scenario for usrloc. When registering users in random order, the load still grows with the number of users, but once all users are registered it stays constant.
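For anyone who wants to reproduce the random-order run, one way to get such an order is to shuffle the user indices before feeding them to the load generator. The C sketch below is an assumption about how that could be done, not what was actually used in this test: the function names are made up, and it uses a Fisher-Yates shuffle driven by a small xorshift generator so the order is repeatable for a given seed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small deterministic PRNG (xorshift32); state must be nonzero. */
uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fills ids[0..n-1] with 0..n-1, then shuffles them in place
 * (Fisher-Yates), so every user index appears exactly once. */
void shuffled_user_ids(unsigned int *ids, size_t n, uint32_t seed)
{
    assert(ids != NULL);
    uint32_t state = seed ? seed : 1u;  /* xorshift cannot start from 0 */

    for (size_t i = 0; i < n; i++)
        ids[i] = (unsigned int)i;

    for (size_t i = n; i > 1; i--) {
        /* The modulo introduces a slight bias, which is acceptable
         * for generating load-test input. */
        size_t j = xorshift32(&state) % i;
        unsigned int tmp = ids[i - 1];
        ids[i - 1] = ids[j];
        ids[j] = tmp;
    }
}
```

Each shuffled index can then be turned into a user name (for example by appending it to a fixed prefix) when writing the input file for the load generator.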
I haven't increased the hash table size yet, but that might give some more performance, so I might try it.
- Teemu
Teemu,
Teemu Harju wrote:
> Hi,
> Thanks for the reply. I was thinking about something like that since the CPU load grows. I'll try what you mentioned and report back the result.
That will be great. BTW, are you using the devel or the stable version of OpenSER? In the devel version you can get more info via the statistics support: the amount of used memory and the number of AORs and contacts.
> The strange thing still is that when the registrations start over again, the load drops immediately. If I follow the number of registered users, the load drops exactly when that number reaches 300k (the number of users I've been testing with). You can also see it in the graph I attached to the first email: after 5 minutes the load drops and starts growing again (5*60*1000 = 300k). From then on it is re-registering the same users. The expire time for registrations is the default 3600s, so that should not be the reason for the drop in CPU load. I've thought that maybe there is something strange about the test I'm running, but those are just regular REGISTER messages.
The explanation is that you do not have constant REGISTER traffic. With 300K users at 1000 REGISTERs per second, all 300K will be registered in 5 minutes (as you also said). usrloc uses CPU power only when processing REGISTERs; once that is done, there is nothing to process until the next burst of REGISTERs comes.
You also mentioned messages being discarded. This may happen if the burst is high, the number of worker processes is too low, and the OS has a limited queue for UDP packets. Solution -> increase the number of children processes.
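To get a feel for how limited that UDP queue is, the following C sketch reads the default receive buffer size of a fresh UDP socket through the standard getsockopt(SO_RCVBUF) call and turns it into a rough estimate. The helper names, the 500-byte REGISTER size and the buffer size in the usage note are assumptions for illustration only; real kernels also charge per-packet overhead against the buffer, so the true capacity is lower.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the receive buffer size (in bytes) that the OS assigns to a
 * newly created UDP socket, or -1 on error. */
int udp_rcvbuf_bytes(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    int size = 0;
    socklen_t len = sizeof(size);
    int rc = getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, &len);
    close(fd);
    return rc == 0 ? size : -1;
}

/* Rough upper bound on how many datagrams of a given size fit in the
 * buffer; kernel bookkeeping overhead per packet is ignored. */
int datagrams_that_fit(int rcvbuf_bytes, int datagram_bytes)
{
    assert(datagram_bytes > 0);
    return rcvbuf_bytes / datagram_bytes;
}

/* Milliseconds of traffic the queue can absorb at a given arrival
 * rate if no worker process reads from the socket in the meantime. */
int burst_tolerance_ms(int queued_datagrams, int datagrams_per_second)
{
    assert(datagrams_per_second > 0);
    return queued_datagrams * 1000 / datagrams_per_second;
}
```

As an example with assumed numbers, a 212992-byte buffer and 500-byte REGISTERs give at most 425 queued messages, which at 1000 REGISTERs per second is under half a second of backlog before packets are dropped. That is why more worker processes draining the socket can help.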
regards, bogdan