Hi all! We are trying to run two instances of Kamailio on FreeBSD, built from the master branch. If I enable DMQ, two processes (pids) start utilizing CPU, and within less than a minute it goes up to 100% (on both servers). During that time Kamailio handles calls successfully, and DMQ and usrloc replication work fine. While trying to understand what happens I collected this debug info - something happens and Kamailio cycles in sched_yield(). All additional info is in the attachments: [gdb_output.txt](https://github.com/kamailio/kamailio/files/528881/gdb_output.txt) [kamailio.log.gz](https://github.com/kamailio/kamailio/files/528879/kamailio.log.gz) [kamailio.txt](https://github.com/kamailio/kamailio/files/528883/kamailio.txt) [kdump.txt.gz](https://github.com/kamailio/kamailio/files/528880/kdump.txt.gz) [ktrace.out.gz](https://github.com/kamailio/kamailio/files/528878/ktrace.out.gz) [top_output.txt](https://github.com/kamailio/kamailio/files/528882/top_output.txt)
Kamailio is using spin locks (for mutexes) and that can cause high CPU on some systems. I am not very familiar with DMQ internals, but it looks like it uses mutexes to signal the presence of a new task to its worker processes, meaning that a worker stays in the mutex-get operation (spinning) until a new task arrives. Maybe @charlesrchance can shed more light here.
Of course, this high CPU can also show up in case of a deadlock, but if you say everything works fine while there is traffic to handle, then it is not a deadlock.
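To illustrate why an idle worker parked on a lock can eat a whole core, here is a minimal spin-then-yield sketch (illustrative only, not Kamailio's actual fastlock code): after a bounded busy-wait the loop falls back to sched_yield(), and on an otherwise idle box that call returns immediately, so the process keeps spinning until the lock is released.

```c
/* Illustrative sketch only - not Kamailio's fastlock implementation. */
#include <sched.h>

#define ADAPTIVE_WAIT_LOOPS 1024   /* same constant name as in the -V output below */

static void spin_lock_get(volatile int *lock)
{
	int i = ADAPTIVE_WAIT_LOOPS;

	while (__sync_lock_test_and_set(lock, 1)) {  /* try to take the lock */
		if (i > 0) {
			i--;               /* short busy-wait first */
		} else {
			sched_yield();     /* then yield; with nothing else runnable this
			                      returns right away and the loop burns CPU */
		}
	}
}

static void spin_lock_release(volatile int *lock)
{
	__sync_lock_release(lock);
}
```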
One option is to compile with POSIX semaphores -- Makefile.defs needs to be changed and everything recompiled.
On another build I tried to enable POSIX semaphores, but the problem still exists. Changing the compiler also didn't help.
```
root@kamailio-c1:/var/log# kamailio -V
version: kamailio 5.0.0-dev6 (x86_64/freebsd) 580670
flags: STATS: Off, USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, FAST_LOCK-ADAPTIVE_WAIT, USE_POSIX_SEM, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, select, kqueue.
id: 580670
compiled on 16:53:43 Oct 21 2016 with gcc 4.2.1
```
Apologies, I have been away for a couple of weeks. DMQ does indeed use mutexes to trigger processing of a new task by its workers.
Your log shows everything is as expected with DMQ - i.e. with no other traffic, a lock is acquired/released once every minute with each peer notification (node ping). There is nothing else unusual about your log (or GDB output) that I can see.
So it does seem likely that something in the way these locks are being implemented is causing the high load on your particular system. Maybe someone else can suggest another alternative, if enabling POSIX semaphores has not resolved the issue for you.
Got deadlock (?) on latest build of kamailio:
```
version: kamailio 5.0.0-dev6 (x86_64/freebsd)
flags: STATS: Off, USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, select, kqueue.
id: unknown
compiled on 11:13:18 Nov 1 2016 with cc 3.4
```
backtrace in attachment [deadlock_bt.txt](https://github.com/kamailio/kamailio/files/563676/deadlock_bt.txt)
@charlesrchance - a solution is to use a shared memory variable to say if something needs to be consumed, like:
```
// mod init
int *dmq_usrloc_tasks = shm_malloc(sizeof(int));
*dmq_usrloc_tasks = 0;
gen_lock_t *dmq_usrloc_tasks_lock = ... // usual lock initialization

// runtime - producer
lock_get(dmq_usrloc_tasks_lock);
(*dmq_usrloc_tasks)++;   // one more task queued
lock_release(dmq_usrloc_tasks_lock);

// runtime - consumer
while (1) {
	lock_get(dmq_usrloc_tasks_lock);
	while (*dmq_usrloc_tasks > 0) {
		// consume one task here, then decrement the counter
		(*dmq_usrloc_tasks)--;
	}
	lock_release(dmq_usrloc_tasks_lock);
	sleep_us(dmq_usrloc_tasks_sleep);
}
```
The incrementing/decrementing needs to be accompanied by producing/consuming the tasks.
The key here is the sleep_us(), where dmq_usrloc_tasks_sleep can be a new mod param to specify the milliseconds (or microseconds) to sleep. It has proved that sleeping takes the CPU away from the process and keeps the load low. One can tune its value to better suit the environment.
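For reference, a minimal sketch of how such a parameter could be exported from the module - the parameter name, default value and include path below are my assumptions, not existing code:

```c
#include "../../core/sr_module.h"  /* or ../../sr_module.h, depending on tree layout */

/* hypothetical parameter: microseconds the consumer sleeps between polls */
static int dmq_usrloc_tasks_sleep = 1000;

static param_export_t params[] = {
	{"usrloc_tasks_sleep", INT_PARAM, &dmq_usrloc_tasks_sleep},
	{0, 0, 0}
};
```

It would then be tunable from kamailio.cfg with something like `modparam("dmq_usrloc", "usrloc_tasks_sleep", 2000)` (parameter name again hypothetical).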
IIRC, this approach is maybe used in the async module for async_route().
An alternative is using in-memory sockets to pass tasks from producers to consumers. The consumer would be blocked in a read(), which should not consume CPU, although on some systems I got read() returning quickly with an error, ending up in a loop of fast reads and high CPU again. This should be the mechanism used by evapi to pass events between the SIP workers and the evapi worker.
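A minimal sketch of that in-memory socket idea, under my own naming assumptions (task_fd, dmq_task_t and the function names are illustrative, not existing Kamailio symbols); the consumer blocks in read() instead of spinning on a lock, and EINTR is retried so an interrupted read does not degrade into the tight loop of failing reads mentioned above:

```c
#include <sys/socket.h>
#include <unistd.h>
#include <errno.h>

static int task_fd[2];   /* [0] = consumer end, [1] = producer end */

typedef struct { void *job; } dmq_task_t;

/* created before fork() so producer and consumer processes inherit both ends;
 * note that pointers passed this way must reference shared memory */
int tasks_init(void)
{
	return socketpair(AF_UNIX, SOCK_STREAM, 0, task_fd);
}

/* producer: hand a task pointer to the worker */
int task_send(dmq_task_t *t)
{
	return (write(task_fd[1], &t, sizeof(t)) == (ssize_t)sizeof(t)) ? 0 : -1;
}

/* consumer: block until a task arrives */
dmq_task_t *task_recv(void)
{
	dmq_task_t *t;
	ssize_t n;

	do {
		n = read(task_fd[0], &t, sizeof(t));
	} while (n < 0 && errno == EINTR);   /* retry interrupted reads */

	return (n == (ssize_t)sizeof(t)) ? t : NULL;
}
```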
Thanks, @miconda, will take a look.
Any progress?
Revisiting this now after a busy few months.
@miconda - is it better in your opinion to introduce the new locking method completely, or only behind a compile-time flag?
@charlesrchance do you need a lock at all?
https://github.com/tombeard/kamailio/commit/310c0f91858a6baf4b17a00798689f7f...
No idea how this performs on other platforms or in other situations, but for my specific requirement it solves the high CPU condition on FreeBSD caused by waiting for the lock yield, and I've had it running in production with dmq_usrloc for several months now without issue. Of course, your mileage may vary.
@charlesrchance okay, after pressing send I see the obvious flaw in that, but it solves the problem in my specific situation with a single worker thread.
@tombeard thanks - I guess not! Pull request?
Ok, I'll refine and push for testing tomorrow. Thanks for the input @tombeard.
@charlesrchance ignoring my second comment, I think that probably is safe. Either way, there's a pull request for testing / fixing.
Thanks, @tombeard - it works and appears safe. However, in my own (CentOS) tests, CPU utilisation when workers are idle is noticeably higher than before.
Increasing the sleep duration helps - so if it is implemented this way then it should be exposed as a mod param with a sensible default.
Personally, it seems like a (mostly) unnecessary trade-off between CPU cycles and job-processing delay, for what appears to be a limited number of cases (FreeBSD is the only one reported, to my knowledge). Mutex by default for the remainder seems to be both more efficient and more 'real-time'.
If everyone else agrees the alternative approach is better all round, then I am happy to merge once the sleep duration has been made configurable. It would still be good to hear @miconda's thoughts on making this a compile-time decision, though.
I am not sure if removing the lock/mutex is the proper solution here, as I haven't analyzed the module thoroughly, so it might be better to make it optional via a modparam - run in the old mode with locks or the new mode with sleep. I guess it should not be that complex, given that the new patch is rather small.
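To illustrate the "optional via modparam" idea, a rough sketch of a worker loop that keeps the lock-based behaviour by default and only switches to the sleep/poll variant when a parameter enables it; dmq_worker_usleep, job_queue_*, and the other helpers below are placeholder names, not the module's actual API:

```c
/* placeholders standing in for the module's real types/helpers */
typedef struct gen_lock gen_lock_t;
struct dmq_job_queue;
void  lock_get(gen_lock_t *l);
int   job_queue_size(struct dmq_job_queue *q);
void *job_queue_pop(struct dmq_job_queue *q);
void  process_job(void *job);
void  sleep_us(unsigned int us);

typedef struct dmq_worker {
	gen_lock_t *lock;
	struct dmq_job_queue *queue;
} dmq_worker_t;

extern int dmq_worker_usleep;          /* hypothetical modparam; 0 = classic lock mode */

void worker_loop(dmq_worker_t *w)
{
	for (;;) {
		if (dmq_worker_usleep <= 0) {
			/* classic mode: block here until the producer signals a new job */
			lock_get(w->lock);
		} else if (job_queue_size(w->queue) == 0) {
			/* sleep mode: nothing queued, yield the CPU for a while */
			sleep_us(dmq_worker_usleep);
			continue;
		}
		process_job(job_queue_pop(w->queue));
	}
}
```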
Yes, I would be happier implementing that way, since the lock method is better in most situations.
If no-one has any objections or reasons not to, then I shall proceed on that basis and make the necessary changes this evening.
Closed #822.
Fixed in 71a88212 - please re-open if the issue remains after testing.