### Description
On busy servers, there is a sporadic crash in the slow timer. We no longer have any debugging data, nor a way to reliably reproduce it. Posting here in case someone sees the same crash, can find a way to reliably reproduce it, and can maybe test the patch below.
### Possible Solutions
With the patch below we no longer see any crashes. When it crashes, either `tl` or `tl->f` is null. This is probably a race condition.

```
diff --git a/src/core/timer.c b/src/core/timer.c
index 0e0dc8812..3c59db3da 100644
--- a/src/core/timer.c
+++ b/src/core/timer.c
@@ -1128,7 +1128,7 @@ void slow_timer_main()
 #endif
 	SET_RUNNING_SLOW(tl);
 	UNLOCK_SLOW_TIMER_LIST();
-	ret=tl->f(*ticks, tl, tl->data);
+	ret= (tl->f ? tl->f(*ticks, tl, tl->data) : 0);
 	/* reset the configuration group handles */
 	cfg_reset_all();
 	if (ret==0){
```

### Additional Information
* **Kamailio Version** - output of `kamailio -v`

```
5.1 / 5.2
```

* **Operating System**:

```
CentOS7
```
Are these the latest 5.1 / 5.2 versions? It looks like a timer struct that has been reset is being used there.
I am fine with pushing such a safety check, but I would add a warning log message to it. I am going to push a patch, and then you can adjust further if you need something else.
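For illustration only, a guarded call with a warning could look roughly like the sketch below. It is not the code that was pushed. The types and names (`timer_entry`, `timer_cb`, `run_timer_guarded`, `skipped_calls`) are hypothetical stand-ins rather than the Kamailio timer API, and `fprintf` stands in for the real logging. Like the posted patch, it returns 0 when the handler is missing.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the real timer types (illustration only). */
typedef unsigned int ticks_t;
struct timer_entry;
typedef ticks_t (*timer_cb)(ticks_t now, struct timer_entry *tl, void *data);

struct timer_entry {
    timer_cb f;   /* handler; may have been reset to NULL by its owner */
    void *data;   /* opaque argument passed back to the handler */
};

/* Counts how many times the guard skipped a call (demo bookkeeping). */
static int skipped_calls = 0;

/* Run the handler only if both the entry and its handler are set.
 * Otherwise log a warning and return 0, as the posted patch does. */
static ticks_t run_timer_guarded(ticks_t now, struct timer_entry *tl)
{
    if (tl == NULL || tl->f == NULL) {
        fprintf(stderr, "WARNING: timer entry %p has no handler, skipping\n",
                (void *)tl);
        skipped_calls++;
        return 0;
    }
    return tl->f(now, tl, tl->data);
}

/* Trivial handler used to exercise the guard: returns now + 1. */
static ticks_t sample_handler(ticks_t now, struct timer_entry *tl, void *data)
{
    (void)tl;
    (void)data;
    return now + 1;
}
```

The guard only hides the symptom. It does not explain how the entry lost its handler in the first place.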
@miconda yes, latest 5.1 / 5.2 under heavy load. thanks
Pushed 574b080d69b2b968cfe871bc7cfe8fdf930fbc2e .
Have you noticed any blocked or slow message processing before that happens?
No warnings, just a crash. What are the odds of `tl` being `null`? We had some crashes where `fl` was `null`. Maybe something is grabbing or invalidating the timer between `UNLOCK_SLOW_TIMER_LIST` and the next instruction? Is that even possible?
Did you mean `f` or `tl` in the **"we had some crashes where fl was null"**?
`tl` itself should probably not be null, because its value (the pointer) is copied into the timer process list. Its content, however, depends on each module that adds items to the timer list. So `tl` in the timer list should be non-null, but it can end up pointing to an address that no longer holds a `timer_ln` struct.
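A minimal sketch of that situation, assuming the owner of the struct resets it while the list still holds the pointer. All names here (`demo_timer`, `demo_handler`, `queued_entry_is_stale`) are hypothetical, and the layout is not the real `timer_ln`. The pointer kept by the list stays non-null, but the handler behind it is gone.

```c
#include <stddef.h>

/* Hypothetical, simplified timer entry (not the real timer_ln layout). */
typedef unsigned int ticks_t;
struct demo_timer;
typedef ticks_t (*demo_handler_f)(ticks_t now, struct demo_timer *tl, void *data);

struct demo_timer {
    struct demo_timer *next;
    demo_handler_f f;
    void *data;
};

/* Placeholder handler so the entry starts out valid. */
static ticks_t demo_handler(ticks_t now, struct demo_timer *tl, void *data)
{
    (void)now;
    (void)tl;
    (void)data;
    return 0;
}

/* Returns 1 when the list still holds a non-null pointer whose
 * handler has been wiped by the owning code. */
static int queued_entry_is_stale(void)
{
    struct demo_timer owner_timer = { NULL, demo_handler, NULL };

    /* The timer list keeps only a copy of the pointer. */
    struct demo_timer *queued = &owner_timer;

    /* The owning module re-initialises its struct while it is still queued. */
    owner_timer = (struct demo_timer){ 0 };

    return queued != NULL && queued->f == NULL;
}
```

Here the struct is only reset, so reading `queued->f` is still well defined. If the memory had been freed instead, even a check on `f` would not be safe.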
sorry, meant to write `tl`
OK.
Closing this one, with the patch applied -- if you get new data, reopen or create a new issue.
Closed #2120.