Hi Carlos, Carsten,
From a bit of code inspection, it looks like this affects the error paths for the Diameter responses.
I've seen these errors printed by both the s-cscf and the i-cscf when there were Diameter timeouts (although they didn't cause a crash every time).

Dec 15 12:13:23 kamailio kam-scscf[22542]: ERROR: <script>: We need to do an UNREG server SAR assignemnt
Dec 15 12:13:23 kamailio kam-scscf[22542]: INFO: ims_registrar_scscf [cxdx_sar.c:79]: create_return_code(): created AVP successfully : [saa_return_code] - [-2]
Dec 15 12:13:23 kamailio kam-scscf[22553]: INFO: ims_registrar_scscf [cxdx_avp.c:138]: cxdx_get_avp(): cxdx_get_experimental_result_code: Failed finding avp
Dec 15 12:13:23 kamailio kam-scscf[22553]: INFO: ims_registrar_scscf [cxdx_avp.c:138]: cxdx_get_avp(): cxdx_get_charging_info: Failed finding avp
Dec 15 12:13:23 kamailio kam-scscf[22553]: ERROR: <script>: Unknown return code from SAR, value is [<null>]
...
Dec 16 17:53:51 kamailio kam-icscf[23653]: INFO: ims_icscf [cxdx_uar.c:71]: create_uaa_return_code(): created AVP successfully : [uaa_return_code]
Dec 16 17:53:57 kamailio kam-icscf[23666]: ERROR: ims_icscf [cxdx_uar.c:107]: async_cdp_uar_callback(): Error timeout when  sending message via CDP
Dec 16 17:53:57 kamailio kam-icscf[23666]: ERROR: <script>: Unknown return code from UAR, value is [<null>]

I think there are two issues:
1) The return_code AVP does not work, causing a NULL value or a crash. I experimented by restoring the AVP lists from the suspended transaction in the 'error:' section, and this seems to work (attached patch): I can now see the "-2" return code that was set up before the suspend. I'll leave it to you or others to decide whether the error handling in this function is done properly and whether my patch is useful. With the patch, the same scenario now logs:

Dec 17 16:41:07 kamailio kam-scscf[25089]: ERROR: <script>: We need to do an UNREG server SAR assignemnt
Dec 17 16:41:07 kamailio kam-scscf[25089]: INFO: ims_registrar_scscf [cxdx_sar.c:79]: create_return_code(): created AVP successfully : [saa_return_code] - [-2]
Dec 17 16:41:07 kamailio kam-scscf[25099]: INFO: ims_registrar_scscf [cxdx_avp.c:138]: cxdx_get_avp(): cxdx_get_experimental_result_code: Failed finding avp
Dec 17 16:41:07 kamailio kam-scscf[25099]: INFO: ims_registrar_scscf [cxdx_avp.c:138]: cxdx_get_avp(): cxdx_get_charging_info: Failed finding avp
Dec 17 16:41:07 kamailio kam-scscf[25099]: ERROR: <script>: SAR error - error response sent from module

2) In these error cases, no reply is sent for the original transaction, which leaves calls and other requests hanging. Perhaps the example cfgs could be updated with default replies in the appropriate places.

Let me know if there are patches you want me to try.

Hugh

On 15/12/2013 21:17, Hugh Waite wrote:
Hello,
I am seeing a crash in the latest IMS modules when using the example cfg scripts. It also happens in 4.1.

1) The s-cscf receives a request from an application server and runs 'assign_server_unreg' (cfg line 368) because the intended destination is not registered.
2) The HSS returns an error '5012: Unable to comply' and the suspended transaction is resumed into the UNREG_SAR_REPLY route (cxdx_sar.c:290)
3) The coredump shows that the AVP lists are nonsensical (see the avp pointer in frame #0 of the backtrace below), so evaluating $avp(s:saa_return_code) in that route causes a crash.

Do the avp lists need to be re-initialised from the suspended transaction, like in the 'success/done' section (cxdx_sar.c:252)?
Maybe someone who is more familiar with this code can shed some light on this?

Also, in this scenario I can't see a code path that sends a response back to the application server (e.g. '480 Temporarily Unavailable'). Should this be done in the cfg before calling assign_server_unreg?

Regards,
Hugh

Backtrace:
(gdb) bt
#0  0x000000000053dc89 in match_by_name (avp=0x303630363a6d6f63, id=116, name=0x7ffff29895f8) at usr_avp.c:391
#1  0x000000000053e411 in search_next_avp (s=0x7ffff29895f0, val=0x7ffff2989630) at usr_avp.c:507
#2  0x000000000053e120 in search_avp (ident=..., val=0x7ffff2989630, state=0x7ffff29895f0) at usr_avp.c:475
#3  0x000000000053de09 in search_first_avp (flags=1, name=..., val=0x7ffff2989630, s=0x7ffff29895f0) at usr_avp.c:427
#4  0x00007fa8de2f5626 in pv_get_avp (msg=0x7ffff298a030, param=0x7fa8de86b898, res=0x7ffff2989760) at pv_core.c:1475
#5  0x0000000000499270 in pv_get_spec_value (msg=0x7ffff298a030, sp=0x7fa8de86b880, value=0x7ffff2989760) at pvapi.c:1266
#6  0x00000000004c5f03 in rval_get_int (h=0x7ffff2989ef0, msg=0x7ffff298a030, i=0x7ffff2989d58, rv=0x7fa8de86b878, cache=0x0) at rvalue.c:978
#7  0x00000000004c89f5 in rval_expr_eval_int (h=0x7ffff2989ef0, msg=0x7ffff298a030, res=0x7ffff2989d58, rve=0x7fa8de86b870) at rvalue.c:1918
#8  0x0000000000420648 in do_action (h=0x7ffff2989ef0, a=0x7fa8de86eaa8, msg=0x7ffff298a030) at action.c:1219
#9  0x0000000000422878 in run_actions (h=0x7ffff2989ef0, a=0x7fa8de86aa30, msg=0x7ffff298a030) at action.c:1599
#10 0x0000000000423017 in run_top_route (a=0x7fa8de86aa30, msg=0x7ffff298a030, c=0x0) at action.c:1685
#11 0x00007fa8de59eae3 in t_continue (hash_index=15710, label=170389234, route=0x7fa8de86aa30) at t_suspend.c:245
#12 0x00007fa8da1ebc98 in async_cdp_callback (is_timeout=0, param=0x7fa8d5c68f40, saa=0x0, elapsed_msecs=1) at cxdx_sar.c:290
#13 0x00007fa8db23cacb in api_callback (p=0x7fa8d5c24d40, msg=0x7fa8d5c5aca8, ptr=0x0) at api_process.c:115
#14 0x00007fa8db27ad87 in worker_process (id=2) at worker.c:330
#15 0x00007fa8db257aea in diameter_peer_start (blocking=0) at diameter_peer.c:309
#16 0x00007fa8db25a02b in cdp_child_init (rank=0) at mod.c:237
#17 0x00000000004f7ec2 in init_mod_child (m=0x7fa8de841158, rank=0) at sr_module.c:924
#18 0x00000000004f7d65 in init_mod_child (m=0x7fa8de841d00, rank=0) at sr_module.c:921
#19 0x00000000004f7d65 in init_mod_child (m=0x7fa8de8420a8, rank=0) at sr_module.c:921
#20 0x00000000004f7d65 in init_mod_child (m=0x7fa8de842458, rank=0) at sr_module.c:921
#21 0x00000000004f7d65 in init_mod_child (m=0x7fa8de842ae8, rank=0) at sr_module.c:921
#22 0x00000000004f7d65 in init_mod_child (m=0x7fa8de842f60, rank=0) at sr_module.c:921
#23 0x00000000004f8048 in init_child (rank=0) at sr_module.c:948
#24 0x000000000046d57c in main_loop () at main.c:1694
#25 0x000000000047030b in main (argc=13, argv=0x7ffff298af78) at main.c:2533


-- 
Hugh Waite
Principal Design Engineer
Crocodile RCS Ltd.


_______________________________________________
sr-dev mailing list
sr-dev@lists.sip-router.org
http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev

