Hello all, context first: we have a REST API that performs queries to external devices in the network (Diameter to DRAs, REST to various servers) and, based on a number of conditions, returns the content for a Contact header to be used in a SIP 302.
We are currently consuming this API with http_client (synchronously), and since there is no way to speed the API up (pipelined executions, delays on external APIs, and so on), we are hitting a limit where all children end up busy waiting for the API to answer.
So I decided to move to http_async_client and started working in the lab with this first, basic concept to test:
request_route {
    # for testing purposes only
    if (is_method("ACK")) {
        exit;
    }

    $http_req(all) = $null;
    $http_req(suspend) = 1;
    $http_req(timeout) = 500;
    $http_req(method) = "POST";
    $http_req(hdr) = "Content-Type: application/json";

    jansson_set("string", "event", "sip-routing", "$var(cre_query)");
    xlog("L_INFO", "API ASYNC ROUTING REQUEST: $var(cre_query)\n");
    $http_req(body) = $var(cre_query);

    t_newtran();
    http_async_query("http://192.168.86.128:8000/", "CRE_RESPONSE");
}
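For completeness, the module setup this snippet relies on would look roughly like the following; the loading order and the worker count are illustrative assumptions, not taken from the actual configuration:

```
# illustrative module setup for the snippet above (values are assumptions)
loadmodule "tm.so"
loadmodule "jansson.so"
loadmodule "http_async_client.so"

# number of dedicated worker processes handling the async HTTP queries
modparam("http_async_client", "workers", 2)
```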
http://192.168.86.128:8000/ receives the POST, introduces a random delay between 0.5 and 1 second, and responds (simulating the real API, with an exaggerated delay, to prove the concept).
Then
route[CRE_RESPONSE] {
    if ($http_ok && $http_rs == 200) {
        xlog("L_INFO", "CRE RESPONSE: $http_rb\n");
        # for testing purposes; the Contact content will later be taken
        # from the received API response
        append_to_reply("Contact: sip:1234@google.com\r\n");
        send_reply(302, "Moved Temporarily");
        exit;
    }
    send_reply(500, "Internal error");
    exit;
}
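In the real setup, the Contact would presumably be extracted from the JSON body with jansson; a hedged sketch follows, where the field name "contact" and the reply format are assumptions about the API, not taken from it:

```
route[CRE_RESPONSE] {
    if ($http_ok && $http_rs == 200) {
        # "contact" is a hypothetical field name in the API's JSON reply
        jansson_get("contact", $http_rb, "$var(contact)");
        if ($var(contact) != $null) {
            append_to_reply("Contact: <$var(contact)>\r\n");
            send_reply(302, "Moved Temporarily");
            exit;
        }
    }
    send_reply(500, "Internal error");
    exit;
}
```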
The INVITE is received and processed, the API is called, and after the API response the 302 is sent, followed by an ACK (ignored for now).
The problem is that the 302 keeps being retransmitted:
37 1519.846253067 192.168.86.34 → 192.168.86.128 SIP/SDP 585 Request: INVITE sip:service@192.168.86.128:5060
38 1519.848100380 192.168.86.128 → 192.168.86.34 SIP 318 Status: 100 Trying
39 1520.094997642 192.168.86.128 → 192.168.86.34 SIP 407 Status: 302 Moved Temporarily
40 1520.102323728 192.168.86.34 → 192.168.86.128 SIP 453 Request: ACK sip:service@192.168.86.128:5060
41 1520.591300933 192.168.86.128 → 192.168.86.34 SIP 407 Status: 302 Moved Temporarily
42 1521.591061065 192.168.86.128 → 192.168.86.34 SIP 407 Status: 302 Moved Temporarily
43 1523.591227956 192.168.86.128 → 192.168.86.34 SIP 407 Status: 302 Moved Temporarily
18(24) DEBUG: tm [t_reply.c:1703]: t_retransmit_reply(): reply retransmitted. buf=0x7f6d79745dc0: SIP/2.0 3..., shmem=0x7f6d75187fd8: SIP/2.0 3
18(24) DEBUG: tm [t_reply.c:1703]: t_retransmit_reply(): reply retransmitted. buf=0x7f6d79745dc0: SIP/2.0 3..., shmem=0x7f6d75187fd8: SIP/2.0 3
18(24) DEBUG: tm [t_reply.c:1703]: t_retransmit_reply(): reply retransmitted. buf=0x7f6d79745dc0: SIP/2.0 3..., shmem=0x7f6d75187fd8: SIP/2.0 3
18(24) DEBUG: tm [timer.c:634]: wait_handler(): finished transaction: 0x7f6d75184cc8 (p:0x7f6d74f600c8/n:0x7f6d74f600c8)
18(24) DEBUG: tm [h_table.c:132]: free_cell_helper(): freeing transaction 0x7f6d75184cc8 from timer.c:643
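Presumably the if (is_method("ACK")) { exit; } at the top drops the hop-by-hop ACK before tm can match it against the INVITE transaction, so tm keeps retransmitting the 302 until its timers expire. A sketch of letting tm absorb the ACK instead, based on the tm module's documented t_check_trans() behavior and untested against this exact setup:

```
    if (is_method("ACK")) {
        # t_check_trans() matches the ACK against the INVITE transaction
        # that received our negative (302) final reply and absorbs it,
        # which stops the 302 retransmissions
        t_check_trans();
        exit;
    }
```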
Any help to avoid the retransmissions and make the transaction finish right after the 302 would be appreciated.
regards
Not overtly related to your most immediate question, but:
1) "we're hitting a limit where all children become busy waiting for the API to answer."
2) "So i decided to move to http_async_client"
I'm not sure this is really going to solve your problem. You've hit fundamental, thermodynamic kind of limits here. Using async here just squeezes the balloon in one place and causes it to inflate in another.
It would be different if some of your workload were HTTP-dependent and other parts of the workload were not. Doing the HTTP queries asynchronously would free up the core SIP workers to process other kinds of requests. That doesn't sound like it's the case, so all you're really liberating is reply processing, which, if this is a redirect server, is nonexistent anyway.
This matter might occasion some deeper reflection.
-- Alex
On Aug 24, 2024, at 6:48 PM, alexis via sr-users <sr-users@lists.kamailio.org> wrote:
It is. I've been dealing with this issue for a few weeks; I can push and add more CPUs and more children and play with queues, timeouts, and so on, but I know, I'm pretty sure, there's a limit I can't surpass.
At this point I'm struggling with how to modify the HTTP part (the API server). It would be great (and easy for me) if the execution pipelines could at least run some parts in parallel, but ... it's a pipeline: why execute subsequent steps if they're not needed, versus 'execute and discard'? (And worse, some of the web services consumed in the pipeline are charged per transaction.)
I'm happy, more than that: a Docker swarm of 4 nodes with 10 CPUs/children each is handling ~1300 call attempts (INVITE, 100, 302, ACK) per second. That's more than OK (and CPU/memory usage is low, really low; the problem is the wait, not the power), but we need more (I'll move to more cores/children and appeal to brute force for now to buy some time).
I still don't see the point of starting to use TM for a 100% stateless flow.
On 24 Aug 2024, at 9:02 PM, Alex Balashov via sr-users <sr-users@lists.kamailio.org> wrote:
A few hundred CPS per node is impressive, but especially so when anything with HTTP is involved! Kamailio has HTTP client modules, but there's no pretending that being an HTTP client is natural or particularly performant for Kamailio.
Since most--not all, but most--of the workload is I/O wait, you could get away with just adding more children. The general principle that children should not exceed CPUs, because they'd just fight for CPU, applies when the workload is CPU-bound, or computational in nature. However, if most of the time is spent waiting on something to come across a socket, e.g. from a database or an HTTP server, you can overbook quite a bit relative to the CPUs.
You still don't want too many child processes, and I would be hard-pressed to say exactly how many is too many, but 2:1 or 3:1 should be safe.
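In kamailio.cfg terms, the ratio described above maps onto the global children parameter; the numbers below are illustrative, assuming an 8-core host:

```
# overbook UDP workers ~2:1 relative to cores, since the workload is
# dominated by I/O wait rather than CPU (value illustrative)
children=16
```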
-- Alex
On Aug 24, 2024, at 10:35 PM, Alexis Fidalgo <alzrck@gmail.com> wrote:
Hello Alex,
are you saying that using Kamailio with HTTP is not performant at all? This has not been my experience so far. I think many people are using it for large infrastructures. Your remarks regarding the latency and addressing the bottlenecks first are of course valid.
Regarding children-to-CPU-core ratios, the default configuration already gives you a 2:1 ratio (8 children for a standard 4-core server). As frequently suggested, the OP should probably investigate the UDP receiver queue (netstat/ss, etc.) and increase the number of children if a significant and stable queue is building up.
Cheers,
Henning
I'm pretty confident it is; the issue is how fast the HTTP server responds (yes, Kamailio incurs a little overhead to build the query, send the request, parse the response, etc., but in my opinion that's negligible).
Following your assessment and Alex's, my path is to increase the children in steps, watch where requests start to queue, and then back off a little.
And, more important but also more complicated: improve the HTTP service itself so it responds faster.
Thanks for the heads up, it was helpful.
Regards
Sent from a mobile device
On Aug 25, 2024, at 4:38 AM, Henning Westerholt via sr-users <sr-users@lists.kamailio.org> wrote:
Hi Henning,
Well, "performant" is quite relative. I would say that asynchronous HTTP queries are not a very natural fit for Kamailio; they're rather bureaucratic, and require diverting around the most optimised code path--routing SIP messages.
I don't doubt that on modern hardware, the results are quite impressive regardless. That is clearly so.
But no, it's not the approach I'd choose to get the most throughput possible, if that's the objective.
-- Alex
— Sent from mobile, apologies for brevity and errors.
On Aug 25, 2024, at 3:11 AM, Henning Westerholt <hw@gilawa.com> wrote:
Hello Alex,
are you saying that using Kamailio with HTTP is not performant at all? This has not been my experience so far. I think many people are using it for large infrastructures. Your remarks regarding the latency and addressing the bottlenecks first are of course valid.
Regarding children to CPU core ratios, the default configuration is already giving you a 2:1 ratio (8 children for a standard 4 core server). As frequently suggested, the OP should probably investigate the UDP receiver queue (netstat/ss etc..) and increase the number of children if there is a significant and stable queue building up.
Cheers,
Henning
-- Henning Westerholt – https://skalatan.de/blog/ Kamailio services – https://gilawa.com
-----Original Message----- From: Alex Balashov via sr-users sr-users@lists.kamailio.org Sent: Sonntag, 25. August 2024 06:18 To: Kamailio (SER) - Users Mailing List sr-users@lists.kamailio.org Cc: Alex Balashov abalashov@evaristesys.com Subject: [SR-Users] Re: http_async and tm
A few hundred CPS per node is impressive, but especially so when anything with HTTP is involved! Kamailio has HTTP client modules, but there's no pretending that being an HTTP client is natural or particularly performant for Kamailio.
Since most--not all, but most--of the workload is I/O wait, you could get away with just adding more children. The general principle that children should not exceed CPUs, because they'd just fight for CPU, applies when the workload is CPU-bound, or computational in nature. However, if most of the time is spent waiting on something to come across a socket, e.g. from a database or an HTTP server, you can overbook quite a bit relative to the CPUs.
You still don't want too many child processes, and I would be hard-pressed to say exactly how many is too many, but 2:1 or 3:1 should be safe.
-- Alex
On Aug 24, 2024, at 10:35 PM, Alexis Fidalgo alzrck@gmail.com wrote:
it is. im dealing with this issue since a few weeks; i can push and add more cpus and more children and play with queues, timeouts, etc etc. but i know, im pretty sure, that there's a limit i can't surpass.
at this point im struggling on how to modify the http part (the api server). it would be great (and easy for me) if the execution pipelines could at least have some parts that execute in parallel, but … it's a pipeline: why execute the following steps if not needed, vs 'execute and discard' (and worse, some consumed webservices in the pipeline are transactionally charged).
im happy, more than that: a docker swarm with 4 nodes with 10 cpus/children each is handling ~1300 call attempts (invite, 100, 302, ack) per second. thats more than ok (and cpu/mem are low, really low; the problem is the wait, not the power), but we need more (ill move to more cores/children and appeal to brute force for now to gain some time)
still seeing there’s no point to start using TM for a 100% stateless flow
On 24 Aug 2024, at 9:02 PM, Alex Balashov via sr-users <sr-users@lists.kamailio.org> wrote:
Not overtly related to your most immediate question, but:
- "we're hitting a limit where all children become busy waiting for the API to answer."
- "So i decided to move to http_async_client"
I'm not sure this is really going to solve your problem. You've hit fundamental, thermodynamic kind of limits here. Using async here just squeezes the balloon in one place and causes it to inflate in another.
It would be different if some of your workload were HTTP-dependent and other parts of the workload were not. Doing the HTTP queries would free up the core SIP workers to process other kinds of requests. That doesn't sound like it's the case, so all you're really liberating is reply processing, which, if this is a redirect server, is nonexistent anyway.
This matter might occasion some deeper reflection.
-- Alex
On Aug 24, 2024, at 6:48 PM, alexis via sr-users <sr-users@lists.kamailio.org> wrote:
Hello all, context first: we have a REST API that performs queries to external devices in the network (Diameter to DRAs, REST to different servers) and, based on n conditions, returns the content for a Contact header to be used in a SIP 302.
Now we're consuming this API with http_client (synchronously), and as there's no way to speed up the API (pipeline executions, delays on external APIs, etc.), we're hitting a limit where all children become busy waiting for the API to answer.
So I decided to move to http_async_client and started working on it in the lab, with this first, basic concept to test.
request_route {
    # for testing purposes only
    if (is_method("ACK")) {
        exit;
    }

    $http_req(all) = $null;
    $http_req(suspend) = 1;
    $http_req(timeout) = 500;
    $http_req(method) = "POST";
    $http_req(hdr) = "Content-Type: application/json";

    jansson_set("string", "event", "sip-routing", "$var(cre_query)");
    xlog("L_INFO", "API ASYNC ROUTING REQUEST: $var(cre_query)\n");
    $http_req(body) = $var(cre_query);

    t_newtran();
    http_async_query("http://192.168.86.128:8000/", "CRE_RESPONSE");
}
http://192.168.86.128:8000/ receives the POST, randomly introduces a delay between 0.5 and 1 second, and responds (simulating the real API, with an exaggerated delay, to prove the concept).
Then
route[CRE_RESPONSE] {
    if ($http_ok && $http_rs == 200) {
        xlog("L_INFO", "CRE RESPONSE: $http_rb\n");
        # for testing purposes, the Contact content will be replaced
        # with the received API response
        append_to_reply("Contact: sip:1234@google.com\r\n");
        send_reply(302, "Moved Temporarily");
        exit;
    }
    send_reply(500, "Internal error");
    exit;
}
The INVITE is received and processed, the API is called, and after the API response the 302 is replied, followed by an ACK (ignored for now).
The situation is that the 302 is retransmitted:
37 1519.846253067 192.168.86.34 → 192.168.86.128 SIP/SDP 585 Request: INVITE sip:service@192.168.86.128:5060
38 1519.848100380 192.168.86.128 → 192.168.86.34 SIP 318 Status: 100 Trying
39 1520.094997642 192.168.86.128 → 192.168.86.34 SIP 407 Status: 302 Moved Temporarily
40 1520.102323728 192.168.86.34 → 192.168.86.128 SIP 453 Request: ACK sip:service@192.168.86.128:5060
41 1520.591300933 192.168.86.128 → 192.168.86.34 SIP 407 Status: 302 Moved Temporarily
42 1521.591061065 192.168.86.128 → 192.168.86.34 SIP 407 Status: 302 Moved Temporarily
43 1523.591227956 192.168.86.128 → 192.168.86.34 SIP 407 Status: 302 Moved Temporarily
18(24) DEBUG: tm [t_reply.c:1703]: t_retransmit_reply(): reply retransmitted. buf=0x7f6d79745dc0: SIP/2.0 3..., shmem=0x7f6d75187fd8: SIP/2.0 3
18(24) DEBUG: tm [t_reply.c:1703]: t_retransmit_reply(): reply retransmitted. buf=0x7f6d79745dc0: SIP/2.0 3..., shmem=0x7f6d75187fd8: SIP/2.0 3
18(24) DEBUG: tm [t_reply.c:1703]: t_retransmit_reply(): reply retransmitted. buf=0x7f6d79745dc0: SIP/2.0 3..., shmem=0x7f6d75187fd8: SIP/2.0 3
18(24) DEBUG: tm [timer.c:634]: wait_handler(): finished transaction: 0x7f6d75184cc8 (p:0x7f6d74f600c8/n:0x7f6d74f600c8)
18(24) DEBUG: tm [h_table.c:132]: free_cell_helper(): freeing transaction 0x7f6d75184cc8 from timer.c:643
Any help to avoid the retransmission and make the transaction finish right after the 302 will be appreciated.
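For what it's worth, a sketch of the standard tm pattern for this (based on the documented t_check_trans() behaviour, not something proposed in this thread): let tm see the ACK instead of discarding it at the top of request_route. An ACK that matches the transaction created by t_newtran() is absorbed by tm, which stops the 302 retransmission timer.

```cfg
request_route {
    if (is_method("ACK")) {
        # Let tm match the ACK against the INVITE transaction created
        # by t_newtran(). An ACK to a locally sent negative/3xx final
        # reply is absorbed here and stops the reply retransmissions;
        # a plain exit would drop the ACK before tm ever sees it.
        t_check_trans();
        exit;
    }

    # ... rest of the routing logic (async HTTP query etc.) unchanged
}
```

The trace above is consistent with this: the ACK at packet 40 arrives but, because the script exits on ACK before any tm function runs, the transaction never sees it and keeps retransmitting until the wait timer fires.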
regards
Kamailio - Users Mailing List - Non Commercial Discussions
To unsubscribe send an email to sr-users-leave@lists.kamailio.org
Important: keep the mailing list in the recipients, do not reply only to the sender!
Edit mailing list options or unsubscribe:
--
Alex Balashov
Principal Consultant
Evariste Systems LLC
Web: https://evaristesys.com
Tel: +1-706-510-6800
Hello Alex,
I agree. If you can avoid e.g. using some cloud-based API server architecture that requires the extensive use of synchronous or asynchronous HTTP requests in Kamailio, this will be of course easier and probably also more performant.
But sometimes you don't have a choice, e.g. if the customer prefers to also adapt the VoIP infrastructure to a (in their world) more modern architecture.
Cheers,
Henning
-----Original Message-----
From: Alex Balashov via sr-users <sr-users@lists.kamailio.org>
Sent: Sunday, 25 August 2024 16:18
To: sr-users@lists.kamailio.org
Cc: Alex Balashov <abalashov@evaristesys.com>
Subject: [SR-Users] Re: http_async and tm
Hi Henning,
Well, "performant" is quite relative. I would say that asynchronous HTTP queries are not a very natural fit for Kamailio; they're rather bureaucratic, and require diverting around the most optimised code path--routing SIP messages.
I don't doubt that on modern hardware, the results are quite impressive regardless. That is clearly so.
But no, it's not the approach I'd choose to get the most throughput possible, if that's the objective.
-- Alex
— Sent from mobile, apologies for brevity and errors.
On Aug 26, 2024, at 5:13 AM, Henning Westerholt hw@gilawa.com wrote:
Hello Alex,
I agree. If you can avoid e.g. using some cloud-based API server architecture that requires the extensive use of synchronous or asynchronous HTTP requests in Kamailio, this will be of course easier and probably also more performant.
But sometimes you don't have the choice, e.g. if the customer prefer to also adapt the VoIP infrastructure to a (in his world) more modern architecture.
Yeah, I'd agree with that.
I meant exactly that: HTTP is not the most performant option. I don't mean that it's not viable. In fact, it seems to be very popular, for the reasons you point out.
-- Alex
Good to hear, I’ll start increasing the children to see what happens. It’s a good starting point.
On the other hand, we have different Kamailio implementations where no HTTP or any kind of ‘external integration’ is involved and we get a lot more calls; it’s very clear where the problem is :)
Thanks for your help (again)
Sent from a mobile device
On 25 Aug 2024, at 1:47 AM, Alex Balashov via sr-users <sr-users@lists.kamailio.org> wrote:
Does the analogy apply here? Assuming a steady traffic rate and that the http request takes a consistent amount of time, all that's added is PDD as long as latency doesn't increase in the http request with load.
With a blocking HTTP request, the number of requests that Kamailio can handle is limited by the number of children. If it's not blocking, the limit becomes memory-bound, but if the request rate is static, then the memory limit is also static.
Ben Kaufman
Senior Voice Engineer
E: bkaufman@bcmone.com
SIP.US Client Support: 800.566.9810 | SIPTRUNK Client Support: 800.250.6510 | Flowroute Client Support: 855.356.9768
________________________________
From: Alex Balashov via sr-users <sr-users@lists.kamailio.org>
Sent: Saturday, August 24, 2024 7:02 PM
To: Kamailio (SER) - Users Mailing List <sr-users@lists.kamailio.org>
Cc: Alex Balashov <abalashov@evaristesys.com>
Subject: [SR-Users] Re: http_async and tm
On Aug 26, 2024, at 4:30 PM, Ben Kaufman bkaufman@bcmone.com wrote:
Does the analogy apply here? Assuming a steady traffic rate and that the http request takes a consistent amount of time, all that's added is PDD as long as latency doesn't increase in the http request with load.
With a blocking HTTP request, then the number of requests that Kamailio can handle becomes limited by the number of children. If it's not blocking, then the limit becomes memory bound, but if the request rate is static, then the memory limit is also static.
Yes, but what sense of "handle" are you appealing to here? If all requests are HTTP-bound, and all requests take an async code path, then why bother with the async? What are you gaining?
There would only be a benefit if the primary children were freed up to do something else.
-- Alex
Hi Ben,
yes, for a stable load without fluctuations you will have a similar throughput with synchronous and asynchronous after a short time.
The asynchronous HTTP client only helps you if you have other traffic that can be handled without the need for HTTP API calls, and/or if you have traffic fluctuations, so you can prevent blocking by basically buffering requests in memory.
Cheers,
Henning
From: Ben Kaufman via sr-users <sr-users@lists.kamailio.org>
Sent: Monday, 26 August 2024 22:30
To: Kamailio (SER) - Users Mailing List <sr-users@lists.kamailio.org>
Cc: Ben Kaufman <bkaufman@bcmone.com>
Subject: [SR-Users] Re: http_async and tm
Does the analogy apply here? Assuming a steady traffic rate and that the http request takes a consistent amount of time, all that's added is PDD as long as latency doesn't increase in the http request with load.
With a blocking HTTP request, then the number of requests that Kamailio can handle becomes limited by the number of children. If it's not blocking, then the limit becomes memory bound, but if the request rate is static, then the memory limit is also static.
Kaufman Senior Voice Engineer
E: bkaufman@bcmone.commailto:bkaufman@bcmone.com
SIP.US Client Support: 800.566.9810 | SIPTRUNK Client Support: 800.250.6510 | Flowroute Client Support: 855.356.9768 [img]https://www.sip.us/ [img]https://www.siptrunk.com/ [img]https://www.flowroute.com/
From: Alex Balashov via sr-users <sr-users@lists.kamailio.org> Sent: Saturday, August 24, 2024 7:02 PM To: Kamailio (SER) - Users Mailing List <sr-users@lists.kamailio.org> Cc: Alex Balashov <abalashov@evaristesys.com> Subject: [SR-Users] Re: http_async and tm
Not overtly related to your most immediate question, but:
1) "we're hitting a limit where all children become busy waiting for the API to answer."
2) "So i decided to move to http_async_client"
I'm not sure this is really going to solve your problem. You've hit a fundamental, thermodynamic kind of limit here. Using async here just squeezes the balloon in one place and causes it to inflate in another.
It would be different if some of your workload were HTTP-dependent and other parts of the workload were not. Doing the HTTP queries asynchronously would free up the core SIP workers to process other kinds of requests. That doesn't sound like it's the case, so all you're really liberating is reply processing, which, if this is a redirect server, is nonexistent anyway.
This matter might occasion some deeper reflection.
-- Alex
On Aug 24, 2024, at 6:48 PM, alexis via sr-users <sr-users@lists.kamailio.org> wrote:
18(24) DEBUG: tm [t_reply.c:1703]: t_retransmit_reply(): reply retransmitted. buf=0x7f6d79745dc0: SIP/2.0 3..., shmem=0x7f6d75187fd8: SIP/2.0 3
18(24) DEBUG: tm [t_reply.c:1703]: t_retransmit_reply(): reply retransmitted. buf=0x7f6d79745dc0: SIP/2.0 3..., shmem=0x7f6d75187fd8: SIP/2.0 3
18(24) DEBUG: tm [t_reply.c:1703]: t_retransmit_reply(): reply retransmitted. buf=0x7f6d79745dc0: SIP/2.0 3..., shmem=0x7f6d75187fd8: SIP/2.0 3
18(24) DEBUG: tm [timer.c:634]: wait_handler(): finished transaction: 0x7f6d75184cc8 (p:0x7f6d74f600c8/n:0x7f6d74f600c8)
18(24) DEBUG: tm [h_table.c:132]: free_cell_helper(): freeing transaction 0x7f6d75184cc8 from timer.c:643
Any help to avoid the retransmission and make the transaction just finish right after the 302 will be appreciated.
regards
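On the retransmission question above: a 302 sent over UDP is retransmitted by tm until the ACK is matched to the transaction, so dropping the ACK with a bare exit before transaction matching runs keeps the retransmission timer firing. A hedged sketch of the usual pattern, as found in the default configuration:

```
# Let tm see the ACK: t_check_trans() absorbs an ACK for a locally
# generated negative/3xx reply; a matched end-to-end ACK is relayed.
if (is_method("ACK")) {
    if (t_check_trans()) {
        t_relay();
    }
    exit;
}
```

With this in place of the plain `if(is_method("ACK")){ exit; }`, the transaction should complete on the first ACK instead of retransmitting the 302 until the timer expires.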
Kamailio - Users Mailing List - Non Commercial Discussions. To unsubscribe send an email to sr-users-leave@lists.kamailio.org. Important: keep the mailing list in the recipients, do not reply only to the sender!
-- Alex Balashov Principal Consultant Evariste Systems LLC Web: https://evaristesys.com/ Tel: +1-706-510-6800
On Aug 27, 2024, at 4:17 AM, Henning Westerholt via sr-users sr-users@lists.kamailio.org wrote:
The asynchronous HTTP client only helps you if you have other traffic that can be handled without the need for HTTP API calls, and/or if you have traffic fluctuations, so you can prevent blocking by basically buffering requests in memory.
Indeed. It's also worth reiterating that the meaning of "asynchronous" is somewhat environmentally and implementationally specific.
As the term has entered general use with the popularity of single-threaded / single event loop multiplexing systems, such as Node and JavaScript, it has come to refer to a programming and processing pattern in which the waiting and detection of I/O is delegated to the OS kernel network stack. The OS takes care of this juggling and calls event hooks or callbacks in your program when there is I/O to consume, or sets some flag or condition to indicate this so that you can read the I/O from some OS buffer at your convenience. In this way, your program is able to proceed executing other kinds of things while the OS is taking care of waiting on I/O. Provided that the workload consists of waiting on I/O and also other things, this is to the general benefit of "other things", not the I/O.
In Kamailio, asynchronous processing just means liberating the transaction from the main worker processes, which are part of a relatively small fixed-size pool, by suspending it and shipping it to another set of ancillary worker processes, also part of a relatively small, fixed-size pool. Within those ancillary worker processes, the execution is as linear, synchronous and blocking as it would be in the main worker processes. This does not cause the processing to enter some generally more asynchronous mode in any other respect, and in that sense, is quite different to what most people have in mind when they think of asynchronous processing in the context of general-purpose programming runtimes.
The only real footnote to this is about situations in which the resumption of the transaction in the async workers is mediated by external events, e.g. a POST-back into Kamailio's `xhttp` server. While this does not change the nature of the subsequent synchronous execution of the route logic, it does mean that neither a core SIP worker nor an async worker is tied up while some kind of external processing is playing out.
-- Alex
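The POST-back pattern is worth a sketch. This is a rough, untested outline using the `tmx` and `xhttp` modules; the `/resume/...` URI scheme, the `RESUME` route name, and how the transaction identifiers travel to the external service and back are all assumptions for illustration:

```
# Requires the tm, tmx and xhttp modules to be loaded.
request_route {
    if (t_suspend()) {
        # $T(id_index)/$T(id_label) identify the suspended transaction;
        # hand them to the external service by whatever means you like.
        xlog("L_INFO", "suspended $T(id_index):$T(id_label)\n");
    }
    exit;
}

event_route[xhttp:request] {
    # Assume the service POSTs back to /resume/<index>/<label>
    $var(idx) = $(hu{s.select,2,/});
    $var(lbl) = $(hu{s.select,3,/});
    xhttp_reply("200", "OK", "text/plain", "resuming\n");
    t_continue("$var(idx)", "$var(lbl)", "RESUME");
}

route[RESUME] {
    # Runs in an async worker once the external processing is done.
    send_reply("302", "Moved Temporarily");
    exit;
}
```

Between t_suspend() and t_continue(), no Kamailio process is blocked on this transaction at all; it simply sits in shared memory.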
In my case, the problem is 100% in the REST API; trying to 'fix' it in kamailio was just an option I was exploring, or more like 'am I doing something wrong?'
I gave up for now on the kamailio side (only added more CPUs and children to absorb the shockwave) and am working on the API to start caching things in Redis and getting the 'external' data in a different thread to avoid the processing delay. I believe this will be the bigger gain.
On 27 Aug 2024, at 8:57 AM, Alex Balashov via sr-users sr-users@lists.kamailio.org wrote:
On Aug 27, 2024, at 8:39 AM, Alexis Fidalgo via sr-users sr-users@lists.kamailio.org wrote:
I gave up for now on the kamailio side (only added more CPUs and children to absorb the shockwave) and am working on the API to start caching things in Redis and getting the 'external' data in a different thread to avoid the processing delay. I believe this will be the bigger gain.
You are certainly correct; this is the area where the juice is "worth the squeeze". But there are only so many gains you can make in the speed of an HTTP API call. By its very nature, this mechanism is designed for the relatively delay-tolerant web economy, not for real-time telecommunications systems.
This is not to deny that plenty of real-time telecommunications systems use HTTP API calls.
-- Alex
In my case, the problem is 100% in the REST API; trying to 'fix' it in kamailio was just an option I was exploring, or more like 'am I doing something wrong?'
Not necessarily, but moving from a centralized HTTP entrypoint to MQTT may be beneficial in your case. Kamailio integration is pretty straightforward and doesn't differ much from http_async_client.
Which I agree with: HTTP is far from being the best option, but you know, sometimes there's no choice. My API consumes 3 external web services from third-party apps and there's no chance to change that. If it were my choice, no HTTP would be involved, at least for this.
Sent from a mobile device
On 27 Aug 2024, at 10:16 AM, Alex Balashov via sr-users <sr-users@lists.kamailio.org> wrote:
I believe that there's a conflation of the http_async_client module and the async module going on here. My understanding (and testing bears this out) is that the http_async_client module works in the manner you describe as being "mediated by external events", in that the http reply is the external trigger that resumes the transaction. It appears that the module abstracts away any actual calls to suspend/resume.
I created a test project that demonstrates this: https://github.com/whosgonna/kamailio_http_async
Using the container deminy/delayed-http-response (https://registry.hub.docker.com/r/deminy/delayed-http-response) to create an http service that sleeps 1 second and then replies, and having Kamailio set with 2 child listening processes and 1 http_async_client worker, a simple use of the http_async_query() function handles 300 cps sustained over a minute with no problems. This passes all of the latency load onto the http server. Yes, if the http server cannot handle the request load as the requests increase, that will be a problem, but I think the understanding of how this module works is incorrect.
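The back-of-the-envelope arithmetic behind this result (mine, not from the thread) is just Little's law, taking the test's 1 s HTTP latency:

```python
# Little's law: concurrency = arrival rate x service time.
# Blocking children are pinned for the whole HTTP round-trip,
# so capacity is children / latency; non-blocking waiting only
# determines how many suspended transactions sit in memory.

def blocking_capacity_cps(children: int, http_latency_s: float) -> float:
    """Max sustainable rate when each request pins a child process."""
    return children / http_latency_s

def suspended_in_flight(rate_cps: float, http_latency_s: float) -> float:
    """Transactions held suspended at once when waiting is delegated."""
    return rate_cps * http_latency_s

print(blocking_capacity_cps(2, 1.0))  # 2.0 cps: why the blocking config died
print(suspended_in_flight(300, 1.0))  # 300.0 transactions buffered in memory
```

This also restates Alex's caveat below: the resume route itself still runs synchronously in an async worker, so the 300 cps figure holds only while the per-transaction resume work stays cheap.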
From: Alex Balashov via sr-users <sr-users@lists.kamailio.org> Sent: Tuesday, August 27, 2024 6:57 AM To: Kamailio (SER) - Users Mailing List <sr-users@lists.kamailio.org> Cc: Alex Balashov <abalashov@evaristesys.com> Subject: [SR-Users] Re: http_async and tm
Ben,
You're absolutely correct that the transaction is resumed upon receipt of an HTTP reply, and that the async workers do not block on this event.
This is also true of a variety of other patterns and workflows which shuttle the transactions off to an async process pool, not really specific to the HTTP client modules. `tsilo` comes to mind.
My overview was meant to make a more general point: once the transaction is resumed, the processing is otherwise linear and identical in every other respect to how it would unfold in the main SIP worker process. This is what I meant when I said:
"This does not cause the processing to enter some generally more asynchronous mode in any other respect, and in that sense, is quite different to what most people have in mind when they think of asynchronous processing in the context of general-purpose programming runtimes."
My over-arching point is that throughput is still limited by the (fixed) size of the async worker pool, and that the same considerations about its sizing apply in async land as in regular child process land.
For this reason, I don't think you are correct to conclude from this exercise that using asynchronous mode / `http_async_client` increases _throughput_. Your example config has one async worker, and that worker can only handle one resumed transaction at a time. This is not a net increase in throughput. You say in your documentation: "Re-running the load test will result in heavily blocked and dropped SIP requests", which is absolutely true; in this paradigm, you are liberating the SIP workers from the congestion that would cause "blocked and dropped SIP requests", by doing what I described in my original message: "suspending it and shipping it to another set of ancillary worker processes".
The real conflation here is between "not dropping SIP requests" and "increased throughput". There is no increased throughput, only a deep queue of suspended transactions lined up at the async worker pool. This goes back to the question I raised in a previous message: what exactly is meant by "handle"? Are you "not dropping" 300 CPS, or are you relaying 300 CPS worth of requests? I fully believe that one async worker can churn through hundreds of resumed transactions per second, driven by near-simultaneous HTTP replies, when the effect is simply to send a '404 Not Found', as in your config. Real-world workloads don't particularly resemble that in most cases.
You're not getting more _throughput_. You're changing where the blocking occurs so as to not obstruct receipt of further SIP messages, which is precisely what I meant when I said: "suspending it and shipping it to another set of ancillary worker processes".
From the perspective of not dropping SIP messages, sweeping these transactions out of the main worker pool is indeed quite effective, and if it is in that narrow sense of "not dropping" that you meant "keeping up with the requests", then you are quite right. I was thinking about bottom-line throughput capacity. I think there is a misconception out there that other config script operations also acquire an asynchronous character, beyond the initial "shipping", as it were, and this is the idea I meant to single out.
-- Alex
On Aug 29, 2024, at 3:13 PM, Ben Kaufman bkaufman@bcmone.com wrote:
I'm not sure I understand how it's not getting more throughput. Every request got its reply in only a little over a second, from the first request to the 18,000th. Using a blocking http request, this config (low number of workers) died quickly.
The original post in this thread (i.e. the real-world example) was a 302-redirect server. My example traded that for a 404, because it was easier to implement than returning values from the web server to make a 302 reply, but certainly extracting information from the http reply and putting it into the SIP reply is trivial.
From: Alex Balashov via sr-users <sr-users@lists.kamailio.org> Sent: Thursday, August 29, 2024 2:41 PM To: Kamailio (SER) - Users Mailing List <sr-users@lists.kamailio.org> Cc: Alex Balashov <abalashov@evaristesys.com> Subject: [SR-Users] Re: http_async and tm
The only real footnote to this is about situations in which the resumption of the transaction in the async workers is mediated by external events, e.g. a POST-back into Kamailio's `xhttp` server. While this does not change the nature of the subsequent synchronous execution of the route logic, it does mean that neither a core SIP worker nor an async worker is tied up while some kind of external processing is playing out.
-- Alex
-- Alex Balashov Principal Consultant Evariste Systems LLC Web: https://evaristesys.com/ Tel: +1-706-510-6800
Kamailio - Users Mailing List - Non Commercial Discussions To unsubscribe send an email to sr-users-leave@lists.kamailio.org Important: keep the mailing list in the recipients, do not reply only to the sender! Edit mailing list options or unsubscribe:
You might be thinking about it backwards. The throughput is the same in process A as it is in process B. The difference is that process B isn't in the critical path of new SIP packets, so it looks like success.
Consider a config in which you take every request, put it on an mqueue, dequeue it inside an async worker (i.e. while(mq_fetch(...)) { ... }), and then do a blocking HTTP query inside that loop instead. You would get the same throughput, which is to say that the resumption of the transaction upon HTTP reply is just an implementational detail, not a saliently throughput-boosting feature.
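That hypothetical config could be sketched roughly like this (mqueue, rtimer, and http_client are real modules; the queue name, route names, and the use of a timer process to stand in for the "async worker" are illustrative assumptions):

```cfg
loadmodule "mqueue.so"
loadmodule "rtimer.so"
loadmodule "http_client.so"

modparam("mqueue", "mqueue", "name=httpq")
# a dedicated timer process standing in for the async worker
modparam("rtimer", "timer", "name=drain;interval=1;mode=1;")
modparam("rtimer", "exec", "timer=drain;route=DRAIN;")

request_route {
    # enqueue instead of querying inline
    mq_add("httpq", "$ci", "$ru");
}

route[DRAIN] {
    while (mq_fetch("httpq")) {
        # blocking HTTP query inside the drain loop: same throughput,
        # just relocated out of the SIP workers' critical path
        http_client_query("http://192.168.86.128:8000/", "$var(res)");
    }
}
```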
I suppose "not dropping SIP messages" can be viewed as a separate dimension of work from the call processing, and in that sense, you're freeing up the main SIP workers to consume more packets. I just have a problem with this take because the packets are then all bound for the same pipeline, and ultimately, the same async worker(s).
The throughput-enhancing value of async depends on liberating workers to do _other_ things while $slow_workload unspools. If the $slow_workload _is_ the work, you're just moving food around on the plate. Whether you march straight from Maogong or stop at Songpan, you still end up at Yan'an.
-- Alex
On Aug 29, 2024, at 4:18 PM, Ben Kaufman bkaufman@bcmone.com wrote:
I'm not sure I understand how it's not getting more throughput. Every request got its reply in only a little over a second, from the first request to the 18,000th. Using a blocking http request, this config (with its low number of workers) died quickly.
The original post in this thread (i.e. the real-world example) was a 302-redirect server. My example traded that for a 404, because it was easier to implement than returning values from the web server to make a 302 reply, but certainly extracting information from the http reply and putting it into the SIP reply is trivial.
From: Alex Balashov via sr-users sr-users@lists.kamailio.org Sent: Thursday, August 29, 2024 2:41 PM To: Kamailio (SER) - Users Mailing List sr-users@lists.kamailio.org Cc: Alex Balashov abalashov@evaristesys.com Subject: [SR-Users] Re: http_async and tm
Ben,
You're absolutely correct that the transaction is resumed upon receipt of an HTTP reply, and that the async workers do not block on this event.
This is also true of a variety of other patterns and workflows which shuttle the transactions off to an async process pool, not really specific to the HTTP client modules. `tsilo` comes to mind.
My overview was meant to make a more general point: once the transaction is resumed, the processing is otherwise linear and identical in every other respect to how it would unfold in the main SIP worker process. This is what I meant when I said:
"This does not cause the processing to enter some generally more asynchronous mode in any other respect, and in that sense, is quite different to what most people have in mind when they think of asynchronous processing in the context of general-purpose programming runtimes."
My over-arching point is that throughput is still limited by the (fixed) size of the async worker pool, and that the same considerations about its sizing apply in async land as in regular child process land.
For this reason, I don't think you are correct to conclude from this exercise that using asynchronous mode / `http_async_client` increases _throughput_. Your example config has one async worker, and that worker can only handle one resumed transaction at a time. This is not a net increase in throughput. You say in your documentation: "Re-running the load test will result in heavily blocked and dropped SIP requests", which is absolutely true; in this paradigm, you are liberating the SIP workers from the congestion that would cause "blocked and dropped SIP requests", by doing what I described in my original message: "suspending it and shipping it to another set of ancillary worker processes".
* so it looks like success.
How is it not success? It is not just "not dropping messages". All messages are responded to in only slightly longer than the 1 second delay provided by the web server. How is handling 300 requests per second rather than 2 (the number of children) not an improvement in throughput?
________________________________ From: Alex Balashov via sr-users sr-users@lists.kamailio.org Sent: Thursday, August 29, 2024 3:45 PM To: Kamailio (SER) - Users Mailing List sr-users@lists.kamailio.org Cc: Alex Balashov abalashov@evaristesys.com Subject: [SR-Users] Re: http_async and tm
On Aug 29, 2024, at 5:05 PM, Ben Kaufman bkaufman@bcmone.com wrote:
• so it looks like success.
How is it not success? It is not just "not dropping messages". All messages are responded to in only slightly longer than the 1 second delay provided by the web server. How is handling 300 requests per second rather than 2 (the number of children) not an improvement in throughput?
"Looks like success [with the tacit insinuation that it's actually not]" was probably uncharitable. You're right that
However, it's not an increase in _throughput_. It's a workaround for Kamailio's concurrency architecture vis-à-vis HTTP. You've just created an elastic buffer for slow HTTP requests. There is, essentially, process A (SIP worker) and process B (async worker), and they both process the request the same way.
Moving the work to process B is beneficial because it's not exposed to incoming SIP packets, while process A is. Instead of waiting on HTTP requests in processes of type A, you're waiting on them in processes of type B. You're still blocking a process and waiting. Vitally, the throughput is still bounded by process B and by available memory, and, more to the point, the considerations, and limitations, around increasing the number of processes, of either the A or B type, are the same.
The picture I painted was:
"asynchronous processing just means liberating the transaction from the main worker processes, which are part of a relatively small fixed-size pool, by suspending it and shipping it to another set of ancillary worker processes"
Your critique of this was, as I understood it:
"this does not simply 'hand off the transaction' to another pool of workers which then accumulate load."
My only aim here is to say that this is, in fact, an accurate characterisation of what is happening. You are handing off the transaction to another pool of workers. I also meant to convey that Kamailio's async model is more coarse than that of async I/O in other execution environments.
-- Alex
On Aug 29, 2024, at 5:30 PM, Alex Balashov abalashov@evaristesys.com wrote:
"Looks like success [with the tacit insinuation that it's actually not]" was probably uncharitable. You're right that
Ah, trackpad accident: meant to say that you're right that it allows Kamailio to answer these requests instead of simply grinding to a halt.
Semantics aside, the issue from the original post isn't so much that "the http request blocks processing the SIP message", but that "the http request blocks processing the SIP message for about a second, during which time the process cannot do anything else". If the original poster's web server responded in less than 5ms, we wouldn't be having this conversation.
So, using kamailio's ASYNC module, the pattern would be to call async_task_route() in listener process A. This passes the transaction to async_worker process A. Then, in the async_worker process, the http request is made and that async worker process is blocked until the web server responds. And yes, if that's what is happening it is just moving the wait time from one process to another.
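That async-module pattern might be sketched like so (async_task_route() is from the async module and async_workers is the core parameter it relies on; the HTTP call, worker count, and route names are illustrative):

```cfg
# core async task workers, used by async_task_route()
async_workers=4

loadmodule "tm.so"
loadmodule "async.so"
loadmodule "http_client.so"

request_route {
    if (is_method("INVITE")) {
        # suspend here; route[HTTP_BLOCKING] resumes in an async task
        # worker, which then blocks for the full HTTP round trip
        async_task_route("HTTP_BLOCKING");
    }
}

route[HTTP_BLOCKING] {
    # blocking query: this async worker can do nothing else meanwhile
    http_client_query("http://192.168.86.128:8000/", "$var(res)");
    send_reply(404, "Not Found");
    exit;
}
```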
With the http async client, however, my understanding is that the request is sent from listener process A, and then the transaction is suspended. The http async worker process, which is not the same as the async workers, is running an event loop (libevent is a requirement for http async client). When an http reply is received, the http async worker will resume the transaction that sent the request.
The difference is that with the http async client, waiting for the http reply doesn't block anything. Supposing we have a single listener process and a single http async worker process. In very quick succession, 5 SIP requests are received, all requiring an http request. If the 5th http request is the first to receive an http response, then that response will be handled immediately - it's not blocked waiting for the prior 4 http replies.
________________________________ From: Alex Balashov via sr-users sr-users@lists.kamailio.org Sent: Thursday, August 29, 2024 4:30 PM To: Kamailio (SER) - Users Mailing List sr-users@lists.kamailio.org Cc: Alex Balashov abalashov@evaristesys.com Subject: [SR-Users] Re: http_async and tm
CAUTION: This email originated from outside the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe.
On Aug 29, 2024, at 5:05 PM, Ben Kaufman bkaufman@bcmone.com wrote:
• so it looks like success.
How is it not success? It is not just "not dropping messages". All messages are responded to in only slightly longer than the 1 second delay provided by the web server. How is handling 300 request per second rather than 2 (the number of children) not an improvement in throughput?
"Looks like success [with the tacit insinuation that it's actually not]" was probably uncharitable. You're right that
However, it's not an increase in _throughput_. It's a work around Kamailio's concurrency architecture vis-a-vis HTTP. You've just created an elastic buffer for slow HTTP requests. There is, essentially, process A (SIP worker) and process B (async worker), and they both process the request the same way.
Moving the work to process B is beneficial because it's not exposed to incoming SIP packets, while process A is. Instead of waiting on HTTP requests in processes of type A, you're waiting on them in processes of type B. You're still blocking a process and waiting. Vitally, the throughput is still bounded by process B and by available memory, and, more to the point, the considerations, and limitations, around increasing the number of processes, of either the A or B type, are the same.
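The bound being described here can be put in numbers (a back-of-envelope sketch with hypothetical figures, not a measurement from the thread):

```python
# With blocking workers, the ceiling on sustained throughput is simply
# workers / service_time: each worker completes one request per
# service_time seconds. Figures below are illustrative.
def max_throughput(workers: int, service_time_s: float) -> float:
    return workers / service_time_s

# 2 blocking children, ~1 s per HTTP call: at most 2 requests/second.
assert max_throughput(2, 1.0) == 2.0

# Sustaining 300 req/s at 1 s per call needs ~300 requests in flight:
# either 300 blocking processes, or one event loop holding 300 sockets.
assert max_throughput(300, 1.0) == 300.0
```

The event loop changes *where* the concurrency lives (open sockets instead of blocked processes), not the arithmetic of how much concurrency is needed.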
The picture I painted was:
"asynchronous processing just means liberating the transaction from the main worker processes, which are part of a relatively small fixed-size pool, by suspending it and shipping it to another set of ancillary worker processes"
Your critique of this was, as I understood it:
"this does not simply 'hand off the transaction' to another pool of workers which then accumulate load."
My only aim here is to say that this is, in fact, an accurate characterisation of what is happening. You are handing off the transaction to another pool of workers. I also meant to convey that Kamailio's async model is more coarse than that of async I/O in other execution environments.
-- Alex
-- Alex Balashov Principal Consultant Evariste Systems LLC Web: https://evaristesys.com/ Tel: +1-706-510-6800
__________________________________________________________ Kamailio - Users Mailing List - Non Commercial Discussions To unsubscribe send an email to sr-users-leave@lists.kamailio.org Important: keep the mailing list in the recipients, do not reply only to the sender! Edit mailing list options or unsubscribe:
On Aug 29, 2024, at 6:35 PM, Ben Kaufman bkaufman@bcmone.com wrote:
So, using Kamailio's ASYNC module, the pattern would be to call async_task_route() in listener process A. This passes the transaction to async_worker process A. Then, in the async_worker process, the http request is made and that async worker process is blocked until the web server responds. And yes, if that's what is happening, it is just moving the wait time from one process to another.
With the http async client, however, my understanding is that the request is sent from listener process A, and then the transaction is suspended. The http async worker process, which is not the same as the async workers, is running an event loop (libevent is a requirement for http_async_client). When an http reply is received, the http async worker will resume the transaction that sent the request.
The difference is that with the http async client, waiting for the http reply doesn't block anything. Supposing we have a single listener process and a single http async worker process. In very quick succession, 5 SIP requests are received, all requiring an http request. If the 5th http request is the first to receive an http response, then that response will be handled immediately - it's not blocked waiting for the prior 4 http replies.
I understand your argument, as I did before. There is a nonblocking socket event mux, in a dedicated process outside of the normal process pools, which is running an event loop on behalf of those normal processes and activating async workers with a resumed transaction when an HTTP reply is received. While that waiting is happening, the async workers are not themselves blocked.
However, in this setup, the async workers only do one kind of thing -- postprocess the HTTP response. That's all they do. So, there's nothing they're freed up to do by not waiting on HTTP queries, and, conversely, there's nothing to be gained from adding more of them in this paradigm that wouldn't also be gained by adding more of them and having them each do a blocking HTTP query, as in the first paradigm you discussed above.
This whole polemic is rooted in the fact that issuing redirects based on HTTP responses is all that this proxy does. If it did anything else, these would all be fair points.
-- Alex
To the original question posted here about the reply retransmission - is your testing client SIPp? SIPp seems to have an odd behavior where the branch value in its Via increases with each message. There is the ability to set [branch-N], where N is the value by which to decrement it. Since every message (both requests and replies) increments this, if your pattern is INVITE, 100, 302, ACK, the ACK's branch value would need to be decremented by 3 so that it matches the INVITE's and the ACK is absorbed by the transaction.
https://sipp.readthedocs.io/en/latest/scenarios/keywords.html?highlight=%5Bb...
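If the client is indeed SIPp, the fix would look roughly like this in the scenario file (an illustrative fragment, not the poster's actual scenario; headers are abbreviated and the URI is taken from the trace above):

```xml
<recv response="302"></recv>

<!-- The INVITE, the 100, and the 302 each advanced SIPp's branch
     counter, so step back 3 to make the ACK's Via branch match the
     INVITE's (required for a non-2xx ACK to match the transaction). -->
<send>
  <![CDATA[

    ACK sip:service@192.168.86.128:5060 SIP/2.0
    Via: SIP/2.0/UDP [local_ip]:[local_port];branch=[branch-3]
    From: sipp <sip:sipp@[local_ip]:[local_port]>;tag=[call_number]
    To: service <sip:service@192.168.86.128:5060>[peer_tag_param]
    Call-ID: [call_id]
    CSeq: 1 ACK
    Max-Forwards: 70
    Content-Length: 0

  ]]>
</send>
```

Once tm can match the ACK to the INVITE transaction, it stops retransmitting the 302.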
Ben Kaufman, Senior Voice Engineer
E: bkaufman@bcmone.com
SIP.US Client Support: 800.566.9810 | SIPTRUNK Client Support: 800.250.6510 | Flowroute Client Support: 855.356.9768
________________________________ From: alexis via sr-users sr-users@lists.kamailio.org Sent: Saturday, August 24, 2024 5:48 PM To: sr-users@lists.kamailio.org sr-users@lists.kamailio.org Cc: alexis alzrck@gmail.com Subject: [SR-Users] http_async and tm
Hello all, context first: we have a REST API that performs queries to external devices in the network (Diameter to DRAs, REST to different servers) and, based on n conditions, returns the content for a Contact header to be used in a SIP 302.
Right now we're consuming this API with http_client (synchronously), and as there's no way to speed up the API itself (pipelined executions, delays on external APIs, etc.), we're hitting a limit where all children become busy waiting for the API to answer.
So I decided to move to http_async_client and started working on it in the lab, with this first basic concept to test.
request_route {
    # for testing purposes only
    if (is_method("ACK")) {
        exit;
    }

    $http_req(all) = $null;
    $http_req(suspend) = 1;
    $http_req(timeout) = 500;
    $http_req(method) = "POST";
    $http_req(hdr) = "Content-Type: application/json";

    jansson_set("string", "event", "sip-routing", "$var(cre_query)");
    xlog("L_INFO", "API ASYNC ROUTING REQUEST: $var(cre_query)\n");
    $http_req(body) = $var(cre_query);

    t_newtran();
    http_async_query("http://192.168.86.128:8000/", "CRE_RESPONSE");
}
http://192.168.86.128:8000/ receives the POST, introduces a random delay between 0.5 and 1 second, and responds (simulating the real API, with an exaggerated delay to prove the concept).
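For reference, a slow test endpoint like this can be sketched with Python's standard library (a hypothetical stand-in; the real API and its response body are not shown in the thread):

```python
# Hypothetical stand-in for the test API at 192.168.86.128:8000:
# answers any POST after a random 0.5-1.0 s delay, mimicking the
# slow upstream described above.
import json
import random
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

def random_delay() -> float:
    # The 0.5-1.0 s window used in the lab test.
    return random.uniform(0.5, 1.0)

class SlowHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        time.sleep(random_delay())
        # Hypothetical response body; the real API's schema is not shown.
        body = json.dumps({"contact": "sip:1234@google.com"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve() -> None:
    # Blocks forever; call this to run the stub server on port 8000.
    HTTPServer(("0.0.0.0", 8000), SlowHandler).serve_forever()
```

Calling serve() runs the stub; each POST then takes roughly as long as the simulated upstream delay.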
Then
route[CRE_RESPONSE] {
    if ($http_ok && $http_rs == 200) {
        xlog("L_INFO", "CRE RESPONSE: $http_rb\n");
        # for testing purposes, the Contact content will be replaced
        # by the received API response
        append_to_reply("Contact: <sip:1234@google.com>\r\n");
        send_reply(302, "Moved Temporarily");
        exit;
    }
    send_reply(500, "Internal error");
    exit;
}
An INVITE is received and processed, the API is called, and after the API response a 302 is replied, followed by an ACK (ignored for now).
The situation is that the 302 gets retransmitted:
37 1519.846253067 192.168.86.34 → 192.168.86.128 SIP/SDP 585 Request: INVITE sip:service@192.168.86.128:5060
38 1519.848100380 192.168.86.128 → 192.168.86.34 SIP 318 Status: 100 Trying
39 1520.094997642 192.168.86.128 → 192.168.86.34 SIP 407 Status: 302 Moved Temporarily
40 1520.102323728 192.168.86.34 → 192.168.86.128 SIP 453 Request: ACK sip:service@192.168.86.128:5060
41 1520.591300933 192.168.86.128 → 192.168.86.34 SIP 407 Status: 302 Moved Temporarily
42 1521.591061065 192.168.86.128 → 192.168.86.34 SIP 407 Status: 302 Moved Temporarily
43 1523.591227956 192.168.86.128 → 192.168.86.34 SIP 407 Status: 302 Moved Temporarily
18(24) DEBUG: tm [t_reply.c:1703]: t_retransmit_reply(): reply retransmitted. buf=0x7f6d79745dc0: SIP/2.0 3..., shmem=0x7f6d75187fd8: SIP/2.0 3
18(24) DEBUG: tm [t_reply.c:1703]: t_retransmit_reply(): reply retransmitted. buf=0x7f6d79745dc0: SIP/2.0 3..., shmem=0x7f6d75187fd8: SIP/2.0 3
18(24) DEBUG: tm [t_reply.c:1703]: t_retransmit_reply(): reply retransmitted. buf=0x7f6d79745dc0: SIP/2.0 3..., shmem=0x7f6d75187fd8: SIP/2.0 3
18(24) DEBUG: tm [timer.c:634]: wait_handler(): finished transaction: 0x7f6d75184cc8 (p:0x7f6d74f600c8/n:0x7f6d74f600c8)
18(24) DEBUG: tm [h_table.c:132]: free_cell_helper(): freeing transaction 0x7f6d75184cc8 from timer.c:643
Any help to avoid the retransmissions and make the transaction finish right after the 302 would be appreciated.
regards