Dear all

It seems the issue was not in the module or in Kamailio itself, but in the application we were using to read from the TCP socket.
I saw that some messages sent with evapi_relay were packed into the same frame, and I even tried forcing the TCP_NODELAY option on the evapi socket by compiling Kamailio with this patch:
--- a/src/modules/evapi/evapi_dispatch.c
+++ b/src/modules/evapi/evapi_dispatch.c
@@ -30,8 +30,8 @@
 #include <netinet/in.h>
 #include <arpa/inet.h>
 #include <fcntl.h>
-
 #include <ev.h>
+#include <netinet/tcp.h>
 
 #include "../../core/sr_module.h"
 #include "../../core/dprint.h"
@@ -690,6 +691,15 @@ int evapi_run_dispatcher(char *laddr, int lport)
                freeaddrinfo(ai_res);
                return -1;
        }
+      
+        if(setsockopt(evapi_srv_sock, IPPROTO_TCP, TCP_NODELAY,
+               &yes_true, sizeof(int)) < 0) {
+               LM_INFO("cannot set TCP_NODELAY option on descriptor\n");
+               close(evapi_srv_sock);
+               freeaddrinfo(ai_res);
+               return -1;
+       }
+
 
        if (bind(evapi_srv_sock, ai_res->ai_addr, ai_res->ai_addrlen) < 0) {
                LM_ERR("cannot bind to local address and port [%s:%d]\n", laddr, lport);

With this change I saw that every message published to evapi always went out in its own frame, but the issue was still there.
So regardless of whether this option is enabled in Kamailio, I had to change the application (written in Erlang) to delimit the received messages itself by reading them in line mode. This way we could reach up to 1000 processed messages per second.
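
For anyone hitting the same problem in another language, here is a minimal sketch of the same idea in Python (not the Erlang code we actually run). It assumes every message relayed to the socket ends with a newline and that the JSON is compact, i.e. has no raw newlines inside it. The class name LineFramer and its feed method are made up for this example.

```python
class LineFramer:
    """Buffers raw TCP chunks and returns only complete newline-terminated messages."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk):
        # A chunk returned by recv() may hold half a message, exactly one
        # message, or several messages glued together.
        self._buf.extend(chunk)
        # Everything before the last newline is complete; the tail is kept
        # in the buffer until the rest of that message arrives.
        *complete, rest = bytes(self._buf).split(b"\n")
        self._buf = bytearray(rest)
        return [msg for msg in complete if msg]


if __name__ == "__main__":
    framer = LineFramer()
    # First read: one full message plus the start of the next one.
    print(framer.feed(b'{"key":"a"}\n{"key"'))
    # Second read: the remainder of the second message.
    print(framer.feed(b':"b"}\n'))
```

The point is that the reader never assumes one read equals one message; it only hands a message to the JSON parser once the delimiter has been seen.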

best regards
david

 

On Mon, 30 Nov 2020 at 11:19, David Escartin (<descartin@sonoc.io>) wrote:
Dear all

We have been testing this module with the following setup:
kamailio 5.3.2
evapi params:
modparam("evapi", "workers", 4)
modparam("evapi", "netstring_format", 0)
modparam("evapi", "bind_addr", "127.0.0.1:8448")
modparam("evapi", "max_clients", 32)
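
A note on netstring_format: with the value 0 used above, the messages go out on the socket without any length prefix. As far as I understand the module documentation, setting it to 1 wraps each message as a netstring ("<length>:<data>,"), which gives the reader an explicit message boundary. The following is only a rough Python sketch of how a client could split such a stream, assuming that format; the function name decode_netstrings is made up for this example and is not part of any Kamailio API.

```python
def decode_netstrings(buf):
    """Split a bytes buffer of concatenated netstrings.

    Returns (messages, remainder), where remainder is the incomplete tail
    that must be kept and prepended to the next chunk read from the socket.
    """
    messages = []
    pos = 0
    while True:
        colon = buf.find(b":", pos)
        if colon == -1:
            break  # the length prefix has not fully arrived yet
        length = int(buf[pos:colon])  # raises ValueError on a bad prefix
        end = colon + 1 + length
        if len(buf) < end + 1:
            break  # payload or trailing comma still missing
        if buf[end:end + 1] != b",":
            raise ValueError("malformed netstring: missing trailing comma")
        messages.append(buf[colon + 1:end])
        pos = end + 1
    return messages, buf[pos:]


if __name__ == "__main__":
    # Two complete netstrings followed by a partial third one.
    print(decode_netstrings(b"5:hello,3:abc,4:wx"))
```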

Then in the configuration we call evapi_relay with an AVP containing JSON data (which can be quite long), like this:
{"key" : "aarp2q0tcpqhs0cpucuhukjs2ah2j00q@10.18.5.64" , "msg" : {"rg_in":"701","ani_init":{"ani_source":"pai", ....... }}}

We have an application listening on the TCP socket and writing those messages to a Kafka cluster. This works fine, and no issue was found in our earlier manual tests.
But during load tests, and when passing some live traffic, we see some issues.

It seems that sometimes, when several messages have to be sent to the TCP socket at the same time, they are sent in the same message, whereas normally each piece of data sent with evapi_relay goes out in one message.
We sometimes see something like this in the application consuming from the TCP socket:
2020-11-25 15:20:01.744 UTC [error] <0.706.0>@evapi_kafka_listener:handle_info:167 body "{\"key\" : \"6142651aa63616c6c04a783cd@72.21.24.130\" , \"msg\" : {\"rg_in\":\"677\",\"ani_init\":{\"ani_source\":\"fro\",.......}}}{\"key\" : \"isbc7caT4001915251VabcGhEfHdNiF0i@172.16.120.1\" , \"msg\" : {\"rg_in\":\"22\",\"ani_init\":{\"ani_source\":\"pai\", ....... ,\"translate" not valid json; error = {691,invalid_trailing_data}
2020-11-25 15:20:01.745 UTC [error] <0.706.0>@evapi_kafka_listener:handle_info:167 body "dPartition\":\"-1\",......}}}" not valid json; error = {1,invalid_json}

The application cannot parse the JSON message because two JSON objects arrive together: ......{\"ani_source\":\"fro\",.......}}}{\"key\" : \"isbc7caT4001915251Vabc............
This happens with 2 different UDP receivers processing messages and calling evapi_relay at the same time, but I don't think it happens all the time. It looks like an issue when several processes try to use the evapi workers at the same time.
We tried increasing the number of evapi workers and the result is the same.

I think we also saw another issue. It seems that when the AVP sent to the evapi socket is longer than ~1680 characters, the JSON is also truncated. This happens even when we use the socket on the lo interface, which has an MTU of 65535.

Could you please take a look to see if there is any problem or limitation, or if we are doing something wrong?

thanks and best regards 
david

--
David Escartín Almudévar
VoIP/Switch Engineer
descartin@sonoc.io

SONOC
C/ Josefa Amar y Borbón, 10, 4ª · 50001 Zaragoza, España
Tlf: +34 917019888 ·
 www.sonoc.io
