### Description I´m using kamailio´s sqlops-module on a postgresql database.
If the database server gets unreachable during operation, sqlops detects this and tries to reconnect: ``` 14:40:48.784 WARNING: db_postgres [km_dbase.c:259]: db_postgres_submit_query(): postgres query command failed, connection status 1, error [no connection to the server] 14:40:48.784 DEBUG: db_postgres [km_dbase.c:262]: db_postgres_submit_query(): resetting the connection to postgress server 14:40:48.784 ERROR: db_postgres [km_dbase.c:267]: db_postgres_submit_query(): 0x7f80a8093680 PQsendQuery Error: could not connect to server: Connection refused ```
and finally, when the DB comes up again, succeeds in executing queries: ``` 14:41:39.738 WARNING: db_postgres [km_dbase.c:259]: db_postgres_submit_query(): postgres query command failed, connection status 1, error [SSL connection has been closed unexpectedly] 14:41:39.739 DEBUG: db_postgres [km_dbase.c:262]: db_postgres_submit_query(): resetting the connection to postgress server 14:41:39.810 DEBUG: db_postgres [km_dbase.c:249]: db_postgres_submit_query(): sending query ok: 0x7f80a8093680 (1) ``` So, obviously the module can handle database outages well.
But if the DB is already down when kamailio is started, the cyclic reconnect does not work. Instead, kamailio fails to start immediately:
``` 14:57:54.187 CRITICAL: <core> [main.c:1614]: main_loop(): Cannot fork 14:57:54.188 DEBUG: db_postgres [km_pg_con.c:112]: db_postgres_new_connection(): PQconnectdbParams(0x136bfb0) 14:57:54.188 DEBUG: <core> [core/sr_module.c:938]: init_mod_child(): idx 12 rank -1: rtimer [timer] 14:57:54.188 DEBUG: <core> [core/sr_module.c:938]: init_mod_child(): idx 12 rank -1: sqlops [timer] 14:57:54.188 DEBUG: <core> [core/sr_module.c:708]: find_mod_export_record(): found export of <db_bind_api> in module db_postgres [/usr/lib64/kamailio/modules/db_postgres.so] 14:57:54.188 DEBUG: <core> [db.c:209]: db_bind_mod(): using db bind api for db_postgres 14:57:54.188 DEBUG: <core> [db.c:314]: db_do_init2(): connection 0x7f23c0154120 not found in pool 14:57:54.188 DEBUG: db_postgres [km_pg_con.c:56]: db_postgres_new_connection(): db_id = 0x7f23c0154120 14:57:54.188 DEBUG: db_postgres [km_pg_con.c:70]: db_postgres_new_connection(): 0x7f23c0154380=pkg_malloc(80) 14:57:54.188 DEBUG: db_postgres [km_pg_con.c:84]: db_postgres_new_connection(): opening connection: postgres://xxxx:xxxx@xxxxxxxxxxxxxxxxxx 14:57:54.188 DEBUG: db_postgres [km_pg_con.c:112]: db_postgres_new_connection(): PQconnectdbParams(0x13a41e0) 14:57:54.188 ERROR: db_postgres [km_pg_con.c:115]: db_postgres_new_connection(): could not connect to server: Connection refused Is the server running on host "xxxxxxxxxxxx" () and accepting TCP/IP connections on port 5432? 14:57:54.188 ERROR: db_postgres [km_pg_con.c:148]: db_postgres_new_connection(): cleaning up 0x7f23c0154380=pkg_free() 14:57:54.188 ERROR: <core> [db.c:318]: db_do_init2(): could not add connection to the pool 14:57:54.188 ERROR: sqlops [sql_api.c:164]: sql_connect(): failed to connect to the database [cb] 14:57:54.188 ERROR: <core> [core/sr_module.c:942]: init_mod_child(): error while initializing module sqlops (/usr/lib64/kamailio/modules/sqlops.so) (idx: 12 rank: -1 desc: [timer]) 14:57:54.189 ERROR: <core> [core/pt.c:340]: fork_process(): init_child failed for process 12, pid 24036, "timer" 14:57:54.189 CRITICAL: <core> [main.c:1691]: main_loop(): cannot fork timer process 14:57:54.189 ALERT: <core> [main.c:743]: handle_sigs(): child process 24009 exited normally, status=255 14:57:54.190 ALERT: <core> [main.c:743]: handle_sigs(): child process 24010 exited normally, status=255 14:57:54.190 ALERT: <core> [main.c:743]: handle_sigs(): child process 24011 exited normally, status=255 14:57:54.191 ALERT: <core> [main.c:743]: handle_sigs(): child process 24013 exited normally, status=255 14:57:54.191 ALERT: <core> [main.c:743]: handle_sigs(): child process 24015 exited normally, status=255 14:57:54.191 ALERT: <core> [main.c:743]: handle_sigs(): child process 24019 exited normally, status=255 14:57:54.191 ALERT: <core> [main.c:743]: handle_sigs(): child process 24030 exited normally, status=255 14:57:54.192 INFO: <core> [main.c:771]: handle_sigs(): terminating due to SIGCHLD 14:57:54.192 INFO: <core> [main.c:826]: sig_usr(): signal 15 received 14:57:54.192 INFO: <core> [main.c:826]: sig_usr(): signal 15 received ```
#### Reproduction
- use kamailio-script with sqlops - stop kamailio - stop database - try to start kamailio
#### Log Messages see above
### Additional Information
* **Kamailio Version** - output of `kamailio -v` kamailio 5.1.2 (x86_64/linux)
* **Operating System**: RHEL 7.4 on x86_64
Hello, thank you for the report. This is the current expected behavior of Kamailio. Modules that need a database will block the start if the database is not available. You can start a discussion on the sr-dev list about this, if you plan to change this for the sqlops module e.g. with a pull request. Best regards, Henning
I like this default behaviour, but if that's not what you need one could add a configuration option to make it optional.
Hi, thanks for the responses. While I agree that the default behavior makes fully sense for most (or nearly all) modules, still there are usecases where a more tolerant handling would be nice. In my case, this is a multi-node homer-installation (as advised here: https://github.com/sipcapture/homer/issues/254 ) - the kamailio (used as sipcapture-server) uses a local sipcapture database, but all nodes share the same statistics database (accessed by sqlops) - in case the statistics-DB is not reachable temporarily during startup, I'd rather keep the sipcapture DB (and lose the statistics) than blocking startup of kamailio and losing both.
I'm not experienced in kamailio-development, still I tried to implement that functionality - in https://github.com/kamailio/kamailio/blob/master/src/modules/sqlops/sqlops.c I changed the static int child_init(int rank) to always return 0 (regardless of the return-value of sql_connect() ): ``` static int child_init(int rank) { if (rank==PROC_INIT || rank==PROC_MAIN || rank==PROC_TCP_MAIN) return 0; int sql_result = sql_connect(); LM_INFO("SQL result: %d \n", sql_result); return 0; } ``` The effect was, that kamailio starts even if the database is not reachable (and sql_connect() returns -1): ``` INFO: sqlops [sqlops.c:156]: child_init(): SQL result: -1 ERROR: db_postgres [km_pg_con.c:115]: db_postgres_new_connection(): could not connect to server: Connection refused ERROR: db_postgres [km_pg_con.c:148]: db_postgres_new_connection(): cleaning up 0x7f858b5d6f88=pkg_free() ERROR: <core> [db.c:318]: db_do_init2(): could not add connection to the pool ERROR: sqlops [sql_api.c:164]: sql_connect(): failed to connect to the database [cb] INFO: sqlops [sqlops.c:156]: child_init(): SQL result: -1 INFO: ctl [io_listener.c:210]: io_listen_loop(): io_listen_loop: using epoll_lt as the io watch method (auto detected) ERROR: db_postgres [km_pg_con.c:115]: db_postgres_new_connection(): could not connect to server: Connection refused ERROR: db_postgres [km_pg_con.c:148]: db_postgres_new_connection(): cleaning up 0x7f858b5d6f88=pkg_free() ERROR: <core> [db.c:318]: db_do_init2(): could not add connection to the pool ERROR: sqlops [sql_api.c:164]: sql_connect(): failed to connect to the database [cb] INFO: sqlops [sqlops.c:156]: child_init(): SQL result: -1 INFO: <script>: INSERT INTO stats_ip_mem as stats ( method, source_ip, total) VALUES('INVITE', ... ERROR: <core> [db_query.c:176]: db_do_raw_query(): invalid parameter value ERROR: sqlops [sql_api.c:265]: sql_do_query(): cannot do the query [INSERT INTO stats_ip_mem as stats ( method, source_ip, total) VA] ```
However, the module does not "recover" if the database gets reachable afterwards - I still get the error-message `ERROR: <core> [db_query.c:176]: db_do_raw_query(): invalid parameter value` for each sql_query in the script. Can somebod advise my where/when I could trigger the re-connect after the DB comes up again? As sqlops does that properly for DB-outages after a successful sql_connect(), I hoped it would work also in this case - but it seems that this case needs separate handling.
The issue here is that the sql_connect() bind and initialize the internal database handle structure. If this is not properly done, the module can't operate later on the database. If you want to implement a change like this, I would suggest to dig a bit deeper in the code.
One idea would be to periodically trigger (e.g. with a timer) a check if the DB got available, and then initialize it properly. Probably also necessary would be a protection of every DB access with a lock, to prevent errors during the initialization (because of the Kamailio multi-process architecture).
If you are interested in digging into that feel free to discuss your approach on our sr-dev list. If you managed to get a first working version, then just open a pull request for a review.
Thanks for your suggestions. I first tried the periodic timer, but I somehow did not get it working for the child-processes: altough I started a timer in each child, and the timer-method was even executed per child, it was always executed in the same process (verified with getpid()), so that the "postponed" initialization happen only in one process.
Anyhow, I then focussed on another approach, where the initialization is triggered with each sql-query, instead of periodically (which is anyway the more efficient way). A first version worked quite well in my tests, and I am already trying to figure out how to open a PR ;-)
Just opened https://github.com/kamailio/kamailio/pull/1706 - as this is my first pull request at all, I hope I did everything correctly ...
PR was merged.
Closed #1681.