I was trying to do a cfg_rpc update on a variable and some of the times, the variable would take on randomly large values or negative values.
The -1 here is probably causing issues.
https://github.com/kamailio/kamailio/blob/1d53ea3dba4e59b05b2e92ecc973c44159...
Here's an explanation of what's going on (debugged using gdb). In cfg_struc.c if you put a breakpoint in the following line in cfg_clone_global
https://github.com/kamailio/kamailio/blob/1d53ea3dba4e59b05b2e92ecc973c44159...
checking variable values:
``` --- Old (correct) value ---
(gdb) print sizeof(*(int *)(((unsigned char *)((*cfg_global)->vars + 984)) + 172)) $33 = 4 (gdb) print *((unsigned char *)&(*(int *)(((unsigned char *)((*cfg_global)->vars + 984)) + 172)) + 0) $34 = 99 'c' (gdb) print *((unsigned char *)&(*(int *)(((unsigned char *)((*cfg_global)->vars + 984)) + 172)) + 1) $35 = 0 '\000' (gdb) print *((unsigned char *)&(*(int *)(((unsigned char *)((*cfg_global)->vars + 984)) + 172)) + 2) $36 = 0 '\000' (gdb) print *((unsigned char *)&(*(int *)(((unsigned char *)((*cfg_global)->vars + 984)) + 172)) + 3) $37 = 0 '\000'
--- new (corrupt) value ---
(gdb) print sizeof(*(int *)(((unsigned char *)((block)->vars + 984)) + 172)) $28 = 4 (gdb) print *((unsigned char *)&(*(int *)(((unsigned char *)((block)->vars + 984)) + 172)) + 0) $29 = 99 'c' (gdb) print *((unsigned char *)&(*(int *)(((unsigned char *)((block)->vars + 984)) + 172)) + 1) $30 = 0 '\000' (gdb) print *((unsigned char *)&(*(int *)(((unsigned char *)((block)->vars + 984)) + 172)) + 2) $31 = 0 '\000' (gdb) print *((unsigned char *)&(*(int *)(((unsigned char *)((block)->vars + 984)) + 172)) + 3) $32 = 130 '\202' ```
As visible, the last byte is getting corrupted. I checked the memory allocation variables:
``` --> cfg_block_size
(gdb) print cfg_block_size $2 = 1156
--> sizeof(cfg_block_t)
print sizeof(cfg_block_t) $15 = 8 ```
This means that total memory assigned = 8 + 1156 -(1) = 1163
``` --> address of new block
(gdb) print (void *) block $12 = (void *) 0x7f63086b6758
--> address of the corrupted variable in the new block
(gdb) print (void *)(((unsigned char *)((block)->vars + 984)) + 172) $13 = (void *) 0x7f63086b6be0
--> offset of the variable from the block start
(gdb) print 0x7f63086b6be0 - 0x7f63086b6758 $14 = 1160 ```
since the variable is an integer, memory that should be assigned = 1160 + 4 = 1164 However, we're assigning 1163.
Therefore the last byte is getting corrupted.
Does it make sense to remove the -1 from all the memory allocation in cfg_struct ?
Based on your description above, it seems you are right. I didn't have the time for proper analysis of the code, but at a quick look, there are many places where memory allocation is done with `-1`, maybe that should be removed in all those places.
Did you test by removing the `-1` and it worked as expected?
I asked Miklos (initial author of that code) and he clarified quickly: the reason for `-1` is to get rid of the byte of the field `unsigned char vars[1]` from the `struct _cfg_block`.
He spotted that sizeof(cfg_block_t) is 8 on your system, while it should be 9:
``` /*! \brief single memoy block that contains all the cfg values */ typedef struct _cfg_block { atomic_t refcnt; /*!< reference counter, the block is automatically deleted when it reaches 0 */ int _pad; /*!< force 8 byte alignment */ unsigned char vars[1]; /*!< blob that contains the values */ } cfg_block_t; ```
What version of kamailio are you using? Compiled by yourself or from packages built by the project? Also, what is the operating system and kernel (uname -a) versions?
Kamailio version - 4.4 Compiled by source code
uname -a (IP obfuscated) Linux ip-XXXXXXXXXX 4.4.23-31.54.amzn1.x86_64 #1 SMP Tue Oct 18 22:02:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
In the code that I have for cfg_block_t
``` /*! \brief single memoy block that contains all the cfg values */ typedef struct _cfg_block { atomic_t refcnt; /*!< reference counter, the block is automatically deleted when it reaches 0 */ unsigned char vars[1]; /*!< blob that contains the values */ } cfg_block_t; ```
There's no padding to force the 8 byte alignment. Could that be the reason for it to fail ?
Although if above was the case, theoretically, we should have the same issue.
Is a well known Linux distro such as Debian, CentOS, or is it a custom made Linux?
Paste here the output of `kamailio -I` (that's uppercase i) to see the compile flags and what atomic_t is built to. Even if it is not 9, the size of struct should have been 5, not 8.
it's running on AWS EC2's Amazon Linux (which is based on Enterprise Red Hat Linux).
the output of `kamailio -I` is:
``` Print out of kamailio internals Version: kamailio 4.4.2 (x86_64/linux) 892ad6 Default config: /xxxxxx/xxx/xxx/etc/kamailio/kamailio.cfg Default paths to modules: /xxxxxx/xxx/xxx/lib64/kamailio/modules Compile flags: STATS: Off, USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES MAX_RECV_BUFFER_SIZE=262144 MAX_LISTEN=16 MAX_URI_SIZE=1024 BUF_SIZE=65535 DEFAULT PKG_SIZE=8MB DEFAULT SHM_SIZE=64MB ADAPTIVE_WAIT_LOOPS=1024 TCP poll methods: poll, epoll_lt, epoll_et, sigio_rt, select Source code revision ID: 892ad6 Compiled with: x86_64-unknown-linux-gnu-gcc 4.9.4 Compiled on: 16:57:00 Jun 5 2018 Thank you for flying kamailio! ```
just to confirm if gdb is not messing up something, I calculated the memory used inside the C file:
``` int vin_block = sizeof(cfg_block_t); int vin_atom = sizeof(atomic_t); unsigned char v_vars[1]; int vin_vars = sizeof(v_vars);
--- GDB output ---
(gdb) print vin_block $1 = 8 (gdb) print vin_atom $2 = 4 (gdb) print vin_vars $3 = 1 ```
This should be fine as gcc usually does padding as discussed here: https://stackoverflow.com/questions/1841863/size-of-struct-in-c
So even if the sum of components is 5, it pads it to 8 (unless you have padding disabled in gcc). For, your case, where it's 9 ... it should actually be 12 if padding is enabled (this is enabled by default in gcc for many linux distros).
I'm not sure how the actual config variable mapping is done but it seems like the way it's being mapped, our variable ends up at the final 3 byte block (although the size is 4 bytes); which also means that there might be gaps in the middle of the memory block. However, this is just speculation.
let me know if you need any further info.
Can you enclose the structure between:
``` #pragma pack(push, 1) ... #pragma pack(pop) ``` and try again?
enclosing the structure in pragma block set the sizeof(cfg_block_t) to 5. After this, the bug also stopped occurring but that could just be co-incidental.
Could you explain why sizeof(cfg_block_t) = 8 fails and sizeof(cfg_block_t) = 5 works ?
The structure used a char[1] field at the end for a dynamic size array, the array was supposed to start at offset `sizeof(cfg_block_t) - 1`, respectively 4, which is aligned, but as the sizeof was rounded up (structure padded), practically, the array was no longer starting at the expected offset.
The latest versions of C standard allows use of `char[]` for dynamic size array, maybe we should switch to it. For now I will push this patch.
Pushed the commit to master branch, can you test and report if all goes fine?
Turns out the `pragma` fix was just co-incidental. I did a couple of other tests. This time I printed the complete memory mapping as follows in `cfg_clone_global`:
``` LOG(L_ERR,"START"); LOG(L_ERR,"block address: %p",(void *)block); LOG(L_ERR,"vars address: %p",(void *)(block->vars)); LOG(L_ERR,"Total allocated size: %d",(sizeof(cfg_block_t)+cfg_block_size-1)); int count = 0; for (group = cfg_group; group; group=group->next ){ count = count + 1; LOG(L_ERR,"Group number: %d", count); LOG(L_ERR,"Group location from start %d",(int)(((void *)(CFG_GROUP_DATA(block, group)))-((void *)block))); LOG(L_ERR,"Group size: %d", group->size); } ```
I did two tests: Test 1: With pragma block around cfg_block_t only Test 2: With the complete patch applied
Test1 output:
``` Memory mapping (with pragma for cfg_block_t) ============================================
(Error persists and variables takes random values)
## All the locations below assume that the address starts at 0
START block address: 0x7f6c37550d98 vars address: 0x7f6c37550d9c Total allocated size: 1160 Group number: 1 Group location from start 20 Group size: 248 Group number: 2 Group location from start 284 Group size: 56 Group number: 3 Group location from start 356 Group size: 4 Group number: 4 Group location from start 380 Group size: 4 Group number: 5 Group location from start 404 Group size: 4 Group number: 6 Group location from start 428 Group size: 176 Group number: 7 Group location from start 620 Group size: 104 Group number: 8 Group location from start 740 Group size: 200 Group number: 9 Group location from start 956 Group size: 16 Group number: 10 Group location from start 988 Group size: 172
## still one extra byte ```
As you can see 988+172=1160 but the address is starting from 0 here so we should have assigned 1161 bytes of memory.
Test2 output:
``` Memory mapping (with the complete patch applied) ================================================
(Error persists and variables takes random values)
## All the locations below assume that the address starts at 0
START block address: 0x7fc1537af4b8 vars address: 0x7fc1537af4bc Total allocated size: 1160 Group number: 1 Group location from start 20 Group size: 248 Group number: 2 Group location from start 284 Group size: 56 Group number: 3 Group location from start 356 Group size: 4 Group number: 4 Group location from start 380 Group size: 4 Group number: 5 Group location from start 404 Group size: 4 Group number: 6 Group location from start 428 Group size: 176 Group number: 7 Group location from start 620 Group size: 104 Group number: 8 Group location from start 740 Group size: 200 Group number: 9 Group location from start 956 Group size: 16 Group number: 10 Group location from start 988 Group size: 172
## still one extra byte ```
Not sure why the first group starts from location 20 but I can say with confidence that using `pragma` blocks doesn't solve this issue.
In fact, adding `pragma` made things even worse, because now, even less memory is being assigned. If we check the offset of the variable being corrupted in gdb, it's the same as before:
``` (gdb) print var->offset $1 = 172 (gdb) print group->var_offset $2 = 984 ``` However, since the group size of last group is 172, this 172 offset would corrupt 3 bytes now ... which I can see in the tests that I'm doing.
Another thing to verify would be how can a variable in a group have the offset 172, when the group size is calculated to be 172 (probably something to do with `pragma`).
To check what's going on, I added another integer cfg variable to the last group. The memory mapping this time (without using `pragma` blocks) is as follows:
``` START block address: 0x7fde5c80fa08 vars address: 0x7fde5c80fa0c Total allocated size: 1168 Group number: 1 Group location from start 20 Group size: 248 Group number: 2 Group location from start 284 Group size: 56 Group number: 3 Group location from start 356 Group size: 4 Group number: 4 Group location from start 380 Group size: 4 Group number: 5 Group location from start 404 Group size: 4 Group number: 6 Group location from start 428 Group size: 176 Group number: 7 Group location from start 620 Group size: 104 Group number: 8 Group location from start 740 Group size: 200 Group number: 9 Group location from start 956 Group size: 16 Group number: 10 Group location from start 988 Group size: 180
#For the new variable added
(gdb) print var->offset $3 = 176 (gdb) print group $4 = (cfg_group_t *) 0x7fdee1bca108 (gdb) print group->var_offset $5 = 984 ```
So, this time, the offset and group size make sense.
According to me, using `pragma` blocks would create more problems. The next thing to investigate would be this inconsistency between group size and variable offset.
Can you give a simple kamailio.cfg and the commands you use to set the vars that expose the issue? It will help to reproduce and troubleshoot locally.
unfortunately, I have these distributed over multiple files. However, the way I set it up looks something like these blocks:
``` . . . #!ifdef A group_name.a = A #!else group_name.a = 0 #!endif . . . ```
the variable used first in the file (`a` in this case) ends up being the last variable in the cfg block. Hence, that's the one that gets corrupted.
This doesn't happen always. We might have hit an edge case where this is happening so it might be hard for you to duplicate it.
To set these variables later, I just use:
``` kamctl kamcmd cfg.set group_name a 1 ```
It is usually during these set operations that the value gets corrupted.
It is up to you how you split your configs and I do not need your full config. But I think you can just make a minimal config file, without #!ifdefs or other hidden values, that I can use here to reproduce. Otherwise, it is hard to help.
I was able to reproduce this using a sample cfg:
``` #!KAMAILIO
sample_group.a = 0
sample_group.b = 0
sample_group.c = 0
sample_group.d = 0
sample_group.e = 0
sample_group.f = 0
sample_group.g = 0
sample_group.h = 0
sample_group.i = 0
sample_group.j = 0
sample_group.k = 0
sample_group.l = 0
sample_group.m = 0
sample_group.n = 0
sample_group.o = ""
sample_group.p = 0
sample_group.q = 0
sample_group.r = 4000
sample_group.s = "null"
sample_group.t = "null"
sample_group.u = "null"
sample_group.v = "null"
sample_group.w = 0
sample_group.x = 0
sample_group.y = 0
sample_group.z = 0
sample_group.a1 = 0 ```
The memory mapping is as follows:
``` START block address: 0x7f43ed1484d8 vars address: 0x7f43ed1484dc Total allocated size: 1352 Group number: 1 Group location from start 20 Group size: 248 Group number: 2 Group location from start 284 Group size: 56 Group number: 3 Group location from start 356 Group size: 4 Group number: 4 Group location from start 380 Group size: 4 Group number: 5 Group location from start 404 Group size: 4 Group number: 6 Group location from start 428 Group size: 176 Group number: 7 Group location from start 620 Group size: 104 Group number: 8 Group location from start 740 Group size: 200 Group number: 9 Group location from start 956 Group size: 16 Group number: 10 Group location from start 988 Group size: 172 Group number: 11 Group location from start 1180 Group size: 172 ```
As you can see that group 10 and 11(created using cfg above) have the same size.
Now, if you try to get the value of sample_group.a and put a breakpoint here:
https://github.com/kamailio/kamailio/blob/master/src/core/cfg/cfg_ctx.c#L131...
i.e. run the following command
`sudo kamctl kamcmd cfg.get sample_group a`
The offsets in gdb are as follows:
``` (gdb) print var->offset $1 = 172 (gdb) print group->var_offset $2 = 1176 ```
Also the value is getting corrupted:
``` sudo kamctl kamcmd cfg.get sample_group a 694019999 ```
if the above returns 0, try running the following script and see if you catch random values:
```
#!/bin/bashfor run in {1..100}; do date; sudo kamctl kamcmd cfg.set sample_group p 1; sudo kamctl kamcmd cfg.get sample_group a; done ```
My output looks like this:
``` . . . Wed Jul 11 18:36:37 UTC 2018 0 Wed Jul 11 18:36:37 UTC 2018 0 Wed Jul 11 18:36:37 UTC 2018 0 Wed Jul 11 18:36:37 UTC 2018 0 Wed Jul 11 18:36:37 UTC 2018 694019999 Wed Jul 11 18:36:37 UTC 2018 0 Wed Jul 11 18:36:37 UTC 2018 694019999 Wed Jul 11 18:36:38 UTC 2018 0 Wed Jul 11 18:36:38 UTC 2018 694019999 Wed Jul 11 18:36:38 UTC 2018 0 Wed Jul 11 18:36:38 UTC 2018 694019999 Wed Jul 11 18:36:38 UTC 2018 0 Wed Jul 11 18:36:38 UTC 2018 . . . ```
OK, that will help -- I will try to reproduce here when I get a chance.
Any news on this?
I was traveling (actually still there) in USA (Cluecon), so I didn't get a chance to try to reproduce, hopefully soon it will happen.
I tried to reproduce using the master branch, run the script couple of times, but no errors, all values are ok.
Can you give the output of `uname -a`?
Can you test in another system with a stock debian or centos?
If no follow up in few days, I am going to close this issue. As written above, I tried to reproduce but no errors.
Re-open if still an issue and you get new data that can help troubleshooting, some asked on previous comments. I tried to reproduce and all was ok.
Closed #1583.
Say you have a kamailio config file with variables ordered as A(int),B(int),C(string),D(int) – (all belonging to the same group). Kamailio first creates a linked list of these variables in a given group and calculates the group size as follows [actual code](https://github.com/kamailio/kamailio/blob/master/src/core/cfg/cfg_script.c#L...)):
``` Pseudo-code --------------------------------------------------------------------------------------------- group->size = 0 group->vars = null // the pointer to the head of the linked list of variables create_group_linked_list: for variable in group: if variable is integer: round up group_size to the nearest 4 byte multiple group_size += sizeof(int) if variable is string: round up group_size to the nearest 8 byte multiple group_size += sizeof(str) //str is a kamailio type (size 16 in our C++ compiler) if group->vars is null: group->vars = variable else variables->next = group->vars group->vars = variable --------------------------------------------------------------------------------------------- Linked list creation
NULL group->size 0
A(int) group->size 4
B(int) -> A group->size 8
C(string) -> B -> A group->size 24
D(int) -> C -> B -> A group->size 28 ``` However, this linked list is not used as is. it is copied to a single shared memory block for all child processes. This is done in the [cfg_shmize()](https://github.com/kamailio/kamailio/blob/master/src/core/cfg/cfg_struct.c#L...) which further calls [cfg_script_fixup()](https://github.com/kamailio/kamailio/blob/master/src/core/cfg/cfg_script.c#L...) Basically, the function goes through the linked list of variables created above and calculates the offset of each variable in the memory block assuming that the offset of the first variable is 0. Following is the pseudo-code of what cfg_script_fixup() does for the linked list and offset calculation:
``` Pseudo-code -------------------------------------------------------------------------------------------- offset = 0 node = group->vars: while node != null: if node's value is int: round up offset to nearest multiple of 4 node->offset = offset offset += sizeof(int) if node's value is str: round up offset to nearest multiple of 8 node->offset = offset offset += sizeof(str) node = node -> next --------------------------------------------------------------------------------------------
linked list (from above): D(int) -> C(string) -> B(int) -> A(int)
offset=0
D D->offset = 0 offset = 4
C C->offset = 8 offset = 24
B B->offset = 24 offset = 28 A A->offset = 28 offset = 32 (which would be the actual group size) ```
So, as visible, ***the order in which the linked list is being processed is causing issues*** (from A to D in one case and from D to A in the next case). while the group size calculation code block leads to a group size of 28, the group size according to the second code block would be 32 and the offset of the last variable would we 28. This would lead to the last variable (all 4 bytes) getting corrupted.
My temporary solution is to recalculate the group size every time a variable is added in the first code block by re-traversing the linked list but is definitely not the ideal solution.