<!-- Kamailio Pull Request Template -->
<!--
IMPORTANT:
  - for detailed contributing guidelines, read:
    https://github.com/kamailio/kamailio/blob/master/.github/CONTRIBUTING.md
  - pull requests must be done to master branch, unless they are backports
    of fixes from master branch to a stable branch
  - backports to stable branches must be done with 'git cherry-pick -x ...'
  - code is contributed under BSD for core and main components (tm, sl, auth, tls)
  - code is contributed GPLv2 or a compatible license for the other components
  - GPL code is contributed with OpenSSL licensing exception
-->
#### Pre-Submission Checklist
<!-- Go over all points below, and after creating the PR, tick all the checkboxes that apply -->
<!-- All points should be verified, otherwise, read the CONTRIBUTING guidelines from above-->
<!-- If you're unsure about any of these, don't hesitate to ask on sr-dev mailing list -->
- [x] Commit message has the format required by CONTRIBUTING guide
- [x] Commits are split per component (core, individual modules, libs, utils, ...)
- [x] Each component has a single commit (if not, squash them into one commit)
- [x] No commits to README files for modules (changes must be done to docbook files in `doc/` subfolder, the README file is autogenerated)

#### Type Of Change
- [ ] Small bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds new functionality)
- [ ] Breaking change (fix or feature that would change existing functionality)

#### Checklist:
<!-- Go over all points below, and after creating the PR, tick the checkboxes that apply -->
- [ ] PR should be backported to stable branches
- [ ] Tested changes locally
- [ ] Related to issue #XXXX (replace XXXX with an open issue number)

#### Description
<!-- Describe your changes in detail -->
This PR fixes the conversion from UCS-2 to UTF-8 and vice versa. Previously, the code did not handle emojis and certain other characters.

You can view, comment on, or merge this pull request online at:
https://github.com/kamailio/kamailio/pull/3546
-- Commit Summary --
* smsops: Fix conversion from UCS-2 to UTF-8 and viceversa
-- File Changes --
M src/modules/smsops/smsops_impl.c (219)
-- Patch Links --
https://github.com/kamailio/kamailio/pull/3546.patch
https://github.com/kamailio/kamailio/pull/3546.diff
Here is the trace captured after applying the fix, in which emojis are handled:
[working_ucs2_conversion.zip](https://github.com/kamailio/kamailio/files/12396498/working_ucs2_conversion....)
@henningw requested changes on this pull request.
Thanks for the pull request. I reviewed the changes, but of course did not test the multiple paths of the UTF-8 conversion with the different lengths. Two remarks from my side:

- Please fix the code formatting by running "clang-format" on the changed files (as indicated by the failing test).
- Please move your variable definitions (like high_surrogate, low_surrogate, etc.) from inside the function to the top of the function. This is a convention we try to follow in most cases in the code.

Then please force-push the changes to the commit in this PR to update it here. Other developers might also comment before it gets merged in some time.
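
To illustrate the second remark, here is a minimal sketch of a function that declares all its variables at the top before using them. The helper name `combine_surrogates` and its signature are hypothetical and not taken from `smsops_impl.c`; only the variable names `high_surrogate` and `low_surrogate` come from the review. It combines a UTF-16 surrogate pair into a single code point, which is the step needed for emojis.

```c
#include <stdint.h>

/* Hypothetical helper (not the actual smsops code): combine a UTF-16
 * surrogate pair into one Unicode code point.
 * Returns 0 if the two units do not form a valid pair. */
static uint32_t combine_surrogates(uint16_t first, uint16_t second)
{
    /* all variable definitions sit at the top of the function */
    uint32_t high_surrogate;
    uint32_t low_surrogate;
    uint32_t codepoint;

    if(first < 0xD800 || first > 0xDBFF || second < 0xDC00
            || second > 0xDFFF) {
        return 0;
    }

    high_surrogate = (uint32_t)first - 0xD800;
    low_surrogate = (uint32_t)second - 0xDC00;
    codepoint = 0x10000 + ((high_surrogate << 10) | low_surrogate);

    return codepoint;
}
```

For example, the pair `0xD83D`, `0xDE00` would yield U+1F600 with this sketch.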
Thank you for the review. Will address your comments.
@herlesupreeth pushed 1 commit.
75a96b5e400ac7c828347a343a93e55a64137385 smsops: Fix conversion from UCS-2 to UTF-8 and viceversa
@henningw approved this pull request.
Thanks for the quick feedback and improvements.
Merged #3546 into master.
Thanks for the contribution - looks good to me & merged!
@herlesupreeth: commenting on the code added, because a static code analyser reports possible issues like:
```
614     codepoint = ((utf8_char & 0x07) << 18)
615             | (((unsigned char)utf8[utf8_index++] & 0x3F) << 12)
616             | (((unsigned char)utf8[utf8_index++] & 0x3F) << 6)
617             | ((unsigned char)utf8[utf8_index++] & 0x3F);

In "((utf8_char & 7) << 18) | (((unsigned char)utf8[utf8_index++] & 0x3f) << 12) | (((unsigned char)utf8[utf8_index++] & 0x3f) << 6)", "utf8_index" is written in "((unsigned char)utf8[utf8_index++] & 0x3f) << 6" and written in "((utf8_char & 7) << 18) | (((unsigned char)utf8[utf8_index++] & 0x3f) << 12)" but the order in which the side effects take place is undefined because there is no intervening sequence point.
```
Likely it is about the fact that in a bitwise OR expression `x | y` the standard does not enforce an evaluation order, so `y` can be evaluated before `x`.
For the code of this PR, this means that `utf8_index` may be incremented first by a sub-expression further to the right, before the ones to its left, which might give unpredictable results. It might be better to use `[utf8_index+1]`, `[utf8_index+2]` ... and update the value of `utf8_index` afterwards.
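
A minimal sketch of that suggestion, not the actual fix: the helper name `decode_utf8_4byte` and its signature are hypothetical, and it assumes the lead byte of the 4-byte sequence sits at `utf8[*utf8_index]` and that the caller has already checked that four bytes are available. Each byte is read through a fixed offset and the index is written exactly once at the end, so the result no longer depends on evaluation order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the actual smsops code): decode one 4-byte
 * UTF-8 sequence whose lead byte is at utf8[*utf8_index].
 * The caller must ensure that 4 bytes are readable from that position. */
static uint32_t decode_utf8_4byte(const char *utf8, size_t *utf8_index)
{
    size_t i = *utf8_index;
    uint32_t codepoint;

    /* no side effects inside the expression: only fixed offsets */
    codepoint = ((uint32_t)((unsigned char)utf8[i] & 0x07) << 18)
                | ((uint32_t)((unsigned char)utf8[i + 1] & 0x3F) << 12)
                | ((uint32_t)((unsigned char)utf8[i + 2] & 0x3F) << 6)
                | (uint32_t)((unsigned char)utf8[i + 3] & 0x3F);

    /* single, explicit update of the index after the bytes were read */
    *utf8_index = i + 4;

    return codepoint;
}
```

With the bytes `F0 9F 98 80` this sketch would produce U+1F600 and advance the index by 4.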
@miconda Ah, I see. Thanks for pointing it out. I will open a new PR with that fixed over the weekend.