@herlesupreeth: commenting on the code added, because a static code analyser reports possible issues like:
```
codepoint = ((utf8_char & 0x07) << 18)
          | (((unsigned char)utf8[utf8_index++] & 0x3F) << 12)
          | (((unsigned char)utf8[utf8_index++] & 0x3F) << 6)
          | ((unsigned char)utf8[utf8_index++] & 0x3F);

In "((utf8_char & 7) << 18) | (((unsigned char)utf8[utf8_index++] & 0x3f) << 12) | (((unsigned char)utf8[utf8_index++] & 0x3f) << 6)", "utf8_index" is written in "((unsigned char)utf8[utf8_index++] & 0x3f) << 6" and written in "((utf8_char & 7) << 18) | (((unsigned char)utf8[utf8_index++] & 0x3f) << 12)" but the order in which the side effects take place is undefined because there is no intervening sequence point.
```
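This is most likely because, in a bitwise OR expression `x | y`, the standard does not specify the order in which the operands are evaluated, so `y` may be evaluated before `x`. Modifying `utf8_index` more than once in the same expression without an intervening sequence point is, as the analyser notes, undefined.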
In this PR's code, `utf8_index` could therefore be incremented by a sub-expression further to the right before the ones to its left are evaluated, which could give unpredictable results. It might be better to index with fixed offsets (`[utf8_index+1]`, `[utf8_index+2]`, ...) and update `utf8_index` afterwards.
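A minimal sketch of what I mean, not a drop-in patch. The helper name `decode_4byte` and its signature are hypothetical. I am also assuming the lead byte (`utf8_char`) has already been consumed, so `utf8_index` points at the first continuation byte. If that is not how the surrounding code works, the offsets would need shifting by one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the PR): combines the lead byte of a
 * 4-byte UTF-8 sequence with its three continuation bytes.
 * Assumption: *utf8_index points at the first continuation byte. */
static uint32_t decode_4byte(unsigned char utf8_char,
                             const char *utf8, size_t *utf8_index)
{
    /* Read the index once; no side effects inside the OR expression. */
    size_t i = *utf8_index;

    uint32_t codepoint =
          ((uint32_t)(utf8_char & 0x07) << 18)
        | ((uint32_t)((unsigned char)utf8[i]     & 0x3F) << 12)
        | ((uint32_t)((unsigned char)utf8[i + 1] & 0x3F) << 6)
        |  (uint32_t)((unsigned char)utf8[i + 2] & 0x3F);

    /* Advance past the three continuation bytes in a separate statement. */
    *utf8_index = i + 3;
    return codepoint;
}
```

With fixed offsets, each operand of `|` only reads `utf8_index`, so the result no longer depends on the evaluation order. The sketch leaves out bounds checking and validation of the continuation bytes.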