Hello,

One of the reasons I used libunistring for this detection is that for pretty much all of the code fragments I found online for doing this in a "simple" way I saw people pointing out flaws in those algorithms.  Can you confirm that this code doesn't have any of those flaws and is guaranteed to work in all cases (has this implementation been stubbed out and tested fully by someone here)?

What new systems does libunistring not work on?  I have only ever had problems with it on older OS versions which didn't contain it at all.

Does this really improve performance?  Only a tiny, tiny, subset of libunistring is used.  As a result it doesn't really matter if libunistring in general is slow, just whether or not the one function used from libunistring is slow.

It also looks like the code style (particularly indentation) of that patch doesn't really match the style from that file.  If that really is the case it is probably worth aligning the patch to the style in that file before applying and committing it.

Regards,

Peter


On 20 December 2013 11:29, Timo Teras <timo.teras@iki.fi> wrote:
Hi,

Yes - it is MIT license in the webpage. Might be a good thing to
mention that in the created header file too.

- Timo

On Fri, 20 Dec 2013 12:08:10 +0100
Daniel-Constantin Mierla <miconda@gmail.com> wrote:

> getting rid of libunistring is good, indeed, thanks.
>
> One thing that has to be cared with the websocket module is the need
> to link against libssl, so if the new included code is gpl we need to
> have an exception for it to comply with official debian packaging
> requirements.
>
> The page lists a different license than gpl, it seems to be MIT,
> which I guess is all fine, being compatible with GPL and allowing
> linking with libssl. But I wanted to highlight in case anyone else
> has different opinion/more details.
>
> Cheers,
> Daniel
>
>
> On 20/12/13 11:47, Timo Teräs wrote:
> > libunistring is old, slow, messy code and uncompilable on new
> > systems. remove the sole user of it, and replace it with inline
> > utf8 decoder implementation from
> > http://bjoern.hoehrmann.de/utf-8/decoder/dfa/.
> >
> > improves performance and portability as libunistring is not needed.
> > ---
> > @Peter, Hugh: Would it be ok for me to push this?
> >
> >   modules/websocket/Makefile                |  2 +-
> >   modules/websocket/README                  |  1 -
> >   modules/websocket/doc/websocket_admin.xml |  3 --
> >   modules/websocket/utf8_decode.h           | 52
> > +++++++++++++++++++++++++++++++
> > modules/websocket/ws_frame.c              |  4 +-- 5 files changed,
> > 55 insertions(+), 7 deletions(-) create mode 100644
> > modules/websocket/utf8_decode.h
> >
> > diff --git a/modules/websocket/Makefile b/modules/websocket/Makefile
> > index bb7c809..c686a82 100644
> > --- a/modules/websocket/Makefile
> > +++ b/modules/websocket/Makefile
> > @@ -27,7 +27,7 @@ else
> >     #       E.g.: make TLS_HOOKS=1 TLS_EXTRA_LIBS="-lz -lkrb5"
> >   endif
> >
> > -LIBS+= $(TLS_EXTRA_LIBS) -lunistring
> > +LIBS+= $(TLS_EXTRA_LIBS)
> >
> >   # Static linking, if you'd like to use TLS and WEBSOCKET at the
> > same time #
> > diff --git a/modules/websocket/README b/modules/websocket/README
> > index 49d8693..bdba3e4 100644
> > --- a/modules/websocket/README
> > +++ b/modules/websocket/README
> > @@ -316,7 +316,6 @@ onreply_route[WS_REPLY] {
> >      The following libraries must be installed before running
> > Kamailio with this module loaded:
> >        * OpenSSL.
> > -     * GNU libunistring.
> >
> >   4. Parameters
> >
> > diff --git a/modules/websocket/doc/websocket_admin.xml
> > b/modules/websocket/doc/websocket_admin.xml index fa7d300..e40dc09
> > 100644 --- a/modules/websocket/doc/websocket_admin.xml
> > +++ b/modules/websocket/doc/websocket_admin.xml
> > @@ -262,9 +262,6 @@ onreply_route[WS_REPLY] {
> >             <listitem>
> >             <para><emphasis>OpenSSL</emphasis>.</para>
> >             </listitem>
> > -           <listitem>
> > -           <para><emphasis>GNU libunistring</emphasis>.</para>
> > -           </listitem>
> >             </itemizedlist>
> >             </para>
> >     </section>
> > diff --git a/modules/websocket/utf8_decode.h
> > b/modules/websocket/utf8_decode.h new file mode 100644
> > index 0000000..b274fe7
> > --- /dev/null
> > +++ b/modules/websocket/utf8_decode.h
> > @@ -0,0 +1,52 @@
> > +#include <stdint.h>
> > +#include <stddef.h>
> > +
> > +// Copyright (c) 2008-2010 Bjoern Hoehrmann <bjoern@hoehrmann.de>
> > +// See http://bjoern.hoehrmann.de/utf-8/decoder/dfa/ for details.
> > +
> > +#define UTF8_ACCEPT 0
> > +#define UTF8_REJECT 12
> > +
> > +static const uint8_t utf8d[] = {
> > +  // The first part of the table maps bytes to character classes
> > that
> > +  // to reduce the size of the transition table and create
> > bitmasks.
> > +   0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
> > 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
> > +   0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
> > 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
> > +   0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
> > 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
> > +   0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
> > 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
> > +   1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
> > 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
> > +   7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
> > 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
> > +   8,8,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
> > 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
> > +  10,3,3,3,3,3,3,3,3,3,3,3,3,4,3,3,
> > 11,6,6,6,5,8,8,8,8,8,8,8,8,8,8,8, +
> > +  // The second part is a transition table that maps a combination
> > +  // of a state of the automaton and a character class to a state.
> > +   0,12,24,36,60,96,84,12,12,12,48,72,
> > 12,12,12,12,12,12,12,12,12,12,12,12,
> > +  12, 0,12,12,12,12,12, 0,12, 0,12,12,
> > 12,24,12,12,12,12,12,24,12,24,12,12,
> > +  12,12,12,12,12,12,12,24,12,12,12,12,
> > 12,24,12,12,12,12,12,12,12,24,12,12,
> > +  12,12,12,12,12,12,12,36,12,36,12,12,
> > 12,36,12,12,12,12,12,36,12,36,12,12,
> > +  12,36,12,12,12,12,12,12,12,12,12,12,
> > +};
> > +
> > +static inline uint32_t decode(uint32_t* state, uint32_t* codep,
> > uint32_t byte) +{
> > +  uint32_t type = utf8d[byte];
> > +
> > +  *codep = (*state != UTF8_ACCEPT) ?
> > +    (byte & 0x3fu) | (*codep << 6) :
> > +    (0xff >> type) & (byte);
> > +
> > +  *state = utf8d[256 + *state + type];
> > +  return *state;
> > +}
> > +
> > +static inline int IsUTF8(uint8_t* s, size_t len)
> > +{
> > +  uint32_t codepoint, state = 0;
> > +
> > +  while (len--)
> > +    decode(&state, &codepoint, *s++);
> > +
> > +  return state == UTF8_ACCEPT;
> > +}
> > +
> > diff --git a/modules/websocket/ws_frame.c
> > b/modules/websocket/ws_frame.c index a3a4cef..3562437 100644
> > --- a/modules/websocket/ws_frame.c
> > +++ b/modules/websocket/ws_frame.c
> > @@ -22,7 +22,7 @@
> >    */
> >
> >   #include <limits.h>
> > -#include <unistr.h>
> > +#include "utf8_decode.h"
> >   #include "../../events.h"
> >   #include "../../receive.h"
> >   #include "../../stats.h"
> > @@ -695,7 +695,7 @@ int ws_frame_transmit(void *data)
> >     frame.fin = 1;
> >     /* Can't be sure whether this message is UTF-8 or not so
> > check to see if it "might" be UTF-8 and send as binary if it
> > definitely isn't */
> > -   frame.opcode = (u8_check((uint8_t *) wsev->buf, wsev->len)
> > == NULL) ?
> > +   frame.opcode = IsUTF8((uint8_t *) wsev->buf, wsev->len) ?
> >                             OPCODE_TEXT_FRAME :
> > OPCODE_BINARY_FRAME; frame.payload_len = wsev->len;
> >     frame.payload_data = wsev->buf;
>
>




--
Peter Dunkley
Technical Director
Crocodile RCS Ltd