Initial DB Merge Proposal

List overview All Threads
Download

newer

older

git:master: snmpstats (k): renamed...

[tracker] Comment added: Local 487...

Jan Janak

23 Sep 2009 23 Sep '09

1:41 a.m.

Hello everybody,

If we ever want to proceed to merging individual modules, we will need to address differences in our database schemas, because many modules depend on them. I would like to start the discussion off, so I prepared an initial proposal for the merge. You can find the HTML version of the document here:

http://iptel.org/jan/db_merge

Plain text version of the document is attached to this e-mail. It is a long document (90k) and you don't have to read it all. The document consists of two major parts.

In section "Concepts" I have tried to describe some of the techniques and concepts we use in the new SER database schema. I only described things that we do differently now than Kamailio or older SER versions. I also tried to work in a bit of historic perspective so that we know how things evolved, what worked and what didn't and so on.

In the second part of the document I went through all SER and Kamailio tables and tried to capture everything that came to my mind regarding differences, merge proposals, purpose of the tables, and so on. For most table I also tried to write down an initial merge proposal along compatibility considerations.

I classified all the tables from both schemas into five groups: * Easy tables: Tables in this group are very easy to marge, usually no changes are required.

* UID tables: This is a group of tables where we might need to do small changes in the table defintion mainly related to UIDs (unique user identifiers)--if we decide to use them in sip-router.

* Other tables: A bunch of tables where we need to be careful with merging.

* Undecided tables: I have no opinion on tables from this group yet, usually because I do not understand what are they really used for.

* serweb tables: SER database schema has a number of tables that only serweb uses.

Please note that this is nothing more than an initial draft proposal and as such it is likely to be biased towards SER database schema. Also the language in the document is not as polished as it should be and there are parts that are probably hard to read, I haven't had the time to edit it yet, I'm sorry.

It would be really great if we could discuss at least part of it at the next meeting in Berlin and if people think that we should proceed with this type of work then we can perhaps start merging some of the easy tables. I also prepared a presentation for the stuff in section "Concepts", it takes about 30 minutes and if we have a slot on the agenda then I would like to present it.

Hopefully the document will be useful, I welcome any feedback or questions!

Jan.

Attachments:

db_merge.org (application/octet-stream — 89.4 KB)

Show replies by date

Grzegorz Stanislawski

24 Sep 24 Sep

12:59 a.m.

Jan Janak pisze:

...

Hello everybody,

If we ever want to proceed to merging individual modules, we will need to address differences in our database schemas, because many modules depend on them. I would like to start the discussion off, so I prepared an initial proposal for the merge. You can find the HTML [..] Hopefully the document will be useful, I welcome any feedback or questions!

Hi there.

Maybe its good point to reconsider whole approach to database behind ser. Ser itself provides lots of flexibility and freedom to processing of sip messages and it is its main strength, however it provide no flexibility at all to structure of database which drives it, and me and many people i know it is its main weakness. I of course know there are modules like db_ops, using which one may do arbitrary queries, but you cannot use it to fetch most important data like credentials. Of course one can rename almost every column by setting appropriate modparam in config file, but different field names are just not sufficient in some cases. Also storage model of avp, very good and useful imo, is not acceptable for many situations and for many users. I know one can use avpops db_scheme but it looks like hack to achieve a feature i'm talking about, and its done by some extra queries, which would be unneeded.

What i'd like to propose is to allow user enter its own query string (as modparam) which would be executed by certain module in certain situation. What is to be decided is what fields module require in result, what type they have to be and in what order. and also what data may be provided in query do match right result. example usage would be modparam("auth_db","cred_query"."select passwd,flags from credentials where user='$au' and realm='$ar'"); or if this is to difficult to implement maybe: modparam("auth_db","cred_query"."select passwd,flags from credentials where user='%1' and realm='%2'"); while documentation would state that %1 is macro which will resolve to auth username %2 to auth realm and so on. And module expects at least 2 column in result where first one has to be string and contain user password, and second has to be int and contain flags. Going further we could make that if more columns available would be converted to user avps using column names as attribute names. (domain module could do same thing for domain attrs.)

For acc module in other example would be something like: modparam("acc_db","req_query","insert into acc values ($something)"); modparam("acc_db","reply_query","update acc set something=$something where whatever"); and so on

I'd like to notice that that scheme is used successfully for example in freeradius or gnugk.

I know it can break compatibility with old scripts, however i can imagine that modules can support both ways simply by checking if *_query modparam is defined, and use new way if yes and old way if not. I'm concerned that it will improve speed as query string can be passed (after resolving macros of pvs or avps) directly to db engine (currently there is a lot of code which builds structures for db_api, and then db driver builds query from that) I realize of course that overall performance may fall if user create some neck breaking queries, but it is his choice.

What do You think?

...

Jan.

Grzegorz Stanislawski

Martin Hoffmann

7:02 a.m.

Grzegorz Stanislawski wrote:

...

What i'd like to propose is to allow user enter its own query string (as modparam) which would be executed by certain module in certain situation.

Have you considered using database views? This should give you the freedom you need in your own database definition and maps it to the well-known database model for SR.

Keeping such a well-known model adds a layer of dependability. Everyone who knows SR will be able to query relevant information right away. And from the view definitions, this person will immediately see how it maps to your internal data model.

Regards, Martin

Jan Janak

10:51 a.m.

On Thu, Sep 24, 2009 at 9:02 AM, Martin Hoffmann martin.hoffmann@telio.ch wrote:

...

Grzegorz Stanislawski wrote:

...
What i'd like to propose is to allow user enter its own query string (as modparam) which would be executed by certain module in certain situation.

Have you considered using database views? This should give you the freedom you need in your own database definition and maps it to the well-known database model for SR.

Keeping such a well-known model adds a layer of dependability. Everyone who knows SR will be able to query relevant information right away. And from the view definitions, this person will immediately see how it maps to your internal data model.

You can still do it. But I think Grzegorz has a point in saying that instead configuring just column names we should make it possible to configure the whole query.

That's something that was not possible to do in older SER versions due to the way SQL queries were built and executed, but it is possible with the newer version of the database api.

Jan.

Grzegorz Stanislawski

8:38 p.m.

Martin Hoffmann pisze:

...

Grzegorz Stanislawski wrote:

...
What i'd like to propose is to allow user enter its own query string (as modparam) which would be executed by certain module in certain situation.

Have you considered using database views? This should give you the freedom you need in your own database definition and maps it to the well-known database model for SR.

Keeping such a well-known model adds a layer of dependability. Everyone who knows SR will be able to query relevant information right away. And from the view definitions, this person will immediately see how it maps to your internal data model.

Everyone, even who knows sr, will have to spend few days analyzing ser.cfg before he knows which information is important and where to find it, as most data for 'business logic' is stored in avps and those are named and used according to imagination of original script writer. Of course that database views and db_schemes can be used to retrieve important data from virtually any data model, special route blocks can do some lookups,checks and perform some actions, but sometimes it gets so complicated that person who have written it can get lost.

Someone might want to use a stored procedure in sql, instead of having simple select from credentials, which may do some internal checks (if account is enabled, customer has paid his bill etc) before it returns a credentials for authentication in ser. Why? because he don't have to write all that in ser.cfg, or just because his boss prefers this way.

I'm actually concerned that it would be easier to get familiar with someone else's script when there is fixed number of queries do analyze, than having to do reverse engineering of views, db_schemes or db_ops queries scattered all over the script.

...

Regards, Martin

Grzegorz Stanislawski

Jan Janak

10:28 a.m.

On Thu, Sep 24, 2009 at 2:59 AM, Grzegorz Stanislawski stangrze@netitel.pl wrote:

...

Jan Janak pisze:

...
Hello everybody,

If we ever want to proceed to merging individual modules, we will need to address differences in our database schemas, because many modules depend on them. I would like to start the discussion off, so I prepared an initial proposal for the merge. You can find the HTML [..] Hopefully the document will be useful, I welcome any feedback or questions!

...
Hi there.

What i'd like to propose is to allow user enter its own query string (as modparam) which would be executed by certain module in certain situation. What is to be decided is what fields module require in result, what type they have to be and in what order. and also what data may be provided in query do match right result. example usage would be modparam("auth_db","cred_query"."select passwd,flags from credentials where user='$au' and realm='$ar'"); or if this is to difficult to implement maybe: modparam("auth_db","cred_query"."select passwd,flags from credentials where user='%1' and realm='%2'"); while documentation would state that %1 is macro which will resolve to auth username %2 to auth realm and so on. And module expects at least 2 column in result where first one has to be string and contain user password, and second has to be int and contain flags.

Yes, I agree that it would be better to have the query configurable instead of configuring just column names. Currently most modules have column names configurable, but it is done this way for historic reasons, mainly because it was not possible to check column types and results in older versions of SER.

Regarding your two examples; the second example is actually easier to implement than the first one. To be more precise, it is trivial to implement in modules that come from SER and use libsrdb2. SER modules do not assemble the SQL query at runtime anymore, instead they compile it when SER initializes. The compiled query is sent to the database server and all return columns and column types are verified to ensure that the number of returned columns and their types match what the module expects.

If you rewrite your second example as: modparam("auth_db", "cred_query", "select passwd,flags from credentials where user=? and realm=?")

then implementing this parameter in auth_db module takes only a couple lines of code, because the value of the parameter is exactly the query string that the module builds and passes to the database api. There would be no need to check the number of returned columns or query parameters because this is what mysql does for us. We would just need to document that the query is given two argumens by the module and they should be denoted by question marks (in case of mysql). This would behave as expected, if you configure a query that does not return correct columns or takes more or less arguments then SER refuses to start.

Your first example with pseudo-variables inside the query string is more difficult to implement because we would need to parse the query string in SER to obtain a list of pseudo-variables, build the data structures for query parameters according to the number of pseudo-variables used inside the query and then resolve the values of those parameters at runtime when the module actually executes the query. Note that it not that difficult to do, most of the functions we would need to achieve this have already been implemented and are available in SER. But it takes more work and bigger changes in modules.

But I agree that this kind of flexibility would be good to have for special cases. Maybe we could start by implementing the easier version, that would give us the possibility to configure the query string but not its parameters. That's trivial to implement. Later we could add support for arbitrary parameters that are resolved to AVPs, selects, PVs, whatever.

...

Going further we could make that if more columns available would be converted to user avps using column names as attribute names. (domain module could do same thing for domain attrs.)

I am not sure about this yet. I mean it is doable, butthere would be some precautions we would need to take, for example, to ensure that select * does not overview anything important.

...

I know it can break compatibility with old scripts, however i can imagine that modules can support both ways simply by checking if *_query modparam is defined, and use new way if yes and old way if not.

Yes.

...

I'm concerned that it will improve speed as query string can be passed (after resolving macros of pvs or avps) directly to db engine (currently there is a lot of code which builds structures for db_api, and then db driver builds query from that)

SER modules (those that come from ser and not kamailio) won't be affected because they do not build SQL queries at runtime anymore. This is done just once in mod_init and the query is then executed repeatedly at runtime with different argument values.

...

I realize of course that overall performance may fall if user create some neck breaking queries, but it is his choice.

What do You think?

I think it would be definitely worth doing, especially the easy variant without configurable query arguments. That would give us extra flexibility with only minimal changes in the code. We can extend it and add support for selects, AVPs, or pseudo-variables when we have that.

Jan.

Grzegorz Stanislawski

10:17 p.m.

Jan Janak pisze:

...

On Thu, Sep 24, 2009 at 2:59 AM, Grzegorz Stanislawski stangrze@netitel.pl wrote:

...
What i'd like to propose is to allow user enter its own query string (as modparam) which would be executed by certain module in certain situation.

Yes, I agree that it would be better to have the query configurable instead of configuring just column names. Currently most modules have [..] special cases. Maybe we could start by implementing the easier version, that would give us the possibility to configure the query string but not its parameters. That's trivial to implement. Later we could add support for arbitrary parameters that are resolved to AVPs, selects, PVs, whatever.

Version capable of resolving variables would be very useful for accounting, and this is actually already done for db_extra modparam.

While having variable resolving for lookup queries might be overkill especially that in some cases some PVs or AVPs are not yet known which may lead to errors and confusion, i'd suggest not to trivialize it to syntax of mysql_prepare. Having well defined %1 %2 or something like this, would allow to for example use "select .. from credentials where user=%1 and domain=%2" when multidomain support is enabled and "select .. .. where user=%1" when not. simple parser in module init could change % macros to ? and add certain keys to list for further execution.

...

...
Going further we could make that if more columns available would be converted to user avps using column names as attribute names. (domain module could do same thing for domain attrs.)

I am not sure about this yet. I mean it is doable, butthere would be some precautions we would need to take, for example, to ensure that select * does not overview anything important.

From what i have learned, in most cases, there is a fixed small set of quite stable (not changed often) attributes which are used every time packet is processed. It might be better to have it loaded in one query together with credentials/domainid, (or even stored in domain cache). Having attributes in one row with user credentials is not something unusual, expression like modparam("avpops","db_scheme","email_scheme:table=subscriber;value_col=email_address;value_type=string") is common in many scripts. My intention is not to create table like in asterisk realtime which have 50 columns or so, but 0-10 looks reasonable. Other avp may be loaded when needed usual way from usual source (user_attrs/domain_attrs/avp_db's whatever)

I realize that it may complicate how this data is managed from mi/rpc side, bat maybe they can be marked just by another bit in flags and than handled accordingly.

Regarding 'select *' issue i don't think it should be our concern. User who is smart enough to handle writing of sr.cfg should be able realize what he is doing. And if he decides to do it he do at his own risk.

...

Jan.

Grzegorz Stanislawski

Jan Janak

25 Sep 25 Sep

11:04 a.m.

On Fri, Sep 25, 2009 at 12:17 AM, Grzegorz Stanislawski stangrze@netitel.pl wrote:

...

...
Maybe we could start by implementing the easier version, that would give us the possibility to configure the query string but not its parameters. That's trivial to implement. Later we could add support for arbitrary parameters that are resolved to AVPs, selects, PVs, whatever.

Version capable of resolving variables would be very useful for accounting, and this is actually already done for db_extra modparam.

While having variable resolving for lookup queries might be overkill especially that in some cases some PVs or AVPs are not yet known which may lead to errors and confusion, i'd suggest not to trivialize it to syntax of mysql_prepare. Having well defined %1 %2 or something like this, would allow to for example use "select .. from credentials where user=%1 and domain=%2" when multidomain support is enabled and "select .. .. where user=%1" when not. simple parser in module init could change % macros to ? and add certain keys to list for further execution.

That's not trivial to implement anymore, unlike my simplified proposal with '$' above. It's not only about the parser, you have to change the way how you pass parameters to the query in the module and that's the more complicated part. I am not saying that it is difficult to do, but it takes some planning and coding.

And if we eventually do something like this, I'd prefer to do that one little extra step and implement support for selects, PVs, and AVPs instead of escape sequences like %1 and %2. The query from your example might then look like this:

select ... where user=@authorization.username and domain=@from.uri.host

Internally escape sequences like %1 and %2 would almost always be resolved to selects or PVs anyway simply because there are so many of them, they are convenient to use and they can extract almost anything from SIP messages or the proxy environment.

...

[...] From what i have learned, in most cases, there is a fixed small set of quite stable (not changed often) attributes which are used every time packet is processed. It might be better to have it loaded in one query together with credentials/domainid, (or even stored in domain cache). Having attributes in one row with user credentials is not something unusual, expression like modparam("avpops","db_scheme","email_scheme:table=subscriber;value_col=email_address;value_type=string") is common in many scripts.

There is no need to store any attributes in the credentials table. You can get everything with one query from both credentials and user_attrs tables like this:

In the example above columns uid and password come from table credentials, columns name and value are the AVPs for the user stored in user_attrs table.

Minimizing the number of queries per SIP message is something I have had on my todo list for quite a while, including the optimization above. I have done some private experiments with this stuff and managed to cut the number of queries down to 2 per SIP message with MySQL. Those two queries loaded all the attributes, digest credentials, URIs and contacts from location table.

So yes, cutting down the number of queries is another thing worth doing. I did not want throw this in yet because we haven't agreed on a database schema and having a databases schema that supports this efficiently is a precondition for such optimizations. So right now we need to agree on the database schema for sip-router.

Jan.

Jan Janak

11:10 a.m.

On Fri, Sep 25, 2009 at 1:04 PM, Jan Janak jan@ryngle.com wrote:

...

On Fri, Sep 25, 2009 at 12:17 AM, Grzegorz Stanislawski stangrze@netitel.pl wrote:

...
...
Maybe we could start by implementing the easier version, that would give us the possibility to configure the query string but not its parameters. That's trivial to implement. Later we could add support for arbitrary parameters that are resolved to AVPs, selects, PVs, whatever.

Version capable of resolving variables would be very useful for accounting, and this is actually already done for db_extra modparam.

While having variable resolving for lookup queries might be overkill especially that in some cases some PVs or AVPs are not yet known which may lead to errors and confusion, i'd suggest not to trivialize it to syntax of mysql_prepare. Having well defined %1 %2 or something like this, would allow to for example use "select .. from credentials where user=%1 and domain=%2" when multidomain support is enabled and "select .. .. where user=%1" when not. simple parser in module init could change % macros to ? and add certain keys to list for further execution.

That's not trivial to implement anymore, unlike my simplified proposal with '$' above. It's not only about the parser, you have to change the way how you pass parameters to the query in the module and that's the more complicated part. I am not saying that it is difficult to do, but it takes some planning and coding.

And if we eventually do something like this, I'd prefer to do that one little extra step and implement support for selects, PVs, and AVPs instead of escape sequences like %1 and %2. The query from your example might then look like this:

select ... where user=@authorization.username and domain=@from.uri.host

Internally escape sequences like %1 and %2 would almost always be resolved to selects or PVs anyway simply because there are so many of them, they are convenient to use and they can extract almost anything from SIP messages or the proxy environment.

And I forgot to mention that patches are very much welcome, so if you implement any of this, I'd be happy to include it.

Thanks! Jan.

Grzegorz Stanislawski

28 Sep 28 Sep

12:19 a.m.

Jan Janak pisze:

...

And if we eventually do something like this, I'd prefer to do that one little extra step and implement support for selects, PVs, and AVPs instead of escape sequences like %1 and %2. The query from your example might then look like this:

I just thought that simplified parser would be better than involving parser for selects and PVs, but i'll not insist on that ;-)

...

There is no need to store any attributes in the credentials table. You can get everything with one query from both credentials and user_attrs

There might be a reason from business logic point of view, this may be a table which is used for number of other purposes (like other services) and may contain some critical data like "account enabled" and/or "account locked" (yes i know that this particular example can be "workarounded" by using database views)

...

Minimizing the number of queries per SIP message is something I have had on my todo list for quite a while, including the optimization above. I have done some private experiments with this stuff and managed to cut the number of queries down to 2 per SIP message with MySQL. Those two queries loaded all the attributes, digest credentials, URIs and contacts from location table.

My personal experience from other areas shows, that sometimes its better to do more simpler queries than constructing excessive joins, but this may be subjective view.

...

[..] efficiently is a precondition for such optimizations. So right now we need to agree on the database schema for sip-router.

Ok, so i guess i'm going to shut up (for) now ;-)

...

Jan.

Grzegorz Stanislawski

5762

Age (days ago)

5767

Last active (days ago)

sr-dev@lists.kamailio.org

9 comments

3 participants

tags (0)

participants (3)

Grzegorz Stanislawski
Jan Janak
Martin Hoffmann