Let me take on this, as I'm actually a keen advocate of the current DB model. Just a quick heads up before I go into details: I believe that the application with AVPs in a separate table creates a great value, and the performance penalty for it is sustainable. Note also that this is still a minor point compared to the data model as whole (and for those keen about this one, it can be supported by adding a rather easy SER operation).
Again -- I believe it's a trade-off between application and efficiency. Let me first begin with the application because I consider that clearly more important. The challange is that any ever-changing scheme is just too troublesome. It causes migration pain and it prohibits easy-to-do extensions. Think of web2.0 :-) .... admins or even subscribers wish to add a new attributed which will have impact on signaling, charging, whatever aspect of SIP life-cycle you are interested in. Changing table scheme makes it *very* difficult, and bites you back again on your next upgrade.
Now the question is how big is the actual penalty for having attributes in an extensible way in a separate table. Whereas there have been boldly articulated concerns, I've been missing data supporting those. Keep in mind that SER does very simple queries -- 1) determine UID 2) load associated AVPs.
Also -- the debate discussing this thing as "data model" as whole went IMO quite far. As Greger has mentioned, the consolidation from ambiguous strings to unambiguous IDs has greatly improved sanity. Just one example I have posted before: think of john@doe.com, who desired to change to have a new aliaas john.doe@doe.com, which changed after he married and took his spouse's name to john.foo@doe.com and later after the domain beinq acquired by someone else to john.foo@oracle.com. imagine what your CDR processing would generated if it was based on URIs.....
So I think proper next steps could look like this: - if folks have some reasonable measurements showing that AVPs in a separaete table are troublesome, come up with those. (for sake of completeness: SER is having a very trivial DB usage patterns and we don't doe any joins -- we only do two queries: a) determine UID b) load attributes specific to UIDs) - to provide a choice to those who for whatever reason don't wish to use attributes in a separate table, it should be easy to create a patch that fills the AVPs with values from credentials/columns (as opposed to from a separate table)
-jiri
At 10:25 03/12/2007, Greger V. Teigre wrote:
Dear Mahatma, If you want to engage the developers, you will have to subscribe to serdev, as many of the developers don't follow serusers.
I have not participated in SER's 2.0 data model, but my initial take on your suggestion is the following: With all respect, I believe you may have misunderstood what Tom is addressing in his discussion. He targets a generic database model built around attribute value pairs. This is not the case for SER, on the contrary, SER's new data model is much sounder from a db perspective (than 0.8 and 0..9) and is built around the uid and did as unique identifiers. Queries will through joins across the tables construct the needed data in a very efficient manner, as uid and did are indexed and where the queries will use uid and did in the where clause. Without having checked, I assume the tables have been normalized just as they should (i.e. splitting them up).
The attribute-value pairs you are referring to are not part of the core data model (which Tom covers), but rather attributes that may be loaded and made available in ser.cfg through a query created to retrieve the attributes-value pairs. Without creating a limited set of attributes that can be supported in ser.cfg, the generic avpairs cannot be avoided. However, the queries that retrieve avpairs do not use the semantic of the avpairs to select which avpairs to load, ALL avpairs belonging to a specific uid and did are loaded at the same time.
Also, I'm afraid this statement is wrong: "It's a better idea from a database architecture and performance perspective to keep adding columns into that table for data that has a 1 to 1 relationship with a user." This is exactly how you should not do it if you have complex data relationships that need to be represented and retrieved without duplicating data.
Does this answer alleviate your fears? If not, please subscribe to serdev and post your comment there to engage people closer to the design of the database. g-)
X Z wrote:
Hi All, This is specifically for the SER/OpenSER developers, but I'm not a serdev list member so I'm posting here.
I've been using SER since version 0.8.X and I'm still running 0.8.14 production for my company PBX to this day.
I was very excited as version 2 became a release candidate and I downloaded it for testing. I was pretty disappointed with one aspect of the new data model and I'm requesting that the developers consider a further revision on the data model.
Basically, taking all fields out of the subscriber table like Last_name, first_name, email, timezone, rpid/asserted identity, etc, etc is not the greatest idea. It's a better idea from a database architecture and performance perspective to keep adding columns into that table for data that has a 1 to 1 relationship with a user, and that is common in > 90% of SER's use cases ( i.e. corporate, carrier/VSP.) I would suggest adding voicemail_password, and maybe every other field that is being added into the default attributes script that I saw in CVS recently. If you already know what attributes a user has (and they have a 1 to 1 relationship), then its far better from a db performance perspective to keep these attributes in the user table. I know that the code becomes more complicated, but I think it may be a tradeoff worth discussing.
See this discussion ( http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084117056http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084... ) between Oracle users and Tom, (an Oracle engineer/architect.) The full text of this discussion is very informative and I highly recommend people read it through.
Tom's conclusion is that the type of data model being discussed, and now being used in SER fails for all but the most trivial of applications. Maybe SER *by itself* qualifies as "trivial" from a database architect's perspective, but think about things like Asterisk integration, which is quite common. You quickly run into some very nasty queries . . .
Please note that I am not a software developer nor a database engineer, just a user who reads a lot, so I'm open to being the ignorant one here, but I thought that this should be discussed among users and developers.
Thanks for considering, Mahatma
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084117056http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084...
The following is an excerpt from the above link:
Here is a excerpt from my forthcoming book where I talk about this (and show you how ugly, hard and inefficient queries against your very flexible model will be)
(2)Do not use Generic Data Models
Frequently I see applications built on a generic data model for "maximum flexibility" or applications built in ways that prohibit performance. Many times - these are one in the
same thing! For example, it is well known you can represent any object in a database using just four tables:
Create table objects ( oid int primary key, name varchar2(255) );
Create table attributes
( attrId int primary key, attrName varchar2(255), datatype varchar2(25) );
Create table object_Attributes ( oid int, attrId int, value varchar2(4000), primary key(oid,attrId) );
Create table Links ( oid1 int, oid2 int,
primary key (oid1, oid2) );
That's it - no more CREATE TABLE for me! I can fill the attributes table up with rows like this:
insert into attributes values ( 1, 'DATE_OF_BIRTH', 'DATE' );
insert into attributes values ( 2, 'FIRST_NAME', 'STRING' ); insert into attributes values ( 3, 'LAST_NAME', 'STRING' ); commit;
And now I'm ready to create a PERSON record:
insert into objects values ( 1, 'PERSON' ); insert into object_Attributes values( 1, 1, '15-mar-1965' ); insert into object_Attributes values( 1, 2, 'Thomas' ); insert into object_Attributes values( 1, 3, 'Kyte' );
commit;
insert into objects values ( 2, 'PERSON' ); insert into object_Attributes values( 2, 1, '21-oct-1968' ); insert into object_Attributes values( 2, 2, 'John' ); insert into object_Attributes values( 2, 3, 'Smith' );
commit;
And since I'm good at SQL, I can even query this record up to get the FIRST_NAME and LAST_NAME of all PERSON records:
ops$tkyte@ORA920> select max( decode(attrName, 'FIRST_NAME', value, null )) first_name,
2 max( decode( attrName, 'LAST_NAME', value, null ) ) last_name 3 from objects, object_attributes, attributes 4 where attributes.attrName in ( 'FIRST_NAME', 'LAST_NAME' )
5 and object_attributes.attrId = attributes.attrId 6 and object_attributes.oid = objects.oid 7 and http://objects.nameobjects.name = 'PERSON' 8 group by objects.oid
9 /
FIRST_NAME LAST_NAME
Thomas Kyte John Smith
Looks great, right? I mean, the developers don't have to create tables anymore, we can
add columns at the drop of a hat (just requires an insert into the ATTRIBUTES table). The developers can do whatever they want and the DBA can't stop them. This is ultimate "flexibility". I've seen people try to build entire systems on this model.
But, how does it perform? Miserably, terribly, horribly. A simple "select first_name, last_name from person" query is transformed into a 3-table join with aggregates and all. Further, if the attributes are "NULLABLE" - that is, there might not be a row in
OBJECT_ATTRIBUTES for some attributes, you may have to outer join instead of just joining which in some cases can remove more optimal query plans from consideration.
Writing queries might look pretty straightforward, but it's impossible to do in a
performant fashion. For example, if we wanted to get everyone that was born in MARCH or has a LAST_NAME = 'SMITH', we could simply take the query from above and just wrap an inline view around that:
ops$tkyte@ORA920> select * 2 from ( 3 select max(decode(attrName, 'FIRST_NAME', value, null)) first_name, 4 max(decode(attrName, 'LAST_NAME', value, null)) last_name,
5 max(decode(attrName, 'DATE_OF_BIRTH', value, null)) date_of_birth 6 from objects, object_attributes, attributes 7 where attributes.attrName in ( 'FIRST_NAME', 'LAST_NAME', 'DATE_OF_BIRTH' ) 8 and object_attributes.attrId = attributes.attrId 9 and object_attributes.oid = objects.oid
10 and http://objects.nameobjects.name = 'PERSON' 11 group by objects.oid 12 ) 13 where last_name = 'Smith' 14 or date_of_birth like '%-mar-%'
15 /
FIRST_NAME LAST_NAME DATE_OF_BIRTH
Thomas Kyte 15-mar-1965 John Smith 21-oct-1968
So, it looks "easy" to query, but think about the performance! If you had a couple thousand OBJECT records, and a couple tens of thousands of OBJECT_ATTRIBUTES - Oracle would have to process the entire inner group by query first and then apply the WHERE
clause.
This is not a made up data model, one that I crafted just to make a point. This is an actual data model that I've seen people try to use. Their goal is ultimate flexibility. They don't know what OBJECTS they need, they don't know what ATTRIBUTES they will have.
Well - that is what the database was written for in the first place: Oracle implemented this thing called SQL to define OBJECTS and ATTRIBUTES and lets you use SQL to query them. You are trying to put a generic layer on top of a generic layer - and it fails each
and every time except for the most trivial of applications.
Serusers mailing list mailto:Serusers@lists.iptel.orgSerusers@lists.iptel.org http://lists.iptel.org/mailman/listinfo/serusers
Serusers mailing list Serusers@lists.iptel.org http://lists.iptel.org/mailman/listinfo/serusers
-- Jiri Kuthan http://iptel.org/~jiri/