Hi All, This is specifically for the SER/OpenSER developers, but I'm not a serdev list member so I'm posting here.
I've been using SER since version 0.8.x, and I'm still running 0.8.14 in production for my company PBX to this day.
I was very excited when version 2 became a release candidate, and I downloaded it for testing. I was pretty disappointed with one aspect of the new data model, and I'm requesting that the developers consider a further revision to it.
Basically, taking all the fields out of the subscriber table (last_name, first_name, email, timezone, rpid/asserted identity, and so on) is not the greatest idea. From a database architecture and performance perspective, it's better to keep adding columns to that table for data that has a 1-to-1 relationship with a user and that is common in more than 90% of SER's use cases (i.e., corporate, carrier/VSP). I would suggest adding voicemail_password, and maybe every other field that is being added into the default attributes script that I saw in CVS recently. If you already know what attributes a user has (and they have a 1-to-1 relationship with the user), then it's far better from a db performance perspective to keep those attributes in the user table. I know that the code becomes more complicated, but I think it may be a tradeoff worth discussing.
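For what it's worth, here is a small sketch of the kind of lookup I mean, using SQLite as a stand-in backend. The column names here are my own invention for illustration, not SER's actual schema:

```python
import sqlite3

# SQLite stands in for whatever database backs SER; the columns below
# are illustrative 1-to-1 user attributes, not SER's real subscriber table.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("""CREATE TABLE subscriber (
    uid INTEGER PRIMARY KEY,
    username TEXT, first_name TEXT, last_name TEXT,
    email TEXT, timezone TEXT, voicemail_password TEXT)""")
cur.execute("INSERT INTO subscriber VALUES "
            "(1, 'mahatma', 'M', 'G', 'm@example.com', 'UTC', '1234')")

# With 1-to-1 attributes kept as columns, a single indexed point lookup
# fetches every per-user field at once -- no joins, no pivoting.
row = cur.execute(
    "SELECT email, timezone FROM subscriber WHERE uid = 1").fetchone()
print(row)  # ('m@example.com', 'UTC')
```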
See this discussion ( http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084...) between Oracle users and Tom Kyte, an Oracle engineer/architect. The full text of this discussion is very informative, and I highly recommend people read it through.
Tom's conclusion is that the type of data model being discussed, and now being used in SER, fails for all but the most trivial of applications. Maybe SER *by itself* qualifies as "trivial" from a database architect's perspective, but think about things like Asterisk integration, which is quite common. You quickly run into some very nasty queries...
Please note that I am neither a software developer nor a database engineer, just a user who reads a lot, so I'm open to being the ignorant one here, but I thought this should be discussed among users and developers.
Thanks for considering, Mahatma
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084...
The following is an excerpt from the above link:
Here is an excerpt from my forthcoming book where I talk about this (and show you how ugly, hard, and inefficient queries against your very flexible model will be):
(2) Do not use Generic Data Models
Frequently I see applications built on a generic data model for "maximum flexibility," or applications built in ways that prohibit performance. Many times these are one and the same thing! For example, it is well known that you can represent any object in a database using just four tables:
Create table objects ( oid int primary key, name varchar2(255) );
Create table attributes ( attrId int primary key, attrName varchar2(255), datatype varchar2(25) );
Create table object_Attributes ( oid int, attrId int, value varchar2(4000), primary key(oid,attrId) );
Create table Links ( oid1 int, oid2 int, primary key (oid1, oid2) );
That's it - no more CREATE TABLE for me! I can fill the attributes table up with rows like this:
insert into attributes values ( 1, 'DATE_OF_BIRTH', 'DATE' );
insert into attributes values ( 2, 'FIRST_NAME', 'STRING' );
insert into attributes values ( 3, 'LAST_NAME', 'STRING' );
commit;
And now I'm ready to create a PERSON record:
insert into objects values ( 1, 'PERSON' );
insert into object_Attributes values ( 1, 1, '15-mar-1965' );
insert into object_Attributes values ( 1, 2, 'Thomas' );
insert into object_Attributes values ( 1, 3, 'Kyte' );
commit;
insert into objects values ( 2, 'PERSON' );
insert into object_Attributes values ( 2, 1, '21-oct-1968' );
insert into object_Attributes values ( 2, 2, 'John' );
insert into object_Attributes values ( 2, 3, 'Smith' );
commit;
And since I'm good at SQL, I can even query this record up to get the FIRST_NAME and LAST_NAME of all PERSON records:
ops$tkyte@ORA920> select max( decode( attrName, 'FIRST_NAME', value, null ) ) first_name,
       max( decode( attrName, 'LAST_NAME', value, null ) ) last_name
  from objects, object_attributes, attributes
 where attributes.attrName in ( 'FIRST_NAME', 'LAST_NAME' )
   and object_attributes.attrId = attributes.attrId
   and object_attributes.oid = objects.oid
   and objects.name = 'PERSON'
 group by objects.oid
/
FIRST_NAME           LAST_NAME
-------------------- --------------------
Thomas               Kyte
John                 Smith
Looks great, right? I mean, the developers don't have to create tables anymore, we can add columns at the drop of a hat (just requires an insert into the ATTRIBUTES table). The developers can do whatever they want and the DBA can't stop them. This is ultimate "flexibility". I've seen people try to build entire systems on this model.
But how does it perform? Miserably, terribly, horribly. A simple "select first_name, last_name from person" query is transformed into a 3-table join with aggregates and all. Further, if the attributes are "NULLABLE" (that is, there might not be a row in OBJECT_ATTRIBUTES for some attributes), you may have to outer join instead of just joining, which in some cases can remove more optimal query plans from consideration.
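To make the point concrete, the four-table model above can be reproduced in SQLite (via Python's sqlite3; MAX(CASE ...) replaces Oracle's DECODE, which SQLite lacks). The shape of the query, a three-table join plus GROUP BY just to read two fields, is the point:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# Tom's generic model: objects, attributes, and the attribute-value table.
cur.executescript("""
CREATE TABLE objects (oid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE attributes (attrId INTEGER PRIMARY KEY, attrName TEXT, datatype TEXT);
CREATE TABLE object_attributes (oid INTEGER, attrId INTEGER, value TEXT,
                                PRIMARY KEY (oid, attrId));
INSERT INTO attributes VALUES
    (1,'DATE_OF_BIRTH','DATE'), (2,'FIRST_NAME','STRING'), (3,'LAST_NAME','STRING');
INSERT INTO objects VALUES (1,'PERSON'), (2,'PERSON');
INSERT INTO object_attributes VALUES
    (1,1,'15-mar-1965'), (1,2,'Thomas'), (1,3,'Kyte'),
    (2,1,'21-oct-1968'), (2,2,'John'),   (2,3,'Smith');
""")

# A 3-table join with a pivot, just to emulate
# "select first_name, last_name from person".
rows = sorted(cur.execute("""
SELECT MAX(CASE attrName WHEN 'FIRST_NAME' THEN value END) AS first_name,
       MAX(CASE attrName WHEN 'LAST_NAME'  THEN value END) AS last_name
  FROM objects
  JOIN object_attributes ON object_attributes.oid = objects.oid
  JOIN attributes        ON attributes.attrId = object_attributes.attrId
 WHERE objects.name = 'PERSON'
 GROUP BY objects.oid
""").fetchall())
print(rows)  # [('John', 'Smith'), ('Thomas', 'Kyte')]
```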
Writing queries might look pretty straightforward, but it's impossible to do in a performant fashion. For example, if we wanted to get everyone that was born in MARCH or has a LAST_NAME = 'SMITH', we could simply take the query from above and just wrap an inline view around that:
ops$tkyte@ORA920> select *
  from (
        select max( decode( attrName, 'FIRST_NAME', value, null ) ) first_name,
               max( decode( attrName, 'LAST_NAME', value, null ) ) last_name,
               max( decode( attrName, 'DATE_OF_BIRTH', value, null ) ) date_of_birth
          from objects, object_attributes, attributes
         where attributes.attrName in ( 'FIRST_NAME', 'LAST_NAME', 'DATE_OF_BIRTH' )
           and object_attributes.attrId = attributes.attrId
           and object_attributes.oid = objects.oid
           and objects.name = 'PERSON'
         group by objects.oid
       )
 where last_name = 'Smith'
    or date_of_birth like '%-mar-%'
/
FIRST_NAME           LAST_NAME            DATE_OF_BIRTH
-------------------- -------------------- --------------------
Thomas               Kyte                 15-mar-1965
John                 Smith                21-oct-1968
So, it looks "easy" to query, but think about the performance! If you had a couple thousand OBJECT records, and a couple tens of thousands of OBJECT_ATTRIBUTES - Oracle would have to process the entire inner group by query first and then apply the WHERE clause.
This is not a made-up data model, one that I crafted just to make a point. This is an actual data model that I've seen people try to use. Their goal is ultimate flexibility: they don't know what OBJECTS they need, and they don't know what ATTRIBUTES they will have. Well, that is what the database was written for in the first place: Oracle implemented this thing called SQL to define OBJECTS and ATTRIBUTES and lets you use SQL to query them. You are trying to put a generic layer on top of a generic layer, and it fails each and every time except for the most trivial of applications.
Dear Mahatma, If you want to engage the developers, you will have to subscribe to serdev, as many of the developers don't follow serusers.
I have not participated in designing SER's 2.0 data model, but my initial take on your suggestion is the following. With all respect, I believe you may have misunderstood what Tom is addressing in his discussion. He targets a generic database model built around attribute-value pairs. This is not the case for SER; on the contrary, SER's new data model is much sounder from a db perspective (than 0.8 and 0.9) and is built around the uid and did as unique identifiers. Queries will, through joins across the tables, construct the needed data very efficiently, as uid and did are indexed and used in the WHERE clause. Without having checked, I assume the tables have been normalized just as they should be (i.e., split up appropriately).
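As a sketch of what I mean (table and column names invented for illustration; the real SER 2.0 schema differs in detail), a join between normalized tables stays cheap when both sides are resolved through (uid, did) indexes:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# Invented two-table split keyed by (uid, did); not SER's actual tables.
cur.executescript("""
CREATE TABLE uri (uid TEXT, did TEXT, username TEXT);
CREATE TABLE credentials (uid TEXT, did TEXT, ha1 TEXT);
CREATE INDEX uri_key  ON uri (uid, did);
CREATE INDEX cred_key ON credentials (uid, did);
INSERT INTO uri VALUES ('u1', 'd1', 'alice');
INSERT INTO credentials VALUES ('u1', 'd1', 'deadbeef');
""")

# Both sides of the join are resolved via the (uid, did) indexes, so
# per-request lookups stay fast even across normalized tables.
row = cur.execute("""
SELECT uri.username, credentials.ha1
  FROM uri JOIN credentials
    ON credentials.uid = uri.uid AND credentials.did = uri.did
 WHERE uri.uid = 'u1' AND uri.did = 'd1'
""").fetchone()
print(row)  # ('alice', 'deadbeef')
```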
The attribute-value pairs you are referring to are not part of the core data model (which is what Tom covers), but rather attributes that may be loaded and made available in ser.cfg through a query created to retrieve the attribute-value pairs. Without restricting ser.cfg to a limited set of supported attributes, the generic avpairs cannot be avoided. However, the queries that retrieve avpairs do not use the semantics of the avpairs to select which ones to load; ALL avpairs belonging to a specific uid and did are loaded at the same time.
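A sketch of that retrieval pattern, assuming a hypothetical user_attrs table loosely modeled on SER 2.0's (the real table has more columns, omitted here): the query never inspects attribute names, it just pulls every avpair for the uid in one indexed scan, and ser.cfg gives them meaning afterwards.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# Hypothetical avpair table; the real user_attrs schema differs.
cur.execute("CREATE TABLE user_attrs (uid TEXT, name TEXT, value TEXT)")
cur.execute("CREATE INDEX user_attrs_uid ON user_attrs (uid)")
cur.executemany("INSERT INTO user_attrs VALUES (?, ?, ?)", [
    ("u1", "timezone", "UTC"),
    ("u1", "voicemail_password", "1234"),
    ("u2", "timezone", "CET"),
])

# No per-attribute filtering: one indexed scan loads ALL avpairs for the
# user; the proxy, not SQL, interprets what each attribute means.
avpairs = dict(cur.execute(
    "SELECT name, value FROM user_attrs WHERE uid = ?", ("u1",)).fetchall())
print(avpairs)  # {'timezone': 'UTC', 'voicemail_password': '1234'}
```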
Also, I'm afraid this statement is wrong: "It's a better idea from a database architecture and performance perspective to keep adding columns into that table for data that has a 1 to 1 relationship with a user." This is exactly how you should not do it if you have complex data relationships that need to be represented and retrieved without duplicating data.
Does this answer alleviate your fears? If not, please subscribe to serdev and post your comment there to engage people closer to the design of the database. g-)
X Z wrote:
The current data model, while designed to make certain things easier, DOES indeed encourage data duplication: in order to create a unified system, most of us will opt to duplicate data into more usable tables.
When meshing with an overall system, gathering all relevant data for a particular substructure or user by using joins is neither speedy (especially when the tables get HUGE -- and they very much will) nor terribly convenient.
Of course, data duplication is hardly a cardinal sin, but there's ALWAYS a trade off between abstraction and actual functionality. I understand the reasons WHY they've chosen the current data model, and to a degree it makes sense for the core SER system, but for meshing SER with other systems, it's god-awful ugly. :)
My recommendation, Mahatma, is to AVOID using the new data model for anything other than the most basic of SER functionality. Otherwise, if you gather users in the 50,000-100,000 range, your user_attrs table is just going to be one ugly, unmanageably large pile of annoyance. Since the user_attrs and domain_attrs tables are designed in part, from what I can tell, to make selects more rational and to allow basic SIP flexibility, if you don't NEED to keep the data there, don't do it. They could keep modifying the data model to suit everyone's tastes, but right now it's designed more to ensure that SER works and works well. Want to store user information like a cell phone number, a fax number, a timezone, and flags for whether a user has DND enabled or has access to the PSTN, etc.? Keep it elsewhere in a table more usable for such purposes.
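A minimal sketch of that split (SQLite as a stand-in; the user_profile table and its columns are my invention): SER's avpair storage stays untouched, while integration-only data lives in a flat table that third-party apps can read without pivoting.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# SER-owned avpair storage stays as-is (schema simplified here)...
cur.execute("CREATE TABLE user_attrs (uid TEXT, name TEXT, value TEXT)")
# ...while integration-only data lives in a separate wide table we control.
cur.execute("""CREATE TABLE user_profile (
    uid TEXT PRIMARY KEY, cell_phone TEXT, fax TEXT,
    timezone TEXT, dnd INTEGER, pstn_allowed INTEGER)""")
cur.execute("INSERT INTO user_profile VALUES "
            "('u1', '555-0100', '555-0199', 'UTC', 0, 1)")

# Third-party apps read one flat row instead of pivoting avpairs.
row = cur.execute(
    "SELECT cell_phone, dnd, pstn_allowed FROM user_profile WHERE uid = 'u1'"
).fetchone()
print(row)  # ('555-0100', 0, 1)
```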
As we've begun our testing with SER 2.0, we've had to modify things more drastically than we modified them in SER 0.9.6, but in ANY environment, you're going to have to modify things to work for you. At least the core system is flexible enough to let you tailor things. :)
N.
Greger V. Teigre wrote:
Serusers mailing list Serusers@lists.iptel.org http://lists.iptel.org/mailman/listinfo/serusers
Hello serdev list,
Please see the thread below started on serusers list.
I do understand the power and design of the new data model and that it does allow the SER Core functionality to be very efficient.
In addition, I definitely agree that of all the scenarios below, data duplication is the best of the undesirable options.
The undesirable options being:
1. The old data model, which didn't support avpairs sufficiently.
2. The new data model, which introduces very inefficient joins and ugly SQL queries in order to create a view to integrate with Asterisk or any other third-party application.
3. The new data model, which works very well for SER core functionality, PLUS a "subscriber" table that includes user data which may be duplicated in the avpairs tables.
I wasn't suggesting that the avpairs model is bad, just that the data model as it stands makes it inefficient to integrate with other apps' data models (like Asterisk, for example).
I might also suggest that the subscriber table come back into the core code in an expanded form that covers a high percentage of integration use cases, and that triggers be added to populate the data that would be duplicated. Users can simply opt not to populate the subscriber table at all, and the triggers can be disabled as a db installation option.
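A rough sketch of the trigger idea in SQLite (table names and the set of synced attributes are invented; a production version would also handle UPDATE and DELETE and the full attribute list):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE user_attrs (uid TEXT, name TEXT, value TEXT);
-- Denormalized integration table; deliberately duplicates avpair data.
CREATE TABLE subscriber_view (uid TEXT PRIMARY KEY, timezone TEXT, email TEXT);

-- Trigger keeps the duplicate copy in sync on insert.
CREATE TRIGGER sync_subscriber AFTER INSERT ON user_attrs
WHEN NEW.name IN ('timezone', 'email')
BEGIN
    INSERT OR IGNORE INTO subscriber_view (uid) VALUES (NEW.uid);
    UPDATE subscriber_view
       SET timezone = CASE WHEN NEW.name = 'timezone' THEN NEW.value ELSE timezone END,
           email    = CASE WHEN NEW.name = 'email'    THEN NEW.value ELSE email    END
     WHERE uid = NEW.uid;
END;
""")
# Writes go to the avpair table; the trigger mirrors them automatically.
cur.execute("INSERT INTO user_attrs VALUES ('u1', 'timezone', 'UTC')")
cur.execute("INSERT INTO user_attrs VALUES ('u1', 'email', 'a@b.example')")
synced = cur.execute("SELECT * FROM subscriber_view").fetchall()
print(synced)  # [('u1', 'UTC', 'a@b.example')]
```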
The great advantage of this is that new users don't have to reinvent the wheel when they want to integrate SER with third-party apps to create an "overall system," as sip@arcdiv puts it in this thread.
I'd be happy to propose a table structure and triggers if this idea gains traction with the development team. We could begin a discussion in the serusers list as to what people would like to see in this table.
I guess I'm suggesting that SER core is great, but coming from the user base, I'd like to see a few more considerations going into how SER will be used. I'm guessing that a vast majority of users will not be using it completely stand-alone. Everyone is trying to add value-added services like voicemail, presence, a web interface for user and admin management, etc. Can we come up with a reference model for this and include it in the repository?
I think it's very interesting that sip@arcdiv comments that he has had to do more modifications to use the new version than he had to do before. I would hope that the development roadmap includes improvements in core functionality AND functionality that helps users actually use SER (like simpler integrations and fewer custom mods). Making software simpler to implement always encourages adoption and helps increase the user base and market share, which supports the long-term success of the project.
Thanks, Mahatma
On Dec 3, 2007 4:17 AM, SIP sip@arcdiv.com wrote:
The current data model, while designed to make certain things easier DOES indeed encourage data duplication, as, in order to create a unified system, most of us will opt to duplicate data into more usable tables.
When meshing with an overall system, gathering all relevant data for a particular substructure or user by using joins is neither speedy (especially when the tables get HUGE -- and they very much will) nor terribly convenient.
Of course, data duplication is hardly a cardinal sin, but there's ALWAYS a trade off between abstraction and actual functionality. I understand the reasons WHY they've chosen the current data model, and to a degree it makes sense for the core SER system, but for meshing SER with other systems, it's god-awful ugly. :)
My recommendation, Mahatma, is to AVOID using the new data model for anything other than the most basic of SER functionality, or, if you gather users in the 50-100,000 user range, your user_attrs table is just going to be one ugly, unmanageably large pile of annoyance. Since the user_attrs and domain_attrs are designed in part, from what I can tell, to make selects more rational and to allow basic SIP flexibility, if you don't NEED to keep the data there, don't do it. They could keep modifying the data model to suit everyone's tastes, but right now, it's designed more to ensure that SER works and works well. Want to store user information like a cell phone number and a fax number and a timezone and a flag on whether or not that user has DND enabled or has access to the PSTN, etc, etc? Keep it elsewhere in a more usable table for such purposes.
As we've begun our testing with SER 2.0, we've had to modify things more drastically than we modified them in SER 0.9.6, but in ANY environment, you're going to have to modify things to work for you. At least the core system is flexible enough to let you tailor things. :)
N.
Greger V. Teigre wrote:
Dear Mahatma, If you want to engage the developers, you will have to subscribe to serdev, as many of the developers don't follow serusers.
I have not participated in SER's 2.0 data model design, but my initial take on your suggestion is the following: With all respect, I believe you may have misunderstood what Tom is addressing in his discussion. He targets a generic database model built around attribute-value pairs. This is not the case for SER; on the contrary, SER's new data model is much sounder from a db perspective (than 0.8 and 0.9) and is built around the uid and did as unique identifiers. Queries will, through joins across the tables, construct the needed data in a very efficient manner, as uid and did are indexed and the queries use uid and did in the where clause. Without having checked, I assume the tables have been normalized just as they should be (i.e. split up).
The attribute-value pairs you are referring to are not part of the core data model (which is what Tom covers), but rather attributes that may be loaded and made available in ser.cfg through a query created to retrieve the attribute-value pairs. Without creating a limited set of attributes that can be supported in ser.cfg, the generic avpairs cannot be avoided. However, the queries that retrieve avpairs do not use the semantics of the avpairs to select which avpairs to load; ALL avpairs belonging to a specific uid and did are loaded at the same time.
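Greger's description of avpair loading can be sketched concretely. Below is a minimal illustration in Python with SQLite; the user_attrs layout (uid, did, name, value) is an assumption for illustration, not SER 2.0's exact schema. The point is the access pattern: one indexed query per (uid, did), with no filtering on attribute semantics.

```python
import sqlite3

# Illustrative sketch of the avpair-loading pattern described above:
# all attribute-value pairs for a given (uid, did) are fetched in one
# indexed query. Table and column names are assumptions, not SER's
# exact schema.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE user_attrs (
    uid   TEXT NOT NULL,
    did   TEXT NOT NULL,
    name  TEXT NOT NULL,
    value TEXT
)""")
conn.execute("CREATE INDEX ua_idx ON user_attrs (uid, did)")

rows = [
    ("u100", "d1", "timezone", "UTC"),
    ("u100", "d1", "rpid",     "sip:alice@example.com"),
    ("u100", "d1", "email",    "alice@example.com"),
    ("u200", "d1", "timezone", "CET"),
]
conn.executemany("INSERT INTO user_attrs VALUES (?, ?, ?, ?)", rows)

def load_avpairs(uid, did):
    """One index-driven query returns ALL avpairs for the user."""
    cur = conn.execute(
        "SELECT name, value FROM user_attrs WHERE uid = ? AND did = ?",
        (uid, did))
    return dict(cur.fetchall())

attrs = load_avpairs("u100", "d1")
print(attrs["timezone"])  # → UTC
print(len(attrs))         # → 3
```

Note that the select never mentions attribute names; the semantics only matter later, in ser.cfg.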
Also, I'm afraid this statement is wrong: "It's a better idea from a database architecture and performance perspective to keep adding columns into that table for data that has a 1 to 1 relationship with a user." This is exactly how you should not do it if you have complex data relationships that need to be represented and retrieved without duplicating data.
Does this answer alleviate your fears? If not, please subscribe to serdev and post your comment there to engage people closer to the design of the database. g-)
X Z wrote:
Hi All, This is specifically for the SER/OpenSER developers, but I'm not a serdev list member so I'm posting here.
I've been using SER since version 0.8.X and I'm still running 0.8.14 production for my company PBX to this day.
I was very excited as version 2 became a release candidate and I downloaded it for testing. I was pretty disappointed with one aspect of the new data model and I'm requesting that the developers consider a further revision on the data model.
Basically, taking all the fields like last_name, first_name, email, timezone, rpid/asserted identity, etc. out of the subscriber table is not the greatest idea. It's a better idea from a database architecture and performance perspective to keep adding columns to that table for data that has a 1-to-1 relationship with a user and that is common in > 90% of SER's use cases (i.e. corporate, carrier/VSP). I would suggest adding voicemail_password, and maybe every other field that is being added into the default attributes script that I saw in CVS recently. If you already know what attributes a user has (and they have a 1-to-1 relationship), then it's far better from a db performance perspective to keep these attributes in the user table. I know that the code becomes more complicated, but I think it may be a tradeoff worth discussing.
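To make the tradeoff concrete, here is a small sketch (Python with SQLite; the subscriber and user_attrs layouts are invented for illustration, not SER's actual tables) contrasting a flat 1-to-1 column layout with the equivalent attribute-value layout:

```python
import sqlite3

# Sketch of the two layouts under discussion, assuming the attributes
# really are 1:1 with the user. Column names are illustrative.
conn = sqlite3.connect(":memory:")

# Layout 1: attributes as columns in the subscriber table.
conn.execute("""CREATE TABLE subscriber (
    uid TEXT PRIMARY KEY, first_name TEXT, last_name TEXT, timezone TEXT)""")
conn.execute("INSERT INTO subscriber VALUES ('u1', 'Alice', 'Smith', 'UTC')")

# Layout 2: the same data as attribute-value pairs.
conn.execute("CREATE TABLE user_attrs (uid TEXT, name TEXT, value TEXT)")
conn.executemany("INSERT INTO user_attrs VALUES ('u1', ?, ?)", [
    ("first_name", "Alice"), ("last_name", "Smith"), ("timezone", "UTC")])

# Layout 1: one single-row primary-key lookup.
row = conn.execute(
    "SELECT first_name, last_name, timezone FROM subscriber WHERE uid = 'u1'"
).fetchone()

# Layout 2: N rows come back and must be pivoted, either in application
# code (as here) or with conditional aggregation in SQL.
pairs = dict(conn.execute(
    "SELECT name, value FROM user_attrs WHERE uid = 'u1'").fetchall())
pivoted = (pairs["first_name"], pairs["last_name"], pairs["timezone"])

print(row == pivoted)  # → True: same data, very different query shapes
```

Same data both ways; the difference is one indexed row fetch versus N rows plus a pivot, which is exactly what matters when a third-party app wants a flat view of the user.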
See this discussion ( http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084... ) between Oracle users and Tom (an Oracle engineer/architect). The full text of this discussion is very informative and I highly recommend people read it through.
Tom's conclusion is that the type of data model being discussed, and now being used in SER fails for all but the most trivial of applications. Maybe SER *by itself* qualifies as "trivial" from a database architect's perspective, but think about things like Asterisk integration, which is quite common. You quickly run into some very nasty queries . . .
Please note that I am not a software developer nor a database engineer, just a user who reads a lot, so I'm open to being the ignorant one here, but I thought that this should be discussed among users and developers.
Thanks for considering, Mahatma
The following is an excerpt from the above link:
Here is an excerpt from my forthcoming book where I talk about this (and show you how ugly, hard and inefficient queries against your very flexible model will be):
(2) Do not use Generic Data Models
Frequently I see applications built on a generic data model for "maximum flexibility", or applications built in ways that prohibit performance. Many times these are one and the same thing! For example, it is well known you can represent any object in a database using just four tables:

create table objects ( oid int primary key, name varchar2(255) );
create table attributes ( attrId int primary key, attrName varchar2(255), datatype varchar2(25) );
create table object_Attributes ( oid int, attrId int, value varchar2(4000), primary key (oid, attrId) );
create table Links ( oid1 int, oid2 int, primary key (oid1, oid2) );
That's it - no more CREATE TABLE for me! I can fill the attributes table up with rows like this:

insert into attributes values ( 1, 'DATE_OF_BIRTH', 'DATE' );
insert into attributes values ( 2, 'FIRST_NAME', 'STRING' );
insert into attributes values ( 3, 'LAST_NAME', 'STRING' );
commit;

And now I'm ready to create a PERSON record:

insert into objects values ( 1, 'PERSON' );
insert into object_Attributes values ( 1, 1, '15-mar-1965' );
insert into object_Attributes values ( 1, 2, 'Thomas' );
insert into object_Attributes values ( 1, 3, 'Kyte' );
commit;

insert into objects values ( 2, 'PERSON' );
insert into object_Attributes values ( 2, 1, '21-oct-1968' );
insert into object_Attributes values ( 2, 2, 'John' );
insert into object_Attributes values ( 2, 3, 'Smith' );
commit;
And since I'm good at SQL, I can even query this record up to get the FIRST_NAME and LAST_NAME of all PERSON records:

ops$tkyte@ORA920> select max( decode( attrName, 'FIRST_NAME', value, null ) ) first_name,
  2         max( decode( attrName, 'LAST_NAME', value, null ) ) last_name
  3    from objects, object_attributes, attributes
  4   where attributes.attrName in ( 'FIRST_NAME', 'LAST_NAME' )
  5     and object_attributes.attrId = attributes.attrId
  6     and object_attributes.oid = objects.oid
  7     and objects.name = 'PERSON'
  8   group by objects.oid
  9  /

FIRST_NAME           LAST_NAME
-------------------- --------------------
Thomas               Kyte
John                 Smith
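For anyone who wants to run the pivot above, here is a transcription to SQLite via Python. MAX(CASE ...) stands in for Oracle's decode and varchar2 becomes TEXT; the query shape is otherwise unchanged:

```python
import sqlite3

# The generic model and FIRST_NAME/LAST_NAME pivot from the excerpt,
# transcribed to SQLite so the query shape can be run and inspected.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE objects (oid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE attributes (attrId INTEGER PRIMARY KEY, attrName TEXT, datatype TEXT);
CREATE TABLE object_attributes (oid INTEGER, attrId INTEGER, value TEXT,
                                PRIMARY KEY (oid, attrId));
INSERT INTO attributes VALUES (1, 'DATE_OF_BIRTH', 'DATE');
INSERT INTO attributes VALUES (2, 'FIRST_NAME', 'STRING');
INSERT INTO attributes VALUES (3, 'LAST_NAME', 'STRING');
INSERT INTO objects VALUES (1, 'PERSON');
INSERT INTO object_attributes VALUES (1, 1, '15-mar-1965');
INSERT INTO object_attributes VALUES (1, 2, 'Thomas');
INSERT INTO object_attributes VALUES (1, 3, 'Kyte');
INSERT INTO objects VALUES (2, 'PERSON');
INSERT INTO object_attributes VALUES (2, 1, '21-oct-1968');
INSERT INTO object_attributes VALUES (2, 2, 'John');
INSERT INTO object_attributes VALUES (2, 3, 'Smith');
""")

# "select first_name, last_name from person" becomes a 3-table join
# with conditional aggregation (MAX of CASE replaces Oracle's decode).
rows = conn.execute("""
    SELECT MAX(CASE WHEN attrName = 'FIRST_NAME' THEN value END) AS first_name,
           MAX(CASE WHEN attrName = 'LAST_NAME'  THEN value END) AS last_name
      FROM objects
      JOIN object_attributes ON object_attributes.oid = objects.oid
      JOIN attributes ON attributes.attrId = object_attributes.attrId
     WHERE attributes.attrName IN ('FIRST_NAME', 'LAST_NAME')
       AND objects.name = 'PERSON'
     GROUP BY objects.oid
""").fetchall()
print(sorted(rows))  # → [('John', 'Smith'), ('Thomas', 'Kyte')]
```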
Looks great, right? I mean, the developers don't have to create tables anymore, we can add columns at the drop of a hat (just requires an insert into the ATTRIBUTES table). The developers can do whatever they want and the DBA can't stop them. This is ultimate "flexibility". I've seen people try to build entire systems on this model.

But, how does it perform? Miserably, terribly, horribly. A simple "select first_name, last_name from person" query is transformed into a 3-table join with aggregates and all. Further, if the attributes are "NULLABLE" - that is, there might not be a row in OBJECT_ATTRIBUTES for some attributes - you may have to outer join instead of just joining, which in some cases can remove more optimal query plans from consideration.
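The NULLABLE point is easy to reproduce. In the sketch below (Python with SQLite, schema following the excerpt's generic model, with an invented person that has no LAST_NAME row), a one-join-per-attribute query silently drops the incomplete record until the join is made an outer join:

```python
import sqlite3

# Demonstrates the NULLABLE-attribute problem: with one join per
# attribute, a person missing an attribute row disappears from the
# result unless the join becomes an outer join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE objects (oid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE object_attributes (oid INTEGER, attrId INTEGER, value TEXT);
INSERT INTO objects VALUES (1, 'PERSON');
INSERT INTO object_attributes VALUES (1, 2, 'Thomas'); -- FIRST_NAME
INSERT INTO object_attributes VALUES (1, 3, 'Kyte');   -- LAST_NAME
INSERT INTO objects VALUES (2, 'PERSON');
INSERT INTO object_attributes VALUES (2, 2, 'Cher');   -- no LAST_NAME row
""")

inner = conn.execute("""
    SELECT f.value, l.value
      FROM objects o
      JOIN object_attributes f ON f.oid = o.oid AND f.attrId = 2
      JOIN object_attributes l ON l.oid = o.oid AND l.attrId = 3
""").fetchall()

outer = conn.execute("""
    SELECT f.value, l.value
      FROM objects o
      JOIN object_attributes f ON f.oid = o.oid AND f.attrId = 2
      LEFT JOIN object_attributes l ON l.oid = o.oid AND l.attrId = 3
""").fetchall()

print(len(inner))  # → 1 (the record with no LAST_NAME is dropped)
print(len(outer))  # → 2 (the outer join keeps it, last name NULL)
```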
Writing queries might look pretty straightforward, but it's impossible to do in a performant fashion. For example, if we wanted to get everyone that was born in MARCH or has a LAST_NAME = 'SMITH', we could simply take the query from above and just wrap an inline view around it:

ops$tkyte@ORA920> select *
  2    from (
  3  select max( decode( attrName, 'FIRST_NAME', value, null ) ) first_name,
  4         max( decode( attrName, 'LAST_NAME', value, null ) ) last_name,
  5         max( decode( attrName, 'DATE_OF_BIRTH', value, null ) ) date_of_birth
  6    from objects, object_attributes, attributes
  7   where attributes.attrName in ( 'FIRST_NAME', 'LAST_NAME', 'DATE_OF_BIRTH' )
  8     and object_attributes.attrId = attributes.attrId
  9     and object_attributes.oid = objects.oid
 10     and objects.name = 'PERSON'
 11   group by objects.oid
 12        )
 13   where last_name = 'Smith'
 14      or date_of_birth like '%-mar-%'
 15  /

FIRST_NAME           LAST_NAME            DATE_OF_BIRTH
-------------------- -------------------- --------------------
Thomas               Kyte                 15-mar-1965
John                 Smith                21-oct-1968
So, it looks "easy" to query, but think about the performance! If you had a couple thousand OBJECT records, and a couple tens of thousands of OBJECT_ATTRIBUTES, Oracle would have to process the entire inner group by query first and then apply the WHERE clause.

This is not a made up data model, one that I crafted just to make a point. This is an actual data model that I've seen people try to use. Their goal is ultimate flexibility. They don't know what OBJECTS they need, they don't know what ATTRIBUTES they will have. Well - that is what the database was written for in the first place: Oracle implemented this thing called SQL to define OBJECTS and ATTRIBUTES and lets you use SQL to query them. You are trying to put a generic layer on top of a generic layer - and it fails each and every time except for the most trivial of applications.
Serusers mailing list Serusers@lists.iptel.org http://lists.iptel.org/mailman/listinfo/serusers
On the flip side of having to do more modifications for integration, of course, is that we get the ABILITY to do more modifications.
SER 0.9.6 wasn't limited, per se, but certain aspects of what we've implemented in 0.9.6 (user-configurable timers, number-based AND uri-based call blocking, call forwarding, call return, whitelisting number mode, etc.) have felt like a complete and total hack because of the way they had to be done using a mishmash of AVPs. We're HOPING we can forego some of that with the new SER using the new db queries and such, as well as being able to better implement some functionality of other RFCs using the more flexible core.
But yes... integration with the new db schema is proving to be a pain in the left ventricle if only because it's so totally focused around optimisation for SER.
As with all new stuff, there's good and there's bad.
N.
X Z wrote:
Hello serdev list,
Please see the thread below started on serusers list.
I do understand the power and design of the new data model and that it does allow the SER Core functionality to be very efficient.
In addition, I definitely agree that of all the scenarios below, data duplication is the best of the undesirable options.
The undesirable options being:
1. The old data model, which didn't support avpairs sufficiently,
2. The new data model, which introduces very inefficient joins and ugly SQL queries in order to create a view to integrate with Asterisk or any other third-party application,
3. The new data model, which works very well for SER core functionality, PLUS a "subscriber" table that includes user data which may be duplicated in the avpairs tables.
I wasn't suggesting that the avpairs model was bad, just that the data model as it stands makes it inefficient to integrate with other apps' data models (like Asterisk, for example).
I might also suggest that the subscriber table comes back into the core code in an expanded form that covers a high percentage of integration use cases, and that triggers are added to populate the data that would be duplicated. Users can simply opt not to populate the subscriber table at all, and the triggers can be disabled as a db installation option.
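As a sketch of what such triggers might look like (all table, column, and trigger names here are hypothetical; SER ships nothing like this), here is the idea in SQLite via Python: avpair writes automatically keep a flat subscriber row in sync, so integrators read one row instead of joining avpairs.

```python
import sqlite3

# Hypothetical trigger keeping a denormalized subscriber table in sync
# with avpair inserts. Names are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_attrs (uid TEXT, name TEXT, value TEXT);
CREATE TABLE subscriber (uid TEXT PRIMARY KEY, email TEXT, timezone TEXT);

-- On every avpair insert, upsert the matching flat column.
CREATE TRIGGER attrs_to_subscriber AFTER INSERT ON user_attrs
WHEN NEW.name IN ('email', 'timezone')
BEGIN
    INSERT OR IGNORE INTO subscriber (uid) VALUES (NEW.uid);
    UPDATE subscriber
       SET email    = CASE WHEN NEW.name = 'email'    THEN NEW.value ELSE email    END,
           timezone = CASE WHEN NEW.name = 'timezone' THEN NEW.value ELSE timezone END
     WHERE uid = NEW.uid;
END;
""")

conn.execute("INSERT INTO user_attrs VALUES ('u1', 'email', 'a@example.com')")
conn.execute("INSERT INTO user_attrs VALUES ('u1', 'timezone', 'UTC')")

# Third-party apps now read one flat row instead of pivoting avpairs.
flat = conn.execute(
    "SELECT email, timezone FROM subscriber WHERE uid = 'u1'").fetchone()
print(flat)  # → ('a@example.com', 'UTC')
```

Disabling the integration would then just mean dropping the trigger, which matches the "db installation option" idea above.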
The great advantage to this is that new users don't have to reinvent the wheel when they want to integrate SER with third party apps to create an "overall system" as sip@arcdiv says below.
I'd be happy to propose a table structure and triggers if this idea gains traction with the development team. We could begin a discussion in the serusers list as to what people would like to see in this table.
I guess I'm suggesting that SER core is great, but coming from the user base, I'd like to see a few more considerations going into how SER will be used. I'm guessing that a vast majority of users will not be using it completely stand-alone. Everyone is trying to add value-added services like voicemail, presence, a web interface for user and admin management, etc. Can we come up with a reference model for this and include it in the repository?
At 19:39 03/12/2007, SIP wrote:
On the flip side of having to do more modifications for integration, of course, is that we get the ABILITY to do more modifications.
SER 0.9.6 wasn't limited, per se, but certain aspects of what we've implemented (user-configurable timers, number-based AND uri-based call blocking, call forwarding, call return, whitelisting number mode, etc) in 0.9.6 has felt like a complete and total hack because of the way it has had to be done using a mish mash of AVPs. We're HOPING we can forego some of that with the new SER using the new db queries and such, as well as being able to better-implement some functionality of other RFCs using the more flexible core.
But yes... integration with the new db schema is proving to be a pain in the left ventricle if only because it's so totally focused around optimisation for SER.
The question to me is whether this pain is really specific to the current release of the data model (which I don't think it is) or due to the burden of migrating from the previous one. IMO, it is the latter, to be compensated by extensibility which minimizes future migration burden.
-jiri
I can fill the attributes table up with rows >> like this: >> >> insert into attributes values ( 1, 'DATE_OF_BIRTH', 'DATE' ); >> >> insert into attributes values ( 2, 'FIRST_NAME', 'STRING' ); >> insert into attributes values ( 3, 'LAST_NAME', 'STRING' ); >> commit; >> >> >> And now I'm ready to create a PERSON record: >> >> >> insert into objects values ( 1, 'PERSON' ); >> insert into object_Attributes values( 1, 1, '15-mar-1965' ); >> insert into object_Attributes values( 1, 2, 'Thomas' ); >> insert into object_Attributes values( 1, 3, 'Kyte' ); >> >> commit; >> >> insert into objects values ( 2, 'PERSON' ); >> insert into object_Attributes values( 2, 1, '21-oct-1968' ); >> insert into object_Attributes values( 2, 2, 'John' ); >> insert into object_Attributes values( 2, 3, 'Smith' ); >> >> commit; >> >> And since I'm good at SQL, I can even query this record up to get the FIRST_NAME and >> LAST_NAME of all PERSON records: >> >> ops$tkyte@ORA920 > select >> max( decode(attrName, 'FIRST_NAME', value, null )) first_name, >> >> 2 max( decode( attrName, 'LAST_NAME', value, null ) ) last_name >> 3 from objects, object_attributes, attributes >> 4 where attributes.attrName in ( 'FIRST_NAME', 'LAST_NAME' ) >> >> 5 and object_attributes.attrId = attributes.attrId >> 6 and object_attributes.oid = objects.oid >> 7 and objects.name <http://objects.name> <http://objects.name> = 'PERSON' >> 8 group by objects.oid >> >> 9 / >> >> FIRST_NAME LAST_NAME >> -------------------- -------------------- >> Thomas Kyte >> John Smith >> >> >> Looks great, right? I mean, the developers don't have to create tables anymore, we can >> >> add columns at the drop of a hat (just requires an insert into the ATTRIBUTES table). The >> developers can do whatever they want and the DBA can't stop them. This is ultimate >> "flexibility". I've seen people try to build entire systems on this model. >> >> >> But, how does it perform? Miserably, terribly, horribly. 
A simple "select first_name, >> last_name from person" query is transformed into a 3-table join with aggregates and all. >> Further, if the attributes are "NULLABLE" - that is, there might not be a row in >> >> OBJECT_ATTRIBUTES for some attributes, you may have to outer join instead of just joining >> which in some cases can remove more optimal query plans from consideration. >> >> Writing queries might look pretty straightforward, but it's impossible to do in a >> >> performant fashion. For example, if we wanted to get everyone that was born in MARCH or >> has a LAST_NAME = 'SMITH', we could simply take the query from above and just wrap an >> inline view around that: >> >> >> >> ops$tkyte@ORA920> select * >> 2 from ( >> 3 select >> max(decode(attrName, 'FIRST_NAME', value, null)) first_name, >> 4 max(decode(attrName, 'LAST_NAME', value, null)) last_name, >> >> 5 max(decode(attrName, 'DATE_OF_BIRTH', value, null)) >> date_of_birth >> 6 from objects, object_attributes, attributes >> 7 where attributes.attrName >> in ( 'FIRST_NAME', >> 'LAST_NAME', 'DATE_OF_BIRTH' ) >> 8 and object_attributes.attrId = attributes.attrId >> 9 and object_attributes.oid = objects.oid >> >> 10 and objects.name <http://objects.name> <http://objects.name> = 'PERSON' >> 11 group by objects.oid >> 12 ) >> 13 where last_name = 'Smith' >> 14 or date_of_birth like '%-mar-%' >> >> 15 / >> >> FIRST_NAME LAST_NAME DATE_OF_BIRTH >> -------------------- -------------------- -------------------- >> Thomas Kyte 15-mar-1965 >> John Smith 21-oct-1968 >> >> >> So, it looks "easy" to query, but think about the performance! If you had a couple >> thousand OBJECT records, and a couple tens of thousands of OBJECT_ATTRIBUTES - Oracle >> would have to process the entire inner group by query first and then apply the WHERE >> >> clause. >> >> This is not a made up data model, one that I crafted just to make a point. This is an >> actual data model that I've seen people try to use. 
Their goal is ultimate flexibility. >> They don't know what OBJECTS they need, they don't know what ATTRIBUTES they will have. >> >> Well - that is what the database was written for in the first place: Oracle implemented >> this thing called SQL to define OBJECTS and ATTRIBUTES and lets you use SQL to query >> them. You are trying to put a generic layer on top of a generic layer - and it fails each >> >> and every time except for the most trivial of applications. >> >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> Serusers mailing list >> Serusers@lists.iptel.org <mailto:Serusers@lists.iptel.org> >> http://lists.iptel.org/mailman/listinfo/serusers <http://lists.iptel.org/mailman/listinfo/serusers> >> > ------------------------------------------------------------------------ > > _______________________________________________ > Serusers mailing list > Serusers@lists.iptel.org <mailto:Serusers@lists.iptel.org> > http://lists.iptel.org/mailman/listinfo/serusers >
Serusers mailing list Serusers@lists.iptel.org http://lists.iptel.org/mailman/listinfo/serusers
Serusers mailing list Serusers@lists.iptel.org http://lists.iptel.org/mailman/listinfo/serusers
-- Jiri Kuthan http://iptel.org/~jiri/
Jiri Kuthan wrote:
At 19:39 03/12/2007, SIP wrote:
On the flip side of having to do more modifications for integration, of course, is that we get the ABILITY to do more modifications.
SER 0.9.6 wasn't limited, per se, but certain aspects of what we've implemented in 0.9.6 (user-configurable timers, number-based AND uri-based call blocking, call forwarding, call return, whitelisting number mode, etc) have felt like a complete and total hack because of the way they have had to be done using a mishmash of AVPs. We're HOPING we can forego some of that with the new SER using the new db queries and such, as well as being able to better implement some functionality of other RFCs using the more flexible core.
But yes... integration with the new db schema is proving to be a pain in the left ventricle if only because it's so totally focused around optimisation for SER.
The question to me is whether this pain is really specific to the current release of the data model (which I don't think it is) or to the burden of migrating from the previous one. IMO, it is the latter, to be compensated by extensibility which minimizes future migration burden.
I disagree entirely that it's strictly based on migration.
Let's look at the simple act of adding information to the system:
With a flat subscriber table with all the fields I desire, adding information to the system is 1 insert. Let's say that my subscriber table looks like this (because in the dev ser 0.9.6 box, it does):
+-------------------+--------------+------+-----+---------------------+-------+
| Field             | Type         | Null | Key | Default             | Extra |
+-------------------+--------------+------+-----+---------------------+-------+
| phplib_id         | varchar(32)  | NO   | UNI |                     |       |
| username          | varchar(64)  | NO   | PRI |                     |       |
| domain            | varchar(128) | NO   | PRI |                     |       |
| password          | varchar(25)  | NO   |     |                     |       |
| first_name        | varchar(25)  | NO   |     |                     |       |
| last_name         | varchar(45)  | NO   |     |                     |       |
| phone             | varchar(15)  | NO   |     |                     |       |
| email_address     | varchar(50)  | NO   |     |                     |       |
| datetime_created  | datetime     | NO   |     | 0000-00-00 00:00:00 |       |
| datetime_modified | timestamp    | NO   |     | CURRENT_TIMESTAMP   |       |
| confirmation      | varchar(64)  | NO   |     |                     |       |
| flag              | char(1)      | NO   |     | o                   |       |
| sendnotification  | varchar(50)  | NO   |     |                     |       |
| greeting          | varchar(50)  | NO   |     |                     |       |
| ha1               | varchar(128) | NO   |     |                     |       |
| ha1b              | varchar(128) | NO   |     |                     |       |
| allow_find        | char(1)      | NO   |     | 0                   |       |
| timezone          | varchar(128) | YES  |     | NULL                |       |
| rpid              | varchar(128) | YES  |     | NULL                |       |
| domn              | int(10)      | YES  |     | NULL                |       |
| uuid              | varchar(64)  | YES  |     | NULL                |       |
| signup_ip         | varchar(15)  | NO   |     |                     |       |
+-------------------+--------------+------+-----+---------------------+-------+
Now... ignoring the fact that we have 1 possibly 2 unused fields in that table, let's assume we have to add a user in the new schema.
That's 22 inserts right there (one for each attribute, especially since we can't have defaults).
Searching through data? Let's say I want to know the first_name,last_name,username,domain,confirmation, and the datetime_modified of a user. Simple enough. That's 1 query. If any of those data points were null or their default, grabbing that is still easy as can be.
Now... in the new schema, that's a join (which IS less efficient than a query, I'm afraid, though I can't quote exact meaningful metrics on how much). If any of those data points are not SET (assuming we were going to minimise our impact by NOT doing a full 22-insert creation for each user), it causes yet another series of headaches. When you want to reference data in other relations, it becomes even MORE of a hassle. The queries themselves become these unwieldy beasts that increase the chance of an error in syntax or logic.
One of the simplest queries we have is with our admin interface to query who's logged in in such a way that the data will be used in the web interface:
select username, first_name, last_name, signup_ip from subscriber where username=ANY (select username from location);
Simple. Succinct. Clear. We grab the fields we need and can format them. One short line.
Now I'll leave the new schema version as an exercise for the reader, as I don't recall the syntax right off (which is the first clue as to its added complexity). It's lengthy and utterly non-relational -- which is why we simply don't USE it for these sorts of things.
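For concreteness, here is a rough sketch of the shape such a query takes against an attribute-value layout. The table and column names (user_attrs, credentials, uid, name, value) are illustrative assumptions, not the actual SER 2.0 schema; the point is only the pivot-and-join structure, demonstrated here against SQLite:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Hypothetical attribute-value layout, for illustration only.
cur.executescript("""
CREATE TABLE credentials (uid TEXT PRIMARY KEY, username TEXT);
CREATE TABLE location    (uid TEXT);
CREATE TABLE user_attrs  (uid TEXT, name TEXT, value TEXT);
INSERT INTO credentials VALUES ('u1', 'alice');
INSERT INTO location    VALUES ('u1');
INSERT INTO user_attrs  VALUES ('u1', 'first_name', 'Alice'),
                               ('u1', 'last_name',  'Smith'),
                               ('u1', 'signup_ip',  '10.0.0.1');
""")

# The flat-schema one-liner becomes a pivot over the attribute table:
# one MAX(CASE ...) per field we want back as a column.
rows = cur.execute("""
SELECT c.username,
       MAX(CASE WHEN a.name = 'first_name' THEN a.value END) AS first_name,
       MAX(CASE WHEN a.name = 'last_name'  THEN a.value END) AS last_name,
       MAX(CASE WHEN a.name = 'signup_ip'  THEN a.value END) AS signup_ip
FROM credentials c
LEFT JOIN user_attrs a ON a.uid = c.uid
WHERE c.uid IN (SELECT uid FROM location)
GROUP BY c.uid, c.username
""").fetchall()
print(rows)  # [('alice', 'Alice', 'Smith', '10.0.0.1')]
```

Every additional field to retrieve adds another MAX(CASE ...) branch, which is exactly the "unwieldy beast" effect described above.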
This is not just a migration headache. This is a schema problem. Now, I know you love your schema, and you're allowed, and I fully understand that it's flexible and good for the SER service. But for integration with any sort of system beyond the most basic, it's a right pain in the ass. You can tout its joys and wonders all you'd like, but we're developing with it, and I can assure you, it's not as easy and as straightforward as you like to believe.
I'm not saying there's anything wrong with the schema overall. There have been some great forward leaps in this one, such as the riddance of the incredibly tiresome domain column in every table (that was sometimes checked and sometimes not). But it DOES make for additional complexity when building a system around it.
N.
I disagree entirely that it's strictly based on migration.
Let's look at the simple act of adding information to the system:
With a flat subscriber table with all the fields I desire, adding information to the system is 1 insert. Let's say that my subscriber table looks like this (because in the dev ser 0.9.6 box, it does):
[subscriber table definition snipped -- quoted above]
Now... ignoring the fact that we have 1 possibly 2 unused fields in that table, let's assume we have to add a user in the new schema.
let's first be just a bit more accurate:
- uuid, username, password, domain name, the ha's, and their equivalents are in the table just as before, which is now called credentials (about 7)
- so we are speaking about the application stuff, which may be very different for different uses: callid, signup_ip, timezone, email address, confirmation string, etc. (about 15, how many depending on the actual deployment -- I'm wondering who is, for example, using "sendnotification")
That's 22 inserts right there (one for each attribute, especially since we can't have defaults).
I would guess about ten as opposed to 22. Nevertheless, as long as they are inserted by machines as opposed to humans, I'm less worried about that.
Searching through data? Let's say I want to know the first_name,last_name,username,domain,confirmation, and the datetime_modified of a user. Simple enough. That's 1 query. If any of those data points were null or their default, grabbing that is still easy as can be.
Now... in the new schema, that's a join (which IS less efficient than a query,
It is not necessarily a join -- it may be a double query, with the first query being trivial (delivering the UID) and the second actually too (querying attributes by UID). Many applications have the UID stored for the lifetime of their session (a subscriber logs in, his UID is stored) and effectively carry out only the latter query.
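The two-step pattern described here can be sketched as follows (table and column names are illustrative assumptions, not the exact SER 2.0 schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE credentials (uid TEXT, username TEXT, domain TEXT);
CREATE TABLE user_attrs  (uid TEXT, name TEXT, value TEXT);
INSERT INTO credentials VALUES ('u1', 'alice', 'example.com');
INSERT INTO user_attrs  VALUES ('u1', 'timezone', 'UTC'),
                               ('u1', 'email',    'alice@example.com');
""")

# Step 1 (once, at login time): resolve username/domain to a uid
# and cache it for the lifetime of the session.
uid = cur.execute(
    "SELECT uid FROM credentials WHERE username = ? AND domain = ?",
    ("alice", "example.com")).fetchone()[0]

# Step 2 (per request): fetch ALL attributes by the cached uid --
# an indexed point lookup on one table rather than a join.
attrs = dict(cur.execute(
    "SELECT name, value FROM user_attrs WHERE uid = ?", (uid,)))
print(attrs)  # {'timezone': 'UTC', 'email': 'alice@example.com'}
```

Note that step 2 does not inspect attribute names in the WHERE clause; as Greger points out earlier in the thread, all avpairs for a uid are loaded together.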
I'm afraid, though I can't quote exact meaningful metrics on how much).
Well, without that, this discussion is a bit lacking grounds. I mean, we can't really speak about performance concerns unless we have the numbers for them.
If any of those data points are not SET (assuming we were going to minimise our impact by NOT doing a full 22-insert creation for each user), it causes yet another series of headaches. When you want to reference data in other relations, it becomes even MORE of a hassle. The queries themselves become these unwieldy beasts that increase the chance of an error in syntax or logic.
One of the simplest queries we have is with our admin interface to query who's logged in in such a way that the data will be used in the web interface:
select username, first_name, last_name, signup_ip from subscriber where username=ANY (select username from location);
Simple. Succinct. Clear. We grab the fields we need and can format them. One short line.
I would add, in your language, "non-relational" too. Squeezing everything you can into a single table does not have much relational about it, does it?
Now I'll leave the new schema version as an exercise for the reader, as I don't recall the syntax right off (which is the first clue as to its added complexity). It's lengthy and utterly non-relational -- which is why we simply don't USE it for these sorts of things.
I understand that formulating a query may be more complex here, and that's for sure a good point.
This is not just a migration headache. This is a schema problem. Now, I know you love your schema, and you're allowed, and I fully understand that it's flexible and good for the SER service. But for integration with any sort of system beyond the most basic, it's a right pain in the ass. You can tout its joys and wonders all you'd like, but we're developing with it, and I can assure you, it's not as easy and as straightforward as you like to believe.
Maybe it depends on the type of apps you develop. We have had very dynamic apps too (say, a la phpmyadmin), and the pain of changing the schema was unbearable.
I'm not saying there's anything wrong with the schema overall. There have been some great forward leaps in this one, such as the riddance of the incredibly tiresome domain column in every table (that was sometimes checked and sometimes not). But it DOES make for additional complexity when building a system around it.
Point taken; that's for sure agreeable. Again, if you think that's way too big an obstacle, doing a patch that allows reading AVPs from the credentials table should be easy and should not cause conflicts with the rest of the codebase. It could even be done in a way that works over both types of structures. (Despite a certain risk of mess.)
-jiri
N.
-- Jiri Kuthan http://iptel.org/~jiri/
At 18:05 03/12/2007, X Z wrote:
Hello serdev list,
Please see the thread below started on serusers list.
I do understand the power and design of the new data model and that it does allow the SER Core functionality to be very efficient.
In addition, I definitely agree that of all the scenarios below, data duplication is the best of the undesirable options.
The undesirable options being:
- The old data model that didn't support avpairs sufficiently,
- The new data model that introduces very inefficient joins and ugly SQL queries in order to create a view to integrate with Asterisk or any other third party application
- The new data model that works very well for SER core functionality PLUS a "subscriber" table that includes user data which may be duplicated in the avpairs tables.
I wasn't suggesting that the avpairs model was bad, just that the result of the data model as it stands makes it inefficient to integrate with other apps' data models (like Asterisk, for example).
I might also suggest that the subscriber table comes back into the core code in an expanded form that covers a high percentage of integration use cases, and that triggers get added in to populate the data which would be duplicated. Users can simply opt to not populate the subscriber table at all and the triggers can be disabled as a db installation option.
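The trigger idea suggested above could be sketched roughly like this (SQLite syntax for demonstration; the table and column names are hypothetical, and a real deployment would cover many more attributes):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE user_attrs (uid TEXT, name TEXT, value TEXT);
CREATE TABLE subscriber (uid TEXT PRIMARY KEY, first_name TEXT, last_name TEXT);

-- Mirror selected attributes into the flat subscriber table on write.
-- Deployments not wanting the flat table would simply drop the triggers.
CREATE TRIGGER sync_first_name AFTER INSERT ON user_attrs
WHEN NEW.name = 'first_name'
BEGIN
    INSERT OR IGNORE INTO subscriber (uid) VALUES (NEW.uid);
    UPDATE subscriber SET first_name = NEW.value WHERE uid = NEW.uid;
END;

CREATE TRIGGER sync_last_name AFTER INSERT ON user_attrs
WHEN NEW.name = 'last_name'
BEGIN
    INSERT OR IGNORE INTO subscriber (uid) VALUES (NEW.uid);
    UPDATE subscriber SET last_name = NEW.value WHERE uid = NEW.uid;
END;

-- Writes go to the avpairs table as usual; the flat copy follows along.
INSERT INTO user_attrs VALUES ('u1', 'first_name', 'Thomas');
INSERT INTO user_attrs VALUES ('u1', 'last_name',  'Kyte');
""")

rows = cur.execute("SELECT * FROM subscriber").fetchall()
print(rows)  # [('u1', 'Thomas', 'Kyte')]
```

Reads for integration purposes then hit the flat subscriber table directly, at the cost of the duplicated storage discussed earlier in the thread.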
The trouble is everyone would wish to have his own subscriber table: one with call-forwarding, someone else with a credit-card number, another with the subscriber's age, etc. That's clearly a ticket for problems -- we would have as many data-model forks as there are SER uses out there.
We could keep at least the codebase quite stable under such a model by allowing any column in the credentials table to be interpreted as AVPs to be loaded along with credentials. That would be a minor code change, which would remain stable, and would address the desire to load a user record with all possible stuff. However, migrating data on any change would be sort of painful.
Also for those, who think they need to do something very customized, there is the db_ops module.
The great advantage to this is that new users don't have to reinvent the wheel when they want to integrate SER with third party apps to create an "overall system" as sip@arcdiv says below.
I'd be happy to propose a table structure and triggers if this idea gains traction with the development team. We could begin a discussion in the serusers list as to what people would like to see in this table.
I guess I'm suggesting that SER core is great, but coming from the user base, I'd like to see a few more considerations going into how SER will be used. I'm guessing that a vast majority of users will not be using it completely stand-alone. Everyone is trying to add value-added services like voicemail, presence, a web interface for user and admin management, etc. Can we come up with a reference model for this and include it in the repository?
Exactly this type of thing is what the current data model allows you to do easily: whatever new stuff you have, you are not going to change your data structure and depart from what anyone else has.
I think it's very interesting that sip@arcdiv is commenting that he has to do more modifications to use the new version than he had to before.
I'm interested in that too and I'm frankly curious what those are.
I would hope that the development roadmap has improvements in core functionality AND functionality that helps users actually use SER (like simpler integrations and fewer custom mods). Making software simpler to implement always helps encourage adoption and helps increase user base and market share, which supports the success of the project long term.
I'm still not getting what is difficult here?
-jiri
Thanks, Mahatma
On Dec 3, 2007 4:17 AM, SIP <sip@arcdiv.com> wrote: The current data model, while designed to make certain things easier DOES indeed encourage data duplication, as, in order to create a unified system, most of us will opt to duplicate data into more usable tables. When meshing with an overall system, gathering all relevant data for a particular substructure or user by using joins is neither speedy (especially when the tables get HUGE -- and they very much will) nor terribly convenient. Of course, data duplication is hardly a cardinal sin, but there's ALWAYS a trade off between abstraction and actual functionality. I understand the reasons WHY they've chosen the current data model, and to a degree it makes sense for the core SER system, but for meshing SER with other systems, it's god-awful ugly. :) My recommendation, Mahatma, is to AVOID using the new data model for anything other than the most basic of SER functionality, or, if you gather users in the 50-100,000 user range, your user_attrs table is just going to be one ugly, unmanageably large pile of annoyance. Since the user_attrs and domain_attrs are designed in part, from what I can tell, to make selects more rational and to allow basic SIP flexibility, if you don't NEED to keep the data there, don't do it. They could keep modifying the data model to suit everyone's tastes, but right now, it's designed more to ensure that SER works and works well. Want to store user information like a cell phone number and a fax number and a timezone and a flag on whether or not that user has DND enabled or has access to the PSTN, etc, etc? Keep it elsewhere in a more usable table for such purposes. As we've begun our testing with SER 2.0, we've had to modify things more drastically than we modified them in SER 0.9.6, but in ANY environment, you're going to have to modify things to work for you. At least the core system is flexible enough to let you tailor things. :) N.
Greger V. Teigre wrote:
Dear Mahatma, If you want to engage the developers, you will have to subscribe to serdev, as many of the developers don't follow serusers.
I have not participated in SER's 2.0 data model, but my initial take on your suggestion is the following: With all respect, I believe you may have misunderstood what Tom is addressing in his discussion. He targets a generic database model built around attribute-value pairs. This is not the case for SER; on the contrary, SER's new data model is much sounder from a db perspective (than 0.8 and 0.9) and is built around the uid and did as unique identifiers. Queries will, through joins across the tables, construct the needed data in a very efficient manner, as uid and did are indexed and the queries will use uid and did in the where clause. Without having checked, I assume the tables have been normalized just as they should be (i.e. split up).
The attribute-value pairs you are referring to are not part of the core data model (which Tom covers), but rather attributes that may be loaded and made available in ser.cfg through a query created to retrieve the attributes-value pairs. Without creating a limited set of attributes that can be supported in ser.cfg, the generic avpairs cannot be avoided. However, the queries that retrieve avpairs do not use the semantic of the avpairs to select which avpairs to load, ALL avpairs belonging to a specific uid and did are loaded at the same time.
Also, I'm afraid this statement is wrong: "It's a better idea from a database architecture and performance perspective to keep adding columns into that table for data that has a 1 to 1 relationship with a user." This is exactly how you should not do it if you have complex data relationships that need to be represented and retrieved without duplicating data.
Does this answer alleviate your fears? If not, please subscribe to serdev and post your comment there to engage people closer to the design of the database. g-)
X Z wrote:
Hi All, This is specifically for the SER/OpenSER developers, but I'm not a serdev list member so I'm posting here.
I've been using SER since version 0.8.X and I'm still running 0.8.14 production for my company PBX to this day.
I was very excited as version 2 became a release candidate and I downloaded it for testing. I was pretty disappointed with one aspect of the new data model and I'm requesting that the developers consider a further revision on the data model.
Basically, taking all fields out of the subscriber table like Last_name, first_name, email, timezone, rpid/asserted identity, etc. is not the greatest idea. It's a better idea from a database architecture and performance perspective to keep adding columns into that table for data that has a 1 to 1 relationship with a user, and that is common in > 90% of SER's use cases (i.e. corporate, carrier/VSP). I would suggest adding voicemail_password, and maybe every other field that is being added into the default attributes script that I saw in CVS recently. If you already know what attributes a user has (and they have a 1 to 1 relationship), then it's far better from a db performance perspective to keep these attributes in the user table. I know that the code becomes more complicated, but I think it may be a tradeoff worth discussing.
See this discussion ( http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084117056 ) between Oracle users and Tom (an Oracle engineer/architect). The full text of this discussion is very informative and I highly recommend people read it through.
Tom's conclusion is that the type of data model being discussed, and now being used in SER fails for all but the most trivial of applications. Maybe SER *by itself* qualifies as "trivial" from a database architect's perspective, but think about things like Asterisk integration, which is quite common. You quickly run into some very nasty queries . . .
Please note that I am not a software developer nor a database engineer, just a user who reads a lot, so I'm open to being the ignorant one here, but I thought that this should be discussed among users and developers.
Thanks for considering, Mahatma
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084117056
The following is an excerpt from the above link:
Here is an excerpt from my forthcoming book where I talk about this (and show you how ugly, hard and inefficient queries against your very flexible model will be)
(2) Do not use Generic Data Models
Frequently I see applications built on a generic data model for "maximum flexibility", or applications built in ways that prohibit performance. Many times these are one and the same thing! For example, it is well known you can represent any object in a database using just four tables:
Create table objects
( oid int primary key, name varchar2(255) );

Create table attributes
( attrId int primary key, attrName varchar2(255), datatype varchar2(25) );

Create table object_Attributes
( oid int, attrId int, value varchar2(4000), primary key(oid,attrId) );

Create table Links
( oid1 int, oid2 int, primary key (oid1, oid2) );
That's it - no more CREATE TABLE for me! I can fill the attributes table up with rows like this:
insert into attributes values ( 1, 'DATE_OF_BIRTH', 'DATE' );
insert into attributes values ( 2, 'FIRST_NAME', 'STRING' );
insert into attributes values ( 3, 'LAST_NAME', 'STRING' );
commit;
And now I'm ready to create a PERSON record:
insert into objects values ( 1, 'PERSON' );
insert into object_Attributes values( 1, 1, '15-mar-1965' );
insert into object_Attributes values( 1, 2, 'Thomas' );
insert into object_Attributes values( 1, 3, 'Kyte' );
commit;
insert into objects values ( 2, 'PERSON' );
insert into object_Attributes values( 2, 1, '21-oct-1968' );
insert into object_Attributes values( 2, 2, 'John' );
insert into object_Attributes values( 2, 3, 'Smith' );
commit;
And since I'm good at SQL, I can even query this record up to get the FIRST_NAME and LAST_NAME of all PERSON records:
ops$tkyte@ORA920> select max( decode( attrName, 'FIRST_NAME', value, null ) ) first_name,
       max( decode( attrName, 'LAST_NAME', value, null ) ) last_name
  from objects, object_attributes, attributes
 where attributes.attrName in ( 'FIRST_NAME', 'LAST_NAME' )
   and object_attributes.attrId = attributes.attrId
   and object_attributes.oid = objects.oid
   and objects.name = 'PERSON'
 group by objects.oid
/
FIRST_NAME LAST_NAME
---------- ---------
Thomas     Kyte
John       Smith
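The pivot query above can be reproduced end to end in a lightweight engine. The following is a minimal sketch of the four-table model and the FIRST_NAME/LAST_NAME pivot, translated to SQLite via Python's sqlite3 (Oracle's decode() becomes CASE, varchar2 becomes TEXT; schema and data follow the excerpt):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE objects (oid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE attributes (attrId INTEGER PRIMARY KEY, attrName TEXT, datatype TEXT);
CREATE TABLE object_attributes (oid INTEGER, attrId INTEGER, value TEXT,
                                PRIMARY KEY (oid, attrId));

INSERT INTO attributes VALUES (1, 'DATE_OF_BIRTH', 'DATE');
INSERT INTO attributes VALUES (2, 'FIRST_NAME', 'STRING');
INSERT INTO attributes VALUES (3, 'LAST_NAME', 'STRING');

INSERT INTO objects VALUES (1, 'PERSON');
INSERT INTO object_attributes VALUES (1, 1, '15-mar-1965');
INSERT INTO object_attributes VALUES (1, 2, 'Thomas');
INSERT INTO object_attributes VALUES (1, 3, 'Kyte');

INSERT INTO objects VALUES (2, 'PERSON');
INSERT INTO object_attributes VALUES (2, 1, '21-oct-1968');
INSERT INTO object_attributes VALUES (2, 2, 'John');
INSERT INTO object_attributes VALUES (2, 3, 'Smith');
""")

# The "select first_name, last_name from person" equivalent: a three-table
# join, pivoted with MAX(CASE ...) and grouped per object.
rows = conn.execute("""
SELECT MAX(CASE WHEN a.attrName = 'FIRST_NAME' THEN oa.value END) AS first_name,
       MAX(CASE WHEN a.attrName = 'LAST_NAME'  THEN oa.value END) AS last_name
  FROM objects o
  JOIN object_attributes oa ON oa.oid = o.oid
  JOIN attributes a         ON a.attrId = oa.attrId
 WHERE o.name = 'PERSON'
   AND a.attrName IN ('FIRST_NAME', 'LAST_NAME')
 GROUP BY o.oid
 ORDER BY o.oid
""").fetchall()
print(rows)  # [('Thomas', 'Kyte'), ('John', 'Smith')]
```

Note how much machinery a two-column lookup already needs compared to a plain `SELECT first_name, last_name FROM person`.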
Looks great, right? I mean, the developers don't have to create tables anymore, we can
add columns at the drop of a hat (just requires an insert into the ATTRIBUTES table). The developers can do whatever they want and the DBA can't stop them. This is ultimate "flexibility". I've seen people try to build entire systems on this model.
But, how does it perform? Miserably, terribly, horribly. A simple "select first_name, last_name from person" query is transformed into a 3-table join with aggregates and all. Further, if the attributes are "NULLABLE" - that is, there might not be a row in
OBJECT_ATTRIBUTES for some attributes, you may have to outer join instead of just joining which in some cases can remove more optimal query plans from consideration.
Writing queries might look pretty straightforward, but it's impossible to do in a
performant fashion. For example, if we wanted to get everyone that was born in MARCH or has a LAST_NAME = 'SMITH', we could simply take the query from above and just wrap an inline view around that:
ops$tkyte@ORA920> select *
  from (
        select max( decode( attrName, 'FIRST_NAME', value, null ) ) first_name,
               max( decode( attrName, 'LAST_NAME', value, null ) ) last_name,
               max( decode( attrName, 'DATE_OF_BIRTH', value, null ) ) date_of_birth
          from objects, object_attributes, attributes
         where attributes.attrName in ( 'FIRST_NAME', 'LAST_NAME', 'DATE_OF_BIRTH' )
           and object_attributes.attrId = attributes.attrId
           and object_attributes.oid = objects.oid
           and objects.name = 'PERSON'
         group by objects.oid
       )
 where last_name = 'Smith'
    or date_of_birth like '%-mar-%'
/
FIRST_NAME LAST_NAME DATE_OF_BIRTH
---------- --------- -------------
Thomas     Kyte      15-mar-1965
John       Smith     21-oct-1968
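The wrapped inline-view query behaves the same way in SQLite; a minimal sketch (same schema translation as before, decode() rendered as CASE) shows the WHERE clause being applied only after the full pivot has been built:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE objects (oid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE attributes (attrId INTEGER PRIMARY KEY, attrName TEXT, datatype TEXT);
CREATE TABLE object_attributes (oid INTEGER, attrId INTEGER, value TEXT,
                                PRIMARY KEY (oid, attrId));
INSERT INTO attributes VALUES (1,'DATE_OF_BIRTH','DATE'),
                              (2,'FIRST_NAME','STRING'),
                              (3,'LAST_NAME','STRING');
INSERT INTO objects VALUES (1,'PERSON'),(2,'PERSON');
INSERT INTO object_attributes VALUES
  (1,1,'15-mar-1965'),(1,2,'Thomas'),(1,3,'Kyte'),
  (2,1,'21-oct-1968'),(2,2,'John'),(2,3,'Smith');
""")

# "Everyone born in March or named Smith": the filter sits OUTSIDE the
# inline view, so the whole GROUP BY pivot must be computed first.
rows = conn.execute("""
SELECT * FROM (
  SELECT MAX(CASE WHEN a.attrName='FIRST_NAME'    THEN oa.value END) AS first_name,
         MAX(CASE WHEN a.attrName='LAST_NAME'     THEN oa.value END) AS last_name,
         MAX(CASE WHEN a.attrName='DATE_OF_BIRTH' THEN oa.value END) AS date_of_birth
    FROM objects o
    JOIN object_attributes oa ON oa.oid = o.oid
    JOIN attributes a         ON a.attrId = oa.attrId
   WHERE o.name = 'PERSON'
   GROUP BY o.oid
)
WHERE last_name = 'Smith' OR date_of_birth LIKE '%-mar-%'
ORDER BY last_name
""").fetchall()
print(rows)
```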
So, it looks "easy" to query, but think about the performance! If you had a couple thousand OBJECT records, and a couple tens of thousands of OBJECT_ATTRIBUTES - Oracle would have to process the entire inner group by query first and then apply the WHERE
clause.
This is not a made up data model, one that I crafted just to make a point. This is an actual data model that I've seen people try to use. Their goal is ultimate flexibility. They don't know what OBJECTS they need, they don't know what ATTRIBUTES they will have.
Well - that is what the database was written for in the first place: Oracle implemented this thing called SQL to define OBJECTS and ATTRIBUTES and lets you use SQL to query them. You are trying to put a generic layer on top of a generic layer - and it fails each
and every time except for the most trivial of applications.
Serusers mailing list Serusers@lists.iptel.org http://lists.iptel.org/mailman/listinfo/serusers
-- Jiri Kuthan http://iptel.org/~jiri/
At 13:17 03/12/2007, SIP wrote:
The current data model, while designed to make certain things easier DOES indeed encourage data duplication, as, in order to create a unified system, most of us will opt to duplicate data into more usable tables. When meshing with an overall system, gathering all relevant data for a particular substructure or user by using joins is neither speedy (especially when the tables get HUGE -- and they very much will) nor terribly convenient. Of course, data duplication is hardly a cardinal sin, but there's ALWAYS a trade off between abstraction and actual functionality. I understand the reasons WHY they've chosen the current data model, and to a degree it makes sense for the core SER system, but for meshing SER with other systems, it's god-awful ugly. :)
My recommendation, Mahatma, is to AVOID using the new data model for anything other than the most basic of SER functionality, or, if you gather users in the 50-100,000 user range, your user_attrs table is just going to be one ugly, unmanageably large pile of annoyance.
My recommendation is USE it exactly for this reason. It allows extending functionality and making upgrades easier. The penalty is the amount of data in a single table, but this type of problem used to be challenging in the 70s, IMO. We have been using it without any duplication, and I'm just wondering how one can come to the idea of putting any duplication in it.
Really -- I'm interested in a technical discussion grounded on technical arguments. Extensibility and ease of upgrade are such. "Ugly, unmanageable," etc. are expressions of aesthetic choice. That is of course interesting to share, but unlikely to gain momentum for any change.
-jiri
-- Jiri Kuthan http://iptel.org/~jiri/
Let me take on this, as I'm actually a keen advocate of the current DB model. Just a quick heads-up before I go into details: I believe that the application with AVPs in a separate table creates great value, and the performance penalty for it is sustainable. Note also that this is still a minor point compared to the data model as a whole (and for those keen about this one, it can be supported by adding a rather easy SER operation).
Again -- I believe it's a trade-off between application and efficiency. Let me first begin with the application because I consider that clearly more important. The challenge is that any ever-changing schema is just too troublesome. It causes migration pain and it prohibits easy-to-do extensions. Think of web2.0 :-) .... admins or even subscribers wish to add a new attribute which will have an impact on signaling, charging, or whatever aspect of the SIP life-cycle you are interested in. Changing the table schema makes it *very* difficult, and bites you back again on your next upgrade.
Now the question is how big the actual penalty is for having attributes in an extensible way in a separate table. While there have been boldly articulated concerns, I've been missing data supporting them. Keep in mind that SER does very simple queries -- 1) determine UID, 2) load associated AVPs.
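The two queries described above can be sketched as follows. This is illustrative only: the `user_attrs` table name appears earlier in the thread, but the `uri` lookup table and its columns are assumptions for the sketch, not necessarily SER's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE uri (uid TEXT, did TEXT, username TEXT);
CREATE TABLE user_attrs (uid TEXT, name TEXT, value TEXT);
INSERT INTO uri VALUES ('uid-42', 'did-1', 'john.doe');
INSERT INTO user_attrs VALUES
  ('uid-42', 'timezone', 'Europe/Berlin'),
  ('uid-42', 'voicemail_password', '1234');
""")

# Query 1: determine the UID (a simple indexed lookup in practice).
(uid,) = conn.execute(
    "SELECT uid FROM uri WHERE username = ? AND did = ?",
    ("john.doe", "did-1")).fetchone()

# Query 2: load every AVP for that UID in one go -- no joins, no pivoting,
# and no per-attribute selection logic.
avps = dict(conn.execute(
    "SELECT name, value FROM user_attrs WHERE uid = ?", (uid,)))
print(avps)  # {'timezone': 'Europe/Berlin', 'voicemail_password': '1234'}
```

The point of the sketch: neither query resembles the EAV pivot from the excerpt, because the proxy never asks "which users have attribute X = Y", only "give me all attributes of this one user".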
Also -- the debate discussing this thing as the "data model" as a whole went IMO quite far. As Greger has mentioned, the consolidation from ambiguous strings to unambiguous IDs has greatly improved sanity. Just one example I have posted before: think of john@doe.com, who desired a new alias john.doe@doe.com, which changed after he married and took his spouse's name to john.foo@doe.com, and later, after the domain was acquired by someone else, to john.foo@oracle.com. Imagine what your CDR processing would generate if it was based on URIs.....
So I think proper next steps could look like this:
- if folks have some reasonable measurements showing that AVPs in a separate table are troublesome, come up with those (for sake of completeness: SER has very trivial DB usage patterns and we don't do any joins -- we only do two queries: a) determine UID, b) load attributes specific to UIDs)
- to provide a choice to those who for whatever reason don't wish to use attributes in a separate table, it should be easy to create a patch that fills the AVPs with values from credentials/columns (as opposed to from a separate table)
-jiri
At 10:25 03/12/2007, Greger V. Teigre wrote:
Dear Mahatma, If you want to engage the developers, you will have to subscribe to serdev, as many of the developers don't follow serusers.
I have not participated in the design of SER's 2.0 data model, but my initial take on your suggestion is the following: With all respect, I believe you may have misunderstood what Tom is addressing in his discussion. He targets a generic database model built around attribute-value pairs. This is not the case for SER; on the contrary, SER's new data model is much sounder from a db perspective (than 0.8 and 0.9) and is built around the uid and did as unique identifiers. Queries will, through joins across the tables, construct the needed data in a very efficient manner, as uid and did are indexed and the queries use uid and did in the where clause. Without having checked, I assume the tables have been normalized just as they should be (i.e. split up).
The attribute-value pairs you are referring to are not part of the core data model (which Tom covers), but rather attributes that may be loaded and made available in ser.cfg through a query created to retrieve the attribute-value pairs. Without creating a limited set of attributes that can be supported in ser.cfg, the generic avpairs cannot be avoided. However, the queries that retrieve avpairs do not use the semantics of the avpairs to select which avpairs to load; ALL avpairs belonging to a specific uid and did are loaded at the same time.
Also, I'm afraid this statement is wrong: "It's a better idea from a database architecture and performance perspective to keep adding columns into that table for data that has a 1 to 1 relationship with a user." This is exactly how you should not do it if you have complex data relationships that need to be represented and retrieved without duplicating data.
Does this answer alleviate your fears? If not, please subscribe to serdev and post your comment there to engage people closer to the design of the database. g-)
X Z wrote:
Hi All, This is specifically for the SER/OpenSER developers, but I'm not a serdev list member so I'm posting here.
I've been using SER since version 0.8.X and I'm still running 0.8.14 production for my company PBX to this day.
I was very excited as version 2 became a release candidate and I downloaded it for testing. I was pretty disappointed with one aspect of the new data model and I'm requesting that the developers consider a further revision on the data model.
Basically, taking all fields out of the subscriber table like Last_name, first_name, email, timezone, rpid/asserted identity, etc., is not the greatest idea. It's a better idea from a database architecture and performance perspective to keep adding columns into that table for data that has a 1-to-1 relationship with a user, and that is common in > 90% of SER's use cases (i.e. corporate, carrier/VSP). I would suggest adding voicemail_password, and maybe every other field that is being added into the default attributes script that I saw in CVS recently. If you already know what attributes a user has (and they have a 1-to-1 relationship), then it's far better from a db performance perspective to keep these attributes in the user table. I know that the code becomes more complicated, but I think it may be a tradeoff worth discussing.
See this discussion ( http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084117056 ) between Oracle users and Tom (an Oracle engineer/architect). The full text of this discussion is very informative and I highly recommend people read it through.
Tom's conclusion is that the type of data model being discussed, and now being used in SER fails for all but the most trivial of applications. Maybe SER *by itself* qualifies as "trivial" from a database architect's perspective, but think about things like Asterisk integration, which is quite common. You quickly run into some very nasty queries . . .
Please note that I am not a software developer nor a database engineer, just a user who reads a lot, so I'm open to being the ignorant one here, but I thought that this should be discussed among users and developers.
Thanks for considering, Mahatma