Hi,
Jiri Kuthan wrote:
At 23:33 30/11/2006, Martin Hoffmann wrote:
Part of the problem and also of the memory usage problem is that the database interface of SER requires that the entire table be slurped into SER's process memory instead of being fetched and processed row by row. This can cause funny behaviour during start-up and a near heart attack for the sysadmin.
It's a trade-off. I recall quite a few providers who would have had a heart attack if usrloc was not cached.
This comment wasn't about the caching per se. The database interface allows you to access all rows as an array. This is rarely if ever needed. If the interface instead had a function a la dbf->get_next_row(), you wouldn't need to slurp a table of thousands of rows into pkg_mem first.
Another shortcoming of the database API is that you can't do a "where expires < now()". This, however, is only a problem if you teach SER not to delete expired rows from the database and then forget to run the cron job that does it. (Reminds me that I owe Atle a cookie for that one.)
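For the record, such a cleanup job is a one-liner in cron if you go straight to SQL. A sketch of a crontab entry, assuming a MySQL database called `ser` with a `location` table and an `expires` column (all names are assumptions, adjust to your schema):

```
# run every 5 minutes; database/table/column names are hypothetical
*/5 * * * * mysql ser -e "DELETE FROM location WHERE expires < NOW()"
```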
(think what happens when a popular IAD vendor sets its IADs to reregister at 3am)
If you have enough of those, the only thing you can do here is to start sending them 503s. Just an idea: the problem really is that all UDP processes are stuck waiting for the database, so new requests don't get handled (which causes a retransmission storm that eventually kills you). If one counts the processes that are stuck, one can write a function that sends a 503 back when only one or two processes are left.
The problem may not appear on SIP side but on DB side, though.
Basically, you can preload (which is what we do), not cache at all (which under some circumstances may cause a real bad heart attack), or perhaps something in between (less than a 100% cache). Given the other bottlenecks and the price of memory, preloading seems feasible; the only downside is the loading time. This can be compensated for by a reasonable network design with redundancy.
What you forget here is that your database has a query cache (or should have). This one is much better suited for this because it can cope with changes to the database from somewhere else. (We had to use serctl to update aliases which sometimes didn't work. The resulting script that tries to insert the alias, then checks whether it is actually there is quite impressive).
Plus, usrloc is actually only one out of two or three queries you do per INVITE: does_uri_exist() is probably done on every one (at least if you have call forwarding) and avp_load() is likely to be done for all incoming calls. (That's 0.9, of course, dunno about 0.10 yet.)
What killed me once wasn't usrloc but the avp_load(). And that was only because the indexes on the table were screwed and the select did a full table scan every time.
This leaves the registrar stuff. But that is writing to the database anyway. What would be more important here is to have it transactional in a sensible way. The way it works now is that if you have database problems, you delay your response, which makes your UAs re-send the request, which causes more database trouble. (This, BTW, is true for INVITE processing as well -- here you process your request with all the checks and database lookups and whatnot, only to find out upon t_relay() that, oops, it's a re-sent INVITE and needs to be dropped, all for nothing.) True, this is not a problem if you use the right db_mode.
I think this is a good place for improvement indeed. We have been thinking of some aggregation of delayed writes but haven't moved forward on this yet.
I think a function "t_go_stateful()" might be enough (and use t_reply() in the registrar). The function checks if a transaction for the request already exists and, if so, ends processing right away. Otherwise it creates a transaction in a preliminary state.
Well -- it is certainly possible but you actually just push the problem from SER cluster to a DB cluster, which may bring you other type of headache.
Probably, but in this scenario I have several options to solve this, depending on my actual load. I can start with a central database that is accessed over the net, later switch to an elaborate scheme with replication and finally switch to a MySQL cluster-esque solution. High-performance databases are necessary in other applications, too, and do exist.
I am a follower of the old Unix strategy that everything does one thing and one thing only. Providing that data fast enough is the job of the database.
Regards, Martin
PS: Should we move this to serdev?