Comment regarding tm change in ser 0.10 inline.
Michal
On Tue, 2006-12-05 at 10:54 +0100, Martin Hoffmann wrote:
Salut,
Jiri Kuthan wrote:
At 23:33 30/11/2006, Martin Hoffmann wrote:
Part of the problem and also of the memory usage problem is that the
database interface of SER requires that the entire table be slurped into
SER's process memory instead of being fetched and processed row by row.
This can cause funny behaviour during start-up and a near heart attack
for the sysadmin.
it's a trade-off. I recall quite some providers who would have had
a heart attack if usrloc was not cached.
This comment wasn't about the caching per se. The database interface
allows you to access all rows as an array. This is rarely if ever
needed. If the interface instead had a function à la
dbf->get_next_row(), you wouldn't need to slurp a table of thousands of
rows into pkg_mem first.
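A cursor-style interface of the kind described might look like the following sketch (the `db_cursor_t` type, `get_next_row()`, and the in-memory backing store are all hypothetical stand-ins; the real SER db API looks different):

```c
#include <stddef.h>

/* Hypothetical cursor-style DB API; db_row_t is a simplified stand-in
 * for a usrloc-style row. */
typedef struct db_row { int uid; long expires; } db_row_t;

typedef struct db_cursor {
    const db_row_t *rows;   /* backing result set (mocked in memory here) */
    size_t n, pos;
} db_cursor_t;

/* Fetch one row at a time instead of slurping the whole table into
 * package memory; returns NULL when the result set is exhausted. */
const db_row_t *get_next_row(db_cursor_t *cur)
{
    if (cur->pos >= cur->n) return NULL;
    return &cur->rows[cur->pos++];
}

/* Example consumer: walk the result set row by row, keeping only one
 * row in scope at a time. */
size_t count_active(db_cursor_t *cur, long now)
{
    const db_row_t *r;
    size_t active = 0;
    while ((r = get_next_row(cur)) != NULL)
        if (r->expires > now) active++;
    return active;
}
```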
Another short-coming of the database API is that you can't do a "where
expires < now()". This, however, is only a problem if you teach SER not
to delete expired rows from the database and then forget to run the cron
job that does it (Reminds me that I owe Atle a cookie for that one).
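For reference, the cleanup that cron job performs boils down to a single statement; a sketch that builds it (the `location` table and `expires` column match the stock usrloc schema, but the helper itself is made up — a richer db API would let a module issue this comparison directly):

```c
#include <stdio.h>
#include <time.h>

/* Build the periodic cleanup statement the cron job runs against the
 * usrloc table. Returns the number of characters written (as snprintf). */
int build_cleanup_query(char *buf, size_t len, time_t now)
{
    return snprintf(buf, len,
        "DELETE FROM location WHERE expires < %ld", (long)now);
}
```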
(think what happens when
a popular IAD vendor sets its IADs to reregister at 3am)
If you have enough of those, the only thing you can do here is starting
to 503 them. Just an idea: The problem really is that all UDP processes
are stuck waiting for the database and new requests wouldn't get handled
(which causes a retransmission storm that eventually kills you). If one counts
the processes that are stuck, one can write a function that sends a
503 back if only one or two processes are left.
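A minimal sketch of that idea, assuming a shared busy-worker counter (the names and thresholds are invented; real SER workers are separate processes, so the counter would live in shared memory rather than in one process's atomics):

```c
#include <stdatomic.h>

#define NUM_WORKERS 8
#define MIN_FREE    2   /* keep at least this many workers for new work */

/* Assumed design, not existing SER code: each worker marks itself busy
 * while blocked on the database. */
static atomic_int busy_workers = 0;

void db_call_begin(void) { atomic_fetch_add(&busy_workers, 1); }
void db_call_end(void)   { atomic_fetch_sub(&busy_workers, 1); }

/* Returns 1 if a new request should be answered with 503 right away
 * because nearly all workers are stuck waiting for the database. */
int should_send_503(void)
{
    return atomic_load(&busy_workers) >= NUM_WORKERS - MIN_FREE;
}
```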
The problem
may not appear on SIP side but on DB side, though.
Basically, you can preload (which is what we do), not cache at all
(which under some circumstances may cause a really bad heart attack),
or perhaps something in between (less than a 100% cache). Given
other bottlenecks and the price of memory, preloading seems feasible;
the only downside is the loading time. This can be compensated for
by a reasonable network design with redundancy.
What you forget here is that your database has a query cache (or should
have). This one is much better suited for this because it can cope with
changes to the database from somewhere else. (We had to use serctl to
update aliases, which sometimes didn't work. The resulting script, which
tries to insert the alias and then checks whether it is actually there,
is quite impressive.)
Plus, usrloc is actually only one out of two or three queries you do
per INVITE: does_uri_exist() is probably done on every one (at least if
you have call forwarding) and avp_load() is likely to be done for all
incoming calls (That's 0.9, of course, dunno about 0.10 yet).
What killed me once wasn't usrloc but the avp_load(). And that was only
because the indexes on the table were screwed and the select did a full
table scan every time.
This
leaves the registrar stuff. But that is writing to the database
anyways. What would be more important here is to have it transactional
in a sensible way. The way it works now is that if you have database
problems, you delay your response which makes your UAs re-send the
request which causes more database troubles. (This, BTW, is true for
INVITE processing as well -- here you process your request with all the
checks and database lookups and whatnots only to find out upon t_relay()
that, oops, re-sent INVITE, needs to be dropped, all for nothing).
True, this is not a problem if you use the right db_mode.
I think this is a good place for improvement indeed. We have been
thinking of some aggregation of delayed writes but haven't moved
forward on this yet.
I think a function "t_go_stateful()" might be enough (and use t_reply()
in the registrar). The function checks if a transaction for the request
exists and if so, ends processing right away. Otherwise it creates a
transaction in a preliminary state.
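A toy sketch of the proposed semantics (the function, its return codes, and the simplified one-slot-per-hash transaction table are all hypothetical; a real implementation would hash the full request key and handle collisions):

```c
#define TABLE_SIZE 64

/* Simplified transaction table: one slot per hashed key; 0 = empty.
 * Collisions simply overwrite in this toy version. */
static int tbl[TABLE_SIZE];

enum { T_RETRANSMISSION = 0, T_NEW = 1 };

/* Hypothetical t_go_stateful(): if a transaction for this request key
 * already exists, tell the caller to stop right away (no checks, no DB
 * work for the retransmission); otherwise create the transaction in a
 * preliminary state. */
int t_go_stateful(int key)
{
    int slot = key % TABLE_SIZE;
    if (tbl[slot] == key)
        return T_RETRANSMISSION;   /* duplicate request: drop it */
    tbl[slot] = key;               /* preliminary transaction created */
    return T_NEW;
}
```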
With Ottendorf you can use t_newtran() to start the transaction and
further in the script use functions of tm module like t_relay(),
t_reply() - it does not complain any longer, that the transaction was
started from the script earlier.
So it is up to you how you manage the trade-off: losing CPU cycles
either in transaction lookups or in handling retransmissions.
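A minimal route sketch of that Ottendorf-style flow (assuming the tm module is loaded; the exact return-value conventions of t_newtran() should be checked against the module docs):

```
route {
    # Create the transaction before any expensive work; tm absorbs
    # retransmissions here, so the checks below run only once.
    if (!t_newtran()) {
        break;  # retransmission or error: tm already dealt with it
    }

    # ...checks, database lookups and whatnot happen exactly once...

    if (!t_relay()) {
        t_reply("500", "Internal error");
    }
}
```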
Well -- it is
certainly possible, but you actually just push the problem
from the SER cluster to a DB cluster, which may bring you another
type of headache.
Probably, but in this scenario I have several options to solve this,
depending on my actual load. I can start with a central database that is
accessed over the net, later switch to an elaborate scheme with
replication and finally switch to a MySQL cluster-esque solution.
High-performance databases are necessary in other applications, too, and
do exist.
I am a follower of the old Unix strategy that everything does one thing
and one thing only. Providing that data fast enough is the job of the
database.
Regards,
Martin
PS: Should we move this to serdev?
_______________________________________________
Serusers mailing list
Serusers(a)lists.iptel.org
http://lists.iptel.org/mailman/listinfo/serusers