Hi Martin,
A couple of points inline, mostly academic discussion, as I largely agree
that this type of optimization is missing the point, and one can do things in
many different ways. IMO the real point is a reasonable cluster design (which
includes DB processing too); how usrloc is tuned is ultimately marginal.
-jiri
At 23:33 30/11/2006, Martin Hoffmann wrote:
samuel wrote:
Where is the time saving coming from, then?
I think the idea behind was the following:
The use case is for big providers with lots of entries in the usrloc
database. A restart in such a situation might stop the
service for quite a few minutes (I don't recall the numbers) while the
server is loading the data.
Numbers obviously depend on your hardware and on whether you have a local
database. But they are somewhere in the range of 20 seconds for 50,000
entries, 2 minutes for 100,000 entries and 10 minutes for 500,000
entries.
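Taking the quoted figures at face value, the implied load rate can be
worked out directly. A few lines of Python; nothing beyond the figures
above is assumed:

```python
# Load times quoted above: (entries, seconds). Rough and hardware-dependent.
LOAD_TIMES = [(50_000, 20), (100_000, 120), (500_000, 600)]


def rows_per_second(entries: int, seconds: int) -> float:
    """Average load rate for one measurement."""
    return entries / seconds


for entries, seconds in LOAD_TIMES:
    print(f"{entries:>7,} entries in {seconds:>3} s: "
          f"{rows_per_second(entries, seconds):>7.1f} rows/s")
```

That is 2,500 rows/s for 50,000 entries but only about 833 rows/s for
both 100,000 and 500,000 entries: doubling the table from 50,000 to
100,000 entries made the load six times slower, while going from 100,000
to 500,000 scaled linearly.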
Part of the problem, and also of the memory usage problem, is that the
database interface of SER requires the entire table to be slurped into
SER's process memory instead of being fetched and processed row by row.
This can cause funny behaviour during start-up and a near heart attack
for the sysadmin.
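To illustrate the difference between slurping and row-by-row processing,
here is a rough sketch using Python's stdlib sqlite3 module as a stand-in
database. It is not SER's DB API (which is C); the `location` table with
`username`, `contact` and `expires` columns and the `make_demo_db()`
helper are simplified, hypothetical assumptions:

```python
import sqlite3
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[str, str, int]  # (username, contact, expires); hypothetical schema
QUERY = "SELECT username, contact, expires FROM location"


def load_slurp(conn: sqlite3.Connection) -> List[Row]:
    # The whole result set is materialised in process memory at once,
    # before a single entry reaches the cache.
    return conn.execute(QUERY).fetchall()


def load_streaming(conn: sqlite3.Connection, batch: int = 1000) -> Iterator[Row]:
    # Only `batch` rows are held in the result buffer at a time;
    # the cache is filled as rows arrive.
    cur = conn.execute(QUERY)
    while True:
        rows = cur.fetchmany(batch)
        if not rows:
            break
        yield from rows


def build_cache(rows: Iterable[Row]) -> Dict[str, List[Tuple[str, int]]]:
    # username -> list of (contact, expires)
    cache: Dict[str, List[Tuple[str, int]]] = {}
    for username, contact, expires in rows:
        cache.setdefault(username, []).append((contact, expires))
    return cache


def make_demo_db(n: int) -> sqlite3.Connection:
    # Hypothetical demo data: n users with one contact each.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE location (username TEXT, contact TEXT, expires INTEGER)")
    conn.executemany(
        "INSERT INTO location VALUES (?, ?, ?)",
        [(f"user{i}", f"sip:user{i}@10.0.0.{i % 250}", 3600) for i in range(n)],
    )
    return conn
```

Both paths build the same cache, e.g. for `conn = make_demo_db(5000)`,
`build_cache(load_slurp(conn)) == build_cache(load_streaming(conn, batch=100))`;
only the streaming one keeps the transient result buffer bounded by the
batch size rather than by the table size.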
It's a trade-off. I recall quite a few providers who would have had
a heart attack if usrloc was not cached (think what happens when
a popular IAD vendor sets its IADs to reregister at 3am). The problem
may not appear on the SIP side but on the DB side, though.
Basically, you can preload (which is what we do), not cache at all
(which under some circumstances may cause a really bad heart attack),
or do something in between (less than 100% cache). Given
other bottlenecks and the price of memory, preloading seems feasible;
the only downside is the loading time. This can be compensated for
by a reasonable network design with redundancy.
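The "something in between" option could look roughly like a bounded LRU
cache in front of the database. A minimal sketch under stated
assumptions: `PartialLocationCache`, the `db_lookup` callback and the
eviction policy are hypothetical, not anything usrloc provides:

```python
from collections import OrderedDict
from typing import Callable, List, Optional


class PartialLocationCache:
    """Hypothetical 'less than 100%' cache: holds at most `capacity` AORs
    in memory and falls back to a DB lookup on a miss."""

    def __init__(self, capacity: int,
                 db_lookup: Callable[[str], Optional[List[str]]]) -> None:
        self.capacity = capacity
        self.db_lookup = db_lookup
        self.entries: "OrderedDict[str, List[str]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, aor: str) -> Optional[List[str]]:
        if aor in self.entries:
            self.entries.move_to_end(aor)  # mark as most recently used
            self.hits += 1
            return self.entries[aor]
        self.misses += 1
        contacts = self.db_lookup(aor)  # every miss costs one DB query
        if contacts is not None:
            self.entries[aor] = contacts
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
        return contacts

    def invalidate(self, aor: str) -> None:
        # Must be called whenever a REGISTER changes the AOR's bindings.
        self.entries.pop(aor, None)
```

With `capacity` large enough to hold every AOR this behaves like a
(lazily filled) preload; with `capacity=0` every lookup goes to the DB,
i.e. no cache at all. `invalidate()` has to be called on every REGISTER,
otherwise the cache serves stale contacts.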
If you split the data into chunks and load them sequentially, you can
start serving without interruption...
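Chunked sequential loading might look like the following sketch. The
table is read in keyset-paginated chunks (`WHERE rowid > ?` rather than
`OFFSET`, so each chunk costs the same regardless of how far in it is),
and the generator yields after every chunk so the caller can process
pending requests in between. sqlite3 stands in for the real database;
`load_in_chunks()` and the schema are hypothetical:

```python
import sqlite3
from typing import Dict, Iterator, List, Tuple


def load_in_chunks(conn: sqlite3.Connection,
                   cache: Dict[str, List[Tuple[str, int]]],
                   chunk: int = 1000) -> Iterator[int]:
    """Hypothetical chunked loader: fills `cache` from the `location`
    table `chunk` rows at a time, yielding the chunk size after each one."""
    last_rowid = 0
    while True:
        rows = conn.execute(
            "SELECT rowid, username, contact, expires FROM location "
            "WHERE rowid > ? ORDER BY rowid LIMIT ?",
            (last_rowid, chunk),
        ).fetchall()
        if not rows:
            return
        for rowid, username, contact, expires in rows:
            cache.setdefault(username, []).append((contact, expires))
            last_rowid = rowid
        # Chunk done: the caller may serve queued requests before resuming.
        yield len(rows)
```

A caller would drive it as `for n in load_in_chunks(conn, cache): ...`,
handling queued requests in the loop body. Entries from chunks that are
not yet loaded remain invisible, though, which is exactly the concern
raised in the reply below.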
As far as I understand the announcement (haven't looked at the actual
code), the idea is to load everything inside an extra process. The
problem with that kind of speed-up is that your responses will not be
correct during the loading phase. I am not sure if this is better than
being down as it may cause support calls and false problem alerts. If
you are going through a period of trouble and have to restart often, this
wrong behaviour can go on for hours.
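The core of the problem is that, while a separate loader is still
running, "not in memory" no longer means "not registered". A hypothetical
model of that with a three-way lookup result; the class, the status
strings and `naive_response()` are illustrations only, not how the
announced code works:

```python
import threading
from typing import Dict, List, Tuple


class BackgroundLoadedUsrloc:
    """Hypothetical location store that a separate loader fills while
    the server is already answering lookups."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._contacts: Dict[str, List[str]] = {}
        self.loading_done = threading.Event()

    def add(self, aor: str, contact: str) -> None:
        # Called by the loader for each row it reads from the DB.
        with self._lock:
            self._contacts.setdefault(aor, []).append(contact)

    def lookup(self, aor: str) -> Tuple[str, List[str]]:
        with self._lock:
            contacts = list(self._contacts.get(aor, []))
        if contacts:
            return ("found", contacts)
        if self.loading_done.is_set():
            return ("not_found", [])
        # Not in memory *yet*: the user may well be registered in the DB.
        return ("unknown", [])


def naive_response(status: str) -> str:
    # A server that ignores the loading phase answers "unknown" with 404
    # as well, which is the incorrect response described above.
    return "relay" if status == "found" else "404 Not Found"
```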
But anyway, in my experience with large-scale installations, the whole
caching thing in usrloc is unnecessary. I have it on good authority that
a modern PC can handle more than 100,000 subscribers with a cacheless
usrloc and a local database.
I agree with you on sunny days. The problem is that there are rainy days
too, and usrloc becomes a bad bottleneck with significantly fewer
subscribers.
I once wrote a replacement module that did
lookup() directly to the database without any usrloc. It was able to
serve substantially more than 100,000 subscribers. (Disclaimer: This
actually depends on your usage patterns. I can't provide CPS values,
though.)
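A cacheless lookup boils down to one indexed query per request. A sketch
with sqlite3 as a stand-in; the schema (including a `q` column),
`db_lookup()` and the expiry handling are assumptions, not the
replacement module mentioned above and not SER's `lookup()`:

```python
import sqlite3
import time
from typing import List, Optional

# Simplified, hypothetical schema; the index on username is what keeps
# per-request lookups cheap as the table grows.
SCHEMA = """
CREATE TABLE location (username TEXT, contact TEXT, expires INTEGER, q REAL);
CREATE INDEX location_username ON location (username);
"""


def db_lookup(conn: sqlite3.Connection, username: str,
              now: Optional[int] = None) -> List[str]:
    """Cacheless lookup: query the DB on every request and return the
    non-expired contacts, highest q first."""
    if now is None:
        now = int(time.time())
    rows = conn.execute(
        "SELECT contact FROM location WHERE username = ? AND expires > ? "
        "ORDER BY q DESC",
        (username, now),
    ).fetchall()
    return [contact for (contact,) in rows]
```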
This leaves the registrar stuff. But that is writing to the database
anyways. What would be more important here is to have it transactional
in a sensible way. The way it works now is that if you have database
problems, you delay your response, which makes your UAs re-send the
request, which causes more database trouble. (This, BTW, is true for
INVITE processing as well: here you process your request with all the
checks and database lookups and whatnot, only to find out upon t_relay()
that, oops, it is a re-sent INVITE that needs to be dropped, all for nothing.)
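One way to avoid doing all that work for a retransmission is to match
the request against transactions already seen before any checks or
database lookups, rather than finding out at t_relay(). A hypothetical
sketch; the key is a simplification (RFC 3261 matching uses the Via
branch, sent-by and method) and the class is not part of SER's tm module:

```python
from typing import Callable, Dict, Optional, Tuple

# Simplified, hypothetical transaction key: (Via branch, Call-ID, CSeq).
TxKey = Tuple[str, str, str]


class RetransmissionFilter:
    """Runs *before* any auth or DB work: a request whose transaction has
    been seen before is answered from memory or silently absorbed."""

    def __init__(self) -> None:
        # key -> final response, or None while the original is still in
        # progress. A real implementation would expire entries, much like
        # transaction timers do.
        self.seen: Dict[TxKey, Optional[str]] = {}

    def handle(self, key: TxKey, process: Callable[[], str]) -> Optional[str]:
        if key in self.seen:
            # Retransmission: no DB access; resend the stored reply, if any.
            return self.seen[key]
        self.seen[key] = None
        response = process()  # the expensive part: checks, DB lookups, ...
        self.seen[key] = response
        return response
```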
True, this is not a problem if you use the right db_mode.
I think this is a good place for improvement indeed. We have been
thinking of some aggregation of delayed writes but haven't moved
forward on this yet.
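Aggregating delayed writes could be as simple as a write-behind buffer
that coalesces refreshes of the same binding and flushes them in one
transaction. A hypothetical sketch with sqlite3 as the stand-in DB;
`DelayedWriter`, `max_pending` and the schema are assumptions:

```python
import sqlite3
from typing import Dict, Tuple

# Hypothetical, simplified schema; the primary key lets INSERT OR REPLACE
# overwrite an existing binding.
SCHEMA = ("CREATE TABLE location (username TEXT, contact TEXT, expires INTEGER, "
          "PRIMARY KEY (username, contact))")


class DelayedWriter:
    """Write-behind buffer for registrations: updates are coalesced in
    memory (last write per binding wins) and flushed in one transaction."""

    def __init__(self, conn: sqlite3.Connection, max_pending: int = 500) -> None:
        self.conn = conn
        self.max_pending = max_pending
        self.pending: Dict[Tuple[str, str], int] = {}

    def register(self, username: str, contact: str, expires: int) -> None:
        # A refresh of the same binding overwrites the pending value, so a
        # burst of re-REGISTERs costs one row write instead of many.
        self.pending[(username, contact)] = expires
        if len(self.pending) >= self.max_pending:
            self.flush()

    def flush(self) -> int:
        """Write all pending bindings and return the number of rows written.
        A real deployment would also call this from a timer."""
        if not self.pending:
            return 0
        rows = [(u, c, e) for (u, c), e in self.pending.items()]
        with self.conn:  # commits on success, rolls back on error
            self.conn.executemany(
                "INSERT OR REPLACE INTO location (username, contact, expires) "
                "VALUES (?, ?, ?)",
                rows,
            )
        self.pending.clear()  # only reached if the commit succeeded
        return len(rows)
```

The trade-off is that anything still in `pending` is lost on a crash and
is invisible to other servers until flushed, which matters for any
multi-server setup.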
But there is another issue, and that is reliability. At a certain point,
you need to have a second SIP server because your superiors read about
the five nines thing.
I would add that if they don't read about it, they may find themselves
being written about in popular magazines :-)
IMHO the easiest way to set this up is by having
several servers doing the exact same thing and then load balancing
traffic between them. This is only possible if you have a cacheless
usrloc and if registrations are written to the database ASAP.
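The setup described here, identical nodes without local state sharing
one database, can be sketched as follows. One in-memory sqlite3
connection stands in for the shared DB; `CachelessNode`, `pick_node()`
and the schema are hypothetical:

```python
import sqlite3
import zlib
from typing import List, Sequence

# Hypothetical, simplified schema shared by all nodes.
SCHEMA = ("CREATE TABLE location (username TEXT, contact TEXT, expires INTEGER, "
          "PRIMARY KEY (username, contact))")


class CachelessNode:
    """One of several identical SIP servers: no local usrloc; REGISTERs are
    written to the shared DB immediately and lookups always read from it."""

    def __init__(self, name: str, conn: sqlite3.Connection) -> None:
        self.name = name
        self.conn = conn  # stands in for a connection to the shared DB

    def register(self, username: str, contact: str, expires: int) -> None:
        with self.conn:  # committed before the reply would be sent
            self.conn.execute(
                "INSERT OR REPLACE INTO location VALUES (?, ?, ?)",
                (username, contact, expires),
            )

    def lookup(self, username: str, now: int) -> List[str]:
        rows = self.conn.execute(
            "SELECT contact FROM location WHERE username = ? AND expires > ?",
            (username, now),
        ).fetchall()
        return [contact for (contact,) in rows]


def pick_node(nodes: Sequence[CachelessNode], call_id: str) -> CachelessNode:
    # No node holds private state, so any node can take any request;
    # hashing the Call-ID just keeps retransmissions on the same node.
    return nodes[zlib.crc32(call_id.encode()) % len(nodes)]
```

A binding registered via one node is immediately visible to lookups on
every other node. The flip side, as noted in the reply, is that the
shared database now carries the whole load.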
Well -- it is certainly possible but you actually just push the problem
from SER cluster to a DB cluster, which may bring you other type of
headache.
So, I do think that this cache is one of those optimizations that look
good on paper but in practice miss the point. That, of course, is
just my sixteen øre. And in case anyone cares to know, we are using
Andreas' usrloc-cl in production, and apart from a segfault I introduced
while porting in our changes, it runs very smoothly.
Regards,
Martin
_______________________________________________
Serusers mailing list
Serusers(a)lists.iptel.org
http://lists.iptel.org/mailman/listinfo/serusers
--
Jiri Kuthan
http://iptel.org/~jiri/