Hi Andreas, you are probably one of the people on this list with the most experience with replication. AFAIK, all your statements below are correct. I assume your SERs are in different locations, since the TCP connect timeout is a problem? But I'm not sure how removing the cache would help you, unless you want to move to a cluster or to DB-layer replication?
IMHO, there are only two valid paths for replication in SER: either develop SIP-layer replication with guaranteed delivery, queuing, non-blocking sends, etc. (which ends up being SER-proprietary), or patch SER so that it is better able to handle DB-based replication. I lean towards DB-based replication. Two prominent things must be handled: storing the Path information for proper routing of messages to UAs behind NAT, and a cache that checks the DB if a location is not found in memory.
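The DB-based path can be pictured as a read-through cache: answer from memory when possible, fall back to the shared DB on a miss, and keep the Path value together with each contact. Below is a minimal Python sketch of that idea only, not SER's usrloc code; `Binding`, `LocationCache` and the `db_lookup` callback are hypothetical names, and a dict stands in for the shared location table.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Binding:
    contact: str           # Contact URI taken from the REGISTER
    path: Optional[str]    # Path header value, kept so requests can reach UAs behind NAT


class LocationCache:
    """Hypothetical read-through cache in front of a shared location table.

    `db_lookup` is an assumed callback that queries the DB all SERs write to.
    Expiry and invalidation are deliberately left out of this sketch.
    """

    def __init__(self, db_lookup: Callable[[str], List[Binding]]) -> None:
        self._mem: Dict[str, List[Binding]] = {}
        self._db_lookup = db_lookup

    def save(self, aor: str, bindings: List[Binding]) -> None:
        # REGISTER handled locally: refresh the in-memory copy.
        self._mem[aor] = list(bindings)

    def lookup(self, aor: str) -> List[Binding]:
        cached = self._mem.get(aor)
        if cached:
            return cached
        # Miss: the REGISTER may have landed on another SER, so ask the DB.
        bindings = self._db_lookup(aor)
        if bindings:
            self._mem[aor] = list(bindings)  # remember it for the next lookup
        return bindings


# Usage with a dict standing in for the shared DB table.
shared_db = {
    "sip:alice@example.com": [
        Binding("sip:alice@10.0.0.5:5060", "<sip:proxy2.example.com;lr>")
    ]
}
cache = LocationCache(lambda aor: shared_db.get(aor, []))
print(cache.lookup("sip:alice@example.com")[0].path)  # <sip:proxy2.example.com;lr>
```

In this sketch a stale memory entry still wins over a newer DB row, so a real patch would also need expiry or invalidation of cached bindings.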
I would be very interested in patches for this in the experimental CVS module ;-)
g-)
Andreas Granig wrote:
Hi all,
We use DNS SRV load balancing and forward_tcp() to replicate registrations from one SER to the other SERs in the system.
Now, when one machine crashes completely, all SER processes on all the other machines hang while processing a REGISTER until the TCP connect times out. With 16 child processes per SER, this drives the system load to ~16 per machine, and no other messages can be processed.
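One way to keep a crashed peer from stalling every child process is the "queue, non-blocking" approach mentioned above for SIP-layer replication: the REGISTER handler only enqueues, and a separate sender does the slow send. This is a minimal Python sketch under stated assumptions, not a SER patch: it uses one queue plus one sender thread per peer (SER itself uses forked processes), and `PeerReplicator`, `send` and `maxsize` are hypothetical names.

```python
import queue
import threading
from typing import Callable


class PeerReplicator:
    """Hypothetical per-peer replication queue (one instance per peer SER).

    The process handling the REGISTER only enqueues; a dedicated sender thread
    performs the possibly slow send, so a crashed peer stalls that thread only.
    """

    def __init__(self, send: Callable[[bytes], None], maxsize: int = 1000) -> None:
        self._q: "queue.Queue[bytes]" = queue.Queue(maxsize=maxsize)
        self._send = send        # assumed transport callback, e.g. a TCP send
        self.dropped = 0         # messages rejected because the queue was full
        self.failed = 0          # messages whose send raised an OSError
        threading.Thread(target=self._run, daemon=True).start()

    def replicate(self, msg: bytes) -> bool:
        """Enqueue without blocking; returns False (and counts a drop) if full."""
        try:
            self._q.put_nowait(msg)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def _run(self) -> None:
        while True:
            msg = self._q.get()
            try:
                self._send(msg)  # may hang until the connect times out
            except OSError:
                self.failed += 1  # a real implementation would retry or requeue
            finally:
                self._q.task_done()

    def flush(self) -> None:
        """Wait until every queued message has been handed to `send`."""
        self._q.join()


sent = []
rep = PeerReplicator(sent.append)
rep.replicate(b"REGISTER sip:example.com SIP/2.0")
rep.flush()
print(len(sent))  # 1
```

The trade-off shows up in `dropped` and `failed`: the worker no longer hangs, but anything that overflows the queue or fails to send is lost unless retry logic is added, which is the "guaranteed delivery" part that still remains.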
I understand that replicating over UDP would solve this issue, but then replicated registrations get lost every now and then due to unreliable transmission, and as far as I can tell, t_replicate() can only replicate to *one* other SER.
This really gets me thinking about patching out the internal location cache and looking up every location from the DB, since this additional lookup hardly matters given the ~10 other DB queries per call.
IMHO, in systems with more than two SERs this cache is just a big pain.
Andy
Serusers mailing list serusers@lists.iptel.org http://lists.iptel.org/mailman/listinfo/serusers