Hi Greger. On October 25, 2005, you wrote:
At least one thing is for sure: I have now registered a two-digit number of people who are struggling with and proposing various solutions to load balancing/failover. We really need to find a solution soon, so that all these bright people can spend their resources on tackling problems that will bring SER even further! Such a solution should be a "best practice" that is *good enough*, and I would vote for simplicity. People who really need top performance/hardware are capable of tuning and fixing things themselves (and maybe improving the best practice along the way); all the others need a simple, well-described setup. That is why I have been working with Andreas to try to mirror his efforts in an onsip.org setup that we will document and make available as soon as it is ready for prime time. I suggest that anybody who has opinions or suggestions put them forward so that everything can be taken into consideration.
I was wondering what progress has been made on this front. Obviously the documentation isn't ready since it hasn't been released, but I'd be interested in reading what's been done so far, and contributing my knowledge wherever I can.
Cheers, -- Nick e: nick.hoffman@altcall.com p: +61 7 5591 3588 f: +61 7 5591 6588
If you receive this email by mistake, please notify us and do not make any use of the email. We do not waive any privilege, confidentiality or copyright associated with it.
Well, usrloc-cl is an experimental module, and the latest version of openser has db-mode 3 (the same thing) and a Path implementation, so Andreas' setup is possible. The thing is, this way of doing it makes sense for some setups, but for a small-scale setup (say, 1000 accounts) it is a bit complicated; it also requires MySQL Cluster competence, etc. Maybe it's the best approach, I don't know; people have different preferences. http://www.iptel.org/drupal/failover_redundancy_and_scalability_overview is an attempt at starting to describe the various setups, gather people's input, and maybe reach a consensus on what can be called a best-practice setup. We need some reported experiences and people's input, so please discuss and add to/change the description at the above link. So, your contribution is very welcome. We had an author for the best-practice reference, but that person moved to a company without a focus on SER and redundancy/failover, and was no longer interested in taking on the task. Open spot! ;-) g-)
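[Editor's note: for readers unfamiliar with the parameters mentioned above, here is a minimal, hypothetical openser.cfg fragment showing db-mode 3 and Path support. Module and parameter names are taken from the openser/kamailio 1.x documentation; the db_url, credentials, and host names are placeholders — adapt to your version.]

```
# -- hypothetical fragment: shared-database registrar behind a balancer --
loadmodule "usrloc.so"
loadmodule "registrar.so"
loadmodule "path.so"

# db_mode 3 = DB-only scheme: contacts are read/written straight to the
# database (no in-memory cache), so every proxy sharing this database
# sees the same location records
modparam("usrloc", "db_mode", 3)
modparam("usrloc", "db_url", "mysql://ser:password@dbhost/openser")

# honour Path headers inserted by the edge proxy/load balancer,
# so replies and in-dialog requests are routed back the same way
modparam("registrar", "use_path", 1)
```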
Greger V. Teigre wrote:
> [...] for a small-scale setup (like 1000 accounts), the setup is a bit complicated. It also requires mysql cluster competence etc etc.
Well, our system is designed for 100,000+ users, with the possibility to scale to an arbitrary number... and it works pretty well. An overview of the system is given in chapter 7 of http://linguin.org/thesis.php
It's quite simple. What you need is some basic knowledge of heartbeat (or any other FMS) and of MySQL Cluster.
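[Editor's note: for illustration, a minimal active/passive Heartbeat (v1-style) configuration of the kind referred to here might look like the sketch below. Host names, the interface, and the virtual IP are placeholders, not taken from the thread.]

```
# /etc/ha.d/ha.cf -- hypothetical two-node balancer pair
keepalive 1          # heartbeat interval in seconds
deadtime 10          # declare the peer dead after 10s of silence
bcast eth1           # dedicated heartbeat interface
auto_failback off    # stay on the surviving node after recovery
node lb1
node lb2

# /etc/ha.d/haresources -- lb1 normally owns the virtual IP and ser;
# on failure, lb2 takes over both
# lb1 IPaddr::192.168.0.100 ser
```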
Basic tests with one balancer and two proxies revealed a rate of ~160 call attempts per second (with a quite complex routing config of about 1,000 lines) and ~600 registrations per second, so this should be enough for most of the setups out there...
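[Editor's note: figures like these are typically produced with a SIP traffic generator such as SIPp. A hypothetical invocation against a proxy under test might look as follows; the target address, rate, and limits are placeholders, not the values used in the test above.]

```
# drive ~160 call attempts/second at the balancer's virtual IP using
# SIPp's built-in UAC scenario; -l caps concurrent calls, -m total calls
sipp -sn uac 192.168.0.100:5060 -r 160 -l 500 -m 100000
```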
Andy
Hi Andreas. Don't take me wrong, your setup sounds very robust and well thought-out to me. I just pointed out that for small-scale setups (like an SMB or enterprise installation), one might want a setup involving fewer components, and thus reduce the number of areas where competence is required. Also, my reference to "makes sense for some setups" is not an evaluation, more a reflection on all the various setups people have opted for. With SER/openSER, in my experience, the problem for most users is not a lack of features, but quite the opposite... so some simplifications in the form of best-practice setups would do good. Just like the onsip.org configs. g-)
Greger V. Teigre wrote:
> Don't take me wrong. Your setup sounds very robust and well thought-out to me. I just pointed out that for small-scale setups (like an SMB or enterprise installation), one might want a setup involving fewer components, and thus reduce the number of areas where competence is required.
I won't take you wrong ;o) Just wanted to point out that with today's ser/openser it's absolutely possible to also deploy large-scale systems (as opposed to, say, a year ago)...
Of course it's always a matter of what you currently need, how big you plan to grow, and how much time/money you are willing to invest...
Andy
Andreas Granig writes:
> It's quite simple. What you need is some basic knowledge of heartbeat (or any other FMS) and of MySQL Cluster.
which version of mysql cluster are you using? last time i tested with 4.1.15: after disconnecting the ethernet cable from one of the two database hosts of the cluster, it took more than one minute before the cluster became operational again (now using only the one remaining database host). in my opinion, disconnecting the ethernet cable from one database host should not cause any interruption of the service. i complained about this on the mysql cluster mailing list and they said that this kind of interruption is "normal" for mysql cluster.
-- juha
In the cluster world, a delay during failover is quite normal. One minute seems like a long time, but Oracle takes a minimum of 90 seconds to fail over in cluster mode (and I've seen it take as long as three minutes).
That's standard active/passive behaviour.
What you're looking for is active/active. There are very few things that handle that capability well, and even fewer of them are databases. Oracle Parallel Server does it, but it's the single most annoying install of any software I've ever done in my life. And of course, because it's Oracle, they charge you for a license on both machines simultaneously (since both are running), plus the cluster license... so an OPS install will run you a minimum of half a million US dollars.
While I agree that a minute seems like a long time, it's probably a lot more acceptable than just running one DB and having that server die with no failover... or with only a manual failover.
N.
On Sun, 16 Jul 2006 12:44:28 +0300, Juha Heinanen wrote:
> which version of mysql cluster are you using? [...]
Juha Heinanen wrote:
> which version of mysql cluster are you using? last time i tested with 4.1.15, after disconnecting the ethernet cable from one of the two database hosts of the cluster, it took more than one minute before the cluster became operational again (now using only one database host).
4.1.x was quite faulty, so I started with 5.0.15 and now use 5.0.22. I use two node groups with two nodes each, and each node is connected to the LAN with two NICs in active/backup bonding mode, so disconnecting one cable does no harm.
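[Editor's note: the active/backup bonding described here can be sketched as follows on a 2.6-era Linux kernel; interface names and the address are placeholders.]

```
# /etc/modprobe.conf -- hypothetical active/backup NIC bonding
alias bond0 bonding
# active-backup: only one slave is active at a time; miimon polls link
# state every 100ms and fails over to the backup NIC on carrier loss
options bond0 mode=active-backup miimon=100

# enslave both NICs to bond0 (e.g. from an init script):
#   ifconfig bond0 192.168.0.11 netmask 255.255.255.0 up
#   ifenslave bond0 eth0 eth1
```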
If you shoot one node in this setup, the cluster stays operational. If two nodes from different node groups go down simultaneously (in case of a power failure of one of the two UPSes, for example), it may take up to a minute until the cluster is up again. If two nodes of one node group go down, then the cluster obviously fails.
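[Editor's note: a sketch of a matching MySQL Cluster config.ini. With four data nodes and NoOfReplicas=2, NDB automatically forms two node groups of two nodes each, which yields exactly the failure behaviour described above. Host names are placeholders.]

```
# config.ini -- hypothetical four-node NDB layout
[ndbd default]
NoOfReplicas=2        # two copies of each fragment -> two node groups

[ndb_mgmd]
HostName=mgm1         # management node

[ndbd]
HostName=db1          # node group 0
[ndbd]
HostName=db2          # node group 0
[ndbd]
HostName=db3          # node group 1
[ndbd]
HostName=db4          # node group 1

# two SQL nodes (e.g. one per SER proxy)
[mysqld]
[mysqld]
```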
This setup works fine for me, because you can always take one node at a time down for maintenance without affecting the system.
Andy