I am seeing a strange error on registration.
My usrloc table looks/looked like this:
root@ser-1:~# serctl showdb Online Accounts: +-------------------+------------------------------+---------------------+ | username | contact | expires | +-------------------+------------------------------+---------------------+ | alex@adomain.net | sip:alex@222.11.222.111:5060 | 2004-03-29 00:48:57 | +-------------------+------------------------------+---------------------+
(note the entry has expired a long while ago, and is from the same address at which I'm now trying to register).
When I try and register, I get:
->REGISTER sip:adomain.net <-401 Unauthorized (fine, needs digest) ->REGISTER sip.adomain.net <-500 Internal server error
The interesting component is: "P-Registrar-Error: usrloc_contact_insert failed"
And in the logs I see:
Apr 3 18:31:26 ser-1 ser[1855]: submit_query(): Duplicate entry 'alex-adomain.net-sip:alex@222.11.222.111:5060' for key 1 Apr 3 18:31:26 ser-1 ser[1855]: insert_row(): Error while submitting query Apr 3 18:31:26 ser-1 ser[1855]: db_insert_ucontact(): Error while inserting contact Apr 3 18:31:26 ser-1 ser[1855]: insert_ucontact(): Error while inserting in database Apr 3 18:31:26 ser-1 ser[1855]: insert(): Error while inserting contact Apr 3 18:31:26 ser-1 ser[1855]: contacts(): Error while inserting record
Now I'd suspect a database problem, config problem, or similar, except if I use another user, it works straight away:
root@ser-1:~# serctl showdb Online Accounts: +-------------------+------------------------------+---------------------+ | username | contact | expires | +-------------------+------------------------------+---------------------+ | alex@adomain.net | sip:alex@222.11.222.111:5060 | 2004-03-29 00:48:57 | | joel@adomain.net | sip:joel@222.11.222.111:5060 | 2004-04-03 19:09:43 | +-------------------+------------------------------+---------------------+
It's almost as if ser is not overwriting expired usrloc table entries. I could wipe the entry out the table, but that's hardly the point. It's also as if ser is not cleaning the entry out the table even if it expires.
I am running: root@ser-1:~# ser -V version: ser 0.8.12-tcp_nonb (i386/linux) flags: STATS:Off, USE_IPV6, USE_TCP, DISABLE_NAGLE, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC, F_MALLOC, FAST_LOCK-ADAPTIVE_WAIT ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
UA = X-Lite
Alex
--On 03 April 2004 18:51 +0100 Alex Bligh alex@alex.org.uk wrote:
I am seeing a strange error on registration.
My usrloc table looks/looked like this:
root@ser-1:~# serctl showdb Online Accounts: +-------------------+------------------------------+---------------------+ | username | contact | expires | +-------------------+------------------------------+---------------------+ | alex@adomain.net | sip:alex@222.11.222.111:5060 | 2004-03-29 00:48:57 | +-------------------+------------------------------+---------------------+
(note the entry has expired a long while ago, and is from the same address at which I'm now trying to register).
Oh & one further bit of information: "serctl ul show" does /not/ show this entry. Which I think means it's in the database on disk but not in RAM.
Alex
No, it does not show the entry because it is expired. What is the value of flags field of that line ?
Situation like this can happen when there are some problems connecting the database, I would suggest you to wipe the line out. Let me know if it happens again.
Jan.
On 03-04 21:10, Alex Bligh wrote:
--On 03 April 2004 18:51 +0100 Alex Bligh alex@alex.org.uk wrote:
I am seeing a strange error on registration.
My usrloc table looks/looked like this:
root@ser-1:~# serctl showdb Online Accounts: +-------------------+------------------------------+---------------------+ | username | contact | expires | +-------------------+------------------------------+---------------------+ | alex@adomain.net | sip:alex@222.11.222.111:5060 | 2004-03-29 00:48:57 | +-------------------+------------------------------+---------------------+
(note the entry has expired a long while ago, and is from the same address at which I'm now trying to register).
Oh & one further bit of information: "serctl ul show" does /not/ show this entry. Which I think means it's in the database on disk but not in RAM.
Alex
Serusers mailing list serusers@lists.iptel.org http://lists.iptel.org/mailman/listinfo/serusers
ser does only delete expired contact if they are in the ram, furthermore the new entry can't be inserted into the location table, as (username,contact,domain) are a key, so the db insert fails.
I see 2 reasons: - ser doesn't the location database at startup - ser does delete the expired contact from RAM, but not from the db
Klaus
Alex Bligh wrote:
--On 03 April 2004 18:51 +0100 Alex Bligh alex@alex.org.uk wrote:
I am seeing a strange error on registration.
My usrloc table looks/looked like this:
root@ser-1:~# serctl showdb Online Accounts: +-------------------+------------------------------+---------------------+
| username | contact | expires | +-------------------+------------------------------+---------------------+
| alex@adomain.net | sip:alex@222.11.222.111:5060 | 2004-03-29 00:48:57 | +-------------------+------------------------------+---------------------+
(note the entry has expired a long while ago, and is from the same address at which I'm now trying to register).
Oh & one further bit of information: "serctl ul show" does /not/ show this entry. Which I think means it's in the database on disk but not in RAM.
Alex
Serusers mailing list serusers@lists.iptel.org http://lists.iptel.org/mailman/listinfo/serusers
Jan,
No, it does not show the entry because it is expired. What is the value of flags field of that line ?
flags is set to 0 (like the other entries).
Situation like this can happen when there are some problems connecting the database, I would suggest you to wipe the line out. Let me know if it happens again.
... and ...
--On 04 April 2004 20:47 +0200 Klaus Darilion klaus.mailinglists@pernau.at wrote:
ser does only delete expired contact if they are in the ram, furthermore the new entry can't be inserted into the location table, as (username,contact,domain) are a key, so the db insert fails.
I see 2 reasons:
- ser doesn't the location database at startup
- ser does delete the expired contact from RAM, but not from the db
Indeed - it seems ser has got into a state where the state has been wiped in RAM, but not in the database.
I realise I can fix the symptom just by deleting the appropriate line. However, this won't fix the underlying problem, which I think is a) ser shouldn't get into this state (probably inevitable if ser crashes, and I think it could have got into this state from me deliberately testing failover by being nasty to the boxes)
b) ser does not recover gracefully from this state (requires manual intervention). I /think/ what is happening here (and I haven't done a full code inspection) is ser tries to write into the database, it fails because a line with the same (username, domain, contact) 3-tuple is already there, but then it can as that entry is already expired, it never seems to get purged, and ser gets stuck.
I wonder if the solution is that if a database write fails, see if an entry with the same 3-tuple is already there, if it is, check (manually on expiry date) to see if it has expired, and if so, delete it, and reinsert the original entry.
I'm sort of perplexed as to why this particular long-expired entry hasn't just got deleted through expiry.
Alex
On 04-04 20:13, Alex Bligh wrote:
Jan,
No, it does not show the entry because it is expired. What is the value of flags field of that line ?
flags is set to 0 (like the other entries).
OK.
Situation like this can happen when there are some problems connecting the database, I would suggest you to wipe the line out. Let me know if it happens again.
... and ...
--On 04 April 2004 20:47 +0200 Klaus Darilion klaus.mailinglists@pernau.at wrote:
ser does only delete expired contact if they are in the ram, furthermore the new entry can't be inserted into the location table, as (username,contact,domain) are a key, so the db insert fails.
I see 2 reasons:
- ser doesn't the location database at startup
- ser does delete the expired contact from RAM, but not from the db
Indeed - it seems ser has got into a state where the state has been wiped in RAM, but not in the database.
I realise I can fix the symptom just by deleting the appropriate line. However, this won't fix the underlying problem, which I think is a) ser shouldn't get into this state (probably inevitable if ser crashes, and I think it could have got into this state from me deliberately testing failover by being nasty to the boxes)
No, it should not get into this state. All expired entries should be deleted in the next timer run once they are loaded into the memory. To find out why it does not happen in this case, enable debugging, restart the server and let it run (until the cleanup timer hits). It should print some error messages.
b) ser does not recover gracefully from this state (requires manual intervention). I /think/ what is happening here (and I haven't done a full code inspection) is ser tries to write into the database, it fails because a line with the same (username, domain, contact) 3-tuple is already there, but then it can as that entry is already expired, it never seems to get purged, and ser gets stuck.
No, this should not happen. When you restart the server it loads all the contacts from the database into memory. If an update comes (REGISTER message) it will update the entry in the memory and then it tries to update the entry in the database. The entry in the memory is valid even if the database update fails. That way ser can work even if database connection is broken.
The only situation I can imagine this could happen is when database delete fails. The entry is removed from the cache but it is still present in the database. In such a situation subsequent inserts and updates will fail. But ser should recover after restart because it will reload all entries.
Check if there is any error message during re-loading at startup or during the cleanup timer.
I wonder if the solution is that if a database write fails, see if an entry with the same 3-tuple is already there, if it is, check (manually on expiry date) to see if it has expired, and if so, delete it, and reinsert the original entry.
I'm sort of perplexed as to why this particular long-expired entry hasn't just got deleted through expiry.
Again, enable debugging and check the logs.
Jan.
Jan,
No, it should not get into this state. All expired entries should be deleted in the next timer run once they are loaded into the memory. To find out why it does not happen in this case, enable debugging, restart the server and let it run (until the cleanup timer hits). It should print some error messages.
OK - I have been deliberately avoiding restarting ser in case it clears the condition just through the restart, and I lose some data that would otherwise be useful for debugging. Can you confirm there's nothing else useful I can find? The server hasn't been restarted since this happened.
b) ser does not recover gracefully from this state (requires manual intervention). I /think/ what is happening here (and I haven't done a full code inspection) is ser tries to write into the database, it fails because a line with the same (username, domain, contact) 3-tuple is already there, but then it can as that entry is already expired, it never seems to get purged, and ser gets stuck.
No, this should not happen. When you restart the server it loads all the contacts from the database into memory. If an update comes (REGISTER message) it will update the entry in the memory and then it tries to update the entry in the database. The entry in the memory is valid even if the database update fails. That way ser can work even if database connection is broken.
That (the entry in memory being valid even if the database update fails) /doesn't/ seem to be happening. The database update is failing, and that is causing ser to 500 the REGISTER, and the internal memory copy does not update (or "serctl ul show" would show the new copy). So it seems that there can be at least one database update failure mode where the entry in memory does not update, and ser does not continue to work.
I'm using db_mode 1 if that helps any.
The only situation I can imagine this could happen is when database delete fails. The entry is removed from the cache but it is still present in the database. In such a situation subsequent inserts and updates will fail. But ser should recover after restart because it will reload all entries.
OK I'll try that when we know all the data is there (though I guess I could simulate it by manually inserting a bogus line into the mysql table).
Alex
On 04-04 20:49, Alex Bligh wrote:
Jan,
No, it should not get into this state. All expired entries should be deleted in the next timer run once they are loaded into the memory. To find out why it does not happen in this case, enable debugging, restart the server and let it run (until the cleanup timer hits). It should print some error messages.
OK - I have been deliberately avoiding restarting ser in case it clears the condition just through the restart, and I lose some data that would otherwise be useful for debugging. Can you confirm there's nothing else useful I can find? The server hasn't been restarted since this happened.
I see, please restart the server to see if it disappears. If yes then it is working as expected, if no then there is a bug in usrloc.
No, this should not happen. When you restart the server it loads all the contacts from the database into memory. If an update comes (REGISTER message) it will update the entry in the memory and then it tries to update the entry in the database. The entry in the memory is valid even if the database update fails. That way ser can work even if database connection is broken.
That (the entry in memory being valid even if the database update fails) /doesn't/ seem to be happening. The database update is failing, and that is causing ser to 500 the REGISTER, and the internal memory copy does not update (or "serctl ul show" would show the new copy). So it seems that there can be at least one database update failure mode where the entry in memory does not update, and ser does not continue to work.
I'm using db_mode 1 if that helps any.
OK, I thought that you were using db_mode 2. Mode 2 will not complain about this because it does all writes in background. Mode one is does all reads and writes immediately so if db fails then the request fails.
You are right, in db_mode 1 ser should recover, I will fix this, thanks a lot !
Jan.
Jan,
I see, please restart the server to see if it disappears. If yes then it is working as expected, if no then there is a bug in usrloc.
Yes, it disappeared about 20 seconds after (re)start with:
Apr 5 18:45:54 ser-1 ser[7237]: Binding 'alex@adomain.net','sip:alex@222.11.222.111:5060' has expired
OK, I thought that you were using db_mode 2. Mode 2 will not complain about this because it does all writes in background. Mode one is does all reads and writes immediately so if db fails then the request fails.
You are right, in db_mode 1 ser should recover, I will fix this, thanks a lot !
Thanks. We are using mode 1 to facilitate immediate failover. As you say, it looks like the error from doing the write synchronously is causing the problem.
Alex