I have gathered that RTPEngine has recently, or perhaps some time ago, evolved a recording feature set:
https://kamailio.org/docs/modules/5.0.x/modules/rtpengine.html#rtpengine.f.s...
Does anyone have any experience using it with Kamailio? How does it work? Any gotchas or pitfalls?
It looks like recording is done via a separate daemon specifically for that purpose. Does it emit directly in a playable format, or more or less just dump the raw, RTP-encapsulated frames?
Hi Alex,
I had the opportunity to play around with this recording feature and it works very well. We have it running in production, by the way.
You can find information about how it works and about the recording formats in the README here (https://github.com/sipwise/rtpengine).
Once you have the recording, you need a third-party tool to convert the files to, for instance, WAV format. Which conversion tool you need (ffmpeg, for example) depends on the codec the call was recorded with.
Another thing worth mentioning: for video calls, it records only the audio stream, not the video stream.
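What the conversion step involves depends on the codec. As an illustration only, here is a minimal Python sketch for the simplest case, G.711 μ-law (PCMU, 8 kHz mono), assuming the raw payload bytes of one call direction have already been pulled out of the recording. `ulaw_to_linear` and `pcmu_to_wav` are hypothetical helper names; other codecs need a real decoder such as the ones in ffmpeg.

```python
import struct
import wave


def ulaw_to_linear(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~byte & 0xFF                      # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def pcmu_to_wav(payload: bytes, wav_path: str, rate: int = 8000) -> None:
    """Write raw PCMU payload bytes (mono) as a 16-bit PCM WAV file."""
    samples = [ulaw_to_linear(b) for b in payload]
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)                 # 16-bit signed little-endian PCM
        w.setframerate(rate)
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))
```

With ffmpeg, the equivalent for a raw PCMU file would be along the lines of `ffmpeg -f mulaw -ar 8000 -ac 1 -i call.raw call.wav` (file names are placeholders).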
Regards,
On Thu, Aug 10, 2017 at 11:27 AM, Alex Balashov abalashov@evaristesys.com wrote:
I have gathered that RTPEngine has recently, or perhaps some time ago, evolved a recording feature set:
https://kamailio.org/docs/modules/5.0.x/modules/rtpengine.html#rtpengine.f.start_recording
Does anyone have any experience using it with Kamailio? How does it work? Any gotchas or pitfalls?
It looks like recording is done via a separate daemon specifically for that purpose. Does it emit directly in a playable format, or more or less just dump the raw, RTP-encapsulated frames?
-- Alex Balashov | Principal | Evariste Systems LLC
Tel: +1-706-510-6800 / +1-800-250-5920 (toll-free) Web: http://www.evaristesys.com/, http://www.csrpswitch.com/
Kamailio (SER) - Users Mailing List sr-users@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
On Thu, Aug 10, 2017 at 11:40:28AM -0400, Alberto Llamas wrote:
Once you have the recording, you need a third-party tool to convert the files to, for instance, WAV format. Which conversion tool you need (ffmpeg, for example) depends on the codec the call was recorded with.
Thank you for your feedback, Alberto.
Extracting the audio payload from RTP frames and mixing them together with 'sox' is not an especially difficult task, and it works for 90% of simple calls. The problem with this approach to call recording is that certain calls have non-straightforward RTP timing (packet loss, reordering, silence suppression, SSRC or timestamp changes mid-call), which produces unintelligible garbage if the audio frames are simply interleaved. This is what sets the sophisticated call recording systems apart: they are RTP-aware, and emit playable recordings in a way that takes these nuances into account. RTP can get quite complex.
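To make the difference concrete, here is a minimal Python sketch of the RTP-aware alternative to plain interleaving for a single direction: each payload is placed at its RTP timestamp offset (RFC 3550 header layout) instead of in arrival order, and gaps are filled with silence. It assumes G.711 at 8 kHz (one byte per sample per timestamp tick), a single SSRC and no timestamp jumps; `parse_rtp`, `ts_delta` and `rebuild_stream` are hypothetical names, and this is not how rtpengine's recording daemon works internally.

```python
import struct

RTP_FIXED = struct.Struct("!BBHII")  # V/P/X/CC, M/PT, sequence, timestamp, SSRC


def parse_rtp(packet: bytes):
    """Split one RTP packet into (payload_type, seq, timestamp, ssrc, payload)."""
    if len(packet) < RTP_FIXED.size:
        raise ValueError("shorter than the 12-byte RTP fixed header")
    b0, b1, seq, ts, ssrc = RTP_FIXED.unpack_from(packet)
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    offset = RTP_FIXED.size + 4 * (b0 & 0x0F)   # skip the CSRC list
    if b0 & 0x10:                                # X bit: header extension follows
        _profile, words = struct.unpack_from("!HH", packet, offset)
        offset += 4 + 4 * words
    end = len(packet)
    if b0 & 0x20:                                # P bit: last octet = padding length
        end -= packet[-1]
    return b1 & 0x7F, seq, ts, ssrc, packet[offset:end]


def ts_delta(a: int, b: int) -> int:
    """Signed difference a - b of two 32-bit RTP timestamps, tolerating wraparound."""
    d = (a - b) & 0xFFFFFFFF
    return d - (1 << 32) if d >= (1 << 31) else d


def rebuild_stream(packets, silence: int = 0xFF) -> bytes:
    """Place each payload at its timestamp offset; 0xFF is mu-law silence."""
    parsed = [parse_rtp(p) for p in packets]
    if not parsed:
        return b""
    ref = parsed[0][2]
    offsets = [ts_delta(p[2], ref) for p in parsed]
    base = min(offsets)                  # the earliest packet may have arrived late
    total = max(o - base + len(p[4]) for o, p in zip(offsets, parsed))
    out = bytearray([silence]) * total   # gaps (lost packets) stay silent
    for o, p in zip(offsets, parsed):
        start = o - base
        out[start:start + len(p[4])] = p[4]   # duplicates simply overwrite
    return bytes(out)
```

Concatenating payloads in arrival order, by contrast, would play reordered packets out of sequence and silently drop the time occupied by lost packets, shortening the audio. Mixing the two legs adds a further alignment problem (each leg has its own timestamp base) that this sketch does not address.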
-- Alex
On 10/08/17 11:40 AM, Alberto Llamas wrote:
Once you have the recording, you need a third-party tool to convert the files to, for instance, WAV format. Which conversion tool you need (ffmpeg, for example) depends on the codec the call was recorded with.
There are actually two distinct recording methods implemented. The one you're describing simply outputs pcap files containing the raw packets involved in the call. The other method involves the aforementioned external recording daemon, which is able to output recordings in a variety of formats. Currently, audio output to WAV or MP3 files is supported. Video is also not yet supported (but will be in the future).
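With the pcap method the output is an ordinary capture file, so standard tools should be able to open it. For scripted post-processing, a minimal stdlib-only Python sketch of iterating the records of a classic libpcap file might look like this. `read_pcap` is a hypothetical name; it does not handle pcapng, and it does not strip the link-layer/IP/UDP headers to reach the RTP packet, because those depend on the link type the file was written with (not stated in this thread).

```python
import struct

_LE_MAGICS = (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1")  # microsecond, nanosecond
_BE_MAGICS = (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d")
_NANO_MAGICS = (b"\x4d\x3c\xb2\xa1", b"\xa1\xb2\x3c\x4d")


def read_pcap(path):
    """Yield (timestamp_seconds, linktype, packet_bytes) from a classic pcap file."""
    with open(path, "rb") as f:
        header = f.read(24)
        if len(header) < 24:
            raise ValueError("truncated pcap global header")
        magic = header[:4]
        if magic in _LE_MAGICS:
            endian = "<"
        elif magic in _BE_MAGICS:
            endian = ">"
        else:
            raise ValueError("not a classic pcap file (pcapng is not handled here)")
        divisor = 1e9 if magic in _NANO_MAGICS else 1e6
        linktype = struct.unpack(endian + "I", header[20:24])[0]
        record = struct.Struct(endian + "IIII")  # ts_sec, ts_frac, incl_len, orig_len
        while True:
            rh = f.read(record.size)
            if len(rh) < record.size:
                return                          # clean end of file
            ts_sec, ts_frac, incl_len, _orig_len = record.unpack(rh)
            data = f.read(incl_len)
            if len(data) < incl_len:
                return                          # truncated final record
            yield ts_sec + ts_frac / divisor, linktype, data
```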
Cheers
On Thu, Aug 10, 2017 at 12:22:32PM -0400, Richard Fuchs wrote:
There are actually two distinct recording methods implemented. The one you're describing simply outputs pcap files containing the raw packets involved in the call. The other method involves the aforementioned external recording daemon, which is able to output recordings in a variety of formats. Currently, audio output to WAV or MP3 files is supported. Video is also not yet supported (but will be in the future).
I see. Thank you for that insight. Where is this distinction, and the modalities of the latter option, addressed in the documentation?
On 10/08/17 12:23 PM, Alex Balashov wrote:
I see. Thank you for that insight. Where is this distinction, and the modalities of the latter option, addressed in the documentation?
The relevant config option is called `recording-method` (`pcap` or `proc`).
The recording daemon itself is not documented yet, as this is a fairly new feature and there might still be bugs lurking, so use it at your own risk.
Cheers
I have succeeded in prototyping a recording setup using the 'proc' method.
However, I've got one issue I can't seem to figure out. On inbound calls only, the inbound (caller) leg on the PSTN side seems to show up interleaved/stuttered in the recording, and also slowed down considerably. This does not happen on outbound calls to the same endpoint, only inbound. The anomalies do not manifest when playing back the audio in Wireshark. The anomalies persist regardless of whether WAV or MP3 output is used.
RTPEngine and the recording daemon are running on a KVM VM. I haven't tried to reproduce this on bare metal or with another virtualization technology. The build uses ffmpeg libraries from rpmfusion, for the reasons discussed here:
https://github.com/sipwise/rtpengine/issues/372
-- Alex
Also, the "proc" recording method has an interesting pipeline, involving an ephemeral metadata file and a memory sink exposed through /proc, and a userspace daemon picking up the data and writing it to disk (if this doesn't happen, audio frames going into the sink are discarded).
This pipeline leads to two or three philosophical questions:
1. Is the purpose of this design to allow for writing audio frames in a scatter-gather fashion, so that audio is written serially to storage?
Conventional recording solutions suffer from the problem that, under high volume, a large number of files are written in parallel. This thrashes the disk. In the days of mechanical disks, this led to frequent disk burn-out. In the SSD era the situation is somewhat improved, but as I understand it, this is still very taxing on the SSD in terms of write wear and wear levelling.
This usually leads to a solution like writing recordings to a tmpfs area and then serially copying them out of there, one at a time.
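The tmpfs workaround described above might look roughly like this in Python: recordings are written to a RAM-backed spool directory, and a single mover transfers finished files to persistent storage one at a time, so the disk sees one sequential write stream instead of many interleaved ones. The directory layout and the convention that in-progress files carry a different suffix than finished ones (here a hypothetical `.part` vs `.wav`) are assumptions; `drain_spool` is a hypothetical name.

```python
import os
import shutil


def drain_spool(spool_dir: str, archive_dir: str, suffix: str = ".wav") -> list:
    """Move finished recordings from a RAM-backed spool to disk, one file at a time."""
    os.makedirs(archive_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(spool_dir)):
        if not name.endswith(suffix):
            continue  # assumption: files still being written use another suffix
        # Across filesystems (tmpfs -> disk), shutil.move copies and then deletes,
        # so only one file is being written to persistent storage at any moment.
        shutil.move(os.path.join(spool_dir, name), os.path.join(archive_dir, name))
        moved.append(name)
    return moved
```

In practice this would run periodically from a single process; running several movers in parallel would reintroduce the interleaved writes it is meant to avoid.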
Would I be correct to assume that this pipeline is designed to address this same problem, but in a different and novel manner?
2. Is the other purpose of an RTP sink to allow the possibility of real-time call intercept and diversion to live playback? If so, are there any plans for the recording daemon to expose an RTSP interface or similar to make this easier?
3. What happens if, under high load and I/O wait conditions, the userspace recording daemon cannot read frames from the sink fast enough, or the CPU encoding workload (-> WAV/MP3) is too high?
According to the documentation, the depth of the sink is only 10 frames:
Packet data is held in kernel memory until retrieved by the userspace component, but only a limited number of packets (default 10) per media stream. If packets are not retrieved in time, they will be simply discarded. This makes it possible to flag all calls to be recorded and then leave it to the userspace component to decide whether to use the packet data for any purpose or not.
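The quoted behaviour can be modelled in a few lines to get a feel for how much slack a depth of 10 gives. This toy Python model assumes that the oldest packet is evicted when the buffer is full (the real kernel module's discard policy is not stated here) and a 20 ms packetization time; under those assumptions a depth of 10 means the reader must come back at least every 10 × 20 ms = 200 ms per stream. `StreamSink` and `simulate` are hypothetical names.

```python
from collections import deque


class StreamSink:
    """Toy model of a per-stream packet buffer with a fixed depth.

    Assumption: when full, the oldest packet is evicted (deque maxlen behaviour).
    """

    def __init__(self, depth: int = 10):
        self.buf = deque(maxlen=depth)
        self.dropped = 0

    def push(self, packet) -> None:
        if len(self.buf) == self.buf.maxlen:
            self.dropped += 1            # the append below evicts the oldest entry
        self.buf.append(packet)

    def drain(self) -> list:
        """What a userspace reader would retrieve in one pass."""
        packets = list(self.buf)
        self.buf.clear()
        return packets


def simulate(drain_every_ms: int, depth: int = 10, ptime_ms: int = 20,
             duration_ms: int = 2000) -> int:
    """Return how many packets are lost if the reader drains only every drain_every_ms."""
    if drain_every_ms % ptime_ms:
        raise ValueError("drain interval must be a multiple of the packetization time")
    sink = StreamSink(depth)
    for t in range(0, duration_ms, ptime_ms):
        sink.push(t)                     # one packet arrives every ptime_ms
        if t % drain_every_ms == 0:
            sink.drain()
    return sink.dropped
```

For example, `simulate(200)` loses nothing, while `simulate(400)` loses packets because twice the buffer depth arrives between reads.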
Is it possible to increase that depth? Or is this not a concern because the userspace component is implemented in an asynchronous/threaded manner, so frames are retrieved quickly for processing and then enqueued into a local blocking queue?
-- Alex
On 12/08/17 03:49 PM, Alex Balashov wrote:
- Is the purpose of this design to allow for writing audio frames in a scatter-gather fashion, so that audio is written serially to storage?
The main idea behind this approach was that it makes it feasible, with only minimal overhead, to enable recording for all calls and then let the recording daemon decide which calls to actually record and in what way. If the recording daemon is not interested in the call at all, having recording enabled anyway has only a negligible impact on overall performance. And if the recording daemon is interested, it has a variety of options for dealing with the data: write it to pcap, decode it to playable audio, send it off to an external system, etc.
It should be noted that the included recording daemon doesn't support all of these options (yet?) and should only be seen as a reference implementation. The interface should be sufficiently simple that other recording solutions can be implemented on top of it.
- Is the other purpose of an RTP sink to allow the possibility of real-time call intercept and diversion to live playback? If so, are there any plans for the recording daemon to expose an RTSP interface or similar to make this easier?
No plans from our side so far, but as I said above, it should be fairly simple to create an implementation for such a mechanism.
- What happens if, under high load and I/O wait conditions, the userspace recording daemon cannot read frames from the sink fast enough, or the CPU encoding workload (-> WAV/MP3) is too high?
You would start to lose network frames (RTP packet data).
According to the documentation, the depth of the sink is only 10 frames:
Packet data is held in kernel memory until retrieved by the userspace component, but only a limited number of packets (default 10) per media stream. If packets are not retrieved in time, they will be simply discarded. This makes it possible to flag all calls to be recorded and then leave it to the userspace component to decide whether to use the packet data for any purpose or not.
Is it possible to increase that depth? Or is this not a concern because the userspace component is implemented in an asynchronous/threaded manner, so frames are retrieved quickly for processing and then enqueued into a local blocking queue?
The kernel module supports changing this limit on a per-stream basis when recording is enabled (however, such a switch isn't present in the rtpengine main daemon as of yet), as well as changing the default value of 10 as a module option that can be set when the module is loaded (e.g. through modprobe.d config options).
The included recording daemon is multi-threaded, but the possibility that it cannot keep up with the workload exists nevertheless. If you simply queue up everything until it can be processed while there isn't enough processing (or I/O) capacity available, you will eventually run out of memory.
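The trade-off described above (a local queue only moves the problem unless it is bounded) can be sketched in Python with `queue.Queue` and a `maxsize`: a fast reader hands items off without ever blocking, worker threads do the slow encoding, and when the local queue is full the item is dropped and counted rather than buffered indefinitely. `BoundedHandoff` and `start_workers` are hypothetical names; this is not taken from the included recording daemon.

```python
import queue
import threading


class BoundedHandoff:
    """Hand packets from a fast reader to slower workers with bounded memory.

    offer() never blocks: if the local queue is full, the item is dropped and
    counted, trading data loss for a hard cap on memory use.
    """

    def __init__(self, maxsize: int):
        self.q = queue.Queue(maxsize=maxsize)
        self.dropped = 0
        self._lock = threading.Lock()

    def offer(self, item) -> bool:
        try:
            self.q.put_nowait(item)
            return True
        except queue.Full:
            with self._lock:
                self.dropped += 1
            return False


def start_workers(handoff: BoundedHandoff, n: int, process):
    """Start n threads that call process(item) until each receives a None sentinel."""
    def worker():
        while True:
            item = handoff.q.get()
            try:
                if item is None:         # sentinel: stop this worker
                    return
                process(item)
            finally:
                handoff.q.task_done()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(n)]
    for t in threads:
        t.start()
    return threads
```

Whether to drop new items (as here) or evict old ones is a policy choice; either way, the point is that some data is lost once processing capacity is exceeded.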
Cheers