Hi all,
We ran into an issue with dialplan in regex mode when we leave the repl_exp empty. Consider this scenario:
if(dp_translate("1", "$rU/$var(out)") { # translation successful $rU = $var(out); }
If the dialplan entry has match_exp="^999" and subst_exp="^999" and repl_exp="" and my $rU is "999123", I'd expect $var(out) to be "123", because an empty replacement part is perfectly valid for a regex. The result however is that $var(out) is untouched, so whatever $var(out) was before the function call, will end up in $rU in this case. That's not what I'd expect if dp_translate returns success.
It should be fairly easy to fix and we can provide a patch for that, I'd just want to get some feedback if you agree on changing dp_translate behavior in such a case, as it might break existing scripts if you rely on the current way of working.
Comments?
Andreas
Andreas Granig writes:
If the dialplan entry has match_exp="^999" and subst_exp="^999" and repl_exp="" and my $rU is "999123", I'd expect $var(out) to be "123", because an empty replacement part is perfectly valid for a regex.
repl_exp is not regex. it is replacement expression where you can refer to () parts of subt_exp as \1, \2, etc. if repl_exp is empty, result is empty.
- juha
On 05/07/12 14:11, Juha Heinanen wrote:
Andreas Granig writes:
If the dialplan entry has match_exp="^999" and subst_exp="^999" and repl_exp="" and my $rU is "999123", I'd expect $var(out) to be "123", because an empty replacement part is perfectly valid for a regex.
repl_exp is not regex. it is replacement expression where you can refer to () parts of subt_exp as \1, \2, etc. if repl_exp is empty, result is empty.
But why are parts NOT matched by the subst_exp stripped from the output string? With a subst_exp of "(.)" and a repl_exp of "\1", I'd expect the output string to be unchanged.
BR Richard
Richard Fuchs writes:
But why are parts NOT matched by the subst_exp stripped from the output string? With a subst_exp of "(.)" and a repl_exp of "\1", I'd expect the output string to be unchanged.
if you have subst_exp (.*) and repl_exp \1, then you will get the whole thing as result.
-- juha
On 05/07/12 14:28, Juha Heinanen wrote:
Richard Fuchs writes:
But why are parts NOT matched by the subst_exp stripped from the output string? With a subst_exp of "(.)" and a repl_exp of "\1", I'd expect the output string to be unchanged.
if you have subst_exp (.*) and repl_exp \1, then you will get the whole thing as result.
I know that, but that doesn't answer my question. :) Regex substitutions work the same everywhere, s/(.)/\1/ in sed or Perl for example leaves the string unchanged. Why is the dialplan module different?
BR Richard
Richard Fuchs writes:
I know that, but that doesn't answer my question. :) Regex substitutions work the same everywhere, s/(.)/\1/ in sed or Perl for example leaves the string unchanged. Why is the dialplan module different?
are you sure that (.) does the trick? i have used (.*) and that works ok.
-- juha
Hello,
On 5/7/12 8:57 PM, Juha Heinanen wrote:
Richard Fuchs writes:
I know that, but that doesn't answer my question. :) Regex substitutions work the same everywhere, s/(.)/\1/ in sed or Perl for example leaves the string unchanged. Why is the dialplan module different?
are you sure that (.) does the trick? i have used (.*) and that works ok.
indeed the .* is at least the posix standard way to match everything, '.' being for matching one single character.
Replacements in configuration file/dialplan do not use external library for substitution, only for matching (posix regexp for core/textops which is in libc and libpcre for dialplan). The replacement itself is made via a function from the core (iirc, Andrei Pelinescu-Onciul implemented it in the very early days of ser).
I am not that familiar with perl/sed and their full substitution rules, but in Kamailio, practically the subst_exp is supposed to break the matched value in tokens and then back-references in repl_exp can be used to build the new value.
Maybe I got used to this kind of model, to group parts of matches values and everything went fine for me.
For what Andreas exemplified in the first email in this thread, I would have used:
subst="^999(.*)" repl="\1"
I would consider a bug if there is no way to remove the full value (i.e., set the result to empty string), like no change will happen with: subst="(.*)" repl=""
Personally, I would not mind an update to get to a more common behaviour, if it will be properly documented and referenced to other well established languages/tutorials. So far the term was 'perl-like substitutions' not 'perl substitutions' more for syntax/idea. Also, we have our specific behaviour, the repl expression can include cfg variables (e.g., $avp(...), $var(...), ...) that are expanded when building the result.
However, I think it is too late for 3.3.0, because it will introduce lot of changes, perhaps many in the behaviour as well, and we are already 2 weeks in the testing phase.
Cheers, Daniel
On 05/07/12 16:47, Daniel-Constantin Mierla wrote:
Hello,
On 5/7/12 8:57 PM, Juha Heinanen wrote:
Richard Fuchs writes:
I know that, but that doesn't answer my question. :) Regex substitutions work the same everywhere, s/(.)/\1/ in sed or Perl for example leaves the string unchanged. Why is the dialplan module different?
are you sure that (.) does the trick? i have used (.*) and that works ok.
indeed the .* is at least the posix standard way to match everything, '.' being for matching one single character.
Yes, with the point here being that a PCRE substitution commonly only works on the part of the string that the RE matched. Other parts would be left untouched. For example, you could use an RE of "^." (or even just ".") and a replacement string of nothing to simply strip away the first character.
tabasco:~# echo foobar | sed -e 's/^.//' oobar tabasco:~#
I would consider a bug if there is no way to remove the full value (i.e., set the result to empty string), like no change will happen with: subst="(.*)" repl=""
This would still be valid and would do the same thing.
Personally, I would not mind an update to get to a more common behaviour, if it will be properly documented and referenced to other well established languages/tutorials. So far the term was 'perl-like substitutions' not 'perl substitutions' more for syntax/idea.
Of course, but like I said, so far all Perl-like RE engines I've come across (PHP is another example with its preg_replace(), Python has re.sub(), etc) work by only replacing the matched part of the string. It's a search-and-replace operation, not a break-down-and-create-new-string operation.
Also, we have our specific behaviour, the repl expression can include cfg variables (e.g., $avp(...), $var(...), ...) that are expanded when building the result.
However, I think it is too late for 3.3.0, because it will introduce lot of changes, perhaps many in the behaviour as well, and we are already 2 weeks in the testing phase.
Understood. I've originally thought it was simply a bug because it didn't do what I'd expected, but I guess it's an old design decision. Thanks for clarification. Maybe for 3.4 then :)
BR Richard
Richard Fuchs writes:
Understood. I've originally thought it was simply a bug because it didn't do what I'd expected, but I guess it's an old design decision. Thanks for clarification. Maybe for 3.4 then :)
i don't see any problem with this design decision because all results are obtainable with it. it does not matter to me if it is compatible with sed/perl or whatever. if it matters to you, for 3.4 you are welcome to introduce another mode of repl_exp behavior as long as the current one remains the default.
-- juha
Hello,
Following up on this, now that master is open for changes again, here's a patch of how I'd suggest to change the behavior of how dialplan does regular expression substitution.
As a short summary, this patch is trying to achieve the following:
Up until now, the dialplan module performs RE substitution in a "break down string and construct new string" fashion, in other words RE sub-patterns are used to extract certain pieces from the original string, and then a new string is built from those pieces to replace the original string. Side effect of this is that parts of the string that the RE didn't match are discarded. While this works to perform all necessary operations for string rewriting, there's two drawbacks to this approach:
1) It's counter-intuitive to anybody who's ever used RE substitution in any other tool or language. Be it in VIM, SED, BASH, Perl, PHP, Python, you name it, they all do RE substitution as a "search and replace" operation. Whatever is matched by the RE is replaced by the replacement pattern, but anything not matched by the RE is left untouched. The dialplan module however would discard those non-matched parts.
2) There's a slight performance impact in many use cases, namely in those that only perform action on a known prefix or suffix. Most commonly, the user wants to match against a certain prefix or suffix and then either strip it out or replace it with something else, while leaving the rest of the string alone, or alternatively simply prepend or append something to the string. As it is now, the RE has to be constructed so that it always matches the whole string, with the parts of the string that are to be left alone captured in a sub-pattern. With this patch, the RE can be constructed so that it only matches the part of the string that you're interested in, meaning the PCRE engine has less work to do.
(More performance savings are possible by unifying match_exp and subst_exp, as it doesn't seem to make sense to first match on one part of the string and then perform substitution on some other part. There might be some use cases where it comes in useful, but I'd say the benefits of doing only one RE match vs. two outweighs that. But that would be another patch anyway.)
Here's a few examples of common substitution patterns (subst_exp and repl_exp pairs) and how they can be simplified with this patch:
Stripping prefix: old: "^00(.*)" -> "\1" new: "^00" -> ""
Replacing prefix: old: "^+(.*)" -> "00\1" new: "^+" -> "00"
Prepending new prefix: old: "(.*)" -> "00\1" new: "^" -> "00"
It should be noted that all "old" patterns from those examples will continue to work as they did before even with the patch applied. More generally, all patterns that are designed to always match the complete string will continue to work unchanged, providing backwards compatibility. Only patterns that make deliberate use of the side-effect of stripping out unmatched parts of the string will break, but I don't think there's a whole lot of those out there in the wild. However, there's always the possibility of adding a new module option to act as a new behavior vs. old behavior switch if desired.
As for the patch itself, it looks like it's more than it actually is, because it mostly moves things around. If you apply it and look at it with diff -b, it gives you a clearer view of what's changed.
Comments welcome.
cheers Richard
On 05/07/12 16:47, Daniel-Constantin Mierla wrote:
Hello,
On 5/7/12 8:57 PM, Juha Heinanen wrote:
Richard Fuchs writes:
I know that, but that doesn't answer my question. :) Regex substitutions work the same everywhere, s/(.)/\1/ in sed or Perl for example leaves the string unchanged. Why is the dialplan module different?
are you sure that (.) does the trick? i have used (.*) and that works ok.
indeed the .* is at least the posix standard way to match everything, '.' being for matching one single character.
Replacements in configuration file/dialplan do not use external library for substitution, only for matching (posix regexp for core/textops which is in libc and libpcre for dialplan). The replacement itself is made via a function from the core (iirc, Andrei Pelinescu-Onciul implemented it in the very early days of ser).
I am not that familiar with perl/sed and their full substitution rules, but in Kamailio, practically the subst_exp is supposed to break the matched value in tokens and then back-references in repl_exp can be used to build the new value.
Maybe I got used to this kind of model, to group parts of matches values and everything went fine for me.
For what Andreas exemplified in the first email in this thread, I would have used:
subst="^999(.*)" repl="\1"
I would consider a bug if there is no way to remove the full value (i.e., set the result to empty string), like no change will happen with: subst="(.*)" repl=""
Personally, I would not mind an update to get to a more common behaviour, if it will be properly documented and referenced to other well established languages/tutorials. So far the term was 'perl-like substitutions' not 'perl substitutions' more for syntax/idea. Also, we have our specific behaviour, the repl expression can include cfg variables (e.g., $avp(...), $var(...), ...) that are expanded when building the result.
However, I think it is too late for 3.3.0, because it will introduce lot of changes, perhaps many in the behaviour as well, and we are already 2 weeks in the testing phase.
Cheers, Daniel
Richard Fuchs writes:
It should be noted that all "old" patterns from those examples will continue to work as they did before even with the patch applied. More generally, all patterns that are designed to always match the complete string will continue to work unchanged, providing backwards compatibility. Only patterns that make deliberate use of the side-effect of stripping out unmatched parts of the string will break, but I don't think there's a whole lot of those out there in the wild.
i just checked my patterns and found many, for example:
subst_exp repl_exp ^+358(2|3|5|6|8|9) +358\1
this particular example will extract from e.164 number its certain kind country code/area code prefix.
i'm thus strongly against removing the current behavior. i suggest you write a new function to dialplan module that does what you want. adding module parameters just complicates the code and makes supporting it more difficult.
-- juha