Regex replace for MySQL

Hi All,

Recently I came across a requirement where I had to cleanse the data that I’m inserting in to MySQL database using regular expressions, so I started searching for a function to do this on MySQL, but unfortunately I couldn’t find any. There was one called UDB but that’s also you need to install a module and stuff like that. I gave up searching finally and wrote my own “regex_replace” MySQL function.

Below is the source code of my function;

DELIMITER $$
CREATE FUNCTION  `regex_replace`(pattern VARCHAR(1000),replacement VARCHAR(1000),original VARCHAR(1000))

RETURNS VARCHAR(1000)
DETERMINISTIC
BEGIN 
 DECLARE temp VARCHAR(1000);
 DECLARE ch VARCHAR(1);
 DECLARE i INT;
 SET i = 1;
 SET temp = '';
 IF original REGEXP pattern THEN 
  loop_label: LOOP 
   IF i>CHAR_LENGTH(original) THEN
    LEAVE loop_label;
   END IF;
   SET ch = SUBSTRING(original,i,1);
   IF NOT ch REGEXP pattern THEN
    SET temp = CONCAT(temp,ch);
   ELSE
    SET temp = CONCAT(temp,replacement);
   END IF;
   SET i=i+1;
  END LOOP;
 ELSE
  SET temp = original;
 END IF;
 RETURN temp;
END$$
DELIMITER ;

Note:

If you are using MySQL version 5.0.1 or higher, make sure you set the NO_BACKSLASH_ESCAPES mode ON, before you use the above function to replace any characters which are escaped with back slash “\”, ie: \A,\B,etc… See how to set the NO_BACKSLASH_ESCAPES mode here

Example on how to use this function
mysql> select regex_replace('[^a-zA-Z0-9\-]','','2my test3_text-to. check \\ my- sql (regular) ,expressions ._,');

Happy Coding!! :)

Advertisement

33 Responses to “Regex replace for MySQL”

  1. Thank you. That was a good one.

  2. bitseeker Says:

    when using:
    SELECT regex_replace(‘[^a-zA-Z0-9]‘, ”, “%$&?/’|test><+-,][)(" )
    then the result is:
    $?'test<-])

    seems that not all characters are being replace ;o)

    • Your regex is only matching a single character. Try this:

      SELECT regex_replace(‘[^a-zA-Z0-9]+‘, ”, “%$&?/’|test><+-,][)(" )

      (Note the "+")

  3. Hi bitseeker,

    You are right !! I don’t know why it’s doing that, any way I fixed the problem with the following;

    FUNCTION `regex_replace`(pattern varchar(1000),replacement varchar(1000),original varchar(1000))
    RETURNS varchar(1000)
    DETERMINISTIC
    BEGIN
    DECLARE temp VARCHAR(1000);
    DECLARE ch VARCHAR(1);
    DECLARE i INT;
    SET i = 1;
    SET temp =”;
    IF original REGEXP pattern THEN
    loop_label: LOOP
    IF i>CHAR_LENGTH(original) THEN
    LEAVE loop_label;
    END IF;
    SET ch = SUBSTRING(original,i,1);
    IF NOT ch REGEXP pattern THEN
    SET temp = CONCAT(temp,ch);
    ELSE
    SET temp = CONCAT(temp,replacement);
    END IF;
    SET i=i+1;
    END LOOP;
    END IF;
    RETURN temp;
    END$$

    Thanks for your comment :)

  4. replacement – don’t work

    • Hi Misha,

      What is the problem you face? are you saying the above modification doesn’t replace the character that you have specified in “replacement” parameter?

      Cheers

  5. This function regular replace by chars

    FUNCTION regex_replace_char(pattern VARCHAR(255), replacement VARCHAR(255), subject VARCHAR(1000))
    RETURNS varchar(1000)
    BEGIN
    # ver 0.1
    # autor: Bunakov Mykhaylo

    DECLARE i INT;

    SET i = 1;
    SET @temp = `subject`;
    SET @len_prn = CHAR_LENGTH(pattern);

    WHILE i <= @len_prn DO
    SET @ch_ptn = SUBSTRING(pattern,i,1);
    SET @ch_rpl = SUBSTRING(replacement,i,1);

    SET @temp = REPLACE(@temp, @ch_ptn, @ch_rpl);

    SET i=i+1;
    END WHILE;

    RETURN @temp;
    END

  6. I modified your function, now it works and replace by pattern:

    FUNCTION regex_replace(pattern varchar(1000), replacer varchar(1000), original varchar(1000))
    RETURNS varchar(1000)
    DETERMINISTIC
    BEGIN
    DECLARE temp VARCHAR(1000);
    DECLARE ch VARCHAR(1);
    DECLARE i INT;
    SET i = 1;
    SET temp =”;
    IF original REGEXP pattern THEN
    loop_label: LOOP
    IF i>CHAR_LENGTH(original) THEN
    LEAVE loop_label;
    END IF;
    SET ch = SUBSTRING(original,i,1);
    IF NOT ch REGEXP pattern THEN
    SET temp = CONCAT(temp, ch);
    ELSE
    SET temp = CONCAT(temp,replacer);
    END IF;
    SET i=i+1;
    END LOOP;
    ELSE
    SET temp = original;
    END IF;
    RETURN temp;
    END

    • Hi Alex,

      Great work, I missed that bit. I was mainly focusing on removing the characters somehow I got carried away from replacing the unwanted characters, but you spotted it, that’s great. I’ll update the main post with the change.

      Thanks :)

  7. Hi,
    Can you please elaborate on how to create this function in the mysql monitor. mysql> create function… gives all kinds of errors, the first one being: “No database selected”?

    • Hi Jimmy,

      I haven’t tried the MySQL Monitor to create any functions, but I wonder why it would be difficult to create a function any way.
      Could you tell you what you tried to do ? or try with SQLyog

  8. Hi, I always get following error when trying to create your function:
    #1064 – You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ” at line 5

  9. When there’s nothing to replace is returing an empty string. I’ve fixed it to return the original.

    DROP FUNCTION IF EXISTS `tdoperr0sv1db`.`regex_replace`;
    CREATE FUNCTION `tdoperr0sv1db`.`regex_replace`(pattern varchar(1000),replacement varchar(1000),original varchar(1000))
    RETURNS varchar(1000)
    DETERMINISTIC
    BEGIN
    declare temp varchar(1000);
    declare ch varchar(1);
    declare i int;

    set i = 1;

    set temp = “”;
    if original REGEXP pattern then
    loop_label: LOOP
    if i>char_length(original) then
    LEAVE loop_label;
    end if;
    set ch = substring(original,i,1);
    if NOT ch REGEXP pattern then
    set temp = concat(temp,ch);

    else
    set temp = concat(temp,replacement);

    end if;
    set i=i+1;
    END LOOP;
    else
    return original;
    end if;
    return temp;

    END$$

  10. I do always get this error when trying to regex_replace (with your example in the original post):

    Unknown column ‘”’ in ‘field list’

    • Got it – I copy/pasted your example code and the

      set temp = ”

      was making some problems. I replaced the quotation marks with normal ones.

  11. When I do
    SELECT regex_replace(‘/A/’, ”, `firstname`) FROM `contacts`
    I have the following error:
    #1054 – Unknown column ‘â€

    Seems i’m mussing somethign, buyt what? I’m using utf8.

    • I suppose, it’s the same problem I had. If you copy/paste the code above be sure to replace the fancy quotation marks with normal ones and redeclare the function.

      (Drop it before with DROP FUNCTION regex_replace)

      • Yes Simon, you are correct. Kissifrot, make sure you replace the ” (double quotation) with ” (two single quotations) in the line set temp = ”

  12. I used two single quotes (ASCII code 39), with the same result.

    • Hi Kissifrot,

      I’ve updated the original post, so that it correctly display the single quotes. Copy the code in to a notepad and then copy it from there to your MySql environment and drop the existing function and run the script to create the function again.

      Then check whether you can run the following query and get the result.

      SELECT regex_replace(‘[^a-zA-Z0-9\-]‘,’ ‘,’2my test3_text-to. check \\ my- sql (regular) ,expressions ._,’);

      make sure to replace the single quotes properly.

      Cheers,

  13. FYI—BIG WARNING FOUND AFTER A LOT OF BANGING HEAD…This works great EXCEPT FOR:
    This only universally works 1) if NO_BACKSLASH_ESCAPES is set in sql_mode (Server Variable) or 2) if the input string doesn’t contain any character combinations that may be interpreted as escaped commands. IE: \Z \n \t \b etc.
    If the input string contains these escaped commands (intentionally or unintentionally) and NO_BACKSLASH_ESCAPES is not set, they are executed as commands and pollute/change the expected return value. Without the No_BACKSLASH_ESCAPES set, there is no way the regex_replace will work all the time for everything.

    • Hi Yourmomma,

      Thanks for the information. I’m sure it would help most of the readers who are going to use this function to regex_replace their text.
      If the NO_BACKSLASH_ESCAPES is unset you have to use STDSQL systax of escaping problamatic characters in your SQL query, ie: use double single quote ("''") to escape a single quote in your sql query instead of using backslash and single quote ("\'").

      I’ll add a note for this in my original post.

      Once again thanks very much for your valuable input :)

  14. As far as I could understand, this function can replace single characters only. It can’t replace pattern. F.ex, I need to remove HTML tags. To simplify regex let’s try to remove all starting from :

    select regex_replace(”,”,’aaappp’)

    But it does not work.

    And even select regex_replace(”,”,’aaappp’) does not work!
    If I’m right, it’s almost useless function. It’s functionality can be reached with several nested “replace” functions.

    • what do you mean it doesnt work
      try this:

      mysql> select regex_replace(‘[a-z]‘,”,’aaappp2′) as result;

      it show result 2, and with pattern replace any character a-z
      ——–
      |result|
      ——–
      |2 |

      you problem is understand about how regular expression work . :D

  15. I confirmed what Sergy said. It only work to replace single character.

    Example:

    SELECT regex_replace(‘test’, ”, ‘test test test’)

    will return unmodified original, which is wrong result.

  16. Despite your protestations, this function in its current form does not, replace a regular expression. It replaces all instances of single letters that match a regular expression, which is a very different thing.

    SET ch = SUBSTRING(original,i,1);
    IF NOT ch REGEXP pattern THEN

    Those two lines extract a single character, test that single character, and replace only that one character.

    As I understand it, (ans as my testing below shows) it is impossible, using this, to replace a string of more than one character.

    Here are some examples to explain the problem. In these examples, the goal is to replace the “foo”s in “foo1 foo2 foo3″ with “bar”s, to get “bar1 bar2 bar3″.

    ===

    The simplest regex, which should work, will not:
    SELECT regex_replace(‘foo’, ‘bar’, ‘foo1 foo2 foo3’);
    Output: foo1 foo2 foo3

    In PHP, which has regex replacements built in, it comes out correctly:
    echo preg_replace(‘/foo/’, ‘bar’, ‘foo1 foo2 foo3’);
    Output: bar1 bar2 bar3

    ===

    Just to show that the ‘/’ required by php is not relevant here:
    SELECT regex_replace(‘/foo/’, ‘bar’, ‘foo1 foo2 foo3′);
    Output: foo1 foo2 foo3

    ===

    Here are some more complex regexes, that illustrate the issue hopefully a little more clearly:

    SELECT regex_replace(‘[^0-9 ]+’, ‘bar’, ‘foo1 foo2 foo3′);
    Output: barbarbar1 barbarbar2 barbarbar3

    The SQL function is acting here like the regex for:
    [^0-9 ](?:[^0-9 ]*)

    In PHP:
    php -r “echo preg_replace(‘/[^0-9 ]+/’, ‘bar’, ‘foo1 foo2 foo3′);”
    Output: bar1 bar2 bar3

    ===

    SELECT regex_replace(‘fo*’, ‘bar’, ‘foo foo foo’);
    Output: baroo1 baroo2 baroo3

    The SQL function is acting here like the regex for:
    f(?:o*)

    In PHP:
    SELECT regex_replace(‘fo*’, ‘bar’, ‘foo foo foo’);
    Output: bar1 bar2 bar3

    Clearly illustrates how the SQL function matches only a single character, but the regular expression should match the whole string.

    ===

    The following, which is how you use it in your examples, is a character set (http://www.regular-expressions.info/charclass.html). It works, as a character set specifying a single character, but does not do what we want:
    SELECT regex_replace(‘[foo]‘, ‘bar’, ‘foo1 foo2 foo3’);

    Output: barbarbar1 barbarbar2 barbarbar3
    …which is correct, and the same as PHP gives, but is not what we are after.

    I hope that these examples have helped clarify the issue.

  17. Hii,

    I want to modify this function to get the return value with a alias name just like -> SELECT regexp_replace(‘[E\'\\!:\\?\'\]‘, ‘\’\')||\’ \’,P.description AS vals)

    How can I change the function. Highly appreciated

    Thanks.This post healed my great headache :)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.