
Regular Expressions Except a Given String - Negative Patterns (Kim Gentes Worship & Tech Blog)

Occasionally, I use this column in technology and worship tech to put some tips out there on the technical side of things.  Today is such a day.  This is another Regular Expression segment that might help someone.

The goal of much software is to find things.  One way to find things in most programming languages is a sub-language called "Regular Expressions" (regex for short).  Most regular expressions deal with finding specific instances of data inside of a larger string.  To find those instances, we write "patterns" that describe what we are looking for and match them against the data we are searching.  Those patterns usually state what we want to find: for example, ".*Kim.*" (without the quotes) is a regex pattern that looks for my first name inside of any string.  Any string that contains my name will match that pattern.
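As a quick illustration of the positive case, here is a minimal sketch in Python (the code later in this post is PHP, but Python's `re` module accepts the same pattern). The sample strings are made up for the example.

```python
import re

# Sample strings invented for this illustration.
samples = ["Hello, Kim Gentes", "Worship tech tips", "kim in lowercase"]

# ".*Kim.*" matches any single-line string that contains "Kim" somewhere in it.
contains_kim = re.compile(r".*Kim.*")

# fullmatch() requires the pattern to cover the whole string.
matches = [s for s in samples if contains_kim.fullmatch(s)]
print(matches)  # ['Hello, Kim Gentes']
```

The lowercase "kim" is not matched because the search is case-sensitive unless you ask otherwise.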

But in real life, we don't always know what we are looking for in a positive fashion.  Sometimes we are looking for things simply because they AREN'T something else.  Let's go back to my name, Kim.  If I want to create a regex pattern that matches every string that does NOT contain my name, regex has a way to do that as well.  It is called "negative lookaround".  There are two types of negative lookaround: "negative lookahead" and "negative lookbehind".  One looks forward from the current position in a string; the other looks backward at what comes just before the current position.  For simplicity's sake, let's only look forward, since that is the most obvious case.
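To make the difference between the two kinds of negative lookaround concrete, here is a small sketch in Python. The sample strings and variable names are invented for the illustration; the `(?!...)` and `(?<!...)` syntax is the same in most regex engines.

```python
import re

# Negative lookahead (?!...): match "Kim" only when "berly" does NOT follow it.
ahead_text = "Kim Kimberly"
ahead_hits = [m.start() for m in re.finditer(r"Kim(?!berly)", ahead_text)]

# Negative lookbehind (?<!...): match "Gentes" only when "Kim " does NOT precede it.
behind_text = "Mr. Gentes, Kim Gentes"
behind_hits = [m.start() for m in re.finditer(r"(?<!Kim )Gentes", behind_text)]

print(ahead_hits)   # [0]  (the "Kim" inside "Kimberly" is skipped)
print(behind_hits)  # [4]  (the "Gentes" after "Kim " is skipped)
```

Neither assertion consumes any text; each one only checks the neighborhood of the current position.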

So let me clarify- what we want to do is write a pattern that will find every string that does NOT contain my name, Kim.  Ok, here you go:

^(?:(?!Kim).)*$

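The core of the "negativeness" of this expression is (?!Kim), a negative lookahead.  It simply says: succeed at this position only if the text starting here is not exactly "Kim", and do not consume any characters while checking.  The dot that follows then consumes one character, and the surrounding (?: )* repeats that check-then-consume step for every character between the start (^) and the end ($) of the string.  So the whole expression matches only when "Kim" does not begin at any position in the string.  And if all you are doing is trying to make sure that you match a string that doesn't contain a specific pattern, then you are good.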
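To see that pattern in action, here is a minimal sketch in Python. The helper name `lacks_kim` and the sample strings are made up for this example.

```python
import re

# The pattern from above: at every position, assert that "Kim" does not
# start here, then consume one character, all the way to the end.
NO_KIM = re.compile(r"^(?:(?!Kim).)*$")

def lacks_kim(s: str) -> bool:
    """Return True when the single-line string s does not contain "Kim"."""
    return NO_KIM.match(s) is not None

print(lacks_kim("Worship tech tips"))  # True
print(lacks_kim("Ask Kim today"))      # False
print(lacks_kim("kim in lowercase"))   # True (the search is case-sensitive)
```

Note that the empty string also matches, since it trivially contains no "Kim".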

However, sometimes what you are actually looking for is every part of a string other than the excluded text (I'll call the pattern that describes that excluded text the "negative pattern").  In other words, what you want to do is look through any string and extract all of its data except the part matched by the negative pattern.  This is actually a little more complicated, but here is one option:

^(((?:(?!Kim).)*)|((.*)Kim(.*)))$

This pattern will match lines of data that have nothing to do with Kim, and it will also match lines that contain Kim while capturing the text on either side of it (so you can programmatically ignore Kim itself).  But in order for this to work, you must use what are called capture groups.  Regex programmers will know these as the chunks of data matched by the groups in their expression.  A capture group is formed each time you use a pair of parentheses, and groups are numbered by the order of their opening parenthesis; a pair that starts with (?: or (?! does not capture and is not numbered.  Using numbered groups, you can get just the information you intended.  In the above case, you will need a little user code to get the right data out.  So, in PHP, you would have the following code using the above pattern:

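// Group 2 = a line with no "Kim"; group 4 = text before "Kim"; group 5 = text after "Kim"
if (preg_match('/^(((?:(?!Kim).)*)|((.*)Kim(.*)))$/im', $rawstring, $regexps)) {
  // preg_match drops trailing groups that did not take part in the match,
  // so fall back to '' to avoid undefined-index notices (?? needs PHP 7+)
  $clean_line= $regexps[2] ?? '';
  $clean_before_patt= $regexps[4] ?? '';
  $clean_after_patt= $regexps[5] ?? '';
} else {
  //failure
}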

What you end up with is 3 variables as you parse through your strings.  The variable "$clean_line" will contain the whole line when it has no "Kim" in it at all.  The variable "$clean_before_patt" will contain the portion of a string which precedes the word "Kim".  The variable "$clean_after_patt" will contain the portion of a string which follows the word "Kim".  Simply check the values of those variables to determine what you want to use as you search through your strings.
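For readers who do not use PHP, the same idea can be sketched in Python. This is only an illustration under assumptions: the function name `split_around_kim` is hypothetical, and Python reports a group that did not take part in the match as `None` rather than leaving it out.

```python
import re

# Same pattern and flags as the PHP example: "i" -> re.I, "m" -> re.M.
PATTERN = re.compile(r"^(((?:(?!Kim).)*)|((.*)Kim(.*)))$", re.I | re.M)

def split_around_kim(raw: str):
    """Return (clean_line, before, after) for the first matching line, or None."""
    m = PATTERN.search(raw)
    if m is None:
        return None
    # Group 2: a line without "Kim"; group 4: text before; group 5: text after.
    return m.group(2), m.group(4), m.group(5)

print(split_around_kim("no name here"))      # ('no name here', None, None)
print(split_around_kim("Hello Kim Gentes"))  # (None, 'Hello ', ' Gentes')
```

Because `(.*)` is greedy, a line that contains the name more than once is split around the last occurrence: `split_around_kim("Kim and Kim")` gives `(None, 'Kim and ', '')`.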

Of course, you would replace "Kim" with whatever pattern you DON'T want to find in your strings.
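If you need to exclude several words at once, one option is to put an alternation inside the lookahead. The sketch below is in Python; the helper name `build_exclusion_pattern` is made up for this example, and `re.escape` is used on the assumption that the excluded items are literal words rather than patterns.

```python
import re

def build_exclusion_pattern(words, flags=0):
    """Compile a pattern matching single-line strings that contain none of the words."""
    if not words:
        raise ValueError("need at least one word to exclude")
    # Escape each word so characters like "+" or "." are taken literally.
    alternatives = "|".join(re.escape(w) for w in words)
    return re.compile(rf"^(?:(?!{alternatives}).)*$", flags)

not_in = build_exclusion_pattern(["Kim", "Jean", "Peter"])
print(bool(not_in.match("Alice and Bob")))    # True
print(bool(not_in.match("Ask Jean tonight"))) # False
```

The generated pattern here is `^(?:(?!Kim|Jean|Peter).)*$`, so the lookahead fails as soon as any one of the names starts at the current position.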

Also, my examples match ^ and $ at line breaks and search case-insensitively (via the flags on the PHP preg_match pattern).  If you want to search case-sensitively, simply remove the "i" flag from the preg_match pattern.  Similarly, if you don't want ^ and $ to match at line breaks, just remove the "m" flag from the same pattern (your regex engine may have its own way of setting both of these options).
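The effect of the two flags can be seen in a short Python sketch, where `re.I` and `re.M` play the role of the "i" and "m" flags. The three-line sample text is invented for the illustration.

```python
import re

text = "first line\nkim was here\nlast line"
pattern = r"^(?:(?!Kim).)*$"

# "m" only: ^ and $ match at every line break, search is case-sensitive.
strict_lines = re.findall(pattern, text, re.M)

# "m" and "i": lowercase "kim" now counts as the excluded name.
relaxed_lines = re.findall(pattern, text, re.M | re.I)

# No flags: ^ and $ only match at the ends of the whole text, and the dot
# cannot cross a line break, so nothing matches this multi-line input.
no_flag_lines = re.findall(pattern, text)

print(strict_lines)   # ['first line', 'kim was here', 'last line']
print(relaxed_lines)  # ['first line', 'last line']
print(no_flag_lines)  # []
```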

God bless, and happy coding

Kim Gentes

 

*YOU ARE FREE to use this algorithm in any application (commercial or personal or whatever). It comes with no warranties.  If you DO end up using this REGEX pattern, I ask (but don't require) that you please do so with the following considerations:

  • Please make this notation in your source code:  ©2008 Kim Anthony Gentes - FREE TO USE ANYWHERE.
  • Please post a response on this blog entry below (you do that by clicking on the "Comments" link at the bottom of this entry), saying you found this and are using it. I'd just like to know if it's helping people and how people are using it.

When using the regex, some important things to know:

Options (turned on in your language/utility): ^ and $ match at line breaks


Reader Comments (7)

Thank you very much for your article !
I had a hard time to find it, but your simple ^(?:(?!Kim).)*$ saved my day :)

May 4, 2010 | Unregistered Commenter miracl

Thank you very much for your solution. It seems very good and helpful :) :)
But now, please, I would like to apply this to several expressions (X && Y && Z), because I use it on a JTable with many documents and I would like to filter my table with an expression which contains many values, e.g.: NOT IN (Kim, Jean, Peter).

Thank you for your help.

May 5, 2010 | Unregistered Commenter Ali Mansour

Oh,it's so easy this expression works for me : ^(?:(?!((Kim|Jean|Peter)).)*)$

Thank you very much for you help another time.

May 6, 2010 | Unregistered Commenter Ali Mansour

Hi, I have tried all the answers here but I still can't find the solution to my problem. Can you help me please? The details are below.

I have two commands (let's name them Rule1 and Rule2) that run on one string (let's name it THESTRING) to tag it as either marked or not.

Rule1 is:
THESTRING contains ":\program files\" then mark it

Rule2 is:
THESTRING contains "\geronito\" then mark it.

This is where the problem came:
THESTRING = "C:\program files\geronito\red.tsd"

The proper tag should be Rule2, But my code tags it as Rule1. below is my code:

Rule1
.*:\\progra(m files|\~1).*[^(\\geronito\\)]

Rule2
.*\\geronito\\.*


and also I would like to add that anything inside "program files\geronito\temp\" should NOT be detected by any rule at all thus Rule2 was changed below:

Rule2
.*\\geronito\\.*[^(\\temp\\)]

I don't know how I can fix my code. Please help. I have been going crazy over this for so many hours now, so any help would be appreciated. Thanks!

September 16, 2010 | Unregistered Commenter Ticz

Thanks for this, it’s really helpful for changing from one markup language to another!

April 30, 2011 | Unregistered Commenter arkhi

Thank you !
^(?:(?!Kim).)*$
Really helpful for me.

April 18, 2012 | Unregistered Commenter Charlie

I've been looking all night for this! My trouble was that I was trying to get rid of a file extension on my webpage via .htaccess, but I also had a custom 404 redirect and they were not playing nicely. The regex worked perfectly, thanks a million!

April 9, 2013 | Unregistered Commenter Anna
