Worship Tech/Web Tools Blog - Kim Gentes - worship leader and writer

Worship Tech Web Tools Blog

This is an ongoing blog of web tools and technology related to worship, music and church. The idea is to give you good web pointers and resources that you can go to. Some of it is just me cruising the net; others are favorites of friends.

Enjoy what you see here. If you find an interesting, useful, technology-related site or resource that helps worship or musicians in general, please send us a note and we will check it out. Perhaps we can feature it here.

Thanks!

Enjoy! - Kim Gentes

Entries in programming (3)

Sun releases Broken Java update to Public [v6,updates 19 & 20]

Friday, April 16, 2010 at 3:50PM

One of the great things about today's software development culture is that competition has driven change to a feverish pace. If you watch Google, Apple, Microsoft and the other biggies, you might think things move along relatively quickly. And for those companies they do. With huge staffs of developers, they can mitigate quality issues by planning and by throwing modern techniques (and hordes of people) at the technical challenges of keeping up with smaller, nimble companies that are focused on niche markets. The little developers have their pains: small staffs and tiny budgets mean they only get a few swings at the plate before the resources available drain away.

The savior of all this was supposed to be open source. In the open source world, we would all be able to benefit from larger efforts shouldered by many, and let the specific applications be driven by those who cared about applying a great technology to a market. So those fundamental technologies became the backbone of open source development. Things like Linux, PHP, (Apache/HTTP to a lesser extent), JavaScript, Java, AJAX, and a cadre of other core technologies would let us all play nice and develop fast.

But what happens when one of those core technologies drops the ball? It lands on the consumer's e-foot, that's what! The latest revision of lameness in technology land? None other than the mothership of ubiquitous programmatic lifeforce- Java! Java was lauded as the end-all-be-all language that would unite all platforms, hardware, OSes and devices into a playground of loveliness for app developers. Java would bring us all together and unite our efforts to work everywhere. If Java proponents were to be believed, the Borg, Klingons, Ferengi and Vulcans would be having tea parties and celebrating peace and harmony conferences to endorse Obama's nuclear disarmament agreements. Such is the hope.

Well, on March 31, the Java fiesta of loveliness was interrupted by a blip on the "what the!" radar. Turns out that Sun (the company that builds and releases the Java language, engine, clients and its updates) managed to release its update 19 of Java v6 with the profound ability to break literally every single applet that was signed by one of the largest authentication agencies in software credentialing. Tech geeks keep on reading for gory details, but for all you folks who already want to slap me, here is the short answer.

Java programs (called applets) need to be verified as "safe". The process of verifying them and "publishing" them as secure is done through a method called "signing". Signing basically places encrypted information on the Java applet that lets it verify itself by announcing its identity and a secret code. When people use the internet and are about to use a Java applet that is "safe", their computer reads the identity and secret code from the applet. That code and identity are verified against the records of a "trusted" digital security company (companies that do this type of verification checking are called "authentication services", or more formally certificate authorities). If the identity and code don't match properly according to the standards of the security company, the web surfer is told that the Java program they are about to use is not to be trusted. Users, understandably, react by blocking the program from running. This is the way digital signing/security works on program applets for Java. It has for many years.

The problem is that Sun recently released Java updates (both v6 updates 19 and 20 include this problem) that incorrectly break all the code signing certificates (the digital security) issued by Thawte. This is sad or funny, depending on who you are. Thawte is one of the largest digital security providers in the world. Having Java drop the ball on this is no small item. Thousands of applets all over the web are now reporting how unsafe they are! Thanks Java! Thanks Sun!

To get the detailed skinny on this, I went to my friend and web/developer guru Kevin Lott for the nitty-gritty. Lott says,

Java SE 6 update 19 was released on March 31. Java SE 6 update 20 was released on April 16th. Unfortunately, both of these updates are botched releases that will break all code signing certificates issued by Thawte. The algorithm on the Thawte Premium CA is MD5withRSA, however Sun released the update with the wrong algorithm - SHA1withRSA. This will cause the browser to prompt you with an ugly message saying "Java has discovered application components that could indicate a security concern" with the option to block unsafe components. Naturally customers will want to protect themselves and agree to the block, breaking your Java Applet application. (Kevin Lott, April 16, 2010)

What does this mean? Well, if you are using any Java applets on any of your favorite websites and they suddenly say "Java has discovered application components that could indicate a security concern"--- you might contact the company before assuming there is anything wrong. There is a good chance that Java itself is causing the problem.

The fix? Well, for users and web surfers, your best bet is to uninstall Java and go back to v6, update 18. For companies who develop applets? Encourage your customers to revert to update 18, or hope and pray that Java releases a fix before the whole web GUI world decides to move permanently away from any use of their technology.

Well, now that you have had your juicy tech update... back to the grindstone, people!

happy teching,

Kim Gentes


Regular Expressions Except a Given String - Negative Patterns (Kim Gentes Worship & Tech Blog)

Saturday, December 20, 2008 at 12:46AM

Occasionally, I use this column in technology and worship tech to put some tips out there on the technical side of things. Today is such a day. This is another Regular Expression segment that might help someone.

The goal of much software is to find things. A way to find stuff in computer languages is a sub-language called "Regular Expressions". Most regular expressions deal with finding specific instances of data inside of a larger string. When looking for those instances of data, we often use "patterns" to match what we intend to look for with the data we are looking through to find our desired information. Those patterns often indicate what we are looking for, as in ".*Kim.*" (without the quotes) is a regex pattern that would look for my first name inside of any string. Any string that contained my name would match that pattern.

But in real life, we don't always know what we are looking for in a positive fashion. Sometimes we are looking for things simply because they AREN'T something else. Let's go back to my name, Kim. If I want to create a regex pattern that would match every string that did NOT contain my name, regex has a way to do that as well. It is called "negative lookaround". There are two types of negative lookaround- "negative lookahead" and "negative lookbehind". One looks forward from the current position in a string; the other looks backward from it. For simplicity's sake, let's simply look forward, since that will be the most obvious case.

So let me clarify- what we want to do is write a pattern that will find every string that does NOT contain my name, Kim. Ok, here you go:

^(?:(?!Kim).)*$

The core of the "negativeness" of this expression is (?!Kim), which asserts that the text starting at the current position is not exactly "Kim". The dot that follows then consumes one character, and the surrounding (?: )* repeats that check-then-consume step. Together with ^ and $, this lets the expression span the entire string, from start to end, only when "Kim" never appears in it. And if all you are doing is trying to make sure that you match a string that doesn't contain a specific pattern, then you are good.
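To see the pattern in action, here is a minimal sketch in Python rather than the PHP used later in this post. The helper name lacks_kim and the sample strings are made up for illustration; only the pattern itself comes from the text above.

```python
import re

# The pattern from the post: the string matches only if "Kim" appears nowhere in it.
NO_KIM = re.compile(r'^(?:(?!Kim).)*$')

def lacks_kim(text):
    """Return True when text (a single line) does not contain "Kim"."""
    return NO_KIM.match(text) is not None

# Hypothetical sample strings, chosen to show hits and misses.
for sample in ["Hello world", "Hello Kim", "Kimono", "kim in lowercase", ""]:
    print(repr(sample), lacks_kim(sample))
```

Note that "Kimono" is rejected because it contains the letters "Kim", while "kim in lowercase" is accepted because no case-insensitive flag is set in this sketch.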

However, sometimes what you are actually looking for is every part of a string except the negative pattern (my name for the specific pattern you are trying to avoid). In other words, you want to look through any string and extract all of its data while leaving out whatever the negative pattern matches. This is a little more complicated, but here is one option:

^(((?:(?!Kim).)*)|((.*)Kim(.*)))$

This pattern will find lines of data that contain nothing to do with Kim, and it will capture data that is on a line with Kim (but can programmatically ignore Kim itself). But in order for this to work, you must use what are called capturing groups. Regex programmers will understand these as the chunks of data that matched the groups in their expression. A capturing group is formed each time you use a plain pair of parentheses; pairs that open with ?: or ?! do not capture anything. Groups are numbered by the order of their opening parentheses, and using numbered groups you can get just the information you intended. In the above case, you will need a little user code to get the right data out. So, in PHP, you would have the following code using the above pattern:

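// Group 2 = a whole line with no "Kim"; groups 4 and 5 = the text before and after "Kim".
if (preg_match('/^(((?:(?!Kim).)*)|((.*)Kim(.*)))$/im', $rawstring, $regexps)) {
  // preg_match leaves trailing groups that did not take part out of the array, so guard each index
  $clean_line = isset($regexps[2]) ? $regexps[2] : '';
  $clean_before_patt = isset($regexps[4]) ? $regexps[4] : '';
  $clean_after_patt = isset($regexps[5]) ? $regexps[5] : '';
} else {
  // failure: the pattern did not match
}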

What you end up with is 3 variables as you parse through your strings. The variable "$clean_line" will contain the string that matches data that has no "Kim" in it at all. The variable "$clean_before_patt" will contain the portion of a string which precedes the word "Kim". The variable "$clean_after_patt" will contain the portion of a string which follows the word "Kim". If "Kim" appears more than once on a line, the greedy (.*) in front of it means the split happens at the last occurrence. Simply evaluate the values of those variables to determine what you want to use as you search through your strings.
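For readers outside of PHP, the same group numbering can be sketched in Python. This is only an illustration under my own assumptions: the function name split_around_kim and the dictionary keys are hypothetical, and Python reports a group that took no part in the match as None rather than leaving it out.

```python
import re

# Same pattern and flags as the PHP example (case insensitive, ^ and $ match at line breaks).
PATTERN = re.compile(r'^(((?:(?!Kim).)*)|((.*)Kim(.*)))$', re.IGNORECASE | re.MULTILINE)

def split_around_kim(line):
    """Return the three pieces described above for a single line of text."""
    m = PATTERN.match(line)
    if m is None:
        return None
    return {
        "clean_line": m.group(2),  # whole line, set only when "Kim" is absent
        "before": m.group(4),      # text before "Kim", set only when it is present
        "after": m.group(5),       # text after "Kim", set only when it is present
    }

print(split_around_kim("no name here"))
print(split_around_kim("Hello Kim Gentes"))
print(split_around_kim("Kim and Kim"))
```

The last call shows the greedy behavior: with two occurrences on one line, the "before" piece runs up to the final "Kim".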

Of course, you would replace "Kim" with whatever pattern you DON'T want to find in your strings.

Also, my examples match ^ and $ at line breaks and search case insensitively (via the flags on the PHP preg_match pattern). If you want to search case sensitively, simply remove the "i" flag from the preg_match pattern. Similarly, if you don't want ^ and $ to match at line breaks, just remove the "m" flag from the same pattern (your regex engine may have its own flavor of both these flags).
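The effect of the two flags can be sketched in Python, where they are spelled re.IGNORECASE and re.MULTILINE instead of "i" and "m". The helper lines_without_kim and the three-line sample text are assumptions made for this illustration.

```python
import re

PATTERN = r'^(?:(?!Kim).)*$'

def lines_without_kim(text, ignore_case=True, multiline=True):
    """Return every match of the negative pattern under the chosen flags."""
    flags = 0
    if ignore_case:
        flags |= re.IGNORECASE  # the "i" flag in the PHP pattern
    if multiline:
        flags |= re.MULTILINE   # the "m" flag: ^ and $ match at line breaks
    return re.findall(PATTERN, text, flags)

sample = "hello kim\nno name here\nKim again"  # hypothetical three-line input

print(lines_without_kim(sample))                      # both flags on
print(lines_without_kim(sample, ignore_case=False))   # case sensitive
print(lines_without_kim(sample, multiline=False))     # ^ and $ only at the ends
```

With both flags on, only the middle line is returned. Turning off case insensitivity lets "hello kim" through as well, and turning off the line-break option means the pattern has to span the whole three-line text, which it cannot do here.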

God bless, and happy coding

Kim Gentes

*YOU ARE FREE to use this algorithm in any application (commercial or personal or whatever). It comes with no warranties. If you DO end up using this REGEX pattern, I ask (but don't require) that you please do so with the following considerations:

Please make this notation in your source code: ©2008 Kim Anthony Gentes - FREE TO USE ANYWHERE.
Please post a response on this blog entry below (you do that by clicking on the "Comments" link at the bottom of this entry), saying you found this and are using it. I'd just like to know if it's helping people and how people are using it.

When using the regex, some important things to know:

Options (turned on in your language/utility): ^ and $ match at line breaks

Kim Gentes | 7 Comments | tagged regex, regular expressions in Programming, Regex, Software

Regex Pattern for Parsing CSV files with Embedded commas, double quotes and line breaks

Tuesday, October 14, 2008 at 5:32AM

If you have stumbled on KimGentes.com, you might have come for a few different reasons. Some of you are interested in articles and resources on Christianity, music, worship and such. Others of you are interested in technology information related to church worship settings. Some other folks are programmers who are looking for helpful information on technical challenges. This particular post is a bleed over from some of my technical work in programming. Specifically, this is a post to present a solution to parsing CSV files.

Programmers understand that CSV files are simply text data files that store information in value fields. The fields are separated by commas to delimit where one value ends and the next begins. This is why they are called "Comma Separated Values" files (CSV for short). Anyone who is new to this concept or to programming might think that writing a program to extract data from files in which commas separate the data fields should be an easy task. And if that were the sum total of it, it would be quick and simple in virtually any language you chose. But that is not the end of it. CSV files are written by a host of popular applications and read by thousands of programs, including almost every spreadsheet program in existence, Microsoft Excel among them. When the first CSV users started outputting values to fields and reading them in another destination, they quickly hit a limitation: a literal comma (,) could not be included inside a field value, since it would be interpreted as a field separator, and the field in which it appeared would be chopped in half.
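The limitation is easy to reproduce. Here is a small Python sketch with a made-up three-field record, written and read back by the naive "split on every comma" approach:

```python
# Hypothetical record with three fields; the middle value contains a literal comma.
fields = ["Gentes", "Worship, Tech and Music", "2008"]

# Naive writer: join the values with commas and nothing else.
line = ",".join(fields)

# Naive reader: split on every comma.
read_back = line.split(",")

print(line)
print(read_back)
print(len(fields), "fields written,", len(read_back), "fields read back")
```

Three fields go out and four come back, because the reader has no way to tell the comma inside the middle value from the separators around it.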

To overcome this problem, it's assumed that some Neanderthal software developers (back in the Jurassic era of programming) came up with an idea to allow programs to insert and read commas inside of comma separated fields. They would allow fields to be encased in double quotes as a signal that the value inside the field should be read literally (including commas) from the first double quote to the ending double quote. This worked fine, and commas could now be embedded in CSV field values. But, as you can guess, this caused a further problem for programs: the commas of the world now had safe haven inside of comma separated values, but double quotes could no longer be included inside of a double quote encased field value. Programmers quickly realized that they couldn't keep adding special characters just to allow the current special characters to be escaped (which is a way of saying interpreted as literal data, with no functional consequence in the interpretation of the data).

So, to avoid using other characters to escape the current special-meaning characters, the CSV file progenitors decided that users could escape double quotes inside of double quote encased CSV fields by placing two double quotes together in the text. This became the standard way of escaping a double quote character ("): simply place two double quote characters next to each other, as in "".
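Both rules (quote a field that contains a comma, double any quote inside a quoted field) can be watched at work with Python's standard csv module, which follows the same convention by default. The helper name to_csv_line and the sample values are assumptions for this sketch.

```python
import csv
import io

def to_csv_line(fields):
    """Write one record with the csv module and return it without the line ending."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(fields)
    return buf.getvalue()[:-1]  # drop the single trailing "\n"

# Hypothetical values: one plain, one with a comma, one with double quotes.
fields = ["plain", "has, comma", 'say "hi"']
line = to_csv_line(fields)
print(line)

# Reading the line back restores the original values.
print(next(csv.reader([line])))
```

The plain value is written as-is, the value with a comma is wrapped in double quotes, and the value with quotes is wrapped and has each inner quote doubled.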

All this is fine for the people and programs writing the data- it's simple, straightforward programming to output such information. But reading CSV files that have embedded double quotes and commas, and that can include embedded line breaks, is a complicated matter. Such is the life of a programmer :). To meet this challenge, we often use a pattern parsing language called Regex (which stands for Regular Expressions).

Regex may be the most popular language in the programming world. It is available in nearly every high-level programming language in common use, including Visual Basic, C#, JavaScript, Java, PHP, Perl, Ruby and dozens more. It is included in utilities such as the search functions of UltraEdit and AceText. And it is included in most flavors of Unix (and other OSes) through command line functions such as grep, in Windows utilities such as PowerGREP, and so forth. Technically speaking, Regex isn't a programming language on its own. It's a pattern matching engine that is often embedded inside of other languages. It became widely popular primarily through its inclusion in the Unix/Linux command line function grep and in Perl, the standard language of the early web. Now, most programmers can't conceive of a language that doesn't include some flavor of regex.

That all said, I have chosen to write a regex pattern that can handle parsing the fields of a CSV with all the conditions I mentioned above. There are plenty of other examples of CSV parsers around, but none seem to do the trick I was looking for, which is grandly frustrating when Excel can import and export a CSV with all the listed nuances quickly and easily. So, not finding a good solution, I have written a short CSV parsing pattern. It is below.

CSV-parser (regex pattern below)

^(("(?:[^"]|"")*"|[^,]*)(,("(?:[^"]|"")*"|[^,]*))*)$

*YOU ARE FREE to use this algorithm in any application (commercial or personal or whatever). It comes with no warranties. If you DO end up using this REGEX pattern, please do so with the following considerations:

Please make this notation in your source code: ©2008 Kim Anthony Gentes - FREE TO USE ANYWHERE. No Warranties are implied or offered. This software is offered "as-is". Usable by anyone (freeware, non-commercial or personal). No support or service is offered or implied by your usage. Use of the software implies your own assumption of maintenance, liability and operability of the same. Only restriction for use: you should include this copyright notice (full text) with the code.
Please post a response on this blog entry below (you do that by clicking on the "Comments" link at the bottom of this entry), saying you found this and are using it. I'd just like to know if it's helping people and how people are using it.

When using the regex, some important things to know:

Options (turned on in your language/utility): ^ and $ match at line breaks

Description: below is a textual description of the regex pattern that may be helpful to programmers who want to understand what is happening in the regex.

Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»
Match the regular expression below and capture its match into backreference number 1 «(("(?:[^"]|"")*"|[^,]*)(,("(?:[^"]|"")*"|[^,]*))*)»
   Match the regular expression below and capture its match into backreference number 2 «("(?:[^"]|"")*"|[^,]*)»
      Match either the regular expression below (attempting the next alternative only if this one fails) «"(?:[^"]|"")*"»
         Match the character “"” literally «"»
         Match the regular expression below «(?:[^"]|"")*»
            Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
            Match either the regular expression below (attempting the next alternative only if this one fails) «[^"]»
               Match any character that is NOT a “"” «[^"]»
            Or match regular expression number 2 below (the entire group fails if this one fails to match) «""»
               Match the characters “""” literally «""»
         Match the character “"” literally «"»
      Or match regular expression number 2 below (the entire group fails if this one fails to match) «[^,]*»
         Match any character that is NOT a “,” «[^,]*»
            Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
   Match the regular expression below and capture its match into backreference number 3 «(,("(?:[^"]|"")*"|[^,]*))*»
      Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
      Note: You repeated the capturing group itself.  The group will capture only the last iteration.  Put a capturing group around the repeated group to capture all iterations. «*»
      Match the character “,” literally «,»
      Match the regular expression below and capture its match into backreference number 4 «("(?:[^"]|"")*"|[^,]*)»
         Match either the regular expression below (attempting the next alternative only if this one fails) «"(?:[^"]|"")*"»
            Match the character “"” literally «"»
            Match the regular expression below «(?:[^"]|"")*»
               Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
               Match either the regular expression below (attempting the next alternative only if this one fails) «[^"]»
                  Match any character that is NOT a “"” «[^"]»
               Or match regular expression number 2 below (the entire group fails if this one fails to match) «""»
                  Match the characters “""” literally «""»
            Match the character “"” literally «"»
         Or match regular expression number 2 below (the entire group fails if this one fails to match) «[^,]*»
            Match any character that is NOT a “,” «[^,]*»
               Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Assert position at the end of a line (at the end of the string or before a line break character) «$»
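As the note in the description says, the repeated group keeps only its last iteration, so the full pattern on its own hands back just the first and last fields. One way around that, sketched here in Python, is to apply the field part of the pattern once per field. The FIELD pattern and the split_record helper are my own additions for illustration, not part of the original pattern, and the sketch assumes it is handed one record at a time.

```python
import re

# The full-record pattern from the post, unchanged.
RECORD = re.compile(r'^(("(?:[^"]|"")*"|[^,]*)(,("(?:[^"]|"")*"|[^,]*))*)$')

# Hypothetical per-field pattern built from the same alternation.
# Group 1 = contents of a quoted field, group 2 = an unquoted field.
# The lookahead requires each field to end at a comma or at the end of the record.
FIELD = re.compile(r'(?:"((?:[^"]|"")*)"|([^,]*))(?=,|\Z)')

def split_record(record):
    """Split one CSV record into a list of field values with quoting removed."""
    fields = []
    pos = 0
    while True:
        m = FIELD.match(record, pos)  # always matches: an unquoted field may be empty
        if m.group(1) is not None:
            fields.append(m.group(1).replace('""', '"'))  # undo the doubled quotes
        else:
            fields.append(m.group(2))
        pos = m.end()
        if pos >= len(record):
            return fields
        pos += 1  # step over the comma that ended this field

record = 'a,"b,c",d'
m = RECORD.match(record)
print(m.group(2), m.group(4))  # only the first and last fields survive
print(split_record(record))
print(split_record('"say ""hi""",x'))
print(split_record('"line1\nline2",z'))
```

The last three calls cover the three special cases from this post: an embedded comma, embedded double quotes, and an embedded line break.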

Thank you for all the additional information/examples and samples from various languages! Keep posting your ideas that can help others!

thanks

Kim

Kim Gentes | 34 Comments | 1 Reference | tagged CSV, comma, commas, embedded, excel, programming, quotes, regex, regular, regular expressions, separated, values in CSV, Programming, Regex