New Stuff

Worship Tech Web Tools Blog

4192093_illustration.gifThis is an ongoing blog of web tools and technology related to worship, music and church. The idea is to give you good web points and resources that you can go to. Some of it is just me cruising the net, others are favorites of friends.

Enjoy what you see here.  If you find an interesting, useful and technology related site or resource that deals with helping worship or musicians in general, please send us a note and we will check it out. Perhaps we can feature it here.

Thanks!

Enjoy! - Kim Gentes

Entries in CSV (2)

Web-Connecting CSV Files As External Data to Excel Spreadsheet (Kim Gentes Worship / Tech Blog)

If you use generated reports from websites that contain important operational or reporting data, it is often important to bring that data into an Excel Spreadsheet. If you do that regularly, you might find yourself bringing that data into a spreadsheet on a regular basis.  This can be tedious, because you have to do a repeated number of steps that require regular maintenance at all points:

  • Generate the CSV/Excel data
  • Download the CSV or Excel Data to your local computer
  • Delete the old data from your main spreadsheet
  • Import the Data into your main Excel spreadsheet
  • Reformat the data to fit your spreadsheet formulas and layout

If you find yourself doing this kind of thing on a regular basis, you might consider an alternate approach- connecting to your CSV/excel spreadsheet reports by a refresh-capable data connection.  This means you don't have to download the data manually, you don't have to delete the old data version from your spreadsheet, and you don't have to import the new data and reformat your spreadsheet once again.

All this can be automated, once you setup your CSV/excel data to be an external data source. Below are some easy steps to doing this.  I have included a lot of screen shots, but this process is really quite quick and simple.  After you have done it once, you won't have repeat it on that data source again, and you should be able to remember it quite easily for other uses.

The following example below is done with Microsoft Excel 2010 on a Windows 7 OS.  Most Windows OSes should work the same way, but I cannot get this same functionality using the Microsoft Excel product for the Mac.

Most importantly, MAKE sure you read STEP 1 carefully. Do NOT select "from Web", but use "From Text".

 

STEP 1 - Open up your main Excel Spreadsheet you use as the location where you import your data (this might be a final summary spreadsheet or such).  Once your are there, on the sheet you wish to import the data select the Data Tab. Select "From Text" (strangely, do not select "from web") as shown in the graphic here:

STEP 2 - A file selection dialog will appear. Instead of entering a file path or searching for a local computer file, enter the website URL of your CSV or Excel spreadsheet file, as shown in the graphic here:

STEP 3 - Select "Open" file and the file will begin requesting to be imported via the regular text import wizard, as is normal for Excel.  Enter the configuration of the wizard as you would if the file was local, configuring it to the fields and formats you wish. Shown in the graphic here:

STEP 4 - As per usual, choose the location of the data import placement on the current sheet.  Shown in the graphic here:

 

STEP 5 - Your data will import on the current sheet.  Once it does, select the "Properties" button in the "Connections" area of the menu. This will bring up the "External Data Range Properties" sheet. Shown in the graphic here:

STEP 6 - You can configure it to your liking, but I prefer the following settings that seem to work well in most settings- Unselect "Prompt for file name on refresh".  I leave the refresh control buttons off as well, so that I must click the Refresh button to ask for updates. I also unselect "Adjust column width" so that the spreadsheet is not altered in format/layout by the new data each time. Look at the graphic below for the other settings I recommend in most situations:

The goal of all this is to provide you with a instantly refreshable CSV data source that is web based. No more downloading, deleting old data, importing new, and reformatting.

 

I hope that helps some of you who have projects that this applies to.

 

Happy spreadsheets all!

Kim Gentes

Regex Pattern for Parsing CSV files with Embedded commas, double quotes and line breaks

While you have stumbled on KimGentes.com, you might be coming for a few different reasons. Some of you are interested in articles and resources on Christianity, music, worship and such. Others of you are interested in technology information related to church worship settings. Some other folks are programmers who are looking for helpful information on technical challenges. This particular post is a bleed over from some of my technical work in programming. Specifically, this is a post to present a solution to parsing CSV files.

Programmers understand that CSV files are simply text data files that have information stored in value fields in the file. Each of the fields is separated by commas to delimit when one value/field ends and the next begins. This is why they are called "Comma Separated Values" files (CSV for short). Anyone who is new to this concept or programming might think that writing a program to extract data from files wherein the commas separate the data fields, should be an easy task. And if that was the total sum of it, it would be quick and simple in virtually any language you could choose to do it in. But that is not the end of it. CSV files are written by a host of popular applications and read by thousands of programs as well, including almost every spreadsheet program in existence, including Microsoft Excel. When the first CSV file user started outputting values to fields and reading them in another destination, they quickly realized a limitation- if you wanted to include the literal character of a comma (,) inside of a field value itself, this could not be done, since it would be interpreted as a field separator and its value wouldn't be understood (as well as the field in which it appeared being literally chopped in half).

To overcome this problem, it's assumed that some Neanderthal software developers (back in the Jurassic era of programming) came up with an idea to allow programs to insert and read commas inside of comma separated fields. They would allow fields to be encased in double quotes as a signature that the value inside this field should be read literally (including commas) from the first double quote to the ending double quote. This worked fine and commas could now be embedded in CSV field values. But, as you can guess, these cause further problems for programs- now, the commas of the world had safe haven usage inside of comma separated values, however, double quotes now could not be included inside of a double quote encased field value. Programmers quickly realized that they couldn't keep adding special characters to allow for current special characters to be escaped (which is a way of saying interpreted as literal data without functional consequence in the interpretation of the data).

So, to avoid using other characters to escape current special meaning characters, CSV file progenitors harkened that users could escape double quotes inside of double quote encased CSV fields by placing two double quotes together in the text. This would the standard way of escaping a double quote character ("), by simply placing to double quote characters next to each other, as in "".

All this is fine for the people and programs writing the data- its simple straightforward programming to output such information. But reading CSV files that have embedded double quotes, commas and can include embedded line breaks is a complicated concept. Such is the life of a programmer :). To meet this challenge, we often use a pattern parsing language called Regex (which stands for Regular Expressions).

Regex maybe the most popular language in the programming world. It is used in literally every high level programming language we know of in the world, including Visual Basic, C#, Javascript,  Java, PHP, Perl, Ruby and dozens more. It is included in several utilities such as search functions inside of UltraEdit and Ace Text. And it is included in most revisions of Unix (and other) OSes in command lines functions such as grep, Windows utilities powerGrep and so forth. Technically speaking Regex isn't a programming language on its own. It's a pattern matching engine that is often embedded inside of other languages. It became widely popular due to its inclusion primary in the Unix/Linux command line function of grep and the early web standard language of Perl. Now, most programmers can't conceive of a language that doesn't include some flavor of regex.

That all said, I have chosen to write a regex pattern that can handle parsing the fields of a CSV with all the conditions I mentioned above. There are plenty of other examples of CSV parsers around, but none seem to do the trick I was looking for, which is grandly frustrating when Excel can import and export a CSV with all the listed nuances quickly and easily. So, not finding a good solution, I have written a short CSV parsing pattern. It is below.

CSV-parser (regex pattern below)

 

^(("(?:[^"]|"")*"|[^,]*)(,("(?:[^"]|"")*"|[^,]*))*)$

 

*YOU ARE FREE to use this algorithm in any application (commercial or personal or whatever). It comes with no warrantees.  If you DO end up using this REGEX pattern, please do so with the following considerations:

  • Please make this notation in your source code:  ©2008 Kim Anthony Gentes - FREE TO USE ANYWHERE. No Warrantees are implied or offered. This software is offered "as-is". Usable by anyone (freeware, non-commercial or personal). No support or service is offered or implied by your usage. Use of the software implies your own assumption of maintenance, liability and operability of the same. Only restriction for us: you should include this copyright notice (full text) with the code. 
  • Please post a response on this blog entry below (you do that by clicking on the "Comments" link at the bottom of this entry), saying you found this and are using it. I'd just like to know if its helping people and how people are using it.

 

When using the regex, some important things to know:

Options (turned on in your language/utility): ^ and $ match at line breaks

Description: below is a textual description of the regex pattern that may be helpful to programmers who want to understand what is happening in the regex.

 

Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»
Match the regular expression below and capture its match into backreference number 1 «(("(?:[^"]|"")*"|[^,]*)(,("(?:[^"]|"")*"|[^,]*))*)»
   Match the regular expression below and capture its match into backreference number 2 «("(?:[^"]|"")*"|[^,]*)»
      Match either the regular expression below (attempting the next alternative only if this one fails) «"(?:[^"]|"")*"»
         Match the character “"” literally «"»
         Match the regular expression below «(?:[^"]|"")*»
            Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
            Match either the regular expression below (attempting the next alternative only if this one fails) «[^"]»
               Match any character that is NOT a “"” «[^"]»
            Or match regular expression number 2 below (the entire group fails if this one fails to match) «""»
               Match the characters “""” literally «""»
         Match the character “"” literally «"»
      Or match regular expression number 2 below (the entire group fails if this one fails to match) «[^,]*»
         Match any character that is NOT a “,” «[^,]*»
            Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
   Match the regular expression below and capture its match into backreference number 3 «(,("(?:[^"]|"")*"|[^,]*))*»
      Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
      Note: You repeated the capturing group itself.  The group will capture only the last iteration.  Put a capturing group around the repeated group to capture all iterations. «*»
      Match the character “,” literally «,»
      Match the regular expression below and capture its match into backreference number 4 «("(?:[^"]|"")*"|[^,]*)»
         Match either the regular expression below (attempting the next alternative only if this one fails) «"(?:[^"]|"")*"»
            Match the character “"” literally «"»
            Match the regular expression below «(?:[^"]|"")*»
               Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
               Match either the regular expression below (attempting the next alternative only if this one fails) «[^"]»
                  Match any character that is NOT a “"” «[^"]»
               Or match regular expression number 2 below (the entire group fails if this one fails to match) «""»
                  Match the characters “""” literally «""»
            Match the character “"” literally «"»
         Or match regular expression number 2 below (the entire group fails if this one fails to match) «[^,]*»
            Match any character that is NOT a “,” «[^,]*»
               Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Assert position at the end of a line (at the end of the string or before a line break character) «$»
Thank you for all the additional information/examples and samples from various languages! Keep posting your ideas that can help others!
thanks
Kim