Wednesday 13 January 2021

Regex search and replace in Visual Studio Code

Where do I begin...

It always starts with a seemingly easy task. This one was uploading F9860 into Google BigQuery. Simple, right?

Move it into GCS and then create a table, easy...

But I kept having problems because the description field has commas in it.

Okay, there are a couple of fixes for this. Firstly, just cleanse the data in the extract SQL, super easy:

translate(SIMD, chr(10)||chr(11)||chr(13)||',', '    ')

The code above will change commas, line feeds, vertical tabs and carriage returns into spaces. Perfect... That means the CSV will have no dodgy commas. (Note that chr(11) is a vertical tab; an ordinary tab is chr(9).)
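If the cleansing has to happen outside the database, the same idea can be sketched in Python. This is only an illustration: the function name `cleanse` is made up, and the ordinary tab is included as an extra assumption on top of the characters the SQL lists.

```python
# Map each unwanted character to a single space, mirroring the TRANSLATE idea.
# The ordinary tab ("\t") is an extra assumption, not part of the SQL above.
CLEANSE_TABLE = str.maketrans({
    ",": " ",     # comma
    "\n": " ",    # line feed, chr(10)
    "\x0b": " ",  # vertical tab, chr(11)
    "\r": " ",    # carriage return, chr(13)
    "\t": " ",    # ordinary tab, chr(9)
})


def cleanse(description: str) -> str:
    """Return the description with delimiter-breaking characters replaced by spaces."""
    return description.translate(CLEANSE_TABLE)
```

Each character is swapped one for one, so the length of the description does not change.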

But I was not the one running the SQL, so it was plan B: fix the file after the extract.

I needed a regex that would find commas inside quotes. That was easy enough:

("[^",]+),([^"]+")

Then I needed to run the replace in Visual Studio Code, using the two capture groups as the replacement text: $1$2


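Note that you do this in the search and replace fields with the .* button (Use Regular Expression) switched on, which is what allows you to use capture groups in the replacement. I also needed to run it a couple of times, as each pass only removes the first comma in a quoted field: the match swallows the whole quoted string, so any further commas in it have to wait for the next pass.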
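For anyone who would rather script the repeated passes, here is a rough Python sketch of the same replace. It uses the same pattern, so it has the same limits, and the function name `strip_quoted_commas` is just something I made up for the example. Python writes the groups as \1\2 rather than $1$2.

```python
import re

# Same pattern as above: a quoted string containing at least one comma.
QUOTED_COMMA = re.compile(r'("[^",]+),([^"]+")')


def strip_quoted_commas(line: str) -> str:
    """Repeat the replace until no quoted field contains a comma."""
    while True:
        # Each pass removes only the first comma of every quoted field it matches.
        line, count = QUOTED_COMMA.subn(r"\1\2", line)
        if count == 0:
            return line
```

The loop stops as soon as a pass makes no replacements, which is the scripted version of hitting "replace all" until nothing changes.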

Then there is a little hovering over the results to find the "replace all" option, and we are done. This did save me booting the virtual machine to do it at the command line.





