Monthly Archives: July 2009

PHP – Split with delimiter capture

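Split a string on a regular expression and keep the matched delimiters as well. This is useful when a pattern can match several different delimiters, such as /(cat|dog|fish)/, and you need to know which one separated each chunk. The delimiter must be inside a capturing group (parentheses) for PREG_SPLIT_DELIM_CAPTURE to return it.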

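// $rx must contain exactly one capturing group, e.g. '/(cat|dog|fish)/'
$arr = preg_split($rx, $content, -1, PREG_SPLIT_DELIM_CAPTURE);
// Even indexes hold the text between matches, odd indexes hold the captured delimiters
$n = count($arr);
for ($i = 0; $i < $n; $i += 2) {
  $chunk = $arr[$i];
  // The last chunk has no delimiter after it
  $delim = isset($arr[$i + 1]) ? $arr[$i + 1] : null;
}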
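The same idea works in Python: re.split also returns the text matched by any capturing groups in the pattern, so chunks and delimiters alternate in the result. The sketch below pairs them up; split_with_delims is a hypothetical helper name, and like the PHP loop it assumes the pattern has exactly one capturing group.

```python
import re


def split_with_delims(pattern, text):
    """Split text on pattern and return (chunk, delimiter) pairs.

    Assumes pattern has exactly one capturing group, so re.split returns
    chunk, delimiter, chunk, delimiter, ... The final chunk gets None.
    """
    parts = re.split(pattern, text)
    pairs = []
    for i in range(0, len(parts), 2):
        # The last chunk has no delimiter after it
        delim = parts[i + 1] if i + 1 < len(parts) else None
        pairs.append((parts[i], delim))
    return pairs


print(split_with_delims(r'(cat|dog|fish)', 'a cat and a dog ate fish'))
# [('a ', 'cat'), (' and a ', 'dog'), (' ate ', 'fish'), ('', None)]
```

As in PHP, a delimiter at the very end of the string leaves an empty final chunk.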

Perl – Quickly read text files into a string or array

These two Perl utility subroutines read a text file into either a single string or an array of lines (with the trailing newlines removed).

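# Read a whole text file into a single string
sub read_file_into_string {
  my ($path) = @_;
  open(my $fh, '<', $path) or die "Cannot open $path: $!";
  local $/;            # undefine the record separator to slurp the whole file
  my $s = <$fh>;
  close $fh;
  return defined $s ? $s : '';
}

# Read a text file into an array of lines, newlines removed
sub read_file_into_array {
  my ($path) = @_;
  open(my $fh, '<', $path) or die "Cannot open $path: $!";
  my @lines = <$fh>;
  close $fh;
  chomp @lines;        # strip the trailing newline from every line
  return @lines;
}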
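For comparison, here is a minimal Python 3 sketch of the same two helpers. The function names mirror the Perl subs, and the UTF-8 default encoding is an assumption; adjust it for your files. The temporary file exists only to make the demo self-contained.

```python
import os
import tempfile


def read_file_into_string(path, encoding='utf-8'):
    """Return the whole file as one string, newlines kept."""
    with open(path, encoding=encoding) as fh:
        return fh.read()


def read_file_into_array(path, encoding='utf-8'):
    """Return a list of lines with the trailing newline removed, like Perl's chomp."""
    with open(path, encoding=encoding) as fh:
        return [line.rstrip('\n') for line in fh]


# Quick demo with a throwaway file
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w', encoding='utf-8') as out:
    out.write('first line\nsecond line\n')
text = read_file_into_string(path)
lines = read_file_into_array(path)
os.remove(path)
print(lines)  # ['first line', 'second line']
```

The with statement closes the file even if reading fails, which replaces the explicit close calls in the Perl version.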

Python MySQL UTF-8 connection + dictionary recordsets

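# -*- coding: utf-8 -*-
import MySQLdb
import MySQLdb.cursors

# charset="utf8" sets the connection character set, use_unicode=True makes
# MySQLdb return text columns as unicode objects instead of byte strings,
# and DictCursor returns each row as a dict keyed by column name.
db_connection = MySQLdb.connect(user="root", passwd="pass", db="mydb",
                                charset="utf8", use_unicode=True,
                                cursorclass=MySQLdb.cursors.DictCursor)
cursor = db_connection.cursor()

# Assumes the row with id=1271 has a val column containing the UTF-8 character "µ"
sql = """
SELECT *
FROM vals
WHERE id = %s
"""
cursor.execute(sql, (1271,))  # let MySQLdb quote the parameter
r = cursor.fetchone()
val = r['val']
print val         # prints the value, including "µ", on a UTF-8 console
print type(val)   # <type 'unicode'>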
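If a driver has no dictionary cursor, the standard DB-API cursor.description attribute is enough to build dict rows yourself, because its first item for each column is the column name. The Python 3 sketch below uses the built-in sqlite3 module purely as a stand-in so it runs without a MySQL server; fetch_dicts is a hypothetical helper name, and note that sqlite3 uses ? placeholders where MySQLdb uses %s.

```python
import sqlite3


def fetch_dicts(cursor):
    """Turn the remaining rows of an executed DB-API cursor into dicts.

    cursor.description holds one sequence per column; its first item is
    the column name.
    """
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# sqlite3 is only a stand-in here so the sketch runs without a MySQL server
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE vals (id INTEGER PRIMARY KEY, val TEXT)')
conn.execute('INSERT INTO vals (id, val) VALUES (?, ?)', (1271, '\u00b5 6000'))
cur = conn.execute('SELECT * FROM vals WHERE id = ?', (1271,))
rows = fetch_dicts(cur)
conn.close()
print(type(rows[0]['val']))  # <class 'str'>, i.e. text comes back as unicode
```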

Python unicode + xml.dom.minidom + write to file

# -*- coding: utf-8 -*-
import codecs
import xml.dom.minidom

impl = xml.dom.minidom.getDOMImplementation()
dom = impl.createDocument(None, "root", None)
root_el = dom.documentElement

# An element with plain ASCII text
ascii_el = dom.createElement('ascii')
root_el.appendChild(ascii_el)
ascii_el.appendChild(dom.createTextNode('abc'))

# An element with non-ASCII text: decode the UTF-8 source bytes to unicode first
utf_el = dom.createElement('utf')
root_el.appendChild(utf_el)
utf_el.appendChild(dom.createTextNode(unicode('µ 6000', 'utf-8')))

# There are a couple of ways of dealing with output (set this to False to try the second):
if True:
    # Option 1: serialize to a unicode string and let codecs encode it.
    # Writes a UTF-8 file without a BOM, with the declaration
    # <?xml version="1.0" ?> (no encoding attribute; XML defaults to UTF-8)
    # Prints "µ" to the console, provided the console encoding can represent it
    # type(xmlstr) is <type 'unicode'>, i.e. a unicode string
    xmlstr = dom.toxml()
    f = codecs.open('utf_test.xml', 'w', 'utf-8')
    f.write(xmlstr)
    f.close()
    print xmlstr
    print type(xmlstr)
else:
    # Option 2: have minidom encode to UTF-8 bytes itself.
    # Writes the same UTF-8 file without a BOM, but with the declaration
    # <?xml version="1.0" encoding="utf-8"?>
    # Printing the raw bytes shows "??" (or similar garbage) on a console that isn't UTF-8
    # type(xmlstr) is <type 'str'>, i.e. a byte string
    xmlstr = dom.toxml('utf-8')
    f = open('utf_test.xml', 'w')
    f.write(xmlstr)
    f.close()
    print xmlstr
    print type(xmlstr)
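Under Python 3 the picture is simpler, because every str is already unicode: toxml() returns a str, while toxml(encoding='utf-8') returns bytes and puts the encoding in the XML declaration. A minimal sketch, assuming the bytes version is written in binary mode so nothing re-encodes it; the temporary file is only there to keep the demo self-contained.

```python
import os
import tempfile
import xml.dom.minidom

impl = xml.dom.minidom.getDOMImplementation()
dom = impl.createDocument(None, 'root', None)
utf_el = dom.createElement('utf')
dom.documentElement.appendChild(utf_el)
# In Python 3 a str literal is already unicode; no decode step needed
utf_el.appendChild(dom.createTextNode('\u00b5 6000'))  # "µ 6000"

text = dom.toxml()                  # str, declaration without an encoding attribute
data = dom.toxml(encoding='utf-8')  # bytes, declaration says encoding="utf-8"

# Write the bytes in binary mode, then read them back to check the file contents
fd, path = tempfile.mkstemp(suffix='.xml')
with os.fdopen(fd, 'wb') as f:
    f.write(data)
with open(path, 'rb') as f:
    written = f.read()
os.remove(path)
print(type(text), type(data))  # <class 'str'> <class 'bytes'>
```

Writing text instead would need open(path, 'w', encoding='utf-8'), which plays the role codecs.open plays in the Python 2 version.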