Windowing an iterable with itertools

As any good Python developer does, I make heavy use of Python’s iterator protocol.  It’s easy, it’s efficient, it’s a good thing.  As you know, an iterator hands out an iterable’s items one at a time, each time “next” is called, which means that the next value cannot be peeked without advancing the iterator (and thus consuming the data).  But what if we want to look at the next value without advancing the iterator?  The recipe below solves this problem with a wrapper class that adds two methods, peek and prev.

from itertools import tee

class Iterator(object):
    """Intended to be used inside a while loop"""
    def __init__(self, iterable):
        self._a, self._b = tee(iter(iterable), 2)
        self._previous = None
        self._peeked   = self._b.next()

    def __iter__(self):
        return self

    def next(self):
        self._previous = self._a.next()
        self._current  = self._peeked
        try:
            self._peeked = self._b.next()
        except StopIteration:
            self._peeked = None
        return self._current

    def prev(self): return self._previous

    def peek(self): return self._peeked

Notice that we only need two copies of the original iterable, not three.  Initially the “current” value is undefined and the “peeked” value is the first item of the iterable.  As we consume the data, “current” takes the “peeked” value and “peeked” advances by one item.  The “previous” value starts at None and stays one item behind “peeked”, so prev() must be called before next() to get the item that precedes the one next() returns; called afterwards, it returns the same item next() just returned.  Used that way, class Iterator gives us a sliding window, solving the problem.  For example,

I = Iterator([1, 2, 3, 4])
fcn = lambda itr: (itr.prev(), itr.next(), itr.peek())

L = []
while True:
    try:
        L.append(fcn(I))
    except StopIteration:
        break

print 'L: ', L

>>> L: [(None, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, None)]

Using Iterator inside a ‘for’ loop works just like you’d expect, preserving the normal iterator protocol.  But what if we want to obtain the “peeked” and “previous” values inside of a ‘for’ loop?  The recipe can be modified as follows:

class Window(object):
    """Intended to be used with a for loop"""
    def __init__(self, iterable):
        self._a, self._b = tee(iter(iterable), 2)
        self._previous = None
        self._peeked   = self._b.next()

    def __iter__(self):
        return self

    def next(self):
        _prev = self._previous
        self._previous = self._a.next()
        self._current  = self._peeked
        try:
            self._peeked = self._b.next()
        except StopIteration:
            self._peeked = None
        return _prev, self._current, self._peeked

    def prev(self): return self._previous

    def peek(self): return self._peeked

Then,

W = Window([1, 2, 3, 4])
L = []

for prev, current, peeked in W:
    L.append((prev, current, peeked))

print 'L: ', L

>>> L: [(None, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, None)]

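The coolest thing though is that any iterable can be used with Window and Iterator.  This includes generators!  Talk about powerful.  Two caveats: both classes use None to mean “no value”, so an iterable that itself contains None is ambiguous, and an empty iterable raises StopIteration inside the constructor.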
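
For what it’s worth, the same (previous, current, peeked) triples can also be produced by a plain generator, with no tee and no wrapper class.  This is only a sketch of one alternative, written for Python 3 rather than the Python 2 used in the recipes; the name window and the fill parameter are made up for this illustration.

```python
def window(iterable, fill=None):
    """Yield (previous, current, upcoming) for every item of iterable.

    'fill' is a hypothetical parameter: it stands in for the missing
    neighbour at either end of the sequence.
    """
    it = iter(iterable)
    prev = fill
    try:
        cur = next(it)
    except StopIteration:
        return  # empty input: nothing to yield
    for upcoming in it:
        yield prev, cur, upcoming
        prev, cur = cur, upcoming
    yield prev, cur, fill  # the last item has no successor


# Works with a generator expression just as well as with a list.
for prev, cur, peeked in window(x * x for x in range(1, 4)):
    print(prev, cur, peeked)
```

Because the look-ahead lives in local variables, an empty iterable simply yields nothing, and passing a different fill value avoids the ambiguity when None is a legitimate item.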

Merging changes in Plone

During development it is common practice to maintain a copy of the live site so that changes can be tested without breaking the live site.  Changes made on the development site are often stored in the ZODB via the customs folder.  However, before moving changes to the live site it is important to store them in svn.  Here is the process of moving changes from the customs folder into svn:

1) Use customDumper to dump customs folder to the filesystem

2) Zip the customs folder and move it to the home directory

3) From there, sftp and retrieve the zip file

4) Unzip the file into the products directory

5) Do a diff -r --brief custom product_name | grep "^Files"

6) Move those files which have changed from the customs folder into the product’s skins folder

7) Do a commit, and now you are done.
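
If diff is not handy, step 5 can be approximated from Python.  The sketch below uses the standard library’s filecmp.dircmp to list the files that exist in both trees but differ, which is roughly what grepping the diff output for ^Files gives you.  The function name changed_files is a placeholder, the directory names are the same placeholders used in step 5, and the code is written for Python 3.

```python
import filecmp
import os


def changed_files(left, right):
    """Return sorted relative paths of files found in both trees that differ."""
    changed = []

    def walk(cmp, prefix):
        # diff_files: names present in both directories whose contents differ.
        # Note: dircmp compares shallowly, so files with identical type, size
        # and modification time are treated as equal without being read.
        for name in cmp.diff_files:
            changed.append(os.path.join(prefix, name))
        # subdirs maps each common subdirectory name to its own dircmp object.
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(left, right), "")
    return sorted(changed)


if __name__ == "__main__":
    # 'custom' and 'product_name' are placeholders, as in step 5.
    if os.path.isdir("custom") and os.path.isdir("product_name"):
        for path in changed_files("custom", "product_name"):
            print(path)
```

Files that exist on only one side are deliberately left out, just as the grep filter drops the “Only in” lines.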

Parsing nginx log files

The Python group at OASIS has been busy making Plone do backflips!  We’ve hooked up several ZEO instances to nginx and Varnish: cool stuff.  After the grunt work of digging through config files we faced the problem of how to test our setup, and none of us wanted to do end-user testing for thirty (big) sites.  What to do, what to do…

Well, as always, we found a solution.

We have nginx logging set up and about 10,000 log entries per site.  All we needed was software that used the log entries to programmatically replicate user actions.  The first step, however, was to write some code that could take each log entry, parse it and check it for consistency.  That’s where RegTemplate fits in.  RegTemplate takes a $-delimited string and a dictionary of regular expressions whose keys are the template identifiers.  Each identifier is replaced by its corresponding regular expression, forming the pattern for each log entry.  RegTemplate also has a method called ‘verify’ which takes a log entry as an argument and processes it through the constructed regular expression, then compares the result to the original log entry.  Cool, huh?

Below is a sample use case for a simplified log entry:

if __name__ == '__main__':
   sample = '152.2.103.87 [22/Apr/2009] GET'

   print 'Sample log entry: ', sample, '\n'

   passesExact    = '152.2.103.87 [22/Apr/2009] GET'
   passesNotExact = '152.2.103.87 [22/Apr/2009] GET \\ '
   template       = '$ip [$date] $request'

   ip      = r'(([0-9]+\.?)+)'
   date    = r'([0-9]{2})\/([A-Za-z]+)\/([0-9]{4})'
   request = r'(GET|POST)'
   dct     = {'ip':ip, 'date':date, 'request':request}

   print 'dictionary of regex substitutions: ', dct, '\n'

   rt = RegTemplate(template)
   print 'Template before compilation is: ', rt.template
   print 'Pattern after compilation is: ',   rt.compile(dct)._RegTemplate__pattern
   print 'Template after compilation is: ',  rt.template

   print '\n'

   print '"%s" passes exact match: '       % passesExact, rt.verify(passesExact, exact=True)
   print '"%s" passes exact match: '       % passesNotExact, rt.verify(passesNotExact, exact=True)
   print '"%s" passes approximate match: ' % passesNotExact, rt.verify(passesNotExact, exact=False)

   print '\n'

   match = rt.match(passesNotExact)
   print 'pulled ip from named group, value retrieved is: ',      match.group('ip')
   print 'pulled date from named group, value retrieved is: ',    match.group('date')
   print 'pulled request from named group, value retrieved is: ', match.group('request')
   print 'named groups are: ', rt.namedGroups(passesNotExact)
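
RegTemplate’s own source is not shown here, so the following is only a guess at the core idea described above: swap each $identifier for its regular expression, wrapped in a named group, and escape the literal text in between.  The names build_pattern and verify, and the meaning given to exact, are assumptions for illustration and not the real RegTemplate API.  The sketch is written for Python 3.

```python
import re


def build_pattern(template, regexes):
    """Compile a '$name'-style template into one regex with named groups."""
    # re.split with a capturing group returns literal text at even indexes
    # and the captured identifier names at odd indexes.
    parts = re.split(r"\$(\w+)", template)
    pieces = []
    for index, part in enumerate(parts):
        if index % 2:
            pieces.append("(?P<%s>%s)" % (part, regexes[part]))
        else:
            pieces.append(re.escape(part))  # '[' and ']' become literals
    return re.compile("".join(pieces))


def verify(pattern, entry, exact=True):
    """Assumed semantics: 'exact' means the whole entry must match."""
    match = pattern.fullmatch(entry) if exact else pattern.match(entry)
    return match is not None


regexes = {
    "ip": r"(?:[0-9]+\.?)+",
    "date": r"[0-9]{2}/[A-Za-z]+/[0-9]{4}",
    "request": r"GET|POST",
}
pattern = build_pattern("$ip [$date] $request", regexes)
match = pattern.match("152.2.103.87 [22/Apr/2009] GET")
print(match.groupdict())
```

Escaping the literal text matters here because the square brackets around the date would otherwise be read as a character class.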

What is programming?

“So why programming, what is it really?”

Programming is writing instructions for how parts make a whole.  Take, for instance, Legos.  Remember getting Legos for your birthday?  Did you read the instructions or just start building?  To build the Lego Millennium Falcon you must assemble thousands of parts according to the blueprint.

“Legos are fun, but what do they have to do with programming?”

You program every day without even knowing it.  When’s the last time you had a really hard question?  How did you solve it?  The first thing you did was to re-examine the question to understand it better; you looked for clues.  Then you went through each part and determined what it did.  At this point you could have written a little story detailing the parts and how they fit together.  This story is a Lego model for your question; it is a program.

Programs give us answers by modeling a problem or task.  When we were little kids how did we accomplish stealing a cookie?  The cookie jar was too high so we had to find a chair.  So, to get a cookie we needed to find a chair, move it into place, and then grab a cookie.  There.  I’ve tricked you.  We’ve just written a program that simulates ruining our dinner.

“Sounds too easy, what’s the catch?”

Well, programming can be complicated; that’s why it’s called writing code.  But programming is no more complicated than a piece of art.  Programs are acts of creative expression, having structure and a sense of aesthetic just like a haiku, a sonata, or a painting.  Each line of code expresses an idea and draws from a variety of metaphors and idioms.  It’s like flying at the speed of thought.  This is what it feels like to program.  Pretty cool, huh?

The awesomest simple template – part deux

Ok. So recently I posted some code for a simple templating utility that I wrote. I’ve done some refactoring and have an even cooler version. This version better extends the behavior of string.Template and I’ve found it to be far more intuitive to use.  Without further ado:

import string

class SimpleTemplate(string.Template):
   """
   Takes a string and either a dict or any other
   object with the __dict__ attribute. Attributes passed
   into the constructor can be overridden by manually setting
   attribute values on the template. For example:
   tmplt = 'my ${name}'
   fcn = lambda: True
   fcn.name = 'original name'
   sTemplate = SimpleTemplate(tmplt, fcn)
   fcn.name = 'new name' # not propagated
   sTemplate.name = 'new name' # overrides fcn.name
   sTemplate.substitute()
   """
   def __init__(self, tmplt, dct={}):
     super(SimpleTemplate, self).__init__(tmplt)
     if hasattr(dct, '__dict__'):
       dct = dct.__dict__
     for name, value in dct.iteritems():
       setattr(self, name, value)

   def substitute(self):
     """Performs iterative substitution for nested templates"""
     dct = self.__dict__
     for k, v in dct.iteritems():
       if isinstance(v, SimpleTemplate):
         dct[k] = v.safe_substitute(**v.__dict__)
     return self.safe_substitute(**dct)

Enjoy!
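
Here is a rough Python 3 take on the same idea, mainly to show nested templates in use.  It is a sketch rather than a drop-in replacement for the class above: the parameter name source is made up, it skips Template’s own template attribute instead of passing it along, and it recurses into nested templates without overwriting them.

```python
import string


class SimpleTemplate(string.Template):
    """Template whose substitution values are stored as its own attributes."""

    def __init__(self, tmplt, source=None):
        super().__init__(tmplt)
        # Accept a dict, or any object that carries a __dict__.
        values = getattr(source, "__dict__", source) or {}
        for name, value in values.items():
            setattr(self, name, value)

    def substitute(self):
        """Substitute attributes, recursing into nested SimpleTemplates."""
        values = {}
        for name, value in vars(self).items():
            if name == "template":  # set by string.Template itself
                continue
            if isinstance(value, SimpleTemplate):
                value = value.substitute()
            values[name] = value
        # safe_substitute leaves unknown placeholders untouched.
        return self.safe_substitute(values)


inner = SimpleTemplate("Hello, ${name}!", {"name": "world"})
outer = SimpleTemplate("<p>${body}</p> ${missing}", {"body": inner})
print(outer.substitute())
```

Building a fresh dictionary on every call means a nested template is rendered again each time, so later changes to its attributes still show up in the outer result.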

How to easy_install PIL on OS-X

Ok, so one of my earliest articles detailed how to install the Python Imaging Library on OS-X Leopard.  Well, things have changed and the most awesome easy way has arrived.  It is now possible to easy_install PIL!!!  Use the following from the command line:

easy_install --find-links http://dist.repoze.org/PIL-1.1.6.tar.gz PIL

Weird easy_install option

Ok, so I’m working on customizing Martin Aspeli’s Uber-Buildout.  There’s been some confusion when trying to run the buildout on a Mac.  Specifically, my coworker Kevin discovered lxml as a missing dependency and, what do you know,

$ easy_install lxml

doesn’t work.  Lucky for us, other intrepid bloggers have found a solution to this problem:

$ STATIC_DEPS=true easy_install 'lxml>=2.2alpha1'

Big thanks to Ian Bicking and his blog.

The Simplest Template

Ready for the simplest templating utility ever?  I was trying to prepare some JSON data stored in a tree and needed some simple code to generate templated text.  Below is a wrapper for Python’s string.Template class:

import string

class SimpleTemplate(string.Template):
   """ Takes a string template and a tuple or list of identifier names
   """
   def __init__(self, tmplt, names):
     super(SimpleTemplate, self).__init__(tmplt)
     for name in names:
       setattr(self, name, '')

   def substitute(self):
     """ Does recursive substitution for nested template objects
     """
     d = self.__dict__
     for k, v in d.iteritems():
       if isinstance(v, SimpleTemplate):
         d[k] = v.substitute()
     return self.safe_substitute(**d)

SVN stupidity

Ok, I have the attention span of a three year old and a memory like a goldfish.  I’ve come to accept this.  Every day I go and make coffee and every day I forget that the kitchen light switch is on the left, not the right (er, maybe I have that backwards … I should check).  Likewise, every time I check out some source code from an SVN repository I mess up the path and end up with all the code living inside a folder called trunk.  Not the end of the world but just annoying enough to blog about.  As a reminder to me, here is how to do it:

svn checkout https: … /path/to/code/trunk local_directory

I always forget the space between trunk and local_directory.  Laugh if you must, it is kind of funny.

More on balanced ternary — implementing binary search trees

Hi all, so I’ve been busy as hell but that’s ok, there’s always time for some fun.  In my earlier posts I described how one could construct a tree using balanced ternary.  It’s implementation time!  Let’s begin by creating a really simple binary search tree.