Friday, May 30, 2014

devops: per-process swap usage

awk '/Name/{name=$2}; /VmSwap/{printf "%20s\t%s %s\n",name,$2,$3}END{ print ""}' /proc/*/status

inspired by: adnans

Output:
     chromium-browse 19664 kB
     chromium-browse 6128 kB
        avahi-daemon 320 kB
        avahi-daemon 220 kB
              colord 644 kB
                smbd 840 kB
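
The one-liner prints every process that has a VmSwap entry, including the many sitting at 0 kB. Here's a rough Python sketch of the same idea (assuming a Linux /proc; the filtering and sorting are my own additions) that skips the zero entries and lists the biggest swap users first:

#!/usr/bin/env python
# Sketch: per-process swap usage from /proc/*/status, largest first.
# Assumes Linux with VmSwap entries (kernel 2.6.34+); values are in kB.
import glob

usage = []
for path in glob.glob('/proc/[0-9]*/status'):
    try:
        fields = dict(line.split(':', 1) for line in open(path))
    except IOError:
        continue  # process exited while we were reading it
    swap = fields.get('VmSwap', '0 kB').strip()
    if swap != '0 kB':
        usage.append((int(swap.split()[0]), fields['Name'].strip()))

for kilobytes, name in sorted(usage, reverse=True):
    print('%20s\t%s kB' % (name, kilobytes))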

Thursday, May 29, 2014

ref: fio I/O performance/stress tool

"fio is an I/O tool meant to be used both for benchmark and stress/hardware verification. It has support for 19 different types of I/O engines"



fio – Freecode

Tuesday, May 27, 2014

StackOverflow: achievements unlocked


I'm now one of the StackOverflow "Top Python Answerers" of all time!  I'm also a Top Answerer for Multiprocessing.

strace to the rescue, or, program uses mysterious config defaults

Strace is a powerful tool that lets you debug other programs and discover, among other things, where they look for their config files.

Programs have bugs.  They're written for one platform and ported to another.  In any case sometimes things don't work as they should.

I manage this website using the rather nice FTP client lftp.  It can automatically "mirror" a local directory on my lappytop onto the web server. This means I don't manage the server directly, I only push files up, so if something happens I can switch to another server with very little hassle.

Today I noticed I couldn't store the password to the FTP account. The documented command bookmark edit brought up an editor, but it didn't have my saved bookmark!  On my system it's misconfigured.  Bookmarks work, as I can do open jta to connect to this site.

To find out where lftp is storing bookmarks, I used the wonderful tool strace.  It lets you run a subcommand, and monitor exactly what is going on from a system perspective.  It logs system calls.

In this case, I knew I wanted to scan for the open() system call, when the program opens up the bookmark file.  Here's the command I used to get lftp to cough up the information:

$ strace -o z -e trace=open lftp -c 'open jta'
Password:
cd: Login failed: 530 Login incorrect. 

The above command ran lftp, read the bookmark file, and tried to connect (open) to the JTA ("John Tells All") FTP bookmark.  I exited out because I just wanted to see the strace log to find the bookmark file.

I guessed the bookmark file would have the letters 'ook' in it:

$ egrep ook z
open("/home/johnm/.local/share/lftp/bookmarks", O_RDONLY) = 4
The above trace shows that the application stores bookmarks in ~/.local/share/lftp/bookmarks. The open() call returned a small positive number (a file descriptor), which means the file was found and opened correctly.

Success!  I can now edit this file and add my own bookmarks, including passwords.
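
If you find yourself repeating this run-then-grep dance, a tiny wrapper helps. Here's a hypothetical Python sketch (the lftp command and the 'ook' pattern are just examples; note that on newer systems the C library may issue openat() instead of open(), so you may need -e trace=open,openat):

#!/usr/bin/env python
# Hypothetical helper: run a command under strace and print the open()
# calls whose logged line matches a pattern.
import re, subprocess, tempfile

def traced_opens(command, pattern):
    with tempfile.NamedTemporaryFile(prefix='strace-') as log:
        # -f follows child processes; -e trace=open records only open() calls
        subprocess.call(['strace', '-f', '-o', log.name,
                         '-e', 'trace=open'] + command)
        return [line for line in open(log.name) if re.search(pattern, line)]

if __name__ == '__main__':
    # example invocation: trace lftp and look for bookmark-ish paths
    for line in traced_opens(['lftp', '-c', 'exit'], r'ook'):
        print(line.rstrip())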



Friday, May 23, 2014

code: search PyPI for all package versions


Sample code to query the global Python package repository, PyPI, directly (Stack Overflow).


#!/usr/bin/env python

'''
pversions.py -- search for package version from PyPi
'''
# adapted from pip.commands.SearchCommand

import sys, xmlrpclib

pnames = sys.argv[1:]
if not pnames:
    sys.exit('Usage: pversions (packagename)...')

pypi = xmlrpclib.ServerProxy('https://pypi.python.org/pypi')
for packagename in (pname.lower() for pname in pnames):
    print packagename,':'
    exact_hits = (
        hit for hit in pypi.search({'name': packagename})
        if hit['name'].lower() == packagename
    )
    print ', '.join( (hit['version'] for hit in exact_hits) )
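
On Python 3 the xmlrpclib module is named xmlrpc.client; here's a minimal sketch of the same query against the same PyPI XML-RPC endpoint as above:

#!/usr/bin/env python3
# Same idea on Python 3: xmlrpclib was renamed to xmlrpc.client.
import sys
import xmlrpc.client

pypi = xmlrpc.client.ServerProxy('https://pypi.python.org/pypi')
for packagename in (name.lower() for name in sys.argv[1:]):
    hits = pypi.search({'name': packagename})
    versions = [hit['version'] for hit in hits
                if hit['name'].lower() == packagename]
    print(packagename, ':', ', '.join(versions))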

Wednesday, May 21, 2014

code: sync files using FTP

My webhost doesn't support rsync, only FTP.  For grins I wrote a script to scan through interesting local files and push them up to my webhost.  Alas, the process wasn't as trivial as I'd hoped, so I abandoned the effort and used lftp's "mirror" command instead.

The code shows a number of best practices (a short sketch of the first and last points follows the link below):

  • don't store passwords/keys, pass them in via environment variables
  • make an iterator of files, add to it using chain()
  • Python3 subprocess.check_output() is a little odd
  • logging, like voting, should be done early and often
  • code ensures FTP connection always closed cleanly

https://github.com/shavenwarthog/johntellsall/blob/master/class/django-queryset/ftpsync.py
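
As a taste of the first and last points, here's a minimal sketch. The host, file name, and environment variable names are hypothetical; it reads credentials from the environment and guarantees the FTP connection is closed even if the upload fails:

#!/usr/bin/env python
# Hypothetical sketch: credentials come from the environment, never from
# the source code, and the FTP connection is always closed cleanly.
import os
from ftplib import FTP

def push_file(host, path):
    ftp = FTP(host)
    ftp.login(os.environ['FTP_USER'], os.environ['FTP_PASS'])
    try:
        with open(path, 'rb') as source:
            ftp.storbinary('STOR ' + os.path.basename(path), source)
    finally:
        ftp.quit()   # always runs, even if the transfer raised an error

if __name__ == '__main__':
    push_file('ftp.example.com', 'index.html')   # hypothetical host and file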


talk: Functional Programming and Django QuerySets



slides -- source on GitHub

My talk yesterday went well!  I warned a few people that their brains would explode, but in a good way.  It seemed everyone understood the material, and they had three different types of beer for jogging the little grey cells. Success!

For further reference, check the last slide for great stuff to read, or use these links:

Functional Programming HOWTO by Andy Kuchling -- clearly written, with tons of answers to "what?" and "why?", along with lots of Python examples

Can Your Programming Language Do This? by Joel Spolsky -- not Python specific

Wikipedia: Functional Programming -- clear, general use cases

Using Django querysets effectively by Dave Hall -- great, very useful for Django

Thanks to all who came out to learn new stuff. My hour talk was only a brief sketch, but it was a lot of fun to hang out and share knowledge. As always I learn from every single question people have. Thank you so much.

Friday, May 9, 2014

code: use Bash to control workflow of processes

On Stack Overflow, someone asked how to control a workflow of processes.  The issue is that sometimes the processes jam up and should be restarted.  The code linked below does this.

One technique to control an errant pipeline is to count the number of pipeline processes after a while. If they haven't finished, then zap them and restart.

In this example code we ask Bash for its process ID (the $$ variable), then use pgrep to find all sub-processes of the script. That is, pgrep tells us all the workflow process IDs. If there are any, we use the corresponding pkill command to squash them before a restart.

In addition, we use the date command to output a pretty log file, showing what the system is doing at what time.

full post on SO: How to send SIGKILLs to entire pipeline?

source: https://github.com/shavenwarthog/johntellsall/tree/master/karma/kill-pipeline
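
For comparison, here's a hypothetical Python sketch of the same watchdog idea. It is not the bash script from the post: instead of pgrep/pkill it puts the pipeline in its own process group, then kills the whole group if the deadline passes. The pipeline command is just a placeholder.

#!/usr/bin/env python
# Hypothetical Python analogue of the bash watchdog: start a pipeline,
# give it a deadline, and kill every stage if it jams.
import os, signal, subprocess, time

def run_with_deadline(pipeline, timeout):
    # start the pipeline in its own process group so we can zap every stage
    proc = subprocess.Popen(pipeline, shell=True, preexec_fn=os.setsid)
    deadline = time.time() + timeout
    while time.time() < deadline:
        if proc.poll() is not None:
            return proc.returncode        # finished on its own
        time.sleep(1)
    os.killpg(proc.pid, signal.SIGKILL)   # jammed: SIGKILL the whole group
    proc.wait()
    return None                           # caller may restart the pipeline

if __name__ == '__main__':
    run_with_deadline('producer | filter | consumer', timeout=30)  # placeholder pipeline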

Saturday, May 3, 2014

code: Multiprocessing producer/consumer with logging

Wordclouds (and multiple processes) are fun

Let's say you're writing a browser toy to display what people say in their important #beer tweets. On the server you want to scan Twitter for #beer and store the tweets. On occasion, a browser will fetch the list of words, then display the most recent related words from those tweets as a word cloud.

To reduce complexity you don't want to add any extra packages, which might be untested and/or sketchy. What do you do with a standard "batteries included" Python? You use multiprocessing!

The multiprocessing module lets you write programs as a system of connected processes. In this case, one is a producer: it does work, then pushes information to a list of tweets shared across the system. Another process is a consumer: it waits for data from the producer, then processes it for display in the browser as a pretty word cloud.

Server programming: log early and often

Without writing asynchronous code it's hard to do a lot of I/O in a single Python process. By splitting your project into multiple tasks, each with its own process, each task can run on a separate CPU in parallel. The multiprocessing module helps us start and stop processes, and communicate data back and forth.

In app programming, debugging is interactive; "print" statements are one way of testing the code.  On a server, this doesn't work as well.  It's best to have logging statements, and lots of them, to make sure the system works and to diagnose errors.  Each logging entry has a timestamp, a severity, and a message.  By reviewing ERROR and WARNING entries one can verify the system works, and diagnose it if it's behaving strangely, even if issues are rare.

It's better to have too much logging than not enough. Your operations people don't know your code.  If they see an overall system problem, it's much easier for them to sift out the irrelevant logging messages than to add more logging to a complex system.

Log early, log often -- you and your operations people will love you for it.


This post was inspired by Playing with REALTIME data, Python and D3 by Brett Dangerfield. His code actually scans Twitter and does the word cloud display.


If you're at all curious about Python, run, don't walk, to get the Python Cookbook by David Beazley and Brian K. Jones.  I've been programming in Python for 15 years and I learn new tools and techniques from every chapter!


In modern Python 3, take a look at the more graceful concurrent.futures module.
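
For a quick taste, here's a minimal sketch (a generic worker-pool example, not a port of the producer/consumer code below):

#!/usr/bin/env python3
# Minimal concurrent.futures sketch: run a function across a pool of
# worker processes and collect the results.
import concurrent.futures

def count_words(text):
    return len(text.split())

if __name__ == '__main__':
    tweets = ['#beer is great', 'more #beer please', 'hold the #beer']
    with concurrent.futures.ProcessPoolExecutor() as pool:
        for tweet, count in zip(tweets, pool.map(count_words, tweets)):
            print(count, tweet)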


Code: mptest_proxy



#!/usr/bin/env python

'''
mptest_proxy.py -- producer adds items to a fixed-size list; scanner reads them

OPTIONS:
-v  verbose multiprocessing output
'''

import logging, multiprocessing, sys, time


def producer(objlist):
    '''
    add an item to the list every second; keep the list a fixed size
    '''
    logger = multiprocessing.get_logger()
    logger.info('start')
    while True:
        try:
            time.sleep(1)
        except KeyboardInterrupt:
            return
        msg = 'ding: {:04d}'.format(int(time.time()) % 10000)
        logger.info('put: %s', msg)
        del objlist[0]
        objlist.append( msg )


def scanner(objlist):
    '''
    every few seconds, run a calculation on objlist (here, just log its contents)
    '''
    logger = multiprocessing.get_logger()
    logger.info('start')
    while True:
        try:
            time.sleep(5)
        except KeyboardInterrupt:
            return
        logger.info('items: %s', list(objlist))
            

def main():
    opt_verbose = '-v' in sys.argv[1:] 
    logger = multiprocessing.log_to_stderr(
            level=logging.DEBUG if opt_verbose else logging.INFO,
    )
    logger.info('setup')

    # create fixed-length list, shared between producer & consumer
    manager = multiprocessing.Manager()
    my_objlist = manager.list( # pylint: disable=E1101
        [None] * 10
    )

    multiprocessing.Process(
        target=producer,
        args=(my_objlist,),
        name='producer',
    ).start()

    multiprocessing.Process(
        target=scanner,
        args=(my_objlist,),
        name='scanner',
    ).start()

    logger.info('running forever')
    try:
        manager.join() # block here; the manager process runs until interrupted
    except KeyboardInterrupt:
        pass
    logger.info('done')
    

if __name__=='__main__':
    main()