Archive

Posts Tagged ‘python’

Migrated to github pages

May 29, 2013 Comments off

I’ve migrated this blog to blog.ionelmc.ro using Pelican. It’s hosted on GitHub Pages (sources are in a GitHub repo).

 


Windows 8 doesn’t let you change the network type?

November 18, 2012 3 comments

In case you’ve been feeling adventurous like me and installed Windows 8, you probably also installed Hyper-V because, well, it’s free and it has much better integration with Windows. Yes, you can make your VM start when Windows starts with VMware too, but Hyper-V remembers whether it was running before shutdown. And shutdown actually works, unlike VMware, which does a hard poweroff if you didn’t reinstall vmware-tools after a kernel upgrade (at least with Ubuntu). Could I have gotten something wrong? I don’t think so … it works properly on Hyper-V.

Well anyway, the Network and Sharing Center in Windows 8 is quite weird: I could not change the network type (to private) for one virtual Hyper-V adapter, even though I’d enabled all the user controls with gpedit.msc. To fix it, it seems I had to use the Network List Manager API’s SetCategory. There is a PowerShell script that appears to do that but, damn it, I don’t want to edit the script every time I have the problem. And that PowerShell thing looks awful compared to Python …

Here’s a python script that does the right thing: it asks the user which networks should be made private.

import win32com.client
NETWORK_CATEGORIES = {
    1: "PRIVATE",
    0: "PUBLIC",
    2: "DOMAIN"
}
# this GUID is the CLSID of the Network List Manager COM object
m = win32com.client.Dispatch("{DCB00C01-570F-4A9B-8D69-199FDBA5723B}")
more = 1
pos = 1
connections = m.GetNetworkConnections()

while more:
    connection, more = connections.Next(pos)
    if connection:
        network = connection.GetNetwork()
        category = network.GetCategory()
        print '%s. "%s" is %s' % (pos, network.GetName(), NETWORK_CATEGORIES[category])
        if not category and raw_input("Make private [N]") in ['y', 'Y']:
            network.SetCategory(1)
    pos += 1

Now isn’t this pretty? (except the shitty iterator-wannabe API in GetNetworkConnections …)
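That enumerator pain can at least be hidden behind a small generator. This is a hypothetical helper (not part of any API), assuming only that, like GetNetworkConnections() above, the object’s Next() returns an (item, more) pair:

```python
def iter_connections(enum, pos=1):
    # Wrap a COM-style enumerator whose Next() returns (item, more)
    # into a plain Python generator.
    more = True
    while more:
        item, more = enum.Next(pos)
        if item:
            yield item
        pos += 1
```

Then the loop in the script becomes a plain `for connection in iter_connections(m.GetNetworkConnections()):`.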


Tweaks for making django admin faster

January 19, 2012 6 comments

Here are a number of tricks I’ve employed in the past to make the django admin faster.

Editable foreign keys in the changelist

If you have foreign keys in list_editable, django will make one database query for each item in the changelist. That’s quite a lot for a changelist of 100 items. The trick is to cache the choices for that formfield on the request:

class MyAdmin(admin.ModelAdmin):
    list_editable = 'myfield',
    def formfield_for_dbfield(self, db_field, **kwargs):
        request = kwargs['request']
        formfield = super(MyAdmin, self).formfield_for_dbfield(db_field, **kwargs)
        if db_field.name == 'myfield':
            myfield_choices_cache = getattr(request, 'myfield_choices_cache', None)
            if myfield_choices_cache is not None:
                formfield.choices = myfield_choices_cache
            else:
                request.myfield_choices_cache = formfield.choices
        return formfield

Foreign keys or many to many fields in admin inlines

If you have FK or M2M fields on an InlineModelAdmin, you’ll get a database hit for every object in the formset. You can avoid this with something like:

class MyAdmin(admin.TabularInline):
    fields = 'myfield',
    def formfield_for_dbfield(self, db_field, **kwargs):
        formfield = super(MyAdmin, self).formfield_for_dbfield(db_field, **kwargs)
        if db_field.name == 'myfield':
            # dirty trick so queryset is evaluated and cached in .choices
            formfield.choices = formfield.choices
        return formfield

Enable template caching

It’s amazing how easy it is to forget to add this in your settings:

TEMPLATE_LOADERS = (
    ('django.template.loaders.cached.Loader', (
        'django.template.loaders.filesystem.Loader',
        'django.template.loaders.app_directories.Loader',
    )),
)

Use select_related for the edit forms too

If you have some readonly fields on the edit form that need related data to display, list_select_related doesn’t help. Eg:

class MyAdmin(admin.ModelAdmin):
    readonly_fields = 'myfield',
    def queryset(self, request):
        return super(MyAdmin, self).queryset(request).select_related('myfield')

Use annotations for computed entries in list_display instead of making additional queries

Check the aggregation api to see if you can use this. Here’s the typical example:

from django.db.models import Count

class AuthorAdmin(admin.ModelAdmin):
    list_display = 'books_count',
    
    def books_count(self, obj):
        return obj.books_count
    
    def queryset(self, request):
        return super(AuthorAdmin, self).queryset(
            request).annotate(books_count=Count('books'))

The models would look like this:

class Author(models.Model):
    name = models.CharField(max_length=100)

class Book(models.Model):
    name = models.CharField(max_length=100)
    author = models.ForeignKey(Author, related_name="books")

Cache the filters from the admin changelist

This has the obvious tradeoff that you’ll have stale data in the list of filters, but if they don’t change that often and those distinct queries are killing your database, this will help a lot. Just add a custom change_list.html template to your project (eg: templates/admin/<myapp>/change_list.html):

{% extends "admin/change_list.html" %}
{% load admin_list i18n cache %}

{% block filters %}
    {% cache 300 admin_filters request.GET.items request.path request.user.username %}
        {% if cl.has_filters %}
          <div id="changelist-filter">
            <h2>{% trans 'Filter' %}</h2>
            {% for spec in cl.filter_specs %}{% admin_list_filter cl spec %}{% endfor %}
          </div>
        {% endif %}
    {% endcache %}
{% endblock %}

Bonus trick

import sys

frame = sys._getframe(1)
while frame:
    if frame.f_code.co_name == 'render_change_form':
        if 'request' in frame.f_locals:
            request = frame.f_locals['request']
            break
    frame = frame.f_back
else:
    raise RuntimeError("Could not find request object.")

# do stuff with request

This could be used in some specific cases (eg: you need the request in a widget’s render method), as a last resort of course ;)
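Here’s the trick in miniature; the function names below are made up just to mimic the admin’s call chain:

```python
import sys

def find_request():
    # Walk up the call stack until we hit a frame named render_change_form
    # that has a local variable named request.
    frame = sys._getframe(1)
    while frame:
        if frame.f_code.co_name == 'render_change_form' and 'request' in frame.f_locals:
            return frame.f_locals['request']
        frame = frame.f_back
    raise RuntimeError("Could not find request object.")

def widget_render():
    # pretend this is your widget's render() method
    return find_request()

def render_change_form():
    request = {'user': 'admin'}  # stand-in for the real HttpRequest
    return widget_render()
```

Calling render_change_form() hands the fake request all the way down to widget_render() without passing it explicitly; calling widget_render() directly raises the RuntimeError.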

What did you do to make the django admin faster?

Drop-in celery AbortableTask replacement

October 24, 2011 1 comment

If you need to report progress updates from your tasks (i.e. you call update_state in the task) you cannot use the bundled AbortableTask from celery.contrib.abortable, because it relies on status updates too. That means you’ll get race conditions if you do that.

You can use revokes to abort tasks, but they don’t give you enough control and it’s not guaranteed that your tasks will stop gracefully (or stop at all). Revokes can raise SoftTimeLimitExceeded if enabled (via the TERM signal), but then it might be tricky to perform cleanup: if you’re inside a call to a C extension, the exception gets delayed until the call returns. See the signal module docs for what happens when you raise an exception from a signal handler (that’s what celery does).

Given this, an alternative is to store the aborted task ids in a redis set. If you use the redis broker you can use this drop-in replacement:

from contextlib import contextmanager
import celery
from celery.task.base import Task
from celery.result import AsyncResult

from django.conf import settings

assert settings.BROKER_TRANSPORT == 'redis', "AbortableTask can only work with a 'redis' BROKER_TRANSPORT"
REDIS_KEY = getattr(settings, 'ABORTABLE_REDIS_KEY', 'task-aborts')

@contextmanager
def client_from_pool():
    connection = celery.current_app.pool.acquire()
    try:
        yield connection.default_channel.client
    finally:
        connection.release()

class AbortableAsyncResult(AsyncResult):

    def is_aborted(self):
        with client_from_pool() as client:
            return client.sismember(REDIS_KEY, self.task_id)

    def abort(self):
        with client_from_pool() as client:
            client.sadd(REDIS_KEY, self.task_id)

class AbortableTask(Task):

    @classmethod
    def AsyncResult(cls, task_id):
        return AbortableAsyncResult(task_id, backend=cls.backend,
                                             task_name=cls.name)

    def is_aborted(self, **kwargs):
        task_id = kwargs.get('task_id', self.request.id)
        with client_from_pool() as client:
            return client.sismember(REDIS_KEY, task_id)

    def cleanup(self, **kwargs):
        task_id = kwargs.get('task_id', self.request.id)
        with client_from_pool() as client:
            client.srem(REDIS_KEY, task_id)

    def after_return(self, status, retval, task_id, args, kwargs, einfo):
        self.cleanup(task_id=task_id)

This will use the broker’s connection pool if enabled (you should enable it, just set BROKER_POOL_LIMIT).
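The task body then just needs to poll is_aborted periodically and bail out. Here’s a minimal sketch of that pattern, with a plain Python set standing in for the redis set (all names here are illustrative, not part of the class above):

```python
aborted_ids = set()  # stand-in for the redis set stored under REDIS_KEY

def abort(task_id):
    aborted_ids.add(task_id)        # client.sadd(REDIS_KEY, task_id)

def is_aborted(task_id):
    return task_id in aborted_ids   # client.sismember(REDIS_KEY, task_id)

def long_running_task(task_id, work_items):
    done = []
    for item in work_items:
        if is_aborted(task_id):
            break  # stop gracefully; after_return-style cleanup can run here
        done.append(item * 2)
    return done
```

The real thing checks self.is_aborted() inside the task's run() and lets after_return clean the id out of the set.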


Django pro tip: if you only use the admin

October 11, 2011 5 comments

If you have a project that only exposes the admin you should just use the 500/404 templates from the admin.

Put this in your project’s urls.py:

from django.utils.functional import curry
from django.views.defaults import server_error, page_not_found

handler500 = curry(server_error, template_name='admin/500.html')
handler404 = curry(page_not_found, template_name='admin/404.html')

I wonder why django doesn’t mention those templates in the docs …

If you have other drop-in apps that need authentication (like rosetta or sentry), bear in mind that the admin doesn’t have a reusable login view, so you must hook up your own. You can just reuse the django admin’s login template. Put this in the urlpatterns (don’t forget to match it to LOGIN_URL in the settings):

    url(r'^accounts/login/$', 'django.contrib.auth.views.login', {'template_name': 'admin/login.html'}),

You might note that this is not very DRY, but the LOGIN_URL can actually differ from the one in the urlpatterns (eg: if you mount the django wsgi handler on a non-root path).


Tmux scripting

September 25, 2011 5 comments

I usually need to run more than one command for a project, and I got tired of searching through those putty windows for the session I want. So I thought of using a terminal multiplexer like tmux.

I’m using celery with two queues and I need to run this:

  • manage.py celeryd -Q queueA
  • manage.py celeryd -Q queueB
  • manage.py celerycam -E

I need celerycam because it keeps those stats in djcelery up to date.

It’s also a good idea to tail the postgresql log: when you break your models or database, Django isn’t always very helpful, so this helps a lot:

  • tail -f /var/log/postgresql/postgresql-8.4-main.log

I use a wide screen so I want a layout like this:

    +------------------------------------+-------------------+
    |                                    |                   |
    |              runserver             |                   |
    |                                    |     celerycam     |
    +------------------------------------+                   |
    |                                    |                   |
    |               celeryd              +-------------------+
    |                                    |                   |
    +------------------------------------+                   |
    |                                    |   postgresql log  |
    |               celeryd              |                   |
    |                                    |                   |
    +------------------------------------+-------------------+

I also want to start a new tmux session from the same command so I can close everything easily – those celeryd’s don’t reload automatically :(

You’d usually run something like:

tmux new-session "tmux splitw 'command1'; tmux splitw 'command2'; tmux splitw 'command3'; command4"

but that gets rather long: you need to quote and escape, calculate the pane sizes manually (I want equal heights), and for the layout above you also need to select the right panes before splitting.

The commands vary across projects (some have more, some have fewer), so how about we make a script:

import subprocess

left_commands = [
    "python manage.py runserver",
    "python manage.py celeryd -Q queueA -c 2 -E -n worker1",
    "python manage.py celeryd -Q queueB -c 2 -E -n worker2",
]
right_commands = [
    "python manage.py celerycam",
    "tail -f /var/log/postgresql/postgresql-8.4-main.log",
]
session = ''

if right_commands:
    session += 'tmux selectp -t 0; tmux splitw -hd -p 35 \"%s\"; ' % right_commands[-1]
for index, command in enumerate(right_commands[:-1]):
    session += 'tmux selectp -t 1; tmux splitw -d -p %i \"%s\"; ' % (
        100 / (len(right_commands) - index),
        command
    )

for index, command in enumerate(left_commands[1:]):
    session += 'tmux selectp -t 0; tmux splitw -d -p %i \"%s\"; ' % (
        100 / (len(left_commands) - index),
        command
    )
if left_commands:
    session += left_commands[0]

args = [
    'tmux',
    'new-session',
    session,
]
print 'Running ', args
subprocess.call(args)
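For what it’s worth, the string building above can be factored into a reusable function. This is a rough Python 3 sketch of the same logic; note the explicit integer division, since the original ran on Python 2 where / on ints truncates:

```python
def build_session(left_commands, right_commands):
    # Build the tmux command string: split off a 35%-wide right column,
    # then stack the remaining commands in equal-height panes on each side.
    session = ''
    if right_commands:
        session += 'tmux selectp -t 0; tmux splitw -hd -p 35 "%s"; ' % right_commands[-1]
    for index, command in enumerate(right_commands[:-1]):
        session += 'tmux selectp -t 1; tmux splitw -d -p %i "%s"; ' % (
            100 // (len(right_commands) - index), command)
    for index, command in enumerate(left_commands[1:]):
        session += 'tmux selectp -t 0; tmux splitw -d -p %i "%s"; ' % (
            100 // (len(left_commands) - index), command)
    if left_commands:
        session += left_commands[0]
    return session
```

The first left command runs in the initial pane, so it goes last, unquoted, as the new-session command itself.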

Measure your code

July 3, 2011 9 comments

I found a cool tool today to measure code: metrics. It measures SLOC, comments and cyclomatic complexity. It’s easy to install: pip install pygments metrics

Run this in your project’s root:

metrics -v `find . -type f \( -iname "*.css" -or -iname "*.py" -or -iname "*.js" -or -iname "*.html" -or -iname "*.txt" \) \! -path "*/migrations/*.py" -print`

I have a django project, so I added \! -path "*/migrations/*.py" to skip any files in a migrations dir (I want to skip the automatically generated south migrations).

You probably bundle other libraries or apps in your source tree (eg: jquery, or that nice django app whose author didn’t bother to make a setup.py script), so you’ll want to measure only some specific paths. Eg, to collect stats only for files in src/foobar, lib/tools and src/otherbar:

metrics -v `find src/foobar lib/tools src/otherbar -type f \( -iname "*.css" -or -iname "*.py" -or -iname "*.js" -or -iname "*.html" -or -iname "*.txt" \) \! -path "*/migrations/*.py" -print`

If you work on multiple projects you can make a script or alias for this:

metrics -v `find \`cat METRICS\` -type f \( -iname "*.css" -or -iname "*.py" -or -iname "*.js" -or -iname "*.html" -or -iname "*.txt" \) \! -path "*/migrations/*.py" -print`

And in each project just save a METRICS file with the list of paths.

I get something like this for one random project:

Metrics Summary:
Files                       Language        SLOC Comment McCabe
----- ------------------------------ ----------- ------- ------
  129                         Python        4831     289    261
    2                      Text only           0       0      0
   49              HTML+Django/Jinja        1381      19    166
    7                     JavaScript        2204     231    352
   21                            CSS        1839     111      0
----- ------------------------------ ----------- ------- ------
  208                          Total       10255     650    779

Don’t the McCabe (aka cyclomatic complexity) ratios (McCabe/(SLOC-Comment)) look odd? (0.17 for javascript and 0.05 for python)
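Sanity-checking those ratios against the numbers in the table:

```python
# McCabe / (SLOC - Comment), straight from the summary table above
js_ratio = 352 / (2204 - 231)   # JavaScript: ~0.178
py_ratio = 261 / (4831 - 289)   # Python: ~0.057
```

So javascript carries roughly three times the branching density of the python code in this project.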

Do you know of other measurement tools adequate for Django projects? Do tell me and, if possible, how to run them on specific files like above (I’m lazy :)

Custom app names in the django admin

June 24, 2011 19 comments

EDIT: This approach is flawed: it will never work on the app_index page (/admin/appname/) and it can cause problems with contenttypes queries. You are better off overriding the admin templates (lots of them, unfortunately). To avoid hardcoding the app_label in the templates, wrap it in trans tags and use the internationalization framework to map the internal name to your desired display name.

EDIT 2: You can also use this.

Suppose you have a model like this:

class Stuff(models.Model):
    class Meta:
        verbose_name = u'The stuff'
        verbose_name_plural = u'The bunch of stuff'

    ...

You have verbose_name, but you want to customise app_label too, for a different display in the admin. Unfortunately, setting it to some arbitrary string (with spaces) doesn’t work, and it’s not meant for display anyway.

It turns out the admin uses app_label.title() for display, so we can make a little hack: a str subclass with an overridden title method:

class string_with_title(str):
    def __new__(cls, value, title):
        instance = str.__new__(cls, value)
        instance._title = title
        return instance

    def title(self):
        return self._title

    __copy__ = lambda self: self
    __deepcopy__ = lambda self, memodict: self

Now we can have the model like this:

class Stuff(models.Model):
    class Meta:
        app_label = string_with_title("stuffapp", "The stuff box")
        # 'stuffapp' is the name of the django app
        verbose_name = 'The stuff'
        verbose_name_plural = 'The bunch of stuff'

    ...

and the admin will show “The stuff box” as the app name.
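A quick check of what the subclass actually does (class repeated here so the snippet stands alone):

```python
class string_with_title(str):  # same class as above
    def __new__(cls, value, title):
        instance = str.__new__(cls, value)
        instance._title = title
        return instance

    def title(self):
        return self._title

    __copy__ = lambda self: self
    __deepcopy__ = lambda self, memodict: self

label = string_with_title("stuffapp", "The stuff box")
# still compares equal to the plain app label, so lookups keep working ...
assert label == "stuffapp"
# ... but title(), which the admin calls for display, returns the custom name
assert label.title() == "The stuff box"
```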


Stale formsets and the back button

March 17, 2011 2 comments

I have a problem on one of my projects:

I have a page with a form that depends on what is stored in the database. If you’re using django formsets, or have some form that saves across multiple objects, you have this problem too.

The user saves the form and the data gets saved. However, if the user uses the back button he will get a page with the old form (which expects different data in the database). If that form gets resubmitted, all kinds of problems can appear.

I had one case where you could get a ValueError: invalid literal for int() with base 10: '' if you resubmitted a formset but instead of an existing object you had a new one. Easy for a regular user to pull off if he has multiple tabs open.

The best solution, I think, is to make the page reload when the user goes back in history. Turns out this is easy to pull off with some http headers:

Cache-Control: no-cache, must-revalidate, no-store
Pragma: no-cache

The “no-store” option is what actually makes browsers re-request the page when using the back button. Also, I’ve seen people adding “post-check=0, pre-check=0” to Cache-Control. Do NOT use those. They are Microsoft extensions to the http protocol, and if set they will actually make Internet Explorer request the page two times! See this.

Here’s a simple view decorator if you’re using django:

from functools import wraps
def must_revalidate(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        response = func(*args, **kwargs)
        response["Cache-Control"] = "no-cache, must-revalidate, no-store"
        response["Pragma"] = "no-cache"
        return response
    return wrapper
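To illustrate, here is the decorator applied to a stub view, with a plain dict standing in for django’s HttpResponse (both support item assignment for setting headers):

```python
from functools import wraps

def must_revalidate(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        response = func(*args, **kwargs)
        response["Cache-Control"] = "no-cache, must-revalidate, no-store"
        response["Pragma"] = "no-cache"
        return response
    return wrapper

@must_revalidate
def my_view(request):
    return {}  # stand-in for HttpResponse(...)

response = my_view(None)
# response now carries both anti-caching headers,
# and wraps kept the view's original name for url resolution etc.
```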

Having no-store creates some additional load on the server and causes other user experience problems:

  • Users won’t have the old form data when they go back
  • Page scroll position isn’t kept
  • If the page has other state it isn’t kept – but depending on the browser that state might not be cached anyway (eg: dom changes)

But I think it’s still better than surprising the user with different results for a submission of the same form data, or having to consistently deal with missing or extra data in the database on the server side. What do you think?


Wondering about django orm caching frameworks

August 25, 2010 2 comments

So briefly looking over the code reveals that:

  • johnny-cache caches the rows returned by the execution machinery in django’s sql compiler (it monkey-patches the compilers). It has fancy-pants invalidation (basically bulk invalidation through a 2-tiered cache key scheme, unlike cache-machine which relies on set_many) and even support for transactions. I’m using this and it’s awesome.
  • django-cache-machine caches the result of the QuerySet.iterator method. It has some limitations: it only (automatically) invalidates on forward relations (FKs), so you have to perform careful invalidation in your own code (eg: when you use qs.update(), run queries through models without the custom CachingManager, use Model.create() and whatnot …). Also, cache-machine is heavy on the memcached traffic (one call for every invalidated object, using set_many though …)
  • django-cachebot caches the rows at the same level as cache-machine (at the QuerySet.iterator call). It also has a very nice feature that prefetches objects from reverse relations (like FK reverse descriptors and many-to-many relations, eg: Group.objects.select_reverse('user_set'), after which group.user_set_cache equals group.user_set.all()). Unfortunately the author only tested it on django 1.1 and it needs a django patch to work (the manager patch is only for 1.1). I really like that select_reverse feature; unfortunately I can’t use it on django 1.2 :(

So I’m thinking what I need is johnny-cache for the low-level stuff, plus something to handle that “select_reverse” feng-shui that I would otherwise have to do myself.

Well, I’m probably missing something here and some other people should have better comparisons for these frameworks. Any feedback ?

Edit: cachebot’s select_reverse is based on django-selectreverse, but enhanced to support nested reverse relations (eg, from the docs: Article.objects.select_reverse('book_set','book_set__publisher_set'))
