Weblog
Paul Stevens - Case Study: Zero-Downtime Plone Hosting
Presentation during the Dutch Plone Users Day, 8 October 2012 in Musis Sacrum, Arnhem.
More info: Dutch Plone Users Day 2012.
Paul Stevens works at NFG.
For RIPE we built a Plone site. The requirement was that this website may never be down. Specifically: 99.99 percent uptime. The capacity to handle web traffic must be high enough that the website stays fast. The website should also not be temporarily unreachable because of maintenance.
In the old days a website was simply a few images and static HTML. That is easy to keep online. Plone is a complete application server. What you usually do is have a web server (Apache, nginx) that talks to the Plone application server. Scaling then becomes tricky, because Plone can handle fewer requests than a web server that only serves static content.
In any case you should use ZEO, so that you separate Plone from the database holding the content. You can put Varnish between the web server and Plone, so that Varnish can remember frequently requested pages, or parts that hardly ever change, and serve them without having to go through Plone. You can keep sliding layers in between like this. Plone remains the bottleneck. This is scaling up vertically.
You can also scale horizontally: you put several Plone instances next to each other. Then you can restart the first Plone with new code, then the second, and so on. We call those rolling upgrades. The user keeps seeing the website.
But: Varnish does not know that the first Plone is unavailable for a moment. So you add HAProxy. It keeps track of that and can help with sticky sessions.
Then you still have a single point of failure, one place where things can go wrong. In this case that is mainly the ZODB database. We started using MySQL with RelStorage to store the Plone database in. We put MySQL in a cluster, so there are several database servers. We did performance tests and, to our surprise, MySQL came out better than PostgreSQL.
Then you still have to make sure that not all requests end up at a single web server with its Varnish and HAProxy behind it. So you duplicate that setup. The HAProxies can reach all Plone instances through a cloud computing setup of RIPE itself. The two stacks run at different locations, in Amsterdam and Almere.
On top of that sits a load balancer, but that one is the responsibility of the client. We also have monitoring systems that keep an eye on whether everything is still running. You also have to arrange that someone is available to respond when there are problems, someone who can intervene 24 hours a day.
Since then we have never had to take the site down. There have been outages, but those were caused by hardware problems, such as hard disks and network connections.
We are talking about a million unique visitors per month here.
Plone had been considered at RIPE before, but an internal Java club had initially taken it on with a system of their own. That did not go well and in the end Plone was chosen after all.
Wietze Helmantel - Case Study: 13 new websites in one Plone environment
Presentation during the Dutch Plone Users Day, 8 October 2012 in Musis Sacrum, Arnhem.
More info: Dutch Plone Users Day 2012.
Wietze Helmantel works at GW20e. He gives a case study about setting up 13 large websites for Nuffic within one Plone environment, and everything that is involved in that.
The main site is http://nuffic.nl, with a lot of information about studying abroad and for foreign students in the Netherlands. The websites were in Plone 3 and had to move to Plone 4. It concerns 13 (sub)sites with shared content and a shared editorial team.
The wish was to keep the existing functionality and avoid a content migration. Multilingual support at the subsite level is needed. Extensive search functionality, using Apache Solr. SEO was needed to keep the current search engine ranking. Logging in with OpenID.
The design was done by Evident, Utrecht. External hosting by InterMaxx, Rotterdam.
We developed using the Scrum methodology. We used the tool ScrumDo for that, in which we kept track of user stories, planning poker and a burndown chart for progress. Documentation was done in Google Docs. We use Lighthouse for adaptive and corrective maintenance; that is an online tool for tracking issues. OTRS for support: requests that come in by e-mail or phone are registered there. From there issues can move on to Lighthouse.
The intention is to stay as close to Plone as possible, to how Plone works out of the box. We used Plone 4.1, the most recent stable version at the time. Two Nuffic modules for the content types and the theme. We had to adapt a number of Plone modules. We did this on a 'fork', a separate development branch whose functionality in this case will not end up in the standard module, because it is too specific to this customer.
We use a lot of standard Plone functionality: user management and the 'sharing tab', content rules, content types (still Archetypes), workflows, the security model.
We could have chosen PloneFormGen for forms, but they already had Formdesk, so we kept that. We use it via iframes. Furthermore MailPlus for newsletters.
Extra functionality was needed for various subsites. We could have done this in thirteen separate sites, but we did it in one site.
Eric Steele: Everything about Plone, past and future
Key note presentation during the Dutch Plone Users day, 8 October 2012 in Musis Sacrum, Arnhem, The Netherlands.
More info: Dutch Plone Users Day 2012 (in Dutch).
Eric Steele is the release manager of Plone 4.
We want to improve the user experience by simplifying development.
One of the pillars is Dexterity, a new content type system as alternative to Archetypes. You can create new types through the web, so that is easier for end users.
Deco is a new layout engine. It has been going through a lot of changes the last few years. It is very ambitious but we now have Deco Light, a more stripped down version. collective.cover has been created, which is basically Deco for a front page of your site. So it is an alternative to add-ons like Collage. With Deco you move tiles around. The current ones in collective.cover are very simple, but the idea is that later you can add more complex ones, like showing fields for the current context object. But the basics are there.
Theming has been Plone's biggest hurdle. We added Diazo in Plone 4.2, with integration in plone.app.theming. This makes it easier to make Plone not look like a Plone site, but like your designer intended it. The design should not interfere with the editing experience. The CMSUI effort helps with that.
In Plone 4.3 we are building a theme editor that you can use through the web. This makes it easier to theme a site, also for a designer who does not know Plone and does not have experience customizing templates from Plone.
Plone 4.3 is in development, expected release date February 2013. After that, we may go for 4.4 or 5.0.
We will have a new DateTime version, which saves some memory. Dexterity content types are more efficient than Archetypes. Since Plone 4.2 we support Python 2.7, which has a few memory improvements, too. For multilingual support for dexterity you can look at plone.app.multilingual. Some are using this in production already.
Python Users Netherlands
Dutch Python Users meeting hosted by Nelen & Schuurmans in Utrecht on 21 September 2012.
See http://wiki.python.org/moin/PUN/nens_sept_2012
How 2Style4You uses Pyramid (and more Python), Wichert Akkerman
Why do we use Pyramid in our project? We did not need a lot of standard functionality (so we did not choose Plone). We did not need pluggability (so we did not choose Zope). We had a complete design of our own, so we wanted to be able to customize things from the bottom up. We needed things fast, translatable, and built in Python.
We started with Pylons. We used that for about a year, but it was not maintained very much anymore at that point. We ported to repoze.bfg, with help from Chris McDonough. Took about two months to migrate the whole thing, which was about the time we estimated. This worked quite well actually.
repoze.bfg got merged into Pylons and became Pyramid. It was easy to migrate to that.
We have fashion portals. For each customer a website with its own design. There is standard functionality, like login, registration, figure analysis and a shop. Each site has extras, like a CMS, magazines, celebrities, faceted search, etc.
We have a standard site. For each client we create a wrapper around it, where we add different routes (say urls) and replace some css or parts of templates.
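Not from the talk, but a rough sketch of what per-client route registration in Pyramid looks like (the route name, view and port are made up for illustration):

    from wsgiref.simple_server import make_server
    from pyramid.config import Configurator
    from pyramid.response import Response

    def home(request):
        # Stand-in for a view from the shared standard site.
        return Response('Welcome')

    def make_app():
        # Hypothetical per-client wrapper: register the shared routes here,
        # then a client package could add or override routes and templates.
        config = Configurator()
        config.add_route('home', '/')
        config.add_view(home, route_name='home')
        return config.make_wsgi_app()

    if __name__ == '__main__':
        make_server('0.0.0.0', 8080, make_app()).serve_forever()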
The styling engine is a library to work with body measurements and clothes. Initially we implemented it in Python. We rewrote it in C++ with Boost.Python, which was an interesting experience, partly to be able to not give away the source code in plain text. (Audience: have you looked at bikeshed, which does something similar with C++? No, I don't think it existed yet.)
Internationalization and localization was painful. For us it also had to work in, for example, Chinese, so this was important. We needed translation of content, displaying numbers, times, dates, currencies, etc. Translating URLs was interesting, including unicode URLs. The design of a Chinese website is very different: much more is visible on the page.
zope.i18n and babel both have problems. Python's POSIX gettext support is not complete.
For data entry we created a central system for tagging of clothes. The design was stolen from the NuPlone (R) skin. It requires a modern browser, specifically we tell our clients they must use Chrome.
Quality assurance was important. Everyone makes mistakes. It was hard to get our developers in China to write tests. We use tests and reviews, to avoid errors. We gather errors and logs from sites to get information about what still goes wrong on live sites.
What have we learned?
- There is no good (perfect) library and it is probably not possible to create one. There are too many ways to use forms. We currently use WTForms, but just the schema part. Libraries only work well for a specific use case.
- Internationalization and localization is hard.
- Internationalization is very hard if you cannot read the language of the customer.
- It is important to follow the direction of the platform you are using.
- Pyramid has been great for us. It is very flexible, with a light weight core.
Slides: http://www.wiggy.net/presentations/2012/2Style4You%20en%20Python/
Disco - Guido Kollerie
Disco is a large scale data analysis platform. We had a project with a lot of data, which was not really doable on a single computer. Disco is an implementation of MapReduce, written in Erlang. You use Python to write jobs.
- Map: chop the work up into little bits and process each bit.
- Reduce: combine the results of those little bits.
Disco users start jobs in Python scripts. These send the jobs to a server that distributes them.
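A minimal sketch of what such a job script looks like, roughly following the word-count example from the Disco tutorial (the input URL is a placeholder, not from the talk):

    from disco.core import Job, result_iterator

    def fun_map(line, params):
        # Map: emit (word, 1) for every word in a line.
        for word in line.split():
            yield word, 1

    def fun_reduce(iter, params):
        # Reduce: sum the counts per word.
        from disco.util import kvgroup
        for word, counts in kvgroup(sorted(iter)):
            yield word, sum(counts)

    if __name__ == '__main__':
        job = Job().run(input=['http://example.com/some-text-file.txt'],
                        map=fun_map, reduce=fun_reduce)
        for word, count in result_iterator(job.wait(show=True)):
            print(word, count)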
For more info about disco, plus another summary of tonight's talks, see his blog: http://blog.kollerie.com/2012/09/22/python_users_netherlands_meeting/
(Really) naive data mining, Joël Cox
I'm not a data mining expert, but I'll show some extremely simple algorithms.
First clustering, with K-Means. Then classification: find the label for a thing.
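Not from the slides, but a really naive k-means in plain Python, just to show how little code the basic idea takes (the points and k are made up):

    import random

    def kmeans(points, k, iterations=10):
        # Pick k random points as the initial centroids.
        centroids = random.sample(points, k)
        for _ in range(iterations):
            # Assign every point to its nearest centroid.
            clusters = [[] for _ in range(k)]
            for x, y in points:
                distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
                clusters[distances.index(min(distances))].append((x, y))
            # Move each centroid to the mean of its cluster.
            centroids = [
                (sum(x for x, _ in cluster) / float(len(cluster)),
                 sum(y for _, y in cluster) / float(len(cluster)))
                if cluster else centroids[i]
                for i, cluster in enumerate(clusters)]
        return centroids

    print(kmeans([(1, 1), (1.5, 2), (0.5, 1.5), (8, 8), (9, 9)], k=2))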
Take away from this: venture outside your own field, apply your knowledge there, and bring what you learn back into your own field.
Slides: https://speakerdeck.com/u/joelcox/p/really-naive-data-mining
"Requests" library for easy json api access + testing dikes, Reinout van Rees
We had a project with sensors where we had to test dikes, to see when they would fail and get our feet wet.
We needed to get data out of an API, so get something from a server. You could use urllib or urllib2 for this, but requests is a much nicer Python library.
Example:
    import requests

    try:
        response = requests.get(self.source_url, timeout=...)
    except requests.exceptions.Timeout:
        ...
    if response.json is None:
        ...
See http://docs.python-requests.org/ and http://pypi.python.org/pypi/requests
MongoEngine+Relational+Privileges (on Pyramid!), Marcel van den Elst
We used Django previously, but we only used about ten percent of it. We now use about 90 percent of Pyramid.
Gantt charts do not work. They are static, someone up top comes up with them and they are outdated when they are distributed. So we want progressive planning.
MongoDB is really easy to install. We use that as a basis.
We started with Django, because I knew it. Plus Postgres, in 2009. The problem was that our use case did not fit Django. We had a RESTful API through Django-Tastypie. Wonderful, but it was very inefficient. It was not at all scalable without hard-coded SQL magic. There were no object level privileges, no offline working, etcetera. For example there was some code that saved some state just in case, but that killed our servers because the state consisted of lots and lots of relations.
Why adopt MongoDB? Everything is JSON, which is what we wanted. Deployment is a breeze. It is web-scalable and flexible out of the box. It has very active development and a growing community.
Why MongoEngine? It is small, transparent, active, mature, well-documented and readable, with responsive authors. It connects Python (Django?) and MongoDB.
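As an illustration (my own minimal sketch, not code from the talk) of what defining and saving documents with MongoEngine looks like:

    from mongoengine import Document, StringField, ReferenceField, connect

    connect('planning')  # local MongoDB; the database name is made up

    class Project(Document):
        name = StringField(required=True)

    class Task(Document):
        title = StringField(required=True)
        project = ReferenceField(Project)

    project = Project(name='Example project').save()
    Task(title='First task', project=project).save()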
mongoengine-relational features: it manages changes to relations, automatically updates the other side, memoizes document fields to monitor differences, and much more. As a bonus we needed DocumentCache. It transparently caches documents in a thread-safe, self-attaching DocumentCache on the request object. Really easy to set up. You define trigger functions like on_change_.
Why mongoengine-privileges? We needed document-level and field-level permissions. We could not find an existing mixin that correctly and transparently handled object-level privileges across relations.
Why TastyMongo? We already had a RESTful API (TastyPie) that worked like a charm in Django. You can use TastyMongo to talk with MongoDB in Pyramid.
Front-end: d3.js.
Slides: http://prezi.com/_hx6kdevlxab/mongoengine-relational-privileges-goodness/
Python for those little throwaway scripts (that you end up not throwing away), Tikitu de Jager
We end up writing a lot of throwaway bash scripts. After a few months you improve one. Then a colleague improves it some more. So it is not throwaway code after all. So for starters, put it in source code control.
Can it be in Python? Sure. You can create some basic stuff in house. We have something like this in all our scripts:
    from in_house.toolbox import script

    @script(name='defrobnicator')
    def main(script_utils):
        ...
You want to prevent reinventing the wheel.
Cool:
from sh import ifconfig, git, ls, wc
I have not used that yet actually.
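Usage looks roughly like this (my own untested sketch in the style the sh documentation shows; commands become plain Python functions and you pipe by nesting calls):

    from sh import ls, wc

    # Count the files in the current directory: ls -1 | wc -l
    print(wc(ls('-1'), '-l'))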
Also have a look at plumbum.
And:
from blessings import Terminal
See the speaker's own notes: http://www.logophile.org/blog/2012/09/25/a-less-than-lightning-talk/
Shell pearls (not to be confused with Perl shells), Remco Wendt - Maykin Media
    $ ^chmod^chown                    # Run the previous line (with typo) with the improved spelling
    $ sudo !!                         # Run the previous command as sudo
    $ mv !-1:1 !-2:1                  # Get stuff back from previous commands (careful when you used ``rm``)
    $ CTRL-x CTRL-e                   # Start your editor on the current line, save and run the command
    $ shopt -s globstar; ls **/*py    # List all python files recursively
Slides: http://www.slideshare.net/sshanx/shell-pearls-not-perl-shells-pun-21092012
Bash history cheat sheet: http://www.catonmat.net/download/bash-history-cheat-sheet.pdf
Vagrant - Reinout van Rees
Fabric is a library to do stuff on remote machines. I use Vagrant to easily create a VirtualBox virtual machine with Linux on my Mac, and a small fabric-like script to easily use that.
    cd ~/vm/my-vagrant-box/home/me/somewhere
    vc bin/test    # ssh to the vagrant box, go to this dir and run bin/test
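For comparison, a hypothetical plain Fabric 1.x fabfile doing roughly the same against a Vagrant box (host, port, key path and directory are guesses, not the speaker's actual vc setup):

    from fabric.api import cd, env, run

    env.hosts = ['vagrant@127.0.0.1:2222']      # default Vagrant ssh forwarding
    env.key_filename = '~/.vagrant.d/insecure_private_key'

    def test():
        # Run with "fab test": ssh into the box and run the tests there.
        with cd('/home/me/somewhere'):
            run('bin/test')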
Thanks to Nelen & Schuurmans for hosting the meeting.
Python web meetup Netherlands
PUN web meetup with presentations.
On Wednesday 4 July, there was a PUN web meetup, formerly known as Dutch Python Django Meeting.
Building Single-page web-applications with Django, Twisted and TxWebsocket
Talk by Jeroen van Veen (Goldmund, Wyldebeast & Wunderliebe, GWW).
Websockets give you a persistent TCP connection from a browser to the server. So: you don't need to do a request every time. The connection is there. You need to decide what to send over that connection. This does not need to be HTTP.
With javascript you can upgrade to a persistent connection. The protocol for the handshake used here is changing but getting more stable. Google for Hybi-16. Both utf-8 encoded data and binary data are supported. You have low latency: no headers need to be sent every time and the connection does not need to be set up for each request. So it has a very small footprint. The preferred way of serialization, for me at least, is with JSON.
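As a small taste of what sending JSON over such a persistent connection looks like from Python (not part of the talk; this uses the separate websocket-client package, and the URL and message are made up):

    import json
    import websocket  # pip install websocket-client

    ws = websocket.create_connection('ws://localhost:8000/ws/')
    ws.send(json.dumps({'action': 'subscribe', 'channel': 'news'}))
    print(json.loads(ws.recv()))
    ws.close()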
There is support for this in Python: TxWebsocket (my preferred library, based on Twisted), Autobahn WS, Gevent-websocket. Supported in browsers: Chrome, Firefox, Opera, Safari, IE 10. As browser fallback you could use Flash Websocket.
When building a web application, you have to decide whether to make it multi-page or single-page. Multi-page is of course what most websites are using. In a single-page approach, (almost) everything is done on one page. You bootstrap a page, and that sets up the websocket connection using the handshake. In this handshake the sessionid cookie from Django can be reused so you can authenticate.
I want to mimic Django URL routing on the server side. And on the client side too, really, using javascript. I use XRegexp for named group support, which helps here. When clicking on a link, you do not want to end up in a fresh page, but want to stay on the same page. But you can change the URL of the navigation bar anyway with javascript. It starts acting more like a multi-page website then, but can still remain single-page. This is usable with the HTML5 history API, better than for example adding hashes to the url. Using this, you can build a website where you can edit the same document with multiple persons simultaneously.
Django WS arranges the authentication during the handshake with the session cookie. It handles a client transport pool, event subscription and a websocket URL-routing protocol; Twisted MultiFile, runws, autoreload; BSD license. It is in development. Modules: core, misc, blog, wiki. It is used in a project now for narrowcasting: letting messages appear on screens, possibly for a longer period.
There are some challenges. Clean up the design and architecture. A usable pluggable reference CMS. Proper documentation. Community. SEO: a web crawler will not index much, as only the contents of the bootstrap page are indexed.
The Django WS Code is on github: https://github.com/phrearch/django-ws-core Find me on Twitter: @jeroen_van_veen.
Audience: look at Tornado, it does something similar.
Document automation (Office) using Django
Talk by Henk Vos: http://www.rapasso.eu
You want to generate for example a Word document by pushing a button on your website and have this document contain some information that comes out of your database. You could install MS Office on your server. But we have Linux, and it is not available there. You could use a different box. Or maybe Google Docs, or the Pyuno bridge to OpenOffice. There is also python-docx.
But really, such a Word document is just an archive with some xml files. So we can use the Django template engine. Put some normal {{Django variables}} literally in your Word document. Extract the important file from the archive, run the Django template engine over it, save it, zip it up again, and let your web server serve it in a request.
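A rough sketch of the idea (my own reconstruction, not the speaker's code; it assumes Django settings are configured and that the template variables live in word/document.xml, and the file names are made up):

    import zipfile
    from django.template import Context, Template

    def render_docx(template_path, output_path, context):
        # A .docx file is just a zip archive; run the Django template engine
        # over the main document XML and repack everything else unchanged.
        with zipfile.ZipFile(template_path) as source:
            with zipfile.ZipFile(output_path, 'w', zipfile.ZIP_DEFLATED) as target:
                for name in source.namelist():
                    data = source.read(name)
                    if name == 'word/document.xml':
                        rendered = Template(data.decode('utf-8')).render(Context(context))
                        data = rendered.encode('utf-8')
                    target.writestr(name, data)

    render_docx('letter_template.docx', 'letter_out.docx', {'name': 'Jane Doe'})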
DjangoCon Europe
Talk by Reinout van Rees (http://reinout.vanrees.org/)
This year's DjangoCon Europe was in Zürich, inside a stadium, a few weeks ago.
There were several talks about the real-time web. The tricky part is that you can end up with two parallel MVC stacks, one in the Django backend and one in the Javascript frontend. Putting a caching server in front to speed things up will not work with web sockets. The idea, at least, is to always use an API.
Databases. Not everything fits in one kind of database. Files are better handled on the file system. There were tips for Postgres. You could use a big box and store everything. The NoSQL support in Django core is not done yet.
A few talks about diversity. Involving women in the community. Hard discussion. Stereotypes everywhere.
Healthy web apps. dogslow: logs tracebacks of requests that are too slow.
CSS. http://gluecss.com for sprite compression. Preprocessors: hurray!
Security. Login should be done using SSL (so https). Try a restore of your backups some time, otherwise you might as well not make any backups, as you do not know if it actually helps you. django-admin-sso for giving easier access to admins for several servers.
Heroku. It's a 'cloud unix'. Focused apps, one thing. You can only use an API to communicate with the outside world. Plus documentation. They have 2-3 person teams. That was the ideal size for them. Quality comes from solid engineering, so you need time, not deadlines.
Various bits. You need to work with timezone-aware datetimes. Flask micro framework. Django core moved to github, which turned out very well.
See summaries on my website: http://reinout.vanrees.org/ There are videos online too. Check the Heroku talk and the keynote about internationalization.
Rethinking the Django-CMS landscape
Talk by Diederik van der Boor (Edoburu).
Why does everyone create their own CMS (in Django)?
Do one thing and do it well. Tight focus versus feature creep. Don't be afraid of multiple apps in your website, separate them out. Write for flexibility and distribution.
There are various Django CMSes. You've got Django CMS, FeinCMS, Fiber, Ella, Merengue, Philo and more. Big outside contenders are Wordpress (easy to setup, but plugin hell) and Umbraco (customized UI layouts, customised page node types).
For page contents and a page tree you can use the django-fluent-pages package. That might be the only CMS-like functionality you need in your application, instead of a full blown CMS. You can use plugins, like code highlighting, MarkDown, or create your own.
If you are going to create a blog you could do it as one application, but it should really be several separate applications. Do one thing and do it well: the django-fluent-comments package.
How about user authentication, social media auth, RSS feeds for comments, e-mail subscriptions, spam protection? Those should also be separate packages. Currently, most blog applications have all or some of these thrown together.
So mostly: separate your packages.
There is a balance to strike: a too monolithic one-package approach, versus lots of packages that need lots of glue code.
See the code at https://github.com/edoburu Find me on Twitter: @vdboor and @edoburu.
Audience: a danger is that your application might work today and is broken tomorrow when someone uploads a new release on PyPI that has backwards incompatible changes. You need to keep an eye on that. Freeze your dependencies in production.
TND Dataview and Metaclass Magic in Python
Talk by Maarten ter Horst from Top Notch Development.
Python metaclasses are ideal for frameworks and abstract classes, code that you want to reuse. A metaclass is a class that handles the creation of other classes, so an instance of a metaclass is a class. Normal classes inherit from object, but metaclasses inherit from type. A metaclass creates a new class based on your class definition. So basically you can hook into class creation and override some basic behaviour. You can for example check whether a newly created class conforms to an interface.
We use it to inspect the class definition and add a urls method that lists all other methods that return an HTTP object.
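A simplified sketch of the pattern (Python 2 syntax, as current at the time; the returns_response marker is made up, not their actual check, and it collects the names as a class attribute rather than a urls method, just to show the mechanism):

    class ViewMeta(type):
        # A metaclass sees the class dict before the class exists, so it can
        # collect all methods that are marked as returning a response.
        def __new__(mcs, name, bases, attrs):
            cls = super(ViewMeta, mcs).__new__(mcs, name, bases, attrs)
            cls.urls = [key for key, value in attrs.items()
                        if getattr(value, 'returns_response', False)]
            return cls

    def returns_response(func):
        func.returns_response = True
        return func

    class MyViews(object):
        __metaclass__ = ViewMeta

        @returns_response
        def index(self):
            return 'some HTTP response object'

    print(MyViews.urls)  # ['index']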
Google 'IBM Python metaclass' for a good tutorial on metaclass programming. The presentation will be available soon on the Top Notch Development site.