Prakhar Joshi: Transforming Safe Html under GSOC'15

published Oct 15, 2015

Talk by Prakhar Joshi at the Plone Conference 2015 in Bucharest.

I came to know Plone during my Google Summer of Code this year. I never used Plone before, but I started loving it.

My project was about the safe HTML transform: reimplementing it with lxml.

Plone is one of the most frequent participants in the Google Summer of Code, which I found a plus. The people on IRC were really helpful during the initial phase. You really helped me a lot! And there was awesome documentation. If you start at the first page and read on and on, it will be a nightmare, but if you know what you are looking for, it is awesome.

I came across this ticket: https://dev.plone.org/ticket/14929. It was proposed by Tom Gross, who had tried this earlier but ran into many test failures. plone.transform was an earlier attempt as well, during GSOC'07. We had discussions about moving from the way CMFDefault did it to using lxml. We thought about the intelligenttext transform too.

I had difficulties working with Plone. It is not easy for anyone who is new. The code base is vast. The first two months it felt very alien, and it was difficult for me to figure out where to start. The safe_html transform of portal_transforms is quite old, and I was not used to the libraries it relies on, but I did need to know what this code was actually doing.

So lots of difficulties, but also lots of fun. I learnt how to work in a team, often asking 'can you help me?' I learnt new things about Plone, and how fun it is. I learnt how to write efficient code, and how to document what I did: PEP 8, indentation, etcetera, producing good code. I had never tried test-driven development before. This is one of the best parts of Plone. I got to interact with cool people. They taught me a lot, including some new beers of Europe.

Main goal: get rid of the transform we were using from CMFDefault.

The new filter. Setting up the add-on required lots of things: GenericSetup profiles, a browser view, a control panel, an interface, and an uninstall profile. Registration and unregistration of the transform happen automatically when the add-on is installed or uninstalled. The package is experimental.safe_html_transform.

Code: https://github.com/collective/experimental.safe_html_transform

New releases of Plone came out, and suddenly tests started failing. plone.app.widgets was not pinned, so its latest version got pulled in, and that version required a newer CMFPlone than we were using at that point, giving a version conflict during buildout.

We created a new control panel, with its own permission.

The new transform uses lxml. It converts the document into a tree and visits each node of that tree. It checks whether the tags and attributes are safe, and strips or removes them if not. lxml is faster than the previous solution from CMFDefault.
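To make the tree walk concrete, here is a minimal sketch of the idea using only lxml.html. It is not the code from experimental.safe_html_transform: the function name scrub_html and the three whitelists are hypothetical, picked only for illustration.

```python
import lxml.html

# Hypothetical whitelists, for illustration only; the real add-on
# takes its settings from the control panel.
VALID_TAGS = {"p", "a", "b", "i", "em", "strong", "ul", "ol", "li", "br", "span", "div"}
NASTY_TAGS = {"script", "style", "object", "embed", "iframe"}
SAFE_ATTRS = {"href", "title", "class", "id"}


def scrub_html(markup):
    """Return markup with unsafe tags and attributes removed."""
    # Wrap the fragment in a <div> so there is always a single root node.
    root = lxml.html.fragment_fromstring(markup, create_parent="div")

    # Take a snapshot of the nodes first, because the tree is mutated below.
    for node in list(root.iterdescendants()):
        if not isinstance(node.tag, str):
            # Comments and processing instructions: drop them entirely.
            node.drop_tree()
        elif node.tag in NASTY_TAGS:
            # Nasty tags: remove the element together with its content.
            node.drop_tree()
        elif node.tag not in VALID_TAGS:
            # Unknown tags: remove the tag but keep its text and children.
            node.drop_tag()
        else:
            # Valid tags: keep them, but strip attributes that are not safe.
            for name in list(node.attrib):
                if name not in SAFE_ATTRS:
                    del node.attrib[name]
            href = node.get("href", "").strip().lower()
            if href.startswith("javascript:"):
                del node.attrib["href"]

    # Serialise the content of the wrapper <div>, without the wrapper itself.
    parts = [root.text or ""]
    parts.extend(lxml.html.tostring(child, encoding="unicode") for child in root)
    return "".join(parts)


print(scrub_html('<p onclick="x()">Hi <script>alert(1)</script><blink>there</blink></p>'))
```

The sketch keeps the two kinds of removal separate: drop_tree throws away an element with everything inside it, while drop_tag only removes the tag and leaves its text in place.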

We used unit tests with sample HTML to test the transform, and automated Robot Framework tests for the control panel.

Now TinyMCE should use our new transform. It uses getToolByName to get portal_transforms and then looks up the registered transform by name: it wants safe_html. So we renamed our transform to safe_html.
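Why the rename is enough can be shown with a toy model. The registry class and the two transform functions below are hypothetical stand-ins, not Plone's portal_transforms API. They only illustrate that a caller which asks for a transform by name gets whichever implementation currently holds that name.

```python
class TransformRegistry:
    """Toy stand-in for a name-keyed transform registry (not Plone's code)."""

    def __init__(self):
        self._transforms = {}

    def register(self, name, func):
        self._transforms[name] = func

    def unregister(self, name):
        self._transforms.pop(name, None)

    def convert(self, name, data):
        # Callers only know the name, so whoever holds the name is used.
        return self._transforms[name](data)


def old_safe_html(data):
    # Placeholder for the old CMFDefault-based filter.
    return "old:" + data


def lxml_safe_html(data):
    # Placeholder for the new lxml-based filter.
    return "lxml:" + data


def editor_filter(registry, data):
    # The editor side is unchanged: it always asks for "safe_html".
    return registry.convert("safe_html", data)


registry = TransformRegistry()
registry.register("safe_html", old_safe_html)

# Installing the add-on: swap the implementation behind the same name.
registry.unregister("safe_html")
registry.register("safe_html", lxml_safe_html)

print(editor_filter(registry, "<p>x</p>"))
```

Taking over the existing name means the calling code does not have to change.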

One thing left: integration of control panel and script.

I learned test-driven development, how to collaborate with people on a big project, how to understand errors from the error logs, and how to keep notes on a huge project (blogging, task managers, etc), because you will forget what you did five months ago.

Plans: I started loving Plone, so I look forward to learning more about it. First I will finish this project.

Most people in India do not use Plone. In my university no one used it. That is a barrier for picking up such a project from GSOC. More awareness is needed, so that people are not stuck with WordPress and do not still have to learn the basics of Plone when they start.

I had never taken a Plone training before the one I did this week. It would have helped to do one earlier.

I will be here for one day of the sprints. I would like to work with you.