Weblog

published Nov 03, 2021, last modified Nov 04, 2021

Panel: The Future of Search in Plone, 2023 Edition

published Oct 05, 2023

Panel discussion at Plone conference 2023, Eibar, Basque Country.

Panelists are: Sally Kleinfeldt, Tiberiu Ichim, Timo Stollenwerk, Eric Steele, Eric Bréhault, Rikupekka Oksanen and Érico Andrei.

This panel provides a brief history and modern examples of Plone search, followed by a discussion of what improvements are needed, from both a marketing and a technical perspective. We discussed this topic at the 2011 conference and it will be interesting to see how our opinions have changed. The panel consists of people who have recently been active in Plone search advances.

Back in the 2000s it was: "Wow, a CMS with built-in search!" In the 2010s: "Wow, open source search engines are becoming really good." In the 2020s: "Wow, we really need better search solutions on larger sites."
In 2011 we mentioned that for the navigation we need immediate update: a new item should be visible in the navigation immediately. But for search it is fine to have it a bit later. Solr/Elasticsearch have more features than the ZCatalog, there are armies of engineers behind them. We had collective.solr versus alm.solrindex. It felt like a good idea to ship with Solr/Elasticsearch integration, but not require it.
Do we need an easy Plone + Solr/Elasticsearch install? Do we need to choose between these two?

Timo: we use Solr on a regular basis, for most clients. For collective.solr we had a buildout solution which was supposed to make it easier, but it was adding an extra layer of indirection: it is better to rely on the Solr documentation. There should be a good default, and we can have a search control panel, but will need to learn about Solr to really configure it.

Rikupekka: we run small and large sites at the university. For small sites the standard Plone search is fine. For larger sites we use Solr. One problem with 50,000 documents is when hundreds have a title "Research". Would be nice to have a warning message then: "We already have this many documents with the same title, please be more specific."

Eric Steele: Would be good if we market this correctly.

Tiberiu: At EEA we use Elasticsearch. Lately alternative vector-based solutions have started popping up. Currently we simply fetch the HTML of the page, just like Google Search does.

Guido: At Quaive we use Solr, so much better than Catalog. Tuning it to give more weight to some fields should be an easy way to improve the results.
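As a rough illustration of what "more weight to some fields" can look like: Solr's eDisMax query parser accepts a `qf` parameter listing the fields to search, each with an optional `^boost`. The sketch below only builds the request parameters. The field names and weights are assumptions chosen for illustration, not Quaive's actual configuration.

```python
from urllib.parse import urlencode


def build_boosted_query(text, boosts):
    """Build Solr eDisMax query parameters that weight fields differently."""
    # qf ("query fields") lists the searched fields, each with a ^boost factor.
    qf = " ".join(f"{field}^{weight}" for field, weight in boosts.items())
    return {"q": text, "defType": "edismax", "qf": qf}


# Hypothetical field names and weights, chosen only for illustration.
params = build_boosted_query(
    "annual report", {"Title": 5, "Description": 2, "SearchableText": 1}
)
query_string = urlencode(params)
print(query_string)
```

With these weights a match in the title counts for more than a match in the body text. Finding good values is a matter of experimenting with real queries.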

Erico: We could get rid of ZCatalog in all Plone instances. If navigation works in one way, and search in a different way, it is going to be a nightmare to debug. If we have money, we should hire Nuclia.

Eric Brehault: It needs to be opinionated, configured correctly.

Sally: Solr is open source, Elasticsearch is not.

Timo: I don't care about Solr versus Elasticsearch, we can make any decision there. Integration is important: might be that collective.elasticsearch is doing some things smarter than collective.search.

Guido: If you use an external service, you should remove the SearchableText index from the ZCatalog. And you need to make sure the indexers work: can you extract text from PDF, Word, etc.

Erico: We want real faceted navigation/search, like eea.facetednavigation did. Danger is that some add-ons do not work if the search is different. With a well defined search api, this should be no problem.

Erico: Sounds like there are benefits to each solution. People will want to choose. An abstraction layer will make it a lot safer.

Timo: Solr and Elasticsearch differ a lot, especially on how they handle facets. It is difficult to have an abstraction layer for this. And the responses will be different. If you try to transform the results so you get the same answer from Solr and Elasticsearch, then it kills performance.
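To make the difference in facet responses concrete, here is a minimal sketch that normalises the simplest case. By default Solr returns field facets as a flat list of alternating values and counts, while an Elasticsearch terms aggregation returns a list of buckets. The function names and the sample responses are hand-written assumptions for illustration. The sketch leaves out nested facets, ranges and pivots, which is where an abstraction layer gets difficult.

```python
def facets_from_solr(response):
    """Convert Solr's flat [value, count, value, count, ...] lists to dicts."""
    fields = response.get("facet_counts", {}).get("facet_fields", {})
    return {
        name: dict(zip(flat[::2], flat[1::2]))
        for name, flat in fields.items()
    }


def facets_from_elasticsearch(response):
    """Convert Elasticsearch terms-aggregation buckets to dicts."""
    aggregations = response.get("aggregations", {})
    return {
        name: {b["key"]: b["doc_count"] for b in agg.get("buckets", [])}
        for name, agg in aggregations.items()
    }


# Hand-written, trimmed sample responses (illustrative only).
solr_response = {
    "facet_counts": {
        "facet_fields": {"portal_type": ["Document", 10, "News Item", 3]}
    }
}
es_response = {
    "aggregations": {
        "portal_type": {
            "buckets": [
                {"key": "Document", "doc_count": 10},
                {"key": "News Item", "doc_count": 3},
            ]
        }
    }
}

solr_facets = facets_from_solr(solr_response)
es_facets = facets_from_elasticsearch(es_response)
print(solr_facets == es_facets)
```

Every such conversion runs for each search request, which is the performance cost Timo refers to.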

Erico: It should be the same type of info as you get from the ZCatalog.

Guido: We need a solution that can handle and fix inconsistencies. The ZCatalog takes part in the transaction machinery in Plone, the external solutions typically do not.
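One common way to narrow this gap is to queue index operations during the transaction and send them to the external engine only after a successful commit. The sketch below shows the idea in plain Python. `IndexQueue` and its methods are hypothetical names, and it is not how any existing Plone integration is implemented. In a real setup the `commit` and `abort` calls would be wired to the transaction machinery.

```python
class IndexQueue:
    """Collect index operations during a transaction; send them only on commit."""

    def __init__(self, send):
        self._send = send  # callable that talks to the external search engine
        self._pending = []

    def index(self, uid, data):
        self._pending.append(("index", uid, data))

    def unindex(self, uid):
        self._pending.append(("unindex", uid, None))

    def commit(self):
        # Meant to be called after the database transaction committed successfully.
        operations, self._pending = self._pending, []
        for operation in operations:
            self._send(operation)

    def abort(self):
        # Meant to be called on rollback: nothing reaches the search engine.
        self._pending.clear()


sent = []  # stands in for the external search engine
queue = IndexQueue(sent.append)

queue.index("doc-1", {"Title": "Research"})
queue.abort()  # transaction rolled back, doc-1 is never sent

queue.index("doc-2", {"Title": "Research plan"})
queue.commit()  # transaction committed, doc-2 is sent
print(sent)
```

This avoids indexing data from aborted transactions. It does not help when sending fails after the commit, so a way to detect and repair inconsistencies is still needed.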

Philip: At one point in Sorrento (Plone Open Garden) we picked Solr and said it should be a first-class citizen in Plone. Just a single sentence in a Google Doc somewhere.

Johannes Raggam: Collaboration revolutions in Quaive

published Oct 05, 2023

Talk by Johannes Raggam at Plone conference 2023, Eibar, Basque Country.

Currently we have quaive.app.onlyoffice and quaive.app.libreoffice for realtime collaboration. As html editor we use TipTap, using the pat-tiptap pattern, with Hocuspocus for synching changes. This can be used for example for collaboratively writing notes for a weekly meeting.

TipTap is a headless/designless editor, so you need to build your own. It fits nicely in our Patternslib ecosystem. It is based on ProseMirror. This means it has a strict content schema: you define exactly what html structures are allowed. The outcome is potentially nicer HTML than what you get from TinyMCE, the default editor in Plone. Our integration is opinionated, but well-reasoned.

We use the Yjs library, which is a bit like conflict resolution in the ZODB. Also offline support, with sync once you are back online.

Hocuspocus is Tiptap's own collaboration library. It is based on WebSockets.

We have a first prototype. We don't have a collaboration history: the document is just saved. Currently only the initiating user can save. Permission check is only client-side for now.

Uses plone.app.textfield, with the Yjs document in the raw value.

Code:

We have to see if Plone as storage backend is good for this. There are some BBQ (Boring Bundling Questions).

We want to make this available in standard Quaive. Maybe even for Plone and Volto.

We would like multi-channel editing, so at the same time you can edit text, title, description, data, list of participants, etc.

A collaborative whiteboard would be nice. Collaborative card notes. Collaborative anything!

Guido Stevens: Quaive Roadmap 2025

published Oct 05, 2023

Talk by Guido Stevens at Plone conference 2023, Eibar, Basque Country.

Quaive is an intranet for Plone. First full version is from 2015.

This summer we did a full redesign and rewrite of the notifications in Quaive. For various categories you can choose how you want to receive notifications.

We have a lot on our plate, for example:

  • Componentization and design system
  • Collaborative editing, Johannes will talk about this after my talk.
  • Search subsystem. Solr upgrade. Do we need an AI search? Also panel discussion after Johannes' talk.

Consultant-ware is bad business: you treat Plone as a half product, and promise the customer to build the other half. Instead we prefer a high-quality out-of-the-box base product that you can extend if needed.

There is a trade-off between customisability and a good UX. Between cheap, fast and quality you can choose two. I have seen customisation as quality, but really this is a fourth dimension. We do not want cheap customisations, which are really hacks. We need to build a proper solution into a product. This can still be fairly easy: a custom theme. Or a general product usable by all customers.

Commercial open source. If we see a bug in the base product that we want to fix, it is stuff we decide on, so we pay for it. The customer gets no extra bill, it is covered in their subscription. If the customer sees a bug that they want us to fix, then they pay for it.

Reframe the conversation. Developers can think in terms of costs and development time of a feature. The customers care about the value that a feature brings. We should think beyond coding for money. If you are here for the money, you should leave, because you can make more money outside of open source.

You have four organisation types:

  • Blue: machine bureaucracy, top down, control, efficiency, protocol focus, inward looking. In tenders you need to check all the boxes, otherwise you do not get this customer. Optimised towards efficiency.
  • Orange: effective organisation. Sales. Rules are important, but results are more important. Goal seeking. These customers always want more features. Outward looking, but still control. Optimised towards opportunity.
  • Then comes the big shift, with bottom-up empowerment.
  • Green: professional organisation. Self-managing teams, consensus driven, it is about people and feelings, trust. The people who do the actual work are more knowledgeable than their managers, who are there to make things work smoothly. Tech agnostic. Inward looking and empowerment focused. Optimised towards trust.
  • Yellow: network organisation. Fluid and networked, project structure, win-win value creation. There is often more value than money. Outward looking and empowerment focused. Optimised towards innovation.

Really every organisation needs all these layers: blue, orange, green, yellow. How does Quaive do this?

  • Blue: document management, news items (top down)
  • Orange: to-do app, process support
  • Green: conversations, workspaces, global activity stream, groups
  • Yellow: where is the knowledge: see who is specialised in a subject. Open networking.

Integrate this with "pervasive affordances", something you have throughout the stack: tagging, linking, conversation, search, etc.

Quaive project roadmap: agile project management. Kanban boards, roadmap timeline, live progress data instead of strict planning, project analytics. New pervasive affordance needed: make everything actionable. This gives more process support, which is mainly for orange organisations, but every organisation needs it to some extent.

Olatz Perez de Viñaspre: The effect of social biases in language models

published Oct 05, 2023

Keynote talk by Olatz Perez de Viñaspre at Plone conference 2023, Eibar, Basque Country.

Current language models are trained on huge amounts of texts. The quality and content of such text has a direct effect on the new generations created by the language model. In this talk we will focus on how language models reproduce the biases present in society.

Models are big. For the Falcon model you need 400 GB on your computer.
Masked language models predict hidden words in sentences. ChatGPT is a Causal Language Model, or generative model. That is currently the biggest part of the language models evolutionary tree.
The language corpus of ChatGPT is roughly 90 percent English. German is 0.17 percent. So what happens, is that the models have an Anglo-centric bias.
Most systems are proprietary, not open source. Meta had the Llama model, with 68% performance compared to ChatGPT. It went open source and three weeks later it was more than 90%.
Bias: allocational or representational harm. In one model, black persons were sometimes recognised as monkeys because the model had not been trained well enough on faces of black persons. Also, calling a woman "independent" is positive, but it is still a bias: you don't often call a man independent.
How do you measure bias? Manually made datasets often contain problems: they are biased themselves. Recently a more objective solution: Marked Personas. Ask a question like "define a white male" and compare the answer with "define a black woman". Does this show biases?
Where do models get their data? Web, books, videos. But the internet is also full of hate speech, so you can train a model on hate. So there is a problem of quantity versus quality. There is a lot of effort on debiasing models, but it is still an open task with many edges. Ethics will need to guide the further development of large language models.
Different languages can have different biases. For example Basque has no he or she. And we see that if you translate "he/she is a doctor" to English, it becomes "he", and for a nurse it becomes "she". So you see bias in translation.

Lightning talks Wednesday

published Oct 04, 2023

Lightning talks on Wednesday at the Plone conference 2023 in Eibar, Basque Country.

Luistxo from CodeSyntax: Count von Count speaks Basque

Counting 1 to 10: bat, bi, hiru, lau, bost, sei, zazpi, zortzi, bederatzi, hamar. Eskerrik asko!

Dylan and Jon: Is Plone 6 Mach?

What is Mach? New buzzword gimmick. An alliance of mainly CMS vendors. Promise: scalability, flexibility, innovation, improved performance. Micro services, API first, Cloud-hosted, Headless.

Yes, Volto is API first. No, Plone Classic and Volto are not headless. But you can plug a front-end into the REST API. Microservices: no, it is a big monolith, but you could add logic in microservices. Cloud native: yes (ish). Could be SaaS, though no one is offering this at the moment.

Let's go MACHO: Open source.

Karel: true costs of migration

In South Africa there is no power 24 hours a day, which actually means water is a problem. So we got a filtration system, but got into unexpected problems. That can happen during Plone migrations as well. You have expected and quantifiable costs, but also probable and/or unquantifiable costs. Do integrations work? Can the team be onboarded easily? What will you do once you find out what you did not know at the beginning? Migration may not be the best answer, but can be very expensive.

Code Syntax: #PrettyEibar competition

Take pictures of Eibar, post them on Twitter/X/Mastodon, and tag them with #ploneconf2023. On Friday the winner will get a delicious prize.

Fred van Dijk: "Lies to children"

You should try Terry Pratchett's parody fantasy novels. He also wrote "The Science of Discworld" with two scientists. This introduced "lies to children". Lying is part of education. You tell children that everything consists of tiny blocks, keeping it simple, but then you get to protons, electrons, Maxwell, snares, etc.

Let's market Plone 6 and 7. As developers we try to answer stupid questions honestly. Should we store everything in one big container? Developers: noooo! But maybe for test driving Plone it is okay. One click install to Plone! That is marketing.

Alexander Loechel: eduTAP

Bridging online identity for reliable and trustworthy service access. From eduGAIN and eduroam to eduTAP. Tap to pay, tap to open a door, not tapping a beer. Vision of European Commission: mobility for students should be as smooth as possible. Plastic cards for this is old. We are going to an interoperable campus card. A common ID for all education world wide. This works with wallet apps on your phone. Deliverables: core, libs, docs, central service directory.

Manabu Terada: PoC for LLM search on Plone

Want to search on sentences within an intranet Plone site. Basics of LLM vector search. The PoC worked, but not yet in Plone. So create a new index class, inherited from ZCTextIndex.

Package: https://github.com/cmscom/c2.search.llm. Repo and name may change. Sample package requires a GPU, did not work on my Mac M1.
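The basic idea of vector search can be sketched in a few lines: every document and every query is turned into a vector by an embedding model, and results are ranked by how close the vectors are. The example below is not taken from c2.search.llm. The document ids and the three-dimensional vectors are made up for illustration, whereas a real model produces vectors with hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, documents, limit=3):
    """Rank document ids by similarity to the query vector, best first."""
    scored = [
        (cosine_similarity(query_vector, vector), doc_id)
        for doc_id, vector in documents.items()
    ]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:limit]]


# Toy vectors standing in for embeddings of three intranet pages.
documents = {
    "holiday-policy": [0.9, 0.1, 0.0],
    "expense-claims": [0.1, 0.9, 0.1],
    "office-map": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # stands in for the embedding of the user's sentence
results = search(query, documents, limit=2)
print(results)
```

Comparing the query against every document, as done here, only works for small collections. Larger sites need an approximate nearest-neighbour index.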

Also: PyCon APAC 2023 is in Tokyo Japan, 27-28th.

Michael McFadden: demo search on rfa.org in Basque with Nuclia

It works!

Erico Andrei: a bunch of (crazy) ideas

Ideas that need some implementation, will pay with beer.

  1. Move your add-on tests to pytest, using pytest-plone
  2. Move my add-on.
  3. Add new features to plone.api. We had features in 2013. Ten years later we have the same features plus relations. Nothing else. Add for example api for vocabularies.
  4. Create an add-on that stores blobs on an object store: S3, MinIO, etc. CloudFlare and Thumbor do stuff as well.
  5. Implement a Volto block for every entry in the WordPress.com list of available blocks.
  6. ActivityPub support for Plone.
  7. Make the most innovative implementation of a new plone.distribution.

Johannes Raggam: Debug javascript in classic Plone

It is documented in the mockup readme: https://github.com/plone/mockup#development

Basically: start the development server of mockup, get the url, paste it in the plone bundle in the Resource Registries. Then add debugger; somewhere in the mockup javascript.