
Immutable IDs for pages #16

A system of immutable IDs for pages. Right now, UUIDs are basically slugs, which editors can change. As a consequence, stored references to these pages break.

There’s already a plugin that tries to solve this: https://github.com/bnomei/kirby3-autoid

a month ago
Changed the title from "Immutable ID's for pages" to "Immutable IDs for pages"
a month ago
Changed the status to
Planned
a month ago

I’ve been using the AutoID plugin recently and was very happy with it. For our own implementation, however, I would prefer to use the UUID version 4 standard (random numbers), as it is a well-established standard that ensures there is enough entropy to avoid collisions. The only downside of random numbers is that they could cause merge conflicts in Git when two dev setups assign different UUIDs to the same page when it is edited from the panel.

To circumvent this, the UUID for any page/file should be generated as soon as the corresponding page/file has been created in the panel or via the API, even if the UUID is not needed at that time.
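As a minimal sketch of "assign the ID at creation time", written in Python rather than Kirby's PHP: the functions create_page and rename_slug and the dictionary layout are hypothetical and only illustrate that the UUID is fixed once while the slug stays editable.

```python
import uuid


def create_page(slug: str, title: str) -> dict:
    """Create a page record and assign its UUID immediately."""
    return {
        "uuid": str(uuid.uuid4()),  # random version-4 UUID, never changed afterwards
        "slug": slug,               # editors may still change this later
        "title": title,
    }


def rename_slug(page: dict, new_slug: str) -> dict:
    """Return a copy of the page with a new slug; the UUID is left untouched."""
    return {**page, "slug": new_slug}


page = create_page("about", "About us")
renamed = rename_slug(page, "about-us")
```

Because the ID exists from the moment the page is created, a second installation that later receives the same content file has no reason to generate a competing ID for it.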

a month ago

I get that it’s a hard problem: you have to take into account multiple scenarios (e.g. “directly drafting content in the filesystem via the Finder”, “performance for fetching pages via Kirby’s internal page(), pages() and ->find() functionality” or “merge collisions when content is generated at multiple origins and then pushed into a repo somehow”).

There are already some ideas (e.g. Bruno’s autoid plugin, which saves the ID directly in the content file, although I think this one needs performance “hacks”, e.g. for indexing), but there might be others. For example, I’ve lately thought about what npm/composer do with their lockfiles: maybe we could use something like that too? Then we would have something that contains an index with key-value pairs that can be used in the internal ->find() and ->index() functions, or to store these IDs as relations in content files. It can be updated quickly on “slug-change” events in the panel, and theoretically we could even have some kind of maintenance function that periodically checks whether this file exists and whether all pages have been added to it. It can also be put under version control and will throw a merge conflict when there is an issue. On the other hand, it’s a potential SPoF if you lose or corrupt this file.
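A rough sketch of the lockfile idea, in Python and purely illustrative: the names on_slug_change and dump_lock, the sample IDs and the JSON layout are assumptions, not an existing Kirby or plugin format. Writing the file with sorted keys and one entry per line keeps diffs small, so two installations touching the same entry would surface as an ordinary merge conflict.

```python
import json


def on_slug_change(lock: dict, page_uuid: str, new_path: str) -> dict:
    """Return an updated index in which the UUID points at the page's new path."""
    return {**lock, page_uuid: new_path}


def dump_lock(lock: dict) -> str:
    """Serialize the index deterministically: sorted keys, one entry per line."""
    return json.dumps(lock, indent=2, sort_keys=True)


lock = {"uuid-b": "projects/kirby", "uuid-a": "blog/hello"}
lock = on_slug_change(lock, "uuid-a", "blog/hello-world")
serialized = dump_lock(lock)
```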

a month ago

@Bart Vandeputte IMHO, storing the ID directly in the content file will give us the fewest problems because, as you noted, having everything stored in a separate index could easily lead to catastrophic failure once this file gets corrupted. Having everything stored within content files as a single source of truth is very safe, unless two different installations of a project generate a UUID for the same page/file. But that is less likely when content gets a UUID assigned as soon as it is uploaded/created.

In that case, the lookup table just serves as a cache and could even be self-correcting. Each lookup can be easily verified by:

  1. Kirby gets a request, e.g. $site->lookup('MY-UUID') (the method name lookup is just an example).
  2. Kirby queries the lookup table to see if that UUID exists there.
  3. If the UUID exists in the lookup table, Kirby fetches the corresponding page/file and checks whether its ID matches the cached entry. If so, Kirby returns the fetched page or file object.
  4. If the ID is not present in the lookup table, or the corresponding page/file was not found, Kirby traverses the site tree until it finds the requested page/file. If that also fails, it returns null.
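The lookup flow described above can be sketched as follows. This is a Python illustration under stated assumptions, not Kirby code: content stands in for the content files (path to stored UUID, the source of truth), index stands in for the lookup table, and a flat scan of content replaces the real site-tree traversal.

```python
from typing import Optional

# Hypothetical stand-in for the content folder: page path -> UUID stored in the file.
content = {
    "blog/hello": "uuid-a",
    "projects/kirby": "uuid-b",
}

# Hypothetical lookup table (cache). The entry for uuid-b is deliberately stale.
index = {"uuid-a": "blog/hello", "uuid-b": "projects/old-name"}


def lookup(uuid: str) -> Optional[str]:
    """Return the path of the page with this UUID, repairing the index if needed."""
    path = index.get(uuid)
    # Trust the cached entry only if the page exists and still carries that UUID.
    if path is not None and content.get(path) == uuid:
        return path
    # Cache miss or stale entry: fall back to scanning all content.
    for candidate, stored_uuid in content.items():
        if stored_uuid == uuid:
            index[uuid] = candidate  # self-correct the cache
            return candidate
    index.pop(uuid, None)  # nothing found: drop the dead entry, if any
    return None
```

With this data, lookup("uuid-b") falls through to the scan, returns "projects/kirby" and rewrites the stale index entry, while an unknown ID returns None.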

Cases, in which the lookup table could get out-of-sync would be:

  1. Content files have been moved/created directly in the file system. The index can correct things on the fly whenever the lookup() method is used. This means the index file does not need to be 100% correct at any point. If, e.g., the UUID for a page/file is never looked up, it does not matter if its lookup entry remains broken until needed.
  2. A new content file has been created on another installation and the index has not been updated yet. In this case, the lookup table can just make a single lookup and add the corresponding page to the index.
  3. Something has been deleted from the file-system and does not exist in the content folder any more. In this case, the lookup table can just delete the corresponding entry.
  4. If a lot of content was changed and it would not make any sense to update the lookup table “entry-by-entry”, it would easily be possible to re-create it from scratch by traversing the site tree.
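Re-creating the table from scratch, as in the last case, could look roughly like this. It is a Python sketch with a hypothetical nested dictionary standing in for the site tree; rebuild_index and the sample IDs are made up for illustration.

```python
def rebuild_index(tree: dict, parent: str = "") -> dict:
    """Walk a nested page tree and return a fresh {uuid: path} lookup table."""
    index = {}
    for slug, page in tree.items():
        path = f"{parent}/{slug}" if parent else slug
        index[page["uuid"]] = path
        # Recurse into child pages, if the page has any.
        index.update(rebuild_index(page.get("children", {}), path))
    return index


site = {
    "blog": {
        "uuid": "uuid-1",
        "children": {"hello": {"uuid": "uuid-2"}},
    },
    "about": {"uuid": "uuid-3"},
}
fresh_index = rebuild_index(site)
```

Since the content files remain the single source of truth, the old table can simply be thrown away and replaced with the result.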

If a database is used to store content instead (with a dedicated UUID column), the lookup table could even be skipped, though it might still be relevant for performance reasons. I’m not so much of a database expert, but I remember that caching can often speed up things a lot, especially when the database is hosted on another server.

a month ago

I agree Fabian. The generated (?) cached lookup file (or redis cache or whatever) could be used as a performance trick to avoid slow reference lookup. But this would also mean it guarantees 100% accuracy at each moment in time.

a month ago

I think Fabian already outlined this very nicely. I’m also not too worried about out-of-sync indexes. AFAIK relational databases have the exact same problem. I.e. it’s quite easy and possible to wreck a MySQL database as well. With redundant indexes and caches there’s always a risk of corruption.

a month ago

@Bastian I’ve just been working on a site that uses AutoID for all references and stumbled upon a problem that needs a good solution when persistent IDs are used:

Let’s say we use a file picker to select a hero image and reference it by its persistent ID. After that, we clone the page including all assets. What should happen in this case? Normally, the persistent ID will continue to point to the hero image on the original page, but since that file is not available as an option in any file picker popup on the new page, we run into a problem.

At first glance, the hero image of the new page looks correct, but once the editor wants to change it, they will probably be very surprised that replacing the file in the new page’s folder has no effect. If the hero image is just a generic illustration, e.g. a pattern that’s going to be re-used on multiple pages, changing it later on the original page will cause side effects on all duplicated pages.

So the question is: how should files/pages fields, and anything else that can theoretically use persistent IDs for referencing files, deal with this scenario? The editor plugin ran into similar issues in earlier versions because it referenced files by their absolute path rather than storing just the filename. Should the panel update all references automatically when the page is duplicated? This would need quite some extra logic, but not doing so would lead to otherwise impossible file references and to content that does not comply with the validation rules of its blueprint.

a month ago

Kirby stores its data in text files. The data requirements - such as each record needing an immutable, unique ID - are the same, regardless of the storage method. And the strategies for dealing with issues related to these features will also be the same. So, ask yourself: how would we handle this in a database?

In a database we’d have page records, related to other records - such as other pages, and/or image records and/or file records. In a database, we can specify what happens when we update and delete related records - in SQL you specify whether the operation cascades, updates or leaves the related pointer empty. In NoSQL the process is usually more manual, but the options are the same.

Copying is the same. In Kirby, if we ‘copy’ a page that has sub-pages, as well as files and images, we’ll copy those as well. In a database, this would be equal to creating new records - not only for the page, but also for the related pages, files and images. So, in a database we’d have to update the references in the new page to link to the newly copied sub-pages, files and images. In a NoSQL database, depending on how the records are structured, some of the sub-records might be embedded in the parent, and will be copied by default, while others might have to be manually re-linked.
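To make the re-linking step concrete, here is a small Python sketch. The record layout and the duplicate_page function are hypothetical and not how Kirby stores pages: the copy and its files get fresh IDs, and only references that pointed at the page’s own files are rewritten.

```python
import copy
import uuid


def duplicate_page(page: dict) -> dict:
    """Copy a page, give the copy and its files new IDs, and re-link references."""
    clone = copy.deepcopy(page)
    clone["uuid"] = str(uuid.uuid4())
    # Map each original file ID to the ID of its copy.
    id_map = {f["uuid"]: str(uuid.uuid4()) for f in page["files"]}
    for f in clone["files"]:
        f["uuid"] = id_map[f["uuid"]]
    # Rewrite only references to the page's own files; references to files
    # that live elsewhere on the site are left as they are.
    clone["fields"] = {
        name: id_map.get(value, value) for name, value in clone["fields"].items()
    }
    return clone


original = {
    "uuid": "page-1",
    "files": [{"uuid": "file-1", "filename": "hero.jpg"}],
    "fields": {"hero": "file-1", "pattern": "shared-file-9"},
}
clone = duplicate_page(original)
```

In this sketch the hero field of the copy points at the copied hero.jpg, while the pattern field still points at the shared file. Whether shared references should be kept or also copied is a design decision the sketch does not settle.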

The point is: these are all standard data-handling issues that have well-documented solutions, so they might end up posing less of a problem to the team than we might believe.

19 days ago

@anonymous: in an ideal world where every content update is done in the panel, you’re right. The problem is that some of us (myself included) don’t always use the panel to make content changes or add content. This adds an extra level of difficulty, since you cannot run a hook or something similar on a bunch of files when you work directly in the file system.

19 days ago

I will quite happily help the core team review my autoid plugin code to make sure they can avoid some pitfalls right from the start. There are some issues that only pop up once you go beyond 3k pages in the index.

18 days ago