April 28, 2016
Two questions were up for voting, 4 seats on the Board of Directors and approval of the amended By-Laws to join SPI.

Congratulations to our reelected and new board members Egbert Eich, Alex Deucher, Keith Packard and Bryce Harrington. Thanks a lot to Lucas Stach for running. And also big thanks to our outgoing board member Matt Dew, who stepped down for personal reasons.

On the bylaw changes and merging with SPI, 61 out of 65 active members voted, with 54 voting yes, 4 no and 3 abstained. Which means we're well past the 2/3rd quorum for bylaw changes, and everything's green now to proceed with the plan to join SPI!
April 27, 2016
Shaders can be huge, and tracking down compiler crashes (or asserts) in LLVM with a giant shader isn't a lot of fun. Luckily, LLVM has a tool called Bugpoint. It takes a given piece of LLVM IR and tries a bunch of simplifications such as removing instructions or basic blocks, while checking that a given condition is still satisfied. Make the given condition something like "llc asserts with message X", and you have a very useful tool for reducing test cases. Unfortunately, its documentation isn't the greatest, so let me briefly dump how I have used it in the past.

I have a little script called that looks like this:


if ! llc -mtriple=amdgcn-- -verify-machineinstrs "$@" 2>&1 | grep "error message here"; then
exit 0
exit $?
When I encounter a compiler assertion, I first make sure to collect the offending shader from our driver using R600_DEBUG=ps,vs,gs,tcs,tes and extract it into a file like bug.ll. (In very rare cases, one may need the preoptir option in R600_DEBUG.) Then I edit with the correct error message and run

bugpoint -compile-custom -compile-command ./ bug.ll
It'll churn for some time and produce a hopefully much smaller .bc file that one can use the usual tools on, such as llc, opt, and llvm-dis.

Occasionally, it can be useful to run the result through opt -instnamer or to simplify it further by hand, but usually, bugpoint provides a good starting point.
April 26, 2016

This is a question raised quite quite often, the last time in a blogpost by Thomas, so I thought it is a good idea to give a slightly longer explanation (and also create an article to link to…).

There are basically three reasons for using XML as the default format for metainfo files:

1. XML is easily forward/backward compatible, while YAML is not

This is a matter of extending the AppStream metainfo files with new entries, or adapt existing entries to new needs.

Take this example XML line for defining an icon for an application:

<icon type="cached">foobar.png</icon>

and now the equivalent YAML:

  cached: foobar.png

Now consider we want to add a width and height property to the icons, because we started to allow more than one icon size. Easy for the XML:

<icon type="cached" width="128" height="128">foobar.png</icon>

This line of XML can be read correctly by both old parsers, which will just see the icon as before without reading the size information, and new parsers, which can make use of the additional information if they want. The change is both forward and backward compatible.

This looks differently with the YAML file. The “foobar.png” is a string-type, and parsers will expect a string as value for the cached key, while we would need a dictionary there to include the additional width/height information:

  cached: name: foobar.png
          width: 128
          height: 128

The change shown above will break existing parsers though. Of course, we could add a cached2 key, but that would require people to write two entries, to keep compatibility with older parsers:

  cached: foobar.png
  cached2: name: foobar.png
          width: 128
          height: 128

Less than ideal.

While there are ways to break compatibility in XML documents too, as well as ways to design YAML documents in a way which minimizes the risk of breaking compatibility later, keeping the format future-proof is far easier with XML compared to YAML (and sometimes simply not possible with YAML documents). This makes XML a good choice for this usecase, since we can not do transitions with thousands of independent upstream projects easily, and need to care about backwards compatibility.

2. Translating YAML is not much fun

A property of AppStream metainfo files is that they can be easily translated into multiple languages. For that, tools like intltool and itstool exist to aid with translating XML using Gettext files. This can be done at project build-time, keeping a clean, minimal XML file, or before, storing the translated strings directly in the XML document. Generally, YAML files can be translated too. Take the following example (shamelessly copied from Dolphin):

<summary>File Manager</summary>
<summary xml:lang="bs">Upravitelj datoteka</summary>
<summary xml:lang="cs">Správce souborů</summary>
<summary xml:lang="da">Filhåndtering</summary>

This would become something like this in YAML:

  C: File Manager
  bs: Upravitelj datoteka
  cs: Správce souborů
  da: Filhåndtering

Looks manageable, right? Now, AppStream also covers long descriptions, where individual paragraphs can be translated by the translators. This looks like this in XML:

  <p>Dolphin is a lightweight file manager. It has been designed with ease of use and simplicity in mind, while still allowing flexibility and customisation. This means that you can do your file management exactly the way you want to do it.</p>
  <p xml:lang="de">Dolphin ist ein schlankes Programm zur Dateiverwaltung. Es wurde mit dem Ziel entwickelt, einfach in der Anwendung, dabei aber auch flexibel und anpassungsfähig zu sein. Sie können daher Ihre Dateiverwaltungsaufgaben genau nach Ihren Bedürfnissen ausführen.</p>
  <p xml:lang="de">Funktionen:</p>
  <p xml:lang="es">Características:</p>
    <li>Navigation (or breadcrumb) bar for URLs, allowing you to quickly navigate through the hierarchy of files and folders.</li>
    <li xml:lang="de">Navigationsleiste für Adressen (auch editierbar), mit der Sie schnell durch die Hierarchie der Dateien und Ordner navigieren können.</li>
    <li xml:lang="es">barra de navegación (o de ruta completa) para URL que permite navegar rápidamente a través de la jerarquía de archivos y carpetas.</li>
    <li>Supports several different kinds of view styles and properties and allows you to configure the view exactly how you want it.</li>

Now, how would you represent this in YAML? Since we need to preserve the paragraph and enumeration markup somehow, and creating a large chain of YAML dictionaries is not really a sane option, the only choices would be:

  • Embed the HTML markup in the file, and risk non-careful translators breaking the markup by e.g. not closing tags.
  • Use Markdown, and risk people not writing the markup correctly when translating a really long string in Gettext.

In both cases, we would loose the ability to translate individual paragraphs, which also means that as soon as the developer changes the original text in YAML, translators would need to translate the whole bunch again, which is inconvenient.

On top of that, there are no tools to translate YAML properly that I am aware of, so we would need to write those too.

3. Allowing XML and YAML makes a confusing story and adds complexity

While adding YAML as a format would not be too hard, given that we already support it for DEP-11 distro metadata (Debian uses this), it would make the business of creating metainfo files more confusing. At time, we have a clear story: Write the XML, store it in /usr/share/metainfo, use standard tools to translate the translatable entries. Adding YAML to the mix adds an additional choice that needs to be supported for eternity and also has the problems mentioned above.

I wanted to add YAML as format for AppStream, and we discussed this at the hackfest as well, but in the end I think it isn’t worth the pain of supporting it for upstream projects (remember, someone needs to maintain the parsers and specification too and keep XML and YAML in sync and updated). Don’t get me wrong, I love YAML, but for translated metadata which needs a guarantee on format stability it is not the ideal choice.

So yeah, XML isn’t fun to write by hand. But for this case, XML is a good choice.

Two weeks ago was the GNOME Software hackfest in London, and I’ve been there! And I just now found the time to blog about it, but better do it late than never 😉 .

Arriving in London and finding the Red Hat offices

After being stuck in trains for the weekend, but fortunately arriving at the airport in time, I finally made it to London with quite some delay due to the slow bus transfer from Stansted Airport. After finding the hotel, the next issue was to get food and a place which accepted my credit card, which was surprisingly hard – in defence of London I must say though, that it was a Sunday, 7 p.m. and my card is somewhat special (in Canada, it managed to crash some card readers, so they needed a hard-reset). While searching for food, I also found the Red Hat offices where the hackfest was starting the next day by accident. My hotel, the office and the tower bridge were really close, which was awesome! I have been to London in 2008 the last time, and only for a day, so being that close to the city center was great. The hackfest didn’t leave any time to visit the city much, but by being close to the center, one could hardly avoid the “London experience” 😉 .

Cool people working on great stuff

towerbridge2016That’s basically the summary for the hackfest 😉 . It was awesome to meet with Richard Hughes again, since we haven’t seen each other in person since 2011, but work on lots of stuff together. This was especially important, since we managed to solve quite some disagreements we had over stuff – Richard even almost managed to make me give in to adding <kudos/> to the AppStream spec, something which I was pretty against supporting (it didn’t make it yet, but I am no longer against the idea of having that – the remaining issues are solvable).

Meeting Iain Lane again (after FOSDEM) was also very nice, and also seeing other people I’ve only worked with over IRC or bug reports (e.g. William, Kalev, …) was great. Also lots of “new” people were there, like guys from Endless, who build their low-budget computer for developing/emerging countries on top of GNOME and Linux technologies. It’s pretty cool stuff they do, you should check out their website! (they also build their distribution on top of Debian, which is even more awesome, and something I didn’t know before (because many Endless people I met before were associated with GNOME or Fedora, I kind of implicitly assumed the system was based on Fedora 😛 )).

The incarnation of GNOME Software used by endless looks pretty different from what the normal GNOME user sees, since it’s adjusted for a different audience and input method. But it looks great, and is a good example for how versatile GS already is! And for upstream GNOME, we’ve seen some pretty great mockups done by Endless too – I hope those will make it into production somehow.

Ironically, a "snapstore" was close to the office ;-)

Ironically, a “snapstore” was close to the office ;-)

XdgApp and sandboxing of apps was also a big topic, aside from Ubuntu and Endless integration. Fortunately, Alexander Larsson was also there to answer all the sandboxing and XdgApp-questions.

I used the time to follow up on a conversation with Alexander we started at FOSDEM this year, about the Limba vs. XdgApp bundling issue. While we are in-line on the sandboxing approach, the way how software is distributed is implemented differently in Limba and XdgApp, and it is bad to have too many bundling systems around (doesn’t make for a good story where we can just tell developers “ship as this bundling format, and it will be supported everywhere”). Talking with Alex about this was very nice, and I think there is a way out of the too-many-solutions dilemma, at least for Limba and XdgApp – I will blog about that separately soon.

On the Ubuntu side, a lot of bugs and issues were squashed and changes upstreamed to GNOME, and people were generally doing their best to reduce Richard’s bus-factor on the project a little 😉 .

I mainly worked on AppStream issues, finishing up the last pieces of appstream-generator and running it against some sample package sets (and later that week against the whole Debian archive). I also started to implement support for showing AppStream issues in the Debian PTS (this work is not finished yet). I also managed to solve a few bugs in the old DEP-11 generator and prepare another release for Ubuntu.

We also enjoyed some good Japanese food, and some incredibly great, but also suddenly very expensive Indian food (but that’s a different story 😉 ).

The most important thing for me though was to get together with people actually using AppStream metadata in software centers and also more specialized places. This yielded some useful findings, e.g. that localized screenshots are not something weird, but actually a wanted feature of Endless for their curated AppStore. So localized screenshots will be part of the next AppStream spec. Also, there seems to be a general need to ship curation information for software centers somehow (which apps are featured? how are they styled? added special banners for some featured apps, “app of the day” features, etc.). This problem hasn’t been solved, since it’s highly implementation-specific, and AppStream should be distro-agnostic. But it is something we might be able to address in a generic way sooner or later (I need to talk to people at KDE and Elementary about it).

In summary…

It was a great event! Going to conferences and hackfests always makes me feel like it moves projects leaps ahead, even if you do little coding. Sorting out issues together with people you see in person (rather than communicating with them via text messages or video chat), is IMHO always the most productive way to move forward (yeah, unless you do this every week, but I think you get my point 😀 ).

For me, being the only (and youngest ^^) developer at the hackfest who was not employed by any company in the FLOSS business, the hackfest was also motivating to continue to invest spare time into working on these projects.

So, the only thing left to do is a huge shout out of “THANK YOU” to the Ubuntu Community Fund – and therefore the Ubuntu community – for sponsoring me! You rock! Also huge thanks to Canonical for organizing the sponsoring really quickly, so I didn’t get into trouble with paying my flights.

Laney and attente walking on the Millennium Bridge after we walked the distance between Red Hat and Canonical's offices.

Laney and attente on the Millennium Bridge after we walked the distance between Red Hat and Canonical’s offices.

To worried KDE people: No, I didn’t leave the blue side – I just generally work on cross-desktop stuff, and would like all desktops to work as well as possible 😉

April 24, 2016

In case you missed it: Please vote now on!
April 20, 2016
It's election season in land, and it matters: Besides new board seats we're also voting on bylaw changes and whether to join SPI or not.

Personally, and as the secretary of the board I'm very much in favour of joining SPI. It will allow us to offload all the boring bits of running a foundation, and those are also all the bits we tend to struggle with. And that would give the board more time to do things that actually matter and help the community. And all that for a really reasonable price - running our own legal entity isn't free, and not really worth it for our small budget mostly consisting of travel sponsoring and the occasional internship.

And bylaw changes need a qualified supermajority of all members, every vote counts and not voting essentially means voting no. Hence please vote, and please vote even when you don't want to join - this is our second attempt and I'd really like to see a clear verdict from our members, one way or the other.


Voting closes by  Apr 26 23:59 UTC, but please don't cut it short, it's a computer that decides when it's over ...

April 18, 2016

When we set of to do the Fedora Workstation we had some clear idea about where we wanted to take it, but we also realized there was a lot of cleaning up needed in our stack to make our vision viable. The biggest change we felt was needed to enable us was the move towards using application bundles as the primary shipping method for applications as opposed to the fine grained package universe that RPMS represent. That said we also saw the many advantages the packages brought in terms of easing security updates and allowing people to fine tune their system, so we didn’t want to throw the proverbial baby out with the bathwater. So we started investigating the various technologies out there, as we where of course not alone in thinking about these things. Unfortunately nothing clearly fit the bill of what we wanted and trying to modify for instance Docker to be a good technology for running desktop applications turned out to be nonviable. So we tasked Alex Larsson with designing and creating what today is known as xdg-app. Our requirements list looked something like this (in random order):

a) Easy bundling of needed libraries
b) A runtime system to reduce the application sizes to something more manageable and ease providing security updates.
c) A system designed to be managed by a desktop session as opposed to managed by sysadmin style tools
d) A security model that would let us gradually move towards sandboxing applications and alleviate the downsides of library bundling
e) An ability to reliably offer online updates of applications
f) Reuse as much of the technology created by others as possible to lower maintenance overhead
g) Design it in a way that makes supporting the applications cross multiple distributions possible and easy
h) Provide a top notch developer story so that this becomes a big positive for application developers and not another pain point.

As we investigated what we needed other requirements become obvious, like the need to migrate from X to Wayland in order to build a modern composited windowing system that renders using GL, instead of an aging one that has a rendering interface that is no longer used for the most part, and to be able to provide the level of system security we wanted. There was also the need to design new systems like Pinos for handling video and add new functionality to PulseAudio for dealing with sandboxed applications, creating libinput to have great input handling in Wayland and also let us share the input subsystem between X and Wayland. And of course we wanted our new packaging system tightly integrated into GNOME Software so that install, updating and running these applications became smooth and transparent to the user.

This would be a big undertaking and it turned out to be an even bigger effort than we initially thought, as there was a lot of swamp draining needed here and I am really happy that we have a team capable of pulling these things off. For instance there is not really many other people in the Linux community other than Peter Hutterer who could have created libinput, and without libinput there is no way Wayland would be a viable alternative to X (and without libinput neither would Mir which is a bit ironic for a system that was created because they couldn’t wait for Wayland :).

So going back to the title of this blog entry I feel that we are now finally exiting what I think of as Phase 1, even if we never formally designated it as such, of our development roadmap. For instance we initially hoped to have Wayland feature complete in a Fedora 22 timeframe, but it has taken us extra time to get all the smaller details right, so instead we are now having what we consider the first feature complete Wayland ready with Fedora Workstation 24. And if things go as we expect and hope that should become our default system starting from Fedora Workstation 25. The X Window session will be shipping and available for a long time yet I am sure, but not defaulting to it will mark a clear line in the sand for where the development focus is going forward.

At the same time Xdg-app has started to come together nicely over the last few Months with a lot of projects experimenting with it and bugs and issues being quickly resolved. We also taking major steps towards bringing xdg-app into the mainstream by Alex now working on making Xdg-apps OCI compliant, basically meaning that xdg-apps conform to the Open Container Initative requirements defined by Our expectation is that the next Xdg-app development release will include the needed bits to be OCI compliant. Another nice milestone for Xdg-app was that it recently got added to Debian, meaning that Xdg-apps should be more easily runable in both Fedora its downstreams and in Debian and its downstreams.

Another major piece of engineering that is coming to a close is moving major applications such as Firefox, LibreOffice and Eclipse to GTK3. This was needed both to get these applications able to run natively on Wayland, but it also enabled us to make them work nicely for HiDPI. This has also played out into how GTK3 have positioned itself which to be a toolkit dedicated to pushing the Linux desktop forward and helping that quickly adapt and adopt to changes in the technology landscape. This is why GTK3 is that toolkit that has been leading the charge on things like HiDPI support and Wayland support. At the same time we know some of the changes in GTK3 have been painful for application developers, especially the changes in how theming works, but with the bulk of the migration to using CSS for theming now being complete we expect that even for applications that use GTK3 in ‘weird ways’ like Firefox, LibreOffice and Eclipse, things should be stable.

Another piece of the puzzle we have wanted to improve is the general Linux hardware story. So since Red Hat joined Khronos last year the Red Hat Graphics team, with Dave Airlie and Adam Jackson leading the charge, has been able to participate in preparing the launch of Vulkan through doing review and we even managed to sneak in a bit of code contribution by Adam Jackson ensuring that there was a vendor neutral Vulkan loader available so that we didn’t end up in a situation where every vendor had to provide their own.

We have also been contributing to the vendor neutral OpenGL dispatcher. The dispatcher is basically a layer that routes an applications OpenGL rendering to the correct implementation, so if you have a system with a discrete GPU system you can for instance control which of your two GPUs handle a certain application or game. Adam Jackson has also been collaborating closely with NVidia on getting such a dispatch system complete for OpenGL, so that the age old problem of the Mesa OpenGL library and the proprietary NVidia OpenGL library conflicting can finally be resolved. So NVidia has of course handled the part in their driver and they where also the ones designing this, but Adam has been working on getting the Mesa parts completed. We think that this will make the GPU story on Linux a lot nicer going forward. There are still a few missing pieces before we have the discrete graphics card scenario handled in a smooth way, but we are getting there quickly.

The other thing we have been working on in terms of hardware support, which is still ongoing is improving the Red Hat certification process to cover more desktop hardware. You might ask what that has to do with Fedora Workstation, but it actually is a quite efficient way of improving the quality of Linux support for desktop hardware in general as most of the major vendors submit some of their laptops and desktops to Red Hat for certification. So the more issues the Red Hat certification process can detect the better Linux support on such hardware can become.

Another major effort where we have managed to cover a lot of our goals and targets is GNOME Software. Since the inception of Fedora Workstation we taken that tool and added functionality like UEFI firmware updates, codecs and font handling, GNOME Extensions handling, System upgrades, Xdg-app handling, users reviews, improved application metadata, improved handling of 3rd party repositories and improved general performance with the move from yum to hawkeye. And I think that the Software store has become a crucial part of what you expect of a desktop these days, with things like the Google Play store, the Apple Store and the Microsoft store to some degree defining their respective products more than the heuristics of the shell of Android, iPhone, MacOS or Windows. And I take it as an clear recognition of the great work Richard Hughes had done with GNOME Software that this week there is a special GNOME Software hackfest in London with participants from Fedora/Red Hat, Ubuntu/Canonical, Codethink and Endless.

So I am very happy with where we are at, and I want to say thank you to all long term Fedora users who been with us through the years and also say thank you and welcome to all the new Fedora Workstation users who has seen all the cool stuff we been doing and decided to join us over the last two years; seeing the strong growth in our userbase during this time has been a great source of joy for us and been a verification that we are on the right track.

I am also very happy about how the culmination of these efforts will be on display with the upcoming Fedora Workstation 24 release! Of course it also means it is time for the Fedora Workstation Working group to start planning what Phase 2 of reaching the Fedora Workstation vision should be :)

When we released graphics tablet support in libinput earlier this year, only tablet tools were supported. So while you could use the pen normally, the buttons, rings and strips on the physical tablet itself (the "pad") weren't detected by libinput and did not work. I have now merged the patches for pad support into libinput.

The reason for the delay was simple: we wanted to get it right [1]. Pads have a couple of properties that tools don't have and we always considered pads to be different to pens and initially focused on a more generic interface (the "buttonset" interface) to accommodate for those. After some coding, we have now arrived at a tablet pad-specific interface instead. This post is a high-level overview of the new tablet pad interface and how we intend it do be used.

The basic sign that a pad is present is when a device has the tablet pad capability. Unlike tools, pads don't have proximity events, they are always considered in proximity and it is up to the compositor to handle the focus accordingly. In most cases, this means tying it to the keyboard focus. Usually a pad is available as soon as a tablet is plugged in, but note that the Wacom ExpressKey Remote (EKR) is a separate, wireless device and may be connected after the physical pad. It is up to the compositor to link the EKR with the correct tablet (if there is more than one).

Pads have three sources of events: buttons, rings and strips. Rings and strips are touch-sensitive surfaces and provide absolute values - rings in degrees, strips in normalized [0.0, 1.0] coordinates. Similar to pointer axis sources we provide a source notification. If that source is "finger", then we send a terminating out-of-range event so that the caller can trigger things like kinetic scrolling.

Buttons on a pad are ... different. libinput usually re-uses the Linux kernel's include/input.h event codes for buttons and keys. But for the pad we decided to use plain sequential button numbering, starting at index 0. So rather than a semantic code like BTN_LEFT, you'd simply get a button 0 event. The reasoning behind this is a caveat in the kernel evdev API: event codes have semantic meaning (e.g. BTN_LEFT) but buttons on a tablet pad don't those meanings. There are some generic event ranges (e.g. BTN_0 through to BTN_9) and the Wacom tablets use those but once you have more than 10 buttons you leak into other ranges. The ranges are simply too narrow so we end up with seemingly different buttons even though all buttons are effectively the same. libinput's pad support undoes that split and combines the buttons into a simple sequential range and leaves any semantic mapping of buttons to the caller. Together with libwacom which describes the location of the buttons a caller can get a relatively good idea of how the layout looks like.

Mode switching is a commonly expected feature on tablet. One button is designated as mode switch button and toggles all other buttons between the available modes. On the Intuos Pro series tablets, that button is usually the button inside the ring. Button mapping and thus mode switching is however a feature we leave up to the caller, if you're working on a compositor you will have to implemented mode switching there.

Other than that, pad support is relatively simple and straightforward and should not cause any big troubles.

[1] or at least less wrong than in the past
[2] They're actually linux/input-event-codes.h in recent kernels

April 16, 2016

I moved my blog around a bit and it appears that static pages are now in favour, so I switched to that, by way of Hugo. CSS and such needs more tweaking, but it’ll make do for now.

As part of this, RSS feeds and such changed, if you want to subscribe to this (very seldomly updated) blog, use

AppStream GeneratorSince mid-2015 we were using the dep11-generator in Debian to build AppStream metadata about available software components in the distribution.

Getting rid of dep11-generator

Unfortunately, the old Python-based dep11-generator was hitting some hard limits pretty soon. For example, using multiprocessing with Python was a pain, since it resulted in some very hard-to-track bugs. Also, the multiprocessing approach (as opposed to multithreading) made it impossible to use the underlying LMDB database properly (it was basically closed and reopened in each forked off process, since pickling the Python LMDB object caused some really funny bugs, which usually manifested themselves in the application hanging forever without any information on what was going on). Additionally to that, the Python-based generator forced me to maintain two implementations of the AppStream YAML spec, one in C and one in Python, which consumes quite some time. There were also some other issues (e.g. no unit-tests) in the implementation, which made me think about rewriting the generator.

Adventures in Go / Rust / D

Since I didn’t want to write this new piece of software in C (or basically, writing it in C was my last option 😉 ), I explored Go and Rust for this purpose and also did a small prototype in the D programming language, when I was starting to feel really adventurous. And while I never intended to write the new generator in D (I was pretty fixated on Go…), this is what happened. The strong points for D for this particular project were its close relation to C (and ease of using existing C code), its super-flat learning curve for someone who knows and likes C and C++ and its pretty powerful implementations of the concurrent and parallel programming paradigms. That being said, not all is great in D and there are some pretty dark spots too, mainly when it comes to the standard library and compilers. I will dive into my experiences with D in a separate blogpost.

What good to expect from appstream-generator?

So, what can the new appstream-generator do for you? Basically, the same as the old dep11-generator: It will extract metadata from a distribution’s package archive, download and resize screenshots, search for icons and size them properly and generate reports in JSON and HTML of found metadata and issues.

LibAppStream-based parsing, generation of YAML or XML, multi-distro support, …

As opposed to the old generator, the new generator utilizes the metadata parsers and writers of libappstream. This allows it to return the extracted metadata as AppStream YAML (for Debian) or XML (everyone else) It is also written in a distribution-agnostic way, so if someone wants to use it in a different distribution than Debian, this is possible now. It just requires a very small distribution-specific backend to be written, all of the details of the metadata extraction are abstracted away (just two interfaces need to be implemented). While I do not expect anyone except Debian to use this in the near future (most distros have found a solution to generate metadata already), the frontend-backend split is a much cleaner design than what was available in the previous code. It also allows to unit-test the code properly, without providing a Debian archive in the testsuite.

Feature Flags, Optipng, …

The new generator also allows to enable and disable certain sets of features in a standardized way. E.g. Ubuntu uses a language-pack system for translations, which Debian doesn’t use. Features like this can be implemented as disableable separate modules in the generator. We use this at time to e.g. allow descriptions from packages to be used as AppStream descriptions, or for running optipng on the generated PNG images and icons.

No more Contents file dependency

Another issue the old generator had was that it used the Contents file from the Debian archive to find matching icons for an application. We could never be sure whether the contents in the Contents file actually matched the contents of the package we were currently dealing with. What made things worse is that at Ubuntu, the archive software is only updating the Contents file weekly daily (while the generator might run multiple times a day), which has lead to software being ignored in the metadata, because icons could not yet be found. Even on Debian, with its quickly-updated Contents file, we could immediately see the effects of an out-of-date Contents file when updating it failed once. In the new generator, we read the contents of each package ourselves now and store them in a LMDB database, bypassing the Contents file and removing the whole class of problems resulting from missing or wrong contents-data.

It can’t all be good, right?

That is true, there are also some known issues the new generator has:

Large amounts of RAM required

The better speed of the new generator comes at the cost of holding more stuff in RAM. Much more. When processing data from 5 architectures initially on Debian, the amount of required RAM might lie above 4GB, with the OOM killer sometimes being quicker than the garbage collector… That being said, on subsequent runs the amount of required memory is much lower. Still, this is something I am working on to improve.

What are symbolic links?

To be faster, the appstream-generator will read the md5sum file in .deb packages instead of extracting the payload archive and reading its contents. Since the md5sums file does not list symbolic links, symlinks basically don’t exist for the new generator. This is a problem for software symlinking icons or even .desktop files around, like e.g. LibreOffice does.

I am still investigating how widespread the use of symlinks for icons and .desktop files is, but it looks like fixing packages (making them not-symlink stuff and rather move the files) might be the better approach than investing additional computing power to find symlinks or even switch back to parsing the Contents file. Input on this is welcome!

Deploying asgen

I finished the last pieces of the appstream-generator (together with doing lots of other cool things and talking to great people) at the GNOME Software Hackfest in London last week (detailed blogposts about things that happened there will follow – many thanks once again for the Ubuntu community for sponsoring my attendance!).

Since today, the new generator is running on the Debian infrastructure. If bigger issues are found, we can still roll back to the old code. I decided to deploy this faster, so we can get some good testing done before the Stretch release. Please report any issues you may find!

April 13, 2016

A little less than 2 months after our latest major release, here is the new minor version of Gnocchi, stamped 2.1.0. It was a smooth release, but with one major feature implemented by my fellow fantastic developer Mehdi Abaakouk: the ability to create resource types dynamically.

Resource types REST API

This new version of Gnocchi offers the long-awaited ability to create resource types dynamically. What does that mean? Well, until version 2.0, the resources that you were able to create in Gnocchi had a particular type that was defined in the code: instance, volume, SNMP host, Swift account, etc. All of them were tied to OpenStack, since it was our primary use case.

Now, the API allows to create resource types dynamically. This means you can create your own custom types to describe your own architecture. You then can exploit the same features that were offered before: history of your resources, searching through them, associating metrics, etc!

Performances improvement

We did some profiling on Gnocchi, and some benchmarks, and with the help of my fellow developer Gordon Chung, improved the metric performances.

The API speed improved a bit, and I've measured the Gnocchi API endpoint of being able to ingest up to 190k measures/s with only one node (the same as used in my previous benchmark) using uwsgi, so a 50 % improvement. The time required to compute aggregation on new measures is now also metered and displayed in the gnocchi-metricd log in debug mode. Handy to have an idea of how fast your measures are treated.

Ceph backend optimization

The Ceph back-end has been improved again by Mehdi. We're now relying on OMAP rather than xattr for finer grained control and better performance.

We already have a few new features being prepared for our next release, so stay tuned! And if you have any suggestion, feel free to say a word. Election Time — Vote Now

It's more important than usual to actually get your vote in — we're asking the membership to vote on changes the the bylaws that are necessary for to become a SPI affiliate project, instead of continuing on as a separate organization. While I'm in favor of this transition as I think it will provide much needed legal and financial help, the real reason we need everyone to vote is that we need ⅔ of the membership to cast ballots for the vote to be valid. Last time, we didn't reach that value, so even though we had a majority voting in favor of the change, it didn't take effect. If you aren't in favor of this change, I'd still encourage you to vote as I'd like to get a valid result, no matter the outcome.

Of course, we're also electing four members to the board. I'm happy to note that there are five candidates running for the four available seats, which shows that there are enough people willing to help serve the community in this fashion.

April 08, 2016

There's a lot of situation where you end up needing a software deployed temporarily. This can happen when testing something manually, when running a script or when launching a test suite.

Indeed, many applications need to use and interconnect with external software: a RDBMS (PostgreSQL, MySQL…), a cache (memcached, Redis…) or any other external component. This tends to make more difficult running a software (or its test suite). If you want to rely on this component being installed and deployed, you end up needing a full environment set-up and properly configured to run your tests. Which is discouraging.

The different OpenStack projects I work on ended up pretty soon spawning some of their back-ends temporarily to run their tests. Some of those unit tests somehow became entirely what you would call functional or integration tests. But that's just a name. In the end, what we ended up doing is testing that the software was really working. And there's no better way doing that than talking to a real PostgreSQL instance rather than mocking every call.

Pifpaf to the rescue

To solve that issue, I created a new tool, named Pifpaf. Pifpaf eases the run of any daemon in a test mode for a brief moment, before making it disappear completely. It's pretty easy to install as it is available on PyPI:

$ pip install pifpaf
Collecting pifpaf
Installing collected packages: pifpaf
Successfully installed pifpaf-0.0.7

You can then use it to run any of the listed daemons:

$ pifpaf list
| Daemons |
| redis |
| postgresql |
| mongodb |
| zookeeper |
| aodh |
| influxdb |
| ceph |
| elasticsearch |
| etcd |
| mysql |
| memcached |
| rabbitmq |
| gnocchi |

Pifpaf accepts any shell command line to execute after its arguments:

$ pifpaf run postgresql -- psql
Expanded display is used automatically.
Line style is unicode.
psql (9.5.2)
Type "help" for help.
template1=# \l
List of databases
Name │ Owner │ Encoding │ Collate │ Ctype │ Access privileges
postgres │ jd │ UTF8 │ en_US.UTF-8 │ en_US.UTF-8 │
template0 │ jd │ UTF8 │ en_US.UTF-8 │ en_US.UTF-8 │ =c/jd ↵
│ │ │ │ │ jd=CTc/jd
template1 │ jd │ UTF8 │ en_US.UTF-8 │ en_US.UTF-8 │ =c/jd ↵
│ │ │ │ │ jd=CTc/jd
(3 rows)
template1=# create database foobar;
template1=# \l
List of databases
Name │ Owner │ Encoding │ Collate │ Ctype │ Access privileges
foobar │ jd │ UTF8 │ en_US.UTF-8 │ en_US.UTF-8 │
postgres │ jd │ UTF8 │ en_US.UTF-8 │ en_US.UTF-8 │
template0 │ jd │ UTF8 │ en_US.UTF-8 │ en_US.UTF-8 │ =c/jd ↵
│ │ │ │ │ jd=CTc/jd
template1 │ jd │ UTF8 │ en_US.UTF-8 │ en_US.UTF-8 │ =c/jd ↵
│ │ │ │ │ jd=CTc/jd
(4 rows)
template1=# \q

What pifpaf does is that it runs the different commands needed to create a new PostgreSQL cluster and then run PostgreSQL on a temporary port for you. So your psql session actually connects to a temporary PostgreSQL server, that is trashed as soon as you quit psql. And all of that in less than 10 seconds, without the use of any virtualization or container technology!

You can see what it does in detail using the debug mode:

$ pifpaf --debug run mysql $SHELL
DEBUG: pifpaf.drivers: executing: ['mysqld', '--initialize-insecure', '--datadir=/var/folders/7k/pwdhb_mj2cv4zyr0kyrlzjx40000gq/T/tmpkut9bg']
DEBUG: pifpaf.drivers: executing: ['mysqld', '--datadir=/var/folders/7k/pwdhb_mj2cv4zyr0kyrlzjx40000gq/T/tmpkut9bg', '--pid-file=/var/folders/7k/pwdhb_mj2cv4zyr0kyrlzjx40000gq/T/tmpkut9bg/', '--socket=/var/folders/7k/pwdhb_mj2cv4zyr0kyrlzjx40000gq/T/tmpkut9bg/mysql.socket', '--skip-networking', '--skip-grant-tables']
DEBUG: pifpaf.drivers: executing: ['mysql', '--no-defaults', '-S', '/var/folders/7k/pwdhb_mj2cv4zyr0kyrlzjx40000gq/T/tmpkut9bg/mysql.socket', '-e', 'CREATE DATABASE test;']
$ exit
DEBUG: pifpaf.drivers: mysqld output: 2016-04-08T08:52:04.202143Z 0 [Note] InnoDB: Starting shutdown...

Pifpaf also supports my pet project Gnocchi, so you can run and try that timeseries database in a snap:

$ pifpaf run gnocchi $SHELL
$ gnocchi metric create
| Field | Value |
| archive_policy/aggregation_methods | std, count, 95pct, min, max, sum, median, mean |
| archive_policy/back_window | 0 |
| archive_policy/definition | - points: 12, granularity: 0:05:00, timespan: 1:00:00 |
| | - points: 24, granularity: 1:00:00, timespan: 1 day, 0:00:00 |
| | - points: 30, granularity: 1 day, 0:00:00, timespan: 30 days, 0:00:00 |
| archive_policy/name | low |
| created_by_project_id | admin |
| created_by_user_id | admin |
| id | ff825d33-c8c8-46d4-b696-4b1e8f84a871 |
| name | None |
| resource/id | None |
$ exit

And it takes less than 10 seconds to launch Gnocchi on my laptop using pifpaf. I'm then able to play with the gnocchi command line tool. It's by far faster than using OpenStack devstack to deloy everything the software.

Using pifpaf with your test suite

We leverage Pifpaf in several of our OpenStack telemetry related projects now, and even in tooz. For example, to run unit/functional tests with a memcached server available, a tox.ini file should like this:

commands = pifpaf run memcached -- python testr

The tests can then use the environment variable PIFPAF_MEMCACHED_PORT to connect to memcached and run tests using it. As soon as the tests are finished, memcached is killed by pifpaf and the temporary data are trashed.

We move a few OpenStack projects to using Pifpaf already, and I'm planning to make use of it in a few more. My fellow developer Mehdi Abaakouk added support for RabbitMQ in Pifpaf and added support for more advanced tests in oslo.messaging (such as failure scenarios) using Pifpaf.

Pifpaf is a very small and handy tool. Give it a try and let me know how it works for you!

April 07, 2016

Most days, at least one of the bugs I deal with requests something along the lines of "just add $FOO as a config option". In this post, I'll explain why this is usually a bad solution. First, read and keep those arguments in mind. Generally, there are two groups of configuration options - hardware options and user options. Hardware options are those that deal with specific quirks needed on some hardware, but not on other hardware. User options are those that deal with user preferences such as tapping or two-finger vs. edge scrolling.

In the old synaptics driver, we added options whenever something new came up and we tried to make those options generic. This was a big mistake. The driver now has over 70 configuration options resulting in a test matrix with a googolplex of combinations. In other words, it's completely untestable. To make a device work users often have to find the right combination of options from somewhere, write out a root-owned config file and then hope this works. Why do we still think this is acceptable? Even worse: some options are very specific to hardware but still spread in user forum examples like an STD during spring break.

In libinput, we're having none of that. When hardware doesn't work we expect a user to file a bug, we get it fixed upstream for the specific model and thus automatically fix it for all users of that device. We're leaning heavily on udev's hwdb which we have extended to correct devices when the firmware announces wrong information. This has the advantage that there is only one authoritative source of quirks a device needs to work. And we can update this as time goes by without having to worry about stale configuration options. One good example here is the custom acceleration profile that Lenovo X230 touchpads have in libinput. All in all, there is little pushback for the lack of hardware-specific configuration options and most users are fine with it once they accept the initial waiting period to get the patch into their distribution.

User-specific options are more contentious. In our opinion, some features should be configurable and others should not. Where to draw that line is of course quite undefined. For example, tapping on or off was one of the first configuration options available and that was never a cause for arguments either way (except whether the default should be on or off). Other options are more contentious. Clickpad software buttons are always on the bottom edge and their size is hardcoded (synaptics allowed almost free positioning of those buttons). Other features such as changing a two-finger tap to some other button event is not supported at all in libinput. This effectively comes down to cost. You see, whenever you write "it's just 5 lines of code to make this an option", what I think is "once the patch is reviewed and applied, I'll spend two days to write test cases and documentation. I'll need to handle any bug reports related to this, and I'm expected to make sure this option works indefinitely. Any addition of another feature may conflict with this option, so I need to make sure the right combination is possible and test cases are written." So your work ends after writing a 5 line patch, my work as maintainer merely starts. And unless it pays off long-term, the effort is not worth it. Some features make that cut, others don't if they are too much of a niche feature.

All this is of course nothing new and every software project needs to make these decisions. Input isn't even a special case here, it pales in comparison with e.g. the decisions UI designers need to make. However, in FOSS we have a tendency to think that because something is possible, it should be done. Legally, you have freedom to do almost anything with the software, so you can maintain a local fork of libinput with that extra feature applied. If that isn't acceptable, why would it be acceptable to merge the patch and expect others to shoulder the costs?

April 05, 2016

I just pushed a patch to libinput master to enable a middle button on the clickpad software buttons. Until now, our stance was that clickpads only get a left and right software button, split at the 50% mark. The reasoning is simple: touchpads only have markings for left and right buttons (if any!) and the middle button's extents are not easily discoverable if there is no visual or haptic feedback. A middle button event could however be triggered through middle button emulation, i.e. by clicking the touchpad with a finger on the left and right software button area (see the instructions here).

This is nice in theory but, as usual, reality gets in the way. Most interactions with the middle button are quick and short-lived, i.e. clicking the button once to paste. This interaction is what many touchpads are spectacularly bad at. For middle button emulation to be handled correctly, both fingers must be registered before the physical button press. The scanout rate on a touchpad is often too low and on touchpads with extremely light resistance like the T440 it's common to register the physical click before we know that there's a second finger on the touchpad. But even on a T450 and an X220 with much higher clickpad resistance I barely managed to get above 7 out of 10 correctly registered middle button events. That is simply not good enough.

So the patch I just pushed out to master enables a middle software button between the left and the right button. The exact width of the button scales with the touchpad but it's usually around 20-25mm and it's centered on the touchpad so despite the lack of visual or haptic guides it should be reliable to hit. The new behaviour is hard-coded and for now middle button emulation continues to work on touchpads. In the future, I expect I will remove middle button emulation on touchpads or at least disable it by default.

The middle button will be available in libinput 1.3.

April 03, 2016

Announcing systemd.conf 2016

We are happy to announce the 2016 installment of systemd.conf, the conference of the systemd project!

After our successful first conference 2015 we’d like to repeat the event in 2016 for the second time. The conference will take place on September 28th until October 1st, 2016 at betahaus in Berlin, Germany. The event is a few days before LinuxCon Europe, which also is located in Berlin this year. This year, the conference will consist of two days of presentations, a one-day hackfest and one day of hands-on training sessions.

The website is online now, please visit

Tickets at early-bird prices are available already. Purchase them at

The Call for Presentations will open soon, we are looking forward to your submissions! A separate announcement will be published as soon as the CfP is open.

systemd.conf 2016 is a organized jointly by the systemd community and

We are looking for sponsors! We’ve got early commitments from some of last year’s sponsors: Collabora, Pengutronix & Red Hat. Please see the web site for details about how your company may become a sponsor, too.

If you have any questions, please contact us at

March 30, 2016

When I started contributing to OpenStack, almost five years ago, it was a small ecosystem. There were no foundation, a handful of projects and you could understand the code base in a few days.

Fast forward 2016, and it is a totally different beast. The project grew to no less than 54 teams, each team providing one or more deliverable. For example, the Nova and Swift team each one produces one service and its client, whereas the Telemetry team produces 3 services and 3 different clients.

In 5 years, OpenStack went to a few IaaS projects, to 54 different teams tackling different areas related to cloud computing. Once upon a time, OpenStack was all about starting some virtual machines on a network, backed by images and volumes. Nowadays, it's also about orchestrating your network deployment over containers, while managing your application life-cycle using a database service, everything being metered and billed for.

This exponential growth has been made possible with the decision of the OpenStack Technical Committee to open the gates with the project structure reform voted at the end of 2014.

This amendment suppresses the old OpenStack model of "integrated projects" (i.e. Nova, Glance, Swift…). The big tent, as it's called, allowed OpenStack to land new projects every month, growing from the 20 project teams of December 2014 to the 54 we have today – multiplying the number of projects by 2.7 in a little more than a year.

Amazing growth, right?

And this was clearly a good change. I sat at the Technical Committee in 2013, when projects were trying to apply to be "integrated", after Ceilometer and Heat were. It was painful to see how the Technical Committee was trying to assess whether new projects should be brought in or not.

But what I notice these days, is how OpenStack is still stuck between its old and new models. On one side, it accepted a lot of new teams, but on the other side, many are considered as second-class citizens. Efforts are made to continue to build an OpenStack project that does not exist anymore.

For example, there is a team trying to define what's OpenStack core, named DefCore. That is looking to define which projects are, somehow, actually OpenStack. This leads to weird situations, such as having non-DefCore projects seeing their doc rejected from installation guides. Again, I reiterated my proposal to publish documentation as part of each project code to solve that dishonest situation and put everything on a level playing field

Some cross-projects specs are also pushed without implication of all OpenStack projects. For example, The deprecate-cli spec which proposes to deprecate command-line interface tools proposed by each project had a lot of sense in the old OpenStack sense, where the goal was to build a unified and ubiquitous cloud platform. But when you now have tens of projects with largely different scopes, this start making less sense. Still, this spec was merged by the OpenStack Technical Committee this cycle. Keystone is the first project to proudly force users to rely on openstack-client, removing its old keystone command line tool. I find it odd to push that specs when it's pretty clear that some projects (e.g. Swift, Gnocchi…) have no intention to go down that path.

Unfortunately, most specs pushed by the Technical Committee are in the realm of wishful thinking. It somehow makes sense, since only a few of the members are actively contributing to OpenStack projects, and they can't by themselves implement all of that magically. But OpenStack is no exception in the free software world and remains a do-ocracy.

There is good cross-project content in OpenStack, such as the API working group. While the work done should probably not be OpenStack specific, there's a lot that teams have learned by building various HTTP REST API with different frameworks. Compiling this knowledge and offering it as a guidance to various teams is a great help.

My fellow developer Chris Dent wrote a post about what he would do on the Technical Committee. In this article, he points to a lot of the shortcomings I described here, and his confusion between OpenStack being a product or being a kit is quite understandable. Indeed, the message broadcasted by OpenStack is still very confusing after the big tent openness. There's no enough user experience improvement being done.

The OpenStack Technical Committee election is opened for April 2016, and from what I read so far, many candidates are proposing to now clean up the big tent, kicking out projects that do not match certain criteria anymore. This is probably a good idea, there is some inactive project laying around. But I don't think that will be enough to solve the identity crisis that OpenStack is experiencing.

So this is why, once again this cycle, I will throw my hat in the ring and submit my candidacy for OpenStack Technical Committee.

March 16, 2016

So I been a little quiet on the blogging front due to a combination of a lot of things, not the least becoming a dad for the second time :)
Amelie with Aliyan

Hope to pick up pace with some blogging on our plans for Fedora Workstation in 2016 soon :)

March 13, 2016

A question that pops up with some regularity is whether libinput has a global configuration storage system, and, subsequently, why it doesn't have one. Comparisons are drawn to the X-specific tool xinput that allows to trigger all configuration options (see below though).

As maintainer of libinput, I can state that libinput will not get a configuration storage system. That job is squarely in the realm of the caller (i.e. usually the compositor and/or the desktop environment). Here are a few reasons:

First, you'd get conflicts with the caller. You need to prioritise which configuration has precendence to decide what to do when libinput and the caller disagree on a configuration item. You (the user) also have to figure why a configuration doesn't work when it's clearly enabled in one of those systems. Ok, so you can work around this by putting in a warning somewhere, provided that you make sure that the warning shows up in the right place and users know where to look. And to know when to send the warning, because again, that requires libinput to know which config has priority.

This is amplified by the lack of an autoritative support system. Speaking from X experience, the number of posts on, say, the ubuntu forums that advocate setting configuration options that haven't existed for years is quite sad. Or users advocate config snippets that set everything but the feature they claim it enables. That gets copy-pasted, etc.

Some configuration options can be incompatible with each other. If you have conflicting configuration systems it gets harder because each configuration system cannot make certain options an either/or anymore. We don't have any of these options just yet, but they may come in the future.

Over time, supported features will change. A setting may be exposed in the magic libinput config system and, a few months later, it is now also exposed by, say, GNOME. All the documentation that points to the libinput configuration is now obsolete because GNOME overrides it. Unless you're running version Foo.Bar, or maybe the patched version from $DISTRIBUTION. It gets messy quickly.

How fine-grained do you need the configuration? libinput's config API applies separately for each device. For most desktop environments it is sufficient to have configuration per device type (touchpad vs mouse vs tablet, etc.). But some exceptions may apply and a newer generic touchpad configuration may need to override device-specific older configuration. Because the newer config is designed around the new libinput version, but the older config is something you copied off a forum post from 2 years ago.

Oh, implicit in the request for a configuration system in libinput is usually also: please write and maintain that system for free, and fix any bugs in a timely manner. And I can't shake the feeling that a large number of these requests are only triggered by "my desktop doesn't expose this setting, if we get it in libinput I can force it into any desktop".

Ironically enough, the tool that is usually used as an example for how it could work is xinput. The only reason xinput works is because the xf86-input-libinput driver exposes properties and maps changes in those to the libinput API calls. i.e. everything is handled by libinput's caller, not libinput itself. Just as it should be.

Lest I be accused of only shooting down ideas here is a constructive idea: write a library or DBus-daemon that exposes some configuration storage and makes it easy to query. Then convince the various desktop environments to use that system instead their existing solutions, and bang, you have a generic configuration storage system for libinput.

March 10, 2016
The 4.5 release is close, it's time to look at what's in store for the next kernel's merge window in the Intel graphics driver.

Headline features for sure are that FBC and PSR are enabled by default. And this time around I'm really hopeful that it will stick, since Paulo&Rodrigo have done a stellar job hunting down all the corner cases, writing testcases for them all and in general making sure we have a really solid foundation for display power saving features. There's still some oddball cornercases, which means it's not yet enabled everywhere and on all platforms, but like I said: Looking really good, and the culmination of over 1 year of effort to get the code infrastructure fixed up and solid.

Another project ongoing is atomic display support, with again lots of work from Maarten and Matt and others to move things forward. Specifically Maarten adapted the load detect logic to atomic and removed a pile of legacy structures no longer needed in preparation of next steps. Matt continued to work on atomic display fifo watermark updates. Another area that has seen a lot of work in the background is runtime PM. Mika, Imre and Ville have massively improved the debugging infrastructure we have to track down rare bugs in our power status tracking code. They also merged lots of fixes in this area. Unfortunately we're not yet at a point where we can enable overall runtime PM for the device by default.

More on the feature side is pixel clock limit checks for all outputs and platforms from Mika Kahola. This is related to the work to also enable dynamic display clock scaling on Gen9, but that part is still being worked on. Ville worked a lot on how offsets and alignment are handled for display planes, all in preparation to support rotated multi-planar formats like NV12. Again a feature where a lot of hard work is required to make the final patch to enable it all look really simple.

On the plain hardware and platform enabling side Jani implemented support for version 3 of the VBT DSI descriptions, which should extend DSI panel support to all current hardware. Which includes the Surface 3.

Finally on the GEM side there have been mostly small fixes and imrpovements under the hood. Tvrtko decoupled the internal engine representation from the userspace ABI defines. He also restructed the CS irq handler code, and started to fix up some locking issues in execlist. Chris tracked down some coherency issues in the execlist interrupt handler. Dave Gordon finally started to somewhat untangle the execlist initialization logic and some of the confusion in how all the different software structures connect.

One real feature work by Alex Dai though was enabled ADS for GuC, which is some means to hand additional metadata to the GuC firmware scheduler. But since GuC based command submission isn't enabled yet, this doesn't have a direct impact.

And of course there have been bugfixes all over the place, as usual.
March 03, 2016
It's been a busy month.  I spent most of it working on the Raspberry Pi 3 support so I could have a working branch for upstream day 1.  That involved cleaning up the SDHOST driver for submission, cleaning up pinctrl DT, writing an I2C GPIO expander driver, debugging the I2C controller, fixing HDMI hotplug handling, debugging EMMC (not quite done!), scraping together some wireless firmware, and a bunch of work trying to get BT working on the UART.  I'm happy to say that on day 1 I published a branch that worked the same as a RPi2, and by the end of the day I had wireless working.  Some of the patches are now out for review, and I'll be working on cleaning up the rest in the near future.

For VC4, my big push recently has been to support some sort of panel.  Panels are really big with Raspberry Pi users, and it's the primary complaint I hear about the open driver.  The official 7" DSI touchscreen seems like the most promising device to support, since it doesn't hog all your GPIOs (letting me use my serial console) and it's quite popular.

Unfortunately, DSI isn't going well.  The DSI0 peripheral is responding to me, but while I can read from DSI1 it won't respond to any writes.  DSI1 is, unfortunately, the one that the RPi exposes on its DSI connector.  (Note: this debugging is with the panel up and running from the firmware's boot configuration).  Debug code is at drm-vc4-dsi-boot

So, since DSI1's not cooperating, I switched tasks.  I had also picked up a DPI panel using the Adafruit Kippah, and a little SPI-driven panel.  I hadn't started with DPI because hogging all the GPIOs makes kernel debugging a mostly black box experience.  The upside is that DPI is crazy simple -- set the GPIOs muxes to output from DPI, set one register in DPI, and use the same pixelvalve setup from before.  I was surprised when 2 days in I got display output.  Here it is, running HDMI and DPI at the same time:

Expect patches soon on a mailing list near you.  Until then, it's at drm-vc4-dpi-boot
March 01, 2016
Now is a high time to start discussing what you might want to do, for both student candidates and possible mentors.

Students, have a look at our project idea examples to get a feeling of what kind of projects you could propose. First you will need to contribute at least a small but significant patch to show that you understand the workflow, we have put some first task ideas together.

There are our application instructions for students. Of course all the pages are reachable from the Wayland GSoC wiki page and also the Wayland organization page.

If you want to become a mentor, please contact me or Kat, the contact details are on the Wayland GSoC wiki page.

Note, that students can also apply under the X.Org Foundation organization since Wayland is within their scope too and they also have other excellent graphics project ideas. You are welcome to submit your Wayland proposals to both projects.

Last week's libinput 1.2 release included the new graphics tablet support. This work, originally started as Lyude's 2014 GSoC project enables the use of drawing tablets through libinput (think Wacom tablets, not iPad/Android tablet).

Wacom tablets provide three input types: pen-like tools, buttons and touch rings/strips on the tablet itself, and touch. libinput's tablet support work focuses on the tool support only, pad buttons are still work in progress. Touch is already supported, either through the touchpad interfaces for external tablets (e.g. Intuos) or touchscreen interfaces for direct-touch tablets (e.g. Cintiq). So the below only talks about how to get events from tools, the pad events will be covered in a separate post when it's ready.

How do things work in the xf86-input-wacom driver, the current standard for tablet support in Linux? The driver checks the kernel device node for capabilities and creates multiple X devices, usually pen, eraser, cursor, and pad. When a pen comes into proximity the driver sends events through the pen device, etc. The pen device has all possible axes available but some (e.g. rotation) won't be used unless you're actually using an Wacom Art Pen. Unless specifically configured, all pens send through the same device, all erasers send through the same device, etc.

The libinput tablet API is a notable departure from this approach. In libinput, each tool is a separate entity generating events. A client doesn't wait for events from the tablet, it waits for events from a tool. The tablet itself is little more than an idle device. This has a couple of effects:

  • A struct libinput_tablet_tool is created on-the-fly as a tool comes into proximity and its this struct that events are tied to.
  • This means we default to per-tool handling. Two pens will always be separate and never multiplexed through one device. [1]
  • The tool can be uniquely identified [1]. It's easy to track a tool across two tablets even though this is quite a niche case.
  • A client can query the tool struct for capabilities, but not the tablet. Hence you cannot know what capabilities are available until a tool is in proximity.
  • The tool-based approach theoretically enables us to have multiple tools in proximity simultaneously, though no current hardware supports this.
Now, the advantages for the professional use-case where artists have multiple tools and/or multiple tablets is quite obvious. But it also changes some things for the average user with a single tool, specifically: the data provided by libinput is now precise and reflects the tool you're using. No more fake rotation axis on your standard pen. But you cannot query that information until the pen has been in proximity at least once. This is a change that clients will have to eventually deal with.

I just pushed the tablet support for the xf86-input-libinput driver and this driver reflects the new approach that libinput takes. When a tablet is detected, we create one device that has no axes and serves as the parent device. Once the first tool comes into proximity, a new device is created with the exact capabilities that the tool provides and the serial number in the name:

$> xinput list
⎡ Virtual core pointer id=2 [master pointer (3)]
⎜ ↳ Wacom Intuos5 touch M Pen Pen (0x99800b93) id=21 [slave pointer (2)]
⎣ Virtual core keyboard id=3 [master keyboard (2)]
↳ Wacom Intuos5 touch M Pen id=11 [slave keyboard (3)]
Device 11 is effectively mute (in the future this may become the Pad device, not sure yet). Device 21 appeared once I moved the tool into proximity. That tool has all the properties for configuration as they appear on the device. So far this works, but it means static configuration becomes more difficult. If you are still using xinit scripts or other scripts that only run once on login and not once per new device then the options may not apply correctly.

[1] provided the hardware can uniquely identify the pens

February 29, 2016
Durante semana, no mês de janeiro, eu estive no Brasil, no Rio de Janeiro para uma hackatona de design, com os designers de Endless e do projeto GNOME.

Que é o produto de Endless?

O maior produto de Endless é um sistema operativo para mini computadores que eles fazem, o Endless Mini e o Endless (Maxi?). O sistema operativo usa Linux e uma versão de GNOME com algumas changes (mudanças). O uso principal desses computadores é de ter muitas informações sem acesso à Internet. Por exemplo, tem muitos aplicativos sobre viagens, animais e etc que são diretamente dentro do computador, usando Wikipedia como fonte, e uma outra aplicação de receitas, com uma outra terceira fonte.

A hackatona em si

Os dois primeiros dias foram para viajar e visitar os usuários “beta” do Endless computadores, um dia na Rocinha, uma favela do Rio. E um outro dia em Magé, uma cidade rural do estado do Rio.
Os três últimos dias foram para discussões no escritório de Endless.


É uma coisa para fazer testes de usabilidade nos EUA e na Europa, e uma outra coisa de fazer isso num país sem habitude de usar “computadores pessoais” com Windows o MacOS X, mas muita mais habitude com celulares.

Por exemplo:
- Se se tem um mouse, vão dar dublo clique. Não é um problema com teclados sensíveis.
- Dividir a tela para ter um aplicativo ao lado de uma outra é difícil também.
- Se não se tem um acesso à Internet, não vão tentar instalar o acessar outros aplicativos que estão já no computador.
- Não estão acostumados a fechar aplicativos que não usam mais. Um sistema operativo de celular vai fechar os aplicativos antigos de maneira transparente.


Muitas coisas que o Endless ou GNOME podem mudar ou melhorar.

- GNOME tem alguns vídeos para explicar o “overview”. Um jogo ou tutorial podem ser melhor para explicar e ter certeza de que os usuários entendem.
- GNOME precisa melhorar a integração de modems celulares. ModemManager tem as funções que GNOME não usa.
- “Web” precisa de integração com detecção de malware, que ele não tem agora, mas foi uma ideia para o Summer Of Code dos anos precedentes.
- GNOME pode melhorar a primeira tela de todos os aplicativos e do sistema também, especialmente se o usuário não tem Internet para baixar conteúdo.

Muito obrigado a fundação GNOME pelas minhas passagens. Obrigado ao Endless e o Allan Day pela a organizacão. Obrigado ao meu empregador Red Hat pela oportunidade. E, enfim, obrigado à Caro pela correcção!

February 19, 2016

A little more than 3 months after our latest minor release, here is the new major version of Gnocchi, stamped 2.0.0. It contains a lot of new and exciting features, and I'd like to talk about some of them to celebrate!

You may notice that this release happens in the middle of the OpenStack release cycle. Indeed, Gnocchi does not follow that 6-months cycle, and we release whenever our code is ready. That forces us to have a more iterative approach, less disruptive for other projects and allow us to achieve a higher velocity. Applying the good old mantra release early, release often.


This version features a large documentation update. Gnocchi is still the only OpenStack server project that implements a "no doc, no merge" policy, meaning any code must come with the documentation addition or change included in the patch. The full documentation is included in the source code and available online at

Data split & compression

I've already covered this change extensively in my last blog about timeseries compression. Long story short, Gnocchi now splits timeseries archives in small chunks that are compressed, increasing speed and decreasing data size.

Measures batching support

Gnocchi now supports batching, which allow submitting several measures for different metric in a single request. This is especially useful in the context where your application tends to cache metrics for a while and is able to send them in a batch. Usage is fully documented for the REST API.

Group by support in aggregation

One of the most demanded features was the ability to do measure aggregation no resource, using a group by type query. This is now possible using the new groupby parameter to aggregation queries.

Ceph backend optimization

We improved the Ceph back-end a lot. Mehdi Abaakouk wrote a new Python binding for Ceph, called Cradox, that is going to replace the current Python rados module in the subsequent Ceph releases. Gnocchi makes usage of this new module to speed things up, making the Ceph based driver really, really faster than before. We also implemented asynchronous data deletion, which improves performance a bit.

The next step will be to run some new benchmarks like I did a few months ago and compare with the Gnocchi 1.3 series. Stay tuned!

February 16, 2016
How is an uncompressed raster image laid out in computer memory? How is a pixel represented? What are stride and pitch and what do you need them for? How do you address a pixel in memory? How do you describe an image in memory?

I tried to find a web page for dummies explaining all that, and all I could find was this. So, I decided to write it down myself with the things I see as essential.

An image and a pixel

Wikipedia explains the concept of raster graphics, so let us take that idea as a given. An image, or more precisely, an uncompressed raster image, consists of a rectangular grid of pixels. An image has a width and height measured in pixels, and the total number of pixels in an image is obviously width×height.

A pixel can be addressed with coordinates x,y after you have decided where the origin is and which way the coordinate axes go.

A pixel has a property called color, and it may or may not have opacity (or occupancy). Color is usually described as three numerical values, let us call them "red", "green", and "blue", or R, G, and B. If opacity (or occupancy) exists, it is usually called "alpha" or A. What R, G, B, and A actually mean is irrelevant when looking at how they are stored in memory. The relevant thing is that each of them is encoded with a certain number of bits. Each of R, G, B, and A is called a channel.

When describing how much memory a pixel takes, one can use units of bits or bytes per pixel. Both can be abbreviated as "bpp", so be careful which one it is and favour more explicit names in code. Also bits per channel is used sometimes, and channels can have a different number of bits per pixel each. For example, rgb565 format is 16 bits per pixel, 2 bytes per pixel, 5 bits per R and B channels, and 6 bits per G channel.

A pixel in memory

Pixels do not come in arbitrary sizes. A pixel is usually 32 or 16 bits, or 8 or even 1 bit. 32 and 16 bit quantities are easy and efficient to process on 32 and 64 bit CPUs. Your usual RGB-image with 8 bits per channel is most likely in memory with 32 bit pixels, the extra 8 bits per pixel are simply unused (often marked with X in pixel format names). True 24 bits per pixel formats are rarely used in memory because trading some memory for simpler and more efficient code or circuitry is almost always a net win in image processing. The term "depth" is often used to describe how many significant bits a pixel uses, to distinguish from how many bits or bytes it occupies in memory. The usual RGB-image therefore has 32 bits per pixel and a depth of 24 bits.

How channels are packed in a pixel is specified by the pixel format. There are dozens of pixel formats. When decoding a pixel format, you first have to understand if it is referring to an array of bytes (particularly used when each channel is 8 bits) or bits in a unit. A 32 bits per pixel format has a unit of 32 bits, that is uint32_t in C parlance, for instance.

The difference between an array of bytes and bits in a unit is the CPU architecture endianess. If you have two pixel formats, one written in array of bytes form and one written in bits in a unit form, and they are equivalent on big-endian architecture, then they will not be equivalent on little-endian architecture. And vice versa. This is important to remember when you are mapping one set of pixel formats to another, between OpenGL and anything else, for instance. Figure 1 shows three different pixel format definitions that produce identical binary data in memory.

Figure 1. Three equivalent pixel formats with 8 bits for each channel. The writing convention here is to list channels from highest to lowest bits in a unit. That is, abgr8888 has r in bits 0-7, g in bits 8-15, etc.

It is also possible, though extremely rare, that architecture endianess also affects the order of bits in a byte. Pixman, undoubtedly inheriting it from X11 pixel format definitions, is the only place where I have seen that.

An image in memory

The usual way to store an image in memory is to store its pixels one by one, row by row. The origin of the coordinates is chosen to be the top-left corner, so that the leftmost pixel of the topmost row has coordinates 0,0. First there are all the pixels of the first row, then the second row, and so on, including the last row. A two-dimensional image has been laid out as a one-dimensional array of pixels in memory. This is shown in Figure 2.

Image layout in memory.
Figure 2. The usual layout of pixels of an image in memory.
There are not only the width×height number of pixels, but each row also has some padding. The padding area is not used for storing anything, it only aligns the length of the row. Having padding requires a new concept: image stride.

Padding is often necessary due to hardware reasons. The more specialized and efficient hardware for pixel manipulation, the more likely it is that it has specific requirements on the row start and length alignment. For example, Pixman and therefore also Cairo (image backend particularly) require that rows are aligned to 4 byte boundaries. This makes it easier to write efficient image manipulations using vectorized or other instructions that may even process multiple pixels at the same time.

Stride or pitch

Image width is practically always measured in pixels. Stride on the other hand is related to memory addresses and therefore it is often given in bytes. Pitch is another name for the same concept as stride, but can be in different units.

You may have heard rules of thumb that stride is in bytes and pitch is in pixels, or vice versa. Stride and pitch are used interchangeably, so be sure of the conventions used in the code base you might be working on. Do not trust your instinct on bytes vs. pixels here.

Addressing a pixel

How do you compute the memory address of a given pixel x,y? The canonical formula is:
pixel_address = data_begin + y * stride_bytes + x * bytes_per_pixel.
The formula stars with the address of the first pixel in memory data_begin, then skips to row y while each row is stride_bytes long, and finally skips to pixel x on that row.

In C code, if we have 32 bit pixels, we can write
uint32_t *p = data_begin;
p += y * stride_bytes / sizeof(uint32_t);
p += x;
Notice, how the type of p affects the computations, counting in units of uint32_t instead of bytes.

Let us assume the pixel format in this example is argb8888 which is defined in bits of a unit form, and we want to extract the R value:
uint32_t v = *p;
uint8_t r = (v >> 16) & 0xff;
Finally, Figure 3 gives a cheat sheet.

Figure 3. How to compute the address of a pixel.

Now we have covered the essentials, and you can stop reading. The rest is just good to know.

Not everyone has the "right" way up

In the above we have assumed that the image origin is the top-left corner, and rows are stored top-most first. The most notable exception to this is the OpenGL API, which defines image data to be in bottom-most row first. (Traditionally also BMP file format does this.)

Multi-planar formats

In the above, we have talked about single-planar formats. That means that there is only a single two-dimensional array of pixels forming an image. Multi-planar formats use two or more two-dimensional arrays for forming an image.

A simple example with an RGB-image would be to store R channel in the first plane (2D-array) and GB channels in the second plane. Pixels on the first plane have only R value, while pixels on the second plane have G and B values. However, this example is not used in practice.

Common and real use cases for multi-planar images are various YUV color formats. Y channel is stored on the first plane, and UV channels are stored on the second plane, for instance. A benefit of this is that e.g. the UV plane can be sub-sampled - its resolution could be only half of the plane with Y, saving some memory.

Tiled formats

If you have read about GPUs, you may have heard of tiling or tiled formats (tiled renderer is a different thing). These are special pixel layouts, where an image is not stored row by row but a rectangular block by block. Tiled formats are far too wild and various to explain here, but if you want a taste, take a look at Nouveau's documentation on G80 surface formats.
February 15, 2016
As promised in the post introducing my recent work on Patchwork, I've written some more in-depth documentation to explain how to hook testing to Patchwork. I've also realized that a blog post might not be the best place to put that documentation and opted to put it in the proper manual:

Happy reading!

The first major version of the scalable timeserie database I work on, Gnocchi was a released a few months ago. In this first iteration, it took a rather naive approach to data storage. We had little ideas about if and how our distributed back-ends were going to be heavily used, so we stuck to the code of the first proof-of-concept written a couple of years ago.

Recently we got more feedbacks from our users, ran a few benchmarks. That gave us enough feedback to start investigating in improving our storage strategy.

Data split

Up to Gnocchi 1.3, all data for a single metric are stored in a single gigantic file per aggregation method (min, max, average…). This means that the file can grow to several megabytes in size, which make it slow to manipulate. For the next version of Gnocchi, our first work has been to rework that storage and split the data into smaller parts.

Gnocchi Carbonara archives split

The diagram above shows how data are organized inside Gnocchi. Until version 1.3, there would have been only one file for each aggregation methods.

In the upcoming 2.0 version, Gnocchi will split all these data into smaller parts, where each data split is stored in a file/object. This allows to manipulate smaller pieces of data and to increase the parallelism of the CRUD operations on the back-end – leading to large speed improvement.

In order to split timeseries into several chunks, Gnocchi defines a maximum number of N points to keep per chunk, to limit their maximum size. It then defines a hash function that produces a non-unique key for any timestamp. It makes it easy to find in which chunk any timestamp should be stored or retrieved.

Data compression

Up to Gnocchi 1.3, the data stored for each metric is simply serialized using msgpack, a fast and small serialization format. Though, this format does not provide any compression. That means that storing data points needs 8 bytes for a timestamp (64 bits timestamp with nanosecond precision) and 8 bytes for a value (64 bits double-precision floating-point), plus some overhead (extra information and msgpack itself).

After looking around on how to compress all these measures, I stumbled upon a paper from some Facebook engineers called about Gorilla, their in-memory timeserie database, entitled "Gorilla: A Fast, Scalable, In-Memory Time Series Database". For reference, part of this encoding is also used by InfluxDB in its new storage engine.

The first technique I implemented is easy enough, and it's inspired from delta-of-delta encoding. Instead of storing each timestamp for each data point, and since all the data points are aggregated on a regular interval, we transpose points to be the time difference divided by the interval. For example, the suite of timestamps timestamps = [41230, 41235, 41240, 41250, 41255] is encoded into timestamps = [41230, 1, 1, 2, 1], interval = 5. This allows regular compression algorithms to reduce the size of the integer list using run-length encoding.

To actually compress the values, I tried two different algorithms:

  • [LZ4](, a fast compression/decompression algorithm

  • The XOR based compression scheme described in the Gorilla paper mentioned above – that I had to implement myself. For reference, it also exists a Go implementation in go-tsz.

I then benchmarked these solutions:

Gnocchi Carbonara compression speed

The XOR algorithm implemented in Python is pretty slow, compared to LZ4. Truth is that python-lz4 is fully implemented in C, which makes it fast. I've profiled my XOR implementation in Python, to discover that one operation took 20 % of the time: count_lead_and_trail_zeroes, which is in charge of counting the number of leading and trailing zeroes in a binary number.

Gnocchi Carbonara compression XOR profiling

I tried 2 Python implementations of the same algorithm (and submitted them to my friend and Python developer Victor Stinner by the way).

The first version using string search with .index() is 10× faster than the second one that only do integer computation. Ah, Python… As Victor explained, each Python operation is slow and there's a lot in the second version, whereas .index() is implemented in C and really well optimized and only needs 2 Python operations.

Finally, I ended up optimizing that code by leveraging cffi to use directly ffsll() and flsll(). That decreased the run-time of count_lead_and_trail_zeroes by 45 %, making the entire XOR compression code speed increased by a small 7 %. This is not enough to catch up with LZ4 speed. At this stage, the only solution to achieve a high-speed would probably to go with a full C implementation.

Gnocchi Carbonara compression size

Considering the compression ratio of the different algorithms, they are pretty much identical. The worst case scenario (random values) for LZ4 compress down to 9 bytes per data point, whereas XOR can go down to 7.38 bytes per data point. In general XOR encoding beats LZ4 by 15 %, except for cases where all values are 0 or 1. However, LZ4 is faster than XOR by a factor of 4×-70× depending on cases.

That means that we'll use LZ4 for data compression in Gnocchi 2.0. It's possible that we could achieve as fast compression/decompression algorithm, but I don't think it's worth the effort right now – it'd represent a lot of code to write and to maintain.

Just FYI:

There is now a mailing list for virglrenderer library hosted at freedesktop. It is to be used for development discussion of the virgl->GL renderer library and for patches to it.

The git tree is also now hosted at:

My personal repo will only be used for my own development stuff.

I'll also get a freedesktop patchwork instance set up for this asap.

I'm also contemplating bugzilla vs phabricator.

February 13, 2016

The mailing-list problem

Many software projects use mailing-lists, which usually means mailman, not only for discussions around that project, but also for code contributions. A lot of open source projects work that way, including the one I interact with the most, the Linux kernel. A contributor sends patches to a mailing list, these days using git send-email, and waits for feedback or for his/her patches to be picked up for inclusion if fortunate enough.

Problem is, mailing-lists are awful for code contribution.

A few of the issues at hand:
  • Dealing with patches and emails can be daunting for new contributors,
  • There's no feedback that someone will look into the patch at some point,
  • There's no tracking of which patch has been processed (eg. included into the tree). A shocking number of patches are just dropped as a direct consequence,
  • There's no way to add metadata to a submission. For instance, we can't assign a reviewer from a pool of people working on the project. As a result, review is only working thanks to the good will of people. It's not necessarily a bad thing, but it doesn't work in a corporate environment with deadlines,
  • Mailing-lists are all or nothing: one subscribes to the activity of the full project, but may only care about following the progress of a couple of patches,
  • There's no structure at all actually, it's all just emails,
  • No easy way to hook continuous integration testing,
  • The tools are really bad any time they need to interact with the mailing-list: try to send a patch as a reply to a review comment, addressing it. It starts with going to look at the headers of the review email to copy/paste its Message-ID, followed by an arcane incantation:
    $ git send-email --to=<mailing-list> --cc=<reviewer> \
    --in-reply-to=<reviewer-mail-message-id> \
    --reroll-count 2 -1 HEAD~2

Alternative to mailing-lists

Before mentioning Patchwork, it's worth saying that a project can merely decide to switch to using something else than a mailing-list to handle code contributions; To name a few: Gerrit, Phabricator, Github, Gitlab, Crucible.

However, there can be some friction preventing the adoption those tools. People have built their own workflow around mailing-lists for years and it's somewhat difficult to adopt anything else over night. Projects can be big with no clear way to make decisions, so sticking to mailing-lists can just be the result of inertia.

The alternatives also have problems of their own and there's no clear winner, nothing like how git took over the world.


So, the path of least resistance is to keep mailing-lists. Jemery Kerr had the idea to augment mailing-lists with a tool that would track the activity there and build a database of patches and their status (new, reviewed, merged, dropped, ...). Patchwork was born.

Here are some Patchwork instances in the wild:

The KMS and DRI Linux subsystems are using to host their mailing-lists, which includes the i915 Intel driver, project I've been contributing to since 2012. We have an instance of Patchwork there, and, while somewhat useful, the tool fell short of what we really wanted to do with our code contribution process.

Patches are welcome!

So? it was time to do something about the situation and I started improving Patchwork to answer some of the problems outlined above. Given enough time, it's possible to help on all fronts.

The code can be found on github, along with the current list of issues and enhancements we have thought about. I also maintain's instance for the graphics team at Intel, but also any project that would like to give it a try.

Design, Design, Design

First things first, we improved how Patchwork looks and feels. Belén, of OpenEmbedded/Yocto fame, has very graciously spent some of her time to rethink how the interaction should behave.

Before, ...

... and after!

There is still a lot of work remaining to roll out the new design and the new interaction model on all of Patchwork. A glimpse of what that interaction looks like so far:


One thing was clear from the start: I didn't want to have Patches as the main object tracked, but Series, a collection of patches. Typically, developing a  new feature requires more than one patch, especially with the kernel where it's customary to write a lot of orthogonal smaller commits rather than a big (and often all over the place) one. Single isolated commits, like a small bug fix, are treated as a series of one patch.

But that's not all. Series actually evolve over time as the developer answers review comments and the patch-set matures. Patchwork also tracks that evolution, creating several Revisions for the same series. This colour management series from Lionel shows that history tracking (beware, this is not the final design!).

I have started documenting what Patchwork can understand. Two ways can be used to trigger the creation of a new revision: sending a revised patch as a reply to the reviewer email or resending the full series with a similar cover letter subject.

There are many ambiguous cases and some others cases not really handled yet, one of them being sending a series as a reply to another series. That can be quite confusing for the patch submitter but the documented flows should work.


Next is dusting off Patchwork's XML-RPC API. I wanted to be able to use the same API from both the web pages and git-pw, a command line client.

This new API is close to complete enough to replace the XML-RPC one and already offers a few more features (eg. testing integration). I've also been carefully documenting it.


Rob Clark had been asking for years for a better integration with git from the Patchwork's command line tool, especially sharing its configuration file. There also are a number of git "plugins" that have appeared to bridge git with various tools, like git-bz or git-phab.

Patchwork has now his own git-pw, using the REST API. There, again, more work is needed to be in an acceptable shape, but it can already be quite handy to, for instance, apply a full series in one go:

$ git pw apply -s 122
Applying series: DP refactoring v2 (rev 1)
Applying: drm/i915: Don't pass *DP around to link training functions
Applying: drm/i915: Split write of pattern to DP reg from intel_dp_set_link_train
Applying: drm/i915 Call get_adjust_train() from clock recovery and channel eq
Applying: drm/i915: Move register write into intel_dp_set_signal_levels()
Applying: drm/i915: Move generic link training code to a separate file

Testing Integration

This is what kept my busy the last couple of months: How to integrate patches sent to a mailing-list with Continuous Integration systems. The flow I came up with is not very complicated but a picture always helps:

Hooking tests to Patchwork

Patchwork is exposing an API so mailing-lists are completely abstracted from systems using that API. Both retrieving the series/patches to test and sending back test results is done through HTTP. That makes testing systems fairly easy to write.

Tomi Sarvela hooked our test-suite, intel-gpu-tools, to patches sent to intel-gfx and we're now gating patch acceptance to the kernel driver with the result of that testing.

Of course, it's not that easy. In our case, we've accumulated some technical debt in both the driver and the test suite, which means it will take time to beat both into be a fully reliable go/no-go signal. People have been actively looking at improving the situation though (thanks!) and I have hope we can reach that reliability sooner rather than later.

As a few words of caution about the above, I'd like to remind everyone that the devil always is in the details:
  • We've restricted the automated testing to a subset of the tests we have (Basic Acceptance Tests aka BATs) to provide a quick answer to developers, but also because some of our tests aren't well bounded,
  • We have no idea how much code coverage that subset really exercises, playing with the kernel gcov support would be interesting for sure,
  • We definitely don't deal with the variety of display sinks (panels and monitors) that are present in the wild.
This means we won't catch all the i915 regressions. Time will definitely improve things as we connect more devices to the testing system and fix our tests and driver.

Anyway, let's leave i915 specific details for another time. A last thing about this testing integration is that Patchwork can be configured to send emails back to the submitter/mailing-list with some test results. As an example, I've written a integration that will tell people to fix their patches without the need of a reviewer to do it. I know, living in the future.

For more in-depth documentation about continuous testing with Patchwork, see the testing section of the manual.

What's next?

This blog post is long enough as is, let's finish by the list of things I'd like to be in a acceptable state before I'll happily tag a first version:
  • Series support without without known bugs
  • REST API and git pw able to replace XML-RPC and pwclient
  • Series, Patches and Bundles web pages ported to the REST API and the new filter/action interaction.
  • CI integration
  • Patch and Series life cycle redesigned with more automatic state changes (ie. when someone gives a reviewed-by tag, the patch state should change to reviewed)
There are plenty of other exciting ideas captured in the github issues for when this is done.


February 10, 2016
I do a lot of cross driver subsytem refactorings, and DRM has lots of drivers that only run on ARM. Which means I routinely break a leg or arm since at least in the past cross-compiling was somehow always super painful. But I've just learned (thanks to Daniel Stone) that cross-compiling this stuff has become real easy, so here's my handy script for this. This assumes Debian, but the difference is just in installing a different cross-compiler toolchain.

First get the tooling:

$ sudo apt-get install gcc-arm-linux-gnueabihf
Then create another git checkout. I prefer the recently merged worktree support, since with that all your branches and remotes transparently work in the new checkout, too.

~/kernel/src/ $ git worktree add ../armhf HEAD
With that we're all set up. For building any $branch wrap the following lines into a scrip:

$ cd ~/kernel/armhf
$ git checkout --detach $branch
$ make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- multi_v7_defconfig zImage modules

I'm using --detach to avoid complaints from the git worktree code that I've checked out a branch already in the main repo. Note: Never accidentally run plain make in the ARM build directory - mixing up architectures seriously confuses the kernel build system.
February 08, 2016
Back from another awesome, still the best general Linux conference even the second time around. I've also done a talk about all the shiny new atomic display support in the kernel, and the great LCA AV team has uploaded the video already, and the slides are here.
February 06, 2016

Last week-end, I was in Brussels, Belgium for the FOSDEM, one of the greatest open source developer conference. I was not sure to go there this year (I already skipped it in 2015), but it turned out I was requested to do a talk in the shared Lua & GNU Guile devroom.

As a long time Lua user and developer, and a follower of GNU Guile for several years, the organizer asked me to run a talk that would be a link between the two languages.

I've entitled my talk "How awesome ended up with Lua and not Guile" and gave it to a room full of interested users of the awesome window manager 🙂.

We continued with a panel discussion entitled "The future of small languages Experience of Lua and Guile" composed of Andy Wingo, Christopher Webber, Ludovic Courtès, Etiene Dalcol, Hisham Muhammaad and myself. It was a pretty interesting discussion, where both language shared their views on the state of their languages.

It was a bit awkward to talk about Lua & Guile whereas most of my knowledge was years old, but it turns out many things didn't change. I hope I was able to provide interesting hindsight to both community. Finally, it was a pretty interesting FOSDEM to me, and it was a long time I didn't give talk here, so I really enjoyed it. See you next year!

February 04, 2016

The slides from my FOSDEM talk are now available.

The video of the talk will eventually be available. It takes a lot of time and hard work to prepare all of the videos. The notes in the slides are probably more coherent that I am in the video. There were a few good questions at the end, so you may want to fast forward to that.

I am planning to post the source for the partial Amiga simulator next week. When I do, I will also have another blog post. Watch this space...

February 03, 2016

I've moved my blog, you can read my new posts here

January 30, 2016

Over the last few days I've been at the GNOME Developer Experience hackfest in Brussels, looking into xdg-app and how best to use it in Debian and Debian derivatives.

xdg-app is basically a way to run "non-core" software on Linux distributions, analogous to apps on Android and iOS. It doesn't replace distributions like Debian or packaging systems, but it adds a layer above them. It's mostly aimed towards third-party apps obtained from somewhere that isn't your distribution vendor, aiming to address a few long-standing problems in that space:

  • There's no single ABI that can be called "a standard Linux system" in the same way there would be for Windows or OS X or Android or whatever, apart from LSB which is rather limited. Testing that a third-party app "works on Linux", or even "works on stable Linux releases from 2015", involves a combinatorial explosion of different distributions, desktop environments and local configurations. Steam uses the Steam Runtime, a chroot environment closely resembling Ubuntu 12.04 LTS; other vendors tend to test on a vaguely recent Ubuntu LTS and leave it at that.

  • There's no widely-supported mechanism for installing third-party applications as an ordinary user. used to distribute Ubuntu- and Debian-compatible .deb files, but installing a .deb involves running arbitrary vendor-supplied scripts as root, which should worry anyone who wants any sort of privilege-separation. (They have now switched to executable self-extracting installers, which involve running arbitrary vendor-supplied scripts as an ordinary user... better, but not perfect.)

  • Relatedly, the third-party application itself runs with the user's full privileges: a malicious or security-buggy third-party application can do more or less anything, unless you either switch to a different uid to run third-party apps, or use a carefully-written, app-specific AppArmor profile or equivalent.

To address the first point, each application uses a specified "runtime", which is available as /usr inside its sandbox. This can be used to run application bundles with multiple, potentially incompatible sets of dependencies within the same desktop environment. A runtime can be updated within its branch - for instance, if an application uses the "GNOME 3.18" runtime (consisting of a basic Linux system, the GNOME 3.18 libraries, other related libraries like Mesa, and their recursive dependencies like libjpeg), it can expect to see minor-version updates from GNOME 3.18.x (including any security updates that might be necessary for the bundled libraries), but not a jump to GNOME 3.20.

To address the second issue, the plan is for application bundles to be available as a single file, containing metadata (such as the runtime to use), the app itself, and any dependencies that are not available in the runtime (which the app vendor is responsible for updating if necessary). However, the primary way to distribute and upgrade runtimes and applications is to package them as OSTree repositories, which provide a git-like content-addressed filesystem, with efficient updates using binary deltas. The resulting files are hard-linked into place.

To address the last point, application bundles run partially isolated from the wider system, using containerization techniques such as namespaces to prevent direct access to system resources. Resources from outside the sandbox can be accessed via "portal" services, which are responsible for access control; for example, the Documents portal (the only one, so far) displays an "Open" dialog outside the sandbox, then allows the application to access only the selected file.

xdg-app for Debian

One thing I've been doing at this hackfest is improving the existing Debian/Ubuntu packaging for xdg-app (and its dependencies ostree and libgsystem), aiming to get it into a state where I can upload it to Debian experimental. Because xdg-app aims to be a general freedesktop project, I'm currently intending to make it part of the "Utopia" packaging team alongside projects like D-Bus and polkit, but I'm open to suggestions if people want to co-maintain it elsewhere.

In the process of updating xdg-app, I sent various patches to Alex, mostly fixing build and test issues, which are in the new 0.4.8 release.

I'd appreciate co-maintainers and further testing for this stuff, particularly ostree: ostree is primarily a whole-OS deployment technology, which isn't a use-case that I've tested, and in particular ostree-grub2 probably doesn't work yet.

Source code:

Binaries (no trust path, so only use these if you have a test VM):

  • deb xdg-app main

The "Hello, World" of xdg-apps

Another thing I set out to do here was to make a runtime and an app out of Debian packages. Most of the test applications in and around GNOME use the "freedesktop" or "GNOME" runtimes, which consist of a Yocto base system and lots of RPMs, are rebuilt from first principles on-demand, and are extensive and capable enough that they make it somewhat non-obvious what's in an app or a runtime.

So, here's a step-by-step route through xdg-app, first using typical GNOME instructions, but then using the simplest GUI app I could find - xvt, a small xterm clone. I'm using a Debian testing (stretch) x86_64 virtual machine for all this. xdg-app currently requires systemd-logind to put users and apps in cgroups, either with systemd as pid 1 (systemd-sysv) or systemd-shim and cgmanager; I used the default systemd-sysv. In principle it could work with plain cgmanager, but nobody has contributed that support yet.

Demonstrating an existing xdg-app

Debian's kernel is currently patched to be able to allow unprivileged users to create user namespaces, but make it runtime-configurable, because there have been various security issues in that feature, making it a security risk for a typical machine (and particularly a server). Hopefully unprivileged user namespaces will soon be secure enough that we can enable them by default, but for now, we have to do one of three things to let xdg-app use them:

  • enable unprivileged user namespaces via sysctl:

    sudo sysctl kernel.unprivileged_userns_clone=1
  • make xdg-app root-privileged (it will keep CAP_SYS_ADMIN and drop the rest):

    sudo dpkg-statoverride --update --add root root 04755 /usr/bin/xdg-app-helper
  • make xdg-app slightly less privileged:

    sudo setcap cap_sys_admin+ep /usr/bin/xdg-app-helper

First, we'll need a runtime. The standard xdg-app tutorial would tell you to download the "GNOME Platform" version 3.18. To do that, you'd add a remote, which is a bit like a git remote, and a bit like an apt repository:

$ wget
$ xdg-app remote-add --user --gpg-import=gnome-sdk.gpg gnome \

(I'm ignoring considerations like trust paths and security here, for brevity; in real life, you'd want to obtain the signing key via https and/or have a trust path to it, just like you would for a secure-apt signing key.)

You can list what's available in a remote:

$ xdg-app remote-ls --user gnome

The Platform runtimes are what we want here: they are collections of runtime libraries with which you can run an application. The Sdk runtimes add development tools, header files, etc. to be able to compile apps that will be compatible with the Platform.

For now, all we want is the GNOME 3.18 platform:

$ xdg-app install --user gnome org.gnome.Platform 3.18

Next, we can install an app that uses it, from Alex Larsson's nightly builds of a subset of GNOME. The server they're on doesn't have a great deal of bandwidth, so be nice :-)

$ wget
$ xdg-app remote-add --user --gpg-import=nightly.gpg nightly \
$ xdg-app install --user nightly org.mypaint.MypaintDevel

We now have one app, and the runtime it needs:

$ xdg-app list
$ xdg-app run org.mypaint.MypaintDevel
[you see a GUI window]

Digression: what's in a runtime?

Behind the scenes, xdg-app runtimes and apps are both OSTree trees. This means the ostree tool, from the package of the same name, can be used to inspect them.

$ sudo apt install ostree
$ ostree refs --repo ~/.local/share/xdg-app/repo

A "ref" has roughly the same meaning as in git: something like a branch or a tag. ostree can list the directory tree that it represents:

$ ostree ls --repo ~/.local/share/xdg-app/repo \
d00755 0 0      0 /
-00644 0 0    493 /metadata
d00755 0 0      0 /files
$ ostree ls --repo ~/.local/share/xdg-app/repo \
    runtime/org.gnome.Platform/x86_64/3.18 /files
d00755 0 0      0 /files
l00777 0 0      0 /files/local -> ../var/usrlocal
l00777 0 0      0 /files/sbin -> bin
d00755 0 0      0 /files/bin
d00755 0 0      0 /files/cache
d00755 0 0      0 /files/etc
d00755 0 0      0 /files/games
d00755 0 0      0 /files/include
d00755 0 0      0 /files/lib
d00755 0 0      0 /files/lib64
d00755 0 0      0 /files/libexec
d00755 0 0      0 /files/share
d00755 0 0      0 /files/src

You can see that /files in a runtime is basically a copy of /usr. This is not coincidental: the runtime's /files gets mounted at /usr inside the xdg-app container. There is also some metadata, which is in the ini-like syntax seen in .desktop files:

$ ostree cat --repo ~/.local/share/xdg-app/repo \
    runtime/org.gnome.Platform/x86_64/3.18 /metadata

[Extension org.freedesktop.Platform.GL]

[Extension org.freedesktop.Platform.Timezones]

[Extension org.gnome.Platform.Locale]


Looking at an app, the situation is fairly similar:

$ ostree ls --repo ~/.local/share/xdg-app/repo \
d00755 0 0      0 /
-00644 0 0    258 /metadata
d00755 0 0      0 /export
d00755 0 0      0 /files

This time, /files maps to what will become /app for the application, which was compiled with --prefix=/app:

$ ostree ls --repo ~/.local/share/xdg-app/repo \
    app/org.mypaint.MypaintDevel/x86_64/master /files
d00755 0 0      0 /files
-00644 0 0   4599 /files/manifest.json
d00755 0 0      0 /files/bin
d00755 0 0      0 /files/lib
d00755 0 0      0 /files/share

There is also a /export directory, which is made visible to the host system so that the contained app can appear as a "first-class citizen" in menus:

$ ostree ls --repo ~/.local/share/xdg-app/repo \
    app/org.mypaint.MypaintDevel/x86_64/master /export
d00755 0 0      0 /export
d00755 0 0      0 /export/share
user@debian:~$ ostree ls --repo ~/.local/share/xdg-app/repo \
    app/org.mypaint.MypaintDevel/x86_64/master /export/share
d00755 0 0      0 /export/share
d00755 0 0      0 /export/share/app-info
d00755 0 0      0 /export/share/applications
d00755 0 0      0 /export/share/icons
user@debian:~$ ostree ls --repo ~/.local/share/xdg-app/repo \
    app/org.mypaint.MypaintDevel/x86_64/master /export/share/applications
d00755 0 0      0 /export/share/applications
-00644 0 0    715 /export/share/applications/org.mypaint.MypaintDevel.desktop
$ ostree cat --repo ~/.local/share/xdg-app/repo \
    app/org.mypaint.MypaintDevel/x86_64/master \
[Desktop Entry]
Name=(Nightly) MyPaint
Exec=mypaint %f
Comment=Painting program for digital artists
GenericName=Raster Graphics Editor
GenericName[fr]=Éditeur d'Image Matricielle

Again, there's some metadata:

$ ostree cat --repo ~/.local/share/xdg-app/repo \
    app/org.mypaint.MypaintDevel/x86_64/master /metadata


[Extension org.mypaint.MypaintDevel.Debug]

Building a runtime, probably the wrong way

The way in which the reference/demo runtimes and containers are generated is... involved. As far as I can tell, there's a base OS built using Yocto, and the actual GNOME bits come from RPMs. However, we don't need to go that far to get a working runtime.

In preparing this runtime I'm probably completely ignoring some best-practices and tools - but it works, so it's good enough.

First we'll need a repository:

$ sudo install -d -o$(id -nu) /srv/xdg-apps
$ ostree init --repo /srv/xdg-apps

I'm just keeping this local for this demonstration, but you could rsync it to a web server's exported directory or something - a lot like a git repository, it's just a collection of files. We want everything in /usr because that's what xdg-app expects, hence usrmerge:

$ sudo mount -t tmpfs -o mode=0755 tmpfs /mnt
$ sudo debootstrap --arch=amd64 --include=libx11-6,usrmerge \
    --variant=minbase stretch /mnt
$ sudo mkdir /mnt/runtime
$ sudo mv /mnt/usr /mnt/runtime/files

This obviously has a lot of stuff in it that we don't need - most obviously init, apt and dpkg - but it's Good Enough™.

We will also need some metadata. This is sufficient:

$ sudo sh -c 'cat > /mnt/runtime/metadata'

That's a runtime. We can commit it to ostree, and generate xdg-app metadata:

$ ostree commit --repo /srv/xdg-apps \
    --branch runtime/org.debian.Debootstrap/x86_64/8.20160130 \
$ fakeroot ostree commit --repo /srv/xdg-apps \
    --branch runtime/org.debian.Debootstrap/x86_64/8.20160130
$ fakeroot xdg-app build-update-repo /srv/xdg-apps

(I'm not sure why ostree and xdg-app report "Operation not permitted" when we aren't root or fakeroot - feedback welcome.)

build-update-repo would presumably also be the right place to GPG-sign your repository, if you were doing that.

We can add that as another xdg-app remote:

$ xdg-app remote-add --user --no-gpg-verify local file:///srv/xdg-apps
$ xdg-app remote-ls --user local

Building an app, probably the wrong way

The right way to build an app is to build a "SDK" runtime - similar to that platform runtime, but with development files and tools - and recompile the app and any missing libraries with ./configure --prefix=/app && make && make install. I'm not going to do that, because simplicity is nice, and I'm reasonably sure xvt doesn't actually hard-code /usr into the binary:

$ install -d xvt-app/files/bin
$ sudo apt-get --download-only install xvt
$ dpkg-deb --fsys-tarfile /var/cache/apt/archives/xvt_2.1-20.1_amd64.deb \
    | tar -xvf - ./usr/bin/xvt
$ mv usr xvt-app/files

Again, we'll need metadata, and it's much simpler than the more production-quality GNOME nightly builds:

$ cat > xvt-app/metadata

$ fakeroot ostree commit --repo /srv/xdg-apps \
    --branch app/org.debian.packages.xvt/x86_64/2.1-20.1 xvt-app
$ fakeroot xdg-app build-update-repo /srv/xdg-apps
Updating appstream branch
No appstream data for runtime/org.debian.Debootstrap/x86_64/8.20160130
No appstream data for app/org.debian.packages.xvt/x86_64/2.1-20.1
Updating summary
$ xdg-app remote-ls --user local

The obligatory screenshot

OK, good, now we can install it:

$ xdg-app install --user local org.debian.Debootstrap 8.20160130
$ xdg-app install --user local org.debian.packages.xvt 2.1-20.1
$ xdg-app run --branch=2.1-20.1 org.debian.packages.xvt

xvt in a container

and you can play around with the shell in the xvt and see what you can and can't do in the container.

I'm sure there were better ways to do most of this, but I think there's value in having such a simplistic demo to go alongside the various GNOMEish apps.


Thanks to all those!

January 25, 2016

libinput 1.1.5 has a change in how we deal with semi-mt touchpads, in particular: interpretation of touch points will cease and we will rely on the single touch position and the BTN_TOOL_* flags instead to detect multi-finger interaction. For most of you this will have little effect, even if you have a semi-mt touchpad. As a reminder: semi-mt touchpads are those that can detect the bounding box of two-finger interactions but cannot identify which finger is which. This provides some ambiguity, a pair of touch points at x1/y1 and x2/y2 could be a physical pair of touches at x1/y2 and x2/y1. More importantly, we found issues with semi-mt touchpads that go beyond the ambiguity and reduce the usability of the touchpoints.

Some devices have an extremely low resolution when two-fingers are down (see Bug 91135), the data is little better than garbage. We have had 2-finger scrolling disabled on these touchpads since before libinput 1.0. More recently, Bug 93583 showed that some semi-mt touchpads do not assign the finger positions for some fingers, especially when three fingers are down. This results in touches defaulting to position 0/0 which triggers palm detection or results in scroll jumps, neither of which are helpful. Other semi-mt touchpads assign a straightforward 0/0 as position data and don't update until several events later (see Red Hat Bug 1295073). libinput is not particularly suited to handle this, and even if it did, the touchpad's reaction to a three-finger tap would be noticeably delayed.

In light of these problems, and since these affect all three big semi-mt touchpad manufacturers we decided to drop back and handle semi-mt touchpads as single-finger touchpads with extra finger capability. This means we track only one touchpoint but detect two- and three-finger interactions. Two-finger scrolling is still possible and so is two- and three-finger tapping or the clickfinger behaviour. What isn't possible anymore are pinch gestures and some of the built-in palm detection is deactivated. As mentioned above, this is unlikely to affect you too much, but if you're wondering why gestures don't work on your semi-mt device: the data is garbage.

This question turns up a lot, on the irc channel, mailing lists, forums, your local Stammtisch and at weddings. The correct answer is: this is the wrong question. And I'll explain why in this post. Note that I'll be skipping over a couple of technical bits, if you notice those then you're probably not the person that needs to ask the question in the first place.

On your current Linux desktop, right now, you have at least three processes running: the X server, a window manager/compositor and your web browser. The X server is responsible for rendering things to the screen and handling your input. The window manager is responsible for telling the X server where to render the web browser window. Your web browser is responsible for displaying this post. The X server and the window manager communicate over the X protocol, the X server and the web browser do so too. The browser and the window manager communicate through X properties using the X server as a middle man. That too is done via the X protocol. Note: This is of course a very simplified view.

Wayland is a protocol and it replaces the X protocol. Under Wayland, you only need two processes: a compositor and your web browser. The compositor is effectively equivalent to the X server and window manager merged into one thing, and it communicates with the web browser over the Wayland protocol. For this to work you need the compositor and the web browser to be able to understand the Wayland protocol.

This is why the question "is wayland ready yet" does not make a lot of sense. Wayland is the communication protocol and says very little about the implementation of the two sides that you want to communicate.

Let's assume a scenario where we all decide to switch from English to French because it sounds nicer and English was designed in the 80s when ASCII was king so it doesn't support those funky squiggles that the French like to put on every second character. In this scenario, you wouldn't ask "Is French ready yet?" If no-one around you speaks French yet, then that's not the language not being ready, the implementation (i.e. the humans) aren't ready. Maybe you can use French in a restaurant, but not yet in the supermarket. Maybe one waiter speaks both English and French, but the other one French only. So whether you can use French depends very much on the situation. But everyone agrees that eventually we'll all speak French, even though English will hang around for ages until it finally falls out of use. And those squiggles are so cute!

Wayland is the same. The protocol is stable and has been for a while. But not every compositor and/or toolkit/application speak Wayland yet, so it may not be sufficient for your use-case. So rather than asking "Is Wayland ready yet", you should be asking: "Can I run GNOME/KDE/Enlightenment/etc. under Wayland?" That is the right question to ask, and the answer is generally "It depends what you expect to work flawlessly." This also means "people working on Wayland" is often better stated as "people working on Wayland support in ....".

An exception to the above: Wayland as a protocol defines what you can talk about. As a young protocol (compared to X with 30 years worth of extensions) there are things that should be defined in the protocol but aren't yet. For example, Wacom tablet support is currently missing. Those are the legitimate cases where you can say Wayland isn't ready yet and where people are "working on Wayland". Of course, once the protocol is agreed on, you fall back to the above case: both sides of the equation need to implement the new protocol before you can make use of it.

Update 25/01/15: Matthias' answer to Is GNOME on Wayland ready yet?

January 24, 2016
I’ve had this post sitting in my drafts for the last 7 months. The code is stale, but the concepts are correct. I had intended to add some pictures before posting, but it’s clear that won’t happen now. Words are better than nothing, I suppose… Recently I pushed an intel-gpu-tool tool for modifying GPU frequencies. […]


GPU mirroring provides a mechanism to have the CPU and the GPU use the same virtual address for the same physical (or IOMMU) page. An immediate result of this is that relocations can be eliminated. There are a few derivative benefits from the removal of the relocation mechanism, but it really all boils down to that. Other people call it other things, but I chose this name before I had heard other names. SVM would probably have been a better name had I read the OCL spec sooner. This is not an exclusive feature restricted to OpenCL. Any GPU client will hopefully eventually have this capability provided to them.

If you’re going to read any single PPGTT post of this series, I think it should not be this one. I was not sure I’d write this post when I started documenting the PPGTT (part 1, part2, part3). I had hoped that any of the following things would have solidified the decision by the time I completed part3.

  1. CODE: The code is not not merged, not reviewed, and not tested (by anyone but me). There’s no indication about the “upstreamability”. What this means is that if you read my blog to understand how the i915 driver currently works, you’ll be taking a crap-shoot on this one.
  2. DOCS: The Broadwell public Programmer Reference Manuals are not available. I can’t refer to them directly, I can only refer to the code.
  3. PRODUCT: Broadwell has not yet shipped. My ulterior motive had always been to rally the masses to test the code. Without product, that isn’t possible.

Concomitant with these facts, my memory of the code and interesting parts of the hardware it utilizes continues to degrade. Ultimately, I decided to write down what I can while it’s still fresh (for some very warped definition of “fresh”).


GPU mirroring is the goal. Dynamic page table allocations are very valuable by itself. Using dynamic page table allocations can dramatically conserve system memory when running with multiple address spaces (part 3 if you forgot), which is something which should become pretty common shortly. Consider for a moment a Broadwell legacy 32b system (more details later). TYou would require about 8MB for page tables to map one page of system memory. With the dynamic page table allocations, this would be reduced to 8K. Dynamic page table allocations are also an indirect requirement for implementing a 64b virtual address space. Having a 64b virtual address space is a pretty unremarkable feature by itself. On current workloads [that I am aware of] it provides no real benefit. Supporting 64b did require cleaning up the infrastructure code quite a bit though and should anything from the series get merged, and I believe the result is a huge improvement in code readability.

Current Status

I briefly mentioned dogfooding these several months ago. At that time I only had the dynamic page table allocations on GEN7 working. The fallout wasn’t nearly as bad as I was expecting, but things were far from stable. There was a second posting which is much more stable and contains support of everything through Broadwell. To summarize:

Feature Status TODO
Dynamic page tables Implemented Test and fix bugs
64b Address space Implemented Test and fix bugs
GPU mirroring Proof of Concept Decide on interface; Implement interface.1

Testing has been limited to just one machine, mine, when I don’t have a million other things to do. With that caveat, on top of my last PPGTT stabilization patches things look pretty stable.

Present: Relocations

Throughout many of my previous blog posts I’ve gone out of the way to avoid explaining relocations. My reluctance was because explaining the mechanics is quite tedious, not because it is a difficult concept. It’s impossible [and extremely unfortunate for my weekend] to make the case for why these new PPGTT features are cool without touching on relocations at least a little bit. The following picture exemplifies both the CPU and GPU mapping the same pages with the current relocation mechanism.

Current PPGTT supportCurrent PPGTT support

To get to the above state, something like the following would happen.

  1. Create BOx
  2. Create BOy
  3. Request BOx be uncached via (IOCTL DRM_IOCTL_I915_GEM_SET_CACHING).
  4. Do one of aforementioned operations on BOx and BOy
  5. Perform execbuf2.

Accesses to the BO from the CPU require having a CPU virtual address that eventually points to the pages representing the BO2. The GPU has no notion of CPU virtual addresses (unless you have a bug in your code). Inevitably, all the GPU really cares about is physical pages; which ones. On the other hand, userspace needs to build up a set of GPU commands which sometimes need to be aware of the absolute graphics address.

Several commands do not need an absolute address. 3DSTATE_VS for instance does not need to know anything about where Scratch Space Base Offset
is actually located. It needs to provide an offset to the General State Base Address. The General State Base Address does need to be known by userspace:

Using the relocation mechanism gives userspace a way to inform the i915 driver about the BOs which needs an absolute address. The handles plus some information about the GPU commands that need absolute graphics addresses are submitted at execbuf time. The kernel will make a GPU mapping for all the pages that constitute the BO, process the list of GPU commands needing update, and finally submit the work to the GPU.

Future: No relocations

GPU MirroringGPU Mirroring

The diagram above demonstrates the goal. Symmetric mappings to a BO on both the GPU and the CPU. There are benefits for ditching relocations. One of the nice side effects of getting rid of relocations is it allows us to drop the use of the DRM memory manager and simply rely on malloc as the address space allocator. The DRM memory allocator does not get the same amount of attention with regard to performance as malloc does. Even if it did perform as ideally as possible, it’s still a superfluous CPU workload. Other people can probably explain the CPU overhead in better detail. Oh, and OpenCL 2.0 requires it.

"OpenCL 2.0 adds support for shared virtual memory (a.k.a. SVM). SVM allows the host and 
kernels executing on devices to directly share complex, pointer-containing data structures such 
as trees and linked lists. It also eliminates the need to marshal data between the host and devices. 
As a result, SVM substantially simplifies OpenCL programming and may improve performance."

Makin’ it Happen


As I’ve already mentioned, the most obvious requirement is expanding the GPU address space to match the CPU.

Page Table HierarchyPage Table Hierarchy

If you have taken any sort of Operating Systems class, or read up on Linux MM within the last 10 years or so, the above drawing should be incredibly unremarkable. If you have not, you’re probably left with a big ‘WTF’ face. I probably can’t help you if you’re in the latter group, but I do sympathize. For the other camp: Broadwell brought 4 level page tables that work exactly how you’d expect them to. Instead of the x86 CPU’s CR3, GEN GPUs have PML4. When operating in legacy 32b mode, there are 4 PDP registers that each point to a page directory and therefore map 4GB of address space3. The register is just a simple logical address pointing to a page directory. The actual changes in hardware interactions are trivial on top of all the existing PPGTT work.

The keen observer will notice that there are only 256 PML4 entries. This has to do with the way in which we've come about 64b addressing in x86. <a href="" target="_blank">This wikipedia article</a> explains it pretty well, and has links.

“This will take one week. I can just allocate everything up front.” (Dynamic Page Table Allocation)

Funny story. I was asked to estimate how long it would take me to get this GPU mirror stuff in shape for a very rough proof of concept. “One week. I can just allocate everything up front.” If what I have is, “done” then I was off by 10x.

Where I went wrong in my estimate was math. If you consider the above, you quickly see why allocating everything up front is a terrible idea and flat out impossible on some systems.

Page for the PML4
512 PDP pages per PML4 (512, ok we actually use 256)
512 PD pages per PDP (256 * 512 pages for PDs)
512 PT pages per PD (256 * 512 * 512 pages for PTs)
(256 * 512^2 + 256 * 512 + 256 + 1) * PAGE_SIZE = ~256G = oops

Dissimilarities to x86

First and foremost, there are no GPU page faults to speak of. We cannot demand allocate anything in the traditional sense. I was naive though, and one of the first thoughts I had was: the Linux kernel [heck, just about everything that calls itself an OS] manages 4 level pages tables on multiple architectures. The page table format on Broadwell is remarkably similar to x86 page tables. If I can’t use the code directly, surely I can copy. Wrong.

Here is some code from the Linux kernel which demonstrates how you can get a PTE for a given address in Linux.

typedef unsigned long   pteval_t;
typedef struct { pteval_t pte; } pte_t;

static inline pteval_t native_pte_val(pte_t pte)
        return pte.pte;

static inline pteval_t pte_flags(pte_t pte)
        return native_pte_val(pte) & PTE_FLAGS_MASK;

static inline int pte_present(pte_t a)
        return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE |
static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address)
        return (pte_t *)pmd_page_vaddr(*pmd) + pte_index(address);
#define pte_offset_map(dir, address) pte_offset_kernel((dir), (address))

#define pgd_offset(mm, address) ( (mm)->pgd + pgd_index((address)))
static inline pud_t *pud_offset(pgd_t *pgd, unsigned long address)
        return (pud_t *)pgd_page_vaddr(*pgd) + pud_index(address);
static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
        return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);

/* My completely fabricated example of finding page presence */
pgd_t *pgd;
pud_t *pud;
pmd_t *pmd;
pte_t *ptep;
struct mm_struct *mm = current->mm;
unsigned long address = 0xdefeca7e;

pgd = pgd_offset(mm, address);
pud = pud_offset(pgd, address);
pmd = pmd_offset(pud, address);
ptep = pte_offset_map(pmd, address);
printk("Page is present: %s\n", pte_present(*ptep) ? "yes" : "no");

X86 page table code has a two very distinct property that does not exist here (warning, this is slightly hand-wavy).

  1. The kernel knows exactly where in physical memory the page tables reside4. On x86, it need only read CR3. We don’t know where our pages tables reside in physical memory because of the IOMMU. When VT-d is enabled, the i915 driver only knows the DMA address of the page tables.
  2. There is a strong correlation between a CPU process and an mm (set of page tables). Keeping mappings around of the page tables is easy to do if you don’t want to take the hit to map them every time you need to look at a PTE.

If the Linux kernel needs to find if a page is present or not without taking a fault, it need only look to one of those two options. After about of week of making the IOMMU driver do things it shouldn’t do, and trying to push the square block through the round hole, I gave up on reusing the x86 code.

Why Do We Actually Need Page Table Tracking?

The IOMMU interfaces were not designed to pull a physical address from a DMA address. Pre-allocation is right out. It’s difficult to try to get the instantaneous state of the page tables…

Another thought I had very early on was that tracking could be avoided if we just never tore down page tables. I knew this wasn’t a good solution, but at that time I just wanted to get the thing working and didn’t really care if things blew up spectacularly after running for a few minutes. There is actually a really easy set of operations that show why this won’t work. For the following, think of the four level page tables as arrays. ie.

  • PML4[0-255], each point to a PDP
  • PDP[0-255][0-511], each point to a PD
  • PD[0-255][0-511][0-511], each point to a PT
  • PT[0-255][0-511][0-511][0-511] (where PT[0][0][0][0][0] is the 0th PTE in the system)
  1. [mesa] Create a 2M sized BO. Write to it. Submit it via execbuffer
  2. [i915] See new BO in the execbuffer list. Allocate page tables for it…
    1. [DRM]Find that address 0 is free.
    2. [i915]Allocate PDP for PML4[0]
    3. [i915]Allocate PD for PDP[0][0]
    4. [i915]Allocate PT for PD[0][0][0]/li>
    5. [i915](condensed)Set pointers from PML4->PDP->PD->PT
    6. [i915]Set the 512 PTEs PT[0][0][0][0][511-0] to point to the BO’s backing page.
  3. [i915] Dispatch work to the GPU on behalf of mesa.
  4. [i915] Observe the hardware has completed
  5. [mesa] Create a 4k sized BO. Write to it. Submit both BOs via execbuffer.
  6. [i915] See new BO in the execbuffer list. Allocate page tables for it…
    1. [DRM]Find that address 0x200000 is free.
    2. [i915]Allocate PDP[0][0], PD[0][0][0], PT[0][0][0][1].
    3. Set pointers… Wait. Is PDP[0][0] allocated already? Did we already set pointers? I have no freaking idea!
    4. Abort.

Page Tables Tracking with Bitmaps

Okay. I could have used a sentinel for empty entries. It is possible to achieve this same thing by using a sentinel value (point the page table entry to the scratch page). To implement this involves reading back potentially large amounts of data from the page tables which will be slow. It should work though. I didn’t try it.

After I had determined I couldn’t reuse x86 code, and that I need some way to track which page table elements were allocated, I was pretty set on using bitmaps for tracking usage. The idea of a hash table came and went – none of the upsides of a hash table are useful here, but all of the downsides are present(space). Bitmaps was sort of the default case. Unfortunately though, I did some math at this point, notice the LaTex!.
\frac{2^{47}bytes}{\frac{4096bytes}{1 page}} = 34359738368 pages \  34359738368 pages \times \frac{1bit}{1page} = 34359738368 bits \  34359738368 bits \times \frac{8bits}{1byte} = 4294967296 bytes
That’s 4GB simply to track every page. There’s some more overhead because page [tables, directories, directory pointers] are also tracked.
  256entries + (256\times512)entries + (256\times512^2)entries = 67240192entries \  67240192entries \times \frac{1bit}{1entry} = 67240192bits \  67240192bits \times \frac{8bits}{1byte} = 8405024bytes \  4294967296bytes + 8405024bytes = 4303372320bytes \  4303372320bytes \times \frac{1GB}{1073741824bytes} = 4.0078G

I can’t remember whether I had planned to statically pre-allocate the bitmaps, or I was so caught up in the details and couldn’t see the big picture. I remember thinking, 4GB just for the bitmaps, that will never fly. I probably spent a week trying to figure out a better solution. When we invent time travel, I will go back and talk to my former self: 4GB of bitmap tracking if you’re using 128TB of memory is inconsequential. That is 0.3% of the memory used by the GPU. Hopefully you didn’t fall into that trap, and I just wasted your time, but there it is anyway.

Sample code to walk the page tables

This code does not actually exist, but it is very similar to the real code. The following shows how one would “walk” to a specific address allocating the necessary page tables and setting the bitmaps along the way. Teardown is a bit harder, but it is similar.

static struct i915_pagedirpo *
alloc_one_pdp(struct i915_pml4 *pml4, int entry)

static struct i915_pagedir *
alloc_one_pd(struct i915_pagedirpo *pdp, int entry)

static struct i915_tab *
alloc_one_pt(struct i915_pagedir *pd, int entry)

 * alloc_page_tables - Allocate all page tables for the given virtual address.
 * This will allocate all the necessary page tables to map exactly one page at
 * @address. The page tables will not be connected, and the PTE will not point
 * to a page.
 * @ppgtt:  The PPGTT structure encapsulating the virtual address space.
 * @address:    The virtual address for which we want page tables.
static void
alloc_page_tables(ppgtt, unsigned long address)
    struct i915_pagetab *pt;
    struct i915_pagedir *pd;
    struct i915_pagedirpo *pdp;
    struct i915_pml4 *pml4 = &ppgtt->pml4; /* Always there */

    int pml4e = (address >> GEN8_PML4E_SHIFT) & GEN8_PML4E_MASK;
    int pdpe = (address >> GEN8_PDPE_SHIFT) & GEN8_PDPE_MASK;
    int pde = (address >> GEN8_PDE_SHIFT) & I915_PDE_MASK;
    int pte = (address & I915_PDES_PER_PD);

    if (!test_bit(pml4e, pml4->used_pml4es))
        goto alloc;

    pdp = pml4->pagedirpo[pml4e];
    if (!test_bit(pdpe, pdp->used_pdpes;))
        goto alloc;

    pd = pdp->pagedirs[pdpe];
    if (!test_bit(pde, pd->used_pdes)
        goto alloc;

    pt = pd->page_tables[pde];
    if (test_bit(pte, pt->used_ptes))

    pdp = alloc_one_pdp(pml4, pml4e);
    set_bit(pml4e, pml4->used_pml4es);
    pd = alloc_one_pd(pdp, pdpe);
    set_bit(pdpe, pdp->used_pdpes);
    pt = alloc_one_pt(pd, pde);
    set_bit(pde, pd->used_pdes);

Here is a picture which shows the bitmaps for the 2 allocation example above.

Bitmaps tracking page tablesBitmaps tracking page tables

The GPU mirroring interface

I really don’t want to spend too much time here. In other words, no more pictures. As I’ve already mentioned, the interface was designed for a proof of concept which already had code using userptr. The shortest path was to simply reuse the interface.

In the patches I’ve submitted, 2 changes were made to the existing userptr interface (which wasn’t then, but is now, merged upstream). I added a context ID, and the flag to specify you want mirroring.

struct drm_i915_gem_userptr {
    __u64 user_ptr;
    __u64 user_size;
    __u32 ctx_id;
    __u32 flags;</p>

<h1>define I915_USERPTR_READ_ONLY          (1&lt;&lt;0)</h1>

<h1>define I915_USERPTR_GPU_MIRROR         (1&lt;&lt;1)</h1>

<h1>define I915_USERPTR_UNSYNCHRONIZED     (1&lt;&lt;31)</h1>

 * Returned handle for the object.
 * Object handles are nonzero.
__u32 handle;
__u32 pad;


The context argument is to tell the i915 driver for which address space we’ll be mirroring the BO. Recall from part 3 that a GPU process may have multiple contexts. The flag is simply to tell the kernel to use the value in user_ptr as the address to map the BO in the virtual address space of the GEN GPU. When using the normal userptr interface, the i915 driver will pick the GPU virtual address.

  • Pros:
    • This interface is very simple.
    • Existing userptr code does the hard work for us
  • Cons:
    • You need 1 IOCTL per object. Much undeeded overhead.
    • It’s subject to a lot of problems userptr has5
    • Userptr was already merged, so unless pad get’s repruposed, we’re screwed

What should be: soft pin

There hasn’t been too much discussion here, so it’s hard to say. I believe the trends of the discussion (and the author’s personal preference) would be to add flags to the existing execbuf relocation mechanism. The flag would tell the kernel to not relocate it, and use the presumed_offset field that already exists. This is sometimes called, “soft pin.” It is a bit of a chicken and egg problem since the amount of work in userspace to make this useful is non-trivial, and the feature can’t merged until there is an open source userspace. Stay tuned. Perhaps I’ll update the blog as the story unfolds.

Wrapping it up (all 4 parts)

As usual, please report bugs or ask questions.

So with the 4 parts you should understand how the GPU interacts with system memory. You should know what the Global GTT is, why it still exists, and how it works. You might recall what a PPGTT is, and the intricacies of multiple address space. Hopefully you remember what you just read about 64b and GPU mirror. Expect a rebased patch series from me soon with all that was discussed (quite a bit has changed around me since my original posting of the patches).

This is the last post I will be writing on how GEN hardware interfaces with system memory, and how that related to the i915 driver. Unlike the Rocky movie series, I will stop at the 4th. Like the Rocky movie series, I hope this is the best. Yes, I just went there.

Unlike the usual, “buy me a beer if you liked this”, I would like to buy you a beer if you read it and considered giving me feedback. So if you know me, or meet me somewhere, feel free to reclaim the voucher.

Image links

The images I’ve created. Feel free to do with them as you please.

Download PDF

  1. The patches I posted for enabling GPU mirroring piggyback of of the existing userptr interface. Before those patches were merged I added some info to the API (a flag + context) for the point of testing. I needed to get this working quickly and porting from the existing userptr code was the shortest path. Since then userptr has been merged without this extra info which makes things difficult for people trying to test things. In any case an interface needs to be agreed upon. My preference would be to do this via the existing relocation flags. One could add a new flag called "SOFT_PIN" 

  2. The GEM and BO terminology is a fancy sounding wrapper for the notion that we want an interface to coherently write data which the GPU can read (input), and have CPU observe data which the GPU has written (output)  

  3. The PDP registers are are not PDPEs because they do not have any of the associated flags of a PDPE. Also, note that in my patch series I submitted a patch which defines the number of these to be PDPE. This is incorrect. 

  4. I am not sure how KVM works manages page tables. At least conceptually I’d think they’d have a similar problem to the i915 driver’s page table management. I should have probably looked a bit closer as I may have been able to leverage that; but I didn’t have the idea until just now… looking at the KVM code, it does have a lot of similarities to the approach I took 

  5. Let me be clear that I don’t think userptr is a bad thing. It’s a very hard thing to get right, and much of the trickery needed for it is *not* needed for GPU mirroring 

  • EDIT1: I forgot to include a diagram I did of the software state machine for some presentation. I long lost the SVG, and it got kind of messed up, but it’s there at the bottom.
  • EDIT2: (Apologies to aggregators) Grammar fixes. Fixed some bugs in a couple of the images.
  • EDIT3: (Again, apologies to aggregators) s/indirect rendering/direct rendering. I had to fix this or else the sentence made no sense.
  • EDIT4 (2017-07-13): I was under the impression we were not yet allowed to talk about preemption. But apparently we are. So feature matrix at the bottom is updated.

The Per-Process Graphics Translation Tables provide real process isolation among the various graphics processes running within an i915 based system. When in use, the combination of the PPGTT and the Hardware Context provide the equivalent of the traditional CPU process. Most of the same capabilities can be provided, and most of the same limitations come with it. True PPGTT encompasses all of the functionality currently merged into the i915 kernel driver that support page tables and address spaces. It’s called, “true” because the Aliasing PPGTT was introduced first and often was simply called, “PPGTT.”

The True PPGTT patches represent one of the more challenging aspects of working on a project like the Linux kernel. The feature couldn’t realistically be enabled in isolation of the existing driver. When regressions occur it’s likely that the user gets no display. To say we get chided on occasion would be an understatement. Ipso facto, this feature is not enabled by default. There are quite a few patches on the mailing list that build new functionality on top of this support, and to help stabilize existing support. If one wishes to try enabling the real PPGTT, one must simply use the i915 module parameter: enable_ppgtt=2. I highly recommended that the stability patches be used unless you’re reading this in some future where the stability problems are fixed upstream.

Unlike the previous posts where I tried to emphasize the hardware architecture for this feature, the following will not go into almost no detail about how hardware works. There won’t be PRM references, or hardware state machines. All of those mechanics have been described in parts 1 and part 2

A Brief History of the i915 Graphics Process

There have been three stages of the definition of a graphics process within the i915 driver. I believe that by explaining the stages one can get a better appreciation for the capabilities. In the following pictures there is meant to be a highlighted region (yellow in the first two, yellow, orange and blue in the last) that denote the scope of a GPU context/process with the specified feature. Incrementally the definition of a process begins to bleed between the CPU, and the GPU.

Unfortunately I have some overlap with my earlier post about Hardware Contexts. I found no good way to write this post without doing so. If you read that post, consider this a refresher.

File Descriptors

Initially all GPU state was shared by every GPU client. The only partition was done via the operating system. Every process that does direct rendering will get a file descriptor for the device. The file descriptor is the thing through which commands are submitted. This could be used by the i915 driver to help disambiguate “who” was doing “what.” This permitted the i915 kernel driver to prevent one GPU client from directly referencing the buffers owned by a different GPU client. By making the buffer object handles per file descriptor (this is very easy to implement, it’s just an idr in the kernel) there exist no mechanism to reference buffer handles from a different file descriptor. For applications which do not require context saved, non-buggy apps, or non-malicious apps, this separation is still perfectly sufficient. As an example, BO handle #1 for the X server is not the same as BO handle #1 for xonotic since each has a different file descriptor1. Even though we had this partition at the software level, nothing was enforced by the hardware. Provided a GPU client could guess where another buffer resided, it could easily operate on that buffer. Similarly, a GPU client could not expect the GPU state it had set previously to be preserved for any amount of time.

File descriptor isolation.  Before hardware contexts.File descriptor isolation.Before hardware contexts.

Hardware Contexts

The next step towards isolation was the Hardware Context2. The hardware contexts built upon the isolation provided  by the original file descriptor mechanism. The hardware context was an opt-in interface which meant that those not wishing to use the interface received the old behavior: they could purposefully or accidentally use the state from another GPU client3. There was quite a bit of discussion around this at the time the patches were in review, and there’s not really any point in lamenting about how it could be better, now.

The context exists within the domain of the process/file descriptor in the same way that a BO exists in that domain. Contexts cannot be shared [intentionally]. The interface created was, and remains extremely simple.

struct drm_i915_gem_context_create {
    /* output: id of new context*/
    __u32 ctx_id;
    __u32 pad;

struct drm_i915_gem_context_destroy {
    __u32 ctx_id;
    __u32 pad;

As you can see from the two IOCTL payloads above, I wasn’t lying about the simplicity. Because there was not a great deal of variable functionality, there just wasn’t a lot to add in terms of the interface. Destroy is an optional call because we have the file descriptor and can clean up if a process does not. The primary motivation for destroy() is simply to allow very meticulous and memory conscious GPU clients to keep things tidy. Earlier I had a list of 3 types of GPU clients that could survive without this separation. Considering their inverse; this takes one of those off the list.

  • GPU clients needed HW context preserved
  • Buggy applications writing to random memory
  • Malicious applications

The block diagram is quite similar to above diagram with the exception that now there are discrete blocks for the persistent state. I was a bit lazy with the separation on this drawing. Hopefully, you get the idea.

Hardware context isolationHardware context isolation


The last piece was to provide a discrete virtual address space for each GPU client. For completeness, I will provide the diagram, but by now you should already know what to expect.

PPGTT, full isolationPPGTT, full isolation

If I write about this picture, there would be no point in continuing with an organized blog post :-). So I’ll continue to explain this topic. Take my word for it that this addresses the other two types of GPU clients

  • GPU clients needed HW context preserved
  • Buggy applications writing to random memory
  • Malicious applications

Since the GGTT isn’t really mentioned much in this post, I’d like to point out  that the GTT still exists as you can see in this diagram. It is required for several components that were listed in my previous blog post.

VMAs and Address Spaces (AKA VMs)

The patch series which began to implement PPGTT was actually a separate series. It was the one that introduced the Virtual Memory Area for the PPGTT, simply referred to as, VMA4. You can think of a VMA in a very similar way to a GEM BO. It is an identifiable, continuous range within an address space. Conceptually there isn’t much difference between a GEM BO. To try to define it in my horrible math jargon: a logical grouping of virtual addresses representing an operand for some GPU operation within a given PPGTT domain. A VMA is uniquely identified via the tuple (BO, Address space). In the likely case that I made no sense just there, a VMA is just another handle on a chunk of GPU memory used for rendering.

Sharing VMAs

You can’t (see the note at the bottom). There’s not a whole lot I can say without doing another post about DMA-Buf, and/or Flink. Perhaps someday I will, but for now I’ll keep things general and brief.

It is impossible to share a VMA. To repeat, a VMA is uniquely identifiable by the address space, and a BO. It remains possible to share a BO. An address space exists for an individual GPU client’s process. Therefore it makes no sense to share a VMA since the address space cannot be shared5. As a result of using the existing sharing interfaces a GPU will get multiple VMAs that reference the same BO. Trying to go back to the math jargon again:

  1. VMA: (BO, Address Space) // Some BO mapped by the address space.
  2. VMA′: (BO′, Address Space) // Another BO mapped into the address space
  3. VMA″: (BO, Address Space′) // The same BO as 1, mapped into a different address space.
VMA : PPGTT :: BO : GGTTM = {1,2,3,…} N = {1,2,3,…}

In case it’s still unclear, I’ll use an example (which is kind of a simplified/false demonstration). The scanout buffer is the thing which is displayed on the screen. When doing frontbuffer rendering, one directly renders to that buffer. If we remember my previous post, the Display Engine requires a GGTT mapping. Therefore we know we have VMAglobal. Jumping ahead, a GPU client cannot have a global mapping, therefore, to render to the frontbuffer it too has a VMA, VMApp. There you have two VMAs pointing to the same Buffer Object.

NOTE: You can actually share VMAs if you are already sharing a Context/PPGTT. I can’t think of any real world examples off of the top of my head, but it is possible, and potentially a useful thing to do.

Data Structures

Here are the relevant data structures cropped for the sake of brevity.

struct i915_address_space {
        struct drm_mm mm;
    unsigned long start;            /* Start offset always 0 for dri2 */
    size_t total;           /* size addr space maps (ex. 2GB for ggtt) */
    struct list_head active_list;
    struct list_head inactive_list;

struct i915_hw_ppgtt {
        struct i915_address_space base;
    int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
             struct intel_engine_cs *ring,
             bool synchronous);

struct i915_vma {
        struct drm_mm_node node;
        struct drm_i915_gem_object *obj;
        struct i915_address_space *vm;

The struct i915_hw_ppgtt is a subclass of a struct i915_address_space. Only two implementors of i915_address space exist: the i915_hw_ppgtt (a PPGTT), and the i915_gtt (the GGTT). It might make some sense to create a new PPGTT subclass for GEN8+ but I’ve not opted to do this. I feel there is too much duplication for not enough benefit.

I’ve already explained in different words that a range of used address space is the VMA.  If the address space has the drm_mm, then it should make direct sense that the VMA has the drm_mm_node because this is the used part of the address space6. In the i915_vma struct above is a pointer to the address space for which the VMA exists, and the object the VMA is referencing. This provides the tuple that define the VMA.

HOLE 0x0->0x64000 VMA 1 0x64000->0x69000 HOLE 0x69000->512M VMA 2 512M->512.004M HOLE 1.5GBHOLE 0x0->0x64000
VMA 1 0x64000->0x69000
HOLE 0x69000->512M
VMA 2 512M->512.004M
HOLE ~512M->2GB
Allocated space: 0x6000 Free space: 0x7fffa000

Relation to the Hardware Context

struct intel_context {
    struct kref ref;
    int id;
    struct i915_address_space *vm;

With the 3 elements discussed a few times already: file descriptor, context, PPGTT, we get real GPU process isolation. Since the context was historically an opt-in interface, changes needed to be made in order to keep the opt-in behavior yet provide isolation behind the scenes regardless of what the GPU client tried to do. If this was not done, then innocent GPU clients could feel the wrath. The file descriptor was already intimately connected with the direct rendering process (one cannot render without getting a file descriptor), it made sense to hook off of that to create the contexts and PPGTTs.

Implicit Context (“private default context”)

From here on out we can consider a, “context” as the 3 elements: fd, HW context, and a PPGTT. In the driver as it exists today if a GPU client does not provide a context for rendering, it cannot rely on GPU state being preserved. A context is created for GPU clients that do not provide one, but the state of this context should be considered completely opaque to all GPU clients. I’ve called this the Private Default Context as it very much resembles the default context that exists for the whole system (again, let me point you to the previous blog post on contexts). The driver will isolate the various contexts within the system from implicit contexts, and vice versa. Hardware state is undefined while using the private default context. Hardware state maintains it’s state from the previous render operation when using the IOCTLs.

The behavior of the implicit context does result in waste when userspace uses contexts (as mesa/libgl does).  There are a few solutions to this problem, and I’ve submitted patches for all of them (I can count 3 off the top of my head). Perhaps one day in the not too distant future, this above section will be false and we can just say – every process will get a context when they open the DRI file. If they want more contexts, they can use the IOCTL.

Multi Context

A GPU client can create more than one context. The context they wish to use for a given rendering command is built into the execbuffer2 API (note that KMS is not context savvy).

struct drm_i915_gem_execbuffer2 {
     * List of gem_exec_object2 structs
    __u64 buffers_ptr;
    __u32 buffer_count;

    /** Offset in the batchbuffer to start execution from. */
    __u32 batch_start_offset;
    /** Bytes used in batchbuffer from batch_start_offset */
    __u32 batch_len;
    __u64 flags;
    __u64 rsvd1; /* now used for context info */
    __u64 rsvd2;

A process may wish to create several GL contexts. The API allows this, and for reasons I don’t understand, it’s something some applications wish to do. If there was no mechanism to create a new contexts, userspace would be forced to open a new file descriptor for each GL context or else they would not reap the benefits of everything we’ve discussed for a GL context.

The Big Picture – literally



One of the more contentious topics in the very early stages of development was the relationship and connection of a PPGTT and a HW context.

Quoting myself from one of earlier public declarations, here:

My long term vision is for contexts to have a 1:1 relationship with a PPGTT. Sharing objects between address spaces would work similarly to the flink/dmabuf model if needed.

My idea was to embed the PPGTT within the context structure, and creating a context always resulted in a new PPGTT. Creating a PPGTT by itself would have been impossible. This is not what we ended up doing. The implementation allows multiple hardware contexts to share a PPGTT. I’m still unclear exactly what is needed to support share groups within OpenGL, but it has been speculated that this is a requirement for share groups. Fundamentally this would allow the client to create multiple GPU contexts that share an address space (it resembles what you’d get back when there was only HW contexts). The execbuffer2 IOCTL allows one to specify the context. Behaviorally however, my proposal matches what is in use currently. I think it’s a bit easier to think of things this way too.

Current Mesa Current DDX 2 hypothetical scenariosCurrent Mesa
Current DDX
2 hypothetical scenarios


Please feel free to send me issues or questions.
Oh yeah. Here is a state machine that I did for a presentation on this. Things got rendered weird, and I lost the original SVG file, but perhaps it will be of some value to someone.

State MachineState Machine


As I alluded to earlier, there is still some work left to do in order to get this feature turned on by default. I gave the links to some patches, and the parameter to make it happen. If you feel motivated to help get this stuff moving forward, test it, report bugs, try to fix stuff, don’t yell at me when things break :-).


That’s most of it. I like to give the 10 second summary.

  1. i915_vma, i915_hw_ppgtt, i915_address_space: important things.
  2. The GPU has a virtual address space per DRI file descriptor.
  3. There is a connection between the PPGTT, and a Hardware Context.
  4. VMAs are backed by BOs which are backed by physical pages.
  5. GPU clients have some flexibility with how they interact with contexts, and therefore the PPGTT.

And finally, since I compared our now well defined notion of a GPU process to the traditional CPU process, I wanted to create a quick list of what I think are some interesting data points regarding the capabilities of the processors.

Thing,Modern X86 CPU,Modern i915 GPU
Phys Address Limit,48b?,~40b
Process Isolation, Yes, Yes (with True PPGTT)
Virtual Address Space, Yes, Yes
64b VA Space, Yes, GEN8+ 48b only
PTE access controls, Yes, No
Page Fault Handling, Yes, No
Preemption ((I am defining the word, "preemption" as the ability to switch at an arbitrary point in time between contexts. On the CPU, this is <em>easily</em> accomplished. The GPU running the i915 driver, as of today, has no way to do this. Once a batch is running, it cannot be interrupted except for RC6.)), Yes, *With execlists

So while True PPGTT brings the GPU closer to having all of the [what I consider to be] interesting features of a modern x86 CPU – it still has a ways to go. I would be surprised if things didn’t continue going in this direction.

SVG Links

As usual, please feel free to do something useful with the images I’ve created. Also as usual, they are really poorly named.

Download PDF

  1. It’s technically possible to make them be the same BO through the two buffer sharing mechanisms. 

  2. Around the same time Hardware Contexts were introduced, so was the Aliasing PPGTT. The Aliasing PPGTT was interesting, however it does not contribute to any part of the GPU “process” 

  3. Hardware contexts use a mechanism which will inhibit the restoration of state when not opted-in. This means if one GPU client does opt-in, and another does not, the client without contexts can reuse the state of the client with contexts. As the address space is still shared, this is actually a really dangerous thing to allow. 

  4. I would have preferred the reservation of a space within the address space be called a, “GVMA”, but that was shot down during review 

  5. There’s a whole section below describing how this statement could be false. For now, let’s pretend address spaces can’t be shared 

  6. For those unfamiliar with the Direct Render Manager memory manager, a drm_mm is the structure for the memory manager provided by the DRM midlayer helper layer. It does all the things you’d expect out of a memory manager like, find free nodes, allocate nodes, free up nodes… A drm_mm_node is a structure representing an allocation from the memory manager. The PPGTT code relies entirely on thedrm_mm and the DRM helper functions in order to actually do the address space allocations and frees. 

January 20, 2016

In light of recent general confusion between X.Org the technical project and X.Org the Foundation here's a little overview.

X.Org the project

X.Org is the current reference implementation of the X Window System which has been around since the mid-80s. Its most prominent members is the X server and the related drivers but we put a whole bunch of other things under the same umbrella, e.g. mesa, drm, and - yes - wayland. Like most free software projects it is loosely organised and very few developers are involved in everything, everybody has their niche. If you're running Linux or a BSD and you can see a desktop environment in front of you, X.Org the technical project is somewhere in that stack.

X.Org the Foundation

The foundation is a non-profit organisation tasked with the stewardship of the X Window System, particularly the X.Org implementation. The most important thing is: the X.Org Foundation does not control the technical direction, it acts in a supporting role only. X.Org has a 501(c)3 tax code in the US which means that donations can be tax deducted (though we haven't collected donations in years). It also means that how we can spend money is very restricted. These days the Foundation's supporting roles are largely: sponsoring the annual X Developers Conference (XDC), providing travel sponsorship to XDC attendees and be the organisation to participate in the Google Summer of Code. Oh, and did I mention that the X.Org Foundation does not control the technical direction?

What does it matter?

The difference matters, especially for well-nuanced and thought-out statements like "X must die" in response to articles about the X.Org Foundation. If you want the Foundation to cease to exist, you're essentially saying "XDC and X.Org's GSoC participation must die". Given that a significant percentage of those two are now Wayland-related that may have some unintended side-effects. If you want the technical project to die, it may be wise to consider the side-effects. Wayland isn't quite ready yet, much of the work that is done under the umbrella of X benefits Wayland (libinput, graphics driver work, etc.).

Now if you excuse me, there's a windmill that needs tilting at. Rocinante, where are you?

January 14, 2016
First the title is a slight lie, this really is about compositor switching and not necessarily about using Linux VTs for that. But I hope that the title draws in the right folks and tempts them to read this. Since with atomic there's a bit a problem if you want to switch between different compositors - maybe you have X running and hack on wayland-mutter, or kwin and mutter or just a DE and a login manager  - and expect it to not end up in a modern arts project like this.

Now the trouble with atomic modesetting and switching between different compositors is that atomic display updates are incremental for two reasons:
  • First we need to be able to support the legacy KMS interface for existing userspace, and that inteface was only updating parts of the overall display state.
  • Second we want atomic to be extensible. Which means compositors must be able to only update the properties they understand and leave everything else at a presumably reasonable default values.
But if you mix this with a bunch of different compositors which all understand different subsets of all the atomic extensions a driver supports, suddenly the assumption that unhandled values have reasonable settings becomes invalid, and partial updates become a problem. Recently there's been a discussions on mailing lists and IRC about how to solve this, which ultimately ended in the conclusion that us kernel folks don't really know what would work best for distros, desktop environments and their compositors. Just going ahead with some new kernel ABI could easily result in a mistake that we have to support for the next 10 years. Therefore this blog post here covers the ideas with come up with to tackle this, just to make it clear that kernel folks are aware of this gap. As soon as userspace people with real clue about this topic run into problems they're more than welcome on and then we can figure out what to implement.

With that out of the way, let's look at possible solutions.

FBDEV resets to defaults

This is the cheap cop-out that is essentially implemented right now - when switching to a kernel console running on top of the FBDEV emulation KMS drivers can provide, the driver resets atomic properties of new extensions (like rotation, color management, Z-position/alpha/blending, whatever) to hopefully sane defaults. That's good enough for developers hacking around on different compositors, as long as you remember to VT-switch to a kernel console after things went south.

But FBDEV is seriously uncool, and a lot of people are working towards removing the kernel's VT subsytem from modern distros, too. This doesn't really work everywhere, but it's kinda the minimal and what we'll definitely implement. This has also the downside that maybe you only want to restore some properties, while keeping others (since they might be crucial to your setup, for example rotating the screen).

System compositor restores boot-up state

If doing something in the kernel isn't flexible enough then the usual approach is to do it in userspace. Most systems have some kind of master compositor that's run in-between user sessions, like a login manager. That system compositor could restore modeset state to something sensible every time it runs again, and user session compositors could then take over that sensible setup as their starting point.

This has the benefit that more clever and specific stuff like only restoring some properties is easy to implement. This shouldn't need a hole lot of code since a compositor needs to be able to restore it's state anyway to allow switching between them.

But, you're saying, how can a system compositor know about all the possible and future atomic KMS extensions? Won't this scheme break the much heralded extensibility? Well the neat trick is that userspace doesn't need to be able to understand properties to save and restore them - the actual property value transport between kernel and userspace is fully generic. There are a few special cases, like the need to disable outputs that have been unplugged meanwhile, and also some properties with special meaning, like framebuffers. But generic userspace can read out all the metadata, and even if future property types extend e.g. the value range that should still work.

The downsides of this approach is that it depends upon the global system compositor to do this right, so if that crashes and leaves your display setup in shambles you can't easily recover any more. This also requires that everyone who cares about switching between different compositors to have such a system compositors.

Interlude: Only launching compositors matters

The above observation that atomic clients can always faithfully restore a state (even if they don't understand the semantics of all properties) means that switching compositors itself will always work. The real trouble is making sure that a compositor can start up with a sane enough configuration to be able to successfully get pixels onto the screen.

New atomic IOCTL kernel flag

What might be needed is a way to make this safe state persistent in the kernel. The next option tries that, by adding a new flag to the atomic IOCTL which asks the kernel to start out with an atomic KMS state reset to some default value.

But again the trouble here is, like with the FBDEV approach, that it's monolithic, and doesn't easily allow userspace to control what's being restored. And again the question is, should things get restored to the boot-up state (and which boot-up state - something equivalent to what FBDEV emulation would pick, what the firmware would have picked or a mix), or maybe reset values (set everything to unrotated) is better?

On top of that most often compositors don't want to reset state at all, to be able to smoothly take over the display configuration from the preceeding KMS client. Users have lost pretty much all appreciation of unsightly flickering that commonly happened in the pre-KMS world when switching compositors.

Another problem with keeping the boot-up state around is that the kernel then needs to keep a copy of all such state (and some objects like gamma tables are sizeable) around. Just in case there's userspace around to ask for it.

Use SysRq to reset atomic state

A problem with adding a flag to the atomic IOCTL to reset state is that all compositors need to implement support for it, and somehow make a decision for when to employ it. But on most systems compositors shouldn't completely mess up the atomic state. Most likely that's after a crash, and then there's no userspace around anyway to fix things up. Now generally this is, or well should, only be a problem for developers, and a possible solution might be to wire up a SysRq hotkey where the kernel force-resets atomic state to defaults. This would be similar to the FBDEV based solution, except without FBDEV and not tied to VT switching.

An alternative would be to implement this in the boot-splash service, by sampling boot-up state and providing some command to force-reset to that. But that's pretty much the system compositor approach, but using the boot splash, and a tool to restore its state from a stored location, as the system compositor.

Expose reset or boot-up state

An easy fix to give control back to userspace over what will get restored is to instead expose the boot-up values or the reset values to userspace, through an extension to the GET_PROPERTY IOCTL. But again storing boot-up state in the kernel would be wasteful on systems that will never need it (like Android or CrOS), and exposing reset values somewhat pointless if e.g. you really want your screen rotated, always.

Per-compositor atomic state

A slight spiel on all this is to make atomic state per-compositor in the kernel. This sounds a bit like it might help, but on the other hand implementing full state restore isn't more effort when compositors need to restore their state after a VT switch anyway. This leaves the tricky question of what the inherited state should be when a new compositor starts up: Normally you want the current state, so that the compositor can take over smoothly. Except when that's a really bad idea because the current state is badly mangled from a different compositor that just crashed.

Overall lots of different approaches and ideas, but no clear winner. That's why kernel folks need distro, compositor and desktop people to run into this issue first, to make sure the solution that lands actually solves the right problem. And in a way that suits userspace.

Thanks to Daniel Stone, Pekka Paalanen and Ville Syrjälä for input on this.
January 11, 2016
Kernel version 4.4 is released, it's time for our regular look at what's in store for the Intel graphics driver in the next release.

Overall this cycle has seen lots and lots of bugfixes, and the reason for that is that we're rebuilding our CI infrastructure after it went up in a poof of smoke last summer. Really big thanks to the entire team for the effort invested! And that's why this overview is a bit different and we'll start with bugfix efforts before delving into the few feature additions:

Ville fixed up display fifo underruns all over the place: FDI modeset fixes for Haswell/Broadwell, correctly detecting fused-of VGA on the same,  and disabling fifo underrun reporting in some places where we've learned that underruns just happen - mostly around starting up the display pipeline.

Next up is improved runtime PM wakelock debugging from Imre Deak, with efforts from other folks trying to fix up various issues. Unfortunately this turned up so many little buglets that we had to disable the reporting again, at least for now. But at least we'll now have a very clear list of things to address, and a reliable way to audit any failures, to finally be able to enable runtime PM by default hopefully soon.

There's also been lots of fixes for PSR and FBC from Rodrigo and Paulo respectively. PSR enabled by default missed the 4.5 merge window by just a hair - it's already enabled for 4.6. FBC is also pretty close, the last bit Paulo is working is untangling the locking issues. FBC is sitting between GEM and KMS and FBC code gets called by both subsystems. And that's an easy recipe for deadlocks, which Paulo is now working to resolve.

There's also fixes included from Tvrtko Ursulin, Lukas Wunner and Chris Wilson to remedy some long-standing regressions in the fbdev framebuffer setup code. In GEM finally Dave Gordon fixed issues with the page dirty tracking. And Chris Wilson fine-tuned the request polling logic to avoid needlessly wasting CPU cycles.

Imre, Patrik and others have done a lot of work to fix up various issues in the DMC firmware loader for Skylake and the DC5/6 support. It works well now on that platform, and could reenable the overall display power well support again on Skylake. But there's still plenty of issues on Broxton unfortunately.

Since bugfixes have been highly prioritized over feature work this time around there's only very little progress on atomic modesettting and specifically atomic watermark updates. But 4.5 includes a few more prep patches from Maarten Lankhorst and Matt Roper.

There have been some real features though still: Alex Goins from nvidia implementd proper sync for page-flipping dma-buf backed framebuffers, benefitting setups where nVidia renders buffers that the Intel driver displays.

Finally there's also been the usual amount of internal refactoring to prepare the code for the future and keep it maintainable. Jani Nikula rewrote the VBT parsing code.  And Ander started to rework the DP detection code as the first step of a large DP support revamp. And finally ther's been a bit of enabling for Kabylake too, but it's not yet complete.

And of courese there's been a lot more smaller things, again mostly bugfixes.
January 10, 2016
This summer Intel sponsored some work to improve the kerneldoc toolchain, with the aim to use all that to extend the DRM and i915 driver documentation we have. Most of it landed, but the last bit to integrate some type  of text markup processing was stalled until it could be discussed at the kernel summit, see the LWN summary. Unfortunately it died in a bikeshed fest due to an alliance of people who think docs are useless and you should just read the code, and others who didn't even know how to convert the kerneldoc into something pretty.

But we still need this, since without lists, highlighting, basic tables and inserting code snippets it's really hard to write decent documentation. Luckily Dave Airlie is ok with using it for DRM kerneldoc as long as Intel maintains the support. It's purely opt-in and the only downside of not using asciidoc is that the resulting docs won't be as pretty. All the changes to the text itself to use this markup are going into upstream as normal. The only bit that's not in upstream is the tooling, which is available in a topic branch at

        git:// topic/kerneldoc

If you want to build pretty docs just install asciidoc and base your drm documentation patches on top of drm-intel-nightly from the same repository - that tree also includes all of Dave's tree. Alternatively pull in the above topic branch into your own personal tree. Note that asciidoc is detected automatically, so you really only need it and the tooling branch to check the rendering of your changes.

For added convenience Intel also maintains an autobuilder that pushes latest drm-intel-nightly DRM documentation builds to

Aside: If all you want to build is just the GPU DocBook instead of all of them, you can do that with

        $ make DOCBOOKS="gpu.xml" htmldocs

With that have fun reading the new&improved documentation, and if you spot anything please submit a patch to
As we were working on audio jack notifications, and were wondering whether the type of notification we'd pop up in this case could be reused in other cases, I encountered a feature request that could now be solved easily with the rfkill D-Bus service we added to gnome-settings-daemon for the 3.10 release.

If you have keyboard buttons on your laptop to enable or disable Bluetooth, or Airplane mode, you can now use them. Note that the "UWB" toggle key will toggle the whole airplane mode mainly because no in-kernel driver uses it, and nobody remembers what UWB is.

Note that the labels and icons used are still subject to changes. In particular as you can see that the labels are too long for lower resolutions.

January 09, 2016
Prodded by me while I snoozed on his sofa and with his cat warming me up, a day before the Content Applications hackfest, Florian Müllner started working on fixing a long-standing gjs bug that made it impossible to use gom in GNOME/JavaScript applications. The result of that initial research came a few days later, and is now part of the latest gjs release.

This also fixes using GtkBuilder and json-glib when the libraries create new objects for the benefit of the JavaScript code.

We can finally use gom to store user data in applications like Bolso. Thanks Florian!
January 05, 2016

One of the things that makes me really happy in terms of the public reception to the Fedora Workstation is all the people calling out how stable and solid it is, as this was and is one of our big goals from the start of the Fedora Workstation effort.

From the start we wanted to bury the old idea of Fedora being only for people who didn’t mind risking a lot of instability in return for being on the so called bleeding edge. We also wanted to bury the related idea that by using Fedora you where basically alpha testing highly unstable and unfinished software for Red Hat Enterprise Linux. Yet at the same time we did want to preserve and build upon the idea that Fedora is a great operating system if you want to experience a lot of the latest and greatest new developments as they are happening. At first glance those two goals might seem a bit contradictory, but we decided that we should be able to do both by both adjusting our policies a bit and also by relying more on the Fedora retrace server as our bug fixing prioritization tool.

So in terms of policies the division of Fedora into a distinct server and workstation images and also the clearer separation of the spins, allowed us to start making decisions without worrying so much how they affected other usecases than our own. Because sometimes what from a user perspective seems like a bug or something being broken was non-workstation policy decisions getting in the way of the desktop behaving as expected, for instance firewall rules hindering basic desktop functions.

Secondly we incorporated a more careful approach into what and when we brought in new stuff, meaning we still try to keep on top of major upstream developments and be a leading edge system, but at the same time we do a little mental exercise for each decision to make sure its a decision that makes us ‘leading edge’ and not ‘bleeding edge’. And if we really want something in, but it isn’t 100% ready for prime time yet we do what we have done with Wayland or the GTK3 port of LibreOffice, we make it available as an option for early adopters, but we default to the safer choice while we work out the last wrinkles. (Btw, if you are interested in progress on Wayland, Kevin Martin, sent out an emailing with a link to a good Wayland development status just before the Holidays.

The final piece of the puzzle is regularly checking and identifying important bugs from the Fedora retrace server. Because like almost all developers we get way more bug reports than we realistically can ever address, so having the data from the retrace server allows us to easily identify the crashes that affect the most users, and just as importantly lets us filter out the bug reports that are likely caused by users installing weird stuff on their system. When we started using retrace various desktop modules tended to dominate the top 3 pages when sorting bugs based on count, but due to a continuous effort over the last few years desktop modules appearing in the top crashers list are few and far between and when they do appear we make sure to get fixes done quickly for them. So if you ever wonder if the data collected by these kind of systems are actually helping developers working on the software you use better, I can say that it is true for Fedora for sure.

That said I thought it could be interesting to explain a bit the challenges we have with tracking our progress in this area. So lets start by looking at a graph I pulled from the retrace server.
Looking at that graph one could say that it is clear that we have made great strides in improving system stability and I do believe that is the case, however the graphs doesn’t truly prove that inconclusively, they are just an indication. The reason they are not hard evidence is that there are a lot of things you need to take into consideration when reading them. First of all they are not adjusted based on total user population, which means that if you win or lose a lot of users between releases it can create an appearance of increased instability or decreased instability which is actually due to increase or decrease in user population, not in ‘how well does the system run on an individual users system’. So from what we see through other metrics our user population has been increasing since we launched the Fedora Workstation which means we shouldn’t be getting any ‘help’ in these graphs from a declining user population.

A second reason is that there are a lot of false positives being reported here, for instance we had an issue for a long while that the Intel graphics drivers generating a ton of this crash reports without it actually being crashes as such. So while they did represent bugs that should ideally be fixed they where not issues you might actually have noticed as a user of the system. So we spent some effort between Fedora Workstation 21 and Fedora Workstation 22 to reduce the amount of noise caused by this, which was an useful effort for us in terms of reducing noise in our retrace server, but from a user perspective it didn’t really make a tangible difference. And even with our efforts there are a still a lot of kernel issues showing up here which are not impacting users in a way that they are likely to perceive as the system being unstable.

A third item that might in a given release skewer the statistics is that we currently don’t differentiate between Fedora Workstation and spins in the statistics, which means that there might be issues caused by one of the spins generating a lot of bug reports against a module, but that might be a bug or an API usage issue that is not triggered by the Workstation edition and thus those items appearing or disappearing might affect the statistics, but as a user of the Fedora Workstation you would never experience it.

So keeping this is mind the retrace server is an important tool for us and one that at least gives us a decent indication of how we are doing with quality. But we can always do better so we will keep reviewing the reports we get through the ABRT and retrace systems and I also do strong recommend any application or library maintainers out there to look into what major issues are reported against their own modules.

December 22, 2015
Now that we have a few years of experience with the Wayland protocol, I thought I would put some of my observations in writing. This, what will hopefully become a series rather than just one post, considers how to design Wayland protocol extensions the right way.

This first post considers protocol object lifespan and the related races between the compositor/server and the client. I assume that the reader is already aware of the Wayland protocol basics. If not, I suggest reading Chapter 4. Wayland Protocol and Model of Operation.

How protocol objects are created

On a new Wayland connection, the only object that exists is the wl_display which is a specially constructed object. You always have it, and there is no wire protocol for creating it.

The only thing the client can create next is a wl_registry through the wl_display. Registry is the root of the whole interface (class) hierarchy. Wl_registry advertises the global objects by numerical name, and using wl_registry.bind request to bind to a global is the first normal way to create a protocol object.

Binding is slightly special still, as the protocol specification in XML for wl_registry uses the new_id argument type, but does not specify the interface (class) for the new object. In the wire protocol, this special argument gets turned into three arguments: interface name (string), interface version (uint32_t), and the new object ID (uint32_t). This is unique in the Wayland core protocol.

The usual way to create a new protocol object is for the client to send a request that has a new_id type of argument. The protocol specification (XML) defines what the interface is, so there is no need to communicate the interface type over the wire. All that is needed on the wire is the new object ID. Almost all object creation happens this way.

Although rare, also the server may create protocol objects for the client. This happens by having a new_id type of argument in an event. Every time the client receives this event, it receives a new protocol object.

As all requests and events are always part of some interface (like a member of a class), this creates an interface hierarchy. For example, wl_compositor objects are created from wl_registry, and wl_surface objects are created from wl_compositor.

Object creation never fails. Once the request or event is sent, the new objects it creates exists, period. This keeps the protocol asynchronous, as there is no need to reply or check that the creation succeeded.

How protocol objects are destroyed

There are two ways to destroy a protocol object. By far the most common one is to have a request in the interface that is specified to be a destructor. Most often this request is called "destroy". When the client code calls the function wl_foobar_destroy(), the request is sent to the server and the client side proxy (struct wl_proxy) for the object gets destroyed. The server then handles the destructor request at some point in the future.

The other way is to destroy the object by an event. In that case, no destructor must be defined in the interface's protocol specification, and the event must be clearly documented to be destructive as there is no automation nor safeties for this. This is for cases where the server decides when an object dies, and requires extreme care in protocol design to work right in all cases. When a client receives such an event, all it can do is destroy the proxy. The (in)famous example of an interface like this is wl_callback.

Enter the boogeyman: races

It is very important that both the client and the server agree on which protocol objects exist. If the client sends a request on, or references as an argument, an object that does not exist in the server's opinion, the server raises a protocol error, and disconnects the client. Obviously this should never happen, nor should it happen that the server sends an event to an object that the client destroyed.

Wayland being a completely asynchronous protocol, we have no implicit guarantees. The server may send an event at the same time as the client destroys the object, and now the event targets an object the client does not know about anymore. Rather than the client shooting itself dead (that's the server's job), we have a trick in libwayland-client: it silently ignores events to destroyed objects, until the server confirms that the object is truly gone.

This works very well for interfaces where the destructor is a request. If the client first sends the destructor request and then sends another request on the destroyed object, it just shot its own head off - no race needed.

Things get tricky for the other case, destructor events. The server may send the destructor event at the same time the client is sending a request on the same object. When the server finally gets the request, the object is already gone, and the client gets taken behind the shed and shot. Therefore pretty much the only safe way to use destructor events is if the interface does not define any requests at all. Ever, not even in future extensions. Furthermore, objects with that interface should not be used as arguments anywhere, or you may hit the race. That is why destructor events are difficult to use right.

The boogeyman's brother

There is yet another nasty race with events that create objects, i.e. server-created objects. If the client is destroying the (parent) object at the same time as the server is sending an event on that object, creating a new (child) object, the server cannot know if the client actually handled the event or not. If the client ignored the event, it will never tell the server to destroy that new object, and you leak in the server.

You could try to make your way out of that pitfall by writing in your protocol specification, that when the (parent) object is destroyed, all the child objects will be destroyed implicitly. But then the client must not send the destructor request for the child objects after it has destroyed the parent, because otherwise the server sees requests on objects it does not know about, and kicks you in the groin, hard. If the child interface defines a destructor, the client cannot destroy its proxies after destroying the parent object. If the child interface does not define a destructor, you can never free the server-side resources until the parent gets destroyed.

The client could destroy all the child objects with a defined destructor in one go, and then immediately destroy the parent object. I am not sure if that works, but it might. If it does not, you have to specify a whole tear-down protocol sequence. The client tells the server it wants to destroy the parent object, the server acks and guarantees it no longer sends any events on it, then the client actually destroys the parent object. Hey, you have a round-trip and just turned a beautiful asynchronous protocol into synchronous, congratulations!

Concluding with recommendations

Here are my recommendations when designing Wayland protocol extensions:
  • Always make sure there is a guaranteed way to destroy all objects. This may sound obvious, but we have fixed several cases in the Wayland core protocol where there was no way to destroy a created protocol object such, that all resources on both server and client side could be freed. And there are still some cases not fixed.
  • Always define a destructor request. If you have any doubt whether your new interface needs a destructor request, just put it there. It is more awkward to add later than normal requests. If you do not have one, the client cannot tell the server to free those protocol object resources.
  • Do not use destructor events. They are hard to design right, and extending the interface later will be a bitch. The client cannot tell the server to free the resources, so objects with destructor events should be short-lived, and the destruction must be guaranteed.
  • Do not use server-side created objects without a serious thought. Designing the destruction sequence such that it never leaks nor explodes is tricky.
December 19, 2015
Having finished debugging and fixing a rather tricky GPU VM fault bug in the radeonsi driver, I thought I'd document the bug chasing process I went through (some parts are cleaned up with dead ends removed). May it help myself and others in the future.

Fortunately, the original submitter of the bug had already bisected the cause of the VM faults to a change in LLVM, so the fault was clearly due to some shader. Unfortunately, the triggering commit in LLVM was completely unrelated to Radeon, so it was very unclear what was going on. Still, the bug occured in the publically available Unreal Elemental Demo and was easily reproducible, so off I went.

Since some shader was to blame, the first thing to do after reproduction was to collect all shaders before and after the bad commit, using RADEON_DUMP_SHADERS=y (R600_DEBUG=ps,vs,gs also does this). This resulted in a lot of output with a large diff between the good and the bad run. Clearly, the change in LLVM subtly affected register allocation and/or instruction scheduling in the compiler in a way that affected many shaders and exposed some pre-existing, underlying bug. I needed to find the exact shader that caused problems.

The next step, to ensure even more reliable and deterministic reproduction, was to record an apitrace. This allows us to replay the exact same sequence of OpenGL calls that leads to the VM faults over and over again, to learn ever more about what's going on. (The Unreal Elemental Demo always plays the same scene, but it is affected by timing.)

Now it was time to find the exact draw call that caused the problems. The driver has some tools to help with that: the GALLIUM_DDEBUG feature is meant to detect lockups, but it conveniently causes a command stream flush after every draw call, so I used it by setting GALLIUM_DDEBUG=800. This makes the replay terribly slow (there's a reason we batch many draw calls into a single CS, also called IB in the kernel). So I implemented a GALLIUM_DDEBUG_SKIP feature in the driver that let me skip the additional flushes and lockup checks for the initial, known-good segment of the trace.

In addition, the driver comes with a debug feature that detects and aborts on VM faults, which is enabled via R600_DEBUG=check_vm. Since the fault comes from a shader, we also need a way to cross-reference the dected fault to the currently bound shader. This is achieved by dumping shaders and enabling the vm debug option, for a full command line of something like

RADEON_DUMP_SHADERS=y R600_DEBUG=vm,check_vm \
glretrace -v ElementalDemo.trace > runXXX.log 2>&1
The option -v for glretrace dumps all OpenGL calls as they are executed, which also turned out to be useful.

How to find the faulty shader from all that? Well, the check_vm logic not only detects VM faults, but also writes helpful logging dumps to a file in ~/ddebug_dumps/ (use less -R to make sense of the coloring escape codes that are written to the file). Most crucially, this dump contains a list of all buffers that were mapped, obviously including the buffers that contain the shader binaries. In one example run:

Size VM start page VM end page Usage
245 -- hole --
1 0x0000000113a9b 0x0000000113a9c USER_SHADER
268 -- hole --
564 -- hole --
2 0x000000016f00d 0x000000016f00f USER_SHADER
145 -- hole --
Remember that we enabled the vm debug option together with shader dumping? This means our log contains lots of lines of the form

VM start=0x105249000 end=0x10524A000 | Buffer 4096 bytes
(Note that the log contains byte addresses, while the check_vm dump contains page numbers. Pages are 4KB, so you just need to add or remove three 0s at the end to go from bytes to pages and vice versa.) All we need to do is grep the log file for "VM start=0x113A9B" and "VM start=0x16F00D" (mmh, I'm getting hungry...). And while those might appear multiple times if a buffer is reused or destroyed, the last occurence will be where the shader binary was created.

The shader dump contains three versions of the shader: the initial TGSI, which is what Gallium's state tracker provides to the hardware-dependent driver, the LLVM IR after an initial optimization pass, and the disassembly of the final shader binary as provided by LLVM. I extracted the IR of the two shaders (vertex and fragment), and compiled them with LLVM's standalone compiler, once with the "good" version of LLVM and once with the "bad" (in both cases using the command line llc -march=amdgcn -mcpu=tonga < shader.ll). It turned out that both shaders were affected by the change, so I needed to figure out whether it was the vertex or the fragment shader.

The GUI of apitrace has the wonderful ability of allowing you to edit a trace. Remember that I used the -v option of glretrace? That produces lots of lines like

2511163 @2 glDrawRangeElements(mode = GL_TRIANGLES, start = 0, end = 9212, ...
Indeed, that was the final reported draw call, i.e. the one causing the fault. So let's open the trace with qapitrace and jump to that exact call using its number. Sure enough, not long before the call we find

We need to find where this program is linked, which is typically much earlier in the program. So we search for glLinkProgram(program = 505) and find exactly one such call. A short bit above, we find the calls

glAttachShader(505, 69)
glAttachShader(505, 504)
where the fragment and vertex shaders are attached to the program. Finally, we search for glShaderSource(shader = 69 and 504 to find the call where the source is loaded into the shader. Now, we can edit those calls to replace the shaders by dummy versions. Be careful, though: the length of the shader source is passed as a separate argument, and you must adjust it manually or you will get surprising error messages when running the modified trace.

By doing so, I determined that the fragment shader was at fault: even minor modifications such as re-ordering statements without changing any effects removed VM faults when applied to the fragment shader, but not when applied to the vertex shader.

So... time to stare at the disassembly of the fragment shader. Unfortunately, there was nothing that caught my eye. It was a long shader with more than 700 instructions. So what next? Since even minor changes at the source level fixed the fault, no matter what kind of change, I needed to go deeper and modify the binary directly. I wrote a new feature for radeonsi that would help me do just that, by allowing me to tell the driver to replace the binary created by LLVM on the N'th compile by a binary that is supplied as an environment variable.

I would have loved to be able to edit assembly at this point. Unfortunately, the llvm-mc tool, which can theoretically act as an assembler, is not able to parse all the assembly constructs that llc generates for the AMDGPU backend. So I went with the next best option, creating an ELF object file using llc -march=amdgcn -mcpu=tonga -filetype=obj and editing the binary directly with a hex editor.

That wasn't too bad though: since VM faults are generated by memory instructions, I could just replace those memory instructions by NOPs. Since the shader dumps collected above helpfully include the binary representation of instructions in addition to the assembly, the instructions aren't too hard to find, either. I only needed to take care not to NOP out memory instructions whose output was then later used as addresses or resource descriptors for other memory instructions, or else I would have introduced new sources for VM faults!

At that point, I ran into a tough problem. My plan was to NOP out large groups of memory instructions initially, and then do a kind of binary search to isolate the bad access. Unfortunately, what happened was, roughly speaking, that when I NOP'ed out group A of instructions or group B of instructions, the VM faults remained, but when I NOP'ed out both groups at the same time, the VM faults disappeared. This can happen when there are really two underlying bugs, but unfortunately I did not see a plausible culprit in either group (in fact, the first bug which I found was actually outside both groups, but - as far as I understand it - depended subtly on timing and on how that affected the scheduling of shader waves by the hardware).

Luckily, at that point I had long suspected the problem to be in wait state handling. You see, in order to save on complicated circuitry in the hardware, there are some rarely occuring data hazards which the compiler must avoid by inserting NOPs explicitly (there is also the s_waitcnt instruction which is part of the strategy for hiding memory latency without complex out-of-order circuitry). So I read the documentation of those hazards again and noticed that the compiler in fact didn't insert enough wait states for a sequence involving register spills (i.e., something affecting only very large shaders). I fixed that, and while it didn't eliminate the VM faults, it changed their pattern sufficiently to give me new hope.

Indeed, with the additional wait states, my original idea of finding the bad instructions by binary search was successful. I narrowed the problem down to accesses to one single texture. At that point, my brain was too exhausted to see the bug that was rather obvious in hindsight, but a colleague pointed it out to me: there was a multi-word register copy which attempted to copy a resource descriptor (consisting of 8 32-bit words) between overlapping register ranges, and it was doing that copy in the wrong direction - kind of like using memcpy when you should be using memmove.

Once this second bug was found, coming up with a fix was relatively straightforward. And yes, I checked: both bug fixes were actually required to completely fix the original bug.

That was my story. Hopefully you've learned something if you've come this far, but there is not really much of a moral to it. Perhaps it is this: pray you have deterministically reproducible bugs. If you do, patiently collecting more and more information will lead you to a solution. If you don't, well, sometimes you just have to be lucky.
December 15, 2015

This post only applies to users of the Lenovo x220 laptop experiencing issues when using the touchpad. Specifically, the touchpad is imprecise and "jumpy" after a firmware update, as outlined in Fedora bug 1264453. The cause is buggy touchpad firmware, identifiable by the string "fw: 8.1" in the dmesg output for the touchpad:

[ +0.005261] psmouse serio1: synaptics: Touchpad model: 1, fw: 8.1,
id: 0x1e2b1, caps: 0xd002a3/0x940300/0x123800, board id: 1611, fw id: 1099905
If you are experiencing these touchpad issues and your dmesg shows the 8.1 firmware version, please read on for a solution. By default, the x220 shipped with version 8.0 so unless you updated the firmware as part of a Lenovo update, you are not affected by this bug.

The touchpad issues seem identical as the ones seen on the Lenovo x230 model which has the same physical hardware and also ships with a firmware version 8.1. The root cause as seen by libinput is that the touchpad only sends events once the finger moves approximately 50 device units in either direction. The touchpad advertises a resolution of 65 units/mm horizontally and 136 units/mm vertically, but the effective resolution is reduced by roughly 75% and 30% This bugzilla attachment 1082925 shows the recording, you can easily see that while the pressure is upgraded with high granularity, the motion coordinates jump from one position to the next. From what we know this was introduced by the touchpad firmware v8.1, presumably as part of a filter to reduce the jitter some x230 users saw.

libinput automatically detects the x230 and enables a custom acceleration function for just that model. That same acceleration function works for the x220 v8.1, but unfortunately we cannot automatically detect it. As of libinput 1.1.3, libinput recognises a special udev tag, LIBINPUT_MODEL_LENOVO_X220_TOUCHPAD_FW81, to mark such an updated x220 and enable a better pointer behaviour. To apply this tag, please do the following:

  1. Create a new file /etc/udev/hwdb.d/90-libinput-x220-fw8.1.hwdb
  2. Look for X220 in the 90-libinput-model-quirks.hwdb file, copy the match and the property assignment into the file. As of the time of writing, the two lines are as below, but make sure you take the latest from your locally installed libinput version or the link above.

    libinput:name:SynPS/2 Synaptics TouchPad:dmi:*svnLENOVO:*:pvrThinkPadX220*
  3. Update the udev hwdb with sudo udevadm hwdb --update
  4. Verify the tag shows up with sudo udevadm test /sys/class/input/event4 (adjust the event node if necessary)
  5. Reboot
The touchpad is now marked as requiring special treatment and libinput will apply a different pointer acceleration for this touchpad.

Note that any udev property starting with LIBINPUT_MODEL_ is private API and subject to change at any time. We will never break the meaning of the LIBINPUT_MODEL_LENOVO_X220_TOUCHPAD_FW81 property, but the exact behaviour of the property is implementation-dependent and may change at any time. Do not use it for any other purpose than marking the touchpad on a Lenovo x220 with an updated touchpad firmware version v8.1.

Big news for the VC4 project today:

commit 21de54b3c4d08d2b20e80876c6def0b421dfec2e
Merge: 870a171 2146136
Author: Dave Airlie
Date: Tue Dec 15 10:43:27 2015 +1000

Merge tag 'drm-vc4-next-2015-12-11' of into drm-next

This is the last step for getting the VC4 driver upstream: Dave's pulled my code for inclusion in kernel 4.5 (probably to be released around mid-March).  The ABI is now stable, so I'm working on getting that ABI usage into the Mesa 11.1 release.  Hopefully I'll land that in the next couple of days.

As far as using it out of the box, we're not there yet.  I've been getting my code included in some builds for the Raspberry Pi Foundation developers.  They've been working on switching to kernel 4.2, and their tree has VC4 support up to the previous ABI.  Once the Mesa 11.1 merge happens, I'll ask them to pull the new kernel ABI and rebuild userspace using Mesa 11.1-rc4.  Hopefully this then appears in the next Raspbian Jessie build they produce.  Until that release happens, there are instructions for the development environment on the DRI wiki, and I'd recommend trying out the continuous integration builds linked from there.

The Raspberry Pi folks aren't ready to swap everyone over to the vc4 driver quite yet, though.  They want to make sure we don't regress functionality, obviously, and there are some big chunks of work left to do: HDMI audio support, video overlays, and integration of the vc4 driver with the camera and video decode support come time mind.  And then there's the matter of making sure that 3D performance doesn't suffer.  That's a bit hard to test, since only a few apps work with the existing GLES2 support, while the vc4 driver gives GLX, EGL-on-X11, EGL-on-gbm, most of GL2.1, and all of GLES2, but doesn't support the EGL-on-Dispmanx interface that the previous driver used.  So, until then, they're going to have a devicetree overlay that determines whether the firmware sets itself up for Linux using the vc4 driver or the closed driver.

Part of what's taken so long to get to this point has been trying to get my dependencies merged to the kernel.  To turn on V3D, I need to turn on power, which means a power domain driver.  Turning on the power required talking to the firmware, which required resurrecting an old patchset for talking to the firmware, which got bikeshedded harder than I've ever had happen to my code before.  Programming video modes required a clock driver.  Every step of the way is misery to get the code merged, and I would give up a lot to never work on the Linux kernel again.

Until then, though, I've become as Raspberry Pi kernel maintainer, so that I can ack other people's patches and help shepherd them into the kernel.  Hopefully for 4.5 I can get the aux clock driver bikeshedding dealt with and resubmit it, at which point people can use UART1 and SPI1/2.  I have a third rework to do of my power domain driver so that if we're lucky we can get it merged nad actually turn on the 3D core (and manage power of many other devices, too!).  Martin Sperl is doing a major rewrite of the SPI driver (an area I know basically nothing about), and his recent patch split may deal with the subsystem maintainer's concerns.  I want to pull in feedback and merge Lubomir's thermal driver.  There's also a cpufreq driver (for actually doing the overclocking you can set with config.txt) from Lubomir, which I expect to be harder to deal with the feedback on.

So, while I've been quiet on the blogging front, there's been a lot going on for vc4, and it's in pretty good shape now.  Hopefully more folks can give it a try as it becomes more upstreamed and accessible.
December 14, 2015

AppStream on DebianBack in 2011, when the AppStream meeting in Nürnberg had just happened, I published the DEP-11 (Debian Extension Project 11) draft together with Michael Vogt and Julian Andres Klode, as an approach to implement AppStream in Debian.

Back then, the FTPMasters team rejected the suggestion to use the official XML specification, and so the DEP-11 specification was adapted to be based on YAML instead of XML. This wasn’t much of a big deal, since the initial design of DEP-11 was to be a superset of the AppStream specification, so it wasn’t meant to be exactly like AppStream anyway. AppStream back then was only designed for applications (as in “stuff that provides a .desktop file”), but with DEP-11 we aimed for much more: DEP-11 should also describe fonts, drivers, pkg-config files and other metadata, so in the end one would be able to ask the package manager meaningful questions like “is the firmware of device X installed?” or request actions such as “please install me the GIMP”, making it unnecessary to know package names at all, and making packages a mere implementation detail.

Then, GNOME-Software happened and demanded all these features. Back then, I was the de-facto maintainer of the AppStream upstream project already, but didn’t feel like being the maintainer yet, so I only curated the existing specification, without extending it much. The big push forward GNOME-Software created changed that dramatically, and with me taking control of the specification and documenting it properly, the very essence of DEP-11 became AppStream (that was around the AppStream 0.6 release). So today, DEP-11 is mainly a YAML-based version of the AppStream XML specification.

AppStream XML and DEP-11 YAML are implemented by two projects, GLib and Qt libraries exist to access the metadata and AppStream is used by the software centers of GNOME, KDE and Elementary.

Today there are two things to celebrate for me: First of all, there is the release of AppStream 0.9 (that happened last Saturday already), which brings some nice improvements to the API for developers and some micro-optimizations to speed up Xapian database queries. Yay!

The second thing is full DEP-11 support in Debian! This means that you don’t need to copy metadata around manually, or install extra packages: All you need to do is to install the appstream package, everything else is done for you, and the data is kept up to date automatically.

This is made possible by APT 1.1 (thanks to the whole APT team!), some dedicated support for it in AppStream directly, the work of our Sysadmin team at Debian, which set up infrastructure to build the metadata automatically, as well as our FTPMasters team where Joerg helped with the final steps of getting the metadata into the archive.

That AppStream data is now in the archive doesn’t mean we live in a perfect utopia yet – there are still issues to be handled, but all the major work is done now and we can now gradually improve the data generator and tools and squash the remaining bugs.

And another item from the good news department: It’s highly likely that Ubuntu will follow Debian in AppStream/DEP-11 support with the upcoming Xenial release!

But how can I make use of the new metadata?

Just install the appstream package – everything is done for you! Another easy way is to install GNOME-Software, which makes use of the new metadata already. KDE Discover in Debian does not enable support for AppStream yet, this will likely come later.

If you prefer to use the command-line, you can now use commands like

sudo appsteamcli install org.kde.kate.desktop

This will simply install the Kate text editor.

Who wants some statistics?

At time the Debian Sid/Unstable suite contains 1714 valid software components. It could be even more if the errors generated during metadata extraction would be resolved. For that, the metadata generator has a nice statistics page, showing the amount of each hint type in the suite and the development of the available software components in Debian and the hint types count over time (this plot feature was just added recently, so we are still a bit low on data). For packagers and interested upstreams, the data extractor creates detailed reports for each package, explaining why data was not included and how to fix the issue (in case something is unclear, please file a bug report and/or get in contact with me).

In summary

Thanks to everyone who helped to make this happen! For me this project means a lot, when writing this blog post I realized that I am basically working on it for almost 5 years (!) now (and the idea is even older). Seeing it to grow to such a huge success in other distributions was a joy, but now Debian can join the game with first-class AppStream support as well, which makes me even happier. Afterall Debian is the distribution I feel most at home.

There is still lots of work to do (and already a few bugs known), but the hardest part of the journey is done – let’s walk into a bright future with AppStream!

December 11, 2015

tanglu logo pureIt is time again for another Tanglu blogpost (to be honest, this article is pretty much overdue 😉 ). I want to shine a spotlight on the work that’s done in Tanglu, be it ongoing or past work done for the new release.

Why is the new release taking longer than usual?

As you might have noticed, usually a new Tanglu release (Tanglu 4 “Dasyatis”) should be released this month. We decided a while ago, however, to defer the release and are now aiming for an release in February / March 2016.

Reason for this change in schedule is the GCC 5 transition (and more importantly the huge amount of follow-up transitions) as well as some other major infrastructure tasks, which we would like to complete and give them a good amount of testing before releasing. Also, some issues with our build system, and generally less build power than in previous releases is a problem (At least the Debile build-system issues could be worked around or solved). The hard disk crash in the forum and bugtracking server also delayed the start of the Tanglu 4 development process a lot.

In all of these tasks, manpower is of course the main problem 😉

Software Tasks

General infrastructure tasks

Improvements on Synchrotron

Synchrotron, the software which is synchronizing the Tanglu archive with the Debian archive, received a few tweaks to make it more robust and detect “installability” of packages more reliably. We also run it more often now.

Rapidumo improvements

Rapidumo is a software written to complement dak (the Debian Archive Kit) in managing the Tanglu archive. It performs automatic QA tasks on the archive and provides a collection of tools to perform various tasks (like triggering package rebuilds).

For Tanglu 4, we sometimes drop broken packages from the archive now, to speed up transitions and to remove packages which got uninstallable more quickly. Packages removed from the release suite still have a chance to enter it again, but they need to go through Tanglu’s staging area again first. The removal process is currently semiautomatic, to avoid unneccessary removals and breakage.

Rapidumo could benefit from some improvements and an interactive web interface (as opposed to static HTML pages) would be nice. Some early work on this is done, but not completed (and has a very low priority at the moment).

DEP-11 integration

There will be a bigger announcement on AppStream and DEP-11 in the next days, so I keep this terse: Tanglu will go for full AppStream integration with the next release, which means no more packaged metadata, but data placed directly in the archive. Work on this has started in Tanglu, but I needed to get back to the drawing board with it, to incorporate some new ideas for using synergies with Debian on generating the DEP-11 metadata.


Phabricator has been integrated well into our infrastructure, but there are still some pending tasks. E.g. we need subprojects in Phabricator, and a more powerful Conduit interface. Those are upstream bugs on Phabricator, and are actively being worked on.

As soon as the missing features are available in Phabricator, we will also pursue the integration of Tanglu bug information with the Debian DistroTracker, which was discussed at DebConf this summer.

UEFI support

UEFI support is a tricky beast. Full UEFI support is a release-goal for Tanglu 4 (so we won’t release without it). At time, our Live-CDs start on pure EFI systems, but there are several reported issues with the Calamares live-installer as well as Debian-Installer, which fails to install GRUB correctly.

Work on resolving these problems is ongoing.

Major LiveCD rework

Tanglu 4 will ship with improved live-cds, which e.g. allow selecting the preferred locale early in the bootloader. We switched from live-boot to manage our live sessions to casper, the same tool which is also used in Ubuntu. Casper fixed a few issues we had, and brought some new, but overall using it was a good choice and work on making top-notch live-cds is progressing well.

KDE Plasma

Integration of the latest Plasma release is progressing, but its speed has slowed down since fewer people are working on it. If you want to help Tanglu’s KDE Workspace integration, please help!

For the upcoming release, the Plasma 5 packages which we created based on a collaboration with Kubuntu for the previous Tanglu 3 release have been merged with their Debian counterparts. This action fortunately was possible without major problems. Now Tanglu is (mostly) using the same Plasma 5 packages as Debian again (lots of Kudos go the the Kubuntu and Debian Qt/KDE packagers!).


The same re-merge with Debian has been done on Tanglu’s GNOME flavor (Tanglu also shipped with a more recent GNOME release than Debian in Tanglu 3). So Tanglu 4 GNOME and Debian Testing GNOME are at (almost) the same level.

Unfortunately, the GNOME team is heavily understaffed – a GNOME team member left at the beginning of the year for personal reasons, and the team now only has one semi-active member and me.


An fvwm-nightshade spin is being worked on 🙂 . Apart from that, there are no teams maintaining other flavors in Tanglu.

Security & Stable maintenance

Thanks to several awesome people, the current Tanglu Stable release (Tanglu 3 “Chromodoris”) receives regular updates, even for many packages which are not in the “fully supported” set.

Tangluverse Tasks

Tasks (not) done in the global Tanglu universe.

HTTPS for community sites

Thanks to our participation in the Let’s Encrypt closed beta program, Tanglu websites like the user forums and bugtracker have been fully encrypted for a while, which should make submitting data to these sites much more secure. So far, we didn’t encounter issues, which means that we will likely aim for getting HTTPS encryption enabled on every Tanglu website. website

The Tanglu main website didn’t receive much love at all. It could use a facelift and more importantly updated content about Tanglu.

So far, nobody volunteered to update the website, so this task is still open. It is, however, a high-priority task to increase our public visibility as a project, so updating the website is definitely something which will be done soon.


It doesn’t hurt to think about how to sell Tanglu as “brand”: What is Tanglu? Our motivation, our goals? What is our slogan? How can we communicate all of this to new users, without having them to read long explanations? (e.g. the Fedora Project has this nicely covered on their main website, and their slogan “Freedom, Friends, Features, First” immediately communicates the project’s key values)

Those are issues engineers don’t think about often, still it is important to present Tanglu in a way that is easy to grasp for people hearing about it for the first time. So far, no people are working on this task specifically, although it regularly comes up on IRC.

Sponsoring & Government

Tanglu is – by design – not backed by any corporation. However, we still need to get our servers paid. Currently, Tanglu is supported by three sponsors, which basically provide the majority of our infrastructure. Also, developers provide additional machines to get Tanglu running and/or packages built.

Still, this dependency on very few sponsors paying the majority of the costs is very bad, since it makes Tanglu vulnerable in case one sponsor decides to end sponsorship. We have no financial backing to be able to e.g. continue to pay an important server in case it’s sponsor decides to drop sponsorship.

Also, since Tanglu is no legal entity, accepting donations is hard for us at time, since a private person would need to accept the money on behalf of Tanglu. So we can’t easily make donating to Tanglu possible (at least not in the way we want donations to be).

These issues are currently being worked on, there are a couple of possible solutions on the table. I will write about details as soon as it makes sense to go public with them.


In general, I am very happy with the Tanglu community: The developer community is a pleasure to work with, and interactions with our users on the forums or via the bugtracker are highly productive and friendly. Although the development speed has slowed down a bit, Tanglu is still an active and awesome project, and I am looking forward to the cool stuff we will do in future!


December 09, 2015
Due to vacations, conferences and other things I'm way later than usual and 4.3 has been released a while ago. More than overdue to take a look at what's in store in the next kernel release.
First looking at overall infrastructure work on the display side there's a lot of atomic conversion progress again. One feature that's now on solid fundations is fastboot, built on top of atomic infrastructure with patches from Maarten. Unfortunately we had to disable it again due to some backligh issues early in 4.4-rc. The other big piece is reworking the watermark update code (Ville&Matt), which unfortunately ran into regression roadblocks already in the development cycle and had to be reverted partially. Another piece of infrastructure building on top of atomic is validation&adjusting the display clock - some ULT chips can't drive all DP screens and the driver now detects that, and it should also downclock when less bandwidth is needed. This was implemented by Mika Kahola and Ville.

Again this round has seen a lot of improvements and bug fixes to PSR code (from Rodrigo) and for FBC (from Paulo). Unfortunately we're not yet done with those, but it looks really good that at least PSR can finally be enabled for 4.5. Still on the display side of the driver there was a pile of smaller improvements all over: Prep work for Broxton DSI support (Shashank Sharma). HDMI detection finally checks the hotplug sense, after some workaround from Sonika. And tons of cleanups all over. Fixing up DMC support (for new low-power display states) was also a topic, but we've only managed to fix it up for real in 4.5.

On the GEM side the big thing for sure is support for the extended 48-bit GPU address space on Broadwell and later chips, from Michel Thierry. And then there's the code for GuC-based command submission (Alex Dai and Dave Gordon), which is merged but not yet enabled by default. The idea behind that is to feed all command submission through an on-chip microcontroller, which can then react much faster to changing workloads and tune power states accordingly. It should also help long-term with better scheduling by supporting preemption. But none of that is implemented yet, so this is just fundations.

For existing features there are bugfixes for userptr and shrinker improvements from Chris Wilson. And Tvrtko has extended the vma view code in prepartion of rotation support for NV12.

Of course there's also been the usual enabling work for new platforms, this time around mostly consisting of workaround patches for Skylake and Broxton. But Zhiyuan Lv submitted support for the virtualized XenGT gpu support on Broadwell.

Finally for driver internals there's the massive work from Ville to make the register access functions type safe. This is escpecially a problem for writing registers, where both the register and the value that needs to be written are of type uin32_t. That resulted in subtile bugs fairly often. Ville encapsulated the register offset into a struct and converted all the thousands of register #defines and users over to that, and now compilation will fail if we ever get this wrong again.
December 08, 2015
As you might already have noticed from the posts on Planet GNOME, and can find again on the hackfest's page, we spent some time in the MediaLab Prado discussing and hacking on Content Apps.


Following discussions about Music's state, I did my bit trying to gather more contributors by porting it to grilo 0.3, and thus bringing it back into the default jhbuild target.


I made some progress on Videos' "series grouping" feature. Loads of backend code written, but not much in the way of UI for now. We however made some progress discussing said UI with Allan.

I also took the opportunity to fix a few low-hanging fruit^Wbugs.


This is where the majority of my energy went. After getting a new enough version of LibreOffice going on my machine (Fedora users, that lives in rawhide only right), no thanks to COPR, I tested Pranav's LibreOfficeKit integration into gnome-documents, after Cosimo rebased it.

You can test it now by checking out the wip/lokdocview-rebase branch of gnome-documents, grabbing the above mentioned version of LibreOffice, and running:

LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64/libreoffice/program/  gjs org.gnome.Documents

After a number of fixes, and bugs filed in the Document Foundation bugzilla, we should be able to land this so that you can preview and edit word processing documents, presentations and spreadsheets without going through the heavy PDF preview.

A picture, which doubles the length of my blog post

And the side-effect of this work is that we can start adding new "views" to the application without too much trouble, like, say, an epub view.


Many thanks to the GNOME Foundation for sponsoring my travel, the MediaLab Prado for hosting us, and Allan and Florian for organising the hackfest.

December 03, 2015

In a previous blog post I said, "If you have encountered a regression and you are building a driver from source, please provide the results of git-bisect." There is some feeling that performing a bisect is hard, time consuming, or both. Back in the bad-old-days, that was true... but git bisect run changed all that.

Most of performing a bisect is mechanical and repetitious:

  1. Build the project.
  2. If the build fails, run git bisect skip.
  3. Run the test.
  4. Inspect the results.
  5. Run git bisect good or git bisect bad depending on the test result.
  6. While there are more steps to bisect, repeat from step 1.
  7. Run git bisect reset to restore the tree to its original state.

Some years ago, someone noticed that this seems like a task a computer could do. At least as early as git 1.6.0 (around 2010), this has been possible with using git bisect run. Once you get the hang of it, it's surprisingly easy to use.

A Word of Caution

Before actually discussing automated bisects, I want to offer a word of caution. Bisecting, automated or otherwise, is a process that still requires the application of common sense. A critical step at the end of bisecting is manually testing the guilty commit and the commit immediately before the guilty commit. You should also look at the commit that git-bisect claims is guilty. Over the years I have seen many bug reports for Mesa that either point at commits that only change comments in the code or only change a driver other than the driver on which the bug was observed.

I have observed these failures to have two causes, although other causes may be possible. With a manual bisect, it is really easy to accidentally git bisect good when you meant git bisect bad. I have done this by using up-arrow to go back through the shell command history when I'm too lazy to type the command again. It's really easy to get the wrong one doing that.

The other cause of failure I have observed occurs when multiple problems occur between the known-good commit and the known-bad commit. In this case the bisect will report a commit that was already know to be bad and was already fixed. This false information can waste a lot of time for the developer who is trying to fix the bug. They will spend time trying to determine why the commit reported by the bisect still causes problems when some other commit is the actual cause. The remedy is proper application of common sense while performing the bisect. It's often not enough to just see that the test case fails. The mode of the failure must also be considered.

Automated Bisect Basics

All of the magic in an automated bisect revolves around a single script that you supply. This script analyzes the commit and tells bisect what to do next. There are four things that the script can tell bisect, and each of them is communicated using the script's return code.

  1. Skip this commit because it cannot be analyzed. This is identical to manually running git bisect skip.This can be used, for example, if the commit does not build. A script might contain something like:

    if ! make ; then
        exit 125

    As you might infer from the code example, a return code of 125 instructs bisect to skip the commit.

  2. Accept this commit as good. This is identical to git bisect good. A return code of 0 instructs bisect to accept the commit as good.

  3. Reject this commit as bad. This is identical to git bisect bad. All tests in the piglit test suite print a string in a specific format when a test passes or fails. This can be used by the script to generate the exit code. For example:

    bin/arb_clear_buffer_object-null-data -auto > /tmp/result.$$
    if grep -q 'PIGLIT: {"result": "pass" }' /tmp/result.$$; then
        rm /tmp/result.$$
        exit 0
        cat /tmp/result.$$
        rm /tmp/result.$$
        exit 1

    In this bit of script, the output of the test is printed in the "bad" case. This can be very useful. Bisects of long ranges of commits may encounter failures unrelated to the failure you are trying to bisect. Seeing the output from the test may alert you to failures with unrelated causes.

    Looking for simple "pass" or "fail" output from the test may not be good enough. It may be better to look for specific failure messages from the test. As mentioned above, it is important to only report a commit as bad if it the test fails due to the problem you are trying to bisect.

    Imagine a case where a failure in the arb_clear_buffer_object-null-data on the master branch is being bisected. The particular failure is an incorrect error being generated, and the known-good commit is HEAD~200 when the last stable release occurred (on a different branch with a common root). However, HEAD~110..HEAD~90 contain an unrelated rendering error that was fixed in HEAD~89. Since git-bisect performs a binary search, it will test HEAD~100 first and see the wrong failure. Simply looking for test failure would incorrectly identify HEAD~110 as the problem commit. If the script instead checked for the specific incorrect error message, the correct guilty commit is more likely to be found.

    A return code with any value of 1 through 127, excluding 125, instructs bisect to reject the commit as bad.

  4. Stop at this commit and wait for human interaction. This can be used when something really catastrophic happens that requires human attention. Imagine a scenario where the bisect is being performed on one system but tests are run on another. This could be used if the bisect system is unable to communicate wit the test system. A return code with any value of 128 through 255 will halt the bisect.

All of this can be used to generate a variety of scripts for any sort of complex environment. To quote Alan Kay, "Simple things should be simple, complex things should be possible." For a simple make-based project and an appropriately written test case, an automated bisect script could be as simple as:

    if ! make ; then
        exit 125

    # Use the return code from the test case
    exec test_case

Since this is just a script that builds the project and runs a test, you can easily test the script. Testing the script is a very good idea if you plan to leave the computer when the bisect starts. It would be shame to leave the computer for several hours only to find it stuck at the first commit due to a problem in the automated bisect script. Assuming the script is called, testing the script can be as easy as:

    $ ./ ; echo $?

Now all of the human interaction for the entire bisect would be three commands:

    $ git bisect start bad-commit good-commit
    $ git bisect run
    $ git bisect reset

If there are a lot commits in good-commit..bad-commit, building the project takes a long time, or running the tests takes a long time, feel free to go have a sandwich while you wait. Or play Quake. Or do other work.

Broken Builds

The bane of most software developer's existence is a broken build. Few things are more irritating. With GIT, it is possible to have transient build failures that nobody notices. It's not unheard of for a 20 patch series to break at patch 9 and fix at patch 11. This commonly occurs either when people move patches around in a series during development or when reviewers suggest splitting large patches into smaller patches. In either case patch 9 could add a call to a function that isn't added until patch 11, for example. If nobody builds at patch 9 the break goes unnoticed.

The break goes unnoticed until a bisect hits exactly patch 9. If the problem being bisected and the build break are unrelated (and the build break is short lived), the normal skip process is sufficient. The range of commits that don't build will skip. Assuming the commit before the range of commits that don't build and the commit after the range of commits that don't build are both good or bad, the guilty commit will be found.

Sometimes things are not quite so rosy. You are bisecting because there was a problem, after all. Why have just one problem when you can have a whole mess of them? I believe that the glass is either empty or overflowing with steaming hot fail. The failing case might look something like:

    $ git bisect start HEAD HEAD~20
    Bisecting: 9 revisions left to test after this (roughly 3 steps)
    [2d712d35c57900fc0aa0f1455381de48cdda0073] gallium/radeon: move printing texture info into a separate function
    $ git bisect run ./
    running ./ says skip
    Bisecting: 9 revisions left to test after this (roughly 3 steps)
    [622186fbdf47e4c77aadba3e38567636ecbcccf5] mesa: errors: validate the length of null terminated string
    running ./ says good
    Bisecting: 8 revisions left to test after this (roughly 3 steps)
    [19eaceb6edc6cd3a9ae878c89f9deb79afae4dd6] gallium/radeon: print more information about textures
    running ./ says skip
    Bisecting: 8 revisions left to test after this (roughly 3 steps)
    [5294debfa4910e4259112ce3c6d5a8c1cd346ae9] automake: Fix typo in MSVC2008 compat flags.
    running ./ says good
    Bisecting: 6 revisions left to test after this (roughly 3 steps)
    [1cca259d9942e2f453c65e8d7f9f79fe9dc5f0a7] gallium/radeon: print more info about CMASK
    running ./ says skip
    Bisecting: 6 revisions left to test after this (roughly 3 steps)
    [c60d49161e3496b9e64b99ecbbc7ec9a02b15a17] gallium/radeon: remove unused r600_texture::pitch_override
    running ./ says skip
    Bisecting: 6 revisions left to test after this (roughly 3 steps)
    [84fbb0aff98d6e90e4759bbe701c9484e569c869] gallium/radeon: rename fmask::pitch -> pitch_in_pixels
    running ./ says skip
    Bisecting: 6 revisions left to test after this (roughly 3 steps)
    [bfc14796b077444011c81f544ceec5d8592c5c77] radeonsi: fix occlusion queries on Fiji
    running ./ says bad
    Bisecting: 5 revisions left to test after this (roughly 3 steps)
    [a0bfb2798d243a4685d6ea32e9a7091fcec74700] gallium/radeon: print more info about HTILE
    running ./ says skip
    Bisecting: 5 revisions left to test after this (roughly 3 steps)
    [75d64698f0b0c906d611e69d9f8b118c35026efa] gallium/radeon: remove DBG_TEXMIP
    running ./ says skip
    Bisecting: 5 revisions left to test after this (roughly 3 steps)
    [3a6de8c86ee8a0a6d2f2fbc8cf2c461af0b9a007] radeonsi: print framebuffer info into ddebug logs
    running ./ says bad
    Bisecting: 3 revisions left to test after this (roughly 2 steps)
    [a5055e2f86e698a35da850378cd2eaa128df978a] gallium/aux/util: Trivial, we already have format use it
    running ./ says skip
    There are only 'skip'ped commits left to test.
    The first bad commit could be any of:
    We cannot bisect more!
    bisect run cannot continue any more

In even more extreme cases, the range of breaks can be even longer. Six or seven is about the most that I have personally experienced.

The problem doesn't have to be a broken build. It could be anything that prevents the test case from running. On Mesa I have experienced problems where a bug that prevents one driver from being able to load or create an OpenGL context persists for a few commits. Anything that prevents the test from running (e.g., not produce a pass or fail result) or causes additional, unrelated failures should be skipped.

Usually the problem is something really trivial. If the problem was fixed, a patch for the problem may even already exist. Let's assume a patch exists in a file named fix-the-build.patch. We also know that the build broke at commit 75d6469, and it was fixed at commit 3a6de8c. This means that the range 75d6469^..3a6de8c^ need the patch applied. If you're not convinced that the ^ is necessary, observe the log output:

    $ git log --oneline 75d6469^..3a6de8c^
    a0bfb27 gallium/radeon: print more info about HTILE
    1cca259 gallium/radeon: print more info about CMASK
    84fbb0a gallium/radeon: rename fmask::pitch -> pitch_in_pixels
    19eaceb gallium/radeon: print more information about textures
    2d712d3 gallium/radeon: move printing texture info into a separate function
    c60d491 gallium/radeon: remove unused r600_texture::pitch_override
    75d6469 gallium/radeon: remove DBG_TEXMIP

Notice that the bottom commit in the list is the commit where the break is first experienced, and the top commit in the list is not the one where the break is fixed.

Using this information is simple. The bisect script need only determine the current commit is in the list of commits that need the patch and conditionally apply the patch.

    # Get the short-from SHA of the current commit
    SHA=$(git log --oneline HEAD^.. | cut -f1 -d' ')

    # If the current commit is in the list of commits that need the patch
    # applied, do it.  If applying the patch fails, even partially, abort.
    if grep --silent "^$SHA " <(git log --oneline 75d6469^..3a6de8c^)
        #                     ^^                                    ^
        #                     This bit runs git-log, redirects the output
        #                     to a temporary file, then uses that temporary
        #                     file as the input to grep.  Non-bash shells
        #                     will probably need to do all that manually.
        if ! patch -p1 --forward --silent < fix-the-build.patch ; then
            exit 255

Before exiting, the script must return the tree to its original state. If it does not, applying the next commit may fail or applying the patch on the next step will certainly fail. git-reset can do most of the work. It just has to be applied everywhere this script might exit. I generally do this using a wrapper function. The simple bisect script from before might look like:

    function report()
        git reset --hard HEAD
        exit $1

    # Get the short-from SHA of the current commit
    SHA=$(git log --oneline HEAD^.. | cut -f1 -d' ')

    # If the current commit is in the list of commits that need the patch
    # applied, do it.  If applying the patch fails, even partially, abort.
    if grep --silent "^$SHA " <(git log --oneline 75d6469^..3a6de8c^)
        if ! patch -p1 --forward --silent < fix-the-build.patch ; then
            # Just exit here... so that we can see what went wrong
            exit 255

    if ! make ; then
        report 125

    # Use the return code from the test case
    report $?

This can be extended to any number of patches to fix any number of problems.

There is one other tip here. If the first bisect attempt produced inconclusive results due to skipped commits, it may not have been wasted effort. Referring back to the previous output, there were two good commits found. These commits can be given to the next invocation of git bisect start. This helps reduce the search space from 9 to 6 in this case.

    $ git bisect start HEAD HEAD~20 622186fbdf47e4c77aadba3e38567636ecbcccf5 5294debfa4910e4259112ce3c6d5a8c1cd346ae9 
    Bisecting: 6 revisions left to test after this (roughly 3 steps)
    [1cca259d9942e2f453c65e8d7f9f79fe9dc5f0a7] gallium/radeon: print more info about CMASK

Using the last bad commit can reduce the search even further.

    $ git bisect start 3a6de8c86ee8a0a6d2f2fbc8cf2c461af0b9a007 HEAD~20 622186fbdf47e4c77aadba3e38567636ecbcccf5 5294debfa4910e4259112ce3c6d5a8c1cd346ae9 
    Bisecting: 4 revisions left to test after this (roughly 2 steps)
    [2d712d35c57900fc0aa0f1455381de48cdda0073] gallium/radeon: move printing texture info into a separate function

Note that git-bisect does not emit "good" or "bad" information. You have to author your bisect script to emit that information. The report function is a good place to do this.

    function report()
        if [ $1 -eq 0 ]; then
            echo " says good"
        elif [ $1 -eq 125 ]; then
            echo " says skip"
            echo " says bad"

        git reset --hard HEAD
        exit $1

Remote Test Systems

Running tests on remote systems pose additional challenges. At the very least, there are three additional steps: get the built project on the remote system, start test on the remote system, and retrieve the result.

For these extra steps, rsync and ssh are powerful tools. There are numerous blog posts and tutorials dedicated to using rsync and ssh in various environments, and duplicating that effort is well beyond the scope of this post. However, there is one couple nice feature relative to automated bisects that is worth mentioning.

Recall that returning 255 from the script will cause the bisect to halt waiting for human intervention. It just so happens that ssh returns 255 when an error occurs. Otherwise it returns the result of the remote command. To make use of this, split the work across two scripts instead of putting all of the test in a single script. A new contains all of the commands that run on the build / bisect system, and contains all of the commands that run on the testing system. should (only) execute the test and exit with the same sort of return code as would. should build the project, copy the build to the testing system, and start the test on the testing system. The return code from should be directly returned from A simple doesn't look too different from

    if ! make ; then
        exit 125

    if ! rsync build_results; then
        exit 255

    # Use the return code from the test case
    exec ssh /path/to/test/

Since returns "normal" automated bisect return codes and ssh returns 255 on non-test failures, everything is taken care of.

Interactive Test Cases

Automated bisecting doesn't work out too well when the test itself cannot be automated. There is still some benefit to be had from automating the process. Optionally applying patches, building the project, sending files to remote systems, and starting the test can all still be automated, and I think "automated" applies only very loosely. When the test is done, the script should exit with a return code of 255. This will halt the bisect. Run git bisect good or git bisect bad. Then, run git bisect run ./ again.

It's tempting to just run by hand and skip git bisect run. The small advantage to the later is that skipping build failures will still be automated.

Going further requires making an interactive test case be non-interactive. For developers of OpenGL drivers, it is common to need to bisect rendering errors in games. This can be really, really painful and tedious. Most of the pain comes from the task not being automatable. Just loading the game and getting to the place where the error occurs can often take several minutes. These bugs are often reported by end-users who last tested with the previous release. From the 11.0 branch point to the 11.1 branch point on Mesa there were 2,308 commits.

    $ git bisect start 11.1-branchpoint 11.0-branchpoint 
    Bisecting: 1153 revisions left to test after this (roughly 10 steps)
    [bf5f931aee35e8448a6560545d86deb35f0639b3] nir: make nir_instrs_equal() static

When you realized that bisect will be 10 steps with at least 10 to 15 minutes per step, you may begin to feel your insides die. It's even worse if you accidentally type git bisect good when you meant git bisect bad along the way.

This is a common problem testing interactive applications. A variety of tools exist to remove the interactivity from interactive applications. apitrace is one such tool. Using apitrace, the OpenGL commands from the application can be recorded. This step must be done manually. The resulting trace can then be run at a known good commit, and an image can be captured from the portion of the trace that would exhibit the bug. This step must also be done manually, but the image capture is performed by a command line option to the trace replay command. Now a script can reply the trace, collect a new image, and compare the new image with the old image. If the images match, the commit is good. Otherwise, the commit is bad. This can be error prone, so it's a good idea to keep all the images from the bisect. A human can then examine all the images after the bisect to make sure the right choice were made at each commit tested.

A full apitrace tutorial is beyond the scope of this post. The apitrace documentation has some additional information about using apitrace with automated bisects.

What Now?

git-bisect is an amazingly powerful tool. Some of the more powerful aspects of GIT get a bad rap for being hard to use correctly: make one mistake, and you'll have to re-clone your tree. With even the more powerful aspects of git-bisect, such as automated bisects, it's actually hard to go wrong. There are two absolutely critical final tips. First, remember that you're bisecting. If you start performing other GIT commands in the middle of the bisect, both you and your tree will get confused. Second, remember to reset your tree using git bisect reset when you're done. Without this step, you'll still be bisecting, so see the first tip. git-bisect and automated bisects really make simple things simple and complex things possible.

November 29, 2015
For public documentation's sake, if you want to run Steam on 64-bit Ubuntu 15.10 together with the open-source radeonsi driver (the part of Mesa that implements OpenGL for recent AMD GPUs), you'll probably run into problems that can be fixed by starting Steam as

LD_PRELOAD=/usr/lib/i386-linux-gnu/ steam
(You'll have to have installed certain 32-bit packages, which you should have automatically been prompted to do while installing Steam.)

The background of this issue is that both Steam and radeonsi contain C++ code and dynamically link against the C++ runtime libraries. However, Steam comes with its own copy of these libraries, which it prefers to link against. Meanwhile, Ubuntu 15.10 is built using the most recent g++, which contains a new C++ ABI. This ABI is not supported by the C++ libraries shipped with Steam, so that by default, the radeonsi driver fails to load due to linking errors.

The workaround noted above fixes the problem by forcing the use of the newer C++ library. (Note that it does so only for 32-bit applications. If you're running 64-bit games from Steam, a similar workaround may be required.)
November 23, 2015

Long time no post!


A quick reminder for students (*) in Spain interested in participating in this year’s CUSL, the deadline for the project proposals has been extended until December 1st:

You’re still on time to submit a proposal!

* Universidad, bachiller, ciclos de grado medio…

Filed under: FreeDesktop Planet, GNOME Planet, GNU Planet, Meetings, Planets, Uncategorized
November 20, 2015

Yesterday I pushed an implementation of a new OpenGL extension GL_EXT_shader_samples_identical to Mesa. This extension will be in the Mesa 11.1 release in a few short weeks, and it will be enabled on various Intel platforms:

  • GEN7 (Ivy Bridge, Baytrail, Haswell): Only currently effective in the fragment shader. More details below.
  • GEN8 (Broadwell, Cherry Trail, Braswell): Only currently effective in the vertex shader and fragment shader. More details below.
  • GEN9 (Skylake): Only currently effective in the vertex shader and fragment shader. More details below.

The extension hasn't yet been published in the official OpenGL extension registry, but I will take care of that before Mesa 11.1 is released.


Multisample anti-aliasing (MSAA) is a well known technique for reducing aliasing effects ("jaggies") in rendered images. The core idea is that the expensive part of generating a single pixel color happens once. The cheaper part of determining where that color exists in the pixel happens multiple times. For 2x MSAA this happens twice, for 4x MSAA this happens four times, etc. The computation cost is not increased by much, but the storage and memory bandwidth costs are increased linearly.

Some time ago, a clever person noticed that in areas where the whole pixel is covered by the triangle, all of the samples have exactly the same value. Furthermore, since most triangles are (much) bigger than a pixel, this is really common. From there it is trivial to apply some sort of simple data compression to the sample data, and all modern GPUs do this in some form. In addition to the surface that stores the data, there is a multisample control surface (MCS) that describes the compression.

On Intel GPUs, sample data is stored in n separate planes. For 4x MSAA, there are four planes. The MCS has a small table for each pixel that maps a sample to a plane. If the entry for sample 2 in the MCS is 0, then the data for sample 2 is stored in plane 0. The GPU automatically uses this to reduce bandwidth usage. When writing a pixel on the interior of a polygon (where all the samples have the same value), the MCS gets all zeros written, and the sample value is written only to plane 0.

This does add some complexity to the shader compiler. When a shader executes the texelFetch function, several things happen behind the scenes. First, an instruction is issued to read the MCS. Then a second instruction is executed to read the sample data. This second instruction uses the sample index and the result of the MCS read as inputs.

A simple shader like

    #version 150
    uniform sampler2DMS tex;
    uniform int samplePos;

    in vec2 coord;
    out vec4 frag_color;

    void main()
       frag_color = texelFetch(tex, ivec2(coord), samplePos);

generates this assembly

    pln(16)         g8<1>F          g7<0,1,0>F      g2<8,8,1>F      { align1 1H compacted };
    pln(16)         g10<1>F         g7.4<0,1,0>F    g2<8,8,1>F      { align1 1H compacted };
    mov(16)         g12<1>F         g6<0,1,0>F                      { align1 1H compacted };
    mov(16)         g16<1>D         g8<8,8,1>F                      { align1 1H compacted };
    mov(16)         g18<1>D         g10<8,8,1>F                     { align1 1H compacted };
    send(16)        g2<1>UW         g16<8,8,1>F
                                sampler ld_mcs SIMD16 Surface = 1 Sampler = 0 mlen 4 rlen 8 { align1 1H };
    mov(16)         g14<1>F         g2<8,8,1>F                      { align1 1H compacted };
    send(16)        g120<1>UW       g12<8,8,1>F
                                sampler ld2dms SIMD16 Surface = 1 Sampler = 0 mlen 8 rlen 8 { align1 1H };
    sendc(16)       null<1>UW       g120<8,8,1>F
                                render RT write SIMD16 LastRT Surface = 0 mlen 8 rlen 0 { align1 1H EOT };

The ld_mcs instruction is the read from the MCS, and the ld2dms is the read from the multisample surface using the MCS data. If a shader reads multiple samples from the same location, the compiler will likely eliminate all but one of the ld_mcs instructions.

Modern GPUs also have an additional optimization. When an application clears a surface, some values are much more commonly used than others. Permutations of 0s and 1s are, by far, the most common. Bandwidth usage can further be reduced by taking advantage of this. With a single bit for each of red, green, blue, and alpha, only four bits are necessary to describe a clear color that contains only 0s and 1s. A special value could then be stored in the MCS for each sample that uses the fast-clear color. A clear operation that uses a fast-clear compatible color only has to modify the MCS.

All of this is well documented in the Programmer's Reference Manuals for Intel GPUs.

There's More

Information from the MCS can also help users of the multisample surface reduce memory bandwidth usage. Imagine a simple, straight forward shader that performs an MSAA resolve operation:

    #version 150
    uniform sampler2DMS tex;

    #define NUM_SAMPLES 4 // generate a different shader for each sample count

    in vec2 coord;
    out vec4 frag_color;

    void main()
        vec4 color = texelFetch(tex, ivec2(coord), 0);

        for (int i = 1; i < NUM_SAMPLES; i++)
            color += texelFetch(tex, ivec2(coord), i);

        frag_color = color / float(NUM_SAMPLES);

The problem should be obvious. On most pixels all of the samples will have the same color, but the shader still reads every sample. It's tempting to think the compiler should be able to fix this. In a very simple cases like this one, that may be possible, but such an optimization would be both challenging to implement and, likely, very easy to fool.

A better approach is to just make the data available to the shader, and that is where this extension comes in. A new function textureSamplesIdenticalEXT is added that allows the shader to detect the common case where all the samples have the same value. The new, optimized shader would be:

    #version 150
    #extension GL_EXT_shader_samples_identical: enable
    uniform sampler2DMS tex;

    #define NUM_SAMPLES 4 // generate a different shader for each sample count

    #if !defined GL_EXT_shader_samples_identical
    #define textureSamplesIdenticalEXT(t, c)  false

    in vec2 coord;
    out vec4 frag_color;

    void main()
        vec4 color = texelFetch(tex, ivec2(coord), 0);

        if (! textureSamplesIdenticalEXT(tex, ivec2(coord)) {
            for (int i = 1; i < NUM_SAMPLES; i++)
                color += texelFetch(tex, ivec2(coord), i);

            color /= float(NUM_SAMPLES);

        frag_color = color;

The intention is that this function be implemented by simply examining the MCS data. At least on Intel GPUs, if the MCS for a pixel is all 0s, then all the samples are the same. Since textureSamplesIdenticalEXT can reuse the MCS data read by the first texelFetch call, there are no extra reads from memory. There is just a single compare and conditional branch. These added instructions can be scheduled while waiting for the ld2dms instruction to read from memory (slow), so they are practically free.

It is also tempting to use textureSamplesIdenticalEXT in conjunction with anyInvocationsARB (from GL_ARB_shader_group_vote). Such a shader might look like:

    #version 430
    #extension GL_EXT_shader_samples_identical: require
    #extension GL_ARB_shader_group_vote: require
    uniform sampler2DMS tex;

    #define NUM_SAMPLES 4 // generate a different shader for each sample count

    in vec2 coord;
    out vec4 frag_color;

    void main()
        vec4 color = texelFetch(tex, ivec2(coord), 0);

        if (anyInvocationsARB(!textureSamplesIdenticalEXT(tex, ivec2(coord))) {
            for (int i = 1; i < NUM_SAMPLES; i++)
                color += texelFetch(tex, ivec2(coord), i);

            color /= float(NUM_SAMPLES);

        frag_color = color;

Whether or not using anyInvocationsARB improves performance is likely to be dependent on both the shader and the underlying GPU hardware. Currently Mesa does not support GL_ARB_shader_group_vote, so I don't have any data one way or the other.


The implementation of this extension that will ship with Mesa 11.1 has a three main caveats. Each of these will likely be resolved to some extent in future releases.

The extension is only effective on scalar shader units. This means on GEN7 it is effective in fragment shaders. On GEN8 and GEN9 it is only effective in vertex shaders and fragment shaders. It is supported in all shader stages, but in non-scalar stages textureSamplesIdenticalEXT always returns false. The implementation for the non-scalar stages is slightly different, and, on GEN9, the exact set of instructions depends on the number of samples. I didn't think it was likely that people would want to use this feature in a vertex shader or geometry shader, so I just didn't finish the implementation. This will almost certainly be resolved in Mesa 11.2.

The current implementation also returns a false negative for texels fully set to the fast-clear color. There are two problems with the fast-clear color. It uses a different value than the "all plane 0" case, and the size of the value depends on the number of samples. For 2x MSAA, the MCS read returns 0x000000ff, but for 8x MSAA it returns 0xffffffff.

The first problem means the compiler would needs to generate additional instructions to check for "all plane 0" or "all fast-clear color." This could hurt the performance of applications that either don't use a fast-clear color or, more likely, that later draw non-clear data to the entire surface. The second problem means the compiler would needs to do state-based recompiles when the number of samples changes.

In the end, we decided that "all plane 0" was by far the most common case, so we have ignored the "all fast-clear color" case for the time being. We are still collecting data from applications, and we're working on several uses of this functionality inside our driver. In future versions we may implement a heuristic to determine whether or not to check for the fast-clear color case.

As mentioned above, Mesa does not currently support GL_ARB_shader_group_vote. Applications that want to use textureSamplesIdenticalEXT on Mesa will need paths that do not use anyInvocationsARB for at least the time being.


As stated by issue #3, the extension still needs to gain SPIR-V support. This extension would be just as useful in Vulkan and OpenCL as it is in OpenGL.

At some point there is likely to be a follow-on extension that provides more MCS data to the shader in a more raw form. As stated in issue #2 and previously in this post, there are a few problems with providing raw MCS data. The biggest problem is how the data is returned. Each sample count needs a different amount of data. Current 8x MSAA surfaces have 32-bits (returned) per pixel. Current 16x MSAA MCS surfaces have 64-bits per pixel. Future 32x MSAA, should that ever exist, would need 192 bits. Additionally, there would need to be a set of new texelFetch functions that take both a sample index and the MCS data. This, again, has problems with variable data size.

Applications would also want to query things about the MCS values. How many times is plane 0 used? Which samples use plane 2? What is the highest plane used? There could be other useful queries. I can imagine that a high quality, high performance multisample resolve filter could want all of this information. Since the data changes based on the sample count and could change on future hardware, the future extension really should not directly expose the encoding of the MCS data. How should it provide the data? I'm expecting to write some demo applications and experiment with a bunch of different things. Obviously, this is an open area of research.