August 22, 2016
Last week I finally plugged in the camera module I got a while ago to go take a look at what vc4 needs for displaying camera output.

The surprising answer was "nothing."  vc4 could successfully import RGB dmabufs and display them as planes, even though I had been expecting to need fixes on that front.

However, the bcm2835 v4l camera driver needs a lot of work.  First of all, it doesn't use the proper contiguous memory support in v4l (vb2-dma-contig), and instead asks the firmware to copy from the firmware's contiguous memory into vmalloced kernel memory.  This wastes memory and wastes memory bandwidth, and doesn't give us dma-buf support.

Even more, MMAL (the v4l equivalent that the firmware exposes for driving the hardware) wants to output planar buffers with specific padding.  However, instead of using the multi-plane format support in v4l to expose buffers with that padding, the bcm2835 driver asks the firmware to do another copy from the firmware's planar layout into the old no-padding V4L planar format.

As a user of the V4L api, you're also in trouble because none of these formats have any priority information that I can see: The camera driver says it's equally happy to give you RGB or planar, even though RGB costs an extra copy.  I think properly done today, the camera driver would be exposing multi-plane planar YUV, and giving you a mem2mem adapter that could use MMAL calls to turn the planar YUV into RGB.

For now, I've updated the bug report with links to the demo code and instructions.

I also spent a little bit of time last week finishing off the series to use st/nir in vc4.  I managed to get to no regressions, and landed it today.  It doesn't eliminate TGSI, but it does mean TGSI is gone from the normal GLSL path.

Finally, I got inspired to do some work on testing.  I've been doing some free time work on servo, Mozilla's Rust-based web browser, and their development environment has been a delight as a new developer.  All patch submissions, from core developers or from newbies, go through github pull requests.  When you generate a PR, Travis builds and runs the unit tests on the PR.  Then a core developer reviews the code by adding a "r" comment in the PR or provides feedback.  Once it's reviewed, a bot picks up the pull request, tries merging it to master, then runs the full integration test suite on it.  If the test suite passes, the bot merges it to master, otherwise the bot writes a comment with a link to the build/test logs.

Compare this to Mesa's development process.  You make a patch.  You file it in the issue tracker and it gets utterly ignored.  You complain, and someone tells you you got the process wrong, so you join the mailing list and send your patch (and then get a flood of email until you unsubscribe).  It gets mangled by your email client, and you get told to use git-send-email, so you screw around with that for a while before you get an email that will actually show up in people's inboxes.  Then someone reviews it (hopefully) before it scrolls off the end of their inbox, and then it doesn't get committed anyway because your name was familiar enough that the reviewer thought maybe you had commit access.  Or they do land your patch, and it turns out you hadn't run the integration tests and then people complain at you for not testing.

So, as a first step toward making a process like Mozilla's possible, I put some time into fixing up Travis on Mesa, and building Travis support for the X Server.  If I can get Travis to run piglit and ensure that expected-pass tests don't regress, that at least gives us a documentable path for new developers in these two projects to put their code up on github and get automated testing of the branches they're proposing on the mailing lists.
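
As a rough sketch of what "expected-pass tests don't regress" could mean in practice: the snippet below compares two result sets and reports tests that used to pass and no longer do. This is not piglit's actual result format or API; the dictionaries, status strings and test names are made up for illustration.

```python
def find_regressions(baseline, current):
    """Return the tests that passed in the baseline run but do not pass now."""
    return sorted(
        name
        for name, status in baseline.items()
        if status == "pass" and current.get(name) != "pass"
    )


# Hypothetical results: test name -> status string.
baseline = {"blend-func": "pass", "glsl-routing": "fail", "draw-arrays": "pass"}
current = {"blend-func": "pass", "glsl-routing": "fail", "draw-arrays": "crash"}

regressions = find_regressions(baseline, current)
print(regressions)
```

A test that was already failing in the baseline (glsl-routing here) is not reported, so a CI job built on this idea would only go red for new breakage.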
August 16, 2016

Wrapping libudev using LD_PRELOAD

Peter Hutterer and I were chasing down an X server bug which was exposed when running the libinput test suite against the X server with a separate thread for input. This was crashing deep inside libudev, which led us to suspect that libudev was getting run from multiple threads at the same time.

I figured I'd be able to tell by wrapping all of the libudev calls from the server and checking to make sure we weren't ever calling it from both threads at the same time. My first attempt was a simple set of cpp macros, but that failed when I discovered that libwacom was calling libgudev, which was calling libudev.

Instead of recompiling the world with my magic macros, I created a new library which exposes all of the (public) symbols in libudev. Each of these functions does a bit of checking and then simply calls down to the 'real' function.

Finding the real symbols

Here's the snippet which finds the real symbols:

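static void *udev_symbol(const char *symbol)
{
    static void *libudev;
    static pthread_mutex_t  find_lock = PTHREAD_MUTEX_INITIALIZER;

    void *sym;
    /* Serialize the lookup so only one thread opens the library */
    pthread_mutex_lock(&find_lock);
    if (!libudev) {
        libudev = dlopen("", RTLD_LOCAL | RTLD_NOW);
    }
    sym = dlsym(libudev, symbol);
    pthread_mutex_unlock(&find_lock);
    return sym;
}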

Yeah, the libudev version is hard-coded into the source; I didn't want to accidentally load the wrong one. This could probably be improved...

Checking for re-entrancy

As mentioned above, we suspected that the bug was caused when libudev got called from two threads at the same time. So, our checks are pretty simple; we just count the number of calls into any udev function (to handle udev calling itself). If there are other calls in process, we make sure the thread ID for those is the same as the current thread.

static void udev_enter(const char *func) {
    /* Nested calls are fine, but only from the thread already inside udev */
    assert (udev_running == 0 || udev_thread == pthread_self());
    udev_thread = pthread_self();
    udev_func[udev_running] = func;
    udev_running++;
}

static void udev_exit(void) {
    udev_running--;
    if (udev_running == 0)
        udev_thread = 0;
    udev_func[udev_running] = 0;
}
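
For readers who find the idea easier to follow outside of C, here is a minimal Python sketch of the same check: count the calls in progress, and complain if a call arrives from a different thread while the count is non-zero. The class and method names are invented for this illustration and are not part of udevwrap; unlike the C snippet it raises an error instead of asserting, and it guards the counter with a lock.

```python
import threading


class ReentrancyChecker:
    """Allow nested calls from one thread, reject concurrent calls from another."""

    def __init__(self):
        self._lock = threading.Lock()
        self._depth = 0      # number of calls currently in progress
        self._owner = None   # thread id of the thread inside the library

    def enter(self):
        me = threading.get_ident()
        with self._lock:
            if self._depth != 0 and self._owner != me:
                raise RuntimeError("library entered from two threads at once")
            self._owner = me
            self._depth += 1

    def exit(self):
        with self._lock:
            self._depth -= 1
            if self._depth == 0:
                self._owner = None


checker = ReentrancyChecker()
checker.enter()
checker.enter()          # nested call from the same thread is fine

errors = []

def other_thread():
    try:
        checker.enter()  # arrives while the main thread is still inside
    except RuntimeError as e:
        errors.append(e)

t = threading.Thread(target=other_thread)
t.start()
t.join()

checker.exit()
checker.exit()
print(len(errors))
```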

Wrapping functions

Now, the ugly part -- libudev exposes 93 different functions, with a wide variety of parameters and return types. I constructed a hacky macro, calls for which could be constructed pretty easily from the prototypes found in libudev.h, and which would construct our stub function:

#define make_func(type, name, formals, actuals)         \
    type name formals {                                 \
        type ret;                                       \
        static void *f;                                 \
        if (!f)                                         \
            f = udev_symbol(__func__);                  \
        udev_enter(__func__);                           \
        ret = ((typeof (&name)) f) actuals;             \
        udev_exit();                                    \
        return ret;                                     \
    }

There are 93 invocations of this macro (or a variant for void functions) which look much like:

make_func(struct udev *,
      (struct udev *udev),

Using udevwrap

To use udevwrap, simply stick the filename of the .so in LD_PRELOAD and run your program normally:

# LD_PRELOAD=/usr/local/lib/ Xorg 

Source code

I stuck udevwrap in my git repository:;a=summary

You can clone it using

$ git git://
August 15, 2016

A Preliminary systemd.conf 2016 Schedule is Now Available!

We have just published a first, preliminary version of the systemd.conf 2016 schedule. There is a small number of white slots in the schedule still, because we're missing confirmation from a small number of presenters. The missing talks will be added in as soon as they are confirmed.

The schedule consists of 5 workshops by high-profile speakers during the workshop day, 22 exciting talks during the main conference days, followed by one full day of hackfests.

Please sign up for the conference soon! Only a limited number of tickets are available, hence make sure to secure yours quickly before they run out! (Last year we sold out.) Please sign up here for the conference!

Last week I mostly worked on getting the upstream work I and others have done into downstream Raspbian (most of that time unfortunately in setting up another Raspbian development environment, after yet another SD card failed).

However, the most exciting thing for most users is that with the merge of the rpi-4.4.y-dsi-stub-squash branch, the DSI display should now come up by default with the open source driver.  This is unfortunately not a full upstreamable DSI driver, because the closed-source firmware is getting in the way of Linux by stealing our interrupts and then talking to the hardware behind our backs.  To work around the firmware, I never talk to the DSI hardware, and we just replace the HVS display plane configuration on the DSI's output pipe.  This means your display backlight is always on and the DSI link is always running, but better that than no display.

I also transferred the wiki I had made for VC4 over to github.  In doing so, I was pleasantly surprised at how much documentation I wanted to write once I got off of the awful wiki software at freedesktop.  You can find more information on VC4 at my mesa and linux trees.

(Side note, wikis on github are interesting.  When you make your fork, you inherit the wiki of whoever you fork from, and you can do PRs back to their wiki similarly to how you would for the main repo.  So my linux tree has Raspberry Pi's wiki too, and I'm wondering if I want to move all of my wiki over to their tree.  I'm not sure.)

Is there anything that people think should be documented for the vc4 project that isn't there?
August 12, 2016

So we have two job openings in the Red Hat desktop team. What we are looking for is people to help us ensure that Fedora and RHEL run great on various desktop hardware, with a focus on laptops. Since these jobs require continuous access to a lot of new and different hardware we can not accept applications from remotees this time, but require you to work out of our office in Munich, Germany. We are looking for people who are not afraid to jump into a lot of different code and who like tinkering with new hardware. The hardware enablement here might include some kernel level work, but will more likely involve improving higher level stacks. So for example if we have a new laptop where bluetooth doesn’t work you would need to investigate and figure out if the problem is in the kernel, in the bluez stack or in our Bluetooth desktop parts.

This will be quite varied work and we expect you to be part of a team which will be looking at anything from driver bugs, battery life issues, implementing new stacks, biometric login and enabling existing features in the kernel or in low level libraries in the user interface.

You can read more about the jobs at the That link lists a Senior Engineer, but we also got a Principal Engineer position open with id 53653, but that one is not on the website as I post this, but should hopefully be very soon.

Also if you happen to be in the Karlsruhe area or at GUADEC this year I will be here until Sunday, so you could come over for a chat. Feel free to email me on if you are interested in meeting up.

August 11, 2016
A couple of weeks ago, I hinted at a presentation that I wanted to do during this year's GUADEC, as a Lightning talk.

Unfortunately, I didn't get a chance to finish the work that I set out to do, encountering a couple of bugs that set me back. Hopefully this will get resolved post-GUADEC, so you can expect some announcements later on in the year.

At least one of the tasks I set out to do worked out, and was promptly obsoleted by a nicer solution. Let's dive in.

How to compile for a different architecture

There are four possible solutions to compile programs for a different architecture:

  • Native compilation: get a machine of that architecture, install your development packages, and compile. This is nice when you have fast machines with plenty of RAM to compile on, usually developer boards, not so good when you target low-power devices.
  • Cross-compilation: install a version of GCC and friends that runs on your machine's architecture, but produces binaries for your target one. This is usually fast, but you won't be able to run the binaries created, so might end up with some data created from a different set of options, and won't be able to run the generated test suite.
  • Virtual Machine: you'd run a virtual machine for the target architecture, install an OS, and build everything. This is slower than cross-compilation, but avoids the problems you'd see in cross-compilation.

The final option is one that's used more and more, mixing the last two solutions: the QEMU user-space emulator.

Using the QEMU user-space emulator

If you want to run just the one command, you'd do something like:

qemu-arm-static myarmbinary

Easy enough, but hardly something you want to try when compiling a whole application, with library dependencies. This is where binfmt support in Linux comes into play. Register the ELF format for your target with that user-space emulator, and you can run myarmbinary without any commands before it.
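
To make the binfmt registration a bit more concrete, here is a hedged sketch that only builds the registration string for 32-bit little-endian ARM ELF binaries; it does not register anything. The magic and mask bytes are derived from the ELF header layout, and the handler name `qemu-arm` and the interpreter path `/usr/bin/qemu-arm-static` are assumptions that depend on your distribution.

```python
def binfmt_register_line(name, magic, mask, interpreter, flags=""):
    """Format a binfmt_misc registration string of type M (match on magic bytes)."""
    def escape(data):
        # Escape every byte as \xNN so no byte collides with the ':' delimiter.
        return "".join("\\x%02x" % byte for byte in data)
    # Layout: :name:type:offset:magic:mask:interpreter:flags (empty offset means 0)
    return ":%s:M::%s:%s:%s:%s" % (name, escape(magic), escape(mask), interpreter, flags)


# Start of a 32-bit little-endian ELF header, up to and including e_machine = 40 (ARM).
ARM_MAGIC = b"\x7fELF" + bytes([1, 1, 1, 0]) + bytes(8) + bytes([2, 0, 40, 0])
# Ignore the OS ABI byte and the low bit of e_type, so that both plain executables
# and position-independent executables match.
ARM_MASK = bytes([0xFF] * 7 + [0x00] + [0xFF] * 8 + [0xFE, 0xFF, 0xFF, 0xFF])

# Assumed handler name and interpreter path; adjust for your system.
line = binfmt_register_line("qemu-arm", ARM_MAGIC, ARM_MASK, "/usr/bin/qemu-arm-static")
print(line)
# As root, this string would be written to /proc/sys/fs/binfmt_misc/register.
```

Most distributions ship a ready-made configuration for this, so you would rarely write the string by hand; the point is only to show what "registering the ELF format" amounts to.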

One thing to note though, is that this won't work as easily if the QEMU user-space emulator and the target executable are built as dynamic executables: QEMU will need to find the libraries for your architecture, usually x86-64, to launch itself, and the emulated binary will also need to find its libraries.

To solve that first problem, there are QEmu static binaries available in a number of distributions (Fedora support is coming). For the second one, the easiest would be if we didn't have to mix native and target libraries on the filesystem, in a chroot, or container for example. Hmm, container you say.

Running QEmu user-space emulator in a container

We have our statically compiled QEmu, and a filesystem with our target binaries, and switched the root filesystem. Well, you try to run anything, and you get a bunch of errors. The problem is that there is a single binfmt configuration for the kernel, whether it's the normal OS, or inside a container or chroot.

The Flatpak hack

This commit for Flatpak works around the problem. The binary for the emulator needs to have the right path, so it can be found within the chroot'ed environment, and it will need to be copied there so it is accessible too, which is what this patch will do for you.

Follow the instructions in the commit, and test it out with this Flatpak script for GNU Hello.

$ TARGET=arm ./
$ ls org.gnu.hello.arm.xdgapp
918k org.gnu.hello.arm.xdgapp

Ready to install on your device!

The proper way

The above solution was built before it looked like the "proper way" was going to find its way into the upstream kernel. This should hopefully land in the upcoming 4.8 kernel.

Instead of launching a separate binary for each non-native invocation, this patchset allows the kernel to keep the binary opened, so it doesn't need to be copied to the container.

In short

With the work being done on Fedora's static QEmu user-space emulators, and the kernel feature that will land, we should be able to have a nice tickbox in Builder to build for any of the targets supported by QEmu.

Get cross-compiling!

Three years after my definitive guide on Python classic, static, class and abstract methods, it seems to be time for a new one. Here, I would like to dissect and discuss Python exceptions.

Dissecting the base exceptions

In Python, the base exception class is named BaseException. Being rarely used in any program or library, it ought to be considered as an implementation detail. But to discover how it's implemented, you can go and read Objects/exceptions.c in the CPython source code. In that file, what is interesting is to see that the BaseException class defines all the basic methods and attributes of exceptions. The basic well-known Exception class is then simply defined as a subclass of BaseException, nothing more:

/*
 *    Exception extends BaseException
 */
SimpleExtendsException(PyExc_BaseException, Exception,
                       "Common base class for all non-exit exceptions.");

The only other exceptions that inherit directly from BaseException are GeneratorExit, SystemExit and KeyboardInterrupt. All the other builtin exceptions inherit from Exception. The whole hierarchy can be seen by running pydoc2 exceptions or pydoc3 builtins.

Here are the graphs representing the builtin exceptions inheritance in Python 2 and Python 3 (generated using this script).

Python 2 builtin exceptions inheritance graph
Python 3 builtin exceptions inheritance graph

The BaseException.__init__ signature is actually BaseException.__init__(*args). This initialization method stores any arguments that are passed in the args attribute of the exception. This can be seen in the exceptions.c source code – and is true for both Python 2 and Python 3:

static int
BaseException_init(PyBaseExceptionObject *self, PyObject *args, PyObject *kwds)
{
    if (!_PyArg_NoKeywords(Py_TYPE(self)->tp_name, kwds))
        return -1;

    Py_XSETREF(self->args, args);

    return 0;
}

The only place where this args attribute is used is in the BaseException.__str__ method. This method uses self.args to convert an exception to a string:

static PyObject *
BaseException_str(PyBaseExceptionObject *self)
{
    switch (PyTuple_GET_SIZE(self->args)) {
    case 0:
        return PyUnicode_FromString("");
    case 1:
        return PyObject_Str(PyTuple_GET_ITEM(self->args, 0));
    default:
        return PyObject_Str(self->args);
    }
}

This can be translated in Python to:

def __str__(self):
    if len(self.args) == 0:
        return ""
    if len(self.args) == 1:
        return str(self.args[0])
    return str(self.args)

Therefore, the message to display for an exception should be passed as the first and the only argument to the BaseException.__init__ method.
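
A quick way to convince yourself of this behaviour is to try the three cases on a Python 3 interpreter. The message text and the number 28 are arbitrary values chosen for this demonstration.

```python
no_args = Exception()
one_arg = Exception("disk full")
two_args = Exception("disk full", 28)

# With no argument the string form is empty, with one it is that argument,
# and with several it is the string form of the whole args tuple.
print(repr(str(no_args)))
print(str(one_arg))
print(str(two_args))
print(two_args.args)

# Keyword arguments are rejected by BaseException.__init__.
try:
    Exception(message="disk full")
except TypeError as e:
    keyword_error = e
    print(type(e).__name__)
```

The two-argument case is the one to avoid: the message a user sees is the tuple's string form rather than a readable sentence.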

Defining your exceptions properly

As you may already know, in Python, exceptions can be raised in any part of the program. The basic exception is called Exception and can be used anywhere in your program. In real life, however, no program or library should ever raise Exception directly: it's not specific enough to be helpful.

Since all exceptions are expected to be derived from the base class Exception, this base class can easily be used as a catch-all:

try:
    do_something()  # placeholder for any code that may raise
except Exception:
    # This will catch any exception!
    print("Something terrible happened")

To define your own exceptions correctly, there are a few rules and best practices that you need to follow:

  • Always inherit from (at least) Exception:
    class MyOwnError(Exception):
        pass

  • Leverage what we saw earlier about BaseException.__str__: it uses the first argument passed to BaseException.__init__ to be printed, so always call BaseException.__init__ with only one argument.

  • When building a library, define a base class inheriting from Exception. It will make it easier for consumers to catch any exception from the library:

    class ShoeError(Exception):
        """Basic exception for errors raised by shoes"""

    class UntiedShoelace(ShoeError):
        """You could fall"""

    class WrongFoot(ShoeError):
        """When you try to wear your left shoe on your right foot"""

    It then makes it easy to use except ShoeError when doing anything with that piece of code related to shoes. For example, Django does not do that for some of its exceptions, making it hard to catch "any exception raised by Django".

  • Provide details about the error. This is extremely valuable to be able to log errors correctly or take further action and try to recover:

    class CarError(Exception):
        """Basic exception for errors raised by cars"""
        def __init__(self, car, msg=None):
            if msg is None:
                # Set some default useful error message
                msg = "An error occurred with car %s" % car
            super(CarError, self).__init__(msg)
            self.car = car

    class CarCrashError(CarError):
        """When you drive too fast"""
        def __init__(self, car, other_car, speed):
            super(CarCrashError, self).__init__(
                car, msg="Car crashed into %s at speed %d" % (other_car, speed))
            self.speed = speed
            self.other_car = other_car

    Then, any code can inspect the exception to take further action:
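    try:
        drive_car(car)  # placeholder for any code that may raise CarCrashError
    except CarCrashError as e:
        # If we crash at high speed, we call emergency
        if e.speed >= 30:
            call_emergency()  # placeholder for the recovery action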

    For example, this is leveraged in Gnocchi to raise specific application exceptions (NoSuchArchivePolicy) on expected foreign key violations raised by SQL constraints:
    try:
        with self.facade.writer() as session:
            ...  # add the new row to the session here
    except exception.DBReferenceError as e:
        if e.constraint == 'fk_metric_ap_name_ap_name':
            raise indexer.NoSuchArchivePolicy(archive_policy_name)
        raise

  • Inherit from builtin exception types when it makes sense. This makes it easier for programs to not be specific to your application or library:
    class CarError(Exception):
        """Basic exception for errors raised by cars"""

    class InvalidColor(CarError, ValueError):
        """Raised when the color for a car is invalid"""

    That allows many programs to catch errors in a more generic way without noticing your own defined type. If a program already knows how to handle a ValueError, it won't need any specific code nor modification.
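
Here is a small sketch of that last point: a caller that only knows about ValueError still handles the library-specific error. The `paint` and `generic_caller` functions and the list of colors are invented for this example.

```python
class CarError(Exception):
    """Basic exception for errors raised by cars"""


class InvalidColor(CarError, ValueError):
    """Raised when the color for a car is invalid"""


def paint(color):
    # Hypothetical library function that validates its input.
    if color not in ("red", "blue"):
        raise InvalidColor("Unsupported color: %s" % color)
    return color


def generic_caller(color):
    # This caller has never heard of CarError or InvalidColor.
    try:
        return paint(color)
    except ValueError as e:
        return "rejected: %s" % e


print(generic_caller("red"))
print(generic_caller("green"))
```

Code that does know about the library can still write `except CarError`, so both audiences are served by the same class.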


There is no limitation on where and when you can define exceptions. As they are, after all, normal classes, they can be defined in any module, function or class – even as closures.

Most libraries package their exceptions into a specific exception module: SQLAlchemy has them in sqlalchemy.exc, requests has them in requests.exceptions, Werkzeug has them in werkzeug.exceptions, etc.

That makes sense for libraries to export exceptions that way, as it makes it very easy for consumers to import their exception module and know where the exceptions are defined when writing code to handle errors.

This is not mandatory, and smaller Python modules might want to keep their exceptions in their sole module. Typically, if your module is small enough to be kept in one file, don't bother splitting your exceptions into a different file/module.

While this wisely applies to libraries, applications tend to be different beasts. Usually, they are composed of different subsystems, where each one might have its own set of exceptions. This is why I generally discourage going with only one exception module in an application, and recommend splitting the exceptions across the different parts of one's program instead. There might be no need for a special myapp.exceptions module.

For example, if your application is composed of an HTTP REST API defined in the module myapp.http and of a TCP server contained in myapp.tcp, it's likely they can both define different exceptions tied to their own protocol errors and life cycle. Defining those exceptions in a myapp.exceptions module would just scatter the code for the sake of some useless consistency. If the exceptions are local to a file, just define them somewhere at the top of that file. It will simplify the maintenance of the code.

Wrapping exceptions

Wrapping exceptions is the practice by which one exception is encapsulated into another:

import requests

class MylibError(Exception):
    """Generic exception for mylib"""
    def __init__(self, msg, original_exception):
        super(MylibError, self).__init__(msg + (": %s" % original_exception))
        self.original_exception = original_exception

try:
    requests.get("http://example.com")
except requests.exceptions.ConnectionError as e:
    raise MylibError("Unable to connect", e)

This makes sense when writing a library which leverages other libraries. If a library uses requests and does not encapsulate requests exceptions into its own defined error classes, it will be a case of layer violation. Any application using your library might receive a requests.exceptions.ConnectionError, which is a problem because:

  1. The application has no clue that the library was using requests and does not need/want to know about it.
  2. The application will have to import requests.exceptions itself and therefore will depend on requests – even if it does not use it directly.
  3. As soon as mylib changes from requests to e.g. httplib2, the application code catching requests exceptions will become irrelevant.

The Tooz library is a good example of wrapping, as it uses a driver-based approach and depends on a lot of different Python modules to talk to different backends (ZooKeeper, PostgreSQL, etcd…). Therefore, it wraps exceptions from other modules on every occasion into its own set of error classes. Python 3 introduced the raise from form to help with that, and that's what Tooz leverages to raise its own error.

It's also possible to encapsulate the original exception into a custom defined exception, as done above. That makes the original exception available for inspection easily.
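
For reference, this is roughly what the Python 3 raise from form looks like. It is a generic sketch, not code taken from Tooz: the `fetch` and `broken_backend` functions are made up, and the builtin ConnectionError stands in for whatever a third-party library would raise.

```python
class MylibError(Exception):
    """Generic exception for mylib"""


def broken_backend():
    # Stand-in for a third-party call that fails.
    raise ConnectionError("connection refused")


def fetch(backend):
    try:
        return backend()
    except ConnectionError as e:
        # "from e" records the original exception as __cause__.
        raise MylibError("Unable to connect") from e


try:
    fetch(broken_backend)
except MylibError as e:
    wrapped = e
    cause = e.__cause__
    print("%s (caused by %r)" % (e, cause))
```

With this form the traceback shows both exceptions, linked by "The above exception was the direct cause of the following exception", without the wrapper class having to store the original itself.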

Catching and logging

When designing exceptions, it's important to remember that they should be targeted both at humans and computers. That's why they should include an explicit message, and embed as much information as possible. That will help to debug and write resilient programs that can pivot their behavior depending on the attributes of the exception, as seen above.

Also, silencing exceptions completely is to be considered as bad practice. You should not write code like this:

try:
    do_something()  # placeholder for any code that may raise
except Exception:
    # Whatever
    pass

Not having any kind of information in a program where an exception occurs is a nightmare to debug.

If you use (and you should) the logging library, you can use the exc_info parameter to log a complete traceback when an exception occurs, which might help debugging on severe and unrecoverable failure:

try:
    do_something()  # placeholder for any code that may raise
except Exception:
    logging.getLogger().error("Something bad happened", exc_info=True)
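
If you want to see what ends up in the log, here is a self-contained sketch that captures the output in memory. The logger name `myapp.demo` and the division by zero are arbitrary choices for the demonstration; `logger.exception(msg)` is the standard shorthand for `logger.error(msg, exc_info=True)`.

```python
import io
import logging

# Send log records to an in-memory buffer so the result can be inspected.
stream = io.StringIO()
logger = logging.getLogger("myapp.demo")
logger.setLevel(logging.ERROR)
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False

try:
    1 / 0
except Exception:
    # Logs the message at ERROR level followed by the full traceback.
    logger.exception("Something bad happened")

output = stream.getvalue()
print(output)
```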

Further reading

If you understood everything so far, congratulations, you might be ready to handle exceptions in Python! If you want to have a broader scope on exceptions and what Python misses, I encourage you to read about condition systems and discover the generalization of exceptions – that I hope we'll see in Python one day!

I hope this will help you build better libraries and applications. Feel free to ask any question in the comment section!

August 09, 2016
At the bottom of the release notes for GNOME 3.20, you might have seen the line:
If you plug in an audio device (such as a headset, headphones or microphone) and it cannot be identified, you will now be asked what kind of device it is. This addresses an issue that prevented headsets and microphones being used on many Dell computers.
Before I start explaining what this does, as a picture is worth a thousand words:

This selection dialogue is one you will get on some laptops and desktop machines when the hardware is not able to detect whether the plugged in device is headphones, a microphone, or a combination of both, probably because it doesn't have an impedance detection circuit to figure that out.

This functionality was integrated into Unity's gnome-settings-daemon version a couple of years ago, written by David Henningsson.

The code that existed for this functionality was completely independent, not using any of the facilities available in the media-keys plugin to handle volume keys, and it could probably have been split as an external binary with very little effort.

After a bit of to and fro, most of the sound backend functionality was merged into libgnome-volume-control, leaving just 2 entry points, one to signal that something was plugged into the jack, and another to select which type of device was plugged in, in response to the user selection. This means that the functionality should be easily implementable in other desktop environments that use libgnome-volume-control to interact with PulseAudio.

Many thanks to David Henningsson for the original code, and his help integrating the functionality into GNOME, Bednet for providing hardware to test and maintain this functionality, and Allan, Florian and Rui for working on the UI notification part of the functionality, and wiring it all up after I abandoned them to go on holidays ;)
August 08, 2016
Last week's project for vc4 was to take a look at memory usage.  Eben had expressed concern that the new driver stack would use more memory than the closed stack, and so I figured I would spend a little while working on that.

I first pulled out valgrind's massif tool on piglit's glsl-algebraic-add-add-1.shader_test.  This works as a minimum "how much memory does it take to render *anything* with this driver?" test.  We were consuming 1605k of heap at the peak, and there were some obvious fixes to be made.

First, the gallium state_tracker was allocating 659KB of space at context creation so that it could bytecode-interpret TGSI if needed for glRasterPos() and glRenderMode(GL_FEEDBACK).  Given that nobody should ever use those features, and luckily they rarely do, I delayed the allocation of the somewhat misleadingly-named "draw" context until the fallbacks were needed.

Second, Mesa was allocating the memory for the GL 1.x matrix stacks up front at context creation.  We advertise 32 matrices for modelview/projection, 10 per texture unit (32 of those), and 4 for programs.  I instead implemented a typical doubling array reallocation scheme for storing the matrices, so that only the top matrix per stack is allocated at context creation.  This saved 63KB of dirty memory per context.
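
The doubling scheme is easy to picture with a toy model. This Python sketch is not the Mesa code (that is C and deals with real 4x4 float matrices); the class name, the explicit capacity tracking and the identity matrix constant are just there to illustrate the allocation pattern.

```python
IDENTITY = (1.0, 0.0, 0.0, 0.0,
            0.0, 1.0, 0.0, 0.0,
            0.0, 0.0, 1.0, 0.0,
            0.0, 0.0, 0.0, 1.0)


class MatrixStack:
    """Matrix stack that starts with one slot and doubles its storage on demand."""

    def __init__(self, max_depth):
        self.max_depth = max_depth          # advertised limit, e.g. 32
        self.storage = [list(IDENTITY)]     # only the top matrix exists up front
        self.depth = 0                      # index of the current top matrix

    def push(self):
        if self.depth + 1 >= self.max_depth:
            raise OverflowError("matrix stack overflow")
        if self.depth + 1 >= len(self.storage):
            # Out of slots: double the capacity, capped at the advertised limit.
            new_size = min(len(self.storage) * 2, self.max_depth)
            self.storage.extend([None] * (new_size - len(self.storage)))
        # A push duplicates the current top matrix.
        self.storage[self.depth + 1] = list(self.storage[self.depth])
        self.depth += 1

    def pop(self):
        if self.depth == 0:
            raise IndexError("matrix stack underflow")
        self.depth -= 1


stack = MatrixStack(max_depth=32)
capacities = [len(stack.storage)]
for _ in range(4):
    stack.push()
    capacities.append(len(stack.storage))
print(capacities)
```

An application that never pushes a matrix pays for one slot per stack instead of the full advertised depth, which is where the savings come from.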

722KB for these two fixes may not seem like a whole lot of memory to readers on fancy desktop hardware with 8GB of RAM, but the Raspberry Pi has only 1GB of RAM, and when you exhaust that you're swapping to an SD card.  You should also expect a desktop to have several GL contexts created: the X Server uses one to do its rendering, you have a GL-based compositor with its own context, and your web browser and LibreOffice may each have one or more.  Additionally, trying to complete our piglit testsuite on the Raspberry Pi is currently taking me 6.5 hours (when it even succeeds and doesn't see piglit's python runner get shot by the OOM killer), so I could use any help I can get in reducing context initialization time.

However, malloc()-based memory isn't all that's involved.  The GPU buffer objects that get allocated don't get counted by massif in my analysis above.  To try to approximately fix this, I added in valgrind macro calls to mark the mmap()ed space in a buffer object as being a malloc-like operation until the point that the BO is freed.  This doesn't get at allocations for things like the window-system renderbuffers or the kernel's overflow BO (valgrind requires that you have a pointer involved to report it to massif), but it does help.

Once I had massif reporting more, I noticed that glmark2 -b terrain was allocating a *lot* of memory for shader BOs.  Going through them, an obvious problem was that we were generating a lot of shaders for glGenerateMipmap().  A few weeks ago I improved performance on the benchmark by fixing glGenerateMipmap()'s fallback blits that we were doing because vc4 doesn't support the GL_TEXTURE_BASE_LEVEL that the gallium aux code uses.  I had fixed the fallback by making the shader do an explicit-LOD lookup of the base level if the GL_TEXTURE_BASE_LEVEL==GL_TEXTURE_MAX_LEVEL.  However, in the process I made the shader depend on that base level, so we would compile a new shader variant per level of the texture.  The fix was to make the base level into a uniform value that's uploaded per draw call, and with that change I dropped 572 shader variants from my shader-db results.
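
A toy model of why this matters: if the base level is part of the shader cache key, every mipmap level compiles its own variant, while passing it as a uniform lets all levels share one. The cache class, the key layout and the 10-level texture below are invented for illustration and have nothing to do with the real vc4 data structures.

```python
class ShaderCache:
    """Compile-on-miss cache; each distinct key costs one compiled variant."""

    def __init__(self):
        self.variants = {}

    def get(self, key):
        if key not in self.variants:
            self.variants[key] = "compiled<%r>" % (key,)  # stand-in for real compilation
        return self.variants[key]


levels = range(10)  # a hypothetical texture with 10 mipmap levels

# Before: the base level is baked into the shader, so it is part of the key.
baked = ShaderCache()
for level in levels:
    baked.get(("mipmap_blit", level))

# After: one shader, with the base level uploaded as a per-draw uniform.
shared = ShaderCache()
for level in levels:
    shader = shared.get(("mipmap_blit",))
    uniforms = {"base_level": level}

print(len(baked.variants), len(shared.variants))
```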

Reducing extra shaders was fun, so I set off on another project I had thought of before.  VC4's vertex shader to fragment shader IO system is a bit unusual in that it's just a FIFO of floats (effectively), with none of these silly "vec4"s that permeate GLSL.  Since I can take my inputs in any order, and more flexibility in the FS means avoiding register allocation failures sometimes, I have the FS compiler tell the VS what order it would like its inputs in.  However, the list of all the inputs in their arbitrary orders would be expensive to hash at draw time, so I had just been using the identity of the compiled fragment shader variant in the VS and CS's key to decide when to recompile it in case output order changed.  The trick was that, while the set of all possible orders is huge, the number that any particular application will use is quite small.  I take the FS's input order array, keep it in a set, and use the pointer to the data in the set as the key.  This cut 712 shaders from shader-db.
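
The trick is essentially interning. A sketch of the idea in Python, with made-up names and example orders (the real code is C, where the key is the pointer into the set):

```python
class InputOrderSet:
    """Keep one canonical copy of each distinct input order."""

    def __init__(self):
        self._interned = {}

    def intern(self, order):
        order = tuple(order)
        # Return the copy already stored if an equal order was seen before.
        return self._interned.setdefault(order, order)


orders = InputOrderSet()

# Two fragment shader variants that happen to want the same input order...
a = orders.intern([2, 0, 1, 3])
b = orders.intern([2, 0, 1, 3])
# ...and one that wants a different order.
c = orders.intern([0, 1, 2, 3])

# The vertex shader key only needs the identity of the canonical object,
# which is cheap to compare at draw time regardless of the array length.
key_a = ("vs", id(a))
key_b = ("vs", id(b))
key_c = ("vs", id(c))
print(key_a == key_b, key_a == key_c)
```

The full array is only hashed when a fragment shader is compiled; at draw time two fragment shaders with equal input orders produce the same vertex shader key, so no recompile is triggered.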

Also, embarrassingly, when I mentioned tracking the FS in the CS's key above?  Coordinate shaders don't output anything to the fragment shader.  Like the name says, they just generate coordinates, which get consumed by the binner.  So, by removing the FS from the CS key, I trivially cut 754 shaders from shader-db.  Between the two, piglit's gl-1.0-blend-func test now passes instead of OOMing, so we get test coverage on blending.

Relatedly, while working on fixing a kernel oops recently, I had noticed that we were still reallocating the overflow BO on every draw call.  This was old debug code from when I was first figuring out how overflow worked.  Since each client can have up to 5 outstanding jobs (limited by Mesa) and each job was allocating a 256KB BO, we could be saving a MB or so per client assuming they weren't using much of their overflow (likely true for the X Server).  The solution, now that I understand the overflow system better, was just to not reallocate and let the new job fill out the previous overflow area.
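As a quick check of the "a MB or so" figure, here is the arithmetic in Python. The assumption that exactly one overflow BO stays around after the fix is mine, for the sake of the calculation.

```python
MAX_OUTSTANDING_JOBS = 5            # per-client limit from the text
OVERFLOW_BO_BYTES = 256 * 1024      # one overflow BO per job

# Before: every outstanding job carried its own freshly allocated overflow BO.
before = MAX_OUTSTANDING_JOBS * OVERFLOW_BO_BYTES

# After (assumption): a single overflow area is reused by the next job.
after = OVERFLOW_BO_BYTES

saved_mb = (before - after) / (1024 * 1024)
print(saved_mb)
```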

Other projects for the week that I won't expand on here: Debugging GPU hang in piglit glsl-routing (generated fixes for vc4-gpu-tools parser, tried writing a GFXH30 workaround patch, still not fixed) and working on supporting direct GLSL IR to NIR translation (lots of cleanups, a couple fixes, patches on the Mesa list).
August 05, 2016

A common issue with users typing on a laptop is that the user's palms will inadvertently get in contact with the touchpad at some point, causing the cursor to move and/or click. In the best case it's annoying, in the worst case you're now typing your password into the newly focused twitter application. While this provides some general entertainment and thus makes the world a better place for a short while, here at the libinput HQ [1] we strive to keep life as boring as possible and avoid those situations.

The best way to avoid accidental input is to detect palm touches and simply ignore them. That works ok-ish on some touchpads and fails badly on others. Lots of hardware is barely able to provide an accurate touch location, let alone enough information to decide whether a touch is a palm. libinput's palm detection largely works by using areas on the touchpad that are likely to be touched by the palms.

The second-best way to avoid accidental input is to disable the touchpad while a user is typing. The libinput marketing department [2] has decided to name this feature "disable-while-typing" (DWT) and it's been in libinput for quite a while. In this post I'll describe how exactly DWT works in libinput.

Back in the olden days of roughly two years ago we all used the synaptics X.Org driver and were happy with it [3]. Disable-while-typing was featured there through the use of a tool called syndaemon. This synaptics daemon [4] has two modes. One was to poll the keyboard state every few milliseconds and check whether a key was down. If so, syndaemon sends a command to the driver to tell it to disable itself. After a timeout when the keyboard state is neutral again syndaemon tells the driver to re-enable itself. This causes a lot of wakeups, especially during those 95% of the time when the user isn't actually typing. Or missed keys if the press + release occurs between two polls. Hence the second mode, using the RECORD extension, where syndaemon opens a second connection to the X server and checks for key events [5]. If it sees one float past, it tells the driver to disable itself, and so on and so forth. Either way, you had a separate process that did that job. syndaemon had a couple of extra options and features that I'm not going to discuss here, but we've replicated the useful ones in libinput.

libinput has no external process, DWT is integrated into the library with a couple of smart extra features. This is made easier by libinput controlling all the devices, so all keyboard events are passed around internally to the touchpad backend. That backend then decides whether it should stop sending events. And this is where the interesting bits come in.

First, we have different timeouts: if you only hit a single key, the touchpad will re-enable itself quicker than after a period of typing. So if you use the touchpad, hit a key to trigger some UI the pointer only stops moving for a very short time. But once you type, the touchpad disables itself longer. Since your hand is now in a position over the keyboard, moving back to the touchpad takes time anyway so a longer timeout doesn't hurt. And as typing is interrupted by pauses, a longer timeout bridges over those to avoid accidental movement of the cursor.

Second, any touch started while you were typing is permanently ignored, so it's safe to rest the palm on the touchpad while typing and leave it there. But we keep track of the start time of each touch so any touch started after the last key event will work normally once the DWT timeout expires. You may feel a short delay but it should be well in the acceptable range of tens of ms.
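The two behaviours described so far (short vs. long timeout, and per-touch start tracking) can be modelled with a small state machine. This is a sketch only: the class, the method names and especially the timeout values are assumptions for illustration, not libinput's real implementation or constants.

```python
# Timeout values are illustrative assumptions, not libinput's real constants.
SHORT_TIMEOUT_MS = 200
LONG_TIMEOUT_MS = 500


class DisableWhileTyping:
    def __init__(self):
        self.disabled_until = 0   # time in ms at which the touchpad re-enables
        self.touches = {}         # touch id -> {"during_dwt": bool, "palm": bool}

    def active(self, now):
        return now < self.disabled_until

    def key_press(self, now):
        # A key arriving while DWT is already active means the user is typing,
        # so the longer timeout applies.  A lone key gets the short one.
        timeout = LONG_TIMEOUT_MS if self.active(now) else SHORT_TIMEOUT_MS
        self.disabled_until = now + timeout
        # Touches that started during typing and are followed by more key
        # events are treated as palms and ignored for good.
        for touch in self.touches.values():
            if touch["during_dwt"]:
                touch["palm"] = True

    def touch_begin(self, now, touch_id):
        self.touches[touch_id] = {"during_dwt": self.active(now), "palm": False}

    def touch_end(self, touch_id):
        self.touches.pop(touch_id, None)

    def touch_delivers_events(self, now, touch_id):
        touch = self.touches[touch_id]
        return not touch["palm"] and not self.active(now)


dwt = DisableWhileTyping()
dwt.key_press(0)                 # single key: short timeout
dwt.key_press(100)               # second key while active: long timeout
dwt.touch_begin(150, "palm")     # resting palm, starts during typing
dwt.key_press(300)               # more typing: the palm touch is now ignored for good
dwt.touch_begin(400, "finger")   # starts after the last key event

palm_after_timeout = dwt.touch_delivers_events(900, "palm")
finger_during_timeout = dwt.touch_delivers_events(500, "finger")
finger_after_timeout = dwt.touch_delivers_events(900, "finger")
```

In this model the palm never produces events, while the finger that landed after the last key starts working as soon as the timeout runs out.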

Third, libinput is smart enough to detect which keyboard to pair with. If you have an external touchpad like the Apple Magic Trackpad or a Logitech T650, DWT will never enable on those. Likewise, typing on an external keyboard won't disable the internal touchpad. And in the rare case of two internal touchpads [6], both of them will do the right thing. As of systemd v231 the information of whether a touchpad is internal or external is available in the ID_INPUT_TOUCHPAD_INTEGRATION udev tag and thus available to everyone, not just libinput.
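Since the udev tag is available to everyone, other tools can make the same internal/external distinction. Below is a minimal Python sketch that parses KEY=value property lines, the format printed by `udevadm info --query=property`. The sample property text is made up for illustration, and `dwt_applies` is a hypothetical helper that greatly simplifies libinput's actual pairing logic.

```python
def parse_udev_properties(text):
    """Parse KEY=value lines into a dict, skipping anything else."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def dwt_applies(props):
    # Simplification: only touchpads tagged as internal take part in DWT.
    return props.get("ID_INPUT_TOUCHPAD_INTEGRATION") == "internal"


# Made-up sample properties for two devices.
laptop_touchpad = parse_udev_properties(
    "ID_INPUT=1\nID_INPUT_TOUCHPAD=1\nID_INPUT_TOUCHPAD_INTEGRATION=internal\n"
)
external_trackpad = parse_udev_properties(
    "ID_INPUT=1\nID_INPUT_TOUCHPAD=1\nID_INPUT_TOUCHPAD_INTEGRATION=external\n"
)

print(dwt_applies(laptop_touchpad), dwt_applies(external_trackpad))
```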

Finally, modifier keys are ignored for DWT, so using the touchpad to do shift-clicks works unimpeded. This also goes for the F-Key row and the numpad if you have any. These keys are usually out of the range of the touchpad anyway so interference is not an issue here. As of today, modifier key combos work too. So hitting Ctrl+S to save a document won't disable the touchpad (or any other modifiers + key combination). But once you are typing DWT activates and if you now type Shift+S to type the letter 'S' the touchpad remains disabled.

So in summary: what we've gained from switching to libinput is one external process less that causes wakeups and the ability to be a lot smarter about when we disable the touchpad. Coincidentally, libinput has similar code to avoid touchpad interference when the trackpoint is in use.

[1] that would be me
[2] also me
[3] uphill, both ways, snow, etc.
[4] nope. this one wasn't my fault
[5] Yes, syndaemon is effectively a keylogger, except it doesn't do any of the "logging" bit a keylogger would be expected to do to live up to its name
[6] This currently happens on some Dell laptops using hid-i2c. We get two devices, one named "DLL0704:01 06CB:76AE Touchpad" or similar and one "SynPS/2 Synaptics TouchPad". The latter one will never send events unless hid-i2c is disabled in the kernel

August 01, 2016
This weekend I landed a patchset in mesa to add support for resource shadowing and batch re-ordering in freedreno.  What this is, will take a bit of explaining, but the tl;dr: is a nice fps boost in many games/apps.

But first, a bit of background about tiling gpu's:  the basic idea of a tiler is to render N draw calls a tile at a time, with a tile's worth of the "framebuffer state" (ie. each of the MRT color bufs + depth/stencil) resident in an internal tile buffer.  The idea is that most of your memory traffic is to/from your color and z/s buffers.  So rather than rendering each of your draw calls in its entirety, you split the screen up into tiles and repeat each of the N draws for each tile to fast internal/on-chip memory.  This avoids going back to main memory for each of the color and z/s buffer accesses, and enables a tiler to do more with less memory bandwidth.  But it means there is never a single point in the sequence of draws.. ie. draw #1 for tile #2 could happen after draw #2 for tile #1.  (Also, that is why GL_TIMESTAMP queries are bonkers for tilers.)
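The ordering difference is easy to see if you write the two loop nests side by side. A small Python sketch (the function names and the draw/tile labels are just for illustration):

```python
def immediate_order(draws, tiles):
    """Immediate-mode renderer: each draw completes across the whole screen."""
    return [(draw, tile) for draw in draws for tile in tiles]


def tiled_order(draws, tiles):
    """Tiler: all draws are replayed for one tile before moving to the next."""
    return [(draw, tile) for tile in tiles for draw in draws]


draws = ["draw1", "draw2"]
tiles = ["tile1", "tile2"]

order = tiled_order(draws, tiles)
# In the tiled order, draw1 on tile2 executes after draw2 on tile1.
draw1_tile2_is_later = order.index(("draw1", "tile2")) > order.index(("draw2", "tile1"))
print(order)
```

Swapping the loop nest is the whole trick, and it is also why there is no single moment at which "draw #1 has finished" in a tiler.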

For purpose of discussion (and also how things are named in the code, if you look), I will define a tile-pass, ie. rendering of N draws for each tile in succession (or even if multiple tiles are rendered in parallel) as a "batch".

Unfortunately, many games/apps are not written with tilers in mind.  There are a handful of common anti-patterns which force a driver for a tiling gpu to flush the current batch.  Examples are unnecessary FBO switches, and texture or UBO uploads mid-batch.

For example, with a 1920x1080 r8g8b8a8 render target, with z24s8 depth/stencil buffer, an unnecessary batch flush costs you 16MB of write memory bandwidth, plus another 16MB of read when we later need to pull the data back into the tile buffer.  That number can easily get much bigger with games using float16 or float32 (rather than 8 bits per component) intermediate buffers, and/or multiple render targets.  Ie. two MRT's with float16 internal-format plus z24s8 z/s would be 40MB write + 40MB read per extra flush.
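Those numbers are simple to reproduce. In the Python sketch below I assume that "float16" means a four-component RGBA16F buffer (8 bytes per pixel) and that z24s8 takes 4 bytes per pixel; the helper name is hypothetical.

```python
def flush_cost_mb(width, height, bytes_per_pixel_per_buffer):
    """Bytes written back to memory by one batch flush, in MB (10^6 bytes)."""
    pixels = width * height
    return pixels * sum(bytes_per_pixel_per_buffer) / 1e6


# r8g8b8a8 color (4 bytes) + z24s8 depth/stencil (4 bytes)
simple = flush_cost_mb(1920, 1080, [4, 4])

# two RGBA16F MRTs (8 bytes each, an assumption) + z24s8 (4 bytes)
mrt_fp16 = flush_cost_mb(1920, 1080, [8, 8, 4])

print(simple, mrt_fp16)
```

That gives roughly 16.6MB and 41.5MB per direction, in line with the rounded 16MB and 40MB figures above, and the same again for the read when the data is pulled back into the tile buffer.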

So, take the example of a UBO update, at a point where you are not otherwise needing to flush the batch (ie. swapbuffers or FBO switch).  A straightforward gl driver for a tiler would need to flush the current batch, so each of the draws before the UBO update would see the old state, and each of the draws after the UBO update would see the new state.

Enter resource shadowing and batch reordering.  Two reasonably big (ie. touches a lot of the code) changes in the driver which combine to avoid these extra batch flushes, as much as possible.

Resource shadowing is allocating a new backing GEM buffer object (BO) for the resource (texture/UBO/VBO/etc), and if necessary copying parts of the BO contents to the new buffer (back-blit).

So for the example of the UBO update, rather than taking the 16MB+16MB (or more) hit of a tile flush, why not just create two versions of the UBO.  It might involve copying a few KB's of UBO (ie. whatever was not overwritten by the game), but that is a lot less than 32MB?
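Here is a toy model of the idea in Python, with a bytearray standing in for a GEM buffer object. The `Resource` class, the `pending_readers` counter and `write_range()` are hypothetical names for this sketch and do not mirror the freedreno code.

```python
class Resource:
    def __init__(self, size):
        self.bo = bytearray(size)    # stands in for the backing GEM BO
        self.pending_readers = 0     # queued batches that still read self.bo


def write_range(rsc, offset, data, invalidate_whole=False):
    """Update part of a resource without flushing batches that still read it."""
    if rsc.pending_readers:
        old_bo = rsc.bo
        new_bo = bytearray(len(old_bo))
        if not invalidate_whole:
            # Back-blit: copy only what the app is not about to overwrite.
            end = offset + len(data)
            new_bo[:offset] = old_bo[:offset]
            new_bo[end:] = old_bo[end:]
        # Queued batches keep their reference to old_bo; new draws see new_bo.
        rsc.bo = new_bo
        rsc.pending_readers = 0
    rsc.bo[offset:offset + len(data)] = data
    return rsc.bo


ubo = Resource(8)
ubo.bo[:] = b"AAAAAAAA"
batch_bo = ubo.bo            # a queued batch holds on to the current BO
ubo.pending_readers = 1

write_range(ubo, 2, b"BB")   # mid-batch UBO update, no flush needed

old_contents = bytes(batch_bo)
new_contents = bytes(ubo.bo)
```

The queued batch still sees the old contents, later draws see the updated ones, and the only cost was copying the bytes that were not overwritten.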

But of course, it is not that simple.  Was the buffer or texture level mapped with GL_MAP_INVALIDATE_BUFFER_BIT or GL_MAP_INVALIDATE_RANGE_BIT?  (Or GL API that implies the equivalent, although fortunately as a gallium driver we don't have to care so much about all the various different GL paths that amount to the same thing for the hw.)  For a texture with mipmap levels, we unfortunately don't know at the time where we need to create the new shadow BO whether the next GL calls will glGenerateMipmap() or upload the remaining mipmap levels.  So there is a bit of complexity in handling all the cases properly.  There may be a few more cases we could handle without falling back to flushing the current batch, but for now we handle all the common cases.

The batch re-ordering component of this allows any potential back-blits from the shadow'd BO to the new BO (when resource shadowing kicks in), to be split out into a separate batch.  The resource/dependency tracking between batches and resources (ie. if various batches need to read from a given resource, we need to know that so they can be executed before something writes to the resource) lets us know which order to flush various in-flight batches to achieve correct results.  Note that this is partly because we use util_blitter, which turns any internally generated resource-shadowing back-blits into normal draw calls (since we don't have a dedicated blit pipe).. but this approach also handles the unnecessary FBO switch case for free.
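The dependency rule (readers of a resource must execute before something writes to it) can be sketched as a small graph walk. This Python model is an assumption-laden illustration: the batch names, the dict layout and `flush_order()` are made up, and the real driver tracks considerably more state.

```python
def flush_order(target, batches):
    """Return the order in which batches must be flushed so `target` can run.

    `batches` maps a batch name to {"reads": set, "writes": set}.  Any other
    batch that reads a resource written by a batch is flushed before it.
    """
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for other, info in batches.items():
            if other != name and info["reads"] & batches[name]["writes"]:
                visit(other)
        order.append(name)

    visit(target)
    return order


batches = {
    "scene":  {"reads": {"ubo_v1"}, "writes": {"fb"}},
    "blit":   {"reads": {"ubo_v1"}, "writes": {"ubo_v2"}},
    "upload": {"reads": set(),      "writes": {"ubo_v1"}},
}

order = flush_order("upload", batches)
print(order)
```

Here both the scene batch and the back-blit batch read the old UBO, so they are scheduled ahead of whatever overwrites it.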

Unfortunately, the batch re-ordering required a bit of an overhaul of how cmdstream buffers are handled, which required changes in all layers of the stack (mesa + libdrm + kernel).  The kernel changes are in drm-next for 4.8 and libdrm parts are in the latest libdrm release.  And while things will continue to work with a new userspace and old kernel, all these new optimizations will be disabled.

(And, while there is a growing number of snapdragon/adreno SBC's and phones/tablets getting upstream attention, if you are stuck on a downstream 3.10 kernel, look here.)

And for now, even with a new enough kernel, reorder support is not enabled by default.  There are a couple more piglit tests remaining to investigate, but I'll probably flip it to be enabled by default (if you have a new enough kernel) before the next mesa release branch.  Until then, use FD_MESA_DEBUG=reorder (and once the default is switched, that would be FD_MESA_DEBUG=noreorder to disable).

I'll cover the implementation and tricks to keep the CPU overhead of all this extra bookkeeping small later (probably at XDC2016), since this post is already getting rather long.  But the juicy bits: ~30% gain in supertuxkart (new render engine) and ~20% gain in manhattan are the big winners.  In general at least a few percent gain in most things I looked at, generally in the 5-10% range.

July 27, 2016

Please note that the systemd.conf 2016 Call for Participation ends on Monday, on Aug. 1st! Please send in your talk proposal by then! We’ve already got a good number of excellent submissions, but we are very interested in yours, too!

We are looking for talks on all facets of systemd: deployment, maintenance, administration, development. Regardless of whether you use it in the cloud, on embedded, on IoT, on the desktop, on mobile, in a container or on the server: we are interested in your submissions!

In addition to proposals for talks for the main conference, we are looking for proposals for workshop sessions held during our Workshop Day (the first day of the conference). The workshop format consists of a day of 2-3h training sessions, that may cover any systemd-related topic you'd like. We are both interested in submissions from the developer community as well as submissions from organizations making use of systemd! Introductory workshop sessions are particularly welcome, as the Workshop Day is intended to open up our conference to newcomers and people who aren't systemd gurus yet, but would like to become more fluent.

For further details on the submissions we are looking for and the CfP process, please consult the CfP page and submit your proposal using the provided form!

ALSO: Please sign up for the conference soon! Only a limited number of tickets are available, hence make sure to secure yours quickly before they run out! (Last year we sold out.) Please sign up here for the conference!

AND OF COURSE: We are also looking for more sponsors for systemd.conf! If you are working on systemd-related projects, or make use of it in your company, please consider becoming a sponsor of systemd.conf 2016! Without our sponsors we couldn't organize systemd.conf 2016!

Thank you very much, and see you in Berlin!

July 20, 2016

Don't panic. Of course it isn't. Stop typing that angry letter to the editor and read on. I just picked that title because it's clickbait and these days that's all that matters, right?

With the release of libinput 1.4 and the newest feature to add tablet pad mode switching, we've now finished the TODO list we had when libinput was first conceived. Let's see what we have in libinput right now:

  • keyboard support (actually quite boring)
  • touchscreen support (actually quite boring too)
  • support for mice, including middle button emulation where needed
  • support for trackballs including the ability to use them rotated and to use button-based scrolling
  • touchpad support, most notably:
    • proper multitouch support on touchpads [1]
    • two-finger scrolling and edge scrolling
    • tapping, tap-to-drag and drag-lock (all configurable)
    • pinch and swipe gestures
    • built-in palm and thumb detection
    • smart disable-while-typing without the need for an external process like syndaemon
    • more predictable touchpad behaviours because everything is based on physical units [2]
    • a proper API to allow for kinetic scrolling on a per-widget basis
  • tracksticks work with middle button scrolling and communicate with the touchpad where needed
  • tablet support, most notably:
    • each tool is a separate entity with its own capabilities
    • the pad itself is a separate entity with its own capabilities and events
    • mode switching is exported by the libinput API and should work consistently across callers
  • a way to identify if multiple kernel devices belong to the same physical device (libinput device groups)
  • a reliable test suite
  • Documentation!
The side-effect of libinput is that we are also trying to fix the rest of the stack where appropriate. Mostly this meant pushing stuff into systemd/udev so far, with the odd kernel fix as well. Specifically the udev bits mean we
  • know the DPI density of a mouse
  • know whether a touchpad is internal or external
  • fix up incorrect axis ranges on absolute devices (mostly touchpads)
  • try to set the trackstick sensitivity to something sensible
  • know when the wheel click is less/more than the default 15 degrees
And of course, the whole point of libinput is that it can be used from any Wayland compositor and take away most of the effort of implementing an input stack. GNOME, KDE and enlightenment already use libinput, and so does Canonical's Mir. And some distributions use libinput as the default driver in X through xf86-input-libinput (Fedora 22 was the first to do this). So overall libinput is already quite a success.

The hard work doesn't stop of course, there are still plenty of areas where we need to be better. And of course, new features come as HW manufacturers bring out new hardware. I already have touch arbitration on my todo list. But it's nice to wave at this big milestone as we pass it on the way to the glorious future of perfect, bug-free input. At this point, I'd like to extend my thanks to all our contributors: Andreas Pokorny, Benjamin Tissoires, Caibin Chen, Carlos Garnacho, Carlos Olmedo Escobar, David Herrmann, Derek Foreman, Eric Engestrom, Friedrich Schöller, Gilles Dartiguelongue, Hans de Goede, Jackie Huang, Jan Alexander Steffens (heftig), Jan Engelhardt, Jason Gerecke, Jasper St. Pierre, Jon A. Cruz, Jonas Ådahl, JoonCheol Park, Kristian Høgsberg, Krzysztof A. Sobiecki, Marek Chalupa, Olivier Blin, Olivier Fourdan, Peter Frühberger, Peter Hutterer, Peter Korsgaard, Stephen Chandler Paul, Thomas Hindoe Paaboel Andersen, Tomi Leppänen, U. Artie Eoff, Velimir Lisec.

Finally: libinput was started by Jonas Ådahl in late 2013, so it's already over 2.5 years old. And the git log shows we're approaching 2000 commits and a simple LOCC says over 60000 lines of code. I would also like to point out that the vast majority of commits were done by Red Hat employees, I've been working on it pretty much full-time since 2014 [3]. libinput is another example of Red Hat putting money, time and effort into the less press-worthy plumbing layers that keep our systems running. [4]

[1] Ironically, that's also the biggest cause of bugs because touchpads are terrible. synaptics still only does single-finger with a bit of icing and on bad touchpads that often papers over hardware issues. We now do that in libinput for affected hardware too.
[2] The synaptics driver uses absolute numbers, mostly based on the axis ranges for Synaptics touchpads making them unpredictable or at least different on other touchpads.
[3] Coincidentally, if you see someone suggesting that input is easy and you can "just do $foo", their assumptions may not match reality
[4] No, Red Hat did not require me to add this. I can pretty much write what I want in this blog and these opinions are my own anyway and don't necessarily reflect Red Hat yadi yadi ya. The fact that I felt I had to add this footnote to counteract whatever wild conspiracy comes up next is depressing enough.

July 19, 2016
(email sent to mesa-devel list).

I was waiting for an open source driver to appear when I realised I should really just write one myself, some talking with Bas later, and we decided to see where we could get.

This is the point at which we were willing to show it to others, it's not really a vulkan driver yet, so far it's a vulkan triangle demos driver.

It renders the tri and cube demos from the vulkan loader,
and the triangle demo from Sascha Willems demos
and the Vulkan CTS smoke tests (all 4 of them one of which draws a triangle).

There is a lot of work to do, and it's at the stage where we are seeing if anyone else wants to join in at the start, before we make too many serious design decisions or take a path we really don't want to.

So far it's only been run on Tonga and Fiji chips I think, we are hoping to support radeon kernel driver for SI/CIK at some point, but I think we need to get things a bit further on VI chips first.

The code is currently here:

There is a not-interesting branch which contains all the pre-history which might be useful for someone else bringing up a vulkan driver on other hardware.

The code is pretty much based on the Intel anv driver, with the winsys ported from gallium driver,
and most of the state setup from there. Bas wrote the code to connect NIR<->LLVM IR so we could reuse it in the future for SPIR-V in GL if required. It also copies AMD addrlib over, (this should be shared).

Also we don't do SPIR-V->LLVM direct. We use NIR as it has the best chance for inter shader stage optimisations (vertex/fragment combined) which neither SPIR-V nor LLVM handles for us, (nir doesn't do it yet but it can).

If you want to submit bug reports, they will only be taken seriously if accompanied by working patches at this stage, and we've no plans to merge to master yet, but open to discussion on when we could do that and what would be required.

I will be presenting a lightning talk during this year's GUADEC, and running a contest related to what I will be presenting.


To enter the contest, you will need to create a Flatpak for a piece of software that hasn't been flatpak'ed up to now (application, runtime or extension), hosted in a public repository.

You will have to send me an email about the location of that repository.

I will choose a winner amongst the participants, on the eve of the lightning talks, depending on, but not limited to, the difficulty of packaging, the popularity of the software packaged and its redistributability potential.

You can find plenty of examples (and a list of already packaged applications and runtimes) on this Wiki page.


A piece of hardware that you can use to replicate my presentation (or to replicate my attempts at a presentation, depending ;). You will need to be present during my presentation at GUADEC to claim your prize.

Good luck to one and all!

I finally unlazied and moved my blog away from the Google mothership to something simple, fast and statically generated. It’s built on Jekyll, hosted on github. It’s not quite as fancy as the old one, but with some googling I figured out how to add pages for tags and an archive section, and that’s about all that’s really needed.

Comments are gone too, because I couldn’t be bothered, and because everything seems to add Orwellian amounts of trackers. Ping me on IRC, by mail or on twitter instead. The share buttons are also just plain links now without tracking for Twitter (because I’m there) and G+ (because all the cool kernel hackers are there, but I’m not cool enough).

And in case you wonder why I blatter for so long about this change: I need a new blog entry to double check that the generated feeds are still at the right spots for the various planets to pick them up …

July 18, 2016

Please note that the systemd.conf 2016 Call for Participation ends in less than two weeks, on Aug. 1st! Please send in your talk proposal by then! We’ve already got a good number of excellent submissions, but we are interested in yours even more!

We are looking for talks on all facets of systemd: deployment, maintenance, administration, development. Regardless of whether you use it in the cloud, on embedded, on IoT, on the desktop, on mobile, in a container or on the server: we are interested in your submissions!

In addition to proposals for talks for the main conference, we are looking for proposals for workshop sessions held during our Workshop Day (the first day of the conference). The workshop format consists of a day of 2-3h training sessions, that may cover any systemd-related topic you'd like. We are both interested in submissions from the developer community as well as submissions from organizations making use of systemd! Introductory workshop sessions are particularly welcome, as the Workshop Day is intended to open up our conference to newcomers and people who aren't systemd gurus yet, but would like to become more fluent.

For further details on the submissions we are looking for and the CfP process, please consult the CfP page and submit your proposal using the provided form!

And keep in mind:

REMINDER: Please sign up for the conference soon! Only a limited number of tickets are available, hence make sure to secure yours quickly before they run out! (Last year we sold out.) Please sign up here for the conference!

AND OF COURSE: We are also looking for more sponsors for systemd.conf! If you are working on systemd-related projects, or make use of it in your company, please consider becoming a sponsor of systemd.conf 2016! Without our sponsors we couldn't organize systemd.conf 2016!

Thank you very much, and see you in Berlin!

July 15, 2016

More and more distros are switching to libinput by default. That's a good thing but one side-effect is that the synclient tool does not work anymore [1], it just complains that "Couldn't find synaptics properties. No synaptics driver loaded?"

What is synclient? A bit of history first. Many years ago the only way to configure input devices was through xorg.conf options, there was nothing that allowed for run-time configuration. The Xorg synaptics driver found a solution to that: the driver would initialize a shared memory segment that kept the configuration options and a little tool, synclient (synaptics client), would know about that segment. Calling synclient with options would write to that SHM segment and thus toggle the various options at runtime. Driver and synclient had to be of the same version to know the layout of the segment and it's about as secure as you expect it to be. In 2008 I added input device properties to the server (X Input Extension 1.5 and it's part of 2.0 as well of course). Rather than the SHM segment we now had a generic API to talk to the driver. The API is quite simple, you effectively have two keys (device ID and property number) and you can set any value(s). Properties literally support just about anything but drivers restrict what they allow on their properties and which value maps to what. For example, to enable left-finger tap-to-click in synaptics you need to set the 5th byte of the "Synaptics Tap Action" property to 1.

xinput, a commandline tool and debug helper, has a generic API to change those properties so you can do things like xinput set-prop "device name" "property name" 1 [2]. It does a little bit under the hood but generally it's pretty stupid. You can run xinput set-prop and try to set a value that's out of range, or try to switch from int to float, or just generally do random things.

We were able to keep backwards compatibility in synclient, so where before it would use the SHM segment it would now use the property API, without the user interface changing (except the error messages are now standard Xlib errors complaining about BadValue, BadMatch or BadAccess). But synclient and xinput use the same API to talk to the server and the server can't tell the difference between the two.

Fast forward 8 years and now we have libinput, wrapped by the xf86-input-libinput driver. That driver does the same as synaptics, the config toggles are exported as properties and xinput can read and change them. Because really, you do the smart work by selecting the right property names and values and xinput just passes on the data. But synclient is broken now, simply because it requires the synaptics driver and won't work with anything else. It checks for a synaptics-specific property ("Synaptics Edges") and if that doesn't exist it complains with "Couldn't find synaptics properties. No synaptics driver loaded?". libinput doesn't initialise that property, it has its own set of properties. We did look into whether it's possible to have property-compatibility with synaptics in the libinput driver but it turned out to be a huge effort, flaky reliability at best (not all synaptics options map into libinput options and vice versa) and the benefit was quite limited. Because, as we've been saying since about 2009 - your desktop environment should take over configuration of input devices, hand-written scripts are dodo-esque.

So if you must insist on shellscripts to configure your input devices use xinput instead. synclient is like fsck.ext2, on that glorious day you switch to btrfs it won't work because it was only designed with one purpose in mind.

[1] Neither does syndaemon btw but its functionality is built into libinput so that doesn't matter.
[2] xinput set-prop --type=int --format=32 "device name" "hey I have a banana" 1 2 3 4 5 6 and congratulations, you've just created a new property for all X clients to see. It doesn't do anything, but you could use those to attach info to devices. If anything was around to read that.

July 14, 2016

xinput is a commandline tool to change X device properties. Specifically, it's a generic interface to change X input driver configuration at run-time, used primarily in the absence of a desktop environment or just for testing things. But there's a feature of xinput that many don't appear to know: it resolves device and property names correctly. So plenty of times you see advice to run a command like this:

xinput set-prop 15 281 1

This is bad advice, it's almost impossible to figure out what this is supposed to do, it depends on the device ID never changing (spoiler: it will) and the property number never changing (spoiler: it will). Worst case, you may suddenly end up setting a different property on a different device and you won't even notice. Instead, just use the built-in name resolution features of xinput:

xinput set-prop "SynPS/2 Synaptics TouchPad" "libinput Tapping Enabled" 1

This command will work regardless of the device ID for the touchpad and regardless of the property number. Plus it's self-documenting. This has been possible for many many years, so please stop using the number-only approach.
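If you generate xinput calls from a script, you can enforce the name-based approach there too. The Python helper below is a hypothetical sketch: it only builds the argument list (using the same `xinput set-prop` syntax as above) and refuses bare numeric device IDs or property numbers. Actually running the command is left to the caller.

```python
import shlex


def xinput_set_prop(device_name, property_name, *values):
    """Build an xinput set-prop command that uses names, never numeric IDs."""
    for label, name in (("device", device_name), ("property", property_name)):
        if str(name).isdigit():
            raise ValueError(f"{label} must be given by name, not by number: {name!r}")
    return ["xinput", "set-prop", device_name, property_name, *map(str, values)]


argv = xinput_set_prop("SynPS/2 Synaptics TouchPad", "libinput Tapping Enabled", 1)
print(shlex.join(argv))   # shell-quoted form, handy for logging

try:
    xinput_set_prop("15", "281", 1)
    rejected_numeric = False
except ValueError:
    rejected_numeric = True
```

The list form can be handed straight to subprocess.run() on a machine with a running X server.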

July 13, 2016

In case you haven’t heard yet, with the recently announced Mesa 12.0 release, Intel gen8+ GPUs expose OpenGL 4.3, which is quite a big leap from the previous OpenGL 3.3!

The Mesa i965 Intel driver now exposes OpenGL 4.3 on Broadwell and later!

Although this might surprise some, the truth is that even if the i965 driver only exposed OpenGL 3.3 it had been exposing many of the OpenGL 4.x extensions for quite some time, however, there was one OpenGL 4.0 extension in particular that was still missing and preventing the driver from exposing a higher version: ARB_gpu_shader_fp64 (fp64 for short). There was a good reason for this: it is a very large feature that has been in the works by Intel first and Igalia later for quite some time. We first started to work on this as far back as November 2015 and by that time Intel had already been working on it for months.

I won’t cover here what made this such a large effort because there would be a lot of stuff to cover and I don’t feel like spending weeks writing a series of posts on the subject :). Hopefully I will get a chance to talk about all that at XDC in September, so instead I’ll focus on explaining why we only have this working in gen8+ at the moment and the status of gen7 hardware.

The plan for ARB_gpu_shader_fp64 was always to focus on gen8+ hardware (Broadwell and later) first because it has better support for the feature. I must add that it also has fewer hardware bugs, although we only found out about that later ;). So the plan was to do gen8+ and then extend the implementation to cover the quirks required by gen7 hardware (IvyBridge, Haswell, ValleyView).

At this point I should explain that Intel GPUs have two code generation backends: scalar and vector. The main difference between the two backends is that the vector backend (also known as align16) operates on vectors (surprise, right?) and has native support for things like swizzles and writemasks, while the scalar backend (known as align1) operates on scalars, which means that, for example, a vec4 GLSL operation is broken up into 4 separate instructions, each one operating on a single component. You might think that this makes the scalar backend slower, but that would not be accurate. In fact it is usually faster because it allows the GPU to exploit SIMD better than the vector backend.
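
To make the difference concrete, here is a toy model of how a single vec4 addition might look in each backend. It is only a sketch: the tuple format and the function names are invented for illustration and are not the real i965 intermediate representation.

```python
def vector_backend(dst, a, b):
    """One instruction that operates on all four components at once."""
    return [("add", f"{dst}.xyzw", f"{a}.xyzw", f"{b}.xyzw")]


def scalar_backend(dst, a, b):
    """The same vec4 add, split into one instruction per component."""
    return [("add", f"{dst}.{c}", f"{a}.{c}", f"{b}.{c}") for c in "xyzw"]


if __name__ == "__main__":
    for name, backend in (("vector", vector_backend),
                          ("scalar", scalar_backend)):
        print(name)
        for instruction in backend("r0", "r1", "r2"):
            print("   ", *instruction)
```

The scalar version emits four instructions where the vector version emits one, which is the breakdown described above.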

The thing is that different hardware generations use one backend or the other for different shader stages. For example, gen8+ used to run Vertex, Fragment and Compute shaders through the scalar backend and Geometry and Tessellation shaders via the vector backend, whereas Haswell and IvyBridge also use the vector backend for Vertex shaders.

Because you can use 64-bit floating point in any shader stage, the original plan was to implement fp64 support on both backends. Implementing fp64 requires a lot of changes throughout the driver compiler backends, which makes the task anything but trivial, but the vector backend is particularly difficult to implement because the hardware only supports 32-bit swizzles. This restriction means that a hardware swizzle such as XYZW only selects components XY in a dvecN and therefore, there is no direct mechanism to access components ZW. As a consequence, dealing with anything bigger than a dvec2 requires more creative solutions, which then need to face some other hardware limitations and bugs, etc, which eventually makes the vector backend require a significantly larger development effort than the scalar backend.
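
The swizzle restriction can be illustrated with a small, simplified model. It assumes a dvec4 is laid out as eight consecutive 32-bit halves and that a swizzle selector can only address the first four 32-bit channels; the real register layout has more detail than this, and all names below are made up for the example.

```python
SWIZZLE_INDEX = {"x": 0, "y": 1, "z": 2, "w": 3}


def dvec4_as_32bit_channels():
    """Lay out a dvec4 as eight 32-bit halves, low half first."""
    return [f"{comp}_{half}" for comp in "xyzw" for half in ("lo", "hi")]


def apply_32bit_swizzle(channels, swizzle):
    """Select 32-bit channels; selectors can only reach indices 0..3."""
    return [channels[SWIZZLE_INDEX[s]] for s in swizzle]


def reachable_double_components(swizzle):
    """Return which 64-bit components a 32-bit swizzle actually touches."""
    selected = apply_32bit_swizzle(dvec4_as_32bit_channels(), swizzle)
    return sorted({name.split("_")[0] for name in selected})


if __name__ == "__main__":
    print(apply_32bit_swizzle(dvec4_as_32bit_channels(), "xyzw"))
    print(reachable_double_components("xyzw"))
```

In this model the XYZW swizzle lands on the low and high halves of X and Y only, so Z and W of the dvec4 are never selected.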

Thankfully, gen8+ hardware supports scalar Geometry and Tessellation shaders and Intel’s Kenneth Graunke had been working on enabling that for a while. When we realized that the vector fp64 backend was going to require much more effort than what we had initially thought, he gave a final push to the full scalar gen8+ implementation, which in turn allowed us to have a full fp64 implementation for this hardware and expose OpenGL 4.0, and soon after, OpenGL 4.3.

That does not mean that we don’t care about gen7 though. As I said above, the plan has always been to bring fp64 and OpenGL4 to gen7 as well. In fact, we have been hard at work on that since even before we started sending the gen8+ implementation for review and we have made some good progress.

Besides addressing the quirks of fp64 for IvyBridge and Haswell (yes, they have different implementation requirements) we also need to implement the full fp64 vector backend support from scratch, which as I said, is not a trivial undertaking. Because Haswell seems to require fewer changes we have started with that and I am happy to report that we have a working version already. In fact, we have already sent a small set of patches for review that implement Haswell’s requirements for the scalar backend and as I write this I am cleaning up an initial implementation of the vector backend in preparation for review (currently at about 100 patches, but I hope to trim it down a bit before we start the review process). IvyBridge and ValleyView will come next.

The initial implementation for the vector backend has room for improvement since the focus was on getting it working first so we can expose OpenGL4 in gen7 as soon as possible. The good thing is that it is more or less clear how we can improve the implementation going forward (you can see an excellent post by Curro on that topic here).

You might also be wondering about OpenGL 4.1’s ARB_vertex_attrib_64bit. After all, that kind of goes hand in hand with ARB_gpu_shader_fp64 and we implemented the extension for gen8+ too. There is good news here too, as my colleague Juan Suárez has already implemented this for Haswell and I would expect it to mostly work on IvyBridge as is or with minor tweaks. With that we should be able to expose at least OpenGL 4.2 on all gen7 hardware once we are done.

So far, implementing ARB_gpu_shader_fp64 has been quite the ride and I have learned a lot of interesting stuff about how the i965 driver and Intel GPUs operate in the process. Hopefully, I’ll get to talk about all this in more detail at XDC later this year. If you are planning to attend and you are interested in discussing this or other Mesa stuff with me, please find me there, I’ll be looking forward to it.

Finally, I’d like to thank both Intel and Igalia for supporting my work on Mesa and i965 all this time, my Igalian friends Samuel Iglesias, who has been hard at work with me on the fp64 implementation all this time, Juan Suárez and Andrés Gómez, who have done a lot of work to improve the fp64 test suite in Piglit, and all the friends at Intel who have been helping us in the process, especially Connor Abbott, Francisco Jerez, Jason Ekstrand and Kenneth Graunke.

July 11, 2016

In an earlier post, I explained how we added graphics tablet pad support to libinput. Read that article first, otherwise this article here will be quite confusing.

A lot of tablet pads have mode-switching capabilities. Specifically, they have a set of LEDs and pressing one of the buttons cycles the LEDs. And software is expected to map the ring, strip or buttons to different functionality depending on the mode. A common configuration for a ring or strip would be to send scroll events in mode 1 but zoom in/out when in mode 2. On the Intuos Pro series tablets that mode switch button is the one in the center of the ring. On the Cintiq 21UX2 there are two sets of buttons, one left and one right and one mode toggle button each. The Cintiq 24HD is even more special, it has three separate buttons on each side to switch to a mode directly (rather than just cycling through the modes).

In the upcoming libinput 1.4 we will have mode switching support in libinput, though modes themselves have no real effect within libinput; they are merely extra information to be used by the caller. The important terms here are "mode" and "mode group". A mode is a logical set of button, strip and ring functions, as interpreted by the compositor or the client. How they are used is up to them as well. The Wacom control panels for OS X and Windows allow mode assignment only to the strip and rings while the buttons remain in the same mode at all times. We assign a mode to each button so a caller may provide differing functionality on each button. But that's optional: an OS X/Windows-style configuration is easy, just ignore the button modes.

A mode group is a physical set of buttons, strips and rings that belong together. On most tablets there is only one mode group but tablets like the Cintiq 21UX2 and the 24HD have two independently controlled mode groups - one left and one right. That's all there is to mode groups, modes are a function of mode groups and can thus be independently handled. Each button, ring or strip belongs to exactly one mode group. And finally, libinput provides information about which button will toggle modes or whether a specific event has toggled the mode. Documentation and a starting point for which functions to look at is available in the libinput documentation.
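
The relationship between modes and mode groups can be sketched as a small data model. This is not the libinput API: the class, the button numbers and the number of modes are hypothetical, and modes are counted from zero here.

```python
from dataclasses import dataclass, field


@dataclass
class ModeGroup:
    """A physical set of buttons, rings and strips that share one mode."""
    name: str
    num_modes: int
    toggle_buttons: set = field(default_factory=set)
    mode: int = 0

    def handle_button_press(self, button):
        """Cycle the mode if this button is a toggle; return True if it changed."""
        if button in self.toggle_buttons and self.num_modes > 1:
            self.mode = (self.mode + 1) % self.num_modes
            return True
        return False


def ring_action(group):
    """Caller-side policy: scroll in even modes, zoom in odd modes."""
    return ["scroll", "zoom"][group.mode % 2]


if __name__ == "__main__":
    # Two independent groups, as on tablets with a left and a right set.
    left = ModeGroup("left", num_modes=3, toggle_buttons={0})
    right = ModeGroup("right", num_modes=3, toggle_buttons={9})
    left.handle_button_press(0)
    print(left.mode, ring_action(left), right.mode, ring_action(right))
```

Pressing the toggle button of the left group changes only the left group's mode; the right group keeps its own state, and what a mode means for the ring is decided entirely by the caller.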

Mode switching on Wacom tablets is actually software-controlled. The tablet relies on some daemon running to intercept button events and write to the right sysfs files to toggle the LEDs. In the past this was handled by e.g. a callout by gnome-settings-daemon. The first libinput draft implementation took over that functionality so we only have one process to handle the events. But there are a few issues with that approach. First, we need write access to the sysfs file that exposes the LED. Second, running multiple libinput instances would result in conflicts during LED access. Third, the sysfs interface is decidedly nonstandard and quite quirky to handle. And fourth, the most recent device, the Express Key Remote, has hardware-controlled LEDs.

So instead we opted for a two-part solution. First, the non-standard sysfs interface will be deprecated in favour of a proper kernel LED interface (/sys/class/leds/...) with the same contents as other LEDs. Second, the kernel will take over mode switching using LED triggers that are set up to cover the most common case: hitting a mode toggle button changes the mode. Benjamin Tissoires is currently working on those patches. Until then, libinput's backend implementation will just pretend that each tablet only has one mode group with a single mode. This allows us to get the rest of the userspace stack in place and then, once the kernel patches are in a released kernel, switch over to the right backend.

June 25, 2016

I’m sad to say it’s the end of the road for me with Gentoo, after 13 years volunteering my time (my “anniversary” is tomorrow). My time and motivation to commit to Gentoo have steadily declined over the past couple of years and eventually stopped entirely. It was an enormous part of my life for more than a decade, and I’m very grateful to everyone I’ve worked with over the years.

My last major involvement was running our participation in the Google Summer of Code, which is now fully handed off to others. Prior to that, I was involved in many things from migrating our X11 packages through the Big Modularization and maintaining nearly 400 packages to serving 6 terms on the council and as desktop manager in the pre-council days. I spent a long time trying to change and modernize our distro and culture. Some parts worked better than others, but the inertia I had to fight along the way was enormous.

No doubt I’ve got some packages floating around that need reassignment, and my retirement bug is already in progress.

Thanks, folks. You can reach me by email using my nick at this domain, or on Twitter, if you’d like to keep in touch.

June 23, 2016


I’ve been fortunate enough lately to attend the largest virtual reality professional event/conference: SVVR. This virtual reality conference has been held each year in Silicon Valley for 3 years now. This year, it showcased more than 100 VR companies on the exhibit floor and welcomed more than 1400 VR professionals and enthusiasts from all around the world. As a VR enthusiast myself, I attended the full 3-day conference and met most of the exhibitors, and I’d like to summarize my thoughts and the things I learned below, grouped under various themes. This post is by no means exhaustive and consists of my own, personal opinions.


I realize that content creation for VR is really becoming the one area where most players will end up working. Hardware manufacturers and platform software companies are building the VR infrastructure as we speak (and it’s already comfortably usable), but as we move along and standards become more solid, I’m pretty sure we’re going to see lots and lots of new start-ups in the VR content world, creating immersive games, 360 video content, live VR events, etc… Right now, the range of deployment options for a content developer is not very broad. The vast majority of content creators are targeting the Unity3D plug-in, since it’s got built-in support for virtually all VR devices there are on the market, like the Oculus family of headsets, HTC Vive, PlayStation VR, Samsung’s GearVR and even generic D3D or OpenGL-based applications on PC/Mac/Linux.

2 types of content

There really are two main types of VR content out there: 3D virtual artificially-generated content and 360 real-life captured content.


The former is what we usually refer to when thinking about VR, that is, computer-generated 3D worlds in which a VR user can wander and interact. This is usually the kind of content used in VR games, but also in VR applications, like Google’s great drawing app called TiltBrush (more info below). Click here to see a nice demo video!

The latter is everything that’s not generated but rather “captured” from real life and projected or rendered in the VR space with the use of, most commonly, spherical projections and post-processing stitching and filtering. Simply said, we’re talking about 360 videos here (both 2D and 3D). Usually, this kind of content does not let VR users interact with the VR world as “immersively” as computer-generated 3D content. It’s rather “played back” and “replayed” just like regular online television series, for example, except for the fact that watchers can “look around”.

At SVVR2016, there were so many exhibitors doing VR content… Like InnerVision VR, Baobab Studios, SculptVR, MiddleVR, Cubicle ninjas, etc… on the computer-generated side, and Facade TV, VR Sports, Koncept VR, etc… on the 360 video production side.


Personally, I think tracking is by far the most important factor when considering the whole VR user experience. You have to actually try the HTC Vive tracking system to understand. The HTC Vive uses two “Lighthouse” base stations placed in the room to let you track a larger space, something close to 15′ x 15′! I tried it many times and tracking always seemed to stay pretty solid and constant. With the Vive you can literally walk in the VR space, zig-zag, leap and dodge without losing detection. On that front, I think the competition is doing quite poorly. For example, Oculus’ CV1 only tracks your movement from the front and the tracking angle is pretty narrow… tracking was often lost when I faced away just a little… disappointing!

Talking about tracking, one of the most amazing talks was Leap Motion CTO David Holz’s demo of his brand new ‘Orion’, which is a truly impressive hand tracking camera with very powerful detection algos and very, very low latency. We could only “watch” David interact, but it looked so natural! Check it out for yourself!


Audio is becoming increasingly crucial to the VR workflow since it adds so much to the VR experience. It is generally agreed in the VR community that awesome, well-localised 3D audio that seems “real” can add a lot of realism even to the visuals. At SVVR2016, there were a few audio-centric exhibitors like Ossic and Subpac. The former is releasing a Kickstarter-funded 3D headset that lets you “pan” stereo audio content by rotating your head left-right. The latter is showcasing a complete body suit using tactile transducers and vibrotactile membranes to make you “feel” audio. The goal of this article is not to review specific technologies, but to discuss every aspect of the VR experience and, when it comes to audio, I unfortunately feel we’re still at the “3D sound is enough” level, but I believe it’s not.

See, proper audio 3D localization is a must of course. You obviously do not want to play a VR game where a dog appearing on your right is barking on your left… nor do you want to have the impression a hovercraft is approaching up ahead when it’s actually coming from the back. Fortunately, we now have pretty good audio engines that correctly render audio coming from anywhere around you with good front/back discrimination. A good example of that is 3Dception from TwoBigEars. 3D spatialization of audio channels is a must-have and yet, it’s an absolute minimum in my opinion. Try it for yourself! Most of today’s VR games have spatially coherent sound but, most of the time, you just do not believe the sound is actually “real”. Why?

Well, there are a number of reasons, ranging from “limited audio diversity” (a limited number of objects/details found in the audio feed, like missing tiny air flows/sounds, the user’s respiration or the room’s ambient noise level) to limited sound cancellation capability (the ability to suppress high-pitched ambient sounds coming from “outside” the game), but I guess one of the most important factors is simply the way audio is recorded and rendered in our day-to-day cheap stereo headsets… Binaural recording and stereo-to-binaural conversion algorithms hold a lot of promise. Binaural recording is a technique that records audio through two tiny omni microphones located under diaphragm structures resembling the human ears, so that audio is bounced back just as if it were being routed through human ears before reaching the microphones. The binaural audio experience is striking and the “stereo” feeling is magnified. It is very difficult to explain; you have to hear it for yourself. Speaking of ear structures that have a direct impact on the audio spectrum, I think one of the most promising techniques moving forward for added audio realism will be the whole geometry-based audio modeling field, where you can basically render sound as if it had actually been reflected off a computer-generated 3D geometry. Using such models, a dog barking in front of a tiled metal shed will sound really different from the same dog barking near a wooden chalet. The brain does pick up those tiny details and that’s why you find guys like Nvidia releasing their brand new “Physically Based Acoustic Simulator Engine” in VrWorks.


Haptics is another very interesting VR domain that consists of letting users perceive virtual objects through neither visual nor aural channels, but through touch. Usually, this sense of touch in a VR experience is brought in by the use of special haptic wands that, using force feedback and other technologies, make you think that you are actually pushing an object in the VR world.

You mostly find two types of haptic devices out there: wand-based and glove-based. Gloves for haptics are of course more natural to most users. It’s easy to picture yourself in a VR game trying to “feel” rain drops falling on your fingers or, in a flight simulator, pushing buttons and really feeling them. However, from talking to many exhibitors at SVVR, it seems we’ll be stuck at the “feel button pushes” level for quite some time, as we’re very far from being able to render “textures”, since the spatial resolutions involved would simply be too high for any haptic technology that’s currently available. There are some pretty cool start-ups with awesome glove-based haptic technologies like Kickstarter-funded Neurodigital Technologies GloveOne or Virtuix’s Hands Omni.

Now, I’m not saying wand-based haptic technologies are outdated and not promising. In fact, I think they are more promising than gloves for any VR application that relies on “tools”, like a painting app requiring you to use a brush or a remote-surgery medical application requiring you to use an actual scalpel! When it comes to wands, tools and the like, the potential for haptic feedback is multiplied because you simply have more room to fit more actuators and gyros. I once tried an arm-based 3D joystick in a CAD application and I could swear I was really hitting objects with my design tool… it was stunning!


If VR really takes off in the consumer mass market someday soon, it will most probably be social. That’s something I heard at SVVR2016 (paraphrased) in the very interesting talk by David Baszucki titled “Why the future of VR is social”. I mean, in essence, let’s just take a look at how technology is adopted nowadays and acknowledge that the vast majority of applications rely on the “social” aspect, right? People want to “connect”, “communicate” and “share”. So when VR comes around, why would it suddenly be different? Of course, gamers will want to play really immersive VR games and workers will want to use VR in their daily tasks to boost productivity, but most users will probably want to put on their VR glasses to talk to their relatives, thousands of miles away, as if they were sitting in the same room. See? Even the gamers and the workers I referred to above will want to play or work “with other real people”. No matter how you use VR, I truly believe the social factor will be one of the most important ones to consider in building successful software. At SVVR 2016, I discovered a very interesting start-up that focused on the social VR experience. With mimesys’s telepresence demo, using an HTC Vive controller, they had me collaborate on a painting with a “real” guy hooked to the same system, painting from his home apartment in France, some 9850 km away, and I had a pretty good sense of his “presence”. The 3D geometry and rendered textures were not perfect, but it was good enough to have a true collaboration experience!


We’re only at the very beginning of this very exciting journey through Virtual Reality and it’s really difficult to predict what VR devices will look like in only 3-5 years from now because things are just moving so quickly… A big area I did not cover in my post and that will surely change a lot of parameters moving forward in the VR world is… AR, Augmented Reality :) Check out what MagicLeap is up to these days!



June 21, 2016
Early bird gets eaten by the Nyarlathotep
The more adventurous of you can use those (designed as embeddable) Lua scripts to transform your DRM-free downloads into Flatpaks.

The long-term goal would obviously be for this not to be needed, and for online games stores to ship ".flatpak" files, with metadata so we know what things are in GNOME Software, which automatically picks up the right voice/subtitle language, and presents its extra music and documents in the respective GNOME applications.
But in the meanwhile, and for the sake of the games already out there, there's flatpak-games. Note that lua-archive is still fiddly.
Support for a few Humble Bundle formats (some are already supported), grab-all RPMs and Debs, and those old Loki games is also planned.
It's late here, I'll be off to do some testing I think :)

PS: Even though I have enough programs that would fail to create bundles in my personal collection to accept "game donations", I'm still looking for original copies of Loki games. Drop me a message if you can spare one!

This is a very exciting day for me as two major projects I am deeply involved with are having a major launch. First of all Fedora Workstation 24 is out, which crosses a few critical milestones for us. Maybe most visible is that this is the first time you can use the new graphical update mechanism in GNOME Software to take you from Fedora Workstation 23 to Fedora Workstation 24. This means that when you open GNOME Software it will show you an option to do a system upgrade to Fedora Workstation 24. We have been testing and doing a lot of QA work around this feature so my expectation is that it will provide a smooth upgrade experience for you.
Fedora System Upgrade

The second major milestone is that we do feel Wayland is now in a state where the vast majority of users should be able to use it on a day-to-day basis. We have been working through the kinks and resolving many corner cases during the previous 6 months, with a lot of effort put into making sure that the interaction between applications running natively on Wayland and those running using XWayland is smooth. For instance one item we crossed off the list early in this development cycle was adding middle-mouse button cut and paste, as we know that was a crucial feature for many long-time Linux users looking to make the switch. So once you have updated, I ask all of you to try switching to the Wayland session by clicking on the little cogwheel in the login screen, so that we get as much testing as possible of Wayland during the Fedora Workstation 24 lifespan. Feedback provided by our users during the Fedora Workstation 24 lifecycle will be crucial information to allow us to make the final decision about Wayland as the default for Fedora Workstation 25. Of course the team will be working ardently during Fedora Workstation 24 to make sure we find and address any niggling issues left.

In addition to that there is also of course a long list of usability improvements, new features and bugfixes across the desktop, both coming in from our desktop team at Red Hat and from the GNOME community in general.

There was also the formal announcement of Flatpak today (be sure to read that press release), which is the great new technology for shipping desktop applications. Those of you who have read my previous blog entries have probably seen me talking about this technology using its old name, xdg-app. Flatpak is an incredible piece of engineering designed by Alexander Larsson, which we developed alongside a lot of other components.
As Matthew Garrett pointed out not long ago, unless we move away from X11 we cannot really produce a secure desktop container technology, which is why we kept such a high focus on pushing Wayland forward for the last year. It is also why we invested so much time into Pinos, which is, as I mentioned in my original announcement of the project, our video equivalent of PulseAudio (and yes, a proper Pinos website is getting close :). Wim Taymans, who created Pinos, has also been working on patches to PulseAudio to make it more suitable for use with sandboxed applications, and those patches have recently been taken over by community member Ahmed S. Darwish, who is trying to get them ready for merging into the main codebase.

We are feeling very confident about Flatpak as it has a lot of critical features designed in from the start. First of all it was built to be a cross-distribution solution from day one, meaning that making Flatpak run on any major Linux distribution out there should be simple. We already have Simon McVittie working on Debian support, we have Arch support, and there is also an Ubuntu PPA that the team put together that allows you to run Flatpaks fully featured on Ubuntu. And Endless Mobile has chosen Flatpak as their application delivery format going forward for their operating system.

We use the same base technologies as Docker, like namespaces, bind mounts and cgroups, for Flatpak, which means that any system out there wanting to support Docker images would also have the necessary components to support Flatpaks. It also means that we will be able to take advantage of the investment and development happening around server-side containers.

Flatpak is also heavily using another exciting technology, OSTree, which was originally developed by Colin Walters for GNOME. This technology is actually seeing a lot of investment and development these days as it became the foundation for Project Atomic, which is Red Hat's effort to create an enterprise-ready platform for running server-side containers. OSTree provides us with a lot of important features like efficient storage of application images and a very efficient transport mechanism. For example one core feature OSTree brings us is de-duplication of files, which means you don’t need to keep multiple copies of the same file on your disk, so if ten Flatpak images share the same file, then you only keep one copy of it on your local disk.
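
The de-duplication idea can be sketched with a toy content-addressed store, where a file is stored under a checksum of its contents so identical files collapse into one object. This is an illustration only, not OSTree's on-disk format or API; the class and method names are made up, and SHA-256 is used here simply as a convenient checksum.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: identical file contents are kept once."""

    def __init__(self):
        self.objects = {}   # checksum -> file contents
        self.images = {}    # image name -> {path: checksum}

    def add_image(self, name, files):
        """Record an image given a {path: bytes} mapping."""
        tree = {}
        for path, data in files.items():
            checksum = hashlib.sha256(data).hexdigest()
            self.objects.setdefault(checksum, data)  # stored only the first time
            tree[path] = checksum
        self.images[name] = tree

    def read(self, name, path):
        """Look a file up through the image's path-to-checksum mapping."""
        return self.objects[self.images[name][path]]


if __name__ == "__main__":
    store = ContentStore()
    for i in range(10):
        store.add_image(f"app{i}", {
            "lib/libshared.so": b"shared library bytes",
            "bin/app": f"binary for app {i}".encode(),
        })
    # Ten images, twenty paths, but the shared library is stored once.
    print(len(store.objects))
```

Each image still sees its own complete file tree; only the underlying storage is shared.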

Another critical feature of Flatpak is its runtime separation, which basically means that you can have different runtimes for different families of use cases. So for instance you can have a GNOME runtime that allows all your GNOME applications to share a lot of libraries while giving you a single point for security updates to those libraries. So while we don’t want a forest of runtimes it does allow us to create a few important ones to cover certain families of applications and thus reduce disk usage further and improve system security.

Going forward we are looking at a lot of exciting features for Flatpak. The most important of these is the thing I mentioned earlier, Portals.
In the current release of Flatpak you can choose between two options: either make your application completely sandboxed or not sandboxed at all. Portals are basically the way you can sandbox your application yet still allow it to interact with your general desktop and storage. For instance, the role of Pinos and PulseAudio for containers is to provide such portals for handling audio and video. Of course more portals are needed, and during the GTK+ hackfest in Toronto last week a lot of time was spent on mapping out the roadmap for Portals. Expect more news about Portals as they are developed going forward.

I want to mention that we of course realize that a new technology like Flatpak should come with a high quality developer story, which is why Christian Hergert has been spending time working on support for Flatpak in the Builder IDE. There is some support in already, but expect to see us flesh this out significantly over the next months. We are also working on adding more documentation to the Flatpak website, to cover how to integrate more build systems and similar with Flatpak.

And last, but not least, Richard Hughes has been making sure we have great Flatpak support in Software in Fedora Workstation 24, ensuring that as an end user you shouldn’t have to care whether your application is a Flatpak or an RPM.

June 20, 2016

I'm back from the GTK hackfest in Toronto, Canada and mostly recovered from jetlag, so it's time to write up my notes on what we discussed there.

Despite the hackfest's title, I was mainly there to talk about non-GUI parts of the stack, and technologies that fit more closely in what could be seen as the platform than they do in GNOME. In particular, I'm interested in Flatpak as a way to deploy self-contained "apps" in a freedesktop-based, sandboxed runtime environment layered over the Universal Operating System and its many derivatives, with both binary and source compatibility with other GNU/Linux distributions.

I'm mainly only writing about discussions I was directly involved in: lots of what sounded like good discussion about the actual graphics toolkit went over my head completely :-) More notes, mostly from Matthias Clasen, are available on the GNOME wiki.

In no particular order:

Thinking with portals

We spent some time discussing Flatpak's portals, mostly on Tuesday. These are the components that expose a subset of desktop functionality as D-Bus services that can be used by contained applications: they are part of the security boundary between a contained app and the rest of the desktop session. Android's intents are a similar concept seen elsewhere. While the portals are primarily designed for Flatpak, there's no real reason why they couldn't be used by other app-containment solutions such as Canonical's Snap.

One major topic of discussion was their overall design and layout. Most portals will consist of a UX-independent part in Flatpak itself, together with a UX-specific implementation of any user interaction the portal needs. For example, the portal for file selection has a D-Bus service in Flatpak, which interacts with some UX-specific service that will pop up a standard UX-specific "Open" dialog — for GNOME and probably other GTK environments, that dialog is in (a branch of) GTK.

A design principle that was reiterated in this discussion is that the UX-independent part should do as much as possible, with the UX-specific part only carrying out the user interactions that need to comply with a particular UX design (in the GTK case, GNOME's design). This minimizes the amount of work that needs to be redone for other desktop or embedded environments, while still ensuring that the other environments can have their chosen UX design. In particular, it's important that, as much as possible, the security- and performance-sensitive work (such as data transport and authentication) is shared between all environments.

The aim is for portals to get the user's permission to carry out actions, while keeping it as implicit as possible, avoiding an "are you sure?" step where feasible. For example, if an application asks to open a file, the user's permission is implicitly given by them selecting the file in the file-chooser dialog and pressing OK: if they do not want this application to open a file at all, they can deny permission by cancelling. Similarly, if an application asks to stream webcam data, the UX we expect is for GNOME's Cheese app (or a similar non-GNOME app) to appear, open the webcam to provide a preview window so they can see what they are about to send, but not actually start sending the stream to the requesting app until the user has pressed a "Start" button. When defining the API "contracts" to be provided by applications in that situation, we will need to be clear about whether the provider is expected to obtain confirmation like this: in most cases I would anticipate that it is.

One security trade-off here is that we have to have a small amount of trust in the providing app. For example, continuing the example of Cheese as a webcam provider, Cheese could (and perhaps should) be a contained app itself, whether via something like Flatpak, an LSM like AppArmor or both. If Cheese is compromised somehow, then whenever it is running, it would be technically possible for it to open the webcam, stream video and send it to a hostile third-party application. We concluded that this is an acceptable trade-off: each application needs to be trusted with the privileges that it needs to do its job, and we should not put up barriers that are easy to circumvent or otherwise serve no purpose.

The main (only?) portal so far is the file chooser, in which the contained application asks the wider system to show an "Open..." dialog, and if the user selects a file, it is returned to the contained application through a FUSE filesystem, the document portal. The reference implementation of the UX for this is in GTK, and is basically a GtkFileChooserDialog. The intention is that other environments such as KDE will substitute their own equivalent.
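As a rough illustration of the split described above, here is a minimal Python sketch of a file-chooser portal whose UX-independent half does the bookkeeping and whose UX-specific half is a single pluggable callback. Everything in it (`FileChooserPortal`, `ChooseFile`, `may_access`, the application IDs) is hypothetical and invented for this example; the real portal is a D-Bus service backed by a FUSE filesystem, neither of which is modelled here.

```python
from typing import Callable, Dict, Optional, Set

# Hypothetical UX-specific hook: shows an "Open" dialog with the given title
# and returns the selected path, or None if the user cancels.
ChooseFile = Callable[[str], Optional[str]]


class FileChooserPortal:
    """UX-independent half: policy and bookkeeping, no dialogs of its own."""

    def __init__(self, choose_file: ChooseFile) -> None:
        self._choose_file = choose_file
        # Paths each application has been granted, keyed by application ID.
        self._granted: Dict[str, Set[str]] = {}

    def open_file(self, app_id: str) -> Optional[str]:
        """Ask the user to pick a file on behalf of app_id."""
        path = self._choose_file(f"Open a file for {app_id}")
        if path is None:
            # Cancelling the dialog is how the user denies permission.
            return None
        # Selecting a file and confirming is the implicit permission grant.
        self._granted.setdefault(app_id, set()).add(path)
        return path

    def may_access(self, app_id: str, path: str) -> bool:
        """Report whether app_id was previously granted this path."""
        return path in self._granted.get(app_id, set())
```

A GTK or KDE environment would only need to supply its own `choose_file` callback; the grant tracking stays shared, and there is no separate "are you sure?" step.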

Other planned portals include:

  • image capture (scanner/camera)
  • opening a specified URI
    • this needs design feedback on how it should work for non-http(s)
  • sharing content, for example on social networks (like Android's Sharing menu)
  • proxying joystick/gamepad input (perhaps via Wayland or FUSE, or perhaps by modifying libraries like SDL with a new input source)
  • network proxies (GProxyResolver) and availability (GNetworkMonitor)
  • contacts/address book, probably vCard-based
  • notifications, probably based on Notifications
  • video streaming (perhaps using Pinos, analogous to PulseAudio but for video)

Environment variables

GNOME on Wayland currently has a problem with environment variables: there are some traditional ways to set environment variables for X11 sessions or login shells using shell script fragments (/etc/X11/Xsession.d, /etc/X11/xinit/xinitrc.d, /etc/profile.d), but these do not apply to Wayland, or to noninteractive login environments like cron and systemd --user. We are also keen to avoid requiring a Turing-complete shell language during session startup, because it's difficult to reason about and potentially rather inefficient.

Some uses of environment variables can be dismissed as unnecessary or even unwanted, similar to the statement in Debian Policy §9.9: "A program must not depend on environment variables to get reasonable defaults." However, there are two common situations where environment variables can be necessary for proper OS integration: search-paths like $PATH, $XDG_DATA_DIRS and $PYTHONPATH (particularly necessary for things like Flatpak), and optionally-loaded modules like $GTK_MODULES and $QT_ACCESSIBILITY where a package influences the configuration of another package.

There is a stopgap solution in GNOME's gdm display manager, /usr/share/gdm/env.d, but this is gdm-specific and insufficiently expressive to provide the functionality needed by Flatpak: "set XDG_DATA_DIRS to its specified default value if unset, then add a couple of extra paths".

pam_env comes closer — PAM is run at every transition from "no user logged in" to "user can execute arbitrary code as themselves" — but it doesn't support .d fragments, which are required if we want distribution packages to be able to extend search paths. pam_env also turns off per-user configuration by default, citing security concerns.

I'll write more about this when I have a concrete proposal for how to solve it. I think the best solution is probably a PAM module similar to pam_env but supporting .d directories, either by modifying pam_env directly or out-of-tree, combined with clarifying what the security concerns for per-user configuration are and how they can be avoided.
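To make the Flatpak requirement quoted earlier concrete ("set XDG_DATA_DIRS to its specified default value if unset, then add a couple of extra paths"), here is a small Python sketch of that one operation. It is not a proposal for the PAM module itself: the function name `extend_search_path` is made up, the choice to append rather than prepend is an assumption, and the Flatpak export directory is used only as an illustrative extra path. The default value is the one given in the XDG Base Directory specification.

```python
# Default from the XDG Base Directory specification, used when the
# variable is unset or empty.
XDG_DATA_DIRS_DEFAULT = "/usr/local/share/:/usr/share/"


def extend_search_path(env, name, default, extra):
    """Return a copy of env in which the search path `name` has the
    `extra` directories appended, starting from `default` if unset."""
    current = env.get(name) or default
    entries = [entry for entry in current.split(":") if entry]
    for path in extra:
        if path not in entries:  # avoid duplicating an existing entry
            entries.append(path)
    result = dict(env)  # leave the caller's mapping untouched
    result[name] = ":".join(entries)
    return result
```

For example, `extend_search_path({}, "XDG_DATA_DIRS", XDG_DATA_DIRS_DEFAULT, ["/var/lib/flatpak/exports/share"])` starts from the specification's default and adds one directory, which is exactly the step a shell-free mechanism would need to express declaratively.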

Relocatable binary packages

On Windows and OS X, various GLib APIs automatically discover where the application binary is located and use search paths relative to that; for example, if C:\myprefix\bin\app.exe is running, GLib might put C:\myprefix\share into the result of g_get_system_data_dirs(), so that the application can ask to load app/data.xml from the data directories and get C:\myprefix\share\app\data.xml. We would like to be able to do the same on Linux, for example so that the apps in a Flatpak or Snap package can be constructed from RPM or dpkg packages without needing to be recompiled for a different --prefix, and so that other third-party software packages like the games on Steam can easily locate their own resources.
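The lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not GLib's implementation: the function names are hypothetical, the layout is assumed to be `<prefix>/bin/<app>` with data in `<prefix>/share`, and how the running binary's path is discovered is deliberately left to the caller.

```python
import posixpath


def relocatable_data_dir(executable_path):
    """Derive <prefix>/share from the path of a binary in <prefix>/bin."""
    bin_dir = posixpath.dirname(executable_path)
    prefix = posixpath.dirname(bin_dir)
    return posixpath.join(prefix, "share")


def locate_resource(executable_path, relative_name):
    """Build the path of a data file relative to the derived data directory."""
    return posixpath.join(relocatable_data_dir(executable_path), relative_name)
```

With this, `locate_resource("/myprefix/bin/app", "app/data.xml")` gives `/myprefix/share/app/data.xml` regardless of the --prefix the package was originally built for.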

Relatedly, there are currently no well-defined semantics for what happens when a .desktop file or a D-Bus .service file has Exec=./bin/foo. The meaning of Exec=foo is well-defined (it searches $PATH) and the meaning of Exec=/opt/whatever/bin/foo is obvious. When this came up in D-Bus previously, my assertion was that the meaning should be the same as in .desktop files, whatever that is.

We agreed to propose that the meaning of a non-absolute path in a .desktop or .service file should be interpreted relative to the directory where the .desktop or .service file was found: for example, if /opt/whatever/share/applications/foo.desktop says Exec=../../bin/foo, then /opt/whatever/bin/foo would be the right thing to execute. While preparing a mail to the freedesktop and D-Bus mailing lists proposing this, I found that I had proposed the same thing almost 2 years ago... this time I hope I can actually make it happen!
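Here is a minimal Python sketch of the proposed rule, assuming the program part of Exec has already been separated from any arguments. The function name `resolve_exec` is invented for this example, and the rule itself is still only a proposal. Note that `normpath` collapses `..` textually and does not follow symlinks, which a real implementation would have to think about.

```python
import posixpath


def resolve_exec(entry_file_path, exec_program):
    """Resolve the program named by Exec= in a .desktop or .service file."""
    if posixpath.isabs(exec_program):
        # Absolute paths are used as they are.
        return exec_program
    if "/" not in exec_program:
        # A bare name keeps its existing meaning: the caller searches $PATH.
        return exec_program
    # Proposed rule: interpret relative to the directory of the entry file.
    base = posixpath.dirname(entry_file_path)
    return posixpath.normpath(posixpath.join(base, exec_program))
```

Applied to the example above, `resolve_exec("/opt/whatever/share/applications/foo.desktop", "../../bin/foo")` yields `/opt/whatever/bin/foo`.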

Flatpak and OSTree bug fixing

On the way to the hackfest, and while the discussion moved to topics that I didn't have useful input on, I spent some time fixing up the Debian packaging for Flatpak and its dependencies. In particular, I did my first upload as a co-maintainer of bubblewrap, uploaded ostree to unstable (with the known limitation that the grub, dracut and systemd integration is missing for now since I haven't been able to test it yet), got most of the way through packaging Flatpak 0.6.5 (which I'll upload soon), cherry-picked the right patches to make ostree compile on Debian 8 in an effort to make backports trivial, and spent some time disentangling a flatpak test failure which was breaking the Debian package's installed-tests. I'm still looking into ostree test failures on little-endian MIPS, which I was able to reproduce on a Debian porterbox just before the end of the hackfest.

OSTree + Debian

I also had some useful conversations with developers from Endless, who recently opened up a version of their OSTree build scripts for public access. Hopefully that information brings me a bit closer to being able to publish a walkthrough for how to deploy a simple Debian derivative using OSTree (help with that is very welcome of course!).

GTK life-cycle and versioning

The life-cycle of GTK releases has already been mentioned here and elsewhere, and there are some interesting responses in the comments on my earlier blog post.

It's important to note that what we discussed at the hackfest is only a proposal: a hackfest discussion between a subset of the GTK maintainers and a small number of other GTK users (I am in the latter category) doesn't, and shouldn't, set policy for all of GTK or for all of GNOME. I believe the intention is that the GTK maintainers will discuss the proposals further at GUADEC, and make a decision after that.

As I said before, I hope that being more realistic about API and ABI guarantees can avoid GTK going too far towards either of the possible extremes: either becoming unable to advance because it's too constrained by compatibility, or breaking applications because it isn't constrained enough. The current situation, where it is meant to be compatible within the GTK 3 branch but in practice applications still sometimes break, doesn't seem ideal for anyone, and I hope we can do better in future.


Thanks to everyone involved, particularly:

  • Matthias Clasen, who organised the hackfest and took a lot of notes
  • Allison Lortie, who provided on-site cat-herding and led us to some excellent restaurants
  • Red Hat Inc., who provided the venue (a conference room in their Toronto office), snacks, a lot of coffee, and several participants
  • my employers Collabora Ltd., who sponsored my travel and accommodation
June 17, 2016

I have wanted to write this blog post since April, and even announced it in two previous posts, but never got to actually write it until now. And with the recent events in Snappy and Flatpak land, I cannot defer this post any longer (unless I want to answer the same questions over and over on IRC ^^).

As you know, I have been developing the Limba 3rd-party software installer since 2014 (see this LWN article explaining the project better than I could 😉 ), which is a spiritual successor to the Listaller project, in development since roughly 2008. Limba got some competition from Flatpak and Snappy, so it’s natural to ask what the project’s next steps will be.

Meeting with the competition

At last FOSDEM and at the GNOME Software sprint this year in April, I met with Alexander Larsson and we discussed the rather unfortunate situation we got into, with Flatpak and Limba being in competition.

Both Alex and I have been experimenting with 3rd-party app distribution for quite some time, with me working on Listaller and him working on Glick and Glick2. None of these projects ever went anywhere. Around the time when I started Limba, fixing design mistakes made with Listaller, Alex started a new attempt at software distribution, this time with sandboxing added to the mix and a new OSTree-based design of the software-distribution mechanism. It wasn’t at all clear that XdgApp, later to be renamed to Flatpak, would get huge backing by GNOME and later Red Hat, becoming a very promising candidate for a truly cross-distro software distribution system.

The main difference between Limba and Flatpak is that Limba allows modular runtimes, with things like the toolkit, helper libraries and programs being separate modules, which can be updated independently. Flatpak, on the other hand, allows just one static runtime and requires everything that is not already in the runtime to be bundled with the actual application. So, while a Limba bundle might depend on multiple individual other bundles, Flatpak bundles only have one fixed dependency on a runtime. Getting a compromise between those two concepts is not possible, and since the modular vs. static approaches in Limba and Flatpak were fundamental, conscious design decisions, merging the projects was also not possible.

Alex and I had very productive discussions, and except for the modularity issue, we were pretty much on the same page in every other aspect regarding the sandboxing and app-distribution matters.

Sometimes stepping out of the way is the best way to achieve progress

So, what to do now? Obviously, I can continue to push Limba forward, but given all the other projects I maintain, this seems to be a waste of resources (Limba eats a lot of my spare time). Now with Flatpak and Snappy being available, I am basically competing with Canonical and Red Hat, who can make progress much faster than I can as a single developer. Also, Flatpak’s bigger base of contributors compared to Limba is a clear sign of which project the community favors.

Furthermore, I started the Listaller and Limba projects to scratch an itch. When I was new to Linux, it was very annoying to me to see some applications only being made available in compiled form for one distribution, and sometimes one that I didn’t use. Getting software was incredibly hard for me as a newbie, and using the package-manager was also unusual back then (no software center apps existed, only package lists). If you wanted to update one app, you usually needed to update your whole distribution, sometimes even to a development version or rolling-release channel, sacrificing stability.

So, if now this issue gets solved by someone else in a good way, there is no point in pushing my solution hard. I developed a tool to solve a problem, and it looks like another tool will fix that issue now before mine does, which is fine, because this longstanding problem will finally be solved. And that’s the thing I actually care most about.

I still think Limba is the superior solution for app distribution, but it is also the one that is most complex and requires additional work by upstream projects to use it properly. Which is something most projects don’t want, and that’s completely fine. 😉

And that being said: I think Flatpak is a great project. Alex has much experience in this matter, and the design of Flatpak is sound. It solves many issues 3rd-party app development faces in a pretty elegant way, and I like it very much for that. Also the focus on sandboxing is great, although that part will need more time to become really useful. (Aside from that, working with Alexander is a pleasure, and he really cares about making Flatpak a truly cross-distributional, vendor independent project.)

Moving forward

So, what I will do now is not to stop Limba development completely, but keep it going as a research project. Maybe one can use Limba bundles to create Flatpak packages more easily. We also discussed having Flatpak launch applications installed by Limba, which would allow both systems to coexist and benefit from each other. Since Limba (unlike Listaller) was also explicitly designed for web-applications, and therefore has a slightly wider scope than Flatpak, this could make sense.

In any case though, I will invest much less time in the Limba project. This is good news for all the people out there using the Tanglu Linux distribution, AppStream-metadata-consuming services, PackageKit on Debian, etc. – those will receive more attention 😉

An integral part of Limba is a web service called “LimbaHub” to accept new bundles, do QA on them and publish them in a public repository. I will likely rewrite it to be a service using Flatpak bundles, maybe even supporting Flatpak bundles and Limba bundles side-by-side (and if useful, maybe also support AppImageKit and Snappy). But this project is still on the drawing board.

Let’s see 🙂

P.S: If you come to Debconf in Cape Town, make sure to not miss my talks about AppStream and bundling 🙂

June 14, 2016

Allison Lortie has provoked a lot of comment with her blog post on a new proposal for how GTK is versioned. Here's some more context from the discussion at the GTK hackfest that prompted that proposal: there's actually quite a close analogy in how new Debian versions are developed.

The problem we're trying to address here is the two sides of a trade-off:

  • Without new development, a library (or an OS) can't go anywhere new
  • New development sometimes breaks existing applications

Historically, GTK has aimed to keep compatible within a major version, where major versions are rather far apart (GTK 1 in 1998, GTK 2 in 2002, GTK 3 in 2011, GTK 4 somewhere in the future). Meanwhile, fixing bugs, improving performance and introducing new features sometimes results in major changes behind the scenes. In an ideal world, these behind-the-scenes changes would never break applications; however, the world isn't ideal. (The Debian analogy here is that as much as we aspire to having the upgrade from one stable release to the next not break anything at all, I don't think we've ever achieved that in practice - we still ask users to read the release notes, even though ideally that wouldn't be necessary.)

In particular, the perceived cost of doing a proper ABI break (a fully parallel-installable GTK 4) means there's a strong temptation to make changes that don't actually remove or change C symbols, but are clearly an ABI break, in the sense that an application that previously worked and was considered correct no longer works. A prominent recent example is the theming changes in GTK 3.20: the ABI in terms of functions available didn't change, but what happens when you call those functions changed in an incompatible way. This makes GTK hard to rely on for applications outside the GNOME release cycle, which is a problem that needs to be fixed (without stopping development from continuing).

The goal of the plan we discussed today is to decouple the latest branch of development, which moves fast and sometimes breaks API, from the API-stable branches, which only get bug fixes. This model should look quite familiar to Debian contributors, because it's a lot like the way we release Debian and Ubuntu.

In Debian, at any given time we have a development branch (testing/unstable) - currently "stretch", the future Debian 9. We also have some stable branches, of which the most recent are Debian 8 "jessie" and Debian 7 "wheezy". Different users of Debian have different trade-offs that lead them to choose one or the other of these. Users who value stability and want to avoid unexpected changes, even at a cost in terms of features and fixes for non-critical bugs, choose to use a stable release, preferably the most recent; they only need to change what they run on top of Debian for OS API changes (for instance webapps, local scripts, or the way they interact with the GUI) approximately every 2 years, or perhaps less often than that with the Debian-LTS project supporting non-current stable releases. Meanwhile, users who value the latest versions and are willing to work with a "moving target" as a result choose to use testing/unstable.

The GTK analogy here is really quite close. In the new versioning model, library users who value stability over new things would prefer to use a stable-branch, ideally the latest; library users who want the latest features, the latest bug-fixes and the latest new bugs would use the branch that's the current focus of development. In practice we expect that the latter would be mostly GNOME projects. There's been some discussion at the hackfest about how often we'd have a new stable-branch: the fastest rate that's been considered is a stable-branch every 2 years, similar to Ubuntu LTS and Debian, but there's no consensus yet on whether they will be that frequent in practice.

How many stable versions of GTK would end up shipped in Debian depends on how rapidly projects move from "old-stable" to "new-stable" upstream, how much those projects' Debian maintainers are willing to patch them to move between branches, and how many versions the release team will tolerate. Once we reach a steady state, I'd hope that we might have 1 or 2 stable-branched versions active at a time, packaged as separate parallel-installable source packages (a lot like how we handle Qt). GTK 2 might well stay around as an additional active version just from historical inertia. The stable versions are intended to be fully parallel-installable, just like the situation with GTK 1.2, GTK 2 and GTK 3 or with the major versions of Qt.

For the "current development" version, I'd anticipate that we'd probably only ship one source package, and do ABI transitions for one version active at a time, a lot like how we deal with libgnome-desktop and the evolution-data-server family of libraries. Those versions would have parallel-installable runtime libraries but non-parallel-installable development files, again similar to libgnome-desktop.

At the risk of stretching the Debian/Ubuntu analogy too far, the intermediate "current development" GTK releases that would accompany a GNOME release are like Ubuntu's non-LTS suites: they're more up to date than the fully stable releases (Ubuntu LTS, which has a release schedule similar to Debian stable), but less stable and not supported for as long.

Hopefully this plan can meet both of its goals: minimize breakage for applications, while not holding back the development of new APIs.

June 09, 2016

During the OpenStack summit a few weeks ago, I had the chance to talk to some people about my experience on running open source projects. It turns out that after hanging out in communities and contributing to many projects for years, I may be able to provide some hindsight and an external eye to many of those who are new to it.

There are plenty of resources out there explaining how to run an open source project. Today, I would like to take a different angle and emphasize what you should not do, socially, in your projects. This list comes from various open source projects I have encountered over the past years. I'm going to go through some of the bad practices I've spotted, in no particular order, illustrated by some concrete examples.

Seeing contributors as an annoyance

When software developers and maintainers are busy, there's one thing they don't need: more work. To many people, the instinctive reaction to an external contribution is: damn, more work. And actually, it is.

Therefore, some maintainers tend to avoid that surplus of work: they state they don't want contributions, or make contributors feel unwelcome. This can take a lot of different forms, from ignoring them to being unpleasant. It does indeed avoid the immediate need to deal with the work that has been added to the maintainer's shoulders.

This is one of the biggest mistakes and misconceptions in open source. If people are sending you more work, you should do whatever it takes to make them feel welcome so they continue working with you. They might pretty soon become the guys doing the work you are doing instead of you. Think: retirement!

Let's take a look at my friend Gordon, who I saw starting as a Ceilometer contributor in 2013. He was doing great code reviews, but he was actually giving me more work by catching bugs in my patches and sending patches I had to review. Instead of being a bully so he would stop making me rework my code and review his patches, I requested that we trust him even more by adding him as a core reviewer.

And if they don't do this one-time contribution, they won't make it two. They won't make any. Those projects may have just lost their new maintainers.

Letting people only do the grunt work

When new contributors arrive and want to contribute to a particular project, they may have very different motivations. Some of them are users, but some of them are just people looking to see what contributing is like: getting the thrill of contribution, as an exercise, or out of a willingness to learn and start contributing back to the ecosystem they use.

The usual response from maintainers is to push people into doing grunt work. That means doing jobs that are of no interest, have little value, and probably have no direct impact on the project.

Some people actually have no problem with it, some do. Some will feel offended at being given low-impact work, and some will love it as soon as you give them some sort of acknowledgment. Be aware of it, and be sure to high-five people doing it. That's the only way to keep them around.

Not valorizing small contributions

When the first patch that comes in from a new contributor is a typo fix, what do developers think? That they don't care, that you're wasting their precious time with your small contribution. And nobody cares about bad English in the documentation, do they?

This is wrong. See my first contributions to home-assistant and Postmodern: I fixed typos in the documentation.

I contributed to Org-mode for a few years. My first patch to Org-mode was about fixing a docstring. Then, I sent 56 patches, fixing bugs and adding fancy features, and also wrote a few external modules. To this day, I'm still #16 in the top-committer list of Org-mode, which contains 390 contributors. So not what you would call a small contributor. I am sure the community is glad they did not despise my documentation fix.

Setting the bar too high for newcomers

When new contributors arrive, their knowledge about the project, its context, and the technologies can vary widely. One of the mistakes people often make is to ask contributors for things that are too complicated for them to achieve. That scares them away (many people are going to be shy or introverted) and they may just disappear, feeling too stupid to help.

Before making any comment, you should not make any assumptions about their knowledge. That should avoid such situations. You should also be very delicate when assessing their skills, as some people might feel vexed if you underestimate them too much.

Once that level has been properly evaluated (a few exchanges should be enough), you need to mentor your contributor to the right degree so they can blossom. It takes time and experience to master this, and you will likely lose some of them in the process, but it's a path every maintainer has to take.

Mentoring is a very important aspect of welcoming new contributors to your project, whatever it is. Pretty sure that applies nicely outside free software too.

Requiring people to make sacrifices with their lives

This is an aspect that varies a lot depending on the project and context, but it's really important. In a free software project, where most people contribute of their own good will and sometimes in their spare time, you must not require them to make big sacrifices. This won't work.

One of the worst implementations of that is requiring people to fly 5 000 kilometers to meet in some place to discuss the project. This puts contributors in an unfair position, based on their ability to leave their family for a week, take a plane/boat/car/train, book a hotel, etc. This is not good, and you should do everything you can to avoid requiring people to do that in order to participate, feel included in the project and blend into your community. Don't get me wrong: that does not mean social activities should be prohibited, on the contrary. Just avoid excluding people when you discuss any project.

The same applies to any other form of discussion that makes it complicated for everyone to participate: IRC meetings (it's hard for some people to book an hour, especially depending on the timezone they live in), video-conferences (especially using non-free software), etc.

Everything that requires people to basically interact with the project in a synchronous manner for a period of time will put constraints on them that can make them uncomfortable.

The best medium is still e-mail and its asynchronous derivatives (bug trackers, etc.), as they allow people to work at their own pace, in their own time.

Not having an (implicit) CoC

Codes of conduct seem to be a trendy topic (and a touchy subject), as more and more communities are opening to a wider audience than they used to – which is great.

Actually, all communities have a code of conduct, whether written in black ink or carried unconsciously in everyone's mind. Its form is a matter of community size and culture.

Now, depending on the size of your community and how comfortable you feel applying it, you may want to have it written down in a document, as Debian did.

Having a code of conduct does not magically transform your whole project community into a bunch of carebears following its guidance. But it provides an interesting point you can refer to as soon as you need to. It can help to throw it at some people, to indicate that their behavior is not welcome in the project, and somehow ease their potential exclusion – even if nobody generally wants to go that far, and it's rarely that useful.

I don't think it's mandatory to have such a document for smaller projects. But you have to keep in mind that the implicit code of conduct will be derived from your own behavior. The way your leader(s) communicate with others will set the entire social mood of the project. Do not underestimate that.

When we started the Ceilometer project, we implicitly followed the OpenStack Code of Conduct before it even existed, and probably set the bar a little higher. Being nice, welcoming and open-minded, we achieved a decent score of diversity, having up to 25% of our core team being women – way above the current ratio in OpenStack and most open source projects!

Making non-native English speakers feel like outsiders

It's quite important to be aware that the vast majority of free software projects out there use English as the common language of communication. It makes a lot of sense: it's a commonly spoken language, and it seems to do the job correctly.

But a large part of the hackers out there are not native English speakers. Many are not able to speak English fluently. That means the rate at which they can communicate and run a conversation might be very low, which can make some people frustrated, especially native English speakers.

The principal demonstration of this phenomenon can be seen in social events (e.g. conferences) where people are debating. It can be very hard for people to explain their thoughts in English and to communicate properly at a decent rate, making the conversation and the transmission of ideas slow. The worst thing that one can see in this context is a native English speaker cutting people off and ignoring them, just because they are talking too slowly. I do understand that it can be frustrating, but the problem here is not the non-native English speaker: it's the medium being used, since holding the conversation orally does not put everyone on the same level.

To a lesser extent, the same applies to IRC meetings, which are relatively synchronous. Completely asynchronous media do not have this flaw, which is why they should also be preferred, in my opinion.

No vision, no delegation

Two of the most commonly encountered mistakes in open source projects: seeing the maintainer struggle with the growth of their project while people are trying to help.

Indeed, when the flow of contributors starts coming in, adding new features, asking for feedback and directions, some maintainers choke and don't know how to respond. That ends up frustrating contributors, who may therefore simply vanish.

It's important to have a vision for your project and communicate it. Make it clear to contributors what you want or don't want in your project. Conveying that in a clear (and non-aggressive, please) manner is a good way of lowering the friction between contributors. They'll pretty soon know if they want to join your ship or not, and what to expect. So be a good captain.

If they choose to work with you and contribute, you should start trusting them as soon as you can and delegate some of your responsibilities. This can be anything that you used to do: reviewing patches targeting some subsystem, fixing bugs, writing docs. Let people own an entire part of the project so they feel responsible and care about it as much as you do. Doing the opposite, which is being a control-freak, is the best way to end up alone with your open source software.

And no project is going to grow and be successful that way.

In 2009, when Uli Schlachter sent his first patch to awesome, this was more work for me. I had to review this patch, and I was already pretty busy designing the new versions of awesome and doing my day job! Uli's work was not perfect, and I had to fix it myself. More work. And what did I do? A few minutes later, I replied to him with a clear plan of what he should do and what I thought about his work.

In response, Uli sent patches and improved the project. Do you know what Uli does today? He has managed the awesome window manager project since 2010 instead of me. I managed to transmit my vision, delegate, and then retire!

Non-recognition of contributions

People contribute in different ways, and it's not always code. There are a lot of things around a free software project: documentation, bug triage, user support, user experience design, communication, translation…

It took a while, for example, for Debian to recognize that their translators could have the status of Debian Developer. OpenStack is working in the same direction by trying to recognize non-technical contributions.

As soon as your project starts attributing badges to some people and creating classes of different members in the community, you should be very careful that you don't forget anyone. That's the easiest road to losing contributors along the way.

Don't forget to be thankful

This whole list has been inspired by many years of open source hacking and free software contributions. Everyone's experience and feeling might be different, or malpractice may have been seen under different forms. Let me know if there's any other point that you encountered and that blocked you from contributing to open source projects!

June 06, 2016

Also, silly titles. Atomic has taken off for real, right now there’s 17 drivers supporting atomic modesetting merged into the DRM subsystem. And still a pile of them each release pending for review&merging. But it’s not just new drivers, there’s also been a steady stream of small improvements over the past year, I think it’s time for an update.

It seems small, but a big improvement made over the past few months is that most driver callbacks used by the helper libraries are now optional. Which means tons and tons of dummy functions and boilerplate code can be removed from drivers, leading to less clutter and easier to understand driver code. Aside: Not all drivers have been decluttered, doing that is a great starter project for contributing a first few patches to the DRM subsystem. Many thanks to Boris Brezillon, Noralf Trønnes and many others for making this happen.

A long standing complaint about the DRM kernel mode setting is that it’s too complicated, especially compared to fbdev when all you have is one dumb framebuffer and nothing else. And yes, in that case there’s really no point in having distinct CRTC, plane and encoder objects, and now finally Noralf Trønnes has volunteered to write a helper library for simple display pipelines to hide all that complexity from drivers. It’s not yet merged but I’m positive it’ll land in 4.8. And it will help to make writing DRM drivers for simple hardware easy and the driver code clutter-free.

Another piece many dumb framebuffer drivers need is support for manually uploading new contents to the screen. Often on these simple panels there’s no point in doing page-flipping of entire buffers since a real render engine is nowhere to be seen. And the panels are often behind a really slow bus, making full screen uploads too expensive. Instead it’s all done by directly drawing into the frontbuffer, and then telling the driver what changed so that it can shovel the new bits over the bus to the panel. DRM has full support for this through a dirty interface and IOCTL, and legacy fbdev also has some support for this. But the fbdev emulation helpers in DRM never wired these two bits together, forcing drivers to all type their own boilerplate. Noralf has fixed this by implementing fbdev deferred I/O support for the DRM fbdev helpers.

A related improvement is generic support to disable the fbdev emulation from Archit Taneja, both through a Kconfig option and a module option. Most distributions still expect fbdev to work for the boot splash, recovery console and emergency logging. But some, like ChromeOS, are entirely legacy-free and don’t need any of this. Thus far every DRM driver had to implement support for fbdev emulation, and for optionally disabling it, itself. Now that’s all done in the library using dummy stub functions in the disabled case, again simplifying driver code.

Somehow most ARM-SoC display drivers start out their system suspend/resume support with a dumb register save/restore. I guess because with simple hardware that works, and regmap provides it almost for free. And then everyone learns the hard way why the atomic modeset helpers have a very strict state transition model: Display hardware gets upset extremely easily when things are done in the wrong order, without the required delays, or without obeying the dependencies between components, and much more. Dumb register restoring does none of that. To fix this Thierry Reding implemented suspend/resume helpers for atomic drivers. Unfortunately not many drivers use this support yet, which is again a nice opportunity to get a kernel patch merged if you have the hardware for such a driver.

Another big gap in the original atomic infrastructure that’s finally getting close is generic support for nonblocking commits. The tricky part there is getting the dependency tracking between commits on different display parts right, and I secretly hoped that with a few examples it would be easier to implement something that’s useful for most drivers. With 17 examples I’ve finally run out of excuses to postpone this, after more than 1 year.

But even more important than making the code prettier for atomic drivers and removing boilerplate with better helpers and libraries is in my opinion explaining it all, and making sure all drivers work the same. Over the past few months there’s been massive sphinx-based documentation toolchain - the above links are already generated using that for a peek at all the new pretty.

The flip side is testing, and on that front Collabora’s effort to  convert all the kernel mode-setting tests in Tomeu Vizoso’s blog on validating changes to KMS drivers.

Finally it’s not all improvements to make it easier to write great drivers, there’s also some new feature work. Lionel Landwerlin added new atomic properties to implement color management support. And there’s work on-going to implement Z-order and blending properties, and lots more, but that’s not yet ready for merging.

June 05, 2016

Quite a lot has happened in xdg-app since last time I blogged about it. Most noticeably, it isn't called xdg-app any more, having been renamed to Flatpak. It is now available in Debian experimental under that name, and the xdg-app package that was briefly there has been removed. I'm currently in the process of updating Flatpak to the latest version 0.6.4.

The privileged part has also spun off into a separate project, Bubblewrap, which recently had its first release (0.1.0). This is intended as a common component with which unprivileged users can start a container in a way that won't let them escalate privileges, like a more flexible version of linux-user-chroot.

Bubblewrap has also been made available in Debian, maintained by Laszlo Boszormenyi (also maintainer of linux-user-chroot). Yesterday I sent a patch to update Laszlo's packaging for 0.1.0. I'm hoping to become a co-maintainer to upload that myself, since I suspect Flatpak and Bubblewrap might need to track each other quite closely. For the moment, Flatpak still uses its own internal copy of Bubblewrap, but I consider that to be a bug and I'd like to be able to fix it soon.

At some point I also want to experiment with using Bubblewrap to sandbox some of the game engines that are packaged in Debian: networked games are a large attack surface, and typically consist of the sort of speed-optimized C or C++ code that is an ideal home for security vulnerabilities. I've already made some progress on jailing game engines with AppArmor, but making sensitive files completely invisible to the game engine seems even better than preventing them from being opened.

Next weekend I'm going to be heading to Toronto for the GTK Hackfest, primarily to talk to GNOME and Flatpak developers about their plans for sandboxing, portals and Flatpak. Hopefully we can make some good progress there: the more I know about the state of software security, the less happy I am with random applications all being equally privileged. Newer display technologies like Wayland and Mir represent an opportunity to plug one of the largest holes in typical application containerization, which is a major step in bringing sandboxes like Flatpak and Snap from proof-of-concept to a practical improvement in security.

Other next steps for Flatpak in Debian:

  • To get into the next stable release (Debian 9), Flatpak needs to move from experimental into unstable and testing. I've taken the first step towards that by uploading libgsystem to unstable. Before Flatpak can follow, OSTree also needs to move.
  • Now that it's in Debian, please report bugs in the usual Debian way or send patches to fix bugs: Flatpak, OSTree, libgsystem.
  • In particular, there are some OSTree bugs tagged help. I'd appreciate contributions to the OSTree packaging from people who are interested in using it to deploy dpkg-based operating systems - I'm primarily looking at it from the Flatpak perspective, so the boot/OS side of it isn't so well tested. Red Hat have rpm-ostree, and I believe Endless do something analogous to build OS images with dpkg, but I haven't had a chance to look into that in detail yet.
  • Co-maintainers for Flatpak, OSTree, libgsystem would also be very welcome.
June 04, 2016

In some projects there’s an awesome process to handle newcomers’ contributions - an autobuilder picks up your pull and runs full CI on it, coding style checkers automatically do basic review, and the functional review load is at least assigned with tooling too.

Then there’s projects where utter chaos and ad-hoc process reign, like the Linux kernel or the community, and it’s much harder for new folks to get their foot into the door. Of course there’s documentation trying to bridge that gap, tools like to figure out whom to ping, but that’s kinda the details. In the end you need someone from the inside to care about what you’re doing and guide you through the maze the first few times.

I’ve been pinged about this a few times recently on IRC, so I figured I’ll type up my recommended best practices.

The crucial bit is that such unstructured developer communities run entirely on mutual trust, and patches get reviewed through a market of favours as in “I review yours and you review my patches”. As a newcomer you have neither. The goal is to build up enough trust and review favours owed to you to get your patches in.

  • Write a few small, simple patches in related areas. When developing your patches you probably had to read some code, documentation, look at testcases, and very likely you spotted a small thing or two that could be improved. Create a patch for each of those and submit them - it’s a great way to test-drive the process and make first contact. Some projects even have tools to hunt for trivial things, e.g. Linux’ And as a maintainer I try hard to give everyone a chance to land their first few simple patches fast and try to review&merge them within days.

  • Talk about what you’re doing on IRC or whatever your project uses for fast&informal communication. Interactive communication is much faster at clearing up initial misunderstandings compared to mailing lists and similar things like pull request discussions. It’s also more scary, but let’s be honest: If the community you want to contribute to isn’t open&friendly, you probably don’t want to stick around anyway. In short, it’s a good community test too.

  • Go the extra mile: If you create a new interface, helper function or whatever try to roll it out everywhere. Even in code you can’t run yourself (like drivers for other hardware or similar things), or when you’re unsure about how to convert a given piece of code to the new way of doing things. If you have no idea what other piece could benefit from your work, ask around - that’s why you started out with talking on IRC after all. The benefit is that by touching lots of other code you automatically also gain lots more reviewers who might be interested in your work and help push it forward.

  • When submitting your work, or resubmitting revised versions, make sure that everyone who showed interest or might be interested stays informed. There’s various support in tooling for this, but with git and mailing-list patch submissions I just add everyone who ever commented on a patch, or who’s a reviewer or maintainer for an area, with a Cc: entry to the patch itself. That way I never forget to add them when resubmitting.

  • Doing all this you will have learned a lot - apply that knowledge by reviewing patches from people who commented on your work in related areas. That will also fill your review favours kitty and help get your patches landed.

  • If the project uses the maintainer model for committing with no commit rights for regular contributors, don’t ask the maintainer for review, they’re generally overloaded. Instead fan out and try to find the subject expert, like you would do in a project where all regular contributors have commit rights.

  • Most important of all: Don’t just sit&wait around scared and hope your patches will magically land. They won’t, not because they’re bad but simply because people are busy, and your patches are forever lost in the chaos after just a week or two. Hence ping after a few days about the status, but don’t ask when the patches will land, but instead what you can do to move them forward. It’s a cheap trick, but it helps to elicit useful responses, at least in my experience.

  • And equally important: Don’t fall into the imposter syndrome gap or end up blaming yourself when things are a bit bumpy at first - it’s just plain hard to figure out undocumented rules of projects that run on chaos and personal relationships. But do try to improve the documentation once you’ve managed all the pitfalls, to make it easier for the next new person. After all, as the most recent new contributor, you’re now the resident expert on this topic!

  • And finally your patches have landed, and in the process you’ve gotten to know a few interesting people. And getting to know new folks is in my opinion really what open source and community is all about. Congrats, and all the best for the next step in your journey!
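The Cc: habit from the list above can be illustrated with a small sketch. It only shows how Cc: entries stored in the patch itself can be recovered when resubmitting; the commit message, the names and the `cc_recipients` helper are all made up for illustration and are not part of any project's tooling.

```python
import re

# Matches "Cc: Name <address>" lines anywhere in a commit message.
CC_TRAILER = re.compile(r"^Cc:[ \t]*(.+?)[ \t]*$", re.MULTILINE | re.IGNORECASE)


def cc_recipients(commit_message):
    """Return the unique Cc: entries of a commit message, in order."""
    seen = []
    for address in CC_TRAILER.findall(commit_message):
        if address not in seen:
            seen.append(address)
    return seen


# Hypothetical commit message; every name and address is invented.
MESSAGE = """drm/example: add a hypothetical helper

Explain what the patch does and why.

Cc: Alice Reviewer <alice@example.org>
Cc: Bob Maintainer <bob@example.org>
Cc: Alice Reviewer <alice@example.org>
Signed-off-by: New Contributor <new@example.org>
"""

print(cc_recipients(MESSAGE))
```

Because the list of people lives in the commit itself, it survives rebases and revised versions without any separate bookkeeping.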

And finally for the flip side, there’s a great write up from Sarah Sharp about doing review, which applies especially to reviewing newcomers’ patches.

May 27, 2016

Last week was the sixth edition of the Paris Monitoring Meetup, where I was invited as a speaker to present and talk about Gnocchi.

There were around 50 people in the room, listening to my presentation of Gnocchi.

The talk went fine and I had a few interesting questions and feedback. One interesting point that keeps coming up when talking about Gnocchi is its OpenStack label, which scares away a lot of people. We definitely need to continue explaining that the project works stand-alone and has no dependency on OpenStack, just a great integration with it.

The slides are available online for those who are interested and may not have been present that day!

The Monitoring-fr organization also interviewed me after the meetup about Gnocchi. The interview is in French, obviously. I talk about Gnocchi, what it does, how it does it and why we started the project a couple of years ago. Enjoy, and let me know what you think!

May 25, 2016

After missing the last few GStreamer hackfests I finally managed to attend this time. It was held in Thessaloniki, Greece’s second largest city. The city is located by the sea side and the entire hackfest and related activities were either directly by the sea or just a couple blocks away.

Collabora was very well represented, with Nicolas, Mathieu, Lubosz also attending.

Nicolas concentrated his efforts on making kmssink and v4l2dec work together to provide zero-copy decoding and display on an Exynos 4 board without a compositor or other form of display manager. Expect a blog post soon explaining how to make this all fit together.

Lubosz showed off his VR kit. He implemented a viewer for planar point clouds acquired from a Kinect. He’s working on a set of GStreamer plugins to play back spherical videos. He’s also promised to blog about all this soon!

Mathieu started the hackfest by investigating the intricacies of Albanian customs, then arrived on the second day in Thessaloniki and hacked on hotdoc, his new fancy documentation generation tool. He’ll also be posting a blog about it, however in the meantime you can read more about it here.

As for myself, I took the opportunity to fix a couple GStreamer bugs that really annoyed me. First, I looked into bug #766422: why glvideomixer and compositor didn’t work with RTSP sources. Then I tried to add a ->set_caps() virtual function to GstAggregator, but it turns out I first needed to delay all serialized events to the output thread to get predictable outcomes and that was trickier than expected. Finally, I got distracted by a bee and decided to start porting the contents of to Markdown and updating it to the GStreamer 1.0 API so we can finally retire the old website.

I’d also like to thank Sebastian and Vivia for organising the hackfest and for making us all feel welcomed!

GStreamer Hackfest Venue

Last year, after DisplayLink released the first version of the supporting tools for their USB3 chipsets, I tried it out on my Dell S2340T.

As I wanted a clean way to test new versions, I took Eric Nothen's RPMs, and updated them along with newer versions, automating the creation of 32- and 64-bit x86 versions.

The RPM contains 3 parts: evdi, a GPLv2 kernel module that creates a virtual display, the LGPL library to access it, and a proprietary service which comes with "firmware" files.

Eric's initial RPMs used the precompiled, proprietary bits, compiling only the kernel module with dkms when needed. I changed this, compiling the library from the upstream repository, using the minimal amount of pre-compiled binaries.

This package supports quite a few OEM devices, but does not work correctly with Wayland, so you'll need to disable Wayland support in /etc/gdm/custom.conf if you want it to work at the login screen, and without having to restart the displaylink.service systemd service after logging in.
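As a rough illustration of the Wayland step, the sketch below sets `WaylandEnable=false` in the `[daemon]` section of a GDM `custom.conf`. It works on an in-memory sample rather than the real file; the sample contents and the `disable_wayland` helper are assumptions made for the example.

```python
import configparser
import io


def disable_wayland(custom_conf_text):
    """Return custom.conf text with WaylandEnable=false set in [daemon]."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the key's case instead of lowercasing it
    parser.read_string(custom_conf_text)
    if not parser.has_section("daemon"):
        parser.add_section("daemon")
    parser.set("daemon", "WaylandEnable", "false")
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


# Sample contents, assumed for illustration; a real file may differ.
SAMPLE = """[daemon]
#WaylandEnable=false

[security]
"""

print(disable_wayland(SAMPLE))
```

Note that `configparser` drops comments when writing, so for a real system it is simpler to uncomment or add the line by hand as root and then restart GDM.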

 Plugged in via DisplayPort and USB (but I can only see one at a time)

The sources for the RPMs are on GitHub. Simply clone and run make in the repository to create 32-bit and 64-bit RPMs. The proprietary parts are redistributable, so if somebody wants to host and maintain those RPMs, I'd be glad to pass this on.
May 19, 2016
Writing the accelerated glReadPixels path for reads to PBOs for Gallium, I wanted to make sure the various possible format conversions are working correctly. They are, but I noticed something strange: when reading from a GL_RGB565 framebuffer to GL_UNSIGNED_BYTE, I was getting tiny differences in the results depending on the code path that was taken. What was going on?

Color values are conceptually floating point values, but most of the time, so-called normalized formats are used to store the values in memory. In fact, many probably think of color values as 8-bit normalized values by default, because of the way many graphics programs present color values and because of the #cccccc color format of HTML.

Normalized formats generalize this well-known notion to an arbitrary number of bits. Given a normalized integer value x in N bits, the corresponding floating point value is x / (2**N - 1) - for example, x / 255 for 8 bits and x / 31 for 5 bits. When converting between normalized formats with different bit depths, the values cannot be mapped perfectly. For example, since 255 and 31 are coprime, the only floating point values representable exactly in both 5- and 8-bit channels are 0.0 and 1.0.
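The coprimality argument can be checked with exact rational arithmetic. The following is a minimal sketch; the helper names are made up for illustration.

```python
from fractions import Fraction


def unorm_to_fraction(x, bits):
    """Exact value of the normalized integer x stored in `bits` bits."""
    return Fraction(x, (1 << bits) - 1)


def shared_values(bits_a, bits_b):
    """Values that are exactly representable in both bit depths."""
    a = {unorm_to_fraction(x, bits_a) for x in range(1 << bits_a)}
    b = {unorm_to_fraction(x, bits_b) for x in range(1 << bits_b)}
    return sorted(a & b)


# 31 and 255 are coprime, so only 0 and 1 survive.
print(shared_values(5, 8))
# 15 divides 255, so every 4-bit value has an exact 8-bit counterpart.
print(len(shared_values(4, 8)))
```

The 4-bit case is included for contrast: there x / 15 equals 17x / 255, so the conversion to 8 bits is lossless.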

So some imprecision is unavoidable, but why was I getting different values in different code paths?

It turns out that the non-PBO path first blits the requested framebuffer region to a staging texture, from where the result is then memcpy()d to the user's buffer. It is the GPU that takes care of the copy from VRAM, the de-tiling of the framebuffer, and the format conversion. The blit uses the normal 3D pipeline with a simple fragment shader that reads from the "framebuffer" (which is really bound as a texture during the blit) and writes to the staging texture (which is bound as the framebuffer).

Normally, fragment shaders operate on 32-bit floating point numbers. However, Radeon hardware allows an optimization where color values are exported from the shader to the CB hardware unit as 16-bit half-precision floating point numbers when the framebuffer does not require the full floating point precision. This is useful because it reduces the bandwidth required for shader exports and allows more shader waves to be in flight simultaneously, because less memory is reserved for the exports.

And it turns out that the value 20 in a 5-bit color channel, when first converted into half-float (fp16) format, becomes 164 in an 8-bit color channel, even though the 8-bit color value that is closest to the floating point number represented by 20 in 5-bit is actually 165. The temporary conversion to fp16 cuts off a bit that would make the difference.

Intrigued, I wrote a little script to see how often this happens. It turns out that 20 in a 5-bit channel and 32 in a 6-bit channel are the only cases where the temporary conversion to fp16 causes the resulting 8-bit value to be off by one. Luckily, people don't usually use GL_RGB565 framebuffers... and as a general rule, taking a value from an N-bit channel, converting it to fp16, and then storing the value again in an N-bit value (of the same bit depth!) will always result in what we started out with, as long as N <= 11 (figuring out why is an exercise left to the reader ;-)) - so the use cases we really care about are fine.
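A check along these lines can be sketched in a few lines of Python. This is not the original script: it assumes round-to-nearest both for the fp16 conversion (via the `struct` module's half-precision format) and for the final float-to-integer step, which may differ from what the hardware does.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def unorm_to_float(x, bits):
    return x / ((1 << bits) - 1)


def float_to_unorm(value, bits):
    # Round to nearest; assumed here, hardware rounding may differ.
    return int(value * ((1 << bits) - 1) + 0.5)


def fp16_mismatches(src_bits, dst_bits=8):
    """Yield (x, direct, via_fp16) where the fp16 detour changes the result."""
    for x in range(1 << src_bits):
        value = unorm_to_float(x, src_bits)
        direct = float_to_unorm(value, dst_bits)
        via_fp16 = float_to_unorm(to_fp16(value), dst_bits)
        if direct != via_fp16:
            yield x, direct, via_fp16


def roundtrips(bits):
    """True if every value survives N-bit -> fp16 -> N-bit unchanged."""
    return all(
        float_to_unorm(to_fp16(unorm_to_float(x, bits)), bits) == x
        for x in range(1 << bits)
    )


for bits in (5, 6):
    print(bits, list(fp16_mismatches(bits)))
print(roundtrips(5), roundtrips(6), roundtrips(8))
```

For the 5-bit value 20, the exact result 20 / 31 * 255 is about 164.52 and rounds to 165, while the fp16 intermediate 0.64501953125 gives about 164.48 and rounds to 164.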
May 13, 2016
Quite some time ago, I was asked for a way to use the AV amplifier (which has a fair bunch of speakers connected to it) in our living-room that didn't require turning on the TV to choose a source.

I decided to try and solve this problem myself, as an exercise rather than a cost saving measure (there are good-quality Bluetooth receivers available for between 15 and 20€).

Introducing Blutella

I found this pot of Nutella in my travels (in Europe, smaller quantities are usually in a jar that looks like a mustard glass, with straight sides) and thought it would be a perfect receptacle for a CHIP, to allow streaming via Bluetooth to the amp. I wanted to make a nice how-to for you, dear reader, but best laid plans...

First, the materials:
  • a CHIP
  • jar of Nutella, and "Burnt umber" acrylic paint
  • micro-USB to USB-A and jack 3.5mm to RCA cables
  • Some white Sugru, for a nice finish around the cables
  • bit of foam, a Stanley knife, a CD marker

That's around 10€ in parts (cables always seem to be expensive), not including our salvaged Nutella jar, and the CHIP itself (9$ + shipping).

You'll start by painting the whole of the jar, on the inside, with the acrylic paint. Allow a couple of days to dry, it'll be quite thick.

So, the plan that went awry. Turns out that the CHIP, with the cables plugged in, doesn't fit inside this 140g jar of Nutella. I also didn't make the holes exactly in the right place. The CHIP is tiny, but not small enough to rotate inside the jar without hitting the side, and the groove to screw the cap also has only one position.

Anyway, I pierced two holes in the lid for the audio jack and the USB charging cable, stuffed the CHIP inside, and forced the lid on so it clipped on the jar's groove.

I had nice photos with foam I cut to hold the CHIP in place, but the finish isn't quite up to my standards. I guess that means I can attempt this again with a bigger jar ;)

The software

After flashing the CHIP with Debian, I logged in, and launched a script which I put together to avoid either long how-tos, or errors when I tried to reproduce the setup after a firmware update and reset.

The script for setting things up is in the CHIP-bluetooth-speaker repository. There are a few bugs due to drivers, and lack of integration, but this blog is the wrong place to track them, so check out the issues list.

Apart from those driver problems, I found the integration between PulseAudio and BlueZ pretty impressive, though I wish there was a way for the speaker to reconnect to the phone I streamed from when turned on again, as Bluetooth speakers and headsets do, removing one step from playing back audio.
May 12, 2016

So after a lot of work to put the policies and pieces in place we are now giving Fedora users access to the OpenH264 plugin.
Dennis Gilmore posted a nice blog entry explaining how you can install OpenH264 in Fedora 24.

That said the plugin is of limited use today for a variety of reasons. The first being that the plugin only supports the Baseline profile. For those not intimately familiar with what H264 profiles are, they are basically a way to define subsets of the codec. So as you might guess from the name Baseline, the Baseline profile is pretty much at the bottom of the H264 profile list and thus any file encoded with another profile of H264 will not work with it. The profile you need for most online videos is the High profile. If you encode a file using OpenH264 though it will work with any decoder that can do Baseline or higher, which is basically every one of them. And there are some things using H264 Baseline, like WebRTC.

But we realize that to make this a truly useful addition for our users we need to improve the profile support in OpenH264 and luckily we have Wim Taymans looking at the issue and he will work with Cisco engineers to widen the range of profiles supported.

Of course just adding H264 doesn’t solve the codec issue, and we are looking at ways to bring even more codecs to Fedora Workstation. Of course there is a limit to what we can do there, but I do think we will have some announcements this year that will bring us a lot closer and long term I am confident that efforts like Alliance for Open Media will provide us a path for a future dominated by royalty free media formats.

But for now thanks to everyone involved from Cisco, Fedora Release Engineering and the Workstation Working Group for helping to make this happen.

May 11, 2016

The systemd.conf 2016 Call for Participation is Now Open!

We’d like to invite presentation and workshop proposals for systemd.conf 2016!

The conference will consist of three parts:

  • One day of workshops, consisting of in-depth (2-3hr) training and learning-by-doing sessions (Sept. 28th)
  • Two days of regular talks (Sept. 29th-30th)
  • One day of hackfest (Oct. 1st)

We are now accepting submissions for the first three days: proposals for workshops, training sessions and regular talks. In particular, we are looking for sessions including, but not limited to, the following topics:

  • Use Cases: systemd in today’s and tomorrow’s devices and applications
  • systemd and containers, in the cloud and on servers
  • systemd in distributions
  • systemd in embedded devices and IoT
  • systemd on the desktop
  • Networking with systemd
  • … and everything else related to systemd

Please submit your proposals by August 1st, 2016. Notification of acceptance will be sent out 1-2 weeks later.

If submitting a workshop proposal please contact the organizers for more details.

To submit a talk, please visit our CfP submission page.

For further information on systemd.conf 2016, please visit our conference web site.

May 10, 2016

The 4.6 release is almost out of the door, it’s time to look at what’s in store for 4.7.

Let’s first look at the epic saga called atomic support. In 4.7 the atomic watermark update support for Ironlake through Broadwell from Matt Roper, Ville Syrjälä and others finally landed. This took about 3 attempts to get merged because there’s lots of small corner cases that caused regressions each time around, but it’s finally done. And it’s an absolutely key piece for atomic support, since Intel hardware does not support atomic updates of the watermark settings for the display fetch fifos. And if those values are wrong, tearing and other ugly things will result. We still need corresponding support for other platforms, but this is a really big step. But that’s not the only atomic work: Maarten Lankhorst made the hardware state checker atomic, and there’s been tons of smaller things all over to move the driver towards the shiny new.

Another big feature on the display side is color management, implemented by Lionel Landwerlin, and then fixes to make it fully atomic from Maarten. Color management aims for more accurate reproduction of a well-defined color space on panels, using a de-gamma table, then a color matrix, and finally a gamma table.
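The three stages can be sketched for a single pixel as follows. This is a toy model and not driver code: the table sizes, the 2.2 exponent, the nearest-entry lookup and the red/blue swap matrix are all assumptions chosen for illustration.

```python
def build_lut(size, exponent):
    """Table mapping i / (size - 1) to (i / (size - 1)) ** exponent."""
    return [(i / (size - 1)) ** exponent for i in range(size)]


def lookup(lut, value):
    """Nearest-entry lookup for a value clamped to [0, 1]."""
    value = min(max(value, 0.0), 1.0)
    return lut[round(value * (len(lut) - 1))]


def apply_pipeline(rgb, degamma_lut, matrix, gamma_lut):
    # 1. De-gamma: encoded values -> linear light.
    linear = [lookup(degamma_lut, c) for c in rgb]
    # 2. Color matrix: mix the linear channels.
    mixed = [sum(m * c for m, c in zip(row, linear)) for row in matrix]
    # 3. Gamma: linear light -> encoded values for the panel.
    return [lookup(gamma_lut, c) for c in mixed]


DEGAMMA = build_lut(256, 2.2)       # assumed size and curve
GAMMA = build_lut(4096, 1 / 2.2)    # assumed size and curve
SWAP_RED_BLUE = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]

print(apply_pipeline([1.0, 0.0, 0.0], DEGAMMA, SWAP_RED_BLUE, GAMMA))
```

The matrix sits between the two tables so that channel mixing happens on linear values rather than on gamma-encoded ones.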

For platform enabling the big thing is support for DSI panels on Broxton from Jani Nikula and Ramalingam C. One fallout from this effort is the cleaned up VBT parsing code, done by Jani. There’s now a clean split between parsing the different VBT versions on all the various platforms, now neatly consolidated, and using that information in different places within the driver. Ville also hooked up upscaling/panel fitting for DSI panels on all platforms.

Looking more at driver internals Ander Conselvan de Oliveira and Ville refactored the entire display PLL code on all platforms, with the goal to reuse it in the DP detection code for upfront link training. This is needed to detect the link configuration in certain situations like USB type C connectors. Shubhangi Shrivastava reworked the DP detection code itself, again to prep for these features. Still on pure display topics Ville fixed lots of underrun issues to appease our CI on lots of platforms. Together with the atomic watermark updates this should shut up one of the largest sources of noise in our test results.

Moving on to power management work the big thing is lots of small fixes for the runtime PM support all over the place from Imre Deak and Ville, with a big focus on the Broxton platform. And while we talk features affecting the entire driver: Imre added fault injection to the driver load paths so that we can start to exercise all that code in an automated way.

Finally looking at the render/GEM side of the driver the short summary is that Tvrtko Ursulin and Chris Wilson worked the code all over the place: Cleaned up and tuned forcewake handling code from Tvrtko, fixes for more userptr corner cases from Chris, a new notifier to handle vmap exhaustion and assorted polish in the related shrinker code, cleaned up and fixed handling of gpu reset corner cases, fixes for context related hard hangs on Sandybridge and Ironlake, large-scale renaming of parameters and structures to realign old code with the newish execlist hardware mode, the list goes on. And finally a rather big piece, and one which causes some trouble, is all the work to speed up the execlist code, with a big focus on reducing interrupt handling overhead. This was done by moving the expensive parts of execlist interrupt handling into a tasklet. Unfortunately that uncovered some bugs in our interrupt handling on Braswell, so Ville jumped in and fixed it all up, plus of course removed some cruft and applied some nice polish.

Other work in the GT area are gpu hang fixes for Skylake GT3 and GT4 configurations from Mika Kuoppala. Mika also provided patches to improve the edram handling on those same chips. Alex Dai and Dave Gordon kept working on making GuC ready for prime time, but it’s not there yet. And Peter Antoine improved the MOCS support to work on all engines.

And of course there’s been tons of smaller improvements, bugfixes, cleanups and refactorings all over the place, as usual.

May 09, 2016

I recently worked on creating an xdg-app bundle for GNOME Videos, aka Totem, so it would be built along with other GNOME applications, every night, and made available via the GNOME xdg-app repositories.

There's some functionality that's not working yet though:
  • No support for optical discs
  • The MPRIS plugin doesn't work as we're missing dbus-python (I'm not sure that the plugin will survive anyway, it's more suited to audio players, don't worry though, it's not going to be removed until we have made changes to the sound system in GNOME)
  • No libva/VDPAU hardware acceleration (which would require plugins, and possibly device access of some sort)
However, I created a bundle that extends the freedesktop runtime, that contains gst-libav. We'll need to figure out a way to distribute it in a way that doesn't cause problems for US hosts.

As we also have a recurring problem in Fedora with rpmfusion being out of date, and I sometimes need a third-party movie player to test things out, I put together an mpv manifest, which is the only MPlayer-like with a .desktop and a GUI when launched without any command-line arguments.

Finally, I put together a RetroArch bundle for research into a future project, which uncovered the lack of joystick/joypad support in the xdg-app sandbox.

Hopefully, those few manifests will be useful to other application developers wanting to distribute their applications themselves. There are some other bundles being worked on, and that can be used as examples, linked to in the Wiki.
May 06, 2016

I recently wrote a bigger project in the D programming language, the appstream-generator (asgen). Since I rarely leave the C/C++/Python realm, and came to like many aspects of D, I thought blogging about my experience could be useful for people considering using D.

Disclaimer: I am not an expert on programming language design, and this is not universally valid criticism of D – just my personal opinion from building one project with it.

Why choose D in the first place?

The previous AppStream generator was written in Python, which wasn’t ideal for the task for multiple reasons, most notably multiprocessing and LMDB not working well together (and in general, multiprocessing being terrible to work with) and the need to reimplement some already existing C code in Python again.

So, I wanted a compiled language which would work well together with the existing C code in libappstream. Using C was an option, but my least favourite one (writing this in C would have been much more cumbersome). I looked at Go and Rust and wrote some small programs performing basic operations that I needed for asgen, to get a feeling for the language. Interfacing C code with Go was relatively hard – since libappstream is a GObject-based C library, I expected to be able to auto-generate Go bindings from the GIR, but there were only a few outdated projects available which did that. Rust on the other hand required the most time in learning it, and since I only briefly looked into it, I still can’t write Rust code without having the coding reference open. I started to implement the same examples in D just for fun, as I didn’t plan to use D (I was aiming at Go back then), but the language looked interesting. The D language had the huge advantage of being very familiar to me as a C/C++ programmer, while also having a rich standard library, which included great stuff like std.concurrency.Generator, std.parallelism, etc. Translating Python code into D was incredibly easy. Additionally, an actively maintained gir-d-generator exists (I created a small fork anyway, to be able to directly link against the libappstream library, instead of dynamically loading it).

What is great about D?

This list is just a huge braindump of things I had on my mind at the time of writing 😉

Interfacing with C

There are multiple things which make D awesome, for example interfacing with C code – and to a limited degree with C++ code – is really easy. Also, working with functions from C in D feels natural. Take these C functions imported into D:


extern (C):

struct _mystruct {}
alias mystruct_p = _mystruct*;

// return types of the last two functions are assumed for illustration
mystruct_p mystruct_create ();
void mystruct_load_file (mystruct_p my, const(char) *filename);
void mystruct_free (mystruct_p my);

You can call them from D code in two ways:

auto test = mystruct_create ();
// treating "test" as function parameter
mystruct_load_file (test, "/tmp/example");
// treating the function as member of "test"
test.mystruct_load_file ("/tmp/example");
test.mystruct_free ();

This allows writing logically sane code, in case the C functions can really be considered member functions of the struct they are acting on. This property of the language is a general concept, so a function which takes a string as first parameter, can also be called like a member function of string.

Writing D bindings to existing C code is also really simple, and can even be automated using tools like dstep. Since D can also easily export C functions, calling D code from C is also possible.

Getting rid of C++ “cruft”

There are many things which are bad in C++, some of which are inherited from C. D kills pretty much all of the stuff I found annoying. Some cool stuff from D is now in C++ as well, which makes this point a bit less strong, but it’s still valid. E.g. getting rid of the #include preprocessor dance by using symbolic import statements makes sense, and there have IMHO been huge improvements over C++ when it comes to metaprogramming.

Incredibly powerful metaprogramming

Getting into detail about that would take way too long, but the metaprogramming abilities of D must be mentioned. You can do pretty much anything at compiletime, for example compiling regular expressions to make them run faster at runtime, or mixing in additional code from string constants. The template system is also very well thought out, and never caused me headaches as much as C++ sometimes manages to do.

Built-in unit-test support

Unittesting with D is really easy: You just add one or more unittest { } blocks to your code, in which you write your tests. When running the tests, the D compiler will collect the unittest blocks and build a test application out of them.

The unittest scope is useful, because you can keep the actual code and the tests close together, and it encourages writing tests and keeping them up-to-date. Additionally, D has built-in support for contract programming, which helps to further reduce bugs by validating input/output.

Safe D

While D gives you the whole power of a low-level system programming language, it also allows you to write safer code and have the compiler check for that, while still being able to use unsafe functions when needed.

Unfortunately, @safe is not the default for functions though.

Separate operators for addition and concatenation

D exclusively uses the + operator for addition, while the ~ operator is used for concatenation. This is likely a personal quirk, but I love it very much that this distinction exists. It’s nice for things like addition of two vectors vs. concatenation of vectors, and makes the whole language much more precise in its meaning.

Optional garbage collector

D has an optional garbage collector. Developing in D without GC is currently a bit cumbersome, but these issues are being addressed. If you can live with a GC though, having it active makes programming much easier.

Built-in documentation generator

This is almost granted for most new languages, but still something I want to mention: Ddoc is a standard tool to generate code documentation for D code, with a defined syntax for describing function parameters, classes, etc. It will even take the contents of a unittest { } scope to generate automatic examples for the usage of a function, which is pretty cool.

Scope blocks

The scope statement allows one to execute a bit of code before the function exits, either always, only when it failed, or only when it was successful. This is incredibly useful when working with C code, where a free statement needs to be issued when the function is exited, or some arbitrary cleanup needs to be performed on error. Yes, we do have smart pointers in C++ and – with some GCC/Clang extensions – a similar feature in C too. But the scopes concept in D is much more powerful. See Scope Guard Statement for details.

Built-in syntax for parallel programming

Working with threads is so much more fun in D compared to C! I recommend taking a look at the parallelism chapter of the “Programming in D” book.

“Pure” functions

D allows marking functions as purely functional, which allows the compiler to do optimizations on them, e.g. cache their return value. See pure-functions.

D is fast!

D matches the speed of C++ on almost all occasions, so you won’t lose performance when writing D code – that is, unless you have the GC run often in a threaded environment.

Very active and friendly community

The D community is very active and friendly – so far I only had good experience, and I basically came into the community asking some tough questions regarding distro-integration and ABI stability of D. The D community is very enthusiastic about pushing D and especially the metaprogramming features of D to its limits, and consists of very knowledgeable people. Most discussion happens at the forums/newsgroups at

What is bad about D?

Half-proprietary reference compiler

This is probably the biggest issue. Not because the proprietary compiler is bad per se, but because of the implications this has for the D ecosystem.

For the reference D compiler, Digital Mars’ D (DMD), only the frontend is distributed under a free license (Boost), while the backend is proprietary. The FLOSS frontend is what the free compilers, LLVM D Compiler (LDC) and GNU D Compiler (GDC), are based on. But since DMD is the reference compiler, most features land there first, and the Phobos standard library and druntime are tuned to work with DMD first.

Since major Linux distributions can’t ship with DMD, and the free compilers GDC and LDC lag behind DMD in terms of language, runtime and standard-library compatibility, this creates a split world of code that compiles with LDC, GDC or DMD, but never with all D compilers due to it relying on features not yet in e.g. GDC’s Phobos.

Especially for Linux distributions, there is no way to say “use this compiler to get the best and latest D compatibility”. Additionally, if people can’t simply apt install latest-d, they are less likely to try the language. This is probably mainly an issue on Linux, but since Linux is the place where web applications are usually written and people are likely to try out new languages, it’s really bad that the proprietary reference compiler is hurting D adoption in that way.

That being said, I want to make clear DMD is a great compiler, which is very fast and builds efficient code. I only criticise the fact that it is the language reference compiler.

UPDATE: To clarify the half-proprietary nature of the compiler, let me quote the D FAQ:

The front end for the dmd D compiler is open source. The back end for dmd is licensed from Symantec, and is not compatible with open-source licenses such as the GPL. Nonetheless, the complete source comes with the compiler, and all development takes place publically on github. Compilers using the DMD front end and the GCC and LLVM open source backends are also available. The runtime library is completely open source using the Boost License 1.0. The gdc and ldc D compilers are completely open sourced.

Phobos (standard library) is deprecating features too quickly

This basically goes hand in hand with the compiler issue mentioned above. Each D compiler ships its own version of Phobos, which it was tested against. For GDC, which I used to compile my code due to LDC having bugs at that time, this means that it is shipping with a very outdated copy of Phobos. Due to the rapid evolution of Phobos, this meant that the documentation of Phobos and the actual code I was working with were not always in sync, leading to many frustrating experiences.

Furthermore, Phobos is sometimes removing deprecated bits about a year after they have been deprecated. Together with the older-Phobos situation, you might find yourself in a place where a feature was dropped, but the cool replacement is not yet available. Or you are unable to import some 3rd-party code because it uses some deprecated-and-removed feature internally. Or you are unable to use other code, because it was developed with a D compiler shipping with a newer Phobos.

This is really annoying, and probably the biggest source of unhappiness I had while working with D – especially the documentation not matching the actual code is a bad experience for someone new to the language.

Incomplete free compilers with varying degrees of maturity

LDC and GDC have bugs, and for someone new to the language it’s not clear which one to choose. Both LDC and GDC have their own issues at times, but they are rapidly getting better, and I only encountered some actual compiler bugs in LDC (GDC worked fine, but with an incredibly out-of-date Phobos). All of these issues have been fixed in the meantime, but this was a frustrating experience. Some clear advice or explanation on which of the free compilers to prefer when you are new to D would be neat.

For GDC in particular, being developed outside of the main GCC project is likely a problem, because distributors need to manually add it to their GCC packaging, instead of having it readily available. I assume this is due to the DRuntime/Phobos not being subjected to the FSF CLA, but I can’t actually say anything substantial about this issue. Debian adds GDC to its GCC packaging, but e.g. Fedora does not do that.

No ABI compatibility

D has a defined ABI – too bad that in reality, the compilers are not interoperable. A binary compiled with GDC can’t call a library compiled with LDC or DMD. GDC actually doesn’t even support building shared libraries yet. For distributions, this is quite terrible, because it means that there must be one default D compiler, without any exception, and that users also need to use that specific compiler to link against distribution-provided D libraries. The different runtimes per compiler complicate that problem further.

The D package manager, dub, does not yet play well with distro packaging

This is an issue that is important to me, since I want my software to be easily packageable by Linux distributions. The issues causing packaging to be hard are reported as dub issue #838 and issue #839, with quite positive feedback so far, so this might soon be solved.

The GC is sometimes an issue

The garbage collector in D is quite dated (according to their own docs) and is currently being reworked. While working with asgen, which is a program creating a large amount of interconnected data structures in a threaded environment, I realized that the GC is significantly slowing down the application when threads are used (it also seems to use UNIX signals SIGUSR1 and SIGUSR2 to stop/resume threads, which I still find odd). Also, the GC performed poorly under memory pressure, which did get asgen killed by the OOM killer on some more memory-constrained machines. Triggering a manual collection run after a large amount of these interconnected data structures wasn’t needed anymore solved this problem for most systems, but it would of course have been better not to need to give the GC any hints. The stop-the-world behavior isn’t a problem for asgen, but it might be for other applications.

These issues are currently being worked on, with a GSoC project laying the foundation for further GC improvements.

“version” is a reserved word

Okay, that is admittedly a very tiny nitpick, but when developing an app which works with packages and versions, it’s slightly annoying. The version keyword is used for conditional compilation, and needing to abbreviate it to ver in all parts of the code sucks a little (e.g. the “Package” interface can’t have a property “version”, but now has “ver” instead).

The ecosystem is not (yet) mature

In general it can be said that the D ecosystem, while existing for almost 9 years, is not yet that mature. There are various quirks you have to deal with when working with D code on Linux. It’s always nothing major, usually you can easily solve these issues and go on, but it’s annoying to have these papercuts.

This is not something which can be resolved by D itself, this point will solve itself as more people start to use D and D support in Linux distributions gets more polished.


I like to work with D, and I consider it to be a great language – the quirks it has in its toolchain are not bad enough to prevent writing great things with it.

At the moment, if I am not writing a shared library or something which uses much existing C++ code, I would prefer D for that task. If a garbage collector is a problem (e.g. for some real-time applications, or when the target architecture can’t run a GC), I would not recommend using D. Rust seems to be the much better choice then.

In any case, D’s flat learning curve (for C/C++ people) paired with the smart choices taken in language design, the powerful metaprogramming, the rich standard library and helpful community make it great to try out and to develop software for scenarios where you would otherwise choose C++ or Java. Quite honestly, I think D could be a great language for tasks where you would usually choose Python, Java or C++, and I am seriously considering replacing quite some Python code with D code. For very low-level stuff, C is IMHO still the better choice.

As always, choosing the right programming language is only 50% technical aspects, and 50% personal taste 😉

UPDATE: To get some idea of D, check out the D tour on the new website

May 04, 2016

A recurring question I encounter is whether uinput or evdev should be the approach to implement some feature the user cares about. This question is unfortunately wrongly framed, as uinput and evdev have no real overlap and work independently of each other. This post outlines what the differences are. Note that "evdev" here refers to the kernel API, not to the X.Org evdev driver.

First, the easy flowchart: do you have to create a new virtual device that has a set of specific capabilities? Use uinput. Do you have to read and handle events from an existing device? Use evdev. Do you have to create a device and read events from that device? You (probably) need two processes, one doing the uinput bit, one doing the evdev bit.

Ok, let's talk about the difference between evdev and uinput. evdev is the default input API that all kernel input device nodes provide. Each device provides one or more /dev/input/eventN nodes that a process can interact with. This usually means checking a few capability bits ("does this device have a left mouse button?") and reading events from the device. The events themselves are in the form of struct input_event, defined in linux/input.h, and consist of an event type (relative, absolute, key, ...) and an event code specific to the type (x axis, left button, etc.). See linux/input-event-codes.h for a list, or linux/input.h in older kernels. Specific to evdev is that events are serialised - framed by events of type EV_SYN and code SYN_REPORT. Anything before a SYN_REPORT should be considered one logical hardware event. For example, if you receive an x and y movement within the same SYN_REPORT frame, the device has moved diagonally.
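
To make the SYN_REPORT framing concrete, here is a minimal Python sketch that groups already-decoded events into frames. The numeric constants are the values from linux/input-event-codes.h; the frames() helper and the sample event list are made up for illustration, and real code is better off letting libevdev do this.

```python
# Event type/code values as defined in linux/input-event-codes.h.
EV_SYN, EV_KEY, EV_REL = 0x00, 0x01, 0x02
SYN_REPORT = 0
REL_X, REL_Y = 0x00, 0x01
BTN_LEFT = 0x110


def frames(events):
    """Group (type, code, value) tuples into SYN_REPORT-delimited frames.

    Hypothetical helper: each yielded list is one logical hardware event.
    Events after the last SYN_REPORT form an incomplete frame and are
    not yielded.
    """
    frame = []
    for ev_type, code, value in events:
        if ev_type == EV_SYN and code == SYN_REPORT:
            yield frame
            frame = []
        else:
            frame.append((ev_type, code, value))


# Made-up sample stream: a diagonal move, then a left button press.
sample = [
    (EV_REL, REL_X, 5),
    (EV_REL, REL_Y, -3),
    (EV_SYN, SYN_REPORT, 0),
    (EV_KEY, BTN_LEFT, 1),
    (EV_SYN, SYN_REPORT, 0),
]

for frame in frames(sample):
    print(frame)
```

The first frame carries both the x and the y movement, which is what makes it a single diagonal motion rather than two separate ones.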

Any event coming from the physical hardware goes into the kernel's input subsystem and is converted to an evdev event that is then available on the event node. That's pretty much it for evdev. It's a fairly simple API but it does have some quirks that are not immediately obvious so I recommend using libevdev whenever you actually need to communicate with a kernel device directly.

uinput is something completely different. uinput is a kernel device driver that provides the /dev/uinput node. A process can open this node, write a bunch of custom commands to it and the kernel then creates a virtual input device. That device, like all others, presents a /dev/input/eventN node. Any event written to the /dev/uinput node will re-appear in that /dev/input/eventN node and a device created through uinput looks pretty much like a physical device to a process. You can detect uinput-created virtual devices, but usually a process doesn't need to care so all the common userspace (libinput, Xorg) doesn't bother. The evemu tool is one of the most commonly used applications using uinput.

Now, there is one thing that may cause confusion: to set up a uinput device you'll have to use the familiar evdev type/code combinations (followed by a couple of uinput-specific ioctls). Events written to uinput also use the struct input_event form, so looking at uinput code one can easily mistake it for evdev code. Nevertheless, the two serve a completely different purpose. As with evdev, I recommend using libevdev to initialise uinput devices. libevdev has a couple of uinput-related functions that make life easier.
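
As a rough illustration of that shared struct input_event form, the Python sketch below packs and unpacks events as raw bytes. The format string assumes the classic layout (a struct timeval of two native longs, then a 16-bit type, a 16-bit code and a signed 32-bit value); treat it as an assumption for your platform. The pack_event() and unpack_event() helpers are hypothetical names, and nothing here touches /dev/uinput.

```python
import struct

# Assumed layout of struct input_event: struct timeval (two native longs),
# __u16 type, __u16 code, __s32 value, using native alignment.
INPUT_EVENT = struct.Struct("llHHi")

# Values from linux/input-event-codes.h.
EV_SYN, EV_REL = 0x00, 0x02
SYN_REPORT, REL_X = 0, 0x00


def pack_event(ev_type, code, value, sec=0, usec=0):
    """Serialise one event into the bytes of a struct input_event."""
    return INPUT_EVENT.pack(sec, usec, ev_type, code, value)


def unpack_event(data):
    """Decode one struct input_event and drop the timestamp."""
    _sec, _usec, ev_type, code, value = INPUT_EVENT.unpack(data)
    return ev_type, code, value


# The same bytes could be read from an event node or written to uinput:
# a relative x movement followed by the SYN_REPORT that closes the frame.
payload = pack_event(EV_REL, REL_X, 5) + pack_event(EV_SYN, SYN_REPORT, 0)

for offset in range(0, len(payload), INPUT_EVENT.size):
    print(unpack_event(payload[offset:offset + INPUT_EVENT.size]))
```

Because the bytes are identical in both directions, only the file being opened tells you whether a piece of code is an evdev reader or a uinput writer.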

Below is a basic illustration of how things work together. The physical devices send their events through the event nodes and libinput is a process that reads those events. evemu talks to the uinput module and creates a virtual device which then too sends events through its event node - for libinput to read.

Since I seem to be not so good at finding time for blog updates recently, this update probably covers a greater timespan than it should, and some of this is already old news ;-)

Already quite some time ago, but in case you didn't already notice: with the mesa 11.1 release, freedreno now supports up to (desktop) gl3.1 on both a3xx and a4xx (in addition to gles3).  Which is high enough to show up on the front page at glxinfo.  (Which, btw, is a useful tool to see exactly which gl/gles extensions are supported by which version of mesa on various different hw.)

A couple months back, I spent a bit of time starting to look at performance.  On master now (so will be in 11.3), we have timestamp and time-elapsed query support for a4xx, and I may expose a few more performance counters (mostly for the benefit of gallium HUD).  I still need to add support for a3xx, but already this is useful to help profile.  In addition, I've cobbled together a simple fdperf cmdline tool:

I also got around to (finally) implementing hw binning support for a4xx, which for *some* games can have a pretty big perf boost:
  • glmark2 'refract' bench (an extreme example): 31fps -> 124fps
  • xonotic (med): 44.4fps -> 50.3fps
  • supertuxkart (new render engine): 15fps -> 19fps
More recently I've started to run the dEQP gles3 tests against freedreno.  Initially the results were not too good, but since then I've fixed a couple thousand test cases.. fortunately it was just a few bugs and a couple of missing workarounds for hw bugs/limitations (depending on how you look at it) which accounted for the bulk of the fails.  Now we are at 98.9% pass (or 99.5% if you don't count the 'skips' against the pass ratio).  These fixes have also helped piglit, where we are now up to 98.3% pass.  These figures are a4xx, but most of the fixes apply to a3xx as well.

I've also made some improvements in ir3 (shader compiler for a3xx and later) so the code it generates is starting to be pretty decent.  The immediate->const lowering that I just pushed helps reduce register pressure in a lot of cases.  We still need support for spilling, but at least now shadertoy (which is some sort of cruel joke against shader compiler writers) isn't a complete horror show:

In other cool news, in case you had not already seen: Rob Herring and John Stultz from linaro have been doing some cool work, with Rob getting android running on an upstream kernel plus mesa running on db410c and qemu (with freedreno and virtgl), and John taking all that, and getting it all running on a nexus7 tablet.  (And more recently, getting wifi working as well.)  I had the opportunity to see this in person when I was at Linaro Connect in March.  It might not seem impressive if you are unfamiliar with the extent to which android device kernels diverge from upstream, but to see an upstream kernel running on an actual device with only ~50 patches is quite a feat:

The UI was actually reasonably fast, despite not yet using overlays to bypass GPU for composition.  But as ongoing work in drm/kms for explicit fencing, and mesa EGL_ANDROID_native_fence_sync land, we should be able to get hw composition working.

Short version

dnf copr enable hadess/emoji
dnf update cairo
dnf install eosrei-emojione-fonts

Long version

A little while ago, I was reading this article, called "Emoji: how do you get from U+1F355 to 🍕?", which said, and I reluctantly quote: "[...] and I don’t know what Linux does, but it’s probably black and white and who cares [...]".

Well. I care. And you probably do as well if your pizza slice above is black and white.

So I set out to check on the status of Behdad Esfahbod (or just "Behdad" as we know him)'s patches to add colour font support to cairo, which he presented at GUADEC in Gothenburg. It adds support for the "bitmap in font" as Android does, and as freetype supports.

It kind of worked, and Matthias Clasen reworked the patches a few times, completing the support. This is probably not the code that will be worked on and will land in cairo, but it's a good enough base for people interested in contributing to use.

After that, we needed something to display using that feature. We ended up using the same font recommended in this article, the Emoji One font.

There's still plenty to be done to support emojis, even after the cairo support is merged. We'd need a way to input emojis (maybe Lalo Martins is listening), and support in a lot of toolkits other than GNOME (Firefox only supports the SVG-in-OTF format, WebKit, Chrome, LibreOffice don't seem to know about colour fonts either).

You can find more information about design interests in GNOME around Emoji on the Wiki.

Update: Behdad's presentation was in Gothenburg, not Strasbourg. You can also see the video on YouTube.

Exactly a year ago, I released the second edition of my book The Hacker's Guide to Python. One more time, it has been a wonderful release and I received a lot of amazing feedback from my readers all over this year.

Since then, the book has been translated into 2 languages: Korean and Chinese. A few thousand copies have been distributed there, and I'm very glad the book has been such a success. I'm looking into getting it translated into more languages – don't hesitate to get in touch with me if you have any interesting connections in your country.

For those who still don't know about this guide, which I first released a couple of years ago, let me sum up by saying it's the Python book that I always wanted to read, never found, and finally wrote. It does not cover the basics of the language, but deals with concrete problems, best practice and some of the language's internals.

It includes content about unit testing, methods, decorators, AST, distribution, documentation, functional programming, scaling, Python 3, etc. All of that made it pretty successful! It comes with 9 awesome interviews that I conducted with some of my fellow experienced Python hackers and developers!

The paperback 3rd edition

The Korean edition!

In that 3rd edition, there are, like in each new edition, a few fixes on code, typos, etc. I guess books need a lot of time to become perfect! I also updated some of the content: things evolved a bit since I last revised the content a year ago. Finally, a new chapter about timestamp handling and timezones has made its appearance too.

If you didn't get the book yet, it's time to go check it out and use the coupon THGTP3LAUNCH to get 20 % off during the next 48 hours!

May 03, 2016
DB migration support has been added in Django 1.7+, superseding South. More specifically, it's possible to automatically generate migration steps when one or more changes in the application models are detected. Definitely a nice feature!

I've written a small generic unit-test that one should be able to drop into the tests directory of any Django project and that checks there are no pending migrations, i.e. whether the models are correctly in sync with the migrations declared in the application. Handy to check nobody has forgotten to git add the migration file or that an innocent-looking change doesn't need a migration step generated. Enjoy!

See the code on djangosnippets or as a github gist!

May 02, 2016

Writing compilers is hard. I don't think that anybody disputes this. However, I've grown frustrated with the lack of compiler performance and robustness in the Monte toolchain. Monte will have a developer preview release in a few weeks and I need to get some stuff concerning compilers out of my head and onto the page.

Monte, the Mess

Right now, Monte is in the doldrums. We have deliberately wound down effort on features and exploration in order to produce a developer preview meant to pique interest and generate awareness of Monte, E, object capabilities, etc. As a result, it's worth taking stock of what we've built so far.

Monte's reference implementation is embodied by the Typhon VM, a JIT written in RPython which implements the runtime and interpreter, but does not do any parsing or expansion. Typhon is satisfactory, performing much like early CPython, and outperforming E-on-Java by a bit. However, its JIT does not provide any speed boost compared to interpretation; our execution model is far too sloppy. Additionally, the JIT is fragile and crash-prone as of writing, and we have it disabled by default.

Our current method of execution is to load Kernel-Monte, compile it to an in-memory bytecode resembling, but slightly different from, Smallcaps; and then provide user-built objects which interoperate with runtime objects and internally run this quasi-Smallcaps bytecode.

Performance is behind CPython by a serious but not insurmountable margin. This is unacceptable. One of the goals of Monte is to, by virtue of being less dynamic than Python, be faster than Python in execution. It's been a truism of compilers that lower expressiveness correlates with greater optimization opportunities for a long time, and clearly we are missing out.

Monte, the Metalanguage

A non-trivial portion of the ideology of Monte, which I did not realize when I first embarked on this journey, is that Monte is really an object calculus of some sort; it hides beneath it a simple core language (Kernel-Monte) that turns out to be a very simple universal computer based on message-passing. Almost everything that makes Monte distinct as a language is built on top of this core, from promises and vats, through modules and quasiliterals, to the entirety of the safe scope. The only gnarl that I have found while working with this core, in honesty, is the semantics of mutable names (var x := f()), which I am still on the fence about, but overall is not a tough complication. (Specifically, I don't know whether mutable slots should be created by a virtual machine instruction, or a primitive _makeVarSlot object.)

Unfortunately, Monte's metalanguage doesn't exactly correspond to anything in the literature. Worse, it somewhat resembles many different things in the literature simultaneously, making the choice of techniques much harder. Computer science, as a discipline, has developed an amazing corpus of compiler techniques, but they do require one to already have committed to a particular semantics, and choosing the semantic and evaluation model for Monte has been a challenge.

I'm only one person. Whatever I end up using has to be comprehensible to me, because otherwise I can't build it. This is unfortunate, as I'm something of a dunce, so I would prefer it if my semantics were simple. Additionally, typing is hard and it would be nice to find something easy to implement.

As a brief aside, I want to emphasize that I am not going to talk about parsing today. Monte's parsing story is very simple and solid, and the canonical expansion from Full-Monte into Kernel-Monte is also relatively well-understood. I want to talk about the stuff that makes compilers hard to scale; I want to talk about optimizations and modelling of semantics.

When I say "semantics of Monte" today, I am referring to the bundle of concepts that represent Monte's evaluation at its lowest level. Everything I'm talking about today starts at Kernel-Monte and (hopefully) only goes downward in expressiveness.

Monte, the Schemer

Strange as it might seem to children like myself, Monte is actually descended from Scheme via E, and this manifests in E's actor-like concurrency and also in the E documentation, which discusses E as a variant of lambda calculus.

What Maps Well

After slot expansion, (set!) bears clear similarity to the behavior of mutable names with VarSlot.

The general design of lexically-scoped closures on objects, and thus the optimization patterns, appear very similar between Monte and Scheme. For example, this commit was directly inspired by this Scheme compiler optimization, posted to Lambda the Ultimate a week prior.

List patterns are present in some Schemes, like Racket, and Monte's list patterns are still present in Kernel-Monte; one of the few explicit type-checked kernel situations. (I think that the only other one is the if expression's test… We don't currently require bindings to be :Binding.)

What Maps Poorly

Exceptions are the obvious problem. (call/cc) provides undelimited continuations, but ejectors are explicitly delimited continuations. Something like Oleg's shift/reset semantics, or Racket exceptions, provide sufficient structure to recover the semantics, but the difference is clear. Oleg only outlines how things work; he does not offer hints on optimization. There is a standard suite of optimizations on continuations when using CPS (continuation-passing style); however, CPS massively complicates implementation.

In particular, when using CPS, all method signatures are complicated by the transformation, which means that tracebacks and other debugging information have to be munged more. We also lose our "no stale stack frames" policy, which informally states that we don't have coroutines nor generators. The CPS transformation generally generates a form of code which should be run as a coroutine, with a live (delimited) continuation passed in from the runtime. This is not impossible, but it is a drastic shift away from what I've studied and built so far.
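To make the "explicitly delimited" point about ejectors concrete, here is a minimal sketch in Python rather than Monte. It assumes exceptions as the underlying mechanism, which is only one possible encoding and is not claimed to be how any Monte runtime does it. The names `escape`, `_Ejecting` and `first_negative` are hypothetical, chosen for illustration.

```python
class _Ejecting(Exception):
    """Carries a value out to the matching escape() call."""

    def __init__(self, token, value):
        super().__init__("ejector fired")
        self.token = token
        self.value = value


def escape(body):
    """Run body(ej); calling ej(value) aborts body and returns value."""
    token = object()   # unique identity for this delimited region
    live = [True]      # flipped to False once the region has exited

    def ej(value=None):
        if not live[0]:
            raise RuntimeError("ejector used outside its delimited extent")
        raise _Ejecting(token, value)

    try:
        return body(ej)
    except _Ejecting as e:
        if e.token is not token:
            raise      # belongs to an enclosing escape(); keep unwinding
        return e.value
    finally:
        live[0] = False


def first_negative(xs):
    """Return the first negative element of xs, or None if there is none."""
    def body(ej):
        for x in xs:
            if x < 0:
                ej(x)
        return None
    return escape(body)
```

Unlike a continuation captured with (call/cc), the `ej` callable here can only jump outward, and only while its `escape` call is still on the stack. That is the structure the delimited formulation gives us for free.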

Since Kernel-Monte is an expression language, a desugaring from def to some sort of intermediate let is highly desirable. However, I have attempted to build this algorithm thrice and been stymied every time. The corner cases are constantly recurring; even the canonical expansion is sufficient to send me into fits with some of its pathological generated code. I've concluded that this transformation, while something I dearly want, requires much more thought.

What Isn't Clear

A-normal form sounds so enticing, but I can't articulate why.

Monte, the Talker

Monte's family tree is firmly rooted in elder Smalltalk-80, and incorporates concepts from its parents Python and E, cousins Ruby and JavaScript, and of course its {famti} Java. Our family is not all speed demons, but many of them are quite skilled at getting great performance out of their semantics, and our distant cousin Lua is often on top of benchmarks. We should make sure that we're not missing any lessons from them.

What Maps Well

We can suppose that every object literal is recurrent in a scope; it has some maker, even if that maker is the top-level eval(). In that sense, the script of an object literal, combined with the closure of the object literal, is just like a description of a class in a class-based language with no inheritance. We can even add on Monte-style extends object composition by permitting subclasses; there is no this or self, but the subclasses could override superclass methods and we could use the standard method cache technique to make that fast.
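As a rough illustration of that correspondence, the sketch below writes the same counter twice in Python: once as a maker function whose closure holds the state, and once as a class with no inheritance. All names (`make_counter`, `Counter`, `call`) are hypothetical, and the dictionary of methods is only a stand-in for a Monte script, not a description of any real implementation.

```python
def make_counter(start):
    """Maker: each call produces a fresh object closing over its own state."""
    count = [start]  # closure state, playing the role of instance fields

    def increment():
        count[0] += 1
        return count[0]

    def value():
        return count[0]

    # The "script": the same set of verbs for every object this maker builds.
    return {"increment": increment, "value": value}


class Counter:
    """The same object described as a class with no inheritance."""

    def __init__(self, start):
        self._count = start

    def increment(self):
        self._count += 1
        return self._count

    def value(self):
        return self._count


def call(obj, verb, *args):
    """Message-passing style dispatch that works on both encodings."""
    if isinstance(obj, dict):
        return obj[verb](*args)
    return getattr(obj, verb)(*args)
```

Both encodings answer the same messages in the same way; the class form simply makes the shared method table explicit, which is what a method cache would exploit.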

We have two more layers of boxing than most other object-based languages, but that doesn't seem to really impede the otherwise-identical "pass-by-object" semantics of Monte with pretty much every other language in the family. Our JIT has definitely proven capable of seeing through FinalSlot and Binding just like it can see through Int.

What Maps Poorly

Our family tree really should have a strict line in the sand for scoping rules, because half of the family doesn't have static lexical scopes. Much of what has gone into making Python fast, especially in the fast implementations like PyPy and ZipPy, doesn't apply at all to Monte because Monte does not have dynamic scopes, and so Monte does not need to recover static scoping information in the JIT.

Our static scope, honestly, works against us somewhat. I can't help but feel that most of the quirky design in Ruby and Python bytecode is due to not being able to erase away lots of scope semantics; contrapositively, Monte is actually kind of hard to compile into lower forms precisely because the static scoping information makes manipulating terms harder. (This might just be me whining; I mean "hard" in the "lots of typing and thinking" sense.)

We really do need a "deslotification" system of some sort. I've thought about this, and come up with a couple conceptual systems that generate type information for slots and erase bindings and slots during compilation when it can prove that they're not needed. Unfortunately, I haven't actually implemented anything; this is another situation where things are hard to implement. Once again, this is relatively untrodden territory, other than the word "deslotification" itself, which comes from this E page. Interestingly, I independently came up with some of the schemes on that page, which suggests that I'm on the right track, but I also learned that this stuff was never really implemented, so maybe it's a dead end.

What Isn't Clear

Bytecode seems like a good thing. It also seems like a serious albatross, once we start picking on-disk bytecode formats. I'm not sure whether the Smallcaps construction really is the best way of modelling the actions that Monte objects take.

Paths Unpaved

There's a couple options available to us that are relatively orthogonal to what I've talked about so far.

LLVM is the elephant in the room. It is an aggressively-optimizing, competent code generator for anything that can be put into a standard low-level-typed SSA form. For Monte, LLVM would enable a compilation strategy much like Objective-C (or, I imagine, like Swift): Arrange all objects into a generated class hierarchy, prove and erase all types to get as many unboxed objects as possible, and then emit LLVM, producing a binary that runs at a modest speed.

The main trick to LLVM that I see is that it appears to dictate a semantic model, but that is only because we are looking at LLVM through its intended lens of compiling C++, from which Objective-C appears the closest relative to Monte. However, there exist LLVM-targeting compilers which emit code that looks quite alien; the example that comes to my mind is GHC's LLVM backend, which generates the same graph-reducing machine as GHC's native backend. There's no reason that we could not pursue a similar path after figuring out our semantics.

Another growing elephant is Truffle. Truffle is much like RPython, providing pretty much the same stuff, but with two important differences. First, Truffle itself is not translated in the same way as RPython; there's a complex interaction between Truffle, Graal, and the JVM which produces the desired JIT effects. RPython's complexity is mostly borne by the compiler author; the fanciest switch on the panel of a translated RPython program is the one that controls the JIT's parameters. Truffle lets you pick between multiple different JITs at runtime! This is partially due to choices made in the JVM ecosystem that make this preferable.

The second difference is worth emphasizing, just because it matters deeply to me, and I figure that it surely must resonate with other folks. Truffle is not a meta-tracing JIT like RPython, but a partially evaluating JIT. This is both a solid theoretical foundation, and a recipe for proven-correct aggressive optimizations. In benchmarks, Truffle does not disappoint. The only downside to Truffle is having to write Java in roughly the normal Java-to-Python proportions instead of RPython.

We could write pretty much anything in Truffle that we could in RPython; thus, sticking with RPython for the accumulated knowledge and experience that we have on that platform makes sense for now. A Truffle port could be done at some point, perhaps by the community.

Monte, the Frustration

I hate via patterns. But only as a compiler author. As a writer of Monte code, via is delightful. When compiling via patterns, though, one has to extract the guts of the pattern, which turns out to be a seriously tricky task in the corner cases. It's the kind of thing that even production-quality Haskell compiler libraries flinch at handling. (As a corollary, if I understood the Haskell bound package, I would be writing one compiler, in Haskell, and nothing else.)

DeepFrozen proof obligations really should be discharged at compile time whenever possible. They aren't really that expensive, but they definitely impose some running overhead. Similarly, a specializer that could discharge or desugar things like (0..!MAXSIZE) would be nice; that single expression was 20% of the runtime of the richards benchmark recently.

To be more blunt, I like partial evaluation. I would love to have a practical partial evaluator for Monte. I also don't feel that Monte's current semantics are very friendly towards partial evaluation. I really do want to lower everything into some simpler form before doing any specialization.

In Conclusion

In conclusion, I need a vacation, I think. If only there were a Python convention coming up…

It's again that time of the year, where we all fly out to a different country to chat about OpenStack and what we'll do during the next 6 months. This time, it was in Austin, TX and we chatted about the new Newton release that will be out in October.

As the Project Team Leader for the Telemetry project, I set up and animated the week for our team. We had 9 discussion slots of 40 minutes assigned, but finally only used 8. We also, somehow, canceled the contributor team meet-up on the last day, as only a few of us developers were there and available.

We took a few notes in our Etherpads, but I think most of them were pretty sparse, as there was nothing really important we talked about. Actually, many topics were already discussed and covered 6 months ago in Tokyo during the previous summit. We just did not have time to implement everything we wanted, so talking over it again would not have been of great help.

Reference architecture

Unfortunately, neither Gordon Chung nor the OpenStack Innovation Center had time to run the tests and benchmarks they wanted to run before the summit. We still discussed their plan to run tests and benchmarks of the whole Telemetry suite (Ceilometer, Gnocchi & Aodh). They should run their tests for 3 weeks, no more, in a few weeks. The window to run tests being narrow, they want to be sure they are prepared, and will reach out to us for help, ideas, and validation.

I've also asked them to provide us, if possible, with some profiling data (e.g. cProfile) so we can have better knowledge of the areas we can optimize.
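For anyone who has not collected such data before, here is a minimal sketch of gathering a profile with the standard library. `busy_work` is a hypothetical stand-in workload and `profile_call` a hypothetical helper; neither is part of any Telemetry project.

```python
import cProfile
import io
import pstats


def busy_work(n):
    """Stand-in workload; replace with the code path under test."""
    return sum(i * i for i in range(n))


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Show the ten entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()


result, report = profile_call(busy_work, 1000)
```

If the raw data is more useful than a text report, `profiler.dump_stats(path)` writes a file that can be shared and loaded later with `pstats.Stats(path)`.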

Gnocchi, next steps

This session was particularly smooth since most people in the room were not up-to-date with Gnocchi 2.1. Some people expressed concern about the InfluxDB driver removal, though they were not aware of the bugs it had, and that Gnocchi was actually performing better – so they may very likely be testing Gnocchi directly instead.

No particular fancy feature was requested, only a few bugs and ideas noted on Launchpad were discussed.

Enhancing Ceilometer polling

This session was not particularly productive, as everything we wanted to discuss was already on the Etherpad from… Tokyo, 6 months ago. It turns out nobody had time to pursue this project, so we'll see what happens. There's definitely some work to do to pursue our goal of splitting the pipeline definition into smaller files.

Aodh roadmap & improvements

First, we decided to definitely kill the combination alarm in the future, in favor of the new composite alarms definition that we like better.

We should switch to OpenStackClient in the future for aodhclient. The OSC team indicated they are willing to provide a way to keep the "aodh" CLI command on its own, which is something that blocked us from moving to OSC.

A bunch of people indicated that they wanted support for alarms CRUD in the Horizon dashboard. They should work together with the Horizon team to complete what has been started in Horizon recently to add Aodh support.

Ceilometer splitting

A year ago, we decided to split Ceilometer and its alarm feature: Aodh was born. We did discuss doing it again 6 months ago, but nothing happened as we already had so much stuff on our plate.

As far as I'm concerned, I think it's now time to split some Ceilometer functionality again, so I'm going to do that this time with the event part. Gordon found a name, and this new project will be named Panko.


We then discussed our documentation. Users present in the room were particularly happy with the Gnocchi policy that we have applied since the beginning: no doc = no merge of your patch. The consensus is to move forward on this policy for all Telemetry projects, especially since it's now clear that the documentation team is not going to help us more. Ildikó, our documentation wizard, will take care of making some links between the official OpenStack documentation and our projects, to avoid content duplication.

For this cycle, my personal plan is to document Aodh up to roughly 80 %, and then force that policy on newly implemented changes.

Events management

The event management part of Ceilometer and API (soon to be split in its own project as stated above) was discussed in this session. Nothing really exciting coming here, as nobody is willing to enhance it for now. Which, again, makes it a great candidate for splitting it out of Ceilometer.


The last session was dedicated to Vitrage, a root cause analysis tool built on OpenStack. The Vitrage team had a few features that they wanted to see in Aodh, so we discussed that at length. Notably, more support for sending notifications on events (alarm creation, deletion…) should be added in this next release.

Also, a new alarm type that would be entirely managed and triggered over HTTP would be very useful for external projects such as Vitrage. We'll try to make that happen during this cycle too.


There were a few interesting talks about our telemetry projects during this summit, among others I highly recommend watching:

All of this should keep me and the team busy for the next cycle. If you have any question about what has been discussed or the future of our projects, don't hesitate to leave a comment or ask us on the OpenStack development mailing list.

April 28, 2016

Two questions were up for voting, 4 seats on the Board of Directors and approval of the amended By-Laws to join SPI.

Congratulations to our reelected and new board members Egbert Eich, Alex Deucher, Keith Packard and Bryce Harrington. Thanks a lot to Lucas Stach for running. And also big thanks to our outgoing board member Matt Dew, who stepped down for personal reasons.

On the bylaw changes and merging with SPI, 61 out of 65 active members voted, with 54 voting yes, 4 no and 3 abstained. Which means we’re well past the 2/3rd quorum for bylaw changes, and everything’s green now to proceed with the plan to join SPI!
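The arithmetic behind that conclusion can be checked in a few lines. This sketch assumes the bar is a two-thirds share of all active members voting yes; the exact rule is set by the by-laws and is an assumption here, and the variable names are illustrative only.

```python
from fractions import Fraction

ACTIVE_MEMBERS = 65
YES, NO, ABSTAIN = 54, 4, 3

turnout = YES + NO + ABSTAIN                # ballots actually cast
yes_share = Fraction(YES, ACTIVE_MEMBERS)   # yes votes as a share of all members
threshold = Fraction(2, 3)                  # assumed bar for by-law changes

# Smallest whole number of yes votes that reaches the threshold
# (ceiling division done with integers only).
needed = -(-ACTIVE_MEMBERS * threshold.numerator // threshold.denominator)

passed = yes_share >= threshold
```

Under that assumption, 44 yes votes would have been enough, so the 54 received leave a comfortable margin.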

April 27, 2016
Shaders can be huge, and tracking down compiler crashes (or asserts) in LLVM with a giant shader isn't a lot of fun. Luckily, LLVM has a tool called Bugpoint. It takes a given piece of LLVM IR and tries a bunch of simplifications such as removing instructions or basic blocks, while checking that a given condition is still satisfied. Make the given condition something like "llc asserts with message X", and you have a very useful tool for reducing test cases. Unfortunately, its documentation isn't the greatest, so let me briefly dump how I have used it in the past.

I have a little script called that looks like this:


#!/bin/bash
# Exit non-zero only while llc still prints the expected message.
if ! llc -mtriple=amdgcn-- -verify-machineinstrs "$@" 2>&1 | grep "error message here"; then
    exit 0
fi
exit 1
When I encounter a compiler assertion, I first make sure to collect the offending shader from our driver using R600_DEBUG=ps,vs,gs,tcs,tes and extract it into a file like bug.ll. (In very rare cases, one may need the preoptir option in R600_DEBUG.) Then I edit with the correct error message and run

bugpoint -compile-custom -compile-command ./ bug.ll
It'll churn for some time and produce a hopefully much smaller .bc file that one can use the usual tools on, such as llc, opt, and llvm-dis.

Occasionally, it can be useful to run the result through opt -instnamer or to simplify it further by hand, but usually, bugpoint provides a good starting point.
April 26, 2016

This is a question raised quite often, the last time in a blogpost by Thomas, so I thought it would be a good idea to give a slightly longer explanation (and also create an article to link to…).

There are basically three reasons for using XML as the default format for metainfo files:

1. XML is easily forward/backward compatible, while YAML is not

This is a matter of extending the AppStream metainfo files with new entries, or adapting existing entries to new needs.

Take this example XML line for defining an icon for an application:

<icon type="cached">foobar.png</icon>

and now the equivalent YAML:

  cached: foobar.png

Now consider we want to add a width and height property to the icons, because we started to allow more than one icon size. Easy for the XML:

<icon type="cached" width="128" height="128">foobar.png</icon>

This line of XML can be read correctly by both old parsers, which will just see the icon as before without reading the size information, and new parsers, which can make use of the additional information if they want. The change is both forward and backward compatible.
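The behaviour described above can be sketched with Python's standard XML parser. The two functions are hypothetical stand-ins for an "old" and a "new" metainfo consumer, not code from any AppStream implementation.

```python
import xml.etree.ElementTree as ET

OLD_STYLE = '<icon type="cached">foobar.png</icon>'
NEW_STYLE = '<icon type="cached" width="128" height="128">foobar.png</icon>'


def old_parser(xml_text):
    """Knows only about the type attribute and the file name."""
    node = ET.fromstring(xml_text)
    return {"type": node.get("type"), "name": node.text}


def new_parser(xml_text):
    """Also reads the optional size attributes when they are present."""
    node = ET.fromstring(xml_text)
    icon = {"type": node.get("type"), "name": node.text}
    for key in ("width", "height"):
        if node.get(key) is not None:
            icon[key] = int(node.get(key))
    return icon
```

The old consumer returns the same result for both lines because it never looks at attributes it does not know, and the new consumer still accepts the old line because the size attributes are optional.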

This looks different with the YAML file. The “foobar.png” is a string-type, and parsers will expect a string as value for the cached key, while we would need a dictionary there to include the additional width/height information:

  cached:
    name: foobar.png
    width: 128
    height: 128

The change shown above will break existing parsers though. Of course, we could add a cached2 key, but that would require people to write two entries, to keep compatibility with older parsers:

  cached: foobar.png
  cached2:
    name: foobar.png
    width: 128
    height: 128

Less than ideal.
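To see why the YAML change breaks existing consumers, here is a small sketch. To stay free of third-party YAML libraries, the parsed documents are written directly as Python literals, which is an assumption about what a YAML loader would hand back. Both functions are hypothetical.

```python
# What a YAML loader would typically produce for the two variants above.
OLD_DOC = {"cached": "foobar.png"}
NEW_DOC = {"cached": {"name": "foobar.png", "width": 128, "height": 128}}


def old_yaml_consumer(doc):
    """An existing consumer that expects the cached value to be a string."""
    value = doc["cached"]
    if not isinstance(value, str):
        raise TypeError(
            "expected a string for 'cached', got %s" % type(value).__name__
        )
    return value


def tolerant_yaml_consumer(doc):
    """A newer consumer has to accept both shapes explicitly."""
    value = doc["cached"]
    if isinstance(value, str):
        return {"name": value}
    return dict(value)
```

The old consumer fails on the new document because the type of the value changed, whereas in the XML case the additional attributes were simply ignored.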

While there are ways to break compatibility in XML documents too, as well as ways to design YAML documents in a way which minimizes the risk of breaking compatibility later, keeping the format future-proof is far easier with XML compared to YAML (and sometimes simply not possible with YAML documents). This makes XML a good choice for this use case, since we cannot easily do transitions with thousands of independent upstream projects, and need to care about backwards compatibility.

2. Translating YAML is not much fun

A property of AppStream metainfo files is that they can be easily translated into multiple languages. For that, tools like intltool and itstool exist to aid with translating XML using Gettext files. This can be done at project build-time, keeping a clean, minimal XML file, or before, storing the translated strings directly in the XML document. Generally, YAML files can be translated too. Take the following example (shamelessly copied from Dolphin):

<summary>File Manager</summary>
<summary xml:lang="bs">Upravitelj datoteka</summary>
<summary xml:lang="cs">Správce souborů</summary>
<summary xml:lang="da">Filhåndtering</summary>

This would become something like this in YAML:

  C: File Manager
  bs: Upravitelj datoteka
  cs: Správce souborů
  da: Filhåndtering

Looks manageable, right? Now, AppStream also covers long descriptions, where individual paragraphs can be translated by the translators. This looks like this in XML:

<description>
  <p>Dolphin is a lightweight file manager. It has been designed with ease of use and simplicity in mind, while still allowing flexibility and customisation. This means that you can do your file management exactly the way you want to do it.</p>
  <p xml:lang="de">Dolphin ist ein schlankes Programm zur Dateiverwaltung. Es wurde mit dem Ziel entwickelt, einfach in der Anwendung, dabei aber auch flexibel und anpassungsfähig zu sein. Sie können daher Ihre Dateiverwaltungsaufgaben genau nach Ihren Bedürfnissen ausführen.</p>
  <p xml:lang="de">Funktionen:</p>
  <p xml:lang="es">Características:</p>
  <ul>
    <li>Navigation (or breadcrumb) bar for URLs, allowing you to quickly navigate through the hierarchy of files and folders.</li>
    <li xml:lang="de">Navigationsleiste für Adressen (auch editierbar), mit der Sie schnell durch die Hierarchie der Dateien und Ordner navigieren können.</li>
    <li xml:lang="es">barra de navegación (o de ruta completa) para URL que permite navegar rápidamente a través de la jerarquía de archivos y carpetas.</li>
    <li>Supports several different kinds of view styles and properties and allows you to configure the view exactly how you want it.</li>
  </ul>
</description>

Now, how would you represent this in YAML? Since we need to preserve the paragraph and enumeration markup somehow, and creating a large chain of YAML dictionaries is not really a sane option, the only choices would be:

  • Embed the HTML markup in the file, and risk non-careful translators breaking the markup by e.g. not closing tags.
  • Use Markdown, and risk people not writing the markup correctly when translating a really long string in Gettext.

In both cases, we would lose the ability to translate individual paragraphs, which also means that as soon as the developer changes the original text in YAML, translators would need to translate the whole bunch again, which is inconvenient.

On top of that, there are no tools to translate YAML properly that I am aware of, so we would need to write those too.

3. Allowing XML and YAML makes a confusing story and adds complexity

While adding YAML as a format would not be too hard, given that we already support it for DEP-11 distro metadata (Debian uses this), it would make the business of creating metainfo files more confusing. At the moment, we have a clear story: Write the XML, store it in /usr/share/metainfo, use standard tools to translate the translatable entries. Adding YAML to the mix adds an additional choice that needs to be supported for eternity and also has the problems mentioned above.

I wanted to add YAML as format for AppStream, and we discussed this at the hackfest as well, but in the end I think it isn’t worth the pain of supporting it for upstream projects (remember, someone needs to maintain the parsers and specification too and keep XML and YAML in sync and updated). Don’t get me wrong, I love YAML, but for translated metadata which needs a guarantee on format stability it is not the ideal choice.

So yeah, XML isn’t fun to write by hand. But for this case, XML is a good choice.

Two weeks ago was the GNOME Software hackfest in London, and I’ve been there! And I just now found the time to blog about it, but better do it late than never 😉 .

Arriving in London and finding the Red Hat offices

After being stuck in trains for the weekend, but fortunately arriving at the airport in time, I finally made it to London with quite some delay due to the slow bus transfer from Stansted Airport. After finding the hotel, the next issue was to get food and a place which accepted my credit card, which was surprisingly hard – in defence of London I must say though, that it was a Sunday, 7 p.m. and my card is somewhat special (in Canada, it managed to crash some card readers, so they needed a hard-reset). While searching for food, I also found, by accident, the Red Hat offices where the hackfest was starting the next day. My hotel, the office and Tower Bridge were really close, which was awesome! I was last in London in 2008, and only for a day, so being that close to the city center was great. The hackfest didn’t leave any time to visit the city much, but by being close to the center, one could hardly avoid the “London experience” 😉 .

Cool people working on great stuff

That’s basically the summary for the hackfest 😉 . It was awesome to meet with Richard Hughes again, since we haven’t seen each other in person since 2011, but work on lots of stuff together. This was especially important, since we managed to solve quite some disagreements we had over stuff – Richard even almost managed to make me give in to adding <kudos/> to the AppStream spec, something which I was pretty against supporting (it didn’t make it yet, but I am no longer against the idea of having that – the remaining issues are solvable).

Meeting Iain Lane again (after FOSDEM) was also very nice, and also seeing other people I’ve only worked with over IRC or bug reports (e.g. William, Kalev, …) was great. Also lots of “new” people were there, like guys from Endless, who build their low-budget computer for developing/emerging countries on top of GNOME and Linux technologies. It’s pretty cool stuff they do, you should check out their website! (they also build their distribution on top of Debian, which is even more awesome, and something I didn’t know before (because many Endless people I met before were associated with GNOME or Fedora, I kind of implicitly assumed the system was based on Fedora 😛 )).

The incarnation of GNOME Software used by Endless looks pretty different from what the normal GNOME user sees, since it’s adjusted for a different audience and input method. But it looks great, and is a good example for how versatile GS already is! And for upstream GNOME, we’ve seen some pretty great mockups done by Endless too – I hope those will make it into production somehow.

Ironically, a “snapstore” was close to the office ;-)

XdgApp and sandboxing of apps was also a big topic, aside from Ubuntu and Endless integration. Fortunately, Alexander Larsson was also there to answer all the sandboxing and XdgApp-questions.

I used the time to follow up on a conversation with Alexander we started at FOSDEM this year, about the Limba vs. XdgApp bundling issue. While we are in-line on the sandboxing approach, the way how software is distributed is implemented differently in Limba and XdgApp, and it is bad to have too many bundling systems around (doesn’t make for a good story where we can just tell developers “ship as this bundling format, and it will be supported everywhere”). Talking with Alex about this was very nice, and I think there is a way out of the too-many-solutions dilemma, at least for Limba and XdgApp – I will blog about that separately soon.

On the Ubuntu side, a lot of bugs and issues were squashed and changes upstreamed to GNOME, and people were generally doing their best to reduce Richard’s bus-factor on the project a little 😉 .

I mainly worked on AppStream issues, finishing up the last pieces of appstream-generator and running it against some sample package sets (and later that week against the whole Debian archive). I also started to implement support for showing AppStream issues in the Debian PTS (this work is not finished yet). I also managed to solve a few bugs in the old DEP-11 generator and prepare another release for Ubuntu.

We also enjoyed some good Japanese food, and some incredibly great, but also suddenly very expensive Indian food (but that’s a different story 😉 ).

The most important thing for me though was to get together with people actually using AppStream metadata in software centers and also more specialized places. This yielded some useful findings, e.g. that localized screenshots are not something weird, but actually a wanted feature of Endless for their curated AppStore. So localized screenshots will be part of the next AppStream spec. Also, there seems to be a general need to ship curation information for software centers somehow (which apps are featured? how are they styled? added special banners for some featured apps, “app of the day” features, etc.). This problem hasn’t been solved, since it’s highly implementation-specific, and AppStream should be distro-agnostic. But it is something we might be able to address in a generic way sooner or later (I need to talk to people at KDE and Elementary about it).

In summary…

It was a great event! Going to conferences and hackfests always makes me feel like it moves projects leaps ahead, even if you do little coding. Sorting out issues together with people you see in person (rather than communicating with them via text messages or video chat), is IMHO always the most productive way to move forward (yeah, unless you do this every week, but I think you get my point 😀 ).

For me, being the only (and youngest ^^) developer at the hackfest who was not employed by any company in the FLOSS business, the hackfest was also motivating to continue to invest spare time into working on these projects.

So, the only thing left to do is a huge shout out of “THANK YOU” to the Ubuntu Community Fund – and therefore the Ubuntu community – for sponsoring me! You rock! Also huge thanks to Canonical for organizing the sponsoring really quickly, so I didn’t get into trouble with paying my flights.

Laney and attente on the Millennium Bridge after we walked the distance between Red Hat and Canonical’s offices.

To worried KDE people: No, I didn’t leave the blue side – I just generally work on cross-desktop stuff, and would like all desktops to work as well as possible 😉

April 24, 2016

In case you missed it: Please vote now on!

April 21, 2016
New DRM drivers are being added to almost each new kernel release, and because the mode setting API is so rich and complex, bugs do slip in that translate to differences in behaviour between drivers.

There have been previous attempts at writing test suites for validating changes and preventing regressions, but they have typically happened downstream, focused on the specific needs of specific products, and been limited to one or at most a few different hardware platforms.

Writing these tests from scratch would have been an enormous amount of work, and gathering previous efforts and joining them wouldn't be worth much because they were written using different test frameworks and in different programming languages. Also, there would be great overlap on the basic tests, and little would remain of the trickier stuff.

Of the existing test suites, the one with most coverage is intel-gpu-tools, used by the Intel graphics team. Though a big part is specific to the i915 driver, what uses the generic APIs is pretty much driver-independent and can be made to work with the other drivers without much effort. Also, Broadcom's Eric Anholt has already started adding tests for IOCTLs specific to the VideoCore-IV driver.

Collabora's Micah Fedke and Daniel Stone had added a facility for selecting DRM device files other than i915's and I improved the abstraction for creating buffers so it works for drivers without GEM buffers. Next I removed a bunch of superfluous dependencies on i915-only stuff and got a useful subset of tests to run on a Radxa Rock2 board (with the Rockchip 3288 SoC). Around half of these patches have been merged already and the other half are awaiting review. Meanwhile, Collabora's Robert Foss is running the ported tests on a Raspberry Pi 2 and has started sending patches to account for its peculiarities.

The next two big chunks of work are abstracting CRC checksums of frames (on drivers other than i915 this could be done with Google's Chamelium or with a board similar to Numato Opsis), and the buffer management API from libdrm that is currently i915-only (bufmgr). Something that will have to be dealt with in the future is abstracting the submittal of specific loads on the GPU as that's currently very much driver-specific.

Additionally, I will be scheduling jobs in our LAVA instance to run these tests on the boards we have in there.

Thanks to Google for sponsoring my time, to the Intel OTC folks for their support and reviews, and to Collabora for sponsoring Robert's, Micah's and Daniel's time.

April 20, 2016

It’s election season in land, and it matters: Besides new board seats we’re also voting on bylaw changes and whether to join SPI or not.

Personally, and as the secretary of the board I’m very much in favour of joining SPI. It will allow us to offload all the boring bits of running a foundation, and those are also all the bits we tend to struggle with. And that would give the board more time to do things that actually matter and help the community. And all that for a really reasonable price - running our own legal entity isn’t free, and not really worth it for our small budget mostly consisting of travel sponsoring and the occasional internship.

Bylaw changes need a qualified supermajority of all members, so every vote counts and not voting essentially means voting no. Hence please vote, and please vote even when you don’t want to join: this is our second attempt and I’d really like to see a clear verdict from our members, one way or the other.


Voting closes by Apr 26 23:59 UTC, but please don’t cut it short, it’s a computer that decides when it’s over …

April 18, 2016

When we set off to do the Fedora Workstation we had some clear ideas about where we wanted to take it, but we also realized there was a lot of cleaning up needed in our stack to make our vision viable. The biggest change we felt was needed was the move towards using application bundles as the primary shipping method for applications, as opposed to the fine-grained package universe that RPMs represent. That said, we also saw the many advantages that packages brought in terms of easing security updates and allowing people to fine-tune their system, so we didn’t want to throw the proverbial baby out with the bathwater. So we started investigating the various technologies out there, as we were of course not alone in thinking about these things. Unfortunately nothing clearly fit the bill of what we wanted, and trying to modify for instance Docker to be a good technology for running desktop applications turned out to be nonviable. So we tasked Alex Larsson with designing and creating what today is known as xdg-app. Our requirements list looked something like this (in random order):

a) Easy bundling of needed libraries
b) A runtime system to reduce the application sizes to something more manageable and ease providing security updates.
c) A system designed to be managed by a desktop session as opposed to managed by sysadmin style tools
d) A security model that would let us gradually move towards sandboxing applications and alleviate the downsides of library bundling
e) An ability to reliably offer online updates of applications
f) Reuse as much of the technology created by others as possible to lower maintenance overhead
g) Design it in a way that makes supporting the applications cross multiple distributions possible and easy
h) Provide a top notch developer story so that this becomes a big positive for application developers and not another pain point.

As we investigated what we needed, other requirements became obvious, like the need to migrate from X to Wayland in order to build a modern composited windowing system that renders using GL, instead of an aging one that has a rendering interface that is no longer used for the most part, and to be able to provide the level of system security we wanted. There was also the need to design new systems like Pinos for handling video, to add new functionality to PulseAudio for dealing with sandboxed applications, and to create libinput to have great input handling in Wayland and also let us share the input subsystem between X and Wayland. And of course we wanted our new packaging system tightly integrated into GNOME Software so that installing, updating and running these applications became smooth and transparent to the user.

This would be a big undertaking and it turned out to be an even bigger effort than we initially thought, as there was a lot of swamp draining needed here, and I am really happy that we have a team capable of pulling these things off. For instance there are not really many people in the Linux community other than Peter Hutterer who could have created libinput, and without libinput there is no way Wayland would be a viable alternative to X (and without libinput neither would Mir, which is a bit ironic for a system that was created because they couldn’t wait for Wayland :).

So going back to the title of this blog entry I feel that we are now finally exiting what I think of as Phase 1, even if we never formally designated it as such, of our development roadmap. For instance we initially hoped to have Wayland feature complete in a Fedora 22 timeframe, but it has taken us extra time to get all the smaller details right, so instead we are now having what we consider the first feature complete Wayland ready with Fedora Workstation 24. And if things go as we expect and hope that should become our default system starting from Fedora Workstation 25. The X Window session will be shipping and available for a long time yet I am sure, but not defaulting to it will mark a clear line in the sand for where the development focus is going forward.

At the same time Xdg-app has started to come together nicely over the last few months, with a lot of projects experimenting with it and bugs and issues being quickly resolved. We are also taking major steps towards bringing xdg-app into the mainstream, with Alex now working on making Xdg-apps OCI compliant, basically meaning that xdg-apps conform to the requirements of the Open Container Initiative. Our expectation is that the next Xdg-app development release will include the needed bits to be OCI compliant. Another nice milestone for Xdg-app was that it recently got added to Debian, meaning that Xdg-apps should be more easily runnable both in Fedora and its downstreams and in Debian and its downstreams.

Another major piece of engineering that is coming to a close is moving major applications such as Firefox, LibreOffice and Eclipse to GTK3. This was needed to get these applications running natively on Wayland, and it also enabled us to make them work nicely for HiDPI. This has also played into how GTK3 has positioned itself, which is as a toolkit dedicated to pushing the Linux desktop forward and helping it quickly adapt to changes in the technology landscape. This is why GTK3 is the toolkit that has been leading the charge on things like HiDPI support and Wayland support. At the same time we know some of the changes in GTK3 have been painful for application developers, especially the changes in how theming works, but with the bulk of the migration to using CSS for theming now being complete we expect that even for applications that use GTK3 in ‘weird ways’ like Firefox, LibreOffice and Eclipse, things should be stable.

Another piece of the puzzle we have wanted to improve is the general Linux hardware story. So since Red Hat joined Khronos last year the Red Hat Graphics team, with Dave Airlie and Adam Jackson leading the charge, has been able to participate in preparing the launch of Vulkan through doing review and we even managed to sneak in a bit of code contribution by Adam Jackson ensuring that there was a vendor neutral Vulkan loader available so that we didn’t end up in a situation where every vendor had to provide their own.

We have also been contributing to the vendor neutral OpenGL dispatcher. The dispatcher is basically a layer that routes an application's OpenGL rendering to the correct implementation, so if you have a system with a discrete GPU you can for instance control which of your two GPUs handles a certain application or game. Adam Jackson has also been collaborating closely with NVidia on getting such a dispatch system complete for OpenGL, so that the age old problem of the Mesa OpenGL library and the proprietary NVidia OpenGL library conflicting can finally be resolved. NVidia has of course handled the part in their driver, and they were also the ones designing this, but Adam has been working on getting the Mesa parts completed. We think that this will make the GPU story on Linux a lot nicer going forward. There are still a few missing pieces before we have the discrete graphics card scenario handled in a smooth way, but we are getting there quickly.

The other thing we have been working on in terms of hardware support, which is still ongoing is improving the Red Hat certification process to cover more desktop hardware. You might ask what that has to do with Fedora Workstation, but it actually is a quite efficient way of improving the quality of Linux support for desktop hardware in general as most of the major vendors submit some of their laptops and desktops to Red Hat for certification. So the more issues the Red Hat certification process can detect the better Linux support on such hardware can become.

Another major effort where we have managed to cover a lot of our goals and targets is GNOME Software. Since the inception of Fedora Workstation we have taken that tool and added functionality like UEFI firmware updates, codecs and font handling, GNOME Extensions handling, system upgrades, Xdg-app handling, user reviews, improved application metadata, improved handling of 3rd party repositories and improved general performance with the move from yum to hawkey. And I think that the Software store has become a crucial part of what you expect of a desktop these days, with things like the Google Play store, the Apple Store and the Microsoft store to some degree defining their respective products more than the heuristics of the shell of Android, iPhone, MacOS or Windows. And I take it as a clear recognition of the great work Richard Hughes has done with GNOME Software that this week there is a special GNOME Software hackfest in London with participants from Fedora/Red Hat, Ubuntu/Canonical, Codethink and Endless.

So I am very happy with where we are at, and I want to say thank you to all long term Fedora users who have been with us through the years, and also say thank you and welcome to all the new Fedora Workstation users who have seen all the cool stuff we have been doing and decided to join us over the last two years; seeing the strong growth in our userbase during this time has been a great source of joy for us and has been a verification that we are on the right track.

I am also very happy about how the culmination of these efforts will be on display with the upcoming Fedora Workstation 24 release! Of course it also means it is time for the Fedora Workstation Working group to start planning what Phase 2 of reaching the Fedora Workstation vision should be :)

When we released graphics tablet support in libinput earlier this year, only tablet tools were supported. So while you could use the pen normally, the buttons, rings and strips on the physical tablet itself (the "pad") weren't detected by libinput and did not work. I have now merged the patches for pad support into libinput.

The reason for the delay was simple: we wanted to get it right [1]. Pads have a couple of properties that tools don't have and we always considered pads to be different to pens and initially focused on a more generic interface (the "buttonset" interface) to accommodate those. After some coding, we have now arrived at a tablet pad-specific interface instead. This post is a high-level overview of the new tablet pad interface and how we intend it to be used.

The basic sign that a pad is present is when a device has the tablet pad capability. Unlike tools, pads don't have proximity events, they are always considered in proximity and it is up to the compositor to handle the focus accordingly. In most cases, this means tying it to the keyboard focus. Usually a pad is available as soon as a tablet is plugged in, but note that the Wacom ExpressKey Remote (EKR) is a separate, wireless device and may be connected after the physical pad. It is up to the compositor to link the EKR with the correct tablet (if there is more than one).

Pads have three sources of events: buttons, rings and strips. Rings and strips are touch-sensitive surfaces and provide absolute values - rings in degrees, strips in normalized [0.0, 1.0] coordinates. Similar to pointer axis sources we provide a source notification. If that source is "finger", then we send a terminating out-of-range event so that the caller can trigger things like kinetic scrolling.
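As a rough illustration of how a caller might consume ring events, the sketch below turns successive absolute ring positions (in degrees) into relative deltas. It does not call the libinput API: `ring_delta`, `RingTracker` and the use of `None` to stand in for the terminating out-of-range event are all assumptions made for this example.

```python
from typing import Optional


def ring_delta(prev_deg: float, cur_deg: float) -> float:
    """Shortest signed angular distance from prev to cur, in (-180, 180]."""
    delta = (cur_deg - prev_deg) % 360.0
    if delta > 180.0:
        delta -= 360.0
    return delta


class RingTracker:
    """Turns absolute ring positions into relative deltas (hypothetical helper)."""

    def __init__(self) -> None:
        self.last: Optional[float] = None

    def handle(self, position: Optional[float]) -> Optional[float]:
        # In this sketch, None stands in for the terminating
        # out-of-range event of a "finger" source.
        if position is None:
            self.last = None
            return None  # the caller could start kinetic scrolling here
        if self.last is None:
            # First touch: nothing to compare against yet.
            self.last = position
            return 0.0
        delta = ring_delta(self.last, position)
        self.last = position
        return delta
```

The wrap-around handling matters because a finger moving from 350 degrees to 10 degrees has travelled 20 degrees forward, not 340 degrees backward.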

Buttons on a pad are ... different. libinput usually re-uses the Linux kernel's linux/input.h event codes [2] for buttons and keys. But for the pad we decided to use plain sequential button numbering, starting at index 0. So rather than a semantic code like BTN_LEFT, you'd simply get a button 0 event. The reasoning behind this is a caveat in the kernel evdev API: event codes have semantic meaning (e.g. BTN_LEFT) but buttons on a tablet pad don't have those meanings. There are some generic event ranges (e.g. BTN_0 through to BTN_9) and the Wacom tablets use those, but once you have more than 10 buttons you leak into other ranges. The ranges are simply too narrow, so we end up with seemingly different buttons even though all buttons are effectively the same. libinput's pad support undoes that split, combines the buttons into a simple sequential range, and leaves any semantic mapping of buttons to the caller. Together with libwacom, which describes the location of the buttons, a caller can get a relatively good idea of what the layout looks like.
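The idea of collapsing scattered kernel codes into one sequential range can be sketched as follows. This is only an illustration, not libinput's implementation: `sequential_buttons` is a hypothetical helper, and it assumes the codes are handed over in the pad's physical button order. The two constants are the values from the kernel's event code header.

```python
# Event code values as defined in linux/input-event-codes.h
BTN_0 = 0x100
BTN_A = 0x130


def sequential_buttons(evdev_codes):
    """Map each kernel event code to a 0-based pad button number.

    Assumption for this sketch: evdev_codes lists the codes in the
    pad's physical button order; libinput's real ordering may differ.
    """
    return {code: index for index, code in enumerate(evdev_codes)}


# A hypothetical 12-button pad: BTN_0..BTN_9 are used up first,
# then the remaining two buttons leak into the BTN_A range.
codes = [BTN_0 + i for i in range(10)] + [BTN_A, BTN_A + 1]
mapping = sequential_buttons(codes)
```

With such a mapping the caller sees buttons 0 through 11 and never needs to know that two of them came from an unrelated code range.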

Mode switching is a commonly expected feature on tablets. One button is designated as the mode switch button and toggles all other buttons between the available modes. On the Intuos Pro series tablets, that button is usually the button inside the ring. Button mapping, and thus mode switching, is however a feature we leave up to the caller: if you're working on a compositor you will have to implement mode switching there.
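A compositor-side mode switch could look roughly like the sketch below. The class name, the action strings and the choice of button 4 as the mode switch button are all made up for illustration; only the behaviour (one button cycles the mode, all other buttons are looked up in the current mode) comes from the description above.

```python
class PadModes:
    """Caller-side mode switching for sequentially numbered pad buttons."""

    def __init__(self, mode_button, bindings_per_mode):
        # bindings_per_mode: one dict per mode, mapping a pad button
        # number to an action name (all hypothetical).
        self.mode_button = mode_button
        self.bindings = bindings_per_mode
        self.mode = 0

    def on_button_press(self, button):
        """Return the action bound to button in the current mode, or None."""
        if button == self.mode_button:
            # Cycle to the next mode; the switch itself triggers no action.
            self.mode = (self.mode + 1) % len(self.bindings)
            return None
        return self.bindings[self.mode].get(button)


pad = PadModes(mode_button=4, bindings_per_mode=[{0: "undo"}, {0: "zoom-in"}])
```

Pressing button 0 yields "undo" in the first mode and "zoom-in" after button 4 has been pressed once.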

Other than that, pad support is relatively simple and straightforward and should not cause any big troubles.

[1] or at least less wrong than in the past
[2] They're actually linux/input-event-codes.h in recent kernels

April 16, 2016

I moved my blog around a bit and it appears that static pages are now in favour, so I switched to that, by way of Hugo. CSS and such needs more tweaking, but it’ll make do for now.

As part of this, RSS feeds and such changed. If you want to subscribe to this (very seldom updated) blog, use

AppStream Generator

Since mid-2015 we were using the dep11-generator in Debian to build AppStream metadata about available software components in the distribution.

Getting rid of dep11-generator

Unfortunately, the old Python-based dep11-generator hit some hard limits pretty soon. For example, using multiprocessing with Python was a pain, since it resulted in some very hard-to-track bugs. Also, the multiprocessing approach (as opposed to multithreading) made it impossible to use the underlying LMDB database properly (it was basically closed and reopened in each forked-off process, since pickling the Python LMDB object caused some really funny bugs, which usually manifested themselves in the application hanging forever without any information on what was going on). In addition, the Python-based generator forced me to maintain two implementations of the AppStream YAML spec, one in C and one in Python, which consumes quite some time. There were also some other issues (e.g. no unit-tests) in the implementation, which made me think about rewriting the generator.

Adventures in Go / Rust / D

Since I didn’t want to write this new piece of software in C (or basically, writing it in C was my last option 😉 ), I explored Go and Rust for this purpose and also did a small prototype in the D programming language, when I was starting to feel really adventurous. And while I never intended to write the new generator in D (I was pretty fixated on Go…), this is what happened. The strong points for D for this particular project were its close relation to C (and ease of using existing C code), its super-flat learning curve for someone who knows and likes C and C++ and its pretty powerful implementations of the concurrent and parallel programming paradigms. That being said, not all is great in D and there are some pretty dark spots too, mainly when it comes to the standard library and compilers. I will dive into my experiences with D in a separate blogpost.

What good to expect from appstream-generator?

So, what can the new appstream-generator do for you? Basically, the same as the old dep11-generator: It will extract metadata from a distribution’s package archive, download and resize screenshots, search for icons and size them properly and generate reports in JSON and HTML of found metadata and issues.

LibAppStream-based parsing, generation of YAML or XML, multi-distro support, …

As opposed to the old generator, the new generator utilizes the metadata parsers and writers of libappstream. This allows it to return the extracted metadata as AppStream YAML (for Debian) or XML (everyone else). It is also written in a distribution-agnostic way, so if someone wants to use it in a different distribution than Debian, this is possible now. It just requires a very small distribution-specific backend to be written; all of the details of the metadata extraction are abstracted away (just two interfaces need to be implemented). While I do not expect anyone except Debian to use this in the near future (most distros have found a solution to generate metadata already), the frontend-backend split is a much cleaner design than what was available in the previous code. It also allows the code to be unit-tested properly, without providing a Debian archive in the testsuite.
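To make the frontend-backend split more concrete, here is a minimal sketch of what "two interfaces" plus a fake backend for unit tests could look like. It is written in Python rather than D, and every class, method and path name in it is invented for illustration; the generator's real interfaces may look quite different.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Package(ABC):
    """One binary package, as seen by the distribution-agnostic frontend."""

    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    def contents(self) -> List[str]: ...

    @abstractmethod
    def read_file(self, path: str) -> bytes: ...


class PackageIndex(ABC):
    """Lists the packages of one suite/section/architecture."""

    @abstractmethod
    def packages_for(self, suite: str, section: str, arch: str) -> List[Package]: ...


class FakePackage(Package):
    """In-memory package, so tests need no real archive."""

    def __init__(self, name: str, files: Dict[str, bytes]) -> None:
        self._name = name
        self._files = files

    def name(self) -> str:
        return self._name

    def contents(self) -> List[str]:
        return sorted(self._files)

    def read_file(self, path: str) -> bytes:
        return self._files[path]


class FakeIndex(PackageIndex):
    def __init__(self, packages: List[Package]) -> None:
        self._packages = packages

    def packages_for(self, suite: str, section: str, arch: str) -> List[Package]:
        return list(self._packages)


def desktop_files(index: PackageIndex, suite: str, section: str, arch: str):
    """Frontend code: it only ever talks to the two interfaces."""
    found = {}
    for pkg in index.packages_for(suite, section, arch):
        hits = [p for p in pkg.contents() if p.endswith(".desktop")]
        if hits:
            found[pkg.name()] = hits
    return found


index = FakeIndex([
    FakePackage("foo", {"usr/bin/foo": b"",
                        "usr/share/applications/foo.desktop": b"[Desktop Entry]\n"}),
    FakePackage("libbar", {"usr/lib/libbar.so": b""}),
])
```

Because `desktop_files` depends only on the abstract interfaces, a test can feed it `FakeIndex` instead of a real package archive.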

Feature Flags, Optipng, …

The new generator also allows certain sets of features to be enabled and disabled in a standardized way. E.g. Ubuntu uses a language-pack system for translations, which Debian doesn’t use. Features like this can be implemented as separate modules in the generator that can be switched off. We currently use this to e.g. allow descriptions from packages to be used as AppStream descriptions, or for running optipng on the generated PNG images and icons.
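A feature-flag mechanism of this kind can be sketched in a few lines. The flag names, defaults and function names below are hypothetical and are not the generator's actual configuration keys; the optimizer is passed in as a plain callable so the example does not depend on optipng being installed.

```python
# Hypothetical flag names and defaults, for illustration only.
DEFAULT_FEATURES = {
    "optipng": True,
    "description-from-package": False,
    "language-packs": False,
}


def resolve_features(overrides):
    """Merge per-distribution overrides into the defaults."""
    unknown = set(overrides) - set(DEFAULT_FEATURES)
    if unknown:
        raise ValueError("unknown feature flags: %s" % ", ".join(sorted(unknown)))
    return {**DEFAULT_FEATURES, **overrides}


def postprocess_png(data, features, optimize):
    """Run the optimizer only when the corresponding flag is enabled."""
    return optimize(data) if features["optipng"] else data
```

A distribution that uses language packs would then pass `{"language-packs": True}` and leave the other flags at their defaults.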

No more Contents file dependency

Another issue the old generator had was that it used the Contents file from the Debian archive to find matching icons for an application. We could never be sure whether the contents in the Contents file actually matched the contents of the package we were currently dealing with. What made things worse is that at Ubuntu, the archive software only updates the Contents file daily (while the generator might run multiple times a day), which has led to software being ignored in the metadata because icons could not yet be found. Even on Debian, with its quickly-updated Contents file, we could immediately see the effects of an out-of-date Contents file when updating it failed once. In the new generator, we now read the contents of each package ourselves and store them in an LMDB database, bypassing the Contents file and removing the whole class of problems resulting from missing or wrong contents data.

It can’t all be good, right?

That is true, there are also some known issues the new generator has:

Large amounts of RAM required

The better speed of the new generator comes at the cost of holding more stuff in RAM. Much more. When processing data from 5 architectures initially on Debian, the amount of required RAM might lie above 4GB, with the OOM killer sometimes being quicker than the garbage collector… That being said, on subsequent runs the amount of required memory is much lower. Still, this is something I am working on to improve.

What are symbolic links?

To be faster, the appstream-generator will read the md5sums file in .deb packages instead of extracting the payload archive and reading its contents. Since the md5sums file does not list symbolic links, symlinks basically don’t exist for the new generator. This is a problem for software symlinking icons or even .desktop files around, like e.g. LibreOffice does.
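The symlink blind spot follows directly from the file format: each line of md5sums holds a checksum and the path of a regular file, so anything that is not a regular file never shows up. The sketch below parses such a listing; `paths_from_md5sums` and the sample package layout are invented for this example and are not the generator's code.

```python
def paths_from_md5sums(text):
    """Return the file paths listed in a .deb's md5sums control file.

    Each non-empty line has the form "<md5 hex digest>  <path>",
    with the path given relative to the filesystem root.
    """
    paths = []
    for line in text.splitlines():
        if not line.strip():
            continue
        _digest, path = line.split(None, 1)
        paths.append("/" + path)
    return paths


# Hypothetical package: it also ships /usr/share/pixmaps/foo.png as a
# symlink, which therefore has no line in md5sums.
sample = (
    "d41d8cd98f00b204e9800998ecf8427e  usr/bin/foo\n"
    "900150983cd24fb0d6963f7d28e17f72  usr/share/applications/foo.desktop\n"
)
listed = paths_from_md5sums(sample)
```

In this example an icon lookup for `/usr/share/pixmaps/foo.png` against `listed` would fail, even though the installed package does provide that path.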

I am still investigating how widespread the use of symlinks for icons and .desktop files is, but it looks like fixing packages (making them move the files rather than symlink them) might be a better approach than investing additional computing power to find symlinks or even switching back to parsing the Contents file. Input on this is welcome!

Deploying asgen

I finished the last pieces of the appstream-generator (together with doing lots of other cool things and talking to great people) at the GNOME Software Hackfest in London last week (detailed blogposts about things that happened there will follow – many thanks once again to the Ubuntu community for sponsoring my attendance!).

Since today, the new generator is running on the Debian infrastructure. If bigger issues are found, we can still roll back to the old code. I decided to deploy this faster, so we can get some good testing done before the Stretch release. Please report any issues you may find!