January 07, 2019

The video

Below you can see glmark2 running as a Wayland client in Weston, on a NanoPC -T4 (so a RK3399 SoC with a Mali T-864 GPU)). It's much smoother than on the video, which is limited to 5FPS by the webcam.

Weston is running with the DRM backend and the GL renderer.

The history behind it

For more than 10 years, at Collabora we have been happily helping our customers to make the most of their hardware by running free software.

One area some of us have specially enjoyed working on has been open drivers for GPUs, which for a long time have been considered the next frontier in the quest to have a full software platform that companies and individuals can understand, improve and fix without having to ask for permission first.

Something that has saddened me a bit has been our reduced ability to help those customers that for one reason or another had chosen a hardware platform with ARM Mali GPUs, as no open driver was available for those.

While our biggest customers were able to get a high level of support from the vendors in order to have the Mali graphics stack well integrated with the rest of their product, the smaller ones had a much harder time in achieving that level of integration, which manifested in reduced performance, increased power consumption and slipped milestones.

That's why we have been following with great interest the several efforts that aimed to come up with an open driver for GPUs in the Mali family, one similar to those already existing for Qualcomm, NVIDIA and Vivante.

At XDC last year we had the chance of meeting the people involved in the latest effort to develop such a driver: Panfrost. And in the months that followed I made some room in my backlog to come up with a plan to give the effort a boost.

At that point, Panfrost was only able to get its bits in the screen by an elaborate hack that involved copying each frame into a X11 SHM buffer, which besides making the setup of the development environment much more cumbersome, invalidated any performance analysis. It also limited testing to demos such as glmark2.

Due to my previous work on Etnaviv I was already familiar with the abstractions in Mesa for setups in which the display of buffers is performed by a device different from the GPU so it was just a matter of seeing how we could get the kernel driver for the Mali GPU to play well with the rest of the stack.

So during the past month or so I have come up with a proper implementation of the winsys abstraction that makes use of ARM's kernel driver. The result is that now developers have a better base on which to work on the rendering side of things.

By properly creating, exporting and importing buffers, we can now run applications on GBM, from demos such as kmscube and glmark2 to compositors such as Weston, but also big applications such as Kodi. We are also supporting zero-copy display of GPU-rendered clients in Weston.

This should make it much easier to work on the rendering side of things, and work on a proper DRM driver in the mainline kernel can proceed in parallel.

For those interested in joining to the effort, Alyssa has graciously taken the time to update the instructions to build and test Panfrost. You can join us at #panfrost in Freenode and can start sending merge requests to Gitlab.

Thanks to Collabora for sponsoring this work and to Alyssa Rosenzweig and Lyude Paul for their previous work and for answering my questions.
December 15, 2018

This time we're digging into HID - Human Interface Devices and more specifically the protocol your mouse, touchpad, joystick, keyboard, etc. use to talk to your computer.

Remember the good old days where you had to install a custom driver for every input device? Remember when PS/2 (the protocol) had to be extended to accommodate for mouse wheels, and then again for five button mice. And you had to select the right protocol to make it work. Yeah, me neither, I tend to suppress those memories because the world is awful enough as it is.

As users we generally like devices to work out of the box. Hardware manufacturers generally like to add bits and bobs because otherwise who would buy that new device when last year's device looks identical. This difference in needs can only be solved by one superhero: Committee-man, with the superpower to survive endless meetings and get RFCs approved.

Many many moons ago, when USB itself was in its infancy, Committee man and his sidekick Caffeine boy got the USB consortium agree on a standard for input devices that is so self-descriptive that operating systems (Win95!) can write one driver that can handle this year's device, and next year's, and so on. No need to install extra drivers, your device will just work out of the box. And so HID was born. This may only be an approximate summary of history.

Originally HID was designed to work over USB. But just like Shrek the technology world is obsessed with layers so these days HID works over different transport layers. HID over USB is what your mouse uses, HID over i2c may be what your touchpad uses. HID works over Bluetooth and it's celebrity-diet version BLE. Somewhere, someone out there is very slowly moving a mouse pointer by sending HID over carrier pigeons just to prove a point. Because there's always that one guy.

HID is incredibly simple in that the static description of the device can just be bytes burnt into the ROM like the Australian sun into unprepared English backpackers. And the event frames are often an identical series of bytes where every bit is filled in by the firmware according to the axis/buttons/etc.

HID is incredibly complicated because parsing it is a stack-based mental overload. Each individual protocol item is simple but getting it right and all into your head is tricky. Luckily, I'm here for you to make this simpler to understand or, failing that, at least more entertaining.

As said above, the purpose of HID is to make devices describe themselves in a generic manner so that you can have a single driver handle any input device. The idea is that the host parses that standard protocol and knows exactly how the device will behave. This has worked out great, we only have around 200 files dealing with vendor- and hardware-specific HID quirks as of v4.20.

HID messages are Reports. And to know what a Report means and how to interpret it, you need a Report Descriptor. That Report Descriptor is static and contains a series of bytes detailing "what" and "where", i.e. what a sequence of bits represents and where to find those bits in the Report. So let's try and parse one of Report Descriptors, let's say for a fictional mouse with a few buttons. How exciting, we're at the forefront of innovation here.

The Report Descriptor consists of a bunch of Items. A parser reads the next Item, processes the information within and moves on. Items are small (1 byte header, 0-4 bytes payload) and generally only apply exactly one tiny little bit of information. You need to accumulate several items to build up enough information to actually know what's happening.

The "what" question of the Report Descriptor is answered with the so-called Usage. This could be something simple like X or Y (0x30 and 0x31) or something more esoteric like System Menu Exit (0x88). A Usage is 16 bits but all Usages are grouped into so-called Usage Pages. A Usage Page too is a 16 bit value and together they form the 32-bit value that tells us what the device can do. Examples:

0001 0031 # Generic Desktop, Y
0001 0088 # Generic Desktop, System Menu Exit
0003 0005 # VR Controls, Head Tracker
0003 0006 # VR Controls, Head Mounted Display
0004 0031 # Keyboard, Keyboard \ and |
Note how the Usage in the last item is the same as the first one, without the Usage Page you will mix things up. It helps if you always think of as the Usage as a 32-bit number. For your kids' bed-time story time, here are the HID Usage Tables from 2004 and the approved HID Usage Table Review Requests of the last decade. Because nothing puts them to sleep quicker than droning on about hex numbers associated with remote control buttons.

To successfully interpret a Report from the device, you need to know which bits have which Usage associated with them. So let's go back to our innovative mouse. We would want a report descriptor with 6 items like this:

Usage Page (Generic Desktop)
Usage (X)
Report Size (16)
Usage Page (Generic Desktop)
Usage (Y)
Report Size (16)
This basically tells the host: X and Y both have 16 bits. So if we get a 4-byte Report from the device, we know two bytes are for X, two for Y.

HID was invented when a time when bits were more expensive than printer ink, so we can't afford to waste any bits (still the case because who would want to spend an extra penny on more ROM). HID makes use of so-called Global items, once those are set their value applies to all following items until changed. Usage Page and Report Size are such Global items, so the above report descriptor is really implemented like this:

Usage Page (Generic Desktop)
Usage (X)
Usage (Y)
Report Count (2)
Report Size (16)
Input (Data,Var,Rel)
The Report Count just tells us that 2 fields of the current Report Size are coming up. We have two usages, two fields, and 16 bits each so we know what to do. The Input item is sort-of the marker for the end of the stack, it basically tells us "process what you've seen so far", together with a few flags. Rel in this case means that the Usages are relative. Oh, and Input means that this is data from device to host. Output would be data from host to device, e.g. to set LEDs on a keyboard. There's also Feature which indicates configurable items.

Buttons on a device are generally just numbered so it'd be monumental 16-bits-at-a-time waste to have HID send Usage (Button1), Usage (Button2), etc. for every button on the device. HID instead provides a Usage Minimum and Usage Maximumto sequentially order them. This looks like this:

Usage Page (Button)
Usage Minimum (1)
Usage Maximum (5)
Report Count (5)
Report Size (1)
Input (Data,Var,Abs)
So we have 5 buttons here and each button has one bit. Note how the buttons are Abs because a button state is not a relative value, it's either down or up. HID is quite intolerant to Schrödinger's thought experiments.

Let's put the two things together and we have an almost-correct Report descriptor:

Usage Page (Button)
Usage Minimum (1)
Usage Maximum (5)
Report Count (5)
Report Size (1)
Input (Data,Var,Abs)

Report Size (3)
Report Count (1)
Input (Cnst,Arr,Abs)

Usage Page (Generic Desktop)
Usage (X)
Usage (Y)
Report Count (2)
Report Size (16)
Input (Data,Var,Rel)
New here is Cnst. This signals that the bits have a constant value, thus don't need a Usage and basically don't matter (haha. yeah, right. in theory). Linux does indeed ignore those. Cnst is used for padding to align on byte boundaries - 5 bits for buttons plus 3 bits padding make 8 bits. Which makes one byte as everyone agrees except for granddad over there in the corner. I don't know how he got in.

Were we to get a 5-byte Report from the device, we'd parse it approximately like this:

button_state = byte[0] & 0x1f
x = bytes[1] | (byte[2] << 8)
y = bytes[3] | (byte[4] << 8)
Hooray, we're almost ready. Except not. We may need more info to correctly interpret the data within those reports.

The Logical Minimum and Logical Maximum specify the value range of the actual data. We need this to tell us whether the data is signed and what the allowable range is. Together with the Physical Minimumand the Physical Maximum they specify what the values really mean. In the simple case:

Usage Page (Generic Desktop)
Usage (X)
Usage (Y)
Report Count (2)
Report Size (16)
Logical Minimum (-32767)
Logical Maximum (32767)
Input (Data,Var,Rel)
This just means our x/y data is signed. Easy. But consider this combination:

Logical Minimum (0)
Logical Maximum (1)
Physical Minimum (1)
Physical Maximum (12)
This means that if the bit is 0, the effective value is 1. If the bit is 1, the effective value is 12.

Note that the above is one report only. Devices may have multiple Reports, indicated by the Report ID. So our Report Descriptor may look like this:

Report ID (01)
Usage Page (Button)
Usage Minimum (1)
Usage Maximum (5)
Report Count (5)
Report Size (1)
Input (Data,Var,Abs)
Report Size (3)
Report Count (1)
Input (Cnst,Arr,Abs)

Report ID (02)
Usage Page (Generic Desktop)
Usage (X)
Usage (Y)
Report Count (2)
Report Size (16)
Input (Data,Var,Rel)
If we were to get a Report now, we need to check byte 0 for the Report ID so we know what this is. i.e. our single-use hard-coded parser would look like this:

if byte[0] == 0x01:
button_state = byte[1] & 0x1f
else if byte[0] == 0x02:
x = bytes[2] | (byte[3] << 8)
y = bytes[4] | (byte[5] << 8)
A device may use multiple Reports if the hardware doesn't gather all data within the same hardware bits. Now, you may ask: if I get fifteen reports, how should I know what belongs together? Good question, and lucky for you the HID designers are miles ahead of you. Report IDs are grouped into Collections.

Collections can have multiple types. An Application Collectiondescribes a set of inputs that make sense as a whole. Usually, every Report Descriptor must define at least one Application Collection but you may have two or more. For example, a a keyboard with integrated trackpoint should and/or would use two. This is how the kernel knows it needs to create two separate event nodes for the device. Application Collections have a few reserved Usages that indicate to the host what type of device this is. These are e.g. Mouse, Joystick, Consumer Control. If you ever wondered why you have a device named like "Logitech G500s Laser Gaming Mouse Consumer Control" this is the kernel simply appending the Application Collection's Usage to the device name.

A Physical Collection indicates that the data is collected at one physical point though what a point is is a bit blurry. Theoretical physicists will disagree but a point can be "a mouse". So it's quite common for all reports on a mouse to be wrapped in one Physical Collections. If you have a device with two sets of sensors, you'd have two collections to illustrate which ones go together. Physical Collections also have reserved Usages like Pointer or Head Tracker.

Finally, a Logical Collection just indicates that some bits of data belong together, whatever that means. The HID spec uses the example of buffer length field and buffer data but it's also common for all inputs from a mouse to be grouped together. A quick check of my mice here shows that Logitech doesn't wrap the data into a Logical Collection but Microsoft's firmware does. Because where would we be if we all did the same thing...

Anyway. Now that we know about collections, let's look at a whole report descriptor as seen in the wild:

Usage Page (Generic Desktop)
Usage (Mouse)
Collection (Application)
Usage Page (Generic Desktop)
Usage (Mouse)
Collection (Logical)
Report ID (26)
Usage (Pointer)
Collection (Physical)
Usage Page (Button)
Usage Minimum (1)
Usage Maximum (5)
Report Count (5)
Report Size (1)
Logical Minimum (0)
Logical Maximum (1)
Input (Data,Var,Abs)
Report Size (3)
Report Count (1)
Input (Cnst,Arr,Abs)
Usage Page (Generic Desktop)
Usage (X)
Usage (Y)
Report Count (2)
Report Size (16)
Logical Minimum (-32767)
Logical Maximum (32767)
Input (Data,Var,Rel)
Usage (Wheel)
Physical Minimum (0)
Physical Maximum (0)
Report Count (1)
Report Size (16)
Logical Minimum (-32767)
Logical Maximum (32767)
Input (Data,Var,Rel)
End Collection
End Collection
End Collection
We have one Application Collection (Generic Desktop, Mouse) that contains one Logical Collection (Generic Desktop, Mouse). That contains one Physical Collection (Generic Desktop, Pointer). Our actual Report (and we have only one but it has the decimal ID 26) has 5 buttons, two 16-bit axes (x and y) and finally another 16 bit axis for the Wheel. This device will thus send 8-byte reports and our parser will do:

if byte[0] != 0x1a: # it's decimal in the above descriptor
error, should be 26
button_state = byte[1] & 0x1f
x = byte[2] | (byte[3] << 8)
y = byte[4] | (byte[5] << 8)
wheel = byte[6] | (byte[7] << 8)
That's it. Now, obviously, you can't write a parser for every HID descriptor out there so your actual parsing code needs to be generic. The Linux kernel does exactly that and so does everything else that needs to parse HID. There's a huge variety in devices out there, all with HID descriptors that may or may not be correct. As with so much in life, correct HID implementations are often defined by "whatever Windows accepts" so if you like playing catch, Linux development is for you.

Oh, in case you just got a bit too optimistic about the state of the world: HID allows for vendor-defined usages. Which does exactly what you'd think it does, it hides vendor-specific protocol inside what should be a generic protocol. There are devices with hidden report IDs that you can only unlock by sending the right magic sequence to the report and/or by defeating the boss on Level 4. Usually those devices present themselves as basic/normal devices over HID but if you know the magic sequence you get to use *gasp* all buttons. Or access the device-specific configuration features. Logitech's HID++ is just one example here but at least that's one where we have most of the specs available.

The above describes how to parse the HID report descriptor and interpret the reports. But what happens once you have a HID report correctly parsed? In the case of the Linux kernel, once the report descriptor is parsed evdev nodes are created (one per Application Collection, more or less). As the Reports come in, they are mapped to evdev codes and the data appears on the evdev node. That's where userspace like libinput can pick it up. That bit is actually quite simple (mostly anyway).

The above output was generated with the tools from the hid-tools repository. Go forth and hid-record.

December 14, 2018
libfprint, the fingerprint reader driver library, is nearing a 1.0 release.

Since the last time I reported on the status of the library, we've made some headway modernising the library, using a variety of different tools. Let's go through them and how they were used.


When libfprint was in its infancy, Daniel Drake found the NBIS fingerprint processing library matched what was required to provide fingerprint matching algorithms, and imported it in libfprint. Since then, the code in this copy-paste library in libfprint stayed the same. When updating it to the latest available version (from 2015 rather than 2007), as well as splitting off a patch to make it easier to update the library again in the future, I used Callcatcher to cull the unused functions.

Callcatcher is not a "production-level" tool (too many false positives, lack of support for many common architectures, etc.), but coupled with manual checking, it allowed us to greatly reduce the number of functions in our copy, so they weren't reported when using other source code quality checking tools.

LLVM's scan-build

This is a particularly easy one to use as its use is integrated into meson, and available through ninja scan-build. The output of the tool, whether on stderr, or on the HTML pages, is pretty similar to Coverity's, but the tool is free, and easily integrated into a CI (once you've fixed all the bugs, obviously). We found plenty of possible memory leaks and unintialised variables using this, with more flexibility than using Coverity's web interface, and avoiding going through hoops when using its "source code check as a service" model.

cflow and callgraph

LLVM has another tool, called callgraph. It's not yet integrated into meson, which was a bit of a problem to get some output out of it. But combined with cflow, we used it to find where certain functions were called, trying to find the origin of some variables (whether they were internal or device-provided for example), which helped with implementing additional guards and assertions in some parts of the library, in particular inside the NBIS sub-directory.

0.99.0 is out

We're not yet completely done with the first pass at modernising libfprint and its ecosystem, but we released an early Yule present with version 0.99.0. It will be integrated into Fedora after the holidays if the early testing goes according to plan.

We also expect a great deal from our internal driver API reference. If you have a fingerprint reader that's unsupported, contact your laptop manufacturer about them providing a Linux driver for it and point them at this documentation.

A number of laptop vendors are already asking their OEM manufacturers to provide drivers to be merged upstream, but a little nudge probably won't hurt.

Happy holidays to you all, and see you for some more interesting features in the new year.
December 12, 2018

Disclaimer: this is pending for v4.21 and thus not yet in any kernel release.

Most wheel mice have a physical feature to stop the wheel from spinning freely. That feature is called detents, notches, wheel clicks, stops, or something like that. On your average mouse that is 24 wheel clicks per full rotation, resulting in the wheel rotating by 15 degrees before its motion is arrested. On some other mice that angle is 18 degrees, so you get 20 clicks per full rotation.

Of course, the world wouldn't be complete without fancy hardware features. Over the last 10 or so years devices have added free-wheeling scroll wheels or scroll wheels without distinct stops. In many cases wheel behaviour can be configured on the device, e.g. with Logitech's HID++ protocol. A few weeks back, Harry Cutts from the chromium team sent patches to enable Logitech high-resolution wheel scrolling in the kernel. Succinctly, these patches added another axis next to the existing REL_WHEEL named REL_WHEEL_HI_RES. Where available, the latter axis would provide finer-grained scroll information than the click-by-click REL_WHEEL. At the same time I accidentally stumbled across the documentation for the HID Resolution Multiplier Feature. A few patch revisions later and we now have everything queued up for v4.21. Below is a summary of the new behaviour.

The kernel will continue to provide REL_WHEEL as axis for "wheel clicks", just as before. This axis provides the logical wheel clicks, (almost) nothing changes here. In addition, a REL_WHEEL_HI_RES axis is available which allows for finer-grained resolution. On this axis, the magic value 120 represents one logical traditional wheel click but a device may send a fraction of 120 for a smaller motion. Userspace can either accumulate the values until it hits a full 120 for one wheel click or it can scroll by a few pixels on each event for a smoother experience. The same principle is applied to REL_HWHEEL and REL_HWHEEL_HI_RES for horizontal scroll wheels (which these days is just tilting the wheel). The REL_WHEEL axis is now emulated by the kernel and simply sent out whenever we have accumulated 120.

Important to note: REL_WHEEL and REL_HWHEEL are now legacy axes and should be ignored by code handling the respective high-resolution version.

The magic value of 120 is taken directly from Windows. That value was chosen because it has a good number of integer factors, so dividing 120 by whatever multiplier the mouse uses gives you a integer fraction of 120. And because HW manufacturers want it to work on Windows, we can rely on them doing it right, provided we use the same approach.

There are two implementations that matter. Harry's patches enable the high-resolution scrolling on Logitech mice which seem to mostly have a multiplier of 8 (i.e. REL_WHEEL_HI_RES will send eight events with a value of 15 before REL_WHEEL sends 1 click). There are some interesting side-effects with e.g. the MX Anywhere 2S. In high-resolution mode with a multiplier of 8, a single wheel movement does not always give us 8 events, the firmware does its own magic here. So we have some emulation code in place with the goal of making the REL_WHEEL event happen on the mid-point of a wheel click motion. The exact point can shift a bit when the device sends 7 events instead of 8 so we have a few extra bits in place to reset after timeouts and direction changes to make sure the wheel behaviour is as consistent as possible.

The second implementation is for the generic HID protocol. This was all added for Windows Vista, so we're only about a decade behind here. Microsoft got the Resolution Multiplier feature into the official HID documentation (possibly in the hope that other HW manufacturers implement it which afaict didn't happen). This feature effectively provides a fixed value multiplier that the device applies in hardware when enabled. It's basically the same as the Logitech one except it's set through a HID feature instead of a vendor-specific protocol. On the devices tested so far (all Microsoft mice because no-one else seems to implement this) the multipliers vary a bit, ranging from 4 to 12. And the exact behaviour varies too. One mouse behaves correctly (Microsoft Comfort Optical Mouse 3000) and sends more events than before. Other mice just send the multiplied value instead of the normal value, so nothing really changes. And at least one mouse (Microsoft Sculpt Ergonomic) sends the tilt-wheel values more frequently and with a higher value. So instead of one event with value 1 every X ms, we now get an event with value 3 every X/4 ms. The mice tested do not drop events like the Logitech mice do, so we don't need fancy emulation code here. Either way, we map this into the 120 range correctly now, so userspace gets to benefit.

As mentioned above, the Resolution Multiplier HID feature was introduced for Windows Vista which is... not the most recent release. I have a strong suspicion that Microsoft dumped this feature as well, the most recent set of mice I have access to don't provide the feature anymore (they have vendor-private protocols that we don't know about instead). So the takeaway for all this is: if you have a Logitech mouse, you'll get higher-resolution scrolling on v4.21. If you have a Microsoft mouse a few years old, you may get high-resolution wheel scrolling if the device supports it. Any other vendor or a new Microsoft mouse, you don't get it.

Coincidentally, if you know anyone at Microsoft who can provide me with the specs for their custom protocol, I'd appreciate it. We'd love to have support for it both in libratbag and in the kernel. Or any other vendor, come to think of it.

December 10, 2018

For V3D last week, I resurrected my old GLES 3.1 series with SSBO and shader imgae support, rebuilt it for V3D 4.1 (shader images no longer need manual tiling), and wrote indirect draw support and started on compute shaders. As of this weekend, dEQP-GLES31 is passing 1387/1567 of tests with “compute” in the name on the simulator. I have a fix needed for barrier(), then it’s time to build the kernel interface. In the process, I ended up fixing several job flushing bugs, plugging memory leaks, improving our shader disassembly debug dumps, and reducing memory consumption and CPU overhead.

The TFU series is now completely merged, and the kernel cache management series from last week is now also merged. I also fixed a bug in the core GPU scheduler that would cause use-after-frees when tracing.

For vc4, Boris completed and merged the H/V flipping work after fixing display of flipped/offset SAND images. More progress has been made on chamelium testing, so we’re nearing having automated regression tests for parts of our display stack. My PM driver has the acks we needed, but my co-maintainer has tacked a new request on in v3 so it’ll be delayed.

December 05, 2018

Khronos Group has published two new extensions for Vulkan: VK_KHR_shader_float16_int8 and VK_KHR_shader_float_controls. In this post, I will talk about VK_KHR_shader_float_controls, which is the extension I have been implementing on Anvil driver, the open-source Intel Vulkan driver, as part of my job at Igalia. For information about VK_KHR_shader_float16_int8 and its implementation in Mesa, you can read Iago’s blogpost.

The Vulkan Working Group has defined a new extension VK_KHR_shader_float_controls, which allows applications to query and override the implementation’s default floating point behavior for rounding modes, denormals, signed zero and infinity. From the Vulkan application developer perspective, VK_shader_float_controls defines a new structure called VkPhysicalDeviceFloatControlsPropertiesKHR where the drivers expose the supported capabilities such as the rounding modes for each floating point data type, how the denormals are expected to be handled by the hardware (either flush to zero or preserve their bits) and if the value is a signed zero, infinity and NaN, whether it will preserve their bits.

typedef struct VkPhysicalDeviceFloatControlsPropertiesKHR {
    VkStructureType    sType;
    void*              pNext;
    VkBool32           separateDenormSettings;
    VkBool32           separateRoundingModeSettings;
    VkBool32           shaderSignedZeroInfNanPreserveFloat16;
    VkBool32           shaderSignedZeroInfNanPreserveFloat32;
    VkBool32           shaderSignedZeroInfNanPreserveFloat64;
    VkBool32           shaderDenormPreserveFloat16;
    VkBool32           shaderDenormPreserveFloat32;
    VkBool32           shaderDenormPreserveFloat64;
    VkBool32           shaderDenormFlushToZeroFloat16;
    VkBool32           shaderDenormFlushToZeroFloat32;
    VkBool32           shaderDenormFlushToZeroFloat64;
    VkBool32           shaderRoundingModeRTEFloat16;
    VkBool32           shaderRoundingModeRTEFloat32;
    VkBool32           shaderRoundingModeRTEFloat64;
    VkBool32           shaderRoundingModeRTZFloat16;
    VkBool32           shaderRoundingModeRTZFloat32;
    VkBool32           shaderRoundingModeRTZFloat64;
} VkPhysicalDeviceFloatControlsPropertiesKHR;

This structure will be filled by the driver when calling vkGetPhysicalDeviceProperties2(), with a pointer to such structure as one of the pNext pointers of VkPhysicalDeviceProperties2 structure. With that, we know if the driver will support the SPIR-V capabilities we want to use in our shaders, if separate*Settings are true, remember to check the value of the property for the floating point bit-size types you are planning to work with.

The required bits to enable such capabilities in a SPIR-V shader are the following:

  1. Enable the extension: OpExtension "SPV_KHR_float_controls"
  2. Enable the desired capability. For example: OpCapability DenormFlushToZero
  3. Specify where to apply it. For example, we would like to flush to zero all fp64 denormalss in the %main function of a shader: OpExecutionMode %main DenormFlushToZero 64. If we want to apply different modes, we would repeat that line with the needed ones.
  4. Profit!

I implemented the support of this extensions for the Anvil’s supported GPUs (Broadwell, Skylake, Kabylake and newer), although we don’t support all the capabilities. For example on Broadwell, float16 denormals are not supported, and the support for flushing to zero the float16 denormals is not supported for all the instructions in the rest of generations.

If you are interested, the patches are now under review :-) As there are not real world code using this feature yet, please fill any bug you find about this in our bugzilla.

December 04, 2018

The last time I talked about my driver work was to announce the implementation of the shaderInt16 feature for the Anvil Vulkan driver back in May, and since then I have been working on VK_KHR_shader_float16_int8, a new Vulkan extension recently announced by the Khronos group, for which I have just posted initial patches in mesa-dev supporting Broadwell and later Intel platforms.

As you probably guessed by the name, this extension enables Vulkan to consume SPIR-V shaders that use of Float16 and Int8 types in arithmetic operations, extending the functionality included with VK_KHR_16bit_storage and VK_KHR_8bit_storage, which was limited to load/store operations. In theory, applications that do not need the range and precision of regular 32-bit floating point and integers, can use these new types to improve performance by increasing ALU throughput and reducing register pressure, which in some platforms can also lead to improved parallelism.

In the case of the Intel platforms initial testing done by Intel suggests that better ALU throughput is expected when issuing half-float instructions. Lower register pressure is also expected, at least for SIMD16 fragment and compute shaders, where we can pack all 16-channels worth of half-float data into a single GPU register, which could significantly improve performance for shaders that would otherwise need to spill registers to memory.

Another neat thing is that while VK_KHR_shader_float16_int8 is a Vulkan extension, its implementation is mostly API agnostic, so most of the work we did here should also help us have a proper mediump implementation for GLSL ES shaders in the future.

There are a few caveats to consider as well though: on some hardware platforms smaller bit-sizes have certain hardware restrictions that may lead to emitting worse shader code in some scenarios, and generally, Mesa’s compiler infrastructure (and the Intel compiler backend in particular) have a long history of being 32-bit only, so there are parts of the compiler stack that still work better for 32-bit code.

Because VK_KHR_shader_float16_int8 is a brand new feature, we don’t really have any real world use cases yet. This is on top of the fact that Mesa’s compiler backends have been mostly (or exclusively) 32-bit aware until now (and more recently 64-bit too), so going forward I would expect a lot of focus on making our compiler be as robust (and optimal) for 16-bit code as it is for 32-bit code.

While we are already aware of a few areas where we can do better and I am currently working on addressing a few of these, one of the major limiting factors we have at the moment is the fact that the only source of 16-bit shaders available to us is the Khronos CTS, which due to its particular motivation, is very different from real world shader workloads and it is not a valid source material to drive compiler optimization work. Unfortunately, it might take some time until we start seeing applications using these new features, so in the meantime we will need to find other ways to drive further work in this area, and I think our best option here might be GLSL ES’s mediump and lowp qualifiers.

GLSL ES mediump and lowp qualifiers have been around for a long time but they are only defined as hints to the shader compiler that lower precision is acceptable and we have never really used them to emit half-float code. Thankfully, Topi Pohjolainen from Intel has been working on this for a while, which would open up a much better scenario for improving our 16-bit compiler paths, so this is something I am really looking forward to.

Finally, as I say above, we could could definitely use more testing and feedback from real world use cases, so if you decide to use this feature in your next project and you hit any bugs, please be sure to file them in Bugzilla so we can continue to improve our implementation.

December 03, 2018

The architecture is a bit of container matroska, but what we're trying to achieve is running Docker privileged inside of a LXC container on a baremetal host.

Alt text

Setup container on LXC Host

In order to give Docker in the guest privileges, the guest container itself has to be given privileges.

There is no simple switch for doing this in LXC unfortunately, but a few config options will do the trick.

lxc launch images:ubuntu/bionic container

lxc config set container security.nesting true
lxc config set container security.privileged true
cat <<EOT | lxc config set container raw.lxc -
lxc.cgroup.devices.allow = a
lxc.cap.drop =

lxc restart container

Setup docker on container

Just to verify that this works, start a privileged Docker container …

Last week while reviewing a patch I read that some gaming keyboards have two modes - keyboard mode and gaming mode. When in gaming mode, the keys send out pre-recorded macros when pressed. Presumably (I am not a gamer) this is to record keyboard shortcuts to have quicker access to various functionalities. The macros are stored in the hardware and are thus relatively independent of the host system. Pprovided you have access to the custom protocol, which you probably don't when you're on Linux. But I digress.

I reckoned this could be done in software and work with any 5 dollar USB keyboard. A few hours later, I have this working now: ggkbdd. It sits directly above the kernel and waits for key events. Once the 'mode key' is hit, the keyboard will send pre-configured key sequences for the respective keys. Hitting the mode key again (or ESC) switches back to normal mode.

There's a lot of functionality that is missing such as integration with the desktop (probably via DBus), better security (dropping privs, masking the fd to avoid accidental key logging), better system integration (request fds from logind, possibly through the compositor). And error handling, etc. I think the total time on this spent is somewhere between 3 and 4h, and that includes the time to write this blog post and debug the systemd unit autostartup. There are likely other projects that solve it the same way, or at least in a similar manner. I didn't check.

This was done as proof-of-concept and

  • I don't know if it's useful and if so, what the use-cases are
  • I don't know if I will have any time to fix things on this
  • I don't know if other (better developed) projects already occupy that space
In the grand glorious future and provided this is indeed something generally useful, this would need compositor integration. Not sure we'll ever get to that point. Meanwhile, consider this a code drop for a proof-of-concept and expect that you'll have to fix any bugs yourself.

I spoke at Linux Plumbers Conference 2018 in Vancouver a few weeks ago, about CUDA and the state of open source compute stacks.

The video is now available.

In V3D land, I built kernel support for the Texture Formatting Unit (TFU) and started using it for glGenerateMipmaps() support. This unit will be also be important for reformatting linear dmabufs (such as from X11 with a linear-only scanout engine) or media decode output in SAND format into the UIF format that V3D can render from. The kernel side is in drm-misc-next, and I’ll land the Mesa side once it hits drm-next.

I also rewrote the V3D cache invalidation in the kernel thanks to feedback from Dave Emett on the hardware team. In the process, this fixed a 3ms(!) CPU-side wait on every job submission, which improved throughput by 4-10x. Now I know why my fancy new hardware felt so slow! I also landed new tracepoints for anyone else debugging execution (aka I could write good sysprof visualization now).

Now that cache management is less broken, I’ve started on resurrecting my old SSBO and shader image branches so that I can write CSD (compute shader) support and get the driver to GLES 3.1.

In the process of writing new kernel code, I found multiple regressions in DRM core during CTS runs. I rebased my old V3D IGT support and got it merged, and built new tests for replicating one of the failures (GPU hang recovery was broken).

The tinydrm work from the previous month also continued. I’ve polished kmsro some more (fewer driver symlinks, more loader knowledge of kmsro), and will be resubmitting soon I hope.

Since I had kmsro nearly working, I gave it a shot for a V3D testing idea from last month. Right now to get X11 running to do conformance runs on it, I’ve got a stub KMS implementation in a branch. If I could do buffer sharing between VKMS and V3D, then kmsro could bind the two together so that we didn’t need out-of-tree patches for V3D testing. It turned out that VKMS didn’t have PRIME support, so I used Noralf’s proposed shmem helpers to add it. I got X11 up and running and looking plausible, but tests were failing left and right. This may be due to the shmem helpers using cached CPU mappings, when we want WC for interaction with most GPUs. Because of issues like this, the shmem helpers for VKMS are stalled at the moment but I may land V3D support in kmsro anyway.

In vc4 land, the Bootlin folks have been knocking off more of the HVS TODO entries. Boris respun underscan support, and hopefully after the next revision we’ll be able to land it. He’s also implemented H/V flipping of planes, which will be useful in some cases for 180-degree rotated displays. (For 90 degree rotations, we would need panel support for it as the HVS can’t go that way). Paul is taking over Boris’s load tracker work and is building tests for it to determine how accurate it is in predicting underflows.

Finally, for vc4 I got permission to write a driver for the PM block that controls power and reset. So far the only thing Linux uses the block for directly is the watchdog timer (which also performs reboots by letting the watchdog timer fire quickly). For power domains, we’ve been using my raspberrypi-power driver to talk to the Raspberry Pi firmware. However, we should be able to do things more reliably, and implement GPU reset properly, if we expose power domains and a reset controller from the PM block. I wrote a series that moves the PM block to a new MFD driver, binds the watchdog driver to it, and adds a new soc driver for power domains and reset. So far, I’ve only tested this driver for V3D power domains and reset, but it looks good and gets more of the Raspberry Pi platform supported in fully open source code.

November 22, 2018
I’ve been working on FreeBSD for Intel for almost 6 months now. In the world of programmers, I am considered an old dog, and these 6 months have been all about learning new tricks. Luckily, I’ve found myself in a remarkably inclusive and receptive community whose patience seems plentiful. As I get ready to take […]
November 19, 2018

Since the day I had my first class of Operating Systems (OS) in my engineering course, I got passionate about it; for me, OS represents one of the greatest achievements of mankind. As a result of my delight for OS, I always tried to gravitate around this field, but my school environment did not provide me with many opportunities to get into the area. To summarize this long journey, I will jump directly into the main point, on November 15 of 2017, I joined to a conference named Linuxdev-br [1] which brought together some of the best Brazilians Kernel developers. I took this opportunity to learn everything that I could by asking lots of questions to developers. Additionally, I was lucky to meet Gustavo Padovan. He helped me a lot during my first steps in the Linux Kernel.

From November 2017 until now, I did the best I could to become a Kernel developer, and I have to admit that the path was very complicated. I paid the price to work from 8 AM to 11 PM, from Sunday to Sunday, to maintain my efforts in my master and the Linux Kernel at the same time; unfortunately, I could not stay focused only in the Kernel. However, all of these efforts were paid off along the year; I had many patches accepted in the Kernel, I joined the Google Summer of Code (GSoC), I traveled to conferences, I returned to Linuxdev-br 2018 as a speaker, I joined XDC2018 [2], and many other good things happened.

Now I am close to complete one year of Linux Kernel, and one question still bugs me: why does it have to be so hard for someone in a similar condition to become part of this world? I realized that I had great support from many people (especially from my sweet and calm wife) and I also pushed myself very hard. Now, I feel that it is time to start giving back something to society; as a result, I began to promote some small events about free software in the university and the city I live. However, my main project related to this started around two months ago with six undergraduate students at the University of Sao Paulo, IME [3]. My plan is simple: train all of these six students to contribute to the Linux Kernel with the intention to help them to create a local group of Kernel developers. I am excited about this project! I noticed that within a few weeks of mentoring the students they already learned lots of things, and in a few days, they will send out their contributions to the Kernel. I want to write a new post about that in December 2018, reporting the results of this new tiny project and the summary of this one year of Linux Kernel. See you soon :)


  1. linuxdev-br
  2. XDC 2018
  3. IME USP
November 15, 2018
As discussed in my previous blog post one of my TODO list items for plymouth is creating a new plymouth theme.

Since the transition to plymouth is not entirely smooth plymouth by default will wait 5 seconds (counted from starting the kernel) before showing itself so that on systems which boot under 5 seconds it never shows. As can be seen in this video, this leads to a very non-smooth experience when the boot takes say 7 seconds as plymouth then only shows briefly, leading to a kinda "flash" effect while it briefly shows.

Another problem with the 5 second wait, is now that we do not show GRUB the user is looking at the firmware's bootsplash for not only the often long firmware initialization time, but also for the 5 seconds plymouth waits on top, making it look as if nothing is happening.

To fix this I've been working on a new plymouth theme which draws a spinner over the firmware boot splash, eliminating the ugly transition from the firmware boot splash to plymouth. This also allows removing the show-delay, so that we provide feedback that something is happening as soon as plymouth starts.

Firmware being firmware getting this done right was somewhat harder then I expected, but I've a first "draft" of a new theme doing this now. I've created some videos showing 2 different systems booting the new theme:Note the videos with diskcrypt where paused when I entered my passphrase. So there is a bit of a jump in them because of this.

I've built a test version of plymouth for Fedora 29, to give this a try download all rpm files from here except the .src.rpm and -devel files and then from a directory with all those files in it, run:

sudo rpm -Uvh plymouth*.rpm

Since plymouth is part of your initrd, you also need to regenerate your initrd:

sudo dracut -f /boot/initramfs-$(uname -r).img $(uname -r)

This regenerates the initrd for the kernel you are currently running, so if you've installed a kernel update and have not rebooted since then you may not get the new theme when rebooting. In this case rerun the dracut command after rebooting.

Note if you've previously followed my instructions to test flickerfree boot, then you need to remove "plymouth.splash_delay=20" from your kernel commandline, since we now no longer want to have a splash-delay.

Now reboot and you should get the new spinner on firmware-boot-splash theme, with Fedora branding.

If you give this a try and the new theme somehow does not look correct, please mail at If you mail me about the theme not displaying correctly please attach the /run/plymouth.log file which this test-build generates to the email and a video of how the theme misbehaves would be great too.

I still need to discuss the idea of using a new theme incorporating the firmware boot splash with the GNOME design team so this is all subject to change.
October 31, 2018
Good morning from Edinburgh, where the breakfast contains haggis, and the charity shops have some interesting finds.

My main goal in attending this hackfest was to discuss Pipewire integration in the desktop, and how it will eventually replace PulseAudio as the audio daemon.

The main problem GNOME has had over the years with PulseAudio relate mostly to how PulseAudio was a black box when it came to its routing policy. What happens when you plug in an HDMI cable into your laptop? Or turn on your Bluetooth headset? I've heard the stories of folks with highly mobile workstations having to constantly visit the Sound settings panel.

PulseAudio has policy scattered in a number of places (do a "git grep routing" inside the sources to see that): some are in the device manager, then modules themselves can set priorities for their outputs and inputs. But there's nothing to take all the information in, and take a decision based on the hardware that's plugged in, and the applications currently in use.

For Pipewire, the policy decisions would be split off from the main daemon. Pipewire, as it gains PulseAudio compatibility layers, will grow a default/example policy engine that will try to replicate PulseAudio's behaviour. At the very least, that will mean that Pipewire won't regress compared to PulseAudio, and might even be able to take better decisions in the short term.

For GNOME, we still wanted to take control of that part of the experience, and make our own policy decisions. It's very possible that this engine will end up being featureful and generic enough that it will be used by more than just GNOME, or even become the default Pipewire one, but it's far too early to make that particular decision.

In the meanwhile, we wanted the GNOME policies to not be written in C, difficult to experiment with for power users, and for edge use cases. We could have started writing a configuration language, but it would have been too specific, and there are plenty of embeddable languages around. It was also a good opportunity for me to finally write the helper library I've been meaning to write for years, based on my favourite embedded language, Lua.

So I'm introducing Anatole. The goal of the project is to make it trivial to write chunks of programs in Lua, while the core of your project is written in C (we might even be able to embed it in Python or Javascript, once introspection support is added).

It's still in the very early days, and unusable for anything as of yet, but progress should be pretty swift. The code is mostly based on Victor Toso's incredible "Lua factory" plugin in Grilo. (I'm hoping that, once finished, I won't have to remember on which end of the stack I need to push stuff for Lua to do something with it ;)

Last month the X.Org Developer’s Conference (XDC) was held in A Coruña, Spain. I took part as a Plasma/KWin developer. My main goal was to simply get into contact with developers from other projects and companies working on open source technology in order to show them that the KDE community aims at being a reliable partner to them now and in the future.

Instead of recounting chronologically what went down at the conference let us look at three key groups of attendees, who are relevant to KWin and Plasma: the graphics drivers and kernel developers, upstream userland and colleagues working on other compositor projects.

Graphics drivers and kernel

If you search on Youtube for videos of talks from previous XDC conferences or for the videos from this year’s XDC you will notice that there are many talks by graphics drivers developers, often directly employed by hardware vendors.

The reason is that hardware vendors have enough money to employ open source developers and send them to conferences and that they benefit greatly from contributing directly to open source projects. Something which also Nvidia management will realize at some point.

On the other side I talked to the Nvidia engineers at the conference, who were very friendly and eager to converse about their technical solutions which they are allowed to share with the community. Sadly their primarily usage of proprietary technology in general hinders them in taking a more active role in the community and there is apparently no progress on their proposed open standard Wayland buffer sharing API.

At least we arranged that they would send some hardware for testing purposes. I won’t be the recipient, since my work focus will be on other topics in the immediate future, but I was able to point to another KWin contributor, who should receive some Nvidia hardware in the future so he can better troubleshoot problems our users on Nvidia experience.

The situation looks completely different for Intel and AMD. In particular Intel has a longstanding track record of open development of their own drivers and contributing to generic open source solutions also being supported by other vendors. And AMD decided not too long ago to open source their most commonly used graphics drivers on Linux. In both cases it is a bliss to target their latest hardware and it was as great as I imagined it to be talking to their developers at XDC, because they are not only interested in their own products but in boosting the whole ecosystem and finding suitable solutions for everyone. I want to explicitly mention Martin Peres from Intel and Harry Wentland from AMD, who I had long, interesting discussions with and who showed great interest in improving the collaboration of low-level engineers and us in userland.

Who I haven’t mentioned yet is ARM. Although they are just like Nvidia, Intel and AMD an XDC “Gold Sponsor” their contribution in terms of content to the conference was minimal, most likely for the same reason of being mostly closed source as in the Nvidia case. And that is equally sad, since we do have some interest in making ARM a well supported target for Plasma. An example is Plasma on the Pinebook. But the driver situation for ARM Mali GPUs is just ugly, developing for them is torture. I know because I did some of the integration work for the Pinebook. All the more I respect the efforts by several extremely talented hackers to provide open-source drivers for ARM Mali GPUs. Some of them presented their work at XDC.

X.Org and upstream

Linux graphics drivers are cool and all, but without XServer, Wayland and other auxiliary cross-vendor user space libraries there would be not much to show off to the user. And after all it is the X.Org Developer’s conference, most notably being home to the XServer and maybe in the future governance wise also to So after looking at low-level driver development, what role did these projects and their developers play at the conference?

First I have to say, that the dichotomy established in the previous paragraph is of course not that distinct. Several graphics drivers are part of mesa, which is again part of and many graphics drivers developers are also contributing to user land or involved in organizational aspects of X.Org and A more prominent one of these organizational aspects is hosting of projects. There was a presentation by Daniel Stone about the transition to GitLab, what was a rather huge project this year and is still ongoing.

But regarding technical topics there were not many presentations about XServer, Wayland and other high level components. After seeing some lightning talks on the first day of the conference I decided to hold a lightning talk myself about my Xwayland GSOC project in 2017. I got one of the last slots on Friday and you can watch a video of my presentation here. Also Drew De Vault presented a demo of wlroot’s layer shell.

So there were not so many talks about the higher level user space graphics stack, but some of us plan to increase the ratio of such talks in the future. After talking about graphics drivers developers and upstream userland this brings me directly to the last group of people:

Compositors developers

We were somewhat a special crowd at XDC. From distinct projects, some of us were from wlroots, Guido from Purism and me from KWin, we were united in, to my knowledge, all of us being the first time at XDC.

If you look at past conferences the involvement of compositor developers was marginal. My proclaimed goal and I believe also the one of all the others is to change this from now on. Because from embedded to desktop we will all benefit by working together where possible and exchanging information with each other, with upstream and with hardware vendors. I believe X.Org and can be a perfect platform for that.

Final remarks on organisation

The organisation of the conference was simply great. Huge thanks to igalia for hosting XDC in their beautiful home town.

What I really liked about the conference schedule was that there were always three long breaks every day and long pauses between the talks allowing the attendees to talk to each other.

What I didn’t like about the conference was that all the attendees were spread over the city in different hotels. I do like the KDE Akademy approach better in this regard: everyone in one place so you can drink together a last beer at the hotel bar before going to bed. That said there were events at multiple evenings throughout the week, but recommending a reasonable priced default hotel for everyone not being part of a large group might still be an idea for next XDC.

October 30, 2018

So we kicked off the PipeWire hackfest in Edinburgh yesterday. We have 15 people attending including Arun Raghavan, Tanu Kaskinen and Colin Guthrie from PulseAudio, PipeWire creator Wim Taymans, Bastien Nocera and Jan Grulich representing GNOME and KDE, Mark Brown from the ALSA kernel team, Olivier Crête,George Kiagiadakis and Nicolas Dufresne was there to represent embedded usecases for PipeWire and finally Thierry Bultel representing automotive.

The event kicked off with Wim Taymans presenting on current state of PipeWire and outlining the remaining issues and current thoughts on how to resolve them. Most of the first day was spent on a roadtable discussion about what are and should be the goals of PipeWire and what potential tradeoffs there would be going forward. PipeWire is probably a bit closer to Jack than PulseAudio in design, so quite a bit of the discussion went on how that would affect the PulseAudio usecases and what is planned to ensure PipeWire works very well for consumer audio usecases.

Personally I ended up spending quite some time just testing and running various Jack apps to see what works already and what doesn’t. In terms of handling outputing audio with Jack apps I was positively surprised how many Jack apps I was able to make work (aka output audio) using PipeWire instead of Jack, but of course we still have some gaps to cover before PipeWire is ready as a drop-in Jack replacement, for instance the Jack session management protocol needs to be implemented first.

The second day we outlined the areas that need work before we are ready to replace PulseAudio and came up with the following list:

  • Mixers – This is basically dealing with hardware mixers. Arun and Wim started looking at a design for this during the hackfest.
  • PulseAudio services – This is all the things in PulseAudio that is not very suitable for putting inside PipeWire. The idea is instead to put them in a separate daemon. This includes things like network streaming, ROAP, DBus apis and so on.
  • Policy/Session handling – We plan to move policy and session handling out of PulseAudio to make it easier for different usecases to set their own policies. PipeWire will still provide some default setup, but the idea here is to have a separate daemon(s) to provide this. Bastien Nocera started prototyping a setup where he could create policy and session handling using Lua scripting.
  • Filters
  • Bluetooth – Ensuring we have great bluetooth support with PipeWire. We would want to move Bluetooth handling to its own daemon, and not have it inside like in PulseAudio to allow for more flexibility with various embedded bluetooth stacks for instance. This could also mean looking at the Linux Bluetooth stack more widely as things are not ideal atm, especially from a security viewpoint.
  • Device reservation – We expect to replace Jack and PulseAudio in steps, starting with PulseAudio. So dealing well with hardware reservation is important to allow people to for instance keep running Jack alongside PipeWire until we are ready for full replacement.
  • Stream Monitoring – Important feature from Jack and PulseAudio that still needs implementing to allowing monitoring audio devices and streams.
  • Latency handling – Improving ways we can deal with hardware latency in for instance consumer devices such as TVs

It is still a bit hard to have a clear timeline for when we will be ready to drop in PipeWire support to replace PulseAudio and then Jack, but we feel the Wayland migration was a good example to follow where we held off doing the switch until we felt comfortable the move would be transparent to most users. There will of course always be corner cases and bugs, but we hope that in general people agree that the Wayland transition was done in a responsible manner and thus could be a good example to follow for us here.

We would like to offers big thanks to the GNOME Foundation for sponsoring travel for some of the community attendees and to Collabora for sponsoring dinner for all attendees the first night.

If you want to take a look at PipeWire, Wim updated the wiki page with PipeWire build intructions to be up-to-date. The hackfest attendees tested them out so we are sure they work, just be aware that you want the ‘Work’ branch and not the Master branch, as that is the one where all the audio work is happening. The Master branch is the video focused branch we use in Fedora for desktop remoting support in browsers and VNC under Wayland.

This last week, I worked on the oldest issue I had in my rpi kernel repo: Adding support for tinydrm displays (it actually predates tinydrm). I wrote a tinydrm driver for the panel I had on hand and got it upstreamed. I wrote a Mesa series enabling vc4 to talk to this new hx8357d tinydrm driver. Once the Mesa side is agreed on, then we can extend it to the other tinydrm drivers and have a solution for X with 3D with limited updates (as opposed to full screen refreshing like fbdev did) on these panels.

I also fixed a longstanding vc4 bug that slightly rotated SDL2 output. They were running all of their coordinates through a rotation matrix they built on each VS invocation, and in the common case of no rotation the cos(0)/sin(0) was inaccurate enough to rotate things by a pixel near the window edges. The sin()/cos() in Mesa is now more accurate so this problem shouldn’t happen with old SDL2, we have a piglit test to make sure other drivers don’t break with SDL2, and upstream SDL2 has stopped building a rotation matrix in the shader. (yay!)

Small stuff from the V3D driver since my last post:

  • Fixed depth writes (or depth/color non-writes) on V3D 4.2.
  • Reduced shader recompiles on V3D 4.2.
  • Reduced VPM consumption by vertex shaders using shared NIR optimizations.
  • Took advantage of hardware half-float pack/unpack operations.
  • Switched from FLUSH_ALL_STATE to FLUSH (less overhead, tested by HW team).
  • Merged a fix for use-after-free of sched fences.
  • Merged a fix for debugfs oops on V3D 4.1
  • Merged the kernel V3D clock measurement debugfs entry.
  • Merged more documentation of the SUBMIT_CL.

Also in V3D, I’ve been experimenting with other ways of running the driver using the software simulator. Right now I have a frankenstein mode in vc4 and v3d where the driver sits on top of the i915 GEM driver, and does BO mapping and interactions with the window system on i915 and then copies those BOs into the simulator to run vc4/v3d ioctls on it. This is gross, and it’s only really usable on my desktop with the i915 driver.

Talking with one of the HW engineers who’s interested in running Mesa against the simulator, there are a few ways we could do simulation. One would be to have the actual kernel driver call back up into a userspace daemon linking the simulator, and provide mappings of BOs and proxied register writes to that userspace dameon. Another would be to implement an actual V3D device in QEMU by linking in the simulator, but we can’t do that due to the GPL. Another would be to use my current simulation code, but let it work on top of vkms or vgem (so that a CI system wouldn’t need i915). This one seems easy. However, I started on a project mimicing what Intel did for their software-only CI setup: intercept libc calls to simulate having a V3D driver that’s not actually present in the kernel. I’ve got the code to the point of trying to submit CLs, at which point the simulator hangs for reasons I haven’t debugged yet. I don’t know if I’ll want to do this long term, but it seems like potentially useful code to other driver developers who have a software simulator.

Finally, in the X world, this month I’ve reviewed and merged more patches than I normally would (thanks, gitlab!), and wrote an extensible testcase for the damage extension due to a bug report/fix that a user had been submitted.

October 23, 2018

As many of you know we kicked of a ambitious goal to revamp the Linux desktop when we launched Fedora Workstation 4 years. We wanted to remove many of the barriers to adoption of Linux as a desktop and make it a better operating system for all, especially for developers.
To that effect we have been pushing a long range of initiatives over the last 4 years ago, ranging from providing a better input stack through libinput, a better display system through Wayland, a better audio and video subsystem through PipeWire, a better way of doing application packaging and dependency handling through Flatpak, a better application installation history through GNOME Software, actual firmware handling for Linux through Linux Vendor Firmware Service, better manageability through Fleet Commander, and Project Silverblue for reliable OS updates. We also had a lot of efforts done to improve general hardware handling, be that work on glvnd and friends for dealing with NVidia driver, the Bolt project for handling Thunderbolt devices better, HiDPI support in the desktop, better touch support in the desktop, improved laptop battery life, and ongoing work to improve state of fingerprint readers under Linux and to provide a flicker free boot experience.

One thing though that was clear to us was that as we where making all these changes to improve the ease of use and reliability of Linux as a desktop operating system we couldn’t make life worse for developers. Developers are the lifeblood of Fedora and Linux and thus we have had Debarshi Ray working on a project we call Fedora Toolbox. Fedora toolbox creates a seamless experience for developers when using an immutable OS like Silverblue, yet want to be able to install the wonderful world of software libraries and tools that makes Linux so powerful for developers. Fedora Toolbox is now ready for early adopters to start testing, so I recommend jumping over to Debarshi’s blog to read up on Fedora Toolbox.

October 22, 2018

Oculus Quest. It’s a cool step forward, no doubt: an all-in-one headset, with touch controllers, inside-out 6DOF tracking, no sensors, no wires, headphones and so on. Neat! I can’t wait to put my hands on it… but, there’s a but:

“It runs rift quality experiences”, that’s Zuckerberg’s words. And they’re twice wrong.

Rift device runs tethered to a PC, which is a x86 architecture based machine. The Quest will be running on a Snapdragon 8xx processor, which is a mobile processor, SoC architecture. There’s no way to compare the mobile architecture’s processing capabilities with the x86 one and its way more powerful CPUs / GPUs – roughly speaking, one focuses on power consumption while the other on performance.

So, no “rift quality” into Quest.

The other thing is that the OS, drivers and graphics libraries are different on each architecture, so say your latest shader feature will be different on each architecture and, in principle the same binary application won’t be able to run on both. In other words, this means that you will need to tell the 3D artist and the programmer to work the application out again… and they will get really mad at you! :)

Therefore, Oculus Quest won’t “run rift” existing apps and games either. They will need to be re-built.


Screenshot from 2018-10-08 23-26-26

October 20, 2018

In the past month-and-a-half since the previous update on the free (open source) Panfrost driver for Mali GPUs, progress has been fervent! During the interim, Lyude Paul and I presented about the driver at the X.Org Developer’s Conference. Check out the talk (slides). Benchmarks included!

Since the talk, we’ve been working on Panfrost like mad. Literally – I was pretty angry by that distorted texture bug! The culprit proved to be the compiler emitting imov (integer move) instructions rather than fmov (floating-point move), apparently causing precision degradation while processing texture coordinates.

In any event, implementing a few new features culminated in the above jellyfish demo. This jellyfish scene is a part of glmark2-es2, a set of OpenGL ES programs used to test and benchmark GPU drivers. For Panfrost, I use glmark as a set of milestones to work towards while developing the driver. glmark is comprehensive and can do just about anything I need while developing graphics drivers, except for making me a peanut butter and jellyfish sandwich. Er, never mind, it looks like they just added an OpenGL sandwich scene. Bonus!

Aside from the texturing issue, the next major challenge faced by the jellyfish scene was “uniform spilling”. Many GPUs require a special “load uniform” instruction in the shader to access a uniform; Midgard does not… usually. Instead, up to 16 (32-bit vec4) uniforms can be preloaded into registers, which makes them free to access in shaders. Yay performance!

The problem with this uniform loading scheme is when a shader needs more than 16 uniforms, which is frequent in vertex shaders loading large matrices. A single 4x4 matrix eats up 4 uniforms; 4 such matrices and there are no uniforms left for driver-internal state. Oops.

The solution is surprisingly simple: rather than using this register-based fast path, use the traditional “load uniform” instructions which have no practical limit on the number of uniforms, at the expense of a nominal performance hit. Specifically, the driver apparently switches from using uniforms to internally generating their crazy cousin, uniform buffer objects. This technique correctly handles these complex shaders.

However, Panfrost goes one step further than the blob appears to! Since uniforms and uniform buffer objects can be used simultaneously, we can use the same memory block for both kinds of uniforms – two for the price one! Mapping the uniform memory like this, the compiler is free to use the register fast path for the first 16 uniforms and seamlessly fallback on the general slow path for the rest, shaving off sixteen “load uniform” instructions on a shader that otherwise has to “spill” to dedicated instructions. A win!

Another major change prompted by a bug bringing up jellyfish surrounded register allocation. Previously, Panfrost used an ersatz, home-rolled register allocator, good enough for spinning gears but a little clunky. Debugging jellyfish uncovered a buggy corner case; the naive solution unfortunately increased register pressure, a performance issue. But there’s no need to settle for naive!

“Bayes”ing the work on Mesa’s classy register allocation library, I scrapped the old code and wrote a more robust, sophisticated register allocator. The new allocator, using graph coloring rather than ad hoc liveness checks, already generates better code with lower register pressure, neatly resolving the issue found in jellyfish. Plus, it will permit further register-related progress when we tackle register spilling and parallelism in stores and texture instructions. The new code is considerably longer, but it will pay off as shader complexity creeps up.

A few miscellaneous bugs later, et voila, Jellyfish running on GPU-accelerated using only free software on a Rockchip RK3399 laptop.


It’s a few hops away from the jellyfish, yet I carrot believe Panfrost is running glmark’s refract demo!

Like jellyfish, uniform spilling is needed for refract. We have been able to render the bunny itself for quite a while, as demoed at the X.Org Developer’s Conference, linked above. Where’s the issue?

Framebuffer objects.

Remember how I mentioned back in April that framebuffer objects were one of the few major features missing in Panfrost?

We can strike that off the list now!

For those unfamiliar, by default, a 3D application renders to the screen. When the application instead renders off-screen, to be later read as a texture, it’s said to render to a “framebuffer object” in graphics lingo. Since Mali is a “render-only” GPU, independent of the display, we already have the freedom to render to arbitrary buffers. Theoretically, rather than telling the GPU to render on-screen, we simply it to render literally anywhere else but the screen. Pretty simple!

Of course, it’s never that simple. To reduce power consumption by saving memory bandwidth, Mali GPUs feature “ARM Framebuffer Compression”, or AFBC for short. As advertised, AFBC is a lossless compression scheme for compressing the framebuffer in-flight from one part of the chip to another.

Only across parts of the chip? Always interested in reducing power, it turns out the blob is aggressive about enabling AFBC anywhere it can. Previously, we could ignore AFBC entirely, since the X11 blob is apparently unable to use AFBC for display. However, since framebuffer objects are off-screen and stored internal to the driver, it doesn’t matter to the rest of the system what format is used. Accordingly, we have never seen a framebuffer object that doesn’t use AFBC, so I had to implement AFBC support in the driver.

Furthermore, framebuffer objects mean juggling multiple framebuffers at once. Together with AFBC support, this additional complexity required a considerable refactor of the command stream driver code to support the extra bookkeeping. Many coding afternoons and a transatlantic flight later, Panfrost implemented preliminary (color) framebuffer objects.

However, to add to the joy, refract does not only use a color attachment to a framebuffer object, but also a depth attachment. Conceptually, the two modes of operation are identical; in practice, they require a different encoding in the command stream and an additional increase in bookkeeping leading to yet another (small) refactor. Plenty of keyboard mashing and brain freezing later, Panfrost supported depth attachments to framebuffer objects as well as color. And poof, a refraction bunny!

Now, for the piece de resistance, rather than a glmark demo, we have the reference Wayland compositor, implementing a simple GPU-accelerated desktop. With a small workaround added to the driver, Panfrost is able to run …


October 19, 2018

This week, the GNOME Foundation Board of Directors met at the Collabora office in Cambridge, UK, for the second annual Foundation Hackfest. We were also joined by the Executive Director, Neil McGovern, and Director of Operations, Rosanna Yuen. This event was started by last year’s board and is a great opportunity for the newly-elected board to set out goals for the coming year and get some uninterrupted hacking done on policies, documents, etc. While it’s fresh in our mind, we wanted to tell you about some of the things we have been working on this week and what the community can hope to see in the coming months.

Wednesday: Goals

On Wednesday we set out to define the overall goals of the Foundation, so we could focus our activities for the coming years, ensuring that we were working on the right priorities. Neil helped to facilitate the discussion using the Charting Impact process. With that input, we went back to the purpose of the Foundation and mapped that to ten and five year goals, making sure that our current strategies and activities would be consistent with reaching those end points. This is turning out to be a very detailed and time-consuming process. We have made a great start, and hope to have something we can share for comments and input soon. The high level 10-year goals we identified boiled down to:

  • Sustainable project and foundation
  • Wider awareness and mindshare – being a thought leader
  • Increased user base

As we looked at the charter and bylaws, we identified a long-standing issue which we need to solve — there is currently no formal process to cover the “scope” of the Foundation in terms of which software we support with our resources. There is the release team, but that is only a subset of the software we support. We have some examples such as GIMP which “have always been here”, but at present there is no clear process to apply or be included in the Foundation. We need a clear list of projects that use resources such as CI, or have the right to use the GNOME trademark for the project. We have a couple of similar proposals from Allan Day and Carlos Soriano for how we could define and approve projects, and we are planning to work with them over the next couple of weeks to make one proposal for the board to review.

Thursday: Budget forecast

We started the second day with a review of the proposed forecast from Neil and Rosanna, because the Foundation’s financial year starts in October. We have policies in place to allow staff and committees to spend money against their budget without further approval being needed, which means that with no approved budget, it’s very hard for the Foundation to spend any money. The proposed budget was based off the previous year’s actual figures, with changes to reflect the increased staff headcount, increased spend on CI, increased staff travel costs, etc, and ensure after the year’s spending, we follow the reserves policy to keep enough cash to pay the foundation staff for a further year. We’re planning to go back and adjust a few things (internships, marketing, travel, etc) to make sure that we have the right resources for the goals we identified.

We had some “hacking time” in smaller groups to re-visit and clarify various policies, such as the conference and hackfest proposal/approval process, travel sponsorship process and look at ways to support internationalization (particularly to indigenous languages).

Friday: Foundation Planning

The Board started Friday with a board-only (no staff) meeting to make sure we were aligned on the goals that we were setting for the Executive Director during the coming year, informed by the Foundation goals we worked on earlier in the week. To avoid the “seven bosses” problem, there is one board member (myself) responsible for managing the ED’s priorities and performance. It’s important that I take advantage of the opportunity of the face to face meeting to check in with the Board about their feedback for the ED and things I should work together with Neil on over the coming months.

We also discussed a related topic, which is the length of the term that directors serve on the Foundation Board. With 7 staff members, the Foundation needs consistent goals and management from one year to the next, and the time demands on board members should be reduced from previous periods where the Foundation hasn’t had an Executive Director. We want to make sure that our “ten year goals” don’t change every year and undermine the strategies that we put in place and spend the Foundation resources on. We’re planning to change the Board election process so that each director has a two year term, so half of the board will be re-elected each year. This also prevents the situation where the majority of the Board is changed at the same election, losing continuity and institutional knowledge, and taking months for people to get back up to speed.

We finished the day with a formal board meeting to approve the budget, more hack time on various policies (and this blog!). Thanks to Collabora for use of their office space, food, and snacks – and thanks to my fellow Board members and the Foundation’s wonderful and growing staff team

October 15, 2018

Last week the Flatpak community woke to the “news” that we are making the world a less secure place and we need to rethink what we’re doing. Personally, I’m not sure this is a fair assessment of the situation. The “tl;dr” summary is: Flatpak confers many benefits besides the sandboxing, and even looking just at the sandboxing, improving app security is a huge problem space and so is a work in progress across multiple upstream projects. Much of what has been achieved so far already delivers incremental improvements in security, and we’re making solid progress on the wider app distribution and portability problem space.

Sandboxing, like security in general, isn’t a binary thing – you can’t just say because you have a sandbox, you have 100% security. Like having two locks on your front door, two front doors, or locks on your windows too, sensible security is about defense in depth. Each barrier that you implement precludes some invalid or possibly malicious behaviour. You hope that in total, all of these barriers would prevent anything bad, but you can never really guarantee this – it’s about multiplying together probabilities to get a smaller number. A computer which is switched off, in a locked faraday cage, with no connectivity, is perfectly secure – but it’s also perfectly useless because you cannot actually use it. Sandboxing is very much the same – whilst you could easily take systemd-nspawn, Docker or any other container technology of choice and 100% lock down a desktop app, you wouldn’t be able to interact with it at all.

Network services have incubated and driven most of the container usage on Linux up until now but they are fundamentally different to desktop applications. For services you can write a simple list of permissions like, “listen on this network port” and “save files over here” whereas desktop applications have a much larger number of touchpoints to the outside world which the user expects and requires for normal functionality. Just thinking off the top of my head you need to consider access to the filesystem, display server, input devices, notifications, IPC, accessibility, fonts, themes, configuration, audio playback and capture, video playback, screen sharing, GPU hardware, printing, app launching, removable media, and joysticks. Without making holes in the sandbox to allow access to these in to your app, it either wouldn’t work at all, or it wouldn’t work in the way that people have come to expect.

What Flatpak brings to this is understanding of the specific desktop app problem space – most of what I listed above is to a greater or lesser extent understood by Flatpak, or support is planned. The Flatpak sandbox is very configurable, allowing the application author to specify which of these resources they need access to. The Flatpak CLI asks the user about these during installation, and we provide the flatpak override command to allow the user to add or remove these sandbox escapes. Flatpak has introduced portals into the Linux desktop ecosystem, which we’re really pleased to be sharing with snap since earlier this year, to provide runtime access to resources outside the sandbox based on policy and user consent. For instance, document access, app launching, input methods and recursive sandboxing (“sandbox me harder”) have portals.

The starting security position on the desktop was quite terrible – anything in your session had basically complete access to everything belonging to your user, and many places to hide.

  • Access to the X socket allows arbitrary input and output to any other app on your desktop, but without it, no app on an X desktop would work. Wayland fixes this, so Flatpak has a fallback setting to allow Wayland to be used if present, and the X socket to be shared if not.
  • Unrestricted access to the PulseAudio socket allows you to reconfigure audio routing, capture microphone input, etc. To ensure user consent we need a portal to control this, where by default you can play audio back but device access needs consent and work is under way to create this portal.
  • Access to the webcam device node means an app can capture video whenever it wants – solving this required a whole new project.
  • Sandboxing access to configuration in dconf is a priority for the project right now, after the 1.0 release.

Even with these caveats, Flatpak brings a bunch of default sandboxing – IPC filtering, a new filesystem, process and UID namespace, seccomp filtering, an immutable /usr and /app – and each of these is already a barrier to certain attacks.

Looking at the specific concerns raised:

  • Hopefully from the above it’s clear that sandboxing desktop apps isn’t just a switch we can flick overnight, but what we already have is far better than having nothing at all. It’s not the intention of Flatpak to somehow mislead people that sandboxed means somehow impervious to all known security issues and can access nothing whatsoever, but we do want to encourage the use of the new technology so that we can work together on driving adoption and making improvements together. The idea is that over time, as the portals are filled out to cover the majority of the interfaces described, and supported in the major widget sets / frameworks, the criteria for earning a nice “sandboxed” badge or submitting your app to Flathub will become stricter. Many of the apps that access --filesystem=home are because they use old widget sets like Gtk2+ and frameworks like Electron that don’t support portals (yet!). Contributions to improve portal integration into other frameworks and desktops are very welcome and as mentioned above will also improve integration and security in other systems that use portals, such as snap.
  • As Alex has already blogged, the 1.6 runtime was something we threw together because we needed something distro agnostic to actually be able to bootstrap the entire concept of Flatpak and runtimes. A confusing mishmash of Yocto with flatpak-builder, it’s thankfully nearing some form of retirement after a recent round of security fixes. The replacement freedesktop-sdk project has just released its first stable 18.08 release, and rather than “one or two people in their spare time because something like this needs to exist”, is backed by a team from Codethink and with support from the Flatpak, GNOME and KDE communities.
  • I’m not sure how fixing and disclosing a security problem in a relatively immature pre-1.0 program (in June 2017, Flathub had less than 50 apps) is considered an ongoing problem from a security perspective. The wording in the release notes?

Zooming out a little bit, I think it’s worth also highlighting some of the other reasons why Flatpak exists at all – these are far bigger problems with the Linux desktop ecosystem than app security alone, and Flatpak brings a huge array of benefits to the table:

  • Allowing apps to become agnostic of their underlying distribution. The reason that runtimes exist at all is so that apps can specify the ABI and dependencies that they need, and you can run it on whatever distro you want. Flatpak has had this from day one, and it’s been hugely reliable because the sandboxed /usr means the app can rely on getting whatever they need. This is the foundation on which everything else is built.
  • Separating the release/update cadence of distributions from the apps. The flip side of this, which I think is huge for more conservative platforms like Debian or enterprise distributions which don’t want to break their ABIs, hardware support or other guarantees, is that you can still get new apps into users hands. Wider than this, I think it allows us huge new freedoms to move in a direction of reinventing the distro – once you start to pull the gnarly complexity of apps and their dependencies into sandboxes, your constraints are hugely reduced and you can slim down or radically rethink the host system underneath. At Endless OS, Flatpak literally changed the structure of our engineering team, and for the first time allowed us to develop and deliver our OS, SDK and apps in independent teams each with their own cadence.
  • Disintermediating app developers from their users. Flathub now offers over 400 apps, and (at a rough count by Nick Richards over the summer) over half of them are directly maintained by or maintained in conjunction with the upstream developers. This is fantastic – we get the releases when they come out, the developers can choose the dependencies and configuration they need – and they get to deliver this same experience to everyone.
  • Decentralised. Anyone can set up a Flatpak repo! We started our own at Flathub because there needs to be a center of gravity and a complete story to build out a user and developer base, but the idea is that anyone can use the same tools that we do, and publish whatever/wherever they want. GNOME uses GitLab CI to publish nightly Flatpak builds, KDE is setting up the same in their infrastructure, and Fedora is working on completely different infrastructure to build and deliver their packaged applications as Flatpaks.
  • Easy to build. I’ve worked on Debian packages, RPMs, Yocto, etc and I can honestly say that flatpak-builder has done a very good job of making it really easy to put your app manifest together. Because the builds are sandboxed and each runtimes brings with it a consistent SDK environment, they are very reliably reproducible. It’s worth just calling this out because when you’re trying to attract developers to your platform or contributors to your app, hurdles like complex or fragile tools and build processes to learn and debug all add resistance and drag, and discourage contributions. GNOME Builder can take any flatpak’d app and build it for you automatically, ready to hack within minutes.
  • Different ways to distribute apps. Using OSTree under the hood, Flatpak supports single-file app .bundles, pulling from OSTree repos and OCI registries, and at Endless we’ve been working on peer-to-peer distribution like USB sticks and LAN sharing.

Nobody is trying to claim that Flatpak solves all of the problems at once, or that what we have is anywhere near perfect or completely secure, but I think what we have is pretty damn cool (I just wish we’d had it 10 years ago!). Even just in the security space, the overall effort we need is huge, but this is a journey that we are happy to be embarking together with the whole Linux desktop community. Thanks for reading, trying it out, and lending us a hand.

October 05, 2018

For the 9th time this year there will be the GStreamer Conference. This year it will be in Edinburgh, UK right after the Embedded Linux Conference Europe, on the 25th of 26th of October. The GStreamer Conference is always a lot of fun with a wide variety of talks around Linux and multimedia, not all of them tied to GStreamer itself, for instance in the past we had a lot of talks about PulseAudio, V4L, OpenGL and Vulkan and new codecs.This year I am really looking forward to talks such as the DeepStream talk by NVidia, Bringing Deep Neural Networks to GStreamer by Pexip and D3Dx Video Game Streaming on Windows by Bebo, to mention a few.

For a variety of reasons I missed the last couple of conferences, but this year I will be back in attendance and I am really looking forward to it. In fact it will be the first GStreamer Conference I am attending that I am not the organizer for, so it will be nice to really be able to just enjoy the conference and the hallway track this time.

So if you haven’t booked yourself in already I strongly recommend going to the GStreamer Conference website and getting yourself signed up to attend.

See you all in Edinburgh!

Also looking forward to seeing everyone attending the PipeWire Hackfest happening right after the GStreamer Conference.

X.Org Developer’s Conference (XDC) is the summit meeting for people that work with graphics in all the world to meet each other for three days. There you will find people working with compositors, direct rendering management (DRM), graphics applications, and so forth; all these people at the same place create a unique learning opportunity. Finally, you can feel the community spirit in every table, talk, and corner.

The XDC has many exciting talks, social events, and space for discussion with developers. All of this enabled thanks to the organization team, which did a great job by organizing the conference; they selected a great university that had a perfect structure for hosting the event. They also included social events that introduced some background about the history of the La Coruna; In my case, I enjoyed to learn a bit of the local history. About the food, the conference provided coffee breaks and lunch during all the days, all of them great!

About the community

In my case, I put as much effort as possible to learn from people. In the first day, I had a great conversation with Daniel Stone, Roman Gilg, Drew DeVault, Guido Günther and other about compositors. In particular, Daniel Stone explained a lot of details about Weston and Wayland; he also taught me how to test Weston on top of VKMS, and how to see logs. Additionally, Daniel gave me an excellent idea: add writeback to VKMS to provide visual output and other features. In the same day, Daniel explained me many things about the community organization and his work to maintain the Gitlab instance for the Freedesktop; I really enjoyed every second of our conversation.

Additionally, I met a group of Sway developers during lunch. After a small chat, for some reason they took their laptops and started to play with Sway; I got really impressed with their work and enthusiasm. Then, I decided that I wanted to learn how to contribute with Sway for two reasons: I want to learn more about graphics in the user space (compositors), and I want to use a new desktop environment. Afterwards, I started asking Drew to teach me how to compile and use Sway. He was really kind, he showed me many things about compositor then pointed me directions to better get into this world.

On the second day, I was obsessed about writeback, and I tried to talk with Brian Starkey; he is the original author of the patch that added writeback to DRM. We spoke for one hour, Bryan explained me so many details about writeback and gave me some ideas on how I could implement it on VKMS. In the end, he also sent me an email with diagrams that he made on-the-fly and some extra explanation about the subject. I am happy that I had the opportunity to learn so many things from him. In the same day, I also got a chance to talk to Arkadiusz Hiler about some of the patches that I sent to IGT, and I also made lots of questions about IGT. He explained with details, how I could read the intel CI report and other related stuff. I hope that after his explanations I can improve my patches and also send much more for IGT.

On the third day, Haneen and I worked together to learn as much as we could by asking many things to Daniel Vetter. We used the opportunity to clarify many of our questions, and also discuss some patches we sent. At the end of our conversation, I applied to become co-mentor in the Outreachy; I hope that I can help bringing new people to become part of this fantastic community.

This is just a brief summary of XDC, I took every single opportunity that I had to talk to people and learned something new.

I finally met Haneen, Daniel Vetter, Sean Paul, Martin Peres, and Arkadiusz Hiler

One exciting thing about working remotely it is the fact that you talk with many people without really know them in person. In particular, I worked with Haneen for such a long time, but I have never seen her; however, in the XDC I finally met her! It was really nice to talk to her, both of us were together most of the time trying to understand as much as we could; as a result, we always helped each other in the event to better understand the concepts that someone would explained us.

I also met Daniel Vetter, and Sean Paul, both of them were our mentors during summer. I really enjoyed to talk with them and put a face on the nickname. Additionally, I met Martin Peres, thanks to him I created this website to keep reporting my work and also thanks to him I could enjoy XDC.

Finally, I met Hiler from Intel. He provided many feedback on the patches I sent IGT; he also helped a lot in the IRC channel. I really liked to meet him in person.


During the event, Haneen and I received so many feedback on how we could improve the VKMS that we decided to do a workshop in the last day. The workshop was great, a lot of people joined, and we took note of new ideas. From the conversation, we emerged the following list of tasks:

Now that I learned a lot and collected so many feedback, I will work on the following steps:

  1. Implement writeback support
  2. Implement configfs system
  3. Mentor a newcomer in the outreachy
October 03, 2018

XDC 2018 This year, Developers’ Conference (XDC 2018) happened in the Computer Science Faculty of University of Coruña, in the city of A Coruña, Spain during the last week of September, from Wednesday 26th to Friday 28th. XDC 2018 was a 3-day conference full of talks about all the technologies about free software graphics stack covering different topics like graphics driver development, testing and benchmarking, DRM, X, virtualization… Check the schedule for more info, slide decks and, once we have them edited, videos. For those loving statistics, we had 18 main track talks, 4 Workshops, 17 lightning talks, 3 social events, hallway tracks… Amazing!

XDC 2018 photo

This year we had 110 registered attendees and +40 students from the University of A Coruña attending it. Taking into account that some people couldn’t come at the very last minute, we estimate that we had ~140 attendees in total, which is probably the most successful XDC conference ever in terms of attendance. Honestly, we were a bit worried as coming to A Coruña is not so easy as other international hubs, so thanks everybody for coming to A Coruña!


This year the conference was organized by GPUL (Galician Linux User and Developer Group founded in 1998) together with University of A Coruña, Igalia and, of course, X.Org Foundation. The organization team was composed by 12 volunteers, some from Igalia and the rest from GPUL, who were taking care that everything went fine and fixed all the late minute issues that happened. I hope GPUL can keep organizing events for another 20 years! :-D

However, we are not perfect. Feel free to send us your feedback to both and, we would like to improve the organization of both next XDC conferences and our own local conferences.

Thanks to Igalia for allowing me organizing this event, for their platinum sponsorship and for sponsor Tuesday and Wednesday events.


October 01, 2018
A big project I've been working on recently for Fedora Workstation is what we call flickerfree boot. The idea here is that the firmware lights up the display in its native mode and no further modesets are done after that. Likewise there are also no unnecessary jarring graphical transitions.

Basically the machine boots up in UEFI mode, shows its vendor logo and then the screen keeps showing the vendor logo all the way to a smooth fade into the gdm screen. Here is a video of my main workstation booting this way.

Part of this effort is the hidden grub menu change for Fedora 29. I'm happy to announce that most of the other flickerfree changes have also landed for Fedora 29:

  1. There have been changes to shim and grub to not mess with the EFI framebuffer, leaving the vendor logo intact, when they don't have anything to display (so when grub is hidden)

  2. There have been changes to the kernel to properly inherit the EFI framebuffer when using Intel integrated graphics, and to delay switching the display to the framebuffer-console until the first kernel message is printed. Together with changes to make "quiet" really quiet (except for oopses/panics) this means that the kernel now also leaves the EFI framebuffer with the logo intact if quiet is used.

  3. There have been changes to plymouth to allow pressing ESC as soon as plymouth loads to get detailed boot messages.

With all these changes in place it is possible to get a fully flickerfree boot today, as the video of my workstation shows. This video is made with a stock Fedora 29 with 2 small kernel commandline tweaks:

  1. Add "i915.fastboot=1" to the kernel commandline, this removes the first and last modeset during the boot when using the i915 driver.

  2. Add "plymouth.splash-delay=20" to the kernel commandline. Normally plymouth waits 5 seconds before showing the charging Fedora logo so that on systems which boot in less then 5 seconds the system simply immediately transitions to gdm. On systems which take slightly longer to boot this makes the charging Fedora logo show up, which IMHO makes the boot less fluid. This option increases the time plymouth waits with showing the splash to 20 seconds.

So if you have a machine with Intel integrated graphics and booting in UEFI mode, you can give flickerfree boot support a spin with Fedora 29 by just adding these 2 commandline options. Note this requires the new grub hidden menu feature to be enabled, see the FAQ on this.

The need for these 2 commandline options shows that the work on this is not yet entirely complete, here is my current TODO list for finishing this feature:

  1. Work with the upstream i915 driver devs to make i915.fastboot the default. If you try i915.fastboot=1 and it causes problems for you please let me know.

  2. Write a new plymouth theme based on the spinner theme which used the vendor logo as background and draws the spinner beneath it. Since this keeps the logo and black background as is and just draws the spinner on top this avoids the current visually jarring transition from logo screen to plymouth, allowing us to set plymouth.splash-delay to 0. This also has the advantage that the spinner will provide visual feedback that something is actually happening as soon as plymouth loads.

  3. Look into making this work with AMD and NVIDIA graphics.

Please give the new flickerfree support a spin and let me know if you have any issues with it.
There have questions about the new GRUB hidden menu Change in various places, here is a FAQ which hopefully answers most questions:

1. What is the GRUB hidden menu change?

See the Detailed Description on the change page. The main motivation for adding this is to get to a fully flickerfree boot.

2. How to enable hidden GRUB menu?

On new Fedora 29 Workstation installs this will be enabled by default. If your system has been upgraded to F29 from an older release, you can enable it by running these commands:

On a system using UEFI booting ("ls /sys/firmware/efi/efivars" returns a bunch of files):

sudo grub2-editenv - set menu_auto_hide=1
sudo grub2-mkconfig -o /etc/grub2-efi.cfg

On a system using legacy BIOS boot:

sudo grub2-editenv - set menu_auto_hide=1
sudo grub2-mkconfig -o /etc/grub2.cfg

Note the grub2-mkconfig will overwrite any manual changes you've made to your grub.cfg (normally no manually changes are done to this file).

If your system has Windows on it, but you boot it only once a year so you would still like to hide the GRUB menu, you can tell GRUB to ignore the presence of Windows by running:

sudo grub2-editenv - set menu_auto_hide=2

3. How to disable hidden GRUB menu

To permanently disable the auto-hide feature run:

sudo grub2-editenv - unset menu_auto_hide

That is it.

4.How to access the GRUB menu when hidden

If for some reason you need to access the GRUB menu while it is hidden there are multiple ways to get to it:

  1. If you can get to gdm, access the top-right menu (the system menu) and click on the power [⏻] icon. Then keep ALT pressed to change the "Restart" option into "Boot Options" and click "Boot Options".

  2. While booting keep SHIFT pressed, usually you need to first press SHIFT when the vendor logo is shown by the firmware / when the firmware says e.g. "Press F2 to enter setup" if you press it earlier it may not be seen. Note this may not work on some machines.

  3. During boot press ESC or F8 while GRUB loads (simply press the key repeatedly directly after power on until you are in the menu).

  4. Force the previous boot to be considered failed:

    1. Press CTRL + ALT + DEL while booting so that the system reboots before hitting gdm

    2. Press CTRL + ALT + F6 to switch away from gdm, followed by CTRL + ALT + DEL.

    3. Press the power-button for 4 seconds to force the machine off.

    Either of these will cause the boot_success grub_env flag to not get set and
    the menu will show the next boot.

  5. Manually set the menu show once flag by running: "grub-set-bootflag menu_show_once" This will cause the menu to show for 60 seconds before continuing with the default boot-option.

5. When is a boot considered successful ?

The boot_success grub_env flag gets set when you login as a normal user and your session lasts at least 2 minutes; or when you shutdown or restart the system from the GNOME system (top-right) menu.

So if you e.g. login, do something and then within 30 seconds type reboot in a terminal (instead of doing the reboot from the menu) then this will not count as a successful boot and the menu will show the next boot.
September 28, 2018

Intro slide


If you're curious about the slides, you can download the PDF or the ODP.


This post has been a part of work undertaken by my employer Collabora.

I would like to thank the wonderful organizers of All Systems Go!, the @ASGConf for hosting a great event.

September 25, 2018

Plasma 5.14 is right around the corner, time to write again an update like I did for 5.13 on what was achieved in terms of Wayland and what is in the work.

On blocking and reprioritizing work

First I directly admit on what I did teaser for 5.14 in my last update but what will not make it: generic gamma color correction per display. There are two reasons for it. The first one is that some preliminary patches, which needed to be merged first, endured long review phases. The second reason is, that when these patches finally were accepted I had shifted my focus on some other topics, which I decided to give an higher priority.

Before delving into these other topics, a short analysis on why the reviews took so long: first there were of course some improvements possible to my patches, but after these got pointed out in the reviews I did fix them back then pretty quickly. The more striking reason is though that we are just short on people who can actually review KWin code changes, in particular with Martin being not maintainer anymore. That is not only a problem for my proposed code changes, but for anyone’s patches to KWin. And this hasn’t improved since back then. We must find a way to reduce the review pressure on the people being capable of doing reviews somehow, at best with only a minimal hit in code quality. I don’t have a full solution for this problem yet, we will see if we find a good one.

After this is out of the way, let us talk about these other features, which I prioritized higher.

Xwayland drag and drop translation

Drag and drop is an important workflow on desktop platforms. In our Wayland session Xwayland clients were able to do this between each other through the X protocol provided by Xwayland. Wayland native clients were able to do it with the Wayland interfaces for data sharing, what already has been implemented in most aspects in KWin and KWayland.

But these two world were separated. Wayland native clients could not drop something on an Xwayland client and vice versa. For that to work a translation layer needed to be created. We had already some small workaround to translate the clipboard in place, but a similar small workaround would not have worked for drag and drop.

Nevertheless I have a solution now, not a small one for sure, but it is similar to how Weston and wlroots try to solve the task, what will hopefully allow us to work closer together in this regard in the future. The solution also rewrites the clipboard mechanism to have it more in line with the other compositors and to be better integrated in the Xwayland handling part of KWin. All the details I omit here for now, but if there is interest I will write another article only about this subject.

In regards to the other compositors one last comment: the Weston solution seems to be only partly done and wlroots still struggles with a few remaining issues, but it was still immensely helpful to read their code to get a first understanding on what needs to be done besides some other literature. I hope I can repay the favor by providing now some more information to them on what I learned when I wrote my patches.

Pointer lock and confining reimagined

While Xwayland drag and drop translation took up the majority of my time in the last three months and is somewhat more important to the majority of our users, who do daily work in the browser, there is a particular subset of users, who will be very happy about a series of changes I did before that and which even already will be available in 5.14. I am talking about our support for pointer constraining, what consists of pointer locking and pointer confining, and the particular subset of users affected by that are gamers.

In the past we already supported pointer constraints, but the support lacked in numerous aspects. I won’t go into detail, but anybody who tried to play some games in a Wayland session can probably tell you about problems, starting from annoyingly often displayed warning messages to never locked or never unlocked cursors. A list of issues and what I did to solve them, can be found here.

The end result should be that pointer constraints can be used from 5.14 on without any hiccups and without even being noticed. This should improve the quality of our Wayland session for gamers drastically and I have planned some more changes for the future to improve it even more.

Touch drags and input redirection rework

A small missing item I noticed when working on Xwayland drag and drop was that we did not yet support Wayland native drags executed through input on a touch screen. To enable these only a few small code additions to KWayland and to KWin were necessary.

But when working on this feature and with my experience from the Xwayland drag and drop project it became all so more apparent to me that our input redirection code for pointer and touch devices needed a serious overhaul. In general the original idea behind it was fine: input redirection detects which client, decoration or internal window (a surface being created by KWin to use on its own, for example for depicting its effects) has device input focus. There is an update function per device, which redetermines the target, and there are filters to run though to channel off events in case of compositor overruling for example by some effect. But in detail there was much code duplication going on and the update function was recalled rather randomly throughout numerous filters.

My patch improves the code in both aspects: the update function gets called only once per event and code duplication is reduced through more inheritance usage. Also it is now more clear when the input focus is on the decoration of a client or if the client is an internal window and what to do then. Overall the effects of this rework will not be directly visible to the user, but it will give us a stronger foundation to build upon in the future.


There is some more foundational work without a direct huge visible payoff necessary, in particular to how internal windows are being treated in KWin, which spawned some nasty problems in the past and needs a more thorough investigation than quick workarounds to ease the immediate pain.

Other foundational work, but which at least has direct impact, is the reorganization of our painting pipeline for multiple outputs at different frequencies, what was mentioned above already as an improvement for gamers. Doing this will not be a small feat and we will see when there is time for it. I hope sooner than later.

What should not be forgotten when talking about this particular feature is adaptive synchronization, better known under AMD’s trademark FreeSync and that we want to support this technology of course as well. But the patches to the kernel and Mesa only recently have been posted to the respective mailing lists and I have literally no idea what is necessary from our side to enable it.

This and others topics to discuss is what I am looking forward to when the X.Org Developer’s Conference 2018 starts tomorrow morning in A Coruña. I traveled a bit through Galicia in the last few days with a rental car and will arrive in the city at some point later today (by the way the picture accompanying this article is from one of the stops I made). Looking at the program and the attendees this should become a very interesting conference. If you are interested as well, you can follow the livestream starting tomorrow with the regular program.

September 24, 2018

So anyone reading my blog posts would probably have picked up on my excitement for the PipeWire project, the effort to unify the world of Linux audio, add an equivalent video bit and provide multimedia handling capabilities to containerized applications. The video part as I have mentioned before was the critical first step and that is starting to look really good with the screen sharing functionality in GNOME shell already using PipeWire and equivalent PipeWire support being added to KDE by Jan Grulich. We have internal patches for both Firefox and Chrome(ium) which we are polishing up to propose them upstream, but we will in the meantime offer them as downstream patches in Fedora as soon as they are ready for primetime. Once those patches are deployed you should have any browser based desktop sharing software, like Google Hangouts, working fully under Wayland (and X).

With the video part of PipeWire already in production we decided the time has come to try to accelerate the development of the audio bits. So PipeWire creator Wim Taymans, PulseAudio developer Arun Raghavan and myself decided to try to host a PipeWire hackfest this fall to bring together many of the core Linux audio developers to try to hash out a plan and a roadmap. So I am very happy to say that at the end of October we will have a gathering in Edinburgh to work on this and the critical people we where hoping to have there are coming. Filipe Coelho who is the current lead developer on Jack will be there alongside Arun Raghavan, Colin Guthrie and Tanu Kaskinen from PulseAudio, Bastien Nocera from the GNOME project and Jan Grulich from KDE will be there representing desktop integration and finally Nirbheek Chauhan, Nicolas Dufresne and George Kiagiadakis from the GStreamer project. I think we have about the right amount of people for this to be productive and at the same time have representation from everyone who needs to be there, so I am feeling very optimistic that we can come out of this event with both a plan for what we want to do and the right people involved to make it happen. The idea that we can have a shared infrastructure for consumer level audio and pro-audio under Linux really excites me and I do believe that if we do this right Linux will take a huge step forward as a natural home for pro-audio desktop users.

A big thanks you to the GNOME Foundation for sponsoring this event and allow us to bring all this people together!

September 15, 2018

XDC 2018 is going to happen in the Computer Science Faculty of University of Coruña, in the city of A Coruña, Spain during the last week of September, from Wednesday 26th to Friday 28th. This is a 3-day conference full of talks about all the technologies about free software graphics stack covering different topics like graphics driver development, testing and benchmarking, DRM, X, virtualization… Check the schedule for more info.

As member of the organization team, I’m proud to announce that there are several surprises for the attendees :-)

  • Tuesday: welcome party, sponsored by Igalia. Traditionally, XDC organizes a networking event the day before the conference, in order to meet each other and share some drinks and food together. This year, the welcome party is going to happen at the Breen’s Tavern, Praza Maria Pita, 24, A Coruña, from 19:00 to 23:00. There will be beverages and food for free for registered attendees thanks to Igalia :-) There will be a registration desk there too.

  • Wednesday: guided tour and stand-up dinner, sponsored by Igalia. This is the first time XDC is happening in A Coruña, which is where Igalia’s headquarters are located. Igalia wants to show the city to the attendees so they can discover the hidden gems it has! There will be a bus waiting in front of the venue and will take us to the city center, where the 2-hour guided tour (in English) will start. After that, we will head to Rectorado UDC where the stand-up dinner is going to happen. The stand-up dinner is a kind of dinner where there are no chairs but only tables with food on them, facilitating chit-chatting among the attendees :-) Again, this two events are for free for registered attendees thanks to Igalia.

  • Saturday: sightseeing activity in Santiago de Compostela. Another XDC tradition is sightseeing activity for the attendees that are staying on Saturday. We are going to visit the city of Santiago de Compostela, Do you want to know more about the Way of Saint James? Walk through the old town which was designated as UNESCO World Heritage Site? We will meet in A Coruña’s train stationat 9:30, to take a train to Santiago. There, we will do a guided tour visiting the relevant parts of the old town, have lunch there and spend free time together during the afternoon until we come back by train around 18:00. The organization asks for 20 EUR to buy the train tickets and to book the guided tour, but the lunch is not included. If you want to attend it, just notify it in the registration desk at the venue.

  • Conference days: lunches are sponsorized by X.Org Foundation. This year, the X.Org Foundation is sponsoring the lunches for all the registered attendees in order to facilitate networking, discussions and spend more time together talking about free software instead of going far away for having lunch. We booked space in the nearest canteen of the campus (see itinerary) where the daily menu will be for free for registered attendees. Thanks X.Org Foundation!

The organization team wants also to help attendees. If you have any food allergy, dietary restriction or something else that we should know in advance, please contact us as soon as possible, either by or at the registration desk.

See you at Developer’s Conference 2018!

September 10, 2018

All Systems Go! 2018 Tickets Selling Out Quickly!

Buy your tickets for All Systems Go! 2018 soon, they are quickly selling out! The conference takes place on September 28-30, in Berlin, Germany, in a bit over two weeks.

Why should you attend? If you are interested in low-level Linux userspace, then All Systems Go! is the right conference for you. It covers all topics relevant to foundational open-source Linux technologies. For details on the covered topics see our schedule for day #1 and for day #2.

For more information please visit our conference website!

See you in Berlin!

September 09, 2018

Last month KDE Akademy was held in Vienna. It was the first Akademy I visited and there wasn’t yet time to write a bit about the impression I got from it, judging what was nice and what could be improved from the point of view of someone new to it. Time to catch up on that.

Akademy came at a bad point in time for me. I was right in the middle of writing code for a larger feature in KWin’s Wayland session: drag-and-drop support between Wayland native and Xwayland windows. When I began the work on this feature back in July I hoped that I could finish it until Akademy. Not being able to do so felt demotivating, but I have to admit my plan was way too optimistic anyways. Only now, several weeks after Akademy, I feel comfortable enough about my code to use it on my work system without constant anxiety for fatal session crashes. But anyway, I went to my first Akademy with a bit less enthusiasm, as I otherwise probably would have shown. On the other side this gives me maybe also a more neutral take on it.

Akademy is basically split into two phases: the talks at the beginning on Saturday and Sunday and the BoFs for the rest of the time from Monday till Friday.

The talks

This is basically what you expect from a conference. Talks by people involved in the community or friends from the outside about different topics related to KDE. And these topics were very different indeed, what shows how large the KDE community and its reach really is. Often KDE is identified with the desktop environment Plasma alone, but there is much more to it. Having this kind of diversity in topics is a great plus to Akademy.

Still one has to say that Plasma is KDE’s flagship offering and by that deserves a central spot in the lineup. So judging from a Plasma dev point of view was this the case at this Akademy? That’s difficult to judge. There were interesting longer talks by David and Nate about the core Plasma Desktop experience, David’s talk technical, Nate’s talk on a broader more visionary note they both presented a path forward for Plasma. There were also talks about Plasma in other contexts, for example by Bhushan about Plasma on Mobile.

So judging by quantity there were enough Plasma talks I would say. Also I can’t really complain, since I personally did not contribute to increasing the quantity by submitting a talk proposal myself. The reason was just that on my first Akademy I wanted to be a listener only. So let us say quantity was fine and quality from what I can tell as a listener as well.

Still there was a distinct feel of too few, too disconnected, and I believe especially too disconnected describes it correctly, not only in regards to Plasma. The talks were spread out over two days in two different sized rooms somewhat randomly. There was no noticeable order to them or a grand arc connecting them besides being somewhat KDE related. One could argue this is a direct result of the KDE community being so diverse in interests and topics. But I believe with good planning by some key figures from the different parts of the community the talks could have been placed better and feel less disjointed. Also more relevant key notes would have helped. They could have provided for example an overarching focus point of the conference interconnecting the talks.

The BoFs

The second part of Akademy were the BoFs, what went for a full workweek. BoF means Birds of a feather and it denotes an informal discussion group about a specific topic. There were many BoFs, often in parallel, and I visited lots of them.

I won’t go into detail for all of them, but a general impression I got was that there is no clear definition of what a BoF should be and what should happen. What is kind of the point of a BoF but somewhat still often disappointing. In an extreme case a BoF I visited was more an internal discussion of a team of half a dozen people working closely together and left everybody just interested in learning more about the project on the outside since they did not know enough about the project’s internals yet. The BoF itself was not named as such and outsiders visiting out of interest were for sure disappointed. Other BoFs were more welcoming to newcomers, but still lacked a guiding structure or just someone moderating the discussion efficiently.

A plan to improve the BoF sessions could be the following: split them up into a mandatory common / introductory part and an optional team / current progress / special topic part, and advertise them as such to conference attendees. While both parts would still be open to anyone, the common part would be specifically suited for newcomers to the project while the team part can be used to discuss current topics in-depth. This at first sounds like it is double the work for project members, but the common part could be simply some reusable presentation slides explaining the core structure of the project together with a Q&A session. So I believe the additional amount of work is small. Also people would know what they get into and could plan their time efficiently. Besides one person from the team, which would probably most often be the maintainer or project lead, others could avoid the introductory part and newcomers could avoid the team part if they don’t yet feel knowledgeable enough to follow the discussion.

The organisation

In general I felt the conference was well organized for being done by volunteers, who were by the way super friendly and keen on helping and solving problems. There were a few issues ranging from non-functional printers to a way too small social event site, but they were so minor to not be worth delving into them more.

But there is one large issue that can not be ignored, and this is the availability and quality of videos form the conference. The videos have just been published recently and I think this is too late. They should be online not later than one week after Akademy.

And watching these videos is no fun at all. The talk recordings have a low resolution, filmed from way back in the room, and the voice quality often is abysmal. In comparison to last Akademy the vidoes already have improved, but they still lack the quality you would expect from an open tech community having the web as the main communication channel.

As an example what a good recording looks like take a look at this talk out of the chamber of darkness. I personally would take some of the Pineapple money and pay a team of professionals at next Akademy to record the talks in HD and upload them timely if we can not do it on our own. Enabling people who can not travel to Akademy to watch the presentations pleasantly and the additional exposure are always worth it.

The social and unofficial program

Of course technical topics are not everything about Akademy. The conference exists also to connect like-minded people and let otherwise semi-anonymous contributors meet each other in person, what often enables them to work better together on KDE projects in the future. Overall for me personally this was fine. I know most of the Plasma devs already and we just had a meeting in Berlin a few months ago. So there was not really that much need to talk in person again, although it helped for some current hot topics of course.

But there were some members of the VDG I was really looking forward to meet in person and the discussions we had were very fruitful. Also it was great to get to know Carlos from the GNOME project. I hope we can improve the collaboration of our communities in term of Wayland in the future. What I missed was talking more with some of the KDE Apps and Frameworks devs. I am not sure why this was. Maybe my personal area of work just does not intersect that much with apps developers at the moment.


This article was in some parts quite critical to Akademy, but that does not mean the conference was bad. Quite the contrary, I enjoyed meeting all the members and friends of the KDE community and I can recommend going there to anyone from user to maintainer of KDE software. It doesn’t matter what kind of KDE offering you use or contribute to, you will find sessions that interest you and people who share that same interest. Next Akademy is in less than a year and I look forward to meeting you there again or for the first time.

September 07, 2018

TWIV3D has been inactive for a bit, because I’ve been inactive for a bit. On August 15th, we added Moss Anholt to our family:


So most of my time has been spent laying around with them and keeping us fed instead of writing software.

Here and there I’ve managed to write some patches. I’ve been working on meeting the current GLES3 CTS’s requirement for a 565 window system buffer being available. We don’t expose those on X11 today, because why would you? Your screen is 8888, so all you could do with 565 is render to a pixmap or buffer. Rendering to a pixmap is useless, becasue EGL doesn’t tell you whether R is in the top or bottom bits, so the X11 client doesn’t know how to interpret the rendered contents other than glReadPixels() (at which point, just use an FBO). My solution for this bad requirement from the CTS has been to add support for 565 pbuffers, which is pointless but is the minimum. This led to a patch to the xserver and a fix to V3D for blending against RTs without alpha channels.

I’m particularly proud of a small series to create a default case handler function for gallium pipe caps. Previously, every addition of a PIPE_CAP (which happens probably at least 10 times a year) has to touch every single driver, or that driver will start failing. It’s easy to forget a driver, particularly when rebasing a series across the addition of a new driver. And for driver authors, you’re faced with this list of 100 values which you don’t know the correct state of when starting to write a driver. The new helper lets CAP creators set a default value (usually whatever the hardcoded value was before their new CAP), and lets driver authors specify a much smaller set of required caps to get started.

Just before Moss arrived, I had landed some patches to VC4 to improve texture upload/download performance. Non-GL SDL apps on glamor were really slow, because the window wouldn’t be aligned to tiles and we’d download the tile-aligned area from the framebuffer to a raster-order temporary, store our new contents into it, then reupload that to the screen. The download from write-combined memory is brutally slow. My new code (mostly a rebase of patches I wrote in January 2017!) got the following results:

  • Improves x11perf -putimage10 performance by 139.777% +/- 2.83464% (n=5)
  • Improves x11perf -putimage100 performance by 383.908% +/- 22.6297% (n=11)
  • Improves x11perf -getimage10 performance by 2.75731% +/- 0.585054% (n=145)

I also wrote a followup patch to avoid the copy from the user’s buffer into the temporary in the driver for texture uploads, for another 12.1586% +/- 1.38155% (n=145)

Other things since my last post:

  • Fixed VCM cache size setup and the V3D driver’s idea of the VPM size.
  • Fixed a leak of X11 pixmaps backing pbuffers on DRI3.
  • Fixed vc4_fence_server_sync() on pre-syncobj kernels.
  • Fixed vc4 uniform offsets after a sampler in a structure.
  • Improved vc4 uniform upload performance.
  • Fixed vc4 MSAA tile loads when both Z and color are needed.
  • Fixed vc4 rendering to cubemap EGL images.
  • Fixed vc4 leak of the no-vertex-elements workaround BO.
  • Fixed some v3d register spilling bugs.
September 04, 2018

libinput 1.12 was a massive development effort (over 300 patchsets) with a bunch of new features being merged. It'll be released next week or so, so it's worth taking a step back and looking at what actually changed.

The device quirks files replace the previously used hwdb-based udev properties. I've written about this in more detail here but the gist is: we have our own .ini style file format that can match on devices and apply the various quirks devices need. This simplifies debugging a lot, we can now reliably tell users why a quirks file applies or doesn't apply, historically a problem with the hwdb.

The sphinx-based documentation was merged, fixed and added to. We switched to sphinx for the docs and the result is much more user-friendly. Which was the point, it was a switch from a developer-oriented documentation to a user-oriented one. Not that documentation is ever finished.

The usual set of touchpad improvements went in, e.g. the slight motion on finger up is now ignored. We have size-based thumb detection now (useful for Apple touchpads!). And of course various quirks for better pressure ranges, etc. Tripletap on some synaptics touchpads had a tendency to cause multiple taps because of some weird event sequence. Movement in the software button now generates events, the buttons are not just a dead zone anymore. Pointer jump detection is more adaptive now and catches and discards smaller jumps that previously slipped through the cracks. A particularly quirky behaviour was seen on Dell XPS i2c touchpads that exhibit a huge pointer jump, courtesy of the trackpoint controller going to sleep and taking its time to wake up. The delay is still there but the pointer at least lands in the correct location.

We now have improved direction-locking for two-finger scrolling on touchpads. Scrolling up/down should not generate horizontal scroll events anymore as long as the movement is close enough to vertical. This feature is transparent, a diagonal or horizontal movement will immediately disable the direction lock and produce horizontal scroll events as expected.

The trackpoint acceleration has been re-done, see this post for more details and links to the background articles. I've only received one bug report for the new acceleration so it seems to work quite well now. Trackpoints that send events in bursts (e.g. bluetooth ones) are smoothened now to avoid jerky movement.

Velocity averaging was dropped to increase pointer accuracy. Previously we averaged the velocity across multiple events which makes the motion smoother on jittery devices but less accurate on good devices.

We build on FreeBSD now. Presumably this also means it works on FreeBSD :)

libinput now supports palm detection on touchscreens, at least where the ABS_MT_TOOL_TYPE evdev bit is provided.

I think that's about it. Busy days...

September 02, 2018

A 3D model of a cat with Phong shading

In January 2018, I began using my C201 full-time. In June 2018, its charging port broke, apparently due to physical damage, and I was forced to discontinue my school laptop and Mali T760 development machine.

Rest in peace, Veyron Speedy.

In its place, I purchased adopted Kevin, better known by its brand name, the Samsung Chromebook Plus (v1). Kevin contains a 64-bit Rockchip RK3399 processor, a substantial upgrade over the C201’s Rockchip RK3288, with a Mali T860 graphics processor. The T860, a new version of Mali Midgard in between its predecessor T760 and its successor Bifrost, was not supported in Panfrost, the community-led free software driver for modern Mali GPUs ubiquitous in phones and Chromebooks.

Meanwhile, back in May, TL Lim of Pine64 generously offered to send me a RK3399 development board, a ROCKPRO64. During my university days, the board arrived… and so work begun on T860 support.

The new hardware contains a number of challenges unseen in its predecessors. For instance, the RK3288 is a 32-bit processor (armv7) whereas the RK3399 is 64-bit (armv8). While Midgard is natively 64-bit, there are many memory-saving optimizations specifically for 32-bit machines like the RK3288. For instance, in the 32-bit command stream, the field encoding line width is stuffed in between a pointer at the end of the header and the beginning metadata of the tiler payload, in a spot that normally would contain padding to align everything. On a 64-bit machine, however, the preceding pointer is a full 8 bytes, and there is no room for the line width hack to function; this field is instead moved from the first to the last index in the data structure.

Another small change that took far too long to figure out was the modified kernel API for importing memory on 64-bit systems. Imports are necessary to setup the framebuffer correctly, avoiding expensive fullscreen blits. Once I read through the 64-bit portions of the kernel module, it was clear how to implement the new calls, but this change nevertheless slowed initial progress.

Another more challenging departure was the FBDs. From the kernel sources, we know there are two mysterious acronyms, SFBD and MFBD. We figured out these referred to the Framebuffer Descriptor. Why two? It turns out that older Midgard used the SFBD, and newer Midgard and Bifrost use the MFBD. The change coincided with the introduction of multi-render target (MRT) support in the newer chips. Accordingly, we are guessing these are the Single Framebuffer Descriptor (SFBD) and Multiple Framebuffer Descriptor (MFBD), though we can of course never be sure.

In any event, while the T760 supported MFBDs, it was possible to use a 32-bit SFBD instead. On the newer T860, due to the upgraded development hardware, we had no choice but to support the newer data structure. Luckily, Connor had already decoded the majority of the MFBD as part of his investigation in Bifrost. I was easily able to adapt his changes for the T860-family of processors, et voila, the command stream was making sense.

After decoding the new parts of the command stream, I implemented support in the Gallium driver for the new structures; as structures like the framebuffer descriptor are so fundamental, this implied significant changes. But after some fervent coding, the driver for T860 reached feature parity with Panfrost on the T760.

Turning our attention to shaders, it turns out there are no significant differences in the shader core between these two chips. That is, the minute I could replay a triangle on my laptop, we had free shaders. Woohoo!

Once the new hardware was setup, I turned my focus to cleaning up the codebase, implementing more of OpenGL ES 2.0, and debugging issues in the existing implementations. I squashed bugs (“Ow!”) ranging from incorrect stencil testing to broken colour masking in the absence of blending to missing movs in certain shaders. In terms of new features, I implemented the scissor test and added support for shaders with multiple varyings, a common pattern which did not show up in the simpler 3D tests. Multi-varying support required changes in both the command stream driver and the compiler, although we’re now able to understand why varyings are encoded in the (quirky) way they are.

I also (finally) fixed glReadPixels support, so we can start testing with dEQP and Piglit. The results remain somewhat depressing, but it’s a starting point for further driver debugging as we make the jump from proof-of-concept to production code.

Soup’s up, everypony!

(The soup is code.)

On the Bifrost side, Connor Abbott and Lyude Paul have been busy!

After decoding more of the Bifrost ISA, Connor turned his attention to the Bifrost command-stream, an iterative improvement on the Midgard command-stream. Building on my earlier work with the Midgard command stream, Connor identified analogue structures between the two versions, as well as marking off new structures including the aforementioned MFBD.

After decoding these initial structures, he implemented support in panwrap for generating replay-style traces of Bifrost command streams, again building on the infrastructure originally built for Midgard. Although he did not make direct use of the replay facilities in panwrap, these changes did enable a great deal of the Bifrost command stream to be elucidated.

He then went a step further, beyond OpenGL ES 2.0 vertex shaders and fragment shaders, venturing into the land of… compute shaders. Dun dun dun! (“What, no scary music?” “Sorry, Proselint says you already violated the 30ppm of exclamations rule. We are therefore forbidden from aiding your overexcited shenanigans.” “Fiiine. Do I at least get a singing entourage?” “Definitely not.”)

Incidentally, the initial investigations into compute shaders have shed light on vertex shaders, which are internally structured like compute jobs on both Bifrost and Midgard, with each vertex corresponding to a single invocation. However, Connor had a more immediate purpose in mind with his pursuit of compute shaders: gaining the ability to execute arbitrary shader binaries on a Bifrost board, implementing a “shader runner” to use the trochaic name. Midgard never needed this tool, as I have generally implemented the command stream parts of the driver in advance of the compiler, but Bifrost’s ISA decoding is considerably further along than its graphics pipeline. Through his earlier work with panwrap, he was successfully able to build such a tool, directly submitting simple compute jobs with offline precompiled shaders to the Bifrost GPU!

That does beg the question – where do these shader binaries come from, as work has not yet begun on the Bifrost compiler? As you may know, Lyude has been at work implementing a Bifrost assembler, a large task given the shear complexity of Bifrost, especially in comparison to Midgard – and a task made even harder without the ability to test on neither real hardware nor emulators. Paired with Connor’s shader runner, however, the assembler’s shader binaries can be tested, and her assembler is showing signs of life! Pairing these two components have enabled the first free shaders to run on Bifrost. While there are a few unimplemented opcodes remaining, the assembler is working well.

Returning to the shader running, beyond its immense value as a proof-of-concept, a side benefit of this instrumentation is enabling fine-grained ISA decoding. Whereas Midgard uses high-level arithmetic opcodes, with dedicated instructions for operations like a reciprocal-square root, Bifrost uses a minimalist architecture, dividing complex high-level operations into many small instructions. However, it can be difficult to discern the individual meaning of each micro opcode from simply studying disassembled compiled shaders. Nevertheless, with the new ability to assemble and run shaders, it is possible to test micro operations individually, determining their functions outside the high-level sequence. Through this technique, Connor has been able to understand the fine-grained details of operations like division on Bifrost.

All in all, our understanding of Bifrost is beginning to converge with that of Midgard. The next step, of course, will be to implement the Bifrost compiler and extend the Midgard driver to support Bifrost command streams. But hey, if someone had told me a year ago that we could run code natively, bloblessly, on virtually every Mali device, I don’t think I would have believed them. But between our work on Midgard and Bifrost, not to mention the Lima project’s work on Utgard, this dream has become a reality. In the words of Equestria Girls, what a strange new world…

But let’s not get ahead of ourselves. On the Mali T860 (Midgard), Freedreno’s test-cat and glmark2’s equivalent build test run successfully. The shaded cat screenshot at the beginning of the post is a native blobless render on the T860. Of course, the tests demoed on previous blog posts, like the shiny textured cube, continue to work on the new hardware.

Oh, and save for a funky viewport (corrected here) and some sporadic artefacts, it runs ’gears:

es2gears on the T860 with the blobless Panfrost stackes2gears on the T860 with the blobless Panfrost stack


August 30, 2018

Intro slide


If you're curious about the slides, you can download the PDF or the ODP.


This post has been a part of work undertaken by my employer Collabora.

I would like to thank the wonderful organizers of OSSummit NA, the Linux Foundation for hosting a great event.

August 27, 2018

Window Scaling

One of the ideas we had in creating the compositing mechanism was to be able to scale window contents for the user -- having the window contents available as an image provides for lots of flexibility for presentation.

However, while we've seen things like “overview mode” (presenting all of the application windows scaled and tiled for easy selection), we haven't managed to interact with windows in scaled form. That is, until yesterday.

glxgears thinks the window is only 32x32 pixels in size. xfd is scaled by a factor of 2. xlogo is drawn at the normal size.

Two Window Sizes

The key idea for window scaling is to have the X server keep track of two different window sizes -- the sarea occupied by the window within its parent, and the area available for the window contents, including descendents. For now, at least, the origin of the window is the same between these two spaces, although I don't think there's any reason they would have to be.

  • Current Size. This is the size as seen from outside the window, and as viewed by all clients other than the owner of the window. It reflects the area within the parent occupied by the window, including the area which captures pointer events. This can probably use a better name.

  • Owner Size. This is the size of the window viewed from inside the window, and as viewed by the owner of the window. When composited, the composite pixmap gets allocated at this size. When automatically composited, the X server will scale the image of the window from this size to the current size.

Clip Lists

Normally, when computing the clip list for a composited window, the X server uses the current size of the window (aka the “borderSize” region) instead of just the porition of the window which is not clipped by the ancestor or sibling windows. This is how we capture output which is covered by those windows and can use it to generate translucent effects.

With an output size set, instead of using the current size, I use the owner size instead. All un-redirected descendents are thus clipped to this overall geometry.

Sub Windows

Descendent windows are left almost entirely alone; they keep their original geometry, both position and size. Because the output sized window retains its original position, all of the usual coordinate transformations 'just work'. Of course, the clipping computations will start with a scaled clip list for the output sized window, so the descendents will have different clipping. There's suprisingly little effect otherwise.

Output Handling

When an owner size is set, the window gets compositing enabled. The composite pixmap is allocate at the owner size instead of the current size. When no compositing manager is running, the automatic compositing painting code in the server now scales the output from the output size to the current size.

Most X applications don't have borders, but I needed to figure out what to do in case one appeared. I decided that the boarder should be the same size in the output and current presentations. That's about the only thing that I could get to make sense; the border is 'outside' the window size, so if you want to make the window contents twice as big, you want to make the window size twice as big, not some function of the border width.

About the only trick was getting the transformation from output size to current size correct in the presence of borders. That took a few iterations, but I finally just wrote down a few equations and solved for the necessary values. Note that Render transforms take destination space coordinates and generate source space coordinates, so they appear “backwards”. While Render supports projective transforms, this one is just scaling and translation, so we just need:

x_output_size = A * x_current_size + B

Now, we want the border width for input and output to be the same, which means:

border_width + output_size = A * (border_width + current_size) + B
border_width               = A * border_width                   + B

Now we can solve for A:

output_size = A * current_size
A = output_size / current_size

And for B:

border_width = output_size / current_size * border_width + B
B = (1 - output_size / current_size) * border_width

With these, we can construct a suitable transformation matrix:

⎡ Ax  0 Bx ⎤
⎢  0 Ay By ⎥
⎣  0  0  1 ⎦

Input Handling

Input device root coordinates need to be adjusted for owner sized windows. If you nest an owner sized window inside another owner sized window, then there are two transformations involved.

There are actually two places where these transformations need to be applied:

  1. To compute which window the pointer is in. If an output sized window has descendents, then the position of the pointer within the output window needs to be scaled so that the correct descendent is identified as containing the pointer.

  2. To compute the correct event coordinates when sending events to the window. I decided not to attempt to separate the window owner from other clients for event delivery; all clients see the same coordinates in events.

Both of these require the ability to transform the event coordinates relative to the root window. To do that, we translate from root coordinates to window coordinates, scale by the ratio of output to current size and then translate back:

OwnerScaleCoordinate(WindowPtr pWin, double *xd, double *yd)
    if (wOwnerSized(pWin)) {
        *xd = (*xd - pWin->drawable.x) * (double) wOwnerWidth(pWin) /
            (double) pWin->drawable.width + pWin->drawable.x;
        *yd = (*yd - pWin->drawable.y) * (double) wOwnerHeight(pWin) /
            (double) pWin->drawable.height + pWin->drawable.y;

This moves the device to the scaled location within the output sized windows. Performing this transformation from the root window down to the target window adjusts the position correctly even when there is more than one output sized window among the window ancestry.

Case 1. is easy; XYToWindow, and the associated miSpriteTrace function, already traverse the window tree from the root for each event. Each time we descend through a window, we apply the transformation so that subsequent checks for descendents will check the correct coordinates. At each step, I use OwnerScaleCoordinate for the transformation.

Case 2. means taking an arbitrary window and walking up the window tree to the root and then performing each transformation on the way back down. Right now, I'm doing this recursively, but I'm reasonably sure it could be done iteratively instead:

ScaleRootCoordinate(WindowPtr pWin, double *xd, double *yd)
    if (pWin->parent)
        ScaleRootCoordinate(pWin->parent, xd, yd);
    OwnerScaleCoordinate(pWin, xd, yd);

Events and Replies

To make the illusion for the client work, everything the client hears about the window needs to be adjusted so that the window seems to be the owner size and not the current size.

  • Input events. The root coordinates are modified as described above, and then the window-relative coordinates are computed as usual—by subtracting the window origin from the root position. That's because the windows are all left in their original location.

  • ConfigureNotify events. These events are rewritten before being delivered to the owner so that the width and height reflect the owner size. Because window managers send synthetic configure notify events when moving windows, I also had to rewrite those events, or the client would get the wrong size information.

  • PresentConfigureNotify events. For these, I decided to rewrite the size values for all clients. As these are intended to be used to allocate window buffers for presentation, the right size is always the owner size.

  • OwnerWindowSizeNotify events. I created a new event so that the compositing manager could track the owner size of all child windows. That's necessary because the X server only performs the output size scaling operation for automatically redirected windows; if the window is manually redirected, then the compositing manager will have to perform the scaling operation instead.

  • GetGeometry replies. These are rewritten for the window owner to reflect the owner size value. Other clients see the current size instead.

  • GetImage replies. I haven't done this part yet, but I think I need to scale the window image for clients other than the owner. In particular, xwd currently fails with a Match error when it sees a window with a non-default visual that has an output size smaller than the window size. It tries to perform a GetImage operation using the current size, which fails when the server tries to fetch that rectangle from the owner-sized window pixmap.

Composite Extension Changes

I've stuck all of this stuff into the Composite extension; mostly because you need to use Composite to capture the scaled window output anyways.

12. Composite Events (0.5 and later)

Version 0.5 of the extension defines an event selection mechanism and a couple of events.

    CompositePixmapNotify = 0
    CompositeOwnerWindowSizeNotify = 1

Event type delivered in events

    CompositePixmapNotifyMask = 0x0001
    CompositeOwnerWindowSizeNotifyMask = 0x0002

Event select mask for CompositeSelectInput

⎢    CompositeSelectInput
⎢                window:                                Window
⎢                enable                                SETofCOMPOSITEEVENTMASK

This request selects the set of events that will be delivered to the client from the specified window.

    type:            CARD8          XGE event type (35)
    extension:       CARD8          Composite extension request number
    sequence-number: CARD16
    length:          CARD32         0
    evtype:          CARD16         CompositePixmapNotify
    window:          WINDOW
    windowWidth:     CARD16
    windowHeight:    CARD16
    pixmapWidth:     CARD16
    pixmapHeight:    CARD16

This event is delivered whenever the composite pixmap for a window is created, changed or deleted. When the composite pixmap is deleted, pixmapWidth and pixmapHeight will be zero. The client can call NameWindowPixmap to assign a resource ID for the new pixmap.

13. Output Window Size (0.5 and later)

⎢    CompositeSetOwnerWindowSize
⎢                window:                                Window
⎢                width:                                 CARD16
⎢                height:                                CARD16

This request specifies that the owner-visible window size will be set to the provided value, overriding the actual window size as seen by the owner. If composited, the composite pixmap will be created at this size. If automatically composited, the server will scale the output from the owner size to the current window size.

If the window is mapped, an UnmapWindow request is performed automatically first. Then the owner size is set. A CompositeOwnerWindowSizeNotify event is then generated. Finally, if the window was originally mapped, a MapWindow request is performed automatically.

Setting the width and height to zero will clear the owner size value and cause the window to resume normal behavior.

Input events will be scaled from the actual window size to the owner size for all clients.

A Match error is generated if:

  • The window is a root window
  • One, but not both, of width/height is zero

And, of course, you can retrieve the current size too:

⎢    CompositeGetOwnerWindowSize
⎢                window:                                Window
⎢                →
⎢                width:                                 CARD16
⎢                height:                                CARD16

This request returns the current owner window size, if set. Otherwise it returns 0,0, indicating that there is no owner window size set.

    type:            CARD8          XGE event type (35)
    extension:       CARD8          Composite extension request number
    sequence-number: CARD16
    length:          CARD32         0
    evtype:          CARD16         CompositeOwnerWindowSizeNotify
    window:          WINDOW
    windowWidth:     CARD16
    windowHeight:    CARD16
    ownerWidth:      CARD16
    ownerHeight:     CARD16

This event is generated whenever the owner size of the window is set. windowWidth and windowHeight report the current window size. ownerWidth and ownerHeight report the owner window size.

Git repositories

These changes are in various repositories at all using the “window-scaling” branch:

And here's a sample command line app which modifies the owner scaling value for an existing window:

Current Status

This stuff is all very new; I started writing code on Friday evening and got a simple test case working. I then spent Saturday making most of it work, and today finding a pile of additional cases that needed handling. I know that GetImage is broken; I'm sure lots of other stuff is also not quite right.

I'd love to get feedback on whether the API and feature set seem reasonable or not.

The DRM (direct rendering manager, not the content protection stuff) graphics subsystem in the linux kernel does not have a generic 2D accelaration API. Despite an awful lot of of GPUs having more or less featureful blitter units. And many systems need them for a lot of use-cases, because the 3D engine is a bit too slow or too power hungry for just rendering desktops.

It’s a FAQ why this doesn’t exist and why it won’t get added, so I figured I’ll answer this once and for all.

Bit of nomeclatura upfront: A 2D engine (or blitter) is a bit of hardware that can copy stuff with some knowledge of the 2D layout usually used for pixel buffers. Some blitters also can do more like basic blending, converting color spaces or stretching/scaling. A 3D engine on the other hand is the fancy bit of high performance compute block, which run small programs (called shaders) on a massively parallel archicture. Generally with huge memory bandwidth and a dedicated controller to feed this beast through an asynchronous command buffer. 3D engines happen to be really good at rendering the pixels for 3D action games, among other things.

There’s no 2D Acceleration Standard

3D has it easy: There’s OpenGL and Vulkan and DirectX that require a certain feature set. And huge market forces that make sure if you use these features like a game would, rendering is fast.

Aside: This means the 2D engine in a browser actually needs to work like a 3D action game, or the GPU will crawl. The impendence mismatch compared to traditional 2D rendering designs is huge.

On the 2D side there’s no such thing: Every blitter engine is its own bespoke thing, with its own features, limitations and performance characteristics. There’s also no standard benchmarks that would drive common performance characteristics - today blitters are neeeded mostly in small systems, with very specific use cases. Anything big enough to run more generic workloads will have a 3D rendering block anyway. These systems still have blitters, but mostly just to help move data in and out of VRAM for the 3D engine to consume.

Now the huge problem here is that you need to fill these gaps in various hardware 2D engines using CPU side software rendering. The crux with any 2D render design is that transferring buffers and data too often between the GPU and CPU will kill performance. Usually the cliff is so steep that pure CPU rendering using only software easily beats any simplistic 2D acceleration design.

The only way to fix this is to be really careful when moving data between the CPU and GPU for different rendering operations. Sticking to one side, even if it’s a bit slower, tends to be an overall win. But these decisions highly depend upon the exact features and performance characteristics of your 2D engine. Putting a generic abstraction layer in the middle of this stack, where it’s guaranteed to be if you make it a part of the kernel/userspace interface, will not result in actual accelaration.

So either you make your 2D rendering look like it’s a 3D game, using 3D interfaces like OpenGL or Vulkan. Or you need a software stack that’s bespoke to your use-case and the specific hardware you want to run on.

2D Accelaration is Really Hard

This is the primary reason really. If you don’t believe that, look at all the tricks a browser employs to render CSS and HTML and text really fast, while still animating all that stuff smoothly. Yes, a web-browser is the pinnacle of current 2D acceleration tech, and you really need all the things in there for decent performance: Scene graphs, clever render culling, massive batching and huge amounts of pains to make sure you don’t have to fall back to CPU based software rendering at the wrong point in a rendering pipeline. Plus managing all kinds of assorted caches to balance reuse against running out of memory.

Unfortunately lots of people assume 2D must be a lot simpler than 3D rendering, and therefore they can design a 2D API that’s fast enough for everyone. No one jumps in and suggests we’ll have a generic 3D interface at the kernel level, because the lessons there are very clear:

  • The real application interface is fairly high level, and in userspace.

  • There’s a huge industry group doing really hard work to specify these interfaces.

  • The actual kernel to userspace interfaces ends up being highly specific to the hardware and architecture of the userspace driver (which contains most of the magic). Any attempt at a generic interface leaves lots of hardware specific tricks and hence performance on the floor.

  • 3D APIs like OpenGL or Vulkan have all the batching and queueing and memory management issues covered in one way or another.

There are a bunch of DRM drivers which have a support for 2D render engines exposed to userspace. But they all use highly hardware specific interfaces, fully streamlined for the specific engine. And they all require a decently sized chunk of driver code in userspace to translate from a generic API to the hardware formats. This is what DRM maintainers will recommend you to do, if you submit a patch to add a generic 2D acceleration API.

Exactly like a 3D driver.

If All Else Fails, There’s Options

Now if you don’t care about the last bit of performance, and your use-case is limited, and your blitter engine is limited, then there’s already options:

You can take whatever pixel buffer you have, export it as a dma-buf, and then import it into some other subsystem which already has some kind of limited 2D accelaration support. Depending upon your blitter engine, a v4l2 mem2m device, or for simpler things there’s also dmaengines.

On top, the DRM subsystem does allow you to implement the traditional accelaration methods exposed by the fbdev subsystem. In case you have userspace that really insists on using these; it’s not recommended for anything new.

What about KMS?

The above is kinda a lie, since the KMS (kernel modesetting) IOCTL userspace API is a fairly full-featured 2D rendering interface. The aim of course is to render different pixel buffers onto a screen. With the recently added writeback support operations targetting memory are now possible. This could be used to expose a traditional blitter, if you only expose writeback support and no other outputs in your KMS driver.

There’s a few downsides:

  • KMS is highly geared for compositing just a few buffers (hardware usually has a very limited set of planes). For accelerated text rendering you want to do a composite operation for each character, which means this has rather limited use.

  • KMS only needs to run at 60Hz, or whatever the refresh rate of your monitor is. It’s not optimized for efficiency at higher throughput at all.

So all together this isn’t the high-speed 2D accelaration API you’re looking for either. It is a valid alternative to the options above though, e.g. instead of a v4l2 mem2m device.

FAQ for the FAQ, or: OpenVG?

OpenVG isn’t the standard you’re looking for either. For one it’s a userspace API, like OpenGL. All the same reasons for not implementing a generic OpenGL interface at the kernel/userspace apply to OpenVG, too.

Second, the Mesa3D userspace library did support OpenVG once. Didn’t gain traction, got canned. Just because it calls itself a standard doesn’t make it a widely adopted industry default. Unlike OpenGL/Vulkan/DirectX on the 3D side.

Thanks to Dave Airlie and Daniel Stone for reading and commenting on drafts of this text.

August 24, 2018
robertfoss@xps9570 ~/work/libdrm $ git ru
remote: Counting objects: 234, done.
remote: Compressing objects: 100% (233/233), done.
remote: Total 234 (delta 177), reused 0 (delta 0)
Receiving objects: 100% (234/234), 53.20 KiB | 939.00 KiB/s, done.
Resolving deltas: 100% (177/177), completed with 36 local objects.
From git://
   cb592ac8166e..bcb9d976cd91  master     -> upstream/master
 * [new tag]                   libdrm-2.4.93 -> libdrm-2.4.93
 * [new tag]                   libdrm-2.4.94 -> libdrm-2.4.94

The idea here is that we by issuing a single short command can fetch the latest master branch from the upstream repository of the codebase we're working on and set our local master branch to point to the most recent upstream/master one …

August 16, 2018

This is mostly a request for testing, because I've received zero feedback on the patches that I merged a month ago and libinput 1.12 is due to be out. No comments so far on the RC1 and RC2 either, so... well, maybe this gets a bit broader attention so we can address some things before the release. One can hope.

Required reading for this article: Observations on trackpoint input data and X server pointer acceleration analysis - part 5.

As the blog posts linked above explain, the trackpoint input data is difficult and largely arbitrary between different devices. The previous pointer acceleration libinput had relied on a fixed reporting rate which isn't true at low speeds, so the new acceleration method switches back to velocity-based acceleration. i.e. we convert the input deltas to a speed, then apply the acceleration curve on that. It's not speed, it's pressure, but it doesn't really matter unless you're a stickler for technicalities.

Because basically every trackpoint has different random data ranges not linked to anything easily measurable, libinput's device quirks now support a magic multiplier to scale the trackpoint range into something resembling a sane range. This is basically what we did before with the systemd POINTINGSTICK_CONST_ACCEL property except that we're handling this in libinput now (which is where acceleration is handled, so it kinda makes sense to move it here). There is no good conversion from the previous trackpoint range property to the new multiplier because the range didn't really have any relation to the physical input users expected.

So what does this mean for you? Test the libinput RCs or, better, libinput from master (because it's stable anyway), or from the Fedora COPR and check if the trackpoint works. If not, check the Trackpoint Configuration page and follow the instructions there.

August 13, 2018

Nothing lasts forever, and this also applies for GSoC projects. In this report, I tried to summarize my experience in the DRI community and my contributions.

Recap the project idea

First, it is important to remember the main subject of my GSoC Project:

The Kernel Mode-Setting (KMS) is a mechanism that enables a process to command the kernel to set a mode (screen resolution, color depth, and rate) which is in a range of values supported by graphics cards and the display screen. Creating a Virtual KMS (VKMS) has benefits. First, it could be used for testing; second, it can be valuable for running X or Wayland on a headless machine enabling the use of GPU. This module is similar to VGEM, and in some ways to VIRTIO. At the moment that VKMS gets mature enough, it will be used to run i-g-t test cases and to automate userspace testing.

I heard about VKMS in the DRM TODO list and decided to apply for GSoC with this project. A very talented developer from Saudi Arabia named Haneen Mohammed had the same idea but applied to the Outreachy program. We worked together with the desire to push as hard as we can the Virtual KMS.

Overcome the steep learning curve

In my opinion, the main reason for the steep learning curve came from the lack of background experience in how the graphics stack works. For example, when I took operating system classes, I studied many things related to schedulers, memory and disk management, and so forth; on the other hand, I had a 10000-foot view of graphics systems. After long hours of studying and coding, I started to understand better how things work. It is incredible all the progress and advances that the DRI developers brought on the last few years! I wish that the new versions of the Operating system books have a whole chapter for this subject.

I still have problems to understand all the mechanisms available in the DRM; however, now I feel confident on how to read the code/documentation and get into the details of the DRM subsystem. I have plans to compile all the knowledge acquired during the project in a series of blog posts.


During my work in the GSoC, I send my patches to the DRI mailing list and constantly got feedback to improve my work; as a result, I rework most of my patches. The natural and reliable way to track the contribution is by using “git log –author=”Rodrigo Siqueira” in one of the repositories below:

  • For DRM patches: git://
  • For patches already applied to Torvalds branch:
  • For IGT patches: git://

In summary, follows the main patches that I got accepted:

  • drm/vkms: Fix connector leak at the module removal
  • drm/vkms: Add framebuffer and plane helpers
  • drm/vkms: Add vblank events simulated by hrtimers
  • drm/vkms: Add connectors helpers
  • drm/vkms: Add dumb operations
  • drm/vkms: Add extra information about vkms
  • drm/vkms: Add basic CRTC initialization
  • drm/vkms: Add mode_config initialization

We received two contributions from external people; I reviewed both patches:

  • drm/vkms: Use new return type vm_fault_t
  • drm/vkms: Fix the error handling in vkms_init()

I am using IGT to test VKMS, for this reason, I decided to send some contributions to them. I sent a series of patches for fixing GCC warning:

  • Fix comparison that always evaluates to false
  • Avoid truncate string in __igt_lsof_fds
  • Remove parameter aliases with another argument
  • Move declaration to the top of the code
  • Account for NULL character when using strncpy
  • Make string commands dynamic allocate (waiting for review)
  • Fix truncate string in the snprintf (waiting for review)

I also sent a patchset with the goal of adding support for forcing a specific module to be used by IGT tests:

  • Add support to force specific module load
  • Increase the string size for a module name (waiting for review)
  • Add support for forcing specific module (waiting for review)

As a miscellaneous contribution, I created a series of scripts to automate the workflow of Linux Kernel development. This small project was based on a series of scripts provided by my mentor, and I hope it can be useful for newcomers. Follows the project link:

  1. Kworkflow

Work in Progress

I am glad to say that I accomplished all the tasks initially proposed and I did much more. Now I am working to make VKMS work without vblank. This still a work in progress but I am confident that I can finish it soon. Finally, it is important to highlight that my GSoC participation will finish at the end of August because I traveled for two weeks to join the debconf2018.

Now this is not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning - Winston Churchill

GSoC gave me one thing that I was pursuing for a long time: a subsystem in the Linux Kernel that I can be focused for years. I am delighted that I found a place to be focused, and I will keep working on VKMS until It is finished.

Finally, the Brazilian government opened a call for encouraging free software development, and I decided to apply the VKMS project. Last week, I received the great news that I was selected in the first phase and now I am waiting for the final results. If everything ends well for me, I will receive funding to work for 5 months in the VKMS and DRM subsystem.

My huge thanks to…

I received support from many people in the dri-devel channel and mailing list. I want to thanks everybody for all the support and patience.

I want to thanks Daniel Vetter for all the feedback and assistance in the VKMS work. I also want to thanks Gustavo Padovan for all the support that he provided to me (which include some calls with great explanations about the DRM). Finally, I want to thanks Haneen for all the help and great work.


  1. Use new return type vm_fault_t
  2. Fix the error handling in vkms_init()
  3. Add support to force specific module load
August 09, 2018

libinput made a design decision early on to use physical reference points wherever possible. So your virtual buttons are X mm high/across, the pointer movement is calculated in mm, etc. Unfortunately this exposed us to a large range of devices that don't bother to provide that information or just give us the wrong information to begin with. Patching the kernel for every device is not feasible so in 2015 the 60-evdev.hwdb was born and it has seen steady updates since. Plenty a libinput bug was fixed by just correcting the device's axis ranges or resolution. To take the magic out of the 60-evdev.hwdb, here's a blog post for your perusal, appreciation or, failing that, shaking a fist at. Note that the below is caller-agnostic, it doesn't matter what userspace stack you use to process your input events.

There are four parts that come together to fix devices: a kernel ioctl and a trifecta of udev rules hwdb entries and a udev builtin.

The kernel's EVIOCSABS ioctl

It all starts with the kernel's struct input_absinfo.

struct input_absinfo {
__s32 value;
__s32 minimum;
__s32 maximum;
__s32 fuzz;
__s32 flat;
__s32 resolution;
The three values that matter right now: minimum, maximum and resolution. The "value" is just the most recent value on this axis, ignore fuzz/flat for now. The min/max values simply specify the range of values the device will give you, the resolution how many values per mm you get. Simple example: an x axis given at min 0, max 1000 at a resolution of 10 means your devices is 100mm wide. There is no requirement for min to be 0, btw, and there's no clipping in the kernel so you may get values outside min/max. Anyway, your average touchpad looks like this in evemu-record:

# Event type 3 (EV_ABS)
# Event code 0 (ABS_X)
# Value 2572
# Min 1024
# Max 5112
# Fuzz 0
# Flat 0
# Resolution 41
# Event code 1 (ABS_Y)
# Value 4697
# Min 2024
# Max 4832
# Fuzz 0
# Flat 0
# Resolution 37
This is the information returned by the EVIOCGABS ioctl (EVdev IOCtl Get ABS). It is usually run once on device init by any process handling evdev device nodes.

Because plenty of devices don't announce the correct ranges or resolution, the kernel provides the EVIOCSABS ioctl (EVdev IOCtl Set ABS). This allows overwriting the in-kernel struct with new values for min/max/fuzz/flat/resolution, processes that query the device later will get the updated ranges.

udev rules, hwdb and builtins

The kernel has no notification mechanism for updated axis ranges so the ioctl must be applied before any process opens the device. This effectively means it must be applied by a udev rule. udev rules are a bit limited in what they can do, so if we need to call an ioctl, we need to run a program. And while udev rules can do matching, the hwdb is easier to edit and maintain. So the pieces we have is: a hwdb that knows when to change (and the values), a udev program to apply the values and a udev rule to tie those two together.

In our case the rule is 60-evdev.rules. It checks the 60-evdev.hwdb for matching entries [1], then invokes the udev-builtin-keyboard if any matching entries are found. That builtin parses the udev properties assigned by the hwdb and converts them into EVIOCSABS ioctl calls. These three pieces need to agree on each other's formats - the udev rule and hwdb agree on the matches and the hwdb and the builtin agree on the property names and value format.

By itself, the hwdb itself has no specific format beyond this:

But since we want to match for specific use-cases, our udev rule assembles several specific match lines. Have a look at 60-evdev.rules again, the last rule in there assembles a string in the form of "evdev:name:the device name:content of /sys/class/dmi/id/modalias". So your hwdb entry could look like this:

evdev:name:My Touchpad Name:dmi:*svnDellInc*
If the name matches and you're on a Dell system, the device gets the EVDEV_ABS_00 property assigned. The "evdev:" prefix in the match line is merely to distinguish from other match rules to avoid false positives. It can be anything, libinput unsurprisingly used "libinput:" for its properties.

The last part now is understanding what EVDEV_ABS_00 means. It's a fixed string with the axis number as hex number - 0x00 is ABS_X. And the values afterwards are simply min, max, resolution, fuzz, flat, in that order. So the above example would set min/max to 0:1 and resolution to 3 (not very useful, I admit).

Trailing bits can be skipped altogether and bits that don't need overriding can be skipped as well provided the colons are in place. So the common use-case of overriding a touchpad's x/y resolution looks like this:

evdev:name:My Touchpad Name:dmi:*svnDellInc*
0x00 and 0x01 are ABS_X and ABS_Y, so we're setting those to 30 units/mm and 20 units/mm, respectively. And if the device is multitouch capable we also need to set ABS_MT_POSITION_X and ABS_MT_POSITION_Y to the same resolution values. The min/max ranges for all axes are left as-is.

The most confusing part is usually: the hwdb uses a binary database that needs updating whenever the hwdb entries change. A call to systemd-hwdb update does that job.

So with all the pieces in place, let's see what happens when the kernel tells udev about the device:

  • The udev rule assembles a match and calls out to the hwdb,
  • The hwdb applies udev properties where applicable and returns success,
  • The udev rule calls the udev keyboard-builtin
  • The keyboard builtin parses the EVDEV_ABS_xx properties and issues an EVIOCSABS ioctl for each axis,
  • The kernel updates the in-kernel description of the device accordingly
  • The udev rule finishes and udev sends out the "device added" notification
  • The userspace process sees the "device added" and opens the device which now has corrected values
  • Celebratory champagne corks are popping everywhere, hands are shaken, shoulders are patted in congratulations of another device saved from the tyranny of wrong axis ranges/resolutions

Once you understand how the various bits fit together it should be quite easy to understand what happens. Then the remainder is just adding hwdb entries where necessary but the touchpad-edge-detector tool is useful for figuring those out.

[1] Not technically correct, the udev rule merely calls the hwdb builtin which searches through all hwdb entries. It doesn't matter which file the entries are in.

August 01, 2018

For some time now I been supporting two Linux developers on patreon. Namely Ryan Gordon of Linux game porting and SDL development fame and Tanu Kaskinen who is a lead developer on PulseAudio these days.

One of the things I often think about is how we can enable more people to make a living from working on the Linux desktop and related technologies. If your reading my blog there is a good chance that you are enabling people to make a living on working on the Linux desktop by paying for RHEL Workstation subscriptions through your work. So a big thank you for that. The fact that Red Hat has paying customers for our desktop products is critical in terms of our ability to do so much of the maintenance and development work we do around the Linux Desktop and Linux graphics stack.

That said I do feel we need more venues than just employment by companies such as Red Hat and this is where I would love to see more people supporting their favourite projects and developers through for instance Patreon. Because unlike one of funding campaigns repeat crowdfunding like Patreon can give developers predictable income, which means they don’t have to worry about how to pay their rent or how to feed their kids.

So in terms of the two Patreons I support Ryan is probably the closest to being able to rely on it for his livelihood, but of course more Patreon supporters will enable Ryan to be even less reliant on payments from game makers. And Tanu’s patreon income at the moment is helping him to spend quite a bit of time on PulseAudio, but it is definitely not providing him with a living income. So if you are reading this I strongly recommend that you support Ryan Gordon and Tanu Kaskinen on Patreon. You don’t need to pledge a lot, I think in general it is in fact better to have many people pledging 10 dollars a Month than a few pledging hundreds, because the impact of one person coming or going is thus a lot less. And of course this is not just limited to Ryan and Tanu, search around and see if any projects or developers you personally care deeply about are using crowdfunding and support them, because if more of us did so then more people would be able to make a living of developing our favourite open source software.

Update: Seems I wasn’t the only one thinking about this, Flatpak announced today that application devs can put their crowdfunding information into their flatpaks and it will be advertised in GNOME Software.

I'd had a bit of a break from adding feature to virgl while I was working on radv, but recently Google and Collabora have started to invest in virgl as a solution. A number of developers from both companies have joined the project.

This meant trying to get virgl to pass their dEQP suite and adding support for newer GL/GLES feature levels. They also have a goal for the renderer to run on a host GLES implementation whereas it currently only ran on a host GL.

Over the past few months I've worked with the group to add support for all the necessary features needed for guest GLES3.1 support (for them) and GL4.3 (for me).

The feature list was roughly:
tessellation shaders
fp64 support
ARB_gpu_shader5 support
Shader buffer objects
Shader image objects
Compute shaders
Copy Image
Texture views

With this list implemented we achieved GL4.3 and GLES3.1.

However Marek@AMD did some work on exposing ASTC for gallium drivers,
and with some extra work on EXT_shader_framebuffer_fetch, we now expose GLES3.2.

There was also plenty of work done on avoiding crashes from rogue guests (rewrote the whole feature/capability bit handling), and lots of bug fixes. There are still ongoing fixes to finish the dEQP tests, but it looks like all the feature work should be landed now.

What next?

Well there is one big problem facing virgl in exposing GL 4.4. GL_ARB_buffer_storage requires exposing coherent buffer memory, and with the virgl architecture, we currently don't have a way to map the pages behind a host GL buffer mapping into a guest GL buffer mapping in order to achieve coherency. This is going to require some thought and it may even require exposing some new GL extensions to export a buffer to a dma-buf.

There has also been a GSoC student Nathan working on vulkan/virgl support, he's made some iniital progess, however vulkan also has requirements on coherent memory so that tricky problems needs to be solved.

Thanks again to all the contributors to the virgl project.

To make testing libinput git master easier, I set up a whot/libinput-git Fedora COPR yesterday. This repo gets the push triggers directly from GitLab so it will rebuild with whatever is currently on git master.

To use the COPR, simply run:

sudo dnf copr enable whot/libinput-git
sudo dnf upgrade libinput
This will give you the libinput package from git. It'll have a date/time/git sha based NVR, e.g. libinput-1.11.901-201807310551git22faa97.fc28.x86_64. Easy to spot at least.

To revert back to the regular Fedora package run:

sudo dnf copr disable whot/libinput-git
sudo dnf distro-sync "libinput-*"

Disclaimer: This is an automated build so not every package is tested. I'm running git master exclusively (from a a ninja install) and I don't push to master unless the test suite succeeds. So the risk for ending up with a broken system is low.

On that note: if you are maintaining a similar repo for other distributions and would like me to add a push trigger in GitLab for automatic rebuilds, let me know.

July 31, 2018

Stack overview

Let's start with having a look at a high level overview of what the graphics stack looks like.

Alt text

Before digging too much further into this, lets cover some terminology.

DRM - Direct Rendering Manager - is the Linux kernel graphics subsystem, which contains all of the graphics drivers and does all of the interfacing with hardware.
The DRM subsystem implements the KMS - kernel mode setting - API.

Mode setting is essentially configuring output settings like resolution for the displays that are being used. And doing it using the kernel means that userspace doesn't need access to setting these things directly.

Alt text

The DRM subsystem talks to the hardware and Mesa is used by applications through the APIs it implements. APIs like OpenGL, OpenGL ES, Vulkan, etc. All …

July 30, 2018

libinput's documentation started out as doxygen of the developer API - they were the main target 4 years ago. Over time, more and more extra documentation was added and now most of it is aimed at users (for self-debugging and troubleshooting or just to explain concepts and features). Unfortunately, with doxygen this all ends up in the "Related Pages". The developer API documentation itself became a less important part, by now all the major compositors have libinput support and it doesn't change much. So while it needs to be there, most of the traffic goes to the user documentation (I think, it's not like I'm running stats).

Something more suited for prose-style docs was needed. I prefer the RTD look so last week I converted most of the libinput documentation into RST format and it's now built with sphinx and the RTD theme. Same URL as before:

The biggest difference is that the Developer API Documentation (still doxygen) is now at, (i.e. add /api/ to the link). If you're programming against libinput's API (e.g. because you're writing a compositor), that's where you need to go.

It's still basically the same content as before, I'll be tidying things up and adding to it over the next few weeks. Hopefully without breaking existing links. There is probably detritus from the doxygen → rst change floating around, I'll be fixing that too. If you want to help out please don't hesitate, I'll do my best to be quick to review any merge requests.

I made some progress on GMP this week, getting jobs to run successfully if I enable regions in the GMP but never disable them. I need to do some more work on BO lifetime management if I want to clear disallowed regions back out of the GMP.

I did some more conformance debug, fixing one of the intermittent issues in GLES3 (The same rendering-versus-VS-texturing synchronization bug that broke glmark2 terrain on vc4). GLES3 conformance is now at GLES2 status plus one new set of intermittent issues. Another fix I made (not misusing HW semaphores) unfortunately doesn’t seem to have improved anything.

I had a few performance ideas left over from last week, and built them on Monday (fixed extra flushing due to glClear() after drawing, or the GFHX-1461 workaround, avoid storing buffers that had all their drawing masked out). Performance of simple apps was still astoundingly slow, so I did some more poking around and determined that my V3D HW is actually running at 37 Mhz. My other board runs at 27Mhz. Now I know why my conformance runs have been taking so long! Unfortunately, Broadcom STB doesn’t support clock control from Linux, so I won’t really be able to do performance work on this driver until I can get a SW stack under me that can set the right clock speeds.

Finally, I cleaned up and merged my CLIF dumping code, so V3D can now generate simple traces that can be replayed by the software simulator or on FPGAs to debug lockups. It’s going to take a bit more work to make it so that BO addresses in uniforms work, which is needed for texturing, UBOs, and spilling.

July 29, 2018

The All Systems Go! 2018 Call for Participation Closes TODAY!

The Call for Participation (CFP) for All Systems Go! 2018 will close TODAY, on 30th of July! We’d like to invite you to submit your proposals for consideration to the CFP submission site quickly!

ASG image

All Systems Go! is everybody's favourite low-level Userspace Linux conference, taking place in Berlin, Germany in September 28-30, 2018.

For more information please visit our conference website!

This is quite a long post. The executive summary is that now hosts an instance of GitLab, which is generally available and now our preferred platform for hosting going forward. We think it offers a vastly better service, and we needed to do it in order to offer the projects we host the modern workflows they have been asking for.

In parallel, we’re working on making our governance, including policies, processes and decision making, much more transparent.

Some history

Founded by Havoc Pennington in 2000, is now old enough to vote. From the initial development of the cross-desktop XDG specs, to supporting critical infrastructure such as NetworkManager, and now as the home to open-source graphics development (the kernel DRM tree, Mesa, Wayland, X.Org, and more), it’s long been a good home to a lot of good work.

We don’t provide day-to-day technical direction or enforce set rules: it’s a very loose collection of projects which we each trust to do their own thing, some with nothing in common but where they’re hosted.

Unfortunately, that hosting hasn’t really grown up a lot since the turn of the millennium. Our account system was forked (and subsequently heavily hacked) from Debian’s old LDAP-based system in 2004. Everyone needing direct Git commit access to projects, or the ability to upload to web space, has to file a bug in Bugzilla, where after a trip through the project maintainer, eventually an admin will get around to pulling their SSH and GPG (!) keys and adding an account by hand.

Similarly, creating or reconfiguring a Git repository also requires manual admin intervention, where on request one of us will SSH into the Git server and do whatever is required. Beyond Git and cgit for viewing, we provide Bugzilla for issue tracking, Mailman and Patchwork for code review and discussion, and ikiwiki for tracking. For our sins, we also have an FTP server running somewhere. None of these services are really integrated with each other; separate accounts and separate sets of permissions are required.

Maintaining these disparate services is a burden on both admins and projects. Projects are frequently blocked on admins adding users and changing their SSH keys, changing Git hooks, adding people to Patchwork, manually applying more duct tape to the integration between these services, and fixing the duct tape when it breaks (which is surprisingly often). As a volunteer admin for the service, doing these kinds of things is not exactly the reason we get out of bed in the morning; it also consumes so much time treading water that we haven’t been able to enable new features and workflows for the projects we host.

Seeking better workflows

As of writing, around one third of the non-dormant projects on fd.o have at some point migrated their development elsewhere; mostly to GitHub. Sometimes this was because the other sites were a more natural home (e.g. to sibling projects), and sometimes just because they offered a better workflow (integration between issue tracking and commits, web-based code review, etc). Other projects which would have found fd.o a natural home have gone straight to hosting externally, though they may use some of our services - particularly mailing lists.

Not everyone wants to make use of these features, and not everyone will. For example, the kernel might well never move away from email for issue tracking and code review. But the evidence shows us that many others do want to, and our platform will be a non-starter for them unless we provide the services they want.

A bit over three years ago, I set up an instance of Phabricator at Collabora to replace our mix of Bugzilla, Redmine, Trac, and JIRA. It was a great fit for how we worked internally, and upstream seemed like a good fit too; though they were laser-focused on their usecases, their extremely solid data storage and processing model made it quite easy to extend, and projects like MediaWiki, Haskell, LLVM and more were beginning to switch over to use it as their tracker. I set up an instance on fd.o, and we started to use it for a couple of trial projects: some issue tracking and code review for Wayland and Weston, development of PiTiVi, and so on.

The first point we seriously discussed it more widely was at XDC 2016 in Helsinki, where Eric Anholt gave a talk about our broken infrastructure, cleverly disguised as something about test suites. It became clear that we had wide interest in and support for better infrastructure, though with some reservation about particular workflows. There was quite a bit of hallway discussion afterwards, as Eric and Adam Jackson in particular tried out Phabricator and gave some really good feedback on its usability. At that point, it was clear that some fairly major UI changes were required to make it usable for our needs, especially for drive-by contributors and new users.

Last year, GNOME went through a similar process. With Carlos and some of the other members being more familiar with GitLab, myself and Emmanuele Bassi made the case for using Phabricator, based on our experiences with it at Collabora and Endless respectively. At the time, our view was that whilst GitLab’s code review was better, the issue tracking (being much like GitHub’s) would not really scale to our needs. This was mostly based on having last evaluated GitLab during the 8.x series; whilst the discussions were going on, GitLab were making giant strides in issue tracking throughout 9.x.

With GitLab coming up to par on issue tracking, both Emmanuele and I ended up fully supporting GNOME’s decision to base their infrastructure on GitLab. The UI changes required to Phabricator were not really tractable for the resources we had, the code review was and will always be fundamentally unsuitable being based around the Subversion-like model of reviewing large branches in one go, and upstream were also beginning to move to a much more closed community model.

By contrast, one of the things which really impressed us about GitLab was how openly they worked, and how open they were to collaboration. Early on in GNOME’s journey to GitLab, they dropped their old CLA to replace it with a DCO, and Eliran Mesika from GitLab’s partnership team came to GUADEC to listen and understand how GNOME worked and what they needed from GitLab. Unfortunately this was too early in the process for us, but Robert McQueen later introduced us, and Eliran and I started talking about how they could help

One of our bigger issues was infrastructure. Not only were our services getting long in the tooth, but so were the machines they ran on. In order to stand up a large new service, we’d need new physical machines, but a fleet of new machines was beyond the admin time we had. It also didn’t solve issues such as everyone’s favourite: half of Europe can’t route to fd.o for half an hour most mornings due to obscure network issues with our host we’ve had no success diagnosing or fixing.

GitLab Inc. listened to our predicament and suggested a solution to help us: that they would sponsor our hosting on Google Cloud Platform for an initial period to get us on our feet. This involves us running the completely open-source GitLab Community Edition on infrastructure we control ourselves, whilst freeing us from having to worry about failing and full disks or creaking networks. (As with GNOME, we politely declined the offer of a license to the pay-for GitLab Enterprise Edition; we wanted to be fully in control of our infrastructure, and on a level playing field with the rest of the open-source community.)

They have also offered us support, from helping a cloud idiot understand how to deploy and maintain services on Kubernetes, to taking the time to listen and understand our workflows and improve GitLab for our uses. Much of the fruit of this is already visible in GitLab through feedback from us and GNOME, though there is always more to come. In particular, one area we’re looking at is integration with mailing lists and placing tags in commit messages, so developers used to mail-based workflows can continue to consume the firehose through email, rather than being required to use the web UI for everything.

Last Christmas, we gave ourselves the present of standing up on GCP, and set about gradually making it usable and maintainable for our projects. Our first hosted project was Panfrost, who were running on either non-free services or non-collaborative hosted services. We wanted to help them out by getting them on to fd.o, but they didn’t want to use the services we had at the time, and we didn’t want to add new projects to those services anyway.

Over time, as we stabilised the deployment and fleshed out the feature set, we added a few smaller projects, who understood the experimental nature and gave us space to make some mistakes, have some down time, and helped us smooth out the rough edges. Some of the blocker here was migrating bugs: though we reused GNOME’s bztogl script, we needed some adjustments for our different setups, as well as various bugfixes.

Not long ago, we migrated Mesa’s repository hosting as well as Wayland and Weston for both repository tracking and issue tracking which are our biggest projects to date.

What we offer to projects

With GitLab, we offer everything you would expect from (their hosted offering), or everything you would expect from GitHub with the usual external services such as Travis CI. This includes issue tracking integrated with repository management (close issues by pushing), merge requests with online review and merge, a comprehensive CI suite with shared runners available to all, custom sites built with whatever toolchain you like, external web hooks to integrate with other services, and a well-documented stable API which allows you to use external clients like git lab.

In theory, we’ve always provided most of the above services. Most of these - if you ignore the lack of integration between them - were more or less fine for projects running their own standalone infrastructure. But they didn’t scale to something like fd.o, where we have a very disparate family of projects sharing little in common, least of all common infrastructure and practices. For example, we did have a Jenkins deployment for a while, but it became very clear very early that this did not scale out to fd.o: it was impossible for us to empower projects to run their own CI without fatally compromising security.

Anyone familiar with the long wait for an admin to add an account or change an SSH key will be relieved to hear that this is no longer. Anyone can make an account on our GitLab instance using an email address and password, or with trusted external identity providers (currently Google,, GitHub, or Twitter) rather than having another username and password. We delegate permission management to project owners: if you want to give someone commit rights to your project, go right ahead. No need to wait for us.

We also support such incredible leading-edge security features as two-factor TOTP authentication for your account, Recaptcha to protect against spammers, and ways of deleting spam which don’t involve an admin sighing into a SQL console for half an hour, trying to not accidentally delete all the content.

Having an integrated CI system allows our projects to run test pipelines on merge requests, giving people fast feedback about any required changes without human intervention, and making sure distcheck works all the time, rather than just the week before release. We can capture and store logs, binaries and more as artifacts.

The same powerful system is also the engine for GitLab Pages: you can use static site generators like Jekyll and Hugo, or have a very spartan, hand-written site but also host auto-generated documentation. The choice is yours: running everything in (largely) isolated containers means that you can again do whatever you like with your own sites, without having to ask admins to set up some duct-taped triggers from Git repositories, then ask them to fix it when they’ve upgraded Python and everything has mysteriously stopped working.

Migration to GitLab, and legacy services

Now that we have a decent and battle-tested service to offer, we can look to what this means for our other services.

Phabricator will be decommissioned immediately; a read-only archive will be taken of public issues and code reviews and maintained as static pages forever, and a database dump will also be kept. But we do not plan to bring this service back, as all the projects using it have already migrated away from it.

Similarly, Jenkins has already been decommissioned and deactivated some time ago.

Whilst we are encouraging projects to migrate their issue tracking away from Bugzilla and helping those who do, we realise a lot of projects have built their workflows around Bugzilla. We will continue to maintain our Bugzilla installation and support existing projects with its use, though we are not offering Bugzilla to new projects anymore, and over the long term would like to see Bugzilla eventually retired.

Patchwork (already currently maintained by Intel for their KMS and Mesa work) is in the same boat, complicated by the fact that the kernel might never move away from patches carved into stone tablets.

Hopefully it goes without saying that our mailing lists are going to be long-lived, even if better issue tracking and code review does mean they’re a little less-trafficked than before.

Perhaps most importantly, we have anongit and cgit. anongit is not provided by GitLab, as they rightly prefer to serve repositories over https. Given that, for all existing projects we are maintaining anongit.fd.o as a read-only mirror of GitLab; there are far too many distributions, build scripts, and users out there with anongit URIs to discontinue the service. Over time we will encourage these downstreams to move to HTTPS to lessen the pressure, but this will continue to live for quite some time. Having cgit live alongside anongit is fairly painless, so we will keep it running whilst it isn’t a burden.

Lastly, annarchy.fd.o (aka people.fd.o) is currently offered as a general-purpose shell host. People use this to manage their Git repositories on people.fd.o and their files publicly served there. Since it is also the primary web host for most projects, both people and scripts use it to deploy files to sites. Some people use it for random personal file storage, to run various scripts and even as a personal IRC host. We are trying to transition these people away from using annarchy for this, as it is difficult for us to provide totally arbitrary resources to everyone who has at one point had an account with one of our member projects. Running a source of lots of IRC traffic is also a good way to make yourself deeply unpopular with many hosts.

Migrating your projects

After being iterated and fleshed out, we are happy to offer to migrate all the projects. For each project, we will ask you to file an issue using the migration template. This gives you a checklist with all the information we need to migrate your GitLab repositories, as well as your existing Bugzilla bugs.

Every user with a SSH account already has an account created for them on GitLab, with access to the same groups. In order to recover access to the migrated accounts, you can request a password-reset link by entering the email address you signed up with into the ‘forgotten password’ box on the GitLab front page.

More information is available on the freedesktop GitLab wiki, and of course the admins are happy to help if you have any problems with this. The usual failure mode is that your email address has changed since you signed up: we’ve had one user who needed it changed as they were still using a Yahoo! mail address.

Governance and process

Away from technical issues, we’re also looking to inject a lot more transparency into our processes. For instance, why do we host kernel graphics development, but not new filesystems? What do we look for (both good and bad), and why is that? What is even for, and who is it serving?

This has just been folk knowledge for some time; passed on by oral legend over IRC as verbal errata to out-of-date wiki pages. Just as with technical issues, this is not healthy for anyone: it’s more difficult for people to get involved and give us the help we so clearly need, it’s more difficult for our community to find out what they can expect from us and how we can help them, and it’s impossible for anyone to measure how good a job we’re actually doing.

One of the reasons we haven’t done a great job at this is because just keeping the Rube Goldberg machine of our infrastructure running exhausts basically all the time we have to deal with fd.o. The time we spend changing someone’s SSH keys by hand, or debugging a Git post-receive hook, is time we’re not spending on the care and feeding of our community.

We’ve spent the past couple of years paying down our technical debt, and the community equivalent thereof. Our infrastructure is much less error-prone than it was: we’ve gone from fighting fires to being able to prepare the new GitLab infrastructure and spend time shepherding projects through it. Now that we have a fair few projects on GitLab and they’ve been able to serve themselves, we’ve been able to take some time for community issues.

Writing down our processes is still a work in progress, but something we’ve made a little more headway on is governance. Currently fd.o’s governance is myself, Keith and Tollef discussing things and coming to some kind of conclusion. Sometimes that’s in recorded public fora, sometimes over email with searchable archives, sometimes just over IRC message or verbally with no public record of what happened.

Given that there’s a huge overlap between our mission and that of the X.Org Foundation (which is a lot more than just X11!), one idea we’re exploring is to bring fd.o under the Foundation’s oversight, with clear responsibility, accountability, and delegated roles. The combination of the two should give our community much more insight into what we’re doing and why - as well as, crucially, the chance to influence it.

Of course, this is all conditional on fd.o speaking to our member projects, and the Foundation speaking to its individual members, and getting wide agreement. There will be a lot of tuning required - not least, the Foundation’s bylaws would need a change which needs a formal vote from the membership - but this at least seems like a promising avenue.

July 26, 2018

A couple of Months ago we visited IKEA and saw the IKEA Trådfri smart lighting system. Since it was relatively cheap we decided to buy their starter pack and enough bulbs to make the recessed lights in our living rooms smartlight. I got it up and running and had some fun switching the light hue and turning the lights on and off from my phone.
A bit later I got a Google Assistant speaker at Google I/O and the system suddenly became somewhat more useful as I could control the lights by calling out to the google assistant. I was also able to connect our AC system thermostats to the google assistant so we could change the room temperature using voice commands.
As a result of this I ended up reading up on smart home technologies and found that my IKEA hub and bulbs was conforming to the ZigBee standard and that I should be able to buy further Zigbee compatible devices from other vendors to extend it. So I ordered a Zigbee compatible in-wall light switch from GE and I also ordered a Zigbee compatible in-ceiling switch from a company called Nue through Amazon. Once I got these home and tried to get them up and running I found that my understanding was flawed as there are two Zigbee standards, the older Zigbee HA and the newer Zigbee LightLink. The IKEA stuff is Zigbee LightLink while at least the GE switch was Zigbee HA and thus the IKEA hub could not control it. So I ended up ordering a Samsung SmartThings hub which supports Zigbee HA, Zigbee LL and the competing system called Z-Wave. At which point I got both my IKEA lights and my two devices working with it.
In the meantime I had also gotten myself a Google Assistant compatible portable aircon from FrigidAire for my home office (no part of main house) and a Google Assistant compatible fire alarm from Nest for the same office space.

So having lived in my new smarter home for a while now what are my conclusions? Well first of all that we still have some way to go before this is truly seamless and obvious. I consider myself a fairly technical person, but it still took quite a bit of googling for me to be able to get everything working. Secondly a lot of the smart home stuff feel a bit gimmicky in the end. For instance the Frigidaire portable air conditioner integration with the Google Assistant is more annoying than useful. It basically requires me to start by asking to talk to Frigidaire and then listen to a ton of crap before I can even start trying to do anything. As for the lights we do actually turn them on and off quite a bit using the voice commands (at least I try to until I realize my wife has disconnected the google assistant in order to use its cable to charge her phone :). I also realized that while installing and buying the in-wall switches are a bit more costly and complicated than just getting some smart bulbs it does work a lot better as I can then control the lights using both voice and switch. Because the smart bulbs can not be turned on using voice if you have a normal switch turned off (obviously, but not something I thought about before buying). So getting something like the IKEA bulbs is a nice and cheap way to try this stuff out I don’t see it as our long term solution here. The thermostat we haven’t controlled or queried once since I did the initial testing of its connection to Google assistant.

All in all I have to say that the smart home tech is cute, but it is far from being essential. I might end up putting in more wall switches for the light going forward, but apart from that, having smart home support in a device is not going to drive my purchasing decisions. Maybe as the tech matures and becomes more mainstream it will also become more useful, but as it stands it mostly solves first world problems (although of course there are real gains here for various accessibility situations).

July 25, 2018

Recently, I submitted a talk about VKMS in a conference named Linux Developer Conference Brazil (linuxdev-br)[1] and got accepted [2]. I am delighted with this opportunity, since I attended linuxdev-br 2017 as a complete newbie and the conference provided me the required inputs to start working in the Kernel. Also, I had the opportunity to talk with great kernel developers and learn a little bit more with them. As a result, I have a lot of affection for this conference; presenting at linuxdev-br 2018 means a lot to me.

In the talk, I will explain how I started with Linux Kernel and got accepted in the GSoC 2018. Following that, I want to provide an overview of the DRM subsystem and finish with the explanation of the VKMS. I want to detail some of the features that I developed, and I will try to explain the features developed by Haneen. As soon as I have the presentation done, I will post it here.


  1. Linux Developer Conference Brazil
  2. Linuxdev-br speakers

Gather round children, it's story time. Especially for you children who lurk on /r/linux and think you may learn something there. Today, I'll tell you a horror story. The one where we convert kernel input events into touchpad events, with the subtle subtitle of "friends don't let friends handle evdev events".

The question put forward is "why do we need libinput at all", when, as frequently suggested on the usual websites, it's sufficient to just read evdev data and there's really no need for libinput. That is of course true. You can use evdev events from the kernel directly. Did you know that the events the kernel gives you are absolute coordinates? And that not all touchpads have buttons? Or that some touchpads have specific event sequences that need to be filtered? No? Well, boy, are you in for a few surprises! Anyway, let's go and handle evdev events ourselves and write our own libmyinput.

How do we know something is a touchpad? Well, we look at the exposed evdev bits. We need ABS_X, ABS_Y and BTN_TOOL_FINGER but don't want INPUT_PROP_DIRECT. If the latter bit is set then we have a touchscreen (probably). We don't actually care about buttons here, that comes later. ABS_X and ABS_Y give us device-absolute coordinates. On touch down you get the evdev frame of "a finger is down at x/y device units from the top-left". As you move around, you get the x/y coordinate updates. The data itself is exactly the same as you would get from a touchscreen, but we know it's a touchpad because we queried the other bits at startup. So your first job is to convert the absolute x/y coordinates to deltas by subtracting the previous position.

Touchpads have different resolutions for x and y so a delta of 10/10 does not mean it's a 45-degree movement. Better check with the resolution to convert this to physical distances to be on the safe side. Oh, btw, the axes aren't reliable. The min/max ranges and the resolutions are wrong on a large number of touchpads. Luckily systemd fixes this for you with the 60-evdev.hwdb. But I should probably note that hwdb only exists because of libinput... Either way, you don't have to care about it because the road's already paved. You're welcome.

Oh wait, you do have to care a little because there are touchpads (e.g. HP Stream 11, ZBook Studio G3, ...) where bits are missing or wrong. So you better write a device database that tells you when you have correct the evdev bits. You could implement this as config option but that's just saying "I know what's wrong here, I know how to fix it but I'm still going to make you google for it and edit a local configuration file to make it work". You could treat your users this way, but you really shouldn't.

As you're happily processing your deltas, you notice that on some touchpads you get motion before you touch the touchpad. Ooops, we need a way to tell whether a finger is down. Luckily the kernel gives you BTN_TOUCH for that event, so you switch your implementation to only calculate deltas when BTN_TOUCH is set. But then you realise that is effectively a hardcoded threshold in the kernel and does not match a lot of devices. Some devices require too-hard finger pressure to trigger BTN_TOUCH, others send it on super-light pressure or even while hovering. After grinding some enamel away you find that many touchpads give you ABS_PRESSURE. Awesome, let's make touches pressure-based instead. Let's use a threshold, no, I mean a device-specific threshold (because if two touchpads would be the same the universe will stop doing whatever a universe does, I clearly haven't thought this through). Luckily we already have the device database so we just add the thresholds there.

Oh, if you want this to run on a Apple touchpad better implement touch size handling (ABS_MT_TOUCH_MAJOR/ABS_MT_TOUCH_MINOR). These axes give you the size of the touching ellipse which is great. Except that the value is just an arbitrary number range that have no reflection to physical properties, so better update your database so you can add those thresholds.

Ok, now we have single-finger handling in our libnotinput. Let's add some sophisticated touchpad features like button clicks. Buttons are easy, the kernel gives us BTN_LEFT and BTN_RIGHT and, if you're lucky, BTN_MIDDLE. Unless you have a clickpad of course in which case you only ever get BTN_LEFT because the whole touchpad can be depressed (much like you, if you continue writing your own evdev handling). Those clickpads are in the majority of laptops these days, so we have to deal with them. The two approaches we have are "software button areas" and "clickfinger". The former detects where your finger is when you push the touchpad down - if it's in the bottom right corner we convert the kernel's BTN_LEFT to a BTN_RIGHT and pass that on. Decide how big the buttons will be (note: some touchpads that need software buttons are only 50mm high, others exceed 100mm height). Whatever size you choose, it's an invisible line on the touchpad. Do you know yet how you will handle a finger that moves from outside the button are into the button area before the click? Or the other way round? Maybe add this to your todo list for fixing later.

Maybe "clickfinger" is easier? It counts how many fingers are on the touchpad when clicking (1 finger == left click, 2 fingers == right click, 3 fingers == middle click). Much easier, except that so far we only handle one finger. The easy fix is to use BTN_TOOL_DOUBLETAP and BTN_TOOL_TRIPLETAP which are bitflags that tell you when a second/third finger are down. Add that to your libthisisnotlibinput. Coincidentally, users often click with their thumb while moving. So you have one finger moving the pointer, then a thumb click. Two fingers down but the user doesn't perceive it as such, this should be a left click. Oops, we don't actually know where the second finger is.

Let's switch our libstillnotlibinput to use ABS_MT_POSITION_X and ABS_MT_POSITION_Y because that gives us per-finger position information (once you understand how the kernel's MT protocol slots work). And when I say "switch" of course I meant "add" because there are still touchpads in use that don't support multitouch so you get to keep both implementations. There are also a bunch of touchpads that can give you the position of two fingers but not of the third. Wipe that tear away and pencil that into your todo list. I haven't mentioned semi-mt devices yet that will give you multitouch position data for two fingers but it won't track them correctly - the first touch position is always the top/left of the bounding box, the second touch is always the bottom/right of the bounding box. Do the right thing for our libwhathaveidone and just pretend semi-mt devices are single-touch touchpads. libinput (the real one) does the same because my sanity is something to be cherished.

Oh, on another note, some touchpads don't have any buttons (some Wacom tablets are large touchpads). Add that to your todo list. You wanted middle buttons to work? Few touchpads have a middle button (clickpads never do anyway). Better write a middle button emulation system that generates BTN_MIDDLE when both buttons are pressed. Or when a finger is on the left and another finger is on the right software button. Or when a finger is in a virtual middle button area. All these need to be present because if not, you get dissed by users for not implementing their favourite interaction method.

So we're several paragraphs in and so far we have: finger tracking and some button handling. And a bunch of things on the todo list. We haven't even started with other fancy features like edge scrolling, two-finger scrolling, pinch/swipe gestures or thumb and palm detection. Oh, and you're not yet handling any other devices like graphics tablets which are a world of their own. If you think all the other features and devices are any less of a mess... well, an Austrian comedian once said (paraphrased): "optimism is just a fancy word for ignorance".

All this is just handling features that users have come to expect. Examples for non-features that you'll have to implement: on some Lenovo series (*50 and newer) you will get a pointer jump after a series of of events that only have pressure information. You'll have to detect and discard that jump. The HP Pavilion DM4 touchpad has random jumps in the slot data. Synaptics PS/2 touchpads may 'randomly' end touches and restart them on the next event frame 10ms later. If you don't handle that you'll get ghost taps. And so on and so forth.

So as you, happily or less so, continue writing your libthisismoreworkthanexpected you'll eventually come to realise that you're just reimplementing libinput. Congratulations or condolences, whichever applies.

libinput's raison d'etre is that it deals with all the mess above so that compositor authors can be blissfully unaware of all this. That's the reason why all the major/general-purpose compositors have switched to libinput. That's the reason most distributions now use libinput with the X server (through the xf86-input-libinput driver). libinput has made some design decisions that you may disagree with but honestly, that's life. Deal with it. It doesn't even do all I want and I wrote >90% of it. Suggesting that you can just handle evdev directly is like suggesting you can use GPS coordinates directly to navigate. Sure you can, but there's a reason why people instead use a Tom Tom or Google Maps.

July 23, 2018

Two more weeks of conformance debug, and I’m now down to 1 group of failures in GLES2 and 1 group in EGL, both of which I believe are bugs in the tests (regressions in recent CTS releases, which is why Intel hadn’t caught them already).

Fixes have included:

  • Various bugs in fence handling (one of which caused throttling of frames to fail, leading to unlimited userspace BO cache usage and 2 days of debug flailing to track down).
  • Allow EGLImages to be created of textures/renderbuffers that we can’t do EGL_MESA_image_dma_buf_export on.
  • Fixed EGLImage import for cubemap images.
  • Landed the UIF format modifier support.
  • Fixed uniform lowering to not cause absurd register spilling.
  • Implemented more of the GFXH-1461 workaround to fix missing Z/S clears
  • Implemented a hack in gallium’s blitter to make the awful dEQP-GLES3.functional.fbo.blit.rect.nearest_consistency_* tests pass.
  • Fixed incorrect reallocation of PERSISTENT buffers (also a vc4 bug!)
  • Fixed 1D_ARRAY texture upload/download.
  • Fixed drawing to MRT without independent blending
  • Fixed MSAA mapping bugs in gallium’s transfer helper.

I also spent a day knocking out some quick performance improvements:

  • Rotate through the registers to allow the backend scheduler to pair up QPU instructions more frequently.
  • Allow reading from the phys regfile in the following instruction.
  • Use the SFU instructions instead of magic writes on V3D 4.x.
  • Skip texture parameter P2 if it’s just the default value.
  • Start using small immediates (especially useful for VS input/outputs on 4.x).
  • Fix extra scene flush when doing the GFXH-1461 workaround.

After this I should probably work on the GMP support again, then get back to fixing GLES3.

July 22, 2018

The All Systems Go! 2018 Call for Participation Closes in One Week!

The Call for Participation (CFP) for All Systems Go! 2018 will close in one week, on 30th of July! We’d like to invite you to submit your proposals for consideration to the CFP submission site quickly!

ASG image

Notification of acceptance and non-acceptance will go out within 7 days of the closing of the CFP.

All topics relevant to foundational open-source Linux technologies are welcome. In particular, however, we are looking for proposals including, but not limited to, the following topics:

  • Low-level container executors and infrastructure
  • IoT and embedded OS infrastructure
  • BPF and eBPF filtering
  • OS, container, IoT image delivery and updating
  • Building Linux devices and applications
  • Low-level desktop technologies
  • Networking
  • System and service management
  • Tracing and performance measuring
  • IPC and RPC systems
  • Security and Sandboxing

While our focus is definitely more on the user-space side of things, talks about kernel projects are welcome, as long as they have a clear and direct relevance for user-space.

For more information please visit our conference website!

July 18, 2018

Battery life
I was very happy to see that Fedora Workstation 28 in the Phoronix benchmark got the best power consumption on a Dell XPS 13. Improving battery life has been a priority for us and Hans de Goede has been doing some incredible work so far. And best of all; more is to come :). So if you want great battery life with Linux on your laptop then be sure to be running Fedora on your laptop! On that note and to state the obvious, be aware that Fedora Workstation adoption rates are actually a major metric for us to decide where to put our efforts, so if we see good growth in Fedora due to people enjoying the improved battery life it enables us to keep investing in improving battery life, if we don’t see the growth we will need to conclude people don’t care that much and more our investment elsewhere.

Desktop remoting under Wayland
The team is also making great strides with desktop remoting under Wayland. In Fedora Workstation 29 we will have the VNC based GNOME Shell integrated desktop sharing working under Wayland thanks to the work done by Jonas Ådahl. It relies on PipeWire to share you Wayland session over VNC.
On a similar note Jan Grulich, Tomas Popela and Eike Rathke has been working on enabling Wayland desktop sharing through Firefox and Chromium. They are reporting good progress and actually did a video call between Firefox and Chromium last week, sharing their desktops with each other. This is enables by providing a PipeWire backend for both Firefox and Chromium. They are now working on cleaning up their patches and prepare them for submission upstream. We are also looking at providing a patched Firefox in Fedora Workstation 28 supporting this.

Wim Taymans talked about and demonstrated the latest improvements to PipeWire during GUADEC last week. He now got a drop in replacement that allows applications like Totem and Rhythmbox to play audio through PipeWire using the PulseAudio GStreamer plugin as Pipewire now provides a drop in replacement. Wim also keeps improving the Jack support in PipeWire by testing Jack applications one by one and fixing corner cases as he discovers them or they are reported by the Linux pro-audio community. We also ended up ordering Wim a Sony HT-Z9F soundbar for testing as we want to ensure PipeWire has great support for passthrough, be that SPDIF, HDMI or Bluetooth. The HT-Z9F also supports LDAC audio which is a new high quality audio format for Bluetooth and we want PipeWire to have full support for it.
To accelerate Pipewire development and adoption for audio we also decied to try to organize a PipeWire and Linux Audio hackfest this fall, with the goal of mapping our remaining issues and to try to bring the wider linux audio community together. So I am very happy that Arun Raghavan of PulseAudio fame agreed to be one of the co-organizer of this hackfest. Anyone interested in attending the PipeWire 2018 hackfest either add yourself to the attendee list or contact me (contact information can be found through the hackfest page) and I be happy to add you. The primary goal is to have developers from the PulseAudio and JACK communities attend alongside Wim Taymans and Bastien Nocera so we can make sure we got everything we need on the development roadmap and try to ensure we all pull in the same direction.

GNOME Builder
Christian Hergert did an update during GUADEC this year on GNOME Builder. As usual a ton of interesting stuff happening including new support for developing towards embedded devices like the upcoming Purism phone. Christian in his talk mentioned how Builder is probably the worlds first ‘Container Native IDE’ where it both is being developed with being packaged as a Flatpak in mind, but also developed with the aim of creating Flatpaks as its primary output. So a lot of effort is being put into both making sure it works well being inside a container itself, but also got all the bells and whistles for creating containers from your code. Another worthwhile point to mention is that Builder is also one of the best IDEs for doing Rust development in general!

Game mode in Fedora
Feral Interactive, one of the leading Linux game companies, released a tool they call gamemode for Linux not long ago. Since we want gamers to be first class citizens in Fedora Workstation we ended up going back and forth internally a bit about what to do about it, basically discussing if there was another way to resolve the problem even more seamlessly than gamemode. In the end we concluded that while the ideal solution would be to have the default CPU governor be able to deal with games better, we also realized that the technical challenge games posed to the CPU governor, by having a very uneven workload, is hard to resolve automatically and not something we have the resources currently to take a deep dive into. So in the end we decided that just packaging gamemode was the most reasonable way forward. So the package is lined up for the next batch update in Fedora 28 so you should soon be able to install it and for Fedora Workstation 29 we are looking at including it as part of the default install.

July 16, 2018

Since the beginning of May 2018, I have been diving into the DRM subsystem. In the beginning, nothing made sense to me, and I had to fight hard to understand how things work. Fortunately, I was not alone, and I had great support from Gustavo Padovan, Daniel Vetter, Haneen Mohammed, and the entire community. Recently, I finally delivered a new feature for VKMS: the infrastructure for Vblank and page flip events.

At this moment, VKMS have regular Vblank events simulated through hrtimers (see drm-misc-next), which is a feature required by VKMS to mimic real hardware [6]. The development approach was entirely driven by the tests provided by IGT, more specifically the kms_flip. I modified IGT to read a module name via command line and force the use of it, instead of using only the modules defined in the code (patch submitted to IGT, see [1]). With this modification in the IGT, my development process to add a Vblank infrastructure to VKMS had three main steps as Figure 1 describes.

foo Figure 1: My work cycle in VKMS

Firstly, I focused only on the subtest “basic-plain-flip” from IGT and after each execution of the test I checked the failure messages. Secondly, I tried to write the required code to make the test pass; it is essential to highlight that this phase sometimes took me days to understand the problem and implement the fix. Finally, after I overcome the failure, I just put an additional effort to improve the implementation. As can be seen in the patchset send to add the Vblank support [2], the first set of patches was not directly related to the Vblank itself, but it was a necessary infrastructure required for kms_flip to work.

foo Figure 2: sudo ./tests/kms_flip --run-subtest basic-plain-flip --force-module vkms

After an extended period of work to make VKMS pass in the basic-plain-flip, I finally achieved it thanks to all the support that I received from the DRM community. Next, I started to work on the subtest “wf_vblank-ts-check”, and here I spent a lot of time debugging problems. My issue here was due to the stochastic test behavior, sometimes it passed and other, it fails, and I supposed the problem was related to the accumulation of errors during the page flip step. As a result, I put a considerable effort to make the timer in the page flip precise, I end up with a patch that calculates the exact moment for the next period (see [5]). Nevertheless, after I submitted the patch, Chris Wilson highlighted that I was reinventing the wheel since hrtimer already did the required calculations [4]; he was 100% right, after his comment I looked line by line of hrtimer_forward, and I concluded that I implemented the same algorithm. I lost some days recreating something that is not useful in the end; however, it was really valuable for me since I learned how hrtimer works and also expanded my comprehension about Vblank. Finally, Daniel Vetter precisely pointed out a series of problems in the patch (see [3]) that not only improved the tests, but also made most of the tests in the kms_flip pass.

In conclusion, adding the infrastructure for Vblank and page flip events in vkms was an exciting feature for VKMS, but also it was an important task to teach me how things work in the DRM. I am still focused on this part of the VKMS, but now, I am starting to think how can I add virtual hardware which does not support Vblank interrupt. Finally, I want to write a detailed blog post on how I implemented the Vblank support in the VKMS and another post about timers (users and kernel space); I believe this sort of post could be helpful for someone that is just starting in the DRM subsystem.

Thanks for all the DRM community that is always kind and provide great help for a newcomer like me :)


  1. Force option in IGT
  2. Adding infrastructure for Vblank and page flip events in vkms
  3. Daniel Vetter comments in V2
  4. Chris Wilson comments about hrtimer
  5. Calculating the period
  6. DRM misc-next
July 09, 2018

Now that my V3D GPU hangs are cleared up, I made a lot of progress on conformance:

Fixes from last week:

  • Don’t reset the GPU until it stops making progress on the job (instead of just a fixed timeout).
  • Fixed noperspective interpolation
  • Fixed leaks of default attributes and spill BOs.
  • Fixed ARB_color_buffer_float clamping of gl_FragData[]
  • Fixed support for GL_EXT_draw_buffers2’s independent blend enables.

Fail/(Pass+Fail) ratios after my weekend run:

  • EGL: 96/2143
  • GLES2: 7/16985
  • GLES3: 25/42549
  • GLES3 multisample: crash!

I’ve debugged the first MSAA crash, but there are some MSAA resolve bugs after that which I need to sort out (some of which are probably the cause of those EGL failures).

On the VC4 front, Boris landed the writeback changes, and I submitted the DT for it in my ARM pull request.