planet.freedesktop.org
May 22, 2019
Thank you all for the large amount of feedback I have received after my previous Wayland Itches blog post. I've received over 40 mails; below is an attempt at summarizing them all.

Highlights

1. Middle click on the title / header bar to lower the window does not work for native apps. Multiple people have reported this issue to me. A similar issue, not being able to raise windows, was already fixed, and it should be easy to apply a similar fix for the lowering problem. There are bugs open for this here, here and here.

2. Running graphical apps via sudo or pkexec does not work. There are numerous examples of apps breaking because of this, such as lshw-gui and usbview. At least for X11 apps this is not that hard to fix, but so far this has deliberately not been fixed. The reasoning behind this is described in this bug. I agree with that reasoning, but I think it is not pragmatic to immediately disallow all GUI apps from connecting when run as root starting today.

We need some sort of transition period. So when I find some time for this, I plan to submit a merge request which optionally makes gnome-shell/mutter start Xwayland with an xauth file, like how it is done when running in GNOME on Xorg mode. This will be controlled by a gsettings option, which will probably default to off upstream; distros can then choose to override this for now, giving us a transition period.
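
To illustrate what such a distro override could look like, here is a purely hypothetical example (the schema exists, but the key name is made up for illustration; the real name will depend on how the merge request lands):

  # hypothetical key name, for illustration only
  gsettings set org.gnome.mutter.wayland xwayland-enable-xauth true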

Requests for features implemented as external programs on X11

There are various features which can be implemented as external programs on X11, but which, because of the tighter security model, need to be integrated into the compositor with Wayland:

  • Hiding of the mouse-cursor when not used à la unclutter-xfixes, xbanish.

  • Rotating the screen 90 / 270 degrees à la "xrandr -o [left|right]", mostly used through custom hotkeys. A possible fix is defining bindable actions for this in gsd-media-keys.

  • Mapping actions to mouse buttons à la easystroke

  • Some touchscreens, e.g. so-called smart-screens for education, need manual calibration. Under X11 there are some tools to get the calibration matrix for the touchscreen, after which this can be manually applied through xinput (see the sketch after this list). Even under X11 this currently is far from ideal, but at least it is possible there.

  • Keys Indicator gnome-shell extension. This still works when using Wayland, but only for apps using Xwayland; it does not work for native apps.

  • Some sort of xkill and xdotool utility equivalents would be nice

  • The GNOME on-screen keyboard is not really suitable for use with apps which are not touch-enabled, as it lacks a way to send ctrl + key, etc. Because of this some users have reported that it is impossible to use alternative on-screen keyboards with Wayland. Not being able to use alternative on-screen keyboards is by design, and IMHO the proper fix here is to improve GNOME's on-screen keyboard.
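
For reference, the manual touchscreen calibration mentioned above typically looks something like this on X11 today (a rough sketch; the device name is a placeholder and the identity matrix shown would need to be replaced with the values produced by a calibration tool):

  xinput list    # find the name / id of the touchscreen
  xinput set-prop "Example Touchscreen" --type=float "Coordinate Transformation Matrix" 1 0 0 0 1 0 0 0 1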

App specific problems


  • Citrix ICA Client does not work well with Xwayland

  • Eclipse does not work well with Xwayland

  • Teamviewer does not work with Wayland. It needs to be updated to use pipewire for screen capturing and the RemoteDesktop portal to inject keyboard and mouse events.

  • Various apps lack screen recording / capture support due to the apps not having pipewire support: gImageReader, green-recorder, OBS studio, peek, screenrecorder, slack

  • For apps which do support pipewire, there is no option to share the contents of a window other than the window making the request. On Xorg it is possible to share a random window, and since pipewire allows sharing the whole desktop I see no security reason why we would not allow sharing another window.

  • The guake window has an incorrect size when using HiDPI scaling, see this issue

Miscellaneous problems


  • Mouse cursor is slow / lags

  • Drag and drop sometimes does not work, e.g. dragging files into file roller to compress or out of file roller to extract.

  • Per-keyboard layouts. On X11, after plugging in a keyboard, the layout/keymap for just that one keyboard can be updated manually using xinput, allowing different keyboard layouts for different keyboards when multiple keyboards are connected (see the sketch after this list).

  • No-title-bar shell extension, X button can be hit unintentionally, see this issue

  • Various issues with keyboard layout switching
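
For reference, the per-keyboard layout trick mentioned above is usually done on X11 along these lines (a sketch; the device id and layout are placeholders):

  xinput list                      # find the id of the newly plugged-in keyboard
  setxkbmap -device 12 -layout de  # apply a layout to just that keyboard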

Hard to fix issues


  • Alt-F2, r equivalent (restart the gnome-shell)

  • X11 apps running on top of Xwayland do not work well on HiDPI screens

  • Push-to-talk (passive key grab on space) does not work in Mumble when using native Wayland apps, see this issue

Problems with compositors other than GNOME3 / mutter

I've also received several reports about issues when using a Wayland compositor other than GNOME / mutter (Weston, KDE, Sway). I'm sorry, but I have not looked very closely into these reports. I believe that it is great that Linux users have multiple Desktop Environments to choose from and I wish for the other DEs to thrive. But there are only so many hours in a day, so I've chosen to mainly focus on GNOME.
First of all I do not want people to get their hopes up about the $subject of this blogpost. Improving gaming support is a subject which holds my personal interest and it is an issue I plan to spend time on trying to improve. But this will take a lot of time (think months for simple things, years for more complex things).

As I see it there are currently 2 big issues when running games under Wayland:

1. Many games show as a small centered image with a black border (letterbox) around the image when running fullscreen.

For 2D games this is fixed by switching to SDL2, which will transparently scale the pixmap the game renders to the desktop resolution. This assumes that 2D games in general do not demand a lot of performance and thus will not run into performance issues when introducing an extra scaling step. A problem here is that many games still use SDL1.2 (and some games do not use SDL at all).

I plan to look into the recently announced SDL1.2 compatibility wrapper around SDL2. If this works well this should fix this issue for all SDL1.2 2D games, by making them use SDL2 under the hood.

For 3D games this can be fixed by rendering at the desktop resolution, but this might be slow and rendering at a lower resolution leads to the letterbox issue.

Recently mutter has grown support for the WPviewport extension, which allows Wayland apps to tell the compositor to scale the pixmap the app gives to the compositor before presenting it. If we add support for this to SDL2's Wayland backend, then this can be used to allow rendering 3D apps at a lower resolution and still have them fill the entire screen.

Unfortunately there are 2 problems with this plan:

  1. SDL2 by default uses its x11 backend, not its wayland backend (see the example after this list for how to select the backend manually). I'm not sure what fixes need to be done to change this; at a minimum we need a fix at either the SDL or mutter side for this issue, which is going to be tricky.

  2. This only helps for SDL2 apps, again hopefully the SDL1.2 compatibility wrapper for SDL2 can help here, at least for games using SDL.
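
For experimenting today, SDL2's Wayland backend can already be selected per game through an environment variable, assuming the game is dynamically linked against an SDL2 build with the Wayland video driver enabled:

  SDL_VIDEODRIVER=wayland ./some-sdl2-game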

2. Fullscreen performance is bad with many games.

Since under Wayland games cannot change the monitor resolution, they need to either render at the full desktop resolution, which can be very slow, or render at a lower resolution and then do an extra scaling step each frame.

If we manage to make SDL2's Wayland backend the default and then add WPviewport support to it, then this should help by avoiding an extra memcpy/blit of a desktop-sized pixmap. Currently what apps which use scaling do is:

  1. render lower-res-pixmap;

  2. scale lower-res-pixmap to desktop-res-pixmap

  3. give desktop-res-pixmap to the compositor;

  4. compositor does a hardware blit of the desktop-res-pixmap to the framebuffer.

With viewport support this becomes:

  1. render lower-res-pixmap;

  2. give low-res-pixmap to the compositor;

  3. compositor uses hardware to do a scaling blit from the low-res-pixmap to the desktop-res framebuffer

Also, with viewport support the compositor could, in the case of there being only one fullscreen app, even keep the framebuffer in low-res and use a hardware scaling drm-plane to send the low-res framebuffer scaled to desktop-res to the output, while only reading the low-res framebuffer from memory, saving a ton of memory bandwidth. But this optimization is going to be a challenge to pull off.
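
To get a feel for the amount of memory bandwidth involved, here is a rough back-of-the-envelope calculation (illustrative numbers, assuming 4 bytes per pixel):

  echo $(( 3840 * 2160 * 4 ))   # ~33 MB for a 4k desktop-res frame
  echo $(( 1920 * 1080 * 4 ))   # ~8 MB for a 1080p low-res frame
  # at 60 fps that is roughly 2 GB/s versus 0.5 GB/s of framebuffer reads
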
May 21, 2019
The just released 5.2-rc1 kernel includes improved support for Logitech wireless keyboards and mice. Until now we were relying on the generic HID keyboard and mouse emulation for 27 MHz and non-unifying 2.4 GHz wireless receivers.

Starting with the 5.2 kernel we instead actually look at the devices behind the receiver. This allows us to provide battery monitoring support and to have per-device quirks, like device-specific HID-code to evdev-code mappings where necessary. Until now device-specific quirks were not possible because the receivers have a generic product-id which is the same independent of the device behind the receiver.

The per-device key-mapping is especially important for 27 MHz wireless devices: these use the same HID-code for Fn + F1 to Fn + F12 for all devices, but the markings on the keys differ per model. So far it was impossible for Linux to get the mapping for this right, but now that we have per-device product-ids for the devices behind the receiver we can finally fix this. As is the case with other devices with vendor-specific mappings, the actual mapping is done in userspace through hwdb.

If you have a 27 MHz device (often using this receiver, keyboard marked as canada 210 or canada 310 at the bottom), please give 5.2 a try. Download the latest 60-keyboard.hwdb file and place it in /lib/udev/hwdb.d (replacing the existing file) and then run "sudo udevadm hwdb --update", before booting into the 5.2 kernel. Then run "sudo evemu-record", select your keyboard and try Fn + F1 to Fn + F12 and any other special keys. If any keys do not work, edit 60-keyboard.hwdb, search for Logitech and add an entry for your keyboard, see the existing Logitech entries. After editing you need to re-run "sudo udevadm hwdb --update", followed by "sudo udevadm trigger" for the changes to take effect. Once you have a working entry, submit a pull request to systemd to get the changes upstream. If you need any help, drop me an email.
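
As an illustration, an entry in 60-keyboard.hwdb looks roughly like this (the product-id and the key mapping below are placeholders; copy the exact format from the existing Logitech entries and use the product-id and scan codes reported for your device):

  evdev:input:b0003v046DpC517*
   KEYBOARD_KEY_c0183=config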

We still have some old code for the generic HID emulation for 27 MHz receivers with a product-id of c50c; these should work fine with the new code, but we've been unable to test this. I would really like to move the c50c id over to the new code and remove all the old code. If you have a 27 MHz Logitech device, please run lsusb; if your device has a product-id of c50c and you are willing to test, please drop me an email.

Likewise I suspect that 2.4 GHz receivers with a product-id of c531 should work fine with the new support for non-unifying 2.4 GHz receivers; if you have one of those, please also drop me an email.
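
A quick way to check which Logitech receivers you have plugged in, including their product-ids, is:

  lsusb -d 046d:
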
May 14, 2019
We have a job in our team, it's a pretty senior role, definitely want people with lots of experience. Great place to work, ignore any possible future mergers :-)

https://global-redhat.icims.com/jobs/68911/principal-software-engineer/job?mobile=false&width=1526&height=500&bga=true&needsRedirect=false&jan1offset=600&jun1offset=600
Now that GNOME3 on Wayland is the default in Fedora I've been trying to use this as my default desktop, but until recently I've kept falling back to GNOME3 on Xorg because of various small issues.

To fix this I've switched to using GNOME3 on Wayland as day to day desktop now and I'm working on fixing any issues which this causes as I hit them, aka "The Wayland Itches project". So far I've hit and fixed the following issues:

1. TopIcons

The TopIcons extension, which I depend on for some of my workflow, was not working well under Wayland with GNOME-3.30: only the top row of icons was clickable. This was fixed in GNOME-3.32, but with GNOME-3.32 using TopIcons was causing gnome-shell to go into a loop, leading to a very high CPU load. The day I wanted to start looking into fixing this I was chatting with Carlos Garnacho and he pointed out to me that this was fixed a couple of days earlier in gnome-shell. The fix for this is in gnome-shell 3.32.2.

2. Hotkeys/desktop shortcuts not working in VirtualBox Virtual Machines

When running a VirtualBox VM under GNOME3 on Wayland, hotkeys such as alt+tab go to the GNOME3 desktop, rather than being forwarded to the VM as happens under Xorg. This can be fixed by changing 2 settings:

  gsettings set org.gnome.mutter.wayland xwayland-allow-grabs true
  gsettings set org.gnome.mutter.wayland xwayland-grab-access-rules "['VirtualBox Machine']"

This is a decent workaround, but we want things to "just work" of course, so we have been working on some changes to make this just work in the next GNOME version.

3. firefox-wayland

I've also been trying to use firefox-wayland as my day to day browser. This has led to me filing three firefox bugs and I've switched back to regular firefox (x11) for now.


If you have any Wayland Itches yourself, please drop me an email at hdegoede@redhat.com explaining them in as much detail as you can and I will see what I can do. Note that I typically get a lot of emails when asking for feedback like this, so I cannot promise that I will reply to every email; but I will be reading them all.
May 07, 2019

Once upon a time a driver was written for the Lenovo Ideapad firmware interface for handling special keys and rfkill functionality. This driver was written on an Ideapad laptop with a slider on the side to turn wifi on/off, a so-called hardware rfkill switch. Sometime later a Yoga model using the same firmware interface showed up, without a hardware rfkill switch. It turns out that in this case the firmware interface reports the non-present switch as always in the off position, causing NetworkManager to not even try to use the wifi, effectively breaking wifi.

So I added a dmi blacklist for models without a hardware rfkill switch. The same firmware interface is still used on new Ideapad and Yoga models, and since most modern laptops typically do not have such a switch this dmi blacklist has been growing and growing. In the 5.1 kernel alone 5 new models were added. Worse, as mentioned, not being on the list for a model without the hardware switch leads to non-working wifi, pretty much meaning any new Ideapad model does not work with Linux until it is added to the list.

To fix this I've submitted a patch upstream turning the huge blacklist into a whitelist. This whitelist is empty for now, meaning that we treat all models as not having an rfkill switch. This does lead to a small regression on models which do actually have a hardware rfkill switch: before this commit they would correctly report the wifi being disabled by the hw switch, and e.g. the GNOME3 UI would report "wifi disabled in hardware", whereas now users will just get an empty list of available wifi networks when the switch is in the off position. But this is a small price to pay to make sure that as of yet unknown and new Ideapad models do not have non-working wifi because of this issue.

As said, the whitelist of models which do actually have a hardware rfkill switch is currently empty, so I need your help to fill it. If you have an Ideapad or Yoga laptop with a wifi on/off slider switch on it, please run "rfkill list"; if the output contains "ideapad_wlan" then you are using the ideapad-laptop driver. In this case please check that the "Hard blocked" setting for the "ideapad_wlan" rfkill device properly shows no / yes based on the switch position. If this works your model should be added to the new whitelist. For this please run "sudo dmidecode &> dmidecode.log" and send me an email at hdegoede@redhat.com with the dmidecode.log attached.
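
For reference, the output to look for resembles this (illustrative, with the switch in the on position):

  $ rfkill list
  0: ideapad_wlan: Wireless LAN
          Soft blocked: no
          Hard blocked: no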

Note the patch to change the list to a whitelist has been included in the Fedora kernels starting with kernel 5.0.10-300, so if you have an Ideapad or Yoga running Fedora and you do see "ideapad_wlan" in the "rfkill list" output, but the "Hard blocked" setting does not respond, try with a kernel older than 5.0.10-300. Let me know if you need help with this.

May 03, 2019

A previous post introduced the SPURV Android compatibility layer for Wayland-based Linux environments.
In this post we're going to dig into how you can run an Android application on the very common i.MX6 based Nitrogen6_MAX board from Boundary Devices.

Install dependencies

sudo apt install \
    apt-transport-https \
    bmap-tools \
    ca-certificates \
    curl \
    git \
    gnupg2 \
    repo \
    software-properties-common \
    u-boot-tools \
    qemu-kvm

Set up Docker container for building

# Install Docker
curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debian $(lsb_release -cs) stable"
sudo apt update
sudo apt install docker-ce

# Set up privileges for Docker
sudo usermod -aG docker ${USER}
su - ${USER}

# Fetch Docker image
docker pull godebos/debos:latest

Build

Build Android

mkdir android; cd android
repo …
May 02, 2019

lwn.net just featured an article on the sustainability of open source, which seems to have been a bit of a topic in various places for a while. I gave a keynote at the Siemens Linux Community Event 2018 last year which lends itself to a different take on all this:

The slides for those who don’t like videos.

This talk was mostly aimed at managers of engineering teams and projects with fairly little experience in shipping open source, and much less experience in shipping open source through upstream cross-vendor projects like the kernel. It goes through all the usual failings and missteps and explains why an upstream first strategy is the right one, but with a twist: instead of technical reasons, it’s all based on economic considerations of why open source is succeeding. Fundamentally it’s not about the better software, or the cheaper price, or that the software freedoms are a good thing worth supporting.

Instead, open source is eating the world because it enables a much more competitive software market. And all the best practices around open development are just there to enable that highly competitive market. Instead of arguing that open source has open development and strongly favours public discussions because that results in better collaboration and better software, we put on the economic lens, and private discussions become insider trading and collusion. And that’s just not considered cool in a competitive market. Similar arguments can be made for everything else going on in open source projects.

Circling back to the list of articles at the top, I think it’s worth looking at the sustainability of open source as an economic issue of an extremely competitive market, in other words, as a market failure: occasionally the result is that no one gets paid and the customers only receive a sub-par product with all costs externalized - costs like keeping up with security issues. And like with other market failures, a solution needs to be externally imposed through regulations, taxation and transfers to internalize all the costs again into the product’s price. Frankly, no idea what that would look like in practice though.

Anyway, just a thought, but good enough a reason to finally publish the recording and slides of my talk, which covers this just in passing in an offhand remark.

Update: Fix slides link.

April 26, 2019

The Igalia Coding Experience is an internship program which provides students with their first exposure to the professional world, working hand in hand with Igalia programmers and learning from them. The internship is aimed at students with a background in Computer Science, Information Technology, or Free Software development.

This program is a great opportunity for those students willing to improve their technical skills by working in the field, learn how to contribute to open-source projects, and work together with the engineers of Igalia, which is a worker-owned company that has been rocking the Free Software world for more than 18 years!

Igalia

We are looking for candidates that are passionate about Free Software philosophy and willing to work on Free Software projects. If you have already contributed to any Free Software project related to our areas of specialization… that’s great! But don’t worry if you have not yet, we encourage you to apply as well!

The conditions of the program are the following:

  • You will be mentored by an Igalian who is an expert in the respective field, so you are not going to be alone.
  • You will need to spend 450h working on the tasks agreed with the mentor, but you are free to distribute them over the year as fits you best. Usually students prefer timetables of 3 months working full-time, 6 months part-time, or even 1 year working 10 hours per week!
  • You are not going to do it for free. We will compensate you with 6500€ for all your work :)

This year we are offering Coding Experience positions in 6 different areas:

  • Implementation of web standards. The intern will become familiar, and contribute to the implementation of W3C standards in open source web engines.

  • WebKit, one of the most important open source web rendering engines. The intern will have the opportunity to help maintain and/or contribute to the development of new features.

  • Chromium, a well-known browser rendering engine. The intern will work on specific features development and/or bug-fixing. Additional tasks may include maintenance of our internal buildbots, and creation of Chromium/Wayland packages for distribution.

  • Compilers, with focus on WebAssembly and JavaScript implementations. The intern will contribute to JS engines like V8 or JSC, working on new language features, optimizations or ports.

  • Multimedia and GStreamer, the leading open source multimedia framework. The intern will help develop the Video Editing stack in GStreamer (namely GES and NLE). This work will include adding new features in any part of GStreamer, GStreamer Editing Services or in the Pitivi video editor, as well as fixing bugs in any of those components.

  • Open-source graphics stack. The student will work in the development of specific features in Mesa or in improving any of the open-source testing suites (VkRunner, piglit) used in the Mesa community. Candidates who would like to propose topics of interest to work on, please include them in the cover letter.

The last area is the one that I have been working on for more than 5 years inside the Graphics team at Igalia, and I am thrilled we can offer this kind of position this year :-)

You can find more information about the Igalia Coding Experience program on the website… don’t forget to apply for it! Happy hacking!

April 15, 2019

A previous post introduced the SPURV Android compatibility layer for Wayland-based Linux environments.
In this post we're going to dig into how you can run an Android application on the very common i.MX6 based Nitrogen6_MAX board from Boundary Devices.

Build SPURV for Nitrogen6_MAX

Build Android

mkdir android; cd android
repo init -u https://android.googlesource.com/platform/manifest -b android-9.0.0_r10
git clone https://gitlab.collabora.com/zodiac/android_manifest.git .repo/local_manifests/
repo sync -j15
. build/envsetup.sh
lunch spurv-eng
make -j12
cd ..

Build Linux Kernel

git clone https://gitlab.collabora.com/zodiac/linux.git -b android-container
cd linux
sh ../android/device/freedesktop/spurv/build-kernel.sh
cd ..

Create root filesystem

Just a kernel does not make an OS, so we're using Debian …

April 10, 2019

After upgrading my main workstation to F30 a while ago (soon after it branched), dbus-broker failed to start, making my machine pretty much unusable. I tried putting selinux in permissive mode and that fixed it, so I made a note to revisit this later.

Fast-forward to today: I applied all updates, did a full relabel for good measure and things were still broken. Spinning up a fresh F30 vm does not exhibit this problem, so the problem had to be something specific to my machine. After lots of debugging I found bug 1663040, which is about the same thing happening on the live media and only on the live media. The problem turns out to be the selinux attributes on the mount-points (/dev, /proc, /sys) in /, which cannot be updated by a relabel because at that time they already have a filesystem mounted on them.

I created the problem of the wrong labels myself when I moved from a hdd to a ssd and did a cp -pr of the non-mount dirs and a straightforward mkdir to create the mount-points on the ssd. Zbigniew gives a neat trick to detect this problem from a running system in bug 1663040:

mkdir /tmp/foo
sudo mount --bind / /tmp/foo
ls -lZd /tmp/foo/* | grep unlabeled

If the output of the last command shows any files/dirs with unlabeled_t as the type then your system has the same issue as mine had. To fix this, boot from a livecd, mount your / on /mnt, cd into /mnt and then run:

chcon -t device_t dev
chcon -t home_root_t home
chcon -t root_t proc sys
chcon -t var_run_t run

Then umount /mnt and reboot. After this your system should be able to run in enforcing mode again without problems.
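
After rebooting, the bind-mount check from above can be repeated to verify that no unlabeled files remain (and to clean up the temporary mount point afterwards):

mkdir -p /tmp/foo
sudo mount --bind / /tmp/foo
ls -lZd /tmp/foo/* | grep unlabeled   # should print nothing now
sudo umount /tmp/foo
rmdir /tmp/foo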

April 06, 2019
While browsing for some internationalisation/localisation features, I found an interesting piece of functionality in Android's developer documentation. I'll quote it here:
A pseudolocale is a locale that is designed to simulate characteristics of languages that cause UI, layout, and other translation-related problems when an app is translated.
I've now implemented this for applications and libraries that use gettext, as an LD_PRELOAD library, available from this repository.


The current implementation can highlight a number of potential problems (paraphrasing the Android documentation again):
- String concatenation, which displays as one message split across 2 or more brackets.
- Hard-coded strings, which cannot be sent to translation, display as unaccented text in the pseudolocale to make them easy to notice.
- Right-to-left (RTL) problems such as elements not being mirrored.
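
Usage is along these lines (a hypothetical invocation; the actual library name and any configuration knobs are documented in the linked repository):

  LD_PRELOAD=/path/to/libpseudolocale.so gedit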

Our old friend, Office Runner 


Testing brought some unexpected results :)
April 03, 2019

I just installed the Fedora Workstation 30 Beta yesterday and so far things are looking great. As many others have reported too, with the GNOME 3.32 update things definitely feel faster and smoother. So I thought it was a good time to talk about what is coming in Fedora Workstation 30 and what we are currently working on.

Fractional Scaling: One of the big features that landed, although still considered experimental, is the fractional scaling feature that has been a collaboration between Jonas Ådahl here at Red Hat and Marco Trevisan at Canonical. It has taken quite some time since the initial hackfest as it is a complex task, but we are getting close. Fractional scaling is a critical feature for many HiDPI screen laptops to get a desktop size that perfectly fits their screen, not being too small or too large.

Screen sharing support for Chrome and Firefox under Wayland. The Wayland security model doesn’t allow any application to freely grab images or streams of the whole desktop like you could under X. This is of course a huge improvement in security, but it did cause some disruption for valid usecases like screen sharing with things like BlueJeans and Google Hangouts. We have been working on resolving that with the help of PipeWire. We have been at it for some time and things are now coming together. Chrome 73 ships with everything needed to make this work with Chrome, although you have to turn it on manually (go to this URL to turn it on: chrome://flags/#enable-webrtc-pipewire-capturer). The reason it needs to be manually enabled is not that it is unreliable, it is because the UI is still a little fugly due to a combination of feature overlap between the browser and the desktop and also how the security feature of the desktop is done. We are trying to come up with ways for the UI to be smoother without sacrificing your privacy/security. For Firefox we will keep shipping with our downstream patch until we manage to get it landed upstream.

Firefox for Wayland: Martin Stransky has been hard at work making Firefox able to run Wayland-native. That work is tantalizingly near, but we decided to postpone it to Fedora Workstation 31 in the end to make sure it is really well polished before releasing it upon the world. The advantage of Wayland-native Firefox is that in addition to bringing us one step closer to not needing to run an X server (XWayland) all the time, it also enables things like the fractional scaling mentioned above to work for Firefox.

OpenH264 improved: As many of you know, Firefox relies on a library called OpenH264, provided by Cisco, for its H264 video codec support for WebRTC. This library is also provided to Fedora users from Cisco free of charge (you can install it through GNOME Software). However its usefulness has been somewhat limited due to only supporting the baseline profile used for video calling, but not the Main and High profiles used by most online video content. Well, what I can tell you is that Red Hat, Endless and Cisco partnered with Centricular some time ago to add support for decoding those profiles to OpenH264, and that work is now almost complete. The basic code enabling them is already merged, but Jan Schmidt at Centricular is working on fixing a few files that are still giving us problems. As soon as that is generally shipping we hope to get Firefox to be able to use OpenH264 also for things like Youtube playback and of course also use OpenH264 for H264 playback in GStreamer applications like Totem. So a big thank you to Endless, Cisco and Centricular for working with us on this and thus enabling us to have a legal way to offer H264 support to our users.

NVidia binary driver support under Wayland: We have been putting quite a bit of effort into tying off the loose ends for using the NVidia binary driver with Wayland. We did manage to fix a long list of bugs like dealing with various colorspace issues, multimonitor setups and so on. For Intel and AMD graphics users things should actually be pretty good to go at this point. The last major item holding us back on the NVidia side is full support for using the binary driver with XWayland applications (native Wayland applications should work fine already). Adam Jackson worked diligently to get all the pieces in place and we do think we have a model now that will allow NVidia to provide an updated driver that should enable XWayland. As it stands though, that driver update is likely to only come out towards the fall, so we will keep defaulting to X for NVidia binary driver users for some time more.

Gaming under Wayland. Olivier Fourdan and Jonas Ådahl have been trying to crush any major Wayland bug reported for quite some time now, and one area where we seem to have rounded the corner is games. Valve has been kind enough to give us the ability to install and run any Steam game for testing purposes, so whenever we found a game giving us trouble we have been able to let Olivier and Jonas reproduce it easily. So on my own gaming box I am now able to run all the Steam games I have under Wayland, including those using Proton, without a hitch. We haven’t tested with the full Steam catalog of course, there are thousands, so if your favourite game is still giving you trouble under Wayland, please let us know. Talking about gaming, one area we will try to free up some cycles for going forward is Flatpaks and gaming. We have already done quite a bit of work in this area, with things like the NVidia binary driver extension and the Steam package on Flathub. But we know from leading Linux game devs that there are still some challenges to be resolved, like making host device access for gamepads simpler from within the Flatpak sandbox.

Flatpak Creation in Fedora. Owen Taylor has been in charge of getting Flatpaks building in Fedora, ensuring we can produce Flatpaks from Fedora packages. Owen set up a system to track the Fedora Flatpak status; we have about 10 applications so far, but hope to greatly grow that number over time as we polish up the system. This enables us to start planning for shipping some applications in Fedora Workstation as Flatpaks by default in a future release. This repository will be available by default in Fedora Workstation 30 and you can choose the flatpak version of a package through the new drop down box in the top right corner of GNOME Software. For now the RPM version of the package is still the default, but we expect to change that in later releases of Fedora Workstation.

Gedit in GNOME Software with Source drop down box

Fedora Toolbox – Debarshi Ray is leading the effort we call Fedora Toolbox, which is our starting point for our goal to revitalise and revolutionize development on Linux. Fedora Toolbox is trying to take the model of a pet container for development and make it seamless and natural. Our goal is to make it dead simple to create pet containers for your projects, so you can for instance have a Fedora pet container where you develop against the leading edge libraries and tools in Fedora, and you can have a RHEL based container where you develop against the library versions and tools shipping in RHEL (makes updating and fixing in production applications a lot easier) and maybe a SteamOS container to work on your little game project. Currently the model is that you have one pet container per OS you’re targeting, but we are pondering if maybe having one pet container per project would be even better if we can find good ways to avoid it being a lot of extra overhead (by for example having to re-install all your favourite command line tools in the container) or just outright confusing (which container got what tools and libraries again). Our goal here though is to ensure Fedora becomes the premier container native OS out there and thus a natural home for developers doing container development.
We are also working with the team inside Red Hat focusing on AI/ML and trying to ensure that we have a super smooth way for you to get a pet container with things like TensorFlow and CUDA up and running quickly.

Being an excellent platform for Openshift and Kubernetes development: Together with the Red Hat developer tools organization we are putting effort into bringing the OpenShift, CodeReady Studio and CodeReady Workspaces tools to Fedora. These tools have so far been very focused on RHEL support, but thanks to Flatpak for CodeReady Studio and web integration for CodeReady Workspaces we now have a path for making them easily available in Fedora too. In the world of Kubernetes OpenShift is where you want to be, and we want Fedora Workstation to be the ultimate portal for OpenShift development.

Fleet Commander with Active Directory support – So we are about to hit a very major milestone with Fleet Commander, our large scale desktop management tool for Fedora and RHEL. Oliver Gutierrez has been hard at work making it work with Active Directory in addition to the existing FreeIPA support. We know that a majority of people interested in Fleet Commander are only using Active Directory currently, so being able to use Active Directory with Fleet Commander should make this great tool available to a huge number of new users. So if you are managing a University computer lab or a large number of Fedora or RHEL clients in your company, we should soon have a Fleet Commander release out that you can use. And if you are not using Fedora or RHEL today, well, Fleet Commander is a very big reason for switching over!
We will do a proper announcement with further details once the release with Active Directory support is out.

PipeWire – I don’t have a major development to report, just a lot of steady work being done to stabilize and improve PipeWire. As mentioned earlier we now have Wayland screen sharing and recording working smoothly in the major browsers, which is the user facing feature I think most of you will notice. Wim is still working on pushing the audio side of it forward, but that is also a huge task. We have started talking about organizing a new hackfest soon to see if we can accelerate the effort further. The likely scenario at this point in time is that we start enabling the JACK side of PipeWire first, maybe as early as Fedora Workstation 31, and then come back and do the PulseAudio replacement as a last stage.

Improved input handling: Another area we keep focusing on is improving input in Fedora. Peter Hutterer and Benjamin Tissoires are working hard on improving the stack. Peter just sent an extensive RFC out for how to deal with high resolution mice under Linux and Benjamin has been trying to get support for the Dell Totem landed. Neither will unfortunately be there for Fedora Workstation 30, but we expect to land this before Fedora Workstation 31.

Flicker-free boot
Hans de Goede has continued working on his effort to create a flicker-free boot experience with Fedora. The results of this work are on display in Fedora Workstation 30 and will for most of you now provide a seamless bootup experience. This effort is not so much about functionality as it is about ensuring you have an end-to-end polished experience with your Linux desktop. Things like the constant mode changes we have seen in the past contribute to giving Linux an image of being unpolished and we want Fedora to be the vehicle that breaks down that image.

Ramping up Silverblue

For those of you following Fedora you are probably aware of Silverblue, which is our effort to re-think the Linux desktop distribution from the ground up and help us take the Linux desktop to a new level. The distribution model hasn’t really changed much over the last 20 years and we have probably polished up the offering as far as we can within the scope of that model. For instance I upgraded my system to Fedora 30 beta yesterday and it was a long and tedious process of looking at about 6000 individual packages getting updated from the Fedora 29 version to the Fedora 30 version one by one. I didn’t hit a lot of major snags despite this being a beta, but it is screamingly obvious that updating your operating system in this way is both slow and inherently fragile, as any one of those 6000 packages might hit a problem during upgrade and leave the system in an unknown state, especially since it’s common for packages to run scripts and similar as part of their upgrade.

Silverblue provides a revolutionary replacement for that process. First of all since it ships as a unified image we make life a lot easier for our QE team who can then test and verify against a single image which is in a known state. This in turn ensures that you as a user can feel confident that the new OS version will not break something on your system. And since the new version is just an image stored on your system next to the old one, upgrading is just about rebooting your system. There is no waiting for individual packages to get upgraded, as everything is already there and ready. Compare it to booting into a different kernel version on Fedora, it is quick and trivial.
And this also means that in the unlikely case that there is a problem with the new OS version you can just as easily go back to the previous version, by rebooting again and choosing to boot into that version. So you basically have instant upgrades with instant rollback if needed.
We believe this will radically change the way you look at OS upgrades forever, in fact you might almost forget they are happening.

And since Silverblue will basically be a Flatpak (and other containers) only OS you will have a clean delimitation between OS and applications. This means that even if we do major updates to the host, your applications should remain unaffected by the host OS update.
In fact we have some very interesting developments underway for Flatpak, major new efforts that I would love to talk about, but they are tied to some major Red Hat announcements that will happen at this year’s Red Hat Summit, which takes place May 7th – May 9th, so I will leave it as a teaser and then let you all know once the Summit is underway and Red Hat’s related major announcements are done.

There is a lot of work happening around Silverblue and as it happens Matthias Clasen wrote a long blog entry about it today. That blog goes into a lot more detail on some of the Silverblue work items we have been doing.

Anyway, I feel really excited about Silverblue and as we continue to refine the experience and figure out how everything will look in this brave new world I am sure everyone else will get excited too. Silverblue represents the single biggest evolution of the Linux desktop since the original GNOME and KDE releases back in the late nineties. It is not just about trying to tweak the existing experience, but an attempt at taking a big leap forward and providing an operating system that embodies all that we learned over these last 20 years and provides a natural home for developers and creators of all kinds in our container centric computing future. Be sure to grab the Silverblue image of Fedora 30 beta and give it a test run. I recommend activating the flathub.org repo to get started in order to get a decent range of applications available. As we move forward we are working hard to ensure that you have the world of applications available out of the box, so there is no need to go and enable any 3rd party repositories, but there is some more work that needs to happen before we can do that.

Summary
So Fedora Workstation 30 is going to be another exciting release, both of the traditional RPM-based Workstation version and of Silverblue, and I hope it will encourage even more people to join our rapidly growing Fedora community. Be sure to join us in #fedora-workstation on freenode IRC to talk!

April 02, 2019

Portable Services Walkthrough (Go Edition)

A few months ago I posted a blog story with a walkthrough of systemd Portable Services. The example service given was written in C, and the image was built with mkosi. In this blog story I'd like to revisit the exercise, but this time focus on a different aspect: modern programming languages like Go and Rust push users a lot more towards static linking of libraries than the usual dynamic linking preferred by C (at least in the way C is used by traditional Linux distributions).

Static linking means we can greatly simplify image building: if we don't have to link against shared libraries during runtime we don't have to include them in the portable service image. And that means pretty much all need for building an image from a Linux distribution of some kind goes away as we'll have next to no dependencies that would require us to rely on a distribution package manager or distribution packages. In fact, as it turns out, we only need as few as three files in the portable service image to be fully functional.

So, let's have a closer look how such an image can be put together. All of the following is available in this git repository.

A Simple Go Service

Let's start with a simple Go service, an HTTP service that simply counts how often a page from it is requested. Here are the sources: main.go — note that I am not a seasoned Go programmer, hence please be gracious.

The service implements systemd's socket activation protocol, and thus can receive bound TCP listener sockets from systemd, using the $LISTEN_PID and $LISTEN_FDS environment variables.

The service will store the counter data in the directory indicated in the $STATE_DIRECTORY environment variable, which happens to be an environment variable current systemd versions set based on the StateDirectory= setting in service files.

Two Simple Unit Files

When a service shall be managed by systemd a unit file is required. Since the service we are putting together shall be socket activatable, we even have two: portable-walkthrough-go.service (the description of the service binary itself) and portable-walkthrough-go.socket (the description of the sockets to listen on for the service).

These units are not particularly remarkable: the .service file primarily contains the command line to invoke and a StateDirectory= setting to make sure the service when invoked gets its own private state directory under /var/lib/ (and the $STATE_DIRECTORY environment variable is set to the resulting path). The .socket file simply lists 8080 as TCP/IP port to listen on.
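
As a rough sketch, the two units boil down to something like the following (the real files in the linked repository contain a bit more, and the binary path shown here is an assumption):

# portable-walkthrough-go.service
[Service]
ExecStart=/usr/bin/portable-walkthrough-go
StateDirectory=portable-walkthrough-go

# portable-walkthrough-go.socket
[Socket]
ListenStream=8080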

An OS Description File

OS images (and that includes portable service images) generally should include an os-release file. Usually, that is provided by the distribution. Since we are building an image without any distribution let's write our own version of such a file. Later on we can use the portablectl inspect command to have a look at this metadata of our image.

Putting it All Together

The four files described above are already every file we need to build our image. Let's now put the portable service image together. For that I've written a Makefile. It contains two relevant rules: the first one builds the static binary from the Go program sources. The second one then puts together a squashfs file system combining the following:

  1. The compiled, statically linked service binary
  2. The two systemd unit files
  3. The os-release file
  4. A couple of empty directories such as /proc/, /sys/, /dev/ and so on that need to be over-mounted with the respective kernel API file system. We need to create them as empty directories here since Linux insists on directories to exist in order to over-mount them, and since the image we are building is going to be an immutable read-only image (squashfs) these directories cannot be created dynamically when the portable image is mounted.
  5. Two empty files /etc/resolv.conf and /etc/machine-id that can be over-mounted with the same files from the host.

And that's already it. After a quick make we'll have our portable service image portable-walkthrough-go.raw and are ready to go.
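
The squashfs step itself essentially boils down to a single mksquashfs invocation over a staging directory containing the files and empty directories listed above, something along these lines (a sketch; the staging directory name is an assumption and the Makefile in the repository is the authoritative version):

mksquashfs rootfs/ portable-walkthrough-go.raw -noappend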

Trying it out

Let's now attach the portable service image to our host system:

# portablectl attach ./portable-walkthrough-go.raw
(Matching unit files with prefix 'portable-walkthrough-go'.)
Created directory /etc/systemd/system.attached.
Created directory /etc/systemd/system.attached/portable-walkthrough-go.socket.d.
Written /etc/systemd/system.attached/portable-walkthrough-go.socket.d/20-portable.conf.
Copied /etc/systemd/system.attached/portable-walkthrough-go.socket.
Created directory /etc/systemd/system.attached/portable-walkthrough-go.service.d.
Written /etc/systemd/system.attached/portable-walkthrough-go.service.d/20-portable.conf.
Created symlink /etc/systemd/system.attached/portable-walkthrough-go.service.d/10-profile.conf → /usr/lib/systemd/portable/profile/default/service.conf.
Copied /etc/systemd/system.attached/portable-walkthrough-go.service.
Created symlink /etc/portables/portable-walkthrough-go.raw → /home/lennart/projects/portable-walkthrough-go/portable-walkthrough-go.raw.

The portable service image is now attached to the host, which means we can now go and start it (or even enable it):

# systemctl start portable-walkthrough-go.socket

Let's see if our little web service works, by doing an HTTP request on port 8080:

# curl localhost:8080
Hello! You are visitor #1!

Let's try this again, to check if it counts correctly:

# curl localhost:8080
Hello! You are visitor #2!

Nice! It worked. Let's now stop the service again, and detach the image again:

# systemctl stop portable-walkthrough-go.service portable-walkthrough-go.socket
# portablectl detach portable-walkthrough-go
Removed /etc/systemd/system.attached/portable-walkthrough-go.service.
Removed /etc/systemd/system.attached/portable-walkthrough-go.service.d/10-profile.conf.
Removed /etc/systemd/system.attached/portable-walkthrough-go.service.d/20-portable.conf.
Removed /etc/systemd/system.attached/portable-walkthrough-go.service.d.
Removed /etc/systemd/system.attached/portable-walkthrough-go.socket.
Removed /etc/systemd/system.attached/portable-walkthrough-go.socket.d/20-portable.conf.
Removed /etc/systemd/system.attached/portable-walkthrough-go.socket.d.
Removed /etc/portables/portable-walkthrough-go.raw.
Removed /etc/systemd/system.attached.

And there we go, the portable image file is detached from the host again.

A Couple of Notes

  1. Of course, this is a simplistic example: in real life services will be more than one compiled file, even when statically linked. But you get the idea, and it's very easy to extend the example above to include any additional, auxiliary files in the portable service image.

  2. The service is very nicely sandboxed during runtime: while it runs as regular service on the host (and you thus can watch its logs or do resource management on it like you would do for all other systemd services), it runs in a very restricted environment under a dynamically assigned UID that ceases to exist when the service is stopped again.

  3. Originally I wanted to make the service not only socket activatable but also implement exit-on-idle, i.e. add logic so that the service terminates on its own when there's no ongoing HTTP connection for a while. I couldn't figure out how to do this race-freely in Go though, but I am sure an interested reader might want to add that? By combining socket activation with exit-on-idle we can turn this project into an exercise in putting together an extremely resource-friendly and robust service architecture: the service is started only when needed and terminates when no longer needed. This would allow packing services at a much higher density even on systems with few resources.

  4. While the basic concepts of portable services have been around since systemd 239, it's best to try the above with systemd 241 or newer since the portable service logic received a number of fixes since then.

Further Reading

A low-level document introducing Portable Services is shipped along with systemd.

Please have a look at the blog story from a few months ago that did something very similar with a service written in C.

There are also relevant manual pages: portablectl(1) and systemd-portabled(8).

April 01, 2019

Running Android has some advantages compared to native Linux applications, for example with regard to the availability of applications and application developers.

For current non-Android systems, this work enables a path forward to running Android applications in the same graphical environment as traditional non-Android applications are run.

What is SPURV?

SPURV is our experimental containerized Android environment, and this is a quick overview of what it is.

It's aptly named after the first robotic fish since a common Android naming scheme is fish-themed names. Much like its spiritual ancestor Goldfish, the Android emulator.

Other Android Compatibility Layers

This means that Anbox, which is LXC based, is different from SPURV in terms of how hardware is accessed. The hardware access that Anbox provides is indirect, and …

Arm driver timeline

The process of reverse engineering Arm GPUs has been going on for a long time, starting with Luc Verhaegen's work on the low-end Mali 2/3/400 series of GPUs, based on the Arm Utgard architecture.
This driver has recently seen a lot of new attention and is itself progressing quickly, which means it will likely be accepted into the kernel soon.
A piece of trivia is that this GPU architecture was what Arm received when they purchased the Norwegian GPU IP vendor Falanx Microsystems.

The Mali T and G-series of GPUs are based on the Midgard and Bifrost architectures respectively, both of which are quite different from the 2/3/400 series. However the T and G-series are somewhat similar at least when …

Back in October, Panfrost ran some simple benchmarks, like glmark. Five months later, Panfrost has grown from running benchmarks to real-world apps, like Kodi, and 3D games like SuperTuxKart and Neverball.

Since the previous post, there have been major improvements across every part of the project, culminating in this milestone. On the kernel side, my co-contributors Tomeu Vizoso and Rob Herring have created a modern kernel driver, suitable for mainline inclusion. Panfrost now uses this upstream-friendly driver, rather than relying on a modified legacy kernel as in the past. The new kernel module is currently under review for mainline inclusion. You can read more about this progress on Tomeu’s blog.

Outside the kernel, however, the changes have been no less significant. Early development was constrained to our own project repositories, as the code was not yet ready for general users. In early February, thanks in part to the progress on the kernel-space, we flew out of our nest, and Panfrost was merged into upstream Mesa, the central repository for free software graphics drivers. Development now occurs in-tree in Mesa.

We have continued decoding new aspects of the hardware and implementing support in the driver. A few miscellaneous additions include cube maps, gl_PointSize and gl_PointCoord, linear depth rendering, performance counters, and new shader instructions.

One area of particular improvement has been our understanding of the hardware formats (like “4-element vector of 32-bit floats” or “single 16-bit unsigned normalized integer”). In Panfrost’s early days, we knew magic numbers to distinguish a few of the most common formats, but the underlying meanings of the voodoo patterns were elusive. Further, the format bits for textures and attributes were not unified, further hindering the diversity of supported formats available. However, Connor Abbott and I have since identified the underlying meaning of the format codes for textures, attributes, and framebuffers. This new understanding allows for the magic numbers to be replaced by a streamlined format selection routine, mapping Gallium’s formats to the hardware’s and supporting the full spectrum of formats required for a conformant driver. Panfrost is now passing texture format tests for OpenGL ES 2.0.

From a performance standpoint, various optimizations have been added. In particular, a fast path likely relating to the “tiler” in the hardware was discovered. When this fast path is used, performance on geometry heavy scenes skyrockets. In one extreme demo (shading the Stanford bunny), performance more than tripled, and these gains trickle down to real-world games.

Features aside, one of the key issues with an early driver is brittleness and instability. Accordingly, to guarantee robustness, I now test with the drawElements Quality Program (dEQP), which includes comprehensive code correctness tests. Although we're still a while away from conformance, I now systematically step through identified issues and resolve the bugs, translating to fixes across every aspect of the driver.

One real-world beneficiary of these fixes is the Kodi media center, which today works well using Panfrost to achieve a fluid interface on Midgard devices. For standalone installations of Kodi, today there are experimental images featuring Kodi and Panfrost. To further improve fluidity, Kodi and Panfrost can even interoperate with video decoding acceleration, contingent on cooperative kernel drivers.

For users more inclined to gaming, some 3D games are beginning to show signs of life with Panfrost. For instance, the classic (OpenGL ES 2.0) backend of the ever-popular kart racing game, SuperTuxKart, now renders with some minor glitches with Panfrost. Performance is playable on simple tracks, though we have many opportunities for optimization. To bring up this racing game, I added support for complex control flow in the compiler. Traditionally, control flow is discouraged in graphics, due to the architecture of desktop GPUs (thread “warps”). However, Midgard does not feature these traditional optimizations, negating the performance penalty for branching from control flow. The implementation required new bookkeeping in the compiler, as well as an investigation into long jumps due to the size of the game’s “uber-shader”. In total, this compiler improvement – paired with assorted bug fixes – allows SuperTuxKart to run.

Likewise, Neverball is playable (and fun!) with Panfrost, although there are rendering anomalies relating to the currently unimplemented legacy feature “point sprites”. In contrast to Kodi and SuperTuxKart, which make liberal use of custom shaders, Neverball is implemented with purely fixed-function desktop OpenGL. This poses an interesting challenge, as Midgard is designed specifically for embedded graphics; the blob does not support this desktop superset. But that’s no reason we can’t!

Like most modern free software OpenGL drivers, Panfrost is built atop the modular “Gallium” architecture. This architecture abstracts away interface details, like desktop versus embedded OpenGL, normalizing differences to allow drivers to focus on the hardware itself. This abstraction means that by implementing Panfrost as an embedded driver atop Gallium, we get a partial desktop OpenGL implementation “free”.

Of course, there is functionality in the desktop superset that does not exist in the embedded profile. While Gallium tries to paper over these differences, the driver is required to implement features like point sprites and alpha testing to expose the corresponding desktop functions. So, the bring-up of desktop OpenGL applications like Neverball has led me to implement some of this additional functionality. Translating the “alpha test” to a conditional discard instruction in the fragment shader works. Similarly, translating “point sprites” to the modern equivalent, gl_PointCoord, is planned.

Interestingly, the hardware does support some functionality only available through the full desktop profile. It is unknown how many “hidden features” of this type are supported; as the blob does not appear to use them, these features were discovered purely by accident on our part. For instance, in addition to the familiar set of “points, lines, and triangles”, Midgard can natively render quadrilaterals and polygons. The existence of this feature was suggested by the corresponding performance counters, and the driver-side mechanics were determined by manually brute-forcing the primitive selection bits. Nevertheless, now that these bonus features are understood, quads can be drawn from desktop applications without first translating to indexed triangles in software. Similarly, it appears that in addition to the embedded standard of boolean occlusion queries, setting a chicken bit enables the hardware’s hidden support for precise occlusion counters, a useful desktop feature.

Going forward, although the implementation of OpenGL ES 2.0 is approaching feature-completeness, we will continue to polish the driver, guided by dEQP. Orthogonal to conformance, further optimization to improve performance and lower memory usage is on the roadmap.

It’s incredible to reflect back and realise that just one year ago, work had not even begun on writing a real OpenGL driver. Yet here we are today with an increasingly usable, exclusively free software, hardware-accelerated desktop with Mali Midgard graphics.

Frost on.

March 29, 2019

Aside from the regular board elections we also have some bylaw changes to vote on. As usual with bylaw changes, we need a supermajority of all members to agree - if you don’t vote you essentially reject it, but the board has no way of knowing.

Please see the detailed changes of the bylaws, make up your mind, and go vote on the shiny new members page.

March 28, 2019

Today the announcement went out that the Linux Vendor Firmware Service has become an official Linux Foundation service. For those who don’t know it yet, LVFS is a service that provides firmware for your Linux-running hardware. It was one of our initial efforts as part of the Fedora Workstation effort to drain the swamp in terms of making Linux a first class desktop operating system.

The effort came about when Peter Jones, Red Hat's representative to the UEFI standards body, approached me to talk about how Microsoft was pushing for a standardized way to ship UEFI firmware for Windows, and how UEFI being a standard opened a path for us to get full support for this without each vendor having to ship and maintain their own proprietary firmware tools. So we had a meeting with Peter Jones and also brought in Richard Hughes, who had already been looking at the problem of firmware updates in Linux, partly due to his ColorHug hardware. The effort got started with Peter working on the low-level OS tooling and Richard building the service to drive distribution, along with the work to integrate it all into GNOME Software. One concern we had, of course, was whether we could reach critical mass and get vendors interested, but luckily Dell was just as keen on improving firmware handling under Linux as we were and signed on from the start. Having Dell onboard helped give the effort a lot of credibility, and as the service matured we ended up having more and more vendors sign up. We also reached out through Red Hat's partnerships to push vendors to adopt and support it. As Richard also mentions in his interview about it, we had made the solution as similar to Microsoft's as possible to lower the threshold for hardware vendors to join, the goal being that if they did the basic work to support Windows they could more or less just ship the same firmware file to LVFS.

One issue that we had gone back and forth about inside Red Hat was the formal setup of the service. While we all agreed the service was hugely beneficial, it felt like something that should be a shared service for all of Linux, and we worried that if the service was Red Hat provided it might dissuade other vendors from joining. So we started looking around for a neutral place to land the service, while in the meantime LVFS had a sort of autonomous status, being run as a community effort by Richard Hughes. We ended up talking to Chris Wright, the Red Hat CTO, about the project and he offered to facilitate contact with the Linux Foundation. The initial meetings were very positive and the Linux Foundation seemed interested in running the service right from the start. It did end up taking us quite some time to clear all the formal and technical hurdles to get there, but I for one am very happy to see the LVFS now being a vendor-neutral service provided by the Linux Foundation.

So a big thank you to Richard Hughes, Peter Jones, Chris Wright, Mario Limonciello, Dell and the Linux Foundation for their help in getting us here. And also a big thank you to Fedora and the Fedora community for providing us a place to develop and polish up this service to the benefit of all. To me this is one of many examples of how Fedora keeps innovating and leading the way on desktop Linux.

March 21, 2019

I had to work on an image yesterday where I couldn't install anything and the amount of pre-installed tools was quite limited. And I needed to debug an input device, usually done with libinput record. So eventually I found that hexdump supports formatting of the input bytes but it took me a while to figure out the right combination. The various resources online only got me partway there. So here's an explanation which should get you to your results quickly.

By default, hexdump prints identical input lines as a single line with an asterisk ('*'). To avoid this, use the -v flag as in the examples below.

hexdump's format string is a single-quote-enclosed string that contains the count, the element size and a double-quote-enclosed printf-like format string. So a simple example is this:


$ hexdump -v -e '1/2 "%d\n"'
-11643
23698
0
0
-5013
6
0
0
This prints 1 element ('iteration') of 2 bytes as an integer, followed by a linebreak. Or in other words: it takes two bytes, converts them to an int and prints it. If you want to print the same input value in multiple formats, use multiple -e invocations.

$ hexdump -v -e '1/2 "%d "' -e '1/2 "%x\n"'
-11568 d2d0
23698 5c92
0 0
0 0
6355 18d3
1 1
0 0
This prints the same 2-byte input value, once as decimal signed integer, once as lowercase hex. If we have multiple identical things to print, we can do this:

$ hexdump -v -e '2/2 "%6d "' -e '" hex:"' -e '4/1 " %x"' -e '"\n"'
-10922 23698 hex: 56 d5 92 5c
0 0 hex: 0 0 0 0
14879 1 hex: 1f 3a 1 0
0 0 hex: 0 0 0 0
0 0 hex: 0 0 0 0
0 0 hex: 0 0 0 0
This prints two elements, each of size 2, as integers, then the same elements as four 1-byte hex values, followed by a linebreak. %6d is a standard printf conversion and is documented in the manual.

Let's go and print our protocol. The struct representing the protocol is this one:


struct input_event {
#if (__BITS_PER_LONG != 32 || !defined(__USE_TIME_BITS64)) && !defined(__KERNEL__)
        struct timeval time;
#define input_event_sec time.tv_sec
#define input_event_usec time.tv_usec
#else
        __kernel_ulong_t __sec;
#if defined(__sparc__) && defined(__arch64__)
        unsigned int __usec;
#else
        __kernel_ulong_t __usec;
#endif
#define input_event_sec __sec
#define input_event_usec __usec
#endif
        __u16 type;
        __u16 code;
        __s32 value;
};
So we have two longs for sec and usec, two shorts for type and code and one signed 32-bit int. Let's print it:

$ hexdump -v -e '"E: " 1/8 "%u." 1/8 "%06u" 2/2 " %04x" 1/4 "%5d\n"' /dev/input/event22
E: 1553127085.097503 0002 0000 1
E: 1553127085.097503 0002 0001 -1
E: 1553127085.097503 0000 0000 0
E: 1553127085.097542 0002 0001 -2
E: 1553127085.097542 0000 0000 0
E: 1553127085.108741 0002 0001 -4
E: 1553127085.108741 0000 0000 0
E: 1553127085.118211 0002 0000 2
E: 1553127085.118211 0002 0001 -10
E: 1553127085.118211 0000 0000 0
E: 1553127085.128245 0002 0000 1
And voila, we have our structs printed in the same format evemu-record prints out. So with nothing but hexdump, I can generate output I can then parse with my existing scripts on another box.
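Incidentally, if Python happens to be available on the minimal image, the same decoding can be done with struct.unpack(). This is just a sketch: it assumes the 64-bit layout shown above (two 8-byte longs, two unsigned shorts, one signed 32-bit int) and the same example device node.

#!/usr/bin/env python3
# Sketch only: decode struct input_event the way the hexdump call above does.
import struct

EVENT_FORMAT = "=qqHHi"   # sec, usec, type, code, value
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)   # 24 bytes on x86_64

with open("/dev/input/event22", "rb") as fd:
    while True:
        data = fd.read(EVENT_SIZE)
        if len(data) < EVENT_SIZE:
            break
        sec, usec, evtype, code, value = struct.unpack(EVENT_FORMAT, data)
        print("E: {}.{:06d} {:04x} {:04x} {:5d}".format(sec, usec, evtype, code, value))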

March 15, 2019

Ho ho ho, let's write libinput. No, of course I'm not serious, because no-one in their right mind would utter "ho ho ho" without a sufficient backdrop of reindeers to keep them sane. So what this post is instead is me writing a nonworking fake libinput in Python, for the sole purpose of explaining roughly what libinput's architecture looks like. It'll be to libinput what a Duplo car is to a Maserati: four wheels and something to entertain the kids with, but the queue outside the nightclub won't be impressed.

The target audience is those who need to hack on libinput and for whom the balance of understanding vs. total confusion is still shifted towards the latter. So in order to make it easier to associate the various bits, here's a description of the main building blocks.

libinput uses something resembling OOP except that in C you can't have nice things unless what you want is a buffer overflow\n\80xb1001af81a2b1101. Instead, we use opaque structs, each with accessor methods and an unhealthy amount of verbosity. Because Python does have classes, those structs are represented as classes below. This all won't be actual working Python code, I'm just using the syntax.

Let's get started. First of all, let's create our library interface.


class Libinput:
    @classmethod
    def path_create_context(cls):
        return _LibinputPathContext()

    @classmethod
    def udev_create_context(cls):
        return _LibinputUdevContext()

    # dispatch() means: read from all our internal fds and
    # call the dispatch method on anything that has changed
    def dispatch(self):
        for fd in self.epoll_fd.get_changed_fds():
            self.handlers[fd].dispatch()

    # return whatever the next event is
    def get_event(self):
        return self._events.pop(0)

    # the various _notify functions are internal API
    # to pass things up to the context
    def _notify_device_added(self, device):
        self._events.append(LibinputEventDevice(device))
        self._devices.append(device)

    def _notify_device_removed(self, device):
        self._events.append(LibinputEventDevice(device))
        self._devices.remove(device)

    def _notify_pointer_motion(self, x, y):
        self._events.append(LibinputEventPointer(x, y))


class _LibinputPathContext(Libinput):
    def add_device(self, device_node):
        device = LibinputDevice(device_node)
        self._notify_device_added(device)

    def remove_device(self, device_node):
        self._notify_device_removed(device)


class _LibinputUdevContext(Libinput):
    def __init__(self):
        self.udev = udev.context()

    def udev_assign_seat(self, seat_id):
        self.seat_id = seat_id

        for udev_device in self.udev.devices():
            device = LibinputDevice(udev_device.device_node)
            self._notify_device_added(device)


We have two different modes of initialisation, udev and path. The udev interface is used by Wayland compositors and adds all devices on the given udev seat. The path interface is used by the X.Org driver and adds only one specific device at a time. Both interfaces provide the dispatch() and get_event() methods, which are how every caller gets events out of libinput.

In both cases we create a libinput device from the data and create an event about the new device that bubbles up into the event queue.

But what really are events? Are they real or just a fidget spinner of our imagination? Well, they're just another object in libinput.


class LibinputEvent:
    @property
    def type(self):
        return self._type

    @property
    def context(self):
        return self._libinput

    @property
    def device(self):
        return self._device

    def get_pointer_event(self):
        if isinstance(self, LibinputEventPointer):
            return self  # This makes more sense in C where it's a typecast
        return None

    def get_keyboard_event(self):
        if isinstance(self, LibinputEventKeyboard):
            return self  # This makes more sense in C where it's a typecast
        return None


class LibinputEventPointer(LibinputEvent):
    @property
    def time(self):
        return self._time / 1000

    @property
    def time_usec(self):
        return self._time

    @property
    def dx(self):
        return self._dx

    @property
    def absolute_x(self):
        return self._x / self._x_units_per_mm

    def absolute_x_transformed(self, width):
        return self._x * width / self._x_max_value
You get the gist. Each event is actually an event of a subtype with a few common shared fields and a bunch of type-specific ones. The events often contain some internal value that is calculated on request. For example, the API for the absolute x/y values returns mm, but we store the value in device units instead and convert to mm on request.

So, what's a device then? Well, just another I-cant-believe-this-is-not-a-class with relatively few surprises:


class LibinputDevice:
    class Capability(Enum):
        CAP_KEYBOARD = 0
        CAP_POINTER = 1
        CAP_TOUCH = 2
        ...

    def __init__(self, device_node):
        pass  # no-one instantiates this directly

    @property
    def name(self):
        return self._name

    @property
    def context(self):
        return self._libinput_context

    @property
    def udev_device(self):
        return self._udev_device

    def has_capability(self, cap):
        return cap in self._capabilities

    ...
Now we have most of the frontend API in place and you start to see a pattern. This is how all of libinput's API works: you get some opaque read-only objects with a few getters and accessor functions.

Now let's figure out how to work on the backend. For that, we need something that handles events:


class EvdevDevice(LibinputDevice):
    def __init__(self, device_node):
        fd = open(device_node)
        super().context.add_fd_to_epoll(fd, self.dispatch)
        self.initialize_quirks()

    def has_quirk(self, quirk):
        return quirk in self.quirks

    def dispatch(self):
        while True:
            data = fd.read(input_event_byte_count)
            if not data:
                break

            self.interface.dispatch_one_event(data)

    def _configure(self):
        # some devices are adjusted for quirks before we
        # do anything with them
        if self.has_quirk(SOME_QUIRK_NAME):
            self.libevdev.disable(libevdev.EV_KEY.BTN_TOUCH)

        if 'ID_INPUT_TOUCHPAD' in self.udev_device.properties:
            self.interface = EvdevTouchpad()
        elif 'ID_INPUT_SWITCH' in self.udev_device.properties:
            self.interface = EvdevSwitch()
        ...
        else:
            self.interface = EvdevFallback()


class EvdevInterface:
    def dispatch_one_event(self, event):
        pass


class EvdevTouchpad(EvdevInterface):
    def dispatch_one_event(self, event):
        ...


class EvdevTablet(EvdevInterface):
    def dispatch_one_event(self, event):
        ...


class EvdevSwitch(EvdevInterface):
    def dispatch_one_event(self, event):
        ...


class EvdevFallback(EvdevInterface):
    def dispatch_one_event(self, event):
        ...
Our evdev device is actually a subclass (well, C, *handwave*) of the public device and its main function is "read things off the device node". And it passes that on to a magical interface. Other than that, it's a collection of generic functions that apply to all devices. The interfaces are where most of the real work is done.

The interface is decided on by the udev type and is where the device-specifics happen. The touchpad interface deals with touchpads, the tablet and switch interfaces with those devices, and the fallback interface is the one for mice, keyboards and touch devices (i.e. the simple devices).

Each interface has very device-specific event processing and can be compared to the Xorg synaptics vs wacom vs evdev drivers. If you are fixing a touchpad bug, chances are you only need to care about the touchpad interface.

The device quirks used above are another simple block:


class Quirks:
    def __init__(self):
        self.read_all_ini_files_from_directory('$PREFIX/share/libinput')

    def has_quirk(self, device, quirk):
        for file in self.quirks:
            if (quirk.has_match(device.name) or
                    quirk.has_match(device.usbid) or
                    quirk.has_match(device.dmi)):
                return True
        return False

    def get_quirk_value(self, device, quirk):
        if not self.has_quirk(device, quirk):
            return None

        quirk = self.lookup_quirk(device, quirk)
        if quirk.type == "boolean":
            return bool(quirk.value)
        if quirk.type == "string":
            return str(quirk.value)
        ...
A system that reads a bunch of .ini files, caches them and returns their value on demand. Those quirks are then used to adjust device behaviour at runtime.

The next building block is the "filter" code, which is the word we use for pointer acceleration. Here too we have a two-layer abstraction with an interface.


class Filter:
    def dispatch(self, x, y):
        # converts device-unit x/y into normalized units
        return self.interface.dispatch(x, y)

    # the 'accel speed' configuration value
    def set_speed(self, speed):
        return self.interface.set_speed(speed)

    # the 'accel speed' configuration value
    def get_speed(self):
        return self.speed

    ...


class FilterInterface:
    def dispatch(self, x, y):
        pass


class FilterInterfaceTouchpad(FilterInterface):
    def dispatch(self, x, y):
        ...


class FilterInterfaceTrackpoint(FilterInterface):
    def dispatch(self, x, y):
        ...


class FilterInterfaceMouse(FilterInterface):
    def dispatch(self, x, y):
        self.history.push((x, y))
        v = self.calculate_velocity()
        f = self.calculate_factor(v)
        return (x * f, y * f)

    def calculate_velocity(self):
        total = 0
        for delta in self.history:
            total += delta
        velocity = total / timestamp  # as illustration only
        return velocity

    def calculate_factor(self, v):
        # this is where the interesting bit happens,
        # let's assume we have some magic function
        f = v * 1234 / 5678
        return f
So libinput calls filter_dispatch on whatever filter is configured and passes the result on to the caller. The setup of those filters is handled in the respective evdev interface, similar to this:

class EvdevFallback:
    ...

    def init_accel(self):
        if self.udev_type == 'ID_INPUT_TRACKPOINT':
            self.filter = FilterInterfaceTrackpoint()
        elif self.udev_type == 'ID_INPUT_TOUCHPAD':
            self.filter = FilterInterfaceTouchpad()
        ...
The advantage of this system is twofold. First, the main libinput code only needs one place where we really care about which acceleration method we have. And second, the acceleration code can be compiled separately for analysis and to generate pretty graphs. See the pointer acceleration docs. Oh, and it also allows us to easily have per-device pointer acceleration methods.

Finally, we have one more building block - configuration options. They're a bit different in that they're all similar-ish but only to make switching from one to the next a bit easier.


class DeviceConfigTap:
    def set_enabled(self, enabled):
        self._enabled = enabled

    def get_enabled(self):
        return self._enabled

    def get_default(self):
        return False


class DeviceConfigCalibration:
    def set_matrix(self, matrix):
        self._matrix = matrix

    def get_matrix(self):
        return self._matrix

    def get_default(self):
        return [1, 0, 0, 0, 1, 0, 0, 0, 1]
And then the devices that need one of those slot them into the right pointer in their structs:

class EvdevFallback:
    ...

    def init_calibration(self):
        self.config_calibration = DeviceConfigCalibration()
        ...

    def handle_touch(self, x, y):
        if self.config_calibration is not None:
            matrix = self.config_calibration.get_matrix()
            x, y = matrix.multiply(x, y)

        self.context._notify_pointer_abs(x, y)

And that's basically it, those are the building blocks libinput has. The rest is detail. Lots of it, but if you understand the architecture outline above, you're most of the way there in diving into the details.

One of the features in the soon-to-be-released libinput 1.13 is location-based touch arbitration. Touch arbitration is the process of discarding touch input on a tablet device while a pen is in proximity. Historically, this was provided by the kernel wacom driver but libinput has had userspace touch arbitration for quite a while now, allowing for touch arbitration where the tablet and the touchscreen part are handled by different kernel drivers.

Basic touch arbitration is relatively simple: when a pen goes into proximity, all touches are ignored. When the pen goes out of proximity, new touches are handled again. There are some extra details (esp. where the kernel handles arbitration too) but let's ignore those for now.

With libinput 1.13 and in preparation for the Dell Canvas Dial Totem, the touch arbitration can now be limited to a portion of the screen only. On the totem (future patches, not yet merged) that portion is a square slightly larger than the tool itself. On normal tablets, that portion is a rectangle, sized so that it should encompass the user's hand and the area around the pen, but not much more. This enables users to use both pen and touch input at the same time, providing for bimanual interaction (where the GUI itself supports it, of course). We use the tilt information of the pen (where available) to guess where the user's hand will be and adjust the rectangle position accordingly.
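As a rough illustration of the idea (this is not libinput's actual code; the class, the rectangle size and the tilt-based offset below are all made up), the handling of new touches might look something like this:

# Toy sketch of location-based touch arbitration, illustrative only.
class ArbitrationArea:
    def __init__(self, width_mm=100, height_mm=70):
        self.width = width_mm
        self.height = height_mm
        self.active = False

    def pen_proximity_in(self, pen_x, pen_y, tilt_x, tilt_y):
        # Guess where the hand rests: shift the rectangle in the direction
        # the pen leans, so it covers the hand rather than the pen tip.
        self.x = pen_x + tilt_x - self.width / 2
        self.y = pen_y + max(tilt_y, 0)
        self.active = True

    def pen_proximity_out(self):
        self.active = False

    def rejects(self, touch_x, touch_y):
        # Old-style arbitration rejected every touch while the pen was in
        # proximity; here only touches inside the rectangle are discarded.
        return (self.active and
                self.x <= touch_x <= self.x + self.width and
                self.y <= touch_y <= self.y + self.height)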

There are some heuristics involved and I'm not sure we got all of them right so I encourage you to give it a try and file an issue where it doesn't behave as expected.

March 13, 2019

Arm driver timeline

The process of reverse engineering Arm GPUs has been going on for a long time, starting with Luc Verhaegen's work on the low-end Mali 2/3/400 series of GPUs based on the Arm Utgard family.
This driver has recently seen a lot of new attention and is itself progressing quickly, which means it will likely be accepted into the kernel soon.
A piece of trivia is that this GPU architecture was what Arm received when they purchased the Norwegian GPU IP vendor Falanx Microsystems.

The Mali T and G-series of GPUs are based on the Midgard and Bifrost architectures respectively, both of which are quite different from the 2/3/400 series. However the T and G-series are somewhat similar at least when …


Imagine a finite resource that you want to distribute amongst peers in a fair manner. If you know the number of peers to be n, the problem becomes trivial and you can assign every peer 1/n-th of the total. This way every peer gets the same amount, while no part of the resource stays unused. But what if the number of peers is only known retrospectively? That is, how many resources do you grant a peer if you do not know whether there are more peers or not? How do you define “fairness”? And how do you make sure as little of the resource as possible stays unused?

The fairdist algorithm provides one possible solution to this problem. It defines how many resources a new peer is assigned, considering the following properties (a toy sketch illustrating these inputs follows the list):

  1. The total amount of resources already distributed to other peers. This is also referred to by the term consumption.
  2. The number of peers that already got resources assigned.
  3. The amount of resources remaining. That is, the resources that are remaining to be distributed. This is also referred to by the term reserve.
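To make these inputs concrete, here is a toy Python sketch of a reserve-based allocator driven by exactly these three quantities. It is only an illustration of the general shape of the problem, not the fairdist allocator itself, and its guarantees are much weaker than the ones discussed below.

# Toy illustration only: every new peer receives a fixed fraction of the
# remaining reserve. The divisor (2) is arbitrary; choosing the allocator
# function so that a useful per-peer guarantee can be proven is exactly
# what the rest of this post is about.
class ToyAllocator:
    def __init__(self, total):
        self.reserve = total   # resources not yet handed out
        self.consumed = 0      # resources already distributed
        self.peers = 0         # number of peers that got resources

    def allocate(self):
        share = self.reserve / 2
        self.reserve -= share
        self.consumed += share
        self.peers += 1
        return share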

The following is a mathematical proof of the properties of the fairdist algorithm. For the reference implementation of the algorithm and information on the different applications of it, see the r-fairdist project.

Prerequisites

We define a set of symbols up front, to keep the proofs shorter. Whenever these symbols are mentioned, the following definition applies:

  • Let be a total amount of consumed resources.
  • Let be a total amount of reserved resources.
  • Let be a number of peers that consumed resources.
  • Let be a function that computes the proportion of a peer can consume, based on the number of peers that currently have resources consumed.
  • Let be a function that computes the proportion of a peer is guaranteed, based on the number of peers that currently have resources consumed.

The algorithm considers a total amount of resources, but splits it into two separate parts, the remaining reserve and the consumed part . Their sum represents the total amount that was initially available. It then declares a function , which is the resource allocator. It will later on be used to calculate how many resources of the reserve a peer can allocate: . That is, defines the proportion of the reserve a new peer gets access to. The smaller it is, the more a peer gets.

Similarly, the guarantee is used to declare a lower bound of the total resources the allocator grants a new peer. That is, while is a function applied to allocations, is a property the allocator will guarantee you. Unlike an allocation, will later on be calculated based on the total amount of resources: . Again, the function defines the proportion that is guaranteed. So the smaller is, the stronger the guarantee becomes.

Definition

The allocator is said to guarantee the limit , if there exists a function so that for all , , and :

implies both:

The idea here is to define a function which calculates how many resources a new peer can allocate. That is, considering a new peer requests resources, it will get of the reserve. The first property of this implication guarantees that this allocation is bigger than, or equal to, the guaranteed total for each peer. The guaranteed total is calculated through based on the total amount of resources (which is the consumption plus the reserve).

If you now pick an allocator and a guarantee that fulfil this definition, the idea is that this ensures you that the allocator can be used to serve resource requests from new peers, and it ensures that regardless of how many peers will request resources, each one will be guaranteed an amount equal to, or bigger than, the guarantee .

This definition requires the existence of a reserve watermark . It uses this watermark as a selector for an inductive step. That is, if the requirements of this reserve selector are true, the second implication guarantees that they are true for an infinite number of following allocations. That is, the right-hand side of the second implication matches exactly the requirement of the implication, once a single allocation was performed (i.e., a resource chunk was subtracted from the reserve and added to the total consumption, while the number of peers increased by one).

Note that if is , then the requirement of the implication is true for all and . This guarantees that there is always a situation where the allocator can actually be applied.

Lemma 1

To prove an allocator guarantees , it is sufficient to show that fulfils:

and

This lemma is used to make it easier to prove a specific allocator guarantees a specific limit. Without it, each proof of the different allocators would have to replicate it.

However, this lemma also gives a better feeling of what the different functions actually mean. For instance, it clearly shows must always be smaller than , and that by a considerable amount. If , then no would ever fulfil this requirement (remember: ). At the same time, you can see the closer and are together, the smaller gets, and as such the requirements on the reserve get harder to fulfil.

The second requirement gives you a recursive equation to find an for any allocator you pick. Hence, in combination both these requirements show you an iterative process to find and , for any guarantee you pick. However, the closer and get, the harder it becomes to solve the recursive equation.

Proof

To show this lemma is true, we must show both implications of the definition are true. As first step, we show the first implication is true, which is:

We show this by starting with the left-hand side and showing it implies the right-hand side, using the requisite of this lemma.

As second step, we need to show the second implication of the definition is true, which is:

To prove this, we start with the second requisite of this lemma and then show it implies the right-hand side of the implication, using the requisite of the implication.

Hint: The following introduction of is correct, since is by definition greater than , so neither side can be 0.

Theorem

The following allocators each guarantee the specified limit:

This theorem defines three different allocators for different guarantees. The last one provides the strongest guarantee. Both the allocation and the guarantee are quasilinear. It is thus a good fit for fair allocation schemes, while still being reasonably fast to compute.

The other two provide quadratic and exponential guarantees and are mostly listed for documentation purposes. With the quasilinear guarantee at hand, there is little reason to use the other two.

As you might notice, this theorem does not provide a solution where and become infinitesimally close. It remains open what such a solution would look like. However, the listed quasilinear solution is good enough that it is unlikely better options exist which can still be calculated in a reasonable amount of time.

Proof

We provide a function for each pair. We then substitute them in Lemma 1 and show through equivalence transformations that the assertions are true.

Proof 1: Exponential Guarantee

  • Allocator:
  • Guarantee:

Let .

Part 1:

Part 2:

Proof 2: Polynomial Guarantee

  • Allocator:
  • Guarantee:

Let .

Part 1:

Part 2:

Proof 3: Quasilinear Guarantee

  • Allocator:
  • Guarantee:

Let .

Part 1:

Part 2:

For this part, we rely on the following property:

This is true for all logarithms for all .

We now show the second requirement of the Lemma is true. However, we cannot use equivalence transformations as in the other proofs. Hence, we show it by implication.

March 08, 2019
GNOME 3.32 will very soon be released, so I thought I'd look back on a few of the things that happened with some of our content applications.

Videos
First, many thanks to Marta Bogdanowicz, Baptiste Mille-Mathias, Ekaterina Gerasimova and Andre Klapper who toiled away at updating Videos' user documentation since 2012, when it was still called “Totem”, and then again in 2014 when “Videos” appeared.

The other major change is that Videos is available, fully featured, from Flathub. It should play your Windows Movie Maker films, your circular wafers of polycarbonate plastic and aluminium, and your Devolver indie films. No more hunting codecs or libraries!

In the process, we also fixed a large number of outstanding issues, such as accommodating the app menu's planned disappearance, moving the audio/video properties tab to Nautilus proper, making the thumbnailer available as an independent module, making the MPRIS plugin work better, and loads, loads more.


Download on Flathub

Books

As Documents was removed from the core release, we felt it was time for Books to become independent. And rather than creating a new package inside a distribution, the Flathub version was updated. We also fixed a bunch of bugs, so that's cool :)
Download on Flathub

Weather

I didn't work directly on Weather, but I made some changes to libgweather which means it should be easier to contribute to its location database.

Adding new cities doesn't require adding a weather station by hand; it will just pick the closest one. Weather stations also don't need to be attached to cities either; they were usually attached to villages, sometimes hamlets!

The automatic tests are also more stringent, and test for more things, which should hopefully mean fewer bugs.

And even more Flatpaks

On Flathub, you'll also find some applications I packaged up in the last 6 months. First is the Teo Thomson emulator, then GBE+, a Game Boy emulator focused on accessories emulation, and a way to run your old Flash games offline.
March 07, 2019
There have been questions about the Fedora 30 Flicker Free Boot Change in various places. Here is a FAQ which hopefully answers most of them:

1) I get a black screen for a couple of seconds during boot?

1a) If you have an AMD or Nvidia GPU driving your screen, then this is normal. The graphics drivers for AMD and Nvidia GPUs reset the hardware when loading, which causes the display to temporarily go black. There is nothing which can be done about this.

1b) If you have a somewhat older Intel GPU (your CPU is pre-Skylake), then the i915 driver's support for skipping the mode-reset is disabled by default (for now). To fix this, add "i915.fastboot=1" to your kernel commandline. For more info on modifying the kernel commandline, see question 6.

1c) Do "ls /sys/firmware/efi/efivars" if you get a "No such file or directory" error then your system is booting in classic BIOS mode instead of UEFI mode, to fix this you need to re-install and boot the livecd/installer in UEFI mode when installing. Alternatively you can try to convert your existing install, note this is quite tricky, make backups first!

1d) Your system may be using the classic VGA BIOS during boot despite running in UEFI mode. Often you can select BIOS compatibility mode, aka the CSM setting, in your BIOS settings. If you can select this on a per-component level, set the VIDEO/VGA option to "UEFI only" or "UEFI first"; alternatively you can try completely disabling the CSM mode.

2) I get a grey-background instead of the firmware splash while Fedora is booting?

Do "ls /sys/firmware/acpi/bgrt" if you get a "No such file or directory" error then try answers 1c and 1d . If you do have a /sys/firmware/acpi/bgrt directory, but you are still getting the Fedora logo + spinner on a grey background instead of on top of the firmware-splash, please file a bug about this and drop me a mail with a link to the bug.

3) Getting rid of the vendor-logo/firmware-splash being shown while Fedora is booting?

If you don't want the firmware-splash to be used as background during boot, you can switch plymouth to the spinner theme, which is identical to the new bgrt theme except that it does not use the firmware-splash as background. To do this, execute the following command from a terminal:
"sudo plymouth-set-default-theme -R spinner"

4) Keeping the firmware-splash as background while unlocking the disk?

If you prefer this, it is possible to keep the firmware-splash as background while the diskcrypt password is shown. To do this do the following:

  1. "sudo mkdir /usr/share/plymouth/themes/mybgrt"

  2. "sudo cp /usr/share/plymouth/themes/bgrt/bgrt.plymouth /usr/share/plymouth/themes/mybgrt/mybgrt.plymouth"

  3. edit /usr/share/plymouth/themes/mybgrt/mybgrt.plymouth, change DialogClearsFirmwareBackground=true to DialogClearsFirmwareBackground=false, change DialogVerticalAlignment=.382 to DialogVerticalAlignment=.6

  4. "sudo plymouth-set-default-theme -R mybgrt"

Note that if you do this, the disk-passphrase entry dialog may be partially drawn over the vendor-logo part of the firmware-splash; if this happens, try increasing DialogVerticalAlignment to e.g. 0.7.

5) Get detailed boot progress instead of the boot-splash?

To get detailed boot progress info press ESC during boot.

6) Always get detailed boot progress instead of the boot-splash?

To always get detailed boot progress instead of the boot-splash, remove "rhgb" from your kernel commandline:

Edit /etc/default/grub and remove rhgb from GRUB_CMDLINE_LINUX and then if you are booting using UEFI (see 1c) run:
"grub2-mkconfig -o /etc/grub2-efi.cfg"
else (if you are booting using classic BIOS boot) run:
"grub2-mkconfig -o /etc/grub2.cfg".
March 04, 2019

The video

Below you can see the same scene that I recorded in January, which was rendered by Panfrost in Mesa but using Arm's kernel driver. This time, Panfrost is using a new kernel driver that is in a form close to being acceptable in the mainline kernel:

The history behind it

During the past two months Rob Herring and I have been working on a new driver for Midgard and Bifrost GPUs that could be accepted into mainline.

Arm already maintains a driver out of tree with an acceptable open source license, but it doesn't implement the DRM ABI and several design considerations make it unsuitable for inclusion in mainline Linux.

The absence of a driver in mainline prevents users from keeping their kernels up-to-date and hurts integration with other parts of the free software stack. It also discourages SoC and BSP vendors from submitting their code to mainline, and hurts their ability to track mainline closely.

Besides the code of the driver itself, there's one more condition for mainline inclusion: an open source implementation of the userspace library needs to exist, so other kernel contributors can help verifying, debugging and maintaining the kernel driver. It's an enormous pile of difficult work to reverse engineer the inner workings of a GPU and then implement a compiler and command submission infrastructure, so big thanks to Alyssa Rosenzweig for leading that effort.

Upstream status

Most of the Panfrost code is already part of mainline Mesa, with the code that directly interacts with the new DRM driver being in the review stage. The currently targeted GPUs are the T760 and T860, with the RK3399 being the SoC most often used for testing.

The kernel driver is being developed in the open and, though we are trying to follow the best practices displayed by other DRM drivers, there are a number of tasks that need to be done before we consider it ready for submission.

The work ahead

In the kernel:
- Make MMU code more complete for correctness and better performance
- Handle errors and hangs and correctly reset the GPU
- Improve fence handling
- Test with compute shaders (to check completeness of the ABI)
- Lots of cleanups and bug fixing!

In Mesa:
- Get GNOME Shell working
- Get Chromium working with accelerated WebGL
- Get all of glmark2 working
- Get a decent subset of dEQP passing and use it in CI
- Keep refactoring the code
- Support more hardware

Get the code

The exact bits used for the demo recorded above are in various stages of being upstreamed to the various upstreams, but here they are in branches for easier reproduction:


February 28, 2019
For those of you who want to give the new Flicker Free Boot enhancements for Fedora 30 a try on Fedora 29, this is possible now since the latest F29 bugfix update for plymouth also includes the new theme used in Fedora 30.

If you want to give this a try, add "plymouth.splash_delay=0 i915.fastboot=1" to your kernel commandline:

  1. Edit /etc/default/grub, add "plymouth.splash_delay=0 i915.fastboot=1" to GRUB_CMDLINE_LINUX

  2. Run "sudo grub2-mkconfig -o /etc/grub2-efi.cfg"

Note that i915.fastboot=1 causes the backlight to not work on Haswell CPUs (e.g. i5-42xx CPUs); this is fixed in the 5.0 kernels which are currently available in rawhide/F30.

Run the following commands to get the updated plymouth and the new theme and to select the new theme:

  1. "sudo dnf update plymouth*"

  2. "sudo dnf install plymouth-theme-spinner"

  3. "sudo cp /usr/share/pixmaps/fedora-gdm-logo.png /usr/share/plymouth/themes/spinner/watermark.png"

  4. "sudo plymouth-set-default-theme -R bgrt"

Now on the next boot / installing of offline-updates you should get the new theme.
February 26, 2019

Intro slide

Downloads

If you're curious about the slides, you can download the PDF or the ODP.

Thanks

This post has been a part of work undertaken by my employer Collabora.

I would like to thank the wonderful organizers of Embedded World for hosting a great event.

February 25, 2019

This is the first report about Igalia’s activities around Computer Graphics, specifically 3D graphics and, in particular, the Mesa3D Graphics Library (Mesa), focusing on the year 2018.

GL_ARB_gl_spirv and GL_ARB_spirv_extensions

GL_ARB_gl_spirv is an OpenGL extension whose purpose is to enable an OpenGL program to consume SPIR-V shaders. In the case of GL_ARB_spirv_extensions, it provides a mechanism by which an OpenGL implementation would be able to announce which particular SPIR-V extensions it supports, which is a nice complement to GL_ARB_gl_spirv.

As both extensions, GL_ARB_gl_spirv and GL_ARB_spirv_extensions, are core functionality in OpenGL 4.6, the drivers need to provide them in order to be compliant with that version.

Although Igalia picked up the already-started implementation of these extensions in Mesa back in 2017, 2018 was the year in which we put in a great deal of work to provide the needed push to get all the remaining bits in place. Much of this effort provides general support to all the drivers under the Mesa umbrella but, in particular, Igalia implemented the backend code for Intel‘s i965 driver (gen7+). Assuming that the review process for the remaining patches goes without important bumps, it is expected that the whole implementation will land in Mesa at the beginning of 2019.

Throughout the year, Alejandro Piñeiro gave status updates of the ongoing work through his talks at FOSDEM and XDC 2018. This is a video of the latter:

ETC2/EAC

The ETC and EAC formats are lossy compressed texture formats used mostly in embedded devices. OpenGL implementations of version 4.3 and upwards, and OpenGL ES implementations of version 3.0 and upwards, must support them in order to be conformant with the standard.

Most modern GPUs are able to work directly with the ETC2/EAC formats. Implementations for older GPUs that don’t have that support but want to be conformant with the latest versions of the specs need to provide that functionality through the software parts of the driver.

During 2018, Igalia implemented the missing bits to support GL_OES_copy_image in Intel’s i965 for gen7+, while gen8+ already complied through its HW support. As we were writing this entry, the work finally landed.

VK_KHR_16bit_storage

Igalia finished the work to provide support for the Vulkan extension VK_KHR_16bit_storage into Intel’s Anvil driver.

This extension allows the use of 16-bit types (half floats, 16-bit ints, and 16-bit uints) in push constant blocks and buffers (shader storage buffer objects). This feature can help to reduce the memory bandwidth for Uniform and Storage Buffer data accessed from the shaders and/or optimize Push Constant space, of which there are only a few bytes available, making it a precious shader resource.

shaderInt16

Igalia added Vulkan’s optional feature shaderInt16 to Intel’s Anvil driver. This new functionality provides the means to operate with 16-bit integers inside a shader which, ideally, would lead to better performance when you don’t need a full 32-bit range. However, not all HW platforms may have native support, still needing to run in 32-bit and, hence, not benefiting from this feature. Such is the case for operations associated with integer division in the case of Intel platforms.

shaderInt16 complements the functionality provided by the VK_KHR_16bit_storage extension.

SPV_KHR_8bit_storage and VK_KHR_8bit_storage

SPV_KHR_8bit_storage is a SPIR-V extension that complements the VK_KHR_8bit_storage Vulkan extension to allow the use of 8-bit types in uniform and storage buffers, and push constant blocks. Similarly to the VK_KHR_16bit_storage extension, this feature can help to reduce the needed memory bandwidth.

Igalia implemented its support into Intel’s Anvil driver.

VK_KHR_shader_float16_int8

Igalia implemented the support for VK_KHR_shader_float16_int8 into Intel’s Anvil driver. This is an extension that enables Vulkan to consume SPIR-V shaders that use Float16 and Int8 types in arithmetic operations. It extends the functionality included with VK_KHR_16bit_storage and VK_KHR_8bit_storage.

In theory, applications that do not need the range and precision of regular 32-bit floating point and integers can use these new types to improve performance. Additionally, its implementation is mostly API agnostic, so most of the work we did should also help to have a proper mediump implementation for GLSL ES shaders in the future.

The review process for the implementation is still ongoing and is on its way to land in Mesa.

VK_KHR_shader_float_controls

VK_KHR_shader_float_controls is a Vulkan extension which allows applications to query and override the implementation’s default floating point behavior for rounding modes, denormals, signed zero and infinity.

Igalia has coded its support into Intel’s Anvil driver and it is currently under review before being merged into Mesa.

VkRunner

VkRunner is a Vulkan shader tester based on shader_runner in Piglit. Its goal is to make it possible to use test scripts as similar as possible to Piglit’s shader_test format.

Igalia initially created VkRunner as a tool to get more test coverage during the implementation of GL_ARB_gl_spirv. Soon, it was clear that it was useful way beyond the implementation of this specific extension, serving as a generic way of testing SPIR-V shaders.

Since then, VkRunner has been enabled as an external dependency to run new tests added to the Piglit and VK-GL-CTS suites.

Neil Roberts introduced VkRunner at XDC 2018. This is his talk:

freedreno

During 2018, Igalia has also started contributing to the freedreno Mesa driver for Qualcomm GPUs. Among the work done, we have tackled multiple bugs identified through the usual testing suites used in the graphic drivers development: Piglit and VK-GL-CTS.

Khronos Conformance

The Khronos conformance program is intended to ensure that products that implement Khronos standards (such as OpenGL or Vulkan drivers) do what they are supposed to do and they do it consistently across implementations from the same or different vendors.

This is achieved by producing an extensive test suite, the Conformance Test Suite (VK-GL-CTS or CTS for short), which aims to verify that the semantics of the standard are properly implemented by as many vendors as possible.

In 2018, Igalia has continued its work ensuring that the Intel Mesa drivers for both Vulkan and OpenGL are conformant. This work included reviewing and testing patches submitted for inclusion in VK-GL-CTS and continuously checking that the drivers passed the tests. When failures were encountered we provided patches to correct the problem either in the tests or in the drivers, depending on the outcome of our analysis or, even, brought a discussion forward when the source of the problem was incomplete, ambiguous or incorrect spec language.

The most important result out of this significant dedication has been successfully passing conformance applications.

OpenGL 4.6

Igalia helped make Intel’s i965 driver conformant with OpenGL 4.6 since day zero. This was a significant achievement since, besides Intel Mesa, only nVIDIA managed to do this too.

Igalia specifically contributed to achieve the OpenGL 4.6 milestone providing the GL_ARB_gl_spirv implementation.

Vulkan 1.1

Igalia also helped to make Intel’s Anvil driver conformant with Vulkan 1.1 since day zero, too.

Igalia specifically contributed to achieve the Vulkan 1.1 milestone providing the VK_KHR_16bit_storage implementation.

Mesa Releases

Igalia continued the work it was already carrying out in Mesa’s Release Team throughout 2018. This effort involved a continuous dedication to track the general status of Mesa against the usual test suites and benchmarks, but also to react quickly to detected regressions, coordinating with the Mesa developers and the distribution packagers.

The work was most visible in the multiple bugfix releases, as well as in the branching for and creation of a feature release.

CI

Continuous Integration is a must in any serious SW project. In the case of API implementations it is even critical since there are many important variables that need to be controlled to avoid regressions and track the progress when including new features: agnostic tests that can be used by different implementations, different OS platforms, CPU architectures and, of course, different GPU architectures and generations.

Igalia has kept a sustained effort to keep Mesa (and Piglit) CI integrations in good health, with an eye on the reported regressions to act immediately upon them. This has been a key tool for our work around Mesa releases, and the experience allowed us to push the initial proposal for a new CI integration when the FreeDesktop projects decided to start their migration to GitLab.

This work, along with the one done with the Mesa releases, led to a shared presentation, given by Juan Antonio Suárez during XDC 2018. This is the video of the talk:

XDC 2018

2018 was the year that saw A Coruña hosting the X.Org Developer’s Conference (XDC) and Igalia as Platinum Sponsor.

The conference was organized by GPUL (Galician Linux User and Developer Group) together with University of A Coruña, Igalia and, of course, the X.Org Foundation.

Since A Coruña is the town in which the company originated and where we have our headquarters, Igalia had a key role in the organization, which was greatly benefited by our vast experience running events. Moreover, several Igalians joined the conference crew and, as mentioned above, we delivered talks around GL_ARB_gl_spirv, VkRunner, and Mesa releases and CI testing.

The feedback from the attendees was very rewarding and we believe the conference was a great event. Here you can see the Closing Session speech given by Samuel Iglesias:

Other activities

Conferences

As usual, Igalia was present in many graphics related conferences during the year:

New Igalians in the team

Igalia’s graphics team kept growing. Two new developers joined us in 2018:

  • Hyunjun Ko is an experienced Igalian with a strong background in multimedia, specifically GStreamer and Intel’s VAAPI. He is now contributing his impressive expertise to our Graphics team.
  • Arcady Goldmints-Orlov is the latest addition to the team. His previous expertise as a graphics developer around nVIDIA GPUs fits perfectly with the kind of work we are currently pushing at Igalia.

Conclusion

Thank you for reading this blog post and we look forward to more work on graphics in 2019!

Igalia

February 21, 2019
Fedora 30 now contains all the changes needed for a fully Flicker Free Boot. Last week a new version of plymouth landed which implements the new theme for this and also includes a much improved offline-updates experience, following this design.

At boot the display will seamlessly transition from the firmware boot-splash into the new plymouth theme, which uses the firmware boot-splash as background:



If you are using full-disk encryption then the unlock dialog will look like this:



Last, but not least the new plymouth theme looks like this in offline-updates mode:



ATM the texts in the offline-updates theme are not translated. They are rendered using pango + cairo, so we have the capability to make this fully translatable into all languages; I just need to add gettext support to plymouth for this. I plan to do that next week.

This all assumes that you are booting your machine with UEFI and your firmware supports the BGRT extension (which almost all firmware does). Otherwise you will get a dark-grey background instead of the firmware boot-splash.

If you are running rawhide and are seeing a totally different boot-theme then you likely have changed your plymouth theme away from the "charge" default at one point in time; and in that case you are not automatically updated to the new plymouth theme. To switch to the new theme run:

sudo plymouth-set-default-theme -R bgrt

If you do not like the firmware-splash being used as background, you can use the new theme on a dark-grey background instead by running:

sudo plymouth-set-default-theme -R spinner
February 18, 2019

I attended FOSDEM again this year thanks to funding from Igalia. This time I gave a talk about VkRunner in the graphics dev room. It’s now available on Igalia’s YouTube channel below:

I thought this might be a good opportunity to give a small status update of what has happened since my last blog post nearly a year ago.

Test suite integration

The biggest news is that VkRunner is now integrated into Khronos’ Vulkan CTS test suite and Mesa’s Piglit test suite. This means that if you work on a feature or a bugfix in your Vulkan driver and you want to make sure it doesn’t get regressed, it’s now really easy to add a VkRunner test for it and have it collected in one of these test suites. For Piglit all that is needed is to give the test script a .vk_shader_test extension and drop it anywhere under the tests/vulkan folder and it will automatically be picked up by the Piglit framework. As an added bonus, these tests are also run automatically on Intel’s CI system, so if your test is related to i965 in Mesa you can be sure it will not be regressed.

On the Khronos CTS side the integration is currently a little less simple. Along with help from Samuel Iglesias, we have merged a branch into master that lays the groundwork for adding VkRunner tests. Currently there are only proof-of-concept tests to show how the tests could work. Adding more tests still requires tweaking the C++ code so it’s not quite as simple as we might hope.

API

When VkRunner is built, it now also builds a static library containing a public API. This can be used to integrate VkRunner into a larger test suite. Indeed, the Khronos CTS integration takes advantage of this to execute the tests using the VkDevice created by the test suite itself. This also means it can execute multiple tests quickly without having to fork an external process.

The API is intended to be very high-level and is as close as possible to just having simple functions to ask VkRunner to execute a test script and return an enum reporting whether the test succeeded or not. There is an example of its usage in the README.

Precompiled shader scripts

One of the concerns raised when integrating VkRunner into CTS is that it’s not ideal to have to run glslang as an external process in order to compile the shaders in the scripts to SPIR-V. To work around this, I added the ability to have scripts with binary shaders. In this case the 32-bit integer numbers of the compiled SPIR-V are just listed in ASCII in the shader test instead of the GLSL source. Of course writing this by hand would be a pain, so the VkRunner repo includes a Python script to precompile a bunch of shaders in a batch. This can be really useful to run the tests on an embedded device where installing glslang isn’t practical.

However, in the end for the CTS integration we took a different approach. The CTS suite already has a mechanism to precompile all of the shaders for all tests. We wanted to take advantage of this also when compiling the shaders from VkRunner tests. To make this work, Samuel added some functions to the VkRunner API to query the GLSL in a VkRunner shader script and then replace them with binary equivalents. That way the CTS suite can use these functions to replace the shaders with its cached compiled versions.

UBOs, SSBOs and compute shaders

One of the biggest missing features mentioned in my last post was UBO and SSBO support. This has now been fixed with full support for setting values in UBOs and SSBOs and also probing the results of writing to SSBOs. Probing SSBOs is particularly useful alongside another added feature: compute shaders. Thanks to this we can run our shaders as compute shaders to calculate some results into an SSBO and probe the buffer to see whether it worked correctly. Here is an example script to show how that might look:

[compute shader]
#version 450

/* UBO input containing an array of vec3s */
layout(binding = 0) uniform inputs {
        vec3 input_values[4];
};

/* A matrix to apply to these values. This is stored in a push
 * constant. */
layout(push_constant) uniform transforms {
        mat3 transform;
};

/* An SSBO to store the results */
layout(binding = 1) buffer outputs {
        vec3 output_values[];
};

void
main()
{
        uint i = gl_WorkGroupID.x;

        /* Transform one of the inputs */
        output_values[i] = transform * input_values[i];
}

[test]
# Set some input values in the UBO
ubo 0 subdata vec3 0 \
  3 4 5 \
  1 2 3 \
  1.2 3.4 5.6 \
  42 11 9

# Create the SSBO
ssbo 1 1024

# Store a matrix uniform to swap the x and y
# components of the inputs
push mat3 0 \
  0 1 0 \
  1 0 0 \
  0 0 1

# Run the compute shader with one instance
# for each input
compute 4 1 1

# Check that we got the expected results in the SSBO
probe ssbo vec3 1 0 ~= \
  4 3 5 \
  2 1 3 \
  3.4 1.2 5.6 \
  11 42 9

Extensions in the requirements section

The requirements section can now contain the name of any extension. If this is done then VkRunner will check for the availability of the extension when creating the device and enable it. Otherwise it will report that the test was skipped. A lot of the Vulkan extensions also add an extended features struct to be used when creating the device. These features can also be queried and enabled for extensions that VkRunner knows about simply by listing the name of the feature in that struct. For example if shaderFloat16 is listed in the requirements section, VkRunner will check for the VK_KHR_shader_float16_int8 extension and the shaderFloat16 feature within its extended feature struct. This makes it really easy to test optional features.
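
As a sketch (assuming the usual [require] section header of a shader test script), requesting that extension and feature from the example might look like this:

[require]
VK_KHR_shader_float16_int8
shaderFloat16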

Cross-platform support

I spent a fair bit of time making sure VkRunner works on Windows including compiling with Visual Studio. The build files have been converted to CMake which makes building on Windows even easier. It also compiles for Android thanks to patches from Jaebaek Seo. The repo contains Android build files to build the library and the vkrunner executable. This can be run directly on a device using adb.
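
Nothing here is specific to VkRunner, but for reference the usual out-of-tree CMake invocation would be something like:

$ mkdir build && cd build
$ cmake ..
$ cmake --build .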

User interface

There is a branch containing the beginnings of a user interface for editing VkRunner scripts. It presents an editor widget via GTK and continuously runs the test script in the background as you are editing it. It then displays the results in an image and reports any errors in a text field. The test is run in a separate process so that if it crashes it doesn’t bring down the user interface. I’m not sure whether it makes sense to merge this branch into master, but in the meantime it can be a convenient way to fiddle with a test when it fails and it’s not obvious why.

And more…

Lots of other work has been going on in the background. The best way to get to more details on what VkRunner can do is to take a look at the README. This has been kept up-to-date as the source of documentation for writing scripts.

February 14, 2019

In this blog post, I'll explain how to update systemd's hwdb for a new device-specific entry. I'll focus on input devices, as usual.

What is the hwdb and why do I need to update it?

The hwdb is a binary database sitting at /etc/udev/hwdb.bin. It is usually used to apply udev properties to specific devices; those properties are then picked up by other processes (udev builtins, libinput, ...) to apply device-specific behaviours. So you'll need to update the hwdb if you need a specific behaviour from the device.

One of the use-cases I commonly deal with is that some touchpad announces wrong axis ranges or resolutions. With the correct hwdb entry (see the example later) udev can correct these at device initialisation time and every process sees the right axis ranges.

The database is compiled from the various .hwdb files you have sitting on your system, usually in /etc/udev/hwdb.d and /usr/lib/udev/hwdb.d. The terser format of the hwdb files makes them easier to update than, say, writing a udev rule to set those properties.

The full process goes something like this:

  • The various .hwdb files are installed or modified
  • The hwdb.bin file is generated from the .hwdb files
  • A udev rule triggers the udev hwdb builtin. If a match occurs, the builtin prints the to-be properties, and udev captures the output and applies it as udev properties to the device
  • Some other process (often a different udev builtin) reads the udev property value and does something.
On its own, the hwdb is merely a lookup tool though, it does not modify devices. Think of it as a virtual filing cabinet, something will need to look at it, otherwise it's just dead weight.

An example for such a udev rule from 60-evdev.rules contains:


IMPORT{builtin}="hwdb --subsystem=input --lookup-prefix=evdev:", \
RUN{builtin}+="keyboard", GOTO="evdev_end"
The IMPORT statement translates as "look up the hwdb, import the properties". The RUN statement runs the "keyboard" builtin which may change the device based on the various udev properties now set. The GOTO statement skips the rest of the file.

So again, on its own the hwdb doesn't do anything, it merely prints to-be udev properties to stdout, udev captures those and applies them to the device. And then those properties need to be processed by some other process to actually apply changes.

hwdb file format

The basic format of each hwdb file contains two types of entries, match lines and property assignments (indented by one space). The match line defines which device it is applied to.

For example, take this entry from 60-evdev.hwdb:


# Lenovo X230 series
evdev:name:SynPS/2 Synaptics TouchPad:dmi:*svnLENOVO*:pn*ThinkPad*X230*
EVDEV_ABS_01=::100
EVDEV_ABS_36=::100
The match line is the one starting with "evdev", the other two lines are property assignments. Property values are strings, any interpretation to numeric values or others is to be done in the process that requires those properties. Noteworthy here: the hwdb can overwrite previously set properties, but it cannot unset them.

The match line is not specified by the hwdb beyond "it's a glob". The format to use is defined by the udev rule that invokes the hwdb builtin. Usually the format is:


someprefix:search criteria:
For example, the udev rule that applies for the match above is this one in 60-evdev.rules:

KERNELS=="input*", \
IMPORT{builtin}="hwdb 'evdev:name:$attr{name}:$attr{[dmi/id]modalias}'", \
RUN{builtin}+="keyboard", GOTO="evdev_end"
What does this rule do? $attr entries get filled in by udev with the sysfs attributes. So on your local device, the actual lookup key will end up looking roughly like this:

evdev:name:Some Device Name:dmi:bvnWhatever:bvR112355:bd02/01/2018:...
If that string matches the glob from the hwdb, you have a match.

Attentive readers will have noticed that the two entries from 60-evdev.rules I posted here differ. You can have multiple match formats in the same hwdb file. The hwdb doesn't care, it's just a matching system.

We keep the hwdb files matching the udev rules names for ease of maintenance so 60-evdev.rules keeps the hwdb files in 60-evdev.hwdb and so on. But this is just for us puny humans, the hwdb will parse all files it finds into one database. If you have a hwdb entry in my-local-overrides.hwdb it will be matched. The file-specific prefixes are just there to not accidentally match against an unrelated entry.

Applying hwdb updates

The hwdb is a compiled format, so the first thing to do after any changes is to run


$ systemd-hwdb update
This command compiles the files down to the binary hwdb that is actually used by udev. Without that update, none of your changes will take effect.

The second thing is: you need to trigger the udev rules for the device you want to modify. Either you do this by physically unplugging and re-plugging the device or by running


$ udevadm trigger
or, better, trigger only the device you care about to avoid accidental side-effects:

$ udevadm trigger /sys/class/input/eventXYZ
In case you also modified the udev rules you should re-load those too. So the full quartet of commands after a hwdb update is:

$ systemd-hwdb update
$ udevadm control --reload-rules
$ udevadm trigger
$ udevadm info /sys/class/input/eventXYZ
That udevadm info command lists all assigned properties, these should now include the modified entries.

Adding new entries

Now let's get down to what you actually want to do, adding a new entry to the hwdb. And this is where it also gets tricky to have a generic guide because every hwdb file has its own custom match rules.

The best approach is to open the .hwdb files and the matching .rules file and figure out what the match formats are and which one is best. For USB devices there's usually a match format that uses the vendor and product ID. For built-in devices like touchpads and keyboards there's usually a dmi-based match format (see /sys/class/dmi/id/modalias). In most cases, you can just take an existing entry and copy and modify it.
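
For illustration, a hypothetical entry for a USB mouse might look like the following. The vendor ID, product ID and name here are made up, and the exact match prefix and property depend on the .hwdb file and its accompanying .rules file, so check those first:

# Hypothetical USB mouse, vendor ID 046d, product ID c077
mouse:usb:v046dpc077:name:Example USB Optical Mouse:
 MOUSE_DPI=1000@125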

My recommendation is: add an extra property that makes it easy to verify the new entry is applied. For example do this:


# Lenovo X230 series
evdev:name:SynPS/2 Synaptics TouchPad:dmi:*svnLENOVO*:pn*ThinkPad*X230*
EVDEV_ABS_01=::100
EVDEV_ABS_36=::100
FOO=1
Now run the update commands from above. If FOO=1 doesn't show up, then you know it's the hwdb entry that's not yet correct. If FOO=1 does show up in the udevadm info output, then you know the hwdb matches correctly and any issues will be in the next layer.

Increase the value with every change so you can tell whether the most recent change is applied. And before you submit a pull request, remove the FOO entry.

Oh, and once it applies correctly, I recommend restarting the system to make sure everything is in order on a freshly booted system.

Troubleshooting

The reason for adding hwdb entries is always because we want the system to handle a device in a custom way. But it's hard to figure out what's wrong when something doesn't work (though 90% of the time it's a typo in the hwdb match).

In almost all cases, the debugging sequence is the following:

  • does the FOO property show up?
  • did you run systemd-hwdb update?
  • did you run udevadm trigger?
  • did you restart the process that requires the new udev property?
  • is that process new enough to have support for that property?
If the answer to all these is "yes" and it still doesn't work, you may have found a bug. But 99% of the time, at least one of those is a sound "no. oops.".

Your hwdb match may run into issues with some 'special' characters. If your device has e.g. an ® in its device name (some Microsoft devices have this), a bug in systemd caused the match to fail. That bug is fixed now but until it's available in your distribution, replace the character with an asterisk ('*') in your match line.

Greybeards who have been around since before 2014 (systemd v219) may remember a different tool to update the hwdb: udevadm hwdb --update. This tool still exists, but it does not have the exact same behaviour as systemd-hwdb update. I won't go into details but the hwdb generated by the udevadm tool can provide unexpected matches if you have multiple matches with globs for the same device. A more generic glob can take precedence over a specific glob and so on. It's a rare and niche case and fixed since systemd v233 but the udevadm behaviour remained the same for backwards-compatibility.

Happy updating and don't forget to add Database Administrator to your CV when your PR gets merged.

February 05, 2019
Video and slides for my talk in the LLVM devroom on TableGen are now available here.

Now I only need the time and energy to continue my blog series on the topic...
February 01, 2019
I've just landed a big milestone for the Flicker Free Boot work I'm doing for Fedora 30. Starting with today's rawhide kernel build, version 5.0.0-0.rc4.git3.1, the fastboot option for the i915 Intel display driver is enabled by default on systems with a Skylake CPU/iGPU and newer, as well as on Valleyview and Cherryview (Bay- and Cherry-Trail) systems.

This means that the last modeset / flicker during boot of UEFI systems using the integrated Intel GPU for display output is now gone.
January 31, 2019

The recommended way to link UEFI applications on Linux was until now through GNU-EFI, a toolchain provided by the GNU Project that bridges from the ELF world into COFF/PE32+. But why don’t we compile directly to native UEFI? A short dive into the past of GNU Toolchains, its remnants, and a surprisingly simple way out.

The Linux World (and many UNIX Derivatives for that matter) is modeled around ELF. With statically linked languages becoming more prevalent, the impact of the ABI diminishes, but it still defines properties far beyond just how to call functions. The ABI your system uses also affects how compiler and linker interact, how binaries export information (especially symbols), and what features application developers can make use of. We have become used to ELF, and require its properties in places we didn’t expect.

UEFI does not use ELF. For all that matters, UEFI follows Microsoft Windows. This means, UEFI uses COFF/PE32+ (or short PE+). If we compile binaries for UEFI, they must target PE+. And the GNU Compiler Collection can do this… somewhat.

Conceptually, GCC supports many languages, ABIs, targets, and architectures in a single code-base. Technically, though, every compiled instance of GCC compiles from one language to one target. Your compiler that takes C and produces x86-64 is actually specific to x86_64-pc-linux-gnu. You cannot tweak it to compile UEFI binaries. Instead, you need another instance of GCC, one that takes C and produces x86_64-windows-msvc. You probably know this combination under the name MinGW.

But this is not what GNU went for. Instead, for reasons that puzzle me to this day, the GNU project decided against using its own software and instead produced something named GNU-EFI. The goal of GNU-EFI is to allow writing UEFI applications using the common GNU Toolchain (meaning you compile ELF binaries for Linux). They achieve this by linking in a PE+ stub, which at runtime performs the required relocations and parameter translations, and jumps into the ELF application. You effectively write a free-standing Linux application, add a wrapping layer and then execute it on UEFI. It works, but is needlessly complex.

Is this really the best way to compile for UEFI? Not anymore!

The LLVM toolchain (clang compiler plus lld linker) combines all supported targets in a single toolchain, offering a target selector --target to let LLVM know what to compile for. So as long as you have clang and lld installed, you can compile native UEFI binaries just like normal local compilation:

# Normal local compile+link
$ clang \
        $CFLAGS \
        -o OBJECT \
        -c [SOURCES…]
$ clang \
        $LDFLAGS \
        -o BINARY \
        [OBJECTS…]

To make this compile for UEFI targets, you simply set:

CFLAGS+= \
        --target x86_64-unknown-windows \
        -ffreestanding \
        -fshort-wchar \
        -mno-red-zone

LDFLAGS+= \
        --target x86_64-unknown-windows \
        -nostdlib \
        -Wl,-entry:efi_main \
        -Wl,-subsystem:efi_application \
        -fuse-ld=lld-link

The two special things are --target <TRIPLE> and -fuse-ld=<LINKER>. The former instructs both compiler and linker to produce COFF/PE32+ objects compatible with the Microsoft Windows platform (which matches the UEFI platform). The latter selects the linker to use. Mind you, using the default linker will very likely fail (the default being ld or ld-gold). Currently, you either have to use lld-link (the PE+ backend of the LLVM linker), or you need a version of GNU ld compiled for a PE+ toolchain. I recommend LLVM lld.

Voilà! No need for GNU-EFI, no need to mess with separated toolchains. With LLVM you get all this through your local toolchain.

If you use Meson Build, the c-efi project even provides an example cross-file. A native meson C project can then be compiled for UEFI by nothing more than passing --cross-file x86_64-unknown-uefi to meson. See its sources for details.
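
For a rough idea, such a cross-file is essentially the flags above expressed in meson's cross-file syntax. The following is only a sketch under that assumption; the authoritative version ships with c-efi:

[binaries]
c = 'clang'

[properties]
c_args = ['--target', 'x86_64-unknown-windows',
          '-ffreestanding', '-fshort-wchar', '-mno-red-zone']
c_link_args = ['--target', 'x86_64-unknown-windows',
               '-nostdlib', '-Wl,-entry:efi_main',
               '-Wl,-subsystem:efi_application', '-fuse-ld=lld-link']

[host_machine]
system = 'windows'
cpu_family = 'x86_64'
cpu = 'x86_64'
endian = 'little'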

The c-efi project also provides the protocol constants and definitions from the UEFI specification, so you don’t have to extract them yourself.

January 30, 2019

One of the greatest features from piglit was the easy development of OpenGL tests based on GLSL shaders plus some simple commands through shader_runner command. I even wrote about it.

However, the Vulkan ecosystem was missing a tool like that for SPIR-V shader tests… until last year!


VkRunner is a tool written by Neil Roberts, heavily inspired by shader_runner. VkRunner was the result of Igalia’s work to enable the ARB_gl_spirv extension for Intel’s i965 driver in Mesa, where there was a need to test the driver’s code against a good number of shaders to make sure it worked correctly.

VkRunner uses a script language to define the requirements needed to run the test, such as the needed extensions and features, the shaders to be run and a series of commands to run them. It then parses everything and executes the equivalent Vulkan commands under the hood, like shader_runner did for OpenGL in piglit.

This is an example of what a VkRunner test looks like:

[compute shader]
#version 450

layout(std140, push_constant) uniform push_constants {
        float in_value;
};

layout(std140, binding = 0) buffer ssbo {
        float out_value;
};

void
main()
{
        out_value = sqrt(in_value);
}

[test]
# Allocate an ssbo big enough for a float at binding 0
ssbo 0 4

# Set the push constant as an input value
uniform float 0 4

compute 1 1 1

# Probe that we got the expected value
tolerance 0.00006% 0.00006% 0.00006% 0.00006%
probe ssbo float 0 0 ~= 2

The end of 2018 was great for VkRunner! First, the tool was integrated into piglit so we can now use it in this amazing open-source testing suite for 3D graphics drivers. Soon after, it was integrated into Khronos Group’s Vulkan and OpenGL Conformance Test Suite (see commit), which will help contributors to easily write SPIR-V tests on Vulkan.

If you want to learn more about VkRunner, apart from browsing the repository, Neil wrote a nice blogpost explaining the tool basics, gave a lightning talk at XDC 2018 (slides) in A Coruña and now, he is going to give a talk in the graphics devroom at FOSDEM 2019! You can follow his FOSDEM talk on Saturday via live-stream (or see the recording afterwards) in case you are not going to FOSDEM this year :-)

Igalia

FOSDEM 2019

January 07, 2019

The video

Below you can see glmark2 running as a Wayland client in Weston, on a NanoPC-T4 (so an RK3399 SoC with a Mali T-864 GPU). It's much smoother than in the video, which is limited to 5 FPS by the webcam.


Weston is running with the DRM backend and the GL renderer.

The history behind it


For more than 10 years, at Collabora we have been happily helping our customers to make the most of their hardware by running free software.

One area some of us have specially enjoyed working on has been open drivers for GPUs, which for a long time have been considered the next frontier in the quest to have a full software platform that companies and individuals can understand, improve and fix without having to ask for permission first.

Something that has saddened me a bit has been our reduced ability to help those customers that for one reason or another had chosen a hardware platform with ARM Mali GPUs, as no open driver was available for those.

While our biggest customers were able to get a high level of support from the vendors in order to have the Mali graphics stack well integrated with the rest of their product, the smaller ones had a much harder time in achieving that level of integration, which manifested in reduced performance, increased power consumption and slipped milestones.

That's why we have been following with great interest the several efforts that aimed to come up with an open driver for GPUs in the Mali family, one similar to those already existing for Qualcomm, NVIDIA and Vivante.

At XDC last year we had the chance of meeting the people involved in the latest effort to develop such a driver: Panfrost. And in the months that followed I made some room in my backlog to come up with a plan to give the effort a boost.

At that point, Panfrost was only able to get its bits on the screen via an elaborate hack that involved copying each frame into an X11 SHM buffer, which besides making the setup of the development environment much more cumbersome, invalidated any performance analysis. It also limited testing to demos such as glmark2.

Due to my previous work on Etnaviv I was already familiar with the abstractions in Mesa for setups in which the display of buffers is performed by a device different from the GPU so it was just a matter of seeing how we could get the kernel driver for the Mali GPU to play well with the rest of the stack.

So during the past month or so I have come up with a proper implementation of the winsys abstraction that makes use of ARM's kernel driver. The result is that now developers have a better base on which to work on the rendering side of things.

By properly creating, exporting and importing buffers, we can now run applications on GBM, from demos such as kmscube and glmark2 to compositors such as Weston, but also big applications such as Kodi. We are also supporting zero-copy display of GPU-rendered clients in Weston.

This should make it much easier to work on the rendering side of things, and work on a proper DRM driver in the mainline kernel can proceed in parallel.

For those interested in joining the effort, Alyssa has graciously taken the time to update the instructions to build and test Panfrost. You can join us at #panfrost on Freenode and can start sending merge requests to Gitlab.

Thanks to Collabora for sponsoring this work and to Alyssa Rosenzweig and Lyude Paul for their previous work and for answering my questions.
December 15, 2018

This time we're digging into HID - Human Interface Devices and more specifically the protocol your mouse, touchpad, joystick, keyboard, etc. use to talk to your computer.

Remember the good old days where you had to install a custom driver for every input device? Remember when PS/2 (the protocol) had to be extended to accommodate mouse wheels, and then again for five-button mice? And you had to select the right protocol to make it work. Yeah, me neither, I tend to suppress those memories because the world is awful enough as it is.

As users we generally like devices to work out of the box. Hardware manufacturers generally like to add bits and bobs because otherwise who would buy that new device when last year's device looks identical. This difference in needs can only be solved by one superhero: Committee-man, with the superpower to survive endless meetings and get RFCs approved.

Many many moons ago, when USB itself was in its infancy, Committee man and his sidekick Caffeine boy got the USB consortium to agree on a standard for input devices that is so self-descriptive that operating systems (Win95!) can write one driver that can handle this year's device, and next year's, and so on. No need to install extra drivers, your device will just work out of the box. And so HID was born. This may only be an approximate summary of history.

Originally HID was designed to work over USB. But just like Shrek the technology world is obsessed with layers so these days HID works over different transport layers. HID over USB is what your mouse uses, HID over i2c may be what your touchpad uses. HID works over Bluetooth and its celebrity-diet version BLE. Somewhere, someone out there is very slowly moving a mouse pointer by sending HID over carrier pigeons just to prove a point. Because there's always that one guy.

HID is incredibly simple in that the static description of the device can just be bytes burnt into the ROM like the Australian sun into unprepared English backpackers. And the event frames are often an identical series of bytes where every bit is filled in by the firmware according to the axis/buttons/etc.

HID is incredibly complicated because parsing it is a stack-based mental overload. Each individual protocol item is simple but getting it right and all into your head is tricky. Luckily, I'm here for you to make this simpler to understand or, failing that, at least more entertaining.

As said above, the purpose of HID is to make devices describe themselves in a generic manner so that you can have a single driver handle any input device. The idea is that the host parses that standard protocol and knows exactly how the device will behave. This has worked out great, we only have around 200 files dealing with vendor- and hardware-specific HID quirks as of v4.20.

HID messages are Reports. And to know what a Report means and how to interpret it, you need a Report Descriptor. That Report Descriptor is static and contains a series of bytes detailing "what" and "where", i.e. what a sequence of bits represents and where to find those bits in the Report. So let's try and parse one of these Report Descriptors, let's say for a fictional mouse with a few buttons. How exciting, we're at the forefront of innovation here.

The Report Descriptor consists of a bunch of Items. A parser reads the next Item, processes the information within and moves on. Items are small (1 byte header, 0-4 bytes payload) and generally only apply exactly one tiny little bit of information. You need to accumulate several items to build up enough information to actually know what's happening.

The "what" question of the Report Descriptor is answered with the so-called Usage. This could be something simple like X or Y (0x30 and 0x31) or something more esoteric like System Menu Exit (0x88). A Usage is 16 bits but all Usages are grouped into so-called Usage Pages. A Usage Page too is a 16 bit value and together they form the 32-bit value that tells us what the device can do. Examples:


0001 0031 # Generic Desktop, Y
0001 0088 # Generic Desktop, System Menu Exit
0003 0005 # VR Controls, Head Tracker
0003 0006 # VR Controls, Head Mounted Display
0004 0031 # Keyboard, Keyboard \ and |
Note how the Usage in the last item is the same as the first one, without the Usage Page you will mix things up. It helps if you always think of the Usage as a 32-bit number. For your kids' bed-time story time, here are the HID Usage Tables from 2004 and the approved HID Usage Table Review Requests of the last decade. Because nothing puts them to sleep quicker than droning on about hex numbers associated with remote control buttons.

To successfully interpret a Report from the device, you need to know which bits have which Usage associated with them. So let's go back to our innovative mouse. We would want a report descriptor with 6 items like this:


Usage Page (Generic Desktop)
Usage (X)
Report Size (16)
Usage Page (Generic Desktop)
Usage (Y)
Report Size (16)
This basically tells the host: X and Y both have 16 bits. So if we get a 4-byte Report from the device, we know two bytes are for X, two for Y.

HID was invented at a time when bits were more expensive than printer ink, so we can't afford to waste any bits (still the case because who would want to spend an extra penny on more ROM). HID makes use of so-called Global items, once those are set their value applies to all following items until changed. Usage Page and Report Size are such Global items, so the above report descriptor is really implemented like this:


Usage Page (Generic Desktop)
Usage (X)
Usage (Y)
Report Count (2)
Report Size (16)
Input (Data,Var,Rel)
The Report Count just tells us that 2 fields of the current Report Size are coming up. We have two usages, two fields, and 16 bits each so we know what to do. The Input item is sort-of the marker for the end of the stack, it basically tells us "process what you've seen so far", together with a few flags. Rel in this case means that the Usages are relative. Oh, and Input means that this is data from device to host. Output would be data from host to device, e.g. to set LEDs on a keyboard. There's also Feature which indicates configurable items.

Buttons on a device are generally just numbered so it'd be a monumental 16-bits-at-a-time waste to have HID send Usage (Button1), Usage (Button2), etc. for every button on the device. HID instead provides a Usage Minimum and Usage Maximum to sequentially order them. This looks like this:


Usage Page (Button)
Usage Minimum (1)
Usage Maximum (5)
Report Count (5)
Report Size (1)
Input (Data,Var,Abs)
So we have 5 buttons here and each button has one bit. Note how the buttons are Abs because a button state is not a relative value, it's either down or up. HID is quite intolerant to Schrödinger's thought experiments.

Let's put the two things together and we have an almost-correct Report descriptor:


Usage Page (Button)
Usage Minimum (1)
Usage Maximum (5)
Report Count (5)
Report Size (1)
Input (Data,Var,Abs)

Report Size (3)
Report Count (1)
Input (Cnst,Arr,Abs)

Usage Page (Generic Desktop)
Usage (X)
Usage (Y)
Report Count (2)
Report Size (16)
Input (Data,Var,Rel)
New here is Cnst. This signals that the bits have a constant value, thus don't need a Usage and basically don't matter (haha. yeah, right. in theory). Linux does indeed ignore those. Cnst is used for padding to align on byte boundaries - 5 bits for buttons plus 3 bits padding make 8 bits. Which makes one byte as everyone agrees except for granddad over there in the corner. I don't know how he got in.

Were we to get a 5-byte Report from the device, we'd parse it approximately like this:


button_state = byte[0] & 0x1f
x = byte[1] | (byte[2] << 8)
y = byte[3] | (byte[4] << 8)
Hooray, we're almost ready. Except not. We may need more info to correctly interpret the data within those reports.

The Logical Minimum and Logical Maximum specify the value range of the actual data. We need this to tell us whether the data is signed and what the allowable range is. Together with the Physical Minimum and the Physical Maximum they specify what the values really mean. In the simple case:


Usage Page (Generic Desktop)
Usage (X)
Usage (Y)
Report Count (2)
Report Size (16)
Logical Minimum (-32767)
Logical Maximum (32767)
Input (Data,Var,Rel)
This just means our x/y data is signed. Easy. But consider this combination:

...
Logical Minimum (0)
Logical Maximum (1)
Physical Minimum (1)
Physical Maximum (12)
This means that if the bit is 0, the effective value is 1. If the bit is 1, the effective value is 12.
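
That is just the usual linear mapping of the logical range onto the physical range (ignoring the Unit Exponent item for simplicity). A small sketch of the calculation, using the example above where logical 0..1 maps onto physical 1..12:

/* Map a logical value onto the physical range: with logical 0..1 and
 * physical 1..12 this gives 0 -> 1 and 1 -> 12. */
static double
physical_value(int logical, int log_min, int log_max,
               int phys_min, int phys_max)
{
        return phys_min + (double)(logical - log_min) *
               (phys_max - phys_min) / (log_max - log_min);
}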

Note that the above is one report only. Devices may have multiple Reports, indicated by the Report ID. So our Report Descriptor may look like this:


Report ID (01)
Usage Page (Button)
Usage Minimum (1)
Usage Maximum (5)
Report Count (5)
Report Size (1)
Input (Data,Var,Abs)
Report Size (3)
Report Count (1)
Input (Cnst,Arr,Abs)

Report ID (02)
Usage Page (Generic Desktop)
Usage (X)
Usage (Y)
Report Count (2)
Report Size (16)
Input (Data,Var,Rel)
If we were to get a Report now, we need to check byte 0 for the Report ID so we know what this is. i.e. our single-use hard-coded parser would look like this:

if byte[0] == 0x01:
button_state = byte[1] & 0x1f
else if byte[0] == 0x02:
x = byte[2] | (byte[3] << 8)
y = byte[4] | (byte[5] << 8)
A device may use multiple Reports if the hardware doesn't gather all data within the same hardware bits. Now, you may ask: if I get fifteen reports, how should I know what belongs together? Good question, and lucky for you the HID designers are miles ahead of you. Report IDs are grouped into Collections.

Collections can have multiple types. An Application Collection describes a set of inputs that make sense as a whole. Usually, every Report Descriptor must define at least one Application Collection but you may have two or more. For example, a keyboard with an integrated trackpoint should and/or would use two. This is how the kernel knows it needs to create two separate event nodes for the device. Application Collections have a few reserved Usages that indicate to the host what type of device this is. These are e.g. Mouse, Joystick, Consumer Control. If you ever wondered why you have a device named like "Logitech G500s Laser Gaming Mouse Consumer Control" this is the kernel simply appending the Application Collection's Usage to the device name.

A Physical Collection indicates that the data is collected at one physical point, though what a point is is a bit blurry. Theoretical physicists will disagree but a point can be "a mouse". So it's quite common for all reports on a mouse to be wrapped in one Physical Collection. If you have a device with two sets of sensors, you'd have two collections to illustrate which ones go together. Physical Collections also have reserved Usages like Pointer or Head Tracker.

Finally, a Logical Collection just indicates that some bits of data belong together, whatever that means. The HID spec uses the example of buffer length field and buffer data but it's also common for all inputs from a mouse to be grouped together. A quick check of my mice here shows that Logitech doesn't wrap the data into a Logical Collection but Microsoft's firmware does. Because where would we be if we all did the same thing...

Anyway. Now that we know about collections, let's look at a whole report descriptor as seen in the wild:


Usage Page (Generic Desktop)
Usage (Mouse)
Collection (Application)
Usage Page (Generic Desktop)
Usage (Mouse)
Collection (Logical)
Report ID (26)
Usage (Pointer)
Collection (Physical)
Usage Page (Button)
Usage Minimum (1)
Usage Maximum (5)
Report Count (5)
Report Size (1)
Logical Minimum (0)
Logical Maximum (1)
Input (Data,Var,Abs)
Report Size (3)
Report Count (1)
Input (Cnst,Arr,Abs)
Usage Page (Generic Desktop)
Usage (X)
Usage (Y)
Report Count (2)
Report Size (16)
Logical Minimum (-32767)
Logical Maximum (32767)
Input (Data,Var,Rel)
Usage (Wheel)
Physical Minimum (0)
Physical Maximum (0)
Report Count (1)
Report Size (16)
Logical Minimum (-32767)
Logical Maximum (32767)
Input (Data,Var,Rel)
End Collection
End Collection
End Collection
We have one Application Collection (Generic Desktop, Mouse) that contains one Logical Collection (Generic Desktop, Mouse). That contains one Physical Collection (Generic Desktop, Pointer). Our actual Report (and we have only one but it has the decimal ID 26) has 5 buttons, two 16-bit axes (x and y) and finally another 16 bit axis for the Wheel. This device will thus send 8-byte reports and our parser will do:

if byte[0] != 0x1a: # it's decimal in the above descriptor
error, should be 26
button_state = byte[1] & 0x1f
x = byte[2] | (byte[3] << 8)
y = byte[4] | (byte[5] << 8)
wheel = byte[6] | (byte[7] << 8)
That's it. Now, obviously, you can't write a parser for every HID descriptor out there so your actual parsing code needs to be generic. The Linux kernel does exactly that and so does everything else that needs to parse HID. There's a huge variety in devices out there, all with HID descriptors that may or may not be correct. As with so much in life, correct HID implementations are often defined by "whatever Windows accepts" so if you like playing catch, Linux development is for you.

Oh, in case you just got a bit too optimistic about the state of the world: HID allows for vendor-defined usages. Which does exactly what you'd think it does, it hides vendor-specific protocol inside what should be a generic protocol. There are devices with hidden report IDs that you can only unlock by sending the right magic sequence to the report and/or by defeating the boss on Level 4. Usually those devices present themselves as basic/normal devices over HID but if you know the magic sequence you get to use *gasp* all buttons. Or access the device-specific configuration features. Logitech's HID++ is just one example here but at least that's one where we have most of the specs available.

The above describes how to parse the HID report descriptor and interpret the reports. But what happens once you have a HID report correctly parsed? In the case of the Linux kernel, once the report descriptor is parsed evdev nodes are created (one per Application Collection, more or less). As the Reports come in, they are mapped to evdev codes and the data appears on the evdev node. That's where userspace like libinput can pick it up. That bit is actually quite simple (mostly anyway).

The above output was generated with the tools from the hid-tools repository. Go forth and hid-record.

December 14, 2018
libfprint, the fingerprint reader driver library, is nearing a 1.0 release.

Since the last time I reported on the status of the library, we've made some headway modernising the library, using a variety of different tools. Let's go through them and how they were used.

Callcatcher

When libfprint was in its infancy, Daniel Drake found the NBIS fingerprint processing library matched what was required to provide fingerprint matching algorithms, and imported it in libfprint. Since then, the code in this copy-paste library in libfprint stayed the same. When updating it to the latest available version (from 2015 rather than 2007), as well as splitting off a patch to make it easier to update the library again in the future, I used Callcatcher to cull the unused functions.

Callcatcher is not a "production-level" tool (too many false positives, lack of support for many common architectures, etc.), but coupled with manual checking, it allowed us to greatly reduce the number of functions in our copy, so they weren't reported when using other source code quality checking tools.

LLVM's scan-build

This is a particularly easy one to use as its use is integrated into meson, and available through ninja scan-build. The output of the tool, whether on stderr, or on the HTML pages, is pretty similar to Coverity's, but the tool is free, and easily integrated into a CI (once you've fixed all the bugs, obviously). We found plenty of possible memory leaks and uninitialised variables using this, with more flexibility than using Coverity's web interface, and avoiding going through hoops when using its "source code check as a service" model.
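
For reference, with clang's static analyzer installed the whole thing is just the usual meson/ninja dance:

$ meson build
$ ninja -C build scan-build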

cflow and callgraph

LLVM has another tool, called callgraph. It's not yet integrated into meson, which made it a bit of a problem to get some output out of it. But combined with cflow, we used it to find where certain functions were called, trying to find the origin of some variables (whether they were internal or device-provided for example), which helped with implementing additional guards and assertions in some parts of the library, in particular inside the NBIS sub-directory.

0.99.0 is out

We're not yet completely done with the first pass at modernising libfprint and its ecosystem, but we released an early Yule present with version 0.99.0. It will be integrated into Fedora after the holidays if the early testing goes according to plan.

We also expect a great deal from our internal driver API reference. If you have a fingerprint reader that's unsupported, contact your laptop manufacturer about them providing a Linux driver for it and point them at this documentation.

A number of laptop vendors are already asking their OEM manufacturers to provide drivers to be merged upstream, but a little nudge probably won't hurt.

Happy holidays to you all, and see you for some more interesting features in the new year.
December 12, 2018

Disclaimer: this is pending for v4.21 and thus not yet in any kernel release.

Most wheel mice have a physical feature to stop the wheel from spinning freely. That feature is called detents, notches, wheel clicks, stops, or something like that. On your average mouse that is 24 wheel clicks per full rotation, resulting in the wheel rotating by 15 degrees before its motion is arrested. On some other mice that angle is 18 degrees, so you get 20 clicks per full rotation.

Of course, the world wouldn't be complete without fancy hardware features. Over the last 10 or so years devices have added free-wheeling scroll wheels or scroll wheels without distinct stops. In many cases wheel behaviour can be configured on the device, e.g. with Logitech's HID++ protocol. A few weeks back, Harry Cutts from the chromium team sent patches to enable Logitech high-resolution wheel scrolling in the kernel. Succinctly, these patches added another axis next to the existing REL_WHEEL named REL_WHEEL_HI_RES. Where available, the latter axis would provide finer-grained scroll information than the click-by-click REL_WHEEL. At the same time I accidentally stumbled across the documentation for the HID Resolution Multiplier Feature. A few patch revisions later and we now have everything queued up for v4.21. Below is a summary of the new behaviour.

The kernel will continue to provide REL_WHEEL as axis for "wheel clicks", just as before. This axis provides the logical wheel clicks, (almost) nothing changes here. In addition, a REL_WHEEL_HI_RES axis is available which allows for finer-grained resolution. On this axis, the magic value 120 represents one logical traditional wheel click but a device may send a fraction of 120 for a smaller motion. Userspace can either accumulate the values until it hits a full 120 for one wheel click or it can scroll by a few pixels on each event for a smoother experience. The same principle is applied to REL_HWHEEL and REL_HWHEEL_HI_RES for horizontal scroll wheels (which these days is just tilting the wheel). The REL_WHEEL axis is now emulated by the kernel and simply sent out whenever we have accumulated 120.
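
A sketch of the accumulation approach in userspace is below; emit_wheel_click() is a hypothetical stand-in for whatever the consumer does once it has a full click:

#include <stdlib.h>

static int wheel_accumulated;

/* Feed in every REL_WHEEL_HI_RES value; emit one logical click per
 * accumulated 120 units, in either direction. */
static void
handle_hi_res_wheel(int value, void (*emit_wheel_click)(int direction))
{
        wheel_accumulated += value;
        while (abs(wheel_accumulated) >= 120) {
                int direction = wheel_accumulated > 0 ? 1 : -1;
                emit_wheel_click(direction);
                wheel_accumulated -= direction * 120;
        }
}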

Important to note: REL_WHEEL and REL_HWHEEL are now legacy axes and should be ignored by code handling the respective high-resolution version.

The magic value of 120 is taken directly from Windows. That value was chosen because it has a good number of integer factors, so dividing 120 by whatever multiplier the mouse uses gives you an integer fraction of 120. And because HW manufacturers want it to work on Windows, we can rely on them doing it right, provided we use the same approach.

There are two implementations that matter. Harry's patches enable the high-resolution scrolling on Logitech mice which seem to mostly have a multiplier of 8 (i.e. REL_WHEEL_HI_RES will send eight events with a value of 15 before REL_WHEEL sends 1 click). There are some interesting side-effects with e.g. the MX Anywhere 2S. In high-resolution mode with a multiplier of 8, a single wheel movement does not always give us 8 events, the firmware does its own magic here. So we have some emulation code in place with the goal of making the REL_WHEEL event happen on the mid-point of a wheel click motion. The exact point can shift a bit when the device sends 7 events instead of 8 so we have a few extra bits in place to reset after timeouts and direction changes to make sure the wheel behaviour is as consistent as possible.

The second implementation is for the generic HID protocol. This was all added for Windows Vista, so we're only about a decade behind here. Microsoft got the Resolution Multiplier feature into the official HID documentation (possibly in the hope that other HW manufacturers implement it which afaict didn't happen). This feature effectively provides a fixed value multiplier that the device applies in hardware when enabled. It's basically the same as the Logitech one except it's set through a HID feature instead of a vendor-specific protocol. On the devices tested so far (all Microsoft mice because no-one else seems to implement this) the multipliers vary a bit, ranging from 4 to 12. And the exact behaviour varies too. One mouse behaves correctly (Microsoft Comfort Optical Mouse 3000) and sends more events than before. Other mice just send the multiplied value instead of the normal value, so nothing really changes. And at least one mouse (Microsoft Sculpt Ergonomic) sends the tilt-wheel values more frequently and with a higher value. So instead of one event with value 1 every X ms, we now get an event with value 3 every X/4 ms. The mice tested do not drop events like the Logitech mice do, so we don't need fancy emulation code here. Either way, we map this into the 120 range correctly now, so userspace gets to benefit.

As mentioned above, the Resolution Multiplier HID feature was introduced for Windows Vista which is... not the most recent release. I have a strong suspicion that Microsoft dumped this feature as well, the most recent set of mice I have access to don't provide the feature anymore (they have vendor-private protocols that we don't know about instead). So the takeaway for all this is: if you have a Logitech mouse, you'll get higher-resolution scrolling on v4.21. If you have a Microsoft mouse a few years old, you may get high-resolution wheel scrolling if the device supports it. Any other vendor or a new Microsoft mouse, you don't get it.

Coincidentally, if you know anyone at Microsoft who can provide me with the specs for their custom protocol, I'd appreciate it. We'd love to have support for it both in libratbag and in the kernel. Or any other vendor, come to think of it.

December 10, 2018

For V3D last week, I resurrected my old GLES 3.1 series with SSBO and shader image support, rebuilt it for V3D 4.1 (shader images no longer need manual tiling), and wrote indirect draw support and started on compute shaders. As of this weekend, dEQP-GLES31 is passing 1387/1567 of tests with “compute” in the name on the simulator. I have a fix needed for barrier(), then it’s time to build the kernel interface. In the process, I ended up fixing several job flushing bugs, plugging memory leaks, improving our shader disassembly debug dumps, and reducing memory consumption and CPU overhead.

The TFU series is now completely merged, and the kernel cache management series from last week is now also merged. I also fixed a bug in the core GPU scheduler that would cause use-after-frees when tracing.

For vc4, Boris completed and merged the H/V flipping work after fixing display of flipped/offset SAND images. More progress has been made on chamelium testing, so we’re nearing having automated regression tests for parts of our display stack. My PM driver has the acks we needed, but my co-maintainer has tacked a new request on in v3 so it’ll be delayed.

December 05, 2018

Khronos Group has published two new extensions for Vulkan: VK_KHR_shader_float16_int8 and VK_KHR_shader_float_controls. In this post, I will talk about VK_KHR_shader_float_controls, which is the extension I have been implementing on Anvil driver, the open-source Intel Vulkan driver, as part of my job at Igalia. For information about VK_KHR_shader_float16_int8 and its implementation in Mesa, you can read Iago’s blogpost.

The Vulkan Working Group has defined a new extension, VK_KHR_shader_float_controls, which allows applications to query and override the implementation’s default floating point behavior for rounding modes, denormals, signed zero and infinity. From the Vulkan application developer perspective, VK_KHR_shader_float_controls defines a new structure called VkPhysicalDeviceFloatControlsPropertiesKHR in which the drivers expose the supported capabilities, such as the rounding modes for each floating point data type, how denormals are expected to be handled by the hardware (either flushed to zero or preserved), and whether signed zero, infinity and NaN values will have their bits preserved.

typedef struct VkPhysicalDeviceFloatControlsPropertiesKHR {
    VkStructureType    sType;
    void*              pNext;
    VkBool32           separateDenormSettings;
    VkBool32           separateRoundingModeSettings;
    VkBool32           shaderSignedZeroInfNanPreserveFloat16;
    VkBool32           shaderSignedZeroInfNanPreserveFloat32;
    VkBool32           shaderSignedZeroInfNanPreserveFloat64;
    VkBool32           shaderDenormPreserveFloat16;
    VkBool32           shaderDenormPreserveFloat32;
    VkBool32           shaderDenormPreserveFloat64;
    VkBool32           shaderDenormFlushToZeroFloat16;
    VkBool32           shaderDenormFlushToZeroFloat32;
    VkBool32           shaderDenormFlushToZeroFloat64;
    VkBool32           shaderRoundingModeRTEFloat16;
    VkBool32           shaderRoundingModeRTEFloat32;
    VkBool32           shaderRoundingModeRTEFloat64;
    VkBool32           shaderRoundingModeRTZFloat16;
    VkBool32           shaderRoundingModeRTZFloat32;
    VkBool32           shaderRoundingModeRTZFloat64;
} VkPhysicalDeviceFloatControlsPropertiesKHR;

This structure will be filled in by the driver when calling vkGetPhysicalDeviceProperties2(), with a pointer to such a structure as one of the pNext pointers of the VkPhysicalDeviceProperties2 structure. With that, we know whether the driver supports the SPIR-V capabilities we want to use in our shaders. If the separate*Settings are true, remember to check the value of the property for each of the floating point bit-size types you are planning to work with.
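
A minimal sketch of that query using the standard pNext chaining (assuming the physical device has already been picked and the extension is available):

#include <vulkan/vulkan.h>

static VkPhysicalDeviceFloatControlsPropertiesKHR
query_float_controls(VkPhysicalDevice physical_device)
{
        VkPhysicalDeviceFloatControlsPropertiesKHR float_controls = {
                .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FLOAT_CONTROLS_PROPERTIES_KHR,
        };
        VkPhysicalDeviceProperties2 props = {
                .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2,
                .pNext = &float_controls,
        };

        /* The driver fills in the chained structure with its capabilities. */
        vkGetPhysicalDeviceProperties2(physical_device, &props);

        return float_controls;
}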

The required bits to enable such capabilities in a SPIR-V shader are the following:

  1. Enable the extension: OpExtension "SPV_KHR_float_controls"
  2. Enable the desired capability. For example: OpCapability DenormFlushToZero
  3. Specify where to apply it. For example, we would like to flush to zero all fp64 denormals in the %main function of a shader: OpExecutionMode %main DenormFlushToZero 64. If we want to apply different modes, we would repeat that line with the needed ones.
  4. Profit!

I implemented support for this extension on the GPUs supported by Anvil (Broadwell, Skylake, Kabylake and newer), although we don’t support all the capabilities. For example on Broadwell, float16 denormals are not supported, and flushing float16 denormals to zero is not supported for all instructions on the rest of the generations.

If you are interested, the patches are now under review :-) As there is no real world code using this feature yet, please file any bug you find about this in our bugzilla.

December 04, 2018

The last time I talked about my driver work was to announce the implementation of the shaderInt16 feature for the Anvil Vulkan driver back in May, and since then I have been working on VK_KHR_shader_float16_int8, a new Vulkan extension recently announced by the Khronos group, for which I have just posted initial patches in mesa-dev supporting Broadwell and later Intel platforms.

As you probably guessed by the name, this extension enables Vulkan to consume SPIR-V shaders that use Float16 and Int8 types in arithmetic operations, extending the functionality included with VK_KHR_16bit_storage and VK_KHR_8bit_storage, which was limited to load/store operations. In theory, applications that do not need the range and precision of regular 32-bit floating point and integers, can use these new types to improve performance by increasing ALU throughput and reducing register pressure, which in some platforms can also lead to improved parallelism.

In the case of the Intel platforms initial testing done by Intel suggests that better ALU throughput is expected when issuing half-float instructions. Lower register pressure is also expected, at least for SIMD16 fragment and compute shaders, where we can pack all 16-channels worth of half-float data into a single GPU register, which could significantly improve performance for shaders that would otherwise need to spill registers to memory.

Another neat thing is that while VK_KHR_shader_float16_int8 is a Vulkan extension, its implementation is mostly API agnostic, so most of the work we did here should also help us have a proper mediump implementation for GLSL ES shaders in the future.

There are a few caveats to consider as well though: on some hardware platforms smaller bit-sizes have certain hardware restrictions that may lead to emitting worse shader code in some scenarios, and generally, Mesa’s compiler infrastructure (and the Intel compiler backend in particular) have a long history of being 32-bit only, so there are parts of the compiler stack that still work better for 32-bit code.

Because VK_KHR_shader_float16_int8 is a brand new feature, we don’t really have any real world use cases yet. This is on top of the fact that Mesa’s compiler backends have been mostly (or exclusively) 32-bit aware until now (and more recently 64-bit too), so going forward I would expect a lot of focus on making our compiler be as robust (and optimal) for 16-bit code as it is for 32-bit code.

While we are already aware of a few areas where we can do better and I am currently working on addressing a few of these, one of the major limiting factors we have at the moment is the fact that the only source of 16-bit shaders available to us is the Khronos CTS, which due to its particular motivation, is very different from real world shader workloads and it is not a valid source material to drive compiler optimization work. Unfortunately, it might take some time until we start seeing applications using these new features, so in the meantime we will need to find other ways to drive further work in this area, and I think our best option here might be GLSL ES’s mediump and lowp qualifiers.

GLSL ES mediump and lowp qualifiers have been around for a long time but they are only defined as hints to the shader compiler that lower precision is acceptable and we have never really used them to emit half-float code. Thankfully, Topi Pohjolainen from Intel has been working on this for a while, which would open up a much better scenario for improving our 16-bit compiler paths, so this is something I am really looking forward to.

Finally, as I say above, we could definitely use more testing and feedback from real world use cases, so if you decide to use this feature in your next project and you hit any bugs, please be sure to file them in Bugzilla so we can continue to improve our implementation.

December 03, 2018

The architecture is a bit of a container matroska, but what we're trying to achieve is running privileged Docker inside of an LXC container on a bare-metal host.


Setup container on LXC Host

In order to give Docker in the guest privileges, the guest container itself has to be given privileges.

There is no simple switch for doing this in LXC unfortunately, but a few config options will do the trick.

lxc launch images:ubuntu/bionic container

lxc config set container security.nesting true
lxc config set container security.privileged true
cat <<EOT | lxc config set container raw.lxc -
lxc.cgroup.devices.allow = a
lxc.cap.drop =
EOT

lxc restart container

Setup docker on container

Just to verify that this works, start a privileged Docker container …

Last week while reviewing a patch I read that some gaming keyboards have two modes - keyboard mode and gaming mode. When in gaming mode, the keys send out pre-recorded macros when pressed. Presumably (I am not a gamer) this is to record keyboard shortcuts to have quicker access to various functionalities. The macros are stored in the hardware and are thus relatively independent of the host system. Provided you have access to the custom protocol, which you probably don't when you're on Linux. But I digress.

I reckoned this could be done in software and work with any 5 dollar USB keyboard. A few hours later, I have this working now: ggkbdd. It sits directly above the kernel and waits for key events. Once the 'mode key' is hit, the keyboard will send pre-configured key sequences for the respective keys. Hitting the mode key again (or ESC) switches back to normal mode.

There's a lot of functionality that is missing such as integration with the desktop (probably via DBus), better security (dropping privs, masking the fd to avoid accidental key logging), better system integration (request fds from logind, possibly through the compositor). And error handling, etc. I think the total time spent on this is somewhere between 3 and 4 hours, and that includes the time to write this blog post and debug the systemd unit autostart. There are likely other projects that solve it the same way, or at least in a similar manner. I didn't check.

This was done as proof-of-concept and

  • I don't know if it's useful and if so, what the use-cases are
  • I don't know if I will have any time to fix things on this
  • I don't know if other (better developed) projects already occupy that space
In the grand glorious future and provided this is indeed something generally useful, this would need compositor integration. Not sure we'll ever get to that point. Meanwhile, consider this a code drop for a proof-of-concept and expect that you'll have to fix any bugs yourself.

I spoke at Linux Plumbers Conference 2018 in Vancouver a few weeks ago, about CUDA and the state of open source compute stacks.

The video is now available.

https://www.youtube.com/watch?v=d94N2Lu4x9s


In V3D land, I built kernel support for the Texture Formatting Unit (TFU) and started using it for glGenerateMipmaps() support. This unit will also be important for reformatting linear dmabufs (such as from X11 with a linear-only scanout engine) or media decode output in SAND format into the UIF format that V3D can render from. The kernel side is in drm-misc-next, and I’ll land the Mesa side once it hits drm-next.

I also rewrote the V3D cache invalidation in the kernel thanks to feedback from Dave Emett on the hardware team. In the process, this fixed a 3ms(!) CPU-side wait on every job submission, which improved throughput by 4-10x. Now I know why my fancy new hardware felt so slow! I also landed new tracepoints for anyone else debugging execution (aka I could write good sysprof visualization now).

Now that cache management is less broken, I’ve started on resurrecting my old SSBO and shader image branches so that I can write CSD (compute shader) support and get the driver to GLES 3.1.

In the process of writing new kernel code, I found multiple regressions in DRM core during CTS runs. I rebased my old V3D IGT support and got it merged, and built new tests for replicating one of the failures (GPU hang recovery was broken).

The tinydrm work from the previous month also continued. I’ve polished kmsro some more (fewer driver symlinks, more loader knowledge of kmsro), and will be resubmitting soon I hope.

Since I had kmsro nearly working, I gave it a shot for a V3D testing idea from last month. Right now to get X11 running to do conformance runs on it, I’ve got a stub KMS implementation in a branch. If I could do buffer sharing between VKMS and V3D, then kmsro could bind the two together so that we didn’t need out-of-tree patches for V3D testing. It turned out that VKMS didn’t have PRIME support, so I used Noralf’s proposed shmem helpers to add it. I got X11 up and running and looking plausible, but tests were failing left and right. This may be due to the shmem helpers using cached CPU mappings, when we want WC for interaction with most GPUs. Because of issues like this, the shmem helpers for VKMS are stalled at the moment but I may land V3D support in kmsro anyway.

In vc4 land, the Bootlin folks have been knocking off more of the HVS TODO entries. Boris respun underscan support, and hopefully after the next revision we’ll be able to land it. He’s also implemented H/V flipping of planes, which will be useful in some cases for 180-degree rotated displays. (For 90 degree rotations, we would need panel support for it as the HVS can’t go that way). Paul is taking over Boris’s load tracker work and is building tests for it to determine how accurate it is in predicting underflows.

Finally, for vc4 I got permission to write a driver for the PM block that controls power and reset. So far the only thing Linux uses the block for directly is the watchdog timer (which also performs reboots by letting the watchdog timer fire quickly). For power domains, we’ve been using my raspberrypi-power driver to talk to the Raspberry Pi firmware. However, we should be able to do things more reliably, and implement GPU reset properly, if we expose power domains and a reset controller from the PM block. I wrote a series that moves the PM block to a new MFD driver, binds the watchdog driver to it, and adds a new soc driver for power domains and reset. So far, I’ve only tested this driver for V3D power domains and reset, but it looks good and gets more of the Raspberry Pi platform supported in fully open source code.

November 22, 2018
I’ve been working on FreeBSD for Intel for almost 6 months now. In the world of programmers, I am considered an old dog, and these 6 months have been all about learning new tricks. Luckily, I’ve found myself in a remarkably inclusive and receptive community whose patience seems plentiful. As I get ready to take […]
November 19, 2018

Since the day I had my first class of Operating Systems (OS) in my engineering course, I got passionate about it; for me, OS represents one of the greatest achievements of mankind. As a result of my delight for OS, I always tried to gravitate around this field, but my school environment did not provide me with many opportunities to get into the area. To summarize this long journey, I will jump directly to the main point: on November 15 of 2017, I joined a conference named Linuxdev-br [1] which brought together some of the best Brazilian kernel developers. I took this opportunity to learn everything that I could by asking lots of questions to developers. Additionally, I was lucky to meet Gustavo Padovan. He helped me a lot during my first steps in the Linux Kernel.

From November 2017 until now, I did the best I could to become a kernel developer, and I have to admit that the path was very complicated. I paid the price of working from 8 AM to 11 PM, from Sunday to Sunday, to keep up my efforts on my master's and the Linux Kernel at the same time; unfortunately, I could not stay focused only on the Kernel. However, all of these efforts paid off over the year; I had many patches accepted into the Kernel, I joined the Google Summer of Code (GSoC), I traveled to conferences, I returned to Linuxdev-br 2018 as a speaker, I joined XDC2018 [2], and many other good things happened.

Now I am close to completing one year of Linux Kernel work, and one question still bugs me: why does it have to be so hard for someone in a similar condition to become part of this world? I realized that I had great support from many people (especially from my sweet and calm wife) and I also pushed myself very hard. Now, I feel that it is time to start giving something back to society; as a result, I began to promote some small events about free software at the university and in the city I live in. However, my main project related to this started around two months ago with six undergraduate students at the University of Sao Paulo, IME [3]. My plan is simple: train all six of these students to contribute to the Linux Kernel, with the intention of helping them create a local group of kernel developers. I am excited about this project! I noticed that within a few weeks of mentoring, the students have already learned lots of things, and in a few days they will send out their contributions to the Kernel. I want to write a new post about that in December 2018, reporting the results of this new tiny project and a summary of this one year of Linux Kernel work. See you soon :)

References

  1. linuxdev-br
  2. XDC 2018
  3. IME USP
November 15, 2018
As discussed in my previous blog post one of my TODO list items for plymouth is creating a new plymouth theme.

Since the transition to plymouth is not entirely smooth, plymouth by default waits 5 seconds (counted from starting the kernel) before showing itself, so that on systems which boot in under 5 seconds it never shows. As can be seen in this video, this leads to a very non-smooth experience when the boot takes, say, 7 seconds, as plymouth then only shows briefly, resulting in a kind of "flash" effect.

Another problem with the 5 second wait is that, now that we do not show GRUB, the user is looking at the firmware's boot splash not only for the often long firmware initialization time, but also for the 5 seconds plymouth waits on top of that, making it look as if nothing is happening.

To fix this I've been working on a new plymouth theme which draws a spinner over the firmware boot splash, eliminating the ugly transition from the firmware boot splash to plymouth. This also allows removing the show-delay, so that we provide feedback that something is happening as soon as plymouth starts.
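As a side note for anyone experimenting on their own: the delay in question is plymouthd's show-delay setting; assuming your plymouth build has the ShowDelay option, it can also be overridden in /etc/plymouth/plymouthd.conf:

[Daemon]
ShowDelay=0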

Firmware being firmware, getting this done right was somewhat harder than I expected, but I've a first "draft" of a new theme doing this now. I've created some videos showing 2 different systems booting the new theme. Note that the videos with disk encryption were paused while I entered my passphrase, so there is a bit of a jump in them because of this.

I've built a test version of plymouth for Fedora 29, to give this a try download all rpm files from here except the .src.rpm and -devel files and then from a directory with all those files in it, run:

sudo rpm -Uvh plymouth*.rpm

Since plymouth is part of your initrd, you also need to regenerate your initrd:

sudo dracut -f /boot/initramfs-$(uname -r).img $(uname -r)

This regenerates the initrd for the kernel you are currently running, so if you've installed a kernel update and have not rebooted since then you may not get the new theme when rebooting. In this case rerun the dracut command after rebooting.

Note if you've previously followed my instructions to test flickerfree boot, then you need to remove "plymouth.splash_delay=20" from your kernel commandline, since we now no longer want to have a splash-delay.
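If you originally added that parameter with grubby, something along these lines should remove it again:

sudo grubby --update-kernel=ALL --remove-args="plymouth.splash_delay=20"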

Now reboot and you should get the new spinner on firmware-boot-splash theme, with Fedora branding.

If you give this a try and the new theme somehow does not look correct, please mail me at hdegoede@redhat.com. If you mail me about the theme not displaying correctly, please attach the /run/plymouth.log file which this test-build generates to the email; a video of how the theme misbehaves would be great too.

I still need to discuss the idea of using a new theme incorporating the firmware boot splash with the GNOME design team so this is all subject to change.
October 31, 2018
Good morning from Edinburgh, where the breakfast contains haggis, and the charity shops have some interesting finds.

My main goal in attending this hackfest was to discuss Pipewire integration in the desktop, and how it will eventually replace PulseAudio as the audio daemon.

The main problem GNOME has had over the years with PulseAudio relates mostly to how PulseAudio was a black box when it came to its routing policy. What happens when you plug an HDMI cable into your laptop? Or turn on your Bluetooth headset? I've heard the stories of folks with highly mobile workstations having to constantly visit the Sound settings panel.

PulseAudio has policy scattered in a number of places (do a "git grep routing" inside the sources to see that): some of it is in the device manager, and the modules themselves can set priorities for their outputs and inputs. But there's nothing to take all that information in and make a decision based on the hardware that's plugged in and the applications currently in use.

For Pipewire, the policy decisions would be split off from the main daemon. Pipewire, as it gains PulseAudio compatibility layers, will grow a default/example policy engine that will try to replicate PulseAudio's behaviour. At the very least, that will mean that Pipewire won't regress compared to PulseAudio, and might even be able to take better decisions in the short term.

For GNOME, we still wanted to take control of that part of the experience, and make our own policy decisions. It's very possible that this engine will end up being featureful and generic enough that it will be used by more than just GNOME, or even become the default Pipewire one, but it's far too early to make that particular decision.

In the meanwhile, we didn't want the GNOME policies to be written in C, which is difficult for power users to experiment with and awkward for edge use cases. We could have started writing a configuration language, but it would have been too specific, and there are plenty of embeddable languages around. It was also a good opportunity for me to finally write the helper library I've been meaning to write for years, based on my favourite embedded language, Lua.

So I'm introducing Anatole. The goal of the project is to make it trivial to write chunks of programs in Lua, while the core of your project is written in C (we might even be able to embed it in Python or Javascript, once introspection support is added).

It's still in the very early days, and unusable for anything as of yet, but progress should be pretty swift. The code is mostly based on Victor Toso's incredible "Lua factory" plugin in Grilo. (I'm hoping that, once finished, I won't have to remember on which end of the stack I need to push stuff for Lua to do something with it ;)
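To give an idea of the kind of split Anatole is aiming for, here is a minimal sketch using the plain Lua C API rather than Anatole itself (the policy function and event names are invented for the example): a C core loads a policy chunk written in Lua and asks it for a routing decision.

/* Plain Lua C API, not Anatole's API (which is still taking shape).
 * Build with something like: gcc policy.c -o policy -llua
 * (Lua 5.3 assumed; the library name may differ per distro). */
#include <stdio.h>
#include <lua.h>
#include <lualib.h>
#include <lauxlib.h>

static const char *policy_chunk =
    "function route_for_event(event)\n"
    "  if event == 'hdmi-plugged' then return 'hdmi-output' end\n"
    "  return 'speakers'\n"
    "end\n";

int main(void)
{
    lua_State *L = luaL_newstate();
    luaL_openlibs(L);

    /* Load the policy, which is written in Lua. */
    if (luaL_dostring(L, policy_chunk) != LUA_OK) {
        fprintf(stderr, "policy error: %s\n", lua_tostring(L, -1));
        return 1;
    }

    /* The C core asks the Lua policy what to do with an event. */
    lua_getglobal(L, "route_for_event");
    lua_pushstring(L, "hdmi-plugged");
    if (lua_pcall(L, 1, 1, 0) != LUA_OK) {
        fprintf(stderr, "call error: %s\n", lua_tostring(L, -1));
        return 1;
    }
    printf("policy says: route to %s\n", lua_tostring(L, -1));

    lua_close(L);
    return 0;
}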

Last month the X.Org Developer’s Conference (XDC) was held in A Coruña, Spain. I took part as a Plasma/KWin developer. My main goal was to simply get into contact with developers from other projects and companies working on open source technology in order to show them that the KDE community aims at being a reliable partner to them now and in the future.

Instead of recounting chronologically what went down at the conference let us look at three key groups of attendees, who are relevant to KWin and Plasma: the graphics drivers and kernel developers, upstream userland and colleagues working on other compositor projects.

Graphics drivers and kernel

If you search on Youtube for videos of talks from previous XDC conferences or for the videos from this year’s XDC you will notice that there are many talks by graphics drivers developers, often directly employed by hardware vendors.

The reason is that hardware vendors have enough money to employ open source developers and send them to conferences, and that they benefit greatly from contributing directly to open source projects. Something which Nvidia management will also realize at some point.

On the other hand, I talked to the Nvidia engineers at the conference, who were very friendly and eager to converse about the technical solutions which they are allowed to share with the community. Sadly their predominant use of proprietary technology in general hinders them from taking a more active role in the community, and there is apparently no progress on their proposed open-standard Wayland buffer sharing API.

At least we arranged that they would send some hardware for testing purposes. I won’t be the recipient, since my work focus will be on other topics in the immediate future, but I was able to point to another KWin contributor, who should receive some Nvidia hardware in the future so he can better troubleshoot problems our users on Nvidia experience.

The situation looks completely different for Intel and AMD. In particular Intel has a longstanding track record of open development of their own drivers and contributing to generic open source solutions also being supported by other vendors. And AMD decided not too long ago to open source their most commonly used graphics drivers on Linux. In both cases it is a bliss to target their latest hardware and it was as great as I imagined it to be talking to their developers at XDC, because they are not only interested in their own products but in boosting the whole ecosystem and finding suitable solutions for everyone. I want to explicitly mention Martin Peres from Intel and Harry Wentland from AMD, who I had long, interesting discussions with and who showed great interest in improving the collaboration of low-level engineers and us in userland.

Who I haven't mentioned yet is ARM. Although they are, just like Nvidia, Intel and AMD, an XDC "Gold Sponsor", their contribution in terms of content to the conference was minimal, most likely for the same reason of being mostly closed source as in the Nvidia case. And that is equally sad, since we do have some interest in making ARM a well supported target for Plasma. An example is Plasma on the Pinebook. But the driver situation for ARM Mali GPUs is just ugly; developing for them is torture. I know because I did some of the integration work for the Pinebook. All the more I respect the efforts by several extremely talented hackers to provide open-source drivers for ARM Mali GPUs. Some of them presented their work at XDC.

X.Org and freedesktop.org upstream

Linux graphics drivers are cool and all, but without the XServer, Wayland and other auxiliary cross-vendor user space libraries there would not be much to show off to the user. And after all it is the X.Org Developer's Conference, most notably being home to the XServer and maybe, in the future, governance-wise also to freedesktop.org. So after looking at low-level driver development, what role did these projects and their developers play at the conference?

First I have to say that the dichotomy established in the previous paragraph is of course not that distinct. Several graphics drivers are part of Mesa, which is again part of freedesktop.org, and many graphics drivers developers also contribute to user land or are involved in organizational aspects of X.Org and freedesktop.org. A more prominent one of these organizational aspects is the hosting of projects. There was a presentation by Daniel Stone about the freedesktop.org transition to GitLab, which was a rather huge project this year and is still ongoing.

But regarding technical topics there were not many presentations about XServer, Wayland and other high level components. After seeing some lightning talks on the first day of the conference I decided to hold a lightning talk myself about my Xwayland GSoC project in 2017. I got one of the last slots on Friday and you can watch a video of my presentation here. Also Drew DeVault presented a demo of wlroots' layer shell.

So there were not so many talks about the higher level user space graphics stack, but some of us plan to increase the ratio of such talks in the future. After talking about graphics drivers developers and upstream userland this brings me directly to the last group of people:

Compositor developers

We were somewhat of a special crowd at XDC. We came from distinct projects (some of us from wlroots, Guido from Purism and me from KWin), but we were united in, to my knowledge, all of us attending XDC for the first time.

If you look at past conferences the involvement of compositor developers was marginal. My proclaimed goal, and I believe also that of all the others, is to change this from now on. Because from embedded to desktop we will all benefit from working together where possible and exchanging information with each other, with upstream and with hardware vendors. I believe X.Org and freedesktop.org can be a perfect platform for that.

Final remarks on organisation

The organisation of the conference was simply great. Huge thanks to Igalia for hosting XDC in their beautiful home town.

What I really liked about the conference schedule was that there were always three long breaks every day and long pauses between the talks allowing the attendees to talk to each other.

What I didn't like about the conference was that the attendees were spread over the city in different hotels. I do like the KDE Akademy approach better in this regard: everyone in one place, so you can drink a last beer together at the hotel bar before going to bed. That said, there were events on multiple evenings throughout the week, but recommending a reasonably priced default hotel for everyone not part of a large group might still be an idea for the next XDC.

October 30, 2018

So we kicked off the PipeWire hackfest in Edinburgh yesterday. We had 15 people attending, including Arun Raghavan, Tanu Kaskinen and Colin Guthrie from PulseAudio, PipeWire creator Wim Taymans, Bastien Nocera and Jan Grulich representing GNOME and KDE, Mark Brown from the ALSA kernel team, Olivier Crête, George Kiagiadakis and Nicolas Dufresne representing embedded usecases for PipeWire, and finally Thierry Bultel representing automotive.

The event kicked off with Wim Taymans presenting on the current state of PipeWire and outlining the remaining issues and current thoughts on how to resolve them. Most of the first day was spent on a roundtable discussion about what the goals of PipeWire are and should be, and what potential tradeoffs there would be going forward. PipeWire is probably a bit closer to Jack than to PulseAudio in design, so quite a bit of the discussion was about how that would affect the PulseAudio usecases and what is planned to ensure PipeWire works very well for consumer audio usecases.

Personally I ended up spending quite some time just testing and running various Jack apps to see what works already and what doesn't. I was positively surprised by how many Jack apps I was able to get outputting audio using PipeWire instead of Jack, but of course we still have some gaps to cover before PipeWire is ready as a drop-in Jack replacement; for instance, the Jack session management protocol needs to be implemented first.

The second day we outlined the areas that need work before we are ready to replace PulseAudio and came up with the following list:

  • Mixers – This is basically dealing with hardware mixers. Arun and Wim started looking at a design for this during the hackfest.
  • PulseAudio services – This is all the things in PulseAudio that are not very suitable for putting inside PipeWire. The idea is instead to put them in a separate daemon. This includes things like network streaming, RAOP, D-Bus APIs and so on.
  • Policy/Session handling – We plan to move policy and session handling out of PulseAudio to make it easier for different usecases to set their own policies. PipeWire will still provide some default setup, but the idea here is to have a separate daemon (or daemons) provide this. Bastien Nocera started prototyping a setup where he could create policy and session handling using Lua scripting.
  • Filters
  • Bluetooth – Ensuring we have great Bluetooth support with PipeWire. We would want to move Bluetooth handling to its own daemon, and not have it inside the main daemon like in PulseAudio, to allow for more flexibility with various embedded Bluetooth stacks for instance. This could also mean looking at the Linux Bluetooth stack more widely, as things are not ideal atm, especially from a security viewpoint.
  • Device reservation – We expect to replace Jack and PulseAudio in steps, starting with PulseAudio. So dealing well with hardware reservation is important, to allow people to, for instance, keep running Jack alongside PipeWire until we are ready for a full replacement.
  • Stream monitoring – An important feature from Jack and PulseAudio that still needs implementing to allow monitoring of audio devices and streams.
  • Latency handling – Improving the ways we can deal with hardware latency in, for instance, consumer devices such as TVs.

It is still a bit hard to have a clear timeline for when we will be ready to drop in PipeWire support to replace PulseAudio and then Jack, but we feel the Wayland migration was a good example to follow where we held off doing the switch until we felt comfortable the move would be transparent to most users. There will of course always be corner cases and bugs, but we hope that in general people agree that the Wayland transition was done in a responsible manner and thus could be a good example to follow for us here.

We would like to offer big thanks to the GNOME Foundation for sponsoring travel for some of the community attendees and to Collabora for sponsoring dinner for all attendees on the first night.

If you want to take a look at PipeWire, Wim updated the wiki page with PipeWire build instructions to be up-to-date. The hackfest attendees tested them out so we are sure they work; just be aware that you want the ‘Work’ branch and not the Master branch, as that is the one where all the audio work is happening. The Master branch is the video-focused branch we use in Fedora for desktop remoting support in browsers and VNC under Wayland.

This last week, I worked on the oldest issue I had in my rpi kernel repo: Adding support for tinydrm displays (it actually predates tinydrm). I wrote a tinydrm driver for the panel I had on hand and got it upstreamed. I wrote a Mesa series enabling vc4 to talk to this new hx8357d tinydrm driver. Once the Mesa side is agreed on, then we can extend it to the other tinydrm drivers and have a solution for X with 3D with limited updates (as opposed to full screen refreshing like fbdev did) on these panels.

I also fixed a longstanding vc4 bug that slightly rotated SDL2 output. They were running all of their coordinates through a rotation matrix they built on each VS invocation, and in the common case of no rotation the cos(0)/sin(0) was inaccurate enough to rotate things by a pixel near the window edges. The sin()/cos() in Mesa is now more accurate so this problem shouldn’t happen with old SDL2, we have a piglit test to make sure other drivers don’t break with SDL2, and upstream SDL2 has stopped building a rotation matrix in the shader. (yay!)

Small stuff from the V3D driver since my last post:

  • Fixed depth writes (or depth/color non-writes) on V3D 4.2.
  • Reduced shader recompiles on V3D 4.2.
  • Reduced VPM consumption by vertex shaders using shared NIR optimizations.
  • Took advantage of hardware half-float pack/unpack operations.
  • Switched from FLUSH_ALL_STATE to FLUSH (less overhead, tested by HW team).
  • Merged a fix for use-after-free of sched fences.
  • Merged a fix for debugfs oops on V3D 4.1
  • Merged the kernel V3D clock measurement debugfs entry.
  • Merged more documentation of the SUBMIT_CL.

Also in V3D, I’ve been experimenting with other ways of running the driver using the software simulator. Right now I have a frankenstein mode in vc4 and v3d where the driver sits on top of the i915 GEM driver, and does BO mapping and interactions with the window system on i915 and then copies those BOs into the simulator to run vc4/v3d ioctls on it. This is gross, and it’s only really usable on my desktop with the i915 driver.

Talking with one of the HW engineers who's interested in running Mesa against the simulator, there are a few ways we could do simulation. One would be to have the actual kernel driver call back up into a userspace daemon linking the simulator, and provide mappings of BOs and proxied register writes to that userspace daemon. Another would be to implement an actual V3D device in QEMU by linking in the simulator, but we can't do that due to the GPL. Another would be to use my current simulation code, but let it work on top of vkms or vgem (so that a CI system wouldn't need i915). This one seems easy. However, I started on a project mimicking what Intel did for their software-only CI setup: intercept libc calls to simulate having a V3D driver that's not actually present in the kernel. I've got the code to the point of trying to submit CLs, at which point the simulator hangs for reasons I haven't debugged yet. I don't know if I'll want to do this long term, but it seems like potentially useful code for other driver developers who have a software simulator.
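Conceptually, the interception approach is an LD_PRELOAD shim that hands out a fake file descriptor for a pretend device node and answers its ioctls in userspace. The sketch below only illustrates that trick and is not the actual project code; the device path is invented and the simulator hookup is left as a stub.

/* Build with: gcc -shared -fPIC -o shim.so shim.c -ldl
 * Run with:   LD_PRELOAD=./shim.so <application> */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/types.h>

static int fake_fd = -1;

int open(const char *path, int flags, ...)
{
    static int (*real_open)(const char *, int, ...);
    if (!real_open)
        real_open = (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");

    /* Hand out a harmless placeholder fd for the pretend device node. */
    if (strcmp(path, "/dev/dri/fake-v3d") == 0) {
        fake_fd = real_open("/dev/null", O_RDWR);
        return fake_fd;
    }

    if (flags & O_CREAT) {
        va_list ap;
        va_start(ap, flags);
        mode_t mode = (mode_t)va_arg(ap, int);
        va_end(ap);
        return real_open(path, flags, mode);
    }
    return real_open(path, flags);
}

int ioctl(int fd, unsigned long request, ...)
{
    static int (*real_ioctl)(int, unsigned long, ...);
    if (!real_ioctl)
        real_ioctl = (int (*)(int, unsigned long, ...))dlsym(RTLD_NEXT, "ioctl");

    va_list ap;
    va_start(ap, request);
    void *arg = va_arg(ap, void *);
    va_end(ap);

    /* Ioctls on the fake device would be routed into the simulator here;
     * everything else goes to the real kernel. */
    if (fd == fake_fd && fake_fd >= 0)
        return 0;   /* stub: call into the simulator instead */

    return real_ioctl(fd, request, arg);
}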

Finally, in the X world, this month I’ve reviewed and merged more patches than I normally would (thanks, gitlab!), and wrote an extensible testcase for the damage extension due to a bug report/fix that a user had submitted.

October 23, 2018

As many of you know, we kicked off an ambitious goal to revamp the Linux desktop when we launched Fedora Workstation 4 years ago. We wanted to remove many of the barriers to adoption of Linux as a desktop and make it a better operating system for all, especially for developers.
To that effect we have been pushing a wide range of initiatives over the last 4 years, ranging from providing a better input stack through libinput, a better display system through Wayland, a better audio and video subsystem through PipeWire, a better way of doing application packaging and dependency handling through Flatpak, a better application installation story through GNOME Software, actual firmware handling for Linux through the Linux Vendor Firmware Service, better manageability through Fleet Commander, and Project Silverblue for reliable OS updates. We have also put a lot of effort into improving general hardware handling, be that work on glvnd and friends for dealing with the NVidia driver, the Bolt project for handling Thunderbolt devices better, HiDPI support in the desktop, better touch support in the desktop, improved laptop battery life, and ongoing work to improve the state of fingerprint readers under Linux and to provide a flicker free boot experience.

One thing that was clear to us, though, was that as we were making all these changes to improve the ease of use and reliability of Linux as a desktop operating system, we couldn’t make life worse for developers. Developers are the lifeblood of Fedora and Linux, and thus we have had Debarshi Ray working on a project we call Fedora Toolbox. Fedora Toolbox creates a seamless experience for developers who use an immutable OS like Silverblue, yet want to be able to install the wonderful world of software libraries and tools that makes Linux so powerful for developers. Fedora Toolbox is now ready for early adopters to start testing, so I recommend jumping over to Debarshi’s blog to read up on Fedora Toolbox.

October 22, 2018

Oculus Quest. It’s a cool step forward, no doubt: an all-in-one headset, with touch controllers, inside-out 6DOF tracking, no sensors, no wires, headphones and so on. Neat! I can’t wait to put my hands on it… but, there’s a but:

“It runs rift quality experiences”: those are Zuckerberg’s words. And they’re wrong twice over.

The Rift runs tethered to a PC, which is an x86-architecture-based machine. The Quest will be running on a Snapdragon 8xx processor, which is a mobile processor with a SoC architecture. There’s no way to compare the mobile architecture’s processing capabilities with the x86 one and its far more powerful CPUs / GPUs – roughly speaking, one focuses on power consumption while the other focuses on performance.

So, no “rift quality” into Quest.

The other thing is that the OS, drivers and graphics libraries are different on each architecture, so, say, your latest shader feature will be different on each architecture and, in principle, the same binary application won’t be able to run on both. In other words, this means that you will need to tell the 3D artist and the programmer to work the application out again… and they will get really mad at you! :)

Therefore, Oculus Quest won’t “run rift” existing apps and games either. They will need to be re-built.

 


October 20, 2018

In the past month-and-a-half since the previous update on the free (open source) Panfrost driver for Mali GPUs, progress has been fervent! During the interim, Lyude Paul and I presented about the driver at the X.Org Developer’s Conference. Check out the talk (slides). Benchmarks included!

Since the talk, we’ve been working on Panfrost like mad. Literally – I was pretty angry about that distorted texture bug! The culprit proved to be the compiler emitting imov (integer move) instructions rather than fmov (floating-point move), apparently causing precision degradation while processing texture coordinates.

In any event, implementing a few new features culminated in the above jellyfish demo. This jellyfish scene is a part of glmark2-es2, a set of OpenGL ES programs used to test and benchmark GPU drivers. For Panfrost, I use glmark as a set of milestones to work towards while developing the driver. glmark is comprehensive and can do just about anything I need while developing graphics drivers, except for making me a peanut butter and jellyfish sandwich. Er, never mind, it looks like they just added an OpenGL sandwich scene. Bonus!

Aside from the texturing issue, the next major challenge faced by the jellyfish scene was “uniform spilling”. Many GPUs require a special “load uniform” instruction in the shader to access a uniform; Midgard does not… usually. Instead, up to 16 (32-bit vec4) uniforms can be preloaded into registers, which makes them free to access in shaders. Yay performance!

The problem with this uniform loading scheme is when a shader needs more than 16 uniforms, which is frequent in vertex shaders loading large matrices. A single 4x4 matrix eats up 4 uniforms; 4 such matrices and there are no uniforms left for driver-internal state. Oops.

The solution is surprisingly simple: rather than using this register-based fast path, use the traditional “load uniform” instructions which have no practical limit on the number of uniforms, at the expense of a nominal performance hit. Specifically, the driver apparently switches from using uniforms to internally generating their crazy cousin, uniform buffer objects. This technique correctly handles these complex shaders.

However, Panfrost goes one step further than the blob appears to! Since uniforms and uniform buffer objects can be used simultaneously, we can use the same memory block for both kinds of uniforms – two for the price of one! Mapping the uniform memory like this, the compiler is free to use the register fast path for the first 16 uniforms and seamlessly fall back on the general slow path for the rest, shaving sixteen “load uniform” instructions off a shader that otherwise has to “spill” to dedicated instructions. A win!

Another major change, prompted by a bug found while bringing up jellyfish, surrounded register allocation. Previously, Panfrost used an ersatz, home-rolled register allocator, good enough for spinning gears but a little clunky. Debugging jellyfish uncovered a buggy corner case; the naive solution unfortunately increased register pressure, a performance issue. But there’s no need to settle for naive!

“Bayes”ing the work on Mesa’s classy register allocation library, I scrapped the old code and wrote a more robust, sophisticated register allocator. The new allocator, using graph coloring rather than ad hoc liveness checks, already generates better code with lower register pressure, neatly resolving the issue found in jellyfish. Plus, it will permit further register-related progress when we tackle register spilling and parallelism in stores and texture instructions. The new code is considerably longer, but it will pay off as shader complexity creeps up.

A few miscellaneous bugs later, et voilà, jellyfish running GPU-accelerated using only free software on a Rockchip RK3399 laptop.

Refract

It’s a few hops away from the jellyfish, yet I carrot believe Panfrost is running glmark’s refract demo!

Like jellyfish, uniform spilling is needed for refract. We have been able to render the bunny itself for quite a while, as demoed at the X.Org Developer’s Conference, linked above. Where’s the issue?

Framebuffer objects.

Remember how I mentioned back in April that framebuffer objects were one of the few major features missing in Panfrost?

We can strike that off the list now!

For those unfamiliar: by default, a 3D application renders to the screen. When the application instead renders off-screen, to be later read as a texture, it’s said to render to a “framebuffer object” in graphics lingo. Since Mali is a “render-only” GPU, independent of the display, we already have the freedom to render to arbitrary buffers. Theoretically, rather than telling the GPU to render on-screen, we simply tell it to render literally anywhere else but the screen. Pretty simple!
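For readers curious about the API side, the application sets one of these up with standard OpenGL ES 2.0 calls along these lines (nothing Panfrost-specific; sizes are arbitrary and a current GL context is assumed):

#include <GLES2/gl2.h>

/* Create an off-screen render target: a color texture plus a depth
 * renderbuffer attached to a framebuffer object. */
GLuint create_fbo(int width, int height, GLuint *out_color_tex)
{
    GLuint fbo, color_tex, depth_rb;

    /* Color attachment: a texture we can later sample from. */
    glGenTextures(1, &color_tex);
    glBindTexture(GL_TEXTURE_2D, color_tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, NULL);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

    /* Depth attachment: a renderbuffer. */
    glGenRenderbuffers(1, &depth_rb);
    glBindRenderbuffer(GL_RENDERBUFFER, depth_rb);
    glRenderbufferStorage(GL_RENDERBUFFER, GL_DEPTH_COMPONENT16, width, height);

    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, color_tex, 0);
    glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
                              GL_RENDERBUFFER, depth_rb);

    if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE)
        return 0;   /* the driver rejected this combination */

    /* Rendering now goes off-screen until framebuffer 0 is bound again. */
    *out_color_tex = color_tex;
    return fbo;
}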

Of course, it’s never that simple. To reduce power consumption by saving memory bandwidth, Mali GPUs feature “ARM Framebuffer Compression”, or AFBC for short. As advertised, AFBC is a lossless compression scheme for compressing the framebuffer in-flight from one part of the chip to another.

Only across parts of the chip? Always interested in reducing power, it turns out the blob is aggressive about enabling AFBC anywhere it can. Previously, we could ignore AFBC entirely, since the X11 blob is apparently unable to use AFBC for display. However, since framebuffer objects are off-screen and stored internal to the driver, it doesn’t matter to the rest of the system what format is used. Accordingly, we have never seen a framebuffer object that doesn’t use AFBC, so I had to implement AFBC support in the driver.

Furthermore, framebuffer objects mean juggling multiple framebuffers at once. Together with AFBC support, this additional complexity required a considerable refactor of the command stream driver code to support the extra bookkeeping. Many coding afternoons and a transatlantic flight later, Panfrost implemented preliminary (color) framebuffer objects.

However, to add to the joy, refract uses not only a color attachment to its framebuffer object, but also a depth attachment. Conceptually, the two modes of operation are identical; in practice, they require a different encoding in the command stream and an additional increase in bookkeeping, leading to yet another (small) refactor. Plenty of keyboard mashing and brain freezing later, Panfrost supported depth attachments to framebuffer objects as well as color. And poof, a refraction bunny!


Now, for the piece de resistance, rather than a glmark demo, we have the reference Wayland compositor, implementing a simple GPU-accelerated desktop. With a small workaround added to the driver, Panfrost is able to run …

Weston!

October 19, 2018

This week, the GNOME Foundation Board of Directors met at the Collabora office in Cambridge, UK, for the second annual Foundation Hackfest. We were also joined by the Executive Director, Neil McGovern, and Director of Operations, Rosanna Yuen. This event was started by last year’s board and is a great opportunity for the newly-elected board to set out goals for the coming year and get some uninterrupted hacking done on policies, documents, etc. While it’s fresh in our mind, we wanted to tell you about some of the things we have been working on this week and what the community can hope to see in the coming months.

Wednesday: Goals

On Wednesday we set out to define the overall goals of the Foundation, so we could focus our activities for the coming years, ensuring that we were working on the right priorities. Neil helped to facilitate the discussion using the Charting Impact process. With that input, we went back to the purpose of the Foundation and mapped that to ten and five year goals, making sure that our current strategies and activities would be consistent with reaching those end points. This is turning out to be a very detailed and time-consuming process. We have made a great start, and hope to have something we can share for comments and input soon. The high level 10-year goals we identified boiled down to:

  • Sustainable project and foundation
  • Wider awareness and mindshare – being a thought leader
  • Increased user base

As we looked at the charter and bylaws, we identified a long-standing issue which we need to solve — there is currently no formal process to cover the “scope” of the Foundation in terms of which software we support with our resources. There is the release team, but that is only a subset of the software we support. We have some examples such as GIMP which “have always been here”, but at present there is no clear process to apply or be included in the Foundation. We need a clear list of projects that use resources such as CI, or have the right to use the GNOME trademark for the project. We have a couple of similar proposals from Allan Day and Carlos Soriano for how we could define and approve projects, and we are planning to work with them over the next couple of weeks to make one proposal for the board to review.

Thursday: Budget forecast

We started the second day with a review of the proposed forecast from Neil and Rosanna, because the Foundation’s financial year starts in October. We have policies in place to allow staff and committees to spend money against their budget without further approval being needed, which means that with no approved budget, it’s very hard for the Foundation to spend any money. The proposed budget was based on the previous year’s actual figures, with changes to reflect the increased staff headcount, increased spend on CI, increased staff travel costs, etc., and ensures that after the year’s spending we follow the reserves policy to keep enough cash to pay the Foundation staff for a further year. We’re planning to go back and adjust a few things (internships, marketing, travel, etc.) to make sure that we have the right resources for the goals we identified.

We had some “hacking time” in smaller groups to re-visit and clarify various policies, such as the conference and hackfest proposal/approval process, travel sponsorship process and look at ways to support internationalization (particularly to indigenous languages).

Friday: Foundation Planning

The Board started Friday with a board-only (no staff) meeting to make sure we were aligned on the goals that we were setting for the Executive Director during the coming year, informed by the Foundation goals we worked on earlier in the week. To avoid the “seven bosses” problem, there is one board member (myself) responsible for managing the ED’s priorities and performance. It’s important that I take advantage of the opportunity of the face to face meeting to check in with the Board about their feedback for the ED and things I should work together with Neil on over the coming months.

We also discussed a related topic, which is the length of the term that directors serve on the Foundation Board. With 7 staff members, the Foundation needs consistent goals and management from one year to the next, and the time demands on board members should be reduced from previous periods where the Foundation hasn’t had an Executive Director. We want to make sure that our “ten year goals” don’t change every year and undermine the strategies that we put in place and spend the Foundation resources on. We’re planning to change the Board election process so that each director has a two year term, so half of the board will be re-elected each year. This also prevents the situation where the majority of the Board is changed at the same election, losing continuity and institutional knowledge, and taking months for people to get back up to speed.

We finished the day with a formal board meeting to approve the budget, plus more hack time on various policies (and this blog!). Thanks to Collabora for the use of their office space, food, and snacks, and thanks to my fellow Board members and the Foundation’s wonderful and growing staff team.