planet.freedesktop.org
March 06, 2015

libinput has supported edge scrolling since version 0.7.0. Whoops, how does the post title go with this statement? Well, libinput supports edge scrolling, but only on some devices, and chances are your touchpad won't be one of them. Bug 89381 is the reference bug here.

First, what is edge scrolling? As the libinput documentation illustrates, it is scrolling triggered by finger movement within specific regions of the touchpad - the right and bottom edges for vertical and horizontal scrolling, respectively. This is in contrast to two-finger scrolling, which is triggered by a two-finger movement anywhere on the touchpad. synaptics has had edge scrolling since at least 2002, the earliest commit in the repo. Back then we didn't have multitouch-capable touchpads; these days they're the default and you'd struggle to find one that doesn't support at least two fingers. But back then edge scrolling was the default, and touchpads even had the markings for those scroll edges painted on.

libinput adds a whole bunch of features to the touchpad driver, but those features make it hard to support edge scrolling. First, libinput has quite smart software button support. Those buttons are usually on the lowest ~10mm of the touchpad. Depending on finger movement and position libinput will send a right button click, movement will be ignored, etc. You can leave one finger in the button area while using another finger on the touchpad to move the pointer. You can press both left and right areas for a middle click. And so on. On many touchpads the vertical travel/physical resistance is enough to trigger a movement every time you click the button, just by your finger's logical center moving.

libinput also has multi-direction scroll support. Traditionally we only sent one scroll event for vertical/horizontal at a time, even going as far as locking the scroll direction. libinput changes this and only requires an initial threshold to start scrolling; after that the caller will get both horizontal and vertical scroll information. The reason is simple: it's context-dependent when horizontal scrolling should be used, so a global toggle to disable it doesn't make sense. And libinput's scroll coordinates are much more fine-grained too, which is particularly useful for natural scrolling where you'd expect the content to move with your fingers.

Finally, libinput has smart palm detection. The large majority of palm touches are along the left and right edges of the touchpad and they're usually indistinguishable from finger presses (same pressure values for example). Without palm detection some laptops are unusable (e.g. the T440 series).

These features interfere heavily with edge scrolling. Software button areas are in the same region as the horizontal scroll area, and palm presses are in the same region as the vertical edge scroll area. The lower vertical edge scroll zone overlaps with software buttons - and that's where you would put your finger if you wanted to quickly scroll up in a document (or down, for natural scrolling). To support edge scrolling on those touchpads, we'd need heuristics and timeouts to guess when something is a palm, a software button click, a scroll movement, the start of a scroll movement, etc. The heuristics are unreliable, and the timeouts reduce responsiveness in the UI. So our decision was to only provide edge scrolling on touchpads where it is required, i.e. those that cannot support two-finger scrolling, those with physical buttons. All other touchpads provide only two-finger scrolling. And we are focusing on making two-finger scrolling good enough that you don't need/want to use edge scrolling (please file bugs for anything broken).

Now, before you get too agitated: if edge scrolling is that important to you, invest the time you would otherwise spend sharpening pitchforks, lighting torches and painting picket signs into developing a model that allows us to do reliable edge scrolling in light of all the above, without breaking software buttons and while maintaining palm detection. We'd be happy to consider it.

This feature got merged for libinput 0.8 but I noticed I hadn't blogged about it. So belatedly, here is a short description of scroll sources in libinput.

Scrolling is a fairly simple concept. You move the mouse wheel and the content moves down. Beyond that the details get quite nitty, possibly even gritty. On touchpads, scrolling is emulated through a custom finger movement (e.g. two-finger scrolling). A mouse wheel moves in discrete steps of (usually) 15 degrees; a touchpad's finger movement is continuous (within the device's physical resolution). Another scroll method is implemented for the pointing stick: holding the middle button down while moving the stick will generate scroll events. Like touchpad scroll events, these events are continuous. I'll ignore natural scrolling in this post because it just inverts the scroll direction. Kinetic scrolling ("fling scrolling") is a comparatively recent feature: when you lift the finger, the final finger speed determines how long the software will keep emulating scroll events. In synaptics, this is done in the driver and causes all sorts of issues - the driver may keep sending scroll events even while you start typing.

In libinput, there is no kinetic scrolling at all; what we have instead are scroll sources. Currently three sources are defined: wheel, finger and continuous. Wheel is obvious: it provides the physical value in degrees (see this post) and in discrete steps. The "finger" source is more interesting: it is the hint provided by libinput that the scroll event is caused by a finger movement on the device. This means that a) there are no discrete steps and b) libinput guarantees a terminating scroll event when the finger is lifted off the device. This enables the caller to implement kinetic scrolling: simply wait for the terminating event and then calculate the most recent speed. More importantly, because the kinetic scrolling implementation is pushed to the caller (who will push it to the client when the Wayland protocol for this is ready), kinetic scrolling can be implemented on a per-widget basis.

Finally, the third source is "continuous". The only big difference to "finger" is that we can't guarantee that the terminating event is sent, simply because we don't know if it will happen. It depends on the implementation. For the caller this means: if you see a terminating scroll event you can use it as kinetic scroll information, otherwise just treat it normally.

For both the finger and the continuous sources the scroll distance provided by libinput is equivalent to "pixels", i.e. the value that the relative motion of the device would otherwise send. This means the caller can interpret this depending on current context too. Long-term, this should make scrolling a much more precise and pleasant experience than the old X approach of "You've scrolled down by one click".

The API documentation for all this is here: http://wayland.freedesktop.org/libinput/doc/latest/group__event__pointer.html, search for anything with "pointer_axis" in it.

February 28, 2015

Work continues on Typhon. I've recently yearned for a way to study the Monte-level call stacks for profiling feedback. After a bit of work, I think that I've built some things that will help me.

My initial desire was to get the venerable perf to work with Typhon. perf's output is easy to understand, with a little practice, and describes performance problems pretty well.

I'm going to combine this with Brendan Gregg's very cool flame graph system for visually summarizing call stacks, in order to show off how the profiling information is being interpreted. I like flame graphs and they were definitely a goal of this effort.

Maybe perf doesn't need any help. I was able to use it to isolate some Typhon problems last week. I'll use my Monte webserver for all of these tests, putting it under some stress and then looking at the traces and flame graphs.

Now seems like a good time to mention that my dorky little webserver is not production-ready; it is literally just good enough to respond to Firefox, siege, and httperf with a 200 OK and a couple bytes of greeting. This is definitely a microbenchmark.

With that said, let's look at what perf and flame graphs say about webserver performance:

An unhelpful HTTP server profile

You can zoom in on this by clicking. Not that it'll help much. This flame graph has two big problems:

  1. Most of the time is spent in the mysterious "[unknown]" frames. I bet that those are just caused by the JIT's code generation, but perf doesn't know that they're meaningful or how to label them.
  2. The combination of JIT and builtin objects with builtin methods results in totally misleading call stacks, because most object calls don't result in new methods being added to the stack.

I decided to tackle the first problem first, because it seemed easier. Digging a bit, I found a way to generate information on JIT-created code objects and get that information to perf via a temporary file.

The technique is only documented via tribal knowledge and arcane blog entries. (I suppose that, in this regard, I am not helping.) It is described both in this kernel patch implementing the feature, and also in this V8 patch. The Typhon JIT hooks show off my implementation of it.
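For readers who have not met the technique: as far as is publicly documented, perf resolves JIT-generated frames by reading a plain-text map file named /tmp/perf-<pid>.map, with one line per code object holding the start address and size in bare hex followed by the symbol name. The sketch below only illustrates that line format and is not the Typhon hook itself; perf_map_line and the example symbol name are hypothetical.

```shell
#!/bin/sh
# Illustrative only: format one entry of a perf JIT map file.
# perf_map_line is a hypothetical helper, not part of perf or Typhon.
# Arguments: start address, code size (decimal or 0x-prefixed), symbol name.
perf_map_line() {
    # perf expects "START SIZE NAME", with START and SIZE in hex without 0x.
    printf '%x %x %s\n' "$1" "$2" "$3"
}

# Example: a made-up JIT'd loop at 0x7f3a1c000000 that is 0x140 bytes long.
perf_map_line 0x7f3a1c000000 0x140 'jit:loop_42'
# A real JIT would append such lines to /tmp/perf-<pid>.map as it emits code.
```

Because the file is keyed by process ID, the JIT has to write it from inside the profiled process while perf is recording.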

So, does it work? What does it look like?

I didn't upload a picture of this run, because it doesn't look different from the earlier graph! The big [unknown] frames aren't improved at all. Sufficient digging will reveal the specific newly-annotated frames being nearly never called. Clearly this was not a winning approach.

At this point, I decided to completely change my tack. I wrote a completely new call stack tracer inside Typhon. I wanted to do a sampling profiler, but sampling is hard in RPython. The vmprof project might fix that someday. For now, I'll have to do a precise profiler.

Unlabeled HTTP server profile with correct atoms

I omitted the coding montage. Every time a call is made from within SmallCaps, the profiler takes a time measurement before and after the call. This is pretty great! But can we get more useful names?
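To get from per-call timings to a flame graph, the measurements have to end up in the "folded" text format that flamegraph.pl reads: one line per stack, frames separated by semicolons, then a space and a number. The following is a guess at how that aggregation step could look, not Typhon's actual code; fold_stacks and the sample frame names are hypothetical, and it assumes each duration is the self time of the innermost frame and that frame names contain no spaces.

```shell
#!/bin/sh
# Sum the measured time per distinct call stack and emit folded-stack lines.
# Input:  "frame1;frame2;... <microseconds>", one line per measured call.
# Output: one line per distinct stack with the summed time, sorted by stack.
fold_stacks() {
    awk '{ total[$1] += $2 } END { for (s in total) print s, total[s] }' | sort
}

# Made-up measurements for three calls, two of them on the same stack.
printf '%s\n' \
    'main;serve;parseRequest 120' \
    'main;serve;writeResponse 80' \
    'main;serve;parseRequest 30' | fold_stacks
```

The output can then be piped into flamegraph.pl, which draws each parent frame as wide as its own time plus that of its children.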

Names in Monte are different from names in, say, Python or Java. Python and Java both have class names. Since Monte does not have classes, Monte doesn't have a class name. A compromise which we accept here is to use the "display name" of the class, which will be the pattern used to bind a user-level object literal, and will be the name of the class for all of the runtime object classes. This is acceptable.

HTTP server profile with correct atoms and useful display names

Note how the graphs are differently shaped; all of the frames are being split out properly and the graph is more detailed as a result. The JIT is still active during this entire venture, and it'd be cool to see what the JIT is doing. We can use RPython's rpython.rlib.jit.we_are_jitted() function to mark methods as being JIT'd, and we can ask the flame graph generator to colorize them.

HTTP server profile with JIT COLORS HOLY FUCK I CANNOT BELIEVE IT WORKS

Oh man! This is looking pretty cool. Let's colorize the frames that are able to sit directly below JIT entry points. I do this with a heuristic (regular expression).

THE COLORS NEVER STOP, CAN'T STOP, WON'T STOP

This isn't even close to the kind of precision and detail from the amazing Java-on-Illumos profiles on Gregg's site, but it's more than enough to help my profiling efforts.

February 26, 2015
I recently purchased a 64GB mini SD card to slot in to my laptop and/or tablet, keeping media separate from my home directory, which is pretty full of kernel sources.

This Samsung card looked fast enough, and at 25€ including shipping, seemed good enough value.


Hmm, no mention of the SD card size?

The packaging looked rather bare, with no mention of the card's size. I opened up the packaging, and looked over the card.

Made in Taiwan?

What made it weirder is that it says "made in Taiwan", rather than "Made in Korea" or "Made in China/PRC". Samsung apparently makes some cards in Taiwan, I've learnt, but I didn't know that before getting suspicious.

After modifying gnome-multiwriter's fake flash checker, I tested the card, and sure enough, it's an 8GB card, with its firmware modified to show up as 67GB (67GB!). The device (identified through the serial number) is apparently well-known in swindler realms.

Buyer beware, do not buy from "carte sd" on Amazon.fr, and always check for fake flash memory using F3 or h2testw, until udisks gets support for this.

Amazon were prompt in reimbursing me, but the Comité national anti-contrefaçon and Samsung were completely uninterested in pursuing this further.

In short:

  • Test the storage hardware you receive
  • Don't buy hardware from Damien Racaud from Chaumont, the person behind the "carte sd" seller account

February 23, 2015

Some years ago I bought myself a new laptop, deleted the Windows partition and installed Fedora on the system, only to realize later that the system had a bug that required a BIOS update to fix, and that the only tool for doing such updates was available for Windows only. And while some tools and methods have been available from a subset of vendors, BIOS updates on Linux have always been something of a hit-and-miss situation. Well, luckily it seems that we will finally get a proper solution to this problem.

Peter Jones, who is Red Hat’s representative to the UEFI working group and who is working on making sure we have everything needed to support this on Linux, approached me some time ago to let me know of the latest incoming update to the UEFI standard, which provides a mechanism for doing BIOS updates. This means that any system that supports UEFI 2.5 will in theory be one where we can initiate the BIOS update from Linux. Systems supporting this version of the UEFI spec are expected to become available through the course of this year, and if you are lucky your hardware vendor might even provide a BIOS update bringing UEFI 2.5 support to your existing hardware, although you would of course need to do that one BIOS update the old way.

So with Peter’s help we got hold of some prototype hardware from our friends at Intel which already has UEFI 2.5 support. This hardware is currently in the hands of Richard Hughes. Richard will be working on incorporating the use of this functionality into GNOME Software, so that you can do any needed BIOS updates through GNOME Software along with all your other software update needs.

Peter and Richard will, as part of this, be working to define a specification/guideline for hardware vendors on how they can make their BIOS updates available in a manner we can consume and automatically check for updates. We will try to align ourselves with the requirements from Microsoft in this area, to allow the vendors to either use the exact same package for both Windows and Linux or at least only need small changes to it. We can hopefully get this specification up on freedesktop.org for wider consumption once it's done.

I am also already speaking with a couple of hardware vendors to see if we can pilot this functionality with them, both to encourage them to support UEFI 2.5 as quickly as possible and to work with them on the finer details of how to make the updates available in an easily consumable fashion.

Our hope here is that you eventually can get almost any hardware and know that if you ever need a BIOS update you can just fire up Software and it will tell you what if any BIOS updates are available for your hardware, and then let you download and install them. For people running Fedora servers we have had some initial discussions about doing BIOS updates through Cockpit, in addition of course to the command line tools that Peter is writing for this.

I mentioned in an earlier blog post that one of our goals with the Fedora Workstation is to drain the swamp in terms of fixing the real issues that make using a Linux desktop challenging. Well, this is another piece of that puzzle, and I am really glad we had Peter working with the UEFI standards group to ensure the final specification was also useful for Linux users.

Anyway, as soon as I have some data on concrete hardware that will support this, I will make sure to let you know.

February 22, 2015
So, I finally figured out the bug that was causing some incorrect rendering in xonotic (which, it turns out, is the same bug plaguing a lot of other games/webgl/etc). The fix is pushed to upstream mesa master (and I guess I should probably push it to the 10.5 stable branch too). Now that xonotic renders correctly, I think I can finally call freedreno a4xx support usable:



Also, for fun, a little comparison between the ifc6540 board (snapdragon 805, aka apq8084) and my laptop (i5-4310U). Both have 1920x1080 resolution, both running gnome-shell and firefox (with identical settings). The laptop is Fedora 21 while the ifc6540 is rawhide, but it is quite close to an apples-to-apples comparison:



Obviously not a rigorous benchmark, so please don't read too much into the results. The intel is still faster overall (as it should be at its size/price/power budget), but it is amazing that the gap is becoming so small between something that can be squeezed into a cell phone and dedicated laptop class chips.

February 21, 2015

Today, I will describe a new way to reverse engineer PCI drivers by creating a PCI passthrough with a QEMU virtual machine. In this article, I will show you how to use the Intel VT-d technology in order to trace memory mapped input/output (MMIO) accesses of a QEMU VM. As I am a member of the Nouveau community, this howto focuses only on NVIDIA’s proprietary driver, but the procedure should be pretty similar for all PCI drivers.

Introduction

Reverse engineering NVIDIA’s proprietary driver is not an easy task, especially on Windows, because there we have neither mmiotrace, a toolbox for tracing memory mapped I/O accesses within the Linux kernel, nor valgrind-mmt, which allows tracing application accesses to mmapped memory.

When I started to reverse engineer NVIDIA Perfkit on Windows (for graphics performance counters) in between the Google Summer of Code 2013 and 2014, I wrote some tools for dumping the configuration of these performance counters, but it was very painful to find multiplexers because I couldn’t really trace MMIO accesses. I would have liked to use Intel VT-d, but my old computer didn’t support that recent technology. Recently I got a new computer and my life has changed. ;)

But what is VT-d and how do you use it with QEMU?

An input/output memory management unit (IOMMU) allows guest virtual machines to directly use peripheral devices, such as Ethernet, accelerated graphics cards, through DMA and interrupt remapping. This is called VT-d at Intel and AMD-Vi at AMD.

QEMU can use that technology through the VFIO driver, which is an IOMMU/device agnostic framework for exposing direct device access to userspace in a secure, IOMMU protected environment. In other words, this allows safe, non-privileged, userspace drivers. Initially developed by Cisco, VFIO is now maintained by Alex Williamson at Red Hat.

In this howto, I will use Fedora as the guest OS, but it should work with both Linux and Windows guests. Let’s get started.

Tested hardware

Motherboard: ASUS B85 PRO GAMER

CPU: Intel Core i5-4460 3.20GHz

GPU: NVIDIA GeForce 210 (host) and NVIDIA GeForce 9500 GT (guest)

OS: Arch Linux (host) and Fedora 21 (guest)

Prerequisites

Your CPU needs to support both virtualization and IOMMU (Intel VT-d technology, Core i5 at least). You will also need two NVIDIA GPUs and two monitors, or one monitor with two different inputs (one plugged into your host GPU, one into your guest GPU). I would also recommend having a separate keyboard and mouse for the guest OS.

Step 1: Hardware setup

Check if your CPU supports virtualization.

grep -E -i '^flags.*(svm|vmx)' /proc/cpuinfo

If so, enable CPU virtualization support and Intel VT-d from the BIOS.

Step 2: Kernel config

1) Modify kernel config
Device Drivers --->
    [*] IOMMU Hardware Support  --->
        [*]   Support for Intel IOMMU using DMA Remapping Devices
        [*]   Support for Interrupt Remapping
Device Drivers --->
    [*] VFIO Non-Privileged userspace driver framework  --->
        [*]   VFIO PCI support for VGA devices
Bus options (PCI etc.) --->
    [*] PCI Stub driver
2) Build kernel
3) Reboot, and check if your system has support for both IOMMU and DMA remapping
dmesg | grep -e IOMMU -e DMAR
[    0.000000] ACPI: DMAR 0x00000000BD9373C0 000080 (v01 INTEL  HSW      00000001 INTL 00000001)
[    0.019360] dmar: IOMMU 0: reg_base_addr fed90000 ver 1:0 cap d2008c20660462 ecap f010da
[    0.019362] IOAPIC id 8 under DRHD base  0xfed90000 IOMMU 0
[    0.292166] DMAR: No ATSR found
[    0.292235] IOMMU: dmar0 using Queued invalidation
[    0.292237] IOMMU: Setting RMRR:
[    0.292246] IOMMU: Setting identity map for device 0000:00:14.0 [0xbd8a6000 - 0xbd8b2fff]
[    0.292269] IOMMU: Setting identity map for device 0000:00:1a.0 [0xbd8a6000 - 0xbd8b2fff]
[    0.292288] IOMMU: Setting identity map for device 0000:00:1d.0 [0xbd8a6000 - 0xbd8b2fff]
[    0.292301] IOMMU: Prepare 0-16MiB unity mapping for LPC
[    0.292307] IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]

!!! If you have no output, you have to fix this before continuing !!!

Step 3: Build QEMU

git clone git://git.qemu-project.org/qemu.git --depth 1
cd qemu
./configure --python=/usr/bin/python2 # Python 3 is not yet supported
make && make install

You can also install QEMU from your favorite package manager, but I would recommend getting the source code if you want to enable VFIO tracing support.

Step 4: Unbind the GPU with pci-stub

According to my hardware config, I have two NVIDIA GPUs, so blacklisting the Nouveau kernel module is not so good. Instead, I will use pci-stub in order to unbind the GPU which will be assigned to the guest OS.

NOTE: If pci-stub was built as a module, you’ll need to modify /etc/mkinitcpio.conf, add pci-stub in the MODULES section, and update your initramfs.

lspci
01:00.0 VGA compatible controller: NVIDIA Corporation GT218 [GeForce 210] (rev a2)
05:00.0 VGA compatible controller: NVIDIA Corporation G96 [GeForce 9500 GT] (rev a1)
lspci -n
01:00.0 0300: 10de:0a65 (rev a2) # GT218
05:00.0 0300: 10de:0640 (rev a1) # G96

Now add the following kernel parameter to your bootloader.

pci-stub.ids=10de:0640
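If you would rather not copy the vendor:device pair by hand, it can be pulled out of the lspci -n output. This is a small sketch, assuming the three-column layout shown above (slot, class, ID); stub_ids_for is a hypothetical helper name.

```shell
#!/bin/sh
# Print the pci-stub.ids= kernel parameter for one PCI slot.
# Reads `lspci -n`-style lines on stdin: "<slot> <class>: <vendor:device> ..."
# stub_ids_for is a hypothetical helper; $1 is the slot, e.g. 05:00.0.
stub_ids_for() {
    awk -v slot="$1" '$1 == slot { print "pci-stub.ids=" $3 }'
}

# Real use would be: lspci -n | stub_ids_for 05:00.0
# Here it is fed with the two sample lines from this article instead.
printf '%s\n' \
    '01:00.0 0300: 10de:0a65 (rev a2)' \
    '05:00.0 0300: 10de:0640 (rev a1)' | stub_ids_for 05:00.0
```

If the slot is not present in the input, nothing is printed, which is easy to check for before touching the bootloader configuration.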

Reboot, and check.

dmesg | grep pci-stub
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-nouveau root=UUID=5f64607c-5c72-4f65-9960-d5c7a981059e rw quiet pci-stub.ids=10de:0640
[    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-nouveau root=UUID=5f64607c-5c72-4f65-9960-d5c7a981059e rw quiet pci-stub.ids=10de:0640
[    0.295763] pci-stub: add 10DE:0640 sub=FFFFFFFF:FFFFFFFF cls=00000000/00000000
[    0.295768] pci-stub 0000:05:00.0: claimed by stub

Step 5: Bind the GPU with VFIO

Now, it’s time to bind the GPU (the G96 card in this example) with VFIO in order to pass it through to the VM. You can use this script to make life easier:

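#!/bin/bash

modprobe vfio-pci

# Rebind each PCI device given on the command line to vfio-pci.
for dev in "$@"; do
        vendor=$(cat "/sys/bus/pci/devices/$dev/vendor")
        device=$(cat "/sys/bus/pci/devices/$dev/device")
        # Detach the device from its current driver, if any.
        if [ -e "/sys/bus/pci/devices/$dev/driver" ]; then
                echo "$dev" > "/sys/bus/pci/devices/$dev/driver/unbind"
        fi
        # Tell vfio-pci to claim devices with this vendor/device ID.
        echo "$vendor $device" > /sys/bus/pci/drivers/vfio-pci/new_id
done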

Bind the GPU:

./vfio-bind.sh 0000:05:00.0 # G96
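To confirm that the bind took effect, you can look at the driver symlink that sysfs keeps for each PCI device. A minimal sketch: bound_driver is a hypothetical helper, and its optional second argument (an alternative sysfs root) exists only so the function can be tried against a fake directory tree.

```shell
#!/bin/sh
# Print the name of the driver a PCI device is bound to, or "none".
# $1: PCI address such as 0000:05:00.0
# $2: sysfs root (defaults to /sys; overridable for experiments only)
bound_driver() {
    link="${2:-/sys}/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        # The symlink points at the driver directory; its last component
        # is the driver name.
        basename "$(readlink "$link")"
    else
        echo none
    fi
}

bound_driver 0000:05:00.0   # should print vfio-pci after a successful bind
```

On a machine without a device at that address this simply prints "none".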

Step 6: Testing KVM VGA-Passthrough

Let’s test if it works, as root:

qemu-system-x86_64 \
    -enable-kvm \
    -M q35 \
    -m 2G \
    -cpu host,kvm=off \
    -vga none \
    -device vfio-pci,host=05:00.0,multifunction=on,x-vga=on

If it works fine, you should see a black QEMU window with the message “Guest has not initialized the display (yet)”. The -vga none option is required, otherwise it won’t work. I’ll show you all the options I use a bit later.

NOTE: kvm=off is required for some recent NVIDIA proprietary drivers because they refuse to load if they detect KVM…

Step 7: Add USB support

At this step, we have assigned the GPU to the virtual machine, but it would be a good idea to be able to use that guest OS with a keyboard, for example. To do this, we need to add USB support to the VM. The preferred way is to pass through an entire USB controller like we already did for the GPU.

lspci | grep USB
00:14.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI (rev 05)
00:1a.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 (rev 05)
00:1d.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 (rev 05)

Add the following line to QEMU, example for 00:14.0:

-device vfio-pci,host=00:14.0,bus=pcie.0

Before trying USB support inside the VM, you need to assign that USB controller to VFIO, but you will lose your keyboard and your mouse from the host in case they are connected to that controller.

./vfio-bind.sh 0000:00:14.0

In order to re-enable the USB support from the host, you will need to unbind the controller, and to bind it to xhci_hcd.

echo 0000:00:14.0 > /sys/bus/pci/drivers/vfio-pci/unbind
echo 0000:00:14.0 > /sys/bus/pci/drivers/xhci_hcd/bind
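The two echo lines above hard-code vfio-pci and xhci_hcd. A slightly more general sketch, which releases the device from whatever driver currently holds it before handing it to another one, could look like this. rebind_pci is a hypothetical helper, and the optional third argument (an alternative sysfs root) is there only so it can be exercised against a fake directory tree.

```shell
#!/bin/sh
# Move a PCI device from its current driver (if any) to another driver.
# $1: PCI address, $2: target driver name, $3: sysfs root (default /sys)
rebind_pci() {
    root="${3:-/sys}"
    current="$root/bus/pci/devices/$1/driver"
    # Release the device from whichever driver holds it right now.
    if [ -e "$current" ]; then
        echo "$1" > "$current/unbind"
    fi
    # Ask the target driver to take over the device.
    echo "$1" > "$root/bus/pci/drivers/$2/bind"
}

# As root, to give the USB controller back to the host:
#   rebind_pci 0000:00:14.0 xhci_hcd
```

The target driver has to be loaded already, otherwise its bind file does not exist.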

If you get an error with USB support, you might simply try a different controller, or try to assign USB devices by ID.

Step 8: Install guest OS

Now, it’s time to install the guest OS. I installed Fedora 21 because it’s just not possible to run Arch Linux inside QEMU due to a bug in syslinux… Whatever, install your favorite Linux OS and go ahead. I would also recommend installing envytools (a collection of tools developed by the members of the Nouveau community) in order to easily test the tracing support.

You can use the script below to launch a VM with VGA and USB passthrough, and all the stuff we need.

#!/bin/bash

modprobe vfio-pci

# Rebind one PCI device to vfio-pci.
vfio_bind()
{
        dev="$1"
        vendor=$(cat "/sys/bus/pci/devices/$dev/vendor")
        device=$(cat "/sys/bus/pci/devices/$dev/device")
        if [ -e "/sys/bus/pci/devices/$dev/driver" ]; then
                echo "$dev" > "/sys/bus/pci/devices/$dev/driver/unbind"
        fi
        echo "$vendor $device" > /sys/bus/pci/drivers/vfio-pci/new_id
}

# Bind devices.
vfio_bind 0000:05:00.0  # GPU (NVIDIA G96)
vfio_bind 0000:00:14.0  # USB controller

qemu-system-x86_64 \
    -enable-kvm \
    -M q35 \
    -m 2G \
    -hda fedora.img \
    -boot d \
    -cpu host,kvm=off \
    -vga none \
    -device vfio-pci,host=05:00.0,multifunction=on,x-vga=on \
    -device vfio-pci,host=00:14.0,bus=pcie.0


# Restore USB controller
echo 0000:00:14.0 > /sys/bus/pci/drivers/vfio-pci/unbind
echo 0000:00:14.0 > /sys/bus/pci/drivers/xhci_hcd/bind

Step 9: Enable VFIO tracing support for QEMU

1) Configure QEMU to enable tracing

Enable the stderr trace backend. Please refer to docs/tracing.txt if you want to change the backend.

./configure --python=/usr/bin/python2 --enable-trace-backends=stderr

2) Disable MMAP support

Disabling MMAP support uses the slower read/write accesses to MMIO space that will get traced. To do this, open the file include/hw/vfio/vfio-common.h, and change #define VFIO_ALLOW_MMAP from 1 to 0.

 /* Extra debugging, trap acceleration paths for more logging */
-#define VFIO_ALLOW_MMAP 1
+#define VFIO_ALLOW_MMAP 0

Re-build QEMU.

3) Add the trace points you want to observe

Create an events.txt file and add the vfio_region_write trace point, which dumps MMIO write accesses of the GPU.

echo "vfio_region_write" > events.txt

VFIO tracing support is now enabled and configured, really easy, huh?

Thanks to Alex Williamson for these hints.

Step 10: Trace MMIO write accesses

Let’s now test VFIO tracing support. Enable event tracing by adding the following line to the script which launches the VM.

-trace events=events.txt

Launch the VM. You should see a lot of traces on the standard error output, which is good news.

Open a terminal in the VM, go to the directory where envytools has been built, and run (as root) the following command.

./nvahammer 0xa404 0xdeadbeef

This command writes a 32-bit value (0xdeadbeef) to the MMIO register at 0xa404 and repeats the write in an infinite loop. It needs to be manually aborted.

Go back to the host, and you should see the following traces if it works fine.

12347@1424299207.289770:vfio_region_write  (0000:05:00.0:region0+0xa404, 0xdeadbeef, 4)
12347@1424299207.289774:vfio_region_write  (0000:05:00.0:region0+0xa404, 0xdeadbeef, 4)
12347@1424299207.289778:vfio_region_write  (0000:05:00.0:region0+0xa404, 0xdeadbeef, 4)

In this example, we have only traced MMIO write accesses, but of course, if you want to trace read accesses, you just have to change vfio_region_write to vfio_region_read.

Congratulations!

In this article I showed you how to trace MMIO accesses using a PCI passthrough with QEMU, Intel VT-d and VFIO. However, all PCI accesses are currently traced, including those of the USB controller. This is not ideal; mmiotrace, by contrast, only dumps accesses for one peripheral. It would also be a good idea to have the same format as mmiotrace in order to use the decoding tools we already have for it in envytools.
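Until such filtering exists in QEMU itself, the stderr output can at least be narrowed down after the fact. The following is only a post-processing sketch, assuming the trace line layout shown in Step 10; filter_trace and launch-vm.sh are hypothetical names, the second sample line is made up for illustration, and none of this reduces the tracing overhead.

```shell
#!/bin/sh
# Keep only the vfio_region_* trace lines that belong to one PCI device.
# $1: PCI address as it appears in the trace, e.g. 0000:05:00.0
# Reads QEMU's stderr trace output on stdin.
filter_trace() {
    # -F: treat the address as a fixed string (the dots are not wildcards).
    grep -F "($1:region"
}

# Real use would be: ./launch-vm.sh 2>&1 | filter_trace 0000:05:00.0
# Here: one GPU line from this article plus one invented USB controller line.
printf '%s\n' \
    '12347@1424299207.289770:vfio_region_write  (0000:05:00.0:region0+0xa404, 0xdeadbeef, 4)' \
    '12347@1424299207.289771:vfio_region_write  (0000:00:14.0:region0+0x20, 0x1, 4)' |
    filter_trace 0000:05:00.0
```

Only the line for 0000:05:00.0 survives the filter; the USB controller line is dropped.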

Future work

– do not trace all PCI accesses (device and subrange address filtering)

– convert VFIO traces to the mmiotrace format

– compare performance when tracing support is enabled or not

Related resources

KVM VGA-Passthrough on ArchLinux

VGA-Passthrough on Debian

VFIO documentation

QEMU VFIO tracing documentation


February 18, 2015

FOSDEM 2015

It was another great FOSDEM this year. Even with its 5,000-10,000 attendees, the formula of being absolutely free, with limited sponsorship, while making only small changes each year, is an absolute winner. There is just no conference which comes even close to FOSDEM.

For those on ICE14 on Friday, the highspeed train from Frankfurt to Brussels south at 14:00, who were so nice to participate in my ad-hoc visitor count: 66. I counted 66 people, but i might have skipped a few as people were sometimes too engrossed in their laptops to hear me over their headphones. On a ~400 seat train, that's a pretty high number, and i never see the same level of geekiness on the Frankfurt to Brussels trains as on the Friday before FOSDEM. If it didn't sound like an organizational nightmare, it might have been a good idea to talk to DB and get a whole carriage reserved especially for FOSDEM goers.

With the Graphics DevRoom we returned to the K building this year, and i absolutely love the cozy 80 people classrooms we have there. With good airflow, freely movable tables, and an easy way to put in my powersockets, i have an easy time as a devroom organizer. While some speakers probably prefer bigger rooms to have a higher number of attendees, there is nothing like the more direct interaction of the rooms in the K building. With us sitting on the top floor, we also only had people who were very actively interested in the topics of the graphics devroom.

Despite the fact that FOSDEM has no equal, and anyone who does anything with open source software in the European Union should attend, i always have a very difficult time recruiting a full set of speakers for the graphics DevRoom. Perhaps the biggest reason for this is the fact that it is a free conference, and it lacks the elitist status of a paid-for conference. Everyone can attend your talk, even people who do not work on the kernel or on graphics drivers, and the potential speaker might feel as if he needs to waste his time on people who are below his own perceived station. Another reason may be that it is harder to convince the beancounters to sponsor a visit to a free conference. In that case, if you live in the European Union and you are paid to do open source software, you should be able to afford going to the must-visit FOSDEM by yourself.

As for next year, i am not sure whether there will be a graphics devroom again. Speaker count really was too low and perhaps it is time for another hiatus. Perhaps it is once again time to show people that talking in a devroom at FOSDEM truly is a privilege, and not to be taken for granted.

Tamil "Driver" talk.

My talk this year, or rather, my incoherent mumble finished off with a demo, was about showing my work on the ARM Mali Midgard GPUs. For those who had to endure it, my apologies for my ill-preparedness; i poured all my efforts into the demo (which was finished on Wednesday), and spent too much time doing devroom stuff (which ate Thursday) and of course in drinking up, ahem, the event that is FOSDEM (which ate the rest of the weekend). I will try to make up for it now in this blog post.

Current Tamil Status.

As some might remember, in September and October 2013, i did some preliminary work on the Mali T-series. I spent about 3 to 3.5 weeks building the infrastructure to capture the command stream and replay it. At the same time I also dug deep into the Mali binary driver to expose the binary shader compiler. These two feats gave me all the prerequisites for doing the job of reverse engineering the command stream of the Mali Midgard.

During the Christmas holidays of 2014 and in January 2015, i spent my time building a command stream parser. This is a huge improvement over my work on lima, where i only ended up doing so later on in the process (while bringing up Q3A). As i built up the capabilities of this parser, i threw ever more complex GLES2 programs at it. A week before FOSDEM, my parser was correctly handling multiple draws and frames, uniforms, attributes, varyings and textures. Instead of having raw blobs of memory, i had C structs and tables, allowing me to easily see the differences between streams.

I then took the parsed result of my most complex test and slowly turned that into actual C code, using the shader binary produced by the binary compiler, and adding a trivial sequential memory allocator. I then added rotation into the mix, and this is the demo as seen at FOSDEM (and now uploaded to youtube).
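
For readers unfamiliar with the term, a sequential (or "bump") allocator simply hands out consecutive offsets from one region and never frees anything. The following is only an illustrative sketch in Python, not the allocator used in tamil; the class name, the 64-byte alignment and the region size are all made up for the example.

```python
class SequentialAllocator(object):
    """Hands out consecutive, aligned offsets from one fixed-size region."""

    def __init__(self, size, alignment=64):
        self.size = size            # total bytes available (hypothetical)
        self.alignment = alignment  # hypothetical alignment requirement
        self.offset = 0             # next free byte

    def alloc(self, nbytes):
        # Round the current offset up to the next aligned boundary.
        start = ((self.offset + self.alignment - 1)
                 // self.alignment * self.alignment)
        if start + nbytes > self.size:
            raise MemoryError("region exhausted")
        self.offset = start + nbytes
        return start


allocator = SequentialAllocator(size=4096)
first = allocator.alloc(100)
second = allocator.alloc(10)
print(first, second)
```

Nothing is ever returned to the allocator, which is what makes it trivial: for a fixed demo, the whole region can be thrown away at once.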

All the big things are known.

For textures, i only have simple texture support at this time: no mipmapping or cubemapping yet, and only RGB565 and RGBA32 are supported. I also still have not figured out how to disable swizzling; instead i re-use the texture swizzling code from lima, the only place where I was able to re-use code in tamil. This missing knowledge is just some busywork, and a bit of coding away.

As for programs, while both the Mali Utgard (M-series) and Midgard (T-series) binary compilers output in a format called MBS (Mali Binary Shader), the contents of each file is significantly different. I had no option but to rewrite the MBS parser for tamil.

Instead of rewriting the vertex shader binaries like ARM's binary driver does, i reorder the entries in the attribute descriptor table to match the order as described by the shader compiler output. This avoids adding a whole lot of logic to handle this correctly, even though MBS now describes which bits to alter in the binary. I still lay uniforms, attributes and varyings out by hand though, and i similarly have only limited knowledge of typing at this point. This mostly is a bit of busywork of writing up the actual logic, and trying out a few different things.

I know only very few things about most of the GL state. Again, mostly busywork with a bit of testing and coding up the results. And while many values left and right are still magic, nothing big is hiding any more.

Unlike lima, i am refraining from building up more infrastructure (than necessary to show the demo) outside of Mesa. The next step really is writing up a mesa driver. Since my lima driver for mesa was already pretty advanced, i should be able to re-use a lot of the knowledge gained there, and perhaps some code.

The demo

The demo was shown on a Samsung ARM Chromebook, with a kernel and linux installation from september 2013 (when i brought up cs capture and exposed the shader compiler). The exynos KMS driver on this 3.4.0 kernel is terrible. It only accepts a few fixed resolutions (as if I never existed and modesetting wasn't a thing), and then panics when you even look at it. Try to leave X while using HDMI: panic. Try to use a KMS plane to display the resulting render: panic.

In the end, i used fbdev and memcpy the rendered frame over to the console. On this kernel, i cannot even disable the console, so some of the visible flashing is the console either being overwritten by or overwriting the copied render.

The youtube video shows a capture of the Chromebook's built-in LCD, at 1280x720 on a 1366x768 display. At FOSDEM, i was lucky that the projector accepted the 1280x720 mode the exynos hdmi driver produced. My dumb HDMI->VGA converter (which makes the image darker) was willing to pass this through directly. I have a more intelligent HDMI->VGA adapter which also does scaling and which keeps colours nice, but that one just refused the output of the exynos driver. The video that was captured in our devroom probably does not show the demo correctly, as that should've been at 1024x768.

The demo shows 3 cubes rotating in front of the milky way. It is 4 different draws, using 3 different textures, and 3 different programs. These cubes currently rotate at 47fps, with the memcpy. During the talk, the chromebook progressively slowed down to 26fps and even 21fps at one point, but i have not seen that behaviour before or since. I do know of an issue that makes the demo fail at frame 79530, which is 100% reproducible. I still need to track this down; it probably is an issue with my job handling code.

linux-exynos.org

With Lima and Tamil i am in a unique position. Unlike on adreno, tegra or Videocore, i have to deal with many different SoCs. Apart from the difference in kernel GPU drivers, i also have to deal with differences in display drivers and run into a whole new world of hurt every time i move over to a new target device. The information for doing a proper linux installation on an android or chrome device is usually dispersed, not up to date, and not too good, and i get to do a lot of the legwork for myself every time, knowing full well that a lot of others have done so already but couldn't be bothered to document things (hence my role in the linux-sunxi community).

The ARM chromebook and its broken kms driver is more of the same. Last FOSDEM i complained how badly supported and documented the Samsung ARM chromebook is, despite its popularity, and appealed for more linux-sunxi style, SoC specific communities, especially since I, as a graphics driver developer, cannot go and spend as much time in each and every one of the SoC projects as i have done with sunxi.

During the questions round of this talk, one guy in the audience asked what needed to be done to fix the SoC pain. At first i completely missed the question, upon which he went and rephrased his question. My answer was: provide the infrastructure, make some honest noise and people will come. Usually, when someone asks such a question, nothing ever comes from it. But Merlijn "Wizzup" Wajer and his buddy S.J.R. "Swabbles" van Schaik really came through.

Today there is the linux-exynos.org wiki, the linux-exynos mailinglist, and the #linux-exynos irc channel. While the community is not as large as linux-sunxi, it is steadily growing. So if you own exynos based hardware, or if your company is basing a product on the exynos chipset, head to linux-exynos.org and help these guys out. Linux-exynos deserves your support.
February 16, 2015

A few months ago, I wrote the definitive guide about Python method declaration, which was quite a success. I still fight every day in OpenStack to have the developers declare their methods correctly in the patches they submit.

Automation plan

The thing is, I really dislike doing the same things over and over again. Furthermore, I'm not perfect either, and I miss a lot of these kinds of problems in the reviews I do. So I decided to replace myself with a program: a more scalable and less error-prone version of my brain.

In OpenStack, we rely on flake8 to do static analysis of our Python code in order to spot common programming mistakes.

But we are really pedantic, so we wrote some extra hacking rules that we enforce on our code. To that end, we wrote a flake8 extension called hacking. I really like these rules, and I even recommend applying them in your own project. Though I might be biased or a victim of Stockholm syndrome. Your call.

Anyway, it's pretty clear that I need to add a check for method declaration in hacking. Let's write a flake8 extension!

Typical error

The typical error I spot is the following:

class Foo(object):
    # self is not used, the method does not need
    # to be bound, it should be declared static
    def bar(self, a, b, c):
        return a + b - c


That would be the correct version:

class Foo(object):
    @staticmethod
    def bar(a, b, c):
        return a + b - c


This kind of mistake is not a show-stopper. It's just not optimized. Why you have to manually declare static or class methods might be a language issue, but I don't want to debate about Python misfeatures or design flaws.

Strategy

We could probably use some big magical regular expression to catch this problem. flake8 is based on the pep8 tool, which can do a line by line analysis of the code. But this method would make it very hard and error prone to detect this pattern.

It's also possible, though, to do an AST based analysis on a per-file basis with pep8. So that's the method I pick, as it's the most solid.

AST analysis

I won't dive deeply into Python AST and how it works. You can find plenty of sources on the Internet, and I even talk about it a bit in my book The Hacker's Guide to Python.

To check correctly if all the methods in a Python file are correctly declared, we need to do the following:

  • Iterate over all the statement nodes of the AST
  • Check that the statement is a class definition (ast.ClassDef)
  • Iterate over all the function definitions (ast.FunctionDef) of that class statement to check if it is already declared with @staticmethod or not
  • If the method is not declared static, we need to check if the first argument (self) is used somewhere in the method
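
Before wiring anything into flake8, the steps above can be tried out on their own. This is a minimal standalone sketch for Python 3, not the code submitted to hacking; the helper names (is_static, uses_first_arg, static_candidates) and the sample source are invented for the illustration.

```python
import ast

SOURCE = '''
class Foo(object):
    def bar(self, a, b, c):
        return a + b - c

    def baz(self):
        return self.bar(1, 2, 3)

    @staticmethod
    def qux(a):
        return a
'''


def is_static(func):
    """True if the function carries a bare @staticmethod decorator."""
    return any(isinstance(d, ast.Name) and d.id == "staticmethod"
               for d in func.decorator_list)


def uses_first_arg(func):
    """True if the first positional argument is referenced in the function."""
    first = func.args.args[0].arg
    return any(isinstance(n, ast.Name) and n.id == first
               for n in ast.walk(func))


def static_candidates(source):
    """Names of methods that never use their first argument."""
    found = []
    for stmt in ast.walk(ast.parse(source)):
        if not isinstance(stmt, ast.ClassDef):
            continue
        for item in stmt.body:
            if not isinstance(item, ast.FunctionDef) or is_static(item):
                continue
            if item.args.args and not uses_first_arg(item):
                found.append(item.name)
    return found


print(static_candidates(SOURCE))  # ['bar']
```

Only bar is reported: baz uses self, and qux is already declared static.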

Flake8 plugin

In order to register a new plugin in flake8 via hacking, we just need to add an entry in setup.cfg:

[entry_points]
flake8.extension =
    […]
    H904 = hacking.checks.other:StaticmethodChecker
    H905 = hacking.checks.other:StaticmethodChecker


We register 2 hacking codes here. As you will notice later, we are actually going to add an extra check in our code for the same price. Stay tuned.

The next step is to write the actual plugin. Since we are using an AST based check, the plugin needs to be a class following a certain signature:

@core.flake8ext
class StaticmethodChecker(object):
    def __init__(self, tree, filename):
        self.tree = tree

    def run(self):
        pass


So far, so good and pretty easy. We store the tree locally, then we just need to use it in run() and yield each problem we discover following the signature pep8 expects, which is a tuple of (lineno, col_offset, error_string, code).
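
The same protocol can be exercised by hand, without flake8 in the loop, which is handy for quick experiments. The checker below is a toy: the X999 code and the BareExceptChecker name are made up, the core.flake8ext decorator is left out, and the tuple layout simply follows the description above.

```python
import ast


class BareExceptChecker(object):
    """Toy AST checker; the X999 code is invented for this example."""

    def __init__(self, tree, filename):
        self.tree = tree

    def run(self):
        for node in ast.walk(self.tree):
            # An except clause without an exception type has type None.
            if isinstance(node, ast.ExceptHandler) and node.type is None:
                yield (node.lineno, node.col_offset,
                       "X999: bare except", "X999")


source = "try:\n    pass\nexcept:\n    pass\n"
checker = BareExceptChecker(ast.parse(source), "example.py")
problems = list(checker.run())
print(problems)
```

Since run() is a generator, a checker can report any number of problems per file, including none at all.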

This AST is made for walking ♪ ♬ ♩

The ast module provides the walk function, which makes it easy to iterate over a tree. We'll use that to run through the AST. First, let's write a loop that ignores the statements that are not class definitions.

import ast

from hacking import core


@core.flake8ext
class StaticmethodChecker(object):
    def __init__(self, tree, filename):
        self.tree = tree

    def run(self):
        for stmt in ast.walk(self.tree):
            # Ignore non-class
            if not isinstance(stmt, ast.ClassDef):
                continue


We still don't check for anything, but we know how to ignore statements that are not class definitions. The next step is to ignore what is not a function definition. We just iterate over the body of the class definition.

for stmt in ast.walk(self.tree):
    # Ignore non-class
    if not isinstance(stmt, ast.ClassDef):
        continue
    # If it's a class, iterate over its body member to find methods
    for body_item in stmt.body:
        # Not a method, skip
        if not isinstance(body_item, ast.FunctionDef):
            continue


We're all set for checking the method, which is body_item. First, we need to check if it's already declared as static. If so, we don't have to do any further check and we can bail out.

for stmt in ast.walk(self.tree):
    # Ignore non-class
    if not isinstance(stmt, ast.ClassDef):
        continue
    # If it's a class, iterate over its body member to find methods
    for body_item in stmt.body:
        # Not a method, skip
        if not isinstance(body_item, ast.FunctionDef):
            continue
        # Check that it has a decorator
        for decorator in body_item.decorator_list:
            if (isinstance(decorator, ast.Name)
                    and decorator.id == 'staticmethod'):
                # It's a static function, it's OK
                break
        else:
            # Function is not static, we do nothing for now
            pass


Note that we use the special for/else form of Python, where the else is evaluated unless we used break to exit the for loop.
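
For readers who have never met for/else, here is a tiny standalone illustration, unrelated to the checker itself (the function name is made up):

```python
def first_negative(numbers):
    for n in numbers:
        if n < 0:
            result = n
            break
    else:
        # Runs only when the loop finished without hitting break.
        result = None
    return result


print(first_negative([3, -1, 2]))  # -1
print(first_negative([3, 1, 2]))   # None
```

The else branch also runs when the iterable is empty, which is exactly what we want for a method without any decorator.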

for stmt in ast.walk(self.tree):
    # Ignore non-class
    if not isinstance(stmt, ast.ClassDef):
        continue
    # If it's a class, iterate over its body member to find methods
    for body_item in stmt.body:
        # Not a method, skip
        if not isinstance(body_item, ast.FunctionDef):
            continue
        # Check that it has a decorator
        for decorator in body_item.decorator_list:
            if (isinstance(decorator, ast.Name)
                    and decorator.id == 'staticmethod'):
                # It's a static function, it's OK
                break
        else:
            try:
                first_arg = body_item.args.args[0]
            except IndexError:
                yield (
                    body_item.lineno,
                    body_item.col_offset,
                    "H905: method misses first argument",
                    "H905",
                )
                # Check next method
                continue


We finally added some check! We grab the first argument from the method signature. Unless it fails, and in that case, we know there's a problem: you can't have a bound method without the self argument, therefore we raise the H905 code to signal a method that misses its first argument.

Now you know why we registered this second pep8 code along with H904 in setup.cfg. We have here a good opportunity to kill two birds with one stone.

The next step is to check if that first argument is used in the code of the method.

for stmt in ast.walk(self.tree):
    # Ignore non-class
    if not isinstance(stmt, ast.ClassDef):
        continue
    # If it's a class, iterate over its body member to find methods
    for body_item in stmt.body:
        # Not a method, skip
        if not isinstance(body_item, ast.FunctionDef):
            continue
        # Check that it has a decorator
        for decorator in body_item.decorator_list:
            if (isinstance(decorator, ast.Name)
                    and decorator.id == 'staticmethod'):
                # It's a static function, it's OK
                break
        else:
            try:
                first_arg = body_item.args.args[0]
            except IndexError:
                yield (
                    body_item.lineno,
                    body_item.col_offset,
                    "H905: method misses first argument",
                    "H905",
                )
                # Check next method
                continue
            for func_stmt in ast.walk(body_item):
                if six.PY3:
                    if (isinstance(func_stmt, ast.Name)
                            and first_arg.arg == func_stmt.id):
                        # The first argument is used, it's OK
                        break
                else:
                    if (func_stmt != first_arg
                            and isinstance(func_stmt, ast.Name)
                            and func_stmt.id == first_arg.id):
                        # The first argument is used, it's OK
                        break
            else:
                yield (
                    body_item.lineno,
                    body_item.col_offset,
                    "H904: method should be declared static",
                    "H904",
                )


To that end, we iterate using ast.walk again and we look for a use of the variable with the same name (usually self, but it could be anything, like cls for @classmethod) in the body of the function. If it is not found, we finally yield the H904 error code. Otherwise, we're good.

Conclusion

I've submitted this patch to hacking, and, fingers crossed, it might be merged one day. If it's not, I'll create a new Python package with that check for flake8. The actual submitted code is a bit more complex, as it takes into account the use of the abc module and includes some tests.

As you may have noticed, the code walks over the module AST definition several times. There might be a couple of optimizations to browse the AST in only one pass, but I'm not sure it's worth it considering the actual usage of the tool. I'll leave that as an exercise for the reader interested in contributing to OpenStack. 😉

Happy hacking!

February 14, 2015

Two years of having a real job have made me bitter and grumpy, so I'm gonna stop with the cutesy blog post titles.

Today, I'm gonna talk a bit about Monte. Specifically, I'm going to talk about Typhon's current virtual machine. Typhon used to use an abstract syntax tree interpreter ("AST interpreter") as its virtual machine. It now uses a variant of the SmallCaps bytecode machine. I'll explain how and why.

When I started designing Typhon, I opted for an AST VM because it seemed like it matched Monte's semantics well. Monte is an expression language, which means that every syntactic node is an expression which can be evaluated to a single value. Monte is side-effecting, so evaluation must happen in an environment which can record the side effects. An AST interpreter could have an object representing each syntactic node, and each node could be evaluated within an environment to produce a value.

class Node(object):

    def evaluate(self, environment):
        pass

Subclasses of Node can override evaluate() to have behavior specific to that node. The Assign class can assign values to names in the environment, the Object class can create new objects, and the Call class can pass messages to objects. This technique is amenable to RPython, since Node.evaluate() has the same signature for all nodes.
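
To make the shape of such an interpreter concrete, here is a toy version in plain Python (not RPython, and not Typhon's code). The Literal and Noun classes, the dict-based environment and the constructor signatures are all assumptions made for the sketch.

```python
class Node(object):

    def evaluate(self, environment):
        raise NotImplementedError


class Literal(Node):
    """Evaluates to a constant."""

    def __init__(self, value):
        self.value = value

    def evaluate(self, environment):
        return self.value


class Noun(Node):
    """Looks a name up in the environment."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, environment):
        return environment[self.name]


class Assign(Node):
    """Side effect: stores a value under a name, then returns the value."""

    def __init__(self, name, rvalue):
        self.name = name
        self.rvalue = rvalue

    def evaluate(self, environment):
        value = self.rvalue.evaluate(environment)
        environment[self.name] = value
        return value


class Sequence(Node):
    """Evaluates every node and returns the value of the last one."""

    def __init__(self, nodes):
        self.nodes = nodes

    def evaluate(self, environment):
        result = None
        for node in self.nodes:
            result = node.evaluate(environment)
        return result


env = {}
program = Sequence([Assign("x", Literal(42)), Noun("x")])
print(program.evaluate(env))  # 42
```

Every node yields exactly one value, and the environment is the only place where side effects are recorded.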

How well does this work with the RPython just-in-time ("JIT") compiler? It does alright. The JIT has little problem seeing that the nodes are compile-time constant, and promotion applied strategically allows the JIT to see into things like Call nodes, which cause methods to be inlined relatively well. However, there are some problems.

The first problem is that Monte has no syntactic loops. Since RPython's JIT is designed to compile loops, this means that some extra footwork is needed to set up the JIT. I took Monte's looping object, __loop, and extended it to detect when its argument is likely to yield a JIT trace. The detection is simple, examining whether the loop body is a user-defined object and only entering the JIT when that is the case. In theory, most bodies will be user-defined, since Monte turns this:

object SubList:
    to coerce(specimen, ej):
        def [] + l exit ej := specimen
        for element in l:
            subguard.coerce(element, ej)
        return specimen

into this:

object SubList:
    method coerce(specimen, ej):
        def via (__splitList.run([0])) [l] exit ej := specimen
        var validFlag__6 := true
        try:
            __loop.run([l, object _ {
                method run(_, value__8) {
                    __validateFor.run([validFlag__6])
                    subguard.coerce([value__8, ej])
                    null
                }
            }])
        finally:
            validFlag__6 := false
        specimen

This example is from Typhon's prelude. The expansion of loops, and other syntax, is defined to turn the full Monte language into a smaller language, Kernel-Monte, which resembles Kernel-E. Now, in this example, the nonce loop object generated by the expander is very custom, and probably couldn't be simplified any further. However, it is theoretically possible that an optimizer could detect loops that call a single method repeatedly and simplify them more aggressively. In practice, that optimization doesn't exist, so Typhon thinks that all loops are user-defined and allows the JIT to trace all of them.

The next hurdle has to do with names. Monte's namespaces are static, which means that it's possible to always know in advance which names are in a scope. This is a powerful property for compilers, since it opens up many kinds of compilation and lowering which aren't available to languages with dynamic namespaces like Python. However, Typhon doesn't have access to the scoping information from the expander, only the AST. This means that Typhon has to redo most of the name and scope analysis in order to figure out things like how big namespaces should be and where to store everything. I initially did all of this at runtime, in the JIT, but it is very slow. This is because the JIT has problems seeing into dictionaries, and it cannot trust that a dictionary is actually constant-size or constant-keyed.

RPython does have an answer to this, called virtual and virtualizable objects. A virtual object is one that is never constructed within a JIT trace, because the JIT knows that the object won't leave the trace and can be totally elided. (The literature talks at length of "escape analysis", the formal term for this.) Typhon's AST nodes occasionally generated virtual objects, but only rarely, because most objects are assigned to names in the environment, and the JIT refuses to ignore modifications to the environment.

Virtualizable objects solve this problem neatly. A virtualizable object has one or more attributes, called virtualizable attributes, which can be "out-of-sync" during a JIT trace. A virtualizable can delay being updated, as long as it's updated at some point before the end of the JIT trace. RPython allows fields of integers and floating-point values to be virtualizable, as well as constant-size lists. However, Typhon uses mappings of names for its environments, backed by dictionaries, and dictionaries can't be virtualizable.

The traditional solution to this problem involves assigning an index to every name within a scope, and using a constant-size list with the indices. I did this, but it was arduous and didn't have the big payoff that I had wanted. Why not? Well, every AST node introduces a new scope! This caused scope creation to be expensive. I realized that I needed to compute closures as well.
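
The indexing technique itself is small. The following sketch shows the idea in plain Python; it is not Typhon's implementation, and the ScopeLayout class and its method names are hypothetical.

```python
class ScopeLayout(object):
    """Assigns each name in a scope a fixed slot index ahead of time."""

    def __init__(self):
        self.indices = {}

    def add(self, name):
        # Reuse the slot if the name is already known.
        if name not in self.indices:
            self.indices[name] = len(self.indices)
        return self.indices[name]

    def new_frame(self):
        # One slot per name; the list never grows after creation.
        return [None] * len(self.indices)


layout = ScopeLayout()
x = layout.add("x")
y = layout.add("y")

frame = layout.new_frame()
frame[x] = 1
frame[y] = 2
print(frame[x] + frame[y])  # 3
```

The dictionary is only consulted while the layout is being built. At runtime every access is an index into a list of known size, which is the kind of structure that can be made virtualizable.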

Around this time, a few months ago, I began to despair because debugging the AST VM is hard. RPython's JIT logging and tooling is all based around the assumption that there is a low-level virtual machine of some sort which has instructions and encapsulation, and the AST was just too hard to manage in this way. I had had to invent my own tracebacks, my own logging, and my own log viewer. This wasn't going well. I wanted to join the stack-based VM crew and not have piles of ASTs to slog through whenever something wasn't going right. So, I decided to try to implement SmallCaps, from E. E, of course, is the inspiration for Monte, and shares many features with Monte. SmallCaps was based on old Smalltalk systems, but was designed to work with unique E features like ejectors.

So, enough talk, time for some code. First, let's lay down some ground rules. These are the guiding semantics of SmallCaps in Typhon. Keep in mind that we are describing a single-stack automaton with a side stack for exception handling and an environment with frames for local values and closed-over "global" values.

  • All expressions return a value. Therefore, an expression should always compile to some instructions which start with an empty stack and leave a single value on the stack.
  • All patterns perform some side effects in the environment and return nothing. Therefore, they should compile to instructions which consume two values from the stack (specimen and ejector) and leave nothing.
  • When an exception handler is required, every handler must be dropped when it's no longer needed.

With these rules, the compiler's methods become very obvious.

class Str(Node):
    """
    A literal string.
    """

    def compile(self, compiler):
        index = compiler.literal(StrObject(self._s))
        compiler.addInstruction("LITERAL", index)

Strings are compiled into a single LITERAL instruction that places a string on the stack. Simple enough.

class Sequence(Node):
    """
    A sequence of nodes.
    """

    def compile(self, compiler):
        for node in self._l[:-1]:
            node.compile(compiler)
            compiler.addInstruction("POP", 0)
        self._l[-1].compile(compiler)

Here we compile sequences of nodes by compiling each node in the sequence, and then using POP to remove each intermediate node's result, since they aren't used. This nicely mirrors the semantics of sequences, which are to evaluate every node in the sequence and then return the value of the ultimate node's evaluation.
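
To watch that rule in action, here is a toy compiler and stack machine in plain Python. It borrows the literal() and addInstruction() names from the snippets above, but the Compiler class, the run() function and the use of plain Python strings instead of StrObject are stand-ins invented for this sketch, not Typhon's code.

```python
class Compiler(object):
    """Toy stand-in: collects instructions and a table of literals."""

    def __init__(self):
        self.instructions = []
        self.literals = []

    def literal(self, value):
        self.literals.append(value)
        return len(self.literals) - 1

    def addInstruction(self, name, index):
        self.instructions.append((name, index))


class Str(object):

    def __init__(self, s):
        self._s = s

    def compile(self, compiler):
        index = compiler.literal(self._s)
        compiler.addInstruction("LITERAL", index)


class Sequence(object):

    def __init__(self, l):
        self._l = l

    def compile(self, compiler):
        for node in self._l[:-1]:
            node.compile(compiler)
            compiler.addInstruction("POP", 0)
        self._l[-1].compile(compiler)


def run(compiler):
    """Executes the two instructions this sketch knows about."""
    stack = []
    for name, index in compiler.instructions:
        if name == "LITERAL":
            stack.append(compiler.literals[index])
        elif name == "POP":
            stack.pop()
    return stack


compiler = Compiler()
Sequence([Str(u"a"), Str(u"b"), Str(u"c")]).compile(compiler)
print(compiler.instructions)
print(run(compiler))
```

Starting from an empty stack, the compiled sequence leaves exactly one value behind, the result of the last node, as the first rule demands.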

This also shows off a variant on the Visitor pattern which Allen, Mike, and I are calling the "Tourist pattern", where an accumulator is passed from node to node in the structure and recursion is directed by each node. This makes managing the Expression Problem much easier, since nodes completely contain all of the logic for each accumulation, and makes certain transformations much easier. More on that in a future post.

class FinalPattern(Pattern):

    def compile(self, compiler):
        # [specimen ej]
        if self._g is None:
            compiler.addInstruction("POP", 0)
            # [specimen]
        else:
            self._g.compile(compiler)
            # [specimen ej guard]
            compiler.addInstruction("ROT", 0)
            compiler.addInstruction("ROT", 0)
            # [guard specimen ej]
            compiler.call(u"coerce", 2)
            # [specimen]
        index = compiler.addFrame(u"_makeFinalSlot")
        compiler.addInstruction("NOUN_FRAME", index)
        compiler.addInstruction("SWAP", 0)
        # [_makeFinalSlot specimen]
        compiler.call(u"run", 1)
        index = compiler.addLocal(self._n)
        compiler.addInstruction("BINDSLOT", index)
        # []

This pattern is compiled to insert a specimen into the environment, compiling the optional guard along the way and ensuring order of operations. The interspersed comments represent the top of stack in-between operations, because it helps me keep track of how things are compiled.
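
The two ROT instructions can be checked against the stack comments with a few lines of Python. This assumes ROT behaves like Forth's ROT, bringing the third item from the top to the top, which is the reading consistent with the comments above; the actual instruction in Typhon may be defined differently.

```python
def rot(stack):
    """Rotate the top three items: [a, b, c] becomes [b, c, a]."""
    a, b, c = stack[-3:]
    stack[-3:] = [b, c, a]


stack = ["specimen", "ej", "guard"]
rot(stack)
rot(stack)
print(stack)  # ['guard', 'specimen', 'ej']
```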

With this representation, the Compiler is able to see the names and indices of every binding introduced during compilation, which means that creating index-based frames as constant-size lists is easy. (I was going to say "trivial," but it was not trivial!)

I was asked on IRC about why I chose to adapt SmallCaps instead of other possible VMs. The answer is mostly that SmallCaps was designed and implemented by people that were much more experienced than me, and that I trust their judgement. I tried several years ago to design a much purer concatenative semantics for Kernel-E, and failed. SmallCaps works, even if it's not the simplest thing to implement. I did briefly consider even smaller semantics, like those of the Self language, but I couldn't find anything expressive enough to capture all of Kernel-E's systems. Ejectors are tricky.

That's all for now. Peace.

So apparently, this happened, and then this. Long story short, the elementary OS guys had been offered to use SPI as the legal entity to represent the project, something they didn't need at all, and since they didn't, Joshua Drake, apparently a director at SPI, decided to threaten them with bad press all over if they didn't agree to join SPI. Which he then did: he started several threads on reddit and wrote a blog post trying to undermine the project (the post is now deleted), followed by this aberration of an apology (which is total BS and shows how much of an ass he is).

I seriously don't get why this guy has not been fired from the SPI organization immediately, this sort of bullying behaviour should not be allowed and, at least in my book, an apology means nothing. Someone like that does not belong to an organization that is supposed to help free software thrive and protect its communities.

I don't get how SPI expects the community to trust them at all after this.

I am really angry at this and I would like to express all my support to the elementary OS guys.

February 11, 2015
Linux 3.19 was just released and my usual overview of what the next merge window will bring is more than overdue. The big thing overall is certainly all the work around atomic display updates, but read on for everything else that has been done.

Let's first start with all the driver internal rework to support atomic. The big thing with atomic is that it requires a clean split between code that checks display updates and the code that commits a new display state to the hardware. The corollary from that is that any derived state that's computed in the validation code and needed in the commit code must be stored somewhere in the state object. Gustavo Padovan and Matt Roper have done all that work to support atomic plane updates. This is the code that's now in 3.20 as a tech preview. The big things missing for proper atomic plane updates are async commit support (which has already landed for 3.21) and support to adjust watermark settings on the fly. Patches for this from Ville have been around for a long time, but need to be rebased, reviewed and extended for recently added platforms.
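
As a conceptual illustration only, the check/commit split looks roughly like this. The sketch is in Python rather than kernel C, and every name in it (PlaneState, MAX_WIDTH, the register dictionary) is invented; it mirrors the idea, not the i915 or DRM code.

```python
MAX_WIDTH = 4096  # made-up hardware limit for this sketch


class PlaneState(object):
    """Requested state plus anything derived from it during validation."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.pixel_count = None  # derived state, filled in by check()


def check(state):
    """Validate the update. May fail, and never touches the hardware."""
    if not 0 < state.width <= MAX_WIDTH:
        raise ValueError("unsupported plane width")
    # Store the derived value in the state so commit() need not recompute it.
    state.pixel_count = state.width * state.height


def commit(state, registers):
    """Apply a state that already passed check(). No validation here."""
    registers["PLANE_SIZE"] = (state.width, state.height)
    registers["PIXEL_COUNT"] = state.pixel_count


registers = {}  # stand-in for hardware registers
state = PlaneState(1920, 1080)
check(state)
commit(state, registers)
print(registers)
```

Because everything commit() needs lives in the state object, a rejected update leaves the hardware untouched.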

On the modeset side Ander Conselvan de Oliveira has done a lot of the necessary work already. Unfortunately converting the modeset code is much more involved, mainly for two reasons: First there's a lot more derived state that needs to be handled, and the existing code already has structures and code for this. Conceptually the code has been prepared for an atomic world since the big display rewrite and the introduction of CRTC configuration structures. But converting the i915 modeset code to the new DRM atomic structures and interface is still a lot of work. Most of these refactorings have landed in 3.20. The other hold-up is shared resources and the software state to handle that. This is mostly for handling display PLLs, but also other shared resources like the display FIFO buffers. Patches to handle this are still in-flight.

Continuing with modeset work Jani Nikula has reworked the DSI support to use the shared DSI helpers from the DRM core. Jani also reworked the DSI code in preparation for dual-link DSI support, which Gaurav Singh implemented. Rodrigo Vivi and others provided a lot of patches to improve PSR support and enable it for Baytrail/Braswell. Unfortunately there are still issues with the automated testcase and so PSR stays disabled by default for now. Rodrigo also wrote some nice DocBook documentation for fbc, another step towards fully documenting i915 driver internals.

Moving on to platform enabling there has been a lot of work from Ville on Cherryview: RPS/gpu turbo and pipe CRC support (used for automated display testing) are both improved. On Skylake almost all the basic enabling is merged now: PM patches, enabling mmio pageflips and fastboot support from Damien have landed. Tvrtko Ursulin also created the infrastructure for global GTT views. This will be used for some upcoming features on Skylake. And to simplify enabling future platforms Chris Wilson and Mika Kuoppala have completely rewritten the forcewake domains handling code.

Also really important for Skylake is that the execlist support for gen8+ command submission is finally in good enough shape to be used by default - on Skylake that's the only supported path, legacy ring submission has been deprecated. Around that feature and a few other closely related ones a lot of code was refactored: John Harrison implemented the conversion from raw sequence numbers to request objects for gpu progress tracking. This is also prep work for landing a gpu scheduler. Nick Hoath removed the special execlist request tracking structures, simplifying the code. The engine initialization code was also refactored for a cleaner split between software and hardware initialization, leading to more robust code for driver load and system resume. Dave Gordon has also reworked the code tracking and computing the ring free space. On top of that we've also enabled full ppgtt again, but for now only where execlists are available since there are still issues with the legacy ring-based pagetable loading.

For generic GEM work there's the really nice support for write-combine cpu memory mappings from Akash Goel and Chris Wilson. On Atom SoC platforms, which lack the giant LLC that bigger chips have, this gives a really fast way to upload textures. And even with LLC it's useful for uploading to scanout targets since those are always uncached. But like the special-purpose upload paths for LLC platforms, the cpu mmap views do not detile textures, hence special-purpose fastpaths need to be written in mesa and other userspace to fully exploit this. In other GEM features the shadow batch copying code for the command parser has now also landed.

Finally there's the redesign from Imre Deak to use the component framework for the snd-hda/i915 interactions. Modern platforms need a lot of coordination between the graphics and sound driver side for audio over hdmi, and thus far this was done with some ad-hoc dynamic symbol lookups. That results in a lot of headaches to get the ordering right for driver load or system suspend and resume. With the component framework this dependency is now explicit, which means we will be able to get rid of a few hacks. It's also much easier to extend for the future - new platforms tend to integrate different components even more.
Now that Presentation feedback has finally landed in Weston (feedback, flags), people are starting to pay attention to the output timings as now you can better measure them. I have seen a couple of complaints already that Weston has an extra frame of latency, and this is true. I also have a patch series to fix it that I am going to propose.

To explain how the patch series affects Weston's repaint loop, I made some JSON-timeline recordings before and after, and produced some graphs with Wesgr. Here I will explain how the repaint loop works timing-wise.


The old algorithm

The old repaint scheduling algorithm in Weston repaints immediately on receiving the pageflip completion event. This maximizes the time available for the compositor itself to repaint, but it also means that clients can never hit the very next vblank / pageflip.

Figure 1. The old algorithm, the client paints as response to frame callbacks.

Frame callback events are sent at the "post repaint" step. This gives clients almost a full frame's time to draw and send their content before the compositor goes to "begin repaint" again. In Figure 1 you can see that if a client paints extremely fast, the latency to screen is almost two frame periods. The frame latency can never be less than one frame period, because the compositor samples the surface contents (the "repaint flush" point) immediately after the previous vblank.

Figure 2. The old algorithm, the client paints as response to Presentation feedback events.

While frame callback driven clients still get to the full frame rate, the situation is worse if the client painting is driven by presentation_feedback.presented events. The intent is to draw and show a new frame as soon as the old frame was shown. Because Weston starts repaint immediately on the pageflip completion, which is essentially the same time when Presentation feedback is sent, the client cannot hit the repaint of this frame and gets postponed to the next. This is the same two frame latency as with frame callbacks, but here the framerate is halved because the client waits for the frame to be actually shown before continuing, as is evident in Figure 2.

Figure 3. The old algorithm, client posts a frame while the compositor is idle.

Figure 3 shows a less relevant case, where the compositor is idle while a client posts a new frame ("damage commit"). When the compositor is idle graphics-wise (the gray background in the figure), it is not repainting continuously according to the output scanout cycle. To start painting again, Weston waits for an extra vblank first, then repaints, and then the new frame is shown on the next vblank. This is also a 1-2 frame period latency, but it is unrelated to the other two cases, and is not changed by the patches.

The modification to the algorithm

The modification is simple, yet perhaps counter-intuitive at first. We reduce the latency by adding a delay. The "delay before repaint" is in all the figures, and the old algorithm is essentially using a zero delay. The compositor's repaint is delayed so that clients have a chance to post a new frame before the compositor samples the surface contents.

Choosing a good delay is hard. With too short a delay, clients do not have time to act. With too long a delay, the compositor itself is in danger of missing the vblank deadline. I do not know what a good amount is or how to derive it, so I just made it configurable. You can set the repaint window length in milliseconds in weston.ini. The repaint window is the time from starting repaint to the deadline, so the delay is the frame period minus the repaint window. If the repaint window is too long for a frame period, the algorithm reduces to the old behaviour.
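
As a rough illustration of that arithmetic (not Weston's actual code), the delay can be sketched in a few lines of Python. The function name and the example values of 60 Hz and 7 ms are assumptions chosen for the illustration.

```python
def repaint_delay_ms(refresh_hz: float, repaint_window_ms: float) -> float:
    """Delay between a vblank and the start of the compositor's repaint.

    Hypothetical helper: delay = frame period - repaint window.
    """
    frame_period_ms = 1000.0 / refresh_hz
    delay = frame_period_ms - repaint_window_ms
    # A repaint window that does not fit into the frame period leaves no
    # room for a delay, which is the old zero-delay behaviour.
    return max(delay, 0.0)


if __name__ == "__main__":
    # Example values only: 60 Hz output, 7 ms repaint window.
    print(f"delay: {repaint_delay_ms(60.0, 7.0):.2f} ms")
    # A window longer than the frame period falls back to zero delay.
    print(f"delay: {repaint_delay_ms(60.0, 20.0):.2f} ms")
```

With these example values the compositor would wait a bit under 10 ms after the vblank before sampling surface contents, and that is the time a client has to post a frame for the very next vblank.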

The new algorithm

The following figures are made with a 60 Hz refresh and a 7 millisecond repaint window.

Figure 4. The new algorithm, the client paints as response to frame callback.

When a client paints as response to the frame callback (Figure 4), it still has a whole frame period of time to paint and post the frame. The total latency to screen is a little shorter now, by the length of the delay before compositor's repaint. It is a slight improvement.

Figure 5. The new algorithm, the client paints as response to Presentation feedback.

A significant improvement can be seen in Figure 5. A client that uses the Presentation extension to wait for a frame to be actually shown before painting again is now able to reach the full output frame rate. It just needs to paint and post a new frame during the delay before compositor's repaint. This mode of operation provides the shortest possible latency to screen as the client is able to target the very next vblank. The latency is below one frame period if the deadlines are met.
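
The effect on a Presentation-feedback-driven client can be sketched with a deliberately simplified model. The assumptions are mine, not Weston's: the client starts painting exactly at the vblank where it receives the presented event, the compositor samples surface contents once per frame at "vblank + delay", and the compositor always meets its own deadline. All names are hypothetical.

```python
import math


def vblanks_until_shown(paint_ms: float, delay_ms: float, period_ms: float) -> int:
    """Number of vblanks from the presented event until the new frame is on screen.

    The commit arrives paint_ms after a vblank. The compositor samples at
    delay_ms after each vblank, so a late commit has to wait for a later sample.
    """
    missed_samples = max(0, math.ceil((paint_ms - delay_ms) / period_ms))
    return missed_samples + 1


def client_frame_rate(refresh_hz: float, paint_ms: float, delay_ms: float) -> float:
    """Frame rate of a client that waits for 'presented' before painting again."""
    period_ms = 1000.0 / refresh_hz
    return refresh_hz / vblanks_until_shown(paint_ms, delay_ms, period_ms)


if __name__ == "__main__":
    period = 1000.0 / 60.0
    # Old algorithm: zero delay, so even a fast client misses the sample point.
    print(client_frame_rate(60.0, paint_ms=5.0, delay_ms=0.0))
    # New algorithm with a 7 ms repaint window: the 5 ms paint fits into the delay.
    print(client_frame_rate(60.0, paint_ms=5.0, delay_ms=period - 7.0))
```

In this model the zero-delay case halves the client's frame rate, while the delayed repaint lets the client run at the full output rate as long as it paints within the delay.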

Discussion

This is a relatively simple change that should reduce display latency, but analyzing how exactly it affects things is not trivial. That is why Wesgr was born.

This change does not really allow clients to wait some additional time before painting to reduce the latency even more, because nothing tells clients when the compositor will repaint exactly. The risk of missing an unknown deadline grows the later a client paints. Would knowing the deadline have practical applications? I'm not sure.

These figures also show the difference between the frame callback and Presentation feedback. When a client's repaint loop is driven by frame callbacks, it maximizes the time available for repainting, which reduces the possibility to miss the deadline. If a client drives its repaint loop by Presentation feedback events, it minimizes the display latency at the cost of increased risk of missing the deadline.

All the above ignores a few things. First, we assume that the time of display is the point of vblank which starts to scan out the new frame. Scanning out a frame actually takes most of the frame period, it's not instantaneous. Going deeper, updating the framebuffer during the scanout period instead of vblank could allow reducing latency even more, but the matter becomes complicated and even somewhat subjective. I hear some people prefer tearing to reduce the latency further. Second, we also ignore any fencing issues that might come up in practice. If a client submits a GPU job that takes a long while, there is a good chance it will cause everything to miss a deadline or more.

As usual, this work and most of the development of JSON-timeline and Wesgr were sponsored by Collabora.

PS. Latency and timing issues are nothing new. Owen Taylor has several excellent posts on related subjects in his blog.
February 09, 2015


Your Dell modem not getting online?

It’s not uncommon to find weird mobile broadband modems that for one reason or another don’t end up working as expected with NetworkManager/ModemManager; but the new 3G/4G modems in Dell laptops are on a totally different level. These Dell-branded devices are really Sierra Wireless powered modems, e.g. the Dell 5808 is a Sierra Wireless MC7355, or the Dell DW5570 is a Sierra Wireless MC8805.

Late last year we started to receive several bug reports in the ModemManager and libqmi mailing lists for these kinds of devices. Basically, the modem would never get to a proper online mode with the RF transceivers powered and therefore would never even get registered in the mobile network. This was happening to both QMI and MBIM based configurations, and the direct error message reported by libqmi when trying to get into online mode was just… not very helpful.


  $ sudo qmicli -d /dev/cdc-wdm1 --dms-get-operating-mode
  [/dev/cdc-wdm1] Operating mode retrieved:
    Mode: 'low-power'
    HW restricted: 'no'


  $ sudo qmicli -d /dev/cdc-wdm1 --dms-set-operating-mode=online
  error: couldn't set operating mode: QMI protocol error (3): 'Internal'

The issue was reported to the kernel, assuming that this would likely be a new missing rfkill related setup in newer Dell laptops. One of the users reported in that same bugreport that actually using Sierra’s GobiNet driver instead of qmi_wwan would end up putting the modem in online mode, so just switching drivers during boot would make it work. WTF?

Digging in Sierra’s GobiNet QMI driver

Well, without much hope of finding anything, and given that I had just bought such a Dell modem myself for testing a new “Dell” plugin, I decided to dig into Sierra’s kernel driver sources. Apart from some already known things (e.g. they use the WDA service to set the net data format in new modems instead of the old CTL service), these lines popped out:

  if (is9x15)
  {
    // Set FCC Authentication
    result = QMIDMSSWISetFCCAuth( pDev );
    if (result != 0)
    {
      return result;
    }
  }

The Sierra GobiNet driver is sending some magic “FCC auth” command during boot to the modem; which according to the driver sources maps to command 0x555F in the DMS service. Hey I should try that!

Adding the new command support in libqmi wasn’t difficult, so in some minutes I was ready to test it… and it worked.

  $ sudo qmicli -d /dev/cdc-wdm1 --dms-get-operating-mode
  [/dev/cdc-wdm1] Operating mode retrieved:
    Mode: 'low-power'
    HW restricted: 'no'


  $ sudo qmicli -d /dev/cdc-wdm1 --dms-set-fcc-authentication
  [/dev/cdc-wdm1] Successfully set FCC authentication


  $ sudo qmicli -d /dev/cdc-wdm1 --dms-get-operating-mode
  [/dev/cdc-wdm1] Operating mode retrieved:
    Mode: 'online'
    HW restricted: 'no'

Support for this is already available automatically when using libqmi and ModemManager git master. It will likely land in the next stable releases as well.
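
For anyone who wants to script the manual check shown in the transcript above, here is a minimal Python sketch. It only uses the qmicli options that appear in this post; the function names and the dry-run default are assumptions for the illustration, and actually running the commands needs the same privileges as the sudo invocations above.

```python
import subprocess
from typing import List


def fcc_auth_sequence(device: str) -> List[List[str]]:
    """Build the qmicli invocations from the transcript above, in order."""
    base = ["qmicli", "-d", device]
    return [
        base + ["--dms-get-operating-mode"],      # expected: 'low-power'
        base + ["--dms-set-fcc-authentication"],  # the Sierra "FCC auth" command
        base + ["--dms-get-operating-mode"],      # hopefully: 'online'
    ]


def run_sequence(device: str, dry_run: bool = True) -> None:
    """Print the commands, or execute them when dry_run is False."""
    for argv in fcc_auth_sequence(device):
        print("$", " ".join(argv))
        if not dry_run:
            # check=True raises CalledProcessError if qmicli reports a failure.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # The device path is the one used in this post; yours may differ.
    run_sequence("/dev/cdc-wdm1", dry_run=True)
```

With recent enough libqmi and ModemManager none of this should be necessary, since the FCC authentication is sent automatically.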

MBIM?

Well, I don’t know if there is any command in MBIM to do the same operation (likely there is in a Sierra-specific service), but one thing we could anyway try to do is to use “QMI embedded in MBIM“, which Bjørn has already tested a few times. I’ll try to test that some day, but I’ll need to get another modem as my DW5570 only comes up with a QMI configuration. For now, if you’re stuck with this problem using MBIM, you can likely just select USB configuration #1 using usb_modeswitch and get the modem switched to QMI mode.

TL;DR?

Dell-branded Sierra Wireless modems need the “FCC Auth” command (QMI DMS service, 0x555F) before they can be brought online; supported in libqmi and ModemManager already.

[UPDATE]
These fixes have already been released in ModemManager 1.4.4 and libqmi 1.12.4.


February 06, 2015

I just pushed a patchset into libinput to introduce the concept of device groups. This post will explain what they are in this context and why they are needed.

libinput exposes kernel devices as an opaque struct libinput_device. It only recognises evdev devices at this point, this may change in the future if we see a need for it. libinput also exposes a few bits of information about the device such as the name, PID/VID and a handle to the struct udev_device that matches this device. The latter enables callers to get more information from the device. libinput also provides a bunch of configuration settings for each device. Pointer devices get acceleration settings, absolute devices have calibration, etc. For most devices this works just fine.

Some devices like Wacom tablets are represented as multiple event nodes. On a 3.19 kernel you'd get three event nodes for an Intuos 5 touch - the pad (i.e. the tablet itself), a touch node and one node for all the tools (stylus, eraser, etc. multiplexed). libinput exposes each of these nodes as a separate device, but that is problematic when applying certain configuration settings. For example, applying a left-handed configuration to the tablet means it's rotated by 180 degrees so we need to rotate the coordinates accordingly. Of course, such a rotation would have to apply to both the touch and the stylus devices but now the caller is left with having to figure out which other devices to set.

The original idea was to present such devices as a single, merged struct libinput_device with multiple capabilities, i.e. a single physical device that can do touch, tablet and pad buttons. A configuration setting like left-handed-ness would then apply to all devices transparently. The API is clean, usage is simple, everybody is happy. Except when they aren't - this doesn't actually work particularly well. First, having such merged devices means we require devices to change at runtime, adding/removing capabilities on-the-fly which puts a burden on the callers to handle this correctly. Second, not all configuration options apply to all subdevices. If the Intuos is used as a touchpad you may want natural scrolling enabled on the touchpad but the wheel on the Wacom mouse should probably still work normally. Third, the subdevices may have different PID/VIDs and certainly have different udev devices. So now libinput needs a way to get to those. In short, a merged device looks nice in theory but the implementation of it would make the libinput API cumbersome to use for little benefit.

The solution to this is device groups: each device in libinput is now part of a struct libinput_device_group. This is just an opaque object that doesn't do anything but sit there, but it's enough to identify how devices are grouped together. If two devices return the same device group, they logically belong together. The caller can then decide what to do with it, e.g. loop through all devices of a group to apply a certain configuration setting to all devices. The basic approach is thus:


new_device = libinput_event_get_device(event);
new_group = libinput_device_get_device_group(new_device);
/* keep the group alive for as long as we store it */
libinput_device_group_ref(new_group);

/* pseudocode: walk whatever container the caller keeps its devices in */
for each (device, group) in previously_stored_devices {
        if (group == new_group)
                printf("This device shares a group with %s\n",
                       libinput_device_get_name(device));
}

A device group's lifetime is as you'd expect: it is created for the first device in the group and ceases to exist in libinput once the last device in the group is removed. The object itself isn't freed until the last reference is dropped, but it won't get recycled. In other words, if you keep unplugging and re-plugging that Intuos tablet, the device group will be new after every plug.

Note that we're intentionally not providing ways to get the devices from a device group, or to count the devices within a group, etc. This avoids race conditions (the view libinput has of the devices isn't the same as the caller has while going through the event queue) but it also makes the API simpler. libinput's callers are mainly compositors which use toolkits with advanced data structures (glib, Qt, etc.). Using a pointer as key into a hashmap is simpler and less buggy than using whatever hand-crafted hashmap/list implementation we can provide through the libinput API.
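
To make the "pointer as key into a hashmap" idea concrete, here is a small Python sketch of the caller-side bookkeeping. It does not use libinput at all: FakeGroup and FakeDevice are hypothetical stand-ins for the opaque group handle and the device, and the helper names are made up for this illustration.

```python
from collections import defaultdict
from typing import DefaultDict, List


class FakeGroup:
    """Stand-in for an opaque device group handle (hashed by identity)."""


class FakeDevice:
    """Stand-in for a device that knows which group it belongs to."""

    def __init__(self, name: str, group: FakeGroup) -> None:
        self.name = name
        self.group = group


Groups = DefaultDict[FakeGroup, List[FakeDevice]]


def add_device(groups: Groups, device: FakeDevice) -> List[FakeDevice]:
    """Store a new device and return the devices that already share its group."""
    siblings = list(groups[device.group])
    groups[device.group].append(device)
    return siblings


def remove_device(groups: Groups, device: FakeDevice) -> None:
    """Forget a device; drop the group entry once its last device is gone."""
    groups[device.group].remove(device)
    if not groups[device.group]:
        del groups[device.group]


if __name__ == "__main__":
    groups: Groups = defaultdict(list)
    tablet = FakeGroup()
    add_device(groups, FakeDevice("Intuos pad", tablet))
    for sibling in add_device(groups, FakeDevice("Intuos touch", tablet)):
        print("This device shares a group with", sibling.name)
```

A left-handed setting could then be applied by looping over groups[device.group], which is the kind of caller-side loop the paragraph above has in mind.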

January 29, 2015

Lenovo released a new set of laptops for 2015 with a new (old) feature: the trackpoint device has the physical buttons back. Last year's experiment apparently didn't work out so well.

What do we do in Linux with the last generation's touchpads? The kernel marks them with INPUT_PROP_TOPBUTTONPAD based on the PNPID [1]. In the X.Org synaptics driver and libinput we take that property and emulate top software buttons based on it. That took us a while to get sorted and feed into the myriad of Linux distributions out there but at some point last year the delicate balance of nature was restored and the touchpad-related rage dropped to the usual background noise.

Slow-forward to 2015 and Lenovo introduces the new series. In the absence of unnecessary creativity they are called the X1 Carbon 3rd, T450, T550, X250, W550, L450, etc. Lenovo did away with the un(der)-appreciated top software buttons and re-introduced physical buttons for the trackpoint. Now, that's bound to make everyone happy again. However, as we learned from Agent Smith, happiness is not the default state of humans so Lenovo made sure the harvest is safe.

What we expected to happen was that the trackpoint device has BTN_LEFT, BTN_MIDDLE, BTN_RIGHT and the touchpad has BTN_LEFT and is marked with INPUT_PROP_BUTTONPAD (i.e. it is a Clickpad). That is the case on the x220 generation and the T440 generation. Though the latter doesn't actually have trackpoint buttons and we emulated them in software.

On the X1 Carbon 3rd, the trackpoint has BTN_LEFT, BTN_MIDDLE, BTN_RIGHT but they never send events. The touchpad has BTN_LEFT and BTN_0, BTN_1 and BTN_2 [2]. Clicking the left button on the trackpoint generates BTN_0 on the touchpad device, clicking the right button generates BTN_1 on the touchpad device. So in short, Lenovo has decided to wire the newly re-introduced trackpoint buttons to the touchpad, not the trackpoint. [3] The middle button is currently dead, which is a kernel bug. Meanwhile we think of it as security feature - never accidentally paste your password into your IRC session again!
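
The rerouting this requires can be sketched as a simple lookup. This is an illustration only, not the actual kernel, udev or libinput patches. The numeric values are the button codes from the kernel's input event codes header. The BTN_0 and BTN_1 mappings are the ones described above; treating BTN_2 as the middle button is an assumption, since that button is currently dead.

```python
# Button codes as defined in the kernel's input event codes header.
BTN_0, BTN_1, BTN_2 = 0x100, 0x101, 0x102
BTN_LEFT, BTN_RIGHT, BTN_MIDDLE = 0x110, 0x111, 0x112

# Touchpad button code -> button the trackpoint should appear to send.
TOUCHPAD_TO_TRACKPOINT = {
    BTN_0: BTN_LEFT,
    BTN_1: BTN_RIGHT,
    BTN_2: BTN_MIDDLE,  # assumption: the middle button does not work yet
}


def route_button(code: int) -> tuple:
    """Return (logical device, button code) for a button event on the touchpad node."""
    if code in TOUCHPAD_TO_TRACKPOINT:
        # Physically a trackpoint button, just wired to the touchpad.
        return ("trackpoint", TOUCHPAD_TO_TRACKPOINT[code])
    # Everything else (e.g. the clickpad's own BTN_LEFT) stays on the touchpad.
    return ("touchpad", code)


if __name__ == "__main__":
    for code in (BTN_0, BTN_1, BTN_LEFT):
        print(hex(code), "->", route_button(code))
```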

What does this mean for us? Neither synaptics nor evdev nor libinput currently support this so we've been busy aipodae and writing patches like crazy. The patch goes into the kernel and udev.... The two patches needed go into the kernel and udev, and libinput. No, the three patches needed go into the kernel, udev and libinput, and synaptics. The four patches, no, wait. Amongst the projects needing patches are the kernel, udev, libinput and synaptics. I'll try again:

With those put together, things pretty much work as they're supposed to. libinput handles middle button scrolling as well this way but synaptics won't, much for the same reason it didn't work in the previous generation: synaptics can't talk to evdev and vice versa. And given that synaptics is on life support or in palliative care, depending how you look at it, I recommend not holding your breath for a fix. Otherwise you may join it quickly.

Note that all the patches are fresh off the presses and there may be a few bits changing before they are done. If you absolutely can't live without the trackpoint's buttons you can work around it by disabling the synaptics kernel driver until the patches have trickled down to your distribution.

The tracking bug for all this is Bug 88609. Feel free to CC yourself on it. Bring popcorn.

Final note: I haven't seen logs from the T450, T550, ... devices yet, so this is only confirmed on the X1 Carbon so far. Given the hardware is essentially identical I expect it to be true for the rest of the series though.

[1] We also apply quirks for the 2013 generation because the firmware was buggy - a problem Synaptics Inc. has since fixed (but currently gives us slight headaches).
[2] It is also marked with INPUT_PROP_TOPBUTTONPAD which is a bug. It uses a new PNPID but one that was in the range we previously believed was for pads without trackpoint buttons. That's an easy thing to fix.
[3] The reason for that seems to be HW design: this way they can keep the same case/keyboard and just swap the touchpad bits.
[4] synaptics is old enough to support dedicated scroll buttons. Those buttons used to send BTN_0 and BTN_1, which are thus interpreted as scroll up/down events.

January 28, 2015
Another kernel release is imminent and a lot of things happened since my last big blog post about atomic modeset. Read on for what new shiny things 3.20 will bring this area.

Atomic IOCTL and Properties


The big thing for sure is that the actual atomic IOCTL from Ville has finally landed for 3.20. That together with all the work from Rob Clark to add all the new atomic properties to make it functional (there's no IOCTL structs for any standard values, everything is a property now) means userspace can finally start using atomic. Well, it's still hidden behind the experimental module option drm.atomic=1 but otherwise it's all there. There's a few big differences compared to earlier iterations:
  • Core properties are now fully handled by the core, drivers can only decode their own properties. This should mean that the handling of standardized properties is more uniform, and it also makes it easier to extend (e.g. with the standardized rotation property we already have) beyond the parameters the legacy interfaces provided.
  • Atomic props&ioctl are opt-in per file_priv, userspace needs to explicitly ask for it. This is the same idea as with universal plane support and will make sure that old userspace doesn't get confused and fall over when it would see all the new atomic properties. In the same spirit some of the legacy-only properties (like DPMS) will be rejected in the atomic IOCTL paths.
  • Atomic modesets are currently not possible since the exact ABI for how to handle the mode property is still under discussion. The big missing thing is handling blob properties, which will also be needed to upload gamma table updates through the atomic interface.
Another missing piece that's now resolved is the DPMS support. On a high level this greatly reduces the complexity of the legacy DPMS settings into a simple per-CRTC boolean. Contemporary userspace really wants nothing more, and a simple on/off is the only thing that current hardware can support anyway. Furthermore all the bookkeeping is done in the helpers, which call down into drivers to enable or disable entire display pipelines as needed. This means that drivers can rip out tons of code for the legacy DPMS support by just wiring up drm_atomic_helper_connector_dpms.

New Driver Conversions and Backend Hooks


Another big thing for 3.20 is that driver support has moved forward greatly: Tegra has now most of the basic bits ready, MSM is already converted. Both still lack conversion and testing for DPMS since that landed very late though. There's a lot of prep work for exynos, but unfortunately nothing yet towards atomic support. And i915 is in the process of being converted to the new structures and will have very preliminary atomic plane updates support in 3.20, hidden behind a module option for now.

And all that work resulted in piles of little fixes. Like making sure that the legacy cursor IOCTLs don't suddenly stall a lot more, breaking old userspace. But we also added a bunch more hooks for driver backends to simplify driver code:
  • For drivers there's often a very big difference between a plane update and disabling a plane. And with atomic state objects it's a bit more work to figure out when exactly you're doing an on-to-off transition. Thierry Reding added a new optional ->atomic_plane_disable() hook for drivers which will take care of all those distinctions.
  • Mostly for hysterical raisins going back to the original Xrandr implementations for userspace mode setting, the callbacks to enable and disable encoders and CRTCs used by the various helper libraries have really confusing names. And with the legacy helpers they have all kinds of strange semantics. Since the atomic helpers massively simplify things in this area it made a lot of sense to add a new set of ->enable() and ->disable() hooks, which are preferred if they're set. All the other hooks (namely ->prepare(), ->commit() and ->dpms()) will eventually be deprecated for atomic drivers. Note that ->mode_set is already superseded by ->mode_set_nofb due to the explicit primary plane handling with atomic updates.
Finally driver conversions showed that vblank handling requirements imposed by the atomic helpers are a lot stricter than what userspace tends to cope with. i915 has recently reworked all its vblank handling and improved the core helpers with the addition of drm_crtc_vblank_off() and drm_crtc_vblank_on(). If these are used in the CRTC disable and enable hooks the vblank code will automatically reject vblank waits when the pipe is off. Which is the behaviour both the atomic helpers and the transitional helpers expect.

One complication is that on load the driver must manually ensure that the vblank state matches the hardware and atomic software state by calling these functions itself. In the simple case where drivers reset everything to off (which is what the reset implementations provided by the atomic helpers presume) this just means calling drm_crtc_vblank_off() somewhere in the driver load code. Drivers that read out the actual hardware state need to call either _off() or _on(), matching the actual display pipe status.

Future Work


Of course there's still a few things left to do before atomic display updates can be rolled out to the masses. And a few things that would be rather nice to have, too:
  • Support for blob properties so that modesets and a bunch of other neat things can finally be done.
  • Testing, and lots of it, before we enable atomic by default.
  • Thanks to DP MST, connectors can be hotplugged, and there are still a lot of lifetime issues surrounding connector handling all over the drm subsystem. Atomic display code is unfortunately no exception.
  • And finally try to make the vblank code a bit more generally useful and then implement generic async atomic commit on top of that.
It promises to stay interesting!
January 27, 2015

Yesterday I released version 0.8 of AppStream, the cross-distribution standard for software metadata that is currently used by GNOME-Software, Muon and Apper to display rich metadata about applications and other software components.

 What’s new?

The new release contains some tweaks to AppStream's documentation, and extends the specification with a few more tags and refinements. For example we now recommend sizes for screenshots. The recommended sizes are the ones GNOME-Software already uses today, and it is a good idea to ship those to make software-centers look great, as other SCs are planning to use them as well. Normal sizes as well as sizes for HiDPI displays are defined. This change affects only the distribution-generated data, the upstream metadata is unaffected by this (the distro-specific metadata generator will resize the screenshots anyway).

Another addition to the spec is the introduction of an optional <source_pkgname/> tag, which holds the source package name the packages defined in <pkgname/> tags are built from. This is mainly for internal use by the distributor, e.g. it can decide to use this information to link to internal resources (like bugtrackers, package-watch etc.). It may also be used by software-center applications as additional information to group software components.

Furthermore, we introduced a <bundle/> tag for future use with 3rd-party application installation solutions. The tag notifies a software-installer about the presence of a 3rd-party application bundle, and provides the necessary information on how to install it. In order to do that, the software-center needs to support the respective installation solution. Currently, the Limba project and Xdg-App bundles are supported. For software managers, it is a good idea to implement support for 3rd-party app installers, as soon as the solutions are ready. Currently, the projects are worked on heavily. The new tag is currently already used by Limba, which is the reason why it depends on the latest AppStream release.

How do I get it?

All AppStream libraries, libappstream, libappstream-qt and libappstream-glib, support the 0.8 specification in their latest version – so in case you are using one of these, you don’t need to do anything. For Debian, the DEP-11 spec is being updated at this time, and the changes will land in the DEP-11 tools soon.

Improve your metadata!

This call goes especially to many KDE projects! Getting good data is partly a task for the distributor, since packaging issues can result in incorrect or broken data, screenshots need to be properly resized etc. However, flawed upstream data can also prevent software from being shown, since software with broken or missing data will not be incorporated in the distro XML AppStream data file.

Richard Hughes of Fedora has created a nice overview of software failing to be included. You can see the failed-list here – the data can be filtered by desktop environment etc. For KDE projects, a Comment= field is often missing in their .desktop files (or a <summary/> tag needs to be added to their AppStream upstream XML file). Keep in mind that you are not only helping Fedora by fixing these issues, but also all other distributions consuming the metadata you ship upstream.

For Debian, we will have a similar overview soon, since it is also a very helpful tool to find packaging issues.

If you want to get more information on how to improve your upstream metadata, and what new metadata should look like, take a look at the quickstart guide in the AppStream documentation.

Beware of promo-newa.com, they scammed my colleague and friend Richard. As someone with experience in importing from China I know how scary and risky it can be so I completely sympathize with them. Apparently they sent hacked 96Mb flash drives that reported to be 1Gb flash drives.

Let's make sure the internet is filled with references to this scam. Also, if you live in Shenzhen and/or can think of any way of helping him, that'd be really nice.

At the very least, make sure you share this post around in your preferred social media wall!

Yesterday I arrived in Cambridge to attend the DevX hackfest. Loads of good stuff going on, I am mostly focusing on trying to integrate the hundreds of ignored pull requests we're getting in Github's mirror with Bugzilla automatically. In the meantime there are loads of interesting discussions about sandboxing, Builder, docs and mallard balls being thrown all over the place and hitting my face (thanks Kat).

It is really nice to catch up with everyone. We went for dinner to a pretty good Korean place, and I should thank Codethink for kindly sponsoring the dinner. Afterwards we went to The Eagle pub, apparently the place where DNA discovery was celebrated and discussed.

And this morning we are celebrating Christian Hergert making it to the 50K stretch goal just before the end of the crowdfunding campaign for GNOME Builder.

I would like to thank my employer, Red Hat, for sponsoring my trip here too.

January 26, 2015

Jobs at Red Hat
So I got a LOT of responses to my blog post about the open positions we have here at Red Hat working on Fedora and the Desktop. In fact I got so many it will probably take a bit of time before we can work through them all. So you might have to wait a little bit before getting a response from us. Anyway, thank you to everyone who sent me their CV, much appreciated and looking forward to working with those of you we end up hiring!

Builder campaign closes in 13 hours
I want to make one last pitch for everyone to contribute to the Builder crowdfunding campaign. It has just passed 47 000 USD as I write this, which means we just need another 3000 USD to reach the graphical debugger stretch goal. Don’t miss out on this opportunity to help this exciting open source project!

January 22, 2015

It seems we can't ever get rid of the issues with this series. Daniel Martin filed a kernel bug for the latest series of these devices (Oct 2014) and it looks like they all need manual fixing again.

When the *40 series first came out, the PS/2 firmware was buggy and advertised bogus coordinate ranges for the x/y axes. Since we use those coordinate ranges to set up the size and position of software buttons (very much needed since that series did away with the physical trackpoint buttons) we added kernel patches for each of those laptops. The kernel would match on the PNPID (e.g. LEN0036 on a T440) and fix the min/max range for both axes based on actual measurements. This requires someone to have a laptop, measure it, file a bug or send a patch, wait for it to get into the kernel and then wait for it to get into distros, so it took quite a while to get all models supported.
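As a rough illustration of the quirk mechanism described above (not the actual kernel code, which is C), the idea can be sketched as a lookup keyed on the PNPID that overrides whatever the firmware reports. The table name, the function name and the numbers are assumptions made up for this sketch; only the LEN0036 ID comes from the text.

```python
# Hypothetical quirk table: PNPID -> (x_min, x_max, y_min, y_max).
# The numbers are placeholders for illustration, not real measurements.
RANGE_QUIRKS = {
    "LEN0036": (1000, 5000, 1000, 4000),
}


def effective_ranges(pnp_id, firmware_ranges):
    """Return the measured ranges if a quirk exists, else trust the firmware."""
    return RANGE_QUIRKS.get(pnp_id, firmware_ranges)
```

Because the lookup is keyed on the PNPID alone, a hardware refresh that reuses the same ID would pick up the old override even if its real ranges differ.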

Lenovo has updated the series in Oct 2014 and it's starting to get in the hands of users. And as it turns out, the touchpads now have different coordinate ranges but the same PNPID. And the values reported by the firmware are still bogus, so we need the same quirk, again, for each touchpad. Update 22/01/15: looks like the ranges are correct enough, so we don't need to update all ranges separately.

So in short: if you have one of the latest series *40 touchpads, your touchpad software buttons will be off. CC yourself on the kernel bug and if you have a model that's not listed there yet, add the required data. Eventually that will end up in the kernel and then everything is hunky-dory again. Until then, have a drink on behalf of the Synaptics/Lenovo QA departments.

Now the obvious question: why does this work with Windows? They don't use the PS/2 protocol but the SMBus/RMI4 interface and thus PS/2 firmware correctness is apparently not top priority for the QA departments. But the SMBus protocol requires the Host Notify feature, which caused Synaptics to reimplement the i2c driver for Windows. And that's what is shipped/preinstalled as driver. We don't support Host Notify on Linux, so there goes that idea. But there's strong suspicion that's not the only piece of the puzzle that's missing anyway...

Update 22/01/15: The min/max ranges advertised seem to be correct in the newer versions which would indicate that Synaptics has fixed the firmware. That's great (except for re-using the PNPID). Now we need to just detect that and drop the quirks for the newer touchpads. Hans has a good suggestion for how to do this, so with a bit of luck this will end up being only one kernel patch instead of one per device.

January 21, 2015

So Red Hat is currently looking to hire into the various teams building and supporting efforts such as the Fedora Workstation, the Red Hat Enterprise Linux Workstation and of course Fedora and RHEL in general. We are looking at hiring around 6-7 people to help move these and other Red Hat efforts forward. We are open to candidates from any country where Red Hat has a presence, but for a subset of the positions relocating to our main engineering offices in Brno, Czech Republic or Westford, Massachusetts, USA, will be a requirement, or candidates interested in that will be given preference. We are looking for a mix of experienced and junior candidates, so regardless of whether you are fresh out of school or have been around for a while this might be for you.

So instead of providing a list of job descriptions, what I want to do here is list the various skills and backgrounds we want, and then we will adjust the exact role of the candidates we end up hiring depending on the mix of skills we find. So this might be for you if you are a software developer and have one or more of these skills or backgrounds:

* Able to program in C
* Able to program in Ruby
* Able to program in Javascript
* Able to program in Assembly
* Able to program in Python
* Experience with Linux Kernel development
* Experience with GTK+
* Experience with Wayland
* Experience with x.org
* Experience with developing for PPC architecture
* Experience with compiler optimisations
* Experience with llvm-pipe
* Experience with SPICE
* Experience with developing software like Virtualbox, VNC, RDP or similar
* Experience with building web services
* Experience with OpenGL development
* Experience with release engineering
* Experience with Project Atomic
* Experience with graphics driver enablement
* Experience with other PC hardware enablement
* Experience with enterprise software management tools like Satellite or ManageIQ
* Experience with accessibility software
* Experience with RPM packaging
* Experience with Fedora
* Experience with Red Hat Enterprise Linux
* Experience with GNOME

It should be clear from the list above that we are not just looking for people with a background in desktop development this time around; two of the positions, for instance, will mostly be dealing with Linux kernel development. We are looking for people here who can help us make sure the latest and greatest laptops on the market work great with Fedora and RHEL, be that on the graphics side or in terms of general hardware enablement. These jobs will put you right in the middle of the action in terms of defining the future of the 3 Fedora variants, especially the Workstation; defining the future of Red Hat's Enterprise Linux Workstation products and pushing the Linux platform in general forward.

If you are interested in talking with us about if we can find a place for you in Red Hat as part of this hiring round please send me your CV and some words about yourself and I will make sure to put you in contact with our recruiters. And if you are unsure about what kind of things we work on here I suggest taking a look at my latest blog about our Fedora Workstation 22 efforts to see a small sample of some of the things we are currently working on.

You can reach me at cschalle(at)redhat(dot)com.

January 19, 2015

A Fedora 22 feature is to use the libinput X.Org driver as default driver for all non-tablet devices. This replaces the current defaults of synaptics for touchpads and evdev for anything else (tablets usually use the wacom driver, if installed). As expected, changing a default has some repercussions both for users and for developers. These are outlined below, based on the libinput 0.8 release. Future versions may add features, so check with your latest local version. Generally, the behaviour should roughly stay the same, big changes such as devices not being detected or not working is most likely a bug. Some behaviours are new, e.g. always-on palm detection, top software buttons on specific touchpads, etc. If in doubt, check the libinput documentation for hints on whether something is supposed to work in a particular manner.

Changes visible to Users

Any custom xorg.conf snippets will cease to work if they are properly stacked. Options set by snippets are almost always exclusive to one particular driver. When the default driver changes, the snippet may not apply to the device anymore. Whether a snippet stops working depends on whether the Driver line is present. Consider this example snippet:

Section "InputClass"
    Identifier "enable tapping"
    MatchProduct "my touchpad"
    Driver "synaptics"
    Option "TapButton1" "1"
EndSection

This snippet does two things: it assigns the synaptics driver to the "my touchpad" device and sets the option TapButton1. The assignment will override the default libinput assignment, i.e. this device won't change behaviour, you just don't get to use any new features. If the Driver line is not present then this snippet won't do anything, because the libinput driver does not provide a TapButton1 option. It is safe to leave the snippet in place, options that are not supported by a driver are simply ignored.

The xf86-input-libinput man page has a list of options that can be set. For example, the above snippet would have an equivalent as


Section "InputClass"
    Identifier "enable tapping"
    MatchDriver "libinput"
    MatchProduct "my touchpad"
    Option "Tapping" "on"
EndSection

Note that this matches on a driver rather than assigning the driver. Since options are driver-specific this is the correct approach.

The other visible change is a difference in default pointer speed. We have fine-tuning pointer acceleration on our TODO lists, but we're not quite there yet and any help would be appreciated. In the meantime you may see changes in pointer acceleration.

Finally, you may see certain features have gone the way of the dodo. Specifically the touchpad code exposes far fewer knobs to tweak. Most of that is intentional, some of it may warrant discussion. If there is a particular feature that you are missing or that doesn't work as you want it to, please file a bug.

Changes visible to developers

These changes affect desktop environments, specifically the part that configures input devices. The changes affect four categories: pointer acceleration, button mapping, touchpad disabling and device properties. The property "libinput Send Events Modes Available" exists on all devices, it can be used to determine if a device is handled by the libinput driver.

Pointer acceleration

The X server exposes a variety of knobs for its pointer acceleration code. The oldest knob (and specified in the core protocol) is the XChangePointerControl request. In some environments this is exposed as a single slider, in others it's split into multiple settings (Acceleration and Threshold, for example).

libinput does away with this and only exposes a single 1-value float property "libinput Accel Speed" with a range of -1 (slowest) to 1 (fastest). The XChangePointerControl request has no effect on a libinput device. It is up to you how to map the current speed mappings into the [-1, 1] range.
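One simple way to do that mapping is a linear rescale of an existing slider onto the [-1, 1] range. This is only a sketch under that assumption: the function and parameter names are hypothetical, and a desktop environment may well prefer a non-linear mapping that better preserves how its old settings felt.

```python
def slider_to_accel_speed(value, slider_min, slider_max):
    """Linearly map a slider position onto the [-1, 1] speed range."""
    if slider_max <= slider_min:
        raise ValueError("slider_max must be greater than slider_min")
    # Clamp first so out-of-range UI values never leave [-1, 1].
    clamped = min(max(value, slider_min), slider_max)
    fraction = (clamped - slider_min) / (slider_max - slider_min)
    return 2.0 * fraction - 1.0
```

The result would then be written as the single float value of the "libinput Accel Speed" property; how the property is written (Xlib, xinput, a toolkit wrapper) is left out of the sketch.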

Button mapping

The X server provides button mapping through the XSetPointerMapping request. This is most commonly used to apply a left-handed configuration and to enable natural scrolling. The call will continue to work with the libinput driver, but better methods are available.

The property "libinput Left Handed Enabled" takes a single boolean 8-bit value to enable and disable left-handed mode. Unlike the X request this will automatically take care of the tapping configuration (and other things in the future). If the property is not available on a device, that device has no left-handed mode.

The property "libinput Natural Scrolling Enabled" takes a single boolean 8-bit value to enable and disable natural scrolling. This applies to smooth scrolling and legacy button scrolling (which the libinput driver doesn't do anyway). If the property is not available on a device, that device has no natural scrolling mode.

Touchpad disabling

In the synaptics driver, disabling the touchpad is usually done with the "Synaptics Off" property. This is used by syndaemon to turn the touchpad off while typing. libinput does this by default, so it is safe to simply ignore this altogether. Don't bother starting syndaemon, it won't control the libinput driver.

Device properties

Any code that handles a driver-specific property (prefixed by "evdev" or "synaptics") will stop working. These properties are not exposed by the libinput driver (we tried, it was not viable). KDE's kcm_touchpad module is a particularly bad offender here, it exposes almost every toggle the driver ever had. Make sure the code behaves well if the properties are not present and you're basically good to go.

If you decide to handle libinput-specific properties, the general rule is: if a single-value property is not present, that configuration does not apply to this device. Bitmask-style properties are split into a "libinput foo Available" and a "libinput foo Enabled" property. The former lists the available settings, the latter enables a specific setting. Have a look at the xf86-input-libinput source for details on each property.
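The rules above can be sketched as a few small checks. This is a minimal sketch that assumes the device's properties have already been read into a dict mapping property name to a list of integers; how they are read is out of scope, and the helper names are hypothetical. "libinput foo Available" is the placeholder name from the text, not a real property.

```python
def is_libinput_device(props):
    """The text above names this property as present on all libinput devices."""
    return "libinput Send Events Modes Available" in props


def single_value_setting(props, name):
    """Return the property value, or None if the setting does not apply."""
    return props.get(name)


def can_enable(props, available_name, index):
    """Check the 'Available' half of a bitmask-style property pair."""
    available = props.get(available_name)
    if available is None:
        return False  # property missing: feature does not apply to this device
    return 0 <= index < len(available) and bool(available[index])
```

For example, with props = {"libinput foo Available": [1, 0]}, can_enable(props, "libinput foo Available", 0) is true while index 1 is rejected, so a settings UI would only offer the first option.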

So Fedora Workstation 21 is done and out and I am extremely pleased to see the positive reception and great reviews. But we are not resting on our laurels here and are already busy planning for the Fedora Workstation 22 release. As many of you might know, Fedora Workstation 22 is going to come up relatively fast, so we only have about 6 more weeks of development on it before the feature freezes start to kick in. Luckily we have a relatively long list of items that we started working on during the Fedora Workstation 21 cycle that are nearing completion and thus should make the next release. We are of course also working on bigger long term developments that you should maybe see the first outline of in Fedora 22, but not the final version. I thought it would be nice to summarize some of the bigger items we expect to land and link to the relevant blogs and announcements for each one.

Wayland
So first out is to give an update on our work on Wayland as I know that is something a lot of people are curious about. We are continuing to make great strides forward and recently hired Jonas Ådahl to the team, who many might recognize as an active Wayland and libinput developer. He will be spearheading our overall Wayland effort as we are approaching the finish line. All in all things are looking good, we got a lot of the basic plumbing in place for Fedora Workstation 21, so most work these days is focused on polish and cleanups. One of the bigger items is the migration to use libinput. libinput is a library we decided to create to be able to share input device handling between X and Wayland and thus make the transition smoother and lower our workload during the transition period. Libinput itself is getting very close to feature complete and the developers are even working on some new features for it now, taking it beyond what was in X. Peter Hutterer recently released version 0.8 and we expect to have 1.0 out and in use for both X.org and Wayland in time for Fedora Workstation 22.
In parallel we are also working on porting the needed bits in GNOME over to use libinput and remove any lingering X dependencies, like the GNOME Control Center which should also be ready for Fedora Workstation 22.

Another major change related to Wayland in Fedora Workstation 22 is that we will switch the default backends in GTK+ and SDL over to using Wayland. Currently in Fedora Workstation 21 applications are actually running on top of XWayland, but in Fedora Workstation 22 at least GTK+ and SDL applications will default to Wayland when run under the Wayland session.
The Wayland SDL backend has been around for quite a bit, but Jonas Ådahl plans on spending some time smoothing out the last rough edges. In fact, for SDL applications we hope we can actually provide a noticeable performance improvement over X in some cases (not because OpenGL will be faster of course, but because we might be able to be smarter about handling different resolutions between desktop and game), but we have to wait and see if that pans out or if we have to settle for performance parity with X. We are also looking at getting the login session to use Wayland by default. All in all this should take us a huge step forward towards making using Wayland feel real.

As it looks now Wayland should be quite close to what you would define as feature complete for Fedora Workstation 22, but one thing that is going to take longer to reach maturity is the support for binary drivers, especially the NVidia ones. This of course is a task that mostly falls on NVidia for natural reasons, but we are trying to help out by having Adam Jackson work on making sure Mesa works with their EGLStreams and OpenGL Dispatcher proposals. So during the course of the coming year we will likely have a situation where you will be able to have a production ready Wayland session if you are running any of the open source drivers, but if you want to run Wayland on top of the NVidia binary driver that is most likely to only really be possible towards the end of the year. That said, this is a guesstimate from our side as to how quickly the heavy lifting will happen, and how quickly it will be released by NVidia for public consumption is of course all relying on internal plans and resources at NVidia and not something we control.

Battery life
One thing we know, being developers ourselves and from speaking with developers about their operating system of choice, is that battery life is among the top 5 reasons for the choices people make about their hardware and software. Due to this Owen Taylor has been investigating for some time now what solutions exist today, what other operating systems are doing and what approaches we can take to improve battery life. A common complaint we hear from a lot of people is that they don’t feel they get great battery life when running linux on their laptops currently. Some people are able to solve this using powertop, but we feel there is a lot of room for automatically giving our users better battery life beyond manual tweaking using powertop.

Improving battery life is a complex issue in many ways, including figuring out how to measure battery life. I guess everyone has seen laptops advertised with X number of hours of battery life, but it is our impression that those numbers tend to be quite bogus even when running the bundled operating system. In some testing we have done we concluded that the worst offenders' numbers could only be true if you left your laptop idle in the corner with the screen blacked out. So gnome-battery-bench will help us achieve a couple of things: it should generate comparable battery lifetime numbers, which should help our users choose the hardware that gives the best battery life under linux, and it also lets us as developers keep tracking how changes affect battery life so that we can catch regressions, for instance. It also lets us verify the effect various kernel tuneables or ambient light detection schemes have on battery life in a better way than we can with existing tools. We also hope to use this to work with vendors to improve the battery life of their hardware when running Fedora or RHEL. Anyway, I suggest reading Owen Taylor's blog for some more details of his work on improving battery life.

One important effort we want to undertake here, which might not all make it for Fedora Workstation 22, is taking advantage of the ambient light detectors in many modern laptops. One of the biggest battery drains in your system is the screen brightness setting, and by using the ambient light detection hardware we hope to be able to put in place some intelligent behavior for different situations. This is a hard problem though. An attempt was made to solve it in GNOME before, but the end result back then was that people felt they were “fighting” GNOME over their laptop brightness settings, so in the end it was dropped completely. We need to be careful not to repeat that outcome.

Application bundles
Another major effort that is not going to be ready for Fedora Workstation 22, but which we might have some preview of, is Application bundles. Matthias Clasen recently sent out an email to the Fedora Desktop mailing list outlining the state of the application bundles work. This is a continuation of the Sandboxed Applications in GNOME proposal from Lennart Poettering. The effort is being spearheaded by Alexander Larsson and the goal is to build the infrastructure needed to do sandboxed desktop applications efficiently. There is a wiki page up already detailing Sandboxed Apps and there are some test applications already available. For instance you can grab an application bundle of Builder, the cool new IDE project from Christian Hergert. (Hint, make sure to support his Builder crowdfunding effort if you have not already.) Once this effort matures it will revolutionize how desktop applications are built and distributed. It should make life easier for application developers as these bundled applications are designed to be distribution agnostic and the sandboxing aspect should help improve security. Also the transition should put the application developers directly in charge of the update cycle of their applications, enabling them to better support their users.

3rd Party Application handling
So the ever resourceful Richard Hughes has been working on adding support for handling 3rd party applications in GNOME Software. He outlined this effort in a recent blog post about GNOME Software.

While the end goal here is to offer 3rd party application bundles as described in the section above, the feature also has a lot of near term advantages. We have seen that over the course of the last years we moved from a model where you use your browser to search for software online to users expecting to find all software available for a system through its app store. With this 3rd party application support available in GNOME Software we can start working to make that expectation a reality also in Fedora. We took great strides forward in Fedora Workstation 21 with having metadata available for most of the standard applications packaged in Fedora, but there are also a lot of popular applications and other things out there that people tend to install and use which we for various reasons are not interested in or able to ship in Fedora. The reasons for this can range from licensing issues, to packaging issues, to simply resource issues. With Richard's work we will be able to make such software discoverable in Fedora, yet make a clear distinction between the software we have vetted and checked and the software you get from 3rd parties.

How to deal with 3rd party software has been a long and ongoing discussion in the Fedora community, and there are a lot of practical and principal details to deal with, but hopefully with this infrastructure in place it will be a lot easier to navigate those issues as people have something concrete to relate to instead of just abstract ideas and concepts.

One challenge we have to figure out, for instance, is that on one side we don’t want 3rd party software offered in Fedora to be some form of endorsement or sign of being somehow vetted by the Fedora Project on an ongoing basis, yet on the other side the list will most likely need to be based on some form of editorial process to, for instance, protect both Red Hat and Fedora from potential legal threats. I plan on sending an initial proposal to the Workstation Working Group soon for how this can work, and once we have hashed out the details there we will need to bring the Working Group's proposal into the wider Fedora community as this also affects our Cloud and Server offerings.

File Manager
A lot of people these days use Google Drive, be that personally or because their company has a corporate Google apps account. So to make life easier for our users we are making sure that Nautilus is able to treat your Google drive as just another drive on the system, letting you drag and drop files off or on it. We also dedicated some effort to clean up and modernize the file manager in general, with Carlos Soriano blogging about his efforts there just before Christmas. All in all I think these are improvements that should improve the life of our developer and sysadmin target audience, but of course they are also very useful improvements to the general linux using public.

Qt Theming
One of the things we had to postpone for Fedora Workstation 21 was the Adwaita theme for Qt applications. We are expecting it to hit Fedora Workstation 22 though, and you can get the theme to install and test from Martin Briza's Copr repository. The end goal here is that whether you run a pure Qt application like Skype or Scribus, or a KDE application like Krita or Amarok, you should get an Adwaita look and feel to the application. Of course desktop integration isn’t just about a theme, there is a reason the GNOME HIG exists, but this should be an improvement over the current situation. The theme currently targets Qt4, but of course Qt5 is also on the roadmap for a later release.

Further terminal improvements
As I mentioned in an earlier blog entry about Fedora Workstation, we realize that the terminal is the most important application for many developers and sysadmins. So we are also hoping to land some more of the terminal improvements we have been working on in Fedora Workstation 21. The notifications for long running jobs are maybe the thing I know a lot of developers are excited about getting their hands on. It will let you for instance start a long compile in a terminal and know that you will be notified once it is completed instead of having to manually check in from time to time.

More development tools
In my opinion the best IDE for Python development currently is PyCharm. And not only is it the best from a functionality standpoint, they also decided to release an open source version last year. That said, we have been struggling a bit with the packaging of PyCharm, and interestingly enough it is one of those applications I think will benefit greatly from the application bundle work we are doing, but in the meantime we at least do have a Copr of PyCharm available. It is still an open question, but we might make this Copr one of our testcases for the 3rd party application support in GNOME Software as mentioned earlier. Anyway, if you are a Python developer I strongly recommend taking a look at it. Personally I looked at various Python IDEs over the years, but always ended up just going back to Gedit. When trying PyCharm it was the first time I felt that the application actually offered me useful functionality beyond what a text editor like Gedit does. Also, recent versions deal well with the introspection based Python bindings for GTK3, which was a great boon for me.

We are also looking at improving the development story around Vagrant and doing Fedora and RHEL development, more details on that at a later point.

ABRT improvements
The ABRT tool has become a crucial development tool for us over the last couple of years. The Fedora Retrace server is one of our main tools for prioritizing which bugs to look at first and a crucial part of our goal of making Fedora a solid distribution. That said, especially in its early days, ABRT has had its share of detractors and people being a bit frustrated with it, so Bastien Nocera and Allan Day have been working with the ABRT team both to integrate it further with the desktop, for instance ensuring that it follows your desktop wide privacy settings, and to make sure that the user experience of submitting a retrace report is as smooth and intuitive as possible, not to mention as unobtrusive as possible; for instance you don’t want ABRT to choke your system while trying to generate a stack trace for us. The Fedora Workstation Tasklist contains links to bugzilla and github so you can track their progress.

Still a lot to do!
So making our vision for the Fedora Workstation come through takes of course a lot of effort from a lot of people. And we are really lucky to be part of such a great community where so much cool stuff is happening all the time. I mean the Builder effort from Christian Hergert as I talked about earlier is one of them, but there are so many other things happening too. So if you want to get involved take a look at our tasklist and see if there is anything that interests you or for that matter if there is something that you think should be worked on, but isn’t on the list yet. Then come join us either on #fedora-workstation on the freenode IRC network or join the fedora-desktop mailing list.

So I'm stuck somewhere on an airport and jetlagged on my return trip from my very first LCA in Auckland - awesome conference, except for being on the wrong side of the globe: Very broad range of speakers, awesome people all around, great organization and since it's still a community conference none of the marketing nonsense and sales-pitch keynotes.

I also gave a presentation about botching up ioctls. Compared to my blog post, it has a bunch more details on technical issues and some overall comments on what's really important to avoid a v2 ioctl because v1 ended up being unsalvageable. Slides and video (courtesy of the LCA video team).

January 15, 2015

libinput 0.8 was released yesterday. One feature I'd like to talk about here: the change to provide mouse wheel events as physical distances.

Mouse wheels are clicks. In the evdev protocol they're sent via the REL_WHEEL and REL_HWHEEL axes, with a value of 1 per click. Spinning the wheel fast enough will give you a higher value per event but it's still just a multiple of the physical clicks. This is a conundrum for libinput.

libinput exports scroll events as "axis" events, the value returned by libinput_event_pointer_get_axis_value() for touchpads and button scrolling is in "pixels". libinput doesn't have a concept of pixels of course but the compositor will likely take the relative motion and apply it to the cursor. Scroll events are in the same coordinate space, i.e. the scrolling for two-finger scrolling has the same feel as moving the pointer. This continuous coordinate space is at odds with the discrete values coming from a wheel. We added axis sources to the API so you can now tell whether an event was generated by the wheel or some other scroll methods. But still, the discrete values from a wheel are at odds with the other sources.

For libinput 0.8, we changed the default reporting mode of the wheel. For the click count, a new call libinput_event_pointer_get_axis_value_discrete() provides that number (positive or negative depending on the direction). The libinput_event_pointer_get_axis_value() call on a wheel now returns the movement of the wheel in degrees. That gives us a continuous coordinate space that also opens up new possibilities: if you know the rotation of a mouse wheel in degrees, you know things like "has the wheel been turned a full rotation". I don't quite know how, but I'm sure there are interesting interfaces you can make from that :)

Of course, the physical properties don't change, the degrees are always a multiple of the click count, and on most mice one click count is a 15 degree movement. The Logitech M325 is a notable exception here with a 20 degree angle. This isn't advertised by the hardware of course so we rely on the udev hwdb to set it where it differs from the default. The patch for this has been pushed to systemd and will soon arrive at a distribution near you.
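The arithmetic behind this is small enough to sketch. The following is an illustration only, not libinput code: the function names are hypothetical, the 15 degree default and the 20 degree M325 value come from the text above, and the per-device angle is assumed to be passed in by the caller.

```python
DEFAULT_CLICK_ANGLE = 15.0  # degrees per click on most mice, per the text above


def wheel_degrees(clicks, click_angle=DEFAULT_CLICK_ANGLE):
    """Convert a signed click count into a signed rotation in degrees."""
    return clicks * click_angle


def full_rotations(clicks, click_angle=DEFAULT_CLICK_ANGLE):
    """Number of complete 360 degree turns covered by the given clicks."""
    return int(abs(wheel_degrees(clicks, click_angle)) // 360)
```

With the default angle a full rotation takes 24 clicks, while a 20 degree wheel gets there in 18.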

And to answer a question I'm sure will come up: those mice that support a free spinning scrollwheel don't change the reporting mode. The G500s for example merely moves a physical bit that provides friction and the click feel/noise. The device still reports in 15 degree angle counts. I strongly suspect that other mice with this feature are the same (if not, we can now work this continuous motion into libinput and handle it properly).

January 14, 2015

After nearly 12 years working on Gentoo and hearing blathering about how “Gentoo is about choice” and “Gentoo is a metadistribution,” I’ve come to a conclusion to where we need to go if we want to remain viable as a Linux distribution.

If we want to have any relevance, we need to have focus. Everything for everybody is a guarantee that you’ll be nothing for nobody. So I’ve come up with three specific use cases for Gentoo that I’d like to see us focus on:

People developing software

As Gentoo comes, by default, with a guaranteed-working toolchain, it’s a natural fit for software developers. A few years back, I tried to set up a development environment on Ubuntu. It was unbelievably painful. More recently, I attempted the same on a Mac. Same result: a total nightmare if you aren’t building for Mac or iOS.

Gentoo, on the other hand, provides a proven-working development environment because you build everything from scratch as you install the OS. If you need headers or some library, it’s already there. No problem. Whereas I’ve attempted to get all of the barebones dev packages installed on many other systems and it’s been hugely painful.

Frankly, I’ve never come across as easy a dev environment as Gentoo, if you’ve managed to set it up as a user in the first place. And that’s the real problem.

People who need extreme flexibility (embedded, etc.)

Nearly 10 years ago, I founded the high-performance clustering project in Gentoo, because it was a fantastic fit for my needs as an end user in a higher-ed setting. As it turns out, it was also a good fit for a number of other folks, primarily in academia but also including the Adelie Linux team.

What we found was that you could get an extra 5% or so of performance out of building everything from scratch. At small scale that sounds absurd, but when that translates into 5-6 digits or more of infrastructure purchases, suddenly it makes a lot more sense.

In related environments, I worked on porting v5 of the Linux Terminal Server Project (LTSP) to Gentoo. This was the first version that was distro-native vs pretending to be a custom distro in its own right, and the lightweight footprint of a diskless terminal was a perfect fit for Gentoo.

In fact, around the same time I fit Gentoo onto a 1.8MB floppy-disk image, including either the dropbear SSH client or the kdrive X server for a graphical environment. This was only possible through the magic of the ROOT and PORTAGE_CONFIGROOT variables, which you couldn’t find in any other distro.

Other distros such as ChromeOS and CoreOS have taken similar advantage of Gentoo’s metadistribution nature to build heavily customized Linux distros.

People who want to learn how Linux works

Finally, another key use case for Gentoo is for people who really want to understand how Linux works. Because the installation handbook actually walks you through the entire process of installing a Linux distro by hand, you acquire a unique viewpoint and skillset regarding what it takes to run Linux, well beyond what other distros require. In fact I’d argue that it’s a uniquely portable and low-level skillset that you can apply much more broadly than those you could acquire elsewhere.

In conclusion

I’ve suggested three core use cases that I think Gentoo should focus on. If it doesn’t fit those use cases, I would suggest that we allow but not specifically dedicate effort to enabling those particulars.

We’ve gotten overly deadened to how people want to use Linux, and this is my proposal as to how we could regain it.


January 05, 2015

If you read this blog entry it is very likely that you are a direct beneficiary of open source and free software. Like myself you probably have been able to get hold of, use and tinker with software that in the old world of closed source dominance would all together have cost you maybe ten thousand dollars or more. So with the spirit of the Yuletide season fresh in mind it is time to open your wallet and support some important open source fundraising campaigns.

The first one is Builder, an effort by the unstoppable Christian Hergert to create a truly powerful and modern IDE for GNOME. Christian has already made huge strides forward with his project since quitting his day job to start it, and helping fund him to cross the finish line would be greatly beneficial to us all. And I think it would make a wonderful addition to the Fedora Workstation effort, so this is an easy way for you to help us move that effort forward too. So head over to the fundraiser webpage or start by viewing the great fundraiser video below:

The second effort I want to highlight is the still ongoing fundraiser for the PiTiVi video editor. Since they started that effort they have raised 22190 USD of the 35 000 USD they need to get PiTiVi to a level where they are confident to make a 1.0 release. And I think we all agree that having a top notch video editor available, especially one that uses GStreamer and thus helps improve our general multimedia story, is very important. This effort also has a nice introduction video if you want to know more:

I have personally contributed money to both these efforts and I hope you will too! Both projects are crucial for the long term health of the ecosystem and both are done by credible teams with the right skills to succeed. So for those of us out of school and in paying jobs, setting aside for example 100 USD to help these two efforts should be an easy choice to make, the value we will get back easily dwarfs that amount.

January 01, 2015

I just pushed up a new branch to my LLVM repo that enables two important LLVM codegen features (machine scheduling and subreg liveness) for SI+ targets, which should improve performance of the radeonsi driver.

The biggest improvement that I’m seeing with this branch is the luxmark luxball OpenCL demo which is about 60% faster on my Bonaire. Other tests I’ve done show 10% – 25% improvements in performance. I haven’t done much OpenGL benchmarking, but I expect these changes will have much bigger impact on the OpenCL benchmarks, so OpenGL improvements may be in the lower end of that range. I still need more benchmark results to know for sure.

December 20, 2014
Just in time for the upcoming break, we have figured out how to do alpha-test, and now supertuxkart is rendering properly:



If you are wondering about the new stk beta, I have a build from a few weeks back which seems to render properly as well. There are a few rough edges, but I think that is just from using a random git commit-id for stk.  But we don't have enough gl3 features yet (on a3xx or a4xx) to be using the new rendering paths.

And gnome-shell works nicely too.  Still some rendering issues with xonotic.  And a little ways behind a3xx in piglit results, but not quite as much as I would have expected at this early stage.

Still missing are some optimizations that are important for certain use-cases (hw-binning support for games, GMEM bypass for UI/mipmap-generation/etc).  But the a420 in apq8084 (ifc6540 board) is surprisingly fast all the same.
December 17, 2014

Multi-Stream Transport 4k Monitors and X

I'm sure you've seen a 4k monitor on a friend's desk running Mac OS X or Windows and are all ready to go get one so that you can use it under Linux.

Once you've managed to acquire one, I'm afraid you'll discover that when you plug it in, you're limited to 30Hz refresh rates at the full size, unless you're running a kernel that is version 3.17 or later. And then...

Good Grief! What Is My Computer Doing!

Ok, so now you're running version 3.17 and when X starts up, it's like you're using a gigantic version of Google Cardboard. Two copies of a very tall, but very narrow screen greet you.

Welcome to MST island.

In order to drive these giant new panels at full speed, there isn't enough bandwidth in the display hardware to individually paint each pixel once during each frame. So, like all good hardware engineers, they invented a clever hack.

This clever hack paints the screen in parallel. I'm assuming that they've got two bits of display hardware, each one hooked up to half of the monitor. Now, each paints only half of the pixels, avoiding costly redesign of expensive silicon, at least that's my surmise.

In the olden days, if you did this, you'd end up running two monitor cables to your computer, and potentially even having two video cards. Today, thanks to the magic of Display Port Multi-Stream Transport, we don't need all of that; instead, MST allows us to pack multiple cables-worth of data into a single cable.

I doubt the inventors of MST intended it to be used to split a single LCD panel into multiple "monitors", but hardware engineers are clever folk and are more than capable of abusing standards like this when it serves to save a buck.

Turning Two Back Into One

We've got lots of APIs that expose monitor information in the system, and across which we might be able to wave our magic abstraction wand to fix this:

  1. The KMS API. This is the kernel interface which is used by all graphics stuff, including user-space applications and the frame buffer console. Solve the problem here and it works everywhere automatically.

  2. The libdrm API. This is just the KMS ioctls wrapped in a simple C library. Fixing things here wouldn't make fbcons work, but would at least get all of the window systems working.

  3. Every 2D X driver. (Yeah, we're trying to replace all of these with the one true X driver). Fixing the problem here would mean that all X desktops would work. However, that's a lot of code to hack, so we'll skip this.

  4. The X server RandR code. More plausible than fixing every driver, this also makes X desktops work.

  5. The RandR library. If not in the X server itself, how about over in user space in the RandR protocol library? Well, the problem here is that we've now got two of them (Xlib and xcb), and the xcb one is auto-generated from the protocol descriptions. Not plausible.

  6. The Xinerama code in the X server. Xinerama is how we did multi-monitor stuff before RandR existed. These days, RandR provides Xinerama emulation, but we've been telling people to switch to RandR directly.

  7. Some new API. Awesome. Ok, so if we haven't fixed this in any existing API we control (kernel/libdrm/X.org), then we effectively dump the problem into the laps of the desktop and application developers. Given how long it's taken them to adopt current RandR stuff, providing yet another complication in their lives won't make them very happy.

All Our APIs Suck

Dave Airlie merged MST support into the kernel for version 3.17 in the simplest possible fashion -- pushing the problem out to user space. I was initially vaguely tempted to go poke at it and try to fix things there, but he eventually convinced me that it just wasn't feasible.

It turns out that all of our fancy new modesetting APIs describe the hardware in more detail than any application actually cares about. In particular, we expose a huge array of hardware objects:

  • Subconnectors
  • Connectors
  • Outputs
  • Video modes
  • Crtcs
  • Encoders

Each of these objects exposes intimate details about the underlying hardware -- which of them can work together, and which cannot; what kinds of limits are there on data rates and formats; and pixel-level timing details about blanking periods and refresh rates.

To make things work, some piece of code needs to actually hook things up, and explain to the user why the configuration they want just isn't possible.

The sticking point we reached was that when an MST monitor gets plugged in, it needs two CRTCs to drive it. If one of those is already in use by some other output, there's just no way you can steal it for MST mode.

Another problem -- we expose EDID data and actual video mode timings. Our MST monitor has two EDID blocks, one for each half. They happen to describe how they're related, and how you should configure them, but if we want to hide that from the application, we'll have to pull those EDID blocks apart and construct a new one. The same goes for video modes; we'll have to construct ones for MST mode.

Every single one of our APIs exposes enough of this information to be dangerous.

Every one, except Xinerama. All it talks about is a list of rectangles, each of which represents a logical view into the desktop. Did I mention we've been encouraging people to stop using this? And that some of them listened to us? Foolishly?

Dave's Tiling Property

Dave hacked up the X server to parse the EDID strings and communicate the layout information to clients through an output property. Then he hacked up the gnome code to parse that property and build a RandR configuration that would work.

Then, he changed the RandR Xinerama code to also parse the TILE properties and to fix up the data seen by applications from that.

This works well enough to get a desktop running correctly, assuming that desktop uses Xinerama to fetch this data. Alas, gtk has been "fixed" to use RandR if you have RandR version 1.3 or later. No biscuit for us today.

Adding RandR Monitors

RandR doesn't have enough data types yet, so I decided that what we wanted to do was create another one; maybe that would solve this problem.

Ok, so what clients mostly want to know is which bits of the screen are going to be stuck together and should be treated as a single unit. With current RandR, that's some of the information included in a CRTC. You pull the pixel size out of the associated mode, physical size out of the associated outputs and the position from the CRTC itself.

Most of that information is available through Xinerama too; it's just missing physical sizes and any kind of labeling to help the user understand which monitor you're talking about.

The other problem with Xinerama is that it cannot be configured by clients; the existing RandR implementation constructs the Xinerama data directly from the RandR CRTC settings. Dave's Tiling property changes edit that data to reflect the union of associated monitors as a single Xinerama rectangle.

Allowing the Xinerama data to be configured by clients would fix our 4k MST monitor problem as well as solving the longstanding video wall, WiDi and VNC troubles. All of those want to create logical monitor areas within the screen under client control.

What I've done is create a new RandR datatype, the "Monitor", which defines a rectangular region of the screen. Each monitor has the following data:

  • Name. This provides some way to identify the Monitor to the user. I'm using X atoms for this as it made a bunch of things easier.

  • Primary boolean. This indicates whether the monitor is to be considered the "primary" monitor, suitable for placing toolbars and menus.

  • Pixel geometry (x, y, width, height). These locate the region within the screen and define the pixel size.

  • Physical geometry (width-in-millimeters, height-in-millimeters). These let the user know how big the pixels will appear in this region.

  • List of outputs. (I think this is the clever bit)

There are three requests to define, delete and list monitors. And that's it.

Now, we want the list of monitors to completely describe the environment, and yet we don't want existing tools to break completely. So, we need some way to automatically construct monitors from the existing RandR state while still letting the user override portions of it as needed to explain virtual or tiled outputs.

So, what I did was to let the client specify a list of outputs for each monitor. All of the CRTCs which aren't associated with an output in any client-defined monitor are then added to the list of monitors reported back to clients. That means that clients need only define monitors for things they understand, and they can leave the other bits alone and the server will do something sensible.

The second tricky bit is that if you specify an empty rectangle at 0,0 for the pixel geometry, then the server will automatically compute the geometry using the list of outputs provided. That means that if any of those outputs get disabled or reconfigured, the Monitor associated with them will appear to change as well.
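To illustrate the two rules above (unclaimed CRTCs become monitors automatically, and an all-zero geometry tracks the outputs), here is a toy model. It is a sketch of the behaviour as described in this post, not the X server implementation; the class names, the output names and the choice to name automatic monitors after their first output are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # x, y, width, height


@dataclass
class Crtc:
    rect: Rect
    outputs: List[str]  # names of the outputs this CRTC drives


@dataclass
class Monitor:
    name: str
    outputs: List[str]
    rect: Rect = (0, 0, 0, 0)  # all zeros: compute geometry from the outputs


def bounding_box(rects: List[Rect]) -> Rect:
    """Smallest rectangle covering every rectangle in rects."""
    x0 = min(r[0] for r in rects)
    y0 = min(r[1] for r in rects)
    x1 = max(r[0] + r[2] for r in rects)
    y1 = max(r[1] + r[3] for r in rects)
    return (x0, y0, x1 - x0, y1 - y0)


def list_monitors(crtcs: List[Crtc], client_monitors: List[Monitor]) -> List[Monitor]:
    """Client-defined monitors first, then one monitor per unclaimed CRTC."""
    claimed = {out for mon in client_monitors for out in mon.outputs}
    result = []
    for mon in client_monitors:
        rect = mon.rect
        if rect == (0, 0, 0, 0):
            # Track the CRTCs that currently drive this monitor's outputs.
            rects = [c.rect for c in crtcs if set(c.outputs) & set(mon.outputs)]
            if rects:
                rect = bounding_box(rects)
        result.append(Monitor(mon.name, list(mon.outputs), rect))
    for crtc in crtcs:
        # A CRTC with no output in any client-defined monitor is reported as-is.
        if crtc.outputs and not (set(crtc.outputs) & claimed):
            result.append(Monitor(crtc.outputs[0], list(crtc.outputs), crtc.rect))
    return result
```

With two CRTCs driving the halves of an MST panel and a third driving a laptop panel, defining a single monitor over the two halves yields one 3840x2160 monitor plus the untouched laptop panel.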

Current Status

Gtk+ has been switched to use RandR for RandR versions 1.3 or later. Locally, I hacked libXrandr to override the RandR version through an environment variable, set that to 1.2 and Gtk+ happily reverts back to Xinerama and things work fine. I suspect the plan here will be to have it use the new Monitors when present as those provide the same info that it was pulling out of RandR's CRTCs.

KDE appears to still use Xinerama data for this, so it "just works".

Where's the code

As usual, all of the code for this is in a collection of git repositories in my home directory on fd.o:

git://people.freedesktop.org/~keithp/randrproto master
git://people.freedesktop.org/~keithp/libXrandr master
git://people.freedesktop.org/~keithp/xrandr master
git://people.freedesktop.org/~keithp/xserver randr-monitors

RandR protocol changes

Here's the new sections added to randrproto.txt

                  ❧❧❧❧❧❧❧❧❧❧❧

1.5. Introduction to version 1.5 of the extension

Version 1.5 adds monitors

 • A 'Monitor' is a rectangular subset of the screen which represents
   a coherent collection of pixels presented to the user.

 • Each Monitor is associated with a list of outputs (which may be
   empty).

 • When clients define monitors, the associated outputs are removed from
   existing Monitors. If removing the output causes the list for that
   monitor to become empty, that monitor will be deleted.

 • For active CRTCs that have no output associated with any
   client-defined Monitor, one server-defined monitor will
   automatically be defined of the first Output associated with them.

 • When defining a monitor, setting the geometry to all zeros will
   cause that monitor to dynamically track the bounding box of the
   active outputs associated with them

This new object separates the physical configuration of the hardware
from the logical subsets of the screen that applications should
consider as single viewable areas.

1.5.1. Relationship between Monitors and Xinerama

Xinerama's information now comes from the Monitors instead of directly
from the CRTCs. The Monitor marked as Primary will be listed first.

                  ❧❧❧❧❧❧❧❧❧❧❧

5.6. Protocol Types added in version 1.5 of the extension

MONITORINFO { name: ATOM
          primary: BOOL
          automatic: BOOL
          x: INT16
          y: INT16
          width: CARD16
          height: CARD16
          width-in-millimeters: CARD32
          height-in-millimeters: CARD32
          outputs: LISTofOUTPUT }

                  ❧❧❧❧❧❧❧❧❧❧❧

7.5. Extension Requests added in version 1.5 of the extension.

┌───
    RRGetMonitors
    window : WINDOW
     ▶
    timestamp: TIMESTAMP
    monitors: LISTofMONITORINFO
└───
    Errors: Window

    Returns the list of Monitors for the screen containing
    'window'.

    'timestamp' indicates the server time when the list of
    monitors last changed.

┌───
    RRSetMonitor
    window : WINDOW
    info: MONITORINFO
└───
    Errors: Window, Output, Atom, Value

    Create a new monitor. Any existing Monitor of the same name is deleted.

    'name' must be a valid atom or an Atom error results.

    'name' must not match the name of any Output on the screen, or
    a Value error results.

    If 'info.outputs' is non-empty, and if x, y, width, height are all
    zero, then the Monitor geometry will be dynamically defined to
    be the bounding box of the geometry of the active CRTCs
    associated with them.

    If 'name' matches an existing Monitor on the screen, the
    existing one will be deleted as if RRDeleteMonitor were called.

    For each output in 'info.outputs', each one is removed from all
    pre-existing Monitors. If removing the output causes the list of
    outputs for that Monitor to become empty, then that Monitor will
    be deleted as if RRDeleteMonitor were called.

    Only one monitor per screen may be primary. If 'info.primary'
    is true, then the primary value will be set to false on all
    other monitors on the screen.

    RRSetMonitor generates a ConfigureNotify event on the root
    window of the screen.

┌───
    RRDeleteMonitor
    window : WINDOW
    name: ATOM
└───
    Errors: Window, Atom, Value

    Deletes the named Monitor.

    'name' must be a valid atom or an Atom error results.

    'name' must match the name of a Monitor on the screen, or a
    Value error results.

    RRDeleteMonitor generates a ConfigureNotify event on the root
    window of the screen.

                  ❧❧❧❧❧❧❧❧❧❧❧
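The bookkeeping rules of RRSetMonitor and RRDeleteMonitor above can be modelled in a few lines. This is only a sketch of the semantics as written in the protocol text: names are plain strings instead of atoms (so Atom errors are not modelled), Python's ValueError stands in for the X Value error, geometry and the ConfigureNotify event are left out, and the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Monitor:
    name: str
    primary: bool
    outputs: List[str]


class MonitorList:
    """Toy model of the client-defined Monitors on one screen."""

    def __init__(self, output_names: Iterable[str]):
        self.output_names = set(output_names)
        self.monitors: Dict[str, Monitor] = {}

    def delete_monitor(self, name: str) -> None:
        # 'name' must match an existing Monitor, or a Value error results.
        if name not in self.monitors:
            raise ValueError(f"no Monitor named {name!r}")
        del self.monitors[name]

    def set_monitor(self, info: Monitor) -> None:
        # 'name' must not match the name of any Output on the screen.
        if info.name in self.output_names:
            raise ValueError(f"{info.name!r} is the name of an Output")
        # Any existing Monitor of the same name is deleted first.
        if info.name in self.monitors:
            self.delete_monitor(info.name)
        # Remove the new Monitor's outputs from all pre-existing Monitors;
        # a Monitor whose output list becomes empty this way is deleted.
        for existing in list(self.monitors.values()):
            remaining = [o for o in existing.outputs if o not in info.outputs]
            lost_an_output = len(remaining) != len(existing.outputs)
            existing.outputs = remaining
            if lost_an_output and not remaining:
                self.delete_monitor(existing.name)
        # Only one Monitor per screen may be primary.
        if info.primary:
            for existing in self.monitors.values():
                existing.primary = False
        self.monitors[info.name] = info
```

Note that a Monitor created with an empty output list survives later requests; only Monitors emptied by having an output taken away are deleted.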
December 16, 2014
So kernel version 3.18 is out the door and it's time for our regular look at what's in the next merge window.
First looking at new hardware the big item is basic Skylake support. There are still a few small things missing, but mostly it's there now. This has been contributed by Damien, Satheeshakrishna and a lot of other folks. Looking at other platforms there have also been a lot of changes for vlv/chv: Improved backlight code, completely refactored interrupt handling to bring it in line with other platforms, rewritten panel power sequencing code, all from Ville. Rodrigo contributed PSR support for vlv/chv together with a lot of other fixes for PSR. Unfortunately it's still not enabled by default.

Moving on to Broadwell and the render side of things, Mika and Arun provided patches to improve the render workaround code and bring the set of workarounds up to date. execlist (the new command submission support on Gen8+) is also being polished with the addition of on-demand pinning of context objects with patches from Thomas Daniel and Oscar Mateo. Finally the RPS/render-turbo code has seen a lot of polish from Imre with a few fixes from Tom O'Rourke.

Otherwise not a lot of really big things happened on the GEM side: Just a few patches to fix issues in ppgtt (unfortunately still not enabled by default anywhere due to fun with context switches). And there's a bit of prep work and reorg all over for new stuff landing hopefully soon.

Looking at overall infrastructure changes the big thing certainly is the preparations for atomic display updates. The drm core/driver interface for atomic and all the helper library code to convert drivers has landed in 3.19, along with some conversions already. On the Intel side it's been just prep work under the hood thus far with patches from Ander to precompute display PLL state. The new code to use vblank evades for pageflips has also landed, which is needed for atomic plane updates. And prep patches from Gustavo Padovan started to split the low-level plane update functions into check and commit steps. Lots more patches from different people are in flight and some have been merged for 3.20 already.

Besides these driver internal changes for atomic there has been other work to improve the codebase: Imre reorganized our handlers for suspend, resume and thawing and freezing. Jani reworked the audio and eld code which is the gfx side of the puzzle needed to make audio over HDMI or DP work. Jesse provided patches to track infoframes more accurately, which is needed to correctly fastboot (i.e. without modesets if possible) on external screens.

For older machines Ville has spent a few spare cycles to make them more useful: GPU reset support for gen3/4 should mitigate some of the recent chromium crashes on mesa, and the modeset code on i830M might work correctly for the first time, ever.


And of course the usual pile of smaller fixes and improvements all over.

Not directly related to code or features is the start of documenting i915 driver internals: With this release we now have some of the interrupt handling, fifo underrun reporting, frontbuffer tracking and runtime pm support newly documented. And there's lots more in-flight, so hopefully soonish this will be fairly useful.

Those running Fedora Rawhide or GNOME 3.12 may have noticed that there is no Xorg.log file anymore. This is intentional: gdm now starts the X server so that it writes the log to the systemd journal. Update 29 Mar 2014: The X server itself has no capabilities for logging to the journal yet, but no changes to the X server were needed anyway. gdm merely starts the server with a /dev/null logfile and redirects stdout/stderr to the journal.

Thus, to get the log file use journalctl, not vim, cat, less, notepad or whatever your $PAGER was before.

This leaves us with the following commands.


journalctl -e _COMM=Xorg
Which would conveniently show something like this:

Mar 25 10:48:41 yabbi Xorg[5438]: (II) UnloadModule: "wacom"
Mar 25 10:48:41 yabbi Xorg[5438]: (II) evdev: Lenovo Optical USB Mouse: Close
Mar 25 10:48:41 yabbi Xorg[5438]: (II) UnloadModule: "evdev"
Mar 25 10:48:41 yabbi Xorg[5438]: (II) evdev: Integrated Camera: Close
Mar 25 10:48:41 yabbi Xorg[5438]: (II) UnloadModule: "evdev"
Mar 25 10:48:41 yabbi Xorg[5438]: (II) evdev: Sleep Button: Close
Mar 25 10:48:41 yabbi Xorg[5438]: (II) UnloadModule: "evdev"
Mar 25 10:48:41 yabbi Xorg[5438]: (II) evdev: Video Bus: Close
Mar 25 10:48:41 yabbi Xorg[5438]: (II) UnloadModule: "evdev"
Mar 25 10:48:41 yabbi Xorg[5438]: (II) evdev: Power Button: Close
Mar 25 10:48:41 yabbi Xorg[5438]: (II) UnloadModule: "evdev"
Mar 25 10:48:41 yabbi Xorg[5438]: (EE) Server terminated successfully (0). Closing log file.
The -e toggle jumps to the end and only shows 1000 lines, but that's usually enough. journalctl has a bunch more options described in the journalctl man page. Note the PID in square brackets though. You can easily limit the output to just that PID, which makes it ideal to attach the log to a bug report.

journalctl _COMM=Xorg _PID=5438
Previously the server kept only a single backup log file around, so if you restarted twice after a crash, the log was gone. With the journal it's now easy to extract the log file from that crash five restarts ago. It's almost like the future is already here.

Update 16/12/2014: This post initially suggested using journalctl /usr/bin/Xorg. Using _COMM is path-independent.

Fedora 21

Added 16/12/2014: If you recently updated to/installed Fedora 21 you'll notice that the above command won't show anything. As part of the Xorg without root rights feature Fedora ships a wrapper script as /usr/bin/Xorg. This script eventually executes /usr/libexec/Xorg.bin which is the actual X server binary. Thus, on Fedora 21 replace Xorg with Xorg.bin:


journalctl -e _COMM=Xorg.bin
journalctl _COMM=Xorg.bin _PID=5438
Note that we're looking into this so that in a few updates time we don't have a special command here.
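Since the commands above differ only in the _COMM match and the optional _PID, a script that collects logs for bug reports could build them with a small helper. This is a sketch; the function name and its parameters are made up for illustration, and only the journalctl arguments shown in this post are used.

```python
from typing import List, Optional


def xorg_journal_command(comm: str = "Xorg",
                         pid: Optional[int] = None,
                         jump_to_end: bool = False) -> List[str]:
    """Build the journalctl argument list for the X server log.

    comm is the process name to match: "Xorg", or "Xorg.bin" on Fedora 21.
    """
    argv = ["journalctl"]
    if jump_to_end:
        argv.append("-e")
    argv.append(f"_COMM={comm}")
    if pid is not None:
        argv.append(f"_PID={pid}")
    return argv
```

The returned list can be passed straight to subprocess.run(), which avoids any shell quoting issues.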

December 13, 2014

Present and Compositors

The current Present extension is pretty unfriendly to compositing managers, causing an extra frame of latency between the application's operation and the scanout buffer. Here's how I'm fixing that.

An extra frame of lag

When an application uses PresentPixmap, that operation is generally delayed until the next vblank interval. When using X without compositing, this ensures that the operation will get started in the vblank interval, and, if the rendering operation is quick enough, you'll get the frame presented without any tearing.

When using a compositing manager, the operation is still delayed until the vblank interval. That means that the CopyArea and subsequent Damage event generation don't occur until the display has already started the next frame. The compositing manager receives the damage event and constructs a new frame, but it also wants to avoid tearing, so that frame won't get displayed immediately, instead it'll get delayed until the next frame, introducing the lag.

Copy now, complete later

While away from the keyboard this morning, I had a sudden idea -- what if we performed the CopyArea and generated Damage right when the PresentPixmap request was executed but delayed the PresentComplete event until vblank happened?

With the contents updated and damage delivered, the compositing manager can immediately start constructing a new scene for the upcoming frame. When that is complete, it can also use PresentPixmap (either directly or through OpenGL) to queue the screen update.

If it's fast enough, that will all happen before vblank and the application contents will actually appear at the desired time.

Now, at the appointed vblank time, the PresentComplete event will get delivered to the client, telling it that the operation has finished and that its contents are now on the screen. If the compositing manager was quick, this event won't even be a lie.

We'll be lying less often

Right now, the CopyArea, Damage and PresentComplete operations all happen after the vblank has passed. As the compositing manager delays the screen update until the next vblank, then every single PresentComplete event will have the wrong UST/MSC values in it.

With the CopyArea happening immediately, we've a pretty good chance that the compositing manager will get the application contents up on the screen at the target time. When this happens, the PresentComplete event will have the correct values in it.
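The timing argument can be checked with a toy model. This is a simplification made up for illustration, not the X server code: vblanks fall at exact multiples of the frame period, all times are integers in microseconds, the application targets the first vblank after its request, and the compositor needs a fixed amount of time to build its frame once it has seen the damage.

```python
def frames_late(request_time: int, compose_time: int, frame_period: int,
                copy_immediately: bool) -> int:
    """Number of frames by which the contents miss the target vblank."""
    # The application targets the first vblank after its request.
    target = request_time // frame_period + 1
    if copy_immediately:
        # Proposed behaviour: CopyArea and Damage happen at request time.
        damage_time = request_time
    else:
        # Current behaviour: CopyArea and Damage wait for the target vblank.
        damage_time = target * frame_period
    # The compositor's frame reaches the screen at the first vblank
    # after it has finished composing.
    ready_time = damage_time + compose_time
    displayed = ready_time // frame_period + 1
    return displayed - target
```

A result of 0 means the PresentComplete event delivered at the target vblank carries correct values; 1 means the contents show up a frame later than reported. With a 16667 microsecond frame, a request at 5000 and a compositor that needs 4000, the current scheme is one frame late and the proposed scheme is on time.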

How can we do better?

The only way to do better is to have the PresentComplete event generated when the compositing manager displays the frame. I've talked about how that should work, but it's a bit twisty, and will require changes in the compositing manager to report the association between their PresentPixmap request and the applications' PresentPixmap requests.

Where's the code

I've got a set of three patches, two of which restructure the existing code without changing any behavior and a final patch which adds this improvement. Comments and review are encouraged, as always!

git://people.freedesktop.org/~keithp/xserver.git present-compositor
December 08, 2014
As the development window for GNOME 3.16 advances, I've been adding a few new developer features, selfishly, so I could use them in my own programs.

Connectivity support for applications

Picking up from where Dan Winship left off, we've merged support for applications to detect the network availability, especially the "connected to a network but not to the Internet" case.

In glib/gio now, watch the value of the "connectivity" property in GNetworkMonitor.

Grilo automatic network awareness

This glib/gio feature allows us to show/hide Grilo sources from applications' view if they require Internet and LAN access to work. This should be landing very soon, once we've made the new feature optional based on the presence of the new GLib.

Totem

And finally, this means we'll soon be able to show a nice placeholder when no network connection is available, and there are no channels left.

Grilo Lua resources support

A long-standing request, GResources support has landed for Grilo Lua plugins. When a script is loaded, we'll look for a separate GResource file with ".gresource" as the suffix, and automatically load it. This means you can use a local icon for sources with the URL "resource:///org/gnome/grilo/foo.png". Your favourite Lua sources will soon have icons!

Grilo Opensubtitles plugin

The developers affected by this new feature may be a group of one, but if the group is ever to expand, it's the right place to do it. This new Grilo plugin will fetch the list of available text subtitles for specific videos, given their "hashes", which are now exported by Tracker.

GDK-Pixbuf enhancements

I can point you to the NEWS file for the latest version, but the main gains are that GIF animations won't eat all your memory, DPI metadata support in JPEG, PNG and TIFF formats, and, for image viewers, you can tell whether a TIFF file is multi-page to open it in a more capable viewer.

Batched inserts, and better filters in GOM

Does what it says on the tin. This is useful for populating the database quicker than through piecemeal inserts, it also means you don't need to chain inserts when inserting multiple items.

Mathieu also worked on fixing the priority of filters when building complex queries, as well as supporting more than 2 items in a filter ("foo OR bar OR baz" for example).
December 06, 2014


Mice have an optical sensor that tells them how far they moved in "mickeys". Depending on the sensor, a mickey is anywhere between 1/100 and 1/8200 of an inch or less. The current "standard" resolution is 1000 DPI, but older mice will have 800 DPI, 400 DPI etc. Resolutions above 1200 DPI are generally reserved for gaming mice with (usually) switchable resolution and it's an arms race between manufacturers in who can advertise higher numbers.

HW manufacturers are cheap bastards so of course the mice don't advertise the sensor resolution. Which means that for the purpose of pointer acceleration there is no physical reference. That delta of 10 could be a millimeter of mouse movement or a nanometer, you just can't know. And if pointer acceleration works on input without reference, it becomes useless and unpredictable. That is partially intended, HW manufacturers advertise that a lower resolution will provide more precision while sniping and a higher resolution means faster turns while running around doing rocket jumps. I personally don't think that there's much difference between 5000 and 8000 DPI anymore, the mouse is so sensitive that if you sneeze your pointer ends up next to Philae. But then again, who am I to argue with marketing types.

For us, useless and unpredictable is bad, especially in the use-case of everyday desktops. To work around that, libinput 0.7 now incorporates the physical resolution into pointer acceleration. And to do that we need a database, which will be provided by udev as of systemd 218 (unreleased at the time of writing). This database incorporates the various devices and their physical resolution, together with their sampling rate. udev sets the resolution as the MOUSE_DPI property that we can read in libinput and use as reference point in the pointer accel code. In the simplest case, the entry lists a single resolution with a single frequency (e.g. "MOUSE_DPI=1000@125"), for switchable gaming mice it lists several resolutions with frequencies and marks the default with an asterisk ("MOUSE_DPI=400@50 800@50 *1000@125 1200@125"). And you can and should help us populate the database so it gets useful really quickly.
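
To make the property format concrete, here is a minimal Python sketch that splits a MOUSE_DPI value into its resolution/frequency pairs and picks the default. This is not libinput's parser; the names DpiSetting, parse_mouse_dpi and default_setting are made up for illustration, and treating the "@frequency" part as optional is an assumption on my side.

```python
from typing import List, NamedTuple, Optional


class DpiSetting(NamedTuple):
    dpi: int                  # sensor resolution in dots per inch
    rate_hz: Optional[int]    # sampling frequency in Hz, None if not listed
    is_default: bool          # True if the entry was marked with '*'


def parse_mouse_dpi(value: str) -> List[DpiSetting]:
    """Parse a value such as '400@50 800@50 *1000@125 1200@125'."""
    settings = []
    for token in value.split():
        is_default = token.startswith("*")
        token = token.lstrip("*")
        # Assumption: the '@frequency' part may be missing.
        dpi_text, _, rate_text = token.partition("@")
        rate = int(rate_text) if rate_text else None
        settings.append(DpiSetting(int(dpi_text), rate, is_default))
    return settings


def default_setting(settings: List[DpiSetting]) -> Optional[DpiSetting]:
    """Return the entry marked with '*', or the only entry if there is just one."""
    for setting in settings:
        if setting.is_default:
            return setting
    return settings[0] if len(settings) == 1 else None


if __name__ == "__main__":
    for value in ("1000@125", "400@50 800@50 *1000@125 1200@125"):
        print(value, "->", default_setting(parse_mouse_dpi(value)))
```

A caller that wants a physical reference for acceleration would only look at the default entry and scale deltas relative to its dpi value.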

How to add your device to the database

We use udev's hwdb for the database list. The upstream file is in /usr/lib/udev/hwdb.d/70-mouse.hwdb, the ruleset to trigger a match is in /usr/lib/udev/rules.d/70-mouse.rules. The easiest way to add a match is with the libevdev mouse-dpi-tool (version 1.3.2). Run it and follow the instructions. The output looks like this:


$ sudo ./tools/mouse-dpi-tool /dev/input/event8
Mouse Lenovo Optical USB Mouse on /dev/input/event8
Move the device along the x-axis.
Pause 3 seconds before movement to reset, Ctrl+C to exit.
Covered distance in device units: 264 at frequency 125.0Hz | |^C
Estimated sampling frequency: 125Hz
To calculate resolution, measure physical distance covered
and look up the matching resolution in the table below
16mm 0.66in 400dpi
11mm 0.44in 600dpi
8mm 0.33in 800dpi
6mm 0.26in 1000dpi
5mm 0.22in 1200dpi
4mm 0.19in 1400dpi
4mm 0.17in 1600dpi
3mm 0.15in 1800dpi
3mm 0.13in 2000dpi
3mm 0.12in 2200dpi
2mm 0.11in 2400dpi

Entry for hwdb match (replace XXX with the resolution in DPI):
mouse:usb:v17efp6019:name:Lenovo Optical USB Mouse:
MOUSE_DPI=XXX@125
Take those last two lines and add them to a new local file, /etc/udev/hwdb.d/71-mouse.hwdb. Rebuild the hwdb, trigger it, and done:

$ sudo udevadm hwdb --update
$ sudo udevadm trigger /dev/input/event8
Leave out the device path if you're not on systemd 218 yet. Check if the property is set:

$ udevadm info /dev/input/event8 | grep MOUSE_DPI
E: MOUSE_DPI=1000@125
And that shows everything worked. Restart X/Wayland/whatever uses libinput and you're good to go. If it works, double-check the upstream instructions, then file a bug against systemd with those two lines and assign it to me.
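
The lookup table printed by mouse-dpi-tool above is plain arithmetic: resolution is device units divided by the distance in inches. A small Python sketch of that calculation, in case you'd rather compute the value from your own ruler measurement than pick the nearest row (the function names are mine, not part of any tool):

```python
MM_PER_INCH = 25.4


def estimate_dpi(device_units: int, distance_mm: float) -> float:
    """Resolution in DPI for a movement of distance_mm that produced device_units."""
    return device_units / (distance_mm / MM_PER_INCH)


def distance_for_dpi(device_units: int, dpi: int):
    """Distance (mm, inches) a mouse with the given DPI travels to emit device_units."""
    inches = device_units / dpi
    return inches * MM_PER_INCH, inches


if __name__ == "__main__":
    units = 264  # "Covered distance in device units" from the tool output
    for dpi in range(400, 2600, 200):
        mm, inches = distance_for_dpi(units, dpi)
        print(f"{mm:5.1f}mm {inches:.2f}in {dpi}dpi")
```

At higher resolutions the rows are only fractions of a millimetre apart, so measure over a longer distance if you can.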

Trackballs are a bit hard to measure like this, my suggestion is to check the manufacturer's website first for any resolution data.

Update 2014/12/06: trackball comment added, udevadm trigger comment for pre 218

December 02, 2014

Disclaimer: Limba is still in a very early stage of development. Bugs happen, and I give no guarantees on API stability yet.

Limba is a very simple cross-distro package installer, utilizing OverlayFS found in recent Linux kernels (>= 3.18).

As an example I created a small Limba package for one of the Qt5 demo applications, and I would like to share the process of creating Limba packages – it’s quite simple, and I could use some feedback on how well the resulting packages work on multiple distributions.

I assume that you have compiled Limba and installed it – how that is done is described in its README file. So, let’s start.

1. Prepare your application

The cool thing about Limba is that you don’t really have to make many changes to your application. There are a few things to pay attention to, though:

  • Ensure the binaries and data are installed into the right places in the directory hierarchy. Binaries must go to $prefix/bin, for example.
  • Ensure that configuration can be found under /etc as well as under $prefix/etc

This needs to be done so your application will find its data at runtime. Additionally, you need to write an AppStream metadata file, and find out which stuff your application depends on.

2. Create package metadata & install software

2.1 Basics

Now you can create the metadata necessary to build a Limba package. Just run

cd /path/to/my/project
lipkgen make-template

This will create a “lipkg” directory, containing a “control” file and a “metainfo.xml” file, which can be a symlink to the AppStream metadata, or be new metadata.

Now, configure your application with /opt/bundle as install prefix (-DCMAKE_INSTALL_PREFIX=/opt/bundle, --prefix=/opt/bundle, etc.) and install it to the lipkg/inst_target directory.

2.2 Handling dependencies

If your software has dependencies on other packages, just get the Limba packages for these dependencies, or build new ones. Then place the resulting IPK packages in the lipkg/repo directory. Ideally, you should be able to fetch Limba packages which contain the software components directly from their upstream developers.

Then, open the lipkg/control file and adjust the “Requires” line. The names of the components you depend on match their AppStream-IDs (<id/> tag in the AppStream XML document). Any version-relation (>=, >>, <<, <=, <>) is supported, and specified in brackets after the component-id.

The resulting control-file might look like this:

Format-Version: 1.0

Requires: Qt5Core (>= 5.3), Qt5DBus (>= 5.3), libpng12

If the specified dependencies are in the repo/ subdirectory, these packages will get installed automatically, if your application package is installed. Otherwise, Limba depends on the user to install these packages manually – there is no interaction with the distribution’s package-manager (yet?).
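
To illustrate the syntax of the “Requires” line, here is a small Python sketch that splits it into component-id, relation and version. This is not Limba’s own parser, only my reading of the format described above; the names Requirement and parse_requires are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional


class Requirement(NamedTuple):
    component_id: str          # AppStream-ID of the dependency
    relation: Optional[str]    # one of >=, >>, <<, <=, <> or None
    version: Optional[str]     # version string, None if unversioned


# component-id, optionally followed by "(relation version)"
_REQ_RE = re.compile(
    r"^(?P<id>[^\s()]+)"
    r"(?:\s*\(\s*(?P<rel>>=|<=|>>|<<|<>)\s*(?P<ver>[^\s()]+)\s*\))?$"
)


def parse_requires(line: str) -> List[Requirement]:
    """Parse 'Requires: Qt5Core (>= 5.3), libpng12' into Requirement tuples."""
    if line.startswith("Requires:"):
        line = line[len("Requires:"):]
    requirements = []
    for item in line.split(","):
        item = item.strip()
        if not item:
            continue
        match = _REQ_RE.match(item)
        if match is None:
            raise ValueError(f"cannot parse requirement: {item!r}")
        requirements.append(
            Requirement(match.group("id"), match.group("rel"), match.group("ver"))
        )
    return requirements


if __name__ == "__main__":
    for req in parse_requires("Requires: Qt5Core (>= 5.3), Qt5DBus (>= 5.3), libpng12"):
        print(req)
```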

3. Building the package

In order to build your package, make sure the content in inst_target/ is up to date, then run

lipkgen build lipkg/

This will build your package and output it in the lipkg/ directory.

4. Testing the package

You can now test your package. Just run

sudo lipa install package.ipk

Your software should install successfully. If you provided a .desktop file in $prefix/share/applications, you should find your application in your desktop’s application-menu. Otherwise, you can run a binary from the command-line, just append the version of your package to the binary name (bash-completion helps). Alternatively, you can use the runapp command, which lets you run any binary in your bundle/package, which is quite helpful for debugging (since the environment a Limba-installed application runs in is different from the one of other applications).

Example:

runapp ${component_id}-${version}:/bin/binary-name

And that’s it! :-)

I used these steps to create a Limba package for the OpenGL Qt5 demo on Tanglu 2 (Bartholomea), and tested it on Kubuntu 15.04 (Vivid) with KDE, as well as on an up-to-date Fedora 21, with GNOME and without any Qt or KDE stuff installed:

[Screenshots: the Qt5 demo Limba package running on Kubuntu and on Fedora]

I encountered a few obstacles when building the packages, e.g. Qt5 initially didn’t find the right QPA plugin – that has been fixed by adjusting a config file in the Qt5Gui package. Also, on Fedora, a matching libpng was missing, so I included that as well.

You can find the packages at Github, currently (but I am planning to move them to a different place soon). The biggest issue with Limba at this time is that it needs Linux 3.18, or an older kernel with OverlayFS support compiled in. Apart from that and a few bugs, the experience is quite smooth. As soon as I am sure there are no hidden fundamental issues, I can think of implementing more features, like signing packages and automatically updating them.

Have fun playing around with Limba!

December 01, 2014

A long-standing and unfixable problem in X is that we cannot send a number of keys to clients because their keycode is too high. This doesn't affect any of the normal keys for typing, but a lot of multimedia keys, especially "newly" introduced ones.

X has a maximum keycode 255, and "Keycodes lie in the inclusive range [8,255]". The reason for the offset 8 keeps escaping me but it doesn't matter anyway. Effectively, it means that we are limited to 247 keys per keyboard. Now, you may think that this would be enough and thus the limit shouldn't really affect us. And you're right. This post explains why it is a problem nonetheless.

Let's discard any ideas about actually increasing the limit from 8 bit to 32 bit. It's hardwired into so many open-coded structs that this is simply not an option. You'd be breaking every X client out there, so at this point you might as well rewrite the display server and aim for replacing X altogether. Oh wait...

So why aren't 247 keycodes enough? The reason is that large chunks of that range are unused and wasted.

In X, the keymap is an array in the form keysyms[keycode] = some keysym (that's a rather simplified view, look at the output from "xkbcomp -xkb $DISPLAY -" for details). The actual value of the keycode doesn't matter in theory, it's just an index. Of course, that theory only applies when you're looking at one keyboard at a time. We need to ship keymaps that are useful everywhere (see xkeyboard-config) and for that we need some sort of standard. In the olden days this meant every vendor had their own keycodes (see /usr/share/X11/xkb/keycodes) but these days Linux normalizes it to evdev keycodes. So we know that KEY_VOLUMEUP is always 115 and we can hook it up thus to just work out of the box. That however leaves us with huge ranges of unused keycodes because every device is different. My keyboard does not have a CD eject key, but it has volume control keys. I have a key to start a web browser but I don't have a key to start a calculator. Others' keyboards do have those keys though, and they expect those keys to work. So the default keymap needs to map the possible keycodes to the matching keysyms and suddenly our 247 keycodes per keyboard becomes 247 for all keyboards ever made. And that is simply not enough.
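
A quick Python sketch of the arithmetic, to show where the cut-off lands. It assumes the X evdev driver reports kernel key code + 8 as the X keycode; the numeric codes are the ones from linux/input-event-codes.h as far as I know them, and the helper name is made up.

```python
X_KEYCODE_OFFSET = 8   # assumption: X keycode = evdev code + 8
X_MAX_KEYCODE = 255    # highest keycode the X protocol can carry

# A few evdev key codes (values as in linux/input-event-codes.h)
EVDEV_KEYS = {
    "KEY_VOLUMEUP": 115,
    "KEY_CALC": 140,
    "KEY_WWW": 150,
    "KEY_EJECTCD": 161,
    "KEY_MICMUTE": 248,
}


def to_x_keycode(evdev_code: int):
    """Return the X keycode for an evdev code, or None if it does not fit."""
    x_keycode = evdev_code + X_KEYCODE_OFFSET
    return x_keycode if x_keycode <= X_MAX_KEYCODE else None


if __name__ == "__main__":
    for name, code in sorted(EVDEV_KEYS.items(), key=lambda kv: kv[1]):
        x_keycode = to_x_keycode(code)
        status = f"X keycode {x_keycode}" if x_keycode is not None else "cannot be sent"
        print(f"{name} (evdev {code}): {status}")
```

With evdev code 0 reserved, codes 1 to 247 are the ones that fit, and every keyboard shares that one range.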

To work around this, we'd need locally hardware-adjusted keymaps generated at runtime. After loading the driver we can look at the keys that exist, remap higher keycodes into an unused range and then communicate that to move the keysyms into the newly mapped keycodes. This is...complicated. evdev doesn't know about keymaps. When gnome-settings-daemon applied your user-specific layout, evdev didn't get told about this. GNOME on the other hand has no idea that evdev previously re-mapped higher keycodes. So when g-s-d applies your setting, it may overwrite the remapped keysym with the one from the default keymaps (and evdev won't notice).
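
The remapping half of that idea is the easy bit. A hypothetical Python sketch (nothing like this exists in evdev, and the function name is invented): take the key codes a device actually has and move the ones that are too high into slots that device doesn't use.

```python
def remap_high_keycodes(present_codes, max_code=247):
    """Map every code above max_code to a code in 1..max_code the device lacks.

    present_codes: evdev key codes the device advertises.
    Returns a dict {high_code: free_low_code}.
    """
    present = set(present_codes)
    free_slots = (code for code in range(1, max_code + 1) if code not in present)
    mapping = {}
    for high_code in sorted(code for code in present if code > max_code):
        slot = next(free_slots, None)
        if slot is None:
            raise ValueError("no unused keycode left on this device")
        mapping[high_code] = slot
    return mapping


if __name__ == "__main__":
    # A made-up device: a handful of low codes plus two codes above the limit.
    device_keys = {1, 2, 3, 115, 248, 352}
    print(remap_high_keycodes(device_keys))
```

The hard bit is everything after this: the default keymap already has a keysym for the slot you just borrowed, and whoever applies a layout later has no idea the mapping exists.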

As usual, none of this is technically unfixable. You could figure out a protocol extension that lets drivers talk to the servers and the clients to notify them of remapped keycodes. This of course needs to be added to evdev, the server, libX11, probably xkbcomp and libxkbcommon and of course to all desktop environments that set the layout. To write the patches you need a deep understanding of XKB which would definitely make your skillset a rare one, probably make you quite employable and possibly put you on the fast track for your nearest mental institution. XKB and happiness don't usually go together, but at least the jackets will keep you warm.

Because of the above, we go with the simple answer: "X can't handle keycodes over 255"

November 29, 2014
I'm not sure

but if hd0;u]; means anything to anyone from displaylink, or is the first unencrypted bytes they send, then oops.

Looks like I have some work to do next week.
November 21, 2014
So someone leaked 2011 era PowerVR SGX microcode and user space... And now everyone is pissing themselves like a bunch of overexcited puppies...

I've been fed links from several sides now, and i cannot believe how short-sighted and irresponsible people are, including a few people who should know better.

STOP TELLING PEOPLE TO LOOK AT PROPRIETARY CODE.

Having gotten that out of the way, I am writing this blog to put everyone straight and stop the nonsense, and to calmly explain why this leak is not a good thing.

Before i go any further, IANAL, but i clearly do seem to tread much more carefully on these issues than most. As always, feel free to debunk what i write here in the comments, especially you actual lawyers, especially those lawyers in the .EU.

LIBV and the PVR.

Let me just, once again, state my position towards the PowerVR.

I have worked on the Nokia N9, primarily on the SGX kernel side (which is of course GPLed), but i also touched both the microcode and userspace. So I have seen the code, worked with it, and i am very much burned on it. Unless IMG itself gives me permission to do so, i am not allowed to contribute to any open source driver for the PowerVR. I personally also include the RGX, and not just SGX, in that list, as i believe that some things do remain the same. The same is true for Rob Clark, who worked with PowerVR when at Texas Instruments.

This is, however, not why i try to keep people from REing the PowerVR.

The reason why i tell people to stay away is because of the design of the PowerVR and its driver stack: PVR is heavily microcode driven, and this microcode is loaded through the kernel from userspace. The microcode communicates directly with the kernel through some shared structs, which change depending on build options. There are sometimes extensive changes to both the microcode, kernel and userspace code depending on the revision of the SGX, customer project and build options, and sometimes the whole stack is affected, from microcode to userspace. This makes the powervr a very unstable platform: change one component, and the whole house of cards comes tumbling down. A nightmare for system integrators, but also bad news for people looking to provide a free driver for this platform. As if the murderous release cycle of mobile hardware wasn't bad enough of a moving target already.

The logic behind me attempting to keep people away from REing the PowerVR is, at one end, the attempt to focus the available decent developers on more rewarding GPUs and to keep people from burning out on something as shaky as the PowerVR. On the other hand, by getting everyone working on the other GPUs, we are slowly forcing the whole market open, singling out Imagination Technologies. At one point, IMG will be forced to either do this work itself, and/or to directly support open sourcing themselves, or to remain the black sheep forever.

None of the above means that I am against an open source driver for PVR, quite the opposite, I just find it more productive to work on the other GPUs and wait this one out.

Given their bad reputation with system integrators, their shaky driver/microcode design, and the fact that they are in a cut throat competition with ARM, Imagination Technologies actually has the most to gain from an open source driver. It would at least take some of the pain out of that shaky microcode/kernel/userspace combination, and make a lot of people's lives a lot easier.

This is not open source software.

Just because someone leaked this code, it has not magically become free software.

It is still just as proprietary as before. You cannot use this code in any open source project, or at all, the license on it applies just as strongly as before. If you download it, or distribute it, or whatever other actions forbidden in the license, you are just as accountable as the other parties in the chain.

So for all you kiddies who now think "Great, finally an open driver for PowerVR, let's go hack our way into celebrity", you couldn't be more wrong. At best, you just tainted yourself.

But the repercussions go further than that. The simple fact that this code has been leaked has cast a very dark shadow on any future open source project that might involve the powervr. So be glad that we have been pretty good at dissuading people from wasting their time on powervr, and that this leak didn't end up spoiling many man-years of work.

Why? Well, let's say that there was an advanced and active PowerVR reverse engineering project. Naturally, the contributors would not be able to look at the leaked code. But it goes further than that. Say that you are the project maintainer of such a reverse engineered driver, how do you deal with patches that come in from now on? Are you sure that they are not taken more or less directly from the leaked driver? How do you prove this?

Your fun project just turned from a relatively straightforward REing project to a project where patches absolutely need to be signed-off, and where you need to establish some severe trust into your contributors. That's going to slow you down massively.

But even if you can manage to keep your code clean, the stigma will remain. Even if lawyers do not get involved, you will spend a lot of time preparing yourself for such an eventuality. Not a fun position to be in.

The manpower issue.

I know that any clued and motivated individual can achieve anything. I also know that really clued people, who are dedicated and can work structuredly are extremely rare and that their time is unbelievably valuable.

With the exception of Rob, who is allowed to spend some of his redhat time on the freedreno driver, none of the people working on the open ARM GPU drivers have any support. Working on such a long haul project without support either limits the amount of time available for it, or severely reduces the living standard of the person doing so, or anywhere between those extremes. If you then factor in that there are only a handful of people working on a handful of drivers, you get individuals spending several man-years mostly on their own for themselves.

If you are wondering why ARM GPU drivers are not moving faster, then this is why. There are just a limited few clued individuals who are doing this, and they are on their own, and they have been at it for years by now. Think of that the next time you want to ask "Is it done yet?".

This is why I tried to keep people from REing the powerVR, what little talent and stamina there is can be better put to use on more straightforward GPUs. We have a hard enough time as it is already.

Less work? More work!

If you think that this leaked driver takes away much of the hard work of reverse engineering and makes writing an open source driver easy, you couldn't be more wrong.

This leak means that there is no other option left apart from doing a full clean room. And there need to be very visible and fully transparent processes in place in a case like this. Your one man memory dumper/bit-poker/driver writer just became at least two persons. One of them gets to spend his time ogling bad code (which proprietary code usually ends up being), trying to make sense of it, and then trying to write extensive documentation about it (without being able to test his findings much). The other gets to write code from that documentation, but also little more. Both sides are very much forbidden to go back and forth between those two positions.

As if we ARM GPU driver developers didn't have enough frustration to deal with, and the PVR stack isn't bad enough already, the whole situation just got much much worse.

So for all those who think that now the floodgates are open for PowerVR, don't hold your breath. And to those who now suddenly want to create an open source driver for the powervr, i ask: you and what army?

For all those who are rinsing out their shoes ask yourself how many unsupported man-years you will honestly be able to dedicate to this, and whether there will be enough individuals who can honestly claim the same. Then pick your boring task, and then stick to it. Forever. And hope that the others also stick to their side of this bargain.

LOL, http://goo.gl/kbBEPX

What have we come to?

The leaked source code of a proprietary graphics driver is not something you should be spreading amongst your friends for "Lolz", especially not amongst your open source graphics driver developing friends.

I personally am not too bothered about the actual content of this one, the link names were clear about what it was, and I had seen it before. I was burned before, so i quickly delved in to verify that this was indeed SGX userspace. In some cases, with the links being posted publicly, i then quickly moved on to dissuade people from looking at it, for what limited success that could have had.

But what would i have done if this were Mali code, and the content was not clear from the link name? I got lucky here.

I am horrified about the lack of responsibility of a lot of people. These are not some cat pictures, or some nude celebrities. This is code that forbids people from writing graphics drivers.

But even if you haven't looked at this code yet, most of the damage has been done. A reverse engineered driver for powervr SGX will now probably never happen. Heck, i just got told that someone even went and posted the links to the powerVR REing mailinglist (which luckily has never seen much traffic). I wonder how that went:
Hi,
Are you the guys doing the open source driver for PowerVR SGX?
I have some proprietary code here that could help you speed things along.
Good luck!

So for the person who put this up on github: thank you so much. I hope that you at least didn't use your real name. I cannot imagine that any employer would want to hire anyone who acts this irresponsibly. Your inability to read licenses means that you cannot be trusted with either proprietary code or open source code, as you seem unable to distinguish between them. Well done.

The real culprit is of course LG, for crazily sticking the GPL on this. But because one party "accidentally" sticks a GPL on that doesn't make it GPL, and that doesn't suddenly give you the right to repeat the mistake.

Last month's ISA release.

And now for something slightly different...

Just over a month ago, there was the announcement about Imagination Technologies' new SDK. Supposedly, at least according to the phoronix article, Imagination Technologies made the ISA (instruction set architecture) of the RGX available in it.

This was not true.

What was released was the assembly language for the PowerVR shaders, which then needs to be assembled by the IMG RGX assembler to provide the actual shader binaries. This is definitely not the ISA, and I do not know whether it was Alexandru Voica (an Imagination marketing guy who suddenly became active on the phoronix forums, and who i believe to be the originator of this story) or the author of the article on Phoronix who made this error. I do not think that this was bad intent though, just that something got lost in translation.

The release of the assembly language is very nice though. It makes it relatively straightforward to match the assembly to the machine code, and takes away most of the pain of ISA REing.

Despite the botched message, this was a big step forwards for ARM GPU makers; Imagination delivered what its customers need (in this case, the ability to manually tune some shaders), and in the process it also made it easier for potential REers to create an open source driver.

Looking forward.

Between the leak, the assembly release, and the market position Imagination Technologies is in, things are looking up though.

Whereas the leak made a credible open source reverse engineering project horribly impractical and very unlikely, it did remove some of the incentive for IMG to not support an open source project themselves. I doubt that IMG will now try to bullshit us with the inane patent excuse. The (not too credible) potential damage has been done here already now.

With the assembly language release, a lot of the inner workings and the optimization of the RGX shaders was also made public. So there too the barrier has disappeared.

Given the structure of the IMG graphics driver stack, system integrators have a limited level of satisfaction with IMG. I really doubt that this has improved too much since my Nokia days. Going open source now, by actively supporting some clued open source developers and by providing extensive NDA-free documentation, should not pose much of a legal or political challenge anymore, and could massively improve the perception of Imagination Technologies, and their hardware.

So go for it, IMG. No-one else is going to do this for you, and you can only gain from it!

With OpenStack embracing the Tooz library more and more over the past year, I think it's a good time to write a bit about it.

A bit of history

A little more than a year ago, with my colleague Yassine Lamgarchal and others at eNovance, we investigated how to solve a problem often encountered inside OpenStack: synchronization of multiple distributed workers. And while many people in our ecosystem continue to drive development by adding new bells and whistles, we made a point of solving new problems with a generic solution able to address the technical debt at the same time.

Yassine wrote the first ideas of what should be the group membership service that was needed for OpenStack, identifying several projects that could make use of this. I presented this concept at the OpenStack Summit in Hong-Kong during an Oslo session. It turned out that the idea was well-received, and the week following the summit we started the tooz project on StackForge.

Goals

Tooz is a Python library that provides a coordination API. Its primary goal is to handle groups and membership of these groups in distributed systems.

Tooz also provides another useful feature which is distributed locking. This allows distributed nodes to acquire and release locks in order to synchronize themselves (for example to access a shared resource).

The architecture

If you are familiar with distributed systems, you might be thinking that there are a lot of solutions already available to solve these issues: ZooKeeper, the Raft consensus algorithm or even Redis for example.

You'll be thrilled to learn that Tooz is not the result of the NIH syndrome, but is an abstraction layer on top of all these solutions. It uses drivers to provide the real functionalities behind, and does not try to do anything fancy.

Not all drivers offer the same level of functionality or robustness, but depending on your environment, any available driver might suffice. Like most of OpenStack, we let the deployers/operators/developers choose whichever backend they want to use, informing them of the potential trade-offs they will make.

So far, Tooz provides drivers based on:

All drivers are distributed across processes. Some can be distributed across the network (ZooKeeper, memcached, redis…) and some are only available on the same host (IPC).

Also note that the Tooz API is completely asynchronous, allowing it to be more efficient, and potentially included in an event loop.

Features

Group membership

Tooz provides an API to manage group membership. The basic operations provided are: the creation of a group, the ability to join it, leave it and list its members. It's also possible to be notified as soon as a member joins or leaves a group.

Leader election

Each group can have a leader elected. Each member can decide if it wants to run for the election. If the leader disappears, another one is elected from the list of current candidates. It's possible to be notified of the election result and to retrieve the leader of a group at any moment.
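
To make the semantics of group membership and leader election concrete, here is a toy, single-process model in Python. It is deliberately not the Tooz API (see the Tooz documentation for that): ToyCoordinator and all of its methods are invented for this sketch, groups are created implicitly on first join, and the "election" simply picks the longest-standing candidate.

```python
from collections import defaultdict


class ToyCoordinator:
    """In-memory stand-in for a coordination service (illustration only)."""

    def __init__(self):
        self._members = defaultdict(list)     # group -> members, in join order
        self._candidates = defaultdict(list)  # group -> members running for leader
        self._leaders = {}                    # group -> current leader
        self._watchers = defaultdict(list)    # group -> notification callbacks

    def watch(self, group, callback):
        """Call callback(event, member) on 'joined', 'left' and 'elected'."""
        self._watchers[group].append(callback)

    def _notify(self, group, event, member):
        for callback in self._watchers[group]:
            callback(event, member)

    def _elect(self, group):
        # Only elect when there is no leader and somebody is a candidate.
        if group not in self._leaders and self._candidates[group]:
            self._leaders[group] = self._candidates[group][0]
            self._notify(group, "elected", self._leaders[group])

    def join(self, group, member, run_for_election=False):
        self._members[group].append(member)
        self._notify(group, "joined", member)
        if run_for_election:
            self._candidates[group].append(member)
            self._elect(group)

    def leave(self, group, member):
        self._members[group].remove(member)
        if member in self._candidates[group]:
            self._candidates[group].remove(member)
        self._notify(group, "left", member)
        if self._leaders.get(group) == member:
            # The leader disappeared: pick another one from the candidates.
            del self._leaders[group]
            self._elect(group)

    def members(self, group):
        return list(self._members[group])

    def leader(self, group):
        return self._leaders.get(group)


if __name__ == "__main__":
    coordinator = ToyCoordinator()
    coordinator.watch("evaluators", lambda event, member: print(event, member))
    coordinator.join("evaluators", "worker-1", run_for_election=True)
    coordinator.join("evaluators", "worker-2", run_for_election=True)
    coordinator.leave("evaluators", "worker-1")
    print("leader:", coordinator.leader("evaluators"))
```

A real backend has to do the same bookkeeping across processes and hosts, and cope with members that vanish without saying goodbye, which is exactly the part you want a proven driver for.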

Distributed locking

When trying to synchronize several workers in a distributed environment, you may need a way to lock access to some resources. That's what a distributed lock can help you with.
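
As a rough illustration of the idea, here is a minimal same-host lock in Python built on flock(2). This is not how Tooz or any of its drivers implement locking, and host_lock is a name invented for this sketch; it only shows the acquire/release pattern between processes on one machine (Unix only). Locking across the network is where a backend such as ZooKeeper comes in.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def host_lock(path, blocking=True):
    """Hold an exclusive lock on path for the duration of the with-block.

    With blocking=False, BlockingIOError is raised if the lock is already held.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        flags = fcntl.LOCK_EX | (0 if blocking else fcntl.LOCK_NB)
        fcntl.flock(fd, flags)
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


if __name__ == "__main__":
    lock_path = os.path.join(tempfile.gettempdir(), "shared-resource.lock")
    with host_lock(lock_path):
        # Only one worker on this host gets here at a time.
        print("working on the shared resource")
```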

Adoption in OpenStack

Ceilometer is the first project in OpenStack to use Tooz. It has replaced part of the old alarm distribution system, where RPC was used to detect active alarm evaluator workers. The group membership feature of Tooz was leveraged by Ceilometer to coordinate between alarm evaluator workers.

Another new feature part of the Juno release of Ceilometer is the distribution of polling tasks of the central agent among multiple workers. There's again a group membership issue to know which nodes are online and available to receive polling tasks, so Tooz is also being used here.

The Oslo team has accepted the adoption of Tooz during this release cycle. That means that it will be maintained by more developers, and will be part of the OpenStack release process.

This opens the door to push Tooz further in OpenStack. Our next candidate would be to write a service group driver for Nova.

The complete documentation for Tooz is available online and has examples for the various features described here, go read it if you're curious and adventurous!

November 19, 2014

Debian's latest round of angry mailing list threads has been about some combination of init systems, future direction and project governance. The details aren't particularly important here, and pretty much everything worthwhile in favour of or against each position has already been said several times, but I think this bit is important enough that it bears repeating: the reason I voted "we didn't need this General Resolution" ahead of the other options is that I hope we can continue to use our normal technical and decision-making processes to make Debian 8 the best possible OS distribution for everyone. That includes people who like systemd, people who dislike systemd, people who don't care either way and just want the OS to work, and everyone in between those extremes.

I think that works best when we do things, and least well when a lot of time and energy get diverted into talking about doing things. I've been trying to do my small part of the former by fixing some release-critical bugs so we can release Debian 8. Please join in, and remember to write good unblock requests so our hard-working release team can get through them in a finite time. I realise not everyone will agree with my idea of which bugs, which features and which combinations of packages are highest-priority; that's fine, there are plenty of bugs to go round!

Regarding init systems specifically, Debian 'jessie' currently works with at least systemd-sysv or sysvinit-core as pid 1 (probably also Upstart, but I haven't tried that) and I'm confident that Debian developers won't let either of those regress before it's released as Debian 8.

I expect the freeze for Debian 'stretch' (presumably Debian 9) to be a couple of years away, so it seems premature to say anything about what will or won't be supported there; that depends on what upstream developers do, and what Debian developers do, between now and then. What I can predict is that the components that get useful bug reports, active maintenance, thorough testing, careful review, and similar help from contributors will work better than the things that don't; so if you like a component and want it to be supported in Debian, you can help by, well, supporting it.


PS. If you want the Debian 8 installer to leave you running sysvinit as pid 1 after the first reboot, here's a suitable incantation to add to the kernel command-line in the installer's bootloader. This one certainly worked when KiBi asked for testing a few days ago:

preseed/late_command="in-target apt-get install -y sysvinit-core"

I think that corresponds to this line in a preseeding file, if you use those:

d-i preseed/late_command string in-target apt-get install -y sysvinit-core

A similar apt-get command, without the in-target prefix, should work on an installed system that already has systemd-sysv. Depending on other installed software, you might need to add systemd-shim to the command line too, but when I tried it, apt-get was able to work that out for itself.

If you use aptitude instead of apt-get, double-check what it will do before saying "yes" to this particular switchover: its heuristic for resolving conflicts seems to be rather more trigger-happy about removing packages than the one in apt-get.

November 16, 2014

Apparently, people care when you, as privileged person (white, male, long-time Debian Developer) throw in the towel because the amount of crap thrown your way just becomes too much. I guess that's good, both because it gives me a soap box for a short while, but also because if enough people talk about how poisonous the well that Debian is has become, we can fix it.

This morning, I resigned as a member of the systemd maintainer team. I then proceeded to leave the relevant IRC channels and announced this on twitter. The responses I've gotten have almost all been heartwarming. People have generally been offering hugs, saying thanks for the work put into systemd in Debian and so on. I've greatly appreciated those (and I've been getting those before I resigned too, so this isn't just a response to that). I feel bad about leaving the rest of the team, they're a great bunch: competent, caring, funny, wonderful people. On the other hand, at some point I had to draw a line and say "no further".

Debian and its various maintainer teams are a bunch of tribes (with possibly Debian itself being a supertribe). Unlike many other situations, you can be part of multiple tribes. I'm still a member of the DSA tribe for instance. Leaving pkg-systemd means leaving one of my tribes. That hurts. It hurts even more because it feels like a forced exit rather than because I've lost interest or been distracted by other shiny things for long enough that I don't really feel like part of a tribe. That happened to me with debian-installer. It was my baby for a while (with a then quite small team), then a bunch of real-life things interfered and other people picked it up and ran with it and made it greater and more fantastic than before. I kinda lost touch, and while it's still dear to me, I no longer identify as part of the debian-boot tribe.

Now, how did I, standing stout and tall, get forced out of my tribe? I've been a DD for almost 14 years, I should be able to weather any storm, shouldn't I? It turns out that no, the mountain does get worn down by the rain. It's not a single hurtful comment here and there. There's a constant drum about this all being some sort of conspiracy and there are sometimes flares where people wish people involved in systemd would be run over by a bus or just accusations of incompetence.

Our code of conduct says, "assume good faith". If you ever find yourself not doing that, step back, breathe. See if there's a reasonable explanation for why somebody is saying something or behaving in a way that doesn't make sense to you. It might be as simple as your native tongue being English and theirs being something else.

If you do genuinely disagree with somebody (something which is entirely fine), try not to escalate, even if the stakes are high. Examples from the last year include talking about this as a war and talking about "increasingly bitter rear-guard battles". By using and accepting this terminology, we, as a project, poison ourselves. Sam Hartman puts this better than me:

I'm hoping that we can all take a few minutes to gain empathy for those who disagree with us. Then I'm hoping we can use that understanding to reassure them that they are valued and respected and their concerns considered even when we end up strongly disagreeing with them or valuing different things.

I'd be lying if I said I didn't ever feel the urge to demonise my opponents in discussions. That they're worse, as people, than I am. However, it is imperative to never give in to this, since doing that will diminish us as humans and make the entire project poorer. Civil disagreements with reasonable discussions lead to better technical outcomes, happier humans and a healthier project.

November 15, 2014
A couple of weeks ago, qualcomm (quic) surprised some by sending kernel patches to enable the new adreno 4xx family of GPUs found in their latest SoCs, such as the apq8084 powering my ifc6540 board with the a420 GPU. Note that qualcomm had already sent patches to enable display support for apq8084, merged in 3.17. And I'm looking forward to more good things from their upstream efforts in the future.

So in the last weeks, in between various other kernel work (atomic-helper conversion and few other misc things for 3.19) and RHEL stuff, I've managed to bang out initial gallium support for a4xx.  There are still plenty of missing things, or stuff hard-coded, etc.  But yesterday I managed to get textures working, and fix RGBA/BGRA confusion, so now enough works for 'gears and maybe about half of glmark2:



I've intentionally pushed it (just now) after the mesa 10.4 branch point, since it isn't quite ready to be enabled by default in distro mesa builds.  When it gets to the point of at least being able to run a desktop environment (gnome-shell / compiz / etc), I may backport to 10.4.  But there is still a lot of work to do.  The good news is that so far it seems quite fast (and that is without hw binning or XA yet even!)

November 11, 2014

Container Integration

For a while now, containers have been one of the hot topics on Linux. Container managers such as libvirt-lxc, LXC or Docker are widely known and used these days. In this blog story I want to shed some light on systemd's integration points with container managers, to allow seamless management of services across container boundaries.

We'll focus on OS containers here, i.e. the case where an init system runs inside the container, and the container hence in most ways appears like an independent system of its own. Much of what I describe here is available on pretty much any container manager that implements the logic described here, including libvirt-lxc. However, to make things easy we'll focus on systemd-nspawn, the mini-container manager that is shipped with systemd itself. systemd-nspawn uses the same kernel interfaces as the other container managers, but is less flexible, as it is designed to be a container manager that is as simple to use as possible and "just works", rather than trying to be a generic tool you can configure in every low-level detail. We use systemd-nspawn extensively when developing systemd.

Anyway, so let's get started with our run-through. Let's start by creating a Fedora container tree in a subdirectory:

# yum -y --releasever=20 --nogpg --installroot=/srv/mycontainer --disablerepo='*' --enablerepo=fedora install systemd passwd yum fedora-release vim-minimal

This downloads a minimal Fedora system and installs it in /srv/mycontainer. This command line is Fedora-specific, but most distributions provide similar functionality in one way or another. The examples section in the systemd-nspawn(1) man page contains a list of the various command lines for other distributions.

We now have the new container installed, let's set an initial root password:

# systemd-nspawn -D /srv/mycontainer
Spawning container mycontainer on /srv/mycontainer
Press ^] three times within 1s to kill container.
-bash-4.2# passwd
Changing password for user root.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
-bash-4.2# ^D
Container mycontainer exited successfully.
#

We use systemd-nspawn here to get a shell in the container, and then use passwd to set the root password. After that the initial setup is done, hence let's boot it up and log in as root with our new password:

# systemd-nspawn -D /srv/mycontainer -b
Spawning container mycontainer on /srv/mycontainer.
Press ^] three times within 1s to kill container.
systemd 208 running in system mode. (+PAM +LIBWRAP +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ)
Detected virtualization 'systemd-nspawn'.

Welcome to Fedora 20 (Heisenbug)!

[  OK  ] Reached target Remote File Systems.
[  OK  ] Created slice Root Slice.
[  OK  ] Created slice User and Session Slice.
[  OK  ] Created slice System Slice.
[  OK  ] Created slice system-getty.slice.
[  OK  ] Reached target Slices.
[  OK  ] Listening on Delayed Shutdown Socket.
[  OK  ] Listening on /dev/initctl Compatibility Named Pipe.
[  OK  ] Listening on Journal Socket.
         Starting Journal Service...
[  OK  ] Started Journal Service.
[  OK  ] Reached target Paths.
         Mounting Debug File System...
         Mounting Configuration File System...
         Mounting FUSE Control File System...
         Starting Create static device nodes in /dev...
         Mounting POSIX Message Queue File System...
         Mounting Huge Pages File System...
[  OK  ] Reached target Encrypted Volumes.
[  OK  ] Reached target Swap.
         Mounting Temporary Directory...
         Starting Load/Save Random Seed...
[  OK  ] Mounted Configuration File System.
[  OK  ] Mounted FUSE Control File System.
[  OK  ] Mounted Temporary Directory.
[  OK  ] Mounted POSIX Message Queue File System.
[  OK  ] Mounted Debug File System.
[  OK  ] Mounted Huge Pages File System.
[  OK  ] Started Load/Save Random Seed.
[  OK  ] Started Create static device nodes in /dev.
[  OK  ] Reached target Local File Systems (Pre).
[  OK  ] Reached target Local File Systems.
         Starting Trigger Flushing of Journal to Persistent Storage...
         Starting Recreate Volatile Files and Directories...
[  OK  ] Started Recreate Volatile Files and Directories.
         Starting Update UTMP about System Reboot/Shutdown...
[  OK  ] Started Trigger Flushing of Journal to Persistent Storage.
[  OK  ] Started Update UTMP about System Reboot/Shutdown.
[  OK  ] Reached target System Initialization.
[  OK  ] Reached target Timers.
[  OK  ] Listening on D-Bus System Message Bus Socket.
[  OK  ] Reached target Sockets.
[  OK  ] Reached target Basic System.
         Starting Login Service...
         Starting Permit User Sessions...
         Starting D-Bus System Message Bus...
[  OK  ] Started D-Bus System Message Bus.
         Starting Cleanup of Temporary Directories...
[  OK  ] Started Cleanup of Temporary Directories.
[  OK  ] Started Permit User Sessions.
         Starting Console Getty...
[  OK  ] Started Console Getty.
[  OK  ] Reached target Login Prompts.
[  OK  ] Started Login Service.
[  OK  ] Reached target Multi-User System.
[  OK  ] Reached target Graphical Interface.

Fedora release 20 (Heisenbug)
Kernel 3.18.0-0.rc4.git0.1.fc22.x86_64 on an x86_64 (console)

mycontainer login: root
Password:
-bash-4.2#

Now we have everything ready to play around with the container integration of systemd. Let's have a look at the first tool, machinectl. When run without parameters it shows a list of all locally running containers:

$ machinectl
MACHINE                          CONTAINER SERVICE
mycontainer                      container nspawn

1 machines listed.

The "status" subcommand shows details about the container:

$ machinectl status mycontainer
mycontainer:
       Since: Mi 2014-11-12 16:47:19 CET; 51s ago
      Leader: 5374 (systemd)
     Service: nspawn; class container
        Root: /srv/mycontainer
     Address: 192.168.178.38
              10.36.6.162
              fd00::523f:56ff:fe00:4994
              fe80::523f:56ff:fe00:4994
          OS: Fedora 20 (Heisenbug)
        Unit: machine-mycontainer.scope
              ├─5374 /usr/lib/systemd/systemd
              └─system.slice
                ├─dbus.service
                │ └─5414 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-act...
                ├─systemd-journald.service
                │ └─5383 /usr/lib/systemd/systemd-journald
                ├─systemd-logind.service
                │ └─5411 /usr/lib/systemd/systemd-logind
                └─console-getty.service
                  └─5416 /sbin/agetty --noclear -s console 115200 38400 9600

With this we see some interesting information about the container, including its control group tree (with processes), IP addresses and root directory.

The "login" subcommand gets us a new login shell in the container:

# machinectl login mycontainer
Connected to container mycontainer. Press ^] three times within 1s to exit session.

Fedora release 20 (Heisenbug)
Kernel 3.18.0-0.rc4.git0.1.fc22.x86_64 on an x86_64 (pts/0)

mycontainer login:

The "reboot" subcommand reboots the container:

# machinectl reboot mycontainer

The "poweroff" subcommand powers the container off:

# machinectl poweroff mycontainer

So much for the machinectl tool. It knows a couple more commands; please check the man page for details. Note again that even though we use systemd-nspawn as container manager here, the concepts apply to any container manager that implements the logic described here, including libvirt-lxc for example.

machinectl is not the only tool that is useful in conjunction with containers. Many of systemd's own tools have been updated to explicitly support containers too! Let's try this (after starting the container up again first, repeating the systemd-nspawn command from above.):

# hostnamectl -M mycontainer set-hostname "wuff"

This uses hostnamectl(1) on the local container and sets its hostname.

Similarly, many other tools have been updated for connecting to local containers. Here's systemctl(1)'s -M switch in action:

# systemctl -M mycontainer
UNIT                                 LOAD   ACTIVE SUB       DESCRIPTION
-.mount                              loaded active mounted   /
dev-hugepages.mount                  loaded active mounted   Huge Pages File System
dev-mqueue.mount                     loaded active mounted   POSIX Message Queue File System
proc-sys-kernel-random-boot_id.mount loaded active mounted   /proc/sys/kernel/random/boot_id
[...]
time-sync.target                     loaded active active    System Time Synchronized
timers.target                        loaded active active    Timers
systemd-tmpfiles-clean.timer         loaded active waiting   Daily Cleanup of Temporary Directories

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

49 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.

As expected, this shows the list of active units on the specified container, not the host. (Output is shortened here, the blog story is already getting too long).

Let's use this to restart a service within our container:

# systemctl -M mycontainer restart systemd-resolved.service

systemctl has more container support though than just the -M switch. With the -r switch it shows the units running on the host, plus all units of all local, running containers:

# systemctl -r
UNIT                                        LOAD   ACTIVE SUB       DESCRIPTION
boot.automount                              loaded active waiting   EFI System Partition Automount
proc-sys-fs-binfmt_misc.automount           loaded active waiting   Arbitrary Executable File Formats File Syst
sys-devices-pci0000:00-0000:00:02.0-drm-card0-card0\x2dLVDS\x2d1-intel_backlight.device loaded active plugged   /sys/devices/pci0000:00/0000:00:02.0/drm/ca
[...]
timers.target                                                                                       loaded active active    Timers
mandb.timer                                                                                         loaded active waiting   Daily man-db cache update
systemd-tmpfiles-clean.timer                                                                        loaded active waiting   Daily Cleanup of Temporary Directories
mycontainer:-.mount                                                                                 loaded active mounted   /
mycontainer:dev-hugepages.mount                                                                     loaded active mounted   Huge Pages File System
mycontainer:dev-mqueue.mount                                                                        loaded active mounted   POSIX Message Queue File System
[...]
mycontainer:time-sync.target                                                                        loaded active active    System Time Synchronized
mycontainer:timers.target                                                                           loaded active active    Timers
mycontainer:systemd-tmpfiles-clean.timer                                                            loaded active waiting   Daily Cleanup of Temporary Directories

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

191 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.

Here we first see the units of the host, followed by the units of the one container we currently have running. The units of containers are prefixed with the container name and a colon (":"). (The output is shortened again for brevity's sake.)
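As a small illustration of that naming scheme, here is a hedged Python sketch for splitting such a name back into machine and unit. Note that host unit names can themselves contain colons (see the sys-devices-pci0000:00-... device above), so the sketch only treats the prefix as a container name if it appears in a caller-supplied set of known machines. The function name split_unit and that set are assumptions for illustration, not part of systemctl.

```python
def split_unit(name, machines):
    """Split a unit name as listed by 'systemctl -r' into (machine, unit).

    machines is a set of known container names (hypothetical input, e.g.
    collected from machinectl). Host units are returned with machine=None.
    """
    prefix, sep, rest = name.partition(":")
    # Only accept the prefix if it is a known container; host device
    # units may contain colons of their own.
    if sep and prefix in machines:
        return prefix, rest
    return None, name


known = {"mycontainer"}
print(split_unit("mycontainer:dev-hugepages.mount", known))
print(split_unit("timers.target", known))
print(split_unit("sys-devices-pci0000:00-0000:00:02.0-drm-card0.device", known))
```

The last call returns the name unchanged with machine None, which is the point of checking against known machines rather than splitting blindly at the first colon.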

The list-machines subcommand of systemctl shows a list of all running containers, inquiring the system managers within the containers about system state and health. More specifically it shows if containers are properly booted up, or if there are any failed services:

# systemctl list-machines
NAME         STATE   FAILED JOBS
delta (host) running      0    0
mycontainer  running      0    0
miau         degraded     1    0
waldi        running      0    0

4 machines listed.

To make things more interesting we have started two more containers in parallel. One of them has a failed service, which results in the machine state being degraded.
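If you want to act on that health information from a script, the table is easy enough to pick apart. The following is only a sketch that parses the textual output shown above; the column layout is taken from that sample, and the helper name parse_list_machines is made up for this example. Nothing here is an official interface.

```python
SAMPLE = """\
NAME         STATE   FAILED JOBS
delta (host) running      0    0
mycontainer  running      0    0
miau         degraded     1    0
waldi        running      0    0

4 machines listed.
"""


def parse_list_machines(text):
    """Parse 'systemctl list-machines' style output into a list of dicts."""
    machines = []
    for line in text.splitlines()[1:]:      # skip the header line
        if not line.strip():                # blank line ends the table
            break
        # The last three columns never contain spaces, the name may
        # ("delta (host)"), so split from the right.
        name, state, failed, jobs = line.rsplit(None, 3)
        is_host = name.endswith(" (host)")
        if is_host:
            name = name[:-len(" (host)")]
        machines.append({
            "name": name,
            "host": is_host,
            "state": state,
            "failed": int(failed),
            "jobs": int(jobs),
        })
    return machines


degraded = [m["name"] for m in parse_list_machines(SAMPLE)
            if m["state"] != "running"]
print(degraded)  # -> ['miau']
```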

Let's have a look at journalctl(1)'s container support. It too supports -M to show the logs of a specific container:

# journalctl -M mycontainer -n 8
Nov 12 16:51:13 wuff systemd[1]: Starting Graphical Interface.
Nov 12 16:51:13 wuff systemd[1]: Reached target Graphical Interface.
Nov 12 16:51:13 wuff systemd[1]: Starting Update UTMP about System Runlevel Changes...
Nov 12 16:51:13 wuff systemd[1]: Started Stop Read-Ahead Data Collection 10s After Completed Startup.
Nov 12 16:51:13 wuff systemd[1]: Started Update UTMP about System Runlevel Changes.
Nov 12 16:51:13 wuff systemd[1]: Startup finished in 399ms.
Nov 12 16:51:13 wuff sshd[35]: Server listening on 0.0.0.0 port 24.
Nov 12 16:51:13 wuff sshd[35]: Server listening on :: port 24.

However, it also supports -m to show the combined log stream of the host and all local containers:

# journalctl -m -e

(Let's skip the output here completely, I figure you can extrapolate how this looks.)

But it's not only systemd's own tools that understand container support these days, procps sports support for it, too:

# ps -eo pid,machine,args
 PID MACHINE                         COMMAND
   1 -                               /usr/lib/systemd/systemd --switched-root --system --deserialize 20
[...]
2915 -                               emacs contents/projects/containers.md
3403 -                               [kworker/u16:7]
3415 -                               [kworker/u16:9]
4501 -                               /usr/libexec/nm-vpnc-service
4519 -                               /usr/sbin/vpnc --non-inter --no-detach --pid-file /var/run/NetworkManager/nm-vpnc-bfda8671-f025-4812-a66b-362eb12e7f13.pid -
4749 -                               /usr/libexec/dconf-service
4980 -                               /usr/lib/systemd/systemd-resolved
5006 -                               /usr/lib64/firefox/firefox
5168 -                               [kworker/u16:0]
5192 -                               [kworker/u16:4]
5193 -                               [kworker/u16:5]
5497 -                               [kworker/u16:1]
5591 -                               [kworker/u16:8]
5711 -                               sudo -s
5715 -                               /bin/bash
5749 -                               /home/lennart/projects/systemd/systemd-nspawn -D /srv/mycontainer -b
5750 mycontainer                     /usr/lib/systemd/systemd
5799 mycontainer                     /usr/lib/systemd/systemd-journald
5862 mycontainer                     /usr/lib/systemd/systemd-logind
5863 mycontainer                     /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
5868 mycontainer                     /sbin/agetty --noclear --keep-baud console 115200 38400 9600 vt102
5871 mycontainer                     /usr/sbin/sshd -D
6527 mycontainer                     /usr/lib/systemd/systemd-resolved
[...]

This shows a process list (shortened). The second column shows the container a process belongs to. All processes shown with "-" belong to the host itself.
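To show what one might do with that extra column, here is a rough Python sketch that groups processes by the container they belong to. It works on a few lines copied from the listing above; the function name group_by_machine is an assumption of this example, and the host is represented by the key None.

```python
from collections import defaultdict

SAMPLE = """\
  PID MACHINE                         COMMAND
    1 -                               /usr/lib/systemd/systemd --switched-root --system --deserialize 20
[...]
 5749 -                               /home/lennart/projects/systemd/systemd-nspawn -D /srv/mycontainer -b
 5750 mycontainer                     /usr/lib/systemd/systemd
 5871 mycontainer                     /usr/sbin/sshd -D
"""


def group_by_machine(ps_output):
    """Map machine name (None for the host) to a list of (pid, args)."""
    groups = defaultdict(list)
    for line in ps_output.splitlines()[1:]:     # skip the header line
        fields = line.split(None, 2)            # pid, machine, rest of line
        if len(fields) != 3 or not fields[0].isdigit():
            continue                            # e.g. the "[...]" marker
        pid, machine, args = fields
        groups[None if machine == "-" else machine].append((int(pid), args))
    return dict(groups)


for machine, procs in group_by_machine(SAMPLE).items():
    print(machine or "(host)", [pid for pid, _ in procs])
```

On a live system the same function could presumably be fed the output of the ps command line shown above, for example via subprocess.check_output().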

But it doesn't stop there. The new "sd-bus" D-Bus client library we have been preparing in the systemd/kdbus context knows containers too. While you use sd_bus_open_system() to connect to your local host's system bus, sd_bus_open_system_container() may be used to connect to the system bus of any local container, so that you can execute bus methods on it.

sd-login.h and machined's bus interface provide a number of APIs to add container support to other programs too. They support enumeration of containers as well as retrieving the machine name from a PID and similar.

systemd-networkd also has support for containers. When run inside a container it will by default run a DHCP client and IPv4LL on any veth network interface named host0 (this interface is special under the logic described here). When run on the host, networkd will by default provide a DHCP server and IPv4LL on any veth network interface named ve- followed by a container name.

Let's have a look at one last facet of systemd's container integration: the hook-up with the name service switch. Recent systemd versions contain a new NSS module, nss-mymachines, that makes the names of all local containers resolvable via gethostbyname() and getaddrinfo(). This only applies to containers that run within their own network namespace. With the systemd-nspawn command shown above, however, the container shares the network configuration with the host; hence let's restart the container, this time with a virtual veth network link between host and container:

# machinectl poweroff mycontainer
# systemd-nspawn -D /srv/mycontainer --network-veth -b

Now, (assuming that networkd is used in the container and outside) we can already ping the container using its name, due to the simple magic of nss-mymachines:

# ping mycontainer
PING mycontainer (10.0.0.2) 56(84) bytes of data.
64 bytes from mycontainer (10.0.0.2): icmp_seq=1 ttl=64 time=0.124 ms
64 bytes from mycontainer (10.0.0.2): icmp_seq=2 ttl=64 time=0.078 ms

Of course, name resolution not only works with ping, it works with all other tools that use libc gethostbyname() or getaddrinfo() too, among them venerable ssh.
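The same holds for programs in higher-level languages whose resolver calls end up in libc. As a minimal sketch, Python's socket.getaddrinfo() wraps the C function, so on a glibc system with nss-mymachines configured it should pick up container names as well. The helper name resolve is made up for this example.

```python
import socket


def resolve(name):
    """Return the unique addresses getaddrinfo() reports for name, in order."""
    addresses = []
    for _family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(name, None):
        addr = sockaddr[0]          # first element is the address string
        if addr not in addresses:   # one entry per socket type otherwise
            addresses.append(addr)
    return addresses


# A numeric address needs no lookup at all, so this works anywhere:
print(resolve("127.0.0.1"))
```

With the container from above running, resolve("mycontainer") would be the interesting call; it raises socket.gaierror if the name cannot be resolved.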

And this is pretty much all I want to cover for now. We briefly touched a variety of integration points, and there's a lot more still if you look closely. We are working on even more container integration all the time, so expect more new features in this area with every systemd release.

Note that the whole machine concept is actually not limited to containers, but covers VMs too to a certain degree. However, the integration is not as close, as access to a VM's internals is not as easy as for containers, as it usually requires a network transport instead of allowing direct syscall access.

Anyway, I hope this is useful. For further details, please have a look at the linked man pages and other documentation.

Over the last couple of years, we've put some effort into better tooling for debugging input devices. Benjamin's hid-replay is an example of a low-level tool that's great for helping with kernel issues, while evemu is great for userspace debugging of evdev devices. evemu has recently gained better Python bindings, and today I'll explain how those make it really easy to analyse event recordings.

Requirement: evemu 2.1.0 or later

The input needed to make use of the Python bindings is either a device directly or an evemu recordings file. I find the latter a lot more interesting, it enables me to record multiple users/devices first, and then run the analysis later. So let's go with that:


$ sudo evemu-record > mouse-events.evemu
Available devices:
/dev/input/event0: Lid Switch
/dev/input/event1: Sleep Button
/dev/input/event2: Power Button
/dev/input/event3: AT Translated Set 2 keyboard
/dev/input/event4: SynPS/2 Synaptics TouchPad
/dev/input/event5: Lenovo Optical USB Mouse
Select the device event number [0-5]: 5
That pipes any event from the mouse into the file, to be terminated by ctrl+c. It's just a text file, feel free to leave it running for hours.

Now for the actual analysis. The simplest approach is to read all events from a file and print them:


#!/usr/bin/env python

import sys
import evemu

filename = sys.argv[1]
# create an evemu instance from the recording,
# create=False means don't create a uinput device from it
d = evemu.Device(filename, create=False)

for e in d.events():
    print(e)
That prints out all events, so the output should look identical to the input file's event list. The output you should see is something like:

E: 7.817877 0000 0000 0000 # ------------ SYN_REPORT (0) ----------
E: 7.821887 0002 0000 -001 # EV_REL / REL_X -1
E: 7.821903 0000 0000 0000 # ------------ SYN_REPORT (0) ----------
E: 7.825872 0002 0000 -001 # EV_REL / REL_X -1
E: 7.825879 0002 0001 -001 # EV_REL / REL_Y -1
E: 7.825883 0000 0000 0000 # ------------ SYN_REPORT (0) ----------

Each event is an evemu.InputEvent object, with the properties type, code and value, plus the timestamp as sec and usec (i.e. the fields of the underlying C struct). The most useful method of the object is InputEvent.matches(type, code), which takes both integer values and strings:


if e.matches("EV_REL"):
    print("this is a relative event of some kind")
elif e.matches("EV_ABS", "ABS_X"):
    print("absolute X movement")
elif e.matches(0x03, 0x01):  # numeric EV_ABS, ABS_Y
    print("absolute Y movement")

A practical example: let's say we want to know the maximum delta value our mouse sends.



import sys
import evemu

filename = sys.argv[1]
# create an evemu instance from the recording,
# create=False means don't create a uinput device from it
d = evemu.Device(filename, create=False)

# a mouse must report relative motion on both axes
if not d.has_event("EV_REL", "REL_X") or \
   not d.has_event("EV_REL", "REL_Y"):
    print("%s isn't a mouse" % d.name)
    sys.exit(1)

deltas = []

for e in d.events():
    if e.matches("EV_REL", "REL_X") or \
       e.matches("EV_REL", "REL_Y"):
        deltas.append(e.value)

# an empty recording would make max() raise ValueError
if not deltas:
    print("No relative motion events in %s" % filename)
    sys.exit(1)

# don't shadow the built-in max()
max_delta = max(abs(x) for x in deltas)
print("Maximum delta is %d" % max_delta)
And voila, with just a few lines of code we've analysed a set of events. The rest is up to your imagination. So far I've used scripts like this to help us implement palm detection, figure out how to deal with high-DPI mice, estimate the required size for top software buttons on touchpads, etc.
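Since the recording is just a text file, the same maximum-delta analysis can be approximated without the bindings at all, e.g. on a machine where evemu isn't installed. The sketch below assumes the event line layout visible in the sample output above (timestamp, type and code in hex, value in decimal) and hard-codes the numeric values of EV_REL, REL_X and REL_Y. The helper names are made up for this example and it is no substitute for the real bindings.

```python
import re

# "E: <sec>.<usec> <type hex> <code hex> <value decimal>", trailing comment ignored
EVENT_RE = re.compile(
    r"^E:\s+(\d+)\.(\d+)\s+([0-9a-fA-F]{4})\s+([0-9a-fA-F]{4})\s+(-?\d+)")

EV_REL = 0x02
REL_X, REL_Y = 0x00, 0x01

SAMPLE = """\
E: 7.817877 0000 0000 0000 # ------------ SYN_REPORT (0) ----------
E: 7.821887 0002 0000 -001 # EV_REL / REL_X -1
E: 7.821903 0000 0000 0000 # ------------ SYN_REPORT (0) ----------
E: 7.825872 0002 0000 -001 # EV_REL / REL_X -1
E: 7.825879 0002 0001 -001 # EV_REL / REL_Y -1
E: 7.825883 0000 0000 0000 # ------------ SYN_REPORT (0) ----------
"""


def parse_events(lines):
    """Yield (sec, usec, type, code, value) for every event line."""
    for line in lines:
        m = EVENT_RE.match(line)
        if m:  # device description and comment lines don't match
            sec, usec, etype, code, value = m.groups()
            yield int(sec), int(usec), int(etype, 16), int(code, 16), int(value)


def max_delta(lines):
    """Largest absolute REL_X/REL_Y value, or None if there is no motion."""
    deltas = [abs(value) for _, _, etype, code, value in parse_events(lines)
              if etype == EV_REL and code in (REL_X, REL_Y)]
    return max(deltas) if deltas else None


print("Maximum delta is %d" % max_delta(SAMPLE.splitlines()))  # -> Maximum delta is 1
```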

Especially for printing event values, a couple of other functions come in handy here:


type = evemu.event_get_value("EV_REL")
code = evemu.event_get_value("EV_REL", "REL_X")

strtype = evemu.event_get_name(type)
strcode = evemu.event_get_name(type, code)
They do what you'd expect from them, and both functions take either strings or actual types/codes as numeric values. The same exists for input properties.

The following was debugged and discovered by Benjamin Tissoires, I'm merely playing the editor and publisher. All credit and complimentary beverages go to him please.

Wacom recently added two interesting products to its lineup: the Intuos Creative Stylus 2 and the Bamboo Stylus Fineline. Both are styli only, without the accompanying physical tablet and they are marketed towards the Apple iPad market. The basic idea here is that touch location is provided by the system, the pen augments that with buttons, pressure and whatever else. The tips of the styli are 2.9mm (Creative Stylus 2) and 1.9mm (Bamboo Fineline), so definitely smaller than your average finger, and smaller than most other touch pens. This could of course be useful for any touch-capable Linux laptop, it's a cheap way to get an artist's tablet. The official compatibility lists the iPads only, but then that hasn't stopped anyone in the past.

We enjoy a good relationship with the Linux engineers at Wacom, so naturally the first thing was to ask if they could help us out here. Unfortunately, the answer was no. Or more specifically (and heavily paraphrased): "those devices aren't really general purpose, so we wouldn't want to disclose the spec". That of course immediately prompted Benjamin to go and buy one.

From Wacom's POV not disclosing the specs makes sense and why will become more obvious below. The styli are designed for a specific use-case, if Wacom claims that they can work in any use-case they have a lot to lose - mainly from the crowd that blames the manufacturer if something doesn't work as they expect. Think of when netbooks were first introduced and people complained that they weren't full-blown laptops, despite the comparatively low price...

The first result: the stylus works on most touchscreens (and Benjamin has a few of those) but not on all of them. Specifically, the touchscreen on the Asus N550JK didn't react to it. So that's warning number 1: it may not work on your specific laptop and you probably won't know until you try.

Pairing works, provided you have a Bluetooth 4.0 chipset and your kernel supports it (tested on 3.18-rc3). Problem is: you can connect the device but you don't get anything out of it. Why? Bluetooth LE. Let's expand on that: Bluetooth LE uses the Generic Attribute Profile (GATT). The actual data is divided into Profiles, Services and Characteristics, which are clearly named by committee and stand for the general topic, subtopic/item and data point(s). So in the example here the Profile is Heart Rate Profile, the Service is Heart Rate Measurement and the Characteristic is the actual count of "lub-dub" on your ticker [1]. All are predefined. Again, why does this matter? Because what we're hoping for is the HID Service or the HID over GATT Service. In both cases we could then use the kernel's uhid module to get the stylus to work. Alas, the actual output of the device is:


[bluetooth]# info C5:37:E8:73:57:BE
Device C5:37:E8:73:57:BE
        Name: Stylus1
        Alias: Stylus1
        Appearance: 0x0341
        Paired: yes
        Trusted: yes
        Blocked: no
        Connected: yes
        LegacyPairing: no
        UUID: Vendor specific (00001523-1212-efde-1523-785feabcd123)
        UUID: Generic Access Profile (00001800-0000-1000-8000-00805f9b34fb)
        UUID: Generic Attribute Profile (00001801-0000-1000-8000-00805f9b34fb)
        UUID: Device Information (0000180a-0000-1000-8000-00805f9b34fb)
        UUID: Battery Service (0000180f-0000-1000-8000-00805f9b34fb)
        UUID: Vendor specific (6e400001-b5a3-f393-e0a9-e50e24dcca9e)
        Modalias: usb:v056Ap0329d0001
So we can see GAP and GATT, Device Information and Battery Service (both predefined) and 2 Vendor specific profiles (i.e. "magic fairy dust"). And this is where Benjamin got stuck - each of these may have a vendor-specific handshake, protocol, etc. And it's not even certain he'll be able to set the device up so it talks to him. So warning number 2: you can see and connect the device, but it'll talk gibberish (or nothing).
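The distinction between predefined and vendor-specific entries can be read straight off the UUIDs: the predefined ones all share the Bluetooth base UUID suffix and differ only in a 16-bit number, the vendor ones don't. The following Python sketch runs that check over the UUID lines from the output above and looks for the HID service (assigned number 0x1812) we were hoping for. The helper name short_uuid is made up for this example.

```python
import re

# Predefined 16-bit UUIDs are embedded as 0000xxxx-0000-1000-8000-00805f9b34fb
BASE_SUFFIX = "-0000-1000-8000-00805f9b34fb"
HID_SERVICE = 0x1812  # Human Interface Device service

INFO = """\
UUID: Vendor specific (00001523-1212-efde-1523-785feabcd123)
UUID: Generic Access Profile (00001800-0000-1000-8000-00805f9b34fb)
UUID: Generic Attribute Profile (00001801-0000-1000-8000-00805f9b34fb)
UUID: Device Information (0000180a-0000-1000-8000-00805f9b34fb)
UUID: Battery Service (0000180f-0000-1000-8000-00805f9b34fb)
UUID: Vendor specific (6e400001-b5a3-f393-e0a9-e50e24dcca9e)
"""


def short_uuid(uuid):
    """Return the 16-bit value of a predefined UUID, or None if vendor-specific."""
    uuid = uuid.lower()
    if len(uuid) == 36 and uuid.startswith("0000") and uuid.endswith(BASE_SUFFIX):
        return int(uuid[4:8], 16)
    return None


uuids = re.findall(r"\(([0-9a-fA-F-]{36})\)", INFO)
shorts = [short_uuid(u) for u in uuids]

for u, s in zip(uuids, shorts):
    print(u, "vendor specific" if s is None else "predefined 0x%04x" % s)

print("HID service present:", HID_SERVICE in shorts)  # -> HID service present: False
```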

Now, it's probably possible to reverse engineer that if you have sufficient motivation. We don't. The Bluetooth spec is available though, once you work your way through that you can start working on the vendor specific protocol which we know nothing about.

Last but not least: the userspace component. The device itself is not ready-to-use, it provides pressure but you'd still have to associate it with the right touch point. That's not trivial, especially in the presence of other touch points (the outside of your hand while using the stylus for example). So we'd need to add support for this in the X drivers and libinput to make it work. Wacom and/or OS X presumably solved this for iPads, but even there it doesn't just work. The applications need to support it and "You do have to do some digging to figure out to connect the stylus to your favorite art apps -- it's a different procedure for each one, but that's common among these styluses." That's something we wouldn't do the same way on the Linux desktop. So warning number 3: if you can make the kernel work, it won't work as you expect in userspace, and getting it to work is a huge task.

Now all that pretty much boils down to: is it worthwhile? Our consensus so far was "no". I guess Wacom was right in holding back the spec after all. These devices won't work on any tablet and even if they would, we don't have anything in the userspace stack to actually support them properly. So in summary: don't buy the stylus if you plan to use it in Linux.

[1] lub-dub is good. ta-lub-dub is not. you don't want lub-dub-ta. wikipedia

In my previous blog post, I was talking about a pathologically bad Linux desktop performance with FullHD monitors on Allwinner A10 hardware.

A lot of time has passed since then. Thanks to the availability of Rockchip sources and documentation, we have learned a lot about the DRAM controller in Allwinner A10/A13/A20 SoCs. Both Allwinner and Rockchip apparently license the DRAM controller IP from the same third-party vendor, and their DRAM controller hardware registers share a lot of similarities (though unfortunately they are not an exact match).

Having much better knowledge of the hardware allowed us to revisit this problem, investigate it in more detail and come up with a solution back in April 2014. The only missing part was an update on this blog, at least to make it clear that the problem has now been resolved. So here we go...

November 10, 2014

As some of you already know, since the larger restructuring in PackageKit for the 1.0 release, I am rethinking Listaller, the 3rd-party application installer for Linux systems, as well.

During the past weeks, I was playing around with a lot of different ideas and code, to make installations of 3rd-party software easily possible on Linux, but also working together with the distribution package manager. I now have come up with an experimental project, which might achieve this.

Motivation

Many of you know Lennart’s famous blogpost on how we put together Linux distributions. And he makes a lot of good and valid points there (in fact, I agree with his reasoning there). The proposed solution, however, is not something which I am very excited about, at least not for the use-case of installing a simple application[1]. Leaving things like the exclusive dependency on technology like Btrfs aside, the solution outlined by Lennart basically bypasses the distribution itself, instead of working together with it. This results in a duplication of installed libraries, making it harder to keep track of which versions of which software component are actually running on the system. There is also a risk of security holes due to libraries not being updated. The security issues are worked around by a superior sandbox, which still needs to be implemented (but will definitely come soon, maybe next year).

I wanted to explore a different approach of managing 3rd-party applications on Linux systems, which allows sharing as much code as possible between applications.

Limba – Glick2 and Listaller concepts merged

In order to allow easy creation of software packages, as well as the ability to share software between different 3rd-party applications, I took heavy inspiration from Alexander Larsson’s Glick2 project, combining it with ideas from the application-directory based Listaller.

The result is Limba (named after Limba tree, not the voodoo spirit – I needed some name starting with “li” to keep the prefix used in Listaller, and for a tool like this the name didn’t really matter ;-) ).

Limba uses OverlayFS to combine an application with its dependencies before running it, as well as mount namespaces and shared subtrees. Except for OverlayFS, which just landed in the kernel recently, all other kernel features needed by Limba have been available for years now (and many distributions ship with OverlayFS on older kernels as well).

How does it work?

In order to achieve separation of software, each software component is located in a separate container (= package). A software component can be an application, like Kate or GEdit, but also be a single shared library (openssl) or even a full runtime (KDE Frameworks 5 parts, GNOME 3).

Each of these software components can be identified via AppStream metadata, which is just a few bits of XML. A Limba package can declare a dependency on any other software component. In case that software is available in the distribution’s repositories, the version found there can be used. Otherwise, another Limba package providing the software is required.

Limba packages can be provided from software repositories (e.g. provided by the distributor), or be nested in other packages. For example, imagine the software “Kate” requires a version of the Qt5 libraries, >= 5.2. The downloadable package for “Kate” can be bundled with that dependency, by including the “Qt5 5.2” Limba package in the “Kate” package. If another application which also requires the same version of Qt is installed later, the already installed version will be used.
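The lookup order described above can be sketched roughly like this. It is a toy model and not Limba's actual code: the `resolve` function, its arguments and the returned labels are all invented for illustration, and components are matched by name only, without version relations.

```python
def resolve(dependencies, distro_packages, installed_limba, bundled):
    """Pick a provider for every dependency, or raise LookupError.

    All arguments are plain collections of component names; this is a
    hypothetical model of the lookup order, not Limba's real API.
    """
    plan = {}
    for name in dependencies:
        if name in distro_packages:
            # The distribution already ships the component: use that one.
            plan[name] = "distribution"
        elif name in installed_limba:
            # An earlier installation brought it along: share it.
            plan[name] = "installed Limba package"
        elif name in bundled:
            # Fall back to the copy nested inside the downloaded package.
            plan[name] = "bundled Limba package"
        else:
            raise LookupError("no provider for " + name)
    return plan


if __name__ == "__main__":
    print(resolve(["Qt5", "openssl"],
                  distro_packages={"openssl"},
                  installed_limba=set(),
                  bundled={"Qt5"}))
```

Whether the distribution or an already installed Limba package should win when both exist is an assumption made here; the text only says that the distribution's version can be used when available.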

Since the software components are located in separate directories under /opt/software, an application will not automatically find its dependencies, or be able to locate its own files. Therefore, each application has to be run by a special tool, which merges the directory trees of the main application and its dependencies together using OverlayFS. This has the nice side effect that the main application could override files from its dependencies, if necessary. The tool also sets up a new mount namespace, so if the application is compiled with a certain prefix, it does not need to be relocatable to find its data files.
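The override behaviour can be illustrated with a tiny model in which each directory tree is just a mapping from path to content. This only mimics the visible result of the merge; it performs no real OverlayFS mount, and `merge_trees` as well as the example paths are invented for the illustration.

```python
def merge_trees(dependencies, application):
    """Union several {path: content} trees; the application layer wins.

    `dependencies` is a list of trees, applied in order, and the
    application tree is applied last so that it can override any file.
    """
    merged = {}
    for tree in dependencies:
        merged.update(tree)
    merged.update(application)
    return merged


if __name__ == "__main__":
    qt = {"lib/libQt5Core.so": "from Qt5", "share/doc/README": "from Qt5"}
    kate = {"bin/kate": "from Kate", "share/doc/README": "from Kate"}
    for path, origin in sorted(merge_trees([qt], kate).items()):
        print(path, "<-", origin)
```

The application sees one combined tree, and wherever both layers provide the same path, the application's copy is the one that shows up.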

At installation time, to achieve better system integration, certain files (e.g. the .desktop file) are split out of the installed directory tree, so the newly installed application achieves almost full system integration.

AQNAY*

Can I use Limba now?

Limba is an experiment. I like it very much, but it might happen that I find some issues with it and kill it off again. So, if you feel adventurous, you can compile the source code and use the example “Foobar” application to play around with Limba. Before it can be used in production (if at all), some more time is needed.

I will publish documentation on how to test the project soon.

Doesn’t OverlayFS have a maximum stacking depth?

Oh yes it has! The “How does it work” explanation doesn’t tell the whole truth in that regard (mainly to keep the section small). In fact, Limba will generate a “runtime” for the newly installed software, which is a directory with links to the actual individual software components the runtime consists of. The runtime is identified by a UUID. This runtime is then mounted together with the respective applications using OverlayFS. This works pretty well, and also means that no dependency resolution has to be done immediately before an application is started.

That dependency stuff gives me a headache…

Admittedly, allowing dependencies adds a whole lot of complexity. Other approaches, like the one outlined by Lennart work around that (and there are good reasons for doing that as well).

In my opinion, the dependency-sharing and de-duplication of software components, as well as the ability to use the components which are packaged by your Linux distribution is worth the extra effort.

Can you give an overview of future plans for Limba?

Sure, so here is the stuff which currently works:

  • Creating simple packages
  • Installing packages
  • Very basic dependency resolution (no relations (like >, <, =) are respected yet)
  • Running applications
  • Initial bits of system integration (.desktop files are registered)

These features are planned for the near future:

  • Support for removing software
  • Automatic software updates of 3rd-party software
  • Atomic updates
  • Better system integration
  • Integration with the new sandboxing features
  • GPG signing of packages
  • More documentation / bugfixes

Remember that Limba is an experiment, still ;-)

XKCD 927

Technically, I am replacing one solution with another one here, so the situation does not change at all ;-). But indeed, some duplicate work is done due to more people working in this area now on similar questions.

But I think this is a good thing, because the solutions worked on are fundamentally different approaches, and by exploring multiple ways of doing things, we will come up with something great in the end. (XKCD reference)

Doesn’t the use of OverlayFS have an impact on the performance of software running with Limba?

I ran some synthetic benchmarks and didn’t notice any problems – even the startup speed of Limba applications is only a few milliseconds slower than the startup of the “raw” native application. However, I will still have to run further tests to give a definitive answer on this.

How do you solve ABI compatibility issues?

This approach requires software to keep their ABI stable. But since software can have strict dependencies on a specific version of a software (although I’d discourage that), even people who are worried about this issue can be happy. We are getting much better at tracking unwanted ABI breaks, and larger projects offer stable API/ABI during a major release cycle. For smaller dependencies, there are, as explained above, stricter dependencies.

In summary, I don’t think ABI incompatibilities will be a problem with this approach – at least not more than they have been in general. (The libuild facilities from Listaller to minimize dependencies will still be present in Limba, of course)

You are wrong because of $X!

Please leave a comment in this case! I’d love to discuss new ideas and find the limits of the Limba concept – that’s why I am writing C code after all, since what looks great on paper might not work in reality or have issues one hasn’t thought about before. So any input is welcome!

Conclusion

Last but not least I want to thank Alexander Larsson for writing Glick2, which heavily inspired Limba, and for his patient replies to my emails.

If Limba turns out to be a good idea, you can expect a few more blog posts about it soon.


* Answered questions nobody asked yet

[1]: Don’t get me wrong, I would like to have these ideas implemented – they offer great value. But I think for “simple” software deployment, the solution is an overkill.

November 07, 2014
okay another braindump (still nothing working).

The git repo mentioned in previous post has all the code I've hacked up so far.

I finished writing the HDCP protocol stages, and sending all the msgs and getting replies from the device.

So I've successfully reached a point where I've negotiated a HDCP session key with the device, and we are both happy about it. Unfortunately I've no idea what I'm meant to be encrypting to send to the device. The next packet the USB traces contain is 384-bytes of encrypted data.

Now HDCP v2 had a vulnerability in its key negotiation, and I've written code to try and use this fact. So I've taken a trace I made from Windows, and extracted the necessary bits, and using that I've managed to derive the master key used in that trace, and subsequently managed to derive the session key for it. So I've replayed the first encrypted packet from the trace to the device and got an encrypted response the same as in the trace.

I've tried changing a bit in the session key, riv value and data I'm sending, and doing that causes the device not to reply with the answer. This to me implies that the device is using the HDCP cipher to encode the control channel. Now HDCP does say you should only do this for video streams, but maybe DisplayLink forgot to read that bit.

Now where does this leave me, in theory I should be able to replay the full trace (haven't had time yet) and I should see the same picture on screen as I did (though I can't remember what monitor/device I used, so I might have to retrace and restage my tests before then).

However I really need to decrypt the encrypted data in the trace, and from reading the HDCP spec the only values I need to feed the AES engine are ks ^ lc128, riv, streamctr, inputctr. I'm assuming streamctr and inputctr are 0 for the first packet (I could be wrong, maybe they use some wacky streamctr to avoid messing with hdcp), riv and ks I've captured. So lc128 is possibly the crux.
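To spell out what "feeding the AES engine" would look like, here is a sketch that only assembles the two inputs and does not run AES itself. The layout is an assumption based on my reading of the HDCP 2.x cipher description: the AES key is ks XOR lc128, and the counter block is riv (with the 32-bit streamCtr XORed into its low 32 bits) followed by the 64-bit inputCtr. The function names are made up, and the all-zero lc128 in the example is a placeholder, not the real constant.

```python
def xor_bytes(a, b):
    """XOR two equally long byte strings."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return bytes(x ^ y for x, y in zip(a, b))


def cipher_inputs(ks, lc128, riv, stream_ctr=0, input_ctr=0):
    """Return (aes_key, counter_block) for one 128-bit keystream block.

    Assumed layout (to be checked against the spec):
      key   = ks XOR lc128
      block = (riv XOR streamCtr in the low 32 bits) || inputCtr
    """
    if len(ks) != 16 or len(lc128) != 16 or len(riv) != 8:
        raise ValueError("ks and lc128 must be 16 bytes, riv 8 bytes")
    key = xor_bytes(ks, lc128)
    stream_pad = bytes(4) + stream_ctr.to_bytes(4, "big")
    counter_block = xor_bytes(riv, stream_pad) + input_ctr.to_bytes(8, "big")
    return key, counter_block


if __name__ == "__main__":
    # Placeholder values only; the real lc128 is exactly the missing piece.
    key, block = cipher_inputs(ks=bytes(range(16)), lc128=bytes(16),
                               riv=bytes(range(8)))
    print(key.hex(), block.hex())
```

With streamctr and inputctr at 0 the counter block is simply riv followed by eight zero bytes, so the only unknown left in the whole construction is lc128.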

Now what is lc128? It's a secret 128-bit value in the HDCP world given only to HDCP adopters. It's normally something you'd store in hw on the GPU etc as an input to the hw cipher. But in displaylink there is no GPU encrypting the data. Now it's possible that displaylink don't use the same lc128 as the HDCP people, unlikely but possible. Maybe they cipher their streams with their own lc128, and only use the official hdcp lc128 for actual HDCP streams.

I don't think lc128 has leaked, I'm not sure what the consequences of it leaking would be, but hey it's just a magic number, and if displaylink are using it as an input to their AES code, it must be in RAM at some point, now I need to figure out ways to work that out. I'm not sure how long it would take to brute force a 128-bit key space, probably impossible.
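A quick back-of-the-envelope calculation backs up the "probably impossible". The guess rate used below (10^12 keys per second) is an arbitrary, generous assumption and not a measurement of any real hardware.

```python
KEYSPACE = 2 ** 128
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_search(keys_per_second, fraction=0.5):
    """Years needed to try `fraction` of all 128-bit keys at the given rate."""
    return KEYSPACE * fraction / keys_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    # On average half the key space has to be searched before a hit.
    print("%.2e years" % years_to_search(1e12))
```

Even at that rate the expected search time is on the order of 10^18 years, so getting lc128 out of RAM is the only realistic route.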

At any point if someone from DisplayLink wants to talk, you know where to find me :-)

I got back to working on enabling LLVM’s machine scheduler for radeonsi targets in the R600 backend after seeing a really good tutorial about how it works at this year’s LLVM Developer’s conference.

Since I last worked on this, I’ve figured out how to enable register pressure tracking in the scheduler, so now the scheduler will switch to a register pressure reduction strategy once register usage approaches the threshold where using more registers reduces the number of threads that can run in parallel.
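The threshold effect can be pictured with a small model. This is not how the LLVM scheduler is implemented; it only illustrates why register usage matters. The numbers (a budget of 256 registers per thread slot and at most 10 concurrent waves) are illustrative assumptions, and both function names are invented.

```python
def waves_per_simd(regs_per_thread, register_file=256, max_waves=10):
    """How many waves fit if each thread needs `regs_per_thread` registers."""
    if regs_per_thread <= 0:
        raise ValueError("regs_per_thread must be positive")
    return min(max_waves, register_file // regs_per_thread)


def pick_strategy(current_regs, margin=4):
    """Switch to pressure reduction when `margin` more registers cost a wave."""
    if waves_per_simd(current_regs + margin) < waves_per_simd(current_regs):
        return "reduce-register-pressure"
    return "default"


if __name__ == "__main__":
    for regs in (10, 24, 26, 64):
        print(regs, waves_per_simd(regs), pick_strategy(regs))
```

In this model going from 25 to 26 registers drops the wave count from 10 to 9, which is the kind of cliff the scheduler tries not to fall off.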

So far the results look pretty good: several of the Phoronix benchmarks are faster with the scheduler enabled. However, I am still trying to track down a bug which is causing the xonotic benchmark to lock up when using the ‘ultra’ settings.

If anyone wants to test it out, I’ve pushed the code to my personal repo.

November 06, 2014

… or “Why do I not see any update notifications on my brand-new Debian Jessie installation??”

This is a short explanation of the status quo, and also explains the “no update notifications” issue in a slightly more detailed way, since I am already getting bug reports for that.

As you might know, GNOME provides GNOME-Software for installation of applications via PackageKit. In order to work properly, GNOME-Software needs AppStream metadata, which is not yet available in Debian. There was a GSoC student working on the necessary code for that, but the code is not yet ready and doesn’t produce good results yet. Therefore, I postponed AppStream integration to Jessie+1, with an option to include some metadata for GNOME and KDE to use via a normal .deb package.

Then, GNOME was updated to 3.14. GNOME 3.14 moved lots of stuff into GNOME-Software, including the support for update-notifications (which have been in g-s-d before). GNOME-Software is also the only thing which can edit the application groups in GNOME-Shell, at least currently.

So obviously, there was now a much stronger motivation to support GNOME-Software in Jessie. For a while, the appstream-glib library, which GNOME-Software uses exclusively to read AppStream metadata, didn’t support the DEP-11 metadata format which Debian uses in place of the original AppStream XML, but it does so in its current development branch. So that component had to be packaged first. Later, GNOME-Software was uploaded to the archive as well, but still lacked the required metadata. That data was provided by me as a .deb package later, locally generated using the current code by my GSoC student (the data isn’t great, but better than nothing). So much for the good news.

But there are multiple issues at the moment. First of all, the appstream-data package hasn’t passed NEW so far, due to its complex copyright situation (nothing we can’t resolve, since app-install-data, which appstream-data would replace, is in Debian as well). Also, GNOME-Software currently uses offline-updates exclusively (more information also on [1] and [2]). This isn’t always working at the moment, since I haven’t had the time to test it properly – and I didn’t expect it to be used in Debian Jessie as well[3].

Furthermore, the offline-updates feature requires systemd (which isn’t an issue in itself, I am quite fine with that, but people not using it will get unexpected results, unless someone does the work to implement offline-updates with sysvinit).

Since we are in freeze at the moment, and obviously this stuff is not ready yet, GNOME is currently without update notifications and without a way to change the shell application groups.

So, how can we fix this? One way would of course be to patch notification support back into g-s-d, if the new layout there allows doing that. But that would not give us the other features GNOME-Software provides, like application-group-editing.

Implementing that differently and patching it to make it work would be at least as much work as making GNOME-Software run properly. I therefore prefer getting GNOME-Software to run, at least with basic functionality. That would likely mean hiding things like the offline-update functionality, and using online-updates with GNOME-PackageKit instead.

Obviously, this approach has its own issues, like doing most of the work post-freeze, which kind of defeats the purpose of the freeze and would need some close coordination with the release-team.

So, this is the status quo at the moment. It is kind of unfortunate that GNOME so quickly moved crucial functionality into a new component which requires additional integration work by the distributors, but that isn’t worth talking about. We need a way forward to bring update-notifications back, and there is currently work going on to do that. For all Debian users: Please be patient while we resolve the situation, and sorry for the inconvenience. For all developers: If you would like to help, please contact me or Laurent Bigonville, there are some tasks which could use some help.

As a small remark: If you are using KDE, you are lucky – Apper provides the notification support like it always did, and thanks to improvements in aptcc and PackageKit, it even is a bit faster now. For the Xfce and <other_desktop> people, you need to check if your desktop provides integration with PackageKit for update-checking. At least Xfce doesn’t, but after GNOME-PackageKit removed support for it (which was moved to gnome-settings-daemon and now to GNOME-Software) nobody stepped up to implement it yet (so if you want to do it – it’s not super-complicated, but knowledge of C and GTK+ is needed).

—-

[3]: It looks like dpkg tries to ask a debconf question for some reason, or an external tool like apt-listchanges is interfering with the process, which must run completely unsupervised. There is some debugging needed to resolve these Debian-specific issues.

November 04, 2014

So just a quick update. I pushed out the 1.5 release of Transmageddon today. No major new features just fixing a regression in terms of dealing with files where you only have a video track or where you want to drop the audio track as part of the transcoding process. I am also having some issues with Intel Hardware encoding atm, but I think those are somewhere lower in the stack, so I hope to file a bug against either GStreamer or the libva project for that issue, but for now I recommend not having the Intel VA plugins for GStreamer installed.

As always you find the latest release on linuxrising.org.

I also submitted a Transmageddon update to Fedora 21, so if you are a Fedora user please test the build there and give it some Karma

November 03, 2014

So I've just reposted my atomic modeset helper series, and since the main goal of all that work was to ensure a smooth and simple transition for existing drivers to the promised atomic land it's time to elaborate a bit. The big problem is that the existing helper libraries and callbacks to driver backends don't really fit the new semantics, so some shuffling was required to avoid long-term pain. So if you are a driver writer and just interested in the details then read on for what needs to be done to support atomic modeset updates using these new helper libraries.

Phase 1: Reworking the Driver Backend Functions for Planes

The first phase is reworking the driver backend callbacks to fit the new world. There are two big mismatches between the new atomic semantics and legacy ioctl interfaces:

  • The primary plane is no longer tied to the CRTC. Instead it is possible to enable the CRTC without any planes (resulting in a black screen) or only overlay planes. And the primary plane can be enabled/disabled and moved without changing the mode (of course only if the hardware actually supports it). But the existing CRTC helper library used to implement modesets only provides the single crtc->mode_set driver callback which always implicitly enables the primary plane, too.
  • Atomic updates of multiple planes aren't supported at all. And worse the code to check whether a plane update will work out is smashed into the same callback that does the actual plane update, defeating the check/commit distinction used in atomic interfaces.
Both issues are addressed by adding new driver backend callbacks. Furthermore a few transitional helper functions are provided to implement the legacy entry points in terms of these new callbacks. That way the driver backend can be reworked without the additional hassle of needing to deal with all the atomic state object handling and check/commit semantics.

The first step is to rework the ->disable/update_plane hooks using the transitional helper implementations drm_plane_helper_update/disable. These need the following new driver callbacks:
  • ->atomic_check for both CRTCs and planes. This isn't strictly required, but any state checks implemented in the current ->update_plane hook must be moved into the plane's ->atomic_check callback. The CRTC's callback will likely be empty for now.
  • ->atomic_begin and ->atomic_flush CRTC callbacks. These wrap the actual plane update and should do per-CRTC work like preparing to send out the flip completion event. Or ensure that the plane updates are actually done atomically by e.g. setting/clearing GO bits or latching the update through some other means. Or if the hardware does not provide any support for synchronized updates, use vblank evasion to ensure all updates happen on the same frame.
  • ->prepare_fb and ->cleanup_fb hooks are also optional. These are used to set up the framebuffers, e.g. pin their backing storage into memory and set up any needed hardware resources. The important part is that besides the ->atomic_check callbacks ->prepare_fb is the only new callback which is allowed to fail. This is important to make asynchronous commits of atomic state updates work. The helper library guarantees that for any successful call of ->prepare_fb it will call ->cleanup_fb - even when something else fails in the atomic update.
  • Finally there's ->atomic_update. That's the function which does all the per-plane update, like setting up the new viewport or the new base address of the framebuffer for each plane.
With this it's also easy to implement universal plane support directly, instead of with the default implementation which doesn't allow the primary plane to be disabled. Universal planes are a requirement for atomic and need to be implemented in phase 1, but testing the primary plane support is also a good preparation for the next step:

The new crtc->mode_set_nofb callback must be implemented, which just updates the CRTC timings and data in the hardware without touching the primary plane state at all. The provided helper functions drm_helper_crtc_mode_set and drm_helper_crtc_mode_set_base then implement the callbacks required by the CRTC helpers in terms of the new ->mode_set_nofb callback and the above newly implemented plane helper callbacks.
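The interplay of the phase 1 plane callbacks can be pictured with a toy model. This is plain Python and not kernel code: the `Plane` class and `commit_planes` are invented for illustration, and the model simplifies when ->cleanup_fb runs (here it simply happens at the end of the call). It only captures the ordering rules stated above: all checks happen before anything is touched, only check and prepare may fail, begin/flush wrap the per-plane updates, and every successful prepare gets its cleanup.

```python
from dataclasses import dataclass


@dataclass
class Plane:
    name: str
    valid: bool = True    # what the hypothetical atomic_check would return
    pin_ok: bool = True   # what the hypothetical prepare_fb would return


def commit_planes(planes):
    """Return (success, log of the callbacks that ran, in order)."""
    log = []
    # Check phase: purely validation, so failing here has no side effects.
    if not all(p.valid for p in planes):
        log.append("rejected in atomic_check")
        return False, log
    # prepare_fb is the only other step that is allowed to fail.
    prepared = []
    ok = True
    for p in planes:
        if not p.pin_ok:
            ok = False
            break
        prepared.append(p)
    if ok:
        log.append("atomic_begin")
        for p in planes:
            log.append("atomic_update " + p.name)
        log.append("atomic_flush")
    # Every successful prepare_fb is paired with a cleanup_fb.
    for p in prepared:
        log.append("cleanup_fb " + p.name)
    return ok, log


if __name__ == "__main__":
    print(commit_planes([Plane("primary"), Plane("cursor")]))
    print(commit_planes([Plane("primary"), Plane("overlay", pin_ok=False)]))
```

In the second call nothing between begin and flush runs, yet the primary plane's framebuffer is still cleaned up, which is the guarantee that makes failures after a partial prepare safe.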

Phase 2: Wire up the Atomic State Object Scaffolding

With the completion of phase 1 all the driver backend functions have been adapted to the new requirements of the atomic helper library. The goal of phase 2 is to get all the state object handling needed for atomic updates into place. There are three steps to that:

  • The first is fairly simple and consists of just wiring up all the state reset, duplicate and destroy functions for planes, CRTCs and connectors. Except for really crazy cases the default implementations from the atomic helper library should be good enough, at least to get started. With this there will always be an atomic state object stored in each object's ->state pointer.
  • The second step is patching up the state objects in legacy code paths to make sure that we can partially transition to atomic updates. If your driver doesn't have any transition checks for plane updates (i.e. doesn't ever look at the old state to figure out whether a change is possible) then all you need to do is keep the framebuffer pointers and reference counts in balance with drm_atomic_set_fb_for_plane. The transitional helpers from phase 1 already do this, so usually the only place this needs to be manually added is in the ->page_flip callback.
  • Finally all ->mode_fixup callbacks need to be audited to not depend upon any state which is only set in the various CRTC helper callbacks and not tracked by the atomic state objects. This isn't required for implementing the legacy interfaces using atomic updates, but this is important to correctly implement the check/commit semantics. Especially when the commit is done asynchronously. This really is a corner-case though, but any such code must be moved into ->atomic_check hooks and rewritten in terms of the atomic state objects.
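The state object handling from this phase boils down to a duplicate/check/swap pattern, sketched here in plain Python rather than kernel C. `PlaneObject`, `atomic_update` and the dictionary-based state are invented for illustration; the real state structures and helper functions look different.

```python
import copy


class PlaneObject:
    """Stand-in for a plane, CRTC or connector with a ->state pointer."""

    def __init__(self):
        # "reset": there is always a valid state object to start from.
        self.state = {"fb": None, "x": 0, "y": 0}

    def duplicate_state(self):
        # "duplicate": changes are made on a copy, never on the live state.
        return copy.deepcopy(self.state)


def atomic_update(obj, changes, check):
    """Apply `changes` only if check(old_state, new_state) accepts them."""
    new_state = obj.duplicate_state()
    new_state.update(changes)
    if not check(obj.state, new_state):
        return False        # the live state was never modified
    obj.state = new_state   # swap; the old state object is simply dropped
    return True


if __name__ == "__main__":
    plane = PlaneObject()
    in_bounds = lambda old, new: 0 <= new["x"] < 100
    print(atomic_update(plane, {"x": 10}, in_bounds), plane.state)
    print(atomic_update(plane, {"x": 500}, in_bounds), plane.state)
```

Because the check sees both the old and the proposed state and nothing is modified until the swap, a rejected update leaves the object exactly as it was.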

Phase 3: Rolling out Atomic Support

With the driver backend changes from phase 1 and the state handling changes from phase 2 everything is ready for the step-by-step rollout of atomic support. Presuming nothing was missed this just consists of wiring up the ->atomic_check and ->atomic_commit implementations from the atomic helper library. And then replacing all the legacy entry points with the corresponding functions from the atomic helper library to implement them in terms of atomic.

The recommended order is to start with planes first, then test the ->set_config functionality. Page flips and properties are best done later since they likely need some additional work:
  • The atomic helper library doesn't provide any default asynchronous commit support, since driver and hardware requirements seem to be too diverse. At least until we have a few proper implementations and can use them to extract a good set of helper functions. Hence drivers must implement basic async commit support using the building blocks provided (and other drm and kernel infrastructure like flip queues, fence callbacks and work items - hopefully soonish also vblank callbacks).
  • Property values need to be moved into the relevant state object first before the corresponding implementations can be wired up. As long as the driver doesn't yet support the full atomic ioctl this can be done at leisure, but must be completed before the ioctl can be enabled. To do so drivers need to subclass the relevant state structure and reimplement all the state handling functions rolled out in phase 2.

Besides these two complications (which might require a bit of work depending upon the driver) this is all that's needed for full atomic modeset and pageflip support.

Follow-up Driver Cleanups

But there's of course quite a bit of cleanup work possible afterwards!

There are some big differences between the old CRTC helper modeset logic and the new one (using the same callbacks, but completely rewritten otherwise) in the atomic helper library:
  • The encoder/bridge/CRTC enabling/disabling sequence for a given modeset configuration is now always the same. This means unused CRTCs are no longer disabled only after everything else is set up, but are disabled together with all the other blocks before anything from the new configuration is enabled. Also, when an output pipeline changes the helper library will always force a full modeset of the entire pipeline.
    This reduces combinatorial complexity a lot and should especially help with shared resources (like PLLs) - no longer can a modeset spuriously fail just because the old CRTC hasn't released its PLL before the new one was enabled.
  • Thanks to the atomic state tracking the helper code won't lose track of the software state of each object any more. Which means disabled functions won't be disabled more than once. So all code in the driver-backend which checks the current state and acts accordingly can be flattened and replaced by WARNings.

These are all lessons learned from the i915 modeset rewrite. The only thing missing in the atomic helpers compared to i915 is the state readout and cross-checking support - everything else is there. But even that can be easily implemented by adding hardware state readout callbacks and using them in the various state reset functions (to reconstruct matching software state) and also to cross-check state.

The other big cleanup task is to stop using all the legacy state variables and switch all the driver backend code to only look at the state object structures. The two big examples here are crtc->mode and the plane->fb pointer.

So What Now?

With all that converting drivers should be simple and can be done with a series of not-too-invasive refactorings. But my patch series doesn't yet contain the actual atomic modeset ioctl. So what's left to be done in the drm core?

  • Per-plane locking is still missing - currently all plane-related changes just lock all CRTCs. Which is a bit too much, and in cases like the cursor plane or for page flips actually a regression compared to the legacy code paths.
  • The atomic ioctl uses properties for everything, even for the standard properties inherited from the legacy ioctls. All the code for parsing properties exists already in Rob Clark's patch series, but needs to be rebased and adapted to the slightly different interfaces this latest iteration of the internal atomic interface has.
  • The fbdev emulation needs to grow proper atomic check/commit support. This is both a good kernel-internal validation of the atomic interface and would finally allow us to get multi-pipe configuration for fbcon to work correctly. But before we can do this we need a driver with multiple CRTCs, shared resource constraints and proper atomic support to be able to even test this.
  • There's still some room for more helpers, for example pretty much all drivers have some sort of vblank driver callback and work item infrastructure. That's better done as a helper in the core vblank handling code.
  • And finally we need the actual ioctl code.
So still a few things to do, besides adding atomic support to all drivers.

Update: The explanation for how to implement state readout and cross checking was a bit confused, so I reworded that.
November 02, 2014
Since Dave Airlie moved the feature cut-off of the drm-next tree roughly one month ahead it is already time for our regular look at what's ahead. Even though the 3.17 features aren't even released yet.

On the modeset side of things we now have the final pieces for plane rotation support from Sonika Jindal and Ville. The DisplayPort code has also seen lots of improvements, with updated training values in preparation of the latest eDP standard (Sonika Jindal) and support for DP training pattern 3 (Ville). DSI panels now support burst mode (Shobhit) and hdmi conformance has been improved with some fixes from Clint Taylor.

For eDP panels we also have improved panel power sequencing code, mostly to fix issues on Cherryview, from Ville. Ville has also contributed fixes to the VDD handling code, which is used to temporarily enable panel power. And the backlight code learned to handle the bl_power setting so that the backlight can be turned off completely without upsetting the panel's power sequencing, contributed by Jani.

Chris Wilson has also been fairly busy on the modeset code: 3.18 includes his patches to cache EDIDs for a single probe call - unfortunately the full caching solution to keep the EDID around between multiple probe calls isn't merged yet. And pageflips have now improved error detection and recovery logic: In case something goes wrong we shouldn't end up stuck any longer waiting for a pageflip to complete that has been lost by either the hardware or the driver.

Moving on to platform-specific work, there have been lots of preparations for Skylake, most of them from Damien and Sonika. The actual initial platform enabling is delayed until 3.19 though. On the other end of the timeline Ville fixed up i830M modeset support on a rainy weekend during his vacation, and 3.18 now has all that code. And there have been a lot of Cherryview fixes all over.

Cherryview also gained support for power wells and hence runtime pm (Ville). And for platform-agnostic features, a lot of the preparation for DRRS (dynamic refresh rate switching) is merged; hopefully the actual feature patches from Vandana Kannan will land in 3.19.

Moving on to the render side of the driver, there have been a lot of patches to beat the full ppgtt support into shape. The context code has been cleaned up, lifetime handling for ppgtt address spaces is fixed and bad interactions with secure batches are now also rectified. Enabling full ppgtt missed the feature cutoff by a hair though, but it's already enabled for the following release.

Basic support for execlists command submission from Ben Widawsky, Oscar Mateo and Thomas Daniel was also merged. This is the fancy new way to submit commands available on Gen8 and subsequent platforms. It's not yet enabled by default, but since it's a requirement for a lot of cool new features, keep an eye on what's going on here. There is also a lot of work going on underneath to enable all this new code in GEM, like preparing to switch away from sequence numbers to tracking GPU progress more abstractly using the driver's request structures.

And this time around there is also some cool stuff going on in the drm core worthy of a shout-out: The vblank handling code is massively revamped, hopefully plugging all the small races, inconsistencies and inefficiencies in that code. And thanks to David Herrmann it is finally possible to write a drm driver without the drm midlayer getting completely in the way of a proper driver load and unload sequence! Unfortunately i915 can't be converted right away since the legacy usermodesetting code crucially relies on this midlayer functionality. But that's now well deprecated and hopefully can be removed in one of the next releases.
November 01, 2014
Trackballs

I dusted off (literally) my Logitech Marble trackball to replace the Intuos tablet + mouse combination that I was using to cut down on the lateral movement of my right arm which led to back pains.

Not that you care about that one bit, but that meant that I needed a way to get a scroll wheel working with this scroll-wheel-less trackball. That's now implemented in gnome-settings-daemon for GNOME 3.16. You'd run:


gsettings set org.gnome.settings-daemon.peripherals.trackball scroll-wheel-emulation-button 8

With "8" being the number of the mouse button that turns the trackball's ball into a wheel while it is held. We plan to add an interface to configure this in the Settings.

Touchscreens

Touchscreens are now switched off when the screensaver is on. This means you'll usually need to use one of the hardware buttons on tablets, or a mouse or keyboard on laptops to turn the screen back on.

Note that you'll need a kernel patch to avoid surprises when the touchscreen is re-enabled.

More touchscreens

The driver for the Goodix touchscreen found in the Onda v975w is now upstream as well.
October 31, 2014

Last week, a blog post Hints for writing Unix tools by Marius Eriksen made the rounds. It presented nine suggestions on what makes a command a good citizen of the Unix command-line ecosystem, especially for fitting into pipelines and filters.

This reminded me of a longer list of guidelines I recently gathered as part of our efforts to train new hires in Solaris engineering. I polled long time engineers, trawled the Best Practices documents of our Architecture Review Committee, cross referenced to the WCAG2ICT accessibility guidelines for non-web applications recommended by Oracle’s accessibility group, and linked to our online documentation, to come up with our suggestions on writing new CLI tools for Solaris.

Since these may be useful to others writing commands, I figured I’d share some of them. I’ve left out the bits specific to complying with our internal policies or using private interfaces that aren’t documented for external use, but many of these are generally applicable. Do note that these are based in part on lessons learned from 40+ years of Unix history, and that history means that many existing commands do not follow these suggestions, and in some cases, can’t follow them without breaking backwards compatibility, so please don’t start calling tech support to complain about every case our old code isn’t doing one of these things.

One of the key points of our best practices is that many commands are part of a larger family, and it’s best to fit in with that existing family. If you’re writing a Solaris-specific command, it should follow the Solaris Command Line Interface Paradigm guidelines (as listed in the Solaris Intro(1) man page), but GNU commands should instead follow the GNU Coding Standards, classic X11 commands should use the options described in the OPTIONS section of the X(7) man page, and so on.

Command names & paths

  • Most new commands should have names 3-9 letters long. Command names longer than 9 letters should be commands users rarely have to type in.
  • Follow common naming patterns, such as:
    Pattern  Usage
    *adm     Command to change current state & administer a subsystem
    *cfg     Command to make permanent configuration changes to a subsystem
    *info    Command to print information about objects managed by a subsystem
    *prop    Command to print properties of objects managed by a subsystem
    *stat    Command to print/monitor statistics on a subsystem
  • Commands run by normal users should be delivered in /usr/bin/. Commands normally only run by sysadmins should be delivered in /usr/sbin/. Commands only run by other programs, not humans, should be in an appropriate subdirectory under /usr/lib/. (Commands not delivered with the OS should instead use the appropriate subdirectory under /opt instead of /usr in the above paths.)

Options

  • Never provide an option to take a password or other sensitive data on the command line or environment variables, as ps and the proc tools can show those to other users. (see Passing secrets to subprocesses).
  • All commands should have --help and -? options to print recognized options/arguments/subcommands.
  • Option parsing should use one of the standard getopt() routines if at all possible. If you don’t use one, your custom parser will need to replicate a lot of things the standard routines provide for error checking & handling.
    • When reporting errors, be specific about which argument/option failed, don’t just dump usage output and make the user guess which part of the command line was wrong. (See WCAG2ICT #3.3.1. Examples of fixing this in X11 programs: bitmap, fslsfonts, mkfontscale, xgamma, xpr, xsetroot.)
    • If possible, provide suggestions for correcting the error: if an option is invalid, list the options that would be valid. The same goes for subcommands, arguments, etc. (See WCAG2ICT #3.3.3.)
  • Option flags should be similar across commands when possible.
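
The error-reporting advice above can be sketched in a few lines. This is only an illustration, written in Python with its standard getopt module rather than the C getopt() routines the guidelines refer to; the option names and the parse_args helper are made up for the example and are not part of any Solaris command.

```python
import difflib
import getopt

# Hypothetical option set, for illustration only.
SHORT_OPTS = "?o:H"
LONG_OPTS = ["help", "output=", "no-header"]


def parse_args(prog, argv):
    """Return (options, operands), or exit with a message naming the bad option."""
    try:
        return getopt.getopt(argv, SHORT_OPTS, LONG_OPTS)
    except getopt.GetoptError as err:
        # err.msg already names the failing option; prefix the program name.
        message = f"{prog}: {err.msg}"
        # Suggest valid long options that look similar to what was typed.
        known = [name.rstrip("=") for name in LONG_OPTS]
        close = difflib.get_close_matches(err.opt, known, n=3)
        if close:
            message += "; did you mean " + ", ".join("--" + c for c in close) + "?"
        raise SystemExit(message)
```

Calling parse_args("demo", ["--outptu", "name"]) exits with a message that names the unrecognized option and suggests --output, instead of dumping the whole usage text.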

Subcommands

If you are writing a command that uses subcommands, then being careful in your work can make your command much easier to use.

Good examples to follow: hg, zfs, dladm

  • The help subcommand should list the other subcommands, but not overwhelm the user with pages of details on all of them. (Remember, the Solaris kernel text console has no scrollback and users with text-to-speech don’t want 10 minutes of output from it.)
    Good examples: hg, svccfg
  • The help foo or foo --help subcommands should list the options specific to that subcommand.
    Good examples: hg
  • Look at existing commands with similar subcommands and use similar names for your subcommands.

Text output

  • All functionality should be available when TERM=dumb. Use of color output, bold text, terminal positioning codes, etc. can be used to enhance output on more capable terminals, but users need to be able to use the system without it. Users may need to run different commands to get plain text interface instead of curses/terminal mode, such as ed instead of vi, or mailx vs. mutt, as long as it’s clearly documented what they need to run instead, but they must be able to get their work done in some way. (See WCAG2ICT #1.3.2, WCAG2ICT #1.4.1, & WCAG2ICT #1.4.3)
  • Text output is generally composed of messages and data. Messages are the text included in the program, such as status descriptions, error messages, and output headers; while data comes from the subsystem the command interacts with, and depends on the system in question.
    • Messages displayed to users should use gettext(3C) to allow translation & localization.
    • Errors should be printed to stderr, other output to stdout, unless specific output redirection options (such as logging errors to a file) are given.
  • Users should be able to disable any use of ASCII art, line drawing characters, figlet-style text and any other output other than plain text which a text-to-speech screen reader cannot figure out how to read, while not losing information, only formatting of it. (See WCAG2ICT #1.1.1 & WCAG2ICT #1.4.5)
  • Error messages should include the program name to help track down which program produced an error in a shell script, SMF method, etc. This is automatically done if you use the standard libc functions err, verr, errx, verrx, warn, vwarn, warnx, vwarnx (3C).
  • Parsable output should follow the design outlined in Creating Shell-Friendly Parsable Output:
    • Parsable output should require the user to specify the fields to output, via a -o or similar flag, so that new fields can be added to the command without breaking existing parsers.
    • Headers should be omitted in parsable output mode or when a flag such as -H is specified to omit them.
    • Parsable output should use a non-whitespace delimiter, such as “:” between fields.
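
As a rough sketch of the parsable output rules above: the caller must name the fields it wants, no header is printed, and fields are joined with ":". The format_parsable function and the sample record are hypothetical, and escaping literal delimiters with a backslash is an assumption made for this example, not something the guidelines above spell out.

```python
def format_parsable(records, fields, delimiter=":"):
    """Return one delimiter-separated line per record, with no header line.

    The caller must list the fields explicitly, so new fields can be added
    to the records later without changing what existing parsers see.
    """
    lines = []
    for record in records:
        values = []
        for name in fields:
            if name not in record:
                raise KeyError(f"unknown field: {name}")
            value = str(record[name])
            # Assumption: escape backslashes and literal delimiters so that
            # a field value can never be mistaken for a field separator.
            value = value.replace("\\", "\\\\").replace(delimiter, "\\" + delimiter)
            values.append(value)
        lines.append(delimiter.join(values))
    return lines


# Example with a made-up record; only the requested fields are printed.
links = [{"link": "net0", "state": "up", "addr": "fe80::1"}]
for line in format_parsable(links, ["link", "addr"]):
    print(line)
```

A script consuming this output keeps working when a new key is added to the records, because it only ever sees the fields it asked for, in the order it asked for them.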

Privileges on Solaris

User Interaction

  • If you offer an interactive command prompt mode, such as svccfg does for executing subcommands, consider using libtecla or similar support for command line editing in this mode.
  • Any operation that may permanently alter or destroy data should either have an “undo” option (such as rollback to prior snapshot) or have a mode offering the user a chance to confirm (such as the -i option to rm). (See WCAG2ICT #3.3.4)
  • Users should be able to configure timeout lengths for any operation that expects user interaction before a timeout expires. (See WCAG2ICT #2.2.1)
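
A confirmation mode for destructive operations, in the spirit of rm -i, can be as small as the sketch below. The confirm helper is hypothetical; it takes the input and output streams as parameters so it can be exercised without a terminal, and it treats anything other than an explicit yes (including end of input) as a refusal.

```python
import io


def confirm(question, reader, writer):
    """Ask a yes/no question; only an explicit 'y' or 'yes' returns True."""
    writer.write(f"{question} [y/N] ")
    writer.flush()
    # readline() returns "" at end of input, which falls through to False.
    answer = reader.readline().strip().lower()
    return answer in ("y", "yes")


# Example with in-memory streams standing in for stdin and stderr.
if confirm("Destroy snapshot?", io.StringIO("n\n"), io.StringIO()):
    print("destroying")
else:
    print("left untouched")
```

Defaulting to "no" means a stray Enter key or a closed stdin never destroys data.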

Implementation

References

So for some reason I decided to look at the displaylink usb3 adaptors today. (no good news).

This blog post is so I don't forget all of this when I page it out. Note: HDCP 1.0 being broken doesn't matter here. Maybe HDCP 2.0 being a bit broken could be used, but I'm not sure how!

The DisplayLink USB3 protocol is based on the HDCP protocol. I've traced the first few packets and it clearly looks like the host sends two packets

AKE_Init,
AKE_Transmitter_Info

and the device sends back
AKE_Send_Cert

at least.

AKE_Send_Cert contains a 522 byte certificate, containing a receiver id, public key, some misc bytes and a signature generated with the DCP LLC private key, that you have to verify.
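
For reference, here is a small sketch of splitting such a 522-byte blob into its fields. The field widths (5-byte receiver id, 128-byte modulus plus 3-byte exponent, 2 misc/reserved bytes, 384-byte signature) and the big-endian byte order are my reading of the HDCP 2.x certificate layout and should be treated as assumptions; they do at least add up to 522. The parse_cert name is made up for this example, and it does no signature verification at all.

```python
from collections import namedtuple

CERT_LEN = 522  # from the trace: AKE_Send_Cert carries a 522 byte certificate

ReceiverCert = namedtuple(
    "ReceiverCert", "receiver_id modulus exponent reserved signature"
)


def parse_cert(blob):
    """Split a receiver certificate into fields (assumed layout, see above)."""
    if len(blob) != CERT_LEN:
        raise ValueError(f"expected {CERT_LEN} bytes, got {len(blob)}")
    return ReceiverCert(
        receiver_id=blob[0:5],                            # 40 bits (assumed)
        modulus=int.from_bytes(blob[5:133], "big"),       # 1024 bits (assumed)
        exponent=int.from_bytes(blob[133:136], "big"),    # 24 bits (assumed)
        reserved=blob[136:138],                           # the "misc bytes"
        signature=blob[138:522],                          # 3072 bits (assumed)
    )


# Example with synthetic data, not a real certificate.
fake = bytes(range(5)) + b"\x01" * 128 + b"\x01\x00\x01" + b"\x00\x00" + b"\xff" * 384
cert = parse_cert(fake)
print(cert.receiver_id.hex(), cert.exponent, len(cert.signature))
```

If the layout assumption holds, everything before the signature field (the first 138 bytes) is what the signature would have to be checked against.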

So the HDCP v2.2 spec contains the DCP LLC public key, and I've written some code to verify the certificate using openssl, but it totally fails to work. This is probably due to me doing something stupid, or not understanding what I'm doing. If you are openssl-knowledgeable and want to look, the hack fest is at
http://cgit.freedesktop.org/~airlied/dl3dev/

It might be that the DisplayLink devices use a different signing key than the DCP LLC one.

That repo contains some code to talk to the device (currently disabled) and do the initial sequence, along with an attempt to verify the cert.

Now once I get past this hurdle, the larger one seems to remain: the HDCP 2.0 spec has a global secret 128-bit value called LC128, that everyone who implements HDCP gets and hides somewhere. It's probably sitting in the DisplayLink driver in hex, but I'd hope they at least hide it better than that. It may also possibly be supplied by the OS, Windows or OSX. (I've no clue yet). That value is used in the key negotiation.

Now it might be possible that DisplayLink allows non-HDCP encrypted data to be sent to the device, in which case win if I can find out where/how to do that, or it might be the device requires HDCP and decrypts non-HDCP content before sending it over VGA/DVI. I've no ideas yet on that front either.

Ah well probably enough learning for today, I knew nothing about HDCP this morning, so I can't say it made my life any better learning about it :-P
October 30, 2014

Glamor Cleanup

Before I start really digging in to reworking the Render support in Glamor, I wanted to take a stab at cleaning up some cruft which has accumulated in Glamor over the years. Here's what I've done so far.

Get rid of the Intel fallback paths

I think it's my fault, and I'm sorry.

The original Intel Glamor code has Glamor implement accelerated operations using GL, and when those fail, the Intel driver would fall back to its existing code, either UXA acceleration or software. Note that it wasn't Glamor doing these fallbacks, instead the Intel driver had a complete wrapper around every rendering API, calling special Glamor entry points which would return FALSE if GL couldn't accelerate the specified operation.

The thinking was that when GL couldn't do something, it would be far faster to take advantage of the existing UXA paths than to have Glamor fall back to pulling the bits out of GL, drawing to temporary images with software, and pushing the bits back to GL.

And, that may well be true, but what we've managed to prove is that there really aren't any interesting rendering paths which GL can't do directly. For core X, the only fallbacks we have today are for operations using a weird planemask, and some CopyPlane operations. For Render, essentially everything can be accelerated with the GPU.

At this point, the old Intel Glamor implementation is a lot of ugly code in Glamor without any use. I posted patches to the Intel driver several months ago which fix the Glamor bits there, but they haven't seen any review yet and so they haven't been merged, although I've been running them since 1.16 was released...

Getting rid of this support let me eliminate all of the _nf functions exported from Glamor, along with the GLAMOR_USE_SCREEN and GLAMOR_USE_PICTURE_SCREEN parameters, along with the GLAMOR_SEPARATE_TEXTURE pixmap type.

Force all pixmaps to have exact allocations

Glamor has a cache of recently used textures that it uses to avoid allocating and de-allocating GL textures rapidly. For pixmaps small enough to fit in a single texture, Glamor would use a cache texture that was larger than the pixmap.

I disabled this when I rewrote the Glamor rendering code for core X; that code used texture repeat modes for tiles and stipples; if the texture wasn't the same size as the pixmap, then texturing would fail.

On the Render side, Glamor would actually reallocate pixmaps used as repeating texture sources. I could have fixed up the core rendering code to use this, but I decided instead to just simplify things and eliminate the ability to use larger textures for pixmaps everywhere.

Remove redundant pixmap and screen private pointers

Every Glamor pixmap private structure had a pointer back to the pixmap it was allocated for, along with a pointer to the Glamor screen private structure for the related screen. There's no particularly good reason for this, other than making it possible to pass just the Glamor pixmap private around a lot of places. So, I removed those pointers and fixed up the functions to take the necessary extra or replaced parameters.

Similarly, every Glamor fbo had a pointer back to the Glamor screen private too; I removed that and now pass the Glamor screen private parameter as needed.

Reducing pixmap private complexity

Glamor had three separate kinds of pixmap private structures, one for 'normal' pixmaps (those allocated by themselves in a single FBO), one for 'large' pixmaps, where the pixmap was tiled across many FBOs, and a third for 'atlas' pixmaps, which presumably would be a single FBO holding multiple pixmaps.

The 'atlas' form was never actually implemented, so it was pretty easy to get rid of that.

For large vs normal pixmaps, the solution was to move the extra data needed by large pixmaps into the same structure as that used by normal pixmaps and simply initialize those elements correctly in all cases. Now, most code can ignore the difference and simply walk the array of FBOs as necessary.
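
To illustrate the idea (and only the idea; this is not Glamor's actual C structure or API), here is a tiny sketch where every pixmap carries a block size and code simply walks the resulting tiles. A 'normal' pixmap is then just the case where the block is as big as the pixmap, so the walk yields a single box. All names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PixmapLayout:
    """Hypothetical unified layout: one code path for normal and large pixmaps."""
    width: int
    height: int
    block_w: int
    block_h: int

    def boxes(self):
        """Yield (x1, y1, x2, y2) for each FBO-sized tile, row by row."""
        for y in range(0, self.height, self.block_h):
            for x in range(0, self.width, self.block_w):
                yield (x, y,
                       min(x + self.block_w, self.width),
                       min(y + self.block_h, self.height))


def layout_for(width, height, max_fbo_size):
    """Pick block sizes so small pixmaps get one tile and large ones get many."""
    # max(1, ...) keeps the range() step valid for degenerate zero-size pixmaps.
    block_w = max(1, min(width, max_fbo_size))
    block_h = max(1, min(height, max_fbo_size))
    return PixmapLayout(width, height, block_w, block_h)


# A pixmap wider than the (made-up) FBO limit is covered by two tiles.
for box in layout_for(5000, 3000, 4096).boxes():
    print(box)
```

The point is that callers never ask "is this a large pixmap?"; they iterate over boxes() and the single-tile case falls out naturally.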

The other thing I did was to shrink the number of possible pixmap types from eight down to three. Glamor now exposes just these possible pixmap types:

  • GLAMOR_MEMORY. This is a software-only pixmap, stored in regular memory and only drawn with software. This is used for 1bpp pixmaps, shared memory pixmaps and glyph pixmaps. Most of the time, these pixmaps won't even get a Glamor pixmap private structure allocated, but if you use one of these with the existing Render acceleration code, that will end up wanting a private pointer. I'm hoping to fix the code so we can just use a NULL private to indicate this kind of pixmap.

  • GLAMOR_TEXTURE. This is a full Glamor pixmap, capable of being used via either GL or software fallbacks.

  • GLAMOR_DRM_ONLY. This is a pixmap based on an FBO which was passed from the driver, and for which Glamor couldn't get the underlying DRM object. I think this is an error, but I don't quite understand what's going on here yet...

Future Work

  • Deal with X vs GL color formats
  • Finish my new CompositeGlyphs code
  • Create pure shader-based gradients
  • Rewrite Composite to use the GPU for more computation
  • Take another stab at doing GPU-accelerated trapezoids