February 22, 2017

As I wrote a few weeks ago in my post about Gnocchi 3.1 being released, one of the new feature available in this version it the S3 driver. Today I would like to show you how easy it is to use it and store millions of metrics into the simple, durable and massively scalable object storage provided by Amazon Web Services.


The installation of Gnocchi for this use case is not different than the standard installation procedure described in the documentation. Simply install Gnocchi from PyPI using the following command:

$ pip install gnocchi[s3,postgresql] gnocchiclient

This will install Gnocchi with the dependencies for the S3 and PostgreSQL drivers and the command-line interface to talk with Gnocchi.

Configuring Amazon RDS

Since you need a SQL database for the indexer, the easiest way to get started is to create a database on Amazon RDS. You can create a managed PostgreSQL database instance in just a few clicks.

Once you're on the homepage of Amazon RDS, pick PostgreSQL as a database:

You can then configure your PostgreSQL instance: I've picked a dev/test instance with the basic options available within the RDS Free Tier, but you can pick whatever you think is needed for your production use. Set a username and a password and note them for later: we'll need them to configure Gnocchi.

The next step is to configure the database in details. Just set the database name to "gnocchi" and leave the other options to their default values (I'm lazy).

After a few minutes, your instance should be created and running. Note down the endpoint. In this case, my instance is

Configuring Gnocchi for S3 access

In order to give Gnocchi an access to S3, you need to create access keys. The easiest way to create them is to go to IAM in your AWS console, pick a user with S3 access and click on the big gray button named "Create access key".

Once you do that, you'll get the access key id and secret access key. Note them down, we will need these later.

Creating gnocchi.conf

Now is time to create the gnocchi.conf file. You can place it in /etc/gnocchi if you want to deploy it system-wide, or in any other directory and add the --config-file option to each Gnocchi command..

Here are the values that you should retrieve and write in the configuration file:

  • indexer.url: the PostgreSQL RDS instance endpoint and credentials (see above) to set into
  • storage.s3_endpoint_url: the S3 endpoint URL – that depends on the region you want to use and they are listed here.
  • storage.s3_region_name: the S3 region name matching the endpoint you picked.
  • storage.s3_access_key_id and storage.s3_secret_acess_key: your AWS access key id and secret access key.

Your gnocchi.conf file should then look like that:

url = postgresql://
driver = s3
s3_endpoint_url =
s3_region_name = eu-west-1
s3_access_key_id = <you access key id>
s3_secret_access_key = <your secret access key>

Once that's done, you can run gnocchi-upgrade in order to initialize Gnocchi indexer (PostgreSQL) and storage (S3):

$ gnocchi-upgrade --config-file gnocchi.conf
2017-02-07 15:35:52.491 3660 INFO gnocchi.cli [-] Upgrading indexer 
2017-02-07 15:36:04.127 3660 INFO gnocchi.cli [-] Upgrading storage 

Then you can run the API endpoint using the test endpoint gnocchi-api and specifying its default port 8041:

$ gnocchi-api --port 8041 -- --config-file gnocchi.conf
2017-02-07 15:53:06.823 6290 INFO [-] WSGI config used: /Users/jd/Source/gnocchi/gnocchi/rest/api-paste.ini
STARTING test server
Available at
DANGER! For testing only, do not use in production

The best way to run Gnocchi API is to use uwsgi as documented, but in this case, using the testing daemon gnocchi-api is good enough.

Finally, in another terminal, you can start the gnocchi-metricd daemon that will process metrics in background:

$ gnocchi-metricd --config-file gnocchi.conf
2017-02-07 15:52:41.416 6262 INFO gnocchi.cli [-] 0 measurements bundles across 0 metrics wait to be processed.

Once everything is running, you can use Gnocchi's client to query it and check that everything is OK. The backlog should be empty at this stage, obviously.

$ gnocchi status
| Field                                               | Value |
| storage/number of metric having measures to process | 0     |
| storage/total number of measures to process         | 0     |

Gnocchi is ready to be used!

$ # Create a generic resource "foobar" with a metric named "visitor"
$ gnocchi resource create foobar -n visitor
| Field                 | Value                                         |
| created_by_project_id |                                               |
| created_by_user_id    | admin                                         |
| creator               | admin                                         |
| ended_at              | None                                          |
| id                    | b4d568e4-7af1-5aec-ac3f-9c09fa3685a9          |
| metrics               | visitor: 05f45876-1a69-4a64-8575-03eea5b79407 |
| original_resource_id  | foobar                                        |
| project_id            | None                                          |
| revision_end          | None                                          |
| revision_start        | 2017-02-07T14:54:54.417447+00:00              |
| started_at            | 2017-02-07T14:54:54.417414+00:00              |
| type                  | generic                                       |
| user_id               | None                                          |

# Send the number of visitor at 2 different timestamps
$ gnocchi measures add --resource-id foobar -m 2017-02-07T15:56@23 visitor
$ gnocchi measures add --resource-id foobar -m 2017-02-07T15:57@42 visitor

# Check the average number of visitor
# (the --refresh option is given to be sure the measure are processed)
$ gnocchi measures show --resource-id foobar visitor --refresh
| timestamp                 | granularity | value |
| 2017-02-07T15:55:00+00:00 |       300.0 |  32.5 |

# Now shows the minimum number of visitor
$ gnocchi measures show --aggregation min --resource-id foobar visitor
| timestamp                 | granularity | value |
| 2017-02-07T15:55:00+00:00 |       300.0 |  23.0 |

And voilà! You're ready to store millions of metrics and measures on your Amazon Web Services cloud platform. I hope you'll enjoy it and feel free to ask any question in the comment section or by reaching me directly!

February 16, 2017

Knowing that collectd is a daemon that collects system and applications metrics and that Gnocchi is a scalable timeseries database, it sounds like a good idea to combine them together. Cheery on the cake: you can easily draw charts using Grafana.

While it's true that Gnocchi is well integrated with OpenStack, as it comes from this ecosystem, it actually works standalone by default. Starting with the 3.1 version, it is now easy to send metrics to Gnocchi using collectd.


What we'll need to install to accomplish this task is:

  • collectd
  • Gnocchi
  • collectd-gnocchi

How you install them does not really matter. If they are packaged by your operating system, go ahead. For Gnocchi and collectd-gnocchi, you can also use pip:

# pip install gnocchi[file,postgresql]
Successfully installed gnocchi-3.1.0
# pip install collectd-gnocchi
Collecting collectd-gnocchi
  Using cached collectd-gnocchi-1.0.1.tar.gz
Installing collected packages: collectd-gnocchi
  Running install for collectd-gnocchi ... done
Successfully installed collectd-gnocchi-1.0.1

The detailed installation procedure for Gnocchi is detailed in the documentation. It among other things explains which flavors are available – here I picked PostgreSQL and the file driver to store the metrics.



Gnocchi is simple to configure and is again documented. The default configuration file is /etc/gnocchi/gnocchi.conf – you can generate it with gnocchi-config-generator if needed. However, it also possible to specify another configuration file by appending the --config-file option to any command line

In Gnocchi's configuration file, you need to set the indexer.url configuration option to point an existing PostgreSQL database and set storage.file_basepath to an existing directory to store your metrics (the default is /var/lib/gnocchi). That gives something like:

url = postgresql://root:p4assw0rd@localhost/gnocchi
file_basepath = /var/lib/gnocchi

Once done, just run the gnocchi-upgrade command to initialize the index and storage.


Collectd provides a default configuration file that loads a bunch of plugin by default, that will meter all sort of metrics on your computer. You can check the documentation online to see how to disable or enable plugins.

As the collectd-gnocchi plugin is written in Python, you'll need to enable the Python plugin and load the collectd-gnocchi module:

LoadPlugin python
<Plugin python>
Import "collectd_gnocchi"
<Module collectd_gnocchi>
endpoint "http://localhost:8041"

That is enough to enable the storage of metrics in Gnocchi.

Running the daemons

Once everything is configured, you can launch gnocchi-metricd and the gnocchi-api daemon:

$ gnocchi-metricd
2017-01-26 15:22:49.018 15971 INFO gnocchi.cli [-] 0 measurements bundles
across 0 metrics wait to be processed.
# In another terminal
$ gnocchi-api --port 8041
STARTING test server
Available at

It's not recommended to run Gnocchi using Gnocchi API (as written in the documentation): using uwsgi is a better option. However for rapid testing, the gnocchi-api daemon is good enough.

Once that's done, you can start collectd:

$ collectd
# Or to run in foreground with a different configuration file:
# $ collectd -C collectd.conf -f

If you have any problem launchding colllectd, check syslog for more information: there might be an issue loading a module or plugin.

If no error are printed, then everythin's working fine and you soon should see gnocchi-api printing some requests such as: - - [26/Jan/2017 15:27:03] "POST /v1/resource/collectd HTTP/1.1" 409 113 - - [26/Jan/2017 15:27:03] "POST /v1/batch/resources/metrics/measures?create_metrics=True HTTP/1.1" 400 91

Enjoying the result

Once everything runs, you can access your newly created resources and metric by using the gnocchiclient. It should have been installed as a dependency of collectd_gnocchi, but you can also install it manually using pip install gnocchiclient.

If you need to specify a different endpoint you can use the --endpoint option (which default to http://localhost:8041). Do not hesitate to check the --help option for more information.

$ gnocchi resource list --details
| id            | type     | project_id | user_id | original_resource_id | started_at    | ended_at | revision_start | revision_end | creator | host      |
| dd245138-00c7 | collectd | None       | None    | dd245138-00c7-5bdc-  | 2017-01-26T14 | None     | 2017-01-26T14: | None         | admin   | localhost |
| -5bdc-94f8-26 |          |            |         | 94f8-263e236812f7    | :21:02.297466 |          | 21:02.297483+0 |              |         |           |
| 3e236812f7    |          |            |         |                      | +00:00        |          | 0:00           |              |         |           |
$ gnocchi resource show collectd:localhost
| Field                 | Value                                                                 |
| created_by_project_id |                                                                       |
| created_by_user_id    | admin                                                                 |
| creator               | admin                                                                 |
| ended_at              | None                                                                  |
| host                  | localhost                                                             |
| id                    | dd245138-00c7-5bdc-94f8-263e236812f7                                  |
| metrics               | interface-en0@if_errors-0: 5d60f224-2e9e-4247-b415-64d567cf5866       |
|                       | interface-en0@if_errors-1: 1df8b08b-555a-4cab-9186-f9b79a814b03       |
|                       | interface-en0@if_octets-0: 491b7517-7219-4a04-bdb6-934d3bacb482       |
|                       | interface-en0@if_octets-1: 8b5264b8-03f3-4aba-a7f8-3cd4b559e162       |
|                       | interface-en0@if_packets-0: 12efc12b-2538-45e7-aa66-f8b9960b5fa3      |
|                       | interface-en0@if_packets-1: 39377ff7-06e8-454a-a22a-942c8f2bca56      |
|                       | interface-en1@if_errors-0: c3c7e9fc-f486-4d0c-9d36-55cea855596a       |
|                       | interface-en1@if_errors-1: a90f1bec-3a60-4f58-a1d1-b3c09dce4359       |
|                       | interface-en1@if_octets-0: c1ee8c75-95bf-4096-8055-8c0c4ec8cd47       |
|                       | interface-en1@if_octets-1: cbb90a94-e133-4deb-ac10-3f37770e32f0       |
|                       | interface-en1@if_packets-0: ac93b1b9-da71-4876-96aa-76067b35c6c9      |
|                       | interface-en1@if_packets-1: 2f8528b2-12ae-4c4d-bec7-8cc987e7487b      |
|                       | interface-en2@if_errors-0: ddcf7203-4c49-400b-9320-9d3e0a63c6d5       |
|                       | interface-en2@if_errors-1: b249ea42-01ad-4742-9452-2c834010df71       |
|                       | interface-en2@if_octets-0: 8c23013a-604e-40bf-a07a-e2dc4fc5cbd7       |
|                       | interface-en2@if_octets-1: 806c1452-0607-4b56-b184-c4ffd48f52c0       |
|                       | interface-en2@if_packets-0: c5bc6103-6313-4b8b-997d-01930d1d8af4      |
|                       | interface-en2@if_packets-1: 478ae87e-e56b-44e4-83b0-ed28d99ed280      |
|                       | load@load-0: 5db2248d-2dca-401e-b2e2-bbaee23b623e                     |
|                       | load@load-1: 6f74ac93-78fd-4a74-a47e-d2add487a30f                     |
|                       | load@load-2: 1897aca1-356e-4791-907f-512e516992b5                     |
|                       | memory@memory-active-0: 83944a85-9c84-4fe4-b471-1a6cf8dce858          |
|                       | memory@memory-free-0: 0ccc7cfa-26a5-4441-a15f-9ebb2aa82c6d            |
|                       | memory@memory-inactive-0: 63736026-94c4-47c5-8d6f-a9d89d65025b        |
|                       | memory@memory-wired-0: b7217fd6-2cdc-4efd-b1a8-a1edd52eaa2e           |
| original_resource_id  | dd245138-00c7-5bdc-94f8-263e236812f7                                  |
| project_id            | None                                                                  |
| revision_end          | None                                                                  |
| revision_start        | 2017-01-26T14:21:02.297483+00:00                                      |
| started_at            | 2017-01-26T14:21:02.297466+00:00                                      |
| type                  | collectd                                                              |
| user_id               | None                                                                  |
% gnocchi metric show -r collectd:localhost load@load-0
| Field                              | Value                                                                 |
| archive_policy/aggregation_methods | min, std, sum, median, mean, 95pct, count, max                        |
| archive_policy/back_window         | 0                                                                     |
| archive_policy/definition          | - timespan: 1:00:00, granularity: 0:05:00, points: 12                 |
|                                    | - timespan: 1 day, 0:00:00, granularity: 1:00:00, points: 24          |
|                                    | - timespan: 30 days, 0:00:00, granularity: 1 day, 0:00:00, points: 30 |
| archive_policy/name                | low                                                                   |
| created_by_project_id              |                                                                       |
| created_by_user_id                 | admin                                                                 |
| creator                            | admin                                                                 |
| id                                 | 5db2248d-2dca-401e-b2e2-bbaee23b623e                                  |
| name                               | load@load-0                                                           |
| resource/created_by_project_id     |                                                                       |
| resource/created_by_user_id        | admin                                                                 |
| resource/creator                   | admin                                                                 |
| resource/ended_at                  | None                                                                  |
| resource/id                        | dd245138-00c7-5bdc-94f8-263e236812f7                                  |
| resource/original_resource_id      | dd245138-00c7-5bdc-94f8-263e236812f7                                  |
| resource/project_id                | None                                                                  |
| resource/revision_end              | None                                                                  |
| resource/revision_start            | 2017-01-26T14:21:02.297483+00:00                                      |
| resource/started_at                | 2017-01-26T14:21:02.297466+00:00                                      |
| resource/type                      | collectd                                                              |
| resource/user_id                   | None                                                                  |
| unit                               | None                                                                  |
$ gnocchi measures show -r collectd:localhost load@load-0
| timestamp                 | granularity |              value |
| 2017-01-26T00:00:00+00:00 |     86400.0 | 3.2705004391254193 |
| 2017-01-26T15:00:00+00:00 |      3600.0 | 3.2705004391254193 |
| 2017-01-26T15:00:00+00:00 |       300.0 | 2.6022800611413044 |
| 2017-01-26T15:05:00+00:00 |       300.0 |  3.561742940080275 |
| 2017-01-26T15:10:00+00:00 |       300.0 | 2.5605337960379466 |
| 2017-01-26T15:15:00+00:00 |       300.0 |  3.837517851142473 |
| 2017-01-26T15:20:00+00:00 |       300.0 | 3.9625948392427883 |
| 2017-01-26T15:25:00+00:00 |       300.0 | 3.2690042162698414 |

As you can see, the command line works smoothly and can show you any kind of metric reported by collectd. In this case, it was just running on my laptop, but you can imagine it's easy enough to poll thousands of hosts with collectd and Gnocchi.

Bonus: charting with Grafana

Grafana, a charting software, has a plugin for Gnocchi as detailed in the documentation. Once installed, you can just configure Grafana to point to Gnocchi this way:

Grafana configuration screen

You can then create a new dashboard by filling the forms as you wish. See this other screenshot for a nice example:

Charts of my laptop's load average

I hope everything is clear and easy enough. If you have any question, feel free to write something in the comment section!

February 13, 2017
Probably the biggest news of the last two weeks is that Boris's native HDMI audio driver is now on the mailing list for review.  I'm hoping that we can get this merged for 4.12 (4.10 is about to be released, so we're too late for 4.11).  We've tested stereo audio so far, no compresesd audio (though I think it should Just Work), and >2 channel audio should be relatively small amounts of work from here.  The next step on HDMI audio is to write the alsalib configuration snippets necessary to hide the weird details of HDMI audio (stereo IEC958 frames required) so that sound playback works normally for all existing userspace, which Boris should have a bit of time to work on still.

I've also landed the vc4 part of the DSI driver in the drm-misc-next tree, along with a fixup.  Working with the new drm-misc-next trees for vc4 submission is delightful -- once I get a review, I can just push code to the tree and it will magically (through Daniel Vetter's pull requests and some behind-the-scenes tools) go upstream at the appropriate time.  I am delighted with the work that Daniel has been doing to make the DRM subsystem more welcoming of infrequent contributors and reducing the burden on us all of participating in the Linux kernel process.

In 3D land, the biggest news is that I've fixed a kernel oops (producing a desktop lockup) when CMA returns out of memory.  Unfortunately, VC4 doesn't have an MMU, so we require that memory allocations for graphics be contiguous (using the Contiguous Memory Allocator), and to make things worse we have a limit of 256MB for CMA due to an addressing bug of the V3D part, so CMA returning out of memory is a serious and unfortunately frequent problem.  I had a bug where a CMA failure would put the failed allocation into the BO cache, so if you asked for a BO of the same size soon enough, you'd get the bad BO back and crash.  I've been unable to construct a good minimal testcase for it, but the patch is on the mailing list and in the rpi-4.4.y tree now.

I've also fixed a bug in the downstream tree's "firmware KMS" mode (where I use the closed source firmware for display, but open source 3D) where fullscreen 3D rendering would get a single frame displayed and then freeze.

In userspace, I've fixed a bit of multisample rendering (copies of MSAA depth buffers) that gets more of our regression testing working, and worked on some potential rasterization problems (the texwrap regression tests are failing due to texture filtering troubles, and I'm not sure if we're actually doing anything wrong or not because we're near the cutoff for the "how accurate does the filtering have to be?").

Coming up, I'm looking at fixing a cursor updates bug that Michael Zoran has found, and fixing up the DSI panel driver so that it can hopefully get into 4.12.  Also, after a recent discussion with Eben, I've realized that we can actually scan out tiled buffers from the GPU, so we may get big performance wins for un-composited X if I can have Mesa coordinate with the kernel on producing tiled buffers.
February 10, 2017

libinput has a couple of features that 'automagically' work on touchpads such as disable-while-typing and the lid switch triggered disabling of touchpads and disabling the touchpad when an external mouse is plugged in [1]. But not all of these features make sense on all touchpads. For example, an Apple Magic Trackpad doesn't need disable-while-typing because unless you have a creative arrangement of input devices [2], the touchpad won't be where your palm is likely to hit it. Likewise, a Logitech T650 connected over a unifying receiver shouldn't get disabled when the laptop lid closes.

For this to work, libinput has some code to figure out whether a touchpad is internal or external. Initially we had some code to detect this but eventually moved this to the ID_INPUT_TOUCHPAD_INTEGRATION property now set by udev's hwdb (systemd 231 and later). Having it in the hwdb makes it quite trivial to override locally where the current rules are insufficient (and until the hwdb is fixed, thanks for filing a bug). We still have the fallback code though in case the tag is missing. On a sufficiently modern distribution, udevadm info /sys/class/input/event4 for your touchpad device node should show something like ID_INPUT_TOUCHPAD_INTEGRATION=internal.

So for any feature that libinput adds for touchpads, we only enable it where it makes sense. That's why your external touchpad doesn't trigger disable-while-typing or the lid switch.

[1] ok, I admit, this is something we should've left to the client, but now we have the feature.
[2] yes, I'm sure there's at least one person out there that uses the touchpad upside down in front of the keyboard and is now angry that libinput doesn't allow arbitrary rotation of the device combined with configurable dwt. I think of you every night I cry myself to sleep.

February 06, 2017

Last week-end, I was in Brussels, Belgium for the 2017 edition of the FOSDEM, one of the greatest open source developer conference.

This year, I decided to propose a talk about Gnocchi which was accepted in the Python devroom. The track was very wlell organized (thanks to Stéphane Wirtel) and I was able to present Gnocchi to a room full of Python developers!

I've explained why we created Gnocchi and how we did it, and finally briefly explained how to use it with the command-line interface or in a Python application using the SDK.

You can check the slides and the video of the talk.

February 03, 2017

Seems that there was a rift in the spacetime that sucked away the video of my LCA talk, but the awesome NextDayVideo team managed to pull it back out. And there’s still the writeup and slides available.

February 02, 2017

It's always difficult to know when to release, and we really wanted to do it earlier. But it seems that each week more awesome work was being done in Gnocchi, so we kept delaying it while having no pressure to push it out.

A photo posted by Julien Danjou (@juldanjou) on

I've made my own gnocchis to celebrate!

But now that the OpenStack cycle is finishing, even Gnocchi does not strictly follow it, it seemed to be a good time to cut the leash and leave this release be.

There are again some major new changes coming from 3.0. The previous version 3.0 was tagged in October and had 90 changes merged from 13 authors since 2.2. This 3.1 version have 200 changes merged from 24 different authors. This is a great improvement of our contributor base and our rate of change – even if our delay to merge is very low. Once again, we pushed usage of release notes to document user visible changes, and they can be read online.

Therefore, I am going to summary quickly the major changes:

  • The REST API authentication mechanism has been modularized. It's now simple to provide any authentication mechanism for Gnocchi as a plugin. The default is now a HTTP basic authentication mechanism that does not implement any kind of enforcement. The Keystone authentication is still available, obviously.

  • Batching has been improved and can now create metrics on the fly, reducing the latency needed when pushing measures to non-existing metrics. This is leveraged by the collectd-gnoccchi plugin for example.

  • The performance of Carbonara based backend has been largely improved. This is not really listed as a change as it's not user-visible, but an amazing work of profiling and rewriting code from Pandas to NumPy has been done. While Pandas is very developer-friendly and generic, using NumPy directly offers way more performance and should decrease gnocchi-metricd CPU usage by a large factor.

  • The storage has been split into two parts: the storage of incoming new measures to be processed, and the storage and archival of aggregated metrics. This allows to use e.g. file to store new measures being sent, and once processed store them into e.g. Ceph. Before that change, all the new measures had to go into Ceph. While there's no specific driver yet for incoming measures, it's easy to envision a driver for systems like Redis or Memcached.

  • A new Amazon S3 driver has been merged. It works in the same way than the file or OpenStack Swift drivers.

I will write more about some of these new features in the upcoming weeks, as they are very interesting for Gnocchi's users.

We are planning to run a scalability test and benchmarks using the ScaleLab in a few weeks if everything goes as planned. I will obviously share the result here, but we also submitted a talk for the next OpenStack Summit in Boston to present the results of our scalability and performance tests – hoping the session will be accepted.

I will also be talking about Gnocchi this Sunday at FOSDEM.

We don't have a very determined roadmap for Gnocchi during the next weeks. Sure we do have a few ideas on what we want to implement, but we are also very easily influenced by the requests of our user: therefore feel free to ask for anything!

February 01, 2017

I merged a patchset from James Ye today to add support for switch events to libinput, specifically: lid switch events. This feature is scheduled for libinput 1.7.

First, what are switches and how are they different so keys? A key's state is transient with a neutral state of "key is up". The state itself is expected to change frequently. Switches don't always have a defined logical neutral state and the state changes only infrequently. This requires different handling in applications and thus libinput exposes a new interface (and capability) for switches.

The interface itself is trivial. A switch event has two properties, the switch type (e.g. "lid") and the switch state (on/off). See the libinput-debug-events source code for a simple code to print the state and type.

In libinput, we generally try to restrict ourselves to the cases we know how to handle. So in the first iteration, we'll support a single switch event: the lid switch. This is the toggle that changes when you close the lid on your laptop.

But libinput uses this internally too: touchpads are disabled automatically whenever the lid is closed. Indeed, this functionally was the main motivation for this patchset. On a number of devices, we get ghost touches when the lid is closed. Even though the touchpad is unreachable by the user interference with the screen still causes events, moving the pointer in unexpected ways and generally being a nuisance. Some trackpoints suffer from the same issue. But now that libinput knows about the lid switch it can transparently disable the touchpad whenever the lid is closed and thus discard the events.

Lid switches on some devices are unreliable. There are some devices where the lid is permanently closed and other devices where the lid can be closed, but we'll never see the open event. So we change behaviour based on a few factors. After all, no-one likes a dysfunctional touchpad because the lid switch is broken (if you do, seek help). For one, whenever we detect keyboard events while in logically closed state we'll assume that the lid is open after all and adjust state accordingly. Unless the lid switch is reliable, we don't sync the initial state. That's annoying for those who start libinput in closed mode, but it filters out all devices that set the lid switch to "on" and then never change again. On the Surface 3 devices we go even further: we know those devices needs a bit of hand-holding. So whenever we detect activity on the keyboard, we also write the EV_SW/SW_LID state to the device node, thus updating the kernel to be correct again (and thus help everyone else who may be listening).

The exact behaviours will likely change slightly over time as we have to deal with corner-cases one-by-one. But meanwhile, it's even easier for compositors to listen to switch events and users don't have to deal with ghost touches anymore. Many thanks to James Ye for implementing this.

January 30, 2017
Most of last week was spent switching my development environment over to a setup with no SD cards involved at all.  This was triggered by yet another card failing, and I spent a couple of days off and on trying to recover it.  I now have three scripts that build and swap my test environment between upstream 64-bit Pi3, upstream 32-bit Pi3, and downstream 32-bit Pi3, using just the closed source bootloader without u-boot or an SD card.  Previously I was on Pi2 only (much slower for testing), and running downstream kernels was really difficult.

Once I got the new netboot system working, I tested and landed the NEON part of my tiling work (the big 208% download and 41% upload performance improvements).  I'm looking forward to fixing up the clever tiling math parts soon.

I also tested and landed a few little compiler improvements. The nicest compiler improvement was turning on a switch that Marek added to gallium: We now run the GLSL compiler optimization loop exactly once (because it's required to avoid GLSL linker regressions), and rely on NIR to do the actual optimization after the GLSL linker has run.  The GLSL IR is a terrible IR for doing optimization on (only a bit better than Mesa or TGSI IRs), and it's made worse by the fact that I wrote a bunch of its optimizations back before we had good datastructures available in Mesa and before we had big enough shaders that using good datastructures mattered.  I'm horrified by my old code and can't wait to get it deleted (Timothy Arceri has been making progress on that front).  Until we can actually delete it, though, cutting down the number of times we execute my old optimization passes should improve our compile times on complicated shaders.

Now that I have a good way to test the downstream kernel, I went ahead and made a giant backport of our current vc4 kernel code to the 4.9 branch.  I hope the Foundation can get that branch shipped soon -- backporting to 4.9 is so much easier for me than old 4.4, and the structure of the downstream DT files makes it much clearer what there is left to be moved upstream.

Meanwhile, Michael Zoran has continued hacking on the staging VCHI code, and kernel reviewers were getting bothered by edits to code with no callers.  Michael decided to solve that by submitting the old HDMI audio driver that feeds through the closed source firmware (I'm hoping we can delete this soon once Boris's work lands, though), and I pulled the V4L2 camera driver out of rpi-4.9.y and submitted that to staging as well.  I unfortunately don't have the camera driver quite working yet, because when I modprobe it the network goes down.  There are a ton of variables that have changed since the last time I ran the camera (upstream vs downstream, 4.10 vs 4.4, pi3 vs pi2), so it's going to take a bit of debugging before I have it working again.

Other news: kraxel from RH has resubmitted the SDHOST driver upstream, so maybe we can have wifi by default soon.  Baruch Siach has submitted some fixes that I suspect get BT working.  I've also passed libepoxy off to Emmanuele Bassi (long time GNOME developer) who has fixed it up to be buildable and usable on Linux again and converted it to the Meson build system, which appears to be really promising.

In order to read events and modify devices, libinput needs a file descriptor to the /dev/input/event node. But those files are only accessible by the root user. If libinput were to open these directly, we would force any process that uses libinput to have sufficient privileges to open those files. But these days everyone tries to reduce a processes privileges wherever possible, so libinput simply delegates opening and closing the file descriptors to the caller.

The functions to create a libinput context take a parameter of type struct libinput_interface. This is an non-opaque struct with two function pointers: "open_restricted" and "close_restricted". Whenever libinput needs to open or close a file, it calls the respective function. For open_restricted() libinput expects the caller to return an fd with the given flags.

In the simplest case, a caller can merely call open() and close(). This is what the debugging tools do (and the test suite). But obviously this means you have to run those as root. The main wayland compositors (weston, mutter, kwin, ...) instead forward the request to systemd-logind. That then opens the event node and returns the fd which is passed to libinput. And voila, the compositors don't need to run as root, libinput doesn't have to know how the fd is opened and everybody wins. Plus, logind will mute the fd on VT-switch, so we can't leak keyboard events.

In the case it's a combination of the two. When the server runs with systemd-logind enabled, it will open the fd before the driver initialises the device. During the init stage, libinput asks the xf86-input-libinput driver to open the device node. The driver forwards the request to the server which simply returns the already-open fd. When the server runs without systemd-logind, the server opens the file normally with a standard open() call.

So in summary: you can easily run libinput without systemd-logind but you'll have to figure out how to get the required privileges to open device nodes. For anything more than a test or debugging program, I recommend using systemd-logind.

January 27, 2017

We're in the middle of the 1.7 development cycle and one of the features merged already is support for "wheel tilt", i.e. support for devices that don't have a separate horizontal wheel but instead rely on a tilt motion for horizontal event. Now, the way this is handled in the kernel is that the events are sent via REL_WHEEL (or REL_DIAL) so we don't actually need special code in libinput to handle tilt. But libinput tries to to make sense of input devices so the upper layers have a reliable base to build on - and that's why we need tilt-wheels to be handled.

For 'pointer axis' events (i.e. scroll events) libinput provides scroll sources. These specify how the scroll event was generated, allowing a caller to handle things accordingly. A finger-based scroll for example can trigger kinetic scrolling while a mouse wheel would not usually do so. The value for a pointer axis is also dependent on the scroll source - for continuous/finger based scrolling the value is in pixels. For a mouse wheel, the value is in degrees. This obviously doesn't work for a tilt event because degrees don't make sense in this context. So the new axis source is just that, an indicator that the event was caused by a wheel tilt rather than a rotation. Its value matches the default wheel rotation (i.e. 15 degrees) just to make use of it easier.

Of course, a device won't tell us whether it provides a proper wheel or just tilt. So we need a hwdb property and I've added that to systemd's repo. To make this work, set the MOUSE_WHEEL_TILT_HORIZONTAL and/or MOUSE_WHEEL_TILT_VERTICAL property on your hardware and you're off. Yay.

Patches for the wayland protocol have been merged as well, so this is/will be available to wayland clients.

January 24, 2017
I spent last week at 2017 (the best open source conference), mostly talking to people and going to talks instead of acually hacking on vc4.  I'd like to highlight a few talks you could go watch if you weren't there.

Jack Moffit gave an overview of what Servo is doing with Rust on parallel processing and process isolation in web engines.  I've been hacking on this project in my spare time, and the community is delightful to work with.  Check out his talk to see what I've been excited about recently.

I then spent the next session in the hallway chatting with him instead of going to Willy's excellent talk about how the Linux page cache works.  One of the subjects of that talk makes me wonder: I've got these contiguous regions of GPU memory on VC4, could I use huge pages when we're mapping them for uploads to the GPU?  Would they help enough to matter?

Dave Airlie gave a talk about the structure of the Vulkan drivers and his work with Bas on the Radeon Vulkan driver.  It's awesome how quickly these two have been able to turn around a complete Vulkan driver, and it really validates our open source process and the stack we've built in Mesa.

Nathan Egge from Mozilla gave a talk on accelerating JPEG decode using the GPU.  I thought it was going to be harder to do than he made it out to be, so it was pretty cool to see the performance improvements he got.  The only sad part was that it isn't integrated into WebRender yet as far as I can see.

Karen Sandler talked about the issue of what happens to our copyrights when we die, from experience in working with current project maintainers on relicensing efforts.  I think this is a really important talk for anyone who writes copyleft software.  It actually softened my stance on CLAs (!) but is mostly a call to action for "Nobody has a good process yet for dealing with our copyrights after death, let's work on this."

And, finally, I think the most important talk of the conference to me was Daniel Vetter's Maintainers Don't Scale (no video yet, just a blog post).  The proliferation of boutique trees in the kernel with "overworked" maintainers either ignoring or endlessly bikeshedding the code of outside contributors is why I would never work on the Linux kernel if I wasn't paid to do so.  Daniel actually offers a way forward from the current state of maintainership in the kernel, with group maintainership (a familiar model in healthy large projects) to replace DRM's boutique trees and a reinforcement of the cooperative values that I think are why we all work on free software.  I would love to be able to shut down my boutique DRM and ARM platform trees.

In VC4, I did still manage to write a couple of kernel fixes in response to a security report, and rebase DSI on 4.10 and respin some of its patches for resubmission.  I'm still hoping for DSI in 4.11, but as usual it depends on DT maintainer review (no response for over a month).  I also spent a lot of time with Keith talking about ARM cache coherency questions and trying experiments on my NEON tiling code.

While I was mostly gone last week, though, Boris has been working on finish off my HDMI audio branch.  He now has a branch that has a seriously simplified architecture compared to what I'd talked about with larsc originally (*so* much cleaner than what I was modeling on), and he has it actually playing some staticy audio.  He's now debugging what's causing the static, including comparing dumps of raw audio between VC4 and the firmware, along with the register settings of the HDMI and DMA engines.
January 20, 2017

This is the write-up of my talk at LCA 2017 in Hobart. It’s not exactly the same, because this is a blog and not a talk, but the same contents. The slides for the talk are here, and I will link to the video as soon as it is available. Update: Video is now uploaded.

Linux Kernel Maintainers

First let’s look at how the kernel community works, and how a change gets merged into Linus Torvalds’ repository. Changes are submitted as patches to mailing list, then get some review and eventually get applied by a maintainer to that maintainer’s git tree. Each maintainer then sends pull request, often directly to Linus. With a few big subsystems (networking, graphics and ARM-SoC are the major ones) there’s a second or third level of sub-maintainers in. 80% of the patches get merged this way, only 20% are committed by a maintainer directly.

Most maintainers are just that, a single person, and often responsible for a bunch of different areas in the kernel with corresponding different git branches and repositories. To my knowledge there are only three subsystems that have embraced group maintainership models of different kinds: TIP (x86 and core kernel), ARM-SoC and the graphics subsystem (DRM).

The radical change, at least for the kernel community, that we implemented over a year ago for the Intel graphics driver is to hand out commit rights to all regular contributors. Currently there are 19 people with commit rights to the drm-intel repository. In the first year of ramp-up 70% of all patches are now committed directly by their authors, a big change compared to how things worked before, and still work everywhere else outside of the graphics subsystem. More recently we also started to manage the drm-misc tree for subsystem wide refactorings and core changes in the same way.

I’ve covered the details of the new process in my Kernel Recipes talk “Maintainers Don’t Scale”, and LWN has covered that, and a few other talks, in their article on linux kernel maintainer scalability. I also covered this topic at the kernel summit, again LWN covered the group maintainership discussion. I don’t want to go into more detail here, mostly because we’re still learning, too, and not really experts on commit rights for everyone and what it takes to make this work well. If you want to enjoy what a community does who really has this all figured out, watch Emily Dunham’s talk “Life is better with Rust’s community automation” from last year’s LCA.

What we are experts on is the Linux Kernel’s maintainer model - we’ve run things for years with the traditional model, both as single maintainers and small groups, and now gained the outside perspective by switching to something completely different. Personally, I’ve come to believe that the maintainer model as implemented by the kernel community just doesn’t scale. Not in the technical sense of big-O scalability, because obviously the kernel community scales to a rather big size. Much larger organizations, entire states are organized in a hierarchical way, the kernel maintainer hierarchy is not anything special. Besides that, git was developed specifically to support the Linux maintainer hierarchy, and git won. Clearly, the linux maintainer model scales to big numbers of contributors. Where I think it falls short is the constant factor of how efficiently contributions are reviewed and merged, especially for non-maintainer contributors. Which do 80% of all patches.

Cult of Busy

The first issue that routinely comes out when talking about maintainer topics is that everyone is overloaded. There’s a pervasive spirit in our industry (especially in the US) hailing overworked engineers as heroes, with an entire “cult of busy” around. If you have time, you’re a slacker and probably not worth it. Of course this doesn’t help when being a maintainer, but I don’t believe it’s a cause of why the Linux maintainer model doesn’t work. This cult of busy leads to burnout, which is in my opinion a prime risk when you’re an open source person. Personally I’ve gone through a few difficult phases until I understood my limits and respected them. When you start as a maintainer for 2-3 people, and it increases to a few dozen within a couple of years, then getting a bit overloaded is rather natural - it’s a new job, with a different set of responsibilities and I had no clue about a lot of things. That’s no different from suddenly being a leader of a much bigger team anywhere else. A great talk on this topic is “What part of “… for life” don’t you understand?” from Jacob Kaplan-Moss since it’s by a former maintainer. It also contains a bunch of links to talks on burnout specifically. Ignoring burnout is not healthy, or not knowing about the early warning signs, it is rampant in our communities, but for now I’ll leave it at that.

Boutique Trees and Bus Factors

The first issue I see is how maintainers usually are made: You scratch an itch somewhere, write a bit of code, suddenly a few more people find it useful, and “tag” you’re the maintainer. On top, you often end up being stuck in that position “for life”. If the community keeps growing, or your maintainer becomes otherwise busy with work&life, you have your standard-issue overloaded bottleneck.

That’s the point where I think the kernel community goes wrong. When other projects reach this point they start to build up a more formal community structure, with specialized roles, boards for review and other bits and pieces. One of the oldest, and probably most notorious, is Debian with its constitution. Of course a small project doesn’t need such elaborate structures. But if the goal is world domination, or at least creating something lasting, it helps when there’s solid institutions that cope with people turnover. At first just documenting processes and roles properly goes a long way, long before bylaws and codified decision processes are needed.

The kernel community, at least on the maintainer side, entirely lacks this.

What instead most often happens is that a new set of ad-hoc, chosen-by-default maintainers start to crop up in a new level of the hierarchy, below your overload bottleneck. Because becoming your own maintainer is the only way to help out and to get your own features merged. That only perpetuates the problem, since the new maintainers are as likely to be otherwise busy, or occupied with plenty of other kernel parts already. If things go well that area becomes big, and you have another git tree with another overloaded maintainer. More often than not people move around, and accumulate small bits allover under their maintainership. And then the cycle repeats.

The end result is a forest of boutique trees, each covering a tiny part of the project, maintained by a bunch of notoriously overloaded people. The resulting cross-tree coordination issues are pretty impressive - in the graphics subsystem we fairly often end up with with simple drivers that somehow need prep patches in 5 different trees before you can even land that simple driver in the graphics tree.

Unfortunately that’s not the bad part. Because these maintainers are all busy with other trees, or their work, or life in general, you’re guaranteed that one of them is not available at any given time. Worse, because their tree has relatively little activity because it covers a small area, many only pick up patches once per kernel release, which means a built-in 3 month delay. That’s all because each tree and area has just one maintainer. In the end you don’t even need the proverbial bus to hit anyone to feel the pain of having a single point of failure in your organization - there’s so many maintainer trees around that that absence always happens, and constantly.

Of course people get fed up trying to get features merged, and often the fix is trying to become a maintainer yourself. That takes a while and isn’t easy - only 20% of all patches are authored by maintainers - and after the new code landed it makes it all worse: Now there’s one more semi-absent maintainer with one more boutique tree, adding to all the existing troubles.

Checks and Balances

All patches merged into the Linux kernel are supposed to be reviewed, and rather often that review is only done by the maintainers who merges the patch. When maintainers send out pull requests the next level of maintainers then reviews those patch piles, until they land in Linus’ tree. That’s an organization where control flows entirely top-down, with no checks and balances to reign in maintainers who are not serving their contributors well. History of dicatorships tells us that despite best intentions, the end result tends to heavily favour the few over the many. As a crude measure for how much maintainers subject themselves to some checks&balances by their peers and contributors I looked at how many patches authored and committed by the same person (probably a maintainer) do not also carry a reviewed or acked tag. For the Intel driver that’s less than 3%. But even within the core graphics code it’s only 5%, and that covers the time before we started to experiment with commit rights for that area. And for the graphics subsystem overall the ratio is still only about 25%, including a lot of drivers with essentially just one contributor, who is always volunteered as the maintainer, and hence somewhat natural that those maintainers lack reviewers.

Outside of graphics only roughly 25% of all patches written by maintainers are reviewed by their peers - 75% of all maintainer patches lack any kind of recorded peer review, compared to just 25% for graphics alone. And even looking at core areas like kernel/ or mm/ the ratio is only marginally better at about 30%. In short, in the kernel at large, peer review of maintainers isn’t the norm.

And there’s nothing outside of the maintainer hierarchy that could provide some checks and balance either. The only way to escalate disagreement is by starting a revolution, and revolutions tend to be long, drawn-out struggles and generally not worth it. Even Debian only recently learned that they lack a way to depose maintainers, and that maybe going maintainerless would be easier (again, LWN has you covered).

Of course the kernel is not the only hierarchy where there’s no meaningful checks and balances. Professor at universities, and managers at work are in a fairly similar position, with minimal options for students or employers to meaningfully appeal decisions. But that’s a recognized problem, and at least somewhat countered by providing ways to provide anonymous feedback, often through regular surveys. The results tend to not be all that significant, but at least provide some control and accountability to the wider masses of first-level dwellers in the hierarchy. In the kernel that amounts to about 80% of all contributions, but there’s no such survey. On the contrary, feedback sessions about maintainer happiness only reinforce the control structure, with e.g. the kernel summit featuring an “Is Linus happy?” session each year.

Another closely related aspect to all this is how a project handles personal conflicts between contributors. For a very long time Linux didn’t have any formal structures in this area either, with the only options available to unhappy people to either take it or leave it. Well, or usurping a maintainer with a small revolution, but that’s not really an option. For two years we’ve now had the “Code of Conflict”, which de facto just throws up its hands and declares that conflict are the normal outcome, essentially just encoding the status quo. Refusing to handle conflicts in a project with thousands of contributors just doesn’t work, except that it results in lots of frustration and ultimately people trying to get away. Again, the lack of a poised board to enforce a strong code of conduct, independent of the maintainer hierarchy, is in line with the kernel community unwillingness to accept checks and balances.

Mesh vs. Hierarchy

The last big issue I see with the Linux kernel model, featuring lots of boutique trees and overloaded maintainer, is that it seems to harm collaboration and integration of new contributors. In the Intel graphics, driver maintainers only ever reviewed a small minority of all patches over the last few years, with the goal to foster direct collaboration between contributors. Still, when a patch was stuck, maintainers were the first point of contact, especially, but not only, for newer contributors. No amount of explaining that only the lack of agreement with the reviewer was the gating factor could persuade people to fully collaborate on code reviews and rework the code, tests and documentation as needed. Especially when they’re coming with previous experience where code review is more of a rubber-stamp step compared to the distributed and asynchronous pair-programming it often resembles in open-source. Instead, new contributors often just ended up falling back to pinging maintainers to make a decision or just merge the patches as-is.

Giving all regular contributors commit rights and fully trusting them to do the right thing entirely fixed that: If the reviewer or author have commit rights there’s no easy excuse anymore to involve maintainers when the author and reviewer can’t reach agreement. Of course that requires a lot of work in mentoring people, making sure requirements for merging are understood and documented, and automating as much as possible to avoid screw ups. I think maintainers who lament their lack of review bandwidth, but also state they can’t trust anyone else aren’t really doing their jobs.

At least for me, review isn’t just about ensuring good code quality, but also about diffusing knowledge and improving understanding. At first there’s maybe one person, the author (and that’s not a given), understanding the code. After good review there should be at least two people who fully understand it, including corner cases. And that’s also why I think that group maintainership is the only way to run any project with more than one regular contributor.

On the topic of patch review and maintainers, there’s also the habit of wholesale rewrites of patches written by others. If you want others to contribute to your project, then that means you need to accept other styles and can’t enforce your own all the time. Merging first and polishing later recognizes new contributions, and if you engage newcomers for the polish work they tend to stick around more often. And even when a patch really needs to be reworked before merging it’s better to ask the author to do it: Worst case they don’t have time, best case you’ve improved your documentation and training procedure and maybe gained a new regular contributor on top.

A great take on the consequences of having fixed roles instead of trying to spread responsibilities more evenly is Alice Goldfuss’ talk “Rock Stars, Builders, and Janitors: You’re doing it wrong”. I also think that rigid roles present a bigger bar for people with different backgrounds, hampering diversity efforts and in the spirit of Sarah Sharps post on what makes a good community, need to be fixed first.

Towards a Maintainer’s Manifest

I think what’s needed in the end is some guidelines and discussions about what a maintainer is, and what a maintainer does. We have ready-made licenses to avoid havoc, there’s code of conducts to copypaste and implement, handbooks for building communities, and for all of these things, lots of conferences. Maintainer on the other hand you become by accident, as a default. And then everyone gets to learn how to do it on their own, while hopefully not burning too many bridges - at least I myself was rather lost on that journey at times. I’d like to conclude with a draft on a maintainer’s manifest.

It’s About the People

If you’re maintainer of a project or code area with a bunch of full time contributors (or even a lot of drive-by contributions) then primarily you deal with people. Insisting that you’re only a technical leader just means you don’t acknowledge what your true role really is.

And then, trust them to do a good job, and recognize them for the work they’re doing. The important part is to trust people just a bit more than what they’re ready for, as the occasional challenge, but not too much that they’re bound to fail. In short, give them the keys and hope they don’t wreck the car too badly, but in all cases have insurance ready. And insurance for software is dirt cheap, generally a git revert and the maintainer profusely apologizing to everyone and taking the blame is all it takes.

Recognize Your Power

You’re a maintainer, and you have essentially absolute power over what happens to your code. For successful projects that means you can unleash a lot of harm on people who for better or worse are employed to deal with you. One of the things that annoy me the most is when maintainers engage in petty status fights against subordinates, thinly veiled as technical discussions - you end up looking silly, and it just pisses everyone off. Instead recognize your powers, try to stay on the good side of the force and make sure you share it sufficiently with the contributors of your project.

Accept Your Limits

At the beginning you’re responsible for everything, and for a one-person project that’s all fine. But eventually the project grows too much and you’ll just become a dictator, and then failure is all but assured because we’re all human. Recognize what you don’t do well, build institutions to replace you. Recognize that the responsibility you initially took on might not be the same as that which you’ll end up with and either accept it, or move on. And do all that before you start burning out.

Be a Steward, Not a Lord

I think one of key advantages of open source is that people stick around for a very long time. Even when they switch jobs or move around. Maybe the usual “for life” qualifier isn’t really a great choice, since it sounds more like a mandatory sentence than something done by choice. What I object to is the “dictator” part, since if your goal is to grow a great community and maybe reach world domination, then you as the maintainer need to serve that community. And not that the community serves you.

Thanks a lot to Ben Widawsky, Daniel Stone, Eric Anholt, Jani Nikula, Karen Sandler, Kimmo Nikkanen and Laurent Pinchart for reading and commenting on drafts of this text.

January 17, 2017
In this blog post I promised I would get back to people who want to use the nvidia driver on an optimus laptop.

The set of xserver patches I blogged about last time have landed upstream and in Fedora 25 (in xorg-x11-server 1.19.0-3 and newer), allowing the nvidia driver packages to drop a xorg.conf snippet which will make the driver atuomatically work on optimus setups.

The nvidia packages now are using this, so if you install these, then the nvidia driver should just work on your laptop.

Note that you should only install these drivers if you actually have a supported (new enough) nvidia GPU. These drivers replace the libGL implementation, so installing them on a system without a nvidia GPU will cause things to break. This will be fixed soon by switching to libglvnd as the official libGL provider and having both mesa and the nvidia driver provide "plugins" for libglvnd. I've actually just completed building a new enough libglvnd + libglvnd enabled mesa for rawhide, so rawhide users will have libglvnd starting tomorrow.
January 16, 2017

My day-to-day activities are still evolving around the Python programming language, as I continue working on the OpenStack project as part of my job at Red Hat. OpenStack is still the biggest Python project out there, and attract a lot of Python hackers.

Those last few years, however, things have taken a different turn for me when I made the choice with my team to rework the telemetry stack architecture. We decided to make a point of making it scale way beyond what has been done in the project so far.

I started to dig into a lot of different fields around Python. Topics you don't often look at when writing a simple and straight-forward application. It turns out that writing scalable applications in Python is not impossible, nor that difficult. There are a few hiccups to avoid, and various tools that can help, but it really is possible – without switching to another whole language, framework, or exotic tool set.

Working on those projects seemed to me like a good opportunity to share with the rest of the world what I learned. Therefore, I decided to share my most recent knowledge addition around distributed and scalable Python application in a new book, entitled The Hacker's Guide to Scaling Python (or Scaling Python, in short). The book should be released in a few months – fingers crossed.

And as the book is still a work-in-progress, I'll be happy to hear any remark, subject, interrogation or topic idea you might have or any particular angle you would like me to take in this book (reply in the comments section or shoot me an email). And if you'd like to get be kept updated on this book advancement, you can subscribe in the following form or from the book homepage.

The adventure of working on my previous book, The Hacker's Guide to Python, has been so tremendous and the feedback so great, that I'm looking forward releasing this new book later this year!

Last week I was on vacation, but the week before that I did some more work on figuring out Intel's Mesa CI system, and Mark Janes has started working on some better documentation for it.  I now understand better how the setup is going to work, but haven't made much progress on actually getting a master running yet.

More fun, though, was finally taking a look at optimizing the tiled texture load/store code.  This got started with Patrick Walton tweeting a link to a blog post on clever math for implementing texture tiling, given a couple of assumptions.

As with all GPUs these days, VC4 swizzles textures into a tiled layout so that when a cacheline is loaded from memory, it will cover nearby pixels in the Y direction as well as X, so that you are more likely to get cache hits for your neighboring pixels in drawing.  And the tiling tends to be multiple levels, so that nearby cachelines are also nearby on the screen, reducing DRAM access latency.

For small textures on VC4, we have a single level of tiling: 4x4@32bpp blocks ("utiles") of raster order data, themselves arranged in up to a 4x4 block, and this mode is called LT.  Once things go over that size, we go to T tiling, where we call a 4x4 LT block a subtile, and arrange subtiles either clockwise or counterclockwise order within a 2x2 block (a tile, which at this point is now 1kb of data), and tiles themselves are arranged left-to-right, then right-to-left, then left-to-right again.

The first thing I did was implement the blog post's clever math for LT textures.  One of the nice features of the math trick is that it means I can do partial utile updates finally, because it skips around in the GPU's address space based on the CPU-side pixel coordinate instead of trying to go through GPU address space to update a 4x4 utile at a time.  The downside of doing things this way is that jumping around in GPU address space means that our writes are unlikely to benefit from write combining, which is usually important for getting full write performance to GPU memory.  It turned out, though, that the math was so much more efficient than what I was doing that it was still a win.

However, I found out that the clever math made reads slower.  The problem is that, because we're write-combined, reads are uncached -- each load is a separate bus transaction to main memory.  Switching from my old utile-at-a-time load using memcpy to the new code meant that instead of doing 4 loads using NEON to grab a row at a time, we were now doing 16 loads of 32 bits at a time, for what added up to a 30% performance hit.

Reads *shouldn't* be something that people do much, except that we still have some software fallbacks in X on VC4 ("core" unantialiased text rendering, for example, which I need to write the GLSL 1.20 shaders for), which involve doing a read/modify/write cycle on a texture.  My first attempt at fixing the regression was just adding back a fast path that operates on a utile at a time if things are nicely utile-aligned (they generally are).  However, some forced inlining of functions per cpp that I was doing for the unaligned case meant that the glibc memcpy call now got inlined back to being non-NEON, and the "fast" utile code ended up not helping loads.

Relying on details of glibc's implementation (their tradeoff for when to do NEON loads) and of gcc's implementation (when to go from memcpy calls to inlined 32-bits-at-a-time code) seems like a pretty bad idea, so I decided to finally write the NEON acceleration that Eben and I have talked about several times.

My first hope was that I could load a full cacheline with NEON's VLD.  VLD1 only loads up to 1 "quadword" (16 bytes) at a time, so that doesn't seem like much help.  VLD4 can load 64 bytes like we want, but it also turns AOS data into SOA in the process, and there's no corresponding "SOA-back-to-AOS store 8 or 16 bytes at a time" like we need to do to get things back into the CPU's strided representation.  I tried VLD4+VST4 into a temporary, then doing my old untiling path on the cached temporary, but that still left me a few percent slower on loads than not doing any of this work at all.

Finally, I hit on using the VLDM instruction.  It seems to be intended for stack loads/stores, but we can also use it to get 64 bytes of data in from memory untouched into NEON registers, and then I can use 4 (32bpp) or 8 (8 or 16bpp) VST1s to store it to the CPU side.  With this, we get a 208.256% +/- 7.07029% (n=10) improvement to GetTexImage performance at 1024x1024.  Doing the same NEON code for stores gave a 41.2371% +/- 3.52799% (n=10) improvement, probably mostly due to not calling into memcpy and having it go through its size/alignment-based memcpy path choosing process.

I'm not yet hitting full memory bandwidth, but this should be a noticeable improvement to X, and it'll probably help my piglit test suite runtime as well.  Hopefully I'll get the code polished up and landed next week when I get back from LCA.
January 11, 2017
A while back Debian has switched to using the modesetting Xorg driver rather then the intel Xorg driver for Intel GPUs.

There are several good reasons for this, rather then repeating them I'm just going to point to the Debian announcement.

This blog post is to let all Fedora users know that starting with Fedora-26 / rawhide as of today, we are making the same change.

Note that the xorg-x11-drv-intel package has already been carrying a Fedora patch to not bind to the GPU on Skylake or newer, even before Debian announced this, this just makes the same change for older Intel GPUs.

For people who are using the now default GNOME3 on Wayland session, nothing changes, since Xwayland always uses glamor for X acceleration, just like the modesetting driver.

If you encounter any issues causes by this change, please file a bug in bugzilla.

The "for all recent Intel GPUs" in the subject of this blog post in practice means that we're making this change for gen4 and newer Intel GPUs.
January 09, 2017

Since beginning last year, our team at Igalia has been involved in enabling ARB_gpu_shader_fp64 extension to different Intel generations: first Broadwell and later, then Haswell, now IvyBridge (still under review); so working on supporting Vulkan’s Float64 capability into Mesa was natural.

Since beginning last year, our team at Igalia has been involved in enabling ARB_gpu_shader_fp64 extension to different Intel generations: first Broadwell and later, then Haswell, now IvyBridge (still under review); so working on supporting Vulkan’s Float64 capability into Mesa was natural.

January 07, 2017


I’ve been lately working on integrating ModemManager in OpenWRT, in order to provide a unique and consolidated way to configure and manage mobile broadband modems (2G, 3G, 4G, Iridium…), all working with netifd.

OpenWRT already has some support for a lot of the devices that ModemManager is able to manage (e.g. through the uqmi, umbim or wwan packages), but unlike the current solutions, ModemManager doesn’t require protocol-specific configurations or setups for the different devices; i.e. the configuration for a modem running in MBIM mode may be the same one as the configuration for a modem requiring AT commands and a PPP session.

Currently the OpenWRT package prepared is based on ModemManager git master, and therefore it supports: QMI modems (including the new MC74XX series which are raw-ip only and don’t support DMS UIM operations), MBIM modems, devices requiring QMI over MBIM operations (e.g. FCC auth), and of course generic AT+PPP based modems, Cinterion, Huawei (both AT+PPP and AT+NDISDUP), Icera, Haier, Linktop, Longcheer, Ericsson MBM, Motorola, Nokia, Novatel, Option (AT+PPP and HSO), Pantech, Samsung, Sierra Wireless (AT+PPP and DirectIP), Simtech, Telit, u-blox, Wavecom, ZTE… and even Iridium and Thuraya satellite modems. All with the same configuration.

Along with ModemManager itself, the OpenWRT feed also contains libqmi and libmbim, which provide the qmicli, mbimcli, and soon the qmi-firmware-update utilities. Note that you can also use these command line tools, even if ModemManager is running, via the qmi-proxy and mbim-proxy setups (i.e. just adding -p to the qmicli or mbimcli commands).

This is not the first time I’ve tried to do this; but this time I believe it is a much more complete setup and likely ready for others to play with it. You can jump to the modemmanager-openwrt bitbucket repository and follow the instructions to include it in your OpenWRT builds:

The following sections try to get into a bit more detail of which were the changes required to make all this work.

And of course, thanks to VeloCloud for sponsoring the development of the latest ModemManager features that made this integration possible 🙂

udev vs hotplug

One of the latest biggest features merged in ModemManager was the possibility to run without udev support; i.e. without automatically monitoring the device addition and removals happening in the system.

Instead of using udev, the mmcli command line tool ended up with a new --report-kernel-event that can be used to report the device addition and removals manually, e.g.:

$ mmcli --report-kernel-event="action=add,subsystem=tty,name=ttyUSB0"
$ mmcli --report-kernel-event="action=add,subsystem=net,name=wwan0"

This new way of notifying device events made it very easy to integrate the automatic device discovery supported in ModemManager directly via tty and net hotplug scripts (see mm_report_event()).

With the integration in the hotplug scripts, ModemManager will automatically detect and probe the different ports exposed by the broadband modem devices.

udev rules

ModemManager relies on udev rules for different things:

  • Blacklisting devices: E.g. we don’t want ModemManager to claim and probe the TTYs exposed by Arduinos or braille displays. The package includes a USB vid:pid based blacklist of devices that expose TTY ports and are not modems to be managed by ModemManager.
  • Blacklisting ports: There are cases where we don’t want the automatic logic selection to grab and use some specific modem ports, so the package also provides a much shorter list of ports blacklisted from actual modem devices. E.g. the QMI implementation in some ZTE devices is so poor that we decided to completely skip it and fallback to AT+PPP.
  • Greylisting USB serial adapters: The TTY ports exposed by USB serial adapters aren’t probed automatically, as we don’t know what’s connected in the serial side. If we want to have a serial modem, though, the mmcli --scan-modems operation may be executed, which will include the probing of these greylisted devices.
  • Specifying port type hints: Some devices expose multiple AT ports, but with different purposes. E.g. a modem may expose a port for AT control and another port for the actual PPP session, and choosing the wrong one will not work. ModemManager includes a list of port type hints so that the automatic selection of which port is for what purpose is done transparently.

As we’re not using udev when running in OpenWRT, ModemManager includes now a custom generic udev rules parser that uses sysfs properties to process and apply the rules.

procd based startup

The ModemManager daemon is setup to be started and controlled via procd. The init script controlling the startup will also take care of scheduling the re-play of the hotplug events that had earlier triggered --report-kernel-event actions (they’re cached in /tmp); e.g. to cope with events coming before the daemon started or to handle daemon restarts gracefully.


Well, no, I didn’t port ModemManager to use ubus 🙂 If you want to run ModemManager under OpenWRT you’ll also need to have the DBus daemon running.

netifd protocol handler

When using ModemManager, the user shouldn’t need to know the peculiarities of the modem being used: all modems and protocols (QMI, MBIM, Generic AT, vendor-specific AT…) are all managed via the same single DBus interfaces. All the modem control commands are internal to ModemManager, and the only additional considerations needed are related to how to setup the network interface once the modem is connected, e.g.:

  • PPP: some modems require a PPP session over a serial port.
  • Static: some modems require static IP configuration on a network interface.
  • DHCP: some modems require dynamic IP configuration on a network interface.

The OpenWRT package for ModemManager includes a custom protocol handler that enables the modemmanager protocol to be used when configuring network interfaces. This new protocol handler takes care of configuring and bringing up the interfaces as required when the modem gets into connected state.

Example configuration

The following snippet shows an example interface configuration to set in /etc/config/network.

 config interface 'broadband'
   option device '/sys/devices/platform/soc/20980000.usb/usb1/1-1/1-1.2/1-1.2.1'
   option proto 'modemmanager'
   option apn ''
   option username 'vodafone'
   option password 'vodafone'
   option pincode '7423'
   option lowpower '1'

The settings currently supported are the following ones:

  • device: The full sysfs path of the broadband modem device needs to be configured. Relying on the interface names exposed by the kernel is never a good idea, as these may change e.g. across reboots or when more than one modem device is available in the system.
  • proto: As said earlier, the new modemmanager protocol needs to be configured.
  • apn: If the connection requires an APN, the APN to use.
  • username: If the access point requires authentication, the username to use.
  • password: If the access point requires authentication, the password to use.
  • pincode: If the SIM card requires a PIN, the code to use to unlock it.
  • lowpower: If enabled, this setting will request the modem to go into low-power state (i.e. IMSI detach and RF off) when the interface is disconnected.

As you can see, the configuration can be used for any kind of modem device, regardless of what control protocol it uses, which interfaces are exposed, or how the connection is established. The settings are currently only IPv4 only, but adding IPv6 support shouldn’t be a big issue, patches welcome 🙂


The main purpose of using a mobile broadband modem is of course the connectivity itself, but it also may provide many more features. ModemManager provides specific interfaces and mmcli actions for the secondary features which are also available in the OpenWRT integration, including:

  • SMS messaging (both 3GPP and 3GPP2).
  • Location information (3GPP LAC/CID, CDMA Base station, GPS…).
  • Time information (as reported by the operator).
  • 3GPP USSD operations (e.g. to query prepaid balance to the operator).
  • Extended signal quality information (RSSI, Ec/Io, LTE RSRQ and RSRP…).
  • OMA device management operations (e.g. to activate CDMA devices).
  • Voice call control.

Worth noting that not all these features are available for all modem types (e.g. SMS messaging is available for most devices, but OMA DM is only supported in QMI based modems).


You can now have your 2G/3G/4G mobile broadband modems managed with ModemManager and netifd in your OpenWRT based system.

Filed under: Development, FreeDesktop Planet, GNOME Planet, Planets Tagged: libmbim, libqmi, ModemManager, openwrt
January 06, 2017

2017 starts with good news for Intel Haswell users: it has been a long time coming, but we have finally landed GL_ARB_gpu_shader_fp64 for this platform. Thanks to Matt Turner for reviewing the huge patch series!

Maybe you are not particularly excited about GL_ARB_gpu_shader_fp64, but that does not mean this is not an exciting milestone for you if you have a Haswell GPU (or even IvyBridge, read below): this extension was the last piece missing to finally bring Haswell to expose OpenGL 4.0!

If you want to give it a try but you don’t want to build the driver from the latest Mesa sources, don’t worry: the feature freeze for the Mesa 13.1 release is planned to happen in just a few days and the current plan is to have the release in early February, so if things go according to plan you won’t have to wait too long for an official release.

But that is not all, now that we have landed Fp64 we can also send for review the implementation of GL_ARB_vertex_attrib_64bit. This could be a very exciting milestone, since I believe this is the only thing missing for Haswell to have all the extensions required for OpenGL 4.5!

You might be wondering about IvyBridge too, and 2017 also starts with good news for IvyBridge users. Landing Fp64 for Haswell allowed us to send for review the IvyBridge patches we had queued up for GL_ARB_gpu_shader_fp64 which will bring IvyBridge up to OpenGL 4.0. But again, that is not all, once we land Fp64 we should also be able to send the patches for GL_vertex_attrib_64bit and get IvyBridge up to OpenGL 4.2, so look forward to this in the near future!

We have been working hard on Fp64 and Va64 during a good part of 2016, first for Broadwell and later platforms and later for Haswell and IvyBridge; it has been a lot of work so it is exciting to see all this getting to the last stages and on its way to the hands of the final users.

All this has only been possible thanks to Intel’s sponsoring and the great support and insight that our friends there have provided throughout the development and review processes, so big thanks to all of them and also to the team at Igalia that has been involved in the development with me.

January 03, 2017

This post describes the synclient tool, part of the xf86-input-synaptics package. It does not describe the various options, that's what the synclient(1) and synaptics(4) man pages are for. This post describes what synclient is, where it came from and how it works on a high level. Think of it as a anti-bus-factor post.

Maintenance status

The most important thing first: synclient is part of the synaptics X.Org driver which is in maintenance mode, and superseded by libinput and the xf86-input-libinput driver. In general, you should not be using synaptics anymore anyway, switch to libinput instead (and report bugs where the behaviour is not correct). It is unlikely that significant additional features will be added to synclient or synaptics and bugfixes are rare too.

The interface

synclient's interface is extremely simple: it's a list of key/value pairs that would all be set at the same time. For example, the following command sets two options, TapButton1 and TapButton2:

synclient TapButton1=1 TapButton2=2
The -l switch lists the current values in one big list:

$ synclient -l
Parameter settings:
LeftEdge = 1310
RightEdge = 4826
TopEdge = 2220
BottomEdge = 4636
FingerLow = 25
FingerHigh = 30
MaxTapTime = 180
The commandline interface is effectively a mapping of the various xorg.conf options. As said above, look at the synaptics(4) man page for details to each option.


A decade ago, the X server had no capabilities to change driver settings at runtime. Changing a device's configuration required rewriting an xorg.conf file and restarting the server. To avoid this, the synaptics X.Org touchpad driver exposed a shared memory (SHM) segment. Anyone with knowledge of the memory layout (an internal struct) and permission to write to that segment could change driver options at runtime. This is how synclient came to be, it was the tool that knew that memory layout. A synclient command would thus set the correct bits in the SHM segment and the driver would use the newly updated options. For obvious reasons, synclient and synaptics had to be the same version to work.

Atoms are 32-bit unsigned integers and created for each property name at runtime. They represent a unique string (the property name) and can be created by applications too. Property name to Atom mappings are global. Once any driver initialises a property by its name (e.g. "Synaptics Tap Actions"), that property and the corresponding Atom will exist globally until the server resets. Atoms unknown to a driver are simply ignored.

8 or so years ago, the X server got support for input device properties, a generic key/value store attached to each input device. The keys are the properties, identified by an "Atom" (see box on the side). The values are driver-specific. All drivers make use of this now, being able to change a property at runtime is the result of changing a property that the driver knows of.

synclient was converted to use properties instead of the SHM segment and eventually the SHM support was removed from both synclient and the driver itself. The backend to synclient is thus identical to the one used by the xinput tool or tools used by other drivers (e.g. the xsetwacom tool). synclient's killer feature was that it was the only tool that knew how to configure the driver, these days it's merely a commandline argument to property mapping tool. xinput, GNOME, KDE, they all do the same thing in the backend.

How synclient works

The driver has properties of a specific name, format and value range. For example, the "Synaptics Tap Action" property contains 7 8-bit values, each representing a button mapping for a specific tap action. If you change the fifth value of that property, you change the button mapping for a single-finger tap. Another property "Synaptics Off" is a single 8-bit value with an allowed range of 0, 1 or 2. The properties are described in the synaptics(4) man page. There is no functional difference between this synclient command:

synclient SynapticsOff=1
and this xinput command

xinput set-prop "SynPS/2 Synaptics TouchPad" "Synaptics Off" 1
Both set the same property with the same calls. synclient uses XI 1.x's XChangeDeviceProperty() and xinput uses XI 2.x's XIChangeProperty() if available but that doesn't really matter. They both fetch the property, overwrite the respective value and send it back to the server.

Pitfalls and quirks

synclient is a simple tool. If multiple touchpads are present it will simply pick the first one. This is a common issue for users with a i2c touchpad and will be even more common once the RMI4/SMBus support is in a released kernel. In both cases, the kernel creates the i2c/SMBus device and an additional PS/2 touchpad device that never sends events. So if synclient picks that device, all the settings are changed on a device that doesn't actually send events. This depends on the order the devices were added to the X server and can vary between reboots. You can work around that by disabling or ignoring the PS/2 device.

synclient is a one-shot tool, it does not monitor devices. If a device is added at runtime, the user must run the command to change settings. If a device is disabled and re-enabled (VT-switch, suspend/resume, ...), the user must run synclient to change settings. This is a major reason we recommend against using synclient, the desktop environment should take care of this. synclient will also conflict with the desktop environment in that it isn't aware when something else changes things. If synclient runs before the DE's init scripts (e.g. through xinitrc), its settings may be overwritten by the DE. If it runs later, it overwrites the DE's settings.

synclient exclusively supports synaptics driver properties. It cannot change any other driver's properties and it cannot change the properties created by the X server on each device. That's another reason we recommend against it, because you have to mix multiple tools to configure all devices instead of using e.g. the xinput tool for all property changes. Or, as above, letting the desktop environment take care of it.

The interface of synclient is IMO not significantly more obvious than setting the input properties directly. One has to look up what TapButton1 does anyway, so looking up how to set the property with the more generic xinput is the same amount of effort. A wrong value won't give the user anything more useful than the equivalent of a "this didn't work".


If you're TL;DR'ing an article labelled "the definitive guide to" you're kinda missing the point...

January 02, 2017
The 3DMMES test has been thoroughly instruction count limited, so any wins we can get on code generation translate pretty directly into performance gains.  Last week I decided to work on fixing up Jonas's patch to schedule instructions in the delay slots of thread switching, which can save us 3 instructions per texture sample.

Thread switching, you may recall, is a trick in the fragment shader to hide texture fetch latency by cooperatively switching to another fragment shader instance after you request a texture fetch, so that it can make some progress when you'd probably be blocked anyway.  This lets us better occupy the ALUs, at the cost of each shader needing to fit into half of the physical register file.

However, VC4 doesn't cancel instructions in the pipeline when we request a switch (same as for within-shader branching), so 3 more instructions from the current shader get executed.  For my first implementation of thread switching, I just dumped in 3 NOPs after each THRSW to fill the delay slots.  This was still a massive win over not threading, so it was good enough.

Jonas's patch tries to fill the delay slots after we schedule in the thread switch by trying to extract the THRSW signal off of the instruction we scheduled and move it up to 2 instructions before that, and then only add enough NOPs to get us 3 slots filled.  There was a little bug (it re-scheduled the thrsw instruction instead of a NOP in trying to pad out to delay slots), but it basically worked and got us a 1.55% instruction count win on shader-db.

The problem was that he was scheduling ALU operations along with the thrsw, and if the thrsw signal was alone without an ALU operation in it, after moving the thrsw up we'd have a NOP left in the old location.  I wrote a followon patch to fix that: We now only schedule thrsws on their own without ALU operations, insert the THRSW as early as we can, and then add NOPs as necessary to fill remaining delay slots.  This was another 0.41% instruction count win.

This isn't as good as it could be.  Maybe we don't fill the slots as well as we could before choosing to schedule thrsw, but instructions we choose to schedule after that could be fit in.  Those would be tricky because we have to check that they don't write flags or the accumulators (which wouldn't be preserved across the thrsw) or new texture coordinates.  We also don't put the THRSW at any particular point in the timeline between sampler request and results collection.  We might be able to get wins by trying to put thrsw at the lowest-register-pressure point between them, so that fewer things need to be forced into physical regs instead of accumulators.

Those will be projects for later.  It's probably much more important that we figure out how to schedule 2-4 samples with a single thrsw, instead of doing a thrsw per sample like we do today.

The other project last week was starting to build up a plan for vc4 CI.  We've had a few regressions to vc4 in Mesa master because developers (including me!) don't do testing on vc4 hardware on every commit.  The Intel folks have a lovely CI system that does piglit, DEQP, and performance regression testing for them, both tracking Mesa master and doing pre-commit tests of voluntarily submitted branches.  I despaired of the work needed to build something that good, but apparently they've open sourced their configuration, so I'll be trying to replicate that in the next few weeks.  This week I worked on getting the hardware and a plan for where to host it.  I'm looking forward to a bright future of 30-minute test runs on the actual hardware and long-term tracking of performance metrics.  Unfortunately, there are no docs to it and I've never worked with jenkins before, so this is going to be a challenge.

Other things: I submitted a patch to mm/ to shut up the CMA warning we've been suffering from (and patching away downstream) for years, got exactly the expected response ("this dmesg spew in an common path might be useful for somebody debugging something some day, so it should stay there"), so hopefully we can just get Fedora and Debian to patch it out instead.  This is yet another data point in favor of Matthew Garrett's plan of "write kernel patches you need, don't bother submitting them upstream". Started testing a new patch to reduce error return rate on vc4 memory allocatoins on upstream kernel, haven't confirmed that it's working yet.  I also spent a lot of time reviewing tarceri's patches to Mesa preparing for the on-disk shader cache, which should help improve app startup times and reduce memory consumption.

Packaging Python has been a painful experience for long. The history of the various distribution that Python offered along the years is really bumpy, and both the user and developer experience has been pretty bad.

Fortunately, things improved a lot in the recent years, with the reconciliation of setuptools and distribute.

Though in the context of the OpenStack project, a solution on top of setuptools has been already started a while back. Its usage is now spread across a whole range of software and libraries.

This project is called pbr, for Python Build Reasonableness. Don't be afraid by the OpenStack colored themed of the documentation – it is a bad habit of OpenStack folks to not advertise their tooling in an agnostic fashion. The tool has no dependency with the cloud platform, and can be used painlessly with any package.

How it works

pbr takes inspiration from distutils2 (a now abandoned project) and uses a setup.cfg file to describe the packager's intents. This is how a using pbr looks like:

import setuptools
setuptools.setup(setup_requires=['pbr'], pbr=True)

Two lines of code – it's that simple. The actual metadata that the setup requires is stored in setup.cfg:

name = foobar
author = Dave Null
author-email =
summary = Package doing nifty stuff
license = MIT
description-file =
home-page =
requires-python = >=2.6
classifier =
Development Status :: 4 - Beta
Environment :: Console
Intended Audience :: Developers
Intended Audience :: Information Technology
License :: OSI Approved :: Apache Software License
Operating System :: OS Independent
Programming Language :: Python
packages =

This syntax is way easier to write and read than the standard

pbr also offers other features such as:

  • automatic dependency installation based on requirements.txt
  • automatic documentation building and generation using Sphinx
  • automatic generation of AUTHORS and ChangeLog files based on git history
  • automatic creation of the list of files to include using git
  • version management based on git tags

All of this comes with little to no effort on your part.

Using flavors

One of the feature that I use a lot, is the definition of flavors. It's not tied particularly to pbr – it's actually provided by setuptools and pip themselves – but pbr setup.cfg file makes it easy to use.

When distributing a software, it's common to have different drivers for it. For example, your project could support both PostgreSQL or MySQL – but nobody is going to use both at the same time. The usual trick to make it work is to add the needed library to the requirements list (e.g. requirements.txt). The upside is that the software will work directly with either RDBMS, but the downside is that this will install both libraries, whereas only one is needed. Using flavors, you can specify different scenarios:

postgresql =
mysql =

When installing your package, the user can then just pick the right flavor by using pip to install the package:

$ pip install foobar[postgresql]

This will install foobar, all its dependencies listed in requirements.txt, plus whatever dependencies are listed in the [extras] section of setup.cfg matching the flavor. You can also combine several flavors, e.g.:

$ pip install foobar[postgresql,mysql]

would install both flavors.

pbr is well-maintained and in very active development, so if you have any plans to distribute your software, you should seriously consider including pbr in those plans.

December 23, 2016
Yesterday Valve gave me a copy of DOOM for Christmas (not really for Christmas), and I got the wine bits in place from Fedora, then I spent today trying to get DOOM to render on radv.

Thanks to ParkerR on #radeon for taking the picture from his machine, I'm too lazy.

So it runs kinda, it hangs the GPU a fair bit, it misrenders some colors in some scenes, but you can see most of it. I'm not sure if I'll get back to this before next year (I'll try), but I'm pretty happy to have gotten it this far in a day, though I'm sure the next few things will me much more difficult to debug.

The branch is here:
December 22, 2016
Yesterday I gave a short talk about the Chamelium board from the ChromeOS team, and thought that the slides could be useful for others as this board gets used more and more outside of Google.

If you are interested in how this board can help you automate the testing of your display (and not only!) code and hardware, a new mailing list has been created to discuss its uses. We at Collabora will be happy to help you integrate this board in your CI lab as well.

Thanks go to Intel for sponsoring the preparation of these slides and for allowing me to share them under an open license.

And of course, thanks to Google's ChromeOS team for releasing the hardware design with an open hardware license along with the code they are running on it and with it.

The fancy new Sphinx-based documentation has landed a while ago in upstream. Jani Nikula has written a nice overview on LWN (part 2). And it is getting used a lot. But judging by how often I type it in replies on the mailing list what’s missing is a super-short howto. To build the documentation, run:

$ make DOCBOOKS="" htmldocs

The output can then be found in Documentation/output/. When typing documentation please always check that your new text does get rendered. The output also contains documentation about kernel-doc and the toolchain itself. Since the build is incremental it is recommended that you first run it before touching anything. That way you’ll only see warnings in areas you’ve touched, not all of them - the build is unfortunately somewhat noisy.

December 20, 2016
Two big successes last week.

One is that Dave Airlie has pulled Boris's VEC (STDV) code for 4.10.  We didn't quite make it for getting the DT changes in time to have full support in 4.10, but the DT changes will be a lot easier to backport than the driver code.

The other is that I finally got the DSI panel working, after 9 months of frustrating development.  It turns out that my DSI transactions to the Toshiba chip aren't working, but if I use I2C to the undocumented Atmel microcontroller to relay to the Toshiba in one of the 4 possible orderings of the register sequence, the panel comes up.  The Toshiba docs say I need to do I2C writes at the beginning of the poweron sequence, but the closed firmware uses DSI transactions for the whole sequence, making me suspicious that the Atmel has already done some of the Toshiba register setup.

I've now submitted the DSI driver, panel, and clock patches after some massive cleanup.  It even comes with proper runtime power management for both the panel and the VC4 DSI module.

The next steps in modesetting land will be to write an input driver for the panel, do any reworks we need from review, and get those chunks merged in time for 4.11.  While I'm working on this, Boris is now looking at my HDMI audio code so that hopefully we can get that working as well.  Eben is going to poke Dom about fixing the VC4driver interop with media decode.  I feel like we're now making rapid progress toward feature parity on the modesetting side of the VC4 driver.
December 19, 2016

This is a common source of confusion: the legacy X.Org driver for touchpads is called xf86-input-synaptics but it is not a driver written by Synaptics, Inc. (the company).

The repository goes back to 2002 and for the first couple of years it Peter Osterlund was the sole contributor. Back then it was called "synaptics" and really was a "synaptics device" driver, i.e. it handled PS/2 protocol requests to initialise Synaptics, Inc. touchpads. Evdev support was added in 2003, punting the initialisation work to the kernel instead. This was the groundwork for a generic touchpad driver. In 2008 the driver was renamed to xf86-input-synaptics and relicensed from GPL to MIT to take it under the X.Org umbrella. I've been involved with it since 2008 and the official maintainer since 2011.

For many years now, the driver has been a generic touchpad driver that handles any device that the Linux kernel can handle. In fact, most bugs attributed to the synaptics driver not finding the touchpad are caused by the kernel not initialising the touchpad correctly. The synaptics driver reads the same evdev events that are also handled by libinput and the xf86-input-evdev driver, any differences in behaviour are driver-specific and not related to the hardware. The driver handles devices from Synaptics, Inc., ALPS, Elantech, Cypress, Apple and even some Wacom touch tablets. We don't care about what touchpad it is as long as the evdev events are sane.

Synaptics, Inc.'s developers are active in kernel development to help get new touchpads up and running. Once the kernel handles them, the xorg drivers and libinput will handle them too. I can't remember any significant contribution by Synaptics, Inc. to the synaptics driver, so they are simply neither to credit nor to blame for the current state of the driver. The top 10 contributors since August 2008 when the first renamed version of xf86-input-synaptics was released are:

8 Simon Thum
10 Hans de Goede
10 Magnus Kessler
13 Alexandr Shadchin
15 Christoph Brill
18 Daniel Stone
18 Henrik Rydberg
39 Gaetan Nadon
50 Chase Douglas
396 Peter Hutterer
There's a long tail of other contributors but the top ten illustrate that it wasn't Synaptics, Inc. that wrote the driver. Any complaints about Synaptics, Inc. not maintaining/writing/fixing the driver are missing the point, because this driver was never a Synaptics, Inc. driver. That's not a criticism of Synaptics, Inc. btw, that's just how things are. We should have renamed the driver to just xf86-input-touchpad back in 2008 but that ship has sailed now. And synaptics is about to be superseded by libinput anyway, so it's simply not worth the effort now.

The other reason I included the commit count in the above: I'm also the main author of libinput. So "the synaptics developers" and "the libinput developers" are effectively the same person, i.e. me. Keep that in mind when you read random comments on the interwebs, it makes it easier to identify people just talking out of their behind.

A long-standing criticism of libinput is its touchpad acceleration code, oscillating somewhere between "terrible", "this is bad and you should feel bad" and "I can't complain because I keep missing the bloody send button". I finally found the time and some more laptops to sit down and figure out what's going on.

I recorded touch sequences of the following movements:

  • super-slow: a very slow movement as you would do when pixel-precision is required. I recorded this by effectively slowly rolling my finger. This is an unusual but sometimes required interaction.
  • slow: a slow movement as you would do when you need to hit a target several pixels across from a short distance away, e.g. the Firefox tab close button
  • medium: a medium-speed movement though probably closer to the slow side. This would be similar to the movement when you move 5cm across the screen.
  • medium-fast: a medium-to-fast speed movement. This would be similar to the movement when you move 5cm across the screen onto a large target, e.g. when moving between icons in the file manager.
  • fast: a fast movement. This would be similar to the movement when you move between windows some distance apart.
  • flick: a flick movement. This would be similar to the movement when you move to a corner of the screen.
Note that all these are by definition subjective and somewhat dependent on the hardware. Either way, I tried to get something of a reasonable subset.

Next, I ran this through a libinput 1.5.3 augmented with printfs in the pointer acceleration code and a script to post-process that output. Unfortunately, libinput's pointer acceleration internally uses units equivalent to a 1000dpi mouse and that's not something easy to understand. Either way, the numbers themselves don't matter too much for analysis right now and I've now switched everything to mm/s anyway.

A note ahead: the analysis relies on libinput recording an evemu replay. That relies on uinput and event timestamps are subject to a little bit of drift across recordings. Some differences in the before/after of the same recording can likely be blamed on that.

The graph I'll present for each recording is relatively simple, it shows the velocity and the matching factor.The x axis is simply the events in sequence, the y axes are the factor and the velocity (note: two different scales in one graph). And it colours in the bits that see some type of acceleration. Green means "maximum factor applied", yellow means "decelerated". The purple "adaptive" means per-velocity acceleration is applied. Anything that remains white is used as-is (aside from the constant deceleration). This isn't really different to the first graph, it just shows roughly the same data in different colours.

Interesting numbers for the factor are 0.4 and 0.8. We have a constant acceleration of 0.4 on touchpads, i.e. a factor of 0.4 "don't apply acceleration", the latter is "maximum factor". The maximum factor is twice as big as the normal factor, so the pointer moves twice as fast. Anything below 0.4 means we decelerate the pointer, i.e. the pointer moves slower than the finger.

The super-slow movement shows that the factor is, aside from the beginning always below 0.4, i.e. the sequence sees deceleration applied. The takeaway here is that acceleration appears to be doing the right thing, slow motion is decelerated and while there may or may not be some tweaking to do, there is no smoking gun.

Super slow motion is decelerated.

The slow movement shows that the factor is almost always 0.4, aside from a few extremely slow events. This indicates that for the slow speed, the pointer movement maps exactly to the finger movement save for our constant deceleration. As above, there is no indicator that we're doing something seriously wrong.

Slow motion is largely used as-is with a few decelerations.

The medium movement gets interesting. If we look at the factor applied, it changes wildly with the velocity across the whole range between 0.4 and the maximum 0.8. There is a short spike at the beginning where it maxes out but the rest is accelerated on-demand, i.e. different finger speeds will produce different acceleration. This shows the crux of what a lot of users have been complaining about - what is a fairly slow motion still results in an accelerated pointer. And because the acceleration changes with the speed the pointer behaviour is unpredictable.

In medium-speed motion acceleration changes with the speed and even maxes out.

The medium-fast movement shows almost the whole movement maxing out on the maximum acceleration factor, i.e. the pointer moves at twice the speed to the finger. This is a problem because this is roughly the speed you'd use to hit a "mentally preselected" target, i.e. you know exactly where the pointer should end up and you're just intuitively moving it there. If the pointer moves twice as fast, you're going to overshoot and indeed that's what I've observed during the touchpad tap analysis userstudy.

Medium-fast motion easily maxes out on acceleration.

The fast movement shows basically the same thing, almost the whole sequence maxes out on the acceleration factor so the pointer will move twice as far as intuitively guessed.

Fast motion maxes out acceleration.

So does the flick movement, but in that case we want it to go as far as possible and note that the speeds between fast and flick are virtually identical here. I'm not sure if that's me just being equally fast or the touchpad not quite picking up on the short motion.

Flick motion also maxes out acceleration.

Either way, the takeaway is simple: we accelerate too soon and there's a fairly narrow window where we have adaptive acceleration, it's very easy to top out. The simplest fix to get most touchpad movements working well is to increase the current threshold on when acceleration applies. Beyond that it's a bit harder to quantify, but a good idea seems to be to stretch out the acceleration function so that the factor changes at a slower rate as the velocity increases. And up the acceleration factor so we don't top out and we keep going as the finger goes faster. This would be the intuitive expectation since it resembles physics (more or less).

There's a set of patches on the list now that does exactly that. So let's see what the result of this is. Note ahead: I also switched everything from mm/s which causes some numbers to shift slightly.

The super-slow motion is largely unchanged though the velocity scale changes quite a bit. Part of that is that the new code has a different unit which, on my T440s, isn't exactly 1000dpi. So the numbers shift and the result of that is that deceleration applies a bit more often than before.

Super-slow motion largely remains the same.

The slow motions are largely unchanged but more deceleration is now applied. Tbh, I'm not sure if that's an artefact of the evemu replay, the new accel code or the result of the not-quite-1000dpi of my touchpad.

Slow motion largely remains the same.

The medium motion is the first interesting one because that's where we had the first observable issues. In the new code, the motion is almost entirely unaccelerated, i.e. the pointer will move as the finger does. Success!

Medium-speed motion now matches the finger speed.

The same is true of the medium-fast motion. In the recording the first few events were past the new thresholds so some acceleration is applied, the rest of the motion matches finger motion.

Medium-fast motion now matches the finger speed except at the beginning where some acceleration was applied.

The fast and flick motion are largely identical in having the acceleration factor applied to almost the whole motion but the big change is that the factor now goes up to 2.3 for the fast motion and 2.5 for the flick motion, i.e. both movements would go a lot faster than before. In the graphics below you still see the blue area marked as "previously max acceleration factor" though it does not actually max out in either recording now.

Fast motion increases acceleration as speed increases.

Flick motion increases acceleration as speed increases.

In summary, what this means is that the new code accelerates later but when it does accelerate, it goes faster. I tested this on a T440s, a T450p and an Asus VivoBook with an Elantech touchpad (which is almost unusable with current libinput). They don't quite feel the same yet and I'm not happy with the actual acceleration, but for 90% of 'normal' movements the touchpad now behaves very well. So at least we go from "this is terrible" to "this needs tweaking". I'll go check if there's any champagne left.

December 15, 2016
We're about a week before Christmas, and I'm going to explain how I created a retro keyboard as a gift to my father, who introduced me to computers when he brought back a Thomson TO7 home, all the way back in 1985.

The original idea was to use a Thomson computer to fit in a smaller computer, such as a CHIP or Raspberry Pi, but the software update support would have been difficult, the use limited to the builtin programs, and it would have required a separate screen. So I restricted myself to only making a keyboard. It was a big enough task, as we'll see.

How do keyboards work?

Loads of switches, that's how. I'll point you to Michał Trybus' blog post « How to make a keyboard - the matrix » for details on this works. You'll just need to remember that most of the keyboards present in those older computers have no support for xKRO, and that the micro-controller we'll be using already has the necessary pull-up resistors builtin.

The keyboard hardware

I chose the smallest Thomson computer available for my project, the MO5. I could have used a stand-alone keyboard, but would have lost all the charm of it (it just looks like a PC keyboard), some other computers have much bigger form factors, to include cartridge, cassette or floppy disk readers.

The DCMoto emulator's website includes tons of documentation, including technical documentation explaining the inner workings of each one of the chipsets on the mainboard. In one of those manuals, you'll find this page:

Whoot! The keyboard matrix in details, no need for us to discover it with a multimeter.

That needs a wash in soapy water

After opening up the computer, and eventually giving the internals, and the keyboard especially if it has mechanical keys, a good clean, we'll need to see how the keyboard is connected.

Finicky metal covered plastic

Those keyboards usually are membrane keyboards, with pressure pads, so we'll need to either find replacement connectors at our local electronics store, or desolder the ones on the motherboard. I chose the latter option.

Desoldered connectors

After matching the physical connectors to the rows and columns in the matrix, using a multimeter and a few key presses, we now know which connector pin corresponds to which connector on the matrix. We can start soldering.

The micro-controller

The micro-controller in my case is a Teensy 2.0, an Atmel AVR-based micro-controller with a very useful firmware that makes it very very difficult to brick. You can either press the little button on the board itself to upload new firmware, or wire it to an external momentary switch. The funny thing is that the Atmega32U4 is 16 times faster than the original CPU (yeah, we're getting old).

I chose to wire it to the "Initial. Prog" ("Reset") button on the keyboard, so as to make it easy to upload new firmware. To do this, I needed to cut a few traces coming out of the physical switch on the board, to avoid interferences from components on the board, using a tile cutter. This is completely optional, and if you're only going to use firmware that you already know at least somewhat works, you can set a key combo to go into firmware upload mode in the firmware. We'll get back to that later.

As far as connecting and soldering to the pins, we can use any I/O pins we want, except D6, which is connected to the board's LED. Note that any deviation from the pinout used in your firmware, you'd need to make changes to it. We'll come back to that again in a minute.

The soldering

Colorful tinning

I wanted to keep the external ports full, so it didn't look like there were holes in the case, but there was enough headroom inside the case to fit the original board, the teensy and pins on the board. That makes it easy to rewire in case of error. You could also dremel (yes, used as a verb) a hole in the board.

As always, make sure early that things would fit, especially the cables!

The unnecessary pollution

The firmware

Fairly early on during my research, I found the TMK keyboard firmware, as well as very well written forum post with detailed explanations on how to modify an existing firmware for your own uses.

This is what I used to modify the firmware for the gh60 keyboard for my own use. You can see here a step-by-step example, implementing the modifications in the same order as the forum post.

Once you've followed the steps, you'll need to compile the firmware. Fedora ships with the necessary packages, so it's a simple:

sudo dnf install -y avr-libc avr-binutils avr-gcc

I also compiled and installed in my $PATH the teensy_cli firmware uploader, and fixed up the udev rules. And after a "make teensy" and a button press...

It worked first time! This is a good time to verify that all the keys work, and you don't see doubled-up letters because of short circuits in your setup. I had 2 wires touching, and one column that just didn't work.

I also prepared a stand-alone repository, with a firmware that uses the tmk_core from the tmk firmware, instead of modifying an existing one.

Some advices

This isn't the first time I hacked on hardware, but I'll repeat some old adages, and advices, because I rarely heed those warnings, and I regret...
  • Don't forget the size, length and non-flexibility of cables in your design
  • Plan ahead when you're going to cut or otherwise modify hardware, because you might regret it later
  • Use breadboard cables and pins to connect things, if you have the room
  • Don't hotglue until you've tested and retested and are sure you're not going to make more modifications
That last one explains the slightly funny cabling of my keyboard.

Finishing touches

All Sugru'ed up

To finish things off nicely, I used Sugru to stick the USB cable, out of the machine, in place. And as earlier, it will avoid having an opening onto the internals.

There are a couple more things that I'll need to finish up before delivery. First, the keymap I have chosen in the firmware only works when a US keymap is selected. I'll need to make a keymap for Linux, possibly hard-coding it. I will also need to create a Windows keymap for my father to use (yep, genealogy software on Linux isn't quite up-to-par).

Prototype and final hardware

All this will happen in the aforementioned repository. And if you ever make your own keyboard, I'm happy to merge in changes to this repository with documentation for your Speccy, C64, or Amstrad CPC hacks.

(If somebody wants to buy me a Sega keyboard, I'll gladly work on a non-destructive adapter. Get in touch :)
December 14, 2016

The collective internet troll fest had it’s fun recently discussing AMD’s DAL. Hacker news discussed the rejection and some of the reactions, reddit had some fun and of course everyone on phoronix forums was going totally nuts. Luckily reason seems to finally prevail with LWN covering things too. I don’t want to spill more bits over the topic itself (read the LWN coverage and mailing list threads for that), but I think it’s worth looking at the fundamental underlying problem a bit more.

Discussing midlayers seems to be one of the recuring topics in the linux kernel. There’s the original midlayer-mistake article from Neil Brown that seems to have started it all. But LWN gained more articles in the years since, covering the iscsi driver as a study in avoiding OS abstraction layers, or a similar case in wireless with the Broadcom driver. The dismissal of midlayers and hailing of helper libraries has become so prevalent that calling your shiny new subsystem libfoo (viz. libnvdimm) seems to be a powerful trick to speed up its acceptance.

It seems common knowledge and accepted fact, but still there’s a constant stream of drivers that come packaged with huge abstraction layers - mostly to abstract OS internals, but very often also just a few midlayers between different components within the driver. A major reason for this is certain that submissions by former proprietary teams, or just generally code developed internally behind closed doors is suffering from the platform problem - again LWN has you covered. If your driver is not open and part of upstream (or even for an open source OS), then the provided services and interfaces are fixed. And there’s no way to improve things, worse, when change does happen the driver team generally doesn’t have any influence at all over what and how things changes. Hence it makes sense to insert a big abstraction layer to isolate the driver from the outside madness.

But that does not explain why big drivers (and more so, subsystems) come with some nice abstraction layers wedged in-between different parts. I believe, with not any proof really, that this is because company planners are extremely risk averse: Sure, working together and sharing code across the entire project has long-term benefits. But the more people are involved the bigger the risk for a bikeshed fest or some other delay, and you never want to be the one team that delayed a release by a few months. Adding isolation in the form of lots of fixed abstraction layers helps with that. But long term, and for really big projects, sharing code and working together has clear benefits, even if there’s routinely a hiccup - the neck-breaking speed of Linux kernel development overall is testament enough for that I think.

All that is just technicalities really, because in the end upstream and open source is about collaboratively developing software. That requires shared control, and interfaces between different components need to be a lot more permeable. In my experience that core idea of handing control over development to outsiders is the really scary part of joining upstream, since followed through to its conclusions it means you need to completely rethink how products are planned and developed: The entire organisation, from individual developers, to teams and including the management chain have to change. And that freaks people out.

In summary I think code submissions with lots of midlayers are bound to stay with us, and that’s good: Because new midlayers means new teams and companies start to take upstream seriously. And new midlayers getting cleaned up means new teams and new companies are going to the painful changes necessary to adjust to an upstream first model. The code itself is just the canary for the real shifts happening.

In other news: World domination is still progressing according to plan.

December 12, 2016
As blogged about already by Christian for Fedora 25 we've been working on improving hybrid gfx support, as well as on making it easier for users who want to, to install the NVidia binary driver.

The improved hybrid gfx support using the default opensource drivers was ready in time for and is part of the Fedora 25 release. Unfortunately the NVidia driver work was not ready in time. This has lead to some confusion.

Let me start with a FAQ to try and clarify things:

1)  I want to use Fedora 25 on a laptop with hybrid graphics, will this work?

Yes Fedora 25 supports hybrid graphics out of the box using the opensource drivers included with Fedora. A lot of work has been done to make this as smooth as possible. If you encounter any problems with hybrid graphics under Fedora 25 using the default opensource drivers, please file a bug in bugzilla and put me in the Cc.

2) I want to use the NVidia driver with Fedora 25, I heard it would be available in gnome-software?

It is our intend to make the NVidia driver available in gnome-software (for users who have 3th party sources enabled), but this is not ready yet, I hope to announce some progress on this on this blog soon. see below for a progress report on this.

3) I've an NVidia GPU which driver is better for me?

If you want maximum performance for e.g. gaming, you will likely want the NVidia driver. But be aware that on Optimus laptops using the NVidia driver will result in much higher power consumption / shorter battery life, as all rendering then is done on the NVidia GPU, which means that it cannot be powered down while your doing light work.

If your workload does not need maximum 3D rendering performance your likely better of with the default opensource drivers. On Optimus laptops these offer much better batery life and the opensource drivers do not have the risk of breaking when upgrading your kernel, or upgrading to the next Fedora release.

4) I want to have good battery life, but I also want to play the occasional game with good performance?

As part of the work on making it easy to use the NVidia driver with Fedora, I'm also planning to add a utility which will allow you to easily switch between nouveau and the NVidia driver. Note this will require a reboot each time you switch.

5) What is the status of making the NVidia driver available for easy installation in Fedora 25?

This biggest issue with the Nvidia driver is that it will not work out of the box on optimus enabled laptops, installing it on such a laptop will typically result in the laptop booting to a black screen, which is NOT the user experience we want to offer.

I've just finished a set of patches for the xserver, which will allow the NVidia driver rpm to install a xorg.conf snippet which will make the xorg autoconfigure code automatically set things up the right way for the nvidia driver on optimus enabled laptops.

Depending on the upstream review of these patches I will prepare an updated xserver package with these patches for Fedora 25 soon. Once that is in place, the rpms can be updated with the xorg.conf snippet.

When that is done, there still are some other issues to tackle:

  • Mesa in F25 is not yet glvnd dispatch enabled, which means that the rpms need to play ldconfig path tricks, which means that if the rpms get installed on a system with an unsupported GPU, and Xorg falls back to the open-source driver things go boom because apps end up loading the wrong

  • We want to have some mechanism in place to automatically load the nouveau kernel-module before starting gdm if the nvidia kernel module did not load for some reason (e.g. it failed to compile against a new kernel)

  • The rpms currently use dkms or akmod, building the nvidia kernel module from source on your system, we want to switch things over to having pre-built kmods available

Note not all of these are necessarily blockers for adding the Nvidia driver
to gnome-software.

6) I want to try out the nvidia driver now, is that possible ?

First of all, if you've an optimus enabled laptop, please do not do this (yet), we should have a solution ready for you in a couple of weeks.

If you have a system which is only using a *single* nvidia GPU, then you can install the nvidia driver using the following commands:

  sudo dnf update 'kernel*'
  sudo dnf config-manager --add-repo=
  sudo dnf install nvidia-settings kernel-devel dkms-nvidia vulkan.i686 nvidia-driver-libs.i686

Then reboot and you should be running the nvidia driver, to get back to the opensource driver do:

  sudo dnf remove nvidia-driver

A short while ago, I asked a bunch of people for long-term touchpad usage data (specifically: evemu recordings). I currently have 25 sets of data, the shortest of which has 9422 events, the longest of which has 987746 events. I requested that evemu-record was to be run in the background while people use their touchpad normally. Thus the data is quite messy, it contains taps, two-finger scrolling, edge scrolling, palm touches, etc. It's also raw data from the touchpad, not processed by libinput. Some care has to be taken with analysis, especially since it is weighted towards long recordings. In other words, the user with 987k events has a higher influence than the user with 9k events. So the data is useful for looking for patterns that can be independently verified with other data later. But it's also useful for disproving hypothesis, i.e. "we cannot do $foo because some users' events show $bla".

One of the things I've looked into was tapping. In libinput, a tap has two properties: a time threshold and a movement threshold. If the finger is held down longer than 180ms or it moves more than 3mm it is not a tap. These numbers are either taken from synaptics or just guesswork (both, probably). The need for a time-based threshold is obvious: we don't know whether the user is tapping until we see the finger up event. Only if that doesn't happen within a given time we know the user simply put the finger down. The movement threshold is required because small movements occur while tapping, caused by the finger really moving (e.g. when tapping shortly before/after a pointer motion) or by the finger center moving (as the finger flattens under pressure, the center may move a bit). Either way, these thresholds delay real pointer movement, making the pointer less reactive than it could be. So it's in our interest to have these thresholds low to get reactive pointer movement but as high as necessary to have reliable tap detection.

General data analysis

Let's look at the (messy) data. I wrote a script to calculate the time delta and movement distance for every single-touch sequence, i.e. anything with two or more fingers down was ignored. The script used a range of 250ms and 6mm of movement, discarding any sequences outside those thresholds. I also ignored anything in the left-most or right-most 10% because it's likely that anything that looks like a tap is a palm interaction [1]. I ran the script against those files where the users reported that they use tapping (10 users) which gave me 6800 tap sequences. Note that the ranges are purposely larger than libinput's to detect if there was a significant amount of attempted taps that exceed the current thresholds and would be misdetected as non-taps.

Let's have a look at the results. First, a simple picture that merely prints the start location of each tap, normalised to the width/height of the touchpad. As you can see, taps are primarily clustered around the center but can really occur anywhere on the touchpad. This means any attempt at detecting taps by location would be unreliable.

Normalized distribution of touch sequence start points (relative to touchpad width/height)

You can easily see the empty areas in the left-most and right-most 10%, that is an artefact of the filtering.

The analysis of time is more interesting: There are spikes around the 50ms mark with quite a few outliers going towards 100ms forming what looks like a narrow normal distribution curve. The data points are overlaid with markers for the mean [2], the 50 percentile, the 90 percentile and the 95 percentile [3]. And the data says: 95% of events fall below 116ms. That's something to go on.

Times between touch down and touch up for a possible tap event.
Note that we're using a 250ms timeout here and thus even look at touches that would not have been detected as tap by libinput. If we reduce to the 180ms libinput uses, we get a 95% percentile of 98ms, i.e. "of all taps currently detected as taps, 95% are 98ms or shorter".

The analysis of distance is similar: Most of the tap sequences have little to no movement, with 50% falling below 0.2mm of movement. Again the data points are overlaid with markers for the mean, the 50 percentile, the 90 percentile and the 95 percentile. And the data says: 95% of events fall below 1.8mm. Again, something to go on.

Movement between the touch down and the touch up event for a possible tap (10 == 1mm)
Note that we're using a 6mm threshold here and thus even look at touches that would not have been detected as tap by libinput. If we reduce to the 3mm libinput uses, we get a 95% percentile of 1.2mm, i.e. "of all taps currently detected as taps, 95% move 1.2mm or less".

Now let's combine the two. Below is a graph mapping times and distances from touch sequences. In general, the longer the time, the longer the more movement we get but most of the data is in the bottom left. Since doing percentiles is tricky on 2 axes, I mapped the respective axes individually. The biggest rectangle is the 95th percentile for time and distance, the number below shows how many data points actually fall into this rectangle. Looks promising, we still have a vast majority of touchpoints fall into the respective 95 percentiles though the numbers are slightly lower than the individual axes suggest.

Time to distance map for all possible taps
Again, this is for the 250ms by 6mm movement. About 3.3% of the events fall into the area between 180ms/3mm and 250ms/6mm. There is a chance that some of the touches have have been short, small movements, we just can't know by from data.

So based on the above, we learned one thing: it would not be reliable to detect taps based on their location. But we also suspect two things now: we can reduce the timeout and movement threshold without sacrificing a lot of reliability.

Verification of findings

Based on the above, our hypothesis is: we can reduce the timeout to 116ms and the threshold to 1.8mm while still having a 93% detection reliability. This is the most conservative reading, based on the extended thresholds.

To verify this, we needed to collect tap data from multiple users in a standardised and reproducible way. We wrote a basic website that displays 5 circles (see the screenshot below) on a canvas and asked a bunch of co-workers in two different offices [4] to tap them. While doing so, evemu-record was running in the background to capture the touchpad interactions. The touchpad was the one from a Lenovo T450 in both cases.

Screenshot of the <canvas> that users were asked to perform the taps on.
Some users ended up clicking instead of tapping and we had to discard those recordings. The total number of useful recordings was 15 from the Paris office and 27 from the Brisbane office. In total we had 245 taps (some users missed the circle on the first go, others double-tapped).

We asked each user three questions: "do you know what tapping/tap-to-click is?", "do you have tapping enabled" and "do you use it?". The answers are listed below:

  • Do you know what tapping is? 33 yes, 12 no
  • Do you have tapping enabled? 19 yes, 26 no
  • Do you use tapping? 10 yes, 35 no

I admit I kinda screwed up the data collection here because it includes those users whose recordings we had to discard. And the questions could've been better. So I'm not going to go into too much detail. The only useful thing here though is: the majority of users had tapping disabled and/or don't use it which should make any potential learning effect disappear[5]

Ok, let's look at the data sets, same scripts as above:

Times between touch down and touch up for tap events

Movement between the touch down and the touch up events of a tap (10 == 1mm)
95th percentile for time is 87ms. 95th percentile for distance is 1.09mm. Both are well within the numbers we expected we saw above. The combined diagram shows that 87% of events fall within the 87ms/10.9mm box.

Time to distance map for all taps
The few outliers here are close enough to the edge that expanding the box to to 100ms/1.3mm we get more than 95%. So it appears that our hypothesis is correct, reducing the timeout to 116ms and 1.8mm will have a 95% detection reliability. Furthermore, using the clean data it looks like we can use a lower threshold than previously assumed and still get a good detection ratio. Specifically, data collected in a controlled environment across 42 different users of varying familiarity with touchpad tapping shows that 100ms and 1.3mm gets us a 95% detection rate of taps.

What does this mean for users?

Based on the above, the libinput thresholds will be reduced to 100ms and 1.3mm. Let's see how we go with this and then we can increase it in the future if misdetection is higher than expected. Patches will on the wayland-devel list shortly.

For users that don't have tapping enabled, this will not change anything. All users who have tapping enabled will see a more responsive cursor on small movements as the time and distance thresholds have been significantly reduced. Some users may see a drop in tap detection rate. This is hopefully a subconscious enough effect that those users learn to tap faster or with less movement. If not, we have to look at it separately and see how we can deal with that.

If you find any issues with the analysis above, please let me know.

[1] These scripts analyse raw touchpad data, they don't benefit from libinput's palm detection
[2] Note: mean != average, the mean is less affected by strong outliers. look it up, it's worth knowing
[3] X percentile means X% of events fall below this value
[4] The Brisbane and Paris offices. No separate analysis was done, so it is unknown whether close proximity to baguettes has an effect to tap behaviour
[5] i.e. the effect of users learning how to use a system that doesn't work well out-of-the-box. This may result in e.g. quicker taps from those that are familiar with the system vs those that don't.

December 09, 2016


One of the key reasons to keep using the out-of-tree Sierra GobiNet drivers and GobiAPI was that upgrading firmware in the WWAN modules was supported out of the box, while we didn’t have any way to do so with qmi_wwan in the upstream kernel and libqmi.

I’m glad to say that this is no longer the case; as we already have a new working solution in the aleksander/qmi-firmware-update branch in the upstream libqmi git repository, which will be released in the next major libqmi release. Check it out!

The new tool is named, no surprise, qmi-firmware-update; and allows upgrading firmware for Qualcomm based Sierra Wireless devices (e.g. MC7700, MC7710, MC7304, MC7354, MC7330, MC7444…). I’ve personally not tested any other device or manufacturer yet, so won’t say we support others just yet.

This work wouldn’t have been possible without Bjørn Mork‘s swi-update program, which already contained most of the bits and pieces for the QDL download session management, we all owe him quite some virtual beers. And thanks also to Zodiac Inflight Innovations for sponsoring this work!

Sierra Wireless SWI9200 series (e.g. MC7700, MC7710…)

The upgrade process for Sierra Wireless SWI9200 devices (already flagged as EOL, but still used in thousands of places) is very straightforward:

  • Device is rebooted in boot & hold mode (what we call QDL download mode) by running AT!BOOTHOLD in the primary AT port.
  • A QDL download session is run to upload the firmware image, which is usually just one single file which contains the base system plus the carrier-specific settings.
  • Once the QDL download session is finished, the device is rebooted in normal mode.

The new qmi-firmware-update tool supports all these steps just by running one single command as follows:

$ sudo qmi-firmware-update \
     --update \
     -d 1199:68a2 \

Sierra Wireless SWI9x15 series (e.g. MC7304, MC7354…)

The upgrade process for Sierra Wireless SWI9x15 devices is a bit more complex, as these devices support and require the QMI DMS Set/Get Firmware Preference commands to initiate the download. The steps would be:

  • Decide which firmware version, config version and carrier strings to use. The firmware version is the version of the system itself, the config version is the version of the carrier-specific image, and the carrier string is the identifier of the network operator.
  • Using QMI DMS Set Firmware Preference, the new desired firmware version, config version and carrier are specified. When the firmware and config version don’t match the ones currently running in the device, it will reboot itself in boot & hold mode and wait for the new downloads.
  • A QDL download session is run to upload each firmware image available. For these kind of devices, two options are given to the users: a pair of .cwe and .nvu images containing the system and carrier images separately, or a consolidated .spk image with both. It’s advised to always use the consolidated .spk image to avoid mismatches between system and config.
  • Once the QDL download sessions are finished, the device is rebooted in normal mode.

Again, the new qmi-firmware-update tool supports all these steps just by running one single command as follows:

$ sudo qmi-firmware-update \
     --update \
     -d 1199:68c0 \

This previous commands will analyze each firmware image provided and will extract the firmware version, config version and carrier so that the user doesn’t need to explicitly provide them (although there are also options to do that if needed).

Sierra Wireless SWI9x30 series (e.g. MC7455, MC7430..)

The upgrade process for Sierra Wireless SWI9x30 devices is equivalent to the one used for SWI9x15. One of the main differences, though, is that SWI9x15 devices seem to only allow one pair of modem+pri images (system+config) installed in the system, while the SWI9x30 allows the user to download multiple images and then select them using the QMI DMS List/Select/Delete Stored Image commands.

The SWI9x30 modules may also run in MBIM mode instead of QMI. In this case, the firmware upgrade process is exactly the same as with the SWI9x15 series, but using QMI over MBIM. The qmi-firmware-update program supports this operation with the –device-open-mbim command line argument:

$ sudo qmi-firmware-update \
    --update \
    -d 1199:9071 \
    --device-open-mbim \
    SWI9X30C_02.20.03.00.cwe \

Notes on device selection

There are multiple ways to select which device is going to be upgraded:

  • vid:pid: If there is a single device to upgrade in the system, it usually is easiest to use the -d option to select it by vid:pid (or even just by vid). This is the way used by default in all previous examples, and really the easiest one if you just have one modem available.
  • bus:dev: If there are multiple devices to upgrade in the same system, a more restrictive device selection can be achieved with the -s option specifying the USB device bus number plus device number, which is unique per physical device.
  • /dev/cdc-wdm: A firmware upgrade operation may also be started by using the –cdc-wdm option (shorter, -w) and specifying a /dev/cdc-wdm device exposed by the module.
  • /dev/ttyUSB: If the device is already in boot & hold mode, a QDL download session may be performed directly on the tty specified by the –qdl-serial (shorter, -q) option.

Notes on firmware images

Sierra Wireless provides firmware images for all their SWI9200, SWI9x15 and SWI9x30 modules in their website. Sometimes they do specify “for Linux” (and you would get .cwe, .nvu or .spk images) and sometimes they just provide .exe Windows OS binaries. For the latter, you can just decompress the .exe file e.g. using 7-Zip and get the firmware images that you would use with qmi-firmware-update, e.g.:

 $ 7z x SWI9200M_3.5-Release13-SWI9200X_03.05.29.03.exe
 $ ls *.{cwe,nvu,spk} 2>/dev/null


qmi-firmware-update now allows upgrading firmware in Sierra Wireless modems using qmi_wwan and libqmi.

Filed under: FreeDesktop Planet, GNOME Planet, Planets Tagged: firmware upgrade, Gobi, GobiNet, libqmi, QMI, qmi_wwan, Qualcomm, sierra-wireless
December 07, 2016

Update: Dec 08 2016: someone's working on this project. Sorry about the late update, but feel free to pick other projects you want to work on.

Interested in hacking on some low-level stuff and implementing a feature that's useful to a lot of laptop owners out there? We have a feature on libinput's todo list but I'm just constantly losing my fight against the ever-growing todo list. So if you already know C and you're interested in playing around with some low-level bits of software this may be the project for you.

Specifically: within libinput, we want to disable certain devices based on a lid state. In the first instance this means that when the lid switch is toggled to closed, the touchpad and trackpoint get silently disabled to not send events anymore. [1] Since it's based on a switch state, this also means that we'll now have to listen to switch events and expose those devices to libinput users.

The things required to get all this working are:

  • Designing a switch interface plus the boilerplate code required (I've done most of this bit already)
  • Extending the current evdev backend to handle devices with EV_SW and exposing their events
  • Hooking up the switch devices to internal touchpads/trackpoints to disable them ad-hoc
  • Handle those devices where lid switch is broken in the hardware (more details on this when we get to this point)

You get to dabble with libinput and a bit of udev and the kernel. Possibly Xorg stuff, but that's unlikely at this point. This project is well suited for someone with a few spare weekends ahead. It's great for someone who hasn't worked with libinput before, but it's not a project to learn C, you better know that ahead of time. I'd provide the mentoring of course (I'm in UTC+10, so expect IRC/email). If you're interested let me know. Riches and fame may happen but are not guaranteed.

[1] A number of laptops have a hw issue where either device may send random events when the lid is closed

xinput is a tool to query and modify X input device properties (amongst other things). Every so-often someone-complains about it's non-intuitive interface, but this is where users are mistaken: xinput is a not a configuration UI. It is a DUI - a developer user interface [1] - intended to test things without having to write custom (more user-friendly) for each new property. It is nothing but a tool to access what is effectively a key-value store. To use it you need to know not only the key name(s) but also the allowed formats, some of which are only documented in header files. It is intended to be run under user supervision, anything it does won't survive device hotplugging. Relying on xinput for configuration is the same as relying on 'echo' to toggle parameters in /sys for kernel configuration. It kinda possibly maybe works most of the time but it's not pretty. And it's not intended to be, so please don't complain to me about the arcane user interface.

[1] don't do it, things will be a bit confusing, you may not do the right thing, you can easily do damage, etc. A lot of similarities... ;)

December 06, 2016

Avoiding CVE-2016-8655 with systemd

Just a quick note: on recent versions of systemd it is relatively easy to block the vulnerability described in CVE-2016-8655 for individual services.

Since systemd release v211 there's an option RestrictAddressFamilies= for service unit files which takes away the right to create sockets of specific address families for processes of the service. In your unit file, add RestrictAddressFamilies=~AF_PACKET to the [Service] section to make AF_PACKET unavailable to it (i.e. a blacklist), which is sufficient to close the attack path. Safer of course is a whitelist of address families whch you can define by dropping the ~ character from the assignment. Here's a trivial example:

RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX

This restricts access to socket families, so that the service may access only AF_INET, AF_INET6 or AF_UNIX sockets, which is usually the right, minimal set for most system daemons. (AF_INET is the low-level name for the IPv4 address family, AF_INET6 for the IPv6 address family, and AF_UNIX for local UNIX socket IPC).

Starting with systemd v232 we added RestrictAddressFamilies= to all of systemd's own unit files, always with the minimal set of socket address families appropriate.

With the upcoming v233 release we'll provide a second method for blocking this vulnerability. Using RestrictNamespaces= it is possible to limit which types of Linux namespaces a service may get access to. Use RestrictNamespaces=yes to prohibit access to any kind of namespace, or set RestrictNamespaces=net ipc (or similar) to restrict access to a specific set (in this case: network and IPC namespaces). Given that user namespaces have been a major source of security vulnerabilities in the past months it's probably a good idea to block namespaces on all services which don't need them (which is probably most of them).

Of course, ideally, distributions such as Fedora, as well as upstream developers would turn on the various sandboxing settings systemd provides like these ones by default, since they know best which kind of address families or namespaces a specific daemon needs.

This post mostly affects developers of desktop environments/Wayland compositors. A systemd pull request was merged to add two new properties to some keyboards: XKB_FIXED_LAYOUT and XKB_FIXED_VARIANT. If set, the device must not be switched to a user-configured layout but rather the one set in the properties. This is required to make fake keyboard devices work correctly out-of-the-box. For example, Yubikeys emulate a keyboard and send the configured passwords as key codes matching a US keyboard layout. If a different layout is applied, then the password may get mangled by the client.

Since udev and libinput are sitting below the keyboard layout there isn't much we can do in this layer. This is a job for those parts that handle keyboard layouts and layout configurations, i.e. GNOME, KDE, etc. I've filed a bug for gnome here, please do so for your desktop environment.

If you have a device that falls into this category, please submit a systemd patch/file a bug and cc me on it (@whot).

December 05, 2016

This post applies to most tools that interface with the X server and change settings in the server, including xinput, xmodmap, setxkbmap, xkbcomp, xrandr, xsetwacom and other tools that start with x. The one word to sum up the future for these tools under Wayland is: "non-functional".

An X window manager is little more than an innocent bystander when it comes to anything input-related. Short of handling global shortcuts and intercepting some mouse button presses (to bring the clicked window to the front) there is very little a window manager can do. It's a separate process to the X server and does not receive most input events and it cannot affect what events are being generated. When it comes to input device configuration, any X client can tell the server to change it - that's why general debugging tools like xinput work.

A Wayland compositor is much more, it is a window manager and the display server merged into one process. This gives the compositor a lot more power and responsibility. It handles all input events as they come out of libinput and also manages device's configuration. Oh, and instead of the X protocol it speaks Wayland protocol.

The difference becomes more obvious when you consider what happens when you toggle a setting in the GNOME control center. In both Wayland and X, the control center toggles a gsettings key and waits for some other process to pick it up. In both cases, mutter gets notified about the change but what happens then is quite different. In GNOME(X), mutter tells the X server to change a device property, the server passes that on to the xf86-input-libinput driver and from there the setting is toggled in libinput. In GNOME(Wayland), mutter toggles the setting directly in libinput.

Since there is no X server in the stack, the various tools can't talk to it. So to get the tools to work they would have to talk to the compositor instead. But they only know how to speak X protocol, and no Wayland protocol extension exists for input device configuration. Such a Wayland protocol extension would most likely have to be a private one since the various compositors expose device configuration in different ways. Whether this extension will be written and added to compositors is uncertain, I'm not aware of any plans or even intentions to do so (it's a very messy problem). But either way, until it exists, the tools will merely shout into the void, without even an echo to keep them entertained. Non-functional is thus a good summary.

The Raspberry Pi Foundation recently started contracting with Free Electrons to give me some support on the display side of the stack.  Last week I got to review and release their first big piece of work: Boris Brezillon's code for SDTV support.  I had suggested that we use this as the first project because it should have been small and self contained.  It ended up that we had some clock bugs Boris had to fix, and a bug in my core VC4 CRTC code, but he got a working patch series together shockingly quickly.  He did one respin for a couple more fixes once I had tested it, and it's now out on the list waiting for devicetree maintainer review.  If nothing goes wrong, we should have composite out support in 4.11 (we're probably a week late for 4.10).

On my side, I spent some more time on HDMI audio and the DSI panel.  On the audio side, I'm now emitting the GCP packet for audio mute appropriately (I think), and with some more clocking fixes it's now accepting the audio data at the expected rate.  On the DSI front, I fixed a bit of sequencing and added debugfs for the registers like we have in our other encoders.  There's still no actual audio coming out of HDMI, and only white coming out of the panel.

The DSI situation is making me wish for someone else's panel that I could attach to the connector, so I could see if my bugs are in the Atmel bridge programming or in the DSI driver.

I did some more analysis of 3DMMES's shaders, and improved our code generation, for wins of 0.4%, 1.9%, 1.2%, 2.6%, and 1.2%.  I also experimented with changing the UBO (indirect addressed uniform array) upload path, which showed no effect.  3DMMES's uniform arrays are tiny, though, so it may be a win in some other app later.

I also got a couple of new patches from Jonas Pfeil.  I went through his thread switch delay slots patch, which is pretty close to ready.  He has a similar patch for branching delay slots, though apparently that one isn't showing wins yet in things he's tested.  Perhaps most exciting, though, is that he went and implemented an idea I had dropped on github: replacing our shadow copies of raster textures with a bunch of math in the shader and using general memory loads.  This could potentially fix X performance without a compositor, which we otherwise really don't have a workaround for other than "use a compositor."  It could also improve GL-in-a-window performance: right now all of our DRI surfaces are raster format, because we want to be able to get full screen pageflipping, but that means we do the shadow copy if they aren't fullscreen.  Hopefully this week I'll get a chance to test and review it.

I pushed the patch to require resolution today, expect this to hit the general public with libinput 1.6. If your graphics tablet does not provide axis resolution we will need to add a hwdb entry. Please file a bug in systemd and CC me on it (@whot).

How do you know if your device has resolution? Run sudo evemu-describe against the device node and look for the ABS_X/ABS_Y entries:

# Event code 0 (ABS_X)
# Value 2550
# Min 0
# Max 3968
# Fuzz 0
# Flat 0
# Resolution 13
# Event code 1 (ABS_Y)
# Value 1323
# Min 0
# Max 2240
# Fuzz 0
# Flat 0
# Resolution 13
if the Resolution value is 0 you'll need a hwdb entry or your tablet will stop working in libinput 1.6. You can file the bug now and we can get it fixed, that way it'll be in place once 1.6 comes out.

pastebins are useful for dumping large data sets whenever the medium of conversation doesn't make this easy or useful. IRC is one example, or audio/video conferencing. But pastebins only work when the other side looks at the pastebin before it expires, and the default expiry date for a pastebin may only be a few days.

This makes them effectively useless for bugs where it may take a while for the bug to be triaged and the assignee to respond. It may take even longer to figure out the source of the bug, and if there's a regression it can take months to figure it out. Once the content disappears we have to re-request the data from the reporter. And there is a vicious dependency too: usually, logs are more important for difficult bugs. Difficult bugs take longer to fix. Thus, with pastebins, the more difficult the bug, the more likely the logs become unavailable.

All useful bug tracking systems have an attachment facility. Use that instead, it's archived with the bug and if a year later we notice a regression, we still have access to the data.

If you got here because I pasted the link to this blog post, please do the following: download the pastebin content as raw text, then add it as attachment to the bug (don't paste it as comment). Once that's done, we can have a look at your bug again.

November 28, 2016
I missed last week's update, but with the holiday it ended up being a short week anyway.

The multithreaded fragment shaders are now in drm-next and Mesa master.  I think this was the last big win for raw GL performance and we're now down to the level of making 1-2% improvements in our commits.  That is, unless we're supposed to be using double-buffer non-MS mode and the closed driver was just missing that feature.  With the glmark2 comparisons I've done, I'm comfortable with this state, though.  I'm now working on performance comparisons for 3DMMES Taiji, which the HW team often uses as a benchmark.  I spent a day or so trying to get it ported to the closed driver and failed, but I've got it working on the open stack and have written a couple of little performance fixes with it.

The first was just a regression fix from the multithreading patch series, but it was impressive that multithreading hid a 2.9% instruction count penalty and still showed gains.

One of the new fixes I've been working on is folding ALU operations into texture coordinate writes.  This came out of frustration from the instruction selection research I had done the last couple of weeks, where all algorithms seemed like they would still need significant peephole optimization after selection.  I finally said "well, how hard would it be to just finish the big selection problems I know are easily doable with peepholes?" and it wasn't all that bad.  The win came out to about 1% of instructions, with a similar benefit to overall 3DMMES performance (it's shocking how ALU-bound 3DMMES is)

I also started on a branch to jump to the end of the program when all 16 pixels in a thread have been discarded.  This had given me a 7.9% win on GLB2.7 on Intel, so I hoped for similar wins here.  3DMMES seemed like a good candidate for testing, too, with a lot of discards that are followed by reams of code that could be skipped, including texturing.  Initial results didn't seem to show a win, but I haven't actually done any stats on it yet.  I also haven't done the usual "draw red where we were applying the optimization" hack to verify that my code is really working, either.

While I've been working on this, Jonas Pfeil (who originally wrote the multithreading support) has been working on a couple of other projects.  He's been trying to get instructions scheduled into the delay slots of thread switches and branching, which should help reduce any regressions those two features might have caused.  More exciting, he's just posed a branch for doing nearest-filtered raster textures (the primary operation in X11 compositing) using direct memory lookups instead of our shadow-copy fallback.  Hopefully I get a chance to review, test, and merge in the next week or two.

On the kernel side, my branches for 4.10 have been pulled.  We've got ETC1 and multithread FS for 4.10, and a performance win in power management.  I've also been helping out and reviewing Boris Brezillon's work for SDTV output in vc4.  Those patches should be hitting the list this week.
November 20, 2016

The Fedora Change to retire the synaptics driver was approved by FESCO. This will apply to Fedora 26 and is part of a cleanup to, ironically, make the synaptics driver easier to install.

Since Fedora 22, xorg-x11-drv-libinput is the preferred input driver. For historical reasons, almost all users have the xorg-x11-drv-synaptics package installed. But to actually use the synaptics driver over xorg-x11-drv-libinput requires a manually dropped xorg.conf.d snippet. And that's just not ideal. Unfortunately, in DNF/RPM we cannot just say "replace the xorg-x11-drv-synaptics package with xorg-x11-drv-libinput on update but still allow users to install xorg-x11-drv-synaptics after that".

So the path taken is a package rename. Starting with Fedora 26, xorg-x11-drv-libinput's RPM will Provide/Obsolete [1] xorg-x11-drv-synaptics and thus remove the old package on update. Users that need the synaptics driver then need to install xorg-x11-drv-synaptics-legacy. This driver will then install itself correctly without extra user intervention and will take precedence over the libinput driver. Removing xorg-x11-drv-synaptics-legacy will remove the driver assignment and thus fall back to libinput for touchpads. So aside from the name change, everything else works smoother now. Both packages are now updated in Rawhide and should be available from your local mirror soon.

What does this mean for you as a user? If you are a synaptics user, after an update/install, you need to now manually install xorg-x11-drv-synaptics-legacy. You can remove any xorg.conf.d snippets assigning the synaptics driver unless they also include other custom configuration.

See the Fedora Change page for details. Note that this is a Fedora-specific change only, the upstream change for this is already in place.

[1] "Provide" in RPM-speak means the package provides functionality otherwise provided by some other package even though it may not necessarily provide the code from that package. "Obsolete" means that installing this package replaces the obsoleted package.

November 16, 2016
At this point, I haven't pushed a new release tag for xf86-video-freedreno to update to latest xserver ABI.  I'm inclined not to.  If you are using a modern xserver you probably want to be using xf86-video-modesetting + glamor.  It has more features (dri3, xv, etc) and better performance.  And GL support on a3xx/a4xx is pretty solid.  So distros with a modern xserver might as well drop the xf86-video-freedreno package.

The one case where xf86-video-freedreno is still useful is bringing up a new generation of adreno, since it can do dri2 with pure-sw fallbacks for all the EXA ops.  But if that is what you are doing, I guess you know how to git clone and build.

The possible alternative is to push a patch that makes xf86-video-freedreno still build, but only probe (with latest xserver ABI) if some "ForceLoad" type option is given in xorg.conf, otherwise fallback to modesetting/glamor.  I can't think of a good reason to do this at the moment.  But as always, questions/comments/suggestions welcome.

November 15, 2016

As usual, it can be turned on and off at build-time and there is some configuration available as well to control how the effect works. Here are some screen-shots:

Motion Blur Off
Motion Blur Off
Motion Blur On, intensity 12.5%
Motion Blur On, intensity 12.5%

Motion Blur On, intensity 25%

Motion Blur On, intensity 25%
Motion Blur On, intensity 50%
Motion Blur On, intensity 50%

Motion blur is a technique used to make movement feel smoother than it actually is and is targeted at hiding the fact that things don’t move in continuous fashion, but rather, at fixed intervals dictated by the frame rate. For example, a fast moving object in a scene can “leap” many pixels between consecutive frames even if we intend for it to have a smooth animation at a fixed speed. Quick camera movement produces the same effect on pretty much every object on the scene. Our eyes can notice these gaps in the animation, which is not great. Motion blur applies a slight blur to objects across the direction in which they move, which aims at filling the animation gaps produced by our discrete frames, tricking the brain into perceiving a smoother animation as result.

In my demo there are no moving objects other than the sky box or the shadows, which are relatively slow objects anyway, however, camera movement can make objects change screen-space positions quite fast (specially when we rotate the view point) and the motion- blur effect helps us perceive a smoother animation in this case.

I will try to cover the actual implementation in some other post but for now I’ll keep it short and leave it to the images above to showcase what the filter actually does at different configuration settings. Notice that the smoothing effect is something that can only be perceived during motion, so still images are not the best way to showcase the result of the filter from the perspective of the viewer, however, still images are a good way to freeze the animation and see exactly how the filter modifies the original image to achieve the smoothing effect.

Last Friday, both a GNOME bug day and a bank holiday, a few of us got together to squash some bugs, and discuss GNOME and GNOME technologies.

Guillaume, a new comer in our group, tested the captive portal support for NetworkManager and GNOME in Gentoo, and added instructions on how to enable it to their Wiki. He also tested a gateway related configuration problem, the patch for which I merged after a code review. Near the end of the session, he also rebuilt WebKitGTK+ to test why Google Docs was not working for him anymore in Web. And nobody believed that he could build it that quickly. Looks like opinions based on past experiences are quite hard to change.

Mathieu worked on removing jhbuild's .desktop file as nobody seems to use it, and it was creating the Sundry category for him, in gnome-shell. He also spent time looking into the tracker blocker that is Mozilla's Focus, based on disconnectme's block lists. It's not as effective as uBlock when it comes to blocking adverts, but the memory and performance improvements, and the slow churn rate, could make it a good default blocker to have in Web.

Haïkel looked into using Emeus, potentially the new GTK+ 4.0 layout manager, to implement the series properties page for Videos.

Finally, I added Bolso to jhbuild, and struggled to get gnome-online-accounts/gnome-keyring to behave correctly in my installation, as the application just did not want to log in properly to the service. I also discussed Fedora's privacy policy (inappropriate for Fedora Workstation, as it doesn't cover the services used in the default installation), a potential design for Flatpak support of joypads and removable devices in general, as well as the future design of the Network panel.
I dropped HDMI audio last week because Jonas Pfeil showed up with a pair of branches to do multithreaded fragment shaders.

Some context for multithreaded fragment shaders: Texture lookups are really slow.  I think we eyeballed them as having a latency of around 20 QPU instructions.  We can hide latency sometimes if there's some unrelated math to be done after the texture coordinate calculation but before using the texture sample.  However, in most cases, you need the texture sample results right away for the next bit of work.

To allow programs to hide that latency, there's a cooperative hyperthreading mode that a fragment shader can opt into.  The shader stays in the bottom half of register space, and before it collects the results of a texture fetch, it issues a thread switch signal, which the hardware will use to run a second fragment shader thread until that one issues its own thread switch.  For the second thread, the top bit of the register addresses gets flipped, so the two threads' states don't overlap (except for the accumulators and the flags, which the shaders have to save and restore).

I had delayed working on this because the full solution was going to be really tricky: queue up as many lookups as we can, then thread switch, then collect all those results before switching again, all while respecting the FIFO sizes.  However, Jonas's huge contribution here was in figuring out that you don't need to be perfect, you can get huge gains by thread switching between each texture lookup and reading its results.

The upshot was a 0-20% performance improvement on glmark2 and a performance hit to only one testcase.  With this we're down to 3 subtests that we're slower on than the closed source driver.  Jonas's kernel patch is out on the list, and I rewrote the Mesa series to expose threading to the register allocator and landed all but the enabling patch (which has to wait on the kernel side getting merged).  Hopefully I'll finish merging it in a week.

In the process of writing multithreading, Jonas noticed that we were scheduling our TLB_Z writes early in the program, which can cut into fragment shader parallelism between the QPUs (of which we have 12) because a TLB_Z write locks the scoreboard for that 2x2 pixel quad.  I merged a patch to Mesa that pushes the TLB_Z down to the bottom, at the cost of a single extra QPU instruction.  Some day we should flip QPU scheduling around so that we pair that TLB_Z up better, and fill our delay slots in thread switches and branching as well.

I also tracked down a major performance issue in Raspbian's desktop using the open driver.  I asked them to use xcompmgr a while back, because readback from the front buffer using the GPU is really slow (The texture unit can't read from raster textures, so we have to reformat them to be readable), and made window dragging unbearable.  However, xcompmgr doesn't unredirect fullscreen windows, so full screen GL apps emit copies (with reformatting!) instead of pageflipping.

It looks like the best choice for Raspbian is going to be using compton, an xcompmgr fork that does pageflipping (you get tear free screen updates!) and unredirection of full screen windows (you get pageflipping directly from GL apps).  I've also opened a bug report on compton with a description of how they could improve their GL drawing for a tiled renderer like the Pi, which could improve its performance for windowed updates significantly.

Simon ran into some trouble with compton, so he hasn't flipped the default yet, but I would encourage anyone running a Raspberry Pi desktop to give it a shot -- the improvement should be massive.

Other things last week: More VCHIQ patch review, merging more to the -next branches for 4.10 (Martin Sperl's thermal driver, at last!), and a shadow texturing bugfix.
November 14, 2016

I've written more extensively about this here but here's an analogy that should get the point across a bit better: Wayland is just a protocol, just like HTTP. In both cases, you have two sides with very different roles and functionality. In the HTTP case, you have the server (e.g. Apache) and the client (a browser, e.g. Firefox). The communication protocol is HTTP but both sides make a lot of decisions unrelated to the protocol. The server decides what data is sent, the client decides how the data is presented to the user. Wayland is very similar. The server, called the "compositor", decides what data is sent (also: which of the clients even gets the data). The client renders the data [1] and decides what to do with input like key strokes, etc.

Asking Does $FEATURE work under Wayland? is akin to asking Does $FEATURE work under HTTP?. The only answer is: it depends on the compositor and on the client. It's the wrong question. You should ask questions related to the compositor and the client instead, e.g. "does $FEATURE work in GNOME?" or "does $FEATURE work in GTK applications?". That's a question that can be answered.

Of course, there are some cases where the fault is really the protocol itself. But often enough, it's not.

[1] albeit it does so by telling the compositor to display it. The analogy with HTTP only works to some extent... :)

November 10, 2016

So, in Fedora Workstation 24 we added H264 support through OpenH264. In Fedora Workstation 25 I am happy to tell you all that we are taking another step in improving our codec support by adding support for mp3 playback. I know this has been a big wishlist item for a long time for a lot of people so I am really happy that we are finally in a position to fulfil that wish. You should be able to download the mp3 plugin on day 1 through GNOME Software or through the missing codec installer in various GStreamer applications. For Fedora Workstation 26 I would not be surprised if we decide to ship it on the install media.

Fo the technically inclined out there, our initial enablement is through the mpeg123 library and corresponding GStreamer plugin. The main reason we choose this library over all the others available out there was a combination of using the same license as GStreamer (lgpl v2) and being a well established library used by a lot of different applications already. There might be other mp3 decoders added in the future depending on interest in and effort by the community. So get ready to install Fedora Workstation 25 when its released soon and play some tunes :)

P.S. To be 110% clear we will not be adding encoding support at this time.

November 08, 2016
Almost all of Collabora's customers use the Linux kernel on their products. Often they will use the exact code as delivered by the SBC vendors and we'll work with them in other pars of their software stack. But it's becoming increasingly common for our customers to adapt the kernel sources to the specific needs of their particular products.

A very big problem most of them have is that the kernel version they based on isn't getting security updates any more because it's already several years old. And the reason why companies are shipping kernels so old is that they have been so heavily modified compared to the upstream versions, that rebasing their trees on top of newer mainline releases is so expensive that is very hard to budget and plan for it.

To avoid that, we always recommend our customers to stay close to their upstreams, which implies rebasing often on top of new releases (typically LTS releases, with long term support). For the budgeting of that work to become possible, the size of the delta between mainline and downstream sources needs to be manageable, which is why we recommend contributing back any changes that aren't strictly specific to their products.

But even for those few companies that already have processes in place for upstreaming their changes and are rebasing regularly on top of new LTS releases, keeping up with mainline can be a substantial disruption of their production schedules. This is in part because new bugs will be in the new mainline release, and new bugs will be in the downstream changes as they get applied to the new version.

Those companies that are already keeping close to their upstreams typically have advanced QA infrastructure that will detect those bugs long before production, but a long stabilization phase after every rebase can significantly slow product development.

To improve this situation and encourage more companies to keep their efforts close to upstream we at Collabora have been working for a few years already in continuous integration of FOSS components across a diverse array of hardware. The initial work was sponsored by Bosch for one of their automotive projects, and since the start of 2016 Google has been sponsoring work on continuous integration of the mainline kernel.

One of the major efforts to continuously integrate the mainline Linux kernel codebase is, which builds several configurations of different trees and submits boot jobs to several labs around the world, collating the results. This is being of great help already in detecting at a very early stage any changes that either break the builds, or prevent a specific piece of hardware from completing the boot stage.

Though can easily detect when an update to a source code repository has introduced a bug, such updates can have several dozens of new commits, and without knowing which specific commit introduced the bug, we cannot identify culprits to notify of the problem. This means that either someone needs to monitor the dashboard for problems, or email notifications are sent to the owners of the repositories who then have to manually look for suspicious commits before getting in contact with their author.

To address this limitation, Google has asked us to look into improving the existing code for automatic bisection so it can be used right away when a regression is detected, so the possible culprits are notified right away without any manual intervention.

Another area in which is currently lacking is in the coverage of the testing. Build and boot regressions are very annoying for developers because they impact negatively everybody who work in the affected configurations and hardware, but the consequences of regressions in peripheral support or other subsystems that aren't involved critically during boot can still make rebases much costlier.

At Collabora we have had a strong interest in having the DRM subsystem under continuous integration and some time ago started a R&D project for making the test suite in IGT generically useful for all the DRM drivers. IGT started out being i915-specific, but as most of the tests exercise the generic DRM ABI, they could as well test other drivers with a moderate amount of effort. Early in 2016 Google started sponsoring this work and as of today submitters of new drivers are using it to validate their code.

Another related effort has been the addition to DRM of a generic ABI for retrieving CRCs of frames from different components in the graphics pipeline, so two frames can be compared when we know that they should match. And another one is adding support to IGT for the Chamelium board, which can simulate several display connections and hotplug events.

A side-effect of having continuous integration of changes in mainline is that when downstreams are sending back changes to reduce their delta, the risk of introducing regressions is much smaller and their contributions can be accepted faster and with less effort.

We believe that improved QA of FOSS components will expand the base of companies that can benefit from involvement in development upstream and are very excited by the changes that this will bring to the industry. If you are an engineer who cares about QA and FOSS, and would like to work with us on projects such as, LAVA, IGT and Chamelium, get in touch!

November 07, 2016
I spent a little bit of time last week on actual 3D work.  I finally figured out what was going wrong with glmark2's terrain demo.  The RCP (reciprocal) instruction on vc4 is a very rough approximation, and in translating GLSL code doing floating point divides (which we have to convert from a / b to a * (1 / b)) we use a Newton-Raphson step to improve our approximation.  However, we forgot to do this for the implicit divide we have to do at the end of your vertex shader, so the triangles in the distance were unstable and bounced around based on the error in the RCP output.

I also finally got around to writing kernel-side validation for ETC1 texture compression.  I should be able to merge support for this to v4.10, which is one of the feature checkboxes we were missing compared to the old driver.  Unfortunately ETC1 doesn't end up being very useful for us in open source stacks, because S3TC has been around a lot longer and supported on more hardware, so there's not much content using ETC that I know of outside of Android land.

I also spent a while fixing up various regressions that had happened in Mesa while I'd been playing in the kernel.  Some day I should work on CI infrastructure on real hardware, but unfortunately Mesa doesn't have any centralized CI infrastructure to plug into.  Intel has a test farm, but not all proposed patches get submitted to it, and it (understandably) doesn't have non-Intel hardware.

In kernel land, I sent out a patch to reduce V3D's overhead due to power management.  We were immediately shutting off the 3D engine when it went idle, even if userland might be expected to submit a new frame shortly thereafter.  This was taking up 1% of the CPU on profiles I was looking at last week, and I've seen it up to 3.5%.  Now we keep the GPU on for at least 40ms ("about two frames plus some slack").  If I'm reading right, the closed driver does immediate powerdown as well, so we may be using slightly more power now, but given that power state transitions themselves cost power, and the CPU overhead costs power, hopefully this works out fine anyway (even though we've never done power measurements for the platform).

I also reviewed more patches from Michael Zoran for vchiq.  It should now (once all reviewed patches land) work on arm64.  We still need to get the vchiq-using drivers like camera and vcsm into staging.

Finally, I made more progress on HDMI audio.  One of the ALSA developers, Lars-Peter Clausen, pointed out that ALSA's design is that userspace would provide the IEC958 subframes by using alsalib's iec958 plugin and a bit ouf asoundrc configuration.  He also provided a ton of help debugging that pipeline.  I rebased and cleaned up into a branch that uses simple-audio-card as my machine driver, so the whole stack is pretty trivial.  Now aplay runs successfully, though no sound has come out.  Next step is to go test that the closed stack actually plays audio on this monitor and capture some register state for debugging.
November 01, 2016

I finally have a bit of time to look at touchpad pointer acceleration in libinput. But when I did, I found a great total of 5 bugs across and Red Hat's bugzilla, despite this being the first thing anyone brings up whenever libinput is mentioned. 5 bugs - that's not much to work on. Note that over time there were also a lot of bugs where pointer acceleration was fixed once the touchpad's axis ranges were corrected which usually is a two-liner for the udev hwdb.

Anyway, point of this post: if you're still having issues with pointer acceleration on your touchpad in libinput, please file a bug against libinput and make it block the new tracker bug 98535. The libinput documentation has instructions on how to report a touchpad bug, but amongst the various things I need is your laptop model name.

Don't complain about it on reddit, phoronix, HN, or in some random forum, because you're just wasting bytes there and it won't get fixed that way.

When we started the Fedora Workstation effort one thing we wanted to do was to drain the proverbial swamp of make sure that running Linux on a laptop is a first rate experience. As you see from my last blog entry we have been working on building a team dedicated to that task. There are many facets to this effort, but one that we kept getting asked about was sorting out hybrid graphics. So be aware that some of this has been covered in previous blog entries, but I want to be thorough here. So this blog will cover the big investments of time and effort we are putting into Wayland and X Windows, GNOME Shell and Nouveau (the open source driver for NVidia GPU hardware).

A big part of this story is also that we are trying to make it easier to run the NVidia binary driver on Fedora in general which is actually a fairly large undertaking as there are a lot challenges with trying to do that without requiring any manual configuration from the user and making sure it works fully in the hybrid graphics usecase. This is one of the most requested areas of improvement for Fedora. Many users are trying to run Fedora on their existing laptops or other hardware. Rather than allow this gap to be a reason for people to not run Fedora, we want to provide a better experience. Of course users will have the freedom to make their own choice about installing these drivers, using information provided by Software.

Hybrid graphics is the term used for when you have a laptop with two GPUs, usually one Intel and one NVidia GPU, but there are also some laptops available which comes with Intel/AMD CPU + AMD GPU. The purpose of hybrid graphics is to have your (Intel) Integrated GPU be your ‘day to day GPU’ for running your desktop yet not drawing to much power, but then if you want to for instance play a game on your system you can activate your secondary GPU which got a lot more power, but which also draws more electricity.

What complicated this even more was the fact that most users who wanted to use this setup wanted to use it in combination with the binary NVidia driver, in order to get top performance from their second GPU, which is not surprising since that is the whole point of switching to it. The main challenge with this was that the Mesa and binary NVidia driver both provided an OpenGL implementation that due to the way things works in the X Window System ended up overwriting each other, basically breaking the overwritten driver. The solution for this was a system called Bumblebee which employed some clever hacks to work around this issue, but Bumblebee is a solution to a problem we shouldn’t have to begin with.

Of course dealing with the OpenGL stack wasn’t the only challenge here. For instance different display outputs ended up being connected to different GPUs, with one of the most common setups being that the internal screen is connected to your Intel GPU and your external HDMI and DisplayPort connections are connected to your NVidia or AMD GPU. One some systems there where hardware connectors allowing you to display using either GPU to any screen, but a setup we see becoming more common is that the drivers for both card needs to be initialized in all cases, allowing the display connected GPU to slave itself to the other GPU to allow rendering on one GPU and displaying on the other. Within Mesa this is fairly straightforward to handle, so if both Intel and NVidia as rendering using Mesa things are fairly straightforward. What makes this a lot more challenging here is the NVidia binary driver, so once you install the binary driver we need to find a way to bridge and interoperate between these two stacks.

And of course that is just the highlights, like anything complex like this there are a long laundry list of items from the point where you can checkbox having a feature to it working really well and seamless.

What to expect in Fedora Workstation 25
Lets start with a word of caution here, we are working on a tight deadline here to get all the pieces lined up, so I can not be 100% sure what will make it for the day of release. What we are confident about though is that we will have all the low level plumbing sorted so that we can improve this over the course of the Fedora 25 lifecycle. As always I hope the community will join us in testing and developing this, to ensure we have even the corner cases worked out for Fedora Workstation 26.

The initial step we took and this is quite some time ago was that Dave Airlie worked on making sure we could handle two GPUs within the Mesa stack. As a bit of wordplay on the NVidia solution being called Optimus the open source Mesa solution was named Prime. This work basically allowed you to use Mesa for your Intel card and use Mesa (Nouveau) for your NVidia card and launch applications on the NVidia card while the screen was connected to the Intel card. You can choose which one is used by setting the DRI_PRIME=1 environment variable.

The second step was Adam Jackson collaborating with NVidia on something called libglvnd. Libglvnd stands for GL Vendor Neutral Dispatch and it is basically a dispatch layer that allows your OpenGL calls to be dispatched to more than one OpenGL implementation. Nvidia created this specification and have been steadily working on supporting it in their binary driver, but Adam Jackson stepped up to help review their patches and work on supporting glvnd in the Mesa drivers. This means that for Fedora Workstation 25 you should for the first time ever be able to have both the binary NVidia driver and Mesa installed without any file conflicts. So the X server now has the infrastructure to route your OpenGL call to the correct stack when that DRI_PRIME variable is set. That said there is a major catch here, due to the way things currently work once the NVidia binary driver is installed it expects to be rendering the screen all the time. So we need for the short term to figure out a way to allow it to do that, and in the long run we need to work with NVidia to figure out how the Intel open source driver can collaborate with the Nvidia driver to allow us to only use the Intel driver at times (to save on power for instance). We are still pushing hard on trying to have the short term fix in place, but as I write this we haven’t fully nailed this down yet.

The third step is work that Ben Skeggs has been doing on dealing with the monitor handling here, which includes adding MST support to Nouveau, because a lot of external ports and docking stations have not been working with these hybrid graphics setups due to the external screens all being routed through the NVidia chip. These patches just got accepted upstream and we will be including them in Fedora Workstation 25.

The fourth step has been work that Hans de Goede has been doing around Prime, fixing the modesetting driver and fixing cursor handling with hybrid graphics. In some sense the work Hans has been doing, and check his blog entry linked, is part of that washlist I talked about, the many smaller items that needs to work smoothly for this to go from ‘checkbox works’ to ‘actually works’. He is also working with the DNF team to allow us to control the kernel updates if you use the binary NVidia driver, meaning that we will only bump up your kernel driver when we have the binary NVidia driver module ready too. If for any reason the binary Nvidia driver doesn’t work we want to have graceful fallback to Nouveau.

Fifth step has been Jonas Ådahls work on enabling the binary NVidia driver for Wayland. He has put together a set of patches to be able to support NVidias EGLStreams interface, which means that starting from Fedora Workstation 25 you will be able to use Wayland also with NVidias binary driver.
You can see his work in progress patches here. Jonas will also be looking at implementing hybrid graphics support in Wayland, so we ensure that also for this usecase Wayland is on par with X for Fedora Workstation 26.

The sixth step has been work done by Bastien Nocera to ensure we expose this functionality in the user interface. The idea is that you should be able to configure on a per application basis which GPU they are being launched on. It also means that you can now get all your GPU information in the GNOME about screen also when you have dual-GPU systems. More details in his blog post here.

gnome-shell-discrete-gpu-menuChoosing discete GPU in GNOME Shell

The seventh step is the work that Simone Caronni from Negativo17 has been doing on working with us on packaging the binary NVidia driver in a way that makes all this work. You can find his repo here on Negativo17, but Simone has also been working with Kalev Lember and Richard Hughes to ensure the driver show up correctly in GNOME Software once you have the repository enabled.

NVidia driver in GNOME SoftwareNVidia driver in GNOME Software

The plan is to offer the driver in Fedora Workstation 25 as third party software, but we haven’t yet made the formal proposal due to wanting to be sure we ironed out all important bugs first, both on our side and on NVidia side.

So as you can see from all of this there are a lot of components involved and since we are trying to support both X and Wayland with all of this the job hasn’t exactly been easy. There are for sure some things that will not be ready in time for Fedora Workstation 25, for instance if you want to use the binary NVidia driver we will not be able to make that work with XWayland. So native Wayland applications will be fine, including games using SDL on top of Wayland, but due to how the stack is architected we haven’t gotten around to implementing bridging OpenGL from Xwayland down to the binary NVidia driver. With Nouveau it should work however. This corner case we will have to figure out for Fedora Workstation 26, but for now we have decided that if we detect a Optimus system we will default you to X instead of Wayland.

Get involved
As always we love for more people to join us in this effort and one good way to get started with that is to join our Hybrid graphics test day on Thursday the 3rd of November!

October 31, 2016
Last week was spent almost entirely on HDMI audio support.

I had started the VC4-side HDMI setup last week, and got the rest of the stubbed bits filled out this week.

Next was starting to get the audio side connected.  I had originally derived my VC4 side from the Mediatek driver, which makes a child device for the generic "hdmi-codec" DAI codec to attach to.  HDMI-codec is a shared implementation for HDMI encoders that take I2S input, which basically just passes audio setup params into the DRM HDMI encoder, whose job is to actually make I2S signals on wires get routed into HDMI audio packets.  They also have a CPU DAI registered through device tree that moves PCM data from memory onto wires, and a machine driver registered through devicetree that connects the CPU DAI's wires to the HDMI codec DAI's wires.  This is apparently how SoC audio is usually structured.

VC4, on the other hand, doesn't have a separate hardware block for moving PCM data from memory onto wires, it just has the MAI_DATA register in the HDMI block that you put S/PDIF (IEC958) data into using the SoC's generic DMA engine.  So I made VC4's HDMI driver register another a child device with the MAI_DATA register's address passed in, and that child device gets bound by the CPU DAI driver and uses the ASoC generic DMA engine support.  That CPU DAI driver also ends up setting up the machine driver, since I didn't have any other reference to the CPU DAI anywhere.  The glue here is all based on static strings I can't rely on, and is a complete hack for now.

Last, notice how I've used "PCM" talking about a working implementation's audio data and "S/PDIF" data for the thing I have to take in?  That's the sticking point.  The MAI_DATA register wants things that look like the AES3 frames description you can find on wikipedia.  I think the HW is generating my preamble.  In the firmware's HDMI audio implementation, they go through the unpacked PCM data and set the channel status bit in each sample according to the S/PDIF channel status word (the first 2 bytes of which are also documented on wikipedia).   At the same time that they're doing that, they also set the parity bit.

My first thought was "Hey, look, ALSA has SNDRV_PCM_FORMAT_IEC958_SUBFRAME, maybe my codec can say it only takes these S/PDIF frames and everything will work out."  No.  It turns out aplay won't generate them and neither will gstreamer.  To make a usable HDMI audio driver, I'm going to need to generate my IEC958 myself.

Linux is happy to generate my channel status word in memory.  I think the hardware is happy to generate my parity bit.  That just leaves making the CPU DAI write the bits of the channel status word into the subframes as PCM samples come in to the driver, and making sure that the MSB of the PCM audio data is bit 27.  I haven't figured out how to do that quite yet, which is the project for this week.

Work-in-progress, completely non-functional code is here.

For the last time in 2016, I flew out to the OpenStack Summit in Barcelona, where I had the chance to meet (again) a lot of my fellow OpenStack contributors there.

How To Work Upstream with OpenStack

My week started by giving a talk about How To Work Upstream with OpenStack where I explained, accompanied by Ryota and Ashiq, to the audience how to contribute upstream to OpenStack. It went well and was well received by the public – you can watch the video below or download the slides.

Python 3 in telemetry projects

I've attended a few interesting cross-project sessions, which helped me getting some prioritization for my work during the next few months.

The Python 3 porting effort is blocked for a while in Nova and Swift for various (mostly non-technical) reasons, while almost all other projects are working correctly. On the other hand, we have committed the telemetry projects to be the first one to drop Python 2 support has soon as it is possible. The next steps are to be sure downstream is ready and enable functional testing in devstack with Python 3.

Ceilometer deprecation

Gordon Chung talking about Gnocchi 3

The Ceilometer sessions were really interesting, are we mainly discussed deprecating and removing old crufts that are not or should not be used anymore. The main change will be the depreciation of the Ceilometer API. It has been clear for more than a year that Gnocchi is the way-to-go to store and provide access to metrics, but we failed at announcing wildly. A lot of the people I talked to during the summit were not aware that the Ceilometer API was not a good pick, and that Gnocchi was the now recommended storage backend. Bad communication from our side – but we are going to fix it as of now.

We also committed to simplify the current architecture by removing the collector, which has now be made obsolete by the agent based architecture that was implemented during the last development cycles.

Aodh alarm timeout

We had a feature proposal for a while in Aodh that we postponed for too long already: having timeout triggered after not having seen some events. This seems to be a functionality requested by NFV users – something we want Aodh to cover. We spent some time discussing this feature, and now that we all have a clear understanding of the use case, we'll work on having a clear path to the implementation.

I've also attended a session with the Vitrage developers in order to discuss how we could work better together, as they rely on Aodh. It seems there might be some convergence in the future, which would be very welcome. Wait'n see.

Gnocchi improvement, past and future

The Gnocchi session ran smoothly, and everyone seemed happy with the work we have done so far. We've made some impressive improvement in Gnocchi 3.0 – as I already covered previously – and Gordon Chung presented a short talk about the performance difference metered while working on this new version of Gnocchi:

The return of the InfluxDB driver is on the table, as Sam Morrison proposed a patch for that while back. While it's not as fast and scalable as other drivers, it offers a good alternative for people having to use it.

Leandro Reox presented how to do capacity planning using Ceilometer and Gnocchi, presenting the projects at the same time:

It is pretty impressive to see what they achieved with this project, and I'm looking forward to being able to check how it works inside.

PTG and beyond

The next meeting is supposed to be the new OpenStack PTG in February in Atlanta, though we did not request any specific space there. While the team love seeing each other face-to-face every few months, we achieved to follow all of the guidelines I listed recently on good open source project management, meaning we are able to work very well asynchronously and remotely. There is no need to put hard requirements on people wanting to participate in our community. Nevertheless, I expect cross-projects discussions that will happen to still concern the OpenStack Telemetry projects.

In the end, we're all very happy with our past and future roadmaps and I'm looking forward to achieving our next big milestones with our amazing telemetry team!

You might remember my attempts at getting an easy to use cross-compilation for ARM applications on my x86-64 desktop machine.

With Fedora 25 approaching, I'm happy to say that the necessary changes to integrate the feature have now rolled into Fedora 25.

For example, to compile the GNU Hello Flatpak for ARM, you would run:

$ flatpak install gnome org.freedesktop.Platform/arm org.freedesktop.Sdk/arm
Installing: org.freedesktop.Platform/arm/1.4 from gnome
$ sudo dnf install -y qemu-user-static
$ TARGET=arm ./

For other applications, add the --arch=arm argument to the flatpak-builder command-line.

This example also works for 64-bit ARM with the architecture name aarch64.
October 28, 2016
Recently I've been working on improving hybrid graphics support for the upcoming Fedora 25 release. Although Fedora 25 Workstation will use Wayland by default for its GNOME 3 desktop, my work has been on hybrid gfx support under X11 (Xorg) as GNOME 3 on Wayland does not yet support hybrid gfx,

So no Wayland, still there are a lot of noticable hybrid gfx support and users of laptops with hybrid gfx using the open-source drivers should have a much smoother userexperience then before. Here is an (incomplete) list of generic improvements:

  • Fix the discrete GPU not suspending after using an external monitor, halving laptop battery life

  • xrandr --listproviders no longer shows 3 devices instead of 2 on laptops with 2 GPUs

  • Hardware cursor support when both GPUs have active video outputs, previously X would fallback to software cursor rendering in this case which would typically lead to a flickering, or entirely invisible cursor at the top of the screen

Besides this a lot of work has been done on fixing hybrid gfx issues in the modesetting driver, this is important since with Fedora we use the modesetting driver on Skylake and newer Intel integrated gfx as well as on Maxwell and newer Nvidia discrete GPUs, the following issues have been fixed in the modesetting driver (and thus on laptops with a Skylake CPU and/or a Maxwell or newer GPU):

  • Hide HW cursor on init, this fixes 2 cursors showing on some setups

  • Make the modesetting driver support DRI_PRIME=1 render offloading when the secondary GPU is using the modesetting driver

  • Fix misrendering (tiled vs linear) when using DRI_PRIME=1 render offloading and the primary GPU is using the modesetting driver

  • Fix GL apps running at 1 fps when shown on a video-output of the secondary GPU and the primary GPU is using the modesetting driver

  • Fix secondary GPU video output partial screen updates (part of the screen showing a previous frame) when the discrete GPU is the secondary GPU and the primary GPU is using the modesetting driver

  • Fix secondary GPU video output updates lagging (or sometimes the last frame simply not being shown at all because no further rendering is happening) when the discrete GPU is the secondary GPU and the primary GPU is using the modesetting driver

Note coming Thursday (November 3th) we're having a Fedora Better Switchable Graphics Test Day, if you have a laptop with hybrid gfx please join us to help further improving hybrid gfx support.
I have recently been occupied with a new project (and being with a cold all this week), so I have not been much present in the Wayland community. Now I can finally say what I and Emilio have been up to: Waltham! For more information, please see our annoucement.