October 05, 2015

Last week, I've been invited to the OpenStack Paris meetup #16, whose subject was about metrics in OpenStack. Last time I spoke at this meetup was back in 2012, during the OpenStack Paris meetup #2. A very long time ago!

I talked for half an hour about Gnocchi, the OpenStack project I've been running for 18 months now. I started by explaining the story behind the project and why we needed to build it. Ceilometer has an interesting history and had a curious roadmap these last year, and I summarized that briefly. Then I talk about how Gnocchi works and what it offers to users and operators. The slides where full of JSON, but I imagine it offered a interesting view of what the API looks like and how easy it is to operate. This also allowed me to emphasize how many use cases are actually really covered and solved, contrary to what Ceilometer did so far. The talk has been well received and I got a few interesting questions at the end.

The video of the talk (in French) and my slides are available on my talk page and below. I hope you'll enjoy it.

September 25, 2015

Lately I have worked on ARB_shader_storage_buffer_object  extension implementation on Mesa, along with Iago (who has already explained how we implement it).

One of the features we implemented was adding support for buffer variables to the queries defined in ARB_program_interface_query. Those queries ended up being very useful to check another feature introduced by ARB_shader_storage_buffer_object: the new storage layout called std430.

I have developed basic tests for piglit, but Jordan Justen and Ilia Mirkin mentioned that Ian Romanick had worked in a comprehensive testing for uniform buffer objects presented at XDC2014.

Ian developed a stochastic search-based testing for uniform block layouts which generates shader_runner tests using a template in Mako that defines a series of uniform blocks with different layouts. Finally, shader_runner verifies some information through glGetActiveUniforms*() queries such as: the variable type, its offset inside the block, its array size (if any), if it is row major, etc… using “active uniform” command.

That was the kind of testing that I was looking for, so I cloned his repository and developed a modified version for shader storage blocks and std430.

During that process, I found that shader_runner was lacking support for the queries defined in ARB_program_interface_query extension. Yesterday, the patches were pushed to master branch of the piglit repository.

If you want to use this new command, the format  is the following:

verify program_interface_query GL_INTERFACE_TYPE_ENUM var_name GL_PROPS_ENUM integer

or, if we include the GL type enum:

verify program_interface_query GL_INTERFACE_TYPE_ENUM var_name GL_PROPS_ENUM GL_TYPE_ENUM

I have written an example to show how to use this command in a shader_runner test. This is just a snippet showing the usage of the new command:

link success

verify program_interface_query GL_BUFFER_VARIABLE SSBO1.s5_1[0].s1_4[1].s1_3.fv3[0] GL_TYPE GL_FLOAT_VEC4
verify program_interface_query GL_BUFFER_VARIABLE SSBO1.s5_1[0].s1_4[1].s1_3.fv3[0] GL_TOP_LEVEL_ARRAY_SIZE 4
verify program_interface_query GL_BUFFER_VARIABLE SSBO1.s5_1[0].s1_4[1].s1_3.fv3[0] GL_OFFSET 128
verify program_interface_query GL_BUFFER_VARIABLE SSBO1.s5_1[0].s1_4[0].s1_3.fv3[0] GL_OFFSET 48
verify program_interface_query GL_BUFFER_VARIABLE SSBO1.s5_1[0].s1_4[0].m34_1 GL_TYPE GL_FLOAT_MAT3x4
verify program_interface_query GL_BUFFER_VARIABLE SSBO1.s5_1[0].s1_4[1].m34_1 GL_OFFSET 80
verify program_interface_query GL_BUFFER_VARIABLE SSBO1.s5_1[0].s1_4[1].m34_1 GL_ARRAY_STRIDE 0
verify program_interface_query GL_BUFFER_VARIABLE SSBO1.m43_1 GL_IS_ROW_MAJOR 1
verify program_interface_query GL_PROGRAM_INPUT piglit_vertex GL_TYPE GL_FLOAT_VEC4
verify program_interface_query GL_PROGRAM_OUTPUT piglit_fragcolor GL_TYPE GL_FLOAT_VEC4

I hope you find them useful for your tests!


I've wanted a stand-alone radio in my office for a long time. I've been using a small portable radio, but it ate batteries quickly (probably a 4-pack of AA for a bit less of a work week's worth of listening), changing stations was cumbersome (hello FM dials) and the speaker was a bit teeny.

A couple of years back, I had a Raspberry Pi-based computer on pre-order (the Kano, highly recommended for kids, and beginners) through a crowd-funding site. So I scoured « brocantes » (imagine a mix of car boot sale and antiques fair, in France, with people emptying their attics) in search of a shell for my small computer. A whole lot of nothing until my wife came back from a week-end at a friend's with this:

Photo from Radio Historia

A Philips Octode Super 522A, from 1934, when SKUs were as superlative-laden and impenetrable as they are today.

Let's DIY

I started by removing the internal parts of the radio, without actually turning it on. When you get such old electronics, they need to be checked thoroughly before being plugged, and as I know nothing about tube radios, I preferred not to. And FM didn't exist when this came out, so not sure what I would have been able to do with it anyway.

Roomy, and dirty. The original speaker was removed, the front buttons didn't have anything holding them any more, and the nice backlit screen went away as well.

To replace the speaker, I went through quite a lot of research, looking for speakers that were embedded, rather than get a speaker in box that I would need to extricate from its container. Visaton make speakers that can be integrated into ceiling, vehicles, etc. That also allowed me to choose one that had a good enough range, and would fit into the one hole in my case.

To replace the screen, I settled on an OLED screen that I knew would work without too much work with the Raspberry Pi, a small AdaFruit SSD1306 one. Small amount of soldering that was up to my level of skills.

It worked, it worked!

Hey, soldering is easy. So because of the size of the speaker I selected, and the output power of the RPi, I needed an amp. The Velleman MK190 kit was cheap (€10), and should just be able to work with the 5V USB power supply I planned to use. Except that the schematics are really not good enough for an electronics starter. I spent a couple of afternoons verifying, checking on the Internet for alternate instructions, re-doing the solder points, to no avail.

'Sup Tiga!

So much wasted time, and got a cheap car amp with a power supply. You can probably find cheaper.

Finally, I got another Raspberry Pi, and SD card, so that the Kano, with its super wireless keyboard, could find a better home (it went to my godson, who seemed to enjoy the early game of Pong, and being a wizard).

Putting it all together

We'll need to hold everything together. I got a bit of help for somebody with a Dremel tool for the piece of wood that will hold the speaker, and another one that will stick three stove bolts out of the front, to hold the original tuning, mode and volume buttons.

A real joiner

I fast-forwarded the machine by a couple of years with a « Philips » figure-of-8 plug at the back, so machine's electrics would be well separated from the outside.

Screws into the side panel for the amp, blu-tack to hold the OLED screen for now, RPi on a few leftover bits of wood.


My first attempt at getting something that I could control on this small computer was lcdgrilo. Unfortunately, I would have had to write a Web UI for it (remember, my buttons are just stuck on, for now at least), and probably port the SSD1306 OLED screen's driver from Python, so not a good fit.

There's no proper Fedora support for Raspberry Pis, and while one can use a nearly stock Debian with a few additional firmware files on Raspberry Pis, Fedora chose not to support that slightly older SoC at all, which is obviously disappointing for somebody working on Fedora as a day job.

Looking for other radio retrofits, and there are plenty of quality ones on the Internet, and for various connected speakers backends, I found PiMusicBox. It's a Debian variant with Mopidy builtin, and a very easy to use initial setup: edit a settings file on the SD card image, boot and access the interface via a browser. Tada!

Once I had tested playback, I lowered the amp's volume to nearly zero, raised the web UI's volume to the maximum, and raised the amp's volume to the maximum bearable for the speaker. As I won't be able to access the amp's dial, we'll have this software only solution.

Wrapping up

I probably spent a longer time looking for software and hardware than actually making my connected radio, but it was an enjoyable couple of afternoons of work, and the software side isn't quite finished.

First, in terms of hardware support, I'll need to make this OLED screen work, how lazy of me. The audio setup is currently just the right speaker, as I'd like both the radios and AirPlay streams to be downmixed.

Secondly, Mopidy supports plugins to extend its sources, uses GStreamer, so would be a right fit for Grilo, making it easier for Mopidy users to extend through Lua.

Do note that the Raspberry Pi I used is a B+ model. For B models, it's recommended to use a separate DAC, because of the bad audio quality, even if the B+ isn't that much better. Testing out use the HDMI output with an HDMI to VGA+jack adapter might be a way to cut costs as well.

Possible improvements could include making the front-facing dials work (that's going to be a tough one), or adding RFID support, so I can wave items in front of it to turn it off, or play a particular radio.

In all, this radio cost me:
- 10 € for the radio case itself
- 36.50 € for the Raspberry Pi and SD card (I already had spare power supplies, and supported Wi-Fi dongle)
- 26.50 € for the OLED screen plus various cables
- 20 € for the speaker
- 18 € for the amp
- 21 € for various cables, bolts, planks of wood, etc.

I might also count the 14 € for the soldering iron, the 10 € for the Velleman amp, and about 10 € for adapters, cables, and supplies I didn't end up using.

So between 130 and 150 €, and a number of afternoons, but at the end, a very flexible piece of hardware that didn't really stretch my miniaturisation skills, and a completely unique piece of furniture.

In the future, I plan on playing with making my own 3-button keyboard, and making a remote speaker to plug in the living room's 5.1 amp with a C.H.I.P computer.

Happy hacking!
September 24, 2015


If you were waiting for a new post to learn something new about Intel GEN graphics, I am sorry. I started moving away from i915 kernel development about one year ago. Although I had worked landed patches in mesa before, it has taken me some time to get to a point where I had enough of a grasp on things to write a blog post of interest. I did start to write a post about the i965 compiler, but I was never able to complete it.1

In general I think the documentation has gotten quite good at giving the full details and explanations in a reasonable format. In terms of diagrams and prose, the GEN8+ Programmer Reference Manuals blow the previous docs out of the water. The greatest challenge I see though is the organization of the information which is arguably worse on newer documentation. Also, there is so much of it that it’s very difficult to be able to find everything you need when you need it; and it is especially hard when you don’t really know what you’re looking for. Therefore, I thought it would be valuable to people who want to understand this part of the Intel hardware and the i965 driver. More importantly, I have a terrible memory, and this will serve as good documentation for future me.

I highly recommend looking at the SVG files if you want to see the pictures in higher detail. I try very hard to strike a balance between factual accuracy and in making the information consumable. If you feel that either something is grossly inaccurate, or far too confusing, please let me know.

And finally, before I begin, I’d like to extend a special thanks to Kristian Høgsberg for the initial 90 minute discussion that seeded this post as well as helping me along the way. I believe it benefited both of us.

Whiteboarding with krh

Blurry picture of the resulting whiteboard


One of the most important components that goes into a building a 3d scene is a vertex. The point represented by the vertex may simply be a point or it may be part of some amalgamation of other points to make a line, shape, mesh, etc. Depending on the situation, you might observe a set of vertices referred to as: topology, primitive, or geometry (these aren’t all the same thing, but unless you’re speaking to a pedant you can expect the terms to be interchanged pretty frequently). Most often, these vertices are parts of triangles which make up a more complex model which is part of a scene.

The first stage in modern current programmable graphics pipeline is the Vertex Shader which unsurprisingly operates on the vertices.2 In the simplest case you might use a vertex shader to simply locate the vertex in the correct place using the model, and view transformations. Don’t worry if you’re asking yourself why is it called a shader. Given the example there isn’t anything I would define as “shading” either. How clever programmers make use of the vertex shader is out of scope, thankfully – but rest assured it’s not always so simple.

Vertex processing of a gradient square

Vertex processing of a gradient square

Vertex Buffers

The attributes that define a vertex are called, Vertex Attributes (yay). For now you can just imagine the most obvious attribute – position. The attributes are stored in one or more buffers that are created and defined by the Graphics API. Each vertex (defined here as a collection of all the attributes for a single point) gets processed and/or modified by the vertex shader with the program that is also specified by the API.

It is the responsibility of the application to build up the vertex buffers. I already mentioned that position is a very common attribute describing a vertex; unless you’re going to be creating parts of your scene within a shader, it’s likely a required attribute. So the programmer in one way or another will define each position of every vertex for the scene. We can have other things though, and the purpose of the current programmable GPU pipelines is to allow the sky to be the limit. Colors and normals are some other common vertex attributes. With a few more magic API calls, we’ll have a nice 2d image on the screen that represents the 3D image provided via the vertices.

Here are some links with much more detailed information:

Intel GPU Hardware and the i965 Driver

The GPU hardware will need to be able to operate on the vertices that are contained in the buffers created by the programmer. It will also need to be able to execute a shader program specified by the programmer. Setting this up is the responsibility of the GPU driver. When I say driver, I mean like the i965 driver in mesa. This is just a plain old shared object sitting on your disk. This is not some kernel component. In graphics, the kernel driver knows less about the underlying 3d hardware than the userspace driver.

Ultimately, the driver needs to take all the API goop and make sure it programs the hardware correctly for the given goop. There will be details on this later in the post. Getting the shader program to run takes some amount of state setup, relatively little. It’s primarily a different component within the driver which implements this. Modern graphics drivers have a surprisingly full featured compiler within them for being able to convert the API’s shading language into the native code on the GPU.

Execution Units

Programmable shading for all stages in the GPU pipeline occurs on Execution Units which are often abbreviated to EU. All other stuff is generally called, “fixed function”. The Execution Units are VLIW processors. The ISA for the EU is documented. The most important point I need to make is that each thread of each EU has its own register file. If you’re unfamiliar with the term register file, you can think of it as thread local memory. If you’re unfamiliar with that, I’m sorry.

Push vs. Pull Vertex Attributes

As it pertains to the shader, Vertex Attributes can be pushed, or pulled [or both]. Intel and most other hardware started with the push model. This is because there was no programmable shaders for geometry in the olden days. Some of the hardware vendors have moved exclusively from push to pull. Since Intel hardware can do both (as is likely the case for any HW supporting the push model), I’ll give a quick overview mostly as a background. The Intel docs do have a comparison of the two which I noticed after writing this section. I am too lazy to find the link 😛

In the pull model, none of the vertex attributes needed by the shader program are fetched before execution of the shader program beings. Assuming you can get the shader programs up and running quickly, and have a way to hide the memory latency (like relatively fast thread switching). This comes with some advantages (maybe more which I haven’t thought of):

  1. In programs with control flow or conditional execution, you may not need all the attributes.
  2. Fewer transistors for fixed function logic.
  3. Assuming the push model requires register space – more ability to avoid spills when there are a lot of vertex attributes.
    • More initial register space for the programs

The push model has fixed function hardware that is designed to fetch all the vertex attributes and populate it into the shader program during invocation. Unlike the pull model, all attributes that may be needed are fetched and this can cause some overhead.

  1. It doesn’t suffer from the two requirements above though (fast shader invocation + latency hiding).
  2. Can do format conversion automatically.
  3. Hardware should also have a better view of the system and be able to more intelligently do the attribute pushing, perhaps utilizing caches better, or considering memory arbiters better, etc.

As already mentioned, usually if your hardware supports the push model, you can optionally pull things as needed, and you can easily envision how these things may work. That is especially useful on GEN in certain cases where we have hardware constraints that make the push model difficult, or even impossible to use.

Push vs. Pull

Push vs. Pull

Green Square

Before moving on, I’d like to demonstrate a real example with code. We’ll use this to see how the attributes make their way down the pipeline. The shader program will simply draw a green square. Such a square can be achieved very easily without a vertex shader, or more specifically, without passing the color through the vertex shader, but this will circumvent what I want to demonstrate. I’ll be using shader_runner from the piglit GL test suite and framework. The i965 driver has several debug capabilities built into the driver and exposed via environment variables. In order to see the contents of the commands emitted by the driver to the hardware in a batchbuffer, you can set the environment variable INTEL_DEBUG=batch. The contents of the batchbuffers will be passed to libdrm for decoding3. The shader programs get disassembled to the native code by the i965 driver4.

Here is the linked shader we’ll be using for shader_runner:

GLSL >= 1.30

[vertex shader]
varying vec4 color;
void main()
  color = vec4(0,1,0,1);
  gl_Position = gl_Vertex;

[fragment shader]
varying vec4 color;

void main()
  gl_FragColor = color;

draw rect -1 -1 2 2

From the shader program we can infer the vertex shader is reading from exactly one input variable: gl_Vertex and is going to “output” two things: the varying vec4 color, and the built-in vec4 gl_Position. Here is the disassembled program from invoking INTEL_DEBUG=vs Don’t worry if you can’t make sense of the assembly that follows. I’m only trying to show the reader that the above shader program is indeed fetching vertex attributes with the aforementioned push model.

mov(8)          g119<1>UD       g1<8,8,1>UD                     { align1 WE_all 1Q compacted };
mov(8)          g120<1>F        g2<8,8,1>F                      { align1 1Q compacted };
mov(8)          g121<1>F        g3<8,8,1>F                      { align1 1Q compacted };
mov(8)          g122<1>F        g4<8,8,1>F                      { align1 1Q compacted };
mov(8)          g123<1>F        g5<8,8,1>F                      { align1 1Q compacted };
mov(8)          g124<1>F        [0F, 0F, 0F, 0F]VF              { align1 1Q compacted };
mov(8)          g125<1>F        1F                              { align1 1Q };
mov(8)          g126<1>F        [0F, 0F, 0F, 0F]VF              { align1 1Q compacted };
mov(8)          g127<1>F        1F                              { align1 1Q };
send(8)         null            g119<8,8,1>F
                            urb 1 SIMD8 write mlen 9 rlen 0                 { align1 1Q EOT };

Ignore that first mov for now. That last send instruction indicates that the program is outputting 98 registers of data as determined by mlen. Yes, mlen is 9 because of that first mov I asked you to ignore. The rest of the information for the send instruction won’t yet make sense. Refer to Intel’s public documentation if you want to know more about the send instruction, but hopefully I’ll cover most of the important parts by the end. The constant 0,1,0,15 values which are moved to g124-g127 should make it clear what color corresponds to, which means the other thing in g120-g123 is gl_Position, and g2-g5 is gl_Vertex (because gl_Position = gl_Vertex;).

*Either you already know how this stuff works, or that should seem weird. We have two vec4 things, color and position, and yet we’re passing 8 vec4 registers down to the next stage – what are the other 6? Well that is the result of SIMD8 dispatch.

Programming the Hardware

Okay, fine. Vertices are important. The various graphics APIs all give ways to specify them, but unless you want to use a software renderer, the hardware needs to be made aware of them too. I guess (and I really am guessing) that most of the graphics APIs are similar enough that a single set of HW commands are sufficient to achieve the goal. It’s actually not a whole lot of commands to do this, and in fact they make a good deal of sense once you understand enough about the APIs and HW.

For the very simple case, we can reduce the number of commands needed to get vertices into the GPU pipeline to 3. There are certainly other very important parts of setting up state so things will run correctly, and the not so simple cases have more commands just for this part of the pipeline setup.

  1. Here are my vertex buffers – 3DSTATE_VERTEX_BUFFERS
  2. Here is how my vertex buffers are laid out – 3DSTATE_VERTEX_ELEMENTS
  3. Do it – 3DPRIMITIVE

There is a 4th command which deserves honorable mention, 3DSTATE_VS. We’re going to have to defer dissecting that for now.

The hardware unit responsible for fetching vertices is simply called the Vertex Fetcher (VF). It’s the fixed function hardware that was mentioned earlier in the section about push vs pull models. The Vertex Fetcher’s job is to feed the vertex shader, and this can be split into two main functional objectives, transferring the data from the vertex buffer into a format which can be read by the vertex shader, and allocating a handle and special internal memory for the data (more later).


The API specifies a vertex buffer which contains very important things for rendering the scene. This is done with a command that describes properties of the vertex buffer. From the 3DSTATE_VERTEX_BUFFERS documentation:

This structure is used in 3DSTATE_VERTEX_BUFFERS to set the state associated with a VB. The VF function will use this state to determine how/where to extract vertex element data for all vertex elements associated with the VB.

(the actual inline data is defined here: VERTEX_BUFFER_STATE)

In my words, the command specifies a number of buffers in any order which are “bound” during the time of drawing. These binding points are needed by the hardware later when it actually begins to fetch the data for drawing the scene. That is why you may have noticed that a lot of information is actually missing from the command. Here is a diagram which shows what a vertex buffer might look like in memory. This diagram is based upon the green square vertex shader described earlier.




The API also specifies the layout of the vertex buffers. This entails where the actual components are as well as the format of the components. Application programmers can and do use lots of cooky stuff to do amazing things, which is again, thankfully out of scope. As an example, let’s suppose our vertices are comprised of three attributes, a 2D coordinate X,Y; a Normal vector U, V; and a color R, G, B.

Packing Vertex Buffers

Packing Vertex Buffers

In the above picture, the first two have very similar programming. They have the same number of VERTEX_ELEMENTS. The third case is a bit weird even from the API perspective, so just consider the fact that it exists and move on.

Just like the vertex buffer state we initialized in hardware, here too we must initialize the vertex element state.

Vertex Elements

Vertex Elements

I don’t think I’ve posted driver code yet, so here is some. What the code does is loop over every enabled attribute enum gl_vert_attrib (GL defines some and some are generic) and is setting up the command for transferring the data from the vertex buffer to the special memory we’ll talk about later. The hardware will have a similar looking loop (the green text in the above diagram) to read these things and to act upon them.

BEGIN_BATCH(1 + nr_elements * 2);
OUT_BATCH((_3DSTATE_VERTEX_ELEMENTS << 16) | (2 * nr_elements - 1));
for (unsigned i = 0; i < brw->vb.nr_enabled; i++) {
   struct brw_vertex_element *input = brw->vb.enabled[i];
   uint32_t format = brw_get_vertex_surface_type(brw, input->glarray);
   uint32_t comp0 = BRW_VE1_COMPONENT_STORE_SRC;
   uint32_t comp1 = BRW_VE1_COMPONENT_STORE_SRC;
   uint32_t comp2 = BRW_VE1_COMPONENT_STORE_SRC;
   uint32_t comp3 = BRW_VE1_COMPONENT_STORE_SRC;

   switch (input->glarray->Size) {
   case 0: comp0 = BRW_VE1_COMPONENT_STORE_0;
   case 1: comp1 = BRW_VE1_COMPONENT_STORE_0;
   case 2: comp2 = BRW_VE1_COMPONENT_STORE_0;
   case 3: comp3 = input->glarray->Integer ? BRW_VE1_COMPONENT_STORE_1_INT
                                           : BRW_VE1_COMPONENT_STORE_1_FLT;
   OUT_BATCH((input->buffer << GEN6_VE0_INDEX_SHIFT) |
             GEN6_VE0_VALID |
             (format << BRW_VE0_FORMAT_SHIFT) |
             (input->offset << BRW_VE0_SRC_OFFSET_SHIFT));
             (comp1 << BRW_VE1_COMPONENT_1_SHIFT) |
             (comp2 << BRW_VE1_COMPONENT_2_SHIFT) |
             (comp3 << BRW_VE1_COMPONENT_3_SHIFT));


Once everything is set up, 3DPRIMITIVE tells the hardware to start doing stuff. Again, much of it is driven by the graphics API in use. I don’t think it’s worth going into too much detail on this one, but feel free to ask questions in the comments…


Modern GEN hardware has a dedicated GPU memory which is referred to as L3 cache. I’m not sure there was a more confusing way to name it. I’ll try to make sure I call it GPU L3 which separates it from whatever else may be called L3.6 The GPU L3 is partitioned into several functional parts, the largest of which is the special URB memory.

The Unified Return Buffer (URB) is a general purpose buffer used for sending data between different threads, and, in some cases, between threads and fixed function units (or vice versa).

The Thread Dispatcher (TD) is the main source of URB reads. As a part of spawning a thread, pipeline fixed functions provide the TD with a number of URB handles, read offsets, and lengths. The TD reads the specified data from the URB and provide that data in the thread payload pre loaded into GRF registers.

I’ll explain that quote in finer detail in the next section. For now, simply recall from the first diagram that we have some input which goes into some geometry part of the pipeline (starting with vertex shading), followed by rasterization (and other fixed function things), followed by fragment shading. The URB is what is containing the data of this geometry as it makes its way through the various stages within that geometry part of this pipeline. The fixed function part of the hardware will consume the data from the URB at which point it can be reused for the next set of geometry.

To summarize, the data going through the pipeline may exist in 3 types of memory depending on the stage and the hardware:

  • Graphics memory (GDDR on discrete cards, or system memory on integrated graphics).
  • The URB.
  • The Execution Unit’s register file

URB space must be divided among the geometry shading stages. To do this, we have state commands like 3DSTATE_URB_VS. Allocation granularity is in 512b chunks which one might make an educated guess is based on the cacheline size in the system (cacheline granularity tends to be the norm for internal clients of memory arbiters can request).

Boiling down the URB allocation in mesa for the VS, you get an admittedly difficult to understand thing.

unsigned vs_size = MAX2(brw->vs.prog_data->base.urb_entry_size, 1);
unsigned vs_entry_size_bytes = vs_size * 64;
unsigned vs_wants =
   ALIGN(brw->urb.max_vs_entries * vs_entry_size_bytes,
         chunk_size_bytes) / chunk_size_bytes - vs_chunks;

The vs_entry_size_bytes is the size of every “packet” which is delivered to the VS shader thread with the push model of execution (keep this in mind for the next section).

count = _mesa_bitcount_64(vs_prog_data->inputs_read);
unsigned vue_entries =
   MAX2(count, vs_prog_data->base.vue_map.num_slots);

vs_prog_data->base.urb_entry_size = ALIGN(vue_entries, 4) / 4;

Above, count is the number of inputs for the vertex shader, and let’s defer a bit on num_slots. Ultimately, we get a count of VUE entries.

WTF is a VUE

In general, vertex data is stored in Vertex URB Entries (VUEs) in the URB, processed by CLIP threads, and only referenced by the pipeline stages indirectly via VUE handles.

The Vertex Fetcher and the Thread Dispatcher will work in tandem to get a Vertex Shader thread running with all the correct data in place. A VUE handle is created for some portion of the URB with the specified size at the time the Vertex Fetcher acts upon a 3DSTATE_VERTEX_ELEMENT. The Vertex Fetcher will populate some part of the VUE based on the contents of the vertex buffer as well as things like adding the handle it generated, and format conversion. That data is referred to as the VS thread payload, and the functionality is referred to as Input Assembly (The documentation there is really good, you should read that page). It’s probably been too long without a picture. The following picture deviates from our green square example in case you are keeping a close eye.

VF create VUEs

VF create VUEs

There are a few things you should notice there. The VF can only write a full vec4 into the URB. In the example there are 2 vec2s (a position, and a normal), but the VF must populate the remaining 2 elements. We briefly saw some of the VERTEX_ELEMENT fields earlier but I didn’t explain them. The STORE_SRC instructs the VF to copy, or convert the source data from the vertex buffer and write it into the VUE. STORE_0 and STORE_1 do not read from the vertex buffer but they do write in a 0, and 1 respectively into the VUE. The terminology that seems to be the most prevalent for the vec4 of data in the VUE is “row”. ie. We’ve used 2 rows per VUE for input, row 0 is the position, row 1 is the normal. The images depict a VUE as an array of vec4 rows.

Here is the second half of the action where the VS actually makes the vertex data available to the EU by loading the register file.

SIMD8 VS Thread Dispatch

SIMD8 VS Thread Dispatch

A VUE is allocated per vertex, and one handle is allocated for each as well. That happens once (it’s more complicated with HS, DS, and GS, but I don’t claim to understand that, so I’ll skip it for now). That means that as the vertex attributes are getting mutated along the GPU pipeline the output is going to have to be written back into the same VUE space that the input came from. The other constraint is we need to actually setup the VUEs so they can be used by the fixed function pipeline later. This may be deferred until a later stage if we’re using other geometry shading stages, or perhaps ignored entirely for streamout. To sum up there are two requirements we have to meet:

  1. The possibly modified vertex data for a given VUE must be written back to the same VUE.
  2. We must build the VUE header, also in place of the old VUE.7

#2 seems pretty straight forward once we find the definition of a VUE header. That should also clear up why we had a function doing MAX2 to determine the VUE size in the previous chapter – we need the max of [the inputs, and outputs with a VUE header]. You don’t have to bother looking at VUE header definition if you don’t want, for our example we just need to know that we need to put an X, Y, Z, W in DWord 4-7 of the header. But how are we going to get the data back into the VUE… And now finally we can explain that 9th register (and mov) I asked you to ignore earlier. Give yourself a pat on the back if you still remember that.

Well we didn’t allocate the handles or the space in the URB for the VUEs. The Vertex Fetcher hardware did that. So we can’t possibly know where they are. But wait! VS Thread Payload for SIMD8 dispatch does have some interesting things in it. Here’s my simplified version of the payload from the table in the docs:

3g0.3Scratch Space
Sampler State Pointer
4g0.4Binding Table Pointer
5g0.5Scratch Offset
6.-7g0.6Don't Care
8-15g1.0Vertex 0-7 Return Handles
16-23 (optional)g2.0 (optional)Constant Data
16-23 (pushed up for constant data)g2.0Vertex Data

Oh. g1 contains the “return” handles for the VUEs. In other words, the thing to reference the location of the output. Let’s go back to the original VS example now

mov(8)          g119<1>UD       g1<8,8,1>UD                     { align1 WE_all 1Q compacted };
mov(8)          g120<1>F        g2<8,8,1>F                      { align1 1Q compacted };
mov(8)          g121<1>F        g3<8,8,1>F                      { align1 1Q compacted };
mov(8)          g122<1>F        g4<8,8,1>F                      { align1 1Q compacted };
mov(8)          g123<1>F        g5<8,8,1>F                      { align1 1Q compacted };
mov(8)          g124<1>F        [0F, 0F, 0F, 0F]VF              { align1 1Q compacted };
mov(8)          g125<1>F        1F                              { align1 1Q };
mov(8)          g126<1>F        [0F, 0F, 0F, 0F]VF              { align1 1Q compacted };
mov(8)          g127<1>F        1F                              { align1 1Q };
send(8)         null            g119<8,8,1>F
                            urb 1 SIMD8 write mlen 9 rlen 0                 { align1 1Q EOT };

With what we now know, we should be able to figure out what this means. Recall that we had 4 vertices in the green square. The destination of the mov instruction is the first register, so our chunk of data to be sent in the URB write message is:

g119 <= URB return handles
g120 <= X X X X ? ? ? ?
g121 <= Y Y Y Y ? ? ? ?
g122 <= Z Z Z Z ? ? ? ?
g123 <= W W W W ? ? ? ?
g124 <= 0 0 0 0 0 0 0 0
g125 <= 1 1 1 1 1 1 1 1
g126 <= 0 0 0 0 0 0 0 0 
g127 <= 1 1 1 1 1 1 1 1

The send command [message], like the rest of the ISA operates upon a channel. In other words, read the values in column order. Send outputs n (9 in this case) registers worth of data. The URB write is pretty straightforward, it just writes n-1 (8 in this case) elements from the channel consuming the first register as the URB handle. That happens across all 8 columns because we’re using SIMD8 dispatch.
URB return handle, x, y, z, w, 0, 1, 0, 1. So that’s good, but if you are paying very close attention, you are missing one critical detail, and not enough information is provided above to figure it out. I’ll give you a moment to go back in case I got your curiosity……

As we already stated above, DWord 4-7 of the VUE header has to be XYZW. So if you believe what I told you about URB writes, what we have above is going to end up with XYZW in DWORD 0-3, and RGBA in DWord4-7. That means the position will be totally screwed, and the color won’t even get passed down to the next stage (because it will be VUE header data). This is solved with the global offset field in the URB write message. To be honest, I don’t quite understand how the code implements this (it’s definitely in src/mesa/drivers/dri/i965/brw_fs_visitor.cpp). I can go find the references to the URB write messages, and dig around the i965 code to figure out how it works, or you can do that and I’ll do one last diagram. What; diagram? Good choice.

URB writeback from VS

URB writeback from VS


For me, writing blog posts like this is like writing a big patch series. You write and correct so many times that you become blind to any errors. WordPress says I am closing in on my 300th revision (and that of course excludes the diagrams). Please do let me know if you notice any errors.

I really wanted to talk about a few other things actually. The way SGVs work for VertexID, InstanceID, and the edge flag. Unfortunately, I think I’ve reached the limit on information that should be in a single blog post. Maybe I’ll do those later or something. Here is the gradient square using VertexID which I was planning to elaborate upon (that is why the first picture had a gradient square):

GLSL >= 1.30

[vertex shader]
varying vec4 color;
void main()
  color = vec4(gl_VertexID == 0, gl_VertexID == 1,
		gl_VertexID == 2, 1);
  gl_Position = gl_Vertex;

[fragment shader]
varying vec4 color;

void main()
  gl_FragColor = color;

draw rect -1 -1 2 2


A great link about the GPU pipeline which I never actually read.


Inkscape SVGs

I think Inkscape might save some extra stuff, so try that if it looks weird in the browser or whatever.
Basic Pipeline:

Push vs. Pull:

Pushed Attributes:

Vertex Buffers and Vertex Elements:

Attribute Packing:

VUE Life Cycle:

Download PDF

  1. I spent a significant amount of time and effort writing about the post about the GLSL compiler in mesa with a focus on the i965 backend. If you’ve read my blog posts, hopefully you see that I try to include real spec reference and code snippets (and pretty pictures). I started work on that before NIR was fully baked, and before certain people essentially rewrote the whole freaking driver. Needless to say, by the time I had a decent post going, everything had changed. Sadly it would have been great to document that part of the driver, but I discarded that post. If there is any interest in me posting what I had written/drawn, maybe I’d be willing to post what I had done in its current state. 

  2. I find the word “pipeline” a bit misleading here, but maybe that’s just me. It’s not specific enough. It seems to imply that a vertex is the smallest atom for which the GPU will operate on, and while that’s true, it ignores the massively parallel nature of the beast. There are other oddities as well, like streamout 

  3. The decoding within libdrm is quite old and crufty. Most, if not everything in this post was hand decoded via the specifications for the commands in the public documentation 

  4. To dump the generated vertex shader and fragment shader in native code (with some amount of the IR), you similarly add fs, and vs. To dump all three things at once for example:

  5. The layout is RGBA, ie. completely opaque green value. 

  6. Originally, and in older public docs it is sometimes referred to as Mid Level Cache (MLC), and I believe they are exactly the same thing. MLC was the level before Last Level Cache (LLC), which made sense, until Baytrail which had a GPU L3, but did not have an LLC. 

  7. You could get clever, or buggy, depending on which way you look at it and actually write into a different VUE. The ability is there, but it’s not often (if ever) used 

September 23, 2015
As I'm known to do, a focus on the little things I worked on during the just released GNOME 3.18 development cycle.

Hardware support

The accelerometer support in GNOME now uses iio-sensor-proxy. This daemon also now supports ambient light sensors, which Richard used to implement the automatic brightness adjustment, and compasses, which are used in GeoClue and gnome-maps.

In kernel-land, I've fixed the detection of some Bosch accelerometers, added support for another Kyonix one, as used in some tablets.

I've also added quirks for out-of-the-box touchscreen support on some cheaper tablets using the goodix driver, and started reviewing a number of patches for that same touchscreen.

With Larry Finger, of Realtek kernel drivers fame, we've carried on cleaning up the Realtek 8723BS driver used in the majority of Windows-compatible tablets, in the Endless computer, and even in the $9 C.H.I.P. Linux computer.

Bluetooth UI changes

The Bluetooth panel now has better « empty states », explaining how to get Bluetooth working again when a hardware killswitch is used, or it's been turned off by hand. We've also made receiving files through OBEX Push easier, and builtin to the Bluetooth panel, so that you won't forget to turn it off when done, and won't have trouble finding it, as is the case for settings that aren't used often.


GNOME Videos has seen some work, mostly in the stabilisation, and bug fixing department, most of those fixes were also landed in the 3.16 version.

We've also been laying the groundwork in grilo for writing ever less code in C for plugin sources. Grilo Lua plugins can now use gnome-online-accounts to access keys for specific accounts, which we've used to re-implement the Pocket videos plugin, as well as the cover art plugin.

All those changes should allow implementing OwnCloud support in gnome-music in GNOME 3.20.

My favourite GNOME 3.18 features

You can call them features, or bug fixes, but the overall improvements in the Wayland and touchpad/touchscreen support are pretty exciting. Do try it out when you get a GNOME 3.18 installation, and file bugs, it's coming soon!

Talking of bug fixes, this one means that I don't need to put in my password by hand when I want to access work related resources. Connect to the VPN, and I'm authenticated to Kerberos.

I've also got a particular attachment to the GeoClue GPS support through phones. This allows us to have more accurate geolocation support than any desktop environments around.

A few for later

The LibreOfficeKit support that will be coming to gnome-documents will help us get support for EPubs in gnome-books, as it will make it easier to plug in previewers other than the Evince widget.

Victor Toso has also been working through my Grilo bugs to allow us to implement a preview page when opening videos. Work has already started on that, so fingers crossed for GNOME 3.20!
September 22, 2015

Only 14 tickets still available!

systemd.conf 2015 is close to being sold out, there are only 14 tickets left now. If you haven't bought your ticket yet, now is the time to do it, because otherwise it will be too late and all tickets will be gone!

Why attend? At this conference you'll get to meet everybody who is involved with the systemd project and learn what they are working on, and where the project will go next. You'll hear from major users and projects working with systemd. It's the primary forum where you can make yourself heard and get first hand access to everybody who's working on the future of the core Linux userspace!

To get an idea about the schedule, please consult our preliminary schedule.

In order to register for the conference, please visit the registration page.

We are still looking for sponsors. If you'd like to join the ranks of systemd.conf 2015 sponsors, please have a look at our Becoming a Sponsor page!

For further details about systemd.conf consult the conference website.

September 18, 2015
I've done a talk at XDC 2015 about atomic modesetting with a focus for driver writers. Most of the talk is an overview of how an atomic modeset looks and how to implement the different parts in a driver backend. Anyway, for all those who missed it, there's a video and slides.
September 17, 2015

A few days ago, the French equivalent of Hacker News, called "Le Journal du Hacker", interviewed me about my work on OpenStack, my job at Red Hat and my self-published book The Hacker's Guide to Python. I've spent some time translating it into English so you can read it if you don't understand French! I hope you'll enjoy it.

Hi Julien, and thanks for participating in this interview for the Journal du Hacker. For our readers who don't know you, can you introduce you briefly?

You're welcome! My name is Julien, I'm 31 years old, and I live in Paris. I now have been developing free software for around fifteen years. I had the pleasure to work (among other things) on Debian, Emacs and awesome these last years, and more recently on OpenStack. Since a few months now, I work at Red Hat, as a Principal Software Engineer on OpenStack. I am in charge of doing upstream development for that cloud-computing platform, mainly around the Ceilometer, Aodh and Gnocchi projects.

Being myself a system architect, I follow your work in OpenStack since a while. It's uncommon to have the point of view of someone as implied as you are. Can you give us a summary of the state of the project, and then detail your activities in this project?

The OpenStack project has grown and changed a lot since I started 4 years ago. It started as a few projects providing the basics, like Nova (compute), Swift (object storage), Cinder (volume), Keystone (identity) or even Neutron (network) who are basis for a cloud-computing platform, and finally became composed of a lot more projects.

For a while, the inclusion of projects was the subject of a strict review from the technical committee. But since a few months, the rules have been relaxed, and we see a lot more projects connected to cloud-computing joining us.

As far as I'm concerned, I've started with a few others people the Ceilometer project in 2012, devoted to handling metrics of OpenStack platforms. Our goal is to be able to collect all the metrics and record them to analyze them later. We also have a module providing the ability to trigger actions on threshold crossing (alarm).

The project grew in a monolithic way, and in a linear way for the number of contributors, during the first two years. I was the PTL (Project Technical Leader) for a year. This leader position asks for a lot of time for bureaucratic things and people management, so I decided to leave my spot in order to be able to spend more time solving the technical challenges that Ceilometer offered.

I've started the Gnocchi project in 2014. The first stable version (1.0.0) was released a few months ago. It's a timeseries database offering a REST API and a strong ability to scale. It was a necessary development to solve the problems tied to the large amount of metrics created by a cloud-computing platform, where tens of thousands of virtual machines have to be metered as often as possible. This project works as a standalone deployment or with the rest of OpenStack.

More recently, I've started Aodh, the result of moving out the code and features of Ceilometer related to threshold action triggering (alarming). That's the logical suite to what we started with Gnocchi. It means Ceilometer is to be split into independent modules that can work together – with or without OpenStack. It seems to me that the features provided by Ceilometer, Aodh and Gnocchi can also be interesting for operators running more classical infrastructures. That's why I've pushed the projects into that direction, and also to have a more service-oriented architecture (SOA)

I'd like to stop for a moment on Ceilometer. I think that this solution was very expected, especially by the cloud-computing providers using OpenStack for billing resources sold to their customers. I remember reading a blog post where you were talking about the high-speed construction of this brick, and features that were not supposed to be there. Nowadays, with Gnocchi and Aodh, what is the quality of the brick Ceilometer and the programs it relies on?

Indeed, one of the first use-case for Ceilometer was tied to the ability to get metrics to feed a billing tool. That's now a reached goal since we have billing tools for OpenStack using Ceilometer, such as CloudKitty.

However, other use-cases appeared rapidly, such as the ability to trigger alarms. This feature was necessary, for example, to implement the auto-scaling feature that Heat needed. At the time, for technical and political reasons, it was not possible to implement this feature in a new project, and the functionality ended up in Ceilometer, since it was using the metrics collected and stored by Ceilometer itself.

Though, like I said, this feature is now in its own project, Aodh. The alarm feature is used since a few cycles in production, and the Aodh project brings new features on the table. It allows to trigger threshold actions and is one of the few solutions able to work at high scale with several thousands of alarms. It's impossible to make Nagios run with millions of instances to fetch metrics and triggers alarms. Ceilometer and Aodh can do that easily on a few tens of nodes automatically.

On the other side, Ceilometer has been for a long time painted as slow and complicated to use, because its metrics storage system was by default using MongoDB. Clearly, the data structure model picked was not optimal for what the users were doing with the data.

That's why I started Gnocchi last year, which is perfectly designed for this use case. It allows linear access time to metrics (O(1) complexity) and fast access time to the resources data via an index.

Today, with 3 projects having their own perimeter of features defined – and which can work together – Ceilometer, Aodh and Gnocchi finally erased the biggest problems and defects of the initial project.

To end with OpenStack, one last question. You're a Python developer for a long time and a fervent user of software testing and test-driven development. Several of your blogs posts point how important their usage are. Can you tell us more about the usage of tests in OpenStack, and the test prerequisites to contribute to OpenStack?

I don't know any project that is as tested on every layer as OpenStack is. At the start of the project, there was a vague test coverage, made of a few unit tests. For each release, a bunch of new features were provided, and you had to keep your fingers crossed to have them working. That's already almost unacceptable. But the big issue was that there was also a lot of regressions, et things that were working were not anymore. It was often corner cases that developers forgot about that stopped working.

Then the project decided to change its policy and started to refuse all patches – new features or bug fix – that would not implement a minimal set of unit tests, proving the patch would work. Quickly, regressions were history, and the number of bugs largely reduced months after months.

Then came the functional tests, with the Tempest project, which runs a test battery on a complete OpenStack deployment.

OpenStack now possesses a complete test infrastructure, with operators hired full-time to maintain them. The developers have to write the test, and the operators maintain an architecture based on Gerrit, Zuul, and Jenkins, which runs the test battery of each project for each patch sent.

Indeed, for each version of a patch sent, a full OpenStack is deployed into a virtual machine, and a battery of thousands of unit and functional tests is run to check that no regressions are possible.

To contribute to OpenStack, you need to know how to write a unit test – the policy on functional tests is laxer. The tools used are standard Python tools, unittest for the framework and tox to run a virtual environment (venv) and run them.

It's also possible to use DevStack to deploy an OpenStack platform on a virtual machine and run functional tests. However, since the project infrastructure also do that when a patch is submitted, it's not mandatory to do that yourself locally.

The tools and tests you write for OpenStack are written in Python, a language which is very popular today. You seem to like it more than you have to, since you wrote a book about it, The Hacker's Guide to Python, that I really enjoyed. Can you explain what brought you to Python, the main strong points you attribute to this language (quickly) and how you went from developer to author?

I stumbled upon Python by chance, around 2005. I don't remember how I hear about it, but I bought a first book to discover it and started toying with that language. At that time, I didn't find any project to contribute to or to start. My first project with Python was rebuildd for Debian in 2007, a bit later.

I like Python for its simplicity, its object orientation rather clean, its easiness to be deployed and its rich open source ecosystem. Once you get the basics, it's very easy to evolve and to use it for anything, because the ecosystem makes it easy to find libraries to solve any kind of problem.

I became an author by chance, writing blog posts from time to time about Python. I finally realized that after a few years studying Python internals (CPython), I learned a lot of things. While writing a post about the differences between method types in Python – which is still one of the most read post on my blog – I realized that a lot of things that seemed obvious to me where not for other developers.

I wrote that initial post after thousands of hours spent doing code reviews on OpenStack. I, therefore, decided to note all the developers pain points and to write a book about that. A compilation of what years of experience taught me and taught to the other developers I decided to interview in the book.

I've been very interested by the publication of your book, for the subject itself, but also the process you chose. You self-published the book, which seems very relevant nowadays. Is that a choice from the start? Did you look for an editor? Can you tell use more about that?

I've been lucky to find out about others self-published authors, such as Nathan Barry – who even wrote a book on that subject, called Authority. That's what convinced me it was possible and gave me hints for that project.

I've started to write in August 2013, and I ran the firs interviews with other developers at that time. I started to write the table of contents and then filled the pages with what I knew and what I wanted to share. I manage to finish the book around January 2014. The proof-reading took more time than I expected, so the book was only released in March 2014. I wrote a complete report on that on my blog, where I explain the full process in detail, from writing to launching.

I did not look for editors though I've been proposed some. The idea of self-publishing really convince me, so I decided to go on my own, and I have no regret. It's true that you have to wear two hats at the same time and handle a lot more things, but with a minimal audience and some help from the Internet, anything's possible!

I've been reached by two editors since then, a Chinese and Korean one. I gave them rights to translate and publish the books in their countries, so you can buy the Chinese and Korean version of the first edition of the book out there.

Seeing how successful it was, I decided to launch a second edition in Mai 2015, and it's likely that a third edition will be released in 2016.

Nowadays, you work for Red Hat, a company that represents the success of using Free Software as a commercial business model. This company fascinates a lot in our community. What can you say about your employer from your point of view?

It only has been a year since I joined Red Hat (when they bought eNovance), so my experience is quite recent.

Though, Red Hat is really a special company on every level. It's hard to see from the outside how open it is, and how it works. It's really close to and it really looks like an open source project. For more details, you should read The Open Organization, a book wrote by Jim Whitehurst (CEO of Red Hat), which he just published. It describes perfectly how Red Hat works. To summarize, meritocracy and the lack of organization in silos is what makes Red Hat a strong organization and puts them as one of the most innovative company.

In the end, I'm lucky enough to be autonomous for the project I work on with my team around OpenStack, and I can spend 100% working upstream and enhance the Python ecosystem.

September 15, 2015

Many modern mice have the ability to store profiles, customize button mappings and actions and switch between several hardware resolutions. A number of those mice are targeted at gamers, but the features are increasingly common in standard mice. Under Linux, support for these device is spotty, though there are a few projects dedicated to supporting parts of the available device range. [1] [2] [3]

Benjamin Tissoires and I started a new project: libratbag. libratbag is a library to provide a generic interface to these mice,enabling desktop environments to provide configuration tools without having to worry about the device model. As of the time of this writing, we have partial support for the Logitech HID++ 1.0 (G500, G5) and HID++ 2.0 protocols (G303), the Etekcity Scroll Alpha and Roccat Kone XTD. Thomas H. P. Anderson already added the G5, G9 and the M705.

git clone

The internal architecture is fairly simple, behind the library's API we have a couple of protocol-specific drivers that access the mouse. The drivers match a specific product/vendor ID combination and load the data from the device, the library then exports it to the caller as a struct ratbag_device. Each device has at least one profile, each profile has a number of buttons and at least one resolution. Where possible, the resolutions can be queried and set, the buttons likewise can be queried and set for different functions. If the hardware supports it, you can map buttons to other buttons, assign macros, or special functions such as DPI/profile switching. The main goal of libratbag is to unify access to the devices so a configuration application doesn't need different libraries per hardware. Especially short-term, we envision using some of the projects listed above through custom backends.

We're at version 0.1 at the moment, so the API is still subject to change. It looks like this:

#include <libratbag.h>

struct ratbag *ratbag;
struct ratbag_device *device;
struct ratbag_profile *p;
struct ratbag_button *b;
struct ratbag_resolution *r;

ratbag = ratbag_create_context(...);
device = ratbag_device_new_from_udev(ratbag, udev_device);

/* retrieve the first profile */
p = ratbag_device_get_profile(device, 0);

/* retrieve the first resolution setting of the profile */
r = ratbag_profile_get_resolution(p, 0);
printf("The first resolution is: %dpi @ %d Hz\n",


/* retrieve the fourth button */
b = ratbag_profile_get_button(p, 4);

if (ratbag_button_get_action_type(b) == RATBAG_BUTTON_ACTION_TYPE_SPECIAL &&
ratbag_button_get_special(b) == RATBAG_BUTTON_ACTION_SPECIAL_RESOLUTION_UP)
printf("button 4 selects next resolution");


For testing and playing around with libratbag, we have a tool called ratbag-command that exposes most of the library:

$ ratbag-command info /dev/input/event8
Device 'BTL Gaming Mouse'
Capabilities: res profile btn-key btn-macros
Number of buttons: 11
Profiles supported: 5
Profile 0 (active)
0: 800x800dpi @ 500Hz
1: 800x800dpi @ 500Hz (active)
2: 2400x2400dpi @ 500Hz
3: 3200x3200dpi @ 500Hz
4: 4000x4000dpi @ 500Hz
5: 8000x8000dpi @ 500Hz
Button: 0 type left is mapped to 'button 1'
Button: 1 type right is mapped to 'button 2'
Button: 2 type middle is mapped to 'button 3'
Button: 3 type extra (forward) is mapped to 'profile up'
Button: 4 type side (backward) is mapped to 'profile down'
Button: 5 type resolution cycle up is mapped to 'resolution cycle up'
Button: 6 type pinkie is mapped to 'macro "": H↓ H↑ E↓ E↑ L↓ L↑ L↓ L↑ O↓ O↑'
Button: 7 type pinkie2 is mapped to 'macro "foo": F↓ F↑ O↓ O↑ O↓ O↑'
Button: 8 type wheel up is mapped to 'wheel up'
Button: 9 type wheel down is mapped to 'wheel down'
Button: 10 type unknown is mapped to 'none'
Profile 1
And to toggle/query the various settings on the device:

$ ratbag-command dpi set 400 /dev/input/event8
$ ratbag-command profile 1 resolution 3 dpi set 800 /dev/input/event8
$ ratbag-command profile 0 button 4 set action special doubleclick

libratbag is in a very early state of development. There are a bunch of FIXMEs in the code, the hardware support is still spotty and we'll appreciate any help we can get, especially with the hardware driver backends. There's a TODO in the repo for some things that we already know needs changing. Feel free to browse the repo on github and drop us some patches.

Eventually we want this to be integrated into the desktop environments, either in the respective control panels or in a standalone application. libratbag already provides SVGs for some devices we support but we'll need some designer input for the actual application. Again, any help you want to provide here will be much appreciated.

A Preliminary systemd.conf 2015 Schedule is Now Online!

We are happy to announce that an initial, preliminary version of the systemd.conf 2015 schedule is now online! (Please ignore that some rows in the schedule link the same session twice on that page. That's a bug in the web site CMS we are working on to fix.)

We got an overwhelming number of high-quality submissions during the CfP! Because there were so many good talks we really wanted to accept, we decided to do two full days of talks now, leaving one more day for the hackfest and BoFs. We also shortened many of the slots, to make room for more. All in all we now have a schedule packed with fantastic presentations!

The areas covered range from containers, to system provisioning, stateless systems, distributed init systems, the kdbus IPC, control groups, systemd on the desktop, systemd in embedded devices, configuration management and systemd, and systemd in downstream distributions.

We'd like to thank everybody who submited a presentation proposal!

Also, don't forget to register for the conference! Only a limited number of registrations are available due to space constraints! Register here!.

We are still looking for sponsors. If you'd like to join the ranks of systemd.conf 2015 sponsors, please have a look at our Becoming a Sponsor page!

For further details about systemd.conf consult the conference website.

September 14, 2015

We've been hard working with the Gnocchi team these last months to store your metrics, and I guess it's time to show off a bit.

So far Gnocchi offers scalable metric storage and resource indexation, especially for OpenStack cloud – but not only, we're generic. It's cool to store metrics, but it can be even better to have a way to visualize them!


We very soon started to build a little HTML interface. Being REST-friendly guys, we enabled it on the same endpoints that were being used to retrieve information and measures about metric, sending back text/html instead of application/json if you were requesting those pages from a Web browser.

But let's face it: we are back-end developers, we suck at any kind front-end development. CSS, HTML, JavaScript? Bwah! So what we built was a starting point, hoping some magical Web developer would jump in and finish the job.

Obviously it never happened.

Ok, so what's out there?

It turns out there are back-end agnostic solutions out there, and we decided to pick Grafana. Grafana is a complete graphing dashboard solution that can be plugged on top of any back-end. It already supports timeseries databases such as Graphite, InfluxDB and OpenTSDB.

That was largely enough for that my fellow developer Mehdi Abaakouk to jump in and start writing a Gnocchi plugin for Grafana! Consequently, there is now a basic but solid and working back-end for Grafana that lies in the grafana-plugins repository.

With that plugin, you can graph anything that is stored in Gnocchi, from raw metrics to metrics tied to resources. You can use templating, but no annotation yet. The back-end supports Gnocchi with or without Keystone involved, and any type of authentication (basic auth or Keystone token). So yes, it even works if you're not running Gnocchi with the rest of OpenStack.

It also supports advanced queries, so you can search for resources based on some criterion and graphs their metrics.

I want to try it!

If you want to deploy it, all you need to do is to install Grafana and its plugins, and create a new datasource pointing to Gnocchi. It is that simple. There's some CORS middleware configuration involved if you're planning on using Keystone authentication, but it's pretty straightforward – just set the cors.allowed_origin option to the URL of your Grafana dashboard.

We added support of Grafana directly in Gnocchi devstack plugin. If you're running DevStack you can follow the instructions – which are basically adding the line enable_service gnocchi-grafana.

Moving to Grafana core

Mehdi just opened a pull request a few days ago to merge the plugin into Grafana core. It's actually one of the most unit-tested plugin in Grafana so far, so it should be on a good path to be merged in the future and have support of Gnocchi directly into Grafana without any plugin involved.

September 11, 2015

LimbaHubThe Limba project does not only have the goal to allow developers to deploy their applications directly on multiple Linux distributions while reducing duplication of shared resources, it should also make it easy for developers to build software for Limba.

Limba is worth nothing without good tooling to make it fun to use. That’s why I am working on that too, and I want to share some ideas of how things could work in future and which services I would like to have running. I will also show what is working today already (and that’s quite something!). This time I look at things from a developer’s perspective (since the last posts on Limba were more end-user centric). If you read on, you will also find a nice video of the developer workflow 😉

1. Creating metadata and building the software

To make building Limba packages as simple as possible, Limba reuses already existing metadata, like AppStream metadata to find information about the software you want to create your package for.

To ensure upstreams can build their software in a clean environment, Limba makes using one as simple as possible: The limba-build CLI tool creates a clean chroot environment quickly by using an environment created by debootstrap (or a comparable tool suitable for the Linux distribution), and then using OverlayFS to have all changes to the environment done during the build process land in a separate directory.

To define build instructions, limba-build uses the same YAML format TravisCI uses as well for continuous integration. So there is a chance this data is already present as well (if not, it’s trivial to write).

In case upstream projects don’t want to use these tools, e.g. because they have well-working CI already, then all commands needed to build a Limba package can be called individually as well (ideally, building a Limba package is just one call to lipkgen).

I am currently planning “DeveloperIPK” packages containing resources needed to develop against another Limba package. With that in place and integrated with the automatic build-environment creation, upstream developers can be sure the application they just built is built against the right libraries as present in the package they depend on. The build tool could even fetch the build-dependencies automatically from a central repository.

2. Uploading the software to a repository

While everyone can set up their own Limba repository, and the limba-build repo command will help with that, there are lots of benefits in having a central place where upstream developers can upload their software to.

I am currently developing a service like that, called “LimbaHub”. LimbaHub will contain different repositories distributors can make available to their users by default, e.g. there will be one with only free software, and one for proprietary software. It will also later allow upstreams to create private repositories, e.g. for beta-releases.

3. Security in LimbaHub

Every Limba package is signed with they key of its creator anyway, so in order to get a package into LimbaHub, one needs to get their OpenPGP key accepted by the service first.

Additionally, the Hub service works with a per-package permission system. This means I can e.g. allow the Mozilla release team members to upload a package with the component-ID “org.mozilla.firefox.desktop” or even allow those user(s) to “own” the whole org.mozilla.* namespace.

This should prevent people hijacking other people’s uploads accidentally or on purpose.

4. QA with LimbaHub

LimbaHub should also act as guardian over ABI stability and general quality of the software. We could for example warn upstreams that they broke ABI without declaring that in the package information, or even reject the package then. We could validate .desktop files and AppStream metadata, or even check if a package was built using hardening flags.

This should help both developers to improve their software as well as users who benefit from that effort. In case something really bad gets submitted to LimbaHub, we always have the ability to remove the package from the repositories as a last resort (which might trigger Limba to issue a warning for the user that he won’t receive updates anymore).

What works

Limba, LimbaHub and the tools around it are developing nicely, so far no big issues have been encountered yet.

That’s why I made a video showing how Limba and LimbaHub work together at time:

Still, there is a lot of room for improvement – Limba has not yet received enough testing, and LimbaHub is merely a proof-of-concept at time. Also, lots of high-priority features are not yet implemented.

LimbaHub and Limba need help!

At time I am developing LimbaHub and Limba alone with only occasional contributions from others (which are amazing and highly welcomed!). So, if you like Python and Flask, and want to help developing LimbaHub, please contact me – the LimbaHub software could benefit from a more experienced Python web developer than I am 😉 (and maybe having a designer look over the frontend later makes sense as well). If you are not afraid of C and GLib, and like to chase bugs or play with building Limba packages, consider helping Limba development :-)

September 07, 2015
Kernel 4.2 is released already and the 4.3 merge window in full swing, time to look at what's in it for the intel graphics driver.

Biggest thing for sure is that Skylake is finally out of preliminary support and enabled by default. The reason for the long hold-up was some ABI fumble - the hardware exposes the topmost plane both through the new universal plane registers and the legacy cursor registers and because we simply carried the legacy plane code around in the driver we ended up exposing both. This wasn't something big to take care of but somehow was dragged on forever.

The other big thing is that now legacy modesets are done with the new atomic modesetting code driver-internally. Atomic support in i915.ko isn't ready for prime-time yet fully, but this is definitely a big step forward. Besides atomic there's also other cross-platform improvements in the modeset code: Ville fixed up the 12bpc support for HDMI, which is now used by default if the screen supports it. Mika Kahola and Ville also implemented dynamic adjustment of the cdclk, which is the main clock source for display engines on intel graphics. And there's a big difference in the clock speeds needed between e.g. a 4k screen and a 720p TV.

Continuing with power saving features Rodrigo again spent a lot of time fixing up PSR (panel self refresh). And Paulo did the same by writing patches to improve FBC (framebuffer compression). We have some really solid testcases by now, unfortunately neither feature is ready for enabling by default yet. Especially PSR is still plagued by screen freezes on some random systems. Also there's been some fixes to DRRS (dynamic refresh rate switching) from Ramalingam. DRRS is enabled by default already, where supported. And finally some improvements to make the frontbuffer rendering tracking more accurate, which is used by all three of these display power saving features.

And of course there's also tons of improvements to platform code. Display PLL code for Sklylake and Valleyview&Cherryview was tuned by Damien and Ville respectively. There's been tons of work on Broxton and DSI support by Imre, Gaurav and others.

Moving on to the rendering side the big change is how tracking of rendering tasks is handled. In the past the driver just used raw sequence numbers emitted by the hardware, but for cross-driver synchronization and reordering tasks with an eventual gpu scheduler more abstraction is needed. A big step is converting over to the i915 request structure completely, done by John Harrison. The next step will be to switch the internal implementation for i915 requests to the cross-driver fences, but that's for future kernels. As a follow-up cleanup John also removed the OLR, which stands for outstanding lazy request. It was a neat little trick implemented years ago to simplify handling error recovery, but which causes tons of pain with subtle bugs. Making requests more explicit in the driver allowed us to finally remove this trick since.

There's also been a pile of platform related features: MOCS programming for Skylake/Broxton (which is used for caching control). Resource streamer support from Abdiel, which is used to offload some of the buffer object tracking for shaders from the cpu to the gpu. And the command parser on Haswell was extended to support atomic instructions in shaders. And finally for Skylake Mika Kuoppala added code to avoid resetting the gpu - in certain cases the hardware would hard-hang the entire system trying to execute the reset. And a dead gpu is still better than a dead system.

Piwik told me that people are still sharing my post about the state of GNOME-Software and update notifications in Debian Jessie.

So I thought it might be useful to publish a small update on that matter:

  • If you are using GNOME or KDE Plasma with Debian 8 (Jessie), everything is fine – you will receive update notifications through the GNOME-Shell/via g-s-d or Apper respectively. You can perform updates on GNOME with GNOME-PackageKit and with Apper on KDE Plasma.
  • If you are using a desktop-environment not supporting PackageKit directly – for example Xfce, which previously relied on external tools – you might want to try pk-update-icon from jessie-backports. The small GTK+ tool will notify about updates and install them via GNOME-PackageKit, basically doing what GNOME-PackageKit did by itself before the functionality was moved into GNOME-Software.
  • For Debian Stretch, the upcoming release of Debian, we will have gnome-software ready and fully working. However, one of the design decisions of upstream is to only allow offline-updates (= download updates in the background, install on request at next reboot) with GNOME-Software. In case you don’t want to use that, GNOME-PackageKit will still be available, and so are of course all the CLI tools.
  • For KDE Plasma 5 on Debian 9 (Stretch), a nice AppStream based software management solution with a user-friendly updater is also planned and being developed upstream. More information on that will come when there’s something ready to show ;-).

I hope that clarifies things a little. Have fun using Debian 8!


It appears that many people have problems with getting update notifications in GNOME on Jessie. If you are affected by this, please try the following:

  1. Open dconf-editor and navigate to org.gnome.settings-daemon.plugins.updates. Check if the key active is set to true
  2. If that doesn’t help, also check if at org.gnome.settings-daemon.plugins.updates the frequency-refresh-cache value is set to a sane value (e.g. 86400)
  3. Consider increasing the priority value (if it isn’t at 300 already)
  4. If all of that doesn’t help: I guess some internal logic in g-s-d is preventing a cache refresh then (e.g. because I thinks it is on a expensive network connection and therefore doesn’t refresh the cache automatically, or thinks it is running on battery). This is a bug. If that still happens on Debian Stretch with GNOME-Software, please report a bug against the gnome-software package. As a workaround for Jessie you can enable unconditional cache refreshing via the APT cronjob by installing apt-config-auto-update.
September 05, 2015
While trying to autogenerate C interfaces for XKB during the last 2 months it sometimes felt like an attempt to "behead the Hydra": Whenever I fixed a problem, a number of new issues arose (deep from the intestines of the python code generator). At the moment it seems I have reached a state where no new heads emerge, and I even managed to cut off a number of the most ugliest...

The cause of most (all?) troubles is a very basic way of distinguishing data types into fixed size (everything you can feed to sizeof), variable size data types (lists, valueparams, ...) and padding. In the X protocol fixed size fields are normally grouped together, so e.g. in a request all fixed size fields are collected before any variable size fields.

XCB, whose purpose it is to translate between X server and client, takes advantage of this ordering: Several grouped fixed size fields are conveniently mapped to a C struct, so it is fairly easy to deal with. The treatment of variable size data is more difficult and involves the use of (autogenerated) accessors and iterators. Also the specific number of elements in a variable size data type can depend on expressions that are specified in the protocol, but need to be evaluated at runtime.
Now XKB breaks with the "pure doctrine" of the X core protocol: Fixed and variable size fields are intermixed in several cases and new expressions are introduced. Further problematic cases include a variable size union, lists with a variable number of variable size elements, ... And finally, the keyboard extension defines a new datatype (ListOfItems), which is referred to as 'switch' in XCB.

Switch is a kind of list, but the 'items' are not necessarily of the same type, and whether an item is included or not depends on expressions that have to be evaluated for each item separately. Defining a C mapping for 'switch' was one of the main goals of my work, and it turned out that a set of new functions was needed. These new functions must either 'serialize' a switch into a buffer, depending on the concrete switch conditions, or 'unserialize' it to some C struct. As 'switch' can contain any other data type valid in the X protocol (which means especially that it also can contain other switches...), 'serialize'/'unserialize' had to be defined for all those data types.
Once I had the autogenerator spill out '_serialize()' and '_unserialize()', it turned out they can be used to deal with some other special cases as well - most notably the (above-mentioned) intermixing of fixed and variable size fields.

But probably the most appealing feature of the serializers is that they are invisible to anyone who wants to use XCB as they get called automatically in the autogenerated helper functions.

A last note on the current status:
+ xkb.c (autogenerated from xkb.xml) compiles!
+ the request side should be finished (more tests needed however)
+ simple XKB test programs run
- on the reply side, special cases still need special treatment
- _unserialize() should be renamed to _sizeof() in some cases
- some special cases also need special accessors

The code is currently hosted on annarchy, but I will export the repos shortly.
September 04, 2015

Continuing my post series on the tools I use these days in Python, this time I would like to talk about a library I really like, named voluptuous.

It's no secret that most of the time, when a program receives data from the outside, it's a big deal to handle it. Indeed, most of the time your program has no guarantee that the stream is valid and that it contains what is expected.

The robustness principle says you should be liberal in what you accept, though that is not always a good idea neither. Whatever policy you chose, you need to process those data and implement a policy that will work – lax or not.

That means that the program need to look into the data received, check that it finds everything it needs, complete what might be missing (e.g. set some default), transform some data, and maybe reject those data in the end.

Data validation

The first step is to validate the data, which means checking all the fields are there and all the types are right or understandable (parseable). Voluptuous provides a single interface for all that called a Schema.

>>> from voluptuous import Schema
>>> s = Schema({
... 'q': str,
... 'per_page': int,
... 'page': int,
... })
>>> s({"q": "hello"})
{'q': 'hello'}
>>> s({"q": "hello", "page": "world"})
voluptuous.MultipleInvalid: expected int for dictionary value @ data['page']
>>> s({"q": "hello", "unknown": "key"})
voluptuous.MultipleInvalid: extra keys not allowed @ data['unknown']

The argument to voluptuous.Schema should be the data structure that you expect. Voluptuous accepts any kind of data structure, so it could also be a simple string or an array of dict of array of integer. You get it. Here it's a dict with a few keys that if present should be validated as certain types. By default, Voluptuous does not raise an error if some keys are missing. However, it is invalid to have extra keys in a dict by default. If you want to allow extra keys, it is possible to specify it.

>>> from voluptuous import Schema
>>> s = Schema({"foo": str}, extra=True)
>>> s({"bar": 2})
{"bar": 2}

It is also possible to make some keys mandatory.

>>> from voluptuous import Schema, Required
>>> s = Schema({Required("foo"): str})
>>> s({})
voluptuous.MultipleInvalid: required key not provided @ data['foo']

You can create custom data type very easily. Voluptuous data types are actually just functions that are called with one argument, the value, and that should either return the value or raise an Invalid or ValueError exception.

>>> from voluptuous import Schema, Invalid
>>> def StringWithLength5(value):
... if isinstance(value, str) and len(value) == 5:
... return value
... raise Invalid("Not a string with 5 chars")
>>> s = Schema(StringWithLength5)
>>> s("hello")
>>> s("hello world")
voluptuous.MultipleInvalid: Not a string with 5 chars

Most of the time though, there is no need to create your own data types. Voluptuous provides logical operators that can, combined with a few others provided primitives such as voluptuous.Length or voluptuous.Range, create a large range of validation scheme.

>>> from voluptuous import Schema, Length, All
>>> s = Schema(All(str, Length(min=3, max=5)))
>>> s("hello")
>>> s("hello world")
voluptuous.MultipleInvalid: length of value must be at most 5

The voluptuous documentation has a good set of examples that you can check to have a good overview of what you can do.

Data transformation

What's important to remember, is that each data type that you use is a function that is called and returns a value, if the value is considered valid. That value returned is what is actually used and returned after the schema validation:

>>> import uuid
>>> from voluptuous import Schema
>>> def UUID(value):
... return uuid.UUID(value)
>>> s = Schema({"foo": UUID})
>>> data_converted = s({"foo": "uuid?"})
voluptuous.MultipleInvalid: not a valid value for dictionary value @ data['foo']
>>> data_converted = s({"foo": "8B7BA51C-DFF5-45DD-B28C-6911A2317D1D"})
>>> data_converted
{'foo': UUID('8b7ba51c-dff5-45dd-b28c-6911a2317d1d')}

By defining a custom UUID function that converts a value to a UUID, the schema converts the string passed in the data to a Python UUID object – validating the format at the same time.

Note a little trick here: it's not possible to use directly uuid.UUID in the schema, otherwise Voluptuous would check that the data is actually an instance of uuid.UUID:

>>> from voluptuous import Schema
>>> s = Schema({"foo": uuid.UUID})
>>> s({"foo": "8B7BA51C-DFF5-45DD-B28C-6911A2317D1D"})
voluptuous.MultipleInvalid: expected UUID for dictionary value @ data['foo']
>>> s({"foo": uuid.uuid4()})
{'foo': UUID('60b6d6c4-e719-47a7-8e2e-b4a4a30631ed')}

And that's not what is wanted here.

That mechanism is really neat to transform, for example, strings to timestamps.

>>> import datetime
>>> from voluptuous import Schema
>>> def Timestamp(value):
... return datetime.datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
>>> s = Schema({"foo": Timestamp})
>>> s({"foo": '2015-03-03T12:12:12'})
{'foo': datetime.datetime(2015, 3, 3, 12, 12, 12)}
>>> s({"foo": '2015-03-03T12:12'})
voluptuous.MultipleInvalid: not a valid value for dictionary value @ data['foo']

Recursive schemas

So far, Voluptuous has one limitation so far: the ability to have recursive schemas. The simplest way to circumvent it is by using another function as an indirection.

>>> from voluptuous import Schema, Any
>>> def _MySchema(value):
... return MySchema(value)
>>> from voluptuous import Any
>>> MySchema = Schema({"foo": Any("bar", _MySchema)})
>>> MySchema({"foo": {"foo": "bar"}})
{'foo': {'foo': 'bar'}}
>>> MySchema({"foo": {"foo": "baz"}})
voluptuous.MultipleInvalid: not a valid value for dictionary value @ data['foo']['foo']

Usage in REST API

I started to use Voluptuous to validate data in a the REST API provided by Gnocchi. So far it has been a really good tool, and we've been able to create a complete REST API that is very easy to validate on the server side. I would definitely recommend it for that. It blends with any Web framework easily.

One of the upside compared to solution like JSON Schema, is the ability to create or re-use your own custom data types while converting values at validation time. It is also very Pythonic, and extensible – it's pretty great to use for all of that. It's also not tied to any serialization format.

On the other hand, JSON Schema is language agnostic and is serializable itself as JSON. That makes it easy to be exported and provided to a consumer so it can understand the API and validate the data potentially on its side.

August 29, 2015


GPU mirroring provides a mechanism to have the CPU and the GPU use the same virtual address for the same physical (or IOMMU) page. An immediate result of this is that relocations can be eliminated. There are a few derivative benefits from the removal of the relocation mechanism, but it really all boils down to that. Other people call it other things, but I chose this name before I had heard other names. SVM would probably have been a better name had I read the OCL spec sooner. This is not an exclusive feature restricted to OpenCL. Any GPU client will hopefully eventually have this capability provided to them.

If you’re going to read any single PPGTT post of this series, I think it should not be this one. I was not sure I’d write this post when I started documenting the PPGTT (part 1, part2, part3). I had hoped that any of the following things would have solidified the decision by the time I completed part3.

  1. CODE: The code is not not merged, not reviewed, and not tested (by anyone but me). There’s no indication about the “upstreamability”. What this means is that if you read my blog to understand how the i915 driver currently works, you’ll be taking a crap-shoot on this one.
  2. DOCS: The Broadwell public Programmer Reference Manuals are not available. I can’t refer to them directly, I can only refer to the code.
  3. PRODUCT: Broadwell has not yet shipped. My ulterior motive had always been to rally the masses to test the code. Without product, that isn’t possible.

Concomitant with these facts, my memory of the code and interesting parts of the hardware it utilizes continues to degrade. Ultimately, I decided to write down what I can while it’s still fresh (for some very warped definition of “fresh”).


GPU mirroring is the goal. Dynamic page table allocations are very valuable by itself. Using dynamic page table allocations can dramatically conserve system memory when running with multiple address spaces (part 3 if you forgot), which is something which should become pretty common shortly. Consider for a moment a Broadwell legacy 32b system (more details later). TYou would require about 8MB for page tables to map one page of system memory. With the dynamic page table allocations, this would be reduced to 8K. Dynamic page table allocations are also an indirect requirement for implementing a 64b virtual address space. Having a 64b virtual address space is a pretty unremarkable feature by itself. On current workloads [that I am aware of] it provides no real benefit. Supporting 64b did require cleaning up the infrastructure code quite a bit though and should anything from the series get merged, and I believe the result is a huge improvement in code readability.

Current Status

I briefly mentioned dogfooding these several months ago. At that time I only had the dynamic page table allocations on GEN7 working. The fallout wasn’t nearly as bad as I was expecting, but things were far from stable. There was a second posting which is much more stable and contains support of everything through Broadwell. To summarize:

Feature Status TODO
Dynamic page tables Implemented Test and fix bugs
64b Address space Implemented Test and fix bugs
GPU mirroring Proof of Concept Decide on interface; Implement interface.1

Testing has been limited to just one machine, mine, when I don’t have a million other things to do. With that caveat, on top of my last PPGTT stabilization patches things look pretty stable.

Present: Relocations

Throughout many of my previous blog posts I’ve gone out of the way to avoid explaining relocations. My reluctance was because explaining the mechanics is quite tedious, not because it is a difficult concept. It’s impossible [and extremely unfortunate for my weekend] to make the case for why these new PPGTT features are cool without touching on relocations at least a little bit. The following picture exemplifies both the CPU and GPU mapping the same pages with the current relocation mechanism.

Current PPGTT support

Current PPGTT support

To get to the above state, something like the following would happen.

  1. Create BOx
  2. Create BOy
  3. Request BOx be uncached via (IOCTL DRM_IOCTL_I915_GEM_SET_CACHING).
  4. Do one of aforementioned operations on BOx and BOy
  5. Perform execbuf2.

Accesses to the BO from the CPU require having a CPU virtual address that eventually points to the pages representing the BO2. The GPU has no notion of CPU virtual addresses (unless you have a bug in your code). Inevitably, all the GPU really cares about is physical pages; which ones. On the other hand, userspace needs to build up a set of GPU commands which sometimes need to be aware of the absolute graphics address.

Several commands do not need an absolute address. 3DSTATE_VS for instance does not need to know anything about where Scratch Space Base Offset
is actually located. It needs to provide an offset to the General State Base Address. The General State Base Address does need to be known by userspace:

Using the relocation mechanism gives userspace a way to inform the i915 driver about the BOs which needs an absolute address. The handles plus some information about the GPU commands that need absolute graphics addresses are submitted at execbuf time. The kernel will make a GPU mapping for all the pages that constitute the BO, process the list of GPU commands needing update, and finally submit the work to the GPU.

Future: No relocations

GPU Mirroring

GPU Mirroring

The diagram above demonstrates the goal. Symmetric mappings to a BO on both the GPU and the CPU. There are benefits for ditching relocations. One of the nice side effects of getting rid of relocations is it allows us to drop the use of the DRM memory manager and simply rely on malloc as the address space allocator. The DRM memory allocator does not get the same amount of attention with regard to performance as malloc does. Even if it did perform as ideally as possible, it’s still a superfluous CPU workload. Other people can probably explain the CPU overhead in better detail. Oh, and OpenCL 2.0 requires it.

"OpenCL 2.0 adds support for shared virtual memory (a.k.a. SVM). SVM allows the host and 
kernels executing on devices to directly share complex, pointer-containing data structures such 
as trees and linked lists. It also eliminates the need to marshal data between the host and devices. 
As a result, SVM substantially simplifies OpenCL programming and may improve performance."

Makin’ it Happen


As I’ve already mentioned, the most obvious requirement is expanding the GPU address space to match the CPU.

Page Table Hierarchy

Page Table Hierarchy

If you have taken any sort of Operating Systems class, or read up on Linux MM within the last 10 years or so, the above drawing should be incredibly unremarkable. If you have not, you’re probably left with a big ‘WTF’ face. I probably can’t help you if you’re in the latter group, but I do sympathize. For the other camp: Broadwell brought 4 level page tables that work exactly how you’d expect them to. Instead of the x86 CPU’s CR3, GEN GPUs have PML4. When operating in legacy 32b mode, there are 4 PDP registers that each point to a page directory and therefore map 4GB of address space3. The register is just a simple logical address pointing to a page directory. The actual changes in hardware interactions are trivial on top of all the existing PPGTT work.

The keen observer will notice that there are only 256 PML4 entries. This has to do with the way in which we've come about 64b addressing in x86. <a href="" target="_blank">This wikipedia article</a> explains it pretty well, and has links.

“This will take one week. I can just allocate everything up front.” (Dynamic Page Table Allocation)

Funny story. I was asked to estimate how long it would take me to get this GPU mirror stuff in shape for a very rough proof of concept. “One week. I can just allocate everything up front.” If what I have is, “done” then I was off by 10x.

Where I went wrong in my estimate was math. If you consider the above, you quickly see why allocating everything up front is a terrible idea and flat out impossible on some systems.

Page for the PML4
512 PDP pages per PML4 (512, ok we actually use 256)
512 PD pages per PDP (256 * 512 pages for PDs)
512 PT pages per PD (256 * 512 * 512 pages for PTs)
(256 * 512^2 + 256 * 512 + 256 + 1) * PAGE_SIZE = ~256G = oops

Dissimilarities to x86

First and foremost, there are no GPU page faults to speak of. We cannot demand allocate anything in the traditional sense. I was naive though, and one of the first thoughts I had was: the Linux kernel [heck, just about everything that calls itself an OS] manages 4 level pages tables on multiple architectures. The page table format on Broadwell is remarkably similar to x86 page tables. If I can’t use the code directly, surely I can copy. Wrong.

Here is some code from the Linux kernel which demonstrates how you can get a PTE for a given address in Linux.

typedef unsigned long   pteval_t;
typedef struct { pteval_t pte; } pte_t;

static inline pteval_t native_pte_val(pte_t pte)
        return pte.pte;

static inline pteval_t pte_flags(pte_t pte)
        return native_pte_val(pte) & PTE_FLAGS_MASK;

static inline int pte_present(pte_t a)
        return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE |
static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address)
        return (pte_t *)pmd_page_vaddr(*pmd) + pte_index(address);
#define pte_offset_map(dir, address) pte_offset_kernel((dir), (address))

#define pgd_offset(mm, address) ( (mm)->pgd + pgd_index((address)))
static inline pud_t *pud_offset(pgd_t *pgd, unsigned long address)
        return (pud_t *)pgd_page_vaddr(*pgd) + pud_index(address);
static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
        return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);

/* My completely fabricated example of finding page presence */
pgd_t *pgd;
pud_t *pud;
pmd_t *pmd;
pte_t *ptep;
struct mm_struct *mm = current->mm;
unsigned long address = 0xdefeca7e;

pgd = pgd_offset(mm, address);
pud = pud_offset(pgd, address);
pmd = pmd_offset(pud, address);
ptep = pte_offset_map(pmd, address);
printk("Page is present: %s\n", pte_present(*ptep) ? "yes" : "no");

X86 page table code has a two very distinct property that does not exist here (warning, this is slightly hand-wavy).

  1. The kernel knows exactly where in physical memory the page tables reside4. On x86, it need only read CR3. We don’t know where our pages tables reside in physical memory because of the IOMMU. When VT-d is enabled, the i915 driver only knows the DMA address of the page tables.
  2. There is a strong correlation between a CPU process and an mm (set of page tables). Keeping mappings around of the page tables is easy to do if you don’t want to take the hit to map them every time you need to look at a PTE.

If the Linux kernel needs to find if a page is present or not without taking a fault, it need only look to one of those two options. After about of week of making the IOMMU driver do things it shouldn’t do, and trying to push the square block through the round hole, I gave up on reusing the x86 code.

Why Do We Actually Need Page Table Tracking?

The IOMMU interfaces were not designed to pull a physical address from a DMA address. Pre-allocation is right out. It’s difficult to try to get the instantaneous state of the page tables…

Another thought I had very early on was that tracking could be avoided if we just never tore down page tables. I knew this wasn’t a good solution, but at that time I just wanted to get the thing working and didn’t really care if things blew up spectacularly after running for a few minutes. There is actually a really easy set of operations that show why this won’t work. For the following, think of the four level page tables as arrays. ie.

  • PML4[0-255], each point to a PDP
  • PDP[0-255][0-511], each point to a PD
  • PD[0-255][0-511][0-511], each point to a PT
  • PT[0-255][0-511][0-511][0-511] (where PT[0][0][0][0][0] is the 0th PTE in the system)
  1. [mesa] Create a 2M sized BO. Write to it. Submit it via execbuffer
  2. [i915] See new BO in the execbuffer list. Allocate page tables for it…
    1. [DRM]Find that address 0 is free.
    2. [i915]Allocate PDP for PML4[0]
    3. [i915]Allocate PD for PDP[0][0]
    4. [i915]Allocate PT for PD[0][0][0]/li>
    5. [i915](condensed)Set pointers from PML4->PDP->PD->PT
    6. [i915]Set the 512 PTEs PT[0][0][0][0][511-0] to point to the BO’s backing page.
  3. [i915] Dispatch work to the GPU on behalf of mesa.
  4. [i915] Observe the hardware has completed
  5. [mesa] Create a 4k sized BO. Write to it. Submit both BOs via execbuffer.
  6. [i915] See new BO in the execbuffer list. Allocate page tables for it…
    1. [DRM]Find that address 0x200000 is free.
    2. [i915]Allocate PDP[0][0], PD[0][0][0], PT[0][0][0][1].
    3. Set pointers… Wait. Is PDP[0][0] allocated already? Did we already set pointers? I have no freaking idea!
    4. Abort.

Page Tables Tracking with Bitmaps

Okay. I could have used a sentinel for empty entries. It is possible to achieve this same thing by using a sentinel value (point the page table entry to the scratch page). To implement this involves reading back potentially large amounts of data from the page tables which will be slow. It should work though. I didn’t try it.

After I had determined I couldn’t reuse x86 code, and that I need some way to track which page table elements were allocated, I was pretty set on using bitmaps for tracking usage. The idea of a hash table came and went – none of the upsides of a hash table are useful here, but all of the downsides are present(space). Bitmaps was sort of the default case. Unfortunately though, I did some math at this point, notice the LaTex!.
\frac{2^{47}bytes}{\frac{4096bytes}{1 page}} = 34359738368 pages \\  34359738368 pages \times \frac{1bit}{1page} = 34359738368 bits \\  34359738368 bits \times \frac{8bits}{1byte} = 4294967296 bytes
That’s 4GB simply to track every page. There’s some more overhead because page [tables, directories, directory pointers] are also tracked.
  256entries + (256\times512)entries + (256\times512^2)entries = 67240192entries \\  67240192entries \times \frac{1bit}{1entry} = 67240192bits \\  67240192bits \times \frac{8bits}{1byte} = 8405024bytes \\  4294967296bytes + 8405024bytes = 4303372320bytes \\  4303372320bytes \times \frac{1GB}{1073741824bytes} = 4.0078G

I can’t remember whether I had planned to statically pre-allocate the bitmaps, or I was so caught up in the details and couldn’t see the big picture. I remember thinking, 4GB just for the bitmaps, that will never fly. I probably spent a week trying to figure out a better solution. When we invent time travel, I will go back and talk to my former self: 4GB of bitmap tracking if you’re using 128TB of memory is inconsequential. That is 0.3% of the memory used by the GPU. Hopefully you didn’t fall into that trap, and I just wasted your time, but there it is anyway.

Sample code to walk the page tables

This code does not actually exist, but it is very similar to the real code. The following shows how one would “walk” to a specific address allocating the necessary page tables and setting the bitmaps along the way. Teardown is a bit harder, but it is similar.

static struct i915_pagedirpo *
alloc_one_pdp(struct i915_pml4 *pml4, int entry)

static struct i915_pagedir *
alloc_one_pd(struct i915_pagedirpo *pdp, int entry)

static struct i915_tab *
alloc_one_pt(struct i915_pagedir *pd, int entry)

 * alloc_page_tables - Allocate all page tables for the given virtual address.
 * This will allocate all the necessary page tables to map exactly one page at
 * @address. The page tables will not be connected, and the PTE will not point
 * to a page.
 * @ppgtt:	The PPGTT structure encapsulating the virtual address space.
 * @address:	The virtual address for which we want page tables.
static void
alloc_page_tables(ppgtt, unsigned long address)
	struct i915_pagetab *pt;
	struct i915_pagedir *pd;
	struct i915_pagedirpo *pdp;
	struct i915_pml4 *pml4 = &ppgtt->pml4; /* Always there */

	int pml4e = (address >> GEN8_PML4E_SHIFT) & GEN8_PML4E_MASK;
	int pdpe = (address >> GEN8_PDPE_SHIFT) & GEN8_PDPE_MASK;
	int pde = (address >> GEN8_PDE_SHIFT) & I915_PDE_MASK;
	int pte = (address & I915_PDES_PER_PD);

	if (!test_bit(pml4e, pml4->used_pml4es))
		goto alloc;

	pdp = pml4->pagedirpo[pml4e];
	if (!test_bit(pdpe, pdp->used_pdpes;))
		goto alloc;

	pd = pdp->pagedirs[pdpe];
	if (!test_bit(pde, pd->used_pdes)
		goto alloc;

	pt = pd->page_tables[pde];
	if (test_bit(pte, pt->used_ptes))

	pdp = alloc_one_pdp(pml4, pml4e);
	set_bit(pml4e, pml4->used_pml4es);
	pd = alloc_one_pd(pdp, pdpe);
	set_bit(pdpe, pdp->used_pdpes);
	pt = alloc_one_pt(pd, pde);
	set_bit(pde, pd->used_pdes);

Here is a picture which shows the bitmaps for the 2 allocation example above.

Bitmaps tracking page tables

Bitmaps tracking page tables

The GPU mirroring interface

I really don’t want to spend too much time here. In other words, no more pictures. As I’ve already mentioned, the interface was designed for a proof of concept which already had code using userptr. The shortest path was to simply reuse the interface.

In the patches I’ve submitted, 2 changes were made to the existing userptr interface (which wasn’t then, but is now, merged upstream). I added a context ID, and the flag to specify you want mirroring.

struct drm_i915_gem_userptr {
	__u64 user_ptr;
	__u64 user_size;
	__u32 ctx_id;
	__u32 flags;
#define I915_USERPTR_READ_ONLY          (1<<0)
#define I915_USERPTR_GPU_MIRROR         (1<<1)
#define I915_USERPTR_UNSYNCHRONIZED     (1<<31)
	 * Returned handle for the object.
	 * Object handles are nonzero.
	__u32 handle;
	__u32 pad;

The context argument is to tell the i915 driver for which address space we’ll be mirroring the BO. Recall from part 3 that a GPU process may have multiple contexts. The flag is simply to tell the kernel to use the value in user_ptr as the address to map the BO in the virtual address space of the GEN GPU. When using the normal userptr interface, the i915 driver will pick the GPU virtual address.

  • Pros:
    • This interface is very simple.
    • Existing userptr code does the hard work for us
  • Cons:
    • You need 1 IOCTL per object. Much undeeded overhead.
    • It’s subject to a lot of problems userptr has5
    • Userptr was already merged, so unless pad get’s repruposed, we’re screwed

What should be: soft pin

There hasn’t been too much discussion here, so it’s hard to say. I believe the trends of the discussion (and the author’s personal preference) would be to add flags to the existing execbuf relocation mechanism. The flag would tell the kernel to not relocate it, and use the presumed_offset field that already exists. This is sometimes called, “soft pin.” It is a bit of a chicken and egg problem since the amount of work in userspace to make this useful is non-trivial, and the feature can’t merged until there is an open source userspace. Stay tuned. Perhaps I’ll update the blog as the story unfolds.

Wrapping it up (all 4 parts)

As usual, please report bugs or ask questions.

So with the 4 parts you should understand how the GPU interacts with system memory. You should know what the Global GTT is, why it still exists, and how it works. You might recall what a PPGTT is, and the intricacies of multiple address space. Hopefully you remember what you just read about 64b and GPU mirror. Expect a rebased patch series from me soon with all that was discussed (quite a bit has changed around me since my original posting of the patches).

This is the last post I will be writing on how GEN hardware interfaces with system memory, and how that related to the i915 driver. Unlike the Rocky movie series, I will stop at the 4th. Like the Rocky movie series, I hope this is the best. Yes, I just went there.

Unlike the usual, “buy me a beer if you liked this”, I would like to buy you a beer if you read it and considered giving me feedback. So if you know me, or meet me somewhere, feel free to reclaim the voucher.

Image links

The images I’ve created. Feel free to do with them as you please.

Download PDF

  1. The patches I posted for enabling GPU mirroring piggyback of of the existing userptr interface. Before those patches were merged I added some info to the API (a flag + context) for the point of testing. I needed to get this working quickly and porting from the existing userptr code was the shortest path. Since then userptr has been merged without this extra info which makes things difficult for people trying to test things. In any case an interface needs to be agreed upon. My preference would be to do this via the existing relocation flags. One could add a new flag called "SOFT_PIN" 

  2. The GEM and BO terminology is a fancy sounding wrapper for the notion that we want an interface to coherently write data which the GPU can read (input), and have CPU observe data which the GPU has written (output)  

  3. The PDP registers are are not PDPEs because they do not have any of the associated flags of a PDPE. Also, note that in my patch series I submitted a patch which defines the number of these to be PDPE. This is incorrect. 

  4. I am not sure how KVM works manages page tables. At least conceptually I’d think they’d have a similar problem to the i915 driver’s page table management. I should have probably looked a bit closer as I may have been able to leverage that; but I didn’t have the idea until just now… looking at the KVM code, it does have a lot of similarities to the approach I took 

  5. Let me be clear that I don’t think userptr is a bad thing. It’s a very hard thing to get right, and much of the trickery needed for it is *not* needed for GPU mirroring 


Pictures are the right way to start.


Conceptual view of aliasing PPGTT bind/unbind

There is exactly one thing to get from the above drawing, everything else is just to make it as close to fact as possible.

  1. The aliasing PPGTT (aliases|shadows|mimics) the global GTT.

The wordy overview

Support for Per-process Graphics Translation Tables (PPGTT) debuted on Sandybridge (GEN6). The features provided by hardware are a superset of Aliasing PPGTT, which is entirely a software construct. The most obvious unimplemented feature is that the hardware supports multiple PPGTTs. Aliasing PPGTT is a single instance of a PPGTT. Although not entirely true, it’s easiest to think of the Aliasing PPGTT as a set page table of page tables that is maintained to have the identical mappings as the global GTT (the picture above). There is more on this in the Summary section

Until recently, aliasing PPGTT was the only way to make use of the hardware feature (unless you accidentally stepped into one of my personal branches). Aliasing PPGTT is implemented as a performance feature (more on this later). It was an important enabling step for us as well as it provided a good foundation for the lower levels of the real PPGTT code.

In the following, I will be using the HSW PRMs as a reference. I’ll also assume you’ve read, or understand part 1.

Selecting GGTT or PPGTT

Choosing between the GGTT and the Aliasing PPGTT is very straight forward. The choice is provided in several GPU commands. If there is no explicit choice, than there is some implicit behavior which is usually sensible. The most obvious command to be provided with a choice is MI_BATCH_BUFFER_START. When a batchbuffer is submitted, the driver sets a single bit that determines whether the batch will execute out of the GGTT or a Aliasing PPGTT1. Several commands as well, like PIPE_CONTROL, have a bit to direct which to use for the reads or writes that the GPU command will perform.


The names for all the page table data structures in hardware are the same as for IA CPU. You can see the Intel® 64 and IA-32 Architectures Software Developer Manuals for more information. (At the time of this post: page 1988 Vol3. 4.2 HIERARCHICAL PAGING STRUCTURES: AN OVERVIEW). I don’t want to rehash the HSW PRMs  too much, and I am probably not allowed to won’t copy the diagrams. However, for the sake of having a consolidated post, I will rehash the most pertinent parts.

There is one conceptual Page Directory for a PPGTT – the docs call this a set of Page Directory Entries (PDEs), however since they are contiguous, calling it a Page Directory makes a lot of sense to me. In fact, going back to the Ironlake docs, that seems to be the case. So there is one page directory with up to 512 entries, each pointing to a page table.  There are several good diagrams which I won’t bother redrawing in the PRMs2

Page Directory Entry
31:12 11:04 03:02 01 0
Physical Page Address 31:12 Physical Page Address 39:32 Rsvd Page size (4K/32K) Valid
Page Table Entry
31:12 11 10:04 03:01 0
Physical Page Address 31:12 Cacheability Control[3] Physical Page Address 38:32 Cacheability Control[2:0] Valid

There’s some things we can get from this for those too lazy to click on the links to the docs.

  1. PPGTT page tables exist in physical memory.
  2. PPGTT PTEs have the exact same layout as GGTT PTEs.
  3. PDEs don’t have cache attributes (more on this later).
  4. There exists support for big pages3

With the above definitions, we now can derive a lot of interesting attributes about our GPU. As already stated, the PPGTT is a two-level page table (I’ve not yet defined the size).

  • A PDE is 4 bytes wide
  • A PTE is 4 bytes wide
  • A Page table occupies 4k of memory.
  • There are 4k/4 entries in a page table.

With all this information, I now present you a slightly more accurate picture.


An object with an aliased PPGTT mapping


PP_DCLV – PPGTT Directory Cacheline Valid Register: As the spec tells us, “This register controls update of the on-chip PPGTT Directory Cache during a context restore.” This statement is directly contradicted in the very next paragraph, but the important part is the bit about the on-chip cache. This register also determines the amount of virtual address space covered by the PPGTT. The documentation for this register is pretty terrible, so a table is actually useful in this case.

PPGTT Directory Cacheline Valid Register (from the docs)
63:32 31:0
MBZ PPGTT Directory Cache Restore [1..32] 16 entries
DCLV, the right way
31 30 1 0
PDE[511:496] enable PDE [495:480] enable PDE[31:16] enable PDE[15:0] enable

The, “why” is not important. Each bit represents a cacheline of PDEs, which is how the register gets its name4. A PDE is 4 bytes, there are 64b in a cacheline, so 64/4 = 16 entries per bit.  We now know how much address space we have.

512 PDEs * 1024 PTEs per PT * 4096 PAGE_SIZE = 2GB


PP_DIR_BASE: Sadly, I cannot find the definition to this in the public HSW docs. However, I did manage to find a definition in the Ironlake docs yay me. There are several mentions in more recent docs, and it works the same way as is outlined on Ironlake. Quoting the docs again, “This register contains the offset into the GGTT where the (current context’s) PPGTT page directory begins.” We learn a very important caveat about the PPGTT here – the PPGTT PDEs reside within the GGTT.


With these two things, we now have the ability to program the location, and size (and get the thing to load into the on-chip cache). Here is current i915 code which switches the address space (with simple comments added). It’s actually pretty ho-hum.

ret = intel_ring_begin(ring, 6);
if (ret)
	return ret;

intel_ring_emit(ring, MI_LOAD_REGISTER_IMM(2));
intel_ring_emit(ring, RING_PP_DIR_DCLV(ring));
intel_ring_emit(ring, PP_DIR_DCLV_2G);       // program size
intel_ring_emit(ring, RING_PP_DIR_BASE(ring));
intel_ring_emit(ring, get_pd_offset(ppgtt)); // program location
intel_ring_emit(ring, MI_NOOP);

As you can see, we program the size to always be the full amount (in fact, I fixed this a long time ago, but never merged). Historically, the offset was at the top of the GGTT, but with my PPGTT series merged, that is abstracted out, and the simple get_pd_offset() macro gets the offset within the GGTT. The intel_ring_emit() stuff is because the docs recommended setting the registers via the GPU’s LOAD_REGISTER_IMMEDIATE command, though empirically it seems to be fine if we simply write the registers via MMIO (for Aliasing PPGTT). See my previous blog post if you want more info about the commands execution in the GPU’s ringbuffer. If it’s easier just pretend it’s 2 MMIO writes.


All of the resources are allocated and initialized upfront. There are 3 main steps. Note that the following comes from a relatively new kernel, and I have already submitted patches which change some of the cosmetics. However, the concepts haven’t changed for pre-gen8.

1. Allocate space in the GGTT for the PPGTT PDEs

ret = drm_mm_insert_node_in_range_generic(&dev_priv->,
					  &ppgtt->node, GEN6_PD_SIZE,
					  GEN6_PD_ALIGN, 0,
					  0, dev_priv->,

2. Allocate the page tables

for (i = 0; i < ppgtt->num_pd_entries; i++) {
	ppgtt->pt_pages[i] = alloc_page(GFP_KERNEL);
	if (!ppgtt->pt_pages[i]) {
		return -ENOMEM;

3. [possibly] IOMMU map the pages

for (i = 0; i < ppgtt->num_pd_entries; i++) {
	dma_addr_t pt_addr;

	pt_addr = pci_map_page(dev->pdev, ppgtt->pt_pages[i], 0, 4096,

As the system binds, and unbinds objects into the aliasing PPGTT, it simply writes the PTEs for the given object (possibly spanning multiple page tables). The PDEs do not change. PDEs are mapped to a scratch page when not used, as are the PTEs.


As we saw in step 3 above, I mention that the page tables may be mapped by the IOMMU. This is one important caveat that I didn’t fully understand early on, so I wanted to recap a bit. Recall that the GGTT is allocated out of system memory during the boot firmware’s initialization. This means that as long as Linux treats that memory as special, everything will just work (just don’t look for IOMMU implicated bugs on our bugzilla). The page tables however are special because they get allocated after Linux is already running, and the IOMMU is potentially managing the memory. In other words, we don’t want to write the physical address to the PDEs, we want to write the dma address. Deferring to wikipedia again for the description of an IOMMU., that’s all.It tripped be up the first time I saw it because I hadn’t dealt with this kind of thing before. Our PTEs have worked the same way for a very long time when mapping the BOs, but those have somewhat hidden details because they use the scatter-gather functions.

Feel free to ask questions in the comments if you need more clarity – I’d probably need another diagram to accommodate.

Cached page tables

Let me be clear, I favored writing a separate post for the Aliasing PPGTT because it gets a lot of the details out of the way for the post about Full PPGTT. However, the entire point of this feature is to get a [to date, unmeasured] performance win. Let me explain… Notice bits 4:3 of the ECOCHK register.  Similarly in the i915 code:

ecochk = I915_READ(GAM_ECOCHK);
if (IS_HASWELL(dev)) {
} else {
I915_WRITE(GAM_ECOCHK, ecochk);

What these bits do is tell the GPU whether (and how) to cache the PPGTT page tables. Following the Haswell case, the code is saying to map the PPGTT page table with write-back caching policy. Since the writes for Aliasing PPGTT are only done at initialization, the policy is really not that important.

Below is how I’ve chosen to distinguish the two. I have no evidence that this is actually what happens, but it seems about right.


Flow chart for GPU GGTT memory access. Red means slow.

Flow chart for GPU PPGTT memory access. Red means slow.

Flow chart for GPU PPGTT memory access. Red means slow.

Red means slow. The point which was hopefully made clear above is that when you miss the TLB on a GGTT access, you need to fetch the entry from memory, which has a relatively high latency. When you miss the TLB on a PPGTT access, you have two caches (the special PDE cache for PPGTT, and LLC) which are backing the request. Note there is an intentional bug in the second diagram – you may miss the LLC on the PTE fetch also. I was trying to keep things simple, and show the hopeful case.

Because of this, all mappings which do not require GGTT mappings get mapped to the aliasing PPGTT.


Distinctions from the GGTT

At this point I hope you’re asking why we need the global GTT at all. There are a few limited cases where the hardware is incapable (or it is undesirable) of using a per process address space.

A brief description of why, with all the current callers of the global pin interface.

  • Display: Display actually implements it’s own version of the GGTT. Maintaining the logic to support multiple level page tables was both costly, and unnecessary. Anything relating to a buffer being scanned out to the display must always be mapped into the GGTT. Ie xpect this to be true, forever.
    • i915_gem_object_pin_to_display_plane(): page flipping
    • intel_setup_overlay(): overlays
  • Ringbuffer: Keep in mind that the aliasing PPGTT is a special case of PPGTT. The ringbuffer must remain address space and context agnostic. It doesn’t make any sense to connect it to the PPGTT, and therefore the logic does not support it. The ringbuffer provides direct communication to the hardware’s execution logic – which would be a nightmare to synchronize if we forget about the security nightmare. If you go off and think about how you would have a ringbuffer mapped by multiple address spaces, you will end up with something like execlists.
    • allocate_ring_buffer()
  • HW Contexts: Extremely similar to ringbuffer.
    • intel_alloc_context_page(): Ironlake RC6
    • i915_gem_create_context(): Create the default HW context
    • i915_gem_context_reset(): Re-pin the default HW context
    • do_switch(): Pin the logical context we’re switching to
  • Hardware status page: The use of this, prior to execlists, is much like rinbuffers, and contexts. There is a per process status page with execlists.
    • init_status_page()
  • Workarounds:
    • init_pipe_control(): Initialize scratch space for workarounds.
    • intel_init_render_ring_buffer(): An i830 w/a I won’t bother to understand
    • render_state_alloc(): Full initialization of GPUs 3d state from within the kernel
  • Other
    • i915_gem_gtt_pwrite_fast(): Handle pwrites through the aperture. More info here.
    • i915_gem_fault(): Map an object into the aperture for gtt_mmap. More info here.
    • i915_gem_pin_ioctl(): The DRI1 pin interface.

GEN8 disambiguation

Off the top of my head, the list of some of the changes on GEN8 which will get more detail in a later post. These changes are all upstream from the original Broadwell integration.

  • PTE size increased to 8b
    • Therefore, 512 entries per table
    • Format mimics the CPU PTEs
  • PDEs increased to 8b (remains 512 PDEs per PD)
    • Page Directories live in system memory
      • GGTT no longer holds the PDEs.
    • There are 4 PDPs, and therefore 4 PDs
    • PDEs are cached in LLC instead of special cache (I’m guessing)
  • New HW PDP (Page Directory Pointer) registers point to the PDs, for legacy 32b addressing.
    • PP_DIR_BASE, and PP_DCLV are removed
  • Support for 4 level page tables, up to 48b virtual address space.
    • PML4[PML4E]->PDP
    • PDP[PDPE] -> PD
    • PD[PDE] -> PT
    • PT{PTE] -> Memory
  • Big pages are now 64k instead of 32k (still not implemented)
  • New caching interface via PAT like structure


There’s actually an interesting thing that you start to notice after reading Distinctions from the GGTT. Just about every thing mapped into the GGTT shouldn’t be mapped into the PPGTT. We already stated that we try to map everything else into the PPGTT. The set of objects mapped in the GGTT, and the set of objects mapped into the PPGTT are disjoint5). The patches to make this work are not yet merged. I’d put an image here to demonstrate, but I am feeling lazy and I really want to get this post out today.


  • The Aliasing PPGTT is a single instance of the hardware feature: PPGTT.
  • Aliasing PPGTT was designed as a drop in performance replacement to the GGTT.
  • GEN8 changed a lot of architectural stuff.
  • The Aliasing PPGTT shouldn’t actually alias the GGTT because the objects they map are a disjoint set.

Like last time, links to all the SVGs I’ve created. Use them as you like.

Download PDF

  1. Actually it will use whatever the current PPGTT is, but for this post, that is always the Aliasing PPGTT 

  2. Page walk, Two-Level Per-Process Virtual Memory 

  3. Big pages have the same goal as they do on the CPU – to reduce TLB pressure. To date, there has been no implementation of big pages for GEN (though a while ago I started putting something together). There has been some anecdotal evidence that there isn’t a big win to be had for many workloads we care about, and so this remains a low priority. 

  4. This register thus allows us to limit, or make a sparse address space for the PPGTT. This mechanism is not used, even in the full PPGTT patches 

  5. There actually is a case on GEN6 which requires both. Currently this need is implemented by drivers/gpu/drm/i915/i915_gem_execbuffer.c: i915_gem_execbuffer_relocate_entry( 

For those of you reading this that didn’t know, I’ve had two months of paid vacation – one of the real perks of working for Intel. Today is the last day. It is as hard as I thought it would be.

Most of the vacation was spent vacationing. As I have access to none of the pictures at the moment, and I don’t want to make you jealous, I’ll skip over that. Toward the end though, I ended up at a coffee shop waiting for someone with nothing to do. I spent a little bit of time working on HOBos, and I thought it could be interesting to write about it.

WARNING: There is nothing novel here.

A brief history of HOBos

The HOBby operating system is an operating system project I started a while ago. I am a bit embarrassed that I started an OS. In my opinion, it’s one of lamer tasks to take on because

  1. everyone seems to do it;
  2. there really isn’t a need, there are many operating systems with permissive licenses already; and
  3. sites like OSDev have made much of the work trivial (I like to think that when I started there wasn’t quite as much info readily available, but that’s a lie).
Larrabee Av in Portland (not what the project was named after)

Larrabee Av in Portland
(not what the project was named after)

HOBos began while I was working on the Larrabee project. The team spent a lot of time optimizing the memory management, and the scheduler for the embedded software. I really wanted to work on these things full time. Unfortunately for me, having had a background in device drivers, I was often required to do other things. As a means to scratch the itch, I started HOBos after not finding anything suitable for my needs. The stuff I found were all either too advanced, or too rudimentary. When I was hired to work on i915, I decided that it was better use of my free time. Since then, I’ve been tweaking things here or there, and I do try to make sure things still run with the latest QEMU and compilers at least once a year. The last actual feature I added was more than 1300 days ago:

commit 1c9b5c78b22b97246989b00e807c9bf1fbc9e517
	Author: Ben Widawsky 
	Date: Sat Mar 19 21:19:57 2011 -0700

	basic backtrace

So back to the coffee shop. I tried to do something or other, got a hang, and didn’t want to fire up GDB.


HOBos had implemented backtraces since the original import from SVN (let’s pretend that means, since always). Obtaining a backtrace is actually pretty straightforward on x86.

The stack frame

The stack frame can be thought of as memory contents that are locally scoped to a function. Declaring a local variable will end up in the stack frame. A global variable will not. As functions call other functions, you end up with multiple frames. A stack is used because the last frames added are the first ones removed (this is not always true for things like exceptions, but nevermind that for now). The fact that a stack decrements is arbitrarily chosen, as far as I can tell. The following shows the stack when the function foo() calls the function bar().

Example Stackframe

Example Stackframe

void bt_fp(void *fp)
	do {
		uint64_t prev_rbp = *((uint64_t *)fp);
		uint64_t prev_ip = *((uint64_t *)(fp + sizeof(prev_rbp)));
		struct sym_offset sym_offset = get_symbol((void *)prev_ip);
		printf("\t%s (+0x%x)\n";,, sym_offset.offset);
		fp = (void *)prev_rbp;
		/* Stop if rbp is not in the kernel
		 * TODO: need an upper bound too*/
		if (fp <= (void *)KVADDR(DMAP_PML,0,0,0))
	} while(1);

The memory contents shown above are created as a result of two things. First is what the CPU implicitly does upon execution of the call instruction. The second is what the compiler generates. Since we’re talking about x86 here, the call instruction always pushes at least the return address. The second I’ll detail a bit more in a bit. Correlating this to the picture, the green (foo) and blue (bar) are creations of the compiler. The brown is automatically pushed on to the stack by the hardware, and is automatically popped from the stack on the ret instruction.

In the above there are two registers worth noting, RBP, and RSP. RBP which is the 64b extension to the original 8086 BP, Base Pointer, register is the beginning of the stack frame ie the Frame Pointer. RSP, the extension to the 8086 SP, Stack Pointer, points to the end of the stack frame. By convention the Base Pointer doesn’t change throughout a function being executed and therefore it is often used as the reference to local variables stored on the stack. -100(%rbp) as an example above.

Digging further into that disassembly above, one notices a pattern. Every function begins with:

push   %rbp       // Push the old RBP, RSP now points to this
mov    %rsp,%rbp  // Store RSP in RBP

Assuming this is the convention, it implies that at any given point during the execution of a function we can obtain the previous RBP by reading the current RBP and doing some processing. Specifically, reading RBP gives us the old Stack Pointer, which is pointing to the last RBP. As mentioned above, the x86 CPU pushed the return address immediately before the push %rbp – which means as we work backwards through the Base Pointers, we can also obtain the caller for the current stack frame. People have done really nice pictures on this – use your favorite search engine.

Here is the HOBos code (ignore the part about symbols for now):

void bt_fp(void *fp)
	do {
		uint64_t prev_rbp = *((uint64_t *)fp);
		uint64_t prev_ip = *((uint64_t *)(fp + sizeof(prev_rbp)));
		struct sym_offset sym_offset = get_symbol((void *)prev_ip);
		printf("\t%s (+0x%x)\n",, sym_offset.offset);
		fp = (void *)prev_rbp;
		/* Stop if rbp is not in the kernel
		 * TODO: need an upper bound too*/
		if (fp <= (void *)KVADDR(DMAP_PML,0,0,0))
	} while(1);

As far as I know, all modern CPUs work in a similar fashion with differences sprinkled here and there. ARM for example has an LR register for the return address instead of using the stack.

ABI/Calling Conventions

The fact that we can work backwards this way is a byproduct of the calling convention. One example of an aspect of the calling convention is where to put the arguments to a function. Do they go on the stack, in registers, or somewhere else? In addition to these arguments, the way in which RBP, and RSP are used are strictly software constructs that are part of the convention. As a result, it might not always be possible to get a backtrace if:

  1. This convention is not adhered to (or -fomit-frame-pointer)
  2. The contents of RBP are destroyed
  3. The contents of the stack are corrupted.

How arguments are passed to function are also needed to make sure linkers and loaders (both static and dynamic) can operate to correctly form an executable, or dynamically call a function. Since this isn’t really important to obtaining a backtrace, I will leave it there. Some architectures do provide a way to obtain useful backtrace information without caring about the calling convention: Intel’s Processor Trace for example.

Symbol information

The previous section will get us a reverse list of addresses for all function calls working backward from a given point during execution. But having names makes things much more easier to quickly diagnose what is going wrong. There is a lot of information on the internet about this stuff. I’m simply providing all that’s relevant to my specific problem.

ELF Symbols (linking)

The ELF format provides everything we need (assuming things aren’t stripped). Glossing over the details (see this simple tool if you’re curious), we end up with two “sections” that tell us everything we need. They are conventionally named, “.symtab”, and “.strtab” and are conveniently of type, SHT_SYMTAB, and SHT_STRTAB. The symbol table defines the information about each symbol (functions, variables, whatever). Part of the information is a name, which is an index into the string table. In this simplest case, these are provisions for inter-object linking. If I had defined foo() in foo.c, and bar() in bar.c, the compiled object files can be linked together, but the linker needs the information about the symbol bar (in this case) in order to do its job.

readelf -S a.out

[Nr] Name Type Address Offset
[33] .symtab SYMTAB 0000000000000000 000015b8
[34] .strtab STRTAB 0000000000000000 00001c90

> readelf -S a.out | egrep "\.strtab|\.symtab" | wc -l
> strip a.out
> readelf -S a.out | egrep "\.strtab|\.symtab" | wc -l

Summing that up, if we have an entire ELF file, and the symbol and string tables haven’t been stripped, we’re good to go. However, ELF sections are not the unit in which an ELF loader decides what to load. The loader loads segments which are of type PT_LOAD. A segment is made up of 0 or more sections, plus padding. Since the Operating System is itself an ELF loaded by an ELF loader (the bootloader) we’re not in a good position. :(

> readelf -l a.out | egrep "\.strtab|\.symtab" | wc -l

ELF Loader

ELF Loader

Debug Info

Note that what we want is not the same thing as debug information. If one wants to do source level debug, there needs to be some way of correlating a machine instruction to a line of source code. This is also a software abstraction, and there is usually a spec for it unless you are using some proprietary thing. It would technically be possible to include DWARF capabilities within the kernel, but I do not know of a way to get that info to the OS (see multiboot stuff for details).

From boot to symbols

The HOBos project implements a small Multiboot compliant bootloader called smallboot. When the machine starts up, boot firmware is loaded from a fixed location (this is currently done by SeaBIOS). The boot firmware then loads the smallboot bootloader. The bootloader will load the kernel (smallboot, and most modern bootloaders will do this through a text configuration file on the resident filesystem). In the HOBos case, the kernel is simply an ELF file. smallboot implements a basic ELF loader to load the kernel into memory and give execution over.

The multiboot specification is a standardized communication mechanism (for various things)  from the bootloader to the Operating System (or any file really). One of these things is symbol information. Quoting the multiboot spec

If bit 5 in the ‘flags’ word is set, then the following fields in the Multiboot information structure starting at byte 28 are valid:

     28      | num               |
     32      | size              |
     36      | addr              |
     40      | shndx             |

These indicate where the section header table from an ELF kernel is, the size of each entry, number of entries, and the string table used as the index of names. They correspond to the ‘shdr_*’ entries (‘shdr_num’, etc.) in the Executable and Linkable Format (elf) specification in the program header. All sections are loaded, and the physical address fields of the elf section header then refer to where the sections are in memory (refer to the i386 elf documentation for details as to how to read the section header(s)). Note that ‘shdr_num’ may be 0, indicating no symbols, even if bit 5 in the ‘flags’ word is set.

Since the beginning I had implemented these fields in the bootloader:

multiboot_info.flags |= MULTIBOOT_INFO_ELF_SHDR;
multiboot_info.u.elf_sec = *table;

Because of the fact that the symbols weren’t in the ELF segments though, I was stumped as to how to get at the data one the OS is loaded. As it turned out, I didn’t actually read all 4 sentences and  I had missed one very important part.

All sections are loaded, and the physical address fields of the elf section header then refer to where the sections are in memory

What the spec is dictating is that even though the sections are not in loadable segments, they shall exist within memory during the handover to the OS, and the section header information will be updated so that the OS knows where to find it. With this, the OS can copy out, or just make sure not to overwrite the info, and then get access to it.

for (i = 0; i < shnum; i++) {
    __ElfN(Shdr) *sh = &shdr[i];
    if (sh->sh_size == 0)

    if (sh->sh_addr) /* Already loaded */

    ASSERT(sizeof(void *) == 4);
    *((volatile __ElfN(Addr) *)&sh->sh_addr) = sh->sh_offset + (uint32_t)addr;

et cetera

The code for pulling out the symbols is quite a bit longer, but it can be found in kern/core/syms.c. With the given RBP unwinder near the top, we can easily get the IP for the caller. With that IP, we do a symbol lookup from the symbols we got via the multiboot info.

Screenshot with backtrace

Screenshot with backtrace

Inkscape Links

Download PDF
August 27, 2015

LAST REMINDER! systemd.conf 2015 Call for Presentations ends August 31st!

Here's the last reminder that the systemd.conf 2015 CfP ends on August 31st 11:59:59pm Central European Time (that's monday next week)! Make sure to submit your proposals until then!

Please submit your proposals on our website!

And don't forget to register for the conference! Only a limited number of registrations are available due to space constraints! Register here!.

For further details about systemd.conf consult the conference website.

August 26, 2015

click here to jump to the instructions

Mice have an optical sensor that tells them how far they moved in "mickeys". Depending on the sensor, a mickey is anywhere between 1/100 to 1/8200 of an inch or less. The current "standard" resolution is 1000 DPI, but older mice will have 800 DPI, 400 DPI etc. Resolutions above 1200 DPI are generally reserved for gaming mice with (usually) switchable resolution and it's an arms race between manufacturers in who can advertise higher numbers.

HW manufacturers are cheap bastards so of course the mice don't advertise the sensor resolution. Which means that for the purpose of pointer acceleration there is no physical reference. That delta of 10 could be a millimeter of mouse movement or a nanometer, you just can't know. And if pointer acceleration works on input without reference, it becomes useless and unpredictable. That is partially intended, HW manufacturers advertise that a lower resolution will provide more precision while sniping and a higher resolution means faster turns while running around doing rocket jumps. I personally don't think that there's much difference between 5000 and 8000 DPI anymore, the mouse is so sensitive that if you sneeze your pointer ends up next to Philae. But then again, who am I to argue with marketing types.

For us, useless and unpredictable is bad, especially in the use-case of everyday desktops. To work around that, libinput 0.7 now incorporates the physical resolution into pointer acceleration. And to do that we need a database, which will be provided by udev as of systemd 218 (unreleased at the time of writing). This database incorporates the various devices and their physical resolution, together with their sampling rate. udev sets the resolution as the MOUSE_DPI property that we can read in libinput and use as reference point in the pointer accel code. In the simplest case, the entry lists a single resolution with a single frequency (e.g. "MOUSE_DPI=1000@125"), for switchable gaming mice it lists a list of resolutions with frequencies and marks the default with an asterisk ("MOUSE_DPI=400@50 800@50 *1000@125 1200@125"). And you can and should help us populate the database so it gets useful really quickly.

How to add your device to the database

We use udev's hwdb for the database list. The upstream file is in /usr/lib/udev/hwdb.d/70-mouse.hwdb, the ruleset to trigger a match is in /usr/lib/udev/rules.d/70-mouse.rules. The easiest way to add a match is with the libevdev mouse-dpi-tool (version 1.3.2). Run it and follow the instructions. The output looks like this:

$ sudo ./tools/mouse-dpi-tool /dev/input/event8
Mouse Lenovo Optical USB Mouse on /dev/input/event8
Move the device along the x-axis.
Pause 3 seconds before movement to reset, Ctrl+C to exit.
Covered distance in device units: 264 at frequency 125.0Hz | |^C
Estimated sampling frequency: 125Hz
To calculate resolution, measure physical distance covered
and look up the matching resolution in the table below
16mm 0.66in 400dpi
11mm 0.44in 600dpi
8mm 0.33in 800dpi
6mm 0.26in 1000dpi
5mm 0.22in 1200dpi
4mm 0.19in 1400dpi
4mm 0.17in 1600dpi
3mm 0.15in 1800dpi
3mm 0.13in 2000dpi
3mm 0.12in 2200dpi
2mm 0.11in 2400dpi

Entry for hwdb match (replace XXX with the resolution in DPI):
mouse:usb:v17efp6019:name:Lenovo Optical USB Mouse:
Take those last two lines, add them to a local new file /etc/udev/hwdb.d/71-mouse.hwdb. Rebuild the hwdb, trigger it, and done:

$ sudo udevadm hwdb --update
$ sudo udevadm trigger /dev/input/event8
Leave out the device path if you're not on systemd 218 yet. Check if the property is set:

$ udevadm info /dev/input/event8 | grep MOUSE_DPI
E: MOUSE_DPI=1000@125
And that shows everything worked. Restart X/Wayland/whatever uses libinput and you're good to go. If it works, double-check the upstream instructions, then file a bug against systemd with those two lines and assign it to me.

Trackballs are a bit hard to measure like this, my suggestion is to check the manufacturer's website first for any resolution data.

Update 2014/12/06: trackball comment added, udevadm trigger comment for pre 218
Update 2015/08/26: udpated link to systemd bugzilla (now on github)

August 24, 2015

First Round of systemd.conf 2015 Sponsors

We are happy to announce the first round of systemd.conf 2015 sponsors!

Our first Silver sponsor is CoreOS!

CoreOS develops software for modern infrastructure that delivers a consistent operating environment for distributed applications. CoreOS's commercial offering, Tectonic, is an enterprise-ready platform that combines Kubernetes and the CoreOS stack to run Linux containers. In addition CoreOS is the creator and maintainer of open source projects such as CoreOS Linux, etcd, fleet, flannel and rkt. The strategies and architectures that influence CoreOS allow companies like Google, Facebook and Twitter to run their services at scale with high resilience. Learn more about CoreOS here, Tectonic here, or follow CoreOS on Twitter @coreoslinux.

A Bronze sponsor is Codethink:

Codethink is a software services consultancy, focusing on engineering reliable systems for long-term deployment with open source technologies.

A Bronze sponsor is Pantheon:

Pantheon is a platform for professional website development, testing, and deployment. Supporting Drupal and WordPress, Pantheon runs over 100,000 websites for the world's top brands, universities, and media organizations on top of over a million containers.

A Bronze sponsor is Pengutronix:

Pengutronix provides consulting, training and development services for Embedded Linux to customers from the industry. The Kernel Team ports Linux to customer hardware and has more than 3100 patches in the official mainline kernel. In addition to lowlevel ports, the Pengutronix Application Team is responsible for board support packages based on PTXdist or Yocto and deals with system integration (this is where systemd plays an important role). The Graphics Team works on accelerated multimedia tasks, based on the Linux kernel, GStreamer, Qt and web technologies.

We'd like to thank our sponsors for their support! Without sponsors our conference would not be possible!

We'll shortly announce our second round of sponsors, please stay tuned!

If you'd like to join the ranks of systemd.conf 2015 sponsors, please have a look at our Becoming a Sponsor page!

Reminder! The systemd.conf 2015 Call for Presentations ends on monday, August 31st! Please make sure to submit your proposals on the CfP page until then!

Also, don't forget to register for the conference! Only a limited number of registrations are available due to space constraints! Register here!.

For further details about systemd.conf consult the conference website.

August 19, 2015

So I realized I hadn’t posted a Wayland update in a while. So we are still making good progress on Wayland, but the old saying that the last 10% is 90% of the work is definitely true here. So there was a Wayland BOF at GUADEC this year which tried to create a TODO list for major items remaining before Wayland is ready to replace X.

  • Proper menu positioning. Most visible user issue currently for people testing Wayland
  • Relative pointer/confinement/locking. Important for games among other things.
  • Kinetic scroll in GTK+
  • More work needed to remove all X11 dependencies (so that desktop has no dependency on XWayland being available.
  • Minimize main thread stalls (could be texture uploads for example)
  • Tablet support. Includes gestures, on-screen keyboard and more.

A big thank you to Jonas Ådahl, Owen Taylor, Carlos Garnacho, Rui Matos, Marek Chalupa, Olivier Fourdan and more for their ongoing work on polishing
up the Wayland experience.

So as you can tell there is a still lot of details that needs working out when doing something as major as switching from one Display system to the other, but things are starting to looking really good now.

One new feature I am particularly excited about is what we call multi-DPI support ready for Wayland sessions in Fedora 23. What this means is that if you have a HiDPI laptop screen and a standard DPI external monitor you should be able to drag windows back and forth between the two screens and have them automatically rescale to work with the different DPIs. This is an example of an issue which was relatively straightforward to resolve under Wayland, but which would have been a lot of pain to get working under X.

We will not be defaulting to Wayland in Fedora Workstation 23 though, because as I have said in earlier blog posts about the subject, we will need to have a stable and feature complete Wayland in at least one release before we switch the default. We hope Fedora Workstation 23 will be that stable and feature complete release, which means Fedora Workstation 24 is the one where we can hope to make the default switchover.

Of course porting the desktop itself to Wayland is just part of the story. While we support running X applications using XWayland, to get full benefit from Wayland we need our applications to run on top of Wayland natively. So we spent effort on getting some major applications like LibreOffice and Firefox Wayland ready recently.

Caolan McNamara has been working hard on finishing up the GTK3 port of LibreOffice which is part of the bigger effort to bring LibreOffice nativly to Wayland. The GTK3 version of LibreOffice should be available in Fedora Workstation 23 (as non-default option) and all the necessary code will be included in LibreOffice 5 which will be released pretty soon. The GTK3 version should be default in F24, hopefully with complete Wayland support.

For Firefox Martin Stransky has been looking into ensuring Firefox runs on Wayland now that the basic GTK3 port is done. Martin just reported today that he got Firefox running natively under Wayland, although there are still a lot of glitches and issues he needs to figure out before it can be claimed to be ready for normal use.

Another major piece we are working on which is not directly Wayland related, but which has a Wayland component too is to try to move the state of Linux forward in the context of dealing with multiple OpenGL implementations, multi-GPU systems and the need to free our 3D stack from its close ties to GLX.

This work with is lead by Adam Jackson, but where also Dave Airlie is a major contributor, involves trying to decide and implement what is needed to have things like GL Dispatch, EGLstreams and EGL Device proposals used across the stack. Once this work is done the challenges around for instance using the NVidia binary driver on a linux system or using a discreet GPU like on Optimus laptops should be a thing of the past.

So the first step of this work is getting GL Dispatch implemented. GL Dispatch basically allows you to have multiple OpenGL implementations installed and then have your system pick the right one as needed. So for instance on a system with NVidia Optimus you can use Mesa with the integrated Intel graphics card, but NVidias binary OpenGL implementatioin with the discreet Optimus GPU. Currently that is a pain to do since you can only have one OpenGL implementation used. Bumblebee tries to hack around that requirement, but GL Dispatch will allow us to resolve this without having to ‘fight’ the assumptions of the system.

We plan to have easy to use support for both Optimus and Prime (the Nouveau equivalent of Optimus) in the desktop, allowing you to choose what GPU to use for your applications without needing to edit any text files or set environment variables.

The final step then is getting the EGL Device and EGLStreams proposal implemented so we can have something to run Wayland on top of. And while GL Dispatch are not directly related to those we do feel that having it in place should make the setup easier to handle as you don’t risk conflicts between the binary NVidia driver and the Mesa driver anymore at that point, which becomes even more crucial for Wayland since it runs on top of EGL.

August 18, 2015

REMINDER! systemd.conf 2015 Call for Presentations ends August 31st!

We'd like to remind you that the systemd.conf 2015 Call for Presentations ends on August 31st! Please submit your presentation proposals before that data on our website.

We are specifically interested in submissions from projects and vendors building today's and tomorrow's products, services and devices with systemd. We'd like to learn about the problems you encounter and the benefits you see! Hence, if you work for a company using systemd, please submit a presentation!

We are also specifically interested in submissions from downstream distribution maintainers of systemd! If you develop or maintain systemd packages in a distribution, please submit a presentation reporting about the state, future and the problems of systemd packaging so that we can improve downstream collaboration!

And of course, all talks regarding systemd usage in containers, in the cloud, on servers, on the desktop, in mobile and in embedded are highly welcome! Talks about systemd networking and kdbus IPC are very welcome too!

Please submit your presentations until August 31st!

And don't forget to register for the conference! Only a limited number of registrations are available due to space constraints! Register here!.

Also, limited travel and entry fee sponsorship is available for community contributors. Please contact us for details!

For further details about the CfP consult the CfP page.

For further details about systemd.conf consult the conference website.

REMINDER! systemd.conf 2015 Call for Papers ends August 31st!

We'd like to remind you that the systemd.conf 2015 Call for Presentations ends on August 31st! Please submit your presentation proposals before that data on our website.

We are specifically interested in submissions from projects and vendors building today's and tomorrow's products, services and devices with systemd. We'd like to learn about the problems you encounter and the benefits you see! Hence, if you work for a company using systemd, please submit a presentation!

We are also specifically interested in submissions from downstream distribution maintainers of systemd! If you develop or maintain systemd packages in a distribution, please submit a presentation reporting about the state, future and the problems of systemd packaging so that we can improve downstream collaboration!

And of course, all talks regarding systemd usage in containers, in the cloud, on servers, on the desktop, in mobile and in embedded are highly welcome! Talks about systemd networking and kdbus IPC are very welcome too!

Please submit your presentations until August 31st!

And don't forget to register for the conference! Only a limited number of registrations are available due to space constraints! Register here!.

Also, limited travel and entry fee sponsorship is available for community contributors. Please contact us for details!

For further details abou the CfP consult the CfP page.

For further details about systemd.conf consult the conference website.

First Round of systemd.conf 2015 Sponsors

We are happy to announce the first round of systemd.conf 2015 sponsors!

Our first Silver sponsor is CoreOS!

CoreOS develops software for modern infrastructure that delivers a consistent operating environment for distributed applications. CoreOS's commercial offering, Tectonic, is an enterprise-ready platform that combines Kubernetes and the CoreOS stack to run Linux containers. In addition CoreOS is the creator and maintainer of open source projects such as CoreOS Linux, etcd, fleet, flannel and rkt. The strategies and architectures that influence CoreOS allow companies like Google, Facebook and Twitter to run their services at scale with high resilience. Learn more about CoreOS here, Tectonic here, or follow CoreOS on Twitter @coreoslinux.

A Bronze sponsor is Codethink:

Codethink is a software services consultancy, focusing on engineering reliable systems for long-term deployment with open source technologies.

A Bronze sponsor is Pantheon:

Pantheon is a platform for professional website development, testing, and deployment. Supporting Drupal and WordPress, Pantheon runs over 100,000 websites for the world's top brands, universities, and media organizations on top of over a million containers.

A Bronze sponsor is Pengutronix:

Pengutronix provides consulting, training and development services for Embedded Linux to customers from the industry. The Kernel Team ports Linux to customer hardware and has more than 3100 patches in the official mainline kernel. In addition to lowlevel ports, the Pengutronix Application Team is responsible for board support packages based on PTXdist or Yocto and deals with system integration (this is where systemd plays an important role). The Graphics Team works on accelerated multimedia tasks, based on the Linux kernel, GStreamer, Qt and web technologies.

We'd like to thank our sponsors for their support! Without sponsors our conference would not be possible!

We'll shortly announce our second round of sponsors, please stay tuned!

If you'd like to join the ranks of systemd.conf 2015 sponsors, please have a look at our Becoming a Sponsor page!

Reminder! The systemd.conf 2015 Call for Presentations ends on monday, August 31st! Please make sure to submit your proposals on the CfP page until then!

Also, don't forget to register for the conference! Only a limited number of registrations are available due to space constraints! Register here!.

For further details about systemd.conf consult the conference website.

August 17, 2015

A couple of weeks I visited my mother back home in Norway. She had gotten a new laptop some time ago that my brother-in-law had set up for her. As usual when I come for a visit I was asked to look at some technical issues my mother was experiencing with her computer. Anyway, one thing I discovered while looking at these issues was that my brother-in-law had installed OpenOffice on her computer. So knowing that the OpenOffice project is all but dead upstream since IBM pulled their developers of the project almost a year ago and has significantly fallen behind feature wise, I of course installed LibreOffice on the system instead, knowing it has a strong and vibrant community standing behind it and is going from strength to strength.

And this is why I am writing this open letter. Because while a lot of us who comes from technical backgrounds have already caught on to the need to migrate from OpenOffice to LibreOffice, there are still many non-technical users out there who are still defaulting to installing OpenOffice when they are looking for an open source Office Suite, because that is the one they came across 5 years ago. And I believe that the Apache Foundation, being an organization dedicated to open source software, care about the general quality and perception of open source software and thus would share my interest in making sure that all users of open source software gets the best experience possible, even if the project in question isn’t using their code license of preference.

So I realize that the Apache Foundation took a lot of pride in and has invested a lot of effort trying to create an Apache Licensed Office suite based on the old OpenOffice codebase, but I hope that now that it is clear that this effort has failed that you would be willing to re-direct people who go to the website to the LibreOffice website instead. Letting users believe that OpenOffice is still alive and evolving is only damaging the general reputation of open source Office software among non-technical users and thus I truly believe that it would be in everyones interest to help the remaining OpenOffice users over to LibreOffice.

And to be absolutely clear I am only suggesting this due to the stagnant state of the OpenOffice project, if OpenOffice had managed to build a large active community beyond the resources IBM used to provide then it would have been a very different story, but since that did not happen I don’t see any value to anyone involved to just let users keep downloading an aging releases of a stagnant codebase until the point where bit rot chases them away or they hear about LibreOffice true mainstream media or friends. And as we all know it is not about just needing a developer or two to volunteer here, maintaining and developing something as large as OpenOffice is a huge undertaking and needs a very sizeable and dedicated community to be able to succeed.

So dear Apache developers, for the sake of open source and free software, please recommend people to go and download LibreOffice, the free office suite that is being actively maintained and developed and which has the best chance of giving them a great experience using free software. OpenOffice is an important part of open source history, but that is also what it is at this point in time.

August 16, 2015
After a few years of development the atomic display update IOCTL for drm drivers is finally ready for prime time with the 4.2 pull request from Dave Airlie. It's been a long road, with a lot of drivers already converted over to atomic and even more in progress, the atomic helper libraries and support code in the drm subsystem sufficiently polished. But what's really missing is a design overview of what the overall atomic infrastructure looks like and why some decisions and details are implemented like they are.

That's now done and published on LWN: Part 1 talks about the problem space, issues with the Android atomic display framework and the basic atomic IOCTL interface. Part 2 goes into more detail about a few specific things like locking, helper library design and the exact semantics of atomic modessetting updates. Happy Reading!
August 15, 2015
So the big news for the upcoming mesa 11.0 release is gl4.x support for radeon and nouveau.  Which has been in the works for a long time, and a pretty tremendous milestone (and the reason that the next mesa release is 11.0 rather than 10.7).  But on the freedreno side of things, we haven't been sitting still either.  In fact, with the transform-feedback support I landed a couple weeks ago (for a3xx+a4xx), plus MRT+z32s8 support for a4xx (Ilia landed the a3xx parts of those a while back), we now support OpenGLES 3.0[1] on both adreno 3xx and 4xx!!

In addition, with the TBO support that landed a few days ago, plus handful of other fixes in the last few days, we have the new antarctica gl3.1 render engine for supertuxkart working!

Note that you need to use MESA_GL_VERSION_OVERRIDE=3.1 and MESA_GLSL_VERSION_OVERRIDE=140, since while we support everything that stk needs, we don't yet support everything needed to advertise gl3.1.  (But hey, according to qualcomm, adreno 3xx doesn't even support higher than gles3.0.. I guess we'll have to show them ;-))

The nice thing to see about this working, is that it is utilizing pretty much all of the recent freedreno features (transform feedback, MRT, UBO's, TBO's, etc).

Of course, the new render engine is considerably more heavyweight compared to older versions of stk.  But I think there is some low hanging fruit on the stk engine side of things to reclaim some of those lost fps.

update: oh, and the first time around, I completely forgot to mention that qualcomm has recently published *some* gpu docs, for a3xx, for the dragonboard 410c. Not quite as extensive as what broadcom has published for vc4, but it gives us all the a3xx registers, which is quite a lot more than any other SoC vendor has done to date :-)

[1] minus MSAA.. There is a bigger task, which is on the TODO list, to teach mesa st about some extensions to support MSAA resolve on tile->mem.. such as EXT_multisampled_render_to_texture, plus perhaps driconf option to enable it for apps that are not aware, which would make MSAA much more useful on a tiling gpu.  Until then, mesa doesn't check MSAA for gles3, and if it did we could advertise PIPE_CAP_FAKE_SW_MSAA.  Plus, who really cares about MSAA on a 5" 4k screen?

August 13, 2015

I've started to use Pocket a few months ago to store my backlog of things to read. It's especially useful as I can use it to read content offline since we still don't have any Internet access in places such as airplanes or the Paris metro. It's only 2015 after all.

I am also a subscriber for years now, and I really like their articles from the weekly edition. Unfortunately, as the access is restricted to subscribers, you need to login: it makes it impossible to add these articles to Pocket directly. Sad.

Yesterday, I thought about that and decided to start hacking on it. LWN provides a feature called "Subscriber Link" that allows you to share an article with a friend. I managed to use that feature to share the articles with my friend… Pocket!

As doing that every week is tedious, I wrote a small Python program called lwn2pocket that I published on GitHub. Feel free to use it, hack it and send pull requests.

August 11, 2015

One of the things we discussed at this year’s Akademy conference is making AppStream work on Kubuntu.appstream-logo

On Debian-based systems, we use a YAML-based implementation of AppStream, called “DEP-11”. DEP-11 exists for historical reasons (the DEP-11 YAML format was a superset of AppStream once) and because YAML, unlike XML, is one accepted file format by the Debian FTPMasters team, who we want to have on board when adding support for AppStream.

So I’ve spent the last few days on setting up the DEP-11 generator for Kubuntu, as well as improving it greatly to produce more meaningful error messages and to generate better output. It became a bit slower in the process, but the greatly improved diagnostics data is worth it.

For example, maintainers of Debian packages will now get a more verbose explanation of issues found with metadata in their packages, making them easier to fix for people who didn’t get in contact with AppStream yet.

At time, we generate AppStream metadata for Tanglu, Kubuntu and Debian, but so far only Tanglu makes real use of it by shipping it in a .deb package. Shipping the data as package is only a workaround though, for a proper implementation, the data will be downloaded by Apt. To achieve that, the data needs to reach the archive first, which is something that I can hopefully discuss and implement with the FTPMasters team of Debian at this year’s Debconf.

When this is done, the new metadata will automatically become available in tools like GNOME-Software or Muon Discover.

How can I see if there are issues with my package?

The dep11-generator tool will return HTML pages to show both the extracted metadata, as well as issues with it.

You can find the information for the respective distribution here:

Each issue tag will contain a detailed explanation of what went wrong. Errors generally lead to ignoring the metadata, so it will not be processed. Warnings usually concern things which might reduce the amount of metadata or make it less useful, while Info-type hints contain information on how to improve the metadata or make it more useful.

Can I use the data already?

Yes, you can. You just need to place the compressed YAML files in /var/cache/app-info/yaml and the icons in /var/cache/app-info/icons/<suite>-<component>/<size>, for example: /var/cache/app-info/icons/jessie-amd64/64×64

I think I found a bug in the generator…

In that case, please report the issue against the appstream-dep11 package at Debian, or file an issue at Github..

The only reason why I announce this feature now is to find remaining generator bugs, before officially announcing the feature on debian-devel-announce.

When will this be officially announced?

I want to give this feature a little bit more testing, and ideally have the integration into the archive ready, so people can see how the metadata looks like when rendered in GNOME-Software/Discover. I also want to write a bit more documentation to help Debian developers and upstreams to improve their metadata.

Ideally, I also want to incorporate some feedback at Debconf when announcing full AppStream support in Debian. So take all the stuff you’ve read above as a little sneak peek 😉

Debconf15goingI will also give a talk at Debconf, titled “AppStream, Limba, XdgApp – Where we are going.” The aim of this talk is to give an insight i tnto the new developments happening in the software distribution area, and what the goal of these different projects is (the talk should give an overview of what’s in the oven and how it will impact Debian). So if you are interested, please drop by :-) Maybe setting up a BOF would also be a good idea.

August 10, 2015

I am very late with this, but I still wanted to write a few words about Akademy 2015.

First of all: It was an awesome conference! Meeting all the great people involved with KDE and seeing who I am working with (as in: face to face, not via email) was awesome. We had some very interesting discussions on a wide variety of topics, and also enjoyed quite some beer together. Particularly important to me was of course talking to Daniel Vrátil who is working on XdgApp, and Aleix Pol of Muon Discover fame.

Also meeting with the other Kubuntu members was awesome – I haven’t seen some of them for about 3 years, and also met many cool people for the first time.

My talk on AppStream/Limba went well, except that I got slightly confused by the timer showing that I had only 2 minutes left, after I had just completed the first half of my talk. It turned out that the timer was wrong 😉

Another really nice aspect was to be able to get an insight into areas where I am usually not involved with, like visual design. It was really interesting to learn about the great work others are doing and to talk to people about their work – and I also managed to scratch an itch in the Systemsettings application, where three categories had shown the same icon. Now Systemsettings looks like it is supposed to be, finally :-)

The only thing I could maybe complain about was the weather, which was more Scotland/Wales like than Spanish – but that didn’t stop us at all, not even at the social event outside. So I actually don’t complain 😉

We also managed to discuss some new technical stuff, like AppStream for Kubuntu, and plenty of other things that I’ll write about in a separate blog post.

Generally, I got so many new impressions from this year’s Akademy, that I could write a 10-pages long blogpost about it while still having to leave out things.

Kudos to the organizers of this Akademy, you did a fantastic job! I also want to thank the Ubuntu community for funding my flight, and the Kubuntu community for pushing me a little to attend :-).

KDE is a great community with people driving the development of Free Software forward. The diversity of people, projects and ideas in KDE is a pleasure, and I am very happy to be part of this community.


August 08, 2015

Wanted to let everyone know that the GStreamer Conference 2015 is happening for the 6th time this year. So if you want to attend the premier open source multimedia conference you can do so in Dublin, Ireland between the 8th and 9th of October. If you want to attend I suggest registering as early as possible using the GStreamer Conference registration webpage. Like earlier years the GStreamer Conference is co-located with other great conferences like the Embedded Linux Conference Europe so you have the chance to combine the attendance into one trip.

The GStreamer Conference has always been a great opportunity to not only learn about the latest developments in GStreamer, but about whats happeing in the general Linux multimedia stack, latest news from the world of codec development and other related topics. I strongly recommend setting aside the 8th and the 9th of October for a trip to Dublin and the GStreamer Conference.

Also a heads up for those interested in doing a talk. The formal deadline for submitting a proposal is this Sunday the 9th of August, so you need to hurry to send in a proposal. You find the details for how to submit a talk on the GStreamer Conference 2015 website. While talks submitted before the 9th will be prioritized I do recommend anyone seeing this after the deadline to still send in a proposal as there might be a chance to get on the program anyway if you get your proposal in during next week.

August 04, 2015

It's been a while since I talked about Ceilometer and its companions, so I thought I'd go ahead and write a bit about what's going on this side of OpenStack. I'm not going to cover new features and fancy stuff today, but rather a shallow overview of the new project processes we initiated.

Ceilometer growing

Ceilometer has grown a lot since that time when we started it 3 years ago. It has evolved from a system designed to fetch and store measurements, to a more complex system, with agents, alarms, events, databases, APIs, etc.

All those features were needed and asked for by users and operators, but let's be honest, some of them should never have ended up in the Ceilometer code repository, especially not all at the same time.

The reality is we picked a pragmatic approach due to the rigidity of the OpenStack Technical Committee in regards to new projects to become OpenStack integrated – and, therefore, blessed – projects. Ceilometer was actually the first project to be incubated and then integrated. We had to go through the very first issues of that process.

Fortunately, now that time has passed, and all those constraints have been relaxed. To me, the OpenStack Foundation is turning into something that looks like the Apache Foundation, and there's, therefore, no need to tie technical solutions to political issues.

Indeed, the Big Tent now allows much more flexibility to all of that. Back a year ago, we were afraid to bring Gnocchi into Ceilometer. Was the Technical Committee going to review the project? Was the project going to be in the scope of Ceilometer for the Technical Committee? Now we don't have to ask ourselves those questions, now that we have that freedom, it empowers us to actually do what we think is good in term of technical design without worrying too much about political issues.

Ceilometer development activity

Acknowledging Gnocchi

The first step in this new process was to continue working on Gnocchi (a timeserie database and resource indexer designed to overcome historical Ceilometer storage issue) and to decide that it was not the right call to merge it into Ceilometer as some REST API v3, but that it was better to keep it standalone.

We managed to get traction to Gnocchi, getting a few contributors and users. We're even seeing talks proposed to the next Tokyo Summit where people leverage Gnocchi, such as "Service of predictive analytics on cost and performance in OpenStack", "Suveil" and "Cutting Edge NFV On OpenStack: Healing and Scaling Distributed Applications".

We are also doing some progress on pushing Gnocchi outside of the OpenStack community, as it can be a self-sufficient timeserie and resource database that can be used without any OpenStack interaction.

Branching Aodh

Rather than continuing to grow Ceilometer, during the last summit we all decided that it was time to reorganize and split Ceilometer into the different components it is made of, leveraging a more service-oriented architecture. The alarm subsystem of Ceilometer being mostly untied to the rest of Ceilometer, we decided it was the first and perfect candidate to do that. I personally engaged into doing the work and created a new repository with only the alarm code from Ceilometer, named Aodh.

Aodh is an Irish word meaning fire. A word picked so it also had some relation to Heat, and because we have some Irish influence around the project 😁.

This made sense for a lot of reason. First because Aodh can now work completely standalone, using either Ceilometer or Gnocchi as a backend – or any new plugin you'd write. I love the idea that OpenStack projects can work standalone – like Swift does for example – without implying any other OpenStack component. I think it's a proof of good design. Secondly, because it allows us to resonate on a smaller chunk of software – a reason really under-estimated today in OpenStack. I believe that the size of your software should match a certain ratio to the size of your team.

Aodh is, therefore, a new project under the OpenStack Telemetry program (or what remains of OpenStack programs now), alongside Ceilometer and Gnocchi, forked from the original Ceilometer alarm feature. We'll deprecate the latter with the Liberty release, and we'll remove it in the Mitaka release.

Lessons learned

Actually, moving that code out of Ceilometer (in the case of Aodh), or not merging it in (in the case of Gnocchi) had a few side effects that I admit I think we probably under-estimated back then.

Indeed, the code size of Gnocchi or Aodh ended up being much smaller than the entire Ceilometer project – Gnocchi is 7× smaller and Aodh 5x smaller than Ceilometer – and therefore much more easy to manipulate and to hack on. That allowed us to merge dozens of patches in a few weeks, cleaning-up and enhancing a lot of small things in the code. Those tasks are very much harder in Ceilometer, due to the bigger size of the code base and the small size of our team. By having our small team working on smaller chunks of changes – even when it meant actually doing more reviews – greatly improved our general velocity and the number of bugs fixed and features implemented.

On the more sociological side, I think it gave the team the sensation of finally owning the project. Ceilometer was huge, and it was impossible for people to know every side of it. Now, it's getting possible for people inside a team to cover a much larger portion of those smaller project, which gives them a greater sense of ownership and caring. Which ends up being good for the project quality overall.

That also means that we technically decided to have different core teams by project (Ceilometer, Gnocchi, and Aodh) as they all serve different purposes and can all be used standalone or with each others. Meaning we could have contributors completely ignoring other projects.

All of that reminds me some discussion I heard about projects such as Glance, trying to fit new features in - some that are really orthogonal to the original purpose. It's now clear to me that having different small components interacting together that can be completely owned and taken care of by a (small) team of contributors is the way to go. People that can therefore trust each others and easily bring new people in, makes a project really incredibly more powerful. Having a project covering a too wide set of features make things more difficult if you don't have enough manpower. This is clearly an issue that big projects inside OpenStack are facing now, such as Neutron or Nova.

July 28, 2015

Announcing systemd.conf 2015

We are happy to announce the inaugural systemd.conf 2015 conference of the systemd project.

The conference takes place November 5th-7th, 2015 in Berlin, Germany.

Only a limited number of tickets are available, hence make sure to sign up quickly.

For further details consult the conference website.

July 22, 2015

Below is an outline of the various types of touchpads that can be found in the wild. Touchpads aren't simply categorised into a single type, instead they have a set of properties, a combination of number of physical buttons, touch support and physical properties.

Number of buttons

Physically separate buttons

For years this was the default type of touchpads: a touchpad with a separate set of physical buttons below the touch surface. Such touchpads are still around, but most newer models are Clickpads now.

Touchpads with physical buttons usually provide two buttons, left and right. A few touchpads with three buttons exist, and Apple used to have touchpads with a single physical buttons back in the PPC days. Touchpads with only two buttons require the software stack to emulate a middle button. libinput does this when both buttons are pressed simultaneously.

A two-button touchpad, with a two-button pointing stick above.

Note that many Lenovo laptops provide a pointing stick above the touchpad. This pointing stick has a set of physical buttons just above the touchpad. While many users use those as substitute touchpad buttons, they logically belong to the pointing stick. The *40 and *50 series are an exception here, the former had no physical buttons on the touchpad and required the top section of the pad to emulate pointing stick buttons, the *50 series has physical buttons but they are wired to the touchpads. The kernel re-routes those buttons through the trackstick device.


Clickpads are the most common type of touchpads these days. A Clickpad has no separate physical buttons, instead the touchpad itself is clickable as a whole, i.e. a user presses down on the touch area and triggers a physical click. Clickpads thus only provide a single button, everything else needs to be software-emulated.

A clickpad on a Lenovo x220t. Just above the touchpad are the three buttons associated with the pointing stick. Faint markings on the bottom of the touchpad hint at where the software buttons should be.

Right and middle clicks are generated either via software buttons or "clickfinger" behaviour. Software buttons define an area on the touchpad that is a virtual right button. If a finger is in that area when the click happens, the left button event is changed to a right button event. A middle click is either a separate area or emulated when both the left and right virtual buttons are pressed simultaneously.

When the software stack uses the clickfinger method, the number of fingers decide the type of click: a one-finger is a left button, a two-finger click is a right button, a three-finger click is a middle button. The location of the fingers doesn't matter, though there are usually some limits in how the fingers can be distributed (e.g. some implementations try to detect a thumb at the bottom of the touchpad to avoid accidental two-finger clicks when the user intends a thumb click).

The libinput documentation has a section on Clickpad software button behaviour with more detailed illustrations

The touchpad on a T440s has no physical buttons for the pointing stick. The marks on the top of the touchpad hint at the software button position for the pointing stick. Note that there are no markings at the bottom of the touchpad anymore.

Clickpads are labelled by the kernel with the INPUT_PROP_BUTTONPAD input property.


One step further down the touchpad evolution, Forcepads are Clickpads without a physical button. They provide pressure and (at least in Apple's case) have a vibration element that is software-controlled. Instead of the satisfying click of a physical button, you instead get a buzz of happiness. Which apparently feels the same as a click, judging by the reviews I've read so far. A software-controlled click feel has some advantages, it can be disabled for some gestures, modified for others, etc. I suspect that over time Forcepads will become the main touchpad category, but that's a few years away.

Not much to say on the implementation here. The kernel has some ForcePad support but everything else is spotty.

Note how Apple's Clickpads have no markings whatsoever, Apple uses the clickfinger method by default.

Touch capabilities

Single-touch touchpads

In the beginning, there was the single-finger touchpad. This touchpad would simply provide x/y coordinates for a single finger and get mightily confused when more than one finger was present. These touchpads are now fighting with dodos for exhibition space in museums, few of those are still out in the wild.

Pure multi-touch touchpads

Pure multi-touch touchpads are those that can track, i.e. identify the location of all fingers on the touchpad. Apple's touchpads support 16 touches (iirc), others support 5 touches like the Synaptics touchpads when using SMBus.

Pure multi-touch touchpads are the easiest to support, we can rely on the finger locations and use them for scrolling, gestures, etc. These touchpads usually also provide extra information. In the case of the Apple touchpads we get an ellipsis and the orientation of the ellipsis for each touch point. Other touchpads provide a pressure value for each touch point. Though pressure is a bit of a misnomer, pressure is usually directly related to contact area. Since our puny human fingers flatten out as the pressure on the pad increases, the contact area increases and the firmware then calculates that back into a (mostly rather arbitrary) pressure reading.

Because pressure is really contact area size, we can use it to detect accidental palm contact or thumbs though it's fairly unreliable. A light palm touch or a touch at the very edge of a touchpad will have a low pressure reading simply because the palm is mostly next to the touchpad and thus the contact area itself remains small.

Partial multi-touch touchpads

The vast majority of touchpads fall into this category. It's the half-way point between single-touch and pure multi-touch. These devices can track N fingers, but detect more than N. The current Synaptics touchpads fall into that category when they're using the serial protocol. Most touchpads that fall into this category can track two fingers and detect up to four or five. So a typical three-finger interaction would give you the location of two fingers and a separate value telling you that a third finger is down.

The lack of finger location doesn't matter for some interactions (tapping, three-finger click) but it can cause issues in some cases. For example, a user may have a thumb resting on a touchpad while scrolling with two fingers. Which touch locations you get depends on the order of the fingers being set down, i.e. this may look like thumb + finger + third touch somewhere (lucky!) or two fingers scrolling + third touch somewhere (unlucky, this looks like a three-finger swipe). So far we've mostly avoided having anything complex enough that requires the exact location of more than two fingers, these pads are so prevalent that any complex feature would exclude the majority of users.

Semi-mt touchpads

A sub-class of partial multi-touch touchpads. These touchpads can technically detect two fingers but the location of both is limited to the bounding box, i.e. the first touch is always the top-left one and the second touch is the bottom-right one. Coordinates jump around as fingers move past each other. Most semi-mt touchpads also have a lower resolution for two touches than for one, so even things like two-finger scrolling can be very jumpy.

Semi-mt are labelled by the kernel with the INPUT_PROP_SEMI_MT input property.

Physical properties

External touchpads

USB or Bluetooth touchpads not in a laptop chassis. Think the Apple Magic Trackpad, the Logitech T650, etc. These are usually clickpads, the biggest difference is that they can be removed or added at runtime. One interaction method that is only possible on external touchpads is a thumb resting on the very edge/immediately next to the touchpad. On the far edge, touchpads don't always detect the finger location so clicking with a thumb barely touching the edge makes it hard or impossible to figure out which software button area the finger is on.

These touchpads also don't need palm detection - since they're not located underneath the keyboard, accidental palm touches are a non-issue.

A Logitech T650 external touchpad. Note the thumb position, it is possible to click the touchpad without triggering a touch.

Circular touchpads

Yes, used to be a thing. Touchpad shaped in an ellipsis or circle. Luckily for us they have gone full dodo. The X.Org synaptics driver had to be aware of these touchpads to calculate the right distance for edge scrolling - unsurprisingly an edge scroll motion on a circular touchpad isn't very straight.

Graphics tablets

Touch-capable graphics tablets are effectively external touchpads, with two differentiators: they are huge compared to normal touchpads and they have no touchpad buttons whatsoever. This means they can either work like a Forcepad, or rely on interaction methods that don't require buttons (like tap-to-click). Since the physical device is shared with the pen input, some touch arbitration is required to avoid touch input interfering when the pen is in use.

Dedicated edge scroll area

Mostly on older touchpads before two-finger scrolling became the default method. These touchpads have a marking on the touch area that designates the edge to be used for scrolling. A finger movement in that edge zone should trigger vertical motions. Some touchpads have markers for a horizontal scroll area too at the bottom of the touchpad.

A touchpad with a marked edge scroll area on the right.
July 21, 2015

I've had a lot of thought about Mont. (Sorry for the rhymes.) Mont, recall, is the set of all Monte objects. I have a couple interesting thoughts on Mont that I'd like to share, but the compelling result I hope to convince readers of is this: Mont is a simple and easy-to-think-about category once we define an appropriate sort of morphism. By "category" I mean the fundamental building block of category theory, and most of the maths I'm going to use in this post is centered around that field. In particular, "morphism" is used in the sense of categories.

I'd like to put out a little lemma from the other day, first. Let us say that the Monte == operator is defined as follows: For any two objects in Mont, x and y, x == y if and only if x is y, or for any message [verb, args] sendable to these objects, M.send(x, verb, args) == M.send(y, verb, args). In other words, x == y if it is not possible to distinguish x and y by any chain of sent messages. This turns out to relate to the category definition I give below. It also happens to correlate nicely with the idea of equivalence, in that == is an equivalence relation on Mont! The proof:

  • Reflexivity: For any object x, x == x. The first branch of the definition handles this.
  • Symmetry: For any objects x and y, x == y iff y == x. Identity is symmetric, and the second branch is covered by recursion.
  • Transitivity: For any objects x, y, and z, x == y and y == z implies x == z. Yes, if x and y can't be told apart, then if y and z also can't be told apart it immediately follows that x and z are likewise indistinguishable.

Now, obviously, since objects can do whatever computation they like, the actual implementation of == has to be conservative. We generally choose to be sound and incomplete; thus, x == y sometimes has false negatives when implemented in software. We can't really work around this without weakening the language considerably. Thus, when I talk about Mont/==, please be assured that I'm talking more about the ideal than the reality. I'll try to address spots where this matters.

Back to categories. What makes a category? Well, we need a set, some morphisms, and a couple proofs about the behavior of those morphisms. First, the set. I'll use Mont-DF for starters, but eventually we want to use Mont. Not up on my posts? Mont-DF is the subset of Mont where objects are transitively immutable; this is extremely helpful to us since we do not have to worry about mutable state nor any other side effect. (We do have to worry about identity, but most of my results are going to be stated as holding up to equivalence. I am not really concerned with whether there are two 42 objects in Mont right now.)

My first (and, spoiler alert, failed) attempt at defining a category was to use messages as morphisms; that is, to go from one object to another in Mont-DF, send a message to the first object and receive the second object. Clear, clean, direct, simple, and corresponds wonderfully to Monte's semantics. However, there's a problem. The first requirement of a category is that, for any object in the set, there exists an identity morphism, usually called 1, from that object to itself. This is a problem in Monte. We can come up with a message like that for some objects, like good old 42, which responds to ["add", [0]] with 42. (Up to equivalence, of course!) However, for some other objects, like object o as DeepFrozen {}, there's no obvious methods to use.

The answer is to add a new Miranda method which is not overrideable called _magic/0. (Yes, if this approach would have worked, I would have picked a better name.) Starting from Mont-DF, we could amend all objects to get a new set, Mont-DF+Magic, in which the identity morphism is always ["_magic", []]. This neatly wraps up the identity morphism problem.

Next, we have to figure out how to compose messages. At first blush, this is simple; if we start from x and send it some message to get y, and then send another message to y to get z, then we obviously can get to x from z. However, here's the rub: There might not be any message directly from x to z! We're stuck here. Unlike with other composition operators, there's no hand-wavey way to compose messages like with functions. So this is bunk.

However, we can cheat gently and use the free monoid a.k.a. the humble list. A list of messages will work just fine: To compose them, simply catenate the lists, and the identity morphism is the empty list. Putting it all together, a morphism from 6 to 42 might be [["multiply", [7]]], and we could compose that with [["asString", []]] to get [["multiply", [7]], ["asString", []]], a morphism from 6 to "42". Not shabby at all!

There we go. Now Mont-DF is a category up to equivalence. The (very informally defined) set of representatives of equivalence classes via ==, which I'll call Mont-DF/==, is definitely a category here as well, since it encapsulates the equivalence question. We could alternatively insist that objects in Mont-DF are unique (or that equivalent definitions of objects are those same objects), but I'm not willing to take up that sword this time around, mostly because I don't think that it's true.

"Hold up," you might say; "you didn't prove that Mont is a category, only Mont-DF." Curses! I didn't fool you at all, did I? Yes, you're right. We can't extend this result to Mont wholesale, since objects in Mont can mutate themselves. In fact, Mont doesn't really make sense to discuss in this way, since objects in sets aren't supposed to be mutable. I'm probably going to have to extend/alter my definition of Mont in order to get anywhere with that.

July 16, 2015

In a perfect world, any device that advertises absolute x/y axes also advertises the resolution for those axes. Alas, not all of them do. For libinput, this problem is two-fold: parts of the touchscreen API provide data in mm - without knowing the resolution this is a guess at best. But it also matters for touchpads, where a lack of resolution is a lot more common (though the newest generations of major touchpad manufacturers tend to advertise resolutions now).

We have a number of features that rely on the touchpad resolution: from the size of the software button to deciding which type of palm detection we need, it all is calculated based on physical measurements. Until recently, we had code to differentiate between touchpads with resolution and most of the special handling was a matter of magic numbers, usually divided by the diagonal of the touchpad in device units. This made code maintenance more difficult - without testing each device, behaviour could not be guaranteed.

With libinput 0.20, we now got rid of this special handling and instead require the touchpads to advertise resolutions. This requires manual intervention, so we're trying to fix this in multiple places, depending on the confidence of the data. We have hwdb entries for the bcm5974 (Apple) touchpads and the Chromebook Pixel. For Elantech touchpads, a kernel patch is currently waiting for merging. For ALPS touchpads, we ship size hints with libinput's hwdb. If all that fails, we fall back to a default touchpad size of 69x55mm. [1]

All this affects users in two ways: one is that you may notice a slightly different behaviour of your touchpad now. The software-buttons may be bigger or smaller than before, pointer acceleration may be slightly different, etc. Shouldn't be too bad, but you may just notice it. The second noticeable change is that libinput will now log when it falls back to the default size. If you notice a message like that in your log, please file a bug and attach the output of evemu-describe and the physical dimensions of your touchpad. Once we have that information, we can add it to the right place and make sure that everyone else with that touchpad gets the right settings out of the box.

[1] The default size was chosen because it's close enough to what old touchpads used to be, and those are most likely to lack resolution values. This size may change over time as we get better data.

July 15, 2015

With the release of Solaris 11.3 beta, I've gone back and made a new list of changes to the bundled software packages available in the Solaris IPS package repository, as I've done for the Solaris 11.1, Solaris 11.2 beta, and the Solaris 11.2 GA releases.

Oracle packages

Several bundled packages improve integration with other Oracle software. The Oracle Instant Client packages are now in the IPS repo for building software that connects to Oracle databases. MySQL 5.6 has also been added alongside the existing version 5.5 packages.

The Java runtime & developer kits for Java 7 & 8 were updated to new versions, while the Java 6 versions were removed as its support life winds down. The End of Feature Notices for Oracle Solaris 11 warns that Java 7 will be coming out as well in a later release.

Also updated was Oracle Hardware Management Pack (HMP), a set of tools that work with the ILOM, firmware, and other components in Sun/Oracle servers to configure low-level system options. HMP 2.2 was introduced in Solaris 11.2, and Solaris 11.3 now delivers HMP 2.3 packages.

Python packages

Solaris has long included and depended on Python 2. Solaris 11.3 adds Python 3 support for the first time, with the bundling of Python 3.4 and many module packages that work with it. Python 2.7 is still included, as is 2.6 for now, but Python 2 software in Solaris is almost completely switched over to 2.7 now, and Python 2.6 will be obsoleted soon.

A side effect of these changes was a revamping of the naming pattern for Python module packages in IPS - previously most modules delivered a set of packages following the pattern:

  • library/python-2/<module name>
  • library/python-2/<module name>-<for each Python version>
For example, there were three Mako packages, library/python-2/mako, library/python-2/mako-26, library/python-2/mako-27, where the latter two installed the modules built for the named versions of Python, and the first uses IPS conditional dependencies to install the modules for any Python versions that were installed on the system.

In extending this to provide Python 3 modules, it was decided to drop the python major version from the library/python-N prefix, leaving just the version at the end of the module name. Thus in Solaris 11.3, you'll see that the mako packages are now library/python/mako, library/python/mako-26, library/python/mako-27, and library/python/mako-34.

NVIDIA graphics driver packages

NVIDIA has been providing graphics driver packages for Solaris for almost a decade now. As new families and models of graphics cards are regularly introduced, they retire support for older generations from time to time in the new drivers. Support for these models is retained in a legacy driver, but that requires uninstalling the latest version and switching to a legacy branch. Previously that meant installing NVDIA's SVR4 package release instead of IPS, losing the ability to get updates with a simple “pkg update” command.

Now the legacy drivers are also available in IPS packages, which will continue to be updated as necessary for bug fixes and support for new Xorg releases during NVIDIA’s Support timeframes for Unix legacy GPU releases. To switch to the version 340 legacy driver on Solaris 11.3 or the later Solaris 11.2 SRU’s simply run:

  # pkg install --reject driver/graphics/nvidia driver/graphics/nvidiaR340 
and then reboot into the new BE created. For the previous version 304, change the above command to end in nvidiaR304 instead.

Other packages

There are far more changes than I've covered here - fortunately, the engineers who worked on many of these changes have written their own blog posts about them for you to check out:

One more thing... Solaris 11.2 packages

While all these are available now in the Solaris 11.3 beta, many are also available for testing and evaluation on existing Solaris 11.2 systems, when you're ready to upgrade a FOSS package, but not the rest of the OS. This is planned to be an ongoing program, so once Solaris 11.3 is officially released, the evaluation packages will keep moving forward to new versions of many packages. More details are available in a Solaris FOSS blog post and an article in the Solaris 11 OTN community space.

Not all packages are available in the evaluation program though, since some depend on OS changes not in Solaris 11.2. For instance, OpenSSH is not available for Solaris 11.2, since it depends on changes to the existing SunSSH packages that allow for the ssh package mediator to choose which ssh software to use on a given system.

Detailed list of changes

This table shows most of the changes to the bundled packages between the original Solaris 11.2.0 release, the latest Solaris 11.2 support repository update (SRU11, aka 11.2.11, released June 13, 2015), and the Solaris 11.3 beta released today. These show the versions they were released with, and not later versions that may now be available via the new FOSS Evaluation Packages for existing Solaris releases.

As with last time, some were excluded for clarity, or to reduce noise and duplication. All of the bundled packages which didn’t change the version number in their packaging info are not included, even if they had updates to fix bugs, security holes, or add support for new hardware or new features of Solaris.

PackageUpstream11. Beta
cloud/openstack OpenStack0.2013.2.30.2014.2.20.2014.2.2
cloud/openstack/cinder OpenStack0.2013.2.30.2014.2.20.2014.2.2
cloud/openstack/glance OpenStack0.2013.2.30.2014.2.20.2014.2.2
cloud/openstack/heat OpenStacknot included0.2014.2.20.2014.2.2
cloud/openstack/horizon OpenStack0.2013.2.30.2014.2.20.2014.2.2
cloud/openstack/keystone OpenStack0.2013.2.30.2014.2.20.2014.2.2
cloud/openstack/neutron OpenStack0.2013.2.30.2014.2.20.2014.2.2
cloud/openstack/nova OpenStack0.2013.2.30.2014.2.20.2014.2.2
cloud/openstack/swift OpenStack1.
communication/im/pidgin Pidgin2.
compress/pigz pigznot included2.
crypto/gnupg GnuPG2.
database/mysql-56 MySQLnot included
(MySQL 5.5 in database/mysql-56)
database/sqlite-3 SQLite3.
developer/build/ant Apache Ant1.
developer/documentation-tool/help2man GNU help2mannot includednot included1.46.1
developer/documentation-tool/xmlto xmltonot includednot included0.0.25
developer/java/jdk-6 Java1.6.0.75
(Java SE 6u75)
(Java SE 6u95)
not included
developer/java/jdk-7 Java1.7.0.65
(Java SE 7u65)
(Java SE 7u80)
(Java SE 7u80)
developer/java/jdk-8 Java1.8.0.11
(Java SE 8u11)
(Java SE 8u45)
(Java SE 8u45)
developer/test/check checknot includednot included0.9.14
developer/versioning/mercurial Mercurial SCM2.
developer/versioning/subversion Apache Subversion1.
diagnostic/nicstat nicstatnot includednot included1.95
diagnostic/tcpdump tcpdump4.
diagnostic/wireshark Wireshark1.
driver/graphics/nvidia NVIDIA0.331.38.00.346.35.00.346.35.0
driver/graphics/nvidiaR304 NVIDIAnot included0.304.125.00.304.125.0
driver/graphics/nvidiaR340 NVIDIAnot included0.340.65.00.340.65.0
file/mc GNU Midnight Commander4.
library/apr-15 Apache Portable Runtimenot includednot included1.5.1
library/c++/net6 Gobby1.
library/jansson Janssonnot includednot included2.7
library/json-c JSON-C0.90.90.12
library/libee libee0.
library/libestr libestr0.
library/libgsl GNU GSLnot includednot included1.16
library/liblogging LibLoggingnot includednot included1.0.4
library/libmicrohttpd GNU Libmicrohttpdnot includednot included0.9.37
library/libmilter Sendmail8.
library/libxml2 XML C parser2.
library/neon neon0.
library/perl-5/openscap-512 OpenSCAP1.
library/perl-5/xml-libxml CPAN: XML::LibXML2.142.142.121
was library/python-2/alembic
was library/python-2/amqp
library/python/barbicanclient OpenStacknot included3.
was library/python-2/boto
library/python/ceilometerclient OpenStack1.
library/python/cinderclient OpenStack1.
was library/python-2/cliff
library/python/django Django1.
library/python/django-pyscss django-pyscssnot included1.
was library/python-2/django_compressor
was library/python-2/django_openstack_auth
was library/python-2/eventlet
library/python/futures pythonfuturesnot included2.
library/python/glance_store OpenStacknot included0.
library/python/glanceclient OpenStack0.
was library/python-2/greenlet
library/python/heatclient OpenStack0.
library/python/iniparse iniparsenot included0.40.4
library/python/ipaddr ipaddr-pynot included2.
library/python/jinja2 Jinja2.
library/python/keystoneclient OpenStack0.
library/python/keystonemiddleware OpenStack not included1.
was library/python-2/kombu
library/python/ldappool ldappoolnot included1.01.0
was library/python-2/netaddr
was library/python-2/netifaces
library/python/networkx NetworkXnot included1.
library/python/neutronclient OpenStack2.
library/python/novaclient OpenStack2.
library/python/oauthlib OAuthLibnot included0.
library/python/openscap OpenSCAP1.
library/python/oslo.config OpenStack1.
library/python/oslo.context OpenStacknot included0.
library/python/oslo.db OpenStacknot included1.
library/python/oslo.i18n OpenStacknot included1.
library/python/oslo.messaging OpenStacknot included1.
library/python/oslo.middleware OpenStacknot included0.
library/python/oslo.serialization OpenStacknot included1.
library/python/oslo.utils OpenStacknot included1.
library/python/oslo.vmware OpenStacknot included0.
library/python/osprofiler OpenStacknot included0.
was library/python-2/pep8
PyPI: pep81.
was library/python-2/pip
library/python/posix_ipc POSIX IPC for Pythonnot included0.
was library/python-2/py
library/python/pycadf OpenStacknot included0.
was library/python-2/pyflakes
library/python/pyscss pyScssnot included1.
library/python/pysendfile pysendfilenot included2.
was library/python-2/pytest
was library/python-2/python-mysql
was library/python-2/pytz
was library/python-2/requests
library/python/retrying Retryingnot included1.
library/python/rfc3986 rfc3986not included0.
library/python/saharaclient OpenStacknot included0.
was library/python-2/setuptools
PyPI: setuptools0.
library/python/simplegeneric PyPI: simplegenericnot included0.
was library/python-2/simplejson
library/python/six PyPI: six1.
was library/python-2/sqlalchemy
was library/python-2/sqlalchemy-migrate
was library/python-2/stevedore
library/python/swiftclient OpenStack2.
library/python/taskflow OpenStacknot included0.
was library/python-2/tox
library/python/troveclient OpenStack0.
was library/python-2/virtualenv
library/python/websockify Websockify0.
library/python/wsme wsmenot included0.
library/ruby/hiera Puppetnot included1.
library/security/libassuan GnuPG2.
library/security/libksba GnuPG1.
library/security/openssl OpenSSL1.0.1.8 (1.0.1h) (1.0.1m) (1.0.1o)
library/unixodbc unixODBC2.
library/zlib zlib1.
mail/mailman GNU Mailmannot includednot included2.1.18.1
network/dns/bind ISC BIND9.
network/firewall OpenBSD PFnot includednot included5.5
network/mtr MTRnot includednot included0.86
network/openssh OpenSSHnot includednot included6.5.0.1
network/rsync rsync3.
print/filter/hplip HPLIP3.
runtime/erlang erlang15.2.317.517.5
runtime/java/jre-6 Java1.6.0.75
(Java SE 6u75)
(Java SE 6u95)
not included
runtime/java/jre-7 Java1.7.0.65
(Java SE 7u65)
(Java SE 7u80)
(Java SE 7u80)
runtime/java/jre-8 Java1.8.0.11
(Java SE 8u11)
(Java SE 8u45)
(Java SE 8u45)
runtime/python-27 Python2.
runtime/python-34 Pythonnot includednot included3.4.3
runtime/ruby-21 Rubynot included
(Ruby 1.9.3 in runtime/ruby-19)
security/compliance/openscap OpenSCAP1.
security/sudo Sudo1.
service/network/dns/bind ISC BIND9.
service/network/ftp ProFTPD1. (1.3.4c)
service/network/ntp NTP4.2.7.381 (4.2.7p381) (4.2.8p2) (4.2.8p2)
service/network/samba Samba3.
service/network/smtp/postfix Postfixnot includednot included2.11.3
service/network/smtp/sendmail Sendmail8.
shell/bash GNU bash4.
shell/watch procps-ngnot includednot included3.3.10
shell/zsh Zsh5.
system/data/hardware-registry pci.ids
system/data/timezone IANA Time Zone Data0.5.11 (2014c)0.5.11 (2015d)2015.4 (2015d)
system/font/truetype/google-droid Droid Fonts0.2010.2.240.2010.2.240.2013.6.7
system/library/freetype-2 FreeType2.
system/library/hmp-libs Oracle HMP2.
system/library/i18n/libthai libthai0.
system/library/libdatrie datrie0.
system/management/biosconfig Oracle HMP2.
system/management/facter Puppet1.
system/management/fwupdate Oracle HMP2.
system/management/fwupdate/qlogic Oracle HMP1.
system/management/hmp-snmp Oracle HMP2.
system/management/hwmgmtcli Oracle HMP2.
system/management/hwmgmtd Oracle HMP2.
system/management/ocm Oracle Configuration Manager12.
system/management/puppet Puppet3.
system/management/raidconfig Oracle HMP2.
system/management/ubiosconfig Oracle HMP2.
system/rsyslog rsyslog6.
system/test/sunvts Oracle VTS7.
terminal/tmux tmux1.
text/gnu-patch GNU Patch2.
text/groff GNU troff1.
text/less Less436436458
text/text-utilities util-linuxnot includednot included2.24.2
web/browser/firefox Mozilla Firefox17.0.1131.
web/browser/links Links1.
web/curl cURL7.
web/java-servlet/tomcat Apache Tomcat6.0.416.0.436.0.43
web/java-servlet/tomcat-8 Apache Tomcatnot includednot included8.0.21
web/novnc noVNCnot included0.50.5
web/php-53 PHP5.
web/php-56 PHPnot includednot included5.6.8
web/php-56/extension/php-suhosin-extension Suhosinnot includednot included0.9.37.1
web/php-56/extension/php-xdebug Xdebugnot includednot included2.3.2
web/server/apache-22 Apache HTTPD2.
web/server/apache-22/module/apache-jk Apache Tomcat1.
web/server/apache-22/module/apache-security ModSecurity2.
web/server/apache-22/module/apache-wsgi mod_wsgi3.
web/server/apache-24 Apache HTTPDnot includednot included2.4.12
web/server/apache-24/module/apache-dtrace Apache DTrace modulenot includednot included0.3.1
web/server/apache-24/module/apache-fcgid Apache mod_fcgidnot includednot included2.3.9
web/server/apache-24/module/apache-jk Apache Tomcatnot includednot included1.2.40
web/server/apache-24/module/apache-security ModSecuritynot includednot included2.8.0
mod_wsginot includednot included4.3.0
web/wget GNU wget1.141.161.16
x11/server/xorg/driver/xorg-input-keyboard X.Org1.
x11/server/xorg/driver/xorg-input-mouse X.Org1.
x11/server/xorg/driver/xorg-input-synaptics X.Org1.
x11/server/xorg/driver/xorg-video-ast X.Org0.
x11/server/xorg/driver/xorg-video-dummy X.Org0.
x11/server/xorg/driver/xorg-video-mga X.Org1.
x11/server/xorg/driver/xorg-video-vesa X.Org2.

The libinput test suite takes somewhere around 35 minutes now for a full run. That's annoying, especially as I'm running it for every commit before pushing. I've tried optimising things, but attempts at making it parallel have mostly failed so far (almost all tests need a uinput device created) and too many tests rely on specific timeouts to check for behaviours. Containers aren't an option when you have to create uinput devices so I started out farming out into VMs.

Ideally, the test suite should run against multiple commits (on multiple VMs) at the same time while I'm working on some other branch and then accumulate the results. And that's where git notes come in. They're a bit odd to use and quite the opposite of what I expected. But in short: a git note is an object that can be associated with a commit, without changing the commit itself. Sort-of like a post-it note attached to the commit. But there are plenty of limitations, for example you can only have one note (per namespace) and merge conflicts are quite easy to trigger. Look at any git notes tutorial to find out more, there's plenty out there.

Anyway, dealing with merge conflicts is a no-go for me here. So after a bit of playing around, I found something that seems to work out well. A script to run make check and add notes to the commit, combined with a repository setup to fetch those notes and display them automatically. The core of the script is this:

make check
if [ $? -eq 0 ]; then

if [ -n "$sha" ]; then
git notes --ref "test-$HOSTNAME" append \
-m "$status: $HOSTNAME: make check `date`" HEAD
exit $rc
Then in my main repository, I add each VM as a remote, adding a fetch path for the notes:

[remote "f22-libinput1"]
url = f22-libinput1.local:/home/whot/code/libinput
fetch = +refs/heads/*:refs/remotes/f22-libinput1/*
fetch = +refs/notes/*:refs/notes/f22-libinput1/*
Finally, in the main repository, I extended the glob that displays notes to 'everything':

$ git config notes.displayRef "*"
Now git log (and by extension tig) displays all notes attached to a commit automatically. All that's needed is a git fetch --all to fetch everything and it's clear in the logs which commit fails and which one succeeded.

:: whot@jelly:~/code/libinput (master)> git log
commit 6896bfd3f5c3791e249a0573d089b7a897c0dd9f
Author: Peter Hutterer
Date: Tue Jul 14 14:19:25 2015 +1000

test: check for fcntl() return value

Mostly to silence coverity complaints.

Signed-off-by: Peter Hutterer

Notes (f22-jelly/test-f22-jelly):
SUCCESS: f22-jelly: make check Tue Jul 14 00:20:14 EDT 2015

Whenever I look at the log now, I immediately see which commits passed the test suite and which ones didn't (or haven't had it run yet). The only annoyance is that since a note is attached to a commit, amending the commit message or rebasing makes the note "go away". I've copied notes manually after this, but it'd be nice to find a solution to that.

Everything else has been working great so far, but it's quite new so there'll be a bit of polishing happening over the next few weeks. Any suggestions to improve this are welcome.

July 09, 2015

Update June 09, 2015: edge scrolling for clickpads has been merged. Will be availble in libinput 0.20. Consider the rest of this post obsolete.

libinput supports edge scrolling since version 0.7.0. Whoops, how does the post title go with this statement? Well, libinput supports edge scrolling, but only on some devices and chances are your touchpad won't be one of them. Bug 89381 is the reference bug here.

First, what is edge scrolling? As the libinput documentation illustrates, it is scrolling triggered by finger movement within specific regions of the touchpad - the left and bottom edges for vertical and horizontal scrolling, respectively. This is in contrast to two-finger scrolling, triggered by a two-finger movement, anywhere on the touchpad. synaptics had edge scrolling since at least 2002, the earliest commit in the repo. Back then we didn't have multitouch-capable touchpads, these days they're the default and you'd be struggling to find one that doesn't support at least two fingers. But back then edge-scrolling was the default, and touchpads even had the markings for those scroll edges painted on.

libinput adds a whole bunch of features to the touchpad driver, but those features make it hard to support edge scrolling. First, libinput has quite smart software button support. Those buttons are usually on the lowest ~10mm of the touchpad. Depending on finger movement and position libinput will send a right button click, movement will be ignored, etc. You can leave one finger in the button area while using another finger on the touchpad to move the pointer. You can press both left and right areas for a middle click. And so on. On many touchpads the vertical travel/physical resistance is enough to trigger a movement every time you click the button, just by your finger's logical center moving.

libinput also has multi-direction scroll support. Traditionally we only sent one scroll event for vertical/horizontal at a time, even going as far as locking the scroll direction. libinput changes this and only requires a initial threshold to start scrolling, after that the caller will get both horizontal and vertical scroll information. The reason is simple: it's context-dependent when horizontal scrolling should be used, so a global toggle to disable doesn't make sense. And libinput's scroll coordinates are much more fine-grained too, which is particularly useful for natural scrolling where you'd expect the content to move with your fingers.

Finally, libinput has smart palm detection. The large majority of palm touches are along the left and right edges of the touchpad and they're usually indistinguishable from finger presses (same pressure values for example). Without palm detection some laptops are unusable (e.g. the T440 series).

These features interfere heavily with edge scrolling. Software button areas are in the same region as the horizontal scroll area, palm presses are in the same region as the vertical edge scroll area. The lower vertical edge scroll zone overlaps with software buttons - and that's where you would put your finger if you'd want to quickly scroll up in a document (or down, for natural scrolling). To support edge scrolling on those touchpads, we'd need heuristics and timeouts to guess when something is a palm, a software button click, a scroll movement, the start of a scroll movement, etc. The heuristics are unreliable, the timeouts reduce responsiveness in the UI. So our decision was to only provide edge scrolling on touchpads where it is required, i.e. those that cannot support two-finger scrolling, those with physical buttons. All other touchpads provide only two-finger scrolling. And we are focusing on making 2 finger scrolling good enough that you don't need/want to use edge scrolling (pls file bugs for anything broken)

Now, before you get too agitated: if edge scrolling is that important to you, invest the time you would otherwise spend sharpening pitchforks, lighting torches and painting picket signs into developing a model that allows us to do reliable edge scrolling in light of all the above, without breaking software buttons, maintaining palm detection. We'd be happy to consider it.

July 08, 2015

In my previous post I introduced ARB_shader_storage_buffer, an OpenGL 4.3 feature that is coming soon to Mesa and the Intel i965 driver. While that post focused on explaining the features introduced by the extension, in this post I’ll dive into some of the implementation aspects, for those who are curious about this kind of stuff. Be warned that some parts of this post will be specific to Intel hardware.

Following the trail of UBOs

As I explained in my previous post, SSBOs are similar to UBOs, but they are read-write. Because there is a lot of code already in place in Mesa’s GLSL compiler to deal with UBOs, it made sense to try and reuse all the data structures and code we had for UBOs and specialize the behavior for SSBOs where that was needed, that allows us to build on code paths that are already working well and reuse most of the code.

That path, however, had some issues that bit me a bit further down the road. When it comes to representing these operations in the IR, my first idea was to follow the trail of UBO loads as well, which are represented as ir_expression nodes. There is a fundamental difference between the two though: UBO loads are constant operations because uniform buffers are read-only. This means that a UBO load operation with the same parameters will always return the same value. This has implications related to certain optimization passes that work based on the assumption that other ir_expression operations share this feature. SSBO loads are not like this: since the shader storage buffer is read-write, two identical SSBO load operations in the same shader may not return the same result if the underlying buffer storage has been altered in between by SSBO write operations within the same or other threads. This forced me to alter a number of optimization passes in Mesa to deal with this situation (mostly disabling them for the cases of SSBO loads and stores).

The situation was worse with SSBO stores. These just did not fit into ir_expression nodes: they did not return a value and had side-effects (memory writes) so we had to come up with a different way to represent them. My initial implementation created a new IR node for these, ir_ssbo_store. That worked well enough, but it left us with an implementation of loads and stores that was a bit inconsistent since both operations used very different IR constructs.

These issues were made clear during the review process, where it was suggested that we used GLSL IR intrinsics to represent load and store operations instead. This has the benefit that we can make the implementation more consistent, having both loads and stores represented with the same IR construct and follow a similar treatment in both the GLSL compiler and the i965 backend. It would also remove the need to disable or alter certain optimization passes to be SSBO friendly.

Read/Write coherence

One of the issues we detected early in development was that our reads and writes did not seem to work very well together: some times a read after a write would fail to see the last value written to a buffer variable. The problem here also spawned from following the implementation trail of the UBO path. In the Intel hardware, there are various interfaces to access memory, like the Sampling Engine and the Data Port. The former is a read-only interface and is used, for example, for texture and UBO reads. The Data Port allows for read-write access. Although both interfaces give access to the same memory region, there is something to consider here: if you mix reads through the Sampling Engine and writes through the Data Port you can run into cache coherence issues, this is because the caches in use by the Sampling Engine and the Data Port functions are different. Initially, we implemented SSBO load operations like UBO loads, so we used the Sampling Engine, and ended up running into this problem. The solution, of course, was to rewrite SSBO loads to go though the Data Port as well.

Parallel reads and writes

GPUs are highly parallel hardware and this has some implications for driver developers. Take a sentence like this in a fragment shader program:

float cx = 1.0;

This is a simple assignment of the value 1.0 to variable cx that is supposed to happen for each fragment produced. In Intel hardware running in SIMD16 mode, we process 16 fragments simultaneously in the same GPU thread, this means that this instruction is actually 16 elements wide. That is, we are doing 16 assignments of the value 1.0 simultaneously, each one is stored at a different offset into the GPU register used to hold the value of cx.

If cx was a buffer variable in a SSBO, it would also mean that the assignment above should translate to 16 memory writes to the same offset into the buffer. That may seem a bit absurd: why would we want to write 16 times if we are always assigning the same value? Well, because things can get more complex, like this:

float cx = gl_FragCoord.x;

Now we are no longer assigning the same value for all fragments, each of the 16 values assigned with this instruction could be different. If cx was a buffer variable inside a SSBO, then we could be potentially writing 16 different values to it. It is still a bit silly, since only one of the values (the one we write last), would prevail.

Okay, but what if we do something like this?:

int index = int(mod(gl_FragCoord.x, 8));
cx[index] = 1;

Now, depending on the value we are reading for each fragment, we are writing to a separate offset into the SSBO. We still have a single assignment in the GLSL program, but that translates to 16 different writes, and in this case the order may not be relevant, but we want all of them to happen to achieve correct behavior.

The bottom line is that when we implement SSBO load and store operations, we need to understand the parallel environment in which we are running and work with test scenarios that allow us to verify correct behavior in these situations. For example, if we only test scenarios with assignments that give the same value to all the fragments/vertices involved in the parallel instructions (i.e. assignments of values that do not depend on properties of the current fragment or vertex), we could easily overlook fundamental defects in the implementation.

Dealing with helper invocations

From Section 7.1 of the GLSL spec version 4.5:

“Fragment shader helper invocations execute the same shader code
as non-helper invocations, but will not have side effects that
modify the framebuffer or other shader-accessible memory.”

To understand what this means I have to introduce the concept of helper invocations: certain operations in the fragment shader need to evaluate derivatives (explicitly or implicitly) and for that to work well we need to make sure that we compute values for adjacent fragments that may not be inside the primitive that we are rendering. The fragment shader executions for these added fragments are called helper invocations, meaning that they are only needed to help in computations for other fragments that are part of the primitive we are rendering.

How does this affect SSBOs? Because helper invocations are not part of the primitive, they cannot have side-effects, after they had served their purpose it should be as if they had never been produced, so in the case of SSBOs we have to be careful not to do memory writes for helper fragments. Notice also, that in a SIMD16 execution, we can have both proper and helper fragments mixed in the group of 16 fragments we are handling in parallel.

Of course, the hardware knows if a fragment is part of a helper invocation or not and it tells us about this through a pixel mask register that is delivered with all executions of a fragment shader thread, this register has a bitmask stating which pixels are proper and which are helper. The Intel hardware also provides developers with various kinds of messages that we can use, via the Data Port interface, to write to memory, however, the tricky thing is that not all of them incorporate pixel mask information, so for use cases where you need to disable writes from helper fragments you need to be careful with the write message you use and select one that accepts this sort of information.

Vector alignments

Another interesting thing we had to deal with are address alignments. UBOs work with layout std140. In this setup, elements in the UBO definition are aligned to 16-byte boundaries (the size of a vec4). It turns out that GPUs can usually optimize reads and writes to multiples of 16 bytes, so this makes sense, however, as I explained in my previous post, SSBOs also introduce a packed layout mode known as std430.

Intel hardware provides a number of messages that we can use through the Data Port interface to write to memory. Each message has different characteristics that makes it more suitable for certain scenarios, like the pixel mask I discussed before. For example, some of these messages have the capacity to write data in chunks of 16-bytes (that is, they write vec4 elements, or OWORDS in the language of the technical docs). One could think that these messages are great when you work with vector data types, however, they also introduce the problem of dealing with partial writes: what happens when you only write to an element of a vector? or to a buffer variable that is smaller than the size of a vector? what if you write columns in a row_major matrix? etc

In these scenarios, using these messages introduces the need to mask the writes because you need to disable the channels in the vec4 element that you don’t want to write. Of course, the hardware provides means to do this, we only need to set the writemask of the destination register of the message instruction to select the right channels. Consider this example:

struct TB {
    float a, b, c, d;

layout(std140, binding=0) buffer Fragments {
   TB s[3];
   int index;

void main()
   s[0].d = -1.0;

In this case, we could use a 16-byte write message that takes 0 as offset (i.e writes at the beginning of the buffer, where s[0] is stored) and then set the writemask on that instruction to WRITEMASK_W so that only the fourth data element is actually written, this way we only write one data element of 4 bytes (-1) at offset 12 bytes (s[0].d). Easy, right? However, how do we know, in general, the writemask that we need to use? In std140 layout mode this is easy: since each element in the SSBO is aligned to a 16-byte boundary, we simply need to take the byte offset at which we are writing, divide it by 16 (to convert it to units of vec4) and the modulo of that operation is the byte offset into the chunk of 16-bytes that we are writing into, then we only have to divide that by 4 to get the component slot we need to write to (a number between 0 and 3).

However, there is a restriction: we can only set the writemask of a register at compile/link time, so what happens when we have something like this?:

s[i].d = -1.0;

The problem with this is that we cannot evaluate the value of i at compile/link time, which inevitably makes our solution invalid for this. In other words, if we cannot evaluate the actual value of the offset at which we are writing at compile/link time, we cannot use the writemask to select the channels we want to use when we don’t want to write a vec4 worth of data and we have to use a different type of message.

That said, in the case of std140 layout mode, since each data element in the SSBO is aligned to a 16-byte boundary you may realize that the actual value of i is irrelevant for the purpose of the modulo operation discussed above and we can still manage to make things work by completely ignoring it for the purpose of computing the writemask, but in std430 that trick won’t work at all, and even in std140 we would still have row_major matrix writes to deal with.

Also, we may need to tweak the message depending on whether we are running on the vertex shader or the fragment shader because not all message types have appropriate SIMD modes (SIMD4x2, SIMD8, SIMD16, etc) for both, or because different hardware generations may not have all the message types or support all the SIMD modes we need need, etc

The point of this is that selecting the right message to use can be tricky, there are multiple things and corner cases to consider and you do not want to end up with an implementation that requires using many different messages depending on various circumstances because of the increasing complexity that it would add to the implementation and maintenance of the code.

Closing notes

This post did not cover all the intricacies of the implementation of ARB_shader_storage_buffer_object, I did not discuss things like the optional unsized array or the compiler details of std430 for example, but, hopefully, I managed to give an idea of the kind of problems one would have to deal with when coding driver support for this or other similar features.

July 04, 2015
So, I realized it has been a while since posting about freedreno progress, so in honor of US independence day I figured it was as good an excuse as any for an update about independence from gpu blob driver for snapdragon/adreno..

Back in end of March 2015 at ELC, I gave a freedreno update presentation at ELC, listing the following major tasks left for gles3 support:
  • Uniform Buffer Objects (UBO)
  • Transform Feedback (TF)
  • Multi-Render-Target (MRT)
  • advanced flow control in shader compiler
 and additionally for gl3:
  • Multisample anti-aliasing (MSAA)
  • NV_conditional_render
  • 32b depth (z32 and z32_s8) (which I forgot to mention in the presentation)
EDIT: Ilia pointed out that 32b depth is needed for gles3 too, and gl3 additionally needs clipdist/etc (which we'll have to emulate, but hopefully can do in a generic nir pass) and rgtc (which will need sw decompression hopefully in mesa core so other drivers for gles class hw can reuse).  Original list was based on what mesa's compute_version() code was checking quite some time back.
Since then, we've gained support for UBO's (a3xx by Ilia Mirkin, and a4xx), MRT (for a3xx and core, again thanks to Ilia.. still needs to be wired up for a4xx), 32b depth (a3xx and core, again thanks to Ilia), and I've finished up shader compiler for loops/flow-control for ir3 (a3xx/a4xx).  The shader compiler work was a somewhat larger task than I expected (and I did expect it to be a lot of work), but it also involved moving over to NIR, in addition to re-writing the scheduler and register allocation passes, as well as a lot of re-org to ir3 in order to support multiple basic blocks.  The move to NIR was not strictly required, but it brings a lot of benefits in the form of shared support for conversion to SSA, scalarizing, CSE, DCE, constant folding, and algebraic optimizations.  And I figured it was less work in the long run to move to NIR first and drop the TGSI frontend, before doing all the refactoring needed to support loops and non-lowerable flow-control.  Incidentally, the compiler work should make the shader-compiler part of TF easier (since we need to generate a conditional write to TF buffer iff not overwriting past the end of the TF buffer).

In the mean time, freedreno and drm/msm have also gained support for the a306 gpu found in the new dragonboard 410c.  This board is a nice new low cost ($75) snapdragon community board based on the 64bit snapdragon 410.  And thanks to a lot of work by linaro and qualcomm, the upstream kernel situation for this board is looking pretty good.  It is shipping initially with a 4.0 based kernel (with patches on top for stuff that hadn't yet been merged for 4.0, including a lot of stuff backported from 4.1 and 4.2), including gpu/display/audio/video-codec/etc.  I believe that the 4.1 kernel was the first version where a vanilla kernel could boot on db410c with basic stuff (like serial console) working.  The kernel support for the gpu and display, other than the adv7533 hdmi bridge chip) landed in 4.2.  There is still more work to get *everything* (including audio, vidc, etc) merged upstream, but work continues in that direction, making this quite an exciting board.
Also, we have a GSoC student, Varad, working on freedreno support for android.  It is still in early stages, with some debugging still to do, but he has made a lot of progress and things are starting to work.
And since no blog post is complete without some nice screenshots...  the other day someone pointed me at a post in the dolphin forums about how dolphin was running on a420 (same device as in the ifc6540).  We all had a good laugh about the rendering issues with the blob driver.  But, since dolphin was the first gl3 game that worked with freedreno, I was curious how freedreno would do.. so I fired up the ifc6540 and replayed some dolphin fifo logs that would let me render approximately the same scenes:

Yoshi looks to be rendering pretty well.. digimon has a bit of corruption, but no where near as bad as the blob driver.  I suspect the issue with digimon is an instruction scheduling issue in the shader compiler (well, no rest for the gpu driver writers), but nice to see that it is already in pretty good shape.

Now we just need steam store or some unigine demos for arm linux :-P

In type-theoretic terms, Monte has a very boring type system. All objects expressible in Monte form a set, Mont, which has some properties, but not anything interesting from a theoretical point of view. I plan to talk about Mont later, but for now we'll just consider it to be a way for me to make existential or universal claims about Monte's object model.

Let's start with guards. Guards are one of the most important parts of writing idiomatic Monte, and they're also definitely an integral part of Monte's safety and security guarantees. They look like types, but are they actually useful as part of a type system?

Let's consider the following switch expression:

switch (x):
    match c :Char:
        "It's a character"
    match i :Int:
        "It's an integer"
    match _:
        "I don't know what it is!"

The two guards, Char and Int, perform what amounts to a type discrimination. We might have an intuition that if x were to pass Char, then it would not pass Int, and vice versa; we might also have an intuition that the order of checking Char and Int does not matter. I'm going to formalize these and show how strong they can be in Monte.

When a coercion happens, the object being coerced is called the specimen. The result of the coercion is called the prize. You've already been introduced to the guard, the object which is performing the coercion.

It happens that a specimen might override a Miranda method, _conformTo/1, in order to pass guards that it cannot normally pass. We call all such specimens conforming. All specimens that pass a guard also conform to it, but some non-passing specimens might still be able to conform by yielding a prize to the guard.

Here's an axiom of guards: For all objects in Mont, if some object specimen conforms to a guard G, and def prize := G.coerce(specimen, _), then prize passes G. This cannot be proven by any sort of runtime assertion (yet?), but any guard that does not obey this axiom is faulty. One expects that a prize returned from a coercion passes the guard that was performing the coercion; without this assumption, it would be quite foolhardy to trust any guard at all!

With that in mind, let's talk about properties of guards. One useful property is idempotence. An idempotent guard G is one that, for all objects in Mont which pass G, any such object specimen has the equality G.coerce(specimen, _) == specimen. (Monte's equality, if you're unfamiliar with it, considers two objects to be equal if they cannot be distinguished by any response to any message sent at them. I could probably craft equivalency classes out of that rule at some point in the future.)

Why is idempotency good? Well, it formalizes the intuition that objects aren't altered when coerced if they're already "of the right type of object." I expect that if I pass 42 to a function that has the pattern x :Int, I might reasonably expect that x will get 42 bound to it, and not 420 or some other wrong number.

Monte's handling of state is impure. This complicates things. Since an object's internal state can vary, its willingness to respond to messages can vary. Let's be more precise in our definition of passing coercion. An object specimen passes coercion by a guard G if, for some combination of specimen and G internal states, G.coerce(specimen, _) == specimen. If specimen passes for all possible combinations of specimen and G internal states, then we say that specimen always passes coercion by G. (And if specimen cannot pass coercion with any possible combination of states, then it never passes.)

Now we can get to retractability. A idempotent guard G is unretractable if, for all objects in Mont which pass coercion by G, those objects always pass coercion by G. The converse property, that it's possible for some object to pass but not always pass coercion, would make G retractable.

An unretractable guard provides a very comfortable improvement over an idempotent one, similar to dipping your objects in DeepFrozen. I think that most of the interesting possibilities for guards come from unretractable guards. Most of the builtin guards are unretractable, too; data guards like Double and Str are good examples.

Theorem: An unretractable guard G partitions Mont into two disjoint subsets whose members always pass or never pass coercion by G, respectively. The proof is pretty trivial. This theorem lets us formalize the notion of a guard as protecting a section of code from unacceptable values; if Char is unretractable (and it is!), then a value guarded by Char is always going to be a character and never anything else. This theorem also gives us our first stab at a type declaration, where we might say something like "An object is of type Char if it passes Char."

Now let's go back to the beginning. We want to know how Char and Int interact. So, let's define some operations analagous to set union and intersection. The union of two unretractable guards G and H is written Any[G, H] and is defined as an unretractable guard that partitions Mont into the union of the two sets of objects that always pass G or H respectively, and all other objects. A similar definition can be created for the intersection of G and H, written All[G, H] and creating a similar partition with the intersection of the always-passing sets.

Both union and intersection are semigroups on the set of unretractable guards. (I haven't picked a name for this set yet. Maybe Mont-UG?) We can add in identity elements to get monoids. For union, we can use the hypothetical guard None, which refuses to pass any object in Mont, and for intersection, the completely real guard Any can be used.

object None:
    to coerce(_, ej):
        throw(ej, "None shall pass")

It gets better. The operations are also closed over Mont-UG, and it's possible to construct an inverse of any unretractable guard which is also an unretractable guard:

def invertUG(ug):
    return object invertedUG:
        to coerce(specimen, ej):
            escape innerEj:
                ug.coerce(specimen, innerEj)
                throw(ej, "Inverted")
            catch _:
                return specimen

This means that we have groups! Two lovely groups. They're both Abelian, too. Exciting stuff. And, in the big payoff of the day, we get two rings on Mont-UG, depending on whether you want to have union or intersection as your addition or multiplication.

This empowers a programmer, informally, to intuit that if Char and Int are disjoint (and, in this case, they are), then it might not matter in which order they are placed into the switch expression.

That's all for now!

June 30, 2015

So this will be the first in a series of blogs talking about some major initiatives we are doing for Fedora Workstation. Today I want to present and talk about a thing we call Pinos.

So what is Pinos? One of the original goals of Pinos was to provide the same level of advanced hardware handling for Video that PulseAudio provides for Audio. For those of you who has been around for a while you might remember how you once upon a time could only have one application using the sound card at the same time until PulseAudio properly fixed that. Well Pinos will allow you to share your video camera between multiple applications and also provide an easy to use API to do so.

Video providers and consumers are implemented as separate processes communicating with DBUS and exchanging video frames using fd passing.

Some features of Pinos

  • Easier switching of cameras in your applications
  • It will also allow you to more easily allow applications to switch between multiple cameras or mix the content from multiple sources.

  • Multiple types of video inputs
  • Supports more than cameras. Pinos also supports other type of video sources, for instance it can support your desktop as a video source.

  • GStreamer integration
  • Pinos is built using GStreamer and also have GStreamer elements supporting it to make integrating it into GStreamer applications simple and straightforward.

  • Pinos got some audio support
  • Well it tries to solve some of the same issues for video that PulseAudio solves for audio. Namely letting you have multiple applications sharing the same camera hardware. Pinos does also include audio support in order to let you handle both.

What do we want to do with this in Fedora Workstation?

  • One thing we know is of great use and importance for many of our users, including many developers who wants to make videos demonstrating their software, is to have better screen capture support. One of the test cases we are using for Pinos is to improve the built in screen casting capabilities of GNOME 3, the goal being to reducing overhead and to allow for easy setup of picture in picture capturing. So you can easily set it up so there will be a camera capturing your face and voice and mixing that into your screen recording.
  • Video support for Desktop Sandboxes. We have been working for a while on providing technology for sandboxing your desktop applications and while we with a little work can use PulseAudio for giving the sandboxed applications audio access we needed something similar for video. Pinos provides us with such a solution.

Who is working on this?
Pinos is being designed and written by Wim Taymans who is the co-creator of the GStreamer multimedia framework and also a regular contributor to the PulseAudio project. Wim is also the working for Red Hat as a Principal Engineer, being in charge of a lot of our multimedia support in both Red Hat Enterprise Linux and Fedora. It is also worth nothing that it draws many of its ideas from an early prototype by William Manley called PulseVideo and builds upon some of the code that was merged into GStreamer due to that effort.

Where can I get the code?
The code is currently hosteed in Wim’s private repository on freedesktop. You can get it at

How can I get involved or talk to the author
You can find Wim on Freenode IRC, he uses the name wtay and hangs out in both the #gstreamer and #pulseaudio IRC channels.
Once the project is a bit further along we will get some basic web presence set up and a mailing list created.


If Pinos contains Audio support will it eventually replace PulseAudio too?
Probably not, the usecases and goals for the two systems are somewhat different and it is not clear that trying to make Pinos accommodate all the PulseAudio usescases would be worth the effort or possible withour feature loss. So while there is always a temptation to think ‘hey, wouldn’t it be nice to have one system that can handle everything’ we are at this point unconvinced that the gain outweighs the pain.

Will Pinos offer re-directing kernel APIs for video devices like PulseAudio does for Audio? In order to handle legacy applications?
No, that was possible due to the way ALSA worked, but V4L2 doesn’t have such capabilities and thus we can not take advantage of them.

Why the name Pinos?
The code name for the project was PulseVideo, but to avoid confusion with the PulseAudio project and avoid people making to many assumptions based on the name we decided to follow in the tradition of Wayland and Weston and take inspiration from local place names related to the creator. So since Wim lives in Pinos de Alhaurin close to Malaga in Spain we decided to call the project Pinos. Pinos is the word for pines in Spanish :)

June 26, 2015
The 4.1 kernel release is still a few weeks off and hence a bit early to talk about 4.2. But the drm subsystem feature cut-off already passed and I'm going on vacation for 2 weeks, so here we go.

First things first: No, i915 does not yet support atomic modesets. But a lot of progress has been made again towards enabling it. As I explained last time around the trouble is that the intel driver has grown its own almost-atomic modeset infrastructure over the past few years. And now we need to convert that to the slightly different proper atomic support infrastructure merged into the drm core, which means lots and lots of small changes all over the driver. A big part merged in this release is the removal of the ->new_config pointer by Ander, Matt & Maarten. This was the old i915-specific pointer to the staged new configuration. Removing it required switching all the CRTC code over to handling the staged configuration stored in the struct drm_atomic_state to be compatible with the atomic core. Unfortunately we still need to do the same for encoder/connector states and for plane states, so there's still lots of shuffling pending for 4.2.

There has also been other feature work going on on the modeset side: Ville cleaned&fixed up the CDCLK support in anticipation of implementing dynamic display clock frequency scaling. Unfortunately that part of his patches hasn't landed yet. Ville has also merged patches to fix up some details in the CPT modeset sequence, maybe this will finally fix the remaining "DP port stuck" issues we still seem to have.

Looking at newer platforms the interesting bit is rotation support for SKL from Sonika and Tvrtko. Compared to older platforms skl now also supports 90° and 270° rotation in the scanout engines, but only when the framebuffer uses a special tiling layout (which have been enabled in 4.0). A related feature is support for plane/CRTC scalers on SKL, provided by Chandra. Skylake has also gained support for the new low-power display states DC5/6. For Broxton basic enabling has landed, but there's nothing too interesting yet besides piles of small adjustments all over. This is because Broxton and Skylake have a common display block (similar to how the render block for atom chips was already shared since Baytrail) and hence share a lot of the infrastructure code. Unfortunately neither of these platforms has yet left the preliminary hardware support label for the i915 driver.

There's also a few minor features in the display code worth mentioning: DP compliance testing infrastructure from Todd Previte - DP compliance test devices have a special DP AUX sidechannel protocol for requesting certain test procedures and hence need a bit of driver support. Most of this will be in userspace though, with the kernel just forward requests and handing back results. Mika Kahola has optimized the DP link training, the kernel will now first try to use the current values (either from a previous modeset or set up by the firmware). PSR has also seen some more work, unfortunately it's still not yet enabled by default. And finally there's been lots of cleanups and improvements under the hood all over, as usual.

A big feature is the dynamic pagetable allocation for gen8+ from Michel Thierry and Ben Widawsky. This will greatly reduce the overhead of PPGTT and is a requirement for 48bit address space support - with that big a VM preallocating all the pagetables is just not possible any more. The gen7 cmd parser is now finally fixed up and enabled by default (thanks to Rebecca Palmer for one crucial fix), which means finally some newer GL extensions can be used without adding kernel hacks. And Chris Wilson has fine-tuned the cmd parser with a big pile of patches to reduce the overhead. And Chris has tuned the RPS boost code more, it should now no longer erratically boost the GPU's clock when it's inappropriate. He has also written a lot of patches to reduce the overhead of execlist command submission, and some of those patches have been merged into this release.

Finally two pieces of prep work: A few patches from John Harrison to prepare for removing the outstanding lazy request. We've added this years ago as a cheap way out of a memory and ringbuffer space preallocation issue and ever since then paid the price for this with added complexity leaking all over the GEM code. Unfortunately the actual removal is still pending. And then Joonas Lahtinen has implemented partial GTT mmap support. This is needed for virtual enviroments like XenGT where the GTT is cut up between the different guests and hence badly fragmented. The merged code only supports linear views and still needs support for fenced buffer objects to be actually useful.

June 25, 2015

One of the bits we are currently finalising in libinput are touchpad gestures. Gestures on a normal touchscreens are left to the compositor and, in extension, to the client applications. Touchpad gestures are notably different though, they are bound to the location of the pointer or the keyboard focus (depending on the context) and they are less context-sensitive. Two fingers moving together on a touchscreen may be two windows being moved at the same time. On a touchpad however this is always a pinch.

Touchpad gestures are a lot more hardware-sensitive than touchscreens where we can just forward the touch points directly. On a touchpad we may have to consider software buttons or just HW-limitations of the touchpad. This prevents the implementation of touchpad gestures in a higher level - only libinput is aware of the location, size, etc. of software buttons.

Hence - touchpad gestures in libinput. The tree is currently sitting here and is being rebased as we go along, but we're expecting to merge this into master soon.

The interface itself is fairly simple: any device that may send gestures will have the LIBINPUT_DEVICE_CAP_GESTURE capability set. This is currently only implemented for touchpads but there is the potential to support this on other devices too. Two gestures are supported: swipe and pinch (+rotate). Both come with a finger count and both follow a Start/Update/End cycle. Gestures have a finger count that remains the same for the gestures, so if you switch from a two-finger pinch to a three-finger pinch you will see one gesture end and the next one start. Note that how to deal with this is up to the caller - it may very well consider this the same gesture semantically.

Swipe gestures have delta coordinates (horizontally and vertically) of the logical center of the gesture, compared to the previous event. A pinch gesture has the delta coordinates too and a delta angle (clockwise, in degrees). A pinch gesture also has the notion of an absolute scale, the Begin event always has a scale of 1.0 and that changes as the fingers move towards each other further apart. A scale of 2.0 means they're now twice as far apart as originally.

Nothing overly exciting really, it's a simple API that provides a couple of basic elements of data. Once integrated into the desktop properly, it should provide for some improved navigation. OS X has had this for a log time now and it's only time we caught up.

June 20, 2015

You’re a developer and you know AF_UNIX? You used it occasionally in your code, you know how high-level IPC puts marshaling on top and generally have a confident feeling when talking about it? But you actually have no clue what this fancy new kdbus is really about? During discussions you just nod along and hope nobody notices?


This is how it should be! As long as you don’t work on IPC libraries, there’s absolutely no requirement for you to have any idea what kdbus is. But as you’re reading this, I assume you’re curious and want to know more. So lets pick you up at AF_UNIX and look at a simple example.


Imagine a handful of processes that need to talk to each other. You have two options: Either you create a separate socket-pair between each two processes, or you create just one socket per process and make sure you can address all others via this socket. The first option will cause a quadratic growth of sockets and blows up if you raise the number of processes. Hence, we choose the latter, so our socket allocation looks like this:


Simple. Now we have to make sure the socket has a name and others can find it. We choose to not pollute the file-system but rather use the managed abstract namespace. As we don’t care for the exact names right now, we just let the kernel choose one. Furthermore, we enable credential-transmission so we can recognize peers that we get messages from:

struct sockaddr_un address = { .sun_family = AF_UNIX };
int enable = 1;

setsockopt(fd, SOL_SOCKET, SO_PASSCRED, &enable, sizeof(enable));
bind(fd, (struct sockaddr*)&address, sizeof(address.sun_family));

By omitting the sun_path part of the address, we tell the kernel to pick one itself. This was easy. Now we’re ready to go so lets see how we can send a message to a peer. For simplicity, we assume we know the address of the peer and it’s stored in destination.

struct sockaddr_un destination = { .sun_family = AF_UNIX, .sun_path = "..." };

sendto(fd, "foobar", 7, MSG_NOSIGNAL, (struct sockaddr*)&destination, sizeof(destination));

…and that’s all that is needed to send our message to the selected destination. On the receiver’s side, we call into recvmsg to receive the first message from our queue. We cannot use recvfrom as we want to fetch the credentials, too. Furthermore, we also cannot know how big the message is, so we query the kernel first and allocate a suitable buffer. This could be avoided, if we knew the maximum package size. But lets be thorough and support unlimited package sizes. Also note that recvmsg will return any next queued message. We cannot know the sender beforehand, so we also pass a buffer to store the address of the sender of this message:

char control[CMSG_SPACE(sizeof(struct ucred))];
struct sockaddr_un sender = {};
struct ucred creds = {};
struct msghdr msg = {};
struct iovec iov = {};
struct cmsghdr *cmsg;
char *message;
ssize_t l;
int size;

ioctl(fd, SIOCINQ, &size);
message = malloc(size + 1);
iov.iov_base = message;
iov.iov_len = size;

msg.msg_name = (struct sockaddr*)&sender;
msg.msg_namelen = sizeof(sender);
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_control = control;
msg.msg_controllen = sizeof(control);

l = recvmsg(fd, msg, MSG_CMSG_CLOEXEC);

for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_CREDENTIALS)
                memcpy(&creds, CMSG_DATA(cmsg), sizeof(creds));

printf("Message: %s (length: %zd uid: %u sender: %s)\n",
       message, l, creds.uid, sender.sun_path + 1);

That’s it. With this in place, we can easily send arbitrary messages between our peers. We have no length restriction, we can identify the peers reliably and we’re not limited by any marshaling. Sure, we now dropped error-handling, event-loop integration and ignored some nasty corner cases, but that can all be solved. The code stays mostly the same.

Congratulations! You now understand kdbus. In kdbus:

  • socket(AF_UNIX, SOCK_DGRAM, 0) becomes open(“/sys/fs/kdbus/….”, …)
  • bind(fd, …, …) becomes ioctl(fd, KDBUS_CMD_HELLO, …)
  • sendto(fd, …) becomes ioctl(fd, KDBUS_CMD_SEND, …)
  • recvmsg(fd, …) becomes ioctl(fd, KDBUS_CMD_RECV, …)

Granted, the code will look slightly different. However, the concept stays the same. You still transmit raw messages, no marshaling is mandated. You can transmit credentials and file-descriptors, you can specify the peer to send messages to and you got to do that all through a single file descriptor. Doesn’t sound complex, does it?

So if kdbus is actually just like AF_UNIX+SOCK_DGRAM, why use it?


The AF_UNIX setup described above has some significant flaws:

  • Messages are buffered in the receive-queue, which is relatively small. If you send a message to a peer with a full receive-queue, you will get EAGAIN. However, since your socket is not connected, you cannot wait for POLLOUT as it does not exist for unconnected sockets (which peer would you wait for?). Hence, you’re basically screwed if remote peers do not clear their queues. This is a really severe restriction which makes this model useless for such setups.
  • There is no policy regarding who can talk to whom. In the abstract namespace, everyone can talk to anyone in the same network namespace. This can be avoided by placing sockets in the file system. However, then you lose the auto-cleanup feature of the abstract namespace. Furthermore, you’re limited to file-system policies, which might not be suitable.
  • You can only have a single name per socket. If you implement multiple services that should be reached via different names, then you need multiple sockets (and thus you lose global message ordering).
  • You cannot send broadcasts. You cannot even easily enumerate peers.
  • You cannot get notifications if peers die. However, this is crucial if you provide services to them and need to clean them up after they disconnected.

kdbus solves all these issues. Some of these issues could be solved with AF_UNIX (which, btw., would look a lot like AF_NETLINK), some cannot. But more importantly, AF_UNIX was never designed as a shared bus and hence should not be used as such. kdbus, on the other hand, was designed with a shared bus model in mind and as such avoids most of these issues.

With that basic understanding of kdbus, next time a news source reports about the “crazy idea to shove DBus into the kernel”, I hope you’ll be able to judge for yourself. And if you want to know more, I recommend building the kernel documentation, or diving into the code.

June 18, 2015

With the new v221 release of systemd we are declaring the sd-bus API shipped with systemd stable. sd-bus is our minimal D-Bus IPC C library, supporting as back-ends both classic socket-based D-Bus and kdbus. The library has been been part of systemd for a while, but has only been used internally, since we wanted to have the liberty to still make API changes without affecting external consumers of the library. However, now we are confident to commit to a stable API for it, starting with v221.

In this blog story I hope to provide you with a quick overview on sd-bus, a short reiteration on D-Bus and its concepts, as well as a few simple examples how to write D-Bus clients and services with it.

What is D-Bus again?

Let's start with a quick reminder what D-Bus actually is: it's a powerful, generic IPC system for Linux and other operating systems. It knows concepts like buses, objects, interfaces, methods, signals, properties. It provides you with fine-grained access control, a rich type system, discoverability, introspection, monitoring, reliable multicasting, service activation, file descriptor passing, and more. There are bindings for numerous programming languages that are used on Linux.

D-Bus has been a core component of Linux systems since more than 10 years. It is certainly the most widely established high-level local IPC system on Linux. Since systemd's inception it has been the IPC system it exposes its interfaces on. And even before systemd, it was the IPC system Upstart used to expose its interfaces. It is used by GNOME, by KDE and by a variety of system components.

D-Bus refers to both a specification, and a reference implementation. The reference implementation provides both a bus server component, as well as a client library. While there are multiple other, popular reimplementations of the client library – for both C and other programming languages –, the only commonly used server side is the one from the reference implementation. (However, the kdbus project is working on providing an alternative to this server implementation as a kernel component.)

D-Bus is mostly used as local IPC, on top of AF_UNIX sockets. However, the protocol may be used on top of TCP/IP as well. It does not natively support encryption, hence using D-Bus directly on TCP is usually not a good idea. It is possible to combine D-Bus with a transport like ssh in order to secure it. systemd uses this to make many of its APIs accessible remotely.

A frequently asked question about D-Bus is why it exists at all, given that AF_UNIX sockets and FIFOs already exist on UNIX and have been used for a long time successfully. To answer this question let's make a comparison with popular web technology of today: what AF_UNIX/FIFOs are to D-Bus, TCP is to HTTP/REST. While AF_UNIX sockets/FIFOs only shovel raw bytes between processes, D-Bus defines actual message encoding and adds concepts like method call transactions, an object system, security mechanisms, multicasting and more.

From our 10year+ experience with D-Bus we know today that while there are some areas where we can improve things (and we are working on that, both with kdbus and sd-bus), it generally appears to be a very well designed system, that stood the test of time, aged well and is widely established. Today, if we'd sit down and design a completely new IPC system incorporating all the experience and knowledge we gained with D-Bus, I am sure the result would be very close to what D-Bus already is.

Or in short: D-Bus is great. If you hack on a Linux project and need a local IPC, it should be your first choice. Not only because D-Bus is well designed, but also because there aren't many alternatives that can cover similar functionality.

Where does sd-bus fit in?

Let's discuss why sd-bus exists, how it compares with the other existing C D-Bus libraries and why it might be a library to consider for your project.

For C, there are two established, popular D-Bus libraries: libdbus, as it is shipped in the reference implementation of D-Bus, as well as GDBus, a component of GLib, the low-level tool library of GNOME.

Of the two libdbus is the much older one, as it was written at the time the specification was put together. The library was written with a focus on being portable and to be useful as back-end for higher-level language bindings. Both of these goals required the API to be very generic, resulting in a relatively baroque, hard-to-use API that lacks the bits that make it easy and fun to use from C. It provides the building blocks, but few tools to actually make it straightforward to build a house from them. On the other hand, the library is suitable for most use-cases (for example, it is OOM-safe making it suitable for writing lowest level system software), and is portable to operating systems like Windows or more exotic UNIXes.

GDBus is a much newer implementation. It has been written after considerable experience with using a GLib/GObject wrapper around libdbus. GDBus is implemented from scratch, shares no code with libdbus. Its design differs substantially from libdbus, it contains code generators to make it specifically easy to expose GObject objects on the bus, or talking to D-Bus objects as GObject objects. It translates D-Bus data types to GVariant, which is GLib's powerful data serialization format. If you are used to GLib-style programming then you'll feel right at home, hacking D-Bus services and clients with it is a lot simpler than using libdbus.

With sd-bus we now provide a third implementation, sharing no code with either libdbus or GDBus. For us, the focus was on providing kind of a middle ground between libdbus and GDBus: a low-level C library that actually is fun to work with, that has enough syntactic sugar to make it easy to write clients and services with, but on the other hand is more low-level than GDBus/GLib/GObject/GVariant. To be able to use it in systemd's various system-level components it needed to be OOM-safe and minimal. Another major point we wanted to focus on was supporting a kdbus back-end right from the beginning, in addition to the socket transport of the original D-Bus specification ("dbus1"). In fact, we wanted to design the library closer to kdbus' semantics than to dbus1's, wherever they are different, but still cover both transports nicely. In contrast to libdbus or GDBus portability is not a priority for sd-bus, instead we try to make the best of the Linux platform and expose specific Linux concepts wherever that is beneficial. Finally, performance was also an issue (though a secondary one): neither libdbus nor GDBus will win any speed records. We wanted to improve on performance (throughput and latency) -- but simplicity and correctness are more important to us. We believe the result of our work delivers our goals quite nicely: the library is fun to use, supports kdbus and sockets as back-end, is relatively minimal, and the performance is substantially better than both libdbus and GDBus.

To decide which of the three APIs to use for you C project, here are short guidelines:

  • If you hack on a GLib/GObject project, GDBus is definitely your first choice.

  • If portability to non-Linux kernels -- including Windows, Mac OS and other UNIXes -- is important to you, use either GDBus (which more or less means buying into GLib/GObject) or libdbus (which requires a lot of manual work).

  • Otherwise, sd-bus would be my recommended choice.

(I am not covering C++ specifically here, this is all about plain C only. But do note: if you use Qt, then QtDBus is the D-Bus API of choice, being a wrapper around libdbus.)

Introduction to D-Bus Concepts

To the uninitiated D-Bus usually appears to be a relatively opaque technology. It uses lots of concepts that appear unnecessarily complex and redundant on first sight. But actually, they make a lot of sense. Let's have a look:

  • A bus is where you look for IPC services. There are usually two kinds of buses: a system bus, of which there's exactly one per system, and which is where you'd look for system services; and a user bus, of which there's one per user, and which is where you'd look for user services, like the address book service or the mail program. (Originally, the user bus was actually a session bus -- so that you get multiple of them if you log in many times as the same user --, and on most setups it still is, but we are working on moving things to a true user bus, of which there is only one per user on a system, regardless how many times that user happens to log in.)

  • A service is a program that offers some IPC API on a bus. A service is identified by a name in reverse domain name notation. Thus, the org.freedesktop.NetworkManager service on the system bus is where NetworkManager's APIs are available and org.freedesktop.login1 on the system bus is where systemd-logind's APIs are exposed.

  • A client is a program that makes use of some IPC API on a bus. It talks to a service, monitors it and generally doesn't provide any services on its own. That said, lines are blurry and many services are also clients to other services. Frequently the term peer is used as a generalization to refer to either a service or a client.

  • An object path is an identifier for an object on a specific service. In a way this is comparable to a C pointer, since that's how you generally reference a C object, if you hack object-oriented programs in C. However, C pointers are just memory addresses, and passing memory addresses around to other processes would make little sense, since they of course refer to the address space of the service, the client couldn't make sense of it. Thus, the D-Bus designers came up with the object path concept, which is just a string that looks like a file system path. Example: /org/freedesktop/login1 is the object path of the 'manager' object of the org.freedesktop.login1 service (which, as we remember from above, is still the service systemd-logind exposes). Because object paths are structured like file system paths they can be neatly arranged in a tree, so that you end up with a venerable tree of objects. For example, you'll find all user sessions systemd-logind manages below the /org/freedesktop/login1/session sub-tree, for example called /org/freedesktop/login1/session/_7, /org/freedesktop/login1/session/_55 and so on. How services precisely label their objects and arrange them in a tree is completely up to the developers of the services.

  • Each object that is identified by an object path has one or more interfaces. An interface is a collection of signals, methods, and properties (collectively called members), that belong together. The concept of a D-Bus interface is actually pretty much identical to what you know from programming languages such as Java, which also know an interface concept. Which interfaces an object implements are up the developers of the service. Interface names are in reverse domain name notation, much like service names. (Yes, that's admittedly confusing, in particular since it's pretty common for simpler services to reuse the service name string also as an interface name.) A couple of interfaces are standardized though and you'll find them available on many of the objects offered by the various services. Specifically, those are org.freedesktop.DBus.Introspectable, org.freedesktop.DBus.Peer and org.freedesktop.DBus.Properties.

  • An interface can contain methods. The word "method" is more or less just a fancy word for "function", and is a term used pretty much the same way in object-oriented languages such as Java. The most common interaction between D-Bus peers is that one peer invokes one of these methods on another peer and gets a reply. A D-Bus method takes a couple of parameters, and returns others. The parameters are transmitted in a type-safe way, and the type information is included in the introspection data you can query from each object. Usually, method names (and the other member types) follow a CamelCase syntax. For example, systemd-logind exposes an ActivateSession method on the org.freedesktop.login1.Manager interface that is available on the /org/freedesktop/login1 object of the org.freedesktop.login1 service.

  • A signature describes a set of parameters a function (or signal, property, see below) takes or returns. It's a series of characters that each encode one parameter by its type. The set of types available is pretty powerful. For example, there are simpler types like s for string, or u for 32bit integer, but also complex types such as as for an array of strings or a(sb) for an array of structures consisting of one string and one boolean each. See the D-Bus specification for the full explanation of the type system. The ActivateSession method mentioned above takes a single string as parameter (the parameter signature is hence s), and returns nothing (the return signature is hence the empty string). Of course, the signature can get a lot more complex, see below for more examples.

  • A signal is another member type that the D-Bus object system knows. Much like a method it has a signature. However, they serve different purposes. While in a method call a single client issues a request on a single service, and that service sends back a response to the client, signals are for general notification of peers. Services send them out when they want to tell one or more peers on the bus that something happened or changed. In contrast to method calls and their replies they are hence usually broadcast over a bus. While method calls/replies are used for duplex one-to-one communication, signals are usually used for simplex one-to-many communication (note however that that's not a requirement, they can also be used one-to-one). Example: systemd-logind broadcasts a SessionNew signal from its manager object each time a user logs in, and a SessionRemoved signal every time a user logs out.

  • A property is the third member type that the D-Bus object system knows. It's similar to the property concept known by languages like C#. Properties also have a signature, and are more or less just variables that an object exposes, that can be read or altered by clients. Example: systemd-logind exposes a property Docked of the signature b (a boolean). It reflects whether systemd-logind thinks the system is currently in a docking station of some form (only applies to laptops …).

So much for the various concepts D-Bus knows. Of course, all these new concepts might be overwhelming. Let's look at them from a different perspective. I assume many of the readers have an understanding of today's web technology, specifically HTTP and REST. Let's try to compare the concept of a HTTP request with the concept of a D-Bus method call:

  • A HTTP request you issue on a specific network. It could be the Internet, or it could be your local LAN, or a company VPN. Depending on which network you issue the request on, you'll be able to talk to a different set of servers. This is not unlike the "bus" concept of D-Bus.

  • On the network you then pick a specific HTTP server to talk to. That's roughly comparable to picking a service on a specific bus.

  • On the HTTP server you then ask for a specific URL. The "path" part of the URL (by which I mean everything after the host name of the server, up to the last "/") is pretty similar to a D-Bus object path.

  • The "file" part of the URL (by which I mean everything after the last slash, following the path, as described above), then defines the actual call to make. In D-Bus this could be mapped to an interface and method name.

  • Finally, the parameters of a HTTP call follow the path after the "?", they map to the signature of the D-Bus call.

Of course, comparing an HTTP request to a D-Bus method call is a bit comparing apples and oranges. However, I think it's still useful to get a bit of a feeling of what maps to what.

From the shell

So much about the concepts and the gray theory behind them. Let's make this exciting, let's actually see how this feels on a real system.

Since a while systemd has included a tool busctl that is useful to explore and interact with the D-Bus object system. When invoked without parameters, it will show you a list of all peers connected to the system bus. (Use --user to see the peers of your user bus instead):

$ busctl
NAME                                       PID PROCESS         USER             CONNECTION    UNIT                      SESSION    DESCRIPTION
:1.1                                         1 systemd         root             :1.1          -                         -          -
:1.11                                      705 NetworkManager  root             :1.11         NetworkManager.service    -          -
:1.14                                      744 gdm             root             :1.14         gdm.service               -          -
:1.4                                       708 systemd-logind  root             :1.4          systemd-logind.service    -          -
:1.7200                                  17563 busctl          lennart          :1.7200       session-1.scope           1          -
org.freedesktop.NetworkManager             705 NetworkManager  root             :1.11         NetworkManager.service    -          -
org.freedesktop.login1                     708 systemd-logind  root             :1.4          systemd-logind.service    -          -
org.freedesktop.systemd1                     1 systemd         root             :1.1          -                         -          -
org.gnome.DisplayManager                   744 gdm             root             :1.14         gdm.service               -          -

(I have shortened the output a bit, to make keep things brief).

The list begins with a list of all peers currently connected to the bus. They are identified by peer names like ":1.11". These are called unique names in D-Bus nomenclature. Basically, every peer has a unique name, and they are assigned automatically when a peer connects to the bus. They are much like an IP address if you so will. You'll notice that a couple of peers are already connected, including our little busctl tool itself as well as a number of system services. The list then shows all actual services on the bus, identified by their service names (as discussed above; to discern them from the unique names these are also called well-known names). In many ways well-known names are similar to DNS host names, i.e. they are a friendlier way to reference a peer, but on the lower level they just map to an IP address, or in this comparison the unique name. Much like you can connect to a host on the Internet by either its host name or its IP address, you can also connect to a bus peer either by its unique or its well-known name. (Note that each peer can have as many well-known names as it likes, much like an IP address can have multiple host names referring to it).

OK, that's already kinda cool. Try it for yourself, on your local machine (all you need is a recent, systemd-based distribution).

Let's now go the next step. Let's see which objects the org.freedesktop.login1 service actually offers:

$ busctl tree org.freedesktop.login1
  │ ├─/org/freedesktop/login1/seat/seat0
  │ └─/org/freedesktop/login1/seat/self
  │ ├─/org/freedesktop/login1/session/_31
  │ └─/org/freedesktop/login1/session/self

Pretty, isn't it? What's actually even nicer, and which the output does not show is that there's full command line completion available: as you press TAB the shell will auto-complete the service names for you. It's a real pleasure to explore your D-Bus objects that way!

The output shows some objects that you might recognize from the explanations above. Now, let's go further. Let's see what interfaces, methods, signals and properties one of these objects actually exposes:

$ busctl introspect org.freedesktop.login1 /org/freedesktop/login1/session/_31
NAME                                TYPE      SIGNATURE RESULT/VALUE                             FLAGS
org.freedesktop.DBus.Introspectable interface -         -                                        -
.Introspect                         method    -         s                                        -
org.freedesktop.DBus.Peer           interface -         -                                        -
.GetMachineId                       method    -         s                                        -
.Ping                               method    -         -                                        -
org.freedesktop.DBus.Properties     interface -         -                                        -
.Get                                method    ss        v                                        -
.GetAll                             method    s         a{sv}                                    -
.Set                                method    ssv       -                                        -
.PropertiesChanged                  signal    sa{sv}as  -                                        -
org.freedesktop.login1.Session      interface -         -                                        -
.Activate                           method    -         -                                        -
.Kill                               method    si        -                                        -
.Lock                               method    -         -                                        -
.PauseDeviceComplete                method    uu        -                                        -
.ReleaseControl                     method    -         -                                        -
.ReleaseDevice                      method    uu        -                                        -
.SetIdleHint                        method    b         -                                        -
.TakeControl                        method    b         -                                        -
.TakeDevice                         method    uu        hb                                       -
.Terminate                          method    -         -                                        -
.Unlock                             method    -         -                                        -
.Active                             property  b         true                                     emits-change
.Audit                              property  u         1                                        const
.Class                              property  s         "user"                                   const
.Desktop                            property  s         ""                                       const
.Display                            property  s         ""                                       const
.Id                                 property  s         "1"                                      const
.IdleHint                           property  b         true                                     emits-change
.IdleSinceHint                      property  t         1434494624206001                         emits-change
.IdleSinceHintMonotonic             property  t         0                                        emits-change
.Leader                             property  u         762                                      const
.Name                               property  s         "lennart"                                const
.Remote                             property  b         false                                    const
.RemoteHost                         property  s         ""                                       const
.RemoteUser                         property  s         ""                                       const
.Scope                              property  s         "session-1.scope"                        const
.Seat                               property  (so)      "seat0" "/org/freedesktop/login1/seat... const
.Service                            property  s         "gdm-autologin"                          const
.State                              property  s         "active"                                 -
.TTY                                property  s         "/dev/tty1"                              const
.Timestamp                          property  t         1434494630344367                         const
.TimestampMonotonic                 property  t         34814579                                 const
.Type                               property  s         "x11"                                    const
.User                               property  (uo)      1000 "/org/freedesktop/login1/user/_1... const
.VTNr                               property  u         1                                        const
.Lock                               signal    -         -                                        -
.PauseDevice                        signal    uus       -                                        -
.ResumeDevice                       signal    uuh       -                                        -
.Unlock                             signal    -         -                                        -

As before, the busctl command supports command line completion, hence both the service name and the object path used are easily put together on the shell simply by pressing TAB. The output shows the methods, properties, signals of one of the session objects that are currently made available by systemd-logind. There's a section for each interface the object knows. The second column tells you what kind of member is shown in the line. The third column shows the signature of the member. In case of method calls that's the input parameters, the fourth column shows what is returned. For properties, the fourth column encodes the current value of them.

So far, we just explored. Let's take the next step now: let's become active - let's call a method:

# busctl call org.freedesktop.login1 /org/freedesktop/login1/session/_31 org.freedesktop.login1.Session Lock

I don't think I need to mention this anymore, but anyway: again there's full command line completion available. The third argument is the interface name, the fourth the method name, both can be easily completed by pressing TAB. In this case we picked the Lock method, which activates the screen lock for the specific session. And yupp, the instant I pressed enter on this line my screen lock turned on (this only works on DEs that correctly hook into systemd-logind for this to work. GNOME works fine, and KDE should work too).

The Lock method call we picked is very simple, as it takes no parameters and returns none. Of course, it can get more complicated for some calls. Here's another example, this time using one of systemd's own bus calls, to start an arbitrary system unit:

# busctl call org.freedesktop.systemd1 /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager StartUnit ss "cups.service" "replace"
o "/org/freedesktop/systemd1/job/42684"

This call takes two strings as input parameters, as we denote in the signature string that follows the method name (as usual, command line completion helps you getting this right). Following the signature the next two parameters are simply the two strings to pass. The specified signature string hence indicates what comes next. systemd's StartUnit method call takes the unit name to start as first parameter, and the mode in which to start it as second. The call returned a single object path value. It is encoded the same way as the input parameter: a signature (just o for the object path) followed by the actual value.

Of course, some method call parameters can get a ton more complex, but with busctl it's relatively easy to encode them all. See the man page for details.

busctl knows a number of other operations. For example, you can use it to monitor D-Bus traffic as it happens (including generating a .cap file for use with Wireshark!) or you can set or get specific properties. However, this blog story was supposed to be about sd-bus, not busctl, hence let's cut this short here, and let me direct you to the man page in case you want to know more about the tool.

busctl (like the rest of system) is implemented using the sd-bus API. Thus it exposes many of the features of sd-bus itself. For example, you can use to connect to remote or container buses. It understands both kdbus and classic D-Bus, and more!


But enough! Let's get back on topic, let's talk about sd-bus itself.

The sd-bus set of APIs is mostly contained in the header file sd-bus.h.

Here's a random selection of features of the library, that make it compare well with the other implementations available.

  • Supports both kdbus and dbus1 as back-end.

  • Has high-level support for connecting to remote buses via ssh, and to buses of local OS containers.

  • Powerful credential model, to implement authentication of clients in services. Currently 34 individual fields are supported, from the PID of the client to the cgroup or capability sets.

  • Support for tracking the life-cycle of peers in order to release local objects automatically when all peers referencing them disconnected.

  • The client builds an efficient decision tree to determine which handlers to deliver an incoming bus message to.

  • Automatically translates D-Bus errors into UNIX style errors and back (this is lossy though), to ensure best integration of D-Bus into low-level Linux programs.

  • Powerful but lightweight object model for exposing local objects on the bus. Automatically generates introspection as necessary.

The API is currently not fully documented, but we are working on completing the set of manual pages. For details see all pages starting with sd_bus_.

Invoking a Method, from C, with sd-bus

So much about the library in general. Here's an example for connecting to the bus and issuing a method call:

#include <stdio.h>
#include <stdlib.h>
#include <systemd/sd-bus.h>

int main(int argc, char *argv[]) {
        sd_bus_error error = SD_BUS_ERROR_NULL;
        sd_bus_message *m = NULL;
        sd_bus *bus = NULL;
        const char *path;
        int r;

        /* Connect to the system bus */
        r = sd_bus_open_system(&bus);
        if (r < 0) {
                fprintf(stderr, "Failed to connect to system bus: %s\n", strerror(-r));
                goto finish;

        /* Issue the method call and store the respons message in m */
        r = sd_bus_call_method(bus,
                               "org.freedesktop.systemd1",           /* service to contact */
                               "/org/freedesktop/systemd1",          /* object path */
                               "org.freedesktop.systemd1.Manager",   /* interface name */
                               "StartUnit",                          /* method name */
                               &error,                               /* object to return error in */
                               &m,                                   /* return message on success */
                               "ss",                                 /* input signature */
                               "cups.service",                       /* first argument */
                               "replace");                           /* second argument */
        if (r < 0) {
                fprintf(stderr, "Failed to issue method call: %s\n", error.message);
                goto finish;

        /* Parse the response message */
        r = sd_bus_message_read(m, "o", &path);
        if (r < 0) {
                fprintf(stderr, "Failed to parse response message: %s\n", strerror(-r));
                goto finish;

        printf("Queued service job as %s.\n", path);


        return r < 0 ? EXIT_FAILURE : EXIT_SUCCESS;

Save this example as bus-client.c, then build it with:

$ gcc bus-client.c -o bus-client `pkg-config --cflags --libs libsystemd`

This will generate a binary bus-client you can now run. Make sure to run it as root though, since access to the StartUnit method is privileged:

# ./bus-client
Queued service job as /org/freedesktop/systemd1/job/3586.

And that's it already, our first example. It showed how we invoked a method call on the bus. The actual function call of the method is very close to the busctl command line we used before. I hope the code excerpt needs little further explanation. It's supposed to give you a taste how to write D-Bus clients with sd-bus. For more more information please have a look at the header file, the man page or even the sd-bus sources.

Implementing a Service, in C, with sd-bus

Of course, just calling a single method is a rather simplistic example. Let's have a look on how to write a bus service. We'll write a small calculator service, that exposes a single object, which implements an interface that exposes two methods: one to multiply two 64bit signed integers, and one to divide one 64bit signed integer by another.

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <systemd/sd-bus.h>

static int method_multiply(sd_bus_message *m, void *userdata, sd_bus_error *ret_error) {
        int64_t x, y;
        int r;

        /* Read the parameters */
        r = sd_bus_message_read(m, "xx", &x, &y);
        if (r < 0) {
                fprintf(stderr, "Failed to parse parameters: %s\n", strerror(-r));
                return r;

        /* Reply with the response */
        return sd_bus_reply_method_return(m, "x", x * y);

static int method_divide(sd_bus_message *m, void *userdata, sd_bus_error *ret_error) {
        int64_t x, y;
        int r;

        /* Read the parameters */
        r = sd_bus_message_read(m, "xx", &x, &y);
        if (r < 0) {
                fprintf(stderr, "Failed to parse parameters: %s\n", strerror(-r));
                return r;

        /* Return an error on division by zero */
        if (y == 0) {
                sd_bus_error_set_const(ret_error, "net.poettering.DivisionByZero", "Sorry, can't allow division by zero.");
                return -EINVAL;

        return sd_bus_reply_method_return(m, "x", x / y);

/* The vtable of our little object, implements the net.poettering.Calculator interface */
static const sd_bus_vtable calculator_vtable[] = {
        SD_BUS_METHOD("Multiply", "xx", "x", method_multiply, SD_BUS_VTABLE_UNPRIVILEGED),
        SD_BUS_METHOD("Divide",   "xx", "x", method_divide,   SD_BUS_VTABLE_UNPRIVILEGED),

int main(int argc, char *argv[]) {
        sd_bus_slot *slot = NULL;
        sd_bus *bus = NULL;
        int r;

        /* Connect to the user bus this time */
        r = sd_bus_open_user(&bus);
        if (r < 0) {
                fprintf(stderr, "Failed to connect to system bus: %s\n", strerror(-r));
                goto finish;

        /* Install the object */
        r = sd_bus_add_object_vtable(bus,
                                     "/net/poettering/Calculator",  /* object path */
                                     "net.poettering.Calculator",   /* interface name */
        if (r < 0) {
                fprintf(stderr, "Failed to issue method call: %s\n", strerror(-r));
                goto finish;

        /* Take a well-known service name so that clients can find us */
        r = sd_bus_request_name(bus, "net.poettering.Calculator", 0);
        if (r < 0) {
                fprintf(stderr, "Failed to acquire service name: %s\n", strerror(-r));
                goto finish;

        for (;;) {
                /* Process requests */
                r = sd_bus_process(bus, NULL);
                if (r < 0) {
                        fprintf(stderr, "Failed to process bus: %s\n", strerror(-r));
                        goto finish;
                if (r > 0) /* we processed a request, try to process another one, right-away */

                /* Wait for the next request to process */
                r = sd_bus_wait(bus, (uint64_t) -1);
                if (r < 0) {
                        fprintf(stderr, "Failed to wait on bus: %s\n", strerror(-r));
                        goto finish;


        return r < 0 ? EXIT_FAILURE : EXIT_SUCCESS;

Save this example as bus-service.c, then build it with:

$ gcc bus-service.c -o bus-service `pkg-config --cflags --libs libsystemd`

Now, let's run it:

$ ./bus-service

In another terminal, let's try to talk to it. Note that this service is now on the user bus, not on the system bus as before. We do this for simplicity reasons: on the system bus access to services is tightly controlled so unprivileged clients cannot request privileged operations. On the user bus however things are simpler: as only processes of the user owning the bus can connect no further policy enforcement will complicate this example. Because the service is on the user bus, we have to pass the --user switch on the busctl command line. Let's start with looking at the service's object tree.

$ busctl --user tree net.poettering.Calculator

As we can see, there's only a single object on the service, which is not surprising, given that our code above only registered one. Let's see the interfaces and the members this object exposes:

$ busctl --user introspect net.poettering.Calculator /net/poettering/Calculator
NAME                                TYPE      SIGNATURE RESULT/VALUE FLAGS
net.poettering.Calculator           interface -         -            -
.Divide                             method    xx        x            -
.Multiply                           method    xx        x            -
org.freedesktop.DBus.Introspectable interface -         -            -
.Introspect                         method    -         s            -
org.freedesktop.DBus.Peer           interface -         -            -
.GetMachineId                       method    -         s            -
.Ping                               method    -         -            -
org.freedesktop.DBus.Properties     interface -         -            -
.Get                                method    ss        v            -
.GetAll                             method    s         a{sv}        -
.Set                                method    ssv       -            -
.PropertiesChanged                  signal    sa{sv}as  -            -

The sd-bus library automatically added a couple of generic interfaces, as mentioned above. But the first interface we see is actually the one we added! It shows our two methods, and both take "xx" (two 64bit signed integers) as input parameters, and return one "x". Great! But does it work?

$ busctl --user call net.poettering.Calculator /net/poettering/Calculator net.poettering.Calculator Multiply xx 5 7
x 35

Woohoo! We passed the two integers 5 and 7, and the service actually multiplied them for us and returned a single integer 35! Let's try the other method:

$ busctl --user call net.poettering.Calculator /net/poettering/Calculator net.poettering.Calculator Divide xx 99 17
x 5

Oh, wow! It can even do integer division! Fantastic! But let's trick it into dividing by zero:

$ busctl --user call net.poettering.Calculator /net/poettering/Calculator net.poettering.Calculator Divide xx 43 0
Sorry, can't allow division by zero.

Nice! It detected this nicely and returned a clean error about it. If you look in the source code example above you'll see how precisely we generated the error.

And that's really all I have for today. Of course, the examples I showed are short, and I don't get into detail here on what precisely each line does. However, this is supposed to be a short introduction into D-Bus and sd-bus, and it's already way too long for that …

I hope this blog story was useful to you. If you are interested in using sd-bus for your own programs, I hope this gets you started. If you have further questions, check the (incomplete) man pages, and inquire us on IRC or the systemd mailing list. If you need more examples, have a look at the systemd source tree, all of systemd's many bus services use sd-bus extensively.

June 16, 2015

Recently, I've been fighting with the never ending issue of timezones. I never thought I would have plunged into this rabbit hole, but hacking on OpenStack and Gnocchi I felt into that trap easily is, thanks to Python.

“Why you really, really, should never ever deal with timezones”

To get a glimpse of the complexity of timezones, I recommend that you watch Tom Scott's video on the subject. It's fun and it summarizes remarkably well the nightmare that timezones are and why you should stop thinking that you're smart.

The importance of timezones in applications

Once you've heard what Tom says, I think it gets pretty clear that a timestamp without any timezone attached does not give any useful information. It should be considered irrelevant and useless. Without the necessary context given by the timezone, you cannot infer what point in time your application is really referring to.

That means your application should never handle timestamps with no timezone information. It should try to guess or raises an error if no timezone is provided in any input.

Of course, you can infer that having no timezone information means UTC. This sounds very handy, but can also be dangerous in certain applications or language – such as Python, as we'll see.

Indeed, in certain applications, converting timestamps to UTC and losing the timezone information is a terrible idea. Imagine that a user create a recurring event every Wednesday at 10:00 in its local timezone, say CET. If you convert that to UTC, the event will end up being stored as every Wednesday at 09:00.

Now imagine that the CET timezone switches from UTC+01:00 to UTC+02:00: your application will compute that the event starts at 11:00 CET every Wednesday. Which is wrong, because as the user told you, the event starts at 10:00 CET, whatever the definition of CET is. Not at 11:00 CET. So CET means CET, not necessarily UTC+1.

As for endpoints like REST API, a thing I daily deal with, all timestamps should include a timezone information. It's nearly impossible to know what timezone the timestamps are in otherwise: UTC? Server local? User local? No way to know.

Python design & defect

Python comes with a timestamp object named datetime.datetime. It can store date and time precise to the microsecond, and is qualified of timezone "aware" or "unaware", whether it embeds a timezone information or not.

To build such an object based on the current time, one can use datetime.datetime.utcnow() to retrieve the date and time for the UTC timezone, and to retrieve the date and time for the current timezone, whatever it is.

>>> import datetime
>>> datetime.datetime.utcnow()
datetime.datetime(2015, 6, 15, 13, 24, 48, 27631)
datetime.datetime(2015, 6, 15, 15, 24, 52, 276161)

As you can notice, none of these results contains timezone information. Indeed, Python datetime API always returns unaware datetime objects, which is very unfortunate. Indeed, as soon as you get one of this object, there is no way to know what the timezone is, therefore these objects are pretty "useless" on their own.

Armin Ronacher proposes that an application always consider that the unaware datetime objects from Python are considered as UTC. As we just saw, that statement cannot be considered true for objects returned by, so I would not advise doing so. datetime objects with no timezone should be considered as a "bug" in the application.


My recommendation list comes down to:

  1. Always use aware datetime object, i.e. with timezone information. That makes sure you can compare them directly (aware and unaware datetime objects are not comparable) and will return them correctly to users. Leverage pytz to have timezone objects.
  2. Use ISO 8601 as input and output string format. Use datetime.datetime.isoformat() to return timestamps as string formatted using that format, which includes the timezone information.

In Python, that's equivalent to having:

>>> import datetime
>>> import pytz
>>> def utcnow():
>>> utcnow()
datetime.datetime(2015, 6, 15, 14, 45, 19, 182703, tzinfo=<UTC>)
>>> utcnow().isoformat()

If you need to parse strings containing ISO 8601 formatted timestamp, you can rely on the iso8601, which returns timestamps with correct timezone information. This makes timestamps directly comparable:

>>> import iso8601
>>> iso8601.parse_date(utcnow().isoformat())
datetime.datetime(2015, 6, 15, 14, 46, 43, 945813, tzinfo=<FixedOffset '+00:00' datetime.timedelta(0)>)
>>> iso8601.parse_date(utcnow().isoformat()) < utcnow()

If you need to store those timestamps, the same rule should apply. If you rely on MongoDB, it assumes that all the timestamp are in UTC, so be careful when storing them – you will have to normalize the timestamp to UTC.

For MySQL, nothing is assumed, it's up to the application to insert them in a timezone that makes sense to it. Obviously, if you have multiple applications accessing the same database with different data sources, this can end up being a nightmare.

PostgreSQL has a special data type that is recommended called timestamp with timezone, and which can store the timezone associated, and do all the computation for you. That's the recommended way to store them obviously. That does not mean you should not use UTC in most cases; that just means you are sure that the timestamp are stored in UTC since it's written in the database, and you check if any other application inserted timestamps with different timezone.

OpenStack status

As a side note, I've improved OpenStack situation recently by changing the oslo.utils.timeutils module to deprecate some useless and dangerous functions. I've also added support for returning timezone aware objects when using the oslo_utils.timeutils.utcnow() function. It's not possible to make it a default unfortunately for backward compatibility reason, but it's there nevertheless, and it's advised to use it. Thanks to my colleague Victor for the help!

Have a nice day, whatever your timezone is!


This weekend my work on implementing NVIDIA global performance counters has been merged in Nouveau.

With Linux 4.2, Nouveau will allow the userspace to monitor both compute and graphics (global) counters for Tesla, but only compute counters for Fermi. I need to go back to Windows for reverse engineering graphics counters with NVIDIA Perfkit. About Kepler, I have to figure out how to deal with clock gating but this is not going to be hard, so I’ll probably submit a series which adds compute counters this month.

All of these performance counters will be exposed through the Gallium’s HUD and GL_AMD_performance_monitor once I have finished writing the code in mesa.

But don’t be too excited for the moment, because we still need to implement the new nvif interface exposed by Nouveau in libdrm.

My plan is to complete all of this work before the XDC 2015.


June 12, 2015

Last post and the one before were about how to create your own piglit tests. Previously, I have written an introduction to piglit and how to launch a tailored piglit run (more details about these last two topics in my FOSDEM 2015 talk).

Now it’s time to talk about how to contribute to piglit.

How to contribute to piglit

Once you want to contribute something to piglit, you need to generate the patches and send them for review to the mailing list. They are usually created by git format-patch and sent by git send-email command (if you need help with git, there are a lot of tutorials). Remember to rebase your branch against up-to-dated master before creating the patches, so no merge conflicts will appear if the reviewers wants to apply them locally.

Whether you have some patches ready to be submitted or you have questions about piglit, subscribe to and send them there.

Most piglit developers work in other areas (such as OpenGL driver development!) which means that the review process of piglit patches could take some time, so be patient and wait.

If after some time (something like one or two weeks) there is no answer about your patches, you can send a reminder saying that a review is pending. If you have commit rights and the patch is trivial (or you are very confident that it is right), you can even push it to the repository’s master branch after that time. Piglit is not as strict as other projects in this regard, however do not abuse of this rule.

Once you have a good track of contributions or other contributors told you to do so, you can ask for piglit repository’s commit rights by following these instructions. And don’t hesitate to review patches from others!

Table of contents

This is the list of my piglit related posts:

  1. Piglit, an open-source test suite for OpenGL implementations
  2. piglit (II): How to launch a tailored piglit run
  3. piglit (III): How to write GLSL shader tests
  4. piglit (IV): How to write binary test programs
  5. piglit (V): how to contribute to piglit and table of contents

Plus my FOSDEM 2015 talk.

Thanks for following this short introduction to piglit. Happy hacking!

June 11, 2015

Last post I talked about how to develop GLSL shader tests and how to add them to piglit. This is a nice way to develop simple tests but sometimes you need to do something more complex. For that case, piglit can run binary test programs.


Binary test programs in piglit are written in C language taking advantage of the piglit framework to facilitate the test development (there is no main() function, no need to setup GLX (or WGL or EGL or whatever), no need to check manually the available extensions or call to glXSwapBuffers() or similar functions…), however you can still use all OpenGL C-language API.

Simple example

Piglit framework is mostly undocumented but easy to understand once you start reading some existing tests. So I will start with a simple example explaining how a typical binary test looks like and then show you a real example.

 * Copyright © 2015 Intel Corporation
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 * The above copyright notice and this permission notice (including the next
 * paragraph) shall be included in all copies or substantial portions of the
 * Software.

/** @file test-example.c
 * This test shows an skeleton for creating new tests.

#include "piglit-util-gl.h"

    config.window_width = 100;
    config.window_height = 100;
    config.supports_gl_compat_version = 10;
    config.supports_gl_core_version = 31;


static const char vs_pass_thru_text[] =
    "#version 330n"
    "in vec4 piglit_vertex;n"
    "void main() {n"
    "   gl_Position = piglit_vertex;n"

static const char fs_source[] =
    "#version 330n"
    "void main() {n"
    "       color = vec4(1.0, 0.0. 0.0, 1.0);n"

GLuint prog;

piglit_init(int argc, char **argv)
    bool pass = true;

    /* piglit_require_extension("GL_ARB_shader_storage_buffer_object"); */

    prog = piglit_build_simple_program(vs_pass_thru_text, fs_source);


    glClearColor(0, 0, 0, 0);

    /* <-- OpenGL commands to be done --> */

    glViewport(0, 0, piglit_width, piglit_height);

    /* piglit_draw_* commands can go into piglit_display() too */
    piglit_draw_rect(-1, -1, 2, 2);

    if (!piglit_check_gl_error(GL_NO_ERROR))
       pass = false;

    piglit_report_result(pass ? PIGLIT_PASS : PIGLIT_FAIL);

enum piglit_result piglit_display(void)
    /* <-- OpenGL drawing commands, if needed --> */

    /* UNREACHED */
    return PIGLIT_FAIL;

As you see in this example, there are four different parts:

  1. License header and description of the test
    • The license details should be included in each source file. There is one agreed by most contributors and it’s a MIT license assigning the copyright to Intel Corporation. More information in COPYING file.
    • It includes a brief description of what the test does.

     * Copyright © 2015 Intel Corporation
     * Permission is hereby granted, free of charge, to any person obtaining a
     * copy of this software and associated documentation files (the "Software"),
     * to deal in the Software without restriction, including without limitation
     * the rights to use, copy, modify, merge, publish, distribute, sublicense,
     * and/or sell copies of the Software, and to permit persons to whom the
     * Software is furnished to do so, subject to the following conditions:
     * The above copyright notice and this permission notice (including the next
     * paragraph) shall be included in all copies or substantial portions of the
     * Software.
    /** @file test-example.c
     * This test shows an skeleton for creating new tests.

  2. Piglit setup. This is needed to check if a test can be executed by a given driver (minimum supported GL version), or to create a window of a specific size, or even to define if we want double buffering.

        config.window_width = 100;
        config.window_height = 100;
        config.supports_gl_compat_version = 10;
        config.supports_gl_core_version = 31;
        config.window_visual = PIGLIT_GL_VISUAL_DOUBLE | PIGLIT_GL_VISUAL_RGBA;

  3. piglit_init(). This is the function that it is going to be called for configuring the test itself. Some tests implement all their code inside piglit_init() because it doesn’t need to draw anything (or it doesn’t need to update the drawing frame by frame). In any case, you usually put here the following code:
    • Check for needed extensions.
    • Check for limits or maximum values of different variables like GL_MAX_SHADER_STORAGE_BLOCKS, GL_UNIFORM_BLOCK_SIZE, etc.
    • Setup constant data, upload it.
    • All the initialization setup you need for drawing commands: compile and link shaders, set clear color, etc.

    piglit_init(int argc, char **argv)
        bool pass = true;
        /* piglit_require_extension("GL_ARB_shader_storage_buffer_object"); */
        prog = piglit_build_simple_program(vs_pass_thru_text, fs_source);
        glClearColor(0, 0, 0, 0);
        /* <-- OpenGL commands to be done --> */
        glViewport(0, 0, piglit_width, piglit_height);
        /* piglit_draw_* commands can go into piglit_display() too */
        piglit_draw_rect(-1, -1, 2, 2);
        if (!piglit_check_gl_error(GL_NO_ERROR))
           pass = false;
        piglit_report_result(pass ? PIGLIT_PASS : PIGLIT_FAIL);

  4. piglit_display(). This is the function that it is going to be executed periodically to update each frame of the rendered window. In some tests, you will find it almost empty (it returns PIGLIT_FAIL) because it is not needed by the test program.

    enum piglit_result piglit_display(void)
        /* <-- OpenGL drawing commands, if needed --> */
        /* UNREACHED */
        return PIGLIT_FAIL;

Notice that you are free to add any helper functions you need like any other C program but the aforementioned parts are required by piglit.

Piglit API

Piglit provides a lot of functions under its API to be used by the test program. They are usually often-used functions that substitute one or several OpenGL function calls and other code that accompany them.

The available functions are listed in file, which must be included in every binary test source code.

 * Copyright (c) The Piglit project 2007
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * on the rights to use, copy, modify, merge, publish, distribute, sub
 * license, and/or sell copies of the Software, and to permit persons to whom
 * the Software is furnished to do so, subject to the following conditions:
 * The above copyright notice and this permission notice (including the next
 * paragraph) shall be included in all copies or substantial portions of the
 * Software.

#pragma once
#ifndef __PIGLIT_UTIL_GL_H__
#define __PIGLIT_UTIL_GL_H__

#ifdef __cplusplus
extern "C" {

#include "piglit-util.h"

#include <piglit/gl_wrap.h>
#include <piglit/glut_wrap.h>

#define piglit_get_proc_address(x) piglit_dispatch_resolve_function(x)

#include "piglit-framework-gl.h"
#include "piglit-shader.h"

extern const uint8_t fdo_bitmap[];
extern const unsigned int fdo_bitmap_width;
extern const unsigned int fdo_bitmap_height;

extern bool piglit_is_core_profile;

 * Determine if the API is OpenGL ES.
bool piglit_is_gles(void);

 * \brief Get version of OpenGL or OpenGL ES API.
 * Returned version is multiplied by 10 to make it an integer.  So for
 * example, if the GL version is 2.1, the return value is 21.
int piglit_get_gl_version(void);

 * \precondition name is not null
bool piglit_is_extension_supported(const char *name);

 * reinitialize the supported extension List.
void piglit_gl_reinitialize_extensions();

 * \brief Convert a GL error to a string.
 * For example, given GL_INVALID_ENUM, return "GL_INVALID_ENUM".
 * Return "(unrecognized error)" if the enum is not recognized.
const char* piglit_get_gl_error_name(GLenum error);

 * \brief Convert a GL enum to a string.
 * For example, given GL_INVALID_ENUM, return "GL_INVALID_ENUM".
 * Return "(unrecognized enum)" if the enum is not recognized.
const char *piglit_get_gl_enum_name(GLenum param);

 * \brief Convert a string to a GL enum.
 * For example, given "GL_INVALID_ENUM", return GL_INVALID_ENUM.
 * abort() if the string is not recognized.
GLenum piglit_get_gl_enum_from_name(const char *name);

 * \brief Convert a GL primitive type enum value to a string.
 * For example, given GL_POLYGON, return "GL_POLYGON".
 * We don't use piglit_get_gl_enum_name() for this because there are
 * other enums which alias the prim type enums (ex: GL_POINTS = GL_NONE);
 * Return "(unrecognized enum)" if the enum is not recognized.
const char *piglit_get_prim_name(GLenum prim);

 * \brief Check for unexpected GL errors.
 * If glGetError() returns an error other than \c expected_error, then
 * print a diagnostic and return GL_FALSE.  Otherwise return GL_TRUE.
piglit_check_gl_error_(GLenum expected_error, const char *file, unsigned line);

#define piglit_check_gl_error(expected) \
 piglit_check_gl_error_((expected), __FILE__, __LINE__)

 * \brief Drain all GL errors.
 * Repeatly call glGetError and discard errors until it returns GL_NO_ERROR.
void piglit_reset_gl_error(void);

void piglit_require_gl_version(int required_version_times_10);
void piglit_require_extension(const char *name);
void piglit_require_not_extension(const char *name);
unsigned piglit_num_components(GLenum base_format);
bool piglit_get_luminance_intensity_bits(GLenum internalformat, int *bits);
int piglit_probe_pixel_rgb_silent(int x, int y, const float* expected, float *out_probe);
int piglit_probe_pixel_rgba_silent(int x, int y, const float* expected, float *out_probe);
int piglit_probe_pixel_rgb(int x, int y, const float* expected);
int piglit_probe_pixel_rgba(int x, int y, const float* expected);
int piglit_probe_rect_r_ubyte(int x, int y, int w, int h, GLubyte expected);
int piglit_probe_rect_rgb(int x, int y, int w, int h, const float* expected);
int piglit_probe_rect_rgb_silent(int x, int y, int w, int h, const float *expected);
int piglit_probe_rect_rgba(int x, int y, int w, int h, const float* expected);
int piglit_probe_rect_rgba_int(int x, int y, int w, int h, const int* expected);
int piglit_probe_rect_rgba_uint(int x, int y, int w, int h, const unsigned int* expected);
void piglit_compute_probe_tolerance(GLenum format, float *tolerance);
int piglit_compare_images_color(int x, int y, int w, int h, int num_components,
                const float *tolerance,
                const float *expected_image,
                const float *observed_image);
int piglit_probe_image_color(int x, int y, int w, int h, GLenum format, const float *image);
int piglit_probe_image_rgb(int x, int y, int w, int h, const float *image);
int piglit_probe_image_rgba(int x, int y, int w, int h, const float *image);
int piglit_compare_images_ubyte(int x, int y, int w, int h,
                const GLubyte *expected_image,
                const GLubyte *observed_image);
int piglit_probe_image_stencil(int x, int y, int w, int h, const GLubyte *image);
int piglit_probe_image_ubyte(int x, int y, int w, int h, GLenum format,
                 const GLubyte *image);
int piglit_probe_texel_rect_rgb(int target, int level, int x, int y,
                int w, int h, const float *expected);
int piglit_probe_texel_rgb(int target, int level, int x, int y,
               const float* expected);
int piglit_probe_texel_rect_rgba(int target, int level, int x, int y,
                 int w, int h, const float *expected);
int piglit_probe_texel_rgba(int target, int level, int x, int y,
                const float* expected);
int piglit_probe_texel_volume_rgba(int target, int level, int x, int y, int z,
                 int w, int h, int d, const float *expected);
int piglit_probe_pixel_depth(int x, int y, float expected);
int piglit_probe_rect_depth(int x, int y, int w, int h, float expected);
int piglit_probe_pixel_stencil(int x, int y, unsigned expected);
int piglit_probe_rect_stencil(int x, int y, int w, int h, unsigned expected);
int piglit_probe_rect_halves_equal_rgba(int x, int y, int w, int h);

bool piglit_probe_buffer(GLuint buf, GLenum target, const char *label,
             unsigned n, unsigned num_components,
             const float *expected);

int piglit_use_fragment_program(void);
int piglit_use_vertex_program(void);
void piglit_require_fragment_program(void);
void piglit_require_vertex_program(void);
GLuint piglit_compile_program(GLenum target, const char* text);
GLvoid piglit_draw_triangle(float x1, float y1, float x2, float y2,
                float x3, float y3);
GLvoid piglit_draw_triangle_z(float z, float x1, float y1, float x2, float y2,
                  float x3, float y3);
GLvoid piglit_draw_rect_custom(float x, float y, float w, float h,
                   bool use_patches);
GLvoid piglit_draw_rect(float x, float y, float w, float h);
GLvoid piglit_draw_rect_z(float z, float x, float y, float w, float h);
GLvoid piglit_draw_rect_tex(float x, float y, float w, float h,
                            float tx, float ty, float tw, float th);
GLvoid piglit_draw_rect_back(float x, float y, float w, float h);
void piglit_draw_rect_from_arrays(const void *verts, const void *tex,
                  bool use_patches);

unsigned short piglit_half_from_float(float val);

void piglit_escape_exit_key(unsigned char key, int x, int y);

void piglit_gen_ortho_projection(double left, double right, double bottom,
                 double top, double near_val, double far_val,
                 GLboolean push);
void piglit_ortho_projection(int w, int h, GLboolean push);
void piglit_frustum_projection(GLboolean push, double l, double r, double b,
                   double t, double n, double f);
void piglit_gen_ortho_uniform(GLint location, double left, double right,
                  double bottom, double top, double near_val,
                  double far_val);
void piglit_ortho_uniform(GLint location, int w, int h);

GLuint piglit_checkerboard_texture(GLuint tex, unsigned level,
    unsigned width, unsigned height,
    unsigned horiz_square_size, unsigned vert_square_size,
    const float *black, const float *white);
GLuint piglit_miptree_texture(void);
GLfloat *piglit_rgbw_image(GLenum internalFormat, int w, int h,
                           GLboolean alpha, GLenum basetype);
GLubyte *piglit_rgbw_image_ubyte(int w, int h, GLboolean alpha);
GLuint piglit_rgbw_texture(GLenum internalFormat, int w, int h, GLboolean mip,
            GLboolean alpha, GLenum basetype);
GLuint piglit_depth_texture(GLenum target, GLenum format, int w, int h, int d, GLboolean mip);
GLuint piglit_array_texture(GLenum target, GLenum format, int w, int h, int d, GLboolean mip);
GLuint piglit_multisample_texture(GLenum target, GLenum tex,
                  GLenum internalFormat,
                  unsigned width, unsigned height,
                  unsigned depth, unsigned samples,
                  GLenum format, GLenum type, void *data);
extern float piglit_tolerance[4];
void piglit_set_tolerance_for_bits(int rbits, int gbits, int bbits, int abits);
extern void piglit_require_transform_feedback(void);

piglit_get_compressed_block_size(GLenum format,
                 unsigned *bw, unsigned *bh, unsigned *bytes);

piglit_compressed_image_size(GLenum format, unsigned width, unsigned height);

piglit_compressed_pixel_offset(GLenum format, unsigned width,
                   unsigned x, unsigned y);

piglit_visualize_image(float *img, GLenum base_internal_format,
               int image_width, int image_height,
               int image_count, bool rhs);

float piglit_srgb_to_linear(float x);
float piglit_linear_to_srgb(float x);

extern GLfloat cube_face_texcoords[6][4][3];
extern const char *cube_face_names[6];
extern const GLenum cube_face_targets[6];

 * Common vertex program code to perform a model-view-project matrix transform
    "ATTRIB iPos = vertex.position;\n"      \
    "OUTPUT oPos = result.position;\n"      \
    "PARAM  mvp[4] = { state.matrix.mvp };\n"   \
    "DP4    oPos.x, mvp[0], iPos;\n"        \
    "DP4    oPos.y, mvp[1], iPos;\n"        \
    "DP4    oPos.z, mvp[2], iPos;\n"        \
    "DP4    oPos.w, mvp[3], iPos;\n"

 * Handle to a generic fragment program that passes the input color to output
 * \note
 * Either \c piglit_use_fragment_program or \c piglit_require_fragment_program
 * must be called before using this program handle.
extern GLint piglit_ARBfp_pass_through;

static const GLint PIGLIT_ATTRIB_POS = 0;
static const GLint PIGLIT_ATTRIB_TEX = 1;

 * Given a GLSL version number, return the lowest-numbered GL version
 * that is guaranteed to support it.
required_gl_version_from_glsl_version(unsigned glsl_version);

#ifdef __cplusplus
} /* end extern "C" */

#endif /* __PIGLIT_UTIL_GL_H__ */

Most functions are undocumented although there are a lot of examples of how to use it in other piglit tests. Furthermore, once you know which function you need, it is usually straightforward to learn how to call it.

Just to mention a few: you can request extensions (piglit_require_extension()) or a GL version (piglit_require_gl_version()), compile program (piglit_compile_program()), draw a rectangle or triangle, read pixel values and compare them with a list of expected values, check for GL errors, etc.

piglit-shader.h has all the shader-related functions: compile a shader, link a simple program, build (i.e. compile and link) a simple program with vertex and fragment shaders, require a specific GLSL version, etc.

 * Copyright (c) The Piglit project 2007
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * on the rights to use, copy, modify, merge, publish, distribute, sub
 * license, and/or sell copies of the Software, and to permit persons to whom
 * the Software is furnished to do so, subject to the following conditions:
 * The above copyright notice and this permission notice (including the next
 * paragraph) shall be included in all copies or substantial portions of the
 * Software.

#pragma once

 * Null parameters are ignored.
 * \param es Is it GLSL ES?
void piglit_get_glsl_version(bool *es, int* major, int* minor);

GLuint piglit_compile_shader(GLenum target, const char *filename);
GLuint piglit_compile_shader_text_nothrow(GLenum target, const char *text);
GLuint piglit_compile_shader_text(GLenum target, const char *text);
GLboolean piglit_link_check_status(GLint prog);
GLboolean piglit_link_check_status_quiet(GLint prog);
GLint piglit_link_simple_program(GLint vs, GLint fs);
GLint piglit_build_simple_program(const char *vs_source, const char *fs_source);
GLuint piglit_build_simple_program_unlinked(const char *vs_source,
                        const char *fs_source);
GLint piglit_link_simple_program_multiple_shaders(GLint shader1, ...);
GLint piglit_build_simple_program_unlinked_multiple_shaders_v(GLenum target1,
                                 const char*source1,
                                 va_list ap);
GLint piglit_build_simple_program_unlinked_multiple_shaders(GLenum target1,
                               const char *source1,
GLint piglit_build_simple_program_multiple_shaders(GLenum target1,
                          const char *source1,

extern GLboolean piglit_program_pipeline_check_status(GLuint pipeline);
extern GLboolean piglit_program_pipeline_check_status_quiet(GLuint pipeline);

 * Require a specific version of GLSL.
 * \param version Integer version, for example 130
extern void piglit_require_GLSL_version(int version);
/** Require any version of GLSL */
extern void piglit_require_GLSL(void);
extern void piglit_require_fragment_shader(void);
extern void piglit_require_vertex_shader(void);

There are more header files such as piglit-glx-util.h, piglit-matrix.h, piglit-util-egl.h, etc.

Usually, you only need to add piglit-util-gl.h to your source code, however I recommend you to browse through tests/util/ so you find out all the available functions that piglit provides.


A complete example of how a piglit binary test looks like is ARB_uniform_buffer_object rendering test.

 * Copyright (c) 2014 VMware, Inc.
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 * The above copyright notice and this permission notice (including the next
 * paragraph) shall be included in all copies or substantial portions of the
 * Software.

/** @file rendering.c
 * Test rendering with UBOs.  We draw four squares with different positions,
 * sizes, rotations and colors where those parameters come from UBOs.

#include "piglit-util-gl.h"


    config.supports_gl_compat_version = 20;


static const char vert_shader_text[] =
    "#extension GL_ARB_uniform_buffer_object : require\n"
    "layout(std140) uniform;\n"
    "uniform ub_pos_size { vec2 pos; float size; };\n"
    "uniform ub_rot {float rotation; };\n"
    "void main()\n"
    "   mat2 m;\n"
    "   m[0][0] = m[1][1] = cos(rotation); \n"
    "   m[0][1] = sin(rotation); \n"
    "   m[1][0] = -m[0][1]; \n"
    "   gl_Position.xy = m * gl_Vertex.xy * vec2(size) + pos;\n"
    " = vec2(0, 1);\n"

static const char frag_shader_text[] =
    "#extension GL_ARB_uniform_buffer_object : require\n"
    "layout(std140) uniform;\n"
    "uniform ub_color { vec4 color; float color_scale; };\n"
    "void main()\n"
    "   gl_FragColor = color * color_scale;\n"

#define NUM_SQUARES 4
#define NUM_UBOS 3

/* Square positions and sizes */
static const float pos_size[NUM_SQUARES][3] = {
    { -0.5, -0.5, 0.1 },
    {  0.5, -0.5, 0.2 },
    { -0.5, 0.5, 0.3 },
    {  0.5, 0.5, 0.4 }

/* Square color and color_scales */
static const float color[NUM_SQUARES][8] = {
    { 2.0, 0.0, 0.0, 1.0,   0.50, 0.0, 0.0, 0.0 },
    { 0.0, 4.0, 0.0, 1.0,   0.25, 0.0, 0.0, 0.0 },
    { 0.0, 0.0, 5.0, 1.0,   0.20, 0.0, 0.0, 0.0 },
    { 0.2, 0.2, 0.2, 0.2,   5.00, 0.0, 0.0, 0.0 }

/* Square rotations */
static const float rotation[NUM_SQUARES] = {

static GLuint prog;
static GLuint buffers[NUM_UBOS];
static GLint alignment;
static bool test_buffer_offset = false;

static void
    static const char *names[NUM_UBOS] = {
    static GLubyte zeros[1000] = {0};
    int i;

    glGetIntegerv(GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, &alignment);
    printf("GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT = %d\n", alignment);

    if (test_buffer_offset) {
        printf("Testing buffer offset %d\n", alignment);
    else {
        /* we use alignment as the offset */
        alignment = 0;

    glGenBuffers(NUM_UBOS, buffers);

    for (i = 0; i < NUM_UBOS; i++) {
        GLint index, size;

        /* query UBO index */
        index = glGetUniformBlockIndex(prog, names[i]);

        /* query UBO size */
        glGetActiveUniformBlockiv(prog, index,
                      GL_UNIFORM_BLOCK_DATA_SIZE, &size);

        printf("UBO %s: index = %d, size = %d\n",
               names[i], index, size);

        /* Allocate UBO */
        /* XXX for some reason, this test doesn't work at all with
         * nvidia if we pass NULL instead of zeros here.  The UBO data
         * is set/overwritten in the piglit_display() function so this
         * really shouldn't matter.
        glBindBuffer(GL_UNIFORM_BUFFER, buffers[i]);
        glBufferData(GL_UNIFORM_BUFFER, size + alignment,
                             zeros, GL_DYNAMIC_DRAW);

        /* Attach UBO */
        glBindBufferRange(GL_UNIFORM_BUFFER, i, buffers[i],
                  alignment,  /* offset */
        glUniformBlockBinding(prog, index, i);

        if (!piglit_check_gl_error(GL_NO_ERROR))

piglit_init(int argc, char **argv)

    if (argc > 1 && strcmp(argv[1], "offset") == 0) {
        test_buffer_offset = true;

    prog = piglit_build_simple_program(vert_shader_text, frag_shader_text);


    glClearColor(0.2, 0.2, 0.2, 0.2);

static bool
probe(int x, int y, int color_index)
    float expected[4];

    /* mul color by color_scale */
    expected[0] = color[color_index][0] * color[color_index][4];
    expected[1] = color[color_index][1] * color[color_index][4];
    expected[2] = color[color_index][2] * color[color_index][4];
    expected[3] = color[color_index][3] * color[color_index][4];

    return piglit_probe_pixel_rgba(x, y, expected);

enum piglit_result
    bool pass = true;
    int x0 = piglit_width / 4;
    int x1 = piglit_width * 3 / 4;
    int y0 = piglit_height / 4;
    int y1 = piglit_height * 3 / 4;
    int i;

    glViewport(0, 0, piglit_width, piglit_height);


    for (i = 0; i < NUM_SQUARES; i++) {
        /* Load UBO data, at offset=alignment */
        glBindBuffer(GL_UNIFORM_BUFFER, buffers[0]);
        glBufferSubData(GL_UNIFORM_BUFFER, alignment, sizeof(pos_size[0]),
        glBindBuffer(GL_UNIFORM_BUFFER, buffers[1]);
        glBufferSubData(GL_UNIFORM_BUFFER, alignment, sizeof(color[0]),
        glBindBuffer(GL_UNIFORM_BUFFER, buffers[2]);
        glBufferSubData(GL_UNIFORM_BUFFER, alignment, sizeof(rotation[0]),

        if (!piglit_check_gl_error(GL_NO_ERROR))
            return PIGLIT_FAIL;

        piglit_draw_rect(-1, -1, 2, 2);

    pass = probe(x0, y0, 0) && pass;
    pass = probe(x1, y0, 1) && pass;
    pass = probe(x0, y1, 2) && pass;
    pass = probe(x1, y1, 3) && pass;


    return pass ? PIGLIT_PASS : PIGLIT_FAIL;

The source starts with the license header and describing what the test does. Following those, it’s the config setup for piglit: request doube buffer, RGBA pixel format and GL compat version 2.0.

Furthermore, this test defines GLSL shaders, the global data and helper functions as any other OpenGL C-language program. Notice that setup_ubos() include calls to OpenGL API but also calls to piglit_check_gl_error() and piglit_report_result() which are used to check for any OpenGL error and tell piglit that there was a failure, respectively.

Following the structure introduced before, piglit_init() indicates that ARB_uniform_buffer_object extension is required, it builds the program with the aforementioned shaders, setups the uniform buffer objects and sets the clear color.

Finally in piglit_display() is where the relevant content is placed. Among other things, it loads UBO’s data, draw a rectangle and check that the rendered pixels have the same values as the expected ones. Depending of the result of that check, it will report to piglit that the test was a success or a failure.

How to add a binary test to piglit

How to build it

Now that you have written the source code of your test, it’s time to learn how to build it. As we have seen in an earlier post, piglit uses cmake for generating binaries.

First you need to check if cmake tracks the directory where your source code is. If you create a new extension subdirectory under tests/spec, modify this CMakeLists.txt file and add yours.

Once you have done that, cmake needs to know that there are binaries to build and where to get its source code. For doing that, create a CMakeLists.txt file in your test’s directory with the following content:


If you have binaries inside directories pending the former, you can add them to the same CMakeLists.txt file you are creating:

add_subdirectory (compiler)
add_subdirectory (execution)
add_subdirectory (linker)

But this is not enough to compile your test as we have not specified which binary is and where to get its source code. We do that by creating a new file with a similar content than this one.


link_libraries (

piglit_add_executable (arb_shader_storage_buffer_object-minmax minmax.c)

# vim: ft=cmake:

As you see, first we declare where to find the needed headers and libraries. Then we define the binary name (arb_shader_storage_buffer_object-minmax) and which is its source code file (minmax.c).

And that’s it. If everything is fine, next time you run cmake . && make in piglit’s directory, piglit will build the test and place it inside bin/ directory.

Example in piglit repository

Let’s review a real example of how a test was added for a new extension in piglit (this commit). As you see, it added tests/spec/arb_robustness/ subdirectory to tests/spec/CMakeLists.txt, create a tests/spec/arb_robustness/CMakeLists.txt to indicate cmake to track this directory, tests/spec/arb_robustness/ to compile the binary test file and add its source code file tests/spec/arb_robustness/draw-vbo-bounds.c.

tests/spec/CMakeLists.txt | 1 +
 tests/spec/arb_robustness/ | 19 +++++++++++++++++++
 tests/spec/arb_robustness/CMakeLists.txt | 1 +
 tests/spec/arb_robustness/draw-vbo-bounds.c | 205 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 226 insertions(+)

If you run git log tests/spec/<dir> command in other directories, you will find similar commits.

How to run it with profile

Once you successfully build the binary test program you can run it standalone. However, it’s better to add it to tests/ profile so it is executed automatically by this profile.

Open tests/  and look for your extension name, then add your test by looking at  how other tests are defined and adapt it to your case. In case you are developing tests for a new extension, you have to add more lines to tests/ For example this is how was done for ARB_shader_storage_buffer_object extension:

with profile.group_manager(
        grouptools.join('spec', 'arb_shader_storage_buffer_object')) as g:
    g(['arb_shader_storage_buffer_object-minmax'], 'minmax')

These lines make several things: indicate under which extension you will find the results, which are the binaries to run and which short name will appear in the summaries for each binary.

ARB_shader_storage_buffer_object's minmax test result in HTML summaryARB_shader_storage_buffer_object’s minmax test in HTML summary


This post and last one were about how to create your own piglit tests. I hope you will find them useful and I will be glad to see new contributors because of them :-)

Next post will talk about how to send your contributions to piglit and a table of contents so you can easily jump to any post of this series.

June 05, 2015

I've recently found myself explaining polkit (formerly PolicyKit) to one of Collabora's clients, and thought that blogging about the same topic might be useful for other people who are confused by it; so, here is why udisks2 and polkit are the way they are.

As always, opinions in this blog are my own, not Collabora's.

Privileged actions

Broadly, there are two ways a process can do something: it can do it directly (i.e. ask the kernel directly), or it can use inter-process communication to ask a service to do that operation on its behalf. If it does it directly, the components that say whether it can succeed are the Linux kernel's normal permissions checks (DAC), and if configured, AppArmor, SELinux or a similar MAC layer. All very simple so far.

Unfortunately, the kernel's relatively coarse-grained checks are not sufficient to express the sorts of policies that exist on a desktop/laptop/mobile system. My favourite example for this sort of thing is mounting filesystems. If I plug in a USB stick with a FAT filesystem, it's reasonable to expect my chosen user interface to either mount it automatically, or let me press a button to mount it. Similarly, to avoid data loss, I should be able to unmount it when I'm finished with it. However, mounting and unmounting a USB stick is fundamentally the same system call as mounting and unmounting any other filesystem - and if ordinary users can do arbitrary mount system calls, they can cause all sorts of chaos, for instance by mounting a filesystem that contains setuid executables (privilege escalation), or umounting a critical OS filesystem like /usr (denial of service). Something needs to arbitrate: “you can mount filesystems, but only under certain conditions”.

The kernel developer motto for this sort of thing is “mechanism, not policy”: they are very keen to avoid encoding particular environments' policies (the sort of thing you could think of as “business rules”) in the kernel, because that makes it non-generic and hard to maintain. As a result, direct mount/unmount actions are only allowed by privileged processes, and it's up to user-space processes to arrange for a privileged process to make the desired mount syscall.

Here are some other privileged actions which laptop/desktop users can reasonably expect to “just work”, with or without requiring a sysadmin-like (root-equivalent) user:

  • reconfiguring networking (privileged because, in the general case, it's an availability and potentially integrity issue)
  • installing, upgrading or removing packages (privileged because, in the general case, it can result in arbitrary root code execution)
  • suspending or shutting down the system (privileged because you wouldn't want random people doing this on your server, but should normally be allowed on e.g. laptops for people with physical access, because they could just disconnect the power anyway)

In environments that use a MAC framework like AppArmor, actions that would normally be allowed can become privileged: for instance, in a framework for sandboxed applications, most apps shouldn't be allowed to record audio. This prevents carrying out these actions directly, again resulting in the only way to achieve them being to ask a service to carry out the action.

Ask a system service to do it

On to the next design, then: I can submit a request to a privileged process, which does some checks to make sure I'm not trying to break the system (or alternatively, that I have enough sysadmin rights that I'm allowed to break the system if I want to), and then does the privileged action for me.

You might think I'm about to start discussing D-Bus and daemons, but actually, a prominent earlier implementation of this was mount(8), which is normally setuid root:

% ls -l /bin/mount
-rwsr-xr-x 1 root root 40000 May 22 11:37 /bin/mount

If you look at it from an odd angle, this is inter-process communication across a privilege boundary: I run the setuid executable, creating a process. Because the executable has the setuid bit set, the kernel makes the process highly privileged: its effective uid is root, and it has all the necessary capabilities to mount filesystems. I submit the request by passing it in the command-line arguments. mount does some checks - specifically, it looks in /etc/fstab to see whether the filesystem I'm trying to mount has the “user” or “users” flag - then carries out the mount system call.

There are a few obvious problems with this:

  • When machines had a static set of hardware devices (and a sysadmin who knew how to configure them), it might have made sense to list them all in /etc/fstab; but this is not a useful solution if you can plug in any number of USB drives, or if you are a non-expert user with Linux on your laptop. The decision ought to be based on general attributes of devices, such as “is removable?”, and on the role of the machine.
  • Setuid executables are alarmingly easy to get wrong so it is not necessarily wise to assume that mount(8) is safe to be setuid.
  • One fact that a reasonable security policy might include is “users who are logged in remotely should have less control over physically present devices than those who are physically present” - but that sort of thing can't be checked by mount(8) without specifically teaching the mount binary about it.

Ask a system service to do it, via D-Bus or other IPC

To avoid the issues of setuid, we could use inter-process communication in the traditional sense: run a privileged daemon (on boot or on-demand), make it listen for requests, and use the IPC channel as our privilege boundary.

udisks2 is one such privileged daemon, which uses D-Bus as its IPC channel. D-Bus is a commonly-used inter-process system; one of its intended/designed uses is to let user processes and system services communicate, especially this sort of communication between a privileged daemon and its less-privileged clients.

People sometimes criticize D-Bus as not doing anything you couldn't do yourself with some AF_UNIX sockets. Well, no, of course it doesn't - the important bit of the reference implementation and the various interoperable reimplementations consists of a daemon and some AF_UNIX sockets, and the rest is a simple matter of programming. However, it's sufficient for most uses in its problem space, and is usually better than inventing your own.

The advantage of D-Bus over doing your own thing is precisely that you are not doing your own thing: good IPC design is hard, and D-Bus makes some structural decisions so that fewer application authors have to think about them. For instance, it has a central “hub” daemon (the dbus-daemon, or “message bus”) so that n communicating applications don't need O(n²) sockets; it uses the dbus-daemon to provide a total message ordering so you don't have to think about message reordering; it has a distributed naming model (which can also be used as a distributed mutex) so you don't have to design that; it has a serialization format and a type system so you don't have to design one of those; it has a framework for “activating" run-on-demand daemons so they don't have to use resources initially, implemented using a setuid helper and/or systemd; and so on.

If you have religious objections to D-Bus, you can mentally replace “D-Bus” with “AF_UNIX or something” and most of this article will still be true.

Is this OK?

In either case - exec'ing a privileged helper, or submitting a request to a privileged daemon via IPC - the privileged process has two questions that it needs to answer before it does its work:

  • what am I being asked to do?
  • should I do it?

It needs to make some sort of decision on the latter based on the information available to it. However, before we even get there, there is another layer:

  • did the request get there at all?

In the setuid model, there is a simple security check that you can apply: you can make /bin/mount only executable by a particular group, or only executable by certain AppArmor profiles, or similar. That works up to a point, but cannot distinguish between physically-present and not-physically-present users, or other facts that might be interesting to your local security policy. Similarly, in the IPC model, you can make certain communication channels impossible, for instance by using dbus-daemon's ability to decide which messages to deliver, or AF_UNIX sockets' filesystem permissions, or a MAC framework like AppArmor.

Both of these are quite “coarse-grained” checks which don't really understand the finer details of what is going on. If the answer to "is this safe?” is something of the form “maybe, it depends on...”, then they can't do the right thing: they must either let it through and let the domain-specific privileged process do the check, or deny it and lose potentially useful functionality.

For instance, in an AppArmor environment, some applications have absolutely no legitimate reason to talk to udisks2, so the AppArmor policy can just block it altogether. However, once again, this is a coarse-grained check: the kernel has mechanism, not policy, and it doesn't know what the service does or why. If the application does need to be able to talk to the service at all, then finer-grained access control (obeying some, but not all, requests) has to be the service's job.

dbus-daemon does have the ability to match messages in a relatively fine-grained way, based on the object path, interface and member in the message, as well as the routing information that it uses itself (i.e. the source and destination). However, it is not clear that this makes a great deal of sense conceptually: these are facts about the mechanics of the IPC, not facts about the domain-specific request (because the mechanics of the IPC are all that dbus-daemon understands). For instance, taking the udisks2 example again, dbus-daemon can't distinguish between an attempt to adjust mount options for a USB stick (probably fine) and an attempt to adjust mount options for /usr (not good).

To have a domain-specific security policy, we need a domain-specific component, for instance udisks2, to get involved. Unlike dbus-daemon, udisks2 knows that not all disks are equal, knows which categories make sense to distinguish, and can identify which categories a particular disk is in. So udisks2 can make a more informed decision.

So, a naive approach might be to write a function in udisks2 that looks something like this pseudocode:

may_i_mount_this_disk (user, disk, mount options) → boolean
    if (user is root || user is root-equivalent)
        return true;

    if (disk is not removable)
       return false;

    if (mount options are scary)
       return false;

    if (user is in “manipulate non-local disks” group)
        return true;

    if (user is not logged-in locally)
       return false;

    if (user is not logged-in on the same seat where the disk is
            plugged in)
       return false;

    return true;

Delegating the security policy to something central

The pseudocode security policy outlined above is reasonably complicated already, and doesn't necessarily cover everything that you might want to consider.

Meanwhile, not every system is the same. A general-purpose Linux distribution like Debian might run on server/mainframe systems with only remote users, personal laptops/desktops with one root-equivalent user, locked-down corporate laptops/desktops, mobile devices and so on; these systems should not necessarily all have the same security policy.

Another interesting factor is that for some privileged operations, you might want to carry out interactive authorization: ask the requesting user to confirm that the action (which might have come from a background process) should take place (like Windows' UAC), or to prove that the person currently at the keyboard is the same as the person who logged in by giving their password (like sudo).

We could in principle write code for all of this in udisks2, and in NetworkManager, and in systemd, ... - but that clearly doesn't scale, particularly if you want the security policy to be configurable. Enter polkit (formerly PolicyKit), a system service for applying security policies to actions.

The way polkit works is that the application does its domain-specific analysis of the request - in the case of udisks2, whether the device to be mounted is removable, whether the mount options are reasonable, etc. - and converts it into an action. The action gives polkit a way to distinguish between things that are conceptually different, without needing to know the specifics. For instance, udisks2 currently divides up filesystem-mounting into org.freedesktop.udisks2.filesystem-mount, org.freedesktop.udisks2.filesystem-mount-fstab, org.freedesktop.udisks2.filesystem-mount-system and org.freedesktop.udisks2.filesystem-mount-other-seat.

The application also finds the identity of the user making the request. Next, the application sends the action, the identity of the requesting user, and any other interesting facts to polkit. As currently implemented, polkit is a D-Bus service, so this is an IPC request via D-Bus. polkit consults its database of policies in order to choose one of several results:

  • yes, allow it
  • no, do not allow it
  • ask the user to either authenticate as themselves or as a privileged (sysadmin) user to allow it, or cancel authentication to not allow it
  • ask the user to authenticate the first time, but if they do, remember that for a while and don't ask again

So how does polkit decide this? The first thing is that it reads the machine-readable description of the actions, in /usr/share/polkit-1/actions, which specifies a default policy. Next, it evaluates a local security policy to see what that says. In the current version of polkit, the local security policy is configured by writing JavaScript in /etc/polkit-1/rules.d (local policy) and /usr/share/polkit-1/rules.d (OS-vendor defaults). In older versions such as the one currently shipped in Debian unstable, there was a plugin architecture; but in practice nobody wrote plugins for it, and instead everyone used the example local authority shipped with polkit, which was configured via files in /etc/polkit-1/localauthority and /etc/polkit-1/localauthority.d.

These policies can take into account useful facts like:

  • what is the action we're talking about?
  • is the user logged-in locally? are they active, i.e. they are not just on a non-current virtual console?
  • is the user in particular groups?

For instance, gnome-control-center on Debian installs this snippet:

polkit.addRule(function(action, subject) {
        if (( == "org.freedesktop.locale1.set-locale" ||
    == "org.freedesktop.locale1.set-keyboard" ||
    == "org.freedesktop.hostname1.set-static-hostname" ||
    == "org.freedesktop.hostname1.set-hostname" ||
    == "org.gnome.controlcenter.datetime.configure") &&
            subject.local &&
            subject.isInGroup ("sudo")) {
                    return polkit.Result.YES;

which is reasonably close to being pseudocode for “active local users in the sudo group may set the system locale, keyboard layout, hostname and time, without needing to authenticate”. A system administrator could of course override that by dropping a higher-priority policy for some or all of these actions into /etc/polkit-1/rules.d.


  • Kernel-based permission checks are not sufficiently fine-grained to be able to express some quite reasonable security policies
  • Fine-grained access control needs domain-specific understanding
  • The kernel doesn't have that information (and neither does dbus-daemon)
  • The privileged service that does the domain-specific thing can provide the domain-specific understanding to turn the request into an action
  • polkit evaluates a configurable policy to determine whether privileged services should carry out requested actions
June 04, 2015

libinput provides a number of different out-of-the-box configurations, based on capabilities. For example: middle mouse button emulation is enabled by default if a device only has left and right buttons. On devices with a physical middle button it is available but disabled by default. Likewise, whether tapping is enabled and/or available depends on hardware capabilities. But some requirements cannot be gathered purely by looking at the hardware capabilities.

libinput uses a couple of udev properties, assigned through udev's hwdb, to detect device types. We use the same mechanism to provide us with specific tags to adjust libinput-internal behaviour. The udev properties named LIBINPUT_MODEL_.... tag devices based on a set of udev rules combined with hwdb matches. For example, we tag Chromebooks with LIBINPUT_MODEL_CHROMEBOOK.

Inside libinput, we parse those tags and use them for model-specific configuration. At the time of writing this, we use the chromebook tag to automatically enable clickfinger behaviour on those touchpads (which matches the google defaults on chromebooks). We tag the Lenovo X230 touchpad to give it it's own acceleration method. This touchpad is buggy and the data it sends has a very bad resolution.

In the future these tags will likely expand and encompass more devices that need customised tweaks. But the goal is always that libinput works well out of the box, even if the hardware is quirky. Introducing these tags instead of a sleigh of configuration options has short-term drawbacks: it increases the workload on us maintainers and it may require software updates to get a device to work exactly the way it should. The long-term benefits are maintainability and testability though, as well as us being more aware of what hardware is out there and how it needs to be fixed. Plus the relief of not having to deal with configuration snippets that are years out of date, do all the wrong things but still spread across forums like an STD.

Note: the tags are considered private API and may change at any time, depending what we want or need to do with them. Do not use them for home-made configuration.

June 03, 2015

libinput uses udev tags to determine what a device is. This is a significant difference to the X.Org stack which determines how to deal with a device based on an elaborate set of rules, rules grown over time, matured, but with a slight layer of mould on top by now. In evdev's case that is understandable, it stems from a design where you could just point it at a device in your xorg.conf and it'd automagically work, well before we had even input hotplugging in X. What it leads to now though is that the server uses slightly different rules to decide what a device is (to implement MatchIsTouchscreen for example) than evdev does. So you may have, in theory, a device that responds to MatchIsTouchscreen only to set itself up as keyboard.

libinput does away with this in two ways: it punts most of the decisions on what a device is to udev and its ID_INPUT_... properties. A device marked as ID_INPUT_KEYBOARD will initialize a keyboard interface, an ID_INPUT_TOUCHPAD device will initialize a touchpad backend. The obvious advantage of this is that we only have one place where we have generic device type rules. The second advantage is that where this one place isn't precise enough, it's simple to override with custom rule sets. For example, Wacom tablets are hard to categorise just by looking at the device alone. libwacom generates a udev rule containing the VID/PID of each known device with the right ID_INPUT_TABLET etc. properties.

This is a libinput-internal behaviour. Externally, we are a lot more vague. In fact, we don't tell you at all what a device is, other than what events it will send (pointer, keyboard, or touch). We have thought about implementing some sort of device identifier and the conclusion is that we won't implement this as part of libinput's API because it will simply be wrong some of the time. And if it's wrong, it requires the caller to re-implement something on top of it. At which point the caller may as well implement all of it instead. Why do we expect it to be wrong? Because libinput doesn't know the exact context that requires a device to be labelled as a specific type.

Take a keyboard for example. There are a great many devices that send key events. To the client a keyboard may be any device that can get an XKB layout and is used for typing. But to the compositor, a keyboard may be anything that can send a few specific keycodes. A device with nothing but KEY_POWER? That's enough for the compositor to decide to shut down but that device may not otherwise work as a keyboard. libinput can't know this context. But what libinput provides is the API to query information. libinput_device_pointer_has_button() and libinput_device_keyboard_has_key() are the two candidates here to query about a specific set of buttons and/or keys.

Touchpads, trackpoints and mice all look send pointer events and there is no flag that tells you the device type and that is intentional. libinput doesn't have any intrinsic knowledge about what is a touchpad, we take the ID_INPUT_TOUCHPAD tag. At best, we refuse some devices that were clearly mislabelled but we don't init devices as touchpads that aren't labelled as such. Any device type identification would likely be wrong - for example some Wacom tablets are touchpads internally but would be considered tablets in other contexts.

So in summary, callers are encouraged to rely on the udev information and other bits they can pull from the device to group it into the semantically correct device type. libinput_device_get_udev_device() provides a udev handle for a libinput device and all configurable features are query-able (e.g. "does this device support tapping?"). libinput will not provide a device type because it would likely be wrong in the current context anyway.

June 02, 2015

TLDR: as of libinput 0.16 you can end a touchpad tap-and-drag with a final additional tap

libinput originally only supported single-tap and double-tap. With version 0.15 we now support multi-tap, so you can tap repeatedly to get a triple, quadruple, etc. click. This is quite useful in text editors where a triple click highlights a line, four clicks highlight a paragraph, and 28 clicks order a new touchpad from ebay. Multi-tap also works with drag-and drop, so a triple tap followed by a finger down and hold will send three clicks followed by a single click.

We also support continuous tap-and-drag which is something synaptics didn't support provided with the LockedDrags option: Once the user is in dragging mode (x * tap + hold finger down) they can lift the finger and set it down again without the drag being interrupted. This is quite useful when you have to move across the screen, especially on smaller touchpads or for users that prefer a slow acceleration.

Of course, this adds a timeout to the drag release since we need to wait and see whether the finger comes down again. To help accelerate this, we added a tap-to-release feature (contributed by Velimir Lisec): once in drag mode a final tap will release the button immediately. This is something that OS X has supported for years and after a bit of muscle memory retraining it becomes second nature quickly. So the new timeout-free way to tap-and-drag on a touchpad is now:

tap, finger-down, move, .... move, finger up, tap

Update 03/06/25: add synaptics LockedDrag option reference

I don't often write about tools I use when for my daily software development tasks. I recently realized that I really should start to share more often my workflows and weapons of choice.

One thing that I have a hard time enduring while doing Python code reviews, is people writing utility code that is not directly tied to the core of their business. This looks to me as wasted time maintaining code that should be reused from elsewhere.

So today I'd like to start with retrying, a Python package that you can use to… retry anything.

It's OK to fail

Often in computing, you have to deal with external resources. That means accessing resources you don't control. Resources that can fail, become flapping, unreachable or unavailable.

Most applications don't deal with that at all, and explode in flight, leaving a skeptical user in front of the computer. A lot of software engineers refuse to deal with failure, and don't bother handling this kind of scenario in their code.

In the best case, applications usually handle simply the case where the external reached system is out of order. They log something, and inform the user that it should try again later.

In this cloud computing area, we tend to design software components with service-oriented architecture in mind. That means having a lot of different services talking to each others over the network. And we all know that networks tend to fail, and distributed systems too. Writing software with failing being part of normal operation is a terrific idea.


In order to help applications with the handling of these potential failures, you need a plan. Leaving to the user the burden to "try again later" is rarely a good choice. Therefore, most of the time you want your application to retry.

Retrying an action is a full strategy on its own, with a lot of options. You can retry only on certain condition, and with the number of tries based on time (e.g. every second), based on a number of tentative (e.g. retry 3 times and abort), based on the problem encountered, or even on all of those.

For all of that, I use the retrying library that you can retrieve easily on PyPI.

retrying provides a decorator called retry that you can use on top of any function or method in Python to make it retry in case of failure. By default, retry calls your function endlessly until it returns rather than raising an error.

import random
from retrying import retry
def pick_one():
if random.randint(0, 10) != 1:
raise Exception("1 was not picked")

This will execute the function pick_one until 1 is returned by random.randint.

retry accepts a few arguments, such as the minimum and maximum delays to use, which also can be randomized. Randomizing delay is a good strategy to avoid detectable pattern or congestion. But more over, it supports exponential delay, which can be used to implement exponential backoff, a good solution for retrying tasks while really avoiding congestion. It's especially handy for background tasks.

@retry(wait_exponential_multiplier=1000, wait_exponential_max=10000)
def wait_exponential_1000():
print "Wait 2^x * 1000 milliseconds between each retry, up to 10 seconds, then 10 seconds afterwards"
raise Exception("Retry!")

You can mix that with a maximum delay, which can give you a good strategy to retry for a while, and then fail anyway:

# Stop retrying after 30 seconds anyway
>>> @retry(wait_exponential_multiplier=1000, wait_exponential_max=10000, stop_max_delay=30000)
... def wait_exponential_1000():
... print "Wait 2^x * 1000 milliseconds between each retry, up to 10 seconds, then 10 seconds afterwards"
... raise Exception("Retry!")
>>> wait_exponential_1000()
Wait 2^x * 1000 milliseconds between each retry, up to 10 seconds, then 10 seconds afterwards
Wait 2^x * 1000 milliseconds between each retry, up to 10 seconds, then 10 seconds afterwards
Wait 2^x * 1000 milliseconds between each retry, up to 10 seconds, then 10 seconds afterwards
Wait 2^x * 1000 milliseconds between each retry, up to 10 seconds, then 10 seconds afterwards
Wait 2^x * 1000 milliseconds between each retry, up to 10 seconds, then 10 seconds afterwards
Wait 2^x * 1000 milliseconds between each retry, up to 10 seconds, then 10 seconds afterwards
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/site-packages/", line 49, in wrapped_f
return Retrying(*dargs, **dkw).call(f, *args, **kw)
File "/usr/local/lib/python2.7/site-packages/", line 212, in call
raise attempt.get()
File "/usr/local/lib/python2.7/site-packages/", line 247, in get
six.reraise(self.value[0], self.value[1], self.value[2])
File "/usr/local/lib/python2.7/site-packages/", line 200, in call
attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
File "<stdin>", line 4, in wait_exponential_1000
Exception: Retry!

A pattern I use very often, is the ability to retry only based on some exception type. You can specify a function to filter out exception you want to ignore or the one you want to use to retry.

def retry_on_ioerror(exc):
return isinstance(exc, IOError)
def read_file():
with open("myfile", "r") as f:

retry will call the function passed as retry_on_exception with the exception raised as first argument. It's up to the function to then return a boolean indicating if a retry should be performed or not. In the example above, this will only retry to read the file if an IOError occurs; if any other exception type is raised, no retry will be performed.

The same pattern can be implemented using the keyword argument retry_on_result, where you can provide a function that analyses the result and retry based on it.

def retry_if_file_empty(result):
return len(result) <= 0
def read_file():
with open("myfile", "r") as f:

This example will read the file until it stops being empty. If the file does not exist, an IOError is raised, and the default behavior which triggers retry on all exceptions kicks-in – the retry is therefore performed.

That's it! retry is really a good and small library that you should leverage rather than implementing your own half-baked solution!

May 29, 2015

In earlier posts I talked about how to install piglit on your system and run the first time and how to tailor your piglit run. I have given a talk in FOSDEM 2015 about how to test your OpenGL drivers using Free Software where I explained how to run piglit and dEQP. This post and the next one are going to introduce the topic of creating new tests for piglit, as this was not covered before in this post series.

Brief introduction

Before start talking about how to create a GLSL test, let me explain you a couple of things about how piglit organizes its tests.

First is that all the tests are defined inside tests/ directory.

  • The ones related to a specific spec (like OpenGL extension, GL version, GLSL version, etc) are placed inside tests/spec subdirectory.
  • Shader tests are usually defined inside  tests/shaders/ directory, but not always. If they are specific to a OpenGL extension, they will be in the its subdirectory (see first bullet).
  • Tests specific to other APIs: tests/egl, tests/cl, etc.
  • Etc.

Take your time browsing all these directories and figure out the best place for your tests or, if you are looking for a specific test, where it is likely placed.

Second thing is that binary tests should be defined inside tests/ to be executed in a full piglit run. This is not the case for GLSL tests or shader runner tests as we are going to see in this post.

The most common language to write these tests is C but there are a couple of alternatives if you are interested on writing GLSL shader tests. Let’s start with a couple of examples of GLSL shader tests as it is the easiest way to contribute new tests to piglit.

GLSL compiler test

When creating a GLSL compiler test, most of the time you can avoid to write a C code example with all those bells and whistles. Actually, it is pretty straightforward if you just want to check that a GLSL shader compiles successfully or not according to your expectations.

Usually, the rule of thumb is to keep GLSL tests simple and focused to detect failures (or success) in shaders’ compilation. For example, you want to check that defining a shader storage buffer block is only possible if ARB_shader_storage_buffer_object extension is enabled in the driver.

#version 120
#extension GL_ARB_shader_storage_buffer_object: disable

buffer ssbo {
    vec4 a;

void foo(void) {

Once you write the GLSL shader, save it to a file named like extension-disabled-shader-storage-block.frag. There are several suffixes that you can use depending of the GLSL shader type: .vert for vertex shader, .geom for geometry shader, .frag for fragment shader. Notice that the name itself is a summary of what the test does, so other people can find what it does without opening the file.

There is a piglit binary called glslparser that it is going to pick this shader and compile it checking for errors. This binary needs some parameters to know how to run this test and the expected result, so we provide them. Add at the top of the shader the following content:

// [config]
// expect_result: fail
// glsl_version: 1.20
// require_extensions: GL_ARB_shader_storage_buffer_object
// [end config]

There we write down that we expect the test to fail, which is the minimum supported GLSL version to run this test and the required extensions. At this case we need GLSL 1.20 version and GL_ARB_shader_storage_buffer_object extension.

// [config]
// expect_result: fail
// glsl_version: 1.20
// require_extensions: GL_ARB_shader_storage_buffer_object
// [end config]

#version 120
#extension GL_ARB_shader_storage_buffer_object: disable

buffer ssbo {
    vec4 a;

void foo(void) {

Once you have it, you should save it in the proper place. The directory will be the same than the extension name it is going to test, usually it is saved in a subdirectory called compiler because we are testing the GLSL shader compilation, for this example: tests/spec/arb_shader_storage_buffer_object/compiler.

And that’s it. Next time you run piglit with tests/ profile, one script will find this test, parse it and execute glslparser with the provided configuration checking if the result of your shader is the expected one or not.

You can execute it manually by running the following command:

$ bin/glslparser tests/spec/arb_shader_storage_buffer_object/compiler/extension-disabled-shader-storage-block.frag fail 1.20 GL_ARB_shader_storage_buffer_object

As you see, the last arguments come from the config we defined in the test file but we added them by hand.

It is possible to link a GLSL shader and look for errors by using check_link with true or false, depending if you expect the test to succeed or to fail at link time:

// check_link: true

Nevertheless, it only links one shader. If you want to link more than one shader, I recommend you to use another type of piglit tests (see next section).

As you see, you can easily write new GLSL shader tests to verify some bits of a specification: valid/invalid keywords, initialization, special cases explained in the spec, etc.

GLSL shader linker test

Sometimes compiling/linking only one shader is not enough, there are cases that you want to compile and link different but related shaders: link a vertex and a fragment shader or a geometry shader between them, etc. In the previous example, we saw that the GLSL shader test does not link different shaders (actually there is only one). When there are several GLSL shaders, you need to use another type of piglit tests: shader_runner tests.

As usual, you first start writing the GLSL shaders. In some cases, you can substitute one shader by its pass-through counterpart to avoid writing “useless” shaders when there is not any user provided interface dependency.

Let’s start with an example of a pass-through vertex shader:

# ARB_gpu_shader5 spec says:
#   "If an implementation supports  vertex streams, the individual
#   streams are numbered 0 through -1"
# This test verifies that a link error occurs if EmitStreamVertex()
# is called with a stream value which is negative.

GLSL >= 1.50

[vertex shader passthrough]

[geometry shader]

#extension GL_ARB_gpu_shader5 : enable

layout(points) in;
layout(points, max_vertices=3) out;

void main()
    gl_Position = vec4(1.0, 1.0, 1.0, 1.0);

[fragment shader]

out vec4 color;

void main()
  color = vec4(0.0, 1.0, 0.0, 1.0);

link error

The file starts with a comment describing what the test does and which spec (or part of it) is checking.

Next step is very similar to the previous example: specify the minimum GL and/or GLSL version required and the extensions needed for the test execution. This information is written in the [require] section.

Then the different GLSL shader definitions start: vertex shader is pass-through as it has no user-defined interface dependency to next stage (geometry shader at this case). After it, it defines the  geometry and fragment shaders and finally the test commands to run. In this case, we expect the test to fail at link time.

However you are not limited to link checking, there are other available commands to run in the test commands part:

    • Load uniform values
      • uniform <type> <name> <value>

uniform vec4 color 0.0 0.5 0.0 0.0

    • Draw rectangles
      • draw rect <coordinates>

draw rect -1 -1 2 2

    • Probe a pixel color value and compare it to an expected value
      • probe {rgb, rgba}

probe rgb 1 1 0.0 1.0 0.0

    • Probe all pixel values.
      • probe all {rgb,rgba}

probe all rgba 0.0 1.0 0.0 0.0

Or even more complex commands:

    • Load data into a vertex array object and render primitives using that vertex array data.

[vertex data]
 1.0  1.0  1.0
-1.0  1.0  1.0
-1.0 -1.0  1.0
 1.0 -1.0  1.0

draw arrays GL_TRIANGLE_FAN 0 4

    •  Relative probe pixel color

relative probe rgb (.5, .5) (0.0, 1.0, 0.0)

    • And much more:

[fragment program]
OPTION ARB_fragment_program_shadow;
TXP result.color, fragment.texcoord[0], texture[0], SHADOWRECT;

texture shadowRect 0 (32, 32)
texparameter Rect depth_mode luminance
texparameter Rect compare_func greater

Just check other tests to figure out if they are doing something like what you want to do in your test and copy the interesting bits!

Save the file with a good name describing briefly what it does (example’s filename is stream-negative-value.shader_test) and place it in the corresponding subdirectory. For this example, the proper place is under GL_ARB_gpu_shader5 test directory and linker subdirectory because this is a linker test: tests/spec/arb_gpu_shader5/linker/

In order to execute this test with tests/ profile, you don’t need to do anything else. A script will find this test file and call shader_runner binary with the filename as a parameter.

In case you want to run it by yourself, the command is easy:

$ bin/shader_runner tests/spec/arb_gpu_shader5/linker/stream-negative-value.shader_test

Next post will cover the creation of GL binary tests within the piglit framework. I hope this post and next one will help you to contribute to piglit!

May 26, 2015

So we just got the second Fedora Workstation release out the door, and I am quite happy with it, we had quite a few last minute hardware issues pop up, but due to the hard work of the team we where able to get them fixed in time for todays release.

Every release we do is of course the result of both work we do as part of the Fedora Workstation team, but we also rely on a lot of other people upstream. I would like to especially call out Laurent Pinchart, who is the upstream maintainer of the UVC driver, who fixed a bug we discovered with some built in webcams on newer laptops. So thank you Laurent! So for any users of the Toshiba z20t Portege laptop, your rear camera now works thanks to Laurent :)

Having a relatively short development cycle this release doesn’t contain huge amounts of major changes, but our team did manage to sneak in a few nice new features. As you can see from this blog entry from Allan Day the notification area re-design that he and Florian worked on landed. It is a huge improvement in my opinion and will let us continue polishing the notification behavior of applications going forward.

We have a bunch of improvements to the Nautilus file manager thanks to the work of Carlos Soriano. Recommend reading through his blog as there is a quite sizeable collection of smaller fixes and improvements he was able to push through.

Another thing we got properly resolved for Fedora Workstation 22 is installing it in Boxes. Boxes is our easy to use virtual machine manager which we are putting resources into to make a great developer companion. So while this is a smaller fix for Boxes and Fedora, we have some great Boxes features lining up for the next Fedora release, so stayed tuned for more on that in another blog post.

Wayland support is also marching forward with this release. The GDM session you get upon installing Fedora Workstation 22 will now default to Wayland, but fall back to X if there is an issue. It is a first step towards migrating the default session to Wayland. We still have some work to do there to get the Wayland session perfect, but we are closing the gap rapidly. Jonas Ådahl and Owen Taylor is pushing that effort forward.

Related to Wayland we introduce libinput as the backend for both X and Wayland in this release. While we shipped libinput in Fedora 21, when we wrote libinput we did so with Wayland as the primary target, yet at the same time we realized that we didn’t want to maintain two separate input systems going forward, so in this release also uses libinput for input. This means we have one library to work on now that will improve input in both your Wayland session and X sessions.

This is also the first release featuring the new Adwaita theme for Qt. This release supports Qt4, but we hope to support Qt5 in an upcoming Fedora release and also include a dark variant of the theme for Qt applications. Martin Briza has been leading that effort.

Another nice little feature addition this release is the notification of long running jobs in the terminal. It was a feature we wanted to do from early on in the Fedora Workstation process, but it took quite some while to figure out the fine details for how we wanted to do it. Basically it means you no longer need to check in with your open terminals to see if a job has completed, instead you are now getting a notification. So you can for instance start a compile and then not have to think about it again until you get the notification. We are still tweaking the notifications a little bit for this one, to make sure we cut down the amount of unhelpful notifications to an absolute minimum, so if you have feedback on how we can improve this feature we be happy to hear it. For example we are thinking about turning off the notification for UI applications launched from a terminal.

Anyway, we have a lot of features in the pipeline now for Fedora Workstation 23 since quite a few of the items planned for Fedora Workstation 22 didn’t get completed in time, so I am looking forward to writing a blog informing you about those soon.

You can also read about this release in Fedora Magazine.